Mateusz Stawecki

Personal Site

UPDATE: This is an old piece, you probably want: phantom.js and casper.js

In this post I’ll show you how to remotely control your Google Chrome browser using JavaScript and scrape some data, even if it’s on an AJAX-powered website or behind HTTPS authentication. Nice?

I LOVE WebKit and now it got even sweeter. Very recently there was a small buzz about a new feature: WebKit Remote Debugging, which lets you use the Element Inspector remotely (since it’s essentially just a web page plus some JavaScript and WebSockets). The real icing on the cake for me is the ability to plug in a different interface. I wrote a simple interface that can execute several pieces of JavaScript and return the values back to you.

Here’s how to set it up. Open Terminal and find the Google Chrome executable. To use remote debugging, run it with a special parameter:

$ cd "/Applications/Google Chrome.app/Contents/MacOS/"
$ ./Google\ Chrome --remote-debugging-port=9222

If you get:

[0513/205852:FATAL:foundation_util.mm(102)] Check failed: bundle. Failed to load the bundle at /Applications/Google Chrome.app/Contents/MacOS/Versions/11.0.696.68/Google Chrome Framework.framework

Try symlinking Versions:

$ ln -s ../Versions/ ./Versions

The browser should start normally. Now go to a different browser, e.g. Safari, and open http://localhost:9222. Select a page and you should see the Inspector.

That’s all nice and neat. But let’s see my remote script: http://gist.github.com/972742

- To connect to the debugger, we’re using WebSockets. Change the page number based on the link from http://localhost:9222. Every “Tab” has a different “Page” number.

	// Set page number!
	var host = "ws://localhost:9222/devtools/page/5";
	socket = new WebSocket(host);

- To execute JavaScript I wrapped a JSON-RPC-like command into a method with callback. More protocol schema here.

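function remoteEval(scriptString,callback) {
	// Remember the callback under this sequence number.
	seqCallback[seqNo] = callback;
	socket.send('{"seq":'+seqNo+',"domain":"Runtime",'+
	'"command":"evaluate","arguments":{"expression":"'+
	// Escape backslashes and double quotes so the expression stays valid JSON.
	scriptString.replace(/\\/g,'\\\\').replace(/"/g,'\\"')
	+'","objectGroup":"console","includeCommandLineAPI":false}}');
	seqNo++;
}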
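Gluing JSON together by hand is easy to get wrong, so here is a minimal sketch of the same message built with JSON.stringify instead. The field names are copied from remoteEval above; `pendingCallbacks`, `nextSeq`, `buildEvalMessage` and `queueEval` are made-up names for illustration, and `send` stands in for `socket.send`.

```javascript
// Sketch only: same message fields as remoteEval, but JSON.stringify does the escaping.
var pendingCallbacks = {};  // hypothetical stand-in for seqCallback
var nextSeq = 0;            // hypothetical stand-in for seqNo

function buildEvalMessage(seq, scriptString) {
	return JSON.stringify({
		seq: seq,
		domain: "Runtime",
		command: "evaluate",
		arguments: {
			expression: scriptString,
			objectGroup: "console",
			includeCommandLineAPI: false
		}
	});
}

// `send` is whatever delivers the string, e.g. function (m) { socket.send(m); }
function queueEval(scriptString, callback, send) {
	pendingCallbacks[nextSeq] = callback;
	send(buildEvalMessage(nextSeq, scriptString));
	return nextSeq++;
}
```

Quotes, backslashes and newlines in the script string all survive the trip this way, without any manual replace calls.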

- And this is how a sample script works:

remoteOnLoad = function(result) {

	// We might've ended up on the login page, so let's log in!
	if (remoteURL.indexOf("ServiceLogin") > 0) {
		remoteEval(
			" document.getElementById('Email').value = 'username'; "+
			" document.getElementById('Passwd').value = 'password'; "+
			" document.getElementsByTagName('form')[0].submit(); ",
			function(result) { alert(result); } );
	}

	// We're home!
	if (remoteURL.indexOf("mail.google.com/mail/") > 0) {
		remoteEval(
			" try { document.getElementById('canvas_frame')"+
			".contentWindow.document.getElementsByClassName( 'md' )[0].innerText } catch(e) { -1 }",
			function(result) {
				// Waiting for AJAX. Try again in 2 sec.
				if (result == -1) { setTimeout(remoteOnLoad,2000); }
				else { alert(result); }
				// This should return scraped information
				// about your data usage on gmail!
				// E.g. You're currently using 150MB out of 7000MB
			} );
	}
};

// This happens first:
remoteEval(" location.href = 'http://gmail.com' ");
// Let's go to Gmail!
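The "try again in 2 sec" trick can be pulled out into a small helper that also gives up after a while. This is only a sketch: `retryUntilReady` and its options are hypothetical names, it assumes the attempt reports -1 when the page is not ready yet (as in the script above), and the `schedule` option exists so the helper can be exercised without real timers.

```javascript
// Sketch: call `attempt` repeatedly until it reports something other than -1.
function retryUntilReady(attempt, onReady, options) {
	var opts = options || {};
	var delayMs = opts.delayMs || 2000;
	var maxTries = opts.maxTries || 10;
	var schedule = opts.schedule || function (fn, ms) { setTimeout(fn, ms); };
	var tries = 0;

	function run() {
		tries++;
		attempt(function (result) {
			if (result != -1) { onReady(result); return; }    // got real data
			if (tries < maxTries) { schedule(run, delayMs); } // not ready, retry later
			else { onReady(null); }                           // gave up
		});
	}
	run();
}
```

With the remote script it could be used roughly like this: retryUntilReady(function (done) { remoteEval("...", done); }, function (result) { alert(result); });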

I don’t know about you, but I just wanna wrap it in Node.js, run it somewhere on Linux with a dummy X11 server for Chrome and write lots of crazy tasks, so it does it all for me! Imagine, e.g., that instead of that alert(result) you make it a WebHook or a service?


Name: Web Form Analyzer
Motto: Simple form analyzer from URL

Link: http://formanalyzer.net/

Summary:
A very simple web form analyzer! Just enter the form’s URL or HTML code and you’ll see a nice print out of different forms and values submitted on the website.

I had a bit of time and a need for a simple tool like that. Dead simple, very clear to read. Nicer than ‘view source’ if you’re just interested in what is being posted, and where, from a website or piece of code. Enjoy.


Ever had this problem? You were so excited to see if some piece of code works on your live system that you forgot to change the database access configuration and file paths? Probably not, because we’re all respected professionals here *wink* and the case is usually: it’s 10pm, you’re still at the office, product x is launching tomorrow and you accidentally overwrote the configuration because you didn’t have time to finish the deployment script :P. Or maybe you have to deploy your application to more than two machines? Well, here’s a small cheat sheet.

The lazy way


switch (php_uname('n')) {
    case 'livedevhost04':
		$dbhost = 'sql.example.com'; $dbuser = 'myapp_user';
		$dbpass = 's7d6y3726ye86'; $db = 'myappdb';
		break;
    case 'Mateusz-Laptop.local':
		$dbhost = 'localhost'; $dbuser = 'root';
		$dbpass = ''; $db = 'testdb';
    	break;
    default:
       echo 'No configuration found for host: '.php_uname('n'); exit;
}

This way is quite nice for most scenarios. Very convenient. Get the machine’s hostname, add a “case” to the switch with server’s configuration and you’re good! If you’re deploying through some sort of SFTP/WebDAV protocol, you can easily upload files without any additional modification before running the script. The same with deployment techniques like Deploy using Git.
The only problem is that you’re slightly exposing configuration settings for all your boxes. If you don’t feel comfortable with this, try a different technique like symlinks to a local configuration file.
Personally, I use it quite often. It’s better and way less annoying than swapping commented settings.
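The same idea carries over to other stacks. Here is a rough sketch in JavaScript (e.g. for Node.js), using a lookup table instead of a switch. `CONFIGS` and `configFor` are hypothetical names and the values are the placeholders from the PHP example, minus the passwords.

```javascript
// Sketch: per-host settings in a lookup table (all names and values are placeholders).
var CONFIGS = {
	'livedevhost04': { dbhost: 'sql.example.com', dbuser: 'myapp_user', db: 'myappdb' },
	'Mateusz-Laptop.local': { dbhost: 'localhost', dbuser: 'root', db: 'testdb' }
};

function configFor(hostname, configs) {
	// Fail loudly on an unknown host, like the "default" branch of the switch.
	if (!Object.prototype.hasOwnProperty.call(configs, hostname)) {
		throw new Error('No configuration found for host: ' + hostname);
	}
	return configs[hostname];
}

// In Node.js the hostname would come from require('os').hostname(), e.g.
// var config = configFor(require('os').hostname(), CONFIGS);
```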

The ‘bash’ way

This can be used for many different scenarios. Not just files, but also directories. Here’s a very nice and readable script for symlinking stuff based on local hostname:

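#!/bin/bash

targetfile=webroot/config.php
fromscheme=webroot/config._HOST_.php

fromfile=${fromscheme/_HOST_/$(hostname)}

if [ -e "$fromfile" ]
then
 rm -f "$targetfile"
 # A relative symlink target is resolved from the link's own directory,
 # so link to the file name only (both files live in webroot/).
 ln -s "$(basename "$fromfile")" "$targetfile"
 echo "$fromfile == $targetfile"
else
 echo "[ERROR] Local configuration file not found: $fromfile"
fi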

Additionally, you might want to execute a custom script that will do something for you after retrieving a configuration set.
Notice the “webroot/”, please keep sensitive scripts outside your document root, mkaay? Maybe even clean them up after deploy and keep them in repo!

You’re only in trouble, if you don’t have access to a bash shell on your hosting server (use first method) or you’re running Windows (you can try cygwin if you’re mad enough ;] )

A JavaScript Bonus

You’d be surprised how many times I almost did something very silly on a deployed version of an AJAX-based application. As a bonus, here’s a script you can put in your app to help you identify which build you’re currently working on. Especially useful when running on iOS in web app mode.

<div style='display:none;color:red;' id='devnotification'>TEST SERVER</div>


var currentHost = location.href.split('/')[2];
if (currentHost == 'localhost' || currentHost == 'dev.example.com')
document.getElementById('devnotification').style.display = 'block';
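One catch: location.href.split('/')[2] includes the port, so 'localhost:8080' would not match 'localhost'. Here is a small sketch that compares only the hostname. `isDevHost` is a made-up name, and it relies on the URL constructor, which older browsers may not have.

```javascript
// Sketch: true when the page is served from one of the known dev hosts.
function isDevHost(href, devHosts) {
	var hostname = new URL(href).hostname;  // hostname excludes any ":port" suffix
	return devHosts.indexOf(hostname) !== -1;
}

// In the page it could be used like this:
// if (isDevHost(location.href, ['localhost', 'dev.example.com']))
//     document.getElementById('devnotification').style.display = 'block';
```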

Enjoy! And remember to run your tests, kids!


Getting cross-site content with JavaScript is cool, but it usually requires a tiny bit of extra effort when designing the feeds or deploying a callback proxy. Thanks to Yahoo Pipes it has now become very easy to get any content, even a whole web page, and crunch it in your JavaScript web app.
First go to pipes.yahoo.com and create a new Pipe. You’re going to have to log in.
Drag “URL Input” from “User inputs” to your pipe diagram. Now get back to “Sources”. If you’re planning on fetching a website, drag “Fetch Page”. If it’s a JSON or XML feed, drag “Fetch Data” (all data will be automatically transformed).
The next step is linking the elements together. Drag a link from the “URL Input” into your source’s URL attribute, and from the source’s output to Pipe Output.
Your pipe should look something like this:

Click “Save” and “Run Pipe…”. You’ll be taken to the pipe information page, where you can enter the URL which you want to proxy. Submit with “Run Pipe”, after it has loaded, select “Get as JSON”.
You should see some nice JSON with the transformed content. The URL to such pipe looks like this:

http://pipes.yahoo.com/pipes/pipe.run?_id=d53d0d4793aa292d3e02885ba9b22cba&_render=json&fromurl=http://stawecki.com

Now guess how to add a callback ;)

http://pipes.yahoo.com/pipes/pipe.run?_id=d53d0d4793aa292d3e02885ba9b22cba&_render=json&fromurl=http://stawecki.com&_callback=test

Surprise ;) There is an underscore before the “callback” parameter name (weirrrd). For anyone who is new to this kind of cross-site request (usually called JSONP), “callback” is basically the name of a JavaScript function that receives the content. To show how it works, here’s a simple script that downloads a website and returns the response’s character length:


	function test(response) {
		try {
			alert( 'Content size: '+
				response.value.items[0].content.length );
		} catch(err) { alert('Invalid response: '+err); }
	}

	function getURL(urlstr) {
		var ka = document.createElement('script');
		ka.type = 'text/javascript';
		// encodeURIComponent, because escape() leaves "+" and "/" unencoded
		// and mangles non-ASCII characters.
		ka.src =
		'http://pipes.yahoo.com/pipes/pipe.run?_id=d53d0d4793aa292d3e02885ba9b22cba&_render=json&fromurl='+
		encodeURIComponent(urlstr) + '&_callback=test';
		var ks = document.getElementsByTagName('script')[0];
		ks.parentNode.insertBefore(ka, ks);
	}
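A fixed callback name like “test” means two requests in flight share one handler. Here is a sketch of one way around that: register a one-off callback under a generated name and pass that name as “_callback”. `registerJsonpCallback` and the 'jsonp_cb_' prefix are hypothetical; in a browser, `globalObj` would be `window`.

```javascript
// Sketch: give every request its own throwaway global callback.
var jsonpCounter = 0;

function registerJsonpCallback(globalObj, handler) {
	var name = 'jsonp_cb_' + (jsonpCounter++);
	globalObj[name] = function (response) {
		delete globalObj[name];  // clean up once the response has arrived
		handler(response);
	};
	return name;  // append this to the pipe URL as "&_callback=" + name
}
```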

And here’s a live version to try:
Simple Pipes Callback

Here’s a slightly more complicated example. I’m getting HTML content, parsing it internally and listing all links on the website (NICE!):
Link Scraper Demo

These examples used “Fetch Page” in pipes. Here’s an example of reading an RSS feed using “Fetch Data”. This script alerts the most recent news item from an external RSS feed. Notice that Yahoo Pipes conveniently transforms any feed to JSON (or XML if you really really want).


	function test(response) {
		try {
			var newsItem = response.value.items[0].channel.item[0];
			alert( newsItem.pubDate +" - " + newsItem.title +" - " + newsItem.description );
		} catch(err) { alert('Invalid response: '+err); }
	}

	function getURL(urlstr) {
		var ka = document.createElement('script');
		ka.type = 'text/javascript';
		// encodeURIComponent, because escape() leaves "+" and "/" unencoded
		// and mangles non-ASCII characters.
		ka.src =
		'http://pipes.yahoo.com/pipes/pipe.run?_id=4d625dfe6977e71acb45db4aa51726a6&_render=json&feedurl='+
		encodeURIComponent(urlstr) + '&_callback=test';
		var ks = document.getElementsByTagName('script')[0];
		ks.parentNode.insertBefore(ka, ks);
	}

And here’s a live version to try with BBC News RSS feed:
Simple RSS Callback

Now go and cross-site the hell out of the internetz! Slightly more practical examples coming soon.


I’ve been recently having a lot of thoughts on live data and visualisation. Tweet Splash presents data available through Twitter API. This time, I’m going to set up cheap event tracking, which you can very easily use in many different situations and review later on Google Spreadsheets.

1. Setting up and playing with Google Docs: Forms

Log into your Google Account and go to Google Docs. Press “Create New” -> Form
I’ll call my form “Blog Searches”. Let’s have only one “question” which will represent our event or query made by a user. In my case, the question title is “keyword”, the question type is Text and the question is required. Click “Done” and make sure you delete “Sample question 2”. Save the form and click the link at the bottom to see the public version of your form.

It should look more or less like this. You can check that it works by posting a test keyword; then go back to the document list and open the form again, this time in its Spreadsheet view.

Now it’s time for the hacky part. Go to the public version of the form and look up the source (usually right-click “View Source” ;] ).

We need to retrieve 2 values: the Form Action URL, to know where the form is posted to, and the Input Text’s Name, to know the name of our “Keyword” parameter.

Form Action URL will be here (it’s the attribute value of “action” in the “form” tag):

<form action="https://spreadsheets.google.com/formResponse?formkey=dDVCMlYzMmpuWTg0VUhZUkltaXl5OEE6MQ&amp;ifq" method="POST" id="ss-form">

Input Text’s Name (the attribute value of “name” in the only “text” input field):

<input type="text" name="entry.0.single" value="" class="ss-q-short" id="entry_0">

Now let’s try a small trick I discovered. Usually the form data is sent through a POST request; however, it turns out Google doesn’t mind if you send values over a typical GET, which means you can put them as a part of your URL! In my case I’m going to this URL in my browser (note that I ignored the “ifq” parameter):

https://spreadsheets.google.com/formResponse?formkey=dDVCMlYzMmpuWTg0VUhZUkltaXl5OEE6MQ&entry.0.single=Testing+GET+form+submission

Basically, to the URL, you append an ampersand (“&”), the name of the field, e.g. “entry.0.single”, an equals sign (“=”) and finally a message. Remember that text in a URL has to be properly encoded, e.g. spaces are turned into “+” or “%20”; a browser will do it for you.
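If you build that URL in code rather than typing it into the address bar, the encoding is on you. A minimal sketch, assuming the form key and field name from above; `buildFormUrl` is a made-up helper, not anything Google provides.

```javascript
// Sketch: append encoded field values to the form's response URL.
function buildFormUrl(formKey, fields) {
	var url = 'https://spreadsheets.google.com/formResponse?formkey=' +
		encodeURIComponent(formKey);
	Object.keys(fields).forEach(function (name) {
		// Encode both the field name and its value; spaces become "%20".
		url += '&' + encodeURIComponent(name) + '=' + encodeURIComponent(fields[name]);
	});
	return url;
}

// Example:
// buildFormUrl('dDVCMlYzMmpuWTg0VUhZUkltaXl5OEE6MQ',
//              { 'entry.0.single': 'Testing GET form submission' });
```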
If you check your spreadsheet now, you should see the message:

Now I will show two methods for mixing this up with your website or service: server side and client side.

2a. Submitting an event – Server side

I’m going to show the server-side method first, since it’s really the best way to do it. If you can, you should avoid exposing tricks like this, primarily because it’s easy to recover your form URL from a piece of publicly available JavaScript, which means somebody might play a nasty trick on you and post some trash into your spreadsheet. On the other hand, if you do it through JavaScript, the user can decide not to be tracked by turning JavaScript off. Here are some examples of a server-side post.

In PHP there are several ways of fetching a URL. This is an example of posting a search query made by a user. You can put this code in your index.php and it won’t be called unless the parameter “s” is passed. Change it to the parameter name you are using for search or page name, or put a different version of the fetch in different files and provide a predefined event description string.

if (isset($_GET['s']))
file_get_contents(
'https://spreadsheets.google.com/formResponse?formkey=dDVCMlYzMmpuWTg0VUhZUkltaXl5OEE6MQ'.
'&entry.0.single=' . urlencode($_GET['s']) );

Java: (works also on App Engine)


import java.io.IOException;
import java.net.URL;
import java.net.URLEncoder;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ExampleServlet extends HttpServlet {

	public void doGet(HttpServletRequest request, HttpServletResponse resp)
			throws IOException {
		resp.setContentType("text/plain");
		String search = request.getParameter("search");
		if (search != null && !search.trim().equals("")) {
			touchURL(
				"https://spreadsheets.google.com/formResponse?formkey=" +
				"dDVCMlYzMmpuWTg0VUhZUkltaXl5OEE6MQ&entry.0.single=" +
				URLEncoder.encode(search, "UTF-8"));
		}
	}

	// Fetch the URL and throw away the response body.
	public static void touchURL(String address) {
		try {
			URL url = new URL(address);
			url.openConnection().getInputStream().close();
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

2b. Submitting an event – Client side

If you’re not lucky enough to be able to modify your backend code, e.g. you’re using Shopify or similar, but you have access to the site’s templates and you can put in some JavaScript, you can still submit an event. Here’s how you do it in JS:


(function() {
var paramsplit = location.href.split('s=');
if (paramsplit.length > 1) {
var keyword = paramsplit[1].split('&')[0];
var ka = document.createElement('script'); ka.type = 'text/javascript';
ka.src =
'https://spreadsheets.google.com/formResponse?formkey='+
'dDVCMlYzMmpuWTg0VUhZUkltaXl5OEE6MQ&entry.0.single='
+ keyword;
var ks = document.getElementsByTagName('script')[0];
ks.parentNode.insertBefore(ka, ks);
}
})();

The script above performs a string split looking for the parameter “s=”, which in my case is the search query parameter. If there is none, there will only be one element in the array (the location URL itself) and the rest of the script will not be executed. Otherwise, another split will separate the keyword from other parameters and the URL to the spreadsheet form will be fetched like an external script file.
This method has some pitfalls. First of all, anyone can see this code and start sending some trash or fake data. It’s also not very clean, because it causes a JavaScript parse error: the browser basically tries to execute the fetched HTML form code as JavaScript. To avoid this, you can deploy a simple Google App Engine application or a PHP script somewhere that fetches the URL but ignores the content, and provide the URL to this application instead of the Form URL. You can use the Java example from before.
Update: you can also retrieve it as a valid response by calling the form through Yahoo Pipes! Check out my next article on how to do that. Great if you’re expecting a lot of form submissions but don’t want to host it ;)
To hide your code you can try tools like http://www.javascriptobfuscator.com/ , but currently there isn’t a 100% secure method of doing this. That’s why server-side submission is the best option.
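One more catch with the split on “s=”: it also fires on any parameter whose name merely ends in “s”, e.g. “pages=3”. Here is a sketch of a stricter lookup; `getQueryParam` is a made-up helper, and it relies on URL and searchParams, which older browsers may not support.

```javascript
// Sketch: read one query parameter by its exact name.
function getQueryParam(href, name) {
	// searchParams.get returns the decoded value, or null when the parameter is absent.
	return new URL(href).searchParams.get(name);
}

// Example: var keyword = getQueryParam(location.href, 's');
// The value comes back decoded, so run it through encodeURIComponent
// before appending it to the form URL.
```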

3. Reviewing collected data

Let’s get back to the spreadsheet now. Of course, we can just observe the events here just as they are, but there is just one more trick I will reveal :)
Go to a different cell, e.g. D2, and try a sorting formula:
=Sort(A1:B1000, A1:A1000, FALSE)
This will sort the whole data set based only on the date. The last argument is FALSE, which means the sort will be descending. This way you don’t need to scroll down to see the newest events.
You can also publish the top of the sorted data set as an RSS feed, by going to the Share menu, “Publish as a web page”. Set automatic publishing (unfortunately far from real-time ;/ ) and generate a link in the appropriate format to your sorted data set in cells e.g. D3:E15. You can use tools like http://twitterfeed.com/ to link it to your Twitter notification account!

4. Use cases

I started off with a simple use case of event logging on websites, but there are many other ways in which you can use Forms. You can log any type of information: from servers, mobile devices, receiving errors with stack traces, receiving other server notifications, keeping a simple backup database. First off, a lot of you might say: this is ridiculous! You don’t use forms for stuff like this! You use specialised analytics and loggers etc. That’s true, but still the beauty of Forms is its simplicity. You can very quickly deploy any kind of logging or tracking and see if it’s even worth going further. So, before you spend hours implementing a cool fast real-time notification system for your platform, see what kind of data you might get by taking Forms for a spin.
And that’s it for now. Next time I’ll show how to deploy a dedicated real-time event tracking application.


Name: Tweet Splash
Motto: Make a splash for your event, conference, presentation, gathering…

Link: http://tweetsplash.com/

Summary:
JavaScript-based Twitter client for real-time Twitter search, easily configured for any event. The purpose of this program is to quickly deliver a “splash screen” on a monitor or projector that will engage event participants in a discussion on Twitter.
Tweet Splash is also a YouTube API, Google Mobilizer mashup.

Everything is embedded in one file, easy to customise, great for playing around with Twitter Search API.

Features:
– Link preview
– Remote YouTube jukebox
– Polls from tweets

Coming soon:
– URL Handler
– Html5 Local Storage for configuration


This scenario is for people who want to put their Macs to sleep after a certain activity that they know will take ‘x’ amount of time, like watching a movie or downloading a file.

In other words: auto shutdown Mac after certain amount of time.

There are quite a few apps that might do this for you, but I’m going to show you how to do it the geeky way using the Terminal :)

Let’s say you’re going to watch a 2h movie and you want your Mac to go to sleep right after it finishes. Open your Terminal and write:

sleep $((120*60)); osascript -e 'tell application "System Events" to sleep'

The first part with ‘sleep’ calculates the number of seconds in 120 minutes using bash’s arithmetic expressions and passes it to the ‘sleep’ function. After ‘sleep’ wakes up, we run a simple script that puts your Mac to sleep. Of course you can put any amount of time you need (it’s usually better to put too much than too little ;) )

If you plan on using this quite often, it’s best to write a short script:

#!/bin/bash
d=$1
while [ -z "$d" ]; do
read -p "Duration (minutes): " d
done
sleep $(( d * 60 ))
echo ". . . z z z Z Z Z"
osascript -e 'tell application "System Events" to sleep'

Save it somewhere in your path if you can (this may mean enabling and using the “root” user in Mac OS X), remember to do a ‘chmod +x’ on the file, and you’re ready to go.
If you don’t want to use “root”, put it somewhere in your user directory e.g. make a directory “scripts” in “Library” and put your script there.
Create a “.bash_login” script file that will append your private scripts directory to the current path:

#!/bin/bash
PATH=$PATH:/Users/mateusz/Library/scripts

If you’re ready, just pass the number of minutes as an argument (I called my script “sleepafter”):

bash-3.2$ sleepafter 120

The “while” loop in my script makes sure that “sleep” has a value to work with. This way you can easily run the script from Spotlight:

bash-3.2$ sleepafter
Duration (minutes): 60

If you are invoking a task that ends the application process after finishing, then simply add the “put to sleep” script at the end. E.g. when compiling something:

make; make install; sleepafter now

E.g. when using applications like ‘wget’ for downloading files:

wget "http://server/bigfile.bin"; sleepafter now

Let me know if you have any good ideas on modifying that script!
That is all :)
