Using PhantomJS / CasperJS as a sequential file downloader

I’m sharing a little script that may be useful for you!

The context: I am subscribed to a newspaper that allows its subscribers to log into the members’ area of its website, browse the newspaper’s past issues, and download them in PDF format.

I thought it would be practical in the future to have a complete archive of those past issues, but doing that manually would take forever, as there’s no way to obtain the PDFs in bulk.

I attempted to do this using PhantomJS (a headless WebKit scriptable with a JavaScript API) and CasperJS (a navigation scripting & testing utility for PhantomJS).

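One difficulty I encountered was that CasperJS’s navigation steps run asynchronously. Calls such as then() or thenOpen() only queue a step, and the queued steps execute later, once casper.run() is called and the surrounding synchronous code has finished. This makes it a bit more difficult to pass each step a value generated by a synchronous JavaScript loop (in this case, the date string). A step that simply reads the loop variable sees its final value, not the value from its own iteration.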
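Here is a minimal sketch of this pitfall and one common fix, in plain JavaScript. It is written in ES5 style, so it should run both in Node.js and in PhantomJS’s older engine. The `makeQueue` helper is hypothetical: it only mimics the way CasperJS queues steps and runs them later. It is not CasperJS’s actual implementation. The date strings are made-up sample values.

```javascript
// Hypothetical stand-in for CasperJS's step queue: then() only records a
// step; run() executes all recorded steps later, after the loop has ended.
function makeQueue() {
  var steps = [];
  return {
    then: function (fn) { steps.push(fn); },
    run: function () {
      for (var k = 0; k < steps.length; k++) {
        steps[k]();
      }
    }
  };
}

// Sample date strings (made up for this illustration).
var dates = ["20140101", "20140102", "20140103"];

// Pitfall: every queued step closes over the same `i`, which equals
// dates.length once the loop finishes, so dates[i] is undefined.
var broken = [];
var q1 = makeQueue();
for (var i = 0; i < dates.length; i++) {
  q1.then(function () { broken.push(dates[i]); });
}
q1.run();
// broken -> [undefined, undefined, undefined]

// Fix: an immediately invoked function expression (IIFE) gives each
// iteration its own `dateString` parameter, captured by that step.
var fixed = [];
var q2 = makeQueue();
for (var j = 0; j < dates.length; j++) {
  (function (dateString) {
    q2.then(function () { fixed.push(dateString); });
  })(dates[j]);
}
q2.run();
// fixed -> ["20140101", "20140102", "20140103"]
```

Within CasperJS itself, `casper.each(array, function (self, item) { ... })` achieves the same effect. It passes each item to the callback as an argument, so every step gets its own copy. In engines that support ES2015, declaring the loop variable with `let` also works, but PhantomJS’s engine may not support it.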

After some research, I found a method that works:

Obviously you will need to change some variables to make it fit your purpose, but I think it’s useful to document the general method.
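One of these variables is the date range the script works through. The sketch below assumes each issue is identified by a `YYYYMMDD` date string and shows one way to generate the strings for a range. The `pad2` and `dateRange` helpers and the `BASE_URL`/`.pdf` pattern are hypothetical; replace them with whatever your newspaper’s site actually uses. Working in UTC means daylight-saving changes cannot shift or skip a day.

```javascript
var ONE_DAY = 24 * 60 * 60 * 1000; // milliseconds in a UTC day (no DST in UTC)

// Zero-pad a number to two digits (e.g. 3 -> "03").
function pad2(n) {
  return (n < 10 ? "0" : "") + n;
}

// Build one "YYYYMMDD" string per day from startMs to endMs (inclusive).
// Both bounds are UTC timestamps, e.g. from Date.UTC (months are 0-based).
function dateRange(startMs, endMs) {
  var out = [];
  for (var t = startMs; t <= endMs; t += ONE_DAY) {
    var d = new Date(t);
    out.push("" + d.getUTCFullYear() + pad2(d.getUTCMonth() + 1) + pad2(d.getUTCDate()));
  }
  return out;
}

// Hypothetical URL pattern; adapt it to the real site.
var BASE_URL = "https://example.com/archive/";

var days = dateRange(Date.UTC(2014, 0, 30), Date.UTC(2014, 1, 2));
// days -> ["20140130", "20140131", "20140201", "20140202"]

var urls = days.map(function (s) {
  return BASE_URL + s + ".pdf";
});
```

Each resulting URL could then be fetched in its own CasperJS step, for example with `casper.download(url, targetPath)`. Each date string stays in its own scope so that the asynchronous step receives the right value.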

It’s not perfect yet: when given a wide date range, the script sometimes stalls. Improvements are welcome :)