I’m sharing a little script that may be useful to you!
The context: I am subscribed to a newspaper, which allows its subscribers to log into the members area of their website, browse the newspaper’s past issues, and download them in PDF format.
I thought it would be handy to have a complete archive of those past issues, but downloading them manually would take forever, as there’s no way to obtain the PDFs in bulk.
I attempted to do this using PhantomJS (a headless WebKit scriptable with a JavaScript API) and CasperJS (a navigation scripting & testing utility for PhantomJS).
One difficulty I encountered is that CasperJS’s parsing operations run asynchronously. This makes it harder to pass them a value (in this case, the date string) that is generated by a synchronous JavaScript loop.
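To illustrate the problem, here is a minimal sketch in plain JavaScript. It does not use CasperJS itself: `makeQueue` is a hypothetical stand-in for a step queue whose callbacks run only after the loop has finished, and `formatDate`, `dateRange`, and the `YYYY-MM-DD` format are assumptions made for the example. Giving each iteration its own function scope is one common way around the issue, and not necessarily the exact method used in the script.

```javascript
// Hypothetical stand-in for a step queue: callbacks are registered now
// and executed later, once the synchronous loop is already over.
function makeQueue() {
  var steps = [];
  return {
    then: function (fn) { steps.push(fn); },
    run: function () { steps.forEach(function (fn) { fn(); }); }
  };
}

// Format a Date as YYYY-MM-DD (assumed format; adapt to the target site).
function formatDate(d) {
  function pad(n) { return n < 10 ? '0' + n : String(n); }
  return d.getUTCFullYear() + '-' + pad(d.getUTCMonth() + 1) + '-' + pad(d.getUTCDate());
}

// Build the list of date strings between two ISO dates, inclusive.
function dateRange(start, end) {
  var out = [];
  var oneDay = 24 * 60 * 60 * 1000;
  for (var t = Date.parse(start); t <= Date.parse(end); t += oneDay) {
    out.push(formatDate(new Date(t)));
  }
  return out;
}

// Naive version: every queued callback shares the same `date` variable,
// so by the time they run they all see the last value of the loop.
function naive(dates) {
  var queue = makeQueue(), seen = [];
  for (var i = 0; i < dates.length; i++) {
    var date = dates[i];
    queue.then(function () { seen.push(date); });
  }
  queue.run();
  return seen;
}

// Fixed version: forEach calls a function per item, so each callback
// closes over its own `date` binding.
function withClosure(dates) {
  var queue = makeQueue(), seen = [];
  dates.forEach(function (date) {
    queue.then(function () { seen.push(date); });
  });
  queue.run();
  return seen;
}

var dates = dateRange('2015-01-30', '2015-02-02');
console.log(naive(dates));       // the last date, repeated four times
console.log(withClosure(dates)); // the four dates, in order
```

In a real CasperJS script, the body of the inner callback would be the step that opens the page for that date and triggers the download.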
After some research, I found a method that works:
Obviously you will need to change some variables to fit your purpose, but I think it’s useful to document the general method.
It’s not perfect yet: when given a wide date range, the script sometimes stalls. Improvements are welcome :)