Scraping dynamic web pages with JS rendering
Created Sunday 17 March 2018
You have three options:
- Drive a real browser from a script
- Use a headless/phantom browser to fetch the rendered contents
- Use the in-browser JS console to do the scraping
Differences
- The first makes it easy to see the current page and debug
- The second makes it possible to run the scraper in the cloud or on a headless server
- The third does not need any third-party tools installed, just your browser
Option 1
With Ruby
Watir to drive a browser - download chromedriver and have it do what you want from a script or the command line
Watir::Browser.new :phantomjs (or :chrome to drive a real Chrome through chromedriver)
Nokogiri - an HTML parser and extractor.
browser.screenshot.png - grabs a screenshot of the current page as PNG data
With Python
Use Scrapy and Selenium to drive a browser
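A minimal Python sketch of the Selenium half, assuming chromedriver is installed and on the PATH; the URL and CSS selector are placeholders, and in a real Scrapy project you would typically call Selenium from a downloader middleware rather than standalone like this:

    # Drive a real Chrome instance and read the JS-rendered DOM
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.implicitly_wait(10)              # wait up to 10 s for elements to appear
        driver.get("https://example.com")       # placeholder URL
        for link in driver.find_elements(By.CSS_SELECTOR, "a"):
            print(link.get_attribute("href"), link.text)
        html = driver.page_source               # full rendered HTML, if you prefer a separate parser
    finally:
        driver.quit()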
Option 2
With JS
PhantomJS for browser emulation (jsdom is a more lightweight alternative), Cheerio for scraping (jQuery-like syntax)
With Python
Get spynner (PyQt and WebKit) for browser simulation and Beautiful Soup 4 for scraping
or the Splash renderer with a Scrapy scraper via the scrapy-splash bridge (see the sketch below)
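A minimal Python sketch, assuming a Splash instance is already running locally on port 8050; the URL and selector are placeholders. It fetches the JS-rendered HTML through Splash's render.html endpoint and parses it with Beautiful Soup 4; for a full crawl you would use scrapy-splash's SplashRequest inside a Scrapy spider instead:

    # Ask Splash to render the page, then parse the resulting HTML with BS4
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get(
        "http://localhost:8050/render.html",                  # Splash HTTP API endpoint
        params={"url": "https://example.com", "wait": 2},     # placeholder URL; let JS run for 2 s
    )
    soup = BeautifulSoup(resp.text, "html.parser")
    for link in soup.select("a"):                             # placeholder selector
        print(link.get("href"), link.get_text(strip=True))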
Option 3
In-browser JS: artoo.js - a client-side scraping library you load via a bookmarklet and drive from the browser console