Scraping the World with JavaScript

Crawlers and scrapers have been around for ages.

However, being able to write them easily in JavaScript has been a significant advance in recent years, because there is now little that a browser can do that a scraper can’t.

I will explain and compare the main ways to scrape in JS: jsdom, cheerio, libxml, and PhantomJS.

I will also go through funny examples from my experience where little tricks and creative solutions were needed to beat scraping countermeasures ;)

Crawling has been a passion of mine for a long time, and I’m also the author of the npm crawler module (https://github.com/sylvinus/node-crawler).

Sylvain Zimmer

dotConferences

Sylvain is a hacker at heart turned entrepreneur.

In 2004 he founded Jamendo, today the largest Creative Commons music platform. He later co-founded Joshfire, TEDxParis, and the dotJS conference. In 2011 his team won the Node Knockout in the “Completeness” category with Chess@home, a distributed chess AI written entirely in JavaScript.

He also solved the hidden equation in the “How to remain calm” Chromebook ad and recently became one of the first Google Developer Experts for HTML5.
