Extracting data from the web is often error-prone, hard to test, and slow. Scrapy changes all of that.
In this talk, we consider two different types of web data retrieval – one that scrapes data out of HTML, and another that uses a RESTful API – and show how both can be improved by Scrapy.
Part I: Scraping without Scrapy
Part II: Importing Scrapy components for programmer sanity
Part III: Automated testing when using Scrapy
Part IV: Improving a Wikipedia API client with Scrapy
Conclusion: Asheesh’s rules for sane scraping
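To give a flavor of the "scraping without Scrapy" approach the talk opens with, here is a minimal sketch of hand-rolled link extraction using only Python's standard library. This example is not from the talk itself; the `LinkExtractor` class and the sample HTML are hypothetical, chosen to show how quickly ad-hoc parsing code accumulates.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags -- the fragile, hand-rolled
    style of scraping that frameworks like Scrapy aim to replace."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<html><body><a href="/talks">Talks</a>'
            '<a href="/bios">Bios</a></body></html>')
print(parser.links)  # ['/talks', '/bios']
```

Code like this works for one page, but testing it, handling errors, and scaling it up is exactly the pain the rest of the talk addresses.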
Even if you end up using another tool, Scrapy’s documentation is a valuable guide to scraping in general.
Asheesh loves growing camaraderie among geeks. He chaired the Johns Hopkins Association for Computing Machinery and taught Python classes at Noisebridge, San Francisco’s hackerspace. He realizes that most of the work that makes projects successful is hidden underneath the surface.
He has volunteered his technical skills for the UN in Uganda, the EFF, and Students for Free Culture, and is a Debian Developer. Until recently, he engineered software and scalability at Creative Commons in San Francisco; today, he works at OpenHatch as its project lead.