  • 10gen
  • DataStax, Inc.
  • Dell
  • Google
  • Lexis Nexis
  • Oracle
  • VMware
  • Percona

Sponsorship Opportunities

For information on exhibition and sponsorship opportunities at the convention, contact Sharon Cordesse at scordesse@oreilly.com

Media Partner Opportunities

For information on trade opportunities with O'Reilly conferences, contact mediapartners@oreilly.com

Press and Media

For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com

OSCON Bulletin

To stay abreast of convention news and announcements, please sign up for the OSCON email bulletin

Contact Us

View a complete list of OSCON contacts

The Hitchhiker’s Guide to A Kaggle Competition

Data: Roulette
Location: Oregon Ballroom 203
Average rating: 3.00 (3 ratings)

An introductory, hands-on workshop on the Heritage Health Prize competition, aimed at the amateur data scientists among us. First, we will quickly look at the classes of algorithms and what they do, using competition problems and datasets. Next, we will dig deeper into one competition, the Kaggle RTA Challenge (Ensemble/Random Forest). Finally, we will dive into the Heritage Health Prize, work through the dataset, and submit an entry!

Note: While there is not enough time during the workshop for participants to work through the different datasets, we will provide links to a hands-on tutorial that you can do afterward.


  • Algorithms for the Amateur Data Scientist
    • A look at the broader classes of algorithms leading up to Trees & Random Forests
  • The Art of Analytics Competitions – The Kaggle challenges
  • Anatomy of a competition – How the RTA was won
    • Predicting traffic at RTA using Ensemble/Random Forest Trees
  • Competition in flight – The HHP
    • Dataset Organization
    • Analytics Walkthrough
    • Submit our entry
  • Conclusion

Krishna Sankar

Volvo Cars

Krishna Sankar is a consulting data scientist working on retail analytics and social media data science, with forays into deep learning. He is also codeveloping the DeepLearnR package, which interfaces R with TensorFlow/Skflow. Previously, Krishna was chief data scientist at Blackarrow.tv, where he focused on optimizing user experience via inference, intelligence, and interfaces. Earlier stints include principal architect/data scientist at Tata America Intl., director of data science at a bioinformatics startup, and distinguished engineer at Cisco. He is a frequent speaker at conferences, including Spark Summit, Spark Camp, OSCON, PyCon, and PyData, on topics such as predicting NFL winners, Spark, data science, machine learning, and social media analysis, and he has been a guest lecturer at the Naval Postgraduate School. Krishna's occasional blog posts can be found at Doubleclix.wordpress.com. His other passion is Lego robotics: you will find him at the St. Louis First Lego League World Competition as a robot design judge.



Krishna Sankar
07/27/2011 5:01pm PDT
There was a question at today's workshop about good books on algorithms. The best lists I have seen are the answers on Quora and one on LinkedIn:
Krishna Sankar
07/25/2011 3:59pm PDT

I have uploaded a WIP snapshot to www.slideshare.net/ksankar/.... Would appreciate any comments. Beware: I have too many slides, and that is intentional.