Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.

The official Jupyter Conference

August 22-23, 2017: Training

August 23-25, 2017: Tutorials & Conference

New York, NY

Lessons learned from tens of thousands of Kaggle notebooks

Megan Risdal (Kaggle), Wendy Chih-wen Kan (Kaggle)

5:00pm–5:40pm Thursday, August 24, 2017

Reproducible research and open science
Location: Murray Hill
Level: Beginner

Who is this presentation for?

Anyone who uses open source data science languages (e.g., R and Python) to work with data

Prerequisite knowledge

A basic understanding of machine learning (useful but not required)

What you'll learn

Understand the benefits of collaborative data science
Learn how to share and work on data projects using Kaggle Kernels

Description

Sharing and building on insights in collaboration with others is integral to open data science. On Kaggle, the data science community uses Kernels as a platform to share reproducible code, data, and knowledge. Since the introduction of code sharing on Kaggle in 2015, users have written tens of thousands of kernels, of which 45% are R, Python, and Julia notebooks. Over this time, Kernels has transformed how Kagglers tackle competitive machine learning problems, collaborate, and learn.

Megan Risdal and Wendy Chih-wen Kan discuss what Kernels has taught Kaggle about collaborative data science. Megan and Wendy begin by highlighting how code sharing in competitions has allowed users to learn from and incorporate the ideas and approaches of others, ultimately raising the competitive bar while fostering an online culture more inclusive of data scientists of all skill levels. They then describe how Kernels, combined with public datasets published on Kaggle, creates a repository of knowledge and reproducible analyses around high-value data. They conclude by demonstrating the ingredients of a “successful” notebook on Kaggle, based on community metrics.

Megan Risdal

Kaggle

Megan Risdal is a marketing manager at Kaggle. She holds master’s degrees in linguistics from the University of California, Los Angeles, and North Carolina State University. Her curiosities lie at the intersection of data, science, language, and learning.

Wendy Chih-wen Kan

Kaggle

Wendy Kan is a data scientist at Kaggle, the largest global data science community, where she works with companies and organizations to transform their data into machine learning competitions. Previously, Wendy was a software engineer and researcher. She holds BS and MS degrees in electrical engineering and a PhD in biomedical engineering.
