Sharing insights and building on them in collaboration with others is integral to open data science. On Kaggle, the data science community uses Kernels as a platform to share reproducible code, data, and knowledge. Since Kaggle introduced code sharing in 2015, users have written tens of thousands of kernels, 45% of which are R, Python, and Julia notebooks. Over this time, Kernels has transformed how Kagglers tackle competitive machine learning problems, collaborate, and learn.
Megan Risdal and Wendy Chih-wen Kan discuss what Kernels has taught Kaggle about collaborative data science. Megan and Wendy begin by highlighting how code sharing in competitions has allowed users to learn from and incorporate others' ideas and approaches, ultimately raising the competitive bar while fostering an online culture that is more inclusive of data scientists at all skill levels. They then describe how Kernels, combined with public datasets published on Kaggle, creates a repository of knowledge and reproducible analyses around high-value data. They conclude by demonstrating the ingredients of a "successful" notebook on Kaggle, based on community metrics.
Megan Risdal is a marketing manager at Kaggle. She holds master’s degrees in linguistics from the University of California, Los Angeles, and North Carolina State University. Her curiosities lie at the intersection of data, science, language, and learning.
Wendy Kan is a data scientist at Kaggle, the largest global data science community, where she works with companies and organizations to transform their data into machine learning competitions. Previously, Wendy was a software engineer and researcher. She holds BS and MS degrees in electrical engineering and a PhD in biomedical engineering.
©2017, O'Reilly Media, Inc.