Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Machine Learning in Python with scikit-learn

Andreas Mueller (NYU, scikit-learn)
9:00am–12:30pm Tuesday, 12/01/2015
Data Science and Advanced Analytics
Location: 331
Level: Intermediate
Average rating: 3.83 (6 ratings)

Prerequisite Knowledge

Basic Python knowledge. Familiarity with NumPy and some machine learning background are an advantage.


Scikit-learn has emerged as one of the most popular open source machine learning toolkits, now widely used in academia and industry. Scikit-learn provides easy-to-use interfaces to perform advanced analysis and build powerful predictive models. The tutorial will cover basic concepts of machine learning, such as supervised and unsupervised learning, cross-validation, and model selection. We will see how to prepare data for machine learning, and go from applying a single algorithm to building a machine learning pipeline. We will also cover how to build machine learning models on text data, and how to handle very large data sets.
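
As a rough illustration of the topics listed above, the following sketch chains text feature extraction and a classifier into a single pipeline, scores it with cross-validation, and selects a hyperparameter with a grid search. It is not taken from the tutorial materials. The toy corpus, the labels, the choice of logistic regression, and the parameter grid are all assumptions made for this example. The imports assume a current scikit-learn release, where these utilities live in sklearn.model_selection; releases from 2015 exposed them in sklearn.cross_validation and sklearn.grid_search instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

# Toy corpus invented for illustration: 1 = sports, 0 = cooking.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players after the game",
    "fans cheered as the team lifted the cup",
    "the goalkeeper saved a penalty in the match",
    "the players trained before the final game",
    "simmer the sauce and add fresh basil",
    "bake the bread until the crust is golden",
    "chop the onions and fry them in butter",
    "season the soup with salt and pepper",
    "whisk the eggs and fold in the flour",
    "roast the vegetables with olive oil",
]
labels = [1] * 6 + [0] * 6

# A pipeline bundles preprocessing and the model, so the vectorizer is
# refit on the training portion of every cross-validation split.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])

# Cross-validation: estimate accuracy on three held-out folds.
scores = cross_val_score(pipe, docs, labels, cv=3)

# Model selection: try several regularization strengths for the classifier.
# The "clf__C" key addresses parameter C of the pipeline step named "clf".
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(docs, labels)

print("accuracy per fold:", scores)
print("best C:", search.best_params_["clf__C"])
print("prediction:", search.predict(["the team scored a goal"]))
```

Because the vectorizer sits inside the pipeline, no vocabulary or document-frequency information from a held-out fold leaks into training. With a corpus this small the scores are not meaningful; the point is only the shape of the workflow.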

Andreas Mueller

NYU, scikit-learn

Andreas Mueller received his PhD in machine learning from the University of Bonn. After working as a machine learning researcher on computer vision applications at Amazon for a year, he recently joined the Center for Data Science at New York University. For the last four years, he has been a maintainer and one of the core contributors of scikit-learn, a machine learning toolkit widely used in industry and academia, and an author of and contributor to several other widely used machine learning packages. His mission is to create open tools to lower the barrier to entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms.