Hands-on data science with Python (Day 2)

Michael Cullan (Pragmatic Institute)

Location: 1A 15/16

Data Science, Machine Learning, & AI

Who is this presentation for?

You're a software engineer or programmer with a background in Python, and you want to develop a basic understanding of machine learning.
You're in a nontechnical role, and you want to more effectively communicate about machine learning with the engineers and data scientists in your company.

Level

Intermediate

Description

Michael Cullan walks you through developing a machine learning pipeline, from prototyping to production. You’ll learn about data cleaning, feature engineering, model building and evaluation, and deployment, then extend these models into two applications built on real-world datasets. All work is done in Python.

Outline

Day 1: Anomaly detection

Data format and goal
Limitations of time series data
Detrending and seasonality
Windowing and local scores
Setting thresholds for classification
Online learning
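
As a rough illustration of the windowing, local score, and threshold items above, here is a minimal sketch using a rolling z-score in pandas. It is one common approach and not necessarily the method taught in the session; the function name `rolling_zscore_flags`, the window of 24 points, the threshold of 3.0, and the synthetic data are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def rolling_zscore_flags(series, window=24, threshold=3.0):
    """Flag points that sit far from the mean of the preceding window.

    Hypothetical helper for illustration only.
    """
    # Use only the previous `window` points (shift(1)) so that a point
    # never contributes to its own baseline.
    baseline = series.shift(1).rolling(window)
    score = (series - baseline.mean()) / baseline.std()
    # Points without a full window have a NaN score and are not flagged.
    return score.abs() > threshold


# Synthetic data (an assumption): Gaussian noise with one injected spike.
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(0.0, 1.0, 200))
values.iloc[150] = 15.0

flags = rolling_zscore_flags(values)
print(values[flags])
```

Choosing the threshold trades false alarms against missed anomalies, which is why it is usually tuned on held-out data and not fixed in advance.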

Day 2: Recommendation engine

Overview of data and its wrangling
Item-item correlations and finding similar items
User similarity and predicting user ratings
Collaborative filtering
Evaluating model performance
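
To make the item-item correlation idea concrete, here is a minimal sketch that builds a user-by-item rating matrix with pandas and ranks items by Pearson correlation. The toy ratings, the column names, and the helper `most_similar` are assumptions made for this example, not material from the session.

```python
import pandas as pd

# Toy ratings in long format (made-up data for illustration).
ratings = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"],
    "item": ["X", "Y", "Z"] * 4,
    "rating": [5, 4, 1, 4, 5, 2, 1, 2, 5, 2, 1, 4],
})

# Rows are users, columns are items, cells are ratings.
matrix = ratings.pivot_table(index="user", columns="item", values="rating")

# Pearson correlation between item columns; min_periods requires at
# least three users in common before a correlation is reported.
item_corr = matrix.corr(min_periods=3)


def most_similar(item, n=1):
    """Return the n items most correlated with `item` (hypothetical helper)."""
    return item_corr[item].drop(item).sort_values(ascending=False).head(n)


print(most_similar("X"))
```

On real data the rating matrix is sparse, so the `min_periods` cutoff matters: correlations computed from only a handful of shared raters tend to be unreliable.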

Prerequisite knowledge

A working knowledge of Python
Familiarity with pandas (useful but not required)

What you'll learn

Understand the basics of machine learning, feature engineering, anomaly detection, and recommendation engines
Explore scikit-learn fundamentals
Create machine learning processes with scikit-learn
Evaluate and apply machine learning to real-world problems

Michael Cullan

Pragmatic Institute

Michael Cullan is a data scientist in residence at Pragmatic Institute, where he teaches hands-on data science courses and business-oriented courses on managing data science initiatives at the organizational level. He also leads internal data science projects in support of marketing and operations teams. He earned a master’s degree in statistics and a bachelor’s degree in mathematics. His academic research has ranged from computational paleobiology, where he developed software for measuring evidence for disparate evolutionary models based on fossil data, to music and AI, where he assisted in modeling musical data for a jazz improvisation robot. In his free time, he applies his math and programming skills to code-based visual art and design projects.