Sep 23–26, 2019

Hands-on data science with Python

Michael Cullan (The Data Incubator)
9:00am–5:00pm Monday, September 23–Tuesday, September 24
Location: 1A 15/16
Secondary topics: Deep dive into specific tools, platforms, or frameworks

Participants should plan to attend both days of this training course. Note: To attend training courses, you must be registered for a Platinum or Training pass; this course does not include access to the tutorials on Tuesday.

Michael Cullan walks you through developing a machine learning pipeline, from prototyping to production. You'll learn about data cleaning, feature engineering, model building and evaluation, and deployment, and then extend these models into two applications built on real-world datasets. All work will be done in Python.

What you'll learn, and how you can apply it

  • Understand the basics of machine learning, feature engineering, anomaly detection, and recommendation engines
  • Explore scikit-learn fundamentals
  • Create machine learning processes with scikit-learn
  • Evaluate and apply machine learning to real-world problems

Who is this presentation for?

  • You're a software engineer or programmer with a background in Python, and you want to develop a basic understanding of machine learning.
  • You're in a nontechnical role, and you want to more effectively communicate about machine learning with the engineers and data scientists in your company.

Level

Intermediate

Prerequisites:

  • A working knowledge of Python
  • Familiarity with pandas (useful but not required)

Outline

Day 1: Anomaly detection

  • Data format and goal
  • Limitations of time series data
  • Detrending and seasonality
  • Windowing and local scores
  • Setting thresholds for classification
  • Online learning
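
As a rough illustration of the windowing, local-score, and threshold topics above, here is a minimal sketch that scores each point against the window of points preceding it and flags large deviations. It is not taken from the course materials: the function names, the sample series, the window length, and the threshold of 3.0 are all assumptions chosen for the example, and it does no explicit detrending or seasonal adjustment.

```python
from statistics import mean, stdev


def rolling_zscores(values, window):
    """Score each point against the `window` points that precede it.

    Hypothetical helper for illustration; points without a full window
    of history get a score of 0.0.
    """
    scores = []
    for i, x in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            scores.append(0.0)  # not enough history yet
            continue
        mu = mean(history)
        sigma = stdev(history)
        # Guard against a flat window, where the z-score is undefined.
        scores.append((x - mu) / sigma if sigma > 0 else 0.0)
    return scores


def flag_anomalies(values, window=5, threshold=3.0):
    """Return the indices whose local score exceeds the threshold."""
    scores = rolling_zscores(values, window)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


# Made-up series with one obvious spike at index 7.
series = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 25.0, 10.0, 10.2]
anomalies = flag_anomalies(series, window=5, threshold=3.0)
print(anomalies)
```

Because each score uses only past values, the same loop could in principle run on streaming data. Note that the spike inflates the window statistics for the points that follow it, which is one reason threshold and window choices need care.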

Day 2: Recommendation engine

  • Overview of data and its wrangling
  • Item-item correlations and finding similar items
  • User similarity and predicting user ratings
  • Collaborative filtering
  • Evaluating model performance
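
The item-similarity and rating-prediction topics above can be sketched roughly as follows. This is an illustration under stated assumptions, not the course's method: the ratings are made up, the helper names are hypothetical, and plain cosine similarity over co-rating users stands in for whatever correlation measure the course uses.

```python
from math import sqrt

# Made-up ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2},
    "cat": {"A": 1, "B": 2, "C": 5},
    "dan": {"A": 5, "C": 1},
}


def item_similarity(ratings, item_x, item_y):
    """Cosine similarity between two items over users who rated both."""
    pairs = [
        (r[item_x], r[item_y])
        for r in ratings.values()
        if item_x in r and item_y in r
    ]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_x = sqrt(sum(x * x for x, _ in pairs))
    norm_y = sqrt(sum(y * y for _, y in pairs))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of the user's ratings of other items."""
    numerator = 0.0
    denominator = 0.0
    for other_item, rating in ratings[user].items():
        if other_item == item:
            continue
        sim = item_similarity(ratings, item, other_item)
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None


sim_ab = item_similarity(ratings, "A", "B")
sim_bc = item_similarity(ratings, "B", "C")
predicted = predict_rating(ratings, "dan", "B")
print(sim_ab, sim_bc, predicted)
```

In this toy data, items A and B are rated alike by the same users, so they come out more similar than B and C, and the prediction for dan's unseen item B leans toward his high rating of A. Raw cosine on unnormalized ratings is a simplification; mean-centering ratings per user or per item is a common refinement.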

About your instructor

Michael Cullan holds a master's degree in statistics and has 4 years of research experience spanning topics in nonparametric statistics, applied mathematics, and artificial intelligence. He has 3 years of teaching experience in academic and professional settings. He combines his passions for teaching and statistical programming as a data scientist in residence at The Data Incubator.

Conference registration

Get the Platinum pass or the Training pass to add this course to your package. Best Price ends June 28.
