Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Practical machine learning

Michael Li (The Data Incubator), Robert Schroll (The Data Incubator)
9:00am–5:00pm Tuesday, 09/27/2016
Data science & advanced analytics
Location: 1 E 10/1 E 11
Average rating: ***..
(3.00, 6 ratings)

What you'll learn

  • Gain a foundation in building intelligent business applications using machine learning

Description

    Tianhui Li and Robert Schroll of the Data Incubator offer a foundation in building intelligent business applications using machine learning, walking you through all the steps to prototyping and production—data cleaning, feature engineering, model building and evaluation, and deployment—and diving into an application for anomaly detection and a personalized recommendation engine. All concepts will be presented with example code in Python.

    Topics include:
    Personalization Recommendation Engine:

    • Overview of data and its wrangling
    • Item-item correlations and finding similar items
    • User similarity and predicting user ratings
    • Collaborative filtering
    • Evaluating model performance
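
    As a rough illustration of the item-similarity and rating-prediction topics above, the following is a minimal sketch, not the workshop's own code. The ratings data, the item and user names, and the helper functions are all hypothetical, and cosine similarity stands in for whichever correlation measure the presenters actually use. It relies only on the Python standard library.

```python
from math import sqrt

# Hypothetical toy data: ratings[user][item] on a 1-5 scale; a missing key means unrated.
ratings = {
    "ann":  {"A": 5, "B": 4, "D": 1},
    "bob":  {"A": 4, "B": 5, "C": 1},
    "cara": {"A": 1, "C": 5, "D": 4},
    "dan":  {"B": 1, "C": 4, "D": 5},
}

def item_vector(item):
    """Ratings given to one item, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors; missing entries count as zero."""
    dot = sum(a[u] * b[u] for u in a if u in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_items(item):
    """Other items, ranked from most to least similar to `item`."""
    items = {i for r in ratings.values() for i in r}
    target = item_vector(item)
    scores = {i: cosine_similarity(target, item_vector(i)) for i in items if i != item}
    return sorted(scores, key=scores.get, reverse=True)

def predict_rating(user, item):
    """Similarity-weighted average of the ratings the user has already given."""
    target = item_vector(item)
    weighted = total = 0.0
    for other, rating in ratings[user].items():
        sim = cosine_similarity(target, item_vector(other))
        weighted += sim * rating
        total += sim
    return weighted / total if total else None

print(similar_items("A")[0])         # item rated most like "A"
print(predict_rating("ann", "C"))    # estimate for an item "ann" has not rated
```

    Treating unrated items as zeros keeps the sketch short, but it also biases similarities toward items with many shared raters. A real system would likely center ratings per user or per item first.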

    Anomaly Detection:

    • Data format and goal
    • Limitations of time series data
    • Detrending and seasonality
    • Windowing and local scores
    • Setting thresholds for classification
    • Online learning
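
    The detrending, windowing, and thresholding topics above might be combined along the following lines. This is a sketch under stated assumptions rather than the presenters' method: the synthetic series, the function names, the 24-step period, the 48-point window, and the threshold of 4 are all hypothetical choices. Seasonal differencing is used here as one simple way to remove a trend and a repeating cycle.

```python
import math
import random
from statistics import mean, pstdev

def seasonal_difference(values, period):
    """Subtract the value one period earlier.

    This cancels a cycle that repeats every `period` steps and turns a
    linear trend into a constant offset.
    """
    return [values[i] - values[i - period] for i in range(period, len(values))]

def local_scores(values, window):
    """Z-score of each point relative to the `window` points before it."""
    scores = [0.0] * len(values)
    for i in range(window, len(values)):
        history = values[i - window:i]
        spread = pstdev(history)
        if spread > 0:
            scores[i] = (values[i] - mean(history)) / spread
    return scores

def flag_anomalies(values, period, window, threshold):
    """Indices into `values` whose local score exceeds `threshold` in magnitude."""
    residual = seasonal_difference(values, period)
    scores = local_scores(residual, window)
    # Shift by `period` to map residual positions back to the original series.
    return [i + period for i, s in enumerate(scores) if abs(s) > threshold]

# Hypothetical synthetic series: upward trend, 24-step cycle, noise, one injected spike.
random.seed(0)
series = [0.05 * t + math.sin(2 * math.pi * t / 24) + random.gauss(0, 0.1)
          for t in range(200)]
series[150] += 5.0

print(flag_anomalies(series, period=24, window=48, threshold=4.0))
```

    Each score depends only on earlier points, so the same logic could be applied as new observations arrive. One caveat of seasonal differencing is that a spike can reappear with the opposite sign one period later, so a flagged point may be an echo of an earlier anomaly.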

    Michael Li

    The Data Incubator

    Tianhui Michael Li is the founder and president of the Data Incubator, a data science training and placement firm. Michael bootstrapped the company and navigated it to a successful sale to the Pragmatic Institute. Previously, he headed monetization data science at Foursquare and has worked at Google, Andreessen Horowitz, JPMorgan, and D.E. Shaw. He’s a regular contributor to the Wall Street Journal, TechCrunch, Wired, Fast Company, Harvard Business Review, MIT Sloan Management Review, Entrepreneur, VentureBeat, TechTarget, and O’Reilly. Michael was a postdoc at Cornell, a PhD student at Princeton, and a Marshall Scholar in Cambridge.

    Robert Schroll

    The Data Incubator

    Robert Schroll is a data scientist in residence at the Data Incubator. Previously, he held postdocs in Amherst, Massachusetts, and Santiago, Chile, where he realized that his favorite parts of his job were teaching and analyzing data. He made the switch to data science and has been at the Data Incubator since. Robert holds a PhD in physics from the University of Chicago.

    Comments on this page are now closed.


    Robert Schroll
    09/26/2016 7:05am EDT

    Hi all, just a reminder that you should get the course material from our GitHub repo: The curriculum will be up shortly, but you should clone the repo, set up your conda environment or Docker container, and download the data now. Then, all you’ll need is a git pull to update the curriculum as soon as I get it merged in.

    09/24/2016 5:31pm EDT

    What’s the link to the repo for this tutorial? Can I do this tutorial with a standard Anaconda install on a Windows machine, or do I need to set up the environment as described below?

    Robert Schroll
    09/20/2016 9:56am EDT

    Hi Everyone — I’ve just updated the course repo with a new environment.yml and a new Dockerfile. Hopefully, these will work better.

    The new environment.yml file contains only the top-level packages we’ll need, thereby allowing your conda version to work out the necessary dependencies. We may all end up with slightly different versions, but everything should work regardless.

    The new Dockerfile will launch an IPython Notebook server by default. If you follow the updated instructions in the README, it will mount the repository as a volume on the container, letting you work on the files either locally or through the Python in the container equally easily.

    I hope at least one of these will work for everyone, but please let us know if you run into problems!

    09/20/2016 6:37am EDT

    While setting up the environment, I got the following errors:
    “environment.yml is not a valid yaml file.
    Environment with requierements.txt file needs a name”

    I removed the prefix and updated my conda env, as suggested on another site, but I’m still encountering the errors.
    Any ideas?

    09/18/2016 4:22pm EDT

    I’d like to set up the Linux VM with the required packages. Where can I download the Dockerfile?

    Robert Schroll
    09/17/2016 3:56pm EDT

    None of those packages are critical themselves. If you can get the rest of the packages to install without those dependencies, you should be good to go. Alternatively, we’ve added a Dockerfile to the repo, which will allow you to spin up a Linux VM with the required packages.

    09/16/2016 11:18am EDT

    I’m setting up my environment for the session and got the following error message:

    $ conda env create -f environment.yml
    Using Anaconda Cloud api site
    Fetching package metadata …….
    Solving package specifications: .
    Error: Packages missing in current osx-64 channels:
    - cairo 1.12.18 6
    - fontconfig 2.11.1 6
    - glib 2.43.0 1
    - harfbuzz 0.9.39 1
    - libffi 3.2.1 0
    - libgfortran 3.0.0 1
    - libsodium 1.0.10 0
    - mistune 0.7.2 py27_0
    - pango 1.39.0 1
    - pixman 0.32.6 0
    - pycairo 1.10.0 py27_0
    - zeromq 4.1.4 0

    Do these packages exist or do I need to do something different?


    Robert Schroll
    09/08/2016 10:19am EDT

    We are planning on splitting things about 50-50 between presentation and exercises. Actually working on code yourself is very important!

    09/07/2016 5:00pm EDT

    Will this tutorial include programming practice time?

    Robert Schroll
    08/29/2016 9:58am EDT

    Yes, we will be presenting code for both examples. All of the code will be Python, but we hope to explain the underlying concepts well enough that you can implement them in your favorite language.

    Arash Sadati
    08/24/2016 5:07am EDT

    Will this be a tutorial on how to program these two cases (recommendation and anomaly) in Python or Java? If it’s actual coding, what language will be used?