Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Data science that works: Best practices for designing data-driven improvements, making them real, and driving change in your enterprise

1:30pm–5:00pm Tuesday, 09/27/2016
Data-driven business
Location: 1 E 06
Level: Intermediate
Tags: r-lang
Average rating: 4.29 (7 ratings)

Prerequisite knowledge

  • Familiarity with R programming and basic machine-learning algorithms

Materials or downloads needed in advance

  • A laptop with a web browser and an Internet connection

What you'll learn

  • Understand simple strategies for selecting tools and technologies that can deliver the business insights you need
  • Learn techniques for building and using data pipelines to make sure you always have enough data to do something useful
  • Explore ways to build maps from your most important business questions and methods for using those maps to build and execute data strategies
  • Learn techniques for quickly building data products that you understand and that are fit for purpose
  • Understand best practices for running experiments in very short sprints, discovering insights, and making improvements to the enterprise in small, meaningful chunks

Description

    There are people who can imagine ways of using data to improve an enterprise. They can explain the vision, make it real, and effect change in their organization. The combination of skills this requires may seem mystical, magical, and imaginary, but no, these folks are not unicorns. They’re professional data scientists and engineers with the ability to make a real impact on the enterprise.

    Join expert Jerry Overton as he walks you through how to get started “doing” data science in the enterprise, how to build and execute a data strategy, how to write algorithms, and how to experiment on an enterprise scale.

    This course is for you if you’re a data scientist, engineer, or architect who aspires to become better at visualizing data-driven improvements, making them real, and driving change in an organization.


    Part 1: How to get started on the journey to pro

    Going pro in data science

    • Introduction
    • The pro’s journey
    • What it means to go pro

    Forget about the stack

    • An algorithm is to a data scientist as a microscope is to a biologist
    • The stack
    • Stack thinking
    • The utility
    • Utility thinking

    Think utility instead

    • The big picture for utility thinking
    • The scientific method is key
    • The reality of the scientific method
    • And that’s why you start with a pipeline

    Anatomy of the pipeline

    • Ingest (annotated code)
    • Clean and transform (annotated code)
    • Monitor (annotated code)
    • Automate (annotated code)
    • Example

    Part 2: How to build (and execute) a data strategy

    What’s a data strategy

    • Strategy is more than rules
    • The principles of strategic game play
    • You need a map

    Building a data strategy

    • Business questions
    • Value chain
    • Map
    • Identify systems (what)
    • Pick out points of attack (where)
    • Decide on action (how)
    • Example

    Part 3: How to write algorithms like a pro

    Why an algorithm is to a data scientist as a microscope is to a biologist

    Tools of the pro

    • Foundational concepts in machine learning and data science

    Things the pros know

    • Quote from pro blog
    • Without a hypothesis, you’re doomed
    • Stack Overflow is your friend (in small steps you understand)
    • Get the risky stuff out of the way first
    • Reality is the only opinion that matters

    Part 4: How to experiment on an enterprise scale

    The dimensions of industrialized machine learning

    • Access and collect the data
    • Ingest and clean
    • Agile experimentation
    • Generate insights
    • Transform the enterprise
    • Predicting hospital lengths of stay (a case study)

    Part 5: The top five habits of a professional data scientist or engineer

    • #5: Put aside the technology stack
    • #4: Keep data lying around
    • #3: Have a strategy
    • #2: Hack
    • #1: Experiment

    Jerry Overton


    Jerry Overton is a data scientist and distinguished technologist in DXC’s Analytics Group, where he is the principal data scientist for industrial machine learning, a strategic alliance between DXC and Microsoft comprising enterprise-scale applications across six different industries: banking and capital markets, energy and technology, insurance, manufacturing, healthcare, and retail. Jerry is the author of Going Pro in Data Science: What It Takes to Succeed as a Professional Data Scientist (O’Reilly) and teaches the O’Reilly training course Mastering Data Science at Enterprise Scale. In his blog, Doing Data Science, Jerry shares his experience leading open research and transforming organizations using data science.