Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Create advanced analytic models with open source

Kyle Ambert (Intel)
11:20am–12:00pm Wednesday, 09/28/2016
Location: 1 C04 / 1 C05
Average rating: 3.00 (2 ratings)

What you'll learn

  • Explore the Trusted Analytics Platform, an open source-based platform that enables data scientists to ask bigger questions of their data and carry out principled data science experiments

Description

    Creating production-ready analytical pipelines can be a messy, error-prone undertaking. In the simplest case, connecting a workflow of heterogeneous components, such as databases, feature enrichment and visualization tools, programming languages, and analytical engines, requires maintaining connections between multiple tools. And each of these tools is subject to its own development cycle. In the case of projects involving big data or analytics over real-time streaming data, the difficulties only increase.

    The Trusted Analytics Platform (TAP) is an open source-based platform combining elements from popular projects, including Python, Spark Streaming, GearPump, and Docker. TAP enables data scientists to ask bigger questions of their data and carry out principled data science experiments, all while engaging in iterative, collaborative development of production solutions with application developers. Since TAP was introduced in 2015, project contributions have included popular analytics tools and libraries, as well as the ability to “bring your own.” Kyle Ambert offers an overview of these open source project contributions, which include a new Docker-based architecture and improved Spark integration, and explains what they mean for data scientists. Kyle also discusses a healthcare machine-learning-based solution focused on identifying hospital patients at risk of readmission.

    This session is sponsored by Intel.

    Kyle Ambert


    Kyle Ambert is lead data scientist at Intel’s Artificial Intelligence and Analytics Solutions group, where he uses machine learning and statistical methods to solve real-world big data problems. Currently, his research centers around novel applications of machine learning in the health and life sciences. Kyle contributes to the data science direction of the Trusted Analytics Platform, particularly as it pertains to analytical pipeline and algorithm development. He holds a BA in biological psychology from Wheaton College and a PhD in biomedical informatics from Oregon Health & Science University, where his research focused on text analytics and developing machine-learning optimization solutions for biocuration workflows in the neurosciences.