Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Augmented data engineering: Leveraging machine learning in data profiling and discovery (sponsored by Io-Tahoe)

Arun Murugan (GE Digital), Jeff Miller (GE)
11:20am–12:00pm Thursday, 09/13/2018
Location: 1E 06

What you'll learn

  • Discover how GE uses machine learning (powered by Io-Tahoe) in data discovery and profiling for data engineering


Today’s technology allows us to gather, transform, and consume data at a scale and complexity that was previously unimaginable. As companies leverage these new capabilities, it remains critical that they can discover and surface the relationships between events and transactions captured from different systems and processes. Use cases such as financial reporting and analysis require a well-integrated, harmonized, and accurate data model.

GE is leveraging machine learning in data discovery and profiling as its data engineering teams surface the threads among its data assets. This data threading is essential to delivering next-generation analytics and powers a growing catalogue of data science models and microservices. Join Arun Murugan and Jeff Miller to learn how GE’s data engineers use machine learning to build an integrated set of enterprise canonical models that deliver value across the business.

Topics include:

  • The business criticality of the financial close process and its dependency on analytical solutions
  • Data modeling needs for enterprise data lakes
  • Why data threading is important
  • Machine learning use cases for data associations and correlations

This session is sponsored by Io-Tahoe.

Arun Murugan

GE Digital

Arun Murugan is senior director of data engineering at GE, where he leads the data architecture team at GE Digital responsible for data modeling, data engineering, and the core ETL framework for GE’s finance data lake. He has extensive experience translating complex business requirements into technology solutions, specializing in data-centric processes that involve complex data integration and analytical solutions. Arun has expertise in designing and architecting enterprise data lakes that leverage big data ecosystems, state-of-the-art tools and technologies, and multiple distributed data processing platforms.

Jeff Miller

GE

Jeff Miller is vice president of data and analytics, where he oversees the product management and development of a diversified set of analytics products and data science services for GE’s industrial businesses and finance professionals. He also leads a multidisciplinary team of technical and functional specialists in developing a suite of persona-aware solutions. Drawing on his unique blend of financial systems, data engineering, business intelligence, and product management experience, Jeff provides architectural leadership and oversight across the data lifecycle, from ingestion and modeling to visualization and delivery.