Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Data science from idea to pilot to production: Challenges and lessons learned

Amihai Savir (EMC)
10:00am–10:30am Tuesday, 09/27/2016
Data 101
Location: 1B 01/02 Level: Non-technical
Average rating: 3.67 (3 ratings)

In the age of big data analytics, smart monitoring and prediction of abnormal behavior in a corporation’s mission-critical systems can save large amounts of time and money. Drawing on a real-world case study from EMC, Amihai Savir examines the winding path from idea to viable solution in a corporate environment and walks you through challenges encountered and lessons learned.

EMC’s centralized data science team was established to provide data science services to EMC business units (BUs) and extract actionable insights from the data. To that end, EMC assembled a group of machine-learning practitioners, statisticians, and mathematicians to develop complex, advanced big data solutions.

The team went on a road show to identify the most promising data science use cases. One such opportunity was an engagement with EMC’s internal IT, whose systems generate millions of entries per second from a large number of subsystems. For example, the authentication environment alone generates 10,000 events per second from more than five major subsystems. With this volume, velocity, and variety of data, meeting IT’s quality of service (QoS) and service-level agreement (SLA) demands is a very challenging task. The team was asked to develop a model capable of predicting when one of the services will fail based on their collective log and performance data.

Amihai offers an overview of the team’s remarkable journey, discussing the multiple phases and development stages as well as the many questions and doubts that arose along the way. Eventually, the project proved a great success, with an expected ROI of $25M/year, and is now running in production, monitoring the MS Exchange and Authentication (ITOA) environments. Amihai shares the team’s experience and insights, giving managers, data scientists, analytics professionals, and IT operations staff a solid foundation for building data-driven processes.

Amihai also shares some key questions that guided the team, including:

  • What is the specific business question the engagement is intended to answer?
  • What is the type of engagement? Is this a quick-win, low-complexity project or a long-term engagement? Do we expect it to get to deployment?
  • Which data should we use and where is it located?
  • How do you combine a holistic approach to the problem with drill-down or expansion capabilities as needed?
  • How do you design the model’s output such that it can be efficiently consumed by the business? How much effort should be invested in smart and dynamic visualization when a mock-up visualization might be sufficient?
  • How do you find a balance between solutions tailored specifically for one application and a more general “lift and shift” solution that can be used in other applications?
  • How do you promote the operationalization of the solution in the enterprise?
  • What is the right support model once the solution is deployed in production?

Amihai Savir

EMC

Amihai Savir is a seasoned data scientist who currently leads a team of data scientists at EMC. Amihai is also a lecturer at Ben-Gurion University, where he has taught a variety of subjects including C programming, advanced Java programming, data structures, algorithms, and complexity. Prior to joining EMC, he held several research and development positions in Israeli high-tech companies and in academia, where he focused on various aspects of data science and software engineering. Amihai holds a master’s degree in computer science from Ben-Gurion University, where he specialized in recommender systems and machine learning.