Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Singapore

Monday, 12/04/2017

8:30am

8:30am–9:00am Monday, 12/04/2017
Location: Foyer 5
Coffee break (30m)

9:00am

9:00am–5:00pm Monday, 12/04/2017
Big data and the cloud
Location: 335
Jesse Anderson (Big Data Institute)
To handle real-time big data, you need to solve two difficult problems: how do you ingest that much data, and how will you process that much data? Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company.
9:00am–5:00pm Monday, 12/04/2017
Robert Schroll (The Data Incubator)
Robert Schroll demonstrates TensorFlow's capabilities through its Python interface and explores TFLearn, a high-level deep learning library built on TensorFlow. Join in to learn how to use TFLearn and TensorFlow to build machine learning models on real-world data.

10:30am

10:30am–11:00am Monday, 12/04/2017
Location: Foyer 5
Morning break (30m)

12:30pm

12:30pm–1:30pm Monday, 12/04/2017
Location: Summit 1 & 2
Lunch (1h)

3:00pm

3:00pm–3:30pm Monday, 12/04/2017
Location: Foyer 5
Afternoon break (30m)

Tuesday, 12/05/2017

8:30am

8:30am–9:00am Tuesday, 12/05/2017
Location: Foyer 3 & 5
Coffee break (30m)

9:00am

9:00am–12:30pm Tuesday, 12/05/2017
Big data and the cloud
Location: 308/309 Level: Intermediate
Vinithra Varadharajan (Cloudera), Philip Langdale (Cloudera), Jason Wang (Cloudera), Fahd Siddiqui (Cloudera)
Vinithra Varadharajan, Philip Langdale, Jason Wang, and Fahd Siddiqui lead a deep dive into running data engineering workloads in a managed service capacity in the public cloud, highlighting cloud infrastructure best practices and illustrating how data engineering workloads interoperate with data analytic engines.
9:00am–12:30pm Tuesday, 12/05/2017
Becoming a data-centric company, Strata Business Summit
Location: 310/311 Level: Non-technical
John Akred (Silicon Valley Data Science)
Big data, AI, and data science have great potential for accelerating business, but how do you reconcile business opportunity with the sea of possible technologies? Data should serve the strategic imperatives of a business—those aspirations that will define an organization’s future vision. John Akred explains how to create a modern data strategy that powers data-driven business.
9:00am–12:30pm Tuesday, 12/05/2017
Data science and advanced analytics, Machine Learning
Location: 321/322 Level: Intermediate
Jared Lander (Lander Analytics)
Modern statistics has become almost synonymous with machine learning—a collection of techniques that utilize today's incredible computing power. Jared Lander walks you through the available methods for implementing machine learning algorithms in R and explores underlying theories such as the elastic net, boosted trees, and cross-validation.
9:00am–12:30pm Tuesday, 12/05/2017
Data science and advanced analytics, Machine Learning
Location: 328/329 Level: Intermediate
Yufeng Guo (Google)
Yufeng Guo walks you through training and deploying a machine learning system using TensorFlow, a popular open source library. Yufeng takes you from a conceptual overview all the way to building complex classifiers and explains how you can apply deep learning to complex problems in science and industry.
9:00am–12:30pm Tuesday, 12/05/2017
Location: 323
Alistair Croll (Solve For Interesting), kyungtaak Noh (SK Telecom), Jisung Kim (SK Telecom), Mike Prorock (mesur.io), Hugo Sheng (Qlik), Alexandre Chade (Dotz), Jonathan Seidman (Cloudera), Ted Malaska (Blizzard Entertainment), Mike Koelemay (Sikorsky Aircraft, Lockheed Martin)
In a series of half-hour talks aimed at a business audience, you’ll hear from household brands and global companies as they explain the challenges they wanted to tackle, the approaches they took, and the benefits—and drawbacks—of their solutions. If you want practical insights about applied data, look no further.

10:30am

10:30am–11:00am Tuesday, 12/05/2017
Location: Foyer 3 & 5
Morning break (30m)

12:30pm

12:30pm–1:30pm Tuesday, 12/05/2017
Location: Summit 1 & 2
Lunch (1h)

1:30pm

1:30pm–5:00pm Tuesday, 12/05/2017
Data engineering and architecture
Location: 308/309 Level: Intermediate
Jonathan Seidman (Cloudera), Ted Malaska (Blizzard Entertainment)
Using Customer 360 and the IoT as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.
1:30pm–5:00pm Tuesday, 12/05/2017
Design, UX, visualization, and VR, Machine Learning
Location: 310/311 Level: Beginner
Bargava Subramanian (Independent), Amit Kapoor (narrativeVIZ Consulting)
One of the challenges of traditional data visualizations is that they are static and bounded by limited physical/pixel space. Interactive visualizations allow us to move beyond this limitation by adding layers of interaction. Bargava Subramanian and Amit Kapoor teach the art and science of creating interactive data visualizations.
1:30pm–5:00pm Tuesday, 12/05/2017
Machine Learning, Spark and beyond
Location: 321/322 Level: Intermediate
Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
Vartika Singh and Jeffrey Shmain walk you through various approaches using the machine learning algorithms available in Spark ML to understand and decipher meaningful patterns in real-world data. Vartika and Jeff also demonstrate how to leverage open source deep learning frameworks to run classification problems on image and text datasets using Spark.
1:30pm–5:00pm Tuesday, 12/05/2017
Data science and advanced analytics, Machine Learning
Location: 328/329 Level: Intermediate
Tim Seears (Think Big, a Teradata company), David Mueller (Teradata)
Tim Seears and David Mueller explain how to apply deep learning to improve consumer recommendations by training neural nets to learn categories of interest using embeddings. They then demonstrate how to extend this with WALS matrix factorization to achieve wide and deep learning—a process which is now used in production for the Google Play Store.
1:30pm–5:00pm Tuesday, 12/05/2017
Location: 323
Alistair Croll (Solve For Interesting), Clifton Phua (NCS Group), Mark Donsky (Cloudera), Syed Rafice (Cloudera), Victor Chua (StarHub Ltd), Arun Kejariwal (MZ), Francois Orsini (MZ), Isaac Reyes (DataSeer), Zhihao Lin (Teralytics)
The modern city is awash in data. Cheap sensors on cars, roads, and people give us a real-time understanding of traffic. We can track pollution, temperature, and climate with unerring precision. Satellite photographs reveal shade cover, property values, and building development.

3:00pm

3:00pm–3:30pm Tuesday, 12/05/2017
Location: Foyer 3 & 5
Afternoon break (30m)

Wednesday, 12/06/2017

8:00am

8:00am–8:15am Wednesday, 12/06/2017
Location: Hall 404 Foyer
Coffee break sponsored by TigerGraph (15m)

8:15am

8:15am–8:45am Wednesday, 12/06/2017
Location: Hall 404 Foyer
Ready, set, network! Meet fellow attendees who are looking to connect at Strata. We'll gather before Wednesday keynotes to host an informal speed networking event. Be sure to bring your business cards and have fun.

8:50am

8:50am–9:00am Wednesday, 12/06/2017
Location: Hall 404AXF
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes.

9:00am

9:00am–9:15am Wednesday, 12/06/2017
Location: Hall 404AXF
Melanie Johnston-Hollitt (Victoria University of Wellington)
Keynote with Melanie Johnston-Hollitt

9:15am

9:15am–9:30am Wednesday, 12/06/2017
Location: Hall 404AXF
Mick Hollison (Cloudera), Cesar Delgado (Apple)
Twenty years ago, a company implored us to “think different” about personal computers. Today, Apple continues to live and breathe that legacy. It’s evident in the machine learning and analytics architectures that power many of the company’s most innovative applications. Cesar Delgado joins Mick Hollison to discuss how Apple is using its big data stack and expertise to solve non-data problems.

9:30am

9:30am–9:45am Wednesday, 12/06/2017
Location: Hall 404AXF
Steve Leonard (SGInnovate)
Keynote with Steve Leonard

9:45am

9:45am–9:55am Wednesday, 12/06/2017
Location: Hall 404AXF
Ben Lorica (O'Reilly Media)
Machine learning models are being used and deployed ever more widely. Ben Lorica explains how to guard against flaws and failures in your machine learning deployments.

10:00am

10:00am–10:20am Wednesday, 12/06/2017
Location: Hall 404AXF
Joshua Bloom (GE Digital)
The ongoing digitization of the industrial-scale machines that power and enable human activity is itself a major global transformation. Joshua Bloom explains why the real revolution—in efficiencies and in improved and saved lives—will happen when machine learning automation and insights are properly coupled to the complex systems of industrial data.

10:20am

10:20am–10:40am Wednesday, 12/06/2017
Location: Hall 404AXF
Keynote by Bruno Fernandez-Ruiz

10:45am

10:45am–11:15am Wednesday, 12/06/2017
Location: Sponsor Pavilion, Concourse 1-4
Morning break sponsored by Google (30m)

11:15am

11:15am–11:55am Wednesday, 12/06/2017
Data engineering and architecture
Location: 308/309 Level: Intermediate
Ted Malaska (Blizzard Entertainment)
Ted Malaska shares the top five mistakes that no one talks about when you start writing your streaming app along with the practices you'll inevitably need to learn along the way.
11:15am–11:55am Wednesday, 12/06/2017
Data engineering and architecture
Location: 310/311 Level: Intermediate
Neelesh Srinivas Salian offers an overview of the data platform used by data scientists at Stitch Fix, based on the Spark ecosystem. Neelesh explains the development process and shares some lessons learned along the way.
11:15am–11:55am Wednesday, 12/06/2017
Becoming a data-centric company, Strata Business Summit
Location: 321/322 Level: Non-technical
John Akred (Silicon Valley Data Science), Mark Hunter (Sainsburys Bank)
Deploying machine learning in business requires far more than just selecting an algorithm. You need the right architecture, tools, and team organization to drive your agenda successfully. John Akred and Mark Hunter share practical advice on the technical and human sides of machine learning, based on experience preparing Sainsbury’s for its ML-enabled future.
11:15am–11:55am Wednesday, 12/06/2017
Strata Business Summit
Location: 328/329
Shilpa Aggarwal (McKinsey & Company)
After decades of extravagant promises, artificial intelligence is finally starting to deliver real-life benefits to early adopters. However, we're still early in the cycle of adoption. Shilpa Aggarwal explains where investment is going, patterns of AI adoption, and how the value potential of AI across sectors and business functions is beginning to emerge in Asia.
11:15am–11:55am Wednesday, 12/06/2017
Machine Learning
Location: 323
Anomalies occur frequently in live data for a multitude of reasons, so detection and filtering of anomalies is of paramount importance for robust decision making. Dhruv Choudhary, Arun Kejariwal, and Francois Orsini explore the design and architecture of MZ's Satori platform and share techniques for anomaly detection on live data.
11:15am–11:55am Wednesday, 12/06/2017
Data engineering and architecture, Machine Learning
Location: Summit 1 Level: Beginner
In the current Agile business environment, where developers are required to experiment with multiple ideas and react to various situations, cloud-native development is the way to go. Harjinder Mistry and Bargava Subramanian explain how to design and build a microservices-based cloud-native machine learning application.
11:15am–11:55am Wednesday, 12/06/2017
Data science and advanced analytics, Machine Learning
Location: Summit 2 Level: Intermediate
Wolff Dobson (Google)
TensorFlow, the world's most popular machine learning framework, is fast, flexible, and production ready. Wolff Dobson, representing the Google Brain team, shares the latest developments in TensorFlow, including tensor processing units (TPUs), distributed training, new APIs and models, and mobile features. Join in to learn what's in store for TensorFlow and how ML can change your business.
11:15am–11:55am Wednesday, 12/06/2017
Sponsored
Location: 334/335
Felipe Hoffa (Google)
Stop worrying about infrastructure; focus on your data and insights. Felipe Hoffa explains how Google Cloud brings easy solutions to previously hard problems.

12:05pm

12:05pm–12:45pm Wednesday, 12/06/2017
Location: 308/309 Level: Intermediate
Bas Geerdink (ING)
Bas Geerdink explains why and how ING is becoming more and more data-driven, sharing use cases, architecture, and technology choices along the way.
12:05pm–12:45pm Wednesday, 12/06/2017
Location: 310/311 Level: Intermediate
Kostas Sakellis (Cloudera)
With its scalable data store, elastic compute, and pay-as-you-go cost model, cloud infrastructure is well-suited for large-scale data engineering workloads. Kostas Sakellis explores the latest cloud technologies, focusing on data engineering workloads, cost, security, and ease-of-use implications for data engineers.
12:05pm–12:45pm Wednesday, 12/06/2017
Design, UX, visualization, and VR, Strata Business Summit
Location: 321/322 Level: Beginner
Isaac Reyes (DataSeer)
Isaac Reyes explores the art and science of data storytelling, covering the essential elements of a good data story, chart design and why it matters, the Gestalt principles of visual perception and how they can be used to tell better stories with data, and how to make over a poor visualization.
12:05pm–12:45pm Wednesday, 12/06/2017
Jesse Anderson (Big Data Institute)
Early project success is predicated on management making sure a data engineering team is ready and has all of the skills needed. Jesse Anderson outlines five of the most common nontechnology reasons why data engineering teams fail.
12:05pm–12:45pm Wednesday, 12/06/2017
For the first time, messaging apps have surpassed social networks in usage and growth. Mohammed Abdoolcarim shares best practices for designing for AI-based conversational UIs, such as those employed in messaging apps, drawn from work done at Apple, Google, and GoButler.
12:05pm–12:45pm Wednesday, 12/06/2017
Jared Lander (Lander Analytics)
One common (but false) knock against R is that it doesn't scale well. Jared Lander shows how to use R in a performant manner, in terms of both speed and data size, and offers an overview of packages for running R at scale.
12:05pm–12:45pm Wednesday, 12/06/2017
Data science and advanced analytics, Machine Learning
Location: Summit 2 Level: Intermediate
Danielle Dean (Microsoft), Wee Hyong Tok (Microsoft)
Transfer learning enables you to use pretrained deep neural networks (e.g., AlexNet, ResNet, and Inception V3) and adapt them for custom image classification tasks. Danielle Dean and Wee Hyong Tok walk you through the basics of transfer learning and demonstrate how you can use the technique to bootstrap the building of custom image classifiers.
12:05pm–12:45pm Wednesday, 12/06/2017
Sponsored
Location: 334/335
Vira Shanty (Lippo Group)
Vira Shanty explains how the Lippo Group, one of the largest business conglomerates in Indonesia, is integrating data from multiple lines of business into a single big data analytic platform featuring an API layer with subsecond latency and how the company's mantra “deep and fast analytics” is opening new opportunities for improved customer engagement and new revenue streams.

12:45pm

12:45pm–1:45pm Wednesday, 12/06/2017
Location: Sponsor Pavilion, Concourse 1-4
Looking to network with other attendees during lunch? Topic Table discussions help you connect with people in similar industries or interested in the same topics.

1:45pm

1:45pm–2:25pm Wednesday, 12/06/2017
Ofir Sharony (MyHeritage)
What are the most important considerations for shipping billions of daily events to analysis? Ofir Sharony shares MyHeritage's journey to find a reliable and efficient way to achieve real-time analytics. Along the way, Ofir compares several data loading techniques, helping you make better choices when building your next data pipeline.
1:45pm–2:25pm Wednesday, 12/06/2017
Data engineering and architecture
Location: 310/311 Level: Intermediate
Vickye Jain (ZS Associates), Raghav Sharma (ZS Associates)
Vickye Jain and Raghav Sharma explain how they built a very high-performance data processing platform powered by Spark that balances the considerations of extreme performance, speed of development, and cost of maintenance.
1:45pm–2:25pm Wednesday, 12/06/2017
Financial technology and data, Strata Business Summit
Location: 321/322 Level: Non-technical
Amit Das (Think Analytics India)
Access to credit in emerging markets is impeded by issues around identity verification, risk assessment and monitoring, and the costs of underwriting and collections. At the core of it all is a lack of data. Amit Das explains how access to alternate data, real-time risk monitoring and data access solutions, and smart analytics are changing the lending landscape in India.
1:45pm–2:25pm Wednesday, 12/06/2017
Becoming a data-centric company, Strata Business Summit
Location: 328/329 Level: Non-technical
Jessica Chen Riolfi (TransferWise)
Data is essential to unlock growth opportunities, and successful companies use it in every decision. Jessica Chen Riolfi explains how to build an organization with decentralized, data-driven decision making that enables teams to focus on the products and features that matter and ultimately unlock exponential growth.
1:45pm–2:25pm Wednesday, 12/06/2017
Location: 323
TBC
1:45pm–2:25pm Wednesday, 12/06/2017
Aki Ariga (Cloudera)
Aki Ariga explains how to put your machine learning model into production, discusses common issues and obstacles you may encounter, and shares best practices and typical architecture patterns for deploying ML models, with example designs from the Hadoop and Spark ecosystem using Cloudera Data Science Workbench.
1:45pm–2:25pm Wednesday, 12/06/2017
Data science and advanced analytics, Machine Learning
Location: Summit 2 Level: Intermediate
Bargava Subramanian and Harjinder Mistry share data engineering and machine learning strategies for building an efficient real-time recommendation engine when the transaction data is both big and wide. They also outline a novel way of generating frequent patterns using collaborative filtering and matrix factorization on Apache Spark and serving it using Elasticsearch in the cloud.

2:35pm

2:35pm–3:15pm Wednesday, 12/06/2017
Big data in telecommunications, Data engineering and architecture
Location: 308/309 Level: Intermediate
Yousun Jeong (SK Telecom), Ah Young Hwang (SK Telecom)
Data transfer is one of the most pressing problems for telecom companies, as cost increases in tandem with the growing data requirements. Yousun Jeong and Ah Young Hwang detail how SKT has dealt with this problem.
2:35pm–3:15pm Wednesday, 12/06/2017
Data engineering and architecture
Location: 310/311 Level: Intermediate
Mingxi Wu (TigerGraph), Yu Xu (TigerGraph)
Mingxi Wu and Yu Xu offer an overview of TigerGraph, a high-performance enterprise graph data platform that enables businesses to transform structured, semistructured, and unstructured data and massive enterprise data silos into an intelligent interconnected data network, allowing them to uncover the implicit patterns and critical insights to drive business growth.
2:35pm–3:15pm Wednesday, 12/06/2017
Data case studies, Strata Business Summit
Location: 321/322 Level: Intermediate
Eric Tham (National University of Singapore)
Graphical techniques are increasingly being used for big data. These techniques can be broadly classified into the three C's: centrality, clustering, and connectedness. Eric Tham explains how these concepts are applied to supply chain analysis and financial portfolio management.
2:35pm–3:15pm Wednesday, 12/06/2017
Ricky Barron (InfoStrategy)
To many organizations, big data analytics is still a solution looking for a problem. Ricky Barron shares practical methods for getting the best out of your big data analytics capability and explains why establishing an "insights group" can improve the bottom line, drive performance, optimize processes, and create new data-driven products and solutions.
2:35pm–3:15pm Wednesday, 12/06/2017
Location: 323
TBC
2:35pm–3:15pm Wednesday, 12/06/2017
Data engineering and architecture, Machine Learning
Location: Summit 1 Level: Beginner
Wai Yau (Zendesk), Jeffrey Theobald (Zendesk)
Simply building a successful machine learning model is extremely challenging, and just as much effort is needed to turn that model into a customer-facing product. Drawing on their experience working on Zendesk's article recommendation product, Wai Yau and Jeffrey Theobald discuss design challenges and real-world problems you may encounter when building a machine learning product at scale.
2:35pm–3:15pm Wednesday, 12/06/2017
Data engineering and architecture
Location: Summit 2 Level: Intermediate
Modern engineering requires machine learning engineers, who are needed to monitor and implement ETL and machine learning models in production. Natalino Busa shares technologies, techniques, and blueprints on how to robustly and reliably manage data science and ETL flows from inception to production.

3:15pm

3:15pm–4:15pm Wednesday, 12/06/2017
Location: Sponsor Pavilion, Concourse 1-4
Afternoon break (1h)

4:15pm

4:15pm–4:55pm Wednesday, 12/06/2017
Xiaochang Wu (Intel)
Xiaochang Wu explains how to design and implement a real-time processing platform using the Spark Structured Streaming framework to intelligently transform production lines in the manufacturing industry.
4:15pm–4:55pm Wednesday, 12/06/2017
Big data and the cloud, Data engineering and architecture
Location: 310/311 Level: Beginner
Feng Cheng (Grab), Yanyu Qu (Grab)
Grab uses Presto to support operational reporting (batch and near real-time), ad hoc analyses, and its data pipeline. Currently, Grab has 5+ clusters with 100+ instances in production on AWS and serves up to 30K queries per day while supporting more than 200 internal data users. Feng Cheng and Yanyu Qu explain how Grab operationalizes Presto in the cloud and share lessons learned along the way.
4:15pm–4:55pm Wednesday, 12/06/2017
Becoming a data-centric company, Strata Business Summit
Location: 321/322 Level: Beginner
Benjamin Wright-Jones (Microsoft), Simon Lidberg (Microsoft)
As organizations turn to data-driven strategies, they are also increasingly exploring the creation of a data science or analytic center of excellence (COE). Benjamin Wright-Jones and Simon Lidberg outline the building blocks of a center of excellence and describe the value for organizations embarking on data-driven strategies.
4:15pm–4:55pm Wednesday, 12/06/2017
Executive Briefing, Spark and beyond, Strata Business Summit
Location: 328/329 Level: Non-technical
John Akred (Silicon Valley Data Science)
AI is white-hot at the moment, but where can it really be used? Developers are usually the first to understand why some technologies cause more excitement than others. John Akred relates this insider knowledge, providing a tour through the hottest emerging data technologies of 2017 to explain why they’re exciting in terms of both new capabilities and the new economies they bring.
4:15pm–4:55pm Wednesday, 12/06/2017
Shirshanka Das (LinkedIn), Tushar Shanbhag (LinkedIn)
LinkedIn houses the most valuable professional data in the world. Protecting the privacy of member data has always been paramount. Shirshanka Das and Tushar Shanbhag outline three foundational building blocks for scalable data management that can meet data compliance regulations: a central metadata system, an integrated data movement framework, and a unified data access layer.
4:15pm–4:55pm Wednesday, 12/06/2017
Data engineering and architecture, Machine Learning, Spark and beyond
Location: Summit 1 Level: Intermediate
Holden Karau (Google)
Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. Holden Karau introduces Spark’s ML pipelines and explains how to extend them with your own custom algorithms, allowing you to take advantage of Spark's meta-algorithms and existing ML tools.
4:15pm–4:55pm Wednesday, 12/06/2017
Big data and the cloud, Machine Learning
Location: Summit 2 Level: Intermediate
Yufeng Guo (Google)
Yufeng Guo demonstrates how to use TensorFlow to easily combine linear regression models and deep neural networks into a single machine learning model that has the benefits of both. You'll also learn what is happening under the hood and how you can use this model for your own datasets.

5:05pm

5:05pm–5:45pm Wednesday, 12/06/2017
Data case studies, Data engineering and architecture
Location: 308/309 Level: Intermediate
Andreas Hadimulyono discusses the challenges that Grab is facing with the ever-increasing volume and velocity of its data and shares the company's plans to overcome them.
5:05pm–5:45pm Wednesday, 12/06/2017
Big data and the cloud, Data engineering and architecture
Location: 310/311 Level: Intermediate
Henry Robinson (Cloudera), Greg Rahn (Cloudera)
Cloud environments will likely play a key role in your business’s future. Henry Robinson and Greg Rahn explore the workload considerations when evaluating the cloud for analytics and discuss common architectural patterns to optimize price and performance.
5:05pm–5:45pm Wednesday, 12/06/2017
Data science and advanced analytics, Machine Learning
Location: 321/322 Level: Intermediate
Anand Chitipothu (rorodata)
There are many challenges to deploying machine learning models in production, including managing multiple versions of models, maintaining staging and production models, keeping track of model performance, logging, and scaling. Anand Chitipothu explores the tools, techniques, and system architecture of a cloud platform built to solve these challenges and the new opportunities it opens up.
5:05pm–5:45pm Wednesday, 12/06/2017
Teresa Tung (Accenture Labs)
A data-driven enterprise maximizes the value of its data. But how do enterprises emerging from technology and organization silos get there? Teresa Tung explains how to create a data-driven enterprise maturity model that spans technology and business requirements and walks you through use cases that bring the model to life.
5:05pm–5:45pm Wednesday, 12/06/2017
Mark Donsky (Cloudera), Steven Ross (Cloudera)
In May 2018, the General Data Protection Regulation (GDPR) goes into effect for firms doing business in the EU, but many companies aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). Steven Ross and Mark Donsky outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations.
5:05pm–5:45pm Wednesday, 12/06/2017
Data engineering and architecture, Machine Learning, Spark and beyond
Location: Summit 1 Level: Intermediate
Peng Meng (Intel)
Apache Spark ML and MLlib are hugely popular in the big data ecosystem, and Intel has been deeply involved in Spark from a very early stage. Peng Meng outlines the methodology behind Intel's work on Spark ML and MLlib optimization and shares a case study on boosting the performance of Spark MLlib ALS by 60x in JD.com’s production environment.
5:05pm–5:45pm Wednesday, 12/06/2017
Machine Learning
Location: Summit 2 Level: Intermediate
Yiqun Hu (Singapore Power)
Energy usage is a significant part of daily life, so the ability to monitor this use offers a number of benefits, from cost savings to improved safety. A key challenge is the lack of labeled data. Yiqun Hu shares a new solution: an RNN-based network trained to learn good features from unlabeled data.

5:45pm

5:45pm–6:45pm Wednesday, 12/06/2017
Location: Sponsor Pavilion, Concourse 1-4
Need to unwind after a long day of sessions? Join us at the Sponsor Pavilion Reception and enjoy beverages and snacks with fellow Strata Data sponsors, attendees, and speakers.

Thursday, 12/07/2017

8:00am

8:00am–8:15am Thursday, 12/07/2017
Location: Hall 404 Foyer
Coffee break sponsored by TigerGraph (15m)

8:15am

8:15am–8:45am Thursday, 12/07/2017
Location: Hall 404 Foyer
Ready, set, network! Meet fellow attendees who are looking to connect at Strata. We'll gather before Thursday keynotes to host an informal speed networking event. Be sure to bring your business cards and have fun.

8:50am

8:50am–8:55am Thursday, 12/07/2017
Location: Hall 404AXF
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes.

8:55am

8:55am–9:10am Thursday, 12/07/2017
Location: Hall 404AXF
Carme Artigas (Synergic Partners)
The concept of smart cities has evolved from sensored urban centers to platform ecosystems that combine data with new technologies such as the IoT, the cloud, and AI. Carme Artigas explores the challenges and opportunities of evolving from smart cities to intelligent societies.

9:10am

9:10am–9:20am Thursday, 12/07/2017
Location: Hall 404AXF
Amr Awadallah (Cloudera)
We are witnessing a new revolution in data—the age of decision automation. Amr Awadallah explains the historic importance of this next wave in automation and highlights the foundational capabilities required to enable it: machine learning and analytics optimized for the cloud.

9:20am

9:20am–9:35am Thursday, 12/07/2017
Location: Hall 404AXF
Ajey Gore (GO-JEK)
Keynote with Ajey Gore.

9:35am

9:35am–9:45am Thursday, 12/07/2017
Location: Hall 404AXF
Rhea Liu (China Tech Insights)
Smartphones have deeply changed the way we consume information, and they have also profoundly influenced how content is produced. Rhea Liu shares insights on how data and technology have shaped China's content business today and offers some thoughts on its future.

9:45am

9:45am–10:00am Thursday, 12/07/2017
Location: Hall 404AXF
Secondary topics: ecommerce
Tony Lee (JD.com)
Details to come.

10:00am

10:00am–10:20am Thursday, 12/07/2017
Location: Hall 404AXF
Kira Radinsky (eBay | Technion)
Kira Radinsky offers an overview of a system that jointly mines 10 years of nationwide medical records of more than 1.5 million people and extracts medical knowledge from Wikipedia to provide guidance on drug repurposing, the process of applying known drugs in new ways to treat diseases.

10:20am

10:20am–10:40am Thursday, 12/07/2017
Location: Hall 404AXF
Pascale Fung (The Hong Kong University of Science and Technology)
Keynote with Pascale Fung.

10:45am

10:45am–11:15am Thursday, 12/07/2017
Location: Sponsor Pavilion, Concourse 1-4
Morning break (30m)

11:15am

11:15am–11:55am Thursday, 12/07/2017
Data engineering and architecture
Location: 308/309 Level: Beginner
Wataru Yukawa (LINE)
Data is a key asset for LINE, one of the most popular messaging applications in Asia. Wataru Yukawa explains how LINE gets the most out of its data using a Hadoop data lake and an in-house log analysis platform.
11:15am–11:55am Thursday, 12/07/2017
Data engineering and architecture, Spark and beyond
Location: 310/311 Level: Intermediate
Holden Karau (Google), Joey Echeverria (Rocana)
Apache Spark offers greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. Holden Karau and Joey Echeverria explore how to debug Apache Spark applications, the different options for logging in Spark, and more.
11:15am–11:55am Thursday, 12/07/2017
Becoming a data-centric company, Strata Business Summit
Location: 321/322 Level: Beginner
Grace Tang (Uber)
Being a data-driven company means that we have to move fast and fail often. But how do we learn not only to be proud of our failures but also to turn them into wins? Grace Tang explains how to set up experiments so that negative results become epic wins that save your team time, effort, and money instead of being swept under the carpet.
11:15am–11:55am Thursday, 12/07/2017
Carme Artigas (Synergic Partners)
Carme Artigas explains why companies need an IoT strategy based on data analytics to create value for business.
11:15am–11:55am Thursday, 12/07/2017
Paco Nathan (O'Reilly Media)
Paco Nathan explains how O'Reilly employs AI, from the obvious (chatbots, case studies about other firms) to the less so (using AI to show the structure of content in detail, enhance search and recommendations, and guide editors for gap analysis, assessment, pathing, etc.). Approaches include vector embedding search, summarization, TDA for content gap analysis, and speech-to-text to index video.
11:15am–11:55am Thursday, 12/07/2017
Wee Hyong Tok (Microsoft), Danielle Dean (Microsoft)
Deep neural networks are responsible for many advances in natural language processing, computer vision, speech recognition, and forecasting. Danielle Dean and Wee Hyong Tok illustrate how cloud computing has been leveraged for exploration, programmatic training, real-time scoring, and batch scoring of deep learning models for projects in healthcare, manufacturing, and utilities.

12:05pm

12:05pm–12:45pm Thursday, 12/07/2017
Tzu-Li (Gordon) Tai (data Artisans)
Apache Flink is evolving from a framework for streaming data analytics into a platform for event-driven applications, one that can take over the data management tasks typically handled by a database in more conventional architectures. Tzu-Li (Gordon) Tai explores the key features powering Flink's evolution and demonstrates them in action.
12:05pm–12:45pm Thursday, 12/07/2017
Data engineering and architecture, Spark and beyond
Location: 310/311 Level: Intermediate
Carson Wang (Intel), Yucai Yu (Intel)
Spark SQL is one of the most popular components of Apache Spark. Carson Wang and Yucai Yu explore Intel's efforts to improve SQL performance and offer an overview of an adaptive execution mode they implemented for Spark SQL.
12:05pm–12:45pm Thursday, 12/07/2017
Big data and the cloud, Strata Business Summit
Location: 321/322 Level: Beginner
John Mertic (The Linux Foundation), Cupid Chan (4C Decision)
John Mertic and Cupid Chan share real end-user perspectives from companies like GE on how they are using big data tools, challenges they face, and where they are looking to focus investments—all from a vendor-neutral viewpoint.
12:05pm–12:45pm Thursday, 12/07/2017
Mick Hollison (Cloudera)
Mick Hollison shares examples of real-world machine learning applications, explores a variety of challenges in putting these capabilities into production (the speed at which technology is moving, cloud versus in-data-center consumption, security and regulatory compliance, and skills and agility in getting data and answers into the right hands), and outlines proven ways to meet them.
12:05pm–12:45pm Thursday, 12/07/2017
Data engineering and architecture, Machine Learning
Location: Summit 1 Level: Intermediate
Graham Gear (Cloudera)
How can we drive more data pipelines, advanced analytics, and machine learning models into production? How can we do this both faster and more reliably? Graham Gear draws on real-world processes and systems to explain how it's possible to apply continuous delivery techniques to advanced analytics, realizing business value earlier and more safely.
12:05pm–12:45pm Thursday, 12/07/2017
Xianyan Jia (Intel), Zhenhua Wang (JD.com)
Xianyan Jia and Zhenhua Wang explore deep learning applications built successfully with BigDL. They also teach you how to develop fast prototypes with BigDL's off-the-shelf deep learning toolkit and build end-to-end deep learning applications with flexibility and scalability using BigDL on Spark.

12:45pm

12:45pm–1:45pm Thursday, 12/07/2017
Location: Sponsor Pavilion, Concourse 1-4
Looking to network with other attendees during lunch? Topic Table discussions help you connect with people in similar industries or interested in the same topics.

1:45pm

1:45pm–2:25pm Thursday, 12/07/2017
Data engineering and architecture, Spark and beyond
Location: 308/309 Level: Advanced
Apache Beam allows data pipelines to run in batch and streaming modes on a variety of open source and proprietary cloud data processing backends, including Apache Flink, Apache Spark, and Google Cloud Dataflow. Jean-Baptiste Onofré offers an overview of Apache Beam's programming model, explores mechanisms for efficiently building data pipelines, and demos an IoT use case dealing with MQTT messages.
1:45pm–2:25pm Thursday, 12/07/2017
Big data and the cloud, Data engineering and architecture
Location: 310/311 Level: Beginner
Calvin Jia (Alluxio), Haoyuan Li (Alluxio)
Calvin Jia and Haoyuan Li explain how to decouple compute and storage with Alluxio, exploring the decision factors, considerations, and production best practices and solutions to best utilize CPUs, memory, and different tiers of disaggregated compute and storage systems to build out a multitenant high-performance platform.
1:45pm–2:25pm Thursday, 12/07/2017
Law, ethics, and open data, Strata Business Summit
Location: 321/322 Level: Beginner
Gaurav Godhwani (Open Budgets India, Centre for Budget and Governance Accountability)
Most of India’s budget documents aren’t easily accessible. Those published online are mostly unstructured PDFs, making it difficult to search, analyze, and use this crucial data. Gaurav Godhwani discusses the process of creating Open Budgets India and making India’s budgets open, usable, and easy to comprehend.
1:45pm–2:25pm Thursday, 12/07/2017
Strata Business Summit
Location: 328/329
Rhea Liu (China Tech Insights)
Rhea Liu discusses recent internet trends in China.
1:45pm–2:25pm Thursday, 12/07/2017
Teresa Tung (Accenture Labs), Ishmeet Grewal (Accenture Labs), Jurgen Weichenberger (Accenture Analytics)
As Accenture scaled to millions of predictive models, it required automation to ensure accuracy, prevent false alarms, and preserve trust. Teresa Tung, Ishmeet Grewal, and Jurgen Weichenberger explain how Accenture implemented a DevOps process for analytical models that's akin to software development, enabling reliable analytics modeling at scale, even in noncloud environments at the edge.
1:45pm–2:25pm Thursday, 12/07/2017
YongLiang Xu (StarHub), Masaru Dobashi (NTT Data Corp.)
StarHub and NTT DATA have embarked on a partnership to design a next-generation architecture to power data products that will help generate new insights. YongLiang Xu and Masaru Dobashi explain how deep learning and other analytics models coexist within the same platform to address issues relating to smart cities.

2:35pm

2:35pm–3:15pm Thursday, 12/07/2017
Supreet Oberoi (Oracle)
Time series data is any dataset of measurements recorded over a range of time. In IoT use cases, the interest is often in finding a pattern in the sequence of measurements. However, queries on time series data traditionally do not scale well. Supreet Oberoi explains how Oracle adapted and extended symbolic aggregate approximation (SAX) to solve such challenges.
2:35pm–3:15pm Thursday, 12/07/2017
Data engineering and architecture
Location: 310/311 Level: Intermediate
Dong Li (Kyligence), Luke Han (Kyligence)
Apache Kylin is a distributed OLAP engine on Hadoop built for extreme-scale analytics. Well-tuned cubes deliver the best performance at the lowest cost, but tuning them requires a comprehensive understanding of tuning principles. Dong Li and Luke Han explain advanced tuning and introduce KyBot, which helps find and solve bottlenecks intelligently by applying AI methods to log analysis results.
2:35pm–3:15pm Thursday, 12/07/2017
Dirk Jungnickel (du Telecom)
Dirk Jungnickel explains how Dubai-based telco du leverages a centralized data lake to improve customer experience, create smart cities, address unexpected business challenges, and even enable data monetization. Along the way, he covers business outcomes, technical challenges, architectural considerations, platform requirements for the IoT, and performing root cause analyses.
2:35pm–3:15pm Thursday, 12/07/2017
Nikki Rouda (Cloudera), Kelly Schupp (Zaloni)
Managing the security and governance of big data can be challenging on-premises, but it becomes far more difficult in a heterogeneous environment that spans a public cloud or multiple cloud services. Nikki Rouda and Kelly Schupp share unbiased best practices to ensure your data is under control everywhere.
2:35pm–3:15pm Thursday, 12/07/2017
Kazunori Sato (Google)
BigQuery is Google's fully managed, petabyte-scale data warehouse. Its user-defined functions enable "smart" queries powered by machine learning, such as similarity searches or recommendations on images or documents using feature vectors and neural network prediction. Kazunori Sato demonstrates how BigQuery and TensorFlow together enable a powerful "data warehouse + ML" solution.
2:35pm–3:15pm Thursday, 12/07/2017
Chris Hausler (Zendesk), Arwen Griffioen (Zendesk)
Chris Hausler and Arwen Griffioen discuss Zendesk's experience with deep learning, using the example of Answer Bot, a question-answering system that resolves support tickets without agent intervention. They cover the benefits Zendesk has already seen and the challenges encountered along the way.

3:15pm

3:15pm–4:15pm Thursday, 12/07/2017
Location: Sponsor Pavilion, Concourse 1-4
Afternoon break (1h)

4:15pm

4:15pm–4:55pm Thursday, 12/07/2017
Data engineering and architecture
Location: 308/309 Level: Intermediate
Xie Qi (Intel China), Quanfu Wang (Intel China)
Xie Qi and Quanfu Wang offer an overview of a configurable FPGA-based Spark SQL acceleration architecture that exploits FPGAs' highly parallel computing capability to greatly accelerate Spark SQL queries and their power efficiency to lower power consumption.
4:15pm–4:55pm Thursday, 12/07/2017
Data case studies, Data engineering and architecture
Location: 310/311 Level: Intermediate
Wei Chen (Intel), Zhaojuan Bian (Intel)
Kudu is designed to fill the gap between HDFS and HBase. However, designing a Kudu-based cluster presents a number of challenges. Wei Chen and Zhaojuan Bian share a real-world use case from the automobile industry to explain how to design a Kudu-based end-to-end (E2E) system. They also discuss key indicators for tuning Kudu and OS parameters and how to select the best hardware components for different scenarios.
4:15pm–4:55pm Thursday, 12/07/2017
Becoming a data-centric company, Strata Business Summit
Location: 321/322 Level: Intermediate
Sarang Anajwala (Autodesk)
Autodesk's centralized data platform enables data-driven decision making by democratizing analytics across the various teams based on their personas and proficiencies. Sarang Anajwala explores the various user personas of the big data platform, challenges in enabling them for efficient interactions with big data, and his experience navigating these challenges.
4:15pm–4:55pm Thursday, 12/07/2017
Thomas Dinsmore (Cloudera), Johnson Poh (DBS)
Data science alone is easy. Data science with others, in the enterprise, on shared distributed systems, requires a bit more work. Thomas Dinsmore and Johnson Poh share common technology considerations and patterns for collaboration in large teams and best practices for moving machine learning into production at scale.
4:15pm–4:55pm Thursday, 12/07/2017
Prateek Nagaria (The Data Team)
Most data scientists use traditional forecasting methods, such as exponential smoothing or ARIMA, to forecast product demand. However, when the product experiences several periods of zero demand, approaches such as Croston's method may provide better accuracy than these traditional methods. Prateek Nagaria compares traditional and Croston methods in R on intermittent demand time series.
4:15pm–4:55pm Thursday, 12/07/2017
Adam Gibson (Skymind)
Adam Gibson demonstrates how to use variational autoencoders to automatically label time series location data. You'll explore the challenge of imbalanced classes and anomaly detection, learn how to leverage deep learning for automatic labeling (and the pitfalls of doing so), and discover how you can deploy these techniques in your organization.

5:05pm

5:05pm–5:45pm Thursday, 12/07/2017
Data engineering and architecture
Location: 308/309 Level: Advanced
Yu-Xi Lim (Teralytics), Michal Wegrzyn (Teralytics)
Yu-Xi Lim and Michal Wegrzyn outline a high-throughput distributed software pattern capable of processing event streams in real time. At its core, the pattern relies on functional reactive programming idioms to shard and splice state fragments, ensuring high horizontal scalability, reliability, and high availability.
5:05pm–5:45pm Thursday, 12/07/2017
Graham Dumpleton (Red Hat)
Jupyter notebooks provide a rich interactive environment for working with data. Running a single notebook is easy, but what if you need to provide a platform for many users at the same time? Graham Dumpleton demonstrates how to use JupyterHub to run a highly scalable environment for hosting Jupyter notebooks in education and business.
5:05pm–5:45pm Thursday, 12/07/2017
Becoming a data-centric company
Location: 321/322 Level: Non-technical
Daniel Ng (Cloudera)
Daniel Ng explores the current state of data professional talent in the APAC region and discusses some solutions to expand the profession, including an open source ecosystem for data professional development and a collaboration between Microsoft, Red Hat, Talend, and Cloudera in Malaysia to help realize the target of 20,000 data professionals in 2020.
5:05pm–5:45pm Thursday, 12/07/2017
Big data and the cloud, Strata Business Summit
Location: 328/329 Level: Beginner
Arun Veettil (Skellam AI)
Arun Veettil shares his experience and lessons learned developing a customized, enterprise-level NLP platform to replace a leading text analytics vendor platform.
5:05pm–5:45pm Thursday, 12/07/2017
Big data and the cloud
Location: Summit 1 Level: Intermediate
Le Zhang (Microsoft), Graham Williams (Microsoft)
R has long been criticized for its limitations in scalable data analytics. What's needed is an R-centric paradigm that enables data scientists to elastically harness cloud resources with diverse computing capabilities for large-scale data analytics. Le Zhang and Graham Williams demonstrate how to operationalize an end-to-end (E2E), enterprise-grade pipeline for big data analytics, all within R.
5:05pm–5:45pm Thursday, 12/07/2017
Markus Kirchberg (Wismut Labs Pte. Ltd.)
As the share of digital payments increases, so does payment fraud, which almost tripled between 2013 and 2016. Markus Kirchberg explains how recent advances in AI and machine learning, decision sciences, and network sciences are driving the development of next-generation payment fraud detection capabilities, including fraud scoring, deceptive merchant detection, and merchant compromise detection.