Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Singapore
 
Summit 1
11:15am AI within O'Reilly Media Paco Nathan (derwen.ai)
1:45pm DevOps for models: How to manage millions of models in production—and at the edge Teresa Tung (Accenture Labs), Ishmeet Grewal (Accenture Labs), Jurgen Weichenberger (Accenture Analytics)
Summit 2
11:15am Training and scoring deep neural networks in the cloud Wee Hyong Tok (Microsoft), Danielle Dean (iRobot)
12:05pm Bringing deep learning into big data analytics using BigDL Xianyan Jia (Intel), Zhenhua Wang (JD.com)
1:45pm Fusing a deep learning platform with a big data platform Yongliang Xu (StarHub), Masatake Iwasaki (NTT DATA)
2:35pm Aha moments in deep learning at Zendesk Christopher Hausler (Zendesk), Arwen Griffioen (Zendesk)
308/309
11:15am LINE's log analysis platform Wataru Yukawa (LINE)
1:45pm How Apache Beam can advance your enterprise workloads Jean-Baptiste Onofre (Talend)
2:35pm Querying time series patterns with SAX Supreet Oberoi (Oracle)
4:15pm FPGA-based acceleration architecture for Spark SQL Xie Qi (Intel), Quanfu Wang (Intel China)
5:05pm Distributed real-time highly available stream processing Yu-Xi Lim (Teralytics), Michal Wegrzyn (Teralytics)
310/311
11:15am Debugging Apache Spark Holden Karau (Independent), Joey Echeverria (Rocana)
12:05pm An adaptive execution mode for Spark SQL Carson Wang (Intel), Yucai Yu (Intel)
1:45pm Decoupling compute and storage with open source Alluxio Calvin Jia (Alluxio), Haoyuan Li (Alluxio)
2:35pm Apache Kylin: Advanced tuning and best practices with KyBot Dong Li (Kyligence), Luke Han (Kyligence)
321/322
11:15am Turning fails into wins Grace Tang (Uber)
12:05pm Big data on the rise: Views of emerging trends and predictions from real-life end users John Mertic (Linux Foundation), Cupid Chan (4C Decision)
1:45pm Open Budgets India: Lessons from the front line Gaurav Godhwani (Open Budgets India, Centre for Budget and Governance Accountability)
328/329
1:45pm The dumb consequences of smart cities Alistair Croll (Solve For Interesting)
5:05pm Architecting a text analytics system in the cloud Arun Veettil (Skellam AI)
Hall 404AXF
8:50am Thursday keynote welcome Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
8:55am From smart cities to intelligent societies Carme Artigas (Synergic Partners)
9:10am The sixth wave: Automation of decisions Amr Awadallah (Cloudera)
9:20am Impacting a nation Ajey Gore (GO-JEK)
10:20am Sentiment and emotion-aware natural language processing Pascale Fung (The Hong Kong University of Science and Technology)
8:00am Coffee break sponsored by TigerGraph | Room: Hall 404 Foyer
8:15am Speed Networking | Room: Hall 404 Foyer
10:45am Morning break | Room: Sponsor Pavilion, Concourse 1-4
12:45pm Thursday Topic Tables at Lunch (Located in The Links) | Room: Sponsor Pavilion, Concourse 1-4
3:15pm Afternoon break | Room: Sponsor Pavilion, Concourse 1-4
11:15am-11:55am (40m) Data engineering and architecture, Data science and advanced analytics, Machine Learning
AI within O'Reilly Media
Paco Nathan (derwen.ai)
Paco Nathan explains how O'Reilly employs AI, from the obvious (chatbots, case studies about other firms) to the less so (using AI to show the structure of content in detail, enhance search and recommendations, and guide editors for gap analysis, assessment, pathing, etc.). Approaches include vector embedding search, summarization, TDA for content gap analysis, and speech-to-text to index video.
12:05pm-12:45pm (40m) Data engineering and architecture, Machine Learning
Real-world patterns for continuously deployed advanced analytics
Graham Gear (Cloudera)
How can we drive more data pipelines, advanced analytics, and machine learning models into production? How can we do this both faster and more reliably? Graham Gear draws on real-world processes and systems to explain how it's possible to apply continuous delivery techniques to advanced analytics, realizing business value earlier and more safely.
1:45pm-2:25pm (40m) Data engineering and architecture, Data science and advanced analytics, Machine Learning
DevOps for models: How to manage millions of models in production—and at the edge
Teresa Tung (Accenture Labs), Ishmeet Grewal (Accenture Labs), Jurgen Weichenberger (Accenture Analytics)
As Accenture scaled to millions of predictive models, it required automation to ensure accuracy, prevent false alarms, and preserve trust. Teresa Tung, Ishmeet Grewal, and Jurgen Weichenberger explain how Accenture implemented a DevOps process for analytical models that's akin to software development—guaranteeing analytics modeling at scale and even in noncloud environments at the edge.
2:35pm-3:15pm (40m) Data engineering and architecture, Data science and advanced analytics, Machine Learning
BigQuery and TensorFlow: How a data warehouse + machine learning enables "smart" queries
Kaz Sato (Google)
BigQuery is Google's fully managed, petabyte-scale data warehouse. Its user-defined functions enable "smart" queries powered by machine learning, such as similarity searches or recommendations on images or documents using feature vectors and neural network prediction. Kazunori Sato demonstrates how BigQuery and TensorFlow together enable a powerful "data warehouse + ML" solution.
4:15pm-4:55pm (40m) Data engineering and architecture, Data science and advanced analytics, Machine Learning
Forecasting intermittent demand: Traditional smoothing approaches versus the Croston method
Prateek Nagaria (The Data Team)
Most data scientists use traditional forecasting methods, such as exponential smoothing or ARIMA, to forecast product demand. However, when a product experiences several periods of zero demand, approaches such as Croston's method may provide better accuracy than these traditional methods. Prateek Nagaria compares traditional and Croston methods in R on intermittent demand time series.
5:05pm-5:45pm (40m) Big data and the cloud, Machine Learning
R you ready for the cloud? Using R for operationalizing an enterprise-grade data science solution on Azure
Le Zhang (Microsoft), Graham Williams (Microsoft)
R has long been criticized for its limitations on scalable data analytics. What's needed is an R-centric paradigm that enables data scientists to elastically harness cloud resources of manifold computing capability for large-scale data analytics. Le Zhang and Graham Williams demonstrate how to operationalize an E2E enterprise-grade pipeline for big data analytics—all within R.
11:15am-11:55am (40m) Big data and the cloud, Machine Learning
Training and scoring deep neural networks in the cloud
Wee Hyong Tok (Microsoft), Danielle Dean (iRobot)
Deep neural networks are responsible for many advances in natural language processing, computer vision, speech recognition, and forecasting. Danielle Dean and Wee Hyong Tok illustrate how cloud computing has been leveraged for exploration, programmatic training, real-time scoring, and batch scoring of deep learning models for projects in healthcare, manufacturing, and utilities.
12:05pm-12:45pm (40m) Data engineering and architecture, Data science and advanced analytics, Machine Learning
Bringing deep learning into big data analytics using BigDL
Xianyan Jia (Intel), Zhenhua Wang (JD.com)
Xianyan Jia and Zhenhua Wang explore deep learning applications built successfully with BigDL. They also teach you how to develop fast prototypes with BigDL's off-the-shelf deep learning toolkit and build end-to-end deep learning applications with flexibility and scalability using BigDL on Spark.
1:45pm-2:25pm (40m) Data engineering and architecture, Data science and advanced analytics, Machine Learning
Fusing a deep learning platform with a big data platform
Yongliang Xu (StarHub), Masatake Iwasaki (NTT DATA)
StarHub and NTT DATA have embarked on a partnership to design a next-generation architecture to power the data products that will help generate new insights. Yongliang Xu and Masatake Iwasaki explain how deep learning and other analytics models can coexist on the same platform to address opportunities and challenges in initiatives such as smart cities.
2:35pm-3:15pm (40m) Data science and advanced analytics, Machine Learning
Aha moments in deep learning at Zendesk
Christopher Hausler (Zendesk), Arwen Griffioen (Zendesk)
Chris Hausler and Arwen Griffioen discuss Zendesk's experience with deep learning, using the example of Answer Bot, a question-answering system that resolves support tickets without agent intervention. They cover the benefits Zendesk has already seen and challenges encountered along the way.
4:15pm-4:55pm (40m) Machine Learning
Unsupervised fuzzy labeling using deep learning to improve anomaly detection
Adam Gibson (Skymind)
Adam Gibson demonstrates how to use variational autoencoders to automatically label time series location data. You'll explore the challenge of imbalanced classes and anomaly detection, learn how to leverage deep learning for automatic labeling (and its pitfalls), and discover how you can deploy these techniques in your organization.
5:05pm-5:45pm (40m) Financial technology and data, Machine Learning
Payment fraud detection and prevention in the age of big data, network science, and AI
Markus Kirchberg (Wismut Labs Pte. Ltd.)
As the share of digital payments increases, so does payment fraud, which almost tripled between 2013 and 2016. Markus Kirchberg explains how recent advances in AI and machine learning, decision sciences, and network sciences are driving the development of next-generation payment fraud capabilities for fraud scoring, deceptive merchant detection, and merchant compromise detection.
11:15am-11:55am (40m) Data engineering and architecture
LINE's log analysis platform
Wataru Yukawa (LINE)
Data is a very important asset to LINE, one of the most popular messaging applications in Asia. Wataru Yukawa explains how LINE gets the most out of its data using a Hadoop data lake and an in-house log analysis platform.
12:05pm-12:45pm (40m) Data engineering and architecture, Stream processing and analytics
The stream processor as a database: Building event-driven applications with Apache Flink
Tzu-Li (Gordon) Tai (data Artisans)
Apache Flink is evolving from a framework for streaming data analytics into a platform for event-driven applications, taking over the data management that a database typically handles in more conventional architectures. Tzu-Li (Gordon) Tai explores the key features that are powering Flink's evolution and demonstrates them in action.
1:45pm-2:25pm (40m) Data engineering and architecture, Spark and beyond
How Apache Beam can advance your enterprise workloads
Jean-Baptiste Onofre (Talend)
Apache Beam lets data pipelines run in batch or streaming mode on a variety of open source and private cloud data processing backends, including Apache Flink, Apache Spark, and Google Cloud Dataflow. Jean-Baptiste Onofré offers an overview of Apache Beam's programming model, explores mechanisms for efficiently building data pipelines, and demos an IoT use case dealing with MQTT messages.
2:35pm-3:15pm (40m) Data engineering and architecture, IoT and intelligent real-time applications
Querying time series patterns with SAX
Supreet Oberoi (Oracle)
Time series data is any dataset that is plotted over a range of time. Often, in IoT use cases, what is of interest is finding a pattern in the sequence of measurements. However, queries on time series data do not traditionally scale. Supreet Oberoi explains how Oracle adapted and extended symbolic aggregate approximation (SAX) to solve such challenges.
4:15pm-4:55pm (40m) Data engineering and architecture
FPGA-based acceleration architecture for Spark SQL
Xie Qi (Intel), Quanfu Wang (Intel China)
Xie Qi and Quanfu Wang offer an overview of a configurable FPGA-based Spark SQL acceleration architecture that leverages FPGAs' very high parallel computing capability to tremendously accelerate Spark SQL queries and FPGAs' power efficiency to lower power consumption.
5:05pm-5:45pm (40m) Data engineering and architecture
Distributed real-time highly available stream processing
Yu-Xi Lim (Teralytics), Michal Wegrzyn (Teralytics)
Yu-Xi Lim and Michal Wegrzyn outline a high-throughput distributed software pattern capable of processing event streams in real time. At its core, the pattern relies on functional reactive programming idioms to shard and splice state fragments, ensuring high horizontal scalability, reliability, and high availability.
11:15am-11:55am (40m) Data engineering and architecture, Spark and beyond
Debugging Apache Spark
Holden Karau (Independent), Joey Echeverria (Rocana)
Apache Spark offers greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. Holden Karau and Joey Echeverria explore how to debug Apache Spark applications, the different options for logging in Spark, and more.
12:05pm-12:45pm (40m) Data engineering and architecture, Spark and beyond
An adaptive execution mode for Spark SQL
Carson Wang (Intel), Yucai Yu (Intel)
Spark SQL is one of the most popular components of Apache Spark. Carson Wang and Yucai Yu explore Intel's efforts to improve SQL performance and offer an overview of an adaptive execution mode they implemented for Spark SQL.
1:45pm-2:25pm (40m) Big data and the cloud, Data engineering and architecture
Decoupling compute and storage with open source Alluxio
Calvin Jia (Alluxio), Haoyuan Li (Alluxio)
Calvin Jia and Haoyuan Li explain how to decouple compute and storage with Alluxio, exploring the decision factors, considerations, and production best practices and solutions to best utilize CPUs, memory, and different tiers of disaggregated compute and storage systems to build out a multitenant high-performance platform.
2:35pm-3:15pm (40m) Data engineering and architecture
Apache Kylin: Advanced tuning and best practices with KyBot
Dong Li (Kyligence), Luke Han (Kyligence)
Apache Kylin is an extreme distributed OLAP engine on Hadoop. Well-tuned cubes bring about the best performance with the least cost but require a comprehensive understanding of tuning principles to use. Dong Li and Luke Han explain advanced tuning and introduce KyBot, which helps find and solve bottlenecks in an intelligent way with AI methods performed on log analysis results.
4:15pm-4:55pm (40m) Data case studies, Data engineering and architecture
Best practices with Kudu: An end-to-end use case from the automobile industry
Wei Chen (Intel), Zhaojuan Bian (Intel)
Kudu is designed to fill the gap between HDFS and HBase. However, designing a Kudu-based cluster presents a number of challenges. Wei Chen and Zhaojuan Bian share a real-world use case from the automobile industry to explain how to design a Kudu-based E2E system. They also discuss key indicators to tune Kudu and OS parameters and how to select the best hardware components for different scenarios.
5:05pm-5:45pm (40m) Data engineering and architecture, Data science and advanced analytics
Deploying a scalable JupyterHub environment for running Jupyter notebooks
Graham Dumpleton (Red Hat)
Jupyter notebooks provide a rich interactive environment for working with data. Running a single notebook is easy, but what if you need to provide a platform for many users at the same time? Graham Dumpleton demonstrates how to use JupyterHub to run a highly scalable environment for hosting Jupyter notebooks in education and business.
11:15am-11:55am (40m) Becoming a data-centric company, Strata Business Summit
Turning fails into wins
Grace Tang (Uber)
Being a data-driven company means that we have to move fast and fail often. But how do we learn to not only be proud of our failures but also turn these fails into wins? Grace Tang explains how to set up experiments so that negative results become epic wins, saving your team time, effort, and money, instead of just being swept under the carpet.
12:05pm-12:45pm (40m) Big data and the cloud, Strata Business Summit
Big data on the rise: Views of emerging trends and predictions from real-life end users
John Mertic (Linux Foundation), Cupid Chan (4C Decision)
John Mertic and Cupid Chan share real end-user perspectives from companies like GE on how they are using big data tools, challenges they face, and where they are looking to focus investments—all from a vendor-neutral viewpoint.
1:45pm-2:25pm (40m) Law, ethics, and open data, Strata Business Summit
Open Budgets India: Lessons from the front line
Gaurav Godhwani (Open Budgets India, Centre for Budget and Governance Accountability)
Most of India’s budget documents aren’t easily accessible. Those published online are mostly available as unstructured PDFs, making it difficult to search, analyze, and use this crucial data. Gaurav Godhwani discusses the process of creating Open Budgets India and making India’s budgets open, usable, and easy to comprehend.
4:15pm-4:55pm (40m) Becoming a data-centric company, Strata Business Summit
Enabling data-driven decision making: Challenges of logical and physical scale
Sarang Anajwala (Autodesk)
Sarang Anajwala offers an overview of Autodesk’s centralized data platform, which democratizes analytics across various teams within Autodesk. The platform has gone through multiple iterations to optimize the balance between a complex one-size-fits-all data access layer and multiple fragmented noncohesive data access layers.
11:15am-11:55am (40m) Strata Business Summit
Executive briefing: Analytics centers of excellence as a way to accelerate big data adoption by business
Carme Artigas (Synergic Partners)
Carme Artigas explains why an analytics center of excellence (ACoE), whether internal or outsourced, is an effective way to create mechanisms to deploy big data across the entire organization rather than simply serving a particular department or use case.
12:05pm-12:45pm (40m) Executive Briefing, Strata Business Summit
Executive Briefing: Machine learning—Why you need it, why it's hard, and what to do about it
Mick Hollison (Cloudera)
Mick Hollison shares examples of real-world machine learning applications, explores a variety of challenges in putting these capabilities into production (the speed with which technology is moving, cloud versus in-data-center consumption, security and regulatory compliance, and skills and agility in getting data and answers into the right hands), and outlines proven ways to meet them.
1:45pm-2:25pm (40m) Smart cities and urban automation
The dumb consequences of smart cities
Alistair Croll (Solve For Interesting)
We infuse urban spaces with sensors, drinking from a torrent of data, making sense of city life. But this reliance on data has real risks: Complex systems often have unintended consequences, and it's hard to experiment. Alistair Croll shares lessons from the past and explains how paving the cowpaths, examining the models, and iterating everything can mitigate these risks.
2:35pm-3:15pm (40m) Executive Briefing, Security and governance, Strata Business Summit
Good everywhere: Managing security and governance in a hybrid- and multicloud world
Nikki Rouda (Cloudera)
Managing the security and governance of big data can be challenging on-premises but becomes far more difficult in a heterogeneous environment spanning a public cloud or across multiple cloud services. Nikki Rouda shares unbiased best practices to ensure your data is under control everywhere.
4:15pm-4:55pm (40m) Becoming a data-centric company, Data science and advanced analytics, Strata Business Summit
Data science at team scale: Considerations for sharing, collaborating, and getting to production
Thomas Dinsmore (DataRobot), Johnson POH (DBS)
Data science alone is easy. Data science with others, in the enterprise, on shared distributed systems, requires a bit more work. Thomas Dinsmore and Johnson Poh share common technology considerations and patterns for collaboration in large teams and best practices for moving machine learning into production at scale.
5:05pm-5:45pm (40m) Big data and the cloud, Strata Business Summit
Architecting a text analytics system in the cloud
Arun Veettil (Skellam AI)
Arun Veettil shares his experience and lessons learned developing a customized, enterprise-level NLP platform to replace a leading text analytics vendor platform.
8:50am-8:55am (5m)
Thursday keynote welcome
Ben Lorica (O'Reilly), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes.
8:55am-9:10am (15m) Strata Business Summit
From smart cities to intelligent societies
Carme Artigas (Synergic Partners)
The concept of smart cities has evolved from sensored urban centers to platform ecosystems that combine data with new technologies such as the IoT, the cloud, and AI. Carme Artigas explores the challenges and opportunities of evolving from smart cities to intelligent societies.
9:10am-9:20am (10m)
The sixth wave: Automation of decisions
Amr Awadallah (Cloudera)
We are witnessing a new revolution in data—the age of decision automation. Amr Awadallah explains the historic importance of this next wave in automation and highlights the foundational capabilities required to enable it: machine learning and analytics optimized for the cloud.
9:20am-9:40am (20m)
Impacting a nation
Ajey Gore (GO-JEK)
Drawing on his experience at GO-JEK, Ajey Gore explains how the impossible can be made possible with technology and data insights.
9:40am-10:00am (20m) Data science and advanced analytics, Security and governance, Ecommerce
JD.com security intelligence and analytics: From big data to big impact
Tony Lee (JD.com)
Details to come.
10:00am-10:20am (20m) Machine Learning
Mining electronic health records and the web for drug repurposing
Kira Radinsky (eBay | Technion)
Kira Radinsky offers an overview of a system that jointly mines 10 years of nationwide medical records of more than 1.5 million people and extracts medical knowledge from Wikipedia to provide guidance about drug repurposing, the process of applying known drugs in new ways to treat diseases.
10:20am-10:40am (20m)
Sentiment and emotion-aware natural language processing
Pascale Fung (The Hong Kong University of Science and Technology)
Keynote with Pascale Fung
8:00am-8:15am (15m)
Break: Coffee break sponsored by TigerGraph
8:15am-8:45am (30m)
Speed Networking
Ready, set, network! Meet fellow attendees who are looking to connect at Strata. We'll gather before Thursday keynotes to host an informal speed networking event. Be sure to bring your business cards and have fun.
10:45am-11:15am (30m)
Break: Morning break
12:45pm-1:45pm (1h)
Thursday Topic Tables at Lunch (Located in The Links)
Looking to network with other attendees during lunch? Topic Table discussions help you connect with people in similar industries or interested in the same topics.
3:15pm-4:15pm (1h)
Break: Afternoon break