Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Singapore

Data Science & Machine Learning

Ben Lorica, Strata Conference Chair

If you're in data, you need to understand machine learning

Using algorithms that learn iteratively, machine learning lets you discover hidden insights in your data. It's a simple idea with phenomenal impact and sophisticated use cases, such as recommenders, text mining, real-time analytics, large-scale anomaly detection, and business forecasting.

Strata is a unique opportunity to get up to speed quickly on the latest in machine and deep learning. Take a look at the machine learning sessions available to you at Strata.

Tuesday, December 5: Tutorials (Gold & Silver passes)
Locations: 310/311, 321/322, 328/329
12:30pm: Lunch (Location: TBD)
Wednesday, December 6: Keynotes & Sessions (Gold, Silver & Bronze passes)
Locations: Summit 1, Summit 2
12:45pm: Wednesday Topic Tables at Lunch
5:45pm: Sponsor Pavilion Reception (Location: Sponsor Pavilion)
Thursday, December 7: Keynotes & Sessions (Gold, Silver & Bronze passes)
Locations: Summit 1, Summit 2
12:45pm: Thursday Topic Tables at Lunch
9:00am–12:30pm Tuesday, December 5, 2017
Location: 321/322
Jared Lander (Lander Analytics)
Average rating: 3.00 (1 rating)
Modern statistics has become almost synonymous with machine learning, a collection of techniques that use today's incredible computing power. Jared Lander walks you through the available methods for implementing machine learning algorithms in R and explores the theory behind methods such as the elastic net, boosted trees, and cross-validation.
9:00am–12:30pm Tuesday, December 5, 2017
Location: 328/329
Yufeng Guo (Google)
Average rating: 3.18 (17 ratings)
Yufeng Guo walks you through training and deploying a machine learning system using TensorFlow, a popular open source library. Yufeng takes you from a conceptual overview all the way to building complex classifiers and explains how you can apply deep learning to complex problems in science and industry.
1:30pm–5:00pm Tuesday, December 5, 2017
Location: 310/311
Bargava Subramanian (Impel Labs), Amit Kapoor (narrativeVIZ Consulting)
Average rating: 3.00 (4 ratings)
One challenge of traditional data visualizations is that they are static and bounded by limited physical or pixel space. Interactive visualizations let us move beyond this limitation by adding layers of interaction. Bargava Subramanian and Amit Kapoor teach the art and science of creating interactive data visualizations.
1:30pm–5:00pm Tuesday, December 5, 2017
Location: 321/322
Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
Vartika Singh and Jeffrey Shmain walk you through various approaches using the machine learning algorithms available in Spark ML to understand and decipher meaningful patterns in real-world data. Vartika and Jeff also demonstrate how to use open source deep learning frameworks with Spark to solve classification problems on image and text datasets.
1:30pm–5:00pm Tuesday, December 5, 2017
Location: 328/329
Tim Seears (Think Big, a Teradata company), Karthik Bharadwaj Thirumalai (Teradata)
Average rating: 1.17 (6 ratings)
Tim Seears and Karthik Bharadwaj Thirumalai explain how to apply deep learning to improve consumer recommendations by training neural nets to learn categories of interest using embeddings. They then demonstrate how to extend this with WALS matrix factorization to achieve wide and deep learning, an approach that is now used in production for the Google Play Store.
9:45am–9:55am Wednesday, December 6, 2017
Location: Hall 404AXF
Ben Lorica (O'Reilly Media)
Average rating: 4.00 (5 ratings)
Machine learning models are increasingly widely deployed. Ben Lorica explains how to guard against flaws and failures in your machine learning deployments.
11:15am–11:55am Wednesday, December 6, 2017
Location: Summit 1
Average rating: 1.25 (4 ratings)
In the current Agile business environment, where developers are required to experiment with multiple ideas and react to various situations, cloud-native development is the way to go. Harjinder Mistry and Bargava Subramanian explain how to design and build a microservices-based cloud-native machine learning application.
11:15am–11:55am Wednesday, December 6, 2017
Location: Summit 2
Wolff Dobson (Google)
Average rating: 3.50 (2 ratings)
TensorFlow, the world's most popular machine learning framework, is fast, flexible, and production ready. Wolff Dobson, representing the Google Brain team, shares the latest developments in TensorFlow, including tensor processing units (TPUs), distributed training, new APIs and models, and mobile features. Join in to learn what's in store for TensorFlow and how ML can change your business.
11:15am–11:55am Wednesday, December 6, 2017
Location: 323
Paco Nathan (derwen.ai)
Average rating: 5.00 (1 rating)
Human-in-the-loop (HITL) is an approach that has been used for simulation, training, UX mockups, and more. More recently, it has emerged as a design pattern for managing teams working with machine learning (ML). A variant of semi-supervised learning called active learning allows for mostly automated processes based on ML, where exceptions get referred to human experts.
12:05pm–12:45pm Wednesday, December 6, 2017
Location: Summit 1
Jared Lander (Lander Analytics)
Average rating: 3.33 (3 ratings)
One common (but false) knock against R is that it doesn't scale well. Jared Lander shows how to use R in a performant manner, in terms of both speed and data size, and offers an overview of packages for running R at scale.
12:05pm–12:45pm Wednesday, December 6, 2017
Location: Summit 2
Danielle Dean (Microsoft), Wee Hyong Tok (Microsoft)
Average rating: 3.25 (4 ratings)
Transfer learning enables you to use pretrained deep neural networks (e.g., AlexNet, ResNet, and Inception V3) and adapt them for custom image classification tasks. Danielle Dean and Wee Hyong Tok walk you through the basics of transfer learning and demonstrate how you can use the technique to bootstrap the building of custom image classifiers.
1:45pm–2:25pm Wednesday, December 6, 2017
Location: Summit 1
Aki Ariga (Cloudera)
Average rating: 3.00 (1 rating)
Aki Ariga explains how to put your machine learning model into production, discusses common issues and obstacles you may encounter, and shares best practices and typical architecture patterns for deploying ML models, with example designs from the Hadoop and Spark ecosystem using Cloudera Data Science Workbench.
1:45pm–2:25pm Wednesday, December 6, 2017
Location: Summit 2
Bargava Subramanian and Harjinder Mistry share data engineering and machine learning strategies for building an efficient real-time recommendation engine when the transaction data is both big and wide. They also outline a novel way of generating frequent patterns using collaborative filtering and matrix factorization on Apache Spark and serving them using Elasticsearch in the cloud.
2:35pm–3:15pm Wednesday, December 6, 2017
Location: Summit 1
Wai Yau (Zendesk), Jeffrey Theobald (Zendesk)
Average rating: 4.75 (8 ratings)
Simply building a successful machine learning model is extremely challenging, and just as much effort is needed to turn that model into a customer-facing product. Drawing on their experience working on Zendesk's article recommendation product, Wai Yau and Jeffrey Theobald discuss design challenges and real-world problems you may encounter when building a machine learning product at scale.
2:35pm–3:15pm Wednesday, December 6, 2017
Location: Summit 2
Natalino Busa (DBS), Matteo Pelati (DataRobot)
Average rating: 4.00 (3 ratings)
Modern engineering requires machine learning engineers to implement and monitor ETL and machine learning models in production. Natalino Busa shares technologies, techniques, and blueprints for robustly and reliably managing data science and ETL flows from inception to production.
2:35pm–3:15pm Wednesday, December 6, 2017
Location: 323
Philips Prasetyo (Living Analytics Research Centre, Singapore Management University), Ee-Peng Lim (Singapore Management University)
Average rating: 4.00 (2 ratings)
Analyzing talent flow behavior is important for understanding the job preferences and career progression of working individuals. When applied at the workforce population level, talent flow analytics offers insight into talent flow and competition between organizations.
4:15pm–4:55pm Wednesday, December 6, 2017
Location: Summit 1
Holden Karau (Google)
Average rating: 4.50 (6 ratings)
Apache Spark's machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren't available yet. Holden Karau introduces Spark's ML pipelines and explains how to extend them with your own custom algorithms, allowing you to take advantage of Spark's meta-algorithms and existing ML tools.
4:15pm–4:55pm Wednesday, December 6, 2017
Location: Summit 2
Yufeng Guo (Google)
Yufeng Guo demonstrates how to use TensorFlow to easily combine linear regression models and deep neural networks into a single machine learning model that has the benefits of both. You'll also learn what is happening under the hood and how you can use this model for your own datasets.
5:05pm–5:45pm Wednesday, December 6, 2017
Location: Summit 1
Peng Meng (Intel)
Average rating: 1.00 (1 rating)
Apache Spark ML and MLlib are hugely popular in the big data ecosystem, and Intel has been deeply involved in Spark from a very early stage. Peng Meng outlines the methodology behind Intel's work on Spark ML and MLlib optimization and shares a case study on boosting the performance of Spark MLlib ALS by 60x in JD.com's production environment.
5:05pm–5:45pm Wednesday, December 6, 2017
Location: Summit 2
Yiqun Hu (Singapore Power)
Average rating: 4.83 (6 ratings)
Energy usage is a significant part of daily life, so the ability to monitor this use offers a number of benefits, from cost savings to improved safety. A key challenge is the lack of labeled data. Yiqun Hu shares a new solution: an RNN-based network trained to learn good features from unlabeled data.
5:05pm–5:45pm Wednesday, December 6, 2017
Location: 321/322
Anand Chitipothu (rorodata)
Average rating: 5.00 (2 ratings)
There are many challenges to deploying machine learning models in production, including managing multiple versions of models, maintaining staging and production models, keeping track of model performance, logging, and scaling. Anand Chitipothu explores the tools, techniques, and system architecture of a cloud platform built to solve these challenges and the new opportunities it opens up.
10:00am–10:20am Thursday, December 7, 2017
Location: Hall 404AXF
Kira Radinsky (eBay | Technion)
Average rating: 5.00 (8 ratings)
Kira Radinsky offers an overview of a system that jointly mines 10 years of nationwide medical records of more than 1.5 million people and extracts medical knowledge from Wikipedia to provide guidance about drug repurposing, the process of applying known drugs in new ways to treat diseases.
11:15am–11:55am Thursday, December 7, 2017
Location: Summit 1
Paco Nathan (derwen.ai)
Average rating: 4.60 (5 ratings)
Paco Nathan explains how O'Reilly employs AI, from the obvious (chatbots, case studies about other firms) to the less so (using AI to show the structure of content in detail, enhance search and recommendations, and guide editors for gap analysis, assessment, pathing, etc.). Approaches include vector embedding search, summarization, TDA for content gap analysis, and speech-to-text to index video.
11:15am–11:55am Thursday, December 7, 2017
Location: Summit 2
Wee Hyong Tok (Microsoft), Danielle Dean (Microsoft)
Deep neural networks are responsible for many advances in natural language processing, computer vision, speech recognition, and forecasting. Danielle Dean and Wee Hyong Tok illustrate how cloud computing has been leveraged for exploration, programmatic training, real-time scoring, and batch scoring of deep learning models for projects in healthcare, manufacturing, and utilities.
12:05pm–12:45pm Thursday, December 7, 2017
Location: Summit 1
Graham Gear (Cloudera)
Average rating: 5.00 (3 ratings)
How can we drive more data pipelines, advanced analytics, and machine learning models into production? How can we do this both faster and more reliably? Graham Gear draws on real-world processes and systems to explain how it's possible to apply continuous delivery techniques to advanced analytics, realizing business value earlier and more safely.
12:05pm–12:45pm Thursday, December 7, 2017
Location: Summit 2
Xianyan Jia (Intel), Zhenhua Wang (JD.com)
Xianyan Jia and Zhenhua Wang explore deep learning applications built successfully with BigDL. They also teach you how to develop fast prototypes with BigDL's off-the-shelf deep learning toolkit and build end-to-end deep learning applications with flexibility and scalability using BigDL on Spark.
1:45pm–2:25pm Thursday, December 7, 2017
Location: Summit 1
Teresa Tung (Accenture Labs), Ishmeet Grewal (Accenture Labs), Jurgen Weichenberger (Accenture Analytics)
Average rating: 4.50 (2 ratings)
As Accenture scaled to millions of predictive models, it required automation to ensure accuracy, prevent false alarms, and preserve trust. Teresa Tung, Ishmeet Grewal, and Jurgen Weichenberger explain how Accenture implemented a DevOps process for analytical models that's akin to software development, guaranteeing analytics modeling at scale and even in noncloud environments at the edge.
1:45pm–2:25pm Thursday, December 7, 2017
Location: Summit 2
YongLiang Xu (StarHub), Masatake Iwasaki (NTT DATA)
Average rating: 5.00 (1 rating)
SmartHub and NTT DATA have embarked on a partnership to design a next-generation architecture to power the data products that will help generate new insights. YongLiang Xu and Masatake Iwasaki explain how deep learning and other analytics models can coexist on the same platform to address opportunities and challenges in initiatives such as smart cities.
2:35pm–3:15pm Thursday, December 7, 2017
Location: Summit 1
Kaz Sato (Google)
Average rating: 4.00 (1 rating)
BigQuery is Google's fully managed, petabyte-scale data warehouse. Its user-defined functions enable "smart" queries with the power of machine learning, such as similarity searches or recommendations on images or documents with feature vectors and neural network prediction. Kazunori Sato demonstrates how BigQuery and TensorFlow together enable a powerful "data warehouse + ML" solution.
2:35pm–3:15pm Thursday, December 7, 2017
Location: Summit 2
Chris Hausler (Zendesk), Arwen Griffioen (Zendesk)
Average rating: 4.62 (8 ratings)
Chris Hausler and Arwen Griffioen discuss Zendesk's experience with deep learning, using the example of Answer Bot, a question-answering system that resolves support tickets without agent intervention. They cover the benefits Zendesk has already seen and challenges encountered along the way.
4:15pm–4:55pm Thursday, December 7, 2017
Location: Summit 1
Prateek Nagaria (The Data Team)
Most data scientists use traditional methods of forecasting, such as exponential smoothing or ARIMA, to forecast product demand. However, when the product experiences several periods of zero demand, approaches such as Croston's method may provide better accuracy than these traditional methods. Prateek Nagaria compares traditional and Croston methods in R on intermittent demand time series.
4:15pm–4:55pm Thursday, December 7, 2017
Location: Summit 2
Adam Gibson (Skymind)
Average rating: 4.75 (4 ratings)
Adam Gibson demonstrates how to use variational autoencoders to automatically label time series location data. You'll explore the challenge of imbalanced classes and anomaly detection, learn how to use deep learning for automatic labeling (and its pitfalls), and discover how you can deploy these techniques in your organization.
5:05pm–5:45pm Thursday, December 7, 2017
Location: Summit 1
Le Zhang (Microsoft), Graham Williams (Microsoft)
Average rating: 3.00 (1 rating)
R has long been criticized for its limitations in scalable data analytics. What's needed is an R-centric paradigm that enables data scientists to elastically harness cloud resources of manifold computing capability for large-scale data analytics. Le Zhang and Graham Williams demonstrate how to operationalize an end-to-end (E2E) enterprise-grade pipeline for big data analytics, all within R.
5:05pm–5:45pm Thursday, December 7, 2017
Location: Summit 2
Markus Kirchberg (Wismut Labs Pte. Ltd.)
As the share of digital payments increases, so does payment fraud, which almost tripled between 2013 and 2016. Markus Kirchberg explains how recent advances in AI and machine learning, decision sciences, and network sciences are driving the development of next-generation payment fraud capabilities for fraud scoring, deceptive merchant detection, and merchant compromise detection.