Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Singapore

Data Science & Machine Learning

Ben Lorica, Strata Conference Chair

If you're in data, you need to understand machine learning

Using algorithms that learn iteratively, machine learning lets you discover hidden insights in your data. It's a simple idea with phenomenal impact and sophisticated use cases like recommenders, text mining, real-time analytics, large-scale anomaly detection, and business forecasting.

Strata is a unique opportunity to get up to speed quickly on the latest in machine and deep learning. Take a look at the machine learning sessions available to you at Strata.

Tuesday December 5: Tutorials (Gold & Silver passes)
Location: 310/311 Location: 321/322 Location: 328/329
12:30pm | Location: TBD
Lunch
Wednesday December 6: Keynotes & Sessions (Gold, Silver & Bronze passes)
Location: Summit 1 Location: Summit 2
12:45pm
Wednesday Topic Tables at Lunch
5:45pm | Location: Sponsor Pavilion
Sponsor Pavilion Reception
Thursday December 7: Keynotes & Sessions (Gold, Silver & Bronze passes)
Location: Summit 1 Location: Summit 2
12:45pm
Thursday Topic Tables at Lunch
9:00am–12:30pm Tuesday, December 5, 2017
Location: 321/322 | Level: Intermediate
Jared Lander (Lander Analytics)
Modern statistics has become almost synonymous with machine learning, a collection of techniques that exploit today's computing power. This course focuses on the available methods for implementing machine learning algorithms in R and examines some of the underlying theory, covering the elastic net, boosted trees, and cross-validation.
9:00am–12:30pm Tuesday, December 5, 2017
Location: 328/329 | Level: Intermediate
Yufeng Guo (Google)
We will walk you through training and deploying a machine-learning system using TensorFlow, a popular open source ML library. Starting from conceptual overviews, we will build all the way up to complex classifiers. You’ll gain insight into deep learning and how it can apply to complex problems in science and industry.
1:30pm–5:00pm Tuesday, December 5, 2017
Location: 310/311 | Level: Beginner
Bargava Subramanian (Independent), Amit Kapoor (narrativeVIZ Consulting)
One challenge with traditional data visualizations is that they are static and bounded by limited physical or pixel space. Interaction lets us move beyond this limitation by adding layers of interaction. Bargava Subramanian and Amit Kapoor teach the art and science of creating interactive data visualizations.
1:30pm–5:00pm Tuesday, December 5, 2017
Location: 321/322 | Level: Intermediate
Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
We walk you through the machine learning algorithms available in Spark ML for finding and deciphering meaningful patterns in real-world data. Along with discussing the common problems encountered as data and model sizes scale, we will also use a few open source deep learning frameworks to run classification problems on image and text datasets with Spark.
1:30pm–5:00pm Tuesday, December 5, 2017
Location: 328/329 | Level: Intermediate
Tim Seears (Think Big, a Teradata company), David Mueller (Teradata)
Tim Seears and David Mueller explain how to apply deep learning to improve consumer recommendations by training neural nets to learn categories of interest using embeddings. They also demonstrate how to extend this with WALS matrix factorization to achieve wide and deep learning, a process that is now used in production for the Google Play Store.
10:00am–10:15am Wednesday, December 6, 2017
Location: Hall 404AXF
Kira Radinsky (eBay | Technion)
We jointly harness large-scale electronic health records and feasible conceptual links among concepts drawn from Wikipedia to provide guidance about drug repurposing, the process of applying known drugs in new ways to treat diseases. We claim that researchers decide on exploratory targets for repurposing based on trends in research and observations on small numbers of cases, leading to...
11:15am–11:55am Wednesday, December 6, 2017
Location: Summit 1 | Level: Beginner
Harjinder Mistry (Red Hat), Bargava Subramanian (Independent)
In the current Agile business environment, where developers are required to experiment with multiple ideas and react to various situations, cloud-native development is the way to go. Harjinder Mistry and Bargava Subramanian explain how to design and build a microservices-based cloud-native machine learning application.
11:15am–11:55am Wednesday, December 6, 2017
Location: Summit 2 | Level: Intermediate
Wolff Dobson (Google)
TensorFlow, the world's most popular machine learning framework, is fast, flexible, and production ready. Wolff Dobson, representing the Google Brain team, shares the latest developments in TensorFlow, including tensor processing units (TPUs), distributed training, new APIs and models, and mobile features. Join in to learn what's in store for TensorFlow and how ML can change your business.
12:05pm–12:45pm Wednesday, December 6, 2017
Location: Summit 1 | Level: Intermediate
Jared Lander (Lander Analytics)
One common, but false, knock against R is that it doesn't scale well. This talk shows how to use R in a performant manner, in terms of both speed and data size, and introduces packages for running R at scale.
12:05pm–12:45pm Wednesday, December 6, 2017
Location: Summit 2 | Level: Intermediate
Danielle Dean (Microsoft), Wee Hyong Tok (Microsoft)
Transfer learning enables you to use pretrained deep neural networks (e.g., AlexNet, ResNet, and Inception V3) and adapt them for custom image classification tasks. Danielle Dean and Wee Hyong Tok walk you through the basics of transfer learning and demonstrate how you can use the technique to bootstrap the building of custom image classifiers.
1:45pm–2:25pm Wednesday, December 6, 2017
Location: Summit 1 | Level: Intermediate
Aki Ariga (Cloudera)
Aki Ariga explains how to put your machine learning model into production, discusses common issues and obstacles you may encounter, and shares best practices and typical architecture patterns for deploying ML models, with example designs from the Hadoop and Spark ecosystem using Cloudera Data Science Workbench.
1:45pm–2:25pm Wednesday, December 6, 2017
Location: Summit 2 | Level: Intermediate
Bargava Subramanian (Independent), Harjinder Mistry (Red Hat)
Bargava Subramanian and Harjinder Mistry share data engineering and machine learning strategies for building an efficient real-time recommendation engine when the transaction data is both big and wide. They also outline a novel way of generating frequent patterns using collaborative filtering and matrix factorization on Apache Spark and serving them using Elasticsearch in the cloud.
2:35pm–3:15pm Wednesday, December 6, 2017
Location: Summit 1 | Level: Beginner
Wai Yau (Zendesk), Jeffrey Theobald (Zendesk)
Building a successful machine learning product is extremely challenging. It is easy to assume that building the model is most of the work. However, just as much effort is needed to turn that model into a customer-facing product. We'll delve into the various design challenges and real-world problems encountered when building a machine learning product at scale.
2:35pm–3:15pm Wednesday, December 6, 2017
Location: Summit 2 | Level: Intermediate
Siddha Ganju (Deep Vision)
We've come a long way since the advent of the IoT to a network of almost 30 billion IoT devices that include sensors and cameras. The data they gather and transmit is becoming increasingly complex. Siddha Ganju explains how deep learning can revolutionize IoT applications to recognize half a million faces at international airports using existing airport cameras.
4:15pm–4:55pm Wednesday, December 6, 2017
Location: Summit 1 | Level: Intermediate
Holden Karau (IBM)
Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. This talk introduces Spark’s ML pipelines and then looks at how to extend them with your own custom algorithms. Adding your own pipeline stages allows you to use Spark's meta-algorithms and existing ML tools.
4:15pm–4:55pm Wednesday, December 6, 2017
Location: Summit 2 | Level: Intermediate
Yufeng Guo (Google)
Learn how to use TensorFlow to easily combine linear regression models and deep neural networks into a single machine learning model that has the benefits of both. You will also gain intuition about what is happening under the hood and learn how you can use this model for your own datasets.
5:05pm–5:45pm Wednesday, December 6, 2017
Location: Summit 1 | Level: Intermediate
Peng Meng (Intel)
Apache Spark ML and MLlib are hugely popular in the big data ecosystem, and Intel has been deeply involved in Spark from a very early stage. Peng Meng outlines the methodology behind Intel's work on Spark ML and MLlib optimization and shares a case study on boosting the performance of Spark MLlib ALS by 60x in JD.com’s production environment.
5:05pm–5:45pm Wednesday, December 6, 2017
Location: Summit 2 | Level: Intermediate
Yiqun Hu (Singapore Power)
Energy disaggregation is very useful for energy-related applications such as energy monitoring, but only a small amount of labeled data is available because labeling is very expensive. Yiqun Hu shares a new solution using two deep networks: the first, an RNN-based network, extracts good features from unlabeled data; the second uses these features to disaggregate target appliances.
11:15am–11:55am Thursday, December 7, 2017
Location: Summit 1 | Level: Beginner
Paco Nathan (O'Reilly Media)
Paco Nathan explains how O'Reilly employs AI, from the obvious (chatbots, case studies about other firms) to the less so (using AI to show the structure of content in detail, enhance search and recommendations, and guide editors for gap analysis, assessment, pathing, etc.). Approaches include vector embedding search, summarization, TDA for content gap analysis, and speech-to-text to index video.
11:15am–11:55am Thursday, December 7, 2017
Location: Summit 2 | Level: Intermediate
Wee Hyong Tok (Microsoft), Danielle Dean (Microsoft)
Deep neural networks are responsible for many advances in natural language processing, computer vision, speech recognition, and forecasting. Danielle Dean and Wee Hyong Tok illustrate how cloud computing has been leveraged for exploration, programmatic training, real-time scoring, and batch scoring of deep learning models for projects in healthcare, manufacturing, and utilities.
12:05pm–12:45pm Thursday, December 7, 2017
Location: Summit 1 | Level: Intermediate
Graham Gear (Cloudera)
How can we drive more data pipelines, advanced analytics, and machine learning models into production? How can we do this both faster and more reliably? Graham Gear draws on real-world processes and systems to explain how it's possible to apply continuous delivery techniques to advanced analytics, realizing business value earlier and more safely.
12:05pm–12:45pm Thursday, December 7, 2017
Location: Summit 2 | Level: Beginner
Xianyan Jia (Intel), Zhenhua Wang (JD.com)
Xianyan Jia and Zhenhua Wang explore deep learning applications built successfully with BigDL and teach you how to develop fast prototypes with BigDL's off-the-shelf deep learning toolkit and build end-to-end deep learning applications with flexibility and scalability using BigDL on Spark.
1:45pm–2:25pm Thursday, December 7, 2017
Location: Summit 1 | Level: Intermediate
Teresa Tung (Accenture Labs), Ishmeet Grewal (Accenture Technology Labs), Jurgen Weichenberger (Accenture Analytics)
As Accenture scaled to millions of predictive models, it required automation to ensure accuracy, prevent false alarms, and preserve trust. Teresa Tung, Ishmeet Grewal, and Jurgen Weichenberger explain how Accenture implemented a DevOps process for analytical models that's akin to software development, guaranteeing analytics modeling at scale and even in non-cloud environments at the edge.
1:45pm–2:25pm Thursday, December 7, 2017
Location: Summit 2 | Level: Beginner
YongLiang Xu (StarHub), Masaru Dobashi (NTT Data Corp.)
SmartHub and NTT DATA have embarked on a partnership to design next-generation architecture to power the data products that will help generate new insights. YongLiang Xu and Masaru Dobashi explain how deep learning and other analytics models coexist within the same platform to address issues relating to smart cities.
2:35pm–3:15pm Thursday, December 7, 2017
Location: Summit 1 | Level: Intermediate
Kazunori Sato (Google)
BigQuery is Google's fully managed, petabyte-scale data warehouse. Its user-defined functions enable "smart" queries with the power of machine learning, such as similarity search or recommendation on images or documents using feature vectors and neural network prediction. In this session we will see how BigQuery and TensorFlow enable a powerful "data warehouse + ML" solution.
2:35pm–3:15pm Thursday, December 7, 2017
Location: Summit 2 | Level: Intermediate
Chris Hausler (Zendesk), Arwen Griffioen (Zendesk)
Deep learning is presently the coolest kid on the machine learning block, but few companies are using this technology in a production environment. Zendesk uses deep learning to power Answer Bot, a question answering system that resolves support tickets without agent intervention. In this session we’ll share our descent into deep learning and the challenges and benefits we’ve seen along the way.
4:15pm–4:55pm Thursday, December 7, 2017
Location: Summit 1 | Level: Beginner
Prateek Nagaria (The Data Team)
Most data scientists use traditional forecasting methods, such as exponential smoothing or ARIMA, to forecast product demand. However, when a product experiences several periods of zero demand, approaches such as Croston's method may provide better accuracy than these traditional methods. Prateek Nagaria compares traditional and Croston methods in R on intermittent demand time series.
5:05pm–5:45pm Thursday, December 7, 2017
Location: Summit 2 | Level: Intermediate
Markus Kirchberg (Wismut Labs Pte. Ltd.)
As the share of digital payments increases, so does payment fraud, which almost tripled between 2013 and 2016. Markus Kirchberg explains how recent advances in AI and machine learning, decision sciences, and network sciences are driving the development of next-generation payment fraud capabilities for fraud scoring, deceptive merchant detection, and merchant compromise detection.