Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Singapore

Monday, 12/04/2017

8:30am

8:30am–9:00am Monday, 12/04/2017
Location: Foyer 5
Coffee Break (30m)

9:00am

9:00am–5:00pm Monday, 12/04/2017
Big data and the cloud
Location: 335
Jesse Anderson (Big Data Institute)
To handle real-time big data, you need to solve two difficult problems: how do you ingest that much data, and how will you process that much data? Jesse Anderson explores the latest real-time frameworks (both open source and managed cloud services), discusses the leading cloud providers, and explains how to choose the right one for your company. Read more.
9:00am–5:00pm Monday, 12/04/2017
Robert Schroll (The Data Incubator)
Average rating: *****
(5.00, 1 rating)
Robert Schroll demonstrates TensorFlow's capabilities through its Python interface and explores TFLearn, a high-level deep learning library built on TensorFlow. Join in to learn how to use TFLearn and TensorFlow to build machine learning models on real-world data. Read more.

10:30am

10:30am–11:00am Monday, 12/04/2017
Location: Foyer 5
Morning break (30m)

12:30pm

12:30pm–1:30pm Monday, 12/04/2017
Location: Summit 1 & 2
Lunch (1h)

3:00pm

3:00pm–3:30pm Monday, 12/04/2017
Location: Foyer 5
Afternoon break (30m)

Tuesday, 12/05/2017

8:30am

8:30am–9:00am Tuesday, 12/05/2017
Location: Foyer 3 & 5
Coffee Break (30m)

9:00am

9:00am–12:30pm Tuesday, 12/05/2017
Big data and the cloud
Location: 308/309 Level: Intermediate
Vinithra Varadharajan (Cloudera), Philip Langdale (Cloudera), Jason Wang (Cloudera), Fahd Siddiqui (Cloudera)
Average rating: **...
(2.80, 5 ratings)
Vinithra Varadharajan, Philip Langdale, Jason Wang, and Fahd Siddiqui lead a deep dive into running data engineering workloads in a managed service capacity in the public cloud, highlighting cloud infrastructure best practices and illustrating how data engineering workloads interoperate with data analytic engines. Read more.
9:00am–12:30pm Tuesday, 12/05/2017
Becoming a data-centric company, Strata Business Summit
Location: 310/311 Level: Non-technical
John Akred (Silicon Valley Data Science)
Average rating: *****
(5.00, 1 rating)
Big data, AI, and data science have great potential for accelerating business, but how do you reconcile business opportunity with the sea of possible technologies? Data should serve the strategic imperatives of a business—those aspirations that will define an organization’s future vision. John Akred explains how to create a modern data strategy that powers data-driven business. Read more.
9:00am–12:30pm Tuesday, 12/05/2017
Data science and advanced analytics, Machine Learning
Location: 321/322 Level: Intermediate
Jared Lander (Lander Analytics)
Modern statistics has become almost synonymous with machine learning—a collection of techniques that utilize today's incredible computing power. Jared Lander walks you through the available methods for implementing machine learning algorithms in R and explores underlying theories such as the elastic net, boosted trees, and cross-validation. Read more.
9:00am–12:30pm Tuesday, 12/05/2017
Data science and advanced analytics, Machine Learning
Location: 328/329 Level: Intermediate
Yufeng Guo (Google)
Average rating: ***..
(3.07, 14 ratings)
Yufeng Guo walks you through training and deploying a machine learning system using TensorFlow, a popular open source library. Yufeng takes you from a conceptual overview all the way to building complex classifiers and explains how you can apply deep learning to complex problems in science and industry. Read more.
9:00am–12:30pm Tuesday, 12/05/2017
Location: 323
Alistair Croll (Solve For Interesting), Kyungtaak Noh (SK Telecom), Mike Prorock (mesur.io), Hugo Sheng (Qlik), Neil Hirano (Dotz), Leandro Andrade (Dotz), Praveen Deorani (Holmusk), Ted Malaska (Blizzard Entertainment), Mike Koelemay (Sikorsky Aircraft, Lockheed Martin)
In a series of half-hour talks aimed at a business audience, you’ll hear from household brands and global companies as they explain the challenges they wanted to tackle, the approaches they took, and the benefits—and drawbacks—of their solutions. If you want practical insights about applied data, look no further. Read more.

10:30am

10:30am–11:00am Tuesday, 12/05/2017
Location: Foyer 3 & 5
Morning break (30m)

12:30pm

12:30pm–1:30pm Tuesday, 12/05/2017
Location: Summit 1 & 2
Lunch (1h)

1:30pm

1:30pm–5:00pm Tuesday, 12/05/2017
Data engineering and architecture
Location: 308/309 Level: Intermediate
Jonathan Seidman (Cloudera), Ted Malaska (Blizzard Entertainment)
Average rating: ****.
(4.67, 3 ratings)
Using Customer 360 and the IoT as examples, Jonathan Seidman and Ted Malaska explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics. Read more.
1:30pm–5:00pm Tuesday, 12/05/2017
Design, UX, visualization, and VR, Machine Learning
Location: 310/311 Level: Beginner
Bargava Subramanian (Independent), Amit Kapoor (narrativeVIZ Consulting)
Average rating: ***..
(3.67, 3 ratings)
One of the challenges of traditional data visualizations is that they are static and bounded by limited physical/pixel space. Interactive visualizations allow us to move beyond this limitation by adding layers of interaction. Bargava Subramanian and Amit Kapoor teach the art and science of creating interactive data visualizations. Read more.
1:30pm–5:00pm Tuesday, 12/05/2017
Machine Learning, Spark and beyond
Location: 321/322 Level: Intermediate
Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
Vartika Singh and Jeffrey Shmain walk you through various approaches using the machine learning algorithms available in Spark ML to understand and decipher meaningful patterns in real-world data. Vartika and Jeff also demonstrate how to leverage open source deep learning frameworks to run classification problems on image and text datasets leveraging Spark. Read more.
1:30pm–5:00pm Tuesday, 12/05/2017
Data science and advanced analytics, Machine Learning
Location: 328/329 Level: Intermediate
Tim Seears (Think Big, a Teradata company), Karthik Bharadwaj Thirumalai (Teradata)
Average rating: *....
(1.00, 4 ratings)
Tim Seears and Karthik Bharadwaj Thirumalai explain how to apply deep learning to improve consumer recommendations by training neural nets to learn categories of interest using embeddings. They then demonstrate how to extend this with WALS matrix factorization to achieve wide and deep learning—a process which is now used in production for the Google Play Store. Read more.
1:30pm–5:00pm Tuesday, 12/05/2017
Location: 323
Alistair Croll (Solve For Interesting), Clifton Phua (NCS Group), Mark Donsky (Cloudera), Syed Rafice (Cloudera), Victor Chua (StarHub Ltd), Carme Artigas (Synergic Partners), Zhihao Lin (Teralytics)
The modern city is awash in data. Cheap sensors on cars, roads, and people give us a real-time understanding of traffic. We can track pollution, temperature, and climate with unerring precision. Satellite photographs reveal shade cover, property values, and building development. Read more.

3:00pm

3:00pm–3:30pm Tuesday, 12/05/2017
Location: Foyer 3 & 5
Afternoon break (30m)

Wednesday, 12/06/2017

8:00am

8:00am–8:15am Wednesday, 12/06/2017
Location: Hall 404 Foyer
Coffee break sponsored by TigerGraph (15m)

8:15am

8:15am–8:45am Wednesday, 12/06/2017
Location: Hall 404 Foyer
Ready, set, network! Meet fellow attendees who are looking to connect at Strata. We'll gather before Wednesday keynotes to host an informal speed networking event. Be sure to bring your business cards and have fun. Read more.

8:50am

8:50am–9:00am Wednesday, 12/06/2017
Location: Hall 404AXF
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the first day of keynotes. Read more.

9:00am

9:00am–9:15am Wednesday, 12/06/2017
Location: Hall 404AXF
Melanie Johnston-Hollitt (Victoria University of Wellington)
Average rating: *****
(5.00, 3 ratings)
Keynote with Melanie Johnston-Hollitt Read more.

9:15am

9:15am–9:30am Wednesday, 12/06/2017
Location: Hall 404AXF
Mick Hollison (Cloudera), Cesar Delgado (Apple)
Average rating: *****
(5.00, 1 rating)
Twenty years ago, a company implored us to “think different” about personal computers. Today, Apple continues to live and breathe that legacy. It’s evident in the machine learning and analytics architectures that power many of the company’s most innovative applications. Cesar Delgado joins Mick Hollison to discuss how Apple is using its big data stack and expertise to solve non-data problems. Read more.

9:30am

9:30am–9:45am Wednesday, 12/06/2017
Location: Hall 404AXF
Steve Leonard (SGInnovate)
Average rating: *****
(5.00, 1 rating)
Steve Leonard details how Singapore is bringing together ambitious and capable individuals and teams to imagine, start, build, and scale technology that can solve the world’s toughest challenges. Read more.

9:45am

9:45am–9:55am Wednesday, 12/06/2017
Location: Hall 404AXF
Ben Lorica (O'Reilly Media)
Machine learning models are becoming increasingly widely used and deployed. Ben Lorica explains how to guard against flaws and failures in your machine learning deployments. Read more.

9:55am

9:55am–10:00am Wednesday, 12/06/2017
Location: Hall 404AXF
Felipe Hoffa (Google)
Average rating: *****
(5.00, 1 rating)
Organizations waste hours on endless discussions, and people lose sleep over internet debates. Can big data change this? Google Cloud is here to help. Felipe Hoffa explains that solid data-based conclusions are possible when stakeholders have easy access to analyze all relevant data. Read more.

10:00am

10:00am–10:20am Wednesday, 12/06/2017
Location: Hall 404AXF
Joshua Bloom (GE Digital)
Average rating: *****
(5.00, 1 rating)
The ongoing digitization of the industrial-scale machines that power and enable human activity is itself a major global transformation. Joshua Bloom explains why the real revolution—in efficiencies and in improved and saved lives—will happen when machine learning automation and insights are properly coupled to the complex systems of industrial data. Read more.

10:20am

10:20am–10:40am Wednesday, 12/06/2017
Location: Hall 404AXF
Keynote by Bruno Fernandez-Ruiz Read more.

10:45am

10:45am–11:15am Wednesday, 12/06/2017
Location: Sponsor Pavilion, Concourse 1-4
Morning break sponsored by Google (30m)

11:15am

11:15am–11:55am Wednesday, 12/06/2017
Data engineering and architecture
Location: 308/309 Level: Intermediate
Ted Malaska (Blizzard Entertainment)
Average rating: ****.
(4.80, 5 ratings)
Ted Malaska shares the top five mistakes that no one talks about when you start writing your streaming app along with the practices you'll inevitably need to learn along the way. Read more.
11:15am–11:55am Wednesday, 12/06/2017
Data engineering and architecture
Location: 310/311 Level: Intermediate
Average rating: ***..
(3.00, 1 rating)
Neelesh Srinivas Salian offers an overview of the data platform used by data scientists at Stitch Fix, based on the Spark ecosystem. Neelesh explains the development process and shares some lessons learned along the way. Read more.
11:15am–11:55am Wednesday, 12/06/2017
Becoming a data-centric company, Strata Business Summit
Location: 321/322 Level: Non-technical
John Akred (Silicon Valley Data Science), Mark Hunter (Sainsburys Bank)
Deploying machine learning in business requires far more than just selecting an algorithm. You need the right architecture, tools, and team organization to drive your agenda successfully. John Akred and Mark Hunter share practical advice on the technical and human sides of machine learning, based on experience preparing Sainsbury’s for its ML-enabled future. Read more.
11:15am–11:55am Wednesday, 12/06/2017
Strata Business Summit
Location: 328/329
Sachin Chitturu (McKinsey & Company)
After decades of extravagant promises, artificial intelligence is finally starting to deliver real-life benefits to early adopters. However, we're still early in the cycle of adoption. Shilpa Aggarwal explains where investment is going, patterns of AI adoption, and how the value potential of AI across sectors and business functions is beginning to emerge in Asia. Read more.
11:15am–11:55am Wednesday, 12/06/2017
Paco Nathan (O'Reilly Media)
Average rating: *****
(5.00, 1 rating)
Human-in-the-loop is an approach which has been used for simulation, training, UX mockups, etc. A more recent design pattern is emerging for human-in-the-loop (HITL) as a way to manage teams working with machine learning (ML). A variant of semi-supervised learning called _active learning_ allows for mostly automated processes based on ML, where exceptions get referred to human experts. Read more.
11:15am–11:55am Wednesday, 12/06/2017
Data engineering and architecture, Machine Learning
Location: Summit 1 Level: Beginner
Average rating: *....
(1.50, 2 ratings)
In the current Agile business environment, where developers are required to experiment with multiple ideas and react to various situations, cloud-native development is the way to go. Harjinder Mistry and Bargava Subramanian explain how to design and build a microservices-based cloud-native machine learning application. Read more.
11:15am–11:55am Wednesday, 12/06/2017
Data science and advanced analytics, Machine Learning
Location: Summit 2 Level: Intermediate
Wolff Dobson (Google)
Average rating: ****.
(4.00, 1 rating)
TensorFlow, the world's most popular machine learning framework, is fast, flexible, and production ready. Wolff Dobson, representing the Google Brain team, shares the latest developments in TensorFlow, including tensor processing units (TPUs), distributed training, new APIs and models, and mobile features. Join in to learn what's in store for TensorFlow and how ML can change your business. Read more.
11:15am–11:55am Wednesday, 12/06/2017
Sponsored
Location: 334/335
Felipe Hoffa (Google)
Stop worrying about infrastructure; focus on your data and insights. Felipe Hoffa explains how Google Cloud brings easy solutions to previously hard problems. Read more.

12:05pm

12:05pm–12:45pm Wednesday, 12/06/2017
Location: 308/309 Level: Intermediate
Bas Geerdink (ING)
Average rating: ****.
(4.00, 2 ratings)
Bas Geerdink explains why and how ING is becoming more and more data-driven, sharing use cases, architecture, and technology choices along the way. Read more.
12:05pm–12:45pm Wednesday, 12/06/2017
Location: 310/311 Level: Intermediate
Kostas Sakellis (Cloudera), Philip Langdale (Cloudera)
With its scalable data store, elastic compute, and pay-as-you-go cost model, cloud infrastructure is well-suited for large-scale data engineering workloads. Kostas Sakellis explores the latest cloud technologies, focusing on data engineering workloads, cost, security, and ease-of-use implications for data engineers. Read more.
12:05pm–12:45pm Wednesday, 12/06/2017
Design, UX, visualization, and VR, Strata Business Summit
Location: 321/322 Level: Beginner
Isaac Reyes (DataSeer)
Average rating: ****.
(4.00, 1 rating)
Isaac Reyes explores the art and science of data storytelling, covering the essential elements of a good data story, chart design and why it matters, the Gestalt principles of visual perception and how they can be used to tell better stories with data, and how to make over a poor visualization. Read more.
12:05pm–12:45pm Wednesday, 12/06/2017
Jesse Anderson (Big Data Institute)
Average rating: *****
(5.00, 1 rating)
Early project success is predicated on management making sure a data engineering team is ready and has all of the skills needed. Jesse Anderson outlines five of the most common nontechnology reasons why data engineering teams fail. Read more.
12:05pm–12:45pm Wednesday, 12/06/2017
Average rating: ****.
(4.00, 1 rating)
For the first time, messaging apps have surpassed social networks in usage and growth. Mohammed Abdoolcarim shares best practices for designing for AI-based conversational UIs, such as those employed in messaging apps, drawn from work done at Apple, Google, and GoButler. Read more.
12:05pm–12:45pm Wednesday, 12/06/2017
Jared Lander (Lander Analytics)
One common (but false) knock against R is that it doesn't scale well. Jared Lander shows how to use R in a performant manner, in terms of both speed and data size, and offers an overview of packages for running R at scale. Read more.
12:05pm–12:45pm Wednesday, 12/06/2017
Data science and advanced analytics, Machine Learning
Location: Summit 2 Level: Intermediate
Danielle Dean (Microsoft), Wee Hyong Tok (Microsoft)
Average rating: **...
(2.67, 3 ratings)
Transfer learning enables you to use pretrained deep neural networks (e.g., AlexNet, ResNet, and Inception V3) and adapt them for custom image classification tasks. Danielle Dean and Wee Hyong Tok walk you through the basics of transfer learning and demonstrate how you can use the technique to bootstrap the building of custom image classifiers. Read more.
12:05pm–12:45pm Wednesday, 12/06/2017
Sponsored
Location: 334/335
Vira Shanty (Lippo Group)
Vira Shanty explains how the Lippo Group, one of the largest business conglomerates in Indonesia, is integrating data from multiple lines of business into a single big data analytic platform featuring an API layer with subsecond latency and how the company's mantra “deep and fast analytics” is opening new opportunities for improved customer engagement and new revenue streams. Read more.

12:45pm

12:45pm–1:45pm Wednesday, 12/06/2017
Location: Sponsor Pavilion, Concourse 1-4
Looking to network with other attendees during lunch? Topic Table discussions help you connect with people in similar industries or interested in the same topics. Read more.

1:45pm

1:45pm–2:25pm Wednesday, 12/06/2017
Ofir Sharony (MyHeritage)
Average rating: ****.
(4.67, 3 ratings)
What are the most important considerations for shipping billions of daily events to analysis? Ofir Sharony shares MyHeritage's journey to find a reliable and efficient way to achieve real-time analytics. Along the way, Ofir compares several data loading techniques, helping you make better choices when building your next data pipeline. Read more.
1:45pm–2:25pm Wednesday, 12/06/2017
Data engineering and architecture
Location: 310/311 Level: Intermediate
Vickye Jain (ZS Associates), Raghav Sharma (ZS Associates)
Vickye Jain and Raghav Sharma explain how they built a very high-performance data processing platform powered by Spark that balances the considerations of extreme performance, speed of development, and cost of maintenance. Read more.
1:45pm–2:25pm Wednesday, 12/06/2017
Financial technology and data, Strata Business Summit
Location: 321/322 Level: Non-technical
Amit Das (Think Analytics India)
Average rating: ****.
(4.00, 2 ratings)
Access to credit in emerging markets is impeded by issues around identity verification, risk assessment and monitoring, and the costs of underwriting and collections. At the core of it all is a lack of data. Amit Das explains how accessing alternate data, real-time risk monitoring and data access solutions, and smart analytics is changing the lending landscape in India. Read more.
1:45pm–2:25pm Wednesday, 12/06/2017
Becoming a data-centric company, Strata Business Summit
Location: 328/329 Level: Non-technical
Jessica Chen Riolfi (TransferWise)
Average rating: ****.
(4.00, 1 rating)
Data is essential to unlock growth opportunities, and successful companies use it in every decision. Jessica Chen Riolfi explains how to build an organization with decentralized, data-driven decision making that enables teams to focus on the products and features that matter and ultimately unlock exponential growth. Read more.
1:45pm–2:25pm Wednesday, 12/06/2017
Strata Business Summit
Location: 323
Jesse Anderson (Big Data Institute)
We have an explosion of new architectures. Do these new architectures exist because engineers love new things, or is there a good business reason for these changes? In this talk, we will consider these new architectures and the actual business problems they solve. You may find out that your team is far less productive if you don’t move to these architectures. Read more.
1:45pm–2:25pm Wednesday, 12/06/2017
Aki Ariga (Cloudera)
Average rating: ***..
(3.00, 1 rating)
Aki Ariga explains how to put your machine learning model into production, discusses common issues and obstacles you may encounter, and shares best practices and typical architecture patterns for deploying ML models, with example designs from the Hadoop and Spark ecosystem using Cloudera Data Science Workbench. Read more.
1:45pm–2:25pm Wednesday, 12/06/2017
Data science and advanced analytics, Machine Learning
Location: Summit 2 Level: Intermediate
Bargava Subramanian and Harjinder Mistry share data engineering and machine learning strategies for building an efficient real-time recommendation engine when the transaction data is both big and wide. They also outline a novel way of generating frequent patterns using collaborative filtering and matrix factorization on Apache Spark and serving it using Elasticsearch in the cloud. Read more.

2:35pm

2:35pm–3:15pm Wednesday, 12/06/2017
Big data in telecommunications, Data engineering and architecture
Location: 308/309 Level: Intermediate
Yousun Jeong (SK Telecom)
Average rating: *****
(5.00, 1 rating)
Data transfer is one of the most pressing problems for telecom companies, as cost increases in tandem with the growing data requirements. Yousun Jeong details how SKT has dealt with this problem. Read more.
2:35pm–3:15pm Wednesday, 12/06/2017
Data engineering and architecture
Location: 310/311 Level: Intermediate
Mingxi Wu (TigerGraph), Yu Xu (TigerGraph)
Mingxi Wu and Yu Xu offer an overview of TigerGraph, a high-performance enterprise graph data platform that enables businesses to transform structured, semistructured, and unstructured data and massive enterprise data silos into an intelligent interconnected data network, allowing them to uncover the implicit patterns and critical insights to drive business growth. Read more.
2:35pm–3:15pm Wednesday, 12/06/2017
Data case studies, Strata Business Summit
Location: 321/322 Level: Intermediate
Eric Tham (National University of Singapore), Radha Pendyala (Thomson Reuters)
Average rating: *****
(5.00, 1 rating)
Graphical techniques are increasingly being used for big data. These techniques can be broadly classified into the three C's: centrality, clustering, and connectedness. Eric Tham explains how these concepts are applied to supply chain analysis and financial portfolio management. Read more.
2:35pm–3:15pm Wednesday, 12/06/2017
Ricky Barron (InfoStrategy)
To many organizations, big data analytics is still a solution looking for a problem. Ricky Barron shares practical methods for getting the best out of your big data analytics capability and explains why establishing an "insights group" can improve the bottom line, drive performance, optimize processes, and create new data-driven products and solutions. Read more.
2:35pm–3:15pm Wednesday, 12/06/2017
Data science and advanced analytics, Machine Learning
Location: 323 Level: Non-technical
Philips PRASETYO (Living Analytics Research Centre, Singapore Management University), Ee-Peng Lim (Singapore Management University)
Average rating: *****
(5.00, 1 rating)
Analyzing talent flow behavior is important for understanding the job preferences and career progression of working individuals. When analyzed at the workforce population level, talent flow analytics helps yield insights into talent flow and competition among organizations. Read more.
2:35pm–3:15pm Wednesday, 12/06/2017
Data engineering and architecture, Machine Learning
Location: Summit 1 Level: Beginner
Wai Yau (Zendesk), Jeffrey Theobald (Zendesk)
Average rating: *****
(5.00, 2 ratings)
Simply building a successful machine learning product is extremely challenging, and just as much effort is needed to turn that model into a customer-facing product. Drawing on their experience working on Zendesk's article recommendation product, Wai Yau and Jeffrey Theobald discuss design challenges and real-world problems you may encounter when building a machine learning product at scale. Read more.
2:35pm–3:15pm Wednesday, 12/06/2017
Data engineering and architecture, Machine Learning
Location: Summit 2 Level: Intermediate
Natalino Busa (DBS), Matteo Pelati (DataRobot)
Average rating: ****.
(4.50, 2 ratings)
Modern engineering requires machine learning engineers, who are needed to monitor and implement ETL and machine learning models in production. Natalino Busa shares technologies, techniques, and blueprints on how to robustly and reliably manage data science and ETL flows from inception to production. Read more.

3:15pm

3:15pm–4:15pm Wednesday, 12/06/2017
Location: Sponsor Pavilion, Concourse 1-4
Afternoon break (1h)

4:15pm

4:15pm–4:55pm Wednesday, 12/06/2017
Xiaochang Wu (Intel)
Xiaochang Wu explains how to design and implement a real-time processing platform using the Spark Structured Streaming framework to intelligently transform production lines in the manufacturing industry. Read more.
4:15pm–4:55pm Wednesday, 12/06/2017
Big data and the cloud, Data engineering and architecture
Location: 310/311 Level: Beginner
Feng Cheng (Grab), Yanyu Qu (Grab)
Average rating: *****
(5.00, 1 rating)
Grab uses Presto to support operational reporting (batch and near real-time), ad hoc analyses, and its data pipeline. Currently, Grab has 5+ clusters with 100+ instances in production on AWS and serves up to 30K queries per day while supporting more than 200 internal data users. Feng Cheng and Yanyu Qu explain how Grab operationalizes Presto in the cloud and share lessons learned along the way. Read more.
4:15pm–4:55pm Wednesday, 12/06/2017
Becoming a data-centric company, Strata Business Summit
Location: 321/322 Level: Beginner
Benjamin Wright-Jones (Microsoft), Simon Lidberg (Microsoft)
Average rating: ****.
(4.00, 1 rating)
As organizations turn to data-driven strategies, they are also increasingly exploring the creation of a data science or analytic center of excellence (COE). Benjamin Wright-Jones and Simon Lidberg outline the building blocks of a center of excellence and describe the value for organizations embarking on data-driven strategies. Read more.
4:15pm–4:55pm Wednesday, 12/06/2017
Executive Briefing, Spark and beyond, Strata Business Summit
Location: 328/329 Level: Non-technical
John Akred (Silicon Valley Data Science)
AI is white-hot at the moment, but where can it really be used? Developers are usually the first to understand why some technologies cause more excitement than others. John Akred relates this insider knowledge, providing a tour through the hottest emerging data technologies of 2017 to explain why they’re exciting in terms of both new capabilities and the new economies they bring. Read more.
4:15pm–4:55pm Wednesday, 12/06/2017
Shirshanka Das (LinkedIn), Tushar Shanbhag (LinkedIn)
Average rating: *****
(5.00, 1 rating)
LinkedIn houses the most valuable professional data in the world. Protecting the privacy of member data has always been paramount. Shirshanka Das and Tushar Shanbhag outline three foundational building blocks for scalable data management that can meet data compliance regulations: a central metadata system, an integrated data movement framework, and a unified data access layer. Read more.
4:15pm–4:55pm Wednesday, 12/06/2017
Data engineering and architecture, Machine Learning, Spark and beyond
Location: Summit 1 Level: Intermediate
Holden Karau (Google)
Average rating: ****.
(4.25, 4 ratings)
Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. Holden Karau introduces Spark’s ML pipelines and explains how to extend them with your own custom algorithms, allowing you to take advantage of Spark's meta-algorithms and existing ML tools. Read more.
4:15pm–4:55pm Wednesday, 12/06/2017
Big data and the cloud, Machine Learning
Location: Summit 2 Level: Intermediate
Yufeng Guo (Google)
Yufeng Guo demonstrates how to use TensorFlow to easily combine linear regression models and deep neural networks into a single machine learning model that has the benefits of both. You'll also learn what is happening under the hood and how you can use this model for your own datasets. Read more.

5:05pm

5:05pm–5:45pm Wednesday, 12/06/2017
Data case studies, Data engineering and architecture
Location: 308/309 Level: Intermediate
Average rating: ****.
(4.00, 2 ratings)
Andreas Hadimulyono discusses the challenges that Grab is facing with the ever-increasing volume and velocity of its data and shares the company's plans to overcome them. Read more.
5:05pm–5:45pm Wednesday, 12/06/2017
Big data and the cloud, Data engineering and architecture
Location: 310/311 Level: Intermediate
Greg Rahn (Cloudera)
Average rating: ****.
(4.00, 1 rating)
Cloud environments will likely play a key role in your business’s future. Henry Robinson and Greg Rahn explore the workload considerations when evaluating the cloud for analytics and discuss common architectural patterns to optimize price and performance. Read more.
5:05pm–5:45pm Wednesday, 12/06/2017
Data science and advanced analytics, Machine Learning
Location: 321/322 Level: Intermediate
Anand Chitipothu (rorodata)
Average rating: *****
(5.00, 2 ratings)
There are many challenges to deploying machine learning models in production, including managing multiple versions of models, maintaining staging and production models, keeping track of model performance, logging, and scaling. Anand Chitipothu explores the tools, techniques, and system architecture of a cloud platform built to solve these challenges and the new opportunities it opens up. Read more.
5:05pm–5:45pm Wednesday, 12/06/2017
Teresa Tung (Accenture Labs)
Average rating: *****
(5.00, 2 ratings)
A data-driven enterprise maximizes the value of its data. But how do enterprises emerging from technology and organization silos get there? Teresa Tung explains how to create a data-driven enterprise maturity model that spans technology and business requirements and walks you through use cases that bring the model to life. Read more.
5:05pm–5:45pm Wednesday, 12/06/2017
Mark Donsky (Cloudera), Steven Ross (Cloudera)
In May 2018, the General Data Protection Regulation (GDPR) goes into effect for firms doing business in the EU, but many companies aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). Steven Ross and Mark Donsky outline the capabilities your data environment needs to simplify compliance with GDPR and future regulations. Read more.
5:05pm–5:45pm Wednesday, 12/06/2017
Data engineering and architecture, Machine Learning, Spark and beyond
Location: Summit 1 Level: Intermediate
Peng Meng (Intel)
Average rating: *....
(1.00, 1 rating)
Apache Spark ML and MLlib are hugely popular in the big data ecosystem, and Intel has been deeply involved in Spark from a very early stage. Peng Meng outlines the methodology behind Intel's work on Spark ML and MLlib optimization and shares a case study on boosting the performance of Spark MLlib ALS by 60x in JD.com’s production environment. Read more.
5:05pm–5:45pm Wednesday, 12/06/2017
Machine Learning
Location: Summit 2 Level: Intermediate
Yiqun Hu (Singapore Power)
Average rating: ****.
(4.00, 1 rating)
Energy usage is a significant part of daily life, so the ability to monitor this use offers a number of benefits, from cost savings to improved safety. A key challenge is the lack of labeled data. Yiqun Hu shares a new solution: an RNN-based network trained to learn good features from unlabeled data. Read more.

5:45pm

5:45pm–6:45pm Wednesday, 12/06/2017
Location: Sponsor Pavilion, Concourse 1-4
Need to unwind after a long day of sessions? Join us at the Sponsor Pavilion Reception and enjoy beverages and snacks with fellow Strata Data sponsors, attendees, and speakers. Read more.

Thursday, 12/07/2017

8:00am

8:00am–8:15am Thursday, 12/07/2017
Location: Hall 404 Foyer
Coffee break sponsored by TigerGraph (15m)

8:15am

8:15am–8:45am Thursday, 12/07/2017
Location: Hall 404 Foyer
Ready, set, network! Meet fellow attendees who are looking to connect at Strata. We'll gather before Thursday keynotes to host an informal speed networking event. Be sure to bring your business cards and have fun. Read more.

8:50am

8:50am–8:55am Thursday, 12/07/2017
Location: Hall 404AXF
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Doug Cutting, and Alistair Croll welcome you to the second day of keynotes. Read more.

8:55am

8:55am–9:10am Thursday, 12/07/2017
Location: Hall 404AXF
Carme Artigas (Synergic Partners)
Average rating: *****
(5.00, 1 rating)
The concept of smart cities has evolved from sensored urban centers to platform ecosystems that combine data with new technologies such as the IoT, the cloud, and AI. Carme Artigas explores the challenges and opportunities of evolving from smart cities to intelligent societies. Read more.

9:10am

9:10am–9:20am Thursday, 12/07/2017
Location: Hall 404AXF
Amr Awadallah (Cloudera)
We are witnessing a new revolution in data—the age of decision automation. Amr Awadallah explains the historic importance of this next wave in automation and highlights the foundational capabilities required to enable it: machine learning and analytics optimized for the cloud. Read more.

9:20am

9:20am–9:40am Thursday, 12/07/2017
Location: Hall 404AXF
Ajey Gore (GO-JEK)
Average rating: *****
(5.00, 2 ratings)
Drawing on his experience at GO-JEK, Ajey Gore explains how the impossible can be made possible with technology and data insights. Read more.

9:40am

9:40am–10:00am Thursday, 12/07/2017
Location: Hall 404AXF
Secondary topics: ecommerce
Tony Lee (JD.com)
Average rating: ****.
(4.00, 2 ratings)
Details to come. Read more.

10:00am

10:00am–10:20am Thursday, 12/07/2017
Location: Hall 404AXF
Kira Radinsky (eBay | Technion)
Average rating: *****
(5.00, 4 ratings)
Kira Radinsky offers an overview of a system that jointly mines 10 years of nation-wide medical records of more than 1.5 million people and extracts medical knowledge from Wikipedia to provide guidance about drug repurposing—the process of applying known drugs in new ways to treat diseases. Read more.

10:20am

10:20am–10:40am Thursday, 12/07/2017
Location: Hall 404AXF
Pascale Fung (The Hong Kong University of Science and Technology)
Average rating: *****
(5.00, 3 ratings)
Keynote with Pascale Fung. Read more.

10:45am

10:45am–11:15am Thursday, 12/07/2017
Location: Sponsor Pavilion, Concourse 1-4
Morning break (30m)

11:15am

11:15am–11:55am Thursday, 12/07/2017
Data engineering and architecture
Location: 308/309 Level: Beginner
Wataru Yukawa (LINE)
Average rating: ****.
(4.00, 2 ratings)
Data is a very important asset to LINE, one of the most popular messaging applications in Asia. Wataru Yukawa explains how LINE gets the most out of its data using a Hadoop data lake and an in-house log analysis platform. Read more.
11:15am–11:55am Thursday, 12/07/2017
Data engineering and architecture, Spark and beyond
Location: 310/311 Level: Intermediate
Holden Karau (Google), Joey Echeverria (Rocana)
Average rating: *****
(5.00, 3 ratings)
Apache Spark offers greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. Holden Karau and Joey Echeverria explore how to debug Apache Spark applications, the different options for logging in Spark, and more. Read more.
11:15am–11:55am Thursday, 12/07/2017
Becoming a data-centric company, Strata Business Summit
Location: 321/322 Level: Beginner
Grace Tang (Uber)
Average rating: ****.
(4.50, 4 ratings)
Being a data-driven company means that we have to move fast and fail often. But how do we learn to not only be proud of our failures but also turn these fails into wins? Grace Tang explains how to set up experiments so that negative results become epic wins, saving your team time, effort, and money, instead of just being swept under the carpet. Read more.
11:15am–11:55am Thursday, 12/07/2017
Strata Business Summit
Location: 328/329
Carme Artigas (Synergic Partners)
Carme Artigas explains why an analytics center of excellence (ACoE), whether internal or outsourced, is an effective way to create mechanisms to deploy big data across the entire organization rather than simply serving a particular department or use case. Read more.
11:15am–11:55am Thursday, 12/07/2017
Paco Nathan (O'Reilly Media)
Average rating: *****
(5.00, 3 ratings)
Paco Nathan explains how O'Reilly employs AI, from the obvious (chatbots, case studies about other firms) to the less so (using AI to show the structure of content in detail, enhance search and recommendations, and guide editors for gap analysis, assessment, pathing, etc.). Approaches include vector embedding search, summarization, TDA for content gap analysis, and speech-to-text to index video. Read more.
11:15am–11:55am Thursday, 12/07/2017
Big data and the cloud, Machine Learning
Location: Summit 2 Level: Intermediate
Wee Hyong Tok (Microsoft), Danielle Dean (Microsoft)
Deep neural networks are responsible for many advances in natural language processing, computer vision, speech recognition, and forecasting. Danielle Dean and Wee Hyong Tok illustrate how cloud computing has been leveraged for exploration, programmatic training, real-time scoring, and batch scoring of deep learning models for projects in healthcare, manufacturing, and utilities. Read more.

12:05pm

12:05pm–12:45pm Thursday, 12/07/2017
Tzu-Li (Gordon) Tai (data Artisans)
Apache Flink is evolving from a framework for streaming data analytics into a platform that offers a foundation for event-driven applications and replaces the data management aspects that are typically handled by a database in more conventional architectures. Tzu-Li (Gordon) Tai explores the key features that are powering Flink's evolution, along with demonstrations of them in action. Read more.
12:05pm–12:45pm Thursday, 12/07/2017
Data engineering and architecture, Spark and beyond
Location: 310/311 Level: Intermediate
Carson Wang (Intel), Yucai Yu (Intel)
Average rating: ****.
(4.50, 2 ratings)
Spark SQL is one of the most popular components of Apache Spark. Carson Wang and Yucai Yu explore Intel's efforts to improve SQL performance and offer an overview of an adaptive execution mode they implemented for Spark SQL. Read more.
12:05pm–12:45pm Thursday, 12/07/2017
Big data and the cloud, Strata Business Summit
Location: 321/322 Level: Beginner
John Mertic (The Linux Foundation), Cupid Chan (4C Decision)
Average rating: ****.
(4.00, 1 rating)
John Mertic and Cupid Chan share real end-user perspectives from companies like GE on how they are using big data tools, challenges they face, and where they are looking to focus investments—all from a vendor-neutral viewpoint. Read more.
12:05pm–12:45pm Thursday, 12/07/2017
Mick Hollison (Cloudera)
Mick Hollison shares examples of real-world machine learning applications, explores a variety of challenges in putting these capabilities into production (the speed with which technology is moving, cloud versus in-data-center consumption, security and regulatory compliance, and skills and agility in getting data and answers into the right hands), and outlines proven ways to meet them. Read more.
12:05pm–12:45pm Thursday, 12/07/2017
Data engineering and architecture, Machine Learning
Location: Summit 1 Level: Intermediate
Graham Gear (Cloudera)
Average rating: *****
(5.00, 1 rating)
How can we drive more data pipelines, advanced analytics, and machine learning models into production? How can we do this both faster and more reliably? Graham Gear draws on real-world processes and systems to explain how it's possible to apply continuous delivery techniques to advanced analytics, realizing business value earlier and more safely. Read more.
12:05pm–12:45pm Thursday, 12/07/2017
Xianyan Jia (Intel), Zhenhua Wang (JD.com)
Xianyan Jia and Zhenhua Wang explore deep learning applications built successfully with BigDL. They also teach you how to develop fast prototypes with BigDL's off-the-shelf deep learning toolkit and build end-to-end deep learning applications with flexibility and scalability using BigDL on Spark. Read more.

12:45pm

12:45pm–1:45pm Thursday, 12/07/2017
Location: Sponsor Pavilion, Concourse 1-4
Looking to network with other attendees during lunch? Topic Table discussions help you connect with people in similar industries or interested in the same topics. Read more.

1:45pm

1:45pm–2:25pm Thursday, 12/07/2017
Data engineering and architecture, Spark and beyond
Location: 308/309 Level: Advanced
Apache Beam allows data pipelines to run in batch or streaming mode on a variety of open source and private cloud data processing backends, including Apache Flink, Apache Spark, and Google Cloud Dataflow. Jean-Baptiste Onofré offers an overview of Apache Beam's programming model, explores mechanisms for efficiently building data pipelines, and demos an IoT use case dealing with MQTT messages. Read more.
1:45pm–2:25pm Thursday, 12/07/2017
Big data and the cloud, Data engineering and architecture
Location: 310/311 Level: Beginner
Calvin Jia (Alluxio), Haoyuan Li (Alluxio)
Calvin Jia and Haoyuan Li explain how to decouple compute and storage with Alluxio, exploring the decision factors, considerations, and production best practices and solutions to best utilize CPUs, memory, and different tiers of disaggregated compute and storage systems to build out a multitenant high-performance platform. Read more.
1:45pm–2:25pm Thursday, 12/07/2017
Law, ethics, and open data, Strata Business Summit
Location: 321/322 Level: Beginner
Gaurav Godhwani (Open Budgets India, Centre for Budget and Governance Accountability)
Average rating: *****
(5.00, 1 rating)
Most of India’s budget documents aren’t easily accessible. Those published online are mostly available as unstructured PDFs, making it difficult to search, analyze, and use this crucial data. Gaurav Godhwani discusses the process of creating Open Budgets India and making India’s budgets open, usable, and easy to comprehend. Read more.
1:45pm–2:25pm Thursday, 12/07/2017
Smart cities and urban automation
Location: 328/329 Level: Non-technical
Alistair Croll (Solve For Interesting)
Average rating: ****.
(4.50, 2 ratings)
We infuse urban spaces with sensors, drinking from a torrent of data, making sense of city life. But this reliance on data has real risks: Complex systems often have unintended consequences, and it's hard to experiment. Alistair Croll shares lessons from the past and explains how paving the cowpaths, examining the models, and iterating everything can mitigate these risks. Read more.
1:45pm–2:25pm Thursday, 12/07/2017
Teresa Tung (Accenture Labs), Ishmeet Grewal (Accenture Labs), Jurgen Weichenberger (Accenture Analytics)
Average rating: *****
(5.00, 1 rating)
As Accenture scaled to millions of predictive models, it required automation to ensure accuracy, prevent false alarms, and preserve trust. Teresa Tung, Ishmeet Grewal, and Jurgen Weichenberger explain how Accenture implemented a DevOps process for analytical models that's akin to software development—guaranteeing analytics modeling at scale and even in noncloud environments at the edge. Read more.
1:45pm–2:25pm Thursday, 12/07/2017
YongLiang Xu (StarHub), Masatake Iwasaki (NTT DATA)
Average rating: *****
(5.00, 1 rating)
SmartHub and NTT DATA have embarked on a partnership to design next-generation architecture to power the data products that will help generate new insights. YongLiang Xu and Masatake Iwasaki explain how deep learning and other analytics models can coexist on the same platform to address opportunities and challenges in initiatives such as smart cities. Read more.

2:35pm

2:35pm–3:15pm Thursday, 12/07/2017
Supreet Oberoi (Oracle)
Time series data is any dataset that is plotted over a range of time. Often, in IoT use cases, what is of interest is finding a pattern in the sequence of measurements. However, queries on time series data do not traditionally scale. Supreet Oberoi explains how Oracle adapted and extended symbolic aggregate approximation (SAX) to solve such challenges. Read more.
2:35pm–3:15pm Thursday, 12/07/2017
Data engineering and architecture
Location: 310/311 Level: Intermediate
Dong Li (Kyligence), Luke Han (Kyligence)
Apache Kylin is an extreme distributed OLAP engine on Hadoop. Well-tuned cubes bring about the best performance with the least cost but require a comprehensive understanding of tuning principles to use. Dong Li and Luke Han explain advanced tuning and introduce KyBot, which helps find and solve bottlenecks in an intelligent way with AI methods performed on log analysis results. Read more.
2:35pm–3:15pm Thursday, 12/07/2017
Location: 321/322
TBC
2:35pm–3:15pm Thursday, 12/07/2017
Nikki Rouda (Cloudera)
Managing the security and governance of big data can be challenging on-premises but becomes far more difficult in a heterogeneous environment spanning a public cloud or across multiple cloud services. Nikki Rouda shares unbiased best practices to ensure your data is under control everywhere. Read more.
2:35pm–3:15pm Thursday, 12/07/2017
Kazunori Sato (Google)
Average rating: ****.
(4.00, 1 rating)
BigQuery is Google's fully managed, petabyte-scale data warehouse. Its user-defined functions enable "smart" queries with the power of machine learning, such as similarity searches or recommendations on images or documents with feature vectors and neural network prediction. Kazunori Sato demonstrates how BigQuery and TensorFlow together enable a powerful "data warehouse + ML" solution. Read more.
2:35pm–3:15pm Thursday, 12/07/2017
Data science and advanced analytics, Machine Learning
Location: Summit 2 Level: Intermediate
Chris Hausler (Zendesk), Arwen Griffioen (Zendesk)
Average rating: *****
(5.00, 2 ratings)
Chris Hausler and Arwen Griffioen discuss Zendesk's experience with deep learning, using the example of Answer Bot, a question-answering system that resolves support tickets without agent intervention. They cover the benefits Zendesk has already seen and challenges encountered along the way. Read more.

3:15pm

3:15pm–4:15pm Thursday, 12/07/2017
Location: Sponsor Pavilion, Concourse 1-4
Afternoon break (1h)

4:15pm

4:15pm–4:55pm Thursday, 12/07/2017
Data engineering and architecture
Location: 308/309 Level: Intermediate
Xie Qi (Intel China), Quanfu Wang (Intel China)
Xie Qi and Quanfu Wang offer an overview of a configurable FPGA-based Spark SQL acceleration architecture that leverages FPGAs' very high parallel computing capability to tremendously accelerate Spark SQL queries and FPGAs' power efficiency to lower power consumption. Read more.
4:15pm–4:55pm Thursday, 12/07/2017
Data case studies, Data engineering and architecture
Location: 310/311 Level: Intermediate
Wei Chen (Intel), Zhaojuan Bian (Intel)
Kudu is designed to fill the gap between HDFS and HBase. However, designing a Kudu-based cluster presents a number of challenges. Wei Chen and Zhaojuan Bian share a real-world use case from the automobile industry to explain how to design a Kudu-based E2E system. They also discuss key indicators to tune Kudu and OS parameters and how to select the best hardware components for different scenarios. Read more.
4:15pm–4:55pm Thursday, 12/07/2017
Becoming a data-centric company, Strata Business Summit
Location: 321/322 Level: Intermediate
Sarang Anajwala (Autodesk)
Sarang Anajwala offers an overview of Autodesk’s centralized data platform, which democratizes analytics across various teams within Autodesk. The platform has gone through multiple iterations to optimize the balance between a complex one-size-fits-all data access layer and multiple fragmented noncohesive data access layers. Read more.
4:15pm–4:55pm Thursday, 12/07/2017
Thomas Dinsmore (Cloudera), Johnson Poh (DBS)
Average rating: *****
(5.00, 1 rating)
Data science alone is easy. Data science with others, in the enterprise, on shared distributed systems, requires a bit more work. Thomas Dinsmore and Johnson Poh share common technology considerations and patterns for collaboration in large teams and best practices for moving machine learning into production at scale. Read more.
4:15pm–4:55pm Thursday, 12/07/2017
Prateek Nagaria (The Data Team)
Most data scientists use traditional methods of forecasting, such as exponential smoothing or ARIMA, to forecast product demand. However, when the product experiences several periods of zero demand, approaches such as Croston may provide better accuracy than these traditional methods. Prateek Nagaria compares traditional and Croston methods in R on intermittent demand time series. Read more.
4:15pm–4:55pm Thursday, 12/07/2017
Machine Learning
Location: Summit 2
Adam Gibson (Skymind)
Average rating: ****.
(4.67, 3 ratings)
Adam Gibson demonstrates how to use variational autoencoders to automatically label time series location data. You'll explore the challenge of imbalanced classes and anomaly detection, learn how to leverage deep learning for automatic labeling (and the pitfalls of this), and discover how you can deploy these techniques in your organization. Read more.

5:05pm

5:05pm–5:45pm Thursday, 12/07/2017
Data engineering and architecture
Location: 308/309 Level: Advanced
Yu-Xi Lim (Teralytics), Michał Węgrzyn (Teralytics)
Average rating: *****
(5.00, 1 rating)
Yu-Xi Lim and Michał Węgrzyn outline a high-throughput distributed software pattern capable of processing event streams in real time. At its core, the pattern relies on functional reactive programming idioms to shard and splice state fragments, ensuring high horizontal scalability, reliability, and high availability. Read more.
5:05pm–5:45pm Thursday, 12/07/2017
Graham Dumpleton (Red Hat)
Average rating: ***..
(3.00, 2 ratings)
Jupyter notebooks provide a rich interactive environment for working with data. Running a single notebook is easy, but what if you need to provide a platform for many users at the same time? Graham Dumpleton demonstrates how to use JupyterHub to run a highly scalable environment for hosting Jupyter notebooks in education and business. Read more.
5:05pm–5:45pm Thursday, 12/07/2017
Location: 321/322
TBC
5:05pm–5:45pm Thursday, 12/07/2017
Big data and the cloud, Strata Business Summit
Location: 328/329 Level: Beginner
Arun Veettil (Skellam AI)
Arun Veettil shares his experience and lessons learned developing a customized, enterprise-level NLP platform to replace a leading text analytics vendor platform. Read more.
5:05pm–5:45pm Thursday, 12/07/2017
Big data and the cloud, Machine Learning
Location: Summit 1 Level: Intermediate
Le Zhang (Microsoft), Graham Williams (Microsoft)
R has long been criticized for its limitations on scalable data analytics. What's needed is an R-centric paradigm that enables data scientists to elastically harness cloud resources of manifold computing capability for large-scale data analytics. Le Zhang and Graham Williams demonstrate how to operationalize an E2E enterprise-grade pipeline for big data analytics—all within R. Read more.
5:05pm–5:45pm Thursday, 12/07/2017
Financial technology and data, Machine Learning
Location: Summit 2 Level: Intermediate
Markus Kirchberg (Wismut Labs Pte. Ltd.)
As the share of digital payments increases, so does payment fraud, which almost tripled between 2013 and 2016. Markus Kirchberg explains how recent advances in AI and machine learning, decision sciences, and network sciences are driving the development of next-generation payment fraud capabilities for fraud scoring, deceptive merchant detection, and merchant compromise detection. Read more.