Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Speaker slides & video

Presentation slides will be made available after the session has concluded and the speaker has given us the files. Check back if you don't see the file you're looking for—it might be available later! (However, please note some speakers choose not to share their presentations.)

If you are looking for slides and video from 2018, visit the Strata 2018 site.

Data Science, Machine Learning & AI
GRDF helps bring natural gas to nearly 11 million customers every day. In partnership with GRDF, Dataiku worked to optimise the manual process of qualifying addresses to visit and ultimately save GRDF time and money. This solution was the culmination of a year-long adventure in the land of maintenance experts, legacy IT systems and agile development.
Case studies
Ivan Danesi (UniCredit Services S.C.p.A.)
This use case describes the construction of a customer relationship management application in a Big Data environment. More than 50 models, refreshed monthly, have been deployed within this project. The aim of each model is customer development (cross-selling) or attrition management (churn). The inputs for the analysis are heterogeneous data from 7 different countries.
Data Science, Machine Learning & AI
Yoav Einav (GigaSpaces)
Technological advancements are transforming customer experience, and businesses are beginning to benefit from Deep Learning innovations to automate call center routing to the most appropriate agent. This session will discuss how Deep Learning models can be run with Intel BigDL and Spark frameworks co-located on an in-memory computing platform to enhance the customer experience without the need for GPUs.
Data Science, Machine Learning & AI
Shivnath Babu (Unravel Data Systems | Duke University), Alkis Simitsis (Micro Focus)
Cost and resource provisioning are critical components of the big data stack. A magic 8-ball for the big data stack would give an enterprise a glimpse into its future needs and would enable effective and cost-efficient project and operational planning. This talk covers how to build that magic 8-ball, a decomposable time-series model, for optimal cost and resource allocation for the big data stack.
Data Science, Machine Learning & AI
Matthew Honnibal (Explosion AI)
In this talk, I'll discuss "one weird trick" that can give your NLP project a better chance of success. The advice is this: avoid a "waterfall" methodology where data definition, corpus construction, modelling and deployment are performed as separate phases of work.
Data Science, Machine Learning & AI
Alex Jaimes (Dataminr)
When emergency events occur, social signals and sensor data are generated. In this talk, I will describe how Machine Learning and Deep Learning are applied in processing large amounts of heterogeneous data from various sources in real time, with a particular focus on how such information can be used for emergencies and in critical events for first responders and for other social good use cases.
Strata Business Summit
Angie Ma (ASI Data Science)
2-Day Training Please note: to attend, you must be registered for a Platinum or Training pass.
Angie Ma and Jonny Howell offer a condensed introduction to key AI and machine learning concepts and techniques, showing you what is (and isn't) possible with these exciting new tools and how they can benefit your organization.
Case studies
Ganes Kesari (Gramener Inc)
Global environmental challenges have pushed our planet to the brink of disaster. Rapid advances in deep learning are placing immense power in the hands of consumers and enterprises. This power can be marshaled to support environmental groups and researchers who need immediate assistance to address the rapid depletion of our rich biodiversity.
Data Engineering and Architecture
Rebecca Simmonds (Red Hat), Michael McCune (Red Hat)
Artificial intelligence and machine learning are now popular terms, but how do we make use of these techniques without throwing away the valuable knowledge of experienced employees? This session will delve into this idea with examples of how distributed machine learning frameworks fit together naturally with business rules management systems.
Data Engineering and Architecture
Mark Madsen (Think Big Analytics), Todd Walter (Teradata)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
Data Engineering and Architecture
Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)
In Upstream Oil and Gas, a vast amount of the data requested for analytics projects is “scientific data” - physical measurements about the real world. Historically this data has been managed “library-style” in files - but to provide this data to analytics projects, we need to do something different. Sun and Jane discuss architectural best practices learned from their work with subsurface data.
Data Engineering and Architecture, Streaming and IoT
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
Many industry segments have been grappling with fast data (high-volume, high-velocity data). In this tutorial we shall lead the audience through a journey of the landscape of state-of-the-art systems for each stage of an end-to-end data processing pipeline - messaging, compute and storage - for real-time data and algorithms to extract insights - e.g., heavy-hitters, quantiles - from data streams.
Data Engineering and Architecture
Holden Karau (Google), Kris Nova (VMware)
In the Kubernetes world where declarative resources are a first class citizen, running complicated workloads across distributed infrastructure is easy, and processing big data workloads using Spark is common practice -- we can finally look at constructing a hybrid system of running Spark in a distributed cloud native way. Join respective experts Kris Nova & Holden Karau for a fun adventure.
Data Engineering and Architecture
Jian Zhang (Intel), Chendi Xue (Intel), Yuan Zhou (Intel)
This session introduces the challenges of migrating big data analytics workloads to the public cloud, such as lost performance and missing features. It showcases how the new in-memory data accelerator, leveraging persistent memory and RDMA NICs, can resolve these issues and enable new opportunities for big data workloads on the cloud.
Data Engineering and Architecture
Moty Fania (Intel)
In this session, Moty Fania will share his experience of implementing a Sales AI platform. It processes millions of website pages and sifts through millions of tweets per day. The platform is based on unique open source technologies and was designed for real-time data extraction and actuation. This session highlights the key learnings with a thorough review of the architecture.
Data Science, Machine Learning & AI
The application of AI algorithms in domains such as criminal justice, credit scoring, and hiring holds unlimited promise. At the same time, it raises legitimate concerns about algorithmic fairness. There is a growing demand for fairness, accountability, and transparency from machine learning (ML) systems. In this talk we cover how to build just such a pipeline leveraging open source tools.
Data Engineering and Architecture
Jorge Lopez (Amazon Web Services)
2-Day Training Please note: to attend, you must be registered for a Platinum or Training pass.
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. In this workshop, we show you how to incorporate serverless concepts into your big data architectures, looking at design patterns to ingest, store, and analyze your data. You will build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.
Business Analytics and Visualization
Alicia Williams (Google)
In this talk, Alicia Williams will share how two media companies followed this path to organize content and make it accessible around the world. Along the way, we will talk about the business problems they solved with ML, demonstrate the ease-of-use of the tools themselves, and show the value that ML has brought in each case.
Keynote
Shingai Manjengwa (Fireside Analytics Inc.)
Insights from teaching data science to 300,000 online learners, second-career college graduates, and Grade 12 / 6th Form high school students.
Data Engineering and Architecture
Jian Chang (Alibaba Group), Sanjian Chen (Alibaba Group)
We would like to share the architecture design and many detailed technology innovations of Alibaba TSDB, a state-of-the-art database for IoT data management, from years of development and continuous improvement.
Data Engineering and Architecture
Eoin O'Flanagan (Newday), Darragh McConville (Kainos)
In this session you will learn how we have built a high-performance contemporary data processing platform, from the ground up, on AWS. We will discuss our journey from legacy, onsite, traditional data estate to an entirely cloud-based, PCI DSS-compliant platform.
Data Engineering and Architecture
Arif Wider (ThoughtWorks), Emily Gorcenski (ThoughtWorks)
Machine learning can be challenging to deploy and maintain. Data change, and both models and the systems that implement them must be able to adapt. Any delay in moving models from research to production means leaving your data scientists' best work on the table. In this talk, we explore continuous delivery (CD) for AI/ML and present case studies for applying CD principles to data science workflows.
Data Science, Machine Learning & AI
Danilo Sato (ThoughtWorks), Christoph Windheuser (ThoughtWorks Inc.)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
In this workshop, we will present how to apply the concept of Continuous Delivery (CD) - which ThoughtWorks pioneered - to data science and machine learning. It allows data scientists to make changes to their models, while at the same time safely integrating and deploying them into production, using testing and automation techniques to release reliably at any time and with a high frequency.
Data Science, Machine Learning & AI
Holden Karau (Google), Trevor Grant (IBM), Ilan Filonenko (Bloomberg LP), Francesca Lazzeri (Microsoft)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
This workshop will quickly introduce what Kubeflow is, and how we can use it to train and serve models across different cloud environments (and on-prem). We’ll have a script ready to do the initial setup work so you can jump (almost) straight into training a model on one cloud, and then look at how to set up serving in another cluster/cloud. We will start with a simple model, with follow-up links.
Strata Business Summit
Alistair Croll (Solve For Interesting)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
Welcome to the Data Case Studies tutorial.
Data Engineering and Architecture
Ravi Suhag (Go Jek)
At GO-JEK, we build products that help millions of Indonesians commute, shop, eat and pay, daily. The Data team is responsible for creating resilient and scalable data infrastructure across all of GO-JEK’s 18+ products. This involves building distributed big data infrastructure, real-time analytics and visualization pipelines for billions of data points per day.
Data Engineering and Architecture
Václav Surovec (Deutsche Telekom IT), Gabor Kotalik (Deutsche Telekom AG)
Knowledge of customers' location and travel patterns is important for many companies. One of them is the German telco service operator T-Mobile Czech Republic. The Commercial Roaming project, built on Cloudera Hadoop, helped the company better analyze the behavior of its customers from 10 countries in a very secure way, so that it can provide better predictions and visualizations for management.
Executive Briefing and best practices
Charlotte Werger (Van Lanschot Kempen)
In this talk we outline the components necessary to transform a traditional wealth manager into a data-driven business. Special attention is given to devising and executing a transformation strategy by identifying key business sub-units where automation and improved predictive modelling can result in significant gains and synergies.
Culture and organization, Strata Business Summit
Robert Cohen (Economic Strategy Institute)
This talk describes the skills that employers are seeking from employees in digital jobs – linked to the new software hierarchy driving digital transformation. We describe this software hierarchy as one that ranges from DevOps, CI/CD, and microservices to Kubernetes and Istio. This hierarchy is used to define the jobs that are central to data-driven digital transformation.
Case studies
Cecilia Marchi (Jakala)
A major brewery faced a major challenge in defining its sales and marketing strategy in France, given the large number of points of consumption (POC) and the minimal data available to differentiate them. The session shows how Jakala mixed social media data and location data from mobile apps to estimate both the overall attractiveness of the POC and the affinity of its real consumers to brand target profiles.
Data Science, Machine Learning & AI
Yves Peirsman (NLP Town)
In this age of big data, NLP professionals are all too often faced with a lack of data: written language is abundant, but labelled texts are much harder to come by. In my talk, I will discuss the most effective ways of addressing this challenge: from the semi-automatic construction of labelled training data to transfer learning approaches that reduce the need for labelled training examples.
Data Science, Machine Learning & AI
Deep Learning has enabled massive breakthroughs in offbeat tracks and a better understanding of how an artist paints, how an artist composes music and so on. As part of Nischal & Raghotham’s much-loved project, Deep Learning for Humans, they want to build a font classifier and show a wide audience how fonts can be classified, and how and why two or more fonts are similar.
Data Science, Machine Learning & AI
Oliver Gindele (Datatonic)
The success of Deep Learning has reached the realm of structured data in the past few years, where neural networks have been shown to improve the effectiveness and predictability of recommendation engines. This session will give a brief overview of such deep recommender systems and how they can be implemented in TensorFlow.
Data Science, Machine Learning & AI
Scott Stevenson (Faculty)
Modern deep learning systems allow us to build speech synthesis systems with the naturalness of a human speaker. Whilst there are myriad benevolent applications, this also ushers in a new era of fake news. This talk will explore the danger of such systems, as well as how deep learning can also be used to build countermeasures to protect against political disinformation.
Data Engineering and Architecture
Nanda Vijaydev (BlueData), Thomas Phelan (BlueData)
Organizations need to keep ahead of their competition by using the latest AI/ML/DL technologies such as Spark, TensorFlow, and H2O. The challenge is in how to deploy these tools and keep them running in a consistent manner while maximizing the use of scarce hardware resources, such as GPUs. This session will discuss the effective deployment of such applications in a container environment.
Data Engineering and Architecture
Constantin Muraru (Adobe), Dan Popescu (Adobe)
Obtaining servers to run your realtime application has never been easier. Cloud providers have removed the cumbersome process of provisioning new hardware to suit your needs. What happens, though, when you wish to deploy your (web) applications frequently, on hundreds or even thousands of servers, in a fast and reliable way with minimal human intervention? This session addresses this precise topic.
Nicolette Bullivant (Santander UK Technology)
Attend this session to learn more about the way Santander have restructured their business around data. Learn about the people, processes and technology they brought together to make it a success and get practical ideas to help you start or progress your journey with big data.
Data Science, Machine Learning & AI
Christopher Hooi (Land Transport Authority of Singapore)
The Fusion Analytics for Public Transport Event Response (FASTER) system provides a real-time advanced analytics solution for early warning of potential train incidents. Using novel fusion analytics of multiple data sources, FASTER harnesses the use of engineering and commuter-centric IoT data sources to activate contingency plans at the earliest possible time and reduce impact to commuters.
Data Science, Machine Learning & AI
Mingxi Wu (TigerGraph)
A graph query language is the key to unleashing the value of connected data. In this talk, we point out 8 prerequisites of a practical graph query language, drawn from our 6 years of experience with real-world graph analytics use cases, and compare GSQL, Gremlin, Cypher and SPARQL in this regard.
Data Science, Machine Learning & AI
Brennan Lodge (Goldman Sachs), Jay Kesavan (Bowery Analytics LLC)
Cyber security analysts are under siege to keep pace with the ever-changing threat landscape. The analysts are overworked, burned out and bombarded with a sheer number of alerts that they must carefully investigate. To empower our cyber security analysts, we can use a data science model for alert evaluations.
Executive Briefing and best practices, Strata Business Summit
Ellen Friedman (MapR Technologies)
A surprising fact of modern technology is that not knowing some things can make you better at what you do. This isn’t just lack of distraction or being too delicate to face reality. It’s about separation of concerns, with a techno flavor. In this talk I go through five things that best practice with emerging technologies and new architectures lets us not know, and why that’s important.
Executive Briefing and best practices, Strata Business Summit
Brandy Freitas (Pitney Bowes)
Data science is an approachable field given the right framing. Often, though, practitioners and executives are describing opportunities using completely different languages. In this session, Harvard Biophysicist-turned-Data Scientist, Brandy Freitas, will work with participants to develop context and vocabulary around data science topics to help build a culture of data within their organization.
Executive Briefing and best practices, Strata Business Summit
Nikki Rouda (Amazon Web Services (AWS))
This talk is about some of the key trends we see in data lakes and analytics, and how they shape the services we offer at AWS. Specific topics include the rise of machine-generated data and semi-structured/unstructured data as dominant sources of new data, the move towards serverless, API-centric computing, and the growing need for local access to data from users around the world.
Executive Briefing and best practices, Strata Business Summit
Mark Donsky (Okera), Steven Ross (Cloudera)
The General Data Protection Regulation (GDPR) went into effect in May 2018 for firms doing any business in the EU. However, many companies aren't prepared for the strict regulation or fines for noncompliance (up to €20 million or 4% of global annual revenue). This session will explore the capabilities your data environment needs in order to simplify compliance with GDPR, as well as with future regulations.
Executive Briefing and best practices, Strata Business Summit
Mike Olson (Cloudera)
It's easier than ever to collect data -- but managing it securely, in compliance with regulations and legal constraints is harder. There are plenty of tools that promise to bring machine learning techniques to your data -- but choosing the right tools, and managing models and applications in compliance with regulation and law is quite difficult.
Executive Briefing and best practices, Strata Business Summit
Paco Nathan (derwen.ai)
Data governance is an almost overwhelming topic. This talk surveys its history and themes, along with tools, processes, standards, etc. Mistakes imply data quality issues, lack of availability, and other risks that prevent leveraging data. On the other hand, compliance aims to prevent the risks of leveraging data inappropriately. Ultimately, risk management plays the "thin edge of the wedge" in the enterprise.
Executive Briefing and best practices, Strata Business Summit
Alasdair Allan (Babilim Light Industries)
The arrival of a new generation of smart embedded hardware may cause the demise of large-scale data harvesting. In its place, smart devices will allow us to process data at the edge, extracting insights from the data without storing potentially privacy- and GDPR-infringing data. The current age where privacy is no longer "a social norm" may not long survive the coming of the Internet of Things.
Executive Briefing and best practices, Strata Business Summit
Teresa Tung (Accenture Labs), Jean-Luc Chatelain (Accenture)
How do enterprises scale from one-off AI projects to reusable AI? Teresa Tung and Jean-Luc Chatelain explain how domain knowledge graphs (the same technology behind today's Internet search) can bring the same democratized experience to enterprise AI. Beyond search applications, we show other applications of knowledge graphs in oil & gas, financial services, and enterprise IT.
Executive Briefing and best practices, Strata Business Summit
Dean Wampler (Lightbend)
Your team is building Machine Learning capabilities. I'll discuss how you can integrate these capabilities in streaming data pipelines so you can leverage the results quickly and update them as needed. There are big challenges. How do you build long-running services that are very reliable and scalable? How do you combine a spectrum of very different tools, from data science to operations?
Executive Briefing and best practices, Strata Business Summit
Pete Skomoroch (Workday)
Companies that understand how to apply machine intelligence will scale and win their respective markets over the next decade. Others will fail to ship successful AI products that matter to customers. This talk describes how to combine product design, machine learning, and executive strategy to create a business where every product interaction benefits from your investment in machine intelligence.
Data Science, Machine Learning & AI
Ian Cook (Cloudera)
2-Day Training Please note: to attend, you must be registered for a Platinum or Training pass.
Advancing your career in data science requires learning new languages and frameworks—but learners face an overwhelming array of choices, each with different syntaxes, conventions, and terminology. Ian Cook simplifies the learning process by elucidating the abstractions common to these systems. Through hands-on exercises, you'll overcome obstacles to getting started using new tools.
Data Science, Machine Learning & AI
Eitan Anzenberg (Flowcast AI)
Machine learning applications balance interpretability and performance. Linear models provide formulas to directly compare the influence of the input variables, while non-linear algorithms produce more accurate models. We utilize "what-if" scenarios to calculate the marginal influence of features per prediction and compare with standardized methods such as LIME.
Data Science, Machine Learning & AI
Mikio Braun (Zalando SE)
In this talk, we will look at techniques and concepts around fairness, privacy, and security when it comes to machine learning models.
Data Science, Machine Learning & AI
Chris Wallace (Cloudera)
Imagine building a model whose training data is collected on edge devices such as cell phones or sensors. Each device collects data unlike any other, and the data cannot leave the device because of privacy concerns or unreliable network access. This challenging situation is known as federated learning. In this talk we’ll cover the algorithmic solutions and the product opportunities.
Strata Business Summit
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
Welcome to the Findata Day tutorial.
Data Engineering and Architecture
Ted Malaska (Capital One), Jonathan Seidman (Cloudera)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. In this presentation we’ll provide guidance and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects.
Data Science, Machine Learning & AI
Charlotte Werger (Van Lanschot Kempen)
This talk discusses a best practice use case for detecting fraud at a financial institution. Where traditional systems fall short, machine learning models can provide a solution. Sifting through large amounts of transaction data, external hit lists, and unstructured text data we managed to build a dynamic and robust monitoring system that successfully detects unwanted client behavior.
Culture and organization, Strata Business Summit
Julia Butter (Scout24 AG)
Creating value from your data is not about technology or engineers. It is all about changing the company culture to make everyone aware of data and how to build on top of it. At Scout24 we are running a successful culture change and already have 60% of employees using our central BI tool. Since 2018 it has been all about AI enablement.
Data Engineering and Architecture
Max Schultze (Zalando SE)
This talk covers a data lake implementation at a large-scale company: raw data collection, standardized data preparation (e.g. binary conversion, partitioning), user-driven analytics and machine learning.
Data Science, Machine Learning & AI
Divya Choudhary (GOJEK)
Data scientists around the globe would agree that addresses are the most unorganised textual data. Structuring addresses has almost led to a new stream of NLP itself. Who would've imagined that address text data can be used to develop one of the coolest product features: finding the most precise pick-up/drop-off locations for e-commerce, logistics, food delivery or ride/car services companies!
Data Engineering and Architecture
Mark Donsky (Okera)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
New regulations such as CCPA and GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads that span on-prem, private cloud, multi-cloud, and hybrid cloud. We will share hands-on best practices for meeting these challenges, with special attention to CCPA.
Data Engineering and Architecture
Sandeep U (Intuit)
Teams today rely on tribal data dictionaries, which are a mixed bag with respect to correctness: some datasets have accurate attribute details, while others are incorrect and outdated. This significantly impacts the productivity of analysts and scientists. Existing data dictionary tools are manually updated and difficult to maintain. This talk covers 3 patterns we have deployed to manage data dictionaries.
Data Science, Machine Learning & AI
Don Fox (The Data Incubator)
2-Day Training Please note: to attend, you must be registered for a Platinum or Training pass.
We will walk through all the steps - from prototyping to production - of developing a machine learning pipeline. We’ll look at data cleaning, feature engineering, model building/evaluation, and deployment. Students will extend these models into two applications from real-world datasets. All work will be done in Python.
Streaming and IoT
Boris Lublinsky (Lightbend), Dean Wampler (Lightbend)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
This hands-on tutorial examines production use of ML in streaming data pipelines: how to do periodic model retraining and low-latency scoring in live streams. We'll discuss Kafka as the data backplane, pros and cons of microservices vs. systems like Spark and Flink, tips for TensorFlow and SparkML, performance considerations, model metadata tracking, and other techniques.
Data Engineering and Architecture
Pradeep Bhadani (Hotels.com), Elliot West (Hotels.com)
Expedia Group is a travel platform with an extensive portfolio including Expedia.com and Hotels.com. We like to give our data teams flexibility and autonomy to work with different technologies. However, this approach generates challenges that cannot be solved by existing tools. We'll explain how we built a unified virtual data lake on top of our many heterogeneous and distributed data platforms.
Data Engineering and Architecture
Neelesh Salian (Stitch Fix)
Developing data infrastructure is not trivial and neither is changing it. It takes effort and discipline to make changes that can affect your team. In this talk, we shall learn what we, in Stitch Fix's Data Platform team, do to maintain and innovate our infrastructure for our Data Scientists.
Case studies
Yoav Einav (GigaSpaces)
Learn how a leading IT service provider for financial firms leverages NLP to help service agents provide first-call resolution quickly and efficiently, enhancing CX and reducing the time agents spend on the line, which lowers operational costs. The system responds with sub-second latency and creates continuous learning models based on each transaction, ensuring updated models for smarter, faster insights.
Data Science, Machine Learning & AI
Machine-learning algorithms are good at learning new behaviors, but bad at identifying when those behaviors are harmful or don’t make sense. Bias, ethics, and fairness are big risk factors in Machine Learning (ML). We have a lot of experience dealing with intelligent beings: one another. In this talk, we use this common sense to build a checklist for protecting against ethical violations with ML.
Data Science, Machine Learning & AI
SEONMIN KIM (LINE Corp)
Kim introduces activities that mitigate the risk of mobile payments using data analysis techniques drawn from actual case studies of mobile fraud, along with tree-based machine learning, graph analytics, and statistical approaches.
Executive Briefing and best practices, Strata Business Summit
Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)
Implementing Enterprise Data Management is never easy, but it's even harder in industrial and scientific organisations. Three worlds of business data, facilities data and scientific data have long been managed separately but must be brought together to realise business value. Sun and Jane will address the cultural and organisational differences as well as data management requirements to succeed.
Data Engineering and Architecture
Holden Karau (Google), Mikayla Konst (Google), Ben Sidhom (Google)
As more workloads move to “serverless”-like environments, the importance of properly handling downscaling increases.
Data Science, Machine Learning & AI
Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)
In this auditory world, the human brain processes and reacts effortlessly to a variety of sounds. While many of us take this for granted, there are over 360 million people in the world who are deaf or hard of hearing. We will explain how to make the auditory world inclusive and meet the great demand in other sectors by applying deep learning to audio in Azure.
Law and Ethics, Strata Business Summit
Sundeep Reddy Mallu (Gramener Inc)
Answering the simple question of what rights Indian citizens have over their data is a nightmare. The rollout of India Stack technology-based solutions has added fuel to the fire. Sundeep explains, with on-the-ground examples, how businesses and citizens are navigating the India Stack ecosystem while dealing with data privacy, security and ethics in India's booming digital economy.
Data Engineering and Architecture
Mark Samson (Cloudera)
It is now possible to build a modern data platform capable of storing, processing and analysing a wide variety of data across multiple public and private Cloud platforms and on-premise data centres. This session will outline an information architecture for such a platform, informed by working with multiple large organisations who have built such platforms over the last 5 years.
Case studies, Strata Business Summit
Fabio Ferraretto (Accenture), Tatiane Canero (Hospital Albert Einstein)
How Hospital Albert Einstein and Accenture evolved patient flow experience and efficiency with the use of applied AI, statistics, and combinatorial math, allowing the hospital to anticipate E2E visibility within patient flow operations, from admission of emergency and elective demands to assignment and medical releases.
Case studies, Strata Business Summit
Dirk Petzoldt (Zalando SE)
A case study from Zalando, Europe’s leading online fashion platform, about its journey to a scalable, personalized, machine learning-based marketing platform.
Law and Ethics, Strata Business Summit
Our experience with building the Business Intelligence platform has been nothing short of extraordinary. The proposal contains details about how Uber thought about building its Business Intelligence platform. In this talk, I’ll narrate the journey of how we decided to take a platform approach rather than adding features in a piecemeal fashion.
Keynote
Cait O'Riordan (Financial Times)
TBD
Keynote, Data Science, Machine Learning & AI
Cassie Kozyrkov (Google)
Cassie Kozyrkov
Keynote
David Boyle (Harrods)
David Boyle, Customer Insights Director, Harrods
Data Science, Machine Learning & AI
Amir Issaei (Databricks)
2-Day Training Please note: to attend, you must be registered for a Platinum or Training pass.
The course covers the fundamentals of neural networks and how to build distributed Keras/TensorFlow models on top of Spark DataFrames. Throughout the class, you will use Keras, TensorFlow, Deep Learning Pipelines, and Horovod to build and tune models. You will also use MLflow to track experiments and manage the machine learning lifecycle. NOTE: This course is taught entirely in Python.
Data Science, Machine Learning & AI
Sophie Watson (Red Hat)
Identifying relevant documents quickly and efficiently enhances both user experience and business revenue every day. Sophie Watson demonstrates how to implement Learning to Rank algorithms and provides you with the information you need to implement your own successful ranking system.
Data Engineering and Architecture
Jason Bell (DeskHoppa)
The Embulk data migration tool offers a convenient way to load data into a variety of systems with basic configuration. This talk gives an overview of the Embulk tool and shows some common data migration scenarios that a data engineer could employ using the tool.
Data Engineering and Architecture
Matt Fuller (Starburst)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
Used by Facebook, Netflix, Airbnb, LinkedIn, Twitter, Uber, and others, Presto has become the ubiquitous open source software for SQL-on-Anything. Presto was built from the ground up for fast interactive SQL analytics against disparate data sources ranging in size from gigabytes to petabytes. In this tutorial, attendees will learn Presto usage and best practices, with optional hands-on exercises.
Data Science, Machine Learning & AI
Shioulin Sam (Cloudera Fast Forward Labs)
Machine learning requires large datasets - a prohibitive limitation in many real world applications. What if we could build models from scratch that could recognize images using only a handful of labeled examples? In this talk, we will cover algorithmic solutions that enable learning with limited data, and discuss product opportunities.
Data Engineering and Architecture
Peter Billen (Accenture)
In this session we will explain how to use metadata to automate delivery and operations of a data platform. By injecting automation into the delivery processes we shorten the time-to-market while improving the quality of the initial user experience. Typical examples include data profiling and prototyping, test automation, continuous delivery and deployment, and automated code creation.
Data Science, Machine Learning & AI
Guoqiong Song (Intel)
Collecting and processing massive time series data (e.g., logs and sensor readings) and detecting anomalies in real time are critical for many emerging smart systems in areas such as industry, manufacturing, AIOps, and IoT. This talk will share how to detect anomalies in time series data using Analytics Zoo and BigDL at scale on a standard Spark cluster.
Data Engineering and Architecture
Mark Grover (Lyft), Deepak Tiwari (Lyft)
Lyft’s data platform is at the heart of Lyft’s business. Decisions all the way from pricing, to ETA, to business operations rely on Lyft’s data platform. Moreover, it powers the enormous scale and speed at which Lyft operates. In this talk, Mark Grover walks through various choices Lyft has made in developing and sustaining the data platform and why, along with what lies ahead.
Data Science, Machine Learning & AI
Ana Hocevar (The Data Incubator)
2-Day Training Please note: to attend, you must be registered for a Platinum or Training pass.
The TensorFlow library provides for the use of computational graphs, with automatic parallelization across resources. This architecture is ideal for implementing neural networks. This training will introduce TensorFlow's capabilities in Python. It will move from building machine learning algorithms piece by piece to using the Keras API provided by TensorFlow with several hands-on applications.
Case studies
Samuel Cristóbal (Innaxis)
DataBeacon is a multi-sided data and machine learning platform for the aviation industry. Two applications will be presented: SmartRunway (machine learning solution to runway optimisation) and SafeOperations (operations safety predictive analytics).
Keynote
Technology changes so fast these days, we spend much of our time just keeping up. Prediction, difficult enough at any time, is made even more complex when Big Data and predictive analytics immensely increase the number of options we need to consider.
Data Engineering and Architecture
Hussein Mehanna (Google Cloud)
AI will change how we live in the next 30 years. However, AI is still limited to a small group of companies. Building AI systems is expensive and difficult. But in order to scale the impact of AI across the globe, we need to reduce the cost of building AI solutions. How can we do that? Can we learn from other industries? Yes, we can. The automobile industry went through a similar cycle.
Data Engineering and Architecture
Sonal Goyal (Nube)
Enterprise data on customers, vendors, products, etc. is siloed and represented differently in diverse systems, hurting analytics, compliance, regulatory reporting, and 360 views. Traditional rule-based MDM systems with legacy architectures struggle to unify this growing data. This talk covers a modern master data application using Spark, Cassandra, ML, and Elastic.
Data Engineering and Architecture
Ted Malaska (Capital One)
In the world of data it is all about building the best path to support time/quality to value. 80% to 90% of the work is getting the data into the hands and tools that can create value. This talk will take us on a journey of different patterns and solutions that can work at the largest of companies.
Data Engineering and Architecture
Feng Lu (Google Cloud), James Malone (Google), Apurva Desai (Google Cloud), Cameron Moberg (Truman State University / Google Cloud)
Apache Oozie and Apache Airflow (incubating) are both widely used workflow orchestration systems, where the former focuses on Apache Hadoop jobs. We see a need to build an Oozie-to-Airflow workflow mapping as part of creating an effective cross-cloud/cross-system solution. This talk aims to introduce an open-source Oozie-to-Airflow migration tool developed at Google.
Data Engineering and Architecture
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
In this talk, we shall walk the audience through an architecture whereby models are served in real-time and the models are updated, using Apache Pulsar, without restarting the application at hand. Further, we will describe how Pulsar functions can be applied to support two example use cases, viz., sampling and filtering. We shall lead the audience through a concrete case study of the same.
Case studies
Rashed Iqbal (Investment and Development Office)
Despite fierce challenges, Tesla has upended not only the automotive and technology sectors but also our perception of disruption itself. Tesla and its enigmatic CEO, Elon Musk, have consistently used narratives to support their brand and market valuation. This talk presents a case study in the application of Narrative Modeling to the news and social media content about Tesla since its inception.
Data Science, Machine Learning & AI
Mounia Lalmas (Spotify)
The aim of our mission is "to match fans and artists in a personal and relevant way". In this talk, Mounia will describe some of the (research) work we are doing to achieve this, from using machine learning to metric validation. She will describe work done in the context of Home, Search and Voice.
Data Engineering and Architecture
Elliot West (Hotels.com), Jaydene Green (Hotels.com)
Hotels.com describe approaches for applying software engineering best practices to SQL-based data applications in order to improve maintainability and data quality. Using open source tools we show how to build effective test suites for Apache Hive code bases. We also present Mutant Swarm, a mutation testing tool we’ve developed to identify weaknesses in tests and to measure SQL code coverage.
Data Science, Machine Learning & AI
Alexander Thomas, Claudiu Branzan (G2 Web Services)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
This is a hands-on tutorial for scalable NLP using the highly performant, highly scalable open-source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
Data Engineering and Architecture
Simona Meriam (Nielsen)
We ingest billions of events per day into our big data stores, and we need to do it in a scalable, cost-efficient, and consistent way. When working with Spark and Kafka, the way you manage your consumer offsets has major implications for data consistency. We will go in depth into the solution we ended up implementing and discuss the working process and the dos and don'ts that led us to its final design.
Data Science, Machine Learning & AI
Moshe Wasserblat presents an overview of NLP Architect, an open source DL NLP library that provides SOTA NLP models, making it easy for researchers to implement NLP algorithms and for data scientists to build NLP-based solutions for extracting insight from textual data to improve business operations.
Case studies
Yiannis Kanellopoulos (Code4Thought)
Black box algorithmic models make decisions that have a great impact on our lives. Thus, the need for their accountability and transparency is growing. To address this, we have created an evaluation framework for models and the organisations utilising them. This session presents the aspects of our framework and the lessons learnt from its application at a multibillion dollar high tech corporation.
Data Engineering and Architecture
Erik Nordström (Timescale)
Requirements of time-series databases include ingesting high volumes of structured data; answering complex, performant queries for both recent & historical time intervals; & performing specialized time-centric analysis & data management. I explain how one can avoid these operational problems by re-engineering Postgres to serve as a general data platform, including high-volume time-series workloads
Data Engineering and Architecture
Lars Volker (Cloudera), Anna Szonyi (Cloudera)
The Parquet format recently added column indexes, which improve the performance of query engines like Impala, Hive, and Spark on selective queries. We will cover the technical details of the design and its implementation, and we will give practical tips to help data architects leverage these new capabilities in their schema design. Finally, we will show performance results for common workloads.
Data Science, Machine Learning & AI
Ines Montani (Explosion AI)
In this talk, I'll explain spaCy's new support for efficient and easy transfer learning, and show you how it can kickstart new NLP projects with our new annotation tool, Prodigy Scale.
Case studies, Strata Business Summit
Rosaria Silipo (KNIME)
This is a collection of past data science projects. While the structure is often similar (data collection, data transformation, model training, deployment), each one of them has needed some special trick. The turning point in implementing each data science solution was either a change in perspective or a particular technique for dealing with special cases and special business questions.
Data Science, Machine Learning & AI
Sami Niemi (Barclays)
Predicting transaction fraud in debit and credit card payments in real time is an important challenge, which state-of-the-art supervised machine learning models can help to solve. Barclays has been developing and testing different solutions and will show how well different models perform in a variety of situations, such as card-present and card-not-present debit and credit card transactions.
Data Engineering and Architecture
Wojciech Biela (Starburst), Piotr Findeisen (Starburst)
Presto is a popular open source distributed SQL engine for interactive queries over heterogeneous data sources (Hadoop/HDFS, Amazon S3/Azure ADLS, RDBMS, NoSQL, etc.). Starburst recently contributed the Cost-Based Optimizer (CBO) for Presto, which brings a significant performance boost. Learn about the CBO’s internals, the motivating use cases, and observed improvements.
Data Engineering and Architecture, Streaming and IoT
Geir Endahl (Cognite), Daniel Bergqvist (Google)
Learn how Cognite is developing IIoT smart maintenance systems that can process 10M samples/second from thousands of sensors. We’ll review an architecture designed for high performance, robust streaming sensor data ingest and cost-effective storage of large volumes of time series data, best practices for aggregation and fast queries, and achieving high-performance with machine learning.
Data Engineering and Architecture
Jesse Anderson (Big Data Institute)
2-Day Training Please note: to attend, you must be registered for a Platinum or Training pass.
This training takes participants through an in-depth look at Apache Kafka. We show how Kafka works and how to create real-time systems with it, including how to create consumers and publishers in Kafka. Then we look at Kafka’s ecosystem and how each component is used. We show how to use Kafka Streams, Kafka Connect, and KSQL.
Data Engineering and Architecture
Felipe Hoffa (Google)
Before releasing a public dataset, practitioners need to thread the needle between utility and protection of individuals. We will explore massive public datasets, taking you from theory to real life and showcasing newly available tools that help with PII detection and bring concepts like k-anonymity and l-diversity to the practical realm (with options such as removing, masking, and coarsening).
Data Science, Machine Learning & AI
Weifeng Zhong (American Enterprise Institute)
We developed a machine learning algorithm to “read” the People’s Daily — the official newspaper of the Communist Party of China — and predict changes in China’s policy priorities using only the information in the newspaper. The output of this algorithm, which we call the Policy Change Index (PCI) of China, turns out to be a leading indicator of the actual policy changes in China since 1951.
Case studies
Romi Mahajan (KKM Group)
Residential Real Estate is the world's largest asset class. More importantly, "dwellings" constitute the single largest purchase for most families around the globe. Still, in the world's largest residential real estate markets, the process of valuing, buying, and selling houses is byzantine, analog, and mysterious. Using sophisticated and real-world AI is the key to democratizing value.
Data Engineering and Architecture
Robin Moffatt (Confluent)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
In this workshop you will learn the architectural reasoning for Apache Kafka and the benefits of real-time integration, and then build a streaming data pipeline using nothing but your bare hands, Kafka Connect, and KSQL.
Data Science, Machine Learning & AI
Christian Hidber (bSquare)
Reinforcement learning (RL) learns complex processes autonomously, such as walking, beating the world champion in Go, or flying a helicopter. No big data sets with the “right” answers are needed: the algorithms learn by experimenting. We show “how” and “why” RL works in an intuitive fashion and highlight how to apply it to an industrial hydraulics application with 7000 clients in 42 countries.
Data Engineering and Architecture
Ananth Durai (Slack Technologies Inc)
Logs are everywhere. Every organization collects tons of data every day. Logs are only as good as the trust they earn for making business-critical decisions. Building trust in and reliability of logs is critical to creating a data-driven organization. Ananth walks through his experience building reliable logging infrastructure at Slack and how it helped to build confidence in data.
Data Engineering and Architecture, Streaming and IoT
Ted Dunning (MapR)
As a community, we have been pushing streaming architectures, particularly microservices, for several years now. But what are the results in the field? I will describe several (anonymized) case histories and describe the good, the bad and the ugly. In particular, I will describe how several teams who were new to big data fared by skipping map-reduce and jumping straight into streaming.
Law and Ethics, Strata Business Summit
Laila Paszti (GTC Law Group PC & Affiliates)
As companies commercialize novel applications of AI in areas such as finance, hiring, and public policy, there is concern that these automated decision-making systems may unconsciously duplicate social biases, with unintended societal consequences. This talk will provide practical advice for companies to counteract such prejudices through a legal and ethics based approach to innovation.
Data Engineering and Architecture
Jason Wang (Cloudera), Tony Wu (Cloudera), Vinithra Varadharajan (Cloudera)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
Moving to the cloud poses challenges from re-architecting to be cloud-native, to data context consistency across workloads that span multiple clusters on-prem and in the cloud. First, we’ll cover in depth cloud architecture and challenges; second, you’ll use Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.
Data Engineering and Architecture
Jacques Nadeau (Dremio)
Performance and cost are two important considerations in determining optimized solutions for SQL workloads in the cloud. We look at TPC workloads and how they can be accelerated, invisible to client apps. We explore how Apache Arrow, Parquet, and Calcite can be used to provide a scalable, high-performance solution optimized for cloud deployments, while significantly reducing operational costs.
Data Engineering and Architecture
Anirudha Beria (Qubole), Rohit Karlupia (Qubole)
Autoscaling of resources aims to achieve low latency for a big data application while reducing resource costs at the same time. Scalability-aware autoscaling aims to use historical information to make better scaling decisions. In this talk we will discuss (1) measuring the efficiency of autoscaling policies and (2) coming up with more efficient autoscaling policies, in terms of latency and costs.
Data Engineering and Architecture
Manish Maheshwari (Cloudera), Lars Volker (Cloudera)
Apache Impala is an MPP SQL query engine for planet-scale queries. When set up and used properly, Impala is able to handle hundreds of nodes and tens of thousands of queries hourly. In this talk, we will discuss how to avoid pitfalls in Impala configuration (memory limits, admission pools, metadata management, statistics), along with best practices and antipatterns for end users or BI applications.
Data Engineering and Architecture
David Josephsen (Sparkpost)
This is the story of how Sparkpost Reliability Engineering abandoned ELK for a DIY Schema-On-Read logging infrastructure. We share architectural details and tribulations from our _Internal Event Hose_ data ingestion pipeline project, which uses Fluentd, Kinesis, Parquet and AWS Athena to make logging sane.
Strata Business Summit, Visualization and UX
Mars Geldard (University of Tasmania), Paris Buttfield-Addison (Secret Lab Pty. Ltd.)
Science-fiction has been showcasing complex, AI-driven (often AR or VR) interfaces (for huge amounts of data!) for decades. As television, movies, and video games became more capable of visualising a possible future, the grandeur of these imagined science fictional interfaces has increased. What can we learn from Hollywood UX? Is there a useful takeaway? Does sci-fi show the future of AI UX?
Data Science, Machine Learning & AI
Arun Kejariwal (Independent), Ira Cohen (Anodot)
Recently, Sequence-2-Sequence (S2S) has also been used for applications based on time series data. In this talk, we first give an overview of S2S and its early use cases. Subsequently, we walk through how S2S modeling can be leveraged for time series use cases, viz., real-time anomaly detection and forecasting.
Data Science, Machine Learning & AI
Amy Unruh (Google)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
This tutorial provides an introduction to designing and building machine learning models on Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, you’ll learn machine learning (ML) and TensorFlow concepts, and build skills in developing, evaluating, and productionizing ML models.
Case studies, Strata Business Summit
David Maman (Binah.ai)
The combination of a mere few minutes of video, signal processing, remote heart rate monitoring, machine learning, and data science can identify a person’s emotions, health condition, and performance. Financial institutions and potential employers can analyze whether you have good or bad intentions.
Data Science, Machine Learning & AI
Ihab Ilyas (University of Waterloo | Tamr)
Last year, we covered two primary challenges in applying machine learning to data curation: entity consolidation and using probabilistic inference to suggest data repair for identified errors and anomalies. This year, we'll cover these challenges in greater detail and explain why data unification projects quickly come to require human-guided machine learning and a probabilistic model.
Data Science, Machine Learning & AI
In this talk you will learn how to use Spark NLP and Apache Spark to standardize semi-structured text. You will see how Indeed standardizes resume content at scale.
Executive Briefing and best practices, Strata Business Summit
Vidya Raman (Cloudera)
Not surprisingly, there is no single approach to embracing data-driven innovations within any industry vertical. However, there are some enterprises that are doing a better job than others when it comes to establishing a culture, process, and infrastructure that lend themselves to data-driven innovations. In this talk, we will share some key foundational ingredients that span multiple industries.
Data Engineering and Architecture
Itai Yaffe (Nielsen)
At Nielsen Marketing Cloud, we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences. To achieve that, we need to ingest billions of events per day into our big data stores and we need to do it in a scalable yet cost-efficient manner. In this talk, we will discuss how we continuously transform our data infrastructure to support these goals.
Data Engineering and Architecture, Streaming and IoT
Thomas Weise (Lyft)
Fast data and stream processing are essential for making Lyft rides a good experience for passengers and drivers. Our systems need to track and react to event streams in real time, to update locations, compute routes and estimates, balance prices, and more. The streaming platform at Lyft powers these use cases with development frameworks and a deployment stack that are based on Apache Flink and Beam.
Data Science, Machine Learning & AI
Wolff Dobson (Google)
In this talk, we will cover the latest in TensorFlow, both for beginners and for developers migrating from 1.x to 2.0. We'll cover the best ways to set up your model, feed your data to it, and distribute it for fast training. We'll also look at how TensorFlow has been recently upgraded to be more intuitive.
Data Engineering and Architecture
Robin Moffatt (Confluent)
This talk discusses the concepts of events, their relevance to software and data engineers and their ability to unify architectures in a powerful way. It describes why analytics, data integration and ETL fit naturally into a streaming world. There'll be a hands-on demonstration of these concepts in practice and commentary on the design choices made.
Case studies
Simon Moritz (Ericsson AB)
This is a practical presentation of how the fourth industrial revolution is transforming companies and business models as we know them. The truth is no longer what you see with your eyes; the truth is in the digital sphere, where there will only sometimes be a need for a physical twin. What is the need for a road sign along the street if the information is already in the car?
Data Science, Machine Learning & AI
Maryam Jahanshahi (TapRecruit)
In this talk I will discuss exponential family embeddings, which are methods that extend the idea behind word embeddings to other data types. I will describe how we used dynamic embeddings to understand how data science skill-sets have transformed over the last 3 years using our large corpus of job descriptions. The key takeaway is that these models can enrich analysis of specialized datasets.
Data Engineering and Architecture
Greg Rahn (Cloudera)
Data warehouses have traditionally run in the data center and in recent years they have adapted to be more cloud-native. In this talk, we'll discuss a number of emerging trends and technologies that will impact how data warehouses are run both in the cloud and on-prem and share our vision on what that means for architects, administrators, and end users.
Law and Ethics, Strata Business Summit
Mark Hinely (KirkpatrickPrice)
Organizations across the globe are trying to determine whether GDPR applies to them. Now, it seems as though GDPR principles are headed to the US. In 2018 alone, more than ten states have passed or amended consumer privacy and breach notification laws. Mark Hinely will provide insight on the current and future data privacy laws in the US and how they will impact organizations across the globe.
Data Science, Machine Learning & AI
David Low (Pand.ai)
Transfer learning has proven to be a tremendous success in the computer vision field as a result of the ImageNet competition. In the past months, the natural language processing field has witnessed several breakthroughs with transfer learning, namely ELMo, OpenAI Transformer, and ULMFit. In this talk, David will showcase the use of transfer learning in NLP applications with SOTA accuracy.
Data Engineering and Architecture
Marcel Ruiz Forns (Wikimedia Foundation)
Analysts and researchers studying Wikipedia are hungry for long term data to build experiments and feed data-driven decisions. But Wikipedia has a strict privacy policy that prevents storing privacy-sensitive data over 90 days. The Wikimedia Foundation's analytics team is working on a vegan data diet to satisfy both.
Case studies, Strata Business Summit
Maurício Lins (everis consultancy UK), Lidia Crespo (Santander UK)
Big data is usually regarded as a menace for data privacy. However, with the right principles and mind-set, it can be a game changer to put customers first and consider data privacy an inalienable right. Santander UK applied this model to comply with GDPR by using graph technology, Hadoop, Spark, Kudu to drive data obscuring and data portability, and driving machine learning exploration.
Keynote
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program Chairs, Ben Lorica, Doug Cutting, and Alistair Croll, welcome you to the second day of keynotes.
Data Science, Machine Learning & AI
Francesca Lazzeri (Microsoft), Aashish Bhateja (Microsoft)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
Time series modeling and forecasting are fundamentally important to various practical domains, and during the past few decades machine learning model-based forecasting has become very popular in private and public decision-making processes. In this tutorial, we will walk you through the core steps for using Azure Machine Learning to build and deploy your time series forecasting models.
Data Engineering and Architecture
Kai Wähner (Confluent)
How can you leverage the flexibility and extreme scale of the public cloud combined with the Apache Kafka ecosystem to build scalable, mission-critical machine learning infrastructures that span multiple public clouds or bridge your on-premise data centre to the cloud? Join this talk to learn how to apply technologies such as TensorFlow with Kafka’s open source ecosystem for machine learning infrastructures.
Data Engineering and Architecture
Willem Pienaar (GO-JEK), Zhi Ling Chen (GO-JEK)
Features are key to driving impact with AI at all scales. By democratizing the creation, discovery, and access of features through a unified platform, organizations are able to dramatically accelerate innovation and time to market. Find out how GO-JEK, Indonesia's first billion-dollar startup, built a feature platform to unlock insights in AI, and the lessons they learned along the way.
Data Science, Machine Learning & AI
S.P.T. Krishnan (REAN Cloud (A Hitachi Vantara company))
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
This tutorial provides an overview of the latest big data and machine learning serverless technologies from AWS, and a deep dive into using them to process and analyze two different datasets. The first dataset is publicly available Bureau of Labor Statistics data, and the second is Chest X-Ray Image Data.
Law and Ethics, Strata Business Summit
Duncan Ross (TES Global), Francine Bennett (Mastodon C)
Being good is hard. Being evil is fun and gets you paid more. Once more Duncan Ross and Francine Bennett explore how to do high-impact evil with data and analysis (and possibly AI). Make the maximum (negative) impact on your friends, your business, and the world—or use this talk to avoid ethical dilemmas, develop ways to deal responsibly with data, or even do good. But that would be perverse.
Case studies
Volker Schnecke (Novo Nordisk)
Today more than 650 million people worldwide are obese, and most of them will develop additional health issues during their lifetime. However, not all are at equal risk. In this session we will show how we mine Electronic Health Records (EHRs) of millions of patients for understanding the risk in people with obesity and for supporting the discovery of new medicines.
Data Science, Machine Learning & AI
Alun Biffin (Van Lanschot Kempen), David Dogon (Van Lanschot Kempen)
In this talk we describe how machine learning revolutionized the stock picking process for portfolio managers at Kempen Capital Management by filtering the vast small-cap investment universe down to a handful of optimal stocks.
Strata Business Summit, Visualization and UX
Brian O'Neill (Designing for Analytics)
Gartner says 85%+ of big data projects will fail, despite the fact your company may have invested millions on engineering implementation. Why are customers and employees not engaging with these products and services? Brian O'Neill explains why a "people first, technology second" mission—a design strategy, in other words—enables the best UX and business outcomes possible.
Data Science, Machine Learning & AI, Visualization and UX
Michael Freeman (University of Washington)
Statistical and machine learning techniques are only useful when they're understood by decision makers. While implementing these techniques is easier than ever, communicating about their assumptions and mechanics is not. In this session, participants will learn a design process for crafting visual explanations of analytical techniques and communicating them to stakeholders.
Keynote
Ben Lorica (O'Reilly Media), Doug Cutting (Cloudera), Alistair Croll (Solve For Interesting)
Program chairs Ben Lorica, Alistair Croll, and Doug Cutting welcome you to the first day of keynotes.
Law and Ethics, Strata Business Summit
Duncan Ross (TES Global), Giselle Cory (DataKind)
DataKind UK has been working in data for good since 2013, partnering with over 100 UK charities and helping them do data science for the benefit of their users. Some of those projects have delivered above and beyond expectations; others haven't. In this session Duncan and Giselle will talk about how to identify the right data for good projects...
Data Engineering and Architecture
Felix Cheung (Uber)
Did you know that your Uber rides are powered by Apache Spark? Join Felix Cheung to learn how Uber is building its data platform with Apache Spark at enormous scale and discover the unique challenges the company faced and overcame.
Strata Business Summit
Peter Aiken (Data BluePrint, DAMA International, Virginia Commonwealth University)
Tutorial Please note: to attend, you must be registered for a Gold or Silver pass.
This tutorial presents a more operational perspective on the use of data strategy that is especially useful for organizations just getting started with data.