Speaker Slides & Video: Big data conference & machine learning training

8 prerequisites of a graph query language

Mingxi Wu (TigerGraph)

Download slides (PDF)

A graph query language is the key to unleashing the value of connected data. Mingxi Wu outlines the eight prerequisites of a practical graph query language, drawn from six years' experience dealing with real-world graph analytical use cases. Along the way, Mingxi compares GSQL, Gremlin, Cypher, and SPARQL, pointing out their respective pros and cons.

AI for social good: Saving the planet through data science

Ganes Kesari (Gramener)

View slides

Global environmental challenges have pushed our planet to the brink of disaster. Rapid advances in deep learning are placing immense power in the hands of consumers and enterprises. Ganes Kesari explains how this power can be marshaled to support environmental groups and researchers who need immediate assistance to address the rapid depletion of our rich biodiversity.

Architecting a data platform for enterprise use

Mark Madsen (Teradata), Todd Walter (Archimedata)

Download slides (PDF)

Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that is not subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.

Architecting a data platform to support analytic workflows for scientific data

Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)

Download slides (PDF)

In upstream oil and gas, a vast amount of the data requested for analytics projects is scientific data: physical measurements about the real world. Historically, this data has been managed library style, but a new system was needed to best provide this data. Sun Maria Lehmann and Jane McConnell share architectural best practices learned from their work with subsurface data.

Big data analytics in the public cloud: Challenges and opportunities

Jian Zhang (Intel), Chendi Xue (Intel), Yuan Zhou (Intel)

Download slides (PPTX)

Jian Zhang, Chendi Xue, and Yuan Zhou explore the challenges of migrating big data analytics workloads to the public cloud (e.g., performance loss and missing features) and demonstrate how to use a new in-memory data accelerator leveraging persistent memory and RDMA NICs to resolve these issues and enable new opportunities for big data workloads on the cloud.

Build your own data lake with AWS Glue and Amazon Athena (sponsored by Amazon Web Services)

Damon Cortesi (Amazon Web Services)

Download slides (PDF)

Damon Cortesi demonstrates how to use AWS Glue and Amazon Athena to implement an end-to-end pipeline.

Building a sales AI platform: Key principles and lessons learned

Moty Fania (Intel)

Download slides (PDF)

Moty Fania shares his experience implementing a sales AI platform that handles processing of millions of website pages and sifts through millions of tweets per day. The platform is based on unique open source technologies and was designed for real-time data extraction and actuation.

Building a serverless big data application on AWS

Jorge Lopez (Amazon Web Services), Nikki Rouda (Amazon Web Services), Damon Cortesi (Amazon Web Services), Sven Hansen (Amazon Web Services), Manos Samatas (Amazon Web Services), Alket Memushaj (Amazon Web Services)

Download slides (1-PDF)

Download slides (2-PDF)

Download slides (3-PDF)

Download slides (4-PDF)

Download slides (5-PDF)

Download slides (6-PDF)

Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join in to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, Kinesis, and more.

Building data science capacity in your organization

Shingai Manjengwa (Fireside Analytics Inc.)

Download slides (PPTX)

Shingai Manjengwa shares insights from teaching data science to 300,000 online learners, second-career college graduates, and grade 12/6th form high school students, explaining how business leaders can increase data science skill sets across different levels and functions in an organization to create real and measurable value from data.

Continuous intelligence: Keeping your AI application in production

Arif Wider (ThoughtWorks), Emily Gorcenski (ThoughtWorks)

Download slides (PDF)

Machine learning can be challenging to deploy and maintain. Any delays in moving models from research to production mean leaving your data scientists' best work on the table. Arif Wider and Emily Gorcenski explore continuous delivery (CD) for AI/ML along with case studies for applying CD principles to data science workflows.

Continuous intelligence: Moving machine learning into production reliably

Danilo Sato (ThoughtWorks), Christoph Windheuser (ThoughtWorks)

Download slides (PDF)

Danilo Sato and Christoph Windheuser walk you through applying continuous delivery (CD), pioneered by ThoughtWorks, to data science and machine learning. Join in to learn how to make changes to your models while safely integrating and deploying them into production, using testing and automation techniques to release reliably at any time and with a high frequency.

Data catalogs are changing the nature of working with data (sponsored by Alation)

Debora Seys

Download slides (PDF)

Deb Seys shares the results of a study that she oversaw at eBay in collaboration with the Kellogg School of Management at Northwestern University. Examining the work of 2,000 analysts and almost 80,000 queries, the study revealed that a data catalog can be used as a learning platform that increases analyst productivity and creates a more collaborative approach to discovery and innovation.

Data science at Deutsche Telekom: Predicting global travel patterns and network demand

Vaclav Surovec (Deutsche Telekom), Gabor Kotalik (Deutsche Telekom)

Download slides (PDF)

Knowledge of customers' location and travel patterns is important for many companies, including German telco service operator Deutsche Telekom. Václav Surovec and Gabor Kotalik explain how a commercial roaming project using Cloudera Hadoop helped the company better analyze the behavior of its customers from 10 countries and provide better predictions and visualizations for management.

Data science transformation: Transforming a traditional wealth manager to a cutting-edge data-driven company

Charlotte Werger (Van Lanschot Kempen)

Download slides (1-PDF)

Download slides (2-PDF)

Charlotte Werger outlines the components necessary to transform a traditional wealth manager into a data-driven business, paying special attention to devising and executing a transformation strategy by identifying key business subunits where automation and improved predictive modeling can result in significant gains and synergies.

Data-driven digital transformation and jobs: The new software hierarchy and ML

Robert Cohen (Economic Strategy Institute)

Download slides (JPG)

Robert Cohen discusses the skills that employers are seeking from employees in digital jobs, linked to the new software hierarchy driving digital transformation. Robert describes this software hierarchy as one that ranges from DevOps, CI/CD, and microservices to Kubernetes and Istio. This hierarchy is used to define the jobs that are central to data-driven digital transformation.

Dealing with data scarcity in natural language processing

Yves Peirsman (NLP Town)

View slides

In this age of big data, NLP professionals are all too often faced with a lack of data: written language is abundant, but labeled text is much harder to come by. Yves Peirsman outlines the most effective ways of addressing this challenge, from the semiautomatic construction of labeled training data to transfer learning approaches that reduce the need for labeled training examples.

Deep learning for fonts

Raghotham Sripadraj (Ericsson), Nischal Harohalli Padmanabha (Omnius)

View slides

Deep learning has enabled massive breakthroughs in offbeat tracks and has enabled better understanding of how an artist paints, how an artist composes music, and so on. Nischal Harohalli Padmanabha and Raghotham Sripadraj discuss their project Deep Learning for Humans and their plans to build a font classifier.

Deep learning for recommender systems

Oliver Gindele (Datatonic)

Download slides (PDF)

The success of deep learning has reached the realm of structured data in the past few years, where neural networks have been shown to improve the effectiveness and predictability of recommendation engines. Oliver Gindele offers a brief overview of such deep recommender systems and explains how they can be implemented in TensorFlow.

Deep learning for speech synthesis: The good news, the bad news, and the fake news

Scott Stevenson

Download slides (PDF)

Modern deep learning systems allow us to build speech synthesis systems with the naturalness of a human speaker. While there are myriad benevolent applications, this also ushers in a new era of fake news. Scott Stevenson explores the danger of such systems and details how deep learning can also be used to build countermeasures to protect against political disinformation.

Deep learning with TensorFlow and Spark using GPUs and Docker containers

Thomas Phelan (HPE BlueData)

Download slides (PPT)

Organizations need to keep ahead of their competition by using the latest AI, ML, and DL technologies such as Spark, TensorFlow, and H2O. The challenge is in how to deploy these tools and keep them running in a consistent manner while maximizing the use of scarce hardware resources, such as GPUs. Thomas Phelan discusses the effective deployment of such applications in a container environment.

Disrupting data discovery

Mark Grover (Lyft)

View slides

Mark Grover discusses how Lyft has reduced the time it takes to discover data by 10 times by building its own data portal, Amundsen. Mark gives a demo of Amundsen, leads a deep dive into its architecture, and discusses how it leverages centralized metadata, PageRank, and a comprehensive data graph to achieve its goal. Mark closes with a future roadmap, unsolved problems, and collaboration model.

Evaluating cybersecurity defenses with a data science approach

Brennan Lodge (Goldman Sachs), Jay Kesavan (Bowery Analytics LLC)

Download slides (ZIP)

Cybersecurity analysts are under siege to keep pace with the ever-changing threat landscape. The analysts are overworked as they are bombarded with and burned out by the sheer number of alerts that they must carefully investigate. Brennan Lodge and Jay Kesavan explain how to use a data science model for alert evaluations to empower your cybersecurity analysts.

Executive Briefing: AWS technology trends—Data lakes and analytics

Nikki Rouda (Amazon Web Services)

Download slides (PDF)

Nikki Rouda shares key trends in data lakes and analytics and explains how they shape the services offered by AWS. Specific topics include the rise of machine-generated data and semistructured and unstructured data as dominant sources of new data, the move toward serverless, API-centric computing, and the growing need for local access to data from users around the world.

Executive Briefing: Big data in the era of heavy worldwide privacy regulations

Mark Donsky (Okera), Nikki Rouda (Amazon Web Services)

Download slides (PDF)

The implications of new privacy regulations for data management and analytics, such as the General Data Protection Regulation (GDPR) and the upcoming California Consumer Privacy Act (CCPA), can seem complex. Mark Donsky and Nikki Rouda highlight aspects of the rules and outline the approaches that will assist with compliance.

Executive Briefing: Overview of data governance

Paco Nathan (derwen.ai)

View slides

Effective data governance is foundational for AI adoption in enterprise, but it's an almost overwhelming topic. Paco Nathan offers an overview of its history, themes, tools, process, standards, and more. Join in to learn what impact machine learning has on data governance and vice versa.

Executive Briefing: The hidden data scientists lurking in your company

Jack Norris (MapR Technologies)

Download slides (PDF)

Many companies delay core improvements in increasing revenue and reducing costs and risk exposure by tying changes to a to-be-hired data scientist. Drawing on three customer examples, Jack Norris explains how to achieve excellent results faster by starting with domain experience and helping developers and analysts better leverage data with available and understandable analytics.

Executive Briefing: What it takes to use machine learning in fast data pipelines

Dean Wampler (Anyscale)

View slides

Your team is building machine learning capabilities. Dean Wampler demonstrates how to integrate these capabilities in streaming data pipelines so you can leverage the results quickly and update them as needed and covers challenges such as how to build long-running services that are very reliable and scalable and how to combine a spectrum of very different tools, from data science to operations.

Executive Briefing: Why managing machines is harder than you think

Pete Skomoroch (Workday)

Download slides (PDF)

In the next decade, companies that understand how to apply machine intelligence will scale and win their markets. Others will fail to ship successful AI products that matter to customers. Pete Skomoroch details how to combine product design, machine learning, and executive strategy to create a business where every product interaction benefits from your investment in machine intelligence.

Explainable machine learning in fintech

Eitan Anzenberg (Bill.com)

Download slides (PPTX)

Machine learning applications balance interpretability and performance. Linear models provide formulas to directly compare the influence of the input variables, while nonlinear algorithms produce more accurate models. Eitan Anzenberg explores a solution that utilizes what-if scenarios to calculate the marginal influence of features per prediction and compare with standardized methods such as LIME.

Fair, privacy-preserving, and secure ML

Mikio Braun (Zalando)

Download slides (PPTX)

Mikio Braun explores techniques and concepts around fairness, privacy, and security when it comes to machine learning models.

Finding your North Star

Cait O'Riordan (Financial Times)

Watch the keynote

The Financial Times hit its target of 1 million paying subscribers a year ahead of schedule. Cait O'Riordan discusses the North Star metric the company uses to drive subscriber growth, detailing how it's embedded across the organization and within the engineering and product teams she's responsible for.

From BI to big data; Or, There and back again

Francesco Mucio (Francescomuc.io)

Download slides (PDF)

Francesco Mucio shares the basic tools he and his team had to learn (or relearn) moving from the coziness of their database to the big world of Spark, cloud, distributed systems, and continuous applications. It was an unexpected journey that ended exactly where it started: with an SQL query.

From legacy to cloud: An end-to-end data integration journey

Max Schultze (Zalando SE)

Download slides (PDF)

Max Schultze details Zalando's end-to-end data integration platform to serve analytical use cases and machine learning throughout the company, covering raw data collection, standardized data preparation (binary conversion, partitioning, etc.), user-driven analytics, and machine learning.

Getting ready for GDPR and CCPA: Securing and governing hybrid, cloud, and on-premises big data deployments

Mark Donsky (Okera), Ifigeneia Derekli (Cloudera), Lars George (Okera), Michael Ernest (Dataiku)

Download slides (PDF)

New regulations such as CCPA and GDPR are driving new compliance, governance, and security challenges for big data. Infosec and security groups must ensure a consistently secured and governed environment across multiple workloads. Mark Donsky, Ifigeneia Derekli, Lars George, and Michael Ernest share hands-on best practices for meeting these challenges, with special attention paid to CCPA.

Herding elephants: Seamless data access in multicluster clouds

Pradeep Bhadani (Hotels.com), Elliot West (Hotels.com)

Download slides (PDF)

Travel platform Expedia Group likes to give its data teams flexibility and autonomy to work with different technologies. However, this approach generates challenges that cannot be solved by existing tools. Pradeep Bhadani and Elliot West explain how the company built a unified virtual data lake on top of its many heterogeneous and distributed data platforms.

How do you evolve your data infrastructure?

Neelesh Salian (Stitch Fix)

Download slides (PDF)

Developing data infrastructure is not trivial; neither is changing it. It takes effort and discipline to make changes that can affect your team. Neelesh Salian discusses how Stitch Fix's data platform team maintains and innovates its infrastructure for the company's data scientists.

How retailers can leverage data to stay competitive in an ever-changing digital landscape (sponsored by Data Reply)

Luca Piccolo (Data Reply), Michele Miraglia (Data Reply)

Download slides (PDF)

Retailers are facing a daunting challenge: remaining competitive in an ever-changing landscape that is becoming increasingly digital—which requires them to overcome rifts in internal systems and seamlessly leverage their data to generate business value. Luca Piccolo and Michele Miraglia outline Data Reply's approach, distilled while supporting retailers in successfully tackling these challenges.

How to mitigate mobile fraud risk by data analytics

Seonmin Kim (LINE)

Download slides (PDF)

Seonmin Kim introduces data analytics techniques for mitigating risk in mobile payments, drawing on actual case studies of mobile fraud and covering tree-based machine learning, graph analytics, and statistical approaches.

Implementing enterprise data management in industrial and scientific organizations

Jane McConnell (Teradata), Sun Maria Lehmann (Equinor)

Download slides (PDF)

To succeed in implementing enterprise data management in industrial and scientific organizations and realize business value, the worlds of business data, facilities data, and scientific data—which have long been managed separately—must be brought together. Sun Maria Lehmann and Jane McConnell explore the cultural and organizational differences and the data management requirements to succeed.

India's data dilemma with India Stack

Sundeep Reddy Mallu (Gramener)

Download slides (PDF)

Answering the simple question of what rights Indian citizens have over their data is a nightmare. The rollout of India Stack technology-based solutions has added fuel to the fire. Sundeep Reddy Mallu explains, with on-the-ground examples, how businesses and citizens in India's booming digital economy are navigating the India Stack ecosystem while dealing with data privacy, security, and ethics.

Insightful health: Amplifying intelligence in healthcare patient flow execution

Fabio Ferraretto, Claudia Regina Laselva (Albert Einstein Jewish Hospital)

Download slides (PDF)

Fabio Ferraretto and Claudia Regina Laselva explain how Hospital Albert Einstein and Accenture evolved patient flow experience and efficiency with the use of applied AI, statistics, and combinatorial math, allowing the hospital to anticipate E2E visibility within patient flow operations, from admission of emergency and elective demands to assignment and medical releases.

Learning "learning to rank"

Sophie Watson (Red Hat)

Download slides (PDF)

Identifying relevant documents quickly and efficiently enhances both user experience and business revenue every day. Sophie Watson demonstrates how to implement learning-to-rank algorithms and provides you with the information you need to implement your own successful ranking system.

Learning how to perform ETL data migrations with open source tool Embulk

Jason Bell (Independent Speaker)

Download slides (PPTX)

The Embulk data migration tool offers a convenient way to load data into a variety of systems with basic configuration. Jason Bell offers an overview of the Embulk tool and outlines some common data migration scenarios that a data engineer could employ using the tool.

Leveraging metadata for automating delivery and operations of advanced data platforms

Peter Billen (Accenture)

Download slides (PDF)

Peter Billen explains how to use metadata to automate delivery and operations of a data platform. By injecting automation into the delivery processes, you shorten the time to market while improving the quality of the initial user experience. Typical examples include data profiling and prototyping, test automation, continuous delivery and deployment, and automated code creation.

LSTM-based time series anomaly detection using Analytics Zoo for Spark and BigDL

Guoqiong Song (Intel)

Download slides (PPTX)

Collecting and processing massive time series data (e.g., logs and sensor readings) and detecting anomalies in real time is critical for many emerging smart systems, such as industrial, manufacturing, AIOps, and the IoT. Guoqiong Song explains how to detect anomalies in time series data using Analytics Zoo and BigDL at scale on a standard Spark cluster.

Making data science useful

Cassie Kozyrkov (Google)

Watch the keynote

Despite the rise of data engineering and data science functions in today's corporations, leaders report difficulty in extracting value from data. Many organizations aren’t aware that they have a blind spot with respect to their lack of data effectiveness, and hiring experts doesn’t seem to help. Join Cassie Kozyrkov to talk about how you can change that.

Making the future

James Burke

Watch the keynote

James Burke asks whether we can use big data and predictive analytics at the social level to take the guesswork out of prediction and make the future what we all want it to be. If so, this would give us the tools to handle what looks like being the greatest change to the way we live since we left the caves.

Mass production of AI solutions

Nate Keating (Google)

Download slides (PDF)

AI will change how we live in the next 30 years, but it's still currently limited to a small group of companies. In order to scale the impact of AI across the globe, we need to reduce the cost of building AI solutions, but how? Nate Keating explains how to apply lessons learned from other industries—specifically, the automobile industry, which went through a similar cycle.

Mastering data with Spark and machine learning

Sonal Goyal (Nube)

Download slides (PDF)

Enterprise data on customers, vendors, and products is often siloed and represented differently in diverse systems, hurting analytics, compliance, regulatory reporting, and 360 views. Traditional rule-based MDM systems with legacy architectures struggle to unify this growing data. Sonal Goyal offers an overview of a modern master data application using Spark, Cassandra, ML, and Elastic.

Migrating Apache Oozie workflows to Apache Airflow

Feng Lu (Google Cloud), James Malone (Google), Apurva Desai (Google Cloud), Cameron Moberg (Truman State University | Google Cloud)

Download slides (1-PDF)

View slides

Apache Oozie and Apache Airflow (incubating) are both widely used workflow orchestration systems, the former focusing on Apache Hadoop jobs. Feng Lu, James Malone, Apurva Desai, and Cameron Moberg explore an open source Oozie-to-Airflow migration tool developed at Google as a part of creating an effective cross-cloud and cross-system solution.

Mutant tests too: The SQL

Elliot West (Hotels.com), Jaydene Green (Hotels.com)

Download slides (PPTX)

Elliot West and Jay Green share approaches for applying software engineering best practices to SQL-based data applications to improve maintainability and data quality. Using open source tools, Elliot and Jay show how to build effective test suites for Apache Hive code bases and offer an overview of Mutant Swarm, a tool to identify weaknesses in tests and to measure SQL code coverage.

On the accountability of black boxes: How we can control what we can’t exactly measure

Yiannis Kanellopoulos (Code4Thought)

Download slides (PDF)

Black box algorithmic systems make decisions that have a great impact in our lives. Thus, the need for their accountability and transparency is growing. Code4Thought created an evaluation model reflecting the state of practice in several organizations. Yiannis Kanellopoulos explores this model and shares lessons learned from its application at a financial corporation.

Practicing data science: A collection of case studies

Rosaria Silipo (KNIME)

Download slides (PDF)

Rosaria Silipo shares a collection of past data science projects. While the structure is often similar (data collection, data transformation, model training, deployment), each required its own special trick, whether a change in perspective or a particular technique to deal with special cases and special business questions.

Predicting real-time transaction fraud using supervised learning

Sami Niemi (Barclays)

Download slides (PDF)

Predicting transaction fraud in debit and credit card payments in real time is an important challenge, which state-of-the-art supervised machine learning models can help to solve. Sami Niemi offers an overview of the solutions Barclays has been developing and testing and details how well models perform in a variety of situations, such as card-present and card-not-present debit and credit card transactions.

Processing 10M samples a second to drive smart maintenance in complex IIoT systems

Geir Engdahl (Cognite), Daniel Bergqvist (Google)

Download slides (PDF)

Geir Engdahl and Daniel Bergqvist explain how Cognite is developing IIoT smart maintenance systems that can process 10M samples a second from thousands of sensors. You'll explore an architecture designed for high performance, robust streaming sensor data ingest, and cost-effective storage of large volumes of time series data as well as best practices learned along the way.

Reading China: Predicting policy change with machine learning

Weifeng Zhong (Mercatus Center at George Mason University)

View slides

Weifeng Zhong shares a machine learning algorithm built to “read” the People’s Daily (the official newspaper of the Communist Party of China) and predict changes in China’s policy priorities. The output of this algorithm, named the Policy Change Index (PCI) of China, turns out to be a leading indicator of the actual policy changes in China since 1951.

Real-time SQL stream processing at scale with Apache Kafka and KSQL

Robin Moffatt (Confluent)

View slides

Robin Moffatt walks you through the architectural reasoning for Apache Kafka and the benefits of real-time integration. You'll then build a streaming data pipeline using nothing but your bare hands, Kafka Connect, and KSQL.

Reinforcement learning: A gentle introduction and an industrial application

Christian Hidber (bSquare)

Download slides (PDF)

Reinforcement learning (RL) autonomously learns complex tasks such as walking, beating the world champion at Go, or flying a helicopter. No big datasets with the “right” answers are needed: the algorithms learn by experimenting. Christian Hidber shows how and why RL works and demonstrates how to apply it to an industrial hydraulics application with 7,000 clients in 42 countries.

Scaling Impala: Common mistakes and best practices

Manish Maheshwari (Cloudera)

Download slides (PDF)

Apache Impala is an MPP SQL query engine for planet-scale queries. When set up and used properly, Impala is able to handle hundreds of nodes and tens of thousands of queries hourly. Manish Maheshwari explains how to avoid pitfalls in Impala configuration (memory limits, admission pools, metadata management, statistics), along with best practices and anti-patterns for end users or BI applications.

Serverless for data and AI

Avner Braverman (Binaris)

Download slides (PDF)

What is serverless, and how can it be utilized for data analysis and AI? Avner Braverman outlines the benefits and limitations of serverless with respect to data transformation (ETL), AI inference and training, and real-time streaming. This is a technical talk, so expect demos and code.

Spark NLP in action: How Indeed applies NLP to standardize résumé content at scale

Alexander Thomas (John Snow Labs), Alexis Yelton (Indeed)

View slides

Alexander Thomas and Alexis Yelton demonstrate how to use Spark NLP and Apache Spark to standardize semistructured text, illustrated by Indeed's standardization process for résumé content.

Stream, stream, stream: Different streaming methods with Spark and Kafka

Itai Yaffe (Nielsen)

Download slides (PPT)

NMC (Nielsen Marketing Cloud) provides customers (both marketers and publishers) with real-time analytics tools to profile their target audiences. To achieve that, the company needs to ingest billions of events per day into its big data stores in a scalable, cost-efficient way. Itai Yaffe explains how NMC continuously transforms its data infrastructure to support these goals.

Sustaining machine learning in the enterprise

Ben Lorica (O'Reilly)

Watch the keynote

Keynote with Ben Lorica

Synthetic video generation: Why seeing should not always be believing

Alexander Adam (Faculty)

Download slides (PDF)

The advent of "fake news" has led us to doubt the truth of online media, and advances in machine learning give us an even greater reason to question what we are seeing. Despite the many beneficial applications of this technology, it's also potentially very dangerous. Alex Adam explains how synthetic videos are created and how they can be detected.

The changing face of ETL: Event-driven architectures for data engineers

Robin Moffatt (Confluent)

View slides

Robin Moffatt discusses the concepts of events, their relevance to software and data engineers, and their ability to unify architectures in a powerful way. Join in to learn why analytics, data integration, and ETL fit naturally into a streaming world. Along the way, Robin will lead a hands-on demonstration of these concepts in practice and commentary on the design choices made.

The digital truth and the physical twin

Simon Moritz (Ericsson)

Download slides (PDF)

The truth is no longer what you see with your eyes; the truth is in the digital sphere, where it only sometimes needs a physical twin. After all, what's the need for a road sign along the street if the information is already in the car? Simon Moritz details how the Fourth Industrial Revolution is transforming companies and business models as we know them.

The enterprise data cloud

Mick Hollison (Cloudera)

Watch the keynote

The last decade has seen incredible changes in our technology. The advent of big data and powerful new analytic techniques, including machine learning and AI, means that we understand the world in ways that were simply impossible before. The simultaneous explosion of public cloud services has fundamentally changed our expectations of technology: it should be fast, simple, and flexible to use.

The Lyft data platform: Now and in the future

Mark Grover (Lyft), Deepak Tiwari (Lyft)

View slides

Lyft’s data platform is at the heart of the company's business. Decisions from pricing to ETA to business operations rely on Lyft’s data platform. Moreover, it powers the enormous scale and speed at which Lyft operates. Mark Grover and Deepak Tiwari walk you through the choices Lyft made in the development and sustenance of the data platform, along with what lies ahead in the future.

The Presto Cost-Based Optimizer for interactive SQL on anything

Wojciech Biela (Starburst), Piotr Findeisen (Starburst)

Download slides (PDF)

Presto is a popular open source distributed SQL engine for interactive queries over heterogeneous data sources (Hadoop/HDFS, Amazon S3, Azure ADLS, RDBMS, NoSQL, etc.). Wojciech Biela and Piotr Findeisen offer an overview of the Cost-Based Optimizer (CBO) for Presto, which brings a great performance boost. Join in to learn about CBO internals, the motivating use cases, and observed improvements.

The unreasonable effectiveness of transfer learning on NLP

David Low (Pand.ai)

Download slides (PDF)

Transfer learning has proven to be a tremendous success in computer vision, a result of the ImageNet competition. In the past few months, there have been several breakthroughs in natural language processing with transfer learning, namely ELMo, OpenAI Transformer, and ULMFiT. David Low demonstrates how to use transfer learning on an NLP application with state-of-the-art accuracy.

The unstoppable rise of white box data

Chris Taggart (OpenCorporates)

Download slides (PDF)

Chris Taggart explains the benefits of white box data and outlines the structural shifts that are moving the data world toward it.

The vegan data diet: How Wikipedia cuts down privacy issues while keeping data fit

Marcel Ruiz Forns (Wikimedia Foundation)

Download slides (PDF)

Analysts and researchers studying Wikipedia are hungry for long-term data to build experiments and feed data-driven decisions. But Wikipedia has a strict privacy policy that prevents storing privacy-sensitive data for more than 90 days. Marcel Ruiz Forns explains how the Wikimedia Foundation's analytics team is working on a vegan data diet to satisfy both.

There's something about data…

Martin Leijen (Rabobank / Digital Transformation Office)

Download slides (ZIP)

Martin Leijen discusses how Rabobank created a data and intelligence lab as an enabler for data and business domains to accelerate their use of AI and advanced analytics.

Transforming a financial services data infrastructure for the modern era by building a PCI DSS-compliant data platform from the ground up on AWS

Eoin O'Flanagan (NewDay), Darragh McConville (Kainos)

Download slides (PDF)

Eoin O'Flanagan and Darragh McConville explain how NewDay built a high-performance contemporary data processing platform from the ground up on AWS. Join in to explore the company's journey from a traditional legacy onsite data estate to an entirely cloud-based PCI DSS-compliant platform.

Unleashing Apache Kafka and TensorFlow in hybrid architectures

Kai Wähner (Confluent)

Download slides (PDF)

How do you leverage the flexibility and extreme scale of the public cloud and the Apache Kafka ecosystem to build scalable, mission-critical machine learning infrastructures that span multiple public clouds—or bridge your on-premises data center to the cloud? Join Kai Wähner to learn how to use technologies such as TensorFlow with Kafka’s open source ecosystem for machine learning infrastructures.

Using electronic health records to predict health risks associated with obesity

Volker Schnecke (Novo Nordisk)

Download slides (PDF)

Today, more than 650 million people worldwide are obese, and most of them will develop additional health issues during their lifetime. However, not all are at equal risk. Volker Schnecke discusses how Novo Nordisk mines the electronic health records (EHRs) of millions of patients to understand the risk in people with obesity and to support the discovery of new medicines.

Why is it so hard to do AI for good?

Duncan Ross (Times Higher Education), Giselle Cory (DataKind UK)

View slides

DataKind UK has been working in data for good since 2013, helping over 100 UK charities to do data science for the benefit of their users. Some of those projects have delivered above and beyond expectations; others haven't. Duncan Ross and Giselle Cory explain how to identify the right data-for-good projects and how this can act as a framework for avoiding the same problems across industry.

Your data strategy: It should be concise, actionable, and understandable by business and IT

Peter Aiken (Data BluePrint | DAMA International | Virginia Commonwealth University)

Download slides (PDF)

Peter Aiken offers a more operational perspective on the use of data strategy, which is especially useful for organizations just getting started with data.
