Sep 23–26, 2019

Schedule: Data, Analytics, and AI Architecture sessions

9:00am–5:00pm Monday, September 23, and Tuesday, September 24, 2019
Location: 1A 17
Jorge Lopez (Amazon Web Services), Radhika Ravirala (Amazon Web Services), Nikki Rouda (Amazon Web Services), Jesse Gebhardt (Amazon Web Services), Rajeev Chakrabarti (Amazon Web Services)
Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. Join the AWS team to learn how to incorporate serverless concepts into your big data architectures. You'll explore design patterns to ingest, store, and analyze your data as you build a big data application using AWS technologies such as S3, Athena, and Kinesis.
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1E 09
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Anurag Khandelwal (Yale University)
Arun Kejariwal, Karthik Ramasamy, and Anurag Khandelwal walk you through the landscape of streaming systems and examine the inception and growth of the serverless paradigm. You'll take a deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar Functions, and get a bird’s-eye view of the application domains where you can leverage Pulsar Functions.
1:30pm–5:00pm Tuesday, September 24, 2019
Location: 1E 09
Gowrishankar Balasubramanian (Amazon Web Services), Rajeev Srinivasan (Amazon Web Services)
Enterprises adopt cloud platforms such as AWS for agility, elasticity, and cost savings. Database design and management require a different mindset on AWS than traditional RDBMS design does. Gowrishankar Balasubramanian and Rajeev Srinivasan explore the considerations involved in choosing the right database for your use case and access patterns, whether you're migrating an existing application to the cloud or building a new one there.
1:30pm–5:00pm Tuesday, September 24, 2019
Location: 1E 12/13
Mark Madsen (Teradata), Todd Walter (Archimedata)
Building a data lake involves more than installing Hadoop or putting data into AWS. The goal in most organizations is to build a multiuse data infrastructure that isn't subject to past constraints. Mark Madsen and Todd Walter explore design assumptions and principles and walk you through a reference architecture to use as you work to unify your analytics infrastructure.
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1A 15/16
Navinder Pal Singh Brar (Walmart Labs)
Each week, 275 million people shop at Walmart, generating interaction and transaction data. Navinder Pal Singh Brar explains how the customer backbone team enables the extraction, transformation, and storage of customer data to be served to other teams. Handling 5 billion events per day, the Kafka Streams cluster processes events from various channels and maintains a uniform identity for each customer.
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1E 07/08
Paige Roberts (Vertica), Deepak Majeti (Vertica)
GoodData needed its MPP database in the cloud to recover automatically from node failures and to scale rapidly when workloads spiked. Kubernetes could solve this, but it is designed for stateless microservices, not for a stateful MPP database that needs hundreds of containers. Paige Roberts and Deepak Majeti detail the hurdles GoodData had to overcome to combine the power of the database with Kubernetes.
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1A 23/24
Moty Fania (Intel)
Moty Fania details Intel IT's experience implementing a sales AI platform. The platform is built on a streaming microservices architecture with a message bus backbone. Designed for real-time data extraction and reasoning, it processes millions of web pages and can sift through millions of tweets per day.
1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1A 21/22
Swasti Kakker (LinkedIn), Manu Ram Pandit (LinkedIn), Vidya Ravivarma (LinkedIn)
Join Swasti Kakker, Manu Ram Pandit, and Vidya Ravivarma to explore LinkedIn's flexible and scalable hosted data science platform. It lets developers work seamlessly in multiple languages, enforces developer best practices and governance policies, executes and visualizes solutions, and supports efficient knowledge management and collaboration, all to improve developer productivity.
1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1E 10/11
As a steward of your enterprise’s data and digital transformation initiatives, you’re tasked with making the right choices. But before you can make those decisions, it’s important to understand what not to do when planning your organization’s big data initiatives. Michael Stonebraker shares his top 10 big data blunders.
1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1A 08/10
James Tang (Walmart Labs), Yiyi Zeng (Walmart Labs), Linhong Kang (Walmart Labs)
James Tang, Yiyi Zeng, and Linhong Kang outline how Walmart provides a secure and seamless shopping experience through machine learning and large-scale data analysis on a centralized platform.
1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1A 15/16
Michael Noll (Confluent)
Would you cross the street with traffic information that's a minute old? Certainly not. Modern businesses have the same needs. Michael Noll explores why and how you can use Kafka and its growing ecosystem to build elastic event-driven architectures. Specifically, you'll look at Kafka as the storage layer, Kafka Connect for data integration, and Kafka Streams and KSQL as the compute layer.
2:05pm–2:45pm Wednesday, September 25, 2019
Location: 1E 07/08
Tomer Levi (Fundbox)
Data workflows are fundamental to the work of any data engineering team. However, designing an easy-to-use, scalable, and flexible data workflow platform is a complex undertaking. Tomer Levi walks you through how the data engineering team at Fundbox uses AWS serverless technologies to address this problem and how this approach enables data scientists, BI developers, and engineers to move faster.
2:05pm–2:45pm Wednesday, September 25, 2019
Location: 1A 21/22
Atul Gupte (Uber)
Uber is changing the way people think about transportation. As an integral part of the logistical fabric in more than 65 countries around the world, it uses ML and advanced data science to power every aspect of the Uber experience, from dispatch to customer support. Atul Gupte and Nikhil Joshi explore how Uber enables teams to transform insights into intelligence and facilitate critical workflows.
2:55pm–3:35pm Wednesday, September 25, 2019
Location: 1A 15/16
Weisheng Xie (Orange Financial), Jia Zhai (StreamNative)
For Orange Financial, a fintech company of China Telecom with half a billion registered users and 41 million monthly active users, risk control decision deployment has been critical to success. Weisheng Xie and Jia Zhai explore how the company leverages Apache Pulsar to boost the efficiency of its risk control decision development and combat financial fraud across more than 50 million transactions a day.
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1A 23/24
Naghman Waheed (Bayer Crop Science), John Cooper (Bayer)
As the complexity of data systems at Bayer has grown, so has the difficulty of locating and understanding the datasets available for consumption. Naghman Waheed and John Cooper outline a custom metadata management tool recently deployed at Bayer. The system is cloud-enabled and uses multiple open source components, including machine learning and natural language processing components that aid search.
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1A 15/16
Bas Geerdink (Aizonic)
Streaming analytics (or fast data processing) is the field of making predictions based on real-time data. Bas Geerdink presents a fast data architecture that covers many use cases that follow a "pipes and filters" pattern. This architecture can be used to create enterprise-grade solutions with a diversity of technology options. The stack is Kafka, Ignite, and Spark Structured Streaming (KISSS).
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1A 21/22
Chenzhao Guo (Intel), Carson Wang (Intel)
Shuffle in Spark requires shuffle data to be persisted on local disks. However, the assumption of collocated storage does not always hold in today’s data centers. Chenzhao Guo and Carson Wang outline the implementation of a new Spark shuffle manager that writes shuffle data to a remote cluster with different storage backends, making life easier for customers.
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1E 10/11
Alasdair Allan (Babilim Light Industries)
The arrival of a new generation of smart embedded hardware may cause the demise of large-scale data harvesting. In its place, smart devices will let us process data at the edge and extract insights without storing data that could infringe on privacy or violate the GDPR. Join Alasdair Allan to learn why the current age, in which privacy is no longer "a social norm," may not long survive the coming of the IoT.
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1E 12/13
Thiago Ribeiro (Griaule)
Brazil deployed a national biometric system to register all Brazilian voters using multiple biometric modalities and to ensure that no one enrolls twice. This session highlights how a large-scale biometric system works and the main architectural decisions that one has to take into consideration.
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1E 06
Venkata Gunnu (Comcast), Harish Doddi (Datatron)
Machine learning infrastructure is key to the success of AI at scale in enterprises, but bringing machine learning models into a production environment poses many challenges, given the legacy of the enterprise environment. Venkata Gunnu and Harish Doddi explore key insights, what worked, what didn't work, and the best practices that helped the data engineering and data science teams.
11:20am–12:00pm Thursday, September 26, 2019
Location: 1A 15/16
Jing Huang (SurveyMonkey), Jessica Mong (SurveyMonkey)
You're a SaaS company operating on a cloud infrastructure built before the machine learning (ML) era, and you need to extend that existing infrastructure to leverage the power of ML. Jing Huang and Jessica Mong detail a case study with critical lessons from SurveyMonkey’s journey of expanding its ML capabilities with its rich data repository and hybrid cloud infrastructure.
1:15pm–1:55pm Thursday, September 26, 2019
Location: 1E 07/08
Sushant Rao (Cloudera)
Jason Wang and Sushant Rao offer an overview of cloud architecture, then go into detail on core cloud paradigms like compute (virtual machines), cloud storage, authentication and authorization, and encryption and security. They conclude by bringing these concepts together through customer stories to demonstrate how real-world companies have leveraged the cloud for their big data platforms.
2:05pm–2:45pm Thursday, September 26, 2019
Location: 1A 23/24
Building a reliable big data platform is extremely challenging when it has to store and serve hundreds of petabytes of data in real time. Reza Shiftehfar reflects on the challenges faced and proposes architectural solutions to scale a big data platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing the hardware and meeting security needs.
2:05pm–2:45pm Thursday, September 26, 2019
Location: 1A 15/16
Davor Bonaci (Kaskada), Anand Madhavan (Narvar)
Narvar provides a next-generation post-transaction experience for over 500 retailers. Karthik Ramasamy and Anand Madhavan take you on the journey of how Narvar moved away from using a slew of technologies for its platform and consolidated its use cases on Apache Pulsar.
3:45pm–4:25pm Thursday, September 26, 2019
Location: 1A 15/16
Jonghyok Lee (SK Telecom), Chon Yong Lee (SK Telecom)
Jonghyok Lee and Chon Yong Lee discuss T-CORE, SK Telecom’s monitoring and service analytics platform, which collects system and application data from several thousand servers and applications and provides a 3D visualization of the real-time status of the whole network. Join in to hear lessons learned during development.
3:45pm–4:25pm Thursday, September 26, 2019
Location: 1E 07/08
Scott Castle (Sisense)
Scott Castle, general manager at Sisense and former VP of product at Periscope Data, discusses lessons learned from scaling up Periscope Data to support extremely large volumes of data and queries from its data teams.
3:45pm–4:25pm Thursday, September 26, 2019
Location: 1A 23/24
Vitaliy Baklikov (DBS Bank), Dipti Borkar (Alluxio)
Vitaliy Baklikov and Dipti Borkar explore how DBS Bank built a modern big data analytics stack that leverages an object store even for data-intensive workloads such as ATM forecasting, and how it uses Alluxio to orchestrate data locality and data access for Spark workloads.
4:35pm–5:15pm Thursday, September 26, 2019
Location: 1A 23/24
Supun Kamburugamuve (Indiana University)
Big data computing and high-performance computing (HPC) evolved over the years as separate paradigms. With the explosion of data and the demand for machine learning algorithms, these two paradigms increasingly embrace each other for data management and algorithms. Supun Kamburugamuve explores the possibilities and tools available for getting the best of HPC and big data.
4:35pm–5:15pm Thursday, September 26, 2019
Location: 1E 10/11
Dean Wampler (Anyscale)
Dean Wampler dives into how (and why) to integrate ML into production streaming data pipelines and to serve results quickly; how to bridge data science and production environments with different tools, techniques, and requirements; how to build reliable and scalable long-running services; and how to update ML models without downtime.
  • Cloudera
  • O'Reilly
  • Google Cloud
  • IBM
  • Cisco
  • Dataiku
  • Intel
  • Io-Tahoe
  • MemSQL
  • Microsoft Azure
  • Oracle Cloud Infrastructure
  • SAS
  • Arcadia Data
  • BMC Software
  • Hazelcast
  • SAP
  • Amazon Web Services
  • Anaconda
  • Esri
  • Infoworks.io, Inc.
  • Kyligence
  • Pitney Bowes
  • Talend
  • Google Cloud
  • Confluent
  • DataStax
  • Dremio
  • Immuta
  • Impetus Technologies Inc.
  • Keyence
  • Kyvos Insights
  • StreamSets
  • Striim
  • Syncsort
  • SK holdings C&C

    Contact us

    confreg@oreilly.com

    For conference registration information and customer service

    partners@oreilly.com

    For more information on community discounts and trade opportunities with O’Reilly conferences

    strataconf@oreilly.com

    For information on exhibiting or sponsoring a conference

    pr@oreilly.com

    For media/analyst press inquiries