20–23 April 2020

Speakers

Hear from innovative programmers, talented managers, and senior executives who are doing amazing things with data and AI. More speakers will be announced; please check back for updates.


Nutsa Abazadze is a data scientist at TBC Bank. Her main responsibilities include analyzing and modeling customer data from different perspectives and delivering high-quality reports to decision makers.
Before joining TBC Bank, Nutsa worked in the market research industry, where she applied machine learning models to survey data.
Nutsa holds a master's degree in survey statistics from the University of Bamberg, Germany. As part of her master's thesis, she developed an internal R package for questionnaire modularization for one of the biggest market research companies.

Presentations

How a Failed Machine Learning Exercise Increased Deposit Profitability by 20% Session

We will tell you how our failed attempt to build an ML model led us to discover institutional problems and kicked off improvements to existing business processes so that we could collect quality data for future modeling, and how we still managed to increase deposit profitability by 20% in the process.

Senior Data Engineer at the BBC, Tatiana is passionate about open source, data-driven problem-solving, and Python. As part of the Datalab team, she contributes to the development of recommendation systems and connects data across the organisation. Tatiana has over 15 years of experience developing software applications, including three-dimensional image processing and educational platforms.

Presentations

Taming recommendation systems workflows with Apache Airflow Session

During the last year, the BBC Datalab team has adopted Apache Airflow to improve its recommendation model lifecycle and data processing pipeline. This talk presents lessons learned and includes practical examples, achievements and challenges. It also promotes critical thinking so the audience can be empowered to decide when to use Airflow.
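
At its core, what Airflow contributes is a scheduler that runs tasks in dependency order. A minimal sketch of that idea, using only the Python standard library and hypothetical task names (not the BBC's actual pipeline):

```python
from graphlib import TopologicalSorter

# Hypothetical stages of a recommendation-model lifecycle; each task lists
# the tasks that must finish before it can run.
pipeline = {
    "extract_activity_data": [],
    "build_features": ["extract_activity_data"],
    "train_model": ["build_features"],
    "evaluate_model": ["train_model"],
    "deploy_model": ["evaluate_model"],
}

def run_order(dag):
    """Return a valid execution order, as a scheduler such as Airflow would."""
    return list(TopologicalSorter(dag).static_order())

order = run_order(pipeline)
```

In Airflow proper, the same structure is declared as a DAG of operators, and the scheduler adds scheduling, retries, backfills, and monitoring on top.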

Alasdair Allan is a scientist, author, hacker, maker, and journalist. An expert on the internet of things and sensor systems, he’s famous for hacking hotel radios, deploying mesh networked sensors through the Moscone Center during Google I/O, and for being behind one of the first big mobile privacy scandals when, back in 2011, he revealed that Apple’s iPhone was tracking user location constantly. He has written eight books, and writes regularly for Hackster.io, Hackaday, and other outlets. A former astronomer, he also built a peer-to-peer autonomous telescope network that detected what was, at the time, the most distant object ever discovered.

Presentations

Benchmarking Machine Learning at the Edge Session

The future of machine learning is on the edge and on small embedded devices. Over the last year, custom silicon intended to speed up machine learning inferencing at the edge has started to appear. No cloud needed. We evaluate the new silicon, looking not just at inferencing speed but also at heating, cooling, and the overall power envelope needed to run it.
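
For readers curious about methodology: the measurement half of such a benchmark can be sketched in a few lines. The harness below uses warmup runs and reports the median latency (means are skewed by first-call and thermal effects); the "model" is a stand-in dot product, since no accelerator hardware is assumed here:

```python
import time

def benchmark(fn, *args, warmup=10, runs=100):
    """Median latency of fn(*args) in milliseconds, after warmup calls."""
    for _ in range(warmup):
        fn(*args)  # let caches and clock states settle before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]

# Stand-in "inference": a dot product over a fixed weight vector.
weights = [0.5] * 1024

def fake_inference(x):
    return sum(w * v for w, v in zip(weights, x))

latency_ms = benchmark(fake_inference, [1.0] * 1024)
```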

Coming from neuroscience, I turned to data science, and more specifically to the development of AI based on either time-series data or images. These developments often include real-time management and embedded inference relying on new-generation hardware such as the NVIDIA Jetson or Intel OpenVINO.
I work on different use cases with railroad companies or the French army, for example.
My knowledge of neuroscience often helps me find ideas for artificial architectures that are at the edge of current research.
I co-authored a book called "Apprendre demain" ("Learning Tomorrow"), in French only for the moment. The book aims at depicting AI and neuroscience as disciplines that have to work together and interact more.

Presentations

Dealing with time-series data Session

Time series are a particular type of data distinguished by a single property: time. Because of this property, time series need a very specific kind of neural network, one that has memory. This presentation will first give an overview of what time series are and their properties. We'll then have a brief introduction to recurrent neural networks, a particular architecture designed for this purpose.
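
To make the "memory" point concrete, here is a single recurrent cell in plain Python; the weights are arbitrary illustrative values, not trained parameters:

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a minimal scalar recurrent cell: the new hidden state
    mixes the current input with the previous state, the 'memory' that
    plain feed-forward networks lack."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def rnn_forward(series, w_x=0.5, w_h=0.8, b=0.0):
    """Fold a whole time series through the cell, carrying state forward."""
    h = 0.0
    states = []
    for x_t in series:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# The same input value at different positions yields different hidden
# states, because the cell remembers the history that preceded it.
states = rnn_forward([1.0, 0.0, 1.0])
```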

Shradha Ambekar is a staff software engineer in the Small Business Data Group at Intuit, where she is the technical lead for the lineage framework (SuperGLUE) and real-time analytics. She has made several key contributions to solutions built around the data platform and has contributed to spark-cassandra-connector. She has experience with HDFS, Hive, MapReduce, Hadoop, Spark, Kafka, Cassandra, and Vertica. Previously, she was a software engineer at Rearden Commerce. Shradha spoke at the O’Reilly Open Source Conference in 2019. She holds a bachelor’s degree in electronics and communication engineering from NIT Raipur, India.

Presentations

Always accurate business metrics through Lineage based Anomaly Tracking Session

Imagine a business metric showing a sudden spike. Debugging data pipelines is non-trivial, and finding the root cause can take hours to days! We’ll share how Intuit built a self-serve tool that automatically discovers data pipeline lineage and applies anomaly detection to proactively detect and help debug issues in minutes, establishing trust in metrics and improving developer productivity by 10-100x.
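
The abstract doesn't reveal Intuit's actual detection method; as a generic illustration of flagging a sudden spike in a business metric, a rolling z-score check might look like this:

```python
from statistics import mean, stdev

def detect_spikes(series, window=5, threshold=3.0):
    """Flag points whose z-score against the preceding window exceeds
    the threshold: a simple stand-in for metric anomaly detection."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# A flat metric with one sudden spike, like the scenario in the abstract.
metric = [100, 102, 99, 101, 100, 98, 101, 100, 250, 101]
anomalies = detect_spikes(metric)  # flags index 8, the spike
```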

Optimizing Analytical Queries on Cassandra by 100X Session

Data analysis at scale with fast query response is critical for business needs. Cassandra is a popular datastore used in streaming applications. Cassandra's Spark integration allows running analytical workloads but can be slow. Shradha will describe the challenges faced at Intuit and the solutions her team implemented to improve performance by 100x.

Janisha Anand is a senior business development manager for data lakes at AWS, where she focuses on designing, implementing, and architecting large-scale solutions in the areas of data management, data processing, data architecture, and data analytics.

Presentations

Build a serverless data lake for analytics 1-day training

Learn how to build a serverless data lake on AWS. In the workshop, you'll ingest Instacart's online grocery shopping public dataset into the data lake and draw valuable insights on consumer shopping trends. You’ll build data pipelines, leverage data lake storage infrastructure, configure security and governance policies, create a persistent catalog of data, perform ETL, and run ad hoc analysis.

Eitan is currently the Director of Data Science at Bill.com and has many years of experience as a scientist and researcher. His recent focus is on machine learning, deep learning, applied statistics, and engineering. Previously, Eitan was a postdoctoral scholar at Lawrence Berkeley National Lab; he received his PhD in physics from Boston University and his BS in astrophysics from the University of California, Santa Cruz. Eitan has 2 patents and 11 publications to date and has spoken about data at various conferences around the world.

Presentations

Beyond OCR: Using deep learning to understand documents Session

Although the field of optical character recognition (OCR) has been around for half a century, document parsing and field extraction from images remains an open research topic. We utilize an end-to-end deep learning architecture that leverages document understanding to extract fields of interest.

Antje Barth is a technical evangelist for AI and machine learning at AWS and is based in Düsseldorf, Germany. She has been working with Kubeflow since 2018. In addition, Antje regularly speaks at ML/AI conferences across the world. She is also passionate about helping developers leverage Big Data, containers, and Kubernetes platforms in the context of AI and machine learning. Previously, Antje worked in technical evangelist and solutions engineering roles at MapR and Cisco. Antje is a cofounder of the Düsseldorf chapter of Women in Big Data.

Presentations

Closing the loop: Continuous Machine Learning using Kubeflow Session

Many machine learning systems focus primarily on training models, but leave the users with the task of deploying and re-training their models. In this talk, we’ll discuss the importance of Continuous Machine Learning for improving model performance, and present practical approaches to building continuous model training pipelines using Kubeflow.

Jason Bell specializes in high-volume streaming systems for large retail customers, using Kafka in a commercial context for the last five years. Jason was section editor for Java Developer’s Journal, has contributed to IBM developerWorks on autonomic computing, and is the author of Machine Learning: Hands On for Developers and Technical Professionals.

Presentations

Migrating from Apache Kafka to Apache Pulsar - The Planning and the Reality Session

Apache Pulsar gives us the same robust real-time messaging capabilities as Kafka. In this talk, Jason Bell looks at the challenges of migrating from an existing Kafka cluster to Apache Pulsar and what considerations to make for brokers, topics, retention, consumers, and producers.

Giacomo Bernardi is Distinguished Engineer at Extreme Networks, where he works on multiple science-heavy projects for traffic engineering and network traffic visibility analytics. He leads a global team of data scientists and machine learning engineers. Giacomo is a self-proclaimed networking nerd and was CTO of a large internet service provider, where he built a custom software-defined platform. He holds a PhD in wireless networking from the University of Edinburgh (UK), an MSc from Trinity College Dublin (Ireland), and a BSc from the University of Milan (Italy).

Presentations

What do machines say when nobody’s looking? Tracking IoT security with NLP Session

Machines talk among themselves! What can we learn about their behaviour by analysing their "language"? In this talk, we present a lightweight approach to securing large IoT deployments by leveraging modern natural language processing techniques. Rather than attempting cumbersome firewall rules, we argue that IoT deployments can be efficiently secured by online behavioural modelling.
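
One simple way to treat device traffic as language, purely as an illustration of the idea (the token names and method here are hypothetical, not the speaker's actual system), is to learn bigram statistics over protocol tokens and score unfamiliar sequences:

```python
from collections import Counter

def train_bigrams(sequences):
    """Learn bigram counts over 'sentences' of protocol tokens, treating
    device traffic as a language to be modeled."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts

def anomaly_score(counts, seq):
    """Fraction of bigrams in seq never seen during training; a higher
    score means the device is 'speaking' unusually."""
    bigrams = list(zip(seq, seq[1:]))
    unseen = sum(1 for b in bigrams if counts[b] == 0)
    return unseen / len(bigrams)

# Hypothetical token streams from a well-behaved sensor...
normal = [["HELLO", "AUTH", "READ", "ACK"]] * 50
model = train_bigrams(normal)
# ...versus a device that suddenly starts issuing WRITE commands.
score = anomaly_score(model, ["HELLO", "AUTH", "WRITE", "EXEC"])
```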

Rajesh Shreedhar Bhat is a data scientist at Walmart Labs, Bangalore. His work is primarily focused on building reusable machine/deep learning solutions that can be used across various business domains at Walmart. He completed his bachelor’s degree at PESIT, Bangalore, and is currently pursuing his MS in CS with an ML specialization at Arizona State University.
He has a couple of research publications in the fields of NLP and vision, published at top-tier conferences such as CoNLL and ASONAM, and has filed six US patents in the retail space leveraging AI and ML. He is a Kaggle Expert (world rank 966 of 122,431) with three silver and two bronze medals, and has been a speaker at recognized conferences and meetups such as the Data Hack Summit, India’s largest applied artificial intelligence and machine learning conference, and the Kaggle Days meetup (senior track).
Apart from this, Rajesh has been a mentor for the Udacity Deep Learning and Data Scientist Nanodegree programs for the past three years and has conducted ML and DL workshops at GE Healthcare, IIIT Kancheepuram, and many other places.

Presentations

Attention Networks all the way to production using Kubeflow 1-day training

With the latest developments and improvements in the field of deep learning and artificial intelligence, many demanding natural language processing tasks have become easier to implement and execute. Text summarization is one such task, and it can be done using attention networks.

Satadal Bhattacharjee is Principal Product Manager with AWS AI. He leads the Machine Learning Engine PM team working on projects such as SageMaker, optimizing/enhancing machine learning frameworks, and AWS Deep Learning Containers/AMIs. For fun outside work, Satadal loves to hike, coach robotics teams, and spend time with his family and friends.

Presentations

Using Amazon SageMaker to build, train and deploy ML models 1-day training

In this workshop, attendees will build, train and deploy a deep learning model on Amazon SageMaker and they will learn how to use some of the latest SageMaker features such as SageMaker Debugger and SageMaker Model Monitor.

Wojciech Biela is a co-founder of Starburst, where he’s responsible for engineering and product development. He has over 15 years’ experience building products and running engineering teams. Previously, Wojciech was the engineering manager at the Teradata Center for Hadoop, running the Presto engineering operations in Warsaw, Poland; built and ran the Polish engineering team for a subsidiary of Hadapt, a pioneer in the SQL-on-Hadoop space (acquired by Teradata in 2014); and built and led teams on multiyear projects from custom big ecommerce and SCM platforms to POS systems. Wojciech holds an MS in computer science from the Wroclaw University of Technology.

Presentations

Presto on Kubernetes: Query Anything, Anywhere Session

Presto, the open source SQL engine for big data, offers high-concurrency, low-latency queries across multiple data sources within one query. With Kubernetes, you can easily deploy and manage Presto clusters across hybrid and multicloud environments with built-in high availability, autoscaling, and monitoring. It is available now on Red Hat OpenShift and the Kubernetes engines from AWS, Google Cloud, and Azure.

Dr. Marcel Blattner is Chief Data Scientist at Tamedia, Switzerland. He is responsible for developing an analytical stack within the Tamedia end-to-end architecture to facilitate new insights from data benefiting all stakeholders. Blattner holds a Ph.D. in physics.

Presentations

The black box problem Session

We still lack a clear understanding of how deep neural networks learn. Theoretical physics can provide tools to gain more intuition and insight about generalization and model robustness. In this talk, I provide an overview of ongoing research and the first promising, applicable results.

- serial CEO/CTO/VP Engineering/co-founder
- initiated and participated in five technology standards (including founding LTI Resource Search)
- wrote what may be the first book on applied neural networks (Neural Networks in C++, Wiley, 1992)
- continually active open source contributor (Python, Go, Ruby)
- professor, UC Berkeley, Carnegie Mellon

Presentations

Automating AutoML: How Automated Building of Machine Learning Models Transforms Software Session

First-generation AutoML was targeted at business analysts and "citizen data scientists": upload data to the service, watch the leaderboard, pick a winning model. The second generation of AutoML (from Microsoft, Google, and updates to earlier AutoML tools) is targeted at developers and covers the full AutoML lifecycle. We show how such tools transform applications by replacing logic with predictions.

Dr. Hugo Bowne-Anderson is a data scientist and educator at DataCamp and host of the DataCamp podcast DataFramed. He has worked in applied math research in cell biology at Yale University and the Max Planck Institute for Cell Biology and Genetics, after receiving his PhD in Pure Mathematics at the University of New South Wales. He joined DataCamp three years ago to build out their foundational data science curriculum in Python and his main interests now are promoting data & AI literacy & fluency, helping to spread data skills through organizations.

Presentations

Essential Math and Statistics for Data Science 2-Day Training

In this training, attendees will learn the basics of the math and stats they need to know to do data science and interpret their results correctly (the calculus, linear algebra, statistical intuition, probabilistic thinking, among others) through hands-on examples from machine learning, online experiments and hypothesis testing, natural language processing, data ethics, and more.
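
As a flavor of the statistical intuition covered, consider a permutation test, which asks how often shuffled group labels reproduce an observed difference, with no distributional formula required. The conversion counts below are invented for illustration:

```python
import random

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sample permutation test: how often does shuffling group labels
    produce a difference in means at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    return hits / n_permutations

# Hypothetical daily conversion counts for two site variants.
control = [12, 14, 11, 13, 12, 13, 14, 12]
variant = [15, 17, 16, 18, 15, 16, 17, 16]
p_value = permutation_test(control, variant)
```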

Essential Math and Statistics for Data Science (Day 2) Training Day 2

In this training, attendees will learn the basics of the math and stats they need to know to do data science and interpret their results correctly (the calculus, linear algebra, statistical intuition, probabilistic thinking, among others) through hands-on examples from machine learning, online experiments and hypothesis testing, natural language processing, data ethics, and more.

Yaakov Bressler is a Data Scientist with Dramatic Solutions, as well as a Theatre Producer. His works include Magic the Play and Jung and Crazy. He has extensive consulting experience in data science and utilizes advanced mathematics and sophisticated algorithms to tackle complicated problems. In his theatre work, he is drawn to tackling societal issues. Straddling both worlds has allowed him to improve the communication and accessibility of advanced analytical practices for business leaders within entertainment and the arts.

Presentations

Dynamic Pricing for Broadway and the West End Session

Dynamic pricing, adjusting a price to meet its market value, when implemented properly by Broadway, the West End, and smaller theatres, shows promise of increasing revenue while selling more tickets and lowering prices. Yaakov Bressler and Kelly Carmody discuss their work exploring the statistics behind dynamic pricing using probability distributions and a variety of modelling techniques in Python.
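
The idea can be illustrated with a toy model (a linear demand curve with made-up parameters, not the speakers' actual data or methods):

```python
def expected_revenue(price, base_demand=1000, slope=7.5):
    """Toy linear demand curve: tickets sold falls as price rises.
    The parameters are illustrative, not real Broadway figures."""
    demand = max(0.0, base_demand - slope * price)
    return price * demand

def best_price(candidates):
    """Pick the price that maximizes expected revenue over a grid."""
    return max(candidates, key=expected_revenue)

# Search ticket prices from $10 to $200 in $5 steps.
p = best_price(range(10, 201, 5))
```

With these numbers the revenue-maximizing price is well below the highest price on the grid, which is the intuition behind "more revenue while lowering prices."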

Patrick Buehler is a principal data scientist in the Cloud AI Group at Microsoft. He has over 15 years of working experience in academic settings and with various external customers spanning a wide range of computer vision problems. He earned his PhD from Oxford in computer vision with Andrew Zisserman.

Presentations

Solving real-world computer vision problems with open source Session

Training and deployment of deep neural networks for computer vision (CV) in realistic business scenarios remains a challenge for both data scientists and engineers. Angus Taylor and Patrick Buehler dig into state-of-the-art in the CV domain and provide resources and code examples for various CV tasks by leveraging the Microsoft CV best-practices repository.

Alberto Calleja is a software engineer interested in building products people love in agile environments, with a focus on high-quality tests and clean code. He is currently part of the Spring Engineering team at Pivotal, working from Seville, Spain, on a fully remote team. The team builds Spring Cloud-related products and frameworks to help people adopt a microservices architecture and to improve the experience of Spring on Cloud Foundry and Kubernetes. He likes to focus on reliability, continuous delivery, and testing, and mainly commits to Java open source projects.

Presentations

Kubernetes Distilled - An in depth guide for the busy data engineer 1-day training

Today's data engineers need a deep understanding of the key tools and concepts within the vast, rapidly evolving Kubernetes ecosystem. This training will give developers a thorough grounding in Kubernetes concepts, suggest best practices, and get hands-on with some of the essential tooling. Topics will include

Kelly Carmody is a Data Scientist with an interdisciplinary background in neuroscience, sociology, and epidemiology. Lately, she has been involved with Dramatic Solutions in New York City, where she gives workshops and talks for the Theatre community on how to increase the profitability, accessibility, and quality of Theatre by implementing innovative Tech and Analytics solutions. She has a range of research experience at institutions ranging from the University of the Virgin Islands to Columbia University, and received her Master’s degree in Infectious Disease Control from the London School of Hygiene and Tropical Medicine.

Presentations

Dynamic Pricing for Broadway and the West End Session

Dynamic pricing, adjusting a price to meet its market value, when implemented properly by Broadway, the West End, and smaller theatres, shows promise of increasing revenue while selling more tickets and lowering prices. Yaakov Bressler and Kelly Carmody discuss their work exploring the statistics behind dynamic pricing using probability distributions and a variety of modelling techniques in Python.

Wei-Chiu Chuang, Ph.D., Software Engineer

Wei-Chiu joined Cloudera in 2015 as a software engineer, where he is responsible for the development of Cloudera’s storage systems, mostly the Hadoop Distributed File System (HDFS). He is an Apache Hadoop committer and Project Management Committee member for his contributions to the open source project. He is also a co-founder of the Taiwan Data Engineering Association, a non-profit organization promoting better data engineering technologies and applications in Taiwan. Wei-Chiu received his Ph.D. in computer science from Purdue University for his research in distributed systems and programming models.

Presentations

Distributed Tracing in Apache Hadoop Session

Distributed tracing is a well-known technique for identifying where failures occur and the reasons behind poor performance, especially for complex systems like Hadoop that involve many different components. We are a small team at Cloudera working on integrating OpenTracing into the Hadoop ecosystem. We would like to present a demo of our current work and talk about our future integration plans.

Dr. Maurice Coyle is Trūata’s Chief Data Scientist. He has more than 15 years of experience building innovative technology solutions that deliver improved experiences while respecting user privacy. Maurice’s deep technical and academic expertise gained during his Ph.D. and post-doctoral studies are complemented by a wealth of commercial expertise gained from co-founding and leading a tech startup as CEO.

Presentations

Data Privacy Mythbusting Session

Is customer trust dead? Trūata’s Chief Data Scientist Dr. Maurice Coyle looks at this question and explores some of the myths around the usage of personal data and consumer privacy. This session will debunk some of the most common data privacy myths as well as sharing valuable insights into the effective use of data for insights-driven organizations.

Robert Crowe is a data scientist and TensorFlow Developer Advocate at Google with a passion for helping developers quickly learn what they need to be productive. He’s used TensorFlow since the very early days and is excited about how it’s evolving quickly to become even better than it already is. Previously, Robert deployed production ML applications and led software engineering teams for large and small companies, always focusing on clean, elegant solutions to well-defined needs. In his spare time, Robert sails, surfs occasionally, and raises a family.

Presentations

From Research to Production - Lessons that Google has learned Session

Production ML must address issues of modern software methodology, as well as issues unique to ML. Different types of ML have different requirements, often driven by the different data lifecycles and sources of ground truth. Implementations often suffer from limitations in modularity, scalability, and extensibility. We discuss production ML applications, and review TensorFlow Extended (TFX).

Walid Daboubi is a cyber data scientist at Richemont Group, where he develops threat hunting solutions by applying machine learning and advanced data analytics to cyber resilience; one example is a malware detection project using a deep autoencoder neural network. He previously worked on cloud security development at Dassault Systèmes. He holds a master's in computer science from the Université de Technologie de Compiègne.

He has presented at a number of internal and external conferences on machine learning use cases.

Presentations

Hunting With AI: A guide to proactive Incident Response Session

Traditional cybersecurity processes are by definition reactive, in that they are based on a set of rules. In this session, we will share how we made our cybersecurity approach more proactive by applying machine learning on a set of concrete use cases.

Robert is a Senior Manager in Accenture’s Global Innovation Centre, The Dock. He is responsible for the AI and Data Engineering teams, which build AI systems for Accenture business units and Accenture customers.

Presentations

Cloud Native Machine Learning Session

A look into building, training, and deploying machine learning and deep learning models on the main cloud platforms (AWS, Azure, GCP), as well as platform-agnostically.

Ted Dunning is the chief technology officer at MapR, an HPE company. He’s also a board member for the Apache Software Foundation; and a PMC member and mentor for many other Apache projects. Ted has years of experience with machine learning and other big data solutions across a range of sectors. He’s contributed to clustering, classification, and matrix decomposition algorithms and designed the t-digest algorithm used in several open source projects and by a variety of companies. Previously, Ted was chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems and built fraud-detection systems for ID Analytics (LifeLock). Ted has coauthored a number of books on big data topics, including several published by O’Reilly related to machine learning, and has 24 issued patents to date plus a dozen pending. He holds a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.

Presentations

Building Real-World Data Pipelines Session

Data pipelines are fast becoming a standard fixture in modern systems, but how to build and maintain them isn't nearly as widely known as, say, building a data warehouse. I will describe the core building blocks of such pipelines and how to use tools such as TensorFlow Extended, scikit-learn, Apache Flink, and Apache Beam to build, maintain, and monitor them.

Yoav Einav is vice president of product at GigaSpaces, where he drives product management, technology vision, and go-to-market activities. Yoav has more than 12 years of industry experience in product management and software engineering at high-growth software companies. Previously, Yoav held product management roles at Iguazio and Qwilt, mapping the product strategy and roadmap while providing technical leadership regarding architecture and implementation. An entrepreneur at heart, Yoav drives innovation and product excellence and successfully incorporates it with the market trends and business needs. He holds a BSC (magna cum laude) in computer science and business from Tel Aviv University and an MBA in finance from the Leon Recanati School in Tel Aviv University.

Presentations

Visualize your Operational Data, Analytics and Machine Learning Insights in Real Time Session

More enterprises are using big data for better business decision-making, but existing infrastructure lacks the performance and scale needed to support growing requirements for real-time analysis and visualization of operational data. This session will propose how enterprises can achieve:

- BI visualization on fresh data for real-time dashboards
- Low-latency response times when generating reports

Aparna Elangovan is an AI/ML prototyping engineer with AWS. She designs deep learning solutions in computer vision and natural language processing on AWS.

Presentations

Using Amazon SageMaker to build, train and deploy ML models 1-day training

In this workshop, attendees will build, train and deploy a deep learning model on Amazon SageMaker and they will learn how to use some of the latest SageMaker features such as SageMaker Debugger and SageMaker Model Monitor.

Jeff joined the engineering team at StreamSets in 2016 to help build their state-of-the-art data operations platform. Besides the obvious (developing new features and fixing bugs), he also engages actively in the StreamSets community channels, fleshes out technical designs for numerous projects, and spends many long hours debugging thorny customer issues.

Prior to joining StreamSets, he worked for a decade in the financial industry under a variety of technology roles. He also spent a few years building student safety solutions in the education technology sector.

Presentations

Implementing Slowly Changing Dimensions on Spark Session

Spark is a powerful tool for data processing, but can it do slowly changing dimensions? The answer is yes, with some thoughtful use of its capabilities. And thanks to Spark’s built-in features, we aren’t limited to databases when it comes to handling deltas and persisting historical changes in records. Live demos will be included throughout to help reinforce the concepts discussed.
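
The type 2 variant of slowly changing dimensions, closing out the old version of a record and appending the new one, can be sketched independently of Spark in plain Python; field names here are illustrative, and a Spark implementation would express the same merge with DataFrame joins:

```python
def scd2_merge(current, updates, effective_date):
    """Slowly Changing Dimension type 2: instead of overwriting a changed
    record, close out the old version and append the new one, preserving
    history. current and updates are lists of dicts keyed by 'id'."""
    latest = {r["id"]: r for r in current if r["end_date"] is None}
    result = list(current)
    for upd in updates:
        old = latest.get(upd["id"])
        if old is not None and old["value"] != upd["value"]:
            old["end_date"] = effective_date          # close the old version
        if old is None or old["end_date"] == effective_date:
            result.append({"id": upd["id"], "value": upd["value"],
                           "start_date": effective_date, "end_date": None})
    return result

# A customer tier changes; the old row is closed, the new one appended.
dim = [{"id": 1, "value": "Bronze", "start_date": "2019-01-01", "end_date": None}]
dim = scd2_merge(dim, [{"id": 1, "value": "Gold"}], "2020-04-20")
```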

Narrative strategist Ella has experience working with organisations like Co-op Digital, the Government Digital Service and Google to create and execute compelling stories about people and products. She was once, and only once, described as GDS’s ‘voice of the internet’ by someone who she owes many beers and a minor factcheck. Holding a PhD, she’s worked in teams that have gotten funding, had funding cut, and sometimes won awards, like the Interaction Design Association (IxDA) future voices award in 2016.

Presentations

Data design patterns that people trust Session

People care about how data about them is used. Building trust with consumers will require a change in how services treat data. Since 2016, IF has curated a data patterns catalogue which is used by product teams around the world. We’ll show how patterns help teams build digital services that give people agency over data, build trust, and start addressing systemic imbalances of power.

Brandy Freitas is a principal data scientist at Pitney Bowes, where she works with clients in a wide variety of industries to develop analytical solutions for their business needs. Brandy is a research-physicist-turned-data-scientist based in Boston, Massachusetts. Her academic research focused primarily on protein structure determination, applying machine learning techniques to single-particle cryoelectron microscopy data. Brandy is a National Science Foundation Graduate Research Fellow and a James Mills Pierce Fellow. She holds an undergraduate degree in physics and chemistry from the Rochester Institute of Technology and did her graduate work in biophysics at Harvard University.

Presentations

Enhancing Machine Learning with Graph Native Algorithms: Introduction to Graph Analytics Session

In this session, Brandy Freitas will cover the interplay between graph analytics and machine learning, demonstrate improved feature engineering with graph-native algorithms, and outline the current use of graph technology in industry.
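
As a taste of what graph-native feature engineering means, two classic node features, degree and PageRank, can be computed from scratch; the graph below is a made-up example, and production systems would use a graph database or library instead:

```python
def degree_and_pagerank(edges, damping=0.85, iters=50):
    """Compute two common graph-native features, degree and PageRank,
    for an undirected graph given as a list of (u, v) edges."""
    nodes = sorted({n for e in edges for n in e})
    neighbors = {n: [] for n in nodes}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    degree = {n: len(neighbors[n]) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Each node redistributes its rank equally among its neighbors.
        rank = {n: (1 - damping) / len(nodes)
                + damping * sum(rank[m] / degree[m] for m in neighbors[n])
                for n in nodes}
    return degree, rank

# A tiny hypothetical interaction graph: 'hub' touches everyone.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
degree, rank = degree_and_pagerank(edges)
```

Features like these can then be joined onto a regular tabular dataset as extra model inputs.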

Laura Froelich is a data scientist at DHI Water & Environment, where she is dedicated to utilizing data to discover patterns and underlying structure to enable optimization of businesses and processes, particularly through deep learning methods. Before that, she worked on a large variety of projects covering industries spanning life sciences to the energy industry at Teradata. Previously, Laura was part of a research group investigating nonspecific effects of vaccines using survival analysis methods. Laura holds a PhD from the Technical University of Denmark. For her dissertation, Decomposition and Classification of Electroencephalography Data, Laura used unsupervised decomposition and supervised classification methods to research brain activity and developed rigorous, interpretable approaches to classifying tensor data.

Presentations

Radar-based flow prediction in water networks for better real-time decision making Session

We combine traditional predictive models with deep learning methods to improve the operation of wastewater treatment plants. This data-driven approach relies on weather radar data that replaces local and often sparsely located rain gauge stations. Our approach allows for fast and probabilistic forecasts that robustly improve real-time operation of the urban drainage system.

Barbara Fusinska is a machine learning strategic cloud engineering manager at Google with a strong software development background. Previously, she worked at a variety of companies, including ABB, Base, Trainline, and Microsoft, where she gained experience building diverse software systems, ultimately focusing on the data science and machine learning field. Barbara believes in the importance of data and metrics when growing a successful business. In her free time, Barbara enjoys programming activities and collaborating around data architecture. She can be found on Twitter as @BasiaFusinska and blogs at http://barbarafusinska.com.

Presentations

Natural language processing with deep learning and TensorFlow Session

Natural language processing (NLP) offers techniques to gain insight from and generate text data. Barbara Fusinska introduces you to NLP concepts and deep learning architectures using document context. You'll see a series of demos with TensorFlow, from classification tasks to text generation.

Debasish Ghosh is principal software engineer at Lightbend. Passionate about technology and open source, he loves functional programming and has been trying to learn math and machine learning. Debasish is an occasional speaker in technology conferences worldwide, including the likes of QCon, Philly ETE, Code Mesh, Scala World, Functional Conf, and GOTO. He’s the author of DSLs In Action and Functional & Reactive Domain Modeling. Debasish is a senior member of ACM. He’s also a father, husband, avid reader, and Seinfeld fanboy who loves spending time with his beautiful family.

Presentations

Online machine learning in streaming applications: adapt to change with limited resources Session

In this talk, we discuss online machine learning algorithm choices for streaming applications. We motivate the discussion with resource-constrained use cases like IoT and personalization. We cover drift detection algorithms and Hoeffding Adaptive Trees, performance metrics for online models, and practical concerns with deployment in production. We also provide code examples for each technique.
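
The drift detection mentioned in this abstract can be sketched with a minimal, self-contained detector in the spirit of Page-Hinkley. This is an illustrative stand-in written for this page, not the speaker's implementation; the class name and thresholds are hypothetical.

```python
class PageHinkleyDetector:
    """Minimal Page-Hinkley-style drift detector for a univariate stream (sketch)."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm threshold
        self.n = 0
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation from the mean
        self.cum_min = 0.0          # minimum of the cumulative deviation so far

    def update(self, x):
        """Feed one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold


detector = PageHinkleyDetector()
stream = [0.0] * 100 + [1.0] * 100  # abrupt shift at index 100
drift_at = next(i for i, x in enumerate(stream) if detector.update(x))
```

On a real stream such a signal would typically trigger model retraining; a Hoeffding Adaptive Tree applies the same idea per branch, replacing subtrees whose accuracy drifts.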

Oliver Gindele is head of machine learning at Datatonic. Oliver is passionate about using computer models to solve real-world problems. Working with clients in retail, finance, and telecommunications, he applies deep learning techniques to tackle some of the most challenging use cases in these industries. He studied materials science at ETH Zurich and holds a PhD in computational physics from UCL.

Presentations

ML in Production: Serverless and Painless Session

Productionising machine learning pipelines can be a daunting task for data scientists. In this session, we review some of the newest technologies that address this problem and explain how we used them to productionise a serverless ML pipeline in an exciting case study.

Sarah Gold is CEO and founder at Projects by IF. She is a leading expert in emerging issues and trends in privacy, security and technology. Since Sarah founded IF in 2015, she has grown a world-renowned, multidisciplinary team who work with some of the most influential global organisations. IF’s recent partners and clients span sectors from big tech to healthcare and include Homes England, Google AI, Oxfam, Barnardo’s and Citizens Advice.

Presentations

Data design patterns that people trust Session

People care about how data about them is used. Building trust with consumers will require a change in how services treat data. Since 2016, IF has curated a data patterns catalogue that is used by product teams around the world. We’ll show how patterns help teams build digital services that give people agency over data, build trust and start addressing systemic imbalances of power.

Victor Gonzalez is a lover of art, unknown places, long talks and technology that gives us time for all that.

For the last 15 years he has been closely involved in technology projects: initially as a software engineer in different industries, then as a business intelligence consultant supporting decision making, and in recent years managing innovation projects, while pursuing his taste for teaching at several universities.

He is currently the CIO of a Mexican financial institution, where he has spent the last three years driving its digital transformation.

Presentations

From brick and mortar to digital first Company: Data-driven digital transformation Session

Victor Gonzalez explores how the fintech ecosystem is changing the rules of the financial services industry in Mexico. ConCrédito’s data-driven digital transformation project is the basis for the growth and scope of its business objectives: the traditional business model needed to migrate to digital processes that put the company in the hands of its customers.

Dr Martin Goodson is Chief Scientist and CEO of Evolution AI, and a specialist in machine reading technologies. He is the Chair of the Royal Statistical Society Data Science Section, the professional body for data science in the UK. He also runs the largest machine learning community in Europe.
Martin’s work has been covered in the Economist, Quartz, Business Insider and TechCrunch.

Presentations

How RBS and Dun & Bradstreet use AI to automate back-office tasks Session

Automating mundane back-office tasks has been a long-standing headache for businesses under pressure to increase efficiency. Recent breakthroughs in computer vision and machine learning finally allow the automation of time-consuming document processing tasks. Dr Martin Goodson gives an account of successful AI projects that are automating tasks at RBS and Dun & Bradstreet.

Sunil Goplani is a group development manager at Intuit, leading the big data platform. Sunil has played key architecture and leadership roles in building solutions around data platforms, big data, BI, data warehousing and MDM for startups and enterprises. Previously, Sunil served in key engineering positions at Netflix, Chegg, Brand.net, and a few other startups. Sunil has a master’s degree in computer science.

Presentations

Always accurate business metrics through Lineage based Anomaly Tracking Session

Imagine a business metric showing a sudden spike. Debugging data pipelines is non-trivial, and finding the root cause can take hours to days! We’ll share how Intuit built a self-serve tool that automatically discovers data pipeline lineage and applies anomaly detection to proactively detect and help debug issues in minutes, establishing trust in metrics and improving developer productivity by 10-100X.

Trevor Grant is an Apache Software Foundation Member involved in multiple projects such as Mahout, Streams, and SDAP-incubating, just to name a few. He holds an MS in applied math and an MBA from Illinois State University. He speaks about computer stuff internationally. He has taken numerous classes in stand-up and improv comedy to make his talks more pleasant for you, the listener.

Presentations

Ship it! A practitioner's guide to model management and deployment with Kubeflow. Session

We'll show you a way to get & keep your models in production with Kubeflow.

Morgan Gregory leads strategy and programs for Google Cloud’s Office of the CTO, the mission of which is to foster collaborative innovation between Google and its most strategic customers around the world. Her technical focus area is AI, and she has a passion for responsible AI, as well as AI for science and AI for good. By keeping her finger on the pulse of the Office’s engagements and leveraging the deep technical and industry expertise across the team, she identifies themes that are relevant and important for today’s technical leaders. Previously, Morgan was in management consulting at the Boston Consulting Group (BCG), where she advised F100 companies in the technology, financial services, and pharmaceutical industries. She started her career as a software engineer and product manager building partnering products and solutions for tech companies at Partnerpedia (later acquired by BMC) a startup in Vancouver, Canada. Morgan earned her BSc in computer science from the University of British Columbia and an MBA from the MIT Sloan School of Management. Find Morgan on LinkedIn or on Twitter as @morganjgregory.

Presentations

Responsible AI: the importance of getting it right and the harm of getting it wrong Session

The adoption of AI is accelerating at an increasing pace. We’re reaping many benefits from the advancement of AI, but we’re also seeing hints of the unintended harm that occurs when responsibility isn’t front and center. It’s critical for us to understand how and why this happens so we can build our future responsibly, with AI that is fair, safe, trustworthy, and green.

Anna Gressel is a litigation associate at Debevoise & Plimpton LLP and a member of the firm’s Commercial Litigation Group and Technology, Media & Telecommunications practice. Her practice focuses on complex civil litigation in federal and state courts, and she advises on legal and regulatory issues around artificial intelligence and other emerging technologies. She’s the coauthor of publications including “German Report May Be Road Map for Future AI Regulation,” “Storm Clouds or Silver Linings? Assessing the Impact of the U.S. CLOUD Act on Cross-Border Criminal Investigations,” and “Do the Apps Have Ears? Cross-Device Tracking.” She sits on the board of directors of Ms. JD, a nonprofit organization dedicated to the success of women in law school and the legal profession. She’s a member of the Law Committee of the IEEE Global Initiative on the Ethics of Autonomous and Intelligent Systems.

Presentations

AI impact assessments: Tools for evaluating and mitigating corporate AI risks Session

The Canadian Government made waves when it passed a law requiring AI impact assessments for automated decision systems. Similar proposals are pending in the US and EU. Anna Gressel, Meeri Haataja, and Jim Pastore unpack what an AI impact assessment looks like in practice and how companies can get started from a technical and legal perspective, and they provide tips on assessing AI risk.

Regulation and Ethics of AI in FinTech: Emerging Insights from the U.S. and U.K. Session

This is a crash course on the emerging ethical and regulatory issues surrounding AI in FinTech. It will offer insights from recent statements by U.S. and U.K. regulators in the banking and financial services industries, and examine their priorities in 2020. It will also provide practical guidance on how companies can mitigate ethical and legal risks and position their AI products for success.

Rob studied Engineering, Economics and Management at Oxford University, graduating in 2007 with First Class Honours. Prior to founding Mindful Chef in 2015, he worked as an Interest Rate Options trader at Morgan Stanley, where he ran the Exotic Derivatives trading desk in New York. In 2018, he was nominated as one of the top 30 UK entrepreneurs under the age of 35 by Startups.co.uk.

Presentations

A recipe for innovation: recommending recipes based on adventurousness Session

Mindful Chef is a health-focused company that delivers weekly recipe boxes. In order to create a more personalised experience for their customers, they teamed up with Pivigo to develop an innovative recommender system. In this talk we will discuss this project and the development of a novel approach to understanding user taste that had an unexpectedly large impact on recommendation accuracy.

Sijie Guo is the founder and CEO of StreamNative. StreamNative is a data infrastructure startup offering a cloud native event streaming platform based on Apache Pulsar for enterprises. Previously, he was the tech lead for the Messaging Group at Twitter and worked on push notification infrastructure at Yahoo. He is also the VP of Apache BookKeeper and PMC Member of Apache Pulsar.

Presentations

The secrets behind Apache Pulsar for processing tens of billions of transactions per day Session

Apache Pulsar, a cloud-native event streaming platform, is gaining more and more adoption in mission-critical services due to its strong consistency and durability guarantees. This presentation dives deep into the technical details driving the Pulsar adoption trend and showcases a real-world example of using Apache Pulsar to process billions of transactions every day.

Meeri is the CEO and Co-Founder of Saidot, a start-up with a mission of enabling responsible AI ecosystems. Saidot develops technology for end-user AI explainability, transparency, and independent validation. Meeri was the chair of the ethics working group in Finland’s national AI program, which submitted its final report in March 2019. In this role she initiated a national AI ethics challenge and engaged more than 70 organizations to commit to the ethical use of AI and define ethics principles. Meeri is also the Chair of IEEE’s initiative for the creation of AI ethics certificates in the ECPAIS program (Ethics Certification Program for Autonomous and Intelligent Systems).

Meeri is an Affiliate at the Berkman Klein Center for Internet & Society at Harvard University during academic year 2019-2020 with a focus on projects related to building citizen trust through AI transparency & open informing.

Prior to starting her own company, Meeri led AI strategy and GDPR implementation at OP Financial Group. Meeri has a long background in analytics and AI consulting with Accenture Analytics. During her Accenture years she drove data and analytics strategies and large AI implementation programs in the media, telecommunications, high-tech and retail industries. Meeri started her career as a data scientist in telecommunications after completing her M.Sc. (Econ.) at the Helsinki School of Economics.

Meeri is an active advocate of responsible and human-centric AI. She’s an experienced public speaker who regularly speaks at international conferences and seminars on AI opportunities, AI ethics and governance.

Presentations

AI impact assessments: Tools for evaluating and mitigating corporate AI risks Session

The Canadian Government made waves when it passed a law requiring AI impact assessments for automated decision systems. Similar proposals are pending in the US and EU. Anna Gressel, Meeri Haataja, and Jim Pastore unpack what an AI impact assessment looks like in practice and how companies can get started from a technical and legal perspective, and they provide tips on assessing AI risk.

Hatem Hajri is a senior research scientist at IRT SystemX, where he mainly works on robustness and adversarial attacks of artificial intelligence-based systems. Previously, he held three teaching and research positions at University Paris 10, Luxembourg University, and the University of Bordeaux, where he worked on various problems of stochastic analysis and graphical models, and at the VeDeCoM Institute at Versailles, France, where he conducted research on autonomous driving. He earned the French agrégation of mathematics and his MS and PhD degrees in applied mathematics at Paris Sud University, France.

Presentations

A probabilistic approach to adversarial machine learning Session

Adversarial machine learning studies vulnerabilities of machine learning algorithms in adversarial settings and develops techniques to make learning more robust to adversarial examples. Hatem Hajri outlines adversarial machine learning and illustrates a new approach to address the problem of adversarial examples based on probabilistic techniques.

Rasmus Halvgaard specialises in model predictive control and combines this deep knowledge with data-driven methods to leverage the best of both worlds.

Presentations

Radar-based flow prediction in water networks for better real-time decision making Session

We combine traditional predictive models with deep learning methods to improve operation of wastewater treatment plants. This data-driven approach relies on weather radar data that replaces local and often sparsely located rain gauge sensor stations. Our approach allows for fast and probabilistic forecasts that robustly improve real-time operation of the urban drainage system.

Jonny is a Senior Data Scientist on NVIDIA’s Healthcare team who specialises in the application of artificial intelligence in the fields of radiology, histopathology and genomics. His work is all about accelerating the uptake of AI and making it easy for people to get started and get the most out of their hardware. Prior to joining NVIDIA, Jonny spent several years as a Solution Architect within Intel’s Health & Life Sciences team. Although originally trained as a product designer, Jonny has spent the majority of his career in software development, with roles ranging from Engineer to Technical Director; the theme of most of this work has been the automation of image-related tasks, usually within the NHS and public sector.

Presentations

Federated Learning for Healthcare Session

Federated Learning is a relatively new technique pioneered to allow much larger datasets to be used to train machine learning models without the need to share potentially sensitive data. This makes the technique ideal for the healthcare sector, in which patient data is highly sensitive but there is a huge need to increase the amount of training data to get models to clinically viable levels.

Adam Hill is a lead data scientist at HAL24K working with client projects to deliver smart-city and smart-infrastructure solutions. He has worked within the traffic and water sectors to deliver machine learning models that enable decision support and insight. Adam is also currently a Royal Society Entrepreneur in Residence encouraging innovation and entrepreneurial activities within academia around data science topics and tools. He is also a long-term, core volunteer within the DataKind UK community supporting data science projects delivering social good for NGOs. Adam holds a PhD in Astrophysics from the University of Southampton.

Presentations

Beyond smart infrastructure: leveraging satellite data to detect wildfires Session

Wildfires are a major environmental and health risk, and their frequency has increased dramatically in the past decade. Early detection is critical; however, most wildfires are only discovered through eyewitness accounts. In this talk we will describe a data science partnership between HAL24K and Pivigo aimed at building an automated wildfire detection system using NOAA satellite data.

Rainer Hoffmann is Senior Manager Data & AI at EnBW. In his role he works at the interface between data science and internal customers and identifies AI use cases across the whole company. He has led numerous AI projects from ideation to production.

Presentations

AI@Scale Driving the German Energiewende Session

Almost two years ago we at EnBW developed our core beliefs for the role of AI at EnBW and derived concrete actions that need to be taken in order to scale our AI activities. In our talk, we will give an overview on those aspects and will describe the challenges we have been facing on our journey so far. Further, we will describe the particular approaches we took to master these challenges.

Rick Houlihan is a principal technologist and leads the NoSQL blackbelt team at AWS and has designed hundreds of NoSQL database schemas for some of the largest and most highly scaled applications in the world. Many of Rick’s designs are deployed at the foundation of core Amazon and AWS services such as CloudTrail, IAM, CloudWatch, EC2, Alexa, and a variety of retail internet and fulfillment-center services. Rick brings over 25 years of technology expertise and has authored nine patents across a diverse set of technologies including complex event processing, neural network analysis, microprocessor design, cloud virtualization, and NoSQL technologies. As an innovator in the NoSQL space, Rick has developed a repeatable process for building real-world applications that deliver highly efficient denormalized data models for workloads of any scale, and he regularly delivers highly rated sessions at re:Invent and other AWS conferences on this specific topic.

Presentations

Where's my Lookup Table? Session

When Amazon decided to migrate thousands of application services to NoSQL, many of those services required complex relational models that could not be reduced to simple key-value access patterns. The most commonly documented use cases for NoSQL are simplistic. This session shows how to model complex relational data efficiently in denormalized structures.
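
As a rough illustration of the denormalized modeling the session covers, the adjacency-list pattern can be sketched with a plain Python dict standing in for a key-value store. The keys, item shapes, and helper functions here are hypothetical, not AWS API calls.

```python
# One "table" keyed by (partition_key, sort_key); heterogeneous item types
# share a partition so one query answers what a relational join would.
table = {}

def put(pk, sk, item):
    table[(pk, sk)] = item

def query(pk):
    """Fetch all items in one partition, in sort-key order (a single 'query')."""
    return [item for (p, sk), item in sorted(table.items(), key=lambda kv: kv[0])
            if p == pk]

# Instead of ORG and USER tables joined through a lookup table, store both
# item types under the org's partition key.
put("ORG#acme", "META", {"name": "Acme Corp", "tier": "enterprise"})
put("ORG#acme", "USER#alice", {"role": "admin"})
put("ORG#acme", "USER#bob", {"role": "analyst"})

items = query("ORG#acme")  # org metadata plus its users, no join needed
```

The design choice is to colocate every item a common access pattern needs under one partition key, trading duplicated storage for a single cheap query instead of a join.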

Oliver Hughes is an engineer on the Spring Cloud Services team.

Presentations

Kubernetes Distilled - An in-depth guide for the busy data engineer 1-day training

Today's data engineer needs a deep understanding of the key tools and concepts within the vast, rapidly evolving Kubernetes ecosystem. This training will provide developers with a thorough grounding in Kubernetes concepts, suggest best practices and get hands-on with some of the essential tooling. Topics will include

Recent speaking Events –
- TEDx Bonnsquare – The anxiety and hope in a world full of algorithms
- Data + UX Conference – Build our algorithms before they build us
- Panel at Strata NYC with Cloudera
- CDX NYC – The future of Voice AI

Daniel is the Founder of gravityAI, one of the world’s first marketplaces for algorithms. Formerly he was the Head of Product Management at State Street Verus, where he led a team of over 30 designers, engineers, data scientists, and SMEs. Verus is a first-of-its-kind mobile application that uses NLP, machine learning and a knowledge graph to make connections between an investor’s portfolio and news. He recently spoke about his experience developing this product at Strata NYC.

Prior to State Street, Dan led product management on similar AI platforms at Boston Consulting Group Digital Ventures.

When Dan isn’t working on his startup you can find him teaching entrepreneurship, product management, and data for enterprise at CUNY or General Assembly.

Presentations

Algorithm Commoditization: build vs. buy decisions from a product manager perspective Session

Many types of algorithms have become commoditized, yet companies continue to use tight resources to try to build these in-house all the time. Considering that according to Gartner, nearly 9 out of 10 (87%) internal data science projects fail to make it into production, it's crazy to focus resources on anything but the most proprietary of projects. How do you decide where to focus?

Viacheslav Inozemtsev is a data engineer at Zalando, building an internal data lake platform on top of Apache Spark, Delta Lake, Apache Presto, and serverless cloud technologies, and enabling machine learning and AI for all teams and departments of the company. He has 8 years of data and software engineering experience. He earned a degree in applied mathematics, and then an MSc degree in computer science with a focus on data processing and analysis.

Presentations

Lambda Architecture with Apache Spark Structured Streaming and Delta Lake tables Session

Lambda architecture is a general-purpose architecture for data platforms. It has been known for a while but was always hard to implement. With the release of Delta Lake tables, and now that Spark Structured Streaming has matured, Lambda architecture has gained a new lease of life and can be implemented far more easily than ever before for various analytical and machine learning use cases.

Charu Jaiswal is a Machine Learning Scientist at Integrate.ai, one of the fastest growing companies in Canadian history. She builds predictive models to help large enterprises like insurance companies and banks become more customer-centric. Charu completed her Masters in machine learning and industrial engineering from the University of Toronto. Prior to Integrate.ai, she also applied machine learning to the venture capital and energy storage industries.

Presentations

Machine Learning Models after Deployment: Testing, Monitoring, and Re-training Session

You train ML models and deploy them into the wild. What next? The performance of your models will decrease over time as business operations and customer behaviours change. You may only notice months later, after incurring costly results. In this session, the audience will learn how to fight back against performance loss by monitoring, testing, and retraining ML models actively in production.

Asif Jan is a Group Director in Personalized Healthcare (PHC) Data Science at Roche, Switzerland, where he leads a multidisciplinary team of scientists specializing in computer science, neuroscience, and statistics. The team implements a variety of statistical and machine learning methods on real-world datasets (e.g., electronic medical records, health insurance claims, disease registries) to fulfil the evidence and data analysis needs of the Neuroscience disease area at Roche. Previously, he was Head of Data Science at Roche Diagnostics, leading a team of quantitative scientists supporting in-vitro diagnostics (IVD) and clinical decision support (CDS) product development, and defined the data strategy enabling use of real-world data in Roche Diagnostics. Earlier, he held a number of roles overseeing technology strategy development, enterprise and solution architecture, and program management at Roche and other research organisations. Asif has vast experience in building and leading data science teams in Pharma, Diagnostics, and industrial research institutes, tackling complex scientific and business problems.

Presentations

Data Science for Enabling Personalized Healthcare Session

Advances in AI/ML are critical to advancing our understanding of disease and to bringing better and more efficacious treatments to patients, realising the dream of personalized healthcare. In this talk I will share lessons learned from building data science teams in Pharma and outline a roadmap for the success of AI/ML in the Pharma industry.

Grishma Jena is a Data Scientist with the UX Research and Design team at IBM Data & AI in San Francisco. She works across portfolios in conjunction with user research and design teams and uses data to understand users’ struggles. Previously, she was a mentor for the nonprofit AI4ALL’s AI Project Fellowship, where she guided a group of high school students on using AI for prioritizing 911 EMS calls. Grishma also teaches Python at the San Francisco Public Library. She enjoys delivering talks and is passionate about encouraging women and youngsters in technology. She holds a master’s degree in computer science from the University of Pennsylvania. Her research interests include machine learning and natural language processing.

Presentations

Data Wrangling with Python 2-Day Training

Data science is rapidly changing every industry. This has resulted in a shift away from traditional software development towards data-driven decision making. In this training, we will use Python to extract, wrangle, explore, and understand data so that we can leverage it in the real world.

Data Wrangling with Python (Day 2) Training Day 2

Data science is rapidly changing every industry. This has resulted in a shift away from traditional software development towards data-driven decision making. In this training, we will use Python to extract, wrangle, explore, and understand data so that we can leverage it in the real world.

Pravin Jha is a senior data scientist at Ameren, an American power company. He contributes to the customer analytics domain with his expertise in machine learning and natural language processing. He has more than five years of academic research experience in engineering and data analytics. He also holds a professional engineering license and has more than five years of professional experience in the construction industry. He earned his PhD in engineering science from Southern Illinois University Carbondale. He’s an avid NBA fan and enjoys watching basketball in his free time.

Presentations

Writer-independent offline signature verification in banks using few-shot learning Session

Offline signature verification is one of the most critical tasks in traditional banking and financial industries. The unique challenge is to detect subtle but crucial differences between genuine and forged signatures. This verification task is even more challenging in writer-independent scenarios. Tuhin Sharma and Pravin Jha detail few-shot image classification.

Ken Johnston is the principal data science manager for the Microsoft 360 Business Intelligence Group (M360 BIG). In his time at Microsoft, Ken has shipped many products, including Commerce Server, Office 365, Bing Local and Segments, and Windows, and for two and a half years, he was the director of test excellence. A frequent keynote presenter, trainer, blogger, and author, Ken is a coauthor of How We Test Software at Microsoft and contributing author to Experiences of Test Automation: Case Studies of Software Test Automation. He holds an MBA from the University of Washington. Check out his blog posts on data science management on LinkedIn.

Presentations

Infinite segmentation: Scalable mutual information ranking on real-world graphs Session

Today, normal growth isn't enough—you need hockey-stick levels of growth. Sales and marketing orgs are looking to AI to "growth hack" their way to new markets and segments. Ken Johnston and Ankit Srivastava explain how to use mutual information at scale across massive data sources to help filter out noise and share critical insights with new cohorts of users, businesses, and networks.
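
As a toy sketch of the ranking idea (not Microsoft's implementation; the attribute names and counts below are made up), candidate segment attributes can be scored by their mutual information with cohort membership and sorted:

```python
from math import log

def mutual_information(counts):
    """MI (in nats) between two binary variables, given a 2x2 contingency
    table counts[x][y] of co-occurrence counts."""
    total = sum(sum(row) for row in counts)
    px = [sum(row) / total for row in counts]
    py = [sum(counts[x][y] for x in range(2)) / total for y in range(2)]
    mi = 0.0
    for x in range(2):
        for y in range(2):
            pxy = counts[x][y] / total
            if pxy > 0:
                mi += pxy * log(pxy / (px[x] * py[y]))
    return mi

# Hypothetical attribute-vs-cohort co-occurrence counts.
signals = {
    "visited_pricing_page": [[40, 10], [10, 40]],  # strongly informative
    "uses_mobile_app":      [[25, 25], [25, 25]],  # independent of the cohort
}
ranked = sorted(signals, key=lambda k: mutual_information(signals[k]), reverse=True)
```

At production scale the same scoring would be distributed across a massive graph, but the ranking principle is unchanged: attributes independent of the cohort score near zero and drop out as noise.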

Kim Falk is a senior data scientist at IKEA, where he’s part of a small, dedicated team focusing on real-time promotions. Previously, Kim worked on recommender systems in scenarios like retargeting ads and video-on-demand sites. He’s also worked on classifying Danish legal documents using NLP. He’s the author of Practical Recommender Systems.

Presentations

Deep reinforcement learning for personalized promotions at IKEA Session

Around the world, IKEA has an ever-growing number of loyalty club (Family) members. An important part of IKEA’s ongoing digital transformation is to improve communication with these customers and to inspire them with offers that are most relevant for improving their everyday life. Kim Falk shares IKEA's work on personalizing promotional emails.

Robin has actively contributed to building products and platforms that accelerate the digital transformation of industries using the power of data. He works as the Chief Data and Analytical Officer for wefox, the largest insurtech in Europe, which was named one of the top 10 hottest FinTech companies in the world by Business Insider. Prior to this he held many senior leadership roles in both Fortune 100 companies like Cisco and agile FinTech startups.

Robin will be the CTO and Managing Director of the firm’s new credit risk assessment startup.

Presentations

How Explainable AI can Solve AI Adoption Hurdles Session

A key challenge to AI adoption is the lack of transparency of black-box models. This talk shows how a Berlin-based startup democratized credit risk assessment with explainable AI. The black-box nature of AI raises concerns about adoption, regulation and ethical use. We present the hope that explainable AI can not only solve this problem but, in doing so, make the world a better place.

Anthony Joseph is a technology cofounder of a property tech startup and an Australian software engineer and mathematician. He earned his degree from MBT and enjoys teaching and learning coding within the Australian startup scene.

Presentations

Applying machine learning to wearable technologies for exercise technique management Session

IoT devices are increasing in power and capability, now allowing developers to use machine learning models on the device. Anthony Joseph analyzes a boxing training session with motion sensors onboard IoT devices using the TensorFlow framework and provides user feedback on technique and speed.

Davin Kaing is a Data Scientist on the Client Advocacy team at IBM where he applies statistics, causal inference, and machine learning to uncover driving factors of client experience and generate insights to improve IBM client experience. Prior to IBM, Davin was a Data Scientist and Consultant for various start-ups in a variety of industries including healthcare, finance, and cyber insurance. He holds a Master’s in Statistics from Columbia University, a Master’s in Data Science from the George Washington University, and a Bachelor’s in Bioengineering from the University of the Pacific.

Presentations

Causal Inference Using Observational Data Session

What is driving revenue? How can we improve our client experience? These are causal questions that many organizations face. Answering these questions using data can be challenging, especially since in most cases, only observational data are available. We will go through an overview of both traditional and modern causal inference techniques and address their limitations and applications.

Swasti Kakker is a software development engineer on the data team at LinkedIn. Her passion lies in increasing and improving developer productivity by designing and implementing scalable platforms. In her two-year tenure at LinkedIn, she's worked on the design and implementation of hosted notebooks, which provide a hosted solution for Jupyter notebooks, and she's worked closely with stakeholders to understand the expectations and requirements of a platform that would improve developer productivity. Previously, she worked with the Spark team on making Spark History Server more scalable to handle the traffic from Dr. Elephant. She's also contributed Spark heuristics to Dr. Elephant after understanding the needs of its stakeholders (mainly Spark developers), which gave her good knowledge of Spark infrastructure, Spark parameters, and how to tune them efficiently.

Presentations

Darwin: Evolving hosted notebooks at LinkedIn Session

Come and learn about the challenges we overcame to make Darwin (Data Analytics and Relevance Workbench at LinkedIn) a reality. Learn how data scientists, developers, and analysts at LinkedIn can share their notebooks with their peers, author work in multiple languages, use custom execution environments, execute long-running jobs, and do much more on a single hosted notebooks platform.

Amit Kapoor is a data storyteller at narrativeViz, where he uses storytelling and data visualization as tools for improving communication, persuasion, and leadership through workshops and trainings conducted for corporations, nonprofits, colleges, and individuals. Interested in learning and teaching the craft of telling visual stories with data, Amit also teaches storytelling with data for executive courses as a guest faculty member at IIM Bangalore and IIM Ahmedabad. Amit’s background is in strategy consulting, using data-driven stories to drive change across organizations and businesses. Previously, he gained more than 12 years of management consulting experience with A.T. Kearney in India, Booz & Company in Europe, and startups in Bangalore. Amit holds a BTech in mechanical engineering from IIT, Delhi, and a PGDM (MBA) from IIM, Ahmedabad.

Presentations

Democratize and build better deep learning models using TensorFlow.js Session

Bargava Subramanian and Amit Kapoor use two real-world examples to show how you can quickly build visual data products using TensorFlow.js to address the challenges inherent in understanding the strengths, weaknesses, and biases of your models as well as involving business users to design and develop a more effective model.

Holden Karau is a transgender Canadian software engineer working in the Bay Area. Previously, she worked at IBM, Alpine, Databricks, Google (twice), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She’s a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work, she enjoys playing with fire, riding scooters, and dancing.

Presentations

Ship it! A practitioner's guide to model management and deployment with Kubeflow. Session

We'll show you a way to get and keep your models in production with Kubeflow.

Meher Kasam is an iOS software engineer at Square and is a seasoned software developer with apps used by tens of millions of users every day. He’s shipped features for a range of apps from Square’s point of sale to the Bing app. Previously, he worked at Microsoft, where he was the mobile development lead for the Seeing AI app, which has received widespread recognition and awards from Mobile World Congress, CES, FCC, and the American Council of the Blind, to name a few. A hacker at heart with a flair for fast prototyping, he’s won close to two dozen hackathons and converted them to features shipped in widely used products. He also serves as a judge of international competitions including the Global Mobile Awards and the Edison Awards.

Presentations

30 golden rules to speed up TensorFlow performance Session

Meher Kasam, Anirudh Koul, and Siddha Ganju highlight the must-have checklist for everyday AI practitioners to speed up your deep learning training and inference with TensorFlow code examples.

Phil Kendall (he/him) is the chief innovation officer at Intercept IP, a small UK company that produces a low-power black box for the motor insurance market, where he leads the R&D efforts, ensuring its products remain cutting edge. Having made a start on the ZX Spectrum, he has experience across various industries, from telematics to enterprise virtualization software.

Presentations

Implementing device-specific learning for IoT devices Session

Philip Kendall offers a look at the challenges involved in training and deploying a unique model to each of tens of thousands of Arduino-class IoT devices to minimize power use and maximize lifetime. The solution involves a high-level simulation of the system on the backend to perform the training and a custom virtual machine on the device to implement the learned model.

Scott Kidder has been building video encoding and delivery platforms for over 12 years (MobiTV, Brightcove/Zencoder, and now Mux). He’s currently a Staff Software Engineer working on the Mux Data service, which provides real-time and historical analytics for internet video playback. Scott has built high-volume stream-processing applications for Mux Data and Mux Video (Mux's full-service video encoding and distribution service) that have served some of the most widely watched video streams on the internet (the World Cup, the NFL Super Bowl). His interests include Kafka, Flink, Kubernetes, and Go.

Presentations

Stateful Stream Processing with Kafka and Go Session

Learn how the Mux Data service has leveraged Kafka and Go to build stateful stream-processing applications that operate on extremely high volumes of video-view beacons to drive real-time monitoring dashboards and historical metrics representing a viewer’s quality of experience. We’ll also cover fault tolerance, monitoring, and Kubernetes container deployments.

Kevin is Head of the Data Group at Socar, the largest car-sharing company in Korea. He cofounded Between, a couples app with 20 million downloads, and worked as a developer and data scientist until Socar acquired his company. He is also an open source enthusiast and a committer and PMC member on the Apache Zeppelin project.

Presentations

Redefining the Car Sharing Industry with Data Science Session

Socar, one of the largest car-sharing fleet operators in the world, is seriously focused on data operations. You'll hear how Socar is redefining the car-sharing industry with data science, including an experiment-based pricing strategy, machine learning-based demand prediction, optimized car management, accident risk profiling, and much more.

Stavros Kontopoulos is a principal engineer at Lightbend. Previously, he built scalable software solutions in verticals such as telecoms and marketing. His interests include distributed system design, streaming technologies, and NoSQL databases.

Presentations

Online machine learning in streaming applications: adapt to change with limited resources Session

In this talk, we discuss choices of online machine learning algorithms for streaming applications, motivated by resource-constrained use cases like IoT and personalization. We cover drift detection algorithms and Hoeffding adaptive trees, performance metrics for online models, and practical concerns with deployment in production. We also provide code examples for each technique.

Gabor Kotalik is a big data project lead at Deutsche Telekom, where he’s responsible for continuous improvement of customer analytics and machine learning solutions for commercial roaming business. He has more than 10 years of experience in business intelligence and advanced analytics focusing on utilization of insights and enabling data-driven business decisions.

Presentations

Machine Learning processes at Deutsche Telekom Global Carrier Session

Deutsche Telekom is the fourth-biggest telecommunications company in the world, and every day millions of its customers use mobile services while roaming. This presentation covers how we designed and built our machine learning processes on top of a Cloudera Hadoop cluster to support the commercial roaming business at Deutsche Telekom Global Carrier.

Melanie wants a world where robots do all the boring, repetitive stuff for her so she can spend her time doing not boring, repetitive stuff.

As a mathematician-turned-programmer with more than 10 years of experience, she has worked at universities, Booz Allen, PwC, and Capgemini analyzing data, producing cool demonstrations of artificial intelligence, discovering (inventing?) mathematics, project managing large technical implementations, and trying to keep people from freaking out. After work, she spends most evenings working on homework and studying as she is trying to complete a M.S. in Computer Science from Georgia Tech. Melanie is extremely passionate about artificial intelligence and has spoken at multiple conferences such as Grace Hopper and Lesbians Who Tech, where her talks are well received by diverse audiences.

When she’s not coding, which is almost all the time, you can find her figuring out how to cook vegan keto recipes, taking care of her paraplegic and geriatric cats and dogs, and trying to raise a decent human being to take over the world.

Presentations

If I only had a brain: putting the intelligence in intelligent automation Session

Traditional automation is typically limited to clear-cut business rules that can be easily programmed. We expand what automation can do by adding eyes (computer vision), a brain (general AI models), and speech (natural language processing) to enhance what our automations can accomplish.

Jonathan Leslie is head of data science at Pivigo, where he works with business partners to develop data science solutions that make the most of their data, including in-depth analysis of existing data and predictive analytics for future business needs. He also programs, mentors, and manages teams of data scientists on projects in a wide variety of business domains.

Presentations

Bringing innovation to online retail: automating customer service using NLP Session

MADE.com is a furniture and homewares retailer with a unique online-only business model. Given this format, it is crucial that customer service agents are able to respond to queries quickly and accurately. However, it can often be difficult to match the demand of incoming requests. We'll describe a project aimed at developing a framework for automated responses to customer queries.

Marko Letic is a front-end engineer, lecturer, and data visualization scientist. He currently leads the front-end team at AVA, a Berlin-based company, where he works on a platform that combines big data, pattern recognition, and artificial intelligence to take the safety of individuals, organizations, cities, and countries to a whole new level. His main role is to create a contextual analysis of the processed data through a web-based client application. Marko is also a Tech Speaker at Mozilla, promoting the values of the open web, and one of the organizers of Armada JS, the first JS conference in Serbia. He holds an MSc degree in computer science and is pursuing his PhD in data visualization. He sometimes writes novels that will probably never get published, as he spends too much time coding.

Presentations

Saving the world with JavaScript: A Data Visualization story Session

Did you know that the beginnings of data visualization are strongly tied to solving some of the biggest problems humanity has ever faced? Wouldn’t it be more interesting to say that you’re not a doctor, but you do save lives than to say you’re just a developer? If you want to know more, join me on this trip through time and beyond.

Simon Lidberg is a solution architect in Microsoft’s Data Insights Center of Excellence. He’s worked with database and data warehousing solutions for almost 20 years in a variety of industries, with a more recent focus on analysis, BI, and big data. Simon is the author of Getting Started with SQL Server 2012 Cube Development.

Presentations

(Partially) demystifying DevOps for AI Session

DevOps, DevSecOps, AIOps, ML Ops, Data Ops, No Ops....Ditch your confusion and join Simon Lidberg and Benjamin Wright-Jones to understand what DevOps means for AI and your organization.

Alexandre Lomadze is a senior data scientist at TBC Bank. He has broad experience using and developing machine learning algorithms for business projects in areas such as telecoms, HR, and banking, and he also teaches machine learning at the Free University of Tbilisi. Alexandre has a technical background in math and computer science but gets most excited about approaching data problems from a business perspective and driving toward optimal decisions. He holds a bachelor's degree from the Moscow Institute of Physics and Technology and a master's degree in computer science from Tbilisi State University, and he has twice medaled in the International Mathematical Olympiad.

Presentations

How a Failed Machine Learning Exercise Increased Deposit Profitability by 20% Session

We will tell you how our failed attempt to build an ML model brought us to discovering institutional problems and kicked off improvement of existing business processes so that we would collect quality data for future modeling; and how we still managed to increase deposit profitability by 20% in the process.

Markus Ludwig is a senior data scientist at Scout24, where he builds and deploys machine learning systems that power search and discovery. Previously, Markus worked as an academic researcher, lecturer, and consultant. He earned a PhD in computational finance from the University of Zurich, Switzerland.

Presentations

Transformers in the wild Session

Markus Ludwig shares insights from training and deploying a Transformer model that translates natural language to structured search queries. You'll cover the entire journey from idea to product: teaching the model new tricks, helping it forget bad habits, and iteratively refining the user experience.

Mike Lutz is an infrastructure lead at Samtec. Traditionally living in the data communications world, he stumbled into data (and big data) as a way to manage the floods of information generated in his many telemetry and internet of things adventures.

Presentations

Big Data for the Small Fry - Bootstrapping from onsite to Cloud Big Data Session

Netflix proposed a novel best practice of using Jupyter notebooks as glue for working in the big data/AI-processing domain. This presentation follows a manufacturing company's adventure in trying to implement Netflix's ideas on a dramatically smaller scale, working through how the idea can be useful even for the small fry.

Miguel Martínez is a Senior Deep Learning Solutions Architect at NVIDIA, where he concentrates on RAPIDS. Previously, he mentored students at Udacity’s Artificial Intelligence Nanodegree. He has a strong background in financial services, mainly focused on payments and channels. As a constant and steadfast learner, he is always up for new challenges.

Presentations

Accelerating Machine Learning and Graph Analytics by Several Orders of Magnitude with GPUs Session

GPU acceleration has been at the heart of scientific computing and artificial intelligence for many years now. Since the launch of RAPIDS last year, this vast computational resource has become available for data science workloads too. The RAPIDS framework is a GPU-accelerated drop-in replacement for utilities such as Pandas, Scikit-Learn, NetworkX and XGBoost.

Hamlet Jesse Medina Ruiz is a senior data scientist at Criteo. Previously, he was a control system engineer for Petróleos de Venezuela. Hamlet has finished near the top of multiple data science competitions, including fourth place in predicting return volatility on the New York Stock Exchange, hosted by the Collège de France and CFM in 2018, and 25th place in predicting stock returns, hosted by G-Research in 2018. Hamlet holds two master's degrees, in mathematics and machine learning, from Pierre and Marie Curie University, and a PhD in applied mathematics from Paris-Sud University in France, where he focused on statistical signal processing and machine learning.

Presentations

Predicting Criteo’s internet traffic load using Bayesian structural time series models Session

Criteo's infrastructure provides the capacity and connectivity to host Criteo’s platform and applications. The evolution of this infrastructure is driven by the ability to forecast Criteo's traffic demand. In this talk, we explain how Criteo uses Bayesian structural time series models to accurately forecast its traffic load and optimize hardware resources across data centers.

Apache Hadoop (HDFS) / Apache Ozone contributor.

Presentations

Distributed Tracing in Apache Hadoop Session

Distributed tracing is a well-known technique for identifying where failures occur and the reasons behind poor performance, especially in complex systems like Hadoop that involve many different components. We're a small team at Cloudera working on integrating OpenTracing into the Hadoop ecosystem. We'd like to present a demo of our current work and talk about our future integration plans.

Laurence Moroney is a developer advocate on the Google Brain team at Google, working on TensorFlow and machine learning. He’s the author of dozens of programming books, including several best sellers, and a regular speaker on the Google circuit. When not Googling, he’s also a published novelist, comic book writer, and screenwriter.

Presentations

Zero to hero with TensorFlow 2.0 Session

Laurence Moroney explores how to go from wondering what machine learning (ML) is to building a convolutional neural network to recognize and categorize images. With this, you'll gain the foundation to understand how to use ML and AI in apps all the way from the enterprise cloud down to tiny microcontrollers using the same code.

Francesco Mucio is a BI architect at Zalando. The first time Francesco met the word data, it was just the plural of datum. Now he’s helping to redraw Zalando’s data architecture. He likes to draw data models and optimize queries. He spends his free time with his daughter, who, for some reason, speaks four languages.

Presentations

Data Engineering: The Worst Practices Session

Please sit down and play a game of data engineering worst-practices bingo with us. From cloud infrastructure to stream processing, from data lakes to analytics, come see what can go wrong and the reasoning behind those decisions. After collecting stories for almost 20 years, it is finally time to give back. And if you recognize your organization in some of them, well, we told you to sit down.

Jacques Nadeau is the CTO and co-founder of Dremio. He is also the PMC Chair of the open source Apache Arrow project, spearheading the project’s technology and community. Prior to Dremio, he was the architect and engineering manager for Apache Drill and other distributed systems technologies at MapR. In addition, Jacques was CTO and co-founder of YapMap, an enterprise search startup, and held engineering leadership roles at Quigo (AOL), Offermatica (ADBE), and aQuantive (MSFT).

Presentations

Real World Cloud Data Lakes: Examples and Guide Session

This talk will serve as a review of how to build a successful cloud data lake. It will cover key topics such as landing, ETL, security, cost/performance tradeoffs, and access patterns, as well as technologies such as Apache Arrow, Iceberg, and Spark, in the context of real-world customer deployments.

Working in many fields of machine learning, at the moment especially with water and wastewater treatment plants.

Presentations

Radar-based flow prediction in water networks for better real-time decision making Session

We combine traditional predictive models with deep learning methods to improve operation of waste water treatment plants. This data-driven approach relies on weather radar data that replaces local and often sparsely located rain gauge sensor stations. Our approach allows for fast and probabilistic forecasts that robustly improve real-time operation of the urban drainage system.

I specialize in big data, machine learning, and analytical platforms. A combination of a strong engineering background and presales experience has given me a broad understanding of the needs, objectives, and expectations surrounding data projects.

Working in the data field since 2011, I've had the opportunity to work with different companies, teams, and industries. I enjoy applying data management to solve real business problems, building an analytical data culture, and using data to make better decisions overall.

Presentations

Scaling a real-time recommender system for 350M users in a dynamic marketplace Session

OLX Group includes 20+ brands, more than 350M monthly active users, and millions of new items added to the platform daily. Recommender systems, of course, play a crucial part in our platform. This session highlights the data flows and core components used for building, serving, and continuously iterating on recommenders in such a dynamic marketplace.

Thomas Nield is the founder of Nield Consulting Group, LLC, and a professional author, conference speaker, and trainer at O’Reilly Media. He has authored two books: Getting Started with SQL (O’Reilly) and Learning RxJava (Packt). He regularly teaches classes on analytics, machine learning, and mathematical optimization, and he has written several popular articles, including “How It Feels to Learn Data Science in 2019” and “Is Deep Learning Already Hitting Its Limitations?”

Valuing problem solving over problem finding, Thomas believes in practical solutions, which are often unique to each industry.

Presentations

Large-Scale Machine Learning with Spark and Scikit-learn 2-Day Training

There has been an explosion of tools for machine learning, but two have emerged as practical go-to solutions: Scikit-Learn and Apache Spark. Using Python, we will cover examples in parallel (no pun intended!) for both of these tools and learn how to tackle machine learning at small, medium, and large scales.

Large-Scale Machine Learning with Spark and Scikit-learn Training Day 2

There has been an explosion of tools for machine learning, but two have emerged as practical go-to solutions: Scikit-Learn and Apache Spark. Using Python, we will cover examples in parallel (no pun intended!) for both of these tools and learn how to tackle machine learning at small, medium, and large scales.

Dr Sami Niemi has been working on Bayesian inference and machine learning for over 10 years and has published peer-reviewed papers in astrophysics and statistics. He has delivered machine learning models for industries including telecommunications and financial services. Sami has built supervised learning models to predict customer and company defaults, first- and third-party fraud, and customer complaints, and has used natural language processing for probabilistic parsing and matching. He has also used unsupervised learning in a risk-based anti-money-laundering application. Sami currently works at Barclays, where he leads a team of data scientists building fraud detection models and manages the UK fraud models.

Presentations

Implementing Machine Learning Models for Real-Time Transaction Fraud Detection Session

Predicting transaction payment fraud in real time is an important challenge that state-of-the-art supervised machine learning models can help solve. Over the last two years, Barclays has developed and tested different models and implementation solutions. In this talk, you'll learn how state-of-the-art machine learning models can be implemented while meeting strict real-time latency requirements.

Kim Nilsson is the CEO of Pivigo, a London-based data science marketplace and training provider responsible for S2DS, Europe’s largest data science training program, which has by now trained more than 650 fellows working on over 200 commercial projects with 120+ partner companies, including Barclays, KPMG, Royal Mail, News UK, and Marks & Spencer. An ex-astronomer turned entrepreneur with a PhD in astrophysics and an MBA, Kim is passionate about people, data, and connecting the two.

Presentations

A recipe for innovation: recommending recipes based on adventurousness Session

Mindful Chef is a health-focused company that delivers weekly recipe boxes. To create a more personalised experience for its customers, it teamed up with Pivigo to develop an innovative recommender system. In this talk, we'll describe this project and the development of a novel approach to understanding user taste that had an unexpectedly large impact on recommendation accuracy.

Senior data scientist and data science lead on Deliveroo Plus, with an MA in economics.

Presentations

Getting an Edge with Network Analysis with Python Session

This talk will introduce network analysis and show what a powerful and impactful tool it is. Using a plethora of real-world examples and friendly Python syntax, audience members will be equipped - and hopefully inspired - to start their own journey with network analysis!

Tristan O’Gorman is a product architect with IBM Watson IoT, specializing in asset management solutions with a focus on applications of artificial intelligence. Previously, he worked in a variety of software product development roles. Tristan has advanced degrees, including applied data science, from National University of Ireland, Galway; University of Limerick; and Technical University Dublin. In his spare time, he’s busy with his two boys and enjoys tennis and photography.

Presentations

Data-driven predictive maintenance: Has the promise of IIoT been realized? Session

The advance of the industrial internet of things (IIoT) promised much, particularly in the area of predictive maintenance. Tristan O'Gorman digs into whether or not those promises have been realized. You'll learn about the particular technical and strategic challenges that organizations seeking to adopt predictive maintenance have to overcome.

Florian Ostmann is a Policy Fellow within the Public Policy Programme at the Alan Turing Institute, the UK’s national institute for data science and artificial intelligence. His research interests are centred around applications of data science and AI in the public sector, ethical and regulatory questions in relation to emerging technologies across different sectors of the economy, and the future of work and social welfare systems. Among other responsibilities, he currently acts as a principal investigator for projects on the use of AI in financial services and criminal justice.

Florian is a member of the Law Committee for the IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. Prior to joining the Turing, Florian was a Research Associate at the Shorenstein Center on Media, Politics and Public Policy where he conducted research on questions of fairness and transparency in the context of algorithmic decision-making. His previous experience also includes working for the Pan American Health Organization and serving as a consultant on responsible investing and human rights due diligence in the private sector (with a focus on modern slavery risks), autonomous vehicle policy, and social impact measurement. Florian holds a Master in Public Policy from the Harvard Kennedy School and a PhD in Political Philosophy from University College London.

Presentations

Regulation and Ethics of AI in FinTech: Emerging Insights from the U.S. and U.K. Session

This is a crash course on the emerging ethical and regulatory issues surrounding AI in FinTech. It will offer insights from recent statements by U.S. and U.K. regulators in the banking and financial services industries, and examine their priorities in 2020. It will also provide practical guidance on how companies can mitigate ethical and legal risks and position their AI products for success.

Professor Lukumon Oyedele is the founding director of the Big Data, Enterprise and Artificial Intelligence Lab (Big-DEAL) at the University of the West of England (UWE) Bristol, where he is currently the Assistant Vice-Chancellor, Digital Innovation and Enterprise. His research focuses on the transformation of the UK construction industry for improved productivity and performance using emerging digital technologies, including artificial intelligence (AI), big data, machine learning and deep learning, the internet of things (IoT), natural language processing, and augmented reality/virtual reality. His cross-disciplinary research has culminated in strategic partnerships with businesses to stimulate improved productivity and value delivery within the architectural, engineering, and construction (AEC) industries. Prof. Oyedele has a substantial track record of managing and delivering large-scale, applied, collaborative, multiyear research projects to the tune of £18 million, and the impact of these projects evidences his knack for employing emerging technologies to address diverse challenges confronting large businesses and SMEs within the AEC industries. He currently leads a cross-disciplinary team of world-class researchers, including computer scientists, data scientists, BIM modellers, civil engineers, architects, planners, electrical engineers, sociologists, psychologists, and financial analysts.

Presentations

Conversational-AI and Augmented Reality for Supporting Frontline Construction Workers Session

This talk presents the use of conversational AI and augmented reality to interact with BIM. The aim is to make it possible for onsite construction workers to seek support from BIM through verbal queries and augmented displays.

Maziyar Panahi is a data scientist at John Snow Labs and an active contributor to the Spark NLP open-source project. He is also a lead big data engineer and project manager at the Institut des Systèmes Complexes in Paris, overseeing a platform with over 110 billion documents on 120+ servers and 120+ TB of HDFS storage. Maziyar has 15 years of experience as a software engineer, system administrator, project manager, and research officer.

Presentations

Advanced Natural language processing with Spark NLP 1-day training

This hands-on training covers applying the latest advances in deep learning to common NLP tasks such as named entity recognition, document classification, sentiment analysis, spell checking, and OCR. Learn to build complete text analysis pipelines using the highly performant, highly scalable, open-source Spark NLP library in Python.

Jim Pastore is a litigation partner at Debevoise & Plimpton LLP and a member of the firm’s Cybersecurity and Data Privacy Practice and Intellectual Property Litigation Group. His practice focuses on privacy and cybersecurity issues. He’s recognized by Chambers USA and the Legal 500 US (2015–2019) for his cybersecurity work and was included in Benchmark Litigation’s Under 40 Hot List, which recognizes attorneys under 40 with outstanding career accomplishments. Named as a cybersecurity trailblazer by the National Law Journal, he’s twice been named to Cybersecurity Docket’s “Incident Response 30,” a list of the best and brightest data-breach-response attorneys. Previously Jim served for five years as an Assistant United States Attorney in the Southern District of New York, where he spent most of his time as a prosecutor with the Complex Frauds Unit and Computer Hacking and Intellectual Property Section.

Presentations

AI impact assessments: Tools for evaluating and mitigating corporate AI risks Session

The Canadian Government made waves when it passed a law requiring AI impact assessments for automated decision systems. Similar proposals are pending in the US and EU. Anna Gressel, Meeri Haataja, and Jim Pastore unpack what an AI impact assessment looks like in practice and how companies can get started from a technical and legal perspective, and they provide tips on assessing AI risk.

Regulation and Ethics of AI in FinTech: Emerging Insights from the U.S. and U.K. Session

This is a crash course on the emerging ethical and regulatory issues surrounding AI in FinTech. It will offer insights from recent statements by U.S. and U.K. regulators in the banking and financial services industries, and examine their priorities in 2020. It will also provide practical guidance on how companies can mitigate ethical and legal risks and position their AI products for success.

Lomit Patel is the vice president of growth at IMVU, responsible for user acquisition, retention, and monetization. Previously, Lomit managed growth at early stage startups including Roku (IPO), TrustedID (acquired by Equifax), Texture (acquired by Apple), and EarthLink. Lomit is a public speaker, author, advisor, and recognized as a Mobile Hero by Liftoff.

Presentations

Lean AI: How innovative startups use artificial intelligence to grow Session

The future of customer acquisition rests on leveraging intelligent machines to orchestrate complex campaigns across key marketing platforms: dynamically allocating budgets, pruning creatives, surfacing insights, and taking actions powered by AI. Lomit Patel shows you how to use AI and machine learning (ML) as an operational layer that delivers meaningful results.

Andy is the CEO of Kensu Inc., an analytics and AI governance company that created the Kensu Data Activity Manager (DAM), the first GCP (Governance, Compliance & Performance) solution of its kind. An entrepreneur with a background in mathematics and geospatial data analysis, Andy is recognized in the open-source community for the Spark Notebook project, which bridges distributed data science and the Scala communities.

Presentations

Data Quality & Lineage Monitoring: Why You Need Them in Production Session

Recent papers from Google and the European Commission emphasized the need for solutions that monitor data quality and lineage. Informed by our own experience, we highlight three advantages of monitoring in production: boosting the efficiency of data processes, increasing confidence in models in real time, and ensuring accountability to fulfill policies.
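The kinds of checks involved can be sketched in a few lines of plain Python; the function names, thresholds, and sample batch below are purely illustrative and not part of any Kensu product:

```python
# Toy production data-quality checks: null-rate monitoring per field and
# row-count drift against a baseline. Thresholds are illustrative defaults.

def null_rate(rows, field):
    """Fraction of rows where `field` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(field) is None)
    return missing / len(rows)

def check_batch(rows, baseline_count, max_null_rate=0.05, max_drift=0.5):
    """Return a list of alert strings for a batch, compared to a baseline row count."""
    alerts = []
    for field in {k for r in rows for k in r}:
        rate = null_rate(rows, field)
        if rate > max_null_rate:
            alerts.append(f"null rate {rate:.0%} on '{field}' exceeds {max_null_rate:.0%}")
    drift = abs(len(rows) - baseline_count) / max(baseline_count, 1)
    if drift > max_drift:
        alerts.append(f"row count drifted {drift:.0%} from baseline")
    return alerts

batch = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}, {"id": 3, "amount": 7.5}]
print(check_batch(batch, baseline_count=100))  # two alerts: 'amount' null rate, row-count drift
```

Running such checks on every batch in production, rather than once at training time, is what gives the real-time confidence in models the abstract describes.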

Rupert is a senior product manager for the Elsevier data platform. He has worked in digital product management roles for 10 years, in the past 3 years at RELX, for the LexisNexis and Elsevier businesses. His work has focused on helping business customers find and understand relationships in content and data in the legal and scientific industries, leveraging a range of graph database technologies. A key focus for him is understanding how technology solutions deliver customer value and successfully delivering search, recommendation and data platform products.

Presentations

Data cleaning at scale Session

The ultimate purpose of data is to drive decisions, but commonly in the real world things aren’t as reliable or accurate as we would like them to be. The main reason why data gets dirty and often unreliable is simple: human intervention. So how do you maintain the reliability of data that is constantly exposed to and updated by your users?
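One common answer, sketched below in plain Python, is to validate records at the point of entry and quarantine failures for review rather than letting them into the trusted dataset; the field names and rules here are hypothetical:

```python
# Validate user-supplied records against per-field rules; quarantine failures
# with the offending field names so a human can review and repair them.
import re

RULES = {
    "email": lambda v: isinstance(v, str)
             and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "age": lambda v: isinstance(v, int) and 0 < v < 130,
}

def clean(records):
    """Split records into (accepted, quarantined); quarantined items keep their bad fields."""
    accepted, quarantined = [], []
    for rec in records:
        bad = [f for f, ok in RULES.items() if not ok(rec.get(f))]
        if bad:
            quarantined.append((rec, bad))
        else:
            accepted.append(rec)
    return accepted, quarantined

good, bad = clean([
    {"email": "ada@example.com", "age": 36},
    {"email": "not-an-email", "age": 36},
])
```

The same split can feed a metrics dashboard (quarantine rate per field per day), turning one-off cleaning into ongoing reliability monitoring.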

Phillip Radley is chief data architect on BT’s core enterprise architecture team, where he’s responsible for data architecture across BT Group Plc. Based at BT’s Adastral Park campus in the UK, Phill leads BT’s MDM and big data initiatives, driving associated strategic architecture and investment roadmaps for the business. Phill has worked in IT and the communications industry for 30 years, mostly with British Telecommunications Plc., and his previous roles in BT include nine years as chief architect for infrastructure performance-management solutions from UK consumer broadband to outsourced Fortune 500 networks and high-performance trading networks. He has broad global experience, including with BT’s Concert global venture in the US and five years as an Asia Pacific BSS/OSS architect based in Sydney. Phill is a physics graduate with an MBA.

Presentations

Pivoting from BI on Hadoop to ML & streaming in the cloud at BT Session

Enterprise IT has been delivering BI on Hadoop for a few years, but frustrated business analysts and data scientists now want self-service data and ML in the cloud so they can move much faster. This session explores the challenges enterprise IT teams encounter when they must quickly pivot from caring for an on-premises elephant to farming herds of clusters, pipelines, and models in the cloud.

Rajkumar Iyer is a senior staff engineer at Qubole (Bangalore), working on the challenges of running Spark as a service in the cloud. His current interests include autoscaling, task scheduling, and transactions in big data systems. Previously, he worked at Aerospike on a hyperscale real-time distributed key-value store and on the Sybase shared-disk distributed database.

Presentations

ACID for Big Data Lakes on Apache Hive, Apache Spark and Presto Session

An open-source framework for Apache Hive, Apache Spark, and Presto that provides cross-engine ACID transactions and enables performant, cost-effective updates and deletes on big data lakes in the cloud.

Manu Ram Pandit is a senior software engineer on the data analytics and infrastructure team at LinkedIn. He has extensive experience in building complex and scalable applications. During his tenure at LinkedIn, he’s influenced design and implementation of hosted notebooks, providing a seamless experience to end users. He works closely with customers, engineers, and product to understand and define the requirements and design of the system. Previously, he was with Paytm, Amadeus, and Samsung, where he built scalable applications for various domains.

Presentations

Darwin: Evolving hosted notebooks at LinkedIn Session

Come and learn about the challenges we overcame to make Darwin (Data Analytics and Relevance Workbench at LinkedIn) a reality. Learn how data scientists, developers, and analysts at LinkedIn can share their notebooks with peers, author work in multiple languages, use custom execution environments, execute long-running jobs, and do much more on a single hosted notebooks platform.

Nathalie Rauschmayr is a machine learning scientist at AWS, where she helps customers develop deep learning applications. She has a research background in high-performance computing, having conducted research in several international research organizations including the German Aerospace Center, the European Organization for Nuclear Research (CERN), and Lawrence Livermore National Laboratory (LLNL).

Presentations

Using Amazon SageMaker to build, train and deploy ML models 1-day training

In this workshop, attendees will build, train, and deploy a deep learning model on Amazon SageMaker and learn how to use some of the latest SageMaker features, such as SageMaker Debugger and SageMaker Model Monitor.

Bhargavi is currently working as a Senior Data Engineer at Netflix. She is part of the Platform and Security Data engineering team where she builds large scale analytical data products to enable efficient utilization of Netflix’s cloud resources and strengthen its privacy & security posture. She is passionate about advancing the cause of Women in Technology. She actively volunteers with various I&D and Women in Tech groups internally and externally to encourage and inspire women to pursue and excel in technology careers. Bhargavi graduated with a masters in Information Systems Management from Carnegie Mellon University. When she’s not working, you’re most likely to find Bhargavi at a Bollywood dance academy or exploring the world or enjoying Indian food.

Presentations

Driving cloud efficiency and security using AWS S3 access logs at Netflix Session

This talk will cover the driving forces for effective Data Lifecycle Management (DLM) at Netflix, the current state of Netflix's S3 data warehouse, an overview of the S3 access logs collection process using SQS and Apache Iceberg, and how the S3 logs are being used to improve the efficiency and security posture of our cloud infrastructure at scale in the DLM realm.
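For context, the raw material of such a pipeline is the S3 server access log. A simplified, stdlib-only parser might look like the sketch below; real log lines carry additional trailing fields (object size, timings, referrer, user agent, and more), so the field list here is deliberately abridged:

```python
# Tokenize an S3 server access log line: bracketed timestamps and quoted
# request strings are single tokens; everything else splits on whitespace.
import re

TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')
FIELDS = ["owner", "bucket", "time", "remote_ip", "requester", "request_id",
          "operation", "key", "request_uri", "status", "error_code", "bytes_sent"]

def parse_s3_log(line):
    """Map the leading fields of an S3 access log line to a dict."""
    tokens = TOKEN.findall(line)
    rec = dict(zip(FIELDS, tokens))
    rec["time"] = rec["time"].strip("[]")
    rec["request_uri"] = rec["request_uri"].strip('"')
    return rec

line = ('abc123 my-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 abc123 '
        '3E57427F3EXAMPLE REST.GET.OBJECT data/part-0001.parquet '
        '"GET /my-bucket/data/part-0001.parquet HTTP/1.1" 200 - 113')
rec = parse_s3_log(line)
print(rec["operation"], rec["key"])  # REST.GET.OBJECT data/part-0001.parquet
```

Aggregating parsed records by bucket, key prefix, and operation is what makes access-frequency-driven lifecycle decisions (tiering, expiry) possible.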

Nikki Rouda is a principal product marketing manager at Amazon Web Services (AWS). Nikki has decades of experience leading enterprise big data, analytics, and data center infrastructure initiatives. Previously he held senior positions at Cloudera, Enterprise Strategy Group (ESG), Riverbed, NetApp, Veritas, and UK-based Alertme.com (an early consumer IoT startup). Nikki holds an MBA from Cambridge’s Judge Business School and an ScB in geophysics from Brown University.

Presentations

Build a serverless data lake for analytics 1-day training

Learn how to build a serverless data lake on AWS. In the workshop you'll ingest Instacart's online grocery shopping public dataset to the data lake and draw valuable insights on consumer shopping trends. You’ll build data pipelines, leverage data lake storage infrastructure, configure security and governance policies, create a persistent catalog of data, perform ETL, and run ad-hoc analysis.

Building a secure, scalable, and transactional data lake on AWS 2-Day Training

In this workshop, we will walk you through the steps of building a data lake on Amazon S3 using different ingestion mechanisms, performing incremental data processing on the data lake to support transactions on S3, and securing the data lake with fine-grained access control policies.

Building a secure, scalable, and transactional data lake on AWS (Day 2) Training Day 2

In this workshop, we will walk you through the steps of building a data lake on Amazon S3 using different ingestion mechanisms, performing incremental data processing on the data lake to support transactions on S3, and securing the data lake with fine-grained access control policies.

Nipun Sadvilkar is a Senior Data Scientist at US healthcare company Episource, helping to design and build the Clinical NLP engine to revamp medical coding workflows, enhance coder efficiency and accelerate revenue cycle. He has 3+ years of experience in building NLP solutions and web-based data science platforms in the area of healthcare, finance, media, and psychology. His interest lies at the intersection of Machine learning and Software engineering with a fair understanding of the business domain. Nipun is a member of PyCon India, PyDelhi, PyData Mumbai, SciPy India, and blogs regularly about Python and AI on his website.

Presentations

Clinical NLP: Building a Named Entity Recognition model for clinical text Session

Episource is building a clinical NLP engine that extracts medical facets from medical charts to automate coding in claims submissions, using medical coders' expertise to review highlighted clinical entities and their auto-suggested ICD-10 codes. Nipun will talk about building a key component of Episource's clinical NLP engine, the clinical NER, from data annotation to models and techniques.

I am an enterprise architect heading data and analytics platform design and strategy in Barclays People Analytics and the wider HR function. I have around 15 years of industry experience working at the enterprise level across the data landscape, including data migration, data warehousing, BI, AI, ML, and data strategy. I have helped a wide variety of organisations with their data journeys, including Barclays, the Bank of England, BAE Systems, Network Rail, Sky, Standard Life, Orange, and Honeywell Corp.

Presentations

Architecting a platform for people analytics Session

People Analytics has become key to unlocking human resource insights to understand and measure policy effectiveness and implement improvements by embedding intelligent decision making in the processes. In this session, I will talk about the pipeline we have developed, and corresponding controls and governance model implemented to support various People Analytics use-cases in Barclays.

Majken Sander is a data nerd, business analyst, and solution architect. Majken has worked with IT, management information, analytics, BI, and DW for 20+ years. Armed with strong analytical expertise, she is keen on “data driven” as a business principle, data science, the IoT, and all other things data. Read more at majkensander.com.

Presentations

Let’s talk data literacy - and ethics Session

Schools and society in general need to focus on educating citizens to raise their digital awareness. Companies need to build the data literacy competencies of their employees. And companies' digital economy strategies should include data ethics; companies might also choose to embrace it as a competitive edge gained via branding value.

Flávio Roberto Santos works as a data infrastructure engineer at Spotify in Stockholm, Sweden. He currently works in the event delivery team, whose main responsibility is to build and maintain an internal data platform used to collect and store events from Spotify clients and backend services. Before joining Spotify, Flávio managed Hadoop, Cassandra, Kafka, and Elasticsearch clusters in Brazil.

Presentations

A journey through Spotify event delivery system Session

Data has been a first-class citizen at Spotify since the beginning. It is an important component of the ecosystem that allows data scientists and analysts to improve features and develop new products. Events collected from instrumented clients and backends go through a complex system before they are available for internal teams. This talk goes deep into how event delivery is built inside Spotify.

Richard Sargeant is the chief commercial officer at Faculty. Richard supports senior leaders across a variety of sectors to transform their businesses to use AI effectively. Previously, he was director of transformation at the Home Office, where he oversaw the creation of the second most advanced in-house machine learning capability in government; he was one of the founding directors of the UK’s Government Digital Service; and he was at Google. He has also worked at the Prime Minister’s Strategy Unit and HM Treasury. He is a nonexec on the Board of Exeter University, and the Government’s Centre for Data Ethics and Innovation. He has a degree in political philosophy, economics, and social psychology from Cambridge University.

Presentations

AI safety: How do we bridge the gap between technology and the law? Session

Firms and governments have become more aware of the risk of "black-box" algorithms that "work," but in an opaque way. Existing laws and regulations merely stipulate what ought to be the case, not how to achieve it technically. Richard Sargeant is joined by leading figures from law, technology, and business to interrogate this subject.

Frank Säuberlich is Chief Data Officer at EnBW.

Presentations

AI@Scale Driving the German Energiewende Session

Almost two years ago, we at EnBW developed our core beliefs about the role of AI in the company and derived the concrete actions needed to scale our AI activities. In this talk, we will give an overview of those aspects, describe the challenges we have faced on our journey so far, and explain the particular approaches we took to master them.

Alejandro Saucedo is chairman at the Institute for Ethical AI & Machine Learning. In his more than 10 years of software development experience, Alejandro has held technical leadership positions across hypergrowth scale-ups and tech giants including Eigen Technologies, Bloomberg LP, and Hack Partners. Alejandro has a strong track record of building multiple departments of machine learning engineers from scratch and leading the delivery of numerous large-scale machine learning systems across the financial, insurance, legal, transport, manufacturing, and construction sectors in Europe, the US, and Latin America.

Presentations

A practical ML Ops framework for machine learning at massive scale Session

Managing production machine learning systems at scale has uncovered new challenges that require fundamentally different approaches from those of traditional software engineering and data science. Alejandro Saucedo explores ML Ops, a concept encompassing the methodologies needed to continuously integrate, deploy, and monitor machine learning in production at massive scale.

Conor Sayles is Group Advanced Analytics Lead at Bank of Ireland, reporting into the Chief Data Officer. In this role he leads an analytics team with an annual €10m data value realisation target, and coordinates collaboration with analytics teams embedded in business functions across the Bank. Conor has 18 years’ experience in the retail banking industry, including risk model development, regulatory reporting, data visualisation and automated decisioning.

Presentations

Building an Ecosystem for Analytics Success at Bank of Ireland Session

This session will describe how Bank of Ireland has delivered a data value realisation strategy, yielding a return of over €70m and incorporating infrastructure investment, agile management, and design thinking. An analytics ecosystem including Tableau, Teradata, SAS, and Cloudera provides a cornerstone for decision-making across multiple functions. Underlying this success is a growing data community.

Machine Learning Engineer

Presentations

Finding payment information on invoices using machine learning Session

Finvoice started out as a small project consisting of one machine learning engineer and 50 invoices; today Finvoice is used by companies that scan over 80 million invoices per year. This session highlights how machine learning can be used to capture payment information on invoices and how we expanded from a cloud-based API solution to doing inference directly on customers' mobile phones.

Tuhin Sharma is a cofounder and CTO of Binaize, an AI-based firm. Previously, he was a data scientist at IBM Watson and Red Hat, where he mainly worked on social media analytics, demand forecasting, retail analytics, and customer analytics, and he worked at multiple startups, where he built personalized recommendation systems to maximize customer engagement with the help of ML and DL techniques across multiple domains like fintech, ed tech, media, and ecommerce. He’s filed five patents and published four research papers in the field of natural language processing and machine learning. He holds a postgraduate degree in computer science and engineering, specializing in data mining, from the Indian Institute of Technology Roorkee. He loves to play table tennis and guitar in his leisure time. His favorite quote is, “Life is beautiful.”

Presentations

Writer-independent offline signature verification in banks using few-shot learning Session

Offline signature verification is one of the most critical tasks in traditional banking and financial industries. The unique challenge is to detect subtle but crucial differences between genuine and forged signatures. This verification task is even more challenging in writer-independent scenarios. Tuhin Sharma and Pravin Jha detail few-shot image classification.

Thunder Shiviah is a senior solutions architect at Databricks. Previously, Thunder was a machine learning engineer at McKinsey & Company, focused on productionizing machine learning at scale.

Presentations

Managing the full deployment lifecycle of machine learning models with MLflow Session

Thunder Shiviah and Cyrielle Simeone dive into MLflow, an open source platform from Databricks, to manage the complete ML lifecycle, including experiment tracking, model management, and deployment. With over 140 contributors and 800,000 monthly downloads on PyPI, MLflow has gained tremendous community adoption, demonstrating the need for an open source platform for the ML lifecycle.

Cyrielle Simeone is the product marketing manager for data science and machine learning at Databricks.

Presentations

Managing the full deployment lifecycle of machine learning models with MLflow Session

Thunder Shiviah and Cyrielle Simeone dive into MLflow, an open source platform from Databricks, to manage the complete ML lifecycle, including experiment tracking, model management, and deployment. With over 140 contributors and 800,000 monthly downloads on PyPI, MLflow has gained tremendous community adoption, demonstrating the need for an open source platform for the ML lifecycle.

Julien Simon is a technical evangelist at AWS. Previously, Julien spent 10 years as a CTO and vice president of engineering at a number of top-tier web startups. He’s particularly interested in all things architecture, deployment, performance, scalability, and data. Julien frequently speaks at conferences and technical workshops, where he helps developers and enterprises bring their ideas to life thanks to the Amazon Web Services infrastructure.

Presentations

A pragmatic introduction to graph neural networks Session

Julien Simon offers an overview of graph neural networks (GNNs), one of the most exciting developments in machine learning today. You'll discuss real-life use cases for which GNNs are a great fit and get started with GNNs using the Deep Graph Library, an open source library built on top of Apache MXNet and PyTorch.
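DGL hides the mechanics, but the core idea of a GNN layer, neighbor aggregation, can be sketched with the standard library. This is a toy with scalar node features and mean aggregation; real GNN layers apply learned weight matrices and nonlinearities, and DGL runs the same pattern efficiently on large graphs:

```python
# One round of message passing on a tiny directed graph: each node's new
# feature is the mean of the features sent by its in-neighbors.

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]     # directed: message flows src -> dst
features = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}  # scalar node features

def message_passing_round(features, edges):
    """New feature of each node = mean of its in-neighbors' features (unchanged if none)."""
    inbox = {n: [] for n in features}
    for src, dst in edges:
        inbox[dst].append(features[src])
    return {n: sum(msgs) / len(msgs) if msgs else features[n]
            for n, msgs in inbox.items()}

h1 = message_passing_round(features, edges)
print(h1)  # {0: 3.0, 1: 1.0, 2: 2.0, 3: 3.0}
```

Stacking several such rounds, with learned transforms in between, lets information propagate across multi-hop neighborhoods; that is the pattern DGL's message-passing APIs generalize.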

Pramod Singh is a machine learning expert at Walmart Labs. He has extensive hands-on experience in machine learning, deep learning, AI, data engineering, algorithm design, and application development, and has spent more than 10 years working on data projects at different organizations. He's the author of three books: Machine Learning with PySpark, Learn PySpark, and Learn TensorFlow 2.0. He is also a regular speaker at major conferences such as O'Reilly's Strata and AI conferences. Pramod holds a BTech in electrical engineering from B.A.T.U and an MBA from Symbiosis University, and has completed a data science certification from IIM Calcutta. He lives in Bangalore with his wife and three-year-old son. In his spare time, he enjoys playing guitar, coding, reading, and watching football.

Presentations

Attention Networks all the way to production using Kubeflow 1-day training

With the latest developments and improvements in deep learning and artificial intelligence, many demanding natural language processing tasks have become easier to implement and execute. Text summarization is one task that can be done using attention networks.

Karol Sobczak is a software engineer and a founding member of the Starburst team. He contributes to the Presto code base and is also active in the community. Karol has been involved in the design and development of significant features in Presto, such as the Kubernetes integration, cost-based optimizer, correlated subqueries, and distributed ordering, plus a plethora of smaller planner and performance enhancements. Previously, he worked at Teradata Labs, Hadapt, and IBM Research. He is a graduate of Warsaw University and the Vrije Universiteit Amsterdam.

Presentations

Presto on Kubernetes: Query Anything, Anywhere Session

Presto, the open source SQL engine for big data, offers high-concurrency, low-latency queries across multiple data sources within one query. With Kubernetes, you can easily deploy and manage Presto clusters across hybrid and multicloud environments with built-in high availability, autoscaling, and monitoring. It is available now on Red Hat OpenShift and the Kubernetes engines from AWS, Google Cloud, and Azure.

Abhishek Somani is a Senior Staff Software engineer on the Hive team at Qubole. Previously, Abhishek worked at Citrix and Cisco. He holds a degree from NIT Allahabad.

Presentations

ACID for Big Data Lakes on Apache Hive, Apache Spark and Presto Session

An open-source framework for Apache Hive, Apache Spark, and Presto that provides cross-engine ACID transactions and enables performant, cost-effective updates and deletes on big data lakes in the cloud.

Ankit Srivastava is a senior data scientist on the core data science team for the Azure Cloud + AI Platform Division at Microsoft, where he focuses on commercial and education segment data science projects within the company. Previously, he was a developer on the data integration and insights team. He has built several production-scale ML enrichments that are leveraged for sales compensation and senior leadership team metrics.

Presentations

Infinite segmentation: Scalable mutual information ranking on real-world graphs Session

Today, normal growth isn't enough: you need hockey-stick levels of growth. Sales and marketing orgs are looking to AI to "growth hack" their way into new markets and segments. Ken Johnston and Ankit Srivastava explain how to use mutual information at scale across massive data sources to filter out noise and share critical insights with new cohorts of users, businesses, and networks.
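Mutual information itself is straightforward to compute. A stdlib-only sketch on made-up segment data shows why it ranks informative attribute pairings above noisy ones (the segment and product labels below are invented for illustration):

```python
# Estimate mutual information I(X;Y) in bits from co-occurrence counts,
# the score used to rank how informative one attribute is about another.
from collections import Counter
from math import log2

def mutual_information(pairs):
    """I(X;Y) in bits, estimated from a list of (x, y) observations."""
    n = len(pairs)
    pxy = Counter(pairs)                 # joint counts
    px = Counter(x for x, _ in pairs)    # marginal counts of X
    py = Counter(y for _, y in pairs)    # marginal counts of Y
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# A perfectly informative (segment, product) pairing vs. an uninformative one:
signal = [("smb", "cloud"), ("smb", "cloud"), ("ent", "onprem"), ("ent", "onprem")]
noise = [("smb", "cloud"), ("smb", "onprem"), ("ent", "cloud"), ("ent", "onprem")]
print(mutual_information(signal), mutual_information(noise))  # 1.0 0.0
```

Ranking candidate features by this score, with corrections for sample-size bias when counts are small, is the essence of mutual-information ranking at scale.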

Presentations

Building a Universal Recommendations framework for an ever changing landscape Session

In this talk we share our experience building a framework for public service recommendations at the BBC, one that deploys in multiple clouds, follows our machine learning principles, and reflects our editorial values: inform, educate, and entertain.

Bargava Subramanian is a cofounder and deep learning engineer at Binaize in Bangalore, India. He has 15 years’ experience delivering business analytics and machine learning solutions to B2B companies, and he mentors organizations in their data science journey. He holds a master’s degree from the University of Maryland, College Park. He’s an ardent NBA fan.

Presentations

Democratize and build better deep learning models using TensorFlow.js Session

Bargava Subramanian and Amit Kapoor use two real-world examples to show how you can quickly build visual data products using TensorFlow.js to address the challenges inherent in understanding the strengths, weaknesses, and biases of your models as well as involving business users to design and develop a more effective model.

Perumal is a data scientist at Data Reply with a master's in data science and five years of software development experience across four companies, covering sentiment analysis, NLP, big data processing, text analytics, and implementing machine learning models.

He is currently focusing on AI using reinforcement learning.

Presentations

Deep reinforcement learning for NLP Session

NLP tasks using supervised ML perform poorly where conversational context is involved. This session covers the implementation of deep reinforcement learning in NLP as a coherent and better predictor for problems like Q&A, dialogue generation, and article summarisation, by simulating two agents that take turns exploring the state-action space and learning a policy.
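The full two-agent dialogue simulation doesn't fit in a snippet, but the underlying mechanism is standard Q-learning: explore the state-action space, update value estimates from rewards, and read off a policy. A minimal tabular sketch on a toy chain environment (all states, rewards, and hyperparameters here are illustrative, not from the talk):

```python
# Tabular Q-learning on a 5-state chain: the agent starts at state 0 and is
# rewarded only for reaching state 4, so the learned policy should move right.
import random

N_STATES, GOAL = 5, 4      # chain of states 0..4; reward only on reaching state 4
ACTIONS = [-1, +1]         # step left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2
rng = random.Random(0)

for _ in range(2000):                      # episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy exploration over the state-action space
        if rng.random() < eps:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # temporal-difference update of the action-value estimate
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# the learned policy: the greedy action in each non-terminal state
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)}
print(policy)
```

In the RL-for-NLP setting the states become dialogue histories, the actions become generated utterances, and the Q-table is replaced by a deep network, but the explore-update-exploit loop is the same.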

Dan is a software architect, author, and instructor with over 25 years of experience in the tech industry. He has extensive experience in multiple fields, including machine learning, data science, streaming analytics, and cloud architecture.

Dan’s latest books include NoSQL for Mere Mortals, Google Cloud Certified Associate Cloud Engineer Study Guide, and Google Cloud Professional Architect Study Guide (forthcoming). His courses cover a range of topics, including scalable machine learning, data science topics, Scala, Cassandra and Advanced SQL, and have accumulated almost one million views across Lynda and LinkedIn Learning.

Dan holds a PhD in genetics and computational biology.

Presentations

Don’t Be That Developer Who Puts a Biased Model into Production Session

ML models may perform as expected from a reliability and scalability perspective, but make poor decisions that cost sales and trust. In worst-case scenarios, decisions may violate policies and government regulations. In this talk, attendees will learn techniques for identifying bias, leveraging explainability methods to measure compliance and incorporating these techniques into DevOps practices.
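One concrete bias-identification technique of the kind the talk covers is the disparate-impact ratio, which compares positive-outcome rates across groups. The sketch below uses made-up decision data; the 0.8 cutoff is the common "four-fifths rule" heuristic, a review trigger rather than a legal standard:

```python
# Compute the disparate-impact ratio: the lowest group's positive-outcome
# rate divided by the highest group's. A ratio well below 1.0 warrants review.

def disparate_impact(outcomes, group_key):
    """Return (ratio, per-group positive rates) over a list of decision dicts."""
    rates = {}
    for g in {o[group_key] for o in outcomes}:
        grp = [o for o in outcomes if o[group_key] == g]
        rates[g] = sum(o["approved"] for o in grp) / len(grp)
    return min(rates.values()) / max(rates.values()), rates

decisions = (
    [{"group": "A", "approved": True}] * 8 + [{"group": "A", "approved": False}] * 2 +
    [{"group": "B", "approved": True}] * 4 + [{"group": "B", "approved": False}] * 6
)
ratio, rates = disparate_impact(decisions, "group")
print(f"{ratio:.2f}", "flag for review" if ratio < 0.8 else "ok")  # 0.50 flag for review
```

Wiring a check like this into a CI/CD gate, so a model that fails it never ships, is one way such techniques become part of DevOps practice.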

Václav Surovec comanages the Big Data Department at Deutsche Telekom IT. The department’s more than 45 engineers deliver big data projects to Germany, the Netherlands, and the Czech Republic. Recently, he led the Commercial Roaming project. Previously, he worked at T-Mobile Czech Republic while he was still a student of Czech Technical University in Prague.

Presentations

Machine Learning processes at Deutsche Telekom Global Carrier Session

Deutsche Telekom is the fourth-biggest telecommunications company in the world, and every day millions of its customers use their mobile services while roaming. This presentation covers how we designed and built our machine learning processes on top of a Cloudera Hadoop cluster to support the commercial roaming business at Deutsche Telekom Global Carrier.

Ben lives by the tag-line “I make things and break things”, believing that the best learning opportunities come when things don’t work. He is currently building the Video Player Delivery Platform at Netflix that’s used to manage the realtime deployment and monitoring of daily updates to video player devices around the world.

Presentations

How Netflix uses Real-time Insights to Ensure a High Quality Streaming Experience Session

Ensuring a consistently great Netflix experience while continuously pushing innovative technology updates is no easy feat. We'll look at how Netflix turns log streams into real-time metrics to provide visibility into how devices are performing in the field, including some of the lessons learned optimizing Druid to handle our load.

Andras Szabo is a data scientist at Pivigo (London, UK) where, besides working on internal data science projects, he helps facilitate projects carried out by freelancers and aspiring data scientists from academia. A physicist by training, he has a decade of experience in biological academic research, working on experimental analysis and hypothesis testing through computational simulations. After leaving academia, he worked as a freelance data scientist in a variety of fields, including healthcare and finance, before joining Pivigo in 2019.

Presentations

Beyond smart infrastructure: leveraging satellite data to detect wildfires Session

Wildfires are a major environmental and health risk whose frequency has increased dramatically in the past decade. Early detection is critical; however, wildfires are most often discovered through eyewitness accounts. This talk describes a data science partnership between HAL24K and Pivigo aimed at building an automated wildfire detection system using NOAA satellite data.

Shubham Tagra is a senior staff engineer at Qubole working on Presto and Hive development and making these solutions cloud ready. Previously, Shubham worked on the storage area network at NetApp. Shubham holds a bachelor’s degree in computer engineering from the National Institute of Technology, Karnataka, India.

Presentations

ACID for Big Data Lakes on Apache Hive, Apache Spark and Presto Session

We present an open-source framework for Apache Hive, Apache Spark, and Presto that provides cross-engine ACID transactions and enables performant, cost-effective updates and deletes on big data lakes in the cloud.

Angus Taylor is a data scientist at Microsoft, where he builds AI solutions for customers. He holds an MSc in artificial intelligence and has previous experience in the retail, energy, and government sectors.

Presentations

Solving real-world computer vision problems with open source Session

Training and deploying deep neural networks for computer vision (CV) in realistic business scenarios remains a challenge for both data scientists and engineers. Angus Taylor and Patrick Buehler dig into the state of the art in the CV domain and provide resources and code examples for various CV tasks, leveraging the Microsoft CV best-practices repository.

Alex Thomas is a data scientist at John Snow Labs. He’s used natural language processing (NLP) and machine learning with clinical data, identity data, and job data. He’s worked with Apache Spark since version 0.9 as well as with NLP libraries and frameworks including UIMA and OpenNLP.

Presentations

Advanced Natural language processing with Spark NLP 1-day training

This hands-on training covers applying the latest advances in deep learning to common NLP tasks such as named entity recognition, document classification, sentiment analysis, spell checking, and OCR. Learn to build complete text analysis pipelines using the highly performant, highly scalable, open-source Spark NLP library in Python.

Ward Van Laer is the lead machine learning engineer at IxorThink, the AI practice of the Belgian software company Ixor.
Fascinated by the mystery and power of the human mind, he is inventive in explaining how AI models work and how to interpret their results.
At IxorThink, Ward has delivered significant value through operational AI-driven solutions in the content marketing industry and the healthcare sector.

Presentations

Understanding AI: Interpretability and UX Session

A machine learning solution is only as good as its end users deem it to be. More often than not, we do not think through how results are communicated or measured. If we want business end users to trust and correctly interpret AI models, we may need to make our models transparent and understandable.
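One widely used transparency technique (not necessarily the one covered in this session) is permutation importance: shuffle a single feature and measure how much the model's accuracy drops, which tells an end user how much the model actually relies on that feature. A minimal pure-Python sketch, with a toy model standing in for a real one:

```python
import random

def permutation_importance(model, X, y, feature_idx, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = sum(model(row) == label for row, label in zip(X, y)) / len(y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature_idx] for row in X]
        rng.shuffle(column)
        X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                  for row, v in zip(X, column)]
        acc = sum(model(row) == label for row, label in zip(X_perm, y)) / len(y)
        drops.append(baseline - acc)
    return sum(drops) / n_repeats

# Toy "model" that only looks at feature 0, so feature 1 should score 0.
model = lambda row: int(row[0] > 0.5)
X = [[0.1, 9], [0.9, 2], [0.2, 7], [0.8, 4]]
y = [0, 1, 0, 1]
```

Presenting a ranked list of such scores alongside predictions is one simple way to make a model's behavior legible to business users.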

Navneet Kumar Verma is a software engineer on the data analytics and infrastructure team at LinkedIn. He contributes to Darwin by designing and developing backend services and infrastructure, and works with Darwin customers and stakeholders to add new features and extend Darwin into a data app platform. Previously, he built big data applications at WalmartLabs and Manhattan Associates.

Presentations

Darwin: Evolving hosted notebooks at LinkedIn Session

Come and learn about the challenges we overcame to make Darwin (Data Analytics and Relevance Workbench at LinkedIn) a reality. Learn how data scientists, developers, and analysts at LinkedIn can share their notebooks with their peers, author work in multiple languages, use custom execution environments, execute long-running jobs, and do much more on a single hosted notebooks platform.

Naghman Waheed is the data platforms lead at Bayer Crop Science, where he’s responsible for defining and establishing enterprise architecture and direction for data platforms. Naghman is an experienced IT professional with over 25 years of work devoted to the delivery of data solutions spanning numerous business functions, including supply chain, manufacturing, order to cash, finance, and procurement. Throughout his 20+ year career at Bayer, Naghman has held a variety of positions in the data space, ranging from designing several large-scale data warehouses to defining a data strategy for the company and leading various data teams. His broad range of experience includes managing global IT data projects, establishing enterprise information architecture functions, defining enterprise architecture for SAP systems, and creating numerous information delivery solutions. Naghman holds a BA in computer science from Knox College, a BS in electrical engineering from Washington University, an MS in electrical engineering and computer science from the University of Illinois, and an MBA and a master’s degree in information management, both from Washington University.

Presentations

Enabling data streaming from an SAP ERP system using Kafka Session

IT information systems have been a key enabler for our business in a very competitive environment. As the complexity of our business has grown, so has the need to provide data for real-time business analytics and BI. A unique architecture has been set up to stream data out of our SAP ERP system using SAP SLT and Kafka, enabling the business to make decisions based on real-time events.

Mary Wahl is a data scientist on the AI for Earth team at Microsoft, which helps NGOs apply deep learning to problems in conservation biology and environmental science. Mary has also worked on computer vision and genomics projects as a member of Microsoft’s algorithms and data science solutions team in Boston. Previously, Mary studied recent human migration, disease risk estimation, and forensic reidentification using crowdsourced genomic and genealogical data as a Harvard College Fellow.

Presentations

AI applications in aerial imagery Session

With the increasing availability of massive high-resolution aerial imagery, the geospatial information system community and the computer vision (CV) community joined forces in the new field of "geo AI." Mary Wahl and Ye Xing introduce you to this new field with live demos and sample code for common AI applications to aerial imagery from both commercial and government use cases.

Dean Wampler is an expert in streaming data systems, focusing on applications of ML/AI. Formerly, he was the vice president of fast data engineering at Lightbend, where he led the development of Lightbend CloudFlow, an integrated system for building and running streaming data applications with Akka Streams, Apache Spark, Apache Flink, and Apache Kafka. Dean is the author of Fast Data Architectures for Streaming Applications, Programming Scala, and Functional Programming for Java Developers, and the coauthor of Programming Hive, all from O’Reilly. He’s a contributor to several open source projects. A frequent Strata speaker, he’s also the co-organizer of several conferences around the world and several user groups in Chicago. He has a PhD in physics from the University of Washington.

Presentations

Understanding Data Governance for Machine Learning Models Session

Production deployment of ML models requires Data Governance, because models are data. This session justifies that claim, then explores its implications and techniques for satisfying the requirements. Using motivating examples, we’ll explore reproducibility, security, traceability, and auditing, plus some unique characteristics of models in production settings.
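One concrete traceability technique consistent with the session's premise that models are data is content-addressing each deployed artifact: record a cryptographic hash alongside provenance metadata, so any model running in production can be audited back to its exact bytes. A hypothetical stdlib-only sketch (the session's actual techniques may differ):

```python
import hashlib

def audit_record(model_bytes: bytes, metadata: dict) -> dict:
    """Build an audit-log entry tying a model artifact to its provenance."""
    return {
        "model_sha256": hashlib.sha256(model_bytes).hexdigest(),
        "metadata": metadata,
    }

def verify(model_bytes: bytes, record: dict) -> bool:
    """Check that a deployed artifact still matches its audit record."""
    return hashlib.sha256(model_bytes).hexdigest() == record["model_sha256"]

# Hypothetical example: the metadata fields are placeholders.
record = audit_record(b"fake-model-weights",
                      {"dataset": "v3", "git_commit": "abc123"})
```

The same hash can anchor reproducibility (rebuild and compare) and auditing (prove which model made a given decision).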

Using Ray to Scale Python, Data Processing, and Machine Learning 1-day training

Surprisingly, there is no simple way to scale up Python applications from your laptop to the cloud. Ray is an open source framework for parallel and distributed computing that makes it easy to program and analyze data at any scale by providing general-purpose, high-performance primitives. This training will show how to use Ray to scale up Python applications, data processing, and machine learning.

Jiao (Jennie) Wang is a senior software engineer on the big data technology team at Intel, where she works in the area of big data analytics. She’s engaged in developing and optimizing distributed deep learning frameworks on Apache Spark.

Presentations

Real-time Drive-Thru Recommendation leveraging Deep Learning using Analytics Zoo on Spark Session

Drive-thru innovation is the next big thing in the quick-service restaurant (QSR) industry. This talk presents an effective real-time menu recommendation system that leverages cutting-edge deep learning technologies on big data ecosystems to deliver a better guest drive-thru experience, based on guest order baskets along with other context factors such as time of day and weather conditions.

Luyang Wang is the senior manager of guest intelligence and data science at Restaurant Brands International, where he works on machine learning and big data analytics. He’s engaged in developing distributed machine learning applications and real-time recommendation services for the Burger King brand. Before joining RBI, Luyang worked at Office Depot and the Philips Big Data & AI Lab.

Presentations

Real-time Drive-Thru Recommendation leveraging Deep Learning using Analytics Zoo on Spark Session

Drive-thru innovation is the next big thing in the quick-service restaurant (QSR) industry. This talk presents an effective real-time menu recommendation system that leverages cutting-edge deep learning technologies on big data ecosystems to deliver a better guest drive-thru experience, based on guest order baskets along with other context factors such as time of day and weather conditions.

A data strategist with a history in ad serving, affiliate marketing, and securities lending prior to joining Elsevier. Primarily works on rapid prototyping and investigative analysis in Spark, Python, and Redshift. Used to have hobbies; now has a young daughter. Powered by coffee, yoga, and music.

Presentations

Data cleaning at scale Session

The ultimate purpose of data is to drive decisions, but in the real world data is often less reliable and accurate than we would like. The main reason data gets dirty and unreliable is simple: human intervention. So how do you maintain the reliability of data that is constantly exposed to and updated by your users?
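As a tiny illustration of the kind of human-introduced dirt the session discusses, consider free-text fields where the same entity is typed several ways. A hedged stdlib-only sketch (the data and normalization rules are hypothetical; real pipelines are far richer):

```python
import re

def normalize(name: str) -> str:
    """Collapse whitespace and case differences introduced by manual entry."""
    return re.sub(r"\s+", " ", name).strip().casefold()

def dedupe(records):
    """Keep the first occurrence of each normalized value."""
    seen, clean = set(), []
    for name in records:
        key = normalize(name)
        if key not in seen:
            seen.add(key)
            clean.append(name.strip())
    return clean

raw = ["Acme Corp", "  acme  corp ", "ACME CORP", "Apex Ltd"]
clean = dedupe(raw)  # ["Acme Corp", "Apex Ltd"]
```

At scale, the interesting problem is running rules like these continuously as users keep writing, rather than as a one-off batch fix.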

Benjamin Wright-Jones is a solution architect at the Microsoft WW Services CTO Office for Data and AI, where his team helps enterprise customers solve their analytical challenges. Over his career, Ben has worked on some of the largest and most complex data-centric projects around the globe.

Presentations

(Partially) demystifying DevOps for AI Session

DevOps, DevSecOps, AIOps, MLOps, DataOps, NoOps... Ditch your confusion and join Simon Lidberg and Benjamin Wright-Jones to understand what DevOps means for AI and your organization.

Ye Xing is a senior data scientist at Microsoft with rich experience providing end-to-end big data analytics solutions to large enterprise customers. Her main vertical focus areas include (but are not limited to) customer segmentation, personalized recommendation, churn prediction, and predictive maintenance. Beyond classic machine learning techniques, she is also familiar with cutting-edge approaches, including deep learning for computer vision and medical image analysis and advanced online-learning recommendation algorithms.

Presentations

AI applications in aerial imagery Session

With the increasing availability of massive high-resolution aerial imagery, the geospatial information system community and the computer vision (CV) community joined forces in the new field of "geo AI." Mary Wahl and Ye Xing introduce you to this new field with live demos and sample code for common AI applications to aerial imagery from both commercial and government use cases.

Itai Yaffe is a big data tech lead at Nielsen Marketing Cloud, where he tackles big data challenges using tools like Spark, Druid, Kafka, and others.
He is also part of the Israeli chapter’s core team of Women in Big Data.
Itai is keen on sharing his knowledge and has presented his real-life experience in various forums.

Presentations

Casting the spell: Druid advanced techniques Session

At Nielsen Marketing Cloud, we leverage Apache Druid to provide our customers (marketers and publishers) with real-time analytics tools for various use cases, including in-flight analytics, reporting, and building target audiences. In this talk, we will discuss advanced Druid techniques, such as efficient ingestion of billions of events per day, query optimization, and data retention and deletion.

Jennifer Yang is the head of data management and risk control at Wells Fargo Enterprise Data Technology. Previously, Jennifer served in various senior leadership roles in risk management and capital management at major financial institutions. Jennifer’s unique experience allows her to understand data and technology from both the end user’s and data management’s perspectives. Jennifer is passionate about leveraging the power of new technologies to gain insights from data and to develop cost-effective, scalable business solutions. Jennifer holds an undergraduate degree in applied chemistry from Beijing University, a master’s degree in computer science from the State University of New York at Stony Brook, and an MBA specializing in finance and accounting from New York University’s Stern School of Business.

Presentations

Applying Machine Learning Techniques in Data Quality Management Session

Traditional rule-based data quality management is costly and scales poorly, since it requires subject matter experts across the business, data, and technology domains. This presentation discusses a use case demonstrating how machine learning techniques can be applied to data quality management on a big data platform in the financial industry.
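To make the contrast with rule-based checks concrete: instead of an expert hand-writing a threshold per field, even a simple statistical model can learn what "normal" looks like from historical data and flag outliers. A toy z-score sketch, stdlib only (the session's actual techniques are presumably more sophisticated):

```python
from statistics import mean, stdev

def fit(values):
    """'Learn' the normal range from historical, known-good data."""
    return mean(values), stdev(values)

def flag_anomalies(values, mu, sigma, z_threshold=3.0):
    """Flag records whose z-score exceeds the threshold."""
    return [v for v in values if abs(v - mu) / sigma > z_threshold]

# Hypothetical data: historical transaction amounts define "normal";
# incoming records are scored against that learned baseline.
history = [100, 102, 98, 101, 99, 103, 97, 100]
mu, sigma = fit(history)
suspect = flag_anomalies([99, 101, 5000], mu, sigma)  # [5000]
```

The appeal for data quality at scale is that the baseline is refit from data rather than maintained by hand for every field.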
