Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA
 
      LL20 A
      9:00am Data Case Studies Barbara Eckman (Comcast), Dirk Jungnickel (Emirates Integrated Telecommunications Company (du)), Kishore Papineni (Astellas Pharma), Carlo Torniai (Pirelli Tyre), Bryan Harrison (American Express), Chris Murphy, Emma Jones, Maura Lynch (Pinterest), Nixon Patel (Kovid Group), Bas Geerdink (ING)
      LL20 C
      9:00am Data 101 Edd Wilder-James (Silicon Valley Data Science), Melanie Warrick (Skymind), Jim Scott (MapR Technologies, Inc.), Ellen Friedman (Independent)
      1:30pm Developing a modern enterprise data strategy Edd Wilder-James (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science)
      LL20 D
      9:00am Architecting a data platform John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
      1:30pm Architecting a next-generation data platform Jonathan Seidman (Cloudera), Ted Malaska (Blizzard), Mark Grover (Cloudera), Gwen Shapira (Confluent)
      LL21 B
      9:00am Unravelling data with Spark using machine learning Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera)
      1:30pm Guerrilla guide to Python and Apache Hadoop Juliet Hougland (Cloudera)
      LL21 C/D
      9:00am Using R for scalable data analytics: From single machines to Hadoop Spark clusters Vanja Paunic (Microsoft), Robert Horton (Microsoft), Hang Zhang (Microsoft), Srini Kumar (Microsoft), Mengyue Zhao (Microsoft), John-Mark Agosta (Microsoft), Mario Inchiosa (Microsoft), Debraj GuhaThakurta (Microsoft Corporation)
      1:30pm Modeling big data with R, sparklyr, and Apache Spark John Mount (Win Vector LLC), Steve Nolen (RStudio), Edgar Ruiz (RStudio)
      LL21 E/F
      9:00am Getting started with TensorFlow Josh Gordon (Google)
      1:30pm Scalable deep learning for the enterprise with DL4J Dave Kale (Skymind), Susan Eraly (Skymind), Melanie Warrick (Skymind), Josh Patterson (Skymind)
      210 A/E
      9:00am Learn stream processing with Apache Beam Jesse Anderson (Smoking Hand), Frances Perry (Google), Tyler Akidau (Google)
      210 C/G
      9:00am Exploration and visualization of large, complex datasets with R, Hadoop, and Spark Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting)
      1:30pm Introduction to visualizations using D3 Brian Suda ((optional.is))
      210 D/H
      9:00am Building your first big data application on AWS Rahul Bhartia (Amazon Web Services)
      Grand Ballroom
      6:30pm Startup Showcase | Room: Grand Ballroom
      LL20 B
      9:00am Data, Transportation, Logistics Andreas Ribbrock (#zeroG, A Lufthansa Systems Company), Rodrigo Fontecilla (Unisys), Ryan Baumann (Mapbox), Jay White Bear (IBM), Andre Luckow (BMW Group), Crystal Valentine (MapR Technologies)
      LL21 A
      9:00am Deploying and operating big data analytic apps on the public cloud Jennifer Wu (Cloudera), Vinithra Varadharajan (Cloudera), Andrei Savu (Cloudera), Matthew Jacobs (Cloudera)
      210 B/F
      9:00am Just enough Scala for Spark Dean Wampler (Lightbend)
      1:30pm Determining the economic value of your data William Schmarzo (Dell EMC)
      5:00pm Opening Reception | Room: Hall 1, 2, 3
      12:30pm Lunch | Room: 230 A-C
      10:30am Morning break | Room: Break
      3:00pm Afternoon break | Room: Break
      8:00am Coffee break | Room: LL Foyer and Executive Concourse
      9:00am-5:00pm (8h)
      Data Case Studies
      Barbara Eckman (Comcast), Dirk Jungnickel (Emirates Integrated Telecommunications Company (du)), Kishore Papineni (Astellas Pharma), Carlo Torniai (Pirelli Tyre), Bryan Harrison (American Express), Chris Murphy, Emma Jones, Maura Lynch (Pinterest), Nixon Patel (Kovid Group), Bas Geerdink (ING)
      In a series of 12 half-hour talks aimed at a business audience, you’ll hear data-themed case studies from household brands and global companies, explaining the challenges they wanted to tackle, the approaches they took, and the benefits—and drawbacks—of their solutions. If you want practical insights about applied data, look no further.
      9:00am-12:30pm (3h 30m)
      Data 101
      Edd Wilder-James (Silicon Valley Data Science), Melanie Warrick (Skymind), Jim Scott (MapR Technologies, Inc.), Ellen Friedman (Independent)
      Data 101 introduces you to core principles of data architecture, teaches you how to build and manage successful data teams, and inspires you to do more with your data through real-world applications. Setting the foundation for deeper dives on the following days of Strata + Hadoop World, Data 101 reinforces data fundamentals and helps you focus on how data can solve your business problems.
      1:30pm-5:00pm (3h 30m) Data-driven business management, Strata Business Summit
      Developing a modern enterprise data strategy
      Edd Wilder-James (Silicon Valley Data Science), Scott Kurth (Silicon Valley Data Science)
      Big data and data science have great potential for accelerating business, but how do you reconcile the business opportunity with the sea of possible technologies? Data should serve the strategic imperatives of a business—those aspirations that will define an organization’s future vision. Scott Kurth and Edd Wilder-James explain how to create a modern data strategy that powers data-driven business.
      9:00am-12:30pm (3h 30m) Spark & beyond Architecture
      Architecting a data platform
      John Akred (Silicon Valley Data Science), Stephen O'Sullivan (Silicon Valley Data Science)
      What are the essential components of a data platform? John Akred and Stephen O'Sullivan explain how the various parts of the Hadoop, Spark, and big data ecosystems fit together in production to create a data platform supporting batch, interactive, and real-time analytical workloads.
      1:30pm-5:00pm (3h 30m) Hadoop platform and applications Architecture
      Architecting a next-generation data platform
      Jonathan Seidman (Cloudera), Ted Malaska (Blizzard), Mark Grover (Cloudera), Gwen Shapira (Confluent)
      Using Entity 360 as an example, Jonathan Seidman, Ted Malaska, Mark Grover, and Gwen Shapira explain how to architect a modern, real-time big data platform leveraging recent advancements in the open source software world, using components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics.
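      As a rough illustration of the kind of pipeline this architecture enables, the sketch below uses PySpark Structured Streaming to read a Kafka topic and run a Spark SQL aggregation. It is not the presenters' material: the broker address, topic name, and field names are hypothetical, and it assumes the spark-sql-kafka connector is available to Spark.
```python
# Minimal sketch (not the presenters' code): read a Kafka topic with
# Spark Structured Streaming and run a simple Spark SQL aggregation.
# Broker address, topic name, and field names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = (SparkSession.builder
         .appName("kafka-streaming-sketch")
         .getOrCreate())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "customer-events")
          .load())

# Kafka delivers key/value as bytes; cast the value to a string and
# count events per one-minute window as a stand-in for entity rollups.
counts = (events
          .select(col("timestamp"), col("value").cast("string").alias("event"))
          .groupBy(window(col("timestamp"), "1 minute"))
          .count())

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```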
      9:00am-12:30pm (3h 30m) Spark & beyond
      Unravelling data with Spark using machine learning
      Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera)
      Vartika Singh, Jayant Shekhar, and Jeffrey Shmain walk you through the machine-learning algorithms available in the Spark framework (and beyond), showing how to uncover meaningful patterns in real-world data and derive value from them.
      1:30pm-5:00pm (3h 30m) Data science & advanced analytics Pydata
      Guerrilla guide to Python and Apache Hadoop
      Juliet Hougland (Cloudera)
      Using an interactive demo format with accompanying online materials and data, data scientist Juliet Hougland offers a practical overview of the basics of using Python data tools with a Hadoop cluster.
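      For a taste of what the tutorial covers, here is a minimal PySpark sketch (not the tutorial's own materials) that reads a CSV file from HDFS and computes a simple aggregate; the HDFS path and column names are hypothetical.
```python
# Minimal sketch, not from the tutorial: basic PySpark against data in HDFS.
# The HDFS path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pydata-on-hadoop").getOrCreate()

# Read a CSV sitting on the cluster's HDFS and infer a schema.
df = spark.read.csv("hdfs:///data/events.csv", header=True, inferSchema=True)

# A typical first-pass exploration: row count and a grouped aggregate.
print(df.count())
df.groupBy("event_type").agg(F.avg("duration").alias("avg_duration")).show()

spark.stop()
```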
      9:00am-12:30pm (3h 30m) Data science & advanced analytics R
      Using R for scalable data analytics: From single machines to Hadoop Spark clusters
      Vanja Paunic (Microsoft), Robert Horton (Microsoft), Hang Zhang (Microsoft), Srini Kumar (Microsoft), Mengyue Zhao (Microsoft), John-Mark Agosta (Microsoft), Mario Inchiosa (Microsoft), Debraj GuhaThakurta (Microsoft Corporation)
      Join in to learn how to do scalable, end-to-end data science in R on single machines as well as on Spark clusters. You'll be assigned an individual Spark cluster, with all content preloaded and software installed, and use it to gain experience building, operationalizing, and consuming machine-learning models using distributed functions in R.
      1:30pm-5:00pm (3h 30m) Data science & advanced analytics R
      Modeling big data with R, sparklyr, and Apache Spark
      John Mount (Win Vector LLC), Steve Nolen (RStudio), Edgar Ruiz (RStudio)
      Sparklyr provides an R interface to Spark. With sparklyr, you can manipulate Spark datasets to bring them into R for analysis and visualization, and you can orchestrate distributed machine learning in Spark from R with the Spark MLlib and H2O Sparkling Water libraries. John Mount, Steve Nolen, and Edgar Ruiz demonstrate how to use sparklyr to analyze big data in Spark.
      9:00am-12:30pm (3h 30m) Data science & advanced analytics Deep learning
      Getting started with TensorFlow
      Josh Gordon (Google)
      Josh Gordon walks you through training and deploying a machine-learning system using TensorFlow, a popular open source library. You'll learn how to build machine-learning systems from simple classifiers to complex image-based models and how to deploy models in production using TensorFlow Serving.
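      To show roughly what a "simple classifier" looks like in code, here is a minimal sketch using the Keras API bundled with recent TensorFlow releases. It is not the tutorial's material (which targeted the TensorFlow APIs current in early 2017); the model size and training settings are arbitrary.
```python
# Minimal sketch, not the tutorial's code: a small image classifier with
# the Keras API bundled in recent TensorFlow releases.
import tensorflow as tf

# MNIST ships with TensorFlow's dataset helpers.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=3)
print(model.evaluate(x_test, y_test))
# Deployment with TensorFlow Serving would then consume an exported
# SavedModel of this trained model.
```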
      1:30pm-5:00pm (3h 30m) Data science & advanced analytics Deep learning
      Scalable deep learning for the enterprise with DL4J
      Dave Kale (Skymind), Susan Eraly (Skymind), Melanie Warrick (Skymind), Josh Patterson (Skymind)
      Dave Kale, Melanie Warrick, Susan Eraly, and Josh Patterson explain how to build, train, and deploy neural networks using Deeplearning4j. Topics include the fundamentals of deep learning, ND4J and DL4J, and scalable training using GPUs and Apache Spark. You'll gain hands-on experience with several models, including convolutional and recurrent neural nets.
      9:00am-12:30pm (3h 30m) Stream processing and analytics Streaming
      Learn stream processing with Apache Beam
      Jesse Anderson (Smoking Hand), Frances Perry (Google), Tyler Akidau (Google)
      Come learn the basics of stream processing via a guided walkthrough of the most sophisticated and portable stream processing model on the planet—Apache Beam (incubating). Tyler Akidau, Frances Perry, and Jesse Anderson cover the basics of robust stream processing with the option to execute exercises on top of the runner of your choice—Flink, Spark, or Google Cloud Dataflow.
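      To give a sense of the Beam programming model ahead of the exercises, here is a minimal word-count sketch in the Beam Python SDK running on the local DirectRunner; the input and output paths are hypothetical, and the tutorial's exercises target the runner of your choice.
```python
# Minimal sketch, not the tutorial's exercises: word count with the
# Apache Beam Python SDK on the local DirectRunner.
import re
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("input.txt")        # hypothetical path
        | "Split" >> beam.FlatMap(lambda line: re.findall(r"[A-Za-z']+", line))
        | "Count" >> beam.combiners.Count.PerElement()
        | "Format" >> beam.Map(lambda kv: f"{kv[0]}: {kv[1]}")
        | "Write" >> beam.io.WriteToText("word_counts")      # output prefix
    )
```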
      1:30pm-5:00pm (3h 30m) Stream processing and analytics Streaming
      Building real-time data pipelines with Apache Kafka
      Ian Wrigley (Confluent)
      Ian Wrigley demonstrates how Kafka Connect and Kafka Streams can be used together to build real-world, real-time streaming data pipelines. Using Kafka Connect, you'll ingest data from a relational database into Kafka topics as the data is being generated and then process and enrich the data in real time using Kafka Streams before writing it out for further analysis.
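      Kafka Connect and Kafka Streams are JVM-side tooling, so the sketch below illustrates only the underlying consume-transform-produce pattern, using the confluent-kafka Python client instead; the broker address, topic names, and enrichment logic are hypothetical, and this is not the tutorial's code.
```python
# Minimal sketch of a consume-transform-produce loop with the
# confluent-kafka Python client. This is NOT Kafka Connect or Kafka
# Streams (both are JVM tools); broker and topic names are hypothetical.
import json
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "broker:9092",
    "group.id": "enrichment-demo",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "broker:9092"})

consumer.subscribe(["orders"])          # hypothetical input topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        order = json.loads(msg.value())
        # Toy "enrichment": tag each order with a derived field.
        order["is_large"] = order.get("amount", 0) > 100
        producer.produce("orders-enriched", json.dumps(order).encode("utf-8"))
finally:
    consumer.close()
    producer.flush()
```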
      9:00am-12:30pm (3h 30m) Data science & advanced analytics, Visualization & user experience R
      Exploration and visualization of large, complex datasets with R, Hadoop, and Spark
      Stephen Elston (Quantia Analytics, LLC), Ryan Hafen (Hafen Consulting)
      Divide and recombine techniques provide scalable methods for exploration and visualization of otherwise intractable datasets. Stephen Elston and Ryan Hafen lead a series of hands-on exercises to help you develop skills in exploration and visualization of large, complex datasets using R, Hadoop, and Spark.
      1:30pm-5:00pm (3h 30m) Visualization & user experience
      Introduction to visualizations using D3
      Brian Suda ((optional.is))
      Visualizations are a key part of conveying any dataset. D3 is the most popular, easiest, and most extensible way to get your data online in an interactive way. Brian Suda outlines best practices for good data visualizations and explains how you can build them using D3.
      9:00am-12:30pm (3h 30m) Big data and the Cloud Cloud
      Building your first big data application on AWS
      Rahul Bhartia (Amazon Web Services)
      Want to ramp up your knowledge of Amazon's big data web services and launch your first big data application on the cloud? Rahul Bhartia walks you through building a big data application in real time using a combination of open source technologies, including Apache Hadoop, Spark, and Zeppelin, as well as AWS managed services such as Amazon EMR, Amazon Kinesis, and more.
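      As a hedged illustration of what feeding a real-time pipeline on AWS can look like in code, here is a minimal boto3 sketch that pushes records into an Amazon Kinesis stream; the stream name, region, and event fields are hypothetical, and the tutorial itself combines several more services (Amazon EMR, Zeppelin, and others).
```python
# Minimal sketch, not the tutorial's application: write a few JSON events
# to an Amazon Kinesis stream with boto3. Stream name, region, and event
# fields are hypothetical; credentials come from the usual AWS config chain.
import json
import time
import boto3

kinesis = boto3.client("kinesis", region_name="us-west-2")

for i in range(10):
    event = {"event_id": i, "timestamp": time.time(), "action": "click"}
    kinesis.put_record(
        StreamName="demo-clickstream",     # hypothetical stream
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(i),
    )
```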
      1:30pm-5:00pm (3h 30m) Big data and the Cloud, Spark & beyond Architecture, Cloud
      Architecting and building enterprise-class Spark and Hadoop in cloud environments
      James Malone (Google)
      James Malone explores using managed Spark and Hadoop solutions in public clouds alongside cloud products for storage, analysis, and message queues to meet enterprise requirements via the Spark and Hadoop ecosystem.
      9:00am-5:00pm (8h) Spark & beyond Streaming, Text
      Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML
      This one-day hands-on class offers an introduction to Apache Spark 2.0 core concepts with a focus on Spark's machine-learning library, using text mining on real-world data as the primary end-to-end use case.
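      For orientation, here is a minimal PySpark ML sketch of the kind of text-classification pipeline the class builds up to; the documents and labels are toy placeholders, and the class itself goes much further end to end.
```python
# Minimal sketch, not the class materials: a tiny Spark ML text pipeline
# (tokenize -> hash term frequencies -> logistic regression) on toy data.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("spark-ml-text-sketch").getOrCreate()

# Hypothetical labeled documents.
train = spark.createDataFrame(
    [("spark makes big data simple", 1.0),
     ("i dislike slow batch jobs", 0.0),
     ("ml pipelines in spark are handy", 1.0)],
    ["text", "label"],
)

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="features", numFeatures=1024),
    LogisticRegression(maxIter=10),
])

model = pipeline.fit(train)
model.transform(train).select("text", "prediction").show(truncate=False)

spark.stop()
```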
      6:30pm-8:00pm (1h 30m) Event
      Startup Showcase
      What new companies are at the leading edge of the data space? Meet some of the best, most innovative founders as they demonstrate their game-changing ideas at the Startup Showcase.
      9:00am-5:00pm (8h)
      Data, Transportation, Logistics
      Andreas Ribbrock (#zeroG, A Lufthansa Systems Company), Rodrigo Fontecilla (Unisys), Ryan Baumann (Mapbox), Jay White Bear (IBM), Andre Luckow (BMW Group), Crystal Valentine (MapR Technologies)
      Data, Transportation, and Logistics Day offers a daylong deep-dive into how data science is changing transportation and logistics. We’ll investigate the latest advances in and applications of self-driving vehicles, automated drones, and embedded sensors and explore how new uses of data are challenging the industry to evolve infrastructure for the future.
      9:00am-12:30pm (3h 30m) Big data and the Cloud Architecture, Cloud
      Deploying and operating big data analytic apps on the public cloud
      Jennifer Wu (Cloudera), Vinithra Varadharajan (Cloudera), Andrei Savu (Cloudera), Matthew Jacobs (Cloudera)
      Andrei Savu, Vinithra Varadharajan, Matthew Jacobs, and Jennifer Wu explore best practices for Hadoop deployments in the public cloud and provide detailed guidance for deploying, configuring, and managing Hive, Spark, and Impala in the public cloud.
      1:30pm-5:00pm (3h 30m) Platform Security and Cybersecurity
      A practitioner’s guide to securing your Hadoop cluster
      Mark Donsky (Cloudera)
      Michael Yoder, Ben Spivey, Mark Donsky, and Mubashir Kazia walk you through securing a Hadoop cluster. You’ll start with a cluster with no security and then add security features related to authentication, authorization, encryption of data at rest, encryption of data in transit, and complete data governance.
      9:00am-12:30pm (3h 30m) Spark & beyond
      Just enough Scala for Spark
      Dean Wampler (Lightbend)
      Apache Spark is written in Scala. Hence, many if not most data engineers adopting Spark are also adopting Scala, while most data scientists continue to use Python and R. Dean Wampler offers an overview of the core features of Scala you need to use Spark effectively, using hands-on exercises with the Spark APIs.
      1:30pm-5:00pm (3h 30m) Data-driven business management, Strata Business Summit
      Determining the economic value of your data
      William Schmarzo (Dell EMC)
      Organizations need a model to measure how effectively they are using data and analytics. Once they know where they are and where they need to go, they then need a framework to determine the economic value of their data. William Schmarzo explores techniques for getting business users to “think like a data scientist” so they can assist in identifying data that makes the best performance predictors.
      5:00pm-6:30pm (1h 30m)
      Opening Reception
      Grab a drink, mingle with fellow Strata + Hadoop World attendees, and see the latest technologies and products from leading companies in the data space.
      12:30pm-1:30pm (1h)
      Break: Lunch
      10:30am-11:00am (30m)
      Break: Morning break
      3:00pm-3:30pm (30m)
      Break: Afternoon break
      8:00am-9:00am (1h)
      Break: Coffee break