Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Cloud

The amount of cutting-edge technology that Azure puts at your fingertips is incredible, and artificial intelligence is no exception. Azure enables sophisticated capabilities in artificial intelligence, machine learning, deep learning, cognitive services, and advanced analytics. Rimma Nehme explains why Azure is the next AI supercomputer and how that vision is being realized in practice.
Roy Ben-Alta explores the Amazon Kinesis platform in detail, discussing best practices for scaling your core streaming data ingestion pipeline, real-world customer use cases, and design patterns for integration with Amazon Elasticsearch, AWS Lambda, and Apache Spark.
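One scaling concern the session touches on is batching writes into the ingestion stream. As a minimal sketch (the helper and field names are hypothetical, not from the talk), Kinesis `PutRecords` accepts at most 500 records per call, so events are typically chunked before sending:

```python
import json

# Kinesis PutRecords accepts at most 500 records per call, so we chunk.
MAX_RECORDS_PER_CALL = 500

def to_kinesis_batches(events, partition_key_field="user_id"):
    """Convert event dicts into batches of Kinesis PutRecords entries."""
    entries = [
        {"Data": json.dumps(e).encode("utf-8"),
         "PartitionKey": str(e[partition_key_field])}
        for e in events
    ]
    return [entries[i:i + MAX_RECORDS_PER_CALL]
            for i in range(0, len(entries), MAX_RECORDS_PER_CALL)]

# Sending would use boto3 (commented out so the sketch runs without AWS;
# the stream name is a placeholder):
# import boto3
# kinesis = boto3.client("kinesis")
# for batch in to_kinesis_batches(events):
#     kinesis.put_records(StreamName="clickstream", Records=batch)
```

Choosing a high-cardinality partition key (here a user ID) spreads load evenly across shards, which is the usual prerequisite for scaling the pipeline horizontally.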
Li Li and Hao Hao detail the architecture of Apache Sentry + RecordService for Hadoop in the cloud, which provides unified, fine-grained authorization via role- and attribute-based access control, and make the case for adopting Apache Sentry and RecordService to protect sensitive data in a multitenant cloud across the Hadoop ecosystem.
Henry Robinson and Justin Erickson explain how best to take advantage of the flexibility and cost-effectiveness of the cloud for your BI and SQL analytic workloads using Apache Hadoop and Apache Impala (incubating), covering the architectural considerations, best practices, tuning, and functionality available when deploying or migrating BI and SQL analytic workloads to the cloud.
Siva Raghupathy demonstrates how to use Hadoop innovations in conjunction with Amazon Web Services (cloud) innovations.
BigQuery provides petabyte-scale data warehousing with consistently high performance for all users. However, users coming from traditional enterprise data warehousing platforms often have questions about how best to adapt their workloads for BigQuery. Chad Jennings explores best practices for integrating with BigQuery, with special emphasis on loading and transforming data.
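One common adaptation when migrating from a traditional warehouse is reshaping rows into newline-delimited JSON, one of the load formats BigQuery accepts. A minimal sketch (the helper, table, and bucket names are hypothetical, not from the session):

```python
import io
import json

def rows_to_ndjson(rows):
    """Serialize row dicts as newline-delimited JSON, a load format
    BigQuery accepts alongside CSV, Avro, and Parquet."""
    buf = io.StringIO()
    for row in rows:
        buf.write(json.dumps(row))
        buf.write("\n")
    return buf.getvalue()

# Loading the result from Cloud Storage would use the client library
# (commented out so the sketch runs offline; names are placeholders):
# from google.cloud import bigquery
# client = bigquery.Client()
# job_config = bigquery.LoadJobConfig(
#     source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
#     autodetect=True)
# client.load_table_from_uri(
#     "gs://my-bucket/clicks.json", "mydataset.clicks",
#     job_config=job_config).result()
```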
When building your data stack, the architecture could be your biggest challenge, yet it could also be the best predictor of success. With so many elements to consider and no proven playbook, where do you begin to assemble best practices for a scalable data architecture? Ben Sharma offers lessons learned from the field to get you started.
Alex Bordei walks you through the steps required to build a data lake in the cloud and connect it to on-premises environments, covering best practices in architecting cloud data lakes and key aspects such as performance, security, data lineage, and data maintenance. The technologies presented range from basic HDFS storage to real-time processing with Spark Streaming.
Ben Sharma uses popular cloud-based use cases to explore how to effectively and safely leverage big data in the cloud to achieve business goals. Now is the time to get the jump on this trend before your competition gets the upper hand.
Public cloud usage for Hadoop workloads is accelerating. Consequently, Hadoop components have adapted to leverage cloud infrastructure. Andrei Savu, Vinithra Varadharajan, Matthew Jacobs, and Jennifer Wu explore best practices for Hadoop deployments in the public cloud and provide detailed guidance for deploying, configuring, and managing Hive, Spark, and Impala in the public cloud.
Rick McFarland explains how the Hearst Corporation uses big data and analytics tools like Spark and Kinesis to stream click data in real time from its 300+ websites worldwide. This streaming process feeds an editorial tool called Buzzing@Hearst, which gives authors instant feedback on what is trending across the Hearst network.
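A trending feed like the one described needs to rank articles by recent click volume. As a minimal sketch of the idea (a hypothetical sliding-window counter, not Hearst's actual implementation):

```python
import time
from collections import Counter, deque

class TrendingCounter:
    """Rank articles by clicks seen within a recent time window —
    a toy model of how a Buzzing@Hearst-style feed could work."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.events = deque()  # (timestamp, article_id) in arrival order

    def record(self, article_id, now=None):
        now = time.time() if now is None else now
        self.events.append((now, article_id))
        self._evict(now)

    def top(self, n=10, now=None):
        now = time.time() if now is None else now
        self._evict(now)
        return Counter(a for _, a in self.events).most_common(n)

    def _evict(self, now):
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()
```

In a real pipeline the `record` calls would be driven by a Kinesis or Spark Streaming consumer rather than invoked directly.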
The biggest challenge in deep learning is scalability. Google has built a large-scale neural network in the cloud and is now sharing that power. Kazunori Sato introduces pretrained ML services, such as the Cloud Vision API and the Speech API, and explores how TensorFlow and Cloud Machine Learning can accelerate custom model training by 10–40x with Google's distributed training infrastructure.
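To give a sense of what "pretrained" means in practice, here is a sketch of a Cloud Vision API v1 request body for label detection on an image in Cloud Storage. The helper function and bucket path are hypothetical; actually sending it requires an authenticated POST to the `images:annotate` endpoint.

```python
def build_label_request(gcs_uri, max_results=5):
    """Build a Vision API v1 annotate request asking for image labels.

    No model training is involved: the caller just points the
    pretrained service at an image and names the feature wanted.
    """
    return {
        "requests": [{
            "image": {"source": {"imageUri": gcs_uri}},
            "features": [{"type": "LABEL_DETECTION",
                          "maxResults": max_results}],
        }]
    }

# Example request body for a hypothetical image:
request_body = build_label_request("gs://my-bucket/photo.jpg")
```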
Running Hadoop, Spark, and Presto can be as fast and inexpensive as ordering a latte at your favorite coffee shop. Jonathan Fritz explains how organizations are deploying these and other big data frameworks with Amazon Web Services (AWS) and how you too can quickly and securely run Spark and Presto on AWS. Jonathan shows you how to get started and shares best practices and common use cases.