Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

In-Person Training
Cloudera Big Data Architecture Workshop

Bruce Martin (Cloudera)
Monday, September 25 & Tuesday, September 26, 9:00am - 5:00pm
Data engineering
Location: 1A 04/05
Secondary topics: Architecture, Cloud
Best Price ends June 29

This course will sell out—sign up today!

Participants should plan to attend both days of this 2-day training course. Platinum and Training passes do not include access to tutorials on Tuesday.

The Cloudera Big Data Architecture Workshop (BDAW) is a 2-day learning event that addresses advanced big data application architecture topics. BDAW brings together technical contributors into a group setting to design and architect solutions to a challenging business problem. The workshop addresses big data application architecture problems in general, and then applies them to the design of a challenging system.

Throughout the highly interactive workshop, participants apply concepts to real-world examples, which leads to detailed group discussions. Participants learn techniques for architecting big data systems not only from Cloudera’s experience but also from the experiences of fellow participants.

More specifically, BDAW addresses advanced big data application architecture topics, including data formats; data transformation; real-time, batch, and machine learning processing; scalability; fault tolerance; minimizing the risk of an unsound architecture; and technology selection.

To gain the most from the workshop, participants should have a technical background and a working knowledge of technologies such as HDFS, Spark, MapReduce, Hive/Impala, data formats, and relational database management systems. Detailed API-level knowledge is not needed, as there are no programming activities.

Participants will work in small groups to discuss the problems and develop solutions. Each group will select a spokesperson to present the group’s findings to the whole workshop. There are no programming labs, but solutions implemented and deployed in the cloud will be demonstrated during the workshop.

Course details:

Introduction

Workshop Application Use Cases

  • Oz Metropolitan
  • Architectural questions

Application Vertical Slice

  • Definition
  • Minimizing risk of an unsound architecture
  • Selecting a vertical slice
  • Team activity: Identify an initial vertical slice for Metroz

Application Data and Processing

  • Three V’s of Big Data
  • Data Lifecycle
  • Data Formats
  • Transforming Data
  • Real-time and near-real-time processing
  • Batch processing
  • Data access patterns
  • Delivery and processing guarantees
  • Machine Learning pipelines
  • Team activity: Identify data and processing requirements

Scalable Applications

  • Scale up, scale out, scale to X
  • Determining if an application will scale
  • Poll: scalable airport terminal designs
  • Hadoop and Spark Scalability
  • Team activity: Scaling Metroz

Fault Tolerant Distributed Systems

  • Principles
  • Transparency
  • Hardware vs. Software redundancy
  • Stateless functional fault tolerance
  • Stateful fault tolerance
  • Replication and group consistency
  • Fault tolerance in Spark and MapReduce
  • Application tolerance for failures
  • Team activity: Identify Metroz component failures and requirements

Technology Selection

  • Technology selection methodology
  • Team activity: Select big data technologies for Metroz use cases

Software Architecture

  • Architecture artifacts
  • One platform or multiple; the Lambda architecture
  • Team activity: Produce a high-level architecture and technology selections, and revisit the vertical slice
  • Vertical Slice demonstration

Wrap Up

About your instructor

Bruce Martin is a senior instructor at Cloudera, where he teaches courses on data science, Apache Spark, Apache Hadoop, and data analysis. Previously, Bruce was principal architect and director of advanced concepts at SunGard Higher Education, where he developed the software architecture for SunGard’s Course Signals Early Intervention System, which uses machine-learning algorithms to predict the success of students enrolled in university courses. His other roles have included senior staff engineer at Sun Microsystems and researcher at Hewlett-Packard Laboratories. Bruce has written many papers on data management and distributed system technologies, frequently presents his work at academic and industrial conferences, and holds patents on distributed object technologies. He holds a PhD and a master’s degree in computer science from the University of California, San Diego and a bachelor’s degree in computer science from the University of California, Berkeley.

Conference registration

Get the Platinum pass or the Training pass to add this course to your package.

Comments

Sophia DeMartini | SENIOR SPEAKER MANAGER
06/09/2017 8:20am EDT

Hi Howard,

What is listed on this page is all that is available for now, but we’ll update it with a more detailed schedule as soon as possible. Please check back for updates.

If you have any other questions about the content of the training, feel free to email me at speakers@oreilly.com.

Thank you,
Sophia

hungwei yeh | DATABASE DEVELOPER
06/09/2017 5:05am EDT

Hi,
Our team is planning to register for the Platinum pass with the 2-day training. We are wondering whether a detailed agenda is available for each of the training topics you’re offering at the 9/25 & 9/26 NY conference.
Thanks,
Howard