Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

In-Person Training
Cloudera big data architecture workshop

Bruce Martin (Cloudera)
Monday, September 25 & Tuesday, September 26, 9:00am - 5:00pm
Data engineering
Location: 1A 04/05
Secondary topics: Architecture, Cloud

Participants should plan to attend both days of this 2-day training course. Platinum and Training passes do not include access to tutorials on Tuesday.


What you'll learn, and how you can apply it

  • Understand and work with big data application architecture concepts


Prerequisite knowledge

  • A working knowledge of HDFS, Spark, MapReduce, Hive, Impala, data formats, and relational database management systems

Bruce Martin leads you through designing and architecting solutions to a challenging business problem. You’ll explore big data application architecture concepts, including data formats; data transformation; real-time, batch, and machine learning processing; scalability; fault tolerance; technology selection; and minimizing the risk of an unsound architecture. You’ll then apply these concepts to the design of a challenging system.

You won’t do any programming during the workshop, but you’ll be able to explore solutions implemented and deployed in the cloud.


Workshop application use cases

  • Oz metropolitan
  • Architectural questions

Application vertical slice

  • Definition
  • Minimizing the risk of an unsound architecture
  • Selecting a vertical slice
  • Team activity: Identify an initial vertical slice for Metroz

Application data and processing

  • The three Vs of big data
  • The data lifecycle
  • Data formats
  • Transforming data
  • Real-time and near-real-time processing
  • Batch processing
  • Data access patterns
  • Delivery and processing guarantees
  • Machine learning pipelines
  • Team activity: Identify data and processing requirements

Scalable applications

  • Scale up, scale out, scale to X
  • Determining if an application will scale
  • Poll: Scalable airport terminal designs
  • Hadoop and Spark scalability
  • Team activity: Scaling Metroz

Fault-tolerant distributed systems

  • Principles
  • Transparency
  • Hardware versus software redundancy
  • Stateless functional fault tolerance
  • Stateful fault tolerance
  • Replication and group consistency
  • Fault tolerance in Spark and MapReduce
  • Application tolerance for failures
  • Team activity: Identify Metroz component failures and requirements

Technology selection

  • Technology selection methodology
  • Team activity: Select big data technologies for Metroz use cases

Software architecture

  • Architecture artifacts
  • One platform or multiple, Lambda architecture
  • Team activity: Produce high-level architecture and selected technologies; revisit vertical slice
  • Vertical slice demonstration

Wrap-up and Q&A

About your instructor

Bruce Martin is a senior instructor at Cloudera, where he teaches courses on data science, Apache Spark, Apache Hadoop, and data analysis. Previously, Bruce was principal architect and director of advanced concepts at SunGard Higher Education, where he developed the software architecture for SunGard’s Course Signals Early Intervention System, which uses machine learning algorithms to predict the success of students enrolled in university courses. His other roles have included senior staff engineer at Sun Microsystems and researcher at Hewlett-Packard Laboratories. Bruce has written many papers on data management and distributed system technologies and frequently presents his work at academic and industrial conferences. Bruce has authored patents on distributed object technologies. He holds a PhD and master’s degree in computer science from the University of California, San Diego, and a bachelor’s degree in computer science from the University of California, Berkeley.

Conference registration

Get the Platinum pass or the Training pass to add this course to your package.

Comments on this page are now closed.


09/18/2017 10:31am EDT


My colleague is attending this session, and I’m planning to attend the Cloudera big data architecture workshop using a Training pass. Is there an option for me to enroll, or is there a waiting list?


Sophia DeMartini
06/09/2017 8:20am EDT

Hi Howard,

What we have listed on this page is what is available for now, but we’ll be making updates with a more detailed schedule as soon as possible. Please make sure to check back to see the updates.

And if you have any other questions about the content of the training, you can feel free to email me at

Thank you,

06/09/2017 5:05am EDT

Our team is planning to register for the Platinum pass with the 2-day training. We are wondering whether a course agenda or further details are available for each of the training topics you’re offering at the 9/25 & 9/26 NY conference.