Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

In-Person Training
Cloudera big data architecture workshop

Bruce Martin (Cloudera)
Monday, September 25 & Tuesday, September 26, 9:00am - 5:00pm
Data engineering
Location: 1A 04/05
Secondary topics: Architecture, Cloud

Participants should plan to attend both days of this 2-day training course. Platinum and Training passes do not include access to tutorials on Tuesday.

This training brings together technical contributors in a group setting to design and architect solutions to a challenging business problem. You'll explore big data application architecture concepts in general and then apply them to the design of a challenging system.

What you'll learn, and how you can apply it

  • Understand and work with big data application architecture concepts

Prerequisites:

  • A working knowledge of HDFS, Spark, MapReduce, Hive, Impala, data formats, and relational database management systems

This training brings together technical contributors in a group setting to design and architect solutions to a challenging business problem. You’ll explore big data application architecture concepts, including data formats; data transformation; real-time, batch, and machine learning processing; scalability; fault tolerance; technology selection; and minimizing the risk of an unsound architecture. You’ll then apply these concepts to the design of a challenging system.

You won’t do any programming during the workshop, but you’ll be able to explore solutions implemented and deployed in the cloud.

Outline:

Workshop application use cases

  • Oz metropolitan
  • Architectural questions

Application vertical slice

  • Definition
  • Minimizing the risk of an unsound architecture
  • Selecting a vertical slice
  • Team activity: Identify an initial vertical slice for Metroz

Application data and processing

  • The three Vs of big data
  • The data lifecycle
  • Data formats
  • Transforming data
  • Real-time and near-real-time processing
  • Batch processing
  • Data access patterns
  • Delivery and processing guarantees
  • Machine learning pipelines
  • Team activity: Identify data and processing requirements

Scalable applications

  • Scale up, scale out, scale to X
  • Determining if an application will scale
  • Poll: Scalable airport terminal designs
  • Hadoop and Spark scalability
  • Team activity: Scaling Metroz

Fault-tolerant distributed systems

  • Principles
  • Transparency
  • Hardware versus software redundancy
  • Stateless functional fault tolerance
  • Stateful fault tolerance
  • Replication and group consistency
  • Fault tolerance in Spark and MapReduce
  • Application tolerance for failures
  • Team activity: Identify Metroz component failures and requirements

Technology selection

  • Technology selection methodology
  • Team activity: Select big data technologies for Metroz use cases

Software architecture

  • Architecture artifacts
  • One platform or multiple, Lambda architecture
  • Team activity: Produce high-level architecture and selected technologies; revisit vertical slice
  • Vertical slice demonstration

Wrap-up and Q&A

About your instructor

Bruce Martin is a senior instructor at Cloudera, where he teaches courses on data science, Apache Spark, Apache Hadoop, and data analysis. Previously, Bruce was principal architect and director of advanced concepts at SunGard Higher Education, where he developed the software architecture for SunGard’s Course Signals Early Intervention System, which uses machine learning algorithms to predict the success of students enrolled in university courses. Bruce’s other roles have included senior staff engineer at Sun Microsystems and researcher at Hewlett-Packard Laboratories. Bruce has written many papers on data management and distributed system technologies and frequently presents his work at academic and industrial conferences. Bruce has authored patents on distributed object technologies. Bruce holds a PhD and master’s degree in computer science from the University of California at San Diego and a bachelor’s degree in computer science from the University of California, Berkeley.

Conference registration

Get the Platinum pass or the Training pass to add this course to your package.

Comments

Sophia DeMartini | SENIOR SPEAKER MANAGER
06/09/2017 8:20am EDT

Hi Howard,

What we have listed on this page is what is available for now, but we’ll be making updates with a more detailed schedule as soon as possible. Please make sure to check back to see the updates.

And if you have any other questions about the content of the training, you can feel free to email me at speakers@oreilly.com.

Thank you,
Sophia

hungwei yeh | DATABASE DEVELOPER
06/09/2017 5:05am EDT

Hi,
Our team is planning to register for the Platinum pass with the 2-day training. We are wondering if there is a course agenda or further details for each of the training topics you’re offering at the 9/25 & 9/26 NY conference.
Thanks,
Howard