Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Designing a scalable real-time data platform using Akka, Spark Streaming, and Kafka

Alex Silva (Pluralsight)
4:20pm–5:00pm Wednesday, 03/30/2016
Spark & Beyond

Location: 210 A/E
Tags: real-time
Average rating: 3.94 out of 5 (16 ratings)

With the advent of reliable streaming technologies, real-time data pipelines have become a crucial component of any robust data initiative. Compared to a traditional Hadoop-centric data hub, these real-time stacks provide high levels of system availability and data integrity, coupled with very low-latency queries, without incurring the overhead of inflexible schemas and batch-analysis lag.

Alex Silva demonstrates how to use Kafka, Spark Streaming, Akka, and Hadoop to orchestrate a real-time stack and explains how data flows through this system. This real-time data platform combines open source technologies and home-grown services to provide a full end-to-end solution, from flexible data-ingestion protocols to fast data analysis and queries.

Topics include:

  • External message providers, which connect to the platform through a data-ingestion service modeled as a robust actor system using Akka and Scala
  • Routing to different backend systems, including Kafka and Druid
  • Spark Streaming, which is used to perform real-time complex analytical and scientific processing on the data
  • Exporting data for future processing into Hadoop
  • Querying and visualization
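
The first two topics (an actor-style ingestion service that routes messages to different backends) can be illustrated with a minimal sketch. This is not the platform's implementation, which the abstract says is built with Akka and Scala. It is a plain-Python analogy using only the standard library. The `InMemorySink` class, the `"type"` field used as the routing key, and the dead-letter sink are all assumptions made for illustration; the abstract does not say how routing decisions are made.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the actor thread to shut down


class InMemorySink:
    """Hypothetical stand-in for a real backend such as a Kafka topic or Druid."""

    def __init__(self, name):
        self.name = name
        self.records = []

    def write(self, record):
        self.records.append(record)


class IngestionActor:
    """Actor-like worker: one thread drains a mailbox, one message at a time."""

    def __init__(self, routes, default_sink):
        self.mailbox = queue.Queue()
        self.routes = routes              # assumed routing table: type -> sink
        self.default_sink = default_sink  # assumed fallback for unknown types
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def tell(self, message):
        # Fire-and-forget send, similar in spirit to an actor's "tell".
        self.mailbox.put(message)

    def stop(self):
        # Messages queued before the sentinel are still processed in order.
        self.mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is _STOP:
                break
            sink = self.routes.get(message.get("type"), self.default_sink)
            sink.write(message)


def route_events(events):
    """Send events through the actor and return what each sink received."""
    kafka = InMemorySink("kafka")
    druid = InMemorySink("druid")
    dead_letter = InMemorySink("dead-letter")
    actor = IngestionActor({"raw": kafka, "metric": druid}, dead_letter)
    actor.start()
    for event in events:
        actor.tell(event)
    actor.stop()
    return {sink.name: sink.records for sink in (kafka, druid, dead_letter)}
```

Calling `route_events` with a few dictionaries shows the idea: each message is handled sequentially by the single mailbox consumer and lands in the sink registered for its type, while anything unrecognized goes to the fallback sink. A production system would replace the in-memory sinks with real producers and add supervision and backpressure, which this sketch leaves out.
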

Alex Silva

Pluralsight

Alex Silva is a chief data architect at Pluralsight, where he leads the development of the company’s data infrastructure and services. He’s been instrumental in establishing Pluralsight’s data initiative by architecting a platform to capture valuable insights on real-time video analytics while integrating several data sources within the business. He’s built a reputation as a passionate and pragmatic data evangelist. Previously, Alex was a principal data engineer at Rackspace, where he led a team of developers building its data initiative and helped establish its big data platform by architecting a solution to drive actionable insight on consumer behavior and product-usage trends and by designing analytical models, APIs, and frameworks to deliver fanatical support, including a computational linguistics library to analyze and classify support chat logs. He was also a principal software engineer at ESPN Emerging Technologies, where he architected and developed a distributed application to help basketball operators collect play-by-play records, and he held several senior-level engineering positions at Walt Disney World Internet Group, Pentaho, OutStart, and Travelatro.com. He’s Sun Certified as an enterprise architect for the J2EE platform, a web component developer, and a Java 2 programmer. He earned a bachelor’s degree in molecular biology and an MBA from the University of Central Florida in Orlando. When Alex is not programming, you’ll probably catch him with an athletic bag on his shoulders. He’s a bit of a sports junkie, particularly a CrossFit addict, who’s been known to create an epidemic of fitness recovery, smoking cessation, and weight loss around him.