Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

NoLambda: A new architecture combining streaming, ad hoc, machine-learning, and batch analytics

Helena Edelson (Apple), Evan Chan (Tuplejump)
11:50am–12:30pm Wednesday, 03/30/2016
Data Innovations

Location: 210 D/H
Tags: real-time
Average rating: 3.85 (13 ratings)

Prerequisite knowledge

Attendees should be familiar with distributed computing and architecture, basic analytics, NoSQL, and general fault tolerance and have a general understanding of Apache Spark, Cassandra, and Kafka.

Description

In today’s world of exploding big and fast data, developers who want both streaming analytics and ad hoc, OLAP-like analysis have often had to build complex architectures such as Lambda, which pairs a fast path for streaming analytics on NoSQL stores such as Cassandra and HBase with a separate batch path involving HDFS and Parquet. While this approach works, it involves too many moving parts, too many technologies for ops, and too many engineering hours. Helena Edelson and Evan Chan highlight a much simpler approach that combines streaming and ad hoc/batch analysis using what they call the NoLambda stack (Apache Spark/Scala, Mesos, Akka, Cassandra, Kafka) plus FiloDB, a new entrant to the distributed-database world that supports both streaming and ad hoc analytics.

Topics include:

  • Modern streaming and batch/ad hoc architectures
  • Precise and scalable streaming ingestion using Apache Kafka, Akka, Spark Streaming, Cassandra, and FiloDB
  • How a unified streaming + batch stack can lower your TCO
  • What FiloDB is and how it enables fast analytics with competitive storage cost
  • Use cases involving time series, smart cities, and event data
  • Machine learning using Spark MLlib—without the need to export to HDFS
  • Combining streaming and historical/ad hoc data analysis, including efficient longer-time window analysis

Helena Edelson

Apple

Helena Edelson is a committer on several open source projects, including the Spark Cassandra Connector and the Cassandra Kafka Connector, and a previous contributor to Akka (two new features in Akka Cluster), Spring Integration, and several others. She also speaks at international big data and Scala conferences, including Kafka Summit, Spark Summit (EU and NYC), Strata (NYC and San Jose), Reactive Summit, QCon SF, Scala Days (EU and US), Scala World, and Philly Emerging Technology. She is currently a senior software engineer in distributed systems at Apple.

Evan Chan

Tuplejump

Evan Chan is a distinguished software engineer at Tuplejump. Evan loves to design, build, and improve bleeding-edge distributed data and backend systems using the latest open source technologies. He has led the design and implementation of multiple big data platforms based on Storm, Spark, Kafka, Cassandra, and Scala/Akka, including a columnar real-time distributed query engine. Evan is an active contributor to the Apache Spark project, a DataStax Cassandra MVP, and cocreator and maintainer of the open source Spark Job Server. He is a big believer in GitHub, open source, and meetups and has given talks at various conferences, including Spark Summit, Cassandra Summit, FOSS4G, and Scala Days.