James Burkhart explains how Uber supports millions of analytical queries daily across real-time data with Apollo. James covers the architectural decisions and lessons learned building an exactly-once ingest pipeline storing raw events across in-memory row storage and on-disk columnar storage and a custom metalanguage and query layer leveraging partial OLAP result set caching and query canonicalization. Putting all the pieces together provides thousands of Uber employees with subsecond p95 latency analytical queries spanning hundreds of millions of recent events.
This session is sponsored by MemSQL.
James Burkhart is the technical lead on real-time data infrastructure at Uber. James has a strong background in time series data storage, processing, and retrieval. Previously, he worked on Blueflood, a time series database on top of Cassandra, while at Rackspace.
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.