Creating big data solutions that process data at terabyte scale and produce spatial-temporal insights in real time demands a well-thought-through system architecture. The architecture must support a data pipeline spanning ingestion, processing, indexing, caching, and retrieval of insights, as well as the propagation of data metrics across all system layers. Chandra Sekhar Saripaka details the production architecture at DataSpark that works through terabytes of spatial-temporal telco data each day in PaaS mode and showcases how DataSpark operates in SaaS mode.
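To make the pipeline stages concrete, here is a minimal, hypothetical sketch in plain Python of how ingestion, processing, and indexing of spatial-temporal events might fit together. The record layout, grid size, and function names are illustrative assumptions, not DataSpark's actual implementation (which runs on Spark/Hadoop at far larger scale).

```python
from datetime import datetime, timezone

def ingest(lines):
    """Ingestion: parse raw CSV lines (uid,lat,lon,unix_ts) into typed events.
    The record layout is a hypothetical stand-in for telco event logs."""
    for line in lines:
        uid, lat, lon, ts = line.split(",")
        yield (uid, float(lat), float(lon),
               datetime.fromtimestamp(int(ts), tz=timezone.utc))

def process(events, cell=0.01):
    """Processing: snap each event to a spatial grid cell and an hourly bucket,
    producing (cell, hour) keys for downstream aggregation."""
    for uid, lat, lon, ts in events:
        key = (round(lat / cell) * cell,
               round(lon / cell) * cell,
               ts.replace(minute=0, second=0, microsecond=0))
        yield key, uid

def index(keyed_events):
    """Indexing: aggregate distinct users per (cell, hour) -- a simple
    spatial-temporal insight ready for caching and retrieval."""
    cells = {}
    for key, uid in keyed_events:
        cells.setdefault(key, set()).add(uid)
    return {key: len(users) for key, users in cells.items()}

# Usage: two events in the same grid cell and hour collapse into one insight.
raw = ["u1,1.3521,103.8198,1700000000",
       "u2,1.3522,103.8199,1700000000"]
insights = index(process(ingest(raw)))
```

In a production setting each stage would be a distributed Spark transformation rather than a generator, but the shape of the dataflow is the same.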
Chandra Sekhar Saripaka is a product developer, big data professional, and data scientist at DataSpark, a Singtel company. He has deep experience in financial products, CMS, and identity management and is an expert in data crunching at terabyte scale on graphs and Hadoop. Previously, Chandra carried out research on image-search indexing and retrieval and has built many architectures for enterprise integration and portals, a cloud search engine for ecommerce, and a framework for real-time news recommendation systems.
©2016, O'Reilly Media, Inc.