Currently, business applications run on their own hardware, and the data they generate must be moved by separate processes to another set of dedicated hardware. That data must also be transformed as it moves between data sources. Those processes and transformations must be built, tested, and maintained across development, QA, and production environments, and they can prevent you from operating in real time.
Jim Scott offers a glimpse of a better world in which these pieces are seamlessly integrated. If the applications that generate the data run on top of the data lake, no additional data movement is needed. NoSQL databases like HBase or MapR-DB can handle transactional persistence, and data formats like JSON can be chosen deliberately so that data can be explored at the source without transformations, which are costly to build and maintain over time.
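To make the "explore at the source" idea concrete, here is a minimal Python sketch that queries JSON documents in the shape the application wrote them, with no transformation step in between. The records, field names, and the helpers `get_path`, `explore`, and `with_field` are hypothetical; a real deployment would run such queries against HBase or MapR-DB through their own interfaces, which this sketch does not use.

```python
import json
from collections import Counter

# Hypothetical events, stored exactly as the application emitted them:
# one JSON document per order, never reshaped by an ETL job.
RAW_EVENTS = [
    '{"order_id": 1, "customer": {"id": "c1", "region": "EU"}, "total": 40.0}',
    '{"order_id": 2, "customer": {"id": "c2", "region": "US"}, "total": 15.5}',
    '{"order_id": 3, "customer": {"id": "c1", "region": "EU"}, "total": 22.5, "coupon": "SPRING"}',
]


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'customer.region' into nested dicts."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def explore(raw_docs, group_by, value):
    """Sum a numeric field per group by parsing the stored JSON directly."""
    totals = Counter()
    for raw in raw_docs:
        doc = json.loads(raw)
        totals[get_path(doc, group_by)] += get_path(doc, value, 0)
    return dict(totals)


def with_field(raw_docs, path):
    """Return the order_ids of documents that carry an optional field."""
    docs = (json.loads(raw) for raw in raw_docs)
    return [d["order_id"] for d in docs if get_path(d, path) is not None]


if __name__ == "__main__":
    print(explore(RAW_EVENTS, "customer.region", "total"))  # {'EU': 62.5, 'US': 15.5}
    print(with_field(RAW_EVENTS, "coupon"))  # [3]
```

Because `coupon` appears in only one document, no schema migration was needed before querying it; that flexibility is what makes JSON a reasonable choice for exploring data where it lands.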
Services that need to talk to each other can communicate through a Kafka-style messaging system, which can easily scale to millions of events per second while keeping the services decoupled. This architecture is better because it lets components scale independently of one another, which makes microservices easier to deploy and manage and can reduce the time to market for new products and services.
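The decoupling described above can be illustrated with a small in-memory Python stand-in for a Kafka-style topic. It is not the Kafka client API: the `Topic` and `ConsumerGroup` classes, the CRC32 key hashing, and the round-robin partition assignment are simplifying assumptions. It shows two properties the architecture relies on: the producer never references its consumers, and each consumer group tracks its own offsets, so one service can scale out to more members while another runs as a single instance.

```python
import zlib
from collections import defaultdict


class Topic:
    """In-memory stand-in for a Kafka-style topic: partitioned, append-only logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Hash the key to pick a partition, so one key's events stay in order.
        idx = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[idx].append((key, value))
        return idx


class ConsumerGroup:
    """One logical service reading a topic; it keeps its own offsets."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition index -> next unread offset

    def assign(self, member, num_members):
        # Round-robin split: member k owns partitions k, k + n, k + 2n, ...
        return [p for p in range(len(self.topic.partitions)) if p % num_members == member]

    def poll(self, member, num_members):
        """Return unread records from this member's partitions and advance offsets."""
        records = []
        for p in self.assign(member, num_members):
            log = self.topic.partitions[p]
            records.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return records


if __name__ == "__main__":
    orders = Topic("orders", num_partitions=4)
    for i in range(8):
        # The producer knows only the topic, not who will consume it.
        orders.publish(f"customer-{i % 3}", {"order_id": i})

    # Billing scales out to two members; analytics runs as one member.
    billing = ConsumerGroup(orders)
    analytics = ConsumerGroup(orders)
    billed = billing.poll(0, 2) + billing.poll(1, 2)
    analyzed = analytics.poll(0, 1)
    print(len(billed), len(analyzed))  # 8 8
```

Adding a third billing member only changes the `num_members` argument; the producer and the analytics group are untouched, which is the independent scaling described above. A real broker also persists the logs, replicates partitions, and coordinates group membership, none of which is modeled here.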
Building a scalable application platform isn’t easy. Neither is figuring out how to move your data to your data lake for processing. This architecture supports elastic expansion and contraction of services and optimizes resource utilization across the data center.
Jim Scott is the head of developer relations, data science, at NVIDIA. He’s passionate about building combined big data and blockchain solutions. Over his career, Jim has held positions running operations, engineering, architecture, and QA teams in the financial services, regulatory, digital advertising, IoT, manufacturing, healthcare, chemicals, and geographical management systems industries. Jim has built systems that handle more than 50 billion transactions per day, and his work with high-throughput computing at Dow was a precursor to more standardized big data concepts like Hadoop. Jim is also the cofounder of the Chicago Hadoop Users Group (CHUG).
©2016, O'Reilly Media, Inc.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.