Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference

Evolving beyond the data lake

Jim Scott (NVIDIA)
4:15pm–4:55pm Thursday, December 8, 2016
Data innovations
Location: Summit 2 Level: Beginner
Average rating: 4.40 (5 ratings)

Prerequisite Knowledge

  • An understanding of the general concepts behind big data-related technologies

What you'll learn

  • Learn how to apply big data technologies to all aspects of the business, not just to traditional big data workloads
  • Understand how big data technologies can be used to rethink business processes, streamline your business, and rewrite the rules of your industry


Currently, business applications typically run on their own dedicated hardware, and the data they generate must be moved by separate processes to another set of dedicated hardware. That data must also be transformed as it moves between data sources. These processes and transformations must be built, tested, and maintained in dev, QA, and production, and they may prevent you from operating in real time.

Jim Scott offers a glimpse of a better world, where everything is more seamlessly integrated. If the applications that generate the data run on top of the data lake, no additional data movement is needed. NoSQL databases such as HBase or MapR-DB can handle transactional persistence, and a data format such as JSON can be chosen deliberately so that the data can be explored at the source, without transformations that are costly to build and maintain over time.
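To make "exploring data at the source" concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory `DocumentTable` class standing in for a NoSQL JSON table such as MapR-DB; it does not use the HBase or MapR-DB client APIs. The operational application writes JSON documents, and an ad hoc query (the hypothetical `total_by_customer` function) reads those same documents directly, with no separate ETL copy.

```python
import json
from typing import Any, Dict, Iterator


# Hypothetical stand-in for a NoSQL JSON document table (e.g., MapR-DB).
# This is an in-memory dict, NOT the HBase or MapR-DB client API.
class DocumentTable:
    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def put(self, row_key: str, doc: Dict[str, Any]) -> None:
        # The operational application persists each record as JSON text.
        self._rows[row_key] = json.dumps(doc)

    def get(self, row_key: str) -> Dict[str, Any]:
        return json.loads(self._rows[row_key])

    def scan(self) -> Iterator[Dict[str, Any]]:
        # Exploration reads the same stored documents; no transformed copy exists.
        for raw in self._rows.values():
            yield json.loads(raw)


def total_by_customer(table: DocumentTable) -> Dict[str, float]:
    """Ad hoc exploration directly over nested JSON documents."""
    totals: Dict[str, float] = {}
    for doc in table.scan():
        customer = doc["customer"]["id"]
        amount = sum(item["price"] * item["qty"] for item in doc["items"])
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


if __name__ == "__main__":
    orders = DocumentTable()
    orders.put("order-1", {"customer": {"id": "c42"},
                           "items": [{"sku": "a", "price": 2.5, "qty": 2}]})
    orders.put("order-2", {"customer": {"id": "c42"},
                           "items": [{"sku": "b", "price": 1.0, "qty": 3}]})
    orders.put("order-3", {"customer": {"id": "c7"},
                           "items": [{"sku": "a", "price": 2.5, "qty": 4}]})
    print(total_by_customer(orders))  # {'c42': 8.0, 'c7': 10.0}
```

The design choice being illustrated is that the storage format (self-describing JSON) is the same one the query reads, so nested fields such as `customer.id` can be used without first flattening them into another schema.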

Services that need to talk to each other can communicate through a Kafka-style messaging system, which can scale to millions of events per second while keeping the services decoupled. This architectural approach is better because it lets components scale independently of one another, which in turn enables the deployment and management of microservices and can reduce time to market for new products and services.
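The decoupling described above can be illustrated with a toy, in-memory model of a Kafka-style topic. The `Topic` class, its `publish` and `poll` methods, and the consumer group names are hypothetical and are not the Kafka client API; the sketch assumes a single partition and no persistence or replication. It shows the key property: producers only append to a log, and each consumer group reads from its own offset.

```python
from collections import defaultdict
from typing import Any, Dict, List


# Toy, in-memory model of a Kafka-style topic: an append-only log that each
# consumer group reads at its own offset. NOT the Kafka client API; it omits
# partitions, replication, and durable storage.
class Topic:
    def __init__(self) -> None:
        self._log: List[Any] = []
        self._offsets: Dict[str, int] = defaultdict(int)

    def publish(self, event: Any) -> None:
        # Producers only append; they never know who is consuming.
        self._log.append(event)

    def poll(self, group: str, max_events: int = 10) -> List[Any]:
        # Each consumer group tracks its own position, so a slow service
        # never blocks a fast one or the producer.
        start = self._offsets[group]
        batch = self._log[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    orders = Topic()
    for i in range(5):
        orders.publish({"order_id": i})

    # Two independent services consume the same stream at different rates.
    billing = orders.poll("billing", max_events=5)
    shipping = orders.poll("shipping", max_events=2)
    print(len(billing), len(shipping))  # 5 2
    print(orders.poll("shipping"))      # the remaining 3 events
```

Because the billing and shipping groups keep separate offsets, either service can be scaled, paused, or redeployed without affecting the producer or the other consumer.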

Building a scalable application platform isn’t easy, and neither is figuring out how to move your data into your data lake for processing. The architecture Jim describes supports elastic expansion and contraction of services and optimizes resource utilization across the data center.


Jim Scott


Jim Scott is the head of developer relations, data science, at NVIDIA. He’s passionate about building combined big data and blockchain solutions. Over his career, Jim has held positions running operations, engineering, architecture, and QA teams in the financial services, regulatory, digital advertising, IoT, manufacturing, healthcare, chemicals, and geographical management systems industries. Jim has built systems that handle more than 50 billion transactions per day, and his work with high-throughput computing at Dow was a precursor to more standardized big data concepts like Hadoop. Jim is also the cofounder of the Chicago Hadoop Users Group (CHUG).