Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Data visualization in a big data world

Jeff Fletcher (Cloudera)
12:05–12:45 Wednesday, 23 May 2018
Data science and machine learning, Visualization and user experience
Location: Capital Suite 14
Level: Intermediate
Secondary topics:  Visualization, Design, and UX
Average rating: 4.73 (11 ratings)

Who is this presentation for?

  • Data visualization practitioners and data scientists

Prerequisite knowledge

  • Data science or data visualization implementation experience

What you'll learn

  • Gain an overview of data visualization techniques and tools for working with large datasets, along with techniques for communicating probability and uncertainty more clearly


As big data adoption grows, Apache Hadoop, Apache Spark, and machine learning technologies are increasingly being used to analyze ever-larger datasets, but we still have to tell stories about the data and make sure the message is clear. There is a lot of overlap between the roles of data scientist and data visualizer; the primary difference is the visualizer’s focus on storytelling. Data science sometimes requires visualization and storytelling, but its focus is more on creating models and understanding the structure of the data. Data visualization, on the other hand, is about conveying a story about the data or the models that are built.

Jeff Fletcher details the tools and techniques that are relevant to data visualization practitioners working with large datasets and predictive models. Jeff outlines the skills data scientists need to improve their data visualizations; new mechanisms for visually representing the range of data, including encoding techniques that better portray the mean and error for collected data; how to visualize predicted data points; and new tools and techniques for visualizing large datasets, including live demos using Hadoop, Spark, SQL, R, Python, and notebooks, as well as techniques to visualize uncertainty from machine learning models.

Jeff Fletcher


Jeff Fletcher is a systems engineer at Cloudera, where he helps customers build big data infrastructure. Jeff has been involved in internet technology all his professional life. Previously, he worked on the initial internet infrastructure team and managed aspects of the Johannesburg Beltel installation at Telkom; designed and implemented new internet products and services at Sprint (which became UUNET, which became Verizon Business); founded Antfarm Networking Technologies, South Africa’s first streaming and webcasting company; and led the product development team at Internet Solutions (then IS). He does occasional consulting for companies looking to move beyond pie charts. Jeff was shortlisted for an Information Is Beautiful award in 2015. He is the creator of a blog dedicated to the art of data visualization. Jeff holds a degree in electrical engineering from Witwatersrand University.