Presented By O’Reilly and Cloudera
Make Data Work
21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Data Visualisation in a Big Data World

Jeff Fletcher (Cloudera)
11:15–11:55 Wednesday, 23 May 2018
Data science and machine learning, Visualization and user experience
Location: Capital Suite 14
Level: Intermediate

Who is this presentation for?

Data Visualisation practitioners, Data Scientists

Prerequisite knowledge

Some basic data science or data visualisation implementation experience is preferable.

What you'll learn

1 - An overview of data visualisation techniques for working with large data sets
2 - Techniques to better communicate probability and uncertainty
3 - Example tools for visualising large data sets

Description

Data Science vs Data Visualisation

There is a lot of overlap between the roles of data scientist and data visualiser, but the primary difference is the focus on storytelling. Data science sometimes requires visualisation and storytelling, but its focus is more on creating models and understanding the structure of the data. Data visualisation is about conveying a story about the data or the models that are built.

This section will focus on the additional skills needed by current data scientists who wish to improve their data visualisation. Where the skills overlap, a different level of proficiency and depth is required, and these differences will be highlighted.

Visualising Uncertainty

Probability and statistics are fundamental knowledge for data scientists, but they are not widely understood by a general audience. They are often counterintuitive: people frequently get basic statistical questions wrong, even those with a statistical background. As datasets get bigger and machine learning improves the predictive power of algorithms, more information needs to be represented with bounds of uncertainty. Helping someone with limited statistical training understand the implications of a confidence interval is difficult, but there are new techniques that can make it easier.

This section will cover some of the new mechanisms for visually representing the range of data, including encoding techniques that better portray the mean and error of collected data.
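As a rough illustration of the numbers that sit behind a mean-and-error encoding, the sketch below computes a sample mean and an approximate 95% confidence interval, which a chart could then draw as an error bar or a gradient band. It is not taken from the talk: the function name `mean_with_interval`, the sample readings, and the normal approximation (z = 1.96) are all assumptions made for this example. For a sample this small, a t-based interval would be somewhat wider.

```python
import math
import statistics


def mean_with_interval(samples, z=1.96):
    """Return (mean, lower, upper) for an approximate 95% confidence interval.

    Assumes a normal approximation; z=1.96 is the usual two-sided 95% value.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.fmean(samples)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, mean - z * sem, mean + z * sem


# Hypothetical measurements, invented for illustration only.
readings = [9.8, 10.4, 10.1, 9.6, 10.7, 10.0, 9.9, 10.3]
mean, lower, upper = mean_with_interval(readings)
print(f"mean={mean:.2f}, interval=({lower:.2f}, {upper:.2f})")
```

The three returned values map directly onto a visual encoding: the mean is the mark, and the lower and upper bounds define the extent of the uncertainty glyph around it.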

Visualising Probabilities

Making predictions to infer new data points is primarily what machine learning is used for, and the results are often presented as projections or forecasts. Weather forecasts, price predictions and the like are all based on probabilities and have bounds of uncertainty. This section will look at some examples of how to visualise predicted data points.
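One common way to show a forecast together with its uncertainty is a fan chart, where shaded bands widen as the forecast horizon grows. The sketch below is a hypothetical example rather than anything shown in the session: it simulates many possible futures with a simple random walk (an assumed stand-in for a real predictive model) and summarises each future step as 5th, 50th and 95th percentiles. The function name `forecast_bands` and all parameter values are invented for illustration.

```python
import random
import statistics


def forecast_bands(last_value, steps, n_paths=2000, step_sd=1.0, seed=42):
    """Simulate random-walk futures and summarise each step as (p5, median, p95)."""
    rng = random.Random(seed)
    # paths[i][t] is the simulated value of path i at step t + 1.
    paths = []
    for _ in range(n_paths):
        value = last_value
        path = []
        for _ in range(steps):
            value += rng.gauss(0.0, step_sd)
            path.append(value)
        paths.append(path)

    bands = []
    for t in range(steps):
        values_at_t = [path[t] for path in paths]
        # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%:
        # index 0 is the 5th percentile, 9 the median, 18 the 95th.
        cuts = statistics.quantiles(values_at_t, n=20)
        bands.append((cuts[0], cuts[9], cuts[18]))
    return bands


bands = forecast_bands(last_value=100.0, steps=5)
for step, (low, mid, high) in enumerate(bands, start=1):
    print(f"t+{step}: {low:.1f} .. {mid:.1f} .. {high:.1f}")
```

Plotting the median as a line and the area between the outer percentiles as a shaded region gives a viewer an immediate sense that predictions further ahead are less certain.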

Working with Big Data Sets

The final section will cover some of the new tools and techniques for visualising large data sets. This will include live demos that visualise data using Hadoop, Spark, SQL, R, Python and notebooks, as well as techniques to visualise uncertainty from machine learning models.
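A general technique for charting data sets too large to draw point by point is to aggregate before plotting, for example by counting points per grid cell and rendering the counts as a density heat map. The sketch below shows the idea in plain Python on a handful of invented points. The function `bin_points` and its parameters are hypothetical, and nothing here reflects the specific demos in the talk; on a cluster, the same aggregation would typically be expressed in the query engine so that only the cell counts reach the plotting layer.

```python
from collections import Counter


def bin_points(points, x_range, y_range, bins=4):
    """Count points per grid cell so only bins * bins values need to be drawn."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:  # any iterable works, so the data can be streamed
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the plotted area
        # Clamp so that points on the upper edge fall into the last bin.
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[(row, col)] += 1
    return counts


# Hypothetical points, invented for illustration only.
sample = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.95), (1.0, 1.0), (0.55, 0.5)]
grid = bin_points(sample, x_range=(0.0, 1.0), y_range=(0.0, 1.0), bins=2)
for cell, count in sorted(grid.items()):
    print(cell, count)
```

The amount of data sent to the chart depends on the grid resolution rather than on the number of input rows, which is what keeps this approach workable at scale.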


Jeff Fletcher

Cloudera

Jeff graduated from Witwatersrand University with a degree in electrical engineering and has been involved in Internet technology all his professional life, with a strong commercial bent. He started his career at Telkom in 1994, working on the initial Internet infrastructure team and managing aspects of the Johannesburg Beltel installation. Between stints at Sprint (which became UUNET, which became Verizon Business), where he designed and implemented new Internet products and services, Jeff founded Antfarm Networking Technologies, South Africa’s first streaming and webcasting company. He returned to the corporate world in 2004, occupying the corner office of the new product development team at Internet Solutions (then IS). Jeff now works as a Systems Engineer for Cloudera, helping customers build big data infrastructure. In 2012 he founded www.limn.co.za, a blog dedicated to the art of data visualisation, and he gives occasional presentations for people looking to move beyond pie charts. He was shortlisted for an Information is Beautiful award in 2015.
