Data Science vs Data Visualisation
There is a lot of overlap between the roles of data scientist and data visualiser, but the primary difference is the focus on storytelling. Data science will sometimes require visualisation and telling a story, but the focus is more on creating models and understanding the structure of the data. Data visualisation is about being able to convey a story about the data or the models that are built.
This section will focus on the additional skills needed for current data scientists wishing to improve their data visualisation skills. Where the skills overlap, there is a different level of proficiency and depth required, and these will be highlighted.
Probability and statistics are fundamental knowledge for data scientists, but they are not widely understood by a general audience. They are often counterintuitive: people frequently get basic statistical questions wrong, even those with a statistical background. As datasets get bigger and machine learning improves the predictive power of algorithms, more information needs to be represented with bounds of uncertainty. Helping someone with limited statistical training understand the implications of a confidence interval is difficult, but there are new techniques that can make it easier.
This section will cover some of the new mechanisms for visually representing the range of data. This will include encoding techniques that will better portray the mean and error for collected data.
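As a rough illustration of the quantities such an encoding would need, the sketch below computes a mean and an approximate 95% interval for a small sample. It is not taken from the session material: the function name `mean_with_interval`, the normal approximation (z = 1.96) and the sample values are all assumptions made for the example.

```python
import math
import statistics


def mean_with_interval(samples, z=1.96):
    """Return (mean, lower, upper) for an approximate interval around the mean.

    Assumption: a normal approximation is acceptable; z=1.96 is the
    usual multiplier for a 95% interval.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, mean - z * sem, mean + z * sem


# Made-up illustrative readings, not real data.
readings = [4.8, 5.1, 5.0, 4.9, 5.2]
mean, lower, upper = mean_with_interval(readings)
print(f"mean={mean:.2f}, interval=({lower:.2f}, {upper:.2f})")
```

The three returned values map directly onto a point marker with an error bar, or onto a shaded band when the same calculation is repeated across a series.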
Making predictions to infer new data points is primarily what machine learning is used for, and the results are often presented as projections or forecasts. Weather forecasts, price predictions and similar outputs are all based on probabilities and carry boundaries of uncertainty. This section will look at some examples and at how to visualise predicted data points.
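To make the idea of a forecast with uncertainty bounds concrete, here is a minimal sketch that fits a straight line by least squares and attaches a constant band to each predicted point. The helper names `fit_line` and `forecast_with_band`, the price series, and the simplification of using only the residual standard deviation (ignoring uncertainty in the fitted parameters) are assumptions for illustration, not the method presented in the session.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_with_band(xs, ys, new_xs, z=1.96):
    """Return (x, prediction, lower, upper) for each new x.

    Simplifying assumption: the band is +/- z * residual standard
    deviation, so it has the same width at every predicted point.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))
    rows = []
    for x in new_xs:
        pred = intercept + slope * x
        rows.append((x, pred, pred - z * sigma, pred + z * sigma))
    return rows


# Made-up illustrative series, not real prices.
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.1, 13.9, 16.2, 17.8]
forecast = forecast_with_band(days, prices, [6, 7])
for x, pred, lower, upper in forecast:
    print(f"day {x}: {pred:.2f} ({lower:.2f} to {upper:.2f})")
```

Each row gives a predicted point plus the lower and upper edges of a band, which is the shape of data a chart needs in order to draw a forecast line with a shaded region around it.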
Working with Big Data Sets
The final section will cover some of the new tools and techniques for visualising large data sets. This will include live demos that visualise data using Hadoop, Spark, SQL, R, Python and notebooks, as well as techniques to visualise uncertainty from machine learning models.
Jeff graduated from Witwatersrand University with a degree in electrical engineering, and has been involved in Internet technology all his professional life, but with a strong commercial bent. He started his career at Telkom in 1994, working on the initial Internet infrastructure team and managing aspects of the Johannesburg Beltel installation. Between stints at Sprint (which became UUNET, which became Verizon Business), where he designed and implemented new Internet products and services, Jeff Fletcher founded Antfarm Networking Technologies, South Africa’s first streaming and webcasting company. He returned to the corporate world in 2004, occupying the corner office of the new product development team at Internet Solutions (then IS). Jeff now works as a Systems Engineer for Cloudera, helping customers build big data infrastructure. In 2012 he founded www.limn.co.za, a blog dedicated to the art of data visualisation, and he gives occasional presentations for people looking to move beyond pie charts. He was shortlisted for an Information is Beautiful award in 2015.
©2018, O’Reilly UK Ltd