As big data adoption grows, Apache Hadoop, Apache Spark, and machine learning technologies are increasingly being used to analyze ever-larger datasets, but we still need to tell stories about the data and make sure the message is clear. There is a lot of overlap between the roles of data scientist and data visualizer, but the primary difference is the visualizer’s focus on storytelling. Data science sometimes requires visualization and storytelling, but the focus is more on creating models and understanding the structure of the data. Data visualization, on the other hand, is about conveying a story about the data or the models that are built.
Jeff Fletcher details the tools and techniques that are relevant to data visualization practitioners working with large datasets and predictive models. Jeff outlines the skills data scientists need to improve their data visualizations; new mechanisms for visually representing the range of data, including encoding techniques that better portray the mean and error for collected data; how to visualize predicted data points; and new tools and techniques for visualizing large datasets, including live demos using Hadoop, Spark, SQL, R, Python, and notebooks, as well as techniques for visualizing uncertainty from machine learning models.
Jeff Fletcher is a systems engineer at Cloudera, where he helps customers build big data infrastructure. Jeff has been involved in internet technology all his professional life. Previously, he worked on the initial internet infrastructure team and managed aspects of the Johannesburg Beltel installation at Telkom; designed and implemented new internet products and services at Sprint (which became UUNET, which became Verizon Business); founded Antfarm Networking Technologies, South Africa’s first streaming and webcasting company; and led the product development team at Internet Solutions (then IS). He does occasional consulting for companies looking to move beyond pie charts. Jeff was shortlisted for an Information Is Beautiful award in 2015. He is the creator of Limn.co.za, a blog dedicated to the art of data visualization. Jeff holds a degree in electrical engineering from Witwatersrand University.
©2018, O’Reilly UK Ltd