Data science is currently a hot topic, but what is it? Definitions and opinions vary. Here, data science covers the complete workflow, from defining a question, through finding the most suitable data source and identifying the right tools, to presenting the best possible answer in a clear, engaging manner.
Using weather data, geographical data, and UN country statistical data—all open datasets that are publicly available for download—Margriet Groenendijk walks you through an example of a typical workflow: defining the question, finding the data, exploring the data and finding the best tools for the analysis, cleaning and storing the data, and visualizing and summarizing the cleaned data. This work is quite often done iteratively, with each iteration informed by a growing understanding of the data through munging and crunching.
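As a rough illustration of that workflow (not the code used in the talk), the sketch below runs the find, clean, and summarize steps on a tiny inline dataset using only the Python standard library. The sample values, the column names (`country`, `date`, `temp_c`), and the helper functions `load_rows`, `clean_rows`, and `summarize` are all assumptions made up for this example; they are not real measurements or part of any dataset mentioned above.

```python
import csv
import io
import statistics

# Invented sample data (NOT real measurements): daily temperatures per
# country, including one missing value and one malformed value.
RAW_CSV = """country,date,temp_c
Netherlands,2016-05-01,14.2
Netherlands,2016-05-02,
Netherlands,2016-05-03,15.8
Kenya,2016-05-01,24.1
Kenya,2016-05-02,n/a
Kenya,2016-05-03,25.3
"""


def load_rows(text):
    """Find the data: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Clean the data: keep only rows whose temperature parses as a float."""
    cleaned = []
    for row in rows:
        try:
            temp = float(row["temp_c"])
        except ValueError:
            continue  # skip missing or malformed readings
        cleaned.append(
            {"country": row["country"], "date": row["date"], "temp_c": temp}
        )
    return cleaned


def summarize(rows):
    """Summarize the cleaned data: mean temperature per country."""
    by_country = {}
    for row in rows:
        by_country.setdefault(row["country"], []).append(row["temp_c"])
    return {
        country: statistics.mean(temps)
        for country, temps in by_country.items()
    }


rows = load_rows(RAW_CSV)      # find the data
cleaned = clean_rows(rows)     # clean it
summary = summarize(cleaned)   # summarize it

for country, mean_temp in sorted(summary.items()):
    print(f"{country}: mean {mean_temp:.1f} C")
```

In practice each step tends to be revisited: a first summary often exposes further problems in the data, which sends you back to the cleaning step with a better idea of what to look for.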
Margriet concludes by highlighting some of the latest tools and tricks available to data scientists. More data is now accessible through REST APIs, making it even simpler to store and analyze (big) data in the cloud using tools such as Spark, Python notebooks, or Scala notebooks. These new developments also make collaboration easier by letting data scientists share their data and analyses.
Margriet is a Data Scientist and Developer Advocate for the IBM Watson Data Platform. She has a background in climate science, where she explored large observational datasets of carbon uptake by forests and the output of global-scale weather and climate models. Now she explores ways to simplify working with diverse data using cloud databases, data warehouses, Spark, and Python notebooks.
©2016, O'Reilly Media, Inc.