Engineering the Future of Software
November 13–14, 2016: Training
November 14–16, 2016: Tutorials & Conference
San Francisco, CA

Cloud architectures for data science

2:15pm–3:05pm Wednesday, 11/16/2016
Fundamentals
Location: Tower Salon A
Level: Beginner
Average rating: 4.00 (4 ratings)

What you'll learn

  • Gain an overview of the latest data science developments
  • Learn the workflow and tools to get started analyzing data

Description

Data science is currently a hot topic, but what is it? There are several definitions and opinions. Data science covers the complete workflow, from defining a question and finding the most suitable data source to identifying the right tools and presenting the best possible answer in a clear, engaging manner.

Using weather data, geographical data, and UN country statistical data—all open datasets that are publicly available for download—Margriet Groenendijk walks you through an example of a typical workflow: defining the question, finding the data, exploring the data and finding the best tools for the analysis, cleaning and storing the data, and visualizing and summarizing the cleaned data. This work is quite often done iteratively, with each iteration informed by a growing understanding of the data through munging and crunching.
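The cleaning and summarizing steps of such a workflow might be sketched in pandas as follows. This is a minimal illustration, not the session's actual code, and the tiny inline sample stands in for the open datasets described above:

```python
import pandas as pd

# Hypothetical sample standing in for a downloaded open dataset,
# with one duplicate row and one missing measurement
raw = pd.DataFrame({
    "country": ["Netherlands", "Netherlands", "Brazil", "Brazil", "Brazil"],
    "year":    [2014, 2015, 2014, 2015, 2015],
    "temp_c":  [10.1, None, 25.3, 24.9, 24.9],
})

# Cleaning: drop exact duplicates, then rows missing the measurement
clean = raw.drop_duplicates().dropna(subset=["temp_c"])

# Summarizing: mean temperature per country
summary = clean.groupby("country")["temp_c"].mean().round(1)
print(summary.to_dict())  # {'Brazil': 25.1, 'Netherlands': 10.1}
```

In practice each pass through the data tends to reveal new quirks (more duplicates, inconsistent country names, unit mismatches), which is exactly why the workflow is iterative.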

Margriet concludes by highlighting some of the latest tools and tricks available to data scientists. More data is now easily accessible through REST APIs, making it even simpler to store and analyze (big) data in the cloud using tools such as Spark, Python notebooks, or Scala notebooks. These new developments also simplify collaboration by letting data scientists share their data and analyses directly.
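Data fetched over a REST API typically arrives as JSON, which drops straight into a notebook with the standard library. A minimal sketch, using a hypothetical payload in place of a live API response:

```python
import json

# Hypothetical JSON payload, shaped like a response from an
# open-data REST API (the field names here are illustrative)
payload = '''
{"data": [
    {"country": "Netherlands", "indicator": "population", "value": 16900000},
    {"country": "Brazil",      "indicator": "population", "value": 207800000}
]}
'''

records = json.loads(payload)["data"]

# Index by country for quick lookup before loading the records
# into a DataFrame or a Spark job for heavier analysis
by_country = {r["country"]: r["value"] for r in records}
print(by_country["Brazil"])  # 207800000
```

A real workflow would replace the inline string with an HTTP request to the API endpoint; the parsing and reshaping steps stay the same.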


Margriet Groenendijk

IBM

Margriet is a data scientist and developer advocate for the IBM Watson Data Platform. She has a background in climate science, where she explored large observational datasets of carbon uptake by forests and the output of global-scale weather and climate models. She now explores ways to simplify working with diverse data using cloud databases, data warehouses, Spark, and Python notebooks.