Analyst's Nightmare or Laundering Massive Spreadsheets

Moderated by: Feyzi Bagirov & Tatiana Yarmola

Who is this presentation for?

Data Analysts/Scientists, Beginner

Prerequisite knowledge

-Python 3
-Jupyter
-Regex

What you'll learn

-Attendees will learn how to spot and fix data quality issues in spreadsheet data using Python

Description

The spreadsheet lives on, especially in sectors slow to adopt new technology, such as medicine and finance. Not only is data frequently stored and passed around in spreadsheet formats, but analysis is also frequently performed without ever leaving Excel. When the data turns out to be less clean than you hoped, serious errors occur and propagate through the spreadsheet workflow. Data quality issues such as duplicates and nulls, common practices such as copy-pasting, VLOOKUPs, and manual imputation, and the failure to understand and clean the data before drawing conclusions frequently lead to significant errors.
The Pandas library provides powerful tools for ingesting, cleaning, transforming, and visualizing spreadsheet data, capabilities that are either lacking in Excel or very painful to implement given the number of worksheets a task requires. This talk will demonstrate several frequently occurring data issues and show how they can be dealt with in Pandas.
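
As a rough illustration of the kind of cleanup described above, the sketch below counts duplicate rows, drops them, and coerces text columns to proper dates and numbers so that bad entries surface as nulls. It is not taken from the talk itself: the column names, the sample values, and the helper `clean_sheet` are all hypothetical, and the data is built in memory rather than read from a real workbook.

```python
import pandas as pd

# Hypothetical data standing in for one worksheet. In practice it might be
# loaded with pd.read_excel(...), which needs an Excel engine installed.
raw = pd.DataFrame(
    {
        "patient_id": [101, 102, 102, 103, 104],
        "visit_date": ["2017-01-05", "2017-01-06", "2017-01-06", "N/A", "2017-01-09"],
        "charge": ["120.50", "80", "80", "unknown", "95.25"],
    }
)


def clean_sheet(df):
    """Drop exact duplicate rows and coerce text columns to real dtypes.

    Values that cannot be parsed become NaT/NaN instead of raising,
    so they can be counted and reviewed rather than silently kept as text.
    """
    out = df.drop_duplicates().copy()
    out["visit_date"] = pd.to_datetime(
        out["visit_date"], format="%Y-%m-%d", errors="coerce"
    )
    out["charge"] = pd.to_numeric(out["charge"], errors="coerce")
    return out


cleaned = clean_sheet(raw)

# A small quality report: how many rows were duplicates, and how many
# nulls each column holds once the types have been coerced.
report = {
    "duplicate_rows": int(raw.duplicated().sum()),
    "nulls_per_column": cleaned.isna().sum().to_dict(),
}

print(cleaned)
print(report)
```

Coercing with `errors="coerce"` is a deliberate choice in this sketch: it turns placeholder text such as "N/A" into explicit nulls that `isna()` can count, which is harder to do reliably inside a spreadsheet.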
