The spreadsheet lives on, especially in sectors slow to adopt new technology, such as medicine and finance. Not only data is frequently stored and passed around in the spreadsheet formats, analysis is also frequently performed without leaving Excel. And when the data happens to be not as clean as you hoped it to be, serious errors occur and reproduce through the spreadsheet workcycle. Data quality issues such as duplicates and nulls, common practices such as copy-pastes, VLOOKUPS, and manual imputations as well as failure to properly understand and clean the data prior to making conclusions frequently lead to significant errors.
Pandas library provides a powerful tool of ingesting, cleaning, transforming, and visualizing spreadsheet data that are either lacking in Excel or are very painful to implement given the number of worksheets required for a task. This talk will demonstrate several frequently occurring data issues and show how they can be dealt with in Pandas.
Help us make this conference the best it can be for you. Have questions you'd like this speaker to address? Suggestions for issues that deserve extra attention? Feedback that you'd like to share with the speaker and other attendees?
Join the conversation here (requires login)
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com