For most companies, data analysis means collecting data, building a pipeline to clean and transform it into a usable form, and only then looking for insights. Without good tools to automate that pipeline, managing data flow can become a tedious and brittle process.
In this talk we highlight some useful tools that we built in-house.
Siwei Zhu is a data scientist at Scribd focused on understanding how users engage with the product. Previously, he worked as a data scientist at Facebook.
Kevin Perko is the Data Team Lead at Scribd, the leading subscription reading service. He focuses on evaluating search engine performance, building data pipelines, and democratizing access to data through various initiatives including Reddit-style AMAs, emails, and individual outreach. With nearly a decade of analytics experience, Kevin has worked for a multitude of Bay Area startups including Eventbrite, GREE, and Education.com. He has a background in Finance from Santa Clara University and has volunteered with The University of Cape Town to teach computer skills in the townships of South Africa.
©2015, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.