Other industries are catching on to what Wall Street has known for years: collecting data and applying analytic methods to it can provide enormous value to enterprises.
In this tutorial, attendees will get a taste of how large-scale data science techniques and technologies developed for the consumer internet can be applied in the world of finance. Attendees will enrich stock tick data with Wikipedia page view traffic data as well as the text of the pages themselves. We will guide an exploration of the relationship between traffic on Wikipedia pages and the movement of stock prices.
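The enrichment step described above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the tutorial's own material: the function names (`join_by_date`, `daily_returns`, `pearson`), the date-keyed dictionaries, and all sample numbers are assumptions invented for the example. It joins daily closing prices with daily page view counts on date, then measures how the size of each day's price move correlates with that day's page views.

```python
from math import sqrt


def join_by_date(prices, views):
    """Inner-join two {date: value} dicts on their shared dates (sorted)."""
    dates = sorted(prices.keys() & views.keys())
    return dates, [prices[d] for d in dates], [views[d] for d in dates]


def daily_returns(closes):
    """Simple day-over-day returns from a list of closing prices."""
    return [(b - a) / a for a, b in zip(closes, closes[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical sample data: closing prices and Wikipedia page views
    # for one company, keyed by ISO date. All values are made up.
    closes = {
        "2015-01-05": 100.0,
        "2015-01-06": 101.0,
        "2015-01-07": 97.0,
        "2015-01-08": 97.5,
        "2015-01-09": 103.0,
    }
    page_views = {
        "2015-01-05": 1200,
        "2015-01-06": 1500,
        "2015-01-07": 4100,
        "2015-01-08": 1300,
        "2015-01-09": 5200,
        "2015-01-10": 900,  # no price for this date, so the join drops it
    }

    dates, prices, views = join_by_date(closes, page_views)

    # Each return belongs to the second day of its pair, so the first
    # day's view count is skipped to keep the two series aligned.
    move_sizes = [abs(r) for r in daily_returns(prices)]
    aligned_views = views[1:]

    print("dates used:", dates[1:])
    print("correlation of |return| with page views:",
          round(pearson(move_sizes, aligned_views), 3))
```

A real analysis would work with intraday ticks, many tickers, and a distributed engine rather than in-memory dictionaries, and a correlation computed this way says nothing about which series leads the other.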
In this tutorial attendees will learn how to:
Sean Owen is director of data science at Cloudera in London. Before Cloudera, he founded Myrrix Ltd. (now the Oryx project) to commercialize large-scale real-time recommender systems on Hadoop. He is an Apache Spark committer and a co-author of O’Reilly Media’s Advanced Analytics with Spark. He was a committer and VP for Apache Mahout, and a co-author of Mahout in Action. Previously, Sean was a senior engineer at Google.
Juliet Hougland is a data scientist at Cloudera and a contributor, committer, and maintainer for the Sparkling Pandas project. Her commercial applications of data science include developing predictive maintenance models for oil and gas pipelines at Deep Signal, and designing and building a platform for real-time model application, data storage, and model building at WibiData. Juliet was the technical editor for Learning Spark by Karau et al. and Advanced Analytics with Spark by Ryza et al. She holds an M.S. in applied mathematics from the University of Colorado, Boulder and graduated Phi Beta Kappa from Reed College with a B.A. in math-physics.
Sandy Ryza is a data scientist at Cloudera focusing on Apache Spark and its ecosystem. He recently led Spark development at Cloudera. Sandy is a frequent Spark contributor and a member of the Apache Hadoop Project Management Committee. He graduated Phi Beta Kappa from Brown University.
©2015, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are trademarks of the Apache Software Foundation and are used with permission. The ASF has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.