In 2007, a computer game company decided to jump ahead of competitors by capturing and using data created during online gaming. It believed that this data could be used not only to improve the in-game experience but also to improve marketing, provide insight into customers, deliver personalized recommendations, research new products, and aid product managers responsible for the product lifecycle.
At the time, collecting and storing all the events generated by online game play was a novel idea. So was the idea of using this nontransactional data across multiple lines of business. The company thought its main problem would be dealing with Internet-scale data. Despite some bad technology choices and major project problems, it turned out that engineering was the easy part. None of the existing development or data practices prepared the company for dealing with the data management and process challenges stemming from distributed devices creating data: business estimation problems, distributed metadata, master data in operational systems and in firmware, varied SLAs, data quality problems, varied event data, and multiple engineering teams with different skills and expectations.
Mark Madsen shares a case study that explores the company's oversights and failures and the lessons it learned along the way. The lessons from this project apply as much today in the post-Hadoop, Kafka, and Spark world as they did back then. The only part that has gotten easier is collecting and storing data.
Mark Madsen is a fellow at Teradata, where he’s responsible for understanding, forecasting, and defining the analytics ecosystem and architecture. Previously, he was CEO of Third Nature, where he advised companies on data strategy and technology planning and vendors on product management. Mark has designed analysis, machine learning, data collection, and data management infrastructure for companies worldwide.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.