Foundations for Successful Data Projects
Who is this presentation for?
Technical leaders, including technical leads, architects, managers, CTOs, CDOs, and CIOs, as well as developers working on data projects.
Most organizations have developed processes and practices for data management and for the development of large software projects. While many of these processes and practices are still relevant and valuable, the dramatic growth in the volume and variety of data, along with new tools to manage this data, has left these same organizations struggling to adapt to the new landscape. The challenges include understanding how to evaluate new data management systems, how to staff projects to ensure success, and how to evaluate and manage risks when working with these new systems.
In this tutorial, we’ll share guidelines and practices to provide a path through the process of developing data projects, from planning to implementation. This includes topics like:
- Starting the planning process by understanding the key data project types.
- Selecting data management software in the new enterprise data space.
- Managing project risk, including technology risk, team risk, and requirements risk.
- Ensuring the integrity of data throughout your entire data pipeline.
- Ensuring the integrity of data through effective data governance and data management.
After taking this tutorial, you’ll come away with insights on managing and delivering your own successful data projects, based on the presenters’ years of experience working with multiple companies and customers.
Prerequisite knowledge
An understanding of, and experience with, data management concepts and systems such as relational databases. Previous experience building large software projects will be helpful. Some knowledge of newer data management systems such as Hadoop or Cassandra will be helpful but is not required.
Ted Malaska is currently a Director of Enterprise Architecture at Capital One. Before that, he was the Director of Engineering at Blizzard’s Global Insight Department. Ted was also a principal solutions architect at Cloudera, helping clients find success with the Hadoop ecosystem, and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has also contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is a coauthor of Hadoop Application Architectures, a frequent speaker at conferences, and a frequent blogger on data architectures.
Jonathan Seidman is a software engineer on the cloud team at Cloudera. Previously, he was a lead engineer on the big data team at Orbitz Worldwide, helping to build out the Hadoop clusters supporting the data storage and analysis needs of one of the most heavily trafficked sites on the internet. Jonathan is a cofounder of the Chicago Hadoop User Group and the Chicago Big Data Meetup and a frequent speaker on Hadoop and big data at industry conferences such as Hadoop World, Strata, and OSCON. Jonathan is the coauthor of Hadoop Application Architectures from O’Reilly.