Sep 23–26, 2019

Foundations for successful data projects

Ted Malaska (Capital One), Jonathan Seidman (Cloudera), Matthew Schumpert (Cloudera, Inc.), Raman Rajasekhar (Cloudera Inc), Krishna Maheshwari (Cloudera)
1:30pm–5:00pm Tuesday, September 24, 2019
Location: 1E 10
Secondary topics:  Culture and Organization
Average rating: *****
(5.00, 3 ratings)

Who is this presentation for?

  • Technical leaders, including technical leads, architects, and managers; CTOs, CDOs, and CIOs; and developers working on data projects

Level

Intermediate

Description

Most organizations have established processes and practices for data management and for developing large software projects. While many of these remain relevant and valuable, the dramatic growth in the volume and variety of data, along with new tools for managing it, has left these same organizations struggling to adapt. The challenges include evaluating new data management systems, staffing projects properly, and assessing and managing risk when working with these new systems.

Ted Malaska and Jonathan Seidman detail guidelines and best practices to provide a path through the process of developing data projects from planning to implementation.

Topics include:

  • Starting the planning process by understanding the key data project types
  • Selecting data management software in the new enterprise data space
  • Managing project risk, including technology risk, team risk, and requirements risk
  • Ensuring the integrity of data throughout your entire data pipeline
  • Maintaining data integrity through effective data governance and data management

You’ll come away with insights on managing and delivering your own successful data projects based on Ted and Jonathan’s years of experience working with multiple companies and customers.

Prerequisite knowledge

  • Experience with data management concepts and systems such as relational databases
  • Familiarity with newer data management systems such as Hadoop or Cassandra (useful but not required)
  • Experience working on building large software projects (useful but not required)

Materials or downloads needed in advance

None

What you'll learn

  • Get guidelines for delivering successful data projects
  • Learn to apply existing knowledge to new data management projects

Ted Malaska

Capital One

Ted Malaska is a director of enterprise architecture at Capital One. Previously, he was the director of engineering in the Global Insight Department at Blizzard; a principal solutions architect at Cloudera, helping clients find success with the Hadoop ecosystem; and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is a coauthor of Hadoop Application Architectures, a frequent speaker at conferences, and a frequent blogger on data architectures.

Jonathan Seidman

Cloudera

Jonathan Seidman is a software engineer on the cloud team at Cloudera. Previously, he was a lead engineer on the big data team at Orbitz, helping to build out the Hadoop clusters supporting the data storage and analysis needs of one of the most heavily trafficked sites on the internet. Jonathan is a cofounder of the Chicago Hadoop User Group and the Chicago Big Data Meetup and a frequent speaker on Hadoop and big data at industry conferences such as Hadoop World, Strata, and OSCON. Jonathan is a coauthor of Hadoop Application Architectures from O’Reilly.

Matthew Schumpert

Cloudera, Inc.

Matt has been working in the enterprise infrastructure software space for 15 years in various capacities, including product management, sales engineering, and strategic alliances. A veteran of the Hadoop ecosystem since 2010, Matt is currently focused on driving cluster management and workload management technology initiatives at Cloudera. Matt holds a BS in Computer Science from the University of Virginia.

Raman Rajasekhar

Cloudera Inc

Krishna Maheshwari

Cloudera

Krishna Maheshwari is the director of product management at Cloudera and is responsible for operational databases (HBase, Phoenix, Kudu, and Accumulo). You can find him on LinkedIn.

