Companies have invested an estimated $3-4 trillion in IT over the last 20-plus years. Most of that investment has gone into the development and deployment of systems that each serve a single application, function or geography, in order to automate and optimize key business processes. Each such deployment adds another data silo, and the automated business processes themselves generate still more data.
Companies are now investing heavily in Big Data and Analytics 3.0 to begin the analytic prosecution of all this data. Data Variety – the natural, siloed nature of data as it’s created – is becoming a bottleneck. Its cost becomes apparent when companies attempt to ask simple questions across many business silos: divisions, geographies, functions. Current top-down, deterministic data unification approaches (such as ETL, ELT and MDM) weren’t designed to scale to the variety of hundreds, thousands or tens of thousands of data silos. These systems depend on highly trained architects developing “master” schemas – “the one schema to rule them all.” The single master schema is a red herring.
The fundamental diversity and mutability of enterprise data and semantics call for a bottom-up, probabilistic approach to connecting data sources from various silos. Curating data at scale also requires engaging the owners of those sources. Overcoming data silos demands a more scalable, open and collaborative approach to getting data to work together – one that respects the need for data quality, provenance and fidelity.
A new bottom-up, probabilistic approach to data unification provides the scalability to exploit Big Data Variety. Finding and connecting siloed data into unified views starts to look more like a Google search circa 2014 than a Yahoo index crawl circa 1995.
This session is sponsored by Tamr
Andy Palmer is co-founder and CEO of Tamr, Inc. He co-founded Tamr with fellow entrepreneur Michael Stonebraker, PhD, adjunct professor at MIT CSAIL; Ihab Ilyas, professor at the University of Waterloo; and others. Previously, Palmer was co-founder and founding CEO of Vertica Systems, a pioneering big data analytics company (acquired by HP).
During his career as an entrepreneur, Palmer has served as founding investor, BOD member or advisor to more than 50 start-up companies. He founded Koa Labs, a co-working space for entrepreneurs to start independent companies in Cambridge’s Harvard Square, and was named 2013 Angel of the Year by the New England Venture Capital Association.
He also served as Global Head of Software Engineering and Architecture at Novartis Institutes for BioMedical Research (NIBR) and as a member of the start-up team and Senior Vice President of Operations and CIO at Infinity Pharmaceuticals (NASDAQ: INFI). Earlier in his career, he held executive positions and served as a member of the core start-up teams at Bowstreet (acquired by IBM), pcOrder.com (NASDAQ: PCOR) and Trilogy. He earned undergraduate degrees in English, history and computer science from Bowdoin College, and an MBA from the Tuck School of Business at Dartmouth.
View a complete list of Strata + Hadoop World contacts
©2015, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.