Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

The 3 key barriers keeping companies from acting upon the possibilities that big data has to offer

Pauline Brown (Dataiku)
4:00pm–4:40pm Wednesday, 12/02/2015
Data-driven Business
Location: 331
Level: Non-technical
Average rating: 4.00 (3 ratings)
Slides: 1 PDF

Prerequisite Knowledge

  • Basic understanding of the business value of "big data" and what business use cases come out of "data-driven projects."
  • A global understanding of the big data market today.
  • A global understanding of the steps involved in getting from raw data to business solution (from data storage, to data preprocessing, to modeling and machine learning, to testing, to deploying).
  • Basic knowledge of the languages, tools, and technology stacks involved in "big data" projects.

Description

Pauline Brown has been working in big data companies for a while now. Her job involves understanding why organisations are slow to re-imagine and act upon the possibilities that big data has to offer. Here is her point of view:

The situation:

  • The technologies exist: the tools for treating and analysing big data are out there and they are accessible (open source tools for example)
  • The data exists: there is no doubt that a majority of companies accumulate mounds of data – valuable data
  • The people exist: getting from raw data to a business solution requires business knowledge, analytical skills, and some development skills. That translates to common roles such as business analysts, developers, and business decision makers, which most companies already have.

The problem:

  • It’s not a one-man show: It is a common misconception that data projects are driven solely by data scientists. In reality, at the enterprise level, getting from raw data to a deployed data-driven solution (say, a fraud detection solution or a churn predictor) involves many different skill sets. From the IT team, to the data wardens, to the data scientists (part developer, part statistician), to the business decision makers, and to the executives, everyone contributes to the project. Unfortunately, getting all these people to work together is difficult. One of the most important barriers is language. Another is isolation.
  • People talking to technology talking to people: The people involved in a data project don’t all speak the same language; some speak SQL, others Python, Pig, Hive, or R, some speak Excel, and others speak plain old English. To complicate things further, each of these languages has its own associated tool stack, and these tool stacks aren’t always compatible with each other. So how do you get all these people to work together and share the insights that each skill set has to offer?
  • Isolation is counter-productive: Companies hire very expensive “data scientists” and expect them to magically create data-driven solutions to business problems quickly, and on their own. But most of the steps involved in transforming raw data into business solutions are long and tedious, and don’t require a PhD in machine learning. So these expensive data scientists end up spending most of their time trying to find and get access to the data, cleaning and enriching the data, and fixing data streams and compatibility issues, rather than using their machine learning skills to build high-ROI applications for their businesses. When they finally have results to show for their project (months later), technological problems occur (broken data streams, test data no longer matches production data…) or the business objectives have shifted. Projects therefore stay in a test environment and rarely go into production because they take too much time to build.

Ultimately, this all comes down to one underlying problem: collaboration. Getting big data embedded at a company level remains a collaboration problem: collaboration between people, technologies, and data. So how do you get businesses to incite collaboration on a widespread level? How do you get your non-data-scientists involved in the data project? How do you get your data-driven business solutions to the front lines of production more quickly? How do you bring about the shift from "data scientist does it all" to "data team works together"?

Pauline Brown

Dataiku

Pauline Brown is director of marketing at Dataiku, which has developed the most productive predictive services development platform for data professionals, Data Science Studio (DSS). Pauline is a firm believer that the data and predictive analytics ecosystem is entering a new era; one in which even nontechnical members of the data team can transform raw data into business-impacting solutions. After all, as a marketer at Dataiku, data-driven projects in DSS are part of her daily routine. Born and raised in Palo Alto, California, Pauline graduated from Columbia University before joining Paris’s IEP.