Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

What Esperanto can teach us about collaboration in the big data environment

11:00–11:20 Wednesday, 1/06/2016
DDBD
Location: Capital Suite 2/3
Level: Non-technical

Prerequisite knowledge

Attendees should have a basic understanding of the big data environment and the business value of data-driven projects, an overall understanding of the steps involved (connecting to data, data preparation, predictive modeling, and deployment into production), and basic knowledge of machine learning and programming languages.

Description

Today, a company depends on many skill sets, from the business expert to the data scientist to the data engineer, to develop a data product, whether that product is for the reduction of churn, the detection of fraud, or intelligent targeting. Anne Sophie Roessler uses the example of the failed universal language Esperanto to explain how to help these stakeholders—most of whom use different languages and technologies and have different baselines—work together.

In the 1880s, an eye doctor named Ludwig Lejzer Zamenhof tried to foster peace and international understanding between speakers of different languages by creating a universal system of communication, which he called Esperanto. But Esperanto failed because replacing one language with another is an almost impossible goal. People are, by nature, more attached to, more comfortable with, and more efficient in their mother tongue.

This is particularly relevant for big data projects, which involve stakeholders with many different skills. These include architects, who have to connect to all data sources and spend a lot of time on data plumbing between Hadoop clusters, SPSS infrastructure, etc.; developers and data scientists, who use different programming languages and tools (SQL, Python, R, Spark, Hive, etc.); business intelligence analysts, who produce reports based on data and spend most of their time in Excel spreadsheets or PowerPoint presentations; and business decision makers, who usually speak plain English and want to make decisions based on analyzed data. Even though these people share the same overall goal, they all work by different means and track different KPIs to achieve it.

So what can make everyone work together as a team and give every stakeholder both the ability to achieve their goals and the freedom to improvise and produce efficiently? The first reaction is often an attempt at democratizing data access: creating a single environment in which one language is used by all (the goal of Esperanto). The problem with this approach is that people who speak programming languages also speak English (or another natural language), while the opposite is not true. As a result, such a single environment usually ends up as a code-free tool. But code-free tools are less efficient and highly frustrating for developers and data scientists.

If you wipe out diversity of approach and culture for the sake of simplification and better collaboration, you'll soon realize two things: forcing everyone into a restricted environment regardless of their skills doesn't work, and the quality of a project's framework is as important as the technical abilities of its stakeholders. This is why the environment should adapt to the content, not the other way around.

The challenge of big data project management is to make IT, BI, and business units work together on the same project while letting everyone speak their own language for maximum efficiency. Anne Sophie explores one solution: creating an adapted framework that offers a real-time view of what happens in the workflow; easy prototyping, scalability, and deployment into production on an end-to-end workflow; and the ability to leverage underlying technologies (machine learning, Hive, Spark, MapReduce, etc.) through the frameworks best adapted to each individual skill set. In such an environment, diversity no longer means chaos. On the contrary, it becomes the key to a successful project.

Anne Sophie Roessler

Dataiku

Anne Sophie Roessler is a deployment strategist at Dataiku, developer of Data Science Studio (DSS), which integrates all the capabilities required to build highly specific end-to-end services that quickly turn raw data into business-impacting predictions. Her experience in project management convinced Anne Sophie that collaboration is a particularly relevant topic in the big data environment. She believes that in the future, all the stakeholders of data-driven projects will have to be able to work with data, whether they have technical skills or not. Anne Sophie graduated from ESCP Paris. She also studied classical singing and worked as an opera singer for a few years.