Sep 23–26, 2019

Finding your needle in a haystack

Naghman Waheed (Bayer Crop Science), John Cooper (Bayer)
5:25pm–6:05pm Wednesday, September 25, 2019
Location: 1A 23

Who is this presentation for?

Managers, cloud architects, cloud engineers, data stewards




The Bayer Crop Science division had long lacked a robust metadata and knowledge management system. As new systems were introduced and existing systems enhanced, the complexity of the entire data ecosystem increased significantly. Finding data sets and processes and understanding their nature and meaning was difficult, and that knowledge was often limited to a select few individuals in the company. To remedy the situation, the Data Platform Architecture and Engineering Team set out to create a scalable metadata and knowledge platform named Haystack. The result is an easy-to-use system that is now used across the globe to collect both technical and business metadata and to organize the business glossary for all data systems at the company.

The system has been designed with several key architecture and engineering principles in mind. Instantiated in the AWS cloud and built using only open source components, it is fully scalable for both processing and storage needs. Integration with existing key data systems and ease of information entry are among the key features of the design. Its key components include MediaWiki, used as the information storage engine; Kafka producers and consumers that move metadata in and out of Haystack from various systems; and an Elasticsearch cluster that is integrated with MediaWiki’s search engine. Moreover, what started as a small Slackbot for answering simple queries within Haystack has evolved into a multi-platform AI that uses machine learning and natural language processing to interpret queries and retrieve information. This results in a unique personal experience for the end user.
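
As a rough illustration of the flow described above (not the Haystack implementation, which this abstract does not detail), the sketch below shows how a metadata record might be serialized by a producer and then turned into a wiki page by a consumer. The record fields, the `Dataset:` title prefix, the `Dataset` template name, and both function names are assumptions made for illustration. The Kafka transport itself is omitted, so the message is simply passed from one function to the other.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    name: str
    owner: str
    description: str


def to_message(record: DatasetMetadata) -> bytes:
    """Serialize a record as a producer might before publishing it."""
    return json.dumps(asdict(record), sort_keys=True).encode("utf-8")


def render_wikitext(message: bytes) -> tuple[str, str]:
    """Turn a consumed message into a (page title, wikitext body) pair."""
    data = json.loads(message.decode("utf-8"))
    # Assumed page-naming convention and template; not taken from the talk.
    title = f"Dataset:{data['name']}"
    body = "\n".join([
        "{{Dataset",
        f"|owner={data['owner']}",
        f"|description={data['description']}",
        "}}",
    ])
    return title, body


# Stand-in for the producer -> topic -> consumer hop.
record = DatasetMetadata("field_trials", "data-platform-team", "Example entry")
title, body = render_wikitext(to_message(record))
print(title)
print(body)
```

In a real deployment the bytes returned by `to_message` would be published to a topic and the consumer would write the rendered page through the wiki's API; both steps are left out here to keep the sketch self-contained.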

In this session, we focus on the technical design and build of the entire system. Topics include the technical architecture, how and why we chose certain open source components, and the lessons learned along the way. The talk also highlights the value derived from the new platform through examples of how the system streamlines the gathering of metadata from both business and technical users, making it simple for all users to search and learn about the data sets that exist within our company.

Prerequisite knowledge

Familiarity with AWS Cloud and services, data management, metadata management

What you'll learn

Key takeaways:
1. Metadata and knowledge management systems can significantly aid the data stewardship function.
2. The agility and scalability of a cloud solution can be a competitive advantage for your business.
3. Using open source components can allow you to build innovative solutions.

Naghman Waheed

Bayer Crop Science

Naghman Waheed leads the data platforms team at Bayer, where he is responsible for defining and establishing enterprise architecture and direction for data platforms. Naghman is an experienced IT professional with over 25 years of work devoted to the delivery of data solutions spanning numerous business functions, including supply chain, manufacturing, order to cash, finance, and procurement. Throughout his 20+ year career at Bayer, Naghman has held a variety of positions in the data space, ranging from designing several scale data warehouses to defining a data strategy for the company and leading various data teams. His broad range of experience includes managing global IT data projects, establishing enterprise information architecture functions, defining enterprise architecture for SAP systems, and creating numerous information delivery solutions. Naghman holds a BA in computer science from Knox College, a BS in electrical engineering from Washington University, an MS in electrical engineering and computer science from the University of Illinois, and an MBA and a master’s degree in information management, both from Washington University.


John Cooper





Contact list

View a complete list of Strata Data Conference contacts