Enterprises want to be data driven from the very beginning or want to join the race for data supremacy. Being data driven requires the system to store and process every single transaction and interaction the customer makes with the product, thus enabling the business to make better decisions.
But storing, processing, and analyzing data comes with a cost. This cost is distributed across the choice of technology, infrastructure, and go-to-market strategy.
Nischal HP and Raghotham Sripadraj share their experience building data science platforms for various enterprises, with an emphasis on making the right architecture choices for things such as databases, queues, caching mechanisms, distribution of the workload, underlying technology for machine learning and predictive models, visualization, and prototyping. Nischal and Raghotham stress the importance of using distributed and fault-tolerant tools, which themselves come with the cost of managing the infrastructure (including, by implication, a dedicated team to monitor it). However, with small data, simple tools take you a long way.
Many things can go unnoticed in building an end-to-end data science system, like the importance of logging, building a data pipeline that sends notifications through the appropriate communication channel, exposing data science as a service via APIs, or A/B testing for data science-backed feature releases when required. Only when the data science solution is in production does it power the organization the right way.
When building data science products you should live by the motto “fail fast.” Nischal and Raghotham themselves have failed fast when making these choices, but in time they came to understand that adopting the latest and the coolest technology on the planet just for the sake of it is not the right thing to do.
Nischal HP is the cofounder and data scientist at Unnati Data Labs, where he is building end-to-end data science systems in the fields of fintech, marketing analytics, and event management. Nischal is also a mentor for data science on Springboard. Previously, during his tenure at Redmart, he built from scratch various ecommerce systems for catalog management, recommendation engines, and sentiment analyzers. At SAP Labs, he built various data crawlers and intention mining systems and laid down initial work on an end-to-end text mining and analysis pipeline. The majority of his work, however, was centered around building gamification of technical indicators for algorithmic trading platforms. Nischal has conducted workshops in the field of deep learning across the world and has spoken at a number of data science conferences. He is a strong believer in open source and loves to architect big, fast, and reliable systems. In his free time, he enjoys music, traveling, and meeting new people.
Raghotham Sripadraj is a senior data scientist at Ericsson. Raghotham is also a mentor for data science on Springboard. Previously, he headed the data science team at Treebo Hotels and was cofounder and data scientist at Unnati Data Labs, where he built end-to-end data science systems in the fields of fintech, marketing analytics, and event management. Before that, at Touchpoints Inc., he single-handedly built a data analytics platform for a fitness wearable company, and at SAP Labs, he was a core part of what is currently SAP’s framework for building web and mobile products, as well as a part of multiple company-wide events helping to spread knowledge both internally and to customers. Drawing on his deep love for data science and neural networks and his passion for teaching, Raghotham has conducted workshops across the world and given talks at a number of data science conferences. Apart from getting his hands dirty with data, he loves traveling, Pink Floyd, and masala dosas.
©2017, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.