Apache Spark plays a key role in addressing several big data challenges at Bing. Its diverse capabilities enable a variety of internet-scale workloads that power Bing services. Two factors drive the adoption of Spark for next-generation data processing investments at Bing: the value it adds to the business, and how well it fits the existing data platform architecture, complementing existing internal and external big data frameworks.
Kaarthik Sivashanmugam shares the Bing team’s experiences with Spark, discussing how Spark is employed in use cases covering both batch processing of a document corpus spanning the web and near real-time processing of events corresponding to hundreds of millions of search queries. Kaarthik also explores the challenges the team faced in adopting Spark and implementing scalable data processing pipelines, and explains how those challenges led the team to customize Spark and build extensions.
Kaarthik Sivashanmugam is a principal software engineer in the AI Infrastructure and Tools Group at Microsoft, where he is building a platform for scale-out deep learning to unlock the full potential of GPU cloud, data, and ML techniques in addressing complex AI challenges and enabling magical end-user experiences in various AI-powered Microsoft services. Previously, Kaarthik was the tech lead for the Mobius project and used it to implement Spark Streaming workloads for timely, high-fidelity processing of Bing logs at scale. Before joining Microsoft, Kaarthik was a senior software engineer at a semantic technology startup, where he built an ontology-based semantic metadata platform and used it to implement solutions for KYC/AML analytics.
©2017, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.