Get a CLUE: Optimizing big data compute efficiency
Who is this presentation for?

Data engineers, data architects, and developers
Compute efficiency optimization is of critical importance in the big data era, as data science and ML algorithms become increasingly complex and data size increases exponentially over time. Opportunities exist throughout the resource use funnel, which Zhe Zhang and Huangming Xie characterize using the framework CLUE: capacity of resources (all resources available) → loaded resources (resources that applications requested from Hadoop) → used resources → effective resources (resources spent on effective or useful work).
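The CLUE funnel above can be read as a chain of stage-to-stage efficiencies, with the overall efficiency being the product of the steps. A minimal sketch of that arithmetic (the `clue_ratios` helper and all stage values are hypothetical, for illustration only):

```python
# Illustrative sketch of the CLUE resource funnel:
# capacity (C) -> loaded (L) -> used (U) -> effective (E).
# All numbers below are made up, not LinkedIn's actual figures.

def clue_ratios(capacity, loaded, used, effective):
    """Return the stage-to-stage efficiency at each step of the funnel."""
    return {
        "C -> L (allocation)": loaded / capacity,
        "L -> U (utilization)": used / loaded,
        "U -> E (effectiveness)": effective / used,
        "overall (E / C)": effective / capacity,
    }

# Example: 100 units of capacity, 80 requested by applications,
# 50 actually used, 35 spent on effective work.
ratios = clue_ratios(capacity=100, loaded=80, used=50, effective=35)
for stage, value in ratios.items():
    print(f"{stage}: {value:.0%}")
```

The point of the decomposition is that each initiative below targets exactly one ratio in this chain, and the overall efficiency is their product (here 0.80 × 0.625 × 0.70 = 0.35).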
Zhe and Huangming highlight initiatives from the past year and share the lessons they learned, including:

- C → L optimization with smart scheduling: They applied machine learning to find the best start times for scheduled flows while maintaining business SLAs, reducing peak-hour capacity demand by more than 20% and cutting latency for ad hoc jobs.
- L → U optimization with YARN overcommit: They analyzed CPU and memory usage across more than 7,000 nodes with different SKUs to evaluate the opportunity for YARN to reclaim requested but unused memory from applications (overcommit).
- U → E optimization with Spark SQL: They developed efficient algorithms for joining large datasets in Spark SQL, a common pattern in processing LinkedIn member graph data and generating features for ML algorithms. They also share their investigation and experience with adaptive execution, cost-based optimization, and other SQL execution optimizations.
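For the U → E step, adaptive execution and cost-based optimization are exposed as standard Spark SQL settings. A sketch of how one might turn them on (values are illustrative; the right thresholds depend on your workload and Spark version, and this is not presented as LinkedIn's configuration):

```
# spark-defaults.conf (illustrative values)
spark.sql.adaptive.enabled              true
spark.sql.cbo.enabled                   true
spark.sql.cbo.joinReorder.enabled       true
spark.sql.autoBroadcastJoinThreshold    67108864
```

Note that the cost-based optimizer only helps if table statistics have been collected beforehand (for example via `ANALYZE TABLE ... COMPUTE STATISTICS`).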
These initiatives have allowed LinkedIn to improve compute efficiency, save hundreds of millions of dollars, and boost developer productivity. Their framework, strategy, and lessons learned from compute efficiency optimization can be leveraged by other companies to improve their own resource intelligence strategies.
Prerequisite knowledge

- General knowledge of big data, Spark, and Hadoop
What you'll learn
- Learn how LinkedIn improved compute efficiency, saved hundreds of millions of dollars, and boosted developers’ productivity
- Discover how to leverage LinkedIn's framework, strategy, and the lessons they learned to improve your resource intelligence strategy
Zhe Zhang is a senior manager of core big data infrastructure at LinkedIn, where he leads an excellent engineering team to provide big data services (Hadoop distributed file system (HDFS), YARN, Spark, TensorFlow, and beyond) to power LinkedIn’s business intelligence and relevance applications. Zhe’s an Apache Hadoop PMC member; he led the design and development of HDFS Erasure Coding (HDFS-EC).
Huangming Xie is a senior manager of data science at LinkedIn, where he leads the infrastructure data science team to drive resource intelligence, optimize compute and storage efficiency, and automate capacity forecasting for better scalability, as well as improve site availability for a pleasant member and customer experience. Huangming is an expert at converting data into actionable recommendations that shape strategy and generate direct business impact. Previously, he led initiatives to enable data-driven product decisions at scale and build a great product for more than 600 million LinkedIn members worldwide.