Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Serverless analytics in AWS Glue (sponsored by Amazon Web Services)

Mehul Shah (Amazon Web Services)
11:50am–12:30pm Wednesday, March 27, 2019
Sponsored
Location: 2005

What you'll learn

  • Explore AWS Glue's serverless analytics features for data science, data discovery, data cleaning and transformation, and data lake management

Description

Serverless computing offers a fundamentally new and more efficient abstraction for architecting systems in the cloud. Instead of managing virtual machines, developers simply submit “functions” or scripts that are executed behind the scenes with the minimum resources required. Mehul Shah offers an overview of serverless computing and details AWS Glue’s serverless analytics features for data science, data discovery, data cleaning and transformation, and data lake management.

Unlike other analytic systems, AWS Glue allows customers to run arbitrary Python or Spark code that is automatically scaled with no limitations on runtime. Customers can interact with Glue through their favorite notebooks, continuously monitor execution metrics and logs through the console for health and debugging, and scale the workload both horizontally (more workers) and vertically (bigger workers). Finally, Glue Spark scripts integrate seamlessly with the Glue Data Catalog for a true end-to-end serverless analytics experience.
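
The difference between horizontal and vertical scaling can be made concrete with a small sketch. It is not taken from the session. It builds the keyword arguments that boto3's Glue `start_job_run` call accepts (`JobName`, `WorkerType`, `NumberOfWorkers`) and shows how each kind of scaling changes them. The job name `nightly-etl`, the helper functions, and the assumption that `G.1X` and `G.2X` are the available worker sizes are illustrative only.

```python
# Sketch: keyword arguments for boto3's Glue `start_job_run` call.
# The job name, helper names, and worker-type list are illustrative assumptions.

# Worker types ordered from smaller to larger (assumed to be the available set).
WORKER_TYPES = ["G.1X", "G.2X"]


def job_run_request(job_name, worker_type="G.1X", workers=2):
    """Return kwargs suitable for glue_client.start_job_run(**kwargs)."""
    if worker_type not in WORKER_TYPES:
        raise ValueError(f"unknown worker type: {worker_type}")
    if workers < 1:
        raise ValueError("workers must be a positive integer")
    return {
        "JobName": job_name,
        "WorkerType": worker_type,
        "NumberOfWorkers": workers,
    }


def scale_out(request, factor=2):
    """Horizontal scaling: same worker size, more workers."""
    return {**request, "NumberOfWorkers": request["NumberOfWorkers"] * factor}


def scale_up(request):
    """Vertical scaling: same worker count, next larger worker type (if any)."""
    index = WORKER_TYPES.index(request["WorkerType"])
    bigger = WORKER_TYPES[min(index + 1, len(WORKER_TYPES) - 1)]
    return {**request, "WorkerType": bigger}


if __name__ == "__main__":
    base = job_run_request("nightly-etl", workers=5)
    print("base:     ", base)
    print("scale out:", scale_out(base))
    print("scale up: ", scale_up(base))
    # With boto3 installed and AWS credentials configured, a request could be
    # submitted like this (not executed here):
    #   import boto3
    #   boto3.client("glue").start_job_run(**scale_up(base))
```

Both helpers return a new dictionary and leave the original request unchanged, so one base configuration can be reused to compare the two scaling strategies.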

This session is sponsored by Amazon Web Services.

Mehul Shah

Amazon Web Services

Mehul Shah heads two cloud services at AWS: AWS Lake Formation and AWS Glue. His expertise spans large-scale data management, distributed systems, and energy-efficient computing. His work has been published in top-tier conferences and journals and has won several awards, including a Test of Time award. Previously, he was cofounder and CEO of Amiato, a startup that offered a real-time ETL cloud service, and principal research scientist at HP Labs. He holds a PhD from UC Berkeley, where his work focused on adding fault tolerance and autoscaling to the TelegraphCQ stream processing system, and both an MEng and a BS in CS and physics from MIT. He’s currently a member of the Sort Benchmark committee.