Serverless computing offers a fundamentally new and more efficient abstraction for architecting systems in the cloud. Instead of managing virtual machines, developers simply submit “functions” or scripts that are executed behind the scenes with the minimum required resources. Mehul Shah offers an overview of serverless computing and details AWS Glue’s serverless analytics features for data science, data discovery, data cleaning and transformation, and data lake management.
Unlike other analytic systems, AWS Glue allows customers to run arbitrary Python or Spark code that is automatically scaled with no limitations on runtime. Customers can interact with Glue through their favorite notebooks, continuously monitor execution metrics and logs through the console for health and debugging, and scale the workload both horizontally (more workers) and vertically (bigger workers). Finally, Glue Spark scripts integrate seamlessly with the Glue Data Catalog for a true end-to-end serverless analytics experience.
This session is sponsored by Amazon Web Services.
Mehul Shah heads two cloud services at AWS: AWS Lake Formation and AWS Glue. His expertise spans large-scale data management, distributed systems, and energy-efficient computing. His work has been published in top-tier conferences and journals and has won several awards, including a Test of Time award. Previously, he was cofounder and CEO of Amiato, a startup that offered a real-time ETL cloud service, and principal research scientist at HP Labs. He holds a PhD from UC Berkeley, where his work focused on adding fault tolerance and autoscaling to the TelegraphCQ stream processing system, and both an MEng and a BS in CS and physics from MIT. He’s currently a member of the Sort Benchmark committee.
©2019, O'Reilly Media, Inc.