Organizations from small startups to large enterprises are increasingly using open source frameworks such as Apache Hadoop, Spark, and Presto to address a broad range of analytic use cases, including business intelligence, stream processing, and machine learning. However, with any big data project comes the risk of uncapped costs, delayed timelines, expensive infrastructure, and difficult choices about where to focus in the open source toolset.
Jonathan Fritz explains how organizations are deploying these and other big data frameworks on Amazon Web Services (AWS) and how you too can quickly and securely run Spark and Presto on AWS. Jonathan demonstrates how to lower costs and accelerate deployment of big data applications by using Amazon EMR to create a Hadoop cluster running Spark and Presto and to query data in Amazon S3 with ANSI SQL. He then explores how decoupling compute from storage lets you use Amazon S3 as a highly scalable, durable, and secure data lake, before outlining best practices for lowering costs with Amazon EC2 Spot Instances and discussing how to secure your clusters using AWS’s extensive security capabilities.
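As a rough illustration of the kind of cluster the abstract describes, the Python sketch below assembles a request for the Amazon EMR RunJobFlow API, which boto3 exposes as `run_job_flow`. It is not taken from the session. The helper name `build_emr_request` is hypothetical, and the release label, instance types, bid price, and bucket name are placeholder assumptions that you would replace with your own values. The sketch only builds the request dictionary, so it runs without AWS credentials.

```python
def build_emr_request(name, log_uri, release_label="emr-5.0.0",
                      core_count=2, spot_task_count=4, spot_bid_usd="0.10"):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All defaults are illustrative assumptions, not recommendations.
    """
    # Master and core nodes run On-Demand; core nodes also host HDFS.
    instance_groups = [
        {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": "m3.xlarge", "InstanceCount": 1},
        {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": "m3.xlarge", "InstanceCount": core_count},
    ]
    # Optional task nodes use Spot capacity to add cheaper compute.
    if spot_task_count > 0:
        instance_groups.append(
            {"Name": "task-spot", "InstanceRole": "TASK", "Market": "SPOT",
             "BidPrice": spot_bid_usd, "InstanceType": "m3.xlarge",
             "InstanceCount": spot_task_count})
    return {
        "Name": name,
        "LogUri": log_uri,  # cluster logs are written to Amazon S3
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}, {"Name": "Presto"}],
        "Instances": {
            "InstanceGroups": instance_groups,
            "KeepJobFlowAliveWhenNoSteps": True,  # keep cluster up for queries
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role
    }


# "example-bucket" is a placeholder bucket name.
request = build_emr_request("spark-presto-demo", "s3://example-bucket/emr-logs/")

# To launch the cluster (requires the boto3 package and AWS credentials):
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Because the data in this arrangement lives in Amazon S3 rather than on the cluster, the Spot task group adds only compute, so losing those instances costs compute capacity, not stored data.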
This session is sponsored by Amazon.
Jonathan Fritz is a senior product manager for Amazon Elastic MapReduce (EMR), a managed service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data using Hadoop, Spark, and Presto. Previously, Jonathan was the founder and CEO of Eleven Media Group and performed research in organic chemistry and nanotechnology in the Maurer Group at Washington University in St. Louis. He holds an MBA from the Stanford Graduate School of Business and a bachelor’s degree in chemistry with a minor in biology from Washington University in St. Louis. He also received a certificate for accomplishment in entrepreneurship from the Skandalaris Center for Entrepreneurial Studies.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.