Apache Hadoop is well suited to storing large amounts of unstructured data, but analyzing that data often requires reference data held in existing RDBMS-based systems. We’ll look at how to transfer large volumes of data from Oracle into Hadoop efficiently and scalably. We’ll also look at strategies for keeping this data up to date while placing minimal load on existing systems.
In addition, we will look at strategies for Hadoop-to-RDBMS data flows, such as moving aggregated data from Hadoop to an RDBMS, and consider how Hadoop may be used alongside an RDBMS as a long-term archive or as a long-term transaction or audit log. We will discuss the new features of Apache Sqoop 2.0 and the merging of the Dell/Quest connector for Oracle into the Sqoop core, which gives Sqoop much-improved scalability and manageability.
Guy Harrison is an executive director of research and development at Dell Software. Guy is the author of Oracle Performance Survival Guide, MySQL Stored Procedure Programming, and Oracle SQL High Performance Tuning, as well as other books, articles, and presentations on database technology. He also writes a monthly column for Database Trends and Applications (www.dbta.com). Guy can be found online at http://www.guyharrison.net and reached by e-mail at firstname.lastname@example.org.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.