Apache Spark is on fire. Over the past five years, more and more organizations have turned to Spark to operationalize their analytics teams and the delivery of analytics to their businesses. Adrian Houselander and Joy Spohn demonstrate two use cases that show how Apache Spark and Apache Hadoop are being used to draw valuable insights from complex data across cloud and hybrid environments.
The first example showcases RedRock, an application that lets the user act on data-driven insights discovered from Twitter. Powered by IBM Analytics running on Spark and Hadoop, it finds patterns in user tweets to reveal influential individuals, related topics of interest, and where in the world the conversation is taking place. RedRock uses two specific data science algorithms, Word2vec and k-means, to build the screens in the app. The Word2vec algorithm, based on a shallow neural network, assigns a numerical vector to each of the words in the Twitter data. Once a feature matrix is formed from the Word2vec vectors, k-means is applied to cluster the words.
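The Word2vec-then-k-means step described above can be illustrated with a small sketch. This is not RedRock's implementation: the two-dimensional vectors below are made-up stand-ins for real Word2vec output, the word list is hypothetical, and the clustering is a plain single-machine k-means rather than a Spark job. It only shows how words with nearby vectors end up in the same cluster.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignments, centroids


# Hypothetical 2-D vectors standing in for Word2vec output (assumption:
# real embeddings would have many more dimensions and be learned from tweets).
word_vectors = {
    "spark": (0.9, 0.1),
    "football": (0.1, 0.9),
    "hadoop": (0.8, 0.2),
    "soccer": (0.2, 0.8),
    "cluster": (0.85, 0.15),
    "goal": (0.15, 0.95),
}

words = list(word_vectors)
assignments, centroids = kmeans([word_vectors[w] for w in words], k=2)

# Group the words by the cluster index k-means assigned to them.
clusters = {}
for word, a in zip(words, assignments):
    clusters.setdefault(a, []).append(word)

for index, members in sorted(clusters.items()):
    print(index, members)
```

On Spark itself, the same two stages would more likely be expressed with the Word2Vec and KMeans estimators in Spark's machine learning library, which distribute the work across the cluster.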
The second example showcases a financial institution that derives cross-sell and up-sell insights targeted to its specific clients in order to improve customer retention and loyalty. The institution wants to combine business-owned on-premises data held in DB2 for z/OS, IMS, and VSAM within its z Systems (mainframe) environment with insight from sentiment analysis of Twitter data and public S&P stock price data, which could be based on cloud implementations. Apache Spark running natively on z/OS provides flexibility, economic advantages, and governance, because federated analytics lets the data be analyzed where it resides and avoids unnecessary ETL.
This session is sponsored by IBM.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.