Leading companies that are getting the most out of their data are not focusing on queries and data lakes; they are actively integrating analytics into their operations. Jack Norris reviews three customer case studies in ad/media, financial services, and healthcare to show how a focus on real-time data streams can transform the development, deployment, and future agility of applications.
FINRA ingests over 50 billion records of stock market trading data daily into multipetabyte databases. Janaki Parameswaran and Kishore Ramachandran explain how FINRA technology integrates data feeds from disparate systems to provide analytics and visuals for regulating equities, options, and fixed-income markets.
Through collaboration with some of the top payments companies around the world, Intel has developed an end-to-end solution for building fraud detection applications. Yuhao Yang explains how Intel used and extended Spark DataFrames and ML Pipelines to build the tool chain for financial fraud detection and shares the lessons learned during development.
Offshore leaks, Lux leaks, Swiss leaks, Bahamas leaks, and the Panama Papers—all have one thing in common: they were all uncovered by the International Consortium of Investigative Journalists. Giannina Segnini and Mar Cabra explain how this global network of muckrakers uses technology to deal with big data and find cross-border stories that have worldwide impact.
We're headed toward a decentralized economy, where our finances are managed by investment algorithms, big data analytics, IoT-linked devices, and crowdfunding marketplaces. But that economy's potential won't be realized until we overcome a core obstacle: trust. Michael Casey explains why blockchain technology, with its decentralized trust architecture, is the platform that makes everything else possible.
Finance is information. From analyzing risk and detecting fraud to predicting payments and improving customer experience, data technologies are transforming the financial industry. And we're diving deep into this change with a new day of data-meets-finance talks, tailored for Strata + Hadoop World events in the world's financial hubs.
Bas Geerdink offers an overview of how the Hadoop ecosystem has evolved at ING. Since 2013, ING has invested heavily in a central data lake and data management practice. Bas shares lessons learned and best practices for enterprises that are incorporating Hadoop into their infrastructure landscape.
Kaushik Deka and Phil Jarymiszyn discuss the benefits of a Spark-based feature store, a library of reusable features that allows data scientists to solve business problems across the enterprise. Kaushik and Phil outline three challenges they faced—semantic data integration within a data lake, high-performance feature engineering, and metadata governance—and explain how they overcame them.
Yaron Haviv explains how to design real-time IoT and FSI applications, leveraging Spark with advanced data frame acceleration. Yaron then presents a detailed, practical use case, diving deep into the architectural paradigm shift that makes the powerful processing of millions of events both efficient and simple to program.
Jim Scott outlines the core tenets of a message-driven architecture and explains its importance for real-time, big-data-enabled distributed systems in finance.
With the emergence of the Internet, social media, and the IoT, the nature of analysis for investment decisions has shifted from linear analysis to nonlinear techniques. Robert Passarella offers a survey of how we arrived at this point in finance, where we came from, and where we're going, as we leave the world of model-driven finance and enter the world of data-driven finance.
Many areas of applied machine learning require models that cope with rare occurrences (class imbalance) and with users actively attempting to subvert the system (adversaries). Brendan Herger offers an overview of multiple published techniques that specifically attempt to address these issues and discusses lessons learned by the Data Innovation Lab at Capital One.
How can the value of a patent be quantified? Josh Lemaitre explores how Thomson Reuters Labs approached this problem by applying machine learning to the patent corpus in an effort to predict which patents are most likely to be enforced via litigation. Josh covers infrastructure, methods, challenges, and opportunities for future research.
Visa, the world’s largest electronic payments network, is transforming the way it manages data: database appliances are giving way to Hadoop and HBase; proprietary ETL technologies are being replaced by Spark; and enterprise warehouse data models will be complemented by flexible data schemas. Nandu Jayakumar explores the adoption of big data practices at a conservative financial enterprise.
Anand Sanwal explores the trends, technologies, and business models that will disrupt financial services.
The release of Hadoop fundamentally changed the ability of financial enterprises to address velocity, variety, and volume in data. Ten years later, Juan Huerta describes the most significant data-oriented technical challenges the industry currently faces and the promising confluence of technologies and modeling paradigms that will drive the evolution of data technologies during the next decade.
Susan Woodward discusses venture outcomes: what fraction make lots of money, what fraction just barely return capital, and what fraction fail completely. Susan uses updated figures on the fraction of entrepreneurs who succeed, including some interesting details on female founders of venture companies.
Zillow pioneered public access to unprecedented information about the housing market. Long gone are the days when you needed an agent to get comparables and prior sale and listing data. And with more data, data science has enabled more use cases. Jasjeet Thind explains how Zillow uses Spark and machine learning to transform real estate.