Presented by O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Spark, GraphX, and blockchains: Building a behavioral analytics platform for forensics, fraud, and finance

Bryan Cheng (BlockCypher), Karen Hsu (BlockCypher)
4:20pm–5:00pm Thursday, March 16, 2017
Spark & beyond
Location: LL21 C/D
Level: Beginner
Secondary topics: Financial services
Average rating: 5.00 (2 ratings)

Who is this presentation for?

  • Engineers and architects

Prerequisite knowledge

  • Familiarity with Spark

What you'll learn

  • Understand the unique challenges blockchains present to data analysis
  • Learn how to use Spark to overcome these challenges and improve the depth and quality of the available data


Blockchain technology represents a new paradigm for applications in fintech and beyond, one that requires new approaches to age-old problems: how do you track criminals, derive BI insights, and fight fraud in a blockchain-powered world? While the decentralized nature of the blockchain makes it harder to follow the movement of money, its immutable, shared ledger also represents a tremendous opportunity for behavioral analytics at scale. As blockchain adoption continues to accelerate, organizations large and small need to understand the systems required to analyze the wealth of complex information available quickly and efficiently.

Since 2014, BlockCypher has been at the forefront of blockchain infrastructure. BlockCypher’s Bryan Cheng and Karen Hsu describe how they built machine-learning and graph traversal systems on Apache Spark to help government organizations and private businesses stay informed in the brave new world of blockchain technology. Bryan and Karen also share lessons learned from combining these two bleeding-edge technologies and explain how these techniques can be applied to private and federated chains.

Topics include:

  • The unique challenges inherent in blockchain data and identities
  • The architecture BlockCypher uses to power blockchain analytics
  • Techniques and tools used for particular types of blockchains
  • Integration of discrete machine-learning, graph traversal, and heuristic components into a coherent pipeline combining machine and human intelligence

Bryan Cheng


Bryan Cheng is a backend developer and analytics lead at BlockCypher. Since 2015, he has worked on infrastructure powering bitcoin and other blockchains. As analytics lead, Bryan works to combine BlockCypher’s experience with blockchains of all sizes with the latest in machine-learning and big data analytics to help governments and private industry stay informed and secure. Previously, Bryan cofounded a startup and led a network access control team at UC Berkeley, where he graduated with a BS in materials science and mechanical engineering. When not hacking in Spark or writing Golang, Bryan can be found learning Rust, riding his bike, and exploring VR.

Karen Hsu


Karen Hsu is head of growth at BlockCypher. Karen has over 20 years of experience in technology, with a focus on business intelligence, fintech, and the blockchain, and has worked in a variety of engineering, marketing, and sales roles to bring new products to market. She has coauthored four patents. Karen holds a BS in management science and engineering from Stanford University.