Insertion bottlenecks lie at the heart of database and file-system
innovations, best practices, and system workarounds. Most databases
and file systems are based on B-tree data structures, and suffer from
the performance cliffs and unpredictable run times of B-trees. In
this talk, we introduce the Fractal Tree data structure and explain
how it works and how it provides dramatically improved performance in
both theory and in practice. Although our company Tokutek is selling
a transaction-safe fractal-tree storage engine for MySQL, this talk is
primarily about the underlying technology.
From a theoretical perspective, if B is the block-transfer size, the
B-tree performs O(log_B N) block transfers per insert in the worst
case. In contrast, the Fractal Tree structure performs O((log_B N)/B)
memory transfers per insert, which translates to run-time improvements
of two orders of magnitude.
To relate that theory to practice, we present an algorithmic model for
B-tree performance bottlenecks. We explain how the bottlenecks affect
best practice and how database designers typically modify B-trees to
try to mitigate the bottlenecks. Then we show how Fractal Tree
structures can attain faster insertion rates, intuitively by
transforming disk-seek bottlenecks into disk-bandwidth bottlenecks
We conclude with performance results showing how a Fractal Tree
storage engine can maintain rich indexes more efficiently than
B-trees. Surprisingly, Fractal Tree structures seem to maintain their
order-of-magnitude competitive advantage over B-trees on SSDs as well
as traditional rotating media.
Dr. Kuszmaul’s research focuses on developing computer systems and
hardware that behave well both in practice and in theory. His entry
won 5 out of 6 categories in Jim Gray’s 2007 sorting benchmark
contest, sorting a terabyte in 197
seconds. He formerly architected Akamai’s distributed data collection
system, was a Yale Professor of Computer Science and was a principal
network architect for the Thinking Machines Connection Machine
CM-5 Supercomputer.
He was one of the developers of the MIT
Cilk multithreaded programming
system and wrote, in Cilk++, winning entries for 4 out of 12 of
problems in the 2009 Intel Threading Challenge. Dr. Kuszmaul a
founder and Chief Architect at Tokutek and is a Research Scientist in
the Massachusetts Institute of Technology Computer Science and
Artificial Intelligence Laboratory (MIT CSAIL).
Comments on this page are now closed.
For information on exhibition and sponsorship opportunities at the conference, contact Yvonne Romaine at yromaine@oreilly.com
Download the O'Reilly MySQL Conference & Expo Sponsor/ Exhibitor Prospectus
Download the Media & Promotional Partner Brochure (PDF) for information on trade opportunities with O'Reilly conferences or contact mediapartners@ oreilly.com
For media-related inquiries, contact Maureen Jennings at maureen@oreilly.com
To stay abreast of conference news and to receive email notification when registration opens, please sign up for the O'Reilly MySQL Conference newsletter (login required).
View a complete list of O'Reilly MySQL Conference contacts.
Comments
Rick, fractal trees are designed to be as good as B-trees on queries and insertions. It’s intended for general usage and not to fit in a niche.
Geeky. Very technical talk about technology behind a product. Good for understanding why the product fits in its niche.