This presentation will introduce people to bigdata—a scale-out
database and computing platform. Unlike either Hadoop’s or Google’s
approach, bigdata begins with a distributed index architecture and
derives a high concurrency row store, a high performance semantic web
database, a generic object database, and a distributed file system
with atomic append from some basic operations on those indices. Thompson
will introduce the high-level architecture, show how the various
services are derived from range-partitioned indices, discuss how and where
scale-out indices and map/reduce computing can be combined, and
present in some depth on the scale-out semantic web database including
some performance and scaling data.
This presentation will be technical. People should have an awareness
of cloud computing and/or a familiarity with the semantic web.
This presentation is important because cloud (or grid, or scale-out)
computing will increasingly provide the infrastructure for emerging
businesses. Open source platforms for cloud computing are vital as
they bring enabling technology to more people and enable businesses by
keeping down the cost of scaling out.
This presentation will be interesting to architects and developers who
want to explore cloud computing, to people developing scale-out
infrastructure, such as Hadoop or CouchDB, and to businesses
interested in open source platforms for scale-out computing. bigdata
is of particular interest for the semantic web / Web 3.0 space, since
there are no generally available scale-out semantic web databases.
bigdata is a 100% Java project providing scale-out (distributed)
indices, map/reduce style computing, a sparse row store (à la Hadoop’s
HBase, Google’s Bigtable, or CouchDB), a distributed file system (à la
Hadoop’s HDFS or Google’s GFS), a high performance RDF database, and a
flexible generic object model (GOM) database.
The basic building blocks for the bigdata architecture are scale-out
indices, data services (hosting index partitions), and metadata
services (locators for data services). The scale-out indices are
B+Trees and remain balanced under insert and removal operations. Each
B+Tree maps variable-length byte keys to variable-length byte values
(keys are compared as unsigned bytes), and any structure is imposed on
those keys and values by the application. Indices are transparently
range-partitioned and distributed across a cluster or grid of
commodity servers. Service failover and high availability are handled
by redundant service registrations. Rather than storing index data in
a distributed file system, data is stored on local disk on each
machine hosting a data service. In fact, the distributed file system
itself is just an application of the scale-out index service. Data
failover is handled by replicating data using streaming writes to
secondary services. The services layer uses Jini for service
registration and discovery, but SCA and OSGi integrations are being
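
The combination of unsigned byte keys and range partitioning described above can be illustrated with a small Java sketch. This is not bigdata's actual API: the class name PartitionLocatorSketch, its methods, and the "dataService-N" identifiers are all hypothetical, and an in-memory TreeMap stands in for what the abstract calls a metadata service. It only shows the general idea of ordering keys as unsigned bytes and locating the partition whose key range covers a given key.

```java
import java.util.Comparator;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names come from the bigdata code base. */
public class PartitionLocatorSketch {

    /** Orders byte[] keys lexicographically, treating each byte as unsigned (0..255). */
    public static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        // A key that is a strict prefix of another sorts before it.
        return a.length - b.length;
    };

    /** Maps the inclusive lower bound of each key range to an assumed data service id. */
    private final TreeMap<byte[], String> partitions = new TreeMap<>(UNSIGNED_BYTES);

    public PartitionLocatorSketch() {
        // The empty key is the lowest possible key, so every lookup finds a partition.
        partitions.put(new byte[0], "dataService-0");
    }

    /** Registers a partition that starts at fromKey and is hosted by dataService. */
    public void addPartition(byte[] fromKey, String dataService) {
        partitions.put(fromKey.clone(), dataService);
    }

    /** Returns the data service whose key range contains the given key. */
    public String locate(byte[] key) {
        return partitions.floorEntry(key).getValue();
    }

    public static void main(String[] args) {
        PartitionLocatorSketch locator = new PartitionLocatorSketch();
        locator.addPartition(new byte[] {0x40}, "dataService-1");
        locator.addPartition(new byte[] {(byte) 0x80}, "dataService-2");

        System.out.println(locator.locate(new byte[] {0x10}));        // dataService-0
        System.out.println(locator.locate(new byte[] {0x40, 0x01}));  // dataService-1
        System.out.println(locator.locate(new byte[] {(byte) 0xff})); // dataService-2
    }
}
```

The last lookup shows why the unsigned interpretation matters: as a signed Java byte, 0xff is -1 and would sort below every other key, whereas compared as unsigned it falls in the highest partition.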
Mr. Thompson is the Chief Scientist and a co-founder of SYSTAP, LLC. SYSTAP is a boutique software consultancy focused on providing custom technology services to the federal government and private sector. SYSTAP provides solutions that bridge the gap between real-world, mission-critical customer problems and innovative research, emerging technologies, and open-source software. Mr. Thompson’s work for the last several years has focused on assessing and applying Semantic Web technologies to support semantics-based federation (mashups) at scale (billions of triples). He is the founder of the bigdata open source project, which is developing a scale-out database and computing fabric. He is also the founder of the CognitiveWeb, an open source project whose goal is to extend human decision horizons by compensating for some intrinsic aspects of selective attention, essentially helping people to bridge their separate areas of expertise. He was an active member of the jdbm project for several years and developed the extensible serialization mechanism used by that project.