Clojure: Full Stack Data Science

Nerdcore
Location: Thames Suite Level: Intermediate
Tags: 20min

I’d like to talk about how I, as a startup founder, am using Clojure to build and deploy data science products. In particular, I’d like to focus on how Clojure lets me grow an idea from a pure research notion into a hardened production product using a single codebase, something that is unusual.

Mathematical software is hard to create because at some point somebody has to consider each of many specialist levels of abstraction, from matrix condition numbers, through algorithmic convergence, down to off-by-one errors. Rare is the programmer who can do all of that, which is why specialist languages with good maths and stats abstractions, such as R, MATLAB and Mathematica, exist. The compromise is that they are terrible in production: they are slow and insular.

As a result, in most commercial environments where research drives product, such as banks and hedge funds (where I have quite a bit of experience), the research team does its work in a ‘mathsy’ language. The result is then handed off to a programming team, who reconstruct the code in a production language like C# or C++. This fracture causes severe pain and error.

Clojure has solved this problem for me, allowing me to perform the entire process from research to production in one language. As it’s a Lisp, I am able to build DSLs for my problem easily, and the benefits of FP with which the audience is familiar accrue. What is unusual is that, because it’s a JVM language, I don’t have the usual FP problem of a lack of libraries or platform power. Clojure has a native R-like environment, Incanter, which is great for data exploration. I can pull in jBLAS for linear algebra. At the same time I can easily interface with databases, message queues, web servers or any other data source from the ‘real world’. Then I can very easily bolt a web server onto the same codebase and expose it to production systems. This single-language, end-to-end process really speeds up my development.

Bringing production practices into the research process has also been tremendously beneficial. Practices such as using logging, contracts and assertions in my functions, and having unit tests and continuous integration for research code, none of which are really supported in R/MATLAB/Mathematica, have brought many rewards.
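As a minimal sketch of what contracts, assertions and unit tests on research code can look like, the following uses only Clojure's built-in `:pre`/`:post` conditions and `clojure.test`. The namespace `research.stats` and the functions `mean` and `variance` are hypothetical illustrations, not code from the talk.

```clojure
;; Hypothetical research namespace; all names here are illustrative assumptions.
(ns research.stats
  (:require [clojure.test :refer [deftest is]]))

(defn mean
  "Arithmetic mean of a non-empty collection of numbers."
  [xs]
  ;; Contract: input must be non-empty and numeric; result must be a number.
  {:pre  [(seq xs) (every? number? xs)]
   :post [(number? %)]}
  (/ (reduce + xs) (count xs)))

(defn variance
  "Unbiased sample variance; requires at least two observations."
  [xs]
  ;; Contract: at least two numeric observations; result is never negative.
  {:pre  [(> (count xs) 1) (every? number? xs)]
   :post [(not (neg? %))]}
  (let [m (mean xs)]
    (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
       (dec (count xs)))))

;; Unit tests that a continuous integration job could run on every commit.
(deftest mean-test
  (is (= 2 (mean [1 2 3])))
  ;; A violated precondition throws AssertionError.
  (is (thrown? AssertionError (mean []))))

(deftest variance-test
  (is (= 1 (variance [1 2 3])))
  (is (thrown? AssertionError (variance [42]))))
```

Calling `(clojure.test/run-tests 'research.stats)` at the REPL runs both tests. Because the contracts live on the functions themselves, the same checks apply whether the code is called during exploration or from a production service.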

Having an entire research stack and an entire production stack available simultaneously, and being able to work in one language and platform at all times, has been hugely beneficial to me and my company, and I think the audience would find it interesting to hear about.

Edmund Jackson

Cambridge Data Science

Edmund is a data scientist starting up a quantitative hedge fund. He is armed with a PhD in statistical signal processing and a well-tempered enthusiasm for functional programming.
