The world of data is inherently diverse and “messy”. As a data scientist, you have to work with many different tools and languages to get all the data you need into a usable form before you can even start on the interesting part of your job. Wouldn’t it be nice if your programming language were aware of the external data sources that you are accessing?
In this talk, we look at doing data science with F#, an open-source, cross-platform programming language that provides a unique way of integrating external data sources and tools into a single environment. This means that you can access not only data but also Matlab scripts and R statistical and visualization packages, all from one place. Interactive coding with rich editor support is the F# way of communicating your ideas and exploring data in Hadoop Hive, third-party REST services, and open-government data in CSV, XML or even HTML formats.
In the live-coding part of the talk, you’ll see how to combine data from Hadoop, CSV and JSON-based REST services, explore and visualize the dataset interactively, and create a transparent and reproducible research report documenting the work.
F# has been used heavily in the finance and insurance industries, and it is gaining traction in other areas, including bioinformatics. This talk looks at recent F# open-source libraries and tools for data science, developed collaboratively by the speaker at the University of Cambridge, the open-source F# community and industrial partners such as BlueMountain Capital.
Tomas is a computer scientist, book author and open-source developer. He is the lead developer of several F# data-science libraries (Deedle and F# Data), but he also contributed to the design of the F# language itself as an intern and independent consultant. He is the author of a popular book called “Real-World Functional Programming” and is currently editing a collection of practical F# case studies.
Tomas is a PhD student at the University of Cambridge, working on types for understanding context usage in programming languages. He is a founder of DualNotion Ltd., where he provides training and consulting services. He recently spent three months in New York, working on F# tools for data science at BlueMountain Capital.