How to Write Compilers and Optimizers (and solve Data Transformation Problems)

Shevek . (Nebula)
Sponsored Sessions
Location: E143
Average rating: ****.
(4.27, 11 ratings)

A presentation and discussion on building compilers and optimizers for data transformations or domain-specific languages. After this presentation, any programmer should be able to write a basic compiler, and people with prior knowledge will be able to write more advanced optimizing compilers.

Compilation, loosely described, is the automated transformation of one piece of data into another; and the optimizer, the frontend’s partner in crime, is a key decision-maker in these transformations.

A good compiler-optimizer pair can solve a map-reduce scheduling problem, a timetabling problem, or even lay out the desks in an office; and the compiler-optimizer itself can be written in under 500 lines of code. (Compiling C to binary takes about 5,000.)

Yet we treat these techniques as arcane and shy away from them in everyday code; we write serializers and greedy algorithms by hand, often with poor performance, inconsistent results or badly understood corner cases.

We can fix that!

This is a huge topic, and the presentation is based on a wealth of experience of writing compilers, in different languages, for different platforms, with very different underlying behaviours. There will be considerable opportunity for the audience to guide the talk through a selection of available sub-topics.

Most of the really interesting problems are in the optimizer, which is probably why it isn’t often discussed or taught. A good optimizer can ensure the success of a business, can fit 20 more customers onto a piece of hardware, or can make your program run an order of magnitude faster. Making the wrong decision can be worse than making no decision, especially in an automated system.

  • Basic local optimizations.
  • The problem with local optimization.
  • Global optimization.
  • Algebraic techniques.
  • NP-Hard problems and solution in generality.
  • Modeling for CSP and SAT.
  • Existing libraries.

There are also some interpreter-related party-tricks, like direct-threading, which can be used to scare people, but do make code run faster!

Applications and Front-End
We must be able to integrate our new techniques and tools into our workflow. We will discuss building incremental compilers, techniques for error management, reporting and recovery, and debugging – often touted as twice as hard as development.

During development of a program, the input is more often invalid than not, and a good frontend must handle invalid input elegantly, produce useful and appropriate error messages, and not fail at the first error. Even if we are just building a serializer, we may prefer to use compiler-building tools to construct it. The available front-end tools have reached such a level of maturity that they generally produce the most performant, reliable and correct solutions to many problems.

  • Lexing and parsing of text.
  • Avoiding shift-reduce conflicts and ambiguity.
  • Handling non-textual input, such as graph languages.
  • Available tools.

Also Available
It’s too much for one presentation, but let’s see where we end up!

  • Language Design: What makes a good language for describing a problem?
  • Backends – custom interpreter, existing VM, or hardware?
  • Parallel Systems – a personal favourite at the moment.
  • Worked Examples – shall we compile C, for example?

This session is sponsored by Nebula

Photo of Shevek .

Shevek .


Shevek is an expert programmer with a strong interest in parallel and
distributed systems. He has worked on cutting edge research in compilers
and language design, algorithmic optimization, systems and security. He
is capable of maintaining a very straight face under questioning on
topics including “Why is our printer playing ‘happy birthday’?” or “What
is that message doing on the side of that building?” He received a
Doctorate in Computing on the Formalization of Protection Systems from
the University of Bath, England. He also holds a Masters in Pure
Mathematics and an epee.

Comments on this page are now closed.


Picture of Rob Reilly
Rob Reilly
07/20/2012 11:22am PDT


My printer doesn’t play anything and you can see from my picture, I’m actually smiling (sort of). My wife accuses me of being way too poker-faced.

I suspect you can answer why there’s a message on a building, perhaps from a technical or philosophical standpoint, of course. Non-techies do frequently ponder those things, and sometimes need insight.

Nevertheless, sorry I didn’t get to meet you. Definitely wanted to inquire about your attire.

Maybe next OSCON.

I hope you had a productive conference.

Best regards,



For information on exhibition and sponsorship opportunities at the conference, contact Sharon Cordesse at (707) 827-7065 or

View a complete list of OSCON contacts