Unicode Support Shootout: The Good, the Bad, the Mostly Ugly

Location: Portland 255
Average rating: ***..
(3.91, 11 ratings)

How does Unicode support stack up across major platforms, including Java, Perl, Python, Ruby, and more? Who’s doing the best job, and who’s failing miserably? Is anyone doing a good job? Does anyone actually implement to standard, and to what extent? I’ll compare the major platforms to separate the losers from the not-so-losers.

It’s been my personal hell to find out the answers to these questions in my day-to-day work mining very large Unicode-only corpora. I’ll share my tales of woe and suffering, including my struggles with JDK7, along with a regex-rewriting library I’ve developed to enhance Java regexes’ Unicode sensitivity.

Tom Christiansen


Tom Christiansen is a programmer, author, and lecturer who’s been
involved with Perl since its initial public release back in 1987. Tom is
the owner of the PERL.COM domain and website, and original author of much
of Perl’s online documentation. Tom is lead author of the
Perl Cookbook and co-author of Programming Perl, Learning Perl
(2nd edition), and Learning Perl on Win32 Systems, all bestselling titles by
O’Reilly & Associates.

He served two terms on the USENIX Association Board of Directors, and was
president of The Perl Journal. Perl users selected Tom to receive the
first White Camel Award in 1999. In 2000, members of the Open Source
community voted Tom Best Newbie Helper in the first annual Andover.Net
Slashdot Open Source Community Awards, to honor Open Source pioneers.

Tom holds a Master’s degree in Computer Science from the University of
Wisconsin – Madison with a dual specialization in operating systems design
and in computational linguistics. He previously received his Bachelor’s
degree there in Spanish and Computer Science with minor fields of study in
French, Mathematics, and Music. Tom has lived abroad in England and in
Spain, where he studied Romance Philology, café solo, and vino tinto.

Residing at the western edge of Boulder, Colorado, Tom is an amateur
naturalist who spends most of his summer hiking and camping high in the
wilderness well above 10,000 feet of elevation, wandering about the vast
Colorado Plateau, or relaxing under the glittering kaleidoscope of the
Black Rock Desert’s starkly featureless playa. Over the past five years,
Tom has become especially interested in how the exciting growth of
affordable digital photography has opened up to mere mortals dramatic
artistic opportunities previously available only to the most dedicated
and persistent of professional photographers, and often not even to them.

Comments on this page are now closed.


Peter Banka
07/29/2011 5:43am PDT

Didn’t really enjoy the scattered stream-of-consciousness approach. Started late due to lack of preparedness.

Arvind Jayaprakash
07/28/2011 12:24pm PDT

Great to see one person talk about multiple programming languages in all earnest and not just trolling. Doubly great given that it comes from a father figure of one language. And yes, it was really informative.

Ilia Cheishvili
07/28/2011 11:14am PDT

I doubt that there is anyone that knows more about Unicode than Tom. Great presentation.