JavaScript ♥ Unicode

This presentation discusses the various ways JavaScript relies on Unicode, what the consequences are for JavaScript developers, and how ECMAScript 6 will make our lives a bit easier in this regard.

First off, I’ll explain the basics of Unicode. Once that’s out of the way, I’ll touch on the subject of character encodings, only to determine which encoding JavaScript uses internally (it behaves like a mixture of UCS-2 and UTF-16).

Then I’ll explore the various consequences of JavaScript exposing “characters” according to UCS-2/UTF-16, and explain why that is so problematic. I have many real-world examples of this behavior causing bugs in various “top 100” websites, such as Twitter.
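To make the problem concrete, here is a minimal sketch of what exposing “characters” as 16-bit code units looks like in practice. The symbol U+1D306 and the variable names are chosen for illustration only and are not taken from the talk.

```javascript
// U+1D306 TETRAGRAM FOR CENTRE lies outside the Basic Multilingual Plane,
// so a JavaScript string stores it as two 16-bit code units (a surrogate pair).
var tetragram = '\uD834\uDF06';

// `length` counts code units, not symbols.
var codeUnitCount = tetragram.length; // 2

// Each index exposes one surrogate half rather than the whole symbol.
var highSurrogate = tetragram.charCodeAt(0).toString(16); // 'd834'
var lowSurrogate = tetragram.charCodeAt(1).toString(16);  // 'df06'

// Naive reversal swaps the two halves and produces an ill-formed string.
var reversed = tetragram.split('').reverse().join('');
var reversalBroken = reversed === '\uDF06\uD834'; // true

console.log(codeUnitCount, highSurrogate, lowSurrogate, reversalBroken);
```

Any code that counts, slices, or reverses a string by index can therefore cut such a symbol in half, which is the kind of bug a length-limited input field is prone to.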

Finally, I’ll present robust ECMAScript 5-compatible workarounds to the issues encountered, and explain how ECMAScript 6 will make it easier to support full Unicode in JavaScript strings and regular expressions.
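As a rough illustration of both halves of that sentence, the sketch below counts symbols with an ECMAScript 5-compatible regular expression and then with ECMAScript 6 features. The helper name `countSymbolsES5` is hypothetical, and this is one possible workaround rather than necessarily the one presented in the talk. It accounts for surrogate pairs only, not for combining marks.

```javascript
// ECMAScript 5-compatible: collapse each surrogate pair into a single
// placeholder character before measuring the length.
var regexSurrogatePair = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;

function countSymbolsES5(string) {
  return string.replace(regexSurrogatePair, '_').length;
}

var sample = 'a\uD834\uDF06b'; // 'a', U+1D306, 'b'

var es5Count = countSymbolsES5(sample); // 3, whereas sample.length is 4

// ECMAScript 6: string iteration walks code points, not code units.
var es6Count = Array.from(sample).length; // 3

// ECMAScript 6: read a full code point instead of one surrogate half.
var codePoint = '\uD834\uDF06'.codePointAt(0); // 0x1D306

// ECMAScript 6: with the `u` flag, `.` matches a whole astral symbol.
var matchesWithoutU = /^.$/.test('\uD834\uDF06'); // false
var matchesWithU = /^.$/u.test('\uD834\uDF06');   // true

console.log(es5Count, es6Count, codePoint.toString(16), matchesWithoutU, matchesWithU);
```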

Mathias Bynens

Qiwi

Mathias Bynens (@mathias) is a freelance web developer from Belgium. He likes HTML, CSS, JavaScript, Unicode, performance, and security.