haruki zaemon

The Absolute Minimum Every Software Developer Must Know About Unicode in 2023 (Still No Excuses!)

A thoroughly accessible and enjoyable read, despite the dry topic.

Nikita Prokopov:

Unicode is a standard that aims to unify all human languages, both past and present, and make them work with computers.

[…]

What’s sad for us is that the rules defining grapheme clusters change every year as well. What is considered a sequence of two or three separate code points today might become a grapheme cluster tomorrow! There’s no way to know! Or prepare!

Even worse, different versions of your own app might be running on different Unicode standards and report different string lengths!
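To make the point about string lengths concrete, here is a minimal Python sketch. It is not taken from the quoted article. The sample string and the helper name `length_report` are illustrative assumptions. It uses only the standard library, which can count code points and encoded units but does not segment grapheme clusters, so the "one visible character" claim is stated in a comment and not computed.

```python
import unicodedata

# Assumed sample: WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER.
# Renderers that support this emoji sequence draw it as a single symbol.
s = "\U0001F469\u200D\U0001F4BB"


def length_report(text: str) -> dict:
    """Hypothetical helper: measure one string in three common units."""
    return {
        "code_points": len(text),                           # what Python's len() counts
        "utf8_bytes": len(text.encode("utf-8")),            # storage size in UTF-8
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # what UTF-16 based runtimes count
    }


report = length_report(s)

# Version of the Unicode Character Database bundled with this Python build.
# Two deployments on different runtimes can report different values here.
unicode_version = unicodedata.unidata_version

print(report, unicode_version)
```

None of the three numbers is 1, although a reader sees one character. Getting that answer requires a grapheme segmentation library, and its result depends on which Unicode version the library implements.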

[…]

Unfortunately, Unicode is not a perfect system, and it has many shortcomings. Among them is assigning the same code point to glyphs that are supposed to look different, like Cyrillic Lowercase K and Bulgarian Lowercase K (both are U+043A).
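A small sketch of what the shared code point means in practice, again assuming Python and its standard library. The `tagged` list and its language codes are hypothetical; they only illustrate that the language has to travel alongside the text, because the code point itself does not carry it.

```python
import unicodedata

ka = "\u043a"  # the code point cited above, U+043A

# The Unicode Character Database has one name for it, with no language attached.
name = unicodedata.name(ka)

# Hypothetical language-tagged strings: Russian and Bulgarian words for "cat".
tagged = [
    ("ru", "\u043a\u043e\u0442"),
    ("bg", "\u043a\u043e\u0442\u043a\u0430"),
]

# Both words start with the very same code point, so string data alone
# cannot tell a renderer which glyph shape is expected.
first_letters = {lang: text[0] for lang, text in tagged}
same_code_point = first_letters["ru"] == first_letters["bg"]

print(f"U+{ord(ka):04X}", name, same_code_point)
```

Picking the right glyph is therefore left to information outside the string, such as a language tag on the surrounding markup or a locale-aware font feature.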

[…]

Overall, yes, Unicode is not perfect, but the fact that

  1. an encoding exists that covers all possible languages at once,
  2. the entire world agrees to use it,
  3. we can completely forget about encodings and conversions and all that stuff

is a miracle. Send this to your fellow programmers so they can learn about it, too.