The Absolute Minimum Every Software Developer Must Know About Unicode in 2023 (Still No Excuses!)
Shared by Simon Harris
A thoroughly accessible and enjoyable read, despite the dry topic.
Unicode is a standard that aims to unify all human languages, both past and present, and make them work with computers.
[…]
What’s sad for us is that the rules defining grapheme clusters change every year as well. What is considered a sequence of two or three separate code points today might become a grapheme cluster tomorrow! There’s no way to know! Or prepare!
Even worse, different versions of your own app might be running on different Unicode standards and report different string lengths!
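To make the "different string lengths" point concrete, here is a minimal Python sketch. It assumes nothing beyond the standard library; `length_report` and `FAMILY` are hypothetical names introduced for illustration. It measures one family emoji (three emoji joined by zero-width joiners) in three common units and prints the Unicode version the running interpreter was built against. The standard library has no grapheme cluster segmentation, so the "one visible character" count is not computed here.

```python
import unicodedata


def length_report(text: str) -> dict:
    """Measure the same string in three different units (hypothetical helper)."""
    return {
        "code_points": len(text),                           # what Python's len() counts
        "utf8_bytes": len(text.encode("utf-8")),            # size on disk or on the wire
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,  # what JavaScript/Java report
    }


# MAN + ZWJ + WOMAN + ZWJ + GIRL: usually drawn as a single glyph.
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

report = length_report(FAMILY)
print(report)

# The Unicode Character Database version bundled with this interpreter.
# It varies between Python releases, so two deployments can disagree.
print("Unicode version:", unicodedata.unidata_version)
```

None of the three numbers is 1, even though most readers would count a single character. Getting that count requires a segmentation library that tracks the current Unicode rules, and its answer can still change when the library is upgraded.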
[…]
Unfortunately, Unicode is not a perfect system, and it has many shortcomings. Among them is assigning the same code point to glyphs that are supposed to look different, like Cyrillic Lowercase K and Bulgarian Lowercase K (both are U+043A) […]
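A small Python sketch of what that means in practice, using only the standard library. The two sample words are assumptions chosen for illustration (Russian and Bulgarian words for "cat"), and the variable names are hypothetical. The string itself carries no hint about which glyph variant is wanted. That decision has to come from outside the text, for example from the font or a language tag.

```python
import unicodedata

KA = "\u043a"  # the shared code point, U+043A

# Unicode gives it a single name, with no Russian/Bulgarian distinction.
print(f"U+{ord(KA):04X}", unicodedata.name(KA))

# Illustrative sample words (assumed for this sketch).
russian_word = "\u043a\u043e\u0442"              # "кот"
bulgarian_word = "\u043a\u043e\u0442\u043a\u0430"  # "котка"

# Both words start with the very same code point, so nothing in the
# data distinguishes the two intended letterforms.
same_code_point = russian_word[0] == bulgarian_word[0] == KA
print(same_code_point)
```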
Overall, yes, Unicode is not perfect, but the fact that
- an encoding exists that covers all possible languages at once,
- the entire world agrees to use it,
- we can completely forget about encodings and conversions and all that stuff
is a miracle. Send this to your fellow programmers so they can learn about it, too.