Bidi Brackets for Dummies
Just as the Hindu-Arabic numeral system enabled great innovation in mathematics because it was so much easier to use than the Roman system, I wonder if the simplicity of English characters enabled the US to get such an early start in computer software.
English characters are pretty compact (26 letters), have no accent marks that can't be ignored, have a 1:1 mapping between uppercase and lowercase, and form text that is easy to break up by word. This lets even very simple algorithms on a very resource-constrained computer do work that is mostly right.
For example, splitting a person's name on spaces, taking the last word, uppercasing it, and sorting by ASCII value mostly works if you want to produce a phone book listing, especially in 1950s-1980s America. And you can code this with very simple integer operations, without needing a lookup table or a bunch of special rules. Soundex is also pretty simple to implement and handles a lot of homophones.
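To make that concrete, here is a rough Python sketch of the phone-book trick. The names `phone_book_key` and `phone_book` are mine, and the uppercase step is deliberately done with plain integer arithmetic on `a`..`z` rather than `str.upper()`, to mimic what a small machine could do without lookup tables.

```python
def phone_book_key(full_name: str) -> str:
    """Sort key: last space-separated word, uppercased with ASCII arithmetic only."""
    last = full_name.split(" ")[-1]
    # 'a'..'z' sit exactly 32 code points above 'A'..'Z' in ASCII.
    # Anything outside 'a'..'z' (digits, accents, other scripts) passes through unchanged.
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in last)


def phone_book(names):
    """Return the names ordered by their ASCII sort key."""
    return sorted(names, key=phone_book_key)


names = ["Grace Hopper", "alan turing", "John von Neumann", "Ada Lovelace"]
for entry in phone_book(names):
    print(entry)
```

It also shows where the simplicity runs out: a name like "José Núñez" gets the key "NúñEZ", because the accented letters fall outside the `a`..`z` range and are left alone.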
Of course, now that we have the computing resources and libraries, handling the vast diversity of human languages is doable, but in terms of bootstrapping a computer software industry, I think simplicity played a role.
This is surprisingly hilarious for a "Unicode Technical Note." It changed my opinion about the Unicode Consortium—positively!—until I read this:
> These technical notes are independent publications, not approved by any of the Unicode Technical Committees, nor are they part of the Unicode Standard or any other Unicode specification. Publication does not imply endorsement by the Unicode Consortium in any way.
Still, awesome and fun. And I learned something.
> — How about the left angle bracket "<"? Is that at least a "bracket"?
> No.
> — Why not?
> Because it's a LESS-THAN SIGN.
It's petty, and there are bigger problems, but this is one of my main gripes against SGML and its successors.
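For what it's worth, the distinction shows up in the Unicode character data itself. A quick check with Python's standard `unicodedata` module (my own sketch, not something from the technical note; `describe` is just a helper name I made up):

```python
import unicodedata


def describe(ch: str):
    """Return (name, general category, is-mirrored-in-RTL) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        bool(unicodedata.mirrored(ch)),
    )


# "<" is a math symbol (Sm); "(" and the real angle bracket are open punctuation (Ps).
for ch in ["<", "(", "\u27e8"]:
    print(repr(ch), describe(ch))
```

So "<" is in category Sm (math symbol) while "(" and U+27E8 MATHEMATICAL LEFT ANGLE BRACKET are Ps (open punctuation). All three are flagged as mirrored in right-to-left text, so "not a bracket" does not mean "not flipped".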
Ok it is 2020. Let me see if I can use brackets in my text:
This is English with some فارسی(Persian) in it. Persian is also called Farsi(فارسی).
این متن به فارسی )Persian) نوشته شده
It worked, but I was very confused, and I'm not actually sure the position of (Persian) is right. I put it after the Farsi.
The majority Farsi line is completely butchered. Not sure if it is because this text input doesn't support RTL?
Pronounced Bidi like MIDI, or Bidi like Wifi, or (?)
What I missed in the article was: why? Why does a text rendering system or encoding conversion system need to care about which brackets are paired?
I tried to find what "Left S-shaped bag delimiter" was used for and all I found was the Wikipedia page listing the symbol among other math symbols: https://en.wikipedia.org/wiki/Miscellaneous_Mathematical_Sym.... Does someone here have an explanation?
Of course mathematicians would also consider brackets used like ]0,1[ and [0,1[ as valid notation for open or half-open intervals. And if you try hard enough you can even make your braces {} look like a ξ and write some monstrosity like {ξ∈[0,1[}.
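To see how badly that notation trips up mechanical pairing, here is a naive stack-based matcher in Python. To be clear, this is a textbook sketch of my own and not the pairing rule from the Unicode bidi algorithm; `naive_balanced` is a made-up name.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}  # closer -> expected opener
OPENERS = set(PAIRS.values())


def naive_balanced(text: str) -> bool:
    """True if every closer matches the most recent unmatched opener."""
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer with no opener, or with the wrong opener, is a mismatch.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # leftover openers also count as unbalanced


for sample in ["[0,1]", "]0,1[", "[0,1[", "{ξ∈[0,1[}"]:
    print(sample, naive_balanced(sample))
```

Only the first sample passes. The interval notations are perfectly readable to a mathematician, but a matcher that only looks at bracket shapes has no way to know that.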
Parsing brackets is even more entertaining in physics due to the bra-ket notation:
"U+FDE3" should be U+FD3E, ORNATE LEFT PARENTHESIS.
(2014)