Does anyone know of a method or library for sorting lists of UTF-8 strings in Nim?
And ... before someone sends me a link to the sort library: I know how to sort things in Nim. :-) And I know how to do a blind rune codepoint sort of the strings.
But that is NOT the same thing as ordering strings alphabetically per the Unicode Collation Algorithm. See https://unicode.org/reports/tr10/.
Think A a à B... as being the correct means of sorting letters in a spelling dictionary.
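To make the difference concrete, here is a minimal sketch of what the "blind" sort produces. It uses only the standard library and assumes the default string `cmp`, which compares bytes and therefore orders valid UTF-8 by codepoint. The word list is just an illustration.

```nim
import std/algorithm

# Default string comparison in Nim is byte-wise; for valid UTF-8 this is
# the same as ordering by codepoint, not by dictionary order.
let words = @["B", "à", "a", "A"]
let byCodepoint = sorted(words)

echo byCodepoint
# → @["A", "B", "a", "à"]
```

All the uppercase letters come first and "à" lands after "B", which is not what a spelling dictionary would do.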
Think A a à B... as being the correct means of sorting letters in a spelling dictionary.
I think that you are right, but what should the sorting order of A a à ä á B look like? I don't think these accented characters have an obvious alphabetical order among themselves.
@srd,
You are right, it is a very non-trivial task. I suspect it would take 200+ hours of work to write. The sort itself would be driven by a very complex and convoluted comparator.
It has been done in C and Java in a library called ICU: https://github.com/unicode-org/icu
Haskell has a native implementation, and Python has an ICU wrapper. I think I saw some JS and Perl attempts but did not look closely.
Just didn't know if something has been done for Nim.
@Yardinico, I will look at that Nimble library. Perhaps the db import has done half the work.
BTW, I just learned that the basic data a comparator would use is a public text file at http://www.unicode.org/Public/UCA/latest/allkeys.txt
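As a rough idea of the first step, here is a sketch of parsing a single entry from that file. It assumes the line format described in TR10 (hex codepoints, a `;`, then one or more `[.pppp.ssss.tttt]` or `[*pppp.ssss.tttt]` collation elements, then a `#` comment). The names `CollationElement` and `parseAllkeysLine` are hypothetical, and the weights in the sample line are illustrative since they change between UCA versions. This is only the data loading, not the comparator.

```nim
import std/strutils

type
  CollationElement = object
    variable: bool                     # true when the element starts with '*'
    primary, secondary, tertiary: int  # the first three weight levels

proc parseAllkeysLine(line: string):
    tuple[codepoints: seq[int], elements: seq[CollationElement]] =
  ## Parses one allkeys.txt entry. Returns empty seqs for blank lines,
  ## comment lines and '@' directives.
  let body = line.split('#', maxsplit = 1)[0].strip()
  if body.len == 0 or body.startsWith("@"):
    return
  let parts = body.split(';', maxsplit = 1)
  if parts.len != 2:
    return
  # Left side: one or more hex codepoints (more than one for contractions).
  for cp in parts[0].splitWhitespace():
    result.codepoints.add parseHexInt(cp)
  # Right side: a run of bracketed collation elements.
  for chunk in parts[1].split('['):
    let ce = chunk.strip()
    if ce.len < 2 or ce[^1] != ']':
      continue
    # Drop the leading '.' or '*' and the trailing ']'.
    let weights = ce[1 .. ^2].split('.')
    if weights.len < 3:
      continue
    result.elements.add CollationElement(
      variable: ce[0] == '*',
      primary: parseHexInt(weights[0]),
      secondary: parseHexInt(weights[1]),
      tertiary: parseHexInt(weights[2]))

# Sample line in the allkeys.txt format; the weights are illustrative only.
let entry = parseAllkeysLine("0061  ; [.1CAD.0020.0002] # LATIN SMALL LETTER A")
echo entry.codepoints.len, " codepoint(s), ", entry.elements.len, " element(s)"
```

A real comparator would still have to build sort keys from these weights level by level, handle contractions and implicit weights, and normalize the input first, which is where most of the hours would go.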
You are right, it is a very non-trivial task. I suspect it would take 200+ hours of work to write.
I think that it is probably 200++ hours ;) I would just use c2nim to generate bindings for ICU.