I want to print the Unicode names of whitespace and control characters (ASCII code < 14), e.g. 'SPACE', 'NO-BREAK SPACE', 'HORIZONTAL TAB', etc.
# non-working Python code :-)
import unicodedata, string
for e in string.whitespace + unicodedata.lookup("GREEK SMALL LETTER ALPHA"):
    print(ord(e))
    print(unicodedata.name(e))
# this Python works:
str_whitespace = string.whitespace
print(ascii(str_whitespace)) # ' \t\n\r\x0b\x0c'
print(str_whitespace.encode()) # b' \t\n\r\x0b\x0c'
How can I make this work in the Nim programming language? Do I have to work with bytes, find the equivalent rune, and eventually look up its Unicode name?
This works fine in Nim:
import strutils
echo Whitespace
But how to print the Unicode names of these characters?
When I pass 1 to runeAt(), I get an error:
Hint: /Volumes/T5/bin/discard_with_comment [Exec]
/Volumes/T7/Nim/_my_code_ex/discard_with_comment.nim(7) discard_with_comment
/Volumes/T5/.choosenim/toolchains/nim-1.4.8/lib/pure/unicode.nim(80) runeAt
/Volumes/T5/.choosenim/toolchains/nim-1.4.8/lib/system/fatal.nim(49) sysFatal
Error: unhandled exception: index 1 not in 0 .. 0 [IndexDefect]
Error: execution of an external program failed: '/Volumes/T5/bin/discard_with_comment '
Here is a pointer in Python: https://stackoverflow.com/questions/68153407/printing-unicode-character-names-e-g-greek-small-letter-alpha-instead-of/68158208#68158208
I am looking for this output:
U+0020 SPACE
U+0009 CHARACTER TABULATION
U+000A LINE FEED
U+000D CARRIAGE RETURN
U+000B LINE TABULATION
U+000C FORM FEED
U+03B1 GREEK SMALL LETTER ALPHA
Well, "/".runeAt(1) is definitely a runtime error, since there is only a single rune in that string.
The following does what you want, but apparently unicodedb returns no names for the first 32 code points (U+0000 to U+001F, the control characters, as the output shows). This may be intended, or it could be raised as a possible issue of the lib:
import strutils
import unicode
import unicodedb/names
template echoRune(n: int) =
  let name = Rune(n).name
  if name.len > 0:
    echo "U+", toHex(n, 4), " ", name
for n in 0 .. 0x2A:
  echoRune(n)
echo "..."
echoRune(0x03B1)
output:
U+0020 SPACE
U+0021 EXCLAMATION MARK
U+0022 QUOTATION MARK
U+0023 NUMBER SIGN
U+0024 DOLLAR SIGN
U+0025 PERCENT SIGN
U+0026 AMPERSAND
U+0027 APOSTROPHE
U+0028 LEFT PARENTHESIS
U+0029 RIGHT PARENTHESIS
U+002A ASTERISK
...
U+03B1 GREEK SMALL LETTER ALPHA
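One way to fill the gap for the control characters is a small hand-written fallback table. The sketch below shows the idea in Python, the language the question started from, rather than Nim. It rests on a few assumptions: the alias names are copied from the desired output in the question, `CONTROL_ALIASES` and `describe` are hypothetical names made up for this example, and `"<unnamed>"` is an arbitrary placeholder. It also shows why the original Python snippet fails: `unicodedata.name()` raises `ValueError` for characters without a name unless a default is passed.

```python
import string
import unicodedata

# Assumption: these names are copied from the desired output in the question.
# unicodedata.name() does not return them for control characters.
CONTROL_ALIASES = {
    0x09: "CHARACTER TABULATION",
    0x0A: "LINE FEED",
    0x0B: "LINE TABULATION",
    0x0C: "FORM FEED",
    0x0D: "CARRIAGE RETURN",
}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    code = ord(ch)
    # Passing a default ("") avoids the ValueError raised for unnamed characters.
    name = unicodedata.name(ch, "") or CONTROL_ALIASES.get(code, "<unnamed>")
    return f"U+{code:04X} {name}"


if __name__ == "__main__":
    for ch in string.whitespace + unicodedata.lookup("GREEK SMALL LETTER ALPHA"):
        print(describe(ch))
```

Run as a script, this should print the seven lines listed in the question, in the order of `string.whitespace`. The same approach (library lookup first, fallback table second) could presumably be ported to Nim by consulting a table whenever `Rune(n).name` comes back empty.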