Hi,
This is my first real dive into using the FFI in Nim. I'm currently modifying the illwill library to support some non-ASCII character input in its getKey function, but I've run into unpredictable behavior when calling _kbhit on Windows via the FFI. Boiled down to a minimal reproducible example, suppose we have the following Nim code:
import strutils
proc kbhit(): cint {.importc: "_kbhit", header: "<conio.h>".}
proc getch(): cint {.importc: "_getch", header: "<conio.h>".}
echo("\xD0\xB4")
while true:
  var keys = kbhit()
  if keys != 0:
    echo(keys, " keys were hit.")
    for idx in 0..<keys:
      var ch = getch()
      echo("key ", idx, " is 0x", toHex(ch))
\xD0\xB4 is the UTF-8 representation of the Cyrillic character д. This character prints fine via echo when run in cmd.exe. Similarly, pressing common keys works fine: you get the output that a key was hit and the hex value of its ordinal representation. I can also copy-paste traditional ASCII characters into the terminal just fine and get the appropriate output. When I try to paste in the д character, however, my terminal crashes. The terminal window completely closes, and the process that was hosting it is left stranded running in the background, requiring manual intervention to kill.
I know Unicode in terminals is a particularly painful experience on Windows, so I don't doubt I could be doing something wrong on that front, although right now I'm not sure what. I did make sure to try this after changing to code page 65001, and I ran cmd.exe with /u just for good measure. Additionally, outside the context of the Nim program (but in the same terminal it was run from), I can copy-paste the д character and have it display just fine.
I also wrote some functionally equivalent, hand-written C code:
#include <conio.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    while (1)
    {
        int keys = _kbhit();
        if (keys != 0)
        {
            printf("%d keys were hit\n", keys);
            for (int idx = 0; idx < keys; idx++)
            {
                int ch = _getch();
                printf("Key %d is 0x%x\n", idx, ch);
            }
        }
    }
    return 0;
}
And this C code doesn't choke nearly as badly on the input. It reads the character as \x3F (the ASCII '?'), which I don't believe is right, but at least it doesn't horribly crash the program. The differing behavior is particularly confusing to me.
Am I doing something wrong with the Nim FFI here? Or have I simply fallen victim to one of the many pitfalls of non-ASCII input in Windows terminals?
I was using the default C compiler for Nim (which should be MinGW if I'm not mistaken). I tried compiling the equivalent hand-written C code with both MinGW and MSVC and got the same result in each case: it more or less functioned appropriately, unlike the Nim counterpart.
I didn't mention it in the post (I forgot), but I also tried calling _getch twice, and it made no difference. According to both my echo statements and a debugger, it's _kbhit that hangs/crashes, and it does so on the first iteration (before any of the _getch calls), so I don't believe that's the issue. Truthfully, I'm a little unsettled to see different behavior via the Nim FFI than with equivalent C calls to the same functions, as it makes me question the stability of this approach. For my use case, though, I can investigate using the Windows API functions directly, which I believe would entail ReadConsoleInput and GetNumberOfConsoleInputEvents.
Hm. I'm running on an x64 Win10 box as well, and even with that modification it still crashes or hangs the terminal. Perhaps it has something to do with terminal settings...
In either case, I suppose that to do this reliably I'd need to use the Windows API functions mentioned above.
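For reference, here's a rough, untested sketch of that Windows API approach in C (Windows-only; the _kbhit/_getch pair replaced by GetNumberOfConsoleInputEvents and ReadConsoleInputW, which deliver UTF-16 code units and so shouldn't mangle non-ASCII input the way the ANSI conio functions can). This is my own illustration, not code from illwill:

```c
/* Sketch: poll console input via the Windows console API instead of conio.
   Compile with MinGW or MSVC; requires a real console (not a pipe). */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE hIn = GetStdHandle(STD_INPUT_HANDLE);
    while (1)
    {
        DWORD pending = 0;
        /* Non-blocking check, analogous to _kbhit() */
        if (!GetNumberOfConsoleInputEvents(hIn, &pending) || pending == 0)
            continue;

        INPUT_RECORD rec;
        DWORD read = 0;
        /* Consume one event, analogous to _getch(), but as UTF-16 */
        if (ReadConsoleInputW(hIn, &rec, 1, &read) && read == 1 &&
            rec.EventType == KEY_EVENT && rec.Event.KeyEvent.bKeyDown)
        {
            WCHAR wc = rec.Event.KeyEvent.uChar.UnicodeChar;
            if (wc != 0)  /* skip pure modifier/arrow key events */
                printf("Key is 0x%X\n", (unsigned)wc);
        }
    }
    return 0;
}
```

A pasted д should show up here as the single UTF-16 code unit 0x434 rather than two UTF-8 bytes, so the Nim wrapper would need to re-encode to UTF-8 itself if that's what getKey's callers expect.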