nimforum mirror - unicode support

1596 :: unicode support

[2015-08-31T16:39:45+02:00]

liulun [2015-08-31T16:39:45+02:00]

I am in a Chinese Windows 10 environment.

When I write Nim code that contains Chinese words,

I must change the code file to ANSI or GB2312 encoding;

then my program works correctly.

But when my program accesses my database,

which is UTF-8 encoded and contains Chinese words,

unreadable text appears.

Is this an issue?

Please help me. Thank you!

OderWat [2015-08-31T17:44:20+02:00]

I am just guessing, but your code really should be UTF-8 only, and you may need to switch the console code page to UTF-8 using chcp 65001. But beware... really just guessing!

Varriount [2015-08-31T23:36:32+02:00]

As an addition, there's been some talk about having echo() encode its parameters to the local code page. Python does this, but most other languages do not (so this is hardly a deficiency specific to Nim).

liulun [2015-09-01T02:25:16+02:00]

Please think about this for us (Chinese, Japanese, Korean).

When you write a Nim GUI program,

you must change your code file to ANSI encoding,

and then you cannot read your data with database tools.

liulun [2015-09-01T02:29:35+02:00]

°¢빵ٷ�

This is the data I copied from "SQLite Expert".

This data was written by my Nim GUI program.

_tulayang [2015-09-01T03:01:21+02:00]

Chinese too. You should really try Sublime Text 3. I use Sublime Text to write C, JS, Scheme, Nim, CSS, HTML, ... on Ubuntu 14.04 CHN. I also feel that Chinese support is somewhat problematic on Windows.

Varriount [2015-09-01T03:56:29+02:00]

Again, this is not a problem specific to Nim. It's due to how the Windows console handles encodings, combined with the fact that string literals are encoded in UTF-8. Similar problems arise in Java, C, and C++. The only language that I know of that mitigates this specific problem is Python, and that's because Python's print() procedure automatically tries to encode its input using the console encoding. Even then, there are problems.

Edit: I'll see if I can whip up a uniEcho procedure tomorrow in my free time. It won't be efficient, but it'll do. There are various strategies, but the foolproof ones are complicated.

hibernating [2015-09-01T04:49:55+02:00]

This is because the Windows console does not support Unicode (even with wprintf() from the C runtime). You should use UTF-8 for your source code, and when you have to output text to the console, convert it from UTF-8 to your current non-Unicode charset, i.e. GBK in your situation.


However, a problem remains: the ``parseInt`` proc in Nim's ``parseutils`` module is broken, and the ``encodings`` module you need here depends on it, so you will be stuck ...

EDIT: Oh, no, the parseInt thing is really a tinycc problem; I changed the backend to gcc-mingw and it works.

hibernating [2015-09-01T05:13:21+02:00]

I'll give an example here:

import encodings

# Name of the current system encoding, e.g. gb2312 on Chinese Windows
var ce = getCurrentEncoding()
echo "current encoding name: ", ce

# Converter from UTF-8 (the source file encoding) to that encoding
var enc = encodings.open(ce, "UTF-8")
echo enc.convert("你好，中文！")
enc.close()

The above code will produce:


d:\test\nim\nim02.exe
current encoding name: gb2312
你好，中文！

d:\test\nim>

Note that the encodings module does not support the charset name GBK; use gb2312 or GB18030 instead if you hard-code the name.

Varriount [2015-09-01T07:23:18+02:00]

@hibernating The encodings module doesn't get the console code page, it gets the system code page.

Araq [2015-09-01T08:08:35+02:00]

Nim saves user data, which contains Chinese words, into a database (Windows 10 environment); these words are unreadable in the DB.

Which DB do you use?

liulun [2015-09-01T09:22:44+02:00]

Web code (MySQL DB):

#file encoding UTF-8
#nim c -r test.nim
import jester, asyncdispatch,  db_mysql
var conn = db_mysql.open("******","******","******","******")
routes:
    get "/":
        var sql = sql("insert into student (title) values (?)")
        conn.exec(sql,"测试中文")   #this data in db--------->      æµ‹è¯•ä¸æ–‡
        sql = sql("select title from student  order by id desc limit 0,1")
        var str = conn.getValue(sql)
        resp str  #this data in web page--------->      娴嬭瘯涓枃

runForever()
db_mysql.close(conn)

I will show my GUI test code in a moment.

liulun [2015-09-01T09:33:59+02:00]

GUI code (no DB):

# nim c -r --app:gui test.nim
# file encoding UTF-8
import  iup
discard iup.open(nil, nil)
message("测试一下中文", "中文")   #unreadable words
close()

Araq [2015-09-01T10:02:21+02:00]

https://dev.mysql.com/doc/refman/5.0/en/charset-applications.html indicates that UTF-8 is not the default.

For your IUP example you need to call storeGlobal("UTF8MODE", "YES") but I couldn't get it to work, most likely because I don't have a Unicode build of iup.dll.

liulun [2015-09-01T10:19:33+02:00]

My MySQL charset is UTF-8.

storeGlobal("UTF8MODE", "YES") Yes! This is useful! That fixed it! Thank you!

hibernating [2015-09-01T10:39:54+02:00]

@liulun

IUP has supported UTF-8 since version 3.x.

Plus, you should set another global attribute to handle file names on Windows:

if $iup.getGlobal("DRIVER") == "Win32":
  iup.setGlobal("UTF8MODE_FILE", "YES")

@Araq

May I ask whether there is any way to compare two cstrings other than converting them to strings? Obviously the == operator won't work, and neither will cmp.
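
One possible approach to the cstring question, sketched under the assumption that the C backend is in use: import strcmp from the C standard library and wrap it. The helper name sameText is hypothetical and not part of Nim or the IUP wrapper, and the sketch does not guard against nil pointers.

```nim
# strcmp comes from the C standard library; this declaration only binds to it.
proc strcmp(a, b: cstring): cint {.importc, header: "<string.h>".}

proc sameText(a, b: cstring): bool =
  ## Hypothetical helper: true when both C strings hold identical bytes.
  ## Assumes neither argument is nil.
  strcmp(a, b) == 0

# Stand-in for a value such as the result of iup.getGlobal("DRIVER").
let driver: cstring = "Win32"
if sameText(driver, "Win32"):
  echo "running on the Win32 driver"
```

This compares the bytes in place and avoids allocating a Nim string for each check.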

hibernating [2015-09-01T10:50:21+02:00]

@Varriount

You're correct, but it doesn't matter in the OP's situation. Also, CJK charsets do not cover each other, and the Windows console does not support outputting text in different code pages at the same time.

jangko [2015-09-01T18:10:36+02:00]

@liulun

It's a bit strange; I don't have any problem with MySQL + Nim handling UTF-8.

I have stored many Chinese words in my database.

I don't even use "SET NAMES UTF8" for MySQL.

Also, I use "latin1" as my MySQL default charset (unbelievable, but it works).

Possible suspect: double conversion?
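
A minimal sketch of what such a double conversion could look like, using the encodings module already mentioned in this thread. The stored value æµ‹è¯•ä¸æ–‡ from the web example is what the UTF-8 bytes of 测试中文 look like when they are read as Windows-1252 (which is what MySQL calls latin1). Whether this is what actually happened on the OP's connection is an assumption, and the encoding names are passed to the platform converter, so the accepted spelling may differ between systems.

```nim
import encodings

# The literal from the web example; this file itself is assumed to be UTF-8.
let original = "测试中文"

# Misread the UTF-8 bytes as Windows-1252 and re-encode them as UTF-8.
# This reproduces the hypothesized double conversion, not a confirmed diagnosis.
let doubled = convert(original, "UTF-8", "windows-1252")
echo doubled

# Undo the extra step: converting UTF-8 back to Windows-1252 restores
# the original UTF-8 bytes.
let repaired = convert(doubled, "windows-1252", "UTF-8")
doAssert repaired == original
echo repaired
```

If data in a table round-trips like this, the bytes were probably converted once too often on the way in, which points at the connection charset rather than at Nim.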

liulun [2015-09-02T03:47:13+02:00]

I use InnoDB/UTF-8, and SET NAMES UTF8.

It is not a problem specific to Nim.

Thank you all.

Mirror of forum.nim-lang.org
