I know that libraries such as the following zlib.nim can compress a string:
https://github.com/nim-lang/zip/blob/master/zip/zlib.nim
But I can't write the result into my source code, because the compressed string is neither ASCII nor valid Unicode; it probably contains some illegal characters. How should I do this? Should I restrict the characters used in the compressed string, or should I escape some characters? Both would make the source code bigger, and I want to avoid that...
const s = r"hogehoge" # placeholder for the compressed (non-ASCII) code
load(s) # load is a macro I can write myself
If it's not valid Unicode, your best bet is to store it in a separate file and use staticRead to load it into a string at compile time.
const s = staticRead("myfile.bin")
load(s)
If I understand correctly, you would like to include a non-printable byte, like 0x15 (ASCII NAK), in a string literal in source code. If so, you can do it using "\x" escapes like this:
const s = "hello\x15\x2f"
This does not increase the (binary) code size.
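A small sketch to check this: each \xHH escape becomes exactly one byte in the stored string, so the compiled program sees the raw byte. Only the source text pays for it, at four characters per escaped byte.

```nim
# Each \xHH escape denotes a single byte in the resulting string.
const s = "hello\x15\x2f"

doAssert s.len == 7          # 5 letters + 2 escaped bytes
doAssert s[5] == chr(0x15)   # ASCII NAK, a non-printable byte
doAssert s[6] == '/'         # 0x2f is the slash character
```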
You can combine exelotl's method with compression so that the embedded asset takes even less space. A good example is:
https://github.com/guzba/supersnappy/blob/master/examples/compiletime.nim
Thanks for these replies! I am looking into exelotl's method.
What I want to do is gather all the compressed data (source code) into one .nim file. I am taking part in AtCoder (a programming contest) using Nim. In this contest, a submission must be a single file; multiple files cannot be submitted. The contest server then compiles the submitted code and runs the tests. Since there is a limit on the size of the submitted source code (not on the size of the binary), I want to compress the code.
I want to embed the contents of the zip file in some way (not necessarily as a string). I didn't know how to escape a non-printable byte in a string, but now that's solved. However, it may make the code larger...
I am also looking for a way to write the data other than as a raw string.
If your data is mostly ASCII, then the 4x cost of "\x1b" escapes (four source characters per byte) might be fine. But compressing and then base64-encoding the result (four characters per three bytes) is probably ideal, and base64 is in the stdlib.
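To make the size trade-off concrete, here is a minimal sketch comparing the two ways of writing the same binary data into source. The helper hexEscape is hypothetical, written only for this comparison; encode and decode come from the standard std/base64 module. Escaping costs four characters per byte, while base64 costs four characters per three bytes (rounded up, with padding), so for compressed data that is mostly non-ASCII, base64 is roughly three times smaller in source.

```nim
import std/[base64, strutils]

# Hypothetical helper: write every byte as a \xHH escape, the way it
# would have to appear inside a Nim string literal.
proc hexEscape(data: string): string =
  for c in data:
    result.add "\\x" & toHex(ord(c), 2)

let raw = "\x00\x1f\x8b\xff" & "binary"  # 10 bytes, several non-printable
let escaped = hexEscape(raw)             # 4 source chars per byte
let b64 = encode(raw)                    # 4 chars per 3 bytes, padded

echo "raw: ", raw.len, ", escaped: ", escaped.len, ", base64: ", b64.len
# prints: raw: 10, escaped: 40, base64: 16

# base64 output only uses characters that are safe in an ordinary string
# literal, and decoding gives back the original bytes.
doAssert b64.allCharsInSet({'A'..'Z', 'a'..'z', '0'..'9', '+', '/', '='})
doAssert decode(b64) == raw
```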
Here is my code. It compiles and works at run time. I want to enable the commented-out part, which does the same operation at compile time, but I couldn't get it to work even with the {.compiletime.} pragma...
const libz = "libz.so.1"
type
  Uint* = cuint
  Ulong* = culong
  Ulongf* = culong
  Pulongf* = ptr Ulongf
  Pbyte* = cstring
  Pbytef* = cstring
  Allocfunc* = proc(p: pointer, items: Uint, size: Uint): pointer{.cdecl.}
  FreeFunc* = proc(p, address: pointer){.cdecl.}
  InternalState*{.final, pure.} = object
  ZStream*{.final, pure.} = object
    nextIn*: Pbytef
    availIn*: Uint
    totalIn*: Ulong
    nextOut*: Pbytef
    availOut*: Uint
    totalOut*: Ulong
    msg*: Pbytef
    state*: ptr InternalState
    zalloc*: Allocfunc
    zfree*: FreeFunc
    opaque*: pointer
    dataType*: cint
    adler*: Ulong
    reserved*: Ulong
const
  ZLIB_VERSION = "1.2.11"
  Z_NO_FLUSH = 0
  Z_OK = 0
  Z_STREAM_END = 1
  Z_BUF_ERROR = -5
  Z_NO_COMPRESSION* = 0
  MAX_WBITS = 15
proc inflate*(strm: var ZStream, flush: cint): cint{.cdecl, dynlib: libz, importc: "inflate".}
proc inflateEnd*(strm: var ZStream): cint{.cdecl, dynlib: libz, importc: "inflateEnd".}
proc inflateInit2u*(strm: var ZStream, windowBits: cint, version: cstring, streamSize: cint): cint{.cdecl, dynlib: libz, importc: "inflateInit2_".}
proc inflateInit2(strm: var ZStream, windowBits: cint): cint = inflateInit2u(strm, windowBits, ZLIB_VERSION, sizeof(ZStream).cint)
proc uncompress*(sourceBuf: cstring, sourceLen: Natural): string =
  assert (not sourceBuf.isNil) and sourceLen >= 0
  var z: ZStream
  var d = ""
  var sbytes, wbytes = 0
  z.availIn = 0
  var wbits = MAX_WBITS + 32
  var status = inflateInit2(z, wbits.cint)
  if status != Z_OK: assert false
  while true:
    z.availIn = (sourceLen - sbytes).Uint
    if sourceLen-sbytes<=0: break
    z.nextIn = sourceBuf[sbytes].unsafeaddr
    while true:
      if wbytes >= d.len:
        let n = if d.len == 0: sourceLen*2 else: d.len*2
        if n < d.len:
          discard inflateEnd(z)
          assert false
        d.setLen(n)
      let space = d.len - wbytes
      z.availOut = space.Uint
      z.nextOut = d[wbytes].addr
      status = inflate(z, Z_NO_FLUSH)
      if status.int8 notin {Z_OK.int8, Z_STREAM_END.int8, Z_BUF_ERROR.int8}:
        discard inflateEnd(z)
        assert false
      wbytes += space - z.availOut.int
      # output buffer not full: all available input has been consumed
      if z.availOut != 0: break
    if status == Z_STREAM_END: break
  discard inflateEnd(z)
  if status != Z_STREAM_END: assert false
  d.setLen(wbytes)
  swap result, d
proc uncompress*(sourceBuf: string):string = uncompress(sourceBuf, sourceBuf.len)
import base64
const s = "eJxLTc7IV1DySM3JyVcIzy/KSVECADp4BiA=" # zlib-compressed data, base64-encoded
# Likely fails: the compile-time VM can't call importc/dynlib procs such as inflate
#static:
#  var sd = s.decode
#  echo uncompress(sd)
var sd = s.decode
echo uncompress(sd)
If you need it to be in one file, maybe you could try something like this?
#[
testing, put your dirty unprintable string here
]#
const s = staticRead("main.nim")[3..49]
echo s
Output:
testing, put your dirty unprintable string here
The compressed code is embedded in the block comment. The program reads its own source with staticRead and takes a slice to get only the contents of the comment. You are then free to decompress it and use it to generate the real code at compile time.
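The hard-coded slice [3..49] has to be recomputed whenever the comment changes. A slightly more robust sketch searches for the comment delimiters instead, and uses currentSourcePath() so the file name isn't hard-coded either. It assumes LF line endings and a single #[ ... ]# block at the top of the file; extractBetween is a hypothetical helper. As reported further down this thread, arbitrary compressed bytes may still be rejected inside a comment, so this is safest for text payloads such as base64.

```nim
#[
testing, put your dirty unprintable string here
]#
import std/strutils

# Hypothetical helper: return the text between the first occurrence of
# `startMark` and the next occurrence of `endMark` after it.
proc extractBetween(src, startMark, endMark: string): string =
  let a = src.find(startMark)
  doAssert a >= 0, "start marker not found"
  let first = a + startMark.len
  let b = src.find(endMark, first)
  doAssert b >= 0, "end marker not found"
  result = src[first ..< b]

# Read this very file at compile time. In the source text, the marker
# literals below contain a backslash followed by 'n' rather than a real
# newline, so they cannot match themselves; only the comment at the top does.
const s = extractBetween(staticRead(currentSourcePath()), "#[\n", "\n]#")
echo s
```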
Wow! This method is fantastic!
Thank you very much!
Well, I thought it was a neat concept, but I actually don't think it's necessary. It seems like you can just put invalid Unicode straight into a string literal.
const s = """
�����������
"""
echo s
This works for me. I filled the string with FFFFF... in a hex editor, which is invalid UTF-8. The compiler doesn't care and compiles it successfully anyway.
I tried your code. The string filled with FFFFF... did compile, but unfortunately a general string output by zip failed. Your method of putting it in a #[ ]# comment also failed...
Moreover, I noticed that code containing these invalid characters can't be pasted into this thread.
The compiler doesn't like something, maybe it's the null bytes?
I had to
echo "echo \"Hello, World\"" | gzip | base64
#H4sIAAAAAAACA0tNzshXUPJIzcnJ11EIzy/KSVHiAgBv5X7/FAAAAA==
import macros
macro exec(s:static string):untyped = s.parseStmt
exec:
  staticExec "cut -c 2- in.nim | base64 -d | zcat"
Would cheating by using external utilities (zcat, cut, base64) work?
Thank you! I tried it in AtCoder's code test environment and it works! Writing the decompression code in Nim makes the submitted source code a bit larger, so your method of calling external utilities is better!
By the way, your shell command in staticExec outputs the following messages in addition to the Nim code. I am studying the commands to fix this.
base64: invalid input
echo "Hello, World"
gzip: stdin: decompression OK, trailing garbage ignored
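A likely cause of these messages: cut -c 2- in.nim strips the first character of every line of in.nim, so base64 -d also receives the Nim code after the payload line, reports invalid input, and gzip then ignores the extra decoded bytes as trailing garbage. Restricting the pipeline to the first line (for example head -n 1 in.nim | cut -c 2- | base64 -d | zcat) should avoid that. Another option, sketched below, keeps the payload in an ordinary string constant and feeds it to the command through staticExec's input parameter, so the source file doesn't need to be cut apart at all. It assumes GNU base64 and gzip are available on the machine running the compiler (gzip -dc does the same as zcat here).

```nim
import std/macros

# gzip-compressed, base64-encoded Nim code (the payload from this thread;
# it decodes to: echo "Hello, World").
const payload = "H4sIAAAAAAACA0tNzshXUPJIzcnJ11EIzy/KSVHiAgBv5X7/FAAAAA=="

# Decompress at compile time with external tools. staticExec returns
# stdout and stderr together, so any warning would end up in `code`.
const code = staticExec("base64 -d | gzip -dc", input = payload & "\n")

macro exec(s: static string): untyped =
  # Parse the decompressed text as Nim statements and splice them in here.
  parseStmt(s)

exec code  # expands to: echo "Hello, World"
```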