nimforum mirror - string of compressed source code

chaemon [2022-08-01T17:40:51+02:00]

I want to do the following:

1. Compress some code A and write it into another source file B as a string or something similar.

2. Decompress the string and load it with a macro.

I know that a library such as the following zlib.nim can compress a string.

https://github.com/nim-lang/zip/blob/master/zip/zlib.nim

But I can't write the result into the source, because the compressed string is neither ASCII nor valid Unicode and may contain illegal characters. So I wonder how to do it. Should I restrict the characters used in the compressed string? Or should I escape some characters? Either approach makes the source code bigger, which I want to avoid...

const s = r"hogehoge" # this is compressed string of non-ascii code
load(s) # I can write it by macro

exelotl [2022-08-01T17:47:42+02:00]

if it's not valid unicode, your best bet is to store it in a separate file and use staticRead to load it into a string at compile-time.

const s = staticRead("myfile.bin")
load(s)

auxym [2022-08-01T17:54:24+02:00]

If I understand correctly, you would like to include a non-printable byte, like 0x15 (ASCII NAK), in a string literal in source code. If so, you can do it using "\x" escapes like this:

const s = "hello\x15\x2f"

This does not increase the (binary) code size.
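To make the point concrete, here is a small sketch (not from the original post) that checks what the literal above contains. Each "\x" escape takes four characters in the source but becomes a single byte in the string.

```nim
const s = "hello\x15\x2f"

# Five letters plus two escaped bytes: seven bytes in total.
doAssert s.len == 7
# \x15 is the NAK control byte, \x2f is the ASCII code of '/'.
doAssert s[5] == '\x15'
doAssert s[6] == '/'

echo s.len
```

So the cost of the escape is paid only in the source file, which is the size that matters for a source-size limit.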

EyeCon [2022-08-01T19:22:06+02:00]

You can combine exelotl's method with compression so that the embedded asset takes even less space. A good example is:

https://github.com/guzba/supersnappy/blob/master/examples/compiletime.nim

chaemon [2022-08-03T17:35:07+02:00]

Thanks for these replies! I am looking into exelotl's method.

What I want to do is gather all of the compressed data (source code) into a single .nim file. I take part in AtCoder (a programming contest) using Nim. In this contest, a submission must be a single file; multiple files cannot be submitted. The contest server then compiles the submitted code and runs the tests. Since there is a limit on the size of the submitted source code (not the size of the binary), I want to compress the code.

I want to write the contents of the zip file in some way (not necessarily as a string). I didn't know how to escape non-printable bytes in a string, but that is solved now. However, escaping may make the code size larger...

I am also looking for another way to write them, other than a raw string.

jrfondren [2022-08-03T20:42:21+02:00]

Your options are:

1. representing binary data directly with escape codes: "\x1b". Four bytes for eight bits of data, 32/8=4x

2. since it's all binary data, drop the \x, "1b". Two bytes for eight bits of data, 16/8=2x

3. use base64. One byte for six bits of data, 8/6=1.3x

If your data is somehow mostly ASCII, then the 4x cost of "\x1b" might be fine. But compression into base64 is probably ideal, and base64's in the stdlib.
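The three ratios can be checked with a short sketch. The six sample bytes below are a made-up stand-in for compressed output (an assumption, not data from this thread). Six bytes were chosen so that the base64 form needs no padding.

```nim
import std/[base64, strutils]

# Hypothetical sample: six raw bytes standing in for compressed data.
const raw = "\x1b\x00\xff\x80\x15\x2f"

# Option 1: every byte written as a \xHH escape (4 source characters per byte).
var escaped = ""
for c in raw:
  escaped.add "\\x" & toHex(ord(c), 2)

# Option 2: bare hex digits (2 source characters per byte).
var hex = ""
for c in raw:
  hex.add toHex(ord(c), 2)

# Option 3: base64 from the standard library (4 characters per 3 bytes).
let b64 = encode(raw)

echo "raw bytes: ", raw.len
echo "escaped:   ", escaped.len
echo "hex:       ", hex.len
echo "base64:    ", b64.len
```

For six input bytes this gives 24, 12 and 8 source characters, matching the 4x, 2x and roughly 1.3x figures above.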

chaemon [2022-08-06T10:14:26+02:00]

Thanks! I implemented option 3. But I noticed that importc apparently cannot be used at compile time; I get the error "cannot 'importc' variable at compile time; inflateEnd"

jrfondren [2022-08-06T10:26:11+02:00]

Indeed, even when Nim compiles through C, Nim's compiletime happens before C's compiletime, so you have to use let instead of const, etc., for imported values.
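A minimal sketch of the difference, using C's strlen as a stand-in for the zlib functions (that substitution is an assumption made to keep the example free of external libraries). It assumes the default C backend.

```nim
# Illustration only: strlen from <string.h> stands in for inflate and friends.
proc c_strlen(s: cstring): csize_t {.importc: "strlen", header: "<string.h>".}

# const n = c_strlen("hello")
# The line above is rejected: the compile-time VM cannot call an importc proc.

# With let the call happens at run time, after the C compiler has done its work.
let n = c_strlen("hello")

echo n
```

The same reasoning applies to a static: block, which is also evaluated by the compile-time VM.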

chaemon [2022-08-06T11:50:16+02:00]

Here is my code. It compiles. I want to compile it with the commented-out part enabled, which performs the same operation at compile time. But I couldn't get that to work, even with the {.compiletime.} pragma...

const libz = "libz.so.1"
type
  Uint* = cuint
  Ulong* = culong
  Ulongf* = culong
  Pulongf* = ptr Ulongf
  Pbyte* = cstring
  Pbytef* = cstring
  Allocfunc* = proc(p: pointer, items: Uint, size: Uint): pointer{.cdecl.}
  FreeFunc* = proc(p, address: pointer){.cdecl.}
  InternalState*{.final, pure.} = object
  ZStream*{.final, pure.} = object
    nextIn*: Pbytef
    availIn*: Uint
    totalIn*: Ulong
    nextOut*: Pbytef
    availOut*: Uint
    totalOut*: Ulong
    msg*: Pbytef
    state*: ptr InternalState
    zalloc*: Allocfunc
    zfree*: FreeFunc
    opaque*: pointer
    dataType*: cint
    adler*: Ulong
    reserved*: Ulong
const
  ZLIB_VERSION = "1.2.11"
  Z_NO_FLUSH = 0
  Z_OK = 0
  Z_STREAM_END = 1
  Z_BUF_ERROR = -5
  Z_NO_COMPRESSION* = 0
  MAX_WBITS = 15
proc inflate*(strm: var ZStream, flush: cint): cint{.cdecl, dynlib: libz, importc: "inflate".}
proc inflateEnd*(strm: var ZStream): cint{.cdecl, dynlib: libz, importc: "inflateEnd".}
proc inflateInit2u*(strm: var ZStream, windowBits: cint, version: cstring, streamSize: cint): cint{.cdecl, dynlib: libz, importc: "inflateInit2_".}
proc inflateInit2(strm: var ZStream, windowBits: cint): cint = inflateInit2u(strm, windowBits, ZLIB_VERSION, sizeof(ZStream).cint)
proc uncompress*(sourceBuf: cstring, sourceLen: Natural): string =
  assert (not sourceBuf.isNil) and sourceLen >= 0
  var z: ZStream
  var d = ""
  var sbytes, wbytes = 0
  z.availIn = 0
  var wbits = MAX_WBITS + 32
  var status = inflateInit2(z, wbits.cint)
  if status != Z_OK: assert false
  while true:
    z.availIn = (sourceLen - sbytes).Uint
    if sourceLen-sbytes<=0: break
    z.nextIn = sourceBuf[sbytes].unsafeaddr
    while true:
      if wbytes >= d.len:
        let n = if d.len == 0: sourceLen*2 else: d.len*2
        if n < d.len: discard inflateEnd(z); assert false
        d.setLen(n)
      let space = d.len - wbytes
      z.availOut = space.Uint;z.nextOut = d[wbytes].addr;status = inflate(z, Z_NO_FLUSH)
      if status.int8 notin {Z_OK.int8, Z_STREAM_END.int8, Z_BUF_ERROR.int8}:discard inflateEnd(z);assert false
      wbytes += space - z.availOut.int
      if not (z.availOut == 0):break
    if (status == Z_STREAM_END):break
  discard inflateEnd(z)
  if status != Z_STREAM_END:assert false
  d.setLen(wbytes)
  swap result, d
proc uncompress*(sourceBuf: string):string = uncompress(sourceBuf, sourceBuf.len)

import base64

const s = "eJxLTc7IV1DySM3JyVcIzy/KSVECADp4BiA="
#static:
#  var sd = s.decode
#  echo uncompress(sd)

var sd = s.decode
echo uncompress(sd)

exelotl [2022-08-06T13:20:05+02:00]

If you need it to be in 1 file, maybe you could try something like this?

#[
testing, put your dirty unprintable string here
]#

const s = staticRead("main.nim")[3..49]

echo s

Output:


testing, put your dirty unprintable string here

The compressed code is embedded into the block comment. The program staticRead's itself and takes a slice to get only the stuff inside the comment. You are then free to decompress it and use it to generate the real code at compile time.
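A variant of the same idea, as a sketch only: instead of a fixed slice, the payload is located by searching for the comment delimiters, and currentSourcePath() replaces the hard-coded file name. Both changes are assumptions added for illustration and are not part of the snippet above. The payload must not itself contain the sequence ]#, since that would end the comment early.

```nim
#[
testing, put your payload here
]#
import std/strutils

# Read this very file at compile time, whatever it is called.
const src = staticRead(currentSourcePath())

# Find the text between the opening and closing comment delimiters.
const first = src.find("#[\n") + 3
const last = src.find("\n]#", first) - 1
const s = src[first .. last]

echo s
```

This way the slice does not have to be recomputed by hand every time the payload changes length.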

chaemon [2022-08-06T13:30:06+02:00]

Wow! This method is fantastic!

Thank you very much!

exelotl [2022-08-06T14:52:28+02:00]

well, I thought it was a neat concept but I actually don't think it's necessary. It seems like you can just straight up put invalid unicode in a string literal.

const s = """
�����������
"""

echo s

works for me. I filled out the string with FFFFF... in a hex editor, which I believe is invalid. The compiler doesn't care, and successfully compiles it anyways.

chaemon [2022-08-06T17:44:00+02:00]

I tried your code. The string of FFFFF... did indeed work. Unfortunately, the general case of a string output by zip failed. Your method of commenting it out with #[ ]# also failed...

Moreover, I noticed that code containing these invalid characters cannot be pasted into this thread.

shirleyquirk [2022-08-06T23:50:27+02:00]

The compiler doesn't like something, maybe it's the null bytes?

I had to


echo "echo \"Hello, World\"" | gzip | base64

#H4sIAAAAAAACA0tNzshXUPJIzcnJ11EIzy/KSVHiAgBv5X7/FAAAAA==
import macros
macro exec(s:static string):untyped = s.parseStmt
exec:
  staticExec "cut -c 2- in.nim | base64 -d | zcat"

Would cheating by using external utilities (zcat, cut, base64) work?
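For comparison, here is a sketch of only the decode-and-parse half done inside Nim, with no external utilities and no compression. The payload is the plain base64 encoding of the one-line program echo "Hello, World"; it is an illustrative stand-in, not the gzip payload shown above.

```nim
import std/[base64, macros]

# base64 of the uncompressed one-line program: echo "Hello, World"
const payload = "ZWNobyAiSGVsbG8sIFdvcmxkIg=="

# Parse a compile-time string as Nim statements and splice them into the program.
macro exec(s: static string): untyped =
  parseStmt(s)

# The argument is a static string, so decode is evaluated at compile time.
exec(decode(payload))
```

When run, the generated statement should print Hello, World. Decompression is the step that is missing here; as discussed earlier in the thread, an importc-based zlib binding cannot be called at compile time.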

chaemon [2022-08-07T09:05:43+02:00]

Thank you! I tried it in AtCoder's code test environment and found that it works! I felt that writing the decompression code in Nim makes the submitted source code a bit larger, so your method of calling external utilities is better!

By the way, your shell command in staticExec outputs the following messages in addition to the Nim code. I am studying the commands to fix this.


base64: invalid input
echo "Hello, World"

gzip: stdin: decompression OK, trailing garbage ignored

Mirror of forum.nim-lang.org
