nimforum mirror - New nim llvm generator

arnetheduck [2016-01-19T15:43:24+01:00]

For llvm aficionados, I've started playing with an llvm IR generator for Nim here: https://github.com/arnetheduck/nlvm - join in if you like!

Angluca [2016-01-20T07:50:22+01:00]

Looks so cool.

Araq [2016-01-20T10:07:35+01:00]

I forked it, awesome work! In the longer run, I don't think it's too much work to keep up with upstream as the AST interface between the backends and the frontend rarely changes (for better or worse) as we're heading to v1.0.

arnetheduck [2016-01-20T13:42:10+01:00]

Thanks!

It was changing quite a lot while I did the bulk of it, and reusing semi-private stuff from the C generator probably didn't help either - though tbh I hope it keeps changing: every cleanup makes it easier for the next fellow who has to dive in there.

Mainly, I just don't want to spend time on checking which changes are benign and which are not for now. I guess having a test suite would help in that area.

> rarely changes

I guess that diminishes the chances of \n ever behaving like a normal \n, cross-platform? ;)

brianrogoff [2016-01-20T15:29:14+01:00]

Wow, this is just great! I'm looking forward to having this merged in. The new year is starting off well for Nim.

Varriount [2016-01-20T15:51:03+01:00]

@arnetheduck Currently, \n is symbolic for the system's native newline representation. If you want a carriage return or line feed, use \r or \l, respectively.

Regarding nlvm, does it work on Windows?

Orion [2016-01-20T16:59:39+01:00]

This is very cool! keep up the good work!

arnetheduck [2016-01-22T10:07:15+01:00]

@Varriount, yep, that's the issue, right there - an annoying little detail that you have to remember whenever switching (or converting) to Nim from any other language out there, where \n has a pretty much standardized meaning.

As to running on Windows, I suspect not, as-is. There are probably a number of things to fix, like int being of a different size.

Varriount [2016-01-22T13:19:27+01:00]

@arnetheduck Well I just compiled LLVM + LibLLVM on my Windows computer, so I should be able to help with that.

Incidentally, if anyone else needs to compile LibLLVM on their computer: the CMake file dictating how to generate export symbols needs to be modified if you're running MinGW/MSYS. The Cygwin check must be removed, and the path-building logic modified so that nm is given a valid Windows path (not a POSIX one).

Araq [2016-01-22T13:59:44+01:00]

> @Varriount, yep, that's the issue, right there - an annoying little detail that you have to remember whenever switching (or converting) to Nim from any other language out there, where \n has a pretty much standardized meaning.

As long as I don't receive real bug reports about this "issue" I won't change anything. All I see right now is pure ignorance about how Nim's "IO subsystem" works (hint: for the runtime there are only binary files). The codegen in Nim uses $n rather than \n for completely different reasons. (Real reason: when producing C code for Visual Studio and Borland, only CR-LF used to work, and you can produce the C code for these compilers on a Linux machine, so detecting the current OS to produce "proper newlines" is wrong.)

Araq [2016-01-26T22:34:13+01:00]

The major benefit is better debugging support since we're not restricted to non-clashing C names.

kashyap [2016-01-27T03:49:59+01:00]

Thanks ... I get the optimization bit - but then again, I'd imagine that this is the kind of optimization one could not get even in C, and one would need to do some assembly (inline, or perhaps a whole function or a set of functions). It may not be wrong to say that a situation where a substantial piece of code needs that level of control is rare.

If I understand right, the non-clashing C names restriction you are referring to comes from the fact that the object file contains "generated symbols" that are not human-readable? In theory, I believe it is not a blocker - I mean, it is possible to write a tool to update the symbol/source info in the object files, correct? :)

It is just that I've never had a pleasant experience with compiling g++ or clang for that matter (especially on non-linux platforms) - so anything that brings in a C++ dependency makes me question - is this absolutely necessary :)

arnetheduck [2016-01-27T13:45:57+01:00]

@kashyap: There are several advantages. In its current form (when using clang; it's similar with other C compilers, which tend to have their own IR), the chain goes like this: nim -> clang -> llvm -> machine code. By generating LLVM IR directly, one step is cut out.

Since you can compile C code to LLVM IR, in theory nlvm can generate exactly the same IR as you would get by generating C code and compiling it with clang. Now, some constructs in Nim have a more succinct ("better") representation in LLVM IR, because when generating C code you're bound by the rules of the C language in addition to, indirectly, those of LLVM IR. It follows that the code generated this way will be at least as good, and typically better. True, you could replace hot spots with inline asm, but with nlvm you, the end user, get those improvements "for free" without having to resort to pesky tricks.

The improvements range from better optimization, better debug information, smaller executables and DWARF ("zero-cost") exception handling, to the compilation itself being faster (since there is one translation step less).

There are of course disadvantages as well. Chiefly, it's much less portable, and it's harder to take advantage of tricks like reusing C header files (in fact, C interop in general is more difficult to get right). Also, more people know C, so when exploring Nim, looking at the generated C code is typically an easier way to get an idea of what's going on "under the hood".

cdunn2001 [2016-01-27T18:40:09+01:00]

What about incremental compile-times? Is the direct LLVM path faster or slower than generating C code first? Are there long term plans to make it all faster?

Stefan_Salewski [2016-01-27T18:50:17+01:00]

> Are there long term plans to make it all faster?

What is the problem? You can write the code during the day and compile overnight :-)

Or do you want to compile after each typed character? Then a scripting language may be the better choice.

Sorry, but in another thread someone just complained about a "long" 20-second compile time for C++.

For me that really is not bad, and Nim is much better. When learning a language, we may compile more often of course, but then we have a few lines of code only.

kashyap [2016-01-27T19:26:24+01:00]

Thanks for the explanation @arnetheduck

Although I am a little too biased against C++ to appreciate any advantage involving it :) - just kidding.

Regarding portability: once I have nlvm on my desired host platform, I can always generate executables for all the targets that LLVM supports. Perhaps it's not such a big limitation after all.

arnetheduck [2016-04-04T16:37:54+02:00]

There - nlvm can now compile itself (provided that a few upstream patches are applied - thanks Araq for merging the ones I've posted so far) ;)

Test results: "total": 1118, "passed": 911, "skipped": 30

Failures generally fall into these categories:

the tester tries to run nim js, which the nlvm driver doesn't support (Emscripten sounds like a better fit for nlvm anyway) - I couldn't find a trivial way to keep those tests from being run

standard library stuff that relies on C headers which nlvm cannot parse or whose Nim definitions don't match the C abi

no GC support (funnily enough, most GC tests succeed without a GC, but I don't have Boehm installed, so those fail when linking)

statements used as expressions - nlvm makes a distinction between expressions and statements (for convenience), but the C generator mostly does not; for example, an nkTryStmt is sometimes expected to return a value - these cases are rare, though

type bugs - tyRef, tyVar, tyPointer, tyPtr oh my! with tyGeneric and tyRange on top...

compiling as library - not supported

try it out and let me know if it works ;)

dom96 [2016-04-04T20:30:52+02:00]

Wow. Impressive progress!

I'm trying to build it now. Some observations (will update as they come):

The command nice make -j$(nproc) was too much for my MacBook.

After make compare: could not load: libLLVM-3.7.so. It might be worth mentioning in the readme that the .so needs to be installed. Also, judging by the message, I guess there is no OS X support as of yet :)

arnetheduck [2016-04-05T02:25:47+02:00]

For that particular issue, I think if you just add the directory containing libLLVM to LD_LIBRARY_PATH, you'll get one step further. On Linux that happens automagically through an RPATH flag; a similar feature exists on OS X, but the linker flag to enable it is different, afair...

OS X support is probably not so far away: it has a mostly similar ABI, so it might come down to a few flags and the occasional adjustment for OS struct layouts.

Mirror of forum.nim-lang.org

1955 :: New nim llvm generator