nimforum mirror - Compiler Reading

jyapayne (orginal) [2016-06-12T04:13:29+02:00] view original

This is a question that Dom and Araq will probably be best to answer, but anyone is welcome to answer if they have some recommendations :)

Let me first say that I regard the Nim compiler to be one of the best compilers that I've seen. It's extremely fast and efficient and seems to be well modularized in terms of compilation stages (from what little I know of compilers).

One of my dreams as a software developer is to one day write a compiler for a language of my own. However, I've googled a lot and come up with many book suggestions by people who may or may not have actually written a compiler for a programming language, and I'm a bit overwhelmed with regards to which one to start reading. I've also tried looking at the source for a few small languages, but without context, it's a bit hard to read.

So my question to you, Araq and Dom, is this: Which books led to the development of Nim, and which books would you recommend studying for a beginner learning how to make an efficient compiler? Also, maybe you could share some major stumbling blocks that you hit that might be useful for a newbie to know about?

mbaulch (orginal) [2016-06-12T14:19:30+02:00] view original

The "Dragon book" provides a wide coverage of classical techniques for compiler and interpreter construction.

You mentioned that source for small languages is "hard to read [without context]". Perhaps you should look at an even smaller implementation. Google for "tiny lisp interpreter". You'll find them in plenty of languages--with implementations as small as 100 lines. Play with them. Write some programs in their language. Modify the interpreter. Add a small feature to the language. Change the syntax. Implement a basic type system.

Bored with interpreters? Design a simple bytecode for one of them. 'Compile' by dumping the AST into a file. Work backwards. Replace the AST components by more primitive instructions. Maybe this is too boring or tedious? Study a 'tiny' compiler that translates from high-level (eg. C) source through every stage to machine code. Of course, dealing with low-level architecture involves a certain level of complexity. If this is too much for now, perhaps a simple compiler translating to C, or a simpler machine-code (or 'virtual' machine-code) like the JVM would be appropriate. Alternatively, Forth is an easy language to interpret and compile. Why not study an implementation, or try building your own?

It's hard to know exactly where to point you. Hopefully this helps a bit. If not, perhaps explaining your goals a bit more would help others give you more targeted advice. Compiler construction projects vary immensely.

jyapayne (orginal) [2016-06-12T17:40:01+02:00] view original

The "Dragon book" provides a wide coverage of classical techniques for compiler and interpreter construction.

I've heard of the dragon book :) Everyone seems to recommend that one, but is it friendly for a beginner to start learning how to build compilers? What I'm interested in is the books that led to the Nim compiler, and also some practical books that I can use to get my feet wet in building a relatively simple compiler first. I realize that building a complete compiler for a complex language takes years and I want to eventually build one, but starting simple is probably the best course of action.

You mentioned that source for small languages is "hard to read [without context]". Perhaps you should look at an even smaller implementation. Google for "tiny lisp interpreter". You'll find them in plenty of languages--with implementations as small as 100 lines. Play with them. Write some programs in their language. Modify the interpreter. Add a small feature to the language. Change the syntax. Implement a basic type system.

Thank you for your advice. I found a couple lisps that are interpreted and should be understandable in under 100 lines :)

I will look at the tiny C compiler as well later on, when I have a bit more of a deeper understanding of things.

Your response did help me. Thank you!

Araq (orginal) [2016-06-12T20:16:50+02:00] view original

Books that I have fond memories of:

Modern Compiler Implementation in Java/ML/C by by Andrew W. Appel

http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

I found "the dragon book" mostly useless and others agree with me it's really quite dated by now. One thing to watch out is that Wirth's way of doing things really does not scale: He doesn't use an AST and so once you get to the point of "how do I implement generics" you'll see it doesn't work without an AST. (Ok, ok, you could also do "token replays" but who does that?)

Random notes:

Don't have "up" pointers in your AST nodes. Makes sharing impossible. The AST should be acyclic for easy traversal.

Have a single concrete Node type, no hierarchies.

Use a single intermediate representation (IR) for everything. Nim uses ASTs.

While ASTs are relatively simple to design, type graphs are not. The representation in the Nim compiler sucks quite a bit but then so does any other representation, IMO. Types are inherently cyclic so for every traversal you need to remember some visited set. But you can try to break up the cycles in your representations.

Ensure the graphs (ASTs, symbols, types) you need to operate on use as few memory as possible. Memory equals speed on modern CPUs (nothing beats if your data can fit L1 cache)

Keep the number of passes to a minimum, every pass over the graph is expensive (flushes the caches).

In the codegen phases, don't produce strings or ropes (this is something Nim got wrong too): The code generation should be a pass from IR to IR and the resulting IR needs to be downright trivial to stringify / write to a file.

Minimize the amount of static data (not re-entrant). This is something the Nim compiler doesn't do and as a result using Nim as a library still sucks quite a bit. Later refactorings will eliminate the problem.

Introduce an ID mechanism for your nodes (can be enabled with -d:useNodeIds) so that you can debug more easily: Pointer suck as their values change from run to run.

Parser and lexer generators are not important: Lexing and parsing are the easiest steps in a compiler and if you cannot even do that manually, there is little hope you can deal with the latter compilation phases.

Don't write stack based interpreters, write register based interpreters: Produce much better code out of the box, dispatching is faster, the mapping from instructions to line information is easier to do when every instruction is of the same size, no need for peephole optimizations.

Don't quit after the first error (IDE tools need to continue!) and don't use exceptions either (Nim does use exceptions to some extend). It's hard to ensure that the tool did as much as it could when exceptions are allowed to bubble up the call stack. Instead have an error node kind that marks wrong subtrees.

Beware of recursions: Every recursion not guarded by some mechanism means the compiler can crash on malicious input. Usually a simple recursion limit counter will do.

jyapayne (orginal) [2016-06-12T20:45:35+02:00] view original

@Araq, this is awesome! It's just what I was looking for. I will buy a copy of Modern Compiler Implementation and see how I go from there. I'll also take a look at Wirth's publication watching out for the scale issues. What is it that you liked from Wirth?

I'm sure your notes here will be of great use to me later on, so thank you very much! It's interesting to learn some of the internal structures of Nim :)

bobg (orginal) [2016-06-12T21:59:39+02:00] view original

Prof Niklaus Wirth was involved in a number of different languages - Pascal, Modula, Oberon and others.

To add to your programming knowledge, you could click around on "C.A.R. Hoare" who did the CSP series of languages while head of the Programming Research Group at Oxford. Also David May of Univ of Bristol who did the language Occam - a very interesting language - including PAR for parallel sections, CHAN for communication channels, PROTOCOL, etc. It uses the same indented syntax as Nim.

jyapayne (orginal) [2016-06-13T01:23:31+02:00] view original

@bobg, sounds interesting! I will definitely check out Hoare and the Occam language. Thank you!

mbaulch (orginal) [2016-06-13T04:48:52+02:00] view original

@jyapayne > I've heard of the dragon book Everyone seems to recommend that one, but is it friendly for a beginner to start learning how to build compilers?

I agree with @Araq that it's outdated in some ways. Many years ago, keeping a complete AST in memory wasn't practical--and the book reflects this. I disagree that it's useless; but as a quick way to pickup the basics, it has a flaw or two. Appel looks to be a good alternative. A lot of overlap with dragon, but more modern. Also, I'd have loved to learn compilers with ML!

bpr (orginal) [2016-06-13T07:15:33+02:00] view original

That's an interesting set of notes and observations about the Nim compiler.

As I recall, Appel's books do use Yacc and Lex for parser construction, whilst Wirth tends to favor hand coded recursive descent. Many real world compilers favor hand coded recursive descent too, including the GNAT Ada compiler, which had some of the best error messages in any compiler I'd used. IMO, it's better to just learn the fairly small set of tricks around hand coded parsers (like operator precedence parsing) that work in all languages and be done with all the hassles of learning generators.

Rust is in the process of having another IR (the MIR) to it's phases, the previous IR just being the AST I think. The rationale is given here.

I hadn't given much thought to using the NIm compiler as a library, but that sounds great. I guess after Nim 1.0, making the compiler reentrant will be a priority?

euant (orginal) [2016-06-13T17:05:01+02:00] view original

Not exactly a book on compiler design, but rather one on the topic of Domain Specific Languages that is a great read is Martin Fowler's Domain Specific Languages. Another good read on the same topic is Terrence parr's Language Implementation Patterns - Create Your Own Domain-Specific and General Programming Languages. Again, not exactly related to compilers, but they do touch on similar topics and Fowler's book especially is extremely well written.

Javi (orginal) [2016-06-13T18:10:27+02:00] view original

@jyapayne The best book I've read about compiler construction was "Engineering a Compiler" by Keith Cooper, really good, very clear and concise, and explaining the latest techniques. It has some points where I think the authors should go a bit deeper, for example the NFA to DFA subset construction and the Hopcroft's and Brzozowski's DFA minimization algorithms, but in general is 9/10.

There are also old books very good as: "Brinch Hansen on Pascal Compilers" by B. Hansen and "A Small C Compiler" by J. Hendrix. And of course the classic tutorials "Let's Build a Compiler" by Crenshaw. I remember I downloaded these tutorials in the '90s from a BBS ; ) http://compilers.iecc.com/crenshaw

Not directly related, but a good companion for this subject is also "Linkers and Loaders" by John Levine.

And do not forget that you have tons of free resources, for example: https://www.youtube.com/watch?v=sm0QQO-WZlM&list=PLFB9EC7B8FE963EB8 http://www.diku.dk/hjemmesider/ansatte/torbenm/Basics/index.html http://web.stanford.edu/class/archive/cs/cs143/cs143.1128

Etc, etc...

jyapayne (orginal) [2016-06-14T16:58:58+02:00] view original

@Javi

The best book I've read about compiler construction was "Engineering a Compiler" by Keith Cooper, really good, very clear and concise, and explaining the latest techniques. It has some points where I think the authors should go a bit deeper, for example the NFA to DFA subset construction and the Hopcroft's and Brzozowski's DFA minimization algorithms, but in general is 9/10.

Your description makes me want to read that one, as well :) Another one to add to my now growing list :P

I'll be sure to check out your other suggestions as well :) Thank you!

jibal (orginal) [2016-06-15T05:15:21+02:00] view original

What I'm interested in is the books that led to the Nim compiler

Keep in mind that reading a book is just one part of experience. Someone like Araq will have read many books, articles, on-line discussions, looked at other compilers, thought a lot, and experimented. Don't expect to just sit down and write something like the Nim compiler, which took years to get to its present state, from scratch.

jyapayne (orginal) [2016-06-15T06:32:58+02:00] view original

Keep in mind that reading a book is just one part of experience. Someone like Araq will have read many books, articles, on-line discussions, looked at other compilers, thought a lot, and experimented. Don't expect to just sit down and write something like the Nim compiler, which took years to get to its present state, from scratch.

I'm aware of this :) I mostly just want to get into compilers, and the best way I thought of to do that is with a book that describes how to create one. I am fully aware that Araq has spent 10+ years creating the Nim compiler and has probably amassed a great wealth of knowledge related to the subject from a large amount of sources. I only wanted a little glimpse of where to start and thought it would be best to ask a compiler pro :)

gokr (orginal) [2016-06-15T09:54:55+02:00] view original

I just wanted to mention the Spry (sprylang.org) language I am implementing in Nim. It may be interesting to look at depending on where you want to go. It is one of those "minimalistic" languages, the parser is very simple since the grammar is extremely simple in Spry. It's not even recursive - iteration with some simple state worked fine. Insanely fast.

Then the interpreter runs on the AST directly, it's not stack nor register based I guess. Unfortunately it's still designed as "recursive" in the sense that a call in Spry executes as a call in Nim - which leads to the Nim call stack being a mirror of the Spry call stack. That's not good - as Andreas pointed out to me - better to use a "stackless" design, which I intend to move to. That probably gives better performance, doesn't build two stacks in memory, and also enables funny stuff like coroutines, stack manipulations like continuations etc.

Spry is only 1600-ish lines, half parser half interpreter. And the language is already fairly advanced with closures, non local returns etc. I am on both IRC and gitter if you wish to know more. :)

And no, Spry is no compiler, but... it may be a good start with an interpreter if you want to learn and have some kind of "testbed for thoughts". Then of course, if you find Spry interesting - just join up! :)

gokr (orginal) [2016-06-15T10:01:49+02:00] view original

And oh, Clement Bera who is working on the Cog JIT Smalltalk VM is a very sharp guy with a good style of writing. He is creating a book for implementing virtual machines:

https://clementbera.wordpress.com/2016/04/10/vm-learning-call-stack-management/

I haven't read it myself, but it may be of interest. There are tons of interesting articles on his blog. The Cog VM is very advanced with competitive performance and is moving towards adaptive optimization (Sista).

jyapayne (orginal) [2016-06-16T04:54:26+02:00] view original

@gokr

I just wanted to mention the Spry (sprylang.org) language I am implementing in Nim. It may be interesting to look at depending on where you want to go. It is one of those "minimalistic" languages, the parser is very simple since the grammar is extremely simple in Spry. It's not even recursive - iteration with some simple state worked fine. Insanely fast.

Looks pretty cool :) I will definitely check it out!

And oh, Clement Bera who is working on the Cog JIT Smalltalk VM is a very sharp guy with a good style of writing.

Probably a good read further down the line :) Thank you!

oyster (orginal) [2017-05-22T08:41:13+02:00] view original

there is a book "Writing An Interpreter In Go" (https://interpreterbook.com/). Someone is learning it by implementing the interpreter in rust

alfrednewman (orginal) [2017-07-21T02:35:54+02:00] view original

Sounds interesting stuff: https://leanpub.com/create_languages

Mirror of forum.nim-lang.org

2309 :: Compiler Reading