This is a question that Dom and Araq will probably be best to answer, but anyone is welcome to answer if they have some recommendations :)
Let me first say that I regard the Nim compiler to be one of the best compilers that I've seen. It's extremely fast and efficient and seems to be well modularized in terms of compilation stages (from what little I know of compilers).
One of my dreams as a software developer is to one day write a compiler for a language of my own. However, I've googled a lot and come up with many book suggestions by people who may or may not have actually written a compiler for a programming language, and I'm a bit overwhelmed with regards to which one to start reading. I've also tried looking at the source for a few small languages, but without context, it's a bit hard to read.
So my question to you, Araq and Dom, is this: Which books led to the development of Nim, and which books would you recommend studying for a beginner learning how to make an efficient compiler? Also, maybe you could share some major stumbling blocks that you hit that might be useful for a newbie to know about?
The "Dragon book" provides a wide coverage of classical techniques for compiler and interpreter construction.
You mentioned that source for small languages is "hard to read [without context]". Perhaps you should look at an even smaller implementation. Google for "tiny lisp interpreter". You'll find them in plenty of languages--with implementations as small as 100 lines. Play with them. Write some programs in their language. Modify the interpreter. Add a small feature to the language. Change the syntax. Implement a basic type system.
Bored with interpreters? Design a simple bytecode for one of them. 'Compile' by dumping the AST into a file. Work backwards. Replace the AST components by more primitive instructions. Maybe this is too boring or tedious? Study a 'tiny' compiler that translates from high-level (eg. C) source through every stage to machine code. Of course, dealing with low-level architecture involves a certain level of complexity. If this is too much for now, perhaps a simple compiler translating to C, or a simpler machine-code (or 'virtual' machine-code) like the JVM would be appropriate. Alternatively, Forth is an easy language to interpret and compile. Why not study an implementation, or try building your own?
It's hard to know exactly where to point you. Hopefully this helps a bit. If not, perhaps explaining your goals a bit more would help others give you more targeted advice. Compiler construction projects vary immensely.
The "Dragon book" provides a wide coverage of classical techniques for compiler and interpreter construction.
I've heard of the dragon book :) Everyone seems to recommend that one, but is it friendly for a beginner to start learning how to build compilers? What I'm interested in is the books that led to the Nim compiler, and also some practical books that I can use to get my feet wet in building a relatively simple compiler first. I realize that building a complete compiler for a complex language takes years and I want to eventually build one, but starting simple is probably the best course of action.
You mentioned that source for small languages is "hard to read [without context]". Perhaps you should look at an even smaller implementation. Google for "tiny lisp interpreter". You'll find them in plenty of languages--with implementations as small as 100 lines. Play with them. Write some programs in their language. Modify the interpreter. Add a small feature to the language. Change the syntax. Implement a basic type system.
Thank you for your advice. I found a couple lisps that are interpreted and should be understandable in under 100 lines :)
I will look at the tiny C compiler as well later on, when I have a bit more of a deeper understanding of things.
Your response did help me. Thank you!
I found "the dragon book" mostly useless and others agree with me it's really quite dated by now. One thing to watch out is that Wirth's way of doing things really does not scale: He doesn't use an AST and so once you get to the point of "how do I implement generics" you'll see it doesn't work without an AST. (Ok, ok, you could also do "token replays" but who does that?)
Random notes:
@Araq, this is awesome! It's just what I was looking for. I will buy a copy of Modern Compiler Implementation and see how I go from there. I'll also take a look at Wirth's publication watching out for the scale issues. What is it that you liked from Wirth?
I'm sure your notes here will be of great use to me later on, so thank you very much! It's interesting to learn some of the internal structures of Nim :)
Prof Niklaus Wirth was involved in a number of different languages - Pascal, Modula, Oberon and others.
To add to your programming knowledge, you could click around on "C.A.R. Hoare" who did the CSP series of languages while head of the Programming Research Group at Oxford. Also David May of Univ of Bristol who did the language Occam - a very interesting language - including PAR for parallel sections, CHAN for communication channels, PROTOCOL, etc. It uses the same indented syntax as Nim.
@jyapayne > I've heard of the dragon book Everyone seems to recommend that one, but is it friendly for a beginner to start learning how to build compilers?
I agree with @Araq that it's outdated in some ways. Many years ago, keeping a complete AST in memory wasn't practical--and the book reflects this. I disagree that it's useless; but as a quick way to pickup the basics, it has a flaw or two. Appel looks to be a good alternative. A lot of overlap with dragon, but more modern. Also, I'd have loved to learn compilers with ML!
That's an interesting set of notes and observations about the Nim compiler.
As I recall, Appel's books do use Yacc and Lex for parser construction, whilst Wirth tends to favor hand coded recursive descent. Many real world compilers favor hand coded recursive descent too, including the GNAT Ada compiler, which had some of the best error messages in any compiler I'd used. IMO, it's better to just learn the fairly small set of tricks around hand coded parsers (like operator precedence parsing) that work in all languages and be done with all the hassles of learning generators.
Rust is in the process of having another IR (the MIR) to it's phases, the previous IR just being the AST I think. The rationale is given here.
I hadn't given much thought to using the NIm compiler as a library, but that sounds great. I guess after Nim 1.0, making the compiler reentrant will be a priority?
@jyapayne The best book I've read about compiler construction was "Engineering a Compiler" by Keith Cooper, really good, very clear and concise, and explaining the latest techniques. It has some points where I think the authors should go a bit deeper, for example the NFA to DFA subset construction and the Hopcroft's and Brzozowski's DFA minimization algorithms, but in general is 9/10.
There are also old books very good as: "Brinch Hansen on Pascal Compilers" by B. Hansen and "A Small C Compiler" by J. Hendrix. And of course the classic tutorials "Let's Build a Compiler" by Crenshaw. I remember I downloaded these tutorials in the '90s from a BBS ; ) http://compilers.iecc.com/crenshaw
Not directly related, but a good companion for this subject is also "Linkers and Loaders" by John Levine.
And do not forget that you have tons of free resources, for example: https://www.youtube.com/watch?v=sm0QQO-WZlM&list=PLFB9EC7B8FE963EB8 http://www.diku.dk/hjemmesider/ansatte/torbenm/Basics/index.html http://web.stanford.edu/class/archive/cs/cs143/cs143.1128
Etc, etc...
@Javi
The best book I've read about compiler construction was "Engineering a Compiler" by Keith Cooper, really good, very clear and concise, and explaining the latest techniques. It has some points where I think the authors should go a bit deeper, for example the NFA to DFA subset construction and the Hopcroft's and Brzozowski's DFA minimization algorithms, but in general is 9/10.
Your description makes me want to read that one, as well :) Another one to add to my now growing list :P
I'll be sure to check out your other suggestions as well :) Thank you!
What I'm interested in is the books that led to the Nim compiler
Keep in mind that reading a book is just one part of experience. Someone like Araq will have read many books, articles, on-line discussions, looked at other compilers, thought a lot, and experimented. Don't expect to just sit down and write something like the Nim compiler, which took years to get to its present state, from scratch.
Keep in mind that reading a book is just one part of experience. Someone like Araq will have read many books, articles, on-line discussions, looked at other compilers, thought a lot, and experimented. Don't expect to just sit down and write something like the Nim compiler, which took years to get to its present state, from scratch.
I'm aware of this :) I mostly just want to get into compilers, and the best way I thought of to do that is with a book that describes how to create one. I am fully aware that Araq has spent 10+ years creating the Nim compiler and has probably amassed a great wealth of knowledge related to the subject from a large amount of sources. I only wanted a little glimpse of where to start and thought it would be best to ask a compiler pro :)
I just wanted to mention the Spry (sprylang.org) language I am implementing in Nim. It may be interesting to look at depending on where you want to go. It is one of those "minimalistic" languages, the parser is very simple since the grammar is extremely simple in Spry. It's not even recursive - iteration with some simple state worked fine. Insanely fast.
Then the interpreter runs on the AST directly, it's not stack nor register based I guess. Unfortunately it's still designed as "recursive" in the sense that a call in Spry executes as a call in Nim - which leads to the Nim call stack being a mirror of the Spry call stack. That's not good - as Andreas pointed out to me - better to use a "stackless" design, which I intend to move to. That probably gives better performance, doesn't build two stacks in memory, and also enables funny stuff like coroutines, stack manipulations like continuations etc.
Spry is only 1600-ish lines, half parser half interpreter. And the language is already fairly advanced with closures, non local returns etc. I am on both IRC and gitter if you wish to know more. :)
And no, Spry is no compiler, but... it may be a good start with an interpreter if you want to learn and have some kind of "testbed for thoughts". Then of course, if you find Spry interesting - just join up! :)
And oh, Clement Bera who is working on the Cog JIT Smalltalk VM is a very sharp guy with a good style of writing. He is creating a book for implementing virtual machines:
https://clementbera.wordpress.com/2016/04/10/vm-learning-call-stack-management/
I haven't read it myself, but it may be of interest. There are tons of interesting articles on his blog. The Cog VM is very advanced with competitive performance and is moving towards adaptive optimization (Sista).
@gokr
I just wanted to mention the Spry (sprylang.org) language I am implementing in Nim. It may be interesting to look at depending on where you want to go. It is one of those "minimalistic" languages, the parser is very simple since the grammar is extremely simple in Spry. It's not even recursive - iteration with some simple state worked fine. Insanely fast.
Looks pretty cool :) I will definitely check it out!
And oh, Clement Bera who is working on the Cog JIT Smalltalk VM is a very sharp guy with a good style of writing.
Probably a good read further down the line :) Thank you!