Recently I became intensely interested in how programming languages are implemented. While most people will never have the need to write a production ready programming language I'm finding writing a language a really fun exercise and that its given me a basis for understanding unexpected behavior or limitations of the existing languages I use every day.
Besides the basics these are the two more ambitious / experimental features we're going to try to implement:
- Cloud native: With the advent of 'infrastructure as code' it is now possible to declaratively express the necessary infrastructure required to deploy distributed systems. By involving a compiler in this process it can potentially be made more accessible to software devs unfamiliar with IaC and remove some of the manual effort required to provision cloud resources. Basically, given a program and some config I'd like to be able to 'compile' its infrastructure. Imagine if you could deploy a aws lambda function automatically by adding a @lambda annotation to it for example?
- Built in Relational algebra - I'd like to add built in querying capabilities to ligature, similar to what you'd typically see in something like sql. Interfacing with databases is a common but can sometimes be clunky. Worst case this just becomes a language level ORM but I think there are some potential advantages. There have been a couple experiments at making sql dialects that behave more like sql but I'd like to go the other way and make a programming language that has features of sql. Imagine if you could write stored procedures in a functional programming language or query in memory data structures the same way you'd query a database.
By bringing these two usually separate functions together inside a single tool Im hoping that it could make it significantly simpler to write distributed data systems and is metaphorically the reason I decided to call the language Ligature.
Trying to go from raw text to an executable program is non trivial so traditionally its broken down into smaller, more manageable steps. Note that I personally found the official academic distinctions became much more useful after working the problem a bit but its useful to have a mental model for where we are in the process. For our purposes we will break this process down into 4ish steps.
Using a tool called a parser generator (in this case peg.js) we define a grammar and from it generate a parser. The parser transforms the raw text string representing our program and turns it into a intermediate form (this is a recurring theme) called An AST. An AST or Abstract syntax tree is just like it sounds, a recursive data structure that will make our program easier to work with in future steps.
Input: PEG Grammar + Ligature script
Output: Ligature AST
Input: Ligature AST
now that we've derived valid js from Ligature code we'll take a quick detour an provide a simple way of running arbitrary Ligature code from the command line:
$ ligature ./helloWorld.ls
For some projects stopping here should be totally fine but if you want to take it to the next level or just need lower level control over the performance characteristics of your language compilation is the next step. To make this step easier we'll use a tool called LLVM. LLVM website is retro to say the least but its a useful tool with a number of good resources online to describe how to use it. The primary reason we're using it is that it will alow us to run Optimized Ligature on multiple machines and has a couple of nice features such as supporting JIT compilers and can compile to web assembly which is cool.
LLVM takes 'LLVM intermediate representation' and is capable of compiling our language so that its independently runnable on various platforms(mac, pc, web assembly etc)
Input: Ligature AST
Output: LLVM IR -> Assembly or Web assembly
I was recently inspired by one of Allen Holub tweets:
Its a relatively simple observation about product development but its surprisingly easy to forget and one that I realized I've ignored before. Writing a language from scratch could easily suffer from the same problem - we implement all of the parser then the transpiler then give up after reading 5 different conflicting blog posts about llvm ir...
Instead we're going to implement each of the following features, in vertical slices down to the transpiler level as thats the quickest way to produce something that runs. Each of these features will constitute a separate article and I'll update the links below as I work though them. Not everyone need will want / need to compile down to assembly and it adds significant technical complexity so for some features I may treat the compilation as a separate optimization step:
- Intro to Peg.js
- Assignment and basic math
- Objects / Records
- Type system
- Partial evaluation of Operators
- Async abstractions (Algebraic effects??)
- Relational algebra operators
- Infrastructure annotations
Quality of life improvements:
- Readable error messages
- Assertions (testing)
- Basic Module system
- language server / syntax highlighting
If there are any features you'd like to see added to ligature leave a comment below!
This list is not final and may change.
We'll be using Peg.js extensively for parsing so I'd recommend checking out my Intro to Peg.js if you've never used it before.
I'll be releasing the first article on ligature: Assignment and basic math shortly so stay tuned!
If enjoyed it let me know by applauding the article and follow me on youtube and twitter to stay updated on all my latest content.