Making your own programming language is easier than you think (but also harder)

2026 May 6

In mid-December last year I started making my own programming language. It's waaay far from any production quality yet (though I did manage to write a working 1k LOC Monte-Carlo path tracer in it), but the project is on pause right now, so I figured it's a good time to write something about it.

Disclaimer #1: I'm not a professional PL designer or compiler implementor. Even though I do feel like I know what I'm talking about for the most part of this post, I might still end up talking some nonsense.

Disclaimer #2: it's not another C/C++/Rust/etc killer, and I doubt it'll ever be actually used to any noticeable extent. I'm just having fun and talking about me having fun.

Disclaimer #3: if you have some strong opinions about programming languages, please, keep in mind that I'm not forcing you to use this language, and that it's a bit rude to be telling random people on the internet what they should do. If, on the other hand, you have constructive feedback and suggestions, I'm all ears!

Introduction

Why now?

I mean, most programmers dream of their own perfect programming language. I've been programming for about 17 years, so why did I decide to make a language at this specific point in time?

It just so happens that 3 different things converged in my mind.

Of course, I always wanted to make my own programming language as well. I made a bunch of silly interpreters for some esoteric languages in the past (FALSE is probably my favourite), as well as interpreters for various flavours of lambda calculus, but that doesn't scratch the itch of making a real language, one that is at least somewhat production-oriented and doesn't feel like a toy.
As you might probably know, I'm working on a big game which is highly susceptible to modding, and I've been thinking about how to approach modding since the start of this project. I've analyzed a ton of options, and it just so happens that making a custom programming language is actually one of the simplest solutions.
In December 2025, the amazing Matt Godbolt introduced the Advent of Compiler Optimisations, where he'd post some fun examples of what C++ compilers are capable of, walking through the generated assembly. Apart from this being an excellent series, it really made me want to mess with some assembly once again.

Of course, making a non-toy programming language is a gargantuan endeavour, but somehow after looking at assembly for a few weeks, I felt like it shouldn't be that bad.

Modding

I want to elaborate on the modding thing. Essentially, I have 3 main concerns with respect to modding:

My game is highly simulation-heavy. There are hundreds of thousands of entities simulated via a custom ECS engine. Ideally, I'd want the modding language to be able to just take a bunch of component pointers and iterate over them like you would in a C for loop.
It's hard to control what's going on in mods, so some level of protection for the player would be nice to have. Ideally, I'd want the modding language to be easily sandboxable – i.e. I want to be able to disable all IO and similar stuff with a single switch.
I want modding to be as easy as it can be. Ideally you'd throw a script in a certain folder and there you have it, a mod can be used.

It was somewhat a surprise to me that there doesn't seem to exist a solution satisfying these two requirements. Let's go over common possibilities.

Lua

(or any other JIT-compiled scripting language for that matter). That's a standard choice, but it turns out that it's really hard to sandbox it. Apparently you need to prepend any untrusted Lua code with some kind of prelude that explicitly deletes all known standard library functions that can be used for IO and such. There are even lists of these functions online in the forms of github gists. Even if this probably does work, it doesn't sound like a reliable solution to me.

Furthermore, Lua is a high-level dynamically-typed language that doesn't know anything about C pointers. Bridging ECS entity iteration into it will either force per-entity native \(\leftrightarrow\) Lua \(\leftrightarrow\) native jumps with nonzero overhead, or constructing a Lua array from the native entities, and then deconstructing it back. Either way, this doesn't sound good.

Not to mention that standard Lua and LuaJIT have diverged some versions ago, which might make it extremely confusing both for modders and myself.

C++

There's always the option to make mods "natively". All the iteration problems are gone, but distributing mods becomes a nightmare. If they'd be distributed in binary, I'd have to provide some sort of a dev environment for all platforms, and a centralized storage for binary artifacts. If they would otherwise be distributed as source code, I'd have to bundle a C++ compiler with the game, which are known to be heavy and slow (a basic LLVM installation takes about 10-20 times more disk space than my current version of the game).

Oh, and sandboxing becomes impossible. If you're loading a native DLL which declares and uses int open();, you're doomed – there's basically no way to prevent it from accessing the filesystem, network, etc.

And, – that goes without saying, – even though I personally do enjoy writing C++, I'd rather not force the modders to do that.

All this applies to a bunch of other languages like Rust, by the way.

Please not that while I do put modding as one of the goals for the language, I'm still very much unsure whether I'm going to actually use it this way, and I don't want to over-specialize the language to this use case. As I've said, I'm mostly messing around and having fun.

Design goals

Ok, so what do I want from my programming language? Quite a lot, actually:

Seamless C interop – so that bridging between native game code and modding code would be as simple as a function call
Low level – which is mostly a consequence of having to handle raw arrays of entities
Practical and ergonomic – I want the modders to be able to write code with reasonable ease
Easy sandboxing – for reasons outlined earlier
Small compiler footprint – I don't want to embed a 1Gb compiler into a 50Mb game
Fast compilation – I don't want to force players to wait hours for mod compilation (though this can be partially solved by extensive caching)
Cross-platform for real – I'm fine with supporting only a few widespread desktop platforms and making certain assumptions (like being 64-bit or having IEEE754 support)
Reasonably fast – which is a relatively low bar compared to most dynamic languages
Try not to just recreate C++ – I cannot but acknowledge that C++, being my favourite and primary language for years, has had a heavy influence on my views on programming languages; I really want to try to steer away from it when I can (spoiler: I don't think I succeeded much in that)

Honestly, if I were just making a programming language strictly for fun, I'd start with System F and then iterate from there. But, given the above constraints, that's not really an option.

The language

Let's have a look at what I've come up with. It's a weird blend of C++, Rust, Python, Zig, and maybe a few other languages.

Overview

The working title is pslang, from my pet game engine psemek. It is an imperative, eager-evaluated, call-by-value, low-level programming language with a static, strict and nominal type system. It looks something like this:

func min(x: i32, y: i32) -> i32:
    return if x < y then x else y

struct vec3i:
    x: i32
    y: i32
    z: i32

func apply(f: i32 -> i32, v: vec3i) -> vec3i:
    return vec3i(f(v.x), f(v.y), f(v.z))

func as_array(v: vec3i) -> i32[3]:
    return [v.x, v.y, v.z]

Let's unpack that.

Scoping

As you can see, the language uses indentation-based scoping, mostly so that the language feels somewhat like a scripting language and thus looks more friendly to newcomers. Also there's less visual noise thanks to that.

Right now I'm using tab characters for indentation. Might replace them with spaces later, we'll see.

Each function, loop body, if body, etc creates a new scope. Functions and structs can be defined inside any scope, and they are only visible within that scope. Note that local functions don't have access to variables in the scope they are defined in: they are not closures, the scoping only affects name resolution.

The top-level scope (the one not inside any function) is treated just like any other scope, and it contains the file's entry point, i.e. code that runs when the file is loaded/initialized. It's the equivalent of main(), and allows initializing global variables at module import or writing scripts that simply consist of a sequence of commands to run. (Internally, the top-level scope is wrapped into an anonymous function.)

Primitive types

There are *checks notes* 13 primitive types: bool, 4 signed integer types, 4 unsigned integer types, 3 floating-point types, and unit. The numeric types fit nicely in a table:

i8  i16  i32  i64
u8  u16  u32  u64
    f16  f32  f64

The iNN types are signed integers, the uNN types are unsigned integers, and fNN are floating-point. As you can see, there's no f8 type, as it isn't supported by most desktop CPUs and there isn't a consensus about what 8-bit floating-point even means (afaik there are a bunch of competing standards for that).

f16 isn't useful for most people, but we use it routinely in graphics (for HDR colors, vertex attributes, etc), and not having it in the host language is always a noticeable inconvenience. Most desktop CPUs these days implement IEEE754 f16, so it doesn't really cost me anything to support this type out of the box.

Some people had very strong opinions that I should exclude unsigned types altogether. Having been using specifically unsigned types in graphics and computations my whole life, I simply cannot fathom how that would even work.

Btw all integer arithmetic is two's complement with overflow, no UB here.

The unit type is a bit special. It has a single value called unit(), and it is the formal return type of functions not returning anything. If you omit the return type of a function, it automatically returns unit. If you omit a return statement in the end of such function, it automatically inserts it (otherwise it is an error not to return anything from a non-unit function). It can also be used for opaque pointers, though it's better to create empty structs for that.

Numeric literals

By default, numbers like 10 mean i32. For other sizes, you can use suffixes like 10b (byte), 10s (short) or 10l (long). For unsigned literals, you add a 'u' suffix: 10ub is unsigned byte, 10us is unsigned short, 10u is unsigned 32-bit, and 10ul is unsigned long, i.e. u64.

Floating-point literals, i.e. those with a decimal separator, mean f32 by default. There are suffixes for other sizes as well: 10.0h (half) for 16-bit, and 10.0d (double) for 64-bit. You can't omit the integer or the fractional part and simply put a dot there, like 10. or .5 – you have to spell it in full like 10.0 and 0.5.

Thus, all numeric literals have an unambiguous type.

Arrays

Arrays are built-in first-class types. As opposed to C or C++, you can pass arrays to functions (not pointers to arrays, but whole arrays), return them from functions, assign arrays to each other, etc. Array size is always known at compile-time. They mostly behave like structs having a bunch of fields of the same type.

You declare an array type as i32[5], and you create an array literal as [1, 2, 3, 4, 5]. Of course, arrays support indexing.

Some people argued for syntax like [5]i32. This introduces ambiguities in cases like [5]i32* – is this an array of pointers or a pointer to an array? These can be solved by putting all type modifiers to the left instead of to the right, but I don't really see how this is more readable.

Function types

This is basically what C calls function pointers, but with a cleaner syntax. (a, b, c) -> d is a function type, and you can omit the parentheses if there's only one argument a -> b. Internally, these are regular function pointers, not closures (they don't pass data with them).

Pointers

i32* is a pointer type. Pointers are immutable (like const in C++) by default, a mutable pointer type is declared as i32 mut*.

Pointers are used mostly like in C: you can take an address of a variable &x, or take a mutable pointer &mut x, you can dereference a pointer like *p or use pointer arithmetic *(p + 10).

Structs

Structs are declared using the struct keyword and listing all the fields and their types:

struct string_view:
    size: u64
    data: u8*

You create structs using a built-in function-like constructor string_view(10, data). You access struct field via a dot v.x. If you have a pointer to struct, you can still access its fields with the same dot syntax.

Notice that struct fields have no mutability specifiers: the fields of a mutable object are mutable, the fields of an immutable object are immutable. Also there are no access specifiers, i.e. fields are always public.

That's essentially the whole type system!

Memory layout

All objects have guaranteed memory layout: primitive types have alignment equal to their size (btw, bool takes 1 byte), pointers and function types are always 64-bit (and the same alignment), arrays have the same alignment as their elements, and structs have padding to satisfy alignment requirements. This is mostly to simplify C interop and usage in GPU programming.

Empty types

There are certain types that I call empty. They are actually not empty, but have a single valid value. These are unit and empty structs (those without any fields). I've decided that these types don't occupy memory at all, their size is literally 0 bytes. Passing it to a function does nothing, declaring such a variable does nothing, having such a field doesn't affect struct size, etc. These can be useful as type-level compile-time tags or stuff like that.

I haven't decided on reading/writing such objects via pointers. It might be useful in generic code, but it doesn't actually do anything. For now, I've made pointer arithmetic on such types illegal.

This also means that I don't follow the C++ rule that each object has a unique memory address. We'll see how it goes.

Variables

Variables are declared like let x = 10 if they're immutable, and mut x = 20 if they are mutable (the value can be reassigned). Of course, you can't take a mutable pointer to an immutable variable.

You can explicitly state the variable type like let x: i32 = 10, but you don't have to, as the language is designed in such a way that the type of any expression can be deduced unambigously.

You do have to initialize any variable with something, though.

Functions

Functions are declared like func foo(x: A, y: B) -> C: followed by the function body. If the return type is omitted, it is unit.

All functions adhere to the C ABI native to the platform the code is run on, for C interop (specifically so that functions could be passed as function pointers into C code, as callbacks / ECS systems / etc).

Declaration order

Within a scope (including the top-level scope), functions and structs can be declared in any order, i.e. I can use a struct or a function which is declared later in code. This is mostly because it is simple to implement and saves me from inventing a syntax for forward declarations (which is necessary for mutually recursive functions or data structures).

This could make type inference more complicated by turning it into an equation-solving problem. However, because all functions are required to fully spell their argument and return types, this doesn't cause issues – type inference is still pretty much trivial.

Control flow

There are typical if-else statements like

if x < 10:
    x += 15
else if x > 20:
    x -= 5
else:
    x = 0

and while loops:

while x > 0:
    r *= x
    x -= 1

There are no for loops, which I'll talk about a bit later.

There's also an if-expression, which looks like if A then B else C.

Foreign functions

You can declare a function as being foreign, like so:

foreign func sin(x: f64) -> f64

This means that this function isn't implemented here, but should be linked to elsewhere (e.g. in a dynamic library like libc). Right now, the interpreter just dlsym's such functions from the interpreter executable itself.

This is the primary mechanism for interfacing with the C library and any other third-party library. The raytracer example uses this to compute square roots, write to files, compute timings, and even create threads.

Type casting

There are no implicit type casts, ever. You can cast types manually using the as operator, like (x as f32). All numeric types can be cast to each other, all pointer types can be cast to each other (except turning an immutable pointer into a mutable one), and pointer types can be cast to u64 and back. Notice that nothing can be cast to or from bool.

I am considering to add just one implicit cast, though: from a mutable pointer T mut* to an immutable one T*. I'm still not sure about that.

Operators

Pretty much all the standard operators are here: arithmetics, logical operations, comparisons, etc. One notable thing is that there are & and | as well as && and ||. They all work both for booleans and for integers (bitwise). The difference is that & and | always evaluate both operands, while && and || are short-circuiting (don't evaluate the second operand if the result is known from the first operand).

The arithmetic and comparisons only work for a pair of same-type numeric arguments, i.e. no numeric type promotion takes place.

That's basically all there is in the language right now. It doesn't sound like much, but it already allows one to write real programs with reasonable comfort!

Compiler architecture

I've split the whole project into a set of libraries:

types – the type system definitions
ast – abstract syntax tree definitions & utilities
parser – the parser
ir – intermediate representation
interpreter – the interpreter
jit – the JIT-compiler

The idea is that the interpreter and the compiler are just simple CLI apps using one of these libraries (right now I only have the interpreter in JIT mode, though). If you want to embed the language, you just use the parser + jit libraries.

Parser

For the parser, I took the Bison parser generator. A few people asked for good Bison tutorials and sadly I don't know of any – I simply read the docs to get started.

There is the lexer grammar which specifies individual tokens (keywords, operators, literals, etc), and the parser grammar for the language. The grammar is mostly rather straightforward: a file is a list of statements; a statement can be a function declaration, a control flow operator, a variable declaration, or just an expression; expressions can be a literal, a variable, an operator, a function call, etc, etc.

I did have to fix shift/reduce conflicts in the grammar a few times (it's when the parser can't decide which rule should be used in a certain context). Bison has a lovely -Wcounterexamples command-line flag to show what exact scenario causes the conflict. I have to confess that I'm not a master at writing LALR(1) grammars, and I usually just googled stuff and poked around to solve these conflicts. Most of the time it meant rewriting a grammar like

smth
: A B
| A C

into

smth
: A x

x
: B
| C

I'm using the lalr1.cc Bison skeleton which generates a C++ parser class. By default, Bison generates a C parser with global variables as the parser's state. This might work for single-shot parsers like in a standalone compiler executable, but it doesn't work for e.g. interpreters or game mods (I might want to parse many files in parallel). Generating a C++ class solves this problem.

I've inserted the execution of Bison as a build step in the CMake scripts. It was fairly straightforward, though I didn't manage to put the generated files in a separate directory (I have to include them like "parser.hpp" and not something like <pslang/parser/generated/parser.hpp>).

The output of the parser is a C++ object representing the AST of the parsed file.

Indentation

There's one problem with the parser, and that's my indentations. You see, because of them the grammar isn't actually context-free, because e.g. whether a certain statement belongs to the body of a while loop depends on the number of indentation tokens in front of it!

To solve this, I do something horrible: each line is parsed as a standalone statement together with a number signifying its indentation level. Then, a simple linear pass resolves the scoping simply by looking at the indentation levels. It's hacky but it works and it's really fast, so I'm fine with that. Additionally, this pass makes sure that break and continue statements are only inside loops, return statements are only inside functions (the top-level scope isn't considered a function, even if it's compiled as one), that field definitions are only inside structs, etc.

Type checking

After parsing, there are a few more passes before compilation can take place. The first one simply resolves all identifiers, literally linking the identifier node to the variable/function/struct definition node that this identifier refers to.

Then, the most important pass takes place – the one that checks and infers all types. As I've said earlier, type inference is mostly straightforward, as is type checking: it's just a bunch of conditions based on a specific AST node type. E.g. the type of an expression inside an if or while must be bool, the types of two operands to addition must either be both the same numeric type or one integer and one pointer, etc.

Interpreter

All this is nice and cool, but there's still no way to actually execute the code. That's what the interpreter is for! Or, at least what the first version of the interpreter was for, as currently it is in a very broken state, and I'm planning to rewrite it completely using the IR.

It's a "tree-walking interpreter" (something I didn't even know has a name!). It executes code by literally visiting all required AST nodes and executing the corresponding C++ constructs. Its main functions are exec() and eval(), which can call each other internally as needed. exec() executes a single statement, while eval() computes the value of a single expression and returns it. Because C++ is statically typed, eval() returns a variant of all possible value types in the language (structs are represented as arrays of name-value pairs, one for each field). The interpreter uses the same variant to store variable values.

The main purpose of the interpreter is to provide a simple cross-platform way to run any code in the language, and to aid in debugging both the language implementation and the programs written in it (as I can shove much more validation into the interpreter). It's not meant to be fast at all.

Something this interpreter can't do is execute foreign functions – I can't just pass them to eval(), they have to be called via the C calling convention, with the number and types of arguments unknown beforehand. I'll probably have to do some vararg magic or use libffi for that.

The interpreter can dump all its internal state (the names, types, and values of variables) to stdout, which was my main way of debugging the parser and interpreter before I made a proper compiler.

Compiler, version 1

I was on holiday during the first few weeks of January 2026, and I only had my M1 Mac with me, so that's the architecture I decided to write a compiler for. At the moment of writing this post, it's still the only supported arch :)

This is a JIT-compiler: the result is a memory blob (which is mapped with correct bits to make it executable), and a pointer to the start of each function (including the entry point).

I know I've said "fairly straightforward" about most parts of the language implementation. Well, the compiler wasn't fairly straightforward, it was actually rather tricky, mostly because of the platform-specific stuff.

The high-level parts of the compiler were straightforward, though! It's almost a classical stack-based compiler: when computing an expression, just take the arguments from the stack, and put the result back on the stack. Except that to make things faster and a bit simpler, I've decided that my compiler will place the result of an expression in the same way as it would be placed by a function with the same return type using AAPCS64 (the standard C calling convention on Aarch64 Macs).

This means that e.g. integers and pointers are returned in the x0 general-purpose register, floating-points are returned in the v0 floating-point register, and structs are returned in a register or on the stack based on their size. This reduces the number of memory access operations, making the compiler output faster code and simplifying function calls. The stack is typically used for intermediate results, e.g. for binary operations. For example, an expression like A + B compiles into

(eval A)         # the value of A is in x0
push x0          # the value of A is on stack top
(eval B)         # the value of B is in x0
pop x1           # the value of A is in x1
add x0, x0, x1   # the value of A+B is in x0

Control flow structures are a bit tricky to compile. They all turn into conditional jumps (e.g. if the value of the expression is zero, jump to skip the if body), but the problem is that with a single-pass compilation we don't know where to jump, since we haven't compiled the if/while body yet. To solve this, I output jump instructions with zero offset, and inject the actual jump offset into them later, when I know the target offset. The same applies to function calls.

To generate target CPU instructions, I could use some 3rd-party library, but I'm trying to make the compiler as minimal as possible, so I decided to write them all myself. It basically boiled down to digging through the instruction manual and writing down the required bits.

Aarch64 specifics

As I've said, the basic compiler is fairly straightforward, but the specifics of the target architecture make it rather tricky.

First of all, all instructions in Aarch64 are 32-bit. This sounds nice at first (easy to address them, easy to store, easy to handle), but then you try to figure out how to put a 32-bit constant into a register. You need at least 5 bits to select a register (there are 32 of them), some bits to indicate the "put a constant into register" command, and 32 bits for the constant itself. The math doesn't add up – there's no way to put that into a single 32-bit instruction! Not to mention that you might want a 64-bit constant...

Instead, you have to either build the constant from 16-bit patches (there's an instruction that loads a 16-bit constant with an offset of 0, 16, 32, or 48 bits), or put the constant into constant memory and load it from there (that's what I do for floating-point constants, by the way).

There are no push/pop instructions (as opposed to x86), but there are instructions that e.g. write/read some register to/from a memory address computed as some other register plus a potential 9-bit signed address or 12-bit unsigned adress multiplied by 4, and then advances the address register...or something like that. Because all commands are exactly 32 bits, there's a ton of commands that do weird stuff, and you have to constantly pay attention to whether the offsets are signed or unsigned, whether they are premultiplied by some constant, whether the command modifies the address register, etc, etc.

The stack itself is a bit funny: when reading/writing from it relative to the SP register (stack pointer), the register must always be 16-bit aligned. Since the possible offsets are bound by 12 bits, you have to insert special code for cases when your stack frame is larger than 16Kb or something like that (I haven't implemented that yet).

The calling convention has some special cases for when structs are passed to/returned from functions in up to 2 general-purpose registers, in floating-point registers, or via pointer to memory, so there's a bunch of code in the compiler that covers specifically that.

IR

After writing the basic interpreter & compiler, I figured that I'd want to reuse some code between them, simplify writing compilers for other architectures, and add some optimizations, as the generated code is still far from reasonable (I'm not even saying "optimal").

The answer is simple: use some sort of an intermediate representation! That's what I did next. My IR is something like SSA (single static asssignment), except that I do allow re-assigning values to the same nodes and don't use phi-nodes. So it's not actually single assignment. So it's actually not like SSA at all. Oh well.

The IR is a sequence of nodes, each one being a literal, an operation (with inputs being some previously-defined nodes), a jump (conditional or unconditional), a function call, etc. Nodes that represent values also store the type of that value. Because I allow reassignment, there's a special assign IR instruction that reassigns the value of a previously-defined node.

Conditional jumps are split into separate jump_if_zero and jump_if_nonzero nodes, since they typically correspond to different CPU instructions (which are faster than negating the value and using the other instruction).

Because the language supports function pointers, there are separate instructions for calling by a known IR node and calling by an unknown pointer value.

To simplify further optimizations (which will remove/insert nodes at arbitrary positions), the nodes are stored in a linked list (std::list), and references using list iterators.

The trickiest problem with IR was how to support structs. I can't have struct-valued literals, so there's a special alloc node that is meant to represent the struct value, and typically compiles into allocating the struct on the stack without initializing its value. Then, the struct itself is built by assigning to individual fields.

However, structs can contain other structs, so reading a nested field like a.x.y would be represented in the IR as first reading a.x into a new node, and then reading the y field of this node. Assignment to a nested field is even worse: a.x.y = b would be represented as

Read t = a.x
Write t.y = b
Write a.x = t

That's really wasteful and hard to optimize, so instead this has special treatement in the IR. There's a copy node that can extract any nested field from a struct, and the assign node allows assigning to any nested field in a struct. Nested fields are represented as an array of indices (e.g. take the field #0, then take its field #2, then take its field #5).

Compiler, version 2

After that, I rewrote the Aarch64 compiler using the IR. It is now split into two parts: AST \(\rightarrow\) IR compiler, and IR \(\rightarrow\) Aarch64 compiler. The former is rather straightforward again, but the latter is a complete mess. Right now it's much worse than the previous stack-based compiler. At function start, it allocates as much stack space as is required for all IR nodes of this function, even though most of them are short-lived intermediate values. It's so bad that I had to split one function in my raytracer into two, simply so that the stack frame would fit into the 12-bit restriction I talked earlier.

That's pretty expected, though, as this compiler is meant to use a register allocator, and I expect the resulting code to be orders of magnitude better after that.

Future plans

That's pretty much all what's implemented in the language right now, and it takes just about 10k lines of C++ code. Obviously there's still a long way to go, but I'm happy that it all actually works, and the compiler is pretty small by modern standards!

There's a ton of stuff I'd like to add to the language, which I'll talk about below.

Compiler/interpreter

Register allocator

As I've already said, the current IR \(\rightarrow\) Aarch64 compiler is awful, and really needs a register allocator. I'm planning to use the standard linear scan allocator, which seems to be a good tradeoff between compilation speed and code quality.

IR optimizations

The IR allows for a ton of optimizations to be performed. In particular, I'm hoping to add

Constant propagation
Arithmetic simplification
Dead code elimination
Inlining
Loop unrolling

I'm not aiming at outperforming GCC or LLVM, but it would be nice if some simple functions like adding 3D vectors get compiled to as little CPU instructions as possible.

IR interpreter

I'm planning to rewrite the interpreter by evaluating the IR directly. This should simplify the interpreter considetably.

Producing executables

Right now, the compiler can only generate JIT-compiled memory blobs to be executed right away. I want to support generating runnable executables as well, in a platform-specific format. This will mostly require digging into the binary specs of the binary formats (ELF, Mach-O, PE). Would be cool to try and generate executables as small as possible, just for some sport.

Debugging

I've spent a fair amount of time stepping through JIT-produced assembly in lldb and I really want to be able to debug the language properly. This would probably require supporting the DWARF debug info format, which I know pretty much nothing about, so it should prove to be quite an adventure.

Language features

Struct constructors

Right now, structs can only be created by setting all fields like vec3i(1, 2, 3), or zero-initializing them like vec3i(). I want to support arbitrary constructors, which will work simply by declaring a function with the name equal to the struct name:

func vec3i(x: i32, y: i32) -> vec3i:
    return vec3i(x, y, 0)

I'm not sure about this, though – maybe giving unique names to such functions is better.

Global variables

Right now, globals are simply not supported. I'm planning to add a global keyword that creates a global variable (access to them is still restricted by scoping rules, so you can make function-local global variables like static variables in C).

Note that top-level variables aren't actually global (unless they use the global keyword), but instead are local to the file's entry point function. This might prove to be confusing to users; I'm still considering options regarding that.

This will create problems on Mac, which doesn't allow memory mappings that are both writeable and executable at the same time. This means that I'll have to allocate the globals separately from code, map them with different flags, and access globals by runtime-resolved addresses instead of compile-time known offsets.

However, it seems that you can use mprotect() to change the flags for a part of a mapping, so that's what I'm going to try first.

Method call syntax

Just to make the code more readable, I want code like x.f(y) to mean f(&x, y) or f(&mut x, y) whenever possible.

Polymorphism

That's probably the most important potential feature. There are many ways to add polymorphism to a language like that, but the two most promising (in my opinion) are

C++ style function overloading + unrestricted function templates + struct templates (without specialization)
Haskell / Rust style explicit traits + trait-constrained generic functions and structs

C++ style is more powerful, easier to read in simple cases, easier to implement in the compiler, but the error messages can be exceptionally cryptic.

Explicit traits can be easier to read in some cases (when it's hard to understand which of the used functions come from where in generic code), are harder to implement in the compiler (traits and trait bounds are a whole new system; traits themselves can be generic to support multiple-parameter traits, etc), but are stricter (which can be both good and bad) and they definitely solve the error messages problem.

I'm still not sure what to choose. I've said that I'm trying not to reinvent C++, but I'm reeeeally leaning towards the first option. Something like

struct vec2<t: type>:
    x: t
    y: t

func min<t: type>(x: t, y: t) -> t:
    return if x < y then x else y

with argument deduction for functions whenever possible.

Operator overloading

This requires some form of polymorphism to be present. Other than that, it's pretty straightforward: an operator like a + b could call an overloaded function add(a, b) or a method of a trait Add::add, something like that.

For loops

You can emulate for loops with while loops, so I'm planning to use for loops as collection-based loops (like the range-based loops in C++, loops in Python, etc). This, of course, requires some sort of range/iterator interface, which in turn requires some polymorphism once again.

Automatic resource management

I do believe that an ergonomic and practical language must provide some ways to help with freeing resources like memory, files, sockets, mutexes, etc. There are many ways to do that, in particular

C++ style RAII + move: automatic destructors that are called at the end of object's lifetime + moving data from one object into another
Zig style defer keyword that allows automatically executing arbitrary code at the end of scope
Linear types, which require an object to be used exactly once (e.g. passed to some deallocation function)

The main downside of RAII is that it is implicit, adding hidden instructions and control flow.

defer is explicit, but requires you to insert it manually all the time and doesn't prevent you from forgetting to insert it. Freeing up nested collections is also rather inconvenient, e.g. for an array of files you'd have to close each file before deallocating the array itself:

defer free(array)
defer for file in array:
    close(file)

Linear types are a promising idea: they are still explicit (you manually call free/close), but they also force you to "consume" the object using some resource-freeing function. However, they are once again hard to mix with nested collections, e.g. a dynamic array of files.

I'm not sure which option to choose; I'm hoping to expand on the linear types idea somehow.

Polymorphic literals

There are a few cases when type inference can fail because some literal can mean several things.

The first such case that I've faced is the empty array literal. Given an array like [1, 2, 3], we can check that all elements have the same type, and infer the type of the array from the elements. However, for an empty array [] we can infer the size (zero), but not the type!

A similar problem comes from the null literal – it can mean any pointer type. I also want an inf literal, which can mean any floating-point type.

I see three possible ways to solve this:

Haskell-like polymorphic literals – this requires a huge re-working of the language and its type system, and potentially turns type inference from a straightforward algorithm into an equation solver
Special built-in/library types that support implicit conversions (like nullptr_t in C++)
Treat them as special literals in AST and add ad-hoc handling in the compiler

I'm leaning towards the last option: simply allow e.g. null in places where we know that a pointer of a specific type is expected (initializing a variable with an explicitly specified type, passing an argument to a function, etc). This option is the simplest, but isn't extensible (can't allow constructing custom types from null). Maybe that's not a bad thing, though.

Compile-time evaluation

This would be a really fun feature for metaprogramming and similar stuff. First, variables can be declared using the const keyword, which means that this is a compile-time variable (so, not really a variable). It can be used in compile-time expressions (e.g. the size of an array type), it cannot be reassigned, and you can't take its address.

Then, any suitable function can be called in a compile-time expression (if it doesn't access global variables, doesn't have side-effects, etc). Inside the function body, it operates just like a regular function, but is executed during compilation, and the result is a compile-time expression.

This feature will need some tricks to declare that certain foreign functions are safe to call in compile-time (like math or memory allocation), otherwise the compile-time evaluation will be quite limited.

Type computations

It would be cool to support computations on types, once again for metaprogramming. These would be compile-time only, simply because I don't want to invent some runtime encoding scheme for types, and having types in runtime in a statically-typed language has very limited utility anyway.

This could also be used for something like C++ style concepts, but without the need for any special syntax – just using compile-time calls. We'd need some way to check type properties, though:

func comparable(t: type) -> bool:
    // Implemented somehow...

func min<t: comparable type>(x: t, y: t) -> t:
    return if x < y then x else y

Coroutines

That's honestly more like dreaming than planning, but adding some Python or JS style async/await wouldn't hurt.

Library

Modules

Of course, writing everything in the same file is madness, and the language really needs modules. I'm planning for a simple import lib.sublib statement, which can be placed anywhere in code and also adheres to the scoping rules (allowing e.g. function-local imports). Note that, as before, scoping affects visibility only; the actual loading process happens at compilation time, and the entry point of an imported module is executed before your module.

The library name will directly correspond to the filesystem path (relative to some root paths specified to the compiler/interpreter) leading to this library. It can be a single source file (in which case only that file is imported), or it can be the a whole directory (in which case all the files from that directory are imported in some order). I'd need some syntax to refer to files in the same directory, though – maybe something like import .another.

The imported functions / global variables can be used without any prefixes, or they can be prefixed by the library name like io.print(x) in case of an ambiguity.

The modules' entry points will be executed in a deterministic order (in order of importing + topological sorting of the recursive imports), which should solve the initialization order fiasco typical to C or C++.

One thing I haven't decided on yet is how multiple-module programs will be laid out in memory. I can put each module in a separate memory patch, and resolve the function calls / global variable accesses at runtime. Alternatively, I can build them up as a single huge memory mapping, and use relative offsets. This should be faster at runtime, but will make it harder to compile several modules in parallel. We'll see.

Prelude

Once we have modules, we can have some basic utility stuff put into a certain prelude module that is implicitly included in all programs. It could contain stuff like length() function and iterator interface for built-in arrays, a string view type, numeric range like Python's range(n), etc.

String literals

There are no string literals in the language yet, simply because I have no idea what they should mean. My plan is to have an immutable string_view type in the prelude, and a string contents would be placed somewhere in executable memory, while the literal itself would turn into a string_view pointing to this memory.

Standard library

Of course, with great modules comes a great standard library. I'd love for it to contain some subset of the following:

Math library (including vectors and matrices)
Memory management, at least in the form of alloc/free functions bridged from libc
Dynamic arrays
Dynamic strings and formatting
Hash tables
I/O facilities for the console and files
Filesystem helpers
Timing/clock helpers
Networking

Conclusion

So, when will I implement all this stuff, and will I ever use the language for modding in my game or for anything else? I have literally no idea. That's an ambitious project, and I don't think it's a good idea to have more than one ongoing ambitious project treated seriously. I'll work on it when I feel like it, and right now my priority is still the game – after all, you can't mod a game that isn't made yet.

If you have some fun ideas about anything I've talked about above, don't hesitate to ping me! In any case, thanks for reading.

Contents