First Blog Post on $ugar Programming Language

#compiler #newlanguage #sugar #development

Introducing Myself

This will be a lengthy introduction for me into the world of blogging. I’m first going to introduce myself alongside my experience as a programmer to contextualize some of the design decisions of the language I am developing, before laying out the design goals of $ugar.

My name is Cora River Maxwell. I’m 21 and attending a community college, where I’m quadruple majoring in Mathematics, Physics, Computer Science, and Music; I almost have my associate degrees in all four. I have been programming for 7 years now, initially learning JavaScript, Python, and Java in high school. I used to really like Java because I do think OOP is a good way of wrapping your head around dividing and conquering a problem. But as I did more serious projects in Java, it quickly became frustrating. After high school, I learned C, C++, and C#, where I quickly began to like C# and thought of it as a better Java. C is also a great language and helped me understand how programs work with your computer. C++, on the other hand, I hate to this day. Now I’m learning Rust and Haskell, and I have been enjoying both languages immensely. As much as I enjoy OOP, Functional Programming is a wonderful paradigm, and I feel I am especially drawn to it because of my love for math.

Some projects I have worked on over the years include video games, many of which are unfinished (although I did my first Game Jam this year, which I am incredibly proud of myself for), chess engines, ultimate tic tac toe engines, a Brainfuck interpreter, and a couple of basic applications. I do feel my lack of web development gives me an incomplete perspective on programming in general: I understand a lot of the core concepts, but I have no real practical experience in web development. Despite 7 years of programming, I would also say that I’m not a great programmer; I’m intermediate at best. So I do think that developing a new programming language will increase my skills as a programmer and could potentially give me a helpful tool.

Design Goals of $ugar

I should probably confirm that yes, $ugar is named after the phrase “syntactic sugar” which I plan to include a lot of. I think easy, repetitive things should be easy and quick to write.

Here are some of the design goals of $ugar: strong type system, statically typed with type inference, multi paradigm, multi memory management scheme, eager compiler and lazy evaluation, and syntax that gives way to easily modifying/adapting your code.

Type safety is, I think, kind of a given in newer languages today. It is an important feature that (mostly) prevents you from doing stupid shit to data that doesn’t make sense. I want a strong, robust, and versatile type system. I personally feel that statically typed languages are easier to work with: being able to explicitly state that this data is this type and can only behave as such until I say otherwise is so helpful in larger code bases, and I think it’s generally easier to read. I do think, however, that code readability is a function of experience, so I will try to refrain from using it as a reason for decisions. Despite that, I think it makes more sense to say “this is an int, I cannot do bool stuff to it” rather than “this data is set to what I think is an int, but at any point I can later reassign it to be a string or bool willy nilly”. Later on, I will discuss some features of $ugar that I think exemplify these goals.

A lot of people joke that Physics is applied Math, and other science fields are just chains of applied Physics. While I do agree, I think there’s another take: Math in its purest form is logic, abstraction, and problem solving, and Computer Science and Programming are a real-world application of logic, abstraction, and problem solving, so in a sense Programming is applied Math. When you have a problem in Math, there are many different ways to solve it using different fields of Math, because in the end every field of Math is related to the others, and I think programming paradigms are similar. I know there are more paradigms than OOP and FP, but these are the ones I (and I think most people) am familiar with. I do need to learn more paradigms though, for the reasons stated in this paragraph. OOP is a great way to organize data and associated functionality, and FP is great at defining processes and stringing them together in a way that is easy to reason about logically. Every paradigm has its downsides: OOP allows for early nested abstraction, which leads to expensive refactors and difficult-to-read code, while pure FP makes mutating data really difficult (I know this is the point, but there are lots of times where mutating data is helpful, mainly for performance reasons), and because computers are imperative, it’s often hard to reason about how functional programs translate to the machine. I think good engineers follow good practices, understand their pitfalls, and adjust accordingly, as opposed to following a set of rules/concepts blindly. There are going to be good ideas in every programming paradigm, and it’s better to use those ideas when appropriate.

The same can be applied to memory management: there are benefits and trade-offs to every memory management scheme, so why should I be forced to stay with just one for an entire program? Why can’t I choose which data is garbage collected and which data I handle manually, so that I can minimize the runtime performance cost of a GC while manually dealing with memory that is easy to handle? I want to provide syntactic features that let users define how they want to handle the memory for the data they create. The two primary options in $ugar would be GC or the Ownership Model that Rust uses. I personally really like the Ownership Model, and I think it’ll be especially nice if you have to opt into it, so you don’t have to think about lifetimes or borrowing until you want to. Also, with unsafe in Rust you are able to do completely manual memory management (as long as you code in such a way as to avoid undefined behavior) and have complete control, so I think GC and Ownership are enough.

When I say I want an eager compiler, I mean a compiler that will do as much as it can ahead of time to increase runtime performance. The one major downside to this goal is compile times, but you do get the benefits of extra safety and runtime performance. It might be a little difficult to intuit how an eager compiler improves safety, so here’s a basic example: say I define an array with elements [1, 2, 3] and then access the 0th element (1) 100 times. The compiler should be able to figure out ahead of time that the value is going to be 1, and thus the assembly code should just use a constant 1 instead of a pointer offset dereference. By the same reasoning, the compiler should also be able to tell when you’re trying to access bad memory: if you try to access index 3 of that array, which does not exist, the compiler should in simple cases be able to call you a dumbass. Another example would be an if statement with true as the condition and an error throw in the body. Compiling this should produce a compile-time error, because the compiler should be able to tell that the if block will always run and will always throw an error.

I have some ideas to help with slow compile times, like doing multiple passes: a quick syntax check is the first pass, and the second pass checks for broken Ownership rules, Type Safety, and Simple Data Safety, so that if there are syntax errors, the later passes don’t run.

Alongside this, I think Lazy Evaluation is a smart concept for runtime optimization, and I prefer opting into Eager Evaluation rather than the other way around. You can mimic Lazy Evaluation by wrapping data in functions in JavaScript, C#, and a lot of other languages that support higher-order functions; however, it’s never fully optimized with thunks like in Haskell, unless you’re using a package. Generally I think it’s better to have ease of use and safety as the default, and performance and unsafety as opt-in, which in a way applies to Lazy Evaluation: it allows for ease of use with certain things like infinite lists, and you can opt in to Eager Evaluation for performance reasons (in cases where Eager Evaluation does have better runtime performance).
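To make the "wrapping data in functions" trick concrete, here is a minimal sketch in Rust (not $ugar). The `Thunk` type and `first_squares` function are hypothetical names made up for this illustration; they are not part of Rust's standard library and say nothing about how $ugar would implement laziness.

```rust
// A hand-rolled thunk: the computation is stored unevaluated and runs
// at most once, on first access. `Thunk` is a made-up name for this
// sketch, not a standard-library type.
pub struct Thunk<T, F: FnOnce() -> T> {
    init: Option<F>,  // the suspended computation, taken on first force
    value: Option<T>, // the cached result once the thunk has been forced
}

impl<T, F: FnOnce() -> T> Thunk<T, F> {
    pub fn new(init: F) -> Self {
        Thunk { init: Some(init), value: None }
    }

    /// Evaluate on the first call, then reuse the cached value.
    pub fn force(&mut self) -> &T {
        if self.value.is_none() {
            let init = self.init.take().expect("thunk has no initializer");
            self.value = Some(init());
        }
        self.value.as_ref().unwrap()
    }

    /// True once the wrapped computation has actually run.
    pub fn is_forced(&self) -> bool {
        self.value.is_some()
    }
}

/// Laziness is also what makes "infinite lists" usable: the range below
/// is unbounded, but only the first `n` squares are ever computed.
pub fn first_squares(n: usize) -> Vec<u64> {
    (1u64..).map(|x| x * x).take(n).collect()
}
```

Creating a `Thunk` with `Thunk::new(|| expensive())` does no work up front; the closure only runs when `force` is called, and a second `force` returns the cached value. This is the manual version of what a lazy-by-default language would do behind the scenes.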

Some features in other languages are really nice for expressing ideas concisely but can be annoying to modify when your program naturally changes. For example, switching from an if statement to a switch, or adding debug statements to a ternary operator. So I want syntax that lets you make these kinds of changes, and revert them, quickly and with little frustration. For example, Jonathan Blow’s Jai has some cool syntax in this spirit: constant and variable definitions are written almost identically (:: vs :=), and a switch statement is just an if statement whose condition is your value followed by ==, with the cases in the body (if x == { case 3; case 4; }).

Language Features: Aliases, Class Parameters, and Oxidization

For types, I want to talk about a couple planned features: aliases and class parameters.

Aliasing is the ability to give more than one name to a thing: variables, class names, function names, etc. This allows you to have descriptive names along with shorthands for them to keep your code concise.

pub i32 slope alias m = 3;
pub i32 y_intercept alias b = -3;
pub i32 input alias x;
pub i32 output alias y;

x = std::read;
y = m * x + b;
std::io::writeln "output: " ++ y ++ "\n"
     ++ "input: " ++ x ++ "\n"
     ++ "slope: " ++ m ++ "\n"
     ++ "y-intercept: " ++ b;

Aliases will also help with type safety in a performant way by defining data types that only certain functions can handle. I’m a big fan of implicit casts outside of function arguments, but I want to require explicit casts for function arguments. $ugar will have function overloading, so knowing exactly what data you’re inputting and outputting is important. For example:

pub struct Degree alias f32;
pub struct Radian alias f32;

pub const f32 Pi = 3.14159;

pub [3]Vector u, v;
u.GetRandomVector;
v.GetRandomVector;

pub Degree theta1 = u Vector::AngleBetween v;
pub Radian theta2 = Pi;

pub f32 a, b, c, d, e;
a = Math::Sin theta1;
b = Math::Sin $ toRadian theta1;
c = Math::Sin theta2;
d = Math::Sin $ toDegree theta2;
e = Math::Sin $ u Vector::AngleBetween v;

Now say AngleBetween outputs Degrees and Sin only takes Radians. In that case, the lines assigning a, d, and e would throw a compiler error, because Radian and Degree are different types even though they’re really just floats. Radian and Degree are essentially new primitive types. Now say there was one Sin function that took a Degree and another Sin function that took a Radian. Then every value would be evaluated as intended by the developer: a = b = e and c = d (up to floating-point precision error). This is similar to the newtype pattern in Rust, but with slightly less boilerplate.
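For comparison, here is roughly what the Rust newtype version looks like. This is only a sketch: Rust has no function overloading, so the `Sin` trait below is a stand-in I made up to imitate the two overloaded Sin functions, and all of the names are assumptions for illustration.

```rust
use std::f32::consts::PI;

// Two distinct wrapper types around f32 (the newtype pattern).
// Both are "just floats" at runtime, but the compiler keeps them apart.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Degree(pub f32);

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Radian(pub f32);

// Explicit conversions play the role of toRadian / toDegree.
impl From<Degree> for Radian {
    fn from(d: Degree) -> Self {
        Radian(d.0 * PI / 180.0)
    }
}

impl From<Radian> for Degree {
    fn from(r: Radian) -> Self {
        Degree(r.0 * 180.0 / PI)
    }
}

// Hypothetical trait standing in for an overloaded Sin function:
// one implementation per angle type.
pub trait Sin {
    fn sin(self) -> f32;
}

impl Sin for Radian {
    fn sin(self) -> f32 {
        self.0.sin() // f32::sin expects radians
    }
}

impl Sin for Degree {
    fn sin(self) -> f32 {
        Radian::from(self).sin() // convert first, then reuse the Radian impl
    }
}
```

With this in place, `Degree(90.0).sin()` and `Radian::from(Degree(90.0)).sin()` agree, while a plain function declared as `fn f(angle: Radian)` would refuse a `Degree` at compile time. The wrappers, derives, and conversions are the boilerplate the alias syntax is meant to cut down.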

Class Parameters let you attach variable information to a user-defined type to specify functionality attached to that data type. In other words, a class parameter is a compile-time field that makes a type distinct from the same class with a different class parameter. For example, say I have a Car that can either be electric or gas powered. In a lot of languages, you can solve this using inheritance, where you have a Car class and then ElectricCar and GasCar classes. With the class parameter feature, I can write it like this instead:

pub enum CarType {
     Electric, Gas
}

pub class [CarType type]Car {
     pub String name,
     pub i32 year
}

trait Car {
     fn Drive $ self, i32;
     fn Print $ self;
}

impl Car {
     pub mut fn Print $ self {
          std::writeln self.name ++ " " ++ self.year ++ " " ++ self.type;
     }
}

impl [CarType.Electric]Car {
     pub mut fn Drive $ self, i32 miles {
          std::writeln "I drove " ++ miles ++ " electrically";
     }

     pub mut fn Electric $ self {
          std::writeln "Electric.";
     }
}

impl [CarType.Gas]Car {
     pub mut fn Drive $ self, i32 miles {
          std::writeln "I drove " ++ miles ++ " with gas";
     }

     pub mut fn Gas $ self {
          std::writeln "Gas.";
     }
}

pub mut fn Main {
     [CarType.Gas]Car vehicle = Car{name: "Lamborghini Civic", year: 1838};
     vehicle.Gas;
     vehicle.Drive 10;
     vehicle.Print;
     vehicle.Electric;
}

In the above code, the compiler will throw an error saying that type [CarType.Gas]Car has no associated method called Electric. In this specific case, you can think of Car as an abstract class and the two other types as subclasses, except there’s a constant field, unique to every subclass, that can be read from. There are two upsides in this case: the inheritance can only go one layer deep, and only the [CarType.Gas]Car class will be generated in assembly. Here is a more complicated example with a Vector class, which really shows the power of Class Parameters.

pub class [u8 size]Vector alias [f32; size];

impl Vector {
     pub fn Dot $ [n]Vector a $ infix $ [n]Vector b : f32 {
          prv mut i32 i = 0;
          prv mut f32 product = 0;
          while i < n {
               product += a[i] * b[i];
               i++;
          }
          return product;
     }
}

impl [3]Vector {
     pub fn Cross $ [3]Vector a $ infix $ [3]Vector b : [3]Vector {
          prv mut [f32; 3] elements;
          elements[0] = a[1] * b[2] - a[2] * b[1];
          elements[1] = a[2] * b[0] - a[0] * b[2];
          elements[2] = a[0] * b[1] - a[1] * b[0];
          return elements;
     }
}

pub mut fn Main {
     [2]Vector u = $ 1 0;
     [2]Vector v = $ 0.5 0.5;
     [64]Vector w = $ 0..64;
     f32 uv = u Vector::Dot v;
     f32 uw = u Vector::Dot w;
     [3]Vector a = u Vector::Cross v;
}

We have just generated 256 unique classes (one for every value of the u8 size parameter), each of which will only generate assembly if the program uses that type! In the above code, the compiler will throw two errors: one on the initialization of uw and one on the initialization of a. uw errors because Dot requires both arguments to have the same class parameter size n, but u has size 2 and w has size 64. a errors because Cross requires both arguments to have a class parameter size of 3, but u and v have size 2. Type safety is automatically built in, and you don’t have to manually code 256 classes. This is an example of the versatility I want in $ugar’s type system.
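The closest existing feature I know of is const generics in Rust. Here is a rough sketch of the same Vector idea written that way; it is an analogy, not a description of how $ugar will work, and the names mirror the example above purely for illustration.

```rust
// A fixed-size vector whose length N is part of its type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vector<const N: usize>(pub [f32; N]);

// Available for every size: both operands must share the same N.
impl<const N: usize> Vector<N> {
    pub fn dot(&self, other: &Vector<N>) -> f32 {
        self.0.iter().zip(other.0.iter()).map(|(a, b)| a * b).sum()
    }
}

// Available only when N is exactly 3.
impl Vector<3> {
    pub fn cross(&self, other: &Vector<3>) -> Vector<3> {
        let (a, b) = (&self.0, &other.0);
        Vector([
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        ])
    }
}
```

Given `u = Vector([1.0, 0.0])` and `v = Vector([0.5, 0.5])`, `u.dot(&v)` compiles, but `u.dot(&w)` with a 64-element `w` is rejected for mismatched types, and `u.cross(&v)` is rejected because `cross` is not defined for `Vector<2>`. Those are the same two errors described above, and only the sizes a program actually uses get compiled.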

To showcase the multi memory management scheme (mmms), I will discuss the oxy keyword (oxy being short for oxidized, a nod to Rust). I will discuss the implementation of the mmms in a future blog post. The compiler will put objects whose size and lifetime it knows on the stack, so a Coord struct with x and y integer fields will go on the stack. When an object’s size or lifetime is unknown, it will be allocated on the heap. When an object is heap allocated, by default it is assigned to be cleaned up by the GC, meaning you don’t have to worry about handling the memory of that data. This lets you write and prototype code much more quickly and easily than if you had to manually manage the memory or deal with the borrow checker. However, you can assign data to be handled by the Ownership Model instead of the GC by attaching the oxy keyword to the object’s type, for example oxy [i32] vs [i32]. Functions will infer whether an argument is oxidized and will run the borrow checker to ensure that the ownership rules are not being broken. You can add the oxy keyword to an object you’ve already written code for, and then add your lifetimes and change your references to abide by the borrow checker.
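For readers who haven’t used Rust, here is a small sketch of the two styles living side by side in one program. Rust has no tracing GC, so `Rc` (reference counting) is used here only as the nearest stand-in for "memory I don’t manage myself"; it is an assumption of this sketch and not how $ugar’s GC is planned to work. The function names are made up for illustration.

```rust
use std::rc::Rc;

/// Takes ownership: after calling this, the caller can no longer use
/// `data`, and the buffer is freed when this function returns.
pub fn consume(data: Vec<i32>) -> i32 {
    data.into_iter().sum()
}

/// Borrows: the caller keeps ownership and can keep using the data.
pub fn total(data: &[i32]) -> i32 {
    data.iter().sum()
}

/// Shared handle: cloning only bumps a reference count, and the buffer
/// is freed automatically when the last handle is dropped.
pub fn share(data: &Rc<Vec<i32>>) -> Rc<Vec<i32>> {
    Rc::clone(data)
}
```

The first two functions are what oxidized data would feel like: the borrow checker tracks who owns the `Vec` and for how long. The third is closer to the default case, where you hand out as many handles as you like and cleanup happens on its own.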

I will do my best to blog my progress and other cool features in between school and work. I hope you follow me on this journey of developing $ugar and that this initial blog has you excited for its development.
