I'm interested in building my own programming language and I want to know: what are your most-loved and most-hated features of any programming lang...
Features I like:
pip install library_name and for it to just work™ is awesome.
let x = if something { 1 } else { 2 }.
Result type. In Elixir, you return a tuple which includes information on whether an error occurred.
match statements like in Rust and Scala that support pattern matching. Bonus points if the compiler enforces that matches are exhaustive.
if (variable != null) { ... } checks and find it really hard to read.
dbg!(something) to print out an object and all of its data without having to implement a toString or similar)
Features I dislike:
null, NullPointerException, etc.
do/end syntax in languages like Elixir and Ruby. This is super subjective though and with modern text editors/IDEs it just becomes an aesthetic thing.
I could probably go on all day, but those are the first things that spring to mind!
Add to that:
I've been designing and building languages (now full time) for many years. You'll find you can end up adding an infinite list of things.
My advice, try writing a very simple Lisp interpreter: it can be done in under a day. Then try adding a few things.
You might also want to check out LLVM, which has a tutorial Implementing A Language With LLVM
My other advice: as soon as possible, "Eat Your Own Dogfood". The programming language, compiler services, and all its libraries should be written in the language itself. To do this, write a bare minimum compiler from your language to C, C++, Java or whatever (C++ did this initially with "cfront"). Then rewrite that simple pre-processor in your new language. Then add more features.
This is the best and most efficient way to validate your work: if you like using your own language more than some other, you are possibly on the right track.
Thanks for all the advice. I'm going to have a huge list of things to research before I even think about starting this project. I'm sure I'll come back to your comment more than a few times.
I forgot to say the most influential books for me are:
"Programming Languages: An Interpreter Based Approach" by Samuel N Kamin - though it says it's about interpreters, it's really looking at how to implement language features for various languages. This makes you feel like you could do it yourself, because it explains each feature and gives example code. One of the first books I read on the subject.
"Types and Programming Languages" by Benjamin C Pierce. Totally opposite and quite heavy reading. Assumes you can do degree level set-theoretic logic - but this book basically tells you how to build a proper type system from a mathematical perspective. The concepts are key, so it's possible to read and gloss over the maths. Often academia has the future or bleeding edge hidden in research papers, so it's worth reading these also, even if the maths goes way over one's head.
As I mentioned, Lisp is a great place to start because it has a very simple lexical and syntactical grammar, and the semantics can be expressed very minimally. A quick Google search gave me: Lisp in Less Than 200 Lines Of Code
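To give a feel for how little machinery a Lisp needs, here is a minimal sketch in Python (not taken from the linked article). It only handles integers, a handful of built-in operators, and if; the names tokenize, parse, evaluate, run and ENV are made up for this illustration.

```python
import operator

def tokenize(source):
    # Lisp's lexical grammar: parentheses and whitespace-separated atoms.
    return source.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    # Consume tokens from the front and build nested Python lists.
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    try:
        return int(token)
    except ValueError:
        return token  # anything that is not an integer is a symbol

# Hypothetical global environment with a few built-ins.
ENV = {"+": operator.add, "-": operator.sub, "*": operator.mul, "<": operator.lt}

def evaluate(expr, env=ENV):
    if isinstance(expr, str):   # symbol: look it up
        return env[expr]
    if isinstance(expr, int):   # literal: evaluates to itself
        return expr
    head, *args = expr
    if head == "if":            # special form: only one branch is evaluated
        test, then, otherwise = args
        return evaluate(then if evaluate(test, env) else otherwise, env)
    func = evaluate(head, env)  # ordinary call: evaluate everything, then apply
    return func(*[evaluate(arg, env) for arg in args])

def run(source):
    return evaluate(parse(tokenize(source)))
```

Calling run("(+ 1 (* 2 3))") walks the nested list [+, 1, [*, 2, 3]] and applies the operators from the inside out. Adding define and lambda from here is mostly a matter of more special forms and nested environments.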
Don't research absolutely everything to begin with, it's too overwhelming a subject. The basics are covered in the "Dragon Book":
Another thing I tend to do is read/use a lot of languages and steal... err... leverage... ideas. Most also have their compilers and libraries open sourced. Some interesting languages: Swift, C#, Kotlin, Rust, Julia, Scala, Clojure, Erlang.
Don't be daunted, the subject, like most, is quite deep and involved when you really look into it.
Wow! Thanks a lot, Harvey! This is a great list of books. I really appreciate it. I'll definitely have a look at Lisp.
I find the Erlang/Elixir treatment of null to be acceptable.
It (nil) is an atom (as are false, and true), definitely not conflatable with zero.
The only thing that is "dangerously" affected is "if", which fails on "false" and "nil" exclusively. Everywhere else you have to treat nil as its own entity.
I think the main thing that people look for in a modern "alternative" language is convenience and clarity.
Some specific possibilities:
And the fun part, some anti-features:
And one more thing: I think there's the most available space around asynchronous-by-default. Play with an async runtime once you're up and running. There's potential there I haven't seen anyone fully hit.
REPL is a good shout. That will definitely go hand-in-hand with developing the language at the beginning.
I think Wolfram Alpha does something similar to this, and iirc most Computer Algebra Systems do this as well, or at least they achieve the same goal, possibly with a different implementation.
Also are you going to be using flex and bison for the language? Or is it going to be a PEG parser?
I don't know what any of this means. (Sorry, I'm new to language design.) Can you elaborate?
A PEG is a parsing expression grammar. It is a way of specifying all valid strings in a language. It looks kinda like this
These would be converted into a finite state automaton that can parse a string in LL(1) or O(n). It's basically a program that takes one character after another and changes state depending on the character. Each state can accept different characters leading to different states. This way, a state can encompass multiple non-terminals.
Let's say you define a language as having members "hero" and "hello".
The way to parse a given string without backtracking or lookahead is to construct this finite state machine. Opening state accepts an h, leading to state 1. State 1 accepts e, leading to state 2. State 2 accepts l, leading to state 3, or r, leading to state 4. State 3 accepts l, leading to state 5, which accepts o, leading to a state which it is valid to end on. State 4 also accepts an o, leading to a state which it is valid to end on. In this case you could reuse state 4 for 5 if desired.
Bison and Yacc are tools that allow you to write a grammar and they will spit out a finite state machine like I described.
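The hero/hello machine described above can be written down directly as a transition table. This is a hand-written sketch in Python, not the output of Bison or Yacc; the state numbers follow the description, and state 6 is an assumed name for the shared "valid to end on" state.

```python
# (current state, input character) -> next state
TRANSITIONS = {
    (0, "h"): 1,
    (1, "e"): 2,
    (2, "l"): 3,  # on the way to "hello"
    (2, "r"): 4,  # on the way to "hero"
    (3, "l"): 5,
    (4, "o"): 6,
    (5, "o"): 6,
}
ACCEPTING = {6}  # states where the input is allowed to end

def accepts(text):
    state = 0
    for ch in text:
        # One table lookup per character: no backtracking, no lookahead.
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False  # no transition for this character
    return state in ACCEPTING
```

accepts("hero") and accepts("hello") return True; a prefix such as "hell" ends in a non-accepting state and is rejected.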
Alright, as far as I know, there are two general types of parsers, top down, and bottom up. Bottom up is more performant iirc, but is harder to get used to, while top down is less performant (O(n^3) if you know what I mean, and iirc), but I hear that they're much simpler to work with.
Ohh before I elaborate further, I'll need to know what sorts of systems and tools you're used to working with. What language, OS, compiler?
In grad school I programmed in C/C++ for 5 years, now I'm mostly Java and R, but I'm teaching myself Haskell, as well. I work on Windows, Ubuntu, and macOS. If you're asking which compiler I use to compile my own code, just gcc or whatever's available. I haven't yet looked into anything for my language project.
I found this article while researching this and it mentions yacc, which I have heard of. I wasn't aware that bison is its successor. I don't know how yacc works, though, I just recognise the name.
Ruby has a lot of expressive language features that seem like aliases for other things but are actually a bit different.
For example, and, or, and not exist which could be interesting alternatives to &&, ||, and ! which are still more common in the language, except they have subtly different behavior so you can't really interchange them per se.
Lots of little ways to cut yourself in that way.
Oh, and that's also the best feature of the language. It's been crafted to be expressive and intuitive. Not settling for inelegant solutions. Tools like [].empty? and the many, many more are really nice to have.
Of course this all comes with performance and memory bloat concerns but it's still a great tool for many jobs.
One thing I would like to emphasize in my to-be-created language is that there should, generally speaking, only be one correct way to do something. I think it would make the syntax more uniform and make things less difficult to document, etc.*
So I suppose at some point I'll have to decide between and vs. &&, not vs. ! vs. ~ and so on. Leaning toward the English-like options.
*One interesting effect this would have is that there would only be one kind of loop. No for vs. while, just some kind of loop feature.
Yeah, Ruby pretty much goes the other way completely with that. I try not to indulge it too much, but this mentality jibes with my personality, probably helps create the right coder language fit.
Spinoff!
What is your "Coder/Language Fit"
Ben Halpern
A difference between method syntax and calling a member function. Enums that are not numbers or strings (unless you say so), like enum class in C++. If I were making a language I would consider the idea of minimizing keywords by using built-in function calls instead. Just a thought.
Can you give an example of what you don't want to see vs. what you want to see? I'm not sure I understand.
In JavaScript, a function call implicitly passes this depending on the call site. In Lua there is a difference between a method call and a function being called from a namespace. This is accomplished through syntactic sugar.
In Lua:
Object:Move(1) is equivalent to Object.Move(Object, 1)
This function can be declared like so:
function Object:Move(Amount)
The syntactic sugar puts a self as the first parameter. You could do this yourself as well: function Object.Move(self, Amount) (you could name self whatever you want this way)
That means method calls can be localized and called later if necessary because self is actually a parameter: local f = Object.Move; f(Object). You can't do this in JS, but it isn't just that. There isn't a need to implicitly pass this into namespaces.
The other solution could be to wrap localized functions with an implicit this, but that would make each instance of a class have a unique version of each method.
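Python happens to make the same choice as Lua here, so the idea can be sketched there as an analogy (this is not Lua code; the Thing class and its move method are hypothetical names for the illustration).

```python
class Thing:
    def __init__(self):
        self.position = 0

    def move(self, amount):
        # self is an ordinary first parameter, as in Lua's Object.Move(self, Amount)
        self.position += amount
        return self.position

obj = Thing()

obj.move(1)         # method-call sugar: the receiver is passed as self
Thing.move(obj, 1)  # the same call with the receiver written out

f = Thing.move      # "localize" the function and call it later
f(obj, 1)
```

After the three calls obj.position is 3. All three go through one shared function, so nothing has to be wrapped or duplicated per instance.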
I would recommend under the hood using a data oriented approach. Instances of classes wouldn't hold all their data next to each other, an instance of a class would just be an integer which is the index at which its data can be retrieved from a bunch of arrays that each hold a single member of each instance. This is smarter design because it is much more frequent that at one time you might iterate through a particular property of all instances and with the way CPU caching works you can load up the right array and use just that data, without fetching unnecessary data you aren't using for comparison. Ex:
(Obviously this example is a simplification)
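A rough sketch of that layout in Python, assuming a hypothetical EntityStore with three fields. An "instance" is just the integer index returned by create, and each field lives in its own contiguous array (Python's array module is used here so the values really are stored unboxed and next to each other).

```python
from array import array

class EntityStore:
    """One array per field; an entity is only an index into those arrays."""

    def __init__(self):
        self.x = array("d")       # all x coordinates, contiguous
        self.y = array("d")       # all y coordinates, contiguous
        self.health = array("l")  # all health values, contiguous

    def create(self, x, y, health):
        self.x.append(x)
        self.y.append(y)
        self.health.append(health)
        return len(self.x) - 1    # the "instance" is this index

    def damage_all(self, amount):
        # Touches only the health array; x and y are never loaded.
        for i in range(len(self.health)):
            self.health[i] -= amount

store = EntityStore()
player = store.create(0.0, 0.0, 100)
enemy = store.create(5.0, 2.0, 40)
store.damage_all(10)
```

In a compiled language the payoff is cache behaviour: a loop like damage_all streams through one dense array instead of skipping over unrelated fields of every object.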
Thanks for all the advice! All of these are good points which I'll definitely have to consider when designing my language.
R conflating length 1 vectors and scalars is something to avoid. MATLAB does the same thing, and it was a bad idea there, too. Perl had the same error where it autoflattened an array of arrays unless you carefully inserted reference marks. So much of programming is getting data into the appropriate structure, and anything that gets in the way is a problem.
Boxing and unboxing is much more complicated than you would first think because of arrays. Say I have a class A with a subclass B. I can put an instance of B in an array of A. But when you say double[], you probably want a contiguous hunk of memory directly storing values. C++ fully exposes this semantic difference. Julia began its type system by insisting on unboxed arrays of doubles. Java compromised between making numeric computing not ridiculously inefficient and not complicating its semantics enormously.
Your gripe about Number not having any methods is spot on, and is why Stepanov invented the math that led to the Standard Template Library in C++ and part of why Common Lisp went with multiple dispatch in CLOS. This is a ubiquitous problem with single dispatch object systems.
How does it work in MATLAB and Perl? In R, they're pretty upfront about the fact that there's really no such thing as a scalar, but you can emulate it with a length-1 vector.
I'll definitely have to think about how I want data laid out for matrices and things. Lots of things to consider when you want to balance performance and syntax, etc.
What are your opinions on single vs. multiple dispatch mechanisms?
MATLAB does the same thing as R (or really, historically, R does the same thing as MATLAB): scalars are length 1 vectors. Perl 5 doesn't do that, but if you write @(1, 2, @(3, 4)), which in most languages would give you a list of length three containing two scalars and another list, you instead get a list of length four.
These choices make certain programming tasks more straightforward at the expense of all other programming tasks. Which is fine if you know the language is meant for just those tasks.
For single vs multiple dispatch, multiple dispatch is the clear winner in all respects except familiarity.
For balancing syntax, I remember the BitC folks saying that their best decision was using S-expressions for syntax until the language semantics stabilized, because they ended up changing deep things that would have been a real pain if they had a more structured syntax. Then they built a syntax besides S-expressions after the semantics stabilized.
The property tags in Go:
The fact that I can have a variable called InvitationToken in Go, that is called invitationToken in JSON and invitation_token in my database is just absolutely amazing. It resolves one of my biggest pet peeves in programming, which is mixing naming cases.
This is basically just metadata that you can attach to a variable, right?
related: stackoverflow.com/questions/108587...
That's the gist of it yes. I think you could implement the same thing with tagged template literals in JS, but Go just offers it natively. 🤓
In JS you would use decorators, which are native language features:
Null-stuff operators like in C# return nullObject?.somefunction() ?? "this is nice sometimes".
I actually like Java because of its ecosystem, but I also like checked exceptions (by this point some would think I'm crazy). Working with C# and .NET Core for a while made me realise I actually miss being forced to handle exceptions. I find myself adding more and more handlers to middleware, because a lot of things threw exceptions I didn't expect, and I did not know the handler was missing a catch block for them, so they ended up in the catch exception -> 500 "something happened" block. I'm not sure about them in general, but I definitely like them when building REST.
I like being able to extend a class, but it hides a lot of things sometimes and I find myself looking through files to see where the actual thing happened. But if used properly it's good.
Kotlin has data classes which to me sounded like a good feature but I did not like the way they are done in Kotlin so something like that would be good.
That's all I can think of now. Maybe I'll edit to include Ruby, Groovy and Scala stuff.
Is this:
...a ternary operator? Like
...in Java?
Yeah but you can chain it so it makes things a lot more readable:
Nice! That's a cool little operator.
A bit late, but yeah. Valentin answered already, but ?. checks for null, prevents a null pointer exception, and returns null as the value. ?? is the null coalescing operator: if the left side is null it takes the right-side value, like a Java ternary operator but shorter, so nullvaluestring ?? "somedefaultvalue" looks a bit prettier :D
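For anyone who hasn't used C#, here is roughly what the two operators do, spelled out as plain functions in Python. Python has neither operator, so safe_call and coalesce are hypothetical helpers written only to show the semantics.

```python
def safe_call(obj, method_name, *args):
    """Rough equivalent of obj?.method(args): skip the call when obj is None."""
    if obj is None:
        return None
    return getattr(obj, method_name)(*args)

def coalesce(value, default):
    """Rough equivalent of value ?? default: fall back only when value is None."""
    return default if value is None else value

missing = None
present = "abc"

# Rough equivalent of: missing?.upper() ?? "this is nice sometimes"
fallback = coalesce(safe_call(missing, "upper"), "this is nice sometimes")
shouted = coalesce(safe_call(present, "upper"), "this is nice sometimes")
```

Note that coalesce only reacts to None, not to other falsy values such as an empty string, which is what separates ?? from a plain "or".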
Definitely +1 for compile-time type checking. I really don't like dynamic typing. It feels messy to me.
1) A native not operator that can be prepended to any bool expression.
A simple ! is so unremarkable and not noticeable when reading. But using false == ...() is ugly
2) defer like in golang. And an option to defer on a loop, not only a function, so the action is executed right after break (not just return)
3) a crazy idea but I'm always thinking about having a break-from-if operator. When you have complex if-logic, this option makes it easy, more flattened and therefore more readable
Literally today I tried to break from an if in Java and got a compiler error. It would be a super useful addition.
Both Perl 6 and Common Lisp have a concept of Rational Number which exactly preserves the ratio of two integers. They also boast accurate floating point math. Perl 6 is relatively new -- and not much like Perl 5 if you have heard bad things about the latter.
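To illustrate what a rational number type buys you, here is a small sketch using Python's standard fractions module (a library stand-in, not the Perl 6 or Common Lisp built-in types).

```python
from fractions import Fraction

# Rationals keep numerator and denominator as exact integers.
third = Fraction(1, 3)
sixth = Fraction(1, 6)
exact_sum = third + sixth  # reduced automatically to 1/2

# The classic binary floating-point surprise...
float_matches = (0.1 + 0.2 == 0.3)

# ...does not occur when the same values are kept as ratios.
rational_matches = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))
```

Here float_matches is False while rational_matches is True. The cost is that numerators and denominators can grow without bound during long computations.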
Well, much of what is in D. Mostly it would be nice to have different defaults.
On the other hand there is Lua. It embeds nicely into D and there are things I don't like.
Ranges and slices are two things I definitely want to implement from the get-go.
1-based indexing isn't even on the table!
One thing that's pretty similar to your "storing the formula used to calculate the number and then calculating the precision on-demand" idea is exact real arithmetic. Several implementations exist for Haskell. One of the downsides of this approach, besides performance, is that equality is undecidable. The best you can do is determine that two numbers are within a certain distance of each other.
Thanks for the link!
That's true, but it's also true with floating-point numbers in any programming language. Doing something like setting the default precision to the number of significant digits would eliminate this problem, I would think?
If you set x = 3.0 * 0.20 (= 0.60 @ 2 sig digits) and y = 0.599 * 1.0 (= 0.60 @ 2 sig digits) then y and x are equivalent when only significant figures are considered. Doing something like y - x would yield 0.599 * 1.0 - 3.0 * 0.20 = -0.001, which, to 2 significant figures, is zero. That's equality.
What do you think?
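One possible reading of that idea, sketched in Python: round both values to the given number of significant digits and compare the rounded results. round_sig and sig_equal are hypothetical helper names, and this ignores how the precision would be tracked through a calculation.

```python
from math import floor, log10

def round_sig(value, digits):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    # Position of the leading digit decides how many decimals to keep.
    decimals = digits - 1 - floor(log10(abs(value)))
    return round(value, decimals)

def sig_equal(a, b, digits):
    """Treat a and b as equal if they agree to `digits` significant digits."""
    return round_sig(a, digits) == round_sig(b, digits)

x = 3.0 * 0.20
y = 0.599 * 1.0
same_at_two = sig_equal(x, y, 2)
same_at_three = sig_equal(x, y, 3)
```

With these inputs same_at_two is True and same_at_three is False. One caveat with this scheme is that it is not transitive: two values on either side of a rounding boundary compare unequal no matter how close they are.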
For your first suggestion, I would seriously give this talk a watch, and learn from the very well-fought struggles of another PL:
youtube.com/watch?v=C2RO34b_oPM
I think Julia's type system is quite fascinating, because it's used in a totally different way from pretty much every other PL.
Elixir is a language that really optimizes for programmer joy, one of the best PL features is the pipe operator.
I can do this:
result = value
|> IO.inspect(label: "value")
|> function_1
|> IO.inspect(label: "result 1")
|> function_2(with_param)
|> IO.inspect(label: "result 2")
|> function_3
|> IO.inspect(label: "result")
instead of:
value
println("value: $value")
r1 = function_1(value)
println("result 1: $r1")
r2 = function_2(r1, with_param)
println("result 2: $r2")
result = function_3(r2)
Indispensable for easy-to-read code and println debugging.
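For a language without a pipe operator, the same shape can be approximated with a helper. This is a sketch in Python, where pipe and inspect are made-up names and the lambdas stand in for function_1 and function_2.

```python
from functools import reduce

def pipe(value, *steps):
    """Feed value through each step in order, like a chain of |> calls."""
    return reduce(lambda acc, step: step(acc), steps, value)

def inspect(label):
    """Like IO.inspect(label: ...): print the value and pass it on unchanged."""
    def tap(value):
        print(f"{label}: {value}")
        return value
    return tap

result = pipe(
    3,
    inspect("value"),
    lambda v: v + 1,   # stand-in for function_1
    inspect("result 1"),
    lambda v: v * 2,   # stand-in for function_2
    inspect("result"),
)
```

Because inspect returns its input unchanged, the debug steps can be added or deleted without touching the rest of the chain, which is the property that makes the Elixir version pleasant.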
That is pretty neat. Thanks for the suggestions! I'll check out that video ASAP.
Pattern matching is a must.
Haskell like syntax (small but expressive).
Lambdas are the best