This article is the note and thought I get from the conference.
I would mention the following topics:
- type checker for Ruby(steep)
- GC compaction
Basically, I understand what Matz(creator of Ruby) want is a type inferer generate a type definition file, then we can modify it to be more accurate. Finally, the checker check program based on the file modified by you. So that we can avoid typing any extra information but still have a static type analyzer.
Few questions I have are:
- what if I regenerate the type definition file after I modified it?
- seems like would be replaced right now
how it avoids Hindley Milner type system limit?
For example, a mutable cell holding a list of values of unspecified type.
p.s. What we can do is value restriction here. Anyway, we need some extension for the type system.
Then we keep going since I have more talk with Steep creator(Soutaro Matsumoto) XD.
Not going to talk about too many type inference here but I would talk a little bit for the question from Soutaro Matsumoto.
The problem he has is:
class Array<T> def `=`: # for arr = T.new (Integer, T) -> T # for arr[0, 1] = T.new | (Integer, Integer, T) -> T # for arr[0, 1] = [T.new] | (Integer, Integer, Array<T>) -> Array<T> # for arr[0..1] = T.new | (Range<Integer>, T) -> T # for arr[0..1] = [T.new] | (Range<Integer>, Array<T>) -> Array<T> end
Now we have a wrong typed example:
arr = [1, 2, 3] # Array<Integer> arr = "foo"
From human view, we expect an error like
expected: Integer but got: String.However, for overloading, we have to try other methods since we can’t sure it would match or not. So the checker keeps going on, and failed at the final definition and report
expected: Array<Integer> but got: String which is a little bit confusing.
Let’s consider more general overloading, at here general means semantic is less limited if the overloading is a normal method than an operator.
class C<T> def foo: (Integer, T) -> T | (Integer, Integer, T) -> T end
I think, in this case, we can’t do more than say can’t find a method for blablabla. For example:
c = C<String>.new c.foo(1, 2)
We have no idea user is trying to match
(Integer, Integer, T) -> T by missing one argument, or trying to type
c.foo(1, "2") but wrong.
However, in my understanding, we can’t add more operators into Ruby by the problem.
So operators actucally are different than normal methods. For example:
arr = [1, 2, 3] arr = 2 arr[0, 2] = 9
arr[0, 2] = didn’t follow a right hand side expression, it actually represents the same type as
arr = 2 which is obviously wrong. So we can depend on this extra semantic and redefine the type of
= operator to:
class Array<T> def `=`: (Integer) -> (T) -> T | (Integer, Integer) -> (T) -> T | (Integer, Integer) -> (Array<T>) -> Array<T> | (Range<Integer>) -> (T) -> T | (Range<Integer>) -> (Array<T>) -> Array<T> end
But since we actually don’t want the assign missing the right-hand side so we have to have another special notation for this special function, so it could be:
class Array<T> def `=`: (Integer) => (T) -> T | (Integer, Integer) => (T) -> T | (Integer, Integer) => (Array<T>) -> Array<T> | (Range<Integer>) => (T) -> T | (Range<Integer>) => (Array<T>) -> Array<T> end
=> cuts semantic steps.
-> followed by the finally return type.
class Array<T> def `=`: [Integer] = T -> T | [Integer, Integer] = T -> T | [Integer, Integer] = Array<T> -> Array<T> | [Range<Integer>] = T -> T | [Range<Integer>] = Array<T> -> Array<T> end
p.s. After I mention the idea with Soutaro Matsumoto, he points out the new syntax might not a good idea, that also make sense in the context. Since break current program won’t be anyone wants. But in type checker can use the abstraction to do better inference.
Why we need compaction? Without compaction, we can have out of memory due to the fragmented heap. What is a fragmented heap?
Without compaction, we would find there still has enough space but is not contiguous space so can’t allocate the object.
Algorithm use by Lisp. The idea is using two pointers, one move from left to right(ideally) called free, one from right to left called scan, free stops while getting an empty chunk, scan stops while getting a non-empty chunk, when both stop, swap the chunk, end this process until they point to the same chunk, then we update all references.
Cons: Only work with a fixed-size heap -> How to improve it?
When a pointer in C points to a reference from Ruby, and GC moved the object or delete it, what would happen? We can separate into different conditions:
- moved, in this condition, null reference caused exit
- delete, same as the previous one
- moved and move another object in, in this case, UB would happen, since you might treat an array object as a string object, the behavior is random, this is the craziest condition
To avoid them, we have to tell GC what is hold by C pointer, so in C code not just allocate and free memory also have to write something like
gc_c_mark(pointer_to_ruby_object) so GC know these objects must be pinned, it can’t move them.
Following things are related to how to allow movement in C
GC calls the C callback function after compaction is done, give C a chance to update its reference.
No Pin marking
use no pin function for reference object, so GC can manage the object.
new location function
rb_gc_locationwould return the new location of the object.
Ruby Object C Object | | | | \---> Third Object <---/
Ruby automatically marked some object(gc_mark_no_pin), and not mark from C.
So when compacter run, GC would move the Third Object however C still reference to old place so program explode.
- Maximum Chaos: doubling Heap
- Zombie Objects: let some slots can’t be GC, always use a zombie object fill it
- Address Sanitizer: https://github.com/google/sanitizers/wiki/AddressSanitizer
For the rest part I don’t have enough knowledge to sort out or I didn’t take a look or I’m lazy. Here are video links.
- Automatic Differentiation for Ruby
- Virtual Machines: their common parts and what makes them special
- The Journey to One Million
Yes, another Ruby conference I join though I didn’t know Ruby :).