How CRuby decides an `if` is a modifier

Yutaka HARA — Sat, 06 Feb 2021 16:27:15 +0000

Ruby has two ways to write if.

if foo then bar end
foo if bar

This reads naturally to humans, but not to machines. For example, can you tell whether this code is valid?

p if 1 then 2 else 3 end

The answer is:

$ ruby -e 'p if 1 then 2 else 3 end'
-e:1: syntax error, unexpected `then', expecting end-of-input

This is because the if here is recognized as a "modifier if", not a "keyword if". So how does Ruby decide the type of an if?
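You can reproduce this check from Ruby itself. The following is a minimal sketch using Ripper from the standard library; the helper name valid_syntax? is made up for this example. It relies on Ripper.sexp returning nil when the source does not parse.

```ruby
require "ripper"

# Hypothetical helper: true when Ripper can parse the source.
# Ripper.sexp returns nil on a syntax error.
def valid_syntax?(src)
  !Ripper.sexp(src).nil?
end

# Modifier if followed by a stray `then`: rejected.
puts valid_syntax?("p if 1 then 2 else 3 end")
# Inside parentheses the same if is a keyword if: accepted.
puts valid_syntax?("p(if 1 then 2 else 3 end)")
```

Adding the parentheses is enough to make the code valid, which already hints that what comes just before the if matters.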

parse.y

The answer should be in parse.y, which defines Ruby's grammar.

In parse.y, you will see both keyword_if and modifier_if. They are separate tokens, which means the type of an if is decided by the lexer, not the parser.

lex.c.blt

By grepping for modifier_if, you will find that lex.c.blt has a table of keywords, used by the function rb_reserved_word.

#line 31 "defs/keywords"
      {gperf_offsetof(stringpool, 33), {keyword_if, modifier_if}, EXPR_VALUE},

parse.y

The lexer starts from yylex. It calls parser_yylex, which handles the symbols like +, -, etc. If the character is not a symbol, parse_ident is called.

parse_ident uses rb_reserved_word to check whether the token at the current position is a keyword. The returned kw is an entry of the table we have seen in lex.c.blt.

    /* See if it is a reserved word.  */
    kw = rb_reserved_word(tok(p), toklen(p));

For if, kw->id[0] corresponds to keyword_if and kw->id[1] corresponds to modifier_if.

id holds two values so that keywords and modifiers can be told apart. According to lex.c.blt, Ruby has five modifiers.

x if y
x unless y
x while y
x until y
x rescue y
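The five modifier forms can also be observed from Ruby. This is a small sketch using Ripper.sexp, whose tree uses a separate node type with a _mod suffix for each modifier; the helper node_types is hypothetical and exists only for this example.

```ruby
require "ripper"

# Hypothetical helper: every symbol that appears in Ripper's S-expression,
# which includes the node types.
def node_types(src)
  Ripper.sexp(src).flatten.grep(Symbol)
end

%w[if unless while until rescue].each do |word|
  src = "x #{word} y"
  found = node_types(src).include?(:"#{word}_mod")
  puts "#{src}: #{word}_mod node present? #{found}"
end

# For contrast, the keyword form produces an :if node, not :if_mod.
puts node_types("if y then x end").include?(:if_mod)
```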

When an `if` is a modifier

This is the condition that distinguishes keyword_if from modifier_if. In short, an if is a keyword if the lexer state is EXPR_BEG (the check also accepts EXPR_LABELED); otherwise, it is a modifier.

            if (IS_lex_state_for(state, (EXPR_BEG | EXPR_LABELED)))
                return kw->id[0];
            else {
                if (kw->id[0] != kw->id[1])
                    SET_LEX_STATE(EXPR_BEG | EXPR_LABEL);
                return kw->id[1];
            }
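To make the branch easier to follow, here is a toy model of it in Ruby. It is only a sketch: the bit values are arbitrary placeholders rather than CRuby's real constants, the table row is reduced to its id pair, the SET_LEX_STATE side effect is left out, and it assumes IS_lex_state_for means "any of these bits is set".

```ruby
# Placeholder bit flags; the real values live in CRuby's source.
EXPR_BEG     = 1 << 0
EXPR_END     = 1 << 1
EXPR_ARG     = 1 << 2
EXPR_LABELED = 1 << 3

# Mirrors one row of the keyword table: id[0] is the keyword token,
# id[1] is the modifier token.
IF_KW = { id: [:keyword_if, :modifier_if] }.freeze

# Returns the token the lexer would hand to the parser for this state.
def token_for(kw, state)
  if (state & (EXPR_BEG | EXPR_LABELED)) != 0
    kw[:id][0]
  else
    kw[:id][1]
  end
end

puts token_for(IF_KW, EXPR_BEG)  # keyword_if
puts token_for(IF_KW, EXPR_END)  # modifier_if
puts token_for(IF_KW, EXPR_ARG)  # modifier_if
```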

The lexer state

Among the states of the lexer, EXPR_BEG, EXPR_END and EXPR_ARG are the most important. They decide whether operators like + and - are unary or binary. For example:

1 - 2: This is binary minus because the state is EXPR_END after the 1.
foo(-1): This is unary minus because the state is EXPR_BEG after the (.

EXPR_ARG is a bit tricky: in this state, the meaning of - depends on whether a space follows it.

foo - 1: binary minus
foo -1: unary minus
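The difference is easy to see by running it. In this sketch, foo is a hypothetical method with an optional argument, so that both readings of the minus sign are legal calls.

```ruby
# Hypothetical method: returns its argument, or 10 when called without one.
def foo(x = 10)
  x
end

a = foo - 1   # binary minus: foo() - 1
b = foo -1    # unary minus: foo(-1)

puts a
puts b
```

The two lines differ only by one space, yet the first subtracts from the result of foo and the second passes -1 as the argument.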

What is interesting is that this rule is not so difficult for humans. The former "looks like" binary and the latter "looks like" unary. So you will actually never be bothered by this, unless you are implementing the parser.

keyword if and modifier if

Now you can tell whether an if is a keyword or a modifier by checking the lexer state.

foo() if ...: This is modifier_if because the state is EXPR_END after the ).
foo(if ...): This is keyword_if because the state is EXPR_BEG after the (.
foo if ...: This is modifier_if because the state is EXPR_ARG after the foo, just before the if.
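Ripper.lex exposes the lexer state recorded with each token, so the three cases can be inspected directly. This is a sketch under assumptions: state_before_if is a hypothetical helper, and it takes the state attached to the token right before the if as the state the lexer is in when it reaches the if. The state names Ripper prints can be more fine-grained than the simplified EXPR_END / EXPR_ARG picture above; what matters for the decision is only whether the EXPR_BEG (or EXPR_LABELED) bit is set.

```ruby
require "ripper"

# Hypothetical helper: the lexer state recorded on the token just before
# the first `if` in the source.
def state_before_if(src)
  tokens = Ripper.lex(src)
  index = tokens.index { |_pos, event, text, _state| event == :on_kw && text == "if" }
  tokens[index - 1][3]
end

MASK = Ripper::EXPR_BEG | Ripper::EXPR_LABELED

["foo() if bar", "foo(if bar then 1 end)", "foo if bar"].each do |src|
  state = state_before_if(src)
  kind = (state.to_int & MASK) != 0 ? "keyword" : "modifier"
  puts "#{src.ljust(24)} state=#{state} => #{kind}"
end
```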

Why this matters to me

I think most Rubyists do not care about corner cases like this. However, I needed to figure this out because I am making my own programming language, Shiika, which has Ruby-like syntax.

As you have seen, parsing Ruby-like syntax is not easy, especially method calls without parentheses. I would be happy if this entry helps someone who wants to make a Rubyish language.

Automatically prepare windows at tmux startup

Yutaka HARA — Wed, 06 Mar 2019 17:33:12 +0000

Do you have a rule for tmux windows? For example, on which window do you start your editor? Here is my rule.

Window 0: reposh
Window 1: vim
Window 2: zsh
Window 3: Server process (e.g. rails server)

This means that every time I start tmux, I open four windows and run these. Today I learned how to do it automatically.

Command file

The first step is to write a tmux command file. This is a list of tmux commands that I want to run:

$ cat ~/proj/dotfiles/tools/tmux_multi
new-window
new-window
new-window
select-window -t 0
send-keys 'reposh' Enter
select-window -t 1
send-keys 'vim' Enter

Then you can load this file at tmux startup by:

$ tmux new-session ";" source-file ~/proj/dotfiles/tools/tmux_multi

Alias

Of course this is too long to remember. So I added an alias to my .zshrc:

alias tmux!='tmux new-session ";" source-file ~/proj/dotfiles/tools/tmux_multi'

Now running tmux! sets up the four windows for me.

Tips: setting iTerm2 tab title

If you are using iTerm2 on Mac, it is useful to set the tab title before starting tmux.

alias tmux!='TAB_NAME=`ruby -e "print Dir.pwd.split(\"/\").last"`; echo -ne "\e]1;$TAB_NAME\a"; tmux new-session ";" source-file ~/proj/dotfiles/tools/tmux_multi'
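The ruby -e part of this alias only extracts the last component of the current directory, and the echo wraps it in an escape sequence. As a sketch, here is the same idea as a standalone Ruby script. The helper name tab_title_sequence is hypothetical, and File.basename is a substitution on my part for the split("/").last used in the alias.

```ruby
# Hypothetical helper: builds the same escape sequence the alias echoes,
# "\e]1;" + name + "\a", using the last component of the directory as the name.
def tab_title_sequence(dir = Dir.pwd)
  "\e]1;#{File.basename(dir)}\a"
end

# Emit the sequence for the current directory.
print tab_title_sequence
```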

DEV Community: Yutaka HARA
