DEV Community

loading...

Writing an Assembler in Rust, and How I'm Redoing the Lexer

Ashton Scott Snapp
Hello.
・3 min read

I've continued work on the assembler I've been working on. I finished the Lexer, got it to compile, ran it, tested it, and found that the lexer just did not work. Luckily, I found a crate called logos that helps you make a fast lexer, which I'm using in order to re-do the lexer.

GitHub logo AshtonSnapp / chasm

The Official Cellia Cross-Assembler for Modern Computers

chasm

The Official Cellia Cross-Assembler for Modern Computers

Building chasm

Clone this repository to your local machine, cd into the chasm directory, and run cargo build. Simple!




As of right now, I am writing some callback functions. Specifically, I'm writing the one responsible for handling character immediates (immediate in the sense that the processor doesn't have to fetch an address then fetch the value from that address, it just has to fetch the value). This is being done via a giant match statement. Here's basically what the code for this function looks like right now:

fn char(lex: &mut Lexer<Token>) -> Result<u8, ()> {
    let slice: &str = lex.slice();

    let poss_char: &str = &slice[slice.len() - 2];

    // Welcome to hell.

    match poss_char {
        "\x00" => Ok(0),
        "\x01" => Ok(1),
        "\x02" => Ok(2),
        "\x03" => Ok(3),
        "\x04" => Ok(4),
        "\x05" => Ok(5),
        "\x06" => Ok(6),
        "\x07" => Ok(7),
        "\x08" => Ok(8),
        "\x09" => Ok(9),
        "\x0A" => Ok(10),
        "\x0B" => Ok(11),
        "\x0C" => Ok(12),
        "\x0D" => Ok(13),
        "\x0E" => Ok(14),
        "\x0F" => Ok(15),
        "\x10" => Ok(16),
        "\x11" => Ok(17),
        "\x12" => Ok(18),
        "\x13" => Ok(19),
        "\x14" => Ok(20),
        "\x15" => Ok(21),
        "\x16" => Ok(22),
        "\x17" => Ok(23),
        "\x18" => Ok(24),
        "\x19" => Ok(25),
        "\x1A" => Ok(26),
        "\x1B" => Ok(27),
        "\x1C" => Ok(28),
        "\x1D" => Ok(29),
        "\x1E" => Ok(30),
        "\x1F" => Ok(31),
        "\x20" => Ok(32),
        "\x21" => Ok(33),
        "\x22" => Ok(34),
        "\x23" => Ok(35),
        "\x24" => Ok(36),
        "\x25" => Ok(37),
        "\x26" => Ok(38),
        "\x27" => Ok(39),
        "\x28" => Ok(40),
        "\x29" => Ok(41),
        "\x2A" => Ok(42),
        "\x2B" => Ok(43),
        "\x2C" => Ok(44),
        "\x2D" => Ok(45),
        "\x2E" => Ok(46),
        "\x2F" => Ok(47),
        "\x30" => Ok(48),
        "\x31" => Ok(49),
        "\x32" => Ok(50),
        "\x33" => Ok(51),
        "\x34" => Ok(52),
        "\x35" => Ok(53),
        "\x36" => Ok(54),
        "\x37" => Ok(55),
        "\x38" => Ok(56),
        "\x39" => Ok(57),
        "\x3A" => Ok(58),
        "\x3B" => Ok(59),
        "\x3C" => Ok(60),
        "\x3D" => Ok(61),
        "\x3E" => Ok(62),
        "\x3F" => Ok(63),
        "\x40" => Ok(64),
        "\x41" => Ok(65),
        "\x42" => Ok(66),
        "\x43" => Ok(67),
        "\x44" => Ok(68),
        "\x45" => Ok(69),
        "\x46" => Ok(70),
        "\x47" => Ok(71),
        "\x48" => Ok(72),
        "\x49" => Ok(73),
        "\x4A" => Ok(74),
        "\x4B" => Ok(75),
        "\x4C" => Ok(76),
        "\x4D" => Ok(77),
        "\x4E" => Ok(78),
        "\x4F" => Ok(79),
        "\x50" => Ok(80),
        "\x51" => Ok(81),
        "\x52" => Ok(82),
        "\x53" => Ok(83),
        "\x54" => Ok(84),
        "\x55" => Ok(85),
        "\x56" => Ok(86),
        "\x57" => Ok(87),
        "\x58" => Ok(88),
        "\x59" => Ok(89),
        "\x5A" => Ok(90),
        "\x5B" => Ok(91),
        "\x5C" => Ok(92),
        "\x5D" => Ok(93),
        "\x5E" => Ok(94),
        "\x5F" => Ok(95),
        "\x60" => Ok(96),
        "\x61" => Ok(97),
        "\x62" => Ok(98),
        "\x63" => Ok(99),
        "\x64" => Ok(100),
        "\x65" => Ok(101),
        "\x66" => Ok(102),
        "\x67" => Ok(103),
        "\x68" => Ok(104),
        "\x69" => Ok(105),
        "\x6A" => Ok(106),
        "\x6B" => Ok(107),
        "\x6C" => Ok(108),
        "\x6D" => Ok(109),
        "\x6E" => Ok(110),
        "\x6F" => Ok(111),
        "\x70" => Ok(112),
        "\x71" => Ok(113),
        "\x72" => Ok(114),
        "\x73" => Ok(115),
        "\x74" => Ok(116),
        "\x75" => Ok(117),
        "\x76" => Ok(118),
        "\x77" => Ok(119),
        "\x78" => Ok(120),
        "\x79" => Ok(121),
        "\x7A" => Ok(122),
        "\x7B" => Ok(123),
        "\x7C" => Ok(124),
        "\x7D" => Ok(125),
        "\x7E" => Ok(126),
        "\x7F" => Ok(127),
        _ => Err(())
    }
}
Enter fullscreen mode Exit fullscreen mode

Yes. I had to write all of that. Because I can't really guarantee that whoever's using the assembler has Unicode support in their program. That whole function was painful to write. At least now, all I have to write in terms of callback functions are the ones for character escape sequences, strings, addresses, immediates, identifiers (labels and symbols), and actual instruction mnemonics. (Also, need to stop trying to Ctrl+S while using a browser)

Discussion (1)

Collapse
ashtonsnapp profile image
Ashton Scott Snapp Author

Quick edit notice: all character related callbacks are complete - the char function in this post was edited to reflect its current state, and char_esc_hex is now hell 2: electric boogaloo since it now contains all 256 different char values.