DEV Community

knut


A Framework Laptop Hacking Story

I bought a Framework Laptop a few months ago. I was really drawn to the idea of a laptop that I could customize and which was built with repairability in mind. That's a really great stance to take, and it didn't hurt that the laptop build and specs looked good, so I went for it. It was also cool that I could buy it with no memory, no hard drive, no power adapter, and no operating system, and supply those things separately. In general I was very happy with it, but it had one little behavior that bugged me. This is a long journey, but I hope it contains some useful information.

The keyboard problem

I noticed sometimes when typing or editing files, keys wouldn't repeat properly when held down. I noticed it most when pressing and releasing combinations of keys in quick succession. Eventually I figured out a simple and concise repro. You can try this at home on your computer and compare the behavior:

  1. Press and hold a key, say 'A'
  2. Press and hold another key, say 'B'. Now you have two keys held down.
  3. Release 'A' but keep 'B' held down

Expected: since you still have the 'B' key held down, it should continue emitting repeated keystrokes.
Actual: no keystrokes are being emitted.

I noticed this reproed not only in all programs in Windows, but even when navigating BIOS settings, which gave me a strong feeling that it was either a hardware or firmware issue.

Support response

Framework support engaged readily with me on it and immediately replaced my Input Cover free of charge, no questions asked. I was already sort of skeptical about whether this would work, but I didn't think it was worth turning down this free troubleshooting.

Sure enough, the problem kept happening with the new Input Cover. By this point I had read some of the source code and was pretty sure I had spotted the bug. In keyboard_8042.c, there's a function called keyboard_state_changed that is called for each key when it changes state between being pressed or not pressed. It has the following code in it:

if (is_pressed) {
    keyboard_wakeup();
    set_typematic_key(scan_code, len);
    task_wake(TASK_ID_KEYPROTO);
} else {
    clear_typematic_key();
}

If a key is pressed, set it up as the new "typematic" (i.e. repeated) key. That sets up timers so that it is automatically sent at some configured interval to do the repeating. However, if it is released, clear the typematic configuration so that no key is being repeated, irrespective of which key that was. In other words, releasing any key will stop any other key repeating.
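To make the consequence concrete, here is a minimal sketch that replays the three repro steps against a toy model of that branch. The names `repeating_key`, `key_state_changed`, and `run_repro` are hypothetical stand-ins, not the EC's real functions, and the single-byte key value is a simplification of the real scan code buffer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the EC's typematic state: 0 means nothing repeats. */
static uint8_t repeating_key = 0;

/* Toy model with the same shape as the branch quoted above. */
void key_state_changed(uint8_t key, bool is_pressed)
{
    if (is_pressed) {
        repeating_key = key;  /* plays the role of set_typematic_key() */
    } else {
        repeating_key = 0;    /* plays the role of clear_typematic_key():
                                 it never checks which key was released */
    }
}

/* Replays the repro and returns the key left repeating (0 if none). */
uint8_t run_repro(void)
{
    key_state_changed('A', true);   /* 1. press and hold A */
    key_state_changed('B', true);   /* 2. press and hold B */
    key_state_changed('A', false);  /* 3. release A, B is still held */
    return repeating_key;
}
```

In this model `run_repro()` returns 0: releasing 'A' wipes the repeat state even though 'B' is still held, which matches the "Actual" behavior in the repro.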

Bug or no bug, this isn't how basically every other computer I've used in my life works. I reported the code behavior to Framework support, who checked with their firmware engineer and unsurprisingly said "by design", because the same code is present in the upstream ChromeOS Embedded Controller repo.

So every ChromeOS-based computer will have this behavior. That's a lot of devices. Is it a bug? At this point, who knows. It's definitely a behavior difference. Is it desirable? Some people are probably used to it by now, so there isn't a clear-cut answer. I doubt this behavior will get changed at this point. So, what am I to do?

Framework Hacking

One of the coolest things about the Framework laptop is that because they published their embedded controller (EC) source code, it's possible to rebuild and reflash the firmware, which includes this keyboard code. Although Framework itself doesn't publish guides on this (more thoughts on this later), members of the community like DHowett have. He has a whole series of posts on how to rebuild the EC and flash it. In fact I haven't found any other comprehensive resources like that. It's really amazing stuff, and I'm thankful for it.

I started working my way through the guide on modifying the EC firmware. It was going pretty smoothly. As an aside, I installed WSL so that I had a reasonable dev environment, because I'm a scrub who does all his dev work on Windows. I'd never used WSL before, and I was amazed at how easy it was to follow Linux-centric instructions pretty much to the letter. You get a fully fledged Linux environment running in a Windows command prompt. It's actually magic.

Anyway, with the success there, I was able to spit out "ec.bin", which I could flash to my device. At this point, I promptly chickened out.

Hacking with safety

I am way too risk-averse to potentially brick my $2000+ laptop due to a coding bug or some issue with the firmware image I produce. I had to have a backup plan in case something went wrong. I took to the Framework forums to ask if others had tools or suggestions on a backup plan.

It turns out I might be more risk averse than anyone else...? Haha... or everyone else already has the knowhow to fix a problem like this on their own if it arises. I have just dabbled in embedded programming, so that is probably the case.

There are basically three potential backup routes that I could see:

  • use the JECDB header to connect an SWD probe like a Picoprobe and debug the firmware
  • use a UART debug adapter from the USB-C port
  • connect a flash programmer directly to the flash chip that holds the EC firmware and write a backup to it

The JECDB header (labeled JSWDB) is not populated and is so tiny that it would require microsoldering. I don't have a hot air station or any experience here, so I wasn't too keen to try that, although it would involve some new toys. Who doesn't love new toys?

Upper-right corner of the top side of the Framework 12th gen mainboard, showing the JECDB header, labeled JSWDB

The UART console would be cool and useful. Someone actually used to make a thing called a SuzyQable that would do this. Actually DHowett made a limited run of similar functionality in a Framework laptop expansion card.

But most people on the forum suggested getting a flash programmer and interfacing directly with the flash chip.

Equipment for chip flashing

I didn't realize this before, but many flash chips speak protocols that are well known by tools, so you can take an unpowered one, touch the pins with a connector to the programmer, and read, erase, and write contents. There is a very good write-up I referenced a lot about unbricking a Chromebook (which this laptop sort of is, in this respect).

From that guide, I learned some of the equipment I would need: the CH341a flash programmer itself, potentially a voltage adjuster, and a chip clip or other type of connection to the exposed pins on the mainboard. Not too surprisingly, it's possible to find a bundle with all of this flash programming equipment together. The bundle came with a SOIC-8 chip clip. I wasn't sure yet if that would be the right type.

Here's a useful pic of the programmer, taken from that Unbricking page. I had to consult it a bunch of times to remember the orientation of the pins.

A CH341a USB flash programmer annotated showing pin order 4/3/2/1 starting at the top-left pin and 5/6/7/8 starting at the bottom-left pin, with the USB connector on the left side

Finding the flash chip, part 1

The unbricking guide of course didn't tell me where specifically to find the flash chip on my laptop's motherboard, so I had to go hunting. Its advice was to look for Winbond chips, and I found one on the top side.

Framework 12th gen mainboard close up photo of the battery receptacle and a Winbond 25R256JVEN chip

This one looks like a Winbond 25R256JVEN. The chip package is a WSON-8 8mm x 6mm. The SOIC-8 clip, while it had the right dimensions and spacing between pins, isn't physically compatible with the extremely flush mounting that WSON-8 has; the chip's leads don't stick out far enough for a chip clip to attach to it. I would need to buy a test probe instead.

Another important point is the voltage level that the chip requires. From the Winbond web page I saw that Vcc is 2.7V - 3.6V, so it would accept 3.3V from the flash programmer and I wouldn't need to use the 1.8V voltage adjuster.

I found a ton of test probes on AliExpress. Here's the one I ended up buying. Description is "2023 DFN8 QFN8 WSON8 Chip Probe Line Read Write Burning Test Adapter Socket 1.27 6x8 5x6 for CH341A TL866 RT809H/F Programmer". I actually had ordered another one from eBay before that, and it came damaged with one pin bent, which made it effectively useless. You know, it's really frustrating that all this test probe does is hold 8 pins in a particular shape. It has no logic, nothing complicated at all about it. But I don't have any other way of holding 8 wires touching the chip's pins all at the same time, so I'm beholden to this. Curse these inadequate human hands! And it's doubly frustrating because the only place I can get one of these is overseas, so I had to wait 2-3 weeks for it to arrive.

When it finally did arrive, I was able to use a program called AsProgrammer to use the CH341a flash programmer to read the contents of the flash chip. Here are some things I noticed about using this tool:

  • if you don't have the test probe firmly touching the flash chip's leads, it won't detect the flash chip type correctly
  • however, once you start the "Read IC" operation to dump the contents, if your test probe doesn't have firm contact, it will silently read zeroes

Screenshot of the AsProgrammer tool, showing a hex dump mostly filled with 0xFF and the IC identified as W25Q80DV

Therefore, after dumping the flash it's important to execute a "Verify IC" command, which compares the dumped buffer with the flash contents. If it fails, it means you moved the test probe during either the "Read IC" or the "Verify IC" operation. Likewise, whenever you write to the flash chip, you need to do a "Verify IC" command afterwards to make sure that you didn't lose contact during the writing process. Even with a pretty steady hand, I messed up the read operation a couple of times. Honestly, it's terrible, and I would really rather have a better way.
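As an extra offline sanity check on top of "Verify IC", one option is to dump the chip twice and only trust the result if both dumps agree and the data isn't blank. This is a minimal sketch of that idea, not something AsProgrammer does for you; the function names are hypothetical, and it assumes both dumps have already been loaded into memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if every byte in the buffer has the same value (e.g. all 0x00 or 0xFF). */
bool dump_is_uniform(const uint8_t *buf, size_t len)
{
    for (size_t i = 1; i < len; i++) {
        if (buf[i] != buf[0]) {
            return false;
        }
    }
    return true;
}

/* Trust a dump only if it is non-empty, not blank, and two reads agree. */
bool dump_looks_trustworthy(const uint8_t *read1, const uint8_t *read2,
                            size_t len)
{
    if (len == 0) {
        return false;
    }
    if (dump_is_uniform(read1, len)) {
        return false;  /* a whole chip of one byte value suggests lost contact */
    }
    return memcmp(read1, read2, len) == 0;
}
```

Two matching reads don't prove the dump is right, but a mismatch or an all-one-value buffer is a cheap signal that the probe slipped.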

Top-side 32 MB flash chip

Anyway, on to the chip contents. Firstly, I'm sharing all the things I dumped from the mainboard, plus close up pictures online. I encountered a real dearth of pictures of the under-side of the mainboard, so I took a bunch of close-up pics so people can look at what ICs are found there. They're not the best quality but it's better than nothing.

I wasn't quite able to figure out what this flash was. It was much bigger than the 524,288 bytes that the EC firmware image normally is. I was able to find a copy of the EC firmware image in it at offset 0x1000, but I'm not sure why it's there. I uploaded my backup of this chip under the name "top-side_near-cmos-battery_Winbond_25R256JVEN.bin", but I didn't feel this was the important backup to make.
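For anyone who wants to repeat that kind of search, here is a minimal sketch of locating a smaller image inside a larger dump. It assumes both files are already loaded into memory and that the embedded copy is byte-for-byte identical to the image being searched for, which is an assumption; `find_image` is a hypothetical helper, not part of any tool mentioned here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the offset of the first copy of image inside dump, or -1 if absent. */
long find_image(const uint8_t *dump, size_t dump_len,
                const uint8_t *image, size_t image_len)
{
    if (image_len == 0 || image_len > dump_len) {
        return -1;
    }
    /* Naive scan: memcmp stops at the first differing byte, so most
       candidate offsets are rejected quickly. */
    for (size_t off = 0; off + image_len <= dump_len; off++) {
        if (memcmp(dump + off, image, image_len) == 0) {
            return (long)off;
        }
    }
    return -1;
}
```

If an exact match fails, searching for just the first few hundred bytes of the image is a reasonable fallback, since the tail of the copy may differ.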

So if this isn't the EC firmware flash, where is that? I peeked under all the components on the top side of the board that I could (fan, etc.) but couldn't find any other flash chips. I tried looking at the schematic, and while it mentions two flash ROMs, it doesn't mention where to find them. Reluctantly, I took out the mainboard and checked the under-side.

Under-side flash chips

I found three identical Winbond 25Q80DVIG chips on the under side. One of them was quite close to the MEC1521 embedded controller, so I took a guess that this one contains the EC firmware.

Close up photo of an MEC1521 chip and a Winbond W25Q80DVIG chip on the under side of a Framework laptop 12th gen Intel motherboard

These are in a WSON-8 6mm x 5mm package, different from the 8mm x 6mm package of the top-side chip. It's a good thing I had test probes in both sizes. This one also accepts 3.3V. I pulled the contents of this one off under the name "under-side_bottom-left_Winbond_25Q80DVIG.bin". Finally, when I compared its contents to the dump I got from ECTool, it was an exact match.

I also dumped and uploaded the other two flash chips on the under side, though I couldn't quite tell what they're for. Just from looking at strings, they seem to be firmware for other components, but I couldn't quite tell what. I also couldn't find these flash parts in the schematic, so I'm not sure what's up with that.

At this point I felt like I had a route to restoring a backup if I had to, so I felt ready to proceed.

Detour - other problems

At this point I reassembled the laptop and... it didn't turn on and wouldn't charge. I saw forum posts relating to the 11th gen Intel mainboard having some issue, but since I have a 12th gen Intel laptop I didn't think it would apply. I tried stuff like trickle charging with a non-PD USB-C adapter, but it didn't help.

I saw a suggestion to try popping out the CMOS battery and popping it back in. I uh, tried to do that, but managed to snap the receptacle. Within a day, Framework shipped out a replacement mainboard free of charge, shipping included. I was blown away by the quality of customer service.

A good baseline

When I was getting ready to flash again, I noticed an issue about the compiler version used to build the firmware binary. I followed the advice, but more importantly I noticed that the issue has been recently fixed, and in the resolution, the maintainer says "Next release (hx20 3.19, hx30 3.07) will include them". It reminded me of something crucial: the Framework EC firmware source code repo doesn't have any particular indication of its level of stability at any given commit. Which commits could be considered fully tested releases? What if the head of the branch introduces a bug that they're working on fixing?

When I build my fix, I want to apply it as a delta on top of something I know is fully tested, or at least good enough for them to release. As part of my spelunking through firmware images earlier, I pulled all the strings out of the different firmware images to get clues about what they were. The very first string in the EC firmware image is "hx30_v0.0.1-7a61a89". That looks suspiciously like a commit hash. Can I look it up in the Framework EC repo? Hey, look, it sure is a valid commit!
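Pulling the hash out of that string can be sketched as below. The only format assumption is the one visible in this single example: the commit hash is the run of hex digits after the last '-'. Other firmware versions may be formatted differently, and `commit_from_version` is a hypothetical helper name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copies the hex suffix after the last '-' into out.
   Returns 1 on success, 0 if there is no such suffix or out is too small. */
int commit_from_version(const char *version, char *out, size_t out_size)
{
    const char *dash = strrchr(version, '-');
    if (dash == NULL) {
        return 0;
    }
    const char *hash = dash + 1;
    size_t len = strlen(hash);
    if (len == 0 || len >= out_size) {
        return 0;
    }
    for (size_t i = 0; i < len; i++) {
        if (!isxdigit((unsigned char)hash[i])) {
            return 0;  /* not a plausible abbreviated commit hash */
        }
    }
    memcpy(out, hash, len + 1);  /* include the terminating NUL */
    return 1;
}
```

Given "hx30_v0.0.1-7a61a89" this yields "7a61a89", which is the value to hand to git checkout.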

With this I could git checkout 7a61a89 and then create my topic branch with my fix from here. This version was clearly good enough for them to ship in-box, and that's a pretty good quality bar.

Flashing with ECTool

Finally with the new mainboard and the laptop operational again, I was ready to use ECTool to flash my bug fixed firmware. It actually flashed correctly without a hitch. All the stuff I did before was to prepare for the worst, but it didn't happen.

Well, it wasn't completely without an issue. I spent so much time worrying about the flashing procedure that I was surprised to find my bug fix not only did not fix the bug, it almost made it impossible to reflash without using my backup. What I found is that whenever I pressed a key (not held it down), it would have a slight delay and then start repeating and not stop repeating. Ittt madeeee itttt harddd tooo typeeee stufffff.....

No problem, let me just use ECTool to reflash the backup. Uh oh, the first thing ECTool does is say "press any key to abort". Due to my keys repeating, it kept aborting! Finally after some panicking, I figured out I could press Shift after hitting Enter, and it wouldn't count for its "press any key" logic. With that, I was able to reflash a backup. As a side note, I have a fork of ECTool that adds an option to avoid the "press any key to abort" behavior. I'll see about getting this feature into ECTool proper.

Debugging the keyboard fix

Now that I've managed to dig myself out of the hole my bug created, I need to debug and figure out why my fix didn't work. Let's dig into the code. Here is the entirety of the fix:

if (is_pressed) {
    keyboard_wakeup();
    set_typematic_key(scan_code, len);
    task_wake(TASK_ID_KEYPROTO);
} else {
    // FIX STARTS HERE
    // Only clear typematic key if that is the key being released. This fixes
    // an issue where if keys A and then B are both held down at the same time,
    // and the user releases A, B will also stop repeating.
    if (len == typematic_len &&
        memcmp(scan_code, typematic_scan_code, len) == 0) {
        clear_typematic_key();
    }
    // FIX ENDS HERE
}

When a key is pressed, set_typematic_key is called, which sets global variables to store the scancode of the key that should be repeated. My thinking was when the key is released, I should be able to compare the scancode that's being released and only clear the typematic scancode if it's the one being repeated.

From the observed behavior with this buggy change, clear_typematic_key is somehow not being called on key release. Here were the first level causes I could think of:

  1. the function containing this code isn't being called at all
  2. len doesn't match
  3. scan_code doesn't match

I was able to rule out #1 pretty quickly by looking through the code that calls this. I was also pretty sure this code had already been running before my change, once when a key is pressed and again when it is released.

Options 2 and 3 are interesting. I don't know what scan_code really is. Is it identical when a key is pressed versus released? I need to look at the code that creates the scancode value:

ret = matrix_callback(row, col, is_pressed, scancode_set, scan_code, &len);

I notice right away that is_pressed is a parameter. It's possible that the resultant scancode embeds a pressed/released bit inside. Let's look deeper.

static
enum ec_error_list
matrix_callback(
    int8_t row,
    int8_t col,
    int8_t pressed,
    enum scancode_set_list code_set,
    uint8_t *scan_code,
    int32_t *len
    ) {
    uint16_t make_code;
// ...
    scancode_bytes(make_code, pressed, code_set, scan_code, len);
// ...
}

Still plumbing through pressed...

static
void
scancode_bytes(
    uint16_t make_code,
    int8_t pressed,
    enum scancode_set_list code_set,
    uint8_t *scan_code,
    int32_t *len
    ) {
// ...
    if (pressed) {
        scan_code[(*len)++] = make_code;
    } else {
        scan_code[(*len)++] = 0xf0;
        scan_code[(*len)++] = make_code;
    }
// ...
}

Aha, my suspicion was correct! The scancode for a key being pressed differs from the scancode for the same key being released, so my attempted fix will never work.
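The mismatch is easy to demonstrate in isolation. This is a simplified sketch of just the visible branch of scancode_bytes, with the elided parts left out and the make code narrowed to a single byte; `encode_scan_code` and `release_matches_press` are hypothetical names for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified form of the branch quoted above: a release gets a 0xf0 prefix.
   scan_code must have room for at least 2 bytes. Returns the length written. */
int encode_scan_code(uint8_t make_code, bool pressed, uint8_t *scan_code)
{
    int len = 0;
    if (!pressed) {
        scan_code[len++] = 0xf0;
    }
    scan_code[len++] = make_code;
    return len;
}

/* The comparison my first fix relied on: press bytes versus release bytes. */
bool release_matches_press(uint8_t make_code)
{
    uint8_t pressed[2];
    uint8_t released[2];
    int pressed_len = encode_scan_code(make_code, true, pressed);
    int released_len = encode_scan_code(make_code, false, released);
    return pressed_len == released_len &&
           memcmp(pressed, released, (size_t)pressed_len) == 0;
}
```

For any make code, the pressed form is 1 byte and the released form is 2 bytes, so `release_matches_press` is always false. That corresponds to cause #2 in the list above: len never matches, so the memcmp is never even reached.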

A working fix

Instead of comparing the scancode directly (which won't work, because it's different when releasing a key), I can use the make_code value, which seems to more directly indicate the key without incorporating the pressed/released state.

Here's a link to a working fix. I also threw in a defensive measure for testing, to clear the typematic settings after the key has been repeating for N seconds straight. I built it, flashed it with ECTool, and my laptop keyboard behavior is now perfect.
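The idea behind the fix can be sketched as follows. This is a toy model and not the code from the linked fix: `typematic_make_code` and `key_state_changed_fixed` are hypothetical names, the timer and scan code buffer handling are omitted, and it assumes 0 is never a valid make code so it can stand for "nothing repeating".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state: make code of the repeating key, 0 when nothing repeats. */
static uint16_t typematic_make_code = 0;

/* Track the repeating key by make code, which is the same on press and release. */
void key_state_changed_fixed(uint16_t make_code, bool is_pressed)
{
    if (is_pressed) {
        typematic_make_code = make_code;  /* newest key becomes the repeater */
    } else if (make_code == typematic_make_code) {
        typematic_make_code = 0;          /* only the repeating key clears it */
    }
    /* Releasing any other key leaves the repeat state untouched. */
}

/* Replays the original repro and returns the make code left repeating. */
uint16_t repeating_after_repro(uint16_t key_a, uint16_t key_b)
{
    key_state_changed_fixed(key_a, true);   /* press and hold A */
    key_state_changed_fixed(key_b, true);   /* press and hold B */
    key_state_changed_fixed(key_a, false);  /* release A, B is still held */
    return typematic_make_code;
}
```

With this model, the repro leaves B's make code in place after A is released, and releasing B afterwards clears it.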

Feedback for Framework Computers

This process was a real journey. My laptop was out of commission for like three months off and on while waiting for equipment or replacement parts. Framework did an amazing job with their customer service, but there are still things I'd like to see them do to make life easier for customers like me who want to customize it in the future.

  1. Publish an official guide to safe EC firmware flashing. This probably requires also doing some of the other things in this list.
  2. Populate the JECDB/JSWDB header on the mainboard out of the box, so that if we brick a laptop we can more easily debug and fix it.
  3. Productize and sell a UART debugger expansion card, like the one DHowett made.
  4. Publish official pictures of the mainboard for reference.
  5. Update the schematics to include all the components on the mainboard, for example all four flash chips I found.
  6. Add tags or branches for releases in the EC firmware repo, so we can know which commits are good places to make deltas on top of.
  7. Allow certain kinds of modifications under warranty. I acknowledge this is a bit of a stretch though.

I'm hoping this blog post contains some useful information in this niche space, and that the flash chip dumps and mainboard pics I uploaded may help others who don't feel like taking apart their laptops.
