Log for 2019-06-02:
Today, I'm working on a C# application that needs to connect securely to a remote Linux host over SSH. This seems like a good opportunity to use my OnlyKey. Unfortunately, the documentation for the OnlyKey SSH agent leaves me with more questions than answers. It looks like it's time for... Let's Read the Source Code!
OnlyKey, by the way, is a small USB device, about the size of a flash drive, that has a handful of security-related features. Among these features, it can generate and store tens of cryptographic keys, and use those keys for signing and decryption. In order to help me keep track of things, it supports assigning each key a label.
Note that comments beginning with //MP: are my own, and usually stand in place of omitted code.
Goals
- Tell my OnlyKey to generate a new ED25519 key in a particular slot.
- Give the key slot an appropriate label.
- Retrieve the public key of the generated key as RFC-4716, so that I can add it to authorized_keys.
- Interact with the host over SSH, using the generated key as the credential.
Documentation
First, I'll establish what I can learn already from the documentation, and what I need to confirm or discover from the source code.
Generate the private key, retrieve the public key
The documentation says,
- Generate public key using onlykey-agent:
  onlykey-agent user@example.com
- Log in to your server as usual and copy the row containing the output from the previous step into ~/.ssh/authorized_keys file on your server
I now know the syntax of the command I need to perform in order to accomplish goals 1 and 3 at the same time, but I need to confirm that the default behavior won't overwrite any keys I already have on the OnlyKey, and discover how I can specify a slot explicitly if necessary.
Give the key slot a label
The documentation doesn't say anything about a label. I need to identify whether or not a label is created by default, and if so then how it's determined. If no label is set, I already know how that can be accomplished with a different utility, so I won't get into that here.
Interact with the host
Documentation says,
From now on you can log in to your server using OnlyKey using the following command:
$ onlykey-agent -c user@example.com
That should be very straightforward. If it's interactive, I'll need to make sure I understand how to send commands and retrieve results from C#. If that turns out to be difficult, then it will be useful to know if there's a way to do one-off commands.
Reading Time!
Clone It
As always, my first step is to clone the source code to my local machine, so that I can browse it through the appropriate JetBrains IDE (PyCharm in this case), taking advantage of the code navigation shortcuts that I've internalized.
Set-Location Code\trustcrypto
git clone https://github.com/trustcrypto/onlykey-agent.git
Slot Selection
The first thing I'll look for is the default slot-selection behavior, and the ability to specify a slot explicitly.
Entrypoint
This is a command-line utility, of which there are many unique architectures, so I can't immediately make any assumptions about the code structure. The command for creating a key doesn't use any explicit command like "newkey" that I could search for, so I'll need to start at the very beginning of the program and follow its flow until I reach the point where the first argument is parsed. Since this is a multi-file Python application, I know that __main__.py is the execution entry-point, so that's where I should start.
__main__.py contains import statements, instantiates a logging object (log = logging.getLogger(__name__)), and defines several functions, but does not directly invoke any of them. I highly doubt that getLogger is meant to kickstart the argument parsing part of the application, so the "main function" is likely defined in one of the local includes (whose bodies are implicitly run as they are imported, because Python is an interpreted language). That turned out to not be the case either! Two more possibilities immediately come to mind:
- One of the third-party includes (e.g. argparse) dispatches the flow of execution via reflection.
- There is a manifest that specifies a non-standard entrypoint for pip.
I noticed there's a file called setup.py, which sounds like a strong candidate for possibility #2... and it turns out to include the following:
entry_points={'console_scripts': [
    'onlykey-agent = onlykey_agent.__main__:run_agent',
]}
There it is! Our entrypoint is the run_agent function (which is defined in __main__.py).
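To make the "module:function" syntax concrete, here is a minimal sketch of how such an entry-point string can be resolved by hand. It is a simplified model, not the code that pip or setuptools actually runs, and resolve_entry_point is a name made up for this illustration. Because onlykey_agent may not be installed, the demo resolves a standard-library function written in the same syntax.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'name = package.module:attr' and import the object it names."""
    # Hypothetical helper for illustration; not part of pip or setuptools.
    name, target = (part.strip() for part in spec.split("=", 1))
    module_path, _, attr = target.partition(":")
    module = importlib.import_module(module_path)
    return name, getattr(module, attr)


# Same syntax as the setup.py line, but pointing at the standard library
# so that the example runs without onlykey-agent installed.
name, func = resolve_entry_point("to-json = json:dumps")
print(name, func([1, 2]))
```

With the real line from setup.py, the same split would yield the command name onlykey-agent and the run_agent function from onlykey_agent.__main__.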
Argument Parsing
@handle_connection_error
def run_agent(client_factory=client.Client):
    """Run ssh-agent using given hardware client factory."""
    args = create_agent_parser().parse_args()
    setup_logging(verbosity=args.verbose)
    //MP: Continued farther below after other examples
The first call is to create_agent_parser - argument parsing is highly relevant, so I'll explore that function. create_agent_parser immediately defers to create_parser; I see that there's a create_git_parser as well, which also calls create_parser, and that all of these contain p.add_argument calls. From this, I understand that I'm seeing an inheritance pattern: create_parser defines arguments that are common to all usages of this program, and the create_*_parser functions are for each alternative entrypoint. For the purposes of this exploration, I can just consider create_parser and create_agent_parser as one and the same, ignoring the others.
In reading the parser code, I'm looking for any sort of slot ID parameter that might exist, despite not being documented.
def create_parser():
    """Create argparse.ArgumentParser for this tool."""
    p = argparse.ArgumentParser()
    p.add_argument('-v', '--verbose', default=0, action='count')
    curve_names = [name for name in formats.SUPPORTED_CURVES]
    curve_names = ', '.join(sorted(curve_names))
    p.add_argument('-e', '--ecdsa-curve-name', metavar='CURVE',
                   default=formats.CURVE_NIST256,
                   help='specify ECDSA curve name: ' + curve_names)
    p.add_argument('--timeout',
                   default=server.UNIX_SOCKET_TIMEOUT, type=float,
                   help='Timeout for accepting SSH client connections')
    p.add_argument('--debug', default=False, action='store_true',
                   help='Log SSH protocol messages for debugging.')
    return p
--verbose, --timeout, and --debug are all clearly irrelevant...
def create_agent_parser():
    """Specific parser for SSH connection."""
    p = create_parser()
    g = p.add_mutually_exclusive_group()
    g.add_argument('-s', '--shell', default=False, action='store_true',
                   help='run ${SHELL} as subprocess under SSH agent')
    //MP: Continued below
Then there's --shell, which might be relevant to how I choose to integrate this with C#, but doesn't sound like it will be relevant to slot selection so I'll get back to it later.
//MP: Continued from above def create_agent_parser()
    g.add_argument('-c', '--connect', default=False, action='store_true',
                   help='connect to specified host via SSH')
    p.add_argument('identity', type=str, default=None,
                   help='proto://[user@]host[:port][/path]')
    p.add_argument('command', type=str, nargs='*', metavar='ARGUMENT',
                   help='command to run under the SSH agent')
    return p
Next is -c a.k.a. --connect, which was clear enough in the documentation. Finally, there are identity and command, each of which seems to map to the positional arguments described in the documentation. I don't have experience with Python's argparse library, but seeing this correlation, and knowing that "identity" and "command" were not a part of any examples in the documentation, is enough for me to conclude that the absence of - characters preceding an argument declaration causes it to be interpreted as positional.
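That conclusion about positional arguments can be checked with a trimmed copy of the parser. The sketch below keeps only the two flags and the two positionals from the excerpts above (create_demo_parser is a name invented for this example) and feeds it the two command lines from the documentation.

```python
import argparse


def create_demo_parser():
    """Trimmed copy of the agent parser: two flags plus two positionals."""
    p = argparse.ArgumentParser(prog="onlykey-agent")
    g = p.add_mutually_exclusive_group()
    g.add_argument("-s", "--shell", default=False, action="store_true")
    g.add_argument("-c", "--connect", default=False, action="store_true")
    # No leading '-' in the name, so argparse treats these as positional.
    p.add_argument("identity", type=str)
    p.add_argument("command", type=str, nargs="*", metavar="ARGUMENT")
    return p


parser = create_demo_parser()

# onlykey-agent -c user@example.com
login = parser.parse_args(["-c", "user@example.com"])
print(login.connect, login.identity, login.command)

# onlykey-agent user@example.com git push
push = parser.parse_args(["user@example.com", "git", "push"])
print(push.connect, push.identity, push.command)
```

The first positional value lands in identity and everything after it is collected into command, while passing -s and -c together makes argparse exit with an error because they share a mutually exclusive group.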
Alas, there turned out not to be any built-in way to specify the slot through the command line. So, it's back to run_agent:
Key Retrieval
//MP: Continued from farther above def run_agent(client_factory=client.Client)
    with client_factory(curve=args.ecdsa_curve_name) as conn:
        //MP: Continued farther below after other examples
The next thing that happens is construction and entry of the Client class:
class Client(object):
    """Client wrapper for SSH authentication device."""
    def __init__(self, curve=formats.CURVE_NIST256):
        """Connect to hardware device."""
        self.device_name = 'OnlyKey'
        self.ok = OnlyKey()
        self.curve = curve
    //MP: Continued below
There's not much going on here. An OnlyKey is instantiated, but it comes from another library - I already know that this won't implicitly create any keys from having worked with that library before, but it would be reasonable to assume so anyhow. Client doesn't define any setters, so the storage of device_name and curve isn't suspect as a path of execution that would lead to key generation.
//MP: Continued from above class Client(object)
    def __enter__(self):
        """Start a session, and test connection."""
        self.ok.read_string(timeout_ms=50)
        empty = 'a'
        while not empty:
            empty = self.ok.read_string(timeout_ms=50)
        return self
__enter__ is effectively a no-op, and could be removed entirely: the contents of the while will never run because not 'a' resolves to false (these are simply the sloppy remains of code that used to do something important).
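The truthiness argument can be checked in isolation. This small sketch mirrors only the loop from __enter__, with a stand-in reader function instead of the real device (drain is a made-up name); the single read_string call that precedes the loop in the original still happens and is not modeled here.

```python
def drain(read_string):
    """Mirror the loop in __enter__ and count how many times it iterates."""
    iterations = 0
    empty = 'a'          # a non-empty string is truthy
    while not empty:     # `not 'a'` is False, so the body is skipped
        empty = read_string()
        iterations += 1
    return iterations


# The stand-in reader is never called, because the loop body never runs.
print(drain(lambda: ''))
```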
What's next in the "main" function, run_agent, then, after constructing the OnlyKey instance?
//MP: Continued from farther above def run_agent(client_factory=client.Client)
        label = args.identity
        command = args.command
        public_key = conn.get_public_key(label=label)
        //MP: Remaining function body omitted
A few values are retrieved from the args, with identity being rebranded as label, and then passed into get_public_key (which sounds very promising).
def get_public_key(self, label):
    //MP: Continued below
From reading the parser, it's known that label is "user@example.com" in the case of the documentation's example.
//MP: Continued from above def get_public_key(self, label)
    # Compute the challenge pin
    h = hashlib.sha256()
    h.update(label)
    data = h.hexdigest()
    //MP: Continued below...
This is turned into a SHA256 hash, which is 256 bits and therefore 256 bits / 8 bits-per-byte = 32 bytes long. The hash is retrieved with hexdigest, whose name implies that it will return a hex-encoded version of the hash. Hex-encoding requires two characters per byte, so we can expect this to be a 64-character string.
The "compute challenge pin" comment appears to be a red herring. From my prior experience with OnlyKey, that's not similar to other challenge pin code that I've seen, the function doesn't display a challenge pin to the user before expecting valid feedback from the device (as you'll see below), and I don't remember ever having to use a challenge pin for public key retrieval. It's most likely a no-longer-relevant relic from an earlier version of the code.
//MP: Continued from above def get_public_key(self, label)
    if self.curve == formats.CURVE_NIST256:
        data = '02' + data
    else:
        data = '01' + data
    data = data.decode("hex")
    //MP: Continued below
A 1 or 2, depending on the curve type, is prepended as a two-character hex byte, then the now 66-character string is decoded to a 33-byte string.
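The length arithmetic above is easy to verify. The agent code is Python 2 (hashing a str directly, data.decode("hex")), so the sketch below is a Python 3 rendering of the same steps rather than the original code. The function name build_pubkey_payload is invented for this example, and the string value of CURVE_NIST256 is an assumption standing in for formats.CURVE_NIST256.

```python
import hashlib

# Assumed stand-in for formats.CURVE_NIST256; the real constant is not
# shown in the excerpts above.
CURVE_NIST256 = "nist256p1"


def build_pubkey_payload(label, curve):
    """Hash the label, prepend the curve byte, and decode hex to bytes."""
    hex_hash = hashlib.sha256(label.encode("utf-8")).hexdigest()
    prefix = "02" if curve == CURVE_NIST256 else "01"
    return bytes.fromhex(prefix + hex_hash)


hex_hash = hashlib.sha256(b"user@example.com").hexdigest()
payload = build_pubkey_payload("user@example.com", "ed25519")
print(len(hex_hash), len(payload), payload[0])
```

The hex digest is 64 characters, the decoded payload is 33 bytes, and its first byte is the curve selector (1 here, since the curve is not NIST P-256).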
//MP: Continued from above def get_public_key(self, label)
    self.ok.send_message(msg=Message.OKGETPUBKEY, slot_id=132, payload=data)
    //MP: Continued below...
OKGETPUBKEY is obviously a command to retrieve a public key, but the rest is still a bit mysterious: There are fewer than 132 slots to choose from, and what is the significance of sending the hashed "label" in addition to specifying a slot?
//MP: Continued from above def get_public_key(self, label)
    time.sleep(.5)
    for _ in xrange(2):
        ok_pubkey = self.ok.read_bytes(64, to_str=True, timeout_ms=10)
        if len(ok_pubkey) == 64:
            break
    //MP: Remaining function body omitted
The next bit of code expects, invariably, for a public key to be returned by the device, so the query must be enough to create new keys as well as retrieve existing ones. I'm going to have to look into the hardware's firmware source code to understand how this message is actually interpreted by the device! That's possible by running...
git clone https://github.com/trustcrypto/libraries
... and opening the cloned code in CLion.
Firmware
To prevent this from becoming too long of a journey, I'll summarize a bit that I know about the onlykey Python library that stands in-between the above agent code, and the below firmware code: The message being sent is the following series of bytes: { /* header */ 255, 255, 255, 255, /* message */ OKGETPUBKEY, /* slot */ 132, /* payload */ ... }. The effective entry-point in the firmware is void recvmsg() from okcore.cpp, and the OKGETPUBKEY message is picked up by the case OKGETPUBKEY: branch of switch (recv_buffer[4]). This in turn calls void GETPUBKEY (uint8_t *buffer) that is defined in okcrypto.cpp:
uint8_t temp[64] = {0};
#ifdef DEBUG
    Serial.println();
    Serial.println("OKGETPUBKEY MESSAGE RECEIVED");
#endif
if (buffer[5] < 5 && !outputU2F && !buffer[6]) { //Slot 101-132 are for ECC, 1-4 are for RSA
    if (onlykey_flashget_RSA ((int)buffer[5])) GETRSAPUBKEY(buffer);
} else if (buffer[5] < 131 && !outputU2F && !buffer[6]) { //132 and 131 are reserved
    if (onlykey_flashget_ECC ((int)buffer[5])) GETECCPUBKEY(buffer);
} else if (buffer[6] <= 3 && !outputU2F) { // Generate key using provided data, return public
    DERIVEKEY(buffer[6], buffer+7);
    RawHID.send(ecc_public_key, 0);
} else if (buffer[6] == 0xff) { //Search Keylabels for matching key, return slot
    //MP: omitted, not relevant
}
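To keep the buffer indices in that excerpt straight, the byte layout summarized before it can be sketched as follows. This is only an illustration of the framing as described in this log, not the onlykey library's code; frame_message is a made-up name, and the numeric value used for OKGETPUBKEY is an unverified placeholder.

```python
# Placeholder only: the real message ID is defined in the onlykey library
# and is not quoted anywhere in this log.
HYPOTHETICAL_OKGETPUBKEY = 0xEC


def frame_message(msg_id, slot_id, payload):
    """Lay out header, message ID, slot and payload as described above."""
    header = bytes([255, 255, 255, 255])
    return header + bytes([msg_id, slot_id]) + bytes(payload)


# 33-byte payload: curve byte followed by a dummy 32-byte identity hash.
payload = bytes([1]) + bytes(32)
frame = frame_message(HYPOTHETICAL_OKGETPUBKEY, 132, payload)

# Index 4 is what recvmsg() switches on; 5 and 6 are tested in GETPUBKEY.
print(frame[4] == HYPOTHETICAL_OKGETPUBKEY, frame[5], frame[6], len(frame[7:]))
```

So buffer[5] is the slot, buffer[6] is the first payload byte (the curve selector), and buffer+7 points at the 32-byte identity hash.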
Right away, there are already some answers in the comments: "Slot 101-132 are for ECC, 1-4 are for RSA; 132 and 131 are reserved". buffer[5] in this case is 132, so the first two if statements resolve to false. The third tests buffer[6] only - this is the first byte of the payload, which I know is either 1 or 2 depending on the curve type. Both of those possible values are <= 3, and I think it's safe to assume that outputU2F is false because we're not doing anything U2F-related, so it's definitely this branch that's taken. It's described as "Generate key using provided data, return public".
This is turning into a fascinating mystery! What exactly is the behavior of these special ECC slots 31 and 32? It seems they always generate, and never simply retrieve. And what does it mean for buffer[6] to be 3? And what role does the hashed identity play??
DERIVEKEY is called with the curve type and identity hash, and it's expected that this will result in the global variable ecc_public_key being filled with the public key to be sent back to the program over USB HID (back to Python). There's an interesting conclusion that can be made at this point: !buffer[6] (curve type) is tested in the previous two if statements while buffer[5] (slot ID) isn't used at all in the third, meaning that when a curve type is defined, the slot ID doesn't actually matter! That's a bit sloppy as far as binary APIs go, in my opinion. It's misleading, for sure: The unnecessary presence of the slot number in the Python code led me to believe it's no different than any other slot. Total omission of the slot number, or using 0xFF for the slot number, would have more clearly communicated "this isn't a normal key that you file into your slots, it's something special and fixed". It would also be confusing to some programmer if they're getting the same result no matter which slot they specify.
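The branch selection is compact enough to model directly. The sketch below transcribes only the four conditions from the GETPUBKEY excerpt into Python (getpubkey_branch and the returned strings are invented labels) to show that, once the first payload byte is a curve type, the slot number has no effect on which branch runs.

```python
def getpubkey_branch(slot, first_payload_byte, output_u2f=False):
    """Return which branch of the GETPUBKEY excerpt would be taken."""
    if slot < 5 and not output_u2f and not first_payload_byte:
        return "rsa"
    if slot < 131 and not output_u2f and not first_payload_byte:
        return "ecc"
    if first_payload_byte <= 3 and not output_u2f:
        return "derive"
    if first_payload_byte == 0xFF:
        return "label-search"
    return "none"


# With a curve type of 1 in the payload, every slot takes the same branch.
for slot in (2, 101, 131, 132):
    print(slot, getpubkey_branch(slot, 1))

# With a zero first payload byte, the slot decides between RSA and ECC.
print(getpubkey_branch(2, 0), getpubkey_branch(101, 0))
```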
Anyhow, the next step is to explore what DERIVEKEY
really does, in order to have a satisfying understanding of what the heck is even going on here. It's a relatively terse and convoluted bunch of C, so unlike all of the code above, I'm providing my own version of the original code, which simplifies things and uses more communicative variable names, but should behave exactly the same:
#define flashstorestart 0x3B000
#define flashstoresec7 (flashstorestart + 12288ul)
#define EEpos_ecckey1 (EEpos_totpkey24len + EElen_totpkeylen)
#define EElen_ecckey 1
void DERIVEKEY (uint8_t ktype, uint8_t *data)
{
    onlykey_flashget_ECC (132);
    //MP: This function continued much farther below
}
int onlykey_flashget_ECC (uint8_t slot)
{
    uint8_t slotIndex = slot - 101;
    extern uint8_t typeAndFeatures;
    onlykey_eeget_common(/* out */ &typeAndFeatures, EEpos_ecckey1 + slotIndex * EElen_ecckey, EElen_ecckey);
    uint8_t type = typeAndFeatures & 0x0F;
    //MP: This function continued below
}
int onlykey_eeget_common (uint8_t *ptr, int addr, int len) {
    while (len--) { *ptr++ = eeprom_read_byte(addr++); }
}
- It's clear now that ECC key numbers, by convention, are offset by 100, explaining why we're asking for #132 when they only go up to #32.
- EEpos_ecckey1, or "position of the first ECC key in EEPROM", is defined as being immediately after the 24th TOTP key, which in turn is probably defined as being after something else, ultimately deriving from the beginning of memory. This is a convenient way to define memory locations - if the firmware author ever wanted to rearrange, add, or remove things from the EEPROM structure, only the one definition immediately after the modification would need to be changed.
- EElen_ecckey, or "length of an ECC key in EEPROM", is just one byte. Therefore, it's clearly not the ECC key itself that's stored there, but rather some one-byte property of that key.
- onlykey_eeget_common copies one byte from EEPROM to the variable typeAndFeatures (the fact that it's an externally accessible global variable appears to have no relevance yet - perhaps these qualifiers are a requirement of any memory that eeprom_read_byte returns into).
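The addressing described in the list above comes down to a subtraction and a multiplication. Here is a minimal sketch of it; the helper names are invented, and because the numeric value of EEpos_ecckey1 is not shown anywhere in this log, the base address is passed in as a parameter and the 1000 used in the demo is purely hypothetical.

```python
ECC_SLOT_BASE = 101   # from `slot - 101` in onlykey_flashget_ECC
EELEN_ECCKEY = 1      # one byte of type/feature data per ECC key


def ecc_slot_index(slot):
    """Convert an ECC slot number (101-132) to a zero-based index."""
    return slot - ECC_SLOT_BASE


def eeprom_type_address(slot, eepos_ecckey1):
    """EEPROM address of the type/features byte for the given slot."""
    return eepos_ecckey1 + ecc_slot_index(slot) * EELEN_ECCKEY


# 1000 is a made-up base address, used only to make the arithmetic visible.
print(ecc_slot_index(132), eeprom_type_address(132, eepos_ecckey1=1000))
```

Slot 132 is therefore key #32, at zero-based index 31, one byte per key past the start of the ECC region.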
From that, I can conclude the first thing that happens is it reads the key's curve type and "features" (probably referring to sign/decrypt/backup). The second thing that happens...
extern uint8_t ecc_public_key[(MAX_ECC_KEY_SIZE*2)+1];
extern uint8_t ecc_private_key[MAX_ECC_KEY_SIZE];
extern uint8_t profilekey[32];
//MP: Continued from above int onlykey_flashget_ECC (uint8_t slot)
    unsigned long ptrToKeyInFlash = flashstoresec7 + slotIndex * 32;
    onlykey_flashget_common(ecc_private_key, (unsigned long *) ptrToKeyInFlash, 32);
    //MP: This function continued below
}
void onlykey_flashget_common (uint8_t *dest, unsigned long *src, int lenBytes) {
    for (int z = 0; z < lenBytes; z += 4)
    {
        // Emit each 32-bit word most-significant byte first
        *(dest++) = (uint8_t) ((*src >> 24) & 0xFF);
        *(dest++) = (uint8_t) ((*src >> 16) & 0xFF);
        *(dest++) = (uint8_t) ((*src >> 8) & 0xFF);
        *(dest++) = (uint8_t) ((*(src++)) & 0xFF);
    }
}
... the key itself is read from flash memory into a global variable ecc_private_key, similar to the global ecc_public_key I saw used in void GETPUBKEY (uint8_t *buffer): onlykey_flashget_common reads from the given "from address" four times before incrementing it, each time shifting a decreasing number of bits. This appears to be a big-endian/little-endian conversion. I suppose I could have left it at "flash_get sounds like copying stuff from flash memory", but it's fascinating to observe that flash and RAM use a unified memory address space while differing in endianness.
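Whatever the reason for the byte order, the unpacking itself is simple to reproduce. The sketch below models flash as a list of 32-bit words and emits each word most-significant byte first using the same shifts as the C loop, and it also works out where the key for slot index 31 would sit given the two #define values quoted earlier. The function names are invented for this example.

```python
import struct

FLASHSTORESTART = 0x3B000
FLASHSTORESEC7 = FLASHSTORESTART + 12288


def flash_key_address(slot_index):
    """Flash address of the 32-byte private key for a zero-based index."""
    return FLASHSTORESEC7 + slot_index * 32


def words_to_bytes(words):
    """Emit each 32-bit word most-significant byte first, like the C loop."""
    out = bytearray()
    for word in words:
        out.append((word >> 24) & 0xFF)
        out.append((word >> 16) & 0xFF)
        out.append((word >> 8) & 0xFF)
        out.append(word & 0xFF)
    return bytes(out)


print(hex(flash_key_address(31)))
print(words_to_bytes([0x01020304, 0xAABBCCDD]).hex())

# The shifts are equivalent to packing each word as big-endian.
assert words_to_bytes([0x01020304]) == struct.pack(">I", 0x01020304)
```

Because the shifts operate on values rather than on raw memory, the output order is the same regardless of how the processor stores words internally.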
As for the remaining code...
//MP: Continued from above int onlykey_flashget_ECC (uint8_t slot)
    aes_gcm_decrypt(ecc_private_key, slot, features, profilekey, 32);
    if (type==1)
        Ed25519::derivePublicKey(ecc_public_key, ecc_private_key);
    else if (type==2) {
        const struct uECC_Curve_t * curve = uECC_secp256r1();
        uECC_compute_public_key(ecc_private_key, ecc_public_key, curve);
    }
    else if (type==3) {
        const struct uECC_Curve_t * curve = uECC_secp256k1();
        uECC_compute_public_key(ecc_private_key, ecc_public_key, curve);
    }
    return typeAndFeatures;
}
...the implementations of these functions are increasingly complex and low-level so I'll actually stick to guessing based on their names this time:
- The private key was encrypted in flash memory, so it needs to be decrypted in volatile RAM.
- The public key is derived from the private key - a different algorithm is used for this depending on the specified key type.
Spooky Key #32
At this point, there's an interesting contradiction: A key is read from memory - nothing has been created from scratch yet. The key's curve type is stored in read-only memory (which can feasibly be changed by putting the hardware into a special mode, but we're not in that mode, so it comes from some previous point in time). That means that no matter what curve type we've specified to the agent, and no matter what slot, it's going to read whatever key happens to be in ECC slot 32, whatever curve that key happens to use. So what is that key? Where does it come from?
Based on the pattern of the primary function names, I searched for SETPRIV
, and noticed it's used in several places. The first I jumped to...
// Generate and encrypt default key
recv_buffer[4] = 0xEF; //MP: OKSETPRIV
recv_buffer[5] = 0x84; //MP: 132
recv_buffer[6] = 0x61; //MP: binary 0110 0001
RNG2(recv_buffer+7, 32);
SETPRIV(recv_buffer); //set default ECC key
... appears to set the key in the method onlykey_flashset_pinhashpublic, which, I presume from the fact that it's called only from SETPIN, is called once upon setting up the device (the only point at which the PIN is set).
Following the code, SETPRIV is nearly a mirror image of GETPUBKEY, onlykey_eeset_ecckey is nearly a mirror of onlykey_eeget_ecckey, etc. Along those lines, I confirmed that buffer[6] (0x61) is stored into EEPROM, at the same address from which onlykey_flashget_ECC retrieved typeAndFeatures. From that function's uint8_t type = typeAndFeatures & 0x0F; I know that only the lower nibble denotes type - in this case, that's 1 which means ED25519. In summary, slot 32 is given an ED25519 private key, generated on-device, when the device is initialized.
Wacky Crypto
Then there's the payload: RNG2, containing interesting lines like rngloop(); //Gather entropy, appears to be an implementation of OnlyKey's proudly touted random number generation that uses sensors to factor in ambient entropy. So... is it junk data, or is it actually a private key? A quick search for "ED25519 random number" led me to the answer, "yes".
That answer, and in particular the comment "Note that this is not specific to the Ed25519 curve" from user Maarten Bodewes, begins to reveal the actual sense behind the seeming nonsense: The difference between the key curve options manifests only in the public key. A few more searches led me to another post that further confirms this, and on which the same Maarten also commented! On the second topic, he added, "you could also store 128 bits of data or more and derive the two keys from that using a KDF with two labels.". I don't know what a KDF is, but "deriving" more keys using "labels" sounds an awful lot like what I saw in the code for retrieving public key #32. Let's revisit the rest of that code.
//MP: Continued from farther above void DERIVEKEY (uint8_t ktype, uint8_t *data)
    //MP: Remember, in this context, `data` is the identity hash
    memset(ecc_public_key, 0, sizeof(ecc_public_key));
    SHA256_CTX ekey;
    sha256_init(&ekey);
    sha256_update(&ekey, ecc_private_key, 32); //Add default key to ekey
    sha256_update(&ekey, data, 32); //Add provided data to ekey
    sha256_final(&ekey, ecc_private_key); //Create hash and store
    //MP: Continued below
}
The private key is salted with the identity hash in order to create a new private key (I know now that random numbers are valid private keys for these algorithms, so an sha256 result will also suffice as one). "ekey" in the comments likely means "ephemeral key". This salting (and, in particular, the fact that the curve type is included in the identity hash) is likely to prevent a theoretical attack mentioned by Yehuda Lindell in that second Stack Exchange link:
...having two public keys on different curves with the same private key could reveal information. I don't know how, but I am sure that you cannot prove a reduction from one to another...
In other words, the salt & hash appears to be the KDF.
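The derivation step shown in DERIVEKEY can be sketched in a few lines. This models only the hashing as it appears in the excerpt, where the default key and the 32 bytes that data points to (buffer+7, the identity hash) are fed into one SHA-256. The default key below is a fixed stand-in, since the real one stays on the device, and derive_private_key is a name made up for the example.

```python
import hashlib


def derive_private_key(default_key, identity_hash):
    """SHA-256 over the default key followed by the identity hash."""
    if len(default_key) != 32 or len(identity_hash) != 32:
        raise ValueError("both inputs must be 32 bytes")
    # Two sha256_update calls are equivalent to hashing the concatenation.
    return hashlib.sha256(default_key + identity_hash).digest()


# Stand-in for the device's default key, which never leaves the hardware.
default_key = bytes(range(32))

alice = hashlib.sha256(b"alice@example.com").digest()
bob = hashlib.sha256(b"bob@example.com").digest()

key_a1 = derive_private_key(default_key, alice)
key_a2 = derive_private_key(default_key, alice)
key_b = derive_private_key(default_key, bob)
print(key_a1 == key_a2, key_a1 == key_b, len(key_a1))
```

The same identity always yields the same 32-byte key, and a different identity yields a different one, which is why nothing per-identity needs to be stored on the device.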
//MP: Continued from above void DERIVEKEY (uint8_t ktype, uint8_t *data)
    if (ktype==1) {
        Ed25519::derivePublicKey(ecc_public_key, ecc_private_key);
        return;
    }
    else if (ktype==2) {
        const struct uECC_Curve_t * curve = uECC_secp256r1();
        uECC_compute_public_key(ecc_private_key, ecc_public_key, curve);
    }
    else if (ktype==3) {
        const struct uECC_Curve_t * curve = uECC_secp256k1();
        uECC_compute_public_key(ecc_private_key, ecc_public_key, curve);
    }
Finally, the specified curve is used to create the public key.
A third, undocumented, key type (secp256k1) is supported by the device firmware, although it's not yet supported by the agent library. Perhaps it's not commonly used.
Conclusion for Slot Selection
Slot selection is totally unnecessary. We have an intriguingly clever setup where a special private key on the device can produce an endless amount of ephemeral yet deterministic keys, so that every user/host combination produces a unique key.
It now makes more sense why the documentation didn't include much detail, but my journey here illustrates how documentation needs to describe what isn't at play in addition to what is. Where there's outstanding simplicity, that simplicity must be highlighted, otherwise users will fill in the gaps with assumptions they've learned to expect from other software, and because they know from experience that most software documentation does exclude important details.
Label
What else was I even looking for in this code? I've forgotten at this point...
Right. Label. If slots are irrelevant then so are labels. Moving on!
Interactivity
Remember that undocumented "shell" arg from the parser? It was described in the code as,
run ${SHELL} as subprocess under SSH agent
At first, that was confusing to me. What does it mean for the shell to not run as a subprocess? But then I remembered this from the documentation:
This method can also be used for git push or other mechanisms that are using SSH as their communication protocol:
$ onlykey-agent user@example.com git push
Knowing that, I suspect that --shell is a way to enter an interactive shell where you can use "mechanisms that are using SSH" and expect the agent to handle the authorization of said SSH usages. The presence of g = p.add_mutually_exclusive_group() supports this suspicion, by indicating that --shell is an action that is mutually exclusive with --connect. Right now, though, my intention is to run commands directly in the remote shell.
So, all that's left on this topic is just figuring out best-practice for hooking C# up to an interactive process, which won't involve reading source code. So...
ALL DONE!
In summary, slot selection is totally unnecessary, and therefore so is labeling. I just need to run onlykey-agent -e ed25519 user@host to get the public key, and onlykey-agent -e ed25519 -c user@host to start the interactive SSH session.
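Since these commands will eventually be launched from another program, it helps to think of them as argument lists rather than strings. The sketch below only assembles the argument vectors for the two invocations named above and for the documentation's git push form. It does not run anything, because that requires the device, and agent_argv is a helper name invented for this example.

```python
import shlex


def agent_argv(identity, connect=False, command=(), curve="ed25519"):
    """Assemble an onlykey-agent argument list from the parsed options."""
    argv = ["onlykey-agent", "-e", curve]
    if connect:
        argv.append("-c")
    argv.append(identity)
    argv.extend(command)
    return argv


print(shlex.join(agent_argv("user@host")))
print(shlex.join(agent_argv("user@host", connect=True)))
print(shlex.join(agent_argv("user@example.com", command=["git", "push"])))
```

A list like this could be handed to a process-spawning API (subprocess.run in Python, or the equivalent in C#) without any shell quoting concerns.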
So was all of this a waste of effort?
Not at all!
I learned some really interesting new things about cryptography, and had fun along the way. Sometimes, reading the source code results in the conclusion that you should just have done what the documentation said all along. But it's still worth doing in those cases, because you end up with a better understanding of why.
I believed that I would need to do something that was actually totally unnecessary, a common mistake in software engineering. This mistake stemmed from the fact that the way I've learned to think about the problem that's solved by SSH keys is different from the way that trustcrypto decided to solve that problem with onlykey-agent. Instead of the typical paradigm of "let's randomly generate a key to use as my identity", it's "let's deterministically generate a key that is based on my identity". Remember when writing your own documentation that describing the paradigms and motivations, in addition to the syntax, can be useful to your users in this way.
F U T U R E
I often read source code, both for work and in my free time, and thought it would be nice to post logs like this about the latter. My aspiration is that this is the first of many to come. If you have any critiques or suggestions, please let me know!
If you need help documenting or reverse-engineering any open-source or legacy code that your company uses, please reach out to me via Freeform Labs.
Top comments (3)
As a follow-up, I couldn't get onlykey-agent to work in Windows, so I ended up re-implementing it as an SSH.NET authentication method (which I intend to publish).
Hey Max,
I have to say this write-up is absolutely fascinating, and I am not just saying that because I wrote this code! You did a great job here of picking this apart and explaining, which I know was not easy. I wrote both the firmware and the parts you mentioned in the agent and have been meaning to document more here, just trying to find the time. I would be very interested in helping you get the .net apps you are working on finished, I know there are lots of OnlyKey users that wish this worked with Windows. So to answer some of the questions you had here:
We ignore the slot because when we first released the ssh agent we didn't derive the keys, the user had to load the ecc keys through the app and specify in the cli which one to use. When we went to deriving keys I wanted to still have an option to specify a slot if you didn't want to use the KDF but have not implemented that in firmware yet.
secp256k1 is the same curve used for Bitcoin, it's not really used much for SSH.
As OnlyKey is built on an embedded chip space really matters so things are written in a way to save space. For example as you found using 1/2 of a byte (nibble) for a setting when the setting is only needed to be a 1 - 16 value.
We do now support keylabels, each key can have a label, it's supported but just not used. It could be useful for several things if supported in an app.
If you have any other questions feel free to reach out.
Great write-up!
Thank you for taking the time and effort to investigate this stuff. I just bought my first onlykey today (awaiting delivery this week) and I expect I'll be referring to this article for my own explorations.
Thank you for sharing the knowledge!