One of the main reasons I got into Unix systems was the ability to log into another machine remotely. While this is common now across most operating systems, the ability to telnet into another machine was once a novelty. Before long people discovered that telnet's use of plain text was a security issue, and ssh became the dominant remote execution protocol.
I love being able to ssh into one of my machines at home from wherever I like (or wherever I can find a client to use). Having recently installed OpenBSD 6.6 on a box lying around the house, I of course enabled ssh during the installation. Everything seemed fine from the machine itself. It wasn't until I tried to log in remotely that I got this gem.
Not exactly my dream of having the resources of my server available at the touch of a button. After a reboot it worked fine, then after another reboot I got the same error message. Then, after waiting a while, I could log in again. But where to look for answers? My first step was to check the boot messages in the dmesg buffer and see which network device all this was happening on.
rl0 at pci2 dev 2 function 0 "Realtek 8139" rev 0x10: apic 2 int 22
So I knew to look in the rl(4) manual page for anything useful. After a brief glance I found this bit of knowledge.
BUGS
Since outbound packets must be longword aligned, the transmit routine has to copy an unaligned packet into an mbuf cluster buffer before transmission. The driver abuses the fact that the cluster buffer pool is allocated at system startup time in a contiguous region starting at a page boundary. Since cluster buffers are 2048 bytes, they are longword aligned by definition. The driver probably should not be depending on this characteristic.
The Realtek data sheets are of especially poor quality: the grammar and spelling are awful and there is a lot of information missing, particularly concerning the receiver operation. One particularly important fact that the data sheets fail to mention relates to the way in which the chip fills in the receive buffer. When an interrupt is posted to signal that a frame has been received, it is possible that another frame might be in the process of being copied into the receive buffer while the driver is busy handling the first one. If the driver manages to finish processing the first frame before the chip is done DMAing the rest of the next frame, the driver may attempt to process the next frame in the buffer before the chip has had a chance to finish DMAing all of it.
The driver can check for an incomplete frame by inspecting the frame length in the header preceding the actual packet data: an incomplete frame will have the magic length of 0xFFF0. When the driver encounters this value, it knows that it has finished processing all currently available packets. Neither this magic value nor its significance are documented anywhere in the Realtek data sheets.
That made me focus on this part:
One particularly important fact that the data sheets fail to mention relates to the way in which the chip fills in the receive buffer. When an interrupt is posted to signal that a frame has been received, it is possible that another frame might be in the process of being copied into the receive buffer while the driver is busy handling the first one.
I swapped out the network card and voilà! No more refused connections.