DEV Community

Charles Anthony

Posted on

2024-02-17 Multics Hang at 'CPU Add' During Boot, Part 2

A user reported a repeatable hang on a Raspberry Pi. I fired up my Pi 4 Model B, ran the latest simulator (commit 15e7721dd6a2c58b5d3eea4e0428c3e71a9004ec) with Quickstart 12.8, and modified the Multics config by adding:

cpu -tag b -port 6 -state on -type dps8 -model 70. -cache 8. 
cpu -tag c -port 5 -state on -type dps8 -model 70. -cache 8. 
cpu -tag d -port 5 -state on -type dps8 -model 70 -cache 8. 
cpu -tag e -port 4 -state on -type dps8 -model 70 -cache 8. 

This adds CPUs B, C, D, and E to the configuration; because their state is set to on, they will be started during boot.

 ./dps8 MR12.8_boot.ini 
DPS8/M simulator X3.0.1+16 (64-bit)
  Commit: 15e7721dd6a2c58b5d3eea4e0428c3e71a9004ec
bce (boot) 0817.2: M-> [auto-input] boot star

0817.9  CPU A: Model #: DPS 8/SIM M; Serial #: 0; Ship date: 240215; PROM Layout Version: 2; 
          Simulator Release: X3.0.1 (2024-02-15); Build Number: <None>;  
          Build Arch: aarch64; Build OS: Linux; 
          Target Arch: AArch64/ARM64/64-bit; Target OS: GNU/Linux.
CPU B thread created.
0817.9  start_cpu: Added CPU B.
CPU C thread created.
0817.9  start_cpu: Added CPU C.

and it hangs...

Interestingly, this is not the symptom the issue reporter saw: their hang is at the "CPU B thread created." message, and they never see "0817.9 start_cpu: Added CPU B.".
(The "thread created" message is from the simulator; the messages starting with a time code are from Multics.)

$ gdb dps8 16473
(gdb) p/o cpus[0].PPR
$1 = {PRR = 0, PSR = 034, P = 01, IC = 02427}
(gdb) p/o cpus[1].PPR
$2 = {PRR = 0, PSR = 034, P = 01, IC = 02427}
(gdb) p/o cpus[2].PPR
$3 = {PRR = 0, PSR = 041, P = 01, IC = 0320127}
(gdb) p/o cpus[3].PPR
$4 = {PRR = 0, PSR = 0, P = 0, IC = 0}
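The gdb values are printed in octal (p/o), and a location is read as PSR:IC. Here is a small C sketch of that reading, assuming a struct that mirrors the four fields gdb printed; the type and function names are hypothetical, not the simulator's own declarations:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mirror of the fields gdb printed for cpus[n].PPR; the field
   types are assumptions, not the simulator's actual declarations. */
typedef struct {
    uint32_t PRR; /* procedure ring register */
    uint32_t PSR; /* procedure segment register: segment number */
    uint32_t P;   /* privileged bit */
    uint32_t IC;  /* instruction counter: word offset within the segment */
} ppr_t;

/* Write the PPR as the octal "segno:offset" pair used in the text.
   Returns snprintf's result (the number of characters it would write). */
static int ppr_format(const ppr_t *ppr, char *buf, size_t len)
{
    assert(ppr != NULL && buf != NULL);
    return snprintf(buf, len, "%o:%o", (unsigned)ppr->PSR, (unsigned)ppr->IC);
}
```

With cpus[0]'s values ({0, 034, 01, 02427}), ppr_format produces "34:2427". For cpus[2], the text below uses only the IC value, 320127.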

CPUs A and B are both executing at 34:2427. Segment 34 is bound_interceptors:

bound_interceptors                 34  (0, 0, 0) read execute privileged encacheable wired

Component                            Text        Int-Stat       Symbol
                                 Start Length  Start Length  Start Length

fim                                  0   2210      0      0    100    266
wired_fim                         2210    332      0      0    366    230

34:2427 is at offset 2427 - 2210 = 217 (octal) in wired_fim:

                                   378  " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
                                   379  "
                                   380  "       START_WAIT - Wait until new CPU has started up.
                                   381  "
                                   382  "
                                   383  " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "
                                   384
    000205                         385  start_wait:
    000205  aa   000001 3352 07    386          lca     1,dl            all ones in A
    000206  4a  4 00016 6753 20    387          era     prds$processor_pattern  turn off bit for this CPU
    000207  4a  4 00030 3553 20    388          ansa    scs$processor_start_wait  check ourselves off
                                   389
    000210  4a  4 00074 3737 20    390          eppsb   prds$           push a frame onto the prds
    000211  0a   000325 2272 00    391          ldx7    push            ..
    000212  4a  4 00076 7003 20    392          tsx0    fim_util$push_stack_32  ..
                                   393
    000213  aa  6 00050 3503 00    394          eppap   notify_regs     ap -> place to copy conditions
    000214  4a  4 00100 7003 20    395          tsx0    fim_util$copy_mc        copy the conditions into stack
                                   396
    000215  4a  4 00102 7003 20    397          tsx0    fim_util$set_mask       uninhibit to prevent lockups
    000216                         398          inhibit off     <-><-><-><-><-><-><-><-><-><-><-><->
                                   399
    000216  4a  4 00072 2341 20    400          szn     scs$connect_lock        test connect lock
    000217  0a   000223 6000 00    401          tze     *+4             wait until it is cleared
    000220  aa   000110 7770 00    402          llr     72
    000221  aa   000110 7770 00    403          llr     72
    000222  0a   000216 7100 00    404          tra     *-4
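The subtraction above is done in octal, like every column of the bind map. A minimal C sketch of the lookup, assuming the Text Start and Length columns are octal word offsets within the bound segment; the struct, table, and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One text-section entry from a bind map; values are word offsets. */
typedef struct {
    const char *name;
    uint32_t text_start;
    uint32_t text_length;
} bind_component_t;

/* Text columns of bound_interceptors (segment 34) from the bind map,
   written with C's leading-0 octal notation. */
static const bind_component_t bound_interceptors[] = {
    { "fim",       0,     02210 },
    { "wired_fim", 02210, 0332  },
};

/* Find the component whose text section contains segment offset `off`.
   On success, stores the component-relative offset in *rel and returns the
   component; returns NULL if no component's text covers `off`. */
static const bind_component_t *find_component(uint32_t off, uint32_t *rel)
{
    assert(rel != NULL);
    size_t n = sizeof bound_interceptors / sizeof bound_interceptors[0];
    for (size_t i = 0; i < n; i++) {
        const bind_component_t *c = &bound_interceptors[i];
        if (off >= c->text_start && off - c->text_start < c->text_length) {
            *rel = off - c->text_start;
            return c;
        }
    }
    return NULL;
}
```

find_component(02427, &rel) selects wired_fim and sets rel to 0217 (143 decimal), which is the tze *+4 in the listing.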

The start CPU code holds connect_lock during new CPU startup. I am confused as to why CPUs A (cpus[0]) and B (cpus[1]) are both in start_wait, but that may just be because I don't understand how the running non-bootload CPUs behave during CPU add.
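To make the start_wait loop concrete, here is a C11 model of it. It is not Multics code: processor_start_wait and connect_lock are hypothetical stand-ins for the scs$ cells, and the ansa (which the hardware performs as a single read-alter-rewrite) is modeled with atomic_fetch_and:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for scs$processor_start_wait and scs$connect_lock.
   A set bit in processor_start_wait means "this CPU has not checked in". */
static _Atomic uint32_t processor_start_wait;
static _Atomic uint32_t connect_lock;

/* Model of start_wait: clear this CPU's bit ("check ourselves off"), then
   spin until connect_lock reads zero. Returns how many times the lock was
   seen set. If the lock is never cleared, this never returns (the hang). */
static unsigned long start_wait_model(uint32_t processor_pattern)
{
    assert(processor_pattern != 0);

    /* lca 1,dl / era prds$processor_pattern / ansa:
       AND the start-wait word with the complement of our bit. */
    atomic_fetch_and(&processor_start_wait, ~processor_pattern);

    unsigned long spins = 0;
    /* szn / tze *+4 / llr 72 (no-op delay) / llr 72 / tra *-4 */
    while (atomic_load(&connect_lock) != 0)
        spins++;
    return spins;
}
```

In this model a CPU sitting at offset 217 can only leave once connect_lock reads zero, so whatever holds the lock keeps A and B there.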

CPU C (the newly added CPU) is at 320127.

This seems to be the swerr_lp loop at offset 127:

    000125  aa   000002 2352 07    158  swerr:  lda     rcerr_addcpu_bad_switches,dl
    000126  aa   000272 7552 04    159          sta     wait_flag-*,ic  set it for start_cpu
    000127  aa   077777 6372 03    160  swerr_lp:       ldt     =o77777,du      prevent timer runout faults
    000130  aa   000270 2352 04    161          lda     wait_flag-*,ic  has start_cpu given use a green lite?
    000131  aa   000043 6042 04    162          tmi     nogo-*,ic               no, bad switches go to DIS
    000132  aa   000002 1152 07    163          cmpa    rcerr_addcpu_bad_switches,dl is start_cpu still thinking about it?
    000133  aa   777774 6002 04    164          tze     swerr_lp-*,ic   yes, go through another loop

That makes no sense: the bootload CPU issued the "Added CPU C" message, which means CPU C had long since passed the switch tests. This is also the same symptom I was seeing yesterday.

Doing an instruction trace of CPU C:

2: 00041:030315 0 700026764161 (LPRP4 PR7|26,*AU) 000034 545(0) 1 0 0 00
2: 00041:030316 0 200001710100 (TRA PR2|1) 000000 764(0) 0 0 0 01
2: 00043:001033 0 000446710000 (TRA 000446) 000001 710(0) 1 0 0 00
2: 00043:000446 0 000002235120 (LDA PR0|2,N*) 000446 710(0) 0 0 0 00
2: 00043:000447 0 000007735000 (ALS 000007) 000214 235(0) 0 0 0 00
2: 00043:000450 0 000004035120 (ADLA PR0|4,N*) 000007 735(0) 0 0 0 00
2: 00043:000451 0 000003735000 (ALS 000003) 002473 035(0) 0 0 0 00
2: 00043:000452 0 000000620005 (EAX0 000000,AL) 000003 735(0) 0 0 0 00
2: 00043:000453 0 000006237120 (LDAQ PR0|6,N*) 000000 620(0) 0 0 0 05
2: 00043:000454 0 400132057120 (SSCR PR4|132,N*) 000120 237(0) 0 0 0 00
2: 00043:000455 0 700044710120 (TRA PR7|44,N*) 000000 057(0) 0 0 0 10
2: 00041:030325 0 600000373100 (EPBP7 PR6|0) 030325 710(0) 0 0 0 00
2: 030326 320050710200 (TRA 320050) 000000 373(0) 1 0 0 00
2: 320050 000346754204 (STI 000346,IC) 320050 710(0) 0 1 0 00
2: 320051 000345235204 (LDA 000345,IC) 000346 754(0) 0 1 0 04
2: 320052 000020315207 (CANA 000020,DL) 000345 235(0) 0 1 0 04
2: 320053 000120600204 (TZE 000120,IC) 000020 315(0) 0 1 0 07
2: 320054 320372674202 (LCPR 320372,QU) 000120 600(0) 0 1 0 04
2: 320055 000000623200 (EAX3 000000) 320372 674(0) 0 1 0 02
2: 320056 000000627200 (EAX7 000000) 000000 623(0) 0 1 0 00

76K instructions in, CPU C decides it wants to start executing init_processor code in absolute (ABS) mode.
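The switch is visible in the trace format itself: through 030325 each address carries a segment number (00041:030315), while from 030326 on the addresses are bare. Here is a small C helper for finding that transition in a long trace. The line format is inferred from the excerpt above, not from simulator documentation, and the names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { TRACE_BAD, TRACE_APPENDING, TRACE_ABSOLUTE } trace_mode_t;

/* Classify one instruction-trace line of the assumed form
   "<cpu>: <addr> ...", where <addr> is "segno:offset" in appending mode
   and a bare octal address in absolute mode. Stores the CPU number in *cpu. */
static trace_mode_t classify_trace_line(const char *line, int *cpu)
{
    assert(line != NULL && cpu != NULL);
    char addr[32];
    if (sscanf(line, "%d: %31s", cpu, addr) != 2)
        return TRACE_BAD;
    return strchr(addr, ':') != NULL ? TRACE_APPENDING : TRACE_ABSOLUTE;
}
```

Run over the excerpt, the first TRACE_ABSOLUTE line is the TRA 320050 at 030326.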
