DEV Community

Trinity

Assembly Code to Machine Code (ARM)

Summary of the video https://www.youtube.com/watch?v=ttJZjP0p_uE

I have heard that assembly is the closest humans can get to writing code in a form that machines understand, but I never understood how. Here's a bit of how the translation from an assembly instruction to machine code happens.

In ARM assembly, a data-processing instruction is encoded in 32 bits using the layout below. After looking at each field in detail, we will translate ARM instructions into the binary that the machine (an ARM chip in this case) understands.

| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:0 |
|---|---|---|---|---|---|---|---|
| cond | op | I | cmd | S | Rn | Rd | Src2 |
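
To make the layout concrete, here is a minimal Python sketch that packs the eight fields into one 32-bit word. The function name `pack_data_processing` is hypothetical (it is not from the video), and the example values are the ones for ADD R5, R6, R7, which is worked out by hand later in this article.

```python
def pack_data_processing(cond, i, cmd, s, rn, rd, src2):
    """Pack the fields of the layout above into a 32-bit word.

    Hypothetical helper for illustration; no range checking is done.
    """
    op = 0b00  # bits 27:26, 00 in every example in this article
    return (
        (cond << 28)   # bits 31:28
        | (op << 26)   # bits 27:26
        | (i << 25)    # bit 25
        | (cmd << 21)  # bits 24:21
        | (s << 20)    # bit 20
        | (rn << 16)   # bits 19:16
        | (rd << 12)   # bits 15:12
        | src2         # bits 11:0
    )


# ADD R5, R6, R7: cond=AL, I=0, cmd=ADD, S=0, Rn=6, Rd=5, Src2=Rm=7
word = pack_data_processing(0b1110, 0, 0b0100, 0, 6, 5, 7)
print(f"{word:#010x}")  # 0xe0865007
```

Each field is simply shifted to the position of its lowest bit and OR-ed in, which is all the "translation" really is once the field values are known.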

Cond

| Opcode [31:28] | Mnemonic extension | Interpretation | Status flag state for execution |
|---|---|---|---|
| 0000 | EQ | Equal / equals zero | Z set |
| 0001 | NE | Not equal | Z clear |
| 0010 | CS/HS | Carry set / unsigned higher or same | C set |
| 0011 | CC/LO | Carry clear / unsigned lower | C clear |
| 0100 | MI | Minus / negative | N set |
| 0101 | PL | Plus / positive or zero | N clear |
| 0110 | VS | Overflow | V set |
| 0111 | VC | No overflow | V clear |
| 1000 | HI | Unsigned higher | C set and Z clear |
| 1001 | LS | Unsigned lower or same | C clear or Z set |
| 1010 | GE | Signed greater than or equal | N equals V |
| 1011 | LT | Signed less than | N is not equal to V |
| 1100 | GT | Signed greater than | Z clear and N equals V |
| 1101 | LE | Signed less than or equal | Z set or N is not equal to V |
| 1110 | AL | Always | any |
| 1111 | NV | Never (do not use!) | none |

SOME NOTES

  • An instruction with no condition suffix, such as plain ADD, takes cond 1110 (AL, for always).
  • The flags are normally left behind by an earlier instruction. ADDEQ, for example, executes only if an earlier instruction set the Z flag to true (1).
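
The cond table can be read as a set of boolean tests on the N, Z, C and V flags. Here is a small Python sketch of that reading; `condition_passes` is a hypothetical name, and 1111 is treated as "never" simply because that is what the table above says.

```python
def condition_passes(cond, n, z, c, v):
    """Return True if the 4-bit cond field is satisfied by flags N, Z, C, V."""
    checks = {
        0b0000: z,                   # EQ
        0b0001: not z,               # NE
        0b0010: c,                   # CS/HS
        0b0011: not c,               # CC/LO
        0b0100: n,                   # MI
        0b0101: not n,               # PL
        0b0110: v,                   # VS
        0b0111: not v,               # VC
        0b1000: c and not z,         # HI
        0b1001: (not c) or z,        # LS
        0b1010: n == v,              # GE
        0b1011: n != v,              # LT
        0b1100: (not z) and n == v,  # GT
        0b1101: z or n != v,         # LE
        0b1110: True,                # AL
        0b1111: False,               # NV, per the table above
    }
    return bool(checks[cond])


# ADDEQ only runs when Z is set; plain ADD (AL) always runs.
print(condition_passes(0b0000, n=False, z=True, c=False, v=False))   # True
print(condition_passes(0b0000, n=False, z=False, c=False, v=False))  # False
print(condition_passes(0b1110, n=False, z=False, c=False, v=False))  # True
```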

op

  • For the data-processing instructions covered in this article, op is always 00

I

  • It stands for immediate
  • Example: ADD R1, R2, #0x28
  • If a constant / label is used as the second source, then the I field is set to 1

cmd

  • This is the operation itself, the part we are already well aware of
  • These are the ARM data-processing instructions
| Opcode [24:21] | Mnemonic | Meaning | Effect |
|---|---|---|---|
| 0000 | AND | Logical bit-wise AND | Rd := Rn AND Op2 |
| 0001 | EOR | Logical bit-wise exclusive OR | Rd := Rn EOR Op2 |
| 0010 | SUB | Subtract | Rd := Rn - Op2 |
| 0011 | RSB | Reverse subtract | Rd := Op2 - Rn |
| 0100 | ADD | Add | Rd := Rn + Op2 |
| 0101 | ADC | Add with carry | Rd := Rn + Op2 + C |
| 0110 | SBC | Subtract with carry | Rd := Rn - Op2 + C - 1 |
| 0111 | RSC | Reverse subtract with carry | Rd := Op2 - Rn + C - 1 |
| 1000 | TST | Test | Scc on Rn AND Op2 |
| 1001 | TEQ | Test equivalence | Scc on Rn EOR Op2 |
| 1010 | CMP | Compare | Scc on Rn - Op2 |
| 1011 | CMN | Compare negated | Scc on Rn + Op2 |
| 1100 | ORR | Logical bit-wise OR | Rd := Rn OR Op2 |
| 1101 | MOV | Move | Rd := Op2 |
| 1110 | BIC | Bit clear | Rd := Rn AND NOT Op2 |
| 1111 | MVN | Move negated | Rd := NOT Op2 |

S

  • Setting S means we want the instruction to update the status flags
  • For example, ADDS R1, R2, R3 means: record whether the result was Zero, Negative, produced a Carry, and so on (this shows up in the CPSR register, which we will cover, but not in this article)
  • ADDS sets the S bit to 1 so that the status is tracked; plain ADD leaves it at 0.
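
To see what "tracking the status" means for an addition, here is a hedged Python sketch that adds two 32-bit values and reports the N, Z, C and V flags that an ADDS would produce. The function name `adds` and the dictionary it returns are assumptions made for illustration; a real CPU stores these flags in the CPSR.

```python
MASK32 = 0xFFFFFFFF


def adds(a, b):
    """Add two 32-bit values and return (result, flags) the way ADDS would."""
    a &= MASK32
    b &= MASK32
    full = a + b            # unbounded Python integer sum
    result = full & MASK32  # what actually fits in the 32-bit register
    flags = {
        "N": (result >> 31) == 1,  # bit 31 of the result (negative if signed)
        "Z": result == 0,          # result is zero
        "C": full > MASK32,        # carry out of bit 31
        # signed overflow: both inputs have the same sign, result has the other
        "V": (((a ^ result) & (b ^ result)) >> 31) == 1,
    }
    return result, flags


print(adds(0xFFFFFFFF, 1))  # result 0 with Z and C set
print(adds(0x7FFFFFFF, 1))  # result 0x80000000 with N and V set
```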

Rn (19:16)

  • It is called the first source register
  • In ADD R1, R2, R3, Rn is R2
  • Hence it gets the binary value 0010 in 19:16

Rd (15:12)

  • It is called the destination register
  • In ADD R1, R2, R3, Rd is R1
  • Hence it gets the binary value 0001 in 15:12

Src2 (11:0)

  • Second source: it can take one of three forms: a) Immediate, b) Register, c) Register-shifted Register

Immediate

| 11:8 | 7:0 |
|---|---|
| rot | imm8 |
  • rot is for rotation. The mnemonic for it is ROR, for rotate right.
  • NOTE: imm8 is rotated right by twice the value in the rot field
  • So bits 11:8 hold half of the amount by which the 8-bit immediate in 7:0 is rotated to the right
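
Here is a minimal Python sketch of how the 12-bit Immediate form can be turned back into the 32-bit constant it stands for. The helper names `ror32` and `decode_immediate` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def decode_immediate(src2):
    """Expand a 12-bit Src2 immediate field (rot, imm8) into its 32-bit value."""
    rot = (src2 >> 8) & 0xF      # bits 11:8
    imm8 = src2 & 0xFF           # bits 7:0
    return ror32(imm8, 2 * rot)  # rotate right by twice the rot field


print(decode_immediate(0b0000_00101010))       # 42 (rot = 0, no rotation)
print(hex(decode_immediate(0b0100_11111111)))  # 0xff000000 (0xFF rotated right by 8)
```

Because the rotation is always an even number of bits, only constants that are an 8-bit value rotated by 0, 2, 4, ..., 30 bits can be written this way.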

Register

| 11:7 | 6:5 | 4 | 3:0 |
|---|---|---|---|
| shamt5 | sh | 0 | Rm |
  • shamt5 is the shift amount (5 bits), whether the shift is to the left or to the right
  • sh is the shift operation (see the sh table at the bottom)
  • Rm is the register whose value is being shifted

Register-shifted Register

| 11:8 | 7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|
| Rs | 0 | sh | 1 | Rm |
  • Rs is the register that holds the shift amount
  • Rm is the register whose value is being shifted

sh table

| Instruction | sh | Operation |
|---|---|---|
| LSL | 00 | Logical shift left |
| LSR | 01 | Logical shift right |
| ASR | 10 | Arithmetic shift right |
| ROR | 11 | Rotate right |
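
To see what the four sh codes actually do to a value, here is a rough Python sketch. The name `apply_shift` is hypothetical, and it is a simplification: it only models shift amounts from 1 to 31 (an amount of 0 just returns the value unchanged here, which glosses over special cases in the real encoding).

```python
MASK32 = 0xFFFFFFFF


def apply_shift(value, sh, amount):
    """Apply the shift selected by the 2-bit sh code to a 32-bit value."""
    value &= MASK32
    if amount == 0:
        return value  # simplification, see the note above
    if sh == 0b00:  # LSL: zeros come in from the right
        return (value << amount) & MASK32
    if sh == 0b01:  # LSR: zeros come in from the left
        return value >> amount
    if sh == 0b10:  # ASR: copies of the sign bit come in from the left
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32
    if sh == 0b11:  # ROR: bits falling off the right re-enter on the left
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError("sh must be a 2-bit code")


print(hex(apply_shift(0x80000000, 0b01, 4)))  # 0x8000000  (LSR)
print(hex(apply_shift(0x80000000, 0b10, 4)))  # 0xf8000000 (ASR keeps the sign)
print(hex(apply_shift(0x000000FF, 0b11, 4)))  # 0xf000000f (ROR)
```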

STARTING WITH THE EASY ONE ADD R5, R6, R7

Let's unpack one section at a time

  1. cond is ALWAYS, hence 1110, since there's no condition to prevent ADD from being done.
  2. op is 00
  3. I is 0 (there are no immediate values here)
  4. cmd is ADD, which translates to 0100
  5. There's no status indicator (it's ADD, not ADDS), hence S is 0
  6. Rn is the first source, hence R6, 0110
  7. Rd is the destination, hence R5, 0101
  8. shamt5 is 00000, since there's no shift
  9. sh is 00, as there's no shift
  10. Rm is the second source, hence R7, 0111

Combining them leads to

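| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 0100 | 0 | 0110 | 0101 | 00000 | 00 | 0 | 0111 |
| cond | op | I | cmd | S | Rn | Rd | shamt5 | sh | N/A | Rm |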

Nicely formatted binary here:
1110 0000 1000 0110 0101 0000 0000 0111
Care to convert to hex?
0xE0865007
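
If you don't feel like converting by hand, the binary-to-hex step can be checked with a couple of lines of Python. The helper name `bits_to_hex` is hypothetical.

```python
def bits_to_hex(bits):
    """Convert a string of 0s and 1s (spaces allowed) to 0x-prefixed hex."""
    word = int(bits.replace(" ", ""), 2)  # parse as a base-2 number
    return f"0x{word:08X}"                # 8 hex digits = 32 bits


print(bits_to_hex("1110 0000 1000 0110 0101 0000 0000 0111"))  # 0xE0865007
```

Every group of four bits maps to exactly one hex digit, which is why regrouping the fields into nibbles makes the conversion easy to do by eye.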

SLIGHTLY HARDER: ADD R5, R6, R7, LSR #4

  • We pick the Register form of Src2, with the shift amount in shamt5, because #4 is a literal value. I stays 0, since the second source is still a register.
  1. Most fields are the same, except bits 11:0
  2. LSR has sh code 01
  3. The LSR amount is 4, so shamt5 is 00100
  4. Rm is 7, hence 0111
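| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 0100 | 0 | 0110 | 0101 | 00100 | 01 | 0 | 0111 |
| cond | op | I | cmd | S | Rn | Rd | shamt5 | sh | null | Rm |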
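
The same encoding can be sketched in Python. The names `SH`, `src2_shifted_register` and `encode_add_shifted` are hypothetical, and the sketch only handles an unconditional ADD without the S bit, which is all this example needs.

```python
SH = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}


def src2_shifted_register(rm, shift, shamt5):
    """Build the 12-bit Register form of Src2: shamt5 | sh | 0 | Rm."""
    return (shamt5 << 7) | (SH[shift] << 5) | (0 << 4) | rm


def encode_add_shifted(rd, rn, rm, shift, shamt5):
    """Encode ADD Rd, Rn, Rm, <shift> #shamt5 (cond = AL, I = 0, S = 0)."""
    cond, cmd_add = 0b1110, 0b0100
    return (
        (cond << 28) | (cmd_add << 21) | (rn << 16) | (rd << 12)
        | src2_shifted_register(rm, shift, shamt5)
    )


print(hex(encode_add_shifted(5, 6, 7, "LSR", 4)))  # 0xe0865227
```

Compared with the plain ADD R5, R6, R7, only the low 12 bits change: 0x007 becomes 0x227.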

LET'S DO MORE: ADD R0, R1, #42

  1. The third operand is an immediate, hence
  2. I is set to 1
  3. Src2 takes the Immediate format (rot in 11:8 and the immediate value in 7:0). 42 fits in 8 bits, so rot is 0000.
| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:8 | 7:0 |
|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 1 | 0100 | 0 | 0001 | 0000 | 0000 | 00101010 |
| cond | op | I | cmd | S | Rn | Rd | Rot | imm8 |

BRING SOME MORE! SUB R2, R3, #0xFF0

  1. Rd is 2, Rn is 3, and the immediate is 0xFF0
  2. SUB has cmd code 0010.
  3. OH NO, BUT #0xFF0 does not fit in 8 bits.
  4. That's ok. That's what rot is for.
  5. 0xFF0 is 0000 0000 0000 0000 0000 1111 1111 0000
  6. 0xFF is 0000 0000 0000 0000 0000 0000 1111 1111
  7. How many rotations to the right will turn 0xFF into 0xFF0?
  8. Rotating right by 1 gives 1000 0000 0000 0000 0000 0000 0111 1111
  9. Following? Let's rotate right a little more.
  10. Rotating right by 4 gives 1111 0000 0000 0000 0000 0000 0000 1111
  11. Rotating right by 8 gives 1111 1111 0000 0000 0000 0000 0000 0000
  12. Rotating right by 24 gives 0000 0000 0000 0000 1111 1111 0000 0000, which is 0xFF00. Not quite there.
  13. Guess what? It takes a rotation of 28 to the right to get 0xFF0!
  14. So rot should be 14, since by our rule the actual rotation is twice the value in rot.
  15. Hence bits 11:8 will be 1110 and bits 7:0 will be 1111 1111,
  16. which is just 0xFF0 represented as an 8-bit number combined with a rotation.

| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:8 | 7:0 |
|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 1 | 0010 | 0 | 0011 | 0010 | 1110 | 11111111 |
| cond | op | I | cmd | S | Rn | Rd | Rot | imm8 |
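
Finding rot and imm8 by hand is easy to get wrong, so here is a small brute-force sketch in Python that tries all 16 rot values. The names `rol32` and `encode_immediate` are hypothetical. The idea: if the constant is imm8 rotated right by 2 × rot, then rotating the constant left by 2 × rot must give back something that fits in 8 bits. For 0xFF0 the search finds rot = 14 (a right rotation of 28) with imm8 = 0xFF.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def encode_immediate(value):
    """Return (rot, imm8) so that imm8 rotated right by 2*rot equals value.

    Returns None when the value cannot be written in this form.
    """
    for rot in range(16):
        imm8 = rol32(value, 2 * rot)  # undo a right rotation of 2*rot
        if imm8 <= 0xFF:
            return rot, imm8
    return None


print(encode_immediate(42))     # (0, 42)
print(encode_immediate(0xFF0))  # (14, 255)
print(encode_immediate(0x101))  # None, the set bits are too far apart
```

The 12-bit Src2 field is then `(rot << 8) | imm8`, which for 0xFF0 is 0xEFF.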

WHAT ABOUT THIS? LSL R0, R9, #7

  1. WAIT WAIT... LSL is not in the cmd table. What am I supposed to put in bits 24:21?
  2. Thanks Rakesh, the creator of the video: basically LSL is equivalent to this: MOV R0, R9, LSL #7
  3. Wait again... Rakesh says R9 is not Rn... Hm.. I thought it would work the same way SUB did above.
  4. In the MOV operation, that's not the case. As per the armasm User Guide (page 333), MOV R0, R9, LSL #7 follows the syntax MOV{S}{cond} Rd, Operand2, where Operand2 (according to page 244 of the same guide) can be a register with optional shift. And on page 246 the guide says a register with optional shift is written Rm{, shift}.
  5. Still following?
  6. Hence, R9 here is Rm, where Rm is the register holding the data for the second operand.
  7. Hence, Rn here is 0 (MOV has no first source register) and Rm is 9
  8. (BY THE WAY THE REFERENCE I'M TALKING ABOUT IS armasm User Guide Version 6.6) - the latest is here
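| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 1101 | 0 | 0000 | 0000 | 00111 | 00 | 0 | 1001 |
| cond | op | I | cmd | S | Rn | Rd | shamt5 | sh | null | Rm |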

OK TAKE A BREAK AND COME BACK! ROR R3, R5, #21

  1. GUESS WHAT.. SHIFT AGAIN! Which means Rm is 5
  2. This is equivalent to MOV R3, R5, ROR #21
  3. Same translation steps as for LSL above..
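| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 1101 | 0 | 0000 | 0011 | 10101 | 11 | 0 | 0101 |
| cond | op | I | cmd | S | Rn | Rd | shamt5 | sh | null | Rm |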
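
The "shift mnemonic is really a MOV" trick from the last two examples can be sketched in Python like this. `encode_shift_alias` is a hypothetical helper that only covers the unconditional form with a literal shift amount.

```python
SH = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}


def encode_shift_alias(mnemonic, rd, rm, shamt5):
    """Encode `<shift> Rd, Rm, #shamt5` as `MOV Rd, Rm, <shift> #shamt5`."""
    cond, cmd_mov, rn = 0b1110, 0b1101, 0  # Rn is 0 for MOV
    src2 = (shamt5 << 7) | (SH[mnemonic] << 5) | rm  # bit 4 stays 0
    return (cond << 28) | (cmd_mov << 21) | (rn << 16) | (rd << 12) | src2


print(hex(encode_shift_alias("LSL", 0, 9, 7)))   # 0xe1a00389  LSL R0, R9, #7
print(hex(encode_shift_alias("ROR", 3, 5, 21)))  # 0xe1a03ae5  ROR R3, R5, #21
```

Notice that the shift mnemonic never reaches the cmd field. It only selects the sh bits inside Src2, while cmd is always MOV (1101).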

KEY TAKEAWAY:

  • There's no single rule that covers every translation. Sometimes you just look up the cmd table; sometimes you will face an operation that is not in the table and need to break it down into one that is.
  • But everything has to end up as binary, otherwise the machine won't understand! So, let's stick to the basics and see if we can translate!!!!!
  • Good news and bad news: you and I have learned how to translate, not entirely, but we have seen a bit of it... These translation steps will be different in the A64 architecture, but we learned how to apply our knowledge in some way... Some methods must be similar... must be..

OK AFTER YOUR DINNER... LSR R4, R8, R6

  1. This time the shift amount is in R6.
  2. Does that make R8, the source register, the Rn?
  3. NOPE!!
  4. This is equivalent to MOV R4, R8, LSR R6
  5. R6 is Rs haha! Found it!
  6. Rm is 8 hohoho
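| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:8 | 7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 1101 | 0 | 0000 | 0100 | 0110 | 0 | 01 | 1 | 1000 |
| cond | op | I | cmd | S | Rn | Rd | Rs | N/A | sh | N/A | Rm |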

PHEW LET'S SLEEP AFTER THIS... ASR R5, R1, R12

  • What are Rd, Rn, Rm and Rs?? Are some of them 0? Which ones?
  • Answer below:
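| 31:28 | 27:26 | 25 | 24:21 | 20 | 19:16 | 15:12 | 11:8 | 7 | 6:5 | 4 | 3:0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1110 | 00 | 0 | 1101 | 0 | 0000 | 0101 | 1100 | 0 | 10 | 1 | 0001 |
| cond | op | I | cmd | S | Rn | Rd | Rs | N/A | sh | N/A | Rm |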
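
One way to check an answer like this is to go in the other direction: pull the fields back out of the 32-bit word and see whether the original instruction comes back. Here is a hedged Python sketch that does so for the Register-shifted Register form only; the name `decode_shift_by_register` is hypothetical and the condition field is ignored.

```python
SHIFT_NAMES = {0b00: "LSL", 0b01: "LSR", 0b10: "ASR", 0b11: "ROR"}
MOV = 0b1101


def decode_shift_by_register(word):
    """Turn a `MOV Rd, Rm, <shift> Rs` word back into `<shift> Rd, Rm, Rs`."""
    i = (word >> 25) & 1      # bit 25
    cmd = (word >> 21) & 0xF  # bits 24:21
    rd = (word >> 12) & 0xF   # bits 15:12
    rs = (word >> 8) & 0xF    # bits 11:8
    bit7 = (word >> 7) & 1    # must be 0 in this form
    sh = (word >> 5) & 0b11   # bits 6:5
    bit4 = (word >> 4) & 1    # must be 1 in this form
    rm = word & 0xF           # bits 3:0
    if i != 0 or cmd != MOV or bit7 != 0 or bit4 != 1:
        raise ValueError("not a MOV with a register-shifted register")
    return f"{SHIFT_NAMES[sh]} R{rd}, R{rm}, R{rs}"


# The binary from the table above, regrouped into nibbles, is 0xE1A05C51.
print(decode_shift_by_register(0xE1A05C51))  # ASR R5, R1, R12
print(decode_shift_by_register(0xE1A04638))  # LSR R4, R8, R6
```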
