I've been fascinated by RISC-V ever since I discovered it, sometime around March or April of 2020, maybe earlier. But the instruction format has always confused me. The ISA manual uses figures for all the instructions, so it was hard for me to work out the layout of the instructions themselves. The riscv-opcodes repository gave me code for masks and matches (e.g. 0x8002 and 0xf07f for the C.JR instruction, under the constant names MATCH_C_JR and MASK_C_JR), but that doesn't really help me with the layout of the instructions.
Wikipedia defines all of the instruction formats, but I get lost in the sub-tables. The LaTeX code doesn't really help either. Can someone link me to a resource that describes, or write a description of, the layout of the various instruction formats and how to tell them apart?
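For what it's worth, a mask/match pair does carry some layout information: the mask marks which bits of the encoding are fixed, and the match gives the values those bits must have. A minimal Python sketch, assuming only the two constants quoted above; the helper names `matches` and `fixed_bit_positions` are made up for illustration, and 0x8082 is a hand-encoded C.JR with rs1 = x1:

```python
# Constants as quoted in the post (from riscv-opcodes).
MATCH_C_JR = 0x8002
MASK_C_JR = 0xF07F


def matches(insn: int, match: int, mask: int) -> bool:
    """True when the bits selected by `mask` have the values given by `match`."""
    return (insn & mask) == match


def fixed_bit_positions(mask: int) -> list:
    """Bit positions the mask pins down; every other bit is an operand bit."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Hand-encoded example (an assumption, not taken from the post):
# bits 15..12 = 1000, bits 11..7 = rs1 = 00001, bits 6..2 = 00000, bits 1..0 = 10
c_jr_x1 = 0x8082

print(matches(c_jr_x1, MATCH_C_JR, MASK_C_JR))  # True
print(fixed_bit_positions(MASK_C_JR))           # bits 0-6 and 12-15
```

The bits the mask leaves out (7 to 11 here) are where the operand lives, which is about as much layout as the constants alone can reveal.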
RISC-V Instruction Format
Re: RISC-V Instruction Format
It's not too crazy if you're familiar with other RISC encodings.
All of the base RV32I instructions can be split into six fields: opcode, rd, funct3, rs1, rs2, and funct7. The opcode field is bits 0-6, rd is bits 7-11, funct3 is bits 12-14, rs1 is bits 15-19, rs2 is bits 20-24, and funct7 is bits 25-31. The opcode, funct3, and funct7 fields determine which instruction it is and how it's encoded. The rd field usually specifies the destination register. The rs1 and rs2 fields usually specify the first and second source registers.
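The six-field split can be sketched in a few lines of Python. This is only an illustration of the bit ranges listed above: the helper names `bits` and `split_fields` are hypothetical, the field names follow the ISA manual's spelling (funct3, funct7), and the sample word is a hand-encoded `add x3, x1, x2`.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def split_fields(insn: int) -> dict:
    """Split a 32-bit instruction into the six R-type fields."""
    return {
        "opcode": bits(insn, 6, 0),
        "rd": bits(insn, 11, 7),
        "funct3": bits(insn, 14, 12),
        "rs1": bits(insn, 19, 15),
        "rs2": bits(insn, 24, 20),
        "funct7": bits(insn, 31, 25),
    }


# add x3, x1, x2 (hand-encoded for this example)
print(split_fields(0x002081B3))
# {'opcode': 51, 'rd': 3, 'funct3': 0, 'rs1': 1, 'rs2': 2, 'funct7': 0}
```

The same split can be applied to any 32-bit instruction; for the non-R types, some of the fields simply hold immediate bits instead.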
Instructions are further categorized into four types based on how they encode their immediate operands. The R-type (register-to-register) instructions are split into the six fields listed above without dedicating any fields to immediate operands. I-type (immediate) instructions replace the rs2 and funct7 fields with the immediate operand, placing it in bits 20-31. S-type (store or branch) instructions replace the rd and funct7 fields with the immediate operand, with the lower bits replacing rd and the upper bits replacing funct7. U-type (upper immediate or jump) instructions keep only the opcode and rd fields, with the other four fields replaced by the immediate operand.
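As a rough sketch of how the I-, S-, and U-type immediates come out of the word (Python, helper names hypothetical, sample encodings hand-assembled for this example). It assumes the immediates are sign-extended, as the base ISA specifies, and it covers only the in-order encodings, not branches and jumps:

```python
def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    return ((value & ((1 << width) - 1)) ^ sign) - sign


def imm_i(insn: int) -> int:
    """I-type: 12-bit immediate in bits 31..20 (where rs2 and funct7 would be)."""
    return sign_extend(insn >> 20, 12)


def imm_s(insn: int) -> int:
    """S-type: low 5 bits in the rd slot, high 7 bits in the funct7 slot."""
    lo = (insn >> 7) & 0x1F
    hi = (insn >> 25) & 0x7F
    return sign_extend((hi << 5) | lo, 12)


def imm_u(insn: int) -> int:
    """U-type: bits 31..12 are the upper 20 bits of the value; low 12 bits are zero."""
    return sign_extend(insn & 0xFFFFF000, 32)


print(imm_i(0xFFF00093))       # addi x1, x0, -1   -> -1
print(imm_s(0xFE20AE23))       # sw x2, -4(x1)     -> -4
print(hex(imm_u(0x123452B7)))  # lui x5, 0x12345   -> 0x12345000
```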
Branch and jump instructions are sometimes counted as separate encodings because they store the immediate operand with shuffled bits, rather than placing all of the bits in order the way other instructions do. This difference isn't important unless you're writing an assembler or disassembler.
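For anyone who does need the shuffled forms, here is a hedged Python sketch of reassembling branch and jump offsets. The bit positions follow the B-type and J-type layouts in the ISA manual; the function names are invented for this example and the sample words are hand-encoded.

```python
def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    return ((value & ((1 << width) - 1)) ^ sign) - sign


def imm_b(insn: int) -> int:
    """Branch offset: imm[12] = bit 31, imm[11] = bit 7,
    imm[10:5] = bits 30..25, imm[4:1] = bits 11..8, imm[0] = 0."""
    imm = (
        (((insn >> 31) & 0x1) << 12)
        | (((insn >> 7) & 0x1) << 11)
        | (((insn >> 25) & 0x3F) << 5)
        | (((insn >> 8) & 0xF) << 1)
    )
    return sign_extend(imm, 13)


def imm_j(insn: int) -> int:
    """Jump offset: imm[20] = bit 31, imm[19:12] = bits 19..12,
    imm[11] = bit 20, imm[10:1] = bits 30..21, imm[0] = 0."""
    imm = (
        (((insn >> 31) & 0x1) << 20)
        | (((insn >> 12) & 0xFF) << 12)
        | (((insn >> 20) & 0x1) << 11)
        | (((insn >> 21) & 0x3FF) << 1)
    )
    return sign_extend(imm, 21)


print(imm_b(0x00208463))  # beq x1, x2, +8  -> 8
print(imm_j(0xFFDFF06F))  # jal x0, -4      -> -4
```

The lowest offset bit is never stored because branch and jump targets are always at least 2-byte aligned.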
Some of the instruction set extensions split instructions into different fields, but hopefully this is enough to get you through the common instructions.
Re: RISC-V Instruction Format
riscv-spec-20191213.pdf should help.
Look at "Figure 1.1: RISC-V instruction length encoding. Only the 16-bit and 32-bit encodings are considered frozen at this time."
On the next page in the same section...
It tells you how to figure out the instruction length and the endianness of the instructions.
Expanded Instruction-Length Encoding wrote:
RISC-V base ISAs have either little-endian or big-endian memory systems, with the privileged architecture further defining bi-endian operation. Instructions are stored in memory as a sequence of 16-bit little-endian parcels, regardless of memory system endianness. Parcels forming one instruction are stored at increasing halfword addresses, with the lowest-addressed parcel holding the lowest-numbered bits in the instruction specification.
...
We have to fix the order in which instruction parcels are stored in memory, independent of memory system endianness, to ensure that the length-encoding bits always appear first in halfword address order. This allows the length of a variable-length instruction to be quickly determined by an instruction-fetch unit by examining only the first few bits of the first 16-bit instruction parcel.
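In code, the length check and the parcel ordering described in the quote might look roughly like this. It is a Python sketch under assumptions: the function names are hypothetical, the bit patterns are the ones shown in Figure 1.1 of the spec, and only the 16-bit and 32-bit cases are frozen, so the 48-bit and 64-bit branches are included purely for illustration.

```python
import struct


def instruction_length(first_parcel: int) -> int:
    """Length in bytes, decided from the low bits of the first 16-bit parcel."""
    if (first_parcel & 0b11) != 0b11:
        return 2                                   # compressed, 16-bit
    if (first_parcel & 0b11100) != 0b11100:
        return 4                                   # standard 32-bit
    if (first_parcel & 0b111111) == 0b011111:
        return 6                                   # 48-bit (not frozen)
    if (first_parcel & 0b1111111) == 0b0111111:
        return 8                                   # 64-bit (not frozen)
    raise ValueError("longer or reserved encoding")


def read_instruction(mem: bytes, addr: int):
    """Return (instruction, length), assembling little-endian 16-bit parcels."""
    (first,) = struct.unpack_from("<H", mem, addr)
    length = instruction_length(first)
    insn = 0
    for i in range(length // 2):
        (parcel,) = struct.unpack_from("<H", mem, addr + 2 * i)
        insn |= parcel << (16 * i)   # lowest address holds the lowest bits
    return insn, length


# Two hand-encoded instructions back to back: C.JR x1, then add x3, x1, x2.
memory = bytes.fromhex("8280" "b3812000")
print(read_instruction(memory, 0))  # (0x8082, 2), printed in decimal
print(read_instruction(memory, 2))  # (0x002081B3, 4), printed in decimal
```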
Then look at e.g. "Figure 2.2: RISC-V base instruction formats.".
It shows 4 kinds of 32-bit encodings (R-, I-, S-, U-) and it shows where the various instruction fields are, in which bits.
The 7 opcode bits in 32-bit-long instructions include the bits that tell the instruction length and these are in the 16-bit parcel/halfword at the lowest address in instruction memory.
You can now jump to the tables in section "RV32/64G Instruction Set Listings".
Specifically, there's (sub)table "RV32I Base Instruction Set", which lists encodings of specific instructions with values of special fields like opcode and funct3. Stuff in "Table 24.2: Instruction listing for RISC-V" is pretty self-explanatory I'd say.
If you don't know what some field means (e.g. rs1 or aq), just search within the document. Your PDF reader has this function.
Re: RISC-V Instruction Format
So, I found a solution in the most unlikely of places. The riscv organization on GitHub has a tool called parse_opcodes, written in Python, that contains this block of bit pattern tuples:
Code:
arglut = {}
arglut['rd'] = (11,7)
arglut['rs1'] = (19,15)
arglut['rs2'] = (24,20)
arglut['rs3'] = (31,27)
arglut['aqrl'] = (26,25)
arglut['fm'] = (31,28)
arglut['pred'] = (27,24)
arglut['succ'] = (23,20)
arglut['rm'] = (14,12)
arglut['funct3'] = (14,12)
arglut['imm20'] = (31,12)
arglut['jimm20'] = (31,12)
arglut['imm12'] = (31,20)
arglut['imm12hi'] = (31,25)
arglut['bimm12hi'] = (31,25)
arglut['imm12lo'] = (11,7)
arglut['bimm12lo'] = (11,7)
arglut['zimm'] = (19,15)
arglut['shamt'] = (25,20)
arglut['shamtw'] = (24,20)
# for vectors
arglut['vd'] = (11,7)
arglut['vs3'] = (11,7)
arglut['vs1'] = (19,15)
arglut['vs2'] = (24,20)
arglut['vm'] = (25,25)
arglut['wd'] = (26,26)
arglut['amoop'] = (31,27)
arglut['nf'] = (31,29)
arglut['simm5'] = (19,15)
arglut['zimm11'] = (30,20)
Thus, this line in RV opcode format:
Code:
amoadd.w rd rs1 rs2 aqrl 31..29=0 28..27=0 14..12=2 6..2=0x0B 1..0=3
Translates into the binary:
Code:
00000 aqrl rs2 rs1 010 rd 0101111
The tricky part is ordering the bits properly. It's harder to follow the pattern when you have to insert characters other than raw binary. But it works.
The only thing I'm curious about is how the different register types are encoded. For example, there are only 5 bits allocated to each register field (excluding vector registers), so I'm assuming that exactly which register a given number refers to is determined by the opcode. Is that right?
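As a cross-check on the amoadd.w line quoted above, its fixed-bit constraints can be folded into a mask/match pair with a few lines of Python. This is a sketch, not the actual parse_opcodes code: the function name `fixed_bits_to_match_mask` is made up, and it assumes each constraint looks like `hi..lo=value` (or `bit=value`) while tokens without `=` are operand names.

```python
def fixed_bits_to_match_mask(tokens):
    """Fold 'hi..lo=value' constraints into a (match, mask) pair."""
    match = 0
    mask = 0
    for tok in tokens:
        if "=" not in tok:
            continue  # operand name such as rd, rs1, aqrl
        span, value = tok.split("=")
        parts = span.split("..")
        hi, lo = int(parts[0]), int(parts[-1])  # a single bit has hi == lo
        width = hi - lo + 1
        mask |= ((1 << width) - 1) << lo
        match |= int(value, 0) << lo            # base 0 accepts decimal and 0x..
    return match, mask


line = "amoadd.w rd rs1 rs2 aqrl 31..29=0 28..27=0 14..12=2 6..2=0x0B 1..0=3"
name, *tokens = line.split()
match, mask = fixed_bits_to_match_mask(tokens)
print(name, hex(match), hex(mask))  # amoadd.w 0x202f 0xf800707f
```

The bits the mask leaves clear (26-25, 24-20, 19-15, 11-7) are exactly the aqrl, rs2, rs1, and rd ranges from the arglut table, which is the layout the mask/match constants were hiding.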