# Proposed Instructions for the RISC-V Base P Extension

#### John Hauser

#### December 18, 2024

Warning! This document is only a draft proposal and is not an official document of the RISC-V International Association. The Base P extension that is eventually ratified by RISC-V International is liable to differ from this proposal in many details.

This document proposes a set of instructions for the RISC-V Base P extension, both for RV32 and for RV64.

The changes in this version (012) from the previous one (011) are explained here: https://lists.riscv.org/g/tech-p-ext/message/511

The instructions being proposed for the Base P extension are listed here by name, but this document does not, for the most part, describe exactly what each instruction does. Rather, instruction names are constructed based on instruction function, according to a set of rules given by another document, System for RISC-V P Extension Instruction Names. Inferring the function of each instruction from its name requires a thorough understanding of the naming system used for instructions, explained by that other document.

### 1 Commonalities

Some general principles apply to the selection of instructions for the Base P extension:

- The default width of operands is XLEN, 32 bits for RV32, or 64 bits for RV64. Where both RV32 and RV64 have the same packed-SIMD instruction, the instruction operates on 32-bit-size operands for RV32, but 64-bit-size operands for RV64, unless indicated specially otherwise.
- An instruction can have no more than three independent register operands: two source operands and one destination. But unlike the base RISC-V ISA, instructions may both read and write their destination operand, which is necessary for accumulate operations, for example.
- Instructions with register-pair operands exist only for RV32, not for RV64. RV32 needs register-pair operands in order to operate conveniently on 64-bit fixed-point values, but RV64 already supports 64-bit values inherently.
  - (Although some computations might benefit from specialized RV64 instructions with register-pair operands, the extra costs of implementing register-pair operands in a physical processor are judged not to be justified for only a few, limited uses. For applications that really need 128-bit or wider packed-SIMD operands for performance, product developers are advised to look to the RISC-V V extension.)
- An instruction with register-pair operands (RV32 only) can read from a maximum of one register pair and two single registers, or equivalently, a maximum of two pairs. For example, if an instruction reads from a register-pair destination operand, and both its source operands are also registers, the source operands can be only single registers, not pairs.
  - (This rule limits the number of register ports and associated forwarding paths needed in a hardware implementation, when instructions are not broken into multiple micro-ops.)
- There is no concern for overflow of 64-bit values. Hence, there are no instructions that saturate doubleword results, and no instructions that perform averaging addition/subtraction of doublewords.
- For all instructions that accumulate (adding a computed value to the previous value of the destination operand), the accumulate addition does not saturate. If saturation is needed, a separate instruction for saturating addition must be executed.
- No instructions exist that do subtract-accumulate (subtract a computed value from the previous value of the destination operand). If needed, subtraction must be done by a separate instruction.
- For any instruction that includes an accumulate operation, an equivalent instruction without accumulate is usually provided too.
- Among instructions that exist primarily to aid complex arithmetic, only halfword-size and wordsize elements are supported, not byte-size, because complex numbers with byte-size components are not considered common enough to justify dedicated instructions. (Unsigned components are also not supported, for obvious reasons.)

Most packed-SIMD instructions that do not multiply have a fairly regular structure, whereas those that do multiply exhibit more variety. The instructions of the Base P extension that do not perform any multiplications are listed in the next section, while instructions that do multiply are grouped together separately in Section 3.

## 2 Instructions without multiplications

The non-multiplying instructions proposed for the Base P extension are presented in tables in two subsections, the first for instructions that do not have register-pair operands (RV32 and RV64), and then another for instructions with register-pair operands (RV32 only).

The following explains the more unusual or esoteric functions among the proposed instructions:

#### SH1ADD, SSH1SADD:

Function SSH1SADD (saturating shift left by 1, saturating addition) is a double-saturating version of the Zba extension's SH1ADD instruction. It serves as a building block to better support full-width "Q-format" products (a full-width product shifted left by 1 bit) in combination with saturating addition. While the operation of shifting left by 1 bit and saturating, "SSH1", can be done using SADD, having SSH1SADD available allows two instructions (SADD and SADD) to be reduced to one.

SH1ADD is of course the same computation without any saturation. This can be useful as well for full-width "Q-format" products in circumstances where saturation is not needed or desired.

Instructions for byte-width SH1ADD or SSH1SADD are not proposed because full-width products are always at least 16 bits wide.

#### PACK, PACKBT, PACKTB, PACKT:

See: https://lists.riscv.org/g/tech-p-ext/message/247

#### ZIP, UNZIP, WZIP:

See: https://lists.riscv.org/g/tech-p-ext/message/287

#### SLX, SRX:

Shift Left Extended and Shift Right Extended.

```
See: https://lists.riscv.org/g/tech-p-ext/message/267
    https://lists.riscv.org/g/tech-p-ext/message/272
    https://lists.riscv.org/g/tech-p-ext/message/277
```

#### MVM, MVMN, MERGE:

Move Masked, Move Masked Not, and Merge.

```
See: https://lists.riscv.org/g/tech-p-ext/message/267
https://lists.riscv.org/g/tech-p-ext/message/276
```

#### WADD, WADDA, WSUB, WSUBA:

For RV32, instructions for widening addition and subtraction, with and without accumulate, are justified as follows: First, addition/subtraction in various forms are among the most common operations of a computer. Second, other widening instructions for RV32, such as WMACC (widening multiply and accumulate), also require the same structure, with single-register source operands, and accumulation into a register-pair destination. And third, much like SH1ADD and SSH1SADD, widening addition/subtraction is useful sometimes as a building block for functions that may exist as single instructions in other architectures but not in the Base P extension for RISC-V.

### 2.1 No operands are register pairs

| В           | Н           | W          |             | D          |
|-------------|-------------|------------|-------------|------------|
| RV32/RV64   | RV32/RV64   | RV32       | RV64        | RV64 only  |
| PLI.B (*1)  | PLI.H (*2)  | (ADDI)     | PLI.W (*2)  | (ADDI)     |
|             | PLUI.H      | (LUI)      | PLUI.W      | (*3)       |
| PADD.B.B0   | PADD.H.H0   | (ADD)      | PADD.W.W0   | (ADD)      |
| PADD.B      | PADD.H      | (ADD)      | PADD.W      | (ADD)      |
| PSUB.B      | PSUB.H      | (SUB)      | PSUB.W      | (SUB)      |
| PSADD.B     | PSADD.H     | SADÓ       | PSADD.W     | (*4)       |
| PSADDU.B    | PSADDU.H    | SADDU      | PSADDU.W    |            |
| PSSUB.B     | PSSUB.H     | SSUB       | PSSUB.W     |            |
| PSSUBU.B    | PSSUBU.H    | SSUBU      | PSSUBU.W    |            |
| PAADD.B     | PAADD.H     | AADD       | PAADD.W     |            |
| PAADDU.B    | PAADDU.H    | AADDU      | PAADDU.W    |            |
| PASUB.B     | PASUB.H     | ASUB       | PASUB.W     |            |
| PASUBU.B    | PASUBU.H    | ASUBU      | PASUBU.W    |            |
| (*5)        | PSH1ADD.H   | (SH1ADD)   | PSH1ADD.W   | (SH1ADD)   |
|             | PSSH1SADD.H | SSH1SADD   | PSSH1SADD.W | (*4)       |
| (*6)        | PAS.HX      | not        | PAS.WX      | not        |
|             | PSA.HX      | applicable | PSA.WX      | applicable |
|             | PSAS.HX     | (*7)       | PSAS.WX     | (*8)       |
|             | PSSA.HX     | , ,        | PSSA.WX     |            |
|             | PAAS.HX     |            | PAAS.WX     |            |
|             | PASA.HX     |            | PASA.WX     |            |
| PDIF.B      | PDIF.H      | (*9)       |             | (*9)       |
| PDIFU.B     | PDIFU.H     |            |             |            |
| PSABS.B     | PSABS.H     | (*10)      |             | (*4)       |
| PREDSUM.B   | PREDSUM.H   | (ADD)      | PREDSUM.W   | (ADD)      |
| PREDSUMU.B  | PREDSUMU.H  | (ADD)      | PREDSUMU.W  | (ADD)      |
| PDIFSUMU.B  | (*11)       |            |             |            |
| PDIFSUMAU.B |             |            |             |            |

- (\*1) PLI.B (Packed-SIMD Load Immediate, Byte elements) can load any 8-bit immediate into all destination bytes.
- (\*2) The immediates that can be loaded into halfword and word elements are 10 bits wide, signed. PLI (Packed-SIMD Load Immediate) and PLUI (Packed-SIMD Load Upper Immediate) differ in whether the immediate is loaded into the least-significant or most-significant 10 bits of each destination element.
- (\*3) Use existing instructions for loading doubleword constants.
- (\*4) Saturation is not supported at doubleword size, nor averaging add/subtract.
- (\*5) Operations SH1ADD and SSH1SADD are not supported for bytes.
- (\*6) The instructions that add/subtract even/odd elements (PAS.HX, PSA.HX, etc.) exist for complex arithmetic, which is not equally supported for byte-size components.
- (\*7) A single RV32 register can hold only one 32-bit word, so cannot contain two elements to swap. The equivalent function is performed by two scalar instructions, such as ADD and SUB.
- (\*8) A single RV64 register can hold only one 64-bit doubleword, so cannot contain two elements to swap. The equivalent function is performed by two scalar instructions, such as ADD and SUB.

- (\*9) Absolute difference can be computed in three operations: MAX, MIN, and SUB for DIF; or MAXU, MINU, and SUB for DIFU. Absolute value can be computed either with the scalar ABS instruction or in two operations: SUB to negate, then MINU of the original and negated values.
- (\*10) Saturating absolute value SABS can be computed in two operations: SSUB to negate with saturation, then MINU of the original and negated values.
- (\*11) Sum of absolute difference is not supported for sizes larger than bytes.

| В         | Н           | W        |             | D         |
|-----------|-------------|----------|-------------|-----------|
| RV32/RV64 | RV32/RV64   | RV32     | RV64        | RV64 only |
|           | PSEXTB.H    | (SEXT.B) | PSEXTB.W    | (SEXT.B)  |
|           |             | (SEXT.H) | PSEXTH.W    | (SEXT.H)  |
| (*1)      | PSATI.H     | SATI     | PSATI.W     | SATI      |
|           | PUSATI.H    | USATI    | PUSATI.W    | USATI     |
| PSLLI.B   | PSLLI.H     | (SLLI)   | PSLLI.W     | (SLLI)    |
| PSLL.B.B0 | PSLL.H.H0   | (SLL)    | PSLL.W.W0   | (SLL)     |
| PSRLI.B   | PSRLI.H     | (SRLI)   | PSRLI.W     | (SRLI)    |
| PSRL.B.B0 | PSRL.H.H0   | (SRL)    | PSRL.W.W0   | (SRL)     |
| PSRAI.B   | PSRAI.H     | (SRAI)   | PSRAI.W     | (SRAI)    |
| PSRA.B.B0 | PSRA.H.H0   | (SRA)    | PSRA.W.W0   | (SRA)     |
| (*1)      | PSSLAI.H    | SSLAI    | PSSLAI.W    | (*2)      |
|           | PSRARI.H    | SRARI    | PSRARI.W    | SRARI     |
|           | PSSHA.H.H0  | SSHA     | PSSHA.W.W0  | SHA (*2)  |
|           | PSSHAR.H.H0 | SSHAR    | PSSHAR.W.W0 | SHAR      |
| PMIN.B    | PMIN.H      | (MIN)    | PMIN.W      | (MIN)     |
| PMINU.B   | PMINU.H     | (MINU)   | PMINU.W     | (MINU)    |
| PMAX.B    | PMAX.H      | (MAX)    | PMAX.W      | (MAX)     |
| PMAXU.B   | PMAXU.H     | (MAXU)   | PMAXU.W     | (MAXU)    |
| PMSEQ.B   | PMSEQ.H     | MSEQ     | PMSEQ.W     | (*3)      |
| PMSLT.B   | PMSLT.H     | MSLT     | PMSLT.W     |           |
| PMSLTU.B  | PMSLTU.H    | MSLTU    | PMSLTU.W    |           |

- (\*1) These operations are not supported for bytes: arbitrary-width saturation (SAT, USAT) and advanced arithmetic shifts (SSLA, SRAR, SSHA, and SSHAR).
- (\*2) Saturation is not supported at doubleword size.
- (\*3) Condition masks are not supported at doubleword size. Use existing instructions for conditional execution, or negate the result of a standard SLT or SLTU to form a 64-bit mask.

| В:В → Н   | H:H→W  |           | W:W→D     |
|-----------|--------|-----------|-----------|
| RV32/RV64 | RV32   | RV64      | RV64 only |
| PPACK.H   | (PACK) | PPACK.W   | (PACK)    |
| PPACKBT.H | PACKBT | PPACKBT.W | PACKBT    |
| PPACKTB.H | PACKTB | PPACKTB.W | PACKTB    |
| PPACKT.H  | PACKT  | PPACKT.W  | PACKT     |

| В      |          | Н           |           | W         |
|--------|----------|-------------|-----------|-----------|
| RV32   | RV64     | RV32        | RV64      | RV64 only |
| (REV8) |          | (*1)        | REV16     | (*1)      |
| (*2)   | ZIP8P    | (PACK) (*2) | ZIP16P    | (PACK)    |
|        | ZIP8HP   | PACKT       | ZIP16HP   | PACKT     |
| (*3)   | UNZIP8P  | (PACK)      | UNZIP16P  | (PACK)    |
|        | UNZIP8HP | PACKT       | UNZIP16HP | PACKT     |

- (\*1) The upper and lower halves of a register can be swapped using PACKTB.
- (\*2) For RV32, the equivalent of both ZIP8P and ZIP8HP can be done together by a single WZIP8P instruction, and the equivalent of both ZIP16P and ZIP16HP can be done together by a single WZIP16P instruction.
- (\*3) For RV32, the functions of UNZIP8P and UNZIP8HP can be achieved by instruction PNSRLI.B (narrowing shift-right), with a shift distance of 0 for UNZIP8P or 8 for UNZIP8HP.

| scalar only |     |       |        |  |
|-------------|-----|-------|--------|--|
| RV32/RV     | V64 |       | RV64   |  |
| ABS         | SLX | MVM   | ABSW   |  |
| (CLZ)       | SRX | MVMN  | (CLZW) |  |
| CLS         |     | MERGE | CLSW   |  |
| REV         |     |       |        |  |

# 2.2 One or more operands are even-odd register pairs (RV32 only)

| RV32 only, register-pair destination |                   |                   |  |
|--------------------------------------|-------------------|-------------------|--|
| В→Н                                  | $H \rightarrow W$ | $W \rightarrow D$ |  |
| PWADD.B                              | PWADD.H           | WADD              |  |
| PWADDA.B                             | PWADDA.H          | WADDA             |  |
| PWADDU.B                             | PWADDU.H          | WADDU             |  |
| PWADDAU.B                            | PWADDAU.H         | WADDAU            |  |
| PWSUB.B                              | PWSUB.H           | WSUB              |  |
| PWSUBA.B                             | PWSUBA.H          | WSUBA             |  |
| PWSUBU.B                             | PWSUBU.H          | WSUBU             |  |
| PWSUBAU.B                            | PWSUBAU.H         | WSUBAU            |  |
| PWSLLI.B                             | PWSLLI.H          | WSLLI             |  |
| PWSLL.B.B0                           | PWSLL.H.H0        | WSLL              |  |
| PWSLAI.B                             | PWSLAI.H          | WSLAI             |  |
| PWSLA.B.B0                           | PWSLA.H.H0        | WSLA              |  |

| RV32 only, register-pair destination |         |      |  |
|--------------------------------------|---------|------|--|
| B    H    W                          |         |      |  |
| WZIP8P                               | WZIP16P | (*1) |  |

<sup>(\*1)</sup> For RV32, the equivalent function for words is accomplished by two MV instructions.

| RV32 only, regis | RV32 only, register-pair operands |              |                |
|------------------|-----------------------------------|--------------|----------------|
| В                | H                                 | W            | D              |
| PLI.DB           | PLI.DH                            | (*1)         | (*2)           |
|                  | PLUI.DH                           |              |                |
| PADD.DB          | PADD.DH                           | PADD.DW      | ADDD           |
| PSUB.DB          | PSUB.DH                           | PSUB.DW      | SUBD           |
| PSADD.DB         | PSADD.DH                          | PSADD.DW     |                |
| PSADDU.DB        | PSADDU.DH                         | PSADDU.DW    |                |
| PSSUB.DB         | PSSUB.DH                          | PSSUB.DW     |                |
| PSSUBU.DB        | PSSUBU.DH                         | PSSUBU.DW    |                |
| PAADD.DB         | PAADD.DH                          | PAADD.DW     |                |
| PAADDU.DB        | PAADDU.DH                         | PAADDU.DW    |                |
| PASUB.DB         | PASUB.DH                          | PASUB.DW     |                |
| PASUBU.DB        | PASUBU.DH                         | PASUBU.DW    |                |
| (*3)             | PSH1ADD.DH                        | PSH1ADD.DW   |                |
|                  | PSSH1SADD.DH                      | PSSH1SADD.DW |                |
| (*4)             | PAS.DHX                           | (*5)         | not applicable |
|                  | PSA.DHX                           |              | (*6)           |
|                  | PSAS.DHX                          |              |                |
|                  | PSSA.DHX                          |              |                |
|                  | PAAS.DHX                          |              |                |
|                  | PASA.DHX                          |              |                |
| PDIF.DB          | PDIF.DH                           | (*7)         | (*2)           |
| PDIFU.DB         | PDIFU.DH                          |              |                |
| PSABS.DB         | PSABS.DH                          | (*8)         |                |

- (\*1) PLI.DW and PLUI.DW would be inconvenient to support because RV32 has no exactly corresponding scalar instruction for words.
- (\*2) Few doubleword scalar instructions are provided for RV32. In particular, saturation is not supported at doubleword size, nor averaging add/subtract.
- (\*3) Operations SH1ADD and SSH1SADD are not supported for bytes.
- (\*4) The instructions that add/subtract even/odd elements exist for complex arithmetic, which is not equally supported for byte-size components.
- (\*5) For RV32, swapping words (.DWX) is not supported. The equivalent function is performed by two scalar instructions, such as ADD and SUB.
- (\*6) An RV32 register pair can hold only one 64-bit doubleword, so cannot contain two elements to swap.
- (\*7) Absolute difference can be computed in three operations: MAX, MIN, and SUB for DIF; or MAXU, MINU, and SUB for DIFU. Absolute values of two words can be done with two ABS instructions.
- (\*8) Saturating absolute value SABS can be computed in two operations: SSUB to negate with saturation, then MINU of the original and negated values.

| RV32 only, register-pair first source (only) |             |      |  |
|----------------------------------------------|-------------|------|--|
| B    H    W                                  |             |      |  |
| PREDSUM.DB                                   | PREDSUM.DH  | (*1) |  |
| PREDSUMU.DB                                  | PREDSUMU.DH |      |  |

(\*1) The equivalent function is accomplished by two ADD instructions, or a single WADDA or WADDAU.

| RV32 only, register-pair operands |           |           |      |
|-----------------------------------|-----------|-----------|------|
| В                                 | H         | W         | D    |
|                                   | PSEXTB.DH | PSEXTB.DW | (*1) |
|                                   |           | PSEXTH.DW |      |
| (*2)                              | PSATI.DH  | PSATI.DW  |      |
|                                   | PUSATI.DH | PUSATI.DW |      |
| PSLLI.DB                          | PSLLI.DH  | PSLLI.DW  |      |
| PSRLI.DB                          | PSRLI.DH  | PSRLI.DW  |      |
| PSRAI.DB                          | PSRAI.DH  | PSRAI.DW  |      |
| (*2)                              | PSSLAI.DH | PSSLAI.DW |      |
|                                   | PSRARI.DH | PSRARI.DW |      |
| PMIN.DB                           | PMIN.DH   | PMIN.DW   |      |
| PMINU.DB                          | PMINU.DH  | PMINU.DW  |      |
| PMAX.DB                           | PMAX.DH   | PMAX.DW   |      |
| PMAXU.DB                          | PMAXU.DH  | PMAXU.DW  |      |
| PMSEQ.DB                          | PMSEQ.DH  | PMSEQ.DW  | 1    |
| PMSLT.DB                          | PMSLT.DH  | PMSLT.DW  |      |
| PMSLTU.DB                         | PMSLTU.DH | PMSLTU.DW |      |

- (\*1) Few doubleword scalar instructions are provided for RV32. Doubleword shifts by a constant distance can be done in two instructions, making use of narrowing shift instructions NSRLI and NSRAI.
- (\*2) These operations are not supported for bytes: arbitrary-width saturation (SAT, USAT) and advanced arithmetic shifts (SSLA, SRAR).

| RV32 only, register-pair first source and destination |              |             |      |
|-------------------------------------------------------|--------------|-------------|------|
| В                                                     | H            | W           | D    |
| PADD.DB.B0                                            | PADD.DH.H0   | PADD.DW.W   | (*1) |
| PSLL.DB.B0                                            | PSLL.DH.H0   | PSLL.DW.W   |      |
| PSRL.DB.B0                                            | PSRL.DH.H0   | PSRL.DW.W   |      |
| PSRA.DB.B0                                            | PSRA.DH.H0   | PSRA.DW.W   |      |
| (*2)                                                  | PSSHA.DH.H0  | PSSHA.DW.W  |      |
|                                                       | PSSHAR.DH.H0 | PSSHAR.DW.W |      |

- (\*1) Few doubleword scalar instructions are provided for RV32.
- (\*2) Advanced arithmetic shifts are not supported for bytes.

| RV32 only, register-pair operands |                            |            |  |
|-----------------------------------|----------------------------|------------|--|
| В:В→Н                             | $\mid$ H:H $\rightarrow$ W | $W:W\to D$ |  |
| PPACK.DH                          | PPACK.DW                   | (*1)       |  |
| PPACKBT.DH                        | PPACKBT.DW                 |            |  |
| PPACKTB.DH                        | PPACKTB.DW                 |            |  |
| PPACKT.DH                         | PPACKT.DW                  |            |  |

(\*1) Few doubleword scalar instructions are provided for RV32. Packing two words into a doubleword is accomplished by two MV instructions.

| RV32 only, register-pair first source (only) |                   |                   |
|----------------------------------------------|-------------------|-------------------|
| Н→В                                          | $W \rightarrow H$ | $D \rightarrow W$ |
| PNSRLI.B                                     | PNSRLI.H          | NSRLI             |
| PNSRL.B.B0                                   | PNSRL.H.H0        | NSRL              |
| PNSRAI.B                                     | PNSRAI.H          | NSRAI             |
| PNSRA.B.B0                                   | PNSRA.H.H0        | NSRA              |
| PNSRARI.B                                    | PNSRARI.H         | NSRARI            |
| PNSRAR.B.B0                                  | PNSRAR.H.H0       | NSRAR             |
| PNCLIPI.B                                    | PNCLIPI.H         | NCLIPI            |
| PNCLIP.B.B0                                  | PNCLIP.H.H0       | NCLIP             |
| PNCLIPRI.B                                   | PNCLIPRI.H        | NCLIPRI           |
| PNCLIPR.B.B0                                 | PNCLIPR.H.H0      | NCLIPR            |
| PNCLIPIU.B                                   | PNCLIPIU.H        | NCLIPIU           |
| PNCLIPU.B.B0                                 | PNCLIPU.H.H0      | NCLIPU            |
| PNCLIPRIU.B                                  | PNCLIPRIU.H       | NCLIPRIU          |
| PNCLIPRU.B.B0                                | PNCLIPRU.H.H0     | NCLIPRU           |

## 3 Instructions that perform multiplications

Besides the more general principles listed earlier, some additional ones apply specifically to the selection of multiplication instructions for the Base P extension:

- For byte × byte multiplication, only full-width products (B × B → H) are supported directly by instructions, not byte-width products (B × B → B), because the latter do not maintain enough precision for most computations. Algorithms that actually need byte-size products must convert back from full-width, halfword-size products.
- For byte × byte multiplication, multiply-add instructions are provided only in forms that do four-element-wide horizontal addition (4ADD), as these are believed to be sufficient for most needs. For any other cases, byte multiplication and subsequent addition can always be done using separate instructions.
- Instructions that multiply doublewords are not provided by the P extension.
- RV32 instructions with register-pair operands may not take more than 32 bits of factors to multiply from each of two source operands—i.e., 32 bits from the first source operand, and 32 bits from the second source operand. This limits the number of bits that may be multiplier inputs. (When more multiplier capability is needed for performance, developers are expected to step up to RV64P systems, or switch to the V extension.)

As before, the first subsection below presents the instructions that do not have register-pair operands, followed by a separate subsection for RV32-only instructions with register-pair destinations.

For instructions listed with suffix '.Hn', the n is the number of a halfword element within the second source operand, either '0' for bits 15:0 or '1' for bits 31:16.

For suffixes '.Hnn' and '.Wnn', each n is an element number within a source operand, either '0' or '1'. For halfwords, element number 0 is bits 15:0, and element number 1 is bits 31:16. For word size, element number 0 is bits 31:0, and element number 1 is bits 63:32. The first digit ('0' or '1') corresponds to the first source operand, and the second digit corresponds to the second source operand. The only combinations allowed are '00', '01', and '11'. For signed-unsigned (SU) multiplications, the '01' combination is not supported, only '00' or '11'.

For instructions listed with suffix '.Bp' or '.Hp', the p is either 'E' for even elements or 'O' for odd elements. For suffixes '.Bpp' and '.Hpp', the pp is either 'EE' (even elements from both source operands), 'EO' (even elements from the first source, odd elements from the second source), or 'OO' (odd elements from both source operands). For signed-unsigned (SU) multiplications, the 'EO' combination is not supported, only 'EE' or 'OO'.

# 3.1 No operands are register pairs

| $B \times B \rightarrow B$ | $H \times H \rightarrow H$ | $W \times W \rightarrow W$ |             |
|----------------------------|----------------------------|----------------------------|-------------|
| RV32/RV64                  | RV32/RV64                  | RV32                       | RV64        |
| (*1)                       | PMULH.H                    | (MULH)                     | PMULH.W     |
|                            | PMULHR.H                   | MULHR                      | PMULHR.W    |
|                            | PMULHSU.H                  | (MULHSU)                   | PMULHSU.W   |
|                            | PMULHRSU.H                 | MULHRSU                    | PMULHRSU.W  |
|                            | PMULHU.H                   | (MULHU)                    | PMULHU.W    |
|                            | PMULHRU.H                  | MULHRU                     | PMULHRU.W   |
|                            | PMULQ.H                    | MULQ                       | PMULQ.W     |
|                            | PMULQR.H                   | MULQR                      | PMULQR.W    |
|                            | PMHACC.H                   | MHACC                      | PMHACC.W    |
|                            | PMHRACC.H                  | MHRACC                     | PMHRACC.W   |
|                            | PMHACCSU.H                 | MHACCSU                    | PMHACCSU.W  |
|                            | PMHRACCSU.H                | MHRACCSU                   | PMHRACCSU.W |
|                            | PMHACCU.H                  | MHACCU                     | PMHACCU.W   |
|                            | PMHRACCU.H                 | MHRACCU                    | PMHRACCU.W  |

<sup>(\*1)</sup> For byte  $\times$  byte multiplication, instructions are provided only for full-width products (B×B $\rightarrow$ H), not byte-width products (B×B $\rightarrow$ B).

| $B \times B \rightarrow B \rightarrow H$ | $H \times H \rightarrow H \rightarrow W$ |               | $W \times W \rightarrow W \rightarrow D$ |
|------------------------------------------|------------------------------------------|---------------|------------------------------------------|
| RV32/RV64                                | RV32                                     | RV64          | RV64 only                                |
| (*1)                                     | MQACC.Hnn                                | PMQACC.W.Hpp  | MQACC.Wnn                                |
|                                          | MQRACC.Hnn                               | PMQRACC.W.Hpp | MQRACC.Wnn                               |
|                                          | PMQ2ADD.H                                |               | PMQ2ADD.W                                |
|                                          | PMQ2ADDA.H                               |               | PMQ2ADDA.W                               |
|                                          | PMQR2ADD.H                               |               | PMQR2ADD.W                               |
|                                          | PMQR2ADDA.H                              |               | PMQR2ADDA.W                              |

<sup>(\*1)</sup> For byte  $\times$  byte multiplication, instructions are provided only for full-width products (B×B $\rightarrow$ H), not byte-width products (B×B $\rightarrow$ B).

| $B \times B \rightarrow H$ | $H \times H \rightarrow W$ |               | $W \times W \rightarrow D$ |
|----------------------------|----------------------------|---------------|----------------------------|
| RV32/RV64                  | RV32                       | RV64          | RV64 only                  |
| PMUL.H.Bpp                 | MUL.Hnn                    | PMUL.W.Hpp    | MUL.Wnn                    |
| PMULSU.H.Bpp               | MULSU.Hnn                  | PMULSU.W.Hpp  | MULSU.Wnn                  |
| PMULU.H.Bpp                | MULU.Hnn                   | PMULU.W.Hpp   | MULU.Wnn                   |
| (*1)                       | MACC.Hnn                   | PMACC.W.Hpp   | MACC.Wnn                   |
|                            | MACCSU.Hnn                 | PMACCSU.W.Hpp | MACCSU.Wnn                 |
|                            | MACCU.Hnn                  | PMACCU.W.Hpp  | MACCU.Wnn                  |
|                            | PM2ADD.H                   | PM2ADD.H      |                            |
|                            | PM2ADDA.H                  |               | PM2ADDA.W                  |
|                            | PM2ADDSU.H                 |               | PM2ADDSU.W                 |
|                            | PM2ADDASU.H                |               | PM2ADDASU.W                |
|                            | PM2ADDU.H                  |               | PM2ADDU.W                  |
|                            | PM2ADDAU.H                 |               | PM2ADDAU.W                 |
|                            | PM2ADD.HX                  |               | PM2ADD.WX                  |
|                            | PM2ADDA.HX                 |               | PM2ADDA.WX                 |
|                            | PM2SADD.H (*2)             |               | (*3)                       |
|                            | PM2SADD.HX                 |               |                            |
|                            | PM2SUB.H                   |               | PM2SUB.W                   |
|                            | PM2SUBA.H                  |               | PM2SUBA.W                  |
|                            | PM2SUB.HX                  |               | PM2SUB.WX                  |
|                            | PM2SUBA.HX                 | _             | PM2SUBA.WX                 |

- (\*1) For byte  $\times$  byte multiplication, the only multiply-add instructions provided are those with four-element-wide horizontal addition (4ADD).
- (\*2) Saturation of the horizontal addition (2SADD) will occur only when all four source halfwords equal -0x8000. Versions of these instructions that accumulate are not provided because saturation is not supported for an accumulate operation. To get the effect of multiply-accumulate, a PM2SADD.H or PM2SADD.HX will most likely be combined with SADD or PSADD.DW (RV32) or PSADD.W (RV64).
- (\*3) Saturation is not supported at doubleword size.

| $B \times B \rightarrow H \rightarrow W$ | $H \times H \rightarrow W \rightarrow D$ |
|------------------------------------------|------------------------------------------|
| RV32/RV64                                | RV64 only                                |
| PM4ADD.B                                 | PM4ADD.H                                 |
| PM4ADDA.B                                | PM4ADDA.H                                |
| PM4ADDSU.B                               | PM4ADDSU.H                               |
| PM4ADDASU.B                              | PM4ADDASU.H                              |
| PM4ADDU.B                                | PM4ADDU.H                                |
| PM4ADDAU.B                               | PM4ADDAU.H                               |

| $H \times B \rightarrow H$ | $W \times H \rightarrow W$ |                  |
|----------------------------|----------------------------|------------------|
| RV32/RV64                  | RV32                       | RV64             |
| PMULH.H.Bp                 | MULH.Hn                    | PMULH.W.Hp       |
| PMULHSU.H.Bp               | MULHSU.Hn                  | PMULHSU.W.Hp     |
| PMHACC.H.Bp                | MHACC.Hn                   | PMHACC.W.Hp      |
| PMHACCSU.H.Bp              | MHACCSU.Hn                 | PMHACCSU.W.H $p$ |

## 3.2 Destination is an even-odd register pair (RV32 only)

| RV32 only, register-pair destination     |                                          |                                          |
|------------------------------------------|------------------------------------------|------------------------------------------|
| $B \times B \rightarrow B \rightarrow H$ | $H \times H \rightarrow H \rightarrow W$ | $W \times W \rightarrow W \rightarrow D$ |
| (*1)                                     |                                          | MQWACC                                   |
|                                          | PMQRWACC.H                               | MQRWACC                                  |

(\*1) For byte  $\times$  byte multiplication, instructions are provided only for full-width products (B×B $\rightarrow$ H), not byte-width products (B×B $\rightarrow$ B).

| RV32 only, register-pair destination |                            |                            |
|--------------------------------------|----------------------------|----------------------------|
| $B \times B \rightarrow H$           | $H \times H \rightarrow W$ | $W \times W \rightarrow D$ |
| PWMUL.B                              | PWMUL.H                    | WMUL                       |
| PWMULSU.B                            | PWMULSU.H                  | WMULSU                     |
| PWMULU.B                             | PWMULU.H                   | WMULU                      |
| (*1)                                 | PWMACC.H                   | WMACC                      |
|                                      | PWMACCSU.H                 | WMACCSU                    |
|                                      | PWMACCU.H                  | WMACCU                     |

(\*1) For byte  $\times$  byte multiplication, the only multiply-add instructions provided are those with four-element-wide horizontal addition (4ADD).

| RV32 only, register-pair destination               |                                          |  |
|----------------------------------------------------|------------------------------------------|--|
| $\mid B \times B \rightarrow H \rightarrow W \mid$ | $H \times H \rightarrow W \rightarrow D$ |  |
| (*1)                                               | PM2WADD.H                                |  |
|                                                    | PM2WADDA.H                               |  |
|                                                    | PM2WADDSU.H                              |  |
|                                                    | PM2WADDASU.H                             |  |
|                                                    | PM2WADDU.H                               |  |
|                                                    | PM2WADDAU.H                              |  |
|                                                    | PM2WADD.HX                               |  |
|                                                    | PM2WADDA.HX                              |  |
|                                                    | PM2WSUB.H                                |  |
|                                                    | PM2WSUBA.H                               |  |
|                                                    | PM2WSUB.HX                               |  |
|                                                    | PM2WSUBA.HX                              |  |

(\*1) For byte  $\times$  byte multiplication, the only multiply-add instructions provided are those with four-element-wide horizontal addition (4ADD).