SSSE3 OPCODE Mode [for AMD OPEMU]

Meowthra · February 1, 2015

I'm just finishing Intel manual focus for opemu

CHAPTER 1

Bit and Byte Order

==================

In illustrations of data structures in memory, smaller addresses appear toward the bottom of the figure; addresses increase toward the top. Bit positions are numbered from right to left. The numerical value of a set bit is equal to two raised to the power of the bit position. Intel 64 and IA-32 processors are “little endian” machines; this means the bytes of a word are numbered starting from the least significant byte. See Figure 1-1.

Figure 1-1. Bit and Byte Order

                      Data Structure

Highest Address   31-24 23-06 15-08 07-00  Bit offset                                           
                  -----------------------
                 |     |     |     |     | 28
                  -----------------------
                 |     |     |     |     | 24
                  -----------------------
                 |     |     |     |     | 20
                  -----------------------
                 |     |     |     |     | 16
                  -----------------------
                 |     |     |     |     | 12
                  -----------------------
                 |     |     |     |     | 8
                  ----------------------- 
                 |     |     |     |     | 4
                  ----------------------- 
                 |byte3|byte2|byte1|byte0| 0  Lowest Address
                  -----------------------

                Figure 1-1. Bit and Byte Order

Reserved Bits and Software Compatibility

In many register and memory layout descriptions, certain bits are marked as reserved. When bits are marked as reserved, it is essential for compatibility with future processors that software treat these bits as having a future, though unknown, effect. The behavior of reserved bits should be regarded as not only undefined, but unpredictable. Software should follow these guidelines in dealing with reserved bits:

• Do not depend on the states of any reserved bits when testing the values of registers which contain such bits. Mask out the reserved bits before testing.

• Do not depend on the states of any reserved bits when storing to memory or to a register.

• Do not depend on the ability to retain information written into any reserved bits.

• When loading a register, always load the reserved bits with the values indicated in the documentation, if any, or reload them with values previously read from the same register.

NOTE

Avoid any software dependence upon the state of reserved bits in IA-32 registers. Depending upon the values of reserved register bits will make software dependent upon the unspecified manner in which the processor handles these bits. Programs that depend upon reserved values risk incompat- ibility with future processors.

Instruction Operands

====================

When instructions are represented symbolically, a subset of the IA-32 assembly language is used. In this subset, an instruction has the following format:

label: mnemonic argument1, argument2, argument3

where:

• A label is an identifier which is followed by a colon.

• A mnemonic is a reserved name for a class of instruction opcodes which have the same function.

• The operands argument1, argument2, and argument3 are optional. There may be from zero to three operands, depending on the opcode. When present, they take the form of either literals or identifiers for data items. Operand identifiers are either reserved names of registers or are assumed to be assigned to data items declared in another part of the program (which may not be shown in the example).

When two operands are present in an arithmetic or logical instruction, the right operand is the source and the left operand is the destination.

For example:

LOADREG: MOV EAX, SUBTOTAL

In this example, LOADREG is a label, MOV is the mnemonic identifier of an opcode, EAX is the destination operand, and SUBTOTAL is the source operand. Some assembly languages put the source and destination in reverse order.

Hexadecimal and Binary Numbers

==============================

Base 16 (hexadecimal) numbers are represented by a string of hexadecimal digits followed by the character H (for example, F82EH). A hexadecimal digit is a character from the following set: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F.

Base 2 (binary) numbers are represented by a string of 1s and 0s, sometimes followed by the character B (for example, 1010B). The “B” designation is only used in situations where confu- sion as to the type of number might arise.

Segmented Addressing

====================

The processor uses byte addressing. This means memory is organized and accessed as a sequence of bytes. Whether one or more bytes are being accessed, a byte address is used to locate the byte or bytes in memory. The range of memory that can be addressed is called an address space.

The processor also supports segmented addressing. This is a form of addressing where a program may have many independent address spaces, called segments. For example, a program can keep its code (instructions) and stack in separate segments. Code addresses would always refer to the code space, and stack addresses would always refer to the stack space. The following notation is used to specify a byte address within a segment:

Segment-register:Byte-address

For example, the following segment address identifies the byte at address FF79H in the segment pointed by the DS register:

DS:FF79H

The following segment address identifies an instruction address in the code segment. The CS register points to the code segment and the EIP register contains the address of the instruction.

CS:EIP

Meowthra · February 12, 2015

CHAPTER 2

INSTRUCTION FORMAT

==================

The Intel 64 and IA-32 architectures instruction encodings are subsets of the format shown in Figure 2-1. Instruc- tions consist of optional instruction prefixes (in any order), primary opcode bytes (up to three bytes), an addressing-form specifier (if required) consisting of the ModR/M byte and sometimes the SIB (Scale-Index-Base) byte, a displacement (if required), and an immediate data field (if required).

Figure 2-1. IA-32 Instruction Format:

 ------------- ------------- ------------- ------------- ------------- -------------
|Instruction  |Opcode       |ModR/M       |SIB          |Displacement |Immediate    |
|Prefixes     |             |             |             |             |             |
 ------------- ------------- ------------- ------------- ------------- -------------

 ModR/M byte:                     SIB byte:
  7 - 6    5 - 3    2 - 0          7 - 6    5 - 3    2 - 0
 -------- -------- --------       -------- -------- --------
| Mod    | Reg/   | R/M    |     | Scale  | Index  | Base   |
|        | Opcode |        |      -------- -------- --------
 -------- -------- --------

Instruction Prefixes : Up to four prefixes of 1-byte each (optional)

Opcode : 1-, 2-, or 3-byte opcode

ModR/M : 1 byte (if required)

SIB : 1 byte (if required)

Displacement : Address displacement of 1, 2, or 4 bytes or none

Immediate : Immediate data of 1, 2, or 4 bytes or none

for example: 66 0F 38 01 02

66 = Prefixes

0F 38 01 = Opcode

02 = ModR/M

for example: 66 0F 38 01 04 02

66 = Prefixes

0F 38 01 = Opcode

04 = ModR/M

02 = SIB

for example: 66 0F 38 01 05 04 03 02 01

66 = Prefixes

0F 38 01 = Opcode

05 = ModR/M

04 03 02 01 = Displacement

for example: 66 0F 3A 0F 05 04 03 02 01 02

66 = Prefixes

0F 3A 0F = Opcode

05 = ModR/M

04 03 02 01 = Displacement

02 = Immediate

INSTRUCTION PREFIXES

====================

Instruction prefixes are divided into four groups, each with a set of allowable prefix codes. For each instruction, it is only useful to include up to one prefix code from each of the four groups (Groups 1, 2, 3, 4). Groups 1 through 4 may be placed in any order relative to each other.

• Group 1

— Lock and repeat prefixes:

• LOCK prefix is encoded using F0H

• REPNE/REPNZ prefix is encoded using F2H. Repeat-Not-Zero prefix applies only to string and

input/output instructions. (F2H is also used as a mandatory prefix for some instructions)

REP or REPE/REPZ is encoded using F3H. The repeat prefix applies only to string and input/output instructions. F3H is also used as a mandatory prefix for POPCNT, LZCNT and ADOX instructions.

• Group 2

— Segment override prefixes:

• 2EH—CS segment override (use with any branch instruction is reserved)

• 36H—SS segment override prefix (use with any branch instruction is reserved)

• 3EH—DS segment override prefix (use with any branch instruction is reserved)

• 26H—ES segment override prefix (use with any branch instruction is reserved)

• 64H—FS segment override prefix (use with any branch instruction is reserved)

• 65H—GS segment override prefix (use with any branch instruction is reserved)

— Branch hints:

• 2EH—Branch not taken (used only with Jcc instructions)

• 3EH—Branch taken (used only with Jcc instructions)

• Group 3

• Operand-size override prefix is encoded using 66H (66H is also used as a mandatory prefix for some instructions).

• Group 4

• 67H—Address-size override prefix

The LOCK prefix (F0H) forces an operation that ensures exclusive use of shared memory in a multiprocessor envi- ronment. See “LOCK—Assert LOCK# Signal Prefix” in Chapter 3, “Instruction Set Reference, A-M,” for a description of this prefix.

Repeat prefixes (F2H, F3H) cause an instruction to be repeated for each element of a string. Use these prefixes only with string and I/O instructions (MOVS, CMPS, SCAS, LODS, STOS, INS, and OUTS). Use of repeat prefixes and/or undefined opcodes with other Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable behavior.

Some instructions may use F2H,F3H as a mandatory prefix to express distinct functionality. A mandatory prefix generally should be placed after other optional prefixes (exception to this is discussed in Section 2.2.1, “REX Prefixes”)

Branch hint prefixes (2EH, 3EH) allow a program to give a hint to the processor about the most likely code path for a branch. Use these prefixes only with conditional branch instructions (Jcc). Other use of branch hint prefixes and/or other undefined opcodes with Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable behavior.

The operand-size override prefix allows a program to switch between 16- and 32-bit operand sizes. Either size can be the default; use of the prefix selects the non-default size.

Some SSE2/SSE3/SSSE3/SSE4 instructions and instructions using a three-byte sequence of primary opcode bytes may use 66H as a mandatory prefix to express distinct functionality. A mandatory prefix generally should be placed after other optional prefixes (exception to this is discussed in Section 2.2.1, “REX Prefixes”)

Other use of the 66H prefix is reserved; such use may cause unpredictable behavior.

The address-size override prefix (67H) allows programs to switch between 16- and 32-bit addressing. Either size can be the default; the prefix selects the non-default size. Using this prefix and/or other undefined opcodes when operands for the instruction do not reside in memory is reserved; such use may cause unpredictable behavior.

Opcodes

=======

A primary opcode can be 1, 2, or 3 bytes in length. An additional 3-bit opcode field is sometimes encoded in the ModR/M byte. Smaller fields can be defined within the primary opcode. Such fields define the direction of opera- tion, size of displacements, register encoding, condition codes, or sign extension. Encoding fields used by an opcode vary depending on the class of operation.

Two-byte opcode formats for general-purpose and SIMD instructions consist of:

• An escape opcode byte 0FH as the primary opcode and a second opcode byte, or

• A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, and a second opcode byte (same as previous bullet)

For example, CVTDQ2PD consists of the following sequence: F3 0F E6. The first byte is a mandatory prefix (it is not considered as a repeat prefix).

Three-byte opcode formats for general-purpose and SIMD instructions consist of:

• An escape opcode byte 0FH as the primary opcode, plus two additional opcode bytes, or

• A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, plus two additional opcode bytes (same as previous bullet)

For example, PHADDW for XMM registers consists of the following sequence: 66 0F 38 01. The first byte is the mandatory prefix.

Valid opcode expressions are defined in Appendix A and Appendix B.

ModR/M and SIB Bytes

====================

Many instructions that refer to an operand in memory have an addressing-form specifier byte (called the ModR/M byte) following the primary opcode. The ModR/M byte contains three fields of information:

• The mod field combines with the r/m field to form 32 possible values: eight registers and 24 addressing modes.

• The reg/opcode field specifies either a register number or three more bits of opcode information. The purpose

of the reg/opcode field is specified in the primary opcode.

• The r/m field can specify a register as an operand or it can be combined with the mod field to encode an addressing mode. Sometimes, certain combinations of the mod field and the r/m field is used to express opcode information for some instructions.

Certain encodings of the ModR/M byte require a second addressing byte (the SIB byte). The base-plus-index and scale-plus-index forms of 32-bit addressing require the SIB byte. The SIB byte includes the following fields:

• The scale field specifies the scale factor.

• The index field specifies the register number of the index register.

• The base field specifies the register number of the base register. See Section 2.1.5 for the encodings of the ModR/M and SIB bytes.

Displacement and Immediate Bytes

Some addressing forms include a displacement immediately following the ModR/M byte (or the SIB byte if one is present). If a displacement is required; it be 1, 2, or 4 bytes.

If an instruction specifies an immediate operand, the operand always follows any displacement bytes. An imme- diate operand can be 1, 2 or 4 bytes.

Addressing-Mode Encoding & ModR/M and SIB Bytes

================================================

The values and corresponding addressing forms of the ModR/M and SIB bytes are shown in Table 2-1 through Table 2-3: 16-bit addressing forms specified by the ModR/M byte are in Table 2-1 and 32-bit addressing forms are in Table 2-2. Table 2-3 shows 32-bit addressing forms specified by the SIB byte. In cases where the reg/opcode field in the ModR/M byte represents an extended opcode, valid encodings are shown in Appendix B.

In Table 2-1 and Table 2-2, the Effective Address column lists 32 effective addresses that can be assigned to the first operand of an instruction by using the Mod and R/M fields of the ModR/M byte. The first 24 options provide ways of specifying a memory location; the last eight (Mod = 11B) provide ways of specifying general-purpose, MMX technology and XMM registers.

The Mod and R/M columns in Table 2-1 and Table 2-2 give the binary encodings of the Mod and R/M fields required to obtain the effective address listed in the first column. For example: see the row indicated by Mod = 11B, R/M = 000B. The row identifies the general-purpose registers EAX, AX or AL; MMX technology register MM0; or XMM register XMM0. The register used is determined by the opcode byte and the operand-size attribute.

Now look at the seventh row in either table (labeled “REG =”). This row specifies the use of the 3-bit Reg/Opcode field when the field is used to give the location of a second operand. The second operand must be a general- purpose, MMX technology, or XMM register. Rows one through five list the registers that may correspond to the value in the table. Again, the register used is determined by the opcode byte along with the operand-size attribute.

If the instruction does not require a second operand, then the Reg/Opcode field may be used as an opcode exten- sion. This use is represented by the sixth row in the tables (labeled “/digit (Opcode)”). Note that values in row six are represented in decimal form.

The body of Table 2-1 and Table 2-2 (under the label “Value of ModR/M Byte (in Hexadecimal)”) contains a 32 by 8 array that presents all of 256 values of the ModR/M byte (in hexadecimal). Bits 3, 4 and 5 are specified by the column of the table in which a byte resides. The row specifies bits 0, 1 and 2; and bits 6 and 7. The figure below demonstrates interpretation of one table value.

Figure 2-2. Table Interpretation of ModR/M Byte (C8H)

                 Mod   11
                 RM         000
/digit (Opcode); REG =   001
                      ---------
                 C8H   11001000

Table 2-1. 16-Bit Addressing Forms with the ModR/M Byte

r8(/r)                     AL    CL    DL    BL    AH    CH    DH    BH
r16(/r)                    AX    CX    DX    BX    SP    BP    SI    DI
r32(/r)                    EAX   ECX   EDX   EBX   ESP   EBP   ESI   EDI
mm(/r)                     MM0   MM1   MM2   MM3   MM4   MM5   MM6   MM7
xmm(/r)                    XMM0  XMM1  XMM2  XMM3  XMM4  XMM5  XMM6  XMM7
/digit (Opcode)            0     1     2     3     4     5     6     7
REG =                      000   001   010   011   100   101   110   111

   Effective
+---Address---+ +Mod R/M+ +--------ModR/M Values in Hexadecimal--------+

[BX + SI]            000   00    08    10    18    20    28    30    38
[BX + DI]            001   01    09    11    19    21    29    31    39
[BP + SI]            010   02    0A    12    1A    22    2A    32    3A
[BP + DI]            011   03    0B    13    1B    23    2B    33    3B
[SI]             00  100   04    0C    14    1C    24    2C    34    3C
[DI]                 101   05    0D    15    1D    25    2D    35    3D
disp16               110   06    0E    16    1E    26    2E    36    3E
[BX]                 111   07    0F    17    1F    27    2F    37    3F

[BX+SI]+disp8        000   40    48    50    58    60    68    70    78
[BX+DI]+disp8        001   41    49    51    59    61    69    71    79
[BP+SI]+disp8        010   42    4A    52    5A    62    6A    72    7A
[BP+DI]+disp8        011   43    4B    53    5B    63    6B    73    7B
[SI]+disp8       01  100   44    4C    54    5C    64    6C    74    7C
[DI]+disp8           101   45    4D    55    5D    65    6D    75    7D
[BP]+disp8           110   46    4E    56    5E    66    6E    76    7E
[BX]+disp8           111   47    4F    57    5F    67    6F    77    7F

[BX+SI]+disp16       000   80    88    90    98    A0    A8    B0    B8
[BX+DI]+disp16       001   81    89    91    99    A1    A9    B1    B9
[BX+SI]+disp16       010   82    8A    92    9A    A2    AA    B2    BA
[BX+DI]+disp16       011   83    8B    93    9B    A3    AB    B3    BB
[SI]+disp16      10  100   84    8C    94    9C    A4    AC    B4    BC
[DI]+disp16          101   85    8D    95    9D    A5    AD    B5    BD
[BP]+disp16          110   86    8E    96    9E    A6    AE    B6    BE
[BX]+disp16          111   87    8F    97    9F    A7    AF    B7    BF

EAX/AX/AL            000   C0    C8    D0    D8    E0    E8    F0    F8
ECX/CX/CL            001   C1    C9    D1    D9    E1    E9    F1    F9
EDX/DX/DL            010   C2    CA    D2    DA    E2    EA    F2    FA
EBX/BX/BL            011   C3    CB    D3    DB    E3    EB    F3    FB
ESP/SP/AH        11  100   C4    CC    D4    DC    E4    EC    F4    FC
EBP/BP/CH            101   C5    CD    D5    DD    E5    ED    F5    FD
ESI/SI/DH            110   C6    CE    D6    DE    E6    EE    F6    FE
EDI/DI/BH            111   C7    CF    D7    DF    E7    EF    F7    FF

NOTES:

1. The default segment register is SS for the effective addresses containing a BP index, DS for other effective addresses.

2. The disp16 nomenclature denotes a 16-bit displacement that follows the ModR/M byte and that is added to the index.

3. The disp8 nomenclature denotes an 8-bit displacement that follows the ModR/M byte and that is sign-extended and added to the

index.

Table 2-2. 32-Bit Addressing Forms with the ModR/M Byte

r8(/r)                     AL    CL    DL    BL    AH    CH    DH    BH
r16(/r)                    AX    CX    DX    BX    SP    BP    SI    DI
r32(/r)                    EAX   ECX   EDX   EBX   ESP   EBP   ESI   EDI
mm(/r)                     MM0   MM1   MM2   MM3   MM4   MM5   MM6   MM7
xmm(/r)                    XMM0  XMM1  XMM2  XMM3  XMM4  XMM5  XMM6  XMM7
/digit (Opcode)            0     1     2     3     4     5     6     7
REG =                      000   001   010   011   100   101   110   111
   Effective
+---Address---+ +Mod R/M+ +---------ModR/M Values in Hexadecimal-------+

[EAX]                000   00    08    10    18    20    28    30    38
[ECX]                001   01    09    11    19    21    29    31    39
[EDX]                010   02    0A    12    1A    22    2A    32    3A
[EBX]                011   03    0B    13    1B    23    2B    33    3B
[--] [--]        00  100   04    0C    14    1C    24    2C    34    3C
disp32               101   05    0D    15    1D    25    2D    35    3D
[ESI]                110   06    0E    16    1E    26    2E    36    3E
[EDI]                111   07    0F    17    1F    27    2F    37    3F

disp8[EAX]           000   40    48    50    58    60    68    70    78
disp8[ECX]           001   41    49    51    59    61    69    71    79
disp8[EDX]           010   42    4A    52    5A    62    6A    72    7A
disp8[EPX];          011   43    4B    53    5B    63    6B    73    7B
disp8[--] [--]   01  100   44    4C    54    5C    64    6C    74    7C
disp8[ebp]           101   45    4D    55    5D    65    6D    75    7D
disp8[ESI]           110   46    4E    56    5E    66    6E    76    7E
disp8[EDI]           111   47    4F    57    5F    67    6F    77    7F

disp32[EAX]          000   80    88    90    98    A0    A8    B0    B8
disp32[ECX]          001   81    89    91    99    A1    A9    B1    B9
disp32[EDX]          010   82    8A    92    9A    A2    AA    B2    BA
disp32[EBX]          011   83    8B    93    9B    A3    AB    B3    BB
disp32[--] [--]  10  100   84    8C    94    9C    A4    AC    B4    BC
disp32[EBP]          101   85    8D    95    9D    A5    AD    B5    BD
disp32[ESI]          110   86    8E    96    9E    A6    AE    B6    BE
disp32[EDI]          111   87    8F    97    9F    A7    AF    B7    BF

EAX/AX/AL            000   C0    C8    D0    D8    E0    E8    F0    F8
ECX/CX/CL            001   C1    C9    D1    D9    E1    E9    F1    F9
EDX/DX/DL            010   C2    CA    D2    DA    E2    EA    F2    FA
EBX/BX/BL            011   C3    CB    D3    DB    E3    EB    F3    FB
ESP/SP/AH        11  100   C4    CC    D4    DC    E4    EC    F4    FC
EBP/BP/CH            101   C5    CD    D5    DD    E5    ED    F5    FD
ESI/SI/DH            110   C6    CE    D6    DE    E6    EE    F6    FE
EDI/DI/BH            111   C7    CF    D7    DF    E7    EF    F7    FF

NOTES:

1. The [--][--] nomenclature means a SIB follows the ModR/M byte.

2. The disp32 nomenclature denotes a 32-bit displacement that follows the ModR/M byte (or the SIB byte if one is present) and that is added to the index.

3. The disp8 nomenclature denotes an 8-bit displacement that follows the ModR/M byte (or the SIB byte if one is present) and that is sign-extended and added to the index.

Table 2-3 is organized to give 256 possible values of the SIB byte (in hexadecimal). General purpose registers used as a base are indicated across the top of the table, along with corresponding values for the SIB byte’s base field. Table rows in the body of the table indicate the register used as the index (SIB byte bits 3, 4 and 5) and the scaling factor (determined by SIB byte bits 6 and 7).

Table 2-3. 32-Bit Addressing Forms with the SIB Byte

   r32                      EAX   ECX   EDX   EBX   ESP   [*]   ESI   EDI
   Base =                   0     1     2     3     4     5     6     7
   Base =                   000   001   010   011   100   101   110   111

+Scaled Index+ +SS Index+  +--------ModR/M Values in Hexadecimal--------+

[EAX]                000    00    01    02    03    04    05    06    07
[ECX]                001    08    09    0A    0B    0C    0D    0E    0F
[EDX]                010    10    11    12    13    14    15    16    17
[EBX]                011    18    19    1A    1B    1C    1D    1E    1F
none            00   100    20    21    22    23    24    25    26    27
[EBP]                101    28    29    2A    2B    2C    2D    2E    2F
[ESI]                110    30    31    32    33    34    35    36    37
[EDI]                111    38    39    3A    3B    3C    3D    3E    3F

[EAX*2]              000    40    41    42    43    44    45    46    47
[ECX*2]              001    48    49    4A    4B    4C    4D    4E    4F
[ECX*2]              010    50    51    52    53    54    55    56    57
[EBX*2]              011    58    59    5A    5B    5C    5D    5E    5F
none            01   100    60    61    62    63    64    65    66    67
[EBP*2]              101    68    69    6A    6B    6C    6D    6E    6F
[ESI*2]              110    70    71    72    73    74    75    76    77
[EDI*2]              111    78    79    7A    7B    7C    7D    7E    7F

[EAX*4]              000    80    81    82    83    84    85    86    87
[ECX*4]              001    88    89    8A    8B    8C    8D    8E    8F
[EDX*4]              010    90    91    92    93    94    95    96    97
[EBX*4]              011    98    89    9A    9B    9C    9D    9E    9F
none            10   100    A0    A1    A2    A3    A4    A5    A6    A7
[EBP*4]              101    A8    A9    AA    AB    AC    AD    AE    AF
[ESI*4]              110    B0    B1    B2    B3    B4    B5    B6    B7
[EDI*4]              111    B8    B9    BA    BB    BC    BD    BE    BF

[EAX*8]              000    C0    C1    C2    C3    C4    C5    C6    C7
[ECX*8]              001    C8    C9    CA    CB    CC    CD    CE    CF
[EDX*8]              010    D0    D1    D2    D3    D4    D5    D6    D7
[EBX*8]              011    D8    D9    DA    DB    DC    DD    DE    DF
none            11   100    E0    E1    E2    E3    E4    E5    E6    E7
[EBP*8]              101    E8    E9    EA    EB    EC    ED    EE    EF
[ESI*8]              110    F0    F1    F2    F3    F4    F5    F6    F7
[EDI*8]              111    F8    F9    FA    FB    FC    FD    FE    FF

NOTES:

1. The [*] nomenclature means a disp32 with no base if the MOD is 00B. Otherwise, [*] means disp8 or disp32 + [EBP]. This provides the following address modes:

MOD bits Effective Address

00 [scaled index] + disp32

01 [scaled index] + disp8 + [EBP]

10 [scaled index] + disp32 + [EBP]

IA-32E MODE

===========

IA-32e mode has two sub-modes. These are:

• Compatibility Mode. Enables a 64-bit operating system to run most legacy protected mode software unmodified.

• 64-Bit Mode. Enables a 64-bit operating system to run applications written to access 64-bit address space.

REX Prefixes

============

REX prefixes are instruction-prefix bytes used in 64-bit mode. They do the following:

• Specify GPRs and SSE registers.

• Specify 64-bit operand size.

• Specify extended control registers.

Not all instructions require a REX prefix in 64-bit mode. A prefix is necessary only if an instruction references one of the extended registers or uses a 64-bit operand. If a REX prefix is used when it has no meaning, it is ignored.

Only one REX prefix is allowed per instruction. If used, the REX prefix byte must immediately precede the opcode byte or the escape opcode byte (0FH). When a REX prefix is used in conjunction with an instruction containing a mandatory prefix, the mandatory prefix must come before the REX so the REX prefix can be immediately preceding the opcode or the escape byte. For example, CVTDQ2PD with a REX prefix should have REX placed between F3 and 0F E6. Other placements are ignored. The instruction-size limit of 15 bytes still applies to instructions with a REX prefix. See Figure 2-3.

Figure 2-3. Prefix Ordering in 64-bit Mode

 ---------- -------- -------- -------- ----- -------------- -----------
| Legacy   | REX    | Opcode | ModR/M | SIB | Displacement | Immediate |
| Prefixes | Prefix |        |        |     |              |           |
 ---------- -------- -------- -------- ----- -------------- -----------

Legacy Prefixes: Grp 1, Grp 2, Grp 3, Grp 4 (optional)
REX Prefix:      (optional)
Opcode:          1-, 2-, or 3-byte opcode
ModR/M:          1 byte (if required)
SIB:             1 byte (if required)
Displacement:    Address displacement of 1,2,or4bytes
Immediate:       Immediate data of1,2,or4 bytes or none

for example: 66 43 0F 3A 0F 04 05 01 02 03 04 05

66 = Legacy Prefixes

43 = REX Prefix

0F 3A 0F = Opcode

04 = ModR/M

05 = SIB

01 02 03 04 = Displacement

05 = Immediate

Encoding

========

Intel 64 and IA-32 instruction formats specify up to three registers by using 3-bit fields in the encoding, depending on the format:

• ModR/M: the reg and r/m fields of the ModR/M byte

• ModR/M with SIB: the reg field of the ModR/M byte, the base and index fields of the SIB (scale, index, base) byte

• Instructions without ModR/M: the reg field of the opcode

In 64-bit mode, these formats do not change. Bits needed to define fields in the 64-bit context are provided by the addition of REX prefixes.

More on REX Prefix Fields

=========================

REX prefixes are a set of 16 opcodes that span one row of the opcode map and occupy entries 40H to 4FH. These opcodes represent valid instructions (INC or DEC) in IA-32 operating modes and in compatibility mode. In 64-bit mode, the same opcodes represent the instruction prefix REX and are not treated as individual instructions.

The single-byte-opcode form of INC/DEC instruction not available in 64-bit mode. INC/DEC functionality is still available using ModR/M forms of the same instructions (opcodes FF/0 and FF/1).

See Table 2-4 for a summary of the REX prefix format. Figure 2-4 though Figure 2-7 show examples of REX prefix fields in use. Some combinations of REX prefix fields are invalid. In such cases, the prefix is ignored. Some addi- tional information follows:

• Setting REX.W can be used to determine the operand size but does not solely determine operand width. Like the 66H size prefix, 64-bit operand size override has no effect on byte-specific operations.

• For non-byte operations: if a 66H prefix is used with prefix (REX.W = 1), 66H is ignored.

• If a 66H override is used with REX and REX.W = 0, the operand size is 16 bits.

• REX.R modifies the ModR/M reg field when that field encodes a GPR, SSE, control or debug register. REX.R is ignored when ModR/M specifies other registers or defines an extended opcode.

• REX.X bit modifies the SIB index field.

• REX.B either modifies the base in the ModR/M r/m field or SIB base field; or it modifies the opcode reg field

used for accessing GPRs.

Table 2-4. REX Prefix Fields [bITS: 0100WRXB]

Field Name   Bit   Position Definition
             7-4   0100
W            3     0 = Operand size determined by CS.D
                   1 = 64 Bit Operand Size
R            2     Extension of the ModR/M reg field
X            1     Extension of the SIB index field
B            0     Extension of the ModR/M r/m field, SIB base field, or Opcode reg field

In the IA-32 architecture, byte registers (AH, AL, BH, BL, CH, CL, DH, and DL) are encoded in the ModR/M byte’s reg field, the r/m field or the opcode reg field as registers 0 through 7. REX prefixes provide an additional addressing capability for byte-registers that makes the least-significant byte of GPRs available for byte operations.

Certain combinations of the fields of the ModR/M byte and the SIB byte have special meaning for register encod- ings. For some combinations, fields expanded by the REX prefix are not decoded. Table 2-5 describes how each case behaves.

Special Cases of REX Encodings

ModR/M Byte

mod ≠ 11 & r/m = b*100(ESP) SIB byte present. SIB byte required for ESP-based addressing. REX prefix adds a fourth bit which is not decoded (don't care). SIB byte also required for R12-based addressing.

mod = 0 & r/m = b*101(EBP) Base register not used. EBP without a displacement must be done using

mod = 01 with displacement of 0. REX prefix adds a fourth bit which is not decoded (don't care).

Using RBP or R13 without displacement must be done using mod = 01 with a displacement of 0.

SIB Byte

index = 0100(ESP) Index register not used. ESP cannot be used as an index register. REX prefix adds a fourth bit which is decoded.

There are no additional implications. The expanded index field allows distinguishing RSP from R12, therefore R12 can be used as an index.

base = 0101(EBP) Base register is unused if mod = 0. Base register depends on mod encoding. REX prefix adds a fourth bit which is not decoded. This requires explicit displacement to be used with EBP/RBP or R13.

Displacement

============

Addressing in 64-bit mode uses existing 32-bit ModR/M and SIB encodings. The ModR/M and SIB displacement sizes do not change. They remain 8 bits or 32 bits and are sign-extended to 64 bits.

Direct Memory-Offset MOVs

In 64-bit mode, direct memory-offset forms of the MOV instruction are extended to specify a 64-bit immediate absolute address. This address is called a moffset. No prefix is needed to specify this 64-bit memory offset. For these MOV instructions, the size of the memory offset follows the address-size default (64 bits in 64-bit mode). See Table 2-6.

Table 2-6. Direct Memory Offset Form of MOV

Opcode   Instruction
A0       MOV AL, moffset
A1       MOV EAX, moffset
A2       MOV moffset, AL
A3       MOV moffset, EAX

Immediates

==========

In 64-bit mode, the typical size of immediate operands remains 32 bits. When the operand size is 64 bits, the processor sign-extends all immediates to 64 bits prior to their use.

Support for 64-bit immediate operands is accomplished by expanding the semantics of the existing move (MOV reg, imm16/32) instructions. These instructions (opcodes B8H – BFH) move 16-bits or 32-bits of immediate data (depending on the effective operand size) into a GPR. When the effective operand size is 64 bits, these instructions can be used to load an immediate into a GPR. A REX prefix is needed to override the 32-bit default operand size to a 64-bit operand size.

For example:

48 B8 8877665544332211 MOV RAX,1122334455667788H

RIP-Relative Addressing

=======================

A new addressing form, RIP-relative (relative instruction-pointer) addressing, is implemented in 64-bit mode. An effective address is formed by adding displacement to the 64-bit RIP of the next instruction.

In IA-32 architecture and compatibility mode, addressing relative to the instruction pointer is available only with control-transfer instructions. In 64-bit mode, instructions that use ModR/M addressing can use RIP-relative addressing. Without RIP-relative addressing, all ModR/M instruction modes address memory relative to zero.

RIP-relative addressing allows specific ModR/M modes to address memory relative to the 64-bit RIP using a signed 32-bit displacement. This provides an offset range of ±2GB from the RIP. Table 2-7 shows the ModR/M and SIB encodings for RIP-relative addressing. Redundant forms of 32-bit displacement-addressing exist in the current ModR/M and SIB encodings. There is one ModR/M encoding and there are several SIB encodings. RIP-relative addressing is encoded using a redundant form.

In 64-bit mode, the ModR/M Disp32 (32-bit displacement) encoding is re-defined to be RIP+Disp32 rather than displacement-only.

ModR/M Byte

mod = 00 & r/m = 101 disp32 RIP + Disp32 Must use SIB form with normal (zero-based) displacement addressing

SIB Byte

base = 101 if mod = 00, Disp32 RIP + Disp32

The ModR/M encoding for RIP-relative addressing does not depend on using prefix. Specifically, the r/m bit field encoding of 101B (used to select RIP-relative addressing) is not affected by the REX prefix. For example, selecting R13 (REX.B = 1, r/m = 101B) with mod = 00B still results in RIP-relative addressing. The 4-bit r/m field of REX.B combined with ModR/M is not fully decoded. In order to address R13 with no displacement, software must encode R13 + 0 using a 1-byte displacement of zero.

RIP-relative addressing is enabled by 64-bit mode, not by a 64-bit address-size. The use of the address-size prefix does not disable RIP-relative addressing. The effect of the address-size prefix is to truncate and zero-extend the computed effective address to 32 bits.

Default 64-Bit Operand Size

===========================

In 64-bit mode, two groups of instructions have a default operand size of 64 bits (do not need a REX prefix for this operand size). These are:

• Near branches

• All instructions, except far branches, that implicitly reference the RSP

Meowthra · February 12, 2015

INTEL® ADVANCED VECTOR EXTENSIONS (INTEL® AVX)

==============================================

Intel AVX instructions are encoded using an encoding scheme that combines prefix bytes, opcode extension field, operand encoding fields, and vector length encoding capability into a new prefix, referred to as VEX. In the VEX encoding scheme, the VEX prefix may be two or three bytes long, depending on the instruction semantics. Despite the two-byte or three-byte length of the VEX prefix, the VEX encoding format provides a more compact represen- tation/packing of the components of encoding an instruction in Intel 64 architecture. The VEX encoding scheme also allows more headroom for future growth of Intel 64 architecture.

Instruction Format

==================

Instruction encoding using VEX prefix provides several advantages:

• Instruction syntax support for three operands and up-to four operands when necessary. For example, the third source register used by VBLENDVPD is encoded using bits 7:4 of the immediate byte.

• Encoding support for vector length of 128 bits (using XMM registers) and 256 bits (using YMM registers)

• Encoding support for instruction syntax of non-destructive source operands.

• Elimination of escape opcode byte (0FH), SIMD prefix byte (66H, F2H, F3H) via a compact bit field represen- tation within the VEX prefix.

• Elimination of the need to use REX prefix to encode the extended half of general-purpose register sets (R8- R15) for direct register access, memory addressing, or accessing XMM8-XMM15 (including YMM8-YMM15).

• Flexible and more compact bit fields are provided in the VEX prefix to retain the full functionality provided by REX prefix. REX.W, REX.X, REX.B functionalities are provided in the three-byte VEX prefix only because only a subset of SIMD instructions need them.

• Extensibility for future instruction extensions without significant instruction length increase.

Figure 2-8 shows the Intel 64 instruction encoding format with VEX prefix support. Legacy instruction without a VEX prefix is fully supported and unchanged. The use of VEX prefix in an Intel 64 instruction is optional, but a VEX prefix is required for Intel 64 instructions that operate on YMM registers or support three and four operand syntax. VEX prefix is not a constant-valued, “single-purpose” byte like 0FH, 66H, F2H, F3H in legacy SSE instructions. VEX prefix provides substantially richer capability than the REX prefix.

Figure 2-8. Instruction Encoding Format with VEX Prefix

 ---------- ------------ -------- -------- -------- ---------- ----------
| Prefixes | VEX Prefix | OPCODE | ModR/M | SIB    | DISP     | IMM      |
 ---------- ------------ -------- -------- -------- ---------- ----------
| 1-4 byte |  2-3 byte  | 1 byte | 1 byte | 1 byte | 0-4 byte | 0-2 byte |
 ---------- ------------ -------- -------- -------- ---------- ----------

VEX and the LOCK prefix

Any VEX-encoded instruction with a LOCK prefix preceding VEX will #UD.

VEX and the 66H, F2H, and F3H prefixes

Any VEX-encoded instruction with a 66H, F2H, or F3H prefix preceding VEX will #UD.

VEX and the REX prefix

Any VEX-encoded instruction with a REX prefix proceeding VEX will #UD.

The VEX Prefix

==============

The VEX prefix is encoded in either the two-byte form (the first byte must be C5H) or in the three-byte form (the first byte must be C4H). The two-byte VEX is used mainly for 128-bit, scalar, and the most common 256-bit AVX instructions; while the three-byte VEX provides a compact replacement of REX and 3-byte opcode instructions (including AVX and FMA instructions). Beyond the first byte of the VEX prefix, it consists of a number of bit fields providing specific capability, they are shown in Figure 2-9.

The bit fields of the VEX prefix can be summarized by its functional purposes:

• Non-destructive source register encoding (applicable to three and four operand syntax): This is the first source operand in the instruction syntax. It is represented by the notation, VEX.vvvv. This field is encoded using 1’s complement form (inverted form), i.e. XMM0/YMM0/R0 is encoded as 1111B, XMM15/YMM15/R15 is encoded as 0000B.

• Vector length encoding: This 1-bit field represented by the notation VEX.L. L= 0 means vector length is 128 bits wide, L=1 means 256 bit vector. The value of this field is written as VEX.128 or VEX.256 in this document to distinguish encoded values of other VEX bit fields.

• REX prefix functionality: Full REX prefix functionality is provided in the three-byte form of VEX prefix. However the VEX bit fields providing REX functionality are encoded using 1’s complement form, i.e. XMM0/YMM0/R0 is encoded as 1111B, XMM15/YMM15/R15 is encoded as 0000B.

— Two-byte form of the VEX prefix only provides the equivalent functionality of REX.R, using 1’s complement encoding. This is represented as VEX.R.

— Three-byte form of the VEX prefix provides REX.R, REX.X, REX.B functionality using 1’s complement encoding and three dedicated bit fields represented as VEX.R, VEX.X, VEX.B.

— Three-byte form of the VEX prefix provides the functionality of REX.W only to specific instructions that need to override default 32-bit operand size for a general purpose register to 64-bit size in 64-bit mode. For those applicable instructions, VEX.W field provides the same functionality as REX.W. VEX.W field can provide completely different functionality for other instructions.

Consequently, the use of REX prefix with VEX encoded instructions is not allowed. However, the intent of the REX prefix for expanding register set is reserved for future instruction set extensions using VEX prefix encoding format.

• Compaction of SIMD prefix: Legacy SSE instructions effectively use SIMD prefixes (66H, F2H, F3H) as an opcode extension field. VEX prefix encoding allows the functional capability of such legacy SSE instructions (operating on XMM registers, bits 255:128 of corresponding YMM unmodified) to be encoded using the VEX.pp field without the presence of any SIMD prefix. The VEX-encoded 128-bit instruction will zero-out bits 255:128 of the destination register. VEX-encoded instruction may have 128 bit vector length or 256 bits length.

• Compaction of two-byte and three-byte opcode: More recently introduced legacy SSE instructions employ two and three-byte opcode. The one or two leading bytes are: 0FH, and 0FH 3AH/0FH 38H. The one-byte escape (0FH) and two-byte escape (0FH 3AH, 0FH 38H) can also be interpreted as an opcode extension field. The VEX.mmmmm field provides compaction to allow many legacy instruction to be encoded without the constant byte sequence, 0FH, 0FH 3AH, 0FH 38H. These VEX-encoded instruction may have 128 bit vector length or 256 bits length.

The VEX prefix is required to be the last prefix and immediately precedes the opcode bytes. It must follow any other prefixes. If VEX prefix is present a REX prefix is not supported.

The 3-byte VEX leaves room for future expansion with 3 reserved bits. REX and the 66h/F2h/F3h prefixes are reclaimed for future use.

VEX prefix has a two-byte form and a three byte form. If an instruction syntax can be encoded using the two-byte form, it can also be encoded using the three byte form of VEX. The latter increases the length of the instruction by one byte. This may be helpful in some situations for code alignment.

The VEX prefix supports 256-bit versions of floating-point SSE, SSE2, SSE3, and SSE4 instructions. Note, certain new instruction functionality can only be encoded with the VEX prefix.

The VEX prefix will #UD on any instruction containing MMX register sources or destinations.

Figure 2-9. VEX bit fields

              Byte 0      Byte 1                Byte 2

BitPosition 7        0    7 6 5   4  3 2 1 0     7   6  3   2   1 0
            ----------   ------- ------------   --- ------ --- -----
3-byte VEX | 11000100 | | R X B | m- m m m m | | W | vvvv | L | p p |
            ----------   ------- ------------   --- ------ --- -----

            7        0    7   6  3   2   1 0
            ----------   --- ------ --- -----
2-byte VEX | 11000101 | | R | vvvv | L | p p |
            ----------   --- ------ --- -----

R: REX.R in 1’s complement (inverted) form

1: Same as REX.R=0 (must be 1 in 32-bit mode) 0: Same as REX.R=1 (64-bit mode only)

X: REX.X in 1’s complement (inverted) form

1: Same as REX.X=0 (must be 1 in 32-bit mode) 0: Same as REX.X=1 (64-bit mode only)

B: REX.B in 1’s complement (inverted) form

1: Same as REX.B=0 (Ignored in 32-bit mode). 0: Same as REX.B=1 (64-bit mode only)

W: opcode specific (use like REX.W, or used for opcode extension, or ignored, depending on the opcode byte)

m-mmmm:

00000: Reserved for future use (will #UD) 00001: implied 0F leading opcode byte

00010: implied 0F 38 leading opcode bytes 00011: implied 0F 3A leading opcode bytes 00100-11111: Reserved for future use (will #UD)

vvvv: a register specifier (in 1’s complement form) or 1111 if unused.

L: Vector Length

0: scalar or 128-bit vector

1: 256-bit vector

pp: opcode extension providing equivalent functionality of a SIMD prefix

00: None

01: 66

10: F3

11: F2

The following subsections describe the various fields in two or three-byte VEX prefix:

VEX Byte 0, bits[7:0]

=====================

VEX Byte 0, bits [7:0] must contain the value 11000101b (C5h) or 11000100b (C4h). The 3-byte VEX uses the C4h first byte, while the 2-byte VEX uses the C5h first byte.

VEX Byte 1, bit [7] - ‘R’

=========================

VEX Byte 1, bit [7] contains a bit analogous to a bit inverted REX.R. In protected and compatibility modes the bit must be set to ‘1’ otherwise the instruction is LES or LDS.

This bit is present in both 2- and 3-byte VEX prefixes.

The usage of WRXB bits for legacy instructions is explained in detail section 2.2.1.2 of Intel 64 and IA-32 Architec- tures Software developer’s manual, Volume 2A.

This bit is stored in bit inverted format.

3-byte VEX byte 1, bit[6] - ‘X’

===============================

Bit[6] of the 3-byte VEX byte 1 encodes a bit analogous to a bit inverted REX.X. It is an extension of the SIB Index field in 64-bit modes. In 32-bit modes, this bit must be set to ‘1’ otherwise the instruction is LES or LDS.

This bit is available only in the 3-byte VEX prefix.

This bit is stored in bit inverted format.

3-byte VEX byte 1, bit[5] - ‘B’

===============================

Bit[5] of the 3-byte VEX byte 1 encodes a bit analogous to a bit inverted REX.B. In 64-bit modes, it is an extension of the ModR/M r/m field, or the SIB base field. In 32-bit modes, this bit is ignored.

This bit is available only in the 3-byte VEX prefix.

This bit is stored in bit inverted format.

3-byte VEX byte 2, bit[7] - ‘W’

===============================

Bit[7] of the 3-byte VEX byte 2 is represented by the notation VEX.W. It can provide following functions, depending on the specific opcode.

• For AVX instructions that have equivalent legacy SSE instructions (typically these SSE instructions have a general-purpose register operand with its operand size attribute promotable by REX.W), if REX.W promotes the operand size attribute of the general-purpose register operand in legacy SSE instruction, VEX.W has same meaning in the corresponding AVX equivalent form. In 32-bit modes, VEX.W is silently ignored.

• For AVX instructions that have equivalent legacy SSE instructions (typically these SSE instructions have oper- ands with their operand size attribute fixed and not promotable by REX.W), if REX.W is don’t care in legacy SSE instruction, VEX.W is ignored in the corresponding AVX equivalent form irrespective of mode.

• For new AVX instructions where VEX.W has no defined function (typically these meant the combination of the opcode byte and VEX.mmmmm did not have any equivalent SSE functions), VEX.W is reserved as zero and setting to other than zero will cause instruction to #UD.

2-byte VEX Byte 1, bits[6:3] & 3-byte VEX Byte 2,

bits [6:3]- ‘vvvv’ the Source or dest Register Specifier

========================================================

In 32-bit mode the VEX first byte C4 and C5 alias onto the LES and LDS instructions. To maintain compatibility with existing programs the VEX 2nd byte, bits [7:6] must be 11b. To achieve this, the VEX payload bits are selected to place only inverted, 64-bit valid fields (extended register selectors) in these upper bits.

The 2-byte VEX Byte 1, bits [6:3] and the 3-byte VEX, Byte 2, bits [6:3] encode a field (shorthand VEX.vvvv) that for instructions with 2 or more source registers and an XMM or YMM or memory destination encodes the first source register specifier stored in inverted (1’s complement) form.

VEX.vvvv is not used by the instructions with one source (except certain shifts, see below) or on instructions with no XMM or YMM or memory destination. If an instruction does not use VEX.vvvv then it should be set to 1111b otherwise instruction will #UD.

In 64-bit mode all 4 bits may be used. See Table 2-8 for the encoding of the XMM or YMM registers. In 32-bit and 16-bit modes bit 6 must be 1 (if bit 6 is not 1, the 2-byte VEX version will generate LDS instruction and the 3-byte VEX version will ignore this bit).

Table 2-8. VEX.vvvv to register name mapping

VEX.vvvv   Dest Register   Valid in Legacy/Compatibility 32-bit modes?
1111B      XMM0/YMM0       Valid
1110B      XMM1/YMM1       Valid
1101B      XMM2/YMM2       Valid
1100B      XMM3/YMM3       Valid
1011B      XMM4/YMM4       Valid
1010B      XMM5/YMM5       Valid
1001B      XMM6/YMM6       Valid
1000B      XMM7/YMM7       Valid
0111B      XMM8/YMM8       Invalid
0110B      XMM9/YMM9       Invalid
0101B      XMM10/YMM10     Invalid
0100B      XMM11/YMM11     Invalid
0011B      XMM12/YMM12     Invalid
0010B      XMM13/YMM13     Invalid
0001B      XMM14/YMM14     Invalid
0000B      XMM15/YMM15     Invalid

The VEX.vvvv field is encoded in bit inverted format for accessing a register operand.

Instruction Operand Encoding and VEX.vvvv, ModR/M

=================================================

VEX-encoded instructions support three-operand and four-operand instruction syntax. Some VEX-encoded instruc- tions have syntax with less than three operands, e.g. VEX-encoded pack shift instructions support one source operand and one destination operand).

The roles of VEX.vvvv, reg field of ModR/M byte (ModR/M.reg), r/m field of ModR/M byte (ModR/M.r/m) with respect to encoding destination and source operands vary with different type of instruction syntax.

The role of VEX.vvvv can be summarized to three situations:

• VEX.vvvv encodes the first source register operand, specified in inverted (1’s complement) form and is valid for instructions with 2 or more source operands.

• VEX.vvvv encodes the destination register operand, specified in 1’s complement form for certain vector shifts. The instructions where VEX.vvvv is used as a destination are listed in Table 2-9. The notation in the “Opcode” column in Table 2-9 is described in detail in section 3.1.1.

• VEX.vvvv does not encode any operand, the field is reserved and should contain 1111b.

Table 2-9. Instructions with a VEX.vvvv destination

Opcode                       Instruction mnemonic
VEX.NDD.128.66.0F 73 /7 ib   VPSLLDQ xmm1, xmm2, imm8
VEX.NDD.128.66.0F 73 /3 ib   VPSRLDQ xmm1, xmm2, imm8
VEX.NDD.128.66.0F 71 /2 ib   VPSRLW xmm1, xmm2, imm8
VEX.NDD.128.66.0F 72 /2 ib   VPSRLD xmm1, xmm2, imm8
VEX.NDD.128.66.0F 73 /2 ib   VPSRLQ xmm1, xmm2, imm8
VEX.NDD.128.66.0F 71 /4 ib   VPSRAW xmm1, xmm2, imm8
VEX.NDD.128.66.0F 72 /4 ib   VPSRAD xmm1, xmm2, imm8
VEX.NDD.128.66.0F 71 /6 ib   VPSLLW xmm1, xmm2, imm8
VEX.NDD.128.66.0F 72 /6 ib   VPSLLD xmm1, xmm2, imm8
VEX.NDD.128.66.0F 73 /6 ib   VPSLLQ xmm1, xmm2, imm8

The role of ModR/M.r/m field can be summarized to two situations:

• ModR/M.r/m encodes the instruction operand that references a memory address.

• For some instructions that do not support memory addressing semantics, ModR/M.r/m encodes either the destination register operand or a source register operand.

The role of ModR/M.reg field can be summarized to two situations:

• ModR/M.reg encodes either the destination register operand or a source register operand.

• For some instructions, ModR/M.reg is treated as an opcode extension and not used to encode any instruction operand.

For instruction syntax that support four operands, VEX.vvvv, ModR/M.r/m, ModR/M.reg encodes three of the four operands. The role of bits 7:4 of the immediate byte serves the following situation:

• Imm8[7:4] encodes the third source register operand.

3-byte VEX byte 1, bits[4:0] - “m-mmmm”

=======================================

Bits[4:0] of the 3-byte VEX byte 1 encode an implied leading opcode byte (0F, 0F 38, or 0F 3A). Several bits are reserved for future use and will #UD unless 0.

Table 2-10. VEX.m-mmmm interpretation

VEX.m-mmmm     Implied Leading Opcode Bytes
00000B         Reserved
00001B         0F
00010B         0F 38
00011B         0F 3A
00100-11111B   Reserved
(2-byte VEX)   0F

VEX.m-mmmm is only available on the 3-byte VEX. The 2-byte VEX implies a leading 0Fh opcode byte.

2-byte VEX byte 1, bit[2], & 3-byte VEX byte 2, bit [2]- “L”

============================================================

The vector length field, VEX.L, is encoded in bit[2] of either the second byte of 2-byte VEX, or the third byte of 3- byte VEX. If “VEX.L = 1”, it indicates 256-bit vector operation. “VEX.L = 0” indicates scalar and 128-bit vector oper- ations.

The instruction VZEROUPPER is a special case that is encoded with VEX.L = 0, although its operation zero’s bits 255:128 of all YMM registers accessible in the current operating mode.

See the following table.

Table 2-11. VEX.L interpretation

VEX.L    Vector Length
0        128-bit (or 32/64-bit scalar)
1        256-bit

2-byte VEX byte 1, bits[1:0], & 3-byte VEX byte 2, bits [1:0]- “pp”

===================================================================

Up to one implied prefix is encoded by bits[1:0] of either the 2-byte VEX byte 1 or the 3-byte VEX byte 2. The prefix behaves as if it was encoded prior to VEX, but after all other encoded prefixes.

See the following table.

Table 2-12. VEX.pp interpretation

pp     Implies this prefix after other prefixes but before VEX
00B    None
01B    66
10B    F3
11B    F2

The Opcode Byte

===============

One (and only one) opcode byte follows the 2 or 3 byte VEX. Legal opcodes are specified in Appendix B, in color. Any instruction that uses illegal opcode will #UD.

The MODRM, SIB, & Displacement Bytes

====================================

The encodings are unchanged but the interpretation of reg_field or rm_field differs (see above).

The Third Source Operand (Immediate Byte)

=========================================

VEX-encoded instructions can support instruction with a four operand syntax. VBLENDVPD, VBLENDVPS, and PBLENDVB use imm8[7:4] to encode one of the source registers.

AVX Instructions & the Upper 128-bits of YMM registers

======================================================

If an instruction with a destination XMM register is encoded with a VEX prefix, the processor zeroes the upper bits (above bit 128) of the equivalent YMM register . Legacy SSE instructions without VEX preserve the upper bits.

Vector Length Transition and Programming Considerations

=======================================================

An instruction encoded with a VEX.128 prefix that loads a YMM register operand operates as follows:

• Data is loaded into bits 127:0 of the register

• Bits above bit 127 in the register are cleared.

Thus, such an instruction clears bits 255:128 of a destination YMM register on processors with a maximum vector- register width of 256 bits. In the event that future processors extend the vector registers to greater widths, an instruction encoded with a VEX.128 or VEX.256 prefix will also clear any bits beyond bit 255. (This is in contrast with legacy SSE instructions, which have no VEX prefix; these modify only bits 127:0 of any destination register operand.)

Programmers should bear in mind that instructions encoded with VEX.128 and VEX.256 prefixes will clear any future extensions to the vector registers. A calling function that uses such extensions should save their state before calling legacy functions. This is not possible for involuntary calls (e.g., into an interrupt-service routine). It is recommended that software handling involuntary calls accommodate this by not executing instructions encoded with VEX.128 and VEX.256 prefixes. In the event that it is not possible or desirable to restrict these instructions, then software must take special care to avoid actions that would, on future processors, zero the upper bits of vector registers.

Processors that support further vector-register extensions (defining bits beyond bit 255) will also extend the XSAVE and XRSTOR instructions to save and restore these extensions. To ensure forward compatibility, software that handles involuntary calls and that uses instructions encoded with VEX.128 and VEX.256 prefixes should first save and then restore the vector registers (with any extensions) using the XSAVE and XRSTOR instructions with save/restore masks that set bits that correspond to all vector-register extensions. Ideally, software should rely on a mechanism that is cognizant of which bits to set. (E.g., an OS mechanism that sets the save/restore mask bits for all vector-register extensions that are enabled in XCR0.) Saving and restoring state with instructions other than XSAVE and XRSTOR will, on future processors with wider vector registers, corrupt the extended state of the vector registers - even if doing so functions correctly on processors supporting 256-bit vector registers. (The same is true if XSAVE and XRSTOR are used with a save/restore mask that does not set bits corresponding to all supported extensions to the vector registers.)

AVX Instruction Length

======================

The AVX instructions described in this document (including VEX and ignoring other prefixes) do not exceed 11 bytes in length, but may increase in the future. The maximum length of an Intel 64 and IA-32 instruction remains 15 bytes.

Vector SIB (VSIB) Memory Addressing

===================================

In Intel Advanced Vector Extensions 2 (Intel® AVX2), an SIB byte that follows the ModR/M byte can support VSIB memory addressing to an array of linear addresses. VSIB addressing is only supported in a subset of Intel AVX2 instructions. VSIB memory addressing requires 32-bit or 64-bit effective address. In 32-bit mode, VSIB addressing is not supported when address size attribute is overridden to 16 bits. In 16-bit protected mode, VSIB memory addressing is permitted if address size attribute is overridden to 32 bits. Additionally, VSIB memory addressing is supported only with VEX prefix.

In VSIB memory addressing, the SIB byte consists of:

• The scale field (bit 7:6) specifies the scale factor.

• The index field (bits 5:3) specifies the register number of the vector index register, each element in the vector register specifies an index.

• The base field (bits 2:0) specifies the register number of the base register.

Table 2-3 shows the 32-bit VSIB addressing form. It is organized to give 256 possible values of the SIB byte (in hexadecimal). General purpose registers used as a base are indicated across the top of the table, along with corre- sponding values for the SIB byte’s base field. The register names also include R8L-R15L applicable only in 64-bit mode (when address size override prefix is used, but the value of VEX.B is not shown in Table 2-3). In 32-bit mode, R8L-R15L does not apply.

Table rows in the body of the table indicate the vector index register used as the index field and each supported scaling factor shown separately. Vector registers used in the index field can be XMM or YMM registers. The left-most column includes vector registers VR8-VR15 (i.e. XMM8/YMM8-XMM15/YMM15), which are only available in 64-bit mode and does not apply if encoding in 32-bit mode.

Table 2-13. 32-Bit VSIB Addressing Forms of the SIB Byte

   r32                      EAX   ECX   EDX   EBX   ESP   EBP   ESI   EDI
                            R8L   R9L   R10L  R11L  R12L  R13L  R14L  R15L
   Base =                   0     1     2     3     4     5     6     7
   Base =                   000   001   010   011   100   101   110   111

+Scaled Index+ +SS Index+  +-----Value of SIB Byte (in Hexadecimal)-----+

VR0/VR8              000    00    01    02    03    04    05    06    07
VR1/VR9              001    08    09    0A    0B    0C    0D    0E    0F
VR2/VR10             010    10    11    12    13    14    15    16    17
VR3/VR11             011    18    19    1A    1B    1C    1D    1E    1F
VR4/VR12   *1   00   100    20    21    22    23    24    25    26    27
VR5/VR13             101    28    29    2A    2B    2C    2D    2E    2F
VR6/VR14             110    30    31    32    33    34    35    36    37
VR7/VR15             111    38    39    3A    3B    3C    3D    3E    3F

VR0/VR8              000    40    41    42    43    44    45    46    47
VR1/VR9              001    48    49    4A    4B    4C    4D    4E    4F
VR2/VR10             010    50    51    52    53    54    55    56    57
VR3/VR11             011    58    59    5A    5B    5C    5D    5E    5F
VR4/VR12   *2   01   100    60    61    62    63    64    65    66    67
VR5/VR13             101    68    69    6A    6B    6C    6D    6E    6F
VR6/VR14             110    70    71    72    73    74    75    76    77
VR7/VR15             111    78    79    7A    7B    7C    7D    7E    7F

VR0/VR8              000    80    81    82    83    84    85    86    87
VR1/VR9              001    88    89    8A    8B    8C    8D    8E    8F
VR2/VR10             010    90    91    92    93    94    95    96    97
VR3/VR11             011    98    89    9A    9B    9C    9D    9E    9F
VR4/VR12   *4   10   100    A0    A1    A2    A3    A4    A5    A6    A7
VR5/VR13             101    A8    A9    AA    AB    AC    AD    AE    AF
VR6/VR14             110    B0    B1    B2    B3    B4    B5    B6    B7
VR7/VR15             111    B8    B9    BA    BB    BC    BD    BE    BF

VR0/VR8              000    C0    C1    C2    C3    C4    C5    C6    C7
VR1/VR9              001    C8    C9    CA    CB    CC    CD    CE    CF
VR2/VR10             010    D0    D1    D2    D3    D4    D5    D6    D7
VR3/VR11             011    D8    D9    DA    DB    DC    DD    DE    DF
VR4/VR12   *8   11   100    E0    E1    E2    E3    E4    E5    E6    E7
VR5/VR13             101    E8    E9    EA    EB    EC    ED    EE    EF
VR6/VR14             110    F0    F1    F2    F3    F4    F5    F6    F7
VR7/VR15             111    F8    F9    FA    FB    FC    FD    FE    FF

NOTES:

1. If ModR/M.mod = 00b, the base address is zero, then effective address is computed as [scaled vector index] + disp32. Otherwise the base address is computed as [EBP/R13]+ disp, the displacement is either 8 bit or 32 bit depending on the value of ModR/M.mod:

MOD Effective Address

00b [scaled Vector Register] + Disp32

01b [scaled Vector Register] + Disp8 + [EBP/R13]

10b [scaled Vector Register] + Disp32 + [EBP/R13]

64-bit Mode VSIB Memory Addressing

==================================

In 64-bit mode VSIB memory addressing uses the VEX.B field and the base field of the SIB byte to encode one of the 16 general-purpose register as the base register. The VEX.X field and the index field of the SIB byte encode one of the 16 vector registers as the vector index register.

In 64-bit mode the top row of Table 2-13 base register should be interpreted as the full 64-bit of each register.

VEX ENCODING SUPPORT FOR GPR INSTRUCTIONS

=========================================

VEX prefix may be used to encode instructions that operate on neither YMM nor XMM registers. VEX-encoded general-purpose-register instructions have the following properties:

• Instruction syntax support for three encodable operands.

• Encoding support for instruction syntax of non-destructive source operand, destination operand encoded via

VEX.vvvv, and destructive three-operand syntax.

• Elimination of escape opcode byte (0FH), two-byte escape via a compact bit field representation within the VEX prefix.

• Elimination of the need to use REX prefix to encode the extended half of general-purpose register sets (R8- R15) for direct register access or memory addressing.

• Flexible and more compact bit fields are provided in the VEX prefix to retain the full functionality provided by REX prefix. REX.W, REX.X, REX.B functionalities are provided in the three-byte VEX prefix only.

• VEX-encoded GPR instructions are encoded with VEX.L=0.

Any VEX-encoded GPR instruction with a 66H, F2H, or F3H prefix preceding VEX will #UD. Any VEX-encoded GPR instruction with a REX prefix proceeding VEX will #UD. VEX-encoded GPR instructions are not supported in real and virtual 8086 modes.

Meowthra · February 12, 2015

CHAPTER 3

Instruction Format

==================

The following is an example of the format used for each instruction description in this chapter. The heading below introduces the example. The table below provides an example summary table.

CMC—Complement Carry Flag [this is an example]

==============================================

Opcode   Instruction   Op/En   64/32-bit   CPUID        Description
                               Mode        Feature Flag
F5       CMC           A       V/V         NP           Complement carry flag.

Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA

Opcode Column in the Instruction Summary Table

(Instructions without VEX prefix)

==============================================

The “Opcode” column in the table above shows the object code produced for each form of the instruction. When possible, codes are given as hexadecimal bytes in the same order in which they appear in memory. Definitions of entries other than hexadecimal bytes are as follows:

• REX.W — Indicates the use of a REX prefix that affects operand size or instruction semantics. The ordering of the REX prefix and other optional/mandatory instruction prefixes are discussed Chapter 2. Note that REX prefixes that promote legacy instructions to 64-bit behavior are not listed explicitly in the opcode column.

• /digit — A digit between 0 and 7 indicates that the ModR/M byte of the instruction uses only the r/m (register or memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode.

• /r — Indicates that the ModR/M byte of the instruction contains a register operand and an r/m operand.

• cb, cw, cd, cp, co, ct — A 1-byte (cb), 2-byte (cw), 4-byte (cd), 6-byte (cp), 8-byte (co) or 10-byte (ct) value following the opcode. This value is used to specify a code offset and possibly a new value for the code segment register.

• ib, iw, id, io — A 1-byte (ib), 2-byte (iw), 4-byte (id) or 8-byte (io) immediate operand to the instruction that follows the opcode, ModR/M bytes or scale-indexing bytes. The opcode determines if the operand is a signed value. All words, doublewords and quadwords are given with the low-order byte first.

• +rb, +rw, +rd, +ro — A register code, from 0 through 7, added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte. See Table 3-1 for the codes. The +ro columns in the table are applicable only in 64-bit mode.

• +i — A number used in floating-point instructions when one of the operands is ST(i) from the FPU register stack. The number i (which can range from 0 to 7) is added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte.

Table 3-1. Register Codes Associated With +rb, +rw, +rd, +ro

 --------------------- --------------------- --------------------- ---------------------
| byte register       | word register       | dword register      | quadword register   |
 --------------------- --------------------- --------------------- ---------------------
|Reg   |REX.B  |Reg   |Reg   |REX.B  |Reg   |Reg   |REX.B  |Reg   |Reg   |REX.B  |Reg   |
|      |       |Field |      |       |Field |      |       |Field |      |       |Field |
 --------------------- --------------------- --------------------- ---------------------
|AL      None    0    |AX      None    0    |EAX     None    0    |RAX     None    0    |
|CL      None    1    |CX      None    1    |ECX     None    1    |RCX     None    1    |
|DL      None    2    |DX      None    2    |EDX     None    2    |RDX     None    2    |
|BL      None    3    |BX      None    3    |EBX     None    3    |RBX     None    3    |
|AH      N.E.    4    |SP      None    4    |ESP     None    4    |N/A     N/A     N/A  |
|CH      N.E.    5    |BP      None    5    |EBP     None    5    |N/A     N/A     N/A  |
|DH      N.E.    6    |SI      None    6    |ESI     None    6    |N/A     N/A     N/A  |
|BH      N.E.    7    |DI      None    7    |EDI     None    7    |N/A     N/A     N/A  |
|SPL     Yes     4    |SP      None    4    |ESP     None    4    |RSP     None    4    |
|BPL     Yes     5    |BP      None    5    |EBP     None    5    |RBP     None    5    |
|SIL     Yes     6    |SI      None    6    |ESI     None    6    |RSI     None    6    |
|DIL     Yes     7    |DI      None    7    |EDI     None    7    |RDI     None    7    |
 ---------------------------------------------------------------------------------------
|            Registers R8 - R15 (see below): Available in 64-Bit Mode Only              |
 ---------------------------------------------------------------------------------------
|R8L     Yes     0    |R8W     Yes     0    |R8D     Yes     0    |R8      Yes     0    |
|R9L     Yes     1    |R9W     Yes     1    |R9D     Yes     1    |R9      Yes     1    |
|R10L    Yes     2    |R10W    Yes     2    |R10D    Yes     2    |R10     Yes     2    |
|R11L    Yes     3    |R11W    Yes     3    |R11D    Yes     3    |R11     Yes     3    |
|R12L    Yes     4    |R12W    Yes     4    |R12D    Yes     4    |R12     Yes     4    |
|R13L    Yes     5    |R13W    Yes     5    |R13D    Yes     5    |R13     Yes     5    |
|R14L    Yes     6    |R14W    Yes     6    |R14D    Yes     6    |R14     Yes     6    |
|R15L    Yes     7    |R15W    Yes     7    |R15D    Yes     7    |R15     Yes     7    |
 ---------------------------------------------------------------------------------------

Opcode Column in the Instruction Summary Table

(Instructions with VEX prefix)

==============================================

In the Instruction Summary Table, the Opcode column presents each instruction encoded using the VEX prefix in following form (including the modR/M byte if applicable, the immediate byte if applicable):

VEX.[NDS].[128,256].[66,F2,F3].0F/0F3A/0F38.[W0,W1] opcode [/r] [/ib,/is4]

• VEX: indicates the presence of the VEX prefix is required. The VEX prefix can be encoded using the three-byte form (the first byte is C4H), or using the two-byte form (the first byte is C5H). The two-byte form of VEX only applies to those instructions that do not require the following fields to be encoded: VEX.mmmmm, VEX.W, VEX.X, VEX.B. Refer to Section 2.3 for more detail on the VEX prefix.

The encoding of various sub-fields of the VEX prefix is described using the following notations:

— NDS, NDD, DDS: specifies that VEX.vvvv field is valid for the encoding of a register operand:

• VEX.NDS: VEX.vvvv encodes the first source register in an instruction syntax where the content of source registers will be preserved.

• VEX.NDD: VEX.vvvv encodes the destination register that cannot be encoded by ModR/M:reg field.

• VEX.DDS: VEX.vvvv encodes the second source register in a three-operand instruction syntax where

the content of first source register will be overwritten by the result.

• If none of NDS, NDD, and DDS is present, VEX.vvvv must be 1111b (i.e. VEX.vvvv does not encode an operand). The VEX.vvvv field can be encoded using either the 2-byte or 3-byte form of the VEX prefix.

— 128,256: VEX.L field can be 0 (denoted by VEX.128 or VEX.LZ) or 1 (denoted by VEX.256). The VEX.L field can be encoded using either the 2-byte or 3-byte form of the VEX prefix. The presence of the notation VEX.256 or VEX.128 in the opcode column should be interpreted as follows:

• If VEX.256 is present in the opcode column: The semantics of the instruction must be encoded with VEX.L = 1. An attempt to encode this instruction with VEX.L= 0 can result in one of two situations: [a] if VEX.128 version is defined, the processor will behave according to the defined VEX.128 behavior; an #UD occurs if there is no VEX.128 version defined.

• If VEX.128 is present in the opcode column but there is no VEX.256 version defined for the same opcode byte: Two situations apply: [a] For VEX-encoded, 128-bit SIMD integer instructions, software must encode the instruction with VEX.L = 0. The processor will treat the opcode byte encoded with VEX.L= 1 by causing an #UD exception; For VEX-encoded, 128-bit packed floating-point instruc- tions, software must encode the instruction with VEX.L = 0. The processor will treat the opcode byte encoded with VEX.L= 1 by causing an #UD exception (e.g. VMOVLPS).

• If VEX.LIG is present in the opcode column: The VEX.L value is ignored. This generally applies to VEX- encoded scalar SIMD floating-point instructions. Scalar SIMD floating-point instruction can be distin- guished from the mnemonic of the instruction. Generally, the last two letters of the instruction mnemonic would be either “SS“, “SD“, or “SI“ for SIMD floating-point conversion instructions.

• If VEX.LZ is present in the opcode column: The VEX.L must be encoded to be 0B, an #UD occurs if VEX.L is not zero.

— 66,F2,F3: The presence or absence of these values map to the VEX.pp field encodings. If absent, this corresponds to VEX.pp=00B. If present, the corresponding VEX.pp value affects the “opcode” byte in the same way as if a SIMD prefix (66H, F2H or F3H) does to the ensuing opcode byte. Thus a non-zero encoding of VEX.pp may be considered as an implied 66H/F2H/F3H prefix. The VEX.pp field may be encoded using either the 2-byte or 3-byte form of the VEX prefix.

— 0F,0F3A,0F38: The presence maps to a valid encoding of the VEX.mmmmm field. Only three encoded values of VEX.mmmmm are defined as valid, corresponding to the escape byte sequence of 0FH, 0F3AH and 0F38H. The effect of a valid VEX.mmmmm encoding on the ensuing opcode byte is same as if the corre- sponding escape byte sequence on the ensuing opcode byte for non-VEX encoded instructions. Thus a valid encoding of VEX.mmmmm may be consider as an implies escape byte sequence of either 0FH, 0F3AH or 0F38H. The VEX.mmmmm field must be encoded using the 3-byte form of VEX prefix.

— 0F,0F3A,0F38 and 2-byte/3-byte VEX. The presence of 0F3A and 0F38 in the opcode column implies that opcode can only be encoded by the three-byte form of VEX. The presence of 0F in the opcode column does not preclude the opcode to be encoded by the two-byte of VEX if the semantics of the opcode does not require any subfield of VEX not present in the two-byte form of the VEX prefix.

— W0: VEX.W=0.

— W1: VEX.W=1.

— The presence of W0/W1 in the opcode column applies to two situations: [a] it is treated as an extended opcode bit, the instruction semantics support an operand size promotion to 64-bit of a general-purpose register operand or a 32-bit memory operand. The presence of W1 in the opcode column implies the opcode must be encoded using the 3-byte form of the VEX prefix. The presence of W0 in the opcode column does not preclude the opcode to be encoded using the C5H form of the VEX prefix, if the semantics of the opcode does not require other VEX subfields not present in the two-byte form of the VEX prefix. Please see Section 2.3 on the subfield definitions within VEX.

— WIG: can use C5H form (if not requiring VEX.mmmmm) or VEX.W value is ignored in the C4H form of VEX prefix.

— If WIG is present, the instruction may be encoded using either the two-byte form or the three-byte form of VEX. When encoding the instruction using the three-byte form of VEX, the value of VEX.W is ignored.

• opcode: Instruction opcode.

• /is4: An 8-bit immediate byte is present containing a source register specifier in imm[7:4] and instruction-

specific payload in imm[3:0].

• In general, the encoding o f VEX.R, VEX.X, VEX.B field are not shown explicitly in the opcode column. The encoding scheme of VEX.R, VEX.X, VEX.B fields must follow the rules defined in Section 2.3.

Instruction Column in the Opcode Summary Table

==============================================

The “Instruction” column gives the syntax of the instruction statement as it would appear in an ASM386 program. The following is a list of the symbols used to represent operands in the instruction statements:

• rel8 — A relative address in the range from 128 bytes before the end of the instruction to 127 bytes after the end of the instruction.

• rel16, rel32 — A relative address within the same code segment as the instruction assembled. The rel16 symbol applies to instructions with an operand-size attribute of 16 bits; the rel32 symbol applies to instructions with an operand-size attribute of 32 bits.

• ptr16:16, ptr16:32 — A far pointer, typically to a code segment different from that of the instruction. The notation 16:16 indicates that the value of the pointer has two parts. The value to the left of the colon is a 16- bit selector or value destined for the code segment register. The value to the right corresponds to the offset within the destination segment. The ptr16:16 symbol is used when the instruction's operand-size attribute is 16 bits; the ptr16:32 symbol is used when the operand-size attribute is 32 bits.

• r8 — One of the byte general-purpose registers: AL, CL, DL, BL, AH, CH, DH, BH, BPL, SPL, DIL and SIL; or one of the byte registers (R8L - R15L) available when using REX.R and 64-bit mode.

• r16 — One of the word general-purpose registers: AX, CX, DX, BX, SP, BP, SI, DI; or one of the word registers (R8-R15) available when using REX.R and 64-bit mode.

• r32 — One of the doubleword general-purpose registers: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI; or one of the doubleword registers (R8D - R15D) available when using REX.R in 64-bit mode.

• r64 — One of the quadword general-purpose registers: RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8–R15. These are available when using REX.R and 64-bit mode.

• imm8 — An immediate byte value. The imm8 symbol is a signed number between –128 and +127 inclusive. For instructions in which imm8 is combined with a word or doubleword operand, the immediate value is sign- extended to form a word or doubleword. The upper byte of the word is filled with the topmost bit of the immediate value.

• imm16 — An immediate word value used for instructions whose operand-size attribute is 16 bits. This is a number between –32,768 and +32,767 inclusive.

• imm32 — An immediate doubleword value used for instructions whose operand-size attribute is 32 bits. It allows the use of a number between +2,147,483,647 and –2,147,483,648 inclusive.

• imm64 — An immediate quadword value used for instructions whose operand-size attribute is 64 bits. The value allows the use of a number between +9,223,372,036,854,775,807 and – 9,223,372,036,854,775,808 inclusive.

• r/m8 — A byte operand that is either the contents of a byte general-purpose register (AL, CL, DL, BL, AH, CH, DH, BH, BPL, SPL, DIL and SIL) or a byte from memory. Byte registers R8L - R15L are available using REX.R in 64-bit mode.

• r/m16 — A word general-purpose register or memory operand used for instructions whose operand-size attribute is 16 bits. The word general-purpose registers are: AX, CX, DX, BX, SP, BP, SI, DI. The contents of memory are found at the address provided by the effective address computation. Word registers R8W - R15W are available using REX.R in 64-bit mode.

• r/m32 — A doubleword general-purpose register or memory operand used for instructions whose operand- size attribute is 32 bits. The doubleword general-purpose registers are: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI. The contents of memory are found at the address provided by the effective address computation. Doubleword registers R8D - R15D are available when using REX.R in 64-bit mode.

• r/m64 — A quadword general-purpose register or memory operand used for instructions whose operand-size attribute is 64 bits when using REX.W. Quadword general-purpose registers are: RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8–R15; these are available only in 64-bit mode. The contents of memory are found at the address provided by the effective address computation.

• m — A 16-, 32- or 64-bit operand in memory.

• m8 — A byte operand in memory, usually expressed as a variable or array name, but pointed to by the

DS:(E)SI or ES:(E)DI registers. In 64-bit mode, it is pointed to by the RSI or RDI registers.

• m16 — A word operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions.

• m32 — A doubleword operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions.

• m64 — A memory quadword operand in memory.

• m128 — A memory double quadword operand in memory.

• m16:16, m16:32 & m16:64 — A memory operand containing a far pointer composed of two numbers. The number to the left of the colon corresponds to the pointer's segment selector. The number to the right corresponds to its offset.

• m16&32, m16&16, m32&32, m16&64 — A memory operand consisting of data item pairs whose sizes are indicated on the left and the right side of the ampersand. All memory addressing modes are allowed. The m16&16 and m32&32 operands are used by the BOUND instruction to provide an operand containing an upper and lower bounds for array indices. The m16&32 operand is used by LIDT and LGDT to provide a word with which to load the limit field, and a doubleword with which to load the base field of the corresponding GDTR and IDTR registers. The m16&64 operand is used by LIDT and LGDT in 64-bit mode to provide a word with which to load the limit field, and a quadword with which to load the base field of the corresponding GDTR and IDTR registers.

• moffs8, moffs16, moffs32, moffs64 — A simple memory variable (memory offset) of type byte, word, or doubleword used by some variants of the MOV instruction. The actual address is given by a simple offset relative to the segment base. No ModR/M byte is used in the instruction. The number shown with moffs indicates its size, which is determined by the address-size attribute of the instruction.

• Sreg — A segment register. The segment register bit assignments are ES = 0, CS = 1, SS = 2, DS = 3, FS = 4, and GS = 5.

• m32fp, m64fp, m80fp — A single-precision, double-precision, and double extended-precision (respectively) floating-point operand in memory. These symbols designate floating-point values that are used as operands for x87 FPU floating-point instructions.

• m16int, m32int, m64int — A word, doubleword, and quadword integer (respectively) operand in memory. These symbols designate integers that are used as operands for x87 FPU integer instructions.

• ST or ST(0) — The top element of the FPU register stack.

• ST(i) — The ith element from the top of the FPU register stack (i ← 0 through 7).

• mm — An MMX register. The 64-bit MMX registers are: MM0 through MM7.

• mm/m32 — The low order 32 bits of an MMX register or a 32-bit memory operand. The 64-bit MMX registers are: MM0 through MM7. The contents of memory are found at the address provided by the effective address computation.

• mm/m64 — An MMX register or a 64-bit memory operand. The 64-bit MMX registers are: MM0 through MM7. The contents of memory are found at the address provided by the effective address computation.

• xmm — An XMM register. The 128-bit XMM registers are: XMM0 through XMM7; XMM8 through XMM15 are available using REX.R in 64-bit mode.

• xmm/m32— An XMM register or a 32-bit memory operand. The 128-bit XMM registers are XMM0 through XMM7; XMM8 through XMM15 are available using REX.R in 64-bit mode. The contents of memory are found at the address provided by the effective address computation.

• xmm/m64 — An XMM register or a 64-bit memory operand. The 128-bit SIMD floating-point registers are XMM0 through XMM7; XMM8 through XMM15 are available using REX.R in 64-bit mode. The contents of memory are found at the address provided by the effective address computation.

• xmm/m128 — An XMM register or a 128-bit memory operand. The 128-bit XMM registers are XMM0 through XMM7; XMM8 through XMM15 are available using REX.R in 64-bit mode. The contents of memory are found at the address provided by the effective address computation.

• <XMM0>— indicates implied use of the XMM0 register.

When there is ambiguity, xmm1 indicates the first source operand using an XMM register and xmm2 the second source operand using an XMM register.

Some instructions use the XMM0 register as the third source operand, indicated by <XMM0>. The use of the third XMM register operand is implicit in the instruction encoding and does not affect the ModR/M encoding.

• ymm — a YMM register. The 256-bit YMM registers are: YMM0 through YMM7; YMM8 through YMM15 are available in 64-bit mode.

• m256 — A 32-byte operand in memory. This nomenclature is used only with AVX instructions.

• ymm/m256 — a YMM register or 256-bit memory operand.

• <YMM0>— indicates use of the YMM0 register as an implicit argument.

• SRC1 — Denotes the first source operand in the instruction syntax of an instruction encoded with the VEX prefix and having two or more source operands.

• SRC2 — Denotes the second source operand in the instruction syntax of an instruction encoded with the VEX prefix and having two or more source operands.

• SRC3 — Denotes the third source operand in the instruction syntax of an instruction encoded with the VEX prefix and having three source operands.

• SRC — The source in a AVX single-source instruction or the source in a Legacy SSE instruction.

• DST — the destination in a AVX instruction. In Legacy SSE instructions can be either the destination, first source, or both. This field is encoded by reg_field.

Operand Encoding Column in the Instruction Summary Table

========================================================

The “operand encoding” column is abbreviated as Op/En in the Instruction Summary table heading. Instruction operand encoding information is provided for each assembly instruction syntax using a letter to cross reference to a row entry in the operand encoding definition table that follows the instruction summary table. The operand encoding table in each instruction reference page lists each instruction operand (according to each instruction syntax and operand ordering shown in the instruction column) relative to the ModRM byte, VEX.vvvv field or addi- tional operand encoding placement.

NOTES

The letters in the Op/En column of an instruction apply ONLY to the encoding definition table immediately following the instruction summary table.

In the encoding definition table, the letter ‘r’ within a pair of parenthesis denotes the content of the operand will be read by the processor. The letter ‘w’ within a pair of parenthesis denotes the content of the operand will be updated by the processor.

64/32-bit Mode Column in the Instruction Summary Table

======================================================

The “64/32-bit Mode” column indicates whether the opcode sequence is supported in [a] 64-bit mode or the Compatibility mode and other IA-32 modes that apply in conjunction with the CPUID feature flag associated specific instruction extensions.

The 64-bit mode support is to the left of the ‘slash’ and has the following notation:

• V — Supported.

• I — Not supported.

• N.E. — Indicates an instruction syntax is not encodable in 64-bit mode (it may represent part of a sequence of valid instructions in other modes).

• N.P. — Indicates the REX prefix does not affect the legacy instruction in 64-bit mode.

• N.I. — Indicates the opcode is treated as a new instruction in 64-bit mode.

• N.S. — Indicates an instruction syntax that requires an address override prefix in 64-bit mode and is not supported. Using an address override prefix in 64-bit mode may result in model-specific execution behavior.

The Compatibility/Legacy Mode support is to the right of the ‘slash’ and has the following notation:

• V — Supported.

• I — Not supported.

• N.E. — Indicates an Intel 64 instruction mnemonics/syntax that is not encodable; the opcode sequence is not applicable as an individual instruction in compatibility mode or IA-32 mode. The opcode may represent a valid sequence of legacy IA-32 instructions.

CPUID Support Column in the Instruction Summary Table

=====================================================

The fourth column holds abbreviated CPUID feature flags (e.g. appropriate bit in CPUID.1.ECX, CPUID.1.EDX for SSE/SSE2/SSE3/SSSE3/SSE4.1/SSE4.2/AESNI/PCLMULQDQ/AVX/RDRAND support) that indicate processor support for the instruction. If the corresponding flag is ‘0’, the instruction will #UD.

Description Column in the Instruction Summary Table

===================================================

The “Description” column briefly explains forms of the instruction.

Description Section

===================

Each instruction is then described by number of information sections. The “Description” section describes the purpose of the instructions and required operands in more detail.

Summary of terms that may be used in the description section:

• Legacy SSE: Refers to SSE, SSE2, SSE3, SSSE3, SSE4, AESNI, PCLMULQDQ and any future instruction sets referencing XMM registers and encoded without a VEX prefix.

• VEX.vvvv. The VEX bitfield specifying a source or destination register (in 1’s complement form).

• rm_field: shorthand for the ModR/M r/m field and any REX.B

• reg_field: shorthand for the ModR/M reg field and any REX.R

Operation Section

=================

The “Operation” section contains an algorithm description (frequently written in pseudo-code) for the instruction. Algorithms are composed of the following elements:

• Comments are enclosed within the symbol pairs “(*” and “*)”.

• Compound statements are enclosed in keywords, such as: IF, THEN, ELSE and FI for an if statement; DO and

OD for a do statement; or CASE... OF for a case statement.

• A register name implies the contents of the register. A register name enclosed in brackets implies the contents of the location whose address is contained in that register. For example, ES:[DI] indicates the contents of the location whose ES segment relative address is in register DI. [sI] indicates the contents of the address contained in register SI relative to the SI register’s default segment (DS) or the overridden segment.

• Parentheses around the “E” in a general-purpose register name, such as (E)SI, indicates that the offset is read from the SI register if the address-size attribute is 16, from the ESI register if the address-size attribute is 32. Parentheses around the “R” in a general-purpose register name, ®SI, in the presence of a 64-bit register definition such as ®SI, indicates that the offset is read from the 64-bit RSI register if the address-size attribute is 64.

• Brackets are used for memory operands where they mean that the contents of the memory location is a segment-relative offset. For example, [sRC] indicates that the content of the source operand is a segment- relative offset.

• A ← B indicates that the value of B is assigned to A.

• The symbols =, ≠, >, <, ≥, and ≤ are relational operators used to compare two values: meaning equal, not equal, greater or equal, less or equal, respectively. A relational expression such as A ← B is TRUE if the value of A is equal to B; otherwise it is FALSE.

• The expression “« COUNT” and “» COUNT” indicates that the destination operand should be shifted left or right by the number of bits indicated by the count operand.

The following identifiers are used in the algorithmic descriptions:

• OperandSize and AddressSize — The OperandSize identifier represents the operand-size attribute of the instruction, which is 16, 32 or 64-bits. The AddressSize identifier represents the address-size attribute, which is 16, 32 or 64-bits. For example, the following pseudo-code indicates that the operand-size attribute depends on the form of the MOV instruction used.

   IF Instruction ← MOVW
      THEN OperandSize = 16;
   ELSE
      IF Instruction ← MOVD
         THEN OperandSize = 32;
      ELSE
         IF Instruction ← MOVQ
            THEN OperandSize = 64;
         FI;
      FI;
   FI;

See “Operand-Size and Address-Size Attributes” in Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for guidelines on how these attributes are determined.

• StackAddrSize — Represents the stack address-size attribute associated with the instruction, which has a value of 16, 32 or 64-bits. See “Address-Size Attribute for Stack” in Chapter 6, “Procedure Calls, Interrupts, and Exceptions,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.

• SRC — Represents the source operand.

• DEST — Represents the destination operand.

• VLMAX — The maximum vector register width pertaining to the instruction. This is not the vector-length encoding in the instruction's prefix but is instead determined by the current value of XCR0. For existing processors, VLMAX is 256 whenever XCR0.YMM[bit 2] is 1. Future processors may defined new bits in XCR0 whose setting may imply other values for VLMAX.

VLMAX Definition

XCR0 Component        VLMAX
XCR0.YMM              256

The following functions are used in the algorithmic descriptions:

• ZeroExtend(value) — Returns a value zero-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, zero extending a byte value of –10 converts the byte from F6H to a doubleword value of 000000F6H. If the value passed to the ZeroExtend function and the operand-size attribute are the same size, ZeroExtend returns the value unaltered.

• SignExtend(value) — Returns a value sign-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, sign extending a byte containing the value –10 converts the byte from F6H to a doubleword value of FFFFFFF6H. If the value passed to the SignExtend function and the operand- size attribute are the same size, SignExtend returns the value unaltered.

• SaturateSignedWordToSignedByte — Converts a signed 16-bit value to a signed 8-bit value. If the signed 16-bit value is less than –128, it is represented by the saturated value -128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).

• SaturateSignedDwordToSignedWord — Converts a signed 32-bit value to a signed 16-bit value. If the signed 32-bit value is less than –32768, it is represented by the saturated value –32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).

• SaturateSignedWordToUnsignedByte — Converts a signed 16-bit value to an unsigned 8-bit value. If the signed 16-bit value is less than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).

• SaturateToSignedByte — Represents the result of an operation as a signed 8-bit value. If the result is less than –128, it is represented by the saturated value –128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).

• SaturateToSignedWord — Represents the result of an operation as a signed 16-bit value. If the result is less than –32768, it is represented by the saturated value –32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).

• SaturateToUnsignedByte — Represents the result of an operation as a signed 8-bit value. If the result is less than zero it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).

• SaturateToUnsignedWord — Represents the result of an operation as a signed 16-bit value. If the result is less than zero it is represented by the saturated value zero (00H); if it is greater than 65535, it is represented by the saturated value 65535 (FFFFH).

• LowOrderWord(DEST * SRC) — Multiplies a word operand by a word operand and stores the least significant word of the doubleword result in the destination operand.

• HighOrderWord(DEST * SRC) — Multiplies a word operand by a word operand and stores the most significant word of the doubleword result in the destination operand.

• Push(value) — Pushes a value onto the stack. The number of bytes pushed is determined by the operand-size attribute of the instruction. See the “Operation” subsection of the “PUSH—Push Word, Doubleword or Quadword Onto the Stack” section in Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B.

• Pop() removes the value from the top of the stack and returns it. The statement EAX ← Pop(); assigns to EAX the 32-bit value from the top of the stack. Pop will return either a word, a doubleword or a quadword depending on the operand-size attribute. See the “Operation” subsection in the “POP—Pop a Value from the Stack” section of Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B.

• PopRegisterStack — Marks the FPU ST(0) register as empty and increments the FPU register stack pointer (TOP) by 1.

• Switch-Tasks — Performs a task switch.

• Bit(BitBase, BitOffset) — Returns the value of a bit within a bit string. The bit string is a sequence of bits in memory or a register. Bits are numbered from low-order to high-order within registers and within memory bytes. If the BitBase is a register, the BitOffset can be in the range 0 to [15, 31, 63] depending on the mode and register size. See Figure 3-1: the function Bit[RAX, 21] is illustrated.

Figure 3-1. Bit Offset for BIT[RAX, 21]

 63                31           21                      0
 ------------------- ---------- - ----------------------
|                   |          | |                      |
 ------------------- ---------- - ----------------------
                                ↑                       |
                                └-----------------------┘

If BitBase is a memory address, the BitOffset can range has different ranges depending on the operand size (see Table 3-2).

Operand Size   Immediate BitOffset   Register BitOffset
16             0 to 15               −2^15 to 2^15 −1
32             0 to 31               −2^31 to 2^31 −1
64             0 to 63               −2^63 to 2^63 −1

The addressed bit is numbered (Offset MOD 8) within the byte at address (BitBase + (BitOffset DIV 8)) where DIV is signed division with rounding towards negative infinity and MOD returns a positive number (see

Figure 3-2).

Figure 3-2. Memory Bit Indexing

 7     5         0 7               0 7               0
 ----- - --------- ---------------   -----------------  
|     | |         |               | |                 |
 ----- - --------- --------------- - -----------------
    BitsBase +      BitsBase     |    BitsBase -
       ↑                         |
       └---- BitOffset ← +13 ---┘

 7               0 7               0 7    5          0
 --------------- - ----------------- ---- - ----------
|               | |                 |    | |          |
 --------------- - ----------------- ---- - ----------
    BitsBase     |  BitsBase -        BitsBase -
                 |                        ↑
                 └------ BitOffset ← ----┘

SSSE3 OPCODE Mode [for AMD OPEMU]

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites