|  | .. contents:: | 
|  | .. sectnum:: | 
|  |  | 
|  | ====================================== | 
|  | BPF Instruction Set Architecture (ISA) | 
|  | ====================================== | 
|  |  | 
|  | eBPF, also commonly | 
|  | referred to as BPF, is a technology with origins in the Linux kernel | 
|  | that can run untrusted programs in a privileged context such as an | 
|  | operating system kernel. This document specifies the BPF instruction | 
|  | set architecture (ISA). | 
|  |  | 
|  | As a historical note, BPF originally stood for Berkeley Packet Filter, | 
|  | but now that it can do so much more than packet filtering, the acronym | 
|  | no longer makes sense. BPF is now considered a standalone term that | 
|  | does not stand for anything.  The original BPF is sometimes referred to | 
|  | as cBPF (classic BPF) to distinguish it from the now widely deployed | 
|  | eBPF (extended BPF). | 
|  |  | 
|  | Documentation conventions | 
|  | ========================= | 
|  |  | 
|  | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | 
|  | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | 
|  | "OPTIONAL" in this document are to be interpreted as described in | 
|  | BCP 14 `<https://www.rfc-editor.org/info/rfc2119>`_ | 
|  | `<https://www.rfc-editor.org/info/rfc8174>`_ | 
|  | when, and only when, they appear in all capitals, as shown here. | 
|  |  | 
|  | For brevity and consistency, this document refers to families | 
|  | of types using a shorthand syntax and refers to several expository, | 
|  | mnemonic functions when describing the semantics of instructions. | 
|  | The range of valid values for those types and the semantics of those | 
|  | functions are defined in the following subsections. | 
|  |  | 
|  | Types | 
|  | ----- | 
|  | This document refers to integer types with the notation `SN` to specify | 
|  | a type's signedness (`S`) and bit width (`N`), respectively. | 
|  |  | 
|  | .. table:: Meaning of signedness notation | 
|  |  | 
|  | ==== ========= | 
|  | S    Meaning | 
|  | ==== ========= | 
|  | u    unsigned | 
|  | s    signed | 
|  | ==== ========= | 
|  |  | 
|  | .. table:: Meaning of bit-width notation | 
|  |  | 
|  | ===== ========= | 
|  | N     Bit width | 
|  | ===== ========= | 
|  | 8     8 bits | 
|  | 16    16 bits | 
|  | 32    32 bits | 
|  | 64    64 bits | 
|  | 128   128 bits | 
|  | ===== ========= | 
|  |  | 
|  | For example, `u32` is a type whose valid values are all the 32-bit unsigned | 
|  | numbers and `s16` is a type whose valid values are all the 16-bit signed | 
|  | numbers. | 
|  |  | 
|  | Functions | 
|  | --------- | 
|  |  | 
|  | The following byteswap functions are direction-agnostic.  That is, | 
|  | the same function is used for conversion in either direction discussed | 
|  | below. | 
|  |  | 
|  | * be16: Takes an unsigned 16-bit number and converts it between | 
|  | host byte order and big-endian | 
|  | (`IEN137 <https://www.rfc-editor.org/ien/ien137.txt>`_) byte order. | 
|  | * be32: Takes an unsigned 32-bit number and converts it between | 
|  | host byte order and big-endian byte order. | 
|  | * be64: Takes an unsigned 64-bit number and converts it between | 
|  | host byte order and big-endian byte order. | 
|  | * bswap16: Takes an unsigned 16-bit number in either big- or little-endian | 
|  | format and returns the equivalent number with the same bit width but | 
|  | opposite endianness. | 
|  | * bswap32: Takes an unsigned 32-bit number in either big- or little-endian | 
|  | format and returns the equivalent number with the same bit width but | 
|  | opposite endianness. | 
|  | * bswap64: Takes an unsigned 64-bit number in either big- or little-endian | 
|  | format and returns the equivalent number with the same bit width but | 
|  | opposite endianness. | 
|  | * le16: Takes an unsigned 16-bit number and converts it between | 
|  | host byte order and little-endian byte order. | 
|  | * le32: Takes an unsigned 32-bit number and converts it between | 
|  | host byte order and little-endian byte order. | 
|  | * le64: Takes an unsigned 64-bit number and converts it between | 
|  | host byte order and little-endian byte order. | 
|  |  | 
|  | Definitions | 
|  | ----------- | 
|  |  | 
|  | .. glossary:: | 
|  |  | 
|  | Sign Extend | 
|  | To *sign extend* an ``X``-bit number, ``A``, to a ``Y``-bit number, ``B``, means to: | 
|  |  | 
|  | #. Copy all ``X`` bits from ``A`` to the lower ``X`` bits of ``B``. | 
|  | #. Set the value of the remaining ``Y`` - ``X`` bits of ``B`` to the value of | 
|  | the most-significant bit of ``A``. | 
|  |  | 
|  | .. admonition:: Example | 
|  |  | 
|  | Sign extend an 8-bit number ``A`` to a 16-bit number ``B`` on a big-endian platform: | 
|  | :: | 
|  |  | 
|  | A:          10000110 | 
|  | B: 11111111 10000110 | 
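
The two steps of the sign extension definition can be sketched in Python as follows. This is an illustrative sketch only; the function name ``sign_extend`` and its use of plain integers as bit patterns are assumptions made for the example and are not part of this specification.

```python
def sign_extend(a: int, x: int, y: int) -> int:
    """Sign extend the X-bit pattern `a` to a Y-bit pattern (hypothetical helper)."""
    assert 0 < x <= y
    a &= (1 << x) - 1                    # step 1: copy the lower X bits of A
    if a >> (x - 1):                     # most-significant bit of A is set
        a |= ((1 << (y - x)) - 1) << x   # step 2: set the remaining Y - X bits
    return a


# The 8-bit to 16-bit example from the text.
print(format(sign_extend(0b10000110, 8, 16), "016b"))  # 1111111110000110
```

When the most-significant bit of ``A`` is clear, the upper bits stay zero, so the result equals the input.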
|  |  | 
|  | Conformance groups | 
|  | ------------------ | 
|  |  | 
|  | An implementation does not need to support all instructions specified in this | 
|  | document (e.g., deprecated instructions).  Instead, a number of conformance | 
|  | groups are specified.  An implementation MUST support the base32 conformance | 
|  | group and MAY support additional conformance groups, where supporting a | 
|  | conformance group means it MUST support all instructions in that conformance | 
|  | group. | 
|  |  | 
|  | The use of named conformance groups enables interoperability between a runtime | 
|  | that executes instructions, and tools such as compilers that generate | 
|  | instructions for the runtime.  Thus, capability discovery in terms of | 
|  | conformance groups might be done manually by users or automatically by tools. | 
|  |  | 
|  | Each conformance group has a short ASCII label (e.g., "base32") that | 
|  | corresponds to a set of instructions that are mandatory.  That is, each | 
|  | instruction has one or more conformance groups of which it is a member. | 
|  |  | 
|  | This document defines the following conformance groups: | 
|  |  | 
|  | * base32: includes all instructions defined in this | 
|  | specification unless otherwise noted. | 
|  | * base64: includes base32, plus instructions explicitly noted | 
|  | as being in the base64 conformance group. | 
|  | * atomic32: includes 32-bit atomic operation instructions (see `Atomic operations`_). | 
|  | * atomic64: includes atomic32, plus 64-bit atomic operation instructions. | 
|  | * divmul32: includes 32-bit division, multiplication, and modulo instructions. | 
|  | * divmul64: includes divmul32, plus 64-bit division, multiplication, | 
|  | and modulo instructions. | 
|  | * packet: deprecated packet access instructions. | 
|  |  | 
|  | Instruction encoding | 
|  | ==================== | 
|  |  | 
|  | BPF has two instruction encodings: | 
|  |  | 
|  | * the basic instruction encoding, which uses 64 bits to encode an instruction | 
|  | * the wide instruction encoding, which appends a second 64 bits | 
|  | after the basic instruction for a total of 128 bits. | 
|  |  | 
|  | Basic instruction encoding | 
|  | -------------------------- | 
|  |  | 
|  | A basic instruction is encoded as follows:: | 
|  |  | 
|  | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 
|  | |    opcode     |     regs      |            offset             | | 
|  | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 
|  | |                              imm                              | | 
|  | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 
|  |  | 
|  | **opcode** | 
|  | operation to perform, encoded as follows:: | 
|  |  | 
|  | +-+-+-+-+-+-+-+-+ | 
|  | |specific |class| | 
|  | +-+-+-+-+-+-+-+-+ | 
|  |  | 
|  | **specific** | 
|  | The format of these bits varies by instruction class | 
|  |  | 
|  | **class** | 
|  | The instruction class (see `Instruction classes`_) | 
|  |  | 
|  | **regs** | 
|  | The source and destination register numbers, encoded as follows | 
|  | on a little-endian host:: | 
|  |  | 
|  | +-+-+-+-+-+-+-+-+ | 
|  | |src_reg|dst_reg| | 
|  | +-+-+-+-+-+-+-+-+ | 
|  |  | 
|  | and as follows on a big-endian host:: | 
|  |  | 
|  | +-+-+-+-+-+-+-+-+ | 
|  | |dst_reg|src_reg| | 
|  | +-+-+-+-+-+-+-+-+ | 
|  |  | 
|  | **src_reg** | 
|  | the source register number (0-10), except where otherwise specified | 
|  | (`64-bit immediate instructions`_ reuse this field for other purposes) | 
|  |  | 
|  | **dst_reg** | 
|  | destination register number (0-10), unless otherwise specified | 
|  | (future instructions might reuse this field for other purposes) | 
|  |  | 
|  | **offset** | 
|  | signed integer offset used with pointer arithmetic, except where | 
|  | otherwise specified (some arithmetic instructions reuse this field | 
|  | for other purposes) | 
|  |  | 
|  | **imm** | 
|  | signed integer immediate value | 
|  |  | 
|  | Note that the contents of multi-byte fields ('offset' and 'imm') are | 
|  | stored using big-endian byte ordering on big-endian hosts and | 
|  | little-endian byte ordering on little-endian hosts. | 
|  |  | 
|  | For example:: | 
|  |  | 
|  | opcode                  offset imm          assembly | 
|  | src_reg dst_reg | 
|  | 07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little | 
|  | dst_reg src_reg | 
|  | 07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big | 
|  |  | 
|  | Note that most instructions do not use all of the fields. | 
|  | Unused fields SHALL be cleared to zero. | 
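
As a rough illustration of the layout above, the following Python sketch packs a basic instruction for either host byte order. The function name ``encode_basic`` and its parameters are hypothetical; only the field order, the nibble placement within 'regs', and the byte ordering of 'offset' and 'imm' are taken from the text.

```python
import struct


def encode_basic(opcode: int, dst_reg: int, src_reg: int, offset: int,
                 imm: int, big_endian: bool = False) -> bytes:
    """Pack one 64-bit basic instruction (hypothetical helper)."""
    if big_endian:
        regs = (dst_reg << 4) | src_reg   # |dst_reg|src_reg| on big-endian hosts
        fmt = ">BBhi"                     # offset and imm stored big-endian
    else:
        regs = (src_reg << 4) | dst_reg   # |src_reg|dst_reg| on little-endian hosts
        fmt = "<BBhi"                     # offset and imm stored little-endian
    # B = opcode, B = regs, h = signed 16-bit offset, i = signed 32-bit imm
    return struct.pack(fmt, opcode, regs, offset, imm)


# r1 += 0x11223344, as in the example above.
print(encode_basic(0x07, 1, 0, 0, 0x11223344).hex(" "))                   # 07 01 00 00 44 33 22 11
print(encode_basic(0x07, 1, 0, 0, 0x11223344, big_endian=True).hex(" "))  # 07 10 00 00 11 22 33 44
```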
|  |  | 
|  | Wide instruction encoding | 
|  | -------------------------- | 
|  |  | 
|  | Some instructions are defined to use the wide instruction encoding, | 
|  | which uses two 32-bit immediate values.  The 64 bits following | 
|  | the basic instruction format contain a pseudo instruction | 
|  | with 'opcode', 'dst_reg', 'src_reg', and 'offset' all set to zero. | 
|  |  | 
|  | This is depicted in the following figure:: | 
|  |  | 
|  | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 
|  | |    opcode     |     regs      |            offset             | | 
|  | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 
|  | |                              imm                              | | 
|  | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 
|  | |                           reserved                            | | 
|  | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 
|  | |                           next_imm                            | | 
|  | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 
|  |  | 
|  | **opcode** | 
|  | operation to perform, encoded as explained above | 
|  |  | 
|  | **regs** | 
|  | The source and destination register numbers (unless otherwise | 
|  | specified), encoded as explained above | 
|  |  | 
|  | **offset** | 
|  | signed integer offset used with pointer arithmetic, unless | 
|  | otherwise specified | 
|  |  | 
|  | **imm** | 
|  | signed integer immediate value | 
|  |  | 
|  | **reserved** | 
|  | unused, set to zero | 
|  |  | 
|  | **next_imm** | 
|  | second signed integer immediate value | 
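
The sketch below shows one way a 64-bit constant could be split across 'imm' and 'next_imm' for a little-endian host, using the ``{IMM, DW, LD}`` instruction with 'src_reg' 0 as the example. The helper name ``encode_wide_le`` is hypothetical, and treating both immediates as raw 32-bit patterns (low half in 'imm', high half in 'next_imm') is an assumption of this sketch.

```python
import struct

LD_IMM_DW = 0x18  # {IMM, DW, LD}: mode 0 << 5 | size 3 << 3 | class 0


def encode_wide_le(dst_reg: int, value: int) -> bytes:
    """Pack a 128-bit wide instruction loading `value` (hypothetical helper)."""
    imm = value & 0xFFFFFFFF               # low 32 bits
    next_imm = (value >> 32) & 0xFFFFFFFF  # high 32 bits
    regs = dst_reg                         # src_reg = 0 in the upper nibble
    # First slot: opcode, regs, offset, imm. Second slot: 32 reserved zero
    # bits (opcode, regs and offset all zero) followed by next_imm.
    return struct.pack("<BBhIII", LD_IMM_DW, regs, 0, imm, 0, next_imm)


print(encode_wide_le(1, 0x1122334455667788).hex(" "))
# 18 01 00 00 88 77 66 55 00 00 00 00 44 33 22 11
```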
|  |  | 
|  | Instruction classes | 
|  | ------------------- | 
|  |  | 
|  | The three least significant bits of the 'opcode' field store the instruction class: | 
|  |  | 
|  | .. table:: Instruction class | 
|  |  | 
|  | =====  =====  ===============================  =================================== | 
|  | class  value  description                      reference | 
|  | =====  =====  ===============================  =================================== | 
|  | LD     0x0    non-standard load operations     `Load and store instructions`_ | 
|  | LDX    0x1    load into register operations    `Load and store instructions`_ | 
|  | ST     0x2    store from immediate operations  `Load and store instructions`_ | 
|  | STX    0x3    store from register operations   `Load and store instructions`_ | 
|  | ALU    0x4    32-bit arithmetic operations     `Arithmetic and jump instructions`_ | 
|  | JMP    0x5    64-bit jump operations           `Arithmetic and jump instructions`_ | 
|  | JMP32  0x6    32-bit jump operations           `Arithmetic and jump instructions`_ | 
|  | ALU64  0x7    64-bit arithmetic operations     `Arithmetic and jump instructions`_ | 
|  | =====  =====  ===============================  =================================== | 
|  |  | 
|  | Arithmetic and jump instructions | 
|  | ================================ | 
|  |  | 
|  | For arithmetic and jump instructions (``ALU``, ``ALU64``, ``JMP`` and | 
|  | ``JMP32``), the 8-bit 'opcode' field is divided into three parts:: | 
|  |  | 
|  | +-+-+-+-+-+-+-+-+ | 
|  | |  code |s|class| | 
|  | +-+-+-+-+-+-+-+-+ | 
|  |  | 
|  | **code** | 
|  | the operation code, whose meaning varies by instruction class | 
|  |  | 
|  | **s (source)** | 
|  | the source operand location, which unless otherwise specified is one of: | 
|  |  | 
|  | .. table:: Source operand location | 
|  |  | 
|  | ======  =====  ============================================== | 
|  | source  value  description | 
|  | ======  =====  ============================================== | 
|  | K       0      use 32-bit 'imm' value as source operand | 
|  | X       1      use 'src_reg' register value as source operand | 
|  | ======  =====  ============================================== | 
|  |  | 
|  | **instruction class** | 
|  | the instruction class (see `Instruction classes`_) | 
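
A minimal sketch of splitting the 'opcode' byte of an arithmetic or jump instruction into its three parts is shown below. The function name ``decode_alu_jmp_opcode`` and the tuple it returns are illustrative choices, not part of this specification.

```python
CLASSES = ["LD", "LDX", "ST", "STX", "ALU", "JMP", "JMP32", "ALU64"]


def decode_alu_jmp_opcode(opcode: int):
    """Return (code, source, class) for an ALU/ALU64/JMP/JMP32 opcode."""
    cls = CLASSES[opcode & 0x07]                     # three least significant bits
    if cls not in ("ALU", "ALU64", "JMP", "JMP32"):
        raise ValueError("not an arithmetic or jump instruction")
    source = "X" if (opcode >> 3) & 0x1 else "K"     # 1-bit source field
    code = (opcode >> 4) & 0x0F                      # 4-bit operation code
    return code, source, cls


print(decode_alu_jmp_opcode(0x07))  # (0, 'K', 'ALU64')
```

Here 0x07 decodes to 'code' 0x0 with source ``K`` in class ``ALU64``.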
|  |  | 
|  | Arithmetic instructions | 
|  | ----------------------- | 
|  |  | 
|  | ``ALU`` uses 32-bit wide operands while ``ALU64`` uses 64-bit wide operands for | 
|  | otherwise identical operations. ``ALU64`` instructions belong to the | 
|  | base64 conformance group unless noted otherwise. | 
|  | The 'code' field encodes the operation as below, where 'src' refers to the | 
|  | source operand and 'dst' refers to the value of the destination | 
|  | register. | 
|  |  | 
|  | .. table:: Arithmetic instructions | 
|  |  | 
|  | =====  =====  =======  =================================================================================== | 
|  | name   code   offset   description | 
|  | =====  =====  =======  =================================================================================== | 
|  | ADD    0x0    0        dst += src | 
|  | SUB    0x1    0        dst -= src | 
|  | MUL    0x2    0        dst \*= src | 
|  | DIV    0x3    0        dst = (src != 0) ? (dst / src) : 0 | 
|  | SDIV   0x3    1        dst = (src == 0) ? 0 : ((src == -1 && dst == LLONG_MIN) ? LLONG_MIN : (dst s/ src)) | 
|  | OR     0x4    0        dst \|= src | 
|  | AND    0x5    0        dst &= src | 
|  | LSH    0x6    0        dst <<= (src & mask) | 
|  | RSH    0x7    0        dst >>= (src & mask) | 
|  | NEG    0x8    0        dst = -dst | 
|  | MOD    0x9    0        dst = (src != 0) ? (dst % src) : dst | 
|  | SMOD   0x9    1        dst = (src == 0) ? dst : ((src == -1 && dst == LLONG_MIN) ? 0: (dst s% src)) | 
|  | XOR    0xa    0        dst ^= src | 
|  | MOV    0xb    0        dst = src | 
|  | MOVSX  0xb    8/16/32  dst = (s8,s16,s32)src | 
|  | ARSH   0xc    0        :term:`sign extending<Sign Extend>` dst >>= (src & mask) | 
|  | END    0xd    0        byte swap operations (see `Byte swap instructions`_ below) | 
|  | =====  =====  =======  =================================================================================== | 
|  |  | 
|  | Underflow and overflow are allowed during arithmetic operations, meaning | 
|  | the 64-bit or 32-bit value will wrap. If BPF program execution would | 
|  | result in division by zero, the destination register is instead set to zero. | 
|  | Otherwise, for ``ALU64``, if execution would result in ``LLONG_MIN`` | 
|  | divided by -1, the destination register is instead set to ``LLONG_MIN``. For | 
|  | ``ALU``, if execution would result in ``INT_MIN`` divided by -1, the | 
|  | destination register is instead set to ``INT_MIN``. | 
|  |  | 
|  | If execution would result in modulo by zero, for ``ALU64`` the value of | 
|  | the destination register is unchanged whereas for ``ALU`` the upper | 
|  | 32 bits of the destination register are zeroed. Otherwise, for ``ALU64``, | 
|  | if execution would result in ``LLONG_MIN`` modulo -1, the destination | 
|  | register is instead set to 0. For ``ALU``, if execution would result in | 
|  | ``INT_MIN`` modulo -1, the destination register is instead set to 0. | 
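
The zero-divisor rules for the unsigned ``DIV`` and ``MOD`` operations can be sketched as follows. The function names and the ``is64`` flag (selecting ``ALU64`` rather than ``ALU``) are assumptions made for illustration; registers are modelled as non-negative Python integers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def div(dst: int, src: int, is64: bool) -> int:
    """Unsigned DIV: a zero divisor yields 0 instead of trapping."""
    mask = MASK64 if is64 else MASK32
    dst, src = dst & mask, src & mask
    return dst // src if src != 0 else 0


def mod(dst: int, src: int, is64: bool) -> int:
    """Unsigned MOD: a zero divisor leaves dst unchanged.

    For ALU (is64=False) the operand was already truncated to 32 bits,
    so the upper 32 bits of the result are zero.
    """
    mask = MASK64 if is64 else MASK32
    dst, src = dst & mask, src & mask
    return dst % src if src != 0 else dst


print(hex(mod(0x100000007, 0, is64=True)))   # 0x100000007
print(hex(mod(0x100000007, 0, is64=False)))  # 0x7
```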
|  |  | 
|  | ``{ADD, X, ALU}``, where 'code' = ``ADD``, 'source' = ``X``, and 'class' = ``ALU``, means:: | 
|  |  | 
|  | dst = (u32) ((u32) dst + (u32) src) | 
|  |  | 
|  | where '(u32)' indicates that the upper 32 bits are zeroed. | 
|  |  | 
|  | ``{ADD, X, ALU64}`` means:: | 
|  |  | 
|  | dst = dst + src | 
|  |  | 
|  | ``{XOR, K, ALU}`` means:: | 
|  |  | 
|  | dst = (u32) dst ^ (u32) imm | 
|  |  | 
|  | ``{XOR, K, ALU64}`` means:: | 
|  |  | 
|  | dst = dst ^ imm | 
|  |  | 
|  | Note that most arithmetic instructions have 'offset' set to 0. Only three instructions | 
|  | (``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero 'offset'. | 
|  |  | 
|  | Division, multiplication, and modulo operations for ``ALU`` are part | 
|  | of the "divmul32" conformance group, and division, multiplication, and | 
|  | modulo operations for ``ALU64`` are part of the "divmul64" conformance | 
|  | group. | 
|  | The division and modulo operations support both unsigned and signed flavors. | 
|  |  | 
|  | For unsigned operations (``DIV`` and ``MOD``), for ``ALU``, | 
|  | 'imm' is interpreted as a 32-bit unsigned value. For ``ALU64``, | 
|  | 'imm' is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then | 
|  | interpreted as a 64-bit unsigned value. | 
|  |  | 
|  | For signed operations (``SDIV`` and ``SMOD``), for ``ALU``, | 
|  | 'imm' is interpreted as a 32-bit signed value. For ``ALU64``, 'imm' | 
|  | is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then | 
|  | interpreted as a 64-bit signed value. | 
|  |  | 
|  | Note that definitions of the signed modulo operation vary when the | 
|  | dividend or divisor is negative, and implementations often differ by | 
|  | language: Python, Ruby, etc. differ from C, Go, Java, etc. | 
|  | Signed modulo MUST use truncated division | 
|  | (where -13 % 3 == -1) as implemented in C, Go, etc.:: | 
|  |  | 
|  | a % n = a - n * trunc(a / n) | 
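
Because Python's own ``//`` and ``%`` operators round toward negative infinity, a sketch of ``SDIV`` and ``SMOD`` in Python has to build truncated division explicitly. The helper names below are hypothetical; registers are modelled as unsigned bit patterns of width ``bits`` (64 for ``ALU64``, 32 for ``ALU``).

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def trunc_div(a: int, n: int) -> int:
    """Integer division rounding toward zero, as in C."""
    q = abs(a) // abs(n)
    return -q if (a < 0) != (n < 0) else q


def sdiv(dst: int, src: int, bits: int = 64) -> int:
    a, n = to_signed(dst, bits), to_signed(src, bits)
    if n == 0:
        return 0
    # The minimum value divided by -1 wraps back to the minimum after masking.
    return trunc_div(a, n) & ((1 << bits) - 1)


def smod(dst: int, src: int, bits: int = 64) -> int:
    mask = (1 << bits) - 1
    a, n = to_signed(dst, bits), to_signed(src, bits)
    if n == 0:
        return dst & mask
    return (a - n * trunc_div(a, n)) & mask   # a % n = a - n * trunc(a / n)


M64 = (1 << 64) - 1
print(to_signed(smod(-13 & M64, 3), 64))  # -1
print(-13 % 3)                            # 2 (Python's floored modulo, for contrast)
```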
|  |  | 
|  | The ``MOVSX`` instruction does a move operation with sign extension. | 
|  | ``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into | 
|  | 32-bit operands, and zeroes the remaining upper 32 bits. | 
|  | ``{MOVSX, X, ALU64}`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit | 
|  | operands into 64-bit operands.  Unlike other arithmetic instructions, | 
|  | ``MOVSX`` is only defined for register source operands (``X``). | 
|  |  | 
|  | ``{MOV, K, ALU64}`` means:: | 
|  |  | 
|  | dst = (s64)imm | 
|  |  | 
|  | ``{MOV, X, ALU}`` means:: | 
|  |  | 
|  | dst = (u32)src | 
|  |  | 
|  | ``{MOVSX, X, ALU}`` with 'offset' 8 means:: | 
|  |  | 
|  | dst = (u32)(s32)(s8)src | 
|  |  | 
|  |  | 
|  | The ``NEG`` instruction is only defined when the source bit is clear | 
|  | (``K``). | 
|  |  | 
|  | Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31) | 
|  | for 32-bit operations. | 
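
The effect of the shift mask can be illustrated with the following sketch, in which the mask is computed as ``bits - 1`` (63 or 31). The function names mirror the ``LSH``, ``RSH`` and ``ARSH`` operations but are otherwise hypothetical; values are unsigned bit patterns of width ``bits``.

```python
def lsh(dst: int, src: int, bits: int = 64) -> int:
    """Logical left shift; the shift amount is src & (bits - 1)."""
    return (dst << (src & (bits - 1))) & ((1 << bits) - 1)


def rsh(dst: int, src: int, bits: int = 64) -> int:
    """Logical right shift; vacated upper bits are filled with zeros."""
    return (dst & ((1 << bits) - 1)) >> (src & (bits - 1))


def arsh(dst: int, src: int, bits: int = 64) -> int:
    """Arithmetic right shift; vacated upper bits copy the sign bit."""
    mask = (1 << bits) - 1
    dst &= mask
    if dst >> (bits - 1):          # negative in two's complement
        dst -= 1 << bits
    return (dst >> (src & (bits - 1))) & mask


print(lsh(1, 64))                  # 1, because 64 & 0x3F == 0
print(hex(arsh(0x80000000, 4, 32)))  # 0xf8000000
```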
|  |  | 
|  | Byte swap instructions | 
|  | ---------------------- | 
|  |  | 
|  | The byte swap instructions use instruction classes of ``ALU`` and ``ALU64`` | 
|  | and a 4-bit 'code' field of ``END``. | 
|  |  | 
|  | The byte swap instructions operate on the destination register | 
|  | only and do not use a separate source register or immediate value. | 
|  |  | 
|  | For ``ALU``, the 1-bit source operand field in the opcode is used to | 
|  | select what byte order the operation converts from or to. For | 
|  | ``ALU64``, the 1-bit source operand field in the opcode is reserved | 
|  | and MUST be set to 0. | 
|  |  | 
|  | .. table:: Byte swap instructions | 
|  |  | 
|  | =====  ========  =====  ================================================= | 
|  | class  source    value  description | 
|  | =====  ========  =====  ================================================= | 
|  | ALU    LE        0      convert between host byte order and little endian | 
|  | ALU    BE        1      convert between host byte order and big endian | 
|  | ALU64  Reserved  0      do byte swap unconditionally | 
|  | =====  ========  =====  ================================================= | 
|  |  | 
|  | The 'imm' field encodes the width of the swap operations.  The following widths | 
|  | are supported: 16, 32 and 64.  Width 64 operations belong to the base64 | 
|  | conformance group and other swap operations belong to the base32 | 
|  | conformance group. | 
|  |  | 
|  | Examples: | 
|  |  | 
|  | ``{END, LE, ALU}`` with 'imm' = 16/32/64 means:: | 
|  |  | 
|  | dst = le16(dst) | 
|  | dst = le32(dst) | 
|  | dst = le64(dst) | 
|  |  | 
|  | ``{END, BE, ALU}`` with 'imm' = 16/32/64 means:: | 
|  |  | 
|  | dst = be16(dst) | 
|  | dst = be32(dst) | 
|  | dst = be64(dst) | 
|  |  | 
|  | ``{END, TO, ALU64}`` with 'imm' = 16/32/64 means:: | 
|  |  | 
|  | dst = bswap16(dst) | 
|  | dst = bswap32(dst) | 
|  | dst = bswap64(dst) | 
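
The direction-agnostic behaviour of these functions can be sketched in Python as below. The names ``bswap``, ``to_le`` and ``to_be`` and the explicit ``host`` parameter are hypothetical; truncating the input to ``width`` bits before swapping is an assumption of this sketch.

```python
import sys


def bswap(value: int, width: int) -> int:
    """Reverse the byte order of the low `width` bits unconditionally."""
    value &= (1 << width) - 1
    return int.from_bytes(value.to_bytes(width // 8, "little"), "big")


def to_le(value: int, width: int, host: str = sys.byteorder) -> int:
    """le16/le32/le64: a no-op on little-endian hosts, a swap otherwise."""
    value &= (1 << width) - 1
    return value if host == "little" else bswap(value, width)


def to_be(value: int, width: int, host: str = sys.byteorder) -> int:
    """be16/be32/be64: a no-op on big-endian hosts, a swap otherwise."""
    value &= (1 << width) - 1
    return value if host == "big" else bswap(value, width)


print(hex(bswap(0x11223344, 32)))               # 0x44332211
print(hex(to_be(0x1234, 16, host="little")))    # 0x3412
```

Applying the same function twice returns the original value, which is why a single function serves both conversion directions.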
|  |  | 
|  | Jump instructions | 
|  | ----------------- | 
|  |  | 
|  | ``JMP32`` uses 32-bit wide operands and indicates the base32 | 
|  | conformance group, while ``JMP`` uses 64-bit wide operands for | 
|  | otherwise identical operations, and indicates the base64 conformance | 
|  | group unless otherwise specified. | 
|  | The 'code' field encodes the operation as below: | 
|  |  | 
|  | .. table:: Jump instructions | 
|  |  | 
|  | ========  =====  =======  =================================  =================================================== | 
|  | code      value  src_reg  description                        notes | 
|  | ========  =====  =======  =================================  =================================================== | 
|  | JA        0x0    0x0      PC += offset                       {JA, K, JMP} only | 
|  | JA        0x0    0x0      PC += imm                          {JA, K, JMP32} only | 
|  | JEQ       0x1    any      PC += offset if dst == src | 
|  | JGT       0x2    any      PC += offset if dst > src          unsigned | 
|  | JGE       0x3    any      PC += offset if dst >= src         unsigned | 
|  | JSET      0x4    any      PC += offset if dst & src | 
|  | JNE       0x5    any      PC += offset if dst != src | 
|  | JSGT      0x6    any      PC += offset if dst > src          signed | 
|  | JSGE      0x7    any      PC += offset if dst >= src         signed | 
|  | CALL      0x8    0x0      call helper function by static ID  {CALL, K, JMP} only, see `Helper functions`_ | 
|  | CALL      0x8    0x1      call PC += imm                     {CALL, K, JMP} only, see `Program-local functions`_ | 
|  | CALL      0x8    0x2      call helper function by BTF ID     {CALL, K, JMP} only, see `Helper functions`_ | 
|  | EXIT      0x9    0x0      return                             {EXIT, K, JMP} only | 
|  | JLT       0xa    any      PC += offset if dst < src          unsigned | 
|  | JLE       0xb    any      PC += offset if dst <= src         unsigned | 
|  | JSLT      0xc    any      PC += offset if dst < src          signed | 
|  | JSLE      0xd    any      PC += offset if dst <= src         signed | 
|  | ========  =====  =======  =================================  =================================================== | 
|  |  | 
|  | where 'PC' denotes the program counter, and the offset to increment by | 
|  | is in units of 64-bit instructions relative to the instruction following | 
|  | the jump instruction.  Thus 'PC += 1' skips execution of the next | 
|  | instruction if it's a basic instruction or results in undefined behavior | 
|  | if the next instruction is a 128-bit wide instruction. | 
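
The offset arithmetic can be sketched as follows, counting positions in 64-bit slots. Both function names and the ``wide_starts`` set (slot indices at which a wide instruction begins) are hypothetical and exist only for this illustration.

```python
def jump_target(pc: int, offset: int) -> int:
    """Slot index executed after a taken jump located at slot `pc`.

    The offset is relative to the instruction following the jump.
    """
    return pc + 1 + offset


def lands_inside_wide(target: int, wide_starts: set) -> bool:
    """True if `target` is the second slot of a 128-bit wide instruction."""
    return (target - 1) in wide_starts


print(jump_target(10, 0))  # 11: falls through to the next instruction
print(jump_target(10, 1))  # 12: skips one basic instruction
# If slot 11 starts a wide instruction, 'PC += 1' from slot 10 is undefined.
print(lands_inside_wide(jump_target(10, 1), {11}))  # True
```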
|  |  | 
|  | Example: | 
|  |  | 
|  | ``{JSGE, X, JMP32}`` means:: | 
|  |  | 
|  | if (s32)dst s>= (s32)src goto +offset | 
|  |  | 
|  | where 's>=' indicates a signed '>=' comparison. | 
|  |  | 
|  | ``{JLE, K, JMP}`` means:: | 
|  |  | 
|  | if dst <= (u64)(s64)imm goto +offset | 
|  |  | 
|  | ``{JA, K, JMP32}`` means:: | 
|  |  | 
|  | gotol +imm | 
|  |  | 
|  | where 'imm' means the branch offset comes from the 'imm' field. | 
|  |  | 
|  | Note that there are two flavors of ``JA`` instructions. The | 
|  | ``JMP`` class permits a 16-bit jump offset specified by the 'offset' | 
|  | field, whereas the ``JMP32`` class permits a 32-bit jump offset | 
|  | specified by the 'imm' field. A conditional jump whose offset does not fit | 
|  | in 16 bits may be converted to a conditional jump with a 16-bit offset plus | 
|  | a 32-bit unconditional jump. | 
|  |  | 
|  | All ``CALL`` and ``JA`` instructions belong to the | 
|  | base32 conformance group. | 
|  |  | 
|  | Helper functions | 
|  | ~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | Helper functions are a concept whereby BPF programs can call a | 
|  | set of functions exposed by the underlying platform. | 
|  |  | 
|  | Historically, each helper function was identified by a static ID | 
|  | encoded in the 'imm' field.  Further documentation of helper functions | 
|  | is outside the scope of this document and standardization is left for | 
|  | future work, but use is widely deployed and more information can be | 
|  | found in platform-specific documentation (e.g., Linux kernel documentation). | 
|  |  | 
|  | Platforms that support the BPF Type Format (BTF) support identifying | 
|  | a helper function by a BTF ID encoded in the 'imm' field, where the BTF ID | 
|  | identifies the helper name and type.  Further documentation of BTF | 
|  | is outside the scope of this document and standardization is left for | 
|  | future work, but use is widely deployed and more information can be | 
|  | found in platform-specific documentation (e.g., Linux kernel documentation). | 
|  |  | 
|  | Program-local functions | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~ | 
|  | Program-local functions are functions exposed by the same BPF program as the | 
|  | caller, and are referenced by offset from the instruction following the call | 
|  | instruction, similar to ``JA``.  The offset is encoded in the 'imm' field of | 
|  | the call instruction. An ``EXIT`` within the program-local function will | 
|  | return to the caller. | 
|  |  | 
|  | Load and store instructions | 
|  | =========================== | 
|  |  | 
|  | For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the | 
|  | 8-bit 'opcode' field is divided as follows:: | 
|  |  | 
|  | +-+-+-+-+-+-+-+-+ | 
|  | |mode |sz |class| | 
|  | +-+-+-+-+-+-+-+-+ | 
|  |  | 
|  | **mode** | 
|  | The mode modifier is one of: | 
|  |  | 
|  | .. table:: Mode modifier | 
|  |  | 
|  | =============  =====  ====================================  ============= | 
|  | mode modifier  value  description                           reference | 
|  | =============  =====  ====================================  ============= | 
|  | IMM            0      64-bit immediate instructions         `64-bit immediate instructions`_ | 
|  | ABS            1      legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_ | 
|  | IND            2      legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_ | 
|  | MEM            3      regular load and store operations     `Regular load and store operations`_ | 
|  | MEMSX          4      sign-extension load operations        `Sign-extension load operations`_ | 
|  | ATOMIC         6      atomic operations                     `Atomic operations`_ | 
|  | =============  =====  ====================================  ============= | 
|  |  | 
|  | **sz (size)** | 
|  | The size modifier is one of: | 
|  |  | 
|  | .. table:: Size modifier | 
|  |  | 
|  | ====  =====  ===================== | 
|  | size  value  description | 
|  | ====  =====  ===================== | 
|  | W     0      word        (4 bytes) | 
|  | H     1      half word   (2 bytes) | 
|  | B     2      byte | 
|  | DW    3      double word (8 bytes) | 
|  | ====  =====  ===================== | 
|  |  | 
|  | Instructions using ``DW`` belong to the base64 conformance group. | 
|  |  | 
|  | **class** | 
|  | The instruction class (see `Instruction classes`_) | 
|  |  | 
|  | Regular load and store operations | 
|  | --------------------------------- | 
|  |  | 
|  | The ``MEM`` mode modifier is used to encode regular load and store | 
|  | instructions that transfer data between a register and memory. | 
|  |  | 
|  | ``{MEM, <size>, STX}`` means:: | 
|  |  | 
|  | *(size *) (dst + offset) = src | 
|  |  | 
|  | ``{MEM, <size>, ST}`` means:: | 
|  |  | 
|  | *(size *) (dst + offset) = imm | 
|  |  | 
|  | ``{MEM, <size>, LDX}`` means:: | 
|  |  | 
|  | dst = *(unsigned size *) (src + offset) | 
|  |  | 
|  | Where '<size>' is one of: ``B``, ``H``, ``W``, or ``DW``, and | 
|  | 'unsigned size' is one of: u8, u16, u32, or u64. | 
|  |  | 
|  | Sign-extension load operations | 
|  | ------------------------------ | 
|  |  | 
|  | The ``MEMSX`` mode modifier is used to encode :term:`sign-extension<Sign Extend>` load | 
|  | instructions that load data from memory into a register. | 
|  |  | 
|  | ``{MEMSX, <size>, LDX}`` means:: | 
|  |  | 
|  | dst = *(signed size *) (src + offset) | 
|  |  | 
|  | Where '<size>' is one of: ``B``, ``H``, or ``W``, and | 
|  | 'signed size' is one of: s8, s16, or s32. | 
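
The difference between ``MEM`` and ``MEMSX`` loads can be sketched as below. Memory is modelled as a Python ``bytes`` object, ``addr`` stands for ``src + offset``, and a little-endian host is assumed by default; the function name ``ldx`` and its parameters are hypothetical.

```python
SIZES = {"B": 1, "H": 2, "W": 4, "DW": 8}
MASK64 = 0xFFFFFFFFFFFFFFFF


def ldx(memory: bytes, addr: int, size: str, sign_extend: bool = False,
        byteorder: str = "little") -> int:
    """Return the 64-bit register value produced by an LDX load."""
    if sign_extend and size == "DW":
        raise ValueError("MEMSX is only defined for B, H and W")
    raw = memory[addr:addr + SIZES[size]]
    if len(raw) != SIZES[size]:
        raise IndexError("load outside of modelled memory")
    # MEM zero-extends (unsigned read); MEMSX sign-extends (signed read).
    value = int.from_bytes(raw, byteorder, signed=sign_extend)
    return value & MASK64


mem = bytes([0x86, 0xFF, 0x00, 0x00])
print(hex(ldx(mem, 0, "B")))                    # 0x86
print(hex(ldx(mem, 0, "B", sign_extend=True)))  # 0xffffffffffffff86
```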
|  |  | 
|  | Atomic operations | 
|  | ----------------- | 
|  |  | 
|  | Atomic operations are operations that operate on memory and cannot be | 
|  | interrupted or corrupted by other accesses to the same memory region | 
|  | by other BPF programs or means outside of this specification. | 
|  |  | 
|  | All atomic operations supported by BPF are encoded as store operations | 
|  | that use the ``ATOMIC`` mode modifier as follows: | 
|  |  | 
|  | * ``{ATOMIC, W, STX}`` for 32-bit operations, which are | 
|  | part of the "atomic32" conformance group. | 
|  | * ``{ATOMIC, DW, STX}`` for 64-bit operations, which are | 
|  | part of the "atomic64" conformance group. | 
|  | * 8-bit and 16-bit wide atomic operations are not supported. | 
|  |  | 
|  | The 'imm' field is used to encode the actual atomic operation. | 
|  | Simple atomic operations reuse, in the 'imm' field, a subset of the | 
|  | values defined to encode arithmetic operations: | 
|  |  | 
|  | .. table:: Simple atomic operations | 
|  |  | 
|  | ========  =====  =========== | 
|  | imm       value  description | 
|  | ========  =====  =========== | 
|  | ADD       0x00   atomic add | 
|  | OR        0x40   atomic or | 
|  | AND       0x50   atomic and | 
|  | XOR       0xa0   atomic xor | 
|  | ========  =====  =========== | 
|  |  | 
|  |  | 
|  | ``{ATOMIC, W, STX}`` with 'imm' = ADD means:: | 
|  |  | 
|  | *(u32 *)(dst + offset) += src | 
|  |  | 
|  | ``{ATOMIC, DW, STX}`` with 'imm' = ADD means:: | 
|  |  | 
|  | *(u64 *)(dst + offset) += src | 
|  |  | 
|  | In addition to the simple atomic operations, there is also a modifier and | 
|  | two complex atomic operations: | 
|  |  | 
|  | .. table:: Complex atomic operations | 
|  |  | 
|  | ===========  ================  =========================== | 
|  | imm          value             description | 
|  | ===========  ================  =========================== | 
|  | FETCH        0x01              modifier: return old value | 
|  | XCHG         0xe0 | FETCH      atomic exchange | 
|  | CMPXCHG      0xf0 | FETCH      atomic compare and exchange | 
|  | ===========  ================  =========================== | 
|  |  | 
|  | The ``FETCH`` modifier is optional for simple atomic operations, and is | 
|  | always set for the complex atomic operations.  If the ``FETCH`` flag | 
|  | is set, then the operation also overwrites ``src`` with the value that | 
|  | was in memory before it was modified. | 
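
A minimal sketch of how the ``FETCH`` modifier could be handled for the 64-bit simple operations, again using C11 atomics. The function ``atomic64_simple`` and the constant names are hypothetical, and sequentially consistent ordering is an assumption of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* 'imm' values from the tables above (hypothetical names). */
enum {
    IMM_ADD   = 0x00,
    IMM_OR    = 0x40,
    IMM_AND   = 0x50,
    IMM_XOR   = 0xa0,
    IMM_FETCH = 0x01,
};

/* Apply a simple 64-bit atomic operation selected by 'imm'.
 * If FETCH is set, *src is overwritten with the value that was in
 * memory before the operation. Returns false if 'imm' does not name
 * a simple operation (complex operations are not handled here). */
static bool atomic64_simple(_Atomic uint64_t *addr, uint64_t *src, int32_t imm)
{
    uint64_t old;

    switch (imm & ~IMM_FETCH) {
    case IMM_ADD: old = atomic_fetch_add(addr, *src); break;
    case IMM_OR:  old = atomic_fetch_or(addr, *src);  break;
    case IMM_AND: old = atomic_fetch_and(addr, *src); break;
    case IMM_XOR: old = atomic_fetch_xor(addr, *src); break;
    default:      return false;
    }
    if (imm & IMM_FETCH)
        *src = old;
    return true;
}

/* Small self-check of the FETCH behavior. */
static void atomic_fetch_demo(void)
{
    _Atomic uint64_t mem = 0xc;
    uint64_t src = 0xa;

    /* OR with FETCH: memory is updated and src receives the old value. */
    assert(atomic64_simple(&mem, &src, IMM_OR | IMM_FETCH));
    assert(atomic_load(&mem) == 0xe && src == 0xc);

    /* XOR without FETCH: memory is updated and src is left unchanged. */
    assert(atomic64_simple(&mem, &src, IMM_XOR));
    assert(atomic_load(&mem) == 0x2 && src == 0xc);
}
```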
|  |  | 
|  | The ``XCHG`` operation atomically exchanges ``src`` with the value | 
|  | addressed by ``dst + offset``. | 
|  |  | 
|  | The ``CMPXCHG`` operation atomically compares the value addressed by | 
|  | ``dst + offset`` with ``R0``. If they match, the value addressed by | 
|  | ``dst + offset`` is replaced with ``src``. In either case, the | 
|  | value that was at ``dst + offset`` before the operation is zero-extended | 
|  | and loaded back to ``R0``. | 
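
The two complex operations can be sketched with C11 atomics as follows. The function names are hypothetical. The 32-bit variant assumes that only the low 32 bits of ``R0`` take part in the comparison; that detail is an assumption of the sketch and is not stated in the text above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* XCHG: store src at the address and return the previous value,
 * which becomes the new value of src. */
static uint64_t atomic64_xchg(_Atomic uint64_t *addr, uint64_t src)
{
    return atomic_exchange(addr, src);
}

/* CMPXCHG, 64-bit: compare memory with R0 and store src on a match.
 * R0 receives the value that was in memory before the operation. */
static void atomic64_cmpxchg(_Atomic uint64_t *addr, uint64_t *r0, uint64_t src)
{
    uint64_t expected = *r0;

    /* On failure the C11 call writes the current memory value into
     * 'expected'; on success 'expected' already equals the old value. */
    (void)atomic_compare_exchange_strong(addr, &expected, src);
    *r0 = expected;
}

/* CMPXCHG, 32-bit: as above, with the old value zero-extended into R0.
 * Assumption: the comparison uses the low 32 bits of R0. */
static void atomic32_cmpxchg(_Atomic uint32_t *addr, uint64_t *r0, uint32_t src)
{
    uint32_t expected = (uint32_t)*r0;

    (void)atomic_compare_exchange_strong(addr, &expected, src);
    *r0 = expected; /* implicit conversion zero-extends to 64 bits */
}

/* Small self-check covering a match and a mismatch. */
static void atomic_cmpxchg_demo(void)
{
    _Atomic uint32_t mem = 5;
    uint64_t r0 = 0xdeadbeef00000005ull;

    atomic32_cmpxchg(&mem, &r0, 9);   /* match: memory becomes 9 */
    assert(atomic_load(&mem) == 9 && r0 == 5);

    r0 = 7;
    atomic32_cmpxchg(&mem, &r0, 1);   /* mismatch: memory unchanged */
    assert(atomic_load(&mem) == 9 && r0 == 9);
}
```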
|  |  | 
|  | 64-bit immediate instructions | 
|  | ----------------------------- | 
|  |  | 
|  | Instructions with the ``IMM`` 'mode' modifier use the wide instruction | 
|  | encoding defined in `Instruction encoding`_, and use the 'src_reg' field of the | 
|  | basic instruction to hold an opcode subtype. | 
|  |  | 
|  | The following table defines a set of ``{IMM, DW, LD}`` instructions | 
|  | with opcode subtypes in the 'src_reg' field, using new terms such as "map" | 
|  | defined further below: | 
|  |  | 
|  | .. table:: 64-bit immediate instructions | 
|  |  | 
|  | =======  =========================================  ===========  ============== | 
|  | src_reg  pseudocode                                 imm type     dst type | 
|  | =======  =========================================  ===========  ============== | 
|  | 0x0      dst = (next_imm << 32) | imm               integer      integer | 
|  | 0x1      dst = map_by_fd(imm)                       map fd       map | 
|  | 0x2      dst = map_val(map_by_fd(imm)) + next_imm   map fd       data address | 
|  | 0x3      dst = var_addr(imm)                        variable id  data address | 
|  | 0x4      dst = code_addr(imm)                       integer      code address | 
|  | 0x5      dst = map_by_idx(imm)                      map index    map | 
|  | 0x6      dst = map_val(map_by_idx(imm)) + next_imm  map index    data address | 
|  | =======  =========================================  ===========  ============== | 
|  |  | 
|  | where | 
|  |  | 
|  | * map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_) | 
|  | * map_by_idx(imm) means to convert a 32-bit index into an address of a map | 
|  | * map_val(map) gets the address of the first value in a given map | 
|  | * var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id | 
|  | * code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions | 
|  | * the 'imm type' can be used by disassemblers for display | 
|  | * the 'dst type' can be used for verification and JIT compilation purposes | 
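
For the ``src_reg`` = 0x0 row of the table, the following C sketch shows how a 64-bit constant could be split into and rebuilt from the two 32-bit fields. The helper names are hypothetical. Both fields are passed as unsigned 32-bit bit patterns, which assumes that 'imm' is not sign-extended before being combined; that is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* dst = (next_imm << 32) | imm, with both fields taken as raw
 * 32-bit patterns (assumption: no sign extension of 'imm'). */
static uint64_t imm64_value(uint32_t imm, uint32_t next_imm)
{
    return ((uint64_t)next_imm << 32) | (uint64_t)imm;
}

/* Inverse helper: split a 64-bit constant into the 'imm' field of the
 * basic instruction and the 'next_imm' field of the second 64 bits. */
static void imm64_split(uint64_t value, uint32_t *imm, uint32_t *next_imm)
{
    *imm = (uint32_t)(value & 0xffffffffu);
    *next_imm = (uint32_t)(value >> 32);
}

/* Small self-check: splitting and recombining round-trips. */
static void imm64_demo(void)
{
    uint32_t imm, next_imm;

    imm64_split(0x123456789abcdef0ull, &imm, &next_imm);
    assert(imm == 0x9abcdef0u);
    assert(next_imm == 0x12345678u);
    assert(imm64_value(imm, next_imm) == 0x123456789abcdef0ull);
}
```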
|  |  | 
|  | Maps | 
|  | ~~~~ | 
|  |  | 
|  | Maps are shared memory regions accessible by BPF programs on some platforms. | 
|  | A map can have various semantics as defined in a separate document, and may or | 
|  | may not have a single contiguous memory region, but 'map_val(map)' is | 
|  | currently only defined for maps that do have a single contiguous memory region. | 
|  |  | 
|  | Each map can have a file descriptor (fd) if supported by the platform, where | 
|  | 'map_by_fd(imm)' means to get the map with the specified file descriptor. Each | 
|  | BPF program can also be defined to use a set of maps associated with the | 
|  | program at load time, and 'map_by_idx(imm)' means to get the map with the given | 
|  | index in the set associated with the BPF program containing the instruction. | 
|  |  | 
|  | Platform Variables | 
|  | ~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | Platform variables are memory regions, identified by integer ids, exposed by | 
|  | the runtime and accessible by BPF programs on some platforms.  The | 
|  | 'var_addr(imm)' operation means to get the address of the memory region | 
|  | identified by the given id. | 
|  |  | 
|  | Legacy BPF Packet access instructions | 
|  | ------------------------------------- | 
|  |  | 
|  | BPF previously introduced special instructions for access to packet data, | 
|  | carried over from classic BPF. These instructions used an instruction | 
|  | class of ``LD``, a size modifier of ``W``, ``H``, or ``B``, and a | 
|  | mode modifier of ``ABS`` or ``IND``.  The 'dst_reg' and 'offset' fields were | 
|  | set to zero, and 'src_reg' was set to zero for ``ABS``.  However, these | 
|  | instructions are deprecated and SHOULD no longer be used.  All legacy packet | 
|  | access instructions belong to the "packet" conformance group. |
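
A tool that wants to flag these deprecated instructions could test the opcode byte as in the C sketch below. The helper ``is_legacy_packet_insn`` and the constant names are hypothetical, and the numeric values of the class, size, and mode components, as well as the field masks, are assumed to match the opcode tables given elsewhere in this specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode field masks and components. Values are assumed to match the
 * opcode tables elsewhere in this specification; names are hypothetical. */
enum {
    CLASS_MASK = 0x07,
    SIZE_MASK  = 0x18,
    MODE_MASK  = 0xe0,

    CLASS_LD   = 0x00,
    SIZE_DW    = 0x18,
    MODE_ABS   = 0x20,
    MODE_IND   = 0x40,
};

/* True if the opcode is an LD-class instruction with mode ABS or IND
 * and a size of W, H, or B (that is, any size other than DW). */
static bool is_legacy_packet_insn(uint8_t opcode)
{
    uint8_t cls  = opcode & CLASS_MASK;
    uint8_t size = opcode & SIZE_MASK;
    uint8_t mode = opcode & MODE_MASK;

    return cls == CLASS_LD &&
           (mode == MODE_ABS || mode == MODE_IND) &&
           size != SIZE_DW;
}

/* Small self-check with a few opcode bytes. */
static void legacy_packet_demo(void)
{
    assert(is_legacy_packet_insn(0x20));   /* {ABS, W, LD} */
    assert(is_legacy_packet_insn(0x48));   /* {IND, H, LD} */
    assert(!is_legacy_packet_insn(0x18));  /* {IMM, DW, LD} */
    assert(!is_legacy_packet_insn(0x61));  /* not LD class */
}
```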