.. _sfdp_doc:

SFDP framework
==============

SFDP
____

The StateFul Data Plane (SFDP) project aims to bring stateful packet processing, on a per-session basis, into the Vector Packet Processor (VPP) data plane.

Data plane overview
___________________

When entering SFDP, packets are first matched against the **sfdp session table**. The lookup key combines the packet's 5-tuple with a **context id**. The context id is an opaque `u32` that allows multiple address/port spaces to coexist in the same data plane.

Sessions in SFDP are bidirectional, so a session lookup yields both a session and a **direction**. The direction of a packet is **forward** if the packet travels from initiator to responder (client to server), and **reverse** otherwise. The combination of a session and a direction is called a **flow**. In other words, a session is composed of two flows: the **forward flow** and the **reverse flow**. Each flow is associated with a **service bitmap** that defines the sequence of nodes (called **services**) a packet belonging to this flow must traverse.

The data structure materializing a session (`sfdp_session_t`) is owned by a given thread. If the lookup finds that a packet belongs to a session owned by a thread other than the one performing the lookup, the packet is handed off. As a result, all packets of a given session (in either direction) are processed on the thread that owns the session.

The session lookup fails when the table has no matching session. In that case, SFDP creates a session on the fly and associates the current packet with it as a forward packet. This packet initiates the session, so it goes from initiator to responder.
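
A flow, i.e. a (session, direction) pair, fits in a single integer. SFDP stores one in its buffer metadata as `(session_index << 1) | direction`. The standalone C sketch below models that encoding. The helper names `flow_index_make`, `flow_index_session`, `flow_index_direction` and `flow_index_opposite` are hypothetical and are not part of the SFDP API.

.. code-block:: c

   /* Hypothetical standalone model of the flow index, not the SFDP API. */
   #include <assert.h>
   #include <stdint.h>

   /* Direction values as described in the text. */
   enum { FLOW_FORWARD = 0, FLOW_REVERSE = 1 };

   /* Pack a session index and a direction into a flow index. */
   static inline uint32_t
   flow_index_make (uint32_t session_index, uint32_t direction)
   {
     assert (direction <= 1);
     return (session_index << 1) | direction;
   }

   /* Recover the session index by dropping the direction bit. */
   static inline uint32_t
   flow_index_session (uint32_t flow_index)
   {
     return flow_index >> 1;
   }

   /* Recover the direction by keeping only the lowest bit. */
   static inline uint32_t
   flow_index_direction (uint32_t flow_index)
   {
     return flow_index & 1;
   }

   /* The two flows of one session differ only in the lowest bit. */
   static inline uint32_t
   flow_index_opposite (uint32_t flow_index)
   {
     return flow_index ^ 1;
   }

The direction lives in the lowest bit. Switching between the forward and reverse flow of a session is therefore a single XOR, and the two flows of a session occupy adjacent indices.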

If the session pool is exhausted, session creation fails. The lookup node then directs the packet to the tenant's table-overflow slow-path node (`sp-ip4-table-overflow` or `sp-ip6-table-overflow`), which defaults to an error-drop unless configured otherwise.

To avoid running out of session entries, SFDP relies on proactive eviction in the pre-input `sfdp-expire` node rather than evicting directly in the lookup path. On each pre-input cycle, the expiry module is asked to expire or evict sessions when the number of free sessions drops below the eviction margin. This best-effort mechanism is meant to keep the number of free sessions above the margin, so that lookup-time allocations succeed.

After session lookup (and, potentially, session creation), SFDP stores private metadata in the packet buffer:

* a flow id equal to `(session_index << 1) | direction`, where `session_index` is the index of the `sfdp_session_t` structure in the current per-thread data, and `direction` is 0 for a forward packet and 1 for a reverse packet
* a 64-bit service bitmap, in which each bit index is associated with an SFDP graph node (an SFDP service). A bit is set if and only if its index is associated with a service the packet must traverse. This service bitmap is a plain copy of the service bitmap associated with the flow
* a tenant index, which identifies the tenant of the session to which the packet belongs

SFDP Session creation, tenants and contexts
___________________________________________

Even before entering SFDP (i.e., before the lookup), a packet must be associated with a **tenant**, recorded in the tenant index described above. How this association is made is outside the scope of SFDP: it can be done per interface, per VLAN, per VNI, and so on.

Tenants define:

* the context id used by the lookup. Multiple tenants may share the same context id.
* the forward and reverse service chains to use whenever a packet associated with this tenant triggers session creation
* various configuration options specific to the services used (e.g., timeout values)

In other words, a tenant is the data structure that defines the configuration of any new session, as well as the context id (i.e., the address/port space) used for the lookup.

Service registration, scopes, and execution order
_________________________________________________

Services are registered at init time through the SFDP service registry. The registry associates each service with:

* a graph node name
* a scope, which allows multiple independent service sets
* a bitmap index (one of 64 slots)
* optional ordering constraints (runs-before / runs-after)
* a terminal flag

Scopes group services into independent pipelines. Ordering constraints are resolved into stable bitmap indices, so the resulting service bitmap yields a deterministic traversal order. Services are typically registered with `SFDP_SERVICE_DEFINE` in their implementation code and linked into the scope ordering list.

At runtime, the **service bitmap** in the buffer drives traversal. Each service node calls `sfdp_next()` to find the next set bit (lowest index), clears that bit, and dispatches the packet to the corresponding node. A **terminal** service is the last SFDP node for a packet. It must inject the packet back into the regular VPP graph (e.g., `ip4-lookup` or `ip4-rewrite`).

Service chains are configured per tenant and per direction. Use:

::

   set sfdp services tenant [ ...] forward
   set sfdp services tenant [ ...] reverse

The equivalent API message is `sfdp_set_services`. To inspect the registered services and their bitmap indices, use `show sfdp services`.
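
The traversal rule above (take the lowest set bit, clear it, dispatch) can be modeled in a few lines of standalone C. This is a sketch of the rule, not the implementation of `sfdp_next()`. The names `next_service`, `service_order` and `NO_SERVICE` are hypothetical. The sketch uses the GCC/Clang builtin `__builtin_ctzll` to find the lowest set bit.

.. code-block:: c

   /* Hypothetical standalone model of bitmap traversal, not sfdp_next(). */
   #include <assert.h>
   #include <stdint.h>

   /* Sentinel returned when no service bit is left. */
   #define NO_SERVICE (-1)

   /* Return the index of the lowest set bit and clear it, so each service
    * runs exactly once, in increasing bitmap-index order. */
   static int
   next_service (uint64_t *bitmap)
   {
     if (*bitmap == 0)
       return NO_SERVICE;
     int index = __builtin_ctzll (*bitmap); /* lowest set bit */
     *bitmap &= *bitmap - 1;                 /* clear exactly that bit */
     return index;
   }

   /* Record the order in which services would be visited; returns the
    * number of entries written to `order` (at most 64). */
   static int
   service_order (uint64_t bitmap, int order[64])
   {
     int n = 0;
     for (int s = next_service (&bitmap); s != NO_SERVICE;
          s = next_service (&bitmap))
       order[n++] = s;
     return n;
   }

In this model, the bit index alone fixes the execution order. This is why resolving runs-before / runs-after constraints into bitmap indices is enough to make traversal deterministic: a service that must run earlier simply receives a lower index.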

Scopes and scope-specific lookup injection
__________________________________________

SFDP supports multiple **scopes**. Scopes are independent service pipelines that share the same session table but can execute different service sets. A scope is identified by a string in the service registration; when no scope is specified, the service belongs to the scope named `default`. During initialization, SFDP builds separate lookup and handoff nodes for each scope and stores the corresponding scope index in their runtime data.

To inject a packet into a specific scope, send it directly to that scope's lookup node. For a non-default scope, the generated node name is the base name followed by `-` and the scope name string. The default scope uses the base node names with no suffix:

* Default scope: `sfdp-lookup-ip4`, `sfdp-lookup-ip6`
* Non-default scope: `sfdp-lookup-ip4-` and `sfdp-lookup-ip6-`, each followed by the scope name

Scope-specific parser nodes (created for registered parsers) follow the same suffix convention. Features or DPOs can therefore pick a scope explicitly by choosing the appropriate lookup node name, while session lookup and handoff behavior stay identical across scopes.

Injecting packets in SFDP
_________________________

The SFDP data plane is largely agnostic to *how* it receives packets, and packets can be injected into it in several ways. To inject a packet, code must:

1. Set the tenant index buffer metadata, which defines the configuration of any new session the packet might initiate.
2. Set the flow id buffer metadata to the context id to use for the session lookup. Lookup overwrites this value with `(session_index << 1) | direction`, as described above.
3. Send the packet to the appropriate lookup node (e.g., `sfdp-lookup-ip4` or `sfdp-lookup-ip6`, or a scope-specific variant).

Hooks into the SFDP data plane are typically implemented in one of two ways:

* as intercepting features in VPP
* as a special DPO, when the FIB should decide whether a packet goes through the SFDP data plane
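
The three injection steps can be sketched as a small standalone C function. The struct `pkt_meta_t`, the tenant record `tenant_t`, the next-node constants and `inject_into_sfdp` are all hypothetical stand-ins for the real buffer metadata and graph next indices. Only the order of operations follows the text. Dropping packets whose tenant is unknown is an assumption of this sketch.

.. code-block:: c

   /* Hypothetical standalone model of SFDP injection, not the SFDP API. */
   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical stand-in for the SFDP fields of the buffer metadata. */
   typedef struct
   {
     uint32_t flow_id;      /* context id on input, flow index after lookup */
     uint16_t tenant_index; /* selects the configuration of new sessions */
   } pkt_meta_t;

   /* Hypothetical tenant record: only the context id is modeled. */
   typedef struct
   {
     uint32_t context_id;
   } tenant_t;

   /* Hypothetical next-node indices of an injecting node. */
   enum { NEXT_DROP = 0, NEXT_SFDP_LOOKUP_IP4 = 1 };

   /* Steps 1-3: set the tenant index, set the flow id to the tenant's
    * context id, and return the next index leading to the lookup node. */
   static int
   inject_into_sfdp (pkt_meta_t *meta, const tenant_t *tenants,
                     uint32_t n_tenants, uint16_t tenant_index)
   {
     if (tenant_index >= n_tenants)
       return NEXT_DROP; /* assumption: no tenant, no SFDP processing */
     meta->tenant_index = tenant_index;
     meta->flow_id = tenants[tenant_index].context_id;
     return NEXT_SFDP_LOOKUP_IP4;
   }

Dropping is only one possible policy for packets without a tenant. A real intercepting feature might instead let them continue along the regular feature arc.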

sfdp-interface-input example
____________________________

The `sfdp-interface-input` node (from `src/plugins/sfdp_services/base/interface_input/`) illustrates a simple injection pattern using the IP4 feature arc. The node:

* maps each ingress interface to a tenant
* sets the buffer metadata (`flow_id` to the tenant's context id, and `tenant_index`)
* forwards packets to `sfdp-lookup-ip4`

You can enable it per interface with the CLI:

::

   set sfdp interface-input tenant

To disable the feature on the interface:

::

   set sfdp interface-input tenant disable

Injecting packets into the regular VPP data plane after SFDP processing
________________________________________________________________________

Injecting a packet processed by SFDP back into the regular VPP graph is the responsibility of the *last service traversed in the packet's service bitmap* (the terminal service). SFDP does not specify how this should be done. Typically, the packet is sent to `ip4-lookup` to be processed by the FIB, or to `ip4-rewrite` to be sent directly on an interface.

Buffer metadata layout and flags
________________________________

SFDP stores per-packet metadata in the `unused` area of the VPP buffer, using the `sfdp_buffer_opaque_t` layout defined in `src/vnet/sfdp/common.h`. The layout includes:

* the `service_bitmap`
* the `tenant_index` associated with the packet
* the `session_version_before_handoff`, used to detect stale handoffs
* a few protocol-specific fields (`flags`, `tcp_flags`, `ip6_final_proto`)

The `sfdp_buffer()` accessor hides the cast from services and lookup code.

When SFDP temporarily sends a packet outside its own graph (e.g., for IP reassembly), the metadata is saved to and restored from the secondary buffer space (`sfdp_buffer2`). This preserves the SFDP state across the excursion, so the packet can resume processing with a consistent service bitmap and tenant context.
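
The accessor and save/restore pattern can be modeled in standalone C. `buffer_t`, `sfdp_meta_t`, `scratch_t`, `meta()`, `meta_save()` and `meta_restore()` are hypothetical simplifications. They are not `vlib_buffer_t`, `sfdp_buffer_opaque_t`, `sfdp_buffer()` or `sfdp_buffer2()`. The field set is a subset of the one listed above, and the area sizes are made up. The sketch overlays the metadata through a union rather than a pointer cast, so it stays within C's aliasing rules.

.. code-block:: c

   /* Hypothetical standalone model of SFDP buffer metadata handling. */
   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical subset of the SFDP per-packet metadata. */
   typedef struct
   {
     uint64_t service_bitmap;
     uint16_t tenant_index;
     uint8_t flags;
     uint8_t tcp_flags;
   } sfdp_meta_t;

   /* Hypothetical scratch area shared by several features. */
   typedef union
   {
     uint32_t raw[8];
     sfdp_meta_t meta;
   } scratch_t;

   _Static_assert (sizeof (sfdp_meta_t) <= sizeof (uint32_t[8]),
                   "metadata must fit in the scratch area");

   /* Hypothetical buffer with a primary and a secondary scratch area. */
   typedef struct
   {
     scratch_t unused;  /* primary area, may be reused outside SFDP */
     scratch_t unused2; /* secondary area, survives the excursion */
   } buffer_t;

   /* Accessor that hides where the metadata lives. */
   static inline sfdp_meta_t *
   meta (buffer_t *b)
   {
     return &b->unused.meta;
   }

   /* Before leaving SFDP (e.g., for reassembly): stash the metadata. */
   static inline void
   meta_save (buffer_t *b)
   {
     b->unused2.meta = b->unused.meta;
   }

   /* On return to SFDP: restore the stashed metadata. */
   static inline void
   meta_restore (buffer_t *b)
   {
     b->unused.meta = b->unused2.meta;
   }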

Lookup parsing
______________

The lookup path performs the protocol parsing that determines what services see after `sfdp-lookup`. Parsers provide alternate key extraction paths for non-IP or custom protocols, but they feed into the same lookup semantics described here.

IP4/IP6 headers are parsed and normalized, including the pseudo-direction logic described below. The normalized 5-tuple is then used for the session table lookup. On a hit, the lookup result is converted into a flow index and stored in the buffer, and the packet proceeds directly into the service chain. This logic lives in the `src/vnet/sfdp/lookup/` sub-tree and defines which fields in `sfdp_buffer_opaque_t` are valid when services run.

Slow-path nodes
_______________

The lookup node can classify a packet as requiring special handling and send it to a **slow-path node** instead of the service chain. Slow-path nodes are explicit graph nodes, defined per tenant and per protocol family. They implement exceptional handling such as reassembly, ICMP error mapping, or drop-on-error. This keeps the hot lookup path minimal while still allowing controlled detours.

Slow-path nodes are configurable per tenant:

::

   set sfdp sp-node tenant node
   set sfdp icmp-error-node tenant ip4 node
   set sfdp icmp-error-node tenant ip6 node

The equivalent API messages are `sfdp_set_sp_node` and `sfdp_set_icmp_error_node`.

Provided slow-path nodes
________________________

SFDP ships with a small set of built-in slow-path nodes. The most common are:

* **ICMP error handling**: ICMP error packets are parsed to recover the 5-tuple of the original packet embedded in the error, so they can be mapped back to the correct session. The ICMP error node is configured per tenant and per address family.
* **Reassembly paths**: fragmented packets are redirected to IP reassembly. Once reassembled, packets return to SFDP with their metadata restored, so the service chain can proceed.
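
The lookup node's choice between the service chain and a slow-path node can be modeled as a small decision function. The enum values, the `pkt_info_t` fields, `classify` and the order of the checks are assumptions made for illustration. The actual lookup code may test these conditions differently and also supports other slow paths.

.. code-block:: c

   /* Hypothetical standalone model of slow-path classification. */
   #include <assert.h>
   #include <stdbool.h>

   /* Hypothetical outcomes of lookup classification. */
   typedef enum
   {
     PATH_SERVICE_CHAIN, /* normal case: follow the service bitmap */
     PATH_REASSEMBLY,    /* fragment: detour through IP reassembly */
     PATH_ICMP_ERROR,    /* ICMP error: map the embedded 5-tuple */
     PATH_TABLE_OVERFLOW /* no session found and none could be created */
   } lookup_path_t;

   /* Hypothetical facts gathered while parsing and looking up a packet. */
   typedef struct
   {
     bool is_fragment;
     bool is_icmp_error;
     bool session_found;
     bool session_pool_full;
   } pkt_info_t;

   static lookup_path_t
   classify (const pkt_info_t *p)
   {
     if (p->is_fragment)
       return PATH_REASSEMBLY; /* non-first fragments carry no L4 ports */
     if (p->is_icmp_error)
       return PATH_ICMP_ERROR;
     if (!p->session_found && p->session_pool_full)
       return PATH_TABLE_OVERFLOW;
     return PATH_SERVICE_CHAIN; /* hit, or miss with successful creation */
   }

Checking for fragments first reflects the fact that a non-first fragment has no transport header, so no 5-tuple can be built for it before reassembly.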

Other slow-path nodes include table-overflow and unknown-protocol handlers, which default to error-drop unless configured otherwise.

Normalized lookup and pseudo-direction
______________________________________

Session lookup is implemented so that two 5-tuples whose source and destination are swapped (for UDP and TCP packets) match the same session. To achieve this, a 5-tuple is **normalized** before it is used for lookup. Source and destination are swapped if needed, so that the IP address in the destination field (`ip_addr_hi`) is larger than the IP address in the source field (`ip_addr_lo`).

The **pseudo direction of a packet** is one if source and destination need to be swapped before lookup, and zero otherwise. The **pseudo direction of a session** is the pseudo direction of the first packet of this session. Similarly, the **pseudo flow index of a session** is ``(session_index << 1) | session_pseudo_direction``.

The first packet of a session is, by definition, a forward packet. A packet is therefore forward exactly when its pseudo direction equals the session's, and reverse otherwise. Applying the same XOR to the low bit of the pseudo flow index yields the flow index directly. The following identities hold:

.. code-block::

   packet_direction = packet_pseudo_direction ^ session_pseudo_direction
   packet_flow_index = packet_pseudo_direction ^ session_pseudo_flow_index

Parsers and parser registration
_______________________________

Beyond the built-in IP normalization, SFDP parsers are specialized lookup nodes used for non-IP or custom key formats. A parser defines:

* how to extract a lookup key from a packet (`calc_key_fn`)
* how to normalize that key (`normalize_key_fn`)
* the key size to use for the parser-specific session table

During initialization, each registered parser gets its own bihash table. A parser node is also created for each scope, following the same scope-name suffix convention as the lookup nodes.

To register a parser, use the `SFDP_PARSER_REGISTER` macro in a compilation unit and fill in an `sfdp_parser_registration_t` with the parser name, callbacks, and key metadata.
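
A parser's key callbacks can be sketched in standalone C for a simple IPv4 5-tuple. The key layout and the names `key4_t`, `calc_key`, `normalize_key` and `packet_direction` are hypothetical, and their signatures do not reproduce `sfdp_parser_registration_t`. Normalization follows the rule from the pseudo-direction section: swap the endpoints so that `ip_addr_hi` holds the larger address, and return the packet's pseudo direction. Breaking ties on ports when both addresses are equal is an assumption of this sketch.

.. code-block:: c

   /* Hypothetical standalone model of parser key callbacks. */
   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical IPv4 lookup key (host byte order for simplicity). */
   typedef struct
   {
     uint32_t context_id;
     uint32_t ip_addr_lo, ip_addr_hi;
     uint16_t port_lo, port_hi;
     uint8_t proto;
   } key4_t;

   /* Hypothetical calc_key: copy fields in packet order, source in the
    * "lo" slots and destination in the "hi" slots. */
   static void
   calc_key (key4_t *k, uint32_t context_id, uint32_t src, uint32_t dst,
             uint16_t sport, uint16_t dport, uint8_t proto)
   {
     *k = (key4_t){ .context_id = context_id, .ip_addr_lo = src,
                    .ip_addr_hi = dst, .port_lo = sport, .port_hi = dport,
                    .proto = proto };
   }

   /* Hypothetical normalize_key: swap endpoints if needed and return the
    * pseudo direction (1 if swapped, 0 otherwise). */
   static uint32_t
   normalize_key (key4_t *k)
   {
     int swap = k->ip_addr_lo > k->ip_addr_hi
                || (k->ip_addr_lo == k->ip_addr_hi && k->port_lo > k->port_hi);
     if (swap)
       {
         uint32_t a = k->ip_addr_lo;
         k->ip_addr_lo = k->ip_addr_hi;
         k->ip_addr_hi = a;
         uint16_t p = k->port_lo;
         k->port_lo = k->port_hi;
         k->port_hi = p;
       }
     return (uint32_t) swap;
   }

   /* Identity from the text: packet direction is the XOR of the packet's
    * and the session's pseudo directions. */
   static inline uint32_t
   packet_direction (uint32_t pkt_pseudo_dir, uint32_t session_pseudo_dir)
   {
     return pkt_pseudo_dir ^ session_pseudo_dir;
   }

Consider a request from 10.0.0.2 to 10.0.0.1 and its reply. Both normalize to the same key. The request needs a swap (pseudo direction 1) and the reply does not (pseudo direction 0). Because the request opened the session, the reply comes out as reverse.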

The parser's graph node is created with `SFDP_PARSER_DEFINE_NODE`. This macro wires the parser into the SFDP lookup pipeline and associates the node name with the parser registration. Once registered, the parser node can be used as an injection point in the same way as the standard lookup nodes, including scope-specific variants. The generated parser node name is the parser name, followed, for non-default scopes, by `-` and the scope name string. The default scope uses the base parser name with no suffix.

Primary and secondary session keys
__________________________________

In some cases, a given session must be reachable through different 5-tuples. Session-aware NAT is the typical example: the translation changes addresses and/or ports. As a result, the 5-tuple of the session key for reverse traffic is not the forward 5-tuple with source and destination swapped. Some services therefore need to create a **secondary session key** that points to an already-existing session. This is done with the `sfdp_session_try_add_secondary_key` internal API. The provided secondary key must be normalized. The **pseudo_flow_index** associated with this session and this new key must also be provided. An example is provided in `src/sfdp/nat/slowpath_node.c`.

Callbacks on session lifecycle
______________________________

SFDP exposes callbacks for external modules that need to attach state to sessions or clean it up safely:

* The **notify_new_sessions** callback is invoked after session creation but before the first packet is fully processed. Services can use it to initialize per-session state or even adjust the service chain.
* The **notify_deleted_sessions** callback is invoked during pre-input, when sessions are being removed. Teardown therefore happens in a context where no packets are being processed concurrently on that thread.
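
The division of work between the two callbacks can be illustrated with a module that keeps a small state record per session. The callback names and signatures below (`on_new_sessions`, `on_deleted_sessions`, taking a thread index and an array of session indices), the `my_state_t` record and the array bounds are hypothetical. They are not the signatures expected by the SFDP registration macros. State is indexed by thread because, as described above, a session index refers to per-thread data.

.. code-block:: c

   /* Hypothetical standalone model of lifecycle callbacks. */
   #include <assert.h>
   #include <stdint.h>

   #define MAX_THREADS 4
   #define MAX_SESSIONS 1024

   /* Hypothetical per-session state owned by an external module. */
   typedef struct
   {
     uint64_t packets;
     uint8_t in_use;
   } my_state_t;

   static my_state_t state[MAX_THREADS][MAX_SESSIONS];

   /* New-sessions callback: initialize state before the first packet of
    * each new session reaches the services. */
   static void
   on_new_sessions (uint32_t thread_index, const uint32_t *session_indices,
                    uint32_t n)
   {
     assert (thread_index < MAX_THREADS);
     for (uint32_t i = 0; i < n; i++)
       {
         assert (session_indices[i] < MAX_SESSIONS);
         state[thread_index][session_indices[i]] = (my_state_t){ .in_use = 1 };
       }
   }

   /* Deleted-sessions callback: runs in pre-input, when no packet of this
    * thread is in flight, so plain cleanup needs no synchronization. */
   static void
   on_deleted_sessions (uint32_t thread_index, const uint32_t *session_indices,
                        uint32_t n)
   {
     assert (thread_index < MAX_THREADS);
     for (uint32_t i = 0; i < n; i++)
       {
         assert (session_indices[i] < MAX_SESSIONS);
         state[thread_index][session_indices[i]] = (my_state_t){ 0 };
       }
   }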

Callbacks are registered through `SFDP_REGISTER_NEW_SESSIONS_CALLBACK` and `SFDP_REGISTER_DELETED_SESSIONS_CALLBACK` in `src/vnet/sfdp/callbacks.h`, and can be blacklisted at init time. The callbacks run on worker threads and must follow the same threading rules as SFDP services. Blacklisting a callback is typically used to replace a behavior triggered by session lifecycle events with a custom one.

Session expiration and timer management
_______________________________________

Sessions are only ever created upon a failed session lookup, whereas session expiration is timer-based. The expiration timer of a session is initially set to the **embryonic timeout**, which can be set at tenant configuration time. Afterwards, depending on the protocol, services such as the l4-lifecycle service or the tcp-check service rearm the session timeout according to the state of the session. Services use two SFDP-provided functions to rearm the session timeout:

* `sfdp_session_timer_update` changes the timeout value when the new expiration time is known to be **later** than the old expiration time
* `sfdp_session_timer_update_maybe_past` changes the timeout value when it is unknown whether the new expiration time falls before or after the old one. Calling this function with a timeout of zero is the preferred way to force expiration of the session

Session expiry and eviction model
_________________________________

Session expiration is timer-driven but modular. SFDP defines an expiry interface that a module can implement to:

* receive enable/disable callbacks
* process expirations on each pre-input cycle
* proactively evict sessions when the pool is running low
* report the remaining time for inspection
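
An expiry module with the four responsibilities listed above could be shaped as a table of function pointers, driven once per pre-input cycle. Everything below is hypothetical and illustrates only the division of labor and the eviction-margin check, not the real SFDP expiry interface. This includes `expiry_ops_t`, its members, the toy deadline table, `free_sessions` and `pre_input_tick`. Evicting the sessions closest to expiry first is a policy choice of this sketch.

.. code-block:: c

   /* Hypothetical standalone model, not the SFDP expiry interface. */
   #include <assert.h>
   #include <stdint.h>

   #define N_SESSIONS 8
   #define NO_DEADLINE UINT64_MAX /* marks a free slot in this toy table */

   /* Toy session table: one absolute deadline per session slot. */
   static uint64_t deadline[N_SESSIONS];

   /* Hypothetical expiry interface: one function pointer per duty. */
   typedef struct
   {
     void (*enable) (void);
     uint32_t (*expire) (uint64_t now);     /* free sessions past deadline */
     uint32_t (*evict) (uint32_t n_wanted); /* free live sessions early */
     uint64_t (*remaining) (uint32_t s, uint64_t now); /* for inspection */
   } expiry_ops_t;

   static void
   toy_enable (void)
   {
     for (int s = 0; s < N_SESSIONS; s++)
       deadline[s] = NO_DEADLINE;
   }

   static uint32_t
   toy_expire (uint64_t now)
   {
     uint32_t n = 0;
     for (int s = 0; s < N_SESSIONS; s++)
       if (deadline[s] != NO_DEADLINE && deadline[s] <= now)
         {
           deadline[s] = NO_DEADLINE;
           n++;
         }
     return n;
   }

   /* Evict up to n_wanted live sessions, closest to expiry first. */
   static uint32_t
   toy_evict (uint32_t n_wanted)
   {
     uint32_t n = 0;
     while (n < n_wanted)
       {
         int victim = -1;
         for (int s = 0; s < N_SESSIONS; s++)
           if (deadline[s] != NO_DEADLINE
               && (victim < 0 || deadline[s] < deadline[victim]))
             victim = s;
         if (victim < 0)
           break; /* nothing left to evict */
         deadline[victim] = NO_DEADLINE;
         n++;
       }
     return n;
   }

   static uint64_t
   toy_remaining (uint32_t s, uint64_t now)
   {
     if (deadline[s] == NO_DEADLINE || deadline[s] <= now)
       return 0;
     return deadline[s] - now;
   }

   static const expiry_ops_t toy_ops = { toy_enable, toy_expire, toy_evict,
                                         toy_remaining };

   static uint32_t
   free_sessions (void)
   {
     uint32_t n = 0;
     for (int s = 0; s < N_SESSIONS; s++)
       n += deadline[s] == NO_DEADLINE;
     return n;
   }

   /* Once per pre-input cycle: expire first, then evict only as many live
    * sessions as needed to bring the free count back up to the margin. */
   static void
   pre_input_tick (const expiry_ops_t *ops, uint64_t now, uint32_t margin)
   {
     ops->expire (now);
     uint32_t n_free = free_sessions ();
     if (n_free < margin)
       ops->evict (margin - n_free);
   }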

Services typically rearm session timers with `sfdp_session_timer_update` or `sfdp_session_timer_update_maybe_past`, depending on whether the new deadline is known to be later than the current one. The **eviction margin** controls when eviction is triggered. Eviction only runs in pre-input, so the margin must absorb the sessions created between two pre-input cycles. Tune it according to how many new sessions can be created in a single VPP loop and to the latency of the expiry module.

Operators configure per-tenant timeouts via the CLI:

::

   set sfdp timeout tenant

The equivalent API message is `sfdp_set_timeout`. The remaining TTL is visible in `show sfdp session-table` and `show sfdp session-detail`. It is also returned in the `sfdp_session_dump` / `sfdp_session_details` API messages.

CLI and API surface (configuration and inspection)
__________________________________________________

SFDP exposes a small but important configuration and inspection surface. Tenants are created and deleted with:

::

   sfdp tenant add context
   sfdp tenant del

The equivalent API message is `sfdp_tenant_add_del`, where `context_id` defaults to the tenant id when omitted. Service chains are configured with `set sfdp services ...` or the `sfdp_set_services` API. Timeouts are configured with `set sfdp timeout ...` or `sfdp_set_timeout`.

For inspection, the CLI provides:

::

   show sfdp services
   show sfdp session-table [tenant ]
   show sfdp session-detail 0x
   show sfdp tenant [ [detail]]
   show sfdp status

The API equivalents are `sfdp_session_dump` / `sfdp_session_details` and `sfdp_tenant_dump` / `sfdp_tenant_details`. They cover the same core data as the CLI, including session keys, service bitmaps, and remaining expiry time.

SFDP plugin configuration stanza
________________________________

The SFDP plugin exposes early configuration options through the `sfdp { ... }` stanza of the VPP startup configuration. These settings control pool sizing and eviction behavior, and they are applied before the plugin is initialized. Supported options are:

* `sessions-log2`: log2 of the total session pool size.
  Default: `SFDP_DEFAULT_LOG2_SESSIONS` (19).
* `sessions-per-thread-cache-log2`: log2 of the per-thread session cache size. Default: `SFDP_DEFAULT_LOG2_SESSIONS - SFDP_DEFAULT_LOG2_SESSIONS_CACHE_RATIO` (12). If this option is not specified in the stanza, SFDP derives the cache size from `sessions-log2`, and disables caching for very small pools to avoid exhaustion.
* `tenants-log2`: log2 of the tenant pool size. Default: `SFDP_DEFAULT_LOG2_TENANTS` (15).
* `eviction-sessions-margin`: number of free sessions below which the expiry module is asked to evict sessions proactively. Default: `SFDP_DEFAULT_EVICTION_SESSIONS_MARGIN` (65536), capped at half the session pool.
* `no-main`: disable SFDP on the main thread when worker threads are present. Default: disabled.

Example:

::

   sfdp {
     sessions-log2 19
     sessions-per-thread-cache-log2 7
     tenants-log2 15
     eviction-sessions-margin 65536
     no-main
   }

SFDP counters
________________________

After initialization, SFDP exposes per-tenant and per-flow counters in the VPP stats segment. In a multi-worker configuration, it is highly recommended to increase the stats segment size above the default (32MB) to accommodate the per-thread SFDP counters. The counters are published under the following paths:

::

   /sfdp/per_flow_counters/lookup
   /sfdp/per_tenant_counters/created
   /sfdp/per_tenant_counters/incoming
   /sfdp/per_tenant_counters/outgoing
   /sfdp/per_tenant_counters/removed
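
A rough size estimate helps judge whether the default stats segment is large enough. The sketch below rests on assumptions that are *not* taken from the SFDP source:

* per-flow counters are allocated for both flows of every session slot, on every thread
* each counter is a (packets, bytes) pair of 64-bit values, as in VPP combined counters

Per-tenant counters and other statistics are ignored. `per_flow_counter_bytes` and `COUNTER_BYTES` are hypothetical names. Check the assumptions against the actual counter layout before relying on the numbers.

.. code-block:: c

   /* Hypothetical back-of-the-envelope estimate, not SFDP code. */
   #include <assert.h>
   #include <stdint.h>

   /* Assumed size of one counter: 64-bit packets + 64-bit bytes. */
   #define COUNTER_BYTES 16u

   /* Rough per-flow counter footprint in bytes, assuming 2 flows per
    * session slot and one counter per flow per thread. */
   static uint64_t
   per_flow_counter_bytes (uint32_t sessions_log2, uint32_t n_threads)
   {
     uint64_t flows = (uint64_t) 2 << sessions_log2;
     return flows * COUNTER_BYTES * n_threads;
   }

Under these assumptions, the default `sessions-log2 19` needs 16 MiB per thread for per-flow counters alone. Four threads would already need 64 MiB, above the 32MB default. In VPP, the segment size is set with the `size` option of the `statseg` startup stanza.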