Cloud NAT
Overview
This plugin covers specific NAT use-cases that come mostly from the container networking world. On the contrary of the NAT concepts used for e.g. a home gateway, there is no notion of ‘outside’ and ‘inside’. We handle Virtual (or Real) IPs and translations of the packets destined to them
Terminology & Usage
Setting up the NAT will consist in the creation of a translation
that has several backends. A translation
is 3-tuple containing :
a fully qualified IP address a port and a protocol. All packets
destined to it (ip, port) will then choose one of the backends,
and follow its rewrite rules.
A backend
consists of four rewrites components (source & destination
address, source & destination port) that shall be applied to packets
on the way in, and reverted on the way back.
Backends are equally load-balanced with a flow hash. The choice
of a backend
for a flow will trigger the creation of a NAT session
,
that will store the packet rewrite to do and the one to undo
until the flow is reset or a timeout is reached
A session
is a fully resolved 9-tuple of src_ip, src_port, dest_ip, dest_port, proto
to match incoming packets, and their new attributes new_src_ip, new_src_port, new_dest_ip, new_dest_port
. It allows for backend
stickiness and a fast-path for established connections.
These sessions
expire after 30s for regular sessions
and 1h for established
TCP connections. These can be changed in vpp’s configuration file
cnat {
session-max-age 60
tcp-max-age 3600
}
Traffic is matched by inserting FIB entries, that are represented
by a client
. These maintain a refcount of the number of sessions
and/or translations
depending on them and be cleaned up when
all have gone.
Translating Addresses
In this example, all packets destined to 30.0.0.2:80
will be
rewritten so that their destination IP is 20.0.0.1
and destination
port 8080
. Here 30.0.0.2
has to be a virtual IP, it cannot be
assigned to an interface
cnat translation add proto TCP vip 30.0.0.2 80 to ->20.0.0.1 8080
If 30.0.0.2
is the address of an interface, we can use the following
to do the same translation, and additionally change the source.
address with 1.2.3.4
cnat translation add proto TCP real 30.0.0.2 80 to 1.2.3.4->20.0.0.1 8080
To show existing translations and sessions you can use
cnat show session verbose
cant show translation
SourceNATing outgoing traffic
A independent part of the plugin allows changing the source address of outgoing traffic on a per-interface basis.
In the following example, all traffic coming from tap0
and NOT
going to 20.0.0.0/24
will be source NAT-ed with 30.0.0.1
.
On the way back the translation will be undone.
NB: 30.0.0.1
should be and address known to the FIB (e.g. the
address assigned to an interface)
cnat snat with 30.0.0.1
cnat snat exclude 20.0.0.0/24
set interface feature tap0 cnat-snat-ip4 arc ip4-unicast
Other parameters
In vpp’s startup file, you can also configure the bihash sizes for
the translation bihash
(proto, port) -> translation
the session bihash
src_ip, src_port, dest_ip, dest_port, proto -> new_src_ip, new_src_port, new_dest_ip, new_dest_port
the snat bihash for searching
snat exclude
prefixes
cnat {
translation-db-memory 64K
translation-db-buckets 1024
session-db-memory 1M
session-db-buckets 1024
snat-db-memory 64M
snat-db-buckets 1024
}
Extending the NAT
This plugin is built to be extensible. For now two NAT types are defined, cnat_node_vip.c
and cnat_node_snat.c
. They both inherit from cnat_node.h
which provides :
Session lookup :
rv
will be set to0
if a session was foundTranslation primitives
cnat_translation_ip4
based on sessionsA session creation primitive
cnat_session_create
Creating a session will also create a reverse session (for matching return traffic), and call a NAT node back that will perform the translation.
Known limitations
This plugin is still under development, it lacks the following features :
* Load balancing doesn’t support parametric probabilities
* VRFs aren’t supported. All rules apply to fib table 0 only
* Programmatic session handling (deletion, lifetime updates) aren’t supported
* ICMP is not yet supported
* Traffic matching is only done based on (proto, dst_addr, dst_port)
source matching isn’t supported
* Statistics & session tracking are still rudimentary.