Detecting NAT in IPsec

The Xelerance team spent about 5 solid hours tonight on the NAT-Traversal code, debugging connections with Win32, OSX and Linux peers.

There were about 6 bugs in the tracker for NAT detection failing in various cases. It turns out they were all related: the NAT ‘detection’ code, which compares hashes of IP addresses and ports, was at fault.

Various people had suggested patches, including Jacco’s here, and a few more in Mantis. Most fixed it for one case: theirs. Nothing was generic enough to work. Some blamed Apple, as they send the NAT hashes in a different order than Openswan. Others blamed us, even though interop with most other vendors was fine.

So I went back to RFC3947, which says:


3.2. Detecting the Presence of NAT

The NAT-D payload not only detects the presence of NAT between the
two IKE peers, but also detects where the NAT is. The location of
the NAT device is important, as the keepalives have to initiate from
the peer “behind” the NAT.

To detect NAT between the two hosts, we have to detect whether the IP
address or the port changes along the path. This is done by sending
the hashes of the IP addresses and ports of both IKE peers from each
end to the other. If both ends calculate those hashes and get same
result, they know there is no NAT between. If the hashes do not
match, somebody has translated the address or port. This means that
we have to do NAT-Traversal to get IPsec packets through.

If the sender of the packet does not know his own IP address (in case
of multiple interfaces, and the implementation does not know which IP
address is used to route the packet out), the sender can include
multiple local hashes to the packet (as separate NAT-D payloads). In
this case, NAT is detected if and only if none of the hashes match.

The hashes are sent as a series of NAT-D (NAT discovery) payloads.
Each payload contains one hash, so in case of multiple hashes,
multiple NAT-D payloads are sent. In the normal case there are only
two NAT-D payloads.
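
For readers who want to see what one of these hashes looks like, here is a minimal sketch in Python. The same section of RFC 3947 defines each hash as HASH(CKY-I | CKY-R | IP | Port), using the hash algorithm negotiated for the exchange. The function name `natd_hash`, the SHA-1 default and the sample cookies and addresses are assumptions made for illustration; this is not Openswan code.

```python
import hashlib
import ipaddress
import struct


def natd_hash(cky_i: bytes, cky_r: bytes, ip: str, port: int,
              algo: str = "sha1") -> bytes:
    """Return HASH(CKY-I | CKY-R | IP | Port) for one address/port pair.

    `algo` stands in for the negotiated IKE hash; SHA-1 is only an
    assumed default for this sketch.
    """
    addr = ipaddress.ip_address(ip).packed   # 4 bytes for IPv4, 16 for IPv6
    port_bytes = struct.pack("!H", port)     # 2 bytes, network byte order
    return hashlib.new(algo, cky_i + cky_r + addr + port_bytes).digest()


# Hypothetical cookies and addresses, for illustration only.
cky_i = bytes(range(8))
cky_r = bytes(range(8, 16))

# What the sender believes its own address and port are...
claimed = natd_hash(cky_i, cky_r, "192.168.1.10", 500)
# ...versus what the receiver sees as the packet's source after a NAT.
observed = natd_hash(cky_i, cky_r, "203.0.113.7", 4500)

nat_in_path = claimed != observed
```

If the address or port is rewritten anywhere along the path, the two digests differ, which is the whole detection mechanism.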

So basically everyone was wrong. The passage says nothing about the order of the hashes, and in fact it says you can have more than two (say, on a multihomed machine, where the sender may include one hash per candidate local address). Feeding this information to mcr resulted in him spitting back a refactored function in 15 minutes (the man is a C machine). We now generate our local hashes and run through all of the hashes supplied by the peer, comparing the various combinations to see where the NAT actually is, and whether there’s more than one.
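
The idea of comparing against every hash the peer supplied, rather than relying on position, can be sketched roughly as follows. This is not mcr’s C function, just an illustration of the approach in Python. The name `detect_nat` and its arguments are hypothetical, and the hashes are assumed to have been computed already.

```python
from typing import Iterable, Tuple


def detect_nat(our_addr_hash: bytes,
               peer_addr_hash: bytes,
               received_natd: Iterable[bytes]) -> Tuple[bool, bool]:
    """Return (we_are_behind_nat, peer_is_behind_nat).

    our_addr_hash  -- hash of our own IP and port, as we know them
    peer_addr_hash -- hash of the peer's IP and port, as seen on the wire
    received_natd  -- every NAT-D hash the peer sent, in any order
    """
    received = set(received_natd)
    # The peer hashed the address it sent to. If that is not ours,
    # something translated our address on the way.
    we_are_behind_nat = our_addr_hash not in received
    # The peer hashed its own candidate addresses. If none matches what
    # we see on the wire, something translated the peer's address.
    peer_is_behind_nat = peer_addr_hash not in received
    return we_are_behind_nat, peer_is_behind_nat


# Placeholder 20-byte values standing in for real NAT-D hashes.
ours, theirs, other = b"\x01" * 20, b"\x02" * 20, b"\x03" * 20

no_nat = detect_nat(ours, theirs, [theirs, ours])        # order is irrelevant
peer_natted = detect_nat(ours, theirs, [ours, other])    # peer's view of itself differs
both_natted = detect_nat(ours, theirs, [other])          # neither hash matches
```

Because the check is a membership test over all received hashes, a peer that sends its payloads in a different order, or sends extra ones for additional interfaces, gets the same answer.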

Paul and I tested the new code on Linux, OSX and Win32, on networks both with and without NAT, and I had to fix one minor bug. As a result, 2.4.5dr2 is now out, with a slew of NAT bug fixes.

Enjoy!
