I hope all of these Docker overlay networks start using the in-kernel overlay network technologies soon. User-space promiscuous capture is obscenely slow.
Take a look at GRE and/or VXLAN and the kernel's multiple routing table support. (This is precisely why network namespaces are so badass, btw.) Feel free to ping me if you are working on one of these and want some pointers on how to go about integrating more deeply with the kernel.
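To make that concrete, here's a rough sketch of the kind of setup I mean, using iproute2. All the names, addresses, and the VNI are made up for illustration; a real overlay driver would program this via netlink rather than shelling out:

```shell
# Create an in-kernel VXLAN tunnel endpoint (VNI 42) riding on eth0,
# using the IANA-assigned VXLAN port. Encap/decap happens in the kernel,
# no user-space promiscuous capture involved.
ip link add vxlan0 type vxlan id 42 dev eth0 dstport 4789
ip addr add 10.0.0.1/24 dev vxlan0
ip link set vxlan0 up

# Use a separate routing table (here table 100) for overlay traffic,
# selected by a policy routing rule -- this is the "multiple routing
# table" piece.
ip route add 10.0.0.0/24 dev vxlan0 table 100
ip rule add from 10.0.0.1 lookup 100
```

Swap `type vxlan id 42 ... dstport 4789` for `type gretap remote <peer> local <self>` and the same pattern works with GRE.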
It's worth mentioning these protocols also have reasonable hardware offload support, unlike custom protocols implemented on UDP/TCP.
OpenContrail can be used as an overlay network for docker: the overlay is implemented as a kernel module and comes very close to the theoretical maximum iperf performance on a server with 2x10G links.
This script https://github.com/pedro-r-marques/opencontrail-netns/blob/m... can be used to associate any docker container created with "--net=none" with an overlay network. Better yet, you get all the semantics of the OpenStack neutron API: floating-ip, dhcp options, source-nat, LBaaS.
The kernel module also collects flow records of all the traffic and there is a web-ui that can display the analytics of all the traffic flows in your network.
Install guide: https://github.com/Juniper/contrail-controller/wiki/OpenCont...
Support on freenode.net #opencontrail.
If you're going down the path of VXLAN support in Docker, I'd love to talk. The company I founded built a Linux distribution for commodity hardware switches that can do VXLAN encap/decap in hardware at 2+ Tbit/sec. The same configuration that works in a Linux container host or a hypervisor works on the switches.
You have to pick one: implement it purely in the kernel or purely in user space, not a mix of both. In practice pure user space is faster — just look at Snabb Switch: https://github.com/SnabbCo/snabbswitch/wiki
Snabb and DPDK aren't magic though. Because they poll you have to dedicate a whole core to the vSwitch. Containers are a different case than VMs because the packets start in the kernel TCP/IP stack; to get into a userspace vSwitch they'd have to exit the kernel.
Since you seem to have some kernel expertise: do you know if there is an easy way (via an iptables/ebtables plugin or some such) to get packets to switch namespaces? It seems like you could do a whole lot with simple kernel packet rewriting if an in-container-namespace rule could jump a packet into another namespace before routing. You can get an analog of this with a veth device, but it seems like it would be much faster to just switch the namespace directly.
"just switching namespaces" isn't easy, since a packet (in the kernel represented by an SKB) has to have an interface it came in on. The main role an veth pair has is to move the packet between namespaces, and to provide a new in interface, one that is visible in the new namespace.
Unless someone did something crazy, traversing a veth pair should just be doing a little bookkeeping on the SKB, no data copies at all.
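For anyone following along, the standard veth-pair wiring looks roughly like this (namespace and interface names are illustrative, and this is the generic iproute2 pattern, not Docker's exact internal sequence):

```shell
# Create a namespace and a veth pair, then push one end into the
# namespace. That end becomes the "interface the packet came in on"
# from the namespace's point of view.
ip netns add demo
ip link add veth-host type veth peer name veth-cont
ip link set veth-cont netns demo

# Address and bring up both ends.
ip addr add 192.168.50.1/24 dev veth-host
ip link set veth-host up
ip netns exec demo ip addr add 192.168.50.2/24 dev veth-cont
ip netns exec demo ip link set veth-cont up
ip netns exec demo ip link set lo up

# Traffic between 192.168.50.1 and 192.168.50.2 now crosses the pair --
# in-kernel SKB bookkeeping, no data copies.
```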