I suppose the issue could be alleviated by having the tunnel know its payload is IP packets, keep a reasonably short buffer, and drop payload packets from the queue after a short timeout, e.g. 200ms. So essentially, the tunnel needs to behave like a switch would.
Edit: Obviously, it still reacts twice as badly to the base connection dropping packets. I recall once testing how a TCP connection behaves when you subject it to a random % of packet loss (regardless of the bandwidth it tries to use), and it resulted in completely stalled connections at surprisingly low drop rates.
Interesting idea, although I'd argue that if it should ACK like UDP, retransmit like UDP and control flow and congestion like UDP... You should use UDP ;)
However, firewalls are usually much more permissive to TCP than to UDP. I wonder if there is any project that encapsulates UDP-semantic datagrams into TCP-looking segments?
UDP is treated fairly well by firewalls... at least compared to SCTP for example. QUIC/HTTP3 are UDP-based and even though there's usually a TCP/HTTP2 fallback they fare reasonably well.
I have various boxes which we send out to venues, on the whole outgoing connections are fine, but sometimes you get some really restrictive policies. I've had stuff that MITMs TCP/443, completely blocks UDP, etc.
My devices tend to try to connect back via
* UDP port 443, sometimes works
* an SSTP VPN
* SSH to tcp/$highnumber, sometimes they block/MITM port 80, 443, but leave the standard
* DNS
I can't think of a time that one of them didn't get through.
It does make it through many firewalls these days, yes.
But all implementations I know use a much shorter timeout/keepalive period for UDP than they use for TCP because of firewalls/NATs. (I think the RFCs even recommend something like 300 seconds for TCP, but only 30 for UDP as a default?)
This has pretty significant implications for power consumption on mobile devices.
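Back-of-envelope: each keepalive is a radio wakeup, so the shorter UDP interval multiplies wakeups accordingly. The 30s/300s intervals below just echo the figures quoted above and are not taken from any RFC:

```python
# How many NAT-keepalive radio wakeups per day does each interval cost?
# Intervals are illustrative (taken from the rough figures above).
SECONDS_PER_DAY = 24 * 60 * 60

def keepalives_per_day(interval_s: int) -> int:
    return SECONDS_PER_DAY // interval_s

udp_wakeups = keepalives_per_day(30)    # short UDP/NAT mapping timeout
tcp_wakeups = keepalives_per_day(300)   # longer TCP mapping timeout
print(udp_wakeups, tcp_wakeups)         # 2880 vs 288 wakeups per day
```

A tenfold difference in wakeups is exactly the kind of thing mobile OS vendors push back on, which is one reason long-lived push connections tend to ride on TCP.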
What exactly do you mean with "completely stalled connections"?
Do you mean that the sending side is queueing up to-be-sent messages and can't clear the queue because it is busy correcting packet losses all the time, so the queue just grows and never shrinks?
Do you recall at which % of packet loss this behaviour started?
Stalled as in, sender has data to send, receiver is ready to receive, but the transfer makes no progress or proceeds very slowly.
I can't quote a number for the drop percentage, honestly I've forgotten. Discovering the limit was a side effect; we were simply looking to test how a piece of software would behave over a bad connection. I just remember being surprised that everything just stopped when I put in packet loss that wasn't anywhere near 100%.
My since-then-adjusted expectations would put any double digit percentage (yes, starting from 10%) of random packet loss as unusable conditions for TCP.
In my personal experience, BBR with SACK works pretty well even on fairly bad connections. It still slows way down but not as badly as others like Cubic.
That was with BitTorrent uploads of Linux ISOs to Taiwan. (Why do they download so many copies of Ubuntu 14.04 LTS?)
But since I didn't do controlled tests of multiple congestion control algorithms I could just be seeing things.
Edit2: In case anyone wants to try themselves, here's how: https://www.pico.net/kb/how-can-i-simulate-delayed-and-dropp...
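For those on Linux, the usual tool for this is tc with the netem qdisc. This is a config fragment that needs root, and `eth0` plus the numbers are placeholders to adjust for your own setup:

```shell
# Inject 10% random packet loss and 100 ms delay on an interface.
# (Requires root; "eth0" and the percentages are placeholders.)
tc qdisc add dev eth0 root netem loss 10% delay 100ms

# Inspect the active qdisc, then remove it when you're done testing:
tc qdisc show dev eth0
tc qdisc del dev eth0 root
```

Applying this on the sender's egress interface is enough to reproduce the stalled-connection behaviour described above without any special test harness.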