TLS in Linux 4.13 kernel (github.com/torvalds)
231 points by agumonkey on Sept 3, 2017 | hide | past | favorite | 57 comments


LWN's coverage of this patch: https://lwn.net/Articles/666509/


Thanks Jonathan.

Everyone: the above gives a lot of very helpful context & explanation for this (from when the patch was first posted in Dec 2015)


The merged patch does have some significant points of difference from the one described in 2015, though.


Wouldn't implementing TLS in the kernel result in a substantial increase in the exposed attack surface? This seems to me as if it would be similar to the time that Windows moved the graphics into the kernel and the havoc that ensued thereafter.


It appears the design of KTLS is intended to minimize the complexity necessary inside the kernel; the TLS handshake is handled in user space and then the TLS state is transcribed to a KTLS file descriptor, which handles the traffic thereafter. This means the very complex public-key part of TLS stays in user space. The kernel isn't involved with certificates etc. The kernel handles the symmetric cryptography once the user-space handshake is complete. So yes, the net attack surface inside the kernel increases, but it isn't as though they are embedding some OpenSSL equivalent into the kernel.
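From user space the handoff is just two setsockopt() calls after the library finishes the handshake. A minimal Python sketch, assuming the 4.13 uapi constants from linux/tls.h; the key material here is an all-zero placeholder, where a real caller would use the values negotiated by OpenSSL/GnuTLS:

```python
import socket
import struct

# Constants from the Linux 4.13 uapi (linux/tls.h); values assumed here.
TCP_ULP = 31                  # attach an upper-layer protocol to a TCP socket
SOL_TLS = 282
TLS_TX = 1
TLS_1_2_VERSION = 0x0303
TLS_CIPHER_AES_GCM_128 = 51

def enable_ktls_tx(sock, iv, key, salt, rec_seq):
    """Hand AES-128-GCM transmit state to the kernel for this socket."""
    sock.setsockopt(socket.SOL_TCP, TCP_ULP, b"tls")
    # struct tls12_crypto_info_aes_gcm_128:
    #   u16 version; u16 cipher_type; iv[8]; key[16]; salt[4]; rec_seq[8]
    crypto_info = struct.pack("=HH8s16s4s8s",
                              TLS_1_2_VERSION, TLS_CIPHER_AES_GCM_128,
                              iv, key, salt, rec_seq)
    sock.setsockopt(SOL_TLS, TLS_TX, crypto_info)

# Loopback connection just to demonstrate the call sequence.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname())
try:
    enable_ktls_tx(client, b"\0" * 8, b"\0" * 16, b"\0" * 4, b"\0" * 8)
    print("kTLS enabled; plain send()/write() now encrypts in the kernel")
except OSError as exc:
    # Typically ENOENT when the tls module isn't loaded.
    print("kernel TLS unavailable:", exc)
client.close()
listener.close()
```

After the TLS_TX setsockopt succeeds, ordinary write()s on the descriptor go out as TLS records; the certificates and handshake never enter the kernel.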

The paper reports a relatively small performance improvement. But I suppose if you're Facebook (they and RedHat implemented this) and performance improvements are measured in millions of dollars, you care a lot about 7% less CPU utilization.


>> The paper reports a relatively small performance improvement. But I suppose if you're Facebook (they and RedHat implemented this) and performance improvements are measured in millions of dollars, you care a lot about 7% less CPU utilization.

You think 7 percent is a small performance improvement? IMHO it's rather large. I'd take the time for much smaller gains than that.

To understand why, let me ask a question: do you think doubling performance is a good thing? In other words, a 100 percent increase in throughput, or a 50 percent reduction in execution time. I hope your answer is yes, because a lot of people think so. So what's 7 percent? Well, imagine there are 7 different areas of your code that could be improved, with each one contributing 7 percent. That's 49 percent, or about what we'd call a significant improvement. Fixing up a number of single-digit things adds up. But there is a more sinister side to it. Suppose you wanted to let those 7 little things go and look for the big fish - the one that's going to get you 50 percent all in one shot. Guess what? It doesn't exist. Why? Because these little things add up to about half of your time, so to find the big one would imply taking the execution time for the rest of the code down to zero, because that's the only place there's 50 percent left. Unless you're doing something really poorly, there isn't a big win to be had. Performance comes from compounding the effects of many small optimizations.
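A quick check of that arithmetic: added linearly, seven 7-percent wins are 49 percent; compounded (each win applies only to the time that remains), they still remove roughly 40 percent of the total:

```python
# Seven areas, each shaving 7% off the CPU time that remains.
remaining = 1.0
for _ in range(7):
    remaining *= 1 - 0.07
print(f"time remaining: {remaining:.3f}")       # ~0.602 of the original
print(f"total reduction: {1 - remaining:.1%}")  # ~39.8%
```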

If just moving TLS from the application to the kernel gives 7 percent - presumably by eliminating overhead - that's a good thing. Even if it's just 7 percent of time in communication which is only a portion of application performance, it's still a good thing.

--- note, this is a comment about performance and says nothing about tradeoffs in security by putting TLS in kernel ---


>> You think 7 percent is a small performance improvement?

I think a 7% reduction in CPU utilization for TLS is small given the vast bulk of systems that spend most of their time IO bound on network interfaces and/or storage and will see little to no improvement in throughput. I allowed that Facebook may see this as an important improvement given their scale; it could be worth millions of dollars in power and/or hardware.

>> that's a good thing ... it's still a good thing.

Sure, it's good. I didn't characterize anything as "bad," merely "small."


The performance gain seems to be in latency consistency [0], though this chart is silly: I'm guessing the X axis is time, and you can't chart a centile, that's ridiculous. Anyway, even the throughput improvement could be worth it, but I think the main improvement is in latency.

[0] https://twitter.com/tgraf__/status/904475786622640128


> you can't chart a centile,

Why? If you have servers with millions of qps, the graph would show the 99% latency over, say, a second or ten.

I think this graph is pretty standard? But I'm happy for suggestions of better ones.


Well, more accurately you could chart a centile, but it would hide the very latency patterns you're looking for. The outliers in a given time bucket are the important data. If you're looking at one bucket, it could be useful to do a centile (though 99th is not really high enough to be meaningful), but if you're charting centile buckets, then you're missing the real latency spikes.

For all we know, KCM/KTLS generates the highest peak latency, but fewer times per bucket. That difference would completely change the interpretation of these data. If userspace data frame handling produces higher 99th percentile latency, that does not tell us anything about its maximum latency. Further, if we're looking at the 99th percentile, 1/100 data frames have latency higher than that percentile, and this will happen hundreds, thousands, or millions of times per second depending on your interface bandwidth and typical frame size!
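That distinction is easy to demonstrate with synthetic data: two workloads can show nearly identical per-bucket p99 while their worst cases differ by an order of magnitude. A sketch with made-up latency distributions:

```python
import random

random.seed(1)

def p99(samples):
    """Nearest-rank 99th percentile of a list of latencies."""
    ordered = sorted(samples)
    return ordered[int(0.99 * (len(ordered) - 1))]

# Two synthetic workloads, 10 buckets of 10,000 frame latencies (ms).
# Both put 0.5% of frames on a slow path; since that is less than 1% of
# the bucket, the slow frames sit entirely above the 99th percentile,
# so "spiky" hides a far worse tail than "steady".
for name, slow_path in (("steady", (5.0, 20.0)), ("spiky", (400.0, 600.0))):
    worst_p99 = worst_max = 0.0
    for _ in range(10):
        bucket = [random.uniform(1, 5) for _ in range(10_000)]
        for i in range(50):
            bucket[i] = random.uniform(*slow_path)
        worst_p99 = max(worst_p99, p99(bucket))
        worst_max = max(worst_max, max(bucket))
    print(f"{name}: worst per-bucket p99 = {worst_p99:.1f} ms, "
          f"worst max = {worst_max:.1f} ms")
```

Charting only the p99 line would rank the two workloads as roughly equivalent; only the max (or a much higher centile) exposes the difference.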


There are constant variations in the environment that are more significant than the tiny effect you are trying to measure.


There are many statistical tests that can be used in these situations and have been honed over the years for use in scientific research.

For example, chi-squared, and, if you can quantify the error distribution and it's normal, Student's t-test.


You can chart a centile. Each point can be the centile of X measures.


There's actually a notable security advantage to this. A userspace process can do key negotiation, hand the symmetric keys to the kernel, and then have an ordinary file descriptor, with the kernel handling encryption. Which means you could then pass that file descriptor to a separate process, let it read and write data, and not give it access to key material at all.


Can't you already do all of that with a userspace implementation?


Somewhat, if you pass a pipe around, but that also requires you to keep a separate process around, and shuttle data through two address spaces. With this, you can have a process do the negotiation, arrange a KTLS file descriptor, and then exec the new process.
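The handoff mechanism itself is ordinary SCM_RIGHTS descriptor passing. A sketch using Python 3.9's socket.send_fds/recv_fds, with a plain socketpair standing in for the kTLS-enabled connection (the broker would do the handshake and the SOL_TLS setsockopt before sending the fd, then drop the keys):

```python
import os
import socket

ctrl_parent, ctrl_child = socket.socketpair()
conn_a, conn_b = socket.socketpair()  # stand-in for the kTLS connection

pid = os.fork()
if pid == 0:  # worker: receives only the descriptor, never key material
    ctrl_parent.close()
    conn_a.close()
    conn_b.close()
    _, fds, _, _ = socket.recv_fds(ctrl_child, 1024, 1)
    conn = socket.socket(fileno=fds[0])
    conn.sendall(b"hello from worker")
    os._exit(0)

# broker: hands the connection off and keeps the secrets to itself
ctrl_child.close()
socket.send_fds(ctrl_parent, [b"fd"], [conn_a.fileno()])
conn_a.close()
print(conn_b.recv(64).decode())
os.waitpid(pid, 0)
```

With kTLS the descriptor the worker receives is already encrypting, so the worker needs no TLS library at all; without kTLS the broker would have to stay alive proxying plaintext.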


Couldn't you open a TCP connection, do the handshake, then pass just the session key to the new process, along with the handler to the TCP connection?


You could, but you lose out on a nice interface. Kernel implementation allows you to not care that you're actually using SSL. It's a normal socket and polling / waiting / data transmission works exactly the same as without SSL. Handing over a session key means you still have to include an SSL lib.

This is likely to end up being supported in systemd soon (I'm guessing) to get an SSL socket activation.


Not really - if a control frame is received the KTLS file descriptor will report an error and disassociate itself from the TCP file descriptor, at which point you're supposed to transition back to using the userspace TLS library. You'll also need the userspace TLS library to do a clean shutdown.


That's pretty much what stunnel does.


In what scenario is it an advantage to be able to encrypt/decrypt data with a certain key, but not know the key directly?


Any scenario where the key is not ephemeral and you're handling client's input (can expect exploitation). If you have a good enough separation/sandboxing, any exploit wouldn't be able to steal the encryption key, or other private data, even if the exploit worked.

For handling encrypted secrets, this is popular as a HSM idea. You authenticate to a black box which does the crypto for you, but the key can't be extracted. Sometimes HSMs even have a physical tampering / self destruction protection.


So a scenario where the key is long-lived, but the process communicating is transient? I suppose that makes sense, although I struggle to imagine the use case.


SSL, certificate signing, configuration decryption, anything Amazon SSE is used for, ...


This only does the handling of TLS data frames in the kernel. All the handling of TLS control frames, including the handshake, is still done in userspace by an ordinary TLS library.

This means the kernel is only doing some symmetric encryption and simple framing (which are both the same kinds of things it was already doing in other application domains).


This was my first thought as well. Looking through the repo [1] behind the paper, the people behind it seem to have some crypto chops. A lot more can go wrong in kernel space, but the performance improvements might prompt the big players (Google, FB, etc.) to invest some time into locking it down.

Also, it seems that only particular parts of the stack are handled by the kernel. Per the README:

The socket does data transmission; the handshake, re-handshaking and other control messages have to be served by user space using appropriate libs such as OpenSSL or GnuTLS.

[1]: https://github.com/ktls/af_ktls


Short answer, if all it does is symmetric cipher encryption/decryption it's probably not a big deal. The kernel already does crypto. Just keep the x509 parsing and handshake out of kernel.


> x509

Mmmm, I'm afraid I've got some unfortunate news for you :)


Too bad you got downvotes. It appears that kernel and/or kernel module signatures use x509 certs. https://access.redhat.com/documentation/en-US/Red_Hat_Enterp...

Thanks for the heads up.


Windows Server also moved HTTP to the kernel!


and since it was introduced (back in 2003), there were only 2 exploits discovered! (one RCE, one DoS)


So did Linux, once, with the Tux in kernel webserver (now removed).


the real advantage is that you can pass around your encrypted connection as a file descriptor. so when you add encryption to some service, you do not have to manually replace all calls to read/write with libssl library functions; instead you can just continue to use read()/write() calls, pass the tls connection as stdin/stdout to another process, etc.

this is how plan9 does tls and how it really should have been in the first place. adding tls support is one function call to wrap an existing file descriptor in tls and you get another file descriptor back.


But you can accomplish that by simply having an external process or thread terminate the TLS connection, and connect to your process over Unix domain sockets.

Of course that is inefficient. But that is what KTLS is at its core: an optimization.


of course you can make a pipe and spawn a process that does the translation. why's nobody doing that? because linux is a cargo cult and nobody does the sane thing until it is provided by the infrastructure, and because fork doesn't mix well with multithreaded applications... now finally linux got its shit together, so there is no more excuse to do it in a convoluted way.


No mention of memory cleaning policy and other security guarantees. I hope this is handled properly.


It's too bad KTLS was completed for FreeBSD but still not integrated.


We're working on it. What we had was great for us, but pretty complex and not really upstreamable.

I'm almost done with re-writing it to use the same M_NOTREADY mechanism as async sendfile, and to do all the framing in the kernel, rather than doing the sendfile() framing in-kernel and sosend() in userspace. This removes the majority of the code, and makes it quite a bit simpler. The downside is that it depends on my vectorized mbufs. Hopefully we'll have something public in the next few months.


Just offering a few words of thanks and encouragement for your work regarding this. When this is ready it's ready. But thank you in advance.


I wonder why they opted not to use the existing packet transformation code that is already in the kernel? The same code is used as part of linux ipsec implementations.


The layering inside the kernel is completely different...

    Kernel Connection Multiplexor
    
    Facebook’s primary motivation was to gain access to the un-encrypted bytes
    in kernel space. KCM is used to decode the framing, and make intelligent
    scheduling choices, before sending the frames to user space. KTLS sockets
    are mapped 1:N to user space sockets, where N is the number of user space
    threads, which are usually mapped to cores. Using this scheme, KTLS + KCM
    is able to reduce the total number of thread migrations of an individual
    request


Maybe the link should be updated to use the 4.13 tag [0] instead of the master branch.

[0] https://github.com/torvalds/linux/blob/v4.13/Documentation/n...


What's funny is that I was just reviewing my custom-built kernel's menuconfig, saw TLS, and briefly wondered why I didn't recall seeing that when I first ran the config, then figured I had just missed it. Never thought maybe it had recently been added.


In these days of containers, wouldn't this allow a government to coerce a container-hosting company into using an insecure kernel? Such a kernel would see all traffic, unencrypted... or did I miss something?


Please don't believe for even a second that your container host can't already extract the cleartext just because the SSL is executed in the container context.

A kernel facility like that certainly makes an attack even more obvious and maybe harder to detect. But if this is a concern you can't use external hosting.

This applies to VMs too, the hypervisor can see and transparently modify everything going on. There is development going on with special CPU instructions that could prevent the hypervisor from reading VM memory, but that is not state of the art and I would not trust it for a long time.

You can't even be really sure that a bare metal server you are renting is not extracting the cleartext via a modified bios/efi, bootloader etc, or modified hardware at the worst.


"that certainly makes it even more obvious": that's what I meant... easy mass surveillance


If a government wants to do that, and the hosting company is not able or willing to fight it on a legal basis, it's really not about how easy or hard the actual implementation is. This makes nearly zero difference in the grand scheme of such a surveillance program.


If you run the kernel, mass surveillance on the processes inside the kernel (whether in a container or not) is extremely easy, whether or not TLS is implemented in the kernel.

Honestly sometimes I wish we had a library of these sorts of attacks so that it's easier to believe that they're easy, but there is a vague security advantage in not having easy-to-use, well-tested implementations of these attacks available for free on the internets.


> makes an attack even more obvious and maybe harder to detect

eh? more obvious and harder to detect at the same time?


Sorry I mean the possibility for an attack, how and where you would implement that etc. Can't edit my post anymore.


Oh man - this is great. I hate having these proxy processes just for this task.

I will begin experimenting with this kernel this week.


This is why we need microkernels.


Can you explain this position?

In particular, my impression is that a large reason why microkernels failed in the '90s - see e.g. OSF/1 - was the overhead of message-passing. The two most popular kernels today that vaguely resemble microkernels, namely Darwin (Mach-based) and NT (with its subsystems), do not isolate parts of their kernel from each other and just use ordinary function calls. TLS exists in userspace and works extraordinarily well, but the desire to move parts into kernelspace is specifically to avoid the overhead of copying data between kernelspace and userspace and to allow using things like sendfile() that wouldn't be possible across address spaces.


I think he meant to say that we need unikernels - a lot of people (including me) sometimes make that same mistake.


This looks similar to AT-TLS (application transparent TLS) on IBM mainframes.


Please link to pdf?

