Interactive map of Linux kernel (makelinux.net)
397 points by mayankkaizen on March 16, 2018 | 44 comments



That's a fun map.

The Linux kernel has more device drivers embedded in the core than I anticipated. I'm guessing most are used for bootstrapping/fallback (e.g. loading the "actual" drivers)? Such as ext4, ipw2100, ac97, i8042, etc.

Just to be clear, I'd imagine most of the above are common enough to be on all but the most niche Linux distributions. My question is about having them this high up in the source tree, rather than why they're useful or what they do.


The map is at least 8 years old.

https://www.linux.com/blog/interactive-map-linux-kernel (2009)


It's from 2007 (see the image). The first version on archive.org is from 2008. I would add a (2007) to the title.


The map is simply incomplete, so it only lists some sample of the network card drivers there, otherwise it would be a mess.


What do you mean "embedded in the core"?

All drivers are in-tree.


Technically unmodularizable

Architecturally integrated into everyone's "this is part of the kernel and not a random printer driver" mental model


Not all drivers are there, of course. Many are out of tree.


I wonder how many of these device drivers are for old devices that are no longer available and can therefore be considered dead code. Is there an annual culling of device drivers?


Just because you can't buy them doesn't mean they aren't still used.


Feels like this needs to be a 3D map. 2 dimensions isn't quite cutting it.



Linux is such a mess that it needs a map.

It's no surprise they're paying down massive technical debt over it. Work that would take a few hours in a well-structured system can take months on Linux.

It isn't surprising either that Dragonfly BSD is outperforming Linux in network throughput, despite its small number of developers.

They went for a components as concurrent lockless system servers approach [1] instead of a complex mess of fine-grained locks like Linux (and FreeBSD, following them) did.

[1] https://www.dragonflybsd.org/presentations/


I can only imagine what sort of chaos exists in Windows and OSX. Unfortunately I can only do that: imagine, because it is closed source. With Linux, I can at least see the mess, and well, make a map out of it.


I have looked upon code that struck me down with Lovecraftian dread. Here, let me paint you a picture of a ubiquitous Microsoft product, written in C++. Entry point calls Initialize. Initialize calls Initialize2, which calls Initialize_internal, which recurses to Initialize with a flag that makes it branch another way, which calls Initialize_subsystem, which calls Initialize_subsystem2, etc, which finally recurses again to Initialize_internal with another flag ...

Honestly I never even made it out of the init code. I asked to be taken off the project immediately.


I find it annoying that any time someone criticises Linux for something someone has to come and point out that Windows and OSX are also kinda crappy. It's true enough, it's also just useless defensiveness. Maybe the bar we set for ourselves should be a little higher than the stuff everyone complains about?


Maybe it's just nigh impossible to build such a complex engineering project and keep it elegant through 25 years of development?


I think you are reading my stance as defending Linux -- I am not. I am just saying you have to put this criticism in perspective and consider the alternatives. I find the clamours for the <insert "beautiful" other OS here> equally useless, as they haven't had the success Linux has had in the server world, and because of that, I don't care how much more "elegant" another solution is -- it doesn't have the body of work to back it up. Linux is, for now, the best option. If it's a "tangled mess of code", let's figure out how to make it better. After all, it is open source.


You can look at the OSX kernel code, just grab Apple's XNU repository. I've built bootable kernels from it. The code's not terrible, but the weirdest part to me is that there's the OSFMK microkernel portion, and then there's the BSD subsystem. They use different types, and they have their own syscall tables, and it was never clear to me where a particular bit of code would reside. I only dabbled in it, of course.


IIRC the Windows team has been in the process of cleaning up the NT kernel for a while (since Vista at least), with the clear goal of getting NT back to "Dave Cutler's NT". The visible bits of this are the Server Micro project that's used for containers, and Server Core.


Every mature project I have ever joined, ever, is such a mess that it needs a map. Typically when I start a new job, I have to make that map myself, because even the senior engineers can't explain it without stammering or omitting the half of the code they don't understand anymore. Would that I'd had something just like this map every time I started a new job!


How do you go about making such a map for a large code repository? I'd really appreciate a link for reading on this subject.


I basically start anywhere and study the call graph, taking notes as I go.
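To sketch the idea (a toy in Python, since the kernel itself is C and you'd use a real indexer like cscope there): walk the AST, and for every function record which plain names it calls. The `mount`/`read_super`/`bread` sample source is made up for illustration.

```python
import ast
from collections import defaultdict

def call_graph(source):
    """Build a naive caller -> callees map from Python source.

    Only direct calls to plain names inside function definitions are
    recorded; methods and indirect calls are ignored.
    """
    tree = ast.parse(source)
    graph = defaultdict(set)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    graph[node.name].add(sub.func.id)
    return graph

# Fabricated sample echoing kernel-ish names.
sample = """
def mount(dev):
    sb = read_super(dev)
    return sb

def read_super(dev):
    return bread(dev, 0)

def bread(dev, block):
    return b""
"""

g = call_graph(sample)
for fn in sorted(g):
    print(fn, "->", sorted(g[fn]))
```

Even a crude map like this, dumped to notes as you read, beats holding the whole graph in your head.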


https://www.phoronix.com/scan.php?page=article&item=netperf-... is a year and a half old and tells a different story, where DragonflyBSD is either even with or outperformed by _both_ Linux and FreeBSD. If you have other numbers please post them, but don't just post such statements without sources.

Edit: Just saw your answer https://leaf.dragonflybsd.org/~sephe/perf_cmp.pdf, and there Linux has higher latency, but the throughput is not that different from DragonflyBSD (in fact, at forwarding Linux wins). But thx for the numbers, will follow up on that later.


>and there linux has a higher latency, but the performance throughput is not that different from dragonflybsd.

Same throughput, better latency, a dragonfly win.

Probably the same reasoning as usual (Linux likes to spend tens of milliseconds non-preemptable every now and again).


>Isn't surprising either that Dragonfly BSD's outperforming Linux in network throughput, despite its small number of developers.

This is a pretty meaningless statement. Network throughput on what? Low latency connections? High latency? Jumbo frames? What if you're using DPDK? How about vs. XDP? What is it tuned for?

If you mean that if you just install Dragonfly BSD and some random popular Linux distribution that you might see better performance out of the box on some arbitrary benchmarks, yeah, I can believe it. If you mean that you're going to get better performance in Dragonfly BSD than Linux when it comes to all scenarios with network throughput, this is silly, especially when you can take advantage of things like DPDK, XDP, af_packet v4, etc. If network performance is your biggest concern in 2018, you're caring more about that sort of tech and being on something that supports it than you are about how things perform out of the box.

(Also, Linux has had a lockless TCP stack since 2016)


>"Isn't surprising either that Dragonfly BSD's outperforming Linux in network throughput, despite its small number of developers"

Do you have a citation for this claim?


I know there's something more recent I've seen, but I couldn't find it. This is what I found, which should show enough.

http://lists.dragonflybsd.org/pipermail/commits/2017-Septemb...

https://www.dragonflydigest.com/2017/03/06/19425.html


Similar to how no plan survives contact with the enemy, no well-structured system survives contact with the real world. I'm not defending Linux, since I'm sure it could be better, but every large, mature system I've ever worked in has had structural deficiencies.


"outperforming Linux in network throughput" I'd like to see some numbers, I'm pretty sure Linux is as fast or faster than Dragonfly BSD in any scenarios. When you make such a claim you need facts.

Also, the Linux TCP stack has been lockless since 4.4.


Are there any software projects of that size that are clean and well-structured?

The larger a codebase grows, the less tractable it becomes. Refactorings are harder, more expensive, and riskier. Also, more developers means more surface area for inconsistencies and crappy code from someone unmotivated.


I love Linux!

This map is very useful.


> Please Enable JavaScript or use plain html

Link then goes 404.


As a noob observing..Many of them seem to have names nothing like what their actual purpose is


A dead comment by user 'simoooo':

> As a noob observing..Many of them seem to have names nothing like what their actual purpose is

Having worked a bit with low level kernel stuff (porting a unix OS to a new platform), I understand his observation. Even after working with the kernel for a while, many names don't really make sense.

Mandatory quote of Phil Karlton:

> There are only two hard things in Computer Science: cache invalidation and naming things.


Which leads to one of my favorite programming jokes:

> There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors.


Out of curiosity, do you know of any examples of seemingly misnamed objects in the kernel?


My recent ventures into the filesystem code have been somewhat fun. The problem isn't so much misnamed objects as that the kernel is filled with seemingly undocumented, arcane, old things. `sb_bread` isn't some kind of food but the "superblock buffer read" function, and `brelse` means "buffer release" (and not somethingsomething else, as I initially thought).

BH means buffer head. The meaning and usage of buffer heads have changed a lot across Linux versions. The buffer head is the basic unit of IO operations, but nowadays most users would use a bio for most of what you would have used a buffer head for in the past. When you're working on filesystems you'll still have to use BHs to talk to the hard drive, though.

The Linux FS infrastructure and ext2 share a lot of names. For instance, ext2 and Linux both have superblocks, inodes, blocks, etc. They are logically the same, but at the same time they're very different: one lives purely on the disk, the other provides functions to the kernel. This makes conversation complicated: when talking about the superblock, you have to mention which superblock you mean, because that's not always obvious from context.
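The on-disk flavour really is just a fixed record of bytes. A toy sketch in Python (offsets from the ext2 on-disk superblock layout: two leading little-endian u32 counts and the u16 magic 0xEF53 at offset 56; the buffer here is fabricated, and the in-memory VFS `super_block` is a completely different C struct):

```python
import struct

EXT2_SUPER_MAGIC = 0xEF53  # s_magic, a u16 at offset 56 of the on-disk superblock

def parse_ext2_super(raw):
    """Decode a few fields of an ext2 on-disk superblock.

    On a real partition this 1 KiB record sits at byte offset 1024.
    s_inodes_count and s_blocks_count are the first two LE u32 fields.
    """
    inodes_count, blocks_count = struct.unpack_from("<II", raw, 0)
    (magic,) = struct.unpack_from("<H", raw, 56)
    if magic != EXT2_SUPER_MAGIC:
        raise ValueError("not an ext2 superblock")
    return {"inodes": inodes_count, "blocks": blocks_count}

# Fabricated superblock for illustration: 128 inodes, 512 blocks.
raw = bytearray(1024)
struct.pack_into("<II", raw, 0, 128, 512)
struct.pack_into("<H", raw, 56, EXT2_SUPER_MAGIC)
print(parse_ext2_super(bytes(raw)))
```

The in-memory superblock, by contrast, is mostly function pointers and locks, which is exactly why the shared name trips people up.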


Names like brelse, bread, namei and so on have a long history in Unix kernels and filesystems. That history forms people's expectations. If you provide a namei operation in your code, you'd better call it namei and not make up some other, ostensibly clearer, identifier for it.


Not exactly an object 'inside' the kernel, but I was recently caught out by the fact that /proc/net/snmp doesn't have much to do with the SNMP protocol. Rather it tracks socket statistics for IP, ICMP, TCP, and UDP, and is one of the sources of information for the netstat utility.
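The file's format is at least regular: protocol lines come in header/value pairs. A quick sketch of parsing it (the sample text here is abbreviated and fabricated; a real /proc/net/snmp has many more columns per protocol):

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text.

    Lines come in pairs: a header line of field names and a value
    line of counters, both prefixed with the protocol name and a colon.
    """
    stats = {}
    lines = [l for l in text.splitlines() if l.strip()]
    for header, values in zip(lines[::2], lines[1::2]):
        proto, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        stats[proto] = dict(zip(names.split(), map(int, nums.split())))
    return stats

# Abbreviated, made-up sample in the same shape as the real file.
sample = """\
Tcp: ActiveOpens PassiveOpens CurrEstab
Tcp: 8813 1029 42
Udp: InDatagrams OutDatagrams
Udp: 50211 49006
"""
stats = parse_snmp(sample)
print(stats["Tcp"]["CurrEstab"])
```

So "snmp" here just means "counters in the shape SNMP MIBs expect", not the protocol itself.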


A professor of mine said something along the lines of "If something isn't confusing enough, give it multiple names"


In the future, try vouching for a dead comment (click on its timestamp to go to its page, then click 'vouch' at the top). If a dead comment gets enough vouches, it is restored, and then you can reply to it the normal way.

Since simoooo's comment was vouched for by other users, I've moved your reply to be a child of that one.


Here's the improved version of that quote:

> There are only two hard things in Computer Science: cache invalidation, off-by-one errors and naming things.


Ah, cool!

Hacker News paydirt.



