Whether it's all in pid 1 or not is irrelevant. What matters is that it has a monolithic architecture, whereby breakage in any one part or their communication channels can bring down the whole system. This is not just a theoretical concern; it has REPEATEDLY happened.
All of the existing mechanisms are also a "system" that comprises a ton of processes... If systemd is monolithic on these grounds, then so are they.
> What matters is that it has a monolithic architecture, whereby breakage in any one part or their communication channels can bring down the whole system.
Uh-huh... I think you are speaking to branding more than technology. Keep in mind that systemd is using existing components in much the same fashion they were already being used (hence the accusations about them "absorbing" udev).
If you look at the architecture, it has very clear points of encapsulation and is much more structured than the loosey-goosey stuff that came before it.
> This is not just a theoretical concern; it has REPEATEDLY happened.
Yeah... with existing systems. There's any number of points of failure that are the stuff of legends in Unix system administration. Obviously, it will take time to get systemd thoroughly cleaned up, but it's not hard to look at the design and see how it provides plumbing to simplify and avoid a whole host of these scenarios.
Systems which do not use systemd simply do not have these problems because there is no analogous component. If syslogd goes down, the worst that happens is you don't get logs. Init doesn't go down because it essentially has no inputs. Individual services can go down if they're poorly written, but they won't bring the system down with them. Traditional systems (the hideousness that is "sysvinit") have plenty of other problems (e.g. race conditions in process supervision), but deadlocking or bringing down the whole system is not one of them.
With systemd on the other hand, all of the components under the systemd banner are tightly interconnected and communicating. In particular pid 1 has ongoing communication with multiple other components, and misbehavior from them can, both in theory and in practice, deadlock the whole system. In case you missed it, this is roughly what "monolithic architecture" means: even though the components are modular, they're designed for use in a tightly interwoven manner that's fragile. It's completely the opposite type of "monolithic" from the kernel, which has everything running in one address space, but with architectural modularity, where interdependency between components is kept fairly low.
> In particular pid 1 has ongoing communication with multiple other components, and misbehavior from them can, both in theory and in practice, deadlock the whole system. In case you missed it, this is roughly what "monolithic architecture" means: even though the components are modular, they're designed for use in a tightly interwoven manner that's fragile.
You mean like how if even one of my SysV init startup scripts hung indefinitely, all subsequent components would never get started? Or are you referring to how the whole system would hang when the root filesystem device was temporarily unmounted (really fun with network filesystems, although to be fair, NFS implementations eventually became robust enough that this wouldn't be a complete disaster)? Or are you referring to fork bombs, or those race conditions you mentioned, that would bring my system to a complete standstill? Or are you referring to how a race condition with date formatting in syslog actually hung my entire system time and again? Or perhaps you mean how a lot of init scripts had little (if any) retry logic, such that you'd often end up with a critical component of your system not running... often in ways where you'd not find out about it, or worse still, not be able to do anything about it without some really intrusive intervention? Or maybe you are referring to how, if you got your init startup order wrong for one of many critical components, you'd have a deadlock before you ever got a chance to actually fail? Or maybe you're referring to how the right kind of getty failures, triggered by a weird byte in a config file, could turn your system into a paperweight?
It's so hard to tell which scenario you are referring to. ;-)
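That first failure mode is structural: a SysV-style rc runner executes each script to completion before starting the next. A minimal sketch (the function and script names here are hypothetical; real rc.d layouts vary by distro):

```shell
#!/bin/sh
# Minimal sketch of a SysV-style sequential boot loop.
# Each script must finish before the next one starts, so a single
# "start" action that never returns stalls the entire boot sequence
# right here -- no timeout, no parallelism, no supervision.
run_rc_scripts() {
  for script in "$@"; do
    "$script" start
  done
}
```

The point is simply that ordering and liveness are entangled: one misbehaving script holds every later service hostage.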
> If you look at the architecture, it has very clear points of encapsulation and is much more structured than the loosey-goosey stuff that came before it.
Then why can't it offer a stable interface that lets me swap out e.g. udev with eudev, like I could before?
That's what makes it monolithic - not the implementation details but the absence of well-defined interfaces between the pieces.
> Then why can't it offer a stable interface that lets me swap out e.g. udev with eudev, like I could before?
I'm not sure it can't... To the extent it _doesn't_, I imagine it is not much of a priority, since eudev is a fork of udev and is lacking the enhancements to udev the systemd project has been working on.
From experience with Linux init scripts, I'm far less concerned about systemd than about SysV-init-style boot processes, to be honest. Years ago I lost track of the number of boot issues I dealt with that were caused by poorly written init scripts.
I have an anecdote from a short while ago. We had a server with several database instances (each with its own init script) running on it.
The scripts were buggy in such a way that starting one instance would bring it up okay, but prevent the rest of the instances from starting. Also, using the "stop" directive would successfully stop that instance... and all the others as well.
The bug probably crept in because the init scripts were horrible to begin with and had been copied (ugh) to accommodate more instances, without the modifications necessary to keep them from screwing each other up.
One of my "favourite" problems with init scripts' stop/start handling is that way too many of them basically throw their hands up if the contents of the pid-file don't match what they expect. Never mind that 90% of the time when I actually want to run stop/start/restart, it's because something has crashed or is misbehaving, and there's a high likelihood the pid-file does not reflect reality.
So a far too common scenario is: a process dies. You try to run "start". Nothing happens, because the pid-file exists and the script doesn't verify that the pid actually matches a running process (or it checks that the pid matches a running process, but not that the process with that pid is actually the one we want).
OK, so we try "restart" or "stop". We get an error, because the pid-file's content does not match a running process, and rather than cleaning out the pid-file and starting the process, the script just bails.
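The defensive check these scripts skip is only a few lines. A hedged sketch (the function name and layout are mine, not taken from any actual distro script): verify the pid is alive, verify it's the process we expect, and clean up the stale file otherwise.

```shell
#!/bin/sh
# Sketch of the stale-pid-file check many init scripts omit.
# Returns 0 only if the pid-file names a live process whose command
# name matches what we expect; otherwise removes the stale file so a
# subsequent "start" isn't blocked by it.
pid_is_ours() {
  pidfile=$1
  expected=$2
  [ -f "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  # Is any process with that pid alive at all? (signal 0 = existence check)
  if ! kill -0 "$pid" 2>/dev/null; then
    rm -f "$pidfile"   # stale: pid no longer running
    return 1
  fi
  # The pid may have been recycled by an unrelated process, so also
  # check that the command name matches what we expect.
  if [ "$(ps -o comm= -p "$pid")" != "$expected" ]; then
    rm -f "$pidfile"   # stale: pid reused by something else
    return 1
  fi
  return 0
}
```

Even this is still racy (the pid can be recycled between the check and any later `kill`), which is exactly why pid-files are a poor tracking mechanism to begin with.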
Basically I don't trust init scripts from anyone but distro maintainers themselves, and even then there are often plenty of edge cases that cause problems.
Whatever else one thinks of systemd, I really like its solution to this: using cgroups to ensure it can keep proper track of exactly which processes belong to a service, without resorting to brittle pid-files, which rarely seem to be properly implemented. Of course the cgroups approach could be implemented as a separate tool, but pid-files badly need to die.
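A rough illustration of why that approach is more robust: with cgroups, the kernel maintains the membership list, so a manager just reads `cgroup.procs` instead of trusting a file a script wrote. In this sketch the cgroup root is parameterized purely so it's testable; on a real system it would be `/sys/fs/cgroup`, and the service names are hypothetical.

```shell
#!/bin/sh
# Sketch of cgroup-based process tracking (the idea, not systemd's
# actual implementation). The kernel keeps cgroup.procs up to date:
# pids appear when processes join and vanish when they exit, so there
# is no stale-pid-file problem to handle.
service_pids() {
  root=${CGROUP_ROOT:-/sys/fs/cgroup}
  cat "$root/$1/cgroup.procs" 2>/dev/null
}

service_is_running() {
  # A service is up iff its cgroup still contains at least one pid.
  [ -n "$(service_pids "$1")" ]
}
```

This also catches the double-fork daemons and worker children that pid-files routinely lose track of, since every descendant stays in the service's cgroup unless explicitly moved.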