Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-11-20T16:29:03+08:00

====== systemd ======
Created Tuesday 20 November 2012
http://www.0pointer.de/blog/projects/systemd.html

Fri, 30 Apr 2010

===== Rethinking PID 1 =====
If you are well connected or good at reading between the lines you might already know what this blog post is about. But even then you may find this story interesting. So grab a cup of coffee, sit down, and read what's coming.

This blog story is long, so even though I can only recommend reading the long story, here's the one sentence summary: we are experimenting with a new init system and it is fun.

Here's the code. And here's the story:

===== Process Identifier 1 =====
On every Unix system there is one process with the special process identifier 1. It is __started by the kernel__ before all other processes and is the parent process for all those other processes that have nobody else to be child of. Due to that it can do a lot of stuff that other processes cannot do. And it is also responsible for some things that other processes are not responsible for, such as __bringing up and maintaining userspace__ during boot. (Startup programs, services, and sessions in userspace are all started, directly or indirectly, by the init process.)

Historically on Linux the software acting as PID 1 was the venerable **sysvinit** package, though it had been showing its age for quite a while. Many replacements have been suggested, only one of them really took off: **Upstart**, which has by now found its way into all major distributions.

As mentioned, the central responsibility of an init system is to __bring up userspace__. And a good init system does that fast. Unfortunately, the traditional SysV init system was not particularly fast.

For a fast and efficient boot-up two things are crucial:
* To start less.
* And to start more in parallel.

What does that mean? Starting less means __starting fewer services or deferring the starting of services__ until they are actually needed. There are some services where we know that they will be required sooner or later (syslog, D-Bus system bus, etc.), but for many others this isn't the case. For example, **bluetoothd** does not need to be running unless a bluetooth dongle is actually plugged in or an application wants to talk to its D-Bus interfaces. Same for a printing system: unless the machine physically is connected to a printer, or an application wants to print something, there is no need to run a printing daemon such as **CUPS**. **Avahi**: if the machine is not connected to a network, there is no need to run Avahi, unless some application wants to use its APIs. And even SSH: as long as nobody wants to contact your machine there is no need to run it, as long as it is then started on the first connection. (And admit it, on most machines where **sshd** might be listening somebody connects to it only every other month or so.)

Starting more in parallel means that if we have to run something, we should not serialize its start-up (as sysvinit does), but run it all at the same time, so that the available CPU and disk IO bandwidth is maxed out, and hence the overall start-up time minimized.

===== Hardware and Software Change Dynamically =====
Modern systems (especially general purpose OS) are highly dynamic in their configuration and use: they are mobile, different applications are started and stopped, different hardware added and removed again. __An init system that is responsible for maintaining services needs to listen to hardware and software changes.__ It needs to __dynamically__ start (and sometimes stop) services as they are needed to run a program or enable some hardware.

Most current systems that try to parallelize boot-up still synchronize the start-up of the various daemons involved: since Avahi needs D-Bus, D-Bus is started first, and only when D-Bus signals that it is ready, Avahi is started too. Similar for other services: libvirtd and X11 need HAL (well, I am considering the Fedora 13 services here, ignore that HAL is obsolete), hence HAL is started first, before libvirtd and X11 are started. And libvirtd also needs Avahi, so it waits for Avahi too. And all of them require syslog, so they all wait until syslog is fully started up and initialized. And so on.

===== Parallelizing Socket Services =====
This kind of __start-up synchronization__ results in the serialization of a significant part of the boot process. Wouldn't it be great if we could get rid of the synchronization and serialization cost? Well, we can, actually. For that, we need to understand what exactly the daemons require from each other, and why their start-up is delayed. For traditional Unix daemons, there's one answer to it: __they wait until the socket the other daemon offers its services on is ready for connections.__ Usually that is an AF_UNIX socket in the file-system, but it could be AF_INET[6], too. For example, clients of D-Bus wait that **/var/run/dbus/system_bus_socket** can be connected to, clients of syslog wait for **/dev/log**, clients of CUPS wait for **/var/run/cups/cups.sock** and NFS mounts wait for **/var/run/rpcbind.sock** and the portmapper IP port, and so on. And think about it, this is actually the only thing they wait for!

Now, if that's all they are waiting for, if we manage to make those sockets available for connection __earlier__ and only actually wait for that instead of the full daemon start-up, then we can speed up the entire boot and start more processes in parallel. So, how can we do that? Actually quite easily in Unix-like systems: __we can create the listening sockets before we actually start the daemon, and then just pass the socket during exec() to it.__ That way, we can create all sockets for all daemons in one step in the init system, and then in a second step run all daemons at once. If a service needs another, and it is not fully started up, that's completely OK: what will happen is that the connection is queued in the providing service and the client will **potentially block** on that single request. But only that one client will block and only on that one request. Also, dependencies between services will __no longer__ necessarily have to be configured to allow proper parallelized start-up: if we start all sockets at once and a service needs another it can be sure that it can connect to its socket.
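The two-step scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any real init system: the socket path, the inline DAEMON_CODE, and the convention of handing the descriptor number to the daemon as a command-line argument are all made up for the example.

```python
import os
import socket
import subprocess
import sys
import tempfile

# Hypothetical "daemon": it adopts the inherited listening socket instead of
# creating its own, answers one request, and exits.
DAEMON_CODE = r"""
import socket, sys
listener = socket.socket(fileno=int(sys.argv[1]))
conn, _addr = listener.accept()
conn.sendall(b"handled: " + conn.recv(1024))
conn.close()
"""


def demo_socket_handoff():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener, \
                socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            # Step 1: the supervisor creates the listening socket up front.
            listener.bind(path)
            listener.listen(8)

            # A client connects and sends before any daemon is running;
            # the kernel queues both the connection and the data.
            client.connect(path)
            client.sendall(b"hello")

            # Step 2: only now start the daemon, passing the socket across exec().
            fd = listener.fileno()
            daemon = subprocess.Popen(
                [sys.executable, "-c", DAEMON_CODE, str(fd)], pass_fds=[fd]
            )

            # Read until the daemon closes the connection.
            chunks = []
            while True:
                chunk = client.recv(1024)
                if not chunk:
                    break
                chunks.append(chunk)
            daemon.wait()
            return b"".join(chunks)
```

The client never learns that the daemon did not exist yet when it connected; it merely waits a little longer for its reply.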

Because this is at the core of what is following, let me say this again, with different words and by example: if you start syslog and various syslog clients at the same time, what will happen in the scheme pointed out above is that the messages of the clients will be added to the **/dev/log** socket buffer. As long as that buffer doesn't run full, the clients will not have to wait in any way and can immediately proceed with their start-up. As soon as syslog itself has finished start-up, it will dequeue all messages and process them. Another example: we start D-Bus and several clients at the same time. If a synchronous bus request is sent and hence a reply expected, what will happen is that the client will have to **block**, however only that one client and only until D-Bus has managed to catch up and process it.
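The syslog case can be imitated with a datagram socket. The sketch below is a toy under assumptions of its own: it binds a private socket in a temporary directory rather than touching the real /dev/log, and the three "clients" and their messages are invented for the example.

```python
import os
import socket
import tempfile


def demo_log_queue():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log")

        # The socket exists from the start, but nobody is reading from it yet.
        log_socket = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        log_socket.bind(path)

        # Clients log during their own start-up and carry on immediately;
        # the kernel keeps the messages in the socket's receive queue.
        for number in range(3):
            client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
            client.sendto(f"client {number} starting".encode(), path)
            client.close()

        # Later the "syslog daemon" finishes start-up and drains the queue.
        log_socket.setblocking(False)
        messages = []
        while True:
            try:
                messages.append(log_socket.recv(4096).decode())
            except BlockingIOError:
                break  # queue is empty
        log_socket.close()
        return messages
```

The messages come out in the order they were sent, although no reader existed when they were written.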

Basically, __the kernel socket buffers help us to maximize parallelization, and the ordering and synchronization is done by the kernel,__ without any further management from userspace! And if all the sockets are available before the daemons actually start up, dependency management also becomes redundant (or at least secondary): if a daemon needs another daemon, it will just connect to it. If the other daemon is already started, this will immediately succeed. If it isn't started but in the process of being started, the first daemon will not even have to wait for it, unless it issues **a synchronous request.** And even if the other daemon is not running at all, it can be **auto-spawned**. From the first daemon's perspective there is no difference, hence dependency management becomes mostly unnecessary or at least secondary, and all of this in optimal parallelization and optionally with __on-demand loading__. On top of this, this is also more robust, because the sockets stay available regardless of whether the actual daemons might temporarily become unavailable (maybe due to crashing). In fact, you can easily write a daemon with this that can run, and exit (or crash), and run again and exit again (and so on), and all of that without the clients noticing or losing any request.

It's a good time for a pause, go and refill your coffee mug, and be assured, there is more interesting stuff following.

But first, let's clear a few things up: is this kind of logic new? No, it certainly is not. The most prominent system that works like this is Apple's launchd system: on MacOS the listening of the sockets is pulled out of all daemons and done by launchd. The services themselves hence can all start up in parallel and dependencies need not be configured for them. And that is actually a really ingenious design, and the primary reason why MacOS manages to provide the fantastic boot-up times it provides. I can highly recommend this video where the launchd folks explain what they are doing. Unfortunately this idea never really caught on outside of the Apple camp.

The idea is actually even older than launchd. Prior to launchd the venerable **inetd** worked much like this: sockets were centrally created in a daemon that would start the actual service daemons passing the socket file descriptors during exec(). However the focus of inetd certainly wasn't local services, but Internet services (although later reimplementations supported AF_UNIX sockets, too). It also wasn't a tool to parallelize boot-up or even useful for getting implicit dependencies right.

For TCP sockets inetd was primarily used in a way that for every incoming connection a new daemon instance was spawned. That meant that for each connection a new process was spawned and initialized, which is __not__ a recipe for high-performance servers. However, right from the beginning inetd also supported another mode, where a single daemon was spawned on the first connection, and that single instance would then __go on__ and also accept the follow-up connections (that's what the wait/nowait option in inetd.conf was for, a particularly badly documented option, unfortunately.) Per-connection daemon starts probably gave inetd its bad reputation for being slow. But that's not entirely fair.
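The second mode, one long-lived instance spawned on the first connection, can be sketched as follows. This is a toy model and not inetd's code: SERVICE_CODE, the helper names, and the choice to let the service report its own PID are assumptions made so that the demo can show that both connections reach the same instance.

```python
import os
import select
import socket
import subprocess
import sys
import tempfile

# Hypothetical service: it takes over the listening socket and serves the
# first connection plus one follow-up connection from the same process.
SERVICE_CODE = r"""
import os, socket, sys
listener = socket.socket(fileno=int(sys.argv[1]))
for _ in range(2):
    conn, _addr = listener.accept()
    conn.sendall(str(os.getpid()).encode())
    conn.close()
"""


def spawn_on_first_connection(listener, timeout):
    """Start the service only if a connection is pending on the listener."""
    readable, _, _ = select.select([listener], [], [], timeout)
    if not readable:
        return None  # nobody asked for the service, so nothing is started
    fd = listener.fileno()
    return subprocess.Popen(
        [sys.executable, "-c", SERVICE_CODE, str(fd)], pass_fds=[fd]
    )


def connect(path):
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    return client


def read_reply(client):
    """Read until the service closes the connection, then close our end."""
    chunks = []
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    client.close()
    return b"".join(chunks)


def demo_wait_mode():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "service.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
            listener.bind(path)
            listener.listen(8)
            idle = spawn_on_first_connection(listener, timeout=0)
            first = connect(path)
            service = spawn_on_first_connection(listener, timeout=5)
            first_reply = read_reply(first)
            second_reply = read_reply(connect(path))
            service.wait()
    return idle, first_reply, second_reply
```

Nothing is spawned while the socket is idle, and both replies carry the same PID, so the per-connection start-up cost is paid only once.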

===== Parallelizing Bus Services =====
__Modern daemons on Linux tend to provide services via D-Bus instead of plain AF_UNIX sockets__. Now, the question is, for those services, can we apply the same parallelizing boot logic as for traditional socket services? Yes, we can, D-Bus already has all the right hooks for it: **using bus activation a service can be started the first time it is accessed**. Bus activation also gives us the **minimal per-request synchronisation** we need for starting up the providers and the consumers of D-Bus services at the same time: if we want to start Avahi at the same time as CUPS (side note: CUPS uses Avahi to browse for mDNS/DNS-SD printers), then we can simply run them at the same time, and if CUPS is quicker than Avahi via the bus activation logic we can get D-Bus to queue the request until Avahi manages to establish its service name.

So, in summary: the __socket-based service activation and the bus-based service activation together enable us to start all daemons in parallel__, without any further synchronization. Activation also allows us to do lazy-loading of services: if a service is rarely used, we can just load it the first time somebody accesses the socket or bus name, instead of starting it during boot.
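__Here "activation" means that a service can be made available without waiting for its provider, or for the services it depends on, to finish starting. This is achieved by creating the relevant socket files in advance. The providing process (service) is only started once some other process (service) reads from or writes to that pre-created socket file, which achieves both parallel service start-up (without having to care about dependencies between services) and lazy loading of services.__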

And if that's not great, then I don't know what is great!

Besides the synchronization between service-providing daemons, file-system related jobs also need to be synchronized.

===== Parallelizing File System Jobs =====
If you look at the serialization graphs of the boot process of current distributions, there are more synchronisation points than just daemon start-ups: most prominently there are **file-system related jobs**: mounting, fscking, quota (disk quota). Right now, on boot-up a lot of time is spent idling to wait until all devices that are listed in **/etc/fstab** show up in the device tree and are then fsck'ed, mounted, quota checked (if enabled). Only after that is fully finished do we go on and boot the actual services. (A disk is fsck'ed before it is mounted.)

Can we improve this? It turns out we can. Harald Hoyer came up with the idea of using the venerable **autofs** system for this:

Just like a connect() call shows that a service is interested in another service, an open() (or a similar call) shows that a service is interested in a specific file or file-system. So, in order to improve how much we can parallelize we can make those apps wait only if a file-system they are looking for is not yet mounted and readily available: __we set up an autofs mount point, and then when our file-system finished fsck and quota due to normal boot-up we replace it by the real mount__. While the file-system is not ready yet, the access will be queued by the kernel and the accessing process will block, but only that one daemon and only that one access. And this way we can begin starting our daemons even before all file systems have been fully made available -- without them missing any files, and maximizing parallelization.

Parallelizing file system jobs and service jobs __does not make sense for /__, after all that's where the service binaries are usually stored. However, for file-systems such as /home, that usually are bigger, even encrypted, possibly remote and seldom accessed by the usual **boot-up daemons**, this can improve boot time considerably. It is probably not necessary to mention this, but __virtual file systems, such as procfs or sysfs should never be mounted via autofs (because they are generated dynamically by the kernel and carry no fsck overhead)__.

I wouldn't be surprised if some readers might find integrating autofs in an init system a bit fragile and even weird, and maybe more on the "crackish" side of things. However, having played around with this extensively I can tell you that this actually feels quite right. __Using autofs here simply means that we can create a mount point without having to provide the backing file system right-away.__ In effect it hence only **delays accesses**. If an application tries to access an autofs file-system and we take very long to replace it with the real file-system, it will hang in an interruptible sleep, meaning that you can safely cancel it, for example via C-c. Also note that at any point, if the mount point should not be mountable in the end (maybe because fsck failed), we can just tell autofs to return a clean error code (like ENOENT). So, I guess what I want to say is that even though integrating autofs into an init system might appear adventurous at first, our experimental code has shown that this idea works surprisingly well in practice -- if it is done for the right reasons and the right way.

Also note that these should be direct autofs mounts, meaning that from an application perspective there's little effective difference between a classic mount point and one based on autofs.

===== Keeping the First User PID Small =====
Another thing we can learn from the MacOS boot-up logic is that __shell scripts are evil__. Shell is fast and shell is slow. It is fast to hack, but slow in execution. The classic sysvinit boot logic is modelled around shell scripts. Whether it is /bin/bash or any other shell (that was written to make shell scripts faster), in the end the approach is doomed to be slow. On my system the scripts in /etc/init.d call grep at least 77 times. awk is called 92 times, cut 23 and sed 74. Every time those commands (and others) are called, a process is spawned, the libraries searched, some start-up stuff like i18n and so on set up and more. And then after seldom doing more than a trivial string operation the process is terminated again. Of course, that has to be incredibly slow. No other language but shell would do something like that. On top of that, shell scripts are also very fragile, and change their behaviour drastically based on environment variables and suchlike, stuff that is hard to oversee and control.
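A rough way to reproduce such a count is sketched below. It is an approximation under stated assumptions: it counts textual mentions of the four tool names in the files of one directory, which is not the same as the number of processes actually spawned at boot (loops, comments, and unused code paths are all counted once per mention).

```python
import re
from collections import Counter
from pathlib import Path

TOOLS = ("grep", "awk", "cut", "sed")
# Match the tool names only as whole words, so that "based" does not count as "sed".
PATTERN = re.compile(r"\b(" + "|".join(TOOLS) + r")\b")


def count_tool_mentions(script_dir):
    """Count how often each tool name appears in the scripts of one directory."""
    counts = Counter()
    for script in Path(script_dir).iterdir():
        if not script.is_file():
            continue
        try:
            text = script.read_text(errors="replace")
        except OSError:
            continue  # unreadable file, skip it
        counts.update(PATTERN.findall(text))
    return counts
```

Calling count_tool_mentions("/etc/init.d") on a SysV-style system gives a figure comparable in spirit to the numbers quoted above.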

So, let's __get rid of shell scripts__ in the boot process! Before we can do that we need to figure out what they are currently actually used for: well, the big picture is that most of the time, what they do is actually quite boring. Most of the scripting is spent on trivial **setup and tear-down of services**, and should be rewritten in C, either in separate executables, or moved into the daemons themselves, or simply be done in the init system.

It is not likely that we can get rid of shell scripts during system boot-up entirely anytime soon. Rewriting them in C takes time, in a few cases does not really make sense, and sometimes shell scripts are just too handy to do without. But we can certainly make them less prominent.

A good metric for measuring shell script infestation of the boot process is the PID number of the first process you can start after the system is fully booted up. Boot up, log in, open a terminal, and type echo $$. Try that on your Linux system, and then compare the result with MacOS! (Hint, it's something like this: Linux PID 1823; MacOS PID 154, measured on test systems we own.)

===== Keeping Track of Processes =====
A central part of a system that starts up and maintains services should be process babysitting: it should __watch services__. Restart them if they shut down. If they crash it should collect information about them, and keep it around for the administrator, and cross-link that information with what is available from crash dump systems such as abrt, and in logging systems like syslog or the audit system.

It should also be capable of shutting down a service completely. That might sound easy, but is harder than you think. __Traditionally on Unix a process that does double-forking can escape the supervision of its parent__, and the old parent will not learn about the relation of the new process to the one it actually started. An example: currently, a misbehaving CGI script that has double-forked is not terminated when you shut down Apache. Furthermore, you will not even be able to figure out its relation to Apache, unless you know it by name and purpose. **Double-forking turns the init process into the parent of the child process.**
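The escape can be observed directly. The following is a small sketch, with the function name and the pipe-based reporting chosen only for this example: the supervisor forks a child, the child forks a grandchild and exits, and the grandchild reports who its parent is afterwards.

```python
import os
import time


def double_fork_demo():
    """Return (supervisor_pid, intermediate_pid, grandchild_parent_after_escape)."""
    read_end, write_end = os.pipe()
    child = os.fork()
    if child == 0:
        # Intermediate process: fork once more, then exit right away.
        os.close(read_end)
        intermediate = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until the intermediate process is gone.
            deadline = time.monotonic() + 5
            while os.getppid() == intermediate and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(write_end, f"{intermediate} {os.getppid()}".encode())
            os._exit(0)
        os._exit(0)

    # Supervisor: it only ever sees the intermediate process exit.
    os.close(write_end)
    os.waitpid(child, 0)
    with os.fdopen(read_end) as pipe:
        intermediate, new_parent = map(int, pipe.read().split())
    return os.getpid(), intermediate, new_parent
```

On a typical system the reported new parent is PID 1 (or a designated subreaper); in either case the supervisor's waitpid() returned long before the grandchild finished, and nothing in the process tree ties the two together any more.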

So, how can we keep track of processes, so that they cannot escape the babysitter, and that we can control them as one unit even if they fork a gazillion times?

Different people came up with different solutions for this. I am not going into much detail here, but let's at least say that approaches based on **ptrace** or the **netlink connector** (a kernel interface which allows you to get a netlink message each time any process on the system fork()s or exit()s) that some people have investigated and implemented, have been criticised as ugly and not very scalable.

So what can we do about this? Well, for quite a while now the kernel has supported __Control Groups (aka "cgroups")__. Basically they allow the creation of a hierarchy of groups of processes. The hierarchy is __directly exposed in a virtual file-system__, and hence easily accessible. The group names are basically directory names in that file-system. If a process belonging to a specific cgroup fork()s, its child will become a member of the same group. Unless it is privileged and has access to the cgroup file system it cannot escape its group. Originally, cgroups were introduced into the kernel for the purpose of containers: certain kernel subsystems can enforce limits on resources of certain groups, such as limiting CPU or memory usage. __Traditional resource limits (as implemented by setrlimit()) are (mostly) per-process. cgroups on the other hand let you enforce limits on entire groups of processes.__ cgroups are also useful to enforce limits outside of the immediate container use case. You can use them for example to limit the total amount of memory or CPU Apache and all its children may use. Then, a misbehaving CGI script can no longer escape your setrlimit() resource control by simply forking away.

What control groups provide:
1. A container for a hierarchy of processes
2. Resource limits for groups of processes
3. Daemon tracking

In addition to container and resource limit enforcement cgroups are very useful to __keep track of daemons__: cgroup membership is securely inherited by child processes, they cannot escape. There's a notification system available so that a supervisor process can be notified when a cgroup __runs empty__. You can find the cgroups of a process by reading /proc/$PID/cgroup. cgroups hence make a very good choice to keep track of processes for babysitting purposes.
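Reading that file is straightforward. A minimal sketch, assuming the line format documented in cgroups(7), hierarchy-ID:controller-list:cgroup-path; the function names are made up for this example:

```python
from pathlib import Path


def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup into (id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The path itself may contain colons, so split at most twice.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(
            (int(hierarchy_id), controllers.split(",") if controllers else [], path)
        )
    return entries


def cgroups_of(pid="self"):
    """Return the cgroup memberships of a process (Linux only)."""
    return parse_proc_cgroup(Path(f"/proc/{pid}/cgroup").read_text())
```

Because children inherit the membership, calling cgroups_of() on any descendant of a service gives the same group path as for the service's main process, no matter how often it forked in between.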

===== Controlling the Process Execution Environment =====
A good babysitter should not only oversee and control when a daemon starts, ends or crashes, but also set up a good, minimal, and secure __working environment__ for it.

That means setting obvious process parameters such as the setrlimit() resource limits, user/group IDs or the environment block, but does not end there. The Linux kernel gives users and administrators a lot of control over processes (some of it is rarely used, currently). For each process you can set CPU and IO scheduler controls, the capability bounding set, CPU affinity or of course cgroup environments with additional limits, and more.
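A few of these parameters can be set from a supervisor between fork() and exec(). The sketch below covers only three of them (one resource limit, the nice level, and one environment variable) and uses Python's preexec_fn hook for brevity; the DEMO_SERVICE variable and the probe program are assumptions made for the example.

```python
import json
import os
import resource
import subprocess
import sys

# Hypothetical "service" that simply reports the environment it was given.
PROBE = (
    "import json, os, resource;"
    "print(json.dumps({"
    "'nofile_soft': resource.getrlimit(resource.RLIMIT_NOFILE)[0],"
    "'nice': os.nice(0),"
    "'service': os.environ.get('DEMO_SERVICE')}))"
)


def run_with_environment(max_open_files, nice_increment):
    def setup_child():
        # Runs in the child after fork() and before exec().
        _soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        soft = max_open_files
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)  # the soft limit may not exceed the hard limit
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
        os.nice(nice_increment)

    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        preexec_fn=setup_child,
        env=dict(os.environ, DEMO_SERVICE="demo"),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

The supervisor's own limits and priority stay untouched; only the spawned process sees the restricted environment.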

As an example, ioprio_set() with IOPRIO_CLASS_IDLE is a great way to minimize the effect of locate's updatedb on system interactivity. (While the system is being used interactively, ioprio_set() can be used to lower the IO priority of the updatedb command.)

On top of that certain high-level controls can be very useful, such as setting up read-only file system overlays based on __read-only bind__ mounts. That way one can run certain daemons so that all (or some) file systems appear read-only to them, so that EROFS is returned on every write request. As such this can be used to lock down what daemons can do similar in fashion to a poor man's SELinux policy system (but this certainly doesn't replace SELinux, don't get any bad ideas, please).

Finally logging is an important part of executing services: ideally every bit of output a service generates should be logged away. An init system should hence provide logging to daemons it spawns right from the beginning, and connect stdout and stderr to syslog or in some cases even **/dev/kmsg** which in many cases makes a very useful replacement for syslog (embedded folks, listen up!), especially in times where the kernel log buffer is configured ridiculously large out-of-the-box.
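The idea of capturing every bit of output can be sketched without touching the real syslog. In the example below the supervisor connects the service's stdout and stderr to one pipe and forwards each line, tagged with a service name and PID, to a sink callable; the sink, the tag format, and the function name are assumptions standing in for a real syslog or /dev/kmsg connection.

```python
import subprocess
import sys


def run_logged(name, command, sink):
    """Run a command and pass every line it prints to sink, tagged like a log line."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # stderr shares the same pipe as stdout
        text=True,
    )
    for line in process.stdout:
        sink(f"{name}[{process.pid}]: {line.rstrip()}")
    process.stdout.close()
    return process.wait()


def demo_logging():
    lines = []
    noisy = "import sys; print('started'); print('warning', file=sys.stderr)"
    status = run_logged("demo", [sys.executable, "-c", noisy], lines.append)
    return status, lines
```

Because the pipe is set up before the service starts, nothing it prints is lost, even output produced before the service has initialized any logging of its own.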

===== On Upstart =====
To begin with, let me emphasize that I actually like the code of Upstart, it is very well commented and easy to follow. It's certainly something other projects should learn from (including my own).

That being said, I can't say I agree with the general approach of Upstart. But first, a bit more about the project:

Upstart __does not share code with sysvinit__, and its functionality is a super-set of it, and provides compatibility to some degree with the well known SysV init scripts. Its main feature is its __event-based approach__: starting and stopping of processes is bound to "events" happening in the system, where an "event" can be a lot of different things, such as: a network interface becomes available or some other software has been started.

Upstart **does service serialization via these events**: if the syslog-started event is triggered this is used as an indication to start D-Bus since it can now make use of Syslog. And then, when dbus-started is triggered, NetworkManager is started, since it may now use D-Bus, and so on.

One could say that this way the actual logical dependency tree that exists and is understood by the admin or developer is translated and encoded into event and action rules: every logical "a needs b" rule that the administrator/developer is aware of becomes a "start a when b is started" plus "stop a when b is stopped". In some way this certainly is a simplification: especially for the code in Upstart itself. However I would argue that this simplification is actually detrimental. First of all, the logical dependency system does not go away, the person who is writing Upstart files must now __translate the dependencies manually into these event/action rules__ (actually, two rules for each dependency). So, instead of letting the computer figure out what to do based on the dependencies, the user has to manually translate the dependencies into simple event/action rules. Also, because the dependency information has never been encoded it is not available at runtime, effectively meaning that an administrator who tries to figure out why something happened, i.e. why a is started when b is started, has no chance of finding that out. (The dependency information cannot be obtained at runtime.)

Furthermore, the event logic **turns around all dependencies**, from the feet onto their head. Instead of minimizing the amount of work (which is something that a good init system should focus on, as pointed out in the beginning of this blog story), it actually **maximizes the amount of work** to do during operations. Or in other words, instead of having a clear goal and only doing the things it really needs to do to reach the goal, it does one step, and then after finishing it, it does all steps that possibly could follow it.

Or to put it simpler: the fact that the user just started D-Bus is in no way an indication that NetworkManager should be started too (but this is what Upstart would do). It's right the other way round: when the user asks for NetworkManager, that is definitely an indication that D-Bus should be started too (which is certainly what most users would expect, right?).

__A good init system should start only what is needed, and that on-demand__. Either lazily or parallelized and in advance. However it should not start more than necessary, particularly not everything installed that could use that service.
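The difference between the two directions can be made concrete with a toy dependency table. Everything in this sketch is an assumption for illustration: the REQUIRES table, the service names in it, and the two helper functions are not taken from Upstart or from any real configuration.

```python
# Hypothetical "a needs b" table: each service lists what it requires.
REQUIRES = {
    "NetworkManager": ["dbus"],
    "avahi": ["dbus"],
    "cups": ["avahi"],
    "dbus": ["syslog"],
}


def start_order(goal, requires):
    """Goal-driven: start only what the requested unit needs, dependencies first."""
    order, done = [], set()

    def visit(unit, path):
        if unit in path:
            raise ValueError("dependency cycle: " + " -> ".join(path + (unit,)))
        if unit in done:
            return
        for dependency in requires.get(unit, ()):
            visit(dependency, path + (unit,))
        done.add(unit)
        order.append(unit)

    visit(goal, ())
    return order


def started_by_event(unit, requires):
    """Event-driven: everything that a 'start a when b is started' chain pulls in."""
    dependents = {}
    for name, dependencies in requires.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, []).append(name)
    pulled, queue = [], [unit]
    while queue:
        current = queue.pop(0)
        for name in dependents.get(current, ()):
            if name not in pulled:
                pulled.append(name)
                queue.append(name)
    return pulled
```

With this table, start_order("NetworkManager", REQUIRES) yields syslog, dbus, NetworkManager and nothing else, whereas started_by_event("dbus", REQUIRES) also drags in avahi and cups although nobody asked for them.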

Finally, I fail to see the actual usefulness of the event logic. It appears to me that most events that are exposed in Upstart actually are not punctual (that is, happening at a single point in time) in nature, but have duration: a service starts, is running, and stops. A device is plugged in, is available, and is plugged out again. A mount point is in the process of being mounted, is fully mounted, or is being unmounted. A power plug is plugged in, the system runs on AC, and the power plug is pulled. Only a minority of the events __an init system or process supervisor__ should handle are actually punctual, most of them are tuples of start, condition, and stop. This information is again not available in Upstart, because it focuses on singular events, and ignores durable dependencies.

Now, I am aware that some of the issues I pointed out above are in some way mitigated by certain more recent changes in Upstart, particularly condition based syntaxes such as start on (local-filesystems and net-device-up IFACE=lo) in Upstart rule files. However, to me this appears mostly as an attempt to fix a system whose __core design is flawed__.

Besides that Upstart does OK for babysitting daemons, even though some choices might be questionable (see above), and there are certainly a lot of missed opportunities (see above, too).

There are other init systems besides sysvinit, **Upstart and launchd**. Most of them offer little of substance beyond Upstart or sysvinit. The most interesting other contender is Solaris SMF, which supports proper dependencies between services. However, in many ways it is overly complex and, let's say, a bit academic with its excessive use of XML and new terminology for known things. It is also closely bound to Solaris specific features such as the contract system.

===== Putting it All Together: systemd =====
Well, this is another good time for a little pause, because after I have hopefully explained above what I think a good PID 1 should be doing and what the current most used system does, we'll now come to where the beef is. So, go and refill your coffee mug again. It's going to be worth it.

You probably guessed it: what I suggested above as requirements and features for an ideal init system is actually available now, in a (still experimental) init system called __systemd__, and which I hereby want to announce. Again, here's the code. And here's a quick rundown of its features, and the rationale behind them:

systemd __starts up and supervises__ the entire system (hence the name...). It implements all of the features pointed out above and a few more. It is based around the notion of **units.** Units have a name and a type. Since their configuration is usually loaded directly from the file system, these **unit names are actually file names**. Example: a unit **avahi.service** is read from a configuration file by the same name, and of course could be a unit //encapsulating// the Avahi daemon. There are several kinds of units:

* **service**: these are the most obvious kind of unit: daemons that can be started, stopped, restarted, reloaded. For compatibility with SysV we not only support our own configuration files for services, but also are __able to read classic SysV init scripts__, in particular we parse the LSB header, if it exists. /etc/init.d is hence not much more than just **another source of configuration**.
* **socket**: this unit encapsulates a socket in the file-system or on the Internet. We currently support AF_INET, AF_INET6, AF_UNIX sockets of the types stream, datagram, and sequential packet. We also support **classic FIFOs** as transport. __Each socket unit has a matching service unit__, **that is started if the first connection comes in on the socket or FIFO**. Example: nscd.socket starts nscd.service on an incoming connection. __When a socket unit receives a connection, it starts the corresponding service unit.__
* device: this unit encapsulates a device in the Linux device tree. If a device is marked for this __via udev rules__, it will be exposed as a device unit in systemd. Properties set with udev can be used as configuration source to set dependencies for device units.
* mount: this unit encapsulates a mount point in the file system hierarchy. systemd monitors all mount points how they come and go, and can also be used to mount or unmount mount-points. **/etc/fstab** is used here as an additional configuration source for these mount points, similar to how SysV init scripts can be used as additional configuration source for service units.
* automount: this unit type encapsulates an automount point in the file system hierarchy. __Each automount unit has a matching mount unit__, which is started (i.e. mounted) as soon as the automount directory is accessed.
* target: this unit type is used for **logical grouping of units**: instead of actually doing anything by itself it simply **references other units**, which thereby can be controlled together. Examples for this are: multi-user.target, which is a target that basically plays the role of run-level 5 on classic SysV system, or bluetooth.target which is requested as soon as a bluetooth dongle becomes available and which simply __pulls in__ bluetooth related services that otherwise would not need to be started: bluetoothd and obexd and suchlike. **Started on demand.**
* snapshot: similar to target units snapshots do not actually do anything themselves and their only purpose is to **reference other units**. Snapshots can be used to __save/rollback the state of all services and units__ of the init system. Primarily it has two intended use cases: to allow the user to temporarily enter a specific state such as "Emergency Shell", terminating current services, and provide an easy way to return to the state before, pulling up all services again that got temporarily pulled down. And to ease support for system suspending: still many services cannot correctly deal with system suspend, and it is often a better idea to shut them down before suspend, and restore them afterwards.
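Since unit names are file names whose suffix gives the type, a tiny helper can classify them. This is a sketch and not systemd's parser: the function names are invented, and pairing a socket unit with the service of the same base name is inferred only from the nscd.socket / nscd.service example above.

```python
# The seven unit kinds listed above.
UNIT_TYPES = ("service", "socket", "device", "mount", "automount", "target", "snapshot")


def split_unit_name(unit):
    """Split 'avahi.service' into ('avahi', 'service'); reject unknown suffixes."""
    name, dot, suffix = unit.rpartition(".")
    if not dot or not name or suffix not in UNIT_TYPES:
        raise ValueError(f"not a recognised unit name: {unit!r}")
    return name, suffix


def matching_service(socket_unit):
    """Name of the service unit assumed to be started by a socket unit."""
    name, kind = split_unit_name(socket_unit)
    if kind != "socket":
        raise ValueError(f"{socket_unit!r} is not a socket unit")
    return name + ".service"
```

For example, split_unit_name("multi-user.target") gives ("multi-user", "target"), and matching_service("nscd.socket") gives "nscd.service".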

All these units can have dependencies between each other (both positive and negative, i.e. 'Requires' and 'Conflicts'): a device can have a dependency on a service, meaning that as soon as a device becomes available a certain service is started. Mounts get an implicit dependency on the device they are mounted from. Mounts also get implicit dependencies on mounts that are their prefixes (i.e. a mount /home/lennart implicitly gets a dependency added to the mount for /home) and so on.
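
The prefix rule for mounts can be illustrated with a few lines of Python. This is only a sketch of the idea described above, not systemd's code: the function name ''implicit_mount_dependencies'' and the sample mount table are made up for illustration.

```python
from pathlib import PurePosixPath


def implicit_mount_dependencies(mount_points):
    """Map each mount point to the other mount points that are its path prefixes.

    Hypothetical helper: it only models the "a mount depends on the mounts
    above it" rule from the text, nothing else.
    """
    paths = [PurePosixPath(p) for p in mount_points]
    deps = {}
    for path in paths:
        # path.parents yields every ancestor directory,
        # e.g. /home and / for /home/lennart.
        ancestors = set(path.parents)
        deps[str(path)] = sorted(str(p) for p in paths if p in ancestors)
    return deps


# Made-up mount table for the example.
mounts = ["/", "/home", "/home/lennart", "/var"]
deps = implicit_mount_dependencies(mounts)
for mount in mounts:
    print(mount, "depends on", deps[mount])
```

With this table, /home/lennart ends up depending on both / and /home, while / depends on nothing.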

===== A short list of other features: =====
1. For each process that is spawned, you may __control__: the environment, resource limits, working and root directory, umask, OOM killer adjustment, nice level, IO class and priority, CPU policy and priority, CPU affinity, timer slack, user id, group id, supplementary group ids, readable/writable/inaccessible directories, shared/private/slave mount flags, capabilities/bounding set, secure bits, CPU scheduler reset on fork, private /tmp name-space, cgroup control for various subsystems. Also, you can easily connect stdin/stdout/stderr of services to syslog, /dev/kmsg, arbitrary TTYs. If connected to a TTY for input systemd will make sure a process gets exclusive access, optionally waiting or enforcing it.
2. Every executed process gets __its own cgroup__ (currently by default in the debug subsystem, since that subsystem is not otherwise used and does not much more than the most basic process grouping), and it is very easy to configure systemd to place services in cgroups that have been configured externally, for example via the libcgroup utilities.
3. The native configuration files use a syntax that closely follows the well-known __.desktop files__. It is a simple syntax for which parsers exist already in many software frameworks. Also, this allows us to rely on existing tools for i18n for service descriptions, and similar. Administrators and developers don't need to learn a new syntax.
4. As mentioned, we __provide compatibility with SysV init scripts__. We take advantage of **LSB and Red Hat chkconfig headers** if they are available. If they aren't we try to make the best of the otherwise available information, such as the start priorities in /etc/rc.d. These init scripts are simply **considered a different source of configuration**, hence an easy upgrade path to proper systemd services is available. Optionally we can read classic PID files for services to identify the main pid of a daemon. **Note that we make use of the dependency information from the LSB init script headers, and translate those into native systemd dependencies.** Side note: Upstart is unable to harvest and make use of that information. Boot-up on a plain Upstart system with mostly LSB SysV init scripts will hence not be parallelized, a similar system running systemd however will. In fact, for Upstart all SysV scripts together make one job that is executed, they are not treated individually, again in contrast to systemd where SysV init scripts are just another source of configuration and are all treated and controlled individually, much like any other native systemd service.
5. Similarly, we read the existing __/etc/fstab__ configuration file, and consider it just another source of configuration. Using the comment= fstab option you can even mark /etc/fstab entries to become systemd controlled automount points.
6. If the same unit is configured in multiple configuration sources (e.g. /etc/systemd/system/avahi.service exists, and /etc/init.d/avahi too), then the native configuration will always take precedence, the legacy format is ignored, allowing an easy upgrade path and packages to carry both a SysV init script and a systemd service file for a while.
7. We support a simple __templating/instance mechanism__. Example: instead of having six configuration files for six gettys, we only have one **getty@.service** file which gets instantiated to **getty@tty2.service** and suchlike. The instance part can even __be inherited by dependency expressions__, i.e. it is easy to encode that a service dhcpcd@eth0.service pulls in avahi-autoipd@eth0.service, while leaving the eth0 string wild-carded.
8. For socket activation we support full __compatibility with the traditional inetd modes__, as well as a very simple mode that tries to mimic launchd socket activation and is recommended for new services. The inetd mode only allows passing one socket to the started daemon, while the native mode supports passing arbitrary numbers of file descriptors. We also support one instance per connection, as well as one instance for all connections modes. In the former mode we **name the cgroup the daemon will be started in after the connection parameters, and utilize the templating logic mentioned** above for this. Example: sshd.socket might spawn services __(the socket is created first, and that socket then spawns the service daemon on demand)__ **sshd@192.168.0.1-4711-192.168.0.2-22.service** with a cgroup of **sshd@.service/192.168.0.1-4711-192.168.0.2-22** (i.e. the IP address and port numbers are used in the instance names. For AF_UNIX sockets we use PID and user id of the connecting client). This provides a nice way for the administrator to identify the various instances of a daemon and control their runtime individually. The native socket passing mode is very easily implementable in applications: if $LISTEN_FDS is set it contains the number of sockets passed and the daemon will find them sorted as listed in the .service file, starting from file descriptor 3 (a nicely written daemon could also use fstat() and getsockname() to identify the sockets in case it receives more than one). In addition we set $LISTEN_PID to the PID of the daemon that shall receive the fds, because environment variables are normally inherited by sub-processes and hence could confuse processes further down the chain. Even though this socket passing logic is very simple to implement in daemons, we will provide a BSD-licensed reference implementation that shows how to do this. We have ported a couple of existing daemons to this new scheme.
9. We provide compatibility with __/dev/initctl__ to a certain extent. This compatibility is in fact implemented with a FIFO-activated service, which simply translates these legacy requests to D-Bus requests. Effectively this means the old shutdown, poweroff and similar commands from Upstart and sysvinit continue to work with systemd.
10. We also provide compatibility with __utmp and wtmp__. Possibly even to an extent that is far more than healthy, given how crufty utmp and wtmp are.
11. systemd supports several kinds of dependencies between units. **After/Before** can be used to fix the order in which units are activated. It is completely orthogonal (i.e. independent) to **Requires and Wants**, which express a positive requirement dependency, either mandatory or optional (note: After/Before only affect ordering, while Requires/Wants only pull units in; neither implies the other). Then, there is **Conflicts** which expresses a negative requirement dependency. Finally, there are three further, less used dependency types.
12. systemd has a minimal __transaction system__. Meaning: if a unit is requested to start up or shut down we will add **it and all its dependencies** to a temporary transaction. Then, we will verify if the transaction is consistent (i.e. whether the ordering via After/Before of all units is cycle-free). If it is not, systemd will try to fix it up, and remove non-essential jobs from the transaction that might remove the loop. Also, systemd tries to suppress non-essential jobs in the transaction that would stop a running service. Non-essential jobs are those which the original request did not directly include but which were pulled in by Wants type of dependencies. Finally we check whether the jobs of the transaction contradict jobs that have already been queued, and optionally the transaction is aborted then. If all worked out and the transaction is consistent and minimized in its impact it is merged with all already outstanding jobs and added to the run queue. Effectively this means that before executing a requested operation, we will verify that it makes sense, fixing it if possible, and only failing if it really cannot work.
13. We record start/exit time as well as the PID and exit status of every process we spawn and supervise. This data can be used to **cross-link** daemons with their data in abrtd, auditd and syslog. Think of a UI that will highlight crashed daemons for you, and allows you to easily navigate to the respective UIs for syslog, abrt, and auditd that will show the data generated from and for this daemon on a specific run.
14. We support __reexecution of the init process itself__ at any time. The daemon state is serialized before the reexecution and deserialized afterwards. That way we provide a simple way to facilitate init system upgrades as well as handover from an initrd daemon to the final daemon. Open sockets and autofs mounts are properly serialized away, so that they stay connectible all the time, in a way that clients will not even notice that the init system reexecuted itself. Also, the fact that a big part of the service state is encoded anyway in the cgroup virtual file system would even allow us to resume execution without access to the serialization data. The reexecution code paths are actually mostly the same as the init system configuration reloading code paths, which guarantees that reexecution (which is probably more seldom triggered) gets similar testing as reloading (which is probably more common).
15. Starting the work of __removing shell scripts from the boot process__ we have recoded part of the basic system setup in C and moved it directly into systemd. Among that is mounting of the API file systems (i.e. virtual file systems such as /proc, /sys and /dev) and setting of the host-name.
16. Server state is introspectable and controllable via __D-Bus__. This is not complete yet but quite extensive.
17. While we want to emphasize __socket-based and bus-name-based activation__, and we hence support dependencies between sockets and services, we also support traditional inter-service dependencies. We support multiple ways in which such a service can signal its readiness: by forking and having the start process exit (i.e. traditional daemonize() behaviour), as well as by watching the bus until a configured service name appears.
18. There's an __interactive mode__ which asks for confirmation each time a process is spawned by systemd. You may enable it by passing systemd.confirm_spawn=1 on the kernel command line.
19. With the **systemd.default=** kernel command line parameter you can specify which unit systemd should start on boot-up. Normally you'd specify something like multi-user.target here, but another choice could even be a single service instead of a target. For example, out-of-the-box we ship a service //emergency.service// that is similar in usefulness to init=/bin/bash, but has the advantage of actually running the init system, hence offering the option to boot up the full system from the emergency shell.
20. There's a minimal UI that allows you to start/stop/introspect services. It's far from complete but useful as a debugging tool. It's written in Vala (yay!) and goes by the name of **systemadm**.
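
The consistency check from item 12 can be sketched in Python. This is a toy model under stated assumptions, not systemd's actual algorithm: the function names, the data layout (a dict mapping each job to the set of jobs it is ordered after) and the rule for picking which non-essential job to drop are all invented for illustration.

```python
def find_cycle(jobs, after):
    """Return a list of jobs forming an ordering cycle, or None if there is none.

    `after[a]` is the set of units that `a` must be started after.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    colour = {job: WHITE for job in jobs}
    stack = []

    def visit(job):
        colour[job] = GREY
        stack.append(job)
        for dep in sorted(after.get(job, ())):
            if dep not in colour:
                continue  # ordering against a unit outside the transaction
            if colour[dep] == GREY:
                return stack[stack.index(dep):]  # found a loop
            if colour[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        colour[job] = BLACK
        return None

    for job in sorted(jobs):
        if colour[job] == WHITE:
            cycle = visit(job)
            if cycle:
                return cycle
    return None


def make_consistent(jobs, after, essential):
    """Drop non-essential jobs until the ordering is cycle-free, or fail."""
    jobs = set(jobs)
    while True:
        cycle = find_cycle(jobs, after)
        if cycle is None:
            return jobs
        droppable = [job for job in cycle if job not in essential]
        if not droppable:
            raise ValueError("ordering cycle among essential jobs: " + " -> ".join(cycle))
        jobs.discard(droppable[0])  # arbitrary choice, for illustration only


# Made-up transaction: a is ordered after b, b after c, c after a.
after = {"a.service": {"b.service"}, "b.service": {"c.service"}, "c.service": {"a.service"}}
result = make_consistent(after.keys(), after, essential={"a.service"})
print(sorted(result))
```

Here only a.service was requested directly, so one of the jobs pulled in via Wants is removed to break the loop. If every job on the loop were essential, the transaction would fail instead.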

It should be noted that systemd uses many Linux-specific features, and does not limit itself to POSIX. That unlocks a lot of functionality a system that is designed for portability to other operating systems cannot provide.

===== Status =====
All the features listed above are already implemented. Right now systemd can already be used as a drop-in replacement for Upstart and sysvinit (at least as long as there aren't too many native Upstart services yet. Thankfully most distributions don't carry too many native Upstart services yet.)

However, testing has been minimal, our version number is currently at an impressive 0. Expect breakage if you run this in its current state. That said, overall it should be quite stable and some of us already boot their normal development systems with systemd (in contrast to VMs only). YMMV, especially if you try this on distributions we developers don't use.

===== Where is This Going? =====
The feature set described above is certainly already comprehensive. However, we have a few more things on our plate. I don't really like speaking too much about big plans but here's a short overview in which direction we will be pushing this:

We want to add at least two more unit types: __swap__ shall be used to control swap devices the same way we already control mounts, i.e. with automatic dependencies on the device tree devices they are activated from, and suchlike. __timer__ shall provide functionality similar to cron, i.e. starts services based on time events, the focus being both monotonic clock and wall-clock/calendar events. (i.e. "start this 5h after it last ran" as well as "start this every monday 5 am")
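
To make the two kinds of time events concrete, here is a small Python sketch. It is purely illustrative and assumes nothing about how the planned timer unit will be implemented; both function names are hypothetical.

```python
from datetime import datetime, timedelta


def next_monotonic(last_run_seconds, interval_seconds):
    """Monotonic event: fire a fixed interval after the service last ran.

    Both values are readings of a monotonic clock in seconds, so jumps of
    the wall clock do not affect the result.
    """
    return last_run_seconds + interval_seconds


def next_weekly(now, weekday, hour):
    """Calendar event: first instant strictly after `now` that falls on the
    given weekday (Monday is 0) at the given hour, in wall-clock time."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


# "start this 5h after it last ran"
print(next_monotonic(last_run_seconds=1000.0, interval_seconds=5 * 3600))

# "start this every monday 5 am", evaluated on Friday 30 April 2010, 10:46
print(next_weekly(datetime(2010, 4, 30, 10, 46), weekday=0, hour=5))
```

The monotonic variant only needs a counter that never jumps, whereas the calendar variant has to be recomputed whenever the wall clock is changed, which is why a notification for CLOCK_REALTIME jumps would be useful.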

More importantly however, it is also our plan to experiment with systemd not only for optimizing boot times, but also to make it the ideal __session manager__, to replace (or possibly just augment) **gnome-session, kdeinit** and similar daemons. The problem sets of a session manager and an init system are very similar: quick start-up is essential and babysitting processes is the focus. Using the same code for both uses hence suggests itself. Apple recognized that and does just that with launchd. And so should we: socket and bus based activation and parallelization is something session services and system services can benefit from equally.

I should probably note that all three of these features are already partially available in the current code base, but not complete yet. For example, you can already run systemd just fine as a normal user, and it will detect that it is run that way. Support for this mode has been available since the very beginning, and is in the very core. (It is also exceptionally useful for debugging! This works fine even without having the system otherwise converted to systemd for booting.)

However, there are some things we probably should //fix in the kernel// and elsewhere before finishing work on this: we need swap status change notifications from the kernel similar to how we can already subscribe to mount changes; we want a notification when CLOCK_REALTIME jumps relative to CLOCK_MONOTONIC; we want to allow normal processes to get some init-like powers; we need a well-defined place where we can put user sockets. None of these issues are really essential for systemd, but they'd certainly improve things.

===== You Want to See This in Action? =====
Currently, there are no tarball releases, but it should be straightforward to check out the code from our repository. In addition, to have something to start with, here's a tarball with unit configuration files that allows an otherwise unmodified Fedora 13 system to work with systemd. We have no RPMs to offer you for now.

An easier way is to download this Fedora 13 qemu image, which has been prepared for systemd. In the grub menu you can select whether you want to boot the system with Upstart or systemd. Note that this system is minimally modified only. Service information is read exclusively from the existing SysV init scripts. Hence it will not take advantage of the full socket and bus-based parallelization pointed out above, however it will interpret the parallelization hints from the LSB headers, and hence boots faster than the Upstart system, which in Fedora does not employ any parallelization at the moment. The image is configured to output debug information on the serial console, as well as writing it to the kernel log buffer (which you may access with dmesg). You might want to run qemu configured with a virtual serial terminal. All passwords are set to systemd.

Even simpler than downloading and booting the qemu image is looking at pretty screen-shots. Since an init system usually is well hidden beneath the user interface, some shots of systemadm and ps must do:

That's systemadm showing all loaded units, with more detailed information on one of the getty instances.

That's an excerpt of the output of ps xaf -eo pid,user,args,cgroup showing how neatly the processes are sorted into the cgroup of their service. (The fourth column is the cgroup, the debug: prefix is shown because we use the debug cgroup controller for systemd, as mentioned earlier. This is only temporary.)

Note that both of these screenshots show an only minimally modified Fedora 13 Live CD installation, where services are exclusively loaded from the existing SysV init scripts. Hence, this does not use socket or bus activation for any existing service.

Sorry, no bootcharts or hard data on start-up times for the moment. We'll publish that as soon as we have fully parallelized all services from the default Fedora install. Then, we'll welcome you to benchmark the systemd approach, and provide our own benchmark data as well.

Well, presumably everybody will keep bugging me about this, so here are two numbers I'll tell you. However, they are completely unscientific as they are measured for a VM (single CPU) and by using the stop timer in my watch. Fedora 13 booting up with Upstart takes 27s, with systemd we reach 24s (from grub to gdm, same system, same settings, shorter value of two bootups, one immediately following the other). Note however that this shows nothing more than the speedup effect reached by using the LSB dependency information parsed from the init script headers for parallelization. Socket or bus based activation was not utilized for this, and hence these numbers are unsuitable to assess the ideas pointed out above. Also, systemd was set to debug verbosity levels on a serial console. So again, this benchmark data has barely any value.

===== Writing Daemons =====
An ideal daemon for use with systemd does a few things differently than things were traditionally done. Later on, we will publish a longer guide explaining and suggesting how to write a daemon for use with systemd. Basically, things get simpler for daemon developers:

* We ask daemon writers not to fork or even double fork in their processes, but run their event loop from the initial process systemd starts for you. Also, don't call setsid().
* Don't drop user privileges in the daemon itself, leave this to systemd and configure it in systemd service configuration files. (There are exceptions here. For example, for some daemons there are good reasons to drop privileges inside the daemon code, after an initialization phase that requires elevated privileges.)
* Don't write PID files
* Grab a name on the bus
* You may rely on systemd for logging, you are welcome to log whatever you need to log to stderr.
* __Let systemd create and watch sockets for you__, so that socket activation works. Hence, interpret $LISTEN_FDS and $LISTEN_PID as described above.
* Use SIGTERM for requesting shut downs from your daemon.
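
Interpreting $LISTEN_FDS and $LISTEN_PID takes only a few lines. The following Python sketch follows the description in the feature list (descriptors start at 3, and $LISTEN_PID must match the receiving process). The function names and the constant are chosen for this example and are not taken from any reference implementation.

```python
import os
import socket

FIRST_PASSED_FD = 3  # the text says passed sockets start at file descriptor 3


def listen_fds(environ=None, pid=None):
    """Return the file descriptor numbers passed to this process, if any.

    `environ` and `pid` default to the real environment and PID; they are
    parameters only so that the logic can be exercised without systemd.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The variables may have been inherited from a parent process,
        # in which case they are not meant for us.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # unset or malformed: not socket activated
    if count < 0:
        return []
    return list(range(FIRST_PASSED_FD, FIRST_PASSED_FD + count))


def adopt_sockets():
    """Wrap each passed descriptor in a socket object (Python 3.7 or later
    detects family and type from the descriptor)."""
    return [socket.socket(fileno=fd) for fd in listen_fds()]


print(len(listen_fds()), "socket(s) passed in")
```

A daemon would call ''adopt_sockets()'' once at start-up and fall back to creating its own listening sockets when the list is empty, which keeps it usable on systems without systemd.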

The list above is very similar to what Apple recommends for daemons compatible with launchd. It should be easy to extend daemons that already support launchd activation to support systemd activation as well.

Note that systemd supports daemons not written in this style perfectly as well, already for compatibility reasons (launchd has only limited support for that). As mentioned, this even extends to existing inetd capable daemons which can be used unmodified for socket activation by systemd.

So, yes, should systemd prove itself in our experiments and get adopted by the distributions it would make sense to port at least those services that are started by default to use socket or bus-based activation. We have written proof-of-concept patches, and the porting turned out to be very easy. Also, we can leverage the work that has already been done for launchd, to a certain extent. Moreover, adding support for socket-based activation does not make the service incompatible with non-systemd systems.

===== FAQs =====
Who's behind this?
    Well, the current code-base is mostly my work, Lennart Poettering (Red Hat). However the design in all its details is the result of close cooperation between Kay Sievers (Novell) and me. Other people involved are Harald Hoyer (Red Hat), Dhaval Giani (Formerly IBM), and a few others from various companies such as Intel, SUSE and Nokia.
Is this a Red Hat project?
    No, this is my personal side project. Also, let me emphasize this: the opinions reflected here are my own. They are not the views of my employer, or Ronald McDonald, or anyone else.
Will this come to Fedora?
    If our experiments prove that this approach works out, and discussions in the Fedora community show support for this, then yes, we'll certainly try to get this into Fedora.
Will this come to OpenSUSE?
    Kay's pursuing that, so something similar as for Fedora applies here, too.
Will this come to Debian/Gentoo/Mandriva/MeeGo/Ubuntu/[insert your favourite distro here]?
    That's up to them. We'd certainly welcome their interest, and help with the integration.
Why didn't you just add this to Upstart, why did you invent something new?
    Well, the point of the part about Upstart above was to show that the core design of Upstart is flawed, in our opinion. Starting completely from scratch suggests itself if the existing solution appears flawed in its core. However, note that we took a lot of inspiration from Upstart's code-base otherwise.
If you love Apple launchd so much, why not adopt that?
    launchd is a great invention, but I am not convinced that it would fit well into Linux, nor that it is suitable for a system like Linux with its immense scalability and flexibility to numerous purposes and uses.
Is this an NIH project?
    Well, I hope that I managed to explain in the text above why we came up with something new, instead of building on Upstart or launchd. We came up with systemd due to technical reasons, not political reasons.
    Don't forget that it is Upstart that includes a library called NIH (which is kind of a reimplementation of glib) -- not systemd!
Will this run on [insert non-Linux OS here]?
    Unlikely. As pointed out, systemd uses many Linux specific APIs (such as epoll, signalfd, libudev, cgroups, and numerous more), a port to other operating systems appears to us as not making a lot of sense. Also, we, the people involved are unlikely to be interested in merging possible ports to other platforms and work with the constraints this introduces. That said, git supports branches and rebasing quite well, in case people really want to do a port.
    Actually portability is even more limited than just to other OSes: we require a very recent Linux kernel, glibc, libcgroup and libudev. No support for less-than-current Linux systems, sorry.
    If folks want to implement something similar for other operating systems, the preferred mode of cooperation is probably that we help you identify which interfaces can be shared with your system, to make life easier for daemon writers to support both systemd and your systemd counterpart. Probably, the focus should be to share interfaces, not code.
I hear [fill one in here: the Gentoo boot system, initng, Solaris SMF, runit, uxlaunch, ...] is an awesome init system and also does parallel boot-up, so why not adopt that?
    Well, before we started this we actually had a very close look at the various systems, and none of them did what we had in mind for systemd (with the exception of launchd, of course). If you cannot see that, then please read again what I wrote above.

===== Contributions =====

We are very interested in patches and help. It should be common sense that every Free Software project can only benefit from the widest possible external contributions. That is particularly true for a core part of the OS, such as an init system. We value your contributions and hence do not require copyright assignment (Very much unlike Canonical/Upstart!). And also, we use git, everybody's favourite VCS, yay!

We are particularly interested in help getting systemd to work on other distributions, besides Fedora and OpenSUSE. (Hey, anybody from Debian, Gentoo, Mandriva, MeeGo looking for something to do?) But even beyond that we are keen to attract contributors on every level: we welcome C hackers, packagers, as well as folks who are interested in writing documentation or contributing a logo.

===== Community =====

At this time we only have a source code repository and an IRC channel (#systemd on Freenode). There's no mailing list, web site or bug tracking system. We'll probably set something up on freedesktop.org soon. If you have any questions or want to contact us otherwise we invite you to join us on IRC!

Update: our GIT repository has moved.

posted at: 10:46 | path: /projects | 336 comments
Posted by Raphael (esarbee) at Fri Apr 30 13:12:10 2010
Woah, quite a read! While I somewhat dislike the idea of yet another services management solution, I like it coming from you. You keep rocking the boat - as you did with PA, which I like very much - and that's a good thing.

I admit that I didn't read it with the attention it deserved and promise to do so later today. I do hope, however, that creating custom services will remain at least as easy as it is now. ;)

Posted by Alex Murray at Fri Apr 30 13:27:26 2010
This sounds like the kind of innovation Linux needs - a clearly well thought out solution to a problem, not just someone scratching an itch. Great work as always Lennart. The simplicity of the design (using implicit dependencies rather than hard-coding them) is awesome. Sounds perfect for the embedded space as well.

Would be awesome to see this get picked up by the big players (Ubuntu, Fedora, OpenSUSE, Debian etc).

Posted by Marco Barisione at Fri Apr 30 13:46:42 2010
First you broke networking then sound and now booting? :P

Posted by Kay Sievers at Fri Apr 30 14:02:24 2010
Sounds great, nice announcement. It runs well here on my box. Still a looong way to go ...

Happy so far, and good to know that all the many hours we spent on the phone led to something that matters, whatever it will look like in the end. :)

Posted by Luiz Augusto von Dentz at Fri Apr 30 14:09:12 2010
Pretty amazing I must say, I just wonder now if systemd would incorporate things like powertop, monitoring processes/detecting process responsiveness and things like that.

Posted by Michael Scherer at Fri Apr 30 14:29:07 2010
This looks nice. But maybe you should have split the article into smaller pieces, and published them one by one.

About swap, do you think systemd could be extended to dynamically create swap files on the fly, as done on OS X, as part of the babysitting? This would allow distributions to have a simpler partitioning step, since users would no longer need to care about this. Of course, some barriers should be added to avoid filling the hard drive with swap files (and of course, we should be sure that swap files are as fast as swap partitions).

Posted by Joshua Pritikin at Fri Apr 30 15:15:04 2010
It sounds like you are fixing real design flaws in upstart. I hope you are aware of runit and bcron. http://smarden.org/runit/ http://untroubled.org/bcron/bcron.html

These are not complete solutions like you are proposing with systemd, but you ought to be familiar with the design of these two tools. I find them exceptionally well engineered.

Posted by John Drinkwater at Fri Apr 30 16:19:07 2010
What was the reason for naming services as /etc/systemd/system/avahi.service rather than /etc/systemd/system/service/avahi (same goes for all units)
Would be more readable (in ps, etc), and get rid of file name extension creep…

Posted by Grahame Bowland at Fri Apr 30 16:21:44 2010
A major advantage of the startup sequence being in shell is that an administrator can easily insert bits of code to track down problems. It sounds like your design will make it quite a bit more difficult to track down odd things.

For example, I had a RHEL5.5 machine the other day with a dodgy autofs setup; whenever 'autofs' started it remounted '/' readonly. Easy to track down at the moment, but it sounds like it might be trickier with systemd.

While there's a bit of a performance hit, I think on servers where you're booting very infrequently bootup speed is worth trading for determinism and transparency, plus the ability to modify and debug the system easily.

So, to be positive: how would you approach figuring out which startup script / service is causing a nasty problem under systemd?

Posted by Damien Thebault at Fri Apr 30 16:22:24 2010
This looks really nice and it removes a lot of problems for daemon writers.

In addition, since it encourages a certain design of daemons (no fork, error messages on stderr), I think it's then even easier to use those daemons from any init system.

I really think that something like this should be used in many distributions and become the standard init on linux.

Posted by PJ at Fri Apr 30 17:05:41 2010
re: history

This reminds me a little bit of djb's daemontools thing.  And also of Richard Gooch's Bootscripts ca. 2002 ( http://www.safe-mbox.com/~rgooch/linux/boot-scripts/index.html ).

You seem to have taken the next step, however, and got services essentially autoconfiguring their own dependencies, which is awesome.

re: 'shell scripts are slow'
As Grahame Bowland points out above, the advantage of shell scripts is ease-of-debugging/modification.  I see a few options:
* move them to some dynamic language like python or groovy or something where they can be compiled and so run faster
* provide stripped-down versions of the common shell utils (awk, sed, etc) as builtins to the shell that fall back to calling the full version in complicated cases.  So the simple "sed 's/foo/bar/;'" case could be optimized into a shell builtin. thereby saving a process-spawn.

Also re: startup tools, have you looked at start-stop-daemon ?

Posted by nine at Fri Apr 30 17:36:11 2010
It's not an issue for SSD drives which will surely replace disk drives sooner or later, but: doesn't starting a whole bunch of daemons at once end up spawning a lot of IO requests, causing your disk to spend a whole lot of time on seeking overhead rather than actually reading/writing data?

Posted by Davide Repetto at Fri Apr 30 17:38:19 2010
Very interesting stuff indeed. As usual you rock Lennart!

I understand there may be an option to automatically shutdown seldom used services, do you envision a simple time-out or are you going for a self adjusting timeout?

Posted by sjansen at Fri Apr 30 17:50:08 2010
@nine That's the kernel's responsibility. Perhaps it was a valid concern a couple decades ago, but today it makes sense to design a system that takes advantage of Linux's high quality IO schedulers.

Posted by James Mansion at Fri Apr 30 18:30:01 2010
I don't think Solaris SMF is really the only other major system that handles dependencies and on-demand startup.

You want a system where you can say 'always start A' and 'start B,C on demand' and 'C depends on B' so starting C will start B?  Windows does this.

Posted by Paul Jakma at Fri Apr 30 18:53:02 2010
The process group stuff sounds very close to the contracts stuff put in place in Solaris for SMF. Just in case you're interested in looking over there.

Posted by Robert Szalai at Fri Apr 30 19:18:32 2010
As a personal opinion I very much like this idea. Am I right thinking that to use this to full potential one would need to modify the daemons? Also would it imply that properly written daemons won't need any init scripts? I pretty much dislike the idea of having scripts altogether, the daemons could just read their configuration files. Just wondering why would this be unfeasible, could someone enlighten me, in case I'm too hopeful?

Posted by Anonymous at Fri Apr 30 20:29:48 2010
Very impressive architecture.

I agree with your complaint about shell scripts, and at the same time I want to preserve the configurability they provide.  One crazy notion that might work: what if you use a compiled language like Vala to write the startup scripts, keep the Vala source as the canonical location, compile a binary from the source, and use make-style logic to decide if you need to recompile from the source?  With sufficient library support from systemd, vala should prove nearly as comfortable as shell, but you end up with a fast compiled binary to run, and in-process handling of things like string operations.

(If you want to avoid process startup times entirely, you could compile all the Vala configuration files into a single binary with various modules/functions/etc.)

Posted by Peter Lister at Fri Apr 30 20:31:25 2010
Damn good thinking.  As a sys admin I have hated sysvinit (and the crap that app authors and distributions put in it) for years.

Can you expand on what you think should happen at suspend / hibernate? And what happens for hot-swap hardware?

It seems to me that power-up, suspend/resume and discovery/insertion/removal of hardware are all general events that should be reacted to correctly.  The discovery of a storage medium and the filesystem(s) on it, the subsequent mounting and the starting of appropriate services are essentially the same whether it's /home on a SCSI disk detected at start-up and mounted so that logins can happen or just my inserting my MP3 player to have its podcasts updated.

Posted by Eric Moret at Fri Apr 30 21:50:08 2010
I love everything I have read so far. There is one thing I ought to mention though! In the same vein as Polyp Audio (later renamed to Pulse Audio), you should be aware that System D has a somewhat negative meaning in French. See the Wikipedia entry on System_D.

Posted by Colin Guthrie at Fri Apr 30 22:01:07 2010
Awesome work. I now forgive you for spending time away from #pulseaudio :p

My two major problems with this article:
1. It's very biased towards coffee. I am a tea drinker you insensitive clod!

2. PID 1 is a silly name. You should have called it PID v2.0 like all the cool kids do on the web!

I can't think of any real/sensible criticism so I'll shut up now.

KUTGW as always :)

Col

Posted by Anon at Fri Apr 30 22:15:19 2010
Just when the last init replacement fell the init replacement war starts back up again!

I'd just like to see people standardise on one ideally but any idea if ChromeOS or MeeGo would benefit from this?

Posted by Ahmed Kamal at Fri Apr 30 22:34:00 2010
Wow, quite the read. Extremely impressive design and analysis. Please keep the informative posts coming, Lennart. And please keep pushing Linux forward :)

Posted by Dieter_be at Fri Apr 30 22:51:33 2010
very interesting read.
I don't think shell scripts are bad though. Sure they are slower and cause bumps in your pids, but they are so easy to hack on.  I think that's the most important.

Posted by Colin McCabe at Fri Apr 30 22:56:29 2010
Looks good so far!

Is the /sbin/service and chkconfig interface going to change with systemd?

Posted by Richi Plana at Fri Apr 30 23:14:09 2010
OMG! Finally!! Several people in the past (myself included) have opened up the idea of implementing system startup in a smarter way (only starting services that are needed and dynamically start things a'la xinetd), but would always get shot down with all sorts of excuses or the infamous "code it yourself" remark.

Thanks for starting this! Hope things go far.

Posted by anonymous at Sat May 1 00:39:14 2010
I've not read through all of this yet, but want to suggest Haskell as a possible shell script replacement. Haskell is a language with precise semantics - that translates to very tight control of state and could enable very succinct specification of shell script behaviour. you can really use it, it is very easy to understand at its core (lambda calculus). you could connect with that community, they are very clever i suppose and the code is easy to read if what it describes is "boring" or easy. it might just be perfect. speed is in the same league as C, I think it will use LLVM very soon as well. just have a peek and look at some (easy) code examples!

And there are already replacements for grep, regular expressions and stuff like that to be found in the package repository at hackage.haskell.org , albeit maybe not perfectly structured.

this is just a suggestion :=) take it for what it's worth...

Posted by sztanpet at Sat May 1 00:44:40 2010
I was wondering if it would add any value to have Lua as the configuration format, it might be overkill but having a full fledged scripting language might come in handy

Posted by Claes at Sat May 1 01:27:57 2010
Very interesting. When you eventually start to design the scheduling functionality (cron "replacement"), please consider applying iCalendar semantics (RFC2445) to scheduling rules.

Posted by Richard at Sat May 1 06:14:16 2010
You make a good point about shell script inefficiency (repeated calls to grep,awk,cut etc).

Why not have a slightly larger bash (let's call it "busybash" in reference to busybox) that has some of these built in?

Bash already provides builtins such as echo, kill and test - why not expand the range to include grep,sed,ls,mv  and a few others.

(Bash does have support for loading extensions, but that's not really the point here)

Posted by codebeard at Sat May 1 06:16:20 2010
@ People suggesting replacing shell scripts with python/vala/haskell/whatever

If the goal is to retain easy debugging as /bin/sh provides, then replacing the scripts with another language is not going to achieve that. Part of the reason that shell scripts are so easy to debug and understand is that they are written in a very simple language that 90% of unix administrators can read and write. Replacing them with scripts written in your favourite scripting language, no matter how easy it seems to you personally, is bound to reduce the ease of debugging.

Actually, I am confident that most of the shell scripts can be removed without losing easy debugging with systemd. Here's why:

A quick survey of the init scripts on my system shows six main functions (here ordered from most common to least common):
1) Process control (writing/checking PID files, signalling daemons)
2) Setting environment variables and daemon arguments
3) Checking to make sure certain requirements are met (kernel modules, other services, file existence etc)
4) Setting up a working environment (creating special files, setting SELinux contexts and file permissions)
5) Waiting and checking to see if something has completed or is running correctly, handling timeouts, etc.
6) Saving/loading states on shutdown/startup

Now, the reason that many scripts can be done away with is that much of this can be handled better by systemd.

Process control (1), the most common function of init scripts, is handled by systemd. And using the systemd utilities, instead of having to hack around with PID files, we will be able to see exactly what's running and what's not and manage all of this in a consistent manner.

As for setting environment variables and daemon arguments (2), I think this one needs to be thought about more. I think it should be possible to handle it for 90% of shell scripts, but I will post another comment about that in a bit.

Checking requirements (3) can be handled in most cases by simply setting the correct dependencies for the service. For dependencies on kernel modules, this should be a defined unit in systemd.

Sometimes a script will check requirements such as certain paths for the service data etc, but I would say a lot of this is simply in the init scripts to be distro-independent and that actually many of these checks can simply be removed. That is, if I am using a modern distro with package management, the data files will always be at location X, which is also more or less guaranteed to exist if the package hasn't been messed with (otherwise all bets are off anyway). If people have moved things to non-standard places or removed config files or something, then they should be responsible for making the appropriate changes to the service definitions.

Many checks are a little unnecessary anyway, in the sense that if something is wrong, the service should die gracefully and give the appropriate error messages, instead of duplicating these checks in the init script. Where checking config file syntax or file permissions may be useful is where you want to restart/reload a service; it's better to get a message that you made a typo in the config file than for the service to shutdown and then fail to start up again. So perhaps this case can be handled by having a PreReload/PreRestart parameter in a service definition for running a program/script to check things in this case.

Setting up working environments (4) should really be handled by either the service itself or by post-install scripts of the distro's package. The remainder of cases can still be handled by scripts. Setting the right file permissions and security also falls into the category of should-be-managed-properly-by-distro.

Waiting for things and handling timeouts (5) should ideally be handled by systemd. If configured to, it should try to restart a service that dies, possibly retrying a few times before giving up and putting the service in a maintenance state (like in Solaris).
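
The restart policy described here (retry a dying service a few times, then give up and mark it as being in maintenance) might look roughly like the following. It is a minimal Python sketch under stated assumptions: `supervise`, `max_restarts` and the state names are made up for illustration and are not systemd's or SMF's actual interface.

```python
import subprocess
import sys


def supervise(argv, max_restarts=3):
    """Run argv; on a non-zero exit, restart it up to max_restarts times.

    Returns (state, restarts): state is "exited" after a clean exit, or
    "maintenance" once the restart budget has been used up.
    """
    restarts = 0
    while True:
        status = subprocess.call(argv)
        if status == 0:
            return "exited", restarts
        if restarts >= max_restarts:
            return "maintenance", restarts
        restarts += 1  # the service died, so try again


# A stand-in "service" that always fails immediately.
failing = [sys.executable, "-c", "raise SystemExit(1)"]
state, restarts = supervise(failing, max_restarts=2)
```

A real supervisor would also want a delay between attempts and a time window for counting failures; both are left out to keep the sketch short.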

Saving and loading states (6) should be handled by the service itself.

The scripts that are the real culprits for being inefficient are the ones that don't actually manage daemons but instead set up whole subsystems such as networking and file systems, with all the hacky config file grepping. It is nice to see that at least file systems will be handled natively by systemd. Maybe networks could also be handled, or perhaps systemd can be integrated with networkmanager or whatever.

Posted by codebeard at Sat May 1 06:18:50 2010
As for the exact way of debugging things with systemd, I don't know how it currently works, but I assume that the following will be possible:
- A log of services that started, commands that were run, etc. and what chain of dependencies, events or other relationships caused them to be started
- Trace what happened to the sockets that systemd made for a service (did the service ever take control of it, etc)
- Force the serialisation of starting some or all of the services for tracking down race conditions
- Set an arbitrary script or command to run before or after a certain service or every service. This should satisfy those people wanting to be able to hack shell scripts.
- It would be completely awesome if you could "connect to" a service which hasn't forked itself into the background, with the ability to read the program's stdout and stderr in real-time as well as possibly interact with the service through its stdin. It would be even more awesome if you could set an option in the service definition which would start the service with its own pts so that you could connect to it and interact with it using a screen-like program -- some services give you a nice debug console when you run them in a tty so this would be great.

Posted by codebeard at Sat May 1 06:20:33 2010
I think it is important that systemd have some understanding of providing things in a timely fashion. For example, if it sets up an AF_INET socket unit but the service never manages to start properly (for example, if it hangs somewhere during startup), then eventually the buffers for a UDP socket are going to fill up and incoming connections will time out for a TCP socket. On slow systems this may actually mean that trying to start every service at once will lead to intermittent failures with services trying to connect to another daemon but timing out.

For example, let's say I have a web application (in apache) that needs to connect to a mysql socket (AF_UNIX). So, systemd creates the AF_INET socket for apache, and the AF_UNIX socket for mysql, then starts both services simultaneously. Let's say my database is pretty hefty, and mysql takes 40 seconds to get everything started (keep in mind that maybe another 10 or 20 services are also trying to start at the same time, so this isn't unreasonable). In the meantime, apache has only taken 3 seconds to start, and it takes control of the AF_INET socket that systemd made for it and users are now able to connect. However, a user that connects to it just after this will get a messed up webpage with errors about a MySQL timeout since the timeout was set in PHP to 30 seconds.

Can anything be done to avoid these issues?

Posted by anonymous at Sat May 1 06:27:48 2010
Seconding lua.

It's the ideal scripting language for booting utils:

1. Fast and portable. Less startup times than shell with more functionality, with less worries about bashisms/kshisms/sticking to POSIX.

2. Easily augmented via its C API... closer to the metal than Python and Vala.

It'd be great to run this from a stripped down initrd with only lua, glibc+libudev+etc. and a fallback dash shell.

At the very least it should be seriously considered as the config language, instead of plain text freedesktopish files that aren't as easily augmented.

Posted by codebeard at Sat May 1 06:36:14 2010
Okay, to add one final comment for now, I wanted to ask this:
Instead of rewriting daemons to use some extra file descriptors given to it, wouldn't it be possible to create the socket and then transparently hand it over to the daemon when the daemon tries to create it? It may require a kernel patch, but wouldn't that be a lot more elegant? Even legacy or closed source daemons (or open source daemons with uncooperative developers) could be made to use a socket from systemd this way.

For example, let's say that modifying service foo to use a socket from systemd is not practical. So, systemd sees that the kernel supports this feature, and creates a socket /var/run/foo/foo.sock before starting the foo service and informs the kernel about it. As the foo service initialises, it makes the syscall to create /var/run/foo/foo.sock, and instead of receiving an "already exists" error, it will transparently be given the socket already made for it by systemd. As far as the foo service is concerned, it had just made the socket, when really it was made by systemd. They all live happily ever after.

Is there some reason that this couldn't work? Surely something like this would greatly reduce the amount of work that needs to be done to get systemd doing useful things on current systems.
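
For contrast with the kernel-level handover proposed above, here is a sketch of the ordinary approach the comment wants to avoid: the supervisor creates the listening socket and the daemon inherits the file descriptor across exec. It is written in Python for brevity and assumes a POSIX system; `demo`, `CHILD` and the socket path are hypothetical names for illustration, not systemd's actual protocol.

```python
import os
import socket
import subprocess
import sys
import tempfile

# Stand-in daemon: adopts the inherited descriptor instead of calling bind().
CHILD = r"""
import socket, sys
listener = socket.socket(fileno=int(sys.argv[1]))  # inherited listening socket
conn, _ = listener.accept()
conn.sendall(b"hello from daemon: " + conn.recv(64))
conn.close()
"""


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.sock")
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        listener.bind(path)
        listener.listen(8)

        # A client connects before the daemon exists; the kernel queues it.
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.settimeout(10)
        client.connect(path)
        client.sendall(b"ping")

        # Only now start the daemon, handing it the listening descriptor.
        fd = listener.fileno()
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD, str(fd)], pass_fds=[fd])
        listener.close()

        chunks = []
        while True:
            data = client.recv(64)
            if not data:
                break
            chunks.append(data)
        client.close()
        child.wait()
        return b"".join(chunks)


reply = demo()
```

The client's request is answered even though it connected before the daemon was running, which is the property the kernel-side proposal would also have to preserve.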

Posted by codebeard at Sat May 1 06:52:05 2010
Looks like I missed posting one of the comments I had written earlier. Oops. Really this is my last comment for now.

There needs to be an easy way to set environment variables and daemon arguments that avoids having to run any kind of script, in any language, if possible.

In the current init script system, there is usually a file like /etc/sysconfig/blah for the blah service which is included in the init script. It will define environment variables for the service and may also be used to set certain parameters on the command line for the daemon. It would be great if systemd could understand some of this. Setting these things in the service definition is not really enough, or at least it needs to be possible to override options in a file made for end-user modification. Users should not need to modify the service definitions for routine configuration. In other words, there needs to be a place that users can look in to change options for a program (e.g. which port a program runs on) without having to mess with the service definitions.

To facilitate this, perhaps systemd needs to understand a basic kind of variable, so that it can be used in a service definition.

That is, you might have foo.service:
[Service]
ExecStart=/usr/sbin/foo ${domain!} -n ${connections:10} ${debug?-d:-f} ${extra_args}

Where systemd would consult some /etc/sysdconfig/foo or something and read in any values before parsing the foo.service file.

It might look something like:
# comments blah
domain=example.org
connections=5
extra_args=--cores ${connections}

Above I use some possible syntax for these, such as ! for saying that the domain variable is not optional, the : for giving a default value if it is not set, the ? for treating the variable as a boolean (yes/1/true and no/0/false/unset) and inserting one value or another.
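
To make the proposed syntax concrete, here is a rough Python sketch of an expander for the `!`, `:` and `?` forms. Everything in it is an assumption for illustration (the function names, the regular expression, expanding config values in file order, treating a plain unset variable as empty); it is not something systemd provides.

```python
import re

# ${name}, ${name!}, ${name:default} or ${name?if_true:if_false}
_VAR = re.compile(r"\$\{(\w+)(?:(!)|:([^}]*)|\?([^:}]*):([^}]*))?\}")
_TRUE = {"yes", "1", "true"}


def expand(template, values):
    def repl(match):
        name, required, default, if_true, if_false = match.groups()
        value = values.get(name)
        if required:
            if value is None:
                raise KeyError(name)  # mandatory variable is missing
            return value
        if if_true is not None:  # boolean form
            return if_true if (value or "").lower() in _TRUE else if_false
        if value is None:
            return default or ""  # default form, or plain and unset
        return value
    return _VAR.sub(repl, template)


def parse_config(text):
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, raw = line.partition("=")
        # Later lines may refer to variables defined earlier in the file.
        values[key.strip()] = expand(raw.strip(), values)
    return values


config = parse_config(
    "# comments blah\n"
    "domain=example.org\n"
    "connections=5\n"
    "extra_args=--cores ${connections}\n"
)
command = expand(
    "/usr/sbin/foo ${domain!} -n ${connections:10} ${debug?-d:-f} ${extra_args}",
    config,
)
```

With the example files above, `command` becomes `/usr/sbin/foo example.org -n 5 -f --cores 5`: `debug` is unset, so the false branch `-f` is chosen.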

Of course the exact details of this would need to be considered carefully to try and cover a large range of cases (the goal might be to be able to supersede 80% of the init scripts in the default installation of some distro).

Posted by Christoph at Sat May 1 10:26:31 2010
Lennart, once again you have proven to be a genius! This is a pretty long article, but everything is well explained, logical and easy to read.

I am surprised how far systemd has come by now and I think it has great potential. Looking forward to reading more about it!

Posted by Peter Lister at Sat May 1 12:36:34 2010
@codebeard

Rewriting the daemons is a good thing!

Too many daemons still have stupid amounts of command line config, or require coddling with bash.

Daemons should just start, find their configs and get on with things without holding everyone else up.  I certainly do NOT want a kernel hack just because software authors can't be bothered to improve...

Posted by Richard Brooks at Sat May 1 21:02:52 2010
Excellent work. I hope the Upstart developers see the superiority of your solution and will help adopt systemd as the new standard PID 1.

Posted by codebeard at Sat May 1 21:15:27 2010
@ Peter Lister

How is having to rewrite daemons a good thing? Most of them are perfectly fine as they are, as well as being written well for portability. Most do not require stupid amounts of command line config or huge bash scripts. If we can have all of the benefits and keep the simplicity of the system, without having to patch every daemon, then systemd can be adopted much more easily.

My proposal to have the kernel copy the already created socket when a daemon bind()'s is really no different from mounting a file system when a program does an open(). So, I wouldn't call it any more of a hack.

Here's what I envision:
systemd:
fork();
sock = socket();          /* e.g. fd 3 */
bind(sock, addr);
fcntl(sock, F_INHERIT);   /* F_INHERIT is the hypothetical new flag proposed here */
exec();

daemon:
...
sock = socket();          /* e.g. fd 4 */
bind(sock, addr);
...


Now, if addr matches the addr from a previously defined socket of the same type and with F_INHERIT, then the kernel copies the appropriate data structures (including any connections already made to the socket) from the socket 3 into the socket 4, and removes socket 3. This process of searching for a matching socket is only done for processes which are marked with having at least one F_INHERIT file descriptor.

Posted by Adam York at Sun May 2 00:50:08 2010
Sounds great.  My only worry is that this will take you away from Pulseaudio development.  My worries justified?

Posted by sam at Sun May 2 01:25:58 2010
To the commenter who suggested systemd was a bad name:

From the wikipedia entry:

System D [in French, Système D] is a shorthand term that refers back to the French word débrouillard[1]. The verb se débrouiller means "to untangle." The basic theory of System D is that it is a manner of responding to challenges that requires one to have the ability to think fast, to adapt, and to improvise when getting the job done.

That sounds just about perfect to be honest "untangling the boot process" - yes please.

Posted by horse at Sun May 2 06:20:48 2010
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian Kernighan

Posted by Dude at Sun May 2 07:39:37 2010
WOW! Let's get this baby mainstream. Hopefully it will be easy for any distro to adopt.

Posted by Charles at Sun May 2 17:11:24 2010
@codebeard

Such a thing could be accomplished without resorting to a kernel hack, by using an optionally enabled LD_PRELOAD hook instead. Interesting idea...

Posted by David Björkevik at Mon May 3 01:06:26 2010
If the session manager would start using cgroups to kill off all the users' processes on session end, will this not break screen(1)?

Posted by Diego Calleja at Mon May 3 01:50:35 2010
Shell scripts are slow, yes, but nobody has been able to prove they are a big bottleneck when booting. Maybe systemd will be so fast that bash will become the bottleneck, who knows. But until then, this "shell scripts are bad" attitude doesn't really make a lot of sense. There are more important things to do than rewriting bash scripts in C, IMHO

Posted by Peter Götz at Mon May 3 02:49:43 2010
Lennart,
this looks quite interesting! I downloaded, built and installed your code on Ubuntu 10.04. It starts systemd as I can see, but I get the following error:

Failed to mount /cgroup/debug: No such file directory.

I'm new to control groups. Any obvious hints for what I'm doing wrong? Thanks in advance!

Posted by codebeard at Mon May 3 03:40:25 2010
@ Charles

Actually, an LD_PRELOAD hook wouldn't work because bind()/etc are system calls, not part of a library.

Posted by deitarion/SSokolow at Mon May 3 04:23:53 2010
@Dude: Careful. Aside from the whole "fix a broken part by wrapping it rather than replacing it" aspect, PulseAudio's biggest problem was distros adopting it before it was mature enough.

Posted by vitaly at Mon May 3 10:14:06 2010
For lean, clean, portable and reliable service initialization, see perp, "the perpetrator":

  http://b0llix.net/perp/

Posted by Sherman T Potter at Mon May 3 13:45:39 2010
I used to think  PID 1 was always init. Things change.  We have new Solaris servers at work. I found out the init process on Solaris virtual zones could be ANY PID number.

Posted by Chirs Carpenter at Mon May 3 15:13:24 2010
Aren't we basically heading toward a microkernel here? We're abstracting it to where all the services are controlled by one central process (kernel?) that watches everything and reboots anything that crashes, etc. It definitely sounds an awful lot like a microkernel (Not saying this is a bad thing). However, maybe we should start taking another look at GNU hurd?

Posted by owczi at Mon May 3 15:14:35 2010
This is definitely the way forward. You've once helped to sort out the mess the Linux sound servers were, now it's time to clean up another area - way to go! Of course - as long as the final solution is well balanced (running services vs. on demand services) and we don't get into the situation Windows has been in for years now: you log on and you see your desktop - which gives you the impression that it's ready to work, while it will be loading services for the next minute or so before you can actually use the system. This is the GUI I'm talking about, but it does rely on system services that need to be running.

Posted by Tel at Mon May 3 15:45:41 2010
You are perfectly correct about upstart's event driven system being the exact opposite approach to what it should be. One of the big problems with upstart is that if something is wrong with the system (e.g. some important component in the chain is missing) then upstart can't tell you anything useful.

For example, you expect that FOO should be running so you type:

# initctl status FOO

And all it can tell you is that FOO is stopped or waiting. Won't tell you WHY, which is what you really need to know. This gets even worse because some of the events can be given arbitrary names that have nothing to do with the package that provides them, and nothing to do with the program that might be running. As a consequence, if the thing is waiting for one of these type of events, even when you know exactly what event you need, you still don't know how to make that event happen, or which extra package you might need to install in order to make it happen.

Even when the upstart system is working correctly, you still cannot ask for status information about events that have been emitted in the past. That's because events are not actually jobs, you can only ask for status about jobs.

----

Finally, on a completely different issue, if you get a script-heavy system starting all sorts of daemons and you boot that system on VirtualBox a few times, you will see that it boots amazingly fast. To me, this suggests that CPU is not even slightly the bottleneck at the boot process, so you can forget about looking for optimizations in running grep fewer times. I'm 80% sure that the reason VirtualBox can reboot a machine so fast is disk cache, implying that what really matters for fast boots is how many disk files you touch -- doing stuff in parallel is pretty much a waste of time when disk is throttling you.

Shell scripts are very bad (especially nested scripts) for forcing a lot of seeks because the shell just reads a line at a time (presuming your system is not smart enough to actively readahead).

Posted by mario at Mon May 3 18:48:42 2010
Sounds serious. Upstart is a nice start, but not really easy to understand. Dependencies are never visualized, and if it's broken (mostly dbus and udev), then a typical user is lost.

With systemd standardizing a lot more features, this might be less of a problem. But still there seems to be much wiggle room for even more fragility of the Linux boot process. Automatisms are welcome, but not at the price of complexity and transparency. If there is too much self-righteous "intelligence" in any system service and hence superimposed user control restrictions, this is detrimental to usefulness.

So, I look forward to giving this a shot once available. However I have this bad feeling it will come at a price as well. (At least if Ubuntu developers package it.)
So give us somewhat more helpful man pages, and proper control to work around builtin tool "intelligence" when necessary. Don't hide more inner system workings in the opaque dbus realm. Don't want to debug boot failures anymore. TNX

Posted by Enrico Tassi at Tue May 4 00:12:06 2010
You should definitively call your software "pid1".

Posted by Anonymous at Tue May 4 16:03:22 2010
This is a sick trend. These developers with their "innovations" are not only trying to ruin stuff that works, but also to steal my precious time.

They think I have nothing to do but to learn their heroin trips every year.

There is stable knowledge, a stable tool. Such a big gain! There is literature, courses, a great amount of material about these stable tools. And now another moron tries to cancel it all. What does he expect from us? A "thank you"?

Posted by alex jumba at Tue May 4 16:59:18 2010
great work @lennart, systemd seems poised to solve some fundamental problems.

Just an observation/comment. By your admission, hardware and software are dynamic and changing constantly during runtime. Does it make sense to add usage patterns to that list?? Given that, does it seem appropriate to allocate resources (e.g. I/O nice) to processes during spawning alone and let them maintain those resource levels during their lifetime, even though usage patterns, just like the hardware and software, change? If the overall goal is to let the machine determine stuff by itself without much manual intervention (e.g. as with what systemd does with dependency graphs), wouldn't it be appropriate for named not only to monitor hardware and daemons as it does currently (ingeniously BTW), but also to do dynamic resource management (resource = I/O/CPU nice etc)?? This is what I understand an INIT 1 system to be, a process manager (which the kernel has "outsourced" to), much like pulseaudio/jackd is for sound and X/kwin for display.

What am I getting at?? There has been discussion lately about interactivity of (recent) linux kernels, stirred mostly by BFS/BFQ. Your explanation about wanting other apps to get init-like powers brought an idea. Since it is these "servers" (pulseaudio/X etc) which know which apps are in focus and thus are actively being used by the user, wouldn't it be great if some of these "powers" e.g. renicing processes, be done by these servers (either directly or through giving advice/hints to systemd). This way, even resource allocation becomes dynamic (e.g. determined by user's usage patterns) and thus solves the problems with latency and such; (I remember windows has a setting for giving priority to foreground/background processes). This way, you don't have to always set slocate's i/o priority to be lowest, you don't even have to set it AT ALL, and the system will automagically adjust itself for the workload.

Posted by Nazo at Wed May 5 00:24:26 2010
IMHO all major daemons will be in kernel space in the future, for the sake of throughput, latency and low power consumption. I believe userspace daemons are completely replaceable by kernel-space daemons.

IIRC the kernel can use SIMD (the RAID driver uses it) and floating point (kernel_fpu_begin/end) with a bit of task-switch slowdown, like a userspace application. The kernel can also use userspace memory, preemptible threads, and can execute userspace applications.

IIRC bypassing the MMU is about 1.5x faster on x86. Some minor architectures may have a problem because they have different instruction sets in kernel and user space. I don't know whether this is critical or not. But there are already some daemons like knfsd.

I want to see the fastest possible in-kernel implementations of init, udevd, modprobe, mount, fsck, dbus and... anything!

References:
Unleashing SSL Acceleration and Reverse-Proxying with Kernel SSL (KSSL)
http://www.coresecuritypatterns.com/blogs/?p=1389
Kernel D-Bus
http://www.mnementh.co.uk/home/projects/collabora/kdbus
TUX web server
http://en.wikipedia.org/wiki/TUX_web_server
[RFC] Unify KVM kernel-space and user-space code into a single project
http://www.gossamer-threads.com/lists/linux/kernel/1202521
Kernel APIs, Part 1: Invoking user-space applications from the kernel
http://www.ibm.com/developerworks/linux/library/l-user-space-apps/index.html
Re: ABI change for device drivers using future AVX instruction set
http://kerneltrap.org/mailarchive/linux-kernel/2008/6/28/2285574/thread

Posted by PaulWay at Wed May 5 05:42:22 2010
This looks great!  I really like the idea of deferred start-up of services, combined with an xinetd-style socket holder.

One thing that would be interesting to look into further would be to then shut down services after they go through some period of inactivity, or when pressure from other services for resources goes over a threshold.  If we've got a machine that only gets SSH connections once a week or so, why not shut down SSH after an hour or a day and give that memory to the database or web server.

Obviously this is a different use case from the initial purpose of systemd as you've stated - which is to speed up startup by not starting things until we really need them.  But I see it as an equally valid purpose for systemd, and it already takes care of some of this stuff for suspend and resume.

Alternatively, maybe when the system is idle after starting up we could start those services that had been deferred?  If we want to be able to SSH into the machine with little delay, we could actually start up SSH so that it's ready to go.  Then swapping can handle the memory pressure problem above, as it currently does.  We've allowed critical things to start up as fast as possible, but we've still got the current level of responsiveness after the whole process is done.

(Which reminds me unpleasantly of Windows' way of allowing the user to log in while the system processes are still starting up, providing the illusion of quick start-up with the pain of discovering that it's a terrible lie for those users that don't leave their computer to boot for five minutes.  But I think the tactics above are better than that.)

Have fun,

Paul

Posted by Aaron at Wed May 5 08:32:30 2010
Awesome work! I've been envisioning something like this for a long time... ever since I first sat down with Red Hat 6, actually.

I think a project like this is one more step toward the unified Linux Desktop that was supposed to happen so many times in the recent years.

In the coming weeks I plan to get this working on funtoo, and maybe gentoo also. I'll provide updates for anyone else who is interested in trying this as well.

-Aaron

Posted by Tim Waugh at Wed May 5 11:15:37 2010
Sounds really great.  One minor point:

"unless the machine physically is connected to a printer, or an application wants to print something, there is no need to run a printing daemon such as CUPS"

This isn't completely true.  You might well have a CUPS server on the local network providing discoverable queues and PostScript/PDF drivers for a network printer.

But really I suppose this falls into the 'physically connected to a printer' category, so would be adjusted somehow by whoever configures the system?

Posted by Anonymous at Wed May 5 16:07:58 2010
at first I really liked the idea of an inetd-like startup system. but then I got really suspicious, when reading the following points:

- udev-support in pid 1? really? can't you just make a socket with simple text-io for control and introspection?

- different unit types, templates, extended dependency and ordering support: sounds all overly complex to me

- include mounting and setting hostname. so it's not a startup system, it's a startup-mount-hostname system. what else do you want to include? what happened to the unix philosophy? the beauty of shell scripts is, that they are generic. you don't need one central monolithic system to support everything. and if you claim, this would speed up booting, prove it or it's not true. it sounds more like you have a hammer looking for a nail.

- someone already mentioned HURD. that's how to do it in a clean and consistent way: whenever a resource is requested, a translator is started to provide it. as linux doesn't support this, everything you can do is create a huge and complex hack. in that case i think i stick with sysv. it has its weaknesses, but at least it's simple.

Posted by Lennart at Wed May 5 17:53:37 2010
Luiz: I don't see how powertop could be of any use in this system.

Michael: I am not sure such a swap logic really belongs in systemd. There already is an external daemon for this (http://sourceforge.net/projects/swapd/) and I am not convinced that there is reason enough to do that inside the init daemon.

Joshua: yes, we had a closer look on most other init systems, see our comments about them in the text.

John: I think that is a matter of taste. We think it is nicer to use file name suffixes for this, as then an "ls" can give you a better overview about the units defined.

Grahame: Keeping shell in the boot process just because it can be used for debugging doesn't strike me as a good idea. Shell is not a debugging tool, it's a scripting language. We should provide proper debugging tools for the boot process instead. Example: the interactive boot systemd already provides (look for confirm_spawn= in the article above) is a very useful debugging tool since it allows you to single step through the entire boot process.

Posted by Lennart at Wed May 5 22:59:47 2010
PJ: If we move the startup logic that currently exists in the various init scripts into the init daemon or the service daemons themselves, then this will actually remove a lot of the fragility of the boot process completely. Hence I see little need to replace shell by any other language. And even with systemd it is still easy to hook some shell script into one of the services being run, should you really need to (Just add ExecStartPre=/foo/bar/waldo.sh to the .service file and this script will be run before the main daemon. You can have as many of those scripts as you wish). So summarizing this: there will be less to debug if we have this in robust C code, we should provide actual init debugging tools instead of just a shell for the purpose of debugging (and we are already doing that), and finally, even with systemd you can still hook in a shell script should you feel the need to.

Posted by Lennart at Wed May 5 23:12:41 2010
nine: systems like Fedora's "readahead" already linearize the disk seeks during startup. That is a problem orthogonal (though certainly not unrelated) to systemd.

Davide: automatic shut downs should only be done when the service itself thinks it is idle, and that is kinda hard to properly deduce from the outside. That said I tend to believe that we should not do work we don't really know is necessary. And that means that we don't do the work of shutting down something unless we have a really good reason for it. And that something is "idle" is usually not a good reason. In the end a correctly written daemon that is not being used should have a minimal impact on the system: it would be swapped out and sleep in a poll(), hence not influence the system measurably. So in summary: when doing stop-on-idle, then the daemons must do that themselves, and in many cases I'd not even bother.
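
The claim about an unused daemon that just sleeps in poll() can be illustrated with a minimal sketch. The function names wait_for_work() and serve_once() are made up for this note and are not taken from any real daemon; the point is only that a process blocked in poll() consumes no CPU until a request arrives.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Sleep until fd becomes readable or timeout_ms elapses (-1 waits forever).
 * Returns 1 if there is data to read, 0 on timeout, -1 on error or hangup.
 * While blocked in poll() the process is not scheduled at all.
 */
int wait_for_work(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n;

    assert(fd >= 0);

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);   /* restart if a signal interrupted us */

    if (n <= 0)
        return n;                        /* 0: timed out, -1: poll() failed */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/*
 * One iteration of a hypothetical idle loop: wait, then consume one byte
 * of "request". Returns the number of bytes read, 0 on timeout, -1 on error.
 */
ssize_t serve_once(int fd, int timeout_ms)
{
    char request;
    int ready = wait_for_work(fd, timeout_ms);

    if (ready <= 0)
        return ready;
    return read(fd, &request, 1);
}
```

A real daemon would call something like serve_once() in a loop with a timeout of -1, so it only wakes up when a client actually talks to it.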

James: indeed SMF is not the only system that does proper dependency management. A few systems do that. However, neither Upstart nor sysvinit do, and that's why I mentioned this.

Paul: yes, the contract stuff is very much like cgroups, however cgroups are in many ways nicer, since you can name them in an fs and so on. (But I guess Solaris people disagree with that...)

Robert: yes, we'd need to patch daemons. That is explicitly mentioned in the text (look for Writing Daemons)

Anonymous: recompiling things whenever you change a bit of configuration is slow and cumbersome and requires a lot of dependencies installed. To build a vala program you need glibc, a lot of the gnome stack, gcc and more installed, something you certainly don't want to have around on a small system, just because you want to patch one configuration line. I mean, I like Vala (in fact systemd includes client tools written in Vala), but I don't think it has any place in an init system, sorry.

Posted by Lennart at Wed May 5 23:19:19 2010
Peter: the plan for system suspend is to create a snapshot, activate the unit "suspend.target" which shuts down some services via "Conflicts" dependencies and then afterwards we activate the saved snapshot again.

And hot-swap hardware should be handled like any other hardware being plugged in or pulled out: .device units are activated and deactivated for them.

Eric: Well, I am sure that everything has some meaning in some language of this world. Also, reading the Wikipedia article I got the idea that the term wasn't negative at all?

Colin: I'll keep that in mind for my next init system ;-)

Anon: see the FAQ section: we welcome every distribution that is interested in this.

Dieter_be: as pointed out above getting rid of shell scripts by no means means loss of debugging capabilities or that we make it impossible to hook in shell scripts when the admin wants to.

Posted by Lennart at Wed May 5 23:25:57 2010
Colin: we'll probably provide similar calls in systemd.

anonymous: I am not convinced that Haskell would be good in the boot process. Also see my recent comments here that we don't need a replacement for the shell in the boot process. Having good debugging tools and most of the code in the daemons themselves or the init system is a much better choice.

sztanpet: The same applies for Lua.

Claes: thanks for the pointer, I'll keep that in mind and investigate that.

Richard: bash is already slow enough, adding even more stuff into it won't make things any better. The whole approach of shell is just slow.  Whatever you do, the focus of shell is always to defer operations to subprocesses spawned off frequently. And that's just wrong. Also see my notes above regarding replacements for shells.

Posted by Lennart at Wed May 5 23:55:11 2010
codebeard: very good ideas and I agree with most of them. A few comments:

1), 2), 4), 5) are already covered by systemd.

The idea regarding exposing kernel modules as units is interesting, I need to think about that a little more. The first thing that comes to my mind though is that I am a bit afraid of creating the illusion we'd know the same dependencies between modules that modprobe itself knows. I am also not really interested in duplicating that dependency tree in any way. But yepp, I need to think about this more.

We have most of the debugging functionality in place already. There are logs, and we store away a lot of information about what happened. We also offer a serialized, single-stepping, interactive boot, to track down issues. And you can hook your own shell scripts into services if you want to (see my comments above).

Your ideas regarding that screen-like pty handling would probably mean that we'd have to implement our own virtual terminal (i.e. parsing of VT100 terminal sequences and such). I am not convinced I want to have that in an init system. I think interactive services like that are the wrong approach.

What we however already support is that you can connect a service to an existing tty, such as a virtual console terminal.

Regarding the Apache/MySQL issue: If people want to avoid that they should probably just add a dependency. i.e. instead of having apache.service just depend on mysql.socket, it could be changed to depend on mysql.service, if you understand what i mean. But they should do that only locally, of course.

Regarding your suggestions to fix the kernels so that we don't have to patch the daemons: we actually investigated that in much detail, however this turns out to be really hairy to do. Because at the time of the socket() call in the daemon we don't know that a listening socket is already existing in systemd. We figure that out only at the time of the bind(), and that complicates things considerably, since we'd have two sockets existing by then which would have to become one, in all their properties, i.e. sockopts and suchlike. And that is far from easy. That said, this is certainly something we'd be happy to have in the kernel, even if we don't see that we ourselves will hack that up any time soon.

Regarding your comments about command line parameters read from /etc/sysconfig: I think daemons that rely on cmdline configuration like that are broken, and should probably be fixed to have a proper configuration. That said should it turn out that many daemons work like that we could probably add something similar to what you suggest. We'll have to see.

Posted by Lennart at Thu May 6 00:11:01 2010
anonymous: see my other comments on lua/haskell/vala in the boot process.

Richard: unfortunately they haven't yet... But I hope this too ;-)

Adam: RH pays me primarily for PA, not systemd.

Charles: the right place for that is the kernel. LD_PRELOAD hacks will always be just that ... hacks.

David: Yes, screen is an interesting point. It probably would have to be patched to get its own session which is then treated separately from the session it was created from.

Peter: You need a newer kernel probably, that enables the "debug" cgroup controller. Building systemd is not easy probably.

deitarion/SSokolow: I disagree with your assessment on PA, see my other recent blog post about that.

Chris: this has nothing to do with a micro kernel. We just pull a few things together that have previously been done at various separate places, such as init, the init scripts, inetd, mount(8) or even cron.

Anonymous: well, you are welcome to continue using a systemd-less system if you are this conservative and think this approach is so wrong.

alex: changing process properties like that from the outside at runtime is always racy. If something like this is desirable then it should probably be done in the kernel or in the daemons themselves.

Posted by Lennart at Thu May 6 00:33:54 2010
Nazo: well, I am pretty sure not many people would agree with your thoughts.

Paul: as mentioned above I don't think that shutting down sshd in the case you describe really is advisable. We should minimize the work we do, and that includes not shutting down anything we don't have to shut down. A properly written daemon that is swapped out and otherwise just hangs in poll() is not measurable in the system otherwise, and certainly doesn't take away much RAM from other processes. (and sshd is a properly written daemon like this)

And regarding your suggestions about delaying some daemon startups until the CPU is idle: that would basically mean that we'd add another CPU scheduler on top of the kernel scheduler, which I don't think is a wise idea. Hence: what you want to do we should do with the existing kernel CPU scheduler: by using nice levels and scheduling modes like SCHED_IDLE/SCHED_BATCH we can tell the kernel that some job should be delayed as long as there is something else to do. It might make sense to utilize that to prioritize things when we start things in parallel. We'll have to investigate that further.
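
As a rough sketch of what "using nice levels and scheduling modes" could look like from inside a process, assuming Linux with glibc: the helper names lower_nice() and request_batch_policy() are invented for this note, and nothing here is taken from systemd itself. SCHED_BATCH is Linux-specific, so the second helper is guarded and treated as best effort.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc exposes SCHED_BATCH only with this */
#endif
#include <assert.h>
#include <sched.h>
#include <sys/resource.h>

/*
 * Raise the niceness of the calling process (0..19, where 19 is the lowest
 * priority). Lowering one's own priority needs no privileges.
 * Returns 0 on success, -1 on failure with errno set.
 */
int lower_nice(int niceness)
{
    assert(niceness >= 0 && niceness <= 19);
    return setpriority(PRIO_PROCESS, 0, niceness);
}

/*
 * Ask the kernel to treat the calling process as non-interactive batch work.
 * Returns 0 on success, -1 if the policy is unavailable or was refused.
 */
int request_batch_policy(void)
{
#ifdef SCHED_BATCH
    struct sched_param param = { .sched_priority = 0 };  /* must be 0 here */
    return sched_setscheduler(0, SCHED_BATCH, &param);
#else
    return -1;                 /* policy not known to this libc */
#endif
}
```

A service started during boot could call both helpers early in main() and then do its expensive initialization; the kernel scheduler, not the init system, then decides when that work actually runs.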

Tim: I know that at least for the mDNS case if we browse for a printer the replies should be available in less than a second. Also, I believe the gnome printing dialog has a live view on the printers found, right? If that is the case and all other browsing protocols are as quick as mDNS then it should be OK to start cups only when the printing dialog is opened: it might show no printer in the beginning, but after a second it should be populated fully. I'd argue that this user experience would be acceptable to the user, if he even would notice at all.

Anonymous: hmm? udev uses a very simple protocol that is mostly text-based. Or did you mean "dbus" when you typed "udev"? Well, I don't buy into dbus hatred. D-Bus is just an IPC, it is much better to use a well-known, well-analyzed, standardized and introspectable IPC like D-Bus than have each and every single service come up with its own homegrown IPC. Also, Upstart relies on D-Bus the same way systemd does.

And systemd is called "systemd". We want to manage the system, that's why we called it that way. And setting the hostname and mounting file systems is a core part of the system, and hence we integrate it into systemd.

I don't buy into Unix philosophy. Unix is broken. It might be one of the better system designs of all those existing, but that doesn't mean it wasn't broken too. We need to fix it and improve it where this is necessary. Strict Unix traditions or POSIX compliance hold us back, and are conservatism where progress is needed. Unix can inspire, but it is unsuitable as a dogma for system design 30 years after its inception.

Posted by Anonymous at Thu May 6 03:19:04 2010
Lennart: yes I meant d-bus and i don't use upstart. and yes, unix is broken, but the philosophy is right: make the tools simple and use plain text. whenever something adheres to this, it is a pleasure to work with. it is sometimes amazing how you can use these tools for things that no one has thought about, when they were created. and they still allow you to do something in emergency situations when everything else fails. and this is still true after 30 years and will still be true in the next 30 years to come. as soon as things like d-bus, or xml for that matter, come into play, it becomes a real PITA. I could give many examples from my own experience, but then that post would become very long.

Posted by Lennart at Thu May 6 03:33:28 2010
Anonymous: Good for you that you don't use Upstart. However, all distros have now switched. All big distros now use D-Bus from the beginning of the boot process on. And introducing systemd does not change that fact in any way.

Anyway, I don't believe in the Unix philosophy. Sorry for that. The discussion about Unix philosophy is mostly off-topic however and hence should not be continued here.

Posted by Tim Waugh at Thu May 6 10:58:30 2010
Re: CUPS

What I'm trying to say is that there may not be a local client.  cupsd is a network service as well as serving local clients, and so its socket may never be connected to.  Network clients are other cupsd instances (which yes, systemd may start when the user sees the Print dialog), which will just wait to hear UDP browse packets from the cupsd running on the server.  These packets are only sent once every minute or so.

I really like the system, I am just struggling to see exactly how cupsd fits it and can benefit.

Here is the system I'm worrying about:

PrinterA }
PrinterB }- server (running cupsd)
PrinterC }

cupsd on this machine has been configured to know about these three network printers, and has been told to advertise them on the local network.  This is a common situation because network printers by themselves are not always easily or consistently discoverable across the whole group.  Some may support mDNS, some may only support SNMP, etc.

On this server machine, no-one ever logs in.  When someone wants to print, they do so on their own client machine:

clientA }
clientB }- server
clientC }

All of the machines above are running cupsd.  The client cupsd instances discover the queues advertised by the server cupsd instance by listening out for UDP browse packets, which it sends periodically, about once a minute. (Yes, ideally this would be mDNS, but right now it isn't.)

So now imagine they all switch to using systemd, with no other changes.  Someone on clientA is looking at the File->Print dialog, meaning GTK+ has just connected to the local cups UNIX domain socket and started the client cupsd instance.  That will sit there waiting to hear about any network CUPS queues that are being advertised.  But nothing will start the cupsd instance on the server.  CUPS queue discovery is passive.

Even if the user in charge configured systemd to always start cupsd on the server (can that be done?), the clients will still have to wait up to a minute the first time they ever use the print dialog.

Of course, CUPS caches information about network CUPS queues so that it doesn't have to wait at all after starting if it has seen those UDP browse packets before, so subsequent File->Print dialogs won't see the same delay.

So it comes down to:

1. Can systemd be configured to always start a particular service for which it cannot know whether there are clients, such as cupsd when used in this way?

2. Even better, can it be configured to automatically discover whether a particular service needs to be "force started" like this?  For example, I can imagine a small program to read the CUPS configuration file and see if it is configured this way, and tell systemd to act accordingly.

3. As things currently stand, there will be up to a minute's delay on each client the first time they use the Print dialog.  This will only be gone once CUPS switches over to using mDNS as its primary discovery/advertisement mechanism (which is planned).

Posted by codebeard at Thu May 6 16:37:27 2010
@ Lennart

Thanks for taking the time to reply to everyone! It looks like the correspondence generated by your blog post has been considerable.

Regarding patching the kernel to copy the socket on bind(), you say that it is really hairy to do all the copying and stuff, but perhaps I am missing something. Correct me if I'm wrong, but doesn't the kernel have this functionality already, in the form of dup2()?

Here's a small test:
parent.c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void) {
  int sock;
  struct sockaddr_un addr = {AF_UNIX, "./socket"};
  unlink("./socket");
  sock = socket(AF_UNIX, SOCK_STREAM, 0);
  bind(sock, (struct sockaddr *) &addr, sizeof(addr));
  /* pretend that we just did something like:
  * fcntl(sock, F_SETINHERIT, 1);
  */
  listen(sock, 10);
  execl("./child", "child", (char *) NULL);
}

child.c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

int main(void) {
  int sock;
  struct sockaddr_un addr;
  socklen_t i = sizeof(addr);
  sleep(10); /* simulate startup time */
  sock = socket(AF_UNIX, SOCK_STREAM, 0);
  /* pretend that we just did:
  * bind(sock, &addr, sizeof(addr));
  * and that the kernel checked the addr structure for
  * matches with F_SETINHERIT fds, then basically just
  * returned the following ( but with F_SETINHERIT off,
  * so that future calls to bind() will know they don't
  * have to search for things ):
  */
  dup2(sock-1, sock);
  listen(sock, 20);
  accept(sock, (struct sockaddr *) &addr, &i);
}

When I compiled and tested the above code with a sample client, it worked perfectly. dup2() must already do all the necessary locking and stuff that needs to be done to copy the file descriptor, so it's really easy. The only side-effect is that we waste a file descriptor (two fds will end up referring to the same socket), but that's really a minor issue and could probably be fixed if anyone cared.
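
Note on the test above: the child finds the inherited descriptor by assuming it is sock-1. A less fragile way to recognize a pre-made listening socket is to ask the kernel whether a descriptor is in the listening state. This is only a sketch; the helper name is_listening_socket() is made up here and is not part of codebeard's test.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Returns 1 if fd is a socket on which listen() has been called,
 * 0 if it is a socket that is not listening,
 * -1 if the query failed (for example because fd is not a socket).
 */
int is_listening_socket(int fd)
{
    int accepting = 0;
    socklen_t len = sizeof(accepting);

    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &accepting, &len) < 0)
        return -1;
    assert(len == sizeof(accepting));    /* the option is a plain int */
    return accepting ? 1 : 0;
}
```

A child started via exec could run this over the low-numbered descriptors it inherited instead of hard-coding sock-1.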

Posted by antrik at Thu May 6 19:15:32 2010
There are some good ideas in here, and it's definitely a step in the right direction. (Especially compared to upstart.) The most important ones in particular:

- Creating sockets before launching the daemons: sounds like a very nice and useful idea :-)

- Starting services on demand: very important property missing from most other init systems. Very much like passive translators in the Hurd :-)

- Using cgroups for managing resources allocations etc. in a hierarchical manner: again, a very good approach -- very similar to the one proposed by Neal Walfield (of Hurd fame) in his Viengoos papers, see http://walfield.org

- The observation that init and session management are closely related is good too -- though it's only mentioned as an afterthought... I believe the whole init system should be built on the idea that it's really just a case of hierarchical session management.

On the negative side, there is a major contradiction between "we don't want portability" and "we'd like all major distributions to adopt it": Debian also has Hurd and FreeBSD ports -- so without portability, it's pretty much out of the question there...

I'm not saying it's bad to use system-specific features, if they really help -- on the contrary, I believe this should be done more often. (The Hurd's unique features for example are pointless, if nobody ever uses them...) However, I don't see why you wouldn't want to accept alternative implementations of various functions upstream.

Of course other systems ideally would use solutions tailored to their specific functionality (I already mentioned passive translators in the Hurd) -- but often the resources for reinventing everything are simply not there; and thus adapting existing solutions can be important. (Also, reducing transition costs.)

The ability to mix and match various components is one of the major strengths of the free software world IMHO: it facilitates competition and innovation. It is what allows the best solutions in any particular area to rise to the top. Tying solutions to particular environments prevents this.

There are some other good ideas and caveats, which I will skip here, as it would be too much. (I should blog about this, but I don't think I'll get around to it any time soon... :-( )

The real showstopper however is, "I don't buy into Unix philosophy." Ouch. Just ouch.

(Well, obviously the showstopper not being the fact that you mentioned it in a followup comment -- but rather the fact that it shows in various places in the article, and your comment confirming that this is by design...)

It's a pity to see something implementing so many good ideas, disqualified in such a manner :-(

Posted by Lennart at Thu May 6 21:02:29 2010
Tim: yes, for cases like that it is possible to start CUPS regardless whether a local client or local hardware actually are around. It's a matter of simply adding a symlink to the .service file to some directory (instead of or in addition to the symlink to the .socket file). We probably should decide later on whether CUPS really is a candidate for on-demand loading like originally pointed out, or whether we can leave it to the user to fix the link, or whether we can teach CUPS itself to create that symlink.

Posted by Lennart at Thu May 6 21:09:17 2010
Codebeard: Well, after the bind() we'd have to return to the application a socket that is the merged version of both our systemd socket AND the socket the daemon created itself. We need to have the queued connections from the systemd socket, but all the various sockopts/fd flags/SIGIO handling/yadda yadda and so on that might have been set between the socket() and the bind() in the daemon itself. That basically means we need some non-trivial code in the kernel that can merge the fd and copy all settings over; it's more than just a simple dup(). I do believe that having something like this in the kernel would be great, but it's nothing we can hack in a couple of hours, unfortunately.
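
The "more than just a simple dup()" point can be checked with a small experiment. In the sketch below, `pre` stands in for the socket the init system created and `own` for the socket the daemon created itself; both names and the function nonblock_survives_dup2() are invented for this note. The daemon sets O_NONBLOCK on its own socket, then dup2() replaces it with the pre-made one, and the flag is gone because it belonged to the socket that was just discarded.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if O_NONBLOCK set on the daemon's own socket is still visible
 * after dup2() has replaced that socket with a pre-made one, 0 if the
 * setting was lost, -1 on error.
 */
int nonblock_survives_dup2(void)
{
    int pre = socket(AF_UNIX, SOCK_STREAM, 0);   /* made by "the init system" */
    int own = socket(AF_UNIX, SOCK_STREAM, 0);   /* made by "the daemon" */
    int flags, result = -1;

    if (pre < 0 || own < 0)
        goto out;
    assert(pre != own);

    /* The daemon configures its socket between socket() and bind(). */
    flags = fcntl(own, F_GETFL);
    if (flags < 0 || fcntl(own, F_SETFL, flags | O_NONBLOCK) < 0)
        goto out;

    /* The shortcut: make 'own' refer to the pre-made socket instead. */
    if (dup2(pre, own) < 0)
        goto out;

    /* 'own' now shares pre's open file description, settings included. */
    flags = fcntl(own, F_GETFL);
    if (flags < 0)
        goto out;
    result = (flags & O_NONBLOCK) ? 1 : 0;

out:
    if (pre >= 0)
        close(pre);
    if (own >= 0)
        close(own);
    return result;
}
```

The function returns 0: the daemon's setting does not survive, which is why a kernel-side solution would have to copy such settings over rather than just duplicate the descriptor.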

Posted by Lennart at Thu May 6 21:20:47 2010
antrik: if some distros care about portability to non-Linux systems then they can deal with the problems that creates, I see no reason to make that my problem. If we cannot make use of the unique features Linux provides we cannot do much of what we are doing now in systemd. One example: cgroups is at the heart of what we do. If we want to provide compatibility with other systems we would not be able to use cgroups. And that would be a big loss. Also, if you try to keep compatibility with other systems, you need to abstract the system-specific behaviour. And that adds code you need to maintain. And before you can add support for some OS specific feature you always have to abstract it. It costs a lot of time. One can certainly do that for normal applications easily, since they use only very little OS-dependent functionality. However, that is different for something as low-level and fundamental as the init system.

And I guess we have to agree to disagree on our belief in the holy grail that is Unix philosophy. If you reject everything coming from folks who didn't drink the Unix Kool-Aid, then I guess I am sorry for you.

Posted by Claes at Thu May 6 21:55:43 2010
Regarding iCalendar semantics as I mentioned above, I think not so much of the file format and the various mostly human based "event types" it discusses. I think of the way it defines scheduling in time, especially recurrence.

If systemd "understood" recurrence the same way as calendar apps do, it would theoretically be possible to plan, schedule and visualize events with existing calendaring applications.

cron applies a different system for recurrence and I can't say which is better or worse, but recurrence rules can be confusing and difficult to define. There are probably more tools that use iCalendar principles regarding this. A good design could implement both.

Posted by Walther at Fri May 7 10:45:16 2010
You started out talking about socket/dbus-activation a lot but later you talk a lot about explicit dependencies in configuration files. Do all dependencies have to be defined explicitly? Or is the intent to use mainly socket/dbus-activation and config files for the rest?

It would be really cool if systemd would detect dependencies on the first boot and would use them to start services in parallel before they are needed on subsequent boots. (Maybe this is exactly what you are doing but I didn't get that :)

For instance: systemd starts gdm. gdm starts X through socket activation. After X has started, gdm starts LDAP through socket-activation. Which means that LDAP is started after X has completed (which is not optimal). systemd logs the activations and so on the next boot systemd starts gdm, X and LDAP in parallel before they are activated.

Posted by Lennart at Fri May 7 20:21:37 2010
Walther: yes we thought about something like that, and it would be relatively easy to do that. We'll play around with that and add it if it really turns out to have a positive effect on boot time.

Posted by Luca Bruno at Sun May 9 12:23:29 2010
Not writing scripts in vala (would be overkill with a compiler), but what about systemd itself in Vala? Contrary to what you said, it doesn't need a lot of the gnome stack: you can use it with the Posix profile (i.e. no glib).
It has a great support for dbus servers, except you need dbus-glib there.
Btw good work.

Posted by Lennart at Sun May 9 14:53:40 2010
Luca: Vala is not OOM-safe (because GLib isn't). However the init daemon is one of the few pieces of userspace code that should be able to deal with OOM.

Posted by Luca Bruno at Sun May 9 23:26:35 2010
@Lennart: as I said, you can use Vala without glib

Posted by Luca Bruno at Sun May 9 23:32:51 2010
@Lennart also, now that I remember, in Glib you can change the vtable of memory setting your own allocation functions, including malloc: http://library.gnome.org/devel/glib/unstable/glib-Memory-Allocation.html#GMemVTable

Posted by Lennart at Mon May 10 00:00:35 2010
Luca: No, the code Vala generates uses GLib and GObject heavily. In fact, the Vala object model is the GObject object model. Vala is unable to generate code without GLib and there is really no reason for supporting Glib-less binaries for them.

GLib code assumes that malloc() aborts on OOM. You cannot just sneak in a non-aborting malloc() and assume all the right OOM code paths magically appear, because they don't.

Posted by Luca Bruno at Mon May 10 00:13:21 2010
Lennart, Vala 0.8.1 (and since many other releases before that), is able to emit code without using glib with --profile posix.

What does it mean that glib code assumes that malloc() aborts? Glib code uses g_malloc, which calls a vtable.malloc() and makes no assumptions on that. So if you create a malloc() function that does not abort, yes, it works.

Posted by alteclanding at Mon May 10 16:24:07 2010
Why do people in the open source world keep reinventing the wheel is something I'd never understand. There's fefe's minit, it works great and I have absolutely no idea why no one uses it.

Posted by Lennart at Tue May 11 01:05:19 2010
alteclanding: Why do commenters in the open source world keep posting comments even though they obviously haven't read the story or even understood it is something I'd never understand. There's alteclanding's comments, they are nonsense and I have absolutely no idea why he's posting them nonetheless.

Luca: interesting, didn't know that. What object model are they using when compiling without gobject?

Simply making malloc() non-abortive doesn't change the fact that nothing that internally calls g_malloc() in glib actually checks for it to return NULL. An example: http://git.gnome.org/browse/glib/tree/glib/glist.c#n283 -- That's one of the most basic data structure operations in GLib, and what you can see there is that memory is allocated and that is assumed to succeed. Right after allocating, the data structure is accessed. If a malloc() implementation sometimes returned NULL there, this access would immediately cause a segfault. And that is why glib is inherently not OOM safe, and it is completely irrelevant what allocator you plug in there: the OOM handling codepaths simply do not exist. And that is actually a good thing, as I have pointed out here: http://0pointer.de/blog/projects/on-oom.html
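
To make the "OOM handling codepaths" point concrete, here is a minimal sketch of what OOM-safe code has to look like: every allocation is checked and the failure is handed back to the caller. This is not GLib code; struct node, list_prepend() and always_fail_alloc() are invented for this note, and the allocator is passed in as a parameter only so that an out-of-memory condition can be simulated.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    void *data;
    struct node *next;
};

/* Allocator signature, so a test can plug in one that fails on purpose. */
typedef void *(*alloc_fn)(size_t size);

/*
 * Prepend data to the list at *head.
 * Returns 0 on success. Returns -1 if the allocation failed, in which case
 * the list is left exactly as it was and the caller can decide what to do.
 */
int list_prepend(struct node **head, void *data, alloc_fn alloc)
{
    struct node *n;

    assert(head != NULL && alloc != NULL);

    n = alloc(sizeof *n);
    if (n == NULL)
        return -1;             /* the code path an aborting allocator never needs */

    n->data = data;
    n->next = *head;
    *head = n;
    return 0;
}

/* Stand-in for an exhausted heap: every request fails. */
void *always_fail_alloc(size_t size)
{
    (void) size;
    return NULL;
}
```

Swapping in a non-aborting allocator only helps if checks like the one in list_prepend() exist at every call site, which is the property the comment above says GLib lacks by design.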

Posted by Luca Bruno at Tue May 11 10:50:57 2010
Lennart: it simply creates structs without using gobject, of course there's no reference counting... it will free an owned object as soon as it's not used. Methods and variables are glibish, i.e. my_struct_method (MyStruct* s);
It doesn't support inheritance.

For the OOM thing I've got what you mean, I thought you could have done recovery inside the custom malloc itself, but still abort if recovery fails. Clearly it's not your case reading the code.

Posted by Tobu at Mon May 17 15:42:17 2010
Here is some interesting feedback:

http://etbe.coker.com.au/2010/05/16/systemd-init/
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=580814
http://groups.google.com/a/chromium.org/group/chromium-os-dev/browse_thread/thread/d146c73e42fc0e7b

Posted by Martin Sivak at Wed May 26 12:41:56 2010
Sorry Lennart but I can't disagree more with your stance towards unix philosophy.

Having multiple tools (bash is ineffective, but that doesn't mean we have to merge half of the base system into one daemon) where each does only one task and does it right has lots of benefits.

I like the reporting and control stuff in the idea of systemd when it comes to replacing init.

But I will stop right there. Having "xinetd like" configuration of system stuff.. why not. Pid files are piece of crap I agree with that. But why on earth are you trying to replace autofs, cron, mounting, xinetd?

Especially when you are reimplementing the "verified" functionality inetd and xinetd have had for ages now.. (the same applies for cron, autofs, ...)

What we should do is improve and enhance these tools, not write one new monolithic piece of code, which will be hard to maintain, hard to review, hard to verify and hard to analyze from a security standpoint.

I would agree that starting hundreds of shell scripts is not perfect, but your solution is the opposite extreme. Starting a couple of main daemons instead of the shell scripts won't affect the performance and it will still conform to the unix philosophy.

What is wrong with the following process structure?

init process (starting the main daemon set, taking care of respawning them and setting the initial environment)
|- improved xinetd for daemons
|- autofs daemon for automounting
|- cron enhanced with the proposed reporting
|- udevd taking care of module loading and device dependent spawning

They can all even use some kind of common library to simplify the common tasks... if you want an extreme solution, improve xinetd with dependency stuff and make it the pid1 process.

You know, there are some of us who use linux on server machines. And we want to be sure that the machine is secure and that we can disable any particular piece.. kind of hard to do when we suddenly have only one piece which does everything. Especially when you find a security bug in that one monolithic piece of code..

Posted by Lennart at Mon May 31 21:35:28 2010
Martin: I am not trying to replace the existing automount daemon. It does a lot of stuff that systemd doesn't do and will never do (i.e. read automount maps from NIS or LDAP!).

And I tried to explain why I want some automount/mount/inetd functionality in systemd. If you cannot see that, then please read the blog story again.

And arguing against reimplementation of "verified" functionality means you eventually come to a complete standstill of development.

Also, I think you are overestimating the complexity of inetd and cron a little.

Also, by no means we want to get rid of udevd. Upstart has weird plans like that. Not us.

Posted by Eero Tamminen at Tue Jun 1 21:20:28 2010
@sjansen Even if the IO side were handled by the kernel, one still has the problem that the processes generally spend quite a lot of CPU at their startup instead of idling, and this causes the kernel to do a lot of scheduling, which adds overhead.

I think it would be better to interleave the startup a bit instead of starting potentially hundred(s) of processes at once.  (after profiling the impact of course, preferably also on a single core netbook system)


cgroups: I have some doubts about putting every started process into a separate cgroup group.  That's fine as default, but it should be possible to put multiple processes to a same group so that their resources are handled as a unit.  Otherwise one can more easily run into issues on resource restricted systems due to resource waste when one has set e.g. separate memory usage limits on the groups.


setsid: If one gets rid of setsid(), how can one then make sure that started processes can safely kill their whole process group (to get rid of all started children, engine etc processes) without killing the parent (like systemd...)?

Posted by Eero Tamminen at Tue Jun 1 21:22:08 2010
D-BUS: The reason why I personally "hate" dbus isn't its API, but the dbus daemon implementation and usage.  Programs do all kinds of idiotic things through it: sending data on the whole session bus instead of just control information, sending data in XML, subscribing to too many device status messages so that you get client "herd" wakeups etc.

And the daemon implementation is pretty awful.  D-BUS buffers messages without a limit instead of blocking message spammers until the sent messages are consumed.  What makes it worse is that D-BUS memory handling is at the same time incredibly inefficient at releasing the memory it has allocated (it fragments it) and too complicated to make much sense of it from Valgrind reports.

Posted by Lennart at Tue Jun 1 22:26:54 2010
Eero: you are overestimating the price of switching tasks today...

regarding cgroups: we now put each service into its own cgroup in a private systemd-specific hierarchy (/cgroup/systemd). With a very simple config option you can optionally add the process to arbitrary other groups in other hierarchies. So what you ask for is already covered.
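
Note: on Linux the cgroup membership of a process can be inspected through /proc/<pid>/cgroup, where every line has the form hierarchy-id:controller-list:path. The following is a minimal Python sketch of a parser for that format, not systemd code. The sample text, including the /system/ntpd.service path, is made up for illustration and does not come from the post.

```python
from typing import Dict


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each hierarchy's controller list to the cgroup path of a process."""
    membership = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "hierarchy-id:controller-list:cgroup-path".
        _hierarchy_id, controllers, path = line.split(":", 2)
        membership[controllers] = path
    return membership


# Hypothetical content of /proc/<pid>/cgroup for a service process.
SAMPLE = """\
2:cpu:/
1:name=systemd:/system/ntpd.service
"""

membership = parse_proc_cgroup(SAMPLE)
print(membership["name=systemd"])
```

For a live process, the same function could be fed the contents of /proc/self/cgroup.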

regarding setsid: the point of what i wrote is that systemd calls setsid() for you anyway, so you don't have to anymore, and your call will fail with EPERM if you do call it nonetheless.
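
Note: the EPERM behaviour can be reproduced without systemd, because setsid() is refused for any process that is already a process group leader, which is exactly the state a process is in after a successful setsid(). A minimal Python sketch follows; the helper name second_setsid_errno is made up for this note, and it forks a child so the calling process keeps its own session.

```python
import errno
import os


def second_setsid_errno() -> int:
    """Return the errno raised by a redundant setsid() call in a child process."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: report the outcome through the pipe, then exit.
        code = 0
        try:
            os.close(read_fd)
            # The child is not a process group leader yet, so this succeeds.
            os.setsid()
            # Now it leads its own session and group: this call is refused.
            os.setsid()
        except OSError as exc:
            code = exc.errno
        finally:
            os.write(write_fd, str(code).encode())
            os._exit(0)
    # Parent process: read the child's report and reap it.
    os.close(write_fd)
    data = os.read(read_fd, 16)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data)


if __name__ == "__main__":
    print(errno.errorcode[second_setsid_errno()])
```

A daemon that still calls setsid() itself would therefore only need to tolerate this one error code.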

And uh, your dbus accusations are bogus.

Posted by Joe Nall at Wed Jun 2 22:11:42 2010
What is the plan for managing socket and process selinux contexts?

Posted by Lennart at Wed Jun 2 22:45:26 2010
Joe, systemd is not the first daemon managing sockets and processes. Which means we'll do it the same way as it has been done previously for xinetd and other babysitters...

Posted by Anon at Thu Jun 3 21:56:44 2010
OK, I've read bits and pieces from all over the place about systemd so I apologise if you've answered these questions over and over.

1. systemd replaces the need for portreserve simply by design in a rather more robust fashion.
2. systemd can support dependencies but where possible dependencies should be avoided.
3. systemd has a (sysv?) mode where it starts all jobs at sysv levels? This can be used on servers or very conservative environments. It is presumably not possible to mix "implicit" mode with sysv mode?
4. Virtual dependencies are to be strenuously avoided. There will not be support on waiting on ntpdate/forced system time (this is considered to be a non-problem). There will not be support for waiting on all normal jobs finished/GUI idle after boot/start cupsd now.
5. udev events can be turned into dependencies. bluetoothd depends on the kernel having sent a bluetooth udev event at some point in the past? What about when the dongle is removed?
6. The "screen killed on GUI logout" is an unrealistic problem or will be manually solved by modifying screen?
7. Circular dependencies (A waits on B waits on A)  are non-problem or would be a problem anyway.

I'm curious as to when things like Xorg are started - do things like gdm use enough sockets that it is basically handled implicitly?

Posted by Lennart at Fri Jun 4 02:21:29 2010
Anon: 1, 2, 7 are not really questions but yes, you are right on those.

Regarding 2: for normal daemons dependencies are not really necessary. For stuff involved in early boot or late shutdown they are more likely to be needed though. The result of that is that OS vendors are probably the only ones having to deal with deps in the systemd scheme, and packagers and 3rd party vendors won't.

On 3: There is no separate mode for SysV scripts. We simply consider the SysV dirs an additional configuration source. You can mix SysV services with native services as you wish, and distros are expected to do just that during their transition period from sysvinit/upstart to systemd.

On 4: you can use dependencies if you want. We don't suggest you to use them for normal services though. But there's nothing that would stop you from ignoring us.

On 5: systemd won't ever shoot down daemons due to idleness, simply because it is very hard to figure out what "idleness" means from the outside of a daemon. We also believe that we should minimize the work done, and hence think that a correctly written daemon that is nominally running but effectively just swapped out and hanging in a poll() is nicer than constantly stopping and restarting services.

On 6: It's an option under the control of the administrator, whether he wants to allow stuff like screen to work, or not. In a university workstation environment he might choose to kill all the user's processes if the user logs out. On private systems he might want to allow that. We support both schemes, and leave it to the admin to choose. The default will be to allow screen however.

Posted by Lennart at Fri Jun 4 02:22:46 2010
Anon: X11 is actually difficult, since its port numbers are dynamic. That said, MacOS actually starts its X server as soon as a connection is made on port 6000. We could do the same scheme.
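
Note: the "start it on the first connection" scheme can be sketched in a few lines. The following Python sketch is an illustration only, not how systemd or MacOS implement it: a supervisor creates the listening socket, waits until a client connects, and only then invokes a start callback that would hand the still-unaccepted listening socket to the real server. The function names are made up, and the port defaults to an ephemeral one rather than 6000 so the sketch can run unprivileged.

```python
import select
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create the listening socket on behalf of a service that is not running yet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock


def start_on_first_connection(listener, start_service, timeout=None):
    """Block until a client connects, then call start_service(listener).

    Returns True if the service was started, False if the timeout expired.
    The pending connection is left in the backlog for the service to accept.
    """
    readable, _, _ = select.select([listener], [], [], timeout)
    if not readable:
        return False
    start_service(listener)
    return True
```

Because the connection stays queued in the kernel until the service calls accept(), the first client only sees a slightly longer connection setup.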

Posted by Will at Sat Jun 26 20:03:20 2010
This looks like it would obviate the need for in-house proprietary unix job management daemons like AOL's venerable "samon".  Also, I like the idea of having a uniform method for stopping and restarting services. PID 1 is the perfect place to put this effort.  Thank you.

Posted by pada at Sun Jun 27 03:59:30 2010
In order to calculate the dependencies of kernel modules, I'd suggest to make use of modprobe's intelligence by executing
modprobe --list
modprobe --show-depends <module_name>
and use the output as an additional configuration source, as systemd already does with LSB headers from init scripts.

That way, systemd won't need to know about any modprobe configuration files, but will be able to figure out the right moment to load a kernel module and whether a module needs to be loaded at all.

One problem I see here is the time required to execute modprobe. Module dependency information should be cached and not determined on every single boot, but only on "depmod -a" events.

A different approach would be to use /lib/modules/`uname -r`/modules.* directly as an additional configuration source, but then systemd would be required to parse these files. Is there some standard for the syntax of these files?
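
Note: of the modules.* files, modules.dep is plain text with one line per module of the form "module-path: dependency-path dependency-path ...". The following Python sketch parses that layout and derives a load order with dependencies first. It is only an illustration of the idea in this comment, not something systemd does; the sample entries are hypothetical, and real files may use absolute or relative paths depending on the module tools version.

```python
from typing import Dict, List, Optional


def parse_modules_dep(text: str) -> Dict[str, List[str]]:
    """Parse modules.dep-style text into {module: [direct dependencies]}."""
    deps = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        module, _, rest = line.partition(":")
        deps[module.strip()] = rest.split()
    return deps


def load_order(module: str, deps: Dict[str, List[str]],
               seen: Optional[List[str]] = None) -> List[str]:
    """Return the modules to load, dependencies first, each listed once."""
    if seen is None:
        seen = []
    for dep in deps.get(module, []):
        if dep not in seen:
            load_order(dep, deps, seen)
    if module not in seen:
        seen.append(module)
    return seen


# Hypothetical sample in the modules.dep layout.
SAMPLE = """\
kernel/drivers/bluetooth/btusb.ko: kernel/net/bluetooth/bluetooth.ko
kernel/net/bluetooth/bluetooth.ko: kernel/net/rfkill/rfkill.ko
kernel/net/rfkill/rfkill.ko:
"""

DEPS = parse_modules_dep(SAMPLE)
print(load_order("kernel/drivers/bluetooth/btusb.ko", DEPS))
```

The parsed dictionary is the kind of result that could be cached and refreshed only after "depmod -a", as suggested above.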

Posted by Mark J at Wed Aug 4 08:07:42 2010
The majority of the details of this are a few college courses over my head.  But listening to your explanation of it on the Linux Outlaws podcast it was fairly easy to understand and generally sounded like an awesome idea.  So I just wanted to applaud your hard work!

Posted by bochecha at Mon Aug 23 11:20:20 2010
Thanks, this series of articles will no doubt be very interesting. :)

About this one, I don't really get the LOAD, ACTIVE and SUB columns.

As I understood it, the first one indicates whether a unit configuration was loaded or not into systemd. But if it wasn't loaded, then it would not appear in the output of systemctl, right?

You say that ACTIVE is a high-level generalization of SUB. In this case, why is that necessary? Isn't SUB already enough information?

Maybe if you could give the list of the possible values for each columns then that would help me understand the differences. :)

Or maybe just point to the appropriate documentation if that is all already documented somewhere, I must admit I haven't had the time yet to look at Systemd as closely as I wanted.

Posted by Lennart at Mon Aug 23 11:35:34 2010
bochecha: well, there are many reasons why a service might show up as failed to load in the systemctl output: for example, it was referenced as a required dependency of another service, but we could find neither a native service definition file nor a SysV init script for it. Or, there was a parsing failure while reading it. Or, because the file was incomplete. And that might even happen while a service is active, for example, because the user requested a configuration file reload from systemd after changing a service file, and a service that is already running suddenly has an invalid configuration file. That effectively means that the LOAD and the ACTIVE state are mostly orthogonal: you may have a running service where configuration loaded fine, you may have a stopped service where it loaded fine, but you may also have a running service where configuration failed to load.

And yes, ACTIVE and SUB show you the same information, though ACTIVE in a more generalized form. While SUB has states that are specific to each unit type (e.g. "running", "exited", "dead" for services; "plugged" and "dead" for devices; or "mounted" and "dead" for mount points), ACTIVE exposes the same high-level states for all units.

We only distinguish 6 ACTIVE states (to list them: active, reloading, inactive, maintenance, activating, deactivating), which are mapped from the lower-level states, which might be many more. For example services have 15 low-level states: dead, start-pre, start, start-post, running, exited, reload, stop, stop-sigterm, stop-sigkill, stop-post, final-sigterm, final-sigkill, maintenance, auto-restart.
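
Note: the relationship between SUB and ACTIVE can be pictured as a lookup table. The Python sketch below uses the 15 service sub-states and 6 high-level states named in this comment. Which sub-state maps to which high-level state is an assumption made for this note (it is not spelled out in the comment) and should be checked against the systemd source; the LOAD strings are likewise just illustrative.

```python
# Assumed mapping from service SUB states to high-level ACTIVE states.
SERVICE_SUB_TO_ACTIVE = {
    "dead": "inactive",
    "start-pre": "activating",
    "start": "activating",
    "start-post": "activating",
    "running": "active",
    "exited": "active",
    "reload": "reloading",
    "stop": "deactivating",
    "stop-sigterm": "deactivating",
    "stop-sigkill": "deactivating",
    "stop-post": "deactivating",
    "final-sigterm": "deactivating",
    "final-sigkill": "deactivating",
    "maintenance": "maintenance",
    "auto-restart": "activating",
}


def active_state(sub_state: str) -> str:
    """Generalize a service-specific SUB state into an ACTIVE state."""
    return SERVICE_SUB_TO_ACTIVE[sub_state]


def describe(load: str, sub_state: str) -> str:
    """LOAD is passed in separately because it does not depend on ACTIVE/SUB."""
    return f"LOAD={load} ACTIVE={active_state(sub_state)} SUB={sub_state}"


# A running service whose changed unit file no longer loads.
print(describe("failed", "running"))
```

The last line illustrates the orthogonality described above: the load state says nothing about whether the service is currently running.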

Posted by John Drinkwater at Mon Aug 23 12:23:36 2010
Why "systemctl status ntpd.service" and not "systemctl status ntpd"?
Why does systemctl display names like "getty@tty2.service" and not as "getty@tty2"?

Do we really need to have .mount, .service, etc on all our config files now?
IMO, horrible to have file extensions, equally to have them as long as the file name.

Posted by Lennart at Mon Aug 23 13:36:52 2010
John, we support different kinds of units. We manage sockets, mount points, services, devices, automount points, timers, paths, targets, swap files/devices and snapshots with the same tools, with the same commands. For example "dbus.service" and "dbus.socket" are both used by the D-Bus system, but can be controlled and introspected independently. To distinguish them, we hence write their full name everywhere, so that you explicitly state that you mean the D-Bus socket instead of the D-Bus service, or vice versa.

Also, I actually find this one of the pretty things in this design: the unit names are actually identical to the file names they are configured in.
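
Note: since the unit name equals the file name, the unit type can be read straight off the suffix. The Python sketch below splits a unit name into its parts. The suffix list is derived from the unit kinds enumerated in this comment, and the handling of "@" (as in getty@tty2.service) is an assumption about instantiated units made for this note; split_unit_name is a made-up helper, not a systemd API.

```python
from typing import Optional, Tuple

# One suffix per unit kind named in the comment above.
UNIT_TYPES = (
    "service", "socket", "mount", "automount", "device",
    "timer", "path", "target", "swap", "snapshot",
)


def split_unit_name(unit: str) -> Tuple[str, Optional[str], str]:
    """Split 'name[@instance].type' into (name, instance or None, type)."""
    name, dot, suffix = unit.rpartition(".")
    if not dot or not name or suffix not in UNIT_TYPES:
        raise ValueError(f"not a full unit name: {unit!r}")
    prefix, at, instance = name.partition("@")
    return prefix, (instance if at else None), suffix


print(split_unit_name("dbus.socket"))
print(split_unit_name("getty@tty2.service"))
```

A bare "ntpd" is rejected here on purpose: without the suffix there is no way to tell whether the service, the socket or some other unit of that name is meant.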

Posted by Shane Falco at Mon Aug 23 14:19:27 2010
I'm with Mr. Drinkwater on this.  Extensions (especially long extensions) are one symptom of a bad design.  All this feels very rushed and hacked together.

It looks like this core systemctl function won't display cleanly in a standard 80 character wide terminal?  Are we trying to change linux so much that we no longer care about those sorts of things?  It may be different for gnome developers, but unix admins I know have lots of windows open and usually they're 80 characters wide.

Finally, why choose a name so close to another common utility?  systemctl?  Seriously?  When another core system utility called sysctl already exists?

Posted by Lennart at Mon Aug 23 14:26:44 2010
Shane, I am sorry but I guess we just have to agree to disagree to this. The points you raise are in the category "matter of taste" or even "bike shedding", and so I guess we should leave it as that.

systemctl shortens the output depending on the terminal size. If you use a tiny terminal, the description string might even be suppressed entirely. The bigger your terminal/screen is, the more output we can stick on it. That should not surprise anybody. Or to put it in other words: we support 80ch terminals just fine, but if you use bigger terminals we'll make use of them.
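
Note: width-dependent output of this kind is simple to sketch. The Python example below is not systemctl's actual layout code; the column widths and the helper name format_row are made up. It keeps the fixed columns and gives whatever room is left to the description, dropping it entirely on narrow terminals.

```python
import shutil


def format_row(unit, load, active, sub, description, width):
    """Format one listing row so that it never exceeds `width` characters."""
    fixed = f"{unit:<24} {load:<8} {active:<10} {sub:<10}"
    room = width - len(fixed) - 1  # space left for the description column
    if room <= 0:
        # Tiny terminal: suppress the description, clip the rest if needed.
        return fixed[:width]
    return f"{fixed} {description[:room]}"


# Fall back to 80 columns when the size cannot be determined.
columns = shutil.get_terminal_size((80, 24)).columns
print(format_row("ntpd.service", "loaded", "active", "running",
                 "Network Time Service", columns))
```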

Posted by Shane Falco at Mon Aug 23 14:49:26 2010
Sounds reasonable and I appreciate the response.  It looks like you are taking your own personal experience (which is all anyone can ask) and creating something that you think is appropriate.  But I fear that you don't really see the bigger picture of unix admins out there...there are a lot of guys I work with who are junior/middle guys who just work for a paycheck.  They're not linux geeks.  I dare say they're the majority.  They could be doing AIX or Solaris or linux for all they care.  I think they're going to have trouble with systemd.  It just does too much and it's too baroque.  Too confusing.

I finally, finally got them going with services/chkconfig and now this...

Posted by Michael at Mon Aug 23 15:00:08 2010
Just a quick question, can the description be translated ?
I assume that this is not planned, as they are config file, not software, but as we are able to translate .desktop, it would be great to have some way of doing it cleanly.

Posted by Patryk "patrys" Zawadzki at Mon Aug 23 15:07:40 2010
Any idea on when the systemd dependencies get released? Currently it requires unreleased stuff such as dbus-1.3.2.

Posted by Lennart at Mon Aug 23 15:10:54 2010
Shane, well, what makes you think that we haven't looked around ourselves? Also, we managed to get systemd accepted by Fedora, in particular FESCO. We managed to convince this technical committee that systemd is a good thing. Do you really want to say that Fedora as a whole is incapable of "seeing the big picture", but you are the only one who is? Maybe things are the other way round? Ever thought about that?

Also, note that systemd actually brings Linux administration much closer to how many of these things are done on Solaris. Much of what we added is inspired by SMF, and other init systems. That means the administrators should enjoy how we make things on Linux work much more like the other big server operating systems.

Posted by Lennart at Mon Aug 23 15:13:46 2010
Michael: it currently isn't translated, but the plan is to copy very closely the mechanism how .desktop files are translated (our unit definition files also use an .ini inspired format), so that we can reuse existing tools for this. This hasn't been implemented yet however.

Posted by Lennart at Mon Aug 23 15:20:51 2010
Patryk: I plan to roll D-Bus 1.4.0 by the end of this week. However I also plan to add a dependency on very new kernels to systemd, to make sure we can move the cgroup fs mount point to /sys. This means you have to either run an unreleased kernel or backport one patch to your older kernels, as we did in Fedora. So, basically by the end of this week the dependency on one unreleased package will go away, but we'll add another one instead. Sorry for that, but I don't think it would be wise to support the old cgroupfs mount point for longer, to make sure users don't get confused by that unnecessarily.

Posted by Paul Wise at Mon Aug 23 15:40:41 2010
It's a shame you missed the LCA2011 CFP deadline, I would have liked to attend a talk on systemd:

http://lca2011.linux.org.au/

Perhaps the organisers would consider a late submission.

Posted by lirqa at Mon Aug 23 15:51:43 2010
How fast will it be? How fast is the boot on your system?

Posted by Simon at Mon Aug 23 16:07:24 2010
Shane Falco, you are being dishonest.

Your concern is that this change would require you to learn new things and have to teach new things.

The way you should rephrase your questions is:

“Sorry for being off-topic; I am posting this on the For Admins post while my concern is really about "Does systemd offer so many nice things that justifies the change?". I would like to see the question answered: "What are the advantages of systemd that justify this big change? I did not search your previous posts on this subject."”

Posted by Michal at Mon Aug 23 16:18:50 2010
"systemd has been accepted as Feature for Fedora 14"

Probably will also be in the new Ubuntu 11.04 ;)

Thanks for your work!

Posted by Diego at Mon Aug 23 16:21:50 2010
What about gettext support?

Posted by Lennart at Mon Aug 23 16:42:09 2010
Diego: it's unlikely we'll use the gettext APIs inside of PID 1, simply because i18n data tends to be stored in /usr, and we try to avoid accesses to that, since some folks still have that on a separate partition (even though it is crazy and misses the point). However, for the client tools this is different and we'll certainly reuse the frameworks currently used by other projects, be it gettext or intltool, or the hacks to make .desktop files translatable.

Posted by Lennart at Mon Aug 23 16:47:53 2010
Paul, I actually submitted something to LCA, but speaking from experience I won't get funding for the flight. But at least I will be able to say "I have tried"...

Posted by Lennart at Mon Aug 23 16:50:32 2010
lirqa: see my comments regarding "speed" on http://lwn.net/Articles/401441/.

Posted by Lennart at Mon Aug 23 16:51:48 2010
Michal, it is unlikely that Ubuntu will acknowledge that systemd is the future and Upstart is not any time soon. Note that Upstart is a Canonical-funded project.

Posted by Michal at Mon Aug 23 17:21:50 2010
Lennart, Upstart was announced four years ago. Even main developer isn't satisfied with v0.6. I don't see any progress in their repo. I would not be surprised if they in the next year just switched to systemd. Canonical doesn't have enough people to develop something else than a new gnome desktop theme.

Posted by Matthew Jones at Mon Aug 23 17:37:33 2010
Lennart, I just watched the Debconf video about Debian looking to adopt Upstart.

The main issue that was stated for Debian not adopting Systemd, was their BSD kernel support. Will Systemd work with the BSD kernel? How backwards compatible is it for other Unix-like systems that are stuck with init.d scripts?

Posted by Lennart at Mon Aug 23 17:42:45 2010
Michal, after having talked to Keybuk a couple of times in the last months and acknowledging the fact he very recently still did talks on Upstart at Debconf and LinuxCon I fear that's wishful thinking, even if I too hope I am wrong on that.

Posted by Lennart at Mon Aug 23 17:46:50 2010
Matthew, systemd is Linux-only. We have no plans to support niche kernels. That would severely limit our technical options and hold Linux back unnecessarily. If Debian cares about those kernels, it's on them to provide support for it. Note however, that Upstart doesn't work on those other kernels either and similar to us has little interest in supporting it. Note that nothing stops Debian from shipping systemd on Linux by default and providing SysV compatibility scripts for the other OSes.

Posted by Omer Akram at Mon Aug 23 17:53:30 2010
It's my personal thinking but Upstart-1.0 is coming so tighten your seat belts.

Posted by Michal at Mon Aug 23 17:57:03 2010
Lennart, wait until the Canonical bosses read the sites with positive reviews of new Fedora/SuSe/etc versions. Phoronix will probably soon begin to do some benchmarks.

When they see that people see systemd as a breath of fresh air and the upstart as a failure to meet promises - they will throw it away.

SJR can write his code for Debian for free ;)

Posted by Michal at Mon Aug 23 17:58:29 2010
Omer Akram, "It's my personal thinking but Upstart-1.0 is coming so tighten your seat belts.".

Where?

I don't see anything here
http://bazaar.launchpad.net/~scott/upstart/trunk/changes

Posted by Simon at Mon Aug 23 18:20:36 2010
Michal, could you please stop the trolling re: upstart?

Posted by Diego at Mon Aug 23 18:23:55 2010
Ouch...however, doesn't this help in some way? http://www.gnu.org/software/gettext/manual/gettext.html#Locating-Catalogs

Posted by Lennart at Mon Aug 23 18:30:11 2010
Diego, well I am pretty sure people would hate me if i'd start moving i18n data to /lib...

Posted by Omer Akram at Mon Aug 23 18:30:37 2010
>I don't see anything here
>http://bazaar.launchpad.net/~scott/upstart/trunk/changes

that's for a surprise

Posted by oiaohm at Mon Aug 23 18:38:48 2010
I think you overlooked something in the PAM module/possible future feature.

Session disconnects and reconnects support.  This would be a great step forwards particularly if text based vt can be moved to X11 terminals and reverse.

Also a great feature for X11 servers in future.

Question currently I read systemd as starting system wide services.  Could it not be extended in future to also start and manage per user services like pulseaudio and jackaudio?

So splitting these services away from normal user processes and making it simpler for users to restart them and 100 percent clean them up on failure.  Service is a Service no matter where it is running would be a good policy.  Also allow sandboxing of these services in a far more controlled way.  cgroups do process tracking to sandbox very well.

I guess systemd is fairly modular.  Could hooks be added for smack LSM as well as SElinux?  Those are the two mainline LSM's that use labels.  Rest of the LSM's don't.  So really for full support of a Mainline kernel a user might load up supporting both is required.

I hope one day to see systemd with GTK and QT front ends.  Start of serious real graphical management distribution independent.

Posted by Diego at Mon Aug 23 19:02:27 2010
Why would Ubuntu switch so suddenly? Remember that systemd hasn't been deployed in any mainstream distro. They'll probably do it in the future, but...right now? Why would they even be interested?

As for Debian...well...it's not like the rest of the Linux world is going to wait for them. If they want to continue pushing for GNU/kfreeBSD while ubuntu dominates the linux desktop and centos the free server market share, that's fine for them.

Posted by Nagilum at Mon Aug 23 20:45:09 2010
If ntpd.service would have emitted some error message while starting up, how would I display that using systemd?

Posted by Lennart at Mon Aug 23 20:49:05 2010
Nagilum: by checking the logs. The long term plan is to hook up "systemctl status" to the logs, so that you'll see the most recent log messages generated by a service next to the service. But until that happens we need to beef up syslog considerably, i.e. make it indexable and stuff like that.

Posted by Rainer Weikusat at Mon Aug 23 21:35:14 2010
The reason to separate /usr from / is that it contains architecture dependent, shareable data. And that's still relevant today because of the possibility to have 'Linux containers' which share everything shareable with the host installation they run on. Of course, this also needs the ability to easily customize system startup, say, by deleting scripts which are not needed for a container instance (root-fs of that having started out and remove parts of existing scripts which serve no purpose in a container instance.

And no, I'm not 'crazy' because I happen to have some experience with the servers I operate you are quite obviously lacking.

Posted by Lennart at Mon Aug 23 21:38:11 2010
Rainer, I am sorry. But you are completely misunderstanding the /usr vs. / split. Also note that most commercial Unixes already got rid of the distinction and symlink one to the other. Please read up on things before calling me a noob. Thanks.

Posted by Simon at Mon Aug 23 23:06:05 2010
How does pam_systemd relate to ConsoleKit? There seems to be some overlap with regard to maintaining info about current user sessions...

Posted by Lennart at Mon Aug 23 23:11:36 2010
Simon, yes, there's a non-trivial amount of duplication between CK and systemd. Note that Jon passed on half of the maintainership of CK to me and there's something like a consensus of the people involved to fully merge CK (or something equivalent) into systemd, in the long run at least.

Posted by Rahul Sundaram at Mon Aug 23 23:11:54 2010
Simon,

My understanding is that ConsoleKit will be obsoleted by Systemd in the near future.  Lennart is a maintainer of ConsoleKit as well for the time being.  Other distros not using systemd can continue to use ConsoleKit I guess

Posted by William Lovaton at Mon Aug 23 23:30:46 2010
I'm really impressed Lennart!.  Congratulations for your hard work, I can't wait for Fedora 15.

Thanks.

Posted by Claes at Tue Aug 24 00:23:52 2010
I am excited to see so much progress. I don't have much to bring to the table, a few reflections only about the terminology.

Having a kind of status called ACTIVE, and one of its states called active as well feels weird. And to see a string like "Active: maintenance" feels confusing. Likewise would "Active: active". I think something like "Status: failed" would communicate the situation better.

Posted by Lennart at Tue Aug 24 00:42:21 2010
Claes, well, status is too generic, because we have the high-level and the low-level state, which we need to distinguish somehow in the interface. One we called "active" state, the other "sub" state.

Also note that the word "status" (in contrast to state) is already used in the output of the exit status of the program.

Posted by Denice at Tue Aug 24 00:43:45 2010
I'm a little worried that anyone thinks Solaris' SMF is something worthy of copying.  I find it horribly over-engineered.  These days it is common to run virtual servers which do really only one thing (web server, or a mysql slave, or an ldap server).  I have a number of xen guests that list perhaps 15 'chkconfig-ed on' services:
chkconfig --list|grep :on

So from a system administrator's point of view, speaking of managing targeted servers and not multimedia desktops, I don't need anything complicated to manage runtime services.

You might want to seriously think about writing a tutorial for a typical small server (apache only, for example - no graphics, no bluetooth, no atd, no iscsi, etc.), and then convince us that systemd provides any value.

cheers, etc.

Posted by Shane at Tue Aug 24 01:49:13 2010
Denice said it better than I ever could.  As someone stuck with over a hundred Solaris 10 servers, I agree completely with her assessment.

Here's a nice little commentary on Apple's launchd which I feel is just as appropriate for systemd:

http://lowendmac.com/ed/winston/10kw/launchd.html

It's monolithic, it's "over engineered", and it does too many things.  In a nutshell, it's anti-unix.

Posted by Cameron Hutchison at Tue Aug 24 02:35:25 2010
"thus ensuring that everything ever logged on the system will properly end up in the log files"

Does this include timestamps being properly captured? When trying to debug delays with suspend/resume, the logs weren't much help since all the suspend and resume log messages had the same timestamp in the system logs.

Posted by Stan at Tue Aug 24 06:49:31 2010
A new init system is a great opportunity for distros to eliminate the minor (yet damaging) differences, so that a service written for one distro will be 100% compatible in another distro. A single code base also has the advantage of heavy testing and extermination of bugs.

By including special code for non-standard stuff like "SUSE extensions", systemd is just putting a bandaid on the problem instead of fixing it.

Posted by Anonymous at Tue Aug 24 06:57:59 2010
Would you consider writing more about the C-based init scripts?  I've had the general feeling for a long time that all distributions need to do the same small amount of work to bootstrap the early boot process, and I'd love to hear more about the common core you distilled it down to.  Obviously I can (and will) go read the C source, but I'd love to hear the higher-level view you've obtained by reviewing distributions.

Thanks!

Posted by Tomasz at Tue Aug 24 08:42:17 2010
oiaohm: user session support is in current systemd. For graphical insight look at "systemadm" (in fedora: systemd-gtk package).

Posted by Alexandr Kara at Tue Aug 24 10:28:37 2010
I must say I am impressed by the progress on systemd so far, but I am a little worried about one thing. You say that systemd requires a very recent kernel. Does that mean that when booted with an older kernel, it will just refuse to start? Or will it have some "compatibility" mode when it starts services in parallel and without using cgroups? Or maybe drop to old init (if still installed)?

Posted by Tshepang Lekhonkhobe at Tue Aug 24 11:54:20 2010
Lennart, rock on!

Posted by Karellen at Tue Aug 24 14:02:45 2010
@Shane:

    [systemd] does too many things


It manages the startup and lifetime of system processes. That's it.

From the article you linked:

    Merging periodically run jobs into the main system process doesn't make sense.


Why not? "cron" and "at" manage the startup of periodic system processes. The only thing they do different from "init" is that they start the processes at a time other than bootup. Everything else is common between them. So why not de-duplicate the effort involved in starting, tracking and logging, and just allow "init" to start other processes at times other than boot?

    Replacing a simple /etc/crontab text file with multiple, awkwardly named XML plist files scattered among no less than four different directories is taking two big steps toward complexity.


There's no reason that systemd would be implemented that badly. In fact, I'm pretty sure that systemd reads existing "crontab" files just fine. So systemd doesn't require any changes there.

    Starting infrequently used on-demand socket-based daemons from launchd seems like it could open the main system process to a potential denial of service attack. I have not explored this idea or researched to see if it has already been tried,


Well, I haven't researched it, that looks like nothing more than FUD and making-shit-up to me.

    One of the core principles of Unix programing is do one thing and do it well.


Like having one and only one place to consistently manage the startup and monitoring of system processes? Oh yeah, that's totally anti-Unix-philosophy.

Posted by Lennart at Tue Aug 24 14:23:31 2010
Cameron: the kernel log buffer only includes timestamps when this is enabled on the kernel command line. A good syslog implementation could read those timestamps and handle them properly. However, I think the current implementations unfortunately don't do that.

Stan, we only support OpenSUSE extension for the LSB/SysV stuff which in the long run is legacy anyway.

Anonymous: there's no such thing as a C-based init script. That's a misconception.

Alexandr: yes, we require a very new kernel. Which is a safe requirement to make for something that needs to be integrated by the distributor anyway.

Posted by Anonymous at Tue Aug 24 15:08:39 2010
Lennart: You said in your post that "We reimplemented almost all boot-up and shutdown scripts of the standard Fedora install in much smaller, simpler and faster C utilities, or in systemd itself."  "C-based init scripts" seemed like a fair paraphrase of that sentence; would you prefer "C replacements for init scripts"?  Either way, I think my original question still applies; I'd love to hear more about them in the future, if you'd consider writing more about them.

Posted by Aleks at Tue Aug 24 15:58:55 2010
Great work Lennart! I'm very impressed by the progress of systemd and excited about trying it out.

Posted by Marius Gedminas at Tue Aug 24 16:59:14 2010
Could you post an example of a pretty process tree produced by systemd-cgls?

How does the systemd distinguish user processes that should be killed on logout from processes that should be left running (e.g. screen, nohup, wget -b)?

(Why does this form keep rejecting my comments?  Try #3.)

Posted by Lennart at Tue Aug 24 19:21:04 2010
Anonymous: well, what happens with the boot scripts depends on the case. One example: part of what the boot and shutdown scripts do is restore and save the random seed of /dev/random. This was previously done via some shell hackery. In systemd, we replaced that by a simple C program, i.e. this one: http://cgit.freedesktop.org/systemd/tree/src/random-seed.c -- which can easily be called from a simple .service unit in systemd, i.e. this one: http://cgit.freedesktop.org/systemd/tree/units/systemd-random-seed-load.service.in -- and that's all there is to it.
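As a rough illustration of what such a restore/save helper does, here is a Python sketch. It is not a transcription of random-seed.c: the function name, the 512-byte size and the parameterised device path are assumptions made for the example.

```python
import os

def load_and_refresh_seed(seed_path, random_dev="/dev/urandom", size=512):
    """Feed a saved seed into the random device, then save a fresh one."""
    # Restore: writing the old seed to the device mixes it into the pool.
    if os.path.exists(seed_path):
        with open(seed_path, "rb") as f:
            old_seed = f.read(size)
        fd = os.open(random_dev, os.O_WRONLY)
        try:
            os.write(fd, old_seed)
        finally:
            os.close(fd)

    # Save: read fresh bytes so the same seed is not reused on the next boot.
    with open(random_dev, "rb") as f:
        new_seed = f.read(size)

    # Create the seed file readable by its owner only.
    fd = os.open(seed_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, new_seed)
    finally:
        os.close(fd)
    return len(new_seed)
```

The device path is a parameter only so that the function can be exercised against an ordinary file instead of the real device.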

Marius, check http://www.freedesktop.org/wiki/Software/systemd/TipsAndTricks at the end. systemd doesn't distinguish between user processes that should be killed and those that should not. This is about security, and it's a decision of the administrator if he wants to allow the user to keep processes around after logout or not, regardless if that process is called "screen" or "foobar" or whatever. However, privileged processes can escape this, and make themselves a member of an arbitrary cgroup of the system and thus avoid being killed when the user logs out. This could even be done via PAM, by invoking the PAM session hooks, which will create a new session cgroup and move the calling process into it. For example, if it is desirable that the user may keep processes around after logout via screen and only screen, then screen should be patched to call into PAM (which I think it might actually already do in some cases). But again, just calling a process "screen" should never be something magic that allows you to keep a process around. This must be possible only via privileged code and not otherwise.

Posted by Lennart at Tue Aug 24 19:37:50 2010
Denice, Linux is a scalable operating system. It is used on everything from big iron to the tiniest devices. With systemd we try to cover the whole range, and please understand that your specific use case is not the only one we need to cover.

Shane, you are right, systemd is nothing like traditional Unix. And that is a good thing. Unix was designed 41 years ago. Do you honestly believe that its design is perfect and flawless and, 41 years after it was designed, should still be followed in every detail? No, computers changed, and Unix never was perfect. It probably was a better design than most other operating systems, but this does not mean it is perfect and we should never depart from it. systemd is inspired by Unix, but also by what has been done on MacOS and even in the Windows world, and on Solaris. We didn't copy any of the existing services 1:1, we just let ourselves be inspired by their best features and translated them to Linux and added quite a bit of new stuff on top. And that's how it should be done. Unix is an inspiration, it is not the holy grail. Not 41 years after it was designed.

The fact that on traditional Unix the init system was separate from cron, from at, from inetd, from the dbus service activator and from everything else meant that all of them reimplemented a big chunk of their code, i.e. what was involved with spawning processes. It was a useless code duplication, and all implementations sucked at it in one way or another. Also, you could not run the same thing from more than one of these systems without manually ensuring that things would happen race-freely and properly ordered. In systemd we unified all of this. We use the same codepaths for spawning processes, regardless if they are started via timers, via sockets, via busses, at boot-up, via devices and so on. This allows us to reduce the amount of code duplication, and provide the same awesome process babysitting to all triggers. And that is a big big advantage. If you look at the systemd source code you will notice that the remaining amount of code, for example for doing timer-based spawning is actually very very short, less than 500 lines (including comments and whitespace!). So overall, we simplify things drastically, we get rid of immense code duplication, and we still are a lot more powerful than what came before.

So, in summary: just because we do things differently doesn't mean we do it worse.

And if you tell me that systemd is not Unixy, then I can only agree, and I don't feel ashamed at all of that. Because my horizon is much further than just Unix.

Posted by Denice at Wed Aug 25 02:12:31 2010
Lennart, my 'specific use case', as you put it, is pretty standard actually.  I'm managing 300+ Linux servers (and a few handfuls of Solaris boxen), and we simply don't run lots of services on any of them.  Linux system administrators don't let the plethora of services run that you have in your example above.  What I am looking at above seems to be a desktop.  How about an example like I mention in one of your posts - just a typical targeted server...

Posted by Riku at Wed Aug 25 13:03:57 2010
That's quite a bit of progress. I salute your "Get Things Done" attitude :)

Stupid question: What does systemd taking care of d-bus activation mean? eg. Why is current d-bus activation insufficient and how does systemd change that?

The timer part is exciting. But it doesn't replace atd and crond yet ;) According to the manpage you seemingly can't set a timer to fire at a specific time/day/daily.

Posted by Giovanni at Thu Aug 26 02:14:32 2010
I find Solaris SMF one of the most amazing features that we as sysadmins have to aid us in managing hundreds of servers and it's great that something similar is making its way into Linux. Way to go!

Posted by Bryan Horstmann-Allen at Thu Aug 26 09:45:29 2010
Denice: What happens when the Linux OOM killer freaks out and kills a bunch of your services? What ensures they get restarted? Or that they're even running at all? (I guess if you aren't running "a lot" of services, you aren't doing much at all anyway.)

If you aren't using some form of daemon management (runit, daemontools, etc), in addition to your monitoring, you have failed.

Lennart: Nice to see the trend to more mature service management in the Linux space, but further fragmentation is annoying... Is Upstart horribly broken, or simply not extensive enough?

The addition of an API to manage services (and everything else systemd appears to manage) is completely awesome. Can't wait to see a Puppet/Chef provider. :)

Posted by Bryan Horstmann-Allen at Thu Aug 26 09:48:19 2010
Ah, I see your post on Upstart. Nevermind. :-)

Posted by Karel at Thu Aug 26 13:25:34 2010
I really love basic Unix principles and I think that good software should be based on KISS rules. And from my point of view systemd is not bad thing. (frankly, it looks better than PA:-)

It would be really nice to have one place where we manage system processes in userspace. The management should be integrated to Linux -- Linux means cgroups, udev, shared mount subtrees (namespaces), selinux, inotify, etc. It does not make any sense to ignore the modern technologies that are implemented in kernel or use the technologies separately.

Posted by dissent at Thu Aug 26 16:14:06 2010
you must love to reimplement perfectly working stuff in a very "futuristic" way... and the talk about not caring for compatibility with "irrelevant" systems/distros make you look so adventurous and sexy...

Posted by hreidmarr at Thu Aug 26 18:36:07 2010
I smell problems. Tons of them. And, as always, Fedora will be the catalyst.

Anyway, let the world burn!

Posted by fran at Fri Aug 27 16:18:46 2010
Hey dissent, yes we still love our commodore 64s too.

Stick to CentOS if you can't stand change.

Posted by Andy Jackson at Tue Sep 7 21:33:18 2010
I'm fascinated with your random-seed example & research including Debian.

If using C programs is beneficial & systemd independent (upstart could do similar calls), then can this be a separate project so that others (me) can integrate its gains into other distros?

Posted by Lennart at Wed Sep 8 00:22:59 2010
Riku: there are mainly two reasons for hooking up D-Bus activation to systemd: 1) this way you have a single maintenance interface for all daemons that run on the machine, which covers stuff previously started via SysV stuff as well as stuff previously started exclusively via bus activation. You also can use the same configuration options for the services, i.e. all the logging, execution environment fancinesses and whatever else systemd offers to limit what processes and daemons can do, which is substantially more than what the minimal D-Bus process spawning code can do. 2) this allows us to race-freely start services based on different triggers. Example: avahi shall be started as soon as a network iface shows up, or somebody uses its socket interface, or somebody uses the bus interface. Regardless which trigger came first, we are now able to start only one instance, and do that race-freely.
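The "start only one instance, whichever trigger comes first" idea can be modelled in a few lines of Python. This is only a toy model under stated assumptions: the Service class, its method names and the use of a lock are hypothetical and say nothing about how systemd implements this internally.

```python
import threading

class Service:
    """Hypothetical stand-in for a unit that several triggers may request."""

    def __init__(self, name):
        self.name = name
        self.started_by = None   # the trigger that won the race
        self.spawn_count = 0     # how many times a process was spawned
        self._lock = threading.Lock()

    def request_start(self, trigger):
        # Every trigger goes through this one code path. The lock makes
        # "check the state, then spawn" a single atomic step.
        with self._lock:
            if self.started_by is None:
                self.started_by = trigger
                self.spawn_count += 1   # a real manager would fork/exec here
            return self.started_by


avahi = Service("avahi-daemon")
triggers = ("network-interface", "socket", "bus-name")
threads = [threading.Thread(target=avahi.request_start, args=(t,))
           for t in triggers]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever of the three triggers arrives first, spawn_count ends up at 1, which is the property described in point 2).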

Posted by Lennart at Wed Sep 8 01:12:21 2010
Andy, well, I am working on systemd, and I have little interest in improving other init systems. People are welcome to steal code from systemd (after all, it's Free Software) but writing the code in a style that it would be useful outside of systemd would be very limiting since we couldn't use systemd's rich set of utility functions for implementing these little utilities.

Posted by bharatt at Wed Sep 8 04:47:03 2010
Hi Lennart,
Have a query, (which could have been addressed earlier).
How about switching between runlevels dynamically?
We use "init <runlevel_number>".
Is this still possible, or any equivalent command is there in "systemd"?
"chkconfig" exists or it is replaced by systemd?

Posted by liam at Wed Sep 8 04:51:28 2010
Thanks for these posts.
I'm a bit uncertain as to how far cgroups can be pushed for administrative purposes. Can you have nested cgroups? For instance, a Gnome/X/whatever group that one could kill? Can the end user create aliases for cgroups which could then aggregate them into more manageable units?

thanks

Posted by nim@fedoraproject.org at Wed Sep 8 08:07:29 2010
Yurk

We are in 2010 now, can't you use the correct unicode glyphs to make a tree that is not cobbled from unrelated characters? This hurts my eyes

(see the box drawings at U+2500...)

Posted by Lennart at Wed Sep 8 11:36:25 2010
Liam, cgroups are fully recursive, you may split every cgroup into sub-cgroups. And as soon as systemd is used for session management the same way it is used for system management, session services will be arranged the same way, in subgroups of the group the session manager happened to be executed under.

nim, the tool actually uses unicode glyphs. But when I copied this into the blog story I noticed that not a single browser I tried on not a single OS I tried could show them properly and hence I replaced them in this blog story by this 7bit ASCII graphics.
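For illustration, a nested hierarchy can be drawn with the box-drawing glyphs and then flattened to 7-bit ASCII as follows. This is a Python sketch, not the code of the actual tool; the group names and the ASCII substitutions are assumptions made for the example.

```python
def render(tree, prefix=""):
    """Yield one line per node of a nested {name: subtree} dict."""
    names = sorted(tree)
    for i, name in enumerate(names):
        last = i == len(names) - 1
        yield prefix + ("└─ " if last else "├─ ") + name
        # Children are indented below their parent, recursively.
        yield from render(tree[name], prefix + ("   " if last else "│  "))

def to_ascii(line):
    """Replace the box-drawing glyphs with 7-bit ASCII stand-ins."""
    table = str.maketrans({"├": "|", "└": "`", "│": "|", "─": "-"})
    return line.translate(table)

# Hypothetical hierarchy: every group may contain further sub-groups.
cgroups = {
    "system": {"avahi-daemon.service": {}, "sshd.service": {}},
    "user": {"lennart": {"1": {}}},
}
unicode_lines = list(render(cgroups))
ascii_lines = [to_ascii(line) for line in unicode_lines]
```

Printing unicode_lines needs a terminal or page that is told the text is UTF-8; ascii_lines survives any encoding.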

Posted by Karellen at Wed Sep 8 14:09:30 2010
"But when I copied this into the blog story I noticed that not a single browser I tried on not a single OS I tried could show [unicode box characters] properly"

How were you sending the characters? As HTML numeric character references (e.g. &#1234;) or as plain inline unicode text?

If inline, were you telling the browser which character encoding you were using? As far as I can tell, your web server simply claims "Content-Type: text/html", and there is no HTML "meta" tag in the page to specify a character encoding.

Note that the HTML 4 spec, section 5.2.2 <http://www.w3.org/TR/html4/charset.html#h-5.2.2>, says:

"The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a default character encoding when the "charset" parameter is absent from the "Content-Type" header field. In practice, this recommendation has proved useless because some servers don't allow a "charset" parameter to be sent, and others may not be configured to send the parameter. Therefore, user agents must not assume any default value for the "charset" parameter."

Note that neither ASCII nor ISO-8859-1 contains any box-drawing characters.

Yes, browsers probably should assume UTF-8 (IETF std 63) by default, but there's no standard that says they should, and they don't. :-( In the meantime, it's worth specifying it yourself.
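The encoding point is easy to check. The following Python sketch (the variable and function names are made up for the example) tests which charsets can represent the box-drawing glyphs and what happens when UTF-8 bytes are decoded as ISO-8859-1:

```python
BOX = "├─└│"   # U+251C, U+2500, U+2514, U+2502

def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

results = {cs: can_encode(BOX, cs) for cs in ("ascii", "iso-8859-1", "utf-8")}

# What a reader sees when UTF-8 bytes are interpreted as ISO-8859-1:
# the single glyph turns into three unrelated characters.
mojibake = "─".encode("utf-8").decode("iso-8859-1")
```

Only UTF-8 succeeds, so the page has to declare it, for instance with the header "Content-Type: text/html; charset=utf-8" or an equivalent meta tag.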

Posted by cesarb at Wed Sep 8 15:25:17 2010
Box drawing characters seem to show fine for me on Firefox: http://en.wikipedia.org/wiki/Box-drawing_characters has several examples.

Posted by Lennart at Thu Sep 9 00:10:19 2010
Well, I didn't want to become a HTML5 hacker for this blog story. I just wanted to get the story out. So I did the simplemost thing, I "photoshopped" the output and replaced the graphical chars with 7bit ASCII.

Posted by nine at Thu Sep 9 12:04:49 2010
UNACCEPTABLE MODIFICATION!!!!

No seriously, this looks great. These features look like they will add real value for administrators.

I have been getting used to upstart lately with Ubuntu 10.04. It seems like a complex restructure of init.d and the only benefit is faster boot. Did you know to restart a service it uses 'restart foo'???

Posted by Anonymous at Thu Sep 9 14:13:19 2010
"Well, I didn't want to become a HTML5 hacker for this blog story. I just wanted to get the story out. So I did the simplemost thing, I "photoshopped" the output and replaced the graphical chars with 7bit ASCII."

- "charset" is part of HTML 4, published in 1997
- UTF-8 was published 1993

If you really can't manage to put the real output into your blog post, use a terminal screenshot.

Posted by Nehemiah at Thu Sep 9 16:47:21 2010
can't you read from /etc/issue for welcome text?

Posted by ChrisM at Thu Sep 9 23:09:25 2010
Just a little comment - the ability to disable services is something that was missing from upstart for a long time and is important to many people. See this feature request:

https://bugs.launchpad.net/upstart/+bug/94065

Will systemd include this functionality?

Posted by bronson at Fri Sep 10 00:47:26 2010
Apparently anonymous thought he was commenting on an article about HTML, not systemd.  Very strange.

Posted by Lennart at Fri Sep 10 01:53:45 2010
ChrisM: yes, systemd has that with "systemctl enable" and "systemctl disable" and had it for quite a while.

Posted by liam at Fri Sep 10 05:08:07 2010
Sounds fantastic. You said exactly what I wanted to hear.
Thanks.

Posted by Perry Lorier at Sun Sep 12 19:57:12 2010
So, you've reinvented process groups?

Posted by Lennart at Sun Sep 12 20:21:43 2010
Perry, no, not at all. process groups you can escape. They aren't hierarchical, they cannot be labelled. Process groups are very very different from cgroups, and useful for little more than pipeline building in shells.

Posted by Mirko at Wed Sep 15 12:29:04 2010
I like the theory of it very much. How about an additional binary interface to implement services which could add additional controls, meta data etc. not easily and efficiently done by signals, other IPC mechanisms and the like? Also, services could be linked into shared objects more than one at a time, and thus further speed up system start up by requiring the load of a single binary image possibly containing more than one service entry point?

Posted by Mirko at Wed Sep 15 12:36:50 2010
I also like the explanation you give on the "upside down" logic of upstart. If, for example, you plug in a USB scanner and you want to open your image scan software as a reaction on that, there is really no need to load a daemon or driver software as a pure result of the "plug" event. Actually, the plug event should solely be processed by higher-level software, like the GUI, which in turn will open the image scanning software, which in turn will request access to a service, which in turn will access a device node, which in turn will load appropriate driver modules, which in turn will initialize the hardware. Upstart messes this clean request-response queue up by unsolicitedly assuming that the mere fact of plugging in a device means that it is about to be used by a specific driver or a specific daemon, no matter what the user actually wants to do...

Posted by Palatinux at Sat Sep 25 02:51:14 2010
I've read an article about systemd in an English Linux magazine which made me very curious. Just by looking at all the information on this page, I can say that this is quite a breakthrough.

As the lead developer of Fortress Linux I was looking for a faster and secure manner to manage and start processes under Linux and Systemd looks like a good implementation to work together or next to secured containers under Fortress Linux.

The on-demand loading and other features of Systemd may also result in fewer possible vulnerabilities within the FL OS.

I am going to test it first within Small Fortress Linux, which is a minimized/Live Linux OS.

If it really turns out to be a good init system, then you can bet it will be the default init system of Fortress Linux.


Sincerely,

Palatinux

http://www.fortresslinux.org

Posted by 0x1b at Mon Sep 27 23:37:26 2010
A couple related questions - could be in the plan, could be OT:
1) Does systemd do any orientation - for example, laptops go from the home net, to wifi on the bus, to work net, to coffee shop wifi, to presentation lan etc etc. Can systemd figure out which nets it has access to, to drive daemon launches?
2) Along the lines of the iCalendar request, is systemd going to participate in work flow schemes?
3) Are you developing systemd-to-systemd API? for example, if a service is conflicted on a system, could it ask a neighbor to run the conflict to satisfy a dependency?

Also, congrats "Lightning Rod" Lennart (tip of the hat to Ben). I was expecting EvilDead2 and I'm, I'm not that scared anymore... just please wait for f15 to make it the default.

Posted by Anonymous at Fri Oct 1 06:58:25 2010
Ideally, couldn't you configure ABRT to only run when core files show up in a given directory, or when something requests its dbus service?

Posted by drago01 at Fri Oct 1 10:52:20 2010
CPUSchedulingPolicy=idle ... is there the same thing for IO i.e IOSchedulingPolicy=idle ?

In most cases I couldn't care less about CPU on today's multicore machines but IO is still a very limited resource (when not running an SSD).

The kernel actually allows setting IO priorities (when using the CFQ scheduler).

Posted by Lennart at Fri Oct 1 13:04:51 2010
Anonymous: While this would definitely be desirable AFAICS abrt doesn't support this scheme, since it needs to be running when the first crash dump is collected.

drago01: There's IOSchedulingClass=idle for you.

Posted by John Drinkwater at Fri Oct 1 13:34:29 2010
Restart=restart-always
Again, why have this redundancy if you are starting a design from scratch?
Restart=always|once|on-success

CPUSchedulingPolicy=idle
IOSchedulingClass=idle
Why is one a class, and another a policy? People will mistype these.

This is not bikeshedding, this is a request to stop making everything long-winded when it does not need to be so. If systemd is to be around for the next few decades, and you have time to refine it before the next Fedora release, please do so.

Posted by Lennart at Fri Oct 1 13:55:07 2010
John, regarding Restart= you have a point. And I fixed that now.

Regarding the Class vs. Policy thing: that's how the kernel calls these things, blame the kernel folks for that. I think it would be a very bad idea to introduce deviating terminology here where the kernel fucked up.
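The kernel-side naming can be seen in its interfaces: SCHED_IDLE is a scheduling policy passed to sched_setscheduler(), while IOPRIO_CLASS_IDLE is an I/O priority class passed to ioprio_set(). A unit fragment using both settings from this thread might look like the string below; the Python sketch merely parses it to show the two keys side by side. The ExecStart path is hypothetical, and configparser is only an approximation of systemd's own unit file parser.

```python
import configparser

# Hypothetical unit fragment; only the two scheduling keys come from the thread.
UNIT = """\
[Service]
ExecStart=/usr/sbin/example-daemon
CPUSchedulingPolicy=idle
IOSchedulingClass=idle
"""

parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str   # keep the CamelCase keys exactly as written
parser.read_string(UNIT)
service = dict(parser["Service"])
```

Because the keys are case-sensitive names taken from the kernel's terminology, a mistyped key such as CPUSchedulingClass would simply be a different, unknown setting.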

Posted by Milan Bouchet-Valat at Fri Oct 1 15:00:38 2010
Glad to see you have an easy to parse Description field! But while you're at it, could you consider providing translated descriptions for configuration tools?

Recently, Ubuntu had a GSoC about writing a new config tool for Upstart. One of the issues was that there's no way to get a localized translation from Upstart jobs or SysV scripts, let alone an icon! It would be great if you tackled this issue in Systemd, e.g. with a standard .desktop-like file that services should ship.


The other part of the work Jacob Peddicord did in his GSoC is more remote from Systemd, but might be interesting. He has a whole project of describing configuration files associated with a service:
http://jacob.peddicord.net/gsoc2010/
http://people.ubuntu.com/~jpeddicord/SLS/0.8/sls-format-0.8.html

I guess it can be good you know it exists...

Posted by Lennart at Fri Oct 1 15:11:55 2010
Milan: the longer term plan is to support translations for the descriptions the same way as .desktop files have them. Right now we don't do this, but this is definitely the plan. I am also open to adding an Icon setting, though I am a bit concerned that if we add an Icon, then the next thing asked for is a Vendor ID and so on and so on.

Posted by j at Fri Oct 1 18:35:14 2010
verbosity is redundant (and confusing) for a unix system tool. Since I/O scheduling classes are Linux-specific, it could be written like this:

CPUSchedulingPolicy ->  SchedLevel
IOSchedulingClass -> IOSchedLevel

BTW is systemd portable to all Unixes, or does it need the Linux kernel for some reason?

Posted by Lennart at Fri Oct 1 18:47:31 2010
j, calling the same stuff in userspace differently than in kernelspace, and calling the same stuff in the chrt tool differently than in systemd is a very bad idea.

systemd is strictly Linux specific. It is not portable to other Unixes and we do not care about portability to them. This allows us to make use of Linux features and is one of the reasons why systemd is so much more powerful than any other init system around.

Posted by Grahame at Fri Oct 1 19:03:30 2010
At the moment if I'm having a problem with a daemon failing to start I might just hack the init script, chuck strace in, and restart it. It'd be great if you could show how you might shim a failing daemon, particularly when debugging 'fails on reboot' issues (eg. starts fine later.)

Posted by Anonymous at Fri Oct 1 22:10:11 2010
I'm wondering about services that get autostarted via D-Bus. D-Bus starts them itself, so unless I'm wrong they'll end up in the D-Bus service cgroup, not in their own cgroups. Yet I want them to be controllable as services themselves. Is this possible to achieve?

Posted by Michael at Fri Oct 1 23:21:36 2010
@Anonymous:

This is one of systemd's great features:
Starting with dbus 1.4.0, dbus-daemon can hand over starting of system services to systemd, where you have all those possibilities to monitor and confine the service (in its own cgroup).

All you need to do is to add a
SystemdService=foo.service
line to the D-Bus service file, create a foo.service file for systemd and systemd will automatically start the service defined in foo.service.
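A minimal pairing of the two files could look like the strings below. The Python sketch parses both and checks that they refer to each other consistently. The bus name org.example.Foo, the daemon path and the description are hypothetical, and configparser is used only as a convenient approximation of the real parsers.

```python
import configparser

# Hypothetical D-Bus service file with the SystemdService= line added.
DBUS_SERVICE = """\
[D-BUS Service]
Name=org.example.Foo
Exec=/usr/sbin/foo-daemon
User=root
SystemdService=foo.service
"""

# Hypothetical matching unit, saved under the name foo.service.
FOO_SERVICE = """\
[Unit]
Description=Example Foo daemon

[Service]
Type=dbus
BusName=org.example.Foo
ExecStart=/usr/sbin/foo-daemon
"""

def parse(text):
    """Parse an ini-style file while keeping the key names case-sensitive."""
    p = configparser.ConfigParser(interpolation=None)
    p.optionxform = str
    p.read_string(text)
    return p

dbus = parse(DBUS_SERVICE)["D-BUS Service"]
unit = parse(FOO_SERVICE)["Service"]
names_match = dbus["Name"] == unit["BusName"]
```

The bus name has to agree on both sides, and the value of SystemdService= has to be the file name of the unit.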

Posted by Andreas at Sat Oct 2 00:49:51 2010
I agree with those complaining about names like CPUSchedulingPolicy but as Lennart said that is hardly the fault of systemd. Not really much that can be done about it.

This post is the part I like the most about systemd. No more boilerplate bash and no horrible XML like the launchd plists or overly verbose XML like SMF. Now there might be other good init systems but this is the first one I have seen where it is easy to just read the job configurations.

Like the use of sections too in the files so when I read them I can mostly ignore sections like [Install].

Posted by codebeard at Sat Oct 2 04:36:12 2010
@Grahame

I assume that you can do something like:
ExecStart=/usr/bin/strace -f -o /root/abrtd.strace /usr/sbin/abrtd -d -s

But perhaps Lennart has another way in mind to do this?

Posted by John Drinkwater at Sat Oct 2 14:24:51 2010
Lennart, thanks. Apologies if my comment came over a little stronger than I intended.
I notice some variables for scheduling have different ranges, is this again a kernel issue? Maybe I should go bang some heads there..

Posted by Baybal at Sun Oct 24 07:15:55 2010
Bash is of course slow, but do you know about zsh, dash and finally perl?

Posted by Aaron Seigo at Fri Nov 19 05:13:28 2010
"Yupp, KDE folks, you can add an agent for this, too"

where is the documentation for the relevant API used to accomplish this?

Posted by alex at Fri Nov 19 06:22:18 2010
Lennart, by what % does systemd reduce boot time compared to an easy to read/manage sysV boot system?

I was trying to find the measurements, but never found any.

Posted by Anonymous at Fri Nov 19 07:36:44 2010
How easily could I disable the automatic cleaning of /tmp?  I lost useful bits one too many times before I turned off cleaning of /tmp on all my systems.  Plus, this seems like a good opportunity to find out how easily the built-in equivalents to init scripts allow configuration.

Posted by Michael at Fri Nov 19 08:20:46 2010
@Anonymous:
/etc/tmpfiles.d/systemd.conf contains (among others) those two lines:

d /tmp 1777 root root 10d
d /var/tmp 1777 root root 30d

Just comment them out and you're done.
For more info see http://0pointer.de/public/systemd-man/tmpfiles.d.html

Posted by Michael at Fri Nov 19 09:05:44 2010
>> In fact, shell scripts during early boot
>> are only used in exceptional cases

Why is LVM an "exceptional case"? It's the default to install Fedora on LVM after all. Would you say it is better to not use LVM or will there be better support for it in the future?

Posted by Dave Airlie at Fri Nov 19 09:49:06 2010
Hey Lennart, a lot of people use reboot -f when their system won't let them umount filesystems, like for when they've oopsed the kernel and want to remote reboot, so I hope you haven't actually removed proper forced reboot in favour of calling umounts which will just hang the system in the kernel if something has gone wrong in the storage subsystem.

Posted by bkor at Fri Nov 19 10:43:11 2010
Always nice to see the updates & new features of  systemd.

Posted by Anon2 at Fri Nov 19 10:57:40 2010
Always nice to see how an originally nice idea morphs into a Swiss Army Knife software. Next step to shave off a few more milliseconds of a boot time is to move the systemd code into the kernel.

Posted by Jaroslav Reznik at Fri Nov 19 11:07:11 2010
From one of Fedora's KDE folks - is it really so difficult to ping us and ask for help supporting your technologies that come to Fedora? Same as with Polkit... It's not easy to catch it then if we don't know what to support.

Thanks Aaron for comment. /me is going to look for documentation...

Posted by Vasilis Vasaitis at Fri Nov 19 13:09:24 2010
Great stuff! The only thing that comes to mind is, you guys should really make sure to provide detailed documentation for the user/administrator, in man/texinfo format (ideally both). One really good thing about the traditional shell-based boot system is that it's extremely self-documenting: even if I don't know anything about how a distribution has its boot system set up I can start reading inittab and take it from there. With systemd inevitably a lot of the boot process becomes much more opaque, so there should be plenty of documentation about what it does, in what order, how everything is configured/modified/disabled, etc etc.

Posted by Lennart at Fri Nov 19 16:05:57 2010
Aaron, systemd is actually documented very well, in fact much better than most projects, however this interface isn't so far. Feel free to ping me if you need details. I don't bite. If KDE hackers want to be involved, then involve yourself, don't always wait for us to ping you.

alex: I am pretty sure systemd is much easier to manage than sysv. I am booting in 14s now a fully equipped F15 with crypto and everything. With sysv it used to be something like 26s or so. But on purpose I don't give out numbers like this since they are not necessarily reproducible. The speed-up is bigger if you have a system which starts more stuff anyway. And the measurements are highly dependent on your hardware.

Anonymous: you can configure that easily in the files Michael suggested. Instead of disabling those lines I however recommend simply replacing the last word in those lines. If you write "-" instead of 10d then the automatic cleanup is disabled. Note however that you are most likely doing something wrong if you store files you don't want to lose in /tmp.
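The edit described here, replacing the age field with "-", can be sketched in Python as follows. The function name and the way the lines are selected are assumptions made for the example; the two input lines are the ones Michael quoted.

```python
def disable_cleanup(conf_text, paths=("/tmp", "/var/tmp")):
    """Replace the age field with "-" on the lines for the given paths."""
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Columns: type, path, mode, user, group, age
        if len(fields) == 6 and not line.startswith("#") and fields[1] in paths:
            fields[5] = "-"
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"

before = "d /tmp 1777 root root 10d\nd /var/tmp 1777 root root 30d\n"
after = disable_cleanup(before)
```

The directories are still created with the right mode and ownership; only the age-based cleanup is switched off.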

Michael: yes, I believe we should no longer install LVM by default. It slows down boot considerably and is still not updated to today's hotpluggable dynamic world. And for the majority of all folks (especially laptop people) it offers zero benefit. In fact, Fedora is the only distribution enabling LVM by default, and I believe we should stop doing that. With the advent of btrfs volume management will become much nicer and future-proof anyway.

Posted by Lennart at Fri Nov 19 16:11:58 2010
Dave: traditional 'reboot -f' continues to exist.

Jaroslav: see my comment to Aaron regarding KDE involvement. Please consider this blog story my ping to you. I am happy to provide you with any information you need and reference implementations. I'd even be willing to review any code you guys might come up with to check if it does what it is supposed to do.

Vasilis: systemd documentation for admins is actually pretty good, much better than what most projects have. Just check out the man pages: http://0pointer.de/public/systemd-man/

Posted by Lennart at Fri Nov 19 17:09:12 2010
Jaroslav, Aaron: wrote some documentation of the algorithm now for you: http://www.freedesktop.org/wiki/Software/systemd/PasswordAgents -- Happy?

Posted by nate-m at Fri Nov 19 18:29:30 2010
"""Michael: yes, I believe we should no longer install LVM by default."""


When Btrfs gets up and going well then LVM will be redundant and inferior for most purposes. Hopefully it won't take long.

Then they can figure out how to integrate support for btrfs snapshots and volume management into systemd. :P

For example one of the more useful ways to use Btrfs is to create a new volume for each user. Then users can enable features like compression and encryption for their user. Also makes it useful for snapshotting users and applications.

And there is the btrfs plugin for yum for rolling back updates and such things.

Fun stuff. Let's hope it does not suck as much as LVM. :)

Posted by j at Fri Nov 19 19:16:18 2010
Is systemd only for sysv folks?

Posted by Lennart at Fri Nov 19 19:22:08 2010
j, uh? what do you mean by that? We (optionally) support SysV scripts as an alternative source of configuration. You can disable that at compile time even though most distributions will probably leave it enabled by default.

Some distros (Gentoo) have chosen to disable SysV support in systemd by default, since they historically actually did not use SysV scripts for bootup.

Posted by alex at Fri Nov 19 20:07:52 2010
> This will ensure that SIGTERM is delivered to all processes of the crond service, not just the main process

boy, you know that you don't need to restart cron? It rereads its config files right away once they are changed. And even if a restart is needed, sending HUP to the parent is enough. Anyway. cron, bind, apache... anyway.

Lennart, with all due respect, your knowledge about daemons is so screwed up. You should read Evi Nemeth's book before touching these things.

Posted by j at Fri Nov 19 20:09:48 2010
Sorry, I was under the impression it somehow relies on sysv. Read up on it in the meanwhile (the announcement post), please disregard.

Posted by bochecha at Fri Nov 19 20:32:50 2010
@alex: I lost count of the times I had a stalled cron daemon that kept spawning children that would never complete, bringing the host to its knees.

Stopping such a cron daemon is usually not enough, and when you kill it, all its child processes remain alive and attach to init, so you have to « kill -9 » them all individually.

I for one welcome « systemctl kill » heartily. :)

Posted by Michael at Fri Nov 19 20:35:50 2010
@alex:
There are different cron implementations. The one on Debian (vixie-cron based) indeed does pick up changes to configuration files automatically.

From what I could find out, Fedora uses a different cron implementation, which does not automatically reload on configuration changes.

Please also note that Lennart used --kill-who=main for the SIGHUP example, exactly for the reason you mentioned: only the main process (what you called the parent) needs this signal.

Posted by Jeffrey W. Baker at Fri Nov 19 22:31:40 2010
Might I suggest abbreviating the syntax of these two things to 'systemctl kill' and 'systemctl killall'?  That will be a bit nicer than --kill-who=whatever.

I agree with @bochecha that the ability to kill all user's children of crond is a miracle.  It is quite difficult to write a proper cron job that will never launch in parallel with itself and most users will screw it up.

Posted by Saint DanBert at Fri Nov 19 23:47:20 2010
I use *-buntu systems of various sorts. I would love to work with you to see if we can make it work there. I did not find a link to a listserv or similar where one might volunteer.

Also, I've been a code slinger for almost 40 years. Lately I do technical writing, requirements docs and similar. Again, where do I go to volunteer.

HERE I AM -- A VOLUNTEER.  Someone make contact so that I might help with this effort.

Cheers,
~~~ 8d;-Dan

Posted by Jakub Narębski at Sat Nov 20 00:04:36 2010
+1 for 'systemctl kill' and 'systemctl killall'.

The --kill-who option doesn't make for a nice API.

Posted by Lennart at Sat Nov 20 01:31:42 2010
alex: you seem a little bit confused, sending SIGTERM triggers a shutdown of a process, not a restart. Also note that HUP in most daemons actually triggers a reload, not a restart, which is quite a distinction. Finally, different cron implementations work differently. With the advent of inotify more and more daemons now automatically reload their configuration files if they change (although for the cron case you don't even really need that), but that's a more recent development. I won't comment on which of us has the screwed up knowledge here...

Jeffrey: definitely an interesting idea, however I am not 100% convinced we really want this. After all I want to be able to read the command line as if it was a sentence, and "kill foo.service" kinda tells me that this will kill this service, but "killall" would suggest there were more than one service by the same name? The killall command we all know and love works like that: it iterates through the process tree and kills everything that matches the name. If we reuse this verb in this context here, then I believe this would be slightly misleading.

Posted by Lennart at Sat Nov 20 01:38:41 2010
Saint: see the systemd website, it includes links to IRC and mailing lists and everything. http://www.freedesktop.org/wiki/Software/systemd

Posted by Horst H. von Brand at Sat Nov 20 01:52:50 2010
Why is one SIGxxx and the other plain HUP?

Posted by Lennart at Sat Nov 20 02:05:54 2010
Horst: just to make the point that you may write the full name including the SIGxxx prefix or leave it out. Since I myself never can remember which tools want the full name and which tools take the unprefixed name I just made all systemd tools take both. While I generally believe too much redundancy in the configuration languages is a bad idea I thought that in this case it's fine. (Note that I actually wrote pretty much this in the blog story, in the second to last paragraph)

Posted by someone at Sat Nov 20 02:57:47 2010
Do these commands work?

systemctl kill -9 crond.service
systemctl kill -s 9 crond.service
systemctl kill -s 0 crond.service
systemctl kill -SIGKILL crond.service
systemctl kill -KILL crond.service

i.e. it would be great if the syntax was exactly the same as kill (except drop the -l case).

Posted by Lennart at Sat Nov 20 03:29:09 2010
someone: you have to specify the '-s', but yes, otherwise all three possible syntaxes are accepted (with and without the SIG prefix, and numeric)
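
A minimal sketch of the rule described above, written in Python rather than taken from systemd's own code: a single helper (the name parse_signal is hypothetical) accepts the prefixed name, the unprefixed name, or the number, and maps all of them to the same signal.

```python
import signal


def parse_signal(spec: str) -> signal.Signals:
    """Accept 'HUP', 'SIGHUP' or '1' and return the matching signal."""
    text = spec.strip()
    if text.isdigit():
        # Numeric form, e.g. "9"; unknown numbers raise ValueError.
        return signal.Signals(int(text))
    name = text.upper()
    if not name.startswith("SIG"):
        # Unprefixed form, e.g. "HUP" becomes "SIGHUP".
        name = "SIG" + name
    try:
        return signal.Signals[name]
    except KeyError:
        raise ValueError(f"unknown signal: {spec!r}") from None
```

With this helper, "HUP", "SIGHUP" and "1" all resolve to SIGHUP on Linux, which is the redundancy Lennart says he allowed on purpose.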

Posted by nona at Sat Nov 20 05:31:56 2010
Can we use those password agents in early (initrd) boot?

I'm thinking cryptoroot. AFAICT, systemd isn't supposed to go into the initrd, and these new agents depend on systemd, so how is that going to work?

Posted by oiaohm at Sat Nov 20 09:39:02 2010
I am sorry but btrfs is not a replacement for Linux LVM.  Yes, the LVM support should be fixed up.  http://en.wikipedia.org/wiki/Logical_volume_management

LVM comes into its own when you have multiple distributions on the same drive.

LVM can contain all types of partitions.  The downfall of btrfs is that its solution can only contain btrfs and cannot snapshot other partition types.  Really, LVM support in the Linux kernel extended to handle Windows LVM would be handy.  So yes, Multi OS installs could need LVM support to work correctly.  It is not something that can just be turned off for speed.

The NFS read-only case is also critical: it is handy for secure diskless remote boot terminals where you want a reset to return to a clean state.

Lennart, you are making the same mistake as with some of the design selections in pulseaudio and alsa.  Simple fact: NFS read-only, LVM, RAID and so on exist in the old system, so the new system needs to support them or have a replacement that is better for the tasks they do.

If you want to deprecate LVM, NFS read-only, or RAID support, please explain what their proper replacements are, matching their function.  BTRFS is not a proper replacement for LVM or RAID due to its limited filesystem type support.

Posted by Diego at Sat Nov 20 10:43:44 2010
oiaohm: Eventually Btrfs should be able to export a btrfs subvolume as a block device, so you will be able to put an Ext4 filesystem on top of it - but anyway, systemd is not dropping support for LVM; Lennart only said it will not allow boots without scripts.

That said, deprecating LVM is just not going to happen. LVM is really powerful and provides features (like extending a filesystem to a new disk) that users can't live without. And the installer allows installing Fedora without LVM.

Posted by David Weinehall at Sat Nov 20 12:02:36 2010
@Lennart: only the "-s <NO SIG>" syntax is POSIX-compliant though; omitting "-s" or including "SIG" is implementation specific and is not guaranteed to be supported.

Posted by Grahame Bowland at Sat Nov 20 12:44:52 2010
A bit of a minor thing, but why do all the systemd commands require you to type 'crond.service' rather than just 'crond'? It's a bit cumbersome and seems unnecessary.

Posted by Lennart at Sat Nov 20 16:43:50 2010
nona, dracut handles passwords for crypt root already quite well. I see no need to replace that by our agent logic.

oiaohm: calm down. I am not suggesting to deprecate LVM. Just remove it from the default install.

Posted by Lennart at Sat Nov 20 16:47:33 2010
David, uh? systemctl is my brainchild, it's definitely not POSIX compliant, since it was defined by me, not POSIX.

Grahame: since we maintain not only services, but also sockets, devices, mount points, automount points, timers, inotify triggers, and more. The suffix encodes what kind of object it is you deal with.
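
A small sketch of what "the suffix encodes what kind of object it is" could look like in code. This is illustrative Python, not systemd's parser; the helper name and the table are assumptions, and the table only covers the unit kinds named in the reply above.

```python
from typing import Tuple

# Suffix -> kind of object, limited to the kinds listed in the reply above.
UNIT_KINDS = {
    "service": "service",
    "socket": "socket",
    "device": "device",
    "mount": "mount point",
    "automount": "automount point",
    "timer": "timer",
    "path": "inotify trigger",
}


def split_unit_name(unit: str) -> Tuple[str, str]:
    """Split '<name>.<suffix>' and return (name, kind of object)."""
    name, dot, suffix = unit.rpartition(".")
    if not dot or not name or suffix not in UNIT_KINDS:
        raise ValueError(f"not a recognised unit name: {unit!r}")
    return name, UNIT_KINDS[suffix]
```

Because the split happens at the last dot, a name such as crond.service yields ("crond", "service"), while a bare "crond" is rejected as ambiguous about its type.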

Posted by strcmp at Sun Nov 21 13:49:39 2010
systemctl kill ssh.service looks dangerous on remote systems, in this case --kill-who=main should be the default... Because of that i vote for something like kill/killall.

Posted by Jon at Sun Nov 21 14:54:25 2010
foo.service seems backwards, in that case.  It would seem more logical to me (at least) that you start with the widest scope and narrow down, e.g. service.ssh (or socket.ssh or whatever).

See also: hierarchical include mechanisms in 4GLs, e.g. Java; Don't see also: The domain name system.

Posted by Lennart at Sun Nov 21 14:59:50 2010
strcmp: note that user sessions are moved into their own cgroups anyway, and ssh sessions would hence not be killed by killing the sshd daemon itself.

Jon: well, file names tend to have the type at the end, and since our unit names are actually identical to the names of the files their configuration is stored in we chose to do <name>.<suffix> instead of the reversed order.

Posted by Holger at Sun Nov 21 21:40:44 2010
So I guess it would be possible to use it to send STOP/CONT to a service incl. all its children, assuming CONT does the "wakeup" in the reverse order of STOP, right?

Posted by Lennart at Sun Nov 21 21:46:28 2010
Holger: yes, you can send STOP/CONT, but the order of the delivery is actually undefined.

Posted by Andreas at Mon Nov 22 00:30:28 2010
Typing "kill --kill-who=XXX" feels a bit redundant. Could it not be shorter, like just "kill --who=XXX"?

Posted by yhdezalvarez at Mon Nov 22 14:08:15 2010
@Andreas

> Could it not be shorter, like just "kill --who=XXX"?

why not "kill --target=XXX"? "who" sounds a little odd.

Posted by Andreas at Mon Nov 22 14:33:36 2010
Yeah, --who= was just the first thing which I thought of.

Posted by Dag Wieers at Mon Nov 22 17:58:46 2010
Lennart,

Since the moment I read your first systemd announcement I am excited about this new development. It's one of those things you wonder why that wasn't done decades ago ;-)

However, I see one thing I liked about the sysv scripts that is no longer possible. The importance of the original sysv scripts is that they are written in bash and so offer a lot of flexibility to system administrators. Flexibility comes with responsibility :-) However, where in the past sysv scripts did more than simply start/stop/restart/reload, some scripts allowed checking configuration syntax (eg. apache), initializing something (eg. sshd), etc...

There is an advantage in keeping those actions as part of the systemd tools in my opinion. Even if they are simply passing the action through to a daemon-specific configuration tool (eg. apachectl), which could become a standard. This is exactly why I liked the design of "op" so much (compared to sudo). It provided system administrators (and users) with a single interface to actions using a clean syntax.

While reading the post and the documentation I couldn't find whether "custom actions" would be retained in your design. If not, what would be the recommended alternative ?

Posted by Lennart at Mon Nov 22 19:13:05 2010
Grahame: codebears is right. You can easily prefix binary paths with strace. Just copy the service file from /lib/systemd/system to /etc/systemd/system and edit the ExecStart= line, and done.
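
The edit described above can be sketched as a small Python helper. The function name, the sample unit text and the strace path /usr/bin/strace are assumptions for illustration; the real file would be the copy placed in /etc/systemd/system.

```python
def prefix_exec_start(unit_text: str, wrapper: str) -> str:
    """Return unit_text with every ExecStart= command prefixed by wrapper."""
    lines = []
    for line in unit_text.splitlines():
        key, sep, command = line.partition("=")
        if sep and key.strip() == "ExecStart":
            # Only the plain "ExecStart=/path args" form is handled here.
            line = f"ExecStart={wrapper} {command.strip()}"
        lines.append(line)
    return "\n".join(lines) + "\n"


# Hypothetical unit file contents, for illustration only.
SAMPLE_UNIT = """[Service]
ExecStart=/usr/sbin/crond -n
"""

print(prefix_exec_start(SAMPLE_UNIT, "/usr/bin/strace"))
```

The copy in /etc/systemd/system is the one to rewrite, so the packaged file in /lib/systemd/system stays untouched.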

John: yes, we mostly expose the kernel stuff 1:1.

Dag: we do not support custom actions, since their set of parameters and what they return is completely free-form it would be a bit weird to pass that through D-Bus. For example, if something is interactive, how would you pass that through D-Bus? If people want additional control interfaces for their tools, then they should create them outside of systemd, for example by creating a separate <something>ctl tool, such as apachectl. I mean, I think it makes sense to expose new "verbs" in systemd iff these verbs make sense for everybody the same way as "start" and "stop" and similar apply for every service the same way. However, something like "apachectl graceful" is in all its meaning highly specific to Apache, and hence trying to abstract that in systemd must fail, since it's nothing that really could be abstracted nicely. SMF allows definition of additional verbs for each service, but I am not convinced this is really a good idea.

Posted by Will P at Thu Nov 25 13:34:43 2010
It would be really nice to have a way of shutting down a service by specifying a time-delay after TERM before sending KILL, so you can give the service time to gracefully shut down, but then forcibly kill it if it hasn't shut down on its own.  Having the logic present in the systemctl command would let it wait out the full duration or exit early if the service completed its shutdown before the time expired.

maybe:
systemctl killwait -w 15 nfsd
systemctl stopwait -w 15 nfsd

The killwait would use SIGTERM then SIGKILL... The stopwait  would use the ExecStop method, followed by SIGKILL.

This is a feature that the init script 'killproc' function provides primitive support for.  (It's in /etc/init.d/functions on my fedora14 system).  Having the more exact knowledge from systemd about the actual state of the service processes would make this a much more robust method than what 'killproc' tries to do.

For historical perspective, there are internal process management tools at some companies that provide this same functionality, which use process groups to implement the same kind of "service" management, with this 'killproc' delay behavior between TERM and KILL.

If this feature already exists, then bravo! If not, then what do you think of adding it?
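
The TERM-then-KILL pattern proposed above can be sketched for a single child process in Python. This is only an illustration of the timing logic, with a hypothetical helper name; it knows about one process, whereas the point of the proposal is that systemd knows every process of the service.

```python
import signal
import subprocess


def stop_with_grace(proc: subprocess.Popen, grace_seconds: float) -> str:
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL."""
    proc.send_signal(signal.SIGTERM)
    try:
        # Returns early if the process shuts down before the grace period ends.
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX systems
        proc.wait()
        return "killed"
```

A caller would start the daemon with subprocess.Popen and later call stop_with_grace(proc, 15), matching the "-w 15" in the suggested syntax.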

Posted by sysitos at Thu Nov 25 13:40:02 2010
@Lennart, yes you are right, the killall would lead to confusion, but if you kill the whole tree, then the right name would be killtree ;)

So my suggestion:
systemctl kill crond.service -> would only kill the single crond service
systemctl killtree crond.service -> would kill the crond and all of its children

And so even the systemctl kill command couldn't be confused with the well-known kill command.

CU sysitos

Posted by sysitos at Thu Nov 25 13:56:27 2010
@me and Lennart, some addition.

You could then even add
systemctl killall service1.service -> would kill all instances of service1
systemctl killalltree service1.service -> would kill all instances and children of service1

But a question: what happens when a service was started multiple times and is now running multiple times? Which service is then killed by the first kill? The first one, the last one?

Thanks.
CU sysitos

Posted by Berniyh at Fri Nov 26 01:02:09 2010
I would propose these two commands:
systemctl killmain foo.service # Kill the main service (see --kill-who=main)
systemctl killcgroup foo.service # Kill the service and all of its children. could be killcg, too.

If I understand the above text correctly, what is actually killed in a "complete" kill is everything in the cgroup of the daemon. So this would actually make the command more intuitive.

Posted by kriss at Sun Dec 5 17:45:51 2010
Wow, great idea. Hope the project is going to be used especially on mobile Linux distributions.

Small typo: livirtd v.s. libvirtd (was actually googling for "livirtd" ;-) )

Posted by Ralph Corderoy at Sun Dec 5 18:15:58 2010
--kill-who?  I think that should be --kill-whom!  Would --victim be more fun and avoid the confusion for those that do or don't speak English natively?  :-)

Posted by Bjoern Michaelsen at Mon Jan 3 00:43:58 2011
I have not looked at all at systemd yet, so I wonder if it is possible to just use it to query for PIDs?

systemctl kill -s HUP --kill-who=main crond.service

looks very un-unixy to me as it needlessly mixes multiple tasks. How about something like:

systemctl getpid --main crond.service | xargs kill -s HUP

"one tool, one job" and all that jazz ...

(I just found the "systemctl status" in the first installment of the series, but its output does not seem to be easily parse- or pipeable. It would be a shame having to revert to major shell or perl voodoo to do some basic scenario not covered by systemctl's "convenience functions".)

Posted by ck at Mon Jan 3 08:28:45 2011
Um, nobody uses pkill?

$ pkill rsyslogd
$ pkill -u bob rsyslogd
$ pkill -1 rsyslogd
...

What does "systemd" know what "pkill" doesn't know?

Posted by Lennart at Mon Jan 3 21:02:26 2011
ck: you didn't even make it to the second paragraph of "Killing Services", did you? killall and pkill do the same thing, and that paragraph tells you why it is ugly. And it doesn't cover the CGI usecase anyway...

Posted by Lennart at Mon Jan 3 21:07:12 2011
Bjoern: there is "systemctl show" which you can use to query particular properties of a service, which is easily parsable and pipable.

Posted by ck at Tue Jan 4 00:23:00 2011
Lennart: I /did/ make it to the 2nd part, thank you very much. It went on to explain how processes spawn child processes and that "killall" cannot cope with that. I'm not using "killall" any more (hint: try "killall" on a Solaris box :)). But I fail to see what "systemctl" knows that "pkill" or "killall" do not see. If process xy or one of its child processes is not "registered" with systemd, "systemctl" won't see them either? How is "systemctl" superior to pkill? I don't see it. I guess what I fail to understand from your article is how the magic is done.
Thanks.

Posted by Lennart at Tue Jan 4 00:36:44 2011
ck: systemd creates a kernel cgroup for each service. Processes do not have to register with systemd; they will be members of the cgroup, and their children automatically too, regardless of whether they fork, rename themselves or try anything else to escape supervision.
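
To make the cgroup membership tangible, here is a hedged Python sketch that reads the text of /proc/<pid>/cgroup, where each line has the form hierarchy-id:controller-list:path. The helper names and the sample text are made up for illustration, and the sketch assumes systemd's own hierarchy is labelled name=systemd; layouts differ between systemd and kernel versions.

```python
from typing import Dict, Optional


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller list to its cgroup path."""
    memberships = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at the first two colons only; the path itself may contain ':'.
        _hierarchy_id, controllers, path = line.split(":", 2)
        memberships[controllers] = path
    return memberships


def systemd_unit_of(text: str) -> Optional[str]:
    """Return the last path component in the name=systemd hierarchy, if any."""
    path = parse_proc_cgroup(text).get("name=systemd")
    if not path or path == "/":
        return None
    return path.rstrip("/").rsplit("/", 1)[-1]


# Made-up sample; real input would come from open(f"/proc/{pid}/cgroup").read().
SAMPLE = "2:cpu:/\n1:name=systemd:/system/crond.service\n"
print(systemd_unit_of(SAMPLE))
```

A forked or renamed child carries the same path, which is why no registration step is needed and why name matching in the style of pkill is not.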

Posted by Bjoern Michaelsen at Tue Jan 4 02:40:01 2011
Lennart: Thanks for the reply, sounds great!

systemctl show -p MainPID crond.service | xargs kill -s HUP

(just guessing by some man page found somewhere on the interwebz)
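
One caveat with the pipeline above: "systemctl show -p MainPID" prints a Key=Value line such as MainPID=1234 rather than a bare number, so the output needs a parsing step before it can be handed to kill. A minimal Python sketch of that step follows; the helper names and sample output are hypothetical, and treating 0 as "no main process" is an assumption.

```python
from typing import Dict, Optional


def parse_show_output(text: str) -> Dict[str, str]:
    """Parse 'Key=Value' lines as printed by 'systemctl show'."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def main_pid(text: str) -> Optional[int]:
    """Return the MainPID property as an int, or None if absent or 0."""
    value = parse_show_output(text).get("MainPID", "")
    if not value.isdigit() or int(value) == 0:
        # Assumption: 0 means the service has no main process right now.
        return None
    return int(value)


SAMPLE_OUTPUT = "MainPID=1234\n"  # made-up output, for illustration only
print(main_pid(SAMPLE_OUTPUT))
```

Even with correct parsing, the PID can exit and be reused between the query and the kill, which is the race pal points out further down and an argument for letting "systemctl kill --kill-who=main" deliver the signal itself.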

Posted by ck at Tue Jan 4 12:14:24 2011
cgroups, OK. This might make sense after all. Thanks for the response, Lennart.

Posted by pal at Sun Jan 9 04:11:59 2011
Bjoern Michaelsen:
your pipe example has race condition, that's why it's so unixy

Posted by Jeremy Kajikawa at Tue Mar 29 17:06:16 2011
This is a great read!  and I was considering writing my own init ahahahaha

No need to write my own with this as an installable

I am converting my SourceMage Gnu/Linux system
over to using this right now

Brilliant,  Insanely Brilliant!

Posted by Soliko Man at Sat Apr 9 18:50:05 2011
When running command "sudo systemctl" I get this error:
"Failed to get D-Bus connection: Failed to connect to socket /org/freedesktop/systemd1/private: Connection refused"
Anyone can tell me what is wrong with my system?

Posted by Adam Pribyl at Mon Apr 11 18:09:41 2011
Soliko: this is a known problem with sudo and su. You have to use "su - " to set up the complete root environment.

Posted by Lennart at Mon Apr 11 18:16:38 2011
Soliko: you are probably not running systemd if this happens. Or you are running a systemd userspace that is older than what you are running as PID 1.

Adam: No, this has nothing to do with su or sudo.

Posted by Mark Sobell at Fri Apr 22 02:36:48 2011
Terminology (old runlevels).

Most of the documentation I have read says that <target> is the correct term, but keeps referring back to runlevel. For example, there is the graphical.target. But I guess it is uncomfortable to say <boot the system to the graphical target>. Maybe not. Thoughts?

Also, when used by itself, <target> covers a lot of ground. As in <Which target is the system in?> Was <which runlevel is the system in?> How to ask this question using the new terminology?

Maybe no one has gone there yet. If that is the case I will come up with something I feel is appropriate. But if someone has figured this out I would like to hear about it.

Thanks!

Posted by Angie at Fri Apr 22 21:43:30 2011
Honestly, I don't see any need to replace SystemV either by upstart or systemd.

Start your system with bare minimal daemons. Start / stop others as the need arises. Simplicity rules, not promoting lazy, unaware Linux users.

Posted by spuk at Tue May 10 08:56:46 2011
I can't say I like systemd so far, but it looks like a decent new process manager..

re 'cron.service' X 'crond', systemctl could infer what you're referring to, when the name given is unambiguously mappable to an existing unit name... usually 'cron' would refer only to 'cron.service', not anything else.

Also.. I assume the standard GNU utils should be patched without much trouble to act on cgroups (i.e. systemd managed things)? Like making kill and killall work on cgroups the same as systemctl kill etc., no? Would DBUS be required for that?
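
The inference suggested in this comment (accept a bare name when it maps to exactly one unit) can be sketched as follows. This is a hypothetical Python helper illustrating the proposal, not a description of how systemctl resolves names.

```python
from typing import Iterable


def resolve_unit(name: str, known_units: Iterable[str]) -> str:
    """Expand a bare name to a full unit name if exactly one unit matches."""
    units = list(known_units)
    if name in units:
        return name  # already a full unit name
    # Compare against everything before the last dot of each unit name.
    matches = [u for u in units if u.rpartition(".")[0] == name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"no unit named {name!r}")
    raise LookupError(f"{name!r} is ambiguous: {sorted(matches)}")


UNITS = ["cron.service", "ssh.service", "ssh.socket"]  # made-up unit list
print(resolve_unit("cron", UNITS))
```

The ambiguous case is where the suffix earns its keep: with both ssh.service and ssh.socket present, a bare "ssh" cannot be resolved without guessing.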

Posted by Roman at Tue May 10 20:34:51 2011
Sorry, but this is appropriate neither for 6-inch devices nor for evince (ugly formatting).

Posted by Sankar at Tue May 10 21:29:53 2011
@Roman: If you use Calibre or some such software you should be able to convert to an e-book format easily. I wanted a PDF with pagebreaks for each chapter so that it is easier to carry around and read from a laptop.

@Lennart: Thanks.

Posted by Roland Taylor at Tue May 10 21:46:12 2011
Sankarasivasubramanian Pasupathilingam - that's a lot of name O.o!

Posted by Thomas at Wed May 18 14:39:13 2011
Following a link on LWN I read all linked systemd articles and I must say I'm really impressed with the thought that has been put into it. I still disagree on some minor points, but all in all it seems like a very good move.

My first thought on systemd: Reimplemented in c? No more shell scripts? Why? WHY? Who cares about 7 seconds longer boot time.

But now I'm seriously hooked. cgroups, recording daemon status apart from 'scroll back the line buffer', shorter and easier readable config without duplicating trivial stuff all over the place .... and a sane configuration file layout (even providing a mechanism for overriding maintainer startup config and a unique system id not depending on hostname/ip) convinced me this is a Good Move(TM)


It's a bit late for a reply to Shane Falco but here it is: Get better admins. Seriously. Learning a new tool that is properly designed and fixes a lot of issues is always a good thing.

Posted by Thelin at Tue May 24 17:48:09 2011
I'm just curious what the "Außergewöhnlicher Migrationsdruck" is doing as a heading for systemd?

Posted by Tshepang Lekhonkhobe at Tue May 24 18:24:01 2011
You guy, you rock so much. This systemd thing is huuuge marketing for Fedora. Thanks a lot.

Posted by Arno at Tue May 24 18:45:15 2011
>probably the biggest distribution release of all time

You'd think so, would you? From what I can tell, this release involves 10,000+ packages and 2 architectures. I see the Fedora mirrorlist mentions 5 architectures, but there does not appear to be any way to get there from the download page. I don't think this qualifies as a big release.

You must have missed the 29,000 packages, 9 architectures release that happened just about four months ago...

Posted by Rahul Sundaram at Tue May 24 19:44:13 2011
Packages and architecture aren't the only way to count as biggest and anyway, both are wrong claims.

sudo yum repolist -C
Loaded plugins: presto, refresh-packagekit, security, yum-fast-downloader
....

repolist: 27,474

Posted by Simon at Tue May 24 23:53:13 2011
All that documentation is good and useful - though it would be nice if you could write a short piece on how to set up systemd from scratch.

For example, if I take an LFS system without any existing sysvinit, and install systemd from source, what configuration do I have to do to get 1) a logger daemon running, 2) local filesystems mounted, 3) networking running (assuming NetworkManager), and 4) a working command prompt?

I'm speaking from the perspective of an LFS user, but I imagine some basic "migration" document would be useful for the regular distros too.

Posted by Christian Kellner at Wed May 25 00:51:10 2011
I quote Alex: "Sweeeet!" ;-)

Posted by Aissen at Wed May 25 10:44:26 2011
You didn't mention directly the awesome systemd-analyze tool (maybe because it lacks a manpage?), although it's mentioned in the #7 blog story.

Posted by Freissen at Wed Jun 15 12:10:54 2011
"Außergewöhnlicher Migrationsdruck" ?!

Posted by Rob at Sun Jul 10 02:37:24 2011
what surprises me is that you haven't written a howto on porting sysVinit scripts to systemd units and linked to it in the "documentation" section.

That would seem of paramount importance for helping people to move to systemd.

In particular, it would be extremely useful to have a document that shows exactly how to port several sysVinit scripts, and not just the damned trivial cases -- something complex too.

cheers,

Rob

Posted by Lennart at Mon Jul 11 15:22:01 2011
Rob, this would surprise me too if it was true. But... I actually wrote such a blog article:

http://0pointer.de/blog/projects/systemd-for-admins-3.html

Posted by Thomas at Tue Jul 12 17:53:49 2011
Could you please have the man pages on a location that google can read?

Either by moving it or removing /public/ from http://0pointer.de/robots.txt ?

It sucks not to be able to search them :)

Posted by dashesy at Wed Jul 13 00:33:20 2011
This is the best article I have found about systemd (what it stands for, not what it does).
I am using systemd in my F15 desktop with great pleasure (my embedded system kernel does not have functional cgroups).
From a technical point of view it is perfect, thanks for the great job. To be frank, the only thing I do not like about systemd is the excessive number of mount points it adds to mtab. This is also the argument I have against the excessive kernel command line options that, for example, dracut adds to grub: it is not visually beautiful.

Posted by anonymous at Thu Jul 14 06:26:29 2011
A bit confused by:

To test it we copy it to /etc/systemd/system/

In my F15 installation I cannot find the *.service files there, but I do find them in /lib/systemd/system.

Where do .service files actually reside?
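
Going by the earlier advice in this thread (copy the service file from /lib/systemd/system to /etc/systemd/system and edit it there), the packaged files live in /lib and a copy in /etc takes precedence. A hedged Python sketch of that lookup order, using only the two directories named in the thread; the helper name is hypothetical and real systemd may consult more locations.

```python
import os
from typing import Optional, Sequence

# Assumed order: administrator copies in /etc win over packaged files in /lib.
DEFAULT_DIRS = ("/etc/systemd/system", "/lib/systemd/system")


def find_unit_file(name: str, search_dirs: Sequence[str] = DEFAULT_DIRS) -> Optional[str]:
    """Return the first existing unit file for name, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

On a fresh install find_unit_file("crond.service") would be expected to point into /lib/systemd/system, and into /etc/systemd/system only after a copy has been placed there.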

Posted by Rob at Sat Jul 23 15:30:02 2011
indeed, you have in fact written an answer to this question. I don't know how I could possibly have missed it, considering the obvious link name ("systemd for admins #3").  That kind of linking is what Vincent Flanders on webpagesthatsuck.com calls "mystery meat navigation":

  http://www.webpagesthatsuck.com/mysterymeatnavigation.html

Unfortunately, this is exactly the trivial example that I was hoping not to find.

How about taking a complex example and porting it?

I have a candidate for you -- the hylafax script. You can find it here (with a few changes by me):

  http://www.spielwiese.de/rob/hylafax-init.sh

It does the following:
- tests for the existence of a file as proof that hylafax is properly configured
- if the file exists, it slurps it in using "."
- sets some shell vars to default values
- figures out how to call "echo"
- depending on the value of various variables possibly start several processes: "faxq", "hfaxd", and in my case also start a process required for faxing via ISDN, "c3faxrecv".

Things I don't understand:
- how to start multiple processes (do I need multiple systemd services?)
- how to start processes according to application configuration rules (i.e., according to settings in some config file).

Obviously, systemd is a paradigm-change from sysVinit.  What I miss is a document describing how all the sysVinit methods/techniques used get translated into systemd.

thanks,

Rob

Posted by Andreas at Thu Aug 4 15:34:23 2011
@Rob: See

http://fedoraproject.org/wiki/SysVinit_to_Systemd_Cheatsheet

Regards,
Andreas

Posted by David at Fri Aug 26 11:28:32 2011
Hello,

http://i.imgur.com/usftZ.png

Posted by BMG at Fri Oct 28 11:14:52 2011
Hello,

I am using systemd 35 on x86 architecture, and I have a problem with the reboot and halt services: they don't work very well. Only after I type reboot twice is the system rebooted. I think a process or something hangs and it takes too long to restart.

reboot.service
[Unit]
Description=Reboot
DefaultDependencies=no
Requires=shutdown.target umount.target final.target
After=shutdown.target umount.target final.target

[Service]
Type=oneshot
ExecStart=//bin/systemctl --force reboot

halt.service
[Unit]
Description=Halt
DefaultDependencies=no
Requires=shutdown.target umount.target final.target
After=shutdown.target umount.target final.target

[Service]
Type=oneshot
ExecStart=//bin/systemctl --force halt

The option with --force works but I need reboot and halt to work properly.

Can anybody help me please?

P.S. systemctl restart reboot.service
  systemctl restart halt.service

are rebooting and halting the system but I need reboot and halt.

Thank you

Posted by BMG at Fri Oct 28 11:27:08 2011
On the serial console I also get the following output:

[  533.721589] <29>systemd[1]: connman.service: main process exited, code=exited, status=1
[  533.817543] <29>systemd[1]: Unit connman.service entered failed state.
[  549.440883] <30>systemd[1]: Reloading.
[  550.849766] <29>systemd[1]: dev.mount mount process exited, code=exited status=1
[  550.938737] <27>systemd[1]: Socket service systemd-kmsg-syslogd.service already active, refusing.
[  551.045294] <29>systemd[1]: Job systemd-stdout-syslog-bridge.service/start failed with result 'dependency'.

[  630.301180] Process mthemedaemon (pid: 113, ti=f45d2000 task=f4914170 task.ti=f45d2000)
[  630.301186]
[  630.301191] Call Trace:
Sending SIGKILL to remaining processes...
Unmounting file systems.
Could not unmount /dev: Device or resource busy

Posted by Transmitters at Thu Nov 10 03:40:06 2011
Thank you for your analysis and sharing, from your article I learned more.

Posted by Pawel at Mon Dec 12 16:18:43 2011
"I don't bite. If KDE hackers want to be involved, then involve yourself, don't always wait for us to ping you."

It seems you represent different standards towards gnome. There's also planet KDE if you're not aware.

Posted by dave at Mon Feb 27 23:49:34 2012
Hi

I was having problems with my shutdown services not running as expected.  I traced the issue to the use of the '--force' flag in the halt, poweroff and reboot services.  Here's more about the issue and how to resolve it:

http://www.practicalclouds.com/content/blog/1/dave-mccormick/2012-02-27/why-do-my-systemd-shutdown-scripts-not-run

regards


Dave

Posted by Anonymous at Sat Apr 21 01:22:23 2012
"systemctl can now execute all its operations remotely too (-H switch)."

$ systemctl -H root@localhost
Failed to get D-Bus connection: In D-Bus address, character '@' should have been escaped
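
The error message refers to D-Bus address escaping: in an address value, bytes outside a small safe set must be written as % followed by two hex digits. A rough Python sketch of that escaping follows. The safe set is my reading of the D-Bus specification and should be treated as an assumption, as should the helper name; it illustrates the message only, not what systemctl does internally.

```python
import string

# Bytes allowed unescaped in a D-Bus address value (assumed from the spec).
_UNESCAPED = set(string.ascii_letters + string.digits + "-_/.\\*")


def escape_address_value(value: str) -> str:
    """Percent-escape every byte outside the assumed safe set."""
    out = []
    for byte in value.encode("utf-8"):
        char = chr(byte)
        if byte < 128 and char in _UNESCAPED:
            out.append(char)
        else:
            out.append(f"%{byte:02x}")
    return "".join(out)


print(escape_address_value("root@localhost"))
```

Under this rule the '@' in root@localhost would have to appear as %40 inside the address.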

Posted by Lennart at Sat Apr 21 02:35:44 2012
Anonymous: you need a newer D-Bus for that.

Posted by ? at Sat Apr 21 12:22:28 2012
Why did you change the license to a weaker one? You should have changed it to AGPL3+

Posted by Anonymous at Sat Apr 21 15:03:20 2012
@Lennart: I have to re-check the machine in question, but I'm pretty sure this was D-Bus 1.5.12 (Debian unstable)

Posted by Jamie at Sat Apr 21 15:24:31 2012
Any chance of writing in a "chkconfig --list" equivalent?

Posted by Lennart at Sat Apr 21 15:39:23 2012
Jamie: see item 42.

Posted by Lennart at Sat Apr 21 15:52:56 2012
?: we include libraries non-GPL code should be able to link to, and we wanted the same license for all of systemd, hence we chose to change the license to LGPL for all of it.

Posted by JR at Sat Apr 21 16:58:19 2012
Keep up the good work!

Incidentally, any news about whether Debian will adopt systemd or not? Now with their kFreeBSD spin in the mix I imagine they're not very keen to move to a linux-only init. Maintenance reasons and all that.

I realize it's packaged in the repos, but I'm not sure if it ever left the "highly experimental here be dragons donut use outside of VMs" stage?

Posted by Lennart at Sun Apr 22 01:32:07 2012
JR: I can't speak for Debian. But knowing Debian if they have the choice between A, B or C they usually choose A, B and C. More specifically that means, that they include sysvinit, Upstart and systemd in their distribution (they already do that for quite a while). Also, they are very conservative, and avoid decisions as much as they can, so I am pretty sure they'll stick with sysvinit for a long long time to come. Then, there's quite a big influence from Canonical in Debian, and Canonical is confused enough to continue pushing Upstart, so I doubt the decision for systemd would be an easy and obvious one for them. And as long as there is contention about the topic no decision will take place.

Debian does include systemd in their testing/unstable distributions now, but if you run that you still end up with a tonload of glue scripts that are quite frankly quite unnecessary on a systemd system. So even if you run systemd on Debian you get an experience that is much more like sysvinit than the real thing. (the real thing is a system that boots up with zero shell scripts)

Short: if you want a modern system, then there are better choices than Debian (or Gentoo, which is in a very similar spot, sharing the inability to make choices and strong conservatism). Try Fedora, or OpenSUSE or Mageia, or similar.

Posted by Mark at Sun Apr 22 12:34:13 2012
In the interests of clarity, I would like to propose some changes to your status update.

"We introduced /run and made it a hard dependency of systemd. This directory is now widely accepted and implemented on all relevant [sic] Linux distributions."

This should read: "If any distribution does not support systemd requirements, then it is to be deemed irrelevant and not worthy of consideration."

Also, from the comments; "if you want a modern system, then there are better choices than Debian"

This should read: "Any distribution which fails to embrace systemd openly and fully must be considered a legacy operating system."

And while we are on the subject of inferior distributions; "Also, [Debian] are very conservative, and avoid decisions as much as they can, so I am pretty sure they'll stick with sysvinit for a long long time to come."

This should read: "Any decision not to adopt systemd is not actually a decision at all, and must either be attributed to incompetence, indecisiveness or irrational conservatism."

I'm sure this comment will be kindly moderated out of existence, but at least be frank with your readers in future - otherwise, how are we going to know which distributions to avoid like the plague, and which ones to embrace?

Posted by Anonymous at Sun Apr 22 15:19:55 2012
@Mark: if you want to troll, at least try harder!

/run transition: name a distribution that you deem relevant that hasn't switched to /run.

"Any decision not to adopt systemd": there is no decision not to adopt systemd in Debian (beware double negatives). Just because you said so on the debian-devel mailing list, it's not a Debian decision.

Posted by Simon at Mon Apr 23 00:11:26 2012
Agree, that's just trolling. It's not a criticism of Debian to say they're conservative - it's just a statement of fact. They don't jump on the latest fad - they just support it as an option, and wait to see what others do. It's their main selling point as a distro, really...

Posted by Josh at Mon Apr 23 04:12:16 2012
@Lennart: I'm currently working with the maintainer of sysvinit in Debian to make it possible to run systemd without invoking any of the legacy init script infrastructure.  As of right now, the Debian systemd package already masks almost all of the legacy init scripts in the "initscripts" package; with the next version of initscripts it'll become possible for systemd to mask off the rest and stop depending on initscripts entirely.

That just leaves the init scripts shipped in individual daemon packages, and many such packages have already started to ship systemd services that mask their legacy init scripts.

So, while the situation hasn't gotten much better for packagers (who still have to ship legacy init scripts in their daemon packages), from a user perspective the prospect of a shell-free boot has gotten much closer.

Posted by Matthew Miller at Mon Apr 23 15:24:18 2012
I'm really glad to see this post, and I've been glad to see all of the effort put into making this a comprehensive system. Some of my earlier concerns were due to developer comments about how the whole project was basically trivial and kind of a weekend's hack-work. Clearly, that's not the case, and I'm glad to see that my fears have not been borne out. The rough edges are getting the effort they need to really be smooth and shiny.

Posted by Abe at Mon Apr 23 16:40:42 2012
Out of curiosity - what are the chances for patches implementing user sessions with systemd to land in upstream?
The idea of using systemd to start not just system-wide but also user-specific services seems pretty natural to me. Any chance of seeing systemd replace gdm, kdm and the like in the foreseeable future?

Posted by nate at Mon Apr 23 18:22:36 2012
Is there any thought of supporting a network heartbeat mechanism with systemd?

It seems to me that there is a significant overlap between an init system like systemd and a clustering solution like Linux-HA.

All that would be needed is the ability to integrate an existing heartbeat mechanism and then have systemd be able to mount/check file systems, bring up an IP address on the desired interface, and then start up the corresponding services.

Posted by Adam Williamson at Mon Apr 23 18:49:19 2012
Yes, yes, yes, but does it make tea yet?

Posted by Antony Williams at Mon Apr 23 18:49:34 2012
@Lennart
I think you're missing the point with your 'Debian is not modern' quote.
The purpose of Debian isn't to be modern, it's to be stable.
Debian is to Red Hat as Ubuntu is to Fedora.
Fedora and Ubuntu take risks and switch to new technology.
Debian and RHEL switch when it's safe.

Posted by nate at Mon Apr 23 19:21:48 2012
@Antony Williams

Not really. Ubuntu is created by taking a snapshot of Debian unstable and making their Ubuntu-specific desktop changes on top of it.

The closest you can get for Fedora vs Redhat on Debian is 'Debian unstable' vs 'Debian stable'.

The biggest problem is that Debian can't make a decision. They just choose to try to support everything, so you end up with 3 different init systems.

You have Upstart from Ubuntu that Debian tries to support. Then you have systemd that Debian tries to support. And then you have people fighting to keep the old script-based 'System V' style init because that is the only one that can run on the non-Linux Debian versions (GNU/Hurd and GNU/kFreeBSD).

So instead of putting the work into transitioning to one of the newer systems or sticking with the old one, they just require package maintainers to do 600% more work for 'service up' and 'service down' functionality than any other operating system.

It's just quite insane and a very bad situation for Debian right now, init-wise.

Posted by Antony Williams at Mon Apr 23 20:38:54 2012
@nate
please double check your facts.
While Ubuntu LTS is based on Debian testing,
but ALL other Ubuntu releases are based on Debian Unstable.

Secondly, Ubuntu vs Debian is a much better comparison than Debian stable vs unstable, because both unstable and testing are ROLLING releases.
You can't compare them.

Posted by Anonymous at Tue Apr 24 11:21:33 2012
Hot off the press, Ubuntu will stick with upstart:

http://www.markshuttleworth.com/archives/1121

Rumours and allegations of a move from Upstart to SystemD are unfounded: Upstart has a huge battery of tests, the competition has virtually none.

Quality comes from focus and clarity of purpose, it comes from careful design and rigorous practices. After a review by the Ubuntu Foundations team our course is clear: we’re committed to Upstart, it’s the better choice for a modern init, innit. For our future on cloud and client, Upstart is crisp, clean and correct.

Posted by nate at Tue Apr 24 14:56:07 2012
@Antony Williams

How can a person engage in a discussion when you don't even read what was written?

> please double check your facts.

Yay. Insult.

What I said:
>> Ubuntu is created by taking a snapshot of Debian unstable and making their Ubuntu-specific desktop changes on top of it.

What you said:
> but ALL other Ubuntu releases are based on Debian Unstable.


wut?

> Secondly, Ubuntu vs Debian is a much better comparison than Debian stable vs unstable, because both unstable and testing are ROLLING releases.

And your point is what?
Nothing. That's right.

> You can't compare them.

I absolutely can.

See? Comparison:

1) Purpose:
* Fedora is created for the purpose of testing out technologies for Redhat.
* Debian unstable is created for the purpose of testing out technologies for Debian stable.
* Ubuntu is NOT created for the purpose of testing out technologies for Debian stable. Ubuntu is created for the purpose of making money by having a 'stable' and 'enterprise' ready OS.

2) Usage:
* Fedora is used by enthusiasts as a cutting edge Desktop that is RPM based and similar to Redhat, which many use professionally. Occasionally used by people that need something newer than the latest Redhat release.
* Debian Unstable is used by enthusiasts as a cutting edge Desktop that is Deb based and similar to Debian, which many use professionally. Occasionally used by people that need something newer than the latest Debian stable release.


etc etc.

See? I was actually able to compare them quite easily.

Posted by Ole Laursen at Tue Apr 24 16:07:51 2012
Interesting message from Mark Shuttleworth. Unfortunately, Upstart still appears not to be ready for server daemons:

https://bugs.launchpad.net/upstart/+bug/406397

It's easy to be "crisp, clean and correct" if you're not actually solving the problem. ;-)

Posted by Jon at Mon Apr 30 16:28:02 2012
Debian has not made a decision yet, but it would be a decision for two releases' time (there is not enough time to implement systemd for the next release even if there was unanimous agreement to do so). Thus there's no reason to rush the decision. The ongoing flamewars on -devel are a red herring, really: Debian is a do-ocracy, and the discussion is a diversion from the fact that the people who do the work will make the decision (and the vast majority of participants in the flamewars will do no work, just argue).

Posted by oiaohm at Wed May 2 00:24:24 2012
Lennart  "Debian does include systemd in their testing/unstable distributions now, but if you run that you still end up with a tonload of glue scripts that are quite frankly quite unnecessary on a systemd system."

I can understand this. Remember that Debian testing is not running the latest versions you have released, Lennart. Some of those scripts address faults in older versions of systemd, so faults you have already fixed are not yet fixed in Debian. Debian is a little pragmatic, particularly with init systems.

Posted by Bob Gustafson at Fri May 4 16:30:02 2012
I'm wondering if there is a list of error codes and their explanation. I have googled and clicked, but haven't yet seen a list.

I have some vncserver problems and see code 125 and code 29 in the logs.

Posted by Rosalba at Thu May 31 13:04:46 2012
Just so you know, I've been using Gentoo since 2003, and I really love it. It's on all my machines (laptop, desktop, media center, servers, you name it), and I don't think I will ever change unless something cooler appears in the future. But that doesn't seem probable in the near future.

I also have been using Linux/Unix since 1996. I have used (and programmed in) Solaris, HP/UX, AIX, and BSD.

And I also use the GNOME 3 overlay, together with the systemd overlay. I also love them, BTW. And while I don't use the GNOME 3 overlay in all my boxes (I don't need GNOME 3 in my media center nor in my servers), I do use systemd in all of them.

I really respect your opinion as a Gentoo Developer (and thanks, BTW, for helping to make my favorite distribution even better), but I respectfully disagree with what you are saying here. Not only because I think systemd is great (and pulseaudio and avahi too, BTW), but also because I think you are misinterpreting some things.

If systemd is integrated into GNOME, it still will be able to compile and be used in all the other Unixes, and it will not be mandatory. But if it's available, it will give us some really cool features.

GNOME will implement some interfaces (via dbus, probably): if the underlying OS can fulfill them all, great. If not, it will be marked so and not use those features. That's all.

With the use of USE flags, people in Gentoo who (for whatever reason) don't like systemd would not have to use it. And people in BSD or Solaris, who actually can't use it, would not need to worry about it. They will lose some features, that's all.

So nobody is shovelling stuff in other people's mouths. This is Open Source: people write what they think is cool code, and we get to use it if we so desire. If not, we can always keep using Linux 1.0, and GNOME 1.2. That's our choice. Or even better: we can take the code and change it so we don't need to use things that (for whatever reason) we don't want to use.

I really like the way the future is looking: GNOME 3 and systemd and the new kernel features and everything managed in my favorite distribution looks awesome, and with the things coming in the future it would look even awesomer.

Just my 2 ${CURRENCY/100}. And again, thanks for helping to maintain my favorite distribution.

Posted by Brian at Thu Aug 9 21:02:16 2012
"In fact, you can easily write a daemon with this that can run, and exit (or crash), and run again and exit again (and so on), and all of that without the clients noticing or loosing any request."

This made me think of erlang.  I assume systemd is written in C++, but just curious if you considered other languages?

Not that I think it's a good idea to make the most core service of a linux system dependent on the erlang compiler... but an erlang mock-up would be interesting as a prototyping tool for systemd.
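
Note (not part of Brian's comment): the behaviour quoted above rests on socket activation, where systemd keeps the listening socket open and hands it to each new daemon instance. The sketch below shows only the receiving side of that hand-over, in Python rather than C, following the documented convention of the LISTEN_PID and LISTEN_FDS environment variables with descriptors starting at 3. The function name listen_fds and its environ/pid parameters are made up for this illustration; this is a rough sketch, not systemd's own code.

```python
import os

# First inherited descriptor in systemd's socket-passing convention.
SD_LISTEN_FDS_START = 3


def listen_fds(environ=None, pid=None):
    """Return the list of inherited listening descriptors, or [] if none.

    environ and pid are injectable only to make the sketch easy to try
    out; a real daemon would use os.environ and os.getpid().
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The descriptors are only meant for the process named in LISTEN_PID.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        # Variables missing or malformed: not socket-activated.
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


if __name__ == "__main__":
    fds = listen_fds()
    if fds:
        # A daemon could wrap fds[0] with socket.socket(fileno=fds[0])
        # and start accepting connections right away.
        print("inherited listening descriptors:", fds)
    else:
        print("not socket-activated; the daemon would bind its own socket")
```

Because the socket stays open in systemd while the daemon is down, connection attempts made in the meantime queue up in the kernel instead of being refused, which is why clients do not notice a restart.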

Posted by Bassu at Wed Oct 31 00:30:47 2012
Lennart, I just wanted to say a personal thank you for all of your awesomeness and the hard work you have put in to this!

$ sudo systemd-analyze
Startup finished in 2751ms (kernel) + 1888ms (userspace) = 4639ms
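
Note (not part of Bassu's comment): the per-stage figures in a line like this should add up to the total. A minimal Python sketch for pulling the numbers out and checking that, assuming the exact line format quoted above; the helper name parse_startup is invented for this illustration, and other systemd versions may print extra stages or different units.

```python
import re


def parse_startup(line):
    """Split a 'Startup finished in ...' line into ({stage: ms}, total_ms)."""
    # Each stage looks like '2751ms (kernel)'.
    stages = {name: int(ms) for ms, name in re.findall(r"(\d+)ms \((\w+)\)", line)}
    # The total follows the '=' sign; None if the line has no total.
    match = re.search(r"= (\d+)ms", line)
    total = int(match.group(1)) if match else None
    return stages, total


line = "Startup finished in 2751ms (kernel) + 1888ms (userspace) = 4639ms"
stages, total = parse_startup(line)
print(stages, total, sum(stages.values()) == total)
```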

Posted by Ken Stailey at Sat Nov 10 20:45:39 2012
The ps "ax" and "-e" options both enable displaying "everything", i.e. all processes.  It is not necessary to use both.

Compare:
ps xawf -eo pid,user,cgroup,args
ps wf -eo pid,user,cgroup,args

Please note that this is neither a support forum nor a bug tracker! Support questions or bug reports posted here will be ignored and not responded to!

It should be obvious but in case it isn't: the opinions reflected here are my own. They are not the views of my employer, or Ronald McDonald, or anyone else.

Please note that I take the liberty to delete any comments posted here that I deem inappropriate, off-topic, or insulting. And I exercise this liberty quite aggressively. So yes, if you comment here, I might censor you. If you don't want to be censored you are welcome to comment on your own blog instead.
Lennart Poettering <mzoybt (at) 0pointer (dot) net>