<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"><channel><title>Pid Eins</title><link>https://0pointer.net/blog/</link><description></description><lastBuildDate>Fri, 27 Mar 2026 00:00:00 +0100</lastBuildDate><item><title>Mastodon Stories for systemd v260</title><link>https://0pointer.net/blog/mastodon-stories-for-systemd-v260.html</link><description>&lt;p&gt;On March 17 we released systemd v260 &lt;a href="https://github.com/systemd/systemd/releases/tag/v260"&gt;into the wild&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the weeks leading up to that release (and since then) I have posted
a series of serieses of posts to Mastodon about key new features in
this release, under the
&lt;a href="https://mastodon.social/@pid_eins/tagged/systemd260"&gt;#systemd260&lt;/a&gt;
hash tag. In case you aren't using Mastodon, but would like to
read up, here's a list of all 21 posts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Post #1: &lt;a href="https://mastodon.social/@pid_eins/116143441607125263"&gt;NvPCR Measurements for Activated DDIs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #2: &lt;a href="https://mastodon.social/@pid_eins/116158446582239764"&gt;Varlink Transport Plugins&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #3: &lt;a href="https://mastodon.social/@pid_eins/116167363137246748"&gt;Well-Known Varlink Services&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #4: &lt;a href="https://mastodon.social/@pid_eins/116169755712847601"&gt;.mstack Overlay Mount Stacks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #5: &lt;a href="https://mastodon.social/@pid_eins/116175685679588363"&gt;RefreshOnReload= in Service Units&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #6: &lt;a href="https://mastodon.social/@pid_eins/116198232706709156"&gt;FANCY_NAME= in /etc/os-release&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #7: &lt;a href="https://mastodon.social/@pid_eins/116207200815338698"&gt;BindNetworkInterface= in Service Units&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #8: &lt;a href="https://mastodon.social/@pid_eins/116209272590086800"&gt;importctl pull-oci for Acquiring OCI Containers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #9: &lt;a href="https://mastodon.social/@pid_eins/116220986239256436"&gt;systemd-report and Metrics API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #10: &lt;a href="https://mastodon.social/@pid_eins/116249087900074872"&gt;udev's tpm2_id built-in and the TPM2 Quirks Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #11: &lt;a href="https://mastodon.social/@pid_eins/116249601419824877"&gt;Devicetree/CHID Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #12: &lt;a href="https://mastodon.social/@pid_eins/116254613033058686"&gt;Varlink IPC for systemd-networkd&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #13: &lt;a href="https://mastodon.social/@pid_eins/116254632376293505"&gt;systemd-vmspawn knows --ephemeral now&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #14: &lt;a href="https://mastodon.social/@pid_eins/116277375277269773"&gt;systemd-logind's xaccess Concept&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #15: &lt;a href="https://mastodon.social/@pid_eins/116284331662553931"&gt;Unprivileged Portable Services&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #16: &lt;a href="https://mastodon.social/@pid_eins/116284380064883282"&gt;Image Policy Improvements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #17: &lt;a href="https://mastodon.social/@pid_eins/116288723454722754"&gt;LUKS Volume Key Fixation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #18: &lt;a href="https://mastodon.social/@pid_eins/116299869233876251"&gt;Journal Varlink Access&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #19: &lt;a href="https://mastodon.social/@pid_eins/116299887458427182"&gt;Nested UID Range Delegation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #20: &lt;a href="https://mastodon.social/@pid_eins/116299916605898399"&gt;PrivateUsers=managed&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #21: &lt;a href="https://mastodon.social/@pid_eins/116299971280176013"&gt;bootctl install as Varlink API&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I intend to do a similar series of serieses of posts for the next systemd
release (v261), hence if you haven't left tech Twitter for Mastodon yet, now is
the opportunity.&lt;/p&gt;
&lt;p&gt;My series for v261 will begin in a few weeks most likely, under the
&lt;a href="https://mastodon.social/@pid_eins/tagged/systemd261"&gt;#systemd261&lt;/a&gt;
hash tag.&lt;/p&gt;
&lt;p&gt;In case you are interested, &lt;a href="https://0pointer.net/blog/mastodon-stories-for-systemd-v259.html"&gt;here is the corresponding blog story for
systemd v259&lt;/a&gt;,
&lt;a href="https://0pointer.net/blog/mastodon-stories-for-systemd-v258.html"&gt;here for
v258&lt;/a&gt;,
&lt;a href="https://0pointer.net/blog/announcing-systemd-v257.html"&gt;here for
v257&lt;/a&gt;,
and &lt;a href="https://0pointer.net/blog/announcing-systemd-v256.html"&gt;here for
v256&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 27 Mar 2026 00:00:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2026-03-27:/blog/mastodon-stories-for-systemd-v260.html</guid><category>projects</category></item><item><title>Introducing Amutable</title><link>https://0pointer.net/blog/introducing-amutable.html</link><description>&lt;p&gt;Today, we announce Amutable, our ✨ new ✨ company. We –
&lt;a href="https://mastodon.social/@blixtra@hachyderm.io"&gt;@blixtra@hachyderm.io&lt;/a&gt;,
&lt;a href="https://mastodon.social/@brauner"&gt;@brauner@mastodon.social&lt;/a&gt;,
&lt;a href="https://mastodon.social/@davidstrauss"&gt;@davidstrauss@mastodon.social&lt;/a&gt;,
&lt;a href="https://mastodon.social/@rodrigo_rata"&gt;@rodrigo_rata@mastodon.social&lt;/a&gt;,
&lt;a href="https://mastodon.social/@michaelvogt"&gt;@michaelvogt@mastodon.social&lt;/a&gt;,
&lt;a href="https://mastodon.social/@pothos@fosstodon.org"&gt;@pothos@fosstodon.org&lt;/a&gt;,
&lt;a href="https://mastodon.social/@zbyszek@fosstodon.org"&gt;@zbyszek@fosstodon.org&lt;/a&gt;,
&lt;a href="https://mastodon.social/@daandemeyer"&gt;@daandemeyer@mastodon.social&lt;/a&gt;
&lt;a href="https://mastodon.social/@cyphar"&gt;@cyphar@mastodon.social&lt;/a&gt;,
&lt;a href="https://mastodon.social/@jrocha@floss.social"&gt;@jrocha@floss.social&lt;/a&gt; and &lt;a href="https://mastodon.social/@pid_eins"&gt;yours
truly&lt;/a&gt; – are building the 🚀 next generation
of Linux systems, with integrity, determinism, and verification – every step of
the way.&lt;/p&gt;
&lt;p&gt;For more information see → &lt;a href="https://amutable.com/blog/introducing-amutable"&gt;https://amutable.com/blog/introducing-amutable&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 27 Jan 2026 00:00:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2026-01-27:/blog/introducing-amutable.html</guid><category>misc</category></item><item><title>Mastodon Stories for systemd v259</title><link>https://0pointer.net/blog/mastodon-stories-for-systemd-v259.html</link><description>&lt;p&gt;On Dec 17 we released systemd v259 &lt;a href="https://github.com/systemd/systemd/releases/tag/v259"&gt;into the wild&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the weeks leading up to that release (and since then) I have posted
a series of serieses of posts to Mastodon about key new features in
this release, under the
&lt;a href="https://mastodon.social/@pid_eins/tagged/systemd259"&gt;#systemd259&lt;/a&gt;
hash tag. In case you aren't using Mastodon, but would like to
read up, here's a list of all 25 posts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Post #1: &lt;a href="https://mastodon.social/@pid_eins/115570095861864513"&gt;systemd-resolved Hooks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #2: &lt;a href="https://mastodon.social/@pid_eins/115575195119541143"&gt;dlopen() everything&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #3: &lt;a href="https://mastodon.social/@pid_eins/115580882123596509"&gt;systemd-analyze dlopen-metadata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #4: &lt;a href="https://mastodon.social/@pid_eins/115586469819973533"&gt;run0 --empower&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #5: &lt;a href="https://mastodon.social/@pid_eins/115605598751722319"&gt;systemd-vmspawn --bind-user=&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #6: &lt;a href="https://mastodon.social/@pid_eins/115611051983920298"&gt;Musl libc support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #7: &lt;a href="https://mastodon.social/@pid_eins/115614881738923176"&gt;systemd-repart without device name&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #8: &lt;a href="https://mastodon.social/@pid_eins/115620451885638963"&gt;Parallel kmod loading in systemd-modules-load.service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #9: &lt;a href="https://mastodon.social/@pid_eins/115627835871078915"&gt;NvPCR Support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #10: &lt;a href="https://mastodon.social/@pid_eins/115646310576910417"&gt;systemd-analyze nvpcrs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #11: &lt;a href="https://mastodon.social/@pid_eins/115662198906484836"&gt;systemd-repart Varlink IPC API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #12: &lt;a href="https://mastodon.social/@pid_eins/115666147093865447"&gt;systemd-vmspawn block device serial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #13: &lt;a href="https://mastodon.social/@pid_eins/115740831317295811"&gt;systemd-repart --defer-partitions-empty= + --defer-partitions-factory-reset=&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #14: &lt;a href="https://mastodon.social/@pid_eins/115746642240479224"&gt;userdb support for UUID queries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #15: &lt;a href="https://mastodon.social/@pid_eins/115750244569426922"&gt;Wallclock time in service completion logging&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #16: &lt;a href="https://mastodon.social/@pid_eins/115756127886232116"&gt;systemd-firstboot --prompt-keymap-auto&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #17: &lt;a href="https://mastodon.social/@pid_eins/115762118507750011"&gt;$LISTEN_PIDFDID&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #18: &lt;a href="https://mastodon.social/@pid_eins/115767664446161203"&gt;Incremental partition rescanning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #19: &lt;a href="https://mastodon.social/@pid_eins/115773088482549498"&gt;ExecReloadPost=&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #20: &lt;a href="https://mastodon.social/@pid_eins/115778781698038221"&gt;Transaction order cycle tracking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #21: &lt;a href="https://mastodon.social/@pid_eins/115784393215752962"&gt;systemd-firstboot facelift&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #22: &lt;a href="https://mastodon.social/@pid_eins/115790168621594241"&gt;Per-User systemd-machined + systemd-importd&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #23: &lt;a href="https://mastodon.social/@pid_eins/115796291114442291"&gt;systemd-udevd's OPTIONS="dump-json"&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #24: &lt;a href="https://mastodon.social/@pid_eins/115812764240295998"&gt;systemd-resolved's DumpDNSConfiguration() IPC Call&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #25: &lt;a href="https://mastodon.social/@pid_eins/115812894738123882"&gt;DHCP Server EmitDomain= + Domain=&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I intend to do a similar series of serieses of posts for the next systemd
release (v260), hence if you haven't left tech Twitter for Mastodon yet, now is
the opportunity.&lt;/p&gt;
&lt;p&gt;My series for v260 will begin in a few weeks most likely, under the
&lt;a href="https://mastodon.social/@pid_eins/tagged/systemd260"&gt;#systemd260&lt;/a&gt;
hash tag.&lt;/p&gt;
&lt;p&gt;In case you are interested, &lt;a href="https://0pointer.net/blog/mastodon-stories-for-systemd-v258.html"&gt;here is the corresponding blog story for
systemd v258&lt;/a&gt;,
&lt;a href="https://0pointer.net/blog/announcing-systemd-v257.html"&gt;here for
v257&lt;/a&gt;,
and &lt;a href="https://0pointer.net/blog/announcing-systemd-v256.html"&gt;here for
v256&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 31 Dec 2025 00:00:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2025-12-31:/blog/mastodon-stories-for-systemd-v259.html</guid><category>projects</category></item><item><title>Mastodon Stories for systemd v258</title><link>https://0pointer.net/blog/mastodon-stories-for-systemd-v258.html</link><description>&lt;p&gt;Already on Sep 17 we released systemd v258 &lt;a href="https://lists.freedesktop.org/archives/systemd-devel/2025-September/051670.html"&gt;into the wild&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the weeks leading up to that release I have posted a series of
serieses of posts to Mastodon about key new features in this release,
under the &lt;a href="https://mastodon.social/@pid_eins/tagged/systemd258"&gt;#systemd258&lt;/a&gt; hash
tag. It was my intention to post a link list here on this blog right
after completing that series, but I simply forgot! Hence, in case you
aren't using Mastodon, but would like to read up, here's a list of all
55 posts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Post #1: &lt;a href="https://mastodon.social/@pid_eins/114545892813068498"&gt;systemctl start -v&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #2: &lt;a href="https://mastodon.social/@pid_eins/114550305394053015"&gt;Home areas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #3: &lt;a href="https://mastodon.social/@pid_eins/114556401535348313"&gt;systemd-resolved delegate zones&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #4: &lt;a href="https://mastodon.social/@pid_eins/114573005995694680"&gt;Foreign UID range&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #5: &lt;a href="https://mastodon.social/@pid_eins/114584855720355892"&gt;/etc/hostname ??? wildcards&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #6: &lt;a href="https://mastodon.social/@pid_eins/114600936718520372"&gt;Quota on /tmp/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #7: &lt;a href="https://mastodon.social/@pid_eins/114606769582857046"&gt;ConcurrencySoftMax= + ConcurrencyHardMax=&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #8: &lt;a href="https://mastodon.social/@pid_eins/114611768708184175"&gt;Product UUID in ConditionHost=&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #9: &lt;a href="https://mastodon.social/@pid_eins/114618473677694301"&gt;Context OSC terminal sequences&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #10: &lt;a href="https://mastodon.social/@pid_eins/114623820293284298"&gt;uki-url Boot Loader Spec Type #1 fields&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #11: &lt;a href="https://mastodon.social/@pid_eins/114629842058448119"&gt;rd.break= boot breakpoints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #12: &lt;a href="https://mastodon.social/@pid_eins/114635221853062454"&gt;Factory Reset Rework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #13: &lt;a href="https://mastodon.social/@pid_eins/114658726594933395"&gt;systemd-resolved DNS Configuration Change IPC Subscription API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #14: &lt;a href="https://mastodon.social/@pid_eins/114663599190570395"&gt;io.systemd.boot-entries.extra= SMBIOS Type #11 Key&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #15: &lt;a href="https://mastodon.social/@pid_eins/114669334251141174"&gt;Bring Your Own Firmware&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #16: &lt;a href="https://mastodon.social/@pid_eins/114674923588128559"&gt;userdb record aliases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #17: &lt;a href="https://mastodon.social/@pid_eins/114691782651088597"&gt;systemd-validatefs and its xattrs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #18: &lt;a href="https://mastodon.social/@pid_eins/114697642092211474"&gt;Offline Signing of Artifacts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #19: &lt;a href="https://mastodon.social/@pid_eins/114710109344141459"&gt;PAMName= in services hooked up to ask-password protocol&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #20: &lt;a href="https://mastodon.social/@pid_eins/114715785929761972"&gt;x-systemd.graceful-option= mount option&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #21: &lt;a href="https://mastodon.social/@pid_eins/114731475255410988"&gt;systemd-userdb-load-credentials.service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #22: &lt;a href="https://mastodon.social/@pid_eins/114748482077172097"&gt;systemd-vmspawn --grow-image=&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #23: &lt;a href="https://mastodon.social/@pid_eins/114754518632012480"&gt;systemd-notify --fork&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #24: &lt;a href="https://mastodon.social/@pid_eins/114771082006844631"&gt;$TERM auto-discovery&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #25: &lt;a href="https://mastodon.social/@pid_eins/114776832571027618"&gt;Rebooting/Powering off systemd-nspawn containers via hotkey&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #26: &lt;a href="https://mastodon.social/@pid_eins/114782700539340326"&gt;ExecStart= | modifier&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #27: &lt;a href="https://mastodon.social/@pid_eins/114788278114378875"&gt;systemctl reload reloads confexts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #28: &lt;a href="https://mastodon.social/@pid_eins/114812166113613377"&gt;Server side userdb filtering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #29: &lt;a href="https://mastodon.social/@pid_eins/114816786836736040"&gt;Quota on StateDirectory= and friends&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #30: &lt;a href="https://mastodon.social/@pid_eins/114823531154384369"&gt;systemd-analyze unit-shell&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #31: &lt;a href="https://mastodon.social/@pid_eins/114827645185448224"&gt;/etc/issue.d/ drop-in for AF_VSOCK CID&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #32: &lt;a href="https://mastodon.social/@pid_eins/114850389356673130"&gt;fsverity in systemd-repart&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #33: &lt;a href="https://mastodon.social/@pid_eins/114980359479311999"&gt;AcceptFileDescriptor= + PassPIDFD=&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #34: &lt;a href="https://mastodon.social/@pid_eins/114986213475696548"&gt;Tab completion in interactive systemd-firstboot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #35: &lt;a href="https://mastodon.social/@pid_eins/114991974361745422"&gt;rd.systemd.pull= kernel command line option/Boot into tarball&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #36: &lt;a href="https://mastodon.social/@pid_eins/115016549897778350"&gt;ConditionKernelModuleLoaded=&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #37: &lt;a href="https://mastodon.social/@pid_eins/115021969729733421"&gt;systemd-analyze chid&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #38: &lt;a href="https://mastodon.social/@pid_eins/115025535193319606"&gt;homectl list-signing-keys/get-signing-key/add-signing-key/remove-signing-key&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #39: &lt;a href="https://mastodon.social/@pid_eins/115032087612846572"&gt;DDI Image Filters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #40: &lt;a href="https://mastodon.social/@pid_eins/115047948775277368"&gt;Android USB Debugging udev rules&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #41: &lt;a href="https://mastodon.social/@pid_eins/115054570728027141"&gt;systemd-vmspawn's --smbios11= switch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #42: &lt;a href="https://mastodon.social/@pid_eins/115061399584325638"&gt;$MAINPIDFDID + $MANAGERPIDFDID&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #43: &lt;a href="https://mastodon.social/@pid_eins/115064971230187209"&gt;$DEBUG_INVOCATION=1 Respected by all systemd services&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #44: &lt;a href="https://mastodon.social/@pid_eins/115071247493189919"&gt;LoaderDeviceURL EFI Variable and systemd.pull='s origin kernel command line switch&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #45: &lt;a href="https://mastodon.social/@pid_eins/115088154755010165"&gt;cgroupv1 removal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #46: &lt;a href="https://mastodon.social/@pid_eins/115095836068972901"&gt;ProtectHostname=private&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #47: &lt;a href="https://mastodon.social/@pid_eins/115099443130286999"&gt;homectl adopt + homectl register&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #48: &lt;a href="https://mastodon.social/@pid_eins/115114274588524270"&gt;systemd-machined Varlink APIs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #49: &lt;a href="https://mastodon.social/@pid_eins/115128243121764853"&gt;DeferTrigger and "lenient" job mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #50: &lt;a href="https://mastodon.social/@pid_eins/115132962518662116"&gt;Automatic Removal of foreign UID owned delegate subgroups in the per-user service manager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #51: &lt;a href="https://mastodon.social/@pid_eins/115139611674941889"&gt;Per-user ask-password protocol&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #52: &lt;a href="https://mastodon.social/@pid_eins/115179113138417200"&gt;PrivateUsers=full&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #53: &lt;a href="https://mastodon.social/@pid_eins/115190035839444683"&gt;LoadCredentialEncrypted= in the per-user service manager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #54: &lt;a href="https://mastodon.social/@pid_eins/115218817138427511"&gt;dissect_image builtin in systemd-udevd&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #55: &lt;a href="https://mastodon.social/@pid_eins/115218897129957361"&gt;BPF Delegation via Tokens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I intend to do a similar series of serieses of posts for the next systemd
release (v259), hence if you haven't left tech Twitter for Mastodon yet, now is
the opportunity.&lt;/p&gt;
&lt;p&gt;We intend to shorten the release cycle a bit for the future, and in
fact managed to tag v259-rc1 already yesterday, just 2 months after
v258. Hence, my series for v259 will begin soon, under the
&lt;a href="https://mastodon.social/tags/systemd259"&gt;#systemd259&lt;/a&gt; hash tag.&lt;/p&gt;
&lt;p&gt;In case you are interested, &lt;a href="https://0pointer.net/blog/announcing-systemd-v257.html"&gt;here is the corresponding blog story for
systemd v257&lt;/a&gt;,
and &lt;a href="https://0pointer.net/blog/announcing-systemd-v256.html"&gt;here for
v256&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 18 Nov 2025 00:00:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2025-11-18:/blog/mastodon-stories-for-systemd-v258.html</guid><category>projects</category></item><item><title>ASG! 2025 CfP Closes Tomorrow!</title><link>https://0pointer.net/blog/asg-2025-cfp-closes-tomorrow.html</link><description>&lt;p&gt;&lt;large&gt;&lt;b&gt;The All Systems Go! 2025 Call for Participation Closes Tomorrow!&lt;/b&gt;&lt;/large&gt;&lt;/p&gt;
&lt;p&gt;The Call for Participation (CFP) for &lt;a href="https://all-systems-go.io/"&gt;All Systems Go!
2025&lt;/a&gt; will close &lt;em&gt;tomorrow&lt;/em&gt;, on the 13th of
June! We’d like to invite you to submit your proposals for
consideration to &lt;a href="https://cfp.all-systems-go.io/all-systems-go-2025/cfp"&gt;the CFP submission
site&lt;/a&gt; quickly!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 12 Jun 2025 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2025-06-12:/blog/asg-2025-cfp-closes-tomorrow.html</guid><category>projects</category></item><item><title>Announcing systemd v257</title><link>https://0pointer.net/blog/announcing-systemd-v257.html</link><description>&lt;p&gt;Last week we released systemd v257 &lt;a href="https://lists.freedesktop.org/archives/systemd-devel/2024-December/050995.html"&gt;into the wild&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the weeks leading up to this release (and the week after) I have
posted a series of serieses of posts to Mastodon about key new
features in this release, under the
&lt;a href="https://mastodon.social/tags/systemd257"&gt;#systemd257&lt;/a&gt; hash tag. In
case you aren't using Mastodon, but would like to read up, here's a
list of all 37 posts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Post #1: &lt;a href="https://mastodon.social/@pid_eins/113395418561365864"&gt;Fully Locked Accounts with systemd-sysusers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #2: &lt;a href="https://mastodon.social/@pid_eins/113404057459642225"&gt;Combined Signed PCR and Locally Managed PCR Policies for Disk Encryption&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #3: &lt;a href="https://mastodon.social/@pid_eins/113406672373007116"&gt;Progress Indication via Terminal ANSI Sequence&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #4: &lt;a href="https://mastodon.social/@pid_eins/113423432232653226"&gt;Multi-Profile UKIs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #5: &lt;a href="https://mastodon.social/@pid_eins/113429571219875759"&gt;The New sd-varlink &amp;amp; sd-json APIs in libsystemd&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #6: &lt;a href="https://mastodon.social/@pid_eins/113435078590868715"&gt;Querying for Passwords in User Scope&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #7: &lt;a href="https://mastodon.social/@pid_eins/113441330932924520"&gt;Secure Attention Key Logic in systemd-logind&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #8: &lt;a href="https://mastodon.social/@pid_eins/113446270534758537"&gt;systemd-nspawn --bind-user= Now Copies User's SSH Key&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #9: &lt;a href="https://mastodon.social/@pid_eins/113468949960210477"&gt;The New DeferReactivation= Switch in .timer Units&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #10: &lt;a href="https://mastodon.social/@pid_eins/113474628446972762"&gt;Support for the New IPE LSM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #11: &lt;a href="https://mastodon.social/@pid_eins/113480519788031317"&gt;Environment Variables for Shell Prompt Prefix/Suffix&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #12: &lt;a href="https://mastodon.social/@pid_eins/113491516929371648"&gt;sysctl Conflict Detection via eBPF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #13: &lt;a href="https://mastodon.social/@pid_eins/113505888590064562"&gt;initrd and µcode UKI Add-Ons&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #14: &lt;a href="https://mastodon.social/@pid_eins/113508663882187676"&gt;SecureBoot Signing with the New systemd-sbsign Tool&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #15: &lt;a href="https://mastodon.social/@pid_eins/113514266248183685"&gt;Managed Access to hidraw devices in systemd-logind&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #16: &lt;a href="https://mastodon.social/@pid_eins/113520730307228083"&gt;Fuzzy Filtering in userdbctl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #17: &lt;a href="https://mastodon.social/@pid_eins/113526181930069497"&gt;MAC Address Based Alternative Network Interface Names&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #18: &lt;a href="https://mastodon.social/@pid_eins/113542791707767760"&gt;Conditional Copying/Symlinking in tmpfiles.d/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #19: &lt;a href="https://mastodon.social/@pid_eins/113548780685011324"&gt;Automatic Service Restarts in Debug Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #20: &lt;a href="https://mastodon.social/@pid_eins/113553862192857423"&gt;Filtering by Invocation ID in journalctl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #21: &lt;a href="https://mastodon.social/@pid_eins/113559499843745770"&gt;Supplement Partitions in repart.d/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #22: &lt;a href="https://mastodon.social/@pid_eins/113565367864400184"&gt;DeviceTree Matching in UKIs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #23: &lt;a href="https://mastodon.social/@pid_eins/113582241211076013"&gt;The New ssh-exec: Protocol in varlinkctl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #24: &lt;a href="https://mastodon.social/@pid_eins/113587886819830702"&gt;SecureBoot Key Enrollment Preparation with bootctl&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #25: &lt;a href="https://mastodon.social/@pid_eins/113593858094990730"&gt;Automatically Installing confext/sysext/portable/VMs/container Images at Boot&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #26: &lt;a href="https://mastodon.social/@pid_eins/113622180396380788"&gt;Designated Maintenance Time in systemd-logind&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #27: &lt;a href="https://mastodon.social/@pid_eins/113628059773250732"&gt;PID Namespacing in Service Management&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #28: &lt;a href="https://mastodon.social/@pid_eins/113633316699246556"&gt;Marking Experimental OS Releases in /etc/os-release&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #29: &lt;a href="https://mastodon.social/@pid_eins/113633418879095485"&gt;Decoding Capability Masks with systemd-analyze&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #30: &lt;a href="https://mastodon.social/@pid_eins/113639111278004977"&gt;Investigating Passed SMBIOS Type #11 Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #31: &lt;a href="https://mastodon.social/@pid_eins/113639171654338584"&gt;Initializing Partitions from Character Devices in repart.d/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #32: &lt;a href="https://mastodon.social/@pid_eins/113644653430055429"&gt;Entering Namespaces to Generate Stacktraces&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #33: &lt;a href="https://mastodon.social/@pid_eins/113644968673016589"&gt;ID Mapped Mounts for Per-Service Directories&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #34: &lt;a href="https://mastodon.social/@pid_eins/113661595029794773"&gt;A Daemon for systemd-sysupdate&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #35: &lt;a href="https://mastodon.social/@pid_eins/113661637019131657"&gt;User Record Modifications without Administrator Consent in systemd-homed&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #36: &lt;a href="https://mastodon.social/@pid_eins/113667227576493517"&gt;DNR DHCP Support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #37: &lt;a href="https://mastodon.social/@pid_eins/113667254835871626"&gt;Name Based AF_VSOCK ssh Access&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I intend to do a similar series of serieses of posts for the next systemd
release (v258), hence if you haven't left tech Twitter for Mastodon yet, now is
the opportunity.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 17 Dec 2024 00:00:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2024-12-17:/blog/announcing-systemd-v257.html</guid><category>projects</category></item><item><title>Announcing systemd v256</title><link>https://0pointer.net/blog/announcing-systemd-v256.html</link><description>&lt;p&gt;Yesterday evening we released systemd v256 into the wild. While other projects,
&lt;a href="https://www.mozilla.org/en-US/firefox/127.0/releasenotes/"&gt;such as Firefox&lt;/a&gt;
are just about to leave the 7bit world and enter 8bit territory, we already
entered 9bit version territory! For details about the release, &lt;a href="https://lists.freedesktop.org/archives/systemd-devel/2024-June/050407.html"&gt;see our
announcement
mail&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the weeks leading up to this release I have posted a series of serieses of
posts to Mastodon about key new features in this release. Mastodon has its
goods and its bads. Among the latter is probably that it isn't that great for
posting listings of serieses of posts. Hence let me provide you with a list of
the relevant first post in the series of posts here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Post #1: &lt;a href="https://mastodon.social/@pid_eins/112332457438509644"&gt;&lt;code&gt;.v/&lt;/code&gt; Directories&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #2: &lt;a href="https://mastodon.social/@pid_eins/112336318532407967"&gt;User-Scoped Encrypted Service Credentials&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #3: &lt;a href="https://mastodon.social/@pid_eins/112341584011845948"&gt;&lt;code&gt;X_SYSTEMD_UNIT_ACTIVE=&lt;/code&gt; &lt;code&gt;sd_notify()&lt;/code&gt; Messages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #4: &lt;a href="https://mastodon.social/@pid_eins/112347205079185896"&gt;System-wide &lt;code&gt;ProtectSystem=&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #5: &lt;a href="https://mastodon.social/@pid_eins/112353324518585654"&gt;&lt;code&gt;run0&lt;/code&gt; As &lt;code&gt;sudo&lt;/code&gt; Replacement&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #6: &lt;a href="https://mastodon.social/@pid_eins/112359214673482293"&gt;System Credentials&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #7: &lt;a href="https://mastodon.social/@pid_eins/112364314961758625"&gt;Unprivileged DDI Mounts + Unprivileged &lt;code&gt;systemd-nspawn&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #8: &lt;a href="https://mastodon.social/@pid_eins/112370336310304287"&gt;&lt;code&gt;ssh&lt;/code&gt; into &lt;code&gt;systemd-homed&lt;/code&gt; Accounts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #9: &lt;a href="https://mastodon.social/@pid_eins/112376110947253007"&gt;&lt;code&gt;systemd-vmspawn&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #10: &lt;a href="https://mastodon.social/@pid_eins/112393043769199622"&gt;Mutable &lt;code&gt;systemd-sysext&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #11: &lt;a href="https://mastodon.social/@pid_eins/112398647693125514"&gt;Network Device Ownership&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #12: &lt;a href="https://mastodon.social/@pid_eins/112404050701925757"&gt;&lt;code&gt;systemctl sleep&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #13: &lt;a href="https://mastodon.social/@pid_eins/112411213727666482"&gt;&lt;code&gt;systemd-ssh-generator&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #14: &lt;a href="https://mastodon.social/@pid_eins/112434053071743515"&gt;&lt;code&gt;systemd-cryptenroll&lt;/code&gt; without device argument&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #15: &lt;a href="https://mastodon.social/@pid_eins/112445409388762154"&gt;&lt;code&gt;dlopen()&lt;/code&gt; ELF Metadata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Post #16: &lt;a href="https://mastodon.social/@pid_eins/112597860823838594"&gt;&lt;code&gt;Capsules&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I intend to do a similar series of serieses of posts for the next systemd
release (v257), hence if you haven't left tech Twitter for Mastodon yet, now is
the opportunity.&lt;/p&gt;
&lt;p&gt;And while I have you: note that the &lt;a href="https://all-systems-go.io/"&gt;All Systems Go 2024 Conference
(Berlin)&lt;/a&gt; Call for Papers ends 😲 THIS WEEK 🤯!
Hence, HURRY, and &lt;a href="https://cfp.all-systems-go.io/all-systems-go-2024/cfp"&gt;get your submissions in
now&lt;/a&gt;, for the best
low-level Linux userspace conference around!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 12 Jun 2024 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2024-06-12:/blog/announcing-systemd-v256.html</guid><category>projects</category></item><item><title>A re-introduction to mkosi -- A Tool for Generating OS Images</title><link>https://0pointer.net/blog/a-re-introduction-to-mkosi-a-tool-for-generating-os-images.html</link><description>&lt;blockquote&gt;
&lt;p&gt;This is a guest post written by Daan De Meyer, systemd and mkosi
maintainer&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Almost 7 years ago, Lennart first
&lt;a href="https://0pointer.net/blog/mkosi-a-tool-for-generating-os-images.html"&gt;wrote&lt;/a&gt;
about &lt;code&gt;mkosi&lt;/code&gt; on this blog. Some years ago, I took over development and
there's been a huge amount of changes and improvements since then. So I
figure this is a good time to re-introduce &lt;code&gt;mkosi&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/systemd/mkosi"&gt;&lt;code&gt;mkosi&lt;/code&gt;&lt;/a&gt; stands for &lt;em&gt;Make Operating
System Image&lt;/em&gt;. It generates OS images that can be used for a variety of
purposes.&lt;/p&gt;
&lt;p&gt;If you prefer watching a video over reading a blog post, you can also
watch my &lt;a href="https://www.youtube.com/watch?v=6EelcbjbUa8"&gt;presentation&lt;/a&gt; on
&lt;code&gt;mkosi&lt;/code&gt; at All Systems Go 2023.&lt;/p&gt;
&lt;h2&gt;What is mkosi?&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;mkosi&lt;/code&gt; was originally written as a tool to simplify hacking on systemd
and for experimenting with images using many of the new concepts being
introduced in systemd at the time. In the meantime, it has evolved into
a general purpose image builder that can be used in a multitude of
scenarios.&lt;/p&gt;
&lt;p&gt;Instructions to install &lt;code&gt;mkosi&lt;/code&gt; can be found in its
&lt;a href="https://github.com/systemd/mkosi/blob/main/README.md"&gt;readme&lt;/a&gt;. We
recommend running the latest version to take advantage of all the latest
features and bug fixes. You'll also need &lt;code&gt;bubblewrap&lt;/code&gt; and the package
manager of your favorite distribution to get started.&lt;/p&gt;
&lt;p&gt;At its core, the workflow of &lt;code&gt;mkosi&lt;/code&gt; can be divided into 3 steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Generate an OS tree for some distribution by installing a set of
   packages.&lt;/li&gt;
&lt;li&gt;Package up that OS tree in a variety of output formats.&lt;/li&gt;
&lt;li&gt;(Optionally) Boot the resulting image in &lt;code&gt;qemu&lt;/code&gt; or &lt;code&gt;systemd-nspawn&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Images can be built for any of the following distributions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fedora Linux&lt;/li&gt;
&lt;li&gt;Ubuntu&lt;/li&gt;
&lt;li&gt;openSUSE&lt;/li&gt;
&lt;li&gt;Debian&lt;/li&gt;
&lt;li&gt;Arch Linux&lt;/li&gt;
&lt;li&gt;CentOS Stream&lt;/li&gt;
&lt;li&gt;RHEL&lt;/li&gt;
&lt;li&gt;Rocky Linux&lt;/li&gt;
&lt;li&gt;AlmaLinux&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And the following output formats are supported:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPT disk images built with &lt;code&gt;systemd-repart&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Tar archives&lt;/li&gt;
&lt;li&gt;CPIO archives (for building initramfs images)&lt;/li&gt;
&lt;li&gt;USIs (Unified System Images which are full OS images packed in a UKI)&lt;/li&gt;
&lt;li&gt;Sysext, confext and portable images&lt;/li&gt;
&lt;li&gt;Directory trees&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example, to build an Arch Linux GPT disk image and boot it in
&lt;code&gt;qemu&lt;/code&gt;, you can run the following command:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;mkosi&lt;span class="w"&gt; &lt;/span&gt;-d&lt;span class="w"&gt; &lt;/span&gt;arch&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;systemd&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;udev&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;linux&lt;span class="w"&gt; &lt;/span&gt;-t&lt;span class="w"&gt; &lt;/span&gt;disk&lt;span class="w"&gt; &lt;/span&gt;qemu
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To instead boot the image in systemd-nspawn, replace &lt;code&gt;qemu&lt;/code&gt; with &lt;code&gt;boot&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;mkosi&lt;span class="w"&gt; &lt;/span&gt;-d&lt;span class="w"&gt; &lt;/span&gt;arch&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;systemd&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;udev&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;linux&lt;span class="w"&gt; &lt;/span&gt;-t&lt;span class="w"&gt; &lt;/span&gt;disk&lt;span class="w"&gt; &lt;/span&gt;boot
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The resulting image is written to the current working directory as
&lt;code&gt;image.raw&lt;/code&gt;. However, we recommend using a separate output directory,
which is as simple as running &lt;code&gt;mkdir mkosi.output&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To rebuild the image after it's already been built once, add &lt;code&gt;-f&lt;/code&gt; to the
command line before the verb. Any arguments passed
after the verb are forwarded to either &lt;code&gt;systemd-nspawn&lt;/code&gt; or &lt;code&gt;qemu&lt;/code&gt;
itself. To build the image without booting it, pass &lt;code&gt;build&lt;/code&gt; instead of
&lt;code&gt;boot&lt;/code&gt; or &lt;code&gt;qemu&lt;/code&gt; or don't pass a verb at all.&lt;/p&gt;
&lt;p&gt;By default, the disk image will have an appropriately sized root
partition and an ESP partition, but the partition layout and contents
can be fully customized using &lt;code&gt;systemd-repart&lt;/code&gt; by creating partition
definition files in &lt;code&gt;mkosi.repart/&lt;/code&gt;. This allows you to customize the
partition layout as you see fit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The root partition can be encrypted.&lt;/li&gt;
&lt;li&gt;Partition sizes can be customized.&lt;/li&gt;
&lt;li&gt;Partitions can be protected with signed dm-verity.&lt;/li&gt;
&lt;li&gt;You can opt out of having a root partition and only have a /usr
  partition instead.&lt;/li&gt;
&lt;li&gt;You can add various other partitions, e.g. an XBOOTLDR partition or a
  swap partition.&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;/ul&gt;
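&lt;p&gt;As a rough sketch of what such partition definition files could look
like, the following shows two files that describe an ESP and a root
partition. The file names, the file system choice and the sizes are
assumptions picked for illustration; the settings themselves are regular
&lt;code&gt;systemd-repart&lt;/code&gt; partition settings, so consult its documentation
before relying on them:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkosi.repart/00-esp.conf (hypothetical file name)
[Partition]
Type=esp
Format=vfat
# Populate the ESP from the /efi directory of the OS tree
CopyFiles=/efi:/
SizeMinBytes=512M
SizeMaxBytes=512M

# mkosi.repart/10-root.conf (hypothetical file name)
[Partition]
Type=root
Format=ext4
# Populate the root partition from the whole OS tree
CopyFiles=/
# Let systemd-repart pick a small size that fits the contents
Minimize=guess
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
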
&lt;p&gt;As part of building the image, we'll run various tools such as
&lt;code&gt;systemd-sysusers&lt;/code&gt;, &lt;code&gt;systemd-firstboot&lt;/code&gt;, &lt;code&gt;depmod&lt;/code&gt;, &lt;code&gt;systemd-hwdb&lt;/code&gt; and
more to make sure the image is set up correctly.&lt;/p&gt;
&lt;h2&gt;Configuring mkosi image builds&lt;/h2&gt;
&lt;p&gt;Naturally with extended use you don't want to specify all settings on
the command line every time, so &lt;code&gt;mkosi&lt;/code&gt; supports configuration files
where the same settings that can be specified on the command line can be
written down.&lt;/p&gt;
&lt;p&gt;For example, the command we used above can be written down in a
configuration file &lt;code&gt;mkosi.conf&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[Distribution]&lt;/span&gt;
&lt;span class="na"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;arch&lt;/span&gt;

&lt;span class="k"&gt;[Output]&lt;/span&gt;
&lt;span class="na"&gt;Format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;disk&lt;/span&gt;

&lt;span class="k"&gt;[Content]&lt;/span&gt;
&lt;span class="na"&gt;Packages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="na"&gt;systemd&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="na"&gt;udev&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="na"&gt;linux&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Like systemd, &lt;code&gt;mkosi&lt;/code&gt; uses INI configuration files. We also support
dropins which can be placed in &lt;code&gt;mkosi.conf.d&lt;/code&gt;. Configuration files can
also be conditionalized using the &lt;code&gt;[Match]&lt;/code&gt; section. For example, to
only install a specific package on Arch Linux, you can write the
following to &lt;code&gt;mkosi.conf.d/10-arch.conf&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[Match]&lt;/span&gt;
&lt;span class="na"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;arch&lt;/span&gt;

&lt;span class="k"&gt;[Content]&lt;/span&gt;
&lt;span class="na"&gt;Packages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;pacman&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because not everything you need will be supported in &lt;code&gt;mkosi&lt;/code&gt;, we support
running scripts at various points during the image build process where
all extra image customization can be done. For example, if it is found,
&lt;code&gt;mkosi.postinst&lt;/code&gt; is called after packages have been installed. Scripts
are executed on the host system by default (in a sandbox), but can be
executed inside the image by suffixing the script with &lt;code&gt;.chroot&lt;/code&gt;, so if
&lt;code&gt;mkosi.postinst.chroot&lt;/code&gt; is found it will be executed inside the image.&lt;/p&gt;
&lt;p&gt;To add extra files to the image, you can place them in &lt;code&gt;mkosi.extra&lt;/code&gt; in
the source directory and they will be automatically copied into the
image after packages have been installed.&lt;/p&gt;
&lt;h2&gt;Bootable images&lt;/h2&gt;
&lt;p&gt;If the necessary packages are installed, &lt;code&gt;mkosi&lt;/code&gt; will automatically
generate a UEFI/BIOS bootable image. As &lt;code&gt;mkosi&lt;/code&gt; is a systemd project, it
will always build
&lt;a href="https://uapi-group.org/specifications/specs/unified_kernel_image/"&gt;UKIs&lt;/a&gt;
(Unified Kernel Images), except if the image is BIOS-only (since UKIs
cannot be used on BIOS). The initramfs is built like a regular image by
installing distribution packages and packaging them up in a CPIO archive
instead of a disk image. Specifically, we do not use &lt;code&gt;dracut&lt;/code&gt;,
&lt;code&gt;mkinitcpio&lt;/code&gt; or &lt;code&gt;initramfs-tools&lt;/code&gt; to generate the initramfs from the
host system. &lt;code&gt;ukify&lt;/code&gt; is used to assemble all the individual components
into a UKI.&lt;/p&gt;
&lt;p&gt;If you don't want &lt;code&gt;mkosi&lt;/code&gt; to generate a bootable image, you can set
&lt;code&gt;Bootable=no&lt;/code&gt; to explicitly disable this logic.&lt;/p&gt;
&lt;h2&gt;Using mkosi for development&lt;/h2&gt;
&lt;p&gt;The main requirement for using &lt;code&gt;mkosi&lt;/code&gt; for development is that we can
build our source code against the image we're building and install it
into the image we're building. &lt;code&gt;mkosi&lt;/code&gt; supports this via build scripts.
If a script named &lt;code&gt;mkosi.build&lt;/code&gt; (or &lt;code&gt;mkosi.build.chroot&lt;/code&gt;) is found,
we'll execute it as part of the build. Any files put by the build script
into &lt;code&gt;$DESTDIR&lt;/code&gt; will be installed into the image. Required build
dependencies can be installed using the &lt;code&gt;BuildPackages=&lt;/code&gt; setting. These
packages are installed into an overlay which is put on top of the image
when running the build script so the build packages are available when
running the build script but don't end up in the final image.&lt;/p&gt;
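&lt;p&gt;A minimal sketch of such a setting could look as follows. The package
names are assumptions for illustration only; they differ between
distributions (the build tool packages in particular are named
differently on some of them):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;[Content]
# Only available while the build script runs, not part of the final image
BuildPackages=
        gcc
        meson
        ninja
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
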
&lt;p&gt;An example &lt;code&gt;mkosi.build.chroot&lt;/code&gt; script for a project using &lt;code&gt;meson&lt;/code&gt; could
look as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="ch"&gt;#!/bin/bash&lt;/span&gt;
meson&lt;span class="w"&gt; &lt;/span&gt;setup&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nv"&gt;$BUILDDIR&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nv"&gt;$SRCDIR&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;
ninja&lt;span class="w"&gt; &lt;/span&gt;-C&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nv"&gt;$BUILDDIR&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;((&lt;/span&gt;WITH_TESTS&lt;span class="o"&gt;))&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;then&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;meson&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-C&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nv"&gt;$BUILDDIR&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
meson&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;-C&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nv"&gt;$BUILDDIR&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now, every time the image is built, the build script will be executed
and the results will be installed into the image.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;$BUILDDIR&lt;/code&gt; environment variable points to a directory that can be
used as the build directory for build artifacts to allow for incremental
builds if the build system supports it.&lt;/p&gt;
&lt;p&gt;Of course, downloading all packages from scratch every time and
re-installing them again every time the image is built is rather slow,
so &lt;code&gt;mkosi&lt;/code&gt; supports two modes of caching to speed things up.&lt;/p&gt;
&lt;p&gt;The first caching mode caches all downloaded packages so they don't have
to be downloaded again on subsequent builds. Enabling this is as simple
as running &lt;code&gt;mkdir mkosi.cache&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The second mode of caching caches the image after all packages have been
installed but before running the build script. On subsequent builds,
&lt;code&gt;mkosi&lt;/code&gt; will copy the cache instead of reinstalling all packages from
scratch. This mode can be enabled using the &lt;code&gt;Incremental=&lt;/code&gt; setting.
While there is some rudimentary cache invalidation, the cache can also
forcibly be rebuilt by specifying &lt;code&gt;-ff&lt;/code&gt; on the command line instead of
&lt;code&gt;-f&lt;/code&gt;.&lt;/p&gt;
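&lt;p&gt;As a sketch, enabling the second caching mode in &lt;code&gt;mkosi.conf&lt;/code&gt; could
look like this. The &lt;code&gt;[Host]&lt;/code&gt; section name is an assumption based on the
&lt;code&gt;mkosi&lt;/code&gt; version current at the time of writing; other versions may
expect the setting in a different section, so check the documentation of
the version you are running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;[Host]
# Cache the image after package installation, before the build script runs
Incremental=yes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
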
&lt;p&gt;Note that when running on a btrfs filesystem, &lt;code&gt;mkosi&lt;/code&gt; will automatically
use subvolumes for the cached images which can be snapshotted on
subsequent builds for even faster rebuilds. We'll also use reflinks to
do copy-on-write copies where possible.&lt;/p&gt;
&lt;p&gt;With this setup, by running &lt;code&gt;mkosi -f qemu&lt;/code&gt; in the systemd repository,
it takes about 40 seconds to go from a source code change to a root
shell in a virtual machine running the latest systemd with your change
applied. This makes it very easy to test changes to systemd in a safe
environment without risk of breaking your host system.&lt;/p&gt;
&lt;p&gt;Of course, while 40 seconds is not a very long time, it's still more
than we'd like, especially if all we're doing is modifying the kernel
command line. That's why we have the &lt;code&gt;KernelCommandLineExtra=&lt;/code&gt; option to
configure kernel command line options that are passed to the container
or virtual machine at runtime instead of being embedded into the image.
These extra kernel command line options are picked up when the image is
booted with qemu's direct kernel boot (using &lt;code&gt;-append&lt;/code&gt;), but also when
booting a disk image in UEFI mode (using SMBIOS). The same applies to
systemd credentials (using the &lt;code&gt;Credentials=&lt;/code&gt; setting). These settings
allow configuring the image without having to rebuild it, which means
that you only have to run &lt;code&gt;mkosi qemu&lt;/code&gt; or &lt;code&gt;mkosi boot&lt;/code&gt; again afterwards
to apply the new settings.&lt;/p&gt;
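&lt;p&gt;A sketch of how these two settings might be used together is shown
below. The concrete values are illustrative assumptions
(&lt;code&gt;systemd.log_level=debug&lt;/code&gt; raises systemd's log level,
&lt;code&gt;firstboot.locale&lt;/code&gt; is a credential understood by
&lt;code&gt;systemd-firstboot&lt;/code&gt;), and the section name may differ depending on
your &lt;code&gt;mkosi&lt;/code&gt; version:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;[Host]
# Passed at runtime, not embedded into the image
KernelCommandLineExtra=systemd.log_level=debug
# Credentials are given as name=value pairs
Credentials=firstboot.locale=C.UTF-8
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
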
&lt;h2&gt;Building images without root privileges and loop devices&lt;/h2&gt;
&lt;p&gt;By using &lt;code&gt;newuidmap&lt;/code&gt;/&lt;code&gt;newgidmap&lt;/code&gt; and &lt;code&gt;systemd-repart&lt;/code&gt;, &lt;code&gt;mkosi&lt;/code&gt; is able to
build images without needing root privileges. As long as proper subuid
and subgid mappings are set up for your user in &lt;code&gt;/etc/subuid&lt;/code&gt; and
&lt;code&gt;/etc/subgid&lt;/code&gt;, you can run &lt;code&gt;mkosi&lt;/code&gt; as your regular user without having
to switch to &lt;code&gt;root&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Note that as of the writing of this blog post this only applies to the
&lt;code&gt;build&lt;/code&gt; and &lt;code&gt;qemu&lt;/code&gt; verbs. Booting the image in a &lt;code&gt;systemd-nspawn&lt;/code&gt;
container with &lt;code&gt;mkosi boot&lt;/code&gt; still needs root privileges. We're hoping to
fix this in a future systemd release.&lt;/p&gt;
&lt;p&gt;Regardless of whether you're running &lt;code&gt;mkosi&lt;/code&gt; with root or without root,
almost every tool we execute is invoked in a sandbox to isolate as much
of the build process from the host as possible. For example, &lt;code&gt;/etc&lt;/code&gt; and
&lt;code&gt;/var&lt;/code&gt; from the host are not available in this sandbox, to avoid host
configuration inadvertently affecting the build.&lt;/p&gt;
&lt;p&gt;Because &lt;code&gt;systemd-repart&lt;/code&gt; can build disk images without loop devices,
&lt;code&gt;mkosi&lt;/code&gt; can run from almost any environment, including containers. All
that's needed is a UID range with 65536 UIDs available, either via
running as the root user or via &lt;code&gt;/etc/subuid&lt;/code&gt; and &lt;code&gt;newuidmap&lt;/code&gt;. In a
future systemd release, we're hoping to provide an alternative to
&lt;code&gt;newuidmap&lt;/code&gt; and &lt;code&gt;/etc/subuid&lt;/code&gt; to allow running &lt;code&gt;mkosi&lt;/code&gt; from all
containers, even those with only a single UID available.&lt;/p&gt;
&lt;h2&gt;Supporting older distributions&lt;/h2&gt;
&lt;p&gt;mkosi depends on very recent versions of various systemd tools (v254 or
newer). To support older distributions, we implemented so-called tools
trees. In short, &lt;code&gt;mkosi&lt;/code&gt; can first build a tools image for you that
contains all required tools to build the actual image. This can be
enabled by adding &lt;code&gt;ToolsTree=default&lt;/code&gt; to your mkosi configuration.
Building a tools image does not require a recent version of systemd.&lt;/p&gt;
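&lt;p&gt;In configuration file form this is a single line. The section name in
this sketch is an assumption that matches the &lt;code&gt;mkosi&lt;/code&gt; version current
at the time of writing and may differ in other versions:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;[Host]
# Build a default tools image first and run the required tools from it
ToolsTree=default
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
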
&lt;p&gt;In the systemd mkosi configuration, we automatically use a tools tree if
we detect your distribution does not have the minimum required systemd
version installed.&lt;/p&gt;
&lt;h2&gt;Configuring variants of the same image using profiles&lt;/h2&gt;
&lt;p&gt;Profiles can be defined in the &lt;code&gt;mkosi.profiles/&lt;/code&gt; directory. The profile
to use can be selected using the &lt;code&gt;Profile=&lt;/code&gt; setting (or &lt;code&gt;--profile=&lt;/code&gt;) on
the command line. A profile allows you to bundle various settings behind
a single recognizable name. Profiles can also be matched on if you want
to apply some settings only to a few profiles.&lt;/p&gt;
&lt;p&gt;For example, you could have a &lt;code&gt;bootable&lt;/code&gt; profile that sets
&lt;code&gt;Bootable=yes&lt;/code&gt;, adds the &lt;code&gt;linux&lt;/code&gt; and &lt;code&gt;systemd-boot&lt;/code&gt; packages and
configures &lt;code&gt;Format=disk&lt;/code&gt; to end up with a bootable disk image when
passing &lt;code&gt;--profile bootable&lt;/code&gt; on the command line.&lt;/p&gt;
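&lt;p&gt;Such a &lt;code&gt;bootable&lt;/code&gt; profile might be sketched as follows. The file
name &lt;code&gt;mkosi.profiles/bootable.conf&lt;/code&gt; is an assumption about how the
profile is laid out on disk, and the package names are the Arch Linux
ones used earlier in this post; adjust both to your setup:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkosi.profiles/bootable.conf (assumed location)
[Output]
Format=disk

[Content]
Bootable=yes
Packages=
        linux
        systemd-boot
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
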
&lt;h2&gt;Building system extension images&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://uapi-group.org/specifications/specs/extension_image/"&gt;System extension&lt;/a&gt;
images can, dynamically at runtime, extend the base system with an
overlay containing additional files.&lt;/p&gt;
&lt;p&gt;To build system extensions with &lt;code&gt;mkosi&lt;/code&gt;, we need a base image on top of
which we can build our extension.&lt;/p&gt;
&lt;p&gt;To keep things manageable, we'll make use of &lt;code&gt;mkosi&lt;/code&gt;'s support for
building multiple images so that we can build our base image and system
extension in one go.&lt;/p&gt;
&lt;p&gt;We start by creating a temporary directory with a base configuration
file &lt;code&gt;mkosi.conf&lt;/code&gt; with some shared settings:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[Output]&lt;/span&gt;
&lt;span class="na"&gt;OutputDirectory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;mkosi.output&lt;/span&gt;
&lt;span class="na"&gt;CacheDirectory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;mkosi.cache&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now let's continue with the base image definition by writing the
following to &lt;code&gt;mkosi.images/base/mkosi.conf&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[Output]&lt;/span&gt;
&lt;span class="na"&gt;Format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;directory&lt;/span&gt;

&lt;span class="k"&gt;[Content]&lt;/span&gt;
&lt;span class="na"&gt;CleanPackageMetadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;no&lt;/span&gt;
&lt;span class="na"&gt;Packages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;systemd&lt;/span&gt;
&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="na"&gt;udev&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We use the &lt;code&gt;directory&lt;/code&gt; output format here instead of the &lt;code&gt;disk&lt;/code&gt; output
format so that we can build our extension without needing root privileges.&lt;/p&gt;
&lt;p&gt;Now that we have our base image, we can define a sysext that builds on
top of it by writing the following to &lt;code&gt;mkosi.images/btrfs/mkosi.conf&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[Config]&lt;/span&gt;
&lt;span class="na"&gt;Dependencies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;

&lt;span class="k"&gt;[Output]&lt;/span&gt;
&lt;span class="na"&gt;Format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;sysext&lt;/span&gt;
&lt;span class="na"&gt;Overlay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;

&lt;span class="k"&gt;[Content]&lt;/span&gt;
&lt;span class="na"&gt;BaseTrees&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;%O/base&lt;/span&gt;
&lt;span class="na"&gt;Packages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;btrfs-progs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;BaseTrees=&lt;/code&gt; points to our base image and &lt;code&gt;Overlay=yes&lt;/code&gt; instructs mkosi
to only package the files added on top of the base tree.&lt;/p&gt;
&lt;p&gt;We can't sign the extension image without a key. We can generate one
by running &lt;code&gt;mkosi genkey&lt;/code&gt; which will generate files that are
automatically picked up when building the image.&lt;/p&gt;
&lt;p&gt;Finally, you can build the base image and the extension by running
&lt;code&gt;mkosi -f&lt;/code&gt;. You'll find &lt;code&gt;btrfs.raw&lt;/code&gt; in &lt;code&gt;mkosi.output&lt;/code&gt; which is the
extension image.&lt;/p&gt;
&lt;h2&gt;Various other interesting features&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;To sign any generated UKIs for secure boot, put your secure boot key
  and certificate in &lt;code&gt;mkosi.key&lt;/code&gt; and &lt;code&gt;mkosi.crt&lt;/code&gt; and enable the
  &lt;code&gt;SecureBoot=&lt;/code&gt; setting. You can also run &lt;code&gt;mkosi genkey&lt;/code&gt; to have &lt;code&gt;mkosi&lt;/code&gt;
  generate a key and certificate itself.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Ephemeral=&lt;/code&gt; setting can be enabled to boot the image in an
  ephemeral copy that is thrown away when the container or virtual
  machine exits.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ShimBootloader=&lt;/code&gt; and &lt;code&gt;BiosBootloader=&lt;/code&gt; settings are available to
  configure shim and grub installation if needed.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mkosi&lt;/code&gt; can boot directory trees in a virtual machine using &lt;code&gt;virtiofsd&lt;/code&gt;. This
  is very useful for quickly rebuilding an image and booting it as the
  image does not have to be packed up as a disk image.&lt;/li&gt;
&lt;li&gt;...&lt;/li&gt;
&lt;/ul&gt;
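&lt;p&gt;To illustrate two of the settings from the list above, a configuration
sketch could look like this. The section names are assumptions matching
the &lt;code&gt;mkosi&lt;/code&gt; version current at the time of writing, and
&lt;code&gt;SecureBoot=yes&lt;/code&gt; presumes that &lt;code&gt;mkosi.key&lt;/code&gt; and &lt;code&gt;mkosi.crt&lt;/code&gt; are
already in place:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;[Validation]
# Sign generated UKIs with mkosi.key / mkosi.crt
SecureBoot=yes

[Host]
# Boot a throwaway copy of the image
Ephemeral=yes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
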
&lt;p&gt;There are many more features that we won't go over in detail in this
blog post. Learn more about those by reading the
&lt;a href="https://github.com/systemd/mkosi/blob/main/mkosi/resources/mkosi.md"&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;I'll finish with a bunch of links to more information about &lt;code&gt;mkosi&lt;/code&gt; and
related tooling:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/systemd/mkosi"&gt;GitHub repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://fedoramagazine.org/create-images-directly-from-rhel-and-rhel-ubi-package-using-mkosi/"&gt;Building RHEL and RHEL UBI images with mkosi&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://media.ccc.de/v/all-systems-go-2023-191-systemd-repart-building-discoverable-disk-images"&gt;My presentation on systemd-repart at ASG 2023&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://matrix.to/#/#mkosi:matrix.org"&gt;mkosi's Matrix channel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://raw.githubusercontent.com/systemd/systemd/main/mkosi.conf"&gt;systemd's mkosi configuration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/systemd/systemd/tree/main/mkosi.conf.d"&gt;mkosi's mkosi configuration&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 10 Jan 2024 00:00:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2024-01-10:/blog/a-re-introduction-to-mkosi-a-tool-for-generating-os-images.html</guid><category>projects</category></item><item><title>ASG! 2023 CfP Closes Soon</title><link>https://0pointer.net/blog/asg-2023-cfp-closes-soon.html</link><description>&lt;p&gt;&lt;large&gt;&lt;b&gt;The All Systems Go! 2023 Call for Participation Closes in Three Days!&lt;/b&gt;&lt;/large&gt;&lt;/p&gt;
&lt;p&gt;The Call for Participation (CFP) for &lt;a href="https://all-systems-go.io/"&gt;All Systems Go!
2023&lt;/a&gt; will close &lt;em&gt;in three days&lt;/em&gt;, on the 7th of
July! We’d like to invite you to submit your proposals for
consideration to &lt;a href="https://cfp.all-systems-go.io/all-systems-go-2023/cfp"&gt;the CFP submission
site&lt;/a&gt; quickly!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://cfp.all-systems-go.io/media/all-systems-go-2023/img/ASG_social-avatar_Nl7AMBw.png" alt="ASG image" width="512" height="512"/&gt;&lt;/p&gt;
&lt;p&gt;All topics relevant to foundational open-source Linux technologies are
welcome; a list of topics we particularly encourage follows further below.&lt;/p&gt;
&lt;p&gt;The CFP will close on &lt;em&gt;July 7th, 2023&lt;/em&gt;. A response will be sent to all
submitters on or before July 14th, 2023. The conference takes place in
🗺️ Berlin, Germany 🇩🇪 on Sept. 13-14th.&lt;/p&gt;
&lt;p&gt;All Systems Go! 2023 is all about foundational open-source Linux
technologies. We are primarily looking for deeply technical talks by
and for developers, engineers and other technical roles.&lt;/p&gt;
&lt;p&gt;We focus on the userspace side of things, so while kernel topics are
welcome they must have clear, direct relevance to userspace. The
following is a non-comprehensive list of topics encouraged for 2023
submissions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Image-Based Linux 🖼️&lt;/li&gt;
&lt;li&gt;Secure and Measured Boot 📏&lt;/li&gt;
&lt;li&gt;TPM-Based Local/Remote Attestation, Encryption, Authentication 🔑&lt;/li&gt;
&lt;li&gt;Low-level container executors and infrastructure ⚙️&lt;/li&gt;
&lt;li&gt;IoT, embedded and server Linux infrastructure&lt;/li&gt;
&lt;li&gt;Reproducible builds 🔧&lt;/li&gt;
&lt;li&gt;Package management, OS, container 📦, image delivery and updating&lt;/li&gt;
&lt;li&gt;Building Linux devices and applications 🏗️&lt;/li&gt;
&lt;li&gt;Low-level desktop 💻 technologies&lt;/li&gt;
&lt;li&gt;Networking 🌐&lt;/li&gt;
&lt;li&gt;System and service management 🚀&lt;/li&gt;
&lt;li&gt;Tracing and performance measuring 🔍&lt;/li&gt;
&lt;li&gt;IPC and RPC systems 🦜&lt;/li&gt;
&lt;li&gt;Security 🔐 and Sandboxing 🏖️&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information please visit &lt;a href="https://all-systems-go.io/"&gt;our conference
website&lt;/a&gt;!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 04 Jul 2023 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2023-07-04:/blog/asg-2023-cfp-closes-soon.html</guid><category>projects</category></item><item><title>Linux Boot Partitions</title><link>https://0pointer.net/blog/linux-boot-partitions.html</link><description>&lt;h1&gt;💽 Linux Boot Partitions and How to Set Them Up 🚀&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;Let’s have a look at how traditional Linux distributions set up
&lt;code&gt;/boot/&lt;/code&gt; and the ESP, and how this could be improved.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Linux distributions have traditionally set up their “boot” file
systems in somewhat varying ways, but the most common choice has been
to have a separate partition mounted to
&lt;code&gt;/boot/&lt;/code&gt;. Usually the partition is formatted as a Linux file system
such as ext2/ext3/ext4. The partition contains the kernel images, the
initrd and various boot loader resources. Some distributions, like
Debian and Ubuntu, also store ancillary files associated with the
kernel here, such as &lt;code&gt;kconfig&lt;/code&gt; or &lt;code&gt;System.map&lt;/code&gt;. Such a traditional
boot partition is only defined within the context of the distribution,
and typically not immediately recognizable as such when looking just
at the partition table (i.e. it uses the generic Linux partition type
UUID).&lt;/p&gt;
&lt;p&gt;With the arrival of UEFI a new partition relevant for boot appeared,
the &lt;em&gt;EFI System Partition&lt;/em&gt; (ESP). This partition is defined by the
firmware environment, but typically accessed by Linux to install or
update boot loaders. The choice of file system is not up to Linux, but
effectively mandated by the UEFI specifications: vFAT. In theory it
could be formatted as other file systems too. However, this would
require the firmware to support file systems other than vFAT. This is
rare and firmware specific though, as vFAT is the only file system
mandated by the UEFI specification. In other words, vFAT is the only
file system which is guaranteed to be universally supported.&lt;/p&gt;
&lt;p&gt;There’s a major overlap in the type of data typically stored in
the ESP and in the traditional boot partition mentioned earlier: a
variety of boot loader resources as well as kernels/initrds.&lt;/p&gt;
&lt;p&gt;Unlike the traditional boot partition, the ESP is easily recognizable
in the partition table via its GPT partition type UUID. The ESP is
also a &lt;em&gt;shared resource&lt;/em&gt;: all OSes installed on the same disk will
share it and put their boot resources into it (as opposed to the
traditional boot partition, of which there is one per installed Linux
OS, and only that one will put resources there).&lt;/p&gt;
&lt;p&gt;To summarize, the most common setup on typical Linux distributions is
something like this:&lt;/p&gt;
&lt;table&gt;
  &lt;tr&gt;
   &lt;th&gt;Type&lt;/th&gt;
   &lt;th&gt;Linux Mount Point&lt;/th&gt;
   &lt;th&gt;File System Choice&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;Linux “Boot” Partition&lt;/td&gt;
   &lt;td&gt;&lt;code&gt;/boot/&lt;/code&gt;&lt;/td&gt;
   &lt;td&gt;Any Linux File System, typically ext2/ext3/ext4&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;ESP&lt;/td&gt;
   &lt;td&gt;&lt;code&gt;/boot/efi/&lt;/code&gt;&lt;/td&gt;
   &lt;td&gt;vFAT&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;As mentioned, not all distributions or local installations agree on
this. For example, it’s probably worth mentioning that some
distributions decided to put kernels onto the root file system of the
OS itself. For this setup to work the boot loader itself [sic!] must
implement a non-trivial part of the storage stack. This may have to
include RAID, storage drivers, networked storage, volume management,
disk encryption, and Linux file systems. Leaving aside the conceptual
argument that complex storage stacks don’t belong in boot loaders
there are very practical problems with this approach. Reimplementing
the Linux storage stack in all its combinations is a massive amount of
work. It took decades to implement what we have on Linux now, and it
will take a similar amount of work to catch up in the boot loader’s
reimplementation. Moreover, there’s a political complication: some
Linux file system communities made clear they have no interest in
supporting a second file system implementation that is not maintained
as part of the Linux kernel.&lt;/p&gt;
&lt;p&gt;What’s interesting is that the &lt;code&gt;/boot/efi/&lt;/code&gt; mount point is nested
below the &lt;code&gt;/boot/&lt;/code&gt; mount point. This effectively means that to access
the ESP the Boot partition must exist and be mounted first. A system
with just an ESP and without a Boot partition hence doesn’t fit well
into the current model. The Boot partition will also have to carry an
empty “efi” directory that can be used as the inner mount point, and
serves no other purpose.&lt;/p&gt;
&lt;p&gt;Given that the traditional boot partition and the ESP may carry
similar data (i.e. boot loader resources, kernels, initrds) one may
wonder why they are separate concepts. Historically, this was the
easiest way to make the pre-UEFI way of booting Linux systems
compatible with UEFI: conceptually, the ESP can be seen as just a
minor addition to the status quo ante that way. Today, primarily two
reasons remain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Some distributions see a benefit in support for complex Linux file
  system concepts such as hardlinks, symlinks, SELinux labels/extended
  attributes and so on when storing boot loader resources. – I
  personally believe that making use of features in the boot file
  systems that the firmware environment cannot really make sense of is
  very clearly not advisable. The UEFI file system APIs know no
  symlinks, and what is SELinux to UEFI anyway? Moreover, putting more
  than the absolute minimum of simple data files into such file
  systems immediately raises questions about how to authenticate them
  comprehensively (including all fancy metadata) cryptographically on
  use (see below).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;On real-life systems that ship with non-Linux OSes the ESP often
  comes pre-installed with a size too small to carry multiple Linux
  kernels and initrds. As growing the size of an existing ESP is
  problematic (for example, because there’s no space available
  immediately after the ESP, or because some low-quality firmware
  reacts badly to the ESP changing size) placing the kernel in a
  separate, secondary partition (i.e. the boot partition) circumvents
  these space issues.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;File System Choices&lt;/h2&gt;
&lt;p&gt;We already mentioned that the ESP effectively has to be vFAT, as that
is what UEFI (more or less) guarantees. The file system choice for the
boot partition is not quite as restricted, but using arbitrary Linux
file systems is not really an option either. The file system must be
accessible by both the boot loader and the Linux OS. Hence only file
systems that are available in both can be used. Note that such
secondary implementations of Linux file systems in the boot
environment – limited as they may be – are not typically welcomed
or supported by the maintainers of the canonical file system
implementation in the upstream Linux kernel. Modern file systems are
notoriously complicated and delicate and simply don’t belong in boot
loaders.&lt;/p&gt;
&lt;p&gt;In a trusted boot world, the two file systems for the ESP and the
&lt;code&gt;/boot/&lt;/code&gt; partition should be considered &lt;em&gt;untrusted&lt;/em&gt;: any code or
essential data read from them must be authenticated cryptographically
before use. And even more, the file system structures themselves are
also untrusted. The file system driver reading them must be careful
not to be exploitable by a rogue file system image. Effectively this
means a simple file system (for which a driver can be more easily
validated and reviewed) is generally a better choice than a complex
file system (Linux file system communities made it pretty clear that
robustness against rogue file system images is outside of their scope
and not what is being tested for.).&lt;/p&gt;
&lt;p&gt;Some approaches tried to address the fact that boot partitions are
untrusted territory by encrypting them via a mechanism compatible with
LUKS, and adding decryption capabilities to the boot loader so it can
access it. This misses the point though, as encryption does not imply
authentication, and only authentication is typically desired. The boot
loader and kernel code are typically Open Source anyway, and hence
there’s little value in attempting to keep secret what is already
public knowledge. Moreover, encryption implies the existence of an
encryption key. Physically typing in the decryption key on a keyboard
might still be acceptable on desktop systems with a single human user
in front, but outside of that scenario unlock via TPM, PKCS#11 or
network services is typically required. And even on the desktop FIDO2
unlocking is probably the future. Implementing all the technologies
these unlocking mechanisms require in the boot loader is not
realistic, unless the boot loader shall become a full OS on its own as
it would require subsystems for FIDO2, PKCS#11, USB, Bluetooth,
network, smart card access, and so on.&lt;/p&gt;
&lt;h2&gt;File System Access Patterns&lt;/h2&gt;
&lt;p&gt;Note that traditionally both mentioned partitions were read-only
during most parts of the boot. Only later, once the OS is up, write
access was required to implement OS or boot loader updates. In today’s
world things have become a bit more complicated. A modern OS might
want to require some limited write access already in the boot loader,
to implement boot counting/boot assessment/automatic fallback (e.g.,
if the same kernel fails to boot 3 times, automatically revert to
older kernel), or to maintain an early storage-based random seed. This
means that even though the file system is &lt;em&gt;mostly read-only,&lt;/em&gt; we need
limited write access after all.&lt;/p&gt;
&lt;p&gt;vFAT cannot compete with modern Linux file systems such as &lt;code&gt;btrfs&lt;/code&gt;
when it comes to data safety guarantees. It’s not a journaled file
system, and it uses neither CoW nor any form of checksumming. This means when
used for the system boot process we need to be particularly careful
when accessing it, and in particular when making changes to it (i.e.,
trying to keep changes local to single sectors). It is essential to
use write patterns that minimize the chance of file system
corruption. Checking the file system (“&lt;code&gt;fsck&lt;/code&gt;”) before modification
(and probably also reading) is important, as is ensuring the file
system is put into a “clean” state as quickly as possible after each
modification.&lt;/p&gt;
&lt;p&gt;Code quality of the firmware in typical systems is known to not always
be great. When relying on the file system driver included in the
firmware it’s hence a good idea to limit use to operations that have a
better chance to be correctly implemented. For example, when writing
from the UEFI environment it might be wise to avoid any operation that
requires allocation algorithms, but instead focus on access patterns
that only overwrite already written data, and do not require allocation
of new space for the data.&lt;/p&gt;
&lt;p&gt;Besides write access from the boot loader code (as described above)
these file systems will require write access from the OS, to
facilitate boot loader and kernel/initrd updates. These types of
accesses are generally not fully random accesses (i.e., never partial
file updates) but usually mean adding new files as a whole, and removing
old files as a whole. Existing files are typically not modified once
created, though they might be replaced wholly by newer versions.&lt;/p&gt;
&lt;h2&gt;Boot Loader Updates&lt;/h2&gt;
&lt;p&gt;Note that the update cycle frequencies for boot loaders and for
kernels/initrds are probably similar these days. While kernels are
still vastly more complex than boot loaders, security issues are
regularly found in both. In particular, boot loaders (through
“shim” and similar components) carry certificate/keyring and denylist
information, which typically requires frequent updates. Update cycles
hence have to be expected regularly.&lt;/p&gt;
&lt;h2&gt;Boot Partition Discovery&lt;/h2&gt;
&lt;p&gt;The traditional boot partition was not recognizable by looking just at
the partition table. On MBR systems it was directly referenced from
the boot sector of the disk, and on EFI systems from information
stored in the ESP. This is less than ideal since by losing this
entrypoint information the system becomes unbootable. It’s typically a
better, more robust idea to make boot partitions recognizable as such
in the partition table directly. This is done for the ESP via the GPT
partition type UUID. For traditional boot partitions this was not done
though.&lt;/p&gt;
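&lt;p&gt;To illustrate why the partition type matters for discovery, here is a minimal Python sketch that picks boot partitions out of a list of (device, type UUID) pairs. The three constants are the published GPT type UUIDs for the ESP, for the Extended Boot Loader (XBOOTLDR) partition from the Discoverable Partitions Specification discussed further below, and for generic Linux data. The function name and the idea of feeding it pre-parsed pairs are assumptions made for this example only. A traditional boot partition carries the generic type and hence ends up in the “unknown” bucket, indistinguishable from any other Linux file system.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# GPT partition type UUIDs, lower-case.
ESP_TYPE = "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"
XBOOTLDR_TYPE = "bc13c2ff-59e6-4262-a352-b275fd6f7172"
LINUX_GENERIC_TYPE = "0fc63daf-8483-4772-8e79-3d69d8477de4"


def find_boot_partitions(partitions):
    """Sort (device, type_uuid) pairs into discoverable boot partitions.

    Hypothetical helper: the pairs could come from any partition table
    reader. Everything that is neither ESP nor XBOOTLDR is reported as
    "unknown", since nothing in the table says whether it is a /boot/.
    """
    found = {"esp": [], "xbootldr": [], "unknown": []}
    for device, type_uuid in partitions:
        type_uuid = type_uuid.lower()  # UUIDs compare case-insensitively
        if type_uuid == ESP_TYPE:
            found["esp"].append(device)
        elif type_uuid == XBOOTLDR_TYPE:
            found["xbootldr"].append(device)
        else:
            found["unknown"].append(device)
    return found
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Calling &lt;code&gt;find_boot_partitions()&lt;/code&gt; with a partition of type &lt;code&gt;LINUX_GENERIC_TYPE&lt;/code&gt; shows the problem: the table alone gives no hint that it might be the boot partition.&lt;/p&gt;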
&lt;h2&gt;Current Situation Summary&lt;/h2&gt;
&lt;p&gt;Let’s try to summarize the above:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Currently, typical deployments use &lt;strong&gt;two distinct boot partitions&lt;/strong&gt;,
  often using two distinct file system implementations&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Firmware effectively dictates existence of the ESP, and the use of
  &lt;strong&gt;vFAT&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In userspace view: the ESP &lt;strong&gt;mount is nested&lt;/strong&gt; below the general
  Boot partition mount&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Resources stored in both partitions are primarily kernel/initrd, and
  boot loader resources&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The mandatory use of vFAT brings certain &lt;strong&gt;data safety challenges&lt;/strong&gt;,
  as does quality of firmware file system driver code&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;During boot limited write access&lt;/strong&gt; is needed, during OS runtime
  more comprehensive write access is needed (though still not fully
  random).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Less restricted but still &lt;strong&gt;limited write patterns from OS
  environment&lt;/strong&gt; (only full file additions/updates/removals, during
  OS/boot loader updates)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Boot loaders should not implement complex storage stacks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;ESP can be &lt;strong&gt;auto-discovered&lt;/strong&gt; from the partition table, traditional
  boot partition cannot.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;ESP and the traditional boot partition are protected
  cryptographically in neither structure nor contents. It is expected
  that loaded files are individually authenticated after being read.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The ESP is a &lt;strong&gt;shared resource&lt;/strong&gt; — the traditional boot partition a
  resource specific to each installed Linux OS on the same disk.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;How to Do it Better&lt;/h2&gt;
&lt;p&gt;Now that we have discussed many of the issues with the status quo ante, let’s see how we can do things better:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Two partitions for essentially the same data is a bad idea. Given
  they carry data very similar or identical in nature, the common case
  should be to have only one (but see below).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Two file system implementations are worse than one. Given that vFAT
  is more or less mandated by UEFI and the only format universally
  understood by all players, and thus has to be used anyway, it might
  as well be the only file system that is used.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Data safety is unnecessarily bad so far: both ESP and boot partition
  are continuously mounted from the OS, even though access is pretty
  restricted: outside of update cycles access is typically not
  required.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;All partitions should be auto-discoverable/self-descriptive&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The two partitions should not be exposed as nested mounts to userspace&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To be more specific, here’s what I think a better way to set this all up would look like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Whenever possible, only have &lt;strong&gt;one boot partition&lt;/strong&gt;, not two. On EFI
  systems, make it the ESP. On non-EFI systems use an XBOOTLDR
  partition instead (see below). Only have both in the case where a
  Linux OS is installed on a system that already contains an OS with
  an ESP that is too small to carry sufficient kernels/initrds. When a
  system contains an XBOOTLDR partition put kernels/initrds on that,
  otherwise on the ESP.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Instead of the vaguely defined, traditional Linux “boot” partition
  use the &lt;strong&gt;XBOOTLDR&lt;/strong&gt; partition type as defined by the &lt;a href="https://uapi-group.org/specifications/specs/discoverable_partitions_specification/"&gt;Discoverable
  Partitions
  Specification&lt;/a&gt;. This
  ensures the partition is discoverable, and can be automatically
  mounted by things like
  &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-gpt-auto-generator.html"&gt;&lt;code&gt;systemd-gpt-auto-generator&lt;/code&gt;&lt;/a&gt;. Use
  XBOOTLDR only if you have to, i.e., when dealing with systems that
  lack UEFI (and where the ESP hence has no value) or to address the
  mentioned size issues with the ESP. Note that unlike the traditional
  boot partition the XBOOTLDR partition is a shared resource, i.e.,
  shared between multiple parallel Linux OS installations on the same
  disk. Because of this it is typically wise to place a per-OS
  directory at the top of the XBOOTLDR file system to avoid conflicts.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use &lt;strong&gt;vFAT&lt;/strong&gt; for both partitions; it’s the only thing
  universally understood among relevant firmwares and Linux. It’s
  simple enough to be useful for untrusted storage. Or to say this
  differently: writing a file system driver that is not easily
  vulnerable to rogue disk images is much easier for vFAT than for
  let’s say btrfs. – But the choice of vFAT implies some care needs to
  be taken to address the data safety issues it brings, see below.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mount the two partitions via the “&lt;strong&gt;automount&lt;/strong&gt;”
  logic. For example, via systemd’s
  &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.automount.html"&gt;automount&lt;/a&gt;
  units, with a very short idle time-out (one second or so). This
  improves data safety immensely, as the file systems will remain
  mounted (and thus possibly in a “dirty” state) only for very short
  periods of time, when they are actually accessed – and all that
  while the fact that they are not mounted continuously is mostly not
  noticeable for applications as the file system paths remain
  continuously around. Given that the backing file system (vFAT) has
  poor data safety properties, it is essential to shorten the time the
  file system spends in an unclean state as much as possible. In fact, this is
  what the aforementioned &lt;code&gt;systemd-gpt-auto-generator&lt;/code&gt;
  logic actually does by default.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Whenever mounting one of the two partitions, do a file system check
  (&lt;strong&gt;fsck&lt;/strong&gt;; in fact this is also what
  &lt;code&gt;systemd-gpt-auto-generator&lt;/code&gt; does by default, hooked into
  the automount logic, to run on first access). This ensures that even
  if the file system is in an unclean state it is restored to be clean
  when needed, i.e., on first access.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Do not mount the two partitions &lt;strong&gt;nested&lt;/strong&gt;, i.e., no
  more &lt;code&gt;/boot/efi/&lt;/code&gt;. First of all, as mentioned above, it
  should be possible (and is desirable) to only have one of the
  two. Hence it is simply a bad idea to require the other as well,
  just to be able to mount it. More importantly though, by nesting
  them, automounting is complicated, as it is necessary to trigger the
  first automount to establish the second automount, which defeats the
  point of automounting them in the first place. Use the two distinct
  mount points &lt;code&gt;/efi/&lt;/code&gt; (for the ESP) and
  &lt;code&gt;/boot/&lt;/code&gt; (for XBOOTLDR) instead. You might have guessed,
  but that too is what &lt;code&gt;systemd-gpt-auto-generator&lt;/code&gt; does by
  default.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When making additions or updates to ESP/XBOOTLDR from the OS make
  sure to create a file and write it in full, then
  &lt;code&gt;syncfs()&lt;/code&gt; the whole file system, then rename to give it
  its final name, and &lt;code&gt;syncfs()&lt;/code&gt; again. Proceed similarly when
  removing files.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When writing from the boot loader environment/UEFI to ESP/XBOOTLDR,
  do not append to files or create new files. Instead overwrite
  already allocated file contents (for example to maintain a random
  seed file) or rename already allocated files to include information
  in the file name (and ideally do not increase the file name in
  length; for example to maintain boot counters).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Consider adopting
  &lt;a href="https://0pointer.net/blog/brave-new-trusted-boot-world.html"&gt;UKIs&lt;/a&gt;,
  which minimize the number of files that need to be updated on the
  ESP/XBOOTLDR during OS/kernel updates (ideally down to 1)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Consider adopting
  &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-boot.html"&gt;&lt;code&gt;systemd-boot&lt;/code&gt;&lt;/a&gt;,
  which minimizes the number of files that need to be updated on boot
  loader updates (ideally down to 1)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Consider removing any mention of ESP/XBOOTLDR from
  &lt;code&gt;/etc/fstab&lt;/code&gt;, and just let
  &lt;code&gt;systemd-gpt-auto-generator&lt;/code&gt; do its thing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Stop implementing file systems, complex storage, disk encryption, …
  in your boot loader.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
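&lt;p&gt;The OS-side update procedure from the list above (write the file in full, &lt;code&gt;syncfs()&lt;/code&gt;, rename, &lt;code&gt;syncfs()&lt;/code&gt; again) could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: it runs on Linux, it calls the C library’s &lt;code&gt;syncfs()&lt;/code&gt; through &lt;code&gt;ctypes&lt;/code&gt; because Python’s &lt;code&gt;os&lt;/code&gt; module has no wrapper for it, and the helper names and the &lt;code&gt;.tmp-&lt;/code&gt; prefix are invented for this example.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import ctypes
import os
import tempfile

# The C library's syncfs(2) wrapper; os.sync() would flush every file system.
_libc = ctypes.CDLL(None, use_errno=True)


def syncfs(path):
    """Flush the whole file system that contains `path` (Linux only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if _libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)


def install_file(data, final_path):
    """Write `data` under a temporary name, sync, rename, sync again."""
    directory = os.path.dirname(final_path) or "."
    # Hypothetical temporary naming scheme; it only has to live in the
    # same directory so that the rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        syncfs(directory)                # contents are on disk
        os.rename(tmp_path, final_path)  # give the file its final name
        syncfs(directory)                # the rename is on disk
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def remove_file(path):
    """Remove a file and flush the removal to disk."""
    directory = os.path.dirname(path) or "."
    os.unlink(path)
    syncfs(directory)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The point of the temporary name is that a half-written file never appears under the name the boot loader looks for. Replacing an existing file works the same way, as the rename swaps in the new version as a whole.&lt;/p&gt;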
&lt;p&gt;By implementing things this way, you gain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Simplicity&lt;/strong&gt;: only one file system implementation, typically only
  one partition and mount point&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Robust auto-discovery&lt;/strong&gt; of all partitions, no need to even
  configure &lt;code&gt;/etc/fstab&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data safety&lt;/strong&gt; guarantees as good as possible, given the
  circumstances&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To summarize this in a table:&lt;/p&gt;
&lt;table&gt;
  &lt;tr&gt;
   &lt;th&gt;Type&lt;/th&gt;
   &lt;th&gt;Linux Mount Point&lt;/th&gt;
   &lt;th&gt;File System Choice&lt;/th&gt;
   &lt;th&gt;Automount&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;ESP&lt;/td&gt;
   &lt;td&gt;&lt;code&gt;/efi/&lt;/code&gt;&lt;/td&gt;
   &lt;td&gt;vFAT&lt;/td&gt;
   &lt;td&gt;yes&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;XBOOTLDR&lt;/td&gt;
   &lt;td&gt;&lt;code&gt;/boot/&lt;/code&gt;&lt;/td&gt;
   &lt;td&gt;vFAT&lt;/td&gt;
   &lt;td&gt;yes&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
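&lt;p&gt;On systems where the auto-discovery logic is not used and the mount is configured by hand instead, the ESP row of the table might translate into a pair of unit files roughly like the following. This is a hedged sketch, not what the generator emits: the &lt;code&gt;What=&lt;/code&gt; device path and the mount options are placeholders to be adapted locally, and the file system check recommended above is not covered here.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# /etc/systemd/system/efi.mount
[Unit]
Description=EFI System Partition

[Mount]
# Placeholder device path, substitute the local ESP.
What=/dev/disk/by-partlabel/esp
Where=/efi
Type=vfat
Options=umask=0077

# /etc/systemd/system/efi.automount
[Unit]
Description=EFI System Partition Automount

[Automount]
Where=/efi
# Unmount again shortly after the last access.
TimeoutIdleSec=1s

[Install]
WantedBy=local-fs.target
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;An XBOOTLDR partition would get the equivalent &lt;code&gt;boot.mount&lt;/code&gt; and &lt;code&gt;boot.automount&lt;/code&gt; pair with &lt;code&gt;Where=/boot&lt;/code&gt;. Only the automount unit is enabled; the mount unit is then activated on first access.&lt;/p&gt;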

&lt;p&gt;A note regarding modern boot loaders that implement the &lt;a href="https://uapi-group.org/specifications/specs/boot_loader_specification/"&gt;Boot Loader
Specification&lt;/a&gt;:
both partitions are explicitly listed in the specification as sources
for both Type #1 and Type #2 boot menu entries. Hence, if you use such
a modern boot loader (e.g. systemd-boot) these two partitions are the
preferred location for boot loader resources, kernels and initrds
anyway.&lt;/p&gt;
&lt;h1&gt;Addendum: You got RAID?&lt;/h1&gt;
&lt;p&gt;You might wonder, what about RAID setups and the ESP? This comes up
regularly in discussions: how to set up the ESP so that (software)
RAID1 (mirroring) can be done on the ESP. Long story short: I’d
strongly advise against using RAID on the ESP. Firmware typically
doesn’t have native RAID support, and given that firmware and boot
loader can write to the file systems involved, any attempt to use
software RAID on them will mean that a boot cycle might corrupt the
RAID sync, and immediately require a re-synchronization after
boot. If RAID1 backing for the ESP is really necessary, the only way
to implement that safely would be to implement this as a driver for
UEFI – but that creates certain bootstrapping issues (i.e., where to
place the driver if not the ESP, a file system the driver is supposed
to be used for), and also reimplements a considerable component of the
OS storage stack in firmware mode, which seems problematic.&lt;/p&gt;
&lt;p&gt;So what to do instead? My recommendation would be to solve this via
userspace tooling. If redundant disk support shall be implemented for
the ESP, then create separate ESPs on all disks, and synchronize them
on the file system level instead of the block level. Or in other
words, the tools that install/update/manage kernels or boot loaders
should be taught to maintain multiple ESPs instead of one. Copy the
kernels/boot loader files to all of them, and remove them from all of
them. Under the assumption that the goal of RAID is a more reliable
system this should be the best way to achieve that, as it doesn’t
pretend the firmware could do things it actually cannot do. Moreover
it minimizes the complexity of the boot loader, shifting the syncing
logic to userspace, where it’s typically easier to get right.&lt;/p&gt;
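&lt;p&gt;As a rough illustration of such file-level synchronization, the following Python sketch mirrors one directory of boot files onto several mounted ESPs, copying new or changed files and removing stale ones. The function name, the flat directory layout and the &lt;code&gt;.tmp&lt;/code&gt; suffix are assumptions made for this example; a real tool would additionally apply the sync-then-rename write pattern recommended earlier.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import filecmp
import os
import shutil


def sync_to_all_esps(source_dir, esp_mount_points, subdir):
    """Mirror the regular files in source_dir into subdir on every ESP."""
    wanted = set(os.listdir(source_dir))
    for esp in esp_mount_points:
        target_dir = os.path.join(esp, subdir)
        os.makedirs(target_dir, exist_ok=True)
        # Copy files that are missing or differ in content.
        for name in sorted(wanted):
            src = os.path.join(source_dir, name)
            dst = os.path.join(target_dir, name)
            if not os.path.exists(dst) or not filecmp.cmp(src, dst, shallow=False):
                tmp = dst + ".tmp"  # hypothetical temporary name
                shutil.copyfile(src, tmp)
                os.replace(tmp, dst)
        # Remove files that are no longer part of the source set.
        for name in set(os.listdir(target_dir)) - wanted:
            os.unlink(os.path.join(target_dir, name))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because each ESP is an independent vFAT file system, a write by the firmware or boot loader to one of them cannot put the others out of sync at the block level. The next run of the tool simply brings the files back in line.&lt;/p&gt;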
&lt;h1&gt;Addendum: Networked Boot&lt;/h1&gt;
&lt;p&gt;The discussion above focuses on booting up from a local disk. When
thinking about networked boot I think two scenarios are particularly
relevant:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;PXE-style network booting. I think in this mode of operation the focus
   should be on directly booting a single UKI image instead of a boot
   loader. This sidesteps the whole issue of maintaining any boot
   partition at all, and simplifies the boot process greatly. In
   scenarios where this is not sufficient, and an interactive boot
   menu or other boot loader features are desired, it might be a good
   idea to take inspiration from the UKI concept, and build a single
   boot loader EFI binary (such as systemd-boot), and include the UKIs
   for the boot menu items and other resources inside it via PE
   sections. Or in other words, build a single boot loader binary that
   is “supercharged” and contains all auxiliary resources in its own
   PE sections. (Note: this does not exist, it’s an idea I intend to
   explore with systemd-boot). Benefit: a single file has to be
   downloaded via PXE/TFTP, not more. Disadvantage: unused resources
   are downloaded unnecessarily. Either way: in this context there is
   no local storage, and the ESP/XBOOTLDR discussion above is without
   relevance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Initrd-style network booting. In this scenario the boot loader and
   kernel/initrd (better: UKI) are available on a local disk. The
   initrd then configures the network and transitions to a network
   share or file system on a network block device for the root file
   system. In this case the discussion above applies, and in fact the
   ESP or XBOOTLDR partition would be the only partition available
   locally on disk.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And this is all I have for today.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 03 Nov 2022 00:00:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2022-11-03:/blog/linux-boot-partitions.html</guid><category>projects</category></item><item><title>Brave New Trusted Boot World</title><link>https://0pointer.net/blog/brave-new-trusted-boot-world.html</link><description>&lt;h1&gt;🔐 Brave New Trusted Boot World 🚀&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;This document looks at the boot process of general purpose Linux
distributions. It covers the status quo and how we envision Linux boot
to work in the future with a focus on robustness and simplicity.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This document will assume that the reader has comprehensive
familiarity with TPM 2.0 security chips and their capabilities (e.g.,
PCRs, measurements, SRK), boot loaders, the &lt;code&gt;shim&lt;/code&gt; binary, Linux,
initrds, UEFI Firmware, PE binaries, and SecureBoot.&lt;/p&gt;
&lt;h2&gt;Problem Description&lt;/h2&gt;
&lt;p&gt;Status quo ante of the boot logic on typical Linux distributions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Most popular Linux distributions generate &lt;code&gt;initrds&lt;/code&gt; locally, and
  they are unsigned, thus not protected through SecureBoot (since that
  would require local SecureBoot key enrollment, which is generally
  not done), nor TPM PCRs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Boot chain is typically Firmware →
  &lt;a href="https://github.com/rhboot/shim"&gt;&lt;code&gt;shim&lt;/code&gt;&lt;/a&gt; → &lt;code&gt;grub&lt;/code&gt; → Linux kernel →
  &lt;code&gt;initrd&lt;/code&gt; (&lt;code&gt;dracut&lt;/code&gt; or similar) → root file system&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Firmware’s UEFI SecureBoot protects shim, shim’s key management
  protects grub and kernel. No code signing protects initrd. initrd
  acquires the key for encrypted root fs from the user (or
  TPM/FIDO2/PKCS11).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;shim&lt;/code&gt;/&lt;code&gt;grub&lt;/code&gt;/kernel is measured into TPM PCR 4, among other stuff&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;EFI TPM event log reports measured data into TPM PCRs, and can be
  used to reconstruct and validate state of TPM PCRs from the used
  resources.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;No userspace components are typically measured, except for what IMA
  measures&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;New kernels require locally generating new boot loader scripts and
  generating a new initrd each time. OS updates thus mean fragile
  generation of multiple resources and copying multiple files into the
  boot partition.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Problems with the status quo ante:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;initrd typically unlocks root file system encryption, but is not
  protected &lt;em&gt;whatsoever&lt;/em&gt;, and trivial to attack and modify offline&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;OS updates are brittle: PCR values of grub are very hard to
  pre-calculate, as grub measures chosen control flow path, not just
  code images. PCR values vary wildly, and OS provided resources are
  not measured into separate PCRs. Grub’s PCR measurements might be
  useful up to a point to reason about the boot after the fact, for
  the most basic remote attestation purposes, but useless for
  calculating them ahead of time during the OS build process (which
  would be desirable to be able to bind secrets to future expected PCR
  state, for example to bind secrets to an OS in a way that they remain
  accessible even after that OS is updated).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Updates of a boot loader are not robust, require multi-file updates
  of ESP and boot partition, and regeneration of boot scripts&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;No rollback protection (no way to cryptographically invalidate
  access to TPM-bound secrets on OS updates)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Remote attestation of running software is needlessly complex since
  initrds are generated locally and thus basically are guaranteed to
  vary on each system.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Locking resources maintained by arbitrary user apps to TPM state
  (PCRs) is not realistic for general purpose systems, since PCRs will
  change on every OS update, and there’s no mechanism to re-enroll
  each such resource before every OS update, and remove the old
  enrollment after the update.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There is no concept to cryptographically invalidate/revoke secrets
  for an older OS version once updated to a new OS version. An
  attacker thus can always access the secrets generated on old OSes if
  they manage to exploit an old version of the OS — even if a newer
  version already has been deployed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Goals of the new design:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Provide a &lt;strong&gt;fully signed execution path&lt;/strong&gt; from firmware to
  userspace, no exceptions&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Provide a &lt;strong&gt;fully measured execution path&lt;/strong&gt; from firmware to
  userspace, no exceptions&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Separate out TPM PCR assignments&lt;/strong&gt;, by “owner” of measured
  resources, so that resources can be bound to them in a fine-grained
  fashion.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Allow &lt;strong&gt;easy pre-calculation of expected PCR values&lt;/strong&gt; based on
  booted kernel/initrd, configuration, local identity of the system&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Rollback protection&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Simple &amp;amp; robust updates: &lt;strong&gt;one updated file per concept&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Updates without requiring re-enrollment/local preparation&lt;/strong&gt; of the
  TPM-protected resources (no more “brittle” PCR hashes that must be
  propagated into every TPM-protected resource on each OS update)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;System ready for easy &lt;strong&gt;remote attestation&lt;/strong&gt;, to prove validity of
  booted OS, configuration and local identity&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ability to &lt;strong&gt;bind secrets to specific phases of the boot&lt;/strong&gt;, e.g. the
  root fs encryption key should be retrievable from the TPM only in
  the initrd, but not after the host transitioned into the root fs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reasonably &lt;strong&gt;secure, automatic, unattended unlocking&lt;/strong&gt; of disk
  encryption secrets should be possible.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;“Democratize” use of PCR policies by defining PCR register meanings,
  and making binding to them robust against updates, so that
  &lt;strong&gt;external projects&lt;/strong&gt; can safely and securely bind their own data to
  them (or use them for remote attestation) without risking breakage
  whenever the OS is updated.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Build around &lt;strong&gt;TPM 2.0&lt;/strong&gt; (with graceful fallback for TPM-less
  systems if desired, but TPM 1.2 support is out of scope)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Considered attack scenarios and considerations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Evil Maid: neither online nor offline (i.e. “at rest”), physical
  access to a storage device should enable an attacker to read the
  user’s plaintext data on disk (confidentiality); neither online nor
  offline, physical access to a storage device should allow undetected
  modification/backdooring of user data or OS (integrity), or
  exfiltration of secrets.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;TPMs are assumed to be reasonably “secure”, i.e. can securely
  store/encrypt secrets. Communication to TPM is not “secure” though
  and must be protected on the wire.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Similarly, the CPU is assumed to be reasonably “secure”&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SecureBoot is assumed to be reasonably “secure” to permit validated
  boot up to and including shim+boot loader+kernel (but see discussion
  below)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;All user data must be encrypted &lt;em&gt;and&lt;/em&gt; authenticated. All vendor and
  administrator data must be authenticated.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It is assumed all software involved regularly contains
  vulnerabilities and requires frequent updates to address them, plus
  regular revocation of old versions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It is further assumed that key material used for signing code by the
  OS vendor can reasonably be kept secure (via use of HSM, and
  similar, where secret key information never leaves the signing
  hardware) and does &lt;em&gt;not&lt;/em&gt; require frequent roll-over.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Proposed Construction&lt;/h2&gt;
&lt;p&gt;Central to the proposed design is the concept of a &lt;strong&gt;Unified Kernel
Image (UKI)&lt;/strong&gt;. These UKIs are the combination of a Linux kernel image,
an initrd, a UEFI boot stub program (and further resources, see
below) into one single UEFI PE file that can either be directly
invoked by the UEFI firmware (which is useful in particular in some
cloud/Confidential Computing environments) or through a boot loader
(which is generally useful to implement support for multiple kernel
versions, with interactive or automatic selection of image to boot
into, potentially with automatic fallback management to increase
robustness).&lt;/p&gt;
&lt;h2&gt;UKI Components&lt;/h2&gt;
&lt;p&gt;Specifically, UKIs typically consist of the following resources:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;A UEFI boot stub that is a small piece of code still running in
   UEFI mode and that transitions into the Linux kernel included in
   the UKI (e.g., as implemented in
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-stub.html"&gt;&lt;code&gt;sd-stub&lt;/code&gt;&lt;/a&gt;,
   see below)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Linux kernel to boot in the &lt;code&gt;.linux&lt;/code&gt; PE section&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The initrd that the kernel shall unpack and invoke in the
   &lt;code&gt;.initrd&lt;/code&gt; PE section&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A kernel command line string, in the &lt;code&gt;.cmdline&lt;/code&gt; PE
   section&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optionally, information describing the OS this kernel is intended
   for, in the &lt;code&gt;.osrel&lt;/code&gt; PE section (derived from
   &lt;code&gt;/etc/os-release&lt;/code&gt; of the booted OS). This is useful for
   presentation of the UKI in the boot loader menu, and ordering it
   against other entries, using the included version information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optionally, information describing kernel release information
   (i.e. &lt;code&gt;uname -r&lt;/code&gt; output) in the &lt;code&gt;.uname&lt;/code&gt; PE
   section. This is also useful for presentation of the UKI in the
   boot loader menu, and ordering it against other entries.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optionally, a boot splash to bring to screen before transitioning
   into the Linux kernel in the &lt;code&gt;.splash&lt;/code&gt; PE section&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optionally, a compiled Devicetree database file, for systems which
   need it, in the &lt;code&gt;.dtb&lt;/code&gt; PE section&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optionally, the public key in PEM format that matches the
   signatures of the &lt;code&gt;.pcrsig&lt;/code&gt; PE section (see below), in a
   &lt;code&gt;.pcrpkey&lt;/code&gt; PE section.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optionally, a JSON file encoding expected PCR 11 hash values seen
    from userspace once the UKI has booted up, along with signatures
    of these expected PCR 11 hash values, matching a specific public
    key in the &lt;code&gt;.pcrsig&lt;/code&gt; PE section. (Note: we use plural
    for “values” and “signatures” here, as this JSON file will
    typically carry a separate value and signature for each PCR bank
    for PCR 11, i.e. one pair of value and signature for the SHA1
    bank, and another pair for the SHA256 bank, and so on. This
    ensures when enrolling or unlocking a TPM-bound secret we’ll
    always have a signature around matching the banks available
    locally (after all, which banks the local hardware supports is up
    to the hardware). For the sake of simplifying this already overly
    complex topic, we’ll pretend in the rest of the text there was
    only one PCR signature per UKI we have to care about, even if this
    is not actually the case.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Given UKIs are regular UEFI PE files, they can thus be signed as one
for SecureBoot, protecting all of the individual resources listed
above at once, and their combination. Standard Linux tools such as
&lt;code&gt;sbsigntool&lt;/code&gt; and &lt;code&gt;pesign&lt;/code&gt; can be used to sign
UKI files.&lt;/p&gt;
&lt;p&gt;UKIs wrap all of the above data in a single file, hence all of the
above components can be updated in one go through single file atomic
updates, which is useful given that the primary expected storage place
for these UKIs is the EFI System Partition (ESP), which is a VFAT
file system, with its limited data safety guarantees.&lt;/p&gt;
&lt;p&gt;UKIs can be generated via a single, relatively simple objcopy
invocation, that glues the listed components together, generating one
PE binary that then can be signed for SecureBoot. (For details on
building these, see below.)&lt;/p&gt;
&lt;p&gt;Note that the primary location to place UKIs in is the EFI System
Partition (or an otherwise firmware accessible file system). This
typically means a VFAT file system of some form. Hence an effective
UKI size limit of 4GiB is in place, as that’s the largest file size a
FAT32 file system supports.&lt;/p&gt;
&lt;h2&gt;Basic UEFI Stub Execution Flow&lt;/h2&gt;
&lt;p&gt;The mentioned UEFI stub program will execute the following operations
in UEFI mode before transitioning into the Linux kernel that is
included in its &lt;code&gt;.linux&lt;/code&gt; PE section:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The PE sections listed are searched for in the invoked UKI the stub
   is part of, and superficially validated (i.e. general file format is
   in order).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;All PE sections listed above of the invoked UKI are measured into
   TPM PCR 11. This TPM PCR is expected to be all zeroes before the UKI
   initializes. Pre-calculation is thus very straight-forward if the
   resources included in the PE image are known. (Note: as a single
   exception the &lt;code&gt;.pcrsig&lt;/code&gt; PE section is excluded from this measurement,
   as it is supposed to carry the expected result of the measurement, and
   thus cannot also be input to it, see below for further details about
   this section.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the &lt;code&gt;.splash&lt;/code&gt; PE section is included in the UKI it is brought onto the screen&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the &lt;code&gt;.dtb&lt;/code&gt; PE section is included in the UKI it is activated
   using the Devicetree UEFI “fix-up” protocol&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If a command line was passed from the boot loader to the UKI
   executable it is discarded if SecureBoot is enabled and the command
   line from the &lt;code&gt;.cmdline&lt;/code&gt; section is used. If SecureBoot is disabled and a
   command line was passed it is used in place of the one from
   &lt;code&gt;.cmdline&lt;/code&gt;. Either way the used command line is measured into TPM
   PCR 12. (This of course removes any flexibility of control of the
   kernel command line of the local user. In many scenarios this is
   probably considered beneficial, but in others it is not, and some
   flexibility might be desired. Thus, this concept probably needs to
   be extended sooner or later, to allow more flexible kernel command
   line policies to be enforced via definitions embedded into the
   UKI. For example: allowing definition of multiple kernel command
   lines the user/boot menu can select one from; allowing additional
   allowlisted parameters to be specified; or even optionally allowing
   any verification of the kernel command line to be turned off even
   in SecureBoot mode. It would then be up to the builder of the UKI
   to decide on the policy of the kernel command line.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It will set a couple of volatile EFI variables to inform userspace
   about executed TPM PCR measurements (and which PCR registers were
   used), and other execution properties. (For example: the EFI
   variable &lt;code&gt;StubPcrKernelImage&lt;/code&gt; in the
   &lt;code&gt;4a67b082-0a4c-41cf-b6c7-440b29bb8c4f&lt;/code&gt; vendor namespace indicates
   the PCR register used for the UKI measurement, i.e. the value
   “11”).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An initrd cpio archive is dynamically synthesized from the
   &lt;code&gt;.pcrsig&lt;/code&gt; and &lt;code&gt;.pcrpkey&lt;/code&gt; PE section data (this is later passed to
   the invoked Linux kernel as additional initrd, to be overlaid with
   the main initrd from the &lt;code&gt;.initrd&lt;/code&gt; section). These files are later
   available in the &lt;code&gt;/.extra/&lt;/code&gt; directory in the initrd context.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Linux kernel from the &lt;code&gt;.linux&lt;/code&gt; PE section is invoked with
   a combined initrd that is composed from the blob from the &lt;code&gt;.initrd&lt;/code&gt;
   PE section, the dynamically generated initrd containing the
   &lt;code&gt;.pcrsig&lt;/code&gt; and &lt;code&gt;.pcrpkey&lt;/code&gt; PE sections, and possibly some additional
   components like sysexts or syscfgs.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
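&lt;p&gt;The command line selection described in step 5 can be sketched roughly as follows. This is only an illustrative Python model of the rule, not the actual stub code; the function name &lt;code&gt;select_cmdline&lt;/code&gt; and its parameters are hypothetical.&lt;/p&gt;

```python
def select_cmdline(secure_boot_enabled, passed_cmdline, embedded_cmdline):
    """Return the kernel command line to use (hypothetical helper).

    passed_cmdline is what the boot loader handed to the UKI (or None),
    embedded_cmdline is the content of the .cmdline PE section (or None).
    """
    if secure_boot_enabled:
        # With SecureBoot on, anything passed in is discarded.
        return embedded_cmdline
    if passed_cmdline is not None:
        # With SecureBoot off, a passed command line takes precedence.
        return passed_cmdline
    return embedded_cmdline


# Whatever is returned is what would then be measured into PCR 12.
print(select_cmdline(True, "init=/bin/sh", "root=LABEL=root quiet"))
print(select_cmdline(False, "init=/bin/sh", "root=LABEL=root quiet"))
```

&lt;p&gt;Keeping the decision in one small function mirrors the idea that the policy is fixed by the builder of the UKI rather than by the local user.&lt;/p&gt;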
&lt;h2&gt;TPM PCR Assignments&lt;/h2&gt;
&lt;p&gt;In the construction above we take possession of two PCR registers
previously unused on generic Linux distributions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;TPM &lt;strong&gt;PCR 11&lt;/strong&gt; shall contain measurements of all components of the
  UKI (with exception of the &lt;code&gt;.pcrsig&lt;/code&gt; PE section, see above). This
  PCR will also contain measurements of the boot phase once userspace
  takes over (see below).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;TPM &lt;strong&gt;PCR 12&lt;/strong&gt; shall contain measurements of the used kernel command
  line. (Plus potentially other forms of
  parameterization/configuration passed into the UKI, not discussed in
  this document)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On top of that we intend to define two more PCR registers like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;TPM &lt;strong&gt;PCR 15&lt;/strong&gt; shall contain measurements of the volume encryption
  key of the root file system of the OS.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[TPM &lt;strong&gt;PCR 13&lt;/strong&gt; shall contain measurements of additional extension
  images for the initrd, to enable a modularized initrd – not covered
  by this document]&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(See the &lt;a href="https://uapi-group.org/specifications/specs/linux_tpm_pcr_registry/"&gt;Linux TPM PCR
Registry&lt;/a&gt;
for an overview of how these four PCRs fit into the list of Linux PCR
assignments.)&lt;/p&gt;
&lt;p&gt;For all four PCRs the assumption is that they are zero before the UKI
initializes, and only the data that the UKI and the OS measure into
them is included. This makes pre-calculating them straightforward:
given a specific set of UKI components, it is immediately clear what
PCR values can be expected in PCR 11 once the UKI booted up. Given a
kernel command line (and other parameterization/configuration) it is
clear what PCR values are expected in PCR 12.&lt;/p&gt;
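&lt;p&gt;To illustrate why a PCR that starts out as zero is easy to pre-calculate, here is a minimal Python sketch of the TPM “extend” operation replayed over a list of measurements. The section contents are placeholders, and the assumption that each section contributes its name followed by its payload is made for this sketch only; the real measurement format is defined by the stub implementation.&lt;/p&gt;

```python
import hashlib


def extend(pcr, data, bank="sha256"):
    """TPM-style extend: new PCR = H(old PCR || H(data))."""
    digest = hashlib.new(bank, data).digest()
    return hashlib.new(bank, pcr + digest).digest()


def precalculate_pcr(measurements, bank="sha256"):
    """Replay a list of measured byte strings, starting from an all-zero PCR."""
    pcr = bytes(hashlib.new(bank).digest_size)
    for data in measurements:
        pcr = extend(pcr, data, bank)
    return pcr


# Placeholder contents; a real build would read the actual PE section data.
uki_sections = [
    (".linux", b"kernel image bytes"),
    (".initrd", b"initrd bytes"),
    (".cmdline", b"root=LABEL=root quiet"),
]

# Assumption for this sketch: name and payload of each section are
# measured one after the other.
measurements = []
for name, payload in uki_sections:
    measurements.append(name.encode() + b"\0")
    measurements.append(payload)

# One expected value per PCR bank.
for bank in ("sha1", "sha256"):
    print(bank, precalculate_pcr(measurements, bank).hex())
```

&lt;p&gt;Because the result depends only on the measured data and their order, the same calculation can run at build time, without access to the target machine.&lt;/p&gt;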
&lt;p&gt;Note that these four PCRs are defined by the conceptual “owner” of the
resources measured into them. PCR 11 only contains resources the &lt;strong&gt;OS
vendor&lt;/strong&gt; controls. Thus it is straight-forward for the OS vendor to
pre-calculate and then cryptographically sign the expected values for
PCR 11. The PCR 11 values will be identical on all systems that run
the same version of the UKI. PCR 12 only contains resources the
&lt;strong&gt;administrator&lt;/strong&gt; controls, thus the administrator can pre-calculate
PCR values, and they will be correct on all instances of the OS that
use the same parameters/configuration. PCR 15 only contains resources
inherently local to the &lt;strong&gt;local system&lt;/strong&gt;, i.e. the cryptographic key
material that encrypts the root file system of the OS.&lt;/p&gt;
&lt;p&gt;Separating out these three roles does not imply these actually need to
be separate when used. However the assumption is that in many popular
environments these three roles should be separate.&lt;/p&gt;
&lt;p&gt;By separating out these PCRs by the owner’s role, it becomes
straightforward to remotely attest, individually, on the software that
runs on a node (PCR 11), the configuration it uses (PCR 12) or the
identity of the system (PCR 15). Moreover, it becomes straightforward
to robustly and securely encrypt data so that it can only be unlocked
on a specific set of systems that share the same OS, or the same
configuration, or have a specific identity – or a combination thereof.&lt;/p&gt;
&lt;p&gt;Note that the mentioned PCRs are so far not typically used on generic
Linux-based operating systems, to our knowledge. Windows uses them,
but given that Windows and Linux should typically not be included in
the same boot process this should be unproblematic, as Windows’ use of
these PCRs should thus not conflict with ours.&lt;/p&gt;
&lt;p&gt;To summarize:&lt;/p&gt;
&lt;table&gt;
  &lt;tr&gt;
   &lt;th&gt;PCR&lt;/th&gt;
   &lt;th&gt;Purpose&lt;/th&gt;
   &lt;th&gt;Owner&lt;/th&gt;
   &lt;th&gt;Expected Value before UKI boot&lt;/th&gt;
   &lt;th&gt;Pre-Calculable&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;strong&gt;11&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;Measurement of &lt;strong&gt;UKI components&lt;/strong&gt; and &lt;strong&gt;boot phases&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;OS Vendor&lt;/td&gt;
   &lt;td&gt;Zero&lt;/td&gt;
   &lt;td&gt;Yes&lt;br/&gt;(at UKI build time)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;strong&gt;12&lt;/strong&gt;&lt;/td&gt;
   &lt;td&gt;Measurement of &lt;strong&gt;kernel command line,&lt;/strong&gt; additional &lt;strong&gt;kernel runtime configuration&lt;/strong&gt; such as systemd credentials, systemd syscfg images&lt;/td&gt;
   &lt;td&gt;Administrator&lt;/td&gt;
   &lt;td&gt;Zero&lt;/td&gt;
   &lt;td&gt;Yes&lt;br/&gt;(when system configuration is assembled)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;strong&gt;13&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;&lt;strong&gt;System Extension Images&lt;/strong&gt; of initrd&lt;br/&gt;(and possibly more)&lt;/td&gt;
   &lt;td&gt;(Administrator)&lt;/td&gt;
   &lt;td&gt;Zero&lt;/td&gt;
   &lt;td&gt;Yes&lt;br/&gt;(when set of extensions is assembled)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
   &lt;td&gt;&lt;strong&gt;15&lt;/strong&gt;
   &lt;/td&gt;
   &lt;td&gt;Measurement of &lt;strong&gt;root file system volume key&lt;/strong&gt;&lt;br/&gt;(Possibly later more: measurement of root file system UUIDs and labels and of the machine ID &lt;code&gt;/etc/machine-id&lt;/code&gt;)&lt;/td&gt;
   &lt;td&gt;Local System&lt;/td&gt;
   &lt;td&gt;Zero&lt;/td&gt;
   &lt;td&gt;Yes&lt;br/&gt;(after first boot once all such IDs are determined)&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;h2&gt;Signature Keys&lt;/h2&gt;
&lt;p&gt;In the model above in particular two sets of private/public key pairs
are relevant:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The SecureBoot key to sign the UKI PE executable with. This controls
  permissible choices of OS/kernel&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The key to sign the expected PCR 11 values with. Signatures made
  with this key will end up in the &lt;code&gt;.pcrsig&lt;/code&gt; PE section. The public
  key part will end up in the &lt;code&gt;.pcrpkey&lt;/code&gt; PE section.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Typically the key pair for the PCR 11 signatures should be chosen with
a narrow focus, reused for exactly one specific OS (e.g. “Fedora
Desktop Edition”) and the series of UKIs that belong to it (all the
way through all the versions of the OS). The SecureBoot signature key
can be used with a broader focus, if desired. By keeping the PCR 11
signature key narrow in focus one can ensure that secrets bound to the
signature key can only be unlocked on the narrow set of UKIs desired.&lt;/p&gt;
&lt;h2&gt;TPM Policy Use&lt;/h2&gt;
&lt;p&gt;Depending on the intended access policy to a resource protected by the
TPM, one or more of the PCRs described above should be selected to
bind TPM policy to.&lt;/p&gt;
&lt;p&gt;For example, the root file system encryption key should likely be
bound to TPM PCR 11, so that it can only be unlocked if a specific set
of UKIs is booted (it should then, once acquired, be measured into PCR
15, as discussed above, so that later TPM objects can be bound to it,
further down the chain). With the model described above this is
reasonably straight-forward to do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;When userspace wants to bind disk encryption to a specific series of
  UKIs (“&lt;strong&gt;enrollment&lt;/strong&gt;”), it looks for the public key passed to the
  &lt;code&gt;initrd&lt;/code&gt; in the &lt;code&gt;/.extra/&lt;/code&gt; directory (which as discussed above
  originates in the &lt;code&gt;.pcrpkey&lt;/code&gt; PE section of the UKI). The relevant
  userspace component (e.g. &lt;code&gt;systemd&lt;/code&gt;) is then responsible for
  generating a random key to be used as symmetric encryption key for
  the storage volume (let’s call it &lt;em&gt;disk encryption key&lt;/em&gt; here,
  &lt;em&gt;DEK&lt;/em&gt;). The TPM is then used to encrypt (“seal”) the DEK with its
  internal Storage Root Key (TPM SRK). A TPM2 policy is bound to the
  encrypted DEK. The policy enforces that the DEK may only be
  decrypted if a valid signature is provided that matches the state of
  PCR 11 and the public key provided in the &lt;code&gt;/.extra/&lt;/code&gt; directory of
  the &lt;code&gt;initrd&lt;/code&gt;. The plaintext DEK key is passed to the kernel to
  implement disk encryption (e.g. LUKS/dm-crypt). (Alternatively,
  hardware disk encryption can be used too, i.e. Intel MKTME, AMD SME
  or even OPAL, all of which are outside of the scope of this
  document.) The TPM-encrypted version of the DEK which the TPM
  returned is written to the encrypted volume’s superblock.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When userspace wants to &lt;strong&gt;unlock&lt;/strong&gt; disk encryption on a specific
  UKI, it looks for the signature data passed to the initrd in the
  &lt;code&gt;/.extra/&lt;/code&gt; directory (which as discussed above originates in the
  &lt;code&gt;.pcrsig&lt;/code&gt; PE section of the UKI). It then reads the encrypted
  version of the DEK from the superblock of the encrypted volume. The
  signature and the encrypted DEK are then passed to the TPM. The TPM
  then checks if the current PCR 11 state matches the supplied
  signature from the &lt;code&gt;.pcrsig&lt;/code&gt; section and the public key used during
  enrollment. If all checks out it decrypts (“unseals”) the DEK and
  passes it back to the OS, where it is then passed to the kernel
  which implements the symmetric part of disk encryption.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that in this scheme the encrypted volume’s DEK is &lt;strong&gt;not&lt;/strong&gt; bound
to specific literal PCR hash values, but to a public key which is
expected to sign PCR hash values.&lt;/p&gt;
&lt;p&gt;Also note that the state of PCR 11 only matters during unlocking. It
is not used or checked when enrolling.&lt;/p&gt;
&lt;p&gt;In this scenario:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Input to the TPM part of the &lt;strong&gt;enrollment&lt;/strong&gt; process are the TPM’s
  internal SRK, the plaintext DEK provided by the OS, and the public
  key later used for signing expected PCR values, also provided by the
  OS. – Output is the encrypted (“sealed”) DEK.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Input to the TPM part of the &lt;strong&gt;unlocking&lt;/strong&gt; process are the TPM’s
  internal SRK, the current TPM PCR 11 values, the public key used
  during enrollment, a signature that matches both these PCR values
  and the public key, and the encrypted DEK. – Output is the plaintext
  (“unsealed”) DEK.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
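&lt;p&gt;The enrollment and unlocking inputs listed above can be modelled with a small toy in Python. Everything here is an assumption made for illustration: &lt;code&gt;ToyTPM&lt;/code&gt; is a hypothetical class, the signature scheme is textbook RSA with small fixed primes (insecure), and the DEK is wrapped with a simple XOR keystream. A real TPM enforces this through its own policy machinery, and none of this reflects an actual TPM API. The point of the sketch is only that sealing takes a public key, while unsealing needs a signature over the current PCR 11 state.&lt;/p&gt;

```python
import hashlib
import secrets

# Toy textbook RSA with small fixed primes: insecure, for illustration only.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # vendor private exponent (Python 3.8+)
VENDOR_PUBKEY = (N, E)


def _hash_to_int(data, modulus):
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def vendor_sign(expected_pcr11):
    """Done at UKI build time; the result would live in the .pcrsig section."""
    return pow(_hash_to_int(expected_pcr11, N), D, N)


class ToyTPM:
    def __init__(self):
        self._srk = secrets.token_bytes(32)   # never leaves the "chip"
        self.pcr11 = bytes(32)

    def _keystream(self, pubkey):
        # The wrapping key depends on the SRK and on the enrolled public key.
        return hashlib.sha256(self._srk + repr(pubkey).encode()).digest()

    def seal(self, dek, pubkey):
        """Enrollment: inputs are the DEK and a public key, not PCR values."""
        stream = self._keystream(pubkey)
        return bytes(a ^ b for a, b in zip(dek, stream))

    def unseal(self, sealed, pubkey, signature):
        """Unlocking: the signature must cover the current PCR 11 state."""
        n, e = pubkey
        if pow(signature, e, n) != _hash_to_int(self.pcr11, n):
            raise PermissionError("signature does not match current PCR 11")
        stream = self._keystream(pubkey)
        return bytes(a ^ b for a, b in zip(sealed, stream))


tpm = ToyTPM()
dek = secrets.token_bytes(32)
sealed = tpm.seal(dek, VENDOR_PUBKEY)

# Two UKI versions with different (placeholder) expected PCR 11 values.
pcr_v1 = hashlib.sha256(b"UKI 5.1").digest()
pcr_v2 = hashlib.sha256(b"UKI 5.2").digest()
sig_v1, sig_v2 = vendor_sign(pcr_v1), vendor_sign(pcr_v2)

tpm.pcr11 = pcr_v1
print(tpm.unseal(sealed, VENDOR_PUBKEY, sig_v1) == dek)
tpm.pcr11 = pcr_v2   # after an OS update: same sealed blob, new signature
print(tpm.unseal(sealed, VENDOR_PUBKEY, sig_v2) == dek)
```

&lt;p&gt;Note that the sealed blob is created once and never rewritten: an updated UKI only has to ship a fresh signature for its own expected PCR 11 value.&lt;/p&gt;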
&lt;p&gt;Note that sealing/unsealing is done entirely on the TPM chip, the host
OS just provides the inputs (well, only the inputs that the TPM chip
doesn’t know already on its own), and receives the outputs. With the
exception of the plaintext DEK, none of the inputs/outputs are
sensitive, and can safely be stored in the open. On the wire the
plaintext DEK is protected via TPM parameter encryption (not discussed
in detail here because, though important, it is not in scope for this
document).&lt;/p&gt;
&lt;p&gt;TPM PCR 11 is the most important of the mentioned PCRs, and its use is
thus explained in detail here. The other mentioned PCRs can be used in
similar ways, but signatures/public keys must be provided via other
means.&lt;/p&gt;
&lt;p&gt;This scheme builds on the functionality that Linux’ LUKS2
provides, i.e. key management supporting multiple slots, and the
ability to embed arbitrary metadata in the encrypted volume’s
superblock. Note that this means the TPM2-based logic explained here
doesn’t have to be the only way to unlock an encrypted volume. For
example, in many setups it is wise to enroll both this TPM-based
mechanism and an additional “&lt;em&gt;recovery key&lt;/em&gt;” (i.e. a high-entropy
computer generated passphrase the user can provide manually in case
they lose access to the TPM and need to access their data), of which
either can be used to unlock the volume.&lt;/p&gt;
&lt;h2&gt;Boot Phases&lt;/h2&gt;
&lt;p&gt;Secrets needed during boot-up (such as the root file system encryption
key) should typically not be accessible anymore afterwards, to protect
them from access if a system is attacked during runtime. To implement
this the scheme above is extended in one way: at certain milestones of
the boot process additional fixed “words” should be measured into PCR
11. These milestones are placed at conceptual security boundaries,
i.e. whenever code transitions from a higher privileged context to a
less privileged context.&lt;/p&gt;
&lt;p&gt;Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;When the initrd initializes (“&lt;code&gt;initrd-enter&lt;/code&gt;”)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When the initrd transitions into the root file system (“&lt;code&gt;initrd-leave&lt;/code&gt;”)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When the early boot phase of the OS on the root file system has
  completed, i.e. all storage and file systems have been set up and
  mounted, immediately before regular services are started
  (“&lt;code&gt;sysinit&lt;/code&gt;”)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When the OS on the root file system completed the boot process far
  enough to allow unprivileged users to log in (“&lt;code&gt;complete&lt;/code&gt;”)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When the OS begins shutting down (“&lt;code&gt;shutdown&lt;/code&gt;”)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When the service manager is mostly finished with shutting down and
  is about to pass control to the final phase of the shutdown logic
  (“&lt;code&gt;final&lt;/code&gt;”)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By measuring these additional words into PCR 11 the distinct phases of
the boot process can be distinguished in a relatively straight-forward
fashion and the expected PCR values in each phase can be determined.&lt;/p&gt;
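&lt;p&gt;As a rough illustration, the expected PCR 11 value for each phase can be derived by extending the value left behind by the UKI measurement with one phase word after the other. The Python sketch below uses the phase words listed above; the starting value is a placeholder, and the assumption that each word is measured as its plain byte string is made for this example only.&lt;/p&gt;

```python
import hashlib

PHASE_WORDS = ["initrd-enter", "initrd-leave", "sysinit",
               "complete", "shutdown", "final"]


def extend(pcr, data):
    """TPM-style extend for the SHA256 bank: H(old PCR || H(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def expected_values_per_phase(pcr_after_uki):
    """Map each phase word to the PCR 11 value expected once it was measured."""
    values = {}
    pcr = pcr_after_uki
    for word in PHASE_WORDS:
        pcr = extend(pcr, word.encode())
        values[word] = pcr
    return values


# Placeholder for the PCR 11 value right after the stub measured the UKI.
pcr_after_uki = hashlib.sha256(b"placeholder for UKI measurements").digest()

for word, value in expected_values_per_phase(pcr_after_uki).items():
    print(f"{word:13} {value.hex()}")
```

&lt;p&gt;Since a PCR can only be extended and never rewound, a secret bound to the value of an early phase stops being retrievable as soon as the next phase word has been measured.&lt;/p&gt;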
&lt;p&gt;The phases are measured into PCR 11 (as opposed to some other PCR)
mostly because available PCRs are scarce, and the boot phases defined
are typically specific to a chosen OS, and hence fit well with the
other data measured into PCR 11: the UKI which is also specific to the
OS. The OS vendor generates both the UKI and defines the boot phases,
and thus can safely and reliably pre-calculate/sign the expected PCR
values for each phase of the boot.&lt;/p&gt;
&lt;h2&gt;Revocation/Rollback Protection&lt;/h2&gt;
&lt;p&gt;In order to secure secrets stored at rest, in particular in
environments where unattended decryption shall be possible, it is
essential that an attacker cannot use old, known-buggy – but properly
signed – versions of software to access them.&lt;/p&gt;
&lt;p&gt;Specifically, if disk encryption is bound to an OS vendor (via UKIs
that include expected PCR values, signed with the vendor’s private key)
there must be a mechanism to lock out old versions of the OS or UKI
from accessing TPM based secrets once it is determined that the old
version is vulnerable.&lt;/p&gt;
&lt;p&gt;To implement this we propose making use of one of the “counters” TPM
2.0 devices provide: integer registers that are persistent in the TPM
and can only be increased on request of the OS, but never be
decreased. When sealing resources to the TPM, a policy may be declared
to the TPM that restricts how the resources can later be unlocked:
here we use one that requires that along with the expected PCR values
(as discussed above) a counter integer range is provided to the TPM
chip, along with a suitable signature covering both, matching the
public key provided during sealing. The sealing/unsealing mechanism
described above is thus extended: the signature passed to the TPM
during unsealing now covers both the expected PCR values and the
expected counter range. To be able to use a signature associated with
a UKI provided by the vendor to unseal a resource, the counter thus
must be at least increased to the lower end of the range the signature
is for. By doing so the ability is lost to unseal the resource for
signatures associated with older versions of the UKI, because their
upper end of the range disables access once the counter has been
increased far enough. By carefully choosing the upper and lower end of
the counter range whenever the PCR values for a UKI shall be signed
it is thus possible to ensure that updates can invalidate prior
versions’ access to resources. By placing some space between the upper
and lower end of the range it is possible to allow a controlled level
of fallback UKI support, with clearly defined milestones where
fallback to older versions of a UKI is not permitted anymore.&lt;/p&gt;
&lt;p&gt;Example: a hypothetical distribution FooOS releases a regular stream
of UKI kernels 5.1, 5.2, 5.3, … It signs the expected PCR values for
these kernels with a key pair it maintains in an HSM. When signing UKI
5.1 it includes information directed at the TPM in the signed data
declaring that the TPM counter must be at least 100 and at most 120, in
order for the signature to be used. Thus, when the UKI is booted up
and used for unlocking an encrypted volume the unlocking code must
first increase the counter to 100 if needed, as the TPM will otherwise
refuse unlocking the volume. The next release of the UKI, i.e. UKI 5.2
is a feature release, i.e. reverting back to the old kernel locally is
acceptable. It thus does not increase the lower bound, but it
increases the upper bound for the counter in the signature payload,
thus encoding a valid range 100…121 in the signed payload. Now a major
security vulnerability is discovered in UKI 5.1. A new UKI 5.3 is
prepared that fixes this issue. It is now essential that UKI 5.1 can
no longer be used to unlock the TPM secrets. Thus UKI 5.3 will bump
the lower bound to 121, and increase the upper bound by one, thus
allowing a range 121…122. Or in other words: for each new UKI release
the signed data shall include a counter range declaration where the
upper bound is increased by one. The lower range is left as-is between
releases, except when an old version shall be cut off, in which case
it is bumped to one above the upper bound used in that release.&lt;/p&gt;
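&lt;p&gt;The FooOS example can be replayed with a small Python model. &lt;code&gt;ToyCounter&lt;/code&gt; and &lt;code&gt;try_unlock&lt;/code&gt; are hypothetical names, the ranges are treated as inclusive, and the signature check itself is left out; the sketch only shows how the monotonic counter and the signed ranges interact.&lt;/p&gt;

```python
class ToyCounter:
    """Models a TPM counter: it can be increased but never decreased."""

    def __init__(self):
        self.value = 0

    def increase_to(self, target):
        self.value = max(self.value, target)


# Signed counter ranges (inclusive) from the FooOS example.
SIGNED_RANGES = {"5.1": (100, 120), "5.2": (100, 121), "5.3": (121, 122)}


def try_unlock(counter, uki_version):
    """Bump the counter to the lower bound if needed, then check the range."""
    lower, upper = SIGNED_RANGES[uki_version]
    counter.increase_to(lower)
    return counter.value in range(lower, upper + 1)


counter = ToyCounter()
# Boot 5.1, 5.2 and 5.3 in turn, then try to fall back to 5.1 and 5.2.
for version in ["5.1", "5.2", "5.3", "5.1", "5.2"]:
    print(version, try_unlock(counter, version), counter.value)
```

&lt;p&gt;Once UKI 5.3 has pushed the counter to 121, the range signed for 5.1 can no longer be satisfied, while the range signed for 5.2 still can.&lt;/p&gt;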
&lt;h2&gt;UKI Generation&lt;/h2&gt;
&lt;p&gt;As mentioned earlier, UKIs are the combination of various resources
into one PE file. For most of these individual components there are
pre-existing tools to generate the components. For example the
included kernel image can be generated with the usual Linux kernel
build system. The initrd included in the UKI can be generated with
existing tools such as &lt;code&gt;dracut&lt;/code&gt; and similar. Once the basic components
(&lt;code&gt;.linux&lt;/code&gt;, &lt;code&gt;.initrd&lt;/code&gt;, &lt;code&gt;.cmdline&lt;/code&gt;, &lt;code&gt;.splash&lt;/code&gt;, &lt;code&gt;.dtb&lt;/code&gt;, &lt;code&gt;.osrel&lt;/code&gt;,
&lt;code&gt;.uname&lt;/code&gt;) have been acquired the combination process works roughly
like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The expected PCR 11 hashes (and signatures for them) for the UKI
   are calculated. The tool for that takes all basic UKI components
   and a signing key as input, and generates a JSON object as output
   that includes both the literal expected PCR hash values and a
   signature for them. (For all selected TPM2 banks)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The EFI stub binary is now combined with the basic components, the
   generated JSON PCR signature object from the first step (in the
   &lt;code&gt;.pcrsig&lt;/code&gt; section) and the public key for it (in the &lt;code&gt;.pcrpkey&lt;/code&gt;
   section). This is done via a simple “&lt;code&gt;objcopy&lt;/code&gt;” invocation
   resulting in a single UKI PE binary.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The resulting EFI PE binary is then signed for SecureBoot (via a
   tool such as
   &lt;a href="https://git.kernel.org/pub/scm/linux/kernel/git/jejb/sbsigntools.git/"&gt;&lt;code&gt;sbsign&lt;/code&gt;&lt;/a&gt;
   or similar).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that the UKI model implies pre-built initrds. How to generate
these (and securely extend and parameterize them) is outside of the
scope of this document, but a related document will be provided
highlighting these concepts.&lt;/p&gt;
&lt;h2&gt;Protection Coverage of SecureBoot Signing and PCRs&lt;/h2&gt;
&lt;p&gt;The scheme discussed here touches both SecureBoot code signing and TPM
PCR measurements. These two distinct mechanisms cover separate parts
of the boot process.&lt;/p&gt;
&lt;p&gt;Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Firmware/Shim SecureBoot signing covers bootloader and UKI&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;TPM PCR 11 covers the UKI components and boot phase&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;TPM PCR 12 covers admin configuration&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;TPM PCR 15 covers the local identity of the host&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that this means SecureBoot coverage ends once the system
transitions from the initrd into the root file system. It is assumed
that trust and integrity have been established before this transition
by some means, for example LUKS/dm-crypt/dm-integrity, ideally bound
to PCR 11 (i.e. UKI and boot phase).&lt;/p&gt;
&lt;p&gt;A robust and secure update scheme for PCR 11 (i.e. UKI) has been
described above, which allows binding TPM-locked resources to a
UKI. For PCR 12 no such scheme is currently designed, but might be
added later (use case: permit access to certain secrets only if the
system runs with configuration signed by a specific set of
keys). Given that resources measured into PCR 15 typically aren’t
updated (or if they are updated loss of access to other resources
linked to them is desired) no update scheme should be necessary for
it.&lt;/p&gt;
&lt;p&gt;This document focuses on the three PCRs discussed above. Disk
encryption and other userspace may choose to also bind to other
PCRs. However, doing so reintroduces the PCR brittleness issue that
this design is supposed to remove. PCRs defined by the various
firmware UEFI/TPM specifications generally have no concept of
signatures of expected PCR values.&lt;/p&gt;
&lt;p&gt;It is known that the industry-adopted SecureBoot signing keys are too
broad to act as more than a denylist for known bad code. It is thus
probably a good idea to enroll vendor SecureBoot keys wherever
possible (e.g. in environments where the hardware is very well known,
and VM environments), to raise the bar on preparing rogue UKI-like PE
binaries that will result in PCR values that match expectations but
actually contain bad code. Discussion about that is however outside of
the scope of this document.&lt;/p&gt;
&lt;h2&gt;Whole OS embedded in the UKI&lt;/h2&gt;
&lt;p&gt;The above is written under the assumption that the UKI embeds an
initrd whose job it is to set up the root file system: find it,
validate it, cryptographically unlock it and similar. Once the root
file system is found, the system transitions into it.&lt;/p&gt;
&lt;p&gt;While this is the traditional design and likely what most systems will
use, it is also possible to embed a regular root file system into the
UKI and avoid any transition to an on-disk root file system. In this
mode the whole OS would be encapsulated in the UKI, and
signed/measured as one. In such a scenario the whole of the OS must be
loaded into RAM and remain there, which typically restricts the
general usability of such an approach. However, for specific purposes
this might be the design of choice, for example to implement
self-sufficient recovery or provisioning systems.&lt;/p&gt;
&lt;h1&gt;Proposed Implementations &amp;amp; Current Status&lt;/h1&gt;
&lt;p&gt;The toolset for most of the above is already implemented in systemd and related projects in one way or another. Specifically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-stub.html"&gt;&lt;code&gt;systemd-stub&lt;/code&gt;&lt;/a&gt;
   (or short: &lt;code&gt;sd-stub&lt;/code&gt;) component implements the discussed UEFI stub
   program&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-measure.html"&gt;&lt;code&gt;systemd-measure&lt;/code&gt;&lt;/a&gt;
   tool can be used to pre-calculate expected PCR 11 values given the
   UKI components and can sign the result, as discussed in the UKI
   Generation section above.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptenroll.html"&gt;&lt;code&gt;systemd-cryptenroll&lt;/code&gt;&lt;/a&gt;
   and
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptsetup@.service.html"&gt;&lt;code&gt;systemd-cryptsetup&lt;/code&gt;&lt;/a&gt;
   tools can be used to bind a LUKS2 encrypted file system volume to a
   TPM and PCR 11 public key/signatures, according to the scheme
   described above. (The two components also implement a “&lt;em&gt;recovery
   key&lt;/em&gt;” concept, as discussed above)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-pcrphase.service.html"&gt;&lt;code&gt;systemd-pcrphase&lt;/code&gt;&lt;/a&gt;
   component measures specific words into PCR 11 at the discussed
   phases of the boot process.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-creds.html"&gt;&lt;code&gt;systemd-creds&lt;/code&gt;&lt;/a&gt;
   tool may be used to encrypt/decrypt data objects called
   “credentials” that can be passed into services and booted systems,
   and are automatically decrypted (if needed) immediately before
   service invocation. Encryption is typically bound to the local TPM,
   to ensure the data cannot be recovered elsewhere.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-stub.html"&gt;&lt;code&gt;systemd-stub&lt;/code&gt;&lt;/a&gt;
(i.e. the UEFI code glued into the UKI) is distinct from
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-boot.html"&gt;&lt;code&gt;systemd-boot&lt;/code&gt;&lt;/a&gt;
(i.e. the UEFI boot loader that can manage multiple UKIs and other
boot menu items and implements automatic fallback, an interactive menu
and a programmatic interface for the OS among other things). One can
be used without the other – both &lt;code&gt;sd-stub&lt;/code&gt; without &lt;code&gt;sd-boot&lt;/code&gt; and vice
versa – though they integrate nicely if used in combination.&lt;/p&gt;
&lt;p&gt;Note that the mechanisms described are relatively generic, and can be
implemented and consumed in other software too. systemd should be
considered a reference implementation, though one that has found
comprehensive adoption across Linux distributions.&lt;/p&gt;
&lt;p&gt;Some concepts discussed above are currently not
implemented. Specifically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The rollback protection logic is currently not implemented.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The mentioned measurement of the root file system volume key to PCR
   15 is implemented, but not merged into the systemd main branch yet.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;The UAPI Group&lt;/h1&gt;
&lt;p&gt;We recently started a new group for discussing concepts and
specifications of basic OS components, including UKIs as described
above. It's called &lt;a href="https://uapi-group.org/"&gt;the UAPI Group&lt;/a&gt;. Please
have a look at the various documents and specifications already
available there, and expect more to come. Contributions welcome!&lt;/p&gt;
&lt;h2&gt;Glossary&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TPM&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Trusted Platform Module&lt;/em&gt;; a security chip found in many modern
systems, both physical systems and increasingly also in virtualized
environments. Traditionally a discrete chip on the mainboard but today
often implemented in firmware, and lately directly in the CPU SoC.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;PCR&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Platform Configuration Register&lt;/em&gt;; a set of registers on a TPM that
are initialized to zero at boot. The firmware and OS can “&lt;em&gt;extend&lt;/em&gt;”
these registers with hashes of data used during the boot process and
afterwards. “Extension” means the supplied data is first
cryptographically hashed. The resulting hash value is then combined
with the previous value of the PCR and the combination hashed
again. The result will become the new value of the PCR. By doing this
iteratively for all parts of the boot process (always with the data
that will be used next during the boot process) a concept of
“&lt;em&gt;Measured Boot&lt;/em&gt;” can be implemented: as long as every element in the
boot chain measures (i.e. extends into the PCR) the next part of the
boot like this, the resulting PCR values will prove cryptographically
that only a certain set of boot components can have been used to boot
up. A standards compliant TPM usually has 24 PCRs, but more than half
of those are already assigned specific meanings by the firmware. Some
of the others may be used by the OS, of which we use four in the
concepts discussed in this document.&lt;/p&gt;
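&lt;p&gt;The extend operation described here can be modelled in a few lines of Python. This is a minimal sketch that assumes the SHA-256 bank and uses plain byte strings in place of a real TPM; the function names &lt;code&gt;extend()&lt;/code&gt; and &lt;code&gt;measure_chain()&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
import hashlib


def extend(pcr, data):
    """Return the new PCR value after measuring data into it."""
    # The supplied data is first cryptographically hashed ...
    digest = hashlib.sha256(data).digest()
    # ... then combined with the previous PCR value and hashed again.
    return hashlib.sha256(pcr + digest).digest()


def measure_chain(components):
    """Measure a sequence of boot components into a single PCR."""
    pcr = bytes(32)  # PCRs are initialized to zero at boot
    for component in components:
        pcr = extend(pcr, component)
    return pcr
```

&lt;p&gt;Because every new value depends on the previous one, the final value depends on both the content and the order of everything that was measured: &lt;code&gt;measure_chain([b"loader", b"kernel"])&lt;/code&gt; and &lt;code&gt;measure_chain([b"kernel", b"loader"])&lt;/code&gt; produce different results.&lt;/p&gt;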
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Measurement&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The act of “&lt;em&gt;extending&lt;/em&gt;” a PCR with some data object.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;SRK&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Storage Root Key&lt;/em&gt;; a special cryptographic key generated by a TPM
that never leaves the TPM, and can be used to encrypt/decrypt data
passed to the TPM.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;UKI&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Unified Kernel Image&lt;/em&gt;; the concept this document is about. A
combination of kernel, &lt;code&gt;initrd&lt;/code&gt; and other resources. See above.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;SecureBoot&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A mechanism where every software component involved in the boot
process is cryptographically signed and checked against a set of
public keys stored in the mainboard hardware, implemented in firmware,
before it is used.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Measured Boot&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A boot process where each component measures (i.e., hashes and extends
into a TPM PCR, see above) the next component it will pass control to
before doing so. This serves two purposes: it can be used to bind
security policy for encrypted secrets to the resulting PCR values (or
signatures thereof, see above), and it can be used to reason about
used software after the fact, for example for the purpose of remote
attestation.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;initrd&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Short for “&lt;em&gt;initial RAM disk&lt;/em&gt;”, which – strictly speaking – is a
misnomer today, because no RAM disk is involved anymore, but a &lt;code&gt;tmpfs&lt;/code&gt;
file system instance. Also known as “&lt;code&gt;initramfs&lt;/code&gt;”, which is also
misleading, given the file system is not &lt;code&gt;ramfs&lt;/code&gt; anymore, but &lt;code&gt;tmpfs&lt;/code&gt;
(both of which are in-memory file systems on Linux, with different
semantics). The &lt;code&gt;initrd&lt;/code&gt; is passed to the Linux kernel and is
basically a file system tree in a &lt;code&gt;cpio&lt;/code&gt; archive. The kernel unpacks the
image into a &lt;code&gt;tmpfs&lt;/code&gt; (i.e., into an in-memory file system), and then
executes a binary from it. It thus contains the binaries for the first
userspace code the kernel invokes. Typically, the &lt;code&gt;initrd&lt;/code&gt;’s job is to
find the actual root file system, unlock it (if encrypted), and
transition into it.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;UEFI&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Short for “&lt;em&gt;Unified Extensible Firmware Interface&lt;/em&gt;”, it is a widely
adopted standard for PC firmware, with native support for &lt;em&gt;SecureBoot&lt;/em&gt;
and Measured Boot.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;EFI&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;More or less synonymous to UEFI, IRL.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Shim&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A boot component originating in the Linux world, which in a way
extends the public key database SecureBoot maintains (which is under
control of Microsoft) with a second layer (which is under control of
the Linux distributions and of the owner of the physical device).&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;PE&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Portable Executable&lt;/em&gt;; a file format for executable binaries,
originally from the Windows world, but also used by UEFI firmware. PE
files may contain code and data, categorized in labeled “sections”.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;ESP&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;EFI System Partition&lt;/em&gt;; a special partition on a storage
medium that the firmware is able to look for UEFI PE binaries
in to execute at boot.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;HSM&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Hardware Security Module&lt;/em&gt;; a piece of hardware that can generate and
store secret cryptographic keys, and execute operations with them,
without the keys leaving the hardware (though this is
configurable). TPMs can act as HSMs.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;DEK&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Disk Encryption Key&lt;/em&gt;; a symmetric cryptographic key used for
unlocking disk encryption, i.e. passed to LUKS/dm-crypt for activating
an encrypted storage volume.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;LUKS2&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;Linux Unified Key Setup Version 2&lt;/em&gt;; a specification for a superblock
for encrypted volumes widely used on Linux. LUKS2 is the default
on-disk format for the &lt;code&gt;cryptsetup&lt;/code&gt; suite of tools. It provides
flexible key management with multiple independent key slots and allows
embedding arbitrary metadata in a JSON format in the superblock.&lt;/p&gt;
&lt;h2&gt;Thanks&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;I’d like to thank Alain Gefflaut, Anna Trikalinou, Christian Brauner,
Daan de Meyer, Luca Boccassi, Zbigniew Jędrzejewski-Szmek for
reviewing this text.&lt;/em&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 24 Oct 2022 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2022-10-24:/blog/brave-new-trusted-boot-world.html</guid><category>projects</category></item><item><title>Fitting Everything Together</title><link>https://0pointer.net/blog/fitting-everything-together.html</link><description>&lt;p&gt;&lt;em&gt;TLDR: Hermetic &lt;code&gt;/usr/&lt;/code&gt; is awesome; let's popularize image-based OSes
with modernized security properties built around immutability,
SecureBoot, TPM2, adaptability, auto-updating, factory reset,
uniformity – built from traditional distribution packages, but
deployed via images.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Over the past years, systemd gained a number of components for
building Linux-based operating systems. While these components
individually have been adopted by many distributions and products for
specific purposes, we did not publicly communicate a broader vision
of how they should all fit together in the long run. In this blog story I
hope to provide that from my personal perspective, i.e. explain how I
&lt;em&gt;personally&lt;/em&gt; would build an OS and where I &lt;em&gt;personally&lt;/em&gt; think OS
development with Linux should go.&lt;/p&gt;
&lt;p&gt;I figure this is going to be a longer blog story, but I hope it
will be equally enlightening. Please understand though that everything
I write about OS design here is my personal opinion, and not that of my
employer.&lt;/p&gt;
&lt;p&gt;For the last 12 years or so I have been working on Linux OS
development, mostly around &lt;code&gt;systemd&lt;/code&gt;. In all those years I have had a lot
of time to think about the Linux platform, and specifically
traditional Linux distributions and their strengths and weaknesses. I
have seen many attempts to reinvent Linux distributions in one way or
another, with varying success. After all this most would probably
agree that the traditional RPM or dpkg/apt-based distributions still
define the Linux platform more than others (for 25+ years now), even
though some Linux-based OSes (Android, ChromeOS) probably outnumber
them in installations overall.&lt;/p&gt;
&lt;p&gt;And over all those 12 years I kept wondering, how would &lt;em&gt;I&lt;/em&gt; actually
build an OS for a system or for an appliance, and what are the
components necessary to achieve that. And most importantly, how can we
make these components generic enough so that they are useful in
generic/traditional distributions too, and in other use cases than my
own.&lt;/p&gt;
&lt;h1&gt;The Project&lt;/h1&gt;
&lt;p&gt;Before figuring out how I would build an OS it's probably good to
figure out what type of OS I actually want to build, what purpose I
intend to cover. I think a desktop OS is probably the most
interesting. Why is that?  Well, first of all, I use one of these for my
job every single day, so I care immediately, it's my primary tool of
work. But more importantly: I think building a desktop OS is one of
the most complex overall OS projects you can work on, simply because
desktops are so much more versatile and variable than servers or
embedded devices. If one figures out the desktop case, I think there's
a lot more to learn from, and reuse in the server or embedded case,
than going the other way. After all, there's a reason why so much of the
widely accepted Linux userspace stack comes from people with a desktop
background (including systemd, BTW).&lt;/p&gt;
&lt;p&gt;So, let's see how &lt;em&gt;I&lt;/em&gt; would build a desktop OS. If you press me hard,
and ask me why I would do that given that ChromeOS already exists and
more or less is a Linux desktop OS: there's plenty I am missing in
ChromeOS, but most importantly, I am a lot more interested in building
something people can easily and naturally rebuild and hack on,
i.e. Google-style over-the-wall open source with its skewed power
dynamic is not particularly attractive to me. I much prefer building
this within the framework of a proper open source community, out in
the open, and basing all this strongly on the status quo ante,
i.e. the existing distributions. I think it is crucial to provide a
clear avenue to build a modern OS based on the existing distribution
model, if there shall ever be a chance to make this interesting for a
larger audience.&lt;/p&gt;
&lt;p&gt;(Let me underline though: even though I am going to focus on a desktop
here, most of this is directly relevant for servers as well, in
particular container host OSes and suchlike, or embedded devices,
e.g. car IVI systems and so on.)&lt;/p&gt;
&lt;h1&gt;Design Goals&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;First and foremost, I think the focus must be on an image-based
   design rather than a package-based one. For robustness and security
   it is essential to operate with reproducible, immutable images that
   describe the OS or large parts of it in full, rather than operating
   always with fine-grained RPM/dpkg style packages. That's not to say
   that packages are not relevant (I actually think they matter a
   lot!), but I think they should be less of a tool for deploying code
   but more one of building the objects to deploy. A different way to
   see this: any OS built like this must be easy to replicate in a
   large number of instances, with minimal variability. Regardless of
   whether we talk about desktops, servers or embedded devices: focus for my
   OS should be on "cattle", not "pets", i.e. that from the start it's
   trivial to reuse the well-tested, cryptographically signed
   combination of software over a large set of devices the same way,
   with a maximum of bit-exact reuse and a minimum of local variances.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The trust chain matters, from the boot loader all the way to the
   apps. This means all code that is run must be cryptographically
   validated before it is run. All storage must be cryptographically
   protected: public data must be integrity checked; private data must
   remain confidential.&lt;/p&gt;
&lt;p&gt;This is in fact where big distributions currently fail pretty
   badly. I would go as far as saying that SecureBoot on Linux
   distributions is mostly security theater at this point, if you
   will. That's because the initrd that unlocks your FDE (i.e. the
   cryptographic concept that protects the rest of your system) is not
   signed or protected in any way. It's trivial for an attacker with
   access to your hard disk to modify it in an undetectable way, and
   collect your FDE passphrase. The involved bureaucracy around the
   implementation of UEFI SecureBoot of the big distributions is to a
   large degree pointless if you ask me, given that once the kernel is
   assumed to be in a good state, as the next step the system invokes
   completely unsafe code with full privileges.&lt;/p&gt;
&lt;p&gt;This is a fault of current Linux distributions though, not of
   SecureBoot in general. Other OSes use this functionality in more
   useful ways, and we should correct that too.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pretty much the same thing: offline security matters. I want
   my data to be reasonably safe at rest, i.e. cryptographically
   inaccessible even when I leave my laptop in my hotel room,
   suspended.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Everything should be cryptographically measured, so that remote
   attestation is supported for as much software shipped on the OS as
   possible.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Everything should be self-descriptive and have single sources of truth
   that are closely attached to the object itself, instead of stored
   externally.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Everything should be self-updating. Today we know that software is
   never bug-free, and thus requires a continuous update cycle. Not
   only the OS itself, but also any extensions, services and apps
   running on it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Everything should be robust with respect to aborted OS operations,
   power loss and so on. It should be robust towards hosed OS updates
   (regardless of whether the download process failed, or the image was
   buggy), and not require user interaction to recover from them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There must always be a way to put the system back into a
   well-defined, guaranteed safe state ("factory reset"). This
   includes that all sensitive data from earlier uses becomes
   cryptographically inaccessible.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The OS should enforce clear separation between vendor resources,
   system resources and user resources: conceptually and when it comes
   to cryptographic protection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Things should be adaptive: the system should come up and make the
    best of the system it runs on, adapt to the storage and
    hardware. Moreover, the system should support execution on bare
    metal equally well as execution in a VM environment and in a
    container environment (i.e. &lt;code&gt;systemd-nspawn&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Things should not require explicit installation, i.e. every image
    should be a live image. For installation it should be sufficient to
    &lt;code&gt;dd&lt;/code&gt; an OS image onto disk. Thus, strong focus on "instantiate on
    first boot", rather than "instantiate before first boot".&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Things should be reasonably minimal. The image the system starts
    its life with should be quick to download, and not include
    resources that can as well be created locally later.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;System identity, local cryptographic keys and so on should be
    generated locally, not be pre-provisioned, so that no leak of
    sensitive data is possible during transport onto the system.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Things should be reasonably democratic and hackable. It should be
    easy to fork an OS, to modify an OS and still get reasonable
    cryptographic protection. Modifying your OS should not necessarily
    imply that your "warranty is voided" and you lose all good
    properties of the OS, if you will.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Things should be reasonably modular. The privileged part of the
    core OS must be extensible, including on the individual system.
    It's not sufficient to support extensibility just through
    high-level UI applications.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Things should be reasonably uniform, i.e. ideally the same formats
    and cryptographic properties are used for all components of the
    system, regardless of whether for the host OS itself or the payloads it
    receives and runs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Even taking all these goals into consideration, it should still be
    close to traditional Linux distributions, and take advantage of what
    they are really good at: integration and security update cycles.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Now that we know our goals and requirements, let's start designing the
OS along these lines.&lt;/p&gt;
&lt;h1&gt;Hermetic &lt;code&gt;/usr/&lt;/code&gt;&lt;/h1&gt;
&lt;p&gt;First of all the OS resources (code, data files, …) should be
&lt;em&gt;hermetic&lt;/em&gt; in an immutable &lt;code&gt;/usr/&lt;/code&gt;. This means that a &lt;code&gt;/usr/&lt;/code&gt; tree
should carry everything needed to set up the minimal set of
directories and files outside of &lt;code&gt;/usr/&lt;/code&gt; to make the system work. This
&lt;code&gt;/usr/&lt;/code&gt; tree can then be mounted read-only into the writable root file
system that then will eventually carry the local configuration, state
and user data in &lt;code&gt;/etc/&lt;/code&gt;, &lt;code&gt;/var/&lt;/code&gt; and &lt;code&gt;/home/&lt;/code&gt; as usual.&lt;/p&gt;
&lt;p&gt;Thankfully, modern distributions are surprisingly close to working
without issues in such a hermetic context. Specifically, Fedora works
mostly just fine: it has adopted the &lt;code&gt;/usr/&lt;/code&gt; merge and the declarative
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-sysusers.html"&gt;&lt;code&gt;systemd-sysusers&lt;/code&gt;&lt;/a&gt;
and
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles-setup.service.html"&gt;&lt;code&gt;systemd-tmpfiles&lt;/code&gt;&lt;/a&gt;
components quite comprehensively, which means the directory trees
outside of &lt;code&gt;/usr/&lt;/code&gt; are automatically generated as needed if missing.
In particular &lt;code&gt;/etc/passwd&lt;/code&gt; and &lt;code&gt;/etc/group&lt;/code&gt; (and related files) are
appropriately populated, should they be missing entries.&lt;/p&gt;
&lt;p&gt;In my model a hermetic OS is hence comprehensively defined within
&lt;code&gt;/usr/&lt;/code&gt;: combine the &lt;code&gt;/usr/&lt;/code&gt; tree with an empty, otherwise unpopulated
root file system, and it will boot up successfully, automatically
adding the files and resources that are strictly necessary
to boot up.&lt;/p&gt;
&lt;p&gt;Monopolizing vendor OS resources and definitions in an immutable
&lt;code&gt;/usr/&lt;/code&gt; opens multiple doors to us:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We can apply &lt;code&gt;dm-verity&lt;/code&gt; to the whole &lt;code&gt;/usr/&lt;/code&gt; tree, i.e. guarantee
  structural, cryptographic integrity on the whole vendor OS resources
  at once, with full file system metadata.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We can implement updates to the OS easily: by implementing an A/B
  update scheme on the &lt;code&gt;/usr/&lt;/code&gt; tree we can update the OS resources
  atomically and robustly, while leaving the rest of the OS environment
  untouched.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We can implement factory reset easily: erase the root file system
  and reboot. The hermetic OS in &lt;code&gt;/usr/&lt;/code&gt; has all the information it
  needs to set up the root file system afresh — exactly like in a new
  installation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Initial Look at the Partition Table&lt;/h1&gt;
&lt;p&gt;So let's have a look at a suitable partition table, taking a hermetic
&lt;code&gt;/usr/&lt;/code&gt; into account. Let's conceptually start with a table of four
entries:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;An EFI System Partition (required by firmware to boot)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Immutable, Verity-protected, signed file system with the &lt;code&gt;/usr/&lt;/code&gt; tree in version A&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Immutable, Verity-protected, signed file system with the &lt;code&gt;/usr/&lt;/code&gt; tree in version B&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A writable, encrypted root file system&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;(This is just for initial illustration here, as we'll see later it's
going to be a bit more complex in the end.)&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://systemd.io/DISCOVERABLE_PARTITIONS"&gt;Discoverable Partitions
Specification&lt;/a&gt; provides
suitable partition type UUIDs for all of the above partitions. Which
is great, because it makes the image self-descriptive: simply by
looking at the image's GPT table we know what to mount where. This
means we do not need a manual &lt;code&gt;/etc/fstab&lt;/code&gt;, and a multitude of tools
such as &lt;code&gt;systemd-nspawn&lt;/code&gt; and similar can operate directly on the disk
image and boot it up.&lt;/p&gt;
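&lt;p&gt;To make the "self-descriptive" point a bit more concrete, here is a minimal Python sketch of how a tool could derive mount points purely from GPT partition type UUIDs. The three UUIDs are the x86-64 values as listed in the specification (please double-check them there before relying on them); the &lt;code&gt;mount_plan&lt;/code&gt; helper, its input format and the &lt;code&gt;/efi&lt;/code&gt; mount point are illustrative assumptions, not an existing API.&lt;/p&gt;

```python
# Partition type UUIDs for x86-64, as listed in the Discoverable Partitions
# Specification (verify against the spec before relying on them).
TYPE_TO_MOUNT = {
    "c12a7328-f81f-11d2-ba4b-00a0c93ec93b": "/efi",  # EFI System Partition
    "8484680c-9521-48c6-9c11-b0720656f69e": "/usr",  # /usr partition
    "4f68bce3-e8cd-4db1-96e7-fbcaf984b709": "/",     # root partition
}


def mount_plan(partitions):
    """Map (device, type_uuid) pairs to mount points; skip unknown types.

    Hypothetical helper: real tools also consider flags, architecture
    and which disk the system was booted from.
    """
    plan = {}
    for device, type_uuid in partitions:
        where = TYPE_TO_MOUNT.get(type_uuid.lower())
        # Only the first partition of each known type is considered here.
        if where is not None and where not in plan:
            plan[where] = device
    return plan


if __name__ == "__main__":
    table = [
        ("sda1", "C12A7328-F81F-11D2-BA4B-00A0C93EC93B"),
        ("sda2", "8484680c-9521-48c6-9c11-b0720656f69e"),
    ]
    print(mount_plan(table))
```

&lt;p&gt;Nothing in this sketch reads a configuration file: the partition table alone carries the information that &lt;code&gt;/etc/fstab&lt;/code&gt; would otherwise have to provide.&lt;/p&gt;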
&lt;h1&gt;Booting&lt;/h1&gt;
&lt;p&gt;Now that we have a rough idea how to organize the partition table,
let's look a bit at how to boot into that. In my model
"unified kernels" are the way to go, specifically those implementing
&lt;a href="https://systemd.io/BOOT_LOADER_SPECIFICATION"&gt;Boot Loader Specification Type #2&lt;/a&gt;. These are basically
kernel images that have an initial RAM disk attached to them, as well as
a kernel command line, a boot splash image and possibly more, all
wrapped into a single UEFI PE binary. By combining these into one we
achieve two goals: they become extremely easy to update (i.e. drop in
one file, and you update kernel+initrd) and more importantly, you can
sign them as one for the purpose of UEFI SecureBoot.&lt;/p&gt;
&lt;p&gt;In my model, each version of such a kernel would be associated with
exactly one version of the &lt;code&gt;/usr/&lt;/code&gt; tree: both are always updated at
the same time. An update then becomes relatively simple: drop in one
new &lt;code&gt;/usr/&lt;/code&gt; file system plus one kernel, and the update is complete.&lt;/p&gt;
&lt;p&gt;The boot loader used for all this would be
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-boot.html"&gt;systemd-boot&lt;/a&gt;,
of course. It's a very simple loader, and implements the
aforementioned boot loader specification. This means it requires no
explicit configuration or anything: it's entirely sufficient to drop
in one such unified kernel file, and it will be picked up, and be made
a candidate to boot into.&lt;/p&gt;
&lt;p&gt;You might wonder how to configure the root file system to boot from
with such a unified kernel that contains the kernel command line and
is signed as a whole and thus immutable. The idea here is to use the
&lt;code&gt;usrhash=&lt;/code&gt; kernel command line option implemented by
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-veritysetup-generator.html"&gt;systemd-veritysetup-generator&lt;/a&gt;
and
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-fstab-generator.html"&gt;systemd-fstab-generator&lt;/a&gt;. It
does two things: it will search and set up a &lt;code&gt;dm-verity&lt;/code&gt; volume for
the &lt;code&gt;/usr/&lt;/code&gt; file system, and then mount it. It takes the root hash
value of the &lt;code&gt;dm-verity&lt;/code&gt; Merkle tree as the parameter. This hash is
then also used to find the &lt;code&gt;/usr/&lt;/code&gt; partition in the GPT partition
table, under the assumption that the partition UUIDs are derived from
it, as per the suggestions in the discoverable partitions
specification (see above).&lt;/p&gt;
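&lt;p&gt;The derivation of partition UUIDs from the root hash can be sketched in a few lines of Python. This assumes a 256-bit root hash and the convention suggested by the specification, where the data partition takes the first 128 bits and the Verity partition the last 128 bits; the function name is made up for illustration.&lt;/p&gt;

```python
import uuid


def partition_uuids_from_usrhash(usrhash):
    """Split a 256-bit Verity root hash (hex) into two partition UUIDs.

    Assumption: the /usr data partition uses the first 128 bits and the
    Verity hash partition the last 128 bits of the root hash.
    """
    digest = bytes.fromhex(usrhash)
    if len(digest) != 32:
        raise ValueError("expected a 256-bit (64 hex digit) root hash")
    data_uuid = uuid.UUID(bytes=digest[:16])
    verity_uuid = uuid.UUID(bytes=digest[16:])
    return data_uuid, verity_uuid


if __name__ == "__main__":
    # Made-up hash value, purely for demonstration.
    example = "00112233445566778899aabbccddeeff" "ffeeddccbbaa99887766554433221100"
    data, verity = partition_uuids_from_usrhash(example)
    print("usr partition UUID:   ", data)
    print("verity partition UUID:", verity)
```

&lt;p&gt;Because both UUIDs follow from the hash on the kernel command line, the initrd can locate the matching pair of partitions without any further configuration.&lt;/p&gt;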
&lt;p&gt;&lt;code&gt;systemd-boot&lt;/code&gt; (if not told otherwise) will do a version sort of the
kernel image files it finds, and then automatically boot the newest
one. Picking a specific kernel to boot will also fixate which version
of the &lt;code&gt;/usr/&lt;/code&gt; tree to boot into, because — as mentioned — the Verity
root hash of it is built into the kernel command line the unified
kernel image contains.&lt;/p&gt;
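&lt;p&gt;As a rough illustration of what "version sort" means here, the following Python sketch picks the newest entry from a list of file names. It is a deliberate simplification (digit runs compare numerically, everything else as text) and not the comparison function &lt;code&gt;systemd-boot&lt;/code&gt; actually implements; the file names are invented.&lt;/p&gt;

```python
import re

DIGITS = "0123456789"


def version_key(name):
    """Sort key: digit runs compare numerically, other runs as text.

    Simplified for illustration; not systemd-boot's actual algorithm.
    """
    key = []
    for chunk in re.findall(r"[0-9]+|[^0-9]+", name):
        if chunk[0] in DIGITS:
            key.append((0, int(chunk), ""))
        else:
            key.append((1, 0, chunk))
    return key


def pick_default(entries):
    """Return the entry that sorts newest, i.e. the default boot choice."""
    return max(entries, key=version_key)


if __name__ == "__main__":
    kernels = ["os_5.9.efi", "os_5.10.efi", "os_5.2.efi"]
    print(pick_default(kernels))
```

&lt;p&gt;The point of the numeric comparison is that &lt;code&gt;5.10&lt;/code&gt; is treated as newer than &lt;code&gt;5.9&lt;/code&gt;, which a plain string sort would get wrong.&lt;/p&gt;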
&lt;p&gt;In my model I'd place the kernels directly into the UEFI System
Partition (ESP), in order to simplify things. (&lt;code&gt;systemd-boot&lt;/code&gt; also
supports reading them from a separate boot partition, but let's not
complicate things needlessly, at least for now.)&lt;/p&gt;
&lt;p&gt;So, with all this, we now already have a boot chain that goes
something like this: once the boot loader is run, it will pick the
newest kernel, which includes the initial RAM disk and a secure
reference to the &lt;code&gt;/usr/&lt;/code&gt; file system to use. This is already
great. But a &lt;code&gt;/usr/&lt;/code&gt; alone won't make us happy, we also need a root
file system. In my model, that file system would be writable, and the
&lt;code&gt;/etc/&lt;/code&gt; and &lt;code&gt;/var/&lt;/code&gt; hierarchies would be located directly on it. Since
these trees potentially contain secrets (SSH keys, …) the root file
system needs to be encrypted. We'll use LUKS2 for this, of course. In
my model, I'd bind this to the TPM2 chip (for compatibility with
systems lacking one, we can find a suitable fallback, which then
provides weaker guarantees, see below). A TPM2 is a security chip
available in most modern PCs. Among other things it contains a
persistent secret key that can be used to encrypt data, in a way that
only if you possess access to it and can prove you are using validated
software you can decrypt it again. The cryptographic measuring I
mentioned earlier is what allows this to work. But … let's not get
lost too much in the details of TPM2 devices, that'd be material for a
novel, and this blog story is going to be way too long already.&lt;/p&gt;
&lt;p&gt;What does using a TPM2 bound key for unlocking the root file system
get us? We can encrypt the root file system with it, and you can only
read or make changes to the root file system if you also possess the
TPM2 chip and run our validated version of the OS. This protects us
against an &lt;em&gt;evil&lt;/em&gt; &lt;em&gt;maid&lt;/em&gt; scenario to some level: an attacker cannot
just copy the hard disk of your laptop while you leave it in your
hotel room, because unless the attacker also steals the TPM2 device it
cannot be decrypted. The attacker can also not just modify the root
file system, because such changes would be detected on next boot
because they aren't done with the right cryptographic key.&lt;/p&gt;
&lt;p&gt;So, now we have a system that already can boot up somewhat completely,
and run userspace services. All code that is run is verified in some
way: the &lt;code&gt;/usr/&lt;/code&gt; file system is Verity protected, and the root hash of
it is included in the kernel that is signed via UEFI SecureBoot. And
the root file system is locked to the TPM2 where the secret key is
only accessible if our signed OS + &lt;code&gt;/usr/&lt;/code&gt; tree is used.&lt;/p&gt;
&lt;p&gt;(One brief intermission here: so far all the components I am
referencing here exist already, and have been shipped in &lt;code&gt;systemd&lt;/code&gt; and
other projects already, including the TPM2 based disk
encryption. There's one thing missing here however at the moment that
still needs to be developed (happy to take PRs!): right now TPM2 based
LUKS2 unlocking is bound to PCR hash values. This is hard to work with
when implementing updates — what we'd need instead is unlocking by
signatures of PCR hashes. TPM2 supports this, but we don't support it
yet in our &lt;code&gt;systemd-cryptsetup&lt;/code&gt; + &lt;code&gt;systemd-cryptenroll&lt;/code&gt; stack.)&lt;/p&gt;
&lt;p&gt;One of the goals mentioned above is that cryptographic key material
should always be generated locally on first boot, rather than
pre-provisioned. This of course has implications for the encryption
key of the root file system: if we want to boot into this system we
need the root file system to exist, and thus a key already generated
that it is encrypted with. But where precisely would we generate it if
we have no installer which could generate it while installing (as is
done in traditional Linux distribution installers)? My proposed
solution here is to use
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-repart.html"&gt;&lt;code&gt;systemd-repart&lt;/code&gt;&lt;/a&gt;,
which is a declarative, purely additive repartitioner. It can run from
the initrd to create and format partitions on boot, before
transitioning into the root file system. It can also format the
partitions it creates and encrypt them, automatically enrolling a
TPM2-bound key.&lt;/p&gt;
&lt;p&gt;So, let's revisit the partition table we mentioned earlier. Here's
what in my model we'd actually ship in the initial image:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;A UEFI System Partition (ESP)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An immutable, Verity-protected, signed file system with the &lt;code&gt;/usr/&lt;/code&gt; tree in version A&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And that's already it. No root file system, no B &lt;code&gt;/usr/&lt;/code&gt; partition,
nothing else. Only two partitions are shipped: the ESP with the
&lt;code&gt;systemd-boot&lt;/code&gt; loader and one unified kernel image, and the A version
of the &lt;code&gt;/usr/&lt;/code&gt; partition. Then, on first boot &lt;code&gt;systemd-repart&lt;/code&gt; will
notice that the root file system doesn't exist yet, and will create
it, encrypt it, and format it, and enroll the key into the TPM2. It
will also create the second &lt;code&gt;/usr/&lt;/code&gt; partition (B) that we'll need for
later A/B updates (which will be created empty for now, until the
first update operation actually takes place, see below). Once done the
initrd will combine the fresh root file system with the shipped
&lt;code&gt;/usr/&lt;/code&gt; tree, and transition into it. Because the OS is hermetic in
&lt;code&gt;/usr/&lt;/code&gt; and contains all the &lt;code&gt;systemd-tmpfiles&lt;/code&gt; and &lt;code&gt;systemd-sysusers&lt;/code&gt;
information it can then set up the root file system properly and
create any directories and symlinks (and maybe a few files) necessary
to operate.&lt;/p&gt;
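&lt;p&gt;The "purely additive" behaviour on first boot can be modelled roughly as follows. This Python sketch only captures the matching idea (existing partitions are left alone, missing ones are created); the short type names and the &lt;code&gt;plan_additions&lt;/code&gt; function are hypothetical and do not reflect how &lt;code&gt;systemd-repart&lt;/code&gt; is actually implemented.&lt;/p&gt;

```python
def plan_additions(existing_types, declared_types):
    """Return the declared partition types that still need to be created.

    Existing partitions are never moved, shrunk or deleted: each
    declaration claims at most one existing partition of its type, and
    only unclaimed declarations lead to new partitions.
    """
    unclaimed = list(existing_types)
    to_create = []
    for wanted in declared_types:
        if wanted in unclaimed:
            unclaimed.remove(wanted)  # already on disk, leave it alone
        else:
            to_create.append(wanted)  # missing, create it on this boot
    return to_create


if __name__ == "__main__":
    shipped = ["esp", "usr"]                   # what the image contains
    declared = ["esp", "usr", "usr", "root"]   # what the OS wants to have
    print(plan_additions(shipped, declared))
```

&lt;p&gt;Running the same plan on a disk that already has all four partitions yields an empty list, which is why the tool can safely run on every boot.&lt;/p&gt;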
&lt;p&gt;Besides the fact that the root file system's encryption keys are
generated on the system we boot from and never leave it, it is also
pretty nice that the root file system will be sized dynamically,
taking into account the physical size of the backing storage. This is
perfect, because on first boot the image will automatically adapt to what
it has been &lt;code&gt;dd&lt;/code&gt;'ed onto.&lt;/p&gt;
&lt;h1&gt;Factory Reset&lt;/h1&gt;
&lt;p&gt;This is a good point to talk about the factory reset logic, i.e. the
mechanism to place the system back into a known good state. This is
important for two reasons: in our laptop use case, once you want to
pass the laptop to someone else, you want to ensure your data is fully
and comprehensively erased. Moreover, if you have reason to believe
your device was hacked you want to revert the device to a known good
state, i.e. ensure that exploits cannot persist. &lt;code&gt;systemd-repart&lt;/code&gt;
already has a mechanism for it. In the declarations of the partitions
the system should have, entries may be marked to be candidates for
erasing on factory reset. The actual factory reset is then requested
by one of two means: by specifying a specific kernel command line
option (which is not too interesting here, given we lock that down via
UEFI SecureBoot; but then again, one could also add a second kernel to
the ESP that is identical to the first, with the only difference being that it
lists this command line option: thus when the user selects this entry
it will initiate a factory reset) — and via an EFI variable that can
be set and is honoured on the immediately following boot. So here's
how a factory reset would then go down: once the factory reset is
requested it's enough to reboot. On the subsequent boot
&lt;code&gt;systemd-repart&lt;/code&gt; runs from the initrd, where it will honour the
request and erase the partitions marked for erasing. Once that is
complete the system is back in the state we shipped the system in:
only the ESP and the &lt;code&gt;/usr/&lt;/code&gt; file system will exist, but the root file
system is gone. And from here we can continue as on the original first
boot: create a new root file system (and any other partitions), and
encrypt/set it up afresh.&lt;/p&gt;
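&lt;p&gt;A toy model of that erase step, in Python: partitions whose declaration is flagged for factory reset disappear when a reset was requested, everything else survives. The dictionary layout and function name are assumptions made for this sketch only.&lt;/p&gt;

```python
def factory_reset(partitions, reset_requested):
    """Return the names of the partitions that survive this boot.

    `partitions` maps a partition name to True if its (hypothetical)
    declaration marks it as a candidate for erasure on factory reset.
    """
    if not reset_requested:
        return sorted(partitions)
    return sorted(name for name, erase in partitions.items() if not erase)


if __name__ == "__main__":
    layout = {"esp": False, "usr": False, "root": True, "swap": True}
    print(factory_reset(layout, reset_requested=True))
```

&lt;p&gt;After the erase step the disk looks like the shipped image again, so the regular first-boot logic can simply run once more.&lt;/p&gt;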
&lt;p&gt;So now we have a nice setup, where everything is either signed or
encrypted securely. The system can adapt to the system it is booted on
automatically on first boot, and can easily be brought back into a
well defined state identical to the way it was shipped in.&lt;/p&gt;
&lt;h1&gt;Modularity&lt;/h1&gt;
&lt;p&gt;But of course, such a monolithic, immutable system is only useful for
very specific purposes. If &lt;code&gt;/usr/&lt;/code&gt; can't be written to – at least in
the traditional sense – one cannot just go and install a new software
package that one needs. So here two goals are superficially
conflicting: on one hand one wants modularity, i.e. the ability to
add components to the system, and on the other immutability, i.e. that
precisely this is prohibited.&lt;/p&gt;
&lt;p&gt;So let's see what I propose as a middle ground in my model. First,
what's the precise use case for such modularity? I see a couple of
different ones:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;For some cases it is necessary to extend the system itself at the
   lowest level, so that the components added in extend (or maybe even
   replace) the resources shipped in the base OS image, so that they live
   in the same namespace, and are subject to the same security
   restrictions and privileges. Exposure to the details of the base OS
   and its interface for this kind of modularity is at the maximum.&lt;/p&gt;
&lt;p&gt;Example: a module that adds a debugger or tracing tools into the
   system. Or maybe an optional hardware driver module.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In other cases, more isolation is preferable: instead of extending
   the system resources directly, additional services shall be added
   in that bring their own files, can live in their own namespace
   (but with "windows" into the host namespaces), however still are
   system components, and provide services to other programs, whether
   local or remote. Exposure to the details of the base OS for this
   kind of modularity is restricted: it mostly focuses on the
   ability to consume and provide IPC APIs from/to the
   system. Components of this type can still be highly privileged, but
   the level of integration is substantially smaller than for the type
   explained above.&lt;/p&gt;
&lt;p&gt;Example: a module that adds a specific VPN connection service to
   the OS.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Finally, there's the actual payload of the OS. This stuff is
   relatively isolated from the OS and definitely from each other. It
   mostly consumes OS APIs, and generally doesn't provide OS
   APIs. This kind of stuff runs with minimal privileges, and in its
   own namespace of concepts.&lt;/p&gt;
&lt;p&gt;Example: a desktop app, for reading your emails.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Of course, the lines between these three types of modules are blurry,
but I think distinguishing them does make sense, as I think different
mechanisms are appropriate for each. So here's what I'd propose in my
model to use for this.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;For the system extension case I think the
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-sysext.html"&gt;&lt;code&gt;systemd-sysext&lt;/code&gt;&lt;/a&gt;
   images are appropriate. This tool operates on
   system extension images that are very similar to the host's disk
   image: they also contain a &lt;code&gt;/usr/&lt;/code&gt; partition, protected by
   Verity. However, they just include additions to the host image:
   binaries that extend the host. When such a system extension image
   is activated, it is merged via an immutable &lt;code&gt;overlayfs&lt;/code&gt; mount into
   the host's &lt;code&gt;/usr/&lt;/code&gt; tree. Thus any file shipped in such a system
   extension will suddenly appear as if it was part of the host OS
   itself. For optional components that should be considered part of
   the OS more or less this is a very simple and powerful way to
   combine an immutable OS with an immutable extension. Note that most
   likely extensions for an OS matching this tool should be built at
   the same time within the same update cycle scheme as the host OS
   itself. After all, the files included in the extensions will have
   dependencies on files in the system OS image, and care must be
   taken that these dependencies remain in order.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For adding in additional somewhat isolated system services in my
   model, &lt;a href="https://systemd.io/PORTABLE_SERVICES"&gt;Portable Services&lt;/a&gt;
   are the proposed tool of choice. Portable services are in most ways
   just like regular system services; they could be included in the
   system OS image or an extension image. However, portable services
   use
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RootImage="&gt;&lt;code&gt;RootImage=&lt;/code&gt;&lt;/a&gt;
   to run off separate disk images, thus within their own
   namespace. Images set up this way have various ways to integrate
   into the host OS, as they are in most ways regular system services,
   which just happen to bring their own directory tree. Also, unlike
   regular system services, for them sandboxing is opt-out rather than
   opt-in. In my model, here too the disk images are Verity protected
   and thus immutable. Just like the host OS they are GPT disk images
   that come with a &lt;code&gt;/usr/&lt;/code&gt; partition and Verity data, along with
   signing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Finally, the actual payload of the OS, i.e. the apps. To be useful
   in real life here it is important to hook into existing ecosystems,
   so that a large set of apps are available. Given that on Linux
   Flatpak (or on servers OCI containers) are the established formats
   that pretty much won, they are probably the way to go. That said, I
   think both of these mechanisms have relatively weak properties, in
   particular when it comes to security, since
   immutability/measurements and similar are not provided. This means,
   unlike for system extensions and portable services a complete trust
   chain with attestation and per-app cryptographically protected data
   is much harder to implement sanely.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
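&lt;p&gt;The overlay merge described for system extensions in the first item above can be pictured with a small Python model. It treats each image as a mapping from path to origin and assumes that later extensions win on path conflicts; that ordering, like the function name, is an assumption of this sketch rather than a statement about &lt;code&gt;systemd-sysext&lt;/code&gt; internals.&lt;/p&gt;

```python
def merged_usr(host_files, extensions):
    """Compute the file view of /usr/ after stacking extensions on the host.

    Each argument maps a path to the name of the image that ships it.
    Later extensions sit higher in the stack and win on path conflicts.
    None of the inputs is modified, mirroring a read-only overlay.
    """
    view = dict(host_files)
    for ext in extensions:
        view.update(ext)
    return view


if __name__ == "__main__":
    host = {"bin/sh": "host", "lib/libc.so": "host"}
    debug_tools = {"bin/gdb": "debug-ext", "bin/strace": "debug-ext"}
    for path, origin in sorted(merged_usr(host, [debug_tools]).items()):
        print(path, "from", origin)
```

&lt;p&gt;The host image and the extension image both stay untouched; only the combined view changes when an extension is activated or deactivated.&lt;/p&gt;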
&lt;p&gt;What I'd like to underline here is that the main system OS image, as
well as the system extension images and the portable service images
are put together the same way: they are GPT disk images, with one
immutable file system and associated Verity data. The latter two
should also contain a PKCS#7 signature for the top-level Verity
hash. This uniformity has many benefits: you can use the same tools to
build and process these images, but most importantly: by using a
single way to validate them throughout the stack (i.e. Verity, in the
latter cases with PKCS#7 signatures), validation and measurement is
straightforward. In fact it's so obvious that we don't even have to
implement it in systemd: the kernel has direct support for this Verity
signature checking natively already (IMA).&lt;/p&gt;
&lt;p&gt;So, by composing a system at runtime from a host image, extension
images and portable service images we have a nicely modular system
where every single component is cryptographically validated on every
single IO operation, and every component is measured, in its entire
combination, directly in the kernel's IMA subsystem.&lt;/p&gt;
&lt;p&gt;(Of course, once you add the desktop apps or OCI containers on top,
then these properties are lost further down the chain. But well, a lot
is already won, if you can close the chain that far down.)&lt;/p&gt;
&lt;p&gt;Note that system extensions are not designed to replicate the fine
grained packaging logic of RPM/dpkg. Of course, &lt;code&gt;systemd-sysext&lt;/code&gt; is a
generic tool, so you can use it for whatever you want, but there's a
reason it does not bring support for a dependency language: the goal
here is not to replicate traditional Linux packaging (we have that
already, in RPM/dpkg, and I think they are actually OK for what they
do) but to provide delivery of larger, coarser sets of functionality,
in lockstep with the underlying OS' life-cycle and in particular with
no interdependencies, except on the underlying OS.&lt;/p&gt;
&lt;p&gt;Also note that depending on the use case it might make sense to also
use system extensions to modularize the &lt;code&gt;initrd&lt;/code&gt; step. This is
probably less relevant for a desktop OS, but for server systems it
might make sense to package up support for specific complex storage in
a &lt;code&gt;systemd-sysext&lt;/code&gt; system extension, which can be applied to the
initrd that is built into the unified kernel. (In fact, we have been
working on implementing signed yet modular initrd support to general
purpose Fedora this way.)&lt;/p&gt;
&lt;p&gt;Note that portable services are composable from system extensions too,
by the way. This makes them even more useful, as you can share a
common runtime between multiple portable services, or even use the host
image as common runtime for portable services. In this model a common
runtime image is shared between one or more system extensions, and
composed at runtime via an &lt;code&gt;overlayfs&lt;/code&gt; instance.&lt;/p&gt;
&lt;h1&gt;More Modularity: Secondary OS Installs&lt;/h1&gt;
&lt;p&gt;Having an immutable, cryptographically locked down host OS is great I
think, and if we have some moderate modularity on top, that's also
great. But oftentimes it's useful to be able to depart/compromise for
some specific use cases from that, i.e. provide a bridge for example to
allow workloads designed around RPM/dpkg package management to coexist
reasonably nicely with such an immutable host.&lt;/p&gt;
&lt;p&gt;For this purpose in my model I'd propose using &lt;code&gt;systemd-nspawn&lt;/code&gt;
containers. The containers are focused on OS containerization,
i.e. they allow you to run a full OS with init system and everything
as payload (unlike for example Docker containers which focus on a
single service, and where running a full OS in it is a mess).&lt;/p&gt;
&lt;p&gt;Running &lt;code&gt;systemd-nspawn&lt;/code&gt; containers for such secondary OS installs has
various nice properties. One of course is that &lt;code&gt;systemd-nspawn&lt;/code&gt;
supports the same level of cryptographic image validation that we rely
on for the host itself. Thus, to some level the whole OS trust chain
is reasonably recursive if desired: the firmware validates the OS, and the OS can
validate a secondary OS installed within it. In fact, we can run our
trusted OS recursively on itself and get similar security guarantees!
Besides these security aspects, &lt;code&gt;systemd-nspawn&lt;/code&gt; also has really nice
properties when it comes to integration with the host. For example the
&lt;code&gt;--bind-user=&lt;/code&gt; switch permits binding a host user record and their home directory
into a container as a simple one step operation. This makes it
extremely easy to have a single user and &lt;code&gt;$HOME&lt;/code&gt; but share it
concurrently with the host &lt;em&gt;and&lt;/em&gt; a zoo of secondary OSes in
&lt;code&gt;systemd-nspawn&lt;/code&gt; containers, which each could run different
distributions even.&lt;/p&gt;
&lt;h1&gt;Developer Mode&lt;/h1&gt;
&lt;p&gt;Superficially, an OS with an immutable &lt;code&gt;/usr/&lt;/code&gt; appears much less
&lt;em&gt;hackable&lt;/em&gt; than an OS where everything is writable. Moreover, an OS
where everything must be signed and cryptographically validated makes
it hard to insert your own code, given you are unlikely to possess
access to the signing keys.&lt;/p&gt;
&lt;p&gt;To address this issue other systems have supported a "developer" mode:
when entered the security guarantees are disabled, and the system can
be freely modified, without cryptographic validation. While that's a
great concept to have I doubt it's what most developers really want:
the cryptographic properties of the OS are great after all, it sucks
having to give them up once developer mode is activated.&lt;/p&gt;
&lt;p&gt;In my model I'd thus propose two different approaches to this
problem. First of all, I think there's value in allowing users to
additively extend/override the OS via local developer &lt;a href="https://0pointer.net/blog/testing-my-system-code-in-usr-without-modifying-usr.html"&gt;system
extensions&lt;/a&gt;. With
this scheme the underlying cryptographic validation would remain
intact, but – if this form of development mode is explicitly enabled –
the developer could add in more resources from local storage, that are
not tied to the OS builder's chain of trust, but a local one
(i.e. simply backed by encrypted storage of some form).&lt;/p&gt;
&lt;p&gt;The second approach is to make it easy to extend (or in fact replace)
the set of trusted validation keys, with local ones that are under the
control of the user, in order to make it easy to operate with kernel,
OS, extension, portable service or container images signed by the
local developer without involvement of the OS builder. This is
relatively easy to do for components further down the trust chain: the
elements further up the chain should optionally accept additional
certificates to validate against.&lt;/p&gt;
&lt;p&gt;(Note that systemd currently has no explicit support for a
"developer" mode like this. I think we should add that sooner or later
however.)&lt;/p&gt;
&lt;h1&gt;Democratizing Code Signing&lt;/h1&gt;
&lt;p&gt;Closely related to the question of developer mode is the question of
code signing. If you ask me, the status quo of UEFI SecureBoot code
signing in the major Linux distributions is pretty sad. The work to
get stuff signed is massive, but in effect it delivers very little in
return: because initrds are entirely unprotected, and reside on
partitions lacking any form of cryptographic integrity protection, any
attacker can trivially modify the boot process of any such
Linux system and freely collect the FDE passphrases entered. There's
little value in signing the boot loader and kernel in a complex
bureaucracy if it then happily loads entirely unprotected code that
processes the actually relevant security credentials: the FDE
keys.&lt;/p&gt;
&lt;p&gt;In my model, through use of unified kernels this important gap is
closed, hence UEFI SecureBoot code signing becomes an integral part of
the boot chain from firmware to the host OS. Unfortunately, code
signing and having something a user can locally hack are to some
level conflicting. However, I think we can improve the situation here,
and put more emphasis on enrolling developer keys in the trust chain
easily. Specifically, I see one relevant approach here: enrolling keys
directly in the firmware is something that we should make less of a
theoretical exercise and more something we can realistically
deploy. See &lt;a href="https://github.com/systemd/systemd/pull/20255#issuecomment-1098334694"&gt;this work in
progress&lt;/a&gt;
making this more automatic and eventually safe. Other approaches are
thinkable (including some that build on existing MokManager
infrastructure), but given the politics involved, are harder to
conclusively implement.&lt;/p&gt;
&lt;h1&gt;Running the OS itself in a container&lt;/h1&gt;
&lt;p&gt;What I explain above is put together with running on a bare metal
system in mind. However, one of the stated goals is to make the OS
adaptive enough to also run in a container environment (specifically:
&lt;code&gt;systemd-nspawn&lt;/code&gt;) nicely. Booting a disk image on bare metal or in a
VM generally means that the UEFI firmware validates and invokes the
boot loader, and the boot loader invokes the kernel which then
transitions into the final system. This is different for containers:
here the container manager immediately calls the init system, i.e. PID
1. Thus the validation logic must be different: cryptographic
validation must be done by the container manager. In my model this is
solved by shipping the OS image not only with a Verity data partition
(as is already necessary for the UEFI SecureBoot trust chain, see
above), but also with another partition, containing a PKCS#7 signature
of the root hash of said Verity partition. This of course is exactly
what I propose for both the system extension and portable service
image. Thus, in my model the images for all three uses are put
together the same way: an immutable &lt;code&gt;/usr/&lt;/code&gt; partition, accompanied by
a Verity partition and a PKCS#7 signature partition. The OS image
itself then has two ways "into" the trust chain: either through the
signed unified kernel in the ESP (which is used for bare metal and VM
boots) &lt;em&gt;or&lt;/em&gt; by using the PKCS#7 signature stored in the partition
(which is used for container/&lt;code&gt;systemd-nspawn&lt;/code&gt; boots).&lt;/p&gt;
&lt;h1&gt;Parameterizing Kernels&lt;/h1&gt;
&lt;p&gt;A fully immutable and signed OS has to establish trust in the user
data it makes use of before doing so. In the model I describe here,
for &lt;code&gt;/etc/&lt;/code&gt; and &lt;code&gt;/var/&lt;/code&gt; we do this via disk encryption of the root
file system (in combination with integrity checking). But the point
where the root file system is mounted comes relatively late in the
boot process, and thus cannot be used to parameterize the boot
itself. In many cases it's important to be able to parameterize the
boot process however.&lt;/p&gt;
&lt;p&gt;For example, for the implementation of the developer mode indicated
above it's useful to be able to pass this fact safely to the initrd,
in combination with other fields (e.g. hashed root password for
allowing in-initrd logins for debug purposes). After all, if the
initrd is pre-built by the vendor and signed as whole together with
the kernel it cannot be modified to carry such data directly (which is
in fact how parameterizing of the initrd to a large degree was traditionally
done).&lt;/p&gt;
&lt;p&gt;In my model this is achieved through &lt;a href="https://systemd.io/CREDENTIALS/"&gt;system
credentials&lt;/a&gt;, which allow passing
parameters to systems (and services for that matter) in an encrypted
and authenticated fashion, bound to the TPM2 chip. This means that we
can securely pass data into the initrd so that it can be authenticated
and decrypted only on the system it is intended for and with the
unified kernel image it was intended for.&lt;/p&gt;
&lt;h1&gt;Swap&lt;/h1&gt;
&lt;p&gt;In my model the OS would also carry a swap partition. For the simple
reason that only then
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-oomd.service.html"&gt;&lt;code&gt;systemd-oomd.service&lt;/code&gt;&lt;/a&gt;
can provide the best results. Also see &lt;a href="https://chrisdown.name/2018/01/02/in-defence-of-swap.html"&gt;In defence of swap: common
misconceptions&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Updating Images&lt;/h1&gt;
&lt;p&gt;We have a rough idea how the system shall be organized now, so let's next
focus on the deployment cycle: software needs regular update cycles,
and software that is not updated regularly is a security
problem. Thus, I am sure that any modern system must be automatically
updated, without this requiring avoidable user interaction.&lt;/p&gt;
&lt;p&gt;In my model, this is the job for
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-sysupdate.html"&gt;systemd-sysupdate&lt;/a&gt;. It's
a relatively simple A/B image updater: it operates either on
partitions, on regular files in a directory, or on subdirectories in a
directory. Each entry has a version (which is encoded in the GPT
partition label for partitions, and in the filename for regular files
and directories): whenever an update is initiated the oldest version
is erased, and the newest version is downloaded.&lt;/p&gt;
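&lt;p&gt;The A/B selection rule can be sketched like this in Python. The sketch assumes purely numeric dotted versions and a simple slot dictionary, and it prefers an empty slot before overwriting the oldest one; all names are hypothetical and the real tool handles many more cases (downloads, verification, protected versions).&lt;/p&gt;

```python
def version_tuple(version):
    """Parse a dotted numeric version such as "1.10" into (1, 10)."""
    return tuple(int(part) for part in version.split("."))


def plan_update(slots, available):
    """Decide which slot to overwrite with which version.

    `slots` maps a slot name to its installed version, or None if empty.
    Returns (slot, version), or None if the newest version is installed.
    """
    newest = max(available, key=version_tuple)
    if newest in slots.values():
        return None
    empty = sorted(name for name, ver in slots.items() if ver is None)
    if empty:
        return empty[0], newest
    oldest = min(slots, key=lambda name: version_tuple(slots[name]))
    return oldest, newest


if __name__ == "__main__":
    # First update: slot B was created empty on first boot.
    print(plan_update({"A": "1.0", "B": None}, ["1.0", "1.1"]))
    # Later update: both slots are populated, the oldest one is replaced.
    print(plan_update({"A": "1.0", "B": "1.1"}, ["1.0", "1.1", "1.10"]))
```

&lt;p&gt;Note that the currently running version is never the one being overwritten in this scheme, as long as the system booted the newest installed version.&lt;/p&gt;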
&lt;p&gt;With the setup described above a system update becomes a really simple
operation. On each update the &lt;code&gt;systemd-sysupdate&lt;/code&gt; tool downloads a
&lt;code&gt;/usr/&lt;/code&gt; file system partition, an accompanying Verity partition, a
PKCS#7 signature partition, and drops them into the host's partition
table (where they possibly replace the oldest version so far stored
there). Then it downloads a unified kernel image and drops it into
the EFI System Partition's &lt;code&gt;/EFI/Linux&lt;/code&gt; (as per Boot Loader
Specification; possibly erasing the oldest such file there). And that's
already the whole update process: four files are downloaded from the
server, unpacked and put in the most straightforward of ways into the
partition table or file system. Unlike in other OS designs there's no
mechanism required to explicitly switch to the newer version, the
aforementioned &lt;code&gt;systemd-boot&lt;/code&gt; logic will automatically pick the newest
kernel once it is dropped in.&lt;/p&gt;
&lt;p&gt;Above we talked a lot about modularity, and how to put systems
together as a combination of a host OS image, system extension images
for the initrd and the host, portable service images and
&lt;code&gt;systemd-nspawn&lt;/code&gt; container images. I already emphasized that these
image files are actually always the same: GPT disk images with
partition definitions that match the Discoverable Partition
Specification. This comes in very handy when thinking about updating: we
can use the exact same &lt;code&gt;systemd-sysupdate&lt;/code&gt; tool for updating these
other images as we use for the host image. The uniformity of the
on-disk format allows us to update them uniformly too.&lt;/p&gt;
&lt;h1&gt;Boot Counting + Assessment&lt;/h1&gt;
&lt;p&gt;Automatic OS updates do not come without risks: if they happen
automatically, and an update goes wrong, your system
might be automatically updated into a brick. This of course is less
than ideal. Hence it is essential to address this reasonably
automatically. In my model, there's systemd's &lt;a href="https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT"&gt;Automatic Boot
Assessment&lt;/a&gt; for
that. The mechanism is simple: whenever a new unified kernel image is
dropped into the system it will be stored with a small integer counter
value included in the filename. Whenever the unified kernel image is
selected for booting by &lt;code&gt;systemd-boot&lt;/code&gt;, the counter is decreased by one. Once
the system booted up successfully (which is determined by userspace)
the counter is removed from the file name (which indicates "this entry
is known to work"). If the counter ever hits zero, this indicates that
the boot loader tried to boot the image a couple of times, and each time failed, thus it is
apparently "bad". In this case &lt;code&gt;systemd-boot&lt;/code&gt; will not consider the
kernel anymore, and revert to the next older one (that doesn't have a
counter of zero).&lt;/p&gt;
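&lt;p&gt;To make the counter handling a bit more concrete, here is a minimal Python sketch of the renaming scheme. It assumes the &lt;code&gt;+LEFT-DONE&lt;/code&gt; filename suffix convention documented in Automatic Boot Assessment; the function names and the example file name are made up for illustration, and this is not the actual &lt;code&gt;systemd-boot&lt;/code&gt; code.&lt;/p&gt;

```python
import os
import re

# Matches "stem+LEFT.ext" or "stem+LEFT-DONE.ext" (assumed suffix convention).
COUNTER = re.compile(r"^(.*)\+(\d+)(?:-(\d+))?(\.[^.]+)$")

def parse(name):
    """Return (stem, tries_left, tries_done, ext); tries_left is None without a counter."""
    m = COUNTER.match(name)
    if m is None:
        stem, ext = os.path.splitext(name)
        return stem, None, 0, ext
    stem, left, done, ext = m.groups()
    return stem, int(left), int(done or 0), ext

def on_boot_attempt(name):
    """New file name after the boot loader selected this entry for booting."""
    stem, left, done, ext = parse(name)
    if left is None or left == 0:
        return name  # no counter (known good) or already out of tries
    return f"{stem}+{left - 1}-{done + 1}{ext}"

def on_boot_success(name):
    """New file name once userspace declared the boot successful."""
    stem, _left, _done, ext = parse(name)
    return stem + ext

def is_bad(name):
    """An entry whose tries-left counter reached zero is considered bad."""
    return parse(name)[1] == 0
```

&lt;p&gt;In this sketch a hypothetical &lt;code&gt;fooOS_0.8+3.efi&lt;/code&gt; becomes &lt;code&gt;fooOS_0.8+2-1.efi&lt;/code&gt; on the first boot attempt, and plain &lt;code&gt;fooOS_0.8.efi&lt;/code&gt; once the boot was declared good.&lt;/p&gt;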
&lt;p&gt;By sticking the boot counter into the filename of the unified kernel
we can directly attach this information to the kernel, and thus need
not concern ourselves with cleaning up secondary information about the
kernel when the kernel is removed. Updating with a tool like
&lt;code&gt;systemd-sysupdate&lt;/code&gt; remains a very simple operation hence: drop one
old file, add one new file.&lt;/p&gt;
&lt;h1&gt;Picking the Newest Version&lt;/h1&gt;
&lt;p&gt;I already mentioned that &lt;code&gt;systemd-boot&lt;/code&gt; automatically picks the newest
unified kernel image to boot, by looking at the version encoded in the
filename. This is done via a simple
&lt;a href="https://man7.org/linux/man-pages/man3/strverscmp.3.html"&gt;&lt;code&gt;strverscmp()&lt;/code&gt;&lt;/a&gt;
call (well, truth be told, it's a modified version of that call,
different from the one implemented in libc, because real-life package
managers use more complex rules for comparing versions these days, and
hence it made sense to do that here too). Having
multiple entries of some resource in a directory, and picking the
newest one automatically is a powerful concept, I think. It means
adding/removing new versions is extremely easy (as we discussed above,
in &lt;code&gt;systemd-sysupdate&lt;/code&gt; context), and allows stateless determination of
what to use.&lt;/p&gt;
&lt;p&gt;If &lt;code&gt;systemd-boot&lt;/code&gt; can do that, what about system extension images,
portable service images, or &lt;code&gt;systemd-nspawn&lt;/code&gt; container images that do
not actually use &lt;code&gt;systemd-boot&lt;/code&gt; as the entrypoint? All these tools
actually implement the very same logic, but on the partition level: if
multiple suitable &lt;code&gt;/usr/&lt;/code&gt; partitions exist, then the newest is determined
by comparing their GPT partition labels.&lt;/p&gt;
&lt;p&gt;This is in a way the counterpart to the &lt;code&gt;systemd-sysupdate&lt;/code&gt; update
logic described above: we always need a way to determine which
partition to actually use after the update took place, and this
becomes very easy each time: enumerate possible entries, pick the
newest as per the (modified) &lt;code&gt;strverscmp()&lt;/code&gt; result.&lt;/p&gt;
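&lt;p&gt;The "enumerate, then pick the newest" step could be sketched like this in Python. Note that the comparison below is a deliberately simplified stand-in (numeric runs compare as numbers, everything else as text) and not the modified &lt;code&gt;strverscmp()&lt;/code&gt; that systemd actually uses; the function names and the &lt;code&gt;fooOS_&lt;/code&gt; prefix are assumptions for illustration.&lt;/p&gt;

```python
import re

def version_key(version):
    """Split a version string into runs; digit runs compare numerically."""
    # Simplified ordering only: special characters that real version
    # comparison rules treat specially are not modelled here.
    key = []
    for run in re.findall(r"\d+|\D+", version):
        if run.isdecimal():
            key.append((1, int(run), ""))
        else:
            key.append((0, 0, run))
    return key

def pick_newest(labels, prefix):
    """Return the label with the highest version among those carrying the prefix."""
    candidates = [label for label in labels if label.startswith(prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda label: version_key(label[len(prefix):]))
```

&lt;p&gt;Because the decision only depends on the names found at that moment, no extra state about "the current version" has to be stored anywhere.&lt;/p&gt;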
&lt;h1&gt;Home Directory Management&lt;/h1&gt;
&lt;p&gt;In my model the device's users and their home directories are managed
by
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-homed.service.html"&gt;&lt;code&gt;systemd-homed&lt;/code&gt;&lt;/a&gt;. This
means they are relatively self-contained and can be migrated easily
between devices. The numeric UID assignment for each user is done at
the moment of login only, and the files in the home directory are
mapped as needed via a &lt;code&gt;uidmap&lt;/code&gt; mount. It also allows us to protect
the data of each user individually with a credential that belongs to
the user itself, i.e. instead of binding confidentiality of the user's
data to the system-wide full-disk-encryption each user gets their own
encrypted home directory where the user's authentication token
(password, FIDO2 token, PKCS#11 token, recovery key…) is used as
authentication and decryption key for the user's data. This brings
a major improvement for security as it means the user's data is
cryptographically inaccessible except when the user is actually logged
in.&lt;/p&gt;
&lt;p&gt;It also allows us to correct another major issue with traditional
Linux systems: the way data encryption works during system
suspend. Traditionally on Linux the disk encryption credentials
(e.g. LUKS passphrase) are kept in memory also when the system is
suspended. This is a bad choice for security, since many (most?) of us
probably never turn off our laptops but suspend them instead. But if
the decryption key is always present in unencrypted form during the
suspended time, then it could potentially be read from there by a
sufficiently equipped attacker.&lt;/p&gt;
&lt;p&gt;By encrypting the user's home directory with the user's authentication
token we can first safely "suspend" the home directory before going to
the system suspend state (i.e. flush out the cryptographic keys needed
to access it). This means any process currently accessing the home
directory will be frozen for the time of the suspend, but that's
expected anyway during a system suspend cycle. Why is this better than
the status quo ante? In this model the home directory's cryptographic
key material is erased during suspend, but it can be safely reacquired
on resume, from system code. If the system is only encrypted as a
whole however, then the system code itself couldn't reauthenticate the
user, because it would be frozen too. By separating home directory
encryption from the root file system encryption we can avoid this
problem.&lt;/p&gt;
&lt;h1&gt;Partition Setup&lt;/h1&gt;
&lt;p&gt;So we discussed the organization of the partitions of OS images multiple
times in the above, each time focusing on a specific aspect. Let's
now summarize what this should look like all together.&lt;/p&gt;
&lt;p&gt;In my model, the initial, shipped OS image should look roughly like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;(1) A UEFI System Partition, with &lt;code&gt;systemd-boot&lt;/code&gt; as boot loader and one unified kernel&lt;/li&gt;
&lt;li&gt;(2) A &lt;code&gt;/usr/&lt;/code&gt; partition (version "A"), with a label &lt;code&gt;fooOS_0.7&lt;/code&gt; (under the assumption we called our project &lt;code&gt;fooOS&lt;/code&gt; and the image version is &lt;code&gt;0.7&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;(3) A Verity partition for the &lt;code&gt;/usr/&lt;/code&gt; partition (version "A"), with the same label&lt;/li&gt;
&lt;li&gt;(4) A partition carrying the Verity root hash for the &lt;code&gt;/usr/&lt;/code&gt; partition (version "A"), along with a PKCS#7 signature of it, also with the same label&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On first boot this is augmented by &lt;code&gt;systemd-repart&lt;/code&gt; like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;(5) A second &lt;code&gt;/usr/&lt;/code&gt; partition (version "B"), initially with a label &lt;code&gt;_empty&lt;/code&gt; (which is the label &lt;code&gt;systemd-sysupdate&lt;/code&gt; uses to mark partitions that currently carry no valid payload)&lt;/li&gt;
&lt;li&gt;(6) A Verity partition for that (version "B"), similar to the above case, also labelled &lt;code&gt;_empty&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;(7) And ditto a Verity root hash partition with a PKCS#7 signature (version "B"), also labelled &lt;code&gt;_empty&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;(8) A root file system, encrypted and locked to the TPM2&lt;/li&gt;
&lt;li&gt;(9) A home file system, integrity protected via a key also in TPM2 (encryption is unnecessary, since &lt;code&gt;systemd-homed&lt;/code&gt; adds that on its own, and it's nice to avoid duplicate encryption)&lt;/li&gt;
&lt;li&gt;(10) A swap partition, encrypted and locked to the TPM2&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Then, on the first OS update the partitions 5, 6, 7 are filled with a
new version of the OS (let's say &lt;code&gt;0.8&lt;/code&gt;) and thus get their label
updated to &lt;code&gt;fooOS_0.8&lt;/code&gt;. After a boot, this version is active.&lt;/p&gt;
&lt;p&gt;On a subsequent update the three partitions &lt;code&gt;fooOS_0.7&lt;/code&gt; get wiped and
replaced by &lt;code&gt;fooOS_0.9&lt;/code&gt; and so on.&lt;/p&gt;
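&lt;p&gt;The label life cycle described above can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: each "slot" stands for one set of three partitions (&lt;code&gt;/usr/&lt;/code&gt;, Verity, signature) sharing a label, versions are purely numeric and dotted, and the helper names are hypothetical.&lt;/p&gt;

```python
EMPTY = "_empty"

def version_of(label):
    """Parse a label such as 'fooOS_0.7' into the tuple (0, 7)."""
    return tuple(int(part) for part in label.rpartition("_")[2].split("."))

def apply_update(slots, new_label):
    """Return a copy of the slot labels with new_label written into one slot."""
    slots = list(slots)
    if EMPTY in slots:
        target = slots.index(EMPTY)          # prefer a slot without payload
    else:
        oldest = min(slots, key=version_of)  # otherwise overwrite the oldest
        target = slots.index(oldest)
    slots[target] = new_label
    return slots

def active(slots):
    """The slot that would be used after a reboot: the newest non-empty one."""
    return max((s for s in slots if s != EMPTY), key=version_of)
```

&lt;p&gt;Starting from &lt;code&gt;["fooOS_0.7", "_empty"]&lt;/code&gt;, applying &lt;code&gt;fooOS_0.8&lt;/code&gt; fills the empty slot, and a later &lt;code&gt;fooOS_0.9&lt;/code&gt; replaces &lt;code&gt;fooOS_0.7&lt;/code&gt;.&lt;/p&gt;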
&lt;p&gt;On factory reset, the partitions 8, 9, 10 are deleted, so that
&lt;code&gt;systemd-repart&lt;/code&gt; recreates them, using a new set of cryptographic
keys.&lt;/p&gt;
&lt;p&gt;Here's a graphic that hopefully illustrates the partition table from
shipped image, through first boot, multiple update cycles and eventual
factory reset:&lt;/p&gt;
&lt;p&gt;&lt;a href="images/partitions.svg"&gt;&lt;img alt="Partitions Overview" src="images/partitions.svg" width="640"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;Trust Chain&lt;/h1&gt;
&lt;p&gt;So let's summarize the intended chain of trust (for bare metal/VM
boots) that ensures every piece of code in this model is signed
and validated, and any system secret is locked to TPM2.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;First, firmware (or possibly shim) authenticates &lt;code&gt;systemd-boot&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Once &lt;code&gt;systemd-boot&lt;/code&gt; picks a unified kernel image to boot, it is
   also authenticated by firmware/shim.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The unified kernel image contains an initrd, which is the first
   userspace component that runs. It finds any system extensions passed
   into the initrd, and sets them up through Verity. The kernel will
   validate the Verity root hash signature of these system extension
   images against its usual keyring.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The initrd also finds credentials passed in, then securely unlocks
   (which means: decrypts + authenticates) them with a secret from the
   TPM2 chip, locked to the kernel image itself.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The kernel image also contains a kernel command line which contains
   a &lt;code&gt;usrhash=&lt;/code&gt; option that pins the root hash of the &lt;code&gt;/usr/&lt;/code&gt; partition
   to use.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The initrd then unlocks the encrypted root file system, with a
   secret bound to the TPM2 chip.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The system then transitions into the main system, i.e. the
   combination of the Verity protected &lt;code&gt;/usr/&lt;/code&gt; and the encrypted root
   file system. It then activates two more encrypted (and/or
   integrity protected) volumes for &lt;code&gt;/home/&lt;/code&gt; and swap, also with a
   secret tied to the TPM2 chip.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Here's an attempt to illustrate the above graphically:&lt;/p&gt;
&lt;p&gt;&lt;a href="images/trustchain.svg"&gt;&lt;img alt="Trust Chain" src="images/trustchain.svg" width="640"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This is the trust chain of the basic OS. Validation of system
extension images, portable service images, &lt;code&gt;systemd-nspawn&lt;/code&gt; container
images always takes place the same way: the kernel validates these
Verity images along with their PKCS#7 signatures against the kernel's
keyring.&lt;/p&gt;
&lt;h1&gt;File System Choice&lt;/h1&gt;
&lt;p&gt;In the above I left the choice of file systems unspecified. For the
immutable &lt;code&gt;/usr/&lt;/code&gt; partitions &lt;code&gt;squashfs&lt;/code&gt; might be a good candidate, but
any other that works nicely in a read-only fashion and generates
reproducible results is a good choice, too. The home directories as managed
by &lt;code&gt;systemd-homed&lt;/code&gt; should certainly use &lt;code&gt;btrfs&lt;/code&gt;, because it's the only
general purpose file system supporting online grow and shrink, which
&lt;code&gt;systemd-homed&lt;/code&gt; can take advantage of to manage storage.&lt;/p&gt;
&lt;p&gt;For the root file system &lt;code&gt;btrfs&lt;/code&gt; is likely also the best idea. That's
because we intend to use LUKS/&lt;code&gt;dm-crypt&lt;/code&gt; underneath, which by default
only provides confidentiality, not authenticity of the data (unless
combined with &lt;code&gt;dm-integrity&lt;/code&gt;). Since &lt;code&gt;btrfs&lt;/code&gt; (unlike xfs/ext4) does
full data checksumming it's probably the best choice here, since it
means we don't have to use &lt;code&gt;dm-integrity&lt;/code&gt; (which comes at a higher
performance cost).&lt;/p&gt;
&lt;h1&gt;OS Installation vs. OS Instantiation&lt;/h1&gt;
&lt;p&gt;In the discussion above a lot of focus was put on setting up the OS
and completing the partition layout and such on first boot. This means
installing the OS becomes as simple as &lt;code&gt;dd&lt;/code&gt;-ing (i.e. "streaming") the
shipped disk image into the final HDD medium. Simple, isn't it?&lt;/p&gt;
&lt;p&gt;Of course, such a scheme is just &lt;em&gt;too&lt;/em&gt; simple for many setups in real
life. Whenever multi-boot is required (i.e. co-installing an OS
implementing this model with another unrelated one), &lt;code&gt;dd&lt;/code&gt;-ing a disk
image onto the HDD is going to overwrite user data that was supposed
to be kept around.&lt;/p&gt;
&lt;p&gt;In order to cover for this case, in my model, we'd use
&lt;code&gt;systemd-repart&lt;/code&gt; (again!) to allow streaming the source disk image
into the target HDD in a smarter, additive way. The tool after all is
purely additive: it will add in partitions or grow them if they are
missing or too small. &lt;code&gt;systemd-repart&lt;/code&gt; already has all the necessary
provisions to not only create a partition on the target disk, but also
copy blocks from a raw installer disk. An install operation would then
become a two step process: one invocation of &lt;code&gt;systemd-repart&lt;/code&gt; that
adds in the &lt;code&gt;/usr/&lt;/code&gt;, its Verity and the signature partition to the
target medium, populated with a copy of the same partition of the
installer medium. And one invocation of &lt;code&gt;bootctl&lt;/code&gt; that installs the
&lt;code&gt;systemd-boot&lt;/code&gt; boot loader in the ESP. (Well, there's one thing
missing here: the unified OS kernel also needs to be dropped into the
ESP. For now, this can be done with a simple &lt;code&gt;cp&lt;/code&gt; call. In the long
run, this should probably be something &lt;code&gt;bootctl&lt;/code&gt; can do as well, if
told so.)&lt;/p&gt;
&lt;p&gt;So, with this we have a simple scheme to cover all bases: we
can either just &lt;code&gt;dd&lt;/code&gt; an image to disk, or we can stream an image onto
an existing HDD, adding a couple of new partitions and files to the
ESP.&lt;/p&gt;
&lt;p&gt;Of course, in reality things are more complex than that even: there's
a good chance that the existing ESP is simply too small to carry
multiple unified kernels. In my model, the way to address this is by
shipping two slightly different &lt;code&gt;systemd-repart&lt;/code&gt; partition definition
file sets: the &lt;em&gt;ideal&lt;/em&gt; case when the ESP is large enough, and a
&lt;em&gt;fallback&lt;/em&gt; case, where it isn't and where we then add in an additional
XBOOTLDR partition (as per the Discoverable Partitions
Specification). In that mode the ESP carries the boot loader, but the
unified kernels are stored in the XBOOTLDR partition. This scenario is
not quite as simple as the XBOOTLDR-less scenario described first, but
is equally well supported in the various tools. Note that
&lt;code&gt;systemd-repart&lt;/code&gt; can be told size constraints on the partitions it
shall create or augment, thus to implement this scheme it's enough to
invoke the tool with the fallback partition scheme if invocation with
the ideal scheme fails.&lt;/p&gt;
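&lt;p&gt;A hedged sketch of that "try ideal, then fall back" logic in Python follows. The &lt;code&gt;--definitions=&lt;/code&gt; and &lt;code&gt;--dry-run=&lt;/code&gt; switches are taken from the &lt;code&gt;systemd-repart&lt;/code&gt; man page (please check the version you have); the function names and the two definition directories are hypothetical, and the command runner is injectable so the logic can be tried without touching a real disk.&lt;/p&gt;

```python
import subprocess

def repart(device, definitions, run=subprocess.run):
    """Invoke systemd-repart once; return True if it exited successfully."""
    cmd = ["systemd-repart", "--dry-run=no", f"--definitions={definitions}", device]
    return run(cmd).returncode == 0

def install_partitions(device, ideal_dir, fallback_dir, run=subprocess.run):
    """Try the ideal layout first, then the fallback layout with XBOOTLDR."""
    for name, definitions in (("ideal", ideal_dir), ("fallback", fallback_dir)):
        if repart(device, definitions, run=run):
            return name
    raise RuntimeError("neither partition layout could be applied to " + device)
```

&lt;p&gt;The return value tells the caller which layout was applied, which matters later because it decides whether the unified kernels go into the ESP or into XBOOTLDR.&lt;/p&gt;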
&lt;p&gt;Either way: regardless of how the partitions, the boot loader and the
unified kernels ended up on the system's hard disk, on first boot the
code paths are the same again: &lt;code&gt;systemd-repart&lt;/code&gt; will be called to
augment the partition table with the root file system, and properly
encrypt it, as was already discussed earlier here. This means: all
cryptographic key material used for disk encryption is generated on
first boot only, the installer phase does not encrypt anything.&lt;/p&gt;
&lt;h1&gt;Live Systems vs. Installer Systems vs. Installed Systems&lt;/h1&gt;
&lt;p&gt;Traditionally on Linux three types of systems were common: "installed"
systems, i.e. that are stored on the main storage of the device and
are the primary place people spend their time in; "installer" systems
which are used to install them and whose job is to copy and set up the
packages that make up the installed system; and "live" systems, which
were a middle ground: a system that behaves like an installed system
in most ways, but lives on removable media.&lt;/p&gt;
&lt;p&gt;In my model I'd like to remove the distinction between these three
concepts as much as possible: each of these three images should carry
the exact same &lt;code&gt;/usr/&lt;/code&gt; file system, and should be suitable to be
replicated the same way. Once installed the resulting image can also
act as an installer for another system, and so on, creating a certain
"viral" effect: if you have one image or installation it's
automatically something you can replicate 1:1 with a simple
&lt;code&gt;systemd-repart&lt;/code&gt; invocation.&lt;/p&gt;
&lt;h1&gt;Building Images According to this Model&lt;/h1&gt;
&lt;p&gt;The above explains what the image should look like and how its first
boot and update cycle will modify it. But this leaves one question
unanswered: how to actually build the initial image for OS instances
according to this model?&lt;/p&gt;
&lt;p&gt;Note that there's nothing too special about the images following this
model: they are ultimately just GPT disk images with Linux file
systems, following the Discoverable Partition Specification. This
means you can use any set of tools of your choice that can put
together GPT disk images for compliant images.&lt;/p&gt;
&lt;p&gt;I personally would use &lt;a href="https://github.com/systemd/mkosi"&gt;&lt;code&gt;mkosi&lt;/code&gt;&lt;/a&gt; for
this purpose though. It's designed to generate compliant images, and
has a rich toolset for SecureBoot and signed/Verity file systems
already in place.&lt;/p&gt;
&lt;p&gt;What is key here is that this model doesn't depart from RPM and dpkg,
instead it builds on top of that: in this model they are excellent for
putting together images on the build host, but deployment onto the
runtime host does not involve individual packages.&lt;/p&gt;
&lt;p&gt;I think one cannot overestimate the value traditional distributions
bring, regarding security, integration and general polishing. The
concepts I describe above are inherited from this, but depart from the
idea that distribution packages are a runtime concept and make it a
build-time concept instead.&lt;/p&gt;
&lt;p&gt;Note that the above is pretty much independent from the underlying
distribution.&lt;/p&gt;
&lt;h1&gt;Final Words&lt;/h1&gt;
&lt;p&gt;I have no illusions, general purpose distributions are not going to
adopt this model as their default any time soon, and it's not even my
goal that they do that. The above is &lt;em&gt;my&lt;/em&gt; &lt;em&gt;personal&lt;/em&gt; vision, and I
don't expect people to buy into it 100%, and that's fine. However,
what I am interested in is finding the overlaps, i.e. work with people
who buy 50% into this vision, and share the components.&lt;/p&gt;
&lt;p&gt;My goals here thus are to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Get distributions to move to a model where images like this can be
   built from the distribution easily. Specifically this means that
   distributions make their OS hermetic in &lt;code&gt;/usr/&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Find the overlaps, share components with other projects to revisit
   how distributions are put together. This is already happening, see
   &lt;code&gt;systemd-tmpfiles&lt;/code&gt; and &lt;code&gt;systemd-sysusers&lt;/code&gt; support in various
   distributions, but I think there's more to share.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Make people interested in building actual real-world images based
   on general purpose distributions adhering to the model described
   above. I'd love a "GnomeBook" image with full trust properties,
   that is built from &lt;em&gt;true&lt;/em&gt; Linux distros, such as Fedora or
   ArchLinux.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;FAQ&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;What about &lt;code&gt;ostree&lt;/code&gt;? Doesn't &lt;code&gt;ostree&lt;/code&gt; already deliver what this blog story describes?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ostree&lt;/code&gt; is fine technology, but in respect to security and
robustness properties it's not too interesting I think, because
unlike image-based approaches it cannot really deliver
integrity/robustness guarantees over the whole tree easily. To be
able to trust an &lt;code&gt;ostree&lt;/code&gt; setup you have to establish trust in the
underlying file system first, and the complexity of the file
system makes that challenging. To provide an effective
offline-secure trust chain through the whole depth of the stack it
is essential to cryptographically validate every single I/O
operation. In an image-based model this is trivially easy, but in
the &lt;code&gt;ostree&lt;/code&gt; model it's not possible with current file system
technology, and even if this is added in one way or another in the
future (though I am not aware of anyone doing on-access file-based
integrity that spans a whole hierarchy of files that was
compatible with &lt;code&gt;ostree&lt;/code&gt;'s hardlink farm model) I think validation
is still at too high a level, since Linux file system developers
made very clear their implementations are not robust to rogue
images. (There's &lt;a href="https://github.com/ostreedev/ostree-rs-ext/issues/288"&gt;this stuff
planned&lt;/a&gt;,
but doing structural authentication ahead of time instead of on
access makes the idea too weak, and I'd expect too slow, in my
eyes.)&lt;/p&gt;
&lt;p&gt;With my design I want to deliver similar security guarantees as
ChromeOS does, but &lt;code&gt;ostree&lt;/code&gt; is much weaker there, and I see no
prospect of this changing. In a way &lt;code&gt;ostree&lt;/code&gt;'s integrity checks
are similar to RPM's and enforced on download rather than on
access. In the model I suggest above, it's always on access, and
thus safe towards offline attacks (i.e. evil maid attacks). In
today's world, I think offline security is absolutely necessary
though.&lt;/p&gt;
&lt;p&gt;That said, &lt;code&gt;ostree&lt;/code&gt; does have some benefits over the model
described above: it naturally shares file system inodes if many of
the modules/images involved share the same data. It's thus more
space efficient on disk (and thus also in RAM/cache to some
degree) by default. In my model it would be up to the image
builders to minimize shipping overly redundant disk images, by
making good use of suitably composable system extensions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;What about configuration management?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;At first glance immutable systems and configuration management
don't go that well together. However, do note, that in the model
I propose above the root file system with all its contents,
including &lt;code&gt;/etc/&lt;/code&gt; and &lt;code&gt;/var/&lt;/code&gt; is actually writable and can be
modified like on any other typical Linux distribution. The only
exception is &lt;code&gt;/usr/&lt;/code&gt; where the immutable OS is hermetic. That
means configuration management tools should work just fine in this
model – up to the point where they are used to install additional
RPM/dpkg packages, because that's something not allowed in the
model above: packages need to be installed at image build time and
thus on the image build host, not the runtime host.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;What about non-UEFI and non-TPM2 systems?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The above is designed around the feature set of contemporary PCs,
and this means UEFI and TPM2 being available (simply because the
PC is pretty much defined by the Windows platform, and current
versions of Windows require both).&lt;/p&gt;
&lt;p&gt;I think it's important to make the best of the features of today's
PC hardware, and then find suitable fallbacks on more limited
hardware. Specifically this means: if there's desire to implement
something like this on non-UEFI or non-TPM2 hardware we should
look for suitable fallbacks for the individual functionality, but
generally try to add glue to the old systems so that conceptually
they behave more like the new systems instead of the other way
round. Or in other words: most of the above is not strictly tied
to UEFI or TPM2, and for many cases &lt;em&gt;already&lt;/em&gt; there are reasonable
fallbacks in place for more limited systems. Of course, without
TPM2 many of the security guarantees will be weakened.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;How would you name an OS built that way?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I think a desktop OS built this way if it has the GNOME desktop
should of course be called &lt;em&gt;GnomeBook&lt;/em&gt;, to mimic the &lt;em&gt;ChromeBook&lt;/em&gt;
name. ;-)&lt;/p&gt;
&lt;p&gt;But in general, I'd call hermetic, adaptive, immutable OSes like this "&lt;em&gt;particles&lt;/em&gt;".&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;How can you help?&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Help making Distributions Hermetic in &lt;code&gt;/usr/&lt;/code&gt;!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;One of the core ideas of the approach described above is to make
the OS &lt;em&gt;hermetic&lt;/em&gt; in &lt;code&gt;/usr/&lt;/code&gt;, i.e. make it carry a comprehensive
description of what needs to be set up outside of it when
instantiated. Specifically, this means that system users that are
needed are declared in &lt;code&gt;systemd-sysusers&lt;/code&gt; snippets, and skeleton
files and directories are created via &lt;code&gt;systemd-tmpfiles&lt;/code&gt;. Moreover
additional partitions should be declared via &lt;code&gt;systemd-repart&lt;/code&gt;
drop-ins.&lt;/p&gt;
&lt;p&gt;At this point some distributions (such as Fedora) are (probably
more by accident than on purpose) already mostly hermetic in
&lt;code&gt;/usr/&lt;/code&gt;, at least for the most basic parts of the OS. However,
this is not complete: many daemons require specific
resources to be set up in &lt;code&gt;/var/&lt;/code&gt; or &lt;code&gt;/etc/&lt;/code&gt; before they can work, and
the relevant packages do not carry &lt;code&gt;systemd-tmpfiles&lt;/code&gt; descriptions
that add them if missing. So there are two ways you could help
here: politically, it would be highly relevant to convince
distributions that an OS that is hermetic in &lt;code&gt;/usr/&lt;/code&gt; is highly
desirable and it's a worthy goal for packagers to get there. More
specifically, it would be desirable if RPM/dpkg packages would
ship with enough &lt;code&gt;systemd-tmpfiles&lt;/code&gt; information so that
configuration files the packages strictly need for operation are
symlinked (or copied) from &lt;code&gt;/usr/share/factory/&lt;/code&gt; if they are
missing (even better of course would be if packages from their
upstream sources on would just work with an empty &lt;code&gt;/etc/&lt;/code&gt; and
&lt;code&gt;/var/&lt;/code&gt;, and create themselves what they need and default to good
defaults in absence of configuration files).&lt;/p&gt;
&lt;p&gt;Note that distributions that adopted &lt;code&gt;systemd-sysusers&lt;/code&gt;,
&lt;code&gt;systemd-tmpfiles&lt;/code&gt; and the &lt;code&gt;/usr/&lt;/code&gt; merge are already quite close
to providing an OS that is hermetic in &lt;code&gt;/usr/&lt;/code&gt;. These were the
big, the major advancements: making the image fully hermetic
should be less controversial – at least that's my guess.&lt;/p&gt;
&lt;p&gt;Also note that making the OS hermetic in &lt;code&gt;/usr/&lt;/code&gt; is not just useful in
scenarios like the above. It also means that stuff &lt;a href="https://0pointer.net/blog/testing-my-system-code-in-usr-without-modifying-usr.html"&gt;like
this&lt;/a&gt;
and &lt;a href="https://0pointer.net/blog/running-an-container-off-the-host-usr.html"&gt;like
this&lt;/a&gt;
can work well.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Fill in the gaps!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;I already mentioned a couple of missing bits and pieces in the
implementation of the overall vision. In the &lt;code&gt;systemd&lt;/code&gt; project
we'd be delighted to review/merge any PRs that fill in the voids.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Build your own OS like this!&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Of course, while we built all these building blocks and they have
been adopted to various levels and various purposes in the various
distributions, no one has so far built an OS that puts things together
just like that. It would be excellent if we had communities that
work on building images like what I propose above, i.e. if you
want to work on making a secure GnomeBook as I suggest above a
reality that would be more than welcome.&lt;/p&gt;
&lt;p&gt;What could this look like specifically? Pick an existing
distribution, write a set of &lt;code&gt;mkosi&lt;/code&gt; descriptions plus some
additional drop-in files, and then build this on some build
infrastructure. While doing so, report the gaps, and help us
address them.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Further Documentation of Used Components and Concepts&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html"&gt;&lt;code&gt;systemd-tmpfiles&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-sysusers.html"&gt;&lt;code&gt;systemd-sysusers&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-boot.html"&gt;&lt;code&gt;systemd-boot&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-stub.html"&gt;&lt;code&gt;systemd-stub&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-sysext.html"&gt;&lt;code&gt;systemd-sysext&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-portabled.service.html"&gt;&lt;code&gt;systemd-portabled&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://systemd.io/PORTABLE_SERVICES"&gt;Portable Services Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-repart.html"&gt;&lt;code&gt;systemd-repart&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html"&gt;&lt;code&gt;systemd-nspawn&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-sysupdate.html"&gt;&lt;code&gt;systemd-sysupdate&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-creds.html"&gt;&lt;code&gt;systemd-creds&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://systemd.io/CREDENTIALS"&gt;System and Service Credentials&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-homed.service.html"&gt;&lt;code&gt;systemd-homed&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://systemd.io/AUTOMATIC_BOOT_ASSESSMENT"&gt;Automatic Boot Assessment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://systemd.io/BOOT_LOADER_SPECIFICATION"&gt;Boot Loader Specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://systemd.io/DISCOVERABLE_PARTITIONS"&gt;Discoverable Partitions Specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://systemd.io/BUILDING_IMAGES"&gt;Safely Building Images&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Earlier Blog Stories Related to this Topic&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="https://0pointer.net/blog/authenticated-boot-and-disk-encryption-on-linux.html"&gt;The Strange State of Authenticated Boot and Disk Encryption on Generic Linux Distributions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://0pointer.net/blog/the-wondrous-world-of-discoverable-gpt-disk-images.html"&gt;The Wondrous World of Discoverable GPT Disk Images&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://0pointer.net/blog/unlocking-luks2-volumes-with-tpm2-fido2-pkcs11-security-hardware-on-systemd-248.html"&gt;Unlocking LUKS2 volumes with TPM2, FIDO2, PKCS#11 Security Hardware on systemd 248&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://0pointer.net/blog/walkthrough-for-portable-services.html"&gt;Portable Services with systemd v239&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://0pointer.net/blog/mkosi-a-tool-for-generating-os-images.html"&gt;mkosi — A Tool for Generating OS Images&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And that's all for now.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 03 May 2022 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2022-05-03:/blog/fitting-everything-together.html</guid><category>projects</category></item><item><title>Testing my System Code in /usr/ Without Modifying /usr/</title><link>https://0pointer.net/blog/testing-my-system-code-in-usr-without-modifying-usr.html</link><description>&lt;p&gt;&lt;a href="https://0pointer.net/blog/running-an-container-off-the-host-usr.html"&gt;I recently
blogged&lt;/a&gt;
about how to run a volatile &lt;code&gt;systemd-nspawn&lt;/code&gt; container from your
host's &lt;code&gt;/usr/&lt;/code&gt; tree, for quickly testing stuff in your host
environment, sharing your home directory, but all that without making a
single modification to your host, and on an isolated node.&lt;/p&gt;
&lt;p&gt;The one-liner discussed in that blog story is great for testing during
system software development. Let's have a look at another &lt;code&gt;systemd&lt;/code&gt;
tool that I regularly use to test things during &lt;code&gt;systemd&lt;/code&gt; development,
in a relatively safe environment, but still taking full benefit of my
host's setup.&lt;/p&gt;
&lt;p&gt;For a while now, systemd has been shipping with a simple component
called
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-sysext.html"&gt;&lt;code&gt;systemd-sysext&lt;/code&gt;&lt;/a&gt;. Its
primary use case goes something like this: on one hand operating systems with
immutable &lt;code&gt;/usr/&lt;/code&gt; hierarchies are fantastic for security, robustness,
updating and simplicity, but on the other hand not being able to
quickly add stuff to &lt;code&gt;/usr/&lt;/code&gt; is just annoying.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;systemd-sysext&lt;/code&gt; is supposed to bridge this contradiction: when
invoked it will merge a bunch of "system extension" images into
&lt;code&gt;/usr/&lt;/code&gt; (and &lt;code&gt;/opt/&lt;/code&gt; as a matter of fact) through the use of read-only
&lt;code&gt;overlayfs&lt;/code&gt;, making all files shipped in the image instantly and
&lt;em&gt;atomically&lt;/em&gt; appear in &lt;code&gt;/usr/&lt;/code&gt; during runtime — as if they always had
been there. Now, let's say you are building your locked down OS, with
an immutable &lt;code&gt;/usr/&lt;/code&gt; tree, and it comes without the ability to log in,
without debugging tools, without anything you want and need when
trying to debug and fix something in the system. With &lt;code&gt;systemd-sysext&lt;/code&gt;
you could use a system extension image that contains all this, drop it
into the system, and activate it with &lt;code&gt;systemd-sysext&lt;/code&gt; so that it
genuinely extends the host system.&lt;/p&gt;
&lt;p&gt;(There are many other use cases for this tool. For example, you could
build systems that way that at their base use a generic image, but by
installing one or more system extensions get extended with
additional more specific functionality, or drivers, or similar. The
tool is generic, use it for whatever you want, but for now let's not
get lost in listing all the possibilities.)&lt;/p&gt;
&lt;p&gt;What's particularly nice about the tool is that it supports
automatically discovered &lt;code&gt;dm-verity&lt;/code&gt; images, with signatures and
everything. So you can even do this in a fully authenticated,
measured, safe way. But I am digressing…&lt;/p&gt;
&lt;p&gt;Now that we (hopefully) have a rough understanding of what
&lt;code&gt;systemd-sysext&lt;/code&gt; is and does, let's discuss how specifically we can
use this in the context of system software development, to safely use
and test bleeding edge development code (built freshly from your
project's build tree) in your host OS without having to risk that the
host OS is corrupted or becomes unbootable by stuff that didn't quite
yet work the way it was envisioned:&lt;/p&gt;
&lt;p&gt;The images &lt;code&gt;systemd-sysext&lt;/code&gt; merges into &lt;code&gt;/usr/&lt;/code&gt; can be of two kinds:
disk images with a file system/verity/signature, or simple, plain
directory trees. To make these images available to the tool, they can
be placed or symlinked into &lt;code&gt;/usr/lib/extensions/&lt;/code&gt;,
&lt;code&gt;/var/lib/extensions/&lt;/code&gt;, &lt;code&gt;/run/extensions/&lt;/code&gt; (and a bunch of
others). So if we now install our freshly built development software
into a subdirectory of one of those paths, then that's entirely sufficient to
make it a valid system extension image in the sense of
&lt;code&gt;systemd-sysext&lt;/code&gt;, and it can thus be merged into &lt;code&gt;/usr/&lt;/code&gt; to try it out.&lt;/p&gt;
&lt;p&gt;To be more specific: when I develop &lt;code&gt;systemd&lt;/code&gt; itself, here's what I do
regularly, to see how my new development version would behave on my
host system. As preparation I checked out the systemd development git
tree first of course, hacked around in it a bit, then built it with
meson/ninja. And now I want to test what I just built:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;sudo DESTDIR=/run/extensions/systemd-test meson install -C build --quiet --no-rebuild &amp;amp;&amp;amp;
        sudo systemd-sysext refresh --force
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Explanation: first, we'll install my current build tree as a system
extension into &lt;code&gt;/run/extensions/systemd-test/&lt;/code&gt;. And then we apply it
to the host via the &lt;code&gt;systemd-sysext refresh&lt;/code&gt; command. This command
will search for all installed system extension images in the
aforementioned directories, then unmount (i.e. "unmerge") any
previously merged dirs from &lt;code&gt;/usr/&lt;/code&gt; and then freshly mount
(i.e. "merge") the new set of system extensions on top of &lt;code&gt;/usr/&lt;/code&gt;. And
just like that, I have installed my development tree of &lt;code&gt;systemd&lt;/code&gt; into
the host OS, and all that without actually modifying/replacing even a
single file on the host at all. Nothing here actually hit the disk!&lt;/p&gt;
&lt;p&gt;Note that all this works on any system really, it is not necessary
that the underlying OS even is designed with immutability in
mind. Just because the tool was developed with immutable systems in
mind it doesn't mean you couldn't use it on traditional systems where
&lt;code&gt;/usr/&lt;/code&gt; is mutable as well. In fact, my development box actually runs
regular Fedora, i.e. is RPM-based and thus has a mutable &lt;code&gt;/usr/&lt;/code&gt;
tree. As long as system extensions are applied the whole of &lt;code&gt;/usr/&lt;/code&gt;
becomes read-only though.&lt;/p&gt;
&lt;p&gt;Once I am done testing, when I want to revert to how things were without the image installed, it is sufficient to call:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;sudo systemd-sysext unmerge
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And there you go, all files my development tree generated are gone
again, and the host system is as it was before (and &lt;code&gt;/usr/&lt;/code&gt; mutable
again, in case one is on a traditional Linux distribution).&lt;/p&gt;
&lt;p&gt;Also note that a reboot (regardless of whether it is a &lt;em&gt;clean&lt;/em&gt; one or an &lt;em&gt;abnormal&lt;/em&gt;
shutdown) will undo the whole thing automatically, since we installed
our build tree into &lt;code&gt;/run/&lt;/code&gt; after all, i.e. a &lt;code&gt;tmpfs&lt;/code&gt; instance that is
flushed on boot. And given that the &lt;code&gt;overlayfs&lt;/code&gt; merge is a runtime
thing, too, the whole operation was executed without any
persistence. Isn't that great?&lt;/p&gt;
&lt;p&gt;(You might wonder why I specified &lt;code&gt;--force&lt;/code&gt; on the &lt;code&gt;systemd-sysext
refresh&lt;/code&gt; line earlier. That's because &lt;code&gt;systemd-sysext&lt;/code&gt; actually does
some minimal version compatibility checks when applying system
extension images. For that it will compare the host's
&lt;code&gt;/etc/os-release&lt;/code&gt; file with the image's
&lt;code&gt;/usr/lib/extension-release.d/extension-release.&amp;lt;name&amp;gt;&lt;/code&gt;, and refuse
operation if the image is not actually built for the host OS
version. Here we don't want to bother with dropping that file in
there, we &lt;em&gt;know&lt;/em&gt; already that the extension image is compatible with
the host, as we just built it on it. &lt;code&gt;--force&lt;/code&gt; allows us to skip the
version check.)&lt;/p&gt;
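&lt;p&gt;For reference, here is a rough sketch of what such a file could look
like if one did want to ship it instead of using &lt;code&gt;--force&lt;/code&gt;. The values
below are assumptions for a Fedora 36 host; they would have to match the
&lt;code&gt;ID=&lt;/code&gt; and &lt;code&gt;VERSION_ID=&lt;/code&gt; fields of the host's own &lt;code&gt;os-release&lt;/code&gt; file,
and the suffix of the file name has to match the name of the extension
directory:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# /run/extensions/systemd-test/usr/lib/extension-release.d/extension-release.systemd-test
# (illustrative values only, copy the real ones from the host's os-release)
ID=fedora
VERSION_ID=36
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
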
&lt;p&gt;You might wonder: what about the combination of the idea from the
previous blog story (regarding running containers off the host
&lt;code&gt;/usr/&lt;/code&gt; tree) with system extensions? Glad you asked. Right now we
have no support for this, but it's high on our TODO list (patches
welcome, of course!). That is, a new switch for &lt;code&gt;systemd-nspawn&lt;/code&gt; called
&lt;code&gt;--system-extension=&lt;/code&gt; that would allow merging one or more such
extensions into the container tree booted would be stellar. With that,
with a single command I could run a container off my host OS but with
a development version of systemd dropped in, all without any
persistence. How awesome would that be?&lt;/p&gt;
&lt;p&gt;(Oh, and in case you wonder, all of this only works with distributions
that have completed the &lt;code&gt;/usr/&lt;/code&gt; merge. On legacy distributions that
didn't do that and still place parts of &lt;code&gt;/usr/&lt;/code&gt; all over the hierarchy
the above won't work, since merging &lt;code&gt;/usr/&lt;/code&gt; trees via &lt;code&gt;overlayfs&lt;/code&gt; is
pretty pointless if the OS is not hermetic in &lt;code&gt;/usr/&lt;/code&gt;.)&lt;/p&gt;
&lt;p&gt;And that's all for now. Happy hacking!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 27 Apr 2022 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2022-04-27:/blog/testing-my-system-code-in-usr-without-modifying-usr.html</guid><category>projects</category></item><item><title>Running a Container off the Host /usr/</title><link>https://0pointer.net/blog/running-an-container-off-the-host-usr.html</link><description>&lt;p&gt;Apparently, in some parts of &lt;a href="https://lwn.net/Articles/890219/"&gt;this
world&lt;/a&gt;, the &lt;code&gt;/usr/&lt;/code&gt;-merge
transition is still ongoing. Let's take the opportunity to have a look
at one specific way to take benefit of the &lt;code&gt;/usr/&lt;/code&gt;-merge (and
associated work) IRL.&lt;/p&gt;
&lt;p&gt;I develop system-level software as you might know. Oftentimes I want
to run my development code on my PC but be reasonably sure it cannot
destroy or otherwise negatively affect my host system. Now I could set
up a container tree for that, and boot into that. But often I am too
lazy for that, I don't want to bother with a slow package manager
setting up a new OS tree for me. So here's what I often do instead —
and this only works because of the &lt;code&gt;/usr/&lt;/code&gt;-merge.&lt;/p&gt;
&lt;p&gt;I run a command like the following (without any preparatory work):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;systemd-nspawn&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;--directory&lt;span class="o"&gt;=&lt;/span&gt;/&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;--volatile&lt;span class="o"&gt;=&lt;/span&gt;yes&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;-U&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;--set-credential&lt;span class="o"&gt;=&lt;/span&gt;passwd.hashed-password.root:&lt;span class="k"&gt;$(&lt;/span&gt;mkpasswd&lt;span class="w"&gt; &lt;/span&gt;mysecret&lt;span class="k"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;--set-credential&lt;span class="o"&gt;=&lt;/span&gt;firstboot.locale:C.UTF-8&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;--bind-user&lt;span class="o"&gt;=&lt;/span&gt;lennart&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;-b
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And then I very quickly get a login prompt on a container that runs
the exact same software as my host — but is also isolated from the
host. I do &lt;em&gt;not&lt;/em&gt; need to prepare any separate OS tree or anything
else. It &lt;em&gt;just&lt;/em&gt; works. And my host user &lt;code&gt;lennart&lt;/code&gt; is &lt;em&gt;just&lt;/em&gt; there,
ready for me to log into.&lt;/p&gt;
&lt;p&gt;So here's what these
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html"&gt;&lt;code&gt;systemd-nspawn&lt;/code&gt;&lt;/a&gt;
options specifically do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;--directory=/&lt;/code&gt; tells &lt;code&gt;systemd-nspawn&lt;/code&gt; to run off the host OS'
    file hierarchy. That smells like danger of course, running two
    OS instances off the same directory hierarchy. But don't be
    scared, because:&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;--volatile=yes&lt;/code&gt; enables volatile mode. Specifically this means
    what we configured with &lt;code&gt;--directory=/&lt;/code&gt; as root file system is
    slightly rearranged. Instead of mounting that tree as it is, we'll
    mount a &lt;code&gt;tmpfs&lt;/code&gt; instance as actual root file system, and then
    mount the &lt;code&gt;/usr/&lt;/code&gt; subdirectory of the specified hierarchy into the
    &lt;code&gt;/usr/&lt;/code&gt; subdirectory of the container file hierarchy in read-only
    fashion – and &lt;em&gt;only&lt;/em&gt; that directory. So now we have a container
    directory tree that is basically empty, but imports all host OS
    binaries and libraries into its &lt;code&gt;/usr/&lt;/code&gt; tree. All software
    installed on the host is also available in the container with no
    manual work. This mechanism only works because on &lt;code&gt;/usr/&lt;/code&gt;-merged
    OSes vendor resources are monopolized at a single place:
    &lt;code&gt;/usr/&lt;/code&gt;. It's sufficient to share that one directory with the
    container to get a second instance of the host OS running. Note
    that this means &lt;code&gt;/etc/&lt;/code&gt; and &lt;code&gt;/var/&lt;/code&gt; will be entirely empty
    initially when this second system boots up. Thankfully, forward
    looking distributions (such as Fedora) have adopted
    &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html"&gt;&lt;code&gt;systemd-tmpfiles&lt;/code&gt;&lt;/a&gt;
    and
    &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-sysusers.html"&gt;&lt;code&gt;systemd-sysusers&lt;/code&gt;&lt;/a&gt;
    quite pervasively, so that system users and files/directories
    required for operation are created automatically should they be
    missing. Thus, even though at boot the mentioned directories are
    initially empty, once the system is booted up they are
    sufficiently populated for things to just work.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;-U&lt;/code&gt; means we'll enable user namespacing, in fully automatic
    mode. This does three things: it picks a free host UID range
    dynamically for the container, then sets up user namespacing for
    the container processes, mapping that host UID range to UIDs 0…65534 in
    the container. It then sets up a similar UID mapped mount on the
    &lt;code&gt;/usr/&lt;/code&gt; tree of the container. Net effect: file ownerships as set
    on the host OS tree appear as if they belong to the very &lt;em&gt;same&lt;/em&gt; users
    inside of the container environment, except that we use user
    namespacing for everything, and thus the users are &lt;em&gt;actually&lt;/em&gt;
    neatly isolated from the host.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;--set-credential=passwd.hashed-password.root:$(mkpasswd
    mysecret)&lt;/code&gt; passes a &lt;em&gt;credential&lt;/em&gt; to the container. Credentials are
    bits of data that you can pass to systemd services and whole
    systems. They are actually awesome concepts (e.g. they support
    TPM2 authentication/encryption that just works!) but I am not going
    to go into details around that, given it's off-topic in this
    specific scenario. Here we just take benefit of the fact that
    &lt;code&gt;systemd-sysusers&lt;/code&gt; looks for a credential called
    &lt;code&gt;passwd.hashed-password.root&lt;/code&gt; to initialize the root password of
    the system from. We set it to &lt;code&gt;mysecret&lt;/code&gt;. This means once the
    system is booted up we can log in as &lt;code&gt;root&lt;/code&gt; with the supplied
    password. Yay. (Remember, &lt;code&gt;/etc/&lt;/code&gt; is initially empty on this
    container, and thus also carries no &lt;code&gt;/etc/passwd&lt;/code&gt; or
    &lt;code&gt;/etc/shadow&lt;/code&gt;, and thus has no root user record, and thus no root
    password.)&lt;/p&gt;
&lt;p&gt;&lt;a href="https://linux.die.net/man/1/mkpasswd"&gt;&lt;code&gt;mkpasswd&lt;/code&gt;&lt;/a&gt; is a tool that
converts a plain text password into a UNIX hashed password, which
is what this specific credential expects.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Similarly, &lt;code&gt;--set-credential=firstboot.locale:C.UTF-8&lt;/code&gt; tells the
    &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-firstboot.html"&gt;&lt;code&gt;systemd-firstboot&lt;/code&gt;&lt;/a&gt;
    service in the container to initialize &lt;code&gt;/etc/locale.conf&lt;/code&gt; with
    this locale.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;--bind-user=lennart&lt;/code&gt; binds the host user &lt;code&gt;lennart&lt;/code&gt; into the
    container, also as user &lt;code&gt;lennart&lt;/code&gt;. This does two things: it mounts
    the host user's home directory into the container. It also copies
    a minimal user record of the specified user into the container
    that
    &lt;a href="https://www.freedesktop.org/software/systemd/man/nss-systemd.html"&gt;&lt;code&gt;nss-systemd&lt;/code&gt;&lt;/a&gt;
    then picks up and includes in the regular user database. This
    means, once the container is booted up I can log in as &lt;code&gt;lennart&lt;/code&gt;
    with my regular password, and once I logged in I will see my
    regular host home directory, and can make changes to it. Yippieh!
    (This does a couple more things, such as UID mapping,
    but let's not get lost in too many details.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, if I run this, I will very quickly get a login prompt, where I can
log in as my regular user. I have full access to my host home
directory, but otherwise everything is nicely isolated from the host,
and changes outside of the home directory are either prohibited or are
volatile, i.e. go to a &lt;code&gt;tmpfs&lt;/code&gt; instance whose lifetime is bound to the
container's lifetime: when I shut down the container I just started,
then any changes outside of my user's home directory are lost.&lt;/p&gt;
&lt;p&gt;Note that while here I use &lt;code&gt;--volatile=yes&lt;/code&gt; in combination with
&lt;code&gt;--directory=/&lt;/code&gt; you can actually use it on any OS hierarchy, i.e. just
about any directory that contains OS binaries.&lt;/p&gt;
&lt;p&gt;Similarly, the &lt;code&gt;--bind-user=&lt;/code&gt; stuff works with any OS hierarchy too (but
do note that only systemd 249 and newer will pick up the user records
passed to the container that way, i.e. this requires at least v249
both on the host and in the container to work).&lt;/p&gt;
&lt;p&gt;Or in short: the possibilities are endless!&lt;/p&gt;
&lt;h2&gt;Requirements&lt;/h2&gt;
&lt;p&gt;For this all to work, you need:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;A recent kernel (5.15 should suffice, as it brings UID mapped
   mounts for the most common file systems, so that &lt;code&gt;-U&lt;/code&gt; and
   &lt;code&gt;--bind-user=&lt;/code&gt; can work well.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A recent systemd (249 should suffice, which brings &lt;code&gt;--bind-user=&lt;/code&gt;,
   and a &lt;code&gt;-U&lt;/code&gt; switch backed by UID mapped mounts).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A distribution that adopted the &lt;code&gt;/usr/&lt;/code&gt;-merge, &lt;code&gt;systemd-tmpfiles&lt;/code&gt;
   and &lt;code&gt;systemd-sysusers&lt;/code&gt; so that the directory hierarchy and user
   databases are automatically populated when empty at boot.  (Fedora
   35 should suffice.)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Limitations&lt;/h2&gt;
&lt;p&gt;While a lot of today's software actually works well out of the box on
systems that come up with an unpopulated &lt;code&gt;/etc/&lt;/code&gt; and &lt;code&gt;/var/&lt;/code&gt;, and
either falls back to reasonable built-in defaults, or deploys
&lt;code&gt;systemd-tmpfiles&lt;/code&gt; to create what is missing, things aren't perfect:
some software typically installed on desktop OSes will fail to start
when invoked in such a container, and be visible as ugly failed
services, but it won't stop me from logging in and using the system
for what I want to use it. It would be excellent to get that fixed,
though. This can either be fixed in the relevant software upstream
(i.e. if opening your configuration file fails with &lt;code&gt;ENOENT&lt;/code&gt;, then
just default to reasonable defaults), or in the distribution packaging
(i.e. add a
&lt;a href="https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html"&gt;&lt;code&gt;tmpfiles.d/&lt;/code&gt;&lt;/a&gt;
file that copies or symlinks in skeleton configuration from
&lt;code&gt;/usr/share/factory/etc/&lt;/code&gt; via the &lt;code&gt;C&lt;/code&gt; or &lt;code&gt;L&lt;/code&gt; line types).&lt;/p&gt;
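&lt;p&gt;As a rough illustration, such a packaging drop-in could look like the
following. The file name and the &lt;code&gt;myapp&lt;/code&gt; paths are made up for this
example; only the &lt;code&gt;C&lt;/code&gt; and &lt;code&gt;L&lt;/code&gt; line types themselves are taken from
&lt;code&gt;tmpfiles.d&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# /usr/lib/tmpfiles.d/myapp.conf (hypothetical package "myapp")
# Type Path             Mode User Group Age Argument
# Copy the skeleton configuration file into /etc/ if it does not exist yet
C /etc/myapp.conf - - - - /usr/share/factory/etc/myapp.conf
# Symlink a skeleton directory instead of copying it
L /etc/myapp.d - - - - /usr/share/factory/etc/myapp.d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
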
&lt;p&gt;And then there's certain software dealing with hardware management and
similar that simply cannot reasonably work in a container (as device APIs on
Linux are generally not virtualized for containers). It
would be excellent if software like that would be updated to carry
&lt;code&gt;ConditionVirtualization=!container&lt;/code&gt; or
&lt;code&gt;ConditionPathIsReadWrite=/sys&lt;/code&gt; conditionalization in their unit
files, so that it is automatically – cleanly – skipped when executed
in such a container environment.&lt;/p&gt;
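&lt;p&gt;A minimal sketch of what that could look like in a unit file follows.
The daemon and its description are hypothetical; only the condition
setting is the point here:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# Unit file of a hypothetical hardware management daemon
[Unit]
Description=Example Hardware Management Daemon
# Skip this unit cleanly (instead of failing) when running in a container
ConditionVirtualization=!container
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
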
&lt;p&gt;And that's all for now.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 06 Apr 2022 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2022-04-06:/blog/running-an-container-off-the-host-usr.html</guid><category>projects</category></item><item><title>Authenticated Boot and Disk Encryption on Linux</title><link>https://0pointer.net/blog/authenticated-boot-and-disk-encryption-on-linux.html</link><description>&lt;h1&gt;The Strange State of Authenticated Boot and Disk Encryption on Generic Linux Distributions&lt;/h1&gt;
&lt;p&gt;&lt;em&gt;TL;DR: Linux has been supporting Full Disk Encryption (FDE) and
technologies such as UEFI SecureBoot and TPMs for a long
time. However, the way they are set up by most distributions is not as
secure as it should be, and in some ways quite frankly weird. In
fact, right now, your data is probably more secure if stored on
current ChromeOS, Android, Windows or MacOS devices, than it is on
typical Linux distributions.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Generic Linux distributions (i.e. Debian, Fedora, Ubuntu, …) adopted
Full Disk Encryption (FDE) more than 15 years ago, with the
LUKS/cryptsetup infrastructure. It was a big step forward to a more
secure environment. Almost ten years ago the big distributions started
adding UEFI SecureBoot to their boot process. Support for Trusted
Platform Modules (TPMs) has been added to the distributions a long
time ago as well — but even though many PCs/laptops these days have
TPM chips on-board they are generally not used in the default setup of
generic Linux distributions.&lt;/p&gt;
&lt;p&gt;How these technologies currently fit together on generic Linux
distributions doesn't really make too much sense to me — and falls
short of what they could actually deliver. In this story I'd like to
have a closer look at why I think that, and what I propose to do about
it.&lt;/p&gt;
&lt;h2&gt;The Basic Technologies&lt;/h2&gt;
&lt;p&gt;Let's have a closer look at what these technologies actually deliver:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;LUKS/&lt;code&gt;dm-crypt&lt;/code&gt;/&lt;code&gt;cryptsetup&lt;/code&gt; provide disk encryption, and optionally
   data authentication. Disk encryption means that reading the data in
   clear-text form is only possible if you possess a secret of some
   form, usually a password/passphrase. Data authentication means that
   no one can make changes to the data on disk unless they possess a
   secret of some form. Most distributions only enable the former
   though — the latter is a more recent addition to LUKS/cryptsetup,
   and is not used by default on most distributions (though it
   probably should be). Closely related to LUKS/&lt;code&gt;dm-crypt&lt;/code&gt; is
   &lt;code&gt;dm-verity&lt;/code&gt; (which can authenticate immutable volumes) and
   &lt;code&gt;dm-integrity&lt;/code&gt; (which can authenticate writable volumes, among
   other things).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;UEFI SecureBoot provides mechanisms for authenticating boot loaders
   and other pre-OS binaries before they are invoked. If those boot
   loaders then authenticate the next step of booting in a similar
   fashion there's a chain of trust which can ensure that only code
   that has some level of trust associated with it will run on the
   system. Authentication of boot loaders is done via cryptographic
   signatures: the OS/boot loader vendors cryptographically sign their
   boot loader binaries. The cryptographic certificates that may be
   used to validate these signatures are then signed by Microsoft, and
   since Microsoft's certificates are basically built into all of
   today's PCs and laptops this will provide some basic trust chain:
   if you want to modify the boot loader of a system you must have
   access to the private key used to sign the code (or to the private
   keys further up the certificate chain).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;TPMs do many things. For this text we'll focus on one facet: they can
   be used to protect secrets (for example for use in disk encryption,
   see above), that are released only if the code that booted the host
   can be authenticated in some form. This works roughly like this:
   every component that is used during the boot process (i.e. code,
   certificates, configuration, …) is hashed with a cryptographic hash
   function before it is used. The resulting hash is written to some
   small volatile memory the TPM maintains that is write-only (the so
   called Platform Configuration Registers, "PCRs"): each step of the
   boot process will write hashes of the resources needed by the next
   part of the boot process into these PCRs. The PCRs cannot be
   written freely: the hashes written are combined with what is
   already stored in the PCRs — also through hashing and the result of
   that then replaces the previous value. Effectively this means: only
   if every component involved in the boot matches expectations the
   hash values exposed in the TPM PCRs match the expected values
   too. And if you then use those values to unlock the secrets you
   want to protect you can guarantee that the key is only released to
   the OS if the expected OS and configuration is booted. The process
   of hashing the components of the boot process and writing that to
   the TPM PCRs is called "measuring". What's also important to
   mention is that the secrets are not only protected by these PCR
   values but encrypted with a "seed key" that is generated on the TPM
   chip itself, and cannot leave the TPM (at least so goes the
   theory). The idea is that you cannot read out a TPM's seed key, and
   thus you cannot duplicate the chip: unless you possess the
   original, physical chip you cannot retrieve the secret it might be
   able to unlock for you. Finally, TPMs can enforce a limit on unlock
   attempts per time ("anti-hammering"): this makes it hard to brute
   force things: if you can only execute a certain number of unlock
   attempts within some specific time then brute forcing will be
   prohibitively slow.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
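&lt;p&gt;The PCR update rule described in the TPM item above is commonly
written as follows. This is simplified notation rather than literal TPM
commands: &lt;code&gt;H&lt;/code&gt; stands for the hash function of the PCR bank in use
(for example SHA-256), &lt;code&gt;||&lt;/code&gt; for concatenation, and &lt;code&gt;n&lt;/code&gt; for the index
of the PCR being extended:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;PCR[n] := H( PCR[n] || H(component) )
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Since every new value depends on the previous one, the final PCR value
can only be reproduced by measuring the same components in the same
order.&lt;/p&gt;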
&lt;h2&gt;How Linux Distributions use these Technologies&lt;/h2&gt;
&lt;p&gt;As mentioned already, Linux distributions adopted the first two
of these technologies widely, the third one not so much.&lt;/p&gt;
&lt;p&gt;So typically, here's how the boot process of Linux distributions works
these days:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The UEFI firmware invokes a piece of code called "shim" (which is
   stored in the EFI System Partition — the "ESP" — of your system),
   that more or less is just a list of certificates compiled into code
   form. The shim is signed with the aforementioned Microsoft key,
   that is built into all PCs/laptops. This list of certificates then
   can be used to validate the next step of the boot process. The shim
   is measured by the firmware into the TPM. (Well, the shim can do a
   bit more than what I describe here, but this is outside of the
   focus of this article.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The shim then invokes a boot loader (often Grub) that is signed by
   a private key owned by the distribution vendor. The boot loader is
   stored in the ESP as well, plus some other places (i.e. possibly a
   separate boot partition). The corresponding certificate is included
   in the list of certificates built into the shim. The boot loader
   components are also measured into the TPM.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The boot loader then invokes the kernel and passes it an initial
   RAM disk image (initrd), which contains initial userspace code. The
   kernel itself is signed by the distribution vendor too. It's also
   validated via the shim. The initrd is not validated, though
   (!). The kernel is measured into the TPM, the initrd sometimes too.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The kernel unpacks the initrd image, and invokes what is contained
   in it. Typically, the initrd then asks the user for a password for
   the encrypted root file system. The initrd then uses that to set up
   the encrypted volume. No code authentication or TPM measurements
   take place.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The initrd then transitions into the root file system. No code
   authentication or TPM measurements take place.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When the OS itself is up the user is prompted for their user name,
   and their password. If correct, this will unlock the user account:
   the system is now ready to use. At this point no code
   authentication, no TPM measurements take place. Moreover, the
   user's password is not used to unlock any data, it's used only to
   allow or deny the login attempt — the user's data has already been
   decrypted a long time ago, by the initrd, as mentioned above.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;What you'll notice here of course is that code validation happens for
the shim, the boot loader and the kernel, but not for the initrd or
the main OS code anymore. TPM measurements might go one step further:
the initrd is measured sometimes too, if you are lucky. Moreover, you
might notice that the disk encryption password and the user password
are inquired by code that is not validated, and is thus not safe from
external manipulation. You might also notice that even though TPM
measurements of boot loader/OS components are done nothing actually
ever makes use of the resulting PCRs in the typical setup.&lt;/p&gt;
&lt;h2&gt;Attack Scenarios&lt;/h2&gt;
&lt;p&gt;Of course, before determining whether the setup described above makes
sense or not, one should have an idea what one actually intends to
protect against.&lt;/p&gt;
&lt;p&gt;The most basic attack scenario to focus on is probably that you want
to be reasonably sure that if someone steals your laptop that contains
all your data then this data remains confidential. The model described
above probably delivers that to some degree: the full disk encryption
when used with a reasonably strong password should make it hard for
the laptop thief to access the data. The data is as secure as the
password used is strong. The attacker might attempt to brute force the
password, thus if the password is not chosen carefully the attacker
might be successful.&lt;/p&gt;
&lt;p&gt;Two more interesting attack scenarios go something like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Instead of stealing your laptop the attacker takes the harddisk
   from your laptop while you aren't watching (e.g. while you went for
   a walk and left it at home or in your hotel room), makes a copy of
   it, and then puts it back. You'll never notice they did that. The
   attacker then analyzes the data in their lab, maybe trying to brute
   force the password. In this scenario you won't even know that your
   data is at risk, because for you nothing changed — unlike in the
   basic scenario above. If the attacker manages to break your
   password they have full access to the data included on it,
   i.e. everything you have stored on it so far, but not necessarily
   what you are going to store on it later. This scenario is worse
   than the basic one mentioned above, for the simple fact that you
   won't know that you might be attacked. (This scenario could be
   extended further: maybe the attacker has a chance to watch you type
   in your password or so, effectively lowering the password
   strength.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Instead of stealing your laptop the attacker takes the harddisk
   from your laptop while you aren't watching, inserts backdoor code
   on it, and puts it back. In this scenario you won't know your data
   is at risk, because physically everything is as before. What's
   really bad though is that the attacker gets access to anything you
   do on your laptop, both the data already on it, and whatever you
   will do in the future.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I think in particular this backdoor attack scenario is something we
should be concerned about. We know for a fact that attacks like that
happen all the time (Pegasus, industry espionage, …), hence we should
make them hard.&lt;/p&gt;
&lt;h2&gt;Are we Safe?&lt;/h2&gt;
&lt;p&gt;So, does the scheme so far implemented by generic Linux distributions
protect us against the latter two scenarios? Unfortunately not at
all. Because distributions set up disk encryption the way they do, and
only bind it to a user password, an attacker can easily duplicate the
disk, and then attempt to brute force your password. What's worse:
since code authentication ends at the kernel — and the initrd is not
authenticated anymore —, backdooring is trivially easy: an attacker
can change the initrd any way they want, without having to fight any
kind of protections. And given that FDE unlocking is implemented in
the initrd, and it's the initrd that asks for the encryption password
things are just too easy: an attacker could trivially insert
some code that picks up the FDE password as you type it in and send it
wherever they want. And not just that: since once they are in they are
in, they can do anything they like for the rest of the system's
lifecycle, with full privileges — including installing backdoors for
versions of the OS or kernel that are installed on the device in the
future, so that their backdoor remains open for as long as they like.&lt;/p&gt;
&lt;p&gt;That is sad of course. It's particularly sad given that the other
popular OSes all address this much better. ChromeOS, Android, Windows
and MacOS all have way better built-in protections against attacks
like this. And it's why one can certainly claim that your data is
probably better protected right now if you store it on those OSes than
it is on generic Linux distributions.&lt;/p&gt;
&lt;p&gt;(Yeah, I know that there are some niche distros which do this better,
and some hackers hack their own. But I care about general purpose
distros here, i.e. the big ones, that most people base their work on.)&lt;/p&gt;
&lt;p&gt;Note that there are more problems with the current setup. For example,
it's really weird that during boot the user is queried for an FDE
password which actually protects their data, and then once the system
is up they are queried again – now asking for a username, and another
password. And the weird thing is that this second authentication that
appears to be user-focused doesn't really protect the user's data
anymore — at that moment the data is already unlocked and
accessible. The username/password query is supposed to be useful in
multi-user scenarios of course, but how does that make any sense,
given that these multiple users would all have to know a disk
encryption password that unlocks the whole thing during the FDE step,
and thus they have access to every user's data anyway if they make an
offline copy of the harddisk?&lt;/p&gt;
&lt;h2&gt;Can we do better?&lt;/h2&gt;
&lt;p&gt;Of course we can, and that is what this story is actually supposed to
be about.&lt;/p&gt;
&lt;p&gt;Let's first figure out what the minimal issues we should fix are (at
least in my humble opinion):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The initrd must be authenticated before being booted into. (And
   measured unconditionally.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The OS binary resources (i.e. &lt;code&gt;/usr/&lt;/code&gt;) must be authenticated before
   being booted into. (But don't need to be encrypted, since everyone
   has the same anyway, there's nothing to hide here.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The OS configuration and state (i.e. &lt;code&gt;/etc/&lt;/code&gt; and &lt;code&gt;/var/&lt;/code&gt;) must be
   encrypted, and authenticated before they are used. The encryption
   key should be bound to the TPM device; i.e. system data should be
   locked to a security concept belonging to the system, not the user.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The user's home directory (i.e. &lt;code&gt;/home/lennart/&lt;/code&gt; and similar) must
   be encrypted and authenticated. The unlocking key should be bound
   to a user password or user security token (FIDO2 or PKCS#11 token);
   i.e. user data should be locked to a security concept belonging to
   the user, not the system.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Or to summarize this differently:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Every single component of the boot
   process and OS needs to be authenticated, i.e. all of shim (done),
   boot loader (done), kernel (done), initrd (missing so far), OS binary
   resources (missing so far), OS configuration and state (missing so
   far), the user's home (missing so far).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Encryption is necessary for the OS configuration and state (bound
   to TPM), and for the user's home directory (bound to a user
   password or user security token).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;In Detail&lt;/h2&gt;
&lt;p&gt;Let's see how we can achieve the above in more detail.&lt;/p&gt;
&lt;h3&gt;How to Authenticate the initrd&lt;/h3&gt;
&lt;p&gt;At the moment initrds are generated on the installed host via scripts
(dracut and similar) that try to figure out a minimal set of binaries
and configuration data to build an initrd that contains just enough to
be able to find and set up the root file system. What is included in
the initrd hence depends highly on the individual installation and its
configuration. Pretty likely no two initrds generated that way will be
fully identical due to this. This model clearly has benefits: the
initrds generated this way are very small and minimal, and support
exactly what is necessary for the system to boot, and not less or
more. It comes with serious drawbacks too though: the generation
process is fragile and sometimes more akin to black magic than
following clear rules: the generator script natively has to understand
a myriad of storage stacks to determine what needs to be included and
what not. It also means that authenticating the image is hard: given
that each individual host gets a different specialized initrd, it
means we cannot just sign the initrd with the vendor key like we sign
the kernel. If we want to keep this design we'd have to figure out
some other mechanism (e.g. a per-host signature key – that is
generated locally; or by authenticating it with a message
authentication code bound to the TPM). While these approaches are
certainly thinkable, I am not convinced they actually are a good idea
though: locally and dynamically generated per-host initrds are
something we probably should move away from.&lt;/p&gt;
&lt;p&gt;If we move away from locally generated initrds, things become a lot
simpler. If the distribution vendor generates the initrds on their
build systems then it can be attached to the kernel image itself, and
thus be signed and measured along with the kernel image, without any
further work. This simplicity is simply lovely. Besides robustness and
reproducibility this gives us an easy route to authenticated initrds.&lt;/p&gt;
&lt;p&gt;But of course, nothing is really that simple: working with
vendor-generated initrds means that we can't adjust them anymore to
the specifics of the individual host: if we pre-build the initrds and
include them in the kernel image in immutable fashion then it becomes
harder to support complex, more exotic storage or to parameterize it
with local network server information, credentials, passwords, and so
on. Now, for my simple laptop use-case these things don't matter,
there's no need to extend/parameterize things, laptops and their
setups are not that wildly different. But what to do about the cases
where we want both: extensibility to cover for less common storage
subsystems (iscsi, LVM, multipath, drivers for exotic hardware…) and
parameterization?&lt;/p&gt;
&lt;p&gt;Here's a proposal how to achieve that: let's build a basic initrd into
the kernel as suggested, but then do two things to make this scheme
both extensible and parameterizable, without compromising security.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Let's define a way how the basic initrd can be extended with
   additional files, which are stored in separate "extension
   images". The basic initrd should be able to discover these extension
   images, authenticate them and then activate them, thus extending
   the initrd with additional resources on-the-fly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Let's define a way how we can safely pass additional parameters to
   the kernel/initrd (and actually the rest of the OS, too) in an
   authenticated (and possibly encrypted) fashion. Parameters in this
   context can be anything specific to the local installation,
   i.e. server information, security credentials, certificates, SSH
   server keys, or even just the root password that shall be able to
   unlock the root account in the initrd …&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In such a scheme we should be able to deliver everything we are
looking for:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;We'll have a full trust chain for the code: the boot loader will
   authenticate and measure the kernel and basic initrd. The initrd
   extension images will then be authenticated by the basic initrd
   image.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We'll have authentication for all the parameters passed to the
   initrd.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This so far sounds very unspecific? Let's make it more specific by
looking closer at the components I'd suggest to be used for this
logic:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;systemd&lt;/code&gt; suite has for a few months contained a subsystem
   implementing &lt;em&gt;system&lt;/em&gt; &lt;em&gt;extensions&lt;/em&gt; (v248). System extensions are
   ultimately just disk images (for example a squashfs file system in
   a GPT envelope) that can &lt;em&gt;extend&lt;/em&gt; an underlying OS tree. Extending
   in this regard means they simply add additional files and
   directories into the OS tree, i.e. below &lt;code&gt;/usr/&lt;/code&gt;. For a longer
   explanation see
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-sysext.html"&gt;systemd-sysext(8)&lt;/a&gt;. When
   a system extension is activated it is simply mounted and then
   merged into the main &lt;code&gt;/usr/&lt;/code&gt; tree via a read-only overlayfs
   mount. Now what's particularly nice about them in this context we
   are talking about here is that the extension images may carry
   &lt;em&gt;dm-verity&lt;/em&gt; authentication data, and PKCS#7 signatures (&lt;a href="https://github.com/systemd/systemd/pull/20691"&gt;once this
   is merged, that
   is, i.e. v250&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;systemd&lt;/code&gt; suite also contains a concept called service
   "credentials". These are small pieces of information passed to
   services in a secure way. One key feature of these credentials is
   that they can be encrypted and authenticated in a very simple way
   with a key bound to the TPM (v250). See
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#LoadCredential=ID:PATH"&gt;LoadCredentialEncrypted=&lt;/a&gt;
   and
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-creds.html"&gt;systemd-creds(1)&lt;/a&gt;
   for details. They are great for safely storing SSL private keys and
   similar on your system, but they also come in handy for parameterizing
   initrds: an encrypted credential is just a file that can only be
   decoded if the right TPM is around with the right PCR values set.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;systemd&lt;/code&gt; suite contains a component called
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-stub.html"&gt;systemd-stub(7)&lt;/a&gt;. It's
   an EFI stub, i.e. a small piece of code that is attached to a
   kernel image, and turns the kernel image into a regular EFI binary
   that can be directly executed by the firmware (or a boot
   loader). This stub has a number of nice features (for example, it
   can show a boot splash before invoking the Linux kernel itself and
   such). &lt;a href="https://github.com/systemd/systemd/pull/20789"&gt;Once this work is
   merged (v250)&lt;/a&gt; the stub
   will support one more feature: it will automatically search for
   system extension image files and credential files next to the
   kernel image file, measure them and pass them on to the main initrd
   of the host.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Putting this together we have a nice way to provide fully authenticated
kernel images, initrd images and initrd extension images; as well as
encrypted and authenticated parameters via the credentials logic.&lt;/p&gt;
&lt;p&gt;How would a distribution actually make use of this? A distribution
vendor would pre-build the basic initrd, and glue it into the kernel
image, and sign that as a whole. Then, for each supposed extension of
the basic initrd (e.g. one for iscsi support, one for LVM, one for
multipath, …), the vendor would use a tool such as
&lt;a href="https://github.com/systemd/mkosi"&gt;mkosi&lt;/a&gt; to build an extension image,
i.e. a GPT disk image containing the files in squashfs format, a
Verity partition that authenticates it, plus a PKCS#7 signature
partition that validates the root hash for the dm-verity partition,
and that can be checked against a key provided by the boot loader or
main initrd. Then, any parameters for the initrd will be encrypted
using &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-creds.html"&gt;systemd-creds encrypt
-T&lt;/a&gt;. The
resulting encrypted credentials and the initrd extension images are
then simply placed next to the kernel image in the ESP (or boot
partition). Done.&lt;/p&gt;
&lt;p&gt;This checks all boxes: everything is authenticated and measured, the
credentials also encrypted. Things remain extensible and modular, can
be pre-built by the vendor, and installation is as simple as dropping
in one file for each extension and/or credential.&lt;/p&gt;
&lt;h3&gt;How to Authenticate the Binary OS Resources&lt;/h3&gt;
&lt;p&gt;Let's now have a look how to authenticate the Binary OS resources,
i.e. the stuff you find in &lt;code&gt;/usr/&lt;/code&gt;, i.e. the stuff traditionally
shipped to the user's system via RPMs or DEBs.&lt;/p&gt;
&lt;p&gt;I think there are three relevant ways how to authenticate this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Make &lt;code&gt;/usr/&lt;/code&gt; a &lt;code&gt;dm-verity&lt;/code&gt; volume. &lt;code&gt;dm-verity&lt;/code&gt; is a concept
   implemented in the Linux kernel that provides authenticity to
   read-only block devices: every read access is cryptographically
   verified against a &lt;em&gt;top-level&lt;/em&gt; &lt;em&gt;hash&lt;/em&gt; &lt;em&gt;value&lt;/em&gt;. This top-level
   hash is typically a 256bit value that you can either encode in the
   kernel image you are using, or cryptographically sign (&lt;a href="https://github.com/systemd/systemd/pull/20691"&gt;which is
   particularly nice once this is
   merged&lt;/a&gt;). I think
   this is actually the best approach since it makes the &lt;code&gt;/usr/&lt;/code&gt; tree
   entirely immutable in a very simple way. However, this also means
   that the whole of &lt;code&gt;/usr/&lt;/code&gt; needs to be updated at once, i.e. the
   traditional &lt;code&gt;rpm&lt;/code&gt;/&lt;code&gt;apt&lt;/code&gt; based update logic cannot work in this
   mode.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Make &lt;code&gt;/usr/&lt;/code&gt; a &lt;code&gt;dm-integrity&lt;/code&gt; volume. &lt;code&gt;dm-integrity&lt;/code&gt; is a concept
   provided by the Linux kernel that offers integrity guarantees to
   writable block devices, i.e. in some ways it can be considered to be
   a bit like &lt;code&gt;dm-verity&lt;/code&gt; while permitting write access. It can be
   used in three ways, one of which I think is particularly relevant
   here. The first way is with a simple hash function in "stand-alone"
   mode: this is not too interesting here, it just provides greater
   data safety for file systems that don't hash check their files' data
   on their own. The second way is in combination with &lt;code&gt;dm-crypt&lt;/code&gt;,
   i.e. with disk encryption. In this case it adds authenticity to
   confidentiality: only if you know the right secret you can read and
   make changes to the data, and any attempt to make changes without
   knowing this secret key will be detected as IO error on next read
   by those in possession of the secret (more about this below). The
   third way is the one I think is most interesting here: in
   "stand-alone" mode, but with a keyed hash function
   (e.g. HMAC). What's this good for? This provides authenticity
   without encryption: if you make changes to the disk without knowing
   the secret this will be noticed on the next read attempt of the
   data and result in IO errors. This mode provides what we want
   (authenticity) and doesn't do what we don't need (encryption). Of
   course, the secret key for the HMAC must be provided somehow, I
   think ideally by the TPM.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Make &lt;code&gt;/usr/&lt;/code&gt; a &lt;code&gt;dm-crypt&lt;/code&gt; (LUKS) + &lt;code&gt;dm-integrity&lt;/code&gt; volume. This
   provides both authenticity and encryption. The latter isn't
   typically needed for &lt;code&gt;/usr/&lt;/code&gt; given that it generally contains no
   secret data: anyone can download the binaries off the Internet
   anyway, and the sources too. By encrypting this you'll waste CPU
   cycles, but beyond that it doesn't hurt much. (Admittedly, some
   people might want to hide the precise set of packages they have
   installed, since it of course does reveal a bit of information
   about you: i.e. what you are working on, maybe what your job is –
   think: if you are a hacker you have hacking tools installed – and
   similar). Going this way might simplify things in some cases, as it
   means you don't have to distinguish "OS binary resources" (i.e.
   &lt;code&gt;/usr/&lt;/code&gt;) and "OS configuration and state" (i.e. &lt;code&gt;/etc/&lt;/code&gt; + &lt;code&gt;/var/&lt;/code&gt;,
   see below), and just make it the same volume. Here too, the secret
   key must be provided somehow, I think ideally by the TPM.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
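&lt;p&gt;To illustrate the first option: the "top-level hash" is the root of a hash tree built over the blocks of the volume. The following Python sketch shows the principle only. It assumes a plain binary tree, SHA-256, no salt and a 4096 byte block size, and all function names are invented for this example; the actual &lt;code&gt;dm-verity&lt;/code&gt; on-disk format differs in its details.&lt;/p&gt;

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, an assumption of this sketch


def split_blocks(data: bytes) -> list[bytes]:
    """Cut the image into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [b.ljust(BLOCK_SIZE, b"\0") for b in blocks] or [bytes(BLOCK_SIZE)]


def build_tree(blocks: list[bytes]) -> list[list[bytes]]:
    """Return all tree levels, leaves first; the last level holds the root."""
    levels = [[hashlib.sha256(b).digest() for b in blocks]]
    while len(levels[-1]) != 1:
        prev = levels[-1]
        pairs = [prev[i:i + 2] for i in range(0, len(prev), 2)]
        levels.append([hashlib.sha256(b"".join(p)).digest() for p in pairs])
    return levels


def verify_block(index: int, block: bytes, levels: list[list[bytes]], root: bytes) -> bool:
    """Check one block on read by recomputing its path up to the trusted root."""
    digest = hashlib.sha256(block).digest()
    for level in levels[:-1]:
        start = index - index % 2
        siblings = list(level[start:start + 2])  # stored (untrusted) hashes
        siblings[index - start] = digest         # substitute what we computed
        digest = hashlib.sha256(b"".join(siblings)).digest()
        index //= 2
    return digest == root


# Five blocks with different contents, a stand-in for a /usr/ image.
image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(5))
blocks = split_blocks(image)
levels = build_tree(blocks)
root = levels[-1][0]  # the value one would sign or embed in the kernel image

print("root hash:", root.hex())
print("block 2 verifies:", verify_block(2, blocks[2], levels, root))
print("modified block 2 verifies:", verify_block(2, b"\xff" * BLOCK_SIZE, levels, root))
```

&lt;p&gt;Only the root hash needs to be trusted. Everything else, including the intermediate hashes stored on disk, is checked against it on each read, which is why signing or embedding that one small value is enough to authenticate the whole volume.&lt;/p&gt;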
&lt;p&gt;All three approaches are valid. The first approach has my primary
sympathies, but for distributions not willing to abandon client-side
updates via RPM/dpkg this is not an option, in which case I would
propose the other two approaches for these cases.&lt;/p&gt;
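&lt;p&gt;For the keyed hash variant, the idea can be sketched in a few lines of Python as well. This is a toy model under stated assumptions: the class &lt;code&gt;IntegrityDisk&lt;/code&gt;, the 512 byte sector size and the choice to cover the sector number in the tag are all decisions of this example, not a description of the &lt;code&gt;dm-integrity&lt;/code&gt; on-disk format, and the random key merely stands in for one that the TPM would release.&lt;/p&gt;

```python
import hashlib
import hmac
import secrets

SECTOR_SIZE = 512  # illustrative sector size, an assumption of this sketch


def tag(key: bytes, sector_no: int, data: bytes) -> bytes:
    """Keyed tag over the sector number and the sector contents."""
    msg = sector_no.to_bytes(8, "little") + data
    return hmac.new(key, msg, hashlib.sha256).digest()


class IntegrityDisk:
    """Toy writable block device that stores a tag next to every sector."""

    def __init__(self, key: bytes, sectors: int) -> None:
        self._key = key
        self.data = [bytes(SECTOR_SIZE) for _ in range(sectors)]
        self.tags = [tag(key, n, d) for n, d in enumerate(self.data)]

    def write(self, sector_no: int, data: bytes) -> None:
        """A legitimate write updates both the data and its tag."""
        self.data[sector_no] = data
        self.tags[sector_no] = tag(self._key, sector_no, data)

    def read(self, sector_no: int) -> bytes:
        """Refuse to return data whose tag does not verify."""
        data = self.data[sector_no]
        expected = tag(self._key, sector_no, data)
        if not hmac.compare_digest(expected, self.tags[sector_no]):
            raise OSError(f"integrity check failed for sector {sector_no}")
        return data


key = secrets.token_bytes(32)  # stand-in for a key released by the TPM
disk = IntegrityDisk(key, sectors=4)
disk.write(1, b"A" * SECTOR_SIZE)
print("legitimate read ok:", disk.read(1) == b"A" * SECTOR_SIZE)

# An attacker edits the raw disk offline, without knowing the key.
disk.data[1] = b"B" * SECTOR_SIZE
try:
    disk.read(1)
except OSError as error:
    print("tampering detected:", error)
```

&lt;p&gt;The data itself stays readable by anyone, which is fine for &lt;code&gt;/usr/&lt;/code&gt;. What the attacker cannot do without the key is produce a matching tag, so the modification surfaces as an IO error on the next read.&lt;/p&gt;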
&lt;p&gt;The LUKS encryption key (and in case of &lt;code&gt;dm-integrity&lt;/code&gt; standalone mode
the key for the keyed hash function) should be bound to the TPM. Why
the TPM for this? You could also use a user password, a FIDO2 or
PKCS#11 security token — but I think TPM is the right choice: why
that? To reduce the requirement for repeated authentication, i.e. that
you first have to provide the disk encryption password, and then you
have to login, providing another password. It should be possible that
the system boots up unattended and then only one authentication prompt
is needed to unlock the user's data properly. The TPM provides a way
to do this in a reasonably safe and fully unattended way. Also, when
we stop considering just the laptop use-case for a moment: on servers
interactive disk encryption prompts don't make much sense — the fact
that TPMs can provide secrets without this requiring user interaction
and thus the ability to work in entirely unattended environments is
quite desirable. Note that
&lt;a href="https://www.freedesktop.org/software/systemd/man/crypttab.html"&gt;crypttab(5)&lt;/a&gt;
as implemented by &lt;code&gt;systemd&lt;/code&gt; (v248) provides native support for
authentication via password, via TPM2, via PKCS#11 or via FIDO2, so
the choice is ultimately all yours.&lt;/p&gt;
&lt;h3&gt;How to Encrypt/Authenticate OS Configuration and State&lt;/h3&gt;
&lt;p&gt;Let's now look at the OS configuration and state, i.e. the stuff in
&lt;code&gt;/etc/&lt;/code&gt; and &lt;code&gt;/var/&lt;/code&gt;. It probably makes sense to not consider these two
hierarchies independently but instead just consider this to be the
root file system. If the OS binary resources are in a separate file
system it is then mounted onto the &lt;code&gt;/usr/&lt;/code&gt; sub-directory of the root
file system.&lt;/p&gt;
&lt;p&gt;The OS configuration and state (or: root file system) should be both
encrypted and authenticated: it might contain secret keys, user
passwords, privileged logs and similar. This data matters and contains
plenty of data that should remain confidential.&lt;/p&gt;
&lt;p&gt;The encryption of choice here is &lt;code&gt;dm-crypt&lt;/code&gt; (LUKS) + &lt;code&gt;dm-integrity&lt;/code&gt;
similar as discussed above, again with the key bound to the TPM.&lt;/p&gt;
&lt;p&gt;If the OS binary resources are protected the same way it is safe to
merge these two volumes and have a single partition for both (see
above).&lt;/p&gt;
&lt;h3&gt;How to Encrypt/Authenticate the User's Home Directory&lt;/h3&gt;
&lt;p&gt;The data in the user's home directory should be encrypted, and bound
to the user's preferred token of authentication (i.e. a password or
FIDO2/PKCS#11 security token). As mentioned, in the traditional mode
of operation the user's home directory is not individually encrypted,
but only encrypted because FDE is in use. The encryption key for that
is a system wide key though, not a per-user key. And I think that's
a problem, as mentioned (and probably not even generally understood by
our users). We should correct that and ensure that the user's password
is what unlocks the user's data.&lt;/p&gt;
&lt;p&gt;In the &lt;code&gt;systemd&lt;/code&gt; suite we provide a service
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-homed.service.html"&gt;systemd-homed(8)&lt;/a&gt;
(v245) that implements this in a safe way: each user gets their own LUKS
volume stored in a loopback file in &lt;code&gt;/home/&lt;/code&gt;, and this is enough to
synthesize a user account. The encryption password for this volume is
the user's account password, thus it's really the password provided at
login time that unlocks the user's data. &lt;code&gt;systemd-homed&lt;/code&gt; also supports
other mechanisms of authentication, in particular PKCS#11/FIDO2
security tokens. It also provides support for other storage back-ends
(such as fscrypt), but I'd always suggest to use the LUKS back-end
since it's the only one providing the comprehensive confidentiality
guarantees one wants for a UNIX-style home directory.&lt;/p&gt;
&lt;p&gt;Note that there's one special caveat here: if the user's home
directory (e.g. &lt;code&gt;/home/lennart/&lt;/code&gt;) is encrypted and authenticated, what
about the file system this data is stored on, i.e. &lt;code&gt;/home/&lt;/code&gt; itself? If
that dir is part of the root file system this would result in
double encryption: first the data is encrypted with the TPM root file
system key, and then again with the per-user key. Such double
encryption is a waste of resources, and unnecessary. I'd thus suggest
to make &lt;code&gt;/home/&lt;/code&gt; its own &lt;code&gt;dm-integrity&lt;/code&gt; volume with a HMAC, keyed by
the TPM. This means the data stored directly in &lt;code&gt;/home/&lt;/code&gt; will be
authenticated but not encrypted. That's good not only for performance,
but also has practical benefits: it allows extracting the encrypted
volume of the various users in case the TPM key is lost, as a way to
recover from dead laptops or similar.&lt;/p&gt;
&lt;p&gt;Why authenticate &lt;code&gt;/home/&lt;/code&gt;, if it only contains per-user home
directories that are authenticated on their own anyway?  That's a
valid question: it's because the kernel file system maintainers made
clear that Linux file system code is not considered safe against rogue
disk images, and is not tested for that; this means before you mount
anything you need to establish trust in some way because otherwise
there's a risk that the act of mounting might exploit your kernel.&lt;/p&gt;
&lt;h3&gt;Summary of Resources and their Protections&lt;/h3&gt;
&lt;p&gt;So, let's now put this all together. Here's a table showing the
various resources we deal with, and how I think they should be
protected (in my idealized world).&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Needs Authentication&lt;/th&gt;
&lt;th&gt;Needs Encryption&lt;/th&gt;
&lt;th&gt;Suggested Technology&lt;/th&gt;
&lt;th&gt;Validation/Encryption Keys/Certificates acquired via&lt;/th&gt;
&lt;th&gt;Stored where&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Shim&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;SecureBoot signature verification&lt;/td&gt;
&lt;td&gt;firmware certificate database&lt;/td&gt;
&lt;td&gt;ESP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boot loader&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;td&gt;firmware certificate database/shim&lt;/td&gt;
&lt;td&gt;ESP/boot partition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;initrd&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;initrd parameters&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;systemd TPM encrypted credentials&lt;/td&gt;
&lt;td&gt;TPM&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;initrd extensions&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;code&gt;systemd-sysext&lt;/code&gt; with Verity+PKCS#7 signatures&lt;/td&gt;
&lt;td&gt;firmware/initrd certificate database&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OS binary resources&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dm-verity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;root hash linked into kernel image, or firmware/initrd certificate database&lt;/td&gt;
&lt;td&gt;top-level partition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OS configuration and state&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dm-crypt&lt;/code&gt; (LUKS) + &lt;code&gt;dm-integrity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;TPM&lt;/td&gt;
&lt;td&gt;top-level partition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/home/&lt;/code&gt; itself&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dm-integrity&lt;/code&gt; with HMAC&lt;/td&gt;
&lt;td&gt;TPM&lt;/td&gt;
&lt;td&gt;top-level partition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User home directories&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dm-crypt&lt;/code&gt; (LUKS) + &lt;code&gt;dm-integrity&lt;/code&gt; in loopback files&lt;/td&gt;
&lt;td&gt;User password/FIDO2/PKCS#11 security token&lt;/td&gt;
&lt;td&gt;loopback file inside &lt;code&gt;/home&lt;/code&gt; partition&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This should provide all the desired guarantees: everything is
authenticated, and the individualized per-host or per-user data
is also encrypted. No double encryption takes place. The encryption
keys/verification certificates are stored/bound to the most appropriate
infrastructure.&lt;/p&gt;
&lt;p&gt;Does this address the three attack scenarios mentioned earlier? I
think so, yes. The basic attack scenario I described is addressed by
the fact that &lt;code&gt;/var/&lt;/code&gt;, &lt;code&gt;/etc/&lt;/code&gt; and &lt;code&gt;/home/*/&lt;/code&gt; are encrypted. Brute
forcing the former two is harder than in the status quo ante model,
since a high entropy key is used instead of one derived from a user
provided password. Moreover, the "anti-hammering" logic of the TPM
will make brute forcing prohibitively slow. The home directories are
protected by the user's password or ideally a personal FIDO2/PKCS#11
security token in this model. Of course, a password isn't better
security-wise than the status quo ante. But given the FIDO2/PKCS#11
support built into &lt;code&gt;systemd-homed&lt;/code&gt; it should be easier to lock down
the home directories securely.&lt;/p&gt;
&lt;p&gt;Binding encryption of &lt;code&gt;/var/&lt;/code&gt; and &lt;code&gt;/etc/&lt;/code&gt; to the TPM also addresses
the first of the two more advanced attack scenarios: a copy of the
harddisk is useless without the physical TPM chip, since the seed key
is sealed into that. (And even if the attacker had the chance to watch
you type in your password, it won't help unless they possess access to
the TPM chip.) For the home directory this attack is not addressed
as long as a plain password is used. However, since binding home
directories to FIDO2/PKCS#11 tokens is built into &lt;code&gt;systemd-homed&lt;/code&gt;
things should be safe here too — provided the user actually possesses
and uses such a device.&lt;/p&gt;
&lt;p&gt;The backdoor attack scenario is addressed by the fact that every
resource in play now is authenticated: it's hard to backdoor the OS if
there's no component that isn't verified by signature keys or TPM
secrets the attacker hopefully doesn't know.&lt;/p&gt;
&lt;p&gt;For general purpose distributions that focus on updating the OS per
RPM/dpkg the idealized model above won't work out, since (as
mentioned) this implies an immutable &lt;code&gt;/usr/&lt;/code&gt;, and thus requires
updating &lt;code&gt;/usr/&lt;/code&gt; via an atomic update operation. For such distros a
setup like the following is probably more realistic, but see above.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Needs Authentication&lt;/th&gt;
&lt;th&gt;Needs Encryption&lt;/th&gt;
&lt;th&gt;Suggested Technology&lt;/th&gt;
&lt;th&gt;Validation/Encryption Keys/Certificates acquired via&lt;/th&gt;
&lt;th&gt;Stored where&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Shim&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;SecureBoot signature verification&lt;/td&gt;
&lt;td&gt;firmware certificate database&lt;/td&gt;
&lt;td&gt;ESP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boot loader&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;td&gt;firmware certificate database/shim&lt;/td&gt;
&lt;td&gt;ESP/boot partition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;initrd&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;initrd parameters&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;systemd TPM encrypted credentials&lt;/td&gt;
&lt;td&gt;TPM&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;initrd extensions&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;code&gt;systemd-sysext&lt;/code&gt; with Verity+PKCS#7 signatures&lt;/td&gt;
&lt;td&gt;firmware/initrd certificate database&lt;/td&gt;
&lt;td&gt;ditto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OS binary resources, configuration and state&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dm-crypt&lt;/code&gt; (LUKS) + &lt;code&gt;dm-integrity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;TPM&lt;/td&gt;
&lt;td&gt;top-level partition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/home/&lt;/code&gt; itself&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dm-integrity&lt;/code&gt; with HMAC&lt;/td&gt;
&lt;td&gt;TPM&lt;/td&gt;
&lt;td&gt;top-level partition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User home directories&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dm-crypt&lt;/code&gt; (LUKS) + &lt;code&gt;dm-integrity&lt;/code&gt; in loopback files&lt;/td&gt;
&lt;td&gt;User password/FIDO2/PKCS#11 security token&lt;/td&gt;
&lt;td&gt;loopback file inside &lt;code&gt;/home&lt;/code&gt; partition&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This means there's only one root file system that contains all of
&lt;code&gt;/etc/&lt;/code&gt;, &lt;code&gt;/var/&lt;/code&gt; and &lt;code&gt;/usr/&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Recovery Keys&lt;/h2&gt;
&lt;p&gt;When binding encryption to TPMs one problem that arises is what
strategy to adopt if the TPM is lost, due to hardware failure: if I
need the TPM to unlock my encrypted volume, what do I do if I need the
data but lost the TPM?&lt;/p&gt;
&lt;p&gt;The answer here is supporting recovery keys (this is similar to how
other OSes approach this). Recovery keys are pretty much the same
concept as passwords, the main difference being that they are computer
generated rather than user-chosen. Because of that they typically have
much higher entropy (which makes them more annoying to type in, i.e.
you want to use them only when you must, not day-to-day). By having
higher entropy they are useful in combination with TPM, FIDO2 or
PKCS#11 based unlocking: unlike a combination with passwords they do
not compromise the higher strength of protection that
TPM/FIDO2/PKCS#11 based unlocking is supposed to provide.&lt;/p&gt;
&lt;p&gt;Current versions of
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptenroll.html"&gt;systemd-cryptenroll(1)&lt;/a&gt;
implement a recovery key concept in an attempt to address this
problem. You may enroll any combination of TPM chips, PKCS#11 tokens,
FIDO2 tokens, recovery keys and passwords on the same LUKS
volume. When enrolling a recovery key it is generated and shown on
screen both in text form and as a QR code you can scan off screen if you
like. The idea is to write down/store this recovery key in a safe place so
that you can use it when you need it. Note that such recovery keys can
be entered wherever a LUKS password is requested, i.e. after
generation they behave pretty much the same as a regular password.&lt;/p&gt;
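&lt;p&gt;To put rough numbers on the entropy argument, here's a minimal Python sketch. It is not how &lt;code&gt;systemd-cryptenroll&lt;/code&gt; generates or formats its keys: the 16-symbol alphabet, the 256-bit key size and the helper names are assumptions made purely for illustration.&lt;/p&gt;

```python
import math
import secrets

# Assumption: a 16-symbol alphabet, so each character carries exactly 4 bits.
ALPHABET = "cbdefghijklnrtuv"


def generate_recovery_key(n_bytes=32, group=8):
    """Return a random key rendered as dash-separated groups of characters."""
    raw = secrets.token_bytes(n_bytes)
    chars = []
    for byte in raw:
        chars.append(ALPHABET[byte // 16])  # high nibble
        chars.append(ALPHABET[byte % 16])   # low nibble
    text = "".join(chars)
    return "-".join(text[i:i + group] for i in range(0, len(text), group))


def entropy_bits(alphabet_size, length):
    """Entropy in bits of a uniformly random string over the given alphabet."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(generate_recovery_key())
    print("recovery key bits:", entropy_bits(16, 64))
    print("8 random lowercase letters:", round(entropy_bits(26, 8), 1))
```

&lt;p&gt;A random key of 64 such characters carries 256 bits, while eight random lowercase letters carry fewer than 38, and user-chosen passwords usually far fewer still.&lt;/p&gt;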
&lt;h2&gt;TPM PCR Brittleness&lt;/h2&gt;
&lt;p&gt;Locking devices to TPMs and enforcing a PCR policy with this
(i.e. configuring the TPM key to be unlockable only if certain PCRs
match certain values, and thus requiring the OS to be in a certain
state) brings a problem with it: TPM PCR brittleness. If the key you
want to unlock with the TPM requires the OS to be in a specific state
(i.e. that all OS components' hashes match certain expectations or
similar) then doing OS updates might have the effect of making your
key inaccessible: the OS updates will cause the code to change, and
thus the hashes of the code, and thus certain PCRs. (Thankfully, you
enrolled a recovery key, as described above, so this doesn't mean you
lost your data, right?).&lt;/p&gt;
&lt;p&gt;To address this I'd suggest three strategies:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Most importantly: don't actually use the TPM PCRs that contain code
   hashes. There are actually &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptenroll.html#id-1.6.3.9.2.2"&gt;multiple PCRs
   defined&lt;/a&gt;,
   each containing measurements of different aspects of the boot
   process. My recommendation is to bind keys to PCR 7 only, a PCR
   that contains measurements of the UEFI SecureBoot certificate
   databases. Thus, the keys will remain accessible as long as these
   databases remain the same, and updates to code will not affect it
   (updates to the certificate databases will, and they do happen too,
   though hopefully much less frequently than code updates). Does this
   reduce security? Not much, no, because the code that's run is after
   all not just measured but also validated via code signatures, and
   those signatures are validated with the aforementioned certificate
   databases. Thus binding an encrypted TPM key to PCR 7 should
   enforce a similar level of trust in the boot/OS code as binding it
   to a PCR with hashes of specific versions of that code. I.e. using
   PCR 7 means you say "all code signed by these vendors is allowed
   to unlock my key" while using a PCR that contains code hashes means
   "only this exact version of my code may access my key".&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use LUKS key management to enroll multiple versions of the TPM keys
   in relevant volumes, to support multiple versions of the OS code
   (or multiple versions of the certificate database, as discussed
   above). Specifically: whenever an update is done that might result
   in changing the relevant PCRs, pre-calculate the new PCRs, and enroll
   them in an additional LUKS slot on the relevant volumes. This means
   that the unlocking keys tied to the TPM remain accessible in both
   states of the system. Eventually, once rebooted after the update,
   remove the old slots.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If these two strategies didn't work out (maybe because the
   OS/firmware was updated outside of OS control, or the update
   mechanism was aborted at the wrong time) and the TPM PCRs changed
   unexpectedly, and the user now needs to use their recovery key to
   get access to the OS back, let's handle this gracefully and
   automatically reenroll the current TPM PCRs at boot, after the
   recovery key checked out, so that for future boots everything is in
   order again.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
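&lt;p&gt;The difference between the two kinds of PCR, and the pre-enrollment idea from the second strategy, can be modelled in a few lines of Python. This is a simplified sketch, not TPM code: it assumes a SHA-256 PCR that starts out as all zeroes, and the component names, the certificate name and the set of accepted values (standing in for LUKS slots) are made up for illustration.&lt;/p&gt;

```python
import hashlib


def extend(pcr, data):
    """Model of a PCR extend: new = SHA-256(old || SHA-256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


def replay(measurements):
    """Fold a boot's measurements into one PCR value, starting from zeroes."""
    pcr = bytes(32)
    for data in measurements:
        pcr = extend(pcr, data)
    return pcr


# Hypothetical measurements for two OS versions signed by the same vendor.
certs = [b"vendor-certificate-db"]
code_v1 = [b"bootloader-1.0", b"kernel-5.14"]
code_v2 = [b"bootloader-1.0", b"kernel-5.15"]

# A PCR holding code hashes changes with every update ...
code_pcr_changed = replay(code_v1) != replay(code_v2)
# ... while a PCR fed only with the certificate database is the same
# for both versions, because no code ever went into it.
cert_pcr = replay(certs)

# Strategy 2: before rebooting into v2, pre-calculate its PCR value and
# accept it alongside the current one (think: one LUKS slot per value).
accepted = {replay(code_v1), replay(code_v2)}


def may_unlock(measurements):
    """True if the PCR value resulting from this boot has been enrolled."""
    return replay(measurements) in accepted


if __name__ == "__main__":
    print("code PCR changed by update:", code_pcr_changed)
    print("certificate PCR:", cert_pcr.hex())
    print("v2 unlocks after pre-enrollment:", may_unlock(code_v2))
```

&lt;p&gt;Once the system has rebooted into the new version, the value for the old version would be dropped from the accepted set again, mirroring the removal of the old slots.&lt;/p&gt;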
&lt;p&gt;Other approaches can work too: for example, some OSes simply remove
TPM PCR policy protection of disk encryption keys altogether
immediately before OS or firmware updates, and then reenable it right
after. Of course, this opens a time window where the key bound to the
TPM is much less protected than people might assume. I'd try to avoid
such a scheme if possible.&lt;/p&gt;
&lt;h2&gt;Anything Else?&lt;/h2&gt;
&lt;p&gt;So, given that we are talking about idealized systems: I personally
actually think the ideal OS would be much simpler, and thus more
secure than this:&lt;/p&gt;
&lt;p&gt;I'd try to ditch the Shim, and instead focus on enrolling the
distribution vendor keys directly in the UEFI firmware certificate
list. This is actually supported by all firmwares too. This has
various benefits: it's no longer necessary to bind everything to
Microsoft's root key, you can just enroll your own stuff and thus make
sure only what you want to trust is trusted and nothing else. To make
an approach like this easier, we have been working on doing automatic
enrollment of these keys from the &lt;code&gt;systemd-boot&lt;/code&gt; boot loader, see
&lt;a href="https://github.com/systemd/systemd/pull/20255"&gt;this work in progress for
details&lt;/a&gt;. This way the
Firmware will authenticate the boot loader/kernel/initrd without any
further component for this in place.&lt;/p&gt;
&lt;p&gt;I'd also not bother with a separate boot partition, and just use the
ESP for everything. The ESP is required anyway by the firmware, and is
good enough for storing the few files we need.&lt;/p&gt;
&lt;h2&gt;FAQ&lt;/h2&gt;
&lt;h3&gt;Can I implement all of this in my distribution today?&lt;/h3&gt;
&lt;p&gt;Probably not. While the big issues have mostly been addressed there's
a lot of integration work still missing. As you might have seen I
linked some PRs that haven't even been merged into our tree yet, and
definitely haven't been released yet or even entered the distributions.&lt;/p&gt;
&lt;h3&gt;Will this show up in Fedora/Debian/Ubuntu soon?&lt;/h3&gt;
&lt;p&gt;I don't know. I am making a proposal for how these things might work, and
am working on getting various building blocks for this into
shape. What the distributions do is up to them. But even if they don't
follow the recommendations I make 100%, or don't want to use the
building blocks I propose I think it's important they start thinking
about this, and yes, I think they should be thinking about defaulting
to setups like this.&lt;/p&gt;
&lt;p&gt;Work for measuring/signing initrds on Fedora has been started,
&lt;a href="https://raw.githubusercontent.com/keszybz/mkosi-initrd-talk/main/mkosi-initrd.pdf"&gt;here's a slide deck with some information about
it&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;But isn't a TPM evil?&lt;/h3&gt;
&lt;p&gt;Some corners of the community tried (unfortunately successfully to
some degree) to paint TPMs/Trusted Computing/SecureBoot as generally
evil technologies that stop us from using our systems the way we
want. That idea is rubbish though, I think. We should focus on what it
can deliver for us (and that's a lot I think, see above), and
appreciate the fact we can actually use it to kick out perceived evil
empires from our devices instead of being subjected to them. Yes, the
way SecureBoot/TPMs are defined puts &lt;em&gt;you&lt;/em&gt; in the driver seat if you
want — and you may enroll your own certificates to keep out everything
you don't like.&lt;/p&gt;
&lt;h3&gt;What if my system doesn't have a TPM?&lt;/h3&gt;
&lt;p&gt;TPMs are becoming quite ubiquitous, in particular as the upcoming
Windows versions will require them. In general I think we should focus
on modern, fully equipped systems when designing all this, and then
find fall-backs for more limited systems. Frankly it feels as if so
far the design approach for all this was the other way round: try to
make the new stuff work like the old rather than the old like the new
(I mean, to me it appears this thinking is the main raison d'être for
the Grub boot loader).&lt;/p&gt;
&lt;p&gt;More specifically, on the systems where we have no TPM we ultimately
cannot provide the same security guarantees as for those which
have. So depending on the resource to protect we should fall back to
different TPM-less mechanisms. For example, if we have no TPM then the
root file system should probably be encrypted with a user provided
password, typed in at boot as before. And for the encrypted boot
credentials we probably should simply not encrypt them, and place them
in the ESP unencrypted.&lt;/p&gt;
&lt;p&gt;Effectively this means: without TPM you'll still get protection regarding the
basic attack scenario, as before, but not the other two.&lt;/p&gt;
&lt;h3&gt;What if my system doesn't have UEFI?&lt;/h3&gt;
&lt;p&gt;Many of the mechanisms explained above taken individually do not
require UEFI. But of course the chain of trust suggested above requires
something like UEFI SecureBoot. If your system lacks UEFI it's
probably best to find work-alikes to the technologies suggested above,
but I doubt I'll be able to help you there.&lt;/p&gt;
&lt;h3&gt;rpm/dpkg already cryptographically validates all packages at installation time (&lt;code&gt;gpg&lt;/code&gt;), why would I need more than that?&lt;/h3&gt;
&lt;p&gt;This type of package validation happens once: at the moment of
installation (or update) of the package, but not anymore when the data
installed is actually used. Thus when an attacker manages to modify
the package data after installation and before use they can make any
change they like without this ever being noticed. Such package download
validation does address certain attack scenarios
(i.e. man-in-the-middle attacks on network downloads), but it doesn't
protect you from attackers with physical access, as described in the
attack scenarios above.&lt;/p&gt;
&lt;p&gt;Systems such as &lt;code&gt;ostree&lt;/code&gt; aren't better than rpm/dpkg regarding this
BTW, their data is not validated on use either, but only during
download or when processing tree checkouts.&lt;/p&gt;
&lt;p&gt;The key point is that the scheme explained here provides &lt;em&gt;offline&lt;/em&gt;
protection for the data "at rest" — even someone with physical access
to your device cannot easily make changes that aren't noticed on next
use. rpm/dpkg/ostree provide &lt;em&gt;online&lt;/em&gt; protection only: as long as the
system remains up, and all OS changes are done through the intended
program code-paths, and no one has physical access everything should
be good. In today's world I am sure this is not good enough though. As
mentioned most modern OSes provide offline protection for the data at
rest in one way or another. Generic Linux distributions are terribly
behind on this.&lt;/p&gt;
&lt;h3&gt;This is all so desktop/laptop focused, what about servers?&lt;/h3&gt;
&lt;p&gt;I am pretty sure servers should provide similar security guarantees as
outlined above. In a way servers are a much simpler case: there are no
users and no interactivity. Thus the discussion of &lt;code&gt;/home/&lt;/code&gt; and what
it contains and of user passwords doesn't matter. However, the
authenticated initrd and the unattended TPM-based encryption I think
are very important for servers too, in a trusted data center
environment. It provides security guarantees so far not given by Linux
server OSes.&lt;/p&gt;
&lt;h3&gt;I'd like to help with this, or discuss/comment on this&lt;/h3&gt;
&lt;p&gt;Submit patches or reviews through
&lt;a href="https://github.com/systemd/systemd"&gt;GitHub&lt;/a&gt;. General discussion about
this is best done on the &lt;a href="https://lists.freedesktop.org/mailman/listinfo/systemd-devel"&gt;systemd mailing
list&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 23 Sep 2021 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2021-09-23:/blog/authenticated-boot-and-disk-encryption-on-linux.html</guid><category>projects</category></item><item><title>The Wondrous World of Discoverable GPT Disk Images</title><link>https://0pointer.net/blog/the-wondrous-world-of-discoverable-gpt-disk-images.html</link><description>&lt;p&gt;&lt;em&gt;TL;DR: Tag your GPT partitions with the right, descriptive partition
types, and the world will become a better place.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;A number of years ago we started the &lt;a href="https://systemd.io/DISCOVERABLE_PARTITIONS"&gt;Discoverable Partitions
Specification&lt;/a&gt; which
defines &lt;a href="https://en.wikipedia.org/wiki/GUID_Partition_Table"&gt;GPT&lt;/a&gt;
partition type UUIDs and partition flags for the various partitions
Linux systems typically deal with. Before the specification all Linux
partitions usually just used the same type, basically saying "Hey, I
am a Linux partition" and not much else. With this specification the
GPT partition type, flags and label system becomes a lot more
expressive, as it can tell you:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;What kind of data a partition contains (i.e. is this swap data, a file system or Verity data?)&lt;/li&gt;
&lt;li&gt;What the purpose/mount point of a partition is (i.e. is this a &lt;code&gt;/home/&lt;/code&gt; partition or a root file system?)&lt;/li&gt;
&lt;li&gt;What CPU architecture a partition is intended for (i.e. is this a root partition for x86-64 or for aarch64?)&lt;/li&gt;
&lt;li&gt;Shall this partition be mounted automatically? (i.e. without specifically being configured via &lt;code&gt;/etc/fstab&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;And if so, shall it be mounted read-only?&lt;/li&gt;
&lt;li&gt;And if so, shall the file system be grown to its enclosing partition size, if smaller?&lt;/li&gt;
&lt;li&gt;Which partition contains the newer version of the same data (i.e. multiple root file systems, with different versions)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;By embedding all of this information inside the GPT partition table
disk images become self-descriptive: without requiring any other
source of information (such as &lt;code&gt;/etc/fstab&lt;/code&gt;) if you look at a
compliant GPT disk image it is clear how an image is put together and
how it should be used and mounted. This self-descriptiveness in
particular breaks one philosophical weirdness of traditional Linux
installations: the original source of information which file system
the root file system is, typically is embedded in the root file system
itself, in &lt;code&gt;/etc/fstab&lt;/code&gt;. Thus, in a way, in order to know what the
root file system is you need to know what the root file system is. 🤯
🤯 🤯&lt;/p&gt;
&lt;p&gt;(Of course, the way this recursion is traditionally broken up is by
then copying the root file system information from &lt;code&gt;/etc/fstab&lt;/code&gt; into
the boot loader configuration, resulting in a situation where the
primary source of information for this — i.e. &lt;code&gt;/etc/fstab&lt;/code&gt; — is
actually mostly irrelevant, and the secondary source — i.e. the copy
in the boot loader — becomes the configuration that actually matters.)&lt;/p&gt;
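&lt;p&gt;To make the list of things a partition table entry can express a bit more concrete, here's a small Python sketch of a decoder. The type UUIDs and flag bit positions are assumed values, believed to match the Discoverable Partitions Specification; check the specification before relying on them. The function and key names are hypothetical.&lt;/p&gt;

```python
# Assumed values, believed to match the Discoverable Partitions
# Specification; verify against the specification before relying on them.
PARTITION_TYPES = {
    "4f68bce3-e8cd-4db1-96e7-fbcaf984b709": ("root", "x86-64", "/"),
    "b921b045-1df0-41c3-af44-4c6f280d3fae": ("root", "aarch64", "/"),
    "933ac7e1-2eb4-4f13-b844-0e14e2aef915": ("home", None, "/home"),
    "0657fd6d-a4ab-43c4-84e5-0933c84b4f4f": ("swap", None, None),
}
FLAG_BITS = {"no_auto": 63, "read_only": 60, "grow_file_system": 59}


def bit_is_set(value, bit):
    """True if the given bit of the 64-bit GPT attribute field is set."""
    return (value // 2 ** bit) % 2 == 1


def describe(type_uuid, flags=0):
    """Turn a partition's type UUID and attribute flags into a description."""
    entry = PARTITION_TYPES.get(type_uuid.lower())
    if entry is None:
        return None  # unknown type: this table says nothing about it
    purpose, arch, mount_point = entry
    info = {"purpose": purpose, "architecture": arch, "mount_point": mount_point}
    for name, bit in FLAG_BITS.items():
        info[name] = bit_is_set(flags, bit)
    return info


if __name__ == "__main__":
    # A read-only root partition for x86-64.
    print(describe("4F68BCE3-E8CD-4DB1-96E7-FBCAF984B709", flags=2 ** 60))
```

&lt;p&gt;Everything the decoder returns comes from the partition table alone; no file system has to be opened to learn what a partition is for.&lt;/p&gt;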
&lt;p&gt;Today, the GPT partition type UUIDs defined by the specification have
been adopted quite widely, by distributions and their installers, as
well as a variety of partitioning tools and other tools.&lt;/p&gt;
&lt;p&gt;In this article I want to highlight how the various tools the
&lt;a href="https://systemd.io/"&gt;systemd&lt;/a&gt; project provides make use of the
concepts the specification introduces.&lt;/p&gt;
&lt;p&gt;But before we start with that, let's underline why tagging partitions
with these descriptive partition type UUIDs (and the associated
partition flags) is a good thing, besides the philosophical points
made above.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Simplicity: in particular OS installers become simpler — adjusting
   &lt;code&gt;/etc/fstab&lt;/code&gt; as part of the installation is not necessary anymore,
   as the partitioning step already put all information into place for
   assembling the system properly at boot. i.e. installing doesn't
   mean that you always have to get &lt;code&gt;fdisk&lt;/code&gt; &lt;em&gt;and&lt;/em&gt; &lt;code&gt;/etc/fstab&lt;/code&gt; into
   place, the former suffices entirely.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Robustness: since partition tables mostly remain static after
   installation the chance of corruption is much lower than if the
   data is stored in file systems (e.g. in &lt;code&gt;/etc/fstab&lt;/code&gt;). Moreover by
   associating the metadata directly with the objects it describes the
   chance of things getting out of sync is reduced. (i.e. if you lose
   &lt;code&gt;/etc/fstab&lt;/code&gt;, or forget to rerun your initrd builder you still know
   what a partition is supposed to be just by looking at it.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Programmability: if partitions are self-descriptive it's much
   easier to automatically process them with various tools. In fact,
   this blog story is mostly about that: various systemd tools can
   naturally process disk images prepared like this.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Alternative entry points: on traditional disk images, the boot
   loader needs to be told which kernel command line option &lt;code&gt;root=&lt;/code&gt; to
   use, which then provides access to the root file system, where
   &lt;code&gt;/etc/fstab&lt;/code&gt; is then found which describes the rest of the file
   systems. Where precisely &lt;code&gt;root=&lt;/code&gt; is configured for the boot loader
   highly depends on the boot loader and distribution used, and is
   typically encoded in a Turing complete programming language
   (Grub…). This makes it very hard to automatically determine the
   right root file system to use, to implement alternative entry points
   to the system. By alternative entry points I mean other ways to boot
   the disk image, specifically for running it as a &lt;code&gt;systemd-nspawn&lt;/code&gt;
   container — but this extends to other mechanisms where the boot
   loader may be bypassed to boot up the system, for example &lt;code&gt;qemu&lt;/code&gt;
   when configured without a boot loader.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;User friendliness: it's simply a lot nicer for the user looking at
   a partition table if the partition table explains what is what,
   instead of just saying "Hey, this is a Linux partition!" and
   nothing else.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Uses for the concept&lt;/h1&gt;
&lt;p&gt;Now that we cleared up the Why?, let's have a closer look at how this is
currently used and exposed in &lt;code&gt;systemd&lt;/code&gt;'s various components.&lt;/p&gt;
&lt;h2&gt;Use #1: Running a disk image in a container&lt;/h2&gt;
&lt;p&gt;If a disk image follows the Discoverable Partitions Specification then
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html"&gt;&lt;code&gt;systemd-nspawn&lt;/code&gt;&lt;/a&gt;
has all it needs to just boot it up. Specifically, if you have a GPT
disk image in a file &lt;code&gt;foobar.raw&lt;/code&gt; and you want to boot it up in a
container, just run &lt;code&gt;systemd-nspawn -i foobar.raw -b&lt;/code&gt;, and that's it
(you can specify a block device like &lt;code&gt;/dev/sdb&lt;/code&gt; too if you like). It
becomes easy and natural to prepare disk images that can be booted
either on a physical machine, inside a virtual machine manager or
inside such a container manager: the necessary meta-information is
included in the image, easily accessible before actually looking into
its file systems.&lt;/p&gt;
&lt;h2&gt;Use #2: Booting an OS image on bare-metal without &lt;code&gt;/etc/fstab&lt;/code&gt; or kernel command line &lt;code&gt;root=&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;If a disk image follows the specification in many cases you can remove
&lt;code&gt;/etc/fstab&lt;/code&gt; (or never even install it) — as the basic information
needed is already included in the partition table. The
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-gpt-auto-generator.html"&gt;&lt;code&gt;systemd-gpt-auto-generator&lt;/code&gt;&lt;/a&gt;
logic implements automatic discovery of the root file system as well
as all auxiliary file systems. (Note that the former requires an
initrd that uses systemd; some more conservative distributions do not
support that yet, unfortunately). Effectively this means you can boot
up a kernel/initrd with an entirely empty kernel command line, and the
initrd will automatically find the root file system (by looking for a
suitably marked partition on the same drive the EFI System Partition
was found on).&lt;/p&gt;
&lt;p&gt;(Note, if &lt;code&gt;/etc/fstab&lt;/code&gt; or &lt;code&gt;root=&lt;/code&gt; exist and contain relevant
information they always take precedence over the automatic logic. This
is in particular useful to tweak things by specifying additional mount
options and such.)&lt;/p&gt;
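&lt;p&gt;The selection and precedence rules just described boil down to something like the following Python sketch. It is a deliberately simplified model, not what &lt;code&gt;systemd-gpt-auto-generator&lt;/code&gt; actually implements: the function name and the dictionary keys are hypothetical, and everything else the real logic has to deal with is ignored.&lt;/p&gt;

```python
def pick_root(partitions, esp_disk, arch, cmdline_root=None):
    """Return the device to mount as root file system, or None.

    partitions: list of dicts with the hypothetical keys
                "device", "disk", "purpose", "architecture", "no_auto".
    """
    if cmdline_root is not None:
        return cmdline_root  # explicit configuration always wins
    for part in partitions:
        if (part["disk"] == esp_disk
                and part["purpose"] == "root"
                and part["architecture"] == arch
                and not part["no_auto"]):
            return part["device"]  # first suitably marked partition
    return None


if __name__ == "__main__":
    table = [
        {"device": "/dev/sdb1", "disk": "/dev/sdb", "purpose": "root",
         "architecture": "x86-64", "no_auto": False},
        {"device": "/dev/sda2", "disk": "/dev/sda", "purpose": "root",
         "architecture": "aarch64", "no_auto": False},
        {"device": "/dev/sda3", "disk": "/dev/sda", "purpose": "root",
         "architecture": "x86-64", "no_auto": False},
    ]
    # Empty kernel command line: the partition is found automatically.
    print(pick_root(table, esp_disk="/dev/sda", arch="x86-64"))
    # An explicit root= setting overrides the automatic logic.
    print(pick_root(table, "/dev/sda", "x86-64", cmdline_root="/dev/sdc1"))
```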
&lt;h2&gt;Use #3: Mounting a complex disk image for introspection or manipulation&lt;/h2&gt;
&lt;p&gt;The
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-dissect.html"&gt;&lt;code&gt;systemd-dissect&lt;/code&gt;&lt;/a&gt;
tool may be used to introspect and manipulate OS disk images that
implement the specification. If you pass the path to a disk image (or
block device) it will extract various bits of useful information from
the image (e.g. what OS is this? what partitions to mount?) and display it.&lt;/p&gt;
&lt;p&gt;With the &lt;code&gt;--mount&lt;/code&gt; switch a disk image (or block device) can be
mounted to some location. This is useful for looking at what is inside
it, or changing its contents. This will dissect the image and then
automatically mount all contained file systems matching their GPT
partition description to the right places, so that you subsequently
could &lt;code&gt;chroot&lt;/code&gt; into it. (But why &lt;code&gt;chroot&lt;/code&gt; if you can just use &lt;code&gt;systemd-nspawn&lt;/code&gt;? 😎)&lt;/p&gt;
&lt;h2&gt;Use #4: Copying files in and out of a disk image&lt;/h2&gt;
&lt;p&gt;The
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-dissect.html"&gt;&lt;code&gt;systemd-dissect&lt;/code&gt;&lt;/a&gt;
tool also has two switches &lt;code&gt;--copy-from&lt;/code&gt; and &lt;code&gt;--copy-to&lt;/code&gt; which allow
copying files out of or into a compliant disk image, taking all
included file systems and the resulting mount hierarchy into account.&lt;/p&gt;
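&lt;p&gt;For scripting, the &lt;code&gt;systemd-dissect&lt;/code&gt; invocations from this and the previous section could be wrapped roughly as follows. This is only a sketch: the helper names, the action names and the example paths are made up, and by default the commands are merely printed rather than executed, since actually running them requires the tool to be installed and, typically, root privileges.&lt;/p&gt;

```python
import subprocess


def dissect_command(image, action="show", *paths):
    """Build a systemd-dissect command line for the given action."""
    switches = {
        "show": [],                    # just print what the image contains
        "mount": ["--mount"],          # paths: mount point
        "copy-from": ["--copy-from"],  # paths: path in image, target on host
        "copy-to": ["--copy-to"],      # paths: source on host, path in image
    }
    return ["systemd-dissect", *switches[action], image, *paths]


def run(command, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    run(dissect_command("foobar.raw"))
    run(dissect_command("foobar.raw", "mount", "/mnt"))
    run(dissect_command("foobar.raw", "copy-from", "/etc/os-release", "os-release"))
    run(dissect_command("foobar.raw", "copy-to", "motd", "/etc/motd"))
```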
&lt;h2&gt;Use #5: Running services directly off a disk image&lt;/h2&gt;
&lt;p&gt;The
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RootImage="&gt;&lt;code&gt;RootImage=&lt;/code&gt;&lt;/a&gt;
setting in service unit files accepts paths to compliant disk images
(or block device nodes), and can mount them automatically, running
service binaries directly off them (in &lt;code&gt;chroot()&lt;/code&gt; style). In fact,
this is the base for the &lt;a href="https://systemd.io/PORTABLE_SERVICES"&gt;Portable
Service&lt;/a&gt; concept of systemd.&lt;/p&gt;
&lt;h2&gt;Use #6: Provisioning disk images&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;systemd&lt;/code&gt; provides various tools that can run operations provisioning
disk images in an "offline" mode. Specifically:&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;systemd-tmpfiles&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;With the &lt;code&gt;--image=&lt;/code&gt; switch
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html"&gt;&lt;code&gt;systemd-tmpfiles&lt;/code&gt;&lt;/a&gt;
can directly operate on a disk image, and for example create all
directories and other inodes defined in its declarative configuration
files included in the image. This can be useful for example to set up
the &lt;code&gt;/var/&lt;/code&gt; or &lt;code&gt;/etc/&lt;/code&gt; tree according to such configuration before
first boot.&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;systemd-sysusers&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Similarly, the &lt;code&gt;--image=&lt;/code&gt; switch of
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-sysusers.html"&gt;&lt;code&gt;systemd-sysusers&lt;/code&gt;&lt;/a&gt;
tells the tool to read the declarative system user specifications
included in the image and synthesize system users from them, writing
them to the &lt;code&gt;/etc/passwd&lt;/code&gt; (and related) files in the image. This is
useful for provisioning these users before the first boot, for example
to ensure UID/GID numbers are pre-allocated, and such allocations not
delayed until first boot.&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;systemd-machine-id-setup&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;--image=&lt;/code&gt; switch of
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-machine-id-setup.html"&gt;&lt;code&gt;systemd-machine-id-setup&lt;/code&gt;&lt;/a&gt;
may be used to provision a fresh machine ID into
&lt;a href="https://www.freedesktop.org/software/systemd/man/machine-id.html"&gt;&lt;code&gt;/etc/machine-id&lt;/code&gt;&lt;/a&gt;
of a disk image, before first boot.&lt;/p&gt;
&lt;h3&gt;&lt;code&gt;systemd-firstboot&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;--image=&lt;/code&gt; switch of
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-firstboot.html"&gt;&lt;code&gt;systemd-firstboot&lt;/code&gt;&lt;/a&gt;
may be used to set various basic system settings (such as root
password, locale information, hostname, …) on the specified disk
image, before booting it up.&lt;/p&gt;
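&lt;p&gt;As a rough sketch, the four tools above could be chained into one
provisioning step. The helper names &lt;code&gt;run&lt;/code&gt; and &lt;code&gt;provision_image&lt;/code&gt;, the
&lt;code&gt;DRY_RUN&lt;/code&gt; variable, the image name and the hostname/locale values are
all made up for illustration; only the &lt;code&gt;--image=&lt;/code&gt; switches and the
tools themselves come from the text above:&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: with DRY_RUN set, only print the command instead
# of running it (handy for checking what would be done to the image).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: provision an image offline, before its first boot.
# Users are created first, since tmpfiles configuration may refer to them.
provision_image() {
    image=$1
    run systemd-sysusers --image="$image"
    run systemd-tmpfiles --image="$image" --create
    run systemd-machine-id-setup --image="$image"
    # Hostname and locale are placeholder values.
    run systemd-firstboot --image="$image" --hostname=myappliance --locale=en_US.UTF-8
}
```

&lt;p&gt;Calling &lt;code&gt;DRY_RUN=1 provision_image foo.raw&lt;/code&gt; only lists the four
commands; without &lt;code&gt;DRY_RUN&lt;/code&gt; they are executed against the image
(which typically requires privileges).&lt;/p&gt;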
&lt;h2&gt;Use #7: Extracting log information&lt;/h2&gt;
&lt;p&gt;The
&lt;a href="https://www.freedesktop.org/software/systemd/man/journalctl.html"&gt;&lt;code&gt;journalctl&lt;/code&gt;&lt;/a&gt;
switch &lt;code&gt;--image=&lt;/code&gt; may be used to show the journal log data included in
a disk image (or, as usual, the specified block device). This is very
useful for analyzing failed systems offline, as it gives direct access
to the logs without any further, manual analysis.&lt;/p&gt;
&lt;h2&gt;Use #8: Automatic repartitioning/growing of file systems&lt;/h2&gt;
&lt;p&gt;The
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-repart.html"&gt;&lt;code&gt;systemd-repart&lt;/code&gt;&lt;/a&gt;
tool may be used to repartition a disk or image in a declarative and
additive way. One primary use-case for it is to run during boot on
physical or VM systems to grow the root file system to the disk size,
or to add in, format, encrypt, populate additional partitions at boot.&lt;/p&gt;
&lt;p&gt;With its &lt;code&gt;--image=&lt;/code&gt; switch the tool may operate on compliant disk
images in an &lt;em&gt;offline&lt;/em&gt; mode of operation: it will then read the partition
definitions that shall be grown or created off the image itself, and
then apply them to the image. This is particularly useful in
combination with the &lt;code&gt;--size=&lt;/code&gt; switch, which allows growing disk images to the
specified size.&lt;/p&gt;
&lt;p&gt;Specifically, consider the following work-flow: you download a
minimized disk image &lt;code&gt;foo.raw&lt;/code&gt; that contains only the minimized
root file system (and maybe an ESP, if you want to boot it on
bare-metal, too). You then run &lt;code&gt;systemd-repart --image=foo.raw
--size=15G&lt;/code&gt; to enlarge the image to 15G, based on the declarative
rules defined in the
&lt;a href="https://www.freedesktop.org/software/systemd/man/repart.d.html"&gt;&lt;code&gt;repart.d/&lt;/code&gt;&lt;/a&gt;
drop-in files included in the image (this means this can grow the root
partition, and/or add in more partitions, for example for &lt;code&gt;/srv&lt;/code&gt; or
so, maybe encrypted with a locally generated key or so). Then, you
proceed to boot it up with &lt;code&gt;systemd-nspawn --image=foo.raw -b&lt;/code&gt;, making
use of the full 15G.&lt;/p&gt;
&lt;h1&gt;Versioning + Multi-Arch&lt;/h1&gt;
&lt;p&gt;Disk images implementing this specification can carry OS executables in one of three ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Only a root file system&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Only a &lt;code&gt;/usr/&lt;/code&gt; file system (in which case the root file system is automatically picked as &lt;code&gt;tmpfs&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Both a root and a &lt;code&gt;/usr/&lt;/code&gt; file system (in which case the two are
   combined: the &lt;code&gt;/usr/&lt;/code&gt; file system is mounted into the root file system,
   possibly in read-only fashion)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;They may also contain OS executables for different architectures,
permitting "multi-arch" disk images that can safely boot up on
multiple CPU architectures. As the root and &lt;code&gt;/usr/&lt;/code&gt; partition type
UUIDs are specific to architectures this is easily done by including
one such partition for &lt;code&gt;x86-64&lt;/code&gt;, and another for &lt;code&gt;aarch64&lt;/code&gt;. If the
image is now used on an &lt;code&gt;x86-64&lt;/code&gt; system, the former partition is
automatically used; on &lt;code&gt;aarch64&lt;/code&gt;, the latter.&lt;/p&gt;
&lt;p&gt;Moreover, these OS executables may be contained in different versions,
to implement a simple versioning scheme: when tools such as
&lt;code&gt;systemd-nspawn&lt;/code&gt; or &lt;code&gt;systemd-gpt-auto-generator&lt;/code&gt; dissect a disk image,
and they find two or more root or &lt;code&gt;/usr/&lt;/code&gt; partitions of the same type
UUID, they will automatically pick the one whose GPT partition label
(a 36 character free-form string every GPT partition may have) is the
newest according to
&lt;a href="https://man7.org/linux/man-pages/man3/strverscmp.3.html"&gt;&lt;code&gt;strverscmp()&lt;/code&gt;&lt;/a&gt;
(OK, truth be told, we don't use &lt;code&gt;strverscmp()&lt;/code&gt; as-is, but a modified
version with some more modern syntax and semantics, but conceptually
identical).&lt;/p&gt;
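&lt;p&gt;To get a feeling for this kind of ordering, here is a small sketch
that picks the "newest" of several partition labels. It uses GNU
&lt;code&gt;sort -V&lt;/code&gt; merely as a stand-in: it is neither &lt;code&gt;strverscmp()&lt;/code&gt; nor
systemd's modified comparison, only conceptually similar. The function
name and the labels are made up:&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: print the label that sorts last in version order.
# GNU "sort -V" compares embedded numbers numerically, so 0.10 sorts
# after 0.9 (a plain string sort would put 0.10 first).
pick_newest_label() {
    printf '%s\n' "$@" | sort -V | tail -n 1
}

# Example labels, as an updater might set them on two or more partitions.
pick_newest_label fooOS_0.9 fooOS_0.10 fooOS_0.2
```

&lt;p&gt;The point of a version-aware comparison is exactly the
&lt;code&gt;0.9&lt;/code&gt; versus &lt;code&gt;0.10&lt;/code&gt; case, where a naive string comparison would pick
the wrong partition.&lt;/p&gt;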
&lt;p&gt;This logic allows implementing a very simple and natural A/B update
scheme: an updater can drop multiple versions of the OS into separate
root or &lt;code&gt;/usr/&lt;/code&gt; partitions, always updating the partition label to the
version included therein once the download is complete. All of the
tools described here will then honour this, and always automatically
pick the newest version of the OS.&lt;/p&gt;
&lt;h1&gt;Verity&lt;/h1&gt;
&lt;p&gt;When building modern OS appliances, security is highly
relevant. Specifically, &lt;em&gt;offline&lt;/em&gt; security matters: an attacker with
physical access should have a difficult time modifying the OS in a way
that goes unnoticed. Think of a car or a cell network base
station: these appliances are usually parked/deployed in environments
attackers can get physical access to. It's essential that in this case
the OS itself is sufficiently protected, so that the attacker cannot just
mount the OS file system image, make modifications (inserting a
backdoor, spying software or similar) and have the system
continue to run without this being immediately detected.&lt;/p&gt;
&lt;p&gt;A great way to implement offline security is via Linux' &lt;code&gt;dm-verity&lt;/code&gt;
subsystem: it allows securely binding immutable disk IO to a single,
short trusted hash value: if an attacker manages to offline modify the
disk image the modified disk image won't match the trusted hash
anymore, and will not be trusted anymore (depending on policy this
then just results in IO errors being generated, or in an automatic
reboot/power-off).&lt;/p&gt;
&lt;p&gt;The Discoverable Partitions Specification declares how to include
Verity validation data in disk images, and how to relate them to the file
systems they protect, thus making it very easy to deploy and work with
such protected images. For example &lt;code&gt;systemd-nspawn&lt;/code&gt; supports a
&lt;code&gt;--root-hash=&lt;/code&gt; switch, which accepts the Verity root hash and then
will automatically assemble &lt;code&gt;dm-verity&lt;/code&gt; with this, automatically
matching up the payload and verity partitions. (Alternatively, just
place a &lt;code&gt;.roothash&lt;/code&gt; file next to the image file).&lt;/p&gt;
&lt;h1&gt;Future&lt;/h1&gt;
&lt;p&gt;The above already is a powerful tool set for working with disk
images. However, there are some more areas I'd like to extend this
logic to:&lt;/p&gt;
&lt;h2&gt;&lt;code&gt;bootctl&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Similar to the other tools mentioned above,
&lt;a href="https://www.freedesktop.org/software/systemd/man/bootctl.html"&gt;&lt;code&gt;bootctl&lt;/code&gt;&lt;/a&gt;
(which is a tool to interface with the boot loader, and install/update
systemd's own EFI boot loader
&lt;a href="https://www.freedesktop.org/software/systemd/man/bootctl.html"&gt;&lt;code&gt;sd-boot&lt;/code&gt;&lt;/a&gt;)
should learn a &lt;code&gt;--image=&lt;/code&gt; switch, to make installation of the boot
loader on disk images easy and natural. It would automatically find
the ESP and other relevant partitions in the image, and copy the boot
loader binaries into them (or update them).&lt;/p&gt;
&lt;h2&gt;&lt;code&gt;coredumpctl&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Similar to the existing &lt;code&gt;journalctl --image=&lt;/code&gt; logic the &lt;code&gt;coredumpctl&lt;/code&gt;
tool should also gain an &lt;code&gt;--image=&lt;/code&gt; switch for extracting coredumps
from compliant disk images. The combination of &lt;code&gt;journalctl --image=&lt;/code&gt;
and &lt;code&gt;coredumpctl --image=&lt;/code&gt; would make it exceptionally easy to work
with OS disk images of appliances and to extract logging and debugging
information from them after failures.&lt;/p&gt;
&lt;p&gt;And that's all for now. Please refer to the specification and the man
pages for further details. If your distribution's installer does not
yet tag the GPT partitions it creates with the right GPT type UUIDs,
consider asking its developers to do so.&lt;/p&gt;
&lt;p&gt;Thank you for your time.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 11 Jun 2021 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2021-06-11:/blog/the-wondrous-world-of-discoverable-gpt-disk-images.html</guid><category>projects</category></item><item><title>File Descriptor Limits</title><link>https://0pointer.net/blog/file-descriptor-limits.html</link><description>&lt;p&gt;&lt;em&gt;TL;DR: don't use &lt;code&gt;select()&lt;/code&gt; + bump the &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; soft limit to
the hard limit in your modern programs.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The primary way to reference, allocate and pin runtime OS resources on
Linux today is file descriptors ("fds"). Originally they were used to
reference open files and directories and maybe a bit more, but today
they may be used to reference almost any kind of runtime resource in
Linux userspace, including open devices, memory
(&lt;a href="https://man7.org/linux/man-pages/man2/memfd_create.2.html"&gt;&lt;code&gt;memfd_create(2)&lt;/code&gt;&lt;/a&gt;),
timers
(&lt;a href="https://man7.org/linux/man-pages/man2/timerfd_create.2.html"&gt;&lt;code&gt;timerfd_create(2)&lt;/code&gt;&lt;/a&gt;)
and even processes (with the new
&lt;a href="https://man7.org/linux/man-pages/man2/pidfd_open.2.html"&gt;&lt;code&gt;pidfd_open(2)&lt;/code&gt;&lt;/a&gt;
system call). In a way, the philosophically skewed UNIX concept of
"everything is a file" through the proliferation of fds actually
acquires a bit of sensible meaning: "everything &lt;em&gt;has&lt;/em&gt; a file
&lt;em&gt;descriptor&lt;/em&gt;" is certainly a much better motto to adopt.&lt;/p&gt;
&lt;p&gt;Because of this proliferation of fds, non-trivial modern programs tend
to have to deal with substantially more fds at the same time than they
traditionally did. Today, you'll often encounter real-life programs
that have a few thousand fds open at the same time.&lt;/p&gt;
&lt;p&gt;As with most runtime resources on Linux, limits are enforced on file
descriptors: once you hit the resource limit configured via
&lt;a href="https://man7.org/linux/man-pages/man2/getrlimit.2.html"&gt;&lt;code&gt;RLIMIT_NOFILE&lt;/code&gt;&lt;/a&gt;
any attempt to allocate more is refused with the &lt;code&gt;EMFILE&lt;/code&gt; error —
until you close a couple of those you already have open.&lt;/p&gt;
&lt;p&gt;Because fds weren't such a universal concept traditionally, the limit
of &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; used to be quite low. Specifically, when the Linux
kernel first invokes userspace it still sets &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; to a low
value of 1024 (soft) and 4096 (hard). (Quick explanation: the &lt;em&gt;soft&lt;/em&gt;
limit is what matters and causes the &lt;code&gt;EMFILE&lt;/code&gt; issues, the &lt;em&gt;hard&lt;/em&gt; limit
is a secondary limit that processes may bump their soft limit to — if
they like — without requiring further privileges to do so. Bumping the
limit further would require privileges, however.) A limit of 1024 fds
made fds a &lt;em&gt;scarce&lt;/em&gt; resource: APIs tried to be careful with using fds,
since you simply couldn't have that many of them at the same
time. This resulted in some questionable coding decisions and
concepts at various places: often secondary descriptors that are very
similar to fds — but were not actually fds — were introduced
(e.g. inotify watch descriptors), simply to exempt them from the low
limits enforced on true fds. Or code tried to aggressively close fds
when not absolutely needing them (e.g. &lt;code&gt;ftw()&lt;/code&gt;/&lt;code&gt;nftw()&lt;/code&gt;), losing the
nice + stable "pinning" effect of open fds.&lt;/p&gt;
&lt;p&gt;Worse though is that certain OS level APIs were designed having only
the low limits in mind. The worst offender being the BSD/POSIX
&lt;a href="https://man7.org/linux/man-pages/man2/select.2.html"&gt;&lt;code&gt;select(2)&lt;/code&gt;&lt;/a&gt;
system call: it only works with fds in the numeric range of 0…1023
(aka &lt;code&gt;FD_SETSIZE&lt;/code&gt;-1). If you have an fd outside of this range, tough
luck: &lt;code&gt;select()&lt;/code&gt; won't work, and only if you are lucky you'll detect
that and can handle it somehow.&lt;/p&gt;
&lt;p&gt;Linux fds are exposed as simple integers, and for most calls it is
guaranteed that the lowest unused integer is allocated for new
fds. Thus, as long as the &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; soft limit is set to 1024
everything remains compatible with &lt;code&gt;select()&lt;/code&gt;: the resulting fds will
also be below 1024. Yay. If we'd bump the soft limit above this
threshold though and at some point in time an fd higher than the
threshold is allocated, this fd would not be compatible with
&lt;code&gt;select()&lt;/code&gt; anymore.&lt;/p&gt;
&lt;p&gt;Because of that, indiscriminately increasing the soft &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt;
resource limit today for every userspace process is problematic: as
long as there's userspace code still using &lt;code&gt;select()&lt;/code&gt; doing so will
risk triggering hard-to-handle, hard-to-debug errors all over the
place.&lt;/p&gt;
&lt;p&gt;However, given the nowadays ubiquitous use of fds for all
kinds of resources (did you know, an eBPF program is an fd? and a
cgroup too? and attaching an eBPF program to cgroup is another fd? …),
we'd really like to raise the limit anyway. 🤔&lt;/p&gt;
&lt;p&gt;So before we continue thinking about this problem, let's make the
problem more complex (…uh, I mean… "more exciting") first. Having just
one hard and one soft per-process limit on fds is boring. Let's add
more limits on fds to the mix. Specifically on Linux there are two
system-wide sysctls: &lt;code&gt;fs.nr_open&lt;/code&gt; and &lt;code&gt;fs.file-max&lt;/code&gt;. (Don't ask me why
one uses a dash and the other an underscore, or why there are two of
them...) On today's kernels they kinda lost their relevance. They had
some originally, because fds weren't accounted by any other
counter. But today, the kernel tracks fds mostly as small pieces of
memory allocated on userspace requests — because that's ultimately
what they are —, and thus charges them to the memory accounting done
anyway.&lt;/p&gt;
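&lt;p&gt;If you are curious what these knobs are set to on your system,
something like the following sketch shows the two per-process limits of
the current shell next to the two sysctls. The function name is made up;
the &lt;code&gt;/proc/sys/fs/&lt;/code&gt; files are simply the file system view of the
sysctls mentioned above:&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: print the per-process RLIMIT_NOFILE limits of this
# shell, plus the two system-wide sysctls as exposed in /proc/sys/fs/.
show_fd_limits() {
    echo "RLIMIT_NOFILE soft: $(ulimit -Sn)"
    echo "RLIMIT_NOFILE hard: $(ulimit -Hn)"
    echo "fs.nr_open: $(cat /proc/sys/fs/nr_open)"
    echo "fs.file-max: $(cat /proc/sys/fs/file-max)"
}

show_fd_limits
```

&lt;p&gt;The actual numbers depend on your kernel, your service manager and
on whatever limits were configured for your session.&lt;/p&gt;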
&lt;p&gt;So now, we have four limits (actually: five if you count the memory
accounting) on the same kind of resource, and all of them make a
resource artificially scarce that we don't want to be scarce. So what
to do?&lt;/p&gt;
&lt;p&gt;Back in systemd v240 already (i.e. 2019) we decided to do something
about it. Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Automatically at boot we'll now bump the two sysctls to their
  maximum, making them effectively ineffective. This one was easy. We
  got rid of two pretty much redundant knobs. Nice!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; hard limit is bumped substantially to 512K. Yay,
  cheap fds! &lt;em&gt;You&lt;/em&gt; may have an fd, and &lt;em&gt;you&lt;/em&gt;, and &lt;em&gt;you&lt;/em&gt; as well,
  &lt;em&gt;everyone&lt;/em&gt; may have an fd!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;But … we left the soft &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; limit at 1024. We weren't
  quite ready to break all programs still using &lt;code&gt;select()&lt;/code&gt; in 2019
  yet. But it's not as bad as it might sound I think: given the hard
  limit is bumped every program can easily opt-in to a larger number
  of fds, by setting the soft limit to the hard limit early on —
  without requiring privileges.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So effectively, with this approach fds should be much less scarce (at
least for programs that opt into that), and the limits should be much
easier to configure, since there are only two knobs now one really
needs to care about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Configure the &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; hard limit to the maximum number of
  fds you actually want to allow a process.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the program code then either bump the soft to the hard limit, or
  not. If you do, you basically declare "I understood the problem, I
  promise to not use &lt;code&gt;select()&lt;/code&gt;, drown me in fds please!". If you don't
  then effectively everything remains as it always was.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Apparently this approach worked, since the negative feedback on the change
was even scarcer than fds traditionally were (ha, fun!). We got
reports from pretty much only two projects that were bitten by the
change (one being a JVM implementation): they already bumped their
soft limit automatically to their hard limit during program
initialization, and then allocated an array with one entry per
possible fd. With the new high limit this resulted in one massive
allocation that traditionally was just a few K, and this caused memory
checks to be hit.&lt;/p&gt;
&lt;p&gt;Anyway, here's the take away of this blog story:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Don't use &lt;code&gt;select()&lt;/code&gt; anymore in 2021. Use &lt;code&gt;poll()&lt;/code&gt;, &lt;code&gt;epoll&lt;/code&gt;,
  &lt;code&gt;io_uring&lt;/code&gt;, …, but for heaven's sake don't use &lt;code&gt;select()&lt;/code&gt;. It might
  have been all the rage in the 1990s but it doesn't scale and is
  simply not designed for today's programs. I wish the man page of
  &lt;code&gt;select()&lt;/code&gt; would make clearer how icky it is and that there are
  plenty of preferable APIs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you hack on a program that potentially uses a lot of fds, add
  &lt;a href="https://github.com/systemd/systemd/blob/e7901aba1480db21e06e21cef4f6486ad71b2ec5/src/basic/rlimit-util.c#L373"&gt;some simple
  code&lt;/a&gt;
  somewhere to its start-up that bumps the &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; soft limit
  to the hard limit. But if you do this, you have to make sure your
  code (and any code that you link to from it) refrains from using
  &lt;code&gt;select()&lt;/code&gt;. (Note: there's at least one glibc NSS plugin using
  &lt;code&gt;select()&lt;/code&gt; internally. Given that NSS modules can end up being
  loaded into pretty much &lt;em&gt;any&lt;/em&gt; process such modules should probably
  be considered just buggy.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If said program you hack on forks off foreign programs, make sure to
  reset the &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; soft limit &lt;a href="https://github.com/systemd/systemd/blob/e7901aba1480db21e06e21cef4f6486ad71b2ec5/src/basic/rlimit-util.c#L394"&gt;back to
  1024&lt;/a&gt;
  for them. Just because your program might be fine with fds &amp;gt;= 1024
  it doesn't mean that those foreign programs might. And unfortunately
  &lt;code&gt;RLIMIT_NOFILE&lt;/code&gt; is inherited down the process tree unless explicitly
  set.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
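&lt;p&gt;The linked code is C, but the two steps from the list above can be
sketched in a few lines of shell too, for example for a wrapper script.
The function names are made up, and the sketch assumes the hard limit is
at least 1024:&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: raise the soft RLIMIT_NOFILE limit to the hard
# limit. Only do this if nothing you run in here relies on select().
bump_nofile() {
    ulimit -Sn "$(ulimit -Hn)"
}

# Hypothetical helper: run a foreign program with the soft limit reset
# to 1024. The subshell keeps the change local to that one program.
run_foreign() {
    (
        ulimit -Sn 1024
        exec "$@"
    )
}

bump_nofile
```

&lt;p&gt;After &lt;code&gt;bump_nofile&lt;/code&gt;, something like &lt;code&gt;run_foreign bash -c 'ulimit
-Sn'&lt;/code&gt; should report the traditional soft limit again, while the
calling shell keeps its raised limit.&lt;/p&gt;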
&lt;p&gt;And that's all I have for today. I hope this was enlightening.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 19 May 2021 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2021-05-19:/blog/file-descriptor-limits.html</guid><category>projects</category></item><item><title>Unlocking LUKS2 volumes with TPM2, FIDO2, PKCS#11 Security Hardware on systemd 248</title><link>https://0pointer.net/blog/unlocking-luks2-volumes-with-tpm2-fido2-pkcs11-security-hardware-on-systemd-248.html</link><description>&lt;p&gt;&lt;em&gt;TL;DR: It's now easy to unlock your LUKS2 volume with a FIDO2
security token (e.g. YubiKey, Nitrokey FIDO2, AuthenTrend
ATKey.Pro). And TPM2 unlocking is easy now too.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Blogging is a lot of work, and a lot less fun than hacking. I mostly
focus on the latter because of that, but from time to time I guess
stuff is just too interesting to not be blogged about. Hence here,
finally, another blog story about exciting new features in systemd.&lt;/p&gt;
&lt;p&gt;With the upcoming systemd v248 the
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptsetup@.service.html"&gt;&lt;code&gt;systemd-cryptsetup&lt;/code&gt;&lt;/a&gt;
component of systemd (which is responsible for assembling encrypted
volumes during boot) gained direct support for unlocking encrypted
storage with three types of security hardware:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Unlocking with FIDO2 security tokens (well, at least with those
   which implement the &lt;code&gt;hmac-secret&lt;/code&gt; extension; most do). i.e. your
   YubiKeys (series 5 and above), Nitrokey FIDO2, AuthenTrend
   ATKey.Pro and such.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unlocking with TPM2 security chips (pretty ubiquitous on non-budget
   PCs/laptops/…)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unlocking with PKCS#11 security tokens, i.e. your smartcards and
   older YubiKeys (the ones that implement PIV). (Strictly speaking
   this was supported on older systemd already, but was a lot more
   "manual".)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For completeness' sake, let's keep in mind that the component also
allows unlocking with these more traditional mechanisms:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Unlocking interactively with a user-entered passphrase (i.e. the
   way most people probably already deploy it, supported since
   about forever)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unlocking via key file on disk (optionally on removable media
   plugged in at boot), supported since forever.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unlocking via a key acquired through trivial
   &lt;code&gt;AF_UNIX&lt;/code&gt;/&lt;code&gt;SOCK_STREAM&lt;/code&gt; socket IPC. (Also new in v248)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unlocking via &lt;em&gt;recovery&lt;/em&gt; &lt;em&gt;keys&lt;/em&gt;. These are pretty much the same
   thing as a regular passphrase (and in fact can be entered wherever
   a passphrase is requested) — the main difference being that they
   are always generated by the computer, and thus have guaranteed high
   entropy, typically higher than user-chosen passphrases. They are
   generated in a way they are easy to type, in many cases even if the
   local key map is misconfigured. (Also new in v248)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In this blog story, let's focus on the first three items, i.e. those
that talk to specific types of hardware for implementing unlocking.&lt;/p&gt;
&lt;p&gt;To make working with security tokens and TPM2 easy, a new, small tool
was added to the systemd tool set:
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-cryptenroll.html"&gt;systemd-cryptenroll&lt;/a&gt;. Its
only purpose is to make it easy to enroll your security token/chip of
choice into an encrypted volume. It works with any LUKS2 volume, and
embeds a tiny bit of meta-information into the LUKS2 header with
parameters necessary for the unlock operation.&lt;/p&gt;
&lt;h1&gt;Unlocking with FIDO2&lt;/h1&gt;
&lt;p&gt;So, let's see how this fits together in the FIDO2 case. Most likely
this is what you want to use if you have one of these fancy FIDO2 tokens
(which need to implement the &lt;code&gt;hmac-secret&lt;/code&gt; extension, as
mentioned). Let's say you already have your LUKS2 volume set up, and
previously unlocked it with a simple passphrase. Plug in your token,
and run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;systemd-cryptenroll&lt;span class="w"&gt; &lt;/span&gt;--fido2-device&lt;span class="o"&gt;=&lt;/span&gt;auto&lt;span class="w"&gt; &lt;/span&gt;/dev/sda5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;(Replace &lt;code&gt;/dev/sda5&lt;/code&gt; with the underlying block device of your volume).&lt;/p&gt;
&lt;p&gt;This will enroll the key as an additional way to unlock the volume,
and embed all necessary information for it in the LUKS2 volume
header. Before we can unlock the volume with this at boot, we need to
allow FIDO2 unlocking via
&lt;a href="https://www.freedesktop.org/software/systemd/man/crypttab.html"&gt;&lt;code&gt;/etc/crypttab&lt;/code&gt;&lt;/a&gt;. For
that, find the right entry for your volume in that file, and edit it
like so:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;myvolume /dev/sda5 - fido2-device=auto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Replace &lt;code&gt;myvolume&lt;/code&gt; and &lt;code&gt;/dev/sda5&lt;/code&gt; with the right volume name, and
underlying device of course. Key here is the &lt;code&gt;fido2-device=auto&lt;/code&gt;
option you need to add to the fourth column in the file. It tells
&lt;code&gt;systemd-cryptsetup&lt;/code&gt; to use the FIDO2 metadata now embedded in the
LUKS2 header, wait for the FIDO2 token to be plugged in at boot
(utilizing &lt;code&gt;systemd-udevd&lt;/code&gt;, …) and unlock the volume with it.&lt;/p&gt;
&lt;p&gt;And that's it already. Easy-peasy, no?&lt;/p&gt;
&lt;p&gt;Note that all of this doesn't modify the FIDO2 token itself in any
way. Moreover you can enroll the same token in as many volumes as you
like. Since all enrollment information is stored in the LUKS2 header
(and not on the token) there are no bounds on any of this. (OK, well,
admittedly, there's a cap on LUKS2 key slots per volume, i.e. you
can't enroll more than a bunch of keys per volume.)&lt;/p&gt;
&lt;h1&gt;Unlocking with PKCS#11&lt;/h1&gt;
&lt;p&gt;Let's now have a closer look at how the same works with a PKCS#11
compatible security token or smartcard. For this to work, you need a
device that can store an RSA key pair. I figure most security
tokens/smartcards that implement PIV qualify. How you actually get the
keys onto the device might differ though. Here's how you do this for
any YubiKey that implements the PIV feature:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;ykman&lt;span class="w"&gt; &lt;/span&gt;piv&lt;span class="w"&gt; &lt;/span&gt;reset
&lt;span class="gp"&gt;# &lt;/span&gt;ykman&lt;span class="w"&gt; &lt;/span&gt;piv&lt;span class="w"&gt; &lt;/span&gt;generate-key&lt;span class="w"&gt; &lt;/span&gt;-a&lt;span class="w"&gt; &lt;/span&gt;RSA2048&lt;span class="w"&gt; &lt;/span&gt;9d&lt;span class="w"&gt; &lt;/span&gt;pubkey.pem
&lt;span class="gp"&gt;# &lt;/span&gt;ykman&lt;span class="w"&gt; &lt;/span&gt;piv&lt;span class="w"&gt; &lt;/span&gt;generate-certificate&lt;span class="w"&gt; &lt;/span&gt;--subject&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Knobelei&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;9d&lt;span class="w"&gt; &lt;/span&gt;pubkey.pem
&lt;span class="gp"&gt;# &lt;/span&gt;rm&lt;span class="w"&gt; &lt;/span&gt;pubkey.pem
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;(This chain of commands erases what was stored in the PIV feature of your
token before, be careful!)&lt;/p&gt;
&lt;p&gt;For tokens/smartcards from other vendors a different series of
commands might work. Once you have a key pair on it, you can enroll it
with a LUKS2 volume like so:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;systemd-cryptenroll&lt;span class="w"&gt; &lt;/span&gt;--pkcs11-token-uri&lt;span class="o"&gt;=&lt;/span&gt;auto&lt;span class="w"&gt; &lt;/span&gt;/dev/sda5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Just like the same command's invocation in the FIDO2 case, this enrolls
the security token as an additional way to unlock the volume; any
passphrases you already have enrolled remain enrolled.&lt;/p&gt;
&lt;p&gt;For the PKCS#11 case you need to edit your &lt;code&gt;/etc/crypttab&lt;/code&gt; entry like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;myvolume /dev/sda5 - pkcs11-uri=auto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you have a security token that implements both PKCS#11 PIV and
FIDO2 I'd probably enroll it as FIDO2 device, given it's the more
contemporary, future-proof standard. Moreover, it requires no special
preparation in order to get an RSA key onto the device: FIDO2 keys
typically &lt;em&gt;just&lt;/em&gt; &lt;em&gt;work&lt;/em&gt;.&lt;/p&gt;
&lt;h1&gt;Unlocking with TPM2&lt;/h1&gt;
&lt;p&gt;Most modern (non-budget) PC hardware (and other kinds of hardware too)
nowadays comes with a TPM2 security chip. In many ways a TPM2 chip is
a smartcard that is soldered onto the mainboard of your system. Unlike
your usual USB-connected security tokens you thus cannot remove them
from your PC, which means they address quite a different security
scenario: they aren't immediately comparable to a physical key you can
take with you that unlocks some door, but they are a key you leave at
the door, but that refuses to be turned by anyone but you.&lt;/p&gt;
&lt;p&gt;Even though this sounds a lot weaker than the FIDO2/PKCS#11 model, TPM2
still brings benefits for securing your systems: because the
cryptographic key material stored in TPM2 devices cannot be extracted
(at least that's the theory), if you bind your hard disk encryption to
it, it means attackers cannot just copy your disk and analyze it
offline — they always need access to the TPM2 chip too to have a
chance to acquire the necessary cryptographic keys. Thus, they can
still steal your whole PC and analyze it, but they cannot just copy
the disk without you noticing and analyze the copy.&lt;/p&gt;
&lt;p&gt;Moreover, you can bind the ability to unlock the hard disk to specific
software versions: for example you could say that only your trusted
Fedora Linux can unlock the device, but not any arbitrary OS some
hacker might boot from a USB stick they plugged in. Thus, if you trust
your OS vendor, you can entrust storage unlocking to the vendor's OS
together with your TPM2 device, and thus can be reasonably sure
intruders cannot decrypt your data unless they both hack your OS
vendor &lt;em&gt;and&lt;/em&gt; steal/break your TPM2 chip.&lt;/p&gt;
&lt;p&gt;Here's how you enroll your LUKS2 volume with your TPM2 chip:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;systemd-cryptenroll&lt;span class="w"&gt; &lt;/span&gt;--tpm2-device&lt;span class="o"&gt;=&lt;/span&gt;auto&lt;span class="w"&gt; &lt;/span&gt;--tpm2-pcrs&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;7&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/dev/sda5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This looks almost as straightforward as the two earlier
&lt;code&gt;systemd-cryptenroll&lt;/code&gt; command lines, if it weren't for the
&lt;code&gt;--tpm2-pcrs=&lt;/code&gt; part. With that option you can specify to which TPM2
PCRs you want to bind the enrollment. TPM2 PCRs are a set of
(typically 24) hash values that every TPM2 equipped system at boot
calculates from all the software that is invoked during the boot
sequence, in a secure, unfakable way (this is called
"measurement"). If you bind unlocking to a specific value of a
specific PCR you thus require that the system follows the same
sequence of software at boot to re-acquire the disk encryption
key. Sounds complex? Well, that's because it is.&lt;/p&gt;
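&lt;p&gt;To make the measurement idea a bit more concrete, here is a minimal Python sketch of the PCR "extend" operation for a SHA-256 bank. The component names are hypothetical stand-ins, and a real TPM2 does this in hardware; the point is only that the final value depends on every measured component and on their order.&lt;/p&gt;

```python
import hashlib

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes for a SHA-256 PCR bank


def extend(pcr, measured_data):
    """Return the new PCR value after measuring one boot component."""
    digest = hashlib.sha256(measured_data).digest()
    # A PCR can never be set directly, only extended:
    # new = H(old || H(data)).
    return hashlib.sha256(pcr + digest).digest()


def replay(components):
    """Replay a boot sequence, starting from the all-zero reset value."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


# Hypothetical stand-ins for the real boot components.
enrolled = replay([b"firmware", b"boot loader", b"kernel 5.10"])
same_boot = replay([b"firmware", b"boot loader", b"kernel 5.10"])
updated = replay([b"firmware", b"boot loader", b"kernel 5.11"])

print(same_boot == enrolled)  # the same sequence reproduces the value
print(updated == enrolled)    # a changed component yields a different value
```

&lt;p&gt;A key bound to the &lt;code&gt;enrolled&lt;/code&gt; value would therefore only be released when the exact same sequence is measured again.&lt;/p&gt;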
&lt;p&gt;For now, let's see how we have to modify your &lt;code&gt;/etc/crypttab&lt;/code&gt; to
unlock via TPM2:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;myvolume /dev/sda5 - tpm2-device=auto
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This part is easy again: the &lt;code&gt;tpm2-device=&lt;/code&gt; option is what tells
&lt;code&gt;systemd-cryptsetup&lt;/code&gt; to use the TPM2 metadata from the LUKS2 header
and to wait for the TPM2 device to show up.&lt;/p&gt;
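&lt;p&gt;If the four fields of that &lt;code&gt;/etc/crypttab&lt;/code&gt; line look opaque, this small Python sketch splits them up: volume name, block device, key file and a comma-separated option list. It is an illustration of the file format only, not how &lt;code&gt;systemd-cryptsetup&lt;/code&gt; parses it, and the function name is made up.&lt;/p&gt;

```python
def parse_crypttab_line(line):
    """Split one crypttab line into its four whitespace-separated fields."""
    fields = line.split()
    # The key file and options fields are optional; pad them with "-".
    fields += ["-"] * (4 - len(fields))
    name, device, key_file, options = fields[:4]

    parsed_options = {}
    if options != "-":
        for item in options.split(","):
            key, _, value = item.partition("=")
            parsed_options[key] = value

    return {
        "name": name,
        "device": device,
        # "-" and "none" both mean that no key file is configured.
        "key_file": None if key_file in ("-", "none") else key_file,
        "options": parsed_options,
    }


entry = parse_crypttab_line("myvolume /dev/sda5 - tpm2-device=auto")
print(entry)
```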
&lt;h1&gt;Bonus: Recovery Key Enrollment&lt;/h1&gt;
&lt;p&gt;FIDO2, PKCS#11 and TPM2 security tokens and chips pair well with
recovery keys: since you don't need to type in your password every day
anymore it makes sense to get rid of it, and instead enroll a
high-entropy recovery key you then print out or scan off screen and
store in a safe, physical location. In other words: forget about good ol'
passphrase-based unlocking, go for FIDO2 plus recovery key instead!
Here's how you do it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;systemd-cryptenroll&lt;span class="w"&gt; &lt;/span&gt;--recovery-key&lt;span class="w"&gt; &lt;/span&gt;/dev/sda5
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This will generate a key, enroll it in the LUKS2 volume, show it to
you on screen and generate a QR code you may scan off screen if you
like. The key has high entropy, and can be entered wherever you can
enter a passphrase. Because of that you don't have to modify
&lt;code&gt;/etc/crypttab&lt;/code&gt; to make the recovery key work.&lt;/p&gt;
&lt;h1&gt;Future&lt;/h1&gt;
&lt;p&gt;There's still plenty of room for further improvement in all of this. In
particular for the TPM2 case: what the text above doesn't really
mention is that binding your encrypted volume unlocking to specific
software versions (i.e. kernel + initrd + OS versions) actually sucks
hard: if you naively update your system to newer versions you might
lose access to your TPM2 enrolled keys (which isn't terrible, after
all you did enroll a recovery key — &lt;em&gt;right&lt;/em&gt;? — which you then can use
to regain access). To solve this some more integration with
distributions would be necessary: whenever they upgrade the system
they'd have to make sure to enroll the TPM2 again — with the PCR
hashes matching the new version. And whenever they remove an old
version of the system they need to remove the old TPM2
enrollment. Alternatively TPM2 also knows a concept of &lt;em&gt;signed&lt;/em&gt; PCR
hash values. In this mode the distro could just ship a set of PCR
signatures which would unlock the TPM2 keys. (But quite frankly I
don't really see the point: whether you drop in a signature file on
each system update, or enroll a new set of PCR hashes in the LUKS2
header doesn't make much of a difference). Either way, to make TPM2
enrollment smooth some more integration work with your distribution's
system update mechanisms needs to happen. And yes, because of this OS
updating complexity the example above — where I referenced your trusty
Fedora Linux — doesn't actually work IRL (yet? hopefully…). Nothing
updates the enrollment automatically after you initially enrolled it,
hence after the first kernel/initrd update you have to manually
re-enroll things again, and again, and again … after every update.&lt;/p&gt;
&lt;p&gt;The TPM2 could also be used for other kinds of key policies, which we
might look into adding later too. For example, Windows uses TPM2 stuff to
allow short (4 digits or so) "PINs" for unlocking the harddisk,
i.e. kind of a low-entropy password you type in. The reason this is
reasonably safe is that in this case the PIN is passed to the TPM2
which enforces that not more than some limited amount of unlock
attempts may be made within some time frame, and that after too many
attempts the PIN is invalidated altogether. This makes dictionary
attacks harder (which would normally be easier given the short length
of the PINs).&lt;/p&gt;
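&lt;p&gt;Here is a toy Python model of why a short PIN becomes tolerable once the hardware counts failed attempts. The class and its parameters are entirely hypothetical and do not correspond to any real TPM2 interface; it also only models the final invalidation, not the time-based rate limiting.&lt;/p&gt;

```python
import hmac


class ToyPinGuard:
    """Toy model of hardware-enforced PIN retry limiting (not a TPM2 API)."""

    def __init__(self, pin, secret, max_failures=5):
        self._pin = pin
        self._secret = secret
        self._max_failures = max_failures
        self._failures = 0

    @property
    def locked_out(self):
        return self._failures == self._max_failures

    def unlock(self, attempt):
        if self.locked_out:
            raise PermissionError("too many failed attempts, PIN invalidated")
        if hmac.compare_digest(attempt, self._pin):
            self._failures = 0
            return self._secret
        self._failures += 1
        return None


# Brute-forcing all 10000 four-digit PINs stops almost immediately.
guard = ToyPinGuard(pin="4711", secret=b"disk key", max_failures=5)
tried = 0
for candidate in range(10000):
    try:
        guard.unlock(f"{candidate:04d}")
    except PermissionError:
        break
    tried += 1
print(tried, "guesses were evaluated before lockout")
```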
&lt;h1&gt;Postscript&lt;/h1&gt;
&lt;p&gt;(BTW: Yubico sent me two YubiKeys for testing, Nitrokey a Nitrokey
FIDO2, and AuthenTrend three ATKey.Pro tokens, thank you! — That's why
you see all those references to YubiKey/Nitrokey/AuthenTrend devices
in the text above: it's the hardware I had to test this with. That
said, I also tested the FIDO2 stuff with a SoloKey I bought, where it
also worked fine. And yes, you!, other vendors!, who might be reading
this, please send me your security tokens &lt;em&gt;for&lt;/em&gt; &lt;em&gt;free&lt;/em&gt;, too, and I
might test things with them as well. No promises though. And I am not
going to give them back, if you do, sorry. ;-))&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 13 Jan 2021 00:00:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2021-01-13:/blog/unlocking-luks2-volumes-with-tpm2-fido2-pkcs11-security-hardware-on-systemd-248.html</guid><category>projects</category></item><item><title>ASG! 2019 CfP Re-Opened!</title><link>https://0pointer.net/blog/asg-2019-cfp-re-opened.html</link><description>&lt;p&gt;&lt;large&gt;&lt;b&gt;The All Systems Go! 2019 Call for Participation Re-Opened for ONE DAY!&lt;/b&gt;&lt;/large&gt;&lt;/p&gt;
&lt;p&gt;Due to popular request we have re-opened the Call for Participation
(CFP) for &lt;a href="https://all-systems-go.io/"&gt;All Systems Go!  2019&lt;/a&gt; for one
day. It will close again &lt;em&gt;TODAY&lt;/em&gt;, on the 15th of July 2019, midnight Central
European Summer Time! If you missed the deadline so far, we’d like to
invite you to submit your proposals for consideration to &lt;a href="https://cfp.all-systems-go.io/ASG2019/cfp"&gt;the CFP
submission site&lt;/a&gt; quickly!
(And yes, this is the last extension, there's not going to be any
more extensions.)&lt;/p&gt;
&lt;p&gt;&lt;img src="https://pbs.twimg.com/profile_banners/869627937145802752/1551356869/1500x500" alt="ASG image" width="1000" height="333"/&gt;&lt;/p&gt;
&lt;p&gt;All Systems Go! is everybody's favourite low-level Userspace Linux
conference, taking place in Berlin, Germany on September 20-22, 2019.&lt;/p&gt;
&lt;p&gt;For more information please visit &lt;a href="https://all-systems-go.io/"&gt;our conference
website&lt;/a&gt;!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 15 Jul 2019 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2019-07-15:/blog/asg-2019-cfp-re-opened.html</guid><category>projects</category></item><item><title>Walkthrough for Portable Services in Go</title><link>https://0pointer.net/blog/walkthrough-for-portable-services-in-go.html</link><description>&lt;h1&gt;Portable Services Walkthrough (Go Edition)&lt;/h1&gt;
&lt;p&gt;A few months ago I posted &lt;a href="http://0pointer.net/blog/walkthrough-for-portable-services.html"&gt;a blog story with a walkthrough of systemd
Portable
Services&lt;/a&gt;. The
example service given was written in C, and the image was built with
&lt;a href="https://github.com/systemd/mkosi"&gt;&lt;code&gt;mkosi&lt;/code&gt;&lt;/a&gt;. In this blog story I'd
like to revisit the exercise, but this time focus on a different
aspect: modern programming languages like Go and Rust push users a lot
more towards static linking of libraries than the usual dynamic
linking preferred by C (at least in the way C is used by traditional
Linux distributions).&lt;/p&gt;
&lt;p&gt;Static linking means we can greatly simplify image building: if we
don't have to link against shared libraries during runtime we don't
have to include them in the portable service image. And that means
pretty much all need for building an image from a Linux distribution
of some kind goes away as we'll have next to no dependencies that
would require us to rely on a distribution package manager or
distribution packages. In fact, as it turns out, we only need as few
as four files in the portable service image to be fully functional.&lt;/p&gt;
&lt;p&gt;So, let's have a closer look at how such an image can be put
together. All of the following is available in &lt;a href="https://github.com/systemd/portable-walkthrough-go"&gt;this git
repository&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;A Simple Go Service&lt;/h2&gt;
&lt;p&gt;Let's start with a simple Go service, an HTTP service that simply
counts how often a page from it is requested. Here are the sources:
&lt;a href="https://github.com/systemd/portable-walkthrough-go/blob/master/main.go"&gt;main.go&lt;/a&gt;
— note that I am not a seasoned Go programmer, hence please be
gracious.&lt;/p&gt;
&lt;p&gt;The service implements systemd's socket activation protocol, and thus
can receive bound TCP listener sockets from systemd, using the
&lt;code&gt;$LISTEN_PID&lt;/code&gt; and &lt;code&gt;$LISTEN_FDS&lt;/code&gt; environment variables.&lt;/p&gt;
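&lt;p&gt;The receiving side of that protocol is small. The walkthrough's real service is written in Go; the following is only a rough Python sketch of the same logic, with made-up function names. Passed file descriptors start at 3, and the variables must be ignored when &lt;code&gt;$LISTEN_PID&lt;/code&gt; does not match our own PID.&lt;/p&gt;

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first passed descriptor, after stdin/stdout/stderr


def listen_fds(environ=None, pid=None):
    """Return the file descriptor numbers passed in by systemd, if any."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ["LISTEN_PID"]) != pid:
            return []  # the variables were meant for some other process
        count = int(environ["LISTEN_FDS"])
    except (KeyError, ValueError):
        return []  # not socket activated
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def listeners():
    """Wrap each passed descriptor in a socket object."""
    return [socket.socket(fileno=fd) for fd in listen_fds()]


# Simulated environment of a process with PID 42 that received two sockets.
print(listen_fds({"LISTEN_PID": "42", "LISTEN_FDS": "2"}, pid=42))
```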
&lt;p&gt;The service will store the counter data in the directory indicated in
the &lt;code&gt;$STATE_DIRECTORY&lt;/code&gt; environment variable, which happens to be an
environment variable current systemd versions set based on the
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RuntimeDirectory="&gt;&lt;code&gt;StateDirectory=&lt;/code&gt;&lt;/a&gt;
setting in service files.&lt;/p&gt;
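&lt;p&gt;Persisting the counter could look roughly like this. Again a Python sketch rather than the actual Go code: the file name &lt;code&gt;counter&lt;/code&gt; is an assumption, and so is the idea that only a single state directory is configured.&lt;/p&gt;

```python
import os
import tempfile
from pathlib import Path


def bump_counter(state_dir=None):
    """Increment and persist the visitor counter, returning the new value."""
    # systemd sets $STATE_DIRECTORY when StateDirectory= is configured.
    # Assumption: exactly one directory is configured, and "counter" is a
    # hypothetical file name.
    directory = Path(state_dir or os.environ["STATE_DIRECTORY"])
    path = directory / "counter"
    try:
        count = int(path.read_text())
    except (FileNotFoundError, ValueError):
        count = 0  # first request, or unreadable state
    count += 1
    # Write to a temporary file and rename it, so that a crash cannot
    # leave a half-written counter behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(str(count))
    os.replace(tmp, path)
    return count


# Demo against a throwaway directory instead of a real state directory.
with tempfile.TemporaryDirectory() as demo_dir:
    first = bump_counter(demo_dir)
    second = bump_counter(demo_dir)
print(first, second)
```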
&lt;h2&gt;Two Simple Unit Files&lt;/h2&gt;
&lt;p&gt;When a service shall be managed by systemd a unit file is
required. Since the service we are putting together shall be socket
activatable, we even have two:
&lt;a href="https://github.com/systemd/portable-walkthrough-go/blob/master/portable-walkthrough-go.service"&gt;&lt;code&gt;portable-walkthrough-go.service&lt;/code&gt;&lt;/a&gt;
(the description of the service binary itself) and
&lt;a href="https://github.com/systemd/portable-walkthrough-go/blob/master/portable-walkthrough-go.socket"&gt;&lt;code&gt;portable-walkthrough-go.socket&lt;/code&gt;&lt;/a&gt;
(the description of the sockets to listen on for the service).&lt;/p&gt;
&lt;p&gt;These units are not particularly remarkable: the &lt;code&gt;.service&lt;/code&gt; file
primarily contains the command line to invoke and a &lt;code&gt;StateDirectory=&lt;/code&gt;
setting to make sure the service when invoked gets its own private
state directory under &lt;code&gt;/var/lib/&lt;/code&gt; (and the &lt;code&gt;$STATE_DIRECTORY&lt;/code&gt;
environment variable is set to the resulting path). The &lt;code&gt;.socket&lt;/code&gt; file
simply lists 8088 as TCP/IP port to listen on.&lt;/p&gt;
&lt;h2&gt;An OS Description File&lt;/h2&gt;
&lt;p&gt;OS images (and that includes portable service images) generally should
include an
&lt;a href="https://www.freedesktop.org/software/systemd/man/os-release.html"&gt;&lt;code&gt;os-release&lt;/code&gt;&lt;/a&gt;
file. Usually, that is provided by the distribution. Since we are
building an image without any distribution let's write our &lt;a href="https://github.com/systemd/portable-walkthrough-go/blob/master/os-release"&gt;own
version of such a
file&lt;/a&gt;. Later
on we can use the &lt;code&gt;portablectl inspect&lt;/code&gt; command to have a look at this
metadata of our image.&lt;/p&gt;
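&lt;p&gt;An &lt;code&gt;os-release&lt;/code&gt; file is just a list of shell-style variable assignments, so reading one back is easy. The sample contents below are invented for illustration and are not the file from the repository.&lt;/p&gt;

```python
import shlex

SAMPLE = '''\
# Hypothetical contents, not the file from the repository
ID=walkthrough
NAME="Portable Walkthrough"
VERSION_ID=1
'''


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, raw = line.partition("=")
        # Values use shell-style quoting; shlex undoes it.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


print(parse_os_release(SAMPLE))
```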
&lt;h2&gt;Putting it All Together&lt;/h2&gt;
&lt;p&gt;The four files described above are all the files we need to build
our image. Let's now put the portable service image together. For that
I've written a
&lt;a href="https://github.com/systemd/portable-walkthrough-go/blob/master/Makefile"&gt;&lt;code&gt;Makefile&lt;/code&gt;&lt;/a&gt;. It
contains two relevant rules: the first one builds the static binary
from the Go program sources. The second one then puts together a
&lt;code&gt;squashfs&lt;/code&gt; file system combining the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The compiled, statically linked service binary&lt;/li&gt;
&lt;li&gt;The two systemd unit files&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;os-release&lt;/code&gt; file&lt;/li&gt;
&lt;li&gt;A couple of empty directories such as &lt;code&gt;/proc/&lt;/code&gt;, &lt;code&gt;/sys/&lt;/code&gt;, &lt;code&gt;/dev/&lt;/code&gt;
   and so on that need to be over-mounted with the respective kernel
   API file system. We need to create them as empty directories here
   since Linux insists that directories exist in order to over-mount
   them, and since the image we are building is going to be an
   immutable read-only image (&lt;code&gt;squashfs&lt;/code&gt;) these directories cannot be
   created dynamically when the portable image is mounted.&lt;/li&gt;
&lt;li&gt;Two empty files &lt;code&gt;/etc/resolv.conf&lt;/code&gt; and &lt;code&gt;/etc/machine-id&lt;/code&gt; that can
   be over-mounted with the same files from the host.&lt;/li&gt;
&lt;/ol&gt;
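&lt;p&gt;The staging step from the list above can be sketched in a few lines of Python. The exact paths and the set of empty directories here are assumptions made for illustration; the authoritative layout is whatever the walkthrough's &lt;code&gt;Makefile&lt;/code&gt; creates.&lt;/p&gt;

```python
import tempfile
from pathlib import Path

# Assumptions: these paths and the directory list are illustrative only.
EMPTY_DIRS = ["proc", "sys", "dev", "run", "tmp", "var/tmp"]
EMPTY_FILES = ["etc/resolv.conf", "etc/machine-id"]
PAYLOAD = {
    "usr/bin/portable-walkthrough-go": b"(static binary goes here)",
    "usr/lib/systemd/system/portable-walkthrough-go.service": b"",
    "usr/lib/systemd/system/portable-walkthrough-go.socket": b"",
    "usr/lib/os-release": b"",
}


def stage(root):
    """Populate a staging tree and return the sorted relative paths in it."""
    root = Path(root)
    for name in EMPTY_DIRS:
        # Mount points for the kernel API file systems.
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in EMPTY_FILES:
        # Placeholders that get over-mounted with the host's files.
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    for name, data in PAYLOAD.items():
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as demo_root:
    tree = stage(demo_root)
print("\n".join(tree))
```

&lt;p&gt;A tree like this would then be packed into the read-only image with a tool such as &lt;code&gt;mksquashfs&lt;/code&gt;.&lt;/p&gt;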
&lt;p&gt;And that's already it. After a quick &lt;code&gt;make&lt;/code&gt; we'll have our portable
service image &lt;code&gt;portable-walkthrough-go.raw&lt;/code&gt; and are ready to go.&lt;/p&gt;
&lt;h2&gt;Trying it out&lt;/h2&gt;
&lt;p&gt;Let's now attach the portable service image to our host system:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;portablectl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;attach&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Matching&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;prefix&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;portable-walkthrough-go&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;.)&lt;/span&gt;
&lt;span class="n"&gt;Created&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Created&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Written&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Copied&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Created&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;directory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Written&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Created&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;symlink&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;usr&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;lib&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Copied&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Created&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;symlink&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portables&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;lennart&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;projects&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The portable service image is now attached to the host, which means we
can now go and start it (or even enable it):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;systemctl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;start&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's see if our little web service works, by doing an HTTP request on port 8088:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# curl localhost:8088
Hello! You are visitor #1!
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's try this again, to check if it counts correctly:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# curl localhost:8088
Hello! You are visitor #2!
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Nice! It worked. Let's now stop the service again, and detach the image again:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;systemctl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stop&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;
&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;portablectl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;detach&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;
&lt;span class="n"&gt;Removed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Removed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Removed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Removed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Removed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Removed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;conf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Removed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Removed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portables&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;portable&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;walkthrough&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Removed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;etc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;system&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attached&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And there we go, the portable image file is detached from the host again.&lt;/p&gt;
&lt;h2&gt;A Couple of Notes&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Of course, this is a simplistic example: in real life services will
   be more than one compiled file, even when statically linked. But
   you get the idea, and it's very easy to extend the example above to
   include any additional, auxiliary files in the portable service
   image.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The service is very nicely sandboxed during runtime: while it runs
   as a regular service on the host (and you can thus watch its logs or
   do resource management on it like you would do for all other
   systemd services), it runs in a very restricted environment under a
   dynamically assigned UID that ceases to exist when the service is
   stopped again.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Originally I wanted to make the service not only socket activatable
   but also implement exit-on-idle, i.e. add logic so that the
   service terminates on its own when there's no ongoing HTTP
   connection for a while. I couldn't figure out how to do this
   race-freely in Go though, but I am sure an interested reader might
   want to add that? By combining socket activation with exit-on-idle
   we can turn this project into an exercise in putting together an
   extremely resource-friendly and robust service architecture: the
   service is started only when needed and terminates when no longer
   needed. This would allow packing services at a much higher density
   even on systems with few resources.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;While the basic concepts of portable services have been around
   since systemd 239, it's best to try the above with systemd 241 or
   newer since the portable service logic received a number of fixes
   since then.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Further Reading&lt;/h2&gt;
&lt;p&gt;A low-level document introducing Portable Services is &lt;a href="https://systemd.io/PORTABLE_SERVICES"&gt;shipped along
with systemd&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Please have a look at the &lt;a href="http://0pointer.net/blog/walkthrough-for-portable-services.html"&gt;blog story from a few months
ago&lt;/a&gt;
that did something very similar with a service written in C.&lt;/p&gt;
&lt;p&gt;There are also relevant manual pages:
&lt;a href="https://www.freedesktop.org/software/systemd/man/portablectl.html"&gt;&lt;code&gt;portablectl(1)&lt;/code&gt;&lt;/a&gt;
and
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-portabled.service.html"&gt;&lt;code&gt;systemd-portabled(8)&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 03 Apr 2019 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2019-04-03:/blog/walkthrough-for-portable-services-in-go.html</guid><category>projects</category></item><item><title>ASG! 2018 Tickets</title><link>https://0pointer.net/blog/asg-2018-tickets.html</link><description>&lt;p&gt;&lt;large&gt;&lt;b&gt;All Systems Go! 2018 Tickets Selling Out Quickly!&lt;/b&gt;&lt;/large&gt;&lt;/p&gt;
&lt;p&gt;Buy your tickets for &lt;a href="https://all-systems-go.io/"&gt;All Systems Go!
2018&lt;/a&gt; soon, they are quickly selling out!
The conference takes place on &lt;em&gt;September 28-30&lt;/em&gt;, in &lt;em&gt;Berlin&lt;/em&gt;, Germany, in
a bit over two weeks.&lt;/p&gt;
&lt;p&gt;Why should you attend? If you are interested in low-level Linux
userspace, then All Systems Go! is the right conference for you. It
covers all topics relevant to foundational open-source Linux
technologies. For details on the covered topics see our schedule &lt;a href="https://cfp.all-systems-go.io/en/ASG2018/public/schedule/2"&gt;for day #1&lt;/a&gt;
and &lt;a href="https://cfp.all-systems-go.io/en/ASG2018/public/schedule/3"&gt;for day #2&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For more information please visit &lt;a href="https://all-systems-go.io/"&gt;our conference
website&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;See you in Berlin!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 11 Sep 2018 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2018-09-11:/blog/asg-2018-tickets.html</guid><category>projects</category></item><item><title>ASG! 2018 CfP Closes TODAY</title><link>https://0pointer.net/blog/asg-2018-cfp-closes-today.html</link><description>&lt;p&gt;&lt;large&gt;&lt;b&gt;The All Systems Go! 2018 Call for Participation Closes TODAY!&lt;/b&gt;&lt;/large&gt;&lt;/p&gt;
&lt;p&gt;The Call for Participation (CFP) for &lt;a href="https://all-systems-go.io/"&gt;All Systems Go!
2018&lt;/a&gt; will close &lt;em&gt;TODAY&lt;/em&gt;, on 30th of
July! We’d like to invite you to submit your proposals for
consideration to &lt;a href="https://cfp.all-systems-go.io/de/ASG2018/cfp"&gt;the CFP submission
site&lt;/a&gt; quickly!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://scontent-frx5-1.xx.fbcdn.net/v/t1.0-9/32372869_2062729060632451_4411941877062828032_o.jpg?_nc_cat=0&amp;oh=112809c076e808ede4dee6e50afe2b99&amp;oe=5B8ACDDF" alt="ASG image" width="512" height="256"/&gt;&lt;/p&gt;
&lt;p&gt;All Systems Go! is everybody's favourite low-level Userspace Linux
conference, taking place in Berlin, Germany on September 28-30, 2018.&lt;/p&gt;
&lt;p&gt;For more information please visit &lt;a href="https://all-systems-go.io/"&gt;our conference
website&lt;/a&gt;!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 30 Jul 2018 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2018-07-30:/blog/asg-2018-cfp-closes-today.html</guid><category>projects</category></item><item><title>ASG! 2018 CfP Closes Soon</title><link>https://0pointer.net/blog/asg-2018-cfp-closes-soon.html</link><description>&lt;p&gt;&lt;large&gt;&lt;b&gt;The All Systems Go! 2018 Call for Participation Closes in One Week!&lt;/b&gt;&lt;/large&gt;&lt;/p&gt;
&lt;p&gt;The Call for Participation (CFP) for &lt;a href="https://all-systems-go.io/"&gt;All Systems Go!
2018&lt;/a&gt; will close &lt;em&gt;in one week&lt;/em&gt;, on 30th of
July! We’d like to invite you to submit your proposals for
consideration to &lt;a href="https://cfp.all-systems-go.io/de/ASG2018/cfp"&gt;the CFP submission
site&lt;/a&gt; quickly!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://scontent-frx5-1.xx.fbcdn.net/v/t1.0-9/32372869_2062729060632451_4411941877062828032_o.jpg?_nc_cat=0&amp;oh=112809c076e808ede4dee6e50afe2b99&amp;oe=5B8ACDDF" alt="ASG image" width="512" height="256"/&gt;&lt;/p&gt;
&lt;p&gt;Notification of acceptance and non-acceptance will go out within 7
days of the closing of the CFP.&lt;/p&gt;
&lt;p&gt;All topics relevant to foundational open-source Linux technologies are
welcome. In particular, however, we are looking for proposals
including, but not limited to, the following topics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Low-level container executors and infrastructure&lt;/li&gt;
&lt;li&gt;IoT and embedded OS infrastructure&lt;/li&gt;
&lt;li&gt;BPF and eBPF filtering&lt;/li&gt;
&lt;li&gt;OS, container, IoT image delivery and updating&lt;/li&gt;
&lt;li&gt;Building Linux devices and applications&lt;/li&gt;
&lt;li&gt;Low-level desktop technologies&lt;/li&gt;
&lt;li&gt;Networking&lt;/li&gt;
&lt;li&gt;System and service management&lt;/li&gt;
&lt;li&gt;Tracing and performance measuring&lt;/li&gt;
&lt;li&gt;IPC and RPC systems&lt;/li&gt;
&lt;li&gt;Security and Sandboxing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While our focus is definitely more on the user-space side of things,
talks about kernel projects are welcome, as long as they have a clear
and direct relevance for user-space.&lt;/p&gt;
&lt;p&gt;For more information please visit &lt;a href="https://all-systems-go.io/"&gt;our conference
website&lt;/a&gt;!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 23 Jul 2018 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2018-07-23:/blog/asg-2018-cfp-closes-soon.html</guid><category>projects</category></item><item><title>Walkthrough for Portable Services</title><link>https://0pointer.net/blog/walkthrough-for-portable-services.html</link><description>&lt;h1&gt;Portable Services with systemd v239&lt;/h1&gt;
&lt;p&gt;&lt;a href="https://lists.freedesktop.org/archives/systemd-devel/2018-June/040879.html"&gt;systemd
v239&lt;/a&gt;
contains a great number of new features. One of them is first class
support for &lt;a href="https://systemd.io/PORTABLE_SERVICES"&gt;Portable
Services&lt;/a&gt;. In this blog story
I'd like to shed some light on what they are and why they might be
interesting for your application.&lt;/p&gt;
&lt;h2&gt;What are "Portable Services"?&lt;/h2&gt;
&lt;p&gt;The "Portable Service" concept takes inspiration from classic
&lt;code&gt;chroot()&lt;/code&gt; environments as well as container management and brings a
number of their features to more regular system service management.&lt;/p&gt;
&lt;p&gt;While the definition of what a "container" really is is hotly debated,
I figure people can generally agree that the "container" concept
primarily provides two major features:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Resource bundling: a container generally brings its own file system
   tree along, bundling any shared libraries and other resources it
   might need along with the main service executables.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Isolation and sand-boxing: a container operates in a name-spaced
   environment that is relatively detached from the host. Besides
   living in its own file system namespace it usually also has its own
   user database, process tree and so on. Access from the container to
   the host is limited with various security technologies.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Of these two concepts the first one is also what traditional UNIX
&lt;code&gt;chroot()&lt;/code&gt; environments are about.&lt;/p&gt;
&lt;p&gt;Both resource bundling and isolation/sand-boxing are concepts systemd
has implemented to varying degrees for quite some time. Specifically,
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RootDirectory="&gt;&lt;code&gt;RootDirectory=&lt;/code&gt;&lt;/a&gt;
and
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RootImage="&gt;&lt;code&gt;RootImage=&lt;/code&gt;&lt;/a&gt;
have been around for a long time, and so have been the various
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Sandboxing"&gt;sand-boxing
features&lt;/a&gt;
systemd provides. The Portable Services concept builds on that,
putting these features together in a new, integrated way to make them
more accessible and usable.&lt;/p&gt;
&lt;h2&gt;OK, so what precisely is a "Portable Service"?&lt;/h2&gt;
&lt;p&gt;Much like a container image, a portable service on disk can be just a
directory tree that contains service executables and all their
dependencies, in a hierarchy resembling the normal Linux directory
hierarchy. A portable service can also be a raw disk image, containing
a file system containing such a tree (which can be mounted via a
loop-back block device), or multiple file systems (in which case they
need to follow the &lt;a href="https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/"&gt;Discoverable Partitions
Specification&lt;/a&gt;
and be located within a GPT partition table). Regardless of whether the
portable service on disk is a simple directory tree or a raw disk
image, let's call this concept the portable service &lt;em&gt;image&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Such images can be generated with any tool typically used for the
purpose of installing OSes inside some directory, for example &lt;code&gt;dnf
--installroot=&lt;/code&gt; or &lt;code&gt;debootstrap&lt;/code&gt;. There are very few requirements made
on these trees, except the following two:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The tree should carry &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.unit.html"&gt;systemd unit
   files&lt;/a&gt;
   for relevant services in it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The tree should carry
&lt;a href="https://www.freedesktop.org/software/systemd/man/os-release.html"&gt;&lt;code&gt;/usr/lib/os-release&lt;/code&gt;&lt;/a&gt;
(or &lt;code&gt;/etc/os-release&lt;/code&gt;) OS release information.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Of course, as you might notice, OS trees generated from any of today's
big distributions generally satisfy these two requirements without
any further modification, as pretty much all of them adopted
&lt;code&gt;/usr/lib/os-release&lt;/code&gt; and tend to ship their major services with
systemd unit files.&lt;/p&gt;
&lt;p&gt;A portable service image generated like this can be "attached" or
"detached" from a host:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;"Attaching" an image to a host is done through the new
   &lt;a href="https://www.freedesktop.org/software/systemd/man/portablectl.html"&gt;&lt;code&gt;portablectl
   attach&lt;/code&gt;&lt;/a&gt;
   command. This command dissects the image, reading the &lt;code&gt;os-release&lt;/code&gt;
   information, and searching for unit files in it. It then copies
   relevant unit files out of the image and into
   &lt;code&gt;/etc/systemd/system/&lt;/code&gt;. After that it augments any copied service
   unit files in two ways: a drop-in adding a &lt;code&gt;RootDirectory=&lt;/code&gt; or
   &lt;code&gt;RootImage=&lt;/code&gt; line is added so that, even though the unit files
   are now available on the host, when started they run the referenced
   binaries from the image. It also symlinks in a second drop-in which
   is called a "profile", which is supposed to carry additional
   security settings to enforce on the attached services, to ensure
   the right amount of sand-boxing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;"Detaching" an image from the host is done through &lt;code&gt;portablectl
   detach&lt;/code&gt;. It reverses the steps above: the unit files copied out are
   removed again, and so are the two drop-in files generated for them.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
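&lt;p&gt;To make the first of these two drop-ins a bit more concrete, here is a
minimal sketch of what it might contain for a raw disk image. The image path
is a made-up example, and the file &lt;code&gt;portablectl&lt;/code&gt; actually writes
may well carry further settings:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# Illustrative sketch only, not the literal file portablectl generates.
# The image path below is a hypothetical example.
[Service]
RootImage=/var/lib/portables/foobar_4711.raw
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For an image that is a plain directory tree the line would use
&lt;code&gt;RootDirectory=&lt;/code&gt; instead.&lt;/p&gt;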
&lt;p&gt;While a portable service is attached its relevant unit files are made
available on the host like any others: they will appear in &lt;code&gt;systemctl
list-unit-files&lt;/code&gt;, you can enable and disable them, you can start them
and stop them. You can extend them with &lt;code&gt;systemctl edit&lt;/code&gt;. You can
introspect them. You can apply resource management to them like to any
other service, and you can process their logs like any other service
and so on. That's because they really &lt;em&gt;are&lt;/em&gt; native systemd services,
except that they have a 'twist', if you will: they have tougher
security by default and store their resources in a root directory or
image.&lt;/p&gt;
&lt;p&gt;And that's already the essence of what Portable Services are.&lt;/p&gt;
&lt;p&gt;A couple of interesting points:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Even though the focus is on shipping &lt;em&gt;service&lt;/em&gt; unit files in
   portable service images, you can actually ship timer units, socket
   units, target units, path units in portable services too. This
   means you can very naturally do time, socket and path based
   activation. It's also entirely fine to ship multiple service units
   in the same image, in case you have more complex applications.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This concept introduces zero new metadata. Unit files are an
   existing concept, as are &lt;code&gt;os-release&lt;/code&gt; files, and — in case you opt
   for raw disk images — GPT partition tables are already established
   too. This also means existing tools to generate images can be
   reused for building portable service images to a large degree as no
   completely new artifact types need to be generated.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Because the Portable Service concept introduces zero new metadata
   and just builds on existing security and resource bundling
   features of systemd it's implemented in a set of distinct tools,
   relatively disconnected from the rest of systemd. Specifically, the
   main user-facing command is
   &lt;a href="https://www.freedesktop.org/software/systemd/man/portablectl.html"&gt;&lt;code&gt;portablectl&lt;/code&gt;&lt;/a&gt;,
   and the actual operations are implemented in
   &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-portabled.service.html"&gt;&lt;code&gt;systemd-portabled.service&lt;/code&gt;&lt;/a&gt;. If
   you will, portable services are a true add-on to systemd, just
   making a specific work-flow nicer to use than with the basic
   operations systemd otherwise provides. Also note that
   &lt;code&gt;systemd-portabled&lt;/code&gt; provides bus APIs accessible to any program
   that wants to interface with it, &lt;code&gt;portablectl&lt;/code&gt; is just one tool
   that happens to be shipped along with systemd.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Since Portable Services are a feature we only added very recently
   we wanted to keep some freedom to make changes still. Due to that
   we decided to install the &lt;code&gt;portablectl&lt;/code&gt; command into
   &lt;code&gt;/usr/lib/systemd/&lt;/code&gt; for now, so that it does not appear in &lt;code&gt;$PATH&lt;/code&gt;
   by default. This means, for now you have to invoke it with a full
   path: &lt;code&gt;/usr/lib/systemd/portablectl&lt;/code&gt;. We expect to move it into
   &lt;code&gt;/usr/bin/&lt;/code&gt; very soon though, and make it a fully supported
   interface of systemd.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You may wonder which unit files contained in a portable service
   image are the ones considered "relevant" and are actually copied
   out by the &lt;code&gt;portablectl attach&lt;/code&gt; operation. Currently, this is
   derived from the image name. Let's say you have an image stored in
   a directory &lt;code&gt;/var/lib/portables/foobar_4711/&lt;/code&gt; (or alternatively in
   a raw image &lt;code&gt;/var/lib/portables/foobar_4711.raw&lt;/code&gt;). In that case the
   unit files copied out match the pattern &lt;code&gt;foobar*.service&lt;/code&gt;,
   &lt;code&gt;foobar*.socket&lt;/code&gt;, &lt;code&gt;foobar*.target&lt;/code&gt;, &lt;code&gt;foobar*.path&lt;/code&gt;,
   &lt;code&gt;foobar*.timer&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Portable Services concept does not define any specific method
   for how images get onto the deployment machines; that's entirely up to
   administrators. You can just &lt;code&gt;scp&lt;/code&gt; them there, or &lt;code&gt;wget&lt;/code&gt; them. You
   could even package them as RPMs and then deploy them with &lt;code&gt;dnf&lt;/code&gt; if
   you feel adventurous.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Portable service images can reside in any directory you
   like. However, if you place them in &lt;code&gt;/var/lib/portables/&lt;/code&gt; then
   &lt;code&gt;portablectl&lt;/code&gt; will find them easily and can show you a list of
   images you can attach and suchlike.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Attaching a portable service image can be done persistently, so
   that it remains attached on subsequent boots (which is the default),
   or it can be attached only until the next reboot, by passing
   &lt;code&gt;--runtime&lt;/code&gt; to &lt;code&gt;portablectl&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Because portable service images are ultimately just regular OS
   images, it's natural and easy to build a single image that can be
   used in three different ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;It can be attached to any host as a portable service image.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It can be booted as OS container, for example in a container
   manager like &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-nspawn.html"&gt;&lt;code&gt;systemd-nspawn&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It can be booted as host system, for example on bare metal or
   in a VM manager.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Of course, to qualify for the latter two the image needs to
contain more than just the service binaries, the &lt;code&gt;os-release&lt;/code&gt; file
and the unit files. To be bootable in an OS container manager such as
&lt;code&gt;systemd-nspawn&lt;/code&gt; the image needs to contain an init system of some
form, for example
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.html"&gt;&lt;code&gt;systemd&lt;/code&gt;&lt;/a&gt;. To
be bootable on bare metal or as VM it also needs a boot loader of
some form, for example
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-boot.html"&gt;&lt;code&gt;systemd-boot&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Profiles&lt;/h2&gt;
&lt;p&gt;In the previous section the "profile" concept was briefly
mentioned. Since they are a major feature of the Portable Services
concept, they deserve some focus. A "profile" is ultimately just a
pre-defined drop-in file for unit files that are attached to a
host. They are supposed to mostly contain sand-boxing and security
settings, but may actually contain any other settings, too. When a
portable service is attached a suitable profile has to be selected. If
none is selected explicitly, the default profile called &lt;code&gt;default&lt;/code&gt; is
used. systemd ships with four different profiles out of the box:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The
&lt;a href="https://github.com/systemd/systemd/blob/master/src/portable/profile/default/service.conf"&gt;&lt;code&gt;default&lt;/code&gt;&lt;/a&gt;
profile provides a medium level of security. It contains settings to
drop capabilities, enforce system call filters, restrict many kernel
interfaces and mount various file systems read-only.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The
&lt;a href="https://github.com/systemd/systemd/blob/master/src/portable/profile/strict/service.conf"&gt;&lt;code&gt;strict&lt;/code&gt;&lt;/a&gt;
profile is similar to the &lt;code&gt;default&lt;/code&gt; profile, but generally uses the
most restrictive sand-boxing settings. For example networking is turned
off and access to &lt;code&gt;AF_NETLINK&lt;/code&gt; sockets is prohibited.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The
&lt;a href="https://github.com/systemd/systemd/blob/master/src/portable/profile/trusted/service.conf"&gt;&lt;code&gt;trusted&lt;/code&gt;&lt;/a&gt;
profile is the least strict of them all. In fact it makes almost no
restrictions at all. A service run with this profile has basically
full access to the host system.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The
&lt;a href="https://github.com/systemd/systemd/blob/master/src/portable/profile/nonetwork/service.conf"&gt;&lt;code&gt;nonetwork&lt;/code&gt;&lt;/a&gt;
profile is mostly identical to &lt;code&gt;default&lt;/code&gt;, but also turns off network access.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that the profile is selected at the time the portable service
image is attached, and it applies to all service files attached, in
case multiple are shipped in the same image. Thus, the sand-boxing
restrictions to enforce are selected by the administrator attaching the
image and not the image vendor.&lt;/p&gt;
&lt;p&gt;Additional profiles can be defined easily by the administrator, if
needed. We might also add additional profiles sooner or later to be
shipped with systemd out of the box.&lt;/p&gt;
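&lt;p&gt;Since a profile is ultimately just a drop-in, a custom one could look
roughly like the sketch below. The profile name &lt;code&gt;myprofile&lt;/code&gt; and
the particular selection of settings are made up for illustration. I am also
assuming here that administrator-defined profiles are placed in
&lt;code&gt;/etc/systemd/portable/profile/&lt;/code&gt;, mirroring the
&lt;code&gt;/usr/lib/systemd/portable/profile/&lt;/code&gt; directory the shipped ones
live in:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# /etc/systemd/portable/profile/myprofile/service.conf
# Hypothetical custom profile, for illustration only.
[Service]
# Make the file system hierarchy read-only for the service
ProtectSystem=strict
# Make home directories inaccessible
ProtectHome=yes
# Give the service its own private /tmp
PrivateTmp=yes
# Run the service without access to the host network
PrivateNetwork=yes
# Disallow gaining new privileges, e.g. through setuid binaries
NoNewPrivileges=yes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Such a profile would then be picked at attach time by passing
&lt;code&gt;--profile=myprofile&lt;/code&gt; to &lt;code&gt;portablectl attach&lt;/code&gt;.&lt;/p&gt;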
&lt;h2&gt;What's the use-case for this? If I have containers, why should I bother?&lt;/h2&gt;
&lt;p&gt;Portable Services are primarily intended to cover use-cases where code
should feel more like "extensions" to the host system rather than live
in disconnected, separate worlds. The profile concept is
supposed to be tunable to the exact right amount of integration or
isolation needed for an application.&lt;/p&gt;
&lt;p&gt;In the container world the concept of "super-privileged containers"
has been touted a lot, i.e. containers that run with full
privileges. It's precisely that use-case that portable services are
intended for: extensions to the host OS, that default to isolation,
but can optionally get as much access to the host as needed, and can
naturally take benefit of the full functionality of the host. The
concept should hence be useful for all kinds of low-level system
software that isn't shipped with the OS itself but needs varying
degrees of integration with it. Besides servers and appliances this
should be particularly interesting for IoT and embedded devices.&lt;/p&gt;
&lt;p&gt;Because portable services are just a relatively small extension to the
way system services are otherwise managed, they can be treated like
regular services for almost all use-cases: they will appear alongside
regular services in all tools that can introspect systemd unit data,
and can be managed the same way when it comes to logging, resource
management, runtime life-cycles and so on.&lt;/p&gt;
&lt;p&gt;Portable services are a very generic concept. While the original
use-case is OS extensions, it's of course entirely up to you and other
users to use them in a suitable way of your choice.&lt;/p&gt;
&lt;h2&gt;Walkthrough&lt;/h2&gt;
&lt;p&gt;Let's have a look at how all this can be used. We'll start with building
a portable service image from scratch, before we attach, enable and
start it on a host.&lt;/p&gt;
&lt;h3&gt;Building a Portable Service image&lt;/h3&gt;
&lt;p&gt;As mentioned, you can use any tool you like that can create OS trees
or raw images for building Portable Service images, for example
&lt;code&gt;debootstrap&lt;/code&gt; or &lt;code&gt;dnf --installroot=&lt;/code&gt;. For this example walkthrough
run we'll use &lt;a href="https://github.com/systemd/mkosi"&gt;&lt;code&gt;mkosi&lt;/code&gt;&lt;/a&gt;, which is
ultimately just a fancy wrapper around &lt;code&gt;dnf&lt;/code&gt; and &lt;code&gt;debootstrap&lt;/code&gt; but
makes a number of things particularly easy when repetitively building
images from source trees.&lt;/p&gt;
&lt;p&gt;I have pushed everything necessary to reproduce this walkthrough
locally to &lt;a href="https://github.com/systemd/portable-walkthrough"&gt;a GitHub
repository&lt;/a&gt;. Let's check it out:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;git&lt;span class="w"&gt; &lt;/span&gt;clone&lt;span class="w"&gt; &lt;/span&gt;https://github.com/systemd/portable-walkthrough.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's have a look in the repository:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;First of all,
   &lt;a href="https://github.com/systemd/portable-walkthrough/blob/master/walkthroughd.c"&gt;&lt;code&gt;walkthroughd.c&lt;/code&gt;&lt;/a&gt;
   is the main source file of our little service. To keep things
   simple it's written in C, but it could be in any language of your
   choice. The daemon as implemented won't do much: it just starts up
   and waits for &lt;code&gt;SIGTERM&lt;/code&gt;, at which point it will shut down. It's
   ultimately useless, but hopefully illustrates how this all fits
   together. The C code has no dependencies besides libc.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://github.com/systemd/portable-walkthrough/blob/master/walkthroughd.service"&gt;&lt;code&gt;walkthroughd.service&lt;/code&gt;&lt;/a&gt;
   is a systemd unit file that starts our little daemon. It's a simple
   service, hence the unit file is trivial.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://github.com/systemd/portable-walkthrough/blob/master/Makefile"&gt;&lt;code&gt;Makefile&lt;/code&gt;&lt;/a&gt;
   is a short make build script to build the daemon binary. It's
   pretty trivial, too: it just takes the C file and builds a binary
   from it. It can also install the daemon. It places the binary in
   &lt;code&gt;/usr/local/lib/walkthroughd/walkthroughd&lt;/code&gt; (why not in
   &lt;code&gt;/usr/local/bin&lt;/code&gt;? because it's not a user-facing binary but a system
   service binary), and its unit file in
   &lt;code&gt;/usr/local/lib/systemd/walkthroughd.service&lt;/code&gt;. If you want to test
   the daemon on the host you can now simply run &lt;code&gt;make&lt;/code&gt; and then
   &lt;code&gt;./walkthroughd&lt;/code&gt; in order to check everything works.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://github.com/systemd/portable-walkthrough/blob/master/mkosi.default"&gt;&lt;code&gt;mkosi.default&lt;/code&gt;&lt;/a&gt;
   is a file that tells &lt;code&gt;mkosi&lt;/code&gt; how to build the image. We opt for a
   Fedora-based image here (but we might as well have used Debian
   here, or any other supported distribution). We need no particular
   packages during runtime (after all we only depend on libc), but
   during the build phase we need gcc and make, hence these are the
   only packages we list in &lt;code&gt;BuildPackages=&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://github.com/systemd/portable-walkthrough/blob/master/mkosi.build"&gt;&lt;code&gt;mkosi.build&lt;/code&gt;&lt;/a&gt;
   is a shell script that is invoked during mkosi's build logic. All
   it does is invoke &lt;code&gt;make&lt;/code&gt; and &lt;code&gt;make install&lt;/code&gt; to build and install
   our little daemon, and afterwards it extends the
   distribution-supplied &lt;code&gt;/etc/os-release&lt;/code&gt; file with an additional
   field that describes our portable service a bit.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
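&lt;p&gt;To give an idea of how trivial such a unit file can be, here is a sketch
along the lines of &lt;code&gt;walkthroughd.service&lt;/code&gt;. The description and
the binary path are the ones used in this walkthrough, the
&lt;code&gt;[Install]&lt;/code&gt; section is an assumption, and the actual file in the
repository may differ:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# Sketch only; see the repository for the real unit file.
[Unit]
Description=A simple example service

[Service]
# Path the Makefile installs the daemon binary to
ExecStart=/usr/local/lib/walkthroughd/walkthroughd

[Install]
# Assumed here so that the unit can be enabled
WantedBy=multi-user.target
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
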
&lt;p&gt;Let's now use this to build the portable service image. For that we
use the &lt;a href="https://github.com/systemd/mkosi"&gt;mkosi&lt;/a&gt; tool. It's
sufficient to invoke it without parameters to build the first image: it
will automatically discover &lt;code&gt;mkosi.default&lt;/code&gt; and &lt;code&gt;mkosi.build&lt;/code&gt; which
tell it what to do. (Note that if you work on a project like this for
a longer time, &lt;code&gt;mkosi -if&lt;/code&gt; is probably the better command to use, as
it speeds up building substantially by using an incremental build
mode.) &lt;code&gt;mkosi&lt;/code&gt; will download the necessary RPMs, and put them all
together. It will build our little daemon inside the image and after
all that's done it will output the resulting image:
&lt;code&gt;walkthroughd_1.raw&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Because we opted to build a GPT raw disk image in &lt;code&gt;mkosi.default&lt;/code&gt; this
file is actually a raw disk image containing a GPT partition
table. You can use &lt;code&gt;fdisk -l walkthroughd_1.raw&lt;/code&gt; to enumerate the
partition table. You can also use &lt;code&gt;systemd-nspawn -i
walkthroughd_1.raw&lt;/code&gt; to explore the image quickly if you need to.&lt;/p&gt;
&lt;h2&gt;Using the Portable Service Image&lt;/h2&gt;
&lt;p&gt;Now that we have a portable service image, let's see how we can
attach, enable and start the service included within it.&lt;/p&gt;
&lt;p&gt;First, let's attach the image:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;/usr/lib/systemd/portablectl&lt;span class="w"&gt; &lt;/span&gt;attach&lt;span class="w"&gt; &lt;/span&gt;./walkthroughd_1.raw
&lt;span class="gp gp-VirtualEnv"&gt;(Matching unit files with prefix &amp;#39;walkthroughd&amp;#39;.)&lt;/span&gt;
&lt;span class="go"&gt;Created directory /etc/systemd/system/walkthroughd.service.d.&lt;/span&gt;
&lt;span class="go"&gt;Written /etc/systemd/system/walkthroughd.service.d/20-portable.conf.&lt;/span&gt;
&lt;span class="go"&gt;Created symlink /etc/systemd/system/walkthroughd.service.d/10-profile.conf → /usr/lib/systemd/portable/profile/default/service.conf.&lt;/span&gt;
&lt;span class="go"&gt;Copied /etc/systemd/system/walkthroughd.service.&lt;/span&gt;
&lt;span class="go"&gt;Created symlink /etc/portables/walkthroughd_1.raw → /home/lennart/projects/portable-walkthrough/walkthroughd_1.raw.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The command will show you exactly what it has been doing: it just
copied the main service file out, and added the two drop-ins, as
expected.&lt;/p&gt;
&lt;p&gt;Let's see if the unit is now available on the host, just like a regular unit, as promised:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;systemctl&lt;span class="w"&gt; &lt;/span&gt;status&lt;span class="w"&gt; &lt;/span&gt;walkthroughd.service
&lt;span class="go"&gt;● walkthroughd.service - A simple example service&lt;/span&gt;
&lt;span class="go"&gt;   Loaded: loaded (/etc/systemd/system/walkthroughd.service; disabled; vendor preset: disabled)&lt;/span&gt;
&lt;span class="go"&gt;  Drop-In: /etc/systemd/system/walkthroughd.service.d&lt;/span&gt;
&lt;span class="go"&gt;           └─10-profile.conf, 20-portable.conf&lt;/span&gt;
&lt;span class="go"&gt;   Active: inactive (dead)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Nice, it worked. We see that the unit file is available and that
systemd correctly discovered the two drop-ins. The unit is neither
enabled nor started, however. Yes, attaching a portable service image
doesn't imply enabling or starting it. It just means the unit files
contained in the image are made available to the host. It's up to the
administrator to then enable them (so that they are automatically
started when needed, for example at boot), and/or start them (in case
they shall run right away).&lt;/p&gt;
&lt;p&gt;Let's now enable and start the service in one step:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;systemctl&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;enable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--now&lt;span class="w"&gt; &lt;/span&gt;walkthroughd.service
&lt;span class="go"&gt;Created symlink /etc/systemd/system/multi-user.target.wants/walkthroughd.service → /etc/systemd/system/walkthroughd.service.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's check if it's running:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;systemctl&lt;span class="w"&gt; &lt;/span&gt;status&lt;span class="w"&gt; &lt;/span&gt;walkthroughd.service
&lt;span class="go"&gt;● walkthroughd.service - A simple example service&lt;/span&gt;
&lt;span class="go"&gt;   Loaded: loaded (/etc/systemd/system/walkthroughd.service; enabled; vendor preset: disabled)&lt;/span&gt;
&lt;span class="go"&gt;  Drop-In: /etc/systemd/system/walkthroughd.service.d&lt;/span&gt;
&lt;span class="go"&gt;           └─10-profile.conf, 20-portable.conf&lt;/span&gt;
&lt;span class="go"&gt;   Active: active (running) since Wed 2018-06-27 17:55:30 CEST; 4s ago&lt;/span&gt;
&lt;span class="go"&gt; Main PID: 45003 (walkthroughd)&lt;/span&gt;
&lt;span class="go"&gt;    Tasks: 1 (limit: 4915)&lt;/span&gt;
&lt;span class="go"&gt;   Memory: 4.3M&lt;/span&gt;
&lt;span class="go"&gt;   CGroup: /system.slice/walkthroughd.service&lt;/span&gt;
&lt;span class="go"&gt;           └─45003 /usr/local/lib/walkthroughd/walkthroughd&lt;/span&gt;

&lt;span class="go"&gt;Jun 27 17:55:30 sigma walkthroughd[45003]: Initializing.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Perfect! We can see that the service is now enabled and running. The daemon is running as PID 45003.&lt;/p&gt;
&lt;p&gt;Now that we verified that all is good, let's stop, disable and detach the service again:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;systemctl&lt;span class="w"&gt; &lt;/span&gt;disable&lt;span class="w"&gt; &lt;/span&gt;--now&lt;span class="w"&gt; &lt;/span&gt;walkthroughd.service
&lt;span class="go"&gt;Removed /etc/systemd/system/multi-user.target.wants/walkthroughd.service.&lt;/span&gt;
&lt;span class="gp"&gt;# &lt;/span&gt;/usr/lib/systemd/portablectl&lt;span class="w"&gt; &lt;/span&gt;detach&lt;span class="w"&gt; &lt;/span&gt;./walkthroughd_1.raw
&lt;span class="go"&gt;Removed /etc/systemd/system/walkthroughd.service.&lt;/span&gt;
&lt;span class="go"&gt;Removed /etc/systemd/system/walkthroughd.service.d/10-profile.conf.&lt;/span&gt;
&lt;span class="go"&gt;Removed /etc/systemd/system/walkthroughd.service.d/20-portable.conf.&lt;/span&gt;
&lt;span class="go"&gt;Removed /etc/systemd/system/walkthroughd.service.d.&lt;/span&gt;
&lt;span class="go"&gt;Removed /etc/portables/walkthroughd_1.raw.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And finally, let's see that it's really gone:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;systemctl&lt;span class="w"&gt; &lt;/span&gt;status&lt;span class="w"&gt; &lt;/span&gt;walkthroughd
&lt;span class="go"&gt;Unit walkthroughd.service could not be found.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Perfect! It worked!&lt;/p&gt;
&lt;p&gt;I hope the above gets you started with Portable Services. If you have
further questions, please contact &lt;a href="https://lists.freedesktop.org/mailman/listinfo/systemd-devel"&gt;our mailing
list&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Further Reading&lt;/h2&gt;
&lt;p&gt;A more low-level document explaining details is &lt;a href="https://systemd.io/PORTABLE_SERVICES"&gt;shipped
along with systemd&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are also relevant manual pages:
&lt;a href="https://www.freedesktop.org/software/systemd/man/portablectl.html"&gt;&lt;code&gt;portablectl(1)&lt;/code&gt;&lt;/a&gt;
and
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-portabled.service.html"&gt;&lt;code&gt;systemd-portabled(8)&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For further information about &lt;code&gt;mkosi&lt;/code&gt; see &lt;a href="https://github.com/systemd/mkosi"&gt;its homepage&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 27 Jun 2018 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2018-06-27:/blog/walkthrough-for-portable-services.html</guid><category>projects</category></item><item><title>All Systems Go! 2018 CfP Open</title><link>https://0pointer.net/blog/all-systems-go-2018-cfp-open.html</link><description>&lt;p&gt;&lt;large&gt;&lt;b&gt;The All Systems Go! 2018 Call for Participation is Now Open!&lt;/b&gt;&lt;/large&gt;&lt;/p&gt;
&lt;p&gt;The Call for Participation (CFP) for &lt;a href="https://all-systems-go.io/"&gt;All Systems Go!
2018&lt;/a&gt; is now open. We’d like to invite you
to submit your proposals for consideration to &lt;a href="https://cfp.all-systems-go.io/de/ASG2018/cfp"&gt;the CFP submission
site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://scontent-frx5-1.xx.fbcdn.net/v/t1.0-9/32372869_2062729060632451_4411941877062828032_o.jpg?_nc_cat=0&amp;oh=112809c076e808ede4dee6e50afe2b99&amp;oe=5B8ACDDF" alt="ASG image" width="512" height="256"/&gt;&lt;/p&gt;
&lt;p&gt;The CFP will close on July 30th. Notification of acceptance and
non-acceptance will go out within 7 days of the closing of the CFP.&lt;/p&gt;
&lt;p&gt;All topics relevant to foundational open-source Linux technologies are
welcome. In particular, however, we are looking for proposals
including, but not limited to, the following topics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Low-level container executors and infrastructure&lt;/li&gt;
&lt;li&gt;IoT and embedded OS infrastructure&lt;/li&gt;
&lt;li&gt;BPF and eBPF filtering&lt;/li&gt;
&lt;li&gt;OS, container, IoT image delivery and updating&lt;/li&gt;
&lt;li&gt;Building Linux devices and applications&lt;/li&gt;
&lt;li&gt;Low-level desktop technologies&lt;/li&gt;
&lt;li&gt;Networking&lt;/li&gt;
&lt;li&gt;System and service management&lt;/li&gt;
&lt;li&gt;Tracing and performance measuring&lt;/li&gt;
&lt;li&gt;IPC and RPC systems&lt;/li&gt;
&lt;li&gt;Security and Sandboxing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While our focus is definitely more on the user-space side of things,
talks about kernel projects are welcome, as long as they have a clear
and direct relevance for user-space.&lt;/p&gt;
&lt;p&gt;For more information please visit &lt;a href="https://all-systems-go.io/"&gt;our conference
website&lt;/a&gt;!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 21 May 2018 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2018-05-21:/blog/all-systems-go-2018-cfp-open.html</guid><category>projects</category></item><item><title>All Systems Go! 2017 Videos Online!</title><link>https://0pointer.net/blog/all-systems-go-2017-videos-online.html</link><description>&lt;p&gt;For those living under a rock, the videos from everybody's favourite
Userspace Linux Conference &lt;a href="https://all-systems-go.io/"&gt;&lt;em&gt;All Systems Go!&lt;/em&gt;
2017&lt;/a&gt; are now available online.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media.ccc.de/b/conferences/all_systems_go/2017"&gt;All videos&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The videos for my own two talks are available here:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media.ccc.de/v/ASG2017-125-synchronizing_images_with_casync"&gt;Synchronizing Images with
casync&lt;/a&gt;
(&lt;a href="http://0pointer.de/public/casync-asg2017.pdf"&gt;Slides&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;a href="https://media.ccc.de/v/ASG2017-101-containers_without_a_container_manager_with_systemd"&gt;Containers without a Container Manager, with
systemd&lt;/a&gt;
(&lt;a href="http://0pointer.de/public/systemd-asg2017.pdf"&gt;Slides&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Of course, this is the stellar work of the &lt;a href="https://c3voc.de/"&gt;CCC
VOC&lt;/a&gt; folks, who are hard to beat when it comes to
videotaping of community conferences.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://all-systems-go.io/"&gt;&lt;img src="https://all-systems-go.io/img/header-graphic.png" width="600" height="195" border="5"/&gt;&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 24 Oct 2017 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2017-10-24:/blog/all-systems-go-2017-videos-online.html</guid><category>projects</category></item><item><title>Attending and Speaking at GNOME.Asia 2017 Summit</title><link>https://0pointer.net/blog/attending-and-speaking-at-gnomeasia-2017-summit.html</link><description>&lt;p&gt;The &lt;a href="https://2017.gnome.asia/"&gt;GNOME.Asia Summit 2017&lt;/a&gt; organizers
invited me to speak at their conference in Chongqing/China, and it
was an excellent event! Here's my brief report:&lt;/p&gt;
&lt;p&gt;&lt;img src="https://wiki.gnome.org/Travel/Policy?action=AttachFile&amp;do=get&amp;target=sponsored-badge-shadow.png" width="230" height="230"/&gt;&lt;/p&gt;
&lt;p&gt;Because we arrived one day early in Chongqing, my GNOME friends Sri,
Matthias, Jonathan, David and I started our journey with an excursion
to the &lt;a href="https://en.wikipedia.org/wiki/Dazu_Rock_Carvings"&gt;Dazu Rock
Carvings&lt;/a&gt;, a short
bus trip from Chongqing, and an excellent (and sometimes quite
surprising) sight. I mean, where else can you see a centuries-old buddha
with 1000+ hands holding a Nexus 5 cell phone? Here's
proof:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0234.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0234.jpg" width="167" height="250"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The GNOME.Asia schedule was excellent, with various good talks,
including some about Flatpak, Endless OS, rpm-ostree, Blockchains and
more. My own talk was about &lt;em&gt;The Path to a Fully Protected GNOME
Desktop OS Image&lt;/em&gt; (&lt;a href="http://0pointer.de/public/systemd-gnomeasia2017.pdf"&gt;Slides available
here&lt;/a&gt;). In the
hallway track I did my best to advocate
&lt;a href="https://github.com/systemd/casync"&gt;casync&lt;/a&gt; to whoever was willing to
listen, and I think enough were ;-). As we all know attending
conferences is at least as much about the hallway track as about the
talks, and GNOME.Asia was a fantastic way to meet the Chinese GNOME
and Open Source communities.&lt;/p&gt;
&lt;p&gt;The day after the conference the organizers of GNOME.Asia organized a
Chongqing day trip. A particular highlight was the ubiquitous hot pot,
sometimes with the local speciality: fresh pig brain.&lt;/p&gt;
&lt;p&gt;Here are some random photos from the trip: sights, food, social event and
more.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0409.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0409.jpg" width="135" height="250"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0265.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0265.jpg" width="167" height="250"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0183.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0183.jpg" width="177" height="250"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/handy/esel.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/handy/esel-klein.jpg" width="240" height="320"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0273.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0273.jpg" width="167" height="250"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0164.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0164.jpg" width="167" height="250"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/handy/hotpot.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/handy/hotpot-klein.jpg" width="240" height="320"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0176.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0176.jpg" width="250" height="152"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0150.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0150.jpg" width="250" height="195"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0216.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0216.jpg" width="250" height="167"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0326.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0326.jpg" width="250" height="169"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0371.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0371.jpg" width="250" height="167"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0442.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0442.jpg" width="250" height="167"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0480.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0480.jpg" width="250" height="177"/&gt;&lt;/a&gt;
&lt;a href="http://0pointer.de/public/chongqing/big/IMG_0536.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/small/IMG_0536.jpg" width="250" height="94"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I'd like to thank the GNOME Foundation for funding my trip to
GNOME.Asia. And that's all for now. But let me close with some old
Chinese wisdom:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://0pointer.de/public/chongqing/handy/wahlspruch.jpg"&gt;&lt;img src="http://0pointer.de/public/chongqing/handy/wahlspruch-klein.jpg" width="320" height="210"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;big&gt;&lt;i&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;The Trials Of A Long Journey Always Feeling, Civilized Travel Pass Reputation.&lt;/i&gt;&lt;/big&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 24 Oct 2017 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2017-10-24:/blog/attending-and-speaking-at-gnomeasia-2017-summit.html</guid><category>projects</category></item><item><title>IP Accounting and Access Lists with systemd</title><link>https://0pointer.net/blog/ip-accounting-and-access-lists-with-systemd.html</link><description>&lt;p&gt;&lt;em&gt;TL;DR: systemd now can do per-service IP traffic accounting, as well
as access control for IP address ranges.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Last Friday we released &lt;a href="https://lists.freedesktop.org/archives/systemd-devel/2017-October/039589.html"&gt;systemd
235&lt;/a&gt;. &lt;a href="http://0pointer.net/blog/dynamic-users-with-systemd.html"&gt;I
already blogged about its Dynamic User feature in
detail&lt;/a&gt;, but
there's one more piece of new functionality that I think deserves special
attention: IP accounting and access control.&lt;/p&gt;
&lt;p&gt;Before v235 systemd already provided per-unit resource management
hooks for a number of different kinds of resources: consumed CPU time,
disk I/O, memory usage and number of tasks. With v235 another kind of
resource can be controlled per-unit with systemd: network traffic
(specifically IP).&lt;/p&gt;
&lt;p&gt;Three new unit file settings have been added in this context:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#IPAccounting="&gt;&lt;code&gt;IPAccounting=&lt;/code&gt;&lt;/a&gt; is a boolean setting. If enabled for a unit, all IP
traffic sent and received by processes associated with it is counted
both in terms of bytes and of packets.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#IPAddressAllow=ADDDRESS%5B/PREFIXLENGTH%5D%E2%80%A6"&gt;&lt;code&gt;IPAddressDeny=&lt;/code&gt;&lt;/a&gt; takes an IP address prefix (that means: an IP
address with a network mask). All traffic from and to this address will be
prohibited for processes of the service.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#IPAddressAllow=ADDDRESS%5B/PREFIXLENGTH%5D%E2%80%A6"&gt;&lt;code&gt;IPAddressAllow=&lt;/code&gt;&lt;/a&gt; is the matching positive counterpart to
&lt;code&gt;IPAddressDeny=&lt;/code&gt;. All traffic matching this IP address/network mask
combination will be allowed, even if otherwise listed in
&lt;code&gt;IPAddressDeny=&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
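&lt;p&gt;As a minimal sketch of how the three settings combine, consider a
service that counts its IP traffic and may only talk to a single
address. The unit below is a hypothetical illustration, not one of the
examples used in this post; it assumes the special value &lt;code&gt;any&lt;/code&gt;
described in &lt;code&gt;systemd.resource-control(5)&lt;/code&gt;, so please check the
manual page of your systemd version before relying on it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[Service]&lt;/span&gt;
&lt;span class="na"&gt;ExecStart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/usr/bin/ping 8.8.8.8&lt;/span&gt;
&lt;span class="c1"&gt;# Count all IP traffic of this service, in bytes and packets&lt;/span&gt;
&lt;span class="na"&gt;IPAccounting&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;
&lt;span class="c1"&gt;# Prohibit IP traffic from and to every address (assumed special value)&lt;/span&gt;
&lt;span class="na"&gt;IPAddressDeny&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;any&lt;/span&gt;
&lt;span class="c1"&gt;# Allow this one address again, even though the deny list covers it&lt;/span&gt;
&lt;span class="na"&gt;IPAddressAllow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;8.8.8.8&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Since an &lt;code&gt;IPAddressAllow=&lt;/code&gt; match wins over &lt;code&gt;IPAddressDeny=&lt;/code&gt;, this
arrangement should amount to a simple whitelist: everything is blocked
except the one listed address.&lt;/p&gt;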
&lt;p&gt;The three options are thin wrappers around kernel functionality
introduced with Linux 4.11: the control group eBPF hooks. The actual
work is done by the kernel, systemd just provides a number of new
settings to configure this facet of it. Note that cgroup/eBPF is
unrelated to classic Linux firewalling,
i.e. NetFilter/&lt;code&gt;iptables&lt;/code&gt;. It's up to you whether you use one or the
other, or both in combination (or of course neither).&lt;/p&gt;
&lt;h1&gt;IP Accounting&lt;/h1&gt;
&lt;p&gt;Let's have a closer look at the IP accounting logic mentioned
above. Let's write a simple unit
&lt;code&gt;/etc/systemd/system/ip-accounting-test.service&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[Service]&lt;/span&gt;
&lt;span class="na"&gt;ExecStart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/usr/bin/ping 8.8.8.8&lt;/span&gt;
&lt;span class="na"&gt;IPAccounting&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This simple unit invokes the
&lt;a href="http://man7.org/linux/man-pages/man8/ping.8.html"&gt;ping(8)&lt;/a&gt; command to
send a series of ICMP/IP ping packets to the IP address 8.8.8.8 (which
is the Google DNS server IP; we use it for testing here, since it's
easy to remember, reachable everywhere and known to react to ICMP
pings; any other IP address responding to pings would be fine to use,
too). The &lt;code&gt;IPAccounting=&lt;/code&gt; option is used to turn on IP accounting for
the unit.&lt;/p&gt;
&lt;p&gt;Let's start this service after writing the file. Let's then have a
look at the status output of &lt;code&gt;systemctl&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# systemctl daemon-reload&lt;/span&gt;
&lt;span class="c1"&gt;# systemctl start ip-accounting-test&lt;/span&gt;
&lt;span class="c1"&gt;# systemctl status ip-accounting-test&lt;/span&gt;
●&lt;span class="w"&gt; &lt;/span&gt;ip-accounting-test.service
&lt;span class="w"&gt;   &lt;/span&gt;Loaded:&lt;span class="w"&gt; &lt;/span&gt;loaded&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;/etc/systemd/system/ip-accounting-test.service&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;static&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;vendor&lt;span class="w"&gt; &lt;/span&gt;preset:&lt;span class="w"&gt; &lt;/span&gt;disabled&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;Active:&lt;span class="w"&gt; &lt;/span&gt;active&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;running&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;since&lt;span class="w"&gt; &lt;/span&gt;Mon&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2017&lt;/span&gt;-10-09&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:05:47&lt;span class="w"&gt; &lt;/span&gt;CEST&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;1s&lt;span class="w"&gt; &lt;/span&gt;ago
&lt;span class="w"&gt; &lt;/span&gt;Main&lt;span class="w"&gt; &lt;/span&gt;PID:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;ping&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;       &lt;/span&gt;IP:&lt;span class="w"&gt; &lt;/span&gt;168B&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;168B&lt;span class="w"&gt; &lt;/span&gt;out
&lt;span class="w"&gt;    &lt;/span&gt;Tasks:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;limit:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;4915&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;CGroup:&lt;span class="w"&gt; &lt;/span&gt;/system.slice/ip-accounting-test.service
&lt;span class="w"&gt;           &lt;/span&gt;└─32152&lt;span class="w"&gt; &lt;/span&gt;/usr/bin/ping&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8

Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:05:47&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;systemd&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;Started&lt;span class="w"&gt; &lt;/span&gt;ip-accounting-test.service.
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:05:47&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;PING&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;56&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;84&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;of&lt;span class="w"&gt; &lt;/span&gt;data.
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:05:47&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;29&lt;/span&gt;.2&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:05:48&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;28&lt;/span&gt;.0&lt;span class="w"&gt; &lt;/span&gt;ms
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This shows the &lt;code&gt;ping&lt;/code&gt; command running: it's currently at its second
ping cycle, as we can see in the logs at the end of the output. More
interesting, however, is the &lt;code&gt;IP:&lt;/code&gt; line further up showing the current
IP byte counters. It currently shows 168 bytes have been received, and
168 bytes have been sent, which matches two ping cycles of 84 bytes each
(the &lt;code&gt;56(84)&lt;/code&gt; in the first log line: 56 bytes of payload plus 28 bytes of
ICMP and IP headers). That the two counters are at the same value
is not surprising: ICMP ping requests and responses are supposed to
have the same size. Note that this line is shown only if
&lt;code&gt;IPAccounting=&lt;/code&gt; is turned on for the service, as only then is this data
collected.&lt;/p&gt;
&lt;p&gt;Let's wait a bit, and invoke &lt;code&gt;systemctl status&lt;/code&gt; again:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# systemctl status ip-accounting-test&lt;/span&gt;
●&lt;span class="w"&gt; &lt;/span&gt;ip-accounting-test.service
&lt;span class="w"&gt;   &lt;/span&gt;Loaded:&lt;span class="w"&gt; &lt;/span&gt;loaded&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;/etc/systemd/system/ip-accounting-test.service&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;static&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;vendor&lt;span class="w"&gt; &lt;/span&gt;preset:&lt;span class="w"&gt; &lt;/span&gt;disabled&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;Active:&lt;span class="w"&gt; &lt;/span&gt;active&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;running&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;since&lt;span class="w"&gt; &lt;/span&gt;Mon&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2017&lt;/span&gt;-10-09&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:05:47&lt;span class="w"&gt; &lt;/span&gt;CEST&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;4min&lt;span class="w"&gt; &lt;/span&gt;28s&lt;span class="w"&gt; &lt;/span&gt;ago
&lt;span class="w"&gt; &lt;/span&gt;Main&lt;span class="w"&gt; &lt;/span&gt;PID:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;ping&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;       &lt;/span&gt;IP:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;22&lt;/span&gt;.2K&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;22&lt;/span&gt;.2K&lt;span class="w"&gt; &lt;/span&gt;out
&lt;span class="w"&gt;    &lt;/span&gt;Tasks:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;limit:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;4915&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;CGroup:&lt;span class="w"&gt; &lt;/span&gt;/system.slice/ip-accounting-test.service
&lt;span class="w"&gt;           &lt;/span&gt;└─32152&lt;span class="w"&gt; &lt;/span&gt;/usr/bin/ping&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8

Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:10:07&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;260&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;27&lt;/span&gt;.7&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:10:08&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;261&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;28&lt;/span&gt;.0&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:10:09&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;262&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;33&lt;/span&gt;.8&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:10:10&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;263&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;48&lt;/span&gt;.9&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:10:11&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;264&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;27&lt;/span&gt;.2&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:10:12&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;265&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;27&lt;/span&gt;.0&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:10:13&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;266&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;26&lt;/span&gt;.8&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:10:14&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;267&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;27&lt;/span&gt;.4&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:10:15&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;268&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;29&lt;/span&gt;.7&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:10:16&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;269&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;27&lt;/span&gt;.6&lt;span class="w"&gt; &lt;/span&gt;ms
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As we can see, after 269 pings the counters are much higher: at 22K.&lt;/p&gt;
&lt;p&gt;Note that while &lt;code&gt;systemctl status&lt;/code&gt; shows only the byte counters,
packet counters are kept as well. Use the low-level &lt;code&gt;systemctl show&lt;/code&gt;
command to query the current raw values of the in and out packet and
byte counters:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# systemctl show ip-accounting-test -p IPIngressBytes -p IPIngressPackets -p IPEgressBytes -p IPEgressPackets&lt;/span&gt;
&lt;span class="nv"&gt;IPIngressBytes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;37776&lt;/span&gt;
&lt;span class="nv"&gt;IPIngressPackets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;449&lt;/span&gt;
&lt;span class="nv"&gt;IPEgressBytes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;37776&lt;/span&gt;
&lt;span class="nv"&gt;IPEgressPackets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;449&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Of course, the same information is also available via the D-Bus
APIs. If you want to process this data further, consider talking proper
D-Bus rather than scraping the output of &lt;code&gt;systemctl show&lt;/code&gt;.&lt;/p&gt;
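&lt;p&gt;A minimal sketch of what that could look like from the command line, using
&lt;code&gt;busctl&lt;/code&gt; to read two of the counters off the unit's D-Bus object. The
object path is an assumption derived from systemd's usual bus path escaping of the
unit name (&lt;code&gt;-&lt;/code&gt; becomes &lt;code&gt;_2d&lt;/code&gt;, &lt;code&gt;.&lt;/code&gt; becomes &lt;code&gt;_2e&lt;/code&gt;),
so check it against your own system:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# busctl get-property org.freedesktop.systemd1 /org/freedesktop/systemd1/unit/ip_2daccounting_2dtest_2eservice org.freedesktop.systemd1.Service IPIngressBytes IPEgressBytes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The counters are exposed as 64-bit unsigned integers, so &lt;code&gt;busctl&lt;/code&gt; prints
each value with the D-Bus type prefix &lt;code&gt;t&lt;/code&gt;.&lt;/p&gt;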
&lt;p&gt;Now, let's stop the service again:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# systemctl stop ip-accounting-test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When a service with such accounting turned on terminates, a log line
about all its consumed resources is written to the logs. Let's check
with &lt;code&gt;journalctl&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# journalctl -u ip-accounting-test -n 5&lt;/span&gt;
--&lt;span class="w"&gt; &lt;/span&gt;Logs&lt;span class="w"&gt; &lt;/span&gt;begin&lt;span class="w"&gt; &lt;/span&gt;at&lt;span class="w"&gt; &lt;/span&gt;Thu&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2016&lt;/span&gt;-08-18&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;23&lt;/span&gt;:09:37&lt;span class="w"&gt; &lt;/span&gt;CEST,&lt;span class="w"&gt; &lt;/span&gt;end&lt;span class="w"&gt; &lt;/span&gt;at&lt;span class="w"&gt; &lt;/span&gt;Mon&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2017&lt;/span&gt;-10-09&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:17:02&lt;span class="w"&gt; &lt;/span&gt;CEST.&lt;span class="w"&gt; &lt;/span&gt;--
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:15:50&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;603&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;26&lt;/span&gt;.9&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:15:51&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;32152&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;604&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;27&lt;/span&gt;.2&lt;span class="w"&gt; &lt;/span&gt;ms
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:15:52&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;systemd&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;Stopping&lt;span class="w"&gt; &lt;/span&gt;ip-accounting-test.service...
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:15:52&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;systemd&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;Stopped&lt;span class="w"&gt; &lt;/span&gt;ip-accounting-test.service.
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:15:52&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;systemd&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;ip-accounting-test.service:&lt;span class="w"&gt; &lt;/span&gt;Received&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;49&lt;/span&gt;.5K&lt;span class="w"&gt; &lt;/span&gt;IP&lt;span class="w"&gt; &lt;/span&gt;traffic,&lt;span class="w"&gt; &lt;/span&gt;sent&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;49&lt;/span&gt;.5K&lt;span class="w"&gt; &lt;/span&gt;IP&lt;span class="w"&gt; &lt;/span&gt;traffic
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The last line shown is the interesting one: it shows the accounting
data. It's actually a structured log message, and its metadata
fields carry the more comprehensive raw data:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# journalctl -u ip-accounting-test -n 1 -o verbose&lt;/span&gt;
--&lt;span class="w"&gt; &lt;/span&gt;Logs&lt;span class="w"&gt; &lt;/span&gt;begin&lt;span class="w"&gt; &lt;/span&gt;at&lt;span class="w"&gt; &lt;/span&gt;Thu&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2016&lt;/span&gt;-08-18&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;23&lt;/span&gt;:09:37&lt;span class="w"&gt; &lt;/span&gt;CEST,&lt;span class="w"&gt; &lt;/span&gt;end&lt;span class="w"&gt; &lt;/span&gt;at&lt;span class="w"&gt; &lt;/span&gt;Mon&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2017&lt;/span&gt;-10-09&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:18:50&lt;span class="w"&gt; &lt;/span&gt;CEST.&lt;span class="w"&gt; &lt;/span&gt;--
Mon&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2017&lt;/span&gt;-10-09&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:15:52.649028&lt;span class="w"&gt; &lt;/span&gt;CEST&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;s&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;89a2cc877fdf4dafb2269a7631afedad&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="nv"&gt;i&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;14d7&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="nv"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4c7e7adcba0c45b69d612857270716d3&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="nv"&gt;m&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;137592e75e&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="nv"&gt;t&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;55b1f81298605&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="nv"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;c3c9b57b28c9490e&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;PRIORITY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_BOOT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;4c7e7adcba0c45b69d612857270716d3
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_MACHINE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;e87bfd866aea4ae4b761aff06c9c3cb3
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_HOSTNAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sigma
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;SYSLOG_FACILITY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;SYSLOG_IDENTIFIER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;systemd
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_UID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_GID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_TRANSPORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;journal
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_PID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_COMM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;systemd
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_EXE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/lib/systemd/systemd
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_CAP_EFFECTIVE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3fffffffff
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_SYSTEMD_CGROUP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/init.scope
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_SYSTEMD_UNIT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;init.scope
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_SYSTEMD_SLICE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-.slice
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;CODE_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;../src/core/unit.c
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_CMDLINE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/lib/systemd/systemd&lt;span class="w"&gt; &lt;/span&gt;--switched-root&lt;span class="w"&gt; &lt;/span&gt;--system&lt;span class="w"&gt; &lt;/span&gt;--deserialize&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;25&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_SELINUX_CONTEXT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;system_u:system_r:init_t:s0
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;UNIT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ip-accounting-test.service
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;CODE_LINE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;2115&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;CODE_FUNC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;unit_log_resources
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;MESSAGE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ae8f7b866b0347b9af31fe1c80b127c0
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;INVOCATION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;98a6e756fa9d421d8dfc82b6df06a9c3
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;IP_METRIC_INGRESS_BYTES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;50880&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;IP_METRIC_INGRESS_PACKETS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;605&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;IP_METRIC_EGRESS_BYTES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;50880&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;IP_METRIC_EGRESS_PACKETS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;605&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;MESSAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ip-accounting-test.service:&lt;span class="w"&gt; &lt;/span&gt;Received&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;49&lt;/span&gt;.6K&lt;span class="w"&gt; &lt;/span&gt;IP&lt;span class="w"&gt; &lt;/span&gt;traffic,&lt;span class="w"&gt; &lt;/span&gt;sent&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;49&lt;/span&gt;.6K&lt;span class="w"&gt; &lt;/span&gt;IP&lt;span class="w"&gt; &lt;/span&gt;traffic
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;_SOURCE_REALTIME_TIMESTAMP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1507565752649028&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The interesting fields of this log message are of course
&lt;code&gt;IP_METRIC_INGRESS_BYTES=&lt;/code&gt;, &lt;code&gt;IP_METRIC_INGRESS_PACKETS=&lt;/code&gt;,
&lt;code&gt;IP_METRIC_EGRESS_BYTES=&lt;/code&gt; and &lt;code&gt;IP_METRIC_EGRESS_PACKETS=&lt;/code&gt;, which show the
consumed data.&lt;/p&gt;
&lt;p&gt;The log message carries a &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html#MESSAGE_ID="&gt;message
ID&lt;/a&gt;
that may be used to quickly search for all such resource log messages
(&lt;code&gt;ae8f7b866b0347b9af31fe1c80b127c0&lt;/code&gt;). We can combine a search term for
messages of this ID with &lt;code&gt;journalctl&lt;/code&gt;'s &lt;code&gt;-u&lt;/code&gt; switch to quickly find
out about the resource usage of any invocation of a specific
service. Let's try:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# journalctl -u ip-accounting-test MESSAGE_ID=ae8f7b866b0347b9af31fe1c80b127c0&lt;/span&gt;
--&lt;span class="w"&gt; &lt;/span&gt;Logs&lt;span class="w"&gt; &lt;/span&gt;begin&lt;span class="w"&gt; &lt;/span&gt;at&lt;span class="w"&gt; &lt;/span&gt;Thu&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2016&lt;/span&gt;-08-18&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;23&lt;/span&gt;:09:37&lt;span class="w"&gt; &lt;/span&gt;CEST,&lt;span class="w"&gt; &lt;/span&gt;end&lt;span class="w"&gt; &lt;/span&gt;at&lt;span class="w"&gt; &lt;/span&gt;Mon&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2017&lt;/span&gt;-10-09&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:25:27&lt;span class="w"&gt; &lt;/span&gt;CEST.&lt;span class="w"&gt; &lt;/span&gt;--
Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;09&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;18&lt;/span&gt;:15:52&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;systemd&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;ip-accounting-test.service:&lt;span class="w"&gt; &lt;/span&gt;Received&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;49&lt;/span&gt;.6K&lt;span class="w"&gt; &lt;/span&gt;IP&lt;span class="w"&gt; &lt;/span&gt;traffic,&lt;span class="w"&gt; &lt;/span&gt;sent&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;49&lt;/span&gt;.6K&lt;span class="w"&gt; &lt;/span&gt;IP&lt;span class="w"&gt; &lt;/span&gt;traffic
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Of course, the output above shows only one message at the moment,
since we started the service only once, but a new one will appear
every time you start and stop it again.&lt;/p&gt;
&lt;p&gt;The IP accounting logic is also hooked up with
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-run.html"&gt;&lt;code&gt;systemd-run&lt;/code&gt;&lt;/a&gt;,
which is useful for transiently running a command as a systemd service
with IP accounting turned on. Let's try it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# systemd-run -p IPAccounting=yes --wait wget https://cfp.all-systems-go.io/en/ASG2017/public/schedule/2.pdf&lt;/span&gt;
Running&lt;span class="w"&gt; &lt;/span&gt;as&lt;span class="w"&gt; &lt;/span&gt;unit:&lt;span class="w"&gt; &lt;/span&gt;run-u2761.service
Finished&lt;span class="w"&gt; &lt;/span&gt;with&lt;span class="w"&gt; &lt;/span&gt;result:&lt;span class="w"&gt; &lt;/span&gt;success
Main&lt;span class="w"&gt; &lt;/span&gt;processes&lt;span class="w"&gt; &lt;/span&gt;terminated&lt;span class="w"&gt; &lt;/span&gt;with:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;exited/status&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
Service&lt;span class="w"&gt; &lt;/span&gt;runtime:&lt;span class="w"&gt; &lt;/span&gt;878ms
IP&lt;span class="w"&gt; &lt;/span&gt;traffic&lt;span class="w"&gt; &lt;/span&gt;received:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;231&lt;/span&gt;.0K
IP&lt;span class="w"&gt; &lt;/span&gt;traffic&lt;span class="w"&gt; &lt;/span&gt;sent:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;.7K
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This uses &lt;a href="https://linux.die.net/man/1/wget"&gt;&lt;code&gt;wget&lt;/code&gt;&lt;/a&gt; to download &lt;a href="https://cfp.all-systems-go.io/en/ASG2017/public/schedule/2.pdf"&gt;the
PDF version of the 2nd day
schedule&lt;/a&gt;
of everybody's favorite Linux user-space conference &lt;a href="https://all-systems-go.io/"&gt;All Systems Go!
2017&lt;/a&gt; (BTW, have you already &lt;a href="https://all-systems-go.io/#tickets"&gt;booked your
ticket&lt;/a&gt;? We are very close to
selling out, be quick!). The IP traffic this command generated was
231K ingress and 4K egress. In the &lt;code&gt;systemd-run&lt;/code&gt; command line two
parameters are important. First of all, we use &lt;code&gt;-p IPAccounting=yes&lt;/code&gt;
to turn on IP accounting for the transient service (as above). And
secondly we use &lt;code&gt;--wait&lt;/code&gt; to tell &lt;code&gt;systemd-run&lt;/code&gt; to wait for the service
to exit. If &lt;code&gt;--wait&lt;/code&gt; is used, &lt;code&gt;systemd-run&lt;/code&gt; will also show you various
statistics about the service that just ran and terminated, including
the IP statistics you are seeing if IP accounting has been turned on.&lt;/p&gt;
&lt;p&gt;It's fun to combine this sort of IP accounting with interactive
transient units. Let's try that:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# systemd-run -p IPAccounting=1 -t /bin/sh&lt;/span&gt;
Running&lt;span class="w"&gt; &lt;/span&gt;as&lt;span class="w"&gt; &lt;/span&gt;unit:&lt;span class="w"&gt; &lt;/span&gt;run-u2779.service
Press&lt;span class="w"&gt; &lt;/span&gt;^&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;three&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;times&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;within&lt;span class="w"&gt; &lt;/span&gt;1s&lt;span class="w"&gt; &lt;/span&gt;to&lt;span class="w"&gt; &lt;/span&gt;disconnect&lt;span class="w"&gt; &lt;/span&gt;TTY.
sh-4.4#&lt;span class="w"&gt; &lt;/span&gt;dnf&lt;span class="w"&gt; &lt;/span&gt;update
…
sh-4.4#&lt;span class="w"&gt; &lt;/span&gt;dnf&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;firefox
…
sh-4.4#&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;
Finished&lt;span class="w"&gt; &lt;/span&gt;with&lt;span class="w"&gt; &lt;/span&gt;result:&lt;span class="w"&gt; &lt;/span&gt;success
Main&lt;span class="w"&gt; &lt;/span&gt;processes&lt;span class="w"&gt; &lt;/span&gt;terminated&lt;span class="w"&gt; &lt;/span&gt;with:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;exited/status&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
Service&lt;span class="w"&gt; &lt;/span&gt;runtime:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;.297s
IP&lt;span class="w"&gt; &lt;/span&gt;traffic&lt;span class="w"&gt; &lt;/span&gt;received:&lt;span class="w"&gt; &lt;/span&gt;…B
IP&lt;span class="w"&gt; &lt;/span&gt;traffic&lt;span class="w"&gt; &lt;/span&gt;sent:&lt;span class="w"&gt; &lt;/span&gt;…B
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This uses &lt;code&gt;systemd-run&lt;/code&gt;'s &lt;code&gt;--pty&lt;/code&gt; switch (or short: &lt;code&gt;-t&lt;/code&gt;), which opens
an interactive pseudo-TTY connection to the invoked service process,
which is a Bourne shell in this case. Doing this means we have a full,
comprehensive shell with job control and everything. Since the shell
is running as part of a service with IP accounting turned on, all IP
traffic we generate or receive will be accounted for. And as soon as
we exit the shell, we'll see what it consumed. (For the sake of
brevity I actually didn't paste the whole output above, but truncated
core parts. Try it out for yourself, if you want to see the output in
full.)&lt;/p&gt;
&lt;p&gt;Sometimes it might make sense to turn on IP accounting for a unit that
is already running. For that, use &lt;code&gt;systemctl set-property
foobar.service IPAccounting=yes&lt;/code&gt;, which will instantly turn on
accounting for it. Note that it won't count retroactively though: only
the traffic sent/received after the point in time you turned it on
will be collected. You may turn accounting off again for the unit with
the same command, passing &lt;code&gt;IPAccounting=no&lt;/code&gt; instead.&lt;/p&gt;
&lt;p&gt;Of course, sometimes it's interesting to collect IP accounting data
for all services, and turning on &lt;code&gt;IPAccounting=yes&lt;/code&gt; in every single
unit is cumbersome. To deal with that there's a global option
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-system.conf.html#DefaultCPUAccounting="&gt;&lt;code&gt;DefaultIPAccounting=&lt;/code&gt;&lt;/a&gt;
available which can be set in &lt;code&gt;/etc/systemd/system.conf&lt;/code&gt;.&lt;/p&gt;
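&lt;p&gt;As a minimal sketch, assuming the stock layout of that file with its
&lt;code&gt;[Manager]&lt;/code&gt; section, turning on IP accounting globally could look
like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# /etc/systemd/system.conf
[Manager]
# Default for all units that do not set IPAccounting= themselves
DefaultIPAccounting=yes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Individual units can still opt out again by setting
&lt;code&gt;IPAccounting=no&lt;/code&gt; explicitly.&lt;/p&gt;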
&lt;h1&gt;IP Access Lists&lt;/h1&gt;
&lt;p&gt;So much about IP accounting. Let's now have a look at IP access
control with systemd 235. As mentioned above, the two new unit file
settings, &lt;code&gt;IPAddressAllow=&lt;/code&gt; and &lt;code&gt;IPAddressDeny=&lt;/code&gt;, may be used for
that. They operate in the following way:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;If the source address of an incoming packet or the destination
address of an outgoing packet matches one of the IP addresses/network
masks in the relevant unit's &lt;code&gt;IPAddressAllow=&lt;/code&gt; setting then it will be
allowed to go through.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Otherwise, if a packet matches an &lt;code&gt;IPAddressDeny=&lt;/code&gt; entry configured
for the service it is dropped.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the packet matches neither of the above it is allowed to go
through.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Or in other words, &lt;code&gt;IPAddressDeny=&lt;/code&gt; implements a blacklist, but
&lt;code&gt;IPAddressAllow=&lt;/code&gt; takes precedence.&lt;/p&gt;
&lt;p&gt;Let's try that out. Let's modify our last example above in order to
get a transient service running an interactive shell which has such an
access list set:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# systemd-run -p IPAddressDeny=any -p IPAddressAllow=8.8.8.8 -p IPAddressAllow=127.0.0.0/8 -t /bin/sh&lt;/span&gt;
Running&lt;span class="w"&gt; &lt;/span&gt;as&lt;span class="w"&gt; &lt;/span&gt;unit:&lt;span class="w"&gt; &lt;/span&gt;run-u2850.service
Press&lt;span class="w"&gt; &lt;/span&gt;^&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;three&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;times&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;within&lt;span class="w"&gt; &lt;/span&gt;1s&lt;span class="w"&gt; &lt;/span&gt;to&lt;span class="w"&gt; &lt;/span&gt;disconnect&lt;span class="w"&gt; &lt;/span&gt;TTY.
sh-4.4#&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8&lt;span class="w"&gt; &lt;/span&gt;-c1
PING&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;56&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;84&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;of&lt;span class="w"&gt; &lt;/span&gt;data.
&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;59&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;27&lt;/span&gt;.9&lt;span class="w"&gt; &lt;/span&gt;ms

---&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.8.8&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="w"&gt; &lt;/span&gt;statistics&lt;span class="w"&gt; &lt;/span&gt;---
&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;packets&lt;span class="w"&gt; &lt;/span&gt;transmitted,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;received,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;%&lt;span class="w"&gt; &lt;/span&gt;packet&lt;span class="w"&gt; &lt;/span&gt;loss,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;0ms
rtt&lt;span class="w"&gt; &lt;/span&gt;min/avg/max/mdev&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;27&lt;/span&gt;.957/27.957/27.957/0.000&lt;span class="w"&gt; &lt;/span&gt;ms
sh-4.4#&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.4.4&lt;span class="w"&gt; &lt;/span&gt;-c1
PING&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.4.4&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.4.4&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;56&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;84&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;of&lt;span class="w"&gt; &lt;/span&gt;data.
ping:&lt;span class="w"&gt; &lt;/span&gt;sendmsg:&lt;span class="w"&gt; &lt;/span&gt;Operation&lt;span class="w"&gt; &lt;/span&gt;not&lt;span class="w"&gt; &lt;/span&gt;permitted
^C
---&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;.8.4.4&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="w"&gt; &lt;/span&gt;statistics&lt;span class="w"&gt; &lt;/span&gt;---
&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;packets&lt;span class="w"&gt; &lt;/span&gt;transmitted,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;received,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt;%&lt;span class="w"&gt; &lt;/span&gt;packet&lt;span class="w"&gt; &lt;/span&gt;loss,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;0ms
sh-4.4#&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;127&lt;/span&gt;.0.0.2&lt;span class="w"&gt; &lt;/span&gt;-c1
PING&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;127&lt;/span&gt;.0.0.1&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;127&lt;/span&gt;.0.0.2&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;56&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="m"&gt;84&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;of&lt;span class="w"&gt; &lt;/span&gt;data.
&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;bytes&lt;span class="w"&gt; &lt;/span&gt;from&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;127&lt;/span&gt;.0.0.2:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;icmp_seq&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;.116&lt;span class="w"&gt; &lt;/span&gt;ms

---&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;127&lt;/span&gt;.0.0.2&lt;span class="w"&gt; &lt;/span&gt;ping&lt;span class="w"&gt; &lt;/span&gt;statistics&lt;span class="w"&gt; &lt;/span&gt;---
&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;packets&lt;span class="w"&gt; &lt;/span&gt;transmitted,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;received,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;%&lt;span class="w"&gt; &lt;/span&gt;packet&lt;span class="w"&gt; &lt;/span&gt;loss,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;time&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;0ms
rtt&lt;span class="w"&gt; &lt;/span&gt;min/avg/max/mdev&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;.116/0.116/0.116/0.000&lt;span class="w"&gt; &lt;/span&gt;ms
sh-4.4#&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The access list we set up uses &lt;code&gt;IPAddressDeny=any&lt;/code&gt; in order to define
an IP white-list: all traffic will be prohibited for the session,
except for what is explicitly white-listed. In this command line, we
white-listed two address prefixes: 8.8.8.8 (with no explicit network
mask, which means the mask with all bits turned on is implied,
i.e. &lt;code&gt;/32&lt;/code&gt;), and 127.0.0.0/8. Thus, the service can communicate with
Google's DNS server and everything on the local loop-back, but nothing
else. The commands run in this interactive shell show this: First we
try pinging 8.8.8.8 which happily responds. Then, we try to ping
8.8.4.4 (that's Google's other DNS server, but excluded from this
white-list), and as we see it is immediately refused with an &lt;em&gt;Operation
not permitted&lt;/em&gt; error. As last step we ping 127.0.0.2 (which is on the
local loop-back), and we see it works fine again, as expected.&lt;/p&gt;
&lt;p&gt;In the example above we used &lt;code&gt;IPAddressDeny=any&lt;/code&gt;. The &lt;code&gt;any&lt;/code&gt;
identifier is a shortcut for writing 0.0.0.0/0 ::/0, i.e. it's a
shortcut for &lt;em&gt;everything&lt;/em&gt;, on both IPv4 and IPv6. A number of other
such shortcuts exist. For example, instead of spelling out
&lt;code&gt;127.0.0.0/8&lt;/code&gt; we could also have used the more descriptive shortcut
&lt;code&gt;localhost&lt;/code&gt; which is expanded to 127.0.0.0/8 ::1/128, i.e. everything
on the local loopback device, on both IPv4 and IPv6.&lt;/p&gt;
&lt;p&gt;Being able to configure IP access lists individually for each unit is
pretty nice already. However, typically one wants to configure this
comprehensively, not just for individual units, but for a set of units
in one go or even the system as a whole. In systemd, that's possible
by making use of
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.slice.html"&gt;&lt;code&gt;.slice&lt;/code&gt;&lt;/a&gt;
units (for those who don't know systemd that well, slice units are a
concept for organizing services in a hierarchical tree for the purpose of
resource management): the IP access list in effect for a unit is the
combination of the individual IP access lists configured for the unit
itself and those of all slice units it is contained in.&lt;/p&gt;
&lt;p&gt;By default, system services are assigned to
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.special.html#system.slice"&gt;&lt;code&gt;system.slice&lt;/code&gt;&lt;/a&gt;,
which in turn is a child of the root slice
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.special.html#-.slice"&gt;&lt;code&gt;-.slice&lt;/code&gt;&lt;/a&gt;. Either
of these two slice units is hence suitable for locking down &lt;em&gt;all&lt;/em&gt;
system services at once. If an access list is configured on
&lt;code&gt;system.slice&lt;/code&gt; it will only apply to system services. If it is
configured on &lt;code&gt;-.slice&lt;/code&gt; however, it will apply to all processes of the
system, including all user session processes (which are by
default assigned to &lt;code&gt;user.slice&lt;/code&gt;, which is a child of &lt;code&gt;-.slice&lt;/code&gt;) in
addition to the system services.&lt;/p&gt;
&lt;p&gt;Let's make use of this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# systemctl set-property system.slice IPAddressDeny=any IPAddressAllow=localhost
# systemctl set-property apache.service IPAddressAllow=10.0.0.0/8
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The two commands above are a very powerful way to first turn off all
IP communication for all system services (with the exception of
loop-back traffic), followed by an explicit white-listing of
10.0.0.0/8 (which could refer to the local company network, you get
the idea) but only for the Apache service.&lt;/p&gt;
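&lt;p&gt;If you'd rather keep such settings in files you manage yourself,
roughly the same setup can be sketched as two drop-in files. The file
names below are chosen purely for illustration (systemd only cares
about the &lt;code&gt;.d/&lt;/code&gt; directory and the &lt;code&gt;.conf&lt;/code&gt; suffix), and a
&lt;code&gt;systemctl daemon-reload&lt;/code&gt; is needed afterwards for them to be picked
up:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# /etc/systemd/system/system.slice.d/50-ip-lockdown.conf (hypothetical file name)
[Slice]
# Deny everything for all system services, except loop-back traffic
IPAddressDeny=any
IPAddressAllow=localhost

# /etc/systemd/system/apache.service.d/50-ip-allow.conf (hypothetical file name)
[Service]
# Additionally allow the 10.0.0.0/8 network, for this one service only
IPAddressAllow=10.0.0.0/8
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
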
&lt;h1&gt;Use-cases&lt;/h1&gt;
&lt;p&gt;After playing around a bit with this, let's talk about use-cases. Here
are a few ideas:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The IP access list logic can in many ways provide a more modern
replacement for the venerable &lt;a href="https://en.wikipedia.org/wiki/TCP_Wrapper"&gt;TCP
Wrapper&lt;/a&gt;, but unlike TCP Wrapper it
applies to all IP sockets of a service unconditionally, and requires
no explicit support in any way in the service's code: no patching
required. On the other hand, TCP wrappers have a number of features
this scheme cannot cover. Most importantly, systemd's IP access lists
operate solely on the level of IP addresses and network masks; there
is no way to configure access by DNS name (though quite frankly, that
is a very dubious feature anyway, as doing networking, unsecured
networking even, in order to restrict networking sounds quite
questionable, at least to me).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It can also replace (or augment) some facets of IP firewalling,
i.e. Linux NetFilter/&lt;code&gt;iptables&lt;/code&gt;. Right now, systemd's access lists are
of course a lot more minimal than NetFilter, but they have one major
benefit: they understand the service concept, and thus are a lot more
context-aware than NetFilter. Classic firewalls, such as NetFilter,
derive most service context from the IP port number alone, but we live
in a world where IP port numbers are a lot more dynamic than they used
to be. As one example, a BitTorrent client or server may use any IP
port it likes for its file transfer, and writing IP firewalling rules
matching that precisely is hence hard. With the systemd IP access list
implementing this is easy: just set the list for your BitTorrent
service unit, and all is good.&lt;/p&gt;
&lt;p&gt;Let me stress though that you should be careful when comparing
NetFilter with systemd's IP access list logic; it's really like
comparing apples and oranges. To start with, the IP access list
logic has a clearly local focus: it only knows what a local
service is and manages network access for it. NetFilter on the other hand
may run on border gateways, at a point where the traffic flowing
through is pure IP, carrying no information about a systemd unit
concept or anything like that.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It's a simple way to lock down distribution/vendor supplied system
services by default. For example, if you ship a service that you know
never needs to access the network, then simply set &lt;code&gt;IPAddressDeny=any&lt;/code&gt;
(possibly combined with &lt;code&gt;IPAddressAllow=localhost&lt;/code&gt;) for it, and it
will live in a very tight networking sand-box it cannot escape
from. systemd itself makes use of this for a number of its services by
default now. For example, the logging service
&lt;code&gt;systemd-journald.service&lt;/code&gt;, the login manager &lt;code&gt;systemd-logind.service&lt;/code&gt; and the
core-dump processing unit &lt;code&gt;systemd-coredump@.service&lt;/code&gt; all have such a
rule set out-of-the-box, because we know that none of these
services should be able to access the network, under any
circumstances.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Because the IP access list logic can be combined with transient
units, it can be used to quickly and effectively sandbox arbitrary
commands, and even include them in shell pipelines and such. For
example, let's say we don't trust our
&lt;a href="https://linux.die.net/man/1/curl"&gt;&lt;code&gt;curl&lt;/code&gt;&lt;/a&gt; implementation (maybe it
got modified locally by a hacker, and phones home?), but want to use
it anyway to download &lt;a href="http://0pointer.de/public/casync-kinvolk2017.pdf"&gt;the slides of my most recent casync
talk&lt;/a&gt; in order to
print them, but want to make sure it doesn't connect anywhere except
where we tell it to (and to make this even more fun, let's minimize
privileges further, by setting
&lt;a href="http://0pointer.net/blog/dynamic-users-with-systemd.html"&gt;&lt;code&gt;DynamicUser=yes&lt;/code&gt;&lt;/a&gt;):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# systemd-resolve 0pointer.de
0pointer.de: 85.214.157.71
             2a01:238:43ed:c300:10c3:bcf3:3266:da74
-- Information acquired via protocol DNS in 2.8ms.
-- Data is authenticated: no
# systemd-run --pipe -p IPAddressDeny=any \
                     -p IPAddressAllow=85.214.157.71 \
                     -p IPAddressAllow=2a01:238:43ed:c300:10c3:bcf3:3266:da74 \
                     -p DynamicUser=yes \
                     curl http://0pointer.de/public/casync-kinvolk2017.pdf | lp
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So much about use-cases. This is by no means a comprehensive list of
what you can do with it, after all both IP accounting and IP access
lists are very generic concepts. But I do hope the above sparks your
imagination.&lt;/p&gt;
&lt;h1&gt;What does that mean for packagers?&lt;/h1&gt;
&lt;p&gt;IP accounting and IP access control are primarily concepts for the
local administrator. However, as suggested above, it's a very good
idea to ship services that by design have no network-facing
functionality with an access list of &lt;code&gt;IPAddressDeny=any&lt;/code&gt; (and possibly
&lt;code&gt;IPAddressAllow=localhost&lt;/code&gt;), in order to improve the out-of-the-box
security of our systems.&lt;/p&gt;
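&lt;p&gt;For a packaged service that never talks to the network, this could be
as simple as the following sketch. The unit name &lt;code&gt;foobar.service&lt;/code&gt;
and the binary path are placeholders, not a real package:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# foobar.service (hypothetical unit shipped by a package)
[Unit]
Description=Hypothetical service without network-facing functionality

[Service]
ExecStart=/usr/bin/foobar
# No IP traffic at all, except on the local loop-back
IPAddressDeny=any
IPAddressAllow=localhost
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
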
&lt;p&gt;An option for security-minded distributions might be a more radical
approach: ship the system with &lt;code&gt;-.slice&lt;/code&gt; or &lt;code&gt;system.slice&lt;/code&gt; configured
to &lt;code&gt;IPAddressDeny=any&lt;/code&gt; by default, and ask the administrator to punch
holes into that for each network facing service with &lt;code&gt;systemctl
set-property … IPAddressAllow=…&lt;/code&gt;. But of course, that's only an
option for distributions willing to break compatibility with what was
before.&lt;/p&gt;
&lt;h1&gt;Notes&lt;/h1&gt;
&lt;p&gt;A couple of additional notes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;IP accounting and access lists may be mixed with socket
activation. In this case, it's a good idea to configure access lists
and accounting for both the socket unit that activates and the service
unit that is activated, as both units maintain fully separate
settings. Note that IP accounting and access lists configured on the
socket unit apply to all sockets created on behalf of that unit, and
even if these sockets are passed on to the activated services, they
will still remain in effect and belong to the socket unit. This also
means that IP traffic done on such sockets will be accounted to the
socket unit, not the service unit. The fact that IP access lists are
maintained separately for the kernel sockets created on behalf of the
socket unit and for the kernel sockets created by the service code
itself enables some interesting uses. For example, it's possible to
set a relatively open access list on the socket unit, but a very
restrictive access list on the service unit, thus making the sockets
configured through the socket unit the only way in and out of the
service.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;systemd's IP accounting and access lists apply to IP sockets only,
not to sockets of any other address families. That also means that
&lt;code&gt;AF_PACKET&lt;/code&gt; (i.e. raw) sockets are not covered. This means it's a good
idea to combine IP access lists with &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RestrictAddressFamilies="&gt;&lt;code&gt;RestrictAddressFamilies=AF_UNIX
AF_INET
AF_INET6&lt;/code&gt;&lt;/a&gt;
in order to lock this down.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You may wonder if the per-unit resource log message and
&lt;code&gt;systemd-run --wait&lt;/code&gt; may also show you details about other types of
resources consumed by a service. The answer is yes: if you turn on
&lt;code&gt;CPUAccounting=&lt;/code&gt; for a service, you'll also see a summary of consumed
CPU time in the log message and the command output. And we are
planning to hook-up &lt;code&gt;IOAccounting=&lt;/code&gt; the same way too, soon.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Note that IP accounting and access lists aren't entirely
free. systemd inserts an eBPF program into the IP pipeline to make
this functionality work. However, eBPF execution has been optimized
for speed in recent kernel versions already, and given that it
currently is a focus of interest for many I'd expect it to be
optimized even further, so that the cost for enabling these features
will be negligible, if it isn't already.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;IP accounting is currently not recursive. That means you cannot use
a slice unit to join the accounting of multiple units into one. This
is something we definitely want to add, but requires some more kernel
work first.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You might wonder how the
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#PrivateNetwork="&gt;&lt;code&gt;PrivateNetwork=&lt;/code&gt;&lt;/a&gt;
setting relates to &lt;code&gt;IPAddressDeny=any&lt;/code&gt;. Superficially they have similar
effects: they make the network unavailable to services. However,
looking more closely there are a number of
differences. &lt;code&gt;PrivateNetwork=&lt;/code&gt; is implemented using Linux network
name-spaces. As such it entirely detaches all networking of a service
from the host, including non-IP networking. It does so by creating a
private little environment the service lives in where communication
with itself is still allowed though. In addition using the
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.unit.html#JoinsNamespaceOf="&gt;&lt;code&gt;JoinsNamespaceOf=&lt;/code&gt;&lt;/a&gt;
dependency additional services may be added to the same environment,
thus permitting communication with each other but not with anything
outside of this group. &lt;code&gt;IPAddressAllow=&lt;/code&gt; and &lt;code&gt;IPAddressDeny=&lt;/code&gt; are much
less invasive. First of all they apply to IP networking only, and can
match against specific IP addresses. A service running with
&lt;code&gt;PrivateNetwork=&lt;/code&gt; turned off but &lt;code&gt;IPAddressDeny=any&lt;/code&gt; turned on may
enumerate the network interfaces and their configured IP addresses even though
it cannot actually do any IP communication. On the other hand if you
turn on &lt;code&gt;PrivateNetwork=&lt;/code&gt; all network interfaces besides &lt;code&gt;lo&lt;/code&gt;
disappear. Long story short: depending on your use-case one, the other,
both or neither might be suitable for sand-boxing of your service. If
possible I'd always turn on both, for best security, and that's what
we do for all of systemd's own long-running services.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
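&lt;p&gt;To make the first two notes a bit more concrete, here's a sketch of a
socket-activated service where the socket unit carries a relatively
open access list while the service unit itself is locked down
completely. The unit names, the port and the binary path are made up
for illustration:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# foobar.socket (hypothetical)
[Socket]
ListenStream=8080
# Sockets created on behalf of the socket unit may talk to 10.0.0.0/8
IPAddressDeny=any
IPAddressAllow=10.0.0.0/8

# foobar.service (hypothetical)
[Service]
ExecStart=/usr/bin/foobar
# Sockets the service code creates itself may not do any IP traffic
IPAddressDeny=any
# Keep the service from creating sockets of other address families, such as AF_PACKET
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With a setup like this, the sockets passed in from the socket unit
would be the only way in and out of the service.&lt;/p&gt;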
&lt;p&gt;And that's all for now. Have fun with per-unit IP accounting and
access lists!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 09 Oct 2017 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2017-10-09:/blog/ip-accounting-and-access-lists-with-systemd.html</guid><category>projects</category></item><item><title>Dynamic Users with systemd</title><link>https://0pointer.net/blog/dynamic-users-with-systemd.html</link><description>&lt;p&gt;&lt;em&gt;TL;DR: you may now configure systemd to dynamically allocate a UNIX
user ID for service processes when it starts them and release it when
it stops them. It's pretty secure, mixes well with transient services,
socket activated services and service templating.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Today we released &lt;a href="https://lists.freedesktop.org/archives/systemd-devel/2017-October/039589.html"&gt;systemd
235&lt;/a&gt;. Among
other improvements this greatly extends the dynamic user logic of
systemd. Dynamic users are a powerful but little known concept,
supported in its basic form since systemd 232. With this blog story I
hope to make it a bit better known.&lt;/p&gt;
&lt;p&gt;The UNIX &lt;em&gt;user&lt;/em&gt; concept is the most basic and well-understood security
concept in POSIX operating systems. It is UNIX/POSIX' primary security
concept, the one everybody can agree on, and most security concepts
that came after it (such as process capabilities, SELinux and other
MACs, user name-spaces, …)  in some form or another build on it, extend
it or at least interface with it. If you build a Linux kernel with all
security features turned off, the user concept is pretty much the one
you'll still retain.&lt;/p&gt;
&lt;p&gt;Originally, the user concept was introduced to make multi-user systems
a reality, i.e. systems enabling multiple &lt;em&gt;human&lt;/em&gt; users to share the
same system at the same time, cleanly separating their resources and
protecting them from each other. The majority of today's UNIX systems
don't really use the user concept like that anymore though. Most of
today's systems probably have only one actual human user (or even
less!), but their user databases (&lt;code&gt;/etc/passwd&lt;/code&gt;) list a good number
more entries than that. Today, the majority of UNIX users in most
environments are &lt;em&gt;system users&lt;/em&gt;, i.e. users that are not the technical
representation of a human sitting in front of a PC anymore, but the
security identity a system service — an executable program — runs
as. Even though traditional, simultaneous multi-user systems slowly
became less relevant, their ground-breaking basic concept became the
cornerstone of UNIX security.  The OS is nowadays partitioned into
isolated services — and each service runs as its own system user, and
thus within its own, minimal security context.&lt;/p&gt;
&lt;p&gt;The people behind the Android OS realized the relevance of the UNIX
user concept as the primary security concept on UNIX, and took its use
even further: on Android not only system services take benefit of the
UNIX user concept, but each UI app gets its own, individual user
identity too — thus neatly separating app resources from each other,
and protecting app processes from each other, too.&lt;/p&gt;
&lt;p&gt;Back in the more traditional Linux world things are a bit less
advanced in this area. Even though users are the quintessential UNIX
security concept, allocation and management of system users is still a
pretty limited, raw and static affair. In most cases, RPM or DEB
package installation scripts allocate a fixed number of (usually one)
system users when you install the package of a service that wants to
take benefit of the user concept, and from that point on the system
user remains allocated on the system and is never deallocated again,
even if the package is later removed again. Most Linux distributions
limit the number of system users to 1000 (which isn't particularly a
lot). Allocating a system user is hence expensive: the number of
available users is limited, and there's no defined way to dispose of
them after use. If you make use of system users too liberally, you are
very likely to run out of them sooner rather than later.&lt;/p&gt;
&lt;p&gt;You may wonder why system users are generally not deallocated when the
package that registered them is uninstalled from a system (at least on
most distributions). The reason for that is one relevant property of
the user concept (you might even want to call this a &lt;em&gt;design flaw&lt;/em&gt;):
user IDs are &lt;em&gt;sticky&lt;/em&gt; to files (and other objects such as IPC
objects). If a service running as a specific system user creates a
file at some location, and is then terminated and its package and user
removed, then the created file still belongs to the numeric ID ("UID")
the system user originally got assigned. When the next system user is
allocated and — due to ID recycling — happens to get assigned the same
numeric ID, then it will also gain access to the file, and that's
generally considered a problem, given that the file belonged to a
potentially very different service once upon a time, and likely should
not be readable or changeable by anything coming after
it. Distributions hence tend to avoid UID recycling which means system
users remain registered forever on a system after they have been
allocated once.&lt;/p&gt;
&lt;p&gt;The above is a description of the status quo ante. Let's now focus on
what systemd's dynamic user concept brings to the table, to improve
the situation.&lt;/p&gt;
&lt;h1&gt;Introducing Dynamic Users&lt;/h1&gt;
&lt;p&gt;With systemd dynamic users we hope to make it easier and cheaper
to allocate system users on-the-fly, thus substantially increasing the
possible uses of this core UNIX security concept.&lt;/p&gt;
&lt;p&gt;If you write a systemd service unit file, you may enable the dynamic
user logic for it by setting the
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#DynamicUser="&gt;&lt;code&gt;DynamicUser=&lt;/code&gt;&lt;/a&gt;
option in its &lt;code&gt;[Service]&lt;/code&gt; section to &lt;code&gt;yes&lt;/code&gt;. If you do, a system user is
dynamically allocated the instant the service binary is invoked, and
released again when the service terminates. The user is automatically
allocated from the UID range 61184–65519, by looking for a so far
unused UID.&lt;/p&gt;
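&lt;p&gt;In its simplest form, such a unit file could look like the following
sketch. The unit name and binary path are placeholders:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# foobar.service (hypothetical)
[Service]
ExecStart=/usr/bin/foobar
# Allocate a system user when the service starts, release it when it stops
DynamicUser=yes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
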
&lt;p&gt;Now you may wonder, how does this concept deal with the sticky user
issue discussed above? In order to counter the problem, two strategies
easily come to mind:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Prohibit the service from creating any files/directories or IPC objects&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Automatically remove the files/directories or IPC objects the
service created when it shuts down.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In systemd we implemented both strategies, but for different parts of
the execution environment. Specifically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Setting &lt;code&gt;DynamicUser=yes&lt;/code&gt; implies
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#ProtectSystem="&gt;&lt;code&gt;ProtectSystem=strict&lt;/code&gt;&lt;/a&gt;
and
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#ProtectHome="&gt;&lt;code&gt;ProtectHome=read-only&lt;/code&gt;&lt;/a&gt;. These
sand-boxing options turn off write access to pretty much the whole OS
directory tree, with a few relevant exceptions, such as the API file
systems &lt;code&gt;/proc&lt;/code&gt;, &lt;code&gt;/sys&lt;/code&gt; and so on, as well as &lt;code&gt;/tmp&lt;/code&gt; and
&lt;code&gt;/var/tmp&lt;/code&gt;. (BTW: setting these two options on your regular services
that do not use &lt;code&gt;DynamicUser=&lt;/code&gt; is a good idea too, as it drastically
reduces the exposure of the system to exploited services.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Setting &lt;code&gt;DynamicUser=yes&lt;/code&gt; implies
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#PrivateTmp="&gt;&lt;code&gt;PrivateTmp=yes&lt;/code&gt;&lt;/a&gt;. This
option sets up &lt;code&gt;/tmp&lt;/code&gt; and &lt;code&gt;/var/tmp&lt;/code&gt; for the service in a way that it
gets its own, disconnected version of these directories, that are not
shared by other services, and whose life-cycle is bound to the
service's own life-cycle. Thus if the service goes down, the user is
removed and all its temporary files and directories with it. (BTW: as
above, consider setting this option for your regular services that do
not use &lt;code&gt;DynamicUser=&lt;/code&gt; too, it's a great way to lock things down
security-wise.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Setting &lt;code&gt;DynamicUser=yes&lt;/code&gt; implies
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RemoveIPC="&gt;&lt;code&gt;RemoveIPC=yes&lt;/code&gt;&lt;/a&gt;. This
option ensures that when the service goes down all SysV and POSIX IPC
objects (shared memory, message queues, semaphores) owned by the
service's user are removed. Thus, the life-cycle of the IPC objects is
bound to the life-cycle of the dynamic user and service, too. (BTW:
yes, here too, consider using this in your regular services, too!)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With these four settings in effect, services with dynamic users are
nicely sand-boxed. They cannot create files or directories, except in
&lt;code&gt;/tmp&lt;/code&gt; and &lt;code&gt;/var/tmp&lt;/code&gt;, where they will be removed automatically when
the service shuts down, as will any IPC objects created. Sticky
ownership of files/directories and IPC objects is hence dealt with
effectively.&lt;/p&gt;
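&lt;p&gt;As a rough sketch, a regular service that does not use &lt;code&gt;DynamicUser=&lt;/code&gt; could
opt into the same four sand-boxing settings explicitly. The unit name,
the user name &lt;code&gt;mydaemon&lt;/code&gt; and the binary path below are made up for
illustration, and the user is assumed to have been allocated already:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# /etc/systemd/system/mydaemon.service (hypothetical unit)
[Service]
# Hypothetical binary and pre-allocated system user
ExecStart=/usr/bin/mydaemon
User=mydaemon
# The four settings that DynamicUser=yes would otherwise imply
ProtectSystem=strict
ProtectHome=read-only
PrivateTmp=yes
RemoveIPC=yes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
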
&lt;p&gt;The
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RuntimeDirectory="&gt;&lt;code&gt;RuntimeDirectory=&lt;/code&gt;&lt;/a&gt;
option may be used to open up the sandbox a bit to external
programs. If you set it to a directory name of your choice, it will be
created below &lt;code&gt;/run&lt;/code&gt; when the service is started, and removed in its
entirety when it is terminated. The ownership of the directory is
assigned to the service's dynamic user. This way, a dynamic user
service can expose API interfaces (AF_UNIX sockets, …) to other
services at a well-defined place and again bind the life-cycle of it to
the service's own run-time. Example: set &lt;code&gt;RuntimeDirectory=foobar&lt;/code&gt; in
your service, and watch how a directory &lt;code&gt;/run/foobar&lt;/code&gt; appears at the
moment you start the service, and disappears the moment you stop
it again. (BTW: Much like the other settings discussed above,
&lt;code&gt;RuntimeDirectory=&lt;/code&gt; may be used outside of the &lt;code&gt;DynamicUser=&lt;/code&gt; context
too, and is a nice way to run any service with a properly owned,
life-cycle-managed run-time directory.)&lt;/p&gt;
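&lt;p&gt;A minimal sketch of such a unit might look as follows. Only
&lt;code&gt;RuntimeDirectory=foobar&lt;/code&gt; is taken from the example above; the unit
name and the binary path are assumptions:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# /etc/systemd/system/foobar.service (hypothetical unit)
[Service]
# Hypothetical daemon that places an AF_UNIX socket below /run/foobar
ExecStart=/usr/bin/foobard
DynamicUser=yes
# /run/foobar is created on start, owned by the dynamic user,
# and removed again when the service stops
RuntimeDirectory=foobar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
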
&lt;h1&gt;Persistent Data&lt;/h1&gt;
&lt;p&gt;Of course, a service running in such an environment (although already
very useful for many cases!), has a major limitation: it cannot leave
persistent data around it can reuse on a later run. As pretty much the
whole OS directory tree is read-only to it, there's simply no place it
could put the data that survives from one service invocation to the
next.&lt;/p&gt;
&lt;p&gt;With systemd 235 this limitation is removed: there are now three new
settings:
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RuntimeDirectory="&gt;&lt;code&gt;StateDirectory=&lt;/code&gt;&lt;/a&gt;,
&lt;code&gt;LogsDirectory=&lt;/code&gt; and &lt;code&gt;CacheDirectory=&lt;/code&gt;. In many ways they operate like
&lt;code&gt;RuntimeDirectory=&lt;/code&gt;, but create sub-directories below &lt;code&gt;/var/lib&lt;/code&gt;,
&lt;code&gt;/var/log&lt;/code&gt; and &lt;code&gt;/var/cache&lt;/code&gt;, respectively. There's one major
difference beyond that however: directories created that way are
&lt;em&gt;persistent&lt;/em&gt;, they will survive the run-time cycle of a service, and
thus may be used to store data that is supposed to stay around between
invocations of the service.&lt;/p&gt;
&lt;p&gt;Of course, the obvious question to ask now is: how do these three
settings deal with the &lt;em&gt;sticky file ownership problem&lt;/em&gt;?&lt;/p&gt;
&lt;p&gt;For that we lifted a concept from container managers. Container
managers have a very similar problem: each container and the host
typically end up using a very similar set of numeric UIDs, and unless
user name-spacing is deployed this means that host users might be able
to access the data of specific containers that also have a user by the
same numeric UID assigned, even though it actually refers to a very
different identity in a different context. (Actually, it's even worse
than just getting access, due to the existence of &lt;code&gt;setuid&lt;/code&gt; file bits,
access might translate to privilege elevation.) The way container
managers protect the container images from the host (and from each
other to some level) is by placing the container trees below a
&lt;em&gt;boundary&lt;/em&gt; directory, with very restrictive access modes and ownership
(0700 and &lt;code&gt;root:root&lt;/code&gt; or so). A host user hence cannot take advantage
of the files/directories of a container user of the same UID inside of
a local container tree, simply because the boundary directory makes it
impossible to even reference files in it. After all, on UNIX, in order
to get access to a specific path you need access to every single
component of it.&lt;/p&gt;
&lt;p&gt;How is that applied to dynamic user services? Let's say
&lt;code&gt;StateDirectory=foobar&lt;/code&gt; is set for a service that has &lt;code&gt;DynamicUser=&lt;/code&gt;
turned off. The instant the service is started, &lt;code&gt;/var/lib/foobar&lt;/code&gt; is
created as state directory, owned by the service's user and remains in
existence when the service is stopped. If the same service now is run
with &lt;code&gt;DynamicUser=&lt;/code&gt; turned on, the implementation is slightly
altered. Instead of a directory &lt;code&gt;/var/lib/foobar&lt;/code&gt; a symbolic link by
the same path is created (owned by root), pointing to
&lt;code&gt;/var/lib/private/foobar&lt;/code&gt; (the latter being owned by the service's
dynamic user). The &lt;code&gt;/var/lib/private&lt;/code&gt; directory is created as boundary
directory: it's owned by &lt;code&gt;root:root&lt;/code&gt;, and has a restrictive access
mode of 0700. Both the symlink and the service's state directory
survive the service's life-cycle. The state directory continues to be
owned by the now disposed dynamic UID, but it is protected from other
host users (and from other services which might get the same dynamic
UID assigned due to UID recycling) by the boundary
directory.&lt;/p&gt;
&lt;p&gt;The obvious question to ask now is: but if the boundary directory
prohibits access to the directory from unprivileged processes, how can
the service itself which runs under its own dynamic UID access it
anyway? This is achieved by invoking the service process in a slightly
modified mount name-space: it will see most of the file hierarchy the
same way as everything else on the system (modulo &lt;code&gt;/tmp&lt;/code&gt; and
&lt;code&gt;/var/tmp&lt;/code&gt; as mentioned above), except for &lt;code&gt;/var/lib/private&lt;/code&gt;, which
is over-mounted with a read-only &lt;code&gt;tmpfs&lt;/code&gt; file system instance, with a
slightly more liberal access mode permitting the service read
access. Inside of this &lt;code&gt;tmpfs&lt;/code&gt; file system instance another mount is
placed: a bind mount to the host's real &lt;code&gt;/var/lib/private/foobar&lt;/code&gt;
directory, onto the same name. Putting this together, this means that
superficially everything looks the same and is available at the same
place on the host and from inside the service, but two important
changes have been made: the &lt;code&gt;/var/lib/private&lt;/code&gt; boundary directory lost
its restrictive character inside the service, and has been emptied of
the state directories of any other service, thus making the protection
complete. Note that the symlink &lt;code&gt;/var/lib/foobar&lt;/code&gt; hides the fact that
the boundary directory is used (making it little more than an
implementation detail), as the directory is available this way under
the same name as it would be if &lt;code&gt;DynamicUser=&lt;/code&gt; was not used. Long
story short: both for the daemon and when viewed from the host, the
indirection through &lt;code&gt;/var/lib/private&lt;/code&gt; is mostly transparent.&lt;/p&gt;
&lt;p&gt;This logic of course raises another question: what happens to the
state directory if a dynamic user service is started with a state
directory configured, gets UID X assigned on this first invocation,
then terminates and is restarted and now gets UID Y assigned on the
second invocation, with X ≠ Y? On the second invocation the directory
— and all the files and directories below it — will still be owned by
the original UID X so how could the second instance running as Y
access it? Our way out is simple: systemd will recursively change the
ownership of the directory and everything contained within it to UID Y
before invoking the service's executable.&lt;/p&gt;
&lt;p&gt;Of course, such recursive ownership changing (&lt;code&gt;chown()&lt;/code&gt;ing) of whole
directory trees can become expensive (though according to my
experiences, IRL and for most services it's much cheaper than you
might think), hence in order to optimize behavior in this regard, the
allocation of dynamic UIDs has been tweaked in two ways to avoid the
necessity to do this expensive operation in most cases: firstly, when
a dynamic UID is allocated for a service an allocation loop is
employed that starts out with a UID hashed from the service's
name. This means a service by the same name is likely to always use
the same numeric UID. That means that a stable service name translates
into a stable dynamic UID, and that means recursive file ownership
adjustments can be skipped (of course, after validation). Secondly, if
the configured state directory already exists, and is owned by a
suitable currently unused dynamic UID, it's preferably used above
everything else, thus maximizing the chance we can avoid the
&lt;code&gt;chown()&lt;/code&gt;ing. (That all said, ultimately we have to face it, the
currently available UID space of 4K+ is very small still, and
conflicts are pretty likely sooner or later, thus &lt;code&gt;chown()&lt;/code&gt;ing has to
be expected every now and then when this feature is used extensively.)&lt;/p&gt;
&lt;p&gt;Note that &lt;code&gt;CacheDirectory=&lt;/code&gt; and &lt;code&gt;LogsDirectory=&lt;/code&gt; work very similarly to
&lt;code&gt;StateDirectory=&lt;/code&gt;. The only difference is that they manage directories
below the &lt;code&gt;/var/cache&lt;/code&gt; and &lt;code&gt;/var/log&lt;/code&gt; directories, and their boundary
directories hence are &lt;code&gt;/var/cache/private&lt;/code&gt; and &lt;code&gt;/var/log/private&lt;/code&gt;,
respectively.&lt;/p&gt;
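&lt;p&gt;To illustrate, here is a sketch of a dynamic user service that uses
all three settings. The unit name and the binary path are made up; the
paths in the comments follow from the description above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# /etc/systemd/system/foobar.service (hypothetical unit)
[Service]
# Hypothetical daemon binary
ExecStart=/usr/bin/foobard
DynamicUser=yes
# /var/lib/foobar, a symlink pointing into /var/lib/private
StateDirectory=foobar
# /var/cache/foobar, a symlink pointing into /var/cache/private
CacheDirectory=foobar
# /var/log/foobar, a symlink pointing into /var/log/private
LogsDirectory=foobar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
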
&lt;h1&gt;Examples&lt;/h1&gt;
&lt;p&gt;So, after all this introduction, let's have a look at how this all can be
put together. Here's a trivial example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# cat &amp;gt; /etc/systemd/system/dynamic-user-test.service &amp;lt;&amp;lt;EOF&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;Service&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="nv"&gt;ExecStart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/bin/sleep&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;4711&lt;/span&gt;
&lt;span class="nv"&gt;DynamicUser&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;yes
EOF
&lt;span class="c1"&gt;# systemctl daemon-reload&lt;/span&gt;
&lt;span class="c1"&gt;# systemctl start dynamic-user-test&lt;/span&gt;
&lt;span class="c1"&gt;# systemctl status dynamic-user-test&lt;/span&gt;
●&lt;span class="w"&gt; &lt;/span&gt;dynamic-user-test.service
&lt;span class="w"&gt;   &lt;/span&gt;Loaded:&lt;span class="w"&gt; &lt;/span&gt;loaded&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;/etc/systemd/system/dynamic-user-test.service&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;static&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;vendor&lt;span class="w"&gt; &lt;/span&gt;preset:&lt;span class="w"&gt; &lt;/span&gt;disabled&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;Active:&lt;span class="w"&gt; &lt;/span&gt;active&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;running&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;since&lt;span class="w"&gt; &lt;/span&gt;Fri&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2017&lt;/span&gt;-10-06&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:12:25&lt;span class="w"&gt; &lt;/span&gt;CEST&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;3s&lt;span class="w"&gt; &lt;/span&gt;ago
&lt;span class="w"&gt; &lt;/span&gt;Main&lt;span class="w"&gt; &lt;/span&gt;PID:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2967&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;sleep&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;Tasks:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;limit:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;4915&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;CGroup:&lt;span class="w"&gt; &lt;/span&gt;/system.slice/dynamic-user-test.service
&lt;span class="w"&gt;           &lt;/span&gt;└─2967&lt;span class="w"&gt; &lt;/span&gt;/usr/bin/sleep&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;4711&lt;/span&gt;

Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;06&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:12:25&lt;span class="w"&gt; &lt;/span&gt;sigma&lt;span class="w"&gt; &lt;/span&gt;systemd&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;Started&lt;span class="w"&gt; &lt;/span&gt;dynamic-user-test.service.
&lt;span class="c1"&gt;# ps -e -o pid,comm,user | grep 2967&lt;/span&gt;
&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2967&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;sleep&lt;span class="w"&gt;           &lt;/span&gt;dynamic-user-test
&lt;span class="c1"&gt;# id dynamic-user-test&lt;/span&gt;
&lt;span class="nv"&gt;uid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;64642&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;dynamic-user-test&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;gid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;64642&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;dynamic-user-test&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;groups&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;64642&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;dynamic-user-test&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# systemctl stop dynamic-user-test&lt;/span&gt;
&lt;span class="c1"&gt;# id dynamic-user-test&lt;/span&gt;
id:&lt;span class="w"&gt; &lt;/span&gt;‘dynamic-user-test’:&lt;span class="w"&gt; &lt;/span&gt;no&lt;span class="w"&gt; &lt;/span&gt;such&lt;span class="w"&gt; &lt;/span&gt;user
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In this example, we create a unit file with &lt;code&gt;DynamicUser=&lt;/code&gt; turned on,
start it, check if it's running correctly, have a look at the service
process' user (which is named like the service; systemd does this
automatically if the service name is suitable as user name, and you
didn't configure any user name to use explicitly), stop the service
and verify that the user ceased to exist too.&lt;/p&gt;
&lt;p&gt;That's already pretty cool. Let's step it up a notch, by doing the
same in an interactive &lt;em&gt;transient&lt;/em&gt; service (for those who don't know
systemd well: a transient service is a service that is defined and
started dynamically at run-time, for example via the
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd-run.html"&gt;&lt;code&gt;systemd-run&lt;/code&gt;&lt;/a&gt;
command from the shell. Think: run a service without having to write a
unit file first):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# systemd-run --pty --property=DynamicUser=yes --property=StateDirectory=wuff /bin/sh&lt;/span&gt;
Running&lt;span class="w"&gt; &lt;/span&gt;as&lt;span class="w"&gt; &lt;/span&gt;unit:&lt;span class="w"&gt; &lt;/span&gt;run-u15750.service
Press&lt;span class="w"&gt; &lt;/span&gt;^&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;three&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;times&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;within&lt;span class="w"&gt; &lt;/span&gt;1s&lt;span class="w"&gt; &lt;/span&gt;to&lt;span class="w"&gt; &lt;/span&gt;disconnect&lt;span class="w"&gt; &lt;/span&gt;TTY.
sh-4.4$&lt;span class="w"&gt; &lt;/span&gt;id
&lt;span class="nv"&gt;uid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;63122&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;run-u15750&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;gid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;63122&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;run-u15750&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;groups&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;63122&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;run-u15750&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;system_u:system_r:initrc_t:s0
sh-4.4$&lt;span class="w"&gt; &lt;/span&gt;ls&lt;span class="w"&gt; &lt;/span&gt;-al&lt;span class="w"&gt; &lt;/span&gt;/var/lib/private/
total&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
drwxr-xr-x.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;root&lt;span class="w"&gt;       &lt;/span&gt;root&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="m"&gt;60&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:21&lt;span class="w"&gt; &lt;/span&gt;.
drwxr-xr-x.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;root&lt;span class="w"&gt;       &lt;/span&gt;root&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="m"&gt;852&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:21&lt;span class="w"&gt; &lt;/span&gt;..
drwxr-xr-x.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;run-u15750&lt;span class="w"&gt; &lt;/span&gt;run-u15750&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:22&lt;span class="w"&gt; &lt;/span&gt;wuff
sh-4.4$&lt;span class="w"&gt; &lt;/span&gt;ls&lt;span class="w"&gt; &lt;/span&gt;-ld&lt;span class="w"&gt; &lt;/span&gt;/var/lib/wuff
lrwxrwxrwx.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;root&lt;span class="w"&gt; &lt;/span&gt;root&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:21&lt;span class="w"&gt; &lt;/span&gt;/var/lib/wuff&lt;span class="w"&gt; &lt;/span&gt;-&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;private/wuff
sh-4.4$&lt;span class="w"&gt; &lt;/span&gt;ls&lt;span class="w"&gt; &lt;/span&gt;-ld&lt;span class="w"&gt; &lt;/span&gt;/var/lib/wuff/
drwxr-xr-x.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;run-u15750&lt;span class="w"&gt; &lt;/span&gt;run-u15750&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:21&lt;span class="w"&gt; &lt;/span&gt;/var/lib/wuff/
sh-4.4$&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;hello&lt;span class="w"&gt; &lt;/span&gt;&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;/var/lib/wuff/test
sh-4.4$&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;
&lt;span class="nb"&gt;exit&lt;/span&gt;
&lt;span class="c1"&gt;# id run-u15750&lt;/span&gt;
id:&lt;span class="w"&gt; &lt;/span&gt;‘run-u15750’:&lt;span class="w"&gt; &lt;/span&gt;no&lt;span class="w"&gt; &lt;/span&gt;such&lt;span class="w"&gt; &lt;/span&gt;user
&lt;span class="c1"&gt;# ls -al /var/lib/private&lt;/span&gt;
total&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
drwx------.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;root&lt;span class="w"&gt;  &lt;/span&gt;root&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="m"&gt;66&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:21&lt;span class="w"&gt; &lt;/span&gt;.
drwxr-xr-x.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;root&lt;span class="w"&gt;  &lt;/span&gt;root&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;852&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:21&lt;span class="w"&gt; &lt;/span&gt;..
drwxr-xr-x.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;63122&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;63122&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:22&lt;span class="w"&gt; &lt;/span&gt;wuff
&lt;span class="c1"&gt;# ls -ld /var/lib/wuff&lt;/span&gt;
lrwxrwxrwx.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;root&lt;span class="w"&gt; &lt;/span&gt;root&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;12&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:21&lt;span class="w"&gt; &lt;/span&gt;/var/lib/wuff&lt;span class="w"&gt; &lt;/span&gt;-&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;private/wuff
&lt;span class="c1"&gt;# ls -ld /var/lib/wuff/&lt;/span&gt;
drwxr-xr-x.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;63122&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;63122&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:22&lt;span class="w"&gt; &lt;/span&gt;/var/lib/wuff/
&lt;span class="c1"&gt;# cat /var/lib/wuff/test&lt;/span&gt;
hello
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above invokes an interactive shell as transient service
&lt;code&gt;run-u15750.service&lt;/code&gt; (&lt;code&gt;systemd-run&lt;/code&gt; picked that name automatically,
since we didn't specify anything explicitly) with a dynamic user whose
name is derived automatically from the service name. Because
&lt;code&gt;StateDirectory=wuff&lt;/code&gt; is used, a persistent state directory for the
service is made available as &lt;code&gt;/var/lib/wuff&lt;/code&gt;. In the interactive shell
running inside the service, the &lt;code&gt;ls&lt;/code&gt; commands show the
&lt;code&gt;/var/lib/private&lt;/code&gt; boundary directory and its contents, as well as the
symlink that is placed for the service. Finally, before exiting the
shell, a file is created in the state directory. Back in the original
command shell we check if the user is still allocated: it is not, of
course, since the service ceased to exist when we exited the shell and
with it the dynamic user associated with it. From the host we check
the state directory of the service, with similar commands as we did
from inside of it. We see that things are set up pretty much the same
way in both cases, except for two things: first of all the user/group
of the files is now shown as raw numeric UIDs instead of the
user/group names derived from the unit name. That's because the user
ceased to exist at this point, and &lt;code&gt;ls&lt;/code&gt; shows the raw UID for files
owned by users that don't exist. Secondly, the access mode of the
boundary directory is different: when we look at it from outside of
the service it is not readable by anyone but root, when we looked from
inside we saw it being world readable.&lt;/p&gt;
&lt;p&gt;Now, let's see how things look if we start another transient service,
reusing the state directory from the first invocation:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# systemd-run --pty --property=DynamicUser=yes --property=StateDirectory=wuff /bin/sh&lt;/span&gt;
Running&lt;span class="w"&gt; &lt;/span&gt;as&lt;span class="w"&gt; &lt;/span&gt;unit:&lt;span class="w"&gt; &lt;/span&gt;run-u16087.service
Press&lt;span class="w"&gt; &lt;/span&gt;^&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;three&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;times&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;within&lt;span class="w"&gt; &lt;/span&gt;1s&lt;span class="w"&gt; &lt;/span&gt;to&lt;span class="w"&gt; &lt;/span&gt;disconnect&lt;span class="w"&gt; &lt;/span&gt;TTY.
sh-4.4$&lt;span class="w"&gt; &lt;/span&gt;cat&lt;span class="w"&gt; &lt;/span&gt;/var/lib/wuff/test
hello
sh-4.4$&lt;span class="w"&gt; &lt;/span&gt;ls&lt;span class="w"&gt; &lt;/span&gt;-al&lt;span class="w"&gt; &lt;/span&gt;/var/lib/wuff/
total&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt;
drwxr-xr-x.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;run-u16087&lt;span class="w"&gt; &lt;/span&gt;run-u16087&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;8&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:22&lt;span class="w"&gt; &lt;/span&gt;.
drwxr-xr-x.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;root&lt;span class="w"&gt;       &lt;/span&gt;root&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="m"&gt;60&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;15&lt;/span&gt;:42&lt;span class="w"&gt; &lt;/span&gt;..
-rw-r--r--.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;run-u16087&lt;span class="w"&gt; &lt;/span&gt;run-u16087&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;Okt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;13&lt;/span&gt;:22&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;test&lt;/span&gt;
sh-4.4$&lt;span class="w"&gt; &lt;/span&gt;id
&lt;span class="nv"&gt;uid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;63122&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;run-u16087&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;gid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;63122&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;run-u16087&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;groups&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;63122&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;run-u16087&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;system_u:system_r:initrc_t:s0
sh-4.4$&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;
&lt;span class="nb"&gt;exit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here, &lt;code&gt;systemd-run&lt;/code&gt; picked a different auto-generated unit name, but
the used dynamic UID is still the same, as it was read from the
pre-existing state directory, and was otherwise unused. As we can see
the test file we generated earlier is accessible and still contains
the data we left in there. Do note that the user name is different
this time (as it is derived from the unit name, which is different),
but the UID it is assigned to is the same one as on the first
invocation. We can thus see that the mentioned optimization of the UID
allocation logic (i.e. that we start the allocation loop from the UID
owner of any existing state directory) took effect, so that no
recursive &lt;code&gt;chown()&lt;/code&gt;ing was required.&lt;/p&gt;
&lt;p&gt;And that's the end of our example, which hopefully illustrated a bit
how this concept and its implementation work.&lt;/p&gt;
&lt;h1&gt;Use-cases&lt;/h1&gt;
&lt;p&gt;Now that we had a look at how to enable this logic for a unit and how
it is implemented, let's discuss where this actually could be useful
in real life.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;One major benefit of dynamic user IDs is that running a
privilege-separated service leaves no artifacts in the system. A
system user is allocated and made use of, but it is discarded
automatically in a safe and secure way after use, in a fashion that is
safe for later recycling. Thus, quickly invoking a short-lived service
for processing some job can be protected properly through a user ID
without having to pre-allocate it and without this draining the
available UID pool any longer than necessary.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In many cases, starting a service no longer requires
package-specific preparation. Or in other words, quite often
&lt;code&gt;useradd&lt;/code&gt;/&lt;code&gt;mkdir&lt;/code&gt;/&lt;code&gt;chown&lt;/code&gt;/&lt;code&gt;chmod&lt;/code&gt; invocations in "&lt;code&gt;post-inst&lt;/code&gt;" package
scripts, as well as
&lt;a href="https://www.freedesktop.org/software/systemd/man/sysusers.d.html"&gt;&lt;code&gt;sysusers.d&lt;/code&gt;&lt;/a&gt;
and
&lt;a href="https://www.freedesktop.org/software/systemd/man/tmpfiles.d.html"&gt;&lt;code&gt;tmpfiles.d&lt;/code&gt;&lt;/a&gt;
drop-ins become unnecessary, as the &lt;code&gt;DynamicUser=&lt;/code&gt; and
&lt;code&gt;StateDirectory=&lt;/code&gt;/&lt;code&gt;CacheDirectory=&lt;/code&gt;/&lt;code&gt;LogsDirectory=&lt;/code&gt; logic can do the
necessary work automatically, on-demand and with a well-defined
life-cycle.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;By combining dynamic user IDs with the transient unit concept, new
creative ways of sand-boxing are made available. For example, let's say
you don't trust the correct implementation of the &lt;code&gt;sort&lt;/code&gt; command. You
can now lock it into a simple, robust, dynamic UID sandbox with a
simple &lt;code&gt;systemd-run&lt;/code&gt; and still integrate it into a shell pipeline like
any other command. Here's an example, showcasing a shell pipeline
whose middle element runs under a UID that is allocated dynamically on the fly
and released when the pipeline ends.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# cat some-file.txt | systemd-run --pipe --property=DynamicUser=1 sort -u | grep -i foobar &amp;gt; some-other-file.txt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;By combining dynamic user IDs with the systemd templating logic it
is now possible to do much more fine-grained and fully automatic UID
management. For example, let's say you have a template unit file
&lt;code&gt;/etc/systemd/system/foobard@.service&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[Service]&lt;/span&gt;
&lt;span class="na"&gt;ExecStart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/usr/bin/myfoobarserviced&lt;/span&gt;
&lt;span class="na"&gt;DynamicUser&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;StateDirectory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;foobar/%i&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now, let's say you want to start one instance of this service for
 each of your customers. All you need to do now for that is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;systemctl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;enable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;foobard&lt;/span&gt;&lt;span class="nv"&gt;@customerxyz&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;--now&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And you are done. (Invoke this as many times as you like, each time
 replacing &lt;code&gt;customerxyz&lt;/code&gt; by some customer identifier, you get the
 idea.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;By combining dynamic user IDs with socket activation you may easily
implement a system where each incoming connection is served by a
process instance running as a different, fresh, newly allocated UID
within its own sandbox. Here's an example &lt;code&gt;waldo.socket&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[Socket]&lt;/span&gt;
&lt;span class="na"&gt;ListenStream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2048&lt;/span&gt;
&lt;span class="na"&gt;Accept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With a matching &lt;code&gt;waldo@.service&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[Service]&lt;/span&gt;
&lt;span class="na"&gt;ExecStart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;-/usr/bin/myservicebinary&lt;/span&gt;
&lt;span class="na"&gt;DynamicUser&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With the two unit files above, systemd will listen on TCP/IP port
2048, and for each incoming connection invoke a fresh instance of
&lt;code&gt;waldo@.service&lt;/code&gt;, each time utilizing a different, new,
dynamically allocated UID, neatly isolated from any other
instance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Dynamic user IDs combine very well with state-less systems,
i.e. systems that come up with an unpopulated &lt;code&gt;/etc&lt;/code&gt; and &lt;code&gt;/var&lt;/code&gt;. A
service using dynamic user IDs and the &lt;code&gt;StateDirectory=&lt;/code&gt;,
&lt;code&gt;CacheDirectory=&lt;/code&gt;, &lt;code&gt;LogsDirectory=&lt;/code&gt; and &lt;code&gt;RuntimeDirectory=&lt;/code&gt; concepts
will implicitly allocate the users and directories it needs for
running, right at the moment it needs them.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Dynamic users are a very generic concept, hence a multitude of other
uses are thinkable; the list above is just supposed to trigger your
imagination.&lt;/p&gt;
&lt;h1&gt;What does this mean for you as a packager?&lt;/h1&gt;
&lt;p&gt;I am pretty sure that a large number of services shipped with today's
distributions could benefit from using &lt;code&gt;DynamicUser=&lt;/code&gt; and
&lt;code&gt;StateDirectory=&lt;/code&gt; (and related settings). It often allows removal of
&lt;code&gt;post-inst&lt;/code&gt; packaging scripts altogether, as well as any &lt;code&gt;sysusers.d&lt;/code&gt;
and &lt;code&gt;tmpfiles.d&lt;/code&gt; drop-ins by unifying the needed declarations in the
unit file itself. Hence, as a packager please consider switching your
unit files over. That said, there are a number of conditions where
&lt;code&gt;DynamicUser=&lt;/code&gt; and &lt;code&gt;StateDirectory=&lt;/code&gt; (and friends) cannot or should
not be used. To name a few:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Services that need to write to files outside of &lt;code&gt;/run/&amp;lt;package&amp;gt;&lt;/code&gt;,
&lt;code&gt;/var/lib/&amp;lt;package&amp;gt;&lt;/code&gt;, &lt;code&gt;/var/cache/&amp;lt;package&amp;gt;&lt;/code&gt;, &lt;code&gt;/var/log/&amp;lt;package&amp;gt;&lt;/code&gt;,
&lt;code&gt;/var/tmp&lt;/code&gt;, &lt;code&gt;/tmp&lt;/code&gt;, &lt;code&gt;/dev/shm&lt;/code&gt; are generally incompatible with this
scheme. This rules out daemons that upgrade the system as one example,
as that involves writing to &lt;code&gt;/usr&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Services that maintain a herd of processes with different user
IDs. Some SMTP services are like this. If your service has such a
&lt;em&gt;super-server&lt;/em&gt; design, UID management needs to be done by the
super-server itself, which rules out systemd doing its dynamic UID
magic for it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Services which run as root (obviously…) or are otherwise
privileged.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Services that need to live in the same mount name-space as the host
system (for example, because they want to establish mount points
visible system-wide). As mentioned &lt;code&gt;DynamicUser=&lt;/code&gt; implies
&lt;code&gt;ProtectSystem=&lt;/code&gt;, &lt;code&gt;PrivateTmp=&lt;/code&gt; and related options, which all require
the service to run in its own mount name-space.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Your focus is older distributions, i.e. distributions that do not
have systemd 232 (for &lt;code&gt;DynamicUser=&lt;/code&gt;) or systemd 235 (for
&lt;code&gt;StateDirectory=&lt;/code&gt; and friends) yet.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If your distribution's packaging guides don't allow it. Consult
your packaging guides, and possibly start a discussion on your
distribution's mailing list about this.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
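&lt;p&gt;For a service that none of the conditions above apply to, the
converted unit file can be quite short. The following is only a
sketch: the binary path and the &lt;code&gt;foobar&lt;/code&gt; directory name are made up
for illustration, so substitute whatever your package actually
ships.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;[Service]
# Hypothetical daemon binary, shown for illustration only
ExecStart=/usr/bin/foobard
# Allocate a UID/GID when the service starts, release it when it stops
DynamicUser=yes
# Created on demand as /var/lib/foobar, /var/cache/foobar,
# /var/log/foobar and /run/foobar, owned by the service's user
StateDirectory=foobar
CacheDirectory=foobar
LogsDirectory=foobar
RuntimeDirectory=foobar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A unit along these lines carries the declarations that would
otherwise be spread across a &lt;code&gt;post-inst&lt;/code&gt; script and &lt;code&gt;sysusers.d&lt;/code&gt; and
&lt;code&gt;tmpfiles.d&lt;/code&gt; drop-ins.&lt;/p&gt;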
&lt;h1&gt;Notes&lt;/h1&gt;
&lt;p&gt;A couple of additional, random notes about the implementation and use
of these features:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Do note that allocating or deallocating a dynamic user leaves
&lt;code&gt;/etc/passwd&lt;/code&gt; untouched. A dynamic user is added into the user
database through the glibc NSS module
&lt;a href="https://www.freedesktop.org/software/systemd/man/nss-systemd.html"&gt;&lt;code&gt;nss-systemd&lt;/code&gt;&lt;/a&gt;,
and this information never hits the disk.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;On traditional UNIX systems it was the job of the daemon process
itself to drop privileges, while the &lt;code&gt;DynamicUser=&lt;/code&gt; concept is
designed around the service manager (i.e. systemd) being responsible
for that. That said, since v235 there's a way to marry &lt;code&gt;DynamicUser=&lt;/code&gt;
and such services which want to drop privileges on their own. For
that, turn on &lt;code&gt;DynamicUser=&lt;/code&gt; and set
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#User="&gt;&lt;code&gt;User=&lt;/code&gt;&lt;/a&gt;
to the user name the service wants to &lt;code&gt;setuid()&lt;/code&gt; to. This has the
effect that systemd will allocate the dynamic user under the specified
name when the service is started. Then, prefix the command line you
specify in
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStart="&gt;&lt;code&gt;ExecStart=&lt;/code&gt;&lt;/a&gt;
with a single &lt;code&gt;!&lt;/code&gt; character. If you do, the user is allocated for the
service, but the daemon binary is invoked as &lt;code&gt;root&lt;/code&gt; instead of the
allocated user, under the assumption that the daemon changes its UID
on its own the right way. Note that after registration the user will
show up instantly in the user database, and is hence resolvable like
any other by the daemon process. Example:
&lt;code&gt;ExecStart=!/usr/bin/mydaemond&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You may wonder why systemd uses the UID range 61184–65519 for its
dynamic user allocations (side note: in hexadecimal this reads as
0xEF00–0xFFEF). That's because distributions (specifically Fedora)
tend to allocate regular users from below the 60000 range, and we
don't want to step into that. We also want to stay away from 65535 and
a bit around it, as some of these UIDs have special meanings (65535 is
often used as special value for "invalid" or "no" UID, as it is
identical to the 16bit value -1; 65534 is generally mapped to the
"nobody" user, and is where some kernel subsystems map unmappable
UIDs). Finally, we want to stay within the 16bit range. In a user
name-spacing world each container tends to have much less than the full
32bit UID range available that Linux kernels theoretically
provide. Everybody apparently can agree that a container should at
least cover the 16bit range though — already to include a &lt;code&gt;nobody&lt;/code&gt;
user. (And quite frankly, I am pretty sure assigning 64K UIDs per
container is nicely systematic, as the higher 16bit of the 32bit
UID values this way become a container ID, while the lower 16bit
become the logical UID within each container, if you still follow what
I am babbling here…). And before you ask: no this range cannot be
changed right now, it's compiled in. We might change that eventually
however.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You might wonder what happens if you already used UIDs from the
61184–65519 range on your system for other purposes. systemd should
handle that mostly fine, as long as that usage is properly registered
in the user database: when allocating a dynamic user we pick a UID,
see if it is currently used somehow, and if yes pick a different one,
until we find a free one. Whether a UID is used right now or not is
checked through NSS calls. Moreover the IPC object lists are checked to
see if there are any objects owned by the UID we are about to
pick. This means systemd will avoid using UIDs you have assigned
otherwise. Note however that this of course makes the pool of
available UIDs smaller, and in the worst cases this means that
allocating a dynamic user might fail because there simply are no
unused UIDs in the range.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If not specified otherwise the name for a dynamically allocated
user is derived from the service name. Not everything that's valid in
a service name is valid in a user-name however, and in some cases a
randomized name is used instead to deal with this. Often it makes
sense to pick the user names to register explicitly. For that use
&lt;code&gt;User=&lt;/code&gt; and choose whatever you like.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you pick a user name with &lt;code&gt;User=&lt;/code&gt; and combine it with
&lt;code&gt;DynamicUser=&lt;/code&gt; and the user already exists statically it will be used
for the service and the dynamic user logic is automatically
disabled. This permits automatic up- and downgrades between static and
dynamic UIDs. For example, it provides a nice way to move a system
from static to dynamic UIDs in a compatible way: as long as you select
the same &lt;code&gt;User=&lt;/code&gt; value before and after switching &lt;code&gt;DynamicUser=&lt;/code&gt; on,
the service will continue to use the statically allocated user if it
exists, and only operates in the dynamic mode if it does not. This is
useful for other cases as well, for example to adapt a service that
normally would use a dynamic user to concepts that require statically
assigned UIDs, for example to marry classic UID-based file system
quota with such services.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;systemd always allocates a pair of dynamic UID and GID at the same
time, with the same numeric ID.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the Linux kernel had a "shiftfs" or similar functionality,
i.e. a way to mount an existing directory to a second place, but map
the exposed UIDs/GIDs in some way configurable at mount time, this
would be excellent for the implementation of &lt;code&gt;StateDirectory=&lt;/code&gt; in
conjunction with &lt;code&gt;DynamicUser=&lt;/code&gt;.  It would make the recursive
&lt;code&gt;chown()&lt;/code&gt;ing step unnecessary, as the host version of the state
directory could simply be mounted into the service's mount
name-space, with a shift applied that maps the directory's owner to the
service's UID/GID. But I don't have high hopes in this regard, as all
work being done in this area appears to be bound to user name-spacing
— which is a concept not used here (and I guess one could say user
name-spacing is probably more a source of problems than a solution to
one, but you are welcome to disagree on that).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
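&lt;p&gt;To make the second note above a bit more concrete, here is a
minimal sketch of a unit for a daemon that drops privileges on its
own. The user name &lt;code&gt;mydaemon&lt;/code&gt; is an assumption chosen for this
example; use whatever name your daemon actually passes to its own
user lookup before calling &lt;code&gt;setuid()&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;[Service]
# Allocate a dynamic user when the service starts...
DynamicUser=yes
# ...and register it under the name the daemon expects (hypothetical name)
User=mydaemon
# The "!" prefix starts the binary as root rather than as the allocated
# user; the daemon is then expected to change its UID itself
ExecStart=!/usr/bin/mydaemond
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If a static user called &lt;code&gt;mydaemon&lt;/code&gt; already exists on the system, the
same unit uses it and the dynamic logic is disabled, as described in
the note on combining &lt;code&gt;User=&lt;/code&gt; with &lt;code&gt;DynamicUser=&lt;/code&gt;.&lt;/p&gt;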
&lt;p&gt;And that's all for now. Enjoy your dynamic users!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 06 Oct 2017 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2017-10-06:/blog/dynamic-users-with-systemd.html</guid><category>projects</category></item><item><title>All Systems Go! 2017 Schedule Published</title><link>https://0pointer.net/blog/all-systems-go-2017-schedule-published.html</link><description>&lt;p&gt;&lt;large&gt;&lt;b&gt;The All Systems Go! 2017 schedule has been published!&lt;/b&gt;&lt;/large&gt;&lt;/p&gt;
&lt;p&gt;I am happy to announce that we have published the &lt;a
href="https://all-systems-go.io/"&gt;All Systems Go! 2017&lt;/a&gt; schedule!
We are very happy with the large number and the quality of the
submissions we got, and the resulting schedule is exceptionally
strong.&lt;/p&gt;
&lt;p&gt;Without further ado:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cfp.all-systems-go.io/en/ASG2017/public/schedule/2"&gt;Here's the schedule for the first day (Saturday, 21st of October).&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cfp.all-systems-go.io/en/ASG2017/public/schedule/3"&gt;And here's the schedule for the second day (Sunday, 22nd of October).&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here are a couple of keywords from the topics of the talks:
&lt;strong&gt;1password&lt;/strong&gt;, &lt;strong&gt;azure&lt;/strong&gt;, &lt;strong&gt;bluetooth&lt;/strong&gt;, &lt;strong&gt;build systems&lt;/strong&gt;,
&lt;strong&gt;casync&lt;/strong&gt;, &lt;strong&gt;cgroups&lt;/strong&gt;, &lt;strong&gt;cilium&lt;/strong&gt;, &lt;strong&gt;cockpit&lt;/strong&gt;, &lt;strong&gt;containers&lt;/strong&gt;,
&lt;strong&gt;ebpf&lt;/strong&gt;, &lt;strong&gt;flatpak&lt;/strong&gt;, &lt;strong&gt;habitat&lt;/strong&gt;, &lt;strong&gt;IoT&lt;/strong&gt;, &lt;strong&gt;kubernetes&lt;/strong&gt;,
&lt;strong&gt;landlock&lt;/strong&gt;, &lt;strong&gt;meson&lt;/strong&gt;, &lt;strong&gt;OCI&lt;/strong&gt;, &lt;strong&gt;rkt&lt;/strong&gt;, &lt;strong&gt;rust&lt;/strong&gt;, &lt;strong&gt;secureboot&lt;/strong&gt;,
&lt;strong&gt;skydive&lt;/strong&gt;, &lt;strong&gt;systemd&lt;/strong&gt;, &lt;strong&gt;testing&lt;/strong&gt;, &lt;strong&gt;tor&lt;/strong&gt;, &lt;strong&gt;varlink&lt;/strong&gt;,
&lt;strong&gt;virtualization&lt;/strong&gt;, &lt;strong&gt;wifi&lt;/strong&gt;, and more.&lt;/p&gt;
&lt;p&gt;Our speakers are from all across the industry: Chef, CoreOS, Covalent,
Facebook, Google, Intel, Kinvolk, Microsoft, Mozilla, Pantheon,
Pengutronix, Red Hat, SUSE and more.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://all-systems-go.io/"&gt;&lt;img src="https://all-systems-go.io/img/header-graphic.png" width="600" height="195" border="5"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For further information about All Systems Go! visit our &lt;a href="http://all-systems-go.io/"&gt;conference web site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Make sure to buy your ticket for All Systems Go! 2017 now! A limited
number of tickets are left at this point, so make sure you get yours
before we are all sold out!  &lt;a
href="https://all-systems-go.io/#tickets"&gt;Find all details here.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;See you in Berlin!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 27 Sep 2017 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2017-09-27:/blog/all-systems-go-2017-schedule-published.html</guid><category>projects</category></item><item><title>All Systems Go! 2017 CfP Closes Soon!</title><link>https://0pointer.net/blog/all-systems-go-2017-cfp-closes-soon.html</link><description>&lt;p&gt;&lt;large&gt;&lt;b&gt;The All Systems Go! 2017 Call for Participation is Closing on September 3rd!&lt;/b&gt;&lt;/large&gt;&lt;/p&gt;
&lt;p&gt;Please make sure to get your presentation proposals for &lt;i&gt;All Systems Go! 2017&lt;/i&gt; in now! The CfP closes on Sunday!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://all-systems-go.io/"&gt;&lt;img src="https://all-systems-go.io/img/header-graphic.png" width="600" height="195" border="5"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In case you haven't heard about &lt;i&gt;All Systems Go!&lt;/i&gt; yet, here's a quick reminder what kind of conference it is, and why you should attend and speak there:&lt;/p&gt;
&lt;p&gt;&lt;i&gt;All Systems Go!&lt;/i&gt; is an Open Source community conference focused
on the projects and technologies at the foundation of modern Linux
systems — specifically low-level user-space technologies. Its goal is
to provide a friendly and collaborative gathering place for
individuals and communities working to push these technologies
forward. &lt;i&gt;All Systems Go! 2017&lt;/i&gt; takes place in &lt;b&gt;Berlin,
Germany&lt;/b&gt; on &lt;b&gt;October 21st+22nd&lt;/b&gt;. &lt;i&gt;All Systems Go!&lt;/i&gt; is a
2-day event with 2-3 talks happening in parallel. Full presentation
slots are 30-45 minutes in length and lightning talk slots are 5-10
minutes.&lt;/p&gt;
&lt;p&gt;In particular, we are looking for sessions including, but not limited to, the following topics:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Low-level container executors and infrastructure&lt;/li&gt;
&lt;li&gt;IoT and embedded OS infrastructure&lt;/li&gt;
&lt;li&gt;OS, container, IoT image delivery and updating&lt;/li&gt;
&lt;li&gt;Building Linux devices and applications&lt;/li&gt;
&lt;li&gt;Low-level desktop technologies&lt;/li&gt;
&lt;li&gt;Networking&lt;/li&gt;
&lt;li&gt;System and service management&lt;/li&gt;
&lt;li&gt;Tracing and performance measuring&lt;/li&gt;
&lt;li&gt;IPC and RPC systems&lt;/li&gt;
&lt;li&gt;Security and Sandboxing&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;While our focus is definitely more on the user-space side of things,
talks about kernel projects are welcome too, as long as they have a
clear and direct relevance for user-space.&lt;/p&gt;
&lt;p&gt;To submit your proposal now please visit our &lt;a href="https://cfp.all-systems-go.io/en/ASG2017/events/new"&gt;CFP submission web site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For further information about All Systems Go! visit our &lt;a href="http://all-systems-go.io/"&gt;conference web site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;systemd.conf&lt;/i&gt; will not take place this year; &lt;i&gt;All
Systems Go!&lt;/i&gt; takes its place. &lt;i&gt;All Systems Go!&lt;/i&gt; welcomes all projects that
contribute to Linux user space, which, of course, includes
systemd. Thus, anything you think was appropriate for submission to
&lt;i&gt;systemd.conf&lt;/i&gt; is also fitting for &lt;i&gt;All Systems Go&lt;/i&gt;!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 30 Aug 2017 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2017-08-30:/blog/all-systems-go-2017-cfp-closes-soon.html</guid><category>projects</category></item><item><title>All Systems Go! 2017 Speakers</title><link>https://0pointer.net/blog/all-systems-go-2017-speakers.html</link><description>&lt;p&gt;&lt;large&gt;&lt;b&gt;The All Systems Go! 2017 Headline Speakers Announced!&lt;/b&gt;&lt;/large&gt;&lt;/p&gt;
&lt;p&gt;Don't forget to send in your submissions to the All Systems Go! 2017 CfP! Proposals are accepted until &lt;b&gt;September 3rd&lt;/b&gt;!&lt;/p&gt;
&lt;p&gt;A couple of headline speakers have been announced now:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;&lt;b&gt;Alban Crequy&lt;/b&gt; (Kinvolk)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Brian "Redbeard" Harrington&lt;/b&gt; (CoreOS)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Gianluca Borello&lt;/b&gt; (Sysdig)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Jon Boulle&lt;/b&gt; (NStack/CoreOS)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Martin Pitt&lt;/b&gt; (Debian)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Thomas Graf&lt;/b&gt; (covalent.io/Cilium)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Vincent Batts&lt;/b&gt; (Red Hat/OCI)&lt;/li&gt;
&lt;li&gt;(and yours truly)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These folks will also review your submissions as part of the papers committee!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://all-systems-go.io/"&gt;&lt;img src="https://all-systems-go.io/img/header-graphic.png" width="600" height="195" border="5"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;All Systems Go!&lt;/i&gt; is an Open Source community conference focused on the projects and technologies at the foundation of modern Linux systems — specifically low-level user-space technologies. Its goal is to provide a friendly and collaborative gathering place for individuals and communities working to push these technologies forward.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;All Systems Go! 2017&lt;/i&gt; takes place in &lt;b&gt;Berlin, Germany&lt;/b&gt; on &lt;b&gt;October 21st+22nd&lt;/b&gt;.&lt;/p&gt;
&lt;p&gt;To submit your proposal now please visit our &lt;a href="https://cfp.all-systems-go.io/en/ASG2017/events/new"&gt;CFP submission web site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For further information about All Systems Go! visit our &lt;a href="http://all-systems-go.io/"&gt;conference web site&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 10 Aug 2017 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2017-08-10:/blog/all-systems-go-2017-speakers.html</guid><category>projects</category></item><item><title>casync Video</title><link>https://0pointer.net/blog/casync-video.html</link><description>&lt;h1&gt;Video of my casync Presentation @ kinvolk&lt;/h1&gt;
&lt;p&gt;The great folks at &lt;a href="https://kinvolk.io/"&gt;kinvolk&lt;/a&gt; have uploaded a
&lt;a href="https://www.youtube.com/watch?v=JnNkBJ6pr9s"&gt;video of my casync presentation at their offices last
week&lt;/a&gt;.&lt;/p&gt;
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/JnNkBJ6pr9s" frameborder="0" allowfullscreen&gt;&lt;/iframe&gt;

&lt;p&gt;The &lt;a href="http://0pointer.de/public/casync-kinvolk2017.pdf"&gt;slides are
available&lt;/a&gt; as well.&lt;/p&gt;
&lt;p&gt;Enjoy!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 18 Jul 2017 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2017-07-18:/blog/casync-video.html</guid><category>projects</category></item><item><title>mkosi — A Tool for Generating OS Images</title><link>https://0pointer.net/blog/mkosi-a-tool-for-generating-os-images.html</link><description>&lt;h1&gt;Introducing mkosi&lt;/h1&gt;
&lt;p&gt;After blogging about
&lt;a href="http://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html"&gt;&lt;code&gt;casync&lt;/code&gt;&lt;/a&gt;
I realized I never blogged about the
&lt;a href="https://github.com/systemd/mkosi"&gt;&lt;code&gt;mkosi&lt;/code&gt;&lt;/a&gt; tool that combines nicely
with it. &lt;code&gt;mkosi&lt;/code&gt; has been around for a while already, and it's time to
make it a bit better known. &lt;code&gt;mkosi&lt;/code&gt; stands for &lt;em&gt;Make Operating System
Image&lt;/em&gt;, and is a tool for precisely that: generating an OS tree or
image that can be booted.&lt;/p&gt;
&lt;p&gt;Yes, there are many tools like &lt;code&gt;mkosi&lt;/code&gt;, and a number of them are quite
well known and popular. But &lt;code&gt;mkosi&lt;/code&gt; has a number of features that I
think make it interesting for a variety of use-cases that other tools
don't cover that well.&lt;/p&gt;
&lt;h1&gt;What is mkosi?&lt;/h1&gt;
&lt;p&gt;What are those use-cases, and what precisely sets &lt;code&gt;mkosi&lt;/code&gt; apart?
&lt;code&gt;mkosi&lt;/code&gt; is definitely a tool with a focus on developer's needs for
building OS images, for testing and debugging, but also for generating
production images with cryptographic protection. A typical use-case
would be to add a &lt;code&gt;mkosi.default&lt;/code&gt; file to an existing project (for
example, one written in C or Python), thus making it easy to
generate an OS image for it. &lt;code&gt;mkosi&lt;/code&gt; will put together the image with
development headers and tools, compile your code in it, run your test
suite, then throw away the image again, and build a new one, this time
without development headers and tools, and install your build
artifacts in it. This final image is then "production-ready", and only
contains your built program and the minimal set of packages you
configured otherwise. Such an image could then be deployed with
&lt;code&gt;casync&lt;/code&gt; (or any other tool of course) to be delivered to your set of
servers, or IoT devices or whatever you are building.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;mkosi&lt;/code&gt; is supposed to be &lt;em&gt;legacy-free&lt;/em&gt;: the focus is clearly on
today's technology, not yesteryear's. Specifically this means that
we'll generate GPT partition tables, not MBR/DOS ones. When you tell
&lt;code&gt;mkosi&lt;/code&gt; to generate a bootable image for you, it will make it bootable
on EFI, not on legacy BIOS. The GPT images generated follow
specifications such as the &lt;a href="https://www.freedesktop.org/wiki/Specifications/DiscoverablePartitionsSpec/"&gt;Discoverable Partitions
Specification&lt;/a&gt;,
so that &lt;code&gt;/etc/fstab&lt;/code&gt; can remain unpopulated and tools such as
&lt;code&gt;systemd-nspawn&lt;/code&gt; can automatically dissect the image and boot from
it.&lt;/p&gt;
&lt;p&gt;So, let's have a look at the specific images it can generate:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Raw GPT disk image, with ext4 as root&lt;/li&gt;
&lt;li&gt;Raw GPT disk image, with btrfs as root&lt;/li&gt;
&lt;li&gt;Raw GPT disk image, with a read-only squashfs as root&lt;/li&gt;
&lt;li&gt;A plain directory on disk containing the OS tree directly (this is useful for creating generic container images)&lt;/li&gt;
&lt;li&gt;A btrfs subvolume on disk, similar to the plain directory&lt;/li&gt;
&lt;li&gt;A tarball of a plain directory&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When any of the GPT choices above are selected, a couple of additional
options are available:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A swap partition may be added in&lt;/li&gt;
&lt;li&gt;The system may be made bootable on EFI systems&lt;/li&gt;
&lt;li&gt;Separate partitions for &lt;code&gt;/home&lt;/code&gt; and &lt;code&gt;/srv&lt;/code&gt; may be added in&lt;/li&gt;
&lt;li&gt;The root, &lt;code&gt;/home&lt;/code&gt; and &lt;code&gt;/srv&lt;/code&gt; partitions may be optionally encrypted with LUKS&lt;/li&gt;
&lt;li&gt;The root partition may be protected using &lt;code&gt;dm-verity&lt;/code&gt;, thus making offline attacks on the generated system hard&lt;/li&gt;
&lt;li&gt;If the image is made bootable, the &lt;code&gt;dm-verity&lt;/code&gt; root hash is automatically added to the kernel command line, and the kernel together with its initial RAM disk and the kernel command line is optionally cryptographically signed for UEFI SecureBoot&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that &lt;code&gt;mkosi&lt;/code&gt; is distribution-agnostic. It currently can build
images based on the following Linux distributions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Fedora&lt;/li&gt;
&lt;li&gt;Debian&lt;/li&gt;
&lt;li&gt;Ubuntu&lt;/li&gt;
&lt;li&gt;ArchLinux&lt;/li&gt;
&lt;li&gt;openSUSE&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note though that not all distributions are supported at the same
feature level currently. Also, as &lt;code&gt;mkosi&lt;/code&gt; is based on &lt;code&gt;dnf
--installroot&lt;/code&gt;, &lt;code&gt;debootstrap&lt;/code&gt;, &lt;code&gt;pacstrap&lt;/code&gt; and &lt;code&gt;zypper&lt;/code&gt;, and those
packages are not packaged universally on all distributions, you might
not be able to build images for all those distributions on arbitrary
host distributions.&lt;/p&gt;
&lt;p&gt;The GPT images are put together in a way that they aren't just
compatible with UEFI systems, but also with VM and container managers
(that is, at least the smart ones, i.e. VM managers that know UEFI,
and container managers that grok GPT disk images) to a large
degree. In fact, the idea is that you can use &lt;code&gt;mkosi&lt;/code&gt; to build a
single GPT image that may be used to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Boot on bare-metal boxes&lt;/li&gt;
&lt;li&gt;Boot in a VM&lt;/li&gt;
&lt;li&gt;Boot in a &lt;code&gt;systemd-nspawn&lt;/code&gt; container&lt;/li&gt;
&lt;li&gt;Directly run a systemd service off, using systemd's &lt;code&gt;RootImage=&lt;/code&gt; unit file setting&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that in all four cases the &lt;code&gt;dm-verity&lt;/code&gt; data is automatically used
if available to ensure the image is not tampered with (yes, you read
that right, &lt;code&gt;systemd-nspawn&lt;/code&gt; and systemd's &lt;code&gt;RootImage=&lt;/code&gt; setting
automatically do &lt;code&gt;dm-verity&lt;/code&gt; these days if the image has it.)&lt;/p&gt;
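&lt;p&gt;For the fourth case, a unit file along the following lines could be used. This is only a minimal sketch: the unit name &lt;code&gt;foobar.service&lt;/code&gt;, the image path and the binary path are made up for illustration, and only the &lt;code&gt;RootImage=&lt;/code&gt; setting itself is taken from the list above:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# /etc/systemd/system/foobar.service (hypothetical name)
[Unit]
Description=Hypothetical service running off an mkosi-built image

[Service]
# Use the GPT image as the root file system of the service
RootImage=/var/lib/images/foobar.raw
# This path is resolved inside the image, not on the host
ExecStart=/usr/bin/foobar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The same &lt;code&gt;foobar.raw&lt;/code&gt; file could then also be booted with &lt;code&gt;systemd-nspawn&lt;/code&gt; or in a VM, which is the whole point of building a single image.&lt;/p&gt;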
&lt;h1&gt;Mode of Operation&lt;/h1&gt;
&lt;p&gt;The simplest way to use &lt;code&gt;mkosi&lt;/code&gt; is to invoke it without
parameters (as root):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkosi
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Without any configuration this will create a GPT disk image for you,
will call it &lt;code&gt;image.raw&lt;/code&gt; and drop it in the current directory. The
distribution used will be the same one as your host runs.&lt;/p&gt;
&lt;p&gt;Of course in most cases you want more control over how the image is
put together, i.e. select package sets, select the distribution, size
partitions and so on. Most of that you can actually specify on the
command line, but it is recommended to instead create a couple of
&lt;code&gt;mkosi.$SOMETHING&lt;/code&gt; files and directories in some directory. Then,
simply change to that directory and run &lt;code&gt;mkosi&lt;/code&gt; without any further
arguments. The tool will then look in the current working directory
for these files and directories and make use of them (similar to how
&lt;code&gt;make&lt;/code&gt; looks for a &lt;code&gt;Makefile&lt;/code&gt;…). Every single file/directory is
optional, but if they exist they are honored. Here's a list of the
files/directories &lt;code&gt;mkosi&lt;/code&gt; currently looks for:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mkosi.default&lt;/code&gt; — This is the main configuration file, here you
 can configure what kind of image you want, which distribution, which
 packages and so on.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mkosi.extra/&lt;/code&gt; — If this directory exists, then &lt;code&gt;mkosi&lt;/code&gt; will copy
 everything inside it into the images built. You can place arbitrary
 directory hierarchies in here, and they'll be copied over whatever is
 already in the image, after it was put together by the distribution's
 package manager. This is the best way to drop additional static files
 into the image, or override distribution-supplied ones.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mkosi.build&lt;/code&gt; — This executable file is supposed to be a build
 script. When it exists, &lt;code&gt;mkosi&lt;/code&gt; will build two images, one after the
 other in the mode already mentioned above: the first version is the
 build image, and may include various build-time dependencies such as
 a compiler or development headers. The build script is also copied
 into it, and then run inside it. The script should then build
 whatever shall be built and place the result in &lt;code&gt;$DESTDIR&lt;/code&gt; (don't
 worry, popular build tools such as Automake or Meson all honor
 &lt;code&gt;$DESTDIR&lt;/code&gt; anyway, so there's not much to do here explicitly). It may
 also run a test suite, or anything else you like. After the script
 has finished, the build image is removed again, and a second image (the
 &lt;em&gt;final&lt;/em&gt; image) is built. This time, no development packages are
 included, and the build script is not copied into the image again —
 however, the build artifacts from the first run (i.e. those placed in
 &lt;code&gt;$DESTDIR&lt;/code&gt;) are copied into the image.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mkosi.postinst&lt;/code&gt; — If this executable script exists, it is invoked
 inside the image (inside a &lt;code&gt;systemd-nspawn&lt;/code&gt; invocation) and can
 adjust the image as it likes at a very late point in the image
 preparation. If &lt;code&gt;mkosi.build&lt;/code&gt; exists, i.e. the dual-phased
 development build process is used, then this script will be invoked
 twice: once inside the build image and once inside the final
 image. The first parameter passed to the script clarifies which phase
 it is run in.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mkosi.nspawn&lt;/code&gt; — If this file exists, it should contain a
 container configuration file for &lt;code&gt;systemd-nspawn&lt;/code&gt; (see
 &lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.nspawn.html"&gt;systemd.nspawn(5)&lt;/a&gt;
 for details), which shall be shipped along with the final image and
 shall be included in the check-sum calculations (see below).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mkosi.cache/&lt;/code&gt; — If this directory exists, it is used as package
 cache directory for the builds. This directory is effectively bind
 mounted into the image at build time, in order to speed up building
 images. The package installers of the various distributions will
 place their package files here, so that subsequent runs can reuse
 them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mkosi.passphrase&lt;/code&gt; — If this file exists, it should contain a
 pass-phrase to use for the LUKS encryption (if that's enabled for the
 image built). This file should not be readable to other users.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mkosi.secure-boot.crt&lt;/code&gt; and &lt;code&gt;mkosi.secure-boot.key&lt;/code&gt; should be an
 X.509 key pair to use for signing the kernel and initrd for UEFI
 SecureBoot, if that's enabled.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
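&lt;p&gt;To make the &lt;code&gt;mkosi.nspawn&lt;/code&gt; item above a bit more concrete, here's a minimal sketch of what such a file might contain. The two settings are documented in systemd.nspawn(5); the values chosen here are purely illustrative and not something &lt;code&gt;mkosi&lt;/code&gt; requires:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkosi.nspawn (illustrative sketch)
[Exec]
# Boot the image's init system rather than running a single command
Boot=yes

[Network]
# Give the container its own virtual Ethernet link to the host
VirtualEthernet=yes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
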
&lt;h1&gt;How to use it&lt;/h1&gt;
&lt;p&gt;So, let's come back to our most trivial example, without any of the
&lt;code&gt;mkosi.$SOMETHING&lt;/code&gt; files around:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkosi
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As mentioned, this will create an image file &lt;code&gt;image.raw&lt;/code&gt; in the current
directory. How do we use it? Of course, we could &lt;code&gt;dd&lt;/code&gt; it onto some USB
stick and boot it on a bare-metal device. However, it's much simpler
to first run it in a container for testing:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# systemd-nspawn -bi image.raw
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And there you go: the image should boot up, and just work for you.&lt;/p&gt;
&lt;p&gt;Now, let's make things more interesting. Let's still not use any of
the &lt;code&gt;mkosi.$SOMETHING&lt;/code&gt; files around:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gh"&gt;#&lt;/span&gt; mkosi -t raw_btrfs --bootable -o foobar.raw
&lt;span class="gh"&gt;#&lt;/span&gt; systemd-nspawn -bi foobar.raw
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is similar to the above, but we made three changes: it's no
longer GPT + &lt;code&gt;ext4&lt;/code&gt;, but GPT + &lt;code&gt;btrfs&lt;/code&gt;. Moreover, the system is made
bootable on UEFI systems, and finally, the output is now called
&lt;code&gt;foobar.raw&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Because this system is bootable on UEFI systems, we can run it in KVM:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;qemu-kvm -m 512 -smp 2 -bios /usr/share/edk2/ovmf/OVMF_CODE.fd -drive format=raw,file=foobar.raw
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This will look very similar to the &lt;code&gt;systemd-nspawn&lt;/code&gt; invocation, except
that this uses full VM virtualization rather than container
virtualization. (Note that the way to run a UEFI qemu/kvm instance
appears to change all the time and is different on the various
distributions. It's quite annoying, and I can't really tell you what
the right qemu command line is to make this work on your system.)&lt;/p&gt;
&lt;p&gt;Of course, it's not all raw GPT disk images with &lt;code&gt;mkosi&lt;/code&gt;. Let's try
a plain directory image:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkosi -d fedora -t directory -o quux
# systemd-nspawn -bD quux
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Of course, if you generate the image as plain directory you can't boot
it on bare-metal just like that, nor run it in a VM.&lt;/p&gt;
&lt;p&gt;A more complex command line is the following:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mkosi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;fedora&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;raw_squashfs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="nx"&gt;checksum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="nx"&gt;xz&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="kn"&gt;package&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;openssh&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;clients&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="kn"&gt;package&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;emacs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In this mode we explicitly pick Fedora as the distribution to use, ask
&lt;code&gt;mkosi&lt;/code&gt; to generate a compressed GPT image with a root squashfs,
compress the result with &lt;code&gt;xz&lt;/code&gt;, and generate a &lt;code&gt;SHA256SUMS&lt;/code&gt; file with
the hashes of the generated artifacts. The image will contain the
SSH client as well as everybody's favorite editor.&lt;/p&gt;
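&lt;p&gt;Part of this command line could presumably also be written down in a &lt;code&gt;mkosi.default&lt;/code&gt; file instead. The following is a sketch that uses only setting names appearing elsewhere in this article (&lt;code&gt;Distribution=&lt;/code&gt;, &lt;code&gt;Format=&lt;/code&gt;, &lt;code&gt;Packages=&lt;/code&gt;). The configuration-file counterparts of &lt;code&gt;--checksum&lt;/code&gt; and &lt;code&gt;--xz&lt;/code&gt; are not named here, so the sketch deliberately leaves them on the command line:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkosi.default (sketch)
[Distribution]
Distribution=fedora

[Output]
Format=raw_squashfs

[Packages]
Packages=openssh-clients emacs
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With that file in the current directory, running &lt;code&gt;mkosi --checksum --xz&lt;/code&gt; should give the same result as the longer command line above.&lt;/p&gt;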
&lt;p&gt;Now, let's make use of the various &lt;code&gt;mkosi.$SOMETHING&lt;/code&gt; files. Let's
say we are working on some Automake-based project and want to make it
easy to generate a disk image off the development tree with the
version you are hacking on. Create a configuration file:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cat&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;mkosi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;EOF&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;Distribution&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fedora&lt;/span&gt;
&lt;span class="k"&gt;Release&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Output&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;Format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;raw_btrfs&lt;/span&gt;
&lt;span class="n"&gt;Bootable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;yes&lt;/span&gt;

&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Packages&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;packages&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;appear&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;both&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;final&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;image&lt;/span&gt;
&lt;span class="n"&gt;Packages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;openssh&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;clients&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;httpd&lt;/span&gt;
&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;packages&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;appear&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;but&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;absent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;final&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;image&lt;/span&gt;
&lt;span class="n"&gt;BuildPackages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;make&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gcc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;libcurl&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;devel&lt;/span&gt;
&lt;span class="n"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And let's add a build script:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# cat &amp;gt; mkosi.build &amp;lt;&amp;lt;EOF&lt;/span&gt;
&lt;span class="c1"&gt;#!/bin/sh&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;autogen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sh&lt;/span&gt;
&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;configure&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="o"&gt;=/&lt;/span&gt;&lt;span class="n"&gt;usr&lt;/span&gt;
&lt;span class="n"&gt;make&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n n-Quoted"&gt;`nproc`&lt;/span&gt;
&lt;span class="n"&gt;make&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;install&lt;/span&gt;
&lt;span class="n"&gt;EOF&lt;/span&gt;
&lt;span class="c1"&gt;# chmod +x mkosi.build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And with all that in place we can now build our project into a disk image, simply by typing:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkosi
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let's try it out:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# systemd-nspawn -bi image.raw
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Of course, if you do this you'll notice that building an image like
this can be quite slow. And slow build times are actively hurtful to
your productivity as a developer. Hence let's make things a bit
faster. First, let's make use of a package cache shared between runs:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkdir mkosi.cache
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Building images now should already be substantially faster (and
generate less network traffic) as the packages will now be downloaded
only once and reused. However, you'll notice that unpacking all those
packages and the rest of the work is still quite slow. But &lt;code&gt;mkosi&lt;/code&gt; can
help you with that. Simply use &lt;code&gt;mkosi&lt;/code&gt;'s incremental build feature. In
this mode &lt;code&gt;mkosi&lt;/code&gt; will make a copy of the build and final images
immediately before dropping in your build sources or artifacts, so
that building an image becomes a lot quicker: instead of always
starting totally from scratch a build will now reuse everything it can
reuse from a previous run, and immediately begin with building your
sources rather than the build image to build your sources in. To
enable the incremental build feature use &lt;code&gt;-i&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkosi -i
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Note that if you use this option, the package list is not updated
anymore from your distribution's servers, as the cached copy is made
after all packages are installed, and hence until you actually delete
the cached copy the distribution's network servers aren't contacted
again and no RPMs or DEBs are downloaded. This means the distribution
you use becomes "frozen in time" this way. (Which might be a bad
thing, but also a good thing, as it makes things kinda reproducible.)&lt;/p&gt;
&lt;p&gt;Of course, if you run &lt;code&gt;mkosi&lt;/code&gt; a couple of times you'll notice that it
won't overwrite the generated image when it already exists. You can
either delete the file yourself first (&lt;code&gt;rm image.raw&lt;/code&gt;) or let &lt;code&gt;mkosi&lt;/code&gt;
do it for you right before building a new image, with &lt;code&gt;mkosi -f&lt;/code&gt;. You
can also tell &lt;code&gt;mkosi&lt;/code&gt; to not only remove any such pre-existing images,
but also remove any cached copies of the incremental feature, by using
&lt;code&gt;-f&lt;/code&gt; twice.&lt;/p&gt;
&lt;p&gt;I wrote &lt;code&gt;mkosi&lt;/code&gt; originally in order to test systemd, and quickly
generate a disk image of various distributions with the most current
systemd version from git, without all that affecting my host system. I
regularly use &lt;code&gt;mkosi&lt;/code&gt; for that today, in incremental mode. The two
commands I use most in that context are:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;#&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;mkosi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;systemd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;nspawn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;bi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;image&lt;/span&gt;.&lt;span class="nv"&gt;raw&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And sometimes:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkosi -iff &amp;amp;&amp;amp; systemd-nspawn -bi image.raw
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The latter I use only if I want to regenerate everything based on the
very newest set of RPMs provided by Fedora, instead of a cached
snapshot of it.&lt;/p&gt;
&lt;p&gt;BTW, the &lt;code&gt;mkosi&lt;/code&gt; files for systemd are included in the systemd git
tree:
&lt;a href="https://github.com/systemd/systemd/blob/master/.mkosi/mkosi.fedora"&gt;&lt;code&gt;mkosi.default&lt;/code&gt;&lt;/a&gt;
and
&lt;a href="https://github.com/systemd/systemd/blob/master/mkosi.build"&gt;&lt;code&gt;mkosi.build&lt;/code&gt;&lt;/a&gt;. This
way, any developer who wants to quickly test something with current
systemd git, or wants to prepare a patch based on it and test it can
check out the systemd repository and simply run &lt;code&gt;mkosi&lt;/code&gt; in it and a
few minutes later he has a bootable image he can test in
&lt;code&gt;systemd-nspawn&lt;/code&gt; or KVM. &lt;code&gt;casync&lt;/code&gt; has similar files:
&lt;a href="https://github.com/systemd/casync/blob/master/mkosi.default"&gt;&lt;code&gt;mkosi.default&lt;/code&gt;&lt;/a&gt;,
&lt;a href="https://github.com/systemd/casync/blob/master/mkosi.build"&gt;&lt;code&gt;mkosi.build&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Random Interesting Features&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;As mentioned already, &lt;code&gt;mkosi&lt;/code&gt; will generate &lt;code&gt;dm-verity&lt;/code&gt; enabled
 disk images if you ask for it. For that use the &lt;code&gt;--verity&lt;/code&gt; switch on
 the command line or &lt;code&gt;Verity=&lt;/code&gt; setting in &lt;code&gt;mkosi.default&lt;/code&gt;. Of course,
 &lt;code&gt;dm-verity&lt;/code&gt; implies that the root volume is read-only. In this mode
 the top-level &lt;code&gt;dm-verity&lt;/code&gt; hash will be placed along-side the output
 disk image in a file named the same way, but with the &lt;code&gt;.roothash&lt;/code&gt;
 suffix. If the image is to be created bootable, the root hash is also
 included on the kernel command line in the &lt;code&gt;roothash=&lt;/code&gt; parameter,
 which current systemd versions can use to both find and activate the
 root partition in a &lt;code&gt;dm-verity&lt;/code&gt; protected way. BTW: it's a good idea
 to combine this &lt;code&gt;dm-verity&lt;/code&gt; mode with the &lt;code&gt;raw_squashfs&lt;/code&gt; image mode,
 to generate a genuinely protected, compressed image suitable for
 running in your IoT device.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;As indicated above, &lt;code&gt;mkosi&lt;/code&gt; can automatically create a check-sum
 file &lt;code&gt;SHA256SUMS&lt;/code&gt; for you (&lt;code&gt;--checksum&lt;/code&gt;) covering all the files it
 outputs (which could be the image file itself, a matching &lt;code&gt;.nspawn&lt;/code&gt;
 file using the &lt;code&gt;mkosi.nspawn&lt;/code&gt; file mentioned above, as well as the
 &lt;code&gt;.roothash&lt;/code&gt; file for the &lt;code&gt;dm-verity&lt;/code&gt; root hash.) It can then
 optionally sign this with &lt;code&gt;gpg&lt;/code&gt; (&lt;code&gt;--sign&lt;/code&gt;). Note that &lt;code&gt;systemd&lt;/code&gt;'s
 &lt;code&gt;machinectl pull-tar&lt;/code&gt; and &lt;code&gt;machinectl pull-raw&lt;/code&gt; commands can download
 these files and the &lt;code&gt;SHA256SUMS&lt;/code&gt; file automatically and verify things
 on download. In other words: what &lt;code&gt;mkosi&lt;/code&gt; outputs is perfectly
 ready for downloads using these two &lt;code&gt;systemd&lt;/code&gt; commands.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;As mentioned, &lt;code&gt;mkosi&lt;/code&gt; is big on supporting UEFI SecureBoot. To
 make use of that, place your X.509 key pair in two files
 &lt;code&gt;mkosi.secure-boot.crt&lt;/code&gt; and &lt;code&gt;mkosi.secure-boot.key&lt;/code&gt;, and set
 &lt;code&gt;SecureBoot=&lt;/code&gt; or &lt;code&gt;--secure-boot&lt;/code&gt;. If so, &lt;code&gt;mkosi&lt;/code&gt; will sign the
 kernel/initrd/kernel command line combination during the build. Of
 course, if you use this mode, you should also use
 &lt;code&gt;Verity=&lt;/code&gt;/&lt;code&gt;--verity&lt;/code&gt;, otherwise the setup makes only partial
 sense. Note that &lt;code&gt;mkosi&lt;/code&gt; will not help you with actually enrolling
 the keys you use in your UEFI firmware.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;mkosi&lt;/code&gt; has minimal support for git checkouts: when it recognizes
 it is run in a git checkout and you use the &lt;code&gt;mkosi.build&lt;/code&gt; script
 stuff, the source tree will be copied into the build image, but with
 all files excluded by &lt;code&gt;.gitignore&lt;/code&gt; removed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There's support for encryption in place. Use &lt;code&gt;--encrypt=&lt;/code&gt; or
 &lt;code&gt;Encrypt=&lt;/code&gt;. Note that the UEFI ESP is never encrypted though, and the
 root partition only if explicitly requested. The &lt;code&gt;/home&lt;/code&gt; and &lt;code&gt;/srv&lt;/code&gt;
 partitions are unconditionally encrypted if that's enabled.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Images may be built with all documentation removed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The password for the root user and additional kernel command line
 arguments may be configured for the image to generate.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
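&lt;p&gt;Putting the first and third items together, a configuration for a protected image might look roughly like the following sketch. The setting names &lt;code&gt;Verity=&lt;/code&gt; and &lt;code&gt;SecureBoot=&lt;/code&gt; are the ones mentioned in the list; that they belong in the &lt;code&gt;[Output]&lt;/code&gt; section and take &lt;code&gt;yes&lt;/code&gt; as value is an assumption modeled on &lt;code&gt;Bootable=yes&lt;/code&gt;, so check the README before relying on it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;# mkosi.default (sketch, section placement and values are assumptions)
[Distribution]
Distribution=fedora

[Output]
# Compressed, read-only root file system inside a GPT image
Format=raw_squashfs
Bootable=yes
# Protect the root partition with dm-verity
Verity=yes
# Sign kernel, initrd and kernel command line for UEFI SecureBoot
SecureBoot=yes
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For the signing step to work, the X.509 key pair files described earlier have to be present next to this file.&lt;/p&gt;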
&lt;h1&gt;Minimum Requirements&lt;/h1&gt;
&lt;p&gt;Current &lt;code&gt;mkosi&lt;/code&gt; requires Python 3.5, and has a number of dependencies,
listed in the
&lt;a href="https://github.com/systemd/mkosi/blob/master/README.md"&gt;&lt;code&gt;README&lt;/code&gt;&lt;/a&gt;. Most
notably you need a somewhat recent systemd version to make use of its
full feature set: systemd 233. Older versions are already packaged for
various distributions, but much of what I describe above is only
available in the most recent release &lt;code&gt;mkosi 3&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The UEFI SecureBoot support requires &lt;code&gt;sbsign&lt;/code&gt; which currently isn't
available in Fedora, but there's &lt;a href="https://copr.fedorainfracloud.org/coprs/msekleta/sbsigntool/"&gt;a
COPR&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Future&lt;/h1&gt;
&lt;p&gt;It is my intention to continue turning &lt;code&gt;mkosi&lt;/code&gt; into a tool suitable
for:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Testing and debugging projects&lt;/li&gt;
&lt;li&gt;Building images for secure devices&lt;/li&gt;
&lt;li&gt;Building portable service images&lt;/li&gt;
&lt;li&gt;Building images for secure VMs and containers&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;One of the biggest goals I have for the future is to teach &lt;code&gt;mkosi&lt;/code&gt; and
&lt;code&gt;systemd&lt;/code&gt;/&lt;code&gt;sd-boot&lt;/code&gt; native support for A/B IoT style partition
setups. The idea is that the combination of &lt;code&gt;systemd&lt;/code&gt;, &lt;code&gt;casync&lt;/code&gt; and
&lt;code&gt;mkosi&lt;/code&gt; provides generic building blocks for building secure,
auto-updating devices in a generic way, even though all pieces
may be used individually, too.&lt;/p&gt;
&lt;h1&gt;FAQ&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Why are you reinventing the wheel again? This is exactly like
 &lt;code&gt;$SOMEOTHERPROJECT&lt;/code&gt;!&lt;/strong&gt; — Well, to my knowledge there's no tool that
 integrates this nicely with your project's development tree, and can
 do &lt;code&gt;dm-verity&lt;/code&gt; and UEFI SecureBoot and all that stuff for you. So
 nope, I don't think this is exactly like &lt;code&gt;$SOMEOTHERPROJECT&lt;/code&gt;, thank you
 very much.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What about creating MBR/DOS partition images?&lt;/strong&gt; — That's really
 out of scope for me. This is an exercise in figuring out how generic
 OSes and devices in the future should be built and an attempt to
 commoditize OS image building. And no, the future doesn't speak MBR,
 sorry. That said, I'd be quite interested in adding support for
 booting on Raspberry Pi, possibly using a hybrid approach, i.e. using
 a GPT disk label, but arranging things in a way that the Raspberry Pi
 boot protocol (which is built around DOS partition tables), can still
 work.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Is this portable?&lt;/strong&gt; — Well, depends what you mean by
 &lt;em&gt;portable&lt;/em&gt;. No, this tool runs on Linux only, and as it uses
 &lt;code&gt;systemd-nspawn&lt;/code&gt; during the build process it doesn't run on
 non-&lt;code&gt;systemd&lt;/code&gt; systems either. Then again, you should be able to
 create images for any architecture you like with it. Of course, if
 you want the image to be bootable on bare-metal systems, only systems
 doing UEFI are supported (but &lt;code&gt;systemd-nspawn&lt;/code&gt; should still work fine on
 them).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Where can I get this stuff?&lt;/strong&gt; — Try
 &lt;a href="https://github.com/systemd/mkosi"&gt;GitHub&lt;/a&gt;. And some distributions
 carry packaged versions, but I think none of them the current v3
 yet.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Is this a systemd project?&lt;/strong&gt; — Yes, it's hosted under the
 &lt;a href="https://github.com/systemd"&gt;systemd GitHub umbrella&lt;/a&gt;. And yes,
 during run-time &lt;code&gt;systemd-nspawn&lt;/code&gt; in a current version is required. But
 no, the code-bases are separate otherwise, if only because &lt;code&gt;systemd&lt;/code&gt;
 is a C project, and &lt;code&gt;mkosi&lt;/code&gt; is written in Python.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Requiring systemd 233 is a pretty steep requirement, no?&lt;/strong&gt; —
 Yes, but the feature we need kind of matters (&lt;code&gt;systemd-nspawn&lt;/code&gt;'s
 &lt;code&gt;--overlay=&lt;/code&gt; switch), and again, this isn't supposed to be a tool for
 legacy systems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Can I run the resulting images in LXC or Docker?&lt;/strong&gt; — Humm, I am
 not an LXC nor Docker guy. If you select &lt;code&gt;directory&lt;/code&gt; or &lt;code&gt;subvolume&lt;/code&gt;
 as image type, LXC should be able to boot the generated images just
 fine, but I didn't try. Last time I looked, Docker doesn't permit
 running proper init systems as PID 1 inside the container, as they
 define their own run-time without intention to emulate a proper
 system. Hence, no I don't think it will work, at least not with an
 unpatched Docker version. That said, again, don't ask me questions
 about Docker, it's not precisely my area of expertise, and quite
 frankly I am not a fan. To my knowledge neither LXC nor Docker is
 able to run containers directly off GPT disk images, hence the
 various &lt;code&gt;raw_xyz&lt;/code&gt; image types are definitely not compatible with
 either. That means if you want to generate a single raw disk image
 that can be booted unmodified both in a container and on bare-metal,
 then &lt;code&gt;systemd-nspawn&lt;/code&gt; is the container manager to go for
 (specifically, its &lt;code&gt;-i&lt;/code&gt;/&lt;code&gt;--image=&lt;/code&gt; switch).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Should you care? Is this a tool for you?&lt;/h1&gt;
&lt;p&gt;Well, that's up to you really.&lt;/p&gt;
&lt;p&gt;If you hack on some complex project and need a quick way to compile
and run your project on a specific current Linux distribution, then
&lt;code&gt;mkosi&lt;/code&gt; is an excellent way to do that. Simply drop the &lt;code&gt;mkosi.default&lt;/code&gt;
and &lt;code&gt;mkosi.build&lt;/code&gt; files in your &lt;code&gt;git&lt;/code&gt; tree and everything will be
easy. (And of course, as indicated above: if the project you are
hacking on happens to be called &lt;code&gt;systemd&lt;/code&gt; or &lt;code&gt;casync&lt;/code&gt; be aware that
those files are already part of the git tree — you can just use them.)&lt;/p&gt;
&lt;p&gt;If you hack on some embedded or IoT device, then &lt;code&gt;mkosi&lt;/code&gt; is a great
choice too, as it will make it reasonably easy to generate secure
images that are protected against offline modification, by using
&lt;code&gt;dm-verity&lt;/code&gt; and UEFI SecureBoot.&lt;/p&gt;
&lt;p&gt;If you are an administrator and need a nice way to build images for a
VM or &lt;code&gt;systemd-nspawn&lt;/code&gt; container, or a portable service, then &lt;code&gt;mkosi&lt;/code&gt;
is an excellent choice too.&lt;/p&gt;
&lt;p&gt;If you care about legacy computers, old distributions, non-&lt;code&gt;systemd&lt;/code&gt;
init systems, old VM managers, Docker, … then no, &lt;code&gt;mkosi&lt;/code&gt; is not for
you, but there are plenty of well-established alternatives around that
cover that nicely.&lt;/p&gt;
&lt;p&gt;And never forget: &lt;code&gt;mkosi&lt;/code&gt; is an Open Source project. We are happy to
accept your patches and other contributions.&lt;/p&gt;
&lt;p&gt;Oh, and one unrelated last thing: don't forget to &lt;a href="https://cfp.all-systems-go.io/en/ASG2017/events/new"&gt;submit your talk
proposal&lt;/a&gt;
and/or &lt;a href="https://ti.to/all-systems-go/all-systems-go"&gt;buy a ticket&lt;/a&gt; for
&lt;a href="https://all-systems-go.io/"&gt;All Systems Go! 2017 in Berlin&lt;/a&gt; — the
conference where things like &lt;code&gt;systemd&lt;/code&gt;, &lt;code&gt;casync&lt;/code&gt; and &lt;code&gt;mkosi&lt;/code&gt; are
discussed, along with a variety of other Linux userspace projects used
for building systems.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 28 Jun 2017 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2017-06-28:/blog/mkosi-a-tool-for-generating-os-images.html</guid><category>projects</category></item><item><title>All Systems Go! 2017 CfP Open</title><link>https://0pointer.net/blog/all-systems-go-2017-cfp-open.html</link><description>&lt;p&gt;&lt;large&gt;&lt;b&gt;The All Systems Go! 2017 Call for Participation is Now Open!&lt;/b&gt;&lt;/large&gt;&lt;/p&gt;
&lt;p&gt;We’d like to invite presentation proposals for &lt;i&gt;All Systems Go! 2017&lt;/i&gt;!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://all-systems-go.io/"&gt;&lt;img src="https://all-systems-go.io/img/header-graphic.png" width="600" height="195" border="5"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;All Systems Go!&lt;/i&gt; is an Open Source community conference focused on the projects and technologies at the foundation of modern Linux systems — specifically low-level user-space technologies. Its goal is to provide a friendly and collaborative gathering place for individuals and communities working to push these technologies forward.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;All Systems Go! 2017&lt;/i&gt; takes place in &lt;b&gt;Berlin, Germany&lt;/b&gt; on &lt;b&gt;October 21st+22nd&lt;/b&gt;.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;All Systems Go!&lt;/i&gt; is a 2-day event with 2-3 talks happening in parallel. Full presentation slots are 30-45 minutes in length and lightning talk slots are 5-10 minutes.&lt;/p&gt;
&lt;p&gt;We are now accepting submissions for presentation proposals. In particular, we are looking for sessions including, but not limited to, the following topics:&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;Low-level container executors and infrastructure&lt;/li&gt;
&lt;li&gt;IoT and embedded OS infrastructure&lt;/li&gt;
&lt;li&gt;OS, container, IoT image delivery and updating&lt;/li&gt;
&lt;li&gt;Building Linux devices and applications&lt;/li&gt;
&lt;li&gt;Low-level desktop technologies&lt;/li&gt;
&lt;li&gt;Networking&lt;/li&gt;
&lt;li&gt;System and service management&lt;/li&gt;
&lt;li&gt;Tracing and performance measuring&lt;/li&gt;
&lt;li&gt;IPC and RPC systems&lt;/li&gt;
&lt;li&gt;Security and Sandboxing&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;While our focus is definitely more on the user-space side of things, talks about kernel projects are welcome too, as long as they have a clear and direct relevance for user-space.&lt;/p&gt;
&lt;p&gt;Please submit your proposals by &lt;b&gt;September 3rd&lt;/b&gt;. Notification of acceptance will be sent out 1-2 weeks later.&lt;/p&gt;
&lt;p&gt;To submit your proposal now please visit our &lt;a href="https://cfp.all-systems-go.io/en/ASG2017/events/new"&gt;CFP submission web site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For further information about All Systems Go! visit our &lt;a href="http://all-systems-go.io/"&gt;conference web site&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;i&gt;systemd.conf&lt;/i&gt; will not take place this year in lieu of &lt;i&gt;All Systems Go!&lt;/i&gt;. &lt;i&gt;All Systems Go!&lt;/i&gt; welcomes all projects that contribute to Linux user space, which, of course, includes systemd. Thus, anything you think was appropriate for submission to &lt;i&gt;systemd.conf&lt;/i&gt; is also fitting for &lt;i&gt;All Systems Go&lt;/i&gt;!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 20 Jun 2017 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2017-06-20:/blog/all-systems-go-2017-cfp-open.html</guid><category>projects</category></item><item><title>casync — A tool for distributing file system images</title><link>https://0pointer.net/blog/casync-a-tool-for-distributing-file-system-images.html</link><description>&lt;h1&gt;Introducing casync&lt;/h1&gt;
&lt;p&gt;In the past months I have been working on a new project:
&lt;a href="https://github.com/systemd/casync/"&gt;&lt;code&gt;casync&lt;/code&gt;&lt;/a&gt;. &lt;code&gt;casync&lt;/code&gt; takes
inspiration from the popular &lt;a href="https://rsync.samba.org/"&gt;&lt;code&gt;rsync&lt;/code&gt;&lt;/a&gt; file
synchronization tool as well as the probably even more popular
&lt;a href="https://git-scm.com/"&gt;&lt;code&gt;git&lt;/code&gt;&lt;/a&gt; revision control system. It combines the
idea of the &lt;code&gt;rsync&lt;/code&gt; algorithm with the idea of &lt;code&gt;git&lt;/code&gt;-style
content-addressable file systems, and creates a new system for
efficiently storing and delivering file system images, optimized for
high-frequency update cycles over the Internet. Its current focus is
on delivering IoT, container, VM, application, portable service or OS
images, but I hope to extend it later in a generic fashion to become
useful for backups and home directory synchronization as well (but
more about that later).&lt;/p&gt;
&lt;p&gt;The basic technological building blocks &lt;code&gt;casync&lt;/code&gt; is built from are
neither new nor particularly innovative (at least not anymore),
however the way &lt;code&gt;casync&lt;/code&gt; combines them is different from existing tools,
and that's what makes it useful for a variety of use-cases that other
tools can't cover that well.&lt;/p&gt;
&lt;h1&gt;Why?&lt;/h1&gt;
&lt;p&gt;I created &lt;code&gt;casync&lt;/code&gt; after studying how today's popular tools store and
deliver file system images. To briefly name a few: Docker has a
layered tarball approach,
&lt;a href="https://ostree.readthedocs.io/en/latest/"&gt;OSTree&lt;/a&gt; serves the
individual files directly via HTTP and maintains packed deltas to
speed up updates, while other systems operate on the block layer and
place raw &lt;code&gt;squashfs&lt;/code&gt; images (or other archival file systems, such as
ISO9660) for download on HTTP shares (in the better cases combined
with &lt;a href="http://zsync.moria.org.uk/"&gt;&lt;code&gt;zsync&lt;/code&gt;&lt;/a&gt; data).&lt;/p&gt;
&lt;p&gt;None of these approaches appeared fully convincing to me when used
in high-frequency update cycle systems. In such systems, it is
important to optimize towards a couple of goals:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Most importantly, make updates cheap traffic-wise (for this most tools use image deltas of some form)&lt;/li&gt;
&lt;li&gt;Put boundaries on disk space usage on servers (keeping deltas between all version combinations clients might want to update between would mean keeping an exponentially growing number of deltas on servers)&lt;/li&gt;
&lt;li&gt;Put boundaries on disk space usage on clients&lt;/li&gt;
&lt;li&gt;Be friendly to Content Delivery Networks (CDNs), i.e. serve neither too many small nor too many overly large files, and only require the most basic form of HTTP. Provide the repository administrator with high-level knobs to tune the average file size delivered.&lt;/li&gt;
&lt;li&gt;Simplicity to use for users, repository administrators and developers&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I don't think any of the tools mentioned above are really good on more
than a small subset of these points.&lt;/p&gt;
&lt;p&gt;Specifically: Docker's layered tarball approach dumps the "delta"
question onto the feet of the image creators: the best way to make
your image downloads minimal is basing your work on an existing image
clients might already have, and inherit its resources, maintaining full
history. Here, revision control (a tool for the developer) is
intermingled with update management (a concept for optimizing
production delivery). As container histories grow individual deltas
are likely to stay small, but on the other hand a brand-new deployment
usually requires downloading the full history onto the deployment
system, even though there's no use for it there, and likely requires
substantially more disk space and download sizes.&lt;/p&gt;
&lt;p&gt;OSTree's serving of individual files is unfriendly to CDNs (as many
small files in file trees cause an explosion of HTTP GET
requests). To counter that OSTree supports placing pre-calculated
delta images between selected revisions on the delivery servers, which
means a certain amount of revision management, that leaks into the
clients.&lt;/p&gt;
&lt;p&gt;Delivering direct &lt;code&gt;squashfs&lt;/code&gt; (or other file system) images is almost
beautifully simple, but of course means every update requires a full
download of the newest image, which is both bad for disk usage and
generated traffic. Enhancing it with &lt;code&gt;zsync&lt;/code&gt; makes this a much better
option, as it can reduce generated traffic substantially at very
little cost of history/meta-data (no explicit deltas between a large
number of versions need to be prepared server side). On the other hand
server requirements in disk space and functionality (HTTP Range
requests) are minus points for the use-case I am interested in.&lt;/p&gt;
&lt;p&gt;(Note: all the mentioned systems have great properties, and it's not
my intention to badmouth them. The only point I am trying to make is
that for the use case I care about, namely file system image delivery
with high-frequency update cycles, each system comes with certain
drawbacks.)&lt;/p&gt;
&lt;h1&gt;Security &amp;amp; Reproducibility&lt;/h1&gt;
&lt;p&gt;Besides the issues pointed out above I wasn't happy with the security
and reproducibility properties of these systems. In today's world
where security breaches involving hacking and breaking into connected
systems happen every day, an image delivery system that cannot make
strong guarantees regarding data integrity is out of
date. Specifically, the tarball format is famously nondeterministic:
the very same file tree can result in any number of different
valid serializations depending on the tool used, its version and the
underlying OS and file system. Some &lt;code&gt;tar&lt;/code&gt; implementations attempt to
correct that by guaranteeing that each file tree maps to exactly
one valid serialization, but such a property is always only specific
to the tool used. I strongly believe that any good update system must
guarantee on every single link of the chain that there's only one
valid representation of the data to deliver, that can easily be
verified.&lt;/p&gt;
&lt;h1&gt;What casync Is&lt;/h1&gt;
&lt;p&gt;So much for the background on why I created &lt;code&gt;casync&lt;/code&gt;. Now, let's have a
look at what &lt;code&gt;casync&lt;/code&gt; actually is, and what it does. Here's the brief
technical overview:&lt;/p&gt;
&lt;p&gt;Encoding: Let's take a large linear data stream, split it into
variable-sized chunks (the size of each being a function of the
chunk's contents), and store these chunks in individual, compressed
files in some directory, each file named after a strong hash value of
its contents, so that the hash value may be used as key for
retrieving the full chunk data. Let's call this directory a "chunk
store". At the same time, generate a "chunk index" file that lists
these chunk hash values plus their respective chunk sizes in a simple
linear array. The chunking algorithm is supposed to create variable,
but similarly sized chunks from the data stream, and do so in a way
that the same data results in the same chunks even if placed at
varying offsets. For more information &lt;a href="https://moinakg.wordpress.com/2013/06/22/high-performance-content-defined-chunking/"&gt;see this blog
story&lt;/a&gt;.&lt;/p&gt;
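&lt;p&gt;To make the encoding step more concrete, here is a minimal Python sketch of it. It is an illustration under stated assumptions, not &lt;code&gt;casync&lt;/code&gt;'s actual code: a simple polynomial rolling hash stands in for the real chunking hash, the chunk store is an in-memory dictionary rather than a directory of files, and the names &lt;code&gt;split_chunks&lt;/code&gt; and &lt;code&gt;encode&lt;/code&gt; as well as all size constants are made up for the example.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import hashlib
import lzma
import random
from collections import deque

# All constants below are illustrative assumptions, not casync defaults.
WINDOW = 48        # number of trailing bytes the rolling hash looks at
MIN_SIZE = 1024    # never cut a chunk before this many bytes
AVG_SIZE = 4096    # past MIN_SIZE, expect a cut roughly every AVG_SIZE bytes
MAX_SIZE = 16384   # always cut when a chunk reaches this size
BASE = 257
MOD = 1000000007
TOP = pow(BASE, WINDOW - 1, MOD)   # weight of the oldest byte in the window


def split_chunks(data):
    """Yield variable-sized chunks whose boundaries depend on content only."""
    start = 0
    h = 0
    window = deque()
    for pos, byte in enumerate(data):
        if len(window) == WINDOW:
            # Slide the window: remove the contribution of the oldest byte.
            h = (h - window.popleft() * TOP) % MOD
        window.append(byte)
        h = (h * BASE + byte) % MOD
        length = pos - start + 1
        at_cut_point = h % AVG_SIZE == AVG_SIZE - 1
        # range(MIN_SIZE, MAX_SIZE) means: at least MIN_SIZE, below MAX_SIZE.
        if length == MAX_SIZE or (length in range(MIN_SIZE, MAX_SIZE) and at_cut_point):
            yield data[start:pos + 1]
            start = pos + 1
            h = 0
            window.clear()
    if start != len(data):
        yield data[start:]   # trailing chunk, may be shorter than MIN_SIZE


def encode(data, store):
    """Add the chunks of data to store (a dict) and return the chunk index."""
    index = []
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            # Chunks are kept compressed, keyed by the hash of their contents.
            store[digest] = lzma.compress(chunk)
        index.append((digest, len(chunk)))
    return index


# Usage: encode some pseudo-random sample data.
rng = random.Random(0)
sample = bytes(rng.getrandbits(8) for _ in range(60000))
store = {}
index = encode(sample, store)
print(len(index), "chunks in index,", len(store), "chunks in store")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Since a cut point in this sketch depends only on the last &lt;code&gt;WINDOW&lt;/code&gt; bytes seen, inserting a few bytes near the start of a stream typically changes only the first chunk or two. After that the boundaries tend to fall on the same content again, and the remaining chunks hash to the same values.&lt;/p&gt;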
&lt;p&gt;Decoding: Let's take the chunk index file, and reassemble the large
linear data stream by concatenating the uncompressed chunks retrieved
from the chunk store, keyed by the listed chunk hash values.&lt;/p&gt;
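&lt;p&gt;The decoding step can be sketched in a few lines of Python as well. Again this is only an illustration, not &lt;code&gt;casync&lt;/code&gt;'s implementation: the store is a plain dictionary, the index is a list of (hash, size) pairs, the function name &lt;code&gt;decode&lt;/code&gt; is made up, and checking every chunk against its hash before use is an assumption about where such a check would naturally go.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;import hashlib
import lzma


def decode(index, store):
    """Rebuild the original stream from a chunk index and a chunk store."""
    parts = []
    for digest, size in index:
        chunk = lzma.decompress(store[digest])
        # Refuse chunks that do not match what the index promises.
        if hashlib.sha256(chunk).hexdigest() != digest or len(chunk) != size:
            raise ValueError("corrupt chunk " + digest)
        parts.append(chunk)
    return b"".join(parts)


# A tiny hand-made store and index, just to exercise decode().
pieces = [b"spam ", b"eggs ", b"spam "]
store = {hashlib.sha256(p).hexdigest(): lzma.compress(p) for p in pieces}
index = [(hashlib.sha256(p).hexdigest(), len(p)) for p in pieces]
print(decode(index, store))   # prints b'spam eggs spam '
print(len(store))             # prints 2, the repeated chunk is stored once
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
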
&lt;p&gt;As an extra twist, we introduce a well-defined, reproducible,
random-access serialization format for file trees (think: a more
modern &lt;code&gt;tar&lt;/code&gt;), to permit efficient, stable storage of complete file
trees in the system, simply by serializing them and then passing them
into the encoding step explained above.&lt;/p&gt;
&lt;p&gt;Finally, let's put all this on the network: for each image you want to
deliver, generate a chunk index file and place it on an HTTP
server. Do the same with the chunk store, and share it between the
various index files you intend to deliver.&lt;/p&gt;
&lt;p&gt;Why bother with all of this? Streams with similar contents will result
in mostly the same chunk files in the chunk store. This means it is
very efficient to store many related versions of a data stream in the
same chunk store, thus minimizing disk usage. Moreover, when
transferring linear data streams chunks already known on the receiving
side can be made use of, thus minimizing network traffic.&lt;/p&gt;
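&lt;p&gt;The traffic saving boils down to a set difference between the hashes listed in the new chunk index and the hashes already available locally. A small Python sketch of that idea follows. The function name &lt;code&gt;chunks_to_fetch&lt;/code&gt; and the shortened placeholder hashes are made up for the example, and the real tool may well organize this differently.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;def chunks_to_fetch(new_index, local_digests):
    """Return the hashes from new_index that are missing locally.

    The result keeps index order and lists every missing hash only once.
    """
    known = set(local_digests)
    missing = []
    for digest, _size in new_index:
        if digest not in known:
            missing.append(digest)
            known.add(digest)
    return missing


# Placeholder hashes; real ones would be full SHA256 digests.
old_index = [("aa", 10), ("bb", 20), ("cc", 30)]
new_index = [("aa", 10), ("dd", 25), ("cc", 30), ("dd", 25)]
local = {digest for digest, _size in old_index}
print(chunks_to_fetch(new_index, local))   # prints ['dd']
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
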
&lt;p&gt;Why is this different from &lt;code&gt;rsync&lt;/code&gt; or OSTree, or similar tools? Well,
one major difference between &lt;code&gt;casync&lt;/code&gt; and those tools is that we
remove file boundaries before chunking things up. This means that
small files are lumped together with their siblings and large files
are chopped into pieces, which permits us to recognize similarities in
files and directories beyond file boundaries, and makes sure our chunk
sizes are pretty evenly distributed, without the file boundaries
affecting them.&lt;/p&gt;
&lt;p&gt;The "chunking" algorithm is based on a the buzhash rolling hash
function. SHA256 is used as strong hash function to generate digests
of the chunks. xz is used to compress the individual chunks.&lt;/p&gt;
&lt;p&gt;Here's a diagram that hopefully explains a bit how the encoding
process works, my crappy drawing skills notwithstanding:&lt;/p&gt;
&lt;p&gt;&lt;a href="http://0pointer.de/public/casync.png"&gt;&lt;img src="http://0pointer.de/public/casync.png" width="800" height="862" alt="Diagram"/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The diagram shows the encoding process from top to bottom. It starts
with a block device or a file tree, which is then serialized and
chunked up into variable sized blocks. The compressed chunks are then
placed in the chunk store, while a chunk index file is written listing
the chunk hashes in order. (The original SVG of this graphic may be
found &lt;a href="http://0pointer.de/public/casync.svg"&gt;here&lt;/a&gt;.)&lt;/p&gt;
&lt;h1&gt;Details&lt;/h1&gt;
&lt;p&gt;Note that &lt;code&gt;casync&lt;/code&gt; operates on two different layers, depending on the
use-case of the user:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;You may use it on the block layer. In this case the raw block data
on disk is taken as-is, read directly from the block device, split
into chunks as described above, compressed, stored and delivered.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You may use it on the file system layer. In this case, the
file tree serialization format mentioned above comes into play:
the file tree is serialized depth-first (much like &lt;code&gt;tar&lt;/code&gt; would do
it) and then split into chunks, compressed, stored and delivered.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The fact that it may be used on both the block and file system layer
opens it up for a variety of different use-cases. In the VM and IoT
ecosystems shipping images as block-level serializations is more
common, while in the container and application world file-system-level
serializations are more typically used.&lt;/p&gt;
&lt;p&gt;Chunk index files referring to block-layer serializations carry the
&lt;code&gt;.caibx&lt;/code&gt; suffix, while chunk index files referring to file system
serializations carry the &lt;code&gt;.caidx&lt;/code&gt; suffix. Note that you may also use
&lt;code&gt;casync&lt;/code&gt; as direct &lt;code&gt;tar&lt;/code&gt; replacement, i.e. without the chunking, just
generating the plain linear file tree serialization. Such files
carry the &lt;code&gt;.catar&lt;/code&gt; suffix. Internally &lt;code&gt;.caibx&lt;/code&gt; files are identical to
&lt;code&gt;.caidx&lt;/code&gt; files, the only difference is semantic: &lt;code&gt;.caidx&lt;/code&gt; files
describe a &lt;code&gt;.catar&lt;/code&gt; file, while &lt;code&gt;.caibx&lt;/code&gt; files may describe any other
blob. Finally, chunk stores are directories carrying the &lt;code&gt;.castr&lt;/code&gt;
suffix.&lt;/p&gt;
&lt;h1&gt;Features&lt;/h1&gt;
&lt;p&gt;Here are a couple of other features &lt;code&gt;casync&lt;/code&gt; has:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;When downloading a new image you may use &lt;code&gt;casync&lt;/code&gt;'s &lt;code&gt;--seed=&lt;/code&gt;
feature: each block device, file, or directory specified is processed
using the same chunking logic described above, and is used as
preferred source when putting together the downloaded image locally,
avoiding network transfer of it. This of course is useful whenever
updating an image: simply specify one or more old versions as seed and
only download the chunks that truly changed since then. Note that
using seeds requires no history relationship between seed and the new
image to download. This has major benefits: you can even use it to
speed up downloads of relatively foreign and unrelated data. For
example, when downloading a container image built using Ubuntu you can
use your Fedora host OS tree in &lt;code&gt;/usr&lt;/code&gt; as seed, and &lt;code&gt;casync&lt;/code&gt; will
automatically use whatever it can from that tree, for example timezone
and locale data that tends to be identical between
distributions. Example: &lt;code&gt;casync extract
http://example.com/myimage.caibx --seed=/dev/sda1 /dev/sda2&lt;/code&gt;. This
will place the block-layer image described by the indicated URL in the
&lt;code&gt;/dev/sda2&lt;/code&gt; partition, using the existing &lt;code&gt;/dev/sda1&lt;/code&gt; data as seeding
source. An invocation like this could be typically used by IoT systems
with an A/B partition setup. Example 2: &lt;code&gt;casync extract
http://example.com/mycontainer-v3.caidx --seed=/srv/container-v1
--seed=/srv/container-v2 /srv/container-v3&lt;/code&gt; is very similar but
operates on the file system layer, and uses two old container versions
to seed the new version.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When operating on the file system level, the user has fine-grained
control on the meta-data included in the serialization. This is
relevant since different use-cases tend to require a different set of
saved/restored meta-data. For example, when shipping OS images, file
access bits/ACLs and ownership matter, while file modification times
hurt. When doing personal backups OTOH file ownership matters little
but file modification times are important. Moreover different backing
file systems support different feature sets, and storing more
information than necessary might make it impossible to validate a tree
against an image if the meta-data cannot be replayed in full. Due to
this, &lt;code&gt;casync&lt;/code&gt; provides a set of &lt;code&gt;--with=&lt;/code&gt; and &lt;code&gt;--without=&lt;/code&gt; parameters
that allow fine-grained control of the data stored in the file tree
serialization, including the granularity of modification times and
more. The precise set of selected meta-data features is also always
part of the serialization, so that seeding can work correctly and
automatically.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;casync&lt;/code&gt; tries to be as accurate as possible when storing file
system meta-data. This means that besides the usual baseline of file
meta-data (file ownership and access bits), and more advanced features
(extended attributes, ACLs, file capabilities) a number of more exotic
kinds of data are stored as well, including Linux
&lt;a href="https://linux.die.net/man/1/chattr"&gt;chattr(1)&lt;/a&gt; file attributes, as
well as &lt;a href="https://en.wikipedia.org/wiki/File_attribute#DOS_and_Windows"&gt;FAT file
attributes&lt;/a&gt;
(you may wonder why the latter? — EFI is FAT, and &lt;code&gt;/efi&lt;/code&gt; is part of
the comprehensive serialization of any host). In the future I intend
to extend this further, for example storing &lt;code&gt;btrfs&lt;/code&gt; sub-volume
information where available. Note that as described above every single
type of meta-data may be turned off and on individually, hence if you
don't need FAT file bits (and I figure it's pretty likely you don't),
then they won't be stored.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The user creating &lt;code&gt;.caidx&lt;/code&gt; or &lt;code&gt;.caibx&lt;/code&gt; files may control the desired
average chunk length (before compression) freely, using the
&lt;code&gt;--chunk-size=&lt;/code&gt; parameter. Smaller chunks increase the number of
generated files in the chunk store and increase HTTP GET load on the
server, but also ensure that sharing between similar images is
improved, as identical patterns in the images stored are more likely
to be recognized. By default &lt;code&gt;casync&lt;/code&gt; will use a 64K average chunk
size. Tweaking this can be particularly useful when adapting the
system to specific CDNs, or when delivering compressed disk images
such as &lt;code&gt;squashfs&lt;/code&gt; (see below).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Emphasis is placed on making all invocations reproducible,
well-defined and strictly deterministic. As mentioned above this is a
requirement to reach the intended security guarantees, but is also
useful for many other use-cases. For example, the &lt;code&gt;casync digest&lt;/code&gt;
command may be used to calculate a hash value identifying a specific
directory in all desired detail (use &lt;code&gt;--with=&lt;/code&gt; and &lt;code&gt;--without=&lt;/code&gt; to pick
the desired detail). Moreover the &lt;code&gt;casync mtree&lt;/code&gt; command may be used
to generate a BSD &lt;a href="https://www.freebsd.org/cgi/man.cgi?mtree(5)"&gt;mtree(5)&lt;/a&gt; compatible manifest of a directory tree,
&lt;code&gt;.caidx&lt;/code&gt; or &lt;code&gt;.catar&lt;/code&gt; file.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The file system serialization format is nicely composable. By this
I mean that the serialization of a file tree is the concatenation of
the serializations of all files and file sub-trees located at the
top of the tree, with zero meta-data references from any of these
serializations into the others. This property is essential to ensure
maximum reuse of chunks when similar trees are serialized.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When extracting file trees or disk image files, &lt;code&gt;casync&lt;/code&gt;
will automatically create
&lt;a href="http://man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html"&gt;reflinks&lt;/a&gt;
from any specified seeds if the underlying file system supports it
(such as &lt;code&gt;btrfs&lt;/code&gt;, &lt;code&gt;ocfs&lt;/code&gt;, and future &lt;code&gt;xfs&lt;/code&gt;). After all, instead of
copying the desired data from the seed, we can just tell the file
system to link up the relevant blocks. This works both when extracting
&lt;code&gt;.caidx&lt;/code&gt; and &lt;code&gt;.caibx&lt;/code&gt; files — the latter of course only when the
extracted disk image is placed in a regular raw image file on disk,
rather than directly on a plain block device, as plain block devices
do not know the concept of reflinks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optionally, when extracting file trees, &lt;code&gt;casync&lt;/code&gt; can
create traditional UNIX hard-links for identical files in specified
seeds (&lt;code&gt;--hardlink=yes&lt;/code&gt;). This works on all UNIX file systems, and can
save substantial amounts of disk space. However, this only works for
very specific use-cases where disk images are considered read-only
after extraction, as any changes made to one tree will propagate to
all other trees sharing the same hard-linked files, as that's the
nature of hard-links. In this mode, &lt;code&gt;casync&lt;/code&gt; exposes OSTree-like
behavior, which is built heavily around read-only hard-link trees.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;casync&lt;/code&gt; tries to be smart when choosing what to include in file
system images. Implicitly, file systems such as procfs and sysfs are
excluded from serialization, as they expose API objects, not real
files. Moreover, the "nodump" (&lt;code&gt;+d&lt;/code&gt;)
&lt;a href="https://linux.die.net/man/1/chattr"&gt;chattr(1)&lt;/a&gt; flag is honored by
default, permitting users to mark files to exclude from serialization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When creating and extracting file trees &lt;code&gt;casync&lt;/code&gt; may apply an
automatic or explicit UID/GID shift. This is particularly useful when
transferring container images for use with Linux user name-spacing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In addition to local operation, &lt;code&gt;casync&lt;/code&gt; currently supports HTTP,
HTTPS, FTP and ssh natively for downloading chunk index files and
chunks (the ssh mode requires installing &lt;code&gt;casync&lt;/code&gt; on the remote host,
though, but an sftp mode not requiring that should be easy to
add). When creating index files or chunks, only ssh is supported as
remote back-end.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When operating on block-layer images, you may expose locally or
remotely stored images as local block devices. Example: &lt;code&gt;casync mkdev
http://example.com/myimage.caibx&lt;/code&gt; exposes the disk image described by
the indicated URL as local block device in &lt;code&gt;/dev&lt;/code&gt;, which you then may
use the usual block device tools on, such as mount or fdisk (only
read-only though). Chunks are downloaded on access with high priority,
and at low priority when idle in the background. Note that in this
mode, &lt;code&gt;casync&lt;/code&gt; also plays a role similar to "dm-verity", as all blocks
are validated against the strong digests in the chunk index file
before passing them on to the kernel's block layer. This feature is
implemented through Linux' NBD kernel facility.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Similarly, when operating on file-system-layer images, you may mount
locally or remotely stored images as regular file systems. Example:
&lt;code&gt;casync mount http://example.com/mytree.caidx /srv/mytree&lt;/code&gt; mounts the
file tree image described by the indicated URL as a local directory
&lt;code&gt;/srv/mytree&lt;/code&gt;. This feature is implemented through Linux' FUSE kernel
facility. Note that special care is taken that the images exposed this
way can be packed up again with &lt;code&gt;casync make&lt;/code&gt; and are guaranteed to
return the bit-by-bit exact same serialization again that they were
mounted from. No data is lost or changed while passing things through
FUSE (OK, strictly speaking this is a lie, we do lose ACLs, but that's
hopefully just a temporary gap to be fixed soon).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In IoT A/B fixed size partition setups the file systems placed in
the two partitions are usually much smaller than the partition size,
in order to keep some room for later, larger updates. &lt;code&gt;casync&lt;/code&gt; is able
to analyze the super-block of a number of common file systems in order
to determine the actual size of a file system stored on a block
device, so that writing a file system to such a partition and reading
it back again will result in reproducible data. Moreover this speeds
up the seeding process, as there's little point in seeding the
white-space after the file system within the partition.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Example Command Lines&lt;/h1&gt;
&lt;p&gt;Here's how to use &lt;code&gt;casync&lt;/code&gt;, explained with a few examples:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;make&lt;span class="w"&gt; &lt;/span&gt;foobar.caidx&lt;span class="w"&gt; &lt;/span&gt;/some/directory
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This will create a chunk index file &lt;code&gt;foobar.caidx&lt;/code&gt; in the local
directory, and populate the chunk store directory &lt;code&gt;default.castr&lt;/code&gt;
located next to it with the chunks of the serialization (you can
change the name for the store directory with &lt;code&gt;--store=&lt;/code&gt; if you
like). This command operates on the file-system level. A similar
command operating on the block level:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;make&lt;span class="w"&gt; &lt;/span&gt;foobar.caibx&lt;span class="w"&gt; &lt;/span&gt;/dev/sda1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This command creates a chunk index file &lt;code&gt;foobar.caibx&lt;/code&gt; in the local
directory describing the current contents of the &lt;code&gt;/dev/sda1&lt;/code&gt; block
device, and populates &lt;code&gt;default.castr&lt;/code&gt; in the same way as above. Note
that you may as well read a raw disk image from a file instead of a
block device:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;make&lt;span class="w"&gt; &lt;/span&gt;foobar.caibx&lt;span class="w"&gt; &lt;/span&gt;myimage.raw
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To reconstruct the original file tree from the &lt;code&gt;.caidx&lt;/code&gt; file and
the chunk store of the first command, use:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;extract&lt;span class="w"&gt; &lt;/span&gt;foobar.caidx&lt;span class="w"&gt; &lt;/span&gt;/some/other/directory
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And similarly for the block-layer version:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;extract&lt;span class="w"&gt; &lt;/span&gt;foobar.caibx&lt;span class="w"&gt; &lt;/span&gt;/dev/sdb1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;or, to extract the block-layer version into a raw disk image:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;extract&lt;span class="w"&gt; &lt;/span&gt;foobar.caibx&lt;span class="w"&gt; &lt;/span&gt;myotherimage.raw
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The above are the most basic commands, operating on local data
only. Now let's make this more interesting, and reference remote
resources:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;extract&lt;span class="w"&gt; &lt;/span&gt;http://example.com/images/foobar.caidx&lt;span class="w"&gt; &lt;/span&gt;/some/other/directory
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This extracts the specified &lt;code&gt;.caidx&lt;/code&gt; onto a local directory. This of
course assumes that &lt;code&gt;foobar.caidx&lt;/code&gt; was uploaded to the HTTP server in
the first place, along with the chunk store. You can use any command
you like to accomplish that, for example &lt;code&gt;scp&lt;/code&gt; or
&lt;code&gt;rsync&lt;/code&gt;. Alternatively, you can let &lt;code&gt;casync&lt;/code&gt; do this directly when
generating the chunk index:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;make&lt;span class="w"&gt; &lt;/span&gt;ssh.example.com:images/foobar.caidx&lt;span class="w"&gt; &lt;/span&gt;/some/directory
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This will use ssh to connect to the &lt;code&gt;ssh.example.com&lt;/code&gt; server, and then
places the &lt;code&gt;.caidx&lt;/code&gt; file and the chunks on it. Note that this mode of
operation is "smart": this scheme will only upload chunks currently
missing on the server side, and not re-transmit what already is
available.&lt;/p&gt;
&lt;p&gt;Note that you can always configure the precise path or URL of the
chunk store via the &lt;code&gt;--store=&lt;/code&gt; option. If you do not do that, then the
store path is automatically derived from the path or URL: the last
component of the path or URL is replaced by &lt;code&gt;default.castr&lt;/code&gt;.&lt;/p&gt;
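&lt;p&gt;As a rough illustration of that derivation rule, here is a small shell sketch. &lt;code&gt;derive_store&lt;/code&gt; is a hypothetical helper written for this post, not part of &lt;code&gt;casync&lt;/code&gt;, and it only covers the slash-separated forms used in the examples above:&lt;/p&gt;

```shell
# Sketch only: mimic the default rule by replacing the last component of an
# index path or URL with "default.castr".
# derive_store is a hypothetical helper, not a casync command.
derive_store() {
    case "$1" in
        */*) printf '%s/default.castr\n' "${1%/*}" ;;  # drop the last component
        *)   printf 'default.castr\n' ;;               # bare file name: same directory
    esac
}

derive_store http://example.com/images/foobar.caidx
derive_store ssh.example.com:images/foobar.caidx
derive_store foobar.caidx
```

&lt;p&gt;If the store lives anywhere else, pass its location explicitly with &lt;code&gt;--store=&lt;/code&gt; instead of relying on the default.&lt;/p&gt;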
&lt;p&gt;Of course, when extracting &lt;code&gt;.caidx&lt;/code&gt; or &lt;code&gt;.caibx&lt;/code&gt; files from remote sources,
using a local seed is advisable:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;extract&lt;span class="w"&gt; &lt;/span&gt;http://example.com/images/foobar.caidx&lt;span class="w"&gt; &lt;/span&gt;--seed&lt;span class="o"&gt;=&lt;/span&gt;/some/exising/directory&lt;span class="w"&gt; &lt;/span&gt;/some/other/directory
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Or on the block layer:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;extract&lt;span class="w"&gt; &lt;/span&gt;http://example.com/images/foobar.caibx&lt;span class="w"&gt; &lt;/span&gt;--seed&lt;span class="o"&gt;=&lt;/span&gt;/dev/sda1&lt;span class="w"&gt; &lt;/span&gt;/dev/sdb2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;When creating chunk indexes on the file system layer &lt;code&gt;casync&lt;/code&gt; will by
default store meta-data as accurately as possible. Let's create a chunk
index with reduced meta-data:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;make&lt;span class="w"&gt; &lt;/span&gt;foobar.caidx&lt;span class="w"&gt; &lt;/span&gt;--with&lt;span class="o"&gt;=&lt;/span&gt;sec-time&lt;span class="w"&gt; &lt;/span&gt;--with&lt;span class="o"&gt;=&lt;/span&gt;symlinks&lt;span class="w"&gt; &lt;/span&gt;--with&lt;span class="o"&gt;=&lt;/span&gt;read-only&lt;span class="w"&gt; &lt;/span&gt;/some/dir
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This command will create a chunk index for a file tree serialization
that has three features above the absolute baseline supported: 1s
granularity time-stamps, symbolic links and a single read-only bit. In
this mode, all the other meta-data bits are not stored, including
nanosecond time-stamps, full UNIX permission bits, file ownership or
even ACLs or extended attributes.&lt;/p&gt;
&lt;p&gt;Now let's make a &lt;code&gt;.caidx&lt;/code&gt; file available locally as a mounted file
system, without extracting it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;mount&lt;span class="w"&gt; &lt;/span&gt;http://example.comf/images/foobar.caidx&lt;span class="w"&gt; &lt;/span&gt;/mnt/foobar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And similar, let's make a &lt;code&gt;.caibx&lt;/code&gt; file available locally as a block device:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;mkdev&lt;span class="w"&gt; &lt;/span&gt;http://example.comf/images/foobar.caibx
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This will create a block device in &lt;code&gt;/dev&lt;/code&gt; and print the used device
node path to STDOUT.&lt;/p&gt;
&lt;p&gt;As mentioned, &lt;code&gt;casync&lt;/code&gt; is big about reproducibility. Let's make use of
that to calculate a digest identifying a very specific version of
a file tree:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;digest&lt;span class="w"&gt; &lt;/span&gt;.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This digest will include all meta-data bits &lt;code&gt;casync&lt;/code&gt; and the underlying
file system know about. Usually, to make this useful you want to
configure exactly what meta-data to include:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;digest&lt;span class="w"&gt; &lt;/span&gt;--with&lt;span class="o"&gt;=&lt;/span&gt;unix&lt;span class="w"&gt; &lt;/span&gt;.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This makes use of the &lt;code&gt;--with=unix&lt;/code&gt; shortcut for selecting meta-data
fields. Specifying &lt;code&gt;--with=unix&lt;/code&gt; selects all meta-data that
traditional UNIX file systems support. It is a shortcut for writing out:
&lt;code&gt;--with=16bit-uids --with=permissions --with=sec-time --with=symlinks
--with=device-nodes --with=fifos --with=sockets&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Note that when calculating digests or creating chunk indexes you may
also use the negative &lt;code&gt;--without=&lt;/code&gt; option to remove specific features
but start from the most precise:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;digest&lt;span class="w"&gt; &lt;/span&gt;--without&lt;span class="o"&gt;=&lt;/span&gt;flag-immutable
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This generates a digest with the most accurate meta-data, but leaves
one feature out: &lt;a href="https://linux.die.net/man/1/chattr"&gt;chattr(1)&lt;/a&gt;'s
immutable (&lt;code&gt;+i&lt;/code&gt;) file flag.&lt;/p&gt;
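&lt;p&gt;One way to put such digests to use is to check whether two file trees are identical as far as the selected meta-data goes. The following is a minimal sketch, assuming &lt;code&gt;casync&lt;/code&gt; is in &lt;code&gt;$PATH&lt;/code&gt;; &lt;code&gt;same_tree&lt;/code&gt; is a hypothetical helper name, not a &lt;code&gt;casync&lt;/code&gt; command:&lt;/p&gt;

```shell
# same_tree is a hypothetical helper: it succeeds if both directories yield
# the same casync digest when only traditional UNIX meta-data is considered.
same_tree() {
    a=$(casync digest --with=unix "$1") || return 2   # digest of first tree
    b=$(casync digest --with=unix "$2") || return 2   # digest of second tree
    [ "$a" = "$b" ]
}

# Example use (directory names are placeholders):
#   if same_tree /some/directory /some/other/directory; then echo identical; fi
```

&lt;p&gt;Both sides must use the same &lt;code&gt;--with=&lt;/code&gt; or &lt;code&gt;--without=&lt;/code&gt; selection, otherwise the digests cover different meta-data and are not comparable.&lt;/p&gt;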
&lt;p&gt;To list the contents of a &lt;code&gt;.caidx&lt;/code&gt; file use a command like the following:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;list&lt;span class="w"&gt; &lt;/span&gt;http://example.com/images/foobar.caidx
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;or&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;casync&lt;span class="w"&gt; &lt;/span&gt;mtree&lt;span class="w"&gt; &lt;/span&gt;http://example.com/images/foobar.caidx
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The former command will generate a brief list of files and
directories, not too different from &lt;code&gt;tar t&lt;/code&gt; or &lt;code&gt;ls -al&lt;/code&gt; in its
output. The latter command will generate a BSD
&lt;a href="https://www.freebsd.org/cgi/man.cgi?mtree(5)"&gt;mtree(5)&lt;/a&gt; compatible
manifest. Note that &lt;code&gt;casync&lt;/code&gt; actually stores substantially more file
meta-data than &lt;code&gt;mtree&lt;/code&gt; files can express, though.&lt;/p&gt;
&lt;h1&gt;What casync isn't&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;casync&lt;/code&gt; is not an attempt to minimize serialization and downloaded
deltas to the extreme. Instead, the tool is supposed to find a good
middle ground, that is good on traffic and disk space, but not at the
price of convenience or requiring explicit revision control. If you
care about updates that are absolutely minimal, there are binary delta
systems around that might be an option for you, such as &lt;a href="https://www.chromium.org/developers/design-documents/software-updates-courgette"&gt;Google's
Courgette&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;casync&lt;/code&gt; is not a replacement for &lt;code&gt;rsync&lt;/code&gt;, or &lt;code&gt;git&lt;/code&gt; or &lt;code&gt;zsync&lt;/code&gt; or
anything like that. They have very different use-cases and
semantics. For example, &lt;code&gt;rsync&lt;/code&gt; permits you to directly synchronize two
file trees remotely. &lt;code&gt;casync&lt;/code&gt; just cannot do that, and it is unlikely
it ever will.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Where next?&lt;/h1&gt;
&lt;p&gt;&lt;code&gt;casync&lt;/code&gt; is supposed to be a generic synchronization tool. Its primary
focus for now is delivery of OS images, but I'd like to make it useful
for a couple other use-cases, too. Specifically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;To make the tool useful for backups, encryption is missing. I have
pretty concrete plans how to add that. When implemented, the tool
might become an alternative to &lt;a href="https://restic.github.io/"&gt;&lt;code&gt;restic&lt;/code&gt;&lt;/a&gt;,
&lt;a href="https://borgbackup.readthedocs.io/"&gt;BorgBackup&lt;/a&gt; or
&lt;a href="https://www.tarsnap.com/"&gt;&lt;code&gt;tarsnap&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Right now, if you want to deploy &lt;code&gt;casync&lt;/code&gt; in real life, you still
need to validate the downloaded &lt;code&gt;.caidx&lt;/code&gt; or &lt;code&gt;.caibx&lt;/code&gt; file yourself, for
example with some &lt;code&gt;gpg&lt;/code&gt; signature. It is my intention to integrate with
&lt;code&gt;gpg&lt;/code&gt; in a minimal way so that signing and verifying chunk index files
is done automatically.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In the longer run, I'd like to build an automatic synchronizer for
&lt;code&gt;$HOME&lt;/code&gt; between systems from this. Each &lt;code&gt;$HOME&lt;/code&gt; instance would be
stored automatically in regular intervals in the cloud using casync,
and conflicts would be resolved locally.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;casync&lt;/code&gt; is written in a shared library style, but it is not yet
built as one. Specifically this means that almost all of &lt;code&gt;casync&lt;/code&gt;'s
functionality is supposed to be available as C API soon, and
applications can process &lt;code&gt;casync&lt;/code&gt; files on every level. It is my
intention to make this library useful enough so that it will be easy
to write a module for GNOME's &lt;code&gt;gvfs&lt;/code&gt; subsystem in order to make remote
or local &lt;code&gt;.caidx&lt;/code&gt; files directly available to applications (as an
alternative to &lt;code&gt;casync mount&lt;/code&gt;). In fact the idea is to make this all
flexible enough that even the remoting back-ends can be replaced
easily, for example to replace &lt;code&gt;casync&lt;/code&gt;'s default HTTP/HTTPS back-ends
built on CURL with GNOME's own HTTP implementation, in order to share
cookies, certificates, … There's also an alternative method to
integrate with &lt;code&gt;casync&lt;/code&gt; in place already: simply invoke &lt;code&gt;casync&lt;/code&gt; as a
sub-process. &lt;code&gt;casync&lt;/code&gt; will inform you about a certain set of state
changes using a mechanism compatible with
&lt;a href="https://www.freedesktop.org/software/systemd/man/sd_notify.html"&gt;sd_notify(3)&lt;/a&gt;. In
future it will also propagate progress data this way and more.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I intend to add a new seeding back-end that sources chunks from
the local network. After downloading the new &lt;code&gt;.caidx&lt;/code&gt; file off the
Internet &lt;code&gt;casync&lt;/code&gt; would then search for the listed chunks on the local
network first before retrieving them from the Internet. This should
speed things up on all installations that have multiple similar
systems deployed in the same network.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
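&lt;p&gt;Until the &lt;code&gt;gpg&lt;/code&gt; integration mentioned in the second item exists, the manual validation could be scripted roughly as follows. This is only a sketch under assumptions: that the publisher places a detached signature next to the index under the name &lt;code&gt;INDEX.asc&lt;/code&gt;, that the signing key is already in your keyring, and that the chunk store sits at the default location. &lt;code&gt;verify_and_extract&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

```shell
# verify_and_extract is a hypothetical helper. It assumes a detached signature
# is published as "INDEX.asc" next to the index (an assumed convention) and
# that the signing key is already present in the local gpg keyring.
verify_and_extract() {
    url=$1
    target=$2
    index=${url##*/}                  # local file name, e.g. foobar.caidx
    store=${url%/*}/default.castr     # default chunk store next to the index

    # Fetch the index and its signature into the current directory.
    curl -fsS -o "$index" "$url" || return 1
    curl -fsS -o "$index.asc" "$url.asc" || return 1

    # Refuse to extract anything unless the signature checks out.
    gpg --verify "$index.asc" "$index" || return 1

    # Extract from the verified local index, pulling chunks from the remote store.
    casync extract "$index" --store="$store" "$target"
}

# Example use:
#   verify_and_extract http://example.com/images/foobar.caidx /some/other/directory
```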
&lt;p&gt;Further plans are listed tersely in the
&lt;a href="https://github.com/systemd/casync/blob/master/TODO"&gt;TODO&lt;/a&gt; file.&lt;/p&gt;
&lt;h1&gt;FAQ:&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Is this a systemd project?&lt;/em&gt;&lt;/strong&gt; — &lt;code&gt;casync&lt;/code&gt; is hosted under the
github &lt;a href="https://github.com/systemd/systemd"&gt;systemd&lt;/a&gt; umbrella, and the
projects share the same coding style. However, the code-bases are
distinct and without interdependencies, and &lt;code&gt;casync&lt;/code&gt; works fine both
on systemd systems and systems without it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Is &lt;code&gt;casync&lt;/code&gt; portable?&lt;/em&gt;&lt;/strong&gt; — At the moment: no. I only run Linux and
that's what I code for. That said, I am open to accepting portability
patches (unlike for systemd, which doesn't really make sense on
non-Linux systems), as long as they don't interfere too much with the
way &lt;code&gt;casync&lt;/code&gt; works. Specifically this means that I am not too
enthusiastic about merging portability patches for OSes lacking the
&lt;a href="http://man7.org/linux/man-pages/man2/open.2.html"&gt;openat(2)&lt;/a&gt; family
of APIs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Does &lt;code&gt;casync&lt;/code&gt; require reflink-capable file systems to work, such
as &lt;code&gt;btrfs&lt;/code&gt;?&lt;/em&gt;&lt;/strong&gt; — No it doesn't. The reflink magic in &lt;code&gt;casync&lt;/code&gt; is
employed when the file system permits it, and it's good to have it,
but it's not a requirement, and &lt;code&gt;casync&lt;/code&gt; will implicitly fall back to
copying when it isn't available. Note that &lt;code&gt;casync&lt;/code&gt; supports a number
of file system features on a variety of file systems that aren't
available everywhere, for example FAT's system/hidden file flags or
&lt;code&gt;xfs&lt;/code&gt;'s &lt;code&gt;projinherit&lt;/code&gt; file flag.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Is &lt;code&gt;casync&lt;/code&gt; stable?&lt;/em&gt;&lt;/strong&gt; — I just tagged the first, initial
release. While I have been working on it for quite some time and it
is quite featureful, this is the first time I advertise it publicly,
and it hence received very little testing outside of its own test
suite. I am also not fully ready to commit to the stability of the
current serialization or chunk index format. I don't see any breakages
coming for it though. &lt;code&gt;casync&lt;/code&gt; is pretty light on documentation right
now, and does not even have a man page. I also intend to correct that
soon.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Are the &lt;code&gt;.caidx&lt;/code&gt;/&lt;code&gt;.caibx&lt;/code&gt; and &lt;code&gt;.catar&lt;/code&gt; file formats open and
documented?&lt;/em&gt;&lt;/strong&gt; — &lt;code&gt;casync&lt;/code&gt; is Open Source, so if you want to know the
precise format, have a look at the sources for now. It's definitely my
intention to add comprehensive docs for both formats however. Don't
forget this is just the initial version right now.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;&lt;code&gt;casync&lt;/code&gt; is just like &lt;code&gt;$SOMEOTHERTOOL&lt;/code&gt;! Why are you reinventing
the wheel (again)?&lt;/em&gt;&lt;/strong&gt; — Well, because &lt;code&gt;casync&lt;/code&gt; &lt;em&gt;isn't&lt;/em&gt; "just like" some
other tool. I am pretty sure I did my homework, and that there is no
tool just like &lt;code&gt;casync&lt;/code&gt; right now. The tools coming closest are probably
&lt;code&gt;rsync&lt;/code&gt;, &lt;code&gt;zsync&lt;/code&gt;, &lt;code&gt;tarsnap&lt;/code&gt;, &lt;code&gt;restic&lt;/code&gt;, but they are quite different beasts
each.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Why did you invent your own serialization format for file trees?
Why don't you just use &lt;code&gt;tar&lt;/code&gt;?&lt;/em&gt;&lt;/strong&gt; — That's a good question, and other
systems — most prominently &lt;code&gt;tarsnap&lt;/code&gt; — do that. However, as mentioned
above &lt;code&gt;tar&lt;/code&gt; doesn't enforce reproducibility. It also doesn't really do
random access: if you want to access some specific file you need to
read every single byte stored before it in the &lt;code&gt;tar&lt;/code&gt; archive to find
it, which is of course very expensive. The serialization &lt;code&gt;casync&lt;/code&gt;
implements places a focus on reproducibility, random access, and
meta-data control. Much like traditional &lt;code&gt;tar&lt;/code&gt; it can still be
generated and extracted in a stream fashion though.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Does &lt;code&gt;casync&lt;/code&gt; save/restore SELinux/SMACK file labels?&lt;/em&gt;&lt;/strong&gt; — At the
moment not. That's not because I wouldn't want it to, but simply
because I am not a guru of either of these systems, and didn't want to
implement something I do not fully grok nor can test. If you look at
the sources you'll find that there are already some definitions in place
that keep room for them though. I'd be delighted to accept a patch
implementing this fully.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;What about delivering &lt;code&gt;squashfs&lt;/code&gt; images? How well does chunking
work on compressed serializations?&lt;/em&gt;&lt;/strong&gt; – That's a very good point!
Usually, if you apply a chunking algorithm to a compressed data
stream (let's say a &lt;code&gt;tar.gz&lt;/code&gt; file), then changing a single bit at the
front will propagate into the entire remainder of the file, so that
minimal changes will explode into major changes. Thankfully this
doesn't apply that strictly to &lt;code&gt;squashfs&lt;/code&gt; images, as it provides
random access to files and directories and thus breaks up the
compression streams in regular intervals to make seeking easy. This
fact is beneficial for systems employing chunking, such as &lt;code&gt;casync&lt;/code&gt; as
this means single bit changes might affect their vicinity but will not
explode in an unbounded fashion. In order to achieve the best results when
delivering &lt;code&gt;squashfs&lt;/code&gt; images through &lt;code&gt;casync&lt;/code&gt; the block sizes of
&lt;code&gt;squashfs&lt;/code&gt; and the chunk sizes of &lt;code&gt;casync&lt;/code&gt; should be matched up
(using &lt;code&gt;casync&lt;/code&gt;'s &lt;code&gt;--chunk-size=&lt;/code&gt; option). How precisely to choose
both values is left as a research subject for the user, for now.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;What does the name &lt;code&gt;casync&lt;/code&gt; mean?&lt;/em&gt;&lt;/strong&gt; – It's a synchronizing
tool, hence the &lt;code&gt;-sync&lt;/code&gt; suffix, following &lt;code&gt;rsync&lt;/code&gt;'s naming. It makes
use of the content-addressable concept of &lt;code&gt;git&lt;/code&gt; hence the &lt;code&gt;ca-&lt;/code&gt;
prefix.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Where can I get this stuff? Is it already packaged?&lt;/em&gt;&lt;/strong&gt; – Check
out the sources on &lt;a href="https://github.com/systemd/casync/"&gt;GitHub&lt;/a&gt;. I
just tagged the &lt;a href="https://github.com/systemd/casync/releases/tag/v1"&gt;first
version&lt;/a&gt;. Martin
Pitt has &lt;a href="https://plus.google.com/+MartinPitti/posts/8YMp3xNh1q7"&gt;packaged &lt;code&gt;casync&lt;/code&gt; for
Ubuntu&lt;/a&gt;. There
is also an &lt;a href="https://aur.archlinux.org/packages/casync-git/"&gt;ArchLinux
package&lt;/a&gt;. Zbigniew
Jędrzejewski-Szmek has prepared a &lt;a href="https://apps.fedoraproject.org/packages/casync"&gt;Fedora
RPM&lt;/a&gt; that hopefully
will soon be included in the distribution.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Should you care? Is this a tool for you?&lt;/h1&gt;
&lt;p&gt;Well, that's up to you really. If you are involved with projects that
need to deliver IoT, VM, container, application or OS images, then
maybe this is a great tool for you — but other options exist, some of
which are linked above.&lt;/p&gt;
&lt;p&gt;Note that &lt;code&gt;casync&lt;/code&gt; is an Open Source project: if it doesn't do exactly
what you need, prepare a patch that adds what you need, and we'll
consider it.&lt;/p&gt;
&lt;p&gt;If you are interested in the project and would like to talk about this
in person, I'll be presenting &lt;code&gt;casync&lt;/code&gt; soon at &lt;a href="https://www.meetup.com/linux-technologies-berlin/events/240909087/"&gt;Kinvolk's Linux
Technologies
Meetup&lt;/a&gt;
in Berlin, Germany. You are invited. I also intend to talk about it at
&lt;a href="https://all-systems-go.io/"&gt;All Systems Go!&lt;/a&gt;, also in Berlin.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 20 Jun 2017 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2017-06-20:/blog/casync-a-tool-for-distributing-file-system-images.html</guid><category>projects</category></item><item><title>Avoiding CVE-2016-8655 with systemd</title><link>https://0pointer.net/blog/avoiding-cve-2016-8655-with-systemd.html</link><description>&lt;h1&gt;Avoiding CVE-2016-8655 with systemd&lt;/h1&gt;
&lt;p&gt;Just a quick note: on recent versions of
&lt;a href="https://www.freedesktop.org/wiki/Software/systemd/"&gt;systemd&lt;/a&gt; it is
relatively easy to block the vulnerability described in
&lt;a href="http://seclists.org/oss-sec/2016/q4/607"&gt;CVE-2016-8655&lt;/a&gt; for
individual services.&lt;/p&gt;
&lt;p&gt;Since systemd release v211 there's an option
&lt;a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RestrictAddressFamilies="&gt;&lt;code&gt;RestrictAddressFamilies=&lt;/code&gt;&lt;/a&gt;
for service unit files which takes away the right to create sockets of
specific address families for processes of the service. In your unit
file, add &lt;code&gt;RestrictAddressFamilies=~AF_PACKET&lt;/code&gt; to the &lt;code&gt;[Service]&lt;/code&gt;
section to make &lt;code&gt;AF_PACKET&lt;/code&gt; unavailable to it (i.e. a blacklist),
which is sufficient to close the attack path. Safer of course is a
whitelist of address families which you can define by dropping the &lt;code&gt;~&lt;/code&gt;
character from the assignment. Here's a trivial example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;…&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;ExecStart&lt;/span&gt;&lt;span class="o"&gt;=/&lt;/span&gt;&lt;span class="n"&gt;usr&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;bin&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;mydaemon&lt;/span&gt;
&lt;span class="n"&gt;RestrictAddressFamilies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AF_INET&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AF_INET6&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AF_UNIX&lt;/span&gt;
&lt;span class="err"&gt;…&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This restricts access to socket families, so that the service may
access only &lt;code&gt;AF_INET&lt;/code&gt;, &lt;code&gt;AF_INET6&lt;/code&gt; or &lt;code&gt;AF_UNIX&lt;/code&gt; sockets, which is
usually the right, minimal set for most system daemons. (&lt;code&gt;AF_INET&lt;/code&gt; is
the low-level name for the IPv4 address family, &lt;code&gt;AF_INET6&lt;/code&gt; for the
IPv6 address family, and &lt;code&gt;AF_UNIX&lt;/code&gt; for local UNIX socket IPC).&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/systemd/systemd/blob/8e458bfe4e2aa36c939db62561b2a59206d78577/NEWS#L45"&gt;Starting with systemd v232&lt;/a&gt; we added &lt;code&gt;RestrictAddressFamilies=&lt;/code&gt; to all
of systemd's own unit files, always with the minimal set of socket
address families appropriate.&lt;/p&gt;
&lt;p&gt;With the upcoming v233 release we'll provide a second method for
blocking this vulnerability. Using
&lt;a href="https://github.com/systemd/systemd/pull/4536"&gt;&lt;code&gt;RestrictNamespaces=&lt;/code&gt;&lt;/a&gt;
it is possible to limit which types of Linux namespaces a service may
get access to. Use &lt;code&gt;RestrictNamespaces=yes&lt;/code&gt; to prohibit access to any
kind of namespace, or set &lt;code&gt;RestrictNamespaces=net ipc&lt;/code&gt; (or similar) to
restrict access to a specific set (in this case: network and IPC
namespaces). Given that user namespaces have been a major source of
security vulnerabilities in the past months it's probably a good idea
to block namespaces on all services which don't need them (which is
probably most of them).&lt;/p&gt;
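&lt;p&gt;For a service whose unit file you do not want to edit directly, both settings can go into a drop-in file. Here is a minimal shell sketch that writes one; &lt;code&gt;write_hardening_dropin&lt;/code&gt;, the service name &lt;code&gt;mydaemon.service&lt;/code&gt; and the file name &lt;code&gt;10-hardening.conf&lt;/code&gt; are placeholders chosen for illustration, and &lt;code&gt;RestrictNamespaces=&lt;/code&gt; only takes effect on systemd versions that already support it:&lt;/p&gt;

```shell
# write_hardening_dropin is a hypothetical helper; "mydaemon.service" and the
# file name "10-hardening.conf" are placeholders, not fixed names.
write_hardening_dropin() {
    dir=$1    # e.g. /etc/systemd/system/mydaemon.service.d
    mkdir -p "$dir" || return 1
    # Write the drop-in and echo what was written to stdout.
    printf '%s\n' \
        '[Service]' \
        'RestrictAddressFamilies=~AF_PACKET' \
        'RestrictNamespaces=yes' \
        | tee "$dir/10-hardening.conf"
}

# Example use (as root), followed by a reload and a restart of the service:
#   write_hardening_dropin /etc/systemd/system/mydaemon.service.d
#   systemctl daemon-reload
#   systemctl restart mydaemon.service
```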
&lt;p&gt;Of course, ideally, distributions such as Fedora, as well as upstream
developers would turn on the various sandboxing settings systemd
provides like these ones by default, since they know best which kind
of address families or namespaces a specific daemon needs.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 07 Dec 2016 00:00:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2016-12-07:/blog/avoiding-cve-2016-8655-with-systemd.html</guid><category>projects</category></item><item><title>systemd.conf 2016 Over Now</title><link>https://0pointer.net/blog/systemdconf-2016-over-now.html</link><description>&lt;h1&gt;systemd.conf 2016 is Over Now!&lt;/h1&gt;
&lt;p&gt;A few days ago &lt;a href="https://systemd.io/"&gt;systemd.conf 2016&lt;/a&gt; ended, our
second conference of this kind. I personally enjoyed this conference a
lot: the talks, the atmosphere, the audience, the organization, the
location, they all were excellent!&lt;/p&gt;
&lt;p&gt;I'd like to take the opportunity to thank everybody involved. In
particular I'd like to thank &lt;em&gt;Chris&lt;/em&gt;, &lt;em&gt;Daniel&lt;/em&gt;, &lt;em&gt;Sandra&lt;/em&gt; and &lt;em&gt;Henrike&lt;/em&gt;
for organizing the conference, your work was stellar!&lt;/p&gt;
&lt;p&gt;I'd also like to thank our sponsors, without which the conference
couldn't take place like this, of course. In particular I'd like to
thank our gold sponsor, &lt;strong&gt;Red Hat&lt;/strong&gt;, our organizing sponsor &lt;strong&gt;Kinvolk&lt;/strong&gt;, as
well as our silver sponsors &lt;strong&gt;CoreOS&lt;/strong&gt; and &lt;strong&gt;Facebook&lt;/strong&gt;. I'd also like to
thank our bronze sponsors &lt;strong&gt;Collabora&lt;/strong&gt;, &lt;strong&gt;OpenSUSE&lt;/strong&gt;, &lt;strong&gt;Pantheon&lt;/strong&gt;, &lt;strong&gt;Pengutronix&lt;/strong&gt;,
our supporting sponsor &lt;strong&gt;Codethink&lt;/strong&gt; and last but not least our media
sponsor &lt;strong&gt;Linux Magazin&lt;/strong&gt;. Thank you all!&lt;/p&gt;
&lt;p&gt;&lt;img src="https://conf.systemd.io/media/imgs/sponsors/redhat.png" width="300" height="97"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://conf.systemd.io/media/imgs/sponsors/kinvolk_logo.png" width="300" height="187"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://conf.systemd.io/media/imgs/sponsors/coreos.png" width="300" height="116"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://conf.systemd.io/media/imgs/sponsors/facebook-logo.png" width="300" height="113"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://conf.systemd.io/media/imgs/sponsors/collabora.png" width="300" height="169"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://conf.systemd.io/media/imgs/sponsors/opensuse-logo.png" width="300" height="190"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://conf.systemd.io/media/imgs/sponsors/pantheon.png" width="300" height="106"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://conf.systemd.io/media/imgs/sponsors/pengutronix.png" width="300" height="84"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://conf.systemd.io/media/imgs/sponsors/codethink-logo.png" width="300" height="88"&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="https://conf.systemd.io/media/imgs/sponsors/linux-magazin.png" width="300" height="126"&gt;&lt;/p&gt;
&lt;p&gt;I'd also like to thank the &lt;a href="https://c3voc.de/"&gt;Video Operation Center
("VOC")&lt;/a&gt; for their amazing work on live-streaming
the conference and making all talks available on YouTube. It's amazing
how efficient the VOC is, it's simply stunning! Thank you guys!&lt;/p&gt;
&lt;p&gt;In case you missed this year's iteration of the conference, please
have a look at our &lt;strong&gt;&lt;a href="https://www.youtube.com/channel/UCvq_RgZp3kljp9X8Io9Z1DA"&gt;YouTube
Channel&lt;/a&gt;&lt;/strong&gt;. You'll
find all of this year's talks there, as well as the ones from last
year. (For example, my welcome talk is available
&lt;a href="https://www.youtube.com/watch?v=DUUbFGNZ1vI"&gt;here&lt;/a&gt;). Enjoy!&lt;/p&gt;
&lt;p&gt;We hope to see you again next year, for systemd.conf 2017 in Berlin!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 05 Oct 2016 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2016-10-05:/blog/systemdconf-2016-over-now.html</guid><category>projects</category></item><item><title>systemd.conf 2016 Workshop Tickets Available</title><link>https://0pointer.net/blog/systemdconf-2016-workshop-tickets-available.html</link><description>&lt;h1&gt;Tickets for systemd 2016 Workshop day still available!&lt;/h1&gt;
&lt;p&gt;We still have a number of tickets for the workshop day of &lt;a href="https://conf.systemd.io/"&gt;systemd.conf
2016&lt;/a&gt; available. If you are a newcomer to
systemd, and would like to learn about various systemd facilities, or
if you already know your way around, but would like to know more: this
is the best chance to do so. The workshop day is the 28th of
September, one day before the main conference, at the betahaus in
Berlin, Germany. The schedule for the day is available
&lt;a href="https://cfp.systemd.io/en/systemdconf_2016/public/schedule/0"&gt;here&lt;/a&gt;. There
are five interesting, extensive sessions, run by the systemd hackers
themselves. Who better to learn systemd from, than the folks who wrote
it?&lt;/p&gt;
&lt;p&gt;Note that the workshop day and the main conference days require
different tickets. (Also note: there are still a few tickets available for
the main conference!).&lt;/p&gt;
&lt;p&gt;&lt;a href="https://ti.to/systemdconf/systemdconf-2016"&gt;Buy a ticket here.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;See you in Berlin!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Sun, 18 Sep 2016 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2016-09-18:/blog/systemdconf-2016-workshop-tickets-available.html</guid><category>projects</category></item><item><title>Preliminary systemd.conf 2016 Schedule</title><link>https://0pointer.net/blog/preliminary-systemdconf-2016-now-available.html</link><description>&lt;h1&gt;A Preliminary systemd.conf 2016 Schedule is Now Available!&lt;/h1&gt;
&lt;p&gt;We have just published a first, preliminary version of the
&lt;a href="https://cfp.systemd.io/en/systemdconf_2016/public/schedule/1"&gt;systemd.conf 2016
schedule&lt;/a&gt;. There
is still a small number of empty slots in the schedule, because we're
awaiting confirmation from a few presenters. The missing
talks will be added as soon as they are confirmed.&lt;/p&gt;
&lt;p&gt;The schedule consists of 5 workshops by high-profile speakers during
the workshop day, 22 exciting talks during the main conference days,
followed by one full day of hackfests.&lt;/p&gt;
&lt;p&gt;Please sign up for the conference soon! Only a limited number of
tickets are available, hence make sure to secure yours quickly before
they run out! (Last year we sold out.) &lt;a href="https://ti.to/systemdconf/systemdconf-2016"&gt;Please sign up here for the
conference!&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 16 Aug 2016 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2016-08-16:/blog/preliminary-systemdconf-2016-now-available.html</guid><category>projects</category></item><item><title>FINAL REMINDER! systemd.conf 2016 CfP Ends on Monday!</title><link>https://0pointer.net/blog/final-reminder-systemdconf-2016-cfp-ends-on-monday.html</link><description>&lt;p&gt;Please note that the &lt;a href="https://conf.systemd.io/"&gt;systemd.conf 2016&lt;/a&gt;
Call for Participation ends on Monday, on &lt;strong&gt;Aug. 1st&lt;/strong&gt;!  Please send
in your talk proposal by then! We’ve already got a good number of
excellent submissions, but we are very interested in yours, too!&lt;/p&gt;
&lt;p&gt;&lt;a href="http://systemd.io/"&gt;&lt;img src="http://0pointer.de/public/systemdconf2016.png" width="750" height="349" border="0"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We are looking for talks on all facets of systemd: deployment,
maintenance, administration, development. Regardless of whether you
use it in the cloud, on embedded, on IoT, on the desktop, on mobile,
in a container or on the server: we are interested in your
submissions!&lt;/p&gt;
&lt;p&gt;In addition to proposals for talks for the main conference, we are
looking for proposals for &lt;strong&gt;workshop sessions&lt;/strong&gt; held during our
Workshop Day (the first day of the conference). The workshop format
consists of a day of 2-3h training sessions, that may cover any
systemd-related topic you'd like. We are interested both in
submissions from the developer community and in submissions from
organizations making use of systemd! Introductory workshop sessions
are particularly welcome, as the Workshop Day is intended to open up
our conference to newcomers and people who aren't systemd gurus yet,
but would like to become more fluent.&lt;/p&gt;
&lt;p&gt;For further details on the submissions we are looking for and the CfP
process, please consult the &lt;a href="https://cfp.systemd.io/en/systemdconf_2016/cfp/session/new"&gt;CfP
page&lt;/a&gt; and
submit your proposal using the provided form!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ALSO:&lt;/strong&gt; Please sign up for the conference soon! Only a
&lt;strong&gt;limited&lt;/strong&gt; number of tickets are available, hence make sure to secure
yours quickly before they run out! (Last year we sold out.) &lt;a href="https://ti.to/systemdconf/systemdconf-2016"&gt;Please
sign up here for the
conference!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AND OF COURSE:&lt;/strong&gt; We are also looking for more sponsors for
systemd.conf!  If you are working on systemd-related projects, or make
use of it in your company, &lt;a href="https://conf.systemd.io/files/systemdconf2016SponsorshipProspectus.pdf"&gt;please consider &lt;strong&gt;becoming a sponsor&lt;/strong&gt; of
systemd.conf
2016&lt;/a&gt;!
Without our sponsors we couldn't organize systemd.conf 2016!&lt;/p&gt;
&lt;p&gt;Thank you very much, and see you in Berlin!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 28 Jul 2016 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2016-07-28:/blog/final-reminder-systemdconf-2016-cfp-ends-on-monday.html</guid><category>projects</category></item><item><title>REMINDER! systemd.conf 2016 CfP Ends in Two Weeks!</title><link>https://0pointer.net/blog/reminder-systemdconf-2016-cfp-ends-in-two-weeks.html</link><description>&lt;p&gt;Please note that the &lt;a href="https://conf.systemd.io/"&gt;systemd.conf 2016&lt;/a&gt;
Call for Participation ends in less than two weeks, on &lt;strong&gt;Aug. 1st&lt;/strong&gt;!
Please send in your talk proposal by then! We’ve already got a good
number of excellent submissions, but we are interested in yours even
more!&lt;/p&gt;
&lt;p&gt;We are looking for talks on all facets of systemd: deployment,
maintenance, administration, development. Regardless of whether you
use it in the cloud, on embedded, on IoT, on the desktop, on mobile,
in a container or on the server: we are interested in your
submissions!&lt;/p&gt;
&lt;p&gt;In addition to proposals for talks for the main conference, we are
looking for proposals for &lt;strong&gt;workshop sessions&lt;/strong&gt; held during our
Workshop Day (the first day of the conference). The workshop format
consists of a day of 2-3h training sessions, that may cover any
systemd-related topic you'd like. We are interested both in
submissions from the developer community and in submissions from
organizations making use of systemd! Introductory workshop sessions
are particularly welcome, as the Workshop Day is intended to open up
our conference to newcomers and people who aren't systemd gurus yet,
but would like to become more fluent.&lt;/p&gt;
&lt;p&gt;For further details on the submissions we are looking for and the CfP
process, please consult the &lt;a href="https://cfp.systemd.io/en/systemdconf_2016/cfp/session/new"&gt;CfP
page&lt;/a&gt; and
submit your proposal using the provided form!&lt;/p&gt;
&lt;p&gt;And keep in mind:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;REMINDER:&lt;/strong&gt; Please sign up for the conference soon! Only a
&lt;strong&gt;limited&lt;/strong&gt; number of tickets are available, hence make sure to secure
yours quickly before they run out! (Last year we sold out.) &lt;a href="https://ti.to/systemdconf/systemdconf-2016"&gt;Please
sign up here for the
conference!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AND OF COURSE:&lt;/strong&gt; We are also looking for more sponsors for
systemd.conf!  If you are working on systemd-related projects, or make
use of it in your company, &lt;a href="https://conf.systemd.io/files/systemdconf2016SponsorshipProspectus.pdf"&gt;please consider &lt;strong&gt;becoming a sponsor&lt;/strong&gt; of
systemd.conf
2016&lt;/a&gt;!
Without our sponsors we couldn't organize systemd.conf 2016!&lt;/p&gt;
&lt;p&gt;Thank you very much, and see you in Berlin!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 19 Jul 2016 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2016-07-19:/blog/reminder-systemdconf-2016-cfp-ends-in-two-weeks.html</guid><category>projects</category></item><item><title>CfP is now open</title><link>https://0pointer.net/blog/cfp-is-now-open.html</link><description>&lt;h1&gt;The systemd.conf 2016 Call for Participation is Now Open!&lt;/h1&gt;
&lt;p&gt;We’d like to invite presentation and workshop proposals for &lt;a href="https://systemd.io/"&gt;systemd.conf 2016&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;The conference will consist of three parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One day of &lt;b&gt;workshops&lt;/b&gt;, consisting of in-depth (2-3hr) training and learning-by-doing sessions (Sept. 28th)&lt;/li&gt;
&lt;li&gt;Two days of regular &lt;b&gt;talks&lt;/b&gt; (Sept. 29th-30th)&lt;/li&gt;
&lt;li&gt;One day of &lt;b&gt;hackfest&lt;/b&gt; (Oct. 1st)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are now accepting submissions for the first three days: proposals
for workshops, training sessions and regular talks. In particular, we
are looking for sessions including, but not limited to, the following
topics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use Cases: systemd in today’s and tomorrow’s &lt;b&gt;devices&lt;/b&gt; and &lt;b&gt;applications&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;systemd and &lt;b&gt;containers&lt;/b&gt;, in the cloud and on &lt;b&gt;servers&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;systemd in &lt;b&gt;distributions&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;systemd in &lt;b&gt;embedded&lt;/b&gt; systems and &lt;b&gt;IoT&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;systemd on the &lt;b&gt;desktop&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Networking&lt;/b&gt; with systemd&lt;/li&gt;
&lt;li&gt;… and everything else related to &lt;a href="https://www.freedesktop.org/wiki/Software/systemd/"&gt;systemd&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Please submit your proposals by &lt;strong&gt;August 1st, 2016&lt;/strong&gt;. Notification of acceptance will be sent out 1-2 weeks later.&lt;/p&gt;
&lt;p&gt;If submitting a workshop proposal please contact &lt;a href="mailto:info@systemd.io"&gt;the organizers&lt;/a&gt; for more details.&lt;/p&gt;
&lt;p&gt;To submit a talk, please visit &lt;a href="https://cfp.systemd.io/en/systemdconf_2016/cfp/session/new"&gt;our CfP submission page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For further information on systemd.conf 2016, please visit &lt;a href="https://systemd.io/"&gt;our conference web site&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 12 May 2016 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2016-05-12:/blog/cfp-is-now-open.html</guid><category>projects</category></item><item><title>Announcing systemd.conf 2016</title><link>https://0pointer.net/blog/announcing-systemdconf-2016.html</link><description>&lt;h1&gt;Announcing systemd.conf 2016&lt;/h1&gt;
&lt;p&gt;We are happy to announce the 2016 installment of systemd.conf, the conference of the systemd project!&lt;/p&gt;
&lt;p&gt;After our successful first conference in 2015 we’d like to repeat the event in 2016. The conference will take place from &lt;strong&gt;September 28th&lt;/strong&gt; to &lt;strong&gt;October 1st&lt;/strong&gt;, 2016 at &lt;strong&gt;betahaus&lt;/strong&gt; in &lt;strong&gt;Berlin, Germany&lt;/strong&gt;. The event is a few days before LinuxCon Europe, which is also located in Berlin this year. This year, the conference will consist of two days of presentations, a one-day hackfest and one day of hands-on training sessions.&lt;/p&gt;
&lt;p&gt;The website is online now, please visit &lt;a href="https://conf.systemd.io"&gt;https://conf.systemd.io/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Tickets at early-bird prices are available already. Purchase them at &lt;a href="https://ti.to/systemdconf/systemdconf-2016"&gt;https://ti.to/systemdconf/systemdconf-2016&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The Call for Presentations will open soon, and we are looking forward to your submissions! A separate announcement will be published as soon as the CfP is open.&lt;/p&gt;
&lt;p&gt;systemd.conf 2016 is organized jointly by the &lt;strong&gt;systemd community&lt;/strong&gt; and &lt;strong&gt;kinvolk.io&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;We are looking for sponsors! We’ve got early commitments from some of last year’s sponsors: &lt;strong&gt;Collabora&lt;/strong&gt;, &lt;strong&gt;Pengutronix&lt;/strong&gt; &amp;amp; &lt;strong&gt;Red Hat&lt;/strong&gt;.  Please see the web site for details about how your company may become a sponsor, too.&lt;/p&gt;
&lt;p&gt;If you have any questions, please contact us at &lt;a href="mailto:info@systemd.io"&gt;info@systemd.io&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 04 Apr 2016 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2016-04-04:/blog/announcing-systemdconf-2016.html</guid><category>projects</category></item><item><title>Introducing sd-event</title><link>https://0pointer.net/blog/introducing-sd-event.html</link><description>&lt;h1&gt;The Event Loop API of libsystemd&lt;/h1&gt;
&lt;p&gt;When we began working on
&lt;a href="https://wiki.freedesktop.org/www/Software/systemd/"&gt;systemd&lt;/a&gt; we built
it around a hand-written ad-hoc event loop, wrapping &lt;a href="http://man7.org/linux/man-pages/man7/epoll.7.html"&gt;Linux
epoll&lt;/a&gt;. The more
our project grew the more we realized the limitations of using raw
epoll:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;As we used
  &lt;a href="http://man7.org/linux/man-pages/man2/timerfd_create.2.html"&gt;timerfd&lt;/a&gt;
  for our timer events, each event source cost one file descriptor and
  we had many of them! File descriptors are a scarce resource on UNIX,
  as
  &lt;a href="http://man7.org/linux/man-pages/man2/setrlimit.2.html"&gt;RLIMIT_NOFILE&lt;/a&gt;
  is typically set to 1024 or similar, limiting the number of
  available file descriptors per process to 1021, which isn't
  particularly a lot.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ordering of event dispatching became a nightmare. In many cases, we
  wanted to make sure that a certain kind of event would always be
  dispatched before another kind of event, if both happen at the same
  time. For example, when the last process of a service dies, we might
  be notified about that via a SIGCHLD signal, via an
  &lt;a href="http://www.freedesktop.org/software/systemd/man/sd_notify.html"&gt;sd_notify() "STATUS="&lt;/a&gt;
  message, and via a control group notification. We wanted to get
  these events in the right order, to know when it's safe to process
  and subsequently release the runtime data systemd keeps about the
  service or process: it shouldn't be done if there are still events
  about it pending.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;For each program we added to the systemd project we noticed we were
  adding similar code, over and over again, to work with epoll's
  complex interfaces. For example, finding the right file descriptor
  and callback function to dispatch an epoll event to, without running
  into invalidated pointer issues is outright difficult and requires
  non-trivial code.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Integrating child process watching into our event loops was much
  more complex than one could hope, and even more so if child process
  events should be ordered against each other and unrelated kinds of
  events.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Eventually, we started working on
&lt;a href="the-new-sd-bus-api-of-systemd.html"&gt;sd-bus&lt;/a&gt;. At
the same time we decided to seize the opportunity, put together a
proper event loop API in C, and then not only port sd-bus on top of
it, but also the rest of systemd. The result of this is
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd-event.html"&gt;sd-event&lt;/a&gt;. After
almost two years of development we declared sd-event stable in systemd
version 221, and published it as an official API of libsystemd.&lt;/p&gt;
&lt;h2&gt;Why?&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/systemd/systemd/blob/master/src/systemd/sd-event.h"&gt;sd-event.h&lt;/a&gt;,
of course, is not the first event loop API around, and it doesn't
implement any really novel concepts. When we started working on it we
tried to do our homework, and checked the various existing event loop
APIs, maybe looking for candidates to adopt instead of doing our own,
and to learn about the strengths and weaknesses of the various
existing implementations. Ultimately, we found no implementation that
could deliver what we needed, or where it would be easy to add the
missing bits: as usual in the systemd project, we wanted something
that allows us access to all the Linux-specific bits, instead of
limiting itself to the least common denominator of UNIX. We weren't
looking for an abstraction API, but simply one that makes epoll usable
in system code.&lt;/p&gt;
&lt;p&gt;With this blog story I'd like to take the opportunity to introduce you
to sd-event, and explain why it might be a good candidate to adopt as
event loop implementation in your project, too.&lt;/p&gt;
&lt;p&gt;So, here are some features it provides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;I/O event sources, based on epoll's file descriptor watching,
  including edge triggered events (EPOLLET). See
  &lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_io.html"&gt;sd_event_add_io(3)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Timer event sources, based on &lt;code&gt;timerfd_create()&lt;/code&gt;, supporting the
  &lt;code&gt;CLOCK_MONOTONIC&lt;/code&gt;, &lt;code&gt;CLOCK_REALTIME&lt;/code&gt;, &lt;code&gt;CLOCK_BOOTTIME&lt;/code&gt; clocks, as well
  as the &lt;code&gt;CLOCK_REALTIME_ALARM&lt;/code&gt; and &lt;code&gt;CLOCK_BOOTTIME_ALARM&lt;/code&gt; clocks that
  can resume the system from suspend. When creating timer events a
  required accuracy parameter may be specified which allows coalescing
  of timer events to minimize power consumption. For each clock only a
  single timer file descriptor is kept, and all timer events are
  multiplexed with a priority queue. See
  &lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_time.html"&gt;sd_event_add_time(3)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;UNIX process signal events, based on
  &lt;a href="http://man7.org/linux/man-pages/man2/signalfd.2.html"&gt;signalfd(2)&lt;/a&gt;,
  including full support for real-time signals, and queued
  parameters. See &lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_signal.html"&gt;sd_event_add_signal(3)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Child process state change events, based on
  &lt;a href="http://man7.org/linux/man-pages/man2/waitid.2.html"&gt;waitid(2)&lt;/a&gt;. See
  &lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_child.html"&gt;sd_event_add_child(3)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Static event sources, of three types: defer, post and exit, for
  invoking calls in each event loop iteration, after other event sources or at
  event loop termination. See
  &lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_defer.html"&gt;sd_event_add_defer(3)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Event sources may be assigned a 64-bit priority value that controls
  the order in which event sources are dispatched if multiple are
  pending simultaneously. See
  &lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_source_set_priority.html"&gt;sd_event_source_set_priority(3)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The event loop may automatically send watchdog notification messages
  to the service manager. See &lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_set_watchdog.html"&gt;sd_event_set_watchdog(3)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The event loop may be integrated into foreign event loops, such as
  the GLib one. The event loop API is hence composable, the same way
  the underlying epoll logic is. See
  &lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_get_fd.html"&gt;sd_event_get_fd(3)&lt;/a&gt;
  for an example.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The API is fully OOM safe.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A complete set of documentation in UNIX man page format is
  available, with
  &lt;a href="http://www.freedesktop.org/software/systemd/man/sd-event.html"&gt;sd-event(3)&lt;/a&gt;
  as the entry page.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It's pretty widely available, and requires no extra
  dependencies. Since systemd is built on it, most major distributions
  ship the library in their default install set.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;After two years of development, and after being used in all of
  systemd's components, it has received a fair share of testing already,
  even though we only recently decided to declare it stable and turned
  it into a public API.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note that sd-event has some potential drawbacks too:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If portability is essential to you, sd-event is not your best
  option. sd-event is a wrapper around Linux-specific APIs, and that's
  visible in the API. For example: our event callbacks receive
  structures defined by Linux-specific APIs such as signalfd.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It's a low-level C API, and it doesn't isolate you from the OS
  underpinnings. While I like to think that it is relatively nice and
  easy to use from C, it doesn't compromise on exposing the low-level
  functionality. It just fills the gaps in what's missing between
  epoll, timerfd, signalfd and related concepts, and it does not hide
  that away.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Either way, I believe that sd-event is a great choice when looking for
an event loop API, in particular if you work on system-level software
and embedded systems, where functionality like timer coalescing or
watchdog support matters.&lt;/p&gt;
&lt;h2&gt;Getting Started&lt;/h2&gt;
&lt;p&gt;Here's a short example of how to use sd-event in a simple daemon. In this
example, we'll not just use &lt;a href="https://github.com/systemd/systemd/blob/master/src/systemd/sd-event.h"&gt;sd-event.h&lt;/a&gt;, but also &lt;a href="https://github.com/systemd/systemd/blob/master/src/systemd/sd-daemon.h"&gt;sd-daemon.h&lt;/a&gt; to
implement a system service.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;alloca.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;endian.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;errno.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;netinet/in.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;signal.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdbool.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;sys/ioctl.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;sys/socket.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;unistd.h&amp;gt;&lt;/span&gt;

&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;systemd/sd-daemon.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;systemd/sd-event.h&amp;gt;&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;io_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sd_event_source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;es&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;revents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;userdata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;ssize_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sz&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* UDP enforces a somewhat reasonable maximum datagram size of 64K, we can just allocate the buffer on the stack */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ioctl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FIONREAD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sz&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;errno&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;alloca&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sz&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;recv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sz&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;errno&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EAGAIN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;errno&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;memcmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;EXIT&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="cm"&gt;/* Request a clean exit */&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;sd_event_exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sd_event_source_get_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;es&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;fwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;fflush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;union&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;sockaddr_in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;sockaddr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_event_source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;event_source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_event&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;sigset_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ss&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_event_default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sigemptyset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ss&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;sigaddset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIGTERM&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;||&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="n"&gt;sigaddset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIGINT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;errno&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Block SIGTERM and SIGINT first, so that the event loop can handle them */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sigprocmask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SIG_BLOCK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;errno&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Let&amp;#39;s make use of the default handler and &amp;quot;floating&amp;quot; reference features of sd_event_add_signal() */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_event_add_signal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIGTERM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_event_add_signal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SIGINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Enable automatic service watchdog support */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_event_set_watchdog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AF_INET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SOCK_DGRAM&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;SOCK_CLOEXEC&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;SOCK_NONBLOCK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;errno&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;sockaddr_in&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sin_family&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AF_INET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sin_port&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;htobe16&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7777&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;sizeof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sa&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;errno&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_event_add_io&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;event_source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EPOLLIN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;io_handler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_notifyf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;                          &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;READY=1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;                          &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;STATUS=Daemon startup completed, processing events.&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_event_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nl"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;event_source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_event_source_unref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_source&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_event_unref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Failure: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EXIT_FAILURE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EXIT_SUCCESS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The example above shows how to write a minimal UDP/IP server that
listens on port 7777. Whenever a datagram is received it outputs its
contents to STDOUT, unless it is precisely the string &lt;code&gt;EXIT\n&lt;/code&gt;, in
which case the service exits. The service will react to SIGTERM and
SIGINT and do a clean exit then. It also notifies the service manager
about its completed startup, if it runs under a service
manager. Finally, it sends watchdog keep-alive messages to the service
manager, if it runs under one and the service manager asked for that.&lt;/p&gt;
&lt;p&gt;When run as a systemd service, this service's STDOUT will be connected to
the logging framework, of course, which means the service can act as a
minimal UDP-based remote logging service.&lt;/p&gt;
&lt;p&gt;To compile and link this example, save it as &lt;code&gt;event-example.c&lt;/code&gt;, then run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;gcc&lt;span class="w"&gt; &lt;/span&gt;event-example.c&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;event-example&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;pkg-config&lt;span class="w"&gt; &lt;/span&gt;--cflags&lt;span class="w"&gt; &lt;/span&gt;--libs&lt;span class="w"&gt; &lt;/span&gt;libsystemd&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For a first test, simply run the resulting binary from the command
line, and test it against the following netcat command line:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;$&lt;span class="w"&gt; &lt;/span&gt;nc&lt;span class="w"&gt; &lt;/span&gt;-u&lt;span class="w"&gt; &lt;/span&gt;localhost&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;7777&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
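
&lt;p&gt;If netcat isn't at hand, a few lines of C can serve as a test
client, too. The following is only a minimal sketch:
&lt;code&gt;send_datagram()&lt;/code&gt; is a made-up helper name (it is not part of
sd-event or of the example above), and it assumes that the service is
listening on 127.0.0.1, port 7777, as in the example:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;#include &amp;lt;arpa/inet.h&amp;gt;
#include &amp;lt;errno.h&amp;gt;
#include &amp;lt;netinet/in.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

/* Hypothetical helper, for illustration only: sends a single UDP datagram
 * to 127.0.0.1:port. Returns the number of bytes sent, or -1 with errno set. */
static ssize_t send_datagram(uint16_t port, const void *data, size_t size) {
        struct sockaddr_in sa = {
                .sin_family = AF_INET,
                .sin_port = htons(port),
                .sin_addr.s_addr = htonl(INADDR_LOOPBACK),
        };
        int fd, saved_errno;
        ssize_t n;

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd &amp;lt; 0)
                return -1;

        n = sendto(fd, data, size, 0, (const struct sockaddr*) &amp;amp;sa, sizeof(sa));

        /* Make sure close() cannot clobber the errno value of sendto() */
        saved_errno = errno;
        (void) close(fd);
        errno = saved_errno;

        return n;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;send_datagram(7777, &amp;quot;EXIT\n&amp;quot;, 5)&lt;/code&gt; should then
make the service exit cleanly, while any other payload should simply
show up on its STDOUT.&lt;/p&gt;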

&lt;p&gt;For the sake of brevity, error checking is minimal; a real-world
application should, of course, be more comprehensive. However, the
example hopefully gets across the idea of how to write a daemon that
reacts to external events with sd-event.&lt;/p&gt;
&lt;p&gt;For further details on the functions used in the example above, please
consult the manual pages:
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd-event.html"&gt;sd-event(3)&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_exit.html"&gt;sd_event_exit(3)&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_source_get_event.html"&gt;sd_event_source_get_event(3)&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_default.html"&gt;sd_event_default(3)&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_signal.html"&gt;sd_event_add_signal(3)&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_set_watchdog.html"&gt;sd_event_set_watchdog(3)&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_add_io.html"&gt;sd_event_add_io(3)&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd_notifyf.html"&gt;sd_notifyf(3)&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_loop.html"&gt;sd_event_loop(3)&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_source_unref.html"&gt;sd_event_source_unref(3)&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/sd_event_unref.html"&gt;sd_event_unref(3)&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;So, is this the event loop to end all other event loops? Certainly
not. I actually believe in "event loop plurality". There are many
reasons for that, but most importantly: sd-event is supposed to be an
event loop suitable for writing a wide range of applications, but it's
definitely not going to solve all event loop problems. For example,
while the priority logic is important for many use cases, it comes with
drawbacks for others: if not used carefully, high-priority event
sources can easily starve low-priority event sources. Also, in order
to implement the priority logic, sd-event needs to linearly iterate
through the event structures returned by
&lt;a href="http://man7.org/linux/man-pages/man2/epoll_wait.2.html"&gt;epoll_wait(2)&lt;/a&gt;
to sort the events by their priority, resulting in worst case
O(n*log(n)) complexity on each event loop wakeup (for n = number of
file descriptors). Then, to implement priorities fully, sd-event only
dispatches a single event before going back to the kernel and asking
for new events. sd-event will hence not provide the theoretically
possible best scalability to huge numbers of file descriptors. Of
course, this could be optimized by improving epoll, and making it
support how today's event loops actually work (after all, this is
the problem set all event loops that implement priorities -- including
GLib's -- have to deal with), but even then: the design of sd-event is focused on
running one event loop per thread, and it dispatches events strictly
ordered. In many other important use cases a very different design is
preferable: one where events are distributed to a set of worker threads
and are dispatched out-of-order.&lt;/p&gt;
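&lt;p&gt;To make the cost and the starvation risk a bit more concrete, here's
a rough sketch of the idea in plain C. Note that this is not how
sd-event is implemented internally: &lt;code&gt;struct ready_source&lt;/code&gt;,
&lt;code&gt;pick_next()&lt;/code&gt; and &lt;code&gt;run_wakeups()&lt;/code&gt; are made-up names, and
the sketch simply assumes that every source is ready on every
wakeup. As in sd-event, smaller priority values are dispatched first:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;

/* Hypothetical stand-in for an event source that epoll_wait() reported as ready */
struct ready_source {
        const char *name;
        int priority;           /* smaller values are dispatched first */
        unsigned dispatched;    /* how often this source has been dispatched */
};

static int compare_priority(const void *a, const void *b) {
        const struct ready_source *x = a, *y = b;

        return (x-&amp;gt;priority &amp;gt; y-&amp;gt;priority) - (x-&amp;gt;priority &amp;lt; y-&amp;gt;priority);
}

/* Sorts the ready sources by priority, which costs O(n*log(n)), and
 * returns the single source that gets dispatched on this wakeup. */
static struct ready_source *pick_next(struct ready_source *ready, size_t n) {
        if (n == 0)
                return NULL;

        qsort(ready, n, sizeof(ready[0]), compare_priority);
        return &amp;amp;ready[0];
}

/* Simulates a number of wakeups where all sources are ready each time:
 * only the highest-priority source is ever dispatched, the rest starve. */
static void run_wakeups(struct ready_source *sources, size_t n, unsigned wakeups) {
        for (unsigned i = 0; i &amp;lt; wakeups; i++) {
                struct ready_source *next = pick_next(sources, n);

                if (next)
                        next-&amp;gt;dispatched++;
        }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With three sources at priorities -100, 0 and 100 and five simulated
wakeups, the source at priority -100 is dispatched five times and the
other two never are, which is precisely the starvation effect
mentioned above.&lt;/p&gt;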
&lt;p&gt;Hence, don't mistake sd-event for what it isn't. It's not supposed to
unify everybody on a single event loop. It's just supposed to be a
very good implementation of an event loop suitable for a large part of
the typical use cases.&lt;/p&gt;
&lt;p&gt;Note that our APIs, including
&lt;a href="the-new-sd-bus-api-of-systemd.html"&gt;sd-bus&lt;/a&gt;, integrate nicely into
sd-event event loops, but do not require it, and may be integrated
into other event loops too, as long as they support watching for time
and I/O events.&lt;/p&gt;
&lt;p&gt;And that's all for now. If you are considering using sd-event for your
project and need help or have questions, please direct them to the
&lt;a href="http://lists.freedesktop.org/mailman/listinfo/systemd-devel"&gt;systemd mailing list&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 19 Nov 2015 00:00:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2015-11-19:/blog/introducing-sd-event.html</guid><category>projects</category></item><item><title>systemd.conf 2015 Summary</title><link>https://0pointer.net/blog/systemdconf-2015-summary.html</link><description>&lt;h1&gt;systemd.conf 2015 is Over Now!&lt;/h1&gt;
&lt;p&gt;Last week our first &lt;a href="https://systemd.events/"&gt;systemd.conf&lt;/a&gt; conference
took place at betahaus, in Berlin, Germany. With almost 100 attendees,
a dense schedule of 23 high-quality talks stuffed into a single track
over just two days, a productive hackfest and numerous consumed
Club-Mates, I believe it was quite a success!&lt;/p&gt;
&lt;p&gt;If you couldn't attend the conference, you may watch all talks on our
&lt;a href="https://www.youtube.com/channel/UCvq_RgZp3kljp9X8Io9Z1DA"&gt;YouTube
Channel&lt;/a&gt;. The
slides are &lt;a href="https://drive.google.com/open?id=0B-UWEwsUY5PJZXQ2emdsVXJ4OTA"&gt;available
online&lt;/a&gt;,
too.&lt;/p&gt;
&lt;p&gt;Many photos from the conference are available on the &lt;a href="https://plus.google.com/events/gallery/cilbcdfrpbk12h2qe8o18fn7m04"&gt;Google Events
Page&lt;/a&gt;. Enjoy!&lt;/p&gt;
&lt;p&gt;I'd specifically like to thank Daniel Mack, Chris Kühl and Nils Magnus
for running the conference, and making sure that it worked out as
smoothly as it did! Thank you very much, you did a fantastic job!&lt;/p&gt;
&lt;p&gt;I'd also specifically like to thank the &lt;a href="http://c3voc.de/"&gt;CCC Video Operation
Center&lt;/a&gt; folks for the excellent video coverage of
the conference. Not only did they implement a live-stream for the
entire talks part of the conference, but also cut and uploaded videos
of all talks to our &lt;a href="https://www.youtube.com/channel/UCvq_RgZp3kljp9X8Io9Z1DA"&gt;YouTube
Channel&lt;/a&gt;
within the same day (in fact, within a few hours after the talks
finished). That's quite an impressive feat!&lt;/p&gt;
&lt;p&gt;The folks from LinuxTag e.V. put a lot of time and energy into the
organization. It was great to see how well this all worked out!
Excellent work!&lt;/p&gt;
&lt;p&gt;(BTW, LinuxTag e.V. and the CCC Video Operation Center folks are
willing to help with the organization of Free Software community
events in Germany (and Europe?). Hence, if you need an entity that can
do the financial work and other stuff for your Free Software project's
conference, consider pinging LinuxTag, they might be willing to
help. Similarly, if you are organizing such an event and are thinking
about providing video coverage, consider pinging the CCC VOC
folks! Both of them get our best recommendations!)&lt;/p&gt;
&lt;p&gt;I'd also like to thank &lt;a href="https://systemd.events/systemdconf-2015/sponsors"&gt;our conference
sponsors&lt;/a&gt;!
Specifically, we'd like to thank our Gold Sponsors &lt;strong&gt;Red Hat&lt;/strong&gt; and
&lt;strong&gt;CoreOS&lt;/strong&gt; for their support. We'd also like to thank our Silver
Sponsor &lt;strong&gt;Codethink&lt;/strong&gt;, and our Bronze Sponsors &lt;strong&gt;Pengutronix&lt;/strong&gt;,
&lt;strong&gt;Pantheon&lt;/strong&gt;, &lt;strong&gt;Collabora&lt;/strong&gt;, &lt;strong&gt;Endocode&lt;/strong&gt;, the &lt;strong&gt;Linux Foundation&lt;/strong&gt;,
&lt;strong&gt;Samsung&lt;/strong&gt; and &lt;strong&gt;Travelping&lt;/strong&gt;, as well as our Cooperation Partners
&lt;strong&gt;LinuxTag&lt;/strong&gt; and &lt;strong&gt;kinvolk.io&lt;/strong&gt;, and our Media Partner &lt;strong&gt;Golem.de&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Last but not least I'd really like to thank our speakers and attendees
for presenting and participating in the conference. Of course, we put
the conference together specifically for you, and we really hope
you had as much fun at it as we did!&lt;/p&gt;
&lt;p&gt;Thank you all for attending, supporting, and organizing &lt;a href="https://systemd.events/"&gt;systemd.conf
2015&lt;/a&gt;! We are looking forward to seeing you
and working with you again at systemd.conf 2016!&lt;/p&gt;
&lt;p&gt;Thanks!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 09 Nov 2015 00:00:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2015-11-09:/blog/systemdconf-2015-summary.html</guid><category>projects</category></item><item><title>Second Round of systemd.conf 2015 Sponsors</title><link>https://0pointer.net/blog/second-round-of-systemdconf-2015-sponsors.html</link><description>&lt;h1&gt;Second Round of systemd.conf 2015 Sponsors&lt;/h1&gt;
&lt;p&gt;We are happy to announce the second round of &lt;a href="https://systemd.events/"&gt;systemd.conf
2015&lt;/a&gt; sponsors! In addition to those from
&lt;a href="http://0pointer.net/blog/first-round-of-systemdconf-2015-sponsors.html"&gt;the first
announcement&lt;/a&gt;, we have:&lt;/p&gt;
&lt;p&gt;Our second &lt;strong&gt;Gold&lt;/strong&gt; sponsor is Red Hat!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/red"&gt;&lt;img src="https://systemd.events/sites/default/files/Red_Hat_RGB-220.png" width="220" height="85"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What began as a better way to build software—openness, transparency, collaboration—soon shifted the balance of power in an entire industry. The revolution of choice continues. Today Red Hat® is the world's leading provider of open source solutions, using a community-powered approach to provide reliable and high-performing cloud, virtualization, storage, Linux®, and middleware technologies.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A &lt;strong&gt;Bronze&lt;/strong&gt; sponsor is &lt;em&gt;Samsung&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/samsung-poland-rd-center"&gt;&lt;img src="https://systemd.events/sites/default/files/samsung_logo.png" width="220" height="86"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;From the beginning we have established a very fast pace and are currently one of the biggest and fastest growing modern-technology R&amp;amp;D centers in East-Central Europe.
We have started with designing subsystems for digital satellite television, however, we have quickly expanded the scope of our interest. Currently, it includes advanced systems of digital television, platform convergence, mobile systems, smart solutions, and enterprise solutions.
Also a vital role in our activity plays the quality and certification center, which controls the conformity of Samsung Electronics products with the highest standards of quality and reliability.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A &lt;strong&gt;Bronze&lt;/strong&gt; sponsor is &lt;em&gt;travelping&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/travelping"&gt;&lt;img src="https://systemd.events/sites/default/files/travelping_logo.png" width="220" height="60"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Travelping is passionate about networks, communications and devices. We empower our customers to deploy and operate networks using our state of the art products, solutions and services.
Our products and solutions are based on our industry proven physical and virtual appliance platforms. These purpose built platforms ensure best in class performance, scalability and reliability combined with consistent end to end management capabilities.
To build this products, Travelping has developed a own embedded, cross platform Linux distribution called CAROS.io which incorporates the systemd service manager and tools.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A &lt;strong&gt;Bronze&lt;/strong&gt; sponsor is &lt;em&gt;Collabora&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/collabora"&gt;&lt;img src="https://systemd.events/sites/default/files/collabora-logo.png" width="220" height="124"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Collabora has over 10 years of experience working with top tier OEMs &amp;amp; silicon manufacturers worldwide to develop products based on Open Source software. Through the use of Open Source technologies and methodologies, Collabora helps clients in multiple market segments gain faster time to market and save millions of dollars in licensing and maintenance costs. Collabora has already brought to market several products relying on systemd extensively.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A &lt;strong&gt;Bronze&lt;/strong&gt; sponsor is &lt;em&gt;Endocode&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/endocode"&gt;&lt;img src="https://systemd.events/sites/default/files/endocode-logo.png" width="220" height="52"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Endocode AG. An employee-owned, software engineering company from Berlin. Open Source is our heart and soul.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A &lt;strong&gt;Bronze&lt;/strong&gt; sponsor is the &lt;em&gt;Linux&lt;/em&gt; &lt;em&gt;Foundation&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/linux-foundation"&gt;&lt;img src="https://systemd.events/sites/default/files/Linux_Foundation-logo.png" width="220" height="95"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The Linux Foundation advances the growth of Linux and offers its collaborative principles and practices to any endeavor.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We are &lt;strong&gt;Cooperating&lt;/strong&gt; with &lt;em&gt;LinuxTag&lt;/em&gt; &lt;em&gt;e.V.&lt;/em&gt; on the organization:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/linuxtag-ev"&gt;&lt;img src="https://systemd.events/sites/default/files/Linuxtag-logo.png" width="220" height="149"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;LinuxTag is Europe's leading organizer of Linux and Open Source events. Born of the community and in business for 20 years, we organize LinuxTag, an annual conference and exhibition attracting thousands of visitors. We also participate and cooperate in organizing workshops, tutorials, seminars, and other events together with and for the Open Source community. Selected events include non-profit workshops, the German Kernel Summit at FrOSCon, participation in the Open Tech Summit, and others. We take care of the organizational framework of systemd.conf 2015. LinuxTag e.V. is a non-profit organization and welcomes donations of ideas and workforce.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A &lt;strong&gt;Media&lt;/strong&gt; Partner is &lt;em&gt;Golem&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/golem"&gt;&lt;img src="https://systemd.events/sites/default/files/golem_logo.png" width="220" height="220"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Golem.de is an up to date online-publication intended for professional computer users. It provides technology insights of the IT and telecommunications industry. Golem.de offers profound and up to date information on significant and trending topics. Online- and IT-Professionals, marketing managers, purchasers, and readers inspired by technology receive substantial information on product, market and branding potentials through tests, interviews und market analysis.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We'd like to thank our sponsors for their support! Without sponsors our conference would not be possible!&lt;/p&gt;
&lt;p&gt;The conference has been SOLD OUT for a few weeks. We no longer accept registrations or paper submissions.&lt;/p&gt;
&lt;p&gt;For further details about systemd.conf consult the &lt;a href="https://systemd.events/"&gt;conference website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;See &lt;a href="http://0pointer.net/blog/first-round-of-systemdconf-2015-sponsors.html"&gt;the first round of sponsor announcements&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;See you in Berlin!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 19 Oct 2015 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2015-10-19:/blog/second-round-of-systemdconf-2015-sponsors.html</guid><category>projects</category></item><item><title>systemd.conf close to being sold out!</title><link>https://0pointer.net/blog/systemdconf-close-to-being-sold-out.html</link><description>&lt;h1&gt;Only 14 tickets still available!&lt;/h1&gt;
&lt;p&gt;systemd.conf 2015 is close to being sold out, there are &lt;em&gt;only&lt;/em&gt; &lt;em&gt;14&lt;/em&gt;
&lt;em&gt;tickets&lt;/em&gt; &lt;em&gt;left&lt;/em&gt; now. If you haven't bought your ticket yet, now is
the time to do it, because otherwise it will be too late and all
tickets will be gone!&lt;/p&gt;
&lt;p&gt;Why attend? At this conference you'll get to meet everybody who is
involved with the systemd project and learn what they are working on,
and where the project will go next. You'll hear from major users and
projects working with systemd. It's the primary forum where you can
make yourself heard and get first hand access to everybody who's
working on the future of the core Linux userspace!&lt;/p&gt;
&lt;p&gt;To get an idea about the schedule, please consult our &lt;a href="https://systemd.events/systemdconf-2015/schedule"&gt;preliminary
schedule&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In order to &lt;strong&gt;register&lt;/strong&gt; for the conference, please visit &lt;a href="https://systemd.events/systemdconf-2015/registration"&gt;the
registration
page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We are still looking for sponsors. If you'd like to join the ranks of
systemd.conf 2015 sponsors, please have a look at our &lt;a href="https://systemd.events/systemdconf-2015/become-sponsor"&gt;Becoming a
Sponsor&lt;/a&gt; page!&lt;/p&gt;
&lt;p&gt;For further details about systemd.conf consult the &lt;a href="https://systemd.events/"&gt;conference
website&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 23 Sep 2015 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2015-09-23:/blog/systemdconf-close-to-being-sold-out.html</guid><category>projects</category></item><item><title>Preliminary systemd.conf 2015 Schedule</title><link>https://0pointer.net/blog/preliminary-systemdconf-2015-schedule.html</link><description>&lt;h1&gt;A Preliminary systemd.conf 2015 Schedule is Now Online!&lt;/h1&gt;
&lt;p&gt;We are happy to announce that an initial, preliminary version of the
&lt;a href="https://systemd.events/systemdconf-2015/schedule"&gt;systemd.conf 2015
schedule&lt;/a&gt; is now
online! (Please ignore that some rows in the schedule link the same
session twice on that page. That's a bug in the web site CMS that we
are working to fix.)&lt;/p&gt;
&lt;p&gt;We got an overwhelming number of high-quality submissions during the
CfP!  Because there were so many good talks we really wanted to
accept, we decided to do two full days of talks now, leaving one more
day for the hackfest and BoFs. We also shortened many of the slots, to
make room for more. All in all we now have a schedule packed with
fantastic presentations!&lt;/p&gt;
&lt;p&gt;The areas covered range from containers, to system provisioning,
stateless systems, distributed init systems, the kdbus IPC, control
groups, systemd on the desktop, systemd in embedded devices,
configuration management and systemd, and systemd in downstream
distributions.&lt;/p&gt;
&lt;p&gt;We'd like to thank everybody who submitted a presentation proposal!&lt;/p&gt;
&lt;p&gt;Also, don't forget to &lt;strong&gt;register&lt;/strong&gt; for the conference! Only a limited number of
registrations are available due to space constraints!
&lt;a href="https://systemd.events/systemdconf-2015/registration"&gt;Register here!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;We are still looking for sponsors. If you'd like to join the ranks of
systemd.conf 2015 sponsors, please have a look at our &lt;a href="https://systemd.events/systemdconf-2015/become-sponsor"&gt;Becoming a
Sponsor&lt;/a&gt; page!&lt;/p&gt;
&lt;p&gt;For further details about systemd.conf consult the &lt;a href="https://systemd.events/"&gt;conference
website&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 16 Sep 2015 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2015-09-16:/blog/preliminary-systemdconf-2015-schedule.html</guid><category>projects</category></item><item><title>systemd.conf 2015 CfP REMINDER</title><link>https://0pointer.net/blog/systemdconf-2015-cfp-reminder.html</link><description>&lt;h1&gt;LAST REMINDER! systemd.conf 2015 Call for Presentations ends August 31st!&lt;/h1&gt;
&lt;p&gt;Here's the last reminder that the systemd.conf 2015 CfP ends on August
31st at 11:59:59pm Central European Time (that's Monday next week)! Make
sure to submit your proposals by then!&lt;/p&gt;
&lt;p&gt;Please submit your proposals &lt;a href="https://systemd.events/systemdconf-2015/call-presentations"&gt;on our
website&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;And don't forget to register for the conference! Only a limited number of
registrations are available due to space constraints!
&lt;a href="https://systemd.events/systemdconf-2015/registration"&gt;Register here!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For further details about systemd.conf consult the &lt;a href="https://systemd.events/"&gt;conference website&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 28 Aug 2015 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2015-08-28:/blog/systemdconf-2015-cfp-reminder.html</guid><category>projects</category></item><item><title>First Round of systemd.conf 2015 Sponsors</title><link>https://0pointer.net/blog/first-round-of-systemdconf-2015-sponsors.html</link><description>&lt;h1&gt;First Round of systemd.conf 2015 Sponsors&lt;/h1&gt;
&lt;p&gt;We are happy to announce the first round of &lt;a href="https://systemd.events/"&gt;systemd.conf
2015&lt;/a&gt; sponsors!&lt;/p&gt;
&lt;p&gt;Our first &lt;strong&gt;Gold&lt;/strong&gt; sponsor is CoreOS!&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/coreos"&gt;&lt;img src="https://systemd.events/sites/default/files/coreos-logo.png" width="240" height="105"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;CoreOS develops software for modern infrastructure that delivers a consistent operating environment for distributed applications. CoreOS's commercial offering, Tectonic, is an enterprise-ready platform that combines Kubernetes and the CoreOS stack to run Linux containers. In addition CoreOS is the creator and maintainer of open source projects such as CoreOS Linux, etcd, fleet, flannel and rkt. The strategies and architectures that influence CoreOS allow companies like Google, Facebook and Twitter to run their services at scale with high resilience. Learn more about CoreOS here https://coreos.com/, Tectonic here, https://tectonic.com/ or follow CoreOS on Twitter @coreoslinux.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A &lt;strong&gt;Silver&lt;/strong&gt; sponsor is &lt;em&gt;Codethink&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/codethink"&gt;&lt;img src="https://systemd.events/sites/default/files/codethink-logo_0.png" width="220" height="64"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Codethink is a software services consultancy, focusing on engineering reliable systems for long-term deployment with open source technologies.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A &lt;strong&gt;Bronze&lt;/strong&gt; sponsor is &lt;em&gt;Pantheon&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/pantheon"&gt;&lt;img src="https://systemd.events/sites/default/files/Pantheon_logo.png" width="220" height="91"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Pantheon is a platform for professional website development, testing, and deployment. Supporting Drupal and WordPress, Pantheon runs over 100,000 websites for the world's top brands, universities, and media organizations on top of over a million containers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;A &lt;strong&gt;Bronze&lt;/strong&gt; sponsor is &lt;em&gt;Pengutronix&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://systemd.events/systemdconf-2015/sponsors/pengutronix"&gt;&lt;img src="https://systemd.events/sites/default/files/pengutronix_0.png" width="220" height="76"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Pengutronix provides consulting, training and development services for Embedded Linux to customers from the industry. The Kernel Team ports Linux to customer hardware and has more than 3100 patches in the official mainline kernel. In addition to lowlevel ports, the Pengutronix Application Team is responsible for board support packages based on PTXdist or Yocto and deals with system integration (this is where systemd plays an important role). The Graphics Team works on accelerated multimedia tasks, based on the Linux kernel, GStreamer, Qt and web technologies.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We'd like to thank our sponsors for their support! Without sponsors our conference would not be possible!&lt;/p&gt;
&lt;p&gt;We'll shortly announce our second round of sponsors, please stay tuned!&lt;/p&gt;
&lt;p&gt;If you'd like to join the ranks of systemd.conf 2015 sponsors, please have a look at our &lt;a href="https://systemd.events/systemdconf-2015/become-sponsor"&gt;Becoming a Sponsor&lt;/a&gt; page!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reminder!&lt;/strong&gt; The systemd.conf 2015 Call for Presentations ends on Monday, &lt;strong&gt;August 31st&lt;/strong&gt;! Please make sure to submit your proposals on the &lt;a href="https://systemd.events/systemdconf-2015/call-presentations"&gt;CfP page&lt;/a&gt; by then!&lt;/p&gt;
&lt;p&gt;Also, don't forget to &lt;strong&gt;register&lt;/strong&gt; for the conference! Only a limited number of
registrations are available due to space constraints!
&lt;a href="https://systemd.events/systemdconf-2015/registration"&gt;Register here!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For further details about systemd.conf consult the &lt;a href="https://systemd.events/"&gt;conference website&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 25 Aug 2015 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2015-08-25:/blog/first-round-of-systemdconf-2015-sponsors.html</guid><category>projects</category></item><item><title>systemd.conf 2015 Call for Presentations</title><link>https://0pointer.net/blog/systemdconf-2015-call-for-presentations.html</link><description>&lt;h1&gt;REMINDER! systemd.conf 2015 Call for Presentations ends August 31st!&lt;/h1&gt;
&lt;p&gt;We'd like to remind you that the systemd.conf 2015 Call for Presentations ends
on &lt;strong&gt;August 31st&lt;/strong&gt;! Please submit your presentation proposals before that date
&lt;a href="https://systemd.events/systemdconf-2015/call-presentations"&gt;on our website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We are specifically interested in submissions from projects and vendors building
today's and tomorrow's &lt;strong&gt;products&lt;/strong&gt;, &lt;strong&gt;services&lt;/strong&gt; and &lt;strong&gt;devices&lt;/strong&gt; with systemd. We'd like to
learn about the problems you encounter and the benefits you see! Hence, if
you work for a company using systemd, please submit a presentation!&lt;/p&gt;
&lt;p&gt;We are also specifically interested in submissions from &lt;strong&gt;downstream&lt;/strong&gt; &lt;strong&gt;distribution&lt;/strong&gt;
&lt;strong&gt;maintainers&lt;/strong&gt; of systemd! If you develop or maintain systemd packages in a
distribution, please submit a presentation reporting on the state, future
and problems of systemd packaging so that we can improve downstream
collaboration!&lt;/p&gt;
&lt;p&gt;And of course, all talks regarding systemd usage in &lt;strong&gt;containers&lt;/strong&gt;, in the &lt;strong&gt;cloud&lt;/strong&gt;,
on &lt;strong&gt;servers&lt;/strong&gt;, on the &lt;strong&gt;desktop&lt;/strong&gt;, in &lt;strong&gt;mobile&lt;/strong&gt; and in &lt;strong&gt;embedded&lt;/strong&gt; are highly welcome! Talks
about systemd &lt;strong&gt;networking&lt;/strong&gt; and &lt;strong&gt;kdbus&lt;/strong&gt; IPC are very welcome too!&lt;/p&gt;
&lt;p&gt;Please submit your presentations by &lt;em&gt;August 31st&lt;/em&gt;!&lt;/p&gt;
&lt;p&gt;And don't forget to register for the conference! Only a limited number of
registrations are available due to space constraints!
&lt;a href="https://systemd.events/systemdconf-2015/registration"&gt;Register here!&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Also, limited travel and entry fee sponsorship is available for community contributors. Please contact us for details!&lt;/p&gt;
&lt;p&gt;For further details about the CfP consult the &lt;a href="https://systemd.events/systemdconf-2015/call-presentations"&gt;CfP page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;For further details about systemd.conf consult the &lt;a href="https://systemd.events/"&gt;conference website&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 19 Aug 2015 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2015-08-19:/blog/systemdconf-2015-call-for-presentations.html</guid><category>projects</category></item><item><title>Announcing systemd.conf 2015</title><link>https://0pointer.net/blog/announcing-systemdconf-2015.html</link><description>&lt;h1&gt;Announcing systemd.conf 2015&lt;/h1&gt;
&lt;p&gt;We are happy to announce the inaugural &lt;a href="https://systemd.events/"&gt;systemd.conf 2015&lt;/a&gt; conference of the &lt;a href="https://wiki.freedesktop.org/www/Software/systemd/"&gt;systemd project&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The conference takes place November 5th-7th, 2015 in Berlin, Germany.&lt;/p&gt;
&lt;p&gt;Only a limited number of tickets are available, hence make sure to sign up quickly.&lt;/p&gt;
&lt;p&gt;For further details consult the &lt;a href="https://systemd.events/"&gt;conference website&lt;/a&gt;.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 29 Jul 2015 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2015-07-29:/blog/announcing-systemdconf-2015.html</guid><category>projects</category></item><item><title>The new sd-bus API of systemd</title><link>https://0pointer.net/blog/the-new-sd-bus-api-of-systemd.html</link><description>&lt;p&gt;With the new &lt;a href="http://lists.freedesktop.org/archives/systemd-devel/2015-June/033170.html"&gt;v221 release of
systemd&lt;/a&gt;
we are declaring the
&lt;a href="https://github.com/systemd/systemd/blob/master/src/systemd/sd-bus.h"&gt;sd-bus&lt;/a&gt;
API shipped with
&lt;a href="https://wiki.freedesktop.org/www/Software/systemd/"&gt;systemd&lt;/a&gt;
stable. sd-bus is our minimal &lt;a href="https://en.wikipedia.org/wiki/D-Bus"&gt;D-Bus
IPC&lt;/a&gt; C library, supporting as
back-ends both classic socket-based D-Bus and
&lt;a href="https://github.com/systemd/kdbus"&gt;kdbus&lt;/a&gt;. The library has been
part of systemd for a while, but has only been used internally, since
we wanted to have the liberty to still make API changes without
affecting external consumers of the library. However, now we are
confident enough to commit to a stable API for it, starting with v221.&lt;/p&gt;
&lt;p&gt;In this blog story I hope to provide you with a quick overview of
sd-bus, a short refresher on D-Bus and its concepts, as well as a
few simple examples of how to write D-Bus clients and services with it.&lt;/p&gt;
&lt;h1&gt;What is D-Bus again?&lt;/h1&gt;
&lt;p&gt;Let's start with a quick reminder of what
&lt;a href="https://en.wikipedia.org/wiki/D-Bus"&gt;D-Bus&lt;/a&gt; actually is: it's a
powerful, generic IPC system for Linux and other operating systems. It
knows concepts like buses, objects, interfaces, methods, signals and
properties. It provides you with fine-grained access control, a rich
type system, discoverability, introspection, monitoring, reliable
multicasting, service activation, file descriptor passing, and
more. There are bindings for numerous programming languages that are
used on Linux.&lt;/p&gt;
&lt;p&gt;D-Bus has been a core component of Linux systems for more than 10
years. It is certainly the most widely established high-level local
IPC system on Linux. Since systemd's inception it has been the IPC
system it exposes its interfaces on. And even before systemd, it was
the IPC system Upstart used to expose its interfaces. It is used by
GNOME, by KDE and by a variety of system components.&lt;/p&gt;
&lt;p&gt;D-Bus refers to both &lt;a href="http://dbus.freedesktop.org/doc/dbus-specification.html"&gt;a
specification&lt;/a&gt;,
and &lt;a href="https://wiki.freedesktop.org/www/Software/dbus/"&gt;a reference
implementation&lt;/a&gt;. The
reference implementation provides both a bus server component, as well
as a client library. While there are multiple other, popular
reimplementations of the client library – for both C and other
programming languages –, the only commonly used server side is the
one from the reference implementation. (However, the kdbus project is
working on providing an alternative to this server implementation as a
kernel component.)&lt;/p&gt;
&lt;p&gt;D-Bus is mostly used as local IPC, on top of AF_UNIX sockets. However,
the protocol may be used on top of TCP/IP as well. It does not
natively support encryption, hence using D-Bus directly on TCP is
usually not a good idea. It is possible to combine D-Bus with a
transport like ssh in order to secure it. systemd uses this to make
many of its APIs accessible remotely.&lt;/p&gt;
&lt;p&gt;A frequently asked question about D-Bus is why it exists at all,
given that AF_UNIX sockets and FIFOs already exist on UNIX and have
been used for a long time successfully. To answer this question let's
make a comparison with popular web technology of today: what
AF_UNIX/FIFOs are to D-Bus, TCP is to HTTP/REST. While AF_UNIX
sockets/FIFOs only shovel raw bytes between processes, D-Bus defines
actual message encoding and adds concepts like method call
transactions, an object system, security mechanisms, multicasting and
more.&lt;/p&gt;
&lt;p&gt;From our 10+ years of experience with D-Bus we know today that while there
are some areas where we can improve things (and we are working on
that, both with kdbus and sd-bus), it generally appears to be a very
well designed system that has stood the test of time, aged well and is
widely established. Today, if we sat down and designed a completely
new IPC system incorporating all the experience and knowledge we
gained with D-Bus, I am sure the result would be very close to what
D-Bus already is.&lt;/p&gt;
&lt;p&gt;Or in short: D-Bus is great. If you hack on a Linux project and need a
local IPC, it should be your first choice. Not only because D-Bus is
well designed, but also because there aren't many alternatives that
can cover similar functionality.&lt;/p&gt;
&lt;h1&gt;Where does sd-bus fit in?&lt;/h1&gt;
&lt;p&gt;Let's discuss why sd-bus exists, how it compares with the other
existing C D-Bus libraries and why it might be a library to consider
for your project.&lt;/p&gt;
&lt;p&gt;For C, there are two established, popular D-Bus libraries: libdbus, as
it is shipped in the reference implementation of D-Bus, as well as
GDBus, a component of GLib, the low-level tool library of GNOME.&lt;/p&gt;
&lt;p&gt;Of the two, libdbus is the much older one, as it was written at the
time the specification was put together. The library was written with
a focus on being portable and being useful as a back-end for higher-level
language bindings. Both of these goals required the API to be very
generic, resulting in a relatively baroque, hard-to-use API that lacks
the bits that make it easy and fun to use from C. It provides the
building blocks, but few tools to actually make it straightforward to
build a house from them. On the other hand, the library is suitable
for most use-cases (for example, it is OOM-safe, making it suitable for
writing lowest level system software), and is portable to operating
systems like Windows or more exotic UNIXes.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://developer.gnome.org/gio/stable/gdbus-convenience.html"&gt;GDBus&lt;/a&gt;
is a much newer implementation. It was written after considerable
experience with using a GLib/GObject wrapper around libdbus. GDBus is
implemented from scratch and shares no code with libdbus. Its design
differs substantially from libdbus: it contains code generators that
make it particularly easy to expose GObject objects on the bus, or
to talk to D-Bus objects as GObject objects. It translates D-Bus data
types to GVariant, which is GLib's powerful data serialization
format. If you are used to GLib-style programming then you'll feel
right at home: hacking D-Bus services and clients with it is a lot
simpler than using libdbus.&lt;/p&gt;
&lt;p&gt;With sd-bus we now provide a third implementation, sharing no code
with either libdbus or GDBus. For us, the focus was on providing a kind
of middle ground between libdbus and GDBus: a low-level C library
that actually is fun to work with, that has enough syntactic sugar to
make it easy to write clients and services with, but on the other hand
is more low-level than GDBus/GLib/GObject/GVariant. To be able to use
it in systemd's various system-level components it needed to be
OOM-safe and minimal. Another major point we wanted to focus on was
supporting a kdbus back-end right from the beginning, in addition to
the socket transport of the original D-Bus specification ("dbus1"). In
fact, we wanted to design the library closer to kdbus' semantics than
to dbus1's, wherever they are different, but still cover both
transports nicely. In contrast to libdbus or GDBus, portability is not
a priority for sd-bus; instead we try to make the best of the Linux
platform and expose specific Linux concepts wherever that is
beneficial. Finally, performance was also an issue (though a secondary
one): neither libdbus nor GDBus will win any speed records. We wanted
to improve on performance (throughput and latency) -- but simplicity
and correctness are more important to us. We believe the result of our
work delivers on our goals quite nicely: the library is fun to use,
supports kdbus and sockets as back-ends, is relatively minimal, and the
&lt;a href="http://lists.freedesktop.org/archives/systemd-devel/2015-May/031418.html"&gt;performance is substantially
better&lt;/a&gt;
than both libdbus and GDBus.&lt;/p&gt;
&lt;p&gt;To decide which of the three APIs to use for your C project, here are
some short guidelines:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If you hack on a GLib/GObject project, GDBus is definitely your
  first choice.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If portability to non-Linux kernels -- including Windows, Mac OS and
  other UNIXes -- is important to you, use either GDBus (which more or
  less means buying into GLib/GObject) or libdbus (which requires a
  lot of manual work).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Otherwise, sd-bus would be my recommended choice.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(I am not covering C++ specifically here, this is all about plain C
only. But do note: if you use Qt, then QtDBus is the D-Bus API of
choice, being a wrapper around libdbus.)&lt;/p&gt;
&lt;h1&gt;Introduction to D-Bus Concepts&lt;/h1&gt;
&lt;p&gt;To the uninitiated D-Bus usually appears to be a relatively opaque
technology. It uses lots of concepts that appear unnecessarily complex
and redundant on first sight. But actually, they make a lot of
sense. Let's have a look:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A &lt;em&gt;bus&lt;/em&gt; is where you look for IPC services. There are usually two
   kinds of buses: a system bus, of which there's exactly one per
   system, and which is where you'd look for system services; and a
   user bus, of which there's one per user, and which is where you'd
   look for user services, like the address book service or the mail
   program. (Originally, the user bus was actually a session bus -- so
   that you get multiple of them if you log in many times as the same
   user --, and on most setups it still is, but we are working on
   moving things to a true user bus, of which there is only one per
   user on a system, regardless of how many times that user happens to
   log in.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A &lt;em&gt;service&lt;/em&gt; is a program that offers some IPC API on a bus. A
   service is identified by a name in reverse domain name
   notation. Thus, the &lt;code&gt;org.freedesktop.NetworkManager&lt;/code&gt; service on the
   system bus is where NetworkManager's APIs are available and
   &lt;code&gt;org.freedesktop.login1&lt;/code&gt; on the system bus is where
   &lt;code&gt;systemd-logind&lt;/code&gt;'s APIs are exposed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A &lt;em&gt;client&lt;/em&gt; is a program that makes use of some IPC API on a bus. It
   talks to a service, monitors it and generally doesn't provide any
   services on its own. That said, lines are blurry and many services
   are also clients to other services. Frequently the term &lt;em&gt;peer&lt;/em&gt; is
   used as a generalization to refer to either a service or a client.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An &lt;em&gt;object path&lt;/em&gt; is an identifier for an object on a specific
   service. In a way this is comparable to a C pointer, since that's
   how you generally reference a C object, if you hack object-oriented
   programs in C. However, C pointers are just memory addresses, and
   passing memory addresses around to other processes would make
   little sense: they of course refer to the address space of
   the service, so the client couldn't make sense of them. Thus, the D-Bus
   designers came up with the object path concept, which is just a
   string that looks like a file system path. Example:
   &lt;code&gt;/org/freedesktop/login1&lt;/code&gt; is the object path of the 'manager'
   object of the &lt;code&gt;org.freedesktop.login1&lt;/code&gt; service (which, as we
   remember from above, is still the service &lt;code&gt;systemd-logind&lt;/code&gt;
   exposes). Because object paths are structured like file system
   paths they can be neatly arranged in a tree, so that you end up
   with a veritable tree of objects. For example, you'll find all user
   sessions &lt;code&gt;systemd-logind&lt;/code&gt; manages below the
   &lt;code&gt;/org/freedesktop/login1/session&lt;/code&gt; sub-tree, for example called
   &lt;code&gt;/org/freedesktop/login1/session/_7&lt;/code&gt;,
   &lt;code&gt;/org/freedesktop/login1/session/_55&lt;/code&gt; and so on. How services
   precisely label their objects and arrange them in a tree is
   completely up to the developers of the services.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Each object that is identified by an object path has one or more
   &lt;em&gt;interfaces&lt;/em&gt;. An interface is a collection of signals, methods, and
   properties (collectively called &lt;em&gt;members&lt;/em&gt;), that belong
   together. The concept of a D-Bus interface is actually pretty
   much identical to what you know from programming languages such as
   Java, which also know an interface concept. Which interfaces an
   object implements is up to the developers of the service. Interface
   names are in reverse domain name notation, much like service
   names. (Yes, that's admittedly confusing, in particular since it's
   pretty common for simpler services to reuse the service name string
   also as an interface name.) A couple of interfaces are standardized
   though and you'll find them available on many of the objects
   offered by the various services. Specifically, those are
   &lt;code&gt;org.freedesktop.DBus.Introspectable&lt;/code&gt;, &lt;code&gt;org.freedesktop.DBus.Peer&lt;/code&gt;
   and &lt;code&gt;org.freedesktop.DBus.Properties&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An interface can contain &lt;em&gt;methods&lt;/em&gt;. The word "method" is more or
   less just a fancy word for "function", and is a term used pretty
   much the same way in object-oriented languages such as Java. The
   most common interaction between D-Bus peers is that one peer
   invokes one of these methods on another peer and gets a reply. A
   D-Bus method takes a couple of parameters, and returns others. The
   parameters are transmitted in a type-safe way, and the type
   information is included in the introspection data you can query
   from each object. Usually, method names (and the other member
   types) follow a &lt;em&gt;CamelCase&lt;/em&gt; syntax. For example, &lt;code&gt;systemd-logind&lt;/code&gt;
   exposes an &lt;code&gt;ActivateSession&lt;/code&gt; method on the
   &lt;code&gt;org.freedesktop.login1.Manager&lt;/code&gt; interface that is available on the
   &lt;code&gt;/org/freedesktop/login1&lt;/code&gt; object of the &lt;code&gt;org.freedesktop.login1&lt;/code&gt;
   service.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A &lt;em&gt;signature&lt;/em&gt; describes a set of parameters a function (or signal,
   property, see below) takes or returns. It's a series of characters
   that each encode one parameter by its type. The set of types
   available is pretty powerful. For example, there are simpler types
   like &lt;code&gt;s&lt;/code&gt; for string, or &lt;code&gt;u&lt;/code&gt; for unsigned 32-bit integer, but also complex
   types such as &lt;code&gt;as&lt;/code&gt; for an array of strings or &lt;code&gt;a(sb)&lt;/code&gt; for an array
   of structures consisting of one string and one boolean each.  See
   &lt;a href="http://dbus.freedesktop.org/doc/dbus-specification.html#type-system"&gt;the D-Bus specification&lt;/a&gt;
   for the full explanation of the type system.  The
   &lt;code&gt;ActivateSession&lt;/code&gt; method mentioned above takes a single string as
   parameter (the parameter signature is hence &lt;code&gt;s&lt;/code&gt;), and returns
   nothing (the return signature is hence the empty string). Of
   course, the signature can get a lot more complex, see below for
   more examples.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A &lt;em&gt;signal&lt;/em&gt; is another member type that the D-Bus object system
   knows. Much like a method it has a signature. However, they serve
   different purposes. While in a method call a single client issues a
   request on a single service, and that service sends back a response
   to the client, signals are for general notification of
   peers. Services send them out when they want to tell one or more
   peers on the bus that something happened or changed. In contrast to
   method calls and their replies they are hence usually broadcast
   over a bus. While method calls/replies are used for duplex
   one-to-one communication, signals are usually used for simplex
   one-to-many communication (note however that that's not a
   requirement, they can also be used one-to-one). Example:
   &lt;code&gt;systemd-logind&lt;/code&gt; broadcasts a &lt;code&gt;SessionNew&lt;/code&gt; signal from its manager
   object each time a user logs in, and a &lt;code&gt;SessionRemoved&lt;/code&gt; signal
   every time a user logs out.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A &lt;em&gt;property&lt;/em&gt; is the third member type that the D-Bus object system
   knows. It's similar to the property concept known by languages like
   C#. Properties also have a signature, and are more or less just
   variables that an object exposes, that can be read or altered by
   clients. Example: &lt;code&gt;systemd-logind&lt;/code&gt; exposes a property &lt;code&gt;Docked&lt;/code&gt; of
   the signature &lt;code&gt;b&lt;/code&gt; (a boolean). It reflects whether &lt;code&gt;systemd-logind&lt;/code&gt;
   thinks the system is currently in a docking station of some form
   (only applies to laptops …).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So much for the various concepts D-Bus knows. Of course, all these new
concepts might be overwhelming. Let's look at them from a different
perspective. I assume many of the readers have an understanding of
today's web technology, specifically HTTP and REST. Let's try to
compare the concept of an HTTP request with the concept of a D-Bus
method call:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;An HTTP request is issued on a specific network. It could be the
   Internet, or it could be your local LAN, or a company
   VPN. Depending on which network you issue the request on, you'll be
   able to talk to a different set of servers. This is not unlike the
   "bus" concept of D-Bus.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;On the network you then pick a specific HTTP server to talk
   to. That's roughly comparable to picking a service on a specific bus.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;On the HTTP server you then ask for a specific URL. The "path" part
   of the URL (by which I mean everything after the host name of the
   server, up to the last "/") is pretty similar to a D-Bus object path.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The "file" part of the URL (by which I mean everything after the
   last slash, following the path, as described above), then defines
   the actual call to make. In D-Bus this could be mapped to an
   interface and method name.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Finally, the parameters of an HTTP call follow the path after the
   "?"; they map to the signature of the D-Bus call.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, comparing an HTTP request to a D-Bus method call is a bit
like comparing apples and oranges. However, I think it's still useful to
get a bit of a feeling of what maps to what.&lt;/p&gt;
&lt;h1&gt;From the shell&lt;/h1&gt;
&lt;p&gt;So much about the concepts and the gray theory behind them. Let's make
this exciting, let's actually see how this feels on a real system.&lt;/p&gt;
&lt;p&gt;For a while now systemd has included a tool &lt;code&gt;busctl&lt;/code&gt; that is useful to
explore and interact with the D-Bus object system. When invoked
without parameters, it will show you a list of all peers connected to
the system bus. (Use &lt;code&gt;--user&lt;/code&gt; to see the peers of your user bus
instead):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;busctl
&lt;span class="go"&gt;NAME                                       PID PROCESS         USER             CONNECTION    UNIT                      SESSION    DESCRIPTION&lt;/span&gt;
&lt;span class="go"&gt;:1.1                                         1 systemd         root             :1.1          -                         -          -&lt;/span&gt;
&lt;span class="go"&gt;:1.11                                      705 NetworkManager  root             :1.11         NetworkManager.service    -          -&lt;/span&gt;
&lt;span class="go"&gt;:1.14                                      744 gdm             root             :1.14         gdm.service               -          -&lt;/span&gt;
&lt;span class="go"&gt;:1.4                                       708 systemd-logind  root             :1.4          systemd-logind.service    -          -&lt;/span&gt;
&lt;span class="go"&gt;:1.7200                                  17563 busctl          lennart          :1.7200       session-1.scope           1          -&lt;/span&gt;
&lt;span class="go"&gt;[…]&lt;/span&gt;
&lt;span class="go"&gt;org.freedesktop.NetworkManager             705 NetworkManager  root             :1.11         NetworkManager.service    -          -&lt;/span&gt;
&lt;span class="go"&gt;org.freedesktop.login1                     708 systemd-logind  root             :1.4          systemd-logind.service    -          -&lt;/span&gt;
&lt;span class="go"&gt;org.freedesktop.systemd1                     1 systemd         root             :1.1          -                         -          -&lt;/span&gt;
&lt;span class="go"&gt;org.gnome.DisplayManager                   744 gdm             root             :1.14         gdm.service               -          -&lt;/span&gt;
&lt;span class="go"&gt;[…]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;(I have shortened the output a bit, to keep things brief.)&lt;/p&gt;
&lt;p&gt;The list begins with a list of all peers currently connected to the
bus. They are identified by peer names like ":1.11". These are called
&lt;em&gt;unique names&lt;/em&gt; in D-Bus nomenclature. Basically, every peer has a
unique name, and they are assigned automatically when a peer connects
to the bus. They are much like an IP address, if you will. You'll
notice that a couple of peers are already connected, including our
little busctl tool itself as well as a number of system services. The
list then shows all actual services on the bus, identified by their
service names (as discussed above; to discern them from the unique
names these are also called &lt;em&gt;well-known names&lt;/em&gt;). In many ways
well-known names are similar to DNS host names, i.e. they are a
friendlier way to reference a peer, but on the lower level they just
map to an IP address, or in this comparison the unique name. Much like
you can connect to a host on the Internet by either its host name or
its IP address, you can also connect to a bus peer either by its
unique or its well-known name. (Note that each peer can have as many
well-known names as it likes, much like an IP address can have
multiple host names referring to it).&lt;/p&gt;
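&lt;p&gt;The two halves of that listing can also be requested separately. The following is only a minimal sketch, assuming &lt;code&gt;busctl&lt;/code&gt; is installed and a system bus is running; the &lt;code&gt;filters&lt;/code&gt; variable and the guards around the calls are my own additions for illustration. It uses the &lt;code&gt;--unique&lt;/code&gt; and &lt;code&gt;--acquired&lt;/code&gt; switches of &lt;code&gt;busctl&lt;/code&gt; to show only unique names, and then only well-known names:&lt;/p&gt;

```shell
# The two list filters: unique names only, then well-known names only.
filters="--unique --acquired"

if command -v busctl >/dev/null; then
    for filter in $filters; do
        echo "== busctl $filter list =="
        # "list" is the default verb; it is spelled out here for clarity.
        busctl "$filter" list || echo "could not list peers (is a system bus running?)"
    done
else
    echo "busctl not found, skipping"
fi
```

&lt;p&gt;Comparing the CONNECTION column of the second listing with the names in the first one shows which unique name stands behind each well-known name.&lt;/p&gt;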
&lt;p&gt;OK, that's already kinda cool. Try it for yourself, on your local
machine (all you need is a recent, systemd-based distribution).&lt;/p&gt;
&lt;p&gt;Let's now go the next step. Let's see which objects the
&lt;code&gt;org.freedesktop.login1&lt;/code&gt; service actually offers:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;busctl&lt;span class="w"&gt; &lt;/span&gt;tree&lt;span class="w"&gt; &lt;/span&gt;org.freedesktop.login1
&lt;span class="go"&gt;└─/org/freedesktop/login1&lt;/span&gt;
&lt;span class="go"&gt;  ├─/org/freedesktop/login1/seat&lt;/span&gt;
&lt;span class="go"&gt;  │ ├─/org/freedesktop/login1/seat/seat0&lt;/span&gt;
&lt;span class="go"&gt;  │ └─/org/freedesktop/login1/seat/self&lt;/span&gt;
&lt;span class="go"&gt;  ├─/org/freedesktop/login1/session&lt;/span&gt;
&lt;span class="go"&gt;  │ ├─/org/freedesktop/login1/session/_31&lt;/span&gt;
&lt;span class="go"&gt;  │ └─/org/freedesktop/login1/session/self&lt;/span&gt;
&lt;span class="go"&gt;  └─/org/freedesktop/login1/user&lt;/span&gt;
&lt;span class="go"&gt;    ├─/org/freedesktop/login1/user/_1000&lt;/span&gt;
&lt;span class="go"&gt;    └─/org/freedesktop/login1/user/self&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Pretty, isn't it? What's actually even nicer, and what the output
does &lt;em&gt;not&lt;/em&gt; show, is that there's full command line completion
available: as you press TAB the shell will auto-complete the service
names for you. It's a real pleasure to explore your D-Bus objects that
way!&lt;/p&gt;
&lt;p&gt;The output shows some objects that you might recognize from the
explanations above. Now, let's go further. Let's see what interfaces,
methods, signals and properties one of these objects actually exposes:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;busctl&lt;span class="w"&gt; &lt;/span&gt;introspect&lt;span class="w"&gt; &lt;/span&gt;org.freedesktop.login1&lt;span class="w"&gt; &lt;/span&gt;/org/freedesktop/login1/session/_31
&lt;span class="go"&gt;NAME                                TYPE      SIGNATURE RESULT/VALUE                             FLAGS&lt;/span&gt;
&lt;span class="go"&gt;org.freedesktop.DBus.Introspectable interface -         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.Introspect                         method    -         s                                        -&lt;/span&gt;
&lt;span class="go"&gt;org.freedesktop.DBus.Peer           interface -         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.GetMachineId                       method    -         s                                        -&lt;/span&gt;
&lt;span class="go"&gt;.Ping                               method    -         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;org.freedesktop.DBus.Properties     interface -         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.Get                                method    ss        v                                        -&lt;/span&gt;
&lt;span class="go"&gt;.GetAll                             method    s         a{sv}                                    -&lt;/span&gt;
&lt;span class="go"&gt;.Set                                method    ssv       -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.PropertiesChanged                  signal    sa{sv}as  -                                        -&lt;/span&gt;
&lt;span class="go"&gt;org.freedesktop.login1.Session      interface -         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.Activate                           method    -         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.Kill                               method    si        -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.Lock                               method    -         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.PauseDeviceComplete                method    uu        -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.ReleaseControl                     method    -         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.ReleaseDevice                      method    uu        -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.SetIdleHint                        method    b         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.TakeControl                        method    b         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.TakeDevice                         method    uu        hb                                       -&lt;/span&gt;
&lt;span class="go"&gt;.Terminate                          method    -         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.Unlock                             method    -         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.Active                             property  b         true                                     emits-change&lt;/span&gt;
&lt;span class="go"&gt;.Audit                              property  u         1                                        const&lt;/span&gt;
&lt;span class="go"&gt;.Class                              property  s         &amp;quot;user&amp;quot;                                   const&lt;/span&gt;
&lt;span class="go"&gt;.Desktop                            property  s         &amp;quot;&amp;quot;                                       const&lt;/span&gt;
&lt;span class="go"&gt;.Display                            property  s         &amp;quot;&amp;quot;                                       const&lt;/span&gt;
&lt;span class="go"&gt;.Id                                 property  s         &amp;quot;1&amp;quot;                                      const&lt;/span&gt;
&lt;span class="go"&gt;.IdleHint                           property  b         true                                     emits-change&lt;/span&gt;
&lt;span class="go"&gt;.IdleSinceHint                      property  t         1434494624206001                         emits-change&lt;/span&gt;
&lt;span class="go"&gt;.IdleSinceHintMonotonic             property  t         0                                        emits-change&lt;/span&gt;
&lt;span class="go"&gt;.Leader                             property  u         762                                      const&lt;/span&gt;
&lt;span class="go"&gt;.Name                               property  s         &amp;quot;lennart&amp;quot;                                const&lt;/span&gt;
&lt;span class="go"&gt;.Remote                             property  b         false                                    const&lt;/span&gt;
&lt;span class="go"&gt;.RemoteHost                         property  s         &amp;quot;&amp;quot;                                       const&lt;/span&gt;
&lt;span class="go"&gt;.RemoteUser                         property  s         &amp;quot;&amp;quot;                                       const&lt;/span&gt;
&lt;span class="go"&gt;.Scope                              property  s         &amp;quot;session-1.scope&amp;quot;                        const&lt;/span&gt;
&lt;span class="go"&gt;.Seat                               property  (so)      &amp;quot;seat0&amp;quot; &amp;quot;/org/freedesktop/login1/seat... const&lt;/span&gt;
&lt;span class="go"&gt;.Service                            property  s         &amp;quot;gdm-autologin&amp;quot;                          const&lt;/span&gt;
&lt;span class="go"&gt;.State                              property  s         &amp;quot;active&amp;quot;                                 -&lt;/span&gt;
&lt;span class="go"&gt;.TTY                                property  s         &amp;quot;/dev/tty1&amp;quot;                              const&lt;/span&gt;
&lt;span class="go"&gt;.Timestamp                          property  t         1434494630344367                         const&lt;/span&gt;
&lt;span class="go"&gt;.TimestampMonotonic                 property  t         34814579                                 const&lt;/span&gt;
&lt;span class="go"&gt;.Type                               property  s         &amp;quot;x11&amp;quot;                                    const&lt;/span&gt;
&lt;span class="go"&gt;.User                               property  (uo)      1000 &amp;quot;/org/freedesktop/login1/user/_1... const&lt;/span&gt;
&lt;span class="go"&gt;.VTNr                               property  u         1                                        const&lt;/span&gt;
&lt;span class="go"&gt;.Lock                               signal    -         -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.PauseDevice                        signal    uus       -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.ResumeDevice                       signal    uuh       -                                        -&lt;/span&gt;
&lt;span class="go"&gt;.Unlock                             signal    -         -                                        -&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As before, the busctl command supports command line completion, hence
both the service name and the object path used are easily put together
on the shell simply by pressing TAB. The output shows the methods,
properties, signals of one of the session objects that are currently
made available by &lt;code&gt;systemd-logind&lt;/code&gt;. There's a section for each
interface the object knows. The second column tells you what kind of
member is shown in the line. The third column shows the signature of
the member. In the case of method calls that's the input parameters, and the
fourth column shows what is returned. For properties, the fourth
column shows their current value.&lt;/p&gt;
&lt;p&gt;So far, we just explored. Let's take the next step now: let's become
active - let's call a method:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;busctl&lt;span class="w"&gt; &lt;/span&gt;call&lt;span class="w"&gt; &lt;/span&gt;org.freedesktop.login1&lt;span class="w"&gt; &lt;/span&gt;/org/freedesktop/login1/session/_31&lt;span class="w"&gt; &lt;/span&gt;org.freedesktop.login1.Session&lt;span class="w"&gt; &lt;/span&gt;Lock
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I don't think I need to mention this anymore, but anyway: again
there's full command line completion available. The third argument is
the interface name, the fourth the method name, both can be easily
completed by pressing TAB. In this case we picked the &lt;code&gt;Lock&lt;/code&gt; method,
which activates the screen lock for the specific session. And yupp,
the instant I pressed enter on this line my screen lock turned on
(this only works on DEs that correctly hook into &lt;code&gt;systemd-logind&lt;/code&gt;.
GNOME works fine, and KDE should work too).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;Lock&lt;/code&gt; method call we picked is very simple, as it takes no
parameters and returns none. Of course, it can get more complicated
for some calls. Here's another example, this time using one of
systemd's own bus calls, to start an arbitrary system unit:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;busctl&lt;span class="w"&gt; &lt;/span&gt;call&lt;span class="w"&gt; &lt;/span&gt;org.freedesktop.systemd1&lt;span class="w"&gt; &lt;/span&gt;/org/freedesktop/systemd1&lt;span class="w"&gt; &lt;/span&gt;org.freedesktop.systemd1.Manager&lt;span class="w"&gt; &lt;/span&gt;StartUnit&lt;span class="w"&gt; &lt;/span&gt;ss&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cups.service&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;replace&amp;quot;&lt;/span&gt;
&lt;span class="go"&gt;o &amp;quot;/org/freedesktop/systemd1/job/42684&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This call takes two strings as input parameters, as we denote in the
signature string that follows the method name (as usual, command line
completion helps you get this right). Following the signature, the
next two parameters are simply the two strings to pass. The specified
signature string hence indicates what comes next. systemd's StartUnit
method call takes the unit name to start as first parameter, and the
mode in which to start it as second. The call returned a single object
path value. It is encoded the same way as the input parameter: a
signature (just &lt;code&gt;o&lt;/code&gt; for the object path) followed by the actual value.&lt;/p&gt;
&lt;p&gt;Of course, some method call parameters can get a ton more complex, but
with &lt;code&gt;busctl&lt;/code&gt; it's relatively easy to encode them all. See &lt;a href="http://www.freedesktop.org/software/systemd/man/busctl.html"&gt;the man
page&lt;/a&gt; for
details.&lt;/p&gt;
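&lt;p&gt;As one illustration of a slightly more complex parameter, here is how an array of strings might be passed. This is a sketch under assumptions: it relies on &lt;code&gt;busctl&lt;/code&gt; being installed and systemd's bus API being reachable, it uses the &lt;code&gt;ListUnitsFiltered&lt;/code&gt; method of systemd's manager object (which takes a single &lt;code&gt;as&lt;/code&gt; parameter of unit states), and the &lt;code&gt;states&lt;/code&gt; and &lt;code&gt;count&lt;/code&gt; variables are mine. On the command line an array is written as its element count followed by the elements themselves:&lt;/p&gt;

```shell
# Unit states to filter by; sent as a D-Bus array of strings ("as").
states="active failed"

# Count the elements: busctl expects the array length before the items.
set -- $states
count=$#

if command -v busctl >/dev/null; then
    # Expands to: ... ListUnitsFiltered as 2 active failed
    busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 \
        org.freedesktop.systemd1.Manager ListUnitsFiltered as "$count" "$@" \
        || echo "call failed (is systemd's bus API reachable?)"
else
    echo "busctl not found, skipping"
fi
```

&lt;p&gt;The reply is an array of structures and hence rather long; the point here is only the encoding of the input parameter.&lt;/p&gt;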
&lt;p&gt;&lt;code&gt;busctl&lt;/code&gt; knows a number of other operations. For example, you can use
it to monitor D-Bus traffic as it happens (including generating a
&lt;code&gt;.cap&lt;/code&gt; file for use with Wireshark!) or you can set or get specific
properties. However, this blog story was supposed to be about sd-bus,
not &lt;code&gt;busctl&lt;/code&gt;, hence let's cut this short here, and let me direct you
to the man page in case you want to know more about the tool.&lt;/p&gt;
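&lt;p&gt;Just to give a taste of the property operations, here is a minimal sketch of reading one property. It assumes &lt;code&gt;busctl&lt;/code&gt; is installed and &lt;code&gt;systemd-logind&lt;/code&gt; is running; the shell variables are my own, chosen to mirror the concepts introduced above. It reads the &lt;code&gt;Docked&lt;/code&gt; property mentioned earlier, which lives on the &lt;code&gt;org.freedesktop.login1.Manager&lt;/code&gt; interface of the manager object:&lt;/p&gt;

```shell
# The four coordinates that identify a property on the bus.
service=org.freedesktop.login1
object=/org/freedesktop/login1
interface=org.freedesktop.login1.Manager
property=Docked

if command -v busctl >/dev/null; then
    # Prints the property's signature followed by its current value.
    busctl get-property "$service" "$object" "$interface" "$property" \
        || echo "could not read $property (is systemd-logind running?)"
else
    echo "busctl not found, skipping"
fi
```

&lt;p&gt;The result is printed in the same format as the method reply shown above: the signature first (here &lt;code&gt;b&lt;/code&gt;), then the value.&lt;/p&gt;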
&lt;p&gt;&lt;code&gt;busctl&lt;/code&gt; (like the rest of systemd) is implemented using the sd-bus
API. Thus it exposes many of the features of sd-bus itself. For
example, you can use it to connect to remote or container buses. It
understands both kdbus and classic D-Bus, and more!&lt;/p&gt;
&lt;h1&gt;sd-bus&lt;/h1&gt;
&lt;p&gt;But enough! Let's get back on topic, let's talk about sd-bus itself.&lt;/p&gt;
&lt;p&gt;The sd-bus set of APIs is mostly contained in the header file
&lt;a href="https://github.com/systemd/systemd/blob/master/src/systemd/sd-bus.h"&gt;sd-bus.h&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's a random selection of features of the library, that make it
compare well with the other implementations available.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Supports both kdbus and dbus1 as back-end.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Has high-level support for connecting to remote buses via ssh, and
   to buses of local OS containers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Powerful credential model, to implement authentication of clients
   in services. Currently 34 individual fields are supported, from the
   PID of the client to the cgroup or capability sets.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Support for tracking the life-cycle of peers in order to release
   local objects automatically when all peers referencing them
   have disconnected.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The client builds an efficient decision tree to determine which
   handlers to deliver an incoming bus message to.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Automatically translates D-Bus errors into UNIX style errors and
   back (this is lossy though), to ensure best integration of D-Bus
   into low-level Linux programs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Powerful but lightweight object model for exposing local objects on
   the bus. Automatically generates introspection as necessary.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The API is currently not fully documented, but we are working on
completing the set of manual pages. For details
&lt;a href="http://www.freedesktop.org/software/systemd/man/index.html#S"&gt;see all pages starting with &lt;code&gt;sd_bus_&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Invoking a Method, from C, with sd-bus&lt;/h1&gt;
&lt;p&gt;So much about the library in general. Here's an example for connecting
to the bus and issuing a method call:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;systemd/sd-bus.h&amp;gt;&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_bus_error&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SD_BUS_ERROR_NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_bus_message&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_bus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Connect to the system bus */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_open_system&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Failed to connect to system bus: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Issue the method call and store the response message in m */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_call_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;org.freedesktop.systemd1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="cm"&gt;/* service to contact */&lt;/span&gt;
&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;/org/freedesktop/systemd1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="cm"&gt;/* object path */&lt;/span&gt;
&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;org.freedesktop.systemd1.Manager&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="cm"&gt;/* interface name */&lt;/span&gt;
&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;StartUnit&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;                          &lt;/span&gt;&lt;span class="cm"&gt;/* method name */&lt;/span&gt;
&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="cm"&gt;/* object to return error in */&lt;/span&gt;
&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;                                   &lt;/span&gt;&lt;span class="cm"&gt;/* return message on success */&lt;/span&gt;
&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;ss&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;                                 &lt;/span&gt;&lt;span class="cm"&gt;/* input signature */&lt;/span&gt;
&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;cups.service&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;                       &lt;/span&gt;&lt;span class="cm"&gt;/* first argument */&lt;/span&gt;
&lt;span class="w"&gt;                               &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;replace&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt;                           &lt;/span&gt;&lt;span class="cm"&gt;/* second argument */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Failed to issue method call: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Parse the response message */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_message_read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;o&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Failed to parse response message: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Queued service job as %s.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nl"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_bus_error_free&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_bus_message_unref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_bus_unref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EXIT_FAILURE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EXIT_SUCCESS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Save this example as &lt;code&gt;bus-client.c&lt;/code&gt;, then build it with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;gcc&lt;span class="w"&gt; &lt;/span&gt;bus-client.c&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;bus-client&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;pkg-config&lt;span class="w"&gt; &lt;/span&gt;--cflags&lt;span class="w"&gt; &lt;/span&gt;--libs&lt;span class="w"&gt; &lt;/span&gt;libsystemd&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This will generate a binary &lt;code&gt;bus-client&lt;/code&gt; you can now run. Make sure to
run it as root though, since access to the &lt;code&gt;StartUnit&lt;/code&gt; method is
privileged:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;./bus-client
&lt;span class="go"&gt;Queued service job as /org/freedesktop/systemd1/job/3586.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And that's it already, our first example. It showed how to invoke a
method call on the bus. The actual function call is very close to the
&lt;code&gt;busctl&lt;/code&gt; command line we used before. I hope the code
excerpt needs little further explanation. It's supposed to give you a
taste of how to write D-Bus clients with sd-bus. For more
information please have a look at the header file, the man page or
even the sd-bus sources.&lt;/p&gt;
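&lt;p&gt;For comparison, the same call issued with &lt;code&gt;busctl&lt;/code&gt; would look roughly like the following. Treat this as a sketch: it assumes the same unit name and job mode as the C example, and it needs the same privileges:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;# &lt;/span&gt;busctl call org.freedesktop.systemd1 /org/freedesktop/systemd1 org.freedesktop.systemd1.Manager StartUnit ss cups.service replace
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The arguments appear in the same order as in &lt;code&gt;sd_bus_call_method()&lt;/code&gt;, minus the error and reply pointers: service, object path, interface, method, then the signature followed by the arguments.&lt;/p&gt;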
&lt;h1&gt;Implementing a Service, in C, with sd-bus&lt;/h1&gt;
&lt;p&gt;Of course, just calling a single method is a rather simplistic
example. Let's have a look at how to write a bus service. We'll write
a small calculator service that exposes a single object, which
implements an interface with two methods: one to multiply two
64-bit signed integers, and one to divide one 64-bit signed integer by
another.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;
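&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;stdlib.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;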
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;errno.h&amp;gt;&lt;/span&gt;
&lt;span class="cp"&gt;#include&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cpf"&gt;&amp;lt;systemd/sd-bus.h&amp;gt;&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;method_multiply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sd_bus_message&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;userdata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_error&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ret_error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Read the parameters */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_message_read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;xx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Failed to parse parameters: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Reply with the response */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_reply_method_return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;method_divide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sd_bus_message&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;void&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;userdata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_error&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ret_error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Read the parameters */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_message_read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;xx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Failed to parse parameters: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Return an error on division by zero */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;sd_bus_error_set_const&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ret_error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;net.poettering.DivisionByZero&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Sorry, can&amp;#39;t allow division by zero.&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;EINVAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_reply_method_return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="cm"&gt;/* The vtable of our little object, implements the net.poettering.Calculator interface */&lt;/span&gt;
&lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_vtable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;calculator_vtable&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;SD_BUS_VTABLE_START&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;SD_BUS_METHOD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Multiply&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;xx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_multiply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SD_BUS_VTABLE_UNPRIVILEGED&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;SD_BUS_METHOD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Divide&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;xx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;method_divide&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;SD_BUS_VTABLE_UNPRIVILEGED&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;SD_BUS_VTABLE_END&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;argc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_bus_slot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_bus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Connect to the user bus this time */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_open_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Failed to connect to user bus: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Install the object */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_add_object_vtable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;                                     &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;                                     &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;/net/poettering/Calculator&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="cm"&gt;/* object path */&lt;/span&gt;
&lt;span class="w"&gt;                                     &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;net.poettering.Calculator&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="cm"&gt;/* interface name */&lt;/span&gt;
&lt;span class="w"&gt;                                     &lt;/span&gt;&lt;span class="n"&gt;calculator_vtable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;                                     &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Failed to issue method call: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="cm"&gt;/* Take a well-known service name so that clients can find us */&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_request_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;net.poettering.Calculator&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Failed to acquire service name: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(;;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="cm"&gt;/* Process requests */&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Failed to process bus: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="cm"&gt;/* we processed a request, try to process another one, right-away */&lt;/span&gt;
&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="cm"&gt;/* Wait for the next request to process */&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;sd_bus_wait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint64_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Failed to wait on bus: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;strerror&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nl"&gt;finish&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_bus_slot_unref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slot&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;sd_bus_unref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bus&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EXIT_FAILURE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;EXIT_SUCCESS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Save this example as &lt;code&gt;bus-service.c&lt;/code&gt;, then build it with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;gcc&lt;span class="w"&gt; &lt;/span&gt;bus-service.c&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;bus-service&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;pkg-config&lt;span class="w"&gt; &lt;/span&gt;--cflags&lt;span class="w"&gt; &lt;/span&gt;--libs&lt;span class="w"&gt; &lt;/span&gt;libsystemd&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now, let's run it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;./bus-service
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In another terminal, let's try to talk to it. Note that this service
is now on the user bus, not on the system bus as before. We do this
for simplicity reasons: on the system bus access to services is
tightly controlled so unprivileged clients cannot request privileged
operations. On the user bus however things are simpler: as only
processes of the user owning the bus can connect no further policy
enforcement will complicate this example. Because the service is on
the user bus, we have to pass the &lt;code&gt;--user&lt;/code&gt; switch on the &lt;code&gt;busctl&lt;/code&gt;
command line. Let's start with looking at the service's object tree.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;busctl&lt;span class="w"&gt; &lt;/span&gt;--user&lt;span class="w"&gt; &lt;/span&gt;tree&lt;span class="w"&gt; &lt;/span&gt;net.poettering.Calculator
&lt;span class="go"&gt;└─/net/poettering/Calculator&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As we can see, there's only a single object on the service, which is
not surprising, given that our code above only registered one. Let's
see the interfaces and the members this object exposes:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;busctl&lt;span class="w"&gt; &lt;/span&gt;--user&lt;span class="w"&gt; &lt;/span&gt;introspect&lt;span class="w"&gt; &lt;/span&gt;net.poettering.Calculator&lt;span class="w"&gt; &lt;/span&gt;/net/poettering/Calculator
&lt;span class="go"&gt;NAME                                TYPE      SIGNATURE RESULT/VALUE FLAGS&lt;/span&gt;
&lt;span class="go"&gt;net.poettering.Calculator           interface -         -            -&lt;/span&gt;
&lt;span class="go"&gt;.Divide                             method    xx        x            -&lt;/span&gt;
&lt;span class="go"&gt;.Multiply                           method    xx        x            -&lt;/span&gt;
&lt;span class="go"&gt;org.freedesktop.DBus.Introspectable interface -         -            -&lt;/span&gt;
&lt;span class="go"&gt;.Introspect                         method    -         s            -&lt;/span&gt;
&lt;span class="go"&gt;org.freedesktop.DBus.Peer           interface -         -            -&lt;/span&gt;
&lt;span class="go"&gt;.GetMachineId                       method    -         s            -&lt;/span&gt;
&lt;span class="go"&gt;.Ping                               method    -         -            -&lt;/span&gt;
&lt;span class="go"&gt;org.freedesktop.DBus.Properties     interface -         -            -&lt;/span&gt;
&lt;span class="go"&gt;.Get                                method    ss        v            -&lt;/span&gt;
&lt;span class="go"&gt;.GetAll                             method    s         a{sv}        -&lt;/span&gt;
&lt;span class="go"&gt;.Set                                method    ssv       -            -&lt;/span&gt;
&lt;span class="go"&gt;.PropertiesChanged                  signal    sa{sv}as  -            -&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The sd-bus library automatically added a couple of generic interfaces,
as mentioned above. But the first interface we see is actually the one
we added! It shows our two methods, and both take "xx" (two 64-bit
signed integers) as input parameters, and return one "x". Great! But
does it work?&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;busctl&lt;span class="w"&gt; &lt;/span&gt;--user&lt;span class="w"&gt; &lt;/span&gt;call&lt;span class="w"&gt; &lt;/span&gt;net.poettering.Calculator&lt;span class="w"&gt; &lt;/span&gt;/net/poettering/Calculator&lt;span class="w"&gt; &lt;/span&gt;net.poettering.Calculator&lt;span class="w"&gt; &lt;/span&gt;Multiply&lt;span class="w"&gt; &lt;/span&gt;xx&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;7&lt;/span&gt;
&lt;span class="go"&gt;x 35&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Woohoo! We passed the two integers 5 and 7, and the service actually
multiplied them for us and returned a single integer 35! Let's try the
other method:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;busctl&lt;span class="w"&gt; &lt;/span&gt;--user&lt;span class="w"&gt; &lt;/span&gt;call&lt;span class="w"&gt; &lt;/span&gt;net.poettering.Calculator&lt;span class="w"&gt; &lt;/span&gt;/net/poettering/Calculator&lt;span class="w"&gt; &lt;/span&gt;net.poettering.Calculator&lt;span class="w"&gt; &lt;/span&gt;Divide&lt;span class="w"&gt; &lt;/span&gt;xx&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;99&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;17&lt;/span&gt;
&lt;span class="go"&gt;x 5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Oh, wow! It can even do integer division! Fantastic! But let's trick
it into dividing by zero:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gp"&gt;$ &lt;/span&gt;busctl&lt;span class="w"&gt; &lt;/span&gt;--user&lt;span class="w"&gt; &lt;/span&gt;call&lt;span class="w"&gt; &lt;/span&gt;net.poettering.Calculator&lt;span class="w"&gt; &lt;/span&gt;/net/poettering/Calculator&lt;span class="w"&gt; &lt;/span&gt;net.poettering.Calculator&lt;span class="w"&gt; &lt;/span&gt;Divide&lt;span class="w"&gt; &lt;/span&gt;xx&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;43&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
&lt;span class="go"&gt;Sorry, can&amp;#39;t allow division by zero.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Nice! It detected this and returned a clean error about it. If
you look in the source code example above you'll see how precisely we
generated the error.&lt;/p&gt;
&lt;p&gt;And that's really all I have for today. Of course, the examples I
showed are short, and I don't get into detail here on what precisely
each line does. However, this is supposed to be a short introduction
into D-Bus and sd-bus, and it's already way too long for that …&lt;/p&gt;
&lt;p&gt;I hope this blog story was useful to you. If you are interested in
using sd-bus for your own programs, I hope this gets you started. If
you have further questions, check the (incomplete) man pages, and
ask us on IRC or the systemd mailing list. If you need more
examples, have a look at the systemd source tree: all of systemd's
many bus services use sd-bus extensively.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 19 Jun 2015 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2015-06-19:/blog/the-new-sd-bus-api-of-systemd.html</guid><category>projects</category></item><item><title>Revisiting How We Put Together Linux Systems</title><link>https://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html</link><description>&lt;p&gt;In a previous blog story I discussed
&lt;a href="http://0pointer.net/blog/projects/stateless.html"&gt;Factory Reset, Stateless Systems, Reproducible Systems &amp;amp; Verifiable Systems&lt;/a&gt;,
I now want to take the opportunity to explain a bit where we want to
take this with
&lt;a href="http://www.freedesktop.org/wiki/Software/systemd/"&gt;systemd&lt;/a&gt; in the
longer run, and what we want to build out of it. This is going to be a
longer story, so better grab a cold bottle of
&lt;a href="https://en.wikipedia.org/wiki/Club-Mate"&gt;Club Mate&lt;/a&gt; before you start
reading.&lt;/p&gt;
&lt;p&gt;Traditional Linux distributions are built around packaging systems
like RPM or dpkg, and an organization model where upstream developers
and downstream packagers are relatively clearly separated: an upstream
developer writes code, and puts it somewhere online, in a tarball. A
packager then grabs it and turns it into RPMs/DEBs. The user then
grabs these RPMs/DEBs and installs them locally on the system. For a
variety of uses this is a fantastic scheme: users have a large
selection of readily packaged software available, in mostly uniform
packaging, from a single source they can trust. In this scheme the
distribution vets all software it packages, and as long as the user
trusts the distribution all should be good. The distribution takes
responsibility for ensuring the software is not malicious, for fixing
security problems in a timely manner, and for helping the user if something is wrong.&lt;/p&gt;
&lt;h1&gt;Upstream Projects&lt;/h1&gt;
&lt;p&gt;However, this scheme also has a number of problems, and doesn't fit
many use-cases of our software particularly well. Let's have a look at
the problems of this scheme for many upstreams:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Upstream software vendors are fully dependent on downstream
  distributions to package their stuff. It's the downstream
  distribution that decides on schedules, packaging details, and how
  to handle support. Often upstream vendors want much faster release
  cycles than the downstream distributions follow.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Realistic testing is extremely unreliable and next to
  impossible. Since the end-user can run a variety of different
  package versions together, and expects the software they run to just
  work on any combination, the test matrix explodes. If upstream tests
  its version on distribution X release Y, then there's no guarantee
  that that's the precise combination of packages that the end user
  will eventually run. In fact, it is very unlikely that the end user
  will, since most distributions probably updated a number of
  libraries the package relies on by the time the package ends up being
  made available to the user. The fact that each package can be
  individually updated by the user, and each user can combine library
  versions, plug-ins and executables relatively freely, results in a high
  risk of something going wrong.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Since there are so many different distributions in so many different
  versions around, if upstream tries to build and test software for
  them it needs to do so for a large number of distributions, which is
  a massive effort.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The distributions are actually quite different in many ways. In
  fact, they are different in a lot of the most basic
  functionality. For example, the path where x86-64 libraries go
  is different on Fedora and Debian derived systems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Developing software for a number of distributions and versions is
  hard: if you want to do it, you need to actually install them, each
  one of them, manually, and then build your software for each.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Since most downstream distributions have strict licensing and
  trademark requirements (and rightly so), any kind of closed source
  software (or otherwise non-free) does not fit into this scheme at
  all.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of this together makes it really hard for many upstreams to work
nicely with the way Linux currently works. Often they try to improve
the situation for themselves, for example by bundling libraries to make
their test and build matrices smaller.&lt;/p&gt;
&lt;h1&gt;System Vendors&lt;/h1&gt;
&lt;p&gt;The &lt;em&gt;toolbox&lt;/em&gt; approach of classic Linux distributions is fantastic for
people who want to put together their individual system, nicely
adjusted to exactly what they need. However, this is not really how
many of today's Linux systems are built, installed or updated. If you
build any kind of embedded device, a server system, or even user
systems, you frequently do your work based on complete system images,
that are linearly versioned. You build these images somewhere, and
then you replicate them atomically to a larger number of systems. On
these systems, you don't install or remove packages; you get a defined
set of files, and besides installing or updating the system there is
no way to change the set of tools you get.&lt;/p&gt;
&lt;p&gt;The current Linux distributions are not particularly good at providing
for this major use-case of Linux. Their strict focus on individual
packages, and on package managers as the end-user install and update
tool, is incompatible with what many system vendors want.&lt;/p&gt;
&lt;h1&gt;Users&lt;/h1&gt;
&lt;p&gt;The classic Linux distribution scheme is frequently not what end users
want, either. Many users are used to app markets like the ones Android,
Windows or iOS/Mac have. Such a market is a platform that doesn't package, build or
maintain software like distributions do, but simply allows users to
quickly find and download the software they need, with the app vendor
responsible for keeping the app updated, secured, and all that on the
vendor's release cycle. Users tend to be impatient. They want their
software quickly, and the fine distinction between trusting a single
distribution or a myriad of app developers individually is usually not
important for them. The companies behind the marketplaces usually try
to mitigate this trust problem by providing sand-boxing technologies: as
a replacement for the distribution that audits, vets, builds and
packages the software and thus allows users to trust it to a certain
level, these vendors try to find technical solutions to ensure that
the software they offer for download can't be malicious.&lt;/p&gt;
&lt;h1&gt;Existing Approaches To Fix These Problems&lt;/h1&gt;
&lt;p&gt;Now, all the issues pointed out above are not new, and there are
sometimes quite successful attempts to do something about them. Ubuntu
Apps, Docker, Software Collections, ChromeOS, CoreOS all fix part of
this problem set, usually with a strict focus on one facet of Linux
systems. For example, Ubuntu Apps focus strictly on end user (desktop)
applications, and don't care about how we build/update/install the OS
itself, or containers. Docker OTOH focuses on containers only, and
doesn't care about end-user apps. Software Collections tries to focus
on the development environments. ChromeOS focuses on the OS itself,
but only for end-user devices. CoreOS also focuses on the OS, but
only for server systems.&lt;/p&gt;
&lt;p&gt;The approaches they find are usually good at specific things, and use
a variety of different technologies, on different layers. However,
none of these projects tried to fix these problems in a generic way,
for all uses, right in the core components of the OS itself.&lt;/p&gt;
&lt;p&gt;Linux has achieved tremendous success because its kernel is so
generic: you can build supercomputers and tiny embedded devices out of
it. It's time we come up with a basic, reusable scheme for solving
the problem set described above that is equally generic.&lt;/p&gt;
&lt;h1&gt;What We Want&lt;/h1&gt;
&lt;p&gt;The systemd cabal (Kay Sievers, Harald Hoyer, Daniel Mack, Tom
Gundersen, David Herrmann, and yours truly) recently met in Berlin
about all these things, and tried to come up with a scheme that is
somewhat simple, but tries to solve the issues generically, for all
use-cases, as part of the systemd project. All that in a way that is
somewhat compatible with the current scheme of distributions, to allow
a slow, gradual adoption. Also, and that's something one cannot stress
enough: the &lt;em&gt;toolbox&lt;/em&gt; scheme of classic Linux distributions is
actually a good one, and for many cases the right one. However, we
need to make sure we make distributions relevant again for &lt;em&gt;all&lt;/em&gt;
use-cases, not just those of highly individualized systems.&lt;/p&gt;
&lt;p&gt;Anyway, so let's summarize what we are trying to do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We want an efficient way that allows vendors to package their
  software (regardless of whether it is just an app or the whole OS) directly for
  the end user, and know the precise combination of libraries and
  packages it will operate with.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want to allow end users and administrators to install these
  packages on their systems, regardless of which distribution they have
  installed on them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want a unified solution that ultimately can cover updates for
  full systems, OS containers, end user apps, programming ABIs, and
  more. These updates shall be double-buffered (at least). This is an
  absolute necessity if we want to prepare the ground for operating
  systems that manage themselves, that can update safely without
  administrator involvement.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want our images to be trustable (i.e. signed). In fact we want a
  fully trustable OS, with images that can be verified by a full
  trust chain from the firmware (EFI SecureBoot!), through the boot loader, through the
  kernel, and initrd. Cryptographically secure verification of the
  code we execute is relevant on the desktop (like ChromeOS does), but
  also for apps, for embedded devices and even on servers (in a post-Snowden
  world, in particular).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;What We Propose&lt;/h1&gt;
&lt;p&gt;So much for the set of problems, and what we are trying to do. So,
now, let's discuss the technical bits we came up with:&lt;/p&gt;
&lt;p&gt;The scheme we propose is built around the variety of concepts of btrfs
and Linux file system name-spacing. btrfs at this point already has a
large number of features that fit neatly in our concept, and the
maintainers are busy working on a couple of others we want to
eventually make use of.&lt;/p&gt;
&lt;p&gt;As the first part of our proposal we make heavy use of btrfs sub-volumes and
introduce a clear naming scheme for them. We name sub-volumes like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;usr:&amp;lt;vendorid&amp;gt;:&amp;lt;architecture&amp;gt;:&amp;lt;version&amp;gt;&lt;/code&gt; -- This refers to a full
  vendor operating system tree. It's basically a /usr tree (and no
  other directories), in a specific version, with everything you need to boot
  it up inside it. The &lt;code&gt;&amp;lt;vendorid&amp;gt;&lt;/code&gt; field is replaced by some vendor
  identifier, maybe a scheme like
  &lt;code&gt;org.fedoraproject.FedoraWorkstation&lt;/code&gt;. The &lt;code&gt;&amp;lt;architecture&amp;gt;&lt;/code&gt; field
  specifies a CPU architecture the OS is designed for, for example
  &lt;code&gt;x86_64&lt;/code&gt;. The &lt;code&gt;&amp;lt;version&amp;gt;&lt;/code&gt; field specifies a specific OS version, for
  example &lt;code&gt;23.4&lt;/code&gt;. An example sub-volume name could hence look like this:
  &lt;code&gt;usr:org.fedoraproject.FedoraWorkstation:x86_64:23.4&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;root:&amp;lt;name&amp;gt;:&amp;lt;vendorid&amp;gt;:&amp;lt;architecture&amp;gt;&lt;/code&gt; -- This refers to an
  &lt;em&gt;instance&lt;/em&gt; of an operating system. It's basically a root directory,
  containing primarily /etc and /var (but possibly more). Sub-volumes
  of this type do not contain a populated /usr tree though. The
  &lt;code&gt;&amp;lt;name&amp;gt;&lt;/code&gt; field refers to some instance name (maybe the host name of
  the instance). The other fields are defined as above. An example
  sub-volume name is
  &lt;code&gt;root:revolution:org.fedoraproject.FedoraWorkstation:x86_64&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;runtime:&amp;lt;vendorid&amp;gt;:&amp;lt;architecture&amp;gt;:&amp;lt;version&amp;gt;&lt;/code&gt; -- This refers to a
  vendor &lt;em&gt;runtime&lt;/em&gt;. A runtime here is supposed to be a set of
  libraries and other resources that are needed to run apps (for the
  concept of &lt;em&gt;apps&lt;/em&gt; see below), all in a /usr tree. In this regard this
  is very similar to the &lt;code&gt;usr&lt;/code&gt; sub-volumes explained above, however,
  while a &lt;code&gt;usr&lt;/code&gt; sub-volume is a full OS and contains everything
  necessary to boot, a runtime is really only a set of
  libraries. You cannot boot it, but you can run apps with it. An
  example sub-volume name is: &lt;code&gt;runtime:org.gnome.GNOME3_20:x86_64:3.20.1&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;framework:&amp;lt;vendorid&amp;gt;:&amp;lt;architecture&amp;gt;:&amp;lt;version&amp;gt;&lt;/code&gt; -- This is very
  similar to a vendor runtime, as described above: it contains just a
  /usr tree, but goes one step further: it additionally contains all
  development headers, compilers and build tools, that allow
  developing against a specific runtime. For each runtime there should
  be a framework. When you develop against a specific framework in a
  specific architecture, then the resulting app will be compatible
  with the runtime of the same vendor ID and architecture. Example:
  &lt;code&gt;framework:org.gnome.GNOME3_20:x86_64:3.20.1&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;app:&amp;lt;vendorid&amp;gt;:&amp;lt;runtime&amp;gt;:&amp;lt;architecture&amp;gt;:&amp;lt;version&amp;gt;&lt;/code&gt; -- This
  encapsulates an application bundle. It contains a tree that at
  runtime is mounted to &lt;code&gt;/opt/&amp;lt;vendorid&amp;gt;&lt;/code&gt;, and contains all the
  application's resources. The &lt;code&gt;&amp;lt;vendorid&amp;gt;&lt;/code&gt; could be a string like
  &lt;code&gt;org.libreoffice.LibreOffice&lt;/code&gt;, the &lt;code&gt;&amp;lt;runtime&amp;gt;&lt;/code&gt; refers to the
  vendor ID of the one specific runtime the application is built for, for
  example &lt;code&gt;org.gnome.GNOME3_20:3.20.1&lt;/code&gt;. The &lt;code&gt;&amp;lt;architecture&amp;gt;&lt;/code&gt; and
  &lt;code&gt;&amp;lt;version&amp;gt;&lt;/code&gt; refer to the architecture the application is built for,
  and of course its version. Example:
  &lt;code&gt;app:org.libreoffice.LibreOffice:GNOME3_20:x86_64:133&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;home:&amp;lt;user&amp;gt;:&amp;lt;uid&amp;gt;:&amp;lt;gid&amp;gt;&lt;/code&gt; -- This sub-volume shall refer to the home
  directory of the specific user. The &lt;code&gt;&amp;lt;user&amp;gt;&lt;/code&gt; field contains the user
  name, the &lt;code&gt;&amp;lt;uid&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;gid&amp;gt;&lt;/code&gt; fields the numeric Unix UIDs and GIDs
  of the user. The idea here is that in the long run the list of
  sub-volumes is sufficient as a user database (but see
  below). Example: &lt;code&gt;home:lennart:1000:1000&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
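&lt;p&gt;To make the naming scheme concrete, here is a minimal Python sketch that splits a sub-volume name into its type and named fields. The function name &lt;code&gt;parse_subvolume&lt;/code&gt; and the &lt;code&gt;FIELDS&lt;/code&gt; table are made up for illustration, and the sketch assumes that no field contains a colon itself, as in the full example names used in this post.&lt;/p&gt;

```python
# Field layouts per sub-volume type, following the naming scheme above.
# The "usr" layout is inferred from the example names used in this post.
FIELDS = {
    "usr": ("vendorid", "architecture", "version"),
    "root": ("name", "vendorid", "architecture"),
    "runtime": ("vendorid", "architecture", "version"),
    "framework": ("vendorid", "architecture", "version"),
    "app": ("vendorid", "runtime", "architecture", "version"),
    "home": ("user", "uid", "gid"),
}


def parse_subvolume(name):
    """Split a sub-volume name into (type, dict of named fields)."""
    kind, *values = name.split(":")
    if kind not in FIELDS:
        raise ValueError("unknown sub-volume type: " + kind)
    keys = FIELDS[kind]
    if len(values) != len(keys):
        raise ValueError("wrong number of fields in: " + name)
    return kind, dict(zip(keys, values))


if __name__ == "__main__":
    print(parse_subvolume("home:lennart:1000:1000"))
```

&lt;p&gt;All fields are kept as strings here; a real implementation would probably want to validate the UID and GID fields as numbers.&lt;/p&gt;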
&lt;p&gt;btrfs partitions that adhere to this naming scheme should be clearly
identifiable. It is our intention to introduce a new GPT partition type
ID for this.&lt;/p&gt;
&lt;h1&gt;How To Use It&lt;/h1&gt;
&lt;p&gt;Now that we have introduced this naming scheme, let's see what we can
build with it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;When booting up a system we mount the root directory from one of the
  &lt;code&gt;root&lt;/code&gt; sub-volumes, and then mount /usr from a matching &lt;code&gt;usr&lt;/code&gt;
  sub-volume. &lt;em&gt;Matching&lt;/em&gt; here means it carries the same &lt;code&gt;&amp;lt;vendorid&amp;gt;&lt;/code&gt;
  and &lt;code&gt;&amp;lt;architecture&amp;gt;&lt;/code&gt;. Of course, by default we should pick the
  matching &lt;code&gt;usr&lt;/code&gt; sub-volume with the newest version.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When we boot up an OS container, we do exactly the same as when
  we boot up a regular system: we simply combine a &lt;code&gt;usr&lt;/code&gt; sub-volume
  with a &lt;code&gt;root&lt;/code&gt; sub-volume.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When we enumerate the system's users we simply go through the
  list of &lt;code&gt;home&lt;/code&gt; snapshots.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When a user authenticates and logs in we mount his home
  directory from his snapshot.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When an app is run, we set up a new file system name-space, mount the
  &lt;code&gt;app&lt;/code&gt; sub-volume to &lt;code&gt;/opt/&amp;lt;vendorid&amp;gt;/&lt;/code&gt;, and the appropriate &lt;code&gt;runtime&lt;/code&gt;
  sub-volume the app picked to &lt;code&gt;/usr&lt;/code&gt;, as well as the user's
  &lt;code&gt;/home/$USER&lt;/code&gt; to its place.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When a developer wants to develop against a specific runtime he
  installs the right framework, and then temporarily transitions into
  a name space where &lt;code&gt;/usr&lt;/code&gt; is mounted from the framework sub-volume, and
  &lt;code&gt;/home/$USER&lt;/code&gt; from his own home directory. In this name space he then
  runs his build commands. He can build in multiple name spaces at the
  same time, if he intends to build software for multiple runtimes or
  architectures at the same time.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
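&lt;p&gt;The matching rule from the first item above can be sketched in a few lines of Python: given a &lt;code&gt;root&lt;/code&gt; sub-volume name, pick the &lt;code&gt;usr&lt;/code&gt; sub-volume with the same vendor ID and architecture and the newest version. How versions are ordered is not defined by this proposal; the &lt;code&gt;version_key&lt;/code&gt; helper below is just one plausible assumption (numeric runs compare as numbers, everything else as text), and both function names are hypothetical.&lt;/p&gt;

```python
import re


def version_key(version):
    """Assumed ordering: digit runs compare numerically, other runs as text."""
    pairs = re.findall(r"(\d+)|(\D+)", version, re.ASCII)
    return [(0, int(num), "") if num else (1, 0, text) for num, text in pairs]


def pick_usr(root_name, subvolumes):
    """Return the newest usr sub-volume matching a root sub-volume, or None."""
    _, _, vendorid, architecture = root_name.split(":")
    prefix = "usr:" + vendorid + ":" + architecture + ":"
    candidates = [s for s in subvolumes if s.startswith(prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda s: version_key(s[len(prefix):]))


if __name__ == "__main__":
    volumes = [
        "usr:org.archlinux.Desktop:x86_64:302.7.9",
        "usr:org.archlinux.Desktop:x86_64:302.7.10",
    ]
    print(pick_usr("root:bar:org.archlinux.Desktop:x86_64", volumes))
```

&lt;p&gt;Comparing the numeric parts as numbers matters: a plain string comparison would rank 302.7.9 above 302.7.10. Whether pre-release versions such as a beta should ever be picked by default is a policy question this sketch does not try to answer.&lt;/p&gt;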
&lt;p&gt;Instantiating a new system or OS container (which is exactly the same
in this scheme) just consists of creating a new appropriately named
&lt;code&gt;root&lt;/code&gt; sub-volume. Completely naturally you can share one vendor OS
copy in one specific version with a multitude of container instances.&lt;/p&gt;
&lt;p&gt;Everything is &lt;em&gt;double-buffered&lt;/em&gt; (or actually, n-fold-buffered), because
&lt;code&gt;usr&lt;/code&gt;, &lt;code&gt;runtime&lt;/code&gt;, &lt;code&gt;framework&lt;/code&gt;, &lt;code&gt;app&lt;/code&gt; sub-volumes can exist in multiple
versions. Of course, by default the execution logic should always pick
the newest release of each sub-volume, but it is up to the user to keep
multiple versions around, and possibly execute older versions, if he
desires to do so. In fact, like on ChromeOS this could even be handled
automatically: if a system fails to boot with a newer snapshot, the
boot loader can automatically revert back to an older version of the
OS.&lt;/p&gt;
&lt;h1&gt;An Example&lt;/h1&gt;
&lt;p&gt;Note that as a result this allows installing not only multiple end-user
applications into the same btrfs volume, but also multiple operating
systems, multiple system instances, multiple runtimes, multiple
frameworks. Or to spell this out in an example:&lt;/p&gt;
&lt;p&gt;Let's say Fedora, Mageia and ArchLinux all implement this scheme,
and provide ready-made end-user images. Also, the GNOME, KDE, SDL
projects all define a runtime+framework to develop against. Finally,
both LibreOffice and Firefox provide their stuff according to this
scheme. You can now trivially install all of these into the same btrfs
volume:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;usr:org.fedoraproject.WorkStation:x86_64:24.7&lt;/li&gt;
&lt;li&gt;usr:org.fedoraproject.WorkStation:x86_64:24.8&lt;/li&gt;
&lt;li&gt;usr:org.fedoraproject.WorkStation:x86_64:24.9&lt;/li&gt;
&lt;li&gt;usr:org.fedoraproject.WorkStation:x86_64:25beta&lt;/li&gt;
&lt;li&gt;usr:org.mageia.Client:i386:39.3&lt;/li&gt;
&lt;li&gt;usr:org.mageia.Client:i386:39.4&lt;/li&gt;
&lt;li&gt;usr:org.mageia.Client:i386:39.6&lt;/li&gt;
&lt;li&gt;usr:org.archlinux.Desktop:x86_64:302.7.8&lt;/li&gt;
&lt;li&gt;usr:org.archlinux.Desktop:x86_64:302.7.9&lt;/li&gt;
&lt;li&gt;usr:org.archlinux.Desktop:x86_64:302.7.10&lt;/li&gt;
&lt;li&gt;root:revolution:org.fedoraproject.WorkStation:x86_64&lt;/li&gt;
&lt;li&gt;root:testmachine:org.fedoraproject.WorkStation:x86_64&lt;/li&gt;
&lt;li&gt;root:foo:org.mageia.Client:i386&lt;/li&gt;
&lt;li&gt;root:bar:org.archlinux.Desktop:x86_64&lt;/li&gt;
&lt;li&gt;runtime:org.gnome.GNOME3_20:x86_64:3.20.1&lt;/li&gt;
&lt;li&gt;runtime:org.gnome.GNOME3_20:x86_64:3.20.4&lt;/li&gt;
&lt;li&gt;runtime:org.gnome.GNOME3_20:x86_64:3.20.5&lt;/li&gt;
&lt;li&gt;runtime:org.gnome.GNOME3_22:x86_64:3.22.0&lt;/li&gt;
&lt;li&gt;runtime:org.kde.KDE5_6:x86_64:5.6.0&lt;/li&gt;
&lt;li&gt;framework:org.gnome.GNOME3_22:x86_64:3.22.0&lt;/li&gt;
&lt;li&gt;framework:org.kde.KDE5_6:x86_64:5.6.0&lt;/li&gt;
&lt;li&gt;app:org.libreoffice.LibreOffice:GNOME3_20:x86_64:133&lt;/li&gt;
&lt;li&gt;app:org.libreoffice.LibreOffice:GNOME3_22:x86_64:166&lt;/li&gt;
&lt;li&gt;app:org.mozilla.Firefox:GNOME3_20:x86_64:39&lt;/li&gt;
&lt;li&gt;app:org.mozilla.Firefox:GNOME3_20:x86_64:40&lt;/li&gt;
&lt;li&gt;home:lennart:1000:1000&lt;/li&gt;
&lt;li&gt;home:hrundivbakshi:1001:1001&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the example above, we have three vendor operating systems
installed. All of them in three versions, and one even in a beta
version. We have four system instances around. Two of them of Fedora,
maybe one of them we usually boot from, the other we run for very
specific purposes in an OS container. We also have the runtimes for
two GNOME releases in multiple versions, plus one for KDE. Then, we
have the development trees for one version of KDE and GNOME around, as
well as two apps, that make use of two releases of the GNOME
runtime. Finally, we have the home directories of two users.&lt;/p&gt;
&lt;p&gt;Now, with the name-spacing concepts we introduced above, we can
actually relatively freely mix and match apps and OSes, or develop
against specific frameworks in specific versions on any operating
system. It doesn't matter if you booted your ArchLinux instance, or
your Fedora one, you can execute both LibreOffice and Firefox just
fine, because at execution time they get matched up with the right
runtime, and all of them are available from all the operating systems
you installed. You get the precise runtime that the upstream vendor of
Firefox/LibreOffice did their testing with. It doesn't matter anymore
which distribution you run, and which distribution the vendor prefers.&lt;/p&gt;
&lt;p&gt;Also, given that the user database is actually encoded in the
sub-volume list, it doesn't matter which system you boot, the
distribution should be able to find your local users automatically,
without any configuration in /etc/passwd.&lt;/p&gt;
&lt;h1&gt;Building Blocks&lt;/h1&gt;
&lt;p&gt;With this naming scheme, plus the way we can combine the sub-volumes at
execution time, we have already come quite far, but how do we actually get these
sub-volumes onto the final machines, and how do we update them? Well,
btrfs has a feature they call "send-and-receive". It basically allows
you to "diff" two file system versions, and generate a binary
delta. You can generate these deltas on a developer's machine and then
push them into the user's system, and he'll get the exact same
sub-volume too. This is how we envision installation and updating of
operating systems, applications, runtimes, frameworks. At installation
time, we simply deserialize an initial send-and-receive delta into
our btrfs volume, and later, when a new version is released we just
add in the few bits that are new, by dropping in another
send-and-receive delta under a new sub-volume name. And we do it
exactly the same for the OS itself, for a runtime, a framework or an
app. There's no technical distinction anymore. The underlying
operation for installing apps, runtimes, frameworks, vendor OSes, as well
as the operation for updating them is done the exact same way for all.&lt;/p&gt;
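&lt;p&gt;As a rough illustration of the mechanics, the following Python sketch builds the command lines for the existing &lt;code&gt;btrfs send&lt;/code&gt; and &lt;code&gt;btrfs receive&lt;/code&gt; tools and pipes one into the other. The helper names and any paths passed in are hypothetical; in the scenario described above the two halves would run on different machines, with the serialized stream shipped in between, so the local pipe here only covers the same-host case.&lt;/p&gt;

```python
import subprocess


def send_argv(snapshot, parent=None):
    """Build the argv for btrfs send; with a parent only the delta is sent."""
    argv = ["btrfs", "send"]
    if parent is not None:
        argv += ["-p", parent]
    return argv + [snapshot]


def receive_argv(volume):
    """Build the argv for btrfs receive into the given mounted btrfs volume."""
    return ["btrfs", "receive", volume]


def replicate(snapshot, volume, parent=None):
    """Pipe btrfs send into btrfs receive; needs root and read-only snapshots."""
    sender = subprocess.Popen(send_argv(snapshot, parent), stdout=subprocess.PIPE)
    receiver = subprocess.run(receive_argv(volume), stdin=sender.stdout)
    sender.stdout.close()
    return sender.wait() == 0 and receiver.returncode == 0
```

&lt;p&gt;An initial installation would call &lt;code&gt;replicate()&lt;/code&gt; without a parent, while an update would pass the previously installed version as &lt;code&gt;parent&lt;/code&gt;, which then has to exist on the receiving side as well.&lt;/p&gt;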
&lt;p&gt;Of course, keeping multiple full /usr trees around sounds like an
awful lot of waste, after all they will contain a lot of very similar
data, since a lot of resources are shared between distributions,
frameworks and runtimes. However, thankfully btrfs actually is able to
de-duplicate this for us. If we add in a new app snapshot, this simply
adds in the new files that changed. Moreover different runtimes and
operating systems might actually end up sharing the same tree.&lt;/p&gt;
&lt;p&gt;Even though the example above focuses primarily on the end-user,
desktop side of things, the concept is also extremely powerful in
server scenarios. For example, it is easy to build your own &lt;code&gt;usr&lt;/code&gt;
trees and deliver them to your hosts using this scheme. The &lt;code&gt;usr&lt;/code&gt;
sub-volumes are supposed to be something that administrators can put
together. After deserializing them into a couple of hosts, you can
trivially instantiate them as OS containers there, simply by adding a
new &lt;code&gt;root&lt;/code&gt; sub-volume for each instance, referencing the &lt;code&gt;usr&lt;/code&gt; tree you
just put together. Instantiating OS containers hence becomes as easy
as creating a new btrfs sub-volume. And you can still update the images
nicely, get fully double-buffered updates and everything.&lt;/p&gt;
&lt;p&gt;And of course, this scheme also works great for embedded
use-cases. Regardless if you build a TV, an IVI system or a phone: you
can put together your OS versions as &lt;code&gt;usr&lt;/code&gt; trees, and then use
btrfs-send-and-receive facilities to deliver them to the systems, and
update them there.&lt;/p&gt;
&lt;p&gt;Many people when they hear the word "btrfs" instantly reply with "is
it ready yet?". Thankfully, most of the functionality we really need
here is strictly read-only. With the exception of the &lt;code&gt;home&lt;/code&gt;
sub-volumes (see below) all snapshots are strictly read-only, and are
delivered as immutable vendor trees onto the devices. They never are
changed. Even if btrfs might still be immature, for this kind of
read-only logic it should be more than good enough.&lt;/p&gt;
&lt;p&gt;Note that this scheme also enables doing &lt;em&gt;fat&lt;/em&gt; systems: for example,
an installer image could include a Fedora version compiled for x86-64,
one for i386, one for ARM, all in the same btrfs volume. Due to btrfs'
de-duplication they will share as much as possible, and when the image
is booted up the right sub-volume is automatically picked. Something
similar of course applies to the apps too!&lt;/p&gt;
&lt;p&gt;This also allows us to implement something that we like to call
&lt;em&gt;Operating-System-As-A-Virus&lt;/em&gt;. Installing a new system is little more
than:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Creating a new GPT partition table&lt;/li&gt;
&lt;li&gt;Adding an EFI System Partition (FAT) to it&lt;/li&gt;
&lt;li&gt;Adding a new btrfs volume to it&lt;/li&gt;
&lt;li&gt;Deserializing a single &lt;code&gt;usr&lt;/code&gt; sub-volume into the btrfs volume&lt;/li&gt;
&lt;li&gt;Installing a boot loader into the EFI System Partition&lt;/li&gt;
&lt;li&gt;Rebooting&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now, since the only real vendor data you need is the &lt;code&gt;usr&lt;/code&gt; sub-volume,
you can trivially duplicate this onto any block device you want. Let's
say you are a happy Fedora user, and you want to provide a friend with
his own installation of this awesome system, all on a USB stick. All
you have to do for this is do the steps above, using your installed
&lt;code&gt;usr&lt;/code&gt; tree as source to copy. And there you go! And you don't have to
be afraid that any of your personal data is copied too, as the &lt;code&gt;usr&lt;/code&gt;
sub-volume is the exact version your vendor provided you with. Or in
other words: there's no distinction anymore between installer images
and installed systems. It's all the same. Installation becomes
replication, nothing more. Live-CDs and installed systems can be fully
identical.&lt;/p&gt;
&lt;p&gt;Note that in this design apps are actually developed against a single,
very specific runtime, that contains all libraries it can link against
(including a specific glibc version!). Any library that is not
included in the runtime the developer picked must be included in the
app itself. This is similar to how apps on Android declare one very
specific Android version they are developed against. This greatly
simplifies application installation, as there's no dependency hell:
each app pulls in one runtime, and the app is actually free to pick
which one, as you can have multiple installed, though only one is used
by each app.&lt;/p&gt;
&lt;p&gt;Also note that operating systems built this way will never see
"half-updated" systems, as is common when a system is updated using
RPM/dpkg. When updating the system the code will either run the old or
the new version, but it will never see part of the old files and part
of the new files. This is the same for apps, runtimes, and frameworks,
too.&lt;/p&gt;
&lt;h1&gt;Where We Are Now&lt;/h1&gt;
&lt;p&gt;We are currently working on a lot of the groundwork necessary for
this. This scheme relies on the ability to monopolize the
vendor OS resources in /usr, which is the key of what I described in
&lt;a href="http://0pointer.net/blog/projects/stateless.html"&gt;Factory Reset, Stateless Systems, Reproducible Systems &amp;amp; Verifiable Systems&lt;/a&gt;
a few weeks back. Then, of course, for the full desktop app concept we
need a strong sandbox, that does more than just hiding files from the
file system view. After all with an app concept like the above the
primary interfacing between the executed desktop apps and the rest of the
system is via IPC (which is why we work on kdbus and teach it all
kinds of sand-boxing features), and the kernel itself. Harald Hoyer has
started working on generating the btrfs send-and-receive images based
on Fedora.&lt;/p&gt;
&lt;p&gt;Getting to the full scheme will take a while. Currently we have many
of the building blocks ready, but some major items are missing. For
example, we push quite a few problems into btrfs, that other solutions
try to solve in user space. One of them is actually
signing/verification of images. The btrfs maintainers are working on
adding this to the code base, but currently nothing exists. This
functionality is essential though to come to a fully verified system
where a trust chain exists all the way from the firmware to the
apps. Also, to make the &lt;code&gt;home&lt;/code&gt; sub-volume scheme fully workable we
actually need encrypted sub-volumes, so that the sub-volume's
pass-phrase can be used for authenticating users in PAM. This doesn't
exist either.&lt;/p&gt;
&lt;p&gt;Working towards this scheme is a gradual process. Many of the steps we
require for this are useful outside of the grand scheme though, which
means we can slowly work towards the goal, and our users can already
take benefit of what we are working on as we go.&lt;/p&gt;
&lt;p&gt;Also, and most importantly, this is not really a departure from
traditional operating systems:&lt;/p&gt;
&lt;p&gt;Each app and each OS sees a traditional Unix hierarchy with
/usr, /home, /opt, /var, /etc. It executes in an environment that is
pretty much identical to how it would be run on traditional systems.&lt;/p&gt;
&lt;p&gt;There's no need to fully move to a system that uses only btrfs and
follows strictly this sub-volume scheme. For example, we intend to
provide implicit support for systems that are installed on ext4 or
xfs, or that are put together with traditional packaging tools such as
RPM or dpkg: if the user tries to install a
runtime/app/framework/os image on a system that doesn't use btrfs so
far, it can just create a loop-back btrfs image in /var, and push the
data into that. Even us developers will run our stuff like this for a
while, after all this new scheme is not particularly useful for highly
individualized systems, and we developers usually tend to run
systems like that.&lt;/p&gt;
&lt;p&gt;Also note that this is in no way a departure from packaging systems like
RPM or DEB. Even if the new scheme we propose is used for installing
and updating a specific system, it is RPM/DEB that is used to put
together the vendor OS tree initially. Hence, even in this scheme
RPM/DEB are highly relevant, though not strictly as an end-user tool
anymore, but as a build tool.&lt;/p&gt;
&lt;h1&gt;So Let's Summarize Again What We Propose&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We want a unified scheme, how we can install and update OS images,
  user apps, runtimes and frameworks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want a unified scheme how you can relatively freely mix OS
  images, apps, runtimes and frameworks on the same system.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want a fully trusted system, where cryptographic verification of
  all executed code can be done, all the way to the firmware, as
  standard feature of the system.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want to allow app vendors to write their programs against very
  specific frameworks, under the knowledge that they will end up being
  executed with the exact same set of libraries chosen.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want to allow parallel installation of multiple OSes and versions
  of them, multiple runtimes in multiple versions, as well as multiple
  frameworks in multiple versions. And of course, multiple apps in
  multiple versions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want everything &lt;em&gt;double buffered&lt;/em&gt; (or actually n-fold buffered), to
  ensure we can reliably update/rollback versions, in particular to
  safely do automatic updates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want a system where updating a runtime, OS, framework, or OS
  container is as simple as adding in a new snapshot and restarting
  the runtime/OS/framework/OS container.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want a system where we can easily instantiate a number of OS
  instances from a single vendor tree, with zero difference for doing
  this in order to be able to boot it on bare metal/VM or as a
  container.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;We want to enable Linux to have an open scheme that people can use
  to build app markets and similar schemes, not restricted to a
  specific vendor.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Final Words&lt;/h1&gt;
&lt;p&gt;I'll be talking about this at LinuxCon Europe in October. I originally
intended to discuss this at the Linux Plumbers Conference (which I
assumed was the right forum for this kind of major plumbing level
improvement), and at linux.conf.au, but there was no interest in my
session submissions there...&lt;/p&gt;
&lt;p&gt;Of course this is all work in progress. These are our current ideas we
are working towards. As we progress we will likely change a number of
things. For example, the precise naming of the sub-volumes might look
very different in the end.&lt;/p&gt;
&lt;p&gt;Of course, we are developers of the systemd project. Implementing this
scheme is not just a job for the systemd developers. This is a
reinvention of how distributions work, and hence needs great support from
the distributions. We really hope we can trigger some interest by
publishing this proposal now, to get the distributions on board. This
after all is explicitly not supposed to be a solution for one specific
project and one specific vendor product, we care about making this
open, and solving it for the generic case, without cutting corners.&lt;/p&gt;
&lt;p&gt;If you have any questions about this, you know how you can reach us
(IRC, mail, G+, ...).&lt;/p&gt;
&lt;p&gt;The future is going to be awesome!&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 01 Sep 2014 00:00:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2014-09-01:/blog/revisiting-how-we-put-together-linux-systems.html</guid><category>projects</category></item><item><title>FUDCON + GNOME.Asia Beijing 2014</title><link>https://0pointer.net/blog/projects/fudcon-gnomeasia.html</link><description>
                
&lt;p&gt;Thanks to the funding from FUDCON I had the chance to attend and
keynote at the combined &lt;a href="https://fedoraproject.org/wiki/FUDCon:Beijing_2014"&gt;FUDCON Beijing 2014&lt;/a&gt;
and &lt;a href="http://2014.gnome.asia/"&gt;GNOME.Asia 2014&lt;/a&gt; conference in
Beijing, China.&lt;/p&gt;

&lt;p&gt;My talk was about systemd's present and future, what we achieved
and where we are going. In my talk I tried to explain a bit where we
are coming from, and how we changed focus from being purely an init
system, to more being a set of basic building blocks to build an OS
from. Most of the talk I talked about where we still intend to take
systemd, which areas we believe should be covered by systemd, and of
course, also the always difficult question, on where to draw the line
and what clearly is outside of the focus of systemd. The slides of my
talk you &lt;a href="http://0pointer.de/public/gnomeasia2014.pdf"&gt;find
online&lt;/a&gt;. (No video recording I am aware of, sorry.)&lt;/p&gt;

&lt;p&gt;The combined conferences were a lot of fun, and as usual, the best
discussions I had in the hallway track, discussing Linux and
systemd.&lt;/p&gt;

&lt;p&gt;A number of pictures of the conference are &lt;a href="https://plus.google.com/events/gallery/cqsjvgg7o125tkli6up5d60f83g"&gt;now
online&lt;/a&gt;. Enjoy!&lt;/p&gt;

&lt;p&gt;After the conference I stayed for a few more days in Beijing, doing
a bit of sightseeing. What a fantastic city! The food was amazing, we
tried all kinds of fantastic stuff, from Peking duck, to Bullfrog
Sichuan style. Yummy. And one of these days I am sure I will find the
time to actually sort my photos and put them online, too.&lt;/p&gt;

&lt;p&gt;I am really looking forward to the next FUDCON/GNOME.Asia!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 04 Jul 2014 18:43:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2014-07-04:/blog/projects/fudcon-gnomeasia.html</guid><category>projects</category></item><item><title>Factory Reset, Stateless Systems, Reproducible Systems &amp; Verifiable Systems</title><link>https://0pointer.net/blog/projects/stateless.html</link><description>
                
&lt;p&gt;&lt;small&gt;(Just a small heads-up: I don't blog as much as I used to, I
nowadays update my &lt;a href="https://plus.google.com/u/0/+LennartPoetteringTheOneAndOnly/posts"&gt;Google+
page&lt;/a&gt; a lot more frequently. You might want to subscribe to that if
you are interested in more frequent technical updates on what we are
working on.)&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;In the past weeks we have been working on a couple of features for
&lt;a href="http://www.freedesktop.org/wiki/Software/systemd/"&gt;systemd&lt;/a&gt;
that enable a number of new usecases I'd like to shed some light
on. Taking benefit of the &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/TheCaseForTheUsrMerge/"&gt;&lt;tt&gt;/usr&lt;/tt&gt;
merge&lt;/a&gt; that a number of distributions have completed we want to
bring runtime behaviour of Linux systems to the next level. With the
&lt;tt&gt;/usr&lt;/tt&gt; merge completed most static vendor-supplied OS data is
found exclusively in &lt;tt&gt;/usr&lt;/tt&gt;, only a few additional bits in
&lt;tt&gt;/var&lt;/tt&gt; and &lt;tt&gt;/etc&lt;/tt&gt; are necessary to make a system
boot. On this we can build to enable a couple of new features:&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;A mechanism we call &lt;i&gt;Factory Reset&lt;/i&gt; shall flush out
&lt;tt&gt;/etc&lt;/tt&gt; and &lt;tt&gt;/var&lt;/tt&gt;, but keep the vendor-supplied
&lt;tt&gt;/usr&lt;/tt&gt;, bringing the system back into a well-defined, pristine
vendor state with no local state or configuration. This functionality
is useful across the board from servers, to desktops, to embedded
devices.&lt;/li&gt;

&lt;li&gt;A &lt;i&gt;Stateless System&lt;/i&gt; goes one step further: a system like
this never stores &lt;tt&gt;/etc&lt;/tt&gt; or &lt;tt&gt;/var&lt;/tt&gt; on persistent
storage, but always comes up with pristine vendor state. On systems
like this every reboot acts as a factory reset. This functionality is
particularly useful for simple containers or systems that boot off the
network or read-only media, and receive all configuration they need
during runtime from vendor packages or protocols like DHCP or are
capable of discovering their parameters automatically from the
available hardware or periphery.&lt;/li&gt;

&lt;li&gt;&lt;i&gt;Reproducible Systems&lt;/i&gt; multiply a vendor image into many
containers or systems. Only local configuration or state is stored
per-system, while the vendor operating system is pulled in from the
same, immutable, shared snapshot. Each system hence has its private
&lt;tt&gt;/etc&lt;/tt&gt; and &lt;tt&gt;/var&lt;/tt&gt; for receiving local configuration,
however the OS tree in &lt;tt&gt;/usr&lt;/tt&gt; is pulled in via bind mounts (in
case of containers) or technologies like NFS (in case of physical
systems), or btrfs snapshots from a &lt;i&gt;golden master&lt;/i&gt; image. This is
particularly interesting for containers where the goal is to run
thousands of container images from the same OS tree. However, it also
has a number of other usecases, for example thin client systems, which
can boot the same NFS share a number of times. Furthermore this
mechanism is useful to implement very simple OS installers, that
simply unserialize a &lt;tt&gt;/usr&lt;/tt&gt; snapshot into a file system,
install a boot loader, and reboot.&lt;/li&gt;

&lt;li&gt;&lt;i&gt;Verifiable Systems&lt;/i&gt; are closely related to stateless
systems: if the underlying storage technology can cryptographically
ensure that the vendor-supplied OS is trusted and in a consistent
state, then it must be made sure that &lt;tt&gt;/etc&lt;/tt&gt; or &lt;tt&gt;/var&lt;/tt&gt;
are either included in the OS image, or simply unnecessary for booting.&lt;/li&gt;

&lt;/ol&gt;

&lt;h3&gt;Concepts&lt;/h3&gt;

&lt;p&gt;A number of Linux-based operating systems have tried to implement
some of the schemes described above in one way or
another. Particularly interesting are &lt;a href="https://wiki.gnome.org/Projects/OSTree"&gt;GNOME's OSTree&lt;/a&gt;, &lt;a href="https://coreos.com/"&gt;CoreOS&lt;/a&gt; and Google's Android and
ChromeOS. They generally found different solutions for the specific
problems you have when implementing schemes like this, sometimes taking
shortcuts that keep only the specific case in mind, and cannot cover
the general purpose. With systemd now being at the core of so many
distributions and deeply involved in bringing up and maintaining the
system we came to the conclusion that we should attempt to add generic
support for setups like this to systemd itself, to open this up for
the general purpose distributions to build on. We decided to focus on
three kinds of systems:&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;The &lt;i&gt;stateful&lt;/i&gt; system, the traditional system as we know it with
machine-specific &lt;tt&gt;/etc&lt;/tt&gt;, &lt;tt&gt;/usr&lt;/tt&gt; and &lt;tt&gt;/var&lt;/tt&gt;, all
properly populated.&lt;/li&gt;

&lt;li&gt;Startup without a populated &lt;tt&gt;/var&lt;/tt&gt;, but with configured
&lt;tt&gt;/etc&lt;/tt&gt;. (We will call these &lt;i&gt;volatile&lt;/i&gt; systems.)&lt;/li&gt;

&lt;li&gt;Startup without either &lt;tt&gt;/etc&lt;/tt&gt; or &lt;tt&gt;/var&lt;/tt&gt;. (We will
call these &lt;i&gt;stateless&lt;/i&gt; systems.)&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;A factory reset is just a special case of the latter two modes,
where the system boots up without &lt;tt&gt;/var&lt;/tt&gt; and &lt;tt&gt;/etc&lt;/tt&gt; but
the next boot is a normal stateful boot like the first described
mode. Note that a mode where &lt;tt&gt;/etc&lt;/tt&gt; is flushed, but
&lt;tt&gt;/var&lt;/tt&gt; is not is nothing we intend to cover (why? well, the
user ID question becomes much harder, see below, and we simply saw no
usecase for it worth the trouble).&lt;/p&gt;

&lt;h4&gt;Problems&lt;/h4&gt;

&lt;p&gt;Booting up a system without a populated &lt;tt&gt;/var&lt;/tt&gt; is relatively
straight-forward. With &lt;a href="http://cgit.freedesktop.org/systemd/systemd/plain/tmpfiles.d/var.conf"&gt;a
few lines of tmpfiles configuration&lt;/a&gt; it is possible to populate
&lt;tt&gt;/var&lt;/tt&gt; with its basic structure in a way that is sufficient to
make a system boot cleanly. systemd version 214 and newer ship with
support for this. Of course, support for this scheme in systemd is
only a small part of the solution. While a lot of software
reconstructs the directory hierarchy it needs in &lt;tt&gt;/var&lt;/tt&gt;
automatically, much software does not. In cases like this it is
necessary to ship a couple of additional tmpfiles lines that set up
at boot-time the necessary files or directories in &lt;tt&gt;/var&lt;/tt&gt; to
make the software operate, similar to what RPM or DEB packages would
set up at installation time.&lt;/p&gt;
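
&lt;p&gt;For illustration, such a drop-in could look like the following tmpfiles.d fragment. The package name &lt;tt&gt;foobar&lt;/tt&gt;, its user and group, and the directories are all made up; only the line format (type, path, mode, user, group, age) is taken from the tmpfiles.d configuration format.&lt;/p&gt;

```conf
# /usr/lib/tmpfiles.d/foobar.conf (hypothetical package "foobar")
# Type  Path               Mode  User    Group   Age
d       /var/lib/foobar    0755  foobar  foobar  -
d       /var/cache/foobar  0750  foobar  foobar  -
```

&lt;p&gt;Each &lt;tt&gt;d&lt;/tt&gt; line creates the directory at boot if it is missing, which is what a package installation script would otherwise have done once at installation time.&lt;/p&gt;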

&lt;p&gt;Booting up a system without a populated &lt;tt&gt;/etc&lt;/tt&gt; is a more
difficult task. In &lt;tt&gt;/etc&lt;/tt&gt; we have a lot of configuration bits
that are essential for the system to operate, for example and most
importantly system user and group information in &lt;tt&gt;/etc/passwd&lt;/tt&gt;
and &lt;tt&gt;/etc/group&lt;/tt&gt;. If the system boots up without &lt;tt&gt;/etc&lt;/tt&gt;
there must be a way to replicate the minimal information necessary in
it, so that the system manages to boot up fully.&lt;/p&gt;

&lt;p&gt;To make this even more complex, in order to support "offline"
updates of &lt;tt&gt;/usr&lt;/tt&gt; that are replicated into a number of systems
possessing private &lt;tt&gt;/etc&lt;/tt&gt; and &lt;tt&gt;/var&lt;/tt&gt; there needs to be a
way how these directories can be upgraded transparently when
necessary, for example by recreating caches like
&lt;tt&gt;/etc/ld.so.cache&lt;/tt&gt; or adding missing system users to
&lt;tt&gt;/etc/passwd&lt;/tt&gt; on next reboot.&lt;/p&gt;

&lt;p&gt;Starting with systemd 215 (yet unreleased, as I type this) we will
ship with a number of features in systemd that make &lt;tt&gt;/etc&lt;/tt&gt;-less
boots functional:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;&lt;p&gt;A new tool &lt;tt&gt;systemd-sysusers&lt;/tt&gt; has been added. It introduces
a new drop-in directory &lt;tt&gt;/usr/lib/sysusers.d/&lt;/tt&gt;. Minimal
descriptions of necessary system users and groups can be placed
there. Whenever the tool is invoked it will create these users in
&lt;tt&gt;/etc/passwd&lt;/tt&gt; and &lt;tt&gt;/etc/group&lt;/tt&gt; should they be
missing. It is only suitable for creating system users and groups, not
for normal users. It will write to the files directly via the
appropriate glibc APIs, which is the right thing to do for system
users. (For normal users no such APIs exist, as the users might be
stored centrally on LDAP or suchlike, and they are out of focus for
our usecase.) The major benefit of this tool is that system user
definition can happen offline: a package simply has to drop in a new
file to register a user. This makes system user registration
&lt;i&gt;declarative&lt;/i&gt; instead of &lt;i&gt;imperative&lt;/i&gt; -- which is the way
how system users are traditionally created from RPM or DEB
installation scripts. By being declarative it is easy to replicate the
users on next boot to a number of system instances.&lt;/p&gt;

&lt;p&gt;To make this new
tool interesting for packaging scripts we make it easy to
alternatively invoke it during package installation time, thus being a
good alternative to invocations of &lt;tt&gt;useradd -r&lt;/tt&gt; and
&lt;tt&gt;groupadd -r&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;Some OS designs use a static, fixed user/group list stored in
&lt;tt&gt;/usr&lt;/tt&gt; as primary database for users/groups, with fixed
UID/GID mappings. While this works for specific systems, this cannot
cover the general purpose case. As the UID/GID range for system
users/groups is very small (only containing 998 users and groups on most systems), the
best has to be made from this space and only UIDs/GIDs necessary on
the specific system should be allocated. This means allocation has to
be dynamic and adjust to what is necessary.&lt;/p&gt;

&lt;p&gt;Also note that this tool has
one very nice feature: in addition to fully dynamic, and fully static
UID/GID assignment for the users to create, it supports reading
UID/GID numbers off existing files in &lt;tt&gt;/usr&lt;/tt&gt;, so that vendors
can make use of setuid/setgid binaries owned by specific users.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;We also added a &lt;a href="http://cgit.freedesktop.org/systemd/systemd/plain/sysusers.d/systemd.conf.in"&gt;default
user definition list&lt;/a&gt; which creates the most basic users the system
and systemd need. Of course, very likely downstream distributions
might need to alter this default list, add new entries and possibly
map specific users to particular numeric UIDs.&lt;/li&gt;

&lt;li&gt;A new condition &lt;tt&gt;ConditionNeedsUpdate=&lt;/tt&gt; has been
added. With this mechanism it is possible to conditionalize execution
of services depending on whether &lt;tt&gt;/usr&lt;/tt&gt; is newer than
&lt;tt&gt;/etc&lt;/tt&gt; or &lt;tt&gt;/var&lt;/tt&gt;. The idea is that various services that
need to be added into the boot process on upgrades make use of this to
not delay boot-ups on normal boots, but run as necessary should
&lt;tt&gt;/usr&lt;/tt&gt; have been updated since the last boot. This is
implemented based on the &lt;tt&gt;mtime&lt;/tt&gt; timestamp of the
&lt;tt&gt;/usr&lt;/tt&gt;: if the OS has been updated the packaging software
should &lt;i&gt;touch&lt;/i&gt; the directory, thus informing all instances that
an upgrade of &lt;tt&gt;/etc&lt;/tt&gt; and &lt;tt&gt;/var&lt;/tt&gt; might be necessary.&lt;/li&gt;

&lt;li&gt;We added a number of service files, that make use of the new
&lt;tt&gt;ConditionNeedsUpdate=&lt;/tt&gt; switch, and run a couple of services
after each update. Among them are the aforementioned
&lt;tt&gt;systemd-sysusers&lt;/tt&gt; tool, as well as services that rebuild the
udev hardware database, the journal catalog database and the library
cache in &lt;tt&gt;/etc/ld.so.cache&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;If systemd detects an empty &lt;tt&gt;/etc&lt;/tt&gt; at early boot it will
now use the &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd.preset.html"&gt;unit
preset&lt;/a&gt; information to enable all services by default that the
vendor or packager declared. It will then proceed booting.&lt;/li&gt;

&lt;li&gt;We added &lt;a href="http://cgit.freedesktop.org/systemd/systemd/plain/tmpfiles.d/etc.conf"&gt;a
new tmpfiles snippet&lt;/a&gt; that is able to reconstruct the
most basic structure of &lt;tt&gt;/etc&lt;/tt&gt; if it is missing.&lt;/li&gt;

&lt;li&gt;tmpfiles also gained the ability to copy entire directory trees into
place should they be missing. This is particularly useful for copying
certain essential files or directories into &lt;tt&gt;/etc&lt;/tt&gt; without
which the system refuses to boot. Currently the most prominent
candidates for this are &lt;tt&gt;/etc/pam.d&lt;/tt&gt; and
&lt;tt&gt;/etc/dbus-1&lt;/tt&gt;. In the long run we hope that packages can be
fixed so that they always work correctly without configuration in
&lt;tt&gt;/etc&lt;/tt&gt;. Depending on the software this means that they should
come with compiled-in defaults that just work should their
configuration file be missing, or that they should fall back to static
vendor-supplied configuration in &lt;tt&gt;/usr&lt;/tt&gt; that is used whenever
&lt;tt&gt;/etc&lt;/tt&gt; doesn't have any configuration. Both the PAM and the
D-Bus case are probably candidates for the latter. Given that there
are probably many cases like this we are working with a number of
folks to introduce a new directory called &lt;tt&gt;/usr/share/etc&lt;/tt&gt;
(name is not settled yet) to major distributions, that always
contains the full, original, vendor-supplied configuration of all
packages. This is very useful here, so that there's an obvious place
to copy the original configuration from, but it is also useful
completely independently as this provides administrators with an easy
place to &lt;tt&gt;diff&lt;/tt&gt; their own configuration in &lt;tt&gt;/etc&lt;/tt&gt;
against to see what local changes are in place.&lt;/li&gt;

&lt;li&gt;&lt;p&gt;We added a new &lt;tt&gt;--tmpfs=&lt;/tt&gt; switch to &lt;tt&gt;systemd-nspawn&lt;/tt&gt;
to make testing of systems with unpopulated &lt;tt&gt;/etc&lt;/tt&gt; and
&lt;tt&gt;/var&lt;/tt&gt; easy. For example, to run a fully state-less container, use a command line like this:&lt;/p&gt;

&lt;pre&gt;# systemd-nspawn -D /srv/mycontainer --read-only --tmpfs=/var --tmpfs=/etc -b&lt;/pre&gt;

&lt;p&gt;This command line will invoke the container tree stored in
&lt;tt&gt;/srv/mycontainer&lt;/tt&gt; in a read-only way, but with a (writable)
tmpfs mounted to &lt;tt&gt;/var&lt;/tt&gt; and &lt;tt&gt;/etc&lt;/tt&gt;. With a very recent
git snapshot of systemd invoking a Fedora rawhide system should mostly
work OK, modulo the D-Bus and PAM problems mentioned above. A later
version of &lt;tt&gt;systemd-nspawn&lt;/tt&gt; is likely to gain a high-level
switch &lt;tt&gt;--mode={stateful|volatile|stateless}&lt;/tt&gt; that
combines this into simple switches reusing the vocabulary introduced
earlier.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;
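
&lt;p&gt;To illustrate how a package might hook into the
&lt;tt&gt;ConditionNeedsUpdate=&lt;/tt&gt; mechanism listed above, here is a
rough sketch of a unit file. The service name, description and binary
path are hypothetical. The ordering lines mirror the pattern used by
the update services shipped with systemd (run early, and before
&lt;tt&gt;systemd-update-done.service&lt;/tt&gt; records that the update has
been processed), but treat them as an assumption to check against the
systemd version you target:&lt;/p&gt;

&lt;pre&gt;# Hypothetical /usr/lib/systemd/system/mydaemon-update.service
[Unit]
Description=Rebuild mydaemon cache after an OS update
DefaultDependencies=no
Conflicts=shutdown.target
After=local-fs.target
Before=sysinit.target shutdown.target systemd-update-done.service
# Only run if /usr is newer than /var, i.e. after an update of /usr
ConditionNeedsUpdate=/var

[Service]
Type=oneshot
RemainAfterExit=yes
# Made-up helper binary, stands in for whatever regenerates the state
ExecStart=/usr/bin/mydaemon-rebuild-cache&lt;/pre&gt;

&lt;p&gt;On a normal boot the condition fails and the service is skipped,
so it adds no delay. It only runs on the first boot after
&lt;tt&gt;/usr&lt;/tt&gt; has been touched.&lt;/p&gt;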

&lt;h3&gt;What's Next&lt;/h3&gt;

&lt;p&gt;Pulling this all together we are very close to making boots with
empty &lt;tt&gt;/etc&lt;/tt&gt; and &lt;tt&gt;/var&lt;/tt&gt; on general purpose Linux
operating systems a reality. Of course, while doing the groundwork in
systemd gets us some distance, there's a lot of work left. Most
importantly: the majority of Linux packages are simply incompatible
with this scheme the way they are currently set up. They do not work
without configuration in &lt;tt&gt;/etc&lt;/tt&gt; or state directories in
&lt;tt&gt;/var&lt;/tt&gt;; they do not drop system user information in
&lt;tt&gt;/usr/lib/sysusers.d&lt;/tt&gt;. However, we believe it's our job to do
the groundwork, and to start somewhere.&lt;/p&gt;

&lt;p&gt;So what does this mean for the next steps? Of course, currently
very little of this is available in any distribution (simply already
because 215 isn't even released yet). However, this will hopefully
change quickly. As soon as that is accomplished we can start working
on making the other components of the OS work nicely in this
scheme. If you are an upstream developer, please consider making your
software work correctly if &lt;tt&gt;/etc&lt;/tt&gt; and/or &lt;tt&gt;/var&lt;/tt&gt; are not
populated. This means:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;When you need a state directory in &lt;tt&gt;/var&lt;/tt&gt; and it is missing,
create it first. If you cannot do that, because you dropped privileges
or suchlike, please consider dropping in a tmpfiles snippet that
creates the directory with the right permissions early at boot, should
it be missing.&lt;/li&gt;

&lt;li&gt;When you need configuration files in &lt;tt&gt;/etc&lt;/tt&gt; to work
properly, consider changing your application to work nicely when these
files are missing, and automatically fall back to either built-in
defaults, or to static vendor-supplied configuration files shipped in
&lt;tt&gt;/usr&lt;/tt&gt;, so that administrators can override configuration in
&lt;tt&gt;/etc&lt;/tt&gt; but if they don't the default configuration counts.&lt;/li&gt;

&lt;li&gt;When you need a system user or group, consider dropping in a file
into &lt;tt&gt;/usr/lib/sysusers.d&lt;/tt&gt; describing the users. (Currently
documentation on this is minimal, we will provide more docs on this
shortly.)&lt;/li&gt;

&lt;/ul&gt;
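
&lt;p&gt;As a hedged sketch of what such a &lt;tt&gt;sysusers.d&lt;/tt&gt;
drop-in can look like, assuming the field layout documented in
&lt;tt&gt;sysusers.d(5)&lt;/tt&gt; (all user, group and file names here are
made up for illustration), a package might ship
&lt;tt&gt;/usr/lib/sysusers.d/mydaemon.conf&lt;/tt&gt; with contents like
this:&lt;/p&gt;

&lt;pre&gt;# Hypothetical example, names are illustrative only
# Type Name     ID             GECOS
u      mydaemon -              "My Daemon"
u      authd    /usr/bin/authd "Authorization user"
g      input    -              -&lt;/pre&gt;

&lt;p&gt;A &lt;tt&gt;-&lt;/tt&gt; in the ID column asks for a dynamically
allocated system UID/GID, while a path tells the tool to read the
numeric ID off the owner of that existing file. &lt;tt&gt;u&lt;/tt&gt; lines
create a user together with a group of the same name, &lt;tt&gt;g&lt;/tt&gt;
lines create only a group.&lt;/p&gt;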

&lt;p&gt;If you are a packager, you can also help on making this all work:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;Ask upstream to implement what we describe above, possibly even preparing a patch for this.&lt;/li&gt;

&lt;li&gt;If upstream will not make these changes, then consider dropping in
tmpfiles snippets that copy the bare minimum of configuration files to
make your software work from somewhere in &lt;tt&gt;/usr&lt;/tt&gt; into
&lt;tt&gt;/etc&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;Consider moving from imperative &lt;tt&gt;useradd&lt;/tt&gt; commands in
packaging scripts, to declarative &lt;tt&gt;sysusers&lt;/tt&gt; files. Ideally,
this is shipped upstream too, but if that's not possible then simply
adding this to packages should be good enough.&lt;/li&gt;

&lt;/ul&gt;
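
&lt;p&gt;For the tmpfiles approach in the second item, a minimal sketch
could look like the following. The package name and both paths are
hypothetical, since the final location for vendor-supplied
configuration in &lt;tt&gt;/usr&lt;/tt&gt; is not settled yet:&lt;/p&gt;

&lt;pre&gt;# Hypothetical /usr/lib/tmpfiles.d/mydaemon-etc.conf
# Type Path               Mode UID GID Age Argument
C      /etc/mydaemon.conf -    -   -   -   /usr/share/mydaemon/mydaemon.conf&lt;/pre&gt;

&lt;p&gt;The &lt;tt&gt;C&lt;/tt&gt; line type copies the source given in the
last column only if the destination does not exist yet, so
configuration that an administrator already changed in
&lt;tt&gt;/etc&lt;/tt&gt; is not overwritten.&lt;/p&gt;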

&lt;p&gt;Of course, before moving to declarative system user definitions you
should consult with your distribution whether their packaging policy
even allows that. Currently, most distributions will not, so we have
to work to get this changed first.&lt;/p&gt;

&lt;p&gt;Anyway, so much about what we have been working on and where we want to take this.&lt;/p&gt;

&lt;h4&gt;Conclusion&lt;/h4&gt;

&lt;p&gt;Before we finish, let me stress again why we are doing all
this:&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;For end-user machines like desktops, tablets or mobile phones, we
want a generic way to implement factory reset, which the user can make
use of when the system is broken (saves you support costs), or when he
wants to sell it and get rid of his private data, and renew that "fresh
car smell".&lt;/li&gt;

&lt;li&gt;For embedded machines we want a generic way how to reset
devices. We also want a way how every single boot can be identical to
a factory reset, in a stateless system design.&lt;/li&gt;

&lt;li&gt;For all kinds of systems we want to centralize vendor data in
&lt;tt&gt;/usr&lt;/tt&gt; so that it can be strictly read-only, and fully
cryptographically verified as one unit.&lt;/li&gt;

&lt;li&gt;We want to enable new kinds of OS installers that simply
deserialize a vendor OS &lt;tt&gt;/usr&lt;/tt&gt; snapshot into a new file system,
install a boot loader and reboot, leaving all first-time configuration
to the next boot.&lt;/li&gt;

&lt;li&gt;We want to enable new kinds of OS updaters that build on this, and
manage a number of vendor OS &lt;tt&gt;/usr&lt;/tt&gt; snapshots in verified states, and
which can then update &lt;tt&gt;/etc&lt;/tt&gt; and &lt;tt&gt;/var&lt;/tt&gt; simply by
rebooting into a newer version.&lt;/li&gt;

&lt;li&gt;We want to scale container setups naturally, by sharing a single
&lt;i&gt;golden master&lt;/i&gt; &lt;tt&gt;/usr&lt;/tt&gt; tree with a large number of instances that
simply maintain their own private &lt;tt&gt;/etc&lt;/tt&gt; and &lt;tt&gt;/var&lt;/tt&gt; for
their private configuration and state, while still allowing clean
updates of &lt;tt&gt;/usr&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;We want to make thin clients that share &lt;tt&gt;/usr&lt;/tt&gt; across the
network work by allowing stateless bootups. During all discussions on
how &lt;tt&gt;/usr&lt;/tt&gt; was to be organized this was frequently mentioned. A
setup like this so far only worked in very specific cases; with this
scheme we want to make this work in the general case.&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;Of course, we have no illusions, just doing the groundwork for all
of this in systemd doesn't make this all a real-life solution
yet. Also, it's very unlikely that all of Fedora (or any other general
purpose distribution) will support this scheme for all its packages
soon, however, we are quite confident that the idea is convincing,
that we need to start somewhere, and that getting the most core
packages adapted to this shouldn't be out of reach.&lt;/p&gt;

&lt;p&gt;Oh, and of course, the concepts behind this are really not new, we
know that. However, what's new here is that we try to make them
available in a general purpose OS core, instead of special purpose
systems.&lt;/p&gt;

&lt;p&gt;Anyway, let's get the ball rolling! Let's make stateless systems a
reality!&lt;/p&gt;

&lt;p&gt;And that's all I have for now. I am sure this leaves a lot of
questions open. If you have any, join us on IRC on &lt;tt&gt;#systemd&lt;/tt&gt;
on freenode or comment on &lt;a href="https://plus.google.com/+LennartPoetteringTheOneAndOnly/posts/hT4jsCkmQzv"&gt;Google+&lt;/a&gt;.&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 17 Jun 2014 18:13:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2014-06-17:/blog/projects/stateless.html</guid><category>projects</category></item><item><title>Upcoming Events</title><link>https://0pointer.net/blog/projects/dates.html</link><description>
                
&lt;p&gt;You are invited to three events:&lt;/p&gt;

&lt;p&gt;Christoph Wickert set up a &lt;a href="https://plus.google.com/events/cgbotu8inedql8qlecjo3a6glk8"&gt;Fedora 19
Release Party&lt;/a&gt; here in Berlin! Please join us on &lt;b&gt;Tuesday, July
2nd&lt;/b&gt;.&lt;/p&gt;

&lt;p&gt;We'll have another &lt;a href="https://plus.google.com/events/ck4p957u79bgm3jeiq8meh1b2ns"&gt;Berlin Open
Source Meetup&lt;/a&gt; on &lt;b&gt;Sunday, July 14th&lt;/b&gt;.&lt;/p&gt;

&lt;p&gt;And finally, there's going to be another &lt;a href="https://plus.google.com/events/cb1urr7jt5p4voutfelci14c5qc"&gt;systemd
Hackfest&lt;/a&gt;, this time colocated with &lt;a href="https://www.guadec.org/"&gt;GUADEC&lt;/a&gt;, on &lt;b&gt;Tuesday/Wednesday, August 6th/7th&lt;/b&gt;.&lt;/p&gt;

&lt;p&gt;See you soon!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 01 Jul 2013 01:04:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2013-07-01:/blog/projects/dates.html</guid><category>projects</category></item><item><title>GNOME.Asia and LinuxCon Japan</title><link>https://0pointer.net/blog/projects/asia-2013.html</link><description>
                
&lt;p&gt;Two weeks ago I attended GNOME.Asia/Seoul and LinuxCon Japan/Tokyo, thanks
to sponsoring by the GNOME Foundation and the Linux Foundation. At GNOME.Asia I
spoke about &lt;a href="http://0pointer.de/public/gnome-asia-2013-apps.pdf"&gt;Sandboxed
Applications for GNOME&lt;/a&gt;, and at LinuxCon Japan about &lt;a href="http://0pointer.de/public/linuxcon-japan-2013-systemd.pdf"&gt;the first
three years of systemd&lt;/a&gt;. (I think at least the latter one was videotaped,
and recordings might show up on the net eventually). I like to believe both
talks went pretty well, and helped get the message across to the community about what
we are working on and what the roadmap for us is, and what we expect from the
various projects, and especially GNOME.  However, for me personally the
&lt;i&gt;hallway track&lt;/i&gt; was the most interesting part. The personal Q&amp;amp;A sessions regarding
our work on kdbus, cgroups, systemd and related projects were highly
interesting. In fact, at both conferences we had something like impromptu
hackfests on the topics of kdbus and cgroups, with some conference attendees.
I also enjoyed the opportunity to be on Karen's upcoming GNOME podcast,
recorded in a session at Gyeongbokgung Palace in Seoul (what better place could
there be for a podcast recording?).&lt;/p&gt;

&lt;p&gt;I'd like to thank the GNOME and Linux foundations for sponsoring my attendance to these conferences. I'd especially like to thank the organizers of GNOME.Asia for their perfectly organized conference!&lt;/p&gt;

&lt;p&gt;&lt;img src="https://live.gnome.org/Travel/Policy?action=AttachFile&amp;amp;do=get&amp;amp;target=sponsored-badge-simple.png" alt="GNOME Travel Badge" /&gt;&lt;/p&gt;


        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Sun, 09 Jun 2013 16:30:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2013-06-09:/blog/projects/asia-2013.html</guid><category>projects</category></item><item><title>It's Time Again!</title><link>https://0pointer.net/blog/projects/berlin-open-source-meetup-4.html</link><description>
                
&lt;p&gt;My fellow Berliners! There's another &lt;a href="https://plus.google.com/events/cnikpv83amqf0mr8cf0ag7f2qus"&gt;Berlin
Open Source Meetup&lt;/a&gt; scheduled for this Sunday! You are invited!&lt;/p&gt;

&lt;p&gt;See you on Sunday!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 08 Apr 2013 10:58:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2013-04-08:/blog/projects/berlin-open-source-meetup-4.html</guid><category>projects</category></item><item><title>What Are We Breaking Now?</title><link>https://0pointer.net/blog/projects/brno.html</link><description>
                
&lt;p&gt;End of February &lt;a href="http://www.devconf.cz/"&gt;devconf.cz&lt;/a&gt;
took place in Brno, Czech Republic. At the conference Kay Sievers,
Harald Hoyer and I did two presentations about our work on &lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;systemd&lt;/a&gt;
and about the systemd Journal. These talks were taped and the
recordings are now available online.&lt;/p&gt;

&lt;p&gt;First, here's our talk about &lt;a href="https://www.youtube.com/watch?v=_rrpjYD373A"&gt;&lt;i&gt;What Are We
Breaking Now?&lt;/i&gt;&lt;/a&gt;, in which we try to give an overview on what we
are working on currently in the systemd context, and what we expect to
do in the next few months. We cover &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames"&gt;Predictable Network Interface
Names&lt;/a&gt;, the &lt;a href="http://www.freedesktop.org/wiki/Specifications/BootLoaderSpec"&gt;Boot
Loader Spec&lt;/a&gt;, kdbus, the Apps framework, and more.&lt;/p&gt;

&lt;object width="420" height="315"&gt;&lt;param name="movie" value="http://www.youtube.com/v/_rrpjYD373A?hl=en_US&amp;amp;version=3"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/_rrpjYD373A?hl=en_US&amp;amp;version=3" type="application/x-shockwave-flash" width="420" height="315" allowscriptaccess="always" allowfullscreen="true"&gt;&lt;/embed&gt;&lt;/object&gt;

&lt;p&gt;And then, I did my second talk about &lt;a href="https://www.youtube.com/watch?v=i4CACB7paLc"&gt;&lt;i&gt;The systemd
Journal&lt;/i&gt;&lt;/a&gt;, with a focus on how to practically make use of
&lt;tt&gt;journalctl&lt;/tt&gt;, as a day-to-day tool for administrators (these practical
bits start around 28:40). The commands demoed here are all explained in an &lt;a href="http://0pointer.de/blog/projects/journalctl.html"&gt;earlier blog story of
mine&lt;/a&gt;.&lt;/p&gt;

&lt;object width="420" height="315"&gt;&lt;param name="movie" value="http://www.youtube.com/v/i4CACB7paLc?hl=en_US&amp;amp;version=3"&gt;&lt;/param&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;/param&gt;&lt;param name="allowscriptaccess" value="always"&gt;&lt;/param&gt;&lt;embed src="http://www.youtube.com/v/i4CACB7paLc?hl=en_US&amp;amp;version=3" type="application/x-shockwave-flash" width="420" height="315" allowscriptaccess="always" allowfullscreen="true"&gt;&lt;/embed&gt;&lt;/object&gt;

&lt;p&gt;Unfortunately, the audience questions are sometimes hard or
impossible to understand from the videos, and sometimes the text on
the slides is hard to read, but I still believe that the two talks are
quite interesting.&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 14 Mar 2013 16:58:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2013-03-14:/blog/projects/brno.html</guid><category>projects</category></item><item><title>systemd Hackfest!</title><link>https://0pointer.net/blog/projects/hackfest.html</link><description>
                
&lt;p&gt;Hey, you, systemd hacker, Fedora hacker! Listen up! This Thu/Fri is the &lt;a href="https://plus.google.com/u/0/events/cnklef88b85tb6tgf6ue3hn32lg"&gt;systemd
Hackfest&lt;/a&gt; in Brno/Czech Rep, right before &lt;a href="http://www.devconf.cz/"&gt;devconf.cz&lt;/a&gt;!  On Thursday we'll talk about
(and hack on) all things systemd. And the hackfest Friday is going to be a &lt;a href="https://fedoraproject.org/wiki/FAD_systemd_2013"&gt;Fedora Activity Day&lt;/a&gt;,
so we'll have a focus on systemd integration into Fedora.&lt;/p&gt;

&lt;p&gt;You are invited!&lt;/p&gt;

&lt;p&gt;See you in Brno!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 18 Feb 2013 18:59:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2013-02-18:/blog/projects/hackfest.html</guid><category>projects</category></item><item><title>The Biggest Myths</title><link>https://0pointer.net/blog/projects/the-biggest-myths.html</link><description>
                
&lt;p&gt;Since we first proposed &lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;systemd&lt;/a&gt;
for inclusion in the distributions it has been frequently discussed in
many forums, mailing lists and conferences. In these discussions one
can often hear certain myths about systemd, that are repeated over and
over again, but certainly don't gain any truth by constant
repetition. Let's take the time to debunk a few of them:&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is monolithic.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;If you build systemd with all configuration options enabled you
will build 69 individual binaries. These binaries all serve different
tasks, and are neatly separated for a number of reasons. For example,
we designed systemd with security in mind, hence most daemons run at
minimal privileges (using kernel capabilities, for example) and are
responsible for very specific tasks only, to minimize their security
surface and impact. Also, systemd parallelizes the boot more than any
prior solution. This parallelization happens by running more processes
in parallel. Thus it is essential that systemd is nicely split up into
many binaries and thus processes. In fact, many of these
binaries&lt;sup&gt;[1]&lt;/sup&gt; are separated out so nicely, that they are very
useful outside of systemd, too.&lt;/p&gt;

&lt;p&gt;A package involving 69 individual binaries can hardly be called
&lt;i&gt;monolithic&lt;/i&gt;. What is different from prior solutions however,
is that we ship more components in a single tarball, and maintain them
upstream in a single repository with a unified release cycle.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is about speed.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Yes, systemd is fast (&lt;a href="https://plus.google.com/108087225644395745666/posts/LyPQgKdntgA"&gt;A
pretty complete userspace boot-up in ~900ms, anyone?&lt;/a&gt;), but that's
primarily just a side-effect of doing things &lt;i&gt;right&lt;/i&gt;. In fact, we
never really sat down and optimized the last tiny bit of performance
out of systemd. Instead, we actually frequently knowingly picked the
slightly slower code paths in order to keep the code more
readable. This doesn't mean being fast was irrelevant for us, but
reducing systemd to its speed is certainly quite a misconception,
since that is certainly not anywhere near the top of our list of
goals.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd's fast boot-up is irrelevant for
servers.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;That is just completely not true. Many administrators actually are
keen on reduced downtimes during maintenance windows. In High
Availability setups it's kinda nice if the failed machine comes back
up really fast. In cloud setups with a large number of VMs or
containers the price of slow boots multiplies with the number of
instances. Spending minutes of CPU and IO on really slow boots of
hundreds of VMs or containers reduces your system's density
drastically, heck, it even costs you more energy. Slow boots can be
quite financially expensive. Then, fast booting of containers allows
you to implement a logic such as &lt;a href="http://0pointer.de/blog/projects/socket-activated-containers.html"&gt;socket
activated containers&lt;/a&gt;, allowing you to drastically increase the
density of your cloud system.&lt;/p&gt;

&lt;p&gt;Of course, in many server setups boot-up is indeed irrelevant, but
systemd is supposed to cover the whole range. And yes, I am aware
that often it is the server firmware that costs the most time at
boot-up, and the OS is fast anyway compared to that, but well, systemd
is still supposed to cover the whole range (see above...), and no,
not all servers have such bad firmware, and certainly not VMs and
containers, which are servers of a kind, too.&lt;sup&gt;[2]&lt;/sup&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is incompatible with shell scripts.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;This is entirely bogus. &lt;i&gt;We&lt;/i&gt; just don't use them for the boot
process, because we believe they aren't the best tool for that
specific purpose, but that doesn't mean systemd was incompatible with
them. You can easily run shell scripts as systemd services, heck, you
can run scripts written in &lt;i&gt;any&lt;/i&gt; language as systemd services,
systemd doesn't care the slightest bit what's inside your
executable. Moreover, we heavily use shell scripts for our own
purposes, for installing, building, testing systemd. And you can stick
your scripts in the early boot process, use them for normal services,
you can run them at latest shutdown, there are practically no
limits.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is difficult.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;This also is entirely nonsense. A systemd platform is actually much
simpler than traditional Linuxes because it unifies
system objects and their dependencies as systemd units. The
configuration file language is very simple, and we got rid of redundant
configuration files. We provide uniform tools for much
of the configuration of the system. The system is much less of a
conglomerate than traditional Linuxes are. We also have pretty
comprehensive documentation (&lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;all linked
from the homepage&lt;/a&gt;) about pretty much every detail of systemd, and
this not only covers admin/user-facing interfaces, but also developer
APIs.&lt;/p&gt;

&lt;p&gt;systemd certainly comes with a learning curve. Everything
does. However, we like to believe that it is actually simpler to
understand systemd than a Shell-based boot for most people. Surprised
we say that? Well, as it turns out, Shell is not a pretty language to
learn, its syntax is arcane and complex. systemd unit files are
substantially easier to understand, they do not expose a programming
language, but are simple and declarative by nature. That all said, if
you are experienced in shell, then yes, adopting systemd will take a
bit of learning.&lt;/p&gt;

&lt;p&gt;To make learning easy we tried hard to provide the maximum
compatibility to previous solutions. But not only that, on many
distributions you'll find that some of the traditional tools will now
even tell you -- while executing what you are asking for -- how you
could do it with the newer tools instead, in a possibly nicer way.&lt;/p&gt;

&lt;p&gt;Anyway, the take-away is probably that systemd is probably as
simple as such a system can be, and that we try hard to make it easy
to learn. But yes, if you know sysvinit then adopting systemd will
require a bit of learning, but quite frankly if you mastered sysvinit,
then systemd should be easy for you.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is not modular.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Not true at all. At compile time you have a number of
&lt;tt&gt;configure&lt;/tt&gt; switches to select what you want to build, and what
not. And &lt;a href="http://freedesktop.org/wiki/Software/systemd/MinimalBuilds"&gt;we
document&lt;/a&gt; how you can select in even more detail what you need,
going beyond our configure switches.&lt;/p&gt;

&lt;p&gt;This modularity is not totally unlike the one of the Linux kernel,
where you can select many features individually at compile time. If the
kernel is modular enough for you then systemd should be pretty close,
too.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is only for desktops.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;That is certainly not true. With systemd we try to cover pretty
much the same range as Linux itself does. While we care for desktop
uses, we also care pretty much the same way for server uses, and
embedded uses as well. You can bet that Red Hat wouldn't make it a
core piece of RHEL7 if it wasn't the best option for managing services
on servers.&lt;/p&gt;

&lt;p&gt;People from numerous companies work on systemd. Car manufacturers
build it into cars, Red Hat uses it for a server operating system, and
GNOME uses many of its interfaces for improving the desktop. You find
it in toys, in space telescopes, and in wind turbines.&lt;/p&gt;

&lt;p&gt;Most features I most recently worked on are probably relevant
primarily on servers, such as &lt;a href="http://0pointer.de/blog/projects/socket-activated-containers.html"&gt;container
support&lt;/a&gt;, &lt;a href="http://0pointer.de/blog/projects/resources.html"&gt;resource
management&lt;/a&gt; or the &lt;a href="http://0pointer.de/blog/projects/security.html"&gt;security
features&lt;/a&gt;. We cover desktop systems pretty well already, and there
are a number of companies doing systemd development for embedded, some
even offer consulting services in it.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd was created as result of the NIH syndrome.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;This is not true. Before we began working on systemd we were
pushing for Canonical's Upstart to be widely adopted (and Fedora/RHEL
used it too for a while). However, we eventually came to the
conclusion that its design was inherently flawed at its core (at least
in our eyes: most fundamentally, it leaves dependency management to
the admin/developer, instead of solving this hard problem in code),
and if something's wrong in the core you had better replace it, rather
than fix it. This was hardly the only reason though; other things
came into play too, such as the licensing/contribution agreement mess
around it. NIH wasn't one of the reasons, though...&lt;sup&gt;[3]&lt;/sup&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is a freedesktop.org project.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Well, systemd is certainly hosted at fdo, but freedesktop.org is
little else but a repository for code and documentation. Pretty much
any coder can request a repository there and dump his stuff there (as
long as it's somewhat relevant for the infrastructure of free
systems). There's no cabal involved, no "standardization" scheme, no
project vetting, nothing. It's just a nice, free, reliable place to
have your repository. In that regard it's a bit like SourceForge,
github, kernel.org, just not commercial and without over-the-top
requirements, and hence a good place to keep our stuff.&lt;/p&gt;

&lt;p&gt;So yes, we host our stuff at fdo, but the assumption implied by
this myth, that there is a group of people who meet and then agree
on what future free systems should look like, is entirely bogus.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is not UNIX.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;There's certainly some truth in that. systemd's sources do not
contain a single line of code originating from original UNIX. However,
we derive inspiration from UNIX, and thus there's a ton of UNIX in
systemd. For example, the UNIX idea of "everything is a file" finds
reflection in that in systemd all services are exposed at runtime in a
kernel file system, the &lt;tt&gt;cgroupfs&lt;/tt&gt;. Then, one of the original
features of UNIX was multi-seat support, based on built-in terminal
support. Text terminals are hardly the state of the art for
interfacing with a computer these days, however. With systemd we
brought native &lt;a href="http://0pointer.de/blog/projects/multi-seat.html"&gt;multi-seat&lt;/a&gt;
support back, but this time with full support for today's hardware,
covering graphics, mice, audio, webcams and more, and all that fully
automatic, hotplug-capable and without configuration. In fact the
design of systemd as a suite of integrated tools that each have their
individual purposes but when used together are more than just the sum
of the parts, that's pretty much at the core of UNIX philosophy. Then,
the way our project is handled (i.e. maintaining much of the core OS
in a single git repository) is much closer to the BSD model (which is
a true UNIX, unlike Linux) of doing things (where most of the core OS
is kept in a single CVS/SVN repository) than things on Linux ever
were.&lt;/p&gt;

&lt;p&gt;Ultimately, UNIX is something different for everybody. For us
systemd maintainers it is something we derive inspiration from. For
others it is a religion, and much like the other world religions there
are different readings and understandings of it. Some define UNIX
based on specific pieces of code heritage, others see it just as a set
of ideas, others as a set of commands or APIs, and even others as a
definition of behaviours. Of course, it is impossible to ever make all
these people happy.&lt;/p&gt;

&lt;p&gt;Ultimately the question whether something is UNIX or not matters
very little. Being technically excellent is hardly exclusive to
UNIX. For us, UNIX is a major influence (heck, the biggest one), but
we also have other influences. Hence in some areas systemd will be
very UNIXy, and in others a little bit less.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is complex.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;There's certainly some truth in that. Modern computers are complex
beasts, and the OS running on them will hence have to be complex
too. However, systemd is certainly not more complex than prior
implementations of the same components. Much rather, it's simpler, and
has less redundancy (see above). Moreover, building a simple OS based
on systemd will involve far fewer packages than a traditional Linux
did. Fewer packages make it easier to build your system, and get rid of
interdependencies and of much of the differing behaviour of every
component involved.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is bloated.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Well, &lt;i&gt;bloated&lt;/i&gt; certainly has many different definitions. But in
most definitions systemd is probably the opposite of bloat. Since
systemd components share a common code base, they tend to share much
more code for common code paths. Here's an example: in a traditional
Linux setup, sysvinit, start-stop-daemon, inetd, cron and dbus each
implemented their own scheme to execute processes with various configuration
options in a certain, hopefully clean environment. On systemd the code
paths for all of this, for the configuration parsing as well as the
actual execution, are shared. This means less code, less room for
mistakes, less memory and cache pressure, and is thus a very good
thing. And as a side-effect you actually get a ton more functionality
for it...&lt;/p&gt;

&lt;p&gt;As mentioned above, systemd is also pretty modular. You can choose
at build time which components you need, and which you don't
need. People can hence specifically choose the level of "bloat" they
want.&lt;/p&gt;

&lt;p&gt;When you build systemd, it only requires three dependencies: glibc,
libcap and dbus. That's it. It can make use of more dependencies, but
these are entirely optional.&lt;/p&gt;

&lt;p&gt;So, yeah, whichever way you look at it, it's really not
&lt;i&gt;bloated&lt;/i&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd being Linux-only is not nice to the BSDs.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Completely wrong. The BSD folks are pretty much uninterested in
systemd. If systemd was portable, this would change nothing, they
still wouldn't adopt it. And the same is true for the other Unixes in
the world. Solaris has SMF, BSD has their own "rc" system, and they
always maintained it separately from Linux. The init system is very
close to the core of the entire OS. And these other operating systems
hence define themselves among other things by their core
userspace. The assumption that they'd adopt our core userspace if we
just made it portable, is completely without any foundation.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd being Linux-only makes it impossible for Debian to adopt it as default.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Debian supports non-Linux kernels in their distribution. systemd
won't run on those. Is that a problem though, and should that hinder
them from adopting systemd as default? Not really. The folks who ported
Debian to these other kernels were willing to invest time in a massive
porting effort, they set up test and build systems, and patched and
built numerous packages for their goal. The maintenance of both a
systemd unit file and a classic init script for the packaged services
is a negligible amount of work compared to that, especially since
those scripts more often than not exist already.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd could be ported to other kernels if its maintainers just wanted to.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;That is simply not true. Porting systemd to other kernels is not
feasible. We just use too many Linux-specific interfaces. For a few
one might find replacements on other kernels, some features one might
want to turn off, but for most this is not really possible. Here's a
small, very incomplete list: &lt;tt&gt;cgroups, fanotify, umount2(),
/proc/self/mountinfo &lt;/tt&gt;(including notification)&lt;tt&gt;, /proc/swaps &lt;/tt&gt;(same)&lt;tt&gt;,
udev, netlink, &lt;/tt&gt;the structure of&lt;tt&gt; /sys, /proc/$PID/comm,
/proc/$PID/cmdline, /proc/$PID/loginuid, /proc/$PID/stat,
/proc/$PID/session, /proc/$PID/exe, /proc/$PID/fd, tmpfs, devtmpfs,
&lt;/tt&gt;capabilities, namespaces of all kinds, various&lt;tt&gt; prctl()s, &lt;/tt&gt;numerous&lt;tt&gt;
ioctls, &lt;/tt&gt;the&lt;tt&gt; mount() &lt;/tt&gt;system call and its semantics&lt;tt&gt;, selinux, audit,
inotify, statfs, O_DIRECTORY, O_NOATIME, /proc/$PID/root, waitid(),
SCM_CREDENTIALS, SCM_RIGHTS, mkostemp(), /dev/input, ...&lt;/tt&gt;&lt;/p&gt;

&lt;p&gt;And no, if you look at this list and pick out the few where you can
think of obvious counterparts on other kernels, then think again, and
look at the others you didn't pick, and the complexity of replacing
them.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is not portable for no reason.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Nonsense! We use the Linux-specific functionality because we need
it to implement what we want. Linux has so many features that
UNIX/POSIX didn't have, and we want to empower the user with
them. These features are incredibly useful, but only if they are
actually exposed in a friendly way to the user, and that's what we do
with systemd.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd uses binary configuration files.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;No idea who came up with this crazy myth, but it's absolutely not
true. systemd is configured pretty much exclusively via simple text
files. A few settings you can also alter with the kernel command line
and via environment variables. There's nothing binary in its
configuration (not even XML). Just plain, simple, easy-to-read text
files.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is a feature creep.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Well, systemd certainly covers more ground than it used to. It's
not just an init system anymore, but the basic userspace building
block to build an OS from, but we carefully make sure to keep most of
the features optional. You can turn a lot off at compile time, and
even more at runtime. Thus you can choose freely how much feature
creeping you want.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd forces you to do something.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;systemd is not the mafia. It's Free Software, you can do with it
whatever you want, and that includes not using it. That's pretty much
the opposite of "forcing".&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd makes it impossible to run syslog.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Not true, we carefully made sure when &lt;a href="http://0pointer.de/blog/projects/the-journal.html"&gt;we introduced
the journal&lt;/a&gt; that all data is also passed on to any syslog daemon
running. In fact, if anything changed, it is only that syslog now gets
more complete data than it got before, since we now cover early
boot stuff as well as STDOUT/STDERR of any system service.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is incompatible.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;We try very hard to provide the best possible compatibility with
sysvinit. In fact, the vast majority of init scripts should work just
fine on systemd, unmodified. However, there actually are indeed a few
incompatibilities, but we try to &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/Incompatibilities"&gt;document
these&lt;/a&gt; and explain what to do about them. Ultimately every system
that is not actually sysvinit itself will have a certain amount of
incompatibilities with it since it will not share the exact same code
paths.&lt;/p&gt;

&lt;p&gt;It is our goal to ensure that differences between the various
distributions are kept to a minimum. That means unit files usually
work just fine on a different distribution than the one you wrote them on, which
is a big improvement over classic init scripts which are very hard to
write in a way that they run on multiple Linux distributions, due to
numerous incompatibilities between them.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is not scriptable, because of its D-Bus use.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Not true. Pretty much every single D-Bus interface systemd provides
is also available in a command line tool, for example in &lt;a href="http://www.freedesktop.org/software/systemd/man/systemctl.html"&gt;&lt;tt&gt;systemctl&lt;/tt&gt;&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/loginctl.html"&gt;&lt;tt&gt;loginctl&lt;/tt&gt;&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/timedatectl.html"&gt;&lt;tt&gt;timedatectl&lt;/tt&gt;&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/hostnamectl.html"&gt;&lt;tt&gt;hostnamectl&lt;/tt&gt;&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/localectl.html"&gt;&lt;tt&gt;localectl&lt;/tt&gt;&lt;/a&gt;
and suchlike. You can easily call these tools from shell scripts, they
open up pretty much the entire API from the command line with
easy-to-use commands.&lt;/p&gt;

&lt;p&gt;That said, D-Bus actually has bindings for almost any scripting
language this world knows. Even from the shell you can invoke
arbitrary D-Bus methods with &lt;a href="http://dbus.freedesktop.org/doc/dbus-send.1.html"&gt;dbus-send&lt;/a&gt;
or &lt;a href="http://developer.gnome.org/gio/unstable/gdbus.html"&gt;gdbus&lt;/a&gt;. If
anything, this improves scriptability due to the good support of D-Bus
in the various scripting languages.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd requires you to use some arcane configuration
tools instead of allowing you to edit your configuration files
directly.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Not true at all. We offer some configuration tools, and using them
gets you a bit of additional functionality (for example, command line
completion for all settings!), but there's no need at all to use
them. You can always edit the files in question directly if you wish,
and that's fully supported. Of course sometimes you need to explicitly
reload configuration of some daemon after editing the configuration,
but that's pretty much true for most UNIX services.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is unstable and buggy.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Certainly not according to our data. We have been monitoring the
Fedora bug tracker (and some others) closely for a long long time. The
number of bugs is very low for such a central component of the OS,
especially if you discount the numerous RFE bugs we track for the
project. We are pretty good at keeping systemd out of the list of
blocker bugs of the distribution. We have a relatively fast
development cycle with mostly incremental changes to keep quality and
stability high.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is not debuggable.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;False. Some people try to imply that the shell was a good
debugger. Well, it isn't really. In systemd we provide you with actual
debugging features instead. For example: interactive debugging,
verbose tracing, the ability to mask any component during boot, and
more. Also, we provide &lt;a href="http://freedesktop.org/wiki/Software/systemd/Debugging"&gt;documentation
for it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It's certainly well debuggable, we needed that for our own
development work, after all. But we'll grant you one thing: it uses
different debugging tools, we believe more appropriate ones for the
purpose, though.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd makes changes for the changes' sake.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Very much untrue. We pretty much exclusively have technical
reasons for the changes we make, and we explain them in the various
pieces of documentation, wiki pages, blog articles, mailing list
announcements. We try hard to avoid making incompatible changes, and
if we do we try to document the why and how in detail. And if you
wonder about something, just ask us!&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd is a Red-Hat-only project, is private property
of some smart-ass developers, who use it to push their views to the
world.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Not true. Currently, there are 16 hackers with commit powers to the
systemd git tree. Of these 16 only six are employed by Red Hat. The 10
others are folks from ArchLinux, from Debian, from Intel, even from
Canonical, Mandriva, Pantheon and a number of community folks with
full commit rights. And they frequently commit big stuff, major
changes. Then, there are 374 individuals with patches in our tree, and
they too came from a number of different companies and backgrounds,
and many of those have way more than one patch in the tree. The
discussions about where we want to take systemd are done in the open,
on our IRC channel (&lt;tt&gt;#systemd&lt;/tt&gt; on freenode, you are always
welcome), on our &lt;a href="http://lists.freedesktop.org/mailman/listinfo/systemd-devel"&gt;mailing
list&lt;/a&gt;, and on public hackfests (&lt;a href="https://plus.google.com/events/cnklef88b85tb6tgf6ue3hn32lg"&gt;such
as our next one in Brno&lt;/a&gt;, you are invited). We regularly attend
various conferences, to collect feedback, to explain what we are doing
and why, like few others do. We &lt;a href="http://0pointer.de/blog"&gt;maintain blogs&lt;/a&gt;, engage in social
networks (&lt;a href="https://plus.google.com/104232583922197692623/posts"&gt;we actually
have some pretty interesting content on Google+&lt;/a&gt;, and our &lt;a href="https://plus.google.com/communities/114587707547576757881"&gt;Google+
Community is pretty alive, too&lt;/a&gt;.), and try really hard to explain
the why and the how of what we do, and to listen to feedback and
figure out where the current issues are (for example, from that
feedback we compiled this list of often-heard myths about
systemd...).&lt;/p&gt;

&lt;p&gt;What most systemd contributors probably share is a rough idea of what a
good OS should look like, and the desire to make it happen. However,
by the very nature of the project being Open Source and rooted in the
community, systemd is just what people want it to be, and if it's not
what they want then they can drive the direction with patches and
code, and if that's not feasible, then there are numerous other
options to use, too; systemd is never exclusive.&lt;/p&gt;

&lt;p&gt;One goal of systemd is to unify the dispersed Linux landscape a
bit. We try to get rid of many of the more pointless differences of
the various distributions in various areas of the core OS. As part of
that we sometimes adopt schemes that were previously used by only one
of the distributions and push it to a level where it's the default of
systemd, trying to gently push everybody towards the same set of basic
configuration. This is never exclusive though: distributions can
continue to deviate from that if they wish. However, if they end up
using the well-supported default their work becomes much easier and
they might gain a feature or two. Now, as it turns out, more
frequently than not we actually adopted schemes that were Debianisms,
rather than Fedoraisms/Redhatisms, as the best supported scheme in
systemd. For example, systems running systemd now generally store
their hostname in &lt;tt&gt;/etc/hostname&lt;/tt&gt;, something that used to be
specific to Debian and now is used across distributions.&lt;/p&gt;

&lt;p&gt;One thing we'll grant you though: we sometimes can be
smart-asses. We try to be prepared whenever we open our mouths, in
order to be able to back up what we claim with facts. That might make
us appear as smart-asses.&lt;/p&gt;

&lt;p&gt;But in general, yes, some of the more influential contributors of
systemd work for Red Hat, but they are in the minority, and systemd is
a healthy, open community with different interests, different
backgrounds, just unified by a few rough ideas where the trip should
go, a community where code and its design counts, and certainly not
company affiliation.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd doesn't support &lt;tt&gt;/usr&lt;/tt&gt; split from the root directory.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Nonsense. Since its beginnings systemd has supported the
&lt;tt&gt;--with-rootprefix=&lt;/tt&gt; option to its &lt;tt&gt;configure&lt;/tt&gt; script
which allows you to tell systemd to neatly split up the stuff needed
for early boot and the stuff needed for later on. All this logic is
fully present and we keep it up-to-date right there in systemd's build
system.&lt;/p&gt;

&lt;p&gt;Of course, we still don't think that &lt;a href="http://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken"&gt;actually
booting with &lt;tt&gt;/usr&lt;/tt&gt; unavailable is a good idea&lt;/a&gt;, but we
support this just fine in our build system. This won't fix the
inherent problems of the scheme that you'll encounter all across the
board, but you can't blame that on systemd, because in systemd we
support this just fine.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd doesn't allow you to replace its components.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Not true, you can turn off and replace pretty much any part of
systemd, with very few exceptions. And those exceptions (such as
journald) generally allow you to run an alternative side by side to
it, while cooperating nicely with it.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;b&gt;Myth: systemd's use of D-Bus instead of sockets makes it intransparent.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;This claim is already contradictory in itself: D-Bus uses sockets
as transport, too. Hence whenever D-Bus is used to send something
around, a socket is used for that too. D-Bus is mostly a standardized
serialization of messages to send over these sockets. If anything this
makes it more transparent, since this serialization is well
documented, understood and there are numerous tracing tools and
language bindings for it. This is very much unlike the usual
homegrown protocols the various classic UNIX daemons use to
communicate locally.&lt;/p&gt;&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;Hmm, did I write I just wanted to debunk a "few" myths? Maybe these
were more than just a few... Anyway, I hope I managed to clear up a
couple of misconceptions. Thanks for your time.&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;b&gt;Footnotes&lt;/b&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] For example, &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"&gt;&lt;tt&gt;systemd-detect-virt&lt;/tt&gt;&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/systemd-tmpfiles.html"&gt;&lt;tt&gt;systemd-tmpfiles&lt;/tt&gt;&lt;/a&gt;,
&lt;a href="http://www.freedesktop.org/software/systemd/man/systemd-udevd.service.html"&gt;&lt;tt&gt;systemd-udevd&lt;/tt&gt;&lt;/a&gt; are.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[2] Also, we are trying to do our little part on maybe
making this better. By exposing boot-time performance of the firmware
more prominently in systemd's boot output we hope to shame the
firmware writers to clean up their stuff.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[3] And anyways, guess which project includes a library "lib&lt;i&gt;nih&lt;/i&gt;" -- Upstart or systemd?&lt;sup&gt;[4]&lt;/sup&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[4] Hint: it's not systemd!&lt;/small&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Sat, 26 Jan 2013 02:43:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2013-01-26:/blog/projects/the-biggest-myths.html</guid><category>projects</category></item><item><title>systemd for Administrators, Part XX</title><link>https://0pointer.net/blog/projects/socket-activated-containers.html</link><description>
                
&lt;p&gt; &lt;a href="http://0pointer.de/blog/projects/detect-virt.html"&gt;This is&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/resources.html"&gt;no&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/journalctl.html"&gt;time&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/serial-console.html"&gt;for&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/watchdog.html"&gt;procrastination,&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/self-documented-boot.html"&gt;here&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemctl-journal.html"&gt;is&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/security.html"&gt;already&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/inetd.html"&gt;the&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/instances.html"&gt;twentieth&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/on-etc-sysinit.html"&gt;installment&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/the-new-configuration-files.html"&gt;of&lt;/a&gt;

&lt;a href="http://0pointer.de/blog/projects/blame-game.html"&gt;my&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;ongoing&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/three-levels-of-off.html"&gt;series&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html"&gt;on&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html"&gt;systemd&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html"&gt;for&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html"&gt;Administrators&lt;/a&gt;:&lt;/p&gt;

&lt;h4&gt;Socket Activated Internet Services and OS Containers&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://0pointer.de/blog/projects/socket-activation.html"&gt;Socket&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/socket-activation2.html"&gt;Activation&lt;/a&gt;
is an important feature of &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/"&gt;systemd&lt;/a&gt;. When
we &lt;a href="http://0pointer.de/blog/projects/systemd.html"&gt;first
announced&lt;/a&gt; systemd we already tried to make the point how great
socket activation is for increasing parallelization and robustness of
socket services, but also for simplifying the dependency logic of the
boot. In this episode I'd like to explain why socket activation is an
important tool for drastically improving how many services and even
containers you can run on a single system with the same resource
usage. Or in other words, how you can drive up the density of customer
sites on a system while spending less on new hardware.&lt;/p&gt;

&lt;h5&gt;Socket Activated Internet Services&lt;/h5&gt;

&lt;p&gt;First, let's take a step back. What was &lt;i&gt;socket activation&lt;/i&gt; again? --
Basically, socket activation simply means that systemd sets up
listening sockets (IP or otherwise) on behalf of your services
(without these running yet), and then starts (&lt;i&gt;activates&lt;/i&gt;) the
services as soon as the first connection comes in. Depending on the
technology the services might idle for a while after having processed
the connection and possible follow-up connections before they exit on
their own, so that systemd will again listen on the sockets and
activate the services again the next time they are connected to. For
the client it is not visible whether the service it is interested in
is currently running or not. The service's IP socket stays continuously
connectable, no connection attempt ever fails, and all connects will
be processed promptly.&lt;/p&gt;
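
&lt;p&gt;As a rough illustration of the mechanism described above, a
socket-activated service boils down to a pair of units along the
following lines. This is only a minimal sketch: the unit names, the
port and the daemon path are made up for the example, and the daemon
is assumed to be able to take over an already listening socket from
systemd. First, a hypothetical
&lt;tt&gt;/etc/systemd/system/example.socket&lt;/tt&gt;:&lt;/p&gt;

&lt;pre&gt;
[Unit]
Description=Listening socket of the example service

[Socket]
# Assumed port; systemd listens here while the service is not running
ListenStream=8080

[Install]
WantedBy=sockets.target
&lt;/pre&gt;

&lt;p&gt;And the matching, equally hypothetical
&lt;tt&gt;/etc/systemd/system/example.service&lt;/tt&gt;, which systemd starts
when the first connection comes in:&lt;/p&gt;

&lt;pre&gt;
[Unit]
Description=Example service, activated via example.socket

[Service]
# Hypothetical daemon binary that supports socket activation
ExecStart=/usr/local/bin/example-daemon
&lt;/pre&gt;

&lt;p&gt;With a pair like this only the socket unit needs to be enabled
and started (&lt;tt&gt;systemctl enable example.socket&lt;/tt&gt; and
&lt;tt&gt;systemctl start example.socket&lt;/tt&gt;); the service unit is
pulled in by the socket when needed.&lt;/p&gt;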

&lt;p&gt;A setup like this lowers resource usage: as services are only
running when needed they only consume resources when required. Many
internet sites and services can benefit from that. For example, web
site hosters will have noticed that of the multitude of web sites that
are on the Internet only a tiny fraction gets a continuous stream of
requests: the huge majority of web sites still needs to be available
all the time but gets requests only very infrequently. With a scheme
like socket activation you can take advantage of this. Hosting many of
these sites on a single system like this and only activating their
services as necessary allows a large degree of over-commit: you can
run more sites on your system than the available resources actually
allow. Of course, one shouldn't over-commit too much to avoid
contention during peak times.&lt;/p&gt;

&lt;p&gt;Socket activation like this is easy to use in systemd. Many modern
Internet daemons already support socket activation out of the box (and
for those which don't yet it's &lt;a href="http://0pointer.de/blog/projects/socket-activation.html"&gt;not&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/socket-activation2.html"&gt;hard&lt;/a&gt;
to add). Together with systemd's &lt;a href="http://0pointer.de/blog/projects/instances.html"&gt;instantiated
units support&lt;/a&gt; it is easy to write a pair of service and socket
templates that then may be instantiated multiple times, once for each
site. Then, (optionally) make use of some of the &lt;a href="http://0pointer.de/blog/projects/security.html"&gt;security
features&lt;/a&gt; of systemd to nicely isolate the individual customers'
services from each other (think: each customer's service should only
see the home directory of the customer, everybody else's directories
should be invisible), and there you go: you now have a highly scalable
and reliable server system, that serves a maximum of securely
sandboxed services at a minimum of resources, and all nicely done with
built-in technology of your OS.&lt;/p&gt;
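
&lt;p&gt;To sketch what such a pair of templates might look like: the
following is an illustration only, with invented unit names, socket
path and daemon, and not a description of any particular production
setup. A hypothetical &lt;tt&gt;/etc/systemd/system/site@.socket&lt;/tt&gt;
could read:&lt;/p&gt;

&lt;pre&gt;
[Unit]
Description=Listening socket for site %i

[Socket]
# Assumed layout: one socket per site, named after the instance (%i)
ListenStream=/run/sites/%i.sock

[Install]
WantedBy=sockets.target
&lt;/pre&gt;

&lt;p&gt;The corresponding, equally hypothetical
&lt;tt&gt;/etc/systemd/system/site@.service&lt;/tt&gt; might then look like
this:&lt;/p&gt;

&lt;pre&gt;
[Unit]
Description=Service for site %i

[Service]
# Hypothetical daemon; %i is replaced by the instance name
ExecStart=/usr/local/bin/site-daemon /srv/sites/%i
# One example of a sandboxing option: a private /tmp per instance
PrivateTmp=yes
&lt;/pre&gt;

&lt;p&gt;Instantiating the pair for a site called &lt;tt&gt;customer1&lt;/tt&gt;
would then be a matter of &lt;tt&gt;systemctl enable
site@customer1.socket&lt;/tt&gt; followed by &lt;tt&gt;systemctl start
site@customer1.socket&lt;/tt&gt;. Which sandboxing options make sense
depends on the service and on the systemd version in use;
&lt;tt&gt;PrivateTmp=&lt;/tt&gt; is shown here merely as one example.&lt;/p&gt;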

&lt;p&gt;This kind of setup is already in production use in a number of
companies. For example, the great folks at &lt;a href="https://www.getpantheon.com/"&gt;Pantheon&lt;/a&gt; are running their
scalable instant Drupal system on a setup that is similar to this. (In
fact, Pantheon's David Strauss pioneered this scheme. David, you
rock!)&lt;/p&gt;

&lt;h5&gt;Socket Activated OS Containers&lt;/h5&gt;

&lt;p&gt;All of the above can already be done with older versions of
systemd. If you use a distribution that is based on systemd, you can
set up a system like the one explained above right away. But let's
take this one step further. With systemd 197 (to be included in Fedora
19), we added support for socket activating not only individual
services, but &lt;i&gt;entire&lt;/i&gt; OS containers. And I really have to say it
at this point: this is stuff I am really excited
about. ;-)&lt;/p&gt;

&lt;p&gt;Basically, with socket activated OS containers, the host's systemd
instance will listen on a number of ports on behalf of a container,
for example one for SSH, one for web and one for the database, and as
soon as the first connection comes in, it will spawn the container
this is intended for, and pass to it all three sockets. Inside of the
container, another systemd is running and will accept the sockets and
then distribute them further, to the services running inside the
container using normal socket activation. The SSH, web and database
services will only see the inside of the container, even though they
have been activated by sockets that were originally created on the
host! Again, none of this is visible to the clients. That an entire OS
container is spawned, triggered by a simple network connection, is entirely
transparent to the client side.&lt;sup&gt;[1]&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;The OS containers may contain (as the name suggests) a full
operating system, that might even be a different distribution than is
running on the host. For example, you could run your host on Fedora,
but run a number of Debian containers inside of it. The OS containers
will have their own systemd init system, their own SSH instances,
their own process tree, and so on, but will share a number of other
facilities (such as memory management) with the host.&lt;/p&gt;

&lt;p&gt;For now, only systemd's own trivial container manager, &lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;systemd-nspawn&lt;/a&gt;
has been updated to support this kind of socket activation. We hope
that &lt;a href="http://libvirt.org/drvlxc.html"&gt;libvirt-lxc&lt;/a&gt; will
soon gain similar functionality. At this point, let's see in more
detail how such a setup is configured in systemd using nspawn:&lt;/p&gt;

&lt;p&gt;First, please use a tool such as &lt;tt&gt;debootstrap&lt;/tt&gt; or yum's
&lt;tt&gt;--installroot&lt;/tt&gt; to set up a container OS
tree&lt;sup&gt;[2]&lt;/sup&gt;. The details of that are a bit out of scope
for this story; there's plenty of documentation around on how to do
this. Of course, make sure you have systemd v197 installed inside
the container. For accessing the container from the command line,
consider using &lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;systemd-nspawn&lt;/a&gt;
itself. After you have configured everything properly, try to boot it up
from the command line with systemd-nspawn's &lt;tt&gt;-b&lt;/tt&gt; switch.&lt;/p&gt;

&lt;p&gt;Assuming you now have a working container that boots up fine, let's
write a service file for it, to turn the container into a systemd
service on the host you can start and stop. Let's create
&lt;tt&gt;/etc/systemd/system/mycontainer.service&lt;/tt&gt; on the host:&lt;/p&gt;

&lt;pre&gt;
[Unit]
Description=My little container

[Service]
ExecStart=/usr/bin/systemd-nspawn -jbD /srv/mycontainer 3
KillMode=process
&lt;/pre&gt;

&lt;p&gt;This service can already be started and stopped via &lt;tt&gt;systemctl
start&lt;/tt&gt; and &lt;tt&gt;systemctl stop&lt;/tt&gt;. However, there's no nice way
to actually get a shell prompt inside the container. So let's add SSH
to it, and even more: let's configure SSH so that a connection to the
container's SSH port will socket-activate the entire container. First,
let's begin with telling the host that it shall now listen on the SSH
port of the container. Let's create
&lt;tt&gt;/etc/systemd/system/mycontainer.socket&lt;/tt&gt; on the host:&lt;/p&gt;

&lt;pre&gt;
[Unit]
Description=The SSH socket of my little container

[Socket]
ListenStream=23
&lt;/pre&gt;

&lt;p&gt;If we start this unit with &lt;tt&gt;systemctl start&lt;/tt&gt; on the host
then it will listen on port 23, and as soon as a connection comes in
it will activate our container service we defined above. We pick port
23 here, instead of the usual 22, as our host's SSH is already
listening on that. nspawn virtualizes the process list and the file
system tree, but does not actually virtualize the network stack, hence
we just pick different ports for the host and the various containers
here.&lt;/p&gt;

&lt;p&gt;Of course, the system inside the container doesn't yet know what to
do with the socket it gets passed due to socket activation. If you'd
now try to connect to the port, the container would start-up but the
incoming connection would be immediately closed since the container
can't handle it yet. Let's fix that!&lt;/p&gt;

&lt;p&gt;All that's necessary for that is to teach SSH inside the container
socket activation. For that let's simply write a pair of socket and
service units for SSH. Let's create
&lt;tt&gt;/etc/systemd/system/sshd.socket&lt;/tt&gt; in the container:&lt;/p&gt;

&lt;pre&gt;[Unit]
Description=SSH Socket for Per-Connection Servers

[Socket]
ListenStream=23
Accept=yes&lt;/pre&gt;

&lt;p&gt;Then, let's add the matching SSH service file
&lt;tt&gt;/etc/systemd/system/sshd@.service&lt;/tt&gt; in the container:&lt;/p&gt;

&lt;pre&gt;[Unit]
Description=SSH Per-Connection Server for %I

[Service]
ExecStart=-/usr/sbin/sshd -i
StandardInput=socket&lt;/pre&gt;

&lt;p&gt;Then, make sure to hook &lt;tt&gt;sshd.socket&lt;/tt&gt; into the
&lt;tt&gt;sockets.target&lt;/tt&gt; so that unit is started automatically when the
container boots up:&lt;/p&gt;

&lt;pre&gt;ln -s /etc/systemd/system/sshd.socket /etc/systemd/system/sockets.target.wants/&lt;/pre&gt;

&lt;p&gt;And that's it. If we now activate &lt;tt&gt;mycontainer.socket&lt;/tt&gt; on
the host, the host's systemd will bind the socket and we can connect
to it. If we do this, the host's systemd will activate the container,
and pass the socket in to it. The container's systemd will then take
the socket, match it up with &lt;tt&gt;sshd.socket&lt;/tt&gt; inside the
container. As there's still our incoming connection queued on it, it
will then immediately trigger an instance of &lt;tt&gt;sshd@.service&lt;/tt&gt;,
and we'll have our login.&lt;/p&gt;

&lt;p&gt;And that's already everything there is to it. You can easily add
additional sockets to listen on to
&lt;tt&gt;mycontainer.socket&lt;/tt&gt;. Everything listed therein will be passed
to the container on activation, and will be matched up as well as
possible with all socket units configured inside the
container. Sockets that cannot be matched up will be closed, and
sockets that aren't passed in but are configured for listening will be
bound by the container's systemd instance.&lt;/p&gt;
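
&lt;p&gt;To make the hand-over a bit more concrete, here is a minimal Python sketch of how a process on the receiving side might work out which descriptors were passed to it. It assumes the usual socket activation convention described in sd_listen_fds(3), which this story does not spell out: passed descriptors start at number 3, &lt;tt&gt;$LISTEN_FDS&lt;/tt&gt; carries their count, and &lt;tt&gt;$LISTEN_PID&lt;/tt&gt; names the process they are meant for. The function name &lt;tt&gt;passed_fds()&lt;/tt&gt; is made up for this illustration; a real service would normally just call &lt;tt&gt;sd_listen_fds()&lt;/tt&gt;.&lt;/p&gt;

```python
import os

# Assumption: first passed descriptor, as in the sd_listen_fds(3) convention.
LISTEN_FDS_START = 3


def passed_fds(environ=None, pid=None):
    """Return the descriptor numbers handed to this process, or an empty list."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid

    # The descriptors are only meant for us if LISTEN_PID matches our PID.
    if environ.get("LISTEN_PID") != str(pid):
        return []

    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []  # malformed count: treat it as "nothing was passed"

    return list(range(LISTEN_FDS_START, LISTEN_FDS_START + count))
```

&lt;p&gt;With the three sockets from the example further up (SSH, web and database), such a process would see the descriptors 3, 4 and 5.&lt;/p&gt;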

&lt;p&gt;So, let's take a step back again. What did we gain through all of
this? Well, basically, we can now offer a number of full OS containers
on a single host, and the containers can offer their services without
running continuously. The density of OS containers on the host can
hence be increased drastically.&lt;/p&gt;

&lt;p&gt;Of course, this only works for kernel-based virtualization, not for
hardware virtualization, i.e. something like this can only be
implemented on systems such as libvirt-lxc or nspawn, but not in
qemu/kvm.&lt;/p&gt;

&lt;p&gt;If you have a number of containers set up like this, here's one
cool thing the journal allows you to do. If you pass &lt;tt&gt;-m&lt;/tt&gt; to
&lt;tt&gt;journalctl&lt;/tt&gt; on the host, it will automatically discover the
journals of all local containers and interleave them on
display. Nifty, eh?&lt;/p&gt;

&lt;p&gt;With systemd 197 you have everything to set up your own socket
activated OS containers on-board. However, there are a couple of
improvements we're likely to add soon: for example, right now even if
all services inside the container exit on idle, the container still
will stay around, and we really should make it exit on idle too, if
all its services exited and no logins are around. As it turns out we
already have much of the infrastructure for this around: we can reuse
the auto-suspend functionality we added for laptops: detecting when a
laptop is idle and suspending it then is a very similar problem to
detecting when a container is idle and shutting it down then.&lt;/p&gt;

&lt;p&gt;Anyway, this blog story is already way too long. I hope I haven't
lost you half-way already with all this talk of virtualization,
sockets, services, different OSes and stuff. I hope this blog story is
a good starting point for setting up powerful highly scalable server
systems. If you want to know more, consult the documentation and drop
by our IRC channel. Thank you!&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;b&gt;Footnotes&lt;/b&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] And BTW, &lt;a href="https://plus.google.com/115547683951727699051/posts/cVrLAJ8HYaP"&gt;this
is another reason&lt;/a&gt; why fast boot times the way systemd offers them
are actually a really good thing on servers, too.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[2] To make it easy: you need a command line such as &lt;tt&gt;yum
--releasever=19 --nogpg --installroot=/srv/mycontainer/ --disablerepo='*'
--enablerepo=fedora install systemd passwd yum fedora-release vim-minimal &lt;/tt&gt;
to install Fedora, and &lt;tt&gt;debootstrap --arch=amd64 unstable
/srv/mycontainer/&lt;/tt&gt; to install Debian. Also see the bottom of &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd-nspawn.html"&gt;systemd-nspawn(1)&lt;/a&gt;.
Also note that auditing is currently broken for containers, and if enabled in
the kernel will cause all kinds of errors in the container. Use
&lt;tt&gt;audit=0&lt;/tt&gt; on the host's kernel command line to turn it off.&lt;/small&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 09 Jan 2013 18:58:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2013-01-09:/blog/projects/socket-activated-containers.html</guid><category>projects</category></item><item><title>systemd for Administrators, Part XIX</title><link>https://0pointer.net/blog/projects/detect-virt.html</link><description>
                
&lt;p&gt; &lt;a href="http://0pointer.de/blog/projects/resources.html"&gt;Happy&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/journalctl.html"&gt;new&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/serial-console.html"&gt;year&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/watchdog.html"&gt;2013!&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/self-documented-boot.html"&gt;Here&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemctl-journal.html"&gt;is&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/security.html"&gt;now&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/inetd.html"&gt;the&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/instances.html"&gt;nineteenth&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/on-etc-sysinit.html"&gt;installment&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/the-new-configuration-files.html"&gt;of&lt;/a&gt;

&lt;a href="http://0pointer.de/blog/projects/blame-game.html"&gt;my&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;ongoing&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/three-levels-of-off.html"&gt;series&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html"&gt;on&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html"&gt;systemd&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html"&gt;for&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html"&gt;Administrators&lt;/a&gt;:&lt;/p&gt;

&lt;h4&gt;Detecting Virtualization&lt;/h4&gt;

&lt;p&gt;When we started working on &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/"&gt;systemd&lt;/a&gt;
we had a closer look at what the various existing init scripts used on
Linux were actually doing. Among other things we noticed that a
number of them were checking explicitly whether they were running in
a virtualized environment (i.e. in a kvm, VMWare, LXC guest or
suchlike) or not. Some init scripts disabled themselves in such
cases&lt;sup&gt;[1]&lt;/sup&gt;, others enabled themselves only in such
cases&lt;sup&gt;[2]&lt;/sup&gt;. Frequently, it would probably have been a better
idea to check for other conditions rather than explicitly checking for
virtualization, but after looking at this from all sides we came to
the conclusion that in many cases explicitly conditionalizing services
based on detected virtualization is a valid thing to do. As a result
we added a new configuration option to systemd that can be used to
conditionalize services this way: &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html"&gt;&lt;tt&gt;ConditionVirtualization&lt;/tt&gt;&lt;/a&gt;;
we also added a small tool that can be used in shell scripts to detect
virtualization: &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"&gt;&lt;tt&gt;systemd-detect-virt(1)&lt;/tt&gt;&lt;/a&gt;;
and finally, we added a minimal bus interface to query this from other
applications.&lt;/p&gt;

&lt;p&gt;Detecting whether your code is run inside a virtualized environment
&lt;a href="http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/virt.c#n30"&gt;is
actually not that hard&lt;/a&gt;. Depending on what precisely you want to
detect it's little more than running the CPUID instruction and maybe
checking a few files in &lt;tt&gt;/sys&lt;/tt&gt; and &lt;tt&gt;/proc&lt;/tt&gt;. The
complexity is mostly about knowing the strings to look for, and
keeping this list up-to-date. Currently, the virtualization
detection code in systemd can detect the following virtualization
systems:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;p&gt;Hardware virtualization (i.e. VMs):&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;qemu&lt;/li&gt;
&lt;li&gt;kvm&lt;/li&gt;
&lt;li&gt;vmware&lt;/li&gt;
&lt;li&gt;microsoft&lt;/li&gt;
&lt;li&gt;oracle&lt;/li&gt;
&lt;li&gt;xen&lt;/li&gt;
&lt;li&gt;bochs&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Same-kernel virtualization (i.e. containers):&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;chroot&lt;/li&gt;
&lt;li&gt;openvz&lt;/li&gt;
&lt;li&gt;lxc&lt;/li&gt;
&lt;li&gt;lxc-libvirt&lt;/li&gt;
&lt;li&gt;&lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;systemd-nspawn&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Let's have a look at how one may make use of this functionality.&lt;/p&gt;

&lt;h5&gt;Conditionalizing Units&lt;/h5&gt;

&lt;p&gt;Adding &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html"&gt;&lt;tt&gt;ConditionVirtualization&lt;/tt&gt;&lt;/a&gt;
to the &lt;tt&gt;[Unit]&lt;/tt&gt; section of a unit file is enough to
conditionalize it depending on which virtualization is used or whether
one is used at all. Here's an example:&lt;/p&gt;

&lt;pre&gt;[Unit]
Description=My Foobar Service (runs only on guests)
ConditionVirtualization=yes

[Service]
ExecStart=/usr/bin/foobard&lt;/pre&gt;

&lt;p&gt;Instead of specifying "&lt;tt&gt;yes&lt;/tt&gt;" or "&lt;tt&gt;no&lt;/tt&gt;" it is possible
to specify the ID of a specific virtualization solution (Example:
"&lt;tt&gt;kvm&lt;/tt&gt;", "&lt;tt&gt;vmware&lt;/tt&gt;", ...), or either
"&lt;tt&gt;container&lt;/tt&gt;" or "&lt;tt&gt;vm&lt;/tt&gt;" to check whether the kernel is
virtualized or the hardware. Also, checks can be prefixed with an exclamation mark ("!") to invert a check. For further details see the &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html"&gt;manual page&lt;/a&gt;.&lt;/p&gt;

&lt;h5&gt;In Shell Scripts&lt;/h5&gt;

&lt;p&gt;In shell scripts it is easy to check for virtualized systems with
the &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"&gt;&lt;tt&gt;systemd-detect-virt(1)&lt;/tt&gt;&lt;/a&gt;
tool. Here's an example:&lt;/p&gt;

&lt;pre&gt;
if systemd-detect-virt -q ; then
        echo "Virtualization is used:" `systemd-detect-virt`
else
        echo "No virtualization is used."
fi&lt;/pre&gt;

&lt;p&gt;If this tool is run it will return with an exit code of zero
(success) if a virtualization solution has been found, non-zero
otherwise. It will also print a short identifier of the used
virtualization solution, which can be suppressed with
&lt;tt&gt;-q&lt;/tt&gt;. Also, with the &lt;tt&gt;-c&lt;/tt&gt; and &lt;tt&gt;-v&lt;/tt&gt; parameters it is
possible to detect only kernel or only hardware virtualization
environments. For further details see the &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"&gt;manual
page&lt;/a&gt;.&lt;/p&gt;
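
&lt;p&gt;The same check can be wrapped for use from other languages. The following Python sketch relies only on the behaviour described above (exit code zero plus a short identifier on standard output when something is detected). The helper name &lt;tt&gt;detect_virt()&lt;/tt&gt; and its &lt;tt&gt;command&lt;/tt&gt; parameter are assumptions made for this illustration, and treating a missing tool like "nothing detected" is a simplification:&lt;/p&gt;

```python
import subprocess


def detect_virt(command=("systemd-detect-virt",)):
    """Return the printed identifier, or None if no virtualization was reported."""
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        return None  # tool not installed; handled like "nothing detected" here

    # A non-zero exit code means no virtualization solution was found.
    if result.returncode != 0:
        return None

    return result.stdout.strip() or None
```

&lt;p&gt;Passing &lt;tt&gt;("systemd-detect-virt", "-c")&lt;/tt&gt; or &lt;tt&gt;("systemd-detect-virt", "-v")&lt;/tt&gt; as &lt;tt&gt;command&lt;/tt&gt; restricts the check to containers or VMs respectively.&lt;/p&gt;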

&lt;h5&gt;In Programs&lt;/h5&gt;

&lt;p&gt;Whether virtualization is available is also exported on the system bus:&lt;/p&gt;

&lt;pre&gt;$ gdbus call --system --dest org.freedesktop.systemd1 --object-path /org/freedesktop/systemd1 --method org.freedesktop.DBus.Properties.Get org.freedesktop.systemd1.Manager Virtualization
(&amp;lt;'systemd-nspawn'&amp;gt;,)&lt;/pre&gt;

&lt;p&gt;This property contains the empty string if no virtualization is
detected. Note that some container environments cannot be detected
directly from unprivileged code. That's why we expose this property on
the bus rather than providing a library -- the bus implicitly solves
the privilege problem quite nicely.&lt;/p&gt;

&lt;p&gt;Note that all of this will only ever detect and return information
about the "inner-most" virtualization solution. If you stack
virtualization ("We must go deeper!") then these interfaces will
expose the one the code is most directly interfacing
with. Specifically that means that if a container solution is used
inside of a VM, then only the container is generally detected and
returned.&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;b&gt;Footnotes&lt;/b&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] For example: running a certain device management service in a
container environment that has no access to any physical hardware makes little sense.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[2] For example: some VM solutions work best if certain
vendor-specific userspace components are running that connect the
guest with the host in some way.&lt;/small&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 08 Jan 2013 21:19:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2013-01-08:/blog/projects/detect-virt.html</guid><category>projects</category></item><item><title>Third Berlin Open Source Meetup</title><link>https://0pointer.net/blog/projects/berlin-open-source-meetup-3.html</link><description>
                
&lt;p&gt;The Third &lt;a href="https://plus.google.com/u/0/events/c3f3a8go99cn72n8rsosbj7djks"&gt;Berlin Open Source Meetup&lt;/a&gt; is going to take place on Sunday, January 20th. You are invited!&lt;/p&gt;

&lt;p&gt;It's a public event, so everybody is welcome, and please feel free to invite others!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 03 Jan 2013 23:20:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2013-01-03:/blog/projects/berlin-open-source-meetup-3.html</guid><category>projects</category></item><item><title>foss.in Needs Your Funding!</title><link>https://0pointer.net/blog/projects/fossin2012-2.html</link><description>
                
&lt;p&gt;One of the most exciting conferences in the Free Software world, &lt;a href="http://foss.in/"&gt;foss.in&lt;/a&gt; in Bangalore, India has &lt;a href="http://atulchitnis.net/2012/sponsoring-foss-in/"&gt;trouble finding enough
sponsorship&lt;/a&gt; for this year's edition. &lt;a href="http://foss.in/2012/take-one-speakers-at-foss-in2012"&gt;Many speakers from
all around the Free Software world&lt;/a&gt; (including yours truly) have signed up
to present at the event, and the conference would appreciate any corporate
funding they can get!&lt;/p&gt;

&lt;p&gt;&lt;a href="http://atulchitnis.net/2012/sponsoring-foss-in/"&gt;Please check if
your company can help&lt;/a&gt; and &lt;a href="http://foss.in/sponsors"&gt;contact the
organizers&lt;/a&gt; for details!&lt;/p&gt;

&lt;p&gt;See you in Bangalore!&lt;/p&gt;

&lt;p&gt;&lt;a href="http://foss.in"&gt;&lt;img src="http://foss.in/wp-content/uploads/2008/11/speaking_250px.jpg" alt="FOSS.IN" width="250" height="250" border="0" /&gt;&lt;/a&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 15 Nov 2012 13:05:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-11-15:/blog/projects/fossin2012-2.html</guid><category>projects</category></item><item><title>systemd for Developers III</title><link>https://0pointer.net/blog/projects/journal-submit.html</link><description>
                
&lt;p&gt;Here's the third episode &lt;a href="http://0pointer.de/blog/projects/socket-activation.html"&gt;of my&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/socket-activation2.html"&gt;&lt;i&gt;systemd for Developers&lt;/i&gt;&lt;/a&gt; series.&lt;/p&gt;

&lt;h4&gt;Logging to the Journal&lt;/h4&gt;

&lt;p&gt;In a &lt;a href="http://0pointer.de/blog/projects/journalctl.html"&gt;recent blog
story&lt;/a&gt; intended for administrators I shed some light on how to use
the &lt;a href="http://www.freedesktop.org/software/systemd/man/journalctl.html"&gt;journalctl(1)&lt;/a&gt;
tool to browse and search the systemd journal. In this blog story for developers
I want to explain a little how to get log data into the &lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;systemd&lt;/a&gt;
Journal in the first place.&lt;/p&gt;

&lt;p&gt;The good thing is that getting log data into the Journal is not
particularly hard, since there's a good chance the Journal already
collects it anyway and writes it to disk. The journal collects:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;All data logged via libc &lt;tt&gt;syslog()&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;The data from the kernel logged with &lt;tt&gt;printk()&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;Everything written to STDOUT/STDERR of any system service&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This covers pretty much all of the traditional log output of a
Linux system, including messages from the kernel initialization phase,
the initial RAM disk, the early boot logic, and the main system
runtime.&lt;/p&gt;

&lt;h4&gt;syslog()&lt;/h4&gt;

&lt;p&gt;Let's have a quick look how &lt;tt&gt;syslog()&lt;/tt&gt; is used again. Let's
write a journal message using this call:&lt;/p&gt;

&lt;pre&gt;#include &amp;lt;syslog.h&amp;gt;

int main(int argc, char *argv[]) {
        syslog(LOG_NOTICE, "Hello World");
        return 0;
}&lt;/pre&gt;

&lt;p&gt;This is C code, of course. Many higher level languages provide APIs
that allow writing local syslog messages. Regardless which language
you choose, all data written like this ends up in the Journal.&lt;/p&gt;
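
&lt;p&gt;As an illustration of such a higher level API, here is the same call made through Python's standard &lt;tt&gt;syslog&lt;/tt&gt; module. This is only a sketch of the equivalent call, not the program that produced any of the output in this story:&lt;/p&gt;

```python
import syslog

# Same arguments as the C call: the priority first, then the message text.
syslog.syslog(syslog.LOG_NOTICE, "Hello World")
```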

&lt;p&gt;Let's have a look how this looks after it has been written into the
journal (this is the &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/json"&gt;JSON
output&lt;/a&gt; &lt;tt&gt;journalctl -o json-pretty&lt;/tt&gt; generates):&lt;/p&gt;

&lt;pre&gt;{
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "_TRANSPORT" : "syslog",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
        "SYSLOG_FACILITY" : "1",
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "_PID" : "3068",
        "SYSLOG_IDENTIFIER" : "test-journal-submit",
        "MESSAGE" : "Hello World!",
        "_SOURCE_REALTIME_TIMESTAMP" : "1351126905014938"
}&lt;/pre&gt;

&lt;p&gt;This nicely shows how the Journal implicitly augmented our little
log message with various meta data fields which describe in more
detail the context our message was generated from. For an explanation
of the various fields, please refer to &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html"&gt;systemd.journal-fields(7)&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;printf()&lt;/h4&gt;

&lt;p&gt;If you are writing code that is run as a systemd service, generating journal
messages is even easier:&lt;/p&gt;

&lt;pre&gt;#include &amp;lt;stdio.h&amp;gt;

int main(int argc, char *argv[]) {
        printf("Hello World\n");
        return 0;
}&lt;/pre&gt;

&lt;p&gt;Yupp, that's easy, indeed.&lt;/p&gt;

&lt;p&gt;The printed string in this example is logged at a default log
priority of LOG_INFO&lt;sup&gt;[1]&lt;/sup&gt;. Sometimes it is useful to change
the log priority for such a printed string. When systemd parses
STDOUT/STDERR of a service it will look for priority values enclosed
in &amp;lt; &amp;gt; at the beginning of each line&lt;sup&gt;[2]&lt;/sup&gt;, following the scheme
used by the kernel's &lt;tt&gt;printk()&lt;/tt&gt; which in turn took
inspiration from the BSD syslog network serialization of messages. We
can make use of this systemd feature like this:&lt;/p&gt;

&lt;pre&gt;#include &amp;lt;stdio.h&amp;gt;

#define PREFIX_NOTICE "&amp;lt;5&amp;gt;"

int main(int argc, char *argv[]) {
        printf(PREFIX_NOTICE "Hello World\n");
        return 0;
}&lt;/pre&gt;

&lt;p&gt;Nice! Logging with nothing but &lt;tt&gt;printf()&lt;/tt&gt; but we still get
log priorities!&lt;/p&gt;

&lt;p&gt;This scheme works with any programming language, including, of course, shell:&lt;/p&gt;

&lt;pre&gt;#!/bin/bash

echo "&lt;5&gt;Hello world"&lt;/pre&gt;

&lt;h4&gt;Native Messages&lt;/h4&gt;

&lt;p&gt;Now, what I explained above is not particularly exciting: the
take-away is pretty much only that things end up in the journal if
they are output using the traditional message printing APIs. Yaaawn!&lt;/p&gt;

&lt;p&gt;Let's make this more interesting, let's look at what the Journal
provides as native APIs for logging, and let's see what its benefits
are. Let's translate our little example into the 1:1 counterpart
using the Journal's logging API &lt;a href="http://0pointer.de/public/systemd-man/sd_journal_print.html"&gt;&lt;tt&gt;sd_journal_print(3)&lt;/tt&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;pre&gt;#include &amp;lt;systemd/sd-journal.h&amp;gt;

int main(int argc, char *argv[]) {
        sd_journal_print(LOG_NOTICE, "Hello World");
        return 0;
}&lt;/pre&gt;

&lt;p&gt;This doesn't look much more interesting than the two examples
above, right? After compiling this with &lt;tt&gt;`pkg-config --cflags
--libs libsystemd-journal`&lt;/tt&gt; appended to the compiler parameters,
let's have a closer look at the JSON representation of the journal
entry this generates:&lt;/p&gt;

&lt;pre&gt; {
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
&lt;b&gt;        "CODE_FUNC" : "main",&lt;/b&gt;
        "_TRANSPORT" : "journal",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
&lt;b&gt;        "CODE_FILE" : "src/journal/test-journal-submit.c",&lt;/b&gt;
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "MESSAGE" : "Hello World",
&lt;b&gt;        "CODE_LINE" : "4",&lt;/b&gt;
        "_PID" : "3516",
        "_SOURCE_REALTIME_TIMESTAMP" : "1351128226954170"
}&lt;/pre&gt;

&lt;p&gt;This looks pretty much the same, right? Almost! I highlighted three new
lines compared to the earlier output. Yes, you guessed it, by using
&lt;tt&gt;sd_journal_print()&lt;/tt&gt; meta information about the generating
source code location is implicitly appended to each
message&lt;sup&gt;[3]&lt;/sup&gt;, which is helpful for a developer to identify
the source of a problem.&lt;/p&gt;

&lt;p&gt;The primary reason for using the Journal's native logging APIs is
not just the source code location however: it is to allow
passing additional structured log fields from the program into the
journal. This additional log data may then be used to search the
journal, is available for consumption by other programs, and
might help the administrator to track down issues beyond what is
expressed in the human readable message text. Here's an example of how
to do that with &lt;tt&gt;sd_journal_send()&lt;/tt&gt;:&lt;/p&gt;

&lt;pre&gt;#include &amp;lt;systemd/sd-journal.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;

int main(int argc, char *argv[]) {
        sd_journal_send("MESSAGE=Hello World!",
                        "MESSAGE_ID=52fb62f99e2c49d89cfbf9d6de5e3555",
                        "PRIORITY=5",
                        "HOME=%s", getenv("HOME"),
                        "TERM=%s", getenv("TERM"),
                        "PAGE_SIZE=%li", sysconf(_SC_PAGESIZE),
                        "N_CPUS=%li", sysconf(_SC_NPROCESSORS_ONLN),
                        NULL);
        return 0;
}&lt;/pre&gt;

&lt;p&gt;This will write a log message to the journal much like the earlier
examples. However, this time a few additional, structured fields are
attached:&lt;/p&gt;

&lt;pre&gt;{
        "__CURSOR" : "s=ac9e9c423355411d87bf0ba1a9b424e8;i=5930;b=5335e9cf5d954633bb99aefc0ec38c25;m=16544f875b;t=4ccd863cdc4f0;x=896defe53cc1a96a",
        "__REALTIME_TIMESTAMP" : "1351129666274544",
        "__MONOTONIC_TIMESTAMP" : "95903778651",
        "_BOOT_ID" : "5335e9cf5d954633bb99aefc0ec38c25",
        "PRIORITY" : "5",
        "_UID" : "500",
        "_GID" : "500",
        "_AUDIT_SESSION" : "2",
        "_AUDIT_LOGINUID" : "500",
        "_SYSTEMD_CGROUP" : "/user/lennart/2",
        "_SYSTEMD_SESSION" : "2",
        "_SELINUX_CONTEXT" : "unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023",
        "_MACHINE_ID" : "a91663387a90b89f185d4e860000001a",
        "_HOSTNAME" : "epsilon",
        "CODE_FUNC" : "main",
        "_TRANSPORT" : "journal",
        "_COMM" : "test-journal-su",
        "_CMDLINE" : "./test-journal-submit",
        "CODE_FILE" : "src/journal/test-journal-submit.c",
        "_EXE" : "/home/lennart/projects/systemd/test-journal-submit",
        "MESSAGE" : "Hello World!",
        "_PID" : "4049",
        "CODE_LINE" : "6",
&lt;b&gt;        "MESSAGE_ID" : "52fb62f99e2c49d89cfbf9d6de5e3555",&lt;/b&gt;
&lt;b&gt;        "HOME" : "/home/lennart",&lt;/b&gt;
&lt;b&gt;        "TERM" : "xterm-256color",&lt;/b&gt;
&lt;b&gt;        "PAGE_SIZE" : "4096",&lt;/b&gt;
&lt;b&gt;        "N_CPUS" : "4",&lt;/b&gt;
        "_SOURCE_REALTIME_TIMESTAMP" : "1351129666241467"
}&lt;/pre&gt;

&lt;p&gt;Awesome! Our simple example worked! The five meta data fields we
attached to our message appeared in the journal. We used &lt;a href="http://0pointer.de/public/systemd-man/sd_journal_print.html"&gt;&lt;tt&gt;sd_journal_send()&lt;/tt&gt;&lt;/a&gt;
for this which works much like &lt;tt&gt;sd_journal_print()&lt;/tt&gt; but takes a
NULL terminated list of format strings each followed by its
arguments. The format strings must include the field name and a '='
before the values.&lt;/p&gt;

&lt;p&gt;Our little structured message included seven fields. The first three we passed are well-known fields:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;tt&gt;MESSAGE=&lt;/tt&gt; is the actual human readable message part of the structured message.&lt;/li&gt;
&lt;li&gt;&lt;tt&gt;PRIORITY=&lt;/tt&gt; is the numeric message priority value as known from BSD syslog formatted as an integer string.&lt;/li&gt;
&lt;li&gt;&lt;tt&gt;MESSAGE_ID=&lt;/tt&gt; is a 128bit ID that identifies our specific
message call, formatted as hexadecimal string. We randomly generated
this string with &lt;tt&gt;journalctl --new-id128&lt;/tt&gt;. This can be used by
applications to track down all occurrences of this specific
message. The 128bit ID can be a UUID, but this is neither required nor enforced.&lt;/li&gt;&lt;/ol&gt;

&lt;p&gt;Applications may relatively freely define additional fields as they
see fit (we defined four pretty arbitrary ones in our example). A
complete list of the currently well-known fields is available in &lt;a href="http://0pointer.de/public/systemd-man/systemd.journal-fields.html"&gt;systemd.journal-fields(7)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's see how the message ID helps us find this message and all
its occurrences in the journal:&lt;/p&gt;

&lt;pre&gt;
$ journalctl MESSAGE_ID=52fb62f99e2c49d89cfbf9d6de5e3555
-- Logs begin at Thu, 2012-10-18 04:07:03 CEST, end at Thu, 2012-10-25 04:48:21 CEST. --
Oct 25 03:47:46 epsilon test-journal-se[4049]: Hello World!
Oct 25 04:40:36 epsilon test-journal-se[4480]: Hello World!
&lt;/pre&gt;

&lt;p&gt;Seems I already invoked this example tool twice!&lt;/p&gt;
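
&lt;p&gt;The same lookup can be scripted. The sketch below runs &lt;tt&gt;journalctl&lt;/tt&gt; with the JSON output mode (&lt;tt&gt;-o json&lt;/tt&gt;, which emits one entry per line) and turns the result into Python dictionaries. The function names are made up for this illustration, and the code assumes &lt;tt&gt;journalctl&lt;/tt&gt; is in the path and the caller may read the journal:&lt;/p&gt;

```python
import json
import subprocess


def parse_journal_json(text):
    """Turn line-separated JSON journal entries into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def entries_with_message_id(message_id):
    """Ask journalctl for all entries carrying the given MESSAGE_ID."""
    result = subprocess.run(
        ["journalctl", "-o", "json", "MESSAGE_ID=" + message_id],
        capture_output=True,
        text=True,
        check=True,  # raise if journalctl itself reports an error
    )
    return parse_journal_json(result.stdout)
```

&lt;p&gt;Each returned dictionary carries the same fields as the JSON dumps shown earlier, so something like &lt;tt&gt;entry["_PID"]&lt;/tt&gt; can be used to tell the invocations apart.&lt;/p&gt;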

&lt;p&gt;Many messages systemd itself generates &lt;a href="http://cgit.freedesktop.org/systemd/systemd/plain/src/systemd/sd-messages.h"&gt;have
message IDs&lt;/a&gt;. This is useful, for example, to find all occurrences
where a program dumped core (&lt;tt&gt;journalctl
MESSAGE_ID=fc2e22bc6ee647b6b90729ab34a250b1&lt;/tt&gt;), or when a user
logged in (&lt;tt&gt;journalctl
MESSAGE_ID=8d45620c1a4348dbb17410da57c60c66&lt;/tt&gt;). If your application
generates a message that might be interesting to recognize in the
journal stream later on, we recommend attaching such a message ID to
it. You can easily allocate a new one for your message with &lt;tt&gt;journalctl
--new-id128&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;This example shows how we can use the Journal's native APIs to
generate structured, recognizable messages. You can do much more than
this with the C API. For example, you may store binary data in journal
fields as well, which is useful to attach coredumps or hard disk SMART
states to events where this applies. To keep this blog story from
growing even longer we'll not go into detail about how to do
this, and I ask you to check out &lt;a href="http://0pointer.de/public/systemd-man/sd_journal_print.html"&gt;&lt;tt&gt;sd_journal_send(3)&lt;/tt&gt;&lt;/a&gt;
for further information on this.&lt;/p&gt;

&lt;h4&gt;Python&lt;/h4&gt;

&lt;p&gt;The examples above focus on C. Structured logging to the Journal is
also available from other languages. Along with systemd itself we ship
bindings for Python. Here's an example of how to use them:&lt;/p&gt;

&lt;pre&gt;from systemd import journal
journal.send('Hello world')
journal.send('Hello, again, world', FIELD2='Greetings!', FIELD3='Guten tag')&lt;/pre&gt;

&lt;p&gt;Other bindings exist for &lt;a href="http://fourkitchens.com/blog/2012/09/25/nodejs-extension-systemd"&gt;Node.js&lt;/a&gt;,
&lt;a href="https://github.com/systemd/php-systemd"&gt;PHP&lt;/a&gt; and &lt;a href="https://github.com/philips/luvit-systemd-journal"&gt;Lua&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Portability&lt;/h4&gt;

&lt;p&gt;Generating structured data is a very useful feature for services to
make their logs more accessible both for administrators and other
programs. In addition to the &lt;i&gt;implicit&lt;/i&gt; structure the Journal
adds to all logged messages it is highly beneficial if the various
components of our stack also provide &lt;i&gt;explicit&lt;/i&gt; structure
in their messages, coming from within the processes themselves.&lt;/p&gt;

&lt;p&gt;Porting an existing program to the Journal's logging APIs comes
with one pitfall though: the Journal is Linux-only. If non-Linux
portability matters for your project it's a good idea to provide an
alternative log output, and make it selectable at compile-time.&lt;/p&gt;

&lt;p&gt;Regardless of which way to log you choose, in all cases we'll forward
the message to a classic syslog daemon running side-by-side with the
Journal, if there is one. However, much of the structured meta data of
the message is not forwarded since the classic syslog protocol simply
has no generally accepted way to encode this and we shouldn't attempt
to serialize meta data into classic syslog messages which might turn
&lt;tt&gt;/var/log/messages&lt;/tt&gt; into an unreadable dump of machine
data. Anyway, to summarize this: regardless of whether you log with
&lt;tt&gt;syslog()&lt;/tt&gt;, &lt;tt&gt;printf()&lt;/tt&gt;, &lt;tt&gt;sd_journal_print()&lt;/tt&gt; or
&lt;tt&gt;sd_journal_send()&lt;/tt&gt;, the message will be stored and indexed by
the journal and it will also be forwarded to classic syslog.&lt;/p&gt;

&lt;p&gt;And that's it for today. In a follow-up episode we'll focus on
retrieving messages from the Journal using the C API, possibly
filtering for a specific subset of messages. Later on, I hope to give
a real-life example of how to port an existing service to the Journal's
logging APIs. Stay tuned!&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;b&gt;Footnotes&lt;/b&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] This can be changed with the &lt;tt&gt;SyslogLevel=&lt;/tt&gt; service
setting. See &lt;a href="http://0pointer.de/public/systemd-man/systemd.exec.html"&gt;systemd.exec(5)&lt;/a&gt;
for details.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[2] Interpretation of the &amp;lt; &amp;gt; prefixes of logged lines
may be disabled with the &lt;tt&gt;SyslogLevelPrefix=&lt;/tt&gt; service setting. See &lt;a href="http://0pointer.de/public/systemd-man/systemd.exec.html"&gt;systemd.exec(5)&lt;/a&gt;
for details.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[3] Appending the code location to the log messages can be
turned off at compile time by defining
-DSD_JOURNAL_SUPPRESS_LOCATION.&lt;/small&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 25 Oct 2012 04:29:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-10-25:/blog/projects/journal-submit.html</guid><category>projects</category></item><item><title>systemd for Administrators, Part XVIII</title><link>https://0pointer.net/blog/projects/resources.html</link><description>
                
&lt;p&gt;&lt;a href="http://0pointer.de/blog/projects/journalctl.html"&gt;Hot
on&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/serial-console.html"&gt;the
heels&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/watchdog.html"&gt;of
the &lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/self-documented-boot.html"&gt;previous
story&lt;/a&gt;, &lt;a href="http://0pointer.de/blog/projects/systemctl-journal.html"&gt;here's&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/security.html"&gt;now&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/inetd.html"&gt;the&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/instances.html"&gt;eighteenth&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/on-etc-sysinit.html"&gt;installment&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/the-new-configuration-files.html"&gt;of&lt;/a&gt;

&lt;a href="http://0pointer.de/blog/projects/blame-game.html"&gt;my&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;ongoing&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/three-levels-of-off.html"&gt;series&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html"&gt;on&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html"&gt;systemd&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html"&gt;for&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html"&gt;Administrators&lt;/a&gt;:&lt;/p&gt;

&lt;h4&gt;Managing Resources&lt;/h4&gt;

&lt;p&gt;An important facet of modern computing is resource management: if
you run more than one program on a single machine you want to assign
the available resources to them enforcing particular policies. This is
particularly crucial on smaller, embedded or mobile systems where the
scarce resources are the main constraint, but equally for large
installations such as cloud setups, where resources are plenty, but
the number of programs/services/containers on a single node is
drastically higher.&lt;/p&gt;

&lt;p&gt;Traditionally, on Linux only one policy was really available: all
processes got about the same CPU time, or IO bandwidth, modulated a bit
via the process &lt;i&gt;nice&lt;/i&gt; value. This approach is very simple and
covered the various uses for Linux quite well for a long
time. However, it has drawbacks: not all processes deserve to be
treated equally, and services involving lots of processes (think: Apache with a
lot of CGI workers) would this way get more resources than services
with very few (think: syslog).&lt;/p&gt;

&lt;p&gt;When thinking about service management for systemd, we quickly
realized that resource management must be core functionality of it. In
a modern world -- regardless if server or embedded -- controlling CPU,
Memory, and IO resources of the various services cannot be an
afterthought, but must be built-in as first-class service settings. And
it must be per-service and not per-process as the traditional nice
values or &lt;a href="http://linux.die.net/man/2/setrlimit"&gt;POSIX
Resource Limits&lt;/a&gt; were.&lt;/p&gt;

&lt;p&gt;In this story I want to shed some light on what you can do to
enforce resource policies on systemd services. Resource Management in
one way or another has been available in systemd for a while already,
so it's really time we introduce this to the broader audience.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://0pointer.de/blog/projects/cgroups-vs-cgroups.html"&gt;In an
earlier blog post&lt;/a&gt; I highlighted the difference between Linux
Control Groups (cgroups) as a labelled, hierarchical grouping mechanism,
and Linux cgroups as a resource controlling subsystem. While systemd
requires the former, the latter is optional. And this optional latter
part is now what we can make use of to manage per-service
resources. (At this point, it's probably a good idea to read up on &lt;a href="https://en.wikipedia.org/wiki/Cgroups"&gt;cgroups&lt;/a&gt; before
reading on, to get at least a basic idea what they are and what they
accomplish. Even though the explanations below will be pretty
high-level, it all makes a lot more sense if you grok the background a
bit.)&lt;/p&gt;

&lt;p&gt;The main Linux cgroup controllers for resource management are &lt;a href="http://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt"&gt;cpu&lt;/a&gt;,
&lt;a href="http://www.kernel.org/doc/Documentation/cgroups/memory.txt"&gt;memory&lt;/a&gt;
and &lt;a href="http://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt"&gt;blkio&lt;/a&gt;. To
make use of these, they need to be enabled in the kernel, which many
distributions (including Fedora) do. systemd exposes a couple of high-level service
settings to make use of these controllers without requiring too much
knowledge of the gory kernel details. &lt;/p&gt;

&lt;h4&gt;Managing CPU&lt;/h4&gt;

&lt;p&gt;As a nice default, if the &lt;tt&gt;cpu&lt;/tt&gt; controller is enabled in the
kernel, systemd will create a cgroup for each service when starting
it. Without any further configuration this already has one nice
effect: on a systemd system every system service will get an even
amount of CPU, regardless of how many processes it consists of. Or in
other words: on your web server MySQL will get roughly the same amount
of CPU as Apache, even if the latter consists of 1000 CGI script
processes, but the former only of a few worker tasks. (This behavior can
be turned off, see &lt;a href="http://0pointer.de/public/systemd-man/systemd.conf.html"&gt;DefaultControllers=&lt;/a&gt;
in &lt;tt&gt;/etc/systemd/system.conf&lt;/tt&gt;.)&lt;/p&gt;

&lt;p&gt;On top of this default, it is possible to explicitly configure the
CPU shares a service gets with the &lt;a href="http://0pointer.de/public/systemd-man/systemd.exec.html"&gt;CPUShares=&lt;/a&gt;
setting. The default value is 1024. If you increase this number you'll
assign more CPU to a service than to an unaltered one at 1024; if you decrease it, less.&lt;/p&gt;

&lt;p&gt;Let's see in more detail, how we can make use of this. Let's say we
want to assign Apache 1500 CPU shares instead of the default of
1024. For that, let's create a new administrator service file for
Apache in &lt;tt&gt;/etc/systemd/system/httpd.service&lt;/tt&gt;, overriding the
vendor supplied one in &lt;tt&gt;/usr/lib/systemd/system/httpd.service&lt;/tt&gt;,
but let's change the &lt;tt&gt;CPUShares=&lt;/tt&gt; parameter:&lt;/p&gt;

&lt;pre&gt;.include /usr/lib/systemd/system/httpd.service

[Service]
CPUShares=1500&lt;/pre&gt;

&lt;p&gt;The first line will pull in the vendor service file. Now, let's
reload systemd's configuration and restart Apache so that the new
service file is taken into account:&lt;/p&gt;

&lt;pre&gt;systemctl daemon-reload
systemctl restart httpd.service&lt;/pre&gt;

&lt;p&gt;And yeah, that's already it, you are done!&lt;/p&gt;

&lt;p&gt;(Note that setting &lt;tt&gt;CPUShares=&lt;/tt&gt; in a unit file will cause the
specific service to get its own cgroup in the &lt;tt&gt;cpu&lt;/tt&gt; hierarchy,
even if &lt;tt&gt;cpu&lt;/tt&gt; is not included in
&lt;tt&gt;DefaultControllers=&lt;/tt&gt;.)&lt;/p&gt;

&lt;h4&gt;Analyzing Resource usage&lt;/h4&gt;

&lt;p&gt;Of course, changing resource assignments without actually
understanding the resource usage of the services in question is like
flying blind. To help you understand the resource usage of all
services, we created the tool &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd-cgtop.html"&gt;systemd-cgtop&lt;/a&gt;,
that will enumerate all cgroups of the system, determine their
resource usage (CPU, Memory, and IO) and present them in a &lt;a href="http://linux.die.net/man/1/top"&gt;top&lt;/a&gt;-like fashion. Building
on the fact that systemd services are managed in cgroups this tool
hence can present to you for services what top shows you for
processes.&lt;/p&gt;

&lt;p&gt;Unfortunately, by default &lt;tt&gt;cgtop&lt;/tt&gt; will only be able to chart
CPU usage per-service for you, IO and Memory are only tracked as total
for the entire machine. The reason for this is simply that by default
there are no per-service cgroups in the &lt;tt&gt;blkio&lt;/tt&gt; and
&lt;tt&gt;memory&lt;/tt&gt; controller hierarchies but that's what we need to
determine the resource usage. The best way to get this data for all
services is to simply add the &lt;tt&gt;memory&lt;/tt&gt; and &lt;tt&gt;blkio&lt;/tt&gt;
controllers to the aforementioned &lt;tt&gt;DefaultControllers=&lt;/tt&gt; setting
in &lt;tt&gt;system.conf&lt;/tt&gt;.&lt;/p&gt;
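
&lt;p&gt;As a minimal sketch of that change, and assuming your
&lt;tt&gt;/etc/systemd/system.conf&lt;/tt&gt; still carries the default of
&lt;tt&gt;cpu&lt;/tt&gt; only, the setting in the &lt;tt&gt;[Manager]&lt;/tt&gt; section
might end up looking roughly like this (please check your own file rather
than copying this blindly):&lt;/p&gt;

&lt;pre&gt;[Manager]
DefaultControllers=cpu memory blkio&lt;/pre&gt;

&lt;p&gt;With such a list in place services should get their own cgroups in
all three hierarchies when they are started, which is what
&lt;tt&gt;systemd-cgtop&lt;/tt&gt; needs to chart Memory and IO per-service.&lt;/p&gt;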

&lt;h4&gt;Managing Memory&lt;/h4&gt;

&lt;p&gt;To enforce limits on memory systemd provides the
&lt;tt&gt;MemoryLimit=&lt;/tt&gt; and &lt;tt&gt;MemorySoftLimit=&lt;/tt&gt; settings for
services, summing up the memory of all their processes. These settings
take memory sizes in bytes that are the total memory limit for the
service. They understand the usual K, M, G, T suffixes for
Kilobyte, Megabyte, Gigabyte, Terabyte (to the base of 1024).&lt;/p&gt;

&lt;pre&gt;.include /usr/lib/systemd/system/httpd.service

[Service]
MemoryLimit=1G&lt;/pre&gt;

&lt;p&gt;(Analogous to &lt;tt&gt;CPUShares=&lt;/tt&gt; above, setting this option will cause
the service to get its own cgroup in the &lt;tt&gt;memory&lt;/tt&gt; cgroup
hierarchy.)&lt;/p&gt;
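
&lt;p&gt;The soft limit is configured the same way. The following is only a
sketch with arbitrarily picked values, showing both settings side by
side for the same service:&lt;/p&gt;

&lt;pre&gt;.include /usr/lib/systemd/system/httpd.service

[Service]
MemoryLimit=1G
MemorySoftLimit=512M&lt;/pre&gt;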

&lt;h4&gt;Managing Block IO&lt;/h4&gt;

&lt;p&gt;To control block IO multiple settings are available. First of all
&lt;tt&gt;BlockIOWeight=&lt;/tt&gt; may be used which assigns an IO &lt;i&gt;weight&lt;/i&gt;
to a specific service. In behaviour the &lt;i&gt;weight&lt;/i&gt; concept is not
unlike the &lt;i&gt;shares&lt;/i&gt; concept of CPU resource control (see
above). However, the default weight is 1000, and the valid range is
from 10 to 1000:&lt;/p&gt;

&lt;pre&gt;.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=500&lt;/pre&gt;

&lt;p&gt;Optionally, per-device weights can be specified:&lt;/p&gt;

&lt;pre&gt;.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=/dev/disk/by-id/ata-SAMSUNG_MMCRE28G8MXP-0VBL1_DC06K01009SE009B5252 750&lt;/pre&gt;

&lt;p&gt;Instead of specifying an actual device node you may also specify any
path in the file system:&lt;/p&gt;

&lt;pre&gt;.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=/home/lennart 750&lt;/pre&gt;

&lt;p&gt;If the specified path does not refer to a device node systemd will
determine the block device &lt;tt&gt;/home/lennart&lt;/tt&gt; is on, and assign
the bandwidth weight to it.&lt;/p&gt;

&lt;p&gt;You can even add per-device and normal lines at the same time,
which will set the per-device weight for the device, and the other
value as default for everything else.&lt;/p&gt;
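
&lt;p&gt;A sketch of such a combination, reusing the values from the examples
above (they are arbitrary and only here for illustration):&lt;/p&gt;

&lt;pre&gt;.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWeight=500
BlockIOWeight=/home/lennart 750&lt;/pre&gt;

&lt;p&gt;Here the device backing &lt;tt&gt;/home/lennart&lt;/tt&gt; would get a weight
of 750, while 500 applies to everything else.&lt;/p&gt;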

&lt;p&gt;Alternatively one may control explicit bandwidth limits with the
&lt;tt&gt;BlockIOReadBandwidth=&lt;/tt&gt; and &lt;tt&gt;BlockIOWriteBandwidth=&lt;/tt&gt;
settings. These settings take a pair of device node and bandwidth rate
(in bytes per second) or of a file path and bandwidth rate:&lt;/p&gt;

&lt;pre&gt;.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOReadBandwidth=/var/log 5M&lt;/pre&gt;

&lt;p&gt;This sets the maximum read bandwidth on the block device backing
&lt;tt&gt;/var/log&lt;/tt&gt; to 5MB/s.&lt;/p&gt;
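
&lt;p&gt;Write bandwidth can be capped in the same fashion. As a sketch, with
an arbitrarily picked rate of 1M:&lt;/p&gt;

&lt;pre&gt;.include /usr/lib/systemd/system/httpd.service

[Service]
BlockIOWriteBandwidth=/var/log 1M&lt;/pre&gt;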

&lt;p&gt;(Analogous to &lt;tt&gt;CPUShares=&lt;/tt&gt; and &lt;tt&gt;MemoryLimit=&lt;/tt&gt;, using
any of these three settings will result in the service getting its own
cgroup in the &lt;tt&gt;blkio&lt;/tt&gt; hierarchy.)&lt;/p&gt;

&lt;h4&gt;Managing Other Resource Parameters&lt;/h4&gt;

&lt;p&gt;The options described above cover only a small subset of the
available controls the various Linux control group controllers
expose. We picked these and added high-level options for them since we
assumed that these are the most relevant for most folks, and that they
really needed a nice interface that can handle units properly and
resolve block device names.&lt;/p&gt;

&lt;p&gt;In many cases the options explained above might not be sufficient
for your use case, but a low-level kernel cgroup setting might help. It
is easy to make use of these options from systemd unit files, without
having them covered with a high-level setting. For example, sometimes
it might be useful to set the &lt;i&gt;swappiness&lt;/i&gt; of a service. The
kernel makes this controllable via the &lt;tt&gt;memory.swappiness&lt;/tt&gt;
cgroup attribute, but systemd does not expose it as a high-level
option. Here's how you use it nonetheless, using the low-level
&lt;tt&gt;ControlGroupAttribute=&lt;/tt&gt; setting:&lt;/p&gt;

&lt;pre&gt;.include /usr/lib/systemd/system/httpd.service

[Service]
ControlGroupAttribute=memory.swappiness 70&lt;/pre&gt;

&lt;p&gt;(Analogous to the other cases, this too causes the service to be
added to the memory hierarchy.)&lt;/p&gt;

&lt;p&gt;Later on we might add more high-level controls for the
various cgroup attributes. In fact, please ping us if you frequently
use one and believe it deserves more focus. We'll consider adding a
high-level option for it then. (Even better: send us a patch!)&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Disclaimer:&lt;/i&gt; note that making use of the various resource
controllers does have a runtime impact on the system. Enforcing
resource limits comes at a price. If you do use them, certain
operations do get slower. Especially the &lt;tt&gt;memory&lt;/tt&gt; controller
has (used to have?) a bad reputation for coming at a performance
cost.&lt;/p&gt;

&lt;p&gt;For more details on all of this, please have a look at the
documentation of the &lt;a href="http://0pointer.de/public/systemd-man/systemd.exec.html"&gt;mentioned
unit settings&lt;/a&gt;, and of the &lt;a href="http://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt"&gt;cpu&lt;/a&gt;,
&lt;a href="http://www.kernel.org/doc/Documentation/cgroups/memory.txt"&gt;memory&lt;/a&gt;
and &lt;a href="http://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt"&gt;blkio&lt;/a&gt;
controllers.&lt;/p&gt;

&lt;p&gt;And that's it for now. Of course, this blog story only focussed on
the per-&lt;i&gt;service&lt;/i&gt; resource settings. On top of this, you can also
set the more traditional, well-known per-&lt;i&gt;process&lt;/i&gt; resource
settings, which will then be inherited by the various subprocesses,
but always only be enforced per-process. More specifically that's
&lt;tt&gt;IOSchedulingClass=&lt;/tt&gt;, &lt;tt&gt;IOSchedulingPriority=&lt;/tt&gt;,
&lt;tt&gt;CPUSchedulingPolicy=&lt;/tt&gt;, &lt;tt&gt;CPUSchedulingPriority=&lt;/tt&gt;,
&lt;tt&gt;CPUAffinity=&lt;/tt&gt;, &lt;tt&gt;LimitCPU=&lt;/tt&gt; and related. These do not
make use of cgroup controllers and have a much lower performance
cost. We might cover those in a later article in more detail.&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 24 Oct 2012 04:11:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-10-24:/blog/projects/resources.html</guid><category>projects</category></item><item><title>systemd for Administrators, Part XVII</title><link>https://0pointer.net/blog/projects/journalctl.html</link><description>
                
&lt;p&gt;&lt;a href="http://0pointer.de/blog/projects/serial-console.html"&gt;It's&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/watchdog.html"&gt;that&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/self-documented-boot.html"&gt;time again&lt;/a&gt;,
&lt;a href="http://0pointer.de/blog/projects/systemctl-journal.html"&gt;here's&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/security.html"&gt;now&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/inetd.html"&gt;the&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/instances.html"&gt;seventeenth&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/on-etc-sysinit.html"&gt;installment&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/the-new-configuration-files.html"&gt;of&lt;/a&gt;

&lt;a href="http://0pointer.de/blog/projects/blame-game.html"&gt;my&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;ongoing&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/three-levels-of-off.html"&gt;series&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html"&gt;on&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html"&gt;systemd&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html"&gt;for&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html"&gt;Administrators&lt;/a&gt;:&lt;/p&gt;

&lt;h4&gt;Using the Journal&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://0pointer.de/blog/projects/systemctl-journal.html"&gt;A
while back I already&lt;/a&gt; posted a blog story introducing some
functionality of the journal, and how it is exposed in
&lt;tt&gt;systemctl&lt;/tt&gt;. In this episode I want to explain a few more uses
of the journal, and how you can make it work for you.&lt;/p&gt;

&lt;p&gt;If you are wondering what the journal is, here's an explanation in
a few words to get you up to speed: the journal is a component of &lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;systemd&lt;/a&gt;,
that captures Syslog messages, Kernel log messages, initial RAM disk
and early boot messages as well as messages written to STDOUT/STDERR
of all services, indexes them and makes this available to the user. It
can be used in parallel, or in place of a traditional syslog daemon,
such as rsyslog or syslog-ng. For more information, see &lt;a href="http://0pointer.de/blog/projects/the-journal.html"&gt;the initial
announcement&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The journal has been part of Fedora since F17. With Fedora 18 it
now has grown into a reliable, powerful tool to handle your logs. Note
however, that on F17 and F18 the journal is configured by default to
store logs only in a small ring-buffer in &lt;tt&gt;/run/log/journal&lt;/tt&gt;,
i.e. not persistent. This of course limits its usefulness quite
drastically but is sufficient to show a bit of recent log history in
&lt;tt&gt;systemctl status&lt;/tt&gt;. For Fedora 19, we plan to change this, and
enable persistent logging by default. Then, journal files will be
stored in &lt;tt&gt;/var/log/journal&lt;/tt&gt; and can grow much larger, thus
making the journal a lot more useful.&lt;/p&gt;

&lt;h4&gt;Enabling Persistency&lt;/h4&gt;

&lt;p&gt;In the meantime, on F17 or F18, you can enable journald's persistent storage manually:&lt;/p&gt;

&lt;pre&gt;# mkdir -p /var/log/journal&lt;/pre&gt;

&lt;p&gt;After that, it's a good idea to reboot, to get some useful
structured data into your journal to play with. Oh, and since you have
the journal now, you don't need syslog anymore (unless having
&lt;tt&gt;/var/log/messages&lt;/tt&gt; as text file is a necessity for you), so
you can choose to uninstall rsyslog:&lt;/p&gt;

&lt;pre&gt;# yum remove rsyslog&lt;/pre&gt;

&lt;h4&gt;Basics&lt;/h4&gt;

&lt;p&gt;Now we are ready to go. The following text shows a lot of features
of systemd 195 as it will be included in Fedora 18&lt;sup&gt;[1]&lt;/sup&gt;, so
if your F17 can't do the tricks you see, please wait for F18. First,
let's start with some basics. To access the logs of the journal use
the &lt;a href="http://www.freedesktop.org/software/systemd/man/journalctl.html"&gt;journalctl(1)&lt;/a&gt;
tool. To have a first look at the logs, just type in:&lt;/p&gt;

&lt;pre&gt;# journalctl&lt;/pre&gt;

&lt;p&gt;If you run this as root you will see all logs generated on the
system, from system components the same way as for logged in
users. The output you will get looks like a pixel-perfect copy of the
traditional &lt;tt&gt;/var/log/messages&lt;/tt&gt; format, but actually has a
couple of improvements over it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lines of error priority (and higher) will be highlighted red.&lt;/li&gt;
&lt;li&gt;Lines of notice/warning priority will be highlighted bold.&lt;/li&gt;
&lt;li&gt;The timestamps are converted into your local time-zone.&lt;/li&gt;
&lt;li&gt;The output is auto-paged with your pager of choice (defaults to &lt;tt&gt;less&lt;/tt&gt;).&lt;/li&gt;
&lt;li&gt;This will show &lt;i&gt;all&lt;/i&gt; available data, including rotated logs.&lt;/li&gt;
&lt;li&gt;Between the output of each boot we'll add a line clarifying that a new boot begins now.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that in this blog story I will not actually show you any of
the output this generates, I cut that out for brevity -- and to give
you a reason to try it out yourself with a current image for F18's
development version with systemd 195. But I do hope you get the idea
anyway.&lt;/p&gt;

&lt;h4&gt;Access Control&lt;/h4&gt;

&lt;p&gt;Browsing logs this way is already pretty nice. But having to be
root sucks of course, even administrators tend to do most of their
work as unprivileged users these days. By default, Journal users can
only watch their own logs, unless they are root or in the &lt;tt&gt;adm&lt;/tt&gt;
group. To make watching system logs more fun, let's add ourselves to
&lt;tt&gt;adm&lt;/tt&gt;:&lt;/p&gt;

&lt;pre&gt;# usermod -a -G adm lennart&lt;/pre&gt;

&lt;p&gt;After logging out and back in as &lt;tt&gt;lennart&lt;/tt&gt; I now have access
to the full journal of the system and all users:&lt;/p&gt;

&lt;pre&gt;$ journalctl&lt;/pre&gt;

&lt;h4&gt;Live View&lt;/h4&gt;

&lt;p&gt;If invoked without parameters journalctl will show me the current
log database. Sometimes one needs to watch logs as they grow, where
one previously used &lt;tt&gt;tail -f /var/log/messages&lt;/tt&gt;:&lt;/p&gt;

&lt;pre&gt;$ journalctl -f&lt;/pre&gt;

&lt;p&gt;Yes, this does exactly what you expect it to do: it will show you
the last ten log lines and then wait for changes and show them as
they take place.&lt;/p&gt;

&lt;h4&gt;Basic Filtering&lt;/h4&gt;

&lt;p&gt;When invoking &lt;tt&gt;journalctl&lt;/tt&gt; without parameters you'll see the
whole set of logs, beginning with the oldest message stored. That of
course, can be a lot of data. Much more useful is just viewing the
logs of the current boot:&lt;/p&gt;

&lt;pre&gt;$ journalctl -b&lt;/pre&gt;

&lt;p&gt;This will show you only the logs of the current boot, with all the
aforementioned gimmicks. But sometimes even this is way too
much data to process. So what about just listing all the real issues
to care about: all messages of priority levels ERROR and worse, from
the current boot:&lt;/p&gt;

&lt;pre&gt;$ journalctl -b -p err&lt;/pre&gt;

&lt;p&gt;If you reboot only seldom the &lt;tt&gt;-b&lt;/tt&gt; makes little sense,
filtering based on time is much more useful:&lt;/p&gt;

&lt;pre&gt;$ journalctl --since=yesterday&lt;/pre&gt;

&lt;p&gt;And there you go, all log messages from the day before at 00:00 in
the morning until right now. Awesome! Of course, we can combine this with
&lt;tt&gt;-p err&lt;/tt&gt; or a similar match. But humm, we are looking for
something that happened on the 15th of October, or was it the 16th?&lt;/p&gt;

&lt;pre&gt;$ journalctl --since=2012-10-15 --until="2012-10-16 23:59:59"&lt;/pre&gt;

&lt;p&gt;Yupp, there we go, we found what we were looking for. But humm, I
noticed that some CGI script in Apache was acting up earlier today,
let's see what Apache logged at that time:&lt;/p&gt;

&lt;pre&gt;$ journalctl -u httpd --since=00:00 --until=9:30&lt;/pre&gt;

&lt;p&gt;Oh, yeah, there we found it. But hey, wasn't there an issue with
that disk &lt;tt&gt;/dev/sdc&lt;/tt&gt;? Let's figure out what was going on there:&lt;/p&gt;

&lt;pre&gt;$ journalctl /dev/sdc&lt;/pre&gt;

&lt;p&gt;OMG, a disk error!&lt;sup&gt;[2]&lt;/sup&gt; Hmm, let's quickly replace the
disk before we lose data. Done! Next! -- Hmm, didn't I see that the vpnc binary made a booboo? Let's
check for that:&lt;/p&gt;

&lt;pre&gt;$ journalctl /usr/sbin/vpnc&lt;/pre&gt;

&lt;p&gt;Hmm, I don't get this, this seems to be some weird interaction with
&lt;tt&gt;dhclient&lt;/tt&gt;, let's see both outputs, interleaved:&lt;/p&gt;

&lt;pre&gt;$ journalctl /usr/sbin/vpnc /usr/sbin/dhclient&lt;/pre&gt;

&lt;p&gt;That did it! Found it!&lt;/p&gt;

&lt;h4&gt;Advanced Filtering&lt;/h4&gt;

&lt;p&gt;Whew! That was awesome already, but let's turn this up a
notch. Internally systemd stores each log entry with a set of
&lt;i&gt;implicit&lt;/i&gt; meta data. This meta data looks a lot like an
environment block, but actually is a bit more powerful: values can
take binary, large values (though this is the exception, and usually
they just contain UTF-8), and fields can have multiple values assigned
(an exception too, usually they only have one value). This implicit
meta data is collected for each and every log message, without user
intervention. The data will be there, and wait to be used by
you. Let's see how this looks:&lt;/p&gt;

&lt;pre&gt;$ journalctl -o verbose -n
[...]
Tue, 2012-10-23 23:51:38 CEST [s=ac9e9c423355411d87bf0ba1a9b424e8;i=4301;b=5335e9cf5d954633bb99aefc0ec38c25;m=882ee28d2;t=4ccc0f98326e6;x=f21e8b1b0994d7ee]
        PRIORITY=6
        SYSLOG_FACILITY=3
        _MACHINE_ID=a91663387a90b89f185d4e860000001a
        _HOSTNAME=epsilon
        _TRANSPORT=syslog
        SYSLOG_IDENTIFIER=avahi-daemon
        _COMM=avahi-daemon
        _EXE=/usr/sbin/avahi-daemon
        _SYSTEMD_CGROUP=/system/avahi-daemon.service
        _SYSTEMD_UNIT=avahi-daemon.service
        _SELINUX_CONTEXT=system_u:system_r:avahi_t:s0
        _UID=70
        _GID=70
        _CMDLINE=avahi-daemon: registering [epsilon.local]
        MESSAGE=Joining mDNS multicast group on interface wlan0.IPv4 with address 172.31.0.53.
        _BOOT_ID=5335e9cf5d954633bb99aefc0ec38c25
        _PID=27937
        SYSLOG_PID=27937
        _SOURCE_REALTIME_TIMESTAMP=1351029098747042
&lt;/pre&gt;

&lt;p&gt;(I cut out a lot of noise here, I don't want to make this story
overly long. &lt;tt&gt;-n&lt;/tt&gt; without parameter shows you the last 10 log
entries, but I cut out all but the last.)&lt;/p&gt;

&lt;p&gt;With the &lt;tt&gt;-o verbose&lt;/tt&gt; switch we enabled verbose
output. Instead of showing a pixel-perfect copy of classic
&lt;tt&gt;/var/log/messages&lt;/tt&gt; that only includes a minimal subset of
what is available we now see all the gory details the journal has
about each entry. And it's highly interesting: there is user credential
information, SELinux bits, machine information and more. For a full
list of common, well-known fields, see &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html"&gt;the
man page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now, as it turns out the journal database is indexed by &lt;i&gt;all&lt;/i&gt;
of these fields, out-of-the-box! Let's try this out:&lt;/p&gt;

&lt;pre&gt;$ journalctl _UID=70&lt;/pre&gt;

&lt;p&gt;And there you go, this will show all log messages logged from Linux
user ID 70. As it turns out one can easily combine these matches:&lt;/p&gt;

&lt;pre&gt;$ journalctl _UID=70 _UID=71&lt;/pre&gt;

&lt;p&gt;Specifying two matches for the same field will result in a logical
OR combination of the matches. All entries matching either will be
shown, i.e. all messages from either UID 70 or 71.&lt;/p&gt;

&lt;pre&gt;$ journalctl _HOSTNAME=epsilon _COMM=avahi-daemon&lt;/pre&gt;

&lt;p&gt;You guessed it, if you specify two matches for different field
names, they will be combined with a logical AND. Only entries matching
both will be shown now, i.e. all messages from processes named
&lt;tt&gt;avahi-daemon&lt;/tt&gt; &lt;i&gt;and&lt;/i&gt; from host &lt;tt&gt;epsilon&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;But of course, that's
not fancy enough for us. We are computer nerds after all, we live off
logical expressions. We must go deeper!&lt;/p&gt;

&lt;pre&gt;$ journalctl _HOSTNAME=theta _UID=70 + _HOSTNAME=epsilon _COMM=avahi-daemon&lt;/pre&gt;

&lt;p&gt;The + is an explicit OR you can use in addition to the implied OR when
you match the same field twice. The line above hence means: show me
everything from host &lt;tt&gt;theta&lt;/tt&gt; with UID 70, or of host
&lt;tt&gt;epsilon&lt;/tt&gt; with a process name of &lt;tt&gt;avahi-daemon&lt;/tt&gt;.&lt;/p&gt;

&lt;h4&gt;And now, it becomes magic!&lt;/h4&gt;

&lt;p&gt;That was already pretty cool, right? Right! But heck, who can
remember all those values a field can take in the journal, I mean,
seriously, who has thaaaat kind of photographic memory? Well, the
journal has:&lt;/p&gt;

&lt;pre&gt;$ journalctl -F _SYSTEMD_UNIT&lt;/pre&gt;

&lt;p&gt;This will show us all values the field _SYSTEMD_UNIT takes in the
database, or in other words: the names of all systemd services which
ever logged into the journal. This makes it super-easy to build nice
matches. But wait, turns out this all is actually hooked up with shell
completion on bash! This gets even more awesome: as you type your
match expression you will get a list of well-known field names, and of
the values they can take! Let's figure out how to filter for SELinux
labels again. We remember the field name was something with SELINUX in
it, let's try that:&lt;/p&gt;

&lt;pre&gt;$ journalctl _SE&lt;b&gt;&amp;lt;TAB&amp;gt;&lt;/b&gt;&lt;/pre&gt;

&lt;p&gt;And yupp, it's immediately completed:&lt;/p&gt;

&lt;pre&gt;$ journalctl _SELINUX_CONTEXT=&lt;/pre&gt;

&lt;p&gt;Cool, but what's the label again we wanted to match for?&lt;/p&gt;

&lt;pre&gt;$ journalctl _SELINUX_CONTEXT=&lt;b&gt;&amp;lt;TAB&amp;gt;&amp;lt;TAB&amp;gt;&lt;/b&gt;
kernel                                                       system_u:system_r:local_login_t:s0-s0:c0.c1023               system_u:system_r:udev_t:s0-s0:c0.c1023
system_u:system_r:accountsd_t:s0                             system_u:system_r:lvm_t:s0                                   system_u:system_r:virtd_t:s0-s0:c0.c1023
system_u:system_r:avahi_t:s0                                 system_u:system_r:modemmanager_t:s0-s0:c0.c1023              system_u:system_r:vpnc_t:s0
system_u:system_r:bluetooth_t:s0                             system_u:system_r:NetworkManager_t:s0                        system_u:system_r:xdm_t:s0-s0:c0.c1023
system_u:system_r:chkpwd_t:s0-s0:c0.c1023                    system_u:system_r:policykit_t:s0                             unconfined_u:system_r:rpm_t:s0-s0:c0.c1023
system_u:system_r:chronyd_t:s0                               system_u:system_r:rtkit_daemon_t:s0                          unconfined_u:system_r:unconfined_t:s0-s0:c0.c1023
system_u:system_r:crond_t:s0-s0:c0.c1023                     system_u:system_r:syslogd_t:s0                               unconfined_u:system_r:useradd_t:s0-s0:c0.c1023
system_u:system_r:devicekit_disk_t:s0                        system_u:system_r:system_cronjob_t:s0-s0:c0.c1023            unconfined_u:unconfined_r:unconfined_dbusd_t:s0-s0:c0.c1023
system_u:system_r:dhcpc_t:s0                                 system_u:system_r:system_dbusd_t:s0-s0:c0.c1023              unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
system_u:system_r:dnsmasq_t:s0-s0:c0.c1023                   system_u:system_r:systemd_logind_t:s0
system_u:system_r:init_t:s0                                  system_u:system_r:systemd_tmpfiles_t:s0&lt;/pre&gt;

&lt;p&gt;Ah! Right! We wanted to see everything logged under PolicyKit's security label:&lt;/p&gt;

&lt;pre&gt;$ journalctl _SELINUX_CONTEXT=system_u:system_r:policykit_t:s0&lt;/pre&gt;

&lt;p&gt;Wow! That was easy! I didn't know anything related to SELinux could
be thaaat easy! ;-) Of course this kind of completion works with any
field, not just SELinux labels.&lt;/p&gt;

&lt;p&gt;So much for now. There's a lot more cool stuff in &lt;a href="http://www.freedesktop.org/software/systemd/man/journalctl.html"&gt;journalctl(1)&lt;/a&gt;
than this. For example, it generates JSON output for you! You can match
against kernel fields! You can get simple
&lt;tt&gt;/var/log/messages&lt;/tt&gt;-like output but with &lt;i&gt;relative&lt;/i&gt; timestamps!
And so much more!&lt;/p&gt;
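
&lt;p&gt;To give a quick taste of two of these, here is a minimal sketch. It
assumes the &lt;tt&gt;json&lt;/tt&gt; and &lt;tt&gt;short-monotonic&lt;/tt&gt; output modes
as listed in the man page, so check journalctl(1) for your version before
relying on them. The first command dumps the most recent entry logged by UID
70 as JSON, the second shows the classic syslog-like output with timestamps
relative to boot:&lt;/p&gt;

&lt;pre&gt;$ journalctl --lines=1 -o json _UID=70
$ journalctl -o short-monotonic&lt;/pre&gt;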

&lt;p&gt;Anyway, in the next weeks I hope to post more stories about all the
cool things the journal can do for you. This is just the beginning,
stay tuned.&lt;/p&gt;

&lt;p&gt;&lt;small&gt;Footnotes&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] systemd 195 is currently still in &lt;a href="https://admin.fedoraproject.org/updates/FEDORA-2012-16709/systemd-195-1.fc18"&gt;Bodhi&lt;/a&gt;
but hopefully will get into F18 proper soon, and definitely before the
release of Fedora 18.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[2] OK, I cheated here, indexing by block device is not in
the kernel yet, but on its way due to &lt;a href="http://www.spinics.net/lists/linux-scsi/msg62499.html"&gt;Hannes'
fantastic work&lt;/a&gt;, and I hope it will make an appearance in
F18.&lt;/small&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 24 Oct 2012 00:16:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-10-24:/blog/projects/journalctl.html</guid><category>projects</category></item><item><title>systemd for Administrators, Part XVI</title><link>https://0pointer.net/blog/projects/serial-console.html</link><description>
                
&lt;p&gt;&lt;a href="http://0pointer.de/blog/projects/watchdog.html"&gt;And,&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/self-documented-boot.html"&gt;yes,&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemctl-journal.html"&gt;here's&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/security.html"&gt;now&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/inetd.html"&gt;the&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/instances.html"&gt;sixteenth&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/on-etc-sysinit.html"&gt;installment&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/the-new-configuration-files.html"&gt;of&lt;/a&gt;

&lt;a href="http://0pointer.de/blog/projects/blame-game.html"&gt;my&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;ongoing&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/three-levels-of-off.html"&gt;series&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html"&gt;on&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html"&gt;systemd&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html"&gt;for&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html"&gt;Administrators&lt;/a&gt;:&lt;/p&gt;

&lt;h4&gt;Gettys on Serial Consoles (and Elsewhere)&lt;/h4&gt;

&lt;p&gt;&lt;i&gt;TL;DR: To make use of a serial console, just use
&lt;tt&gt;console=ttyS0&lt;/tt&gt; on the kernel command line, and systemd will
automatically start a getty on it for you.&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;While physical &lt;a href="https://en.wikipedia.org/wiki/RS-232"&gt;RS232&lt;/a&gt; serial ports
have become exotic in today's PCs they play an important role in
modern servers and embedded hardware. They provide a relatively robust
and minimalistic way to access the console of your device, that works
even when the network is hosed, or the primary UI is unresponsive. VMs
frequently emulate a serial port as well.&lt;/p&gt;

&lt;p&gt;Of course, Linux has always had good support for serial consoles,
but with &lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;systemd&lt;/a&gt; we
tried to make serial console support even simpler to use. In the
following text I'll try to give an overview of how serial console &lt;a href="https://en.wikipedia.org/wiki/Getty_%28Unix%29"&gt;gettys&lt;/a&gt; on
systemd work, and how TTYs of any kind are handled.&lt;/p&gt;

&lt;p&gt;Let's start with the key take-away: in most cases, to get a login
prompt on your serial console you don't need to do anything. systemd
checks the kernel configuration for the selected kernel console and
will simply spawn a serial getty on it. That way it is entirely
sufficient to configure your kernel console properly (for example, by
adding &lt;tt&gt;console=ttyS0&lt;/tt&gt; to the kernel command line) and that's
it. But let's have a look at the details:&lt;/p&gt;

&lt;p&gt;In systemd, two template units are responsible for bringing up a
login prompt on text consoles:&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;&lt;tt&gt;getty@.service&lt;/tt&gt; is responsible for &lt;a href="https://en.wikipedia.org/wiki/Virtual_console"&gt;virtual
terminal&lt;/a&gt; (VT) login prompts, i.e. those on your VGA screen as
exposed in &lt;tt&gt;/dev/tty1&lt;/tt&gt; and similar devices.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;serial-getty@.service&lt;/tt&gt; is responsible for all other
terminals, including serial ports such as &lt;tt&gt;/dev/ttyS0&lt;/tt&gt;. It
differs in a couple of ways from &lt;tt&gt;getty@.service&lt;/tt&gt;: among other
things the &lt;tt&gt;$TERM&lt;/tt&gt; environment variable is set to
&lt;tt&gt;vt102&lt;/tt&gt; (hopefully a good default for most serial terminals)
rather than &lt;tt&gt;linux&lt;/tt&gt; (which is the right choice for VTs only),
and the special logic that clears the VT scrollback buffer (and only
works on VTs) is skipped.&lt;/li&gt;

&lt;/ol&gt;

&lt;h5&gt;Virtual Terminals&lt;/h5&gt;

&lt;p&gt;Let's have a closer look at how &lt;tt&gt;getty@.service&lt;/tt&gt; is started,
i.e. how login prompts on virtual terminals (i.e. non-serial TTYs)
work. Traditionally, the init system on Linux machines was configured
to spawn a fixed number of login prompts at boot. In most cases six
instances of the getty program were spawned, on the first six VTs,
&lt;tt&gt;tty1&lt;/tt&gt; to &lt;tt&gt;tty6&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;In a systemd world we made this more dynamic: in order to make
things more efficient login prompts are now started on demand only. As
you switch to the VTs the getty service is instantiated to
&lt;tt&gt;getty@tty2.service&lt;/tt&gt;, &lt;tt&gt;getty@tty5.service&lt;/tt&gt; and so
on. Since we don't have to unconditionally start the getty processes
anymore this allows us to save a bit of resources, and makes start-up
a bit faster. This behaviour is mostly transparent to the user: if the
user activates a VT the getty is started right away, so that the user
will hardly notice that it wasn't running all the time. If he then
logs in and types &lt;tt&gt;ps&lt;/tt&gt; he'll notice however that getty
instances are only running for the VTs he so far switched to.&lt;/p&gt;

&lt;p&gt;By default this automatic spawning is done for the VTs up to VT6
only (in order to be close to the traditional default configuration of
Linux systems)&lt;sup&gt;[1]&lt;/sup&gt;.  Note that the auto-spawning of gettys
is only attempted if no other subsystem took possession of the VTs
yet. More specifically, if a user makes frequent use of &lt;a href="https://en.wikipedia.org/wiki/Fast_user_switching"&gt;fast user
switching&lt;/a&gt; via GNOME he'll get his X sessions on the first six VTs,
too, since the lowest available VT is allocated for each session.&lt;/p&gt;

&lt;p&gt;Two VTs are handled specially by the auto-spawning logic. Firstly,
&lt;tt&gt;tty1&lt;/tt&gt; gets special treatment: if we boot into graphical mode
the display manager takes possession of this VT. If we boot into
multi-user (text) mode a getty is started on it -- unconditionally,
without any on-demand logic&lt;sup&gt;[2]&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;Secondly, &lt;tt&gt;tty6&lt;/tt&gt; is
especially reserved for auto-spawned gettys and unavailable to other
subsystems such as X&lt;sup&gt;[3]&lt;/sup&gt;. This is done in order to ensure
that there's always a way to get a text login, even if due to
fast user switching X took possession of more than 5 VTs.&lt;/p&gt;
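
&lt;p&gt;For reference, this is roughly how the two knobs involved (see footnotes
[1] and [3]) might look in &lt;tt&gt;/etc/systemd/logind.conf&lt;/tt&gt;. Take it as
a sketch: the values simply restate the behaviour described above
(auto-spawning up to VT6, with &lt;tt&gt;tty6&lt;/tt&gt; reserved), so consult
logind.conf(5) on your system before changing anything:&lt;/p&gt;

&lt;pre&gt;[Login]
NAutoVTs=6
ReserveVT=6&lt;/pre&gt;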

&lt;h5&gt;Serial Terminals&lt;/h5&gt;

&lt;p&gt;Handling of login prompts on serial terminals (and all other kinds
of non-VT terminals) is different from that of VTs. By default systemd
will instantiate one &lt;tt&gt;serial-getty@.service&lt;/tt&gt; on the main
kernel&lt;sup&gt;[4]&lt;/sup&gt; console, if it is not a virtual terminal. The
kernel console is where the kernel outputs its own log messages and is
usually configured on the kernel command line in the boot loader via
an argument such as &lt;tt&gt;console=ttyS0&lt;/tt&gt;&lt;sup&gt;[5]&lt;/sup&gt;. This logic ensures that
when the user asks the kernel to redirect its output onto a certain
serial terminal, he will automatically also get a login prompt on it
as the boot completes&lt;sup&gt;[6]&lt;/sup&gt;. systemd will also spawn a login
prompt on the first special VM console (that's &lt;tt&gt;/dev/hvc0&lt;/tt&gt;,
&lt;tt&gt;/dev/xvc0&lt;/tt&gt;, &lt;tt&gt;/dev/hvsi0&lt;/tt&gt;), if the system is run in a VM
that provides these devices. This logic is implemented in a &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/Generators"&gt;generator&lt;/a&gt;
called &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd-getty-generator.html"&gt;systemd-getty-generator&lt;/a&gt;
that is run early at boot and pulls in the necessary services
depending on the execution environment.&lt;/p&gt;

&lt;p&gt;In many cases, this automatic logic should already suffice to get
you a login prompt when you need one, without any specific
configuration of systemd. However, sometimes there's the need to
manually configure a serial getty, for example, if more than one
serial login prompt is needed or the kernel console should be
redirected to a different terminal than the login prompt. To
facilitate this it is sufficient to instantiate
&lt;tt&gt;serial-getty@.service&lt;/tt&gt; once for each serial port you want it
to run on&lt;sup&gt;[7]&lt;/sup&gt;:&lt;/p&gt;

&lt;pre&gt;# systemctl enable serial-getty@ttyS2.service
# systemctl start serial-getty@ttyS2.service&lt;/pre&gt;

&lt;p&gt;And that's it. This will make sure you get the login prompt on the
chosen port on all subsequent boots, and starts it right away,
too.&lt;/p&gt;

&lt;p&gt;Sometimes, there's the need to configure the login prompt in even
more detail. For example, if the default baud rate configured by the
kernel is not correct or other &lt;tt&gt;agetty&lt;/tt&gt; parameters need to
be changed. In such a case simply copy the default unit template to
&lt;tt&gt;/etc/systemd/system&lt;/tt&gt; and edit it there:&lt;/p&gt;

&lt;pre&gt;# cp /usr/lib/systemd/system/serial-getty@.service /etc/systemd/system/serial-getty@ttyS2.service
# vi /etc/systemd/system/serial-getty@ttyS2.service
 .... now make your changes to the agetty command line ...
# ln -s /etc/systemd/system/serial-getty@ttyS2.service /etc/systemd/system/getty.target.wants/
# systemctl daemon-reload
# systemctl start serial-getty@ttyS2.service&lt;/pre&gt;

&lt;p&gt;This creates a unit file that is specific to serial port
&lt;tt&gt;ttyS2&lt;/tt&gt;, so that you can make specific changes to this port and
this port only.&lt;/p&gt;
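
&lt;p&gt;As an illustration, such an edit might boil down to a single changed
line in the copied file. This is a sketch only: the &lt;tt&gt;ExecStart=&lt;/tt&gt;
line shipped in your &lt;tt&gt;serial-getty@.service&lt;/tt&gt; differs between
versions and distributions, the &lt;tt&gt;/sbin/agetty&lt;/tt&gt; path is an
assumption, and the fixed rate of 9600 baud is just an example value:&lt;/p&gt;

&lt;pre&gt;[Service]
ExecStart=-/sbin/agetty %I 9600&lt;/pre&gt;

&lt;p&gt;Compared to the default described in footnote [6], the &lt;tt&gt;-s&lt;/tt&gt;
switch is dropped here, so that &lt;tt&gt;agetty&lt;/tt&gt; uses the rate given
on its command line instead of keeping the one configured by the kernel.&lt;/p&gt;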

&lt;p&gt;And this is pretty much all there is to say about serial ports, VTs
and login prompts on them. I hope this was interesting, and please
come back soon for the next installment of this series!&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;b&gt;Footnotes&lt;/b&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] You can easily modify this by changing
&lt;tt&gt;NAutoVTs=&lt;/tt&gt; in &lt;a href="http://www.freedesktop.org/software/systemd/man/logind.conf.html"&gt;logind.conf&lt;/a&gt;.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[2] Note that whether the getty on VT1 is started on-demand
or not hardly makes a difference, since VT1 is the default active VT
anyway, so the demand is there anyway at boot.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[3] You can easily change this special reserved VT by
modifying &lt;tt&gt;ReserveVT=&lt;/tt&gt; in &lt;a href="http://www.freedesktop.org/software/systemd/man/logind.conf.html"&gt;logind.conf&lt;/a&gt;.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[4] If multiple kernel consoles are used simultaneously, the
&lt;i&gt;main&lt;/i&gt; console is the one listed &lt;i&gt;first&lt;/i&gt; in
&lt;tt&gt;/sys/class/tty/console/active&lt;/tt&gt;, which is the &lt;i&gt;last&lt;/i&gt; one
listed on the kernel command line.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[5] See &lt;a href="https://www.kernel.org/doc/Documentation/kernel-parameters.txt"&gt;kernel-parameters.txt&lt;/a&gt;
for more information on this kernel command line
option.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[6] Note that &lt;tt&gt;agetty -s&lt;/tt&gt; is used here so that the
baud rate configured at the kernel command line is not altered and
continues to be used by the login prompt.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[7] Note that this &lt;tt&gt;systemctl enable&lt;/tt&gt; syntax only
works with systemd 188 and newer (i.e. F18). On older versions use
&lt;tt&gt;ln -s /usr/lib/systemd/system/serial-getty@.service
/etc/systemd/system/getty.target.wants/serial-getty@ttyS2.service ; systemctl
daemon-reload&lt;/tt&gt; instead.&lt;/small&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Sat, 13 Oct 2012 02:56:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-10-13:/blog/projects/serial-console.html</guid><category>projects</category></item><item><title>Berlin Open Source Meetup</title><link>https://0pointer.net/blog/projects/berlin-open-source-meetup.html</link><description>
                
&lt;p&gt;&lt;a href="http://blixtra.org/blog/2012/08/06/berlin-open-source-meetup/"&gt;&lt;img src="http://blixtra.org/blog/wp-content/uploads/2012/08/Prater.jpg" width="500" height="375" alt="Prater" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Chris K&amp;uuml;hl and I are organizing a &lt;a href="http://blixtra.org/blog/2012/08/06/berlin-open-source-meetup/"&gt;Berlin
Open Source Meetup&lt;/a&gt; on Aug 19th at the Prater Biergarten in Prenzlauer Berg.
If you live in Berlin (or are passing by) and are involved in or interested in
Open Source then you are invited!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://plus.google.com/u/0/events/c9ffkptmk6kbjkgn7nb7bh5i1ek/107949128852701224835"&gt;There's also a Google+ event for the meetup.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's a public event, so everybody is welcome, and please feel free to invite others!&lt;/p&gt;

&lt;p&gt;See you at the Prater!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 06 Aug 2012 14:59:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-08-06:/blog/projects/berlin-open-source-meetup.html</guid><category>projects</category></item><item><title>Upcoming Hackfests/Sprints</title><link>https://0pointer.net/blog/hackfests.html</link><description>
                
&lt;p&gt;The &lt;a href="http://www.linuxplumbersconf.org/2012/"&gt;Linux Plumbers
Conference 2012&lt;/a&gt; will take place August 29th to 31st in San Diego,
California. We, the &lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;systemd&lt;/a&gt;
developers, would like to invite you to two hackfests/sprints that will happen
around LPC:&lt;/p&gt;

&lt;h4&gt;San Diego: libvirt/LXC/systemd/SELinux Integration Hackfest&lt;/h4&gt;

&lt;p&gt;On &lt;b&gt;28th of August&lt;/b&gt; we'll have a hackfest on the topic of closer
integration of libvirt, LXC, systemd and SELinux, colocated with LPC in
San Diego, California. We'll have a number of key people from these projects
participating, including Dan Walsh, Eric Paris, Daniel P. Berrange, Kay
Sievers and myself.&lt;/p&gt;

&lt;p&gt;Topics we'll cover: making Fedora/Linux boot entirely cleanly in
normal containers, teaching systemd's control tools minimal
container-awareness (such as being able to list all services of all
containers in one go, in addition to those running on the host
system), unified journal logging across multiple containers, the &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface"&gt;systemd
container interface&lt;/a&gt;, auditing and containers, running multiple
instances from the same &lt;tt&gt;/usr&lt;/tt&gt; tree, and a lot more...&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Who should attend?&lt;/b&gt; Everybody hacking on the mentioned
projects who wants to help integrate them with the
goal of turning them into a secure, reliable, powerful container
solution for Linux.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Who should not attend?&lt;/b&gt; If you don't hack on any of these
projects, or if you are not interested in closer integration of at
least two of these projects.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;How to register?&lt;/b&gt; Just show up. You get extra points however
for letting us know in advance (just send us an email). Attendance is
free.&lt;/p&gt;

&lt;p&gt;&amp;#10149; See also: &lt;a href="https://plus.google.com/u/0/events/cvs9oi2q802vh57o1vr9le7tsjc/115547683951727699051"&gt;Google+ Event&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;San Francisco: systemd Journal Sprint&lt;/h4&gt;

&lt;p&gt;On &lt;b&gt;September 3-7&lt;/b&gt; we'll have a sprint on the topic of the systemd
Journal. It's going to take place at the &lt;a href="https://www.getpantheon.com/"&gt;Pantheon&lt;/a&gt; headquarters in San
Francisco, California. Among others, Kay Sievers, David Strauss and I will participate.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Who should attend?&lt;/b&gt; Everybody who wants to help improve the
systemd Journal, regardless of whether in its core itself, in client software
for it, hooking up other projects or writing library bindings for
it. Also, if you are using or planning to use the journal for a
project, we'd be very interested in high-bandwidth face-to-face
feedback regarding what you are missing, what you don't like so much, and what
you find awesome in the Journal.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;How to register?&lt;/b&gt; Please sign up at &lt;a href="http://systemd.eventbrite.com/"&gt;EventBrite&lt;/a&gt;. Attendance is
free. For more information see the &lt;a href="http://lists.freedesktop.org/archives/systemd-devel/2012-July/005803.html"&gt;invitation
mail&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&amp;#10149; See also: &lt;a href="https://plus.google.com/u/0/events/cee28a21tk5lfv0u224kj6pa930/115547683951727699051"&gt;Google+ Event&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;i&gt;See you in California!&lt;/i&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 20 Jul 2012 02:52:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-07-20:/blog/hackfests.html</guid><category>misc</category></item><item><title>foss.in 2012 CFP Ends in a Few Hours</title><link>https://0pointer.net/blog/projects/fossin2012.html</link><description>
                
&lt;p&gt;&lt;a href="http://foss.in/"&gt;foss.in 2012 in Bangalore&lt;/a&gt; takes place again after a
hiatus of some years. It has always been a fantastic conference, and a great opportunity to
visit Bangalore and India. I just submitted my talk proposals, so, hurry up, and &lt;a href="http://foss.in/participate/call-for-participation"&gt;submit yours&lt;/a&gt;!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Sun, 08 Jul 2012 15:47:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-07-08:/blog/projects/fossin2012.html</guid><category>projects</category></item><item><title>systemd for Administrators, Part XV</title><link>https://0pointer.net/blog/projects/watchdog.html</link><description>
                
&lt;p&gt;&lt;a href="http://0pointer.de/blog/projects/self-documented-boot.html"&gt;Quickly
following the previous iteration&lt;/a&gt;, &lt;a href="http://0pointer.de/blog/projects/systemctl-journal.html"&gt;here's&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/security.html"&gt;now&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/inetd.html"&gt;the&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/instances.html"&gt;fifteenth&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/on-etc-sysinit.html"&gt;installment&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/the-new-configuration-files.html"&gt;of&lt;/a&gt;

&lt;a href="http://0pointer.de/blog/projects/blame-game.html"&gt;my&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;ongoing&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/three-levels-of-off.html"&gt;series&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html"&gt;on&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html"&gt;systemd&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html"&gt;for&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html"&gt;Administrators&lt;/a&gt;:&lt;/p&gt;

&lt;h4&gt;Watchdogs&lt;/h4&gt;

&lt;p&gt;There are three big target audiences we try to cover with &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/"&gt;systemd&lt;/a&gt;:
the embedded/mobile folks, the desktop people and the server
folks. While the systems used by embedded/mobile tend to be
underpowered and have few resources available, desktops tend to be
much more powerful machines -- but still much less resourceful than
servers. Nonetheless there are surprisingly many features that matter
to both extremes of this axis (embedded and servers), but not the
center (desktops). One of them is support for &lt;a href="https://en.wikipedia.org/wiki/Watchdog_timer"&gt;watchdogs&lt;/a&gt; in
hardware and software.&lt;/p&gt;

&lt;p&gt;Embedded devices frequently rely on watchdog hardware that resets
them automatically if software stops responding (more specifically,
stops signalling the hardware in fixed intervals that it is still
alive). This is required to increase reliability and make sure that
regardless of what happens the best is attempted to get the system
working again. Functionality like this makes little sense on the
desktop&lt;sup&gt;[1]&lt;/sup&gt;. However, on
high-availability servers watchdogs are frequently used, again.&lt;/p&gt;

&lt;p&gt;Starting with version 183 systemd provides full support for
hardware watchdogs (as exposed in &lt;tt&gt;/dev/watchdog&lt;/tt&gt; to
userspace), as well as supervisor (software) watchdog support for
individual system services. The basic idea is the following: if enabled,
systemd will regularly ping the watchdog hardware. If systemd or the
kernel hang this ping will not happen anymore and the hardware will
automatically reset the system. This way systemd and the kernel are
protected from boundless hangs -- by the hardware. To make the chain
complete, systemd then exposes a software watchdog interface for
individual services so that they can also be restarted (or some other
action taken) if they begin to hang. This software watchdog logic can
be configured individually for each service in the ping frequency and
the action to take. Putting both parts together (i.e. hardware
watchdogs supervising systemd and the kernel, as well as systemd
supervising all other services) we have a reliable way to watchdog
every single component of the system.&lt;/p&gt;

&lt;p&gt;To make use of the hardware watchdog it is sufficient to set the
&lt;tt&gt;RuntimeWatchdogSec=&lt;/tt&gt; option in
&lt;tt&gt;/etc/systemd/system.conf&lt;/tt&gt;. It defaults to 0 (i.e. no hardware
watchdog use). Set it to a value like 20s and the watchdog is
enabled. After 20s without keep-alive pings the hardware will reset
the system. Note that systemd will send a ping to the hardware at half the
specified interval, i.e. every 10s. And that's already all there is to
it. By enabling this single, simple option you have turned on
supervision by the hardware of systemd and the kernel beneath
it.&lt;sup&gt;[2]&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Note that the hardware watchdog device (&lt;tt&gt;/dev/watchdog&lt;/tt&gt;) is
single-user only. That means that you can either enable this
functionality in systemd, or use a separate external watchdog daemon,
such as the aptly named &lt;a href="http://linux.die.net/man/8/watchdog"&gt;watchdog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;ShutdownWatchdogSec=&lt;/tt&gt; is another option that can be
configured in &lt;tt&gt;/etc/systemd/system.conf&lt;/tt&gt;. It controls the
watchdog interval to use during reboots. It defaults to 10min, and
adds extra reliability to the system reboot logic: if a clean reboot
is not possible and shutdown hangs, we rely on the watchdog hardware
to reset the system abruptly, as an extra safety net.&lt;/p&gt;
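
&lt;p&gt;Put together, the relevant part of &lt;tt&gt;/etc/systemd/system.conf&lt;/tt&gt;
could look like this. It is a sketch: 20s is merely the example value used
above, and 10min restates the default mentioned for reboots:&lt;/p&gt;

&lt;pre&gt;[Manager]
RuntimeWatchdogSec=20s
ShutdownWatchdogSec=10min&lt;/pre&gt;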

&lt;p&gt;So much about the hardware watchdog logic. These two options are
really everything that is necessary to make use of the hardware
watchdogs. Now, let's have a look at how to add watchdog logic to
individual services.&lt;/p&gt;

&lt;p&gt;First of all, to make software watchdog-supervisable it needs to be
patched to send out "I am alive" signals in regular intervals in its
event loop. Patching this is relatively easy. First, a daemon needs to
read the &lt;tt&gt;WATCHDOG_USEC=&lt;/tt&gt; environment variable. If it is set,
it will contain the watchdog interval in usec formatted as an ASCII text
string, as it is configured for the service. The daemon should then
issue &lt;tt&gt;&lt;a href="http://www.freedesktop.org/software/systemd/man/sd_notify.html"&gt;sd_notify&lt;/a&gt;("WATCHDOG=1")&lt;/tt&gt;
calls every half of that interval. A daemon patched this way should
transparently support watchdog functionality by checking whether the
environment variable is set and honouring the value it is set to.&lt;/p&gt;
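
&lt;p&gt;To sketch the daemon side of this: the following shell fragment uses the
&lt;tt&gt;systemd-notify&lt;/tt&gt; command line tool as a stand-in for the
&lt;tt&gt;sd_notify()&lt;/tt&gt; C call. The function names are made up for this
illustration, and since the pings are sent by a child process rather than the
main process, the unit is assumed to set &lt;tt&gt;NotifyAccess=all&lt;/tt&gt;. A
real daemon should send the ping from its own event loop, so that a hang there
actually stops the pings:&lt;/p&gt;

&lt;pre&gt;#!/bin/sh
# Hypothetical helpers, not part of systemd.

# Print half of the watchdog interval in whole seconds, or nothing if
# WATCHDOG_USEC is unset (i.e. no watchdog was configured for the service).
# Whole-second granularity only: intervals below 2s need a finer timer.
watchdog_ping_interval() {
    [ -n "$WATCHDOG_USEC" ] || return 0
    interval=$((WATCHDOG_USEC / 2000000))
    [ "$interval" -ge 1 ] || interval=1
    echo "$interval"
}

# Send a keep-alive ping every half interval, until killed.
watchdog_loop() {
    interval=$(watchdog_ping_interval)
    [ -n "$interval" ] || return 0
    while true; do
        systemd-notify WATCHDOG=1
        sleep "$interval"
    done
}&lt;/pre&gt;

&lt;p&gt;For a watchdog interval of 30s &lt;tt&gt;WATCHDOG_USEC&lt;/tt&gt; is
30000000, and &lt;tt&gt;watchdog_ping_interval&lt;/tt&gt; prints 15, i.e. one ping
every 15s.&lt;/p&gt;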

&lt;p&gt;To enable the software watchdog logic for a service (which has been
patched to support the logic pointed out above) it is sufficient to
set &lt;tt&gt;WatchdogSec=&lt;/tt&gt; to the desired failure latency. See &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd.service.html"&gt;systemd.service(5)&lt;/a&gt;
for details on this setting. This causes &lt;tt&gt;WATCHDOG_USEC=&lt;/tt&gt; to be
set for the service's processes and will cause the service to enter a
failure state as soon as no keep-alive ping is received within the
configured interval.&lt;/p&gt;

&lt;p&gt;If a service enters a failure state as soon as the watchdog logic
detects a hang, then this is hardly sufficient to build a reliable
system. The next step is to configure whether the service shall be
restarted and how often, and what to do if it then still fails. To
enable automatic service restarts on failure set
&lt;tt&gt;Restart=on-failure&lt;/tt&gt; for the service. To configure how many
times a service shall be attempted to be restarted use the combination
of &lt;tt&gt;StartLimitBurst=&lt;/tt&gt; and &lt;tt&gt;StartLimitInterval=&lt;/tt&gt; which
allow you to configure how often a service may restart within a time
interval. If that limit is reached, a special action can be
taken. This action is configured with &lt;tt&gt;StartLimitAction=&lt;/tt&gt;. The
default is &lt;tt&gt;none&lt;/tt&gt;, i.e. no further action is taken and
the service simply remains in the failure state without any further
attempted restarts. The other three possible values are
&lt;tt&gt;reboot&lt;/tt&gt;, &lt;tt&gt;reboot-force&lt;/tt&gt; and
&lt;tt&gt;reboot-immediate&lt;/tt&gt;. &lt;tt&gt;reboot&lt;/tt&gt; attempts a clean reboot,
going through the usual, clean shutdown logic. &lt;tt&gt;reboot-force&lt;/tt&gt;
is more abrupt: it will not actually try to cleanly shut down any
services, but immediately kills all remaining services and unmounts
all file systems and then forcibly reboots (this way all file systems
will be clean but reboot will still be very fast). Finally,
&lt;tt&gt;reboot-immediate&lt;/tt&gt; does not attempt to kill any process or
unmount any file systems. Instead it just hard reboots the machine
without delay. &lt;tt&gt;reboot-immediate&lt;/tt&gt; hence comes closest to a
reboot triggered by a hardware watchdog. All these settings are
documented in &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd.service.html"&gt;systemd.service(5)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Putting this all together we now have pretty flexible options to
watchdog-supervise a specific service and configure automatic restarts
of the service if it hangs, plus take ultimate action if that doesn't
help.&lt;/p&gt;

&lt;p&gt;Here's an example unit file:&lt;/p&gt;

&lt;pre&gt;[Unit]
Description=My Little Daemon
Documentation=man:mylittled(8)

[Service]
ExecStart=/usr/bin/mylittled
WatchdogSec=30s
Restart=on-failure
StartLimitInterval=5min
StartLimitBurst=4
StartLimitAction=reboot-force
&lt;/pre&gt;

&lt;p&gt;This service will automatically be restarted if it hasn't pinged
the system manager for longer than 30s or if it fails otherwise. If it
is restarted this way more than 4 times within 5min, action is taken
and the system is quickly rebooted, with all file systems being clean
when it comes up again.&lt;/p&gt;

&lt;p&gt;And that's already all I wanted to tell you about! With hardware
watchdog support right in PID 1, as well as supervisor watchdog
support for individual services we should provide everything you need
for most watchdog use cases. Regardless of whether you are building an
embedded or mobile appliance, or working with high-availability
servers, please give this a try!&lt;/p&gt;

&lt;p&gt;(Oh, and if you wonder why in heaven PID 1 needs to deal with
&lt;tt&gt;/dev/watchdog&lt;/tt&gt;, and why this shouldn't be kept in a separate
daemon, then please read this again and try to understand that this is
all about the supervisor chain we are building here, where the hardware watchdog
supervises systemd, and systemd supervises the individual
services. Also, we believe that a service not responding should be
treated in a similar way as any other service error. Finally, pinging
&lt;tt&gt;/dev/watchdog&lt;/tt&gt; is one of the most trivial operations in the OS
(basically little more than an ioctl() call), so the support for this
is not more than a handful of lines of code. Maintaining this externally
with complex IPC between PID 1 (and the daemons) and this watchdog
daemon would be drastically more complex, error-prone and resource
intensive.)&lt;/p&gt;

&lt;p&gt;Note that the built-in hardware watchdog support of systemd does
not conflict with other watchdog software by default. systemd does not
make use of &lt;tt&gt;/dev/watchdog&lt;/tt&gt; by default, and you are welcome to
use external watchdog daemons in conjunction with systemd, if this
better suits your needs.&lt;/p&gt;
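
&lt;p&gt;As a minimal sketch of opting in, assuming the
&lt;tt&gt;RuntimeWatchdogSec=&lt;/tt&gt; and &lt;tt&gt;ShutdownWatchdogSec=&lt;/tt&gt;
settings of &lt;tt&gt;/etc/systemd/system.conf&lt;/tt&gt; and purely illustrative
timeout values, handing the hardware watchdog to PID 1 might look like
this:&lt;/p&gt;

&lt;pre&gt;[Manager]
# Illustrative value: the hardware resets the machine if PID 1 fails to
# ping /dev/watchdog within 20s
RuntimeWatchdogSec=20s
# Illustrative value: upper bound for a reboot that hangs
ShutdownWatchdogSec=10min
&lt;/pre&gt;

&lt;p&gt;Leaving &lt;tt&gt;RuntimeWatchdogSec=&lt;/tt&gt; unset keeps the runtime
watchdog logic off, so an external watchdog daemon can own the device
instead.&lt;/p&gt;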

&lt;p&gt;And one last thing: if you wonder whether your hardware has a
watchdog, then the answer is: almost definitely yes -- if it is no more
than a few years old. If you want to verify this, try the &lt;a href="http://karelzak.blogspot.de/2012/05/eject1-sulogin1-wdctl1.html"&gt;wdctl&lt;/a&gt;
tool from recent util-linux, which shows you everything you need to
know about your watchdog hardware.&lt;/p&gt;

&lt;p&gt;I'd like to thank the great folks from &lt;a href="http://www.pengutronix.de/"&gt;Pengutronix&lt;/a&gt; for contributing
most of the watchdog logic. Thank you!&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;b&gt;Footnotes&lt;/b&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] Though actually most desktops tend to include watchdog
hardware these days too, as this is cheap to build and available in
most modern PC chipsets.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[2] So, here's a free tip for you if you hack on the core
OS: don't enable this feature while you hack. Otherwise your system
might suddenly reboot if you are in the middle of tracing through PID
1 with gdb and cause it to be stopped for a moment, so that no
hardware ping can be done...&lt;/small&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 28 Jun 2012 00:07:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-06-28:/blog/projects/watchdog.html</guid><category>projects</category></item><item><title>systemd for Administrators, Part XIV</title><link>https://0pointer.net/blog/projects/self-documented-boot.html</link><description>
                
&lt;p&gt;&lt;a href="http://0pointer.de/blog/projects/systemctl-journal.html"&gt;And&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/security.html"&gt;here's&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/inetd.html"&gt;the&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/instances.html"&gt;fourteenth&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/on-etc-sysinit.html"&gt;installment&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/the-new-configuration-files.html"&gt;of&lt;/a&gt;

&lt;a href="http://0pointer.de/blog/projects/blame-game.html"&gt;my&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;ongoing&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/three-levels-of-off.html"&gt;series&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html"&gt;on&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html"&gt;systemd&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html"&gt;for&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html"&gt;Administrators&lt;/a&gt;:&lt;/p&gt;

&lt;h4&gt;The Self-Explanatory Boot&lt;/h4&gt;

&lt;p&gt;One complaint we often hear about &lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;systemd&lt;/a&gt; is
that its boot process is hard to understand, even
incomprehensible. In general I can only disagree with this sentiment; I
even believe quite the opposite: in comparison to what we had
before -- where to even remotely understand what was going on you had
to have a decent comprehension of the programming language that is
Bourne Shell&lt;sup&gt;[1]&lt;/sup&gt; -- understanding systemd's boot process is
substantially easier. However, like in many complaints there is some
truth in this frequently heard discomfort: for a seasoned Unix
administrator there indeed is a bit of learning to do when the switch
to &lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;systemd&lt;/a&gt; is
made. And as systemd developers it is our duty to make the learning
curve shallow, introduce as few surprises as we can, and provide
good documentation where that is not possible.&lt;/p&gt;

&lt;p&gt;systemd has always had a huge body of documentation &lt;a href="http://www.freedesktop.org/software/systemd/man/"&gt;as manual
pages&lt;/a&gt; (nearly 100 individual pages now!), in the &lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;Wiki&lt;/a&gt; and
the various blog stories I posted. However, any amount of
documentation alone is not enough to make software easily
understood. In fact, thick manuals sometimes appear intimidating and
make the reader wonder where to start reading, if all he was
interested in was this one simple concept of the whole system.&lt;/p&gt;

&lt;p&gt;Acknowledging all this we have now added a new, neat, little
feature to systemd: the self-explanatory boot process. What do we mean
by that? Simply that each and every single component of our boot comes
with documentation and that this documentation is closely linked to
its component, so that it is easy to find.&lt;/p&gt;

&lt;p&gt;More specifically, all units in systemd (which are what
encapsulate the components of the boot) now include references to
their documentation, the documentation of their configuration files
and further applicable manuals. A user who is trying to understand the
purpose of a unit, how it fits into the boot process and how to
configure it can now easily look up this documentation with the
well-known &lt;tt&gt;systemctl status&lt;/tt&gt; command. Here's an example of how
this looks for &lt;tt&gt;systemd-logind.service&lt;/tt&gt;:&lt;/p&gt;

&lt;pre&gt;
$ systemctl status systemd-logind.service
systemd-logind.service - Login Service
	  Loaded: loaded (/usr/lib/systemd/system/systemd-logind.service; static)
	  Active: active (running) since Mon, 25 Jun 2012 22:39:24 +0200; 1 day and 18h ago
	    Docs: &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd-logind.service.html"&gt;man:systemd-logind.service(7)&lt;/a&gt;
	          &lt;a href="http://www.freedesktop.org/software/systemd/man/logind.conf.html"&gt;man:logind.conf(5)&lt;/a&gt;
	          &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/multiseat"&gt;http://www.freedesktop.org/wiki/Software/systemd/multiseat&lt;/a&gt;
	Main PID: 562 (systemd-logind)
	  CGroup: name=systemd:/system/systemd-logind.service
		  └ 562 /usr/lib/systemd/systemd-logind

Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event2 (Power Button)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event6 (Video Bus)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event0 (Lid Switch)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event1 (Sleep Button)
Jun 25 22:39:24 epsilon systemd-logind[562]: Watching system buttons on /dev/input/event7 (ThinkPad Extra Buttons)
Jun 25 22:39:25 epsilon systemd-logind[562]: New session 1 of user gdm.
Jun 25 22:39:25 epsilon systemd-logind[562]: Linked /tmp/.X11-unix/X0 to /run/user/42/X11-display.
Jun 25 22:39:32 epsilon systemd-logind[562]: New session 2 of user lennart.
Jun 25 22:39:32 epsilon systemd-logind[562]: Linked /tmp/.X11-unix/X0 to /run/user/500/X11-display.
Jun 25 22:39:54 epsilon systemd-logind[562]: Removed session 1.
&lt;/pre&gt;

&lt;p&gt;At first glance this output has changed very little. If you look
closer, however, you will find that it now includes one new field:
&lt;tt&gt;Docs&lt;/tt&gt; lists references to the documentation of this
service. In this case there are two man page URIs and one web URL
specified. The man pages describe the purpose and configuration of
this service, the web URL includes an introduction to the basic
concepts of this service.&lt;/p&gt;

&lt;p&gt;If the user uses a recent graphical terminal implementation it is
sufficient to click on the URIs shown to get the respective
documentation&lt;sup&gt;[2]&lt;/sup&gt;. In other words: it has never been this
easy to figure out what a specific component of our boot is about:
just use &lt;tt&gt;systemctl status&lt;/tt&gt; to get more information about it
and click on the links shown to find the documentation.&lt;/p&gt;

&lt;p&gt;Over the past few days I have written man pages and added these references
for every single unit we ship with systemd. This means, with
&lt;tt&gt;systemctl status&lt;/tt&gt; you now have a very easy way to find out
more about every single service of the core OS.&lt;/p&gt;

&lt;p&gt;If you are not using a graphical terminal (where you can just click
on URIs), a man page URI in the middle of the output of &lt;tt&gt;systemctl
status&lt;/tt&gt; is not the most useful thing to have. To make reading the
referenced man pages easier we have also added a new command:&lt;/p&gt;

&lt;pre&gt;systemctl help systemd-logind.service&lt;/pre&gt;

&lt;p&gt;This will open the listed man pages right away, without the need
to click anything or copy/paste a URI.&lt;/p&gt;

&lt;p&gt;The URIs are in the formats documented by the &lt;a href="https://www.kernel.org/doc/man-pages/online/pages/man7/url.7.html"&gt;uri(7)&lt;/a&gt;
man page. Units may reference http and https URLs, as well as man and
info pages.&lt;/p&gt;

&lt;p&gt;Of course all this doesn't make everything self-explanatory, simply
because the user still has to find out about &lt;tt&gt;systemctl status&lt;/tt&gt;
(and even &lt;tt&gt;systemctl&lt;/tt&gt; in the first place so that he even knows
what units there are); however with this basic knowledge further
help on specific units is in very easy reach.&lt;/p&gt;

&lt;p&gt;We hope that this kind of interlinking of runtime behaviour and the
matching documentation is a big step forward to make our boot easier
to understand.&lt;/p&gt;

&lt;p&gt;This functionality is partially already available in Fedora 17, and
will show up in complete form in Fedora 18.&lt;/p&gt;

&lt;p&gt;That all said, credit where credit is due: this kind of reference
to documentation within the service descriptions is not new; Solaris'
SMF has had similar functionality for quite some time. However, we believe
this new systemd feature is certainly a novelty on Linux, and with
systemd we now offer you the best documented and best self-explaining
init system.&lt;/p&gt;

&lt;p&gt;Of course, if you are writing unit files for your own packages,
please consider also including references to the documentation of your
services and their configuration. This is really easy to do: just list
the URIs in the new &lt;tt&gt;Documentation=&lt;/tt&gt; field in the
&lt;tt&gt;[Unit]&lt;/tt&gt; section of your unit files. For details see &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd.unit.html"&gt;systemd.unit(5)&lt;/a&gt;. The
more comprehensively we include links to documentation in our OS
services the easier the work of administrators becomes. (To make sure
Fedora makes comprehensive use of this functionality &lt;a href="https://fedorahosted.org/fpc/ticket/192"&gt;I filed a bug on
FPC&lt;/a&gt;).&lt;/p&gt;
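
&lt;p&gt;As a rough sketch, a unit file carrying such references might look
like this. The &lt;tt&gt;mylittled&lt;/tt&gt; daemon, its man pages and the
example.com URL are hypothetical placeholders, not real software:&lt;/p&gt;

&lt;pre&gt;[Unit]
Description=My Little Daemon
# Hypothetical references; multiple URIs are separated by spaces
Documentation=man:mylittled(8) man:mylittled.conf(5) http://www.example.com/mylittled/

[Service]
ExecStart=/usr/bin/mylittled
&lt;/pre&gt;

&lt;p&gt;With something like this in place, &lt;tt&gt;systemctl status&lt;/tt&gt;
should list all three references in the &lt;tt&gt;Docs&lt;/tt&gt; field of that
unit.&lt;/p&gt;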

&lt;p&gt;Oh, and BTW: if you are looking for a rough overview of systemd's
boot process &lt;a href="http://www.freedesktop.org/software/systemd/man/bootup.html"&gt;here's
another new man page we recently added&lt;/a&gt;, which includes a pretty
ASCII flow chart of the boot process and the units involved.&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;b&gt;Footnotes&lt;/b&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] Which TBH is a pretty crufty, strange one on top.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[2] Well, &lt;a href="https://bugzilla.gnome.org/show_bug.cgi?id=676452"&gt;a terminal
where this bug is fixed&lt;/a&gt; (used together with &lt;a href="https://bugzilla.gnome.org/show_bug.cgi?id=676482"&gt;a help
browser where this one is fixed&lt;/a&gt;).&lt;/small&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 27 Jun 2012 17:45:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-06-27:/blog/projects/self-documented-boot.html</guid><category>projects</category></item><item><title>Presentation in Warsaw</title><link>https://0pointer.net/blog/projects/warsaw.html</link><description>
                
&lt;p&gt;I recently had the chance to speak about &lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;systemd&lt;/a&gt;
and other projects, as well as the politics behind them at a &lt;a href="http://osec.pl/barcamp/lennart"&gt;Bar Camp in Warsaw&lt;/a&gt;,
organized by the fine people of &lt;a href="http://osec.pl/"&gt;OSEC&lt;/a&gt;. The presentation has been recorded,
and has now been posted online. It's a very long recording (1:43h),
but it's quite interesting (as I'd like to believe) and contains a bit
of background on where we are coming from and where we are going. Anyway,
please have a look. Enjoy!&lt;/p&gt;

&lt;iframe width="560" height="315" src="http://www.youtube.com/embed/9UnEV9SPuw8" frameborder="0" allowfullscreen="1"&gt;&lt;/iframe&gt;

&lt;p&gt;I'd like to thank the organizers for this great event and for
publishing the recording online.&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 24 May 2012 22:06:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-05-24:/blog/projects/warsaw.html</guid><category>projects</category></item><item><title>systemd for Administrators, Part XIII</title><link>https://0pointer.net/blog/projects/systemctl-journal.html</link><description>
                
&lt;p&gt;&lt;a href="http://0pointer.de/blog/projects/security.html"&gt;Here's&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/inetd.html"&gt;the&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/instances.html"&gt;thirteenth&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/on-etc-sysinit.html"&gt;installment&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/the-new-configuration-files.html"&gt;of&lt;/a&gt;

&lt;a href="http://0pointer.de/blog/projects/blame-game.html"&gt;my&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;ongoing&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/three-levels-of-off.html"&gt;series&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html"&gt;on&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html"&gt;systemd&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html"&gt;for&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html"&gt;Administrators&lt;/a&gt;:&lt;/p&gt;

&lt;h4&gt;Log and Service Status&lt;/h4&gt;

&lt;p&gt;This one is a short episode. One of the most commonly used commands
on a &lt;a href="http://www.freedesktop.org/wiki/Software/systemd"&gt;systemd&lt;/a&gt;
system is &lt;tt&gt;systemctl status&lt;/tt&gt; which may be used to determine the
status of a service (or other unit). It always has been a valuable
tool to figure out the processes, runtime information and other meta
data of a daemon running on the system.&lt;/p&gt;

&lt;p&gt;With Fedora 17 we introduced &lt;a href="http://0pointer.de/blog/projects/the-journal.html"&gt;the
journal&lt;/a&gt;, our new logging scheme that provides structured, indexed
and reliable logging on systemd systems, while providing a certain
degree of compatibility with classic syslog implementations. The
original reason we started to work on the journal was one specific
feature idea, that to the outsider might appear simple but without the
journal is difficult and inefficient to implement: along with the
output of &lt;tt&gt;systemctl status&lt;/tt&gt; we wanted to show the last 10 log
messages of the daemon. Log data is some of the most essential bits of
information we have on the status of a service. Hence it is an
obvious choice to show it next to the general status of the
service.&lt;/p&gt;

&lt;p&gt;And now to make it short: at the same time as we integrated the
journal into &lt;tt&gt;systemd&lt;/tt&gt; and Fedora we also hooked up
&lt;tt&gt;systemctl&lt;/tt&gt; with it. Here's an example output:&lt;/p&gt;

&lt;pre&gt;$ systemctl status avahi-daemon.service
avahi-daemon.service - Avahi mDNS/DNS-SD Stack
	  Loaded: loaded (/usr/lib/systemd/system/avahi-daemon.service; enabled)
	  Active: active (running) since Fri, 18 May 2012 12:27:37 +0200; 14s ago
	Main PID: 8216 (avahi-daemon)
	  Status: "avahi-daemon 0.6.30 starting up."
	  CGroup: name=systemd:/system/avahi-daemon.service
		  ├ 8216 avahi-daemon: running [omega.local]
		  └ 8217 avahi-daemon: chroot helper

May 18 12:27:37 omega avahi-daemon[8216]: Joining mDNS multicast group on interface eth1.IPv4 with address 172.31.0.52.
May 18 12:27:37 omega avahi-daemon[8216]: New relevant interface eth1.IPv4 for mDNS.
May 18 12:27:37 omega avahi-daemon[8216]: Network interface enumeration completed.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for 192.168.122.1 on virbr0.IPv4.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for fd00::e269:95ff:fe87:e282 on eth1.*.
May 18 12:27:37 omega avahi-daemon[8216]: Registering new address record for 172.31.0.52 on eth1.IPv4.
May 18 12:27:37 omega avahi-daemon[8216]: Registering HINFO record with values 'X86_64'/'LINUX'.
May 18 12:27:38 omega avahi-daemon[8216]: Server startup complete. Host name is omega.local. Local service cookie is 3555095952.
May 18 12:27:38 omega avahi-daemon[8216]: Service "omega" (/services/ssh.service) successfully established.
May 18 12:27:38 omega avahi-daemon[8216]: Service "omega" (/services/sftp-ssh.service) successfully established.&lt;/pre&gt;

&lt;p&gt;This, of course, shows the status of everybody's favourite
mDNS/DNS-SD daemon with a list of its processes, along with -- as
promised -- the 10 most recent log lines. Mission accomplished!&lt;/p&gt;

&lt;p&gt;There are a couple of switches available to alter the output
slightly and adjust it to your needs. The two most interesting
switches are &lt;tt&gt;-f&lt;/tt&gt; to enable follow mode (as in &lt;tt&gt;tail
-f&lt;/tt&gt;) and &lt;tt&gt;-n&lt;/tt&gt; to change the number of lines to show (you
guessed it, as in &lt;tt&gt;tail -n&lt;/tt&gt;).&lt;/p&gt;
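
&lt;p&gt;For instance, reusing the Avahi unit from above with an
arbitrarily chosen line count, asking for the last 20 log lines instead
of 10 would look roughly like this:&lt;/p&gt;

&lt;pre&gt;$ systemctl status -n 20 avahi-daemon.service&lt;/pre&gt;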

&lt;p&gt;The log data shown comes from three sources: everything any of the
daemon's processes logged with libc's &lt;tt&gt;syslog()&lt;/tt&gt; call,
everything submitted using the native Journal API, plus everything any
of the daemon's processes logged to STDOUT or STDERR. In short:
everything the daemon generates as log data is collected, properly
interleaved and shown in the same format.&lt;/p&gt;

&lt;p&gt;And that's it already for today. It's a very simple feature, but an
immensely useful one for every administrator. One of the "Why didn't
we already do this 15 years ago?" kind.&lt;/p&gt;

&lt;p&gt;Stay tuned for the next installment!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 18 May 2012 12:37:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-05-18:/blog/projects/systemctl-journal.html</guid><category>projects</category></item><item><title>Boot &amp; Base OS Miniconf at Linux Plumbers Conference 2012, San Diego</title><link>https://0pointer.net/blog/projects/lpc2012.html</link><description>
                
&lt;p style="text-align: center"&gt;&lt;a href="http://www.linuxplumbersconf.org/2012/"&gt;&lt;img src="http://www.linuxplumbersconf.org/2012/style/tagline.png" width="493" height="90" alt="Linux Plumbers Conference Logo" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We are working on putting together &lt;a href="http://wiki.linuxplumbersconf.org/2012:boot_and_base_os"&gt;a miniconf on
the topic of Boot &amp;amp; Base OS&lt;/a&gt; for the Linux Plumbers Conference 2012 in San
Diego (Aug 29-31). And we need your submission!&lt;/p&gt;

&lt;p&gt;Are you working on some exciting project related to Boot and Base OS and
would like to present your work? Then please submit something &lt;a href="http://www.linuxplumbersconf.org/2012/2012-lpc-call-for-proposals-take-2/"&gt;following
these guidelines&lt;/a&gt;, but please CC Kay Sievers and Lennart Poettering.&lt;/p&gt;

&lt;p&gt;I hope that at this point the Linux Plumbers Conference
needs little introduction, so I will spare any further prose on how great and
useful and the best conference ever it is for everybody who works on the plumbing
layer of Linux. However, there's one conference that will be co-located with
LPC that is still little known, because it happens for the first time: &lt;a href="http://www.cconf.org/"&gt;The C Conference&lt;/a&gt;, organized by Brandon Philips
and friends. It covers all things C, and they are still looking for more
topics, in a &lt;a href="http://www.cconf.org/pfc/"&gt;reverse CFP&lt;/a&gt;. Please
consider submitting a proposal and registering to the conference!&lt;/p&gt;

&lt;p style="text-align: center"&gt;&lt;a href="http://www.cconf.org/"&gt;&lt;img src="http://www.cconf.org/assets/cconf.png" width="270" height="270" alt="C
Conference Logo" /&gt;&lt;/a&gt;&lt;/p&gt;


        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 03 May 2012 20:42:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-05-03:/blog/projects/lpc2012.html</guid><category>projects</category></item><item><title>The Most Awesome, Least-Advertised Fedora 17 Feature</title><link>https://0pointer.net/blog/projects/multi-seat.html</link><description>
                
&lt;p&gt;There's one feature in the upcoming Fedora 17 release that is
immensely useful but very little known, since its &lt;a href="https://fedoraproject.org/wiki/Features/ckremoval"&gt;feature page
'ckremoval'&lt;/a&gt; does not explicitly refer to it in its name: true
&lt;i&gt;automatic multi-seat&lt;/i&gt; support for Linux.&lt;/p&gt;

&lt;p&gt;A multi-seat computer is a system that offers not only one local
seat for a user, but multiple, at the same time. A seat refers to a
combination of a screen, a set of input devices (such as mice and
keyboards), and maybe an audio card or webcam, as an individual local
workplace for a user. A multi-seat computer can drive an entire class
room of seats with only a fraction of the cost in hardware, energy,
administration and space: you only have one PC, which usually has more
than enough CPU power to drive 10 or more workplaces. (In fact, even a
Netbook is fast enough to drive a couple of seats!) &lt;i&gt;Automatic
multi-seat&lt;/i&gt; refers to an entirely automatically managed seat setup:
whenever a new seat is plugged in a new login screen immediately
appears -- without any manual configuration --, and when the seat is
unplugged all user sessions on it are removed without delay.&lt;/p&gt;

&lt;p&gt;In Fedora 17 we added this functionality to the low-level user and
device tracking of systemd, replacing the previous ConsoleKit logic
that lacked support for automatic multi-seat. With all the ground work
done in systemd, udev and the other components of our plumbing layer
the last remaining bits were surprisingly easy to add.&lt;/p&gt;

&lt;p&gt;Currently, the automatic multi-seat logic works best with the USB
multi-seat hardware from &lt;a href="http://www.amazon.com/Plugable-Universal-DisplayLink-1920x1080-High-Speed/dp/B002PONXAI/ref=sr_1_3?ie=UTF8&amp;amp;qid=1335904746&amp;amp;sr=8-3"&gt;Plugable&lt;/a&gt;
you can buy cheaply on &lt;a href="http://www.amazon.com/Plugable-DC-125-Docking-Station-Multiseat/dp/B004PXPPNA/ref=sr_1_10?ie=UTF8&amp;amp;qid=1335904746&amp;amp;sr=8-10"&gt;Amazon
(US)&lt;/a&gt;. These devices require exactly zero configuration with the
new scheme implemented in Fedora 17: just plug them in at any time,
login screens pop up on them, and you have your additional
seats. Alternatively you can also assemble your seat manually with a
few easy &lt;a href="http://www.freedesktop.org/software/systemd/man/loginctl.html"&gt;loginctl
attach&lt;/a&gt; commands, from any kind of hardware you might have lying
around. To get a full seat you need multiple graphics cards, keyboards
and mice: one set for each seat. (Later on we'll probably have a graphical
setup utility for additional seats, but that's not a pressing issue we
believe, as the plug-n-play multi-seat support with the Plugable
devices is so awesomely nice.)&lt;/p&gt;
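
&lt;p&gt;To sketch the manual route: the seat name &lt;tt&gt;seat1&lt;/tt&gt; and the
sysfs device paths below are hypothetical placeholders, so substitute
the devices you actually want to assign:&lt;/p&gt;

&lt;pre&gt;$ loginctl list-seats
$ loginctl seat-status seat0
$ loginctl attach seat1 /sys/devices/pci0000:00/0000:00:02.0/drm/card1
$ loginctl attach seat1 /sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2
&lt;/pre&gt;

&lt;p&gt;&lt;tt&gt;loginctl seat-status&lt;/tt&gt; lists the devices currently
assigned to a seat, which should help in finding the right paths to pass
to &lt;tt&gt;loginctl attach&lt;/tt&gt;.&lt;/p&gt;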

&lt;p&gt;Plugable provided us with free hardware for testing
multi-seat. They are also involved with the upstream development of
the USB DisplayLink driver for Linux. Due to their positive
involvement with Linux we can only recommend buying their
hardware. They are good guys, and support Free Software the way all
hardware vendors should! (And besides that, their hardware is also
nicely put together. For example, in contrast to most similar vendors
they actually assign proper vendor/product IDs to their USB hardware
so that we can easily recognize their hardware when plugged in to set
up automatic seats.)&lt;/p&gt;

&lt;p&gt;Currently, all this magic is only implemented in the GNOME stack
with the biggest component getting updated being the GNOME Display
Manager. On the Plugable USB hardware you get a full GNOME Shell
session with all the usual graphical gimmicks, the same way as on any
other hardware. (Yes, GNOME 3 works perfectly fine on simpler graphics
cards such as these USB devices!) If you are hacking on a different
desktop environment, or on a different display manager, please have a
look at &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/multiseat"&gt;the
multi-seat documentation&lt;/a&gt; we put together, and particularly at
our short piece about &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/writing-display-managers"&gt;writing
display managers&lt;/a&gt; which are multi-seat capable.&lt;/p&gt;

&lt;p&gt;If you work on a major desktop environment or display manager and
would like to implement multi-seat support for it, but lack the
aforementioned Plugable hardware, we might be able to provide you with
the hardware for free. Please contact us directly, and we might be
able to send you a device. Note that we don't have unlimited devices
available, hence we'll probably not be able to pass hardware to
everybody who asks, and we will pass the hardware preferably to people
who work on well-known software or otherwise have contributed good
code to the community already. Anyway, if in doubt, ping us, and
explain to us why you should get the hardware, and we'll consider you!
(Oh, and this applies not only to display managers: if you hack on some other
software where multi-seat awareness would be truly useful, then don't
hesitate and ping us!)&lt;/p&gt;

&lt;p&gt;Phoronix has &lt;a href="http://www.phoronix.com/scan.php?page=article&amp;amp;item=plugable_multiseat_kick"&gt;this
story about this new multi-seat&lt;/a&gt; support which is quite interesting and
full of pictures. Please have a look.&lt;/p&gt;

&lt;p&gt;Plugable started a &lt;a href="http://www.kickstarter.com/projects/1666707630/plugable-thin-client-the-50-computer"&gt;Pledge
drive&lt;/a&gt; to lower the price of the Plugable USB multi-seat terminals
further. It's full of pictures (&lt;a href="http://www.kickstarter.com/projects/1666707630/plugable-thin-client-the-50-computer/widget/video.html"&gt;&lt;b&gt;and a video showing all this in action!&lt;/b&gt;&lt;/a&gt;), and uses the code we now make
available in Fedora 17 as base. Please consider pledging a few
bucks.&lt;/p&gt;

&lt;p&gt;Recently David Zeuthen &lt;a href="https://plus.google.com/110773474140772402317/posts/NqPUifsFUYH"&gt;added
multi-seat support to udisks&lt;/a&gt; as well. With this in place, a user
logged in on a specific seat can only see the USB storage plugged into
his individual seat, but does not see any USB storage plugged into any
other local seat. This closes the last gap in
multi-seat support in our desktop stack.&lt;/p&gt;

&lt;p&gt;With this code in Fedora 17 we cover the big use cases of
multi-seat already: internet cafes, class rooms and similar
installations can provide PC workplaces cheaply and easily without any
manual configuration. Later on we want to build on this and make this
useful for different uses too: for example, the ability to get a login
screen as easily as plugging in a USB connector makes this useful not
only for saving money in setups for many people, but also in embedded
environments (consider monitoring/debugging screens made available via
this hotplug logic) or servers (get trivially quick local access to
your otherwise head-less server). To be truly useful in these areas we
need one more thing though: the ability to run a simple getty
(i.e. text login) on the seat, without necessarily involving a
graphical UI.&lt;/p&gt;

&lt;p&gt;The well-known X successor Wayland already comes out of the box with multi-seat
support based on this logic.&lt;/p&gt;

&lt;p&gt;Oh, and BTW, as Ubuntu appears to be "&lt;i&gt;focussing&lt;/i&gt;" on "&lt;i&gt;clarity&lt;/i&gt;" in the
"&lt;i&gt;cloud&lt;/i&gt;" now ;-), and chose Upstart instead of systemd, this feature
won't be available in Ubuntu any time soon. That's (one detail of) the
price Ubuntu has to pay for choosing to maintain its own (largely
legacy, such as ConsoleKit) plumbing stack.&lt;/p&gt;

&lt;p&gt;Multi-seat has a long history on Unix. Since the earliest days Unix
systems could be accessed by multiple local terminals at the same
time. Since then local terminal support (and hence multi-seat)
gradually moved out of view in computing. Very few machines these
days have more than one seat; the concept of terminals survived almost
exclusively in the context of PTYs (i.e. fully virtualized API
objects, disconnected from any real hardware seat) and VCs (i.e. a
single virtualized local seat), but almost not in any other way (well,
server setups still use serial terminals for emergency remote access,
but they almost never have more than one serial terminal). All that we
do in systemd is based on the ideas originally brought forward in
Unix; with systemd we now try to bring back a number of the good ideas
of Unix that since the old times were lost on the roadside. For
example, in true Unix style we already started to expose the concept
of a service in the file system (in
&lt;tt&gt;/sys/fs/cgroup/systemd/system/&lt;/tt&gt;), something where on Linux the
(often misunderstood) "&lt;i&gt;everything is a file&lt;/i&gt;" mantra previously
fell short. With automatic multi-seat support we bring back support
for terminals, but updated with all the features of today's desktops:
plug and play, zero configuration, full graphics, and not limited to
input devices and screens, but extending to all kinds of devices, such
as audio, webcams or USB memory sticks.&lt;/p&gt;

&lt;p&gt;Anyway, this is all for now; I'd like to thank everybody who was
involved with making multi-seat work so nicely and natively on the
Linux platform. You know who you are! Thanks a ton!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 01 May 2012 23:07:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-05-01:/blog/projects/multi-seat.html</guid><category>projects</category></item><item><title>systemd Status Update</title><link>https://0pointer.net/blog/projects/systemd-update-3.html</link><description>
                
&lt;p&gt;&lt;a href="http://0pointer.de/blog/projects/systemd-update-2.html"&gt;It
has been way too long since my last status update on
systemd&lt;/a&gt;. Here's another short, incomprehensive status update on
what we worked on for &lt;a href="http://freedesktop.org/wiki/Software/systemd"&gt;systemd&lt;/a&gt; since
then.&lt;/p&gt;

&lt;p&gt;We have been working hard to turn systemd into the most viable set
of components to build operating systems, appliances and devices from,
and make it the best choice for servers, for desktops and for embedded
environments alike. I think we have a really convincing set of
features now, but we are actively working on making it even
better.&lt;/p&gt;

&lt;p&gt;Here's a list of some more and some less interesting features, in
no particular order:&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;We added an automatic pager to &lt;tt&gt;systemctl&lt;/tt&gt; (and related tools), similar
to how &lt;tt&gt;git&lt;/tt&gt; has it.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;systemctl&lt;/tt&gt; learnt a new switch &lt;tt&gt;--failed&lt;/tt&gt;, to show only
failed services.&lt;/li&gt;

&lt;li&gt;You may now start services immediately, overriding all dependency
logic by passing &lt;tt&gt;--ignore-dependencies&lt;/tt&gt; to
&lt;tt&gt;systemctl&lt;/tt&gt;. This is mostly a debugging tool and nothing people
should use in real life.&lt;/li&gt;

&lt;li&gt;Sending &lt;tt&gt;SIGKILL&lt;/tt&gt; as final part of the implicit shutdown
logic of services is now optional and may be configured with the
&lt;tt&gt;SendSIGKILL=&lt;/tt&gt; option individually for each service.&lt;/li&gt;

&lt;li&gt;We split off the Vala/Gtk tools into their own project &lt;tt&gt;systemd-ui&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;systemd-tmpfiles&lt;/tt&gt; learnt file globbing and creating FIFO
special files as well as character and block device nodes, and
symlinks. It also is capable of relabelling certain directories at
boot now (in the SELinux sense).&lt;/li&gt;

&lt;li&gt;Immediately before shutting down we will now invoke all binaries
found in &lt;tt&gt;/lib/systemd/system-shutdown/&lt;/tt&gt;, which is useful for
debugging late shutdown.&lt;/li&gt;

&lt;li&gt;You may now globally control where STDOUT/STDERR of services goes
(unless individual service configuration overrides it).&lt;/li&gt;

&lt;li&gt;There's a new &lt;tt&gt;ConditionVirtualization=&lt;/tt&gt; option, that makes
systemd skip a specific service if a certain virtualization technology
is found or not found. Similarly, we now have a new option to detect
whether a certain security technology (such as SELinux) is available,
called &lt;tt&gt;ConditionSecurity=&lt;/tt&gt;. There's also
&lt;tt&gt;ConditionCapability=&lt;/tt&gt; to check whether a certain process
capability is in the capability bounding set of the system. There are
also the new &lt;tt&gt;ConditionFileIsExecutable=&lt;/tt&gt;,
&lt;tt&gt;ConditionPathIsMountPoint=&lt;/tt&gt;,
&lt;tt&gt;ConditionPathIsReadWrite=&lt;/tt&gt; and
&lt;tt&gt;ConditionPathIsSymbolicLink=&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;The file system condition directives now support globbing.&lt;/li&gt;

&lt;li&gt;Service conditions may now be "triggering" and "mandatory", meaning that
they can be a necessary requirement to hold for a service to start, or
simply one trigger among many.&lt;/li&gt;

&lt;li&gt;At boot time we now print warnings if: &lt;a href="http://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken"&gt;&lt;tt&gt;/usr&lt;/tt&gt;
is on a split-off partition but not already mounted by an initrd&lt;/a&gt;;
if &lt;tt&gt;/etc/mtab&lt;/tt&gt; is not a symlink to &lt;tt&gt;/proc/mounts&lt;/tt&gt;; &lt;a href="http://0pointer.de/blog/projects/cgroups-vs-cgroups.html"&gt;CONFIG_CGROUPS
is not enabled in the kernel&lt;/a&gt;. We'll also expose this as
&lt;i&gt;tainted&lt;/i&gt; flag on the bus.&lt;/li&gt;

&lt;li&gt;You may now boot the same OS image on a bare metal machine and in
Linux namespace containers and will get a clean boot in both
cases. This is more complicated than it sounds since device management
with udev or write access to &lt;tt&gt;/sys&lt;/tt&gt;, &lt;tt&gt;/proc/sys&lt;/tt&gt; or
things like &lt;tt&gt;/dev/kmsg&lt;/tt&gt; is not available in a container. This
makes systemd a first-class choice for managing thin container
setups. This is all tested with systemd's own &lt;tt&gt;systemd-nspawn&lt;/tt&gt;
tool but should work fine in LXC setups, too. Basically this means
that you do not have to adjust your OS manually to make it work in a
container environment, but will just work out of the box. It also
makes it easier to convert real systems into containers.&lt;/li&gt;

&lt;li&gt;We now automatically spawn gettys on HVC ttys when booting in VMs.&lt;/li&gt;

&lt;li&gt;We introduced &lt;tt&gt;/etc/machine-id&lt;/tt&gt; as a generalization of
D-Bus machine ID logic. See &lt;a href="http://0pointer.de/blog/projects/the-new-configuration-files.html"&gt;this
blog story for more information&lt;/a&gt;. On stateless/read-only systems
the machine ID is initialized randomly at boot. In virtualized
environments it may be passed in from the machine manager (with qemu's
&lt;tt&gt;-uuid&lt;/tt&gt; switch, or via the &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface"&gt;container
interface&lt;/a&gt;).&lt;/li&gt;

&lt;li&gt;All of the systemd-specific &lt;tt&gt;/etc/fstab&lt;/tt&gt; mount options are
now in the &lt;tt&gt;x-systemd-&lt;i&gt;xyz&lt;/i&gt;&lt;/tt&gt; format.&lt;/li&gt;

&lt;li&gt;To make it easy to find non-converted services we will now
implicitly prefix all LSB and SysV init script descriptions with the
strings "&lt;tt&gt;LSB:&lt;/tt&gt;" resp. "&lt;tt&gt;SYSV:&lt;/tt&gt;".&lt;/li&gt;

&lt;li&gt;We introduced &lt;tt&gt;/run&lt;/tt&gt; and made it a hard dependency of
systemd. This directory is now widely accepted and implemented on all
relevant Linux distributions.&lt;/li&gt;

&lt;li&gt;systemctl can now execute all its operations remotely too (&lt;tt&gt;-H&lt;/tt&gt; switch).&lt;/li&gt;

&lt;li&gt;We now ship &lt;a href="http://0pointer.de/blog/projects/changing-roots.html"&gt;systemd-nspawn&lt;/a&gt;,
a really powerful tool that can be used to start containers for
debugging, building and testing, much like chroot(1). It is useful to
just get a shell inside a build tree, but is good enough to boot up a
full system in it, too.&lt;/li&gt;

&lt;li&gt;If we query the user for a hard disk password at boot he may hit
TAB to hide the asterisks we normally show for each key that is
entered, for extra paranoia.&lt;/li&gt;

&lt;li&gt;We don't enable &lt;tt&gt;udev-settle.service&lt;/tt&gt; anymore, which is
only required for certain legacy software that still hasn't been
updated to follow devices coming and going cleanly.&lt;/li&gt;

&lt;li&gt;We now include a tool that can plot boot speed graphs, similar to
bootchartd, called &lt;a href="http://0pointer.de/blog/projects/blame-game.html"&gt;&lt;tt&gt;systemd-analyze&lt;/tt&gt;&lt;/a&gt;.&lt;/li&gt;

&lt;li&gt;At boot, we now initialize the kernel's &lt;tt&gt;binfmt_misc&lt;/tt&gt; logic with the data from &lt;tt&gt;/etc/binfmt.d&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;systemctl&lt;/tt&gt; now recognizes if it is run in a &lt;tt&gt;chroot()&lt;/tt&gt;
environment and will work accordingly (i.e. apply changes to the tree
it is run in, instead of talking to the actual PID 1 for this). It also has a new &lt;tt&gt;--root=&lt;/tt&gt; switch to work on an OS tree from outside of it.&lt;/li&gt;

&lt;li&gt;There's a new unit dependency type &lt;tt&gt;OnFailureIsolate=&lt;/tt&gt; that
allows entering a different target whenever a certain unit fails. For
example, this is interesting to enter emergency mode if file system
checks of crucial file systems failed.&lt;/li&gt;

&lt;li&gt;Socket units may now listen on Netlink sockets, special files
from &lt;tt&gt;/proc&lt;/tt&gt; and POSIX message queues, too.&lt;/li&gt;

&lt;li&gt;There's a new &lt;tt&gt;IgnoreOnIsolate=&lt;/tt&gt; flag which may be used to
ensure certain units are left untouched by isolation requests. There's
a new &lt;tt&gt;IgnoreOnSnapshot=&lt;/tt&gt; flag which may be used to exclude
certain units from snapshot units when they are created.&lt;/li&gt;

&lt;li&gt;There are now small mechanism services &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/hostnamed"&gt;for
changing the local hostname and other host meta data&lt;/a&gt;, &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/localed"&gt;changing
the system locale and console settings&lt;/a&gt; and the &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/timedated"&gt;system
clock&lt;/a&gt;.&lt;/li&gt;

&lt;li&gt;We now limit the capability bounding set for a number of our
internal services by default.&lt;/li&gt;

&lt;li&gt;Plymouth may now be disabled globally with
&lt;tt&gt;plymouth.enable=0&lt;/tt&gt; on the kernel command line.&lt;/li&gt;

&lt;li&gt;We now disallocate VTs when a getty finished running (and
optionally other tools run on VTs). This adds extra security since it
clears up the scrollback buffer so that subsequent users cannot get
access to a user's session output.&lt;/li&gt;

&lt;li&gt;In socket units there are now options to control the
&lt;tt&gt;IP_TRANSPARENT&lt;/tt&gt;, &lt;tt&gt;SO_BROADCAST&lt;/tt&gt;, &lt;tt&gt;SO_PASSCRED&lt;/tt&gt;,
&lt;tt&gt;SO_PASSSEC&lt;/tt&gt; socket options.&lt;/li&gt;

&lt;li&gt;The receive and send buffers of socket units may now be set larger
than the default system settings if needed by using
SO_{RCV,SND}BUFFORCE.&lt;/li&gt;

&lt;li&gt;We now set the hardware timezone as one of the first things in PID
1, in order to avoid time jumps during normal userspace operation, and
to guarantee sensible times on all generated logs. We also no longer
save the system clock to the RTC on shutdown, assuming that this is
done by the clock control tool when the user modifies the time, or
automatically by the kernel if NTP is enabled.&lt;/li&gt;

&lt;li&gt;The SELinux directory got moved from &lt;tt&gt;/selinux&lt;/tt&gt; to
&lt;tt&gt;/sys/fs/selinux&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;We added a small service &lt;tt&gt;systemd-logind&lt;/tt&gt; that keeps track
of logged in users and their sessions. It creates control groups for
them, implements the &lt;a href="http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html"&gt;XDG_RUNTIME_DIR
specification&lt;/a&gt; for them, maintains seats and device node ACLs and
implements shutdown/idle inhibiting for clients. It auto-spawns gettys
on all local VTs when the user switches to them (instead of starting
six of them unconditionally), thus reducing the resource footprint by
default. It has a D-Bus interface as well as &lt;a href="http://www.freedesktop.org/software/systemd/man/sd-login.html"&gt;a
simple synchronous library interface&lt;/a&gt;. This mechanism obsoletes
ConsoleKit which is now deprecated and should no longer be used.&lt;/li&gt;

&lt;li&gt;There's now full, automatic multi-seat support, and this is
enabled in GNOME 3.4. Just by plugging in new seat hardware you get a
new login screen on your seat's screen.&lt;/li&gt;

&lt;li&gt;There is now an option &lt;tt&gt;ControlGroupModify=&lt;/tt&gt; to allow
services to change the properties of their control groups dynamically,
and one to make control groups persistent in the tree
(&lt;tt&gt;ControlGroupPersistent=&lt;/tt&gt;) so that they can be created and
maintained by external tools.&lt;/li&gt;

&lt;li&gt;We now jump back into the &lt;tt&gt;initrd&lt;/tt&gt; in shutdown, so that it can
detach the root file system and the storage devices backing it. This
makes it possible (for the first time!) to reliably undo complex storage setups
on shutdown and leave them in a clean state.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;systemctl&lt;/tt&gt; now supports &lt;i&gt;presets&lt;/i&gt;, a way for distributions and
administrators to define their own policies on whether services should
be enabled or disabled by default on package installation.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;systemctl&lt;/tt&gt; now has high-level verbs for masking/unmasking
units. There's also a new command (&lt;tt&gt;systemctl list-unit-files&lt;/tt&gt;)
for determining the list of all installed unit files and whether
they are enabled or not.&lt;/li&gt;

&lt;li&gt;We now apply &lt;tt&gt;sysctl&lt;/tt&gt; variables to each new network device, as it
appears. This makes &lt;tt&gt;/etc/sysctl.d&lt;/tt&gt; compatible with hot-plug
network devices.&lt;/li&gt;

&lt;li&gt;There's limited profiling for SELinux start-up performance built
into PID 1.&lt;/li&gt;

&lt;li&gt;There's a new switch &lt;a href="http://0pointer.de/blog/projects/security.html"&gt;&lt;tt&gt;PrivateNetwork=&lt;/tt&gt;&lt;/a&gt;
to turn off any network access for a specific service.&lt;/li&gt;

&lt;li&gt;Service units may now include configuration for control group
parameters. A few (such as &lt;tt&gt;MemoryLimit=&lt;/tt&gt;) are exposed with
high-level options, and all others are available via the generic
&lt;tt&gt;ControlGroupAttribute=&lt;/tt&gt; setting.&lt;/li&gt;

&lt;li&gt;There's now the option to mount certain cgroup controllers
jointly at boot. We do this now for &lt;tt&gt;cpu&lt;/tt&gt; and
&lt;tt&gt;cpuacct&lt;/tt&gt; by default.&lt;/li&gt;

&lt;li&gt;We added &lt;a href="https://docs.google.com/document/pub?id=1IC9yOXj7j6cdLLxWEBAGRL6wl97tFxgjLUEHIX3MSTs"&gt;the
journal&lt;/a&gt; and turned it on by default.&lt;/li&gt;

&lt;li&gt;All service output is now written to the Journal by default,
regardless of whether it is sent via syslog or simply written to
stdout/stderr. Both message streams end up in the same location and
are interleaved the way they should be. All log messages, even from the
kernel and from early boot, end up in the journal. Now, no service
output goes unnoticed and all of it is saved and indexed at the same
location.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;systemctl status&lt;/tt&gt; will now show the last 10 log lines for
each service, directly from the journal.&lt;/li&gt;

&lt;li&gt;We now show the progress of &lt;tt&gt;fsck&lt;/tt&gt; at boot on the console,
again. We also show the much loved colorful &lt;tt&gt;[ OK ]&lt;/tt&gt; status
messages at boot again, as known from most SysV implementations.&lt;/li&gt;

&lt;li&gt;We merged udev into systemd.&lt;/li&gt;

&lt;li&gt;We implemented and documented interfaces to &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface"&gt;container
managers&lt;/a&gt; and &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/InitrdInterface"&gt;initrds&lt;/a&gt;
for passing execution data to systemd. We also implemented and
documented &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons"&gt;an
interface for storage daemons that are required to back the root file
system&lt;/a&gt;.&lt;/li&gt;

&lt;li&gt;There are two new options in service files to propagate reload requests between several units.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;systemd-cgls&lt;/tt&gt; won't show kernel threads by default anymore, or show empty control groups.&lt;/li&gt;

&lt;li&gt;We added a new tool &lt;tt&gt;systemd-cgtop&lt;/tt&gt; that shows resource
usage of whole services in a top(1) like fashion.&lt;/li&gt;

&lt;li&gt;systemd may now supervise services in watchdog style. If enabled
for a service the daemon has to ping PID 1 at regular intervals
or is otherwise considered failed (which might then result in
restarting it, or even rebooting the machine, as configured). Also,
PID 1 is capable of pinging a hardware watchdog. Putting this
together, the hardware watchdogs PID 1 and PID 1 then watchdogs
specific services. This is highly useful for high-availability servers
as well as embedded machines. Since watchdog hardware is nowadays
built into all modern chipsets (including desktop chipsets), this
should hopefully help to make this a more widely used
functionality.&lt;/li&gt;

&lt;li&gt;We added support for a new kernel command line option
&lt;tt&gt;systemd.setenv=&lt;/tt&gt; to set an environment variable
system-wide.&lt;/li&gt;

&lt;li&gt;By default services which are started by systemd will have SIGPIPE
set to ignored. The Unix SIGPIPE logic is used to reliably implement
shell pipelines and when left enabled in services is usually just a
source of bugs and problems.&lt;/li&gt;

&lt;li&gt;You may now configure the rate limiting that is applied to
restarts of specific services. Previously the rate limiting parameters
were hard-coded (similar to SysV).&lt;/li&gt;

&lt;li&gt;There's now support for loading the IMA integrity policy into the
kernel early in PID 1, similar to how we already did it with the
SELinux policy.&lt;/li&gt;

&lt;li&gt;There's now an official API to schedule and query scheduled shutdowns.&lt;/li&gt;

&lt;li&gt;We changed the license from GPL2+ to LGPL2.1+.&lt;/li&gt;

&lt;li&gt;We made &lt;a href="http://www.freedesktop.org/software/systemd/man/systemd-detect-virt.html"&gt;&lt;tt&gt;systemd-detect-virt&lt;/tt&gt;&lt;/a&gt;
an official tool in the tool set. Since we already had code to detect
certain VM and container environments we now added an official tool
for administrators to make use of in shell scripts and suchlike.&lt;/li&gt;

&lt;li&gt;We documented &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/InterfacePortabilityAndStabilityChart"&gt;numerous
interfaces&lt;/a&gt; systemd introduced.&lt;/li&gt;

&lt;/ol&gt;
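&lt;p&gt;To make the watchdog item in the list above a bit more concrete, here is a minimal sketch of how a daemon could ping PID 1. It assumes the &lt;tt&gt;sd_notify()&lt;/tt&gt; datagram protocol (a &lt;tt&gt;WATCHDOG=1&lt;/tt&gt; message sent to the socket named in &lt;tt&gt;$NOTIFY_SOCKET&lt;/tt&gt;), which is not spelled out in this post; the helper name &lt;tt&gt;notify()&lt;/tt&gt; is purely illustrative.&lt;/p&gt;

```python
import os
import socket


def notify(state, env=None):
    """Send one state string to the service manager.

    Returns True if a datagram was sent, False if no notification
    socket was passed in the environment. The NOTIFY_SOCKET protocol
    is an assumption taken from the sd_notify documentation.
    """
    env = os.environ if env is None else env
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False  # not running under a manager that wants pings
    if path.startswith("@"):
        path = "\0" + path[1:]  # leading "@" means abstract namespace
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True


if __name__ == "__main__":
    # A real daemon would call this from its main loop, well within
    # the watchdog interval configured for the service.
    notify("WATCHDOG=1")
```

&lt;p&gt;If the daemon hangs and the pings stop, the manager notices the missing keep-alive and can then apply whatever failure handling is configured for the service.&lt;/p&gt;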

&lt;p&gt;Much of the stuff above is already available in Fedora 15 and 16,
or will be made available in the upcoming Fedora 17.&lt;/p&gt;

&lt;p&gt;And that's it for now. There's a lot of other stuff in the git commits, but
most of it is smaller and I will thus spare you the details.&lt;/p&gt;

&lt;p&gt;I'd like to thank everybody who contributed to systemd over the past years.&lt;/p&gt;

&lt;p&gt;Thanks for your interest!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Sat, 21 Apr 2012 00:17:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-04-21:/blog/projects/systemd-update-3.html</guid><category>projects</category></item><item><title>Control Groups vs. Control Groups</title><link>https://0pointer.net/blog/projects/cgroups-vs-cgroups.html</link><description>
                
&lt;p&gt;&lt;i&gt;TL;DR: &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/"&gt;systemd&lt;/a&gt; does not
require the performance-sensitive bits of Linux control groups enabled in the kernel.
However, it does require some non-performance-sensitive bits of the control
group logic.&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;In some areas of the community there's still some confusion about Linux
control groups and their performance impact, and what precisely it is that
systemd requires of them. In the hope of clearing this up a bit, I'd like to point
out a few things:&lt;/p&gt;

&lt;p&gt;Control Groups are two things: &lt;b&gt;(A)&lt;/b&gt; &lt;i&gt;a way to hierarchically group and
label processes&lt;/i&gt;, and &lt;b&gt;(B)&lt;/b&gt; &lt;i&gt;a way to then apply resource limits&lt;/i&gt;
to these groups. systemd only requires the former (A), and not the latter (B).
That means you can compile your kernel without any control group resource
controllers (B) and systemd will work perfectly on it. However, if you in
addition disable the grouping feature entirely (A) then systemd will loudly
complain at boot and proceed only reluctantly with a big warning and in a
limited functionality mode.&lt;/p&gt;

&lt;p&gt;At compile time, the grouping/labelling feature in the kernel is enabled by
CONFIG_CGROUPS=y, the individual controllers by CONFIG_CGROUP_FREEZER=y,
CONFIG_CGROUP_DEVICE=y, CONFIG_CGROUP_CPUACCT=y, CONFIG_CGROUP_MEM_RES_CTLR=y,
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y, CONFIG_CGROUP_MEM_RES_CTLR_KMEM=y,
CONFIG_CGROUP_PERF=y, CONFIG_CGROUP_SCHED=y, CONFIG_BLK_CGROUP=y,
CONFIG_NET_CLS_CGROUP=y, CONFIG_NETPRIO_CGROUP=y. And since (as mentioned) we
only need the former (A), not the latter (B) you may disable all of the latter
options while enabling CONFIG_CGROUPS=y, if you want to run systemd on your
system.&lt;/p&gt;
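&lt;p&gt;As a rough illustration of the split between (A) and (B), the following sketch inspects the text of a kernel configuration file and reports whether the grouping feature is enabled and which of the controller options listed above are turned on. The function names are made up for this example, and where the configuration text comes from (a &lt;tt&gt;.config&lt;/tt&gt; file in a build tree, for instance) is left to the caller.&lt;/p&gt;

```python
GROUPING_OPTION = "CONFIG_CGROUPS"  # feature (A)

CONTROLLER_OPTIONS = (  # features (B), as listed in the text
    "CONFIG_CGROUP_FREEZER",
    "CONFIG_CGROUP_DEVICE",
    "CONFIG_CGROUP_CPUACCT",
    "CONFIG_CGROUP_MEM_RES_CTLR",
    "CONFIG_CGROUP_MEM_RES_CTLR_SWAP",
    "CONFIG_CGROUP_MEM_RES_CTLR_KMEM",
    "CONFIG_CGROUP_PERF",
    "CONFIG_CGROUP_SCHED",
    "CONFIG_BLK_CGROUP",
    "CONFIG_NET_CLS_CGROUP",
    "CONFIG_NETPRIO_CGROUP",
)


def enabled_options(config_text):
    """Return the set of option names that are set to "y"."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # comments such as "# CONFIG_FOO is not set"
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return enabled


def check_cgroup_config(config_text):
    """Return (grouping_enabled, list_of_enabled_controller_options)."""
    enabled = enabled_options(config_text)
    controllers = [opt for opt in CONTROLLER_OPTIONS if opt in enabled]
    return GROUPING_OPTION in enabled, controllers
```

&lt;p&gt;A result of &lt;tt&gt;(True, [])&lt;/tt&gt; corresponds to the minimal setup described above: grouping (A) is available, and no controller (B) is compiled in.&lt;/p&gt;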

&lt;p&gt;What about the performance impact of these options? Well, every bit of code
comes at some price, so none of these options come entirely for free. However,
the grouping feature (A) alters the general logic very little, it just sticks
hierarchical labels on processes, and its impact is minimal since that is
usually not in any hot path of the OS.  This is different for the various
controllers (B) which have a much bigger impact since they influence the resource
management of the OS and are full of hot paths. This means that the kernel
feature that systemd mandatorily requires (A) has a minimal effect on system
performance, but the actually performance-sensitive features of control groups
(B) are entirely optional.&lt;/p&gt;

&lt;p&gt;On boot, systemd will mount all controller hierarchies it finds enabled
in the kernel to individual directories below &lt;tt&gt;/sys/fs/cgroup/&lt;/tt&gt;. This is
the official place where kernel controllers are mounted to these days. The
&lt;tt&gt;/sys/fs/cgroup/&lt;/tt&gt; mount point in the kernel was created precisely for
this purpose. Since the control group controllers are a shared facility that
might be used by a number of different subsystems &lt;a href="http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups"&gt;a few
projects have agreed on a set of rules in order to avoid that the various bits
of code step on each other's toes when using these directories&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;systemd will also maintain its own, private, controller-less, named control
group hierarchy which is mounted to &lt;tt&gt;/sys/fs/cgroup/systemd/&lt;/tt&gt;.  This
hierarchy is private property of systemd, and other software should not try to
interfere with it. This hierarchy is how systemd makes use of the naming and
grouping feature of control groups (A) without actually requiring any kernel
controller enabled for that.&lt;/p&gt;

&lt;p&gt;Now, you might notice that by default systemd does create per-service
cgroups in the "cpu" controller if it finds it enabled in the kernel. This is
entirely optional, however. We chose to make use of it by default to even out
CPU usage between system services. Example: On a traditional web server machine
Apache might end up having 100 CGI worker processes around, while MySQL only
has 5 processes running. Without the use of the "cpu" controller this means
that Apache altogether ends up having 20x more CPU available than MySQL since
the kernel tries to provide every process with the same amount of CPU time. On
the other hand, if we add these two services to the "cpu" controller in
individual groups by default, Apache and MySQL get the same amount of CPU,
which we think is a good default.&lt;/p&gt;
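&lt;p&gt;The arithmetic behind the Apache/MySQL example can be written down in a few lines. This is only a sketch under simplifying assumptions: every process is permanently CPU-bound, and all processes (or all groups) carry the same weight. The function names are illustrative.&lt;/p&gt;

```python
from fractions import Fraction


def per_process_shares(services):
    """CPU share of each service when every process gets an equal slice."""
    total = sum(services.values())
    return {name: Fraction(count, total) for name, count in services.items()}


def per_group_shares(services):
    """CPU share of each service when every service group gets an equal slice."""
    return {name: Fraction(1, len(services)) for name in services}


services = {"apache": 100, "mysql": 5}  # process counts from the example

flat = per_process_shares(services)
grouped = per_group_shares(services)

print(flat["apache"] / flat["mysql"])        # prints 20
print(grouped["apache"] / grouped["mysql"])  # prints 1
```

&lt;p&gt;Without grouping, Apache's 100 processes collect 100/105 of the CPU against MySQL's 5/105, hence the factor of 20; with one group per service both end up at one half.&lt;/p&gt;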

&lt;p&gt;Note that if the CPU controller is not enabled in the kernel systemd will not
attempt to make use of the "cpu" hierarchy as described above. Also, even if it is enabled in the kernel it
is trivial to tell systemd not to make use of it: Simply edit
&lt;tt&gt;/etc/systemd/system.conf&lt;/tt&gt; and set &lt;tt&gt;DefaultControllers=&lt;/tt&gt; to the
empty string.&lt;/p&gt;

&lt;p&gt;Let's discuss a few frequently heard complaints regarding systemd's use of control groups:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;&lt;b&gt;systemd mounts all controllers to &lt;tt&gt;/sys/fs/cgroup/&lt;/tt&gt; even though
my software requires it at &lt;tt&gt;/dev/cgroup/&lt;/tt&gt; (or some other place)!&lt;/b&gt; The
standardization of &lt;tt&gt;/sys/fs/cgroup/&lt;/tt&gt; as mount point of the hierarchies
is a relatively recent change in the kernel. Some software has not been updated
yet for it. If you cannot change the software in question you are welcome to
unmount the hierarchies from &lt;tt&gt;/sys/fs/cgroup/&lt;/tt&gt; and mount them wherever
you need them instead. However, make sure to leave
&lt;tt&gt;/sys/fs/cgroup/systemd/&lt;/tt&gt; untouched.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;systemd makes use of the "cpu" hierarchy, but it should leave its dirty
fingers from it!&lt;/b&gt; As mentioned above, just set the
&lt;tt&gt;DefaultControllers=&lt;/tt&gt; option of systemd to the empty string.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;I need my two controllers "foo" and "bar" mounted into one hierarchy,
but systemd mounts them in two!&lt;/b&gt; Use the &lt;tt&gt;JoinControllers=&lt;/tt&gt; setting
in &lt;tt&gt;/etc/systemd/system.conf&lt;/tt&gt; to mount several controllers into a single
hierarchy.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;Control groups are evil and they make everything slower!&lt;/b&gt; Well,
please read the text above and understand the difference between
"control-groups-as-in-naming-and-grouping" (A) and "cgroups-as-in-controllers"
(B).  Then, please turn off all controllers in your kernel build (B) but leave
CONFIG_CGROUPS=y (A) enabled.&lt;/li&gt;

&lt;li&gt;&lt;b&gt;I have heard &lt;i&gt;some&lt;/i&gt; kernel developers really hate control groups
and think systemd is evil because it requires them!&lt;/b&gt; Well, there are a
couple of things behind the dislike of control groups by some folks.
Primarily, this is probably because the hackers in question do not
distinguish between the naming-and-grouping bits of the control group logic (A) and the
controllers that are based on it (B). Mainly, their beef is with the latter
(which systemd does not require, which is the key point I am trying to make in
the text above), but there are other issues as well: for example, the code of
the grouping logic is not the most beautiful bit of code ever written by man
(which is thankfully likely to get better now, since the control groups
subsystem now has an active maintainer again). And then for some
developers it is important that they can compare the runtime behaviour of many
historic kernel versions in order to find bugs (git bisect).  Since systemd
requires kernels with basic control group support enabled, and this is a
relatively recent feature addition to the kernel, this makes it difficult for
them to use a newer distribution with all these old kernels
that predate cgroups. Anyway, the summary is probably that what matters to
developers is different from what matters to users and
administrators.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I hope this explanation was useful for a reader or two! Thank you for your time!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 10 Apr 2012 19:09:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-04-10:/blog/projects/cgroups-vs-cgroups.html</guid><category>projects</category></item><item><title>GUADEC 2012 CFP Ending Soon!</title><link>https://0pointer.net/blog/projects/guadec-2012-cfp.html</link><description>
                
&lt;p&gt;In case you haven't submitted your talk proposal for GUADEC 2012 in A
Coru&amp;ntilde;a, Spain yet, hurry: the deadline is on April 14th, i.e. this
Saturday! &lt;a href="http://www.guadec.org/cfp"&gt;Read the Call for
Participation!&lt;/a&gt; &lt;a href="https://www.gpul.org/indico/abstractSubmission.py?confId=0"&gt;Submit a
proposal!&lt;/a&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 10 Apr 2012 17:40:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-04-10:/blog/projects/guadec-2012-cfp.html</guid><category>projects</category></item><item><title>/tmp or not /tmp?</title><link>https://0pointer.net/blog/projects/tmp.html</link><description>
                
&lt;p&gt;A number of Linux distributions have recently switched (or started
switching) to &lt;tt&gt;/tmp&lt;/tt&gt; on tmpfs by default (ArchLinux, Debian among
others). Other distributions have plans/are discussing doing the same (Ubuntu, OpenSUSE).
Since we believe this is a good idea and it's good to keep the delta between
the distributions minimal &lt;a href="https://fedoraproject.org/wiki/Features/tmp-on-tmpfs"&gt;we are proposing
the same for Fedora 18, too&lt;/a&gt;. On Solaris a similar change was already
implemented in 1994 (and other Unixes made a similar change long ago,
too). Yet, not all of our software is written in a way that it works nicely
together with &lt;tt&gt;/tmp&lt;/tt&gt; on tmpfs.&lt;/p&gt;

&lt;p&gt;Another &lt;a href="https://fedoraproject.org/wiki/Features/ServicesPrivateTmp"&gt;Fedora
feature (for Fedora 17)&lt;/a&gt; changed the semantics of &lt;tt&gt;/tmp&lt;/tt&gt; for many
system services to make them more secure, by isolating the /tmp namespaces of the
various services. Handling of temporary files in &lt;tt&gt;/tmp&lt;/tt&gt; has been
security sensitive ever since it was introduced, because it traditionally has been
a world writable, shared namespace and unless all user code safely uses randomized file names
it is vulnerable to DoS attacks and worse.&lt;/p&gt;
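&lt;p&gt;What "safely uses randomized file names" can look like in practice is sketched below. It relies on Python's &lt;tt&gt;tempfile.mkstemp()&lt;/tt&gt;, which picks an unpredictable name, creates the file exclusively and restricts access to the owner; the helper name &lt;tt&gt;write_scratch()&lt;/tt&gt; and the &lt;tt&gt;example-&lt;/tt&gt; prefix are only illustrative.&lt;/p&gt;

```python
import os
import tempfile


def write_scratch(data):
    """Write bytes to a new, uniquely named temporary file; return its path.

    tempfile.mkstemp() looks at $TMPDIR first and falls back to the
    platform default (usually /tmp) if it is unset.
    """
    fd, path = tempfile.mkstemp(prefix="example-")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    scratch = write_scratch(b"temporary payload\n")
    print(scratch)
    os.remove(scratch)  # clean up instead of relying on aging
```

&lt;p&gt;Because the name cannot be guessed and the file is created exclusively, another user cannot occupy the name in advance, which is exactly the attack that fixed names in a shared directory invite.&lt;/p&gt;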

&lt;p&gt;In this blog story I'd like to shed some light on proper usage of
&lt;tt&gt;/tmp&lt;/tt&gt; and what your Linux application should use for what purpose. We'll not
discuss why &lt;tt&gt;/tmp&lt;/tt&gt; on tmpfs is a good idea, for that refer to the &lt;a href="https://fedoraproject.org/wiki/Features/tmp-on-tmpfs"&gt;Fedora feature
page&lt;/a&gt;. Here we'll just discuss what &lt;tt&gt;/tmp&lt;/tt&gt; should be used for and for
what it shouldn't be, as well as what should be used instead. All that in order
to make sure your application remains compatible with these new features
introduced to many newer Linux distributions.&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;/tmp&lt;/tt&gt; is (as the name suggests) an area where temporary files
applications require during operation may be placed. Of course, temporary files
differ very much in their properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They can be large, or very small&lt;/li&gt;
&lt;li&gt;They might be used for sharing between users, or be private to users&lt;/li&gt;
&lt;li&gt;They might need to be persistent across boots, or very volatile&lt;/li&gt;
&lt;li&gt;They might need to be machine-local or shared on the network&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditionally, &lt;tt&gt;/tmp&lt;/tt&gt; has not only been the place where actual
temporary files are stored, but some software used to place (and often still
continues to place) communication primitives such as sockets, FIFOs, shared
memory there as well. Notably X11, but many others too. Usage of world-writable
shared namespaces for communication purposes has always been problematic, since
to establish communication you need stable names, but stable names open the
doors for DoS attacks. This can be mitigated by establishing
protected per-app directories for certain services during early boot (like we
do for X11), but that fixes the problem only partially, since it only works
correctly if every package installation is followed by a reboot.&lt;/p&gt;

&lt;p&gt;Besides &lt;tt&gt;/tmp&lt;/tt&gt; there are various other places where temporary files
(or other files that traditionally have been stored in &lt;tt&gt;/tmp&lt;/tt&gt;) can be
stored. Here's a quick overview of the candidates:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;&lt;tt&gt;/tmp&lt;/tt&gt;, POSIX suggests this is flushed at boot, FHS says that files
do not need to be persistent between two runs of the application. Old files are
often cleaned up automatically after a time ("aging"). Usually it is
recommended to use $TMPDIR if it is set before falling back to &lt;tt&gt;/tmp&lt;/tt&gt;
directly. As mentioned, this is a tmpfs on many Linuxes/Unixes (and most likely
will be for most soon), and hence should be used only for small files. It's
generally a shared namespace, hence the only APIs for using it should be &lt;a href="http://linux.die.net/man/3/mkstemp"&gt;&lt;tt&gt;mkstemp()&lt;/tt&gt;&lt;/a&gt;, &lt;a href="http://linux.die.net/man/3/mkdtemp"&gt;&lt;tt&gt;mkdtemp()&lt;/tt&gt;&lt;/a&gt; (and friends)
to be entirely safe.&lt;sup&gt;[1]&lt;/sup&gt; Recently, improvements have been made to
turn this shared namespace into a private namespace (see above), but that doesn't
relieve developers from writing secure code that is also safe if &lt;tt&gt;/tmp&lt;/tt&gt; is a shared
namespace. Because &lt;tt&gt;/tmp&lt;/tt&gt; is no longer necessarily a shared namespace it
is generally unsuitable as a location for communication primitives. It is
machine-private and local. It's usually fully featured (locking, ...). This
directory is world writable and thus available for both privileged and
unprivileged code.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;/var/tmp&lt;/tt&gt;, according to FHS "more persistent" than &lt;tt&gt;/tmp&lt;/tt&gt;,
and less often cleaned up (it's persistent across reboots, for example). It's not on a tmpfs, but on a real disk, and
hence can be used to store much larger files. The same namespace problems apply
as with &lt;tt&gt;/tmp&lt;/tt&gt;, hence exclusively use
&lt;tt&gt;mkstemp()&lt;/tt&gt;/&lt;tt&gt;mkdtemp()&lt;/tt&gt; for this directory, too. It is also
automatically cleaned up by age. It is machine-private. It's not necessarily
fully featured (no locking, ...). This directory is world writable and thus
available for both privileged and unprivileged code. We suggest also checking
&lt;tt&gt;$TMPDIR&lt;/tt&gt; before falling back to &lt;tt&gt;/var/tmp&lt;/tt&gt;. That way if
&lt;tt&gt;$TMPDIR&lt;/tt&gt; is set this overrides usage of both &lt;tt&gt;/tmp&lt;/tt&gt; and
&lt;tt&gt;/var/tmp&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;/run&lt;/tt&gt; (traditionally &lt;tt&gt;/var/run&lt;/tt&gt;) where privileged daemons
can store runtime data, such as communication primitives. This is where your
daemon should place its sockets. It's guaranteed to be a shared namespace, but
is only writable by privileged code and hence very safe to use. This file
system is guaranteed to be a tmpfs and is hence automatically flushed at boot.
No automatic clean-up is done beyond that. It is machine-private and local. It
is fully-featured, and provides all functionality the local OS can provide
(locking, sockets, ...).&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;&lt;a href="http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html"&gt;$XDG_RUNTIME_DIR&lt;/a&gt;&lt;/tt&gt;
where unprivileged user software can store runtime data, such as communication
primitives. This is similar to &lt;tt&gt;/run&lt;/tt&gt; but for user applications. It's a
user private namespace, and hence very safe to use. It's cleaned up
automatically at logout and also is cleaned up by time via "aging". It is
machine-private and fully featured. In GLib applications use
&lt;tt&gt;g_get_user_runtime_dir()&lt;/tt&gt; to query the path of this directory.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;&lt;a href="http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html"&gt;$XDG_CACHE_HOME&lt;/a&gt;&lt;/tt&gt;
where unprivileged user software can store non-essential data. It's a private
namespace of the user. It might be shared between machines. It is not
automatically cleaned up, and not fully featured (no locking, and so on, due to
NFS). In GLib applications use &lt;tt&gt;g_get_user_cache_dir()&lt;/tt&gt; to query this
directory.&lt;/li&gt;

&lt;li&gt;&lt;tt&gt;&lt;a href="http://freedesktop.org/wiki/Software/xdg-user-dirs"&gt;$XDG_DOWNLOAD_DIR&lt;/a&gt;&lt;/tt&gt;
where unprivileged user software can store downloads and downloads in progress.
It should only be used for downloads, and is a private namespace of the user,
but might be shared between machines. It is not automatically cleaned up and
not fully featured. In GLib applications use &lt;tt&gt;g_get_user_special_dir()&lt;/tt&gt;
to query the path of this directory.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Now that we have introduced the contestants, here's a rough guide to how we
suggest you (a Linux application developer) pick the right directory to use:&lt;/p&gt;

&lt;ol&gt;

&lt;li&gt;You need a place to put your socket (or other communication primitive) and your code runs privileged: use a subdirectory beneath &lt;tt&gt;/run&lt;/tt&gt;. (Or beneath &lt;tt&gt;/var/run&lt;/tt&gt; for extra compatibility.)&lt;/li&gt;

&lt;li&gt;You need a place to put your socket (or other communication primitive) and your code runs unprivileged: use a subdirectory beneath &lt;tt&gt;$XDG_RUNTIME_DIR&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;You need a place to put your larger downloads and downloads in progress and run unprivileged: use &lt;tt&gt;$XDG_DOWNLOAD_DIR&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;You need a place to put cache files which should be persistent and run unprivileged: use &lt;tt&gt;$XDG_CACHE_HOME&lt;/tt&gt;.&lt;/li&gt;

&lt;li&gt;None of the above applies and you need to place a small file that needs no persistence: use &lt;tt&gt;$TMPDIR&lt;/tt&gt; with a fallback on &lt;tt&gt;/tmp&lt;/tt&gt;. And use &lt;tt&gt;mkstemp()&lt;/tt&gt; and &lt;tt&gt;mkdtemp()&lt;/tt&gt;, and nothing homegrown.&lt;/li&gt;

&lt;li&gt;Otherwise use &lt;tt&gt;$TMPDIR&lt;/tt&gt; with a fallback on &lt;tt&gt;/var/tmp&lt;/tt&gt;. Also use &lt;tt&gt;mkstemp()&lt;/tt&gt;/&lt;tt&gt;mkdtemp()&lt;/tt&gt;.&lt;/li&gt;

&lt;/ol&gt;
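
&lt;p&gt;The six rules above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the purpose labels and both function names are invented for this sketch, and treating &lt;tt&gt;$XDG_DOWNLOAD_DIR&lt;/tt&gt; as an environment variable with a &lt;tt&gt;~/Downloads&lt;/tt&gt; fallback is an assumption. Python's &lt;tt&gt;tempfile.mkstemp()&lt;/tt&gt; stands in for the C function of the same name.&lt;/p&gt;

```python
import os
import tempfile


def pick_directory(purpose, privileged, env=None):
    """Return a base directory following the six rules above.

    purpose is one of the hypothetical labels "socket", "download",
    "cache", "small-temp" or "large-temp" (names invented for this sketch).
    """
    env = os.environ if env is None else env
    if purpose == "socket":
        # Rules 1 and 2: communication primitives never go to /tmp.
        # The caller is expected to create its own subdirectory below this.
        return "/run" if privileged else env["XDG_RUNTIME_DIR"]
    if purpose == "download" and not privileged:
        # Rule 3. Assumption: the variable is exported; otherwise fall
        # back to ~/Downloads.
        return env.get("XDG_DOWNLOAD_DIR", os.path.expanduser("~/Downloads"))
    if purpose == "cache" and not privileged:
        # Rule 4, with the default from the XDG base directory spec.
        return env.get("XDG_CACHE_HOME", os.path.expanduser("~/.cache"))
    if purpose == "small-temp":
        # Rule 5: small and volatile, so a tmpfs-backed /tmp is fine.
        return env.get("TMPDIR", "/tmp")
    # Rule 6: everything else.
    return env.get("TMPDIR", "/var/tmp")


def create_temp_file(purpose="small-temp", env=None):
    """Create a temporary file safely; returns (fd, path)."""
    base = pick_directory(purpose, privileged=False, env=env)
    # tempfile.mkstemp() picks a random name and creates the file
    # exclusively, so a pre-existing file or symlink is never reused.
    return tempfile.mkstemp(prefix="example-", dir=base)
```

&lt;p&gt;A caller would use the returned file descriptor directly and unlink the path when done, rather than reopening the file by name.&lt;/p&gt;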

&lt;p&gt;Note that these rules above are only suggested by us. These rules
take into account everything we know about this topic and avoid problems with
current and future distributions, as far as we can see them. Please consider
updating your projects to follow these rules, and keep them in mind if you
write new code.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;One thing we'd like to stress is that &lt;tt&gt;/tmp&lt;/tt&gt; and &lt;tt&gt;/var/tmp&lt;/tt&gt;
more often than not are actually not the right choice for your use case. There
are valid uses of these directories, but quite often another directory might
actually be the better place. So, be careful, consider the other options, but
if you do go for &lt;tt&gt;/tmp&lt;/tt&gt; or &lt;tt&gt;/var/tmp&lt;/tt&gt; then at least make sure to
use &lt;tt&gt;mkstemp()&lt;/tt&gt;/&lt;tt&gt;mkdtemp()&lt;/tt&gt;.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Thank you for your interest!&lt;/p&gt;

&lt;p&gt;Oh, and if you now complain that we don't understand Unix, and that we are
morons and worse, then please read this again, and you might notice that this
is just a best practice guide, not a specification we have written. Nothing that
introduces anything new, just something that explains how things are.&lt;/p&gt;

&lt;p&gt;If you want to complain about the &lt;tt&gt;tmp-on-tmpfs&lt;/tt&gt; or
&lt;tt&gt;ServicesPrivateTmp&lt;/tt&gt; feature, then this is not the right place either,
because this blog post is not really about that. Please direct this to
&lt;tt&gt;fedora-devel&lt;/tt&gt; instead. Thank you very much.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;&lt;small&gt;Footnotes&lt;/small&gt;&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] Well, or to turn this around: unless you have a PhD in advanced
Unixology and are not using &lt;tt&gt;mkstemp()&lt;/tt&gt;/&lt;tt&gt;mkdtemp()&lt;/tt&gt; but use
&lt;tt&gt;/tmp&lt;/tt&gt; nonetheless it's very likely you are writing vulnerable
code.&lt;/small&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Wed, 28 Mar 2012 14:04:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-03-28:/blog/projects/tmp.html</guid><category>projects</category></item><item><title>/etc/os-release</title><link>https://0pointer.net/blog/projects/os-release.html</link><description>
                
&lt;p&gt;&lt;a href="http://0pointer.de/blog/projects/the-new-configuration-files.html"&gt;One of
the new configuration files systemd introduced is &lt;tt&gt;/etc/os-release&lt;/tt&gt;&lt;/a&gt;.
It replaces the multitude of per-distribution release files&lt;sup&gt;[1]&lt;/sup&gt; with
a single one. Yesterday we &lt;a href="http://lists.freedesktop.org/archives/systemd-devel/2012-February/004475.html"&gt;decided
to drop&lt;/a&gt; support for systems lacking &lt;a href="http://www.freedesktop.org/software/systemd/man/os-release.html"&gt;&lt;tt&gt;/etc/os-release&lt;/tt&gt;&lt;/a&gt;
in systemd since recently the majority of the big distributions adopted
&lt;tt&gt;/etc/os-release&lt;/tt&gt; and many small ones did, too&lt;sup&gt;[2]&lt;/sup&gt;.  It's our
hope that by dropping support for non-compliant distributions we gently put
some pressure on the remaining hold-outs to adopt this scheme as well.&lt;/p&gt;

&lt;p&gt;I'd like to take the opportunity to explain a bit what the new file offers,
why application developers should care, and why the distributions should adopt
it. Of course, this file is pretty much a triviality in many ways,
but I guess it's still one that deserves explanation.&lt;/p&gt;

&lt;p&gt;So, you ask why this all?&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;It relieves application developers who just want to know the
distribution they are running on from having to check for a multitude of individual release files.&lt;/li&gt;

&lt;li&gt;It provides both a "pretty" name (i.e. one to show to the user), and
machine parsable version/OS identifiers (i.e. for use in build systems).&lt;/li&gt;

&lt;li&gt;It is extensible, can easily learn new fields if needed. For example, since
we want to print a welcome message in the color of your distribution at boot
we make it possible to configure the ANSI color for that in the file.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;b&gt;FAQs&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;&lt;b&gt;There's already the &lt;tt&gt;lsb_release&lt;/tt&gt; tool for this, why don't you
just use that?&lt;/b&gt; Well, it's a very strange interface: a shell script you have
to invoke (and hence spawn asynchronously from your C code), and it's not
written to be extensible. It's an optional package in many distributions, and
nothing we'd be happy to invoke as part of early boot in order to show a
welcome message. (In times with sub-second userspace boot times we really don't
want to invoke a huge shell script for a triviality like showing the welcome
message). The &lt;tt&gt;lsb_release&lt;/tt&gt; tool to us appears to be an attempt at
abstracting distribution checks, where standardization of distribution checks
is what is needed. It's simply a badly designed interface. In our opinion, it
has its use as an interface to determine the LSB version itself, but not for
checking the distribution or version.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Why haven't you adopted one of the generic release files, such as
Fedora's &lt;tt&gt;/etc/system-release&lt;/tt&gt;?&lt;/b&gt; Well, they are much nicer than
&lt;tt&gt;lsb_release&lt;/tt&gt;, so much is true. However, they are not extensible and
are not really parsable, if the distribution needs to be identified
programmatically or a specific version needs to be verified.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Why didn't you call this file &lt;tt&gt;/etc/bikeshed&lt;/tt&gt; instead? The name
&lt;tt&gt;/etc/os-release&lt;/tt&gt; sucks!&lt;/b&gt; In a way, I think you kind of answered your
own question there already.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Does this mean my distribution can now drop our equivalent of
&lt;tt&gt;/etc/fedora-release&lt;/tt&gt;?&lt;/b&gt; Unlikely, too much code exists that still
checks for the individual release files, and you probably shouldn't break that.
This new file makes things easy for applications, not for distributions:
applications can now rely on a single file only, and use it in a nice way.
Distributions will have to continue to ship the old files unless they are
willing to break compatibility here.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;This is so useless! My application needs to be compatible with distros
from 1998, so how could I ever make use of the new file? I will have to
continue using the old ones!&lt;/b&gt; True, if you need compatibility with really
old distributions you do. But for new code this might not be an issue, and in
general new APIs are new APIs. So if you decide to depend on it, you add a
dependency on it. However, even if you need to stay compatible it might make
sense to check &lt;tt&gt;/etc/os-release&lt;/tt&gt; first and just fall back to the old
files if it doesn't exist. The least it does for you is that you don't need 25+
&lt;tt&gt;open()&lt;/tt&gt; attempts on modern distributions, but just one.&lt;/p&gt;
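
&lt;p&gt;As a rough illustration of the "check &lt;tt&gt;/etc/os-release&lt;/tt&gt; first" approach, here is a minimal Python sketch. The function names are invented for this example, and it assumes the file consists of shell-style &lt;tt&gt;KEY=value&lt;/tt&gt; lines with optionally quoted values; it is not a complete parser for every corner case of the format.&lt;/p&gt;

```python
import shlex


def parse_os_release(text):
    """Parse the shell-style KEY=value lines of an os-release file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex.split() undoes the shell quoting allowed around values.
        parts = shlex.split(raw)
        fields[key] = parts[0] if parts else ""
    return fields


def read_os_release(path="/etc/os-release"):
    """Return the parsed fields, or None if the file does not exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return parse_os_release(f.read())
    except FileNotFoundError:
        # The caller may fall back to the legacy per-distribution files.
        return None
```

&lt;p&gt;An application would call &lt;tt&gt;read_os_release()&lt;/tt&gt; once and only probe the old per-distribution files if it returns &lt;tt&gt;None&lt;/tt&gt;.&lt;/p&gt;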

&lt;p&gt;&lt;b&gt;You evil people are forcing my beloved distro $XYZ to adopt your awful
systemd schemes. I hate you!&lt;/b&gt; You hate too much, my friend. Also, I am
pretty sure it's not difficult to see the benefit of this new file
independently of systemd, and it's truly useful on systems without systemd,
too.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;I hate what you people do, can I just ignore this?&lt;/b&gt; Well, you really
need to work on your constant feelings of hate, my friend. But, to a certain
degree yes, you can ignore this for a while longer. But already, there are a
number of applications making use of this file.  You lose compatibility with
those. Also, you are kinda working towards the further balkanization of the
Linux landscape, but maybe that's your intention?&lt;/p&gt;

&lt;p&gt;&lt;b&gt;You guys add a new file because you think there are already too many? You
guys are so confused!&lt;/b&gt; None of the existing files is generic and extensible
enough to do what we want it to do. Hence we had to introduce a new one. We
acknowledge the irony, however.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;The file is extensible? Awesome! I want a new field XYZ= in it!&lt;/b&gt; Sure,
it's extensible, and we are happy if distributions extend it. Please prefix
your keys with your distribution's name however. Or even better: talk to us and
we might be able to update the documentation and make your field standard, if you
convince us that it makes sense.&lt;/p&gt;

&lt;p&gt;Anyway, to summarize all this: if you work on an application that needs to
identify the OS it is being built on or is being run on, please consider making
use of this new file, we created it for you. If you work on a distribution, and
your distribution doesn't support this file yet, please consider adopting this
file, too.&lt;/p&gt;

&lt;p&gt;If you are working on a small/embedded distribution, or a legacy-free
distribution we encourage you to adopt only this file and not establish any
other per-distro release file.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.freedesktop.org/software/systemd/man/os-release.html"&gt;Read the documentation for &lt;tt&gt;/etc/os-release&lt;/tt&gt;.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;b&gt;Footnotes&lt;/b&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] Yes, multitude, there's at least: &lt;tt&gt;/etc/redhat-release&lt;/tt&gt;,
&lt;tt&gt;/etc/SuSE-release&lt;/tt&gt;, &lt;tt&gt;/etc/debian_version&lt;/tt&gt;,
&lt;tt&gt;/etc/arch-release&lt;/tt&gt;, &lt;tt&gt;/etc/gentoo-release&lt;/tt&gt;,
&lt;tt&gt;/etc/slackware-version&lt;/tt&gt;, &lt;tt&gt;/etc/frugalware-release&lt;/tt&gt;,
&lt;tt&gt;/etc/altlinux-release&lt;/tt&gt;, &lt;tt&gt;/etc/mandriva-release&lt;/tt&gt;,
&lt;tt&gt;/etc/meego-release&lt;/tt&gt;, &lt;tt&gt;/etc/angstrom-version&lt;/tt&gt;,
&lt;tt&gt;/etc/mageia-release&lt;/tt&gt;. And some distributions even have multiple, for
example Fedora already has four different files.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[2] To our knowledge at least OpenSUSE, Fedora, ArchLinux, Angstrom,
Frugalware have adopted this. (This list is not comprehensive, there are
probably more.)&lt;/small&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 13 Feb 2012 19:46:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-02-13:/blog/projects/os-release.html</guid><category>projects</category></item><item><title>The Case for the /usr Merge</title><link>https://0pointer.net/blog/projects/the-usr-merge.html</link><description>
                
&lt;p&gt;One of the features of Fedora 17 is &lt;a href="https://fedoraproject.org/wiki/Features/UsrMove"&gt;the /usr merge&lt;/a&gt;, put
forward by Harald Hoyer and Kay Sievers&lt;sup&gt;[1]&lt;/sup&gt;. Since this
feature was proposed, repetitive discussions have taken place all over the various
Free Software communities, and usually the same questions were asked: what the reasons
behind this feature were, and whether it makes sense to adopt the same scheme for
distribution XYZ, too.&lt;/p&gt;

&lt;p&gt;Especially in the non-Fedora world it appears to be socially unacceptable to
actually have a look at the &lt;a href="https://fedoraproject.org/wiki/Features/UsrMove"&gt;Fedora feature page&lt;/a&gt;
(where many of the questions are already brought up and answered), which is very unfortunate. To
improve the situation I spent some time today to summarize the reasons for the
/usr merge independently. I'd hence like to direct you to this new page I put
up which tries to summarize the reasons for this, with an emphasis on the
compatibility point of view:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.freedesktop.org/wiki/Software/systemd/TheCaseForTheUsrMerge"&gt;The Case for the /usr Merge&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that even though this page is in the systemd wiki, what it covers is
mostly orthogonal to systemd. systemd supports both systems with a merged /usr
and with a split /usr, and the /usr merge should be interesting for non-systemd
distributions as well.&lt;/p&gt;

&lt;p&gt;Primarily I put this together to have a nice place to point all those folks
who continue to write me annoyed emails, even though I am actually not even
working on all of this...&lt;/p&gt;

&lt;p&gt;Enjoy the read!&lt;/p&gt;

&lt;p&gt;&lt;b&gt;&lt;small&gt;Footnotes:&lt;/small&gt;&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] And not actually by me, I am just a supportive spectator and am
not doing any work on it. Unfortunately some tech press folks created the false
impression I was behind this. But credit where credit is due, this is all
Harald's and Kay's work.&lt;/small&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 26 Jan 2012 22:29:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-01-26:/blog/projects/the-usr-merge.html</guid><category>projects</category></item><item><title>Plumbers Wishlist, The Third Edition, a.k.a. "The Thank You Edition"</title><link>https://0pointer.net/blog/projects/plumbers-wishlist-3.html</link><description>
                
&lt;p&gt;Last October &lt;a href="http://0pointer.de/blog/projects/plumbers-wishlist-2.html"&gt;we published a
wishlist for plumbing related features&lt;/a&gt; we'd like to see added to the Linux
kernel. Three months later it's time to publish a short update, and explain
what has been implemented in the kernel, what people have started working on,
and what's still missing.&lt;/p&gt;

&lt;p&gt;The full, updated list is &lt;a href="https://docs.google.com/document/pub?id=1RmJrtIoTnivkmR9KCqfJNBnEll4X9Jtu0xj5w6hFGs8"&gt;available
on Google Docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In general, I must say that the list turned out to be a great success. It
shows how awesome the Open Source community is: Just ask nicely and there's a
good chance they'll fulfill your wishes! Thank you very much, Linux
community!&lt;/p&gt;

&lt;p&gt;We'd like to thank everybody who worked on any of the features on that list:
Lucas De Marchi, Andi Kleen, Dan Ballard, Li Zefan, Kirill A. Shutemov,
Davidlohr Bueso, Cong Wang, Lennart Poettering, Kay Sievers.&lt;/p&gt;

&lt;p&gt;Of the items on the list 5 have been fully implemented and are already part
of a released kernel, or have already been merged for inclusion in one of the
upcoming kernel releases.&lt;/p&gt;

&lt;p&gt;For 4 further items patches have been posted, and I am hoping they'll get
merged eventually. Davidlohr, Wang, Zefan, Kirill, it would be great if you'd
continue working on your patches, as we think they are following the right
approach&lt;sup&gt;[1]&lt;/sup&gt; even if there was some opposition to them on LKML. So,
please keep pushing to solve the outstanding issues and thanks for your work so far!&lt;/p&gt;

&lt;p&gt;&lt;b&gt;&lt;small&gt;Footnotes&lt;/small&gt;&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;[1] Yes, I still believe that tmpfs quota should be implemented via
resource limits, as everything else wouldn't work, as we don't want to
implement complex and fragile userspace infrastructure to racily upload complex
quota data for all current and future UIDs ever used on the system into each
tmpfs mount point at mount time.&lt;/small&gt;&lt;/p&gt;


        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 20 Jan 2012 21:26:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-01-20:/blog/projects/plumbers-wishlist-3.html</guid><category>projects</category></item><item><title>systemd for Administrators, Part XII</title><link>https://0pointer.net/blog/projects/security.html</link><description>
                
&lt;p&gt;Here's &lt;a href="http://0pointer.de/blog/projects/inetd.html"&gt;the&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/instances.html"&gt;twelfth&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/on-etc-sysinit.html"&gt;installment&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/the-new-configuration-files.html"&gt;of&lt;/a&gt;

&lt;a href="http://0pointer.de/blog/projects/blame-game.html"&gt;my&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/changing-roots"&gt;ongoing&lt;/a&gt; &lt;a href="http://0pointer.de/blog/projects/three-levels-of-off.html"&gt;series&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-4.html"&gt;on&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-3.html"&gt;systemd&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-2.html"&gt;for&lt;/a&gt;
&lt;a href="http://0pointer.de/blog/projects/systemd-for-admins-1.html"&gt;Administrators&lt;/a&gt;:&lt;/p&gt;

&lt;h4&gt;Securing Your Services&lt;/h4&gt;

&lt;p&gt;One of the core features of Unix systems is the idea of privilege separation
between the different components of the OS. Many system services run under
their own user IDs thus limiting what they can do, and hence the impact they
may have on the OS in case they get exploited.&lt;/p&gt;

&lt;p&gt;This kind of privilege separation only provides very basic protection
however, since in general system services run this way can still do at least as
much as a normal local user, though not as much as root. For security purposes
it is however very interesting to limit even further what services can do, and
cut them off from a couple of things that normal users are allowed to do.&lt;/p&gt;

&lt;p&gt;A great way to limit the impact of services is by employing MAC technologies
such as SELinux. If you are interested in locking down your server, running
SELinux is a very good idea. systemd enables developers and administrators to
apply additional restrictions to local services independently of a MAC. Thus,
regardless of whether you are able to make use of SELinux you may still enforce
certain security limits on your services.&lt;/p&gt;

&lt;p&gt;In this iteration of the series we want to focus on a couple of these
security features of systemd and how to make use of them in your services.
These features take advantage of a couple of Linux-specific technologies that have
been available in the kernel for a long time, but never have been exposed in a
widely usable fashion. These systemd features have been designed to be as easy to use
as possible, in order to make them attractive to administrators and upstream
developers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Isolating services from the network&lt;/li&gt;
&lt;li&gt;Service-private &lt;tt&gt;/tmp&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;Making directories appear read-only or inaccessible to services&lt;/li&gt;
&lt;li&gt;Taking away capabilities from services&lt;/li&gt;
&lt;li&gt;Disallowing forking, limiting file creation for services&lt;/li&gt;
&lt;li&gt;Controlling device node access of services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All options described here are documented in systemd's man pages, notably &lt;a href="http://0pointer.de/public/systemd-man/systemd.exec.html"&gt;systemd.exec(5)&lt;/a&gt;.
Please consult these man pages for further details.&lt;/p&gt;

&lt;p&gt;All these options are available on all systemd systems, regardless of whether
SELinux or any other MAC is enabled.&lt;/p&gt;

&lt;p&gt;All these options are relatively cheap, so if in doubt use them. Even if you
might think that your service doesn't write to &lt;tt&gt;/tmp&lt;/tt&gt; and hence enabling
&lt;tt&gt;PrivateTmp=yes&lt;/tt&gt; (as described below) might not be necessary, due to
today's complex software it's still beneficial to enable this feature, simply
because libraries you link to (and plug-ins to those libraries) which you do
not control might need temporary files after all. Example: you never know what
kind of NSS module your local installation has enabled, and what that NSS module
does with &lt;tt&gt;/tmp&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;These options are hopefully interesting both for administrators to secure
their local systems, and for upstream developers to ship their services secure
by default.  We strongly encourage upstream developers to consider using these
options by default in their upstream service units. They are very easy to make
use of and have major benefits for security.&lt;/p&gt;

&lt;h4&gt;Isolating Services from the Network&lt;/h4&gt;

&lt;p&gt;A very simple but powerful configuration option you may use in systemd
service definitions is &lt;tt&gt;PrivateNetwork=&lt;/tt&gt;:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
PrivateNetwork=yes
...&lt;/pre&gt;

&lt;p&gt;With this simple switch a service and all the processes it consists of are
entirely disconnected from any kind of networking. Network interfaces become
unavailable to the processes, the only one they'll see is the loopback device
"lo", but it is isolated from the real host loopback. This is a very powerful
protection from network attacks.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Caveat:&lt;/b&gt; Some services require the network to be operational. Of
course, nobody would consider using &lt;tt&gt;PrivateNetwork=yes&lt;/tt&gt; on a
network-facing service such as Apache. However even for non-network-facing
services network support might be necessary and not always obvious. Example: if
the local system is configured for an LDAP-based user database doing glibc name
lookups with calls such as &lt;tt&gt;getpwnam()&lt;/tt&gt; might end up resulting in network access.
That said, even in those cases it is more often than not OK to use
&lt;tt&gt;PrivateNetwork=yes&lt;/tt&gt; since user IDs of system service users are required to
be resolvable even without any network around. That means as long as the only
user IDs your service needs to resolve are below the magic 1000 boundary using
&lt;tt&gt;PrivateNetwork=yes&lt;/tt&gt; should be OK.&lt;/p&gt;

&lt;p&gt;Internally, this feature makes use of network namespaces of the kernel. If
enabled a new network namespace is opened and only the loopback device
configured in it.&lt;/p&gt;
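
&lt;p&gt;One way to observe the effect from inside a service is to list the interfaces the process can see. The following Python sketch does that; both function names are invented for this example, and it assumes the loopback device is called "lo" as described above. It is a diagnostic aid, not something systemd itself provides.&lt;/p&gt;

```python
import socket


def only_loopback(interface_names):
    """True if "lo" is the only interface in the given list."""
    return sorted(interface_names) == ["lo"]


def visible_interfaces():
    """Names of the network interfaces visible to this process."""
    return [name for _index, name in socket.if_nameindex()]


if __name__ == "__main__":
    names = visible_interfaces()
    print("interfaces:", ", ".join(names))
    print("isolated:", only_loopback(names))
```

&lt;p&gt;Run as the &lt;tt&gt;ExecStart=&lt;/tt&gt; command of a test unit with &lt;tt&gt;PrivateNetwork=yes&lt;/tt&gt;, this would be expected to list nothing but "lo".&lt;/p&gt;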

&lt;h4&gt;Service-Private /tmp&lt;/h4&gt;

&lt;p&gt;Another very simple but powerful configuration switch is
&lt;tt&gt;PrivateTmp=&lt;/tt&gt;:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
PrivateTmp=yes
...&lt;/pre&gt;

&lt;p&gt;If enabled this option will ensure that the &lt;tt&gt;/tmp&lt;/tt&gt; directory the
service will see is private and isolated from the host system's &lt;tt&gt;/tmp&lt;/tt&gt;.
&lt;tt&gt;/tmp&lt;/tt&gt; traditionally has been a shared space for all local services and
users. Over the years it has been a major source of security problems for a
multitude of services. Symlink attacks and DoS vulnerabilities due to guessable
&lt;tt&gt;/tmp&lt;/tt&gt; temporary files are common. By isolating the service's
&lt;tt&gt;/tmp&lt;/tt&gt; from the rest of the host, such vulnerabilities become moot.&lt;/p&gt;

&lt;p&gt;For Fedora 17 a &lt;a href="https://fedoraproject.org/wiki/Features/ServicesPrivateTmp"&gt;feature has
been accepted&lt;/a&gt; in order to enable this option across a large number of
services.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Caveat:&lt;/b&gt; Some services actually misuse &lt;tt&gt;/tmp&lt;/tt&gt; as a location
for IPC sockets and other communication primitives, even though this is almost
always a vulnerability (simply because if you use it for communication you need
guessable names, and guessable names make your code vulnerable to DoS and symlink
attacks) and &lt;tt&gt;/run&lt;/tt&gt; is the much safer replacement for this, simply
because it is not a location writable to unprivileged processes. For example,
X11 places its communication sockets below &lt;tt&gt;/tmp&lt;/tt&gt; (which is actually
secure -- though still not ideal -- in this exceptional case since it does so in a
safe subdirectory which is created at early boot.) Services which need to
communicate via such communication primitives in &lt;tt&gt;/tmp&lt;/tt&gt; are not
candidates for &lt;tt&gt;PrivateTmp=&lt;/tt&gt;. Thankfully these days only very few
services misusing &lt;tt&gt;/tmp&lt;/tt&gt; like this remain.&lt;/p&gt;

&lt;p&gt;Internally, this feature makes use of file system namespaces of the kernel.
If enabled a new file system namespace is opened inheriting most of the host
hierarchy with the exception of &lt;tt&gt;/tmp&lt;/tt&gt;.&lt;/p&gt;

&lt;h4&gt;Making Directories Appear Read-Only or Inaccessible to Services&lt;/h4&gt;

&lt;p&gt;With the &lt;tt&gt;ReadOnlyDirectories=&lt;/tt&gt; and &lt;tt&gt;InaccessibleDirectories=&lt;/tt&gt;
options it is possible to make the specified directories read-only or,
respectively, inaccessible for both reading and writing to the service:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
InaccessibleDirectories=/home
ReadOnlyDirectories=/var
...
&lt;/pre&gt;

&lt;p&gt;With these two configuration lines the whole tree below &lt;tt&gt;/home&lt;/tt&gt;
becomes inaccessible to the service (i.e. the directory will appear empty and
with 000 access mode), and the tree below &lt;tt&gt;/var&lt;/tt&gt; becomes read-only.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Caveat:&lt;/b&gt; Note that &lt;tt&gt;ReadOnlyDirectories=&lt;/tt&gt; currently is not
recursively applied to submounts of the specified directories (i.e. mounts below
&lt;tt&gt;/var&lt;/tt&gt; in the example above stay writable). This is likely to get fixed
soon.&lt;/p&gt;
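
&lt;p&gt;Until then, one possible workaround is to list the submounts explicitly. The
following is only a sketch: it assumes that a separate file system is mounted at
&lt;tt&gt;/var/spool/example&lt;/tt&gt; (a made-up path) and that each listed directory
is remounted read-only individually:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
# /var/spool/example is a hypothetical submount below /var
ReadOnlyDirectories=/var /var/spool/example
...
&lt;/pre&gt;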

&lt;p&gt;Internally, this is also implemented based on file system namespaces.&lt;/p&gt;

&lt;h4&gt;Taking Away Capabilities From Services&lt;/h4&gt;

&lt;p&gt;Another very powerful security option in systemd is
&lt;tt&gt;CapabilityBoundingSet=&lt;/tt&gt;, which allows limiting, in a relatively
fine-grained fashion, which kernel capabilities a started service retains:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
CapabilityBoundingSet=CAP_CHOWN CAP_KILL
...
&lt;/pre&gt;

&lt;p&gt;In the example above only the CAP_CHOWN and CAP_KILL capabilities are
retained by the service, and the service and any processes it might create have
no chance to ever acquire any other capabilities again, not even via setuid
binaries. The list of currently defined capabilities is available in &lt;a href="http://linux.die.net/man/7/capabilities"&gt;capabilities(7)&lt;/a&gt;.
Unfortunately, some of the defined capabilities are overly generic (such as
CAP_SYS_ADMIN). Nevertheless, capabilities are still a very useful tool, in
particular for services that otherwise run with full root privileges.&lt;/p&gt;

&lt;p&gt;Identifying precisely which capabilities a service needs to run
cleanly is not always easy and requires a bit of testing. To simplify this
process a bit, it is possible to blacklist certain capabilities that are
definitely not needed instead of whitelisting all that might be needed. For
example, CAP_SYS_PTRACE is a particularly powerful and security-relevant capability
needed for the implementation of debuggers, since it allows introspecting and
manipulating any local process on the system. A service like Apache obviously
has no business being a debugger for other processes, hence it is safe to
remove the capability from it:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
CapabilityBoundingSet=~CAP_SYS_PTRACE
...&lt;/pre&gt;

&lt;p&gt;The &lt;tt&gt;~&lt;/tt&gt; character prefixed to the assigned value inverts
the meaning of the option: instead of listing all capabilities the service
will retain, you list the ones it will not retain.&lt;/p&gt;
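
&lt;p&gt;The inverted form also accepts more than one capability. As a sketch (the
particular selection here is only an illustration, not a recommendation for any
specific service), the following drops three capabilities and keeps all
others:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
# Illustrative selection; every capability not listed here is retained
CapabilityBoundingSet=~CAP_SYS_PTRACE CAP_SYS_MODULE CAP_SYS_RAWIO
...&lt;/pre&gt;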

&lt;p&gt;&lt;b&gt;Caveat:&lt;/b&gt; Some services might get confused if certain capabilities are
made unavailable to them. Thus, when determining the right set of capabilities
to keep around, you need to proceed carefully, and it might be a good idea to talk
to the upstream maintainers, since they should know best which operations a
service might need to run successfully.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Caveat 2:&lt;/b&gt; &lt;a href="https://forums.grsecurity.net/viewtopic.php?f=7&amp;amp;t=2522"&gt;Capabilities are
not a magic wand.&lt;/a&gt; You probably want to combine them and use them in
conjunction with other security options in order to make them truly useful.&lt;/p&gt;

&lt;p&gt;To easily check which processes on your system retain which capabilities, use
the &lt;tt&gt;pscap&lt;/tt&gt; tool from the &lt;tt&gt;libcap-ng-utils&lt;/tt&gt; package.&lt;/p&gt;

&lt;p&gt;Making use of systemd's &lt;tt&gt;CapabilityBoundingSet=&lt;/tt&gt; option is often a
simple, discoverable and cheap replacement for patching all system daemons
individually to control the capability bounding set on their own.&lt;/p&gt;

&lt;h4&gt;Disallowing Forking, Limiting File Creation for Services&lt;/h4&gt;

&lt;p&gt;Resource limits may be used to apply certain security limits to services
being run. Resource limits are primarily useful for resource control (as the
name suggests...), not so much for access control. However, two of them can be
useful to disable certain OS features: RLIMIT_NPROC may be used to disable
forking, and RLIMIT_FSIZE to disable writing of any files with a size greater
than 0:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
LimitNPROC=1
LimitFSIZE=0
...&lt;/pre&gt;

&lt;p&gt;Note that this will work only if the service in question drops privileges
and runs under a (non-root) user ID of its own, or drops the CAP_SYS_RESOURCE
capability, for example via &lt;tt&gt;CapabilityBoundingSet=&lt;/tt&gt; as discussed above.
Without that, a process could simply increase the resource limit again, thus
voiding any effect.&lt;/p&gt;
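
&lt;p&gt;A minimal sketch of the first variant, here letting systemd switch to the
user via &lt;tt&gt;User=&lt;/tt&gt; and assuming a dedicated unprivileged user called
&lt;tt&gt;example&lt;/tt&gt; (a hypothetical name) exists on the system:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
# "example" is a hypothetical, dedicated non-root user
User=example
LimitNPROC=1
LimitFSIZE=0
...&lt;/pre&gt;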

&lt;p&gt;&lt;b&gt;Caveat:&lt;/b&gt; &lt;tt&gt;LimitFSIZE=&lt;/tt&gt; is pretty brutal. If the service
attempts to write a file with a size greater than 0, it is immediately sent the
SIGXFSZ signal, which terminates the process unless it is caught. Also, creating
files with size 0 is still allowed, even if this option is used.&lt;/p&gt;

&lt;p&gt;For more information on these and other resource limits, see &lt;a href="http://linux.die.net/man/2/setrlimit"&gt;setrlimit(2)&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Controlling Device Node Access of Services&lt;/h4&gt;

&lt;p&gt;Device nodes are an important interface to the kernel and its drivers.
Since drivers tend to get much less testing and security checking than the core
kernel they often are a major entry point for security hacks. systemd allows
you to control access to devices individually for each service:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
DeviceAllow=/dev/null rw
...&lt;/pre&gt;

&lt;p&gt;This will limit access to &lt;tt&gt;/dev/null&lt;/tt&gt; and only this device node,
disallowing access to any other device nodes.&lt;/p&gt;

&lt;p&gt;The feature is implemented on top of the &lt;tt&gt;devices&lt;/tt&gt; cgroup controller.&lt;/p&gt;
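
&lt;p&gt;The option may be specified more than once to allow several device nodes. A
sketch, where the choice of devices is only an example:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
# Read-write access to /dev/null, read-only access to /dev/urandom
DeviceAllow=/dev/null rw
DeviceAllow=/dev/urandom r
...&lt;/pre&gt;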

&lt;h4&gt;Other Options&lt;/h4&gt;

&lt;p&gt;Besides the easy to use options above there are a number of other security
relevant options available. However they usually require a bit of preparation
in the service itself and hence are probably primarily useful for upstream
developers. These options are &lt;tt&gt;RootDirectory=&lt;/tt&gt; (to set up
&lt;tt&gt;chroot()&lt;/tt&gt; environments for a service) as well as &lt;tt&gt;User=&lt;/tt&gt; and
&lt;tt&gt;Group=&lt;/tt&gt; to drop privileges to the specified user and group. These
options are particularly useful to greatly simplify writing daemons, where all
the complexities of securely dropping privileges can be left to systemd, and
kept out of the daemons themselves.&lt;/p&gt;
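
&lt;p&gt;As a rough sketch, and assuming a user and group named &lt;tt&gt;example&lt;/tt&gt;
as well as a prepared directory tree at &lt;tt&gt;/srv/example-root&lt;/tt&gt; (all
made-up names), these options could be combined like this:&lt;/p&gt;

&lt;pre&gt;...
[Service]
ExecStart=...
# Hypothetical chroot() tree; it must already contain everything the service needs
RootDirectory=/srv/example-root
# Hypothetical unprivileged user and group
User=example
Group=example
...&lt;/pre&gt;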

&lt;p&gt;If you are wondering why these options are not enabled by default: some of
them simply break the semantics of traditional Unix, and to maintain compatibility
we cannot enable them by default. For example, since traditional Unix treated
&lt;tt&gt;/tmp&lt;/tt&gt; as a shared namespace that processes could use for IPC, we
cannot just go and turn that off globally, just because &lt;tt&gt;/tmp&lt;/tt&gt;'s role in
IPC is now replaced by &lt;tt&gt;/run&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;And that's it for now. If you are working on unit files for upstream or in
your distribution, please consider using one or more of the options listed
above. If your service is secure by default thanks to these options,
this will not only help your users but also make the Internet a safer
place.&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 20 Jan 2012 02:26:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-01-20:/blog/projects/security.html</guid><category>projects</category></item><item><title>PulseAudio vs. AudioFlinger</title><link>https://0pointer.net/blog/projects/aruns-numbers.html</link><description>
                
&lt;p&gt;&lt;a href="http://arunraghavan.net/2012/01/pulseaudio-vs-audioflinger-fight/"&gt;Arun
put an awesome article up&lt;/a&gt;, detailing how PulseAudio compares to Android's
AudioFlinger in terms of power consumption and suchlike. Suffice to say,
PulseAudio rocks, but go and read the whole thing, it's worth it.&lt;/p&gt;

&lt;p&gt;Apparently, AudioFlinger is a great choice if you want to shorten your
battery life.&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 16 Jan 2012 16:31:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2012-01-16:/blog/projects/aruns-numbers.html</guid><category>projects</category></item><item><title>Introducing the Journal</title><link>https://0pointer.net/blog/projects/the-journal.html</link><description>
                
&lt;p&gt;In the past weeks we have been working on a major new addition to systemd
that will hopefully positively change the Linux ecosystem in a number of ways.
But see for yourself, check out the full explanation on what we have
implemented on the &lt;a href="https://docs.google.com/document/pub?id=1IC9yOXj7j6cdLLxWEBAGRL6wl97tFxgjLUEHIX3MSTs"&gt;design
document we put up on Google Docs&lt;/a&gt;.&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 18 Nov 2011 16:28:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2011-11-18:/blog/projects/the-journal.html</guid><category>projects</category></item><item><title>Kernel Hackers Panel</title><link>https://0pointer.net/blog/projects/linuxcon-kernel-panel.html</link><description>
                
&lt;p&gt;At LinuxCon Europe/ELCE I had the chance to moderate the &lt;a href="https://events.linuxfoundation.org/events/linuxcon-europe/kernel-panel"&gt;kernel hackers
panel with Linus Torvalds, Alan Cox, Paul McKenney and Thomas Gleixner on
stage&lt;/a&gt;. I like to believe it went quite well, but check it out for yourself, as
a video recording is now available online:&lt;/p&gt;

&lt;video width="800" height="450" controls="1"&gt;
  &lt;source src="http://free-electrons.com/pub/video/2011/elce/elce-2011-torvalds-cox-gleixner-mackenney-kernel-developer-panel-450p.webm" /&gt;
&lt;/video&gt;

&lt;p&gt;For me personally, the most notable topic covered was Control Groups,
and the clarification that they are something that is needed even though their
implementation right now is in many ways less than perfect. But in the end there is no
reasonable way around them: much like SMP, they are a technology that complicates things
substantially but is ultimately unavoidable.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://free-electrons.com/blog/elce-2011-videos/"&gt;Other videos from ELCE are online now, too.&lt;/a&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 07 Nov 2011 16:53:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2011-11-07:/blog/projects/linuxcon-kernel-panel.html</guid><category>projects</category></item><item><title>libabc</title><link>https://0pointer.net/blog/projects/libabc.html</link><description>
                
&lt;p&gt;At the Kernel Summit in Prague last week Kay Sievers and I led a session on
developing shared userspace libraries, for kernel hackers. More and more
userspace interfaces of the kernel (for example many which deal with storage,
audio, resource management, security, file systems or a number of other
subsystems) nowadays rely on a dedicated userspace component. As people who
work primarily in the plumbing layer of the Linux OS we noticed over and over
again that these libraries written by people who usually are at home on the
kernel side of things make the same mistakes repeatedly, thus making life for
the users of the libraries unnecessarily difficult. In our session we tried to
point out a number of these things, and in particular places where the usual
kernel hacking style translates badly into userspace shared library hacking.
Our hope is that maybe a few kernel developers have a look at our list of
recommendations and consider the points we are raising.&lt;/p&gt;

&lt;p&gt;To make things easy we have put together an example skeleton library we
dubbed &lt;tt&gt;libabc&lt;/tt&gt;, whose &lt;a href="https://git.kernel.org/?p=linux/kernel/git/kay/libabc.git;a=blob_plain;f=README"&gt;README&lt;/a&gt;
file includes all our points in terse form. It's available on kernel.org:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://git.kernel.org/?p=linux/kernel/git/kay/libabc.git"&gt;The git repository&lt;/a&gt; and the &lt;a href="https://git.kernel.org/?p=linux/kernel/git/kay/libabc.git;a=blob_plain;f=README"&gt;README&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This list of recommendations draws inspiration from David Zeuthen's and
Ulrich Drepper's well known papers on the topic of writing shared libraries. In
the README linked above we try to distill this wealth of information into a
terse list of recommendations, with a couple of additions and with a strict
focus on a kernel hacker background.&lt;/p&gt;

&lt;p&gt;Please have a look, and even if you are not a kernel hacker there might be
something useful to know in it, especially if you work on the lower layers of
our stack.&lt;/p&gt;

&lt;p&gt;If you have any questions or additions, just ping us, or comment below!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Tue, 01 Nov 2011 01:46:00 +0100</pubDate><guid isPermaLink="false">tag:0pointer.net,2011-11-01:/blog/projects/libabc.html</guid><category>projects</category></item><item><title>Prague</title><link>https://0pointer.net/blog/projects/linuxcon-europe.html</link><description>
                
&lt;p&gt;If you make it to Prague the coming week for the LinuxCon/ELCE/GStreamer/Kernel Summit/... superconference, make sure not to miss:&lt;/p&gt;

&lt;ul&gt;

&lt;li&gt;The Linux Audio BoF with numerous Linux audio hackers, 5pm, on Sunday (23rd, i.e. today).&lt;/li&gt;

&lt;li&gt;&lt;a href="http://gstreamer.freedesktop.org/conference/speakers.html#raghavan"&gt;Latest
developments in PulseAudio&lt;/a&gt; by Arun Raghavan. 4pm, on Tuesday, GStreamer
Summit&lt;/li&gt;

&lt;li&gt;&lt;a href="https://events.linuxfoundation.org/events/linuxcon-europe/kernel-panel"&gt;Linux
Kernel Developer Panel&lt;/a&gt;, a shared session of LinuxCon and ELCE. Panelists
are Linus Torvalds, Alan Cox, Thomas Gleixner and Paul McKenney. Moderated by
yours truly. 9:30am, on Wednesday&lt;/li&gt;

&lt;li&gt;&lt;a href="https://events.linuxfoundation.org/events/linuxcon-europe/poettering-sievers"&gt;systemd
Administration in the Enterprise&lt;/a&gt; by Kay Sievers and yours truly. 4:15pm, on
Wednesday, LinuxCon&lt;/li&gt;

&lt;li&gt;&lt;a href="https://events.linuxfoundation.org/events/embedded-linux-conference-europe/kooi"&gt;Integrating
systemd: Booting Userspace in Less Than 1 Second&lt;/a&gt; by Koen Kooi. 11:15am, on
Friday, ELCE&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;All of that at the Clarion Hotel. See you in Prague!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Sun, 23 Oct 2011 01:31:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2011-10-23:/blog/projects/linuxcon-europe.html</guid><category>projects</category></item><item><title>Plumbers Wishlist, The Second Edition</title><link>https://0pointer.net/blog/projects/plumbers-wishlist-2.html</link><description>
                
&lt;p&gt;Two weeks ago we published a &lt;a href="http://0pointer.de/blog/projects/plumbers-wishlist.html"&gt;Plumber's
Wishlist for Linux&lt;/a&gt;. So far, this has already created lively discussions in
the community (as reported on LWN among others), and patches for a few of the
items listed have already been posted (thanks a lot to those who worked on
this, your contributions are much appreciated!).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.google.com/document/pub?id=1RmJrtIoTnivkmR9KCqfJNBnEll4X9Jtu0xj5w6hFGs8"&gt;We
have now prepared a second version of the wish list.&lt;/a&gt; It includes a number
of additions (tmpfs quota! hostname change notifications! and more!) and
updates to the previous items, including links to patches, and references to
other interesting material.&lt;/p&gt;

&lt;p&gt;We hope to update this wishlist from time to time, so stay tuned!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.google.com/document/pub?id=1RmJrtIoTnivkmR9KCqfJNBnEll4X9Jtu0xj5w6hFGs8"&gt;And now, go and read the new wishlist!&lt;/a&gt;&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 20 Oct 2011 20:41:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2011-10-20:/blog/projects/plumbers-wishlist-2.html</guid><category>projects</category></item><item><title>Google doesn't like my name</title><link>https://0pointer.net/blog/projects/google-doesnt-like-my-name.html</link><description>
                
&lt;p&gt;Nice one, Google suspended my Google+ account because I created it under,
well, my name, which is "Lennart Poettering", and Google+ thinks that wasn't my
name, even though it says so in my passport, and almost every document I own
and I was never aware I had any other name. This is ridiculous. Google, give me
my name back! This is a really uncool move.&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 17 Oct 2011 18:50:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2011-10-17:/blog/projects/google-doesnt-like-my-name.html</guid><category>projects</category></item><item><title>Your Questions for the Kernel Developer Panel at LinuxCon in Prague</title><link>https://0pointer.net/blog/projects/kernel-hacker-panel.html</link><description>

&lt;p&gt;&lt;a href="https://plus.google.com/115547683951727699051/posts/SuTUvbcJ6p9"&gt;I
am currently collecting&lt;/a&gt; questions for the &lt;a href="https://events.linuxfoundation.org/events/linuxcon-europe/kernel-panel"&gt;kernel
developer panel at LinuxCon in Prague&lt;/a&gt;. If there's something you'd like the
panelists to respond to, please post it on &lt;a href="https://plus.google.com/115547683951727699051/posts/SuTUvbcJ6p9"&gt;the
thread&lt;/a&gt;, and I'll see what I can do. Thank you!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Mon, 17 Oct 2011 15:38:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2011-10-17:/blog/projects/kernel-hacker-panel.html</guid><category>projects</category></item><item><title>A Big Loss</title><link>https://0pointer.net/blog/projects/a-big-loss.html</link><description>
                
&lt;p&gt;&lt;a href="http://googleblog.blogspot.com/2011/10/fall-sweep.html"&gt;Google
announced today that they'll be shutting down Google Code Search in
January&lt;/a&gt;. I am quite sure that this will be a massive loss for the Free
Software community. The ability to learn from other people's code is a key
idea of Free Software. There's simply no better way to do that than with a
source code search engine. The day Google Code Search is shut down will
be a sad day for the Free Software community.&lt;/p&gt;

&lt;p&gt;Of course, there are a couple of alternatives around, but they all have one
thing in common: they, uh, don't even remotely compare to the completeness,
performance and simplicity of the Google Code Search interface, and have
serious usability issues. (For example: koders.com is really really slow, and
splits up identifiers you search for at underscores, which kinda makes it
useless for looking for almost any kind of code.)&lt;/p&gt;

&lt;p&gt;I think it must be of genuine interest to the Free Software community to
have a capable replacement for Google Code Search, for the day it is turned
off. In fact, it probably should be something the various foundations which
promote Free Software should be looking into, like the FSF or the Linux
Foundation. There are very few better ways to get Free Software into the heads
and minds of engineers than by examples -- examples consisting of real life
code they can find with a source code search engine. I believe a source code
search engine is probably among the best vehicles to promote Free Software
towards engineers. In particular if it itself was Free Software (in contrast to
Google Code Search).&lt;/p&gt;

&lt;p&gt;Ideally, all software available on web sites like SourceForge, Freshmeat, or
github should be indexed. But there's also a chance for distributions here:
indexing the sources of all packages a distribution like Debian or Fedora
includes would be a great tool for developers. In fact, a distribution offering
this functionality might itself benefit from it, as it attracts
developer interest in the distribution.&lt;/p&gt;

&lt;p&gt;It's sad that Google Code Search will be gone soon. But maybe there's
something positive in the bad news here, and a chance to create something better,
more comprehensive, that is free, and promotes our ideals better than Google
ever could. Maybe there's a chance here for the Open Source foundations, for
the distributions and for the communities to create a better replacement!&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 14 Oct 2011 23:05:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2011-10-14:/blog/projects/a-big-loss.html</guid><category>projects</category></item><item><title>Dresden, California, Poznan</title><link>https://0pointer.net/blog/photos/california.html</link><description>
                
&lt;p&gt;&lt;a href="http://0pointer.de/static/dresden.html"&gt;&lt;img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/dresden-small.jpeg" width="1024" height="291" alt="Hofkirche, Dresden, Saxony, Germany" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Hofkirche, Dresden, Saxony, Germany&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://0pointer.de/static/bastei.html"&gt;&lt;img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/bastei-small.jpeg" width="1024" height="260" alt="Bastei, Saxon Switzerland, Saxony, Germany" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Bastei, Saxon Switzerland, Saxony, Germany&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://0pointer.de/static/dresden2.html"&gt;&lt;img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/dresden2-small.jpeg" width="1024" height="370" alt="Fürstenzug, Dresden, Saxony, Germany" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;i&gt;F&amp;uuml;rstenzug, Dresden, Saxony, Germany&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://0pointer.de/static/california.html"&gt;&lt;img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/california-small.jpeg" width="1024" height="120" alt="Near California State Route 46, California, USA" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Near California State Route 46, California, USA&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://0pointer.de/static/california2.html"&gt;&lt;img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/california2-small.jpeg" width="1024" height="122" alt="Near Generals Highway, California, USA" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Near Generals Highway, California, USA&lt;/i&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="http://0pointer.de/static/california3.html"&gt;&lt;img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/california3-small.jpeg" width="1024" height="230" alt="Near Generals Highway, California, USA" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Near Generals Highway, California, USA&lt;/i&gt;, a bit further down the road.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://0pointer.de/static/poznan.html"&gt;&lt;img style="border: 10px solid #232729; background-color: #6b6c6; padding: 1px; -moz-border-radius: 7px; margin: 0.5cm" src="http://0pointer.de/static/poznan-small.jpeg" width="1024" height="183" alt="Parish Church in Poznan, Poland" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;i&gt;Parish Church in Poznan, Poland&lt;/i&gt;&lt;/p&gt;



        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Sun, 09 Oct 2011 21:32:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2011-10-09:/blog/photos/california.html</guid><category>photos</category></item><item><title>A Plumber's Wish List for Linux</title><link>https://0pointer.net/blog/projects/plumbers-wishlist.html</link><description>
                
&lt;p&gt;Here's a &lt;a href="http://thread.gmane.org/gmane.linux.kernel/1200272"&gt;mail
we just sent to LKML&lt;/a&gt;, for your consideration. Enjoy:&lt;/p&gt;

&lt;pre&gt;&lt;b&gt;Subject: A Plumber’s Wish List for Linux&lt;/b&gt;

We’d like to share our current wish list of plumbing layer features we
are hoping to see implemented in the near future in the Linux kernel and
associated tools. Some items we can implement on our own, others are not
our area of expertise, and we will need help getting them implemented.

Acknowledging that this wish list of ours only gets longer and not
shorter, even though we have implemented a number of other features on
our own in the previous years, we are posting this list here, in the
hope to find some help.

If you happen to be interested in working on something from this list or
able to help out, we’d be delighted. Please ping us in case you need
clarifications or more information on specific items.

Thanks,
Kay, Lennart, Harald, in the name of all the other plumbers


And here’s the wish list, in no particular order:

* (ioctl based?) interface to query and modify the label of a mounted
FAT volume:
A FAT label is implemented as a hidden directory entry in the file
system which needs to be renamed when changing the file system label,
this is impossible to do from userspace without unmounting. Hence we’d
like to see a kernel interface that is available on the mounted file
system mount point itself. Of course, bonus points if this new interface
can be implemented for other file systems as well, and also covers fs
UUIDs in addition to labels.

* CPU modaliases in /sys/devices/system/cpu/cpuX/modalias:
useful to allow module auto-loading of e.g. cpufreq drivers and KVM
modules. Andi Kleen has a patch to create the alias file itself. CPU
‘struct sysdev’ needs to be converted to ‘struct device’ and a ‘struct
bus_type cpu’ needs to be introduced to allow proper CPU coldplug event
replay at bootup. This is one of the last remaining places where
automatic hardware-triggered module auto-loading is not available. And
we’d like to see that fixed to make numerous ugly userspace work-arounds
to achieve the same go away.

* expose CAP_LAST_CAP somehow in the running kernel at runtime:
Userspace needs to know the highest valid capability of the running
kernel, which right now cannot reliably be retrieved from header files
only. The fact that this value cannot be detected properly right now
creates various problems for libraries compiled on newer header files
which are run on older kernels. They assume capabilities are available
which actually aren’t. Specifically, libcap-ng claims that all running
processes retain the higher capabilities in this case due to the
“inverted” semantics of CapBnd in /proc/$PID/status.

* export ‘struct device_type fb/fbcon’ of ‘struct class graphics’
Userspace wants to easily distinguish ‘fb’ and ‘fbcon’ from each other
without the need to match on the device name.

* allow changing argv[] of a process without mucking with environ[]:
Something like setproctitle() or a prctl() would be ideal. Of course it
is questionable if services like sendmail make use of this, but otoh for
services which fork but do not immediately exec() another binary being
able to rename these child processes in ps is of importance.

* module-init-tools: provide a proper libmodprobe.so from
module-init-tools:
Early boot tools, installers, driver install disks want to access
information about available modules to optimize bootup handling.

* fork throttling mechanism as basic cgroup functionality that is
available in all hierarchies independent of the controllers used:
This is important to implement race-free killing of all members of a
cgroup, so that cgroup member processes cannot fork faster than a cgroup
supervisor process could kill them. This needs to be recursive, so that
not only a cgroup but all its subgroups are covered as well.

* proper cgroup-is-empty notification interface:
The current call_usermodehelper() interface is an inefficient and
ugly hack. Tools would prefer anything more lightweight like a netlink,
poll() or fanotify interface.

* allow user xattrs to be set on files in the cgroupfs (and maybe
procfs?)

* simple, reliable and future-proof way to detect whether a specific pid
is running in a CLONE_NEWPID container, i.e. not in the root PID
namespace. Currently, there are a few ugly hacks available to detect
this (for example a process wanting to know whether it is running in a
PID namespace could just look for a PID 2 being around and named
kthreadd which is a kernel thread only visible in the root namespace),
however all these solutions encode information and expectations that
better shouldn’t be encoded in a namespace test like this. This
functionality is needed in particular since the removal of the ns
cgroup controller which provided the namespace membership information to
user code.

* allow making use of the “cpu” cgroup controller by default without
breaking RT. Right now creating a cgroup in the “cpu” hierarchy that
shall be able to take advantage of RT is impossible for the generic case
since it needs an RT budget configured which is from a limited resource
pool. What we want is the ability to create cgroups in “cpu” whose
processes get a non-RT weight applied, but for RT take advantage of the
parent’s RT budget. We want the separation of RT and non-RT budget
assignment in the “cpu” hierarchy, because right now, you lose RT
functionality in it unless you assign an RT budget. This issue severely
limits the usefulness of “cpu” hierarchy on general purpose systems
right now.

* Add a timerslack cgroup controller, to allow increasing the timer
slack of user session cgroups when the machine is idle.

* An auxiliary meta data message for AF_UNIX called SCM_CGROUPS (or
something like that), i.e. a way to attach sender cgroup membership to
messages sent via AF_UNIX. This is useful in case services such as
syslog shall be shared among various containers (or service cgroups),
and the syslog implementation needs to be able to distinguish the
sending cgroup in order to separate the logs on disk. Of course stm
SCM_CREDENTIALS can be used to look up the PID of the sender followed by
a check in /proc/$PID/cgroup, but that is necessarily racy, and actually
a very real race in real life.

* SCM_COMM, with a similar use case as SCM_CGROUPS. This auxiliary
control message should carry the process name as available
in /proc/$PID/comm.&lt;/pre&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Fri, 07 Oct 2011 01:22:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2011-10-07:/blog/projects/plumbers-wishlist.html</guid><category>projects</category></item><item><title>What You Need to Know When Becoming a Free Software Hacker</title><link>https://0pointer.net/blog/projects/hinter-den-kulissen.html</link><description>
                
&lt;p&gt;Earlier today I gave a presentation at the Technical University Berlin about
things you need to know, things you should expect and things you shouldn't
expect when you are aspiring to become a successful Free Software Hacker.&lt;/p&gt;

&lt;p&gt;I have put my slides up on Google Docs in case you are interested, either
because you are the target audience (i.e. a university student) or because you
need inspiration for a similar talk about the same topic.&lt;/p&gt;

&lt;p&gt;The first two slides are in German, so skip over them. The
interesting bits are all in English. I hope it's quite comprehensive (though of
course terse). Enjoy:&lt;/p&gt;

&lt;iframe src="https://docs.google.com/present/embed?id=dd4d9j2z_1r8fjkqc7" frameborder="0" width="410" height="342"&gt;&lt;/iframe&gt;

&lt;p&gt;In case your feed reader/planet messes this up, &lt;a href="https://docs.google.com/present/view?id=dd4d9j2z_1r8fjkqc7"&gt;here's the non-embedded version&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Oh, and thanks to everybody who &lt;a href="https://plus.google.com/115547683951727699051/posts/UqNgFiV3qTx"&gt;reviewed and suggested additions to the slides on +&lt;/a&gt;.&lt;/p&gt;

        </description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lennart Poettering</dc:creator><pubDate>Thu, 06 Oct 2011 22:05:00 +0200</pubDate><guid isPermaLink="false">tag:0pointer.net,2011-10-06:/blog/projects/hinter-den-kulissen.html</guid><category>projects</category></item></channel></rss>