
A Rolling Boulder Gathers No Moss

We actually did it. Super pleased to announce that moss is now capable of installing and removing packages. Granted, super rough, but gotta start somewhere right?

Transactional system roots + installation in moss

OK, let’s recap. A moss archive is super weird, and consists of multiple containers, or payloads. We use a strongly typed binary format, per-payload compression (currently zstd), and don’t store files in a typical archive fashion.

Instead, a .stone file (moss archive) has a Content Payload, which is a compressed “megablob” of all the unique files in a given package. The various files contained within that megablob are described in an Index Payload, which simply contains some IDs and offsets, acting much like a lookup table.

That data alone doesn’t actually tell us where files go on the filesystem when installed. For that, we have a specialist Layout Payload, encoding the final layout of the package on disk.
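To make the three-payload split concrete, here is a minimal sketch in Python. All names, field layouts and entry shapes here are hypothetical illustrations of the idea, not the actual .stone binary format:

```python
# Illustrative sketch only: names and field choices are hypothetical,
# not the actual .stone binary encoding.
blob = b"".join([b"#!/bin/sh\n", b"hello world\n"])  # the "megablob"

# Index Payload: lookup table from file ID to (offset, size) in the blob
index = {
    1: (0, 10),   # unique file contents #1
    2: (10, 12),  # unique file contents #2
}

# Layout Payload: where each entry lands on disk when installed
layout = [
    ("usr/bin/hello", "regular", 1),       # regular file -> content ID
    ("usr/share/doc", "directory", None),  # directories carry no content
    ("usr/bin/hi", "symlink", "hello"),    # symlinks store their target
]

def read_file(file_id):
    """Slice one unique file's bytes back out of the megablob."""
    offset, size = index[file_id]
    return blob[offset:offset + size]

print(read_file(1))  # b'#!/bin/sh\n'
```

The point of the split: identical file contents are stored once in the Content Payload, while the Layout Payload can reference the same content ID from many paths.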

As you can imagine, this weirdness made it quite difficult to install packages in a trivial fashion.

What was missing? Persistence, really. Thanks to RocksDB and our new moss-db project, we can trivially store the information we need from each package we “precache”. Primarily, we store full system states within our new StateDB, which at present is simply a series of package ID selections grouped by a unique 64-bit integer.

Additionally we remember the layouts within the LayoutDB so that we can eventually recreate said layout on disk.
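The StateDB idea can be sketched in a few lines. This is a hypothetical in-memory stand-in, purely to show the shape of the data; the real implementation sits on RocksDB via moss-db:

```python
# Hypothetical sketch: each system state is just a set of package ID
# selections, keyed by a unique 64-bit integer. New states are derived
# from the previous one by adding or removing selections.
states = {}

def new_state(state_id, selections):
    states[state_id] = frozenset(selections)

new_state(1, {"glibc", "zlib"})

# The next state copies the last one, plus/minus selections
new_state(2, states[1] | {"nano"})

print(sorted(states[2]))  # state 2 adds nano on top of state 1
```

Because every state is retained rather than mutated in place, rolling back is just a matter of pointing the system at an older state ID.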

Before we actually commit to an install, we try to precache all of the stone files into our pool: we unpack the Content Payload (“megablob”) and split it into its unique files in the pool, ready for use. At this point we also record the layouts, but do not “install” the package to a system root.

This is our favourite step. When our cache is populated, we gather all relevant layouts for the current selections, and then begin applying them in a new system root. All directories and symlinks are created as normal, whereas any regular file is hardlinked from the pool. This process takes a fraction of a second and gives us completely clean, deduplicated system roots.
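The blit step above can be sketched as follows. This is a minimal illustration, not moss code: the `(path, kind, target)` entry shape and the `blit` helper are assumptions for the example, but the mechanism (directories and symlinks created normally, regular files hardlinked from the pool) is as described:

```python
import os

# Minimal sketch of "blitting" a system root: directories and symlinks
# are created as normal, while regular files are hardlinked from the
# pool, so no file data is ever copied.
def blit(root, layout, pool):
    for path, kind, target in layout:
        dest = os.path.join(root, path)
        if kind == "directory":
            os.makedirs(dest, exist_ok=True)
        elif kind == "symlink":
            os.symlink(target, dest)
        else:  # regular file: hardlink from the pool
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            os.link(os.path.join(pool, target), dest)
```

Since a hardlink is just another directory entry for the same inode, each new root costs only the metadata of its tree, which is why building one takes a fraction of a second.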

Currently these live in /.moss/store/root/$ID/usr. To complete the transaction, we atomically update /usr to point to the new usr tree, assuming a reboot isn’t needed. In future, boot switch logic will update the tree for us.
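The atomic pointer update is the classic symlink-and-rename trick. A hedged sketch (the `switch_usr` helper is illustrative, not moss's actual code):

```python
import os

# Sketch of the atomic /usr switch: build the new symlink under a
# temporary name, then rename() it over the old one. rename() is atomic
# on POSIX filesystems, so any reader observes either the old tree or
# the new tree, never a half-updated state.
def switch_usr(usr_link, new_target):
    tmp = usr_link + ".tmp"
    os.symlink(new_target, tmp)
    os.rename(tmp, usr_link)
```

Rollback then falls out for free: re-pointing the link at a previous state's root is the same one-line operation.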

Removal

Removal is much the same as installation. We simply remove the package IDs from the new state selections (copied from the last state) and blit a new system root, finally updating the atomic /usr pointer.

We retain classic package management traits such as having granular selections, multiple repositories, etc, whilst sporting advanced features like full system deduplication and transactions/rollbacks.

When we’re far enough along, it’ll be possible to boot back to the last working transaction without requiring an internet connection. Due to the use of pooling and hardlinks, each transaction tree is only a few KiB, with files shared between each transaction/install.

We still need some major cleanups: better error handling, logging, timed functions, and an event-loop driven process to allow parallel fetching/precaching prior to final system rootfs construction.

It’s taken us a very long time to get to this point, and there is still more work to be done. However this is a major milestone and we can now start adding features and polish.

Once the required features are in place, we’ll work on the much needed pre-alpha ISO :) If you fancy helping us get to that stage quicker, do check out our OpenCollective! (We won’t limit pre-alpha availability, don’t worry :))

Moss DB Progress

I’ll try to make this update as brief as I can but it’s certainly an important one, so let’s dive right into it. The last few weeks have been rough but work on our package manager has still been happening. Today we’re happy to reveal another element of the equation: moss-db.

Putting moss-db to the test

moss-db is an abstract API providing access to simplistic “Key Value” stores. We had initially used some payload based files as databases but that introduced various hurdles, so we decided to take a more abstract approach to not tie ourselves to any specific implementation of a database.

Our main goal with moss-db is to encapsulate the RocksDB library, providing sane, idiomatic access to a key value store.

At the highest level, we needed something that could store arbitrary keys and values, grouped by some kind of common key (commonly known as “buckets”). We’ve succeeded in that abstraction, which also required us to fork a rocksdb binding to add the Transform APIs we needed.

Additionally, we required idiomatic range behaviours for iteration, as well as generic access patterns. To that effect, we can now foreach a bucket, pipe it through the awesomely powerful std.algorithm APIs, and automatically encode/decode keys and values through our generic APIs by implementing the mossdbEncode() and mossdbDecode() functions for a specific type.
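The bucket idea can be illustrated with a tiny Python sketch. This is not moss-db's implementation: moss-db groups keys through RocksDB prefix transforms and encodes values via mossdbEncode()/mossdbDecode(), whereas here a plain dict keyed by (bucket, key) tuples stands in for the store:

```python
# Hypothetical stand-in for a bucket-grouped key-value store.
store = {}

def put(bucket, key, value):
    store[(bucket, key)] = value

def scan(bucket):
    # Iterate every (key, value) pair belonging to one bucket,
    # ready to be piped through further transformations
    return ((k, v) for (b, k), v in store.items() if b == bucket)

put("ages", "john", 100)
put("ages", "mary", 30)
put("names", "user 100", "bobby is my name")

print(dict(scan("ages")))
```

The D equivalent gains strong typing on top of this: the bucket scan yields decoded values of a known type rather than raw bytes.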

In a nutshell, this was the old, ugly, hard way:

/* old, hard way */
auto nameZ = name.toStringz();
int age = 100;
ubyte[int.sizeof] ageEncoded = nativeToBigEndian(age);
db.setDatum(cast(ubyte[]) (nameZ[0 .. strlen(nameZ)]), ageEncoded);

And this is the new, shmexy way:

db.set("john", 100);
db.set("user 100", "bobby is my name");

auto result = db.get!int("john");
if (result.found)
{
    writeln(result.value);
}

auto result2 = db.get!string("user 100");
if (result2.found)
{
    writeln(result2.value);
}

It’s quite easy to see that the new API lends itself robustly to our needs, allowing us to implement stateful, strongly typed databases for moss.

Even though some APIs in moss-db may still be lacking (remove, for example) we’re happy that it can provide the foundation for our next steps. We now need to roll out the new StateDB, MetaDB and LayoutDB, to record system states, package metadata, and filesystem layout information, respectively.

With those three basic requirements in place, we can combine the respective pieces into installation routines, which, clearly, warrants another blog post … :)

For now you can see the relevant projects on our GitLab project.

Initial Performance Testing

With further progress on boulder, we can now build native stone packages with some easy tweaks such as profile guided optimization (PGO) and link time optimization (LTO). That means we can take a first look at what the performance of a first cut of Serpent OS suggests for the future. The tests were conducted using benchmarking-tools, with Serpent OS measured in a chroot on the same host, with the same kernel and config.

One of the key focuses early in the project is reducing build time. Every feature can either add to or subtract from the time it takes to produce a package. With a source/binary hybrid model, users will greatly benefit from faster builds as well. These tests target the performance of clang itself, plus some compiler flag options tested via cmake.

clang has always been a compiler with a big future. Its performance credentials have been improving with each release, and it is now starting to perform strongly against its GNU counterpart. It is common to hear that clang is slow and produces less optimized code. I will admit that most distros provide a slow build of clang, but that will not be the case in Serpent OS.

It is important to note that in this comparison the host distro has pulled in some patches from LLVM-13 that greatly improve the performance of clang. Prior to this, its cmake and configure tests actually took 50% longer, but only 10% longer for compiling. boulder does not yet support patching in builds, so the packages are completely vanilla.

| Test using clang | Serpent | Host | Difference |
|---|---|---|---|
| cmake LLVM | 5.89s | 10.58s | 79.7% |
| Compile -j4 llvm-ar | 126.16s | 155.32s | 23.1% |
| configure gettext | 36.64s | 63.55s | 73.4% |

Based on the results during testing, the performance of clang in Serpent OS still has room to improve, as this was just a quick tuning pass. At stages where I would have expected to be ahead already, compile performance was only equal (though cmake and configure were still well ahead).

While clang is the default compiler in Serpent OS, there may be instances where its performance is not quite where it could be. It is common to see software with more optimized code paths that are not tested with clang upstream. As an example, here are a couple of patches in flac (1, 2) that demonstrate this being improved. Using benchmarking-tools, it is easy to see via perf results where gcc and clang builds are running different functions.

In circumstances where the slowdown is due to hitting poor optimization paths in clang, we always have the option of building packages with gcc. The GNU toolchain remains essential for building glibc in any case, so having a solid GNU toolchain is important, but small compile time improvements there won’t be noticed by users or developers as much.

| Test using gcc | Serpent | Host | Difference |
|---|---|---|---|
| cmake LLVM | 7.00s | 7.95s | 13.6% |
| Compile llvm-ar | 168.11s | 199.07s | 18.4% |
| configure gettext | 45.45s | 51.93s | 14.3% |

While the current bootstrap exists only as a starting point for building the rest of Serpent OS, there are some other packages we can easily test and compare. Here’s a summary of those results.

| Test | Serpent | Host | Difference |
|---|---|---|---|
| Pybench | 1199.67ms | 1024.33ms | -14.6% |
| xz Compress Kernel (-3 -T1) | 42.67s | 46.57s | 9.1% |
| xz Compress Kernel (-9 -T4) | 71.25s | 76.12s | 6.8% |
| xz Decompress Kernel | 8.03s | 8.18s | 1.9% |
| zlib Compress Kernel | 12.60s | 13.17s | 4.5% |
| zlib Decompress Kernel | 5.14s | 5.21s | 1.4% |
| zstd Compress Kernel (-8 -T1) | 5.77s | 7.06s | 22.3% |
| zstd Compress Kernel (-19 -T4) | 51.87s | 66.52s | 28.3% |
| zstd Decompress Kernel | 2.90s | 3.08s | 6.3% |

From my experience testing the bootstrap, it is clear there are some cobwebs in there that will require more iterations of the toolchain. There also seem to be some slowdowns from not yet including all the dependencies of some packages. Once more packages are included, all the testing will naturally be redone and help influence the default compiler flags of the project.

It’s not yet clear what the experience of using libc++ vs libstdc++ with the clang compiler will be. Once the cobwebs are out and Serpent OS is further developed, the impact (if any) should become more obvious. There are also some parts not yet included in boulder by default, such as stripping files, LTO and other flags that will speed up loading libraries. At this stage this is deliberate, until outputs from builds (such as symbol information) are integrated.

But this provides an excellent platform to build out the rest of the OS. The raw speed of the clang compiler will make iterating and expanding the package set a real joy!

Very astute of you to notice the Pybench result! python in its current state is an absolute minimal build, just enough to run meson. However, I did an analyze run in benchmarking-tools where it became obvious that the two builds were doing completely different things.

Apples and oranges comparison

For now I’ll simply assume this will sort itself out when python is built complete with all its functionality. And before anyone points the finger at clang, you get the same result with gcc.

Boulder Keeps On Rolling

Squirrelling away in the background have been some great changes that bring boulder closer to its full potential. Here’s a quick recap of some of the more important ones.

Boulder hard at work

  • Fixed a path issue that prevented manifests from being written for 32bit builds
  • Added keys to control where the tarballs are extracted to
    • This results in a greatly simplified setup stage when using multiple upstreams
  • More customizations to control the final c{,xx}flags exported to the build
  • Added a key to run at the start of every stage so definitions can be exported easily in the stone.yml file
  • Fixed an issue where duplicate hash files were being included in the Content Payload
  • This reduced the Content Payload size by 750MB for a glibc build with duplicate locale files
  • Finishing touches on profile guided optimization (PGO) builds - including clang’s context-sensitive PGO
    • Fixed a few typos in the macros to make it all work correctly
    • Profile flags are now added to the build stages
    • Added the llvm profile merge steps after running the workload
    • Recreate a clean working directory at the start of each PGO phase

With all this now in place, the build stages of boulder are close to completion. But don’t worry, there are plenty more great features to come to make building packages for Serpent OS simple, flexible and performant. Next steps will be testing out these new features to see how much they add to overall stage4 performance.

Let There Be Databases

We haven’t been too great on sharing progress lately, so welcome to an overdue update on timelines, progress, and database related shmexiness.

Emerging DB design

OK, so you may remember moss-format, our module for reading and writing moss binary archives. It naturally contains much in the way of binary serialisation support, so we’ve extended the format to support “database” files. In reality, these are more like binary-encoded tables stored in a single file, identified by a filepath.

The DB archives are currently stored without compression to ensure 0-copy mmap() access when loading from disk, as a premature optimisation. This may change in future if we find the DB files taking up too much disk space.
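To show why uncompressed storage enables 0-copy access, here is a small sketch. This is illustrative Python, not moss-format itself: the file is mapped once and records are read through a view over the mapping, rather than being decompressed into fresh buffers:

```python
import mmap

# Sketch (not moss-format's actual code): map the DB file read-only and
# hand back a zero-copy window onto it. Record reads then become slices
# of the mapping instead of allocations plus decompression.
def open_db(path):
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return memoryview(mapped)  # zero-copy view over the mapped file
```

With compression in the picture, every load would need a decompress step into newly allocated memory, which is exactly the copy this layout avoids.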

So far we’ve implemented a “StateMetaDB”, which stores metadata on every recorded State on the system. Right now I’m in the process of implementing the “StateEntriesDB”, which is akin to a binary encoded dpkg selections file with candidate specification reasons.

Next on the list is the LayoutsDB (file manifests) and the CacheDB, for recording refcounts of every cached file in the OS pool.

An interesting trial we’re currently implementing is to hook the DB implementation up to our Entity Component system from the Serpent Engine, in order to provide fast, cache coherent, in memory storage for the DB. It’s implemented using many nice DLang idioms, allowing the full use of std.algorithm APIs:

auto states()
{
    import std.algorithm : map;

    auto view = View!ReadOnly(entityManager);
    return view.withComponents!StateMetaArchetype
        .map!((t) => StateDescriptor(t[1].id, t[3].name, t[4].description,
                t[1].type, t[2].timestamp));
}

...

/* Write the DB back in ascending numerical order */
db.states
    .array
    .sort!((a, b) => a.id < b.id)
    .each!((s) => writeOne(s));

Ok, so you can see we need basic DB types for storing the files for each moss archive, plus each cache and state entry. If you look at the ECS code above, it becomes quite easy to imagine how this will impact installation of archives. Our new install code will simply modify the existing state, cache the incoming package, and apply the layout from the DB to disk, before committing the new DB state.

In essence, our DB work is the current complex target, and installation is a <50 line trick tying it all together.

/* Pseudocode */
State newState...
foreach (pkgID; currentState.filter!((s) => s.reason == SelectionReason.Explicit))
{
    auto fileSet = layoutDB.get(pkgID);
    fileSet.array.sort!((a, b) => a.path < b.path).each!((f) => applyLayout(f));
    /* Record into new state */
    ...
}

Til next time -

Ikey