Verneuil: S3-backed asynchronous replication for SQLite

We here at Backtrace are excited to share Verneuil, our MIT-licensed SQLite VFS (OS interface) extension that intercepts file writes and asynchronously replicates SQLite databases to S3-compatible blob stores, as well as a companion VFS that recreates snapshots from the replicated blob data. We migrated our internal repository, with full commit history, to this public repository where we will host all future development.

What does it do?

Verneuil is a Linux-only Rust crate (with a fair amount of C) that uses SQLite’s extension mechanism to access database files on disk exactly like the default unix ™ VFS, while asynchronously replicating snapshots to a blob store like S3. The extension also registers a verneuil_snapshot VFS that lets SQLite directly access these snapshots from remote storage. It can be loaded as a SQLite extension at runtime (verneuil_vfs), or invoked as a command line tool (verneuilctl). The extension is convenient for quick hacks at the SQLite shell or to test replication in preexisting programs, while verneuilctl is a versatile tool for system administrators that sometimes wish to pull off unexpected hacks like atomically updating replica db files behind an unsuspecting application’s back. However, for deeper integration, it makes more sense to link against the crate, either directly as a Rust library or via its C interface.

At first sight, Verneuil has very similar goals to Litestream, a daemon that ships SQLite WAL segments to S3. A key difference is that Litestream currently does not support read replicas, while Verneuil was designed to trivially offer (slightly stale) read snapshots.

For those familiar with SQLite, the most important difference may be that Verneuil only supports databases with rollback journals, while Litestream only supports write-ahead logging (WAL). Each journaling mode has its strengths, and it’s not clear that either is superior to the other. It’s definitely useful to open a SQLite database in WAL mode and let reads proceed concurrently with a write. On the other hand, rollback journaling works better for large transactions that insert a lot of data: SQLite in WAL mode must write everything twice (once to the WAL and once more to the database file), and we have seen WAL files grow to a few GBs and remain at that size until manual intervention. Most programs at Backtrace have always used rollback journals, so it made sense for us to keep using that journaling mode.

Our use case

We use Verneuil in production to replicate a few thousand SQLite databases. Many are small (2-3 MB), some are medium sized (1-2 GB); most are cold, and a few see multiple writes every second. The global average write rate across all these databases is in the low dozens of write transactions per second. We started rolling out Verneuil close to six months ago, and have yet to find a correctness problem. Even replication lag is usually less than a few seconds (80 detected instances of lag exceeding 5 seconds in the last 24 hours across all replicated databases, and none exceeding 60 seconds), despite freshness not being a design goal.

Just the simplification of our disaster recovery playbooks, now that we can restore seconds-old snapshots with verneuilctl, would justify the switch to Verneuil: the replication system was designed to avoid introducing new catastrophic failure modes at all costs, so we’re pretty confident that enabling replication for backups did not worsen our system’s stability. However, we’re even more excited by the prospect of easily migrating logic out of our large backend service: a lot of functionality works fine with stale data, at least as a fallback mechanism, and breaking that out will help us degrade gracefully.

The obvious question is “Why would we want to snapshot and replicate SQLite databases instead of using a real database server like Postgres?” The immediate answer is that we have a lot of C code that integrates with the SQLite API and assumes low latency queries. However, even if we were to redo it all from scratch, it’s not clear that adding a database server to our dependency graph would be a good idea. Backtrace supports different deployment strategies, from servers shared by hundreds of tenants, to dedicated instances for customers that contractually require isolation, to on-premises installations. An external database server would add to our support burden, and likely further differentiate multitenant installations from dedicated ones.

It’s also important to note that the SQLite databases only store metadata, and that metadata is order of magnitudes smaller than the data it describes. Even a 1GB database is surprisingly large. The current solution, where we shard metadata physically (one SQLite database for each shard) by customer (and by “project” under each customer) thus offers us more than sufficient performance while being uniform across our installations.

Of course, using SQLite instead of a database server makes it harder to use off-the-shelf high availability solutions, and impossible to access databases from remote machines. We wrote Verneuil to address this gap: we can now restore fresh snapshots from S3 even when EBS is down, and services that tolerate stale data can directly query read replicas in SQLite.

This is all gravy on top of a mature system that already works well enough, so our primary objective with Verneuil was to “do no harm.” That’s why the Verneuil VFS is fully compatible with the default unix ™ VFS, to preserve existing tooling and to let us roll forward and back seamlessly. It’s also why it buffers replication data on disk: to decouple the VFS from the workers that upload the data to blob stores. Given this design philosophy, and after exercising the VFS and hits internal consistency checks with SQLite’s TCL test suite, we felt comfortable rolling out replication globally over a few weeks, and we haven’t rolled back ever since.

What’s next for Verneuil?

We are already happy with Verneuil at Backtrace: just the ability to treat replication as a streaming backup system makes it a lot easier to recover from disasters. We also quickly started using read replicas for internal tools that would otherwise require dedicated API endpoints. We are currently exploring the combination of Verneuil with SQLite databases that only store protobuf objects (one per row), using views and functional indexes to ergonomically support queries. The initial goal was to avoid schema migration pain (especially for on-premises installations), but this storage strategy could also make it a lot easier to robustly access read replicas from loosely coupled services.

Verneuil feels like a versatile tool, and we can’t be the only ones who find its guarantees interesting. That’s why we decided to open source it in its current state: it’s solid enough for our production usage, the code has fundamentally been stable for a few months, and it seems like the quickest way to improve Verneuil is to expose it to the wider world. There’s a lot to improve. The project needs external documentation, there are a few uber modules that could be split up, testing with SQLite’s TCL suite can be finicky, and a lot of tunables that should be configurable are hardcoded to values that make sense for us.

If live read replication from SQLite to S3 sounds like your kind of hack, give Verneuil a whirl, and help us make it a more robust tool. Pull requests might be a heavy lift until we streamline testing, but filing issues for missing documentation or functionality is always helpful: we made Verneuil to fit our needs, so it definitely reflects our idiosyncrasies.

The Linux VFS might also be of independent interest: it’s compatible with SQLite’s default unix ™ VFS, but has none of the backward compatibility baggage. That makes it a good starting point for alternative storage approaches.