Wax AI memory: the SQLite-style file that keeps agents offline
Build private, on-device assistants with Wax, a single-file memory engine that combines BM25, vector search, and crash recovery.
A lot of “AI agents” look independent until you ask them to remember something. Then the hidden dependency chain shows up: a vector database, a separate search service, background jobs, auth, monitoring, backups, and a new question you did not want to answer yet: which machine is holding the user’s data?
Wax is an attempt to cut that loop by turning “memory” into a local artifact you can ship, back up, and audit. The project describes itself as “the SQLite for AI memory,” which is a useful mental model if you have ever relied on SQLite to keep an app simple and portable.
Why local memory keeps your agent honest
When the memory layer lives in your infrastructure, it becomes your product. It is easy to start “offline-first” and later add a hosted index “for convenience,” until the hosted part is the only version that gets attention.
Wax pushes in the opposite direction. Its core goal is “zero network calls,” so the memory store stays on the device unless you choose to move it. That changes the incentives in a few practical ways:
Privacy by default: a local file reduces the risk of accidental data exfiltration to hosted RAG services.
Lower attack surface: fewer daemons, fewer open ports, fewer credentials, fewer surprise integrations.
Portability: one file is easy to back up, move between machines, and keep under your own retention policy.
Less account risk: there is no vendor dashboard that can throttle you, deplatform you, or decide your content needs review.
Wax makes the autonomy argument in a very concrete way. If the memory is a file you own, it is hard for anyone else to gatekeep it.
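The portability claim is easy to act on: because everything lives in one file, a backup or migration is a plain copy. The paths below are placeholders:

```shell
# Back up the whole memory store: documents, indexes, embeddings, and WAL
# all travel together inside the single .mv2s file.
# Assumes brain.mv2s already exists (e.g. created by the demo below).
mkdir -p backups
cp brain.mv2s "backups/brain-$(date +%F).mv2s"
# Restore is the same single-file copy in reverse.
```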
What Wax actually is
Wax is a Swift-native memory engine designed for on-device apps and local tooling. You create a memory store, ingest text (and optionally richer media workflows), then ask it to recall relevant context with deterministic token budgeting.
The novelty is not “it can do vector search.” Wax describes a hybrid retrieval approach that runs multiple lanes in parallel, including BM25, vector search, temporal signals, and structured evidence. It then fuses the results based on the query type. It also claims strict, reproducible token counting using cl100k_base, so “same query equals same context” becomes a property you can test.
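To make the multi-lane idea concrete, here is an illustrative sketch of reciprocal rank fusion (RRF), one common way to merge independently ranked result lists. Wax’s internal fusion is its own design; this only shows why agreement across lanes beats any single lane:

```swift
// Illustrative RRF sketch; not Wax's actual fusion logic.
func fuseRankedLists(_ lanes: [[String]], k: Double = 60) -> [String] {
    var scores: [String: Double] = [:]
    for lane in lanes {
        for (rank, docID) in lane.enumerated() {
            // A document ranked well in *any* lane accumulates score;
            // appearing in several lanes compounds it.
            scores[docID, default: 0] += 1.0 / (k + Double(rank + 1))
        }
    }
    return scores.sorted { $0.value > $1.value }.map(\.key)
}

let bm25Lane   = ["doc3", "doc1", "doc7"]  // exact-term matches
let vectorLane = ["doc1", "doc5", "doc3"]  // semantic neighbors
let fused = fuseRankedLists([bm25Lane, vectorLane])
// doc1 and doc3 appear in both lanes, so they outrank single-lane hits.
```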
If you want the canonical overview and the evolving API surface, the Wax repository on GitHub is where the project documents its design.
What lives inside a .mv2s file
Wax stores memory in a single .mv2s file. The README says that file bundles:
Raw documents
Embeddings (any dimension, any provider)
A BM25 full-text index (FTS5)
A vector index using HNSW (USearch)
A write-ahead log for crash recovery
Metadata plus an entity graph
Wax also describes the file format as self-contained with an append-only layout, checksum verification, and a dual-header “atomic switch” design.
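As a rough mental model of a dual-header atomic switch (conceptual only, not Wax’s real on-disk layout): two fixed header slots sit at the front of the file, each checksummed. A commit appends data, then overwrites the older slot with a new root offset and generation number. A crash mid-commit corrupts at most one slot, and recovery picks the newest slot that still checksums correctly:

```swift
// Conceptual sketch of dual-header recovery; not Wax's real format.
struct Header {
    var generation: UInt64   // monotonically increasing commit counter
    var rootOffset: UInt64   // where the latest committed data begins
    var checksum: UInt64     // detects a torn or partial header write
}

// On open, a slot that fails its checksum is passed in as nil.
// The surviving slot still points at the previous consistent state.
func chooseHeader(_ slotA: Header?, _ slotB: Header?) -> Header? {
    switch (slotA, slotB) {
    case let (a?, b?): return a.generation >= b.generation ? a : b
    case let (a?, nil): return a
    case let (nil, b?): return b
    default: return nil
    }
}
```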
This packaging detail matters because “memory” is usually the place where local-first projects quietly reintroduce cloud dependencies. If the index is a separate service, it starts to feel “normal” to host it. If the index is inside the same artifact as the data, it is much easier to keep everything local.
How Wax retrieves context without surprises
In practice, “memory” breaks down in two common ways.
One is relevance. Pure vector similarity is often good, but it can miss exact matches, recent events, and structured facts. Wax’s hybrid approach is explicitly trying to avoid that by blending multiple retrieval strategies instead of choosing a single one.
The second is prompt instability. If your retrieval step does not have deterministic budgeting, you can get different context from the same query when the store grows or the system changes. Wax claims deterministic token counting with cl100k_base, which aims to make the retrieval step reproducible, not just fast.
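To see why deterministic budgeting matters, here is a toy sketch. Whitespace splitting stands in for real cl100k_base counting, and this is not Wax’s API; the point is that the same ranked input and budget always produce byte-for-byte the same context:

```swift
// Toy sketch of deterministic token budgeting; not Wax's API.
func packContext(_ ranked: [String], budget: Int) -> [String] {
    var used = 0
    var packed: [String] = []
    for text in ranked {
        let cost = text.split(separator: " ").count  // stand-in token count
        guard used + cost <= budget else { break }   // never truncate mid-item
        packed.append(text)
        used += cost
    }
    return packed
}

let ranked = [
    "User prefers dark mode",                   // 4 "tokens"
    "User gets headaches from bright screens",  // 6 "tokens"
    "User asked about keyboard shortcuts",      // 5 "tokens"
]
let context = packContext(ranked, budget: 10)
// Only the first two items fit the 10-token budget, every time.
```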
Wax also publishes Apple-Silicon-focused performance numbers in its README, including sub-millisecond warm vector search at 10,000 documents and comparisons across cold-start and CPU paths. You should still treat any benchmark as hardware- and workload-dependent, but the intent is clear: the project is optimizing for on-device speed.
Quick walkthrough 1: 30-second “it works” demo
Wax’s pitch is straightforward. No Docker, no vector DB, no network calls. The fastest way to evaluate that is to run the minimal loop: create a memory file, store a fact, and recall it.
Create a small Swift executable project
mkdir wax-demo && cd wax-demo
swift package init --type executable

Add Wax to Package.swift
Add the dependency:
dependencies: [
    .package(url: "https://github.com/christopherkarani/Wax.git", from: "0.1.6")
]

Also list the package in your executable target’s dependencies so that import Wax resolves.
Paste this into Sources/main.swift

import Foundation
import Wax

// main.swift runs top-level code directly: the @main attribute is not
// allowed in this file, and top-level `await` is supported.
let storeURL = URL(fileURLWithPath: "brain.mv2s")
let brain = try await MemoryOrchestrator(at: storeURL)

try await brain.remember(
    "User prefers dark mode and gets headaches from bright screens",
    metadata: ["source": "onboarding"]
)

let context = try await brain.recall(query: "user preferences")
for item in context.items {
    print("[\(item.kind)] \(item.text)")
}
Run it
swift run

Your first success test is simple: the output should include the remembered preference, along with any other ranked context Wax returns for that query.
Quick walkthrough 2: Dropping Wax into a real iOS or macOS app
A lot of memory libraries feel neat in a CLI demo, then get awkward the moment you have to deal with real app lifecycles, sandbox paths, and persistence. Wax is clearly aiming for that world.
Add the package in Xcode
Use Xcode’s “Add Package Dependency” flow and paste the repo URL. Apple documents the steps in Adding package dependencies to your app.
Store the .mv2s file in Application Support
Create a stable path:
let appSupport = try FileManager.default.url(
    for: .applicationSupportDirectory,
    in: .userDomainMask,
    appropriateFor: nil,
    create: true
)
let storeURL = appSupport.appendingPathComponent("brain.mv2s")
let brain = try await MemoryOrchestrator(at: storeURL)
Write memory at onboarding, recall later
Save stable preferences, FAQ answers, prior actions, and user-provided documents. Then recall context before you call your local LLM, so your prompts stay small and relevant.
If you are building a local assistant that must work offline, this pattern matters. The memory file becomes part of the app’s local state, like a database that is purpose-built for retrieval.
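Using the same recall call from the demo, the recall-before-generate loop might look like the sketch below. Only `recall` comes from the earlier snippet; `runLocalLLM` is a hypothetical placeholder for whatever local inference call your app makes:

```swift
// Sketch: ground the prompt in recalled memory before local inference.
// `runLocalLLM` is hypothetical, not part of Wax.
func answer(_ question: String, brain: MemoryOrchestrator) async throws -> String {
    let context = try await brain.recall(query: question)
    let facts = context.items.map { "- \($0.text)" }.joined(separator: "\n")
    let prompt = """
    Known context:
    \(facts)

    Question: \(question)
    """
    // Keeps the prompt small and relevant instead of dumping the whole store.
    return try await runLocalLLM(prompt)
}
```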
Hardware requirements and sensible buying options
Wax lists its requirements in the README as Swift 6.2, iOS 26 or macOS 26, and Apple Silicon for Metal GPU features.
What that means in practice depends on the size of your memory stores and how much you care about GPU acceleration.
Minimum (CPU-first, small memory stores)
A modern Apple Silicon Mac, like an Apple Mac mini (M2).
At least 16GB unified memory.
Storage headroom for .mv2s files. A 1TB NVMe SSD is a comfortable starting baseline, and external storage can work if you want portable archives.
Recommended (smoother dev, bigger stores, faster search)
32GB unified memory if you expect larger stores and heavier local workflows.
Fast local storage like a 2TB NVMe SSD.
If you are targeting iPhone on-device apps
Test on a recent Pro device so you are not tuning performance on the slow path. For example, an iPhone 15 Pro gives you a higher-performance baseline for profiling.
Licensing and openness
Wax is Apache-2.0 licensed. It also does not ship model weights.
If you add your own embedding model, whether that is CoreML, a local server, or something else, the license for that model is separate. Wax explicitly describes itself as able to store embeddings of any dimension from any provider, which keeps the storage layer decoupled from the model choice.
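In your own code, that decoupling can be expressed as a small provider abstraction. This is a sketch: the protocol name and shape are hypothetical, not Wax API, and the hash “embedding” is a self-contained stand-in for a real model:

```swift
// Hypothetical app-level abstraction; not part of Wax's API.
// The store only needs vectors of a declared dimension, so the model
// behind them (CoreML, a local server, etc.) stays swappable.
protocol EmbeddingProvider {
    var dimension: Int { get }
    func embed(_ text: String) throws -> [Float]
}

// Toy deterministic "embedder" so the sketch runs without a model.
struct HashEmbedder: EmbeddingProvider {
    let dimension = 8
    func embed(_ text: String) throws -> [Float] {
        var v = [Float](repeating: 0, count: dimension)
        for (i, byte) in text.utf8.enumerated() {
            v[i % dimension] += Float(byte) / 255
        }
        return v
    }
}
```

A real provider would typically be async and call into CoreML or a local inference server; the storage side does not change.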
Gotchas you should know before you bet on it
A single-file memory store can make local AI feel simple again. It also concentrates risk.
Your .mv2s file is sensitive because it can contain raw documents alongside embeddings and indexes. If you copy that file to cloud backup without thinking, you can undo the privacy benefits you were aiming for. Full-disk encryption, app container protections, and careful handling of exports matter.
The README does not advertise an encryption feature. That is not necessarily a deal-breaker, but it means the primary protection is the operating system and your own threat model.
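One concrete mitigation on Apple platforms is to keep the store out of device backups, using the standard Foundation resource value. Whether you want this depends on your own retention policy, and it assumes the file already exists:

```swift
import Foundation

// Mark the memory file so it is excluded from iCloud/device backups,
// keeping the raw documents inside it from silently leaving the device.
var storeURL = URL(fileURLWithPath: "brain.mv2s")
var values = URLResourceValues()
values.isExcludedFromBackup = true
try storeURL.setResourceValues(values)
```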
Platform requirements are also aggressive. Swift 6.2 plus iOS 26 or macOS 26 will exclude older devices, which matters if you are shipping broadly.
Finally, the performance story is Apple-Silicon-centric. Wax highlights Metal acceleration and benchmarks on Apple hardware. If you are running CPU-only, expect slower vector search even if it still works for many workloads.
The practical takeaway
Wax is a sharp example of where local AI tooling is heading. Fewer moving parts. Fewer cloud assumptions. More capability packed into artifacts you can carry.
If you are building privacy-first assistants, offline-first iOS apps, or local agent tooling, Wax looks worth a weekend. Start with the 30-second demo, then move the .mv2s file into your real app container and treat it like the memory database it wants to be.
If it holds up under real workloads, Wax is the kind of project that shifts leverage back to users and developers, simply because it makes “run it yourself” the default again.