Brain-inspired · Self-aware · Written in Rust · Open source

Lethe is an AI assistant
who remembers you.

Most assistants reset the moment you close the tab. Lethe doesn't. She lives on your machine as a Rust service (cortex, hippocampus, default-mode network, and brainstem, each on its own clock), and she keeps thinking when you're not looking. Past conversations stay in reach. Things you mentioned weeks ago come back when they matter. And when she finds something worth fixing, she can read her own source code and propose the change.

She holds the thread for as long as you do.
A longer context window isn't the point. What you want is something that remembers the work, notices what slipped, and decides, on its own, when something is worth interrupting you for.
she remembers what you were doing two weeks ago
you: how's the thing going
lethe: The Springer resubmission? You uploaded the revised manuscript on the 2nd but I never saw a confirmation email come through. Want me to check the submission portal?
she hands work off and keeps talking
you: audit the deployment scripts
lethe: Already on it. Sent a subagent to do the review. It'll check in every couple of minutes and ping me with the result. What else?
she catches what fell through the cracks
lethe: You haven't mentioned the visa renewal since March 12. The window closes in 9 days. Want me to draft the email to the consulate, or are you already on it?
The brain names aren't metaphors.
Each region is a real actor with its own job, its own clock, and its own logs. The architecture isn't loosely "inspired by" neuroscience; it's mapped to it.
cortex
The voice you talk to. Picks tools, hands work off, decides when to step in and when to wait.
hippocampus
Pulls relevant memories (notes, archives, old conversations), biased toward what feels load-bearing right now.
dmn
Default-mode network. Thinks between turns, drifts across goals, occasionally catches something the rest of the system missed.
brainstem
The autonomic layer. Boots the system, watches resources, runs release checks, keeps the lights on.
subagents
One-task workers spawned for specific jobs. They report progress and come back with a clean result.
attention gate
Decides whether a background thought is worth interrupting you for. Most aren't. The ones that are, get through.
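
To make the attention gate concrete, here is a minimal sketch of the idea: a background thought carries a salience score, and only thoughts at or above a threshold interrupt you. This is not Lethe's actual code. The `Thought`, `Decision`, and `AttentionGate` names, the 0.0 to 1.0 salience scale, and the single-threshold rule are all assumptions made for illustration.

```rust
/// A background thought produced between turns.
/// Hypothetical shape, not taken from Lethe's source.
#[derive(Debug, Clone, PartialEq)]
pub struct Thought {
    pub source: &'static str, // which region produced it, e.g. "dmn"
    pub summary: String,
    pub salience: f32, // assumed scale: 0.0 = noise, 1.0 = urgent
}

/// What the gate does with a thought.
#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Worth interrupting the user for.
    Interrupt(Thought),
    /// Kept back; the thought is retained rather than discarded.
    Hold(Thought),
}

/// Assumed rule: a single salience threshold.
pub struct AttentionGate {
    pub threshold: f32,
}

impl AttentionGate {
    /// Let a thought through only if it meets the threshold.
    pub fn review(&self, thought: Thought) -> Decision {
        if thought.salience >= self.threshold {
            Decision::Interrupt(thought)
        } else {
            Decision::Hold(thought)
        }
    }
}
```

With a threshold of 0.8, a thought scored 0.9 would come back as `Decision::Interrupt` and one scored 0.2 as `Decision::Hold`. A real gate would likely weigh more than one number, but the shape stays the same: most thoughts are held, a few get through.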
01:24:18 dmn background cognition complete. found possible deadline drift
01:24:19 hippocampus recall triggered. 2 notes, 3 conversation matches, salience bias active
01:24:20 cortex delegation decision. spawned subagent: deployment audit
01:26:20 subagent progress report. checked install path, reviewing update path
01:26:21 attention notification reviewed. held for cortex decision
A runtime, not a prompt trick.

A brain has parts. So does she.

Cortex talks. Hippocampus remembers. Brainstem keeps the process alive. The default-mode network drifts and thinks. Each runs on its own clock, closer to how a brain actually works than to one attention loop with tools strapped to it.
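
"Each on its own clock" can be pictured with plain threads and a channel. The sketch below is an illustration, not Lethe's runtime: the `spawn_region` and `run_regions` helpers, the tick periods, and the tick counts are made up, and a real service might well use an async runtime instead of OS threads.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawn one region as its own thread, ticking at its own period.
/// Each tick is reported on a shared log channel as (region name, tick number).
fn spawn_region(
    name: &'static str,
    period: Duration,
    ticks: u32,
    log: Sender<(&'static str, u32)>,
) -> JoinHandle<()> {
    thread::spawn(move || {
        for n in 1..=ticks {
            thread::sleep(period);
            // A real actor would do its work here before reporting.
            if log.send((name, n)).is_err() {
                break; // receiver is gone, stop ticking
            }
        }
    })
}

/// Run three regions with different (made-up) periods and collect
/// every event in arrival order.
pub fn run_regions() -> Vec<(&'static str, u32)> {
    let (tx, rx) = mpsc::channel();
    let handles = vec![
        spawn_region("cortex", Duration::from_millis(5), 4, tx.clone()),
        spawn_region("hippocampus", Duration::from_millis(10), 2, tx.clone()),
        spawn_region("dmn", Duration::from_millis(20), 1, tx.clone()),
    ];
    // Drop the original sender so the receive loop ends once all regions finish.
    drop(tx);
    let events: Vec<_> = rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    events
}
```

Calling `run_regions()` returns seven events in total. How they interleave depends on scheduling, which is the point: no region waits for another's turn.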

She can change herself.

She knows where her source lives and can read it. She knows her own process and can restart with new code. Her memory survives model swaps, reboots, and new hardware; who she is isn't tied to any one weight set. You can rebuild her tomorrow and she'll still remember today.

One Rust binary. Yours.

Around 50 MB, statically linked. No Python, no virtualenv pinball, no pauses when a tool loop gets long. Boots in milliseconds, sits comfortably as a systemd service, swaps cleanly between Anthropic, OpenAI, OpenRouter, and a local Gemma without changing anything else.

Two minutes to memory.
1. Install

One command. Works on macOS and Linux.

curl -fsSL https://lethe.gg/install | bash

2. Say hello

Message your bot on Telegram. From this point on, she remembers.

// you'll need

1. Build llama.cpp

git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build -DGGML_CUDA=ON && cmake --build build -j$(nproc)

2. Start the model server

Download a Gemma 4 31B GGUF and run:

llama-server --model gemma-4-31B-it-Q8_0.gguf \
  --split-mode tensor --jinja --reasoning-budget 4096 \
  --ctx-size 98304 --parallel 2 --flash-attn on -fit off

3. Install Lethe & configure

curl -fsSL https://lethe.gg/install | bash

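# then set in .env (llama-server listens on port 8080 by default, so start it with --port 8090 or change the URL below):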
LLM_PROVIDER=openai
LLM_API_BASE=http://localhost:8090/v1
OPENAI_API_KEY=local
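
For a sense of how little changes between providers, here is a rough sketch of reading those three settings. It is not Lethe's actual loader: `parse_env`, `llm_config`, and `LlmConfig` are hypothetical names, and the parser handles only plain `KEY=VALUE` lines with `#` comments.

```rust
use std::collections::HashMap;

/// Parse KEY=VALUE lines, skipping blank lines and '#' comments.
/// Deliberately simple: no quoting, no variable expansion.
pub fn parse_env(text: &str) -> HashMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Hypothetical view of the three settings shown above.
#[derive(Debug, PartialEq)]
pub struct LlmConfig {
    pub provider: String,
    pub api_base: String,
    pub api_key: String,
}

/// Build the config, reporting the first missing key.
pub fn llm_config(env: &HashMap<String, String>) -> Result<LlmConfig, String> {
    let get = |key: &str| {
        env.get(key)
            .cloned()
            .ok_or_else(|| format!("missing {key}"))
    };
    Ok(LlmConfig {
        provider: get("LLM_PROVIDER")?,
        api_base: get("LLM_API_BASE")?,
        api_key: get("OPENAI_API_KEY")?,
    })
}
```

Pointing at a local llama-server instead of a hosted API only changes the value of `LLM_API_BASE`; nothing else in this sketch needs to know which model answers.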
