The memory layer for AI agents.

Your app sends us conversation turns. We extract structured knowledge, store it across a typed graph and a hybrid vector index, and return ranked, grounded context on every turn — in under 300 ms, isolated per end-user, with ~95% fewer context tokens than a rolling history.

The recommended memory layer for
OpenAI
Anthropic
ChatGPT apps
LangChain
Vercel AI SDK
[Live pipeline view · session user.session.018f3a · p50 142ms · tokens −94%]
Conversation turns ("I live in Berlin", "prefer weekend runs", "noted — Berlin it is") → async queue (RabbitMQ) → extraction model (LLM extract → 12-cat taxonomy: fact "lives in Berlin", pref "weekend runs") → entity resolver ("Burlin" ~ Berlin, "Alex" ~ Alexander) → hybrid vector index (dense 384d · sparse BM25 · 17-field payload index) + typed knowledge graph (LOCATED_AT edges) + cache (embeds · entities · api-keys) → grounded context in <300ms (decompose 4ms · search 86ms · graph 21ms · rank 12ms)
<300ms · p50 retrieval, fast-path
−95% · context tokens vs. rolling-history
12 categories · typed knowledge taxonomy
2 endpoints · your entire integration surface
How it works

An async write path and a fast read path. That's all.

Ingestion returns 202 in milliseconds; a worker extracts typed knowledge, resolves entities, and writes to a hybrid vector index and a typed graph in parallel. Retrieval runs LLM-free on the fast path — fuzzy entity match, parallel hybrid search, bounded graph expansion — and assembles grounded context per request.

Write path: POST /v1/memory/ingest (returns 202 · <20ms) → async queue (idempotent · UUIDv5) → extraction model (12-cat taxonomy · tone · confidence · no-training API · Claude-class) → entity resolver (canonicalize · alias · typed edges) → hybrid vector index (dense · sparse · 17 filters) + typed graph store (entities · typed edges)
Read path: POST /v1/memory/get (fuzzy · parallel search · graph expand) → grounded context + per-stage meta · p50 < 300ms · sync
1. Ingest the turn

One endpoint. Returns 202 in milliseconds. Idempotent — re-sending the same conversation is safe.

POST /v1/memory/ingest
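
For the raw REST shape, a minimal sketch; the host, auth header, and JSON field names here are assumptions, mirroring the SDK call shown later on this page:

# pip install requests
# Illustrative ingest call; host and field names are assumptions.
import requests

resp = requests.post(
    "https://api.getmem.example/v1/memory/ingest",  # placeholder host
    headers={"Authorization": "Bearer gm_live_..."},
    json={
        "user_id": "uid",
        "session_id": "018f3a",
        "messages": [{"role": "user", "content": "I live in Berlin"}],
    },
    timeout=5,
)
assert resp.status_code == 202  # accepted; extraction continues async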
2. Extract structured knowledge

A worker reads each turn and a short recent-history window, then emits typed facts across a 12-category taxonomy with tone and confidence.

observations → knowledge items
3. Resolve entities

People, places, orgs, and topics are canonicalized with aliases and stable IDs. Relationships are typed — not bag-of-words.

LOCATED_AT · PREFERS · WORKS_AT …
4. Dual-write in parallel

Knowledge lands in a hybrid vector index (dense + sparse, 17-field payload) and a typed graph store simultaneously.

vector · graph · cache
5. Retrieve in under 300 ms

LLM-free fast path: heuristic decompose, fuzzy entity match, parallel hybrid search, bounded graph expansion, rank, assemble. Every response ships per-stage timings.

POST /v1/memory/get · sync
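
A sketch of the response shape, assembled from the fields this page names (grounded context plus per-stage meta); the exact key names are an assumption:

# Assumed response shape for /v1/memory/get; key names are illustrative.
response = {
    "context": ("[user profile] Alexander lives in Berlin\n"
                "[preference] prefers weekend runs"),
    "meta": {
        "decompose_ms": 4, "search_ms": 86, "graph_ms": 21,
        "rank_ms": 12, "total_ms": 142,
        "entities": ["Berlin"], "context_tokens": 250,
    },
}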
Extraction

Conversations become structured knowledge — not chunks.

Every turn is parsed into categorized, typed facts with tone, confidence, and provenance. Chunks of raw text can't power grounded responses; typed knowledge can.

input · raw conversation · session 018f3a
output · extracted knowledge · 12-category taxonomy
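
A minimal sketch of that transformation; the category labels and field names are illustrative, not the published taxonomy:

# input: raw turns from session 018f3a
turns = [
    {"role": "user", "content": "I live in Berlin"},
    {"role": "user", "content": "prefer weekend runs"},
]

# output: typed knowledge items (illustrative categories and fields)
knowledge = [
    {"category": "profile.location", "fact": "lives in Berlin",
     "entities": ["Berlin"], "tone": "neutral", "confidence": 0.97,
     "provenance": {"session": "018f3a", "turn": 0}},
    {"category": "preference.activity", "fact": "prefers weekend runs",
     "entities": [], "tone": "positive", "confidence": 0.92,
     "provenance": {"session": "018f3a", "turn": 1}},
]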
Retrieval

Five stages. All LLM-free. All under 300 ms.

Most memory systems re-run an LLM at retrieve time. That's why their latency budgets shred your agent's response time. We don't.

Heuristic decompose

Key terms, likely entities, and tags pulled from the query and recent turns without a model call. Falls back to LLM-mode only for multi-hop questions.

query: "when was I last in Berlin"
terms: ["last","Berlin"] entities: [Berlin] tags: [location,time]

Fuzzy entity match

Typos, abbreviations, morphological variants, and diacritic drift all normalize to the same canonical entity ID. Partial-ratio matching with per-token fallback.

"Burlin" Berlin (0.92)
"Alex" Alexander (0.88)
"в Берлине" Берлин (0.95)

Parallel hybrid search

Four to five indexes queried concurrently: dense semantic, sparse lexical, entity-sparse, tag-chain, recent-session. Every query pre-filtered by tenant.

dense: 42ms · sparse: 38ms
entity: 21ms · tag: 19ms
fan-out · fan-in · merge
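
A minimal fan-out/fan-in sketch of the pattern; the index names come from the list above, and the per-index query function is a stand-in:

import asyncio

INDEXES = ["dense", "sparse", "entity-sparse", "tag-chain", "recent-session"]

async def search_index(index, query, tenant):
    # Stand-in for one index query; every real call carries the tenant filter.
    await asyncio.sleep(0.02)  # simulated network latency
    return [{"index": index, "text": f"hit for {query!r}", "score": 0.9}]

async def hybrid_search(query, tenant):
    # Fan out to every index concurrently, fan in, merge the result lists.
    batches = await asyncio.gather(
        *(search_index(i, query, tenant) for i in INDEXES)
    )
    return [hit for batch in batches for hit in batch]

hits = asyncio.run(hybrid_search("when was I last in Berlin", "dev:proj:user"))
print(len(hits), "hits merged from", len(INDEXES), "indexes")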

Graph expansion

Matched entities seed a bounded graph traversal that pulls in directly related facts. User asks about Berlin → we surface "Alexander lives in Berlin" even when it wasn't top-ranked.

Berlin ←LOCATED_AT— u
u —PREFERS→ weekend runs
depth: 2 · max-edges: 12
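
In sketch form, a bounded breadth-first walk with the limits shown above (depth 2, max 12 edges) over a toy adjacency list:

from collections import deque

# Toy graph; edges stored from both endpoints so incoming relations surface.
EDGES = {
    "Berlin": [("LOCATED_AT", "u")],
    "u": [("PREFERS", "weekend runs")],
}

def expand(seeds, max_depth=2, max_edges=12):
    # Bounded traversal seeded by the matched entities.
    facts, seen = [], set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier and len(facts) < max_edges:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for rel, dst in EDGES.get(node, []):
            facts.append((node, rel, dst))
            if dst not in seen:
                seen.add(dst)
                frontier.append((dst, depth + 1))
    return facts[:max_edges]

print(expand(["Berlin"]))
# [('Berlin', 'LOCATED_AT', 'u'), ('u', 'PREFERS', 'weekend runs')]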

Rank & assemble

Weighted blend of similarity, recency, confidence, and tone. Results are deduplicated and formatted into a structured prompt block your model can consume directly.

[user profile] Alexander lives in Berlin
[preference] weekend runs
[related] Berlin ← LOCATED_AT ← u
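
The published weights aren't listed here, so this sketch blends the four signals with made-up weights:

# Weighted blend of similarity, recency, confidence, tone; weights illustrative.
def score(hit):
    w_sim, w_rec, w_conf, w_tone = 0.5, 0.2, 0.2, 0.1
    return (w_sim * hit["similarity"] + w_rec * hit["recency"]
            + w_conf * hit["confidence"] + w_tone * hit["tone"])

hits = [
    {"text": "[user profile] Alexander lives in Berlin",
     "similarity": 0.91, "recency": 0.80, "confidence": 0.95, "tone": 0.5},
    {"text": "[preference] weekend runs",
     "similarity": 0.74, "recency": 0.95, "confidence": 0.90, "tone": 0.5},
]

# Sort (dedupe omitted) and format the prompt block the model consumes.
print("\n".join(h["text"] for h in sorted(hits, key=score, reverse=True)))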

Every stage, observable

Every response carries a meta block with per-stage latency, matched entities, and token count. If memory is the bottleneck, you'll see it.

decompose_ms: 4
search_ms: 86 graph_ms: 21
rank_ms: 12 total_ms: 142
Built for

Agents that need deep, accurate personalization.

Anywhere the quality of a response depends on what the system already knows about the user — getmem is the memory layer.

Healthcare

Patient agents that remember.

Medications, allergies, prior symptoms, care preferences, provider instructions — surfaced on every turn without asking the patient to repeat themselves.

"I've got that same headache again"
Your agent already knows: patient has hypertension, is on lisinopril 10mg, reported similar migraines 6 weeks ago, and prefers non-pharma options first.
HIPAA-ready isolation · per-patient scoping · audit log
Personal AI

Companions that actually know you.

Goals, preferences, relationships, recurring contexts, long-running projects — carried across sessions, across devices, across weeks and months.

"How's the marathon training going?"
Agent remembers: Berlin marathon in Sep, target 3:45, prefers weekend long runs, last week's mileage was 48km, and you mentioned a knee twinge on Tuesday.
cross-session · long-horizon · per-user
Integrations

Drop in next to your existing stack. No router, no chat framework, no new model.

Two API calls, or a one-line adapter for the runtimes you already ship. getmem composes with OpenAI, Anthropic, and ChatGPT-style apps without changing your deployment topology.

Memory, alongside your model.

Call /get before you prompt, inject the grounded context block, call /ingest fire-and-forget after the turn completes. That's it.

Works with any model. OpenAI, Anthropic, open weights, on-prem — we touch the context block, not the model.
SDKs for Python, TypeScript, Go. Plus a REST API for everything else. Adapters for LangChain, LlamaIndex, Vercel AI SDK.
No vendor lock-in on storage. Export your memories as JSON at any time. Full-data export is a single API call.
Per-stage meta on every call. Latency budget blown? Your logs will say exactly where.
# pip install getmem-ai
import getmem_ai as getmem

mem = getmem.init("gm_live_...")
msg = "I live in Berlin"  # incoming user message

# Get grounded context before the LLM call
ctx = mem.get("uid", query=msg)["context"]
reply = "noted — Berlin it is"  # stand-in for your model's reply, generated with ctx

# Save both roles after each turn, fire-and-forget
mem.ingest("uid", messages=[
    {"role": "user", "content": msg},
    {"role": "assistant", "content": reply}
])
Compared

What rolling your own actually costs.

Three paths to giving agents memory. You can take any of them — we'd just rather you ship this week.

 
 
getmem · managed memory layer
Setup (time to first call): 2 API calls
Knowledge extraction (turn → typed facts): 12-category taxonomy, included
Entity resolution (fuzzy · multilingual): built in
Typed graph (relations · traversal): typed edges, bounded traversal
Latency · p50 (retrieval budget): < 300ms
Context tokens (per turn sent to model): ~5% of history
Per-user isolation (tenant safety): filter-level, provable
Auditability (observability per call): per-stage meta on every call

Vector DB + glue · pinecone · weaviate · qdrant
Setup: 4–6 weeks of glue
Knowledge extraction: you write & tune prompts
Entity resolution: you build it
Typed graph: extra service
Latency · p50: 300–800ms
Context tokens: 20–40%
Per-user isolation: app-layer only
Auditability: build your own

Full history in context · do nothing, send it all
Setup: trivial & bad
Knowledge extraction: none
Entity resolution: none
Typed graph: none
Latency · p50: token-bound
Context tokens: 100%
Per-user isolation: inherent
Auditability: none
Pricing

Pay for what you use.
$20 free on signup, then $10 in free credit every month.

Top up your balance, get locked-in lower per-call prices for as long as the balance holds. No seats, no minimums, no commitments. $20 free on signup, then $10 added every month automatically.

1. Pick a deposit

The bigger the deposit, the cheaper every call.
[Interactive calculator · inputs: active users / mo (e.g. 2,000), turns / user / day (e.g. 6) · chart: LLM tokens per turn as a conversation grows, rolling history vs. with getmem]
Rolling-history agents pay for every prior turn, forever. getmem keeps the prompt small — because typed knowledge beats raw transcripts.
[Calculator outputs: ingest calls / month · get calls / month · ingest cost · get cost · expected monthly spend]

A deposit (e.g. $250) acts as a prepaid balance. When it drops to zero, top up again — the same tier prices lock in at the new deposit level.

[Savings breakdown: LLM tokens / turn, rolling history vs. with getmem context · LLM bill without memory vs. with memory + getmem calls · net saved / month]
Assumes GPT-4.1-class pricing ($2.50 / 1M in · $10 / 1M out), a rolling history that grows to ~4,000 input tokens / turn by turn 10, and getmem's grounded context block averaging ~250 input tokens / turn. Output tokens held constant at 350.
Deposit range · Ingest per call · Get per call · Discount
$0 – $50 (Starter): $0.002000 · $0.000500 · base
$50 – $250 (Indie): $0.001600 · $0.000400 · 20%
$250 – $1,000 (Growth): $0.001200 · $0.000300 · 40%
$1,000+ (Scale): $0.000800 · $0.000200 · 60%

All plans include: 12-category taxonomy · entity resolution · typed graph · per-user isolation · full export · $20 free credit on signup + $10 / month added automatically.

Trust

Tenant-isolated by default. No training on your data. Every write audited.

Filter-level isolation

Every vector search, every graph query, and every cache key is scoped by (developer, project, user). Cross-tenant reads are architecturally impossible, not just policed.
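
In sketch form, every query body gets the scope merged in before it touches an index; the filter schema below is an assumption, not getmem's actual payload:

# Illustrative tenant scope; the real filter schema is internal to getmem.
scope = {"developer": "dev_1", "project": "proj_9", "user": "uid"}

def scoped(query_filter):
    # Merge the tenant scope into every vector, graph, and cache query;
    # a query without it never reaches an index.
    return {**query_filter, **scope}

print(scoped({"category": "preference"}))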

Embeddings & extraction — no-train

Customer conversations never train shared models. Upstream vendor calls run under no-training API terms; our own models never see cross-tenant data.

Every stage, timed

Per-stage latency, matched entities, and token count on every response. Structured logs carry a request ID through every hop. If memory is slow, you'll see exactly where.

Deterministic writes

Observation IDs are deterministic UUIDv5s derived from (dev, project, user, session, index). Re-ingesting the same conversation is safe — identical observations, no dupes.
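
The standard library reproduces the idea directly; the namespace UUID below is illustrative, not getmem's:

import uuid

# Illustrative namespace; getmem's actual namespace UUID is internal.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "getmem.example")

def observation_id(dev, project, user, session, index):
    # Same five inputs always yield the same UUID, so re-ingesting a
    # conversation produces identical observations instead of duplicates.
    return uuid.uuid5(NAMESPACE, f"{dev}:{project}:{user}:{session}:{index}")

assert observation_id("dev_1", "proj_9", "uid", "018f3a", 0) == \
       observation_id("dev_1", "proj_9", "uid", "018f3a", 0)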

Quotas, never silent drops

Ingest failures propagate as structured errors with stable codes. Exhausted quota returns 402 — never a 200 with an empty body.

Export, always

A single API call returns your entire memory corpus as JSON. Migrate away, archive, or replay into a new tenant. Your data is never held hostage.

FAQ

Questions developers actually ask.

How is this different from a vector DB?
A vector DB stores embeddings and indexes. getmem is one layer up: raw conversation in, typed structured knowledge out, across both a hybrid vector index and a typed graph store. If you want to manage chunks and similarity, use a vector DB. If you want your agent to remember things, use getmem.
How is this different from document RAG?
RAG is for documents. getmem is for conversations. They compose — we're the memory of what the user said and meant across sessions; RAG is the knowledge your product already has on a shelf.
Do you train on our data?
No. Embeddings and extraction run under no-training vendor terms. Our own models, when used, never see cross-tenant data. Isolation is enforced at the filter level on every query.
What happens to superseded memories?
Nothing silent. When a fact changes (say, the user switched from VS Code to Neovim) we detect the contradiction at ingest time, mark the old item status=superseded, and write the new one — with a pointer between them. Your audit log is default-on.
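
In sketch form, with illustrative field names:

# Assumed shape of a superseded pair; field names are illustrative.
old = {"id": "obs_a", "fact": "prefers VS Code",
       "status": "superseded", "superseded_by": "obs_b"}
new = {"id": "obs_b", "fact": "prefers Neovim",
       "status": "active", "supersedes": "obs_a"}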
What's the real latency story?
Retrieval is sync and designed to stay under 300 ms on the fast path. Ingest is async and returns 202 in milliseconds. Every response carries meta.total_ms plus per-stage breakdowns. No hidden budget.
Can I self-host?
Single-tenant deploys are available on the Scale tier. The Managed plan is what most developers run in production — same binary, same contract, same latency envelope.
How does the deposit pricing actually work?
Top up a balance; your deposit size locks a per-call price for as long as the balance lasts. Larger deposits unlock bigger discounts (20% / 40% / 60%). When the balance runs down, top up again at whatever level fits your next month. No subscriptions, no minimums.
Start building

Give your agent a memory.

Claim your API key and get $20 free credit instantly — no card needed. Then $10 added every month, automatically.

No credit card required. $20 on signup, then $10 free every month.