Your app sends us conversation turns. We extract structured knowledge, store it across a typed graph and a hybrid vector index, and return ranked, grounded context on every turn — in under 300 ms, isolated per end-user, with 95% fewer context tokens.
Ingestion returns 202 in milliseconds; a worker extracts typed knowledge, resolves entities, and writes to a hybrid vector index and a typed graph in parallel. Retrieval runs LLM-free on the fast path — fuzzy entity match, parallel hybrid search, bounded graph expansion — and assembles grounded context per request.
One endpoint. Returns 202 in milliseconds. Idempotent — re-sending the same conversation is safe.
A worker reads each turn and a short recent-history window, then emits typed facts across a 12-category taxonomy with tone and confidence.
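To make that concrete, here is a minimal sketch of what one typed fact could look like. The field names and category values are illustrative, not getmem's actual schema.

# A minimal sketch of one extracted fact; fields are illustrative only.
from dataclasses import dataclass

@dataclass
class Fact:
    category: str        # one of the 12 taxonomy categories
    subject: str         # canonical entity ID, e.g. "person:alexander"
    predicate: str       # typed relationship, e.g. "lives_in"
    obj: str             # canonical entity ID, e.g. "place:berlin"
    tone: str            # e.g. "neutral", "positive"
    confidence: float    # 0.0 - 1.0, as judged by the extractor
    source_turn: str     # provenance: which turn produced this fact

fact = Fact("relationship", "person:alexander", "lives_in",
            "place:berlin", "neutral", 0.92, "session-7/turn-3")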
People, places, orgs, and topics are canonicalized with aliases and stable IDs. Relationships are typed — not bag-of-words.
Knowledge lands in a hybrid vector index (dense + sparse, 17-field payload) and a typed graph store simultaneously.
LLM-free fast path: heuristic decompose, fuzzy entity match, parallel hybrid search, bounded graph expansion, rank, assemble. Every response ships per-stage timings.
Every turn is parsed into categorized, typed facts with tone, confidence, and provenance. Chunks of raw text can't power grounded responses; typed knowledge can.
Most memory systems re-run an LLM at retrieve time, and that extra model call is what blows your agent's latency budget. We don't.
Key terms, likely entities, and tags pulled from the query and recent turns without a model call. Falls back to LLM-mode only for multi-hop questions.
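A toy version of that heuristic decomposition shows the shape of the output; the stopword list and regexes below are stand-ins, not getmem's real heuristics.

# Illustrative only: a toy heuristic decomposition with no model call.
import re

STOPWORDS = {"the", "a", "an", "is", "of", "to", "in", "and",
             "what", "where", "does", "my", "now"}

def decompose(query: str, recent_turns: list[str]) -> dict:
    tokens = re.findall(r"[\w'-]+", query.lower())
    key_terms = [t for t in tokens if t not in STOPWORDS]
    # Capitalized spans in the raw query are likely entity mentions
    likely_entities = [e for e in re.findall(r"\b[A-Z][\w'-]+(?:\s+[A-Z][\w'-]+)*", query)
                       if e.lower() not in STOPWORDS]
    # Tags recur across the query and the recent-history window
    recent = " ".join(recent_turns).lower()
    tags = [t for t in key_terms if t in recent]
    return {"key_terms": key_terms, "entities": likely_entities, "tags": tags}

print(decompose("Where does Alexander work now?",
                ["alexander joined a berlin startup"]))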
Typos, abbreviations, morphological variants, and diacritic drift all normalize to the same canonical entity ID. Partial-ratio matching with per-token fallback.
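A minimal sketch of partial-ratio matching with a per-token fallback, using the open-source rapidfuzz library; the alias table and the 85-point threshold are assumptions for illustration.

# Sketch of partial-ratio matching with per-token fallback (rapidfuzz).
from rapidfuzz import fuzz

ALIASES = {"alexander petrov": "person:alexander", "berlin": "place:berlin"}

def resolve(mention: str, threshold: int = 85) -> str | None:
    mention = mention.lower()
    best_id, best_score = None, 0
    for alias, entity_id in ALIASES.items():
        score = fuzz.partial_ratio(mention, alias)
        if score < threshold:
            # Per-token fallback: a single well-matching token is enough
            score = max(fuzz.ratio(tok, alias_tok)
                        for tok in mention.split()
                        for alias_tok in alias.split())
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id if best_score >= threshold else None

print(resolve("alexnder"))   # the typo still resolves to person:alexander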
Four to five indexes queried concurrently: dense semantic, sparse lexical, entity-sparse, tag-chain, recent-session. Every query pre-filtered by tenant.
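In code, the fan-out looks roughly like this; search_index is a hypothetical stub standing in for the real dense, sparse, and tag-chain queries.

# Shape of the concurrent fan-out, every call scoped to one tenant.
import asyncio

async def search_index(index: str, query: str, tenant: dict) -> list[dict]:
    # Placeholder for a real index query, always filtered by
    # the (developer, project, user) tenant triple.
    await asyncio.sleep(0.01)
    return [{"index": index, "text": f"hit for {query!r}", "tenant": tenant}]

async def hybrid_search(query: str, tenant: dict) -> list[dict]:
    indexes = ["dense", "sparse", "entity_sparse", "tag_chain", "recent_session"]
    results = await asyncio.gather(*(search_index(i, query, tenant) for i in indexes))
    return [hit for per_index in results for hit in per_index]

hits = asyncio.run(hybrid_search("where does alexander live",
                                 {"developer": "dev_1", "project": "proj_1", "user": "uid"}))
print(len(hits))  # one stubbed hit per index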
Matched entities seed a bounded graph traversal that pulls in directly related facts. User asks about Berlin → we surface "Alexander lives in Berlin" even when it wasn't top-ranked.
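A toy bounded expansion over an in-memory edge list shows the idea; getmem's actual graph store and edge types are not represented here.

# One-hop expansion from seed entities through a typed edge list.
EDGES = {
    "place:berlin": [("lives_in", "person:alexander")],
    "person:alexander": [("works_at", "org:acme")],
}

def expand(seed_entities: list[str], max_hops: int = 1) -> list[tuple]:
    facts, frontier = [], set(seed_entities)
    for _ in range(max_hops):
        next_frontier = set()
        for entity in frontier:
            for relation, neighbor in EDGES.get(entity, []):
                facts.append((entity, relation, neighbor))
                next_frontier.add(neighbor)
        frontier = next_frontier
    return facts

print(expand(["place:berlin"]))  # [('place:berlin', 'lives_in', 'person:alexander')]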
Weighted blend of similarity, recency, confidence, and tone. Results are deduplicated and formatted into a structured prompt block your model can consume directly.
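A sketch of what such a weighted blend can look like; the weights, decay curve, and tone boost below are illustrative assumptions, not getmem's actual formula.

# Illustrative scoring only: blend similarity, recency, confidence, tone.
import math, time

def score(hit: dict, now: float | None = None,
          weights=(0.5, 0.2, 0.2, 0.1)) -> float:
    now = now or time.time()
    w_sim, w_rec, w_conf, w_tone = weights
    age_days = (now - hit["created_at"]) / 86400
    recency = math.exp(-age_days / 30)          # decays with age on a ~30-day scale
    tone_boost = 1.0 if hit["tone"] != "neutral" else 0.8
    return (w_sim * hit["similarity"] + w_rec * recency +
            w_conf * hit["confidence"] + w_tone * tone_boost)

hits = [{"similarity": 0.82, "created_at": time.time() - 2 * 86400,
         "confidence": 0.9, "tone": "neutral", "text": "Alexander lives in Berlin"}]
ranked = sorted(hits, key=score, reverse=True)
print(ranked[0]["text"])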
Every response carries a meta block with per-stage latency, matched entities, and token count. If memory is the bottleneck, you'll see it.
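Reading that meta block from the SDK might look like this; total_ms is referenced elsewhere on this page, while the other key names are guesses for illustration.

# Inspecting the meta block on a retrieval response (key names illustrative).
import getmem_ai as getmem

mem = getmem.init("gm_live_...")
res = mem.get("uid", query="where does alexander live?")
print(res["meta"]["total_ms"])        # end-to-end retrieval latency in ms
print(res["meta"].get("stages"))      # per-stage timings (hypothetical key)
print(res["meta"].get("entities"))    # matched canonical entities (hypothetical key)
print(res["meta"].get("tokens"))      # size of the grounded context block (hypothetical key)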
Anywhere the quality of a response depends on what the system already knows about the user — getmem is the memory layer.
Medications, allergies, prior symptoms, care preferences, provider instructions — surfaced on every turn without asking the patient to repeat themselves.
Matter facts, precedents cited in prior sessions, client preferences on risk tolerance, jurisdiction, procedural posture — all grounded context, always current.
Goals, preferences, relationships, recurring contexts, long-running projects — carried across sessions, across devices, across weeks and months.
Two API calls, or a one-line adapter for the runtimes you already ship. getmem composes with OpenAI, Anthropic, and ChatGPT-style apps without changing your deployment topology.
Call /get before you prompt, inject the grounded context block, call /ingest fire-and-forget after the turn completes. That's it.
# pip install getmem-ai
import getmem_ai as getmem

mem = getmem.init("gm_live_...")

# Get context before LLM call
ctx = mem.get("uid", query=msg)["context"]

# Save both roles after each turn
mem.ingest("uid", messages=[
    {"role": "user", "content": msg},
    {"role": "assistant", "content": reply}
])
Four paths people take to give agents memory. You can do any of them — we'd just rather you ship this week.
Top up your balance and lock in lower per-call prices for as long as the balance holds. No seats, no minimums, no commitments. $20 free on signup, then $10 added every month automatically.
Assumptions: model pricing of $2.50 / 1M input tokens and $10 / 1M output tokens, a rolling history that grows to ~4,000 input tokens per turn by turn 10, and getmem's grounded context block averaging ~250 input tokens per turn. Output tokens are held constant at 350.
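The per-turn arithmetic behind that comparison, under exactly those assumptions:

# Cost per turn at turn 10 under the stated assumptions.
IN_PRICE, OUT_PRICE = 2.50 / 1e6, 10.00 / 1e6   # dollars per token
OUTPUT_TOKENS = 350

rolling_history = 4000 * IN_PRICE + OUTPUT_TOKENS * OUT_PRICE
getmem_context  = 250  * IN_PRICE + OUTPUT_TOKENS * OUT_PRICE
print(f"rolling history: ${rolling_history:.4f} / turn")   # $0.0135
print(f"getmem context:  ${getmem_context:.4f} / turn")    # $0.0041

Output cost is identical in both cases; the saving comes entirely from the input side, where 250 tokens versus 4,000 is a roughly 94% reduction.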
All plans include: 12-category taxonomy · entity resolution · typed graph · per-user isolation · full export · $20 free credit on signup + $10 / month added automatically.
Every vector search, every graph query, and every cache key is scoped by (developer, project, user). Cross-tenant reads are architecturally impossible, not just policed.
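A sketch of how a tenant-scoped key can be composed so the triple is always part of every lookup; this illustrates the principle, not getmem's internals.

# Every cache key and filter carries the full (developer, project, user) triple,
# so a query can never address another tenant's data.
def tenant_key(developer: str, project: str, user: str, suffix: str) -> str:
    return f"{developer}:{project}:{user}:{suffix}"

print(tenant_key("dev_1", "proj_1", "uid", "recent_session"))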
Customer conversations never train shared models. Upstream vendor calls run under no-training API terms; our own models never see cross-tenant data.
Per-stage latency, matched entities, and token count on every response. Structured logs carry a request ID through every hop. If memory is slow, you'll see exactly where.
Observation IDs are deterministic UUIDv5s derived from (dev, project, user, session, index). Re-ingesting the same conversation is safe — identical observations, no dupes.
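A minimal sketch of that determinism with Python's uuid.uuid5; the namespace UUID below is a placeholder, not getmem's actual namespace.

# Deterministic observation IDs: the same (dev, project, user, session, index)
# tuple always yields the same UUID, so re-ingest cannot create duplicates.
import uuid

NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")  # placeholder

def observation_id(dev: str, project: str, user: str, session: str, index: int) -> uuid.UUID:
    return uuid.uuid5(NAMESPACE, f"{dev}:{project}:{user}:{session}:{index}")

a = observation_id("dev_1", "proj_1", "uid", "session-7", 3)
b = observation_id("dev_1", "proj_1", "uid", "session-7", 3)
assert a == b   # re-sending the same turn produces the same ID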
Ingest failures propagate as structured errors with stable codes. Exhausted quota returns 402 — never a 200 with an empty body.
A single API call returns your entire memory corpus as JSON. Migrate away, archive, or replay into a new tenant. Your data is never held hostage.
When a stored fact changes (say the user switched from VS Code to Neovim), we detect the contradiction at ingest time, mark the old item status=superseded, and write the new one — with a pointer between them. Your audit log is default-on.
Every response reports meta.total_ms plus per-stage breakdowns. No hidden latency budget.
Claim your API key and get $20 free credit instantly — no card needed. Then $10 added every month, automatically.