Building Trustworthy AI Memory at With Him
Summarized continuity, user control, and bounded personalization for a high-trust AI companion
TL;DR
With Him's memory system is not a raw-chat retrieval layer, vector database, or “remember everything” feature. It is a first-party, premium memory architecture built around summarized continuity, policy-gated persistence, user control, and bounded prompt injection.
When a session ends or becomes inactive, we generate a structured summary, pass candidate memories through product-specific policy rules, merge safe signals into a rolling spiritual profile, and make only a small advisory memory block available to future conversations.
The goal is not to make the assistant feel omniscient. The goal is to help it become a little more continuous, a little more attentive, and a lot less forgetful, without making memory creepy, unsafe, or uncontrolled.
The Problem Space
A spiritual AI companion has a different memory problem than a generic chatbot.
Users may come back across days, weeks, or months with recurring themes: grief, prayer consistency, loneliness, doubt, family stress, forgiveness, discipline, anxiety, or a desire for a gentler tone. Forgetting all of that makes the product feel shallow. Remembering too much makes it feel invasive.
The system has to balance several tensions:
- Continuity matters. If someone repeatedly asks for short prayers or is working through the same spiritual goal, the assistant should not behave as if every conversation were the first.
- Privacy matters more. The system should not store raw sensitive details just because they appeared in a chat.
- Current context must win. A user's present message is always more important than old memory.
- Safety must stay primary. In crisis or high-risk moments, memory should not distract the assistant from the current safety policy.
- User control is part of trust. Memory should be visible, configurable, and deletable, not a hidden product mechanic.
So the architecture is intentionally conservative. With Him uses memory as a controlled continuity layer, not as a second brain.
Why We Did Not Start With Vector Memory
A common default for AI memory is embedding old messages and retrieving semantically similar snippets later. That can be useful in some products, but it was not the right first primitive for With Him.
Raw or near-raw retrieval creates several problems in a sensitive conversational product:
- It can pull old emotional details into the wrong moment.
- It can retrieve content that is technically similar but spiritually or emotionally out of context.
- It can make the assistant sound like it is surveilling the user.
- It can preserve details that should have been ephemeral.
- It makes policy enforcement harder because the retrieval unit is usually “something the user said,” not “something the product has decided is safe and useful to remember.”
So we chose a different shape:
- Session transcript: a bounded window of the conversation, loaded safely for summarization.
- Structured summary: model output covering themes, concern, emotional state, prayer focus, facts, carry-forward notes, and candidate memories.
- Policy-screened memory: validated, normalized candidates pass product-specific memory rules before persistence.
- Rolling profile + item store: safe signals merged into a spiritual profile, plus precise item-level records with provenance.
- Advisory prompt block: a small, conditional injection on the read path. Never on every request, never omniscient recall.
That gives us a compression boundary. The system does not need to carry forward everything the user said. It only carries forward what is useful, safe, and appropriate for future support.
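To make the structured summary stage concrete, here is a minimal sketch of validating and normalizing the model's JSON before it reaches policy. The field names follow the list above, but the `SessionSummary` class, the `parse_summary` function, and the `MAX_LIST_ITEMS` cap are illustrative assumptions, not the With Him implementation.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

MAX_LIST_ITEMS = 8  # assumed cap for illustration, not a production value


@dataclass
class SessionSummary:
    themes: list[str] = field(default_factory=list)
    concern: str = ""
    emotional_state: str = ""
    prayer_focus: str = ""
    facts: list[str] = field(default_factory=list)
    carry_forward_notes: list[str] = field(default_factory=list)
    candidate_memories: list[str] = field(default_factory=list)


def _clean_text(value) -> str:
    """Collapse whitespace; anything that is not a string becomes empty."""
    return " ".join(value.split()) if isinstance(value, str) else ""


def _clean_list(value) -> list[str]:
    """Keep non-empty strings, trimmed, de-duplicated case-insensitively, capped."""
    if not isinstance(value, list):
        return []
    seen: set[str] = set()
    out: list[str] = []
    for item in value:
        text = _clean_text(item)
        if text and text.lower() not in seen:
            seen.add(text.lower())
            out.append(text)
    return out[:MAX_LIST_ITEMS]


def parse_summary(raw: str) -> SessionSummary | None:
    """Return a normalized summary, or None if the model output is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return SessionSummary(
        themes=_clean_list(data.get("themes")),
        concern=_clean_text(data.get("concern")),
        emotional_state=_clean_text(data.get("emotional_state")),
        prayer_focus=_clean_text(data.get("prayer_focus")),
        facts=_clean_list(data.get("facts")),
        carry_forward_notes=_clean_list(data.get("carry_forward_notes")),
        candidate_memories=_clean_list(data.get("candidate_memories")),
    )
```

Malformed output returns `None` rather than a partial object, so a bad model response produces no memory at all instead of a half-trusted one.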
Architecture: Memory as a Bounded Pipeline
The core memory path is straightforward:
- A user has a conversation.
- The chat is stored through the normal message pipeline.
- When the session ends, or when it becomes inactive past a configured threshold, a premium memory job is enqueued.
- A worker loads the safe transcript, bounds the transcript window, and sends it to a summary model.
- The model returns structured JSON: themes, concern, emotional state, prayer focus, important facts, carry-forward notes, and candidate durable memories.
- The output is validated, normalized, and passed through memory policy.
- Safe memory signals are stored as session summaries, item-level memories, and a rolling spiritual profile.
- Future conversations receive a small advisory memory block only when eligible.
That last part is important. Memory is not injected into every request. The read path checks premium entitlement, feature flags, risk level, message triviality, and user memory settings before including anything.
Even when memory is included, it is deliberately small and advisory. It does not replace current-session context. It does not override safety. It does not tell the assistant to mention the database, the profile, or the mechanics of memory.
Design principle: memory should improve continuity without becoming the conversation.
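The read-path checks described above can be sketched as a single gate function. This is a minimal illustration under stated assumptions: the `ReadContext` fields, the risk labels, and the triviality heuristic are hypothetical stand-ins for whatever the production classifier and settings expose.

```python
from dataclasses import dataclass

# Assumed risk labels. Anything outside this set, including unknown labels,
# fails closed and blocks memory.
MEMORY_SAFE_RISK_LEVELS = {"none", "low"}

# Assumed examples of messages too trivial to justify memory injection.
TRIVIAL_MESSAGES = {"hi", "hello", "hey", "thanks", "thank you", "ok", "amen"}


@dataclass(frozen=True)
class ReadContext:
    is_premium: bool
    memory_flag_enabled: bool
    user_memory_enabled: bool
    risk_level: str
    message: str


def is_trivial(message: str) -> bool:
    """Rough heuristic: greetings and very short acknowledgements."""
    text = message.strip().lower().rstrip("!.?")
    return len(text) < 3 or text in TRIVIAL_MESSAGES


def should_include_memory(ctx: ReadContext) -> bool:
    """Every check must pass before any memory reaches the prompt."""
    if not (ctx.is_premium and ctx.memory_flag_enabled and ctx.user_memory_enabled):
        return False
    if ctx.risk_level not in MEMORY_SAFE_RISK_LEVELS:
        return False
    return not is_trivial(ctx.message)
```

The useful property is that the default answer is "no": memory is included only when every condition holds, so a missing flag or an unrecognized risk label results in a plain, memory-free turn.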
Reliable Memory Creation
The first version of memory was tied primarily to explicit session completion. That was clean, but too fragile.
Real users do not always end sessions cleanly. They close the app, lose connectivity, get distracted, or simply stop responding. If memory only writes on explicit session end, the system becomes biased toward sessions that finished neatly.
The upgraded system adds inactive-session summarization.
A scheduled sweep finds eligible stale sessions: conversations with enough activity, a recent assistant response, no existing generated or in-flight summary, and a last activity timestamp older than the configured threshold. Those sessions are summarized through the same queue and worker path as explicitly ended sessions.
This gives us better coverage without changing the user-visible chat experience.
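A minimal sketch of the sweep's eligibility check, assuming an in-memory list of session records. The `SessionState` shape, the `MIN_MESSAGES` and `INACTIVITY_THRESHOLD` values, and the status labels are hypothetical, and "a recent assistant response" is simplified here to "at least one assistant reply exists".

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MIN_MESSAGES = 4                              # assumed "enough activity"
INACTIVITY_THRESHOLD = timedelta(minutes=30)  # assumed configured threshold


@dataclass(frozen=True)
class SessionState:
    session_id: str
    message_count: int
    has_assistant_reply: bool
    summary_status: str | None  # None, "pending", or "generated" (assumed labels)
    last_activity_at: datetime


def is_stale_and_eligible(s: SessionState, now: datetime) -> bool:
    return (
        s.message_count >= MIN_MESSAGES
        and s.has_assistant_reply
        and s.summary_status is None          # no generated or in-flight summary
        and now - s.last_activity_at > INACTIVITY_THRESHOLD
    )


def select_stale_sessions(
    sessions: list[SessionState], now: datetime, limit: int = 100
) -> list[str]:
    """Return up to `limit` eligible session ids, oldest activity first."""
    eligible = [s for s in sessions if is_stale_and_eligible(s, now)]
    eligible.sort(key=lambda s: s.last_activity_at)
    return [s.session_id for s in eligible[:limit]]
```

The returned ids would then be enqueued on the same queue used for explicitly ended sessions. Checking `summary_status` keeps repeated sweeps from summarizing the same session twice.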
We also added a manual backfill path so missed historical sessions can be repaired safely, with dry-run support and batch limits. Memory infrastructure should be recoverable. If a queue pauses, a deploy misconfigures a flag, or a lifecycle path changes, the system should not permanently lose continuity.
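As a rough illustration of dry-run support and batch limits, a backfill entry point might look like the following. The function name, the `enqueue` callback, and the default limit are assumptions made for the sketch.

```python
from itertools import islice
from typing import Callable, Iterable


def backfill_summaries(
    session_ids: Iterable[str],
    enqueue: Callable[[str], None],
    *,
    dry_run: bool = True,
    batch_limit: int = 50,
) -> dict:
    """Select at most `batch_limit` missed sessions and enqueue them unless dry_run."""
    selected = list(islice(session_ids, batch_limit))
    if not dry_run:
        for session_id in selected:
            enqueue(session_id)
    return {
        "dry_run": dry_run,
        "selected": selected,
        "enqueued": 0 if dry_run else len(selected),
    }
```

Defaulting `dry_run` to `True` means an operator has to opt in to writes, and the returned report shows exactly which sessions a real run would touch.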
Memory Policy: What Is Allowed to Persist
The most important part of AI memory is not the model call. It is the policy boundary after the model call.
LLMs are good at extracting things that sound important. But “important” is not the same as “appropriate to remember.”
With Him's memory policy separates useful continuity from sensitive overreach.
Appropriate durable memories include things like:
- A preference for shorter responses.
- A desire for scripture only when requested.
- A recurring spiritual goal, such as building a prayer habit.
- A general prayer focus.
- A preference for a gentle or direct tone.
- Broad recurring themes that help the assistant support the user more consistently.
Blocked or restricted memories include unnecessary sensitive specifics: names, exact dates, locations, medical details, trauma details, private family conflict, or anything that would make a future response feel invasive.
The system treats memory candidates as candidates, not facts. They must pass policy before becoming durable memory.
The model can suggest what might be worth remembering, but product policy decides what is actually stored.
That one design choice changes the posture of the whole system.
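One way to picture "candidates, not facts" is a screening function that sits between the model output and storage. The sketch below is deliberately simplistic: the category names and regular expressions are hypothetical, and pattern matching alone cannot reliably catch names or locations, so a real policy layer would need more than this.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed allow-list, loosely mirroring the appropriate memories listed above.
ALLOWED_CATEGORIES = {
    "response_length",
    "scripture_preference",
    "spiritual_goal",
    "prayer_focus",
    "tone_preference",
    "recurring_theme",
}

MAX_VALUE_CHARS = 120  # assumed cap; long values tend to carry specifics

# Illustrative patterns only, not a complete sensitive-content detector.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),  # exact dates
    re.compile(r"\b(diagnos\w*|medication|hospital\w*|surgery)\b", re.I),
    re.compile(r"\b(abuse\w*|assault\w*)\b", re.I),
]


@dataclass(frozen=True)
class Candidate:
    category: str
    value: str


def screen_candidate(c: Candidate) -> tuple[bool, str]:
    """Return (allowed, reason). The model proposes; this function decides."""
    if c.category not in ALLOWED_CATEGORIES:
        return False, "category_not_allowed"
    if len(c.value) > MAX_VALUE_CHARS:
        return False, "too_long"
    if any(p.search(c.value) for p in SENSITIVE_PATTERNS):
        return False, "sensitive_specifics"
    return True, "ok"


def apply_policy(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that pass screening."""
    return [c for c in candidates if screen_candidate(c)[0]]
```

Returning a reason alongside the decision makes blocked candidates auditable without storing their content as memory.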
Profile, Items, and Provenance
There are two complementary memory layers.
The rolling spiritual profile is the prompt-facing summary of what the assistant should generally know: themes, recent struggles, spiritual goals, prayer focus, tone preferences, and relationship context when appropriate.
The item-level memory store gives the system more precise bookkeeping: category, value, first seen, last seen, confidence, user confirmation, source session, and status.
The profile is optimized for generation. The item store is optimized for control, freshness, deletion, and future ranking.
This split matters because memory changes over time. A user's prayer focus may shift. A recent struggle may become stale. A tone preference may become strongly confirmed. A theme mentioned once should not have the same weight as something repeated across many sessions.
By tracking provenance and freshness, the system can avoid treating old memory as permanent identity.
Memory is not a static biography. It is a living, bounded product state.
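A minimal sketch of an item record and a freshness-aware weight, using the fields named above. The exponential half-life, the confidence bump on repetition, and the rule that user-confirmed items do not decay are assumptions chosen for illustration, not the production ranking logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 60.0  # assumed: an unconfirmed item loses half its weight in 60 days


@dataclass
class MemoryItem:
    category: str
    value: str
    first_seen: datetime
    last_seen: datetime
    confidence: float       # 0.0 to 1.0
    user_confirmed: bool
    source_session: str
    status: str = "active"  # assumed labels: "active", "deleted"


def observe_again(item: MemoryItem, seen_at: datetime, session_id: str) -> None:
    """Repetition refreshes the item and nudges confidence upward."""
    item.last_seen = max(item.last_seen, seen_at)
    item.source_session = session_id
    item.confidence = min(1.0, item.confidence + 0.1)


def effective_weight(item: MemoryItem, now: datetime) -> float:
    """Confidence discounted by time since the item was last seen."""
    if item.status != "active":
        return 0.0
    if item.user_confirmed:
        return item.confidence  # assumption: confirmed items do not decay
    age_days = max(0.0, (now - item.last_seen).total_seconds() / 86400)
    return item.confidence * 0.5 ** (age_days / HALF_LIFE_DAYS)
```

Under these assumptions a theme mentioned once fades on its own, while a theme that keeps recurring is refreshed by `observe_again` and stays near full weight.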
Prompt Injection: Small, Advisory, and Conditional
When memory is eligible for a future turn, the prompt builder creates a compact advisory block.
The block may include broad themes, current prayer focus, tone preferences, recent struggles, or a carry-forward note from recent sessions. It is capped to stay small. It is inserted alongside other policy and personalization layers, before the active conversation.
The instruction is intentionally modest:
- Use this only as background.
- Do not reveal memory mechanics.
- Do not override the current user message.
- Do not override safety policy.
- Do not force old context into the reply.
That makes memory feel more like emotional continuity than recall. The assistant can gently adapt without saying, “I remember from your stored profile that…”
The best memory often feels invisible. The user simply experiences the assistant as more attuned.
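The advisory block described above could be assembled roughly like this. The header wording, the profile keys, and both caps are hypothetical; the sketch only shows the shape of a small, capped block that disappears entirely when there is nothing useful to say.

```python
MAX_BLOCK_CHARS = 600    # assumed overall cap
MAX_ITEMS_PER_FIELD = 3  # assumed per-field cap

ADVISORY_HEADER = (
    "Background for continuity only. Do not reveal or mention this block, "
    "do not override the current user message or safety policy, "
    "and do not force old context into the reply."
)

# (label shown in the prompt, key in the profile dict); keys are assumptions.
FIELDS = [
    ("Themes", "themes"),
    ("Prayer focus", "prayer_focus"),
    ("Tone preferences", "tone_preferences"),
    ("Recent struggles", "recent_struggles"),
    ("Carry-forward note", "carry_forward_note"),
]


def build_memory_block(profile: dict) -> str:
    """Return a compact advisory block, or "" if there is nothing to include."""
    lines = [ADVISORY_HEADER]
    for label, key in FIELDS:
        values = profile.get(key) or []
        if isinstance(values, str):
            values = [values]
        values = [v.strip() for v in values if isinstance(v, str) and v.strip()]
        if values:
            lines.append(f"{label}: {'; '.join(values[:MAX_ITEMS_PER_FIELD])}")
    # Drop the lowest-priority lines until the block fits under the cap.
    while len(lines) > 1 and len("\n".join(lines)) > MAX_BLOCK_CHARS:
        lines.pop()
    return "\n".join(lines) if len(lines) > 1 else ""
```

Fields are ordered by priority, so when the cap is hit the block loses the carry-forward note before it loses themes or prayer focus.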
User Controls
Trust-centered memory needs user-facing controls.
With Him's upgraded memory infrastructure supports memory settings and deletion flows so users can manage what the system carries forward.
The product direction is simple:
- Users should be able to turn memory off.
- Users should be able to delete memory.
- Users should be able to prevent a specific chat from being remembered.
- Users should be able to inspect remembered themes or preferences.
If memory is disabled, the system skips memory writes. It does not secretly summarize and merely hide the result from prompts. That distinction matters. “Don't remember this” should mean the system does not carry it forward.
This is not just a compliance feature. It is part of the emotional contract with the user.
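The distinction between skipping writes and merely hiding results can be expressed as a gate placed before summarization rather than at read time. In this sketch the `MemorySettings` shape and the per-chat exclusion set are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MemorySettings:
    memory_enabled: bool = True
    # Sessions the user asked not to be remembered (hypothetical field).
    excluded_session_ids: frozenset = frozenset()


def should_write_memory(settings: MemorySettings, session_id: str) -> bool:
    return settings.memory_enabled and session_id not in settings.excluded_session_ids


def maybe_enqueue_summary(
    settings: MemorySettings, session_id: str, enqueue: Callable[[str], None]
) -> bool:
    """Enqueue a summary job only if the user's settings allow a write."""
    if not should_write_memory(settings, session_id):
        return False  # no job, no summary, nothing to hide later
    enqueue(session_id)
    return True
```

Because the check runs before the job is enqueued, a disabled setting means the transcript never reaches the summary model for memory purposes.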
Safety and Crisis Behavior
Memory is disabled for high-risk and crisis turns. That is a deliberate safety decision.
In moments involving self-harm language, crisis patterns, or other high-risk signals, the assistant should focus on the current message and the safety policy. Old memory can distract from the immediate need, introduce stale assumptions, or make the response too personalized in a moment where clarity matters more.
The system therefore treats memory as lower priority than safety.
This hierarchy is central:
- Current user message first.
- Safety policy first.
- Memory only when helpful and appropriate.
Evaluation: Memory as a Behavioral Contract
Memory quality is not just “did the summary look good?”
We evaluate whether memory improves the conversation without introducing bad behavior.
The memory eval suite checks things like:
- Does the assistant naturally respect remembered tone preferences?
- Does it avoid mentioning the database, profile, or memory system?
- Does current user intent override old memory?
- Are crisis turns free of memory injection?
- Are sensitive specifics blocked from durable storage?
- Does stale memory decay instead of becoming a permanent label?
- Does “don't remember this” actually prevent persistence?
This connects memory to the same philosophy behind our broader eval infrastructure: quality is a behavioral contract, not a vibe check.
The system should be tested against the behaviors we actually care about in production.
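Two of the checks listed above lend themselves to simple programmatic assertions. The sketch below is illustrative only: the phrase patterns, the risk labels, and the `memory_marker` argument are hypothetical, and a real suite would likely pair rules like these with model-graded checks.

```python
import re

# Illustrative phrases that would expose memory mechanics to the user.
MECHANICS_PATTERNS = [
    re.compile(r"\b(stored|saved) (profile|memory|memories)\b", re.I),
    re.compile(r"\bmy (database|records|memory system)\b", re.I),
    re.compile(r"\baccording to (your|my) (profile|records)\b", re.I),
]

BLOCKED_RISK_LEVELS = {"high", "crisis"}  # assumed labels


def mentions_memory_mechanics(reply: str) -> bool:
    """True if the reply talks about the profile, database, or memory system."""
    return any(p.search(reply) for p in MECHANICS_PATTERNS)


def crisis_turn_is_memory_free(prompt: str, risk_level: str, memory_marker: str) -> bool:
    """For high-risk turns, the assembled prompt must not contain the memory block."""
    if risk_level in BLOCKED_RISK_LEVELS:
        return memory_marker not in prompt
    return True
```

Checks like these are cheap enough to run on every eval case, which makes regressions in the behavioral contract visible as failures rather than impressions.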
Why This Is First-Party Infrastructure
There are excellent tools for traces, experiments, and LLM evaluation. But With Him's memory problem is tightly connected to product policy, premium entitlement, safety classification, prompt construction, user controls, and deletion semantics.
That made memory a first-party concern.
The important logic lives in the application layer:
- Who is eligible for memory.
- Which sessions can be summarized.
- What the model is allowed to extract.
- Which memory candidates are blocked.
- How memory is merged.
- When memory is read.
- What gets injected into the prompt.
- What users can delete.
- When safety disables memory completely.
Those are not generic observability concerns. They are product behavior.
For With Him, the memory layer belongs close to the policy engine, chat lifecycle, entitlement model, and prompt builder. Keeping those pieces together reduces drift between what we say memory does and what the system actually does.
Closing
The best AI memory system is not the one that remembers the most. It is the one that remembers carefully.
For With Him, memory is designed to be bounded, inspectable, policy-gated, and subordinate to the present conversation. It exists to support continuity, not to create surveillance. It helps the assistant become more consistent over time while preserving the user's ability to change, reset, and be understood in the moment.
That is the architecture we want for a spiritual AI companion: not perfect recall, but trustworthy continuity.
This note reflects engineering priorities and architecture as implemented in the With Him server codebase. It is not a benchmark or endorsement statement about third-party memory systems.