Before (Redis log)
before.py
- Hits the LLM context limit fast.
- “Last 50” is a guess. Older relevant context gets evicted.
- Redis storage cost grows linearly with the number of turns and never levels off.
After (Tex)
after.py
- Bounded prompts. You pull the relevant 8 turns regardless of how many exist.
- Cross-session continuity. Use f"chat-{user_id}" to share memory across conversations.
- Less prompt trimming. You stop guessing which recent turns fit in context.
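As a rough illustration of the bounded-prompt and cross-session points, the sketch below builds a prompt from at most eight recalled turns under a per-user key. The `recall(key, q)` callable is a hypothetical stand-in for the Tex read path, not its documented signature, and `memory_key` and `build_prompt` are illustrative names.

```python
from typing import Callable, Sequence

MAX_TURNS = 8  # the "relevant 8 turns" figure from the list above


def memory_key(user_id: str) -> str:
    # One key per user (not per session), so memory spans conversations.
    return f"chat-{user_id}"


def build_prompt(
    recall: Callable[[str, str], Sequence[str]],
    user_id: str,
    question: str,
) -> str:
    # ASSUMPTION: `recall(key, q)` returns turn texts ordered by relevance.
    # The real Tex call may take different arguments.
    hits = list(recall(memory_key(user_id), question))[:MAX_TURNS]
    context = "\n".join(hits)
    return f"{context}\n\nUser: {question}"
```

The prompt size depends only on `MAX_TURNS`, not on how many turns the user has accumulated.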
Backfill plan
Verify
Pick 10 sessions. For each, run a known query and compare retrieved turns against your Redis log. Look for:
- All turns are present (active_fragment_ids count matches input)
- recall(q=<a known phrase>) finds the right turn
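The two checks above can be expressed as a small helper that you run once per sampled session. This is a sketch under assumptions: it takes already-fetched values (the Redis turns, the `active_fragment_ids` reported for the session, and the texts returned by `recall`) rather than calling either system, and `verify_session` is an illustrative name.

```python
from typing import Dict, Iterable, Sequence


def verify_session(
    redis_turns: Sequence[str],
    active_fragment_ids: Sequence[str],
    recalled_texts: Iterable[str],
    expected_turn: str,
) -> Dict[str, bool]:
    # Check 1: every turn in the Redis log produced a fragment.
    all_present = len(active_fragment_ids) == len(redis_turns)
    # Check 2: the turn containing the known phrase came back from recall.
    found = expected_turn in set(recalled_texts)
    return {"all_turns_present": all_present, "known_phrase_found": found}
```

A session passes only if both values are `True`; any failure is worth inspecting by hand before moving on to shadow mode.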
Shadow mode
Run both read paths in production for a week. Log when Tex’s confidence < 0.2. If that rate stays below your tolerance, proceed.
Cut over the read path
Switch the prompt to use Tex hits. Keep the Redis writes for one more week as backup.
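The shadow-mode gate that precedes this cutover might be tracked as follows. This is a minimal sketch: the 0.2 threshold comes from the text, while `ShadowStats`, `ready_to_cut_over`, and the `tolerance` parameter are illustrative names, and how a confidence value is read from a Tex response is left to the caller.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("tex_shadow")

LOW_CONFIDENCE = 0.2  # threshold named in the shadow-mode step


@dataclass
class ShadowStats:
    total: int = 0
    low: int = 0

    def record(self, session_id: str, confidence: float) -> None:
        # Count every shadow read; log only the low-confidence ones.
        self.total += 1
        if confidence < LOW_CONFIDENCE:
            self.low += 1
            logger.warning(
                "low-confidence recall: session=%s confidence=%.2f",
                session_id,
                confidence,
            )

    @property
    def low_rate(self) -> float:
        return self.low / self.total if self.total else 0.0


def ready_to_cut_over(stats: ShadowStats, tolerance: float) -> bool:
    # Require at least one observation so an empty week never passes.
    return stats.total > 0 and stats.low_rate < tolerance
```

Call `record` on each shadow read during the week, then check `ready_to_cut_over` with your own tolerance before switching the prompt.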
Edge cases
- Streaming responses. Persist with remember after the stream completes. Run it in the background so the next request is not delayed.
- System messages. Do not migrate them. They consume tokens and add little recall value.
- Tool calls. Store the result of a tool call as an assistant turn, not the raw JSON. Recall returns text.
- Audit log. Tex is not an audit store. Keep Redis or another append-only log for compliance, and use Tex for retrieval.
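For the streaming case, one way to persist after the stream completes without delaying the caller is an `asyncio` background task. The sketch assumes `remember` is an async callable that accepts the full reply text, which may not match the actual Tex signature. `stream_and_persist` and the `background` set are illustrative names.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Set


async def stream_and_persist(
    chunks: AsyncIterator[str],
    remember: Callable[[str], Awaitable[None]],
    background: Set[asyncio.Task],
) -> AsyncIterator[str]:
    parts = []
    async for chunk in chunks:
        parts.append(chunk)
        yield chunk  # forward each chunk to the client as it arrives

    # ASSUMPTION: `remember` takes the complete assistant reply as text.
    task = asyncio.create_task(remember("".join(parts)))
    # Hold a reference so the task is not garbage-collected before it runs.
    background.add(task)
    task.add_done_callback(background.discard)
```

The `background` set should live for the lifetime of the application, and any tasks still pending at shutdown should be awaited so the last turn is not lost.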

