How memory works explains what Tex stores. This page explains how to choose what comes back.

Use mode="active" for chat and copilots. Use mode="deep" when the user can wait longer, or when active recall is not finding enough.

Use top_k to choose how many hits you give the model. Smaller values keep prompts tight. Larger values help summaries, digests, and long answers. Since hits count toward tokens_out, this also affects cost.

If confidence stays under about 0.3, do not force the memory into the prompt. Try mode="deep" once, raise top_k, or ask a clearer q.

Python uses tex.recall(q, session_id, ...). HTTP uses POST /recall with the same ideas. The REST field list is in Recall memory.

Modes

active (default)

  • Best for: Chat, copilots, and live user flows.
  • Rough latency: about 1.5-2.5 s end-to-end in typical setups.
  • Behavior: Single-pass retrieval and ranking.

deep

  • Best for: Offline jobs, decision reviews, or a second pass after weak active results.
  • Rough latency: about 3-6 s.
  • Behavior: Two-pass with heavier reranking.
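The choice between the two modes can be sketched as a small helper that compares a latency budget against the rough figures above. This is only an illustration: `choose_mode`, `DEEP_MAX_S`, and the `interactive` flag are hypothetical names and not part of the Tex client, and the threshold is the upper end of the approximate latency quoted for deep.

```python
# Upper end of the rough "deep" latency quoted above, in seconds.
# Treat it as a planning figure, not a guarantee.
DEEP_MAX_S = 6.0


def choose_mode(latency_budget_s: float, interactive: bool = True) -> str:
    """Hypothetical helper: pick a recall mode from a latency budget."""
    # Offline jobs and reviews have no user waiting, so prefer "deep".
    if not interactive:
        return "deep"
    # Live flows stay on "active" unless the budget covers a slow deep call.
    return "deep" if latency_budget_s >= DEEP_MAX_S else "active"
```

A chat turn with a 2 s budget would get "active" here, while a nightly digest job would get "deep" regardless of budget.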

top_k

Defaults: 15 (active) / 25 (deep). The server caps at 30 no matter what you send.
Situation                      Starting top_k
Tight assistant prompt         3-5
Standard chat with citations   8-15
Summaries or long answers      20-30
Larger top_k directly increases tokens_out on your bill. How that maps to quota is in Usage, quotas, and billing.
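To make the defaults and the cap concrete, here is a minimal sketch of the value the server would end up using for a given request, assuming the rules stated above (15 for active, 25 for deep, hard cap of 30). `effective_top_k` is a hypothetical local helper for planning prompt size; it does not call Tex, and rejecting values below 1 is an assumption of this sketch.

```python
from typing import Optional

# Values taken from the defaults and cap described above.
DEFAULT_TOP_K = {"active": 15, "deep": 25}
SERVER_CAP = 30


def effective_top_k(mode: str = "active", requested: Optional[int] = None) -> int:
    """Hypothetical helper: estimate the top_k that will actually apply."""
    if mode not in DEFAULT_TOP_K:
        raise ValueError(f"unknown mode: {mode!r}")
    if requested is None:
        # Nothing sent, so the per-mode default applies.
        return DEFAULT_TOP_K[mode]
    if requested < 1:
        # Assumption of this sketch: non-positive values are a caller error.
        raise ValueError("top_k must be at least 1")
    # Anything above the cap is reduced to the cap.
    return min(requested, SERVER_CAP)
```

Asking for 50 hits in either mode therefore yields 30 in this sketch, which is a useful upper bound when estimating tokens_out.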

Confidence

Every recall returns confidence in [0, 1], calibrated so that roughly P(relevant hits | confidence) ≈ confidence.
Range       How to read it   Practical move
≥ 0.6       Strong           Pass context to the model as-is.
0.3 - 0.6   Mixed            Use hits, but cite or summarize sources for the user.
< 0.3       Weak             Try mode="deep", rephrase q, or skip memory for this turn.
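# First attempt in the default mode (active).
hits = tex.recall(q=q, session_id=sid)
# Weak result: retry once with the heavier two-pass mode.
if hits.confidence < 0.3:
    hits = tex.recall(q=q, session_id=sid, mode="deep")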
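The confidence bands and the one-time deep retry can be combined into one flow. The sketch below is an illustration only: `next_move` and `recall_with_fallback` are hypothetical helpers, `tex` is assumed to be any object exposing `recall(q=..., session_id=..., mode=...)` as in the snippet above, and returning `None` to mean "skip memory for this turn" is a design choice of this sketch.

```python
from typing import Any, Optional


def next_move(confidence: float) -> str:
    """Map a confidence value to the practical move for its band."""
    if confidence >= 0.6:
        return "use_as_is"
    if confidence >= 0.3:
        return "use_with_citations"
    return "retry_or_skip"


def recall_with_fallback(tex: Any, q: str, sid: str) -> Optional[Any]:
    """Try active recall, retry once in deep mode, else skip memory."""
    hits = tex.recall(q=q, session_id=sid)
    if hits.confidence < 0.3:
        # One deep retry only, so a weak query cannot loop.
        hits = tex.recall(q=q, session_id=sid, mode="deep")
    # Still weak after the retry: signal the caller to leave memory out.
    return hits if hits.confidence >= 0.3 else None
```

Capping the retry at a single deep call keeps worst-case latency close to one active call plus one deep call.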

Hit fields

RecallHit(id, text, score, kind, timestamp)        # turns + observations
RecallEntity(id, label, score)                     # entities
  • hits.hits.turns - use these for most prompts.
  • hits.hits.observations - small facts extracted from prior turns.
  • hits.hits.entities - people, places, and organizations that help with “who”, “what”, and “where” questions.
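As a rough illustration of how turns and observations might be folded into a prompt, the sketch below ranks them by score and renders a short context block. `Hit` is a local stand-in that mirrors the RecallHit fields listed above; the field types, the kind values "turn" and "observation", and the `build_context` helper are all assumptions, not the client's own definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Local stand-in mirroring RecallHit(id, text, score, kind, timestamp)."""
    id: str
    text: str
    score: float
    kind: str        # assumed values: "turn" or "observation"
    timestamp: str   # assumed to be a string; the real type may differ


def build_context(turns: List[Hit], observations: List[Hit], limit: int = 5) -> str:
    """Render the highest-scoring hits as one line each."""
    # Merge both lists and keep the best-scoring hits first.
    ranked = sorted(turns + observations, key=lambda h: h.score, reverse=True)
    return "\n".join(f"[{h.kind}] {h.text}" for h in ranked[:limit])
```

With the real client, the inputs would come from hits.hits.turns and hits.hits.observations; entities are left out here because they carry a label rather than text.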

Timeline string

hits = tex.recall(q="when did we discuss pricing?",
                  session_id=sid, include_timeline=True)
print(hits.timeline)   # optional pre-rendered string
timeline is an Optional[str]: either drop it straight into a prompt or ignore it. It is not a list you iterate.
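Because timeline may be None, a prompt builder should handle both cases. A minimal sketch, assuming the prompt is a plain string; `with_timeline` and the "Timeline:" label are hypothetical and chosen only for illustration.

```python
from typing import Optional


def with_timeline(prompt: str, timeline: Optional[str]) -> str:
    """Prepend the pre-rendered timeline when there is one."""
    if not timeline:
        # None or empty string: leave the prompt untouched.
        return prompt
    # The timeline is a single string, so it is inserted whole, not iterated.
    return f"Timeline:\n{timeline}\n\n{prompt}"
```

Called as with_timeline(user_prompt, hits.timeline), this works whether or not include_timeline=True was passed.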

Next: usage, quotas, and billing

How recall choices affect tokens_in / tokens_out.