Use mode="active" for chat and copilots. Use mode="deep" when the user can wait longer, or when active recall is not finding enough.
Use top_k to choose how many hits you give the model. Smaller values keep prompts tight. Larger values help summaries, digests, and long answers. Since hits count toward tokens_out, this also affects cost.
If confidence stays under about 0.3, do not force the memory into the prompt. Try mode="deep" once, raise top_k, or ask a clearer q.
Python uses tex.recall(q, session_id, ...). HTTP uses POST /recall with the same ideas. The REST field list is in Recall memory.
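The weak-confidence fallback described above can be sketched roughly as follows. This is a minimal sketch, not the SDK's own logic. `recall_fn` stands in for `tex.recall`. It is assumed to accept `mode` and `top_k` as keyword arguments and to return an object with a `confidence` attribute. The helper name `recall_with_fallback` is hypothetical. Check the real field names against Recall memory before relying on it.

```python
from typing import Any, Callable, Optional

WEAK_CONFIDENCE = 0.3  # "under about 0.3" threshold from the text above


def recall_with_fallback(
    recall_fn: Callable[..., Any],  # assumed stand-in for tex.recall
    q: str,
    session_id: str,
    top_k: int = 15,
) -> Optional[Any]:
    """Try active recall first, then deep recall once if confidence is weak.

    Returns the recall result, or None when both passes stay weak,
    so the caller can skip memory for this turn.
    """
    result = recall_fn(q, session_id, mode="active", top_k=top_k)
    if result.confidence >= WEAK_CONFIDENCE:
        return result

    # One deep retry only; do not loop.
    result = recall_fn(q, session_id, mode="deep", top_k=top_k)
    if result.confidence >= WEAK_CONFIDENCE:
        return result
    return None
```

Returning `None` rather than a weak result keeps the "do not force the memory into the prompt" rule in one place. Raising top_k or rephrasing q is left to the caller.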
Modes
active (default)
- Best for: Chat, copilots, and live user flows.
- Rough latency: about 1.5-2.5 s end-to-end in typical setups.
- Behavior: Single-pass retrieval and ranking.
deep
- Best for: Offline jobs, decision reviews, or a second pass after weak active results.
- Rough latency: about 3-6 s.
- Behavior: Two-pass with heavier reranking.
top_k
Defaults: 15 (active) / 25 (deep). The server caps at 30 no matter what you send.
| Situation | Starting top_k |
|---|---|
| Tight assistant prompt | 3-5 |
| Standard chat with citations | 8-15 |
| Summaries or long answers | 20-30 |
A larger top_k directly increases the tokens_out on your bill. How that maps to quota is in Usage, quotas, and billing.
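To make the defaults and the cap concrete, here is a small client-side sketch of the value the server would end up using. The helper name `effective_top_k` is hypothetical, and rejecting values below 1 is an assumption; only the defaults (15 and 25) and the cap of 30 come from the text above.

```python
from typing import Optional

DEFAULT_TOP_K = {"active": 15, "deep": 25}  # defaults stated above
SERVER_CAP = 30  # the server caps top_k at 30 no matter what is sent


def effective_top_k(mode: str = "active", requested: Optional[int] = None) -> int:
    """Return the top_k that would apply for a mode and an optional request."""
    if mode not in DEFAULT_TOP_K:
        raise ValueError(f"unknown mode: {mode!r}")
    if requested is None:
        return DEFAULT_TOP_K[mode]
    if requested < 1:
        # Assumption: a non-positive top_k is treated as a caller error.
        raise ValueError("top_k must be at least 1")
    return min(requested, SERVER_CAP)
```

For example, `effective_top_k("active", 50)` yields 30, so asking for more than the cap buys nothing.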
Confidence
Every recall returns confidence in [0, 1], calibrated so that roughly P(relevant hits | confidence) ≈ confidence.
| Range | How to read it | Practical move |
|---|---|---|
| ≥ 0.6 | Strong | Pass context to the model as-is. |
| 0.3 - 0.6 | Mixed | Use hits, but cite or summarize sources for the user. |
| < 0.3 | Weak | Try mode="deep", rephrase q, or skip memory for this turn. |
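The table above maps onto a three-way branch. The sketch below uses a hypothetical helper name, `read_confidence`, and returns plain labels; what to do with each label is up to the application.

```python
def read_confidence(confidence: float) -> str:
    """Classify a recall confidence using the ranges in the table above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= 0.6:
        return "strong"  # pass context to the model as-is
    if confidence >= 0.3:
        return "mixed"   # use hits, but cite or summarize sources
    return "weak"        # try deep mode, rephrase q, or skip memory
```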
Hit fields
- hits.hits.turns: use these for most prompts.
- hits.hits.observations: small facts extracted from prior turns.
- hits.hits.entities: people, places, and organizations that help with “who”, “what”, and “where” questions.
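One way to combine the three hit fields into prompt context is sketched below. It assumes each field has already been reduced to a list of plain strings, which may not match the actual response shape; the helper name `build_context` and the section labels are hypothetical.

```python
from typing import Sequence


def build_context(
    turns: Sequence[str],
    observations: Sequence[str] = (),
    entities: Sequence[str] = (),
) -> str:
    """Assemble a plain-text context block, with turns first.

    Assumption: every argument is a sequence of already-extracted strings.
    Empty fields are skipped so the prompt carries no blank sections.
    """
    sections = []
    if turns:
        sections.append("Relevant turns:\n" + "\n".join(f"- {t}" for t in turns))
    if observations:
        sections.append("Known facts:\n" + "\n".join(f"- {o}" for o in observations))
    if entities:
        sections.append("Entities: " + ", ".join(entities))
    return "\n\n".join(sections)
```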
Timeline string
timeline is an Optional[str]: either drop it straight into a prompt or ignore it. It is not a list you iterate.
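Because timeline is either a string or None, handling it is a single check rather than a loop. A minimal sketch, with a hypothetical helper name and an assumed "Timeline:" label:

```python
from typing import Optional


def add_timeline(prompt: str, timeline: Optional[str]) -> str:
    """Append the timeline string to a prompt when one is present."""
    if not timeline:  # covers both None and an empty string
        return prompt
    return f"{prompt}\n\nTimeline:\n{timeline}"
```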
Next: usage, quotas, and billing
How recall choices affect tokens_in / tokens_out.
