Recall and ranking - Tex | Memory API for agents

How memory works explains what Tex stores. This page explains how to choose what comes back.

Use mode="active" for chat and copilots. Use mode="deep" when the user can wait longer, or when active recall is not finding enough.

Use top_k to choose how many hits you give the model. Smaller values keep prompts tight. Larger values help summaries, digests, and long answers. Since hits count toward tokens_out, this also affects cost.

If confidence stays under about 0.3, do not force the memory into the prompt. Try mode="deep" once, raise top_k, or ask a clearer q.

Python uses tex.recall(q, session_id, ...). HTTP uses POST /recall with the same ideas. The REST field list is in Recall memory.

Modes

`active` (default)

Best for: Chat, copilots, and live user flows.
Rough latency: about 1.5-2.5 s end-to-end in typical setups.
Behavior: Single-pass retrieval and ranking.

`deep`

Best for: Offline jobs, decision reviews, or a second pass after weak active results.
Rough latency: about 3-6 s.
Behavior: Two-pass with heavier reranking.
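
One way to turn the guidance above into code is a small helper that picks a mode from a latency budget. This is a minimal sketch: `choose_mode` and its parameters are hypothetical names, not part of the Tex SDK, and the 6-second cutoff is simply the upper end of the rough `deep` latency quoted above.

```python
from typing import Optional

WEAK_CONFIDENCE = 0.3      # "weak" threshold used on this page
DEEP_WORST_CASE_S = 6.0    # upper end of the rough deep latency quoted above


def choose_mode(latency_budget_s: float,
                previous_confidence: Optional[float] = None) -> str:
    """Return "active" or "deep" (hypothetical helper, not an SDK call)."""
    # A weak active pass justifies one deep retry, whatever the budget.
    if previous_confidence is not None and previous_confidence < WEAK_CONFIDENCE:
        return "deep"
    # Otherwise use deep only when the caller can absorb its rough worst case.
    if latency_budget_s >= DEEP_WORST_CASE_S:
        return "deep"
    return "active"
```

A live chat turn with a budget of a couple of seconds stays on `active`; a nightly digest job with a generous budget gets `deep`.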

`top_k`

Defaults: 15 (active) / 25 (deep). The server caps at 30 no matter what you send.

Situation	Starting `top_k`
Tight assistant prompt	3-5
Standard chat with citations	8-15
Summaries or long answers	20-30

Larger top_k directly increases tokens_out on your bill. How that maps to quota is in Usage, quotas, and billing.
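
To see what value the server would most likely act on, the defaults and the cap above can be mirrored client-side. This is a sketch only: `effective_top_k` is a hypothetical helper, and the lower bound of 1 is an assumption, since this page does not state a minimum.

```python
from typing import Optional

DEFAULT_TOP_K = {"active": 15, "deep": 25}  # defaults stated above
SERVER_CAP = 30                             # server-side cap stated above


def effective_top_k(mode: str, requested: Optional[int] = None) -> int:
    """Approximate the top_k that applies for a given request."""
    if requested is None:
        return DEFAULT_TOP_K[mode]
    # Assumption: values below 1 are not meaningful, so clamp to [1, cap].
    return max(1, min(requested, SERVER_CAP))
```

Clamping before sending keeps cost estimates honest: a request for 50 hits is billed for at most 30.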

Confidence

Every recall returns confidence in [0, 1], calibrated so that P(relevant hits | confidence) ≈ confidence. For example, among recalls that report a confidence near 0.7, roughly 70% should contain relevant hits.

Range	How to read it	Practical move
≥ 0.6	Strong	Pass context to the model as-is.
0.3 - 0.6	Mixed	Use hits, but cite or summarize sources for the user.
< 0.3	Weak	Try `mode="deep"`, rephrase `q`, or skip memory for this turn.

# First pass in the default active mode.
hits = tex.recall(q=q, session_id=sid)
# Weak result: retry once in deep mode.
if hits.confidence < 0.3:
    hits = tex.recall(q=q, session_id=sid, mode="deep")
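
The three bands in the table can also be expressed as a small triage function, with a wrapper that skips memory when even the deep pass stays weak. This is a sketch under assumptions: `band`, `recall_or_skip`, and `fake_recall` are hypothetical names, and the recall function is passed in as a parameter so the example runs without the SDK.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional

STRONG, WEAK = 0.6, 0.3  # thresholds from the table above


def band(confidence: float) -> str:
    """Map a confidence value to the table's Strong / Mixed / Weak bands."""
    if confidence >= STRONG:
        return "strong"
    if confidence >= WEAK:
        return "mixed"
    return "weak"


def recall_or_skip(recall: Callable[..., Any], q: str,
                   session_id: str) -> Optional[Any]:
    """Try active, retry once in deep, return None if the result is still weak."""
    result = recall(q=q, session_id=session_id)
    if band(result.confidence) == "weak":
        result = recall(q=q, session_id=session_id, mode="deep")
    return None if band(result.confidence) == "weak" else result


def fake_recall(q: str, session_id: str, mode: str = "active") -> SimpleNamespace:
    """Stand-in for tex.recall, used only to exercise the helper."""
    return SimpleNamespace(confidence=0.2 if mode == "active" else 0.45)
```

In real use you would pass `tex.recall` in place of `fake_recall`; a `None` return means the turn should proceed without memory.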

Hit fields

RecallHit(id, text, score, kind, timestamp)        # turns + observations
RecallEntity(id, label, score)                     # entities

hits.hits.turns - use these for most prompts.
hits.hits.observations - small facts extracted from prior turns.
hits.hits.entities - people, places, and organizations that help with “who”, “what”, and “where” questions.
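
A prompt builder that combines the three hit lists might look like the sketch below. The `Hit` and `Entity` dataclasses are local stand-ins that mirror the field names shown above; the `timestamp` type, the sample `kind` values, and the assumption that a higher `score` means a better match are not confirmed by this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:              # stand-in for RecallHit(id, text, score, kind, timestamp)
    id: str
    text: str
    score: float
    kind: str
    timestamp: str      # assumed to be a string here


@dataclass
class Entity:           # stand-in for RecallEntity(id, label, score)
    id: str
    label: str
    score: float


def build_context(turns: List[Hit], observations: List[Hit],
                  entities: List[Entity], max_turns: int = 5) -> str:
    """Render hits as a plain-text context block for a prompt."""
    lines: List[str] = []
    # Assumption: higher score means more relevant, so keep the top turns.
    top_turns = sorted(turns, key=lambda h: h.score, reverse=True)[:max_turns]
    if top_turns:
        lines.append("Relevant turns:")
        lines.extend(f"- {h.text}" for h in top_turns)
    if observations:
        lines.append("Known facts:")
        lines.extend(f"- {h.text}" for h in observations)
    if entities:
        lines.append("Entities: " + ", ".join(e.label for e in entities))
    return "\n".join(lines)
```

With the real SDK, the three arguments would come from `hits.hits.turns`, `hits.hits.observations`, and `hits.hits.entities`.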

Timeline string

hits = tex.recall(q="when did we discuss pricing?",
                  session_id=sid, include_timeline=True)
print(hits.timeline)   # optional pre-rendered string

timeline is an Optional[str]: either drop it straight into a prompt or ignore it. It is not a list you iterate.
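
Because the value may be absent, prompt assembly should treat it as optional. A minimal sketch, where `with_timeline` and the "Timeline:" label are hypothetical choices rather than anything the SDK provides:

```python
from typing import Optional


def with_timeline(prompt: str, timeline: Optional[str]) -> str:
    """Prepend the pre-rendered timeline when present; otherwise return the prompt unchanged."""
    if not timeline:          # covers both None and an empty string
        return prompt
    return f"Timeline:\n{timeline}\n\n{prompt}"
```

Calling `with_timeline(user_prompt, hits.timeline)` is then safe whether or not `include_timeline=True` produced a string.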

Next: usage, quotas, and billing

How recall choices affect tokens_in / tokens_out.
