Use tex.recall before you call your model. It returns the memory that best matches the current user message. For mode, top_k, and confidence, read Recall and ranking. This page lists the Python fields.
tex.recall(
    q: str,
    *,
    session_id: str,
    mode: "active" | "deep" = "active",
    top_k: int | None = None,
    include_timeline: bool = False,
) -> RecallResponse
Call recall directly on the client: tex.recall(...). There is no tex.recall.search(...).

Parameters

q
str
required
Natural-language query. The user’s latest message usually works well.
session_id
str
required
Session to search. Use the same session_id you wrote with.
mode
"active" | "deep"
default: "active"
Retrieval depth. See Recall and ranking.
top_k
int
Number of hits across all kinds. Defaults to 15 in active mode and 25 in deep mode. The server caps the final value at 30.
include_timeline
bool
default: False
When true, the response includes a pre-rendered timeline string (not a structured list).
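
The top_k rule above can be read as a small function. This is only a sketch of the documented defaults and cap, not the server's code: effective_top_k is a hypothetical name, and this page does not say how values below 1 are handled, so the sketch does not cover them.

```python
from typing import Optional

# Defaults and cap as documented on this page.
DEFAULT_TOP_K = {"active": 15, "deep": 25}
SERVER_CAP = 30


def effective_top_k(mode: str = "active", top_k: Optional[int] = None) -> int:
    """Return the hit count the documented rule implies (illustrative only)."""
    requested = DEFAULT_TOP_K[mode] if top_k is None else top_k
    return min(requested, SERVER_CAP)


print(effective_top_k())                # 15
print(effective_top_k("deep"))          # 25
print(effective_top_k("active", 50))    # 30
```

Asking for more than 30 does not raise an error under this reading; the value is simply capped.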

Returns

hits.turns
list[RecallHit]
Raw conversation turns most relevant to q.
hits.observations
list[RecallHit]
Small facts extracted from past turns, such as preferences or decisions.
hits.entities
list[RecallEntity]
People, places, and things linked across observations.
confidence
float
Calibrated confidence in [0, 1]. Higher means the returned memory is more likely to help.
timeline
str | None
A pre-rendered chronological summary of the relevant events. Set only when include_timeline=True.
mode
str
Echoes the request mode.
usage
Usage | None
tokens_in / tokens_out billed for this call. Always present in production.

RecallHit fields (turns / observations)

@dataclass(frozen=True)
class RecallHit:
    id: str | None      # stable; persists across recalls
    text: str           # matched content
    score: float        # raw relevance, 0.0-1.0
    kind: str           # "turn" | "observation" | "entity"; defaults to "turn"
    timestamp: str | None

RecallEntity fields (entities only)

@dataclass(frozen=True)
class RecallEntity:
    id: str | None
    label: str          # the entity label (e.g. "Acme")
    score: float
RecallEntity is not the same as RecallHit. It has label instead of text, and it does not have kind or timestamp.
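
Because the two types expose different fields, code that renders all three hit lists needs one branch for text and one for label. The sketch below redefines both dataclasses locally so it runs without the SDK; context_lines is a hypothetical helper, not part of the client.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


# Local stand-ins that mirror the fields documented above.
@dataclass(frozen=True)
class RecallHit:
    id: Optional[str]
    text: str
    score: float
    kind: str = "turn"
    timestamp: Optional[str] = None


@dataclass(frozen=True)
class RecallEntity:
    id: Optional[str]
    label: str
    score: float


def context_lines(turns: Iterable[RecallHit],
                  observations: Iterable[RecallHit],
                  entities: Iterable[RecallEntity]) -> List[str]:
    """Render every hit as one bullet line, in the order received."""
    lines = [f"- [{h.kind}] {h.text}" for h in list(turns) + list(observations)]
    # Entities expose label, not text, and have no kind field.
    lines += [f"- [entity] {e.label}" for e in entities]
    return lines


lines = context_lines(
    turns=[RecallHit("t1", "We agreed on the Pro plan.", 0.82)],
    observations=[RecallHit("o1", "Prefers annual billing.", 0.77, kind="observation")],
    entities=[RecallEntity("e1", "Acme", 0.6)],
)
print("\n".join(lines))
```

With the real client you would pass resp.hits.turns, resp.hits.observations, and resp.hits.entities instead of the hand-built lists.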

Examples

Build a chatbot system prompt

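# Assumes `tex` and `llm` are clients you have already configured.
def respond(user_msg: str, sid: str) -> str:
    resp = tex.recall(q=user_msg, session_id=sid, top_k=5)
    # This example uses only the raw turns; observations and entities are ignored.
    memory = "\n".join(f"- {h.text}" for h in resp.hits.turns)

    return llm.complete([
        {"role": "system", "content": f"Relevant memory:\n{memory}"},
        {"role": "user",   "content": user_msg},
    ])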

Confidence-gated fallback

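def answer(q: str, sid: str) -> str:
    resp = tex.recall(q=q, session_id=sid)

    # Weak match in active mode: retry once in the deeper, slower mode.
    if resp.confidence < 0.3:
        resp = tex.recall(q=q, session_id=sid, mode="deep")

    # Still weak: skip memory rather than inject noise into the prompt.
    if resp.confidence < 0.2:
        return generate_without_memory(q)

    # generate_with_memory is a placeholder for your own memory-aware path.
    return generate_with_memory(q, resp)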
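
The same escalation can be packaged as a reusable helper that takes the recall function as an argument, which makes the thresholds easy to test. This is a sketch: recall_with_escalation and fake_recall are hypothetical names, and the fake stands in for tex.recall so the block runs without the SDK.

```python
from types import SimpleNamespace
from typing import Callable


def recall_with_escalation(recall: Callable, q: str, sid: str,
                           retry_below: float = 0.3,
                           give_up_below: float = 0.2):
    """Try active mode, retry once in deep mode, return None if still weak."""
    resp = recall(q=q, session_id=sid)
    if resp.confidence < retry_below:
        resp = recall(q=q, session_id=sid, mode="deep")
    if resp.confidence < give_up_below:
        return None
    return resp


# Fake recall for demonstration: active mode is weak, deep mode is stronger.
def fake_recall(q, session_id, mode="active"):
    confidence = 0.25 if mode == "active" else 0.6
    return SimpleNamespace(confidence=confidence, mode=mode)


result = recall_with_escalation(fake_recall, "what plan did we pick?", "chat-1")
print(result.mode, result.confidence)   # deep 0.6
```

A None result tells the caller to generate without memory; any other result is safe to format into the prompt.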

Temporal queries

hits = tex.recall(
    q="when did we discuss pricing?",
    session_id=sid,
    include_timeline=True,
)

if hits.timeline:
    print(hits.timeline)   # a pre-rendered chronological summary string
timeline is a free-form string. Drop it into a prompt as text. Do not treat it like an array.
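
One way to drop the timeline into a prompt is to append it as its own labelled block when it is present. A minimal sketch, assuming you already have the memory lines as strings; system_prompt and the "Timeline:" label are illustrative choices, not part of the SDK.

```python
from typing import Iterable, Optional


def system_prompt(memory_lines: Iterable[str], timeline: Optional[str] = None) -> str:
    """Build a system prompt; the timeline is appended verbatim as plain text."""
    parts = ["Relevant memory:", *memory_lines]
    if timeline:  # None or an empty string means no timeline section
        parts += ["", "Timeline:", timeline]
    return "\n".join(parts)


prompt = system_prompt(
    ["- We agreed on the Pro plan."],
    timeline="Mar 3: pricing first raised. Mar 10: Pro plan agreed.",
)
print(prompt)
```

The timeline string is never split or parsed here, which matches its free-form nature.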

Multi-source recall

If you have both a long-lived user “bio” and a per-conversation session, query both:
bio = tex.recall(q=user_msg, session_id=f"bio-{user_id}", top_k=3)
chat = tex.recall(q=user_msg, session_id=f"chat-{conv_id}", top_k=5)

context = "\n".join(f"- {h.text}" for h in (bio.hits.turns + chat.hits.turns))
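
When two sessions hold overlapping content, the concatenation above can repeat a line. The sketch below merges hit lists while dropping duplicates by normalized text. merge_hits is a hypothetical helper, and deduplicating on text is an assumption on my part; this page does not say whether ids are shared across sessions.

```python
from types import SimpleNamespace


def merge_hits(*hit_lists, limit=8):
    """Concatenate hit lists in order, skipping repeated text, up to `limit`."""
    seen = set()
    merged = []
    for hits in hit_lists:
        for h in hits:
            key = h.text.strip().lower()
            if key in seen:
                continue
            seen.add(key)
            merged.append(h)
    return merged[:limit]


# Stand-in hits; with the real client these would be bio.hits.turns and chat.hits.turns.
bio_turns = [SimpleNamespace(text="Lives in Lisbon."),
             SimpleNamespace(text="Prefers annual billing.")]
chat_turns = [SimpleNamespace(text="prefers annual billing. "),
              SimpleNamespace(text="Asked about the Pro plan.")]

context = "\n".join(f"- {h.text}" for h in merge_hits(bio_turns, chat_turns))
print(context)
```

Earlier lists win ties, so pass the source you trust most first.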

Performance

| Mode | Typical p50 | p99 | When to use |
| --- | --- | --- | --- |
| active | 1.7s | 2.5s | Every interactive call |
| deep | 3.5s | 6s | Periodic analysis, low-confidence retries |
Set timeout=2.0 on the constructor for interactive paths. Catch APITimeoutError and continue without memory:
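def answer(q: str, sid: str) -> str:
    try:
        hits = tex.recall(q=q, session_id=sid)
        context = format_context(hits)
    except APITimeoutError:
        # Recall timed out: answer without memory instead of failing the request.
        context = "(no memory available)"
    return llm.complete(prompt_with(context))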
