This recipe puts the quickstart loop behind one HTTP route:
  1. Recall memory for the incoming message.
  2. Call your model.
  3. Store the user and assistant turns.
Create one Tex client per process and reuse it across requests.
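Without FastAPI, the three steps come down to three calls. The sketch below reuses the Tex calls from the chat route later on this page (tex.recall and tex.conversations.remember). The names handle_message and llm_complete are hypothetical, and error handling is left out to keep the loop visible.

```python
from datetime import datetime, timezone
from typing import Any, Callable


def now_iso() -> str:
    # UTC timestamp with a trailing "Z", the same format chat.py uses.
    return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")


def handle_message(
    tex: Any,
    llm_complete: Callable[[str, list], str],  # hypothetical model call
    session_id: str,
    text: str,
) -> str:
    # 1. Recall memory for the incoming message.
    result = tex.recall(q=text, session_id=session_id, top_k=5)
    memory = [h.text for h in result.hits.turns]

    # 2. Call your model with the message plus recalled memory.
    answer = llm_complete(text, memory)

    # 3. Store the user and assistant turns.
    tex.conversations.remember(
        session_id=session_id,
        turns=[
            {"role": "user", "text": text, "timestamp": now_iso()},
            {"role": "assistant", "text": answer, "timestamp": now_iso()},
        ],
    )
    return answer
```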

Layout

File     Job
deps.py  Cached Tex client
chat.py  /chat route
main.py  App entry

1. Lay out the package

Create this structure:
app/
├── deps.py        # cached Tex client
├── main.py        # FastAPI entry
└── chat.py        # /chat route
You can rename app/. If you do, update the import paths and the uvicorn target (app.main:app) to match.

2. Cache your Tex client

Read secrets from the environment and construct Tex once:
deps.py
from functools import cache
from tex import Tex
import os

@cache
def tex_client() -> Tex:
    return Tex(
        api_key=os.environ["TEX_API_KEY"],
        base_url=os.environ.get(
            "TEX_BASE_URL", "https://api.getmetacognition.com"
        ),
        timeout=10,
    )
Pass tex_client itself (not tex_client()) to FastAPI's Depends(...). Because @cache memoizes the first result, every route receives the same Tex instance and shares its connection pool.
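The following sketch shows what @cache does here. It uses a hypothetical stand-in class, FakeTex, in place of the real client so it runs without an API key. Every call to the cached factory returns the same object. Calling cache_clear() (available on functions wrapped with functools.cache) forces a new one, which is useful in tests that change environment variables.

```python
import os
from functools import cache


class FakeTex:
    """Hypothetical stand-in for tex.Tex; counts how often it is constructed."""

    instances = 0

    def __init__(self, api_key: str, base_url: str, timeout: float):
        FakeTex.instances += 1
        self.api_key = api_key
        self.base_url = base_url
        self.timeout = timeout


@cache
def tex_client() -> FakeTex:
    # Same shape as deps.py, but builds the stand-in instead of Tex.
    return FakeTex(
        api_key=os.environ["TEX_API_KEY"],
        base_url=os.environ.get("TEX_BASE_URL", "https://api.getmetacognition.com"),
        timeout=10,
    )


os.environ.setdefault("TEX_API_KEY", "tex_test_key")
first = tex_client()
second = tex_client()
print(first is second, FakeTex.instances)  # True 1

tex_client.cache_clear()  # drop the cached instance, e.g. between tests
print(tex_client() is first, FakeTex.instances)  # False 2
```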

3. Build the chat route

Derive session_id from both the user and the chat, so memory is scoped per user and per conversation. Recall with a small top_k. If recall times out or Tex reports a rate or quota limit, answer without memory. Then store both sides of the turn:
chat.py
from datetime import datetime, timezone
from fastapi import APIRouter, Depends, Header
from pydantic import BaseModel
from tex import Tex, RateLimitError, APITimeoutError
from .deps import tex_client

router = APIRouter()

def now_iso() -> str:
    return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")

class ChatBody(BaseModel):
    text: str
    session_id: str

@router.post("/chat")
def chat(
    body: ChatBody,
    x_user_id: str = Header(...),
    tex: Tex = Depends(tex_client),
):
    # Scope memory to this user and this chat.
    sid = f"u_{x_user_id}-{body.session_id}"

    memory: list[str] = []
    try:
        result = tex.recall(q=body.text, session_id=sid, top_k=5)
        memory = [h.text for h in result.hits.turns]
    except (RateLimitError, APITimeoutError):
        memory = []  # Tex is slow or over quota: answer without memory.

    answer = your_llm.complete(body.text, memory=memory)

    tex.conversations.remember(
        session_id=sid,
        turns=[
            {"role": "user", "text": body.text, "timestamp": now_iso()},
            {"role": "assistant", "text": answer, "timestamp": now_iso()},
        ],
    )

    return {"answer": answer}
Replace your_llm.complete(...) with your model call.
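Until you wire in a real model, a hypothetical placeholder such as EchoLLM below lets the route run end to end. It assumes only the call shape used in chat.py, complete(text, memory=...), and folds recalled memory into a prompt string. The prompt layout is illustrative and not something Tex requires.

```python
from typing import List, Optional


class EchoLLM:
    """Hypothetical placeholder for your model client; it never calls a model."""

    def build_prompt(self, text: str, memory: List[str]) -> str:
        # Put recalled memory ahead of the user message, one item per line.
        lines: List[str] = []
        if memory:
            lines.append("Relevant memory:")
            lines.extend(f"- {m}" for m in memory)
        lines.append(f"User: {text}")
        return "\n".join(lines)

    def complete(self, text: str, memory: Optional[List[str]] = None) -> str:
        prompt = self.build_prompt(text, memory or [])
        # A real client would send `prompt` to a model; this one echoes it back.
        return f"(echo) {prompt}"


your_llm = EchoLLM()
print(your_llm.complete("What tea do I like?", memory=["User likes oolong"]))
```

Defining your_llm = EchoLLM() in chat.py satisfies the your_llm.complete(...) call until you swap in your model client.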

4. Expose the app

Mount the router once:
main.py
from fastapi import FastAPI
from .chat import router

app = FastAPI()
app.include_router(router)

5. Run locally

Export your key and launch uvicorn:
export TEX_API_KEY=tex_live_...
uvicorn app.main:app --reload
Send a POST request to /chat with a JSON body such as {"text":"...","session_id":"..."} and an x-user-id header.
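To call the route from Python without extra dependencies, you can build the request with the standard library's urllib.request. The helper name build_chat_request is hypothetical. The sketch assumes uvicorn's default address, http://127.0.0.1:8000.

```python
import json
import urllib.request


def build_chat_request(
    text: str,
    session_id: str,
    user_id: str,
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    # The JSON body matches ChatBody; x-user-id feeds the x_user_id Header parameter.
    body = json.dumps({"text": text, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-user-id": user_id},
    )


req = build_chat_request("Hi, I prefer oolong.", session_id="chat_1", user_id="42")
# With the server running:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp)["answer"])
```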

Production tweaks

Run remember in the background

Don't make the user wait for remember. Schedule it with FastAPI's BackgroundTasks, which runs it after the response is sent:
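from fastapi import BackgroundTasks

@router.post("/chat")
def chat(
    body: ChatBody,
    bg: BackgroundTasks,
    x_user_id: str = Header(...),
    tex: Tex = Depends(tex_client),
):
    # ... recall + answer, as in chat.py (defines sid and answer) ...
    user_turn = {"role": "user", "text": body.text, "timestamp": now_iso()}
    assistant_turn = {"role": "assistant", "text": answer, "timestamp": now_iso()}

    # Runs after the response has been sent.
    bg.add_task(
        tex.conversations.remember,
        session_id=sid,
        turns=[user_turn, assistant_turn],
    )
    return {"answer": answer}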
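When remember raises inside a background task, the response has already been sent, so the client never sees the error. One option is to pass bg.add_task a small wrapper instead of tex.conversations.remember directly. The sketch assumes nothing about Tex beyond the remember call shown above. safe_remember and the logger name are hypothetical.

```python
import logging
from typing import Any

logger = logging.getLogger("chat.memory")  # hypothetical logger name


def safe_remember(tex: Any, session_id: str, turns: list) -> bool:
    """Store turns in Tex; log failures instead of letting them escape the task."""
    try:
        tex.conversations.remember(session_id=session_id, turns=turns)
        return True
    except Exception:
        # logger.exception records the message plus traceback at ERROR level.
        logger.exception("remember failed for session %s", session_id)
        return False
```

In the route, you would schedule it with bg.add_task(safe_remember, tex, session_id=sid, turns=[user_turn, assistant_turn]).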

Bound recall latency

Set Tex(timeout=2.0) and catch APITimeoutError. If recall is slow, answer without memory instead of blocking the whole chat request.
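The timeout is passed to the Tex constructor, so it likely applies to every call on that client, including remember. To bound recall alone, one alternative is to run the call in a worker thread and stop waiting after a deadline. This is a general threading technique, not a Tex feature. The sketch uses concurrent.futures from the standard library. call_with_deadline and slow_recall are hypothetical names, and slow_recall only simulates a slow Tex call. A timed-out call keeps running in its thread, so this bounds how long the request waits, not how much work is done.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# One shared pool per process; the worker count is an arbitrary choice here.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_deadline(fn: Callable[[], T], deadline_s: float, fallback: T) -> T:
    """Return fn() if it finishes within deadline_s seconds, else fallback."""
    future = _pool.submit(fn)
    try:
        # Re-raises any exception fn itself raised.
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        # fn is still running in the pool; we only stop waiting for it.
        return fallback


def slow_recall() -> list:
    time.sleep(0.5)  # simulates a slow tex.recall(...)
    return ["late memory"]


print(call_with_deadline(lambda: ["fast memory"], 0.2, []))  # ['fast memory']
print(call_with_deadline(slow_recall, 0.1, []))  # []
```

In the chat route, you might write memory = call_with_deadline(lambda: [h.text for h in tex.recall(q=body.text, session_id=sid, top_k=5).hits.turns], 2.0, []). Keep the existing except clause, because errors from recall still propagate.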

Add a health probe

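Add this route to main.py. It calls tex.usage.today() and returns HTTP 503 if the call fails:
from fastapi import Depends
from fastapi.responses import JSONResponse
from tex import Tex
from .deps import tex_client

@app.get("/healthz")
def healthz(tex: Tex = Depends(tex_client)):
    try:
        tex.usage.today()  # any error here marks the probe unhealthy
        return {"ok": True}
    except Exception as e:
        # FastAPI does not read Flask-style (body, status) tuples; set 503 explicitly.
        return JSONResponse(status_code=503, content={"ok": False, "error": str(e)})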