
p7: Conversation that streams

Every run so far was one shot: one input, one answer, done. A real assistant holds a conversation. Two things make it feel like one, and they pull in different directions.

In a hurry? These three steps are the whole challenge. Everything below is the why and the how.

  1. Run printf 'My name is Jag.\nWhat is my name?\nquit\n' | npm run p7 (or chat live) and watch it stream fluently but forget your name on the next turn.
  2. Edit start/agent.ts: declare a messages history above the loop, push each user turn, switch the call to { messages }, and push the accumulated reply back on (TODO 2).
  3. Done when the second turn answers “Jag”, because the full history goes back in every turn.

The first is streaming, which you already know from f1: agent.stream() and a for await over textStream, the reply printing as the model writes it. It’s wired here for you; in a chat loop it’s what keeps each turn feeling live rather than frozen.

The second is memory, and that’s the part you wire. The agent remembers earlier turns because you feed the whole conversation back in each time. Without it, every turn starts from nothing. Streaming alone doesn’t make a conversation; memory does.

turn 1:  [you]  ->  model  ->  [reply]
turn 2:  [you][reply][you]  ->  model  ->  [reply]

History grows every turn and the model re-reads all of it, which is why memory, not streaming, is what makes a conversation.
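To see that growth without calling a model, here is a minimal sketch. The `fakeModel` function is a made-up stand-in that only reports how many messages it was shown, and `runTurns` is a hypothetical helper, not part of the workshop code.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Stand-in for the model (hypothetical): it only reports how much history it was given.
function fakeModel(history: Message[]): string {
  return `I can see ${history.length} message(s).`;
}

// Runs a scripted conversation and records how many messages go in on each turn.
function runTurns(inputs: string[]): { seen: number[]; messages: Message[] } {
  const messages: Message[] = [];
  const seen: number[] = [];
  for (const input of inputs) {
    messages.push({ role: "user", content: input });
    seen.push(messages.length); // size of what the model reads this turn
    const reply = fakeModel(messages);
    messages.push({ role: "assistant", content: reply });
  }
  return { seen, messages };
}

const { seen } = runTurns(["My name is Jag.", "What is my name?", "Thanks."]);
console.log(seen); // 1, then 3, then 5 messages
```

Turn 2 reads three messages, matching the `[you][reply][you]` row in the diagram above.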

Open start/agent.ts. The agent is plain, with no tools, so the lesson stays on delivery and memory. Here’s the loop you’ll work in:

import * as readline from "node:readline";

const MAX_TURNS = 50; // a bounded loop, so it always ends even when piped or unattended
let turns = 0;        // turns completed so far

const rl = readline.createInterface({ input: process.stdin });

process.stdout.write("You: ");
for await (const line of rl) {
  const input = line.trim();
  if (!input || ["quit", "q", "exit"].includes(input.toLowerCase())) break;

  process.stdout.write("TripMate: ");
  const stream = await agent.stream({ prompt: input });
  for await (const chunk of stream.textStream) process.stdout.write(chunk);
  console.log("\n");

  if (++turns >= MAX_TURNS) break;
  process.stdout.write("You: ");
}

The loop reads input with for await (const line of rl) and caps the turns at MAX_TURNS. That’s deliberate: a while (true) chat loop has no guaranteed end, so it can hang on piped input or spin forever on a stuck condition. Here the for await loop ends when the input closes, and the MAX_TURNS cap ends it even if input keeps arriving, so it terminates even unattended.
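The cap is easy to check in isolation. This is a small sketch, assuming a hypothetical helper `takeTurns` that applies the same exit rules as the loop above to any async source of lines. The `endless` generator is also made up: it stands in for input that never closes.

```typescript
// Hypothetical helper: consume lines until an empty line, a quit word,
// or maxTurns handled lines, whichever comes first.
async function takeTurns(lines: AsyncIterable<string>, maxTurns: number): Promise<string[]> {
  const handled: string[] = [];
  for await (const line of lines) {
    const input = line.trim();
    if (!input || ["quit", "q", "exit"].includes(input.toLowerCase())) break;
    handled.push(input);
    if (handled.length >= maxTurns) break; // the bound: stop even if more input exists
  }
  return handled;
}

// A source that never ends on its own. Only the consumer's cap stops it.
async function* endless(): AsyncGenerator<string> {
  let n = 0;
  while (true) yield `line ${++n}`;
}

takeTurns(endless(), 3).then((handled) => {
  console.log(handled.length); // 3
});
```

Without the `maxTurns` check, the call on `endless()` would never resolve.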

Right now each turn calls agent.stream({ prompt: input }). One prompt in, one reply out, nothing carried over. That’s the gap you’ll close.

Memory is just a list you append to on both sides of each turn. Here it is on a throwaway chef bot that has nothing to do with TripMate. Each turn takes three moves:

const messages: Array<{ role: "user" | "assistant"; content: string }> = [];

async function ask(input: string) {
  messages.push({ role: "user", content: input });       // 1. record what they said
  let reply = "";
  const stream = await agent.stream({ messages });        // 2. send the WHOLE history, not one prompt
  for await (const chunk of stream.textStream) reply += chunk;
  messages.push({ role: "assistant", content: reply });   // 3. record the answer, THIS is the memory
  return reply;
}

await ask("I'm vegetarian.");
await ask("Suggest a dinner for me.");   // remembers, turn 1 is still in `messages`

Take the three moves over to TripMate’s streaming loop: record the user’s line, call with { messages }, and record the reply. The third move is the one that creates the memory. Drop it and the agent forgets everything it said.

You can chat interactively, or pipe a script in to test:

printf 'My name is Jag.\nWhat is my name?\nquit\n' | npm run p7

Tell TripMate your name, then ask it back. It streams a fluent reply, then on the next turn it doesn’t know your name, because each turn is sent on its own with no history. The streaming works; the memory isn’t there yet.

  1. Run it and watch it forget. Pipe the script above, or run npm run p7 and chat. Give your name, ask for it back, and confirm the second answer can’t recall it.

  2. Declare a running history (TODO 2). Add a messages array above the loop. It holds the conversation the agent will see, in the same shape as the chef example above.

  3. Record each turn and switch to { messages } (TODO 2). Inside the loop, make the first two moves: push the user’s input onto messages, and change the call from agent.stream({ prompt: input }) to agent.stream({ messages }). Also accumulate the streamed chunks into a reply string as you print them, because the next step needs the full text.

  4. Push the reply back on (TODO 2). This is the third move, and the one line that is the memory: after streaming, append the assistant’s reply to messages so the next turn sees it. Run the script again. Now it answers “Jag”, because the whole history goes back in every turn.

    Stuck? finish/agent.ts is the canonical version. Read it after you’ve had a real go.

  5. Watch the cost grow. Keep chatting for several turns, then open the trace and read usage.inputTokens on the later calls. It climbs every turn, because the whole history is re-sent each time. Memory isn’t free: long conversations get expensive, which is why production apps eventually summarise or trim old turns.

  6. Check you’ve got it. You should be able to point at the one line that is the memory (pushing the reply back on), say why the loop is bounded rather than while (true), and show the input-token count climbing across turns.
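Step 3 asks for two things at once: print each chunk as it arrives and keep the full text. Here is one way to sketch that on its own, assuming a hypothetical helper `printAndCollect` and a made-up `fakeTextStream` standing in for stream.textStream. It is not the code in finish/agent.ts.

```typescript
// Hypothetical helper: hand each chunk to `write` as it arrives, then return the whole reply.
async function printAndCollect(
  textStream: AsyncIterable<string>,
  write: (chunk: string) => void,
): Promise<string> {
  let reply = "";
  for await (const chunk of textStream) {
    write(chunk);   // show it live
    reply += chunk; // keep it for the history
  }
  return reply;
}

// Stand-in for stream.textStream: yields the reply in two pieces.
async function* fakeTextStream(): AsyncGenerator<string> {
  yield "Your name ";
  yield "is Jag.";
}

const printed: string[] = [];
printAndCollect(fakeTextStream(), (chunk) => { printed.push(chunk); }).then((reply) => {
  console.log(reply); // Your name is Jag.
});
```

In the real loop, `write` would be process.stdout.write and the returned string is what you push back onto messages.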

  • Forgetting to append the reply. If you push the user’s message but not the agent’s answer, the agent sees your questions but never its own previous answers, and it repeats itself or loses the thread. Push both.

  • while (true). Always bound the loop. Unbounded input loops hang when piped and can spin forever on a bad condition.

  • Unbounded growth. Re-sending the full history every turn costs more tokens each time. Fine for a workshop chat; for a long-running assistant, trim or summarise.
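A quick back-of-envelope sketch shows the shape of that growth. The four-characters-per-token ratio is an assumption for illustration only, and both function names are hypothetical. The real figure is usage.inputTokens in the trace.

```typescript
// Rough estimate, assuming about four characters per token (an illustration, not a tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Estimated input size on each call when every call re-sends all earlier turns.
function inputTokensPerTurn(turns: Array<{ user: string; reply: string }>): number[] {
  const perTurn: number[] = [];
  let historyTokens = 0;
  for (const turn of turns) {
    historyTokens += estimateTokens(turn.user);
    perTurn.push(historyTokens);                 // what goes in on this call
    historyTokens += estimateTokens(turn.reply); // the reply joins the history afterwards
  }
  return perTurn;
}

const sample = Array.from({ length: 4 }, () => ({
  user: "x".repeat(40),   // about 10 tokens under the assumption above
  reply: "y".repeat(160), // about 40 tokens
}));
console.log(inputTokensPerTurn(sample)); // 10, 60, 110, 160
```

Each call costs more than the last even though every turn is the same size, so the total across a conversation grows faster than the number of turns.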

What about the tool calls inside a turn?

This version keeps a simple text history: user said X, assistant said Y. That carries the conversation faithfully for a plain agent.

If your agent uses tools and you want later turns to see the exact tool calls and results, carry the structured response.messages from each run forward instead of plain strings. For this lesson the text history is enough and easier to read.
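As a rough sketch of the difference, the code below carries a run's full message list forward instead of a single string. Everything here is a stand-in: `StructuredMessage` is a simplified shape, `fakeRun` is a hypothetical run that pretends to call a weather tool, and the real structured messages carry more detail than this.

```typescript
// Simplified message shape for illustration; real structured messages hold more than text.
type StructuredMessage = { role: "user" | "assistant" | "tool"; content: string };

// Hypothetical stand-in for one agent run: returns the messages that run produced.
function fakeRun(history: StructuredMessage[]): { response: { messages: StructuredMessage[] } } {
  return {
    response: {
      messages: [
        { role: "assistant", content: "calling weather tool" },
        { role: "tool", content: "18C, clear" },
        { role: "assistant", content: `Answer based on ${history.length} earlier message(s).` },
      ],
    },
  };
}

const structuredHistory: StructuredMessage[] = [];

function structuredTurn(input: string): void {
  structuredHistory.push({ role: "user", content: input });
  const result = fakeRun(structuredHistory);
  // Carry forward everything the run produced, not just the final text.
  structuredHistory.push(...result.response.messages);
}

structuredTurn("What's the weather in Lisbon?");
structuredTurn("Should I pack a jacket?");
console.log(structuredHistory.length); // 8: two user lines plus three run messages each
```

The second turn can see the tool result from the first, which a plain text history would have dropped.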

Could I keep only the last few messages?

Yes, and that’s how real apps keep the cost down. Trim messages to the latest handful before each run, but hold on to anything you can’t afford to lose, like the user’s name. A tiny summary string you update when they introduce themselves, then prepend to each run, keeps durable facts cheap.
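One possible shape for that, as a sketch: `buildContext` is a hypothetical helper, and sending the summary as a leading user message is an assumption made to keep the example small. A system prompt may suit your agent better.

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Hypothetical helper: keep the last `keep` messages and put durable facts in front of them.
function buildContext(all: Turn[], summary: string, keep: number): Turn[] {
  const recent = keep > 0 ? all.slice(-keep) : []; // guard: slice(-0) would return everything
  if (!summary) return recent;
  return [{ role: "user", content: `Known facts: ${summary}` }, ...recent];
}

const full: Turn[] = [
  { role: "user", content: "My name is Jag." },
  { role: "assistant", content: "Nice to meet you, Jag." },
  { role: "user", content: "I like trains." },
  { role: "assistant", content: "Noted." },
  { role: "user", content: "What is my name?" },
];

const context = buildContext(full, "The user's name is Jag.", 2);
console.log(context.length); // 3: the summary plus the two most recent messages
```

The introduction itself has been trimmed away, but the name survives in the summary line, so the last question can still be answered.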

Raw transcript memory is easy; durable memory needs a little design.

That is the last of the patterns track. Next is the discussion that closes the workshop: compare what you saw across the workflows you orchestrated (p1 to p4) and the agent patterns the model orchestrated (p5 to p7), and decide where each fits in real work. From foundations you can also branch into the full-stack track, which puts the same kind of agent behind a web UI.