p7: Conversation that streams
Every run so far was one shot: one input, one answer, done. A real assistant holds a conversation. Two things make it feel like one, and they pull in different directions.
Quick path
In a hurry? These three steps are the whole challenge. Everything below is the why and the how.
- Run printf 'My name is Jag.\nWhat is my name?\nquit\n' | npm run p7 (or chat live) and watch it stream fluently but forget your name on the next turn.
- Edit start/agent.ts: declare a messages history above the loop, push each user turn, switch the call to { messages }, and push the accumulated reply back on (TODO 2).
- Done when the second turn answers “Jag”, because the full history goes back in every turn.
The first is streaming, which you already know from f1: agent.stream() and a
for await over textStream, the reply printing as the model writes it. It’s wired
here for you; in a chat loop it’s what keeps each turn feeling live rather than frozen.
The second is memory, and that’s the part you wire. The agent remembers earlier turns because you feed the whole conversation back in each time. Without it, every turn starts from nothing. Streaming alone doesn’t make a conversation; memory does.
Mental model
History grows every turn and the model re-reads all of it, which is why memory, not streaming, is what makes a conversation.
Open start/agent.ts. The agent is plain, with no tools, so the lesson
stays on delivery and memory. The part you’ll work in is its chat loop.
The loop reads input with for await (const line of rl) and caps the turns at MAX_TURNS.
That’s deliberate: a while (true) chat loop has no guaranteed end, so pipe it some input
and it can hang, or a stuck condition can spin forever. A bounded for loop always
terminates, even unattended.
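A minimal sketch of a bounded input loop, not the workshop's own file. The function name chatLoop, the default cap of 10, and the blank-line handling are assumptions for illustration. It reads lines with for await, stops on quit or end of input, and stops regardless once maxTurns turns have been taken.

```typescript
import * as readline from "node:readline";
import { Readable } from "node:stream";

// Assumed cap; the workshop's real MAX_TURNS value is not shown here.
const MAX_TURNS = 10;

// Reads lines until "quit", end of input, or maxTurns turns, whichever comes first.
async function chatLoop(input: Readable, maxTurns: number = MAX_TURNS): Promise<string[]> {
  const rl = readline.createInterface({ input });
  const turns: string[] = [];
  for await (const line of rl) {
    const text = line.trim();
    if (text === "quit") break;          // explicit exit
    if (text === "") continue;           // ignore blank lines
    turns.push(text);                    // a real loop would call the agent here
    if (turns.length >= maxTurns) break; // hard cap, so the loop always ends
  }
  rl.close();
  return turns;
}
```

For a live chat you would pass process.stdin as the input. Piped input ends the loop on its own when the stream closes, and the cap covers the case where it never does.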
Right now each turn calls agent.stream({ prompt: input }). One prompt in, one reply out,
nothing carried over. That’s the gap you’ll close.
The mechanic, on a throwaway chat
Memory is just a list you append to on both sides of each turn. Here it is on a throwaway chef bot that has nothing to do with TripMate, in three moves per turn.
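A minimal sketch of those three moves, with no model involved. fakeChef is a hypothetical stand-in for the model call: it can only answer from the messages it is handed, which is the point. The names ChatMessage, fakeChef, and chefTurn are all assumptions for illustration.

```typescript
type Role = "user" | "assistant";
interface ChatMessage { role: Role; content: string }

// Hypothetical stand-in for a model: it knows nothing beyond the list it receives.
function fakeChef(messages: ChatMessage[]): string {
  const latest = messages[messages.length - 1]?.content ?? "";
  if (/what did you suggest/i.test(latest)) {
    const mine = messages.filter((m) => m.role === "assistant");
    const last = mine[mine.length - 1];
    return last ? `I suggested: ${last.content}` : "I haven't suggested anything yet.";
  }
  return "Try a squeeze of lemon.";
}

// The running history: one list, appended to on both sides of each turn.
const messages: ChatMessage[] = [];

function chefTurn(input: string): string {
  messages.push({ role: "user", content: input });      // move 1: record the user's line
  const reply = fakeChef(messages);                     // move 2: call with the whole list
  messages.push({ role: "assistant", content: reply }); // move 3: record the reply
  return reply;
}
```

Ask for a tip, then ask what it suggested: the second answer only works because move 3 put the first reply into the list.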
Take the three moves over to TripMate’s streaming loop: record the user’s line, call
with { messages }, and record the reply. The third move is the one that creates the
memory: drop it and the agent forgets everything it said.
Run it
You can chat interactively with npm run p7, or pipe a script in to test, as the printf command in the quick path does.
Tell TripMate your name, then ask it back. It streams a fluent reply, then on the next turn it doesn’t know your name, because each turn is sent on its own with no history. The streaming works; the memory isn’t there yet.
Build it
1. Run it and watch it forget. Pipe the script above, or run npm run p7 and chat. Give your name, ask for it back, and confirm the second answer can’t recall it.
2. Declare a running history (TODO 2). Add a messages array above the loop, the conversation the agent will see, the same shape as the chef example above.
3. Record each turn and switch to { messages } (TODO 2). Inside the loop, the first two moves: push the user’s input onto messages, and change the call from agent.stream({ prompt: input }) to agent.stream({ messages }). You’ll also need to accumulate the streamed chunks into a reply string as you print them, because you need the full text for the next step.
4. Push the reply back on (TODO 2). The third move, and the one line that is the memory: after streaming, append the assistant’s reply to messages so the next turn sees it. Run the script again. Now it answers “Jag”, because the whole history goes back in every turn.
   Stuck? finish/agent.ts is the canonical version. Read it after you’ve had a real go.
5. Watch the cost grow. Keep chatting for several turns, then open the trace and read usage.inputTokens on the later calls. It climbs every turn, because the whole history is re-sent each time. Memory isn’t free: long conversations get expensive, which is why production apps eventually summarise or trim old turns.
6. Check you’ve got it. You should be able to point at the one line that is the memory (pushing the reply back on), say why the loop is bounded rather than while (true), and show the input-token count climbing across turns.
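The steps above can be sketched as a single turn function. This is a sketch under stated assumptions, not the contents of finish/agent.ts: the StreamingAgent interface is inferred only from the calls the text names (agent.stream({ messages }) and textStream), and fakeAgent is a hypothetical stand-in so the example runs without a model.

```typescript
interface ChatMessage { role: "user" | "assistant"; content: string }
interface StreamResult { textStream: AsyncIterable<string> }

// Assumed shape, based only on the calls named in the text.
interface StreamingAgent {
  stream(args: { messages: ChatMessage[] }): StreamResult | Promise<StreamResult>;
}

async function runTurn(
  agent: StreamingAgent,
  messages: ChatMessage[],
  input: string,
  write: (chunk: string) => void,
): Promise<string> {
  messages.push({ role: "user", content: input });      // record the user's line
  const result = await agent.stream({ messages });      // send the whole history
  let reply = "";
  for await (const chunk of result.textStream) {
    write(chunk);                                       // print as it arrives
    reply += chunk;                                     // and keep the full text
  }
  messages.push({ role: "assistant", content: reply }); // the line that is the memory
  return reply;
}

// Hypothetical stand-in agent: it answers only from the messages it receives.
const fakeAgent: StreamingAgent = {
  stream({ messages }) {
    const userLines = messages.filter((m) => m.role === "user").map((m) => m.content);
    const named = userLines.map((l) => /my name is (\w+)/i.exec(l)).find((m) => m !== null);
    const text = named ? `Your name is ${named[1]}.` : "I don't know your name.";
    async function* chunks() {
      for (const piece of text.match(/\S+\s*/g) ?? []) yield piece; // word-sized chunks
    }
    return { textStream: chunks() };
  },
};
```

In a terminal chat you would pass (chunk) => process.stdout.write(chunk) as write. Sharing one messages array across calls is what lets the second turn see the first.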
- Forgetting to append the reply. If you push the user’s message but not the agent’s answer, the agent sees your questions but never its own previous answers, and it repeats itself or loses the thread. Push both.
- while (true). Always bound the loop. Unbounded input loops hang when piped and can spin forever on a bad condition.
- Unbounded growth. Re-sending the full history every turn costs more tokens each time. Fine for a workshop chat; for a long-running assistant, trim or summarise.
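To see why the cost climbs, here is a rough arithmetic sketch. It assumes every message costs the same number of tokens, which real conversations do not, so treat the numbers as a shape rather than a prediction. The function name inputTokensPerTurn is hypothetical.

```typescript
// Estimates the input size of each turn when the full history is re-sent.
// Assumes a constant tokensPerMessage for both user lines and replies.
function inputTokensPerTurn(turns: number, tokensPerMessage: number): number[] {
  const perTurn: number[] = [];
  let history = 0;
  for (let t = 1; t <= turns; t++) {
    history += tokensPerMessage; // the new user message joins the history
    perTurn.push(history);       // everything so far is sent as input
    history += tokensPerMessage; // the reply is appended for the next turn
  }
  return perTurn;
}
```

With 50 tokens per message, four turns send 50, 150, 250, and 350 input tokens: each turn costs more than the last, and the running total grows faster than the turn count.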
A couple of things worth knowing
What about the tool calls inside a turn?
This version keeps a simple text history: user said X, assistant said Y. That carries the conversation faithfully for a plain agent.
If your agent uses tools and you want later turns to see the exact tool calls and results,
carry the structured response.messages from each run forward instead of plain strings. For
this lesson the text history is enough and easier to read.
Could I keep only the last few messages?
Yes, and that’s how real apps keep the cost down. Trim messages to the latest handful
before each run, but hold on to anything you can’t afford to lose, like the user’s name. A
tiny summary string you update when they introduce themselves, then prepend to each run,
keeps durable facts cheap.
Raw transcript memory is easy; durable memory needs a little design.
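One way to sketch the trim-plus-summary idea, under assumptions: buildContext is a hypothetical helper, and carrying the summary as a leading system-role message is a choice made for this example, not something the workshop prescribes.

```typescript
type Role = "system" | "user" | "assistant";
interface ChatMessage { role: Role; content: string }

// Hypothetical helper: keep the latest `keep` messages and prepend durable facts.
// The full history is left untouched; only what gets sent is trimmed.
function buildContext(history: ChatMessage[], summary: string, keep: number): ChatMessage[] {
  // slice(-0) would return the whole array, so handle keep <= 0 explicitly.
  const recent = keep > 0 ? history.slice(-keep) : [];
  const facts: ChatMessage[] = summary
    ? [{ role: "system", content: `Known facts: ${summary}` }]
    : [];
  return [...facts, ...recent];
}
```

You would update the summary string when the user introduces themselves, then send buildContext(messages, summary, keep) in place of the full messages list on each run.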
That is the last of the patterns track. Next is the discussion that closes the workshop: compare what you saw across the workflows you orchestrated (p1 to p4) and the agent patterns the model orchestrated (p5 to p7), and decide where each fits in real work. From foundations you can also branch into the full-stack track: the same kind of agent, now behind a web UI.