NotificationsYou must be signed in to change notification settings
Fork79
Star1.9k

[WIP]: Tool hardening fixes#149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Draft

jherr wants to merge2 commits intomain

base:main

Choose a base branch

fromfix/tools-hardening

Draft

[WIP]: Tool hardening fixes#149

jherr wants to merge2 commits intomainfromfix/tools-hardening

Conversation

Copy link

Contributor

jherr commentedDec 15, 2025•
edited by coderabbitaibot
Loading

🎯 Changes

Fixing tool call issues and adding a lot of tests to make sure they are 100% stable and reliable.

✅ Checklist

I have followed the steps in theContributing guide.
I have tested this code locally withpnpm run test:pr.

🚀 Release Impact

This change affects published code, and I have generated achangeset.
This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

Release Notes

New Features
- Added support for tool execution approvals, allowing tools to require user consent before running.
- Enhanced tool orchestration to properly handle sequential and parallel tool execution flows.
- Improved async handling for client and server tool executions with better state management.
Tests
- Added comprehensive test suites covering approval flows, client tools, server tools, and tool sequencing scenarios.
- Introduced a test simulator adapter for deterministic tool execution testing.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

initial tool hardening fixes

3752db0

jherr requested a review froma team

December 15, 2025 05:31

Copy link

coderabbitaibot commentedDec 15, 2025•
edited
Loading

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the.coderabbit.yaml file in this repository. To trigger a single review, invoke the@coderabbitai review command.

You can disable this status message by setting thereviews.review_status tofalse in the CodeRabbit configuration file.

Walkthrough

Introduces an approval-based tool execution workflow with async state management. Adds a queuing system for post-stream actions, tracks in-flight tool executions, convertsonToolCall callback to synchronous, and defers continuation logic. Includes comprehensive LLM simulator testing infrastructure and end-to-end test suites for tool execution flows.

Changes

Cohort / File(s)	Summary
Core Chat Client Orchestration `packages/typescript/ai-client/src/chat-client.ts`	Introduced queuing system (`postStreamActions`), in-flight tool execution tracking (`pendingToolExecutions`), and continuation deferral (`continuationPending`). Changed`onToolCall` signature from async to synchronous with internal promise orchestration. Added`drainPostStreamActions`,`checkForContinuation`, and`queuePostStreamAction` helpers. Modified stream lifecycle to wait for pending executions before finalization and drain queued actions in finally blocks.
Type System Extensions `packages/typescript/ai/src/types.ts`	Added optional`approval?: { id: string; approved: boolean }` field to`ToolCall` public API to carry user-approval metadata.
Message Conversion & State Collection `packages/typescript/ai/src/message-converters.ts`,`packages/typescript/ai/src/core/chat.ts`	Enhanced`collectClientState` to inspect both UIMessage (parts array) and ModelMessage (toolCalls array) formats for approval data. Updated message converters to conditionally include approval data in tool-call entries when approval state is 'approval-responded'.
LLM Simulator Adapter `packages/typescript/smoke-tests/adapters/src/llm-simulator.ts`	New comprehensive simulator for deterministic tool-call testing. Exports`LLMSimulatorAdapter` class,`SimulatorScript`/`SimulatorToolCall`/`SimulatorIteration` interfaces,`createLLMSimulator` factory, and`SimulatorScripts` utility with preset patterns (single/multi-tool, approvals, sequential/parallel). Supports stateless and stateful iteration selection with ID generation.
Adapter Package Configuration `packages/typescript/smoke-tests/adapters/package.json`,`packages/typescript/smoke-tests/adapters/vitest.config.ts`	Added exports field for`./src/llm-simulator.ts`, test scripts (`test`,`test:watch`), and`vitest` dev dependency. Created Vitest configuration with path alias (`@tanstack/ai`), globals, Node environment, coverage settings, and exclusion patterns.
Adapter Test Suites `packages/typescript/smoke-tests/adapters/src/tests/tools/*.test.ts`	Added six test modules:`server-tool.test.ts`,`client-tool.test.ts`,`approval.test.ts`,`multi-tool.test.ts`,`sequences.test.ts`,`error-handling.test.ts`. Cover server/client tool execution, approvals, parallel/sequential flows, mixed tool types, error propagation, and iteration tracking via simulator.
E2E API & Page Route `packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts`,`packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx`	Added`/api/tools-test` handler supporting multiple predefined scenarios (text-only, single/multi-tool, approval, sequential/parallel). Defined server tools (get_weather, fetch_data, get_time, delete_file with approval) and client tools (show_notification, display_chart). Added React component`ToolsTestPage` with scenario selector, event logging, approvals UI, tool call tracking, and integration with`useChat`.
E2E Dependencies `packages/typescript/smoke-tests/e2e/package.json`	Added`@tanstack/tests-adapters` (workspace:*) and`zod` (^4.1.13) dependencies.
E2E Playwright Tests `packages/typescript/smoke-tests/e2e/tests/tools/*.spec.ts`	Added four test suites:`approval-flow.spec.ts`,`client-tool.spec.ts`,`race-conditions.spec.ts`,`server-client-sequence.spec.ts`. Cover approval workflows (single, sequential, parallel), client tool execution, mixed server-client sequences, race conditions, and stress tests. Each includes helpers for scenario selection, completion detection, metadata extraction, and event log parsing.

Sequence Diagram(s)

sequenceDiagram    participant Client as Client/App    participant CC as ChatClient    participant Queue as Action Queue    participant Tool as Tool Executor    participant Stream as Stream Lifecycle    Client->>CC: streamResponse(messages)    CC->>Stream: Initialize stream    Stream->>Stream: Reset pendingToolExecutions        loop During Stream        Stream->>Tool: onToolCall (sync)        Tool->>Tool: Create executionPromise        Tool->>Queue: Store in pendingToolExecutions        Tool->>Tool: Execute async (detached)        Tool-->>Queue: Remove from pendingToolExecutions on completion    end        Stream->>Stream: Stream completes    Stream->>Stream: Wait for all pendingToolExecutions    Stream->>Stream: Check streamCompletedSuccessfully        alt Auto-send criteria met        Stream->>CC: checkForContinuation()        CC->>Queue: queuePostStreamAction(continuation)    end        Stream->>Queue: drainPostStreamActions()    Queue->>Queue: Execute all queued actions    Queue-->>Client: Finally block runs        Note over CC,Queue: Tool result added while loading    Client->>CC: addToolResult(result)    alt Stream in progress        CC->>Queue: queuePostStreamAction(continuation)    else No active stream        CC->>CC: Trigger continuation directly    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

packages/typescript/ai-client/src/chat-client.ts: Dense async orchestration logic with state management across multiple new data structures (postStreamActions,pendingToolExecutions,continuationPending), control flow changes toonToolCall, and integration into stream lifecycle. Requires careful analysis of timing, race conditions, and state transitions.
Type system changes (types.ts,message-converters.ts,chat.ts): Moderate complexity; requires tracing approval field propagation across message formats.
LLM Simulator (llm-simulator.ts): Large new public API surface (~15 exports); comprehensive implementation supporting multiple iteration patterns and tool types. Needs validation of contract consistency across all preset scripts.
Adapter & E2E tests: High volume (~6 test modules + 4 E2E specs) with heterogeneous scenarios (server tools, client tools, approvals, race conditions, sequencing). Each test suite has distinct logic and assertions that demand individual scrutiny.
Interdependencies: Changes span chat-client → types → message converters → simulator usage in tests; tracing data flow through all layers is essential.

Areas requiring extra attention:

Async orchestration guards and edge cases inchat-client.ts (e.g., concurrent streams, pending execution cleanup, continuation re-entrance)
Approval state propagation throughcollectClientState andmessage-converters under various message formats
LLM simulator contract compliance with allSimulatorScripts presets
E2E test reliability under race conditions and timing edge cases

Poem

🐰Queues and tools dance in the sun,
Approvals bloom when work is done,
We hitch the stream with post-stream cheer,
Orchestration whispers, loud and clear,
Async hops through fields so green,
The finest tool-flow ever seen! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title uses '[WIP]' prefix and is vague ('Tool hardening fixes') without clearly conveying the main changes or their scope.	Replace '[WIP]' with a clear, specific title describing the primary change, such as 'Add tool execution queuing and approval support' or similar.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description check	✅ Passed	The PR description addresses the template structure but lacks specific details about what 'tool call issues' were fixed and what improvements the added tests validate.
Docstring Coverage	✅ Passed	Docstring coverage is 96.77% which is sufficient. The required threshold is 80.00%.

Thanks for usingCodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment@coderabbitai help to get the list of available commands and usage tips.}

jherr marked this pull request as draft

December 15, 2025 05:32

Copy link

nx-cloudbot commentedDec 15, 2025•
edited
Loading

View yourCI Pipeline Execution ↗ for commit20511e6

Command	Status	Duration	Result
`nx affected --targets=test:sherif,test:knip,tes...`	✅ Succeeded	7s	View ↗
`nx run-many --targets=build --exclude=examples/**`	✅ Succeeded	2s	View ↗

☁️Nx Cloud last updated this comment at2025-12-16 17:07:44 UTC

Copy link

pkg-pr-newbot commentedDec 15, 2025

Open in StackBlitz

@tanstack/ai

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai@149

@tanstack/ai-anthropic

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-anthropic@149

@tanstack/ai-client

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-client@149

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-devtools-core@149

@tanstack/ai-gemini

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-gemini@149

@tanstack/ai-ollama

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-ollama@149

@tanstack/ai-openai

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-openai@149

@tanstack/ai-react

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-react@149

@tanstack/ai-react-ui

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-react-ui@149

@tanstack/ai-solid

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-solid@149

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-solid-ui@149

@tanstack/ai-svelte

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-svelte@149

@tanstack/ai-vue

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-vue@149

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/TanStack/ai/@tanstack/ai-vue-ui@149

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/TanStack/ai/@tanstack/react-ai-devtools@149

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/TanStack/ai/@tanstack/solid-ai-devtools@149

commit:3752db0

coderabbitaibot reviewed

Dec 15, 2025

View reviewed changes

Copy link

coderabbitaibot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)

packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (1)
7-16:DuplicatecollectChunks helper - see earlier comment.
This is the same helper duplicated fromerror-handling.test.ts. Consider extracting to shared test utilities.
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (1)
7-16:DuplicatecollectChunks helper - see earlier comment.
Third instance of this helper. Extracting to a shared module would reduce maintenance burden.

🧹 Nitpick comments (16)

packages/typescript/ai-client/src/chat-client.ts (1)
134-169:Consider defensive error message extraction.
The async execution tracking pattern is solid. However, at line 159,error.message assumes the caught value is an Error object. If a non-Error value is thrown, this could result inundefined.
           } catch (error: any) {             await this.addToolResult({               toolCallId: args.toolCallId,               tool: args.toolName,               output: null,               state: 'output-error',-              errorText: error.message,+              errorText: error?.message ?? String(error),             })
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (3)
148-169:Consider edge case: empty content string.
The content streaming splits on spaces, but an empty string'' would result inwords = [''], yielding a single empty delta. This is unlikely to cause issues but worth noting.
// Current behavior with empty string:''.split(' ')// [''] - yields one empty content chunk
If empty content should yield no chunks, add a guard:
-    if (iteration.content) {+    if (iteration.content && iteration.content.length > 0) {
223-231:Stub implementation for structuredOutput.
This is a minimal mock that always returns empty data. This is acceptable for a test simulator, but consider documenting that it's a stub.
 async structuredOutput(   _options: StructuredOutputOptions<Record<string, unknown>>, ): Promise<StructuredOutputResult<unknown>> {-  // Simple mock implementation+  // Stub implementation - always returns empty object+  // Real structured output scenarios should use a different approach   return {     data: {},     rawText: '{}',   } }
379-420:singleServerTool andsingleClientTool are identical.
Both methods produce the exact same script structure. This appears intentional (the distinction is in how the tool is defined, not in the script), but the naming could mislead users into thinking the scripts differ.
Consider either:
Adding a comment clarifying that the script is the same, or
Consolidating into a singlesingleTool method with a comment
 /**  * Script for a single server tool call+ * Note: Script structure is identical to singleClientTool -+ * the server/client distinction is in the tool definition, not the script.  */
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (2)
103-115:Silent JSON parse failure returns empty array.
The empty catch block silently handles parse errors. Consider logging parse failures in debug mode to aid troubleshooting.
     try {       return JSON.parse(el.textContent || '[]')     } catch {+      console.warn('Failed to parse event-log-json:', el.textContent)       return []     }
215-232:Test expectations are intentionally relaxed.
The "triple client sequence" test notes a known issue with the 3rd tool. The comment explains this well. Consider adding a TODO or tracking issue reference.
-    // NOTE: This tests a complex multi-step continuation flow that requires-    // additional investigation. Currently verifying at least 2 tools complete.+    // TODO: Track issue for 3rd tool not triggering in complex continuation flows+    // Currently verifying at least 2 tools complete while this is investigated.
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (1)
76-82:Consider using typed chunk filtering.
The chunk type assertions rely on string matching. Consider using type guards for stronger typing.
// Could define a type guard for safer filtering:constisToolResultChunk=(c:StreamChunk):c isToolResultStreamChunk=>c.type==='tool_result'consttoolResultChunks=chunks.filter(isToolResultChunk)
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (2)
14-133:Helper functions are duplicated across test files.
TheselectScenario,waitForTestComplete,getMetadata,getEventLog, andgetToolCalls helpers are nearly identical to those inclient-tool.spec.ts. This creates maintenance burden.
Extract shared helpers to a common test utilities file:
// packages/typescript/smoke-tests/e2e/tests/tools/test-helpers.tsimport{typePage}from'@playwright/test'exportasyncfunctionselectScenario(page:Page,scenario:string):Promise<void>{// ... shared implementation}exportasyncfunctionwaitForTestComplete(page:Page,timeout?:number,expectedToolCount?:number,):Promise<void>{// ... shared implementation}// etc.
Then import in both test files:
import{selectScenario,waitForTestComplete,getMetadata}from'./test-helpers'
104-116:Return type includestimestamp field unlike client-tool.spec.ts.
ThegetEventLog return type here includestimestamp: number, whileclient-tool.spec.ts only hastype andtoolName. This inconsistency could cause confusion.
If extracting to shared helpers, standardize the return type to include all fields:
Promise<Array<{type:string;toolName:string;timestamp?:number}>>
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (1)
355-410:Non-null assertions ontc.approval!.id! rely on filter guarantees.
ThependingApprovals filter at line 229-231 checkstc.approval?.needsApproval but doesn't explicitly verifytc.approval.id exists. While this is likely safe in practice for a test page, consider optional chaining for defensive coding:
-                if (!respondedApprovals.current.has(tc.approval!.id!)) {-                    respondedApprovals.current.add(tc.approval!.id!)+                const approvalId = tc.approval?.id+                if (approvalId && !respondedApprovals.current.has(approvalId)) {+                    respondedApprovals.current.add(approvalId)
packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (1)
19-47:Consider reducing arbitrary timeouts inselectScenario.
The function uses multiplewaitForTimeout calls (500ms, 200ms, 300ms) that could make tests flaky or slow. While hydration waits are sometimes necessary, consider using more deterministic waits where possible:
 async function selectScenario(page: Page, scenario: string): Promise<void> {-  // Wait for hydration-  await page.waitForTimeout(500)+  // Wait for component to be interactive+  await page.waitForSelector('#scenario-select:not([disabled])', { timeout: 3000 })
However, given this is e2e test infrastructure and React hydration timing can be unpredictable, the current approach is acceptable.
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (1)
7-16:Consider extractingcollectChunks to a shared test utility.
This helper function is duplicated across multiple test files (error-handling.test.ts,sequences.test.ts,client-tool.test.ts). Consider extracting it to a shared test utilities module to reduce duplication.
// e.g., packages/typescript/smoke-tests/adapters/src/tests/test-utils.tsexportasyncfunctioncollectChunks<T>(stream:AsyncIterable<T>):Promise<Array<T>>{constchunks:Array<T>=[]forawait(constchunkofstream){chunks.push(chunk)}returnchunks}
packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts (1)
7-16:Consider extractingcollectChunks helper to a shared test utility.
This helper is duplicated across multiple test files (also inapproval.test.ts). Consider extracting it to a shared test utilities module to reduce duplication and ensure consistency.
For example, create a file likepackages/typescript/smoke-tests/adapters/src/test-utils.ts:
exportasyncfunctioncollectChunks<T>(stream:AsyncIterable<T>):Promise<Array<T>>{constchunks:Array<T>=[]forawait(constchunkofstream){chunks.push(chunk)}returnchunks}
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (1)
427-489:Abort signal propagation may be incomplete.
TheabortController is created independently but isn't linked to the incomingrequestSignal. If the client aborts the request, the chat stream won't be notified to stop.
Consider linking them:
         const abortController = new AbortController()+        // Link request abort to our controller+        requestSignal?.addEventListener('abort', () => {+          abortController.abort()+        })+         try {
Alternatively, passrequestSignal directly iftoStreamResponse andchat support it.
packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts (2)
15-46:Consider using more robust waiting strategies instead of fixed timeouts.
TheselectScenario function uses multiplewaitForTimeout calls (500ms, 200ms, 300ms) which can be flaky in CI environments. While the retry loop andwaitForFunction help mitigate this, the hardcoded delays may cause unnecessary slowness or intermittent failures.
Consider replacing fixed timeouts with condition-based waits where possible:
 async function selectScenario(page: Page, scenario: string): Promise<void> {-  // Wait for hydration-  await page.waitForTimeout(500)+  // Wait for hydration - wait for select to be interactive+  await page.waitForSelector('#scenario-select:not([disabled])', { state: 'visible' })    // Try selecting multiple times   for (let attempt = 0; attempt < 3; attempt++) {     await page.focus('#scenario-select')     await page.selectOption('#scenario-select', scenario)-    await page.waitForTimeout(200)      const currentScenario = await page.evaluate(() =>       document.getElementById('test-metadata')?.getAttribute('data-scenario'),     )     if (currentScenario === scenario) break+    await page.waitForTimeout(100) // Small delay only on retry   }
180-218:Potential race condition in sequential approvals test.
The test waits for the approval section and buttons but the condition on lines 196-201 might pass when the first approval button is still visible after clicking. Consider explicitly waiting for the first approval to be processed before checking for the second.
     // Approve the first one     await page.click('.approve-button')+    // Wait for first approval to be processed (button should disappear)+    await page.waitForFunction(+      () => document.querySelectorAll('.approve-button').length === 0,+      { timeout: 5000 },+    ).catch(() => {}); // May not disappear if second comes quickly+     // Wait for second approval to appear     await page.waitForFunction(       () => {         const section = document.getElementById('approval-section')         const buttons = document.querySelectorAll('.approve-button')-        // Either new approval appeared or section is visible-        return section !== null && buttons.length > 0+        // New approval should have appeared+        return section !== null && buttons.length >= 1       },       { timeout: 10000 },     )

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and betweene011bf8 and3752db0.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by!**/pnpm-lock.yaml

📒 Files selected for processing (20)

packages/typescript/ai-client/src/chat-client.ts (8 hunks)
packages/typescript/ai/src/core/chat.ts (2 hunks)
packages/typescript/ai/src/message-converters.ts (1 hunks)
packages/typescript/ai/src/types.ts (1 hunks)
packages/typescript/smoke-tests/adapters/package.json (2 hunks)
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts (1 hunks)
packages/typescript/smoke-tests/adapters/vitest.config.ts (1 hunks)
packages/typescript/smoke-tests/e2e/package.json (1 hunks)
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (1 hunks)
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (1 hunks)
packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts (1 hunks)
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (1 hunks)
packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (1 hunks)
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (1 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Use tree-shakeable adapter architecture for provider implementations - export specialized adapters (text, embedding, summarize, image) as separate imports from/adapters subpath rather than monolithic adapters
Use Zod for runtime schema validation and type inference, particularly for tool input/output definitions withtoolDefinition() and Zod schema inference
Implement isomorphic tool system usingtoolDefinition() with.server() and.client() implementations for dual-environment execution
Use type-safe per-model configuration with provider options typed based on selected model to ensure compile-time safety
Implement stream processing with StreamProcessor for handling chunked responses and support partial JSON parsing for streaming AI responses

Files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts
packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/ai/src/message-converters.ts
packages/typescript/ai/src/core/chat.ts
packages/typescript/ai/src/types.ts
packages/typescript/ai-client/src/chat-client.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

**/*.test.ts

📄 CodeRabbit inference engine (CLAUDE.md)

Write unit tests using Vitest alongside source files with.test.ts naming convention

Files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Use camelCase for function and variable names throughout the codebase

Files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts
packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/ai/src/message-converters.ts
packages/typescript/ai/src/core/chat.ts
packages/typescript/ai/src/types.ts
packages/typescript/ai-client/src/chat-client.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

🧠 Learnings (10)

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CRRepo: TanStack/ai PR: 0File: CLAUDE.md:0-0Timestamp: 2025-12-13T17:09:09.783ZLearning: Applies to **/*.{ts,tsx} : Implement isomorphic tool system using `toolDefinition()` with `.server()` and `.client()` implementations for dual-environment execution

Applied to files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/ai-client/src/chat-client.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CRRepo: TanStack/ai PR: 0File: CLAUDE.md:0-0Timestamp: 2025-12-13T17:09:09.783ZLearning: Applies to **/*.{ts,tsx} : Use Zod for runtime schema validation and type inference, particularly for tool input/output definitions with `toolDefinition()` and Zod schema inference

Applied to files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CRRepo: TanStack/ai PR: 0File: CLAUDE.md:0-0Timestamp: 2025-12-13T17:09:09.783ZLearning: Applies to packages/typescript/*/src/adapters/*.ts : Create individual adapter implementations for each provider capability (text, embed, summarize, image) with separate exports to enable tree-shaking

Applied to files:

packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts
packages/typescript/smoke-tests/e2e/package.json
packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CRRepo: TanStack/ai PR: 0File: CLAUDE.md:0-0Timestamp: 2025-12-13T17:09:09.783ZLearning: Applies to packages/typescript/*/src/index.ts : Export tree-shakeable adapters with clear subpath exports in package.json (e.g., `tanstack/ai/adapters`, `tanstack/ai-openai/adapters`) to minimize bundle size

Applied to files:

packages/typescript/smoke-tests/e2e/package.json
packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CRRepo: TanStack/ai PR: 0File: CLAUDE.md:0-0Timestamp: 2025-12-13T17:09:09.783ZLearning: Applies to packages/*/package.json : Use `workspace:*` protocol for internal package dependencies in package.json (e.g., `"tanstack/ai": "workspace:*"`)

Applied to files:

packages/typescript/smoke-tests/e2e/package.json

📚 Learning: 2025-12-13T17:09:09.784Z

Learnt from: CRRepo: TanStack/ai PR: 0File: CLAUDE.md:0-0Timestamp: 2025-12-13T17:09:09.784ZLearning: Use Nx workspace with affected commands to optimize testing and building only changed packages and their dependents

Applied to files:

packages/typescript/smoke-tests/e2e/package.json

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CRRepo: TanStack/ai PR: 0File: CLAUDE.md:0-0Timestamp: 2025-12-13T17:09:09.783ZLearning: Applies to **/*.test.ts : Write unit tests using Vitest alongside source files with `.test.ts` naming convention

Applied to files:

packages/typescript/smoke-tests/e2e/package.json
packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CRRepo: TanStack/ai PR: 0File: CLAUDE.md:0-0Timestamp: 2025-12-13T17:09:09.783ZLearning: Applies to **/*.{ts,tsx} : Use tree-shakeable adapter architecture for provider implementations - export specialized adapters (text, embedding, summarize, image) as separate imports from `/adapters` subpath rather than monolithic adapters

Applied to files:

packages/typescript/smoke-tests/adapters/vitest.config.ts
packages/typescript/smoke-tests/adapters/package.json
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CRRepo: TanStack/ai PR: 0File: CLAUDE.md:0-0Timestamp: 2025-12-13T17:09:09.783ZLearning: Applies to .eslintrc* : Use ESLint with custom TanStack config for linting all TypeScript and JavaScript files

Applied to files:

packages/typescript/smoke-tests/adapters/vitest.config.ts

📚 Learning: 2025-12-13T17:09:09.783Z

Learnt from: CRRepo: TanStack/ai PR: 0File: CLAUDE.md:0-0Timestamp: 2025-12-13T17:09:09.783ZLearning: Applies to **/*.{ts,tsx} : Implement stream processing with StreamProcessor for handling chunked responses and support partial JSON parsing for streaming AI responses

Applied to files:

packages/typescript/ai-client/src/chat-client.ts

🧬 Code graph analysis (10)

packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (2)

packages/typescript/ai-client/src/chat-client.ts (3)
sendMessage (270-280)
stop (402-409)
addToolApprovalResponse (456-493)
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (1)
Route (424-490)

packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (2)

packages/typescript/ai-svelte/src/create-chat.svelte.ts (1)
isLoading (119-121)
packages/typescript/ai/src/tools/tool-calls.ts (1)
getToolCalls (111-115)

packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (2)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (3)
SimulatorScripts (379-500)
createLLMSimulator (366-370)
SimulatorScript (35-40)
packages/typescript/ai/src/core/chat.ts (2)
chat (80-107)
chat (750-780)

packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (2)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (2)
SimulatorScript (35-40)
createLLMSimulator (366-370)
packages/typescript/ai/src/core/chat.ts (2)
chat (80-107)
chat (750-780)

packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (1)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (4)
SimulatorScripts (379-500)
createLLMSimulator (366-370)
SimulatorScript (35-40)
approvalTool (425-441)

packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (1)

packages/typescript/ai/src/tools/tool-calls.ts (1)
getToolCalls (111-115)

packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (1)

packages/typescript/ai-svelte/src/create-chat.svelte.ts (1)
isLoading (119-121)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (1)

packages/typescript/ai/src/types.ts (2)
ChatOptions (553-587)
StreamChunk (677-685)

packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (2)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (2)
SimulatorScript (35-40)
createLLMSimulator (366-370)
packages/typescript/ai/src/core/chat.ts (2)
chat (80-107)
chat (750-780)

packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (2)

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (3)
SimulatorScripts (379-500)
createLLMSimulator (366-370)
SimulatorScript (35-40)
packages/typescript/ai/src/core/chat.ts (2)
chat (80-107)
chat (750-780)

🪛 ast-grep (0.40.0)

packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx

[warning] 591-591: Usage of dangerouslySetInnerHTML detected. This bypasses React's built-in XSS protection. Always sanitize HTML content using libraries like DOMPurify before injecting it into the DOM to prevent XSS attacks.
Context: dangerouslySetInnerHTML
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
-https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml
-https://cwe.mitre.org/data/definitions/79.html

(react-unsafe-html-injection)

[warning] 598-598: Usage of dangerouslySetInnerHTML detected. This bypasses React's built-in XSS protection. Always sanitize HTML content using libraries like DOMPurify before injecting it into the DOM to prevent XSS attacks.
Context: dangerouslySetInnerHTML
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation [REFERENCES]
-https://reactjs.org/docs/dom-elements.html#dangerouslysetinnerhtml
-https://cwe.mitre.org/data/definitions/79.html

(react-unsafe-html-injection)

🪛 Biome (2.1.2)

packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx

[error] 592-592: Avoid passing content using the dangerouslySetInnerHTML prop.

Setting content using code can expose users to cross-site scripting (XSS) attacks

(lint/security/noDangerouslySetInnerHtml)

[error] 599-600: Avoid passing content using the dangerouslySetInnerHTML prop.

Setting content using code can expose users to cross-site scripting (XSS) attacks

(lint/security/noDangerouslySetInnerHtml)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Preview
GitHub Check: Test

🔇 Additional comments (70)

packages/typescript/smoke-tests/e2e/package.json (3)
25-25:Correct use of workspace protocol for internal package.
The@tanstack/tests-adapters dependency correctly uses theworkspace:* protocol for internal monorepo dependencies, aligning with established practices.
29-29:Verify vite-tsconfig-paths configuration.
Thevite-tsconfig-paths dependency is newly added. Confirm that it's configured in the Vite config file (vite.config.ts) and is actually needed for the e2e tests, particularly if TypeScript path aliases are used in test files.
25-25:@tanstack/tests-adapters is properly configured and zod versions are consistent across core packages.
The dependency is correctly defined in the workspace and usesworkspace:* as expected. Zod versions are aligned at^4.1.13 across main packages including this smoke test, tests-adapters, and other core AI packages. Minor version variations exist in test-specific packages (ai-anthropic-live-tests uses^3.24.1) and the root ai package uses a range (^3.0.0 || ^4.0.0), which is acceptable for those contexts.
packages/typescript/ai/src/types.ts (1)
69-73:LGTM! Clean extension of ToolCall for approval data.
The optionalapproval field provides a backward-compatible way to carry approval decisions back to the server. The simpler structure (compared toToolCallPart.approval) appropriately contains only what's needed in the response payload.
packages/typescript/smoke-tests/adapters/vitest.config.ts (1)
1-26:LGTM! Well-structured Vitest configuration for the smoke-tests package.
The path alias, coverage settings, and exclusion patterns are appropriate. Based on learnings, this follows the project's convention for test file naming (.test.ts).
packages/typescript/ai/src/message-converters.ts (1)
110-117:LGTM! Correctly propagates approval data for responded tool calls.
The conditional spread ensures approval metadata is only included when the tool call has been explicitly approved or denied (approval-responded state with a definedapproved value). This aligns with the newToolCall.approval type.
packages/typescript/ai/src/core/chat.ts (1)
517-546:LGTM! Enhanced approval collection supports both message formats.
The dual-format handling (UIMessage withparts array and ModelMessage withtoolCalls array) provides necessary flexibility. Note that both code paths can potentially add to the sameapprovals Map for the same message if it happens to have both structures, but this is harmless since the approval decision would be the same value.
packages/typescript/ai-client/src/chat-client.ts (5)
29-33:LGTM! Appropriate state variables for async tool execution tracking.
ThependingToolExecutions Map keyed bytoolCallId enables proper lifecycle management, and thecontinuationPending flag prevents duplicate continuation attempts during action draining.
238-242:LGTM! Proper synchronization before stream finalization.
Awaiting all pending tool executions ensures client tools complete before the stream is finalized and continuation checks occur. The outer try/catch inprocessStream will handle any rejections.
310-373:LGTM! Well-structured stream lifecycle with proper guards and continuation logic.
The concurrent stream guard, state reset, and success flag pattern correctly handle the stream lifecycle. Thefinally block order is intentional:setIsLoading(false) beforedrainPostStreamActions allows queued actions to trigger continuation without being blocked by the loading check.
444-450:LGTM! Proper deferral of continuation when stream is in progress.
Queueing the continuation check whenisLoading is true ensures we don't attempt concurrent streams, and the action will be executed after the current stream completes.
498-529:LGTM! Robust action queue and continuation helpers.
The sequential draining and dual-guard (continuationPending andisLoading) incheckForContinuation correctly prevent concurrent continuation attempts. Thefinally block ensures the flag is always reset.
packages/typescript/smoke-tests/adapters/src/llm-simulator.ts (5)
1-40:Well-structured type definitions for the simulator.
The interfaces are clear and well-documented. TheSimulatorIteration interface properly captures the possible LLM turn outcomes (content, tool calls, finish reasons).
67-94:Clean adapter class design following the established pattern.
The adapter correctly implements the expected structure withkind,name, andmodels properties. Thereset() method provides a way to reuse the adapter instance across tests.
95-130:Complex iteration detection logic with good documentation.
The stateless operation logic for E2E tests is well thought out, enabling both stateful (unit tests) and stateless (E2E with fresh adapter instances) operation modes. The comments clearly explain the two modes.
172-194:Tool call streaming looks correct.
Each tool call gets a unique ID (either from script or auto-generated), and the chunk structure matches the expectedToolCallStreamChunk format with properindex values for parallel tools.
363-370:Factory function is clean and straightforward.
ThecreateLLMSimulator factory provides a clean API for creating adapter instances.
packages/typescript/smoke-tests/e2e/tests/tools/client-tool.spec.ts (3)
16-44:Robust scenario selection with retry logic.
TheselectScenario helper handles React hydration issues well with retries and verification. The timeouts are reasonable for test stability.
50-77:Flexible completion detection with dual conditions.
ThewaitForTestComplete helper smartly checks both thetestComplete flag and tool counts, which provides resilience against race conditions where the flag might not be set.
253-271:Good debugging support on test failure.
Capturing screenshots and logging events/metadata on failure will significantly help with debugging flaky E2E tests.
packages/typescript/smoke-tests/adapters/src/tests/tools/multi-tool.test.ts (5)
1-16:Proper imports and reusable helper.
ThecollectChunks helper is a clean utility for consuming async iterables in tests. Follows Vitest conventions as per coding guidelines.
19-83:Comprehensive parallel tool execution test.
The test properly verifies that all three server tools execute, and validates both tool call and result chunk counts. Good use ofvi.fn() for mocking.
85-154:Good coverage of mixed tool types.
This test exercises server, approval, and client tools simultaneously, verifying the correct behavior for each type:
Server tool executes immediately
Approval tool waits for approval
Client tool emitstool-input-available
Well-structured test usingtoolDefinition() with Zod schemas as per coding guidelines.
157-213:Solid test for repeated tool invocations with different arguments.
The test correctly verifies that the same tool can be called multiple times with different arguments, and that each call receives the correct city-specific data.
215-267:Clean tool selection test.
Verifies that only the requested tool executes when multiple tools are available. Good negative assertions for non-called tools.
packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts (2)
197-209:Good sequence verification using timestamps.
The test properly verifies the order of tool executions by comparing timestamps. The conditional check handles cases where timestamps might not be present.
263-284:Text-only test uses different completion strategy.
This test waits forisLoading to become false rather than usingwaitForTestComplete. This is appropriate since there are no tools to count, but the additionalwaitForTimeout(500) seems like a workaround.
packages/typescript/smoke-tests/adapters/package.json (3)
17-18:Test scripts added correctly.
Thetest andtest:watch scripts follow standard Vitest patterns.
34-34: No issues found. The vitest version^4.0.14 is consistent across the entire monorepo, including the root package.json and all other packages. The version aligns properly with the project's testing setup.
9-14:Exports field correctly exposes the simulator for cross-package imports using .ts files.
This is appropriate for a private test package. The TypeScript configuration supports direct.ts imports viaallowImportingTsExtensions, and the e2e test package successfully imports from this export. No built output is needed since the package is private and used internally during development.
packages/typescript/smoke-tests/e2e/src/routes/tools-test.tsx (5)
1-6:LGTM!
Imports are appropriate for the test page functionality. Uses the recommended patterns:toolDefinition from@tanstack/ai, Zod for schema validation, and React hooks as expected. Based on learnings, this correctly implements the isomorphic tool system with client implementations.
27-97:LGTM!
ThecreateTrackedTools function correctly implements the isomorphic tool system usingtoolDefinition() with.client() implementations. Input and output schemas use Zod as per coding guidelines. The event tracking pattern is well-suited for e2e test observability.
156-157:LGTM!
TheuseRef pattern ensures tools are created once with the stableaddEvent callback. SinceaddEvent is memoized with an empty dependency array, this approach correctly avoids unnecessary re-creation of tools on re-renders.
172-198:LGTM!
The completion detection logic appropriately handles multiple completion states (complete,output-available, or having output/result). The dependency array is correct, and the early return guard at line 173 prevents premature completion detection.
589-609:Static analysis warning is a false positive in this context.
ThedangerouslySetInnerHTML usage here serializes application-controlled state (toolEvents,toolCalls) that doesn't contain unsanitized user input. The JSON data originates from internal tool execution tracking and is safe. This pattern of embedding JSON in<script type="application/json"> tags for test harnesses is appropriate for e2e test infrastructure.
packages/typescript/smoke-tests/e2e/tests/tools/race-conditions.spec.ts (4)
53-80:LGTM!
The completion detection logic is well-designed with two conditions (explicit completion flag OR sufficient completed tools without loading). The trailing 200ms buffer for React state propagation is a pragmatic approach for e2e tests.
143-180:LGTM!
This test effectively validates the specific bug where sequential client tool calls could block each other. The execution order verification via event log timestamps and the duration upper bound (< 10s) are good regression guards against blocking behavior.
246-276:LGTM!
Smart handling of the parallel approvals scenario - capturing button IDs before clicking prevents issues with React re-renders removing buttons from the DOM. The brief wait between clicks is appropriate for test stability.
399-414:LGTM!
The failure hook provides excellent debugging support with screenshots and state dumps. This will be valuable for investigating flaky tests or race condition failures.
packages/typescript/smoke-tests/adapters/src/tests/tools/error-handling.test.ts (5)
19-103:LGTM!
Comprehensive error handling tests covering both synchronous throws and async rejections. The assertions correctly verify that error details propagate totool_result chunks with the expected error messages.
106-145:LGTM!
Good coverage of the edge case where the LLM calls a tool that isn't registered. The test verifies graceful error handling rather than crashes.
147-183:LGTM!
Correctly tests the scenario where a tool definition has no execute function, verifying thattool-input-available is emitted for client-side handling. This aligns with the isomorphic tool system pattern.
224-268:LGTM!
Effective test for themaxIterations strategy. Verifies that the agent loop terminates at the specified limit even when the script would continue indefinitely, and confirms proper completion signaling viadone chunks.
270-349:LGTM!
Good coverage of the auto-stringification behavior for non-string tool returns. Tests verify that objects are properly JSON-serialized and primitives like numbers are stringified correctly.
packages/typescript/smoke-tests/adapters/src/tests/tools/sequences.test.ts (5)
19-133:LGTM!
Thorough testing of server tool sequencing with proper call order verification usingmock.invocationCallOrder. The assertions correctly verify both execution count and order.
135-194:LGTM!
Excellent test of the server→client tool sequence pattern. Correctly verifies that server tools producetool_result chunks while client tools (without execute) producetool-input-available chunks for client-side handling.
196-276:LGTM!
Well-designed test for client→server continuation flow. The message structure correctly simulates a scenario where a client tool result has been injected, allowing verification that the server tool executes as expected in the continuation.
278-347:LGTM!
Clean implementation of the three-tool sequence test using a shared array to track execution order. ThetoEqual(['A', 'B', 'C']) assertion provides strict order verification.
349-418:LGTM!
Effective test for the parallel→sequential tool pattern. TheindexOf comparisons correctly verify that both parallel tools (A, B) complete before the sequential tool (combine) executes.
packages/typescript/smoke-tests/adapters/src/tests/tools/client-tool.test.ts (4)
19-109:LGTM!
Comprehensive testing of client tools without execute functions. ThegetCurrentIteration() check is a clever way to verify the stream paused waiting for client input rather than advancing to the next iteration.
111-154:LGTM!
Good test demonstrating that client tools with execute functions behave similarly to server tools - they auto-execute and producetool_result chunks. The output schema with Zod provides type safety for the returned data.
156-223:LGTM!
Well-structured test for the message injection pattern. The message structure withparts containing the complete tool call correctly simulates a client that has already provided a tool result, allowing verification of the continuation flow.
225-278:LGTM!
Effective test of mixed client tool behavior where one tool has an execute function and one doesn't. The assertions correctly verify that only the tool without execute emitstool-input-available.
packages/typescript/smoke-tests/adapters/src/tests/tools/approval.test.ts (6)
1-16:LGTM! Well-structured test file with proper imports and helper.
The imports are appropriate for the test suite, and the genericcollectChunks helper is cleanly implemented for collecting async iterable chunks.
18-111:LGTM! Comprehensive approval-requested test coverage.
The tests properly verify:
Approval-requested chunks are emitted for tools withneedsApproval: true
Tool execution is deferred until approval is granted
Iteration stops when approval is pending
Theas any casts on chunk types are acceptable for test assertions.
113-187:LGTM! Good test for approval acceptance flow.
The test correctly verifies that when an approval is pre-granted in the message history (viaparts withapproved: true), the tool executes and produces atool_result chunk.
189-259:LGTM! Denial flow properly tested.
The test correctly verifies that when approval is denied (approved: false), the tool is not executed and the LLM produces a content response instead.
261-316:LGTM! Multiple approval tools test is well-designed.
The test verifies that when multiple tools requiring approval are called in the same iteration, each emits anapproval-requested chunk and neither executes until approved.
318-380:LGTM! Critical edge case for mixed approval scenarios.
This test is particularly valuable as it verifies that non-approval tools execute immediately while approval-requiring tools wait, even when both are called in the same iteration.
packages/typescript/smoke-tests/adapters/src/tests/tools/server-tool.test.ts (4)
18-143:LGTM! Comprehensive server tool execution tests.
Good coverage of:
Basic tool execution with simple arguments
Complex nested argument schemas with Zod validation
Proper tool call and result chunk generation
The tests follow the coding guidelines by using Zod for runtime schema validation.
145-227:LGTM! Good coverage for different tool return types.
Tests properly verify:
Tools returning JSON objects withoutputSchema
Tools returning plain objects (framework handles stringification)
Result content contains expected values
229-299:LGTM! Important ID tracking verification.
These tests ensure proper correlation betweentool_call andtool_result chunks via IDs, which is critical for the agent loop to function correctly. The auto-generated ID format test (/^call-\d+-\d+$/) validates the fallback behavior.
301-343:LGTM! Good test for combined content and tool call handling.
This test verifies that content and tool calls can coexist in the same iteration, which is important for realistic LLM response patterns.
packages/typescript/smoke-tests/e2e/src/routes/api.tools-test.ts (3)
14-339:LGTM! Well-structured test scenario definitions.
The scenarios provide comprehensive coverage for:
Basic flows (text-only, single server/client tools)
Approval workflows
Sequential and parallel tool executions
Mixed server/client sequences
Race condition scenarios
The scenario structure aligns well with the LLM simulator'sSimulatorScript interface.
341-422:LGTM! Tool definitions follow best practices.
The tool definitions properly:
Use Zod for runtime schema validation as per coding guidelines
Implement the isomorphic tool system with.server() and.client() implementations
IncludeneedsApproval: true for thedelete_file tool to test approval flows
492-553:LGTM! Clean scenario-to-tools mapping.
The switch statement clearly maps each scenario to its required tools. The default case returning an empty array is consistent with thetext-only scenario behavior.
packages/typescript/smoke-tests/e2e/tests/tools/approval-flow.spec.ts (4)
48-125:LGTM! Well-designed test helper functions.
The helpers provide good abstraction for:
waitForApproval: Simple selector-based wait
waitForTestComplete: Multi-condition wait with good fallback logic
getMetadata/getEventLog: Clean data extraction from DOM
The 200ms buffer inwaitForTestComplete (line 85) is a pragmatic approach for React state settlement.
127-178:LGTM! Single approval flow tests are well-structured.
Both approve and deny cases properly:
Select the scenario
Wait for approval UI
Perform the action
Verify metadata counts
The 500ms wait on line 172 for denial processing is acceptable as denial may not settestComplete immediately.
220-314:LGTM! Parallel approvals and mixed flow tests are robust.
The parallel approvals test (lines 220-253) correctly:
Captures button IDs before clicking to handle DOM re-renders
Verifies both approvals are granted
The mixed flow tests properly verify event log entries for both approval and client tool execution.
316-332:LGTM! Good debugging support for test failures.
TheafterEach hook provides valuable debugging information:
Full-page screenshots with descriptive filenames
Console output of metadata and events
This will be helpful for diagnosing flaky tests in CI.

packages/typescript/smoke-tests/adapters/src/llm-simulator.ts

Comment on lines +273 to +360

		private determineIterationFromMessages(
		messages: Array<{
		role: string
		toolCalls?: Array<{
		id: string
		approval?: { id: string; approved: boolean }
		}>
		toolCallId?: string
		parts?: Array<{
		type: string
		id?: string
		output?: any
		approval?: { approved?: boolean }
		}>
		}>,
		): number \| null {
		if (!messages \|\| messages.length === 0) {
		return 0 // Fresh conversation, start at iteration 0
		}

		// Find all assistant messages with tool calls
		const assistantToolCallMessages = messages.filter(
		(m) => m.role === 'assistant' && m.toolCalls && m.toolCalls.length > 0,
		)

		if (assistantToolCallMessages.length === 0) {
		// No tool calls in history, might be first iteration or continuation
		// Check if there's a user message (fresh start)
		const hasUserMessage = messages.some((m) => m.role === 'user')
		return hasUserMessage ? 0 : null
		}

		// Get all completed tool call IDs from:
		// 1. Separate tool result messages (role: 'tool')
		// 2. Parts array with output set (client tool results) - UIMessage format
		// 3. Parts array with approval.approved set (approval responses) - UIMessage format
		// 4. toolCalls array with approval.approved set (approval responses) - ModelMessage format
		const completedToolIds = new Set<string>()

		for (const msg of messages) {
		// Check for role: 'tool' messages (server tool results)
		if (msg.role === 'tool' && msg.toolCallId) {
		completedToolIds.add(msg.toolCallId)
		}

		// Check for UIMessage format: parts with output or approval responses
		if (msg.parts) {
		for (const part of msg.parts) {
		if (part.type === 'tool-call' && part.id) {
		// Client tool results have output set
		if (part.output !== undefined) {
		completedToolIds.add(part.id)
		}
		// Approval tools are complete when approval.approved is set (true or false)
		if (part.approval?.approved !== undefined) {
		completedToolIds.add(part.id)
		}
		}
		}
		}

		// Check for ModelMessage format: toolCalls with approval info
		if (msg.role === 'assistant' && msg.toolCalls) {
		for (const tc of msg.toolCalls) {
		// Approval tools are complete when approval.approved is set
		if (tc.approval?.approved !== undefined) {
		completedToolIds.add(tc.id)
		}
		}
		}
		}

		// Count how many complete tool call rounds we have
		let completedRounds = 0
		for (const assistantMsg of assistantToolCallMessages) {
		const toolCalls = assistantMsg.toolCalls as Array<{ id: string }>
		const allToolsComplete = toolCalls.every((tc) =>
		completedToolIds.has(tc.id),
		)
		if (allToolsComplete) {
		completedRounds++
		}
		}

		// The next iteration is completedRounds (0-indexed)
		// e.g., if we've completed 1 round, we're on iteration 1
		return completedRounds
		}

Copy link

coderabbitaibotDec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

⚠️ Potential issue |🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash# Find ModelMessage and UIMessage type definitionsecho"=== Searching for ModelMessage type definition ==="rg -n"type ModelMessage|interface ModelMessage" --type=ts -A 20| head -100echo -e"\n=== Searching for UIMessage type definition ==="rg -n"type UIMessage|interface UIMessage" --type=ts -A 20| head -100echo -e"\n=== Searching for toolCalls in message types ==="rg -n"toolCalls" --type=ts -B 2 -A 2| head -80

Repository: TanStack/ai

Length of output: 13022

🏁 Script executed:

#!/bin/bash# Find MessagePart type definition to understand tool-call partsecho"=== Searching for MessagePart type definition ==="rg -n"type MessagePart|interface MessagePart" --type=ts -A 30| head -120# Check for ToolCall definitionecho -e"\n=== Searching for ToolCall type definition ==="rg -n"type ToolCall|interface ToolCall" --type=ts -A 15| head -80# Check the actual llm-simulator.ts file to see imports and contextecho -e"\n=== Checking llm-simulator.ts imports and context ==="head -50 packages/typescript/smoke-tests/adapters/src/llm-simulator.ts

Repository: TanStack/ai

Length of output: 13699

🏁 Script executed:

#!/bin/bash# Get complete ToolCallPart and ToolResultPart definitionsecho"=== Complete ToolCallPart definition ==="rg -n"export interface ToolCallPart" --type=ts -A 20| head -50echo -e"\n=== Complete ToolResultPart definition ==="rg -n"export interface ToolResultPart" --type=ts -A 10| head -40# Check if there's a different message type being used in llm-simulatorecho -e"\n=== Search for message types in llm-simulator context ==="grep -n"messages\|Message" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts| head -40

Repository: TanStack/ai

Length of output: 5228

🏁 Script executed:

#!/bin/bash# Check how determineIterationFromMessages is called and what messages are passedecho"=== Checking method calls and context ==="grep -B 5 -A 5"determineIterationFromMessages" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts| head -80# Check the options type to see what message format is expectedecho -e"\n=== Checking options and message types ==="grep -n"options\." packages/typescript/smoke-tests/adapters/src/llm-simulator.ts| grep -E"messages|options"| head -20

Repository: TanStack/ai

Length of output: 1246

🏁 Script executed:

#!/bin/bash# Find the ChatOptions interface to see what message types are expectedecho"=== ChatOptions in llm-simulator ==="rg -n"ChatOptions" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts -B 2 -A 2# Find the actual ChatOptions type definitionecho -e"\n=== ChatOptions type definition ==="rg -n"interface ChatOptions|type ChatOptions" --type=ts -A 30| head -100# Check what the adapter signature expectsecho -e"\n=== Check adapter method signature ==="grep -n"adapter.*(" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts| head -20</parameter>

Repository: TanStack/ai

Length of output: 3072

🏁 Script executed:

#!/bin/bash# Check if there's any documentation or usage showing what message types are actually passedecho"=== Checking how messages are used in llm-simulator ==="sed -n'273,360p' packages/typescript/smoke-tests/adapters/src/llm-simulator.ts# Also check if there are tests that show actual usageecho -e"\n=== Check test usage ==="find packages/typescript/smoke-tests -name"*.test.*" -o -name"*.spec.*"| head -5</parameter>

Repository: TanStack/ai

Length of output: 3705

🏁 Script executed:

#!/bin/bash# Check for all calls to determineIterationFromMessagesecho"=== All calls to determineIterationFromMessages ==="rg -n"determineIterationFromMessages" packages/typescript/smoke-tests/adapters/src/llm-simulator.ts# Check if messages could come from different sources with different typesecho -e"\n=== Check other adapters or similar patterns ==="find packages/typescript/smoke-tests/adapters/src -name"*.ts" -type f| xargs grep -l"messages"| head -5# Verify the actual usage contextecho -e"\n=== Check full context around line 108 ==="sed -n'100,130p' packages/typescript/smoke-tests/adapters/src/llm-simulator.ts

Repository: TanStack/ai

Length of output: 1938

🏁 Script executed:

#!/bin/bash# Get the complete ToolCall type definition from types.tsecho"=== ToolCall type in ModelMessage ==="sed -n'62,74p' packages/typescript/ai/src/types.ts# Compare with what the method expectsecho -e"\n=== What the method expects vs actual ToolCall ==="echo"Method expects: toolCalls with approval?.approved"echo"Actual ToolCall has:"rg -n"export interface ToolCall" --type=ts -A 12| grep -A 12"packages/typescript/ai/src"

Repository: TanStack/ai

Length of output: 3808

🏁 Script executed:

#!/bin/bash# Check for camelCase compliance in the methodecho"=== Checking variable/function names in determineIterationFromMessages ==="sed -n'273,360p' packages/typescript/smoke-tests/adapters/src/llm-simulator.ts| grep -E"^\s*(const|let|var|for|function)"| head -30

Repository: TanStack/ai

Length of output: 631

Remove unreachable code that checks for UIMessage format.

The method only receivesArray<ModelMessage> perChatOptions, which doesn't have aparts property. Lines 318-330 that checkmsg.parts for UIMessage format are dead code—ModelMessage hastoolCalls (notparts), andUIMessage hasparts (but is never passed here). The logic for handlingModelMessage format withtoolCalls andapproval.approved is correct, but the defensive UIMessage checks should be removed to avoid confusion about supported message types.

🤖 Prompt for AI Agents

In packages/typescript/smoke-tests/adapters/src/llm-simulator.ts around lines273-360 (specifically remove the unreachable UIMessage handling at ~lines318-330), remove the block that checks msg.parts and iterates parts for type'tool-call' (the UIMessage format) because this method only receivesModelMessage instances per ChatOptions; keep the remaining logic that handlesrole === 'tool' and assistant msg.toolCalls approvals so completedToolIds isstill populated, and delete any now-unused variables or comments tied only tothe removed block.

packages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.ts

Comment on lines +136 to +139

		test.beforeEach(async ({ page }) => {
		await page.goto('/tools-test')
		await page.waitForSelector('#run-test-button')
		})

Copy link

coderabbitaibotDec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

⚠️ Potential issue |🟡 Minor

Missing:not([disabled]) check in beforeEach.

Unlikeclient-tool.spec.ts which waits for#run-test-button:not([disabled]), this file only waits for#run-test-button. This could cause test flakiness if the button is initially disabled.

 test.beforeEach(async ({ page }) => {   await page.goto('/tools-test')-  await page.waitForSelector('#run-test-button')+  await page.waitForSelector('#run-test-button:not([disabled])', {+    timeout: 10000,+  }) })

📝 Committable suggestion

‼️IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

	test.beforeEach(async({ page})=>{
	awaitpage.goto('/tools-test')
	awaitpage.waitForSelector('#run-test-button')
	})
	test.beforeEach(async({ page})=>{
	awaitpage.goto('/tools-test')
	awaitpage.waitForSelector('#run-test-button:not([disabled])',{
	timeout:10000,
	})
	})

🤖 Prompt for AI Agents

Inpackages/typescript/smoke-tests/e2e/tests/tools/server-client-sequence.spec.tsaround lines 136-139, the beforeEach uses awaitpage.waitForSelector('#run-test-button') which can pass while the button isstill disabled; change the wait to target the enabled button (e.g. awaitpage.waitForSelector('#run-test-button:not([disabled])') or otherwise wait untilthe element is not disabled) so the test only proceeds when the run button isinteractive.

AlemTuzlak approved these changes

Dec 15, 2025

View reviewed changes

tool test router

20511e6

Labels

None yet

Movatterモバイル変換

Uh oh!

[WIP]: Tool hardening fixes#149

Are you sure you want to change the base?

[WIP]: Tool hardening fixes#149

Uh oh!

Conversation

jherr commentedDec 15, 2025• edited by coderabbitaibotLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

🎯 Changes

✅ Checklist

🚀 Release Impact

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitaibot commentedDec 15, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

nx-cloudbot commentedDec 15, 2025• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

pkg-pr-newbot commentedDec 15, 2025

Uh oh!

coderabbitaibot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitaibotDec 15, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitaibotDec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jherr commentedDec 15, 2025•
edited by coderabbitaibot
Loading

coderabbitaibot commentedDec 15, 2025•
edited
Loading

nx-cloudbot commentedDec 15, 2025•
edited
Loading