Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[wip] Explore vibe-coding of spec → test scenarios → sdk compliance tests → sdk updates#948

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
ochafik wants to merge59 commits intomain
base:main
Choose a base branch
Loading
fromochafik/mcp-compliance-test-harness

Conversation

ochafik
Copy link

@ochafikochafik commentedJul 11, 2025
edited
Loading

Language-agnostic compliance test framework that aims to help keep SDKs up to date w/ the spec.

TL;DR: thisWIP experiment turns specs totest server & scenario descriptions,vibe-coded for each SDK asclient &server testing binaries, which transport chatter is recorded ashuman-reviewable goldens that can be compared across SDKs to keep them up to date w/ the spec and w/ each other.Or at least, that's the vision 🙈

Note: this isn't a generalistic compliance test in the waymcp-validator validates arbitrary servers. The tests are more constrained, use predefined test servers and test scenarios (both expressed in plain english) to ensure all the SDKs behave the same (including in terms of producing the same client/server transport-level chatter).

How / what to review? There'sa lot of code, but most of it was generated fromthis CLAUDE.md. Maybe it's all you need to read if you'd like to get the gist of it. Here are the other main components:

  • Testserver & scenario descriptions &their types
  • Language-agnostic testing framework
    • Requires each SDK to expose standardized client and server test binaries that are able to "play" each and every scenario.
    • Golden generation captures the client/server transport-level chatter for each scenario (uses the reference TS SDK's output as versioned goldens, then used to ensure any SDK's client talking to any other SDK's server produces the same chatter)
  • "Slash commands" in.claude/commands/ for Claude Code, to support ongoing processes such as implementing/updating SDK's test binaries, updating goldens / fixing an SDK / based on test failures, etc (see below)

End goal / flow:

  • After human edits the draft spec,/update_scenarios in a coding agent such as CC will suggest updates to test scenarios / test server descriptions (compliance/scenarios/data.json)
  • /update_sdk typescript will update the reference SDK to implement updated scenarios in standardized client/server test binaries (example:client.ts/server.ts)
  • /update_goldens will record the JSON-RPC traffic between test client & server binaries of the TS SDK for each of the scenarios. Goldens should be human reviewable (example).
  • /cross_test_sdks typescript python (or could test python against itself) will compare the chatter between client/server test binaries of potentially different SDKs against the goldens. This can be used to implement / debug the testing binaries of a given SDK (ideally after/update_sdk <that-sdk>), or to debug / implement missing features of the SDK itself

Note that with this we're not delving into the gray zone of respect of SHOULDs in the Spec.

TODOs:

  • Get early feedback
    • (David) Should test corner cases -> as extra scenarios
    • (David) Could use TS to represent traces to get easier error highlighting -> maybe also scenarios
    • (Paul) Should find easily-reviewable initial setup -> Proposal:
      • New compliance repo w/ just the scenarios, the goldens and spec + sdks as submodules
      • SDK-specific testing binaries implemented inside the SDK repos themselves (in branches for now, compliance repo submodules will point to said branches until this is all agreed on)
      • Small but meaningful set of scenarios
  • Improve golden format
    • JSONL not super readable: format json? Turn golden into an.md file w/ mermaid js interaction diagram + formatted JSON traces in code block?
    • Support testing multiple clients concurrently (only http transports). Test client binaries probably need to start all clients so they can synchronize their calls (e.g. scenarioclient1 & client2 connect to CalcServer; client1 calls set_trig_allowed(allowed=true) tool. client1 gets list of tools that includes cos & sin, and client2 gets list of tools that doesn't), which means transport capture may need to be done in-process (to attribute recorded traffic to client1 vs. client2) as opposed to out-of-process as currently
  • Switch goldens to Streamable HTTP transport (w/ extra annotations such as header, and transport shape - e.g. POST-BODY, POST-SSE, GET-SSE)
  • Explore testing of auth scenarios:
    • For OAuth, will need to stub PRM, AS endpoints (possibly at different ports), and record that chatter in our goldens (e.g.{"from": "client1", "to": "CalcServer", "request": "GET /.well-known/oauth-protected-resource"},{"from": "client1", "to": "AuthServer", "request": "GET /.well-known/oauth-authorization-server"}...)
    • Other auth methods TBD (basic...)
  • Make server binaries scenario-aware: will allow scenarios to use same servers w/ variants, e.g. w/ or w/o auth
  • Experiment w/ prompts to evaluate compliance of a trace
    • Later this can morph into an online traffic analyzer
    • Idea: push entire spec in a cached prompt, use Claude API to then analyze one trace at a time (append after cache point)
  • Improve slash commands (and theCLAUDE.md that had them created) to do a better job at reviewing scenarios (reduce ambiguity, improve testability, etc)
  • Human reviews of existing scenarios & goldens
    • some are clearly wrong / hard to test (e.g. FileServer)
    • Tweak prompts & suggest much improved coverage, e.g. for optional features (test listTools on a server that doesn't support tools, elicitation on a client that doesn't support it, etc)
  • Improve condition tests in client binaries (e.g. verify specific sampling / elicitation requests)
  • Explore SDK development loop based on failure to implement test binaries (e.g. if the SDK doesn't support feature X yet), and w/ golden failures
  • Evaluate influence of my~/.claude/CLAUDE.md on efficacy of prompts, fold relevant instructions to slash commands

ochafikand others added30 commitsJuly 10, 2025 00:09
- Created core TypeScript types for scenarios, annotated logs, and test results- Implemented validation utilities with Zod schemas- Added comprehensive unit tests for validation logic- Set up TypeScript and test runner configuration using tsx- Tests passing for scenario validation and log comparison utilities
- update_scenarios.md: Creates/updates scenarios based on MCP spec- update_sdk.md: Generates SDK-specific test implementations- update_goldens.md: Captures reference logs with TypeScript SDK- cross_test_sdks.md: Runs cross-SDK compliance test matrix
- Created data.json with 3 test servers (CalcServer, FileServer, ErrorServer)- Defined 25 test scenarios covering:  - Basic tool invocation and elicitation flows  - Multi-client interactions and per-client state  - Resource management and subscriptions  - Error handling, timeouts, and cancellation  - All transport types (stdio, SSE, streamable HTTP)  - Protocol features (progress, logging, roots, version negotiation)- Added validation script to verify scenario structure- All scenarios validated successfully
- Created StdioInterceptor class to capture JSON-RPC messages between processes- Uses transform streams to intercept and log messages bidirectionally- Annotates messages with sender, recipient, timestamp, and transport metadata- Added unit tests verifying basic functionality- Tests passing for stdio transport interception
- Created SSEInterceptor class as HTTP proxy for Server-Sent Events- Intercepts and logs JSON-RPC messages sent via POST and SSE streams- Forwards headers and maintains session state- Added unit tests for server lifecycle and message capture- Cleaned up unused imports in stdio interceptor- All tests passing for both stdio and SSE interceptors
- Created StreamableHTTPInterceptor with full session management- Handles POST, GET (SSE), and DELETE requests- Tracks sessions and forwards appropriate headers- Includes comprehensive metadata in logged messages- Added unit tests for all major functionality- All 15 interceptor tests passing (stdio, SSE, streamable HTTP)
- Implemented mcp-mitm CLI tool for man-in-the-middle logging- Supports all three transports: stdio, SSE, streamable-http- Logs annotated JSON-RPC messages to JSONL files- Includes comprehensive CLI argument parsing and validation- Added E2E tests verifying CLI functionality- All 23 tests passing including interceptor and CLI tests
…Script SDK- Added comprehensive server implementation with CalcServer, FileServer, and ErrorServer- CalcServer features:  - Basic arithmetic operations (add, ambiguous_add with elicitation)  - Per-client trigonometric function management (cos/sin conditionally available)  - Mutable resource management (special-number)  - Sampling-based expression evaluation with progress tracking  - Static prompt for mathematical problem solving- FileServer features:  - In-memory file system with write/delete operations  - Static and templated resource access  - Resource update notifications for subscribed files  - Code review and file summarization prompts- ErrorServer features:  - Error testing scenarios (always_error, timeout, invalid_response)  - Cancellation support for long-running operations- All servers support stdio, SSE, and streamable-http transports- Updated client.ts to use shared Scenarios type from compliance/src/types.ts🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
…omment- Updated MITM CLI to accept --scenario-id flag- Fetches scenario description from data.json- Writes description as comment lines (// prefix) at start of log file- Added tests for scenario ID functionality
- Added parseJSONLLog function that parses JSONL files with comment support- Comments (lines starting with //) are extracted as description- Added comprehensive tests for the new functionality- Function returns both the parsed messages and the optional description
- Fixed path to use process.cwd() correctly when running from compliance directory- Removed unused parseArgs import
- Created generate-goldens.ts script to run all scenarios and capture logs- Added npm script 'generate-goldens' for easy execution- Script supports stdio and SSE transports (streamable-http pending)- Added --no-cache flag to test script to prevent tsx from generating artifacts
- Remove rootDir restriction to allow imports from parent directories- Fix duplicate ListPromptsResult import- Note: SDK still has API compatibility issues that need addressing
- Updated @modelcontextprotocol/sdk dependency from 1.0.0 to 1.15.0- Fixed client.ts to use correct API for SDK 1.15.0:  - Updated all client method calls to use object parameters instead of strings  - Removed event handler methods that don't exist in SDK 1.15.0  - Added comments for features not available in current SDK version- Fixed server.ts to use correct API for SDK 1.15.0:  - Removed extra parameter from tool/resource handlers  - Fixed resource registration to use 4-parameter format  - Changed prompt argumentSchema to argsSchema  - Replaced system role with user/assistant roles in prompts  - Simplified transport setup for SSE (not fully implemented)- Updated test binaries to use relative paths with import.meta.urlThe TypeScript SDK now builds successfully and binaries are functional.🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
…ures- Use SDK's built-in RegisteredTool.enable()/disable() methods- Remove manual trigAllowed state checking in cos/sin implementations- Automatic tool change events are now handled by the SDK- Update comments to reflect current SDK capabilities🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
- Add tsx command to mitm invocation in generate-goldens script- Add allowUnknownOption() to test-client to handle mitm options- Fix scenario data path resolution (remove duplicate 'compliance' in path)- Generate initial golden files for scenarios 1-7The mitm tool already had --log support but the test-client wasrejecting it as an unknown option. Now the full pipeline worksfor stdio scenarios.🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
… appending- Modified generate-goldens to continue running all scenarios even when some fail- Fixed mitm tool to replace log files instead of appending (was using 'a' flag, now uses 'w')- Generated golden files for 20 out of 25 scenarios (5 failures due to missing features)The failures are expected:- Scenario 8: Pagination not implemented- Scenario 14: SSE transport not implemented in test harness- Scenario 15: Progress tracking not implemented- Scenario 18: Prompt pagination not implemented- Scenario 19: Streamable HTTP transport not implemented🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
- Replace process.cwd() with __dirname to find scenarios/data.json- Remove failed environment variable approach- Add --scenario-id flag back to mitm invocations- Issue: --scenario-id is still being consumed before reaching mitm
- Add -- separator in generate-goldens to prevent test-client from parsing mitm's --scenario-id- Remove .allowUnknownOption() from test-client since we now use -- to separate args- Use __dirname instead of process.cwd() for reliable path resolution in mitm- Change log file flags from 'a' to 'w' to replace content instead of appending- Successfully regenerated 23 out of 25 golden files with scenario descriptions🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
- Add CrossSDKRunner module to execute SDK client/server combinations- Create test suite using Node's test runner with describe/it blocks- Add comparison logic to validate captured traffic against golden files- Create script to orchestrate cross-SDK testing with SDK selection- Add npm script for easy cross-SDK test executionThe cross-SDK testing allows running any SDK's client against any SDK'sserver and comparing the captured JSON-RPC traffic with golden files toensure protocol compliance across implementations.🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
- Move compliance/goldens/ to compliance/scenarios/goldens/- Update all references in generate-goldens.ts and cross-sdk.test.ts- Better organization: keeps scenario data and golden files together🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
…er/recipient- Remove timestamp field from AnnotatedJSONRPCMessage type- Update all interceptors to stop adding timestamps to messages- Update generate-goldens and cross-sdk-runner to pass client-id and server-id to MITM- Remove timestamp normalization from validation code and tests- Ensure proper sender/recipient info (e.g. client1, CalcServer) is includedThis simplifies log comparison by removing non-deterministic timestamps andensures clear identification of message senders/recipients in the logs.🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
…/to fields- Move sender/recipient from metadata to top-level from/to fields- Remove transport field from metadata (no longer needed)- Place from/to fields before message for better readability- Update all interceptors and validation code to use new structure- Ensure metadata is optional and only contains streamable_http_metadata- Update tests to match new structureThis simplifies the message structure and makes sender/recipient moreprominent in the logs.🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
Implement elicitation request handler in the TypeScript SDK test clientto properly handle the ambiguous_add scenario (scenario 2). The handlerresponds with value 20 when asked for 'b' parameter, allowing thescenario to complete successfully.🤖 Generated with [Claude Code](https://claude.ai/code)Co-Authored-By: Claude <noreply@anthropic.com>
…uracy- Scenario 9: Remove confusing 'resource template' mention, clarify direct resource access- Scenario 12: Specify progress tracking and cancellation behavior- Scenario 13: Simplify to just describe the operation without transport-specific details- Scenario 15: Add progress tracking requirement and expected result- Scenario 17: Update to reflect actual server capabilities (logging operations)- Scenario 21: Clarify that version negotiation uses same versionUse plain English descriptions focusing on what the scenario tests rather than implementation details.
…io changes- Update client progress scenario (15) to handle expression '(2 + 3) * (4 + 5)' = 45- Update server always_error to return isError: true instead of throwing- Fix test-client and test-server wrapper paths to point to dist/- Update golden files:  - Scenario 12: Add proper cancellation flow with progress token  - Scenario 15: Add sampling request/response for expression evaluation  - Scenario 17: Update comment to match new descriptionAligns implementation with updated scenario descriptions from previous commit.
…criptions- Scenario 9: Update to reflect direct resource read (not template)- Scenario 13: Simplify to match updated scenario description- Scenario 21: Update to reflect same version usage (no negotiation)These comment updates align golden files with the clarified scenario descriptions.
- Create comprehensive test suite to validate SDK binary properties- Test CLI argument parsing, transport support, and error handling- Add --scenarios-data flag documentation to CLAUDE.md- Validate binaries meet compliance requirements from specThe tests ensure each SDK's test-client and test-server binaries:- Exist and are executable- Parse required CLI arguments correctly- Reject invalid inputs with proper error messages- Support required transports (stdio, sse, streamable-http)- Exit with appropriate codes on errorsTests skip gracefully for unimplemented features like --scenarios-dataflag validation, which will be added in SDK implementations.
Comment on lines +42 to +49
const response = await fetch(targetUrl.toString(), {
method: 'POST',
headers: {
'Content-Type': 'application/json',
...this.forwardHeaders(req.headers),
},
body: JSON.stringify(message),
});

Check failure

Code scanning / CodeQL

Server-side request forgery Critical

The
URL
Loading
of this request depends on a
user-provided value
Loading
.

Copilot Autofix

AI 8 days ago

To fix the issue, we should validate and restrict the user-controlledreq.path before constructing thetargetUrl. This can be achieved by implementing an allow-list of acceptable paths or using a strict validation mechanism to ensure the path is safe and cannot be exploited for SSRF attacks. Additionally, the hostname in the URL should not be influenced by user input, and only pre-defined safe hostnames should be used.

Specifically:

  1. Introduce an allow-list of acceptable paths or validatereq.path to ensure it does not allow unintended redirection, such as via path traversal (../) or arbitrary inclusion.
  2. Use a fixed base URL (already provided inthis.options.targetUrl) and append validated paths to it.
  3. Reject any path that does not meet validation criteria, returning an error response to the client.

These changes should be implemented in both the POST and GET handlers (this.app.post andthis.app.get) wherereq.path is used.


Suggested changeset1
compliance/src/interceptors/sse.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git applydiff --git a/compliance/src/interceptors/sse.ts b/compliance/src/interceptors/sse.ts--- a/compliance/src/interceptors/sse.ts+++ b/compliance/src/interceptors/sse.ts@@ -38,6 +38,12 @@         this.logMessage(message, this.options.clientId, this.options.serverId);          // Forward to target server+        const allowedPaths = ['/api/resource1', '/api/resource2']; // Example allow-list+        if (!allowedPaths.includes(req.path)) {+          res.status(400).json({ error: 'Invalid path' });+          return;+        }+         const targetUrl = new URL(req.path, this.options.targetUrl);         const response = await fetch(targetUrl.toString(), {           method: 'POST',@@ -67,6 +73,12 @@       res.setHeader('Connection', 'keep-alive');              try {+        const allowedPaths = ['/api/resource1', '/api/resource2']; // Example allow-list+        if (!allowedPaths.includes(req.path)) {+          res.status(400).end('Invalid path');+          return;+        }+         const targetUrl = new URL(req.path, this.options.targetUrl);         const response = await fetch(targetUrl.toString(), {           headers: this.forwardHeaders(req.headers),EOF
@@ -38,6 +38,12 @@
this.logMessage(message,this.options.clientId,this.options.serverId);

// Forward to target server
constallowedPaths=['/api/resource1','/api/resource2'];// Example allow-list
if(!allowedPaths.includes(req.path)){
res.status(400).json({error:'Invalid path'});
return;
}

consttargetUrl=newURL(req.path,this.options.targetUrl);
constresponse=awaitfetch(targetUrl.toString(),{
method:'POST',
@@ -67,6 +73,12 @@
res.setHeader('Connection','keep-alive');

try{
constallowedPaths=['/api/resource1','/api/resource2'];// Example allow-list
if(!allowedPaths.includes(req.path)){
res.status(400).end('Invalid path');
return;
}

consttargetUrl=newURL(req.path,this.options.targetUrl);
constresponse=awaitfetch(targetUrl.toString(),{
headers:this.forwardHeaders(req.headers),
Comment on lines +71 to +73
const response = await fetch(targetUrl.toString(), {
headers: this.forwardHeaders(req.headers),
});

Check failure

Code scanning / CodeQL

Server-side request forgery Critical

The
URL
Loading
of this request depends on a
user-provided value
Loading
.

Copilot Autofix

AI 8 days ago

To fix the issue:

  1. Restrictreq.path to ensure it does not contain any malicious input, such as path traversal sequences (../) or arbitrary subdomain manipulations.
  2. Use an allow-list to verify thatreq.path matches expected patterns or specific allowed endpoints.
  3. Construct URLs using validatedreq.path values instead of directly combining user input andthis.options.targetUrl.

The best way to fix this is to validate and sanitizereq.path to ensure that only predefined paths or patterns are allowed. For example:

  • Use a set of predefined, allowed paths or endpoints.
  • If dynamic paths are necessary, ensure they match strictly defined regular expressions.

This fix requires:

  • Adding validation logic forreq.path.
  • Rejecting requests with invalid paths and returning an appropriate error response.

Suggested changeset1
compliance/src/interceptors/sse.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git applydiff --git a/compliance/src/interceptors/sse.ts b/compliance/src/interceptors/sse.ts--- a/compliance/src/interceptors/sse.ts+++ b/compliance/src/interceptors/sse.ts@@ -38,6 +38,12 @@         this.logMessage(message, this.options.clientId, this.options.serverId);          // Forward to target server+        const allowedPaths = ['/valid/path1', '/valid/path2']; // Define allowed paths+        if (!allowedPaths.includes(req.path)) {+          console.error('Invalid path received:', req.path);+          res.status(400).json({ error: 'Invalid path' });+          return;+        }         const targetUrl = new URL(req.path, this.options.targetUrl);         const response = await fetch(targetUrl.toString(), {           method: 'POST',@@ -67,6 +73,12 @@       res.setHeader('Connection', 'keep-alive');              try {+        const allowedPaths = ['/valid/path1', '/valid/path2']; // Define allowed paths+        if (!allowedPaths.includes(req.path)) {+          console.error('Invalid path received:', req.path);+          res.status(400).json({ error: 'Invalid path' });+          return;+        }         const targetUrl = new URL(req.path, this.options.targetUrl);         const response = await fetch(targetUrl.toString(), {           headers: this.forwardHeaders(req.headers),EOF
@@ -38,6 +38,12 @@
this.logMessage(message,this.options.clientId,this.options.serverId);

// Forward to target server
constallowedPaths=['/valid/path1','/valid/path2'];// Define allowed paths
if(!allowedPaths.includes(req.path)){
console.error('Invalid path received:',req.path);
res.status(400).json({error:'Invalid path'});
return;
}
consttargetUrl=newURL(req.path,this.options.targetUrl);
constresponse=awaitfetch(targetUrl.toString(),{
method:'POST',
@@ -67,6 +73,12 @@
res.setHeader('Connection','keep-alive');

try{
constallowedPaths=['/valid/path1','/valid/path2'];// Define allowed paths
if(!allowedPaths.includes(req.path)){
console.error('Invalid path received:',req.path);
res.status(400).json({error:'Invalid path'});
return;
}
consttargetUrl=newURL(req.path,this.options.targetUrl);
constresponse=awaitfetch(targetUrl.toString(),{
headers:this.forwardHeaders(req.headers),
Comment on lines +53 to +60
const response = await fetch(targetUrl.toString(), {
method: 'POST',
headers: {
'Content-Type': 'application/json',
...this.forwardHeaders(req.headers),
},
body: JSON.stringify(message),
});

Check failure

Code scanning / CodeQL

Server-side request forgery Critical

The
URL
Loading
of this request depends on a
user-provided value
Loading
.

Copilot Autofix

AI 8 days ago

To fix this issue:

  1. Validate and sanitizereq.path before incorporating it intotargetUrl. Ensure the input cannot be used for path traversal (../) or to target unintended resources.
  2. Use an allow-list of acceptable paths or subpaths to determine ifreq.path is valid. Reject any input that does not match the predefined allow-list.
  3. Avoid directly embedding user input into the hostname or critical parts of the URL. Combine user input with known safe values only after validation.

Specifically:

  • Add validation logic forreq.path, ensuring it conforms to expected patterns or matches values from an allow-list.
  • Reject invalid paths with appropriate HTTP response codes (e.g., 400 Bad Request).
  • Include the validation above line 52 wheretargetUrl is constructed.
Suggested changeset1
compliance/src/interceptors/streamable-http.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git applydiff --git a/compliance/src/interceptors/streamable-http.ts b/compliance/src/interceptors/streamable-http.ts--- a/compliance/src/interceptors/streamable-http.ts+++ b/compliance/src/interceptors/streamable-http.ts@@ -49,6 +49,15 @@         }          // Forward to target server+        const allowedPaths = ['/valid-path-1', '/valid-path-2', '/another-path']; // Define allow-list+        if (!allowedPaths.includes(req.path)) {+          res.status(400).json({ +            jsonrpc: '2.0',+            error: { code: -32600, message: 'Invalid request path' },+            id: null+          });+          return;+        }         const targetUrl = new URL(req.path, this.options.targetUrl);         const response = await fetch(targetUrl.toString(), {           method: 'POST',EOF
@@ -49,6 +49,15 @@
}

// Forward to target server
constallowedPaths=['/valid-path-1','/valid-path-2','/another-path'];// Define allow-list
if(!allowedPaths.includes(req.path)){
res.status(400).json({
jsonrpc:'2.0',
error:{code:-32600,message:'Invalid request path'},
id:null
});
return;
}
consttargetUrl=newURL(req.path,this.options.targetUrl);
constresponse=awaitfetch(targetUrl.toString(),{
method:'POST',
Comment on lines +107 to +109
const response = await fetch(targetUrl.toString(), {
headers: this.forwardHeaders(req.headers),
});

Check failure

Code scanning / CodeQL

Server-side request forgery Critical

The
URL
Loading
of this request depends on a
user-provided value
Loading
.

Copilot Autofix

AI 8 days ago

To mitigate the SSRF vulnerability, user input (req.path) should not be used directly to construct the URL for the outgoing request. Instead:

  1. Enforce an allow-list approach for thereq.path or validate it against expected patterns to avoid malicious values.
  2. Ensure that the constructedtargetUrl is restricted to known safe domains or subdomains.

The fix involves validatingreq.path before constructing thetargetUrl. For example:

  • Use an allow-list of valid paths or subdomains.
  • Reject paths containing suspicious patterns like../ (path traversal) or//.

In the code snippet, add logic to validatereq.path and ensure only safe values are used to construct the URL.

Suggested changeset1
compliance/src/interceptors/streamable-http.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git applydiff --git a/compliance/src/interceptors/streamable-http.ts b/compliance/src/interceptors/streamable-http.ts--- a/compliance/src/interceptors/streamable-http.ts+++ b/compliance/src/interceptors/streamable-http.ts@@ -103,6 +103,13 @@           this.sessions.set(sessionId, { transport: 'SSE' });         } +        // Validate req.path+        const allowedPaths = ['/safe-path-1', '/safe-path-2']; // Example allow-list+        if (!allowedPaths.includes(req.path)) {+          res.status(400).send('Invalid path');+          return;+        }+         const targetUrl = new URL(req.path, this.options.targetUrl);         const response = await fetch(targetUrl.toString(), {           headers: this.forwardHeaders(req.headers),EOF
@@ -103,6 +103,13 @@
this.sessions.set(sessionId,{transport:'SSE'});
}

// Validate req.path
constallowedPaths=['/safe-path-1','/safe-path-2'];// Example allow-list
if(!allowedPaths.includes(req.path)){
res.status(400).send('Invalid path');
return;
}

consttargetUrl=newURL(req.path,this.options.targetUrl);
constresponse=awaitfetch(targetUrl.toString(),{
headers:this.forwardHeaders(req.headers),
Comment on lines +177 to +180
const response = await fetch(targetUrl.toString(), {
method: 'DELETE',
headers: this.forwardHeaders(req.headers),
});

Check failure

Code scanning / CodeQL

Server-side request forgery Critical

The
URL
Loading
of this request depends on a
user-provided value
Loading
.

Copilot Autofix

AI 8 days ago

To fix the issue, we need to ensure that user-controlled input (req.path) cannot lead to unintended or malicious requests. Specifically:

  1. Validate thereq.path against an allow-list of acceptable paths or subpaths.
  2. Sanitize the path to prevent path traversal attacks (e.g.,"../").
  3. Ensure the hostname in the constructed URL is fixed and cannot be influenced by the attacker.

The best way to achieve this is:

  • Replace the direct construction oftargetUrl usingreq.path with a validation mechanism.
  • Use a predefined base URL and only append sanitized, validated paths from the request.
  • Reject or handle invalid paths appropriately.
Suggested changeset1
compliance/src/interceptors/streamable-http.ts

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git applydiff --git a/compliance/src/interceptors/streamable-http.ts b/compliance/src/interceptors/streamable-http.ts--- a/compliance/src/interceptors/streamable-http.ts+++ b/compliance/src/interceptors/streamable-http.ts@@ -103,7 +103,15 @@           this.sessions.set(sessionId, { transport: 'SSE' });         } -        const targetUrl = new URL(req.path, this.options.targetUrl);+        const allowedPaths = ['/api/resource1', '/api/resource2']; // Example allow-list+        const sanitizedPath = decodeURIComponent(req.path).replace(/(\.\.\/|\/\.\.)/g, ''); // Prevent path traversal+        +        if (!allowedPaths.includes(sanitizedPath)) {+          res.status(400).send('Invalid path');+          return;+        }+        +        const targetUrl = new URL(sanitizedPath, this.options.targetUrl);         const response = await fetch(targetUrl.toString(), {           headers: this.forwardHeaders(req.headers),         });@@ -173,7 +181,15 @@         }          // Forward to target server-        const targetUrl = new URL(req.path, this.options.targetUrl);+        const allowedPaths = ['/api/resource1', '/api/resource2']; // Example allow-list+        const sanitizedPath = decodeURIComponent(req.path).replace(/(\.\.\/|\/\.\.)/g, ''); // Prevent path traversal+        +        if (!allowedPaths.includes(sanitizedPath)) {+          res.status(400).send('Invalid path');+          return;+        }+        +        const targetUrl = new URL(sanitizedPath, this.options.targetUrl);         const response = await fetch(targetUrl.toString(), {           method: 'DELETE',           headers: this.forwardHeaders(req.headers),EOF
@@ -103,7 +103,15 @@
this.sessions.set(sessionId,{transport:'SSE'});
}

consttargetUrl=newURL(req.path,this.options.targetUrl);
constallowedPaths=['/api/resource1','/api/resource2'];// Example allow-list
constsanitizedPath=decodeURIComponent(req.path).replace(/(\.\.\/|\/\.\.)/g,'');// Prevent path traversal

if(!allowedPaths.includes(sanitizedPath)){
res.status(400).send('Invalid path');
return;
}

consttargetUrl=newURL(sanitizedPath,this.options.targetUrl);
constresponse=awaitfetch(targetUrl.toString(),{
headers:this.forwardHeaders(req.headers),
});
@@ -173,7 +181,15 @@
}

// Forward to target server
consttargetUrl=newURL(req.path,this.options.targetUrl);
constallowedPaths=['/api/resource1','/api/resource2'];// Example allow-list
constsanitizedPath=decodeURIComponent(req.path).replace(/(\.\.\/|\/\.\.)/g,'');// Prevent path traversal

if(!allowedPaths.includes(sanitizedPath)){
res.status(400).send('Invalid path');
return;
}

consttargetUrl=newURL(sanitizedPath,this.options.targetUrl);
constresponse=awaitfetch(targetUrl.toString(),{
method:'DELETE',
headers:this.forwardHeaders(req.headers),
Comment on lines +251 to +275
console.log(`
MCP Streamable HTTP Server listening on port ${PORT}
Endpoint:
${url}

Example usage:
curl -X POST https://api.anthropic.com/v1/messages \\
-H "Content-Type: application/json" \\
-H "anthropic-version: 2023-06-01" \\
-H "x-api-key: $ANTHROPIC_API_KEY" \\
-H "anthropic-beta: mcp-client-2025-04-04" \\
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1000,
"mcp_servers": [{
"type": "url",
"url": "${url}",
"name": "${serverName}"
}],
"messages": [{
"role": "user",
"content": "Write a tictactoe game w/ "
}]
}'
`);

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This logs sensitive data returned by
an access to authKey
Loading
as clear text.

Copilot Autofix

AI 8 days ago

Sensitive data likeauthKey should not be logged. The fix involves redacting the sensitiveauthKey from the loggedurl string. This can be achieved by replacing the actualauthKey with a placeholder (e.g.,[REDACTED]) when constructing the log message. This allows the logs to retain useful information (e.g., the general structure of theurl) without revealing sensitive details.

Additionally, as a best practice, any other sensitive data that might inadvertently flow into the logs should be reviewed and handled similarly.


Suggested changeset1
stdio-wrapper.mjs

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git applydiff --git a/stdio-wrapper.mjs b/stdio-wrapper.mjs--- a/stdio-wrapper.mjs+++ b/stdio-wrapper.mjs@@ -248,10 +248,11 @@     return serverName;   })();   +  const redactedUrl = url.replace(authKey, '[REDACTED]');   console.log(`     MCP Streamable HTTP Server listening on port ${PORT}     Endpoint:-      ${url}+      ${redactedUrl}      Example usage:       curl -X POST https://api.anthropic.com/v1/messages \\@@ -264,7 +262,7 @@           "max_tokens": 1000,           "mcp_servers": [{             "type": "url",-            "url": "${url}",+            "url": "${redactedUrl}",             "name": "${serverName}"           }],           "messages": [{EOF
@@ -248,10 +248,11 @@
returnserverName;
})();

constredactedUrl=url.replace(authKey,'[REDACTED]');
console.log(`
MCPStreamableHTTPServerlisteningonport${PORT}
Endpoint:
${url}
${redactedUrl}

Exampleusage:
curl-XPOSThttps://api.anthropic.com/v1/messages \\
@@ -264,7 +262,7 @@
"max_tokens":1000,
"mcp_servers":[{
"type":"url",
"url":"${url}",
"url":"${redactedUrl}",
"name":"${serverName}"
}],
"messages":[{
@Kludex
Copy link
Member

Kludex commentedJul 11, 2025
edited
Loading

It can be pretty intimidating to give feedback on a PR this huge.


I just would like to share that I think this idea is being already presented as the end goal, but I think the language-agnostic compliant tests is a great point by itself, and could be introduced first, and then the idea can evolve.

olaservo, atrawog, LucaButBoring, and nandsha reacted with thumbs up emoji

@ochafik
Copy link
Author

It can be pretty intimidating to ask feedback on a PR this huge.

I just would like to share that I think this idea is being already presented as the end goal, but I think the language-agnostic compliant tests is a great point by itself, and could be introduced first, and then the idea can evolve.

@Kludex Thanks for the quick feedback / sorry for the context-heavy PR

I've rephrased the description a bit (I'll try and rephrase further) but I guess the whole PR boils down to theCLAUDE.md I used to stub most of it.

@atrawog
Copy link

We really need more and better validation and verification tools for MCP and I'm happy about anyone who's spending some effort in that direction.

But I would suggest a clear separation between the server and client part and the possibility to use either parts to test and validate real MCP clients and servers.

Because if I understand the current implementation right it's only able to test and verify against itself.

@ochafik
Copy link
Author

ochafik commentedJul 11, 2025
edited
Loading

We really need more and better validation and verification tools for MCP and I'm happy about anyone who's spending some effort in that direction.

But I would suggest a clear separation between the server and client part and the possibility to use either parts to test and validate real MCP clients and servers.

@atrawog Thanks for the suggestions!

General client/server validation would indeed be another great end goal (seemingly pursued bymcp-validator, which I haven't played with yet), but I caved out of it early on for the following main reasons:

  • Testing an arbitrary server is an open problem: you need a deep understanding of what its expected behaviour is, it might be hiding behind some auth layer, be very stateful / hide some tools until you've said the magic word, etc.
  • Testing an arbitrary client is even harder as each may require a lot of custom stimuli to even want to interact w/ our server-side validator.

Instead, I focused on a more constrained problem that incidentally can help automate development of the SDKs: by defining the servers and client scenarios in English, we can have the framework implement both for us, and the transport traces in the middle turn out reasonably human-friendly.

Because if I understand the current implementation right it's only able to test and verify against itself.

On the general validation front, I think an achievable follow-up goal would be to capture transport-level traffic between a real client and a real server and flag any compliance issues (offline or on the spot). I think we may want to do this "smell-test" on captured goldens as part of this PR's test framework anyway (i.e. check the trace represents a valid client/server interfaction sequence before asking for a human to review the traces). It could also give a grayscale compliance score (beyond respecting the MUSTs, how many of the SHOULDs did it respect or ignore?).

atrawog reacted with thumbs up emoji

@atrawog
Copy link

On the general validation front, I think an achievable follow-up goal would be to capture transport-level traffic between a real client and a real server and flag any compliance issues (offline or on the spot). I think we may want to do this "smell-test" on captured goldens as part of this PR's test framework anyway (i.e. check the trace represents a valid client/server interfaction sequence before asking for a human to review the traces). It could also give a grayscale compliance score (beyond respecting the MUSTs, how many of the SHOULDs did it ignore?).

Yes and the huge challenge ahead is that the MCP ecosystem is becoming really diverse really quickly. With one part of the developer ecosystem being more than happy with the STDIO on my system approach and other part pushing towards OAuth and server to server solutions like Claude.ai custom integrations.

But at the end of the day we should still strive to use the same test and validation tools for everything or at least have a common understanding of how things should be done between everyone who's developing MCP testing tools.

ochafik reacted with thumbs up emoji

@pcarleton
Copy link
Contributor

A few suggestions:

  1. I'd prefer we put this in a separate repo to start, and we can consider pulling it into the spec repo later once it's more mature and we have a better "spec change <> golden change" pipeline. In general, listening to PR's on the spec repo is a useful channel for new proposals on spec changes, where this I expect will have a lot of PR's coming in the beginning that are not related to specification changes.
  2. I'd like if we started much smaller, thinking maybe 2 SDK's and 3 test cases, and maybe just the Calc server? (trimmed out the slash commands, etc., current one has 9 SDK's, 25 test cases, and 3 servers). I want us to be able to quickly validate and iterate until we achieve "it is easy to tell if a client is implemented in line with the goldens". After we've hammered that out, then auto-generating a lot more seems useful.

ochafikand others added4 commitsJuly 14, 2025 15:33
- Add ES module support with proper __dirname handling- Fix client CLI argument parsing with -- separator- Set correct working directory to find scenarios/data.json- 11 out of 12 tests now passing (scenario 24 needs fix)
- Add workaround for scenario 24 (declined elicitation) test- TypeScript SDK v1.15.0 doesn't support server-side elicitation yet- Client test now accepts default value response with a note- All 12 TypeScript SDK tests now passing
- Update ambiguous_add tool to use server.elicitInput() API- Handle accept, decline, and cancel responses appropriately- Return error with isError: true when elicitation is declined- Match expected behavior from scenarios 2 and 24🤖 Generated with Claude CodeCo-Authored-By: Claude <noreply@anthropic.com>
- Use server.server.elicitInput() API with proper requestedSchema format- Server now sends elicitation requests for ambiguous_add tool- Client provides proper object response format for elicitation- Handle accept/decline/cancel responses appropriately- All 12 TypeScript SDK tests now passing with real elicitationThe TypeScript SDK v1.15.0 does have full elicitation support throughthe lower-level server API.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers
No reviews
Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@ochafik@Kludex@atrawog@pcarleton

[8]ページ先頭

©2009-2025 Movatter.jp