evaluator-mcp

Evaluator MCP (Japanese: 評価者mcp)

AI-Driven-R-D-Dept · code-execution · TypeScript


0 tools · 10 findings · 0 stars · Last scanned: Mar 24, 2026

⚠ Findings detected: 3 critical · 5 high · 1 medium · 1 low

Security Category Deep Dive

Prompt Injection: prompt and context manipulation attacks

Maturity: 64% implemented

PI-DIR: Direct Input Injection (100% · 3 rules)

Injection via tool descriptions and parameter fields

GAP-001: Prompt Injection Coverage Gap. Missing detection coverage for emerging prompt injection attack variants not addressed by current rules.

PI-IND: Indirect / Gateway Injection (100% · 4 rules)

Hidden instructions via external content and tool responses

PI-CTX: Context Manipulation (100% · 2 rules)

Context window saturation and prior-approval exploitation

PI-ENC: Encoding & Obfuscation (100% · 3 rules)

Payload hiding via invisible characters, base64, and schema fields

PI-TPL: Template & Output Poisoning (50% · 2 rules · 1 finding)

Injection via prompt templates and runtime tool output
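The PI-ENC sub-category covers payloads hidden with invisible characters. As a rough illustration, the sketch below flags zero-width, bidirectional-control, and Unicode tag characters in a tool description or parameter field. The function name `findInvisibleChars` and the exact code-point list are assumptions for illustration; they are not the scanner's actual rules, and the sketch does not address base64 or schema-field hiding.

```typescript
// Hypothetical PI-ENC style check (not the scanner's rule): flags invisible or
// bidirectional-control code points that can hide instructions from humans.
// Ranges (illustrative, not exhaustive):
//   U+200B-U+200F  zero-width space/joiners, LRM/RLM
//   U+202A-U+202E  bidi embeddings and overrides
//   U+2060-U+2064  word joiner and invisible operators
//   U+2066-U+2069  bidi isolates
//   U+FEFF         zero-width no-break space / BOM
//   U+E0000-U+E007F Unicode "tag" characters (can smuggle ASCII text)
const INVISIBLE =
  /[\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFEFF\u{E0000}-\u{E007F}]/gu;

// Returns each suspicious code point as "U+XXXX", in order of appearance.
function findInvisibleChars(text: string): string[] {
  return Array.from(text.matchAll(INVISIBLE), (m) =>
    `U+${m[0].codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0")}`,
  );
}
```

A clean description returns an empty array; any non-empty result is a signal to inspect the field before exposing it to a model.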

Findings (10): 3 critical · 5 high · 1 medium · 1 low

Critical (3)

critical · Q13 · MCP Bridge Package Supply Chain Attack · MCP10-supply-chain · AML.T0054

MCP bridge packages (mcp-remote, mcp-proxy, @modelcontextprotocol/sdk, fastmcp) are high-value supply chain targets: CVE-2025-6514 (CVSS 9.6) affected mcp-remote, a package with more than 437,000 downloads. Always pin exact versions (no ^ or ~ ranges) and commit lockfiles (package-lock.json, pnpm-lock.yaml, uv.lock). Never run `npx mcp-remote` without pinning a version. Before deployment, audit dependencies for known vulnerabilities with `npm audit` or `pip-audit`, and install from the lockfile (for example with `npm ci`) so the recorded integrity hashes are checked. Reference: CVE-2025-6514, OWASP ASI04.
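A minimal sketch of the pinning check, assuming a CI step that has already parsed `package.json` into an object. `findUnpinnedDependencies` is a hypothetical helper, not an npm or scanner API; it reports every dependency whose spec is not an exact `MAJOR.MINOR.PATCH` version. It complements, but does not replace, a committed lockfile and a vulnerability audit.

```typescript
// Hypothetical CI helper: flags dependency specs in a package.json-like
// object that are not exact semver pins.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// MAJOR.MINOR.PATCH with optional prerelease/build suffixes and no range
// operators, so "^1.2.3", "~1.2.3", ">=1.0.0", "*", and "latest" all fail.
const EXACT_SEMVER = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

function findUnpinnedDependencies(pkg: PackageJson): string[] {
  const sections: Record<string, string>[] = [
    pkg.dependencies ?? {},
    pkg.devDependencies ?? {},
  ];
  const unpinned: string[] = [];
  for (const section of sections) {
    for (const [name, spec] of Object.entries(section)) {
      if (!EXACT_SEMVER.test(spec)) {
        unpinned.push(`${name}@${spec}`);
      }
    }
  }
  return unpinned;
}
```

For example, a spec of `"^0.1.0"` for `mcp-remote` is reported as `mcp-remote@^0.1.0`, while `"1.2.3"` passes (the version numbers are placeholders). A CI job could fail whenever the returned list is non-empty.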

critical · K14 · Agent Credential Propagation via Shared State · MCP05-privilege-escalation · AML.T0054

Never write credentials to shared agent state. Use credential vaults (HashiCorp Vault, AWS Secrets Manager) with per-agent scoped access. Implement OAuth token exchange (RFC 8693) for cross-agent authorization. Redact credentials from all agent outputs before writing to shared memory. Required by OWASP ASI03/ASI07 and MAESTRO L7.
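Vault integration and RFC 8693 token exchange depend on the deployment, but the redaction step can be sketched on its own. Here `redactSecrets` and `writeSharedState` are hypothetical names, and the regular expressions cover only a few assumed credential shapes (OpenAI-style `sk-` keys, AWS access key IDs, HTTP bearer tokens); a production system would need a broader detector.

```typescript
// Illustrative patterns only; NOT an exhaustive secret detector.
const SECRET_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{20,}\b/g,        // OpenAI-style API keys (assumed shape)
  /\bAKIA[0-9A-Z]{16}\b/g,             // AWS access key IDs
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, // HTTP bearer tokens
];

// Replaces every match of every pattern with a fixed placeholder.
function redactSecrets(text: string): string {
  return SECRET_PATTERNS.reduce(
    (acc, pattern) => acc.replace(pattern, "[REDACTED]"),
    text,
  );
}

// Hypothetical shared-state writer: values are redacted before they are
// stored, so other agents reading the store never see raw credentials.
function writeSharedState(store: Map<string, string>, key: string, value: string): void {
  store.set(key, redactSecrets(value));
}
```

Redacting at the single write path, rather than at each reader, means a new agent added later cannot forget to apply the filter.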

critical · C1 · Command Injection · MCP03-command-injection · AML.T0054

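Pattern "`[^`]+`" matched in source_code: "`Please evaluate this text: ${text}`" (at position 1084). The rule matches any backtick-delimited string; in TypeScript this match is a template literal that builds prompt text, so the match alone does not show a shell invocation.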

Replace exec()/execSync() with execFile()/execFileSync() and pass arguments as an array, never as a single command string. Validate all inputs against an allowlist before they reach any shell context. In Python, always pass subprocess.run a list of arguments and keep shell=False.
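A minimal Node.js sketch of the `execFile` recommendation, assuming the server needs to launch a subprocess at all. `runAllowed` and its default allowlist are hypothetical. Because `execFileSync` starts the program directly instead of going through a shell, characters such as `;` or `&&` inside an argument reach the program as plain text.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical allowlist of executables this server may launch.
const DEFAULT_ALLOWLIST: ReadonlySet<string> = new Set(["git"]);

// Runs an allowlisted executable with arguments passed as an array.
// No shell is spawned, so shell metacharacters in args are not interpreted.
function runAllowed(
  command: string,
  args: string[],
  allowlist: ReadonlySet<string> = DEFAULT_ALLOWLIST,
): string {
  if (!allowlist.has(command)) {
    // Fixed message: do not echo the rejected input back to the caller.
    throw new Error("Command not allowed");
  }
  // shell: false is also the default for execFileSync; stated for clarity.
  return execFileSync(command, args, { encoding: "utf8", shell: false });
}
```

For instance, `runAllowed("rm", ["-rf", "x"])` throws before anything is spawned, because `rm` is not on the allowlist.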

High (5)

high · K11 · Missing Server Integrity Verification · MCP10-supply-chain · AML.T0054

Implement cryptographic verification for MCP server connections: (1) Pin server TLS certificates or public keys, (2) Verify server tool definition checksums against a known-good manifest, (3) Use package manager integrity checks (npm integrity, pip --require-hashes). The MCP spec recommends but doesn't yet mandate server signing — implement it proactively. Required by ISO 27001 A.8.24 and CoSAI MCP-T6.
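Point (2), checking tool definitions against a known-good manifest, can be sketched with Node's built-in `crypto` module. The `ToolDefinition` shape, the canonical-JSON step, and the manifest format (tool name mapped to a SHA-256 hex digest) are assumptions for illustration; they are not defined by the MCP specification. Sorting object keys before hashing keeps the digest stable if a server merely reorders fields.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a tool definition as advertised by a server.
interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema?: unknown;
}

// Serializes a value with object keys sorted (undefined fields dropped),
// so semantically identical definitions produce identical strings.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .filter((k) => obj[k] !== undefined)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${parts.join(",")}}`;
  }
  return JSON.stringify(value);
}

function toolDigest(tool: ToolDefinition): string {
  return createHash("sha256").update(canonicalJson(tool)).digest("hex");
}

// Returns names of tools whose digest is missing from, or differs from,
// the known-good manifest.
function verifyTools(tools: ToolDefinition[], manifest: Record<string, string>): string[] {
  return tools.filter((t) => manifest[t.name] !== toolDigest(t)).map((t) => t.name);
}
```

A client could refuse to register any tool named in the returned list, which catches a description silently rewritten after the manifest was recorded.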

high · J5 · Tool Output Poisoning Patterns · MCP01-prompt-injection · AML.T0054

[AST · J5] The catch block at L158 interpolates the error variable "error" into the response. If the error originates from attacker-controlled input (e.g., malformed data), its message becomes an injection vector into the AI's context.

Never include user input or LLM manipulation directives in error messages or tool responses. Use structured error codes.
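One way to apply the structured-error-code advice, sketched under assumptions: the codes, messages, and `classifyError` rule below are hypothetical, and the response object follows the `content`/`isError` fields of an MCP tool-call result. The caught error is mapped to a fixed code and its original message is discarded instead of being forwarded to the model.

```typescript
// Hypothetical fixed error codes with static, input-free messages.
type ToolErrorCode = "INVALID_INPUT" | "UPSTREAM_UNAVAILABLE" | "INTERNAL_ERROR";

const ERROR_MESSAGES: Record<ToolErrorCode, string> = {
  INVALID_INPUT: "The input could not be processed.",
  UPSTREAM_UNAVAILABLE: "The evaluation service is temporarily unavailable.",
  INTERNAL_ERROR: "An internal error occurred.",
};

interface ToolErrorResponse {
  isError: true;
  content: { type: "text"; text: string }[];
}

// Maps an unknown thrown value to a code; the raw message is never used.
function classifyError(err: unknown): ToolErrorCode {
  if (err instanceof SyntaxError) return "INVALID_INPUT"; // e.g. JSON.parse failure
  return "INTERNAL_ERROR";
}

// Builds a tool response containing only the code and its static message.
function toolError(code: ToolErrorCode): ToolErrorResponse {
  return {
    isError: true,
    content: [{ type: "text", text: JSON.stringify({ error: code, message: ERROR_MESSAGES[code] }) }],
  };
}
```

In a catch block this becomes `return toolError(classifyError(error));`, with the detailed error logged server-side instead.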

high · J5 · Tool Output Poisoning Patterns · MCP01-prompt-injection · AML.T0054

[AST · J5] The catch block at L120 interpolates the error variable "error" into the response. If the error originates from attacker-controlled input (e.g., malformed data), its message becomes an injection vector into the AI's context.

Never include user input or LLM manipulation directives in error messages or tool responses. Use structured error codes.

high · K16 · Unbounded Recursion / Missing Depth Limits · MCP07-insecure-config · AML.T0054

Pattern "function\s+(\w+).*\{[^}]*\1\s*\((?!.*(?:depth|level|limit|max|count|recursi))" matched in source_code: "function evaluateWithO3(text: string): Promise<EvaluationResult> { const completion = await openai.chat.completions.create({ model: "o3", messages: [ { role: "system", content: "You are an expert evaluator. Evaluate the given text and provide a structured assessment with a score (" (at position 675)

Add explicit depth/recursion limits to all recursive operations. Use iterative approaches where possible. Set maximum depth for directory walking (max_depth=10), tree traversal (max_level=20), and agent re-invocation (max_calls=5). Implement circuit breakers that halt after N iterations. Required by EU AI Act Art. 15 (robustness) and OWASP ASI08.
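The depth limit and the "halt after N iterations" guard can each be sketched in a few lines. `TreeNode`, `collectNames`, and `makeCallBudget` are hypothetical names, and the defaults reuse the example limits given above (`max_level=20`, `max_calls=5`).

```typescript
// Hypothetical tree node type used only for this illustration.
interface TreeNode {
  name: string;
  children?: TreeNode[];
}

// Recursive traversal with an explicit depth cap: deep or cyclic input
// raises a clear error instead of exhausting the call stack.
function collectNames(node: TreeNode, maxDepth = 20, depth = 0): string[] {
  if (depth > maxDepth) {
    throw new Error("maximum traversal depth exceeded");
  }
  const names = [node.name];
  for (const child of node.children ?? []) {
    names.push(...collectNames(child, maxDepth, depth + 1));
  }
  return names;
}

// Call budget for agent re-invocation: each call to the returned function
// counts one invocation, and it throws once more than maxCalls are made.
function makeCallBudget(maxCalls = 5): () => void {
  let calls = 0;
  return () => {
    calls += 1;
    if (calls > maxCalls) {
      throw new Error("call budget exhausted");
    }
  };
}
```

Calling the budget function at the top of each re-invocation stops a runaway agent loop after a fixed number of rounds, regardless of what the model requests.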

high · O6 · Server Fingerprinting via Error Responses · MCP04-data-exfiltration · AML.T0057

Pattern "catch\s*\([^)]*\)\s*\{[^}]*(?:res\.(?:send|json)|return).*(?:err(?:or)?\.(?:message|stack|code)|connection|host|port|database)" matched in source_code: "catch (error) { return `Error evaluating text: ${error instanceof Error ? error.message" (at position 3246)

Never expose process, OS, runtime, or database metadata in tool responses or error messages. Use generic error messages ("An error occurred") for production responses. Remove or disable debug/diagnostic endpoints. If health endpoints are needed, limit them to simple "ok"/"error" status without infrastructure details. Wrap all error handlers with a sanitization layer that strips system information.
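A minimal sketch of such a sanitization layer, assuming tool handlers are async functions that return a string. `withSanitizedErrors` is hypothetical: it logs the full error (stack included) with a random correlation ID and returns only a generic message carrying that ID. The default logger writes to stderr, which matters for stdio-based MCP servers, where stdout carries protocol messages.

```typescript
import { randomUUID } from "node:crypto";

type Handler<A> = (args: A) => Promise<string>;

// Wraps a handler so thrown errors never reach the client verbatim:
// details go to the server-side log, the client gets a generic message.
function withSanitizedErrors<A>(
  handler: Handler<A>,
  log: (line: string) => void = (line) => console.error(line), // stderr
): Handler<A> {
  return async (args: A) => {
    try {
      return await handler(args);
    } catch (err) {
      const id = randomUUID();
      log(`[${id}] ${err instanceof Error ? err.stack ?? err.message : String(err)}`);
      // No message, stack, host, port, or database details in the response.
      return `An error occurred (reference: ${id}).`;
    }
  };
}
```

The correlation ID lets an operator find the detailed log entry for a user-reported failure without exposing any of its contents to the client.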

Medium (1)

medium · C6 · Error Leakage · MCP09-logging-monitoring

Pattern "catch\s*\([^)]*\)\s*\{[^}]*(?:throw|return).*(?:err|error)\.(?:message|stack)" matched in source_code: "catch (error) { return `Error evaluating text: ${error instanceof Error ? error.message" (at position 3246)

Return generic error messages to clients. Log detailed errors server-side. Never expose stack traces, file paths, or internal error details in responses.

Low (1)

low · F4 · MCP Spec Non-Compliance · MCP07-insecure-config

Server fails MCP specification compliance checks. Required: server_name, server_version, protocol_version. Recommended: tool_descriptions, parameter_descriptions.

Follow the MCP specification for server metadata. Include server name, version, and protocol version. Provide descriptions for all tools and parameters.
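The checks listed above can be approximated with a small validator. This is a hypothetical sketch, not the scanner's implementation: field names follow the MCP initialize result (`protocolVersion`, `serverInfo.name`, `serverInfo.version`) and tool listings (`name`, `description`, `inputSchema`), and the returned labels mimic the finding's wording.

```typescript
interface PropertySchema {
  type?: string;
  description?: string;
}

interface ServerMetadata {
  protocolVersion?: string;
  serverInfo?: { name?: string; version?: string };
  tools?: {
    name: string;
    description?: string;
    inputSchema?: { type?: string; properties?: Record<string, PropertySchema> };
  }[];
}

// Returns one label per failed check; an empty array means all checks pass.
function checkCompliance(meta: ServerMetadata): string[] {
  const problems: string[] = [];
  if (!meta.serverInfo?.name) problems.push("required:server_name");
  if (!meta.serverInfo?.version) problems.push("required:server_version");
  if (!meta.protocolVersion) problems.push("required:protocol_version");
  for (const tool of meta.tools ?? []) {
    if (!tool.description) problems.push(`recommended:tool_descriptions:${tool.name}`);
    const props: Record<string, PropertySchema> = tool.inputSchema?.properties ?? {};
    for (const [param, schema] of Object.entries(props)) {
      if (!schema.description) {
        problems.push(`recommended:parameter_descriptions:${tool.name}.${param}`);
      }
    }
  }
  return problems;
}
```

For example, a server that reports all three required fields but leaves the `text` parameter of a hypothetical `evaluate` tool undocumented yields only `recommended:parameter_descriptions:evaluate.text`.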