MCP-INDEX

mberg/kokoro-tts-mcp · critical

Kokoro Text to Speech (TTS) MCP Server

This MCP server converts text to speech using the Kokoro TTS model, generating MP3 files with optional upload to AWS S3. It exposes a single tool that...

purpose: This MCP server converts text to speech using the...

threat: network exposed

Python · ★ 76 · May 21, 2026 · ⚙ May 21, 2026

heuristic signals: shell.exec, filesystem.read, aws.integration, env.exposure, filesystem.write

RISK SCORE

0 / 100 risk

low findings: +5

high findings: +50

medium findings: +45

capped at 100

VULNERABILITY ANALYSIS · 6 findings in 6 blocks · 2 HIGH · 3 MEDIUM · 1 LOW

HIGH · 1 finding

kokoro_service.py:119

119            cmd = ['ffmpeg', '-y', '-i', wav_file, '-codec:a', 'libmp3lame', '-qscale:a', '2', mp3_file]
120            subprocess.run(cmd, check=True, capture_output=True, text=True)

mcp-tts.py:71-74→kokoro_service.py:1-6

// Exploitable only if MCP is exposed to untrusted prompts and the attacker can control the `filename` parameter.

EXPLAIN: The `_convert_wav_to_mp3` method builds an ffmpeg command from `wav_file` and `mp3_file`, both of which are derived from the user-controlled `output_file` parameter. Passing a list to `subprocess.run` prevents shell injection, but it does not stop ffmpeg from parsing an argument as an option. A path that reaches ffmpeg with a leading '-' could be interpreted as an ffmpeg option rather than as a filename, and filenames containing path components could redirect where ffmpeg reads or writes. This could potentially lead to arbitrary file read/write, or to code execution via ffmpeg's complex filter capabilities.

IMPACT: An attacker could potentially read arbitrary files, write files to arbitrary locations, or execute arbitrary code by crafting a malicious filename that ffmpeg interprets as an option.

FIX: Sanitize the `output_file` parameter so that it does not start with '-' and does not contain path traversal sequences. Use `secure_filename` from werkzeug, or a similar helper, to validate the filename before use.
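The fix above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's code: `validate_output_filename` and `build_ffmpeg_command` are hypothetical names, and the checks are a simplified stand-in for what `secure_filename` would do. The ffmpeg arguments are copied from the excerpt at kokoro_service.py:119.

```python
import os


def validate_output_filename(name: str) -> str:
    """Return `name` unchanged if it looks safe to pass to ffmpeg (hypothetical helper)."""
    if not name:
        raise ValueError("empty filename")
    # A leading '-' could be parsed by ffmpeg as an option instead of a path.
    if name.startswith("-"):
        raise ValueError("filename must not start with '-'")
    # Reject directory components so the name cannot leave the output directory.
    if name != os.path.basename(name) or name in (".", ".."):
        raise ValueError("filename must not contain path components")
    # basename() does not treat backslashes as separators on POSIX, so check them here.
    if "\\" in name or "\x00" in name:
        raise ValueError("filename contains a forbidden character")
    return name


def build_ffmpeg_command(output_dir: str, output_file: str) -> list:
    """Build the WAV-to-MP3 command from a validated filename."""
    name = validate_output_filename(output_file)
    base = os.path.splitext(name)[0]
    # An absolute path never begins with '-', whatever output_dir is configured as.
    out_dir = os.path.abspath(output_dir)
    wav_file = os.path.join(out_dir, base + ".wav")
    mp3_file = os.path.join(out_dir, name)
    return ["ffmpeg", "-y", "-i", wav_file,
            "-codec:a", "libmp3lame", "-qscale:a", "2", mp3_file]
```

Rejecting bad names outright, rather than silently rewriting them, keeps the caller aware that the request was malformed; whether that suits this server is a design choice for the maintainers.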

HIGH · 1 finding

kokoro_service.py:84

84            # Use macOS 'say' command or other system TTS
85            cmd = ['say', '-o', wav_file, text]
86            subprocess.run(cmd, check=True, capture_output=True, text=True)

mcp-tts.py:71-74→kokoro_service.py:1-6

// Exploitable only if MCP is exposed to untrusted prompts and the fallback TTS method is triggered (e.g., when Kokoro is unavailable).

EXPLAIN: The `_generate_with_fallback` method passes the user-controlled `text` parameter directly to the macOS `say` command via `subprocess.run`. Because the command is passed as a list, no shell is invoked, so shell metacharacters in `text` are not interpreted. The remaining risk is option injection: text that begins with '-' may be parsed by `say` as a command-line option rather than as the text to speak, leading to unexpected behavior. The risk would be greater if `say` were unavailable and the system used a different fallback that involved shell execution.

IMPACT: An attacker could alter the behavior of the `say` command through injected options. Arbitrary command execution, and with it full system compromise, would be possible only if `say` or a similar fallback interpreted the text argument in an unsafe manner.

FIX: Sanitize the `text` input to remove or escape any characters that could be interpreted as command-line options or special sequences. Consider using a dedicated TTS library that does not rely on external commands, or validate that the text does not start with '-' to prevent option injection.
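One way to apply this fix is sketched below, under stated assumptions. `prepare_tts_text` and `build_say_command` are hypothetical names, and the policy of dropping control characters and rejecting a leading '-' is only one reading of the recommendation. The command layout is taken from the excerpt at kokoro_service.py:85.

```python
import unicodedata


def prepare_tts_text(text: str) -> str:
    """Clean user text before it is handed to an external TTS command (hypothetical helper)."""
    # Drop control characters (Unicode category "Cc"), keeping newline and tab.
    cleaned = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    ).strip()
    if not cleaned:
        raise ValueError("no speakable text")
    # Text starting with '-' might be parsed as an option by the TTS command.
    if cleaned.startswith("-"):
        raise ValueError("text must not start with '-'")
    return cleaned


def build_say_command(wav_file: str, text: str) -> list:
    """Build the fallback command as a list so that no shell is involved."""
    return ["say", "-o", wav_file, prepare_tts_text(text)]
```

Rejecting a leading '-' also rejects legitimate input such as a dash-prefixed list item, so a real implementation might prefer to strip or pad the text instead.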

MEDIUM · 1 finding

mcp-tts.py:425

425    parser.add_argument("--s3-access-key", 
426                        help="Override S3 access key ID")
427    parser.add_argument("--s3-secret-key",
428                        help="Override S3 secret access key")

mcp-tts.py:415-436

// Local-only MCP, requires compromised LLM or local access to exploit.

EXPLAIN: The server accepts the AWS access key and secret key as command-line arguments. Credentials passed this way can be exposed in process listings, shell history, or logs. Additionally, when the server is started with these arguments, they are stored in environment variables and could be leaked through debug output or error messages.

IMPACT: An attacker with access to the system's process list or logs could obtain AWS credentials, leading to unauthorized access to S3 buckets and a potential data breach or resource abuse.

FIX: Remove the command-line arguments for credentials. Use environment variables or a secure credential store instead. If command-line arguments are necessary, ensure they are not logged and are cleared from memory after use.
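A rough sketch of the environment-variable approach follows. `S3Credentials` and `load_s3_credentials` are hypothetical names introduced for illustration; the variable names `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the ones that appear in the debug excerpt from mcp-tts.py.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3Credentials:
    # repr=False keeps both values out of logs and tracebacks that print the object.
    access_key_id: str = field(repr=False)
    secret_access_key: str = field(repr=False)


def load_s3_credentials(env: Optional[Mapping[str, str]] = None) -> S3Credentials:
    """Read credentials from the environment instead of from argv (hypothetical helper)."""
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID", "")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY", "")
    if not access_key or not secret_key:
        # The error names the variables but never echoes their values.
        raise RuntimeError(
            "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set"
        )
    return S3Credentials(access_key, secret_key)
```

Accepting the environment as a parameter is only there to make the function easy to test; a secure credential store would replace the lookup entirely.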

MEDIUM · 1 finding

kokoro_service.py:47

47            base_filename = os.path.splitext(output_file)[0]
48            wav_file = os.path.join(output_dir, f"{base_filename}.wav")
49            mp3_file = os.path.join(output_dir, output_file)

mcp-tts.py:71-74→kokoro_service.py:1-6

// Exploitable only if MCP is exposed to untrusted prompts and the `filename` parameter is not properly sanitized upstream.

EXPLAIN: The `output_file` parameter is used directly in `os.path.join` without sanitization. Although `secure_filename` is called in `mcp-tts.py` before the value is passed to `kokoro_service.py`, `kokoro_service.py` performs no validation of its own. If `secure_filename` is bypassed, or if the service is called from another context, an attacker could use path traversal sequences (e.g., '../') to write files outside the intended output directory.

IMPACT: An attacker could write MP3 files to arbitrary locations on the filesystem, potentially overwriting critical files or planting malicious files.

FIX: Apply `secure_filename` or similar sanitization within `kokoro_service.py` as well, or ensure that the output directory is enforced and the filename is validated before use.
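Enforcing the output directory inside the service could look something like the sketch below. `resolve_in_output_dir` is a hypothetical helper, not part of `kokoro_service.py`; it resolves the joined path and refuses anything that lands outside the directory.

```python
import os


def resolve_in_output_dir(output_dir: str, output_file: str) -> str:
    """Return the absolute output path, or raise if it escapes output_dir."""
    root = os.path.realpath(output_dir)
    # realpath() collapses '..' segments and follows symlinks before the check.
    candidate = os.path.realpath(os.path.join(root, output_file))
    # commonpath() equals root only when candidate is root itself or lies below it.
    if candidate == root or os.path.commonpath([root, candidate]) != root:
        raise ValueError("output path escapes the output directory")
    return candidate
```

A call such as `resolve_in_output_dir("out", "../x.mp3")` raises `ValueError`, as does an absolute `output_file`, because `os.path.join` discards the directory when its second argument is absolute.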

MEDIUM · 1 finding

mcp-tts.py:479

479        if args.debug or os.environ.get('DEBUG') == 'true' or os.environ.get('DEBUG') == '1':
480            print("Debug mode enabled")
481            print("Environment variables:")
482            for var in ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_S3_BUCKET_NAME', 'AWS_S3_REGION', 'AWS_S3_FOLDER', 'AWS_S3_ENDPOINT_URL']:
483                if os.environ.get(var):
484                    print(f"  {var}: {'*' * 10 if 'KEY' in var or 'SECRET' in var else os.environ.get(var)}")

mcp-tts.py:415-436

// Local-only MCP, requires compromised LLM or local access to exploit.

EXPLAIN: In debug mode, the server prints the AWS-related environment variables that are set. For any variable whose name contains 'KEY' or 'SECRET', the code prints ten asterisks (`'*' * 10`) in place of the value, so `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are masked in the excerpt shown. What remains exposed is which credentials are set, along with the unmasked values of the bucket name, region, folder, and endpoint URL. The masking is also purely name-based: a sensitive variable added to the list under a name that contains neither 'KEY' nor 'SECRET' would be printed in full.

IMPACT: An attacker who can view server logs or console output could learn which AWS credentials are configured and read the S3 bucket, region, folder, and endpoint settings. The credentials themselves would be exposed only if the masking logic failed or did not match a variable's name, which would lead to unauthorized S3 access.

FIX: Remove debug printing of sensitive environment variables entirely. If debug output is necessary, ensure credentials are never printed, even masked.
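A debug routine along these lines might look like the following sketch. `NON_SECRET_VARS` and `debug_env_lines` are hypothetical names, and treating the bucket, region, folder, and endpoint as safe to print is an assumption that depends on the deployment. The point illustrated is an explicit allowlist in place of name-based masking.

```python
from typing import List, Mapping

# Explicit allowlist (assumption): only these variables are ever printed.
# Credential variables are absent by design, so they are neither shown nor masked.
NON_SECRET_VARS = (
    "AWS_S3_BUCKET_NAME",
    "AWS_S3_REGION",
    "AWS_S3_FOLDER",
    "AWS_S3_ENDPOINT_URL",
)


def debug_env_lines(env: Mapping[str, str]) -> List[str]:
    """Build debug output from allowlisted variables only (hypothetical helper)."""
    lines = ["Debug mode enabled", "Environment variables:"]
    for name in NON_SECRET_VARS:
        value = env.get(name)
        if value:
            lines.append(f"  {name}: {value}")
    return lines
```

With an allowlist, a newly introduced secret stays hidden unless someone deliberately adds it to the tuple, which is the failure direction one would usually want.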

LOW · 1 finding

mcp-tts.py:295

295            text = request_data.get('text', '')
296            voice = request_data.get('voice', os.environ.get('TTS_VOICE', 'af_heart'))
297            speed = float(request_data.get('speed', 1.0))
298            lang = request_data.get('lang', 'en-us')
299            filename = request_data.get('filename', None)
300            upload_to_s3_flag = request_data.get('upload_to_s3', True)
301            
302            if not text:
303                return {"success": False, "error": "No text provided"}

mcp-tts.py:286-303

// Exploitable only if MCP is exposed to untrusted prompts.

EXPLAIN: The `text` parameter is checked only for emptiness; there is no validation of its length or content. An attacker could provide extremely long text, leading to resource exhaustion (denial of service) or potentially triggering faults in underlying libraries. Additionally, the `voice`, `speed`, and `lang` parameters are not validated, so unexpected values can be injected; for example, a non-numeric `speed` is passed straight to `float()`.

IMPACT: An attacker could cause denial of service by sending very long text or invalid parameter values that crash the TTS service.

FIX: Implement input validation: limit text length (e.g., 1000 characters), validate `voice` against a list of available voices, clamp `speed` to a reasonable range (e.g., 0.5-2.0), and validate the language code against an allowlist.
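The validation described in this fix could be sketched as below. The limits (1000 characters, 0.5-2.0) come from the recommendation itself; `validate_request` is a hypothetical name, and the allowed voices and languages are placeholders containing only the defaults visible in the excerpt (`af_heart`, `en-us`), since the real lists are not shown here.

```python
import math
from typing import Any, Dict, Mapping

MAX_TEXT_CHARS = 1000            # example limit from the recommendation
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # example range from the recommendation
ALLOWED_VOICES = {"af_heart"}    # placeholder: the excerpt shows only the default
ALLOWED_LANGS = {"en-us"}        # placeholder: the excerpt shows only the default


def validate_request(request_data: Mapping[str, Any]) -> Dict[str, Any]:
    """Validate and normalize the TTS request fields (hypothetical helper)."""
    text = request_data.get("text", "")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("No text provided")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")

    voice = request_data.get("voice", "af_heart")
    if not isinstance(voice, str) or voice not in ALLOWED_VOICES:
        raise ValueError("unknown voice")

    lang = request_data.get("lang", "en-us")
    if not isinstance(lang, str) or lang not in ALLOWED_LANGS:
        raise ValueError("unsupported language")

    try:
        speed = float(request_data.get("speed", 1.0))
    except (TypeError, ValueError):
        raise ValueError("speed must be a number") from None
    if math.isnan(speed):
        raise ValueError("speed must be a number")
    # Clamp rather than reject, as the recommendation suggests.
    speed = min(max(speed, MIN_SPEED), MAX_SPEED)

    return {"text": text, "voice": voice, "speed": speed, "lang": lang}
```

For instance, `validate_request({"text": "hi", "speed": 10})` returns a speed of 2.0, while a non-numeric speed or a 1001-character text raises `ValueError`, which the handler could turn into the same `{"success": False, "error": ...}` response it already uses.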

◷ 5/21/2026

Findings are produced by automated LLM analysis and may include false positives or miss issues. Verify independently before acting.