alexandreshah/pandoc-weasyprint-mcp
critical
MCP server for document conversion using Pandoc with the WeasyPrint PDF engine. Supports customizable fonts, styling, and multiple formats.
MCP server (purpose undetermined)
async def convert_file(args: dict) -> list[TextContent]:
    try:
        input_path = args["input_path"]
        output_path = args["output_path"]
        ...
        if not os.path.exists(input_path):
            return [TextContent(
                type="text",
                text=f"Error: Input file not found: {input_path}"
            )]
        ...
        pypandoc.convert_file(
            input_path,
            to_format if to_format else None,
            format=from_format,
            outputfile=temp_output,
            extra_args=extra_args
        )
        shutil.copy2(temp_output, output_path)
Exploitable when the MCP server is exposed to untrusted prompts (network_exposed) or driven by a compromised LLM (local_only).
The convert_file tool accepts an arbitrary input_path from the user without any validation or restriction. This allows reading any file on the system that the server process has access to, including sensitive files like /etc/passwd, SSH keys, or configuration files. The file is then converted and written to an attacker-controlled output_path, effectively exfiltrating the file contents.
Impact: An attacker can read arbitrary files from the server's filesystem, leading to information disclosure of sensitive data such as credentials, configuration files, or private keys.
Fix: Restrict input_path to a specific allowed directory (e.g., a dedicated workspace). Validate that the path is within an allowed scope and does not contain path traversal sequences. Alternatively, remove the convert_file tool if not essential.
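The restriction described in the fix could look roughly like the sketch below. It is an illustration only, not the project's code: the WORKSPACE location and the resolve_input_path helper are hypothetical names, and Path.is_relative_to assumes Python 3.9 or later.

```python
from pathlib import Path

# Hypothetical workspace root; the real server would choose its own location.
WORKSPACE = Path("/srv/mcp-workspace")


def resolve_input_path(user_path: str, workspace: Path = WORKSPACE) -> Path:
    """Return the resolved path if it lies inside workspace, else raise."""
    base = workspace.resolve()
    # Joining with an absolute user_path discards base, so "/etc/passwd"
    # stays "/etc/passwd" and fails the containment check below.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise PermissionError(f"input_path escapes workspace: {user_path}")
    return candidate
```

A handler would call resolve_input_path(args["input_path"]) before touching the file and return an error message to the caller when PermissionError is raised.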
async def convert_md_to_pdf(args: dict) -> list[TextContent]:
    try:
        markdown_content = args["markdown_content"]
        output_path = args["output_path"]
        if not os.path.isabs(output_path):
            output_path = os.path.join("/tmp", output_path)
        ...
        output_dir = os.path.dirname(os.path.abspath(output_path))
        if output_dir and not os.path.exists(output_dir):
            os.makedirs(output_dir, exist_ok=True)
        ...
        shutil.copy2(temp_output, output_path)
Exploitable when the MCP server is exposed to untrusted prompts (network_exposed) or driven by a compromised LLM (local_only).
Both convert_md_to_pdf and convert_file accept an output_path parameter that is used to write the converted file. While relative paths are prepended with /tmp, absolute paths are used as-is. This allows writing files to arbitrary locations on the filesystem, such as overwriting system files, writing to startup directories, or placing malicious files in shared directories.
Impact: An attacker can write arbitrary files to any location the server process has write access to, potentially leading to code execution (e.g., overwriting scripts, adding cron jobs, modifying configuration files) or data corruption.
Fix: Restrict output_path to a specific allowed directory (e.g., a dedicated output folder). Validate that the resolved path is within the allowed scope and does not contain path traversal. Avoid using absolute paths from user input.
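A minimal sketch of that fix follows, assuming a POSIX filesystem. OUTPUT_ROOT and safe_output_path are hypothetical names introduced for illustration. The helper refuses absolute paths outright and only creates parent directories once the target is known to sit under the output root.

```python
import os

# Hypothetical output root; an assumption for illustration.
OUTPUT_ROOT = "/tmp/mcp-output"


def safe_output_path(user_path: str, output_root: str = OUTPUT_ROOT) -> str:
    """Map a caller-supplied relative path to a location under output_root."""
    if os.path.isabs(user_path):
        raise ValueError("absolute output paths are not accepted")
    root = os.path.realpath(output_root)
    target = os.path.realpath(os.path.join(root, user_path))
    # commonpath compares whole path components, so "/tmp/mcp-output-evil"
    # is not mistaken for a child of "/tmp/mcp-output".
    if os.path.commonpath([root, target]) != root:
        raise ValueError("output path escapes the output directory")
    if target == root:
        raise ValueError("output path must name a file")
    # Parent directories are created only after the containment check passes.
    os.makedirs(os.path.dirname(target), exist_ok=True)
    return target
```

The returned path would then replace the raw output_path in the final shutil.copy2 call.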
pypandoc.convert_text(
    markdown_content,
    'pdf',
    format='markdown',
    outputfile=temp_output,
    extra_args=[
        '--pdf-engine=weasyprint',
        f'--css={css_path}',
        '--sandbox=false'
    ]
)
Exploitable when the MCP server is exposed to untrusted prompts (network_exposed) or driven by a compromised LLM (local_only).
In the call above, the --pdf-engine argument is hardcoded to 'weasyprint', and css_path is a temporary file path derived from user-provided CSS content. The convert_file tool, however, lets the caller specify pdf_engine (line 340). That value is passed directly to pandoc as --pdf-engine={pdf_engine}, so it can be set to an arbitrary executable path, and pandoc runs that executable when it invokes the PDF engine. The result is arbitrary command execution.
Impact: An attacker can execute arbitrary commands on the server by specifying a malicious executable as the PDF engine, leading to full system compromise.
Fix: Validate the pdf_engine parameter against an allowlist of permitted engines (e.g., 'weasyprint', 'pdflatex'). Avoid passing user-controlled values directly to command-line arguments.
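The allowlist check can be very small. The sketch below is an assumption about how it might be wired in: ALLOWED_PDF_ENGINES and pdf_engine_arg are hypothetical names, and the two engine names are simply the examples given in the fix text.

```python
# Engine names taken from the fix text; the real set is a project decision.
ALLOWED_PDF_ENGINES = frozenset({"weasyprint", "pdflatex"})


def pdf_engine_arg(pdf_engine: str = "weasyprint") -> str:
    """Build the --pdf-engine argument only for engines on the allowlist."""
    # Exact-match membership rejects paths ("/tmp/evil.sh") and any value
    # carrying extra text after a permitted name.
    if pdf_engine not in ALLOWED_PDF_ENGINES:
        raise ValueError(f"unsupported pdf_engine: {pdf_engine!r}")
    return f"--pdf-engine={pdf_engine}"
```

Because the comparison is an exact match against bare engine names, a caller cannot pass a filesystem path or smuggle additional options through this parameter.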
input_path = args["input_path"]
output_path = args["output_path"]
...
if not os.path.exists(input_path):
    return [TextContent(
        type="text",
        text=f"Error: Input file not found: {input_path}"
    )]
Exploitable when the MCP server is exposed to untrusted prompts (network_exposed) or driven by a compromised LLM (local_only).
The input_path is accepted with no validation beyond an existence check: there is no restriction on which directories can be read, no normalization, and no check for symlinks or special files. Any file readable by the server process can therefore be read, including sensitive ones.
Impact: An attacker can read arbitrary files from the filesystem, leading to information disclosure.
Fix: Restrict input_path to a specific allowed directory. Use os.path.realpath to resolve symlinks and verify the path is within the allowed scope.
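To illustrate why symlinks and special files need their own checks, here is a hedged sketch using os.path.realpath. The check_input_file helper and its allowed_dir parameter are hypothetical, and a POSIX filesystem is assumed. A symlink that sits inside the allowed directory but points elsewhere is rejected because the comparison is made on the resolved target.

```python
import os


def check_input_file(user_path: str, allowed_dir: str) -> str:
    """Resolve symlinks, then require a regular file inside allowed_dir."""
    base = os.path.realpath(allowed_dir)
    real = os.path.realpath(os.path.join(base, user_path))
    # Compare the resolved target, not the name the caller supplied.
    if os.path.commonpath([base, real]) != base:
        raise PermissionError("input_path resolves outside the allowed directory")
    # isfile() is False for directories, FIFOs, sockets and device nodes.
    if not os.path.isfile(real):
        raise ValueError("input_path is not a regular file")
    return real
```

Note that a check followed by a later open leaves a small race window; opening the file once and validating the opened descriptor would be stricter, at the cost of more code.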
if not os.path.isabs(output_path):
    output_path = os.path.join("/tmp", output_path)
Exploitable when the MCP server is exposed to untrusted prompts (network_exposed) or driven by a compromised LLM (local_only).
When output_path is relative, it is joined with /tmp, but no normalization or traversal check is performed. An attacker could supply a relative path such as '../../etc/cron.d/malicious'. Joined with /tmp, this becomes '/tmp/../../etc/cron.d/malicious', which resolves to '/etc/cron.d/malicious', so the file is written outside the intended /tmp directory.
Impact: An attacker can write files to arbitrary locations by using path traversal sequences in the output_path parameter, even when providing a relative path.
Fix: Normalize the output path using os.path.realpath or os.path.abspath and verify that it lies inside the allowed base directory (e.g., /tmp), comparing whole path components rather than a raw string prefix so that a sibling such as /tmpfoo is not accepted. Reject paths that contain '..' or that pass through symbolic links escaping the allowed scope.
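The traversal can be reproduced with pure string operations, as sketched below. posixpath is used so the result does not depend on the host platform, and is_under is a hypothetical helper. normpath does not resolve symlinks, so on a real filesystem os.path.realpath would be the stricter choice.

```python
import posixpath

# Reproduce the join performed by the vulnerable code.
joined = posixpath.join("/tmp", "../../etc/cron.d/malicious")
resolved = posixpath.normpath(joined)
# joined   == "/tmp/../../etc/cron.d/malicious"
# resolved == "/etc/cron.d/malicious"


def is_under(base: str, path: str) -> bool:
    """True if the normalized path is base itself or lies below it.

    Both arguments are expected to be absolute paths.
    """
    base = posixpath.normpath(base)
    path = posixpath.normpath(path)
    return posixpath.commonpath([base, path]) == base


# A raw prefix test would accept a sibling directory such as "/tmpfoo".
prefix_says = "/tmpfoo/x.pdf".startswith("/tmp")    # True (wrong answer)
component_says = is_under("/tmp", "/tmpfoo/x.pdf")  # False
```

With this check, is_under("/tmp", resolved) is False for the traversal payload, while an ordinary target such as "/tmp/out.pdf" passes.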