javiramos1/qagent-mcp

high

Simple AI Agent to answer questions for specific domains based on website docs by connecting to a remote MCP server

purpose: MCP server (purpose undetermined)

threat: network exposed

Python · ★ 3 · ◷ May 20, 2026 · ⚙ May 20, 2026 · GITHUB

◆Vulnerability Analysis [3 findings in 3 blocks]

◷ 5/20/2026

high · 1 finding

mcp_server.py

@mcp.tool
async def scrape_website(
    url: str,
    tags_to_extract: Optional[List[str]] = None
) -> str:
    """Scrape complete website content using Chromium browser..."""
    try:
        if tags_to_extract is None:
            tags_to_extract = get_default_tags()
        logger.info(f"🌐 Scraping: {url}")
        loader = AsyncChromiumLoader([url])
        html_docs = await asyncio.to_thread(loader.load)

mcp_server.py:212

// Exploitable if MCP is exposed to untrusted prompts (network_exposed). For local-only, requires compromised LLM.

The scrape_website tool passes the caller-supplied URL to AsyncChromiumLoader without any validation or sanitization. An attacker can therefore supply arbitrary URLs, including internal network addresses (e.g., http://169.254.169.254/latest/meta-data/ for cloud metadata, http://localhost:8000/ for internal services). Because the server itself fetches the URL, this enables Server-Side Request Forgery (SSRF).

Impact: An attacker could use this tool to probe hosts and ports on the server's internal network, query cloud metadata endpoints, and read responses from internal services. This could lead to information disclosure or further compromise of internal systems.

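Fix: Validate the URL before fetching it. Accept only the https scheme, and reject any host that is, or resolves to, an internal address: loopback (e.g., 127.0.0.1), private ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x), and link-local (169.254.x.x, which includes the cloud metadata address cited above). Parse the URL with a URL parsing library rather than matching strings, and prefer an allowlist of permitted domains over a blocklist.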
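The check described in this fix could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the repository: `is_safe_url`, `_is_internal`, and `_resolve` are hypothetical names, and the `resolve` parameter exists only so the DNS lookup can be swapped out in tests.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlsplit


def _is_internal(ip_text: str) -> bool:
    """Return True for addresses that should never be fetched server-side."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return True  # unparseable address: treat as unsafe
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def _resolve(host: str) -> List[str]:
    """Resolve a hostname to every address it maps to (hypothetical helper)."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_safe_url(url: str, resolve: Callable[[str], List[str]] = _resolve) -> bool:
    """Hypothetical pre-fetch check: https only, no internal destinations."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme != "https" or not parts.hostname:
        return False
    host = parts.hostname
    try:
        # IP literal in the URL: check it directly.
        addresses = [str(ipaddress.ip_address(host))]
    except ValueError:
        # Hostname: check every address it resolves to.
        try:
            addresses = resolve(host)
        except OSError:
            return False
    return bool(addresses) and not any(_is_internal(a) for a in addresses)
```

A tool such as scrape_website could call this before constructing the loader and return an error when it fails. Note the limits of the sketch: the browser performs its own DNS lookup and follows redirects, so a check-then-fetch guard like this one does not by itself stop DNS rebinding or a redirect to an internal address. A domain allowlist or network-level egress rules would still be needed for that.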

medium · 1 finding

mcp_server.py

@mcp.tool
async def scrape_website(
    url: str,
    tags_to_extract: Optional[List[str]] = None
) -> str:
    """Scrape complete website content using Chromium browser..."""
    ...
    loader = AsyncChromiumLoader([url])
    html_docs = await asyncio.to_thread(loader.load)

mcp_server.py:212

// Exploitable if MCP is exposed to untrusted prompts (network_exposed). For local-only, requires compromised LLM.

The scrape_website tool is intended for scraping documentation websites, but it accepts any URL without restriction. This allows the tool to be used to scrape arbitrary external websites, which goes beyond the documented purpose of scraping specific domains for Q&A. An attacker could use this to fetch content from any public website, potentially for data exfiltration or reconnaissance.

Impact: An attacker could use this tool to scrape arbitrary websites, potentially exfiltrating data or performing reconnaissance on external services. While not as severe as SSRF, it expands the attack surface beyond intended use.

Fix: Restrict the scrape_website tool to only allow URLs from domains that are relevant to the Q&A agent's purpose (e.g., official documentation sites). Implement a domain allowlist.
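A domain allowlist of the kind suggested here might be sketched as follows. The names `ALLOWED_DOC_DOMAINS` and `host_is_allowed` are hypothetical, and the domains are placeholders; the real list would depend on which documentation sites the agent is meant to cover.

```python
from typing import Iterable
from urllib.parse import urlsplit

# Placeholder allowlist (assumption): replace with the agent's real doc sites.
ALLOWED_DOC_DOMAINS = frozenset({"docs.example.com", "example.org"})


def host_is_allowed(url: str, allowed: Iterable[str] = ALLOWED_DOC_DOMAINS) -> bool:
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return False
    if not host:
        return False
    host = host.lower().rstrip(".")
    # Compare on label boundaries so "evil-example.org" does not match "example.org".
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```

Matching on the parsed hostname, rather than searching the raw URL string, is what keeps lookalikes such as `docs.example.com.attacker.net` or a query string containing an allowed domain from passing.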

medium · 1 finding

mcp_server.py

@mcp.tool
async def search_documentation(
    query: str,
    sites: List[str],
    max_results: Optional[int] = None,
    depth: Optional[str] = None
) -> str:
    ...
    search_results = await asyncio.to_thread(
        server_clients.tavily_client.search,
        query=query,
        max_results=final_max_results,
        search_depth=final_depth,
        include_domains=sites,
    )

mcp_server.py:132

// Exploitable if MCP is exposed to untrusted prompts (network_exposed). For local-only, requires compromised LLM.

The search_documentation tool accepts a list of sites (domains) without any validation. An attacker could provide arbitrary domains, potentially causing the Tavily search to include malicious or unintended sites. While Tavily itself may have some protections, the lack of validation allows the LLM to be tricked into searching untrusted domains, which could lead to prompt injection or retrieval of malicious content.

Impact: An attacker could manipulate the search to include attacker-controlled domains, potentially injecting malicious content into the search results that could influence the LLM's behavior or leak sensitive information through the search query.

Fix: Validate that each site in the sites list matches a pattern of allowed domains (e.g., official documentation domains). Consider restricting to a predefined list or using a domain allowlist.
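One way to apply this fix is to normalize and check the sites argument before it reaches the search call. The sketch below is illustrative only: `ALLOWED_SITES` and `validate_sites` are hypothetical names, and the domains are placeholders. It rejects an empty list on the assumption that an empty domain filter would leave the search unrestricted; that behavior should be confirmed against the search client before relying on it.

```python
from typing import Iterable, List

# Placeholder allowlist (assumption): replace with the agent's real doc sites.
ALLOWED_SITES = frozenset({"docs.example.com", "example.org"})


def validate_sites(sites: Iterable[str], allowed: Iterable[str] = ALLOWED_SITES) -> List[str]:
    """Return normalized, de-duplicated sites, or raise if any is not permitted."""
    allowed_set = set(allowed)
    cleaned: List[str] = []
    for site in sites:
        domain = site.strip().lower().rstrip(".")
        if domain not in allowed_set:
            raise ValueError(f"site not permitted: {site!r}")
        if domain not in cleaned:
            cleaned.append(domain)
    if not cleaned:
        raise ValueError("at least one permitted site is required")
    return cleaned
```

Raising on the first disallowed entry, instead of silently dropping it, makes a manipulated tool call visible in logs and to the calling agent.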

◆Heuristic Signals

shell.exec, browser.automation, env.exposure, filesystem.read

◆Risk Score

LLM-based

high findings: +25

medium findings: +30