javiramos1/qagent-mcp
high
Simple AI Agent to answer questions for specific domains based on website docs by connecting to a remote MCP server
MCP server (purpose undetermined)
@mcp.tool
async def scrape_website(
    url: str,
    tags_to_extract: Optional[List[str]] = None
) -> str:
    """Scrape complete website content using Chromium browser..."""
    try:
        if tags_to_extract is None:
            tags_to_extract = get_default_tags()
        logger.info(f"🌐 Scraping: {url}")
        loader = AsyncChromiumLoader([url])
        html_docs = await asyncio.to_thread(loader.load)

Exploitable if MCP is exposed to untrusted prompts (network_exposed). For local-only, requires compromised LLM.
The scrape_website tool accepts a URL from the user without any validation or sanitization. An attacker can provide arbitrary URLs, including internal network addresses (e.g., http://169.254.169.254/latest/meta-data/ for cloud metadata, http://localhost:8000/ for internal services). The tool uses AsyncChromiumLoader which fetches the URL, enabling Server-Side Request Forgery (SSRF).
Impact: An attacker could use this tool to probe hosts and ports on the server's internal network, query cloud metadata endpoints, and read responses from internal services. This could lead to information disclosure or further compromise of internal systems.
Fix: Implement a URL allowlist or blocklist. Validate that the URL scheme is https and that the host is not an internal address (e.g., 127.0.0.1, 10.x.x.x, 172.16-31.x.x, 192.168.x.x, or the link-local range 169.254.x.x used by cloud metadata endpoints). Use a URL parsing library to extract the host and check it against a list of allowed domains.
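The checks described in this fix could be sketched as a pre-flight validator along the following lines. This is a minimal sketch under stated assumptions, not the project's code: is_safe_url, is_internal, and resolve_host are hypothetical names, and the resolver is injectable so the logic can be exercised without network access. Because Chromium performs its own DNS resolution and follows redirects, a check like this reduces SSRF risk but does not eliminate it.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List
from urllib.parse import urlsplit


def resolve_host(host: str) -> List[str]:
    """Resolve a hostname to IP address strings with the system resolver."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_internal(address: str) -> bool:
    """Return True for addresses the scraper should never fetch."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return True  # unparseable address: treat as unsafe
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def is_safe_url(
    url: str,
    resolver: Callable[[str], Iterable[str]] = resolve_host,
) -> bool:
    """Hypothetical helper: accept only https URLs whose host resolves
    exclusively to public addresses."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        addresses = list(resolver(parts.hostname))
    except (OSError, UnicodeError):
        return False  # resolution failed: reject rather than guess
    # Every resolved address must be public; one internal address is enough to reject.
    return bool(addresses) and not any(is_internal(a) for a in addresses)
```

In scrape_website, a guard of this kind would run before AsyncChromiumLoader is constructed, returning an error string to the caller when the URL is rejected.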
@mcp.tool
async def scrape_website(
    url: str,
    tags_to_extract: Optional[List[str]] = None
) -> str:
    """Scrape complete website content using Chromium browser..."""
    ...
    loader = AsyncChromiumLoader([url])
    html_docs = await asyncio.to_thread(loader.load)

Exploitable if MCP is exposed to untrusted prompts (network_exposed). For local-only, requires compromised LLM.
The scrape_website tool is intended for scraping documentation websites, but it accepts any URL without restriction. This allows the tool to be used to scrape arbitrary external websites, which goes beyond the documented purpose of scraping specific domains for Q&A. An attacker could use this to fetch content from any public website, potentially for data exfiltration or reconnaissance.
Impact: An attacker could use this tool to scrape arbitrary websites, potentially exfiltrating data or performing reconnaissance on external services. While not as severe as SSRF, it expands the attack surface beyond intended use.
Fix: Restrict the scrape_website tool to only allow URLs from domains that are relevant to the Q&A agent's purpose (e.g., official documentation sites). Implement a domain allowlist.
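A domain allowlist of the kind suggested here might look like the sketch below. The names ALLOWED_DOC_DOMAINS and is_allowed_doc_url are hypothetical, and the listed domains are placeholders; the real set depends on which documentation sites the agent is meant to cover. The match is on whole DNS labels, so a lookalike such as example.org.attacker.test is not accepted as a subdomain of example.org.

```python
from typing import Iterable
from urllib.parse import urlsplit

# Placeholder allowlist (assumption): replace with the documentation
# domains the Q&A agent is actually configured for.
ALLOWED_DOC_DOMAINS = frozenset({"docs.example.com", "example.org"})


def is_allowed_doc_url(
    url: str,
    allowed: Iterable[str] = ALLOWED_DOC_DOMAINS,
) -> bool:
    """Hypothetical helper: True if the URL's host is an allowed domain
    or a subdomain of one."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return False
    if not host:
        return False
    host = host.rstrip(".").lower()
    # Compare on label boundaries: exact match, or suffix preceded by a dot.
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Matching on the parsed hostname, not on a substring of the raw URL, avoids bypasses where an allowed domain appears in the path or query string.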
@mcp.tool
async def search_documentation(
    query: str,
    sites: List[str],
    max_results: Optional[int] = None,
    depth: Optional[str] = None
) -> str:
    ...
    search_results = await asyncio.to_thread(
        server_clients.tavily_client.search,
        query=query,
        max_results=final_max_results,
        search_depth=final_depth,
        include_domains=sites,
    )

Exploitable if MCP is exposed to untrusted prompts (network_exposed). For local-only, requires compromised LLM.
The search_documentation tool accepts a list of sites (domains) without any validation. An attacker could provide arbitrary domains, potentially causing the Tavily search to include malicious or unintended sites. While Tavily itself may have some protections, the lack of validation allows the LLM to be tricked into searching untrusted domains, which could lead to prompt injection or retrieval of malicious content.
Impact: An attacker could manipulate the search to include attacker-controlled domains, potentially injecting malicious content into the search results that could influence the LLM's behavior or leak sensitive information through the search query.
Fix: Validate that each site in the sites list matches a pattern of allowed domains (e.g., official documentation domains). Consider restricting to a predefined list or using a domain allowlist.
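One way to apply this fix is to normalize and check the sites argument before it reaches the search call. The sketch below is illustrative only: validate_sites and ALLOWED_SITES are hypothetical names, the domains are placeholders, and rejecting an empty list rests on the assumption that an empty domain filter would leave the search unrestricted.

```python
import re
from typing import Iterable, List

# Placeholder allowlist (assumption): the documentation domains the agent may search.
ALLOWED_SITES = frozenset({"docs.example.com", "example.org"})

# Bare hostname syntax: dot-separated labels of letters, digits, and hyphens.
_DOMAIN_RE = re.compile(
    r"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def validate_sites(
    sites: Iterable[str],
    allowed: Iterable[str] = ALLOWED_SITES,
) -> List[str]:
    """Hypothetical helper: return normalized sites, raising ValueError
    on any entry that is malformed or outside the allowlist."""
    allowed_set = set(allowed)
    cleaned = []
    for site in sites:
        domain = site.strip().lower()
        if not _DOMAIN_RE.match(domain):
            raise ValueError(f"not a bare domain: {site!r}")
        if domain not in allowed_set:
            raise ValueError(f"domain not allowed: {site!r}")
        cleaned.append(domain)
    if not cleaned:
        # Assumption: an empty filter would mean "search everywhere".
        raise ValueError("at least one allowed site is required")
    return cleaned
```

In search_documentation, the validated list would then be passed as include_domains=validate_sites(sites) in place of the raw argument.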