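A Model Context Protocol (MCP) server implementation that integrates with Firecrawl to provide web scraping capabilities.
Special thanks to @vrknetha and @cawstudios for the initial implementation!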
env FIRECRAWL_API_KEY=fc-YOUR_API_KEY npx -y firecrawl-mcp

npm install -g firecrawl-mcp
Configuring Cursor 🖥️
Note: Requires Cursor version 0.45.6+
To configure FireCrawl MCP in Cursor:
env FIRECRAWL_API_KEY=your-api-key npx -y firecrawl-mcp
If you are on Windows and run into issues, try:
cmd /c "set FIRECRAWL_API_KEY=your-api-key && npx -y firecrawl-mcp"
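Replace your-api-key with your FireCrawl API key.
After adding the server, refresh the MCP server list to see the new tools. The Composer Agent will use FireCrawl MCP automatically when appropriate, but you can also request it explicitly by describing your web scraping needs. Open the Composer with Command+L (Mac), select "Agent" next to the submit button, and enter your query.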
Add the following to your ./codeium/windsurf/model_config.json file:
{
"mcpServers": {
"mcp-server-firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE"
}
}
}
}

To install FireCrawl for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @mendableai/mcp-server-firecrawl --client claude
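FIRECRAWL_API_KEY: Your FireCrawl API key. Optional when using a self-hosted instance with FIRECRAWL_API_URL.
FIRECRAWL_API_URL (optional): Custom API endpoint for a self-hosted instance, for example https://firecrawl.your-domain.com
FIRECRAWL_RETRY_MAX_ATTEMPTS: Maximum number of retry attempts (default: 3)
FIRECRAWL_RETRY_INITIAL_DELAY: Initial delay before the first retry, in milliseconds (default: 1000)
FIRECRAWL_RETRY_MAX_DELAY: Maximum delay between retries, in milliseconds (default: 10000)
FIRECRAWL_RETRY_BACKOFF_FACTOR: Exponential backoff multiplier (default: 2)
FIRECRAWL_CREDIT_WARNING_THRESHOLD: Credit usage warning threshold (default: 1000)
FIRECRAWL_CREDIT_CRITICAL_THRESHOLD: Credit usage critical threshold (default: 100)

For the cloud API with custom retry and credit monitoring: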
# Required for cloud API
export FIRECRAWL_API_KEY=your-api-key
# Optional retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=5 # Increase max retry attempts
export FIRECRAWL_RETRY_INITIAL_DELAY=2000 # Start with 2s delay
export FIRECRAWL_RETRY_MAX_DELAY=30000 # Maximum 30s delay
export FIRECRAWL_RETRY_BACKOFF_FACTOR=3 # More aggressive backoff
# Optional credit monitoring
export FIRECRAWL_CREDIT_WARNING_THRESHOLD=2000 # Warning at 2000 credits
export FIRECRAWL_CREDIT_CRITICAL_THRESHOLD=500 # Critical at 500 credits

For a self-hosted instance:
# Required for self-hosted
export FIRECRAWL_API_URL=https://firecrawl.your-domain.com
# Optional authentication for self-hosted
export FIRECRAWL_API_KEY=your-api-key # If your instance requires auth
# Custom retry configuration
export FIRECRAWL_RETRY_MAX_ATTEMPTS=10
export FIRECRAWL_RETRY_INITIAL_DELAY=500 # Start with faster retries
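
The two setups above differ in which variable is mandatory: the cloud API needs FIRECRAWL_API_KEY, while a self-hosted instance needs FIRECRAWL_API_URL and treats the key as optional. The sketch below illustrates that rule. The helper name `resolveFirecrawlEndpoint` and the `Endpoint` shape are hypothetical and are not part of firecrawl-mcp; the environment is passed in as a plain object so the function is easy to try out.

```typescript
// Hypothetical helper, not part of firecrawl-mcp. It illustrates the rule that
// FIRECRAWL_API_KEY is required for the cloud API but optional when
// FIRECRAWL_API_URL points at a self-hosted instance.
type Env = Record<string, string | undefined>;

interface Endpoint {
  mode: "cloud" | "self-hosted";
  apiUrl: string | undefined; // only set for self-hosted instances
  apiKey: string | undefined; // may be absent for self-hosted instances
}

function resolveFirecrawlEndpoint(env: Env): Endpoint {
  const apiUrl = env["FIRECRAWL_API_URL"];
  const apiKey = env["FIRECRAWL_API_KEY"];

  if (apiUrl) {
    // Self-hosted: the key is passed along only if one was provided.
    return { mode: "self-hosted", apiUrl, apiKey };
  }
  if (!apiKey) {
    throw new Error("FIRECRAWL_API_KEY is required when FIRECRAWL_API_URL is not set");
  }
  return { mode: "cloud", apiUrl: undefined, apiKey };
}
```

In a Node.js process this could be called as `resolveFirecrawlEndpoint(process.env)`.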

Add the following to your claude_desktop_config.json:
{
"mcpServers": {
"mcp-server-firecrawl": {
"command": "npx",
"args": ["-y", "firecrawl-mcp"],
"env": {
"FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE",
"FIRECRAWL_RETRY_MAX_ATTEMPTS": "5",
"FIRECRAWL_RETRY_INITIAL_DELAY": "2000",
"FIRECRAWL_RETRY_MAX_DELAY": "30000",
"FIRECRAWL_RETRY_BACKOFF_FACTOR": "3",
"FIRECRAWL_CREDIT_WARNING_THRESHOLD": "2000",
"FIRECRAWL_CREDIT_CRITICAL_THRESHOLD": "500"
}
}
}
}
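
Note that every value in the "env" block is a string, including the numeric settings. The sketch below shows one way such strings could be turned into numbers, using the defaults from the environment variable list whenever a value is missing or not numeric. `numberFromEnv` and `loadConfig` are hypothetical names used for illustration and may not match how firecrawl-mcp reads its configuration.

```typescript
// Hypothetical loader (not firecrawl-mcp's actual code): converts the string
// values from an "env" block into numbers and falls back to the documented
// default when a variable is missing, empty, or not a finite number.
type Env = Record<string, string | undefined>;

function numberFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

function loadConfig(env: Env) {
  return {
    retry: {
      maxAttempts: numberFromEnv(env, "FIRECRAWL_RETRY_MAX_ATTEMPTS", 3),
      initialDelay: numberFromEnv(env, "FIRECRAWL_RETRY_INITIAL_DELAY", 1000),
      maxDelay: numberFromEnv(env, "FIRECRAWL_RETRY_MAX_DELAY", 10000),
      backoffFactor: numberFromEnv(env, "FIRECRAWL_RETRY_BACKOFF_FACTOR", 2),
    },
    credit: {
      warningThreshold: numberFromEnv(env, "FIRECRAWL_CREDIT_WARNING_THRESHOLD", 1000),
      criticalThreshold: numberFromEnv(env, "FIRECRAWL_CREDIT_CRITICAL_THRESHOLD", 100),
    },
  };
}
```

Falling back silently is a design choice made for this sketch; a stricter loader might reject invalid values instead.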

The server includes several configurable parameters that can be set via environment variables. If not configured, these defaults apply:
const CONFIG = {
retry: {
maxAttempts: 3, // Number of retry attempts for rate-limited requests
initialDelay: 1000, // Initial delay before first retry (in milliseconds)
maxDelay: 10000, // Maximum delay between retries (in milliseconds)
backoffFactor: 2, // Multiplier for exponential backoff
},
credit: {
warningThreshold: 1000, // Warn when credit usage reaches this level
criticalThreshold: 100, // Critical alert when credit usage reaches this level
},
};

These settings control:
Retry behavior
Credit usage monitoring
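
To make the retry settings concrete, the sketch below computes the wait before each retry. It assumes the common formula delay(n) = min(initialDelay × backoffFactor^(n − 1), maxDelay) for the n-th retry; the README does not state the exact formula the server uses, so treat this as an illustration. `RetryConfig` and `retryDelays` are names introduced here.

```typescript
// Assumed formula (not confirmed by the README):
//   delay(n) = min(initialDelay * backoffFactor^(n - 1), maxDelay)
// where n is the 1-based retry number.
interface RetryConfig {
  maxAttempts: number;
  initialDelay: number; // milliseconds
  maxDelay: number; // milliseconds
  backoffFactor: number;
}

function retryDelays(config: RetryConfig): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt <= config.maxAttempts; attempt++) {
    const uncapped = config.initialDelay * Math.pow(config.backoffFactor, attempt - 1);
    delays.push(Math.min(uncapped, config.maxDelay));
  }
  return delays;
}

// Default values from the configuration above.
const defaultRetry: RetryConfig = { maxAttempts: 3, initialDelay: 1000, maxDelay: 10000, backoffFactor: 2 };
console.log(retryDelays(defaultRetry)); // [ 1000, 2000, 4000 ]
```

Under the same assumption, the custom cloud settings shown earlier (5 attempts, 2000 ms initial delay, 30000 ms maximum, factor 3) would give 2000, 6000, 18000, 30000, 30000, with the last two values capped by the maximum delay.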
The server uses FireCrawl's built-in rate limiting and batch processing capabilities.
firecrawl_scrape: Scrapes content from a single URL with advanced options.
{
"name": "firecrawl_scrape",
"arguments": {
"url": "https://example.com",
"formats": ["markdown"],
"onlyMainContent": true,
"waitFor": 1000,
"timeout": 30000,
"mobile": false,
"includeTags": ["article", "main"],
"excludeTags": ["nav", "footer"],
"skipTlsVerification": false
}
}
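
The tool examples in this document show only the name and arguments of a call. Over the wire, an MCP client sends these as the params of a JSON-RPC 2.0 "tools/call" request. The sketch below wraps a call accordingly; the helper `toJsonRpcRequest` and the simple id counter are illustrative and not part of firecrawl-mcp, and most MCP client libraries build this envelope for you.

```typescript
// Wraps a tool invocation in the JSON-RPC 2.0 "tools/call" request that MCP
// clients send. The helper name and the id counter are illustrative only.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

let nextId = 1;

function toJsonRpcRequest(call: ToolCall) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: call.name, arguments: call.arguments },
  };
}

const request = toJsonRpcRequest({
  name: "firecrawl_scrape",
  arguments: { url: "https://example.com", formats: ["markdown"], onlyMainContent: true },
});
console.log(JSON.stringify(request, null, 2));
```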

firecrawl_batch_scrape: Scrapes multiple URLs efficiently, with built-in rate limiting and parallel processing.
{
"name": "firecrawl_batch_scrape",
"arguments": {
"urls": ["https://example1.com", "https://example2.com"],
"options": {
"formats": ["markdown"],
"onlyMainContent": true
}
}
}

The response includes an operation ID for status checks:
{
"content": [
{
"type": "text",
"text": "Batch operation queued with ID: batch_1. Use firecrawl_check_batch_status to check progress."
}
],
"isError": false
}

firecrawl_check_batch_status: Checks the status of a batch operation.
{
"name": "firecrawl_check_batch_status",
"arguments": {
"id": "batch_1"
}
}
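
The batch scrape and status check fit together as a two-step flow: read the operation ID from the queued message, then pass it to firecrawl_check_batch_status. The sketch below shows that hand-off. It assumes the message wording from the example response above, which may differ between versions; `extractBatchId` and `statusCheckFor` are hypothetical helpers.

```typescript
// Illustrative helpers: pull the operation ID out of the queued-batch message
// and build the matching firecrawl_check_batch_status call. The message format
// is taken from the example response and is an assumption.
interface ToolResponse {
  content: { type: string; text: string }[];
  isError: boolean;
}

function extractBatchId(response: ToolResponse): string | null {
  for (const item of response.content) {
    const match = /queued with ID: ([A-Za-z0-9_-]+)/.exec(item.text);
    if (match) return match[1] ?? null;
  }
  return null;
}

function statusCheckFor(id: string) {
  return { name: "firecrawl_check_batch_status", arguments: { id } };
}

const queued: ToolResponse = {
  content: [
    {
      type: "text",
      text: "Batch operation queued with ID: batch_1. Use firecrawl_check_batch_status to check progress.",
    },
  ],
  isError: false,
};

const batchId = extractBatchId(queued);
if (batchId !== null) {
  console.log(JSON.stringify(statusCheckFor(batchId)));
}
```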

firecrawl_search: Searches the web and optionally extracts content from the search results.
{
"name": "firecrawl_search",
"arguments": {
"query": "your search query",
"limit": 5,
"lang": "en",
"country": "us",
"scrapeOptions": {
"formats": ["markdown"],
"onlyMainContent": true
}
}
}

firecrawl_crawl: Starts an asynchronous crawl with advanced options.
{
"name": "firecrawl_crawl",
"arguments": {
"url": "https://example.com",
"maxDepth": 2,
"limit": 100,
"allowExternalLinks": false,
"deduplicateSimilarURLs": true
}
}

firecrawl_extract: Extracts structured information from web pages using LLM capabilities. Supports both cloud AI and self-hosted LLM extraction.
{
"name": "firecrawl_extract",
"arguments": {
"urls": ["https://example.com/page1", "https://example.com/page2"],
"prompt": "Extract product information including name, price, and description",
"systemPrompt": "You are a helpful assistant that extracts product information",
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"price": { "type": "number" },
"description": { "type": "string" }
},
"required": ["name", "price"]
},
"allowExternalLinks": false,
"enableWebSearch": false,
"includeSubdomains": false
}
}

Example response:
{
"content": [
{
"type": "text",
"text": {
"name": "Example Product",
"price": 99.99,
"description": "This is an example product description"
}
}
],
"isError": false
}

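urls: Array of URLs to extract information from
prompt: Custom prompt for the LLM extraction
systemPrompt: System prompt to guide the LLM
schema: JSON schema for structured data extraction
allowExternalLinks: Allow extraction from external links
enableWebSearch: Enable web search for additional context
includeSubdomains: Include subdomains in the extraction

When using a self-hosted instance, extraction uses your configured LLM. For the cloud API, it uses FireCrawl's managed LLM service.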
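
Since the extracted data is produced by an LLM, a client may want to confirm that it matches the schema it asked for. The sketch below is a deliberately small check, not a full JSON Schema validator: it only looks at the "required" list and at primitive property types such as those in the example schema. `SimpleSchema` and `checkAgainstSchema` are names introduced for illustration.

```typescript
// Minimal check, not a full JSON Schema validator: verifies that every
// property named in "required" is present and that declared primitive types
// ("string", "number", "boolean") match the JavaScript typeof of the value.
interface SimpleSchema {
  type: "object";
  properties: Record<string, { type: string }>;
  required?: string[];
}

function checkAgainstSchema(data: Record<string, unknown>, schema: SimpleSchema): string[] {
  const problems: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in data)) problems.push(`missing required field: ${key}`);
  }
  for (const key of Object.keys(schema.properties)) {
    const spec = schema.properties[key];
    if (!spec || !(key in data)) continue; // optional fields may be absent
    if (typeof data[key] !== spec.type) {
      problems.push(`field ${key} should be ${spec.type}`);
    }
  }
  return problems;
}

const productSchema: SimpleSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
    description: { type: "string" },
  },
  required: ["name", "price"],
};
```

An empty array from `checkAgainstSchema(result, productSchema)` means no problems were found by this limited check. For real use, an established JSON Schema validator would be a better fit.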
firecrawl_deep_research: Performs deep web research on a query using intelligent crawling, search, and LLM analysis.
{
"name": "firecrawl_deep_research",
"arguments": {
"query": "how does carbon capture technology work?",
"maxDepth": 3,
"timeLimit": 120,
"maxUrls": 50
}
}

Parameters:
Returns:
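firecrawl_generate_llmstxt: Generates a standardized llms.txt file (and optionally llms-full.txt) for a given domain. This file defines how large language models should interact with the site.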
{
"name": "firecrawl_generate_llmstxt",
"arguments": {
"url": "https://example.com",
"maxUrls": 20,
"showFullText": true
}
}

Parameters:
Returns:
The server includes comprehensive logging.
Example log messages:
[INFO] FireCrawl MCP Server initialized successfully
[INFO] Starting scrape for URL: https://example.com
[INFO] Batch operation queued with ID: batch_1
[WARNING] Credit usage has reached warning threshold
[ERROR] Rate limit exceeded, retrying in 2s...
The server provides robust error handling.
Example error response:
{
"content": [
{
"type": "text",
"text": "Error: Rate limit exceeded. Retrying in 2 seconds..."
}
],
"isError": true
}
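
On the client side, the isError flag is what separates a failed call from a successful one, since both carry their message in the same content array. The sketch below shows one way to handle that, assuming the response shape from the examples in this document; `unwrapText` is a hypothetical helper.

```typescript
// Illustrative client-side handling of the response shape shown above:
// joins all text items and throws when the server flagged the call as failed.
interface ToolResponse {
  content: { type: string; text: string }[];
  isError: boolean;
}

function unwrapText(response: ToolResponse): string {
  const text = response.content
    .filter((item) => item.type === "text")
    .map((item) => item.text)
    .join("\n");
  if (response.isError) {
    throw new Error(text || "Tool call failed without a message");
  }
  return text;
}
```

Throwing keeps error handling in one place for the caller; returning a result object with an error field would work equally well.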

# Install dependencies
npm install
# Build
npm run build
# Run tests
npm test

MIT License - see the LICENSE file for details