项目的核心是使用PHP处理用户请求,通过SSH连接服务器执行爬取命令,并将结果发送到用户邮箱。
该工具具备以下功能:
页面使用Bootstrap框架实现响应式设计。以下是页面的基本HTML结构示例:
<!DOCTYPE html>
<html lang="zh-cn">
<head>
<meta charset="utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title>在线仿站工具</title>
<link rel="stylesheet" href="https://cdn.staticfile.org/twitter-bootstrap/4.3.1/css/bootstrap.min.css">
</head>
<body>
<nav class="navbar sticky-top navbar-expand-lg navbar-light bg-white border-bottom">
<div class="container">
<a class="navbar-brand" href="./">在线仿站工具</a>
</div>
</nav>
<div class="container mt-5">
<div class="row">
<div class="col-md-8 offset-md-2">
<input type="text" id="url" class="form-control" placeholder="请输入有效的网址" required/>
<input type="text" id="email" class="form-control mt-2" placeholder="请输入有效的邮箱" required/>
<input id="submit" type="button" value="提交任务" class="btn btn-primary btn-block mt-3"/>
</div>
</div>
</div>
<script src="./assets/js/common.js"></script>
</body>
</html>
后端使用PHP实现,主要功能集中在api.php
文件中。以下是该文件的核心代码示例:
首先,我们检查请求方法是否为POST,并获取URL和邮箱:
if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
exit(json_encode(array('code' => '-1', 'msg' => '仅支持POST请求'), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
}
$url = $_POST['url'];
$email = $_POST['email'];
使用正则表达式验证输入的URL格式,确保用户输入的是有效的URL:
$preg = "/^http(s)?:\\/\\/.+/";
if (!preg_match($preg, $url)) {
log_error("Invalid URL format: $url");
exit(json_encode(array('code' => '-1', 'msg' => '域名请带上协议头!如(http:// 或 https://)'), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
}
生成唯一的文件名以便于后续的下载:
$timestamp = time();
$file = "website_$timestamp.zip";
通过自定义的SSH类连接到服务器并执行Wget命令:
$ssh = new Components_Ssh($host, $user, $pass, $port, $log);
$command = escapeshellcmd("bash /www/wwwroot/{$site_url}/wget_site.sh {$url} {$file} >/dev/null && echo \"success\"");
$result = $ssh->cmd($command);
检查文件是否成功生成,若未生成,则记录错误信息:
if (!file_exists('./down/' . $file)) {
log_error("File generation failed for: $url");
exit(json_encode(array('code' => '-1', 'msg' => '爬取失败,请稍后再试。'), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
}
发送爬取结果到用户邮箱的代码如下:
$content = '爬取成功,下载链接:' . $site_url . '/down/' . $file;
$wz = $smtpapi . "?adress=" . urlencode($email) . "&isHTML=false&title={$site_url}爬取成功&content=" . urlencode($content);
$response = @file_get_contents($wz);
使用自定义的log_error
函数记录所有错误信息,确保系统的可维护性:
function log_error($message) {
error_log(date('Y-m-d H:i:s') . " - " . $message . "\n", 3, '/path/to/error.log');
}
以下是一个简单的SSH连接类示例,供后续使用:
class Components_Ssh {
private $connection;
public function __construct($host, $user, $pass, $port = 22) {
$this->connection = ssh2_connect($host, $port);
ssh2_auth_password($this->connection, $user, $pass);
}
public function cmd($command) {
$stream = ssh2_exec($this->connection, $command);
stream_set_blocking($stream, true);
return stream_get_contents($stream);
}
}
wget_site.sh
脚本负责执行实际的爬取操作,代码示例如下:
#!/bin/bash
url=$1
file=$2
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent $url -P /www/wwwroot/your_site/down/
zip -r /www/wwwroot/your_site/down/$file /www/wwwroot/your_site/down/*
处理执行命令后的返回结果以便后续使用:
if ($result === 'success') {
exit(json_encode(array('code' => '1', 'msg' => '爬取任务已提交,请查看邮箱。'), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
} else {
log_error("Command execution failed: $command");
exit(json_encode(array('code' => '-1', 'msg' => '命令执行失败,请重试。'), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
}
在前端添加任务提交前的提示,增强用户体验:
document.getElementById('submit').onclick = function() {
const url = document.getElementById('url').value;
const email = document.getElementById('email').value;
if (url === '' || email === '') {
alert('请填写所有字段!');
return;
}
fetch('api.php', {
method: 'POST',
headers: {
'Content-Type': 'application/x-www-form-urlencoded'
},
body: `url=${encodeURIComponent(url)}&email=${encodeURIComponent(email)}`
})
.then(response => response.json())
.then(data => {
if (data.code === '-1') {
alert(data.msg);
} else {
alert('任务已提交,请查看您的邮箱!');
}
});
};
检查邮件发送状态以确保用户能够及时收到通知:
if ($response === false) {
log_error("Email failed to send to: $email");
exit(json_encode(array('code' => '-1', 'msg' => '邮件发送失败,请检查邮箱地址。'), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
}
这个在线仿站工具允许用户快速爬取并下载网站资源。关键在于安全地处理用户输入、稳定地执行后端爬取操作,并确保系统的可维护性。
<?php
include "./common.php";
include "./Config.php"; // 引入配置文件
ignore_user_abort(true);
ini_set('max_execution_time', '0');
date_default_timezone_set('Asia/Shanghai');
@header('Content-Type: application/json; charset=UTF-8');
// 日志文件路径
$log_file = "./error_log.txt";
// 捕获错误并记录详细日志的函数
function log_error($error_message) {
global $log_file;
$timestamp = date("Y-m-d H:i:s");
// 检查错误信息是否在已定义的内容中
$defined_errors = [
'爬取失败,请检查网址是否正确!',
'域名请带上协议头!如(http:// 或 https://)',
'爬取失败!',
'系统出错,请稍后重试!'
];
// 如果错误信息不在已定义的列表中,则使用统一的提示
if (!in_array($error_message, $defined_errors)) {
$error_message = "系统错误,请联系管理员";
}
file_put_contents($log_file, "[$timestamp] $error_message\n", FILE_APPEND);
}
// 检查请求方法
if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
exit(json_encode(array('code' => '-1', 'msg' => 'Web front-end crawling API - Author: C4rpeDime, Blog: www.1042.net.'), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
}
if (isset($_POST['url'])) {
$url = $_POST['url'];
$email = $_POST['email'];
try {
// 验证URL返回状态码
if (get_code($url) != 200) {
log_error("URL request failed for $url with status code not 200.");
exit(json_encode(array('code' => '-1', 'msg' => '爬取失败,请检查网址是否正确!'), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
}
// 检查URL格式
$preg = "/^http(s)?:\\/\\/.+/";
if (!preg_match($preg, $url)) {
log_error("Invalid URL format: $url");
exit(json_encode(array('code' => '-1', 'msg' => '域名请带上协议头!如(http:// 或 https://)'), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
}
// 生成文件名
$file = parse_url($url)['host'] . '-' . mt_rand(10000, 99999);
// 执行SSH命令(捕获可能的错误)
$ssh = new Components_Ssh($host, $user, $pass, $port, $log);
$command = escapeshellcmd("bash /www/wwwroot/{$site_url}/wget_site.sh {$url} {$file} >/dev/null && echo \"success\"");
$ssh->cmd($command);
// 检查文件是否生成成功
if (file_exists('./down/' . $file . '.zip')) {
$content = '你在' . $site_url . '提交的前端爬取请求已结束,下载链接:' . $site_url . '/down/' . $file . '.zip';
$wz = $smtpapi . "?adress=" . urlencode($email) . "&isHTML=false&title={$site_url}爬取成功&content=" . urlencode($content);
// 检查 URL 是否可访问
$response = @file_get_contents($wz);
if ($response === false) {
log_error("Failed to send email notification: " . error_get_last()['message']);
}
exit(json_encode(array(
'code' => '1',
'msg' => '爬取成功!',
'down' => $site_url . '/down/' . $file . '.zip',
'yulan' => $site_url . '/work/' . $file . '/' . parse_url($url)['host']
), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
} else {
log_error("File not generated: $file");
exit(json_encode(array('code' => '-1', 'msg' => '爬取失败!'), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
}
} catch (Exception $e) {
// 捕获所有异常并记录详细信息
log_error("Error processing URL: " . $url . " - " . $e->getMessage());
exit(json_encode(array('code' => '-1', 'msg' => '系统出错,请稍后重试!'), JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
}
} else {
// 转换字节大小的函数保持不变
function trans_byte($byte) {
$KB = 1024;
$MB = 1024 * $KB;
$GB = 1024 * $MB;
$TB = 1024 * $GB;
if ($byte < $KB) {
return $byte . "B";
} elseif ($byte < $MB) {
return round($byte / $KB, 2) . "KB";
} elseif ($byte < $GB) {
return round($byte / $MB, 2) . "MB";
} elseif ($byte < $TB) {
return round($byte / $GB, 2) . "GB";
} else {
return round($byte / $TB, 2) . "TB";
}
}
// 处理分页请求(无更改)
$list = glob('./down/*.zip');
$count = count($list);
$page_num = isset($_GET['limit']) ? $_GET['limit'] : 10;
$pages = ceil($count / $page_num);
$page = isset($_GET['page']) ? $_GET['page'] : 1;
$startpos = ($page - 1) * $page_num;
$json['code'] = '0';
// 返回数据
$arr = []; // Initialize $arr to avoid undefined variable notice
for ($i = $startpos; $i < min($startpos + $page_num, $count); $i++) {
$arr[] = basename($list[$i]); // 仅返回文件名
}
$json['data'] = $arr;
exit(json_encode($json, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));
}
// 获取URL状态码的函数保持不变
function get_code($url) {
$ch = curl_init();
$timeout = 3;
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_URL, $url);
curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
return $httpcode;
}
?>
<?php
// 使用环境变量加载敏感信息,避免将信息硬编码到文件中
$title = "在线仿站工具-Wget.Fit";
$copyright = "Copyright © 2021-2023 WgetFit All Rights Reserved."; // 底部版权
// SSH 连接信息,通过环境变量加载
$host = getenv('SSH_HOST') ?: 'localhost';
$user = getenv('SSH_USER') ?: 'root';
$pass = getenv('SSH_PASS') ?: 'Your_password'; // 仅作默认值,应从环境变量中获取
$port = getenv('SSH_PORT') ?: '22';
// 日志配置,增加日志级别控制
$log = getenv('LOG_LEVEL') ?: 'error'; // 默认记录错误日志
// 邮件 API 地址,从环境变量加载或使用默认值
$smtpapi = getenv('SMTP_API') ?: 'https://Your_domain_name/smtp/api.php';
// 网站 URL
$site_url = "Your_website_directory"; // 网站域名与wwwroot内网站目录须一致
?>
<?php
include "./common.php";
?>
<!DOCTYPE html>
<html lang="zh-cn">
<head>
<meta charset="utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<title><?php echo $title?> </title>
<meta name="keywords" content="在线扒站,手机扒站,扒站工具,扒站软件,扒网站工具,扒站,仿站,在线仿站,一键扒站,网站下载器">
<meta name="description" content="本工具永久免费使用!只需要一个浏览器,一键将目标网站的前端代码扒下来,自动将指定网页的HTML、CSS、JS、图片等前端资源分类,自动更改资源路径为本地路径,支持一键打包在线下载。">
<meta name="author" content="Wget.Fit">
<link rel="stylesheet" href="https://cdn.staticfile.org/twitter-bootstrap/4.3.1/css/bootstrap.min.css">
<script src="https://cdn.staticfile.org/jquery/3.2.1/jquery.min.js"></script>
<script src="https://cdn.staticfile.org/popper.js/1.15.0/umd/popper.min.js"></script>
<link rel="stylesheet" href="//cdn.staticfile.org/font-awesome/4.7.0/css/font-awesome.min.css">
<link rel="stylesheet" href="//cdn.staticfile.org/twitter-bootstrap/4.6.1/css/bootstrap.min.css">
<link rel='stylesheet' href='./static/css/style.css?v=1002'>
<!-- Pixel Code for https://zz.sangyun.net/ -->
<script defer src="https://zz.sangyun.net/pixel/ZfvYr1njxm5mQ2lv"></script>
<!-- END Pixel Code --></head>
<nav class="navbar sticky-top navbar-expand-lg navbar-light bg-white border-bottom" id="navbar">
<div class="container big-nav">
<a class="navbar-brand" href="./">
<img src="/logo.png" width="180" height="40" class="d-inline-block align-top mr-2" alt="">
</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav mr-auto">
<li class="nav-item">
<a class="nav-link" href="/">首页</a>
</li>
<li class="nav-item">
<a class="nav-link" href="/down">数据</a>
</li>
</ul>
<form class="form-inline my-2 my-lg-0" action="./down/" method="GET">
<input type="text" name="c" required lay-verify="required" autocomplete="off" value = "search" style = "display: none;">
<input name="s" class="form-control mr-sm-2" type="search" placeholder="请输入网址关键字" aria-label="Search" value="">
<button class="btn btn-outline-primary my-2 my-sm-0" type="submit"> <i class="fa fa-search" aria-hidden="true"></i> 搜索</button>
</form>
</div>
</div>
</nav>
<div class="col-xs-12 col-sm-10 col-md-8 col-lg-4 center-block " style="float: none;">
<img src="./assets/simple/img/userbg.jpg" width="100%" >
<div id="main">
<br/>
<input type="text" id="url" name="url" value="" class="form-control" required="required" placeholder="请输入一个有效的网址"/><br/>
<input type="text" id="email" name="email" value="" class="form-control" required="required" placeholder="请输入一个有效的邮箱"/>
</div><br/>
<div class="form-group">
<input id="submit" type="button" value="提交任务" class="btn btn-primary btn-block"/>
</div>
<div id="running_alert">
<div id="running_class" class="alert alert-info alert-dismissable">
当前暂未提交爬取任务!
</div>
</div>
</div>
</div>
<script src="./assets/js/common.js"></script>
<footer class="footer card-footer mt-3" id="footer">
<span> <?php echo $copyright?>丨<a href="https://wget.fit/WgetFitCode.zip">Code download</a></span>
</footer>
<script src="//cdn.staticfile.org/twitter-bootstrap/4.6.1/js/bootstrap.min.js"></script>
<script src="//cdn.staticfile.org/jquery.qrcode/1.0/jquery.qrcode.min.js"></script>
<script src="//cdn.staticfile.org/layer/3.1.1/layer.js"></script>
<script src="./static/js/clipBoard.min.js"></script>
</body>
</html>
<?php
// 检查并包含 Config.php 文件
if (file_exists('Config.php')) {
include 'Config.php';
} else {
error_log("Config.php 文件未找到。", 0);
exit("服务器配置错误,请联系管理员。");
}
// 检查并包含 ssh.class.php 文件
if (file_exists('./ssh.class.php')) {
include './ssh.class.php';
} else {
error_log("ssh.class.php 文件未找到。", 0);
exit("服务器配置错误,请联系管理员。");
}
?>
<?php
class Components_Ssh {
private $connection;
private $log;
// 构造函数,初始化SSH连接
public function __construct($host, $user, $pass, $port = 22, $log = false) {
$this->log = $log;
$this->connection = ssh2_connect($host, $port);
if (!$this->connection) {
$this->log_error("无法连接到SSH主机 $host:$port");
throw new Exception("无法连接到SSH主机,请检查网络连接或主机配置。");
}
$authSuccess = ssh2_auth_password($this->connection, $user, $pass);
if (!$authSuccess) {
$this->log_error("SSH身份验证失败:$user@$host");
throw new Exception("SSH身份验证失败,请检查用户名或密码。");
}
$this->log_info("SSH连接已建立:$user@$host:$port");
}
// 执行命令
public function cmd($command) {
$this->log_info("SSH命令:$command");
// 验证输入命令的安全性
if (!$this->validate_command($command)) {
$this->log_error("命令验证失败:$command");
throw new Exception("不安全的命令:$command");
}
$stream = ssh2_exec($this->connection, $command);
if (!$stream) {
$this->log_error("SSH命令执行失败:$command");
throw new Exception("SSH命令执行失败,请检查命令格式或权限。");
}
stream_set_blocking($stream, true);
$output = stream_get_contents($stream);
fclose($stream);
$this->log_info("命令输出:$output");
return $output;
}
// 关闭SSH连接
public function close() {
$this->log_info("关闭SSH连接");
$this->connection = null; // 修正错误的变量引用
}
// 日志记录函数
private function log_info($message) {
if ($this->log) {
error_log("[INFO] $message", 0);
}
}
private function log_error($message) {
error_log("[ERROR] $message", 0);
}
// 验证命令安全性的方法
private function validate_command($command) {
// 增加命令白名单或过滤规则,确保安全
$allowed_commands = ['ls', 'pwd', 'whoami', 'bash', 'wget'];
foreach ($allowed_commands as $allowed) {
if (strpos($command, $allowed) !== false) {
return true;
}
}
return false;
}
}
#!/bin/sh
wget -r -p -np -k -P /www/wwwroot/Your_website_directory/work/$2 $1
cp /www/wwwroot/Your_website_directory/readme.txt /www/wwwroot/Your_website_directory/work/$2/readme.txt
cd /www/wwwroot/Your_website_directory/work/$2/
zip -r $2.zip ./
mv $2.zip /www/wwwroot/Your_website_directory/down
.rm -rf /www/wwwroot/Your_website_directory/work/$2
服务器需要安装ssh2扩展 同目录下需要创建down和work目录
@echo off
chcp 65001
setlocal enabledelayedexpansion
:: 输入要爬取的URL
set /p url=请输入要爬取的URL(http/https):
:: 输入邮箱
set /p email=请输入你的邮箱:
:: 调用 API,并将结果保存到临时文件
echo 正在发送请求到 https://1.1042.net/api.php ...
curl -X POST -d "url=!url!" -d "email=!email!" "https://1.1042.net/api.php" -H "Content-Type: application/x-www-form-urlencoded" -o response.json
:: 检查响应是否成功
if %errorlevel% neq 0 (
echo 请求失败,请检查你的网络连接或 API 地址。
pause
exit /b
)
:: 解析返回的 JSON
set "found=false"
for /f "delims=" %%i in ('type response.json ^| findstr /R /C:"down" /C:"yulan"') do (
set "line=%%i"
set "line=!line:~1,-1!" :: 去掉引号
echo !line!
set "found=true"
)
:: 检查是否找到了链接
if not !found! == true (
echo 未找到下载链接或预览链接,请检查 API 返回的内容。
type response.json
) else (
echo 请求成功,已获取下载链接和预览链接。
)
:: 清理临时文件
del response.json
pause
运行结果
https://github.com/C4rpeDime/WgetFit
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。