
Friend-Link Status Checking with GitHub Actions

柳神

Published 2024-06-24 02:53:58


This article is included in the column: 清羽飞扬


Ramblings

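At first I checked my friend links by hand, clicking through them one by one. As the list grew to 73 links, that stopped being practical. I had seen some bloggers show a reachability status directly on each friend-link card, but I could not find a tutorial for it. While looking around, I found that the friend-circle aggregator (友链朋友圈) offers an API that returns the links it failed to fetch. Its rule, however, is that a site counts as unreachable if it has published no new posts in the past two months. That lag makes it a poor real-time monitor, so its output is only a rough reference.

So I wrote a Python script that runs alongside hexo d, checks the friend links and prints the results to the console. It was crude, but it more or less worked. Then, during a casual chat in the 糖果屋 QQ group, I saw a scheme shared by group member 安小歪: GitHub Actions runs the checking script on a schedule and generates a simple HTML page with the results. That idea inspired me a lot.

Building on it, I designed a nicer front-end page and added an API-like output that reports the reachability of every friend link. For sites where that approach does not fit, I also added a simpler variant that reads a TXT file in the repository root and produces the same output. Finally, with a bit of JavaScript, I embedded these live results in the top-left corner of each card on my friend-link page, which makes managing friend links far more efficient and visible.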


check-flink: ⚙️ checks whether friend links are reachable, greatly reducing the manual checking workload.

github.com@willow-god

Feature overview

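- A GitHub Action checks friend-link status on a schedule and writes the results to result.json in the repository root.
- A status page that can be deployed to Zeabur or Vercel, which also speeds up API access.
- Two data sources, for compatibility:
  - Non-compatible mode: reads the friend-link list dynamically from a JSON file generated by the blog, so the list stays up to date automatically.
  - Compatible mode: stores all friend links in a TXT file. Works with any site, but you may need to update the file by hand after adding a friend link.
- API access to the data: reachable links, unreachable links, the number of each, and an update timestamp. Each link entry includes the site name and URL, which makes front-end integration easy.
- The checker is written in Python and uses both HEAD and GET requests from the requests library to minimize false results.
- The front end caches the results for 30 minutes, cutting API calls without noticeably hurting freshness.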
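The HEAD-then-GET fallback mentioned above can be isolated into a small function. This is a minimal sketch, not the author's code: the HTTP calls are injected as plain functions (in the real script they would wrap requests.head and requests.get), so the decision logic can be exercised without network access. The names is_reachable and Requester are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical type: a requester takes a URL and returns an HTTP status code,
# or None when the request raised an error (timeout, DNS failure, TLS error, ...).
Requester = Callable[[str], Optional[int]]


def is_reachable(url: str, head: Requester, get: Requester) -> bool:
    """Try a cheap HEAD first; fall back to GET only if HEAD did not return 200."""
    if head(url) == 200:
        return True
    # Some servers reject or mishandle HEAD (e.g. 405), so retry with a full GET.
    return get(url) == 200


if __name__ == "__main__":
    # Fake requesters standing in for requests.head / requests.get.
    fake_head = {"https://a.example/": 200, "https://b.example/": 405}.get
    fake_get = {"https://b.example/": 200}.get
    for u in ["https://a.example/", "https://b.example/", "https://c.example/"]:
        print(u, is_reachable(u, fake_head, fake_get))
```

The GET is only issued when HEAD fails, so well-behaved sites cost one lightweight request while HEAD-hostile sites still get a fair check.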

Tutorial

GitHub configuration

Add a GitHub secret: in the repository settings, add a secret named PAT_TOKEN:
- Open your GitHub repository and click Settings at the top right.
- In the left sidebar, click Secrets and variables, then choose Actions.
- Click New repository secret.
- Enter PAT_TOKEN in the Name field.
- Paste your Personal Access Token into the Secret field.
- Click Add secret to save.

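Generate the token yourself: click your avatar at the top right, open Settings, then Developer settings, create a personal access token, and give it permission to push to the repository.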

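Configure repository permissions: in the repository settings, make sure Actions has write access:
- Open your GitHub repository and click Settings at the top right.
- In the left sidebar, click Actions.
- Choose General.
- Under Workflow permissions, select Read and write permissions.
- Click Save.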

Data sources

Dynamic JSON source

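This method is written for hexo-theme-butterfly. Other themes should work in principle, but you will need to adapt the code yourself.

First, create link.js in the Hexo root directory with the following content: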

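const YML = require('yamljs')
const fs = require('fs')

// Read Butterfly's friend-link data; an empty "rss:" field is filled with "" so the YAML parses cleanly
const data = YML.parse(
    fs.readFileSync('source/_data/link.yml', 'utf8').replace(/(?<=rss:)\s*\n/g, ' ""\n')
)

const j = 2  // number of link groups to collect, counted from the top (here: all but the last of three)
let ls = []
data.forEach((e, i) => {
    if (i < j) ls = ls.concat(e.link_list)
})

// JSON.stringify builds the whole file, so the output is always valid JSON
fs.writeFileSync('./source/flink_count.json', JSON.stringify({ link_list: ls, length: ls.length }))
console.log('flink_count.json has been generated.')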

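j sets how many friend-link groups are collected, counted from the top. For example, if you only want the first group, set it to 1.

Run it from the Hexo root directory: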
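To make the role of j concrete, here is a Python sketch of the same selection logic. It works on already-parsed groups (YAML parsing is left out so only the standard library is needed); build_flink_count and the sample groups are hypothetical, and the class_name key is assumed to match Butterfly's link.yml layout.

```python
import json


def build_flink_count(groups: list[dict], j: int) -> dict:
    """Concatenate link_list of the first j groups, mirroring the loop in link.js."""
    links: list[dict] = []
    for group in groups[:j]:
        # A group without a link_list key contributes nothing in this sketch.
        links.extend(group.get("link_list") or [])
    return {"link_list": links, "length": len(links)}


if __name__ == "__main__":
    groups = [  # hypothetical sample data
        {"class_name": "Group A", "link_list": [{"name": "A1", "link": "https://a1.example/"}]},
        {"class_name": "Group B", "link_list": [{"name": "B1", "link": "https://b1.example/"},
                                                {"name": "B2", "link": "https://b2.example/"}]},
        {"class_name": "Group C", "link_list": [{"name": "C1", "link": "https://c1.example/"}]},
    ]
    print(json.dumps(build_flink_count(groups, 2), ensure_ascii=False))
```

With j = 2, the entries of groups A and B are collected and group C is skipped.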

node link.js

You will then find flink_count.json in [BlogRoot]/source, in the following format:

{
  "link_list": [
    {
      "name": "String",
      "link": "String",
      "avatar": "String",
      "descr": "String",
      "siteshot": "String"
    },{
      "name": "String",
      "link": "String",
      "avatar": "String",
      "descr": "String",
      "siteshot": "String"
    },
    // ... 75 more site entries
  ],
  "length": 77
}

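When you run hexo g, the file is copied to [BlogRoot]/public and deployed with the rest of the site.

For convenience, you can wrap the whole sequence in a script and run it instead of hexo d (optional):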
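Before pointing the checker at the deployed file, a quick sanity check can catch a stale or hand-edited flink_count.json. The sketch below (validate_flink_count is a hypothetical helper) checks that length matches the number of entries and that every entry has the name and link fields that test-friend.py reads.

```python
import json


def validate_flink_count(text: str) -> list[str]:
    """Return a list of problems found in a flink_count.json document (empty if none)."""
    data = json.loads(text)
    problems = []
    links = data.get("link_list", [])
    if data.get("length") != len(links):
        problems.append(f"length is {data.get('length')}, but link_list has {len(links)} entries")
    for i, item in enumerate(links):
        # test-friend.py reads item['name'] and item['link'], so both must be present.
        for key in ("name", "link"):
            if not item.get(key):
                problems.append(f"entry {i} is missing '{key}'")
    return problems
```

An empty list means the file has the shape the checker expects.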

@echo off
E:
cd E:\Programming\HTML_Language\willow-God\blog
node link.js && hexo g && hexo algolia && hexo d

After deploying, all friend-link data is available at https://blog.example.com/flink_count.json. Next, edit test-friend.py in the GitHub repository:

import sys
import json
import requests
import warnings
import concurrent.futures
from datetime import datetime

# Ignore urllib3's warning about unverified HTTPS requests
warnings.filterwarnings("ignore", message="Unverified HTTPS request is being made.*")

# Browser-like User-Agent string
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

# Check one link; returns [item, 1] if reachable, [item, -1] otherwise
def check_link_accessibility(item):
    headers = {"User-Agent": user_agent}
    link = item['link']
    try:
        # Lightweight HEAD request first; requests.head does not follow redirects unless asked to
        response = requests.head(link, headers=headers, timeout=5, allow_redirects=True)
        if response.status_code == 200:
            return [item, 1]
    except requests.RequestException:
        pass  # fall through to GET

    try:
        # HEAD failed or returned non-200: retry with GET (follows redirects by default)
        response = requests.get(link, headers=headers, timeout=5)
        if response.status_code == 200:
            return [item, 1]
    except requests.RequestException:
        pass

    return [item, -1]  # both requests failed

# URL of the friend-link JSON data
json_url = 'https://blog.qyliu.top/flink_count.json'  # change this to your own URL

# Fetch the friend-link list
response = requests.get(json_url, timeout=10)
if response.status_code == 200:
    data = response.json()
    link_list = data['link_list']
else:
    print(f"Failed to retrieve data, status code: {response.status_code}")
    sys.exit(1)  # non-zero exit marks the workflow step as failed

# Check the links concurrently
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(check_link_accessibility, link_list))

# Split reachable and unreachable links, keeping only name and link
accessible_results = [{'name': item['name'], 'link': item['link']} for item, status in results if status == 1]
inaccessible_results = [{'name': item['name'], 'link': item['link']} for item, status in results if status == -1]

# Current time (local time of the runner, Asia/Shanghai in the workflow)
current_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

accessible_count = len(accessible_results)
inaccessible_count = len(inaccessible_results)

# Write the results to result.json
output_json_path = './result.json'
with open(output_json_path, 'w', encoding='utf-8') as file:
    json.dump({
        'timestamp': current_time,
        'accessible_links': accessible_results,
        'inaccessible_links': inaccessible_results,
        'accessible_count': accessible_count,
        'inaccessible_count': inaccessible_count
    }, file, ensure_ascii=False, indent=4)

print(f"Check finished; results saved to '{output_json_path}'.")

Change json_url to your own address and save. The GitHub Actions workflow runs this script by default, so it needs no changes.

Static TXT source

This method is simpler but slightly more work to maintain: put all friend links in link.txt in the repository root, in this format:

清羽飞扬,https://blog.qyliu.top/
ChrisKim,https://www.zouht.com/
Akilar,https://akilar.top/
张洪Heo,https://blog.zhheo.com/
安知鱼,https://blog.anheyu.com/
杜老师说,https://dusays.com
Tianli,https://tianli-blog.club/
贰猹,https://noionion.top/

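Each line is a site name followed by its URL. The name keeps the result JSON informative and gives it the same shape as the dynamic JSON route, which saves work in later steps and improves compatibility.

Preparing this file by hand can take a while.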
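The author's test-friend-in-txt.py is not reproduced in this post. As an assumption about what such a script has to do first, this sketch turns link.txt into the same list of {name, link} dictionaries that the JSON route produces. parse_link_txt is a hypothetical name, and splitting on the last comma is a choice made so that names containing commas survive.

```python
def parse_link_txt(text: str) -> list[dict]:
    """Parse 'name,link' lines into the same {'name', 'link'} shape as flink_count.json."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the LAST comma so a name that itself contains a comma still works.
        name, sep, link = line.rpartition(",")
        if not sep:
            continue  # no comma at all: malformed line, ignored in this sketch
        entries.append({"name": name.strip(), "link": link.strip()})
    return entries
```

The resulting list can be fed to the same concurrent checking loop as link_list in test-friend.py.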


Kimi AI assistant by Moonshot AI (月之暗面): moonshot.cn

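You can ask Kimi to tidy your list into this format: describe what you want in your own words, then copy the final result and save it.

Then edit the GitHub Actions workflow and change the step that runs the Python script: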

name: Check Links and Generate JSON

on:
  push:
    branches:
      - main
  schedule:
    - cron: '0 1 * * *'
    - cron: '0 13 * * *'
  workflow_dispatch:

env:
  TZ: Asia/Shanghai

jobs:
  check_links:
    runs-on: ubuntu-latest

    steps:
    - name: Pull latest repository
      uses: actions/checkout@v4

    - name: Install python
      uses: actions/setup-python@v5
      with:
        python-version: '3.x'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install requests

    - name: Run Python script to check friend links
      run: python test-friend-in-txt.py # changed line; you may delete this comment

    - name: Configure git
      run: |
        git config --global user.email "actions@github.com"
        git config --global user.name "GitHub Actions"

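    - name: Commit and push
      env:
        PAT_TOKEN: ${{ secrets.PAT_TOKEN }}
      run: |
        git add .
        # Pushes made with a PAT trigger workflows; "[skip ci]" keeps this push from re-running the push-triggered workflow
        git commit -m "⏱️ GitHub Action scheduled update [skip ci]"
        git push "https://x-access-token:${PAT_TOKEN}@github.com/${{ github.repository }}.git" main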

That's all for the workflow changes.
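One detail worth knowing about the schedule: GitHub Actions evaluates cron expressions in UTC, and the TZ: Asia/Shanghai setting only changes the clock seen by the steps (for example the timestamp written by datetime.now()). The sketch below converts the hour and minute fields of the two cron entries to Beijing time; cron_hour_in_shanghai is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Asia/Shanghai is UTC+8 all year (no daylight saving), so a fixed offset suffices.
SHANGHAI = timezone(timedelta(hours=8))


def cron_hour_in_shanghai(utc_hour: int, minute: int = 0) -> str:
    """Convert the hour/minute fields of a GitHub Actions cron entry (UTC) to Beijing time."""
    # The date is arbitrary; only the time of day matters here.
    utc_time = datetime(2024, 1, 1, utc_hour, minute, tzinfo=timezone.utc)
    return utc_time.astimezone(SHANGHAI).strftime("%H:%M")


if __name__ == "__main__":
    for hour in (1, 13):  # the two cron entries in the workflow
        print(f"cron '0 {hour} * * *' runs at {cron_hour_in_shanghai(hour)} Beijing time")
```

So the workflow's schedule corresponds to 09:00 and 21:00 Beijing time. GitHub may also start scheduled runs somewhat late under load.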

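Deploying the status page

After the workflow has run successfully once, the repository is itself a static front-end page that you can host anywhere. I use Zeabur; the rough steps are: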

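Log in to Vercel or Zeabur:
- If you don't have an account yet, sign up for Vercel or Zeabur.
- After logging in, open the dashboard.
Import the GitHub repository:
- Click New Project or Import Project.
- Choose Import Git Repository.
- Connect your GitHub account and select the link-checker repository.
Configure the project:
- Make sure the correct branch (e.g. main) is selected.
- On Vercel, check under Build and Output Settings that result.json is included in the build output directory.
Deploy the project:
- Click Deploy.
- When deployment finishes, Vercel or Zeabur gives you a URL at which the page is served.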

If all goes well, the front-end page should now display the data, as shown below:

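Showing the data on your blog

General approach

Once deployed, the data can be fetched from the deployment URL (this site's deployment is used as an example):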

https://check.zeabur.app/result.json

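The check results are stored as JSON with the following fields:

accessible_links: list of reachable links.
inaccessible_links: list of unreachable links.
accessible_count: number of reachable links.
inaccessible_count: number of unreachable links.
timestamp: time at which the check results were generated.

Here is an example: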

{
    "accessible_links": [
        {
            "name": "清羽飞扬",
            "link": "https://blog.qyliu.top/"
        },
        {
            "name": "ChrisKim",
            "link": "https://www.zouht.com/"
        }
    ],
    "inaccessible_links": [
        {
            "name": "JackieZhu",
            "link": "https://blog.zhfan.top/"
        },
        {
            "name": "青桔气球",
            "link": "https://blog.qjqq.cn/"
        }
    ],
    "accessible_count": 2,
    "inaccessible_count": 2,
    "timestamp": "2024-06-20 23:40:15"
}
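For scripts that consume result.json (or a quick check in CI), a small Python sketch like the following cross-checks the count fields against the lists and produces a one-line summary. summarize is a hypothetical helper, and it assumes the field names shown above.

```python
def summarize(result: dict) -> str:
    """One-line summary of a result.json payload, cross-checking the stored counts."""
    ok = len(result["accessible_links"])
    bad = len(result["inaccessible_links"])
    # The *_count fields duplicate information in the lists; flag any mismatch.
    if ok != result.get("accessible_count", ok) or bad != result.get("inaccessible_count", bad):
        raise ValueError("count fields do not match the link lists")
    return f"{result['timestamp']}: {ok} reachable, {bad} unreachable"
```

In practice the payload would come from json.load on the downloaded result.json.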

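For example, a page like the following shows the unreachable links on the front end. The code is for illustration only; build the actual look and feel yourself: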

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Friend Link Check</title>
</head>
<body>
    <h1>Unreachable Links</h1>
    <div id="inaccessibleLinksContainer"></div>

    <script>
        async function fetchInaccessibleLinks() {
            const jsonUrl = 'https://your-deployed-url.com/result.json';
            try {
                const response = await fetch(jsonUrl);
                if (!response.ok) {
                    throw new Error(`HTTP error! status: ${response.status}`);
                }
                const data = await response.json();
                displayInaccessibleLinks(data.inaccessible_links);
            } catch (error) {
                console.error("Fetch error: ", error);
            }
        }

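        function displayInaccessibleLinks(links) {
            const container = document.getElementById('inaccessibleLinksContainer');
            container.innerHTML = ''; // clear the container
            links.forEach(link => {
                // Build elements with textContent so names/URLs from other sites cannot inject HTML
                const linkElement = document.createElement('p');
                const name = document.createElement('strong');
                name.textContent = `${link.name}:`;
                const anchor = document.createElement('a');
                anchor.href = link.link;
                anchor.target = '_blank';
                anchor.textContent = link.link;
                linkElement.append(name, ' ', anchor);
                container.appendChild(linkElement);
            });
        }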

        fetchInaccessibleLinks();
    </script>
</body>
</html>
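If you would rather pre-render the list on the Actions runner than in the browser (closer to the plain HTML page idea mentioned at the start), a Python sketch could look like this. render_inaccessible is hypothetical; the point it illustrates is that names and URLs come from other people's sites and should be HTML-escaped before going into markup.

```python
from html import escape


def render_inaccessible(links: list[dict]) -> str:
    """Build <p><strong>name:</strong> <a>link</a></p> rows like the page above, escaped."""
    rows = []
    for item in links:
        name = escape(item["name"])
        # escape() also escapes quotes by default, so the URL is safe inside href="...".
        url = escape(item["link"])
        rows.append(
            f'<p><strong>{name}:</strong> '
            f'<a href="{url}" target="_blank">{url}</a></p>'
        )
    return "\n".join(rows)
```

The returned string could be written into an HTML template by the workflow alongside result.json.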

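This site's approach

Note: this approach assumes a friend-link page customized according to this site's tutorial; otherwise you will need to adapt the code. Treat it as a reference and adjust it to your own setup.

Add the following at the bottom of [BlogRoot]/source/link/index.md: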

<style>
    .status-tag {
        position: absolute;
        top: 0px;
        left: 0px;
        padding: 3px 8px;
        border-radius: 6px 0px 6px 0px;
        font-size: 12px;
        color: white;
        font-weight: bold;
    }
</style>
<script>
function addStatusTagsWithCache(jsonUrl) {
    const cacheKey = "statusTagsData";
    const cacheExpirationTime = 30 * 60 * 1000; // 30 minutes
    // Drop one trailing slash so "https://a.com/" and "https://a.com" compare equal
    const normalize = url => url.replace(/\/$/, '');

    // Tag every .site-card whose link appears in the check results
    function applyStatusTags(data) {
        const accessibleLinks = data.accessible_links.map(item => normalize(item.link));
        const inaccessibleLinks = data.inaccessible_links.map(item => normalize(item.link));
        document.querySelectorAll('.site-card').forEach(card => {
            const link = normalize(card.href);
            const statusTag = document.createElement('div');
            statusTag.classList.add('status-tag');
            if (accessibleLinks.includes(link)) {
                statusTag.textContent = '正常';   // "normal"
                statusTag.style.backgroundColor = '#005E00';
            } else if (inaccessibleLinks.includes(link)) {
                statusTag.textContent = '疑问';   // "questionable"
                statusTag.style.backgroundColor = '#9B0000';
            } else {
                return; // not in the results: leave this card untagged
            }
            card.style.position = 'relative';
            card.appendChild(statusTag);
        });
    }

    // Fetch fresh results, tag the cards and refresh the cache
    function fetchDataAndUpdateUI() {
        fetch(jsonUrl)
            .then(response => response.json())
            .then(data => {
                applyStatusTags(data);
                localStorage.setItem(cacheKey, JSON.stringify({ data: data, timestamp: Date.now() }));
            })
            .catch(error => console.error('Error fetching test-flink result.json:', error));
    }

    // Reuse cached results while they are younger than cacheExpirationTime
    const cachedData = localStorage.getItem(cacheKey);
    if (cachedData) {
        const { data, timestamp } = JSON.parse(cachedData);
        if (Date.now() - timestamp < cacheExpirationTime) {
            applyStatusTags(data);
            return;
        }
    }
    fetchDataAndUpdateUI();
}
setTimeout(() => {
    addStatusTagsWithCache('https://check.zeabur.app/result.json');
}, 0);
</script>

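This snippet defines a JavaScript function named addStatusTagsWithCache, which adds status tags to the friend-link cards on the page.

CSS: the .status-tag class styles the tag: absolute positioning, padding, rounded corners, font size, color and font weight.
Function definition: addStatusTagsWithCache takes one parameter, jsonUrl, the URL of the JSON file holding the link status data.
Caching: the function uses localStorage, with cacheKey and cacheExpirationTime controlling where the data is stored and how long it stays valid. This reduces requests to the API and avoids network latency.
Fetching and updating the UI: fetchDataAndUpdateUI is an inner function that requests the JSON with the fetch API, parses it, and tags each .site-card element on the page according to the reachable and unreachable link lists.
Tag style: the text and background color depend on the link status. A reachable link gets the text "正常" (normal) on a green background; an unreachable link gets "疑问" (questionable) on a red background.
Cache check: before calling fetchDataAndUpdateUI, the script looks for valid cached data. If it exists and has not expired, the UI is updated from the cache; otherwise fetchDataAndUpdateUI fetches fresh data.
Deferred execution: addStatusTagsWithCache is called through setTimeout with a 0 ms delay. This only defers the call until the current script has finished; it does not wait for the whole page to load, which is enough as long as the cards are rendered before this script.
Actual URL: finally, the script calls addStatusTagsWithCache with the real JSON URL ('https://check.zeabur.app/result.json') to start the process.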
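The core of the tagging step is the link matching. The Python sketch below mirrors that rule so it is easy to see and test: both sides drop one trailing slash before comparison, so https://a.example/ and https://a.example match, while an http:// versus https:// difference or a www. prefix still does not. status_tags is a hypothetical helper, not part of the author's code.

```python
def status_tags(card_hrefs: list[str], result: dict) -> dict[str, str]:
    """Map each card URL to the tag text the front-end script would show."""
    def normalize(url: str) -> str:
        # Same effect as the JS url.replace(/\/$/, ''): drop one trailing slash.
        return url[:-1] if url.endswith("/") else url

    ok = {normalize(i["link"]) for i in result["accessible_links"]}
    bad = {normalize(i["link"]) for i in result["inaccessible_links"]}
    tags = {}
    for href in card_hrefs:
        link = normalize(href)
        if link in ok:
            tags[href] = "正常"  # "normal": the check succeeded
        elif link in bad:
            tags[href] = "疑问"  # "questionable": the check failed
        # Cards missing from both lists get no tag, as in the JS version.
    return tags
```

If a card shows no tag at all, the URL in link.yml (or link.txt) most likely differs from the card's href in scheme or host.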

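The overall purpose is to tag each friend-link card according to the status data returned by the server, so visitors can see whether a link currently works. The cache reduces the number of requests to the server and improves page performance.

The final result looks like this:

Limitations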
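The cache rule itself is a single comparison. Expressed in Python for clarity (cache_is_fresh and now_ms are hypothetical names; timestamps are in milliseconds like Date.now()), an entry is reused strictly before it turns 30 minutes old, so each browser fetches result.json roughly once per half hour at most, as long as fetches succeed.

```python
import time

CACHE_TTL_MS = 30 * 60 * 1000  # same 30-minute window as cacheExpirationTime


def cache_is_fresh(saved_at_ms: int, now_ms: int, ttl_ms: int = CACHE_TTL_MS) -> bool:
    """True while a cache entry saved at saved_at_ms is younger than ttl_ms."""
    return now_ms - saved_at_ms < ttl_ms


def now_ms() -> int:
    # Equivalent of JavaScript's Date.now(): milliseconds since the Unix epoch.
    return time.time_ns() // 1_000_000
```

Because the comparison is strict, an entry exactly 30 minutes old is treated as expired.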

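Network latency: latency affects response times, especially for servers that are geographically distant or on poor networks.
Network restrictions abroad: GitHub-hosted runners are located outside mainland China, so some sites may be unreachable from them because of network restrictions in certain countries or regions.
Limits of the Python check: the requests library does not fully emulate a browser; for example, it cannot render JavaScript-driven pages or run client-side scripts.
Rate limiting: some sites throttle frequent requests, which can get the GitHub Actions IP address blocked temporarily or permanently.
HEAD requests: a HEAD request returns only the headers, not the page content, so problems that can only be detected by inspecting the content go unnoticed.
HTTPS certificates: if a link uses a self-signed or untrusted certificate, requests raises an SSL error (or a warning if verification is disabled) and the check fails.
Redirects: some links redirect; if the script does not handle redirects correctly, it may misjudge the link's status.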
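Some of these limitations (anti-bot responses, rate limiting, redirects) can be softened by not forcing every result into two buckets. The sketch below is a hypothetical alternative to the script's "status 200 or fail" rule: it adds an "uncertain" bucket for codes that often signal blocking rather than a dead site. The contents of SOFT_FAILURES are an assumption to tune for your own link list.

```python
from typing import Optional

# Assumption: these codes more often mean bot protection or throttling than a dead site.
SOFT_FAILURES = {403, 429, 503}


def classify(status: Optional[int]) -> str:
    """Bucket one check result as 'ok', 'uncertain' or 'down' (None = request error)."""
    if status is None:
        return "down"       # timeout, DNS failure, TLS error, ...
    if 200 <= status < 400:
        return "ok"         # 2xx, or a redirect that was not followed
    if status in SOFT_FAILURES:
        return "uncertain"  # possibly blocked by the site, not necessarily offline
    return "down"
```

An "uncertain" result could then be shown with the same 疑问 tag while the clearly dead links are listed separately.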

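Summary

Despite its flaws, this approach greatly reduces the workload: there is no longer any need to check links one by one. Automation saves a lot of time and makes the checks more consistent. A script can check many links in one batch, so problems surface quickly and can be dealt with promptly. Automated checks also fit naturally into a CI/CD pipeline. The limitations remain, but with sensible configuration and tuning their impact can be kept small while you enjoy the convenience of automation.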