Automating My Hexo Workflow

Posted on 2024-09-06. Categories: tech, Hexo, Python. Word count: 1.6k. Reading time ≈ 6 minutes.

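Preface

My process for writing posts with Hexo had started to feel tedious, so I decided to automate it. The work falls into three parts: updating the NPM dependencies, writing a build script that makes up for a gap in Hexo, and deploying the site automatically with GitHub Actions.

When I first started using Hexo, I was not good at reading documentation and preferred to follow other people's tutorials. After more than a year of using Hexo and more than half a year of deploying various open-source software, I finally felt ready to take on this improvement.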

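Updating NPM dependencies

I use the NexT theme, which can be installed in two ways: via Git or via NPM. I had always used Git, but that makes updating inconvenient. After installing the theme through NPM, all dependencies of the Hexo project are managed by NPM in one place.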

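Update commands

The commands for updating the dependency versions are as follows.

First install npm-check-updates (NCU):

npm install -g npm-check-updates

Check for updates and install the new versions:

ncu -u
npm install

Afterwards, depending on the prompts, you may also need to run:

npm audit fix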

Dependabot

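The directory created by hexo init ships with a Dependabot configuration file. GitHub Dependabot can check for updates on a schedule, which would automate things further, but I deliberately do not use it. Unlike dependency managers such as Maven and NuGet, NPM also has a package-lock.json file. For the former, Dependabot only bumps the dependency version numbers; for the latter, package-lock.json is rewritten along with them, and what happens if that breaks something?

I do not know NPM very well, so I will keep running the commands above by hand.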

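Writing a build script

I am not satisfied with how Hexo handles image references in Markdown: it requires images to be placed in specific locations according to its own rules, instead of following the image links in the Markdown itself. As a result, I write in my normal way and then have to rearrange things to fit Hexo's rules.

Material for MkDocs does this very well and handles image links automatically. However, it is a documentation framework; although it has a blog plugin, the overall layout still leans toward documentation and is not a good fit for a blog.

So that I do not have to care about this tedious detail while writing, I wrote a script that handles the image links.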

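Approach

My setup is this: for each Markdown file, its images are stored in ./MDImgs/${filename}. For example, the images for source/_posts/Glances-Docker.md are all in source/_posts/MDImgs/Glances-Docker.

After running hexo generate, the corresponding page is public/2024/04/08/Glances-Docker/index.html. The image links in the generated HTML are still relative links, and Hexo does not copy the images for us by default, so we need to copy them to the corresponding path ourselves.

The date in the path is the post's creation time, which can be read from the date field in the front matter.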
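The mapping described above can be sketched as a small helper. This is only an illustration, assuming the permalink layout is year/month/day/title as in the example path; the function name image_dirs and its parameters are hypothetical and not part of Hexo.

```python
from datetime import datetime
from pathlib import PurePosixPath


def image_dirs(post_file: str, created: datetime,
               source_root: str = "source/_posts",
               public_root: str = "public") -> tuple[PurePosixPath, PurePosixPath]:
    """Return (source image dir, target image dir) for one post.

    Assumes the year/month/day/title output layout shown in the example path.
    """
    basename = PurePosixPath(post_file).stem
    # Where the images sit next to the Markdown sources.
    src = PurePosixPath(source_root) / "MDImgs" / basename
    # Where the relative link in the generated index.html will look for them.
    dst = (PurePosixPath(public_root) / created.strftime("%Y/%m/%d")
           / basename / "MDImgs" / basename)
    return src, dst


src, dst = image_dirs("Glances-Docker.md", datetime(2024, 4, 8, 12, 0))
print(src)  # source/_posts/MDImgs/Glances-Docker
print(dst)  # public/2024/04/08/Glances-Docker/MDImgs/Glances-Docker
```

The target keeps the MDImgs/Glances-Docker suffix because the generated HTML still refers to the images relative to its own index.html.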

Code

The script also calls hexo clean and hexo generate, so the whole build can be done by running this one script.

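import os
import shutil
import subprocess
from os.path import join

import frontmatter  # third-party: pip install python-frontmatter

source_path = "./source/_posts"
target_path = "./public"

# Rebuild the site from scratch; check=True stops the script if Hexo fails.
subprocess.run("npx hexo clean", shell=True, check=True)
subprocess.run("npx hexo generate", shell=True, check=True)

for f in os.listdir(source_path):
    if not f.endswith(".md"):
        continue
    basename = f.removesuffix(".md")  # requires Python 3.9+

    # Images for this post live in source/_posts/MDImgs/<basename>.
    image_dir = join(source_path, "MDImgs", basename)
    if not os.path.isdir(image_dir):
        continue

    # The output path contains the creation date from the front matter.
    with open(join(source_path, f), "r", encoding="utf-8") as file:
        post = frontmatter.loads(file.read())
    year, month, day = post["date"].strftime("%Y-%m-%d").split("-")

    # Mirror the relative link next to the generated index.html:
    # public/YYYY/MM/DD/<basename>/MDImgs/<basename>
    dest_dir = join(target_path, year, month, day, basename, "MDImgs", basename)
    # copytree creates any missing parent directories itself.
    shutil.copytree(image_dir, dest_dir, dirs_exist_ok=True)

print("copy image task done")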

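Automatic deployment

As you can see, the build script is the centerpiece of the whole automation. So why use GitHub Actions for deployment as well? For one thing, I consider using Actions the hallmark of an automated workflow; for another, as the number of posts grows, deployment has become noticeably slower. The documentation also mentions two further benefits: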

Edit the file directly online, effective immediately

Automatic deployment, simultaneous deployment to multiple locations

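The documentation gives an example workflow, but its main drawback is that it can only deploy to the GitHub Pages site of the current repository. The visibility of a GitHub Pages site is the same as that of its repository, so letting others read the blog would mean making the blog's source public, which is clearly unreasonable. Instead, I write the blog in a private repository and deploy it through the GitHub Pages site of a public repository.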

Workflow

name: Deploy Hexo site to Pages

on:
  push:
    branches: [master]
  workflow_dispatch:

env:
  # name of the repository to deploy to (placeholder, replace with your own)
  TARGET_REPO: your-target-repo
  # same as timezone in _config.yml
  TZ: Europe/Berlin

jobs:
  # Build job
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Change TZ
        run: |
          sudo timedatectl set-timezone $TZ
          timedatectl
      - name: Checkout
        uses: actions/checkout@v4

      - name: Use Node.js 18
        uses: actions/setup-node@v4
        with:
          node-version: "18"

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install Dependencies
        run: | 
          npm install
          pip install -r requirements.txt

      - name: Build Hexo Site
        run: |
          python ./hexoBuild.py

      - name: Deploy to GH Pages
        env:
          PAT: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: |
          mv public /tmp/public
          cd /tmp/public

          git init
          git config --local user.name "github-actions[bot]"
          git config --local user.email "github-actions[bot]@users.noreply.github.com"
          git remote add target https://$PAT@github.com/your-username/$TARGET_REPO.git

          git add .
          git commit -m "Deploy updated blog"
          git push --force --set-upstream target master

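Changing the time zone

Why change the time zone? Because of what looks like a Hexo bug: although _config.yml has a timezone setting, the date in the generated HTML paths uses the machine's time zone. As a consequence, the build script above may copy the images to the wrong path (off by one day), and the images still fail to show.

GitHub Actions runners default to UTC, so the time zone has to be changed to match the one in _config.yml.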
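To make the off-by-one-day effect concrete, here is a small sketch using Python's standard zoneinfo module. The timestamp is a made-up example (a post written shortly after midnight in Berlin), not one of my actual posts.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical post written shortly after midnight, Berlin time.
written = datetime(2024, 4, 8, 0, 30, tzinfo=ZoneInfo("Europe/Berlin"))
# The same instant as seen by a machine running in UTC.
as_utc = written.astimezone(timezone.utc)

print(written.strftime("%Y/%m/%d"))  # 2024/04/08
print(as_utc.strftime("%Y/%m/%d"))   # 2024/04/07
```

The two date-based paths differ by one day, which is exactly the kind of mismatch that sends the images to the wrong directory.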

PAT

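Accessing one repository from another requires a token, specifically a Personal Access Token (PAT) under Settings | Developer Settings. Although fine-grained PATs were still in beta at the time of writing, their permission control is clearly easier to work with than that of classic tokens.

Create a fine-grained PAT, restrict its repository access to the deployment target repository (the public one), and grant it "Read and Write access to code". Then add this PAT to the Actions secrets of the source repository (the private one) under the name PERSONAL_ACCESS_TOKEN.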

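Deployment commands

The Deploy to GH Pages step deserves some explanation.

Why not use hexo deploy? Because every deployment simply commits the public directory again. The public directory is a build artifact: it is large, and tracking its history in Git makes little sense. Since the step starts from a fresh git init each time, git push --force --set-upstream target master guarantees that the target repository always contains exactly one commit, the latest artifact, so no space is wasted.

The preceding commands are nothing special: they create and configure a new Git repository. To access another repository with a PAT, use a remote URL of the form https://$PAT@github.com/username/repository.git, with your own user name and repository name filled in.

The target repository has GitHub Pages configured on its master branch. Whenever a new commit arrives, GitHub's built-in workflow deploys it. I did not want to use actions/deploy-pages; it feels less dependable to me than the built-in one.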
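The shape of that remote URL can be written down as a tiny helper. This is purely illustrative: the function name remote_url and the sample values are hypothetical, and in the workflow the token comes from the PERSONAL_ACCESS_TOKEN secret rather than from a literal string.

```python
def remote_url(token: str, owner: str, repo: str) -> str:
    """Build an HTTPS remote URL that embeds an access token."""
    return f"https://{token}@github.com/{owner}/{repo}.git"


# Placeholder values only; never hard-code a real token.
url = remote_url("TOKEN", "your-username", "your-target-repo")
print(url)  # https://TOKEN@github.com/your-username/your-target-repo.git
```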
