Hexo Workflow Automation

Introduction

Recently, I found the process of writing blog posts with Hexo a bit cumbersome, so I wanted to automate it. The workflow automation includes: updating NPM dependencies, writing build scripts to make up for some deficiencies in Hexo, and using GitHub Actions to automatically deploy the site.

When I first started using Hexo, I wasn’t good at reading documentation and preferred following tutorials. After more than a year of using Hexo and over six months of deploying various open-source software, I decided to work on this improvement.

Updating NPM Dependencies

I use the NexT theme, which can be installed in two ways: Git and NPM. I had been using Git, but it was inconvenient to update. After switching to NPM to install the NexT theme, the dependencies of the entire Hexo project are managed uniformly by NPM.

Update Commands

The commands to update dependencies are as follows:

First, install npm-check-updates (NCU):

npm install -g npm-check-updates

Check for updates, write the new versions to package.json, and install them:

ncu -u
npm install

Then, based on the prompts, we might need to run:

npm audit fix
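To review what ncu -u changed before running npm install, one option is to compare the dependency maps of the old and the new package.json. The following is a minimal sketch, assuming both versions have already been loaded as dictionaries; the function name changed_dependencies and the version numbers in the demo are made up for illustration.

```python
import json


def changed_dependencies(before: dict, after: dict) -> dict:
    """Return {package: (old_range, new_range)} for every entry that differs."""
    changes = {}
    for section in ("dependencies", "devDependencies"):
        old = before.get(section, {})
        new = after.get(section, {})
        for name in sorted(set(old) | set(new)):
            if old.get(name) != new.get(name):
                changes[name] = (old.get(name), new.get(name))
    return changes


if __name__ == "__main__":
    # Hypothetical package.json contents before and after `ncu -u`.
    before = json.loads('{"dependencies": {"hexo": "^7.0.0", "hexo-theme-next": "^8.18.0"}}')
    after = json.loads('{"dependencies": {"hexo": "^7.1.1", "hexo-theme-next": "^8.18.0"}}')
    for name, (old, new) in changed_dependencies(before, after).items():
        print(f"{name}: {old} -> {new}")
```

A package that was added or removed shows up with None on one side of the pair.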

Dependabot

The directory created by hexo init comes with a Dependabot configuration file. GitHub Dependabot can periodically check for updates and automate this process. However, I intentionally did not use it. Unlike Maven or NuGet in their default setup, NPM also maintains a package-lock.json file. Dependabot not only updates the versions in package.json, but also updates package-lock.json. What if something breaks in package-lock.json?

I’m not very familiar with NPM, so I’d rather manually run the commands above.

Writing Build Scripts

I am not satisfied with how Hexo handles images referenced in Markdown: it requires images to be placed in specific locations according to its own rules instead of simply following the image links in the Markdown file. As a result, after finishing a post, I have to rearrange the images to fit Hexo’s rules.

This is something Material for MkDocs does very well. It can automatically handle image links. But it’s a documentation framework. Although it has a blog plugin, the overall layout is still inclined towards documentation, making it unsuitable for a blog.

To avoid worrying about such trivial matters while writing, I wrote a script to handle image links.

Approach

My setup is: for each Markdown file, its images are saved in ./MDImgs/${filename}. For example, images in source/_posts/Glances-Docker.md are all in source/_posts/MDImgs/Glances-Docker.
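The convention above can be checked before building. The sketch below is an illustration under assumptions, not part of the build script: it assumes images are referenced with the standard Markdown syntax ![alt](path), and the names IMAGE_LINK and misplaced_images are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches Markdown image syntax ![alt](target "optional title") and captures the target.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def misplaced_images(markdown_text: str, post_filename: str) -> list:
    """Return local image links that are not under MDImgs/<post name>/."""
    basename = PurePosixPath(post_filename).stem
    expected = PurePosixPath("MDImgs") / basename
    bad = []
    for target in IMAGE_LINK.findall(markdown_text):
        if "://" in target:
            continue  # remote images do not need to be copied
        # PurePosixPath drops a leading "./", so both spellings compare equal.
        if expected not in PurePosixPath(target).parents:
            bad.append(target)
    return bad


if __name__ == "__main__":
    text = "![cpu](./MDImgs/Glances-Docker/cpu.png)\n![old](images/old.png)"
    print(misplaced_images(text, "Glances-Docker.md"))
```

Any link it reports would not be picked up when the images are copied from MDImgs/${filename}.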

After running hexo generate, the post is rendered to public/2024/04/08/Glances-Docker/index.html. The generated HTML still contains the relative image links. Since Hexo doesn’t copy the images by default, we need to copy them to the matching path, in this example public/2024/04/08/Glances-Docker/MDImgs/Glances-Docker.

The date in the path is the creation time of the post, which can be obtained from the date field in the frontmatter.
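The mapping from a post to its source and target image directories can be sketched as a small function. This assumes the year/month/day/title path layout shown in the example above; the function name image_dirs and its default arguments are chosen for illustration only.

```python
from datetime import datetime
from pathlib import PurePosixPath


def image_dirs(post_filename: str, date: datetime,
               source_root: str = "source/_posts",
               public_root: str = "public"):
    """Return (source image dir, target image dir) for one post."""
    basename = PurePosixPath(post_filename).stem
    src = PurePosixPath(source_root) / "MDImgs" / basename
    # Zero-padded year/month/day segments, as they appear in the generated path.
    dst = (PurePosixPath(public_root)
           / f"{date:%Y}" / f"{date:%m}" / f"{date:%d}"
           / basename / "MDImgs" / basename)
    return src, dst


if __name__ == "__main__":
    src, dst = image_dirs("Glances-Docker.md", datetime(2024, 4, 8, 10, 0))
    print(src)  # source/_posts/MDImgs/Glances-Docker
    print(dst)  # public/2024/04/08/Glances-Docker/MDImgs/Glances-Docker
```

If the permalink setting in _config.yml is changed, the target path has to be adapted accordingly.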

Code

hexo clean and hexo generate are also called via the script so that it handles all the build tasks.

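import os
import shutil
import subprocess
from os.path import join

import frontmatter  # from the python-frontmatter package

source_path = r"./source/_posts"
target_path = r"./public"

# Rebuild the site from scratch and stop if either Hexo command fails.
subprocess.run("npx hexo clean", shell=True, check=True)
subprocess.run("npx hexo generate", shell=True, check=True)

for f in os.listdir(source_path):
    if not f.endswith(".md"):
        continue
    basename = f.removesuffix(".md")  # requires Python 3.9+

    # Images of this post live in source/_posts/MDImgs/<basename>.
    dir1 = join(source_path, "MDImgs", basename)
    if not os.path.exists(dir1):
        continue

    # The date in the frontmatter determines the output path.
    with open(join(source_path, f), "r", encoding="utf-8") as file:
        post = frontmatter.loads(file.read())
    date = post["date"].strftime("%Y-%m-%d").split("-")

    # public/<year>/<month>/<day>/<basename>/MDImgs/<basename>
    dir2 = join(
        target_path, date[0], date[1], date[2], basename, "MDImgs", basename
    )

    # copytree creates missing parent directories itself.
    shutil.copytree(dir1, dir2, dirs_exist_ok=True)

print("copy image task done")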
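The script calls strftime on the date field, which works when the YAML parser turns the value into a date or datetime object. If the date is quoted in the frontmatter, it arrives as a plain string instead. A minimal sketch of a more tolerant conversion, assuming string dates are in ISO format such as 2024-04-08 10:00:00 (the helper name date_parts is hypothetical):

```python
from datetime import date, datetime


def date_parts(value) -> list:
    """Return [year, month, day] as zero-padded strings."""
    if isinstance(value, date):  # also covers datetime, a subclass of date
        parsed = value
    elif isinstance(value, str):
        # Accepts "YYYY-MM-DD" and "YYYY-MM-DD HH:MM:SS".
        parsed = datetime.fromisoformat(value.strip())
    else:
        raise TypeError(f"unsupported date value: {value!r}")
    return parsed.strftime("%Y-%m-%d").split("-")


if __name__ == "__main__":
    print(date_parts(datetime(2024, 4, 8, 10, 0)))
    print(date_parts("2024-04-08 10:00:00"))
```

In the loop above, date_parts(post["date"]) could stand in for the strftime expression.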

Automatic Site Deployment

As we can see, the build script is the core of the entire automation workflow. So why use GitHub Actions for automatic deployment? On the one hand, I believe using Actions is a hallmark of achieving automation; on the other hand, as the number of posts increases, the deployment time becomes noticeably longer. Additionally, the documentation mentions two more benefits:

  • Edit the file directly online, effective immediately
  • Automatic deployment, simultaneous deployment to multiple locations

The documentation provides a workflow example, but its main drawback is that it can only deploy to the GitHub Pages site of the current repository. On the free plan, GH Pages is only available for public repositories. Therefore, if we want others to see our blog, we have to make the source repository public, which is obviously unreasonable. Instead, I write my blog in a private repository and deploy it to the GH Pages site of a public repository.

Workflow

name: Deploy Hexo site to Pages

on:
  push:
    branches: [master]
  workflow_dispatch:

env:
  # deploy to which repo
  TARGET_REPO: target-repo-name
  # same as timezone in _config.yml
  TZ: Europe/Berlin

jobs:
  # Build job
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Change TZ
        run: |
          sudo timedatectl set-timezone $TZ
          timedatectl
      - name: Checkout
        uses: actions/checkout@v4

      - name: Use Node.js 18
        uses: actions/setup-node@v4
        with:
          node-version: "18"

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install Dependencies
        run: |
          npm install
          pip install -r requirements.txt

      - name: Build Hexo Site
        run: |
          python ./hexoBuild.py

      - name: Deploy to GH Pages
        env:
          PAT: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: |
          mv public /tmp/public
          cd /tmp/public

          git init
          git config --local user.name "github-actions[bot]"
          git config --local user.email "github-actions[bot]@users.noreply.github.com"
          git remote add target https://$PAT@github.com/username/$TARGET_REPO.git

          git add .
          git commit -m "Deploy updated blog"
          git push --force --set-upstream target master

Time Zone Adjustment

Why adjust the time zone? Because of a Hexo bug: although the timezone attribute can be configured in _config.yml, the dates in the generated HTML paths use the machine’s time zone. As a result, the build script above might copy images to the wrong path (off by one day), and the images still won’t display.

The machines in GitHub Actions use UTC by default, so we need to change their time zone to match the one in _config.yml.
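The one-day difference is easy to reproduce. The sketch below formats the same instant as a path segment in two time zones; the post time is a made-up example and path_date is a hypothetical helper, not something Hexo provides.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def path_date(instant: datetime, tz_name: str) -> str:
    """Format an instant as the year/month/day path segment seen in tz_name."""
    return instant.astimezone(ZoneInfo(tz_name)).strftime("%Y/%m/%d")


if __name__ == "__main__":
    # Hypothetical post written shortly after midnight in Berlin.
    written = datetime(2024, 4, 8, 0, 30, tzinfo=ZoneInfo("Europe/Berlin"))
    print(path_date(written, "Europe/Berlin"))  # 2024/04/08
    print(path_date(written, "UTC"))            # 2024/04/07
```

A post dated 00:30 in Berlin therefore lands under the previous day on a UTC machine, while the build script, which reads the date from the frontmatter, copies the images to the folder for 08.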

PAT

Since it involves accessing one repository from another, a token is needed, specifically a Personal Access Token (PAT) from Settings | Developer Settings. Fine-grained PATs were still in beta at the time of writing, but they offer more precise permission control than the classic ones.

Generate a new fine-grained PAT, set “Repository access” to the (public) deployment repository, and grant it “Read and Write access to code”. Add this PAT to the Actions secrets of the (private) source repository under the name PERSONAL_ACCESS_TOKEN.

Deployment Command

The Deploy to GH Pages step explained:

Why not use hexo deploy? Because each deployment essentially commits the public directory and pushes it. The directory is a fairly large build artifact and therefore not well suited to accumulating in Git history. Since git init creates a fresh repository on every run, git push --force --set-upstream target master replaces the history of the target repository, so it always contains exactly one commit, i.e. the latest artifact, which saves space.

The earlier steps are nothing special; they just create and configure a Git repository. To access another repository via PAT, use https://$PAT@github.com/username/repository-name.git as the remote address.
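For running the same deployment outside of GitHub Actions, the remote address and the Git command sequence can be assembled in Python. This is only a sketch that mirrors the shell step above: the function names are hypothetical, and the commands are built as argument lists rather than executed.

```python
def remote_url(token: str, owner: str, repo: str) -> str:
    """Build an HTTPS remote address that authenticates with a PAT."""
    return f"https://{token}@github.com/{owner}/{repo}.git"


def deploy_commands(url: str, branch: str = "master") -> list:
    """Return the Git commands of the deploy step as argument lists."""
    return [
        ["git", "init"],
        ["git", "remote", "add", "target", url],
        ["git", "add", "."],
        ["git", "commit", "-m", "Deploy updated blog"],
        ["git", "push", "--force", "--set-upstream", "target", branch],
    ]


def redact(text: str, token: str) -> str:
    """Hide the token before a command is printed or logged."""
    return text.replace(token, "***")


if __name__ == "__main__":
    token = "EXAMPLE_TOKEN"  # placeholder, not a real token
    for cmd in deploy_commands(remote_url(token, "username", "target-repo-name")):
        print(redact(" ".join(cmd), token))
```

Each list could be passed to subprocess.run(cmd, cwd=..., check=True) from inside the copied public directory. The redaction matters when running locally, where nothing masks the token in the output.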

The target repository is configured with GH Pages for the master branch. When there is a new commit, GitHub’s built-in workflow is used to deploy it. I chose not to use actions/deploy-pages because the built-in workflow feels more reliable.

References