The Architecture of Azure Pipelines

Introduction

Azure Pipelines is a component of Azure DevOps (a SaaS platform) that provides automated CI/CD pipelines. Similar technologies include GitHub Actions and Jenkins.

As users, we only need to define various tasks in YAML, trigger the pipeline, and Azure Pipelines automatically executes these tasks for us.

So, how are tasks executed? Is there a limit to parallel tasks? How can we design YAML more efficiently to make a pipeline run faster? This article will analyze the architecture of Azure Pipelines from the perspective of task scheduling, providing answers to these questions along the way.

Azure Pipelines Terminology


For users of Azure Pipelines, a YAML file uniquely defines the logic of a pipeline run:

  • A pipeline can contain multiple stages.
  • A stage can contain multiple jobs.
  • A job can contain multiple steps.
  • A step is the smallest unit of work; it is either a script or a task (a prepackaged, reusable script).
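
The hierarchy above can be illustrated with a minimal YAML definition (stage, job, and step names here are placeholders, not from the original article):

```yaml
# Hypothetical pipeline showing the hierarchy: stages > jobs > steps.
stages:
- stage: Build
  jobs:
  - job: Compile
    steps:
    - script: make all                  # a script step
    - task: PublishBuildArtifacts@1     # a task step (prepackaged script)
- stage: Test
  jobs:
  - job: UnitTests
    steps:
    - script: make test
```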

For Azure Pipelines itself, when a pipeline is triggered, it needs to invoke computing resources to execute the pipeline’s logic. Specifically, a machine (agent) is responsible for executing a job.

A job runs entirely on a single agent and is never split across agents. However, different jobs of the same pipeline may be assigned to different agents for execution.

How is this distribution of jobs accomplished?

Azure Pipelines Task Scheduling

Azure Pipelines task scheduling follows a typical distributed task scheduling model (similar technologies include Quartz and xxl-job):

  • Azure Pipelines maintains a ‘Task Queue’ containing all pending jobs (‘job’ here refers both to the job/task of the task scheduling model and to the job term in the pipeline).
  • An agent is a worker that periodically checks the ‘Task Queue’ for jobs to run. When a job is available, agents compete for the right to execute it; the winner acquires the relevant resources and starts executing the job.

Therefore, a pipeline is executed as follows:

  1. The pipeline is triggered. Azure Pipelines parses the YAML, splits it into multiple jobs, and pushes them to the ‘Task Queue’.
  2. If an idle agent exists at that moment (meaning the ‘Task Queue’ was empty), it is notified that a new job has arrived and competes with any other idle agents for the right to execute it.
    • If it wins, it starts executing the job.
    • If it loses, it goes back to listening.
  3. If no agent is idle (the ‘Task Queue’ already held jobs, or no agent has finished its current job), the newly generated jobs wait in order for an agent; jobs are queued by pipeline trigger time, which keeps scheduling fair.
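
The queue-and-compete flow above can be sketched in a few lines of Python. This is a conceptual model, not Azure's actual implementation: agents are threads that race to pull jobs off a shared queue, and each job is executed by exactly one winner.

```python
# Conceptual sketch of the scheduling model (not Azure's real code):
# agents compete to pull jobs from a shared 'Task Queue'.
import queue
import threading

job_queue = queue.Queue()
results = []                      # (agent_name, job) pairs, for inspection
results_lock = threading.Lock()

def agent(name):
    """An agent loops, competing to take the next job off the queue."""
    while True:
        try:
            job = job_queue.get(timeout=0.2)  # winner of the competition
        except queue.Empty:
            return                            # no jobs left; agent idles out
        with results_lock:
            results.append((name, job))       # "execute" the job
        job_queue.task_done()

# Triggering a pipeline: the YAML is parsed into jobs and queued.
for job_id in ["Build", "UnitTest", "Lint", "Package"]:
    job_queue.put(job_id)

agents = [threading.Thread(target=agent, args=(f"agent-{i}",)) for i in range(2)]
for t in agents:
    t.start()
for t in agents:
    t.join()

# Every job ran exactly once, spread across the two agents.
print(sorted(job for _, job in results))
```

The key property the sketch demonstrates: `queue.Queue.get` is atomic, so even with many competing agents, each job is handed to exactly one of them.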

Parallel Jobs

This task scheduling model naturally supports parallelism. When writing YAML, steps or tasks that have no dependencies on each other should be promoted to separate jobs, allowing them to run in parallel and shorten the overall execution time.
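
For instance, a hypothetical YAML along these lines (job names are illustrative) promotes two independent steps to jobs so the scheduler can run them in parallel:

```yaml
# Hypothetical example: Lint and UnitTest have no dependsOn, so they
# can be scheduled in parallel (given enough agents and parallel jobs).
jobs:
- job: Lint
  steps:
  - script: make lint
- job: UnitTest
  steps:
  - script: make test
- job: Package
  dependsOn: [Lint, UnitTest]   # runs only after both finish
  steps:
  - script: make package
```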

In my internship experience, however, few people do this; most leave everything at the default step level, even when many agents sit idle. 🤔

Now, since the immediate execution of a new job depends on the presence of idle agents, does increasing the number of agents improve overall throughput (the number of jobs completed per unit of time)? Unfortunately, not really.

After all, Microsoft needs to profit from Azure, so they introduced the concept of ‘Parallel Jobs’: a cap on the number of jobs that can run simultaneously. Raising this cap requires a purchase.

In other words, to run jobs in parallel, we must purchase not only Azure VM agents but also Parallel Jobs. Even with many self-hosted machines, we still need to buy Parallel Jobs.
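
Conceptually, the parallel-jobs cap behaves like a semaphore sitting in front of the agent pool: this is my own model of the behavior, not Microsoft's implementation. Even with four agents available, a cap of two means at most two jobs run at once.

```python
# Conceptual model (an assumption, not Azure's code): the purchased
# 'Parallel Jobs' count acts like a semaphore that caps concurrency,
# regardless of how many agents exist.
import threading
import time

PARALLEL_JOBS = 2                         # the purchased limit
slots = threading.Semaphore(PARALLEL_JOBS)
peak = 0                                  # highest observed concurrency
running = 0
lock = threading.Lock()

def run_job(job_id):
    global peak, running
    with slots:                           # must hold a parallel-job slot
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)                  # simulate doing work
        with lock:
            running -= 1

# Four agents each have a job, but only PARALLEL_JOBS run concurrently.
threads = [threading.Thread(target=run_job, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)  # never exceeds PARALLEL_JOBS
```

So buying more agents without raising the Parallel Jobs cap only deepens the pool of idle workers; it is the semaphore, not the agent count, that bounds throughput.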
