Bootstrap-bridge as top-level job

The idea here is

 * all prod jobs are parented to the bootstrap-bridge job (they have a
   hard dependency on this job).

 * the bootstrap-bridge job checks out the system-config source to the
   right place (the commit for a change, master HEAD for periodic). This
   was actually implemented in a prior change. We're just taking full
   advantage of it here.

 * bootstrap-bridge pauses once bridge is set up with system-config
   checked out to the right place

 * the child jobs now don't have to worry about cloning system-config;
   they can be sure that it's at the right place for them.  They just
   need keys so their executor can log into bridge and run the
   playbooks against the production hosts.

 * the bootstrap-bridge job holds a semaphore while it is paused, which
   stops any other runs jumping in.  In deployment, Zuul is ordering
   things for us anyway, so really this is stopping conflicts with the
   periodic jobs.

 * in theory - all the child production jobs could run in parallel
   while the bootstrap job waits for them (modulo dependencies they
   have expressed; e.g. needing letsencrypt or backup jobs to have
   run). To begin with we limit this with a second semaphore with a
   limit of 1. We can roll this out, check that things mostly operate
   as they did before, then bump the max value on this semaphore upwards
   to run things in parallel.

 * does this work?  I have no idea :) It seems difficult to test
   outside production because on the testing side everything is its
   own little world; there's no overarching bootstrap job.
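
As a rough sketch of how the pieces above fit together, here is the
relevant Zuul configuration with its indentation restored. The names and
values are taken from the diff in this change; most attributes
(descriptions, required-projects, vars, post-run) are omitted, and the
indentation is an assumption based on normal Zuul job syntax rather than
a copy of the real file.

```yaml
# Held by the paused bootstrap job for the whole buildset, so a periodic
# run cannot start while a deploy run is still in progress (and vice versa).
- semaphore:
    name: infra-prod-deployment
    max: 1

# Caps how many playbook jobs run on bridge at once. Left at 1 to keep
# the old serial behaviour; raising it later allows parallel runs.
- semaphore:
    name: infra-prod-playbook-limit
    max: 1

- job:
    name: infra-prod-bootstrap-bridge
    parent: opendev-infra-prod-setup-src
    semaphores: infra-prod-deployment
    run: playbooks/zuul/run-production-bootstrap-bridge.yaml
    nodeset:
      nodes: []

- job:
    name: infra-prod-playbook
    parent: opendev-infra-prod-setup-keys
    abstract: true
    semaphores: infra-prod-playbook-limit
    run: playbooks/zuul/run-production-playbook.yaml
    nodeset:
      nodes: []
    # No "soft: true" here, so this is a hard dependency: the bootstrap
    # job must be part of the buildset and must have run first.
    dependencies:
      - name: infra-prod-bootstrap-bridge
```

The two semaphores do different jobs: infra-prod-deployment serialises
whole buildsets, while infra-prod-playbook-limit only throttles the
child jobs within whichever buildset currently holds bridge.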

Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/942740
Change-Id: I7d2c4737f900c9b964855c4d03ca58a2de2d60b3
Ian Wienand 2025-02-21 15:30:18 +11:00 committed by Clark Boylan
parent 820bd2775a
commit d616ec9d9a
3 changed files with 66 additions and 45 deletions


@ -3,3 +3,9 @@
- add-bastion-host
- import_playbook: ../bootstrap-bridge.yaml
- name: Wait for child jobs
zuul_return:
data:
zuul:
pause: true


@ -2,14 +2,59 @@
# in projects.yaml because it's easier to keep an overall view of
# what's happening in there.
# Make sure only one run of a system-config playbook happens at a time
# Make sure only one run happens at a time. The deploy pipeline
# should keep things in order, but this is to stop periodic jobs
# jumping in.
- semaphore:
name: infra-prod-playbook
name: infra-prod-deployment
max: 1
# This semaphore limits the total number of production playbook
# jobs that can run on bridge at one time. We want things to run in
# parallel but we have a lot of jobs (particularly in the periodic
# pipeline) that we don't want to run all at once.
- semaphore:
name: infra-prod-playbook-limit
# TODO(clarkb) this semaphore allows us to stage the rollout of
# parallel infra-prod job execution in two steps. First we reorganize
# everything but roughly keep the same behaviors as before (max: 1).
# When we are happy with that we can bump this to 2 or higher and see
# things run in parallel.
max: 1
- job:
name: infra-prod-bootstrap-bridge
parent: opendev-infra-prod-setup-src
semaphores: infra-prod-deployment
description: |
Configure the bastion host (bridge)
This job does minimal configuration on the bastion host
(bridge.openstack.org) to allow it to run system-config
playbooks against our production hosts. It sets up Ansible
and root keys on the host. It also synchronizes the
system-config repo from the executor to the bastion.
Note that this is separate to infra-prod-service-bridge;
bridge in its role as the bastion host actually runs that
against itself; it includes things not strictly needed to make
the host able to deploy system-config.
This job is the parent of all deployment jobs, and will pause
until they finish. This prevents conflicts between deployment
jobs from changes and periodic runs (which use HEAD of
master).
run: playbooks/zuul/run-production-bootstrap-bridge.yaml
# Do not set file matchers on this job. We must always run this job
# before any other infra-prod jobs to ensure system-config is up to
# date on bridge before we run our playbooks.
nodeset:
nodes: []
- job:
name: infra-prod-playbook
parent: opendev-infra-prod-base
parent: opendev-infra-prod-setup-keys
semaphores: infra-prod-playbook-limit
description: |
Run specified playbook against production hosts.
@ -19,7 +64,6 @@
/home/zuul/src/opendev.org/opendev/system-config/playbooks
on the bastion host.
abstract: true
semaphores: infra-prod-playbook
run: playbooks/zuul/run-production-playbook.yaml
post-run: playbooks/zuul/run-production-playbook-post.yaml
required-projects:
@ -30,41 +74,12 @@
infra_prod_playbook_encrypt_log: true
nodeset:
nodes: []
- job:
name: infra-prod-bootstrap-bridge
parent: opendev-infra-prod-setup-src
description: |
Configure the bastion host (bridge)
This job does minimal configuration on the bastion host
(bridge.openstack.org) to allow it to run system-config
playbooks against our production hosts. It sets up Ansible
and root keys on the host. It also synchronizes the system-config
repo from the executor to the bastion. This is necessary to
emit an up to date known_hosts file when adding new hosts to
the inventory.
Note that this is separate to infra-prod-service-bridge;
bridge in its role as the bastion host actually runs that
against itself; it includes things not strictly needed to make
the host able to deploy system-config.
# While we don't run the infra-prod-playbook in this job we do run
# system-config git repo updates. Until we're ready to stop running
# system-config updates in every job we use this semaphore to ensure
# exclusivity.
semaphores: infra-prod-playbook
run: playbooks/zuul/run-production-bootstrap-bridge.yaml
files:
- playbooks/bootstrap-bridge.yaml
- playbooks/zuul/run-production-bootstrap-bridge.yaml
- playbooks/zuul/run-production-bootstrap-bridge-add-rootkey.yaml
- playbooks/roles/install-ansible/
- playbooks/roles/root-keys/
- inventory/base/hosts.yaml
- inventory/service/group_vars/bastion.yaml
nodeset:
nodes: []
dependencies:
- name: infra-prod-bootstrap-bridge
# This is a hard dependency because we require the bootstrap job to
# have run before we start any playbook jobs, otherwise our buildset
# would not hold the bridge semaphore and we may not have the correct
# system-config state on bridge.
- job:
name: infra-prod-base


@ -340,7 +340,10 @@
# NOTE: infra-prod-* jobs have a hierarchy below that ensures
# they can run in parallel. We are deliberately keeping their
# dependencies here rather than job definitions to help keep
# these relationships clear.
# these relationships clear. The one exception to this is the
# base infra-prod-playbook job depends on infra-prod-bootstrap-bridge.
# We make this exception because it is vital that bootstrap-bridge
# run before everything else always.
# This installs the ansible on bridge that all the infra-prod
# jobs will run with. Note the jobs use this ansible to then
@ -348,10 +351,7 @@
- infra-prod-bootstrap-bridge
# From now on, all jobs should depend on base
- infra-prod-base: &infra-prod-base
dependencies:
- name: infra-prod-bootstrap-bridge
soft: true
- infra-prod-base
# Legacy puppet hosts
- infra-prod-remote-puppet-else: &infra-prod-remote-puppet-else
@ -633,7 +633,7 @@
# Nightly runs of ansible things for catchup
# Keep in order from above
- infra-prod-bootstrap-bridge
- infra-prod-base: *infra-prod-base
- infra-prod-base
- infra-prod-remote-puppet-else: *infra-prod-remote-puppet-else
- infra-prod-letsencrypt: *infra-prod-letsencrypt
- infra-prod-service-bridge: *infra-prod-service-bridge