diff --git a/README.md b/README.md index a409816..e6e0c5d 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,24 @@ # capi-helm-charts -This repository contains Helm charts that can be used to deploy Cluster API resources. +This repository contains [Helm charts](https://helm.sh/) for deploying [Kubernetes](https://kubernetes.io/) +clusters using [Cluster API](https://cluster-api.sigs.k8s.io/). + +The charts are available from the `stackhpc.github.io/capi-helm-charts` repository: + +```sh +helm repo add capi https://stackhpc.github.io/capi-helm-charts +helm install my-release capi/ [...options] +``` + +To list the available versions for the charts: + +```sh +helm search repo capi --devel --versions +``` + +Currently, the following charts are available: + +| Chart | Description | +| --- | --- | +| [cluster-addons](./charts/cluster-addons) | Deploys addons into a Kubernetes cluster, e.g. CNI. | +| [openstack-cluster](./charts/openstack-cluster) | Deploys a Kubernetes cluster on an OpenStack cloud. | diff --git a/charts/cluster-addons/README.md b/charts/cluster-addons/README.md new file mode 100644 index 0000000..27c80f1 --- /dev/null +++ b/charts/cluster-addons/README.md @@ -0,0 +1,229 @@ +# cluster-addons chart + +This [Helm chart](https://helm.sh/) manages the deployment of addons for a +[Kubernetes](https://kubernetes.io) cluster. It is primarily intended to be used with +the cluster management charts from this repository, e.g. +[openstack-cluster](../openstack-cluster), but should work for any Kubernetes cluster. + +The addons are deployed by launching +[Kubernetes jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job/) on the +target cluster, each of which is responsible for installing or updating a single addon. 
+The jobs use the [utils image](../../utils) from this repository, which bundles some +useful tools like [jq](https://stedolan.github.io/jq/), +[kubectl](https://kubernetes.io/docs/reference/kubectl/overview/), +[kustomize](https://kustomize.io/) and [helm](https://helm.sh), and the jobs execute +with full permissions on the cluster using the `cluster-admin` cluster role. This is +used rather than a more restrictive role for a few reasons: + + 1. This chart provides a mechanism to apply custom addons, and there is no way to + know in advance what resources those custom addons may need to manage. + 1. Addons may need to manage + [CRD](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/) + instances that are not covered by a more restrictive role. + 1. Several addons need to create + [RBAC](https://kubernetes.io/docs/reference/access-authn-authz/rbac/) resources, + and so could elevate their permissions anyway by creating new roles. + +There are two patterns used in this chart for managing addons: + + 1. Manifests are pulled from a URL and run through `kustomize` before being applied + using `kubectl apply`. The manifests are **not** present in this repository. In + this case, the URL and kustomize configuration can be changed using the Helm values + if required, e.g. to change images from Docker Hub to another repository or to + point to an internal source if an air-gapped installation is required. + 1. Using a Helm chart. The chart to use is configured using Helm values rather + than Helm dependencies, which allows full control via configuration over which + repository is used (e.g. a mirror for an air-gapped installation) and which version + is installed. The Helm values for the addon are also exposed, and can be customised, + via the values for this chart. This chart sets sensible defaults. 
+ +This chart also allows custom addons to be managed using the Helm values, either by +specifying manifest content inline, or by specifying a Helm chart to install with the +corresponding values. + +## Container Network Interface (CNI) plugins + +This chart can install either +[Calico](https://docs.projectcalico.org/about/about-calico) or +[Weave](https://www.weave.works/docs/net/latest/kubernetes/kube-addon/) as a +[CNI plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/) +to provide the pod networking in a Kubernetes cluster. By default, the Calico CNI will be +installed. + +To switch the CNI to Weave, use the following in your Helm values: + +```yaml +cni: + type: weave +``` + +And to disable the installation of a CNI completely: + +```yaml +cni: + enabled: false +``` + +Additional configuration options are available for each - see [values.yaml](./values.yaml). + +## Cloud Controller Managers (CCMs) + +In Kubernetes, a +[Cloud Controller Manager (CCM)](https://kubernetes.io/docs/concepts/architecture/cloud-controller/) +provides integration between a Kubernetes cluster and the cloud platform that it is running on. +This enables things like the automatic labelling of nodes with cloud-specific information, +automatic configuration of hostnames and IP addresses, and managed load balancers for services. + +This chart can install the +[OpenStack CCM](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/openstack-cloud-controller-manager/using-openstack-cloud-controller-manager.md) +to provide this integration for clusters running on an OpenStack cloud. + +By default, this chart does not deploy a CCM. 
To enable the OpenStack CCM on the target cluster, +use the following in your Helm values: + +```yaml +ccm: + enabled: true + type: openstack +``` + +To configure options for the `[Networking]`, `[LoadBalancer]` and `[Metadata]` sections of the +[cloud-config](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/openstack-cloud-controller-manager/using-openstack-cloud-controller-manager.md#config-openstack-cloud-controller-manager) +file, you can use the Helm values, e.g.: + +```yaml +ccm: + openstack: + cloudConfig: + networking: + public-network-name: public-internet + loadBalancer: + lb-method: LEAST_CONNECTIONS + create-monitor: "true" + metadata: + search-order: metadataService +``` + +The `[Globals]` section is populated using the given `clouds.yaml` (see "OpenStack credentials" below). + +Additional configuration options are available for CCMs - see [values.yaml](./values.yaml). + +### OpenStack credentials + +OpenStack credentials are required for the Kubernetes OpenStack integrations to query and +manage OpenStack resources on behalf of the cluster. The recommended way to do this is using an +[Application Credential](https://docs.openstack.org/keystone/latest/user/application_credentials.html) +to avoid your password being stored on the cluster. Application credentials are project-scoped, +and ideally you should use a separate application credential for each cluster in a project. + +For ease of use, this chart is written so that a `clouds.yaml` file can be given directly +to the chart as a configuration file. 
When an application credential is created in Horizon, +the corresponding `clouds.yaml` file can be downloaded, and should look something like this: + +```yaml +clouds: + openstack: + auth: + auth_url: https://my.cloud:5000 + application_credential_id: "" + application_credential_secret: "" + region_name: "RegionOne" + interface: "public" + identity_api_version: 3 + auth_type: "v3applicationcredential" +``` + +This file can then be passed to the chart using the `-f|--values` option, e.g.: + +```sh +helm install cluster-addons capi/cluster-addons --values ./clouds.yaml [...options] +``` + +## NVIDIA GPU operator + +This chart is able to install the +[NVIDIA GPU operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/overview.html) +to provide access to NVIDIA GPUs from Kubernetes pods using the +[device plugin framework](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/). + +When deployed, the GPU operator will detect nodes with NVIDIA GPUs and automatically install the +NVIDIA software components required to make the GPUs available to Kubernetes. This does not +require any special modifications to the image used to deploy the nodes. + +The GPU operator is not enabled by default. To enable it, use the following Helm values: + +```yaml +nvidiaGPUOperator: + enabled: true +``` + +Because of the automatic detection of nodes with GPUs, there is no need to manually label +nodes with GPUs. In the case where some nodes have GPUs and some do not, the GPU operator +will do the right thing without the need for manual intervention. + +Additional configuration options are available for the NVIDIA GPU operator - see +[values.yaml](./values.yaml). + +## Custom manifests + +This chart is able to manage the application of custom user-specified manifests to the +cluster using `kubectl apply`. 
This can be useful to install cluster-specific resources +such as additional +[storage classes](https://kubernetes.io/docs/concepts/storage/storage-classes/) +or [RBAC rules](https://kubernetes.io/docs/reference/access-authn-authz/rbac/). + +To apply custom manifests to the cluster as part of the addon installation, use something +similar to the following in your Helm values: + +```yaml +# This should be a mapping of filenames to manifest content +customManifests: + storageclass-standard.yaml: | + apiVersion: storage.k8s.io/v1 + kind: StorageClass + metadata: + name: standard + provisioner: my-storage-provisioner + + pod-reader.yaml: | + apiVersion: rbac.authorization.k8s.io/v1 + kind: ClusterRole + metadata: + name: pod-reader + rules: + - apiGroups: [""] + resources: ["pods"] + verbs: ["get", "watch", "list"] +``` + +## Custom Helm charts + +In addition to simple custom manifests, this chart is also able to manage additional +cluster-specific Helm releases. + +To deploy a custom Helm release as part of the addon installation, use something similar +to the following in your Helm values: + +```yaml +customHelmReleases: + # This is the name of the release + my-wordpress: + chart: + # The repository that the chart is in + repo: https://charts.bitnami.com/bitnami + # The name of the chart + name: wordpress + # The version of the chart to use + # NOTE: THIS IS REQUIRED + version: 12.1.6 + # The namespace for the release + # If not given, this defaults to the release name + namespace: wordpress + # The amount of time to wait for the chart to deploy before rolling back + timeout: 5m + # The values for the chart + values: + wordpressUsername: jbloggs + wordpressPassword: supersecretpassword + wordpressBlogName: JBloggs Awesome Blog! 
+``` diff --git a/charts/cluster-addons/values.yaml b/charts/cluster-addons/values.yaml index 710d713..95fa4e0 100644 --- a/charts/cluster-addons/values.yaml +++ b/charts/cluster-addons/values.yaml @@ -66,7 +66,7 @@ cni: # Settings for the cloud controller manager for external cloud providers ccm: # Indicates if an external cloud controller manager should be deployed - enabled: true + enabled: false # The type of the external CCM to deploy - currently only OpenStack is supported type: openstack # Settings for the OpenStack cloud controller manager @@ -102,7 +102,7 @@ ccm: # Settings for the NVIDIA GPU operator nvidiaGPUOperator: # Indicates if the NVIDIA GPU operator should be enabled - enabled: true + enabled: false chart: repo: https://nvidia.github.io/gpu-operator name: gpu-operator diff --git a/charts/openstack-cluster/README.md b/charts/openstack-cluster/README.md new file mode 100644 index 0000000..04095a0 --- /dev/null +++ b/charts/openstack-cluster/README.md @@ -0,0 +1,208 @@ +# openstack-cluster chart + +This [Helm chart](https://helm.sh/) manages the lifecycle of a [Kubernetes](https://kubernetes.io) +cluster on an [OpenStack](https://www.openstack.org/) cloud using +[Cluster API](https://cluster-api.sigs.k8s.io/). + +As well as managing the Cluster API resources for the cluster, this chart optionally +manages addons for the cluster using Kubernetes jobs. Some of these are required for +a functional cluster, e.g. a +[Container Network Interface (CNI) plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/) +and the +[OpenStack Cloud Controller Manager (CCM)](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/openstack-cloud-controller-manager/using-openstack-cloud-controller-manager.md), and +others are optional. + +> See the [cluster-addons chart](../cluster-addons) for more details about the addons +> that can be installed. 
+ +This README describes some of the basic options, however there are many other options +available. Check out the [values.yaml](./values.yaml) (commented) and the chart +templates for more details. + +## Prerequisites + +First, you must set up a +[Cluster API management cluster](https://cluster-api.sigs.k8s.io/user/concepts.html#management-cluster) +with the [OpenStack Infrastructure Provider](https://github.com/kubernetes-sigs/cluster-api-provider-openstack) +installed. + +In addition, Helm must be installed and configured to access your management cluster, +and the chart repository containing this chart must be configured: + +```sh +helm repo add capi https://stackhpc.github.io/capi-helm-charts +``` + +## OpenStack images + +Cluster API uses an +[immutable infrastructure](https://www.hashicorp.com/resources/what-is-mutable-vs-immutable-infrastructure) +pattern where images are built with specific versions of the required +software installed (e.g. +[kubelet](https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/), +[kubeadm](https://kubernetes.io/docs/reference/setup-tools/kubeadm/)). + +Using this pattern, particularly with pre-built images, has some significant advantages, e.g.: + + * Creating, upgrading and (auto-)scaling of clusters is fast as the required software + is already available in the image. + * New images for operating system updates or new Kubernetes versions can be + built and tested before being rolled out onto a production cluster with confidence + that nothing has changed. + * Images can be built and tested once and shared by multiple clusters. + * Zero-downtime upgrades can be performed by replacing machines one at a time, + with rollback if the upgrade fails. + +Your cloud provider may use a centralised process to build, test and share suitable +images with all projects. 
If you need to build a suitable image, the +[Kubernetes Image Builder](https://image-builder.sigs.k8s.io/) project from the Cluster +Lifecycle SIG provides a tool for building images for use with Cluster API using +[QEMU](https://www.qemu.org/), [Packer](https://www.packer.io/) and [Ansible](https://www.ansible.com/). + +## OpenStack credentials + +OpenStack credentials are required for two purposes: + + 1. For Cluster API to manage OpenStack resources for the workload cluster, e.g. networks, machines. + 2. For OpenStack integrations on the workload cluster, e.g. OpenStack CCM, Cinder CSI. + +By default, this chart uses the same credentials for both. + +The recommended way to do this is using an +[Application Credential](https://docs.openstack.org/keystone/latest/user/application_credentials.html) +to avoid your password being stored on both the management and workload clusters. +Application credentials are project-scoped, and ideally you should use a separate +application credential for each cluster in a project. + +For ease of use, this chart is written so that a `clouds.yaml` file can be given directly +to the chart as a configuration file. When an application credential is created in Horizon, +the corresponding `clouds.yaml` file can be downloaded, and should look something like this: + +> WARNING +> +> The Cluster API OpenStack provider currently requires that the `project_id` is present, +> which you will need to add manually. 
+ +```yaml +clouds: + openstack: + auth: + auth_url: https://my.cloud:5000 + project_id: "" + application_credential_id: "" + application_credential_secret: "" + region_name: "RegionOne" + interface: "public" + identity_api_version: 3 + auth_type: "v3applicationcredential" +``` + +This file can then be passed to the chart using the `-f|--values` option, e.g.: + +```sh +helm install my-cluster capi/openstack-cluster --values ./clouds.yaml [...options] +``` + +## Managing a workload cluster + +In addition to the `clouds.yaml`, the following is a minimal configuration to deploy a +working cluster: + +```yaml +# The target Kubernetes version +kubernetesVersion: 1.22.1 + +# An image with the required software installed at the target version +machineImage: ubuntu-2004-kube-v{{ .Values.kubernetesVersion }} + +# The name of the SSH keypair to inject into cluster machines +machineSSHKeyName: jbloggs-keypair + +controlPlane: + # The flavor to use for control plane machines + # It is recommended to use a flavor with at least 2 CPU, 4GB RAM + machineFlavor: vm.small + +# A list of worker node groups for the cluster +nodeGroups: + - # The name of the node group + name: md-0 + # The flavor to use for the node group machines + machineFlavor: vm.xlarge + # The number of machines in the group + machineCount: 3 +``` + +To install or upgrade a cluster, save the configuration above as `cluster-configuration.yaml` +and use the following Helm command: + +```sh +helm upgrade my-cluster capi/openstack-cluster --devel --install -f ./clouds.yaml -f ./cluster-configuration.yaml +``` + +This will create a cluster on its own network with a three-node, highly available (HA) +control plane, a load balancer for the Kubernetes API with a floating IP attached +and a single worker group with three nodes. 
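The `nodeGroups` value is a list, so the configuration above can be extended with further groups. The following is a minimal sketch, assuming a hypothetical flavor named `vm.gpu` exists on the target cloud; the `md-gpu` group name and the flavor are illustrative and not taken from the chart:

```yaml
# A list of worker node groups for the cluster
nodeGroups:
  - name: md-0
    machineFlavor: vm.xlarge
    machineCount: 3
  # Hypothetical second group with a different flavor and machine count
  - name: md-gpu
    machineFlavor: vm.gpu
    machineCount: 1
```

Only the keys already shown in the minimal configuration (`name`, `machineFlavor`, `machineCount`) are used here; see the chart's `values.yaml` for the other per-group options.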
+ +To inspect the progress of the cluster deployment, you can use the +[clusterctl CLI](https://cluster-api.sigs.k8s.io/clusterctl/overview.html): + +```sh +clusterctl describe cluster my-cluster +``` + +To update the cluster, just modify the configuration as required and run the +`helm upgrade` command above again. Some examples of updates that can be performed are: + + * Adding and removing node groups. A cluster can have several node groups, and + each node group can have a different flavor and machine count. + * Scaling the cluster. Change the machine count for the required node group(s) + to add or remove machines. + * Changing the image to update system packages or upgrade Kubernetes. + Once a new image is available, change the machine image and Kubernetes version + as required to trigger a rolling upgrade of the cluster nodes. + +### Cluster addons + +The cluster addons are enabled by default; however, only a CNI and the +OpenStack CCM are deployed by default. + +You can configure which addons are deployed and the configuration of those addons +by specifying values for the addons Helm chart: + +```yaml +addons: + values: + nvidiaGPUOperator: + enabled: true +``` + +The available options under `addons.values` correspond to the available options +for the [cluster-addons chart](../cluster-addons). + +The cluster addons can also be disabled completely using the following configuration: + +> **WARNING** +> +> If the cluster addons are disabled, you will need to manually install a CNI +> plugin and the OpenStack Cloud Controller Manager before the cluster deployment +> will complete successfully. + +```yaml +addons: + enabled: false +``` + +Note that changing this after the initial deployment will **not** uninstall any +addons that have already been installed, but it will prevent updates to addons +from being applied. 
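Because the options under `addons.values` correspond to those of the cluster-addons chart, other addon settings can be set in the same way as the GPU operator. As a sketch, assuming the `cni.type` option described in the cluster-addons README is passed through unchanged, switching the workload cluster's CNI from the default Calico to Weave might look like this:

```yaml
addons:
  values:
    cni:
      # Assumption: same key and value as documented for the cluster-addons chart
      type: weave
```

This fragment would go in the same cluster configuration file that is passed to `helm upgrade` with `-f`.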
+ +## Accessing a workload cluster + +To access the cluster, use `clusterctl` to generate a kubeconfig file: + +```sh +# Generate a kubeconfig and write it to a file +clusterctl get kubeconfig my-cluster > kubeconfig.my-cluster +# Use that kubeconfig to list pods on the workload cluster +kubectl --kubeconfig=./kubeconfig.my-cluster get po -A +``` diff --git a/charts/openstack-cluster/values.yaml b/charts/openstack-cluster/values.yaml index 40581ad..c8f8c83 100644 --- a/charts/openstack-cluster/values.yaml +++ b/charts/openstack-cluster/values.yaml @@ -72,7 +72,7 @@ controlPlane: # The flavor to use for control plane machines machineFlavor: # The kubeadm config specification for the control plane - # By default, this uses a simple configuration that just enables the external cloud provider + # By default, this uses a simple configuration that enables the external cloud provider kubeadmConfigSpec: initConfiguration: nodeRegistration: @@ -101,7 +101,7 @@ nodeGroupDefaults: machineFlavor: # The default kubeadm config specification for worker nodes # This will be merged with any configuration given for specific node groups - # By default, this uses a simple configuration that just enables the external cloud provider + # By default, this uses a simple configuration that enables the external cloud provider kubeadmConfigSpec: joinConfiguration: nodeRegistration: @@ -138,4 +138,8 @@ addons: # Values for the addons # See https://github.com/stackhpc/capi-helm-charts/blob/main/charts/cluster-addons for details # The clouds.yaml used for cluster deployment will be given in addition to these - values: {} + values: + # By default, enable the OpenStack CCM + ccm: + enabled: true + type: openstack