56 Commits

Author SHA1 Message Date
Sabyasachi Nayak
f61e33f6e1 update vault helm chart to 0.25.0
Replace references to 0.24.1 with 0.25.0.  Refresh the patches for
vault-manager and agent image reference. Update the image tags to match
the new vault chart. The vault helm chart uses vault server version
1.14.0. The latest version of the vault server in the 1.14.x series is
1.14.8. Verified that the changes between the vault v1.14.0 and v1.14.8
tags are mostly backports and cherry-picks of commits, i.e. bug fixes.
So the 1.14.8 version of the vault server is used.

Test plan:
 PASSED AIO-sx and Standard 2+2
 PASSED vault aware and un-aware applications
 PASSED HA tests
 PASSED test image pulls from private registry with external network
      restriction

story: 2010393
Task: 49391

Change-Id: I6bd022fed79ead6e1dc224e323a179d1dcd3ab0f
Signed-off-by: Sabyasachi Nayak <sabyasachi.nayak@windriver.com>
2024-01-10 17:47:38 +00:00
Tae Park
857fedecc6 Issue a Warning for Vault-Manager PVC Storage
This commit adds an additional check for PVC storage for vault-manager
after the PVC-to-k8s conversion. If the PVC storage is still found, a
warning is logged during start-up of vault-manager.

Test Plan:
PASS bashate
PASS AIO-SX vault sanity
PASS New code issues logs only when the PVC storage persists after
     conversion

Story: 2010930
Task: 49293

Change-Id: I2d669b06927b9d396ce5d6e582983ab78a3cc5fc
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-12-18 16:53:33 -05:00
Michel Thebeau
494edafaa9 Remove hardcoded vault and sva-vault
The vault namespace and full-name are in variables and should not have
been hardcoded.

Test Plan:
PASS  bashate of rendered init.sh
PASS  vault sanity
PASS  all affected code paths

Story: 2010930
Task: 49232

Change-Id: I1c4765b907ce8ce4200e98575922467edb34e9fd
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-15 15:40:47 +00:00
Michel Thebeau
1aa869135b Fix removal of rekey milestone secrets
When vault-manager is killed during finalizeRekey the k8s secrets may
not be deleted.  In particular, the kubectl command deleting multiple
secrets may be interrupted.

It is unclear in what order kubectl/k8s would delete the secrets when
they are specified in a single command - i.e., the observed order
differs from the order specified.  Use one kubectl command for each
milestone secret, as sketched below.
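
A minimal sketch of the per-secret deletion (not the shipped code; the
milestone secret names other than cluster-rekey-audit are
illustrative):

  for secret in cluster-rekey-root cluster-rekey-audit; do
      # one kubectl invocation per secret makes the deletion order,
      # and the state left by an interruption, well defined
      kubectl -n vault delete secret "$secret"
  done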

Use cluster-rekey-audit as the final milestone.  Fix needsRekey to allow
the procedure to resume as long as cluster-rekey-audit persists.

Also adjust some comments and remove some chatty logs.

Test Plan:
PASS  bashate of rendered init.sh
PASS  vault sanity, including rekey
PASS  application-update
PASS  kubectl delete vault-manager pod tests
PASS  kill -9 vault-manager tests

Story: 2010930
Task: 49174

Change-Id: I2e5e15b4f89f9f9495381d33064c631cde6da193
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-15 15:40:37 +00:00
Tae Park
65b38b925d Prevent multiple vault-manager pods from acting
This commit adds a new check in the main loop of vault-manager for
multiple instances of vault-manager. Only one vault-manager is needed,
so any extra instance is put to sleep or terminated until only one is
left, as illustrated below.
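
An illustrative form of the check (not the shipped code; the label
selector and sleep interval are assumptions):

  # count running vault-manager pods; stand down unless alone
  count="$( kubectl -n vault get pods -l app=vault-manager \
            --no-headers | grep -c Running )"
  if [ "$count" -gt 1 ]; then
      sleep 60    # idle while the extra instance is resolved
      continue    # re-check on the next pass of the main loop
  fi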

Story: 2010930
Task: 49199

Test Plan:
PASS Bashate
PASS Vault sanity test

Change-Id: I0fd881aa4078528ba3f804087db87069dae58f7e
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-12-13 19:58:07 +00:00
Michel Thebeau
be0e85ec77 stability fixes for vault-manager rekey
Continue/complete the rekey procedure when vault-manager is interrupted
(kill -9). Fixes include:
  - Refactor logic of rekeyRecover function
  - additionally handle specific failure scenarios to permit the rekey
    procedure to continue
  - correct return codes of procedure functions to fall through to the
    recovery procedure
  - re-sort the tests of needsShuffle
  - misc adjustment of logs and comments

The additional handling of failure scenarios includes:
  - partial deletion of cluster-rekey secrets after copying to
    cluster-key
  - restart rekey on failure during authentication

Test Plan:
PASS  vault sanity, ha sanity
PASS  IPv4 and IPv6
PASS  system application-update, and platform application update
PASS  rekey operation without interruption
PASS  bashate the rendered init.sh

Stability testing includes kubectl deleting pods and kill -9 processes
during rekey operation at intervals spread across the procedure, with
slight random time added to each interval

PASS  delete a standby vault server pod
PASS  delete the active vault server pod
PASS  delete the vault-manager pod
PASS  delete the vault-manager pod and a random vault server pod
PASS  delete the vault-manager pod and the active pod
PASS  delete the vault-manager pod and a standby pod
PASS  kill -9 vault-manager process
PASS  kill -9 active vault server process
PASS  kill -9 standby vault server process
PASS  kill -9 random selection of vault and vault-manager processes

Story: 2010930
Task: 49174

Change-Id: I508e93a36de9ca8b4c8fa1da7941fe49936de159
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-07 13:30:32 +00:00
Michel Thebeau
615d6e4657 use the vault-manager image stx.9.0-v1.28.4
This new image adds uuidgen and multiple versions of kubectl, which
vault-manager now supports.

Test Plan:
PASS  sanity test of vault application
PASS  watch vault-manager log over kubernetes upgrade

Depends-On: Ib0a105306cecb38379f9d28a70e83ed156681f08
Depends-On: I03e37af31514c3fa3b95e0560a6d6f83879ec9de

Story: 2010930
Task: 49177

Change-Id: I7f578ac7e8d2aab98fb1e104f336fd750d7d7933
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-06 21:23:55 +00:00
Michel Thebeau
733ca0e9a6 Add multiple version support of kubectl
Allow vault-manager to pick the version of kubectl that matches the
currently running server.  Add a helm override option to pick a
particular version available within the image.

Refresh the helm chart patches on top of this change.
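
A minimal sketch of the version matching (not the shipped code),
assuming the image installs binaries as
/usr/local/bin/kubectl-v<major.minor>:

  # read the running server's version and pick a matching binary
  ver="$( kubectl version -o json 2>/dev/null \
          | sed -n '/serverVersion/,$ s/.*"gitVersion": *"\(v[0-9]*\.[0-9]*\)\..*/\1/p' \
          | head -n 1 )"                        # e.g. v1.28
  if [ -n "$ver" ] && [ -x "/usr/local/bin/kubectl-${ver}" ]; then
      KUBECTL="/usr/local/bin/kubectl-${ver}"
  else
      KUBECTL="kubectl"                         # fall back to the default
  fi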

Test Plan:
PASS  Unit test the code
PASS  helm chart override
PASS  sanity of vault application
PASS  watch vault manager log during kubernetes upgrade

Story: 2010930
Task: 49177

Change-Id: I2459d0376efb6b7e47a25f59ee82ca74b277361f
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-04 20:46:29 +00:00
Michel Thebeau
65e8589183 add vault rekey option during upgrade
Allow the vault to be rekeyed after conversion from PVC storage to k8s
storage of the shard secrets.

Update the vault-manager patch to include rekey enable/disable and
timing parameters in helm values.yaml. Refresh the other patches
(include git long log descriptions in those patch files omitting
description).

Test Plan:
PASS  vault sanity, ha sanity
PASS  IPv4 and IPv6
PASS  system application-update, and platform application update
PASS  rekey operation without interruption
PASS  helm chart options
PASS  bashate the rendered init.sh

Stability testing includes kubectl deleting pods and kill -9 processes
during rekey operation at intervals spread across the procedure, with
slight random time added to each interval

PASS  delete a standby vault server pod
PASS  delete the active vault server pod
PASS  delete the vault-manager pod
PASS  delete the vault-manager pod and a random vault server pod
PASS  delete the vault-manager pod and the active pod
PASS  delete the vault-manager pod and a standby pod
TBD  kill -9 vault-manager process
TBD  kill -9 active vault server process
TBD  kill -9 standby vault server process
TBD  kill -9 random selection of vault and vault-manager processes

Story: 2010930
Task: 48850

Change-Id: I87911819c27caaf30be69b3c969a20ed97be42cb
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-04 19:21:11 +00:00
Michel Thebeau
dfcfa46061 improve error handling in vaultInitialized
A rare condition can result in vault servers not responding to this
early initialization status check.  A missed response has no effect
after vault is initialized, but fails the application if it happens
before vault is initialized.

Test Plan:
PASS  Unit test the changes
PASS  vault sanity

Story: 2010930
Task: 49168

Change-Id: I6b5270f89ccea27f6c10edc6e1bc250b248f4054
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-04 19:21:07 +00:00
Michel Thebeau
8c6d86ea3b improve error handling of unsealVault
Add generic and specific error handling for the unsealVault function.
Changes include (a sketch of the response handling follows the list):

  Recognize unseal success from the API response
  Recognize and stop unseal procedure if the response indicates
    authentication failure
  Always 'reset' unseal in progress, if any
  Recognize if the requested server is already unsealed
  Handle return code from vaultAPI function
  Remove key_error check as it is printed as DEBUG by vaultAPI
  Refactor reused variables to be less specific
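
A minimal sketch of the response handling (not the shipped code; the
address variable and helpers are illustrative):

  # PUT a key share to Vault's /v1/sys/unseal endpoint
  resp="$( curl -s --request PUT \
           --data "{\"key\": \"${share}\"}" \
           "${server_addr}/v1/sys/unseal" )"
  if echo "$resp" | grep -q '"sealed": *false'; then
      return 0    # unseal success per the API response
  elif echo "$resp" | grep -q '"errors"'; then
      # a PUT of '{"reset": true}' would abort an unseal in progress
      return 1    # stop the procedure on a reported failure
  fi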

Test Plan:
PASS  unit test the function
PASS  vault sanity including HA test

Story: 2010930
Task: 49167

Change-Id: If55589d207bbb374a6137922f62e2d494278e72c
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-11-30 21:17:34 +00:00
Michel Thebeau
8669743ae2 add vault-manager pause debugging option
A debug feature to allow vault-manager operation to be paused.  Use
cases may include setting up specific conditions for test.

Include a helm override for initial pause condition, which may be
difficult to reach as a pod starts.

Test Plan:
PASS  vault sanity
PASS  unit test the pause_on_trap code, helm override
PASS  misc usage of the option

Story: 2010930
Task: 49048

Change-Id: Icd69a79685427268d7d59b3fbe655b9b93e8ece8
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-11-13 19:04:30 +00:00
Michel Thebeau
f2d02300a9 add interactive mode for init.sh
Allow the init.sh script to be sourced by an author to permit
development and test activity.

Test Plan:
PASS vault sanity test
PASS enter vault-manager pod and source init.sh
PASS bashate on the rendered script

Story: 2010930
Task: 49047

Change-Id: I899dcf6df793ee69b51b63a8b214320282d091fa
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-11-13 19:02:36 +00:00
Michel Thebeau
c91580ebd2 add generic function for vault REST API calls
Replace curl REST API calls with a generic function. This prepares for
adding more functionality to vault-manager: more REST API calls.

Main feature includes error/debug logging of the responses.

Also includes:
Define variables for server targets
Refactor the ubiquitous global 'row' variable, converting it to a
parameter
Explicitly declare curl's default connect-timeout (120s)
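
A minimal sketch of such a wrapper (not the shipped vaultAPI; the log
helper and environment variables are illustrative):

  vaultAPI() {  # usage: vaultAPI <result_var> <METHOD> <path> [body]
      local method="$2" path="$3" body="${4:-}" resp
      resp="$( curl -s --connect-timeout 120 \
               --request "$method" \
               ${body:+--data "$body"} \
               --header "X-Vault-Token: ${VAULT_TOKEN}" \
               "${VAULT_ADDR}${path}" )" || {
          log DEBUG "vaultAPI: curl failed: ${method} ${path}"
          return 1
      }
      log DEBUG "vaultAPI: ${method} ${path}: ${resp}"
      printf -v "$1" '%s' "$resp"   # return the response by name
  }

For example: vaultAPI resp GET /v1/sys/health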

Test plan:
PASS vault sanity
PASS vault HA test
PASS all code paths with REST API calls
PASS misc examples GET, POST, DELETE
PASS unit test the new function
PASS bashate of the rendered document

Story: 2010930
Task: 49042

Change-Id: Ic329f075ba1c0480f5d507f9768f76fa86fc2094
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-11-13 19:02:25 +00:00
Michel Thebeau
464f9d0e76 Conversion of storage during application update
Add lifecycle code to read secrets from PVC mounted to running
vault-manager, and vault-manager code for conversion of storage from PVC
to k8s secrets.

The lifecycle code is added because the previous version of
vault-manager does not respond to SIGTERM from kubernetes for
termination, and yet that pod will be terminating when the new
vault-manager pod runs.  Reading the PVC data in lifecycle code before
helm updates the charts simplifies the process when vault-manager is
running during application-update.

The new vault-manager also handles the case where the application is not
running at the time the application is updated, such as if the
application is removed, deleted, uploaded and applied.

In general the procedure for conversion of the storage from PVC to k8s
secrets is (a sketch of the copy-then-confirm step follows the list):
 - read the data from PVC
 - store the data in k8s secrets
 - validate the data
 - confirm the stored data is the same as what was in PVC
 - delete the original data only when the copy is confirmed
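
A sketch of the copy-then-confirm step for one shard file (secret
name, data key and paths are illustrative):

  kubectl -n vault create secret generic cluster-key-0 \
      --from-file=strdata=/mnt/pvc/key-shard-0.json
  orig="$( sha256sum /mnt/pvc/key-shard-0.json | awk '{print $1}' )"
  copy="$( kubectl -n vault get secret cluster-key-0 \
           -o jsonpath='{.data.strdata}' | base64 -d \
           | sha256sum | awk '{print $1}' )"
  if [ "$orig" = "$copy" ]; then
      rm -f /mnt/pvc/key-shard-0.json   # delete only after confirming
  fi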

The solution employs a 'mount-helper', an incarnation of init.sh that
mounts the PVC resource so that vault-manager can read it, then waits
to be terminated.

Test plan:
PASS  vault sanity
PASS  vault sanity via application-update
PASS  vault sanity update via application remove, delete, upload, apply
      (update testing requires version bump similar to change 881754)
PASS  unit test of the code
PASS  bashate, flake8, bandit
PASS  tox

Story: 2010930
Task: 48846

Change-Id: Iace37dad256b50f8d2ea6741bca070b97ec7d2d2
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-11-02 15:12:47 +00:00
Tae Park
ef1b8f663b Split key shards in json
Split the key shard file gained from vault initialization into separate
files. Each key shard (plus the root token) is now stored separately.
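
For illustration, the split could look like this (assumes jq is
available; Vault's init response carries "keys", "keys_base64" and
"root_token"):

  for i in 0 1 2 3 4; do   # five shards with the default key-shares
      jq -c "{keys: [.keys[$i]], keys_base64: [.keys_base64[$i]]}" \
          init.json > "key-shard-${i}.json"
  done
  jq -r '.root_token' init.json > root-token.txt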

Test Plan:
PASS vault sanity
PASS bashate

Story: 2010930
Task: 48847

Change-Id: I8a007e505ea7ee9764301e494f4801a25cb194ce
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-10-31 14:02:10 -04:00
Michel Thebeau
87bf94a0c5 fix print of new log level
When changing the log level, the logged new level was reported as the
log parameter rather than the configured log level.

Refactor the case statement as its own function so it can be used in
several places.  Use the conversion function to print the correct
configured log level.

Also print the user-friendly text when detecting an invalid log level.

Misc other changes including comments, line lengths, local declarations.

Test Plan:
PASS  Unit test log functions
PASS  changing configured log level
PASS  bashate on the rendered init.sh

Story: 2010930
Task: 48842

Change-Id: I6c96f2e5193d722bb9e4cd32eb66c2cd2f65a503
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-10-26 14:03:13 +00:00
Michel Thebeau
3684074d88 exit_on_trap: adjust default and debug behaviours
It is observed that the exit_on_trap is not working under normal
operation - vault-manager takes 30 seconds to exit when the application
is removed.  The default behaviour is to exit when the trap file exists.
Not exiting when the trap file exists only occurs when the trap file has
content.  This latter behaviour is a debugging option.

Add the conditional for empty trap file, and always exit if the trap
file is empty.

For the debugging feature it is helpful for the procedure to remember
the trap number set in the trap file.  Use the DEBUGGING_TRAP global
variable to remember the debugging trap requested, and exit whenever
that exit_on_trap call is run.

Also:
Refactor the parameter variable as 'trap' for readability. Adjust all
the logs for exit_on_trap to permit search for "exit_on_trap".  And log
at INFO level when exiting vault-manager.
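
A minimal sketch of the resulting behaviour (not the shipped code; the
trap file path variable is hypothetical):

  exit_on_trap() {  # $1: the trap number at this call site
      local trap="$1"
      [ -f "$TRAP_FILE" ] || return 0    # no trap file: keep running
      if [ ! -s "$TRAP_FILE" ]; then
          log INFO "exit_on_trap: $trap" # empty trap file: always exit
          exit 0
      fi
      # debug feature: the file names one trap; remember it and exit
      # only when that call site is reached
      if [ -z "$DEBUGGING_TRAP" ]; then
          DEBUGGING_TRAP="$( cat "$TRAP_FILE" )"
      fi
      [ "$trap" = "$DEBUGGING_TRAP" ] && exit 0
      return 0
  }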

Test Plan:
PASS  unit test exit_on_trap
PASS  default behavior; vault-manager responds promptly to termination
PASS  debug feature exits on matching trap (remembers debug trap)
PASS  bashate on the rendered init.sh

Story: 2010930
Task: 48843

Change-Id: Id67a89e063daa18ba7627553ac2a19ca673ff00b
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-10-25 19:58:13 +00:00
Tae Park
11c11f1e0c Support storing key shards in k8s secrets
Replaces the current implementation of storing key shards in PVC with
k8s secrets. Includes additional improvements to existing vault-manager
code, such as added error checking.

Test Plan:
PASS vault sanity test
PASS bashate

Story: 2010930
Task: 48845

Change-Id: Ie0e5fe9749fa871d73d7b52600a8905abcb31887
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-10-23 11:11:46 -04:00
Tae Park
9b9e0e8f62 Move Template Values to Top
Organizing the helm template values so that each call is unique. This
should allow for more efficient management of helm values within
init.sh. Also does some additional code cleanup.

TEST PLAN:
PASS vault sanity test
PASS bashate

Story: 2010930
Task: 48844

Change-Id: Ia8e7820b9c86e307991f9affda7035bb89dfcc57
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-10-11 16:46:13 -04:00
Tae Park
7267389382 Add manual exit points for catching SIGTERM
Adds new catch points for SIGTERM sent to the vault manager. Allows the
vault manager to exit gracefully. This includes debugging capability to
exit on a specific point.

Test Plan:
PASS vault sanity test
PASS bashate
PASS vault manager pod exits and restarts with no errors when an
     exit_on_trap file is created with the specified exit point number.
     The file must be located in the vault manager work directory.

Story: 2010930
Task: 48843

Change-Id: I3dadccfca554b448d729d37132c8af17324368f1
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-10-06 21:17:00 +00:00
Tae Park
a56dc6dfbb Add log levels to vault manager
Adding new log levels in vault-manager to make the logs easier to
diagnose. There are five levels (DEBUG, INFO, WARNING, ERROR, FATAL).
Existing logs are assigned to one of the above levels, and some of the
echo commands are removed since the log levels now fulfill their role.
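
A minimal sketch of level-filtered logging (not the shipped code):

  LOG_LEVELS="DEBUG INFO WARNING ERROR FATAL"
  LOG_LEVEL="INFO"    # default level

  log_rank() {  # map a level name to a numeric rank
      local i=0 l
      for l in $LOG_LEVELS; do
          [ "$l" = "$1" ] && { echo "$i"; return; }
          i=$((i+1))
      done
      echo 1          # treat unknown levels as INFO
  }

  log() {  # usage: log ERROR "message"
      local level="$1"; shift
      [ "$(log_rank "$level")" -lt "$(log_rank "$LOG_LEVEL")" ] && return
      echo "$(date '+%F %T') ${level} $*"
  }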

Test Plans:
PASS vault sanity test
PASS observe new log levels within vault manager log
PASS assign a new default log level and observe new log reflecting it
PASS bashate

Story: 2010930
Task: 48842

Change-Id: I03679ade6e1a6dcc51d13e76264f6c05d132f7c7
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-10-04 16:48:43 -04:00
Tae Park
896008fb73 vault-manager wait for one server only when initialized
Modify the vault-manager initialization logic so that it waits for the
number of active pods to equal the replica value only if the raft is
not yet initialized; once initialized, it waits for a single server.

TEST PLAN:
 - In a 2 controller, 1 worker setup,
 - Upload and apply vault
 - Lock the host that vault-manager is running on
 - Vault manager should restart
 - Within the logs, there should not be a repetition of " Waiting for sva-vault statefulset running pods..."
 - Vault Sanity test in AIO-SX
 - Bashate of rendered init.sh

Closes-bug: 2029375

Signed-off-by: Tae Park <tae.park@windriver.com>
Change-Id: I41990b87395a5d5364ef91c048f740d0f0675d6b
2023-08-08 15:31:29 -04:00
Alan Bandeira
08305a2286 Add core affinity labels to each vault pod
This commit adds support for core affinity labels for
vault. The label 'app.starlingx.io/component' indicates
to k8s whether to run the application pods on 'platform'
or 'application' cores.

The default value for the 'app.starlingx.io/component'
label is 'platform', but the label accepts the values
'application' and 'platform'. The override has to be
performed when vault is in the uploaded state, after
application remove or before the first apply. This
behavior is required to ensure that no vault pod is
restarted in an improper manner.

Test plan:

PASS: In an AIO-SX system upload and apply the vault app. When apply
      is finished, run "kubectl -n vault describe po sva | grep
      platform" and the output should be three instances of
      "app.starlingx.io/component=platform", indicating that the
      default configuration is applied for each pod.

PASS: In an AIO-SX, where the vault app is in the applied state, run
      "system application-remove vault" and override the
      'app.starlingx.io/component' label with the 'application' value
      via the helm api. After the override, apply vault and verify the
      'app.starlingx.io/component' label is 'application' in the pods
      describe output, similar to the previous test.

PASS: In an AIO-SX, where the vault app is in the applied state, run
      "system application-remove vault" and override the
      'app.starlingx.io/component' label with any value other
      than 'platform' or 'application' and after the apply check that
      the default value of 'platform' was used for the pod labels.

PASS: In a Standard configuration with one worker node, upload and
      apply the vault app. When apply is finished, run 'kubectl -n
      vault describe po sva | grep -b3 "app.starlingx.io/component"'
      and check in the output that the 'app.starlingx.io/component'
      label has the default value of 'platform' for each pod, with
      every vault server pod having the label.

PASS: In a Standard configuration with one worker node, remove vault
      and override the 'app.starlingx.io/component' label with any
      value, valid or not, and after the override, apply vault. With
      vault in the applied state, verify the replica count override is
      kept and check the pods in a similar way to the previous test to
      validate that the HA configuration is maintained. The number of
      pod replicas should reflect the configuration.

Story: 2010612
Task: 48252

Change-Id: If729ab8bb8fecddf54824f5aa59326960b66942a
Signed-off-by: Alan Bandeira <Alan.PortelaBandeira@windriver.com>
2023-06-20 16:45:27 -03:00
Michel Thebeau
82478326fe update vault helm chart to 0.24.1
Replace references to 0.19.0 with 0.24.1.  Refresh the patches for
vault-manager and agent image reference. Update the image tags to match
new vault chart.

Test plan:
 PASS AIO-sx and Standard 2+2
 PASS vault aware and un-aware applications
 PASS HA tests
 PASS test image pulls from private registry with external network
      restriction

 Story: 2010393
 Task: 48109

Change-Id: Ib6b4d0a6f7d3a54676563c59f60d93d129c81c1c
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-06-14 15:05:32 -04:00
Michel Thebeau
363529d1fc delete old chart build files
These have not been needed for a while and do not impact the build of
this application.

The Makefile remains as the necessary component of the build.

Test plan:
  PASS  compare chart files before/after to ensure no changes
  PASS  compare all of stx-vault-helm package before/after

Story: 2010393
Task: 47164

Change-Id: I97025ceee2875a6fc588d72436b55e7f5ac59062
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-06-07 12:45:45 +00:00
Michel Thebeau
27f4742d8d remove extra vault-manager patch
This patch was part of the CentOS build, which was removed with
commit 20167fc54f7d3111762c52a4e8a4fe1e7c8ead5a

Whereas the patch was copied for debian here:
commit d96e143a34392324457f92019947d9af91ef803e

Test Plan:
PASS: debian build unaffected (there is no CentOS build)

Story: 2010393
Task: 47232

Change-Id: If90017b58f6220bca82e554e2fb50bd655d240ec
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-05-26 13:19:35 +00:00
Michel Thebeau
198f4e5164 set images to pull from configured registries
Add yaml to the fluxcd manifest which is compatible with the platform's
image pull and service parameter registry override handling.  The
platform will pull the image and populate registry.local, and the vault
injector agent will pull from registry.local.

Story: 2010393
Task: 47927

Test Plan:
PASS: sysinv.log shows that agentImage image is pulled when vault
      server image is hardcoded differently
PASS: agent image pulls when public network is blocked
PASS: agent image pulls when it is different than vault server image
PASS: vault app test, including vault un-aware application

Change-Id: Idd1215744bb31881127a6be23cf570166c79fad8
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-05-05 18:56:31 -04:00
Michel Thebeau
1dab2dbf8f update vault-manager image tag
The new image has updated packages for CVE fixes, no other changes.

Test Plan:
PASS - apply vault application (inspect vault-manager pod)

Story: 2010710
Task: 47905

Change-Id: I83848d12baf0558edc0a2e4cd9a964f781edec56
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-05-03 16:53:59 -04:00
Michel Thebeau
82d5d9abdc update vault-manager statefulset
Update the statefulset so the update strategy prompts a restart.  The
config map is updated in previous commits, for which we want
vault-manager to restart.

Test Plan:
PASS - sw-patch upload/apply/install/remove
PASS - manual revert to original tarball (system application-update)

Story: 2010393
Task: 47731

Change-Id: Ib52d019170763d066c730d679067b91ed4d59bb5
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-03-31 10:31:46 -04:00
Michel Thebeau
dc79220541 add health query timeout for HA
This change causes vault-manager to not pause for long periods when a
configured vault server is not responsive.

Use curl --connect-timeout for queries to vault server /sys/health.
During HA recovery it is known that the server is non-responsive, so
vault-manager should not wait the default time, which is 60s or 5m
depending on the google search result.
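
Schematically (the address variable is illustrative):

  curl -s --connect-timeout 2 "${server_addr}/v1/sys/health"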

It is observed that vault-manager appears to hang for long periods
during HA recovery. Watching the $PVCDIR/pods.txt confirms that
vault-manager is inactive for minutes at a time.  This changes the
default behavior to timeout within 2 seconds during the HA recovery
scenario.

In addition to not waiting, the vault-manager log will show the 'sealed'
status as empty string when the query times-out.

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime
PASS - rates including 1, 5
PASS - wait intervals including 0, 1, 3, 15
PASS - kubectl delete 2 vault pods
PASS - kubectl delete 3 (all) vault pods

Story: 2010393
Task: 47701

Change-Id: I4fd916033f6dd5210078126abb065393d25851cd
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-23 11:52:37 -04:00
Michel Thebeau
02184560c5 fix bashate reports in vault-manager init.sh
Run tox against the rendered init.sh from vault-init.yaml; fix most of
the reports except for some long lines from jsonpath templates.

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime

Story: 2010393
Task: 47700

Change-Id: I844c5de510e8a7a3724852d4e6500eec6c327aba
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-23 10:36:59 -04:00
Michel Thebeau
a046dca09c add option to delay server unseal
This change delays unsealing recovering vault servers for 15 seconds.

vault-manager automatically unseals vault servers in a perpetual loop
after initial configuration.  This final loop is the HA recovery
procedure.

It is observed that vault-manager will unseal a recovering vault server
when the active server has not started to send heartbeats to the new
pod.  The result is that the recovering server will timeout waiting for
heartbeats and start an election.  Although the active and standby
server will reject the election, there being a leader already, the
recovering vault will increment 'term' and restart election until
heartbeats are received, or until it wins election.

Although the raft algorithm is resilient to this, the procedure is not
appropriate.  It is better to unseal the vault server after the active
vault sends heartbeats to the new pod.

It is observed that the heartbeat interval reduces promptly from less
than 1 second per heartbeat to ~10-12 seconds for a failed vault server.
So it is reasonable for vault-manager to wait 12 seconds before
unsealing the recovering vault.  This also assumes the vault-manager and
active vault server would receive updated pod information at about the
same time and the latest heartbeat was issued immediately prior to the
update.

The options are configurable in helm overrides.  The defaults, for
example:
  manager:
    statusCheckRate: 5
    unsealWaitIntervals: 3

statusCheckRate is the rate at which vault-manager will check pod
status, in seconds.  unsealWaitIntervals is the number of intervals to
wait before unsealing the server.

Default is 5 s/interval * 3 intervals == 15 seconds

When unsealWaitIntervals is set to 0 there is no delay in unsealing the
recovering vault servers.  This is equivalent to the existing behaviour
before this change when statusCheckRate is also set to 5, which is the
value hard-coded before this change.
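
For example, the defaults could be set explicitly through a helm
override (assuming the app, chart and namespace are all named vault):

  system helm-override-update vault vault vault \
      --set manager.statusCheckRate=5 \
      --set manager.unsealWaitIntervals=3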

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime
PASS - rates including 1, 5
PASS - wait intervals including 0, 1, 3, 15
PASS - not reproduced with default values (many attempts)

Story: 2010393
Task: 47701

Change-Id: I763f6becee3e1a17e838a4f8ca59b2b0d33ba639
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-23 10:36:55 -04:00
Michel Thebeau
686cf0606f make vault-manager rate configurable
Add a chart override for the rate at which vault-manager checks vault
pod status.  Leave the default at previously hard-coded 5s.

Move all of the hard-coded sleep values to variables so they would be
more visible.

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime
PASS - rates including 1, 5

Story: 2010393
Task: 47701

Change-Id: I1de647760f6fe1806b0b1450c0e8f1117ad725ea
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-22 17:30:00 -04:00
Michel Thebeau
f3e9831823 compress the vault-manager log
Track the seal status of vault server pods so that logs can be omitted
when there is no change.

The converted loop no longer ignores pods without IP addresses.  Add an
explicit test for empty IP address field coming from getVaultPods().

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime

Story: 2010393
Task: 47700

Change-Id: Ic75c397046a3e183faf5ecc5b37dc8abefc7af64
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-22 17:29:47 -04:00
Michel Thebeau
e066354ade add timestamp to vault-manager logs
Enhance debugging with dated logs for vault-manager pod.  Allows
correlating the logs with other pods.

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime

Story: 2010393
Task: 47700

Change-Id: I4a877b8c0fc8ddc2626aaccc15196c30b6fb4b82
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-22 17:29:35 -04:00
Michel Thebeau
6b80a23f50 fix AND logic in log of vault-manager
This was probably supposed to be '&&' for AND logic and not an
intention to background the grep of pods.txt.  The symptom of this
mistake is somewhat random log output - sometimes the grep output is
printed before and sometimes after the "Sealed status is" log.
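
Schematically:

  grep "$row" pods.txt &      # '&' backgrounds grep; its output races
                              # the next log line
  grep "$row" pods.txt && \
      echo "Sealed status is ..."   # '&&' sequences the two as intended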

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime

Story: 2010393
Task: 47700

Change-Id: Ia4358ca7ed7ca7af3b116934c4491a5887871853
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-22 17:27:30 -04:00
Manoel Benedito Neto
45de5de069 Update debian packages for pkg-versioning
The Debian packaging meta_data file has been changed to reflect all
the latest git commits under the directory, pointed as usable, and to
improve pkg-versioning by using the first commit as the starting point
for building packages.

This ensures that any new code submissions under those
directories will increment the versions.

Test Plan:
PASS: Verify package versions are updated as expected.
PASS: build-pkgs -c -p vault-helm,python3-k8sapp-vault,stx-vault-helm

Story: 2010550
Task: 47501

Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>
Change-Id: I999b1d96146fb1e2ac931641620621f445cbda71
2023-02-28 10:51:17 -03:00
Michel Thebeau
1a7c490264 Update vault to use new stx-vault-manager image
Test Plan:
PASS: Standard, dedicated storage, vault HA
PASS: Simplex
PASS: app sanity

Story: 2010393
Task: 47157

Depends-on: https://review.opendev.org/c/starlingx/root/+/871667

Change-Id: I844aaefb31e172c91134eb0b46d2911a88ba8508
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-01-25 17:26:50 -05:00
Michel Thebeau
20167fc54f remove CentOS build of vault
Disables the packages for the CentOS build, as well as the vault
manager image.  Conversion of the docker image to Debian will happen
at a later date (task 46869).

Test Plan:
PASS: centos build
PASS: debian build

Story: 2010393
Task: 46868

Change-Id: I827352122460976b07b436fb022741f7d89e5548
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2022-11-17 19:11:32 +00:00
Greg Waines
428454dc96 Upversioning VAULT to v0.19.0 in order to support k8s v1.22 and higher.
Story: 2010393
Task: 46786

https://github.com/hashicorp/vault-helm/releases/tag/v0.19.0
Vault image default 1.9.2
Vault K8s image default 0.14.2
Vault CSI Provider image default 0.4.0

Testing
- built vault application in Deb build environment
- on nov 6 nightly build
    * system application-upload ./vault-1.0-1.tgz
    * system application-apply vault
    * configured vault with
      https://docs.starlingx.io/security/kubernetes/configure-vault-using-the-cli.html
    * system application-remove vault
    * system application-delete vault



Signed-off-by: Greg Waines <greg.waines@windriver.com>
Change-Id: I75f4bdc93e3a8dc630f3ade45f53d150a3945f37
2022-11-14 12:48:48 +00:00
Thiago Brito
b04ebb5b79 Fix vault-app to use upversioned cert-manager
In [1] and [2] cert-manager was migrated to fluxcd and upversioned
to version 1.7.1, but the vault helm-charts are still creating
CRs with apiVersion v1beta2. This commit fixes that.

[1] https://review.opendev.org/c/starlingx/cert-manager-armada-app/+/831956
[2] https://review.opendev.org/c/starlingx/cert-manager-armada-app/+/838590

TEST PLAN
PASS build vault-fluxcd app
PASS Upload
PASS Apply (verified created resources)
PASS Remove
PASS Delete

Logs: https://paste.opendev.org/show/bxn3yZEzas1o9bODJ5RO/

Story: 2009837
Task: 45363

Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I4d61f65f453cdd55f514e8bd45c2c43ce5e45cc3
2022-05-13 11:13:55 -03:00
Michel Thebeau
2115404c7d Add FluxCD version of the vault app
Add new manifest files to the vault app to enable FluxCD support.

The new spec will now generate 2 rpms:
- the original one that contains the armada manifest yaml
- a new one that contains the new FluxCD yaml

Add missing serviceName for vault-manager statefulset, which is required
by newer versions of helm.

TEST PLAN:
- build, ISO image includes in progress fluxcd commits
- verify the armada app version of vault
- verify the fluxcd app version of vault
- test case for both includes asserting that vault is effective at
  storing secrets per the starlingx example
- Debian: build-pkgs -p stx-vault-helm

Story: 2009138
Task: 44485

Change-Id: I120c7a375ab586cf652bfff22557d4873d59bded
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2022-03-07 14:04:22 +00:00
Yue Tao
d96e143a34 vault-helm: remove dl_hook
Add "dl_path" to download the source tarball and untar it to build
directory directly other than a sub-directory. No need the
$(VAULT_CHART_DIR) in rules.

Add "src_files" to copy local files to build directory

Use debian method to apply local patch instead of applying in rules.

No longer need dl_hook

Test Plan:

Pass: successfully build vault-helm.
Pass: No difference compared with the result of dl_hook

Story: 2009101
Task: 44277

Signed-off-by: Yue Tao <Yue.Tao@windriver.com>
Change-Id: Idb225ffcbb8455af0ed965e6e3087ec9d4cf0484
2022-01-13 12:19:31 +08:00
Tracey Bogue
ccdb17d296 Add Debian packaging for vault app
Create Debian packages for vault-helm, python-k8sapp-vault
and stx-vault-helm.

Signed-off-by: Tracey Bogue <tracey.bogue@windriver.com>
Change-Id: Ifb0c1de001e75e01e501c0078d85e562dc802d84
2021-12-03 09:12:06 -06:00
Rei Oliveira
276e4f1e9b Add toleration to vault Pod objects
A toleration needs to be added to all resources that create pods since
the node-role.kubernetes.io/master taint will be restored to all master
nodes. This ensures that the pods will run on the master node.

This adds toleration to vault statefulset and deployment objects

Test cases:

PASSED: Verify that vault pods are able to run on a tainted node

PASSED: Verify that other pods, without the taint toleration on,
fail to schedule at the tainted node and that a 'kubectl describe'
of them shows a Warning of 'node(s) had taint
{node-role.kubernetes.io/master: }, that the pod didn't tolerate.'

PASSED: Verify that system application-update from a previous
version to this version works fine

PASSED: Verify that disabling the taint has no effect on vault
running pods

PASSED: Verify that enabling the taint has no effect on vault
running pods

PASSED: Verify that vault is working by creating a vault secret
using vault's '/secret/basic-secret' api

PASSED: Verify that vault is working by reading a vault secret
using vault's '/secret/basic-secret' api

Story: 2009232
Task: 43386

Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: Ida9787e059e8c8b97f8b45d829c531f4cee1115a
2021-09-30 18:02:18 -04:00
Michel Thebeau
c85f980a6d vault-manager: use image from values
The vault-manager does not pull its image from registry.local.

Add values to the manifest and use them in the vault-manager chart so
that armada will prepend registry.local to the image:tag.

Also edit the values.yaml in the vault-helm tarball with the
vault-manager's repository and tag.

Closes-Bug: 1921008
Change-Id: If7a086c9dd10c3b5b961e0275be56bfb117e6a1d
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2021-04-05 13:39:11 -04:00
Michel Thebeau
b851751970 fix Add pull secret for registry.local
Commit 53ad52c956314ebc00665f656ab2c4c4f49ff3e2 did not address the
issue.

Instead, set the global value of imagePullSecrets in the manifest.

Fix vault-init.yaml where imagePullSecrets had fallen into another
section.

Refrain from patching the vault tarball (revert the commit 53ad52c9)

Closes-Bug: 1912696
Change-Id: Ia9e7cb52055ba9da342ea32f3c2bd3f24ce06630
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2021-03-25 13:51:59 -04:00
Michel Thebeau
53ad52c956 Add pull secret for registry.local
Pulling vault images from registry.local will fail without
imagePullSecrets set.  containerd.log shows "failed to resolve
reference" for the image:tag.

Closes-Bug: 1912696
Change-Id: I9a4c791aa7517a5bef6b58ceadcfcc765c8fa94e
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2021-01-26 20:58:29 -05:00
Cole Walker
cda9a32082 Improve vault-manager cluster join logic
The init script could fail when trying to join a vault pod to the
cluster if the leader was not ready. Added logic to retry joining until
the leader is ready.
Also includes some formatting cleanup.
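
An illustrative retry loop (not the shipped code; addresses are
assumptions), using Vault's raft join endpoint:

  until curl -sf --connect-timeout 2 \
          --data "{\"leader_api_addr\": \"http://${leader_addr}:8200\"}" \
          "http://${pod_addr}:8200/v1/sys/storage/raft/join"; do
      sleep 5    # wait for the leader to become ready, then retry
  done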

Closes-Bug: 1889136

Change-Id: Ia74600bca46cd9ccdbabb48dcf3455431a4ba908
Signed-off-by: Cole Walker <cole.walker@windriver.com>
2020-07-31 11:55:57 -04:00