56 Commits

Author SHA1 Message Date
Sabyasachi Nayak
f61e33f6e1 update vault helm chart to 0.25.0
Replace references to 0.24.1 with 0.25.0.  Refresh the patches for
vault-manager and agent image reference. Update the image tags to match
the new vault chart. The vault helm chart uses vault server version
1.14.0. The latest version of the vault server in the 1.14.x series is
1.14.8. Verified that the changes between the vault v1.14.0 and v1.14.8
tags are mostly backports and cherry-picks of commits, i.e. bug fixes.
So the 1.14.8 version of the vault server is used.

Test plan:
 PASSED AIO-sx and Standard 2+2
 PASSED vault aware and un-aware applications
 PASSED HA tests
 PASSED test image pulls from private registry with external network
      restriction

story: 2010393
Task: 49391

Change-Id: I6bd022fed79ead6e1dc224e323a179d1dcd3ab0f
Signed-off-by: Sabyasachi Nayak <sabyasachi.nayak@windriver.com>
2024-01-10 17:47:38 +00:00
Tae Park
857fedecc6 Issue a Warning for Vault-Manager PVC Storage
This commit adds an additional check for PVC storage for vault-manager
after the PVC-to-k8s conversion. If the PVC storage is still found, a
warning is logged during start-up of vault-manager.

Test Plan:
PASS bashate
PASS AIO-SX vault sanity
PASS New code issues logs only when the PVC storage persists after
     conversion

Story: 2010930
Task: 49293

Change-Id: I2d669b06927b9d396ce5d6e582983ab78a3cc5fc
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-12-18 16:53:33 -05:00
Michel Thebeau
494edafaa9 Remove hardcoded vault and sva-vault
The vault namespace and full-name are in variables and should not have
been hardcoded.

Test Plan:
PASS  bashate of rendered init.sh
PASS  vault sanity
PASS  all affected code paths

Story: 2010930
Task: 49232

Change-Id: I1c4765b907ce8ce4200e98575922467edb34e9fd
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-15 15:40:47 +00:00
Michel Thebeau
1aa869135b Fix removal of rekey milestone secrets
When vault-manager is killed during finalizeRekey the k8s secrets may
not be deleted.  In particular, the kubectl command deleting multiple
secrets may be interrupted.

It is unclear in what order kubectl/k8s would delete the secrets when
they are specified in a single command - i.e., the observed order
differs from the order specified.  Use one kubectl command for each
milestone secret, as sketched below.
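
A minimal sketch of the per-secret deletion (not the shipped code; the
milestone secret names other than cluster-rekey-audit are
illustrative):

  for secret in cluster-rekey-root cluster-rekey-audit; do
      # one kubectl invocation per secret makes the deletion order,
      # and the state left by an interruption, well defined
      kubectl -n vault delete secret "$secret"
  done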

Use cluster-rekey-audit as the final milestone.  Fix needsRekey to allow
the procedure to resume as long as cluster-rekey-audit persists.

Also adjust some comments and remove some chatty logs.

Test Plan:
PASS  bashate of rendered init.sh
PASS  vault sanity, including rekey
PASS  application-update
PASS  kubectl delete vault-manager pod tests
PASS  kill -9 vault-manager tests

Story: 2010930
Task: 49174

Change-Id: I2e5e15b4f89f9f9495381d33064c631cde6da193
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-15 15:40:37 +00:00
Tae Park
65b38b925d Prevent multiple vault-manager pods from acting
This commit adds a new check in the main loop of vault-manager for
multiple instances of vault-manager. Only one vault-manager is needed,
so any extra instance is put to sleep or terminated until only one is
left, as illustrated below.
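
An illustrative form of the check (not the shipped code; the label
selector and sleep interval are assumptions):

  # count running vault-manager pods; stand down unless alone
  count="$( kubectl -n vault get pods -l app=vault-manager \
            --no-headers | grep -c Running )"
  if [ "$count" -gt 1 ]; then
      sleep 60    # idle while the extra instance is resolved
      continue    # re-check on the next pass of the main loop
  fi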

Story: 2010930
Task: 49199

Test Plan:
PASS Bashate
PASS Vault sanity test

Change-Id: I0fd881aa4078528ba3f804087db87069dae58f7e
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-12-13 19:58:07 +00:00
Michel Thebeau
be0e85ec77 stability fixes for vault-manager rekey
Continue/complete the rekey procedure when vault-manager is interrupted
(kill -9). Fixes include:
  - Refactor logic of rekeyRecover function
  - additionally handle specific failure scenarios to permit the rekey
    procedure to continue
  - correct return codes of procedure functions to fall through to the
    recovery procedure
  - re-sort the tests of needsShuffle
  - misc adjustment of logs and comments

The additional handling of failure scenarios includes:
  - partial deletion of cluster-rekey secrets after copying to
    cluster-key
  - restart rekey on failure during authentication

Test Plan:
PASS  vault sanity, ha sanity
PASS  IPv4 and IPv6
PASS  system application-update, and platform application update
PASS  rekey operation without interruption
PASS  bashate the rendered init.sh

Stability testing includes kubectl deleting pods and kill -9 processes
during rekey operation at intervals spread across the procedure, with
slight random time added to each interval

PASS  delete a standby vault server pod
PASS  delete the active vault server pod
PASS  delete the vault-manager pod
PASS  delete the vault-manager pod and a random vault server pod
PASS  delete the vault-manager pod and the active pod
PASS  delete the vault-manager pod and a standby pod
PASS  kill -9 vault-manager process
PASS  kill -9 active vault server process
PASS  kill -9 standby vault server process
PASS  kill -9 random selection of vault and vault-manager processes

Story: 2010930
Task: 49174

Change-Id: I508e93a36de9ca8b4c8fa1da7941fe49936de159
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-07 13:30:32 +00:00
Michel Thebeau
615d6e4657 use the vault-manager image stx.9.0-v1.28.4
This new image adds uuidgen and multiple versions of kubectl, which
vault-manager now supports.

Test Plan:
PASS  sanity test of vault application
PASS  watch vault-manager log over kubernetes upgrade

Depends-On: Ib0a105306cecb38379f9d28a70e83ed156681f08
Depends-On: I03e37af31514c3fa3b95e0560a6d6f83879ec9de

Story: 2010930
Task: 49177

Change-Id: I7f578ac7e8d2aab98fb1e104f336fd750d7d7933
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-06 21:23:55 +00:00
Michel Thebeau
733ca0e9a6 Add multiple version support of kubectl
Allow vault-manager to pick the version of kubectl that matches the
currently running server.  Add a helm override option to pick a
particular version available within the image.

Refresh the helm chart patches on top of this change.
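
A minimal sketch of the version matching (not the shipped code),
assuming the image installs binaries as
/usr/local/bin/kubectl-v<major.minor>:

  # read the running server's version and pick a matching binary
  ver="$( kubectl version -o json 2>/dev/null \
          | sed -n '/serverVersion/,$ s/.*"gitVersion": *"\(v[0-9]*\.[0-9]*\)\..*/\1/p' \
          | head -n 1 )"                        # e.g. v1.28
  if [ -n "$ver" ] && [ -x "/usr/local/bin/kubectl-${ver}" ]; then
      KUBECTL="/usr/local/bin/kubectl-${ver}"
  else
      KUBECTL="kubectl"                         # fall back to the default
  fi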

Test Plan:
PASS  Unit test the code
PASS  helm chart override
PASS  sanity of vault application
PASS  watch vault manager log during kubernetes upgrade

Story: 2010930
Task: 49177

Change-Id: I2459d0376efb6b7e47a25f59ee82ca74b277361f
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-04 20:46:29 +00:00
Michel Thebeau
65e8589183 add vault rekey option during upgrade
Allow the vault to be rekeyed after conversion from PVC storage to k8s
storage of the shard secrets.

Update the vault-manager patch to include rekey enable/disable and
timing parameters in helm values.yaml. Refresh the other patches
(include git long log descriptions in those patch files omitting
description).

Test Plan:
PASS  vault sanity, ha sanity
PASS  IPv4 and IPv6
PASS  system application-update, and platform application update
PASS  rekey operation without interruption
PASS  helm chart options
PASS  bashate the rendered init.sh

Stability testing includes kubectl deleting pods and kill -9 processes
during rekey operation at intervals spread across the procedure, with
slight random time added to each interval

PASS  delete a standby vault server pod
PASS  delete the active vault server pod
PASS  delete the vault-manager pod
PASS  delete the vault-manager pod and a random vault server pod
PASS  delete the vault-manager pod and the active pod
PASS  delete the vault-manager pod and a standby pod
TBD  kill -9 vault-manager process
TBD  kill -9 active vault server process
TBD  kill -9 standby vault server process
TBD  kill -9 random selection of vault and vault-manager processes

Story: 2010930
Task: 48850

Change-Id: I87911819c27caaf30be69b3c969a20ed97be42cb
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-04 19:21:11 +00:00
Michel Thebeau
dfcfa46061 improve error handling in vaultInitialized
A rare condition can result in vault servers not responding to this
early initialization status check.  A missed response has no effect
after vault is initialized, but fails the application if it happens
before vault is initialized.

Test Plan:
PASS  Unit test the changes
PASS  vault sanity

Story: 2010930
Task: 49168

Change-Id: I6b5270f89ccea27f6c10edc6e1bc250b248f4054
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-12-04 19:21:07 +00:00
Michel Thebeau
8c6d86ea3b improve error handling of unsealVault
Add generic and specific error handling for the unsealVault function.
Changes include (a sketch of the response handling follows the list):

  Recognize unseal success from the API response
  Recognize and stop unseal procedure if the response indicates
    authentication failure
  Always 'reset' unseal in progress, if any
  Recognize if the requested server is already unsealed
  Handle return code from vaultAPI function
  Remove key_error check as it is printed as DEBUG by vaultAPI
  Refactor reused variables to be less specific
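
A minimal sketch of the response handling (not the shipped code; the
address variable and helpers are illustrative):

  # PUT a key share to Vault's /v1/sys/unseal endpoint
  resp="$( curl -s --request PUT \
           --data "{\"key\": \"${share}\"}" \
           "${server_addr}/v1/sys/unseal" )"
  if echo "$resp" | grep -q '"sealed": *false'; then
      return 0    # unseal success per the API response
  elif echo "$resp" | grep -q '"errors"'; then
      # a PUT of '{"reset": true}' would abort an unseal in progress
      return 1    # stop the procedure on a reported failure
  fi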

Test Plan:
PASS  unit test the function
PASS  vault sanity including HA test

Story: 2010930
Task: 49167

Change-Id: If55589d207bbb374a6137922f62e2d494278e72c
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-11-30 21:17:34 +00:00
Michel Thebeau
8669743ae2 add vault-manager pause debugging option
A debug feature to allow vault-manager operation to be paused.  Use
cases may include setting up specific conditions for test.

Include a helm override for initial pause condition, which may be
difficult to reach as a pod starts.

Test Plan:
PASS  vault sanity
PASS  unit test the pause_on_trap code, helm override
PASS  misc usage of the option

Story: 2010930
Task: 49048

Change-Id: Icd69a79685427268d7d59b3fbe655b9b93e8ece8
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-11-13 19:04:30 +00:00
Michel Thebeau
f2d02300a9 add interactive mode for init.sh
Allow the init.sh script to be sourced by an author to permit
development and test activity.

Test Plan:
PASS vault sanity test
PASS enter vault-manager pod and source init.sh
PASS bashate on the rendered script

Story: 2010930
Task: 49047

Change-Id: I899dcf6df793ee69b51b63a8b214320282d091fa
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-11-13 19:02:36 +00:00
Michel Thebeau
c91580ebd2 add generic function for vault REST API calls
Replace curl REST API calls with a generic function. This prepares for
adding more functionality to vault-manager: more REST API calls.

Main feature includes error/debug logging of the responses.

Also includes:
Define variables for server targets
Refactor the ubiquitous global 'row' variable, converting it to a
parameter
Explicitly declare curl's default connect-timeout (120s)
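
A minimal sketch of such a wrapper (not the shipped vaultAPI; the log
helper and environment variables are illustrative):

  vaultAPI() {  # usage: vaultAPI <result_var> <METHOD> <path> [body]
      local method="$2" path="$3" body="${4:-}" resp
      resp="$( curl -s --connect-timeout 120 \
               --request "$method" \
               ${body:+--data "$body"} \
               --header "X-Vault-Token: ${VAULT_TOKEN}" \
               "${VAULT_ADDR}${path}" )" || {
          log DEBUG "vaultAPI: curl failed: ${method} ${path}"
          return 1
      }
      log DEBUG "vaultAPI: ${method} ${path}: ${resp}"
      printf -v "$1" '%s' "$resp"   # return the response by name
  }

For example: vaultAPI resp GET /v1/sys/health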

Test plan:
PASS vault sanity
PASS vault HA test
PASS all code paths with REST API calls
PASS misc examples GET, POST, DELETE
PASS unit test the new function
PASS bashate of the rendered document

Story: 2010930
Task: 49042

Change-Id: Ic329f075ba1c0480f5d507f9768f76fa86fc2094
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-11-13 19:02:25 +00:00
Michel Thebeau
464f9d0e76 Conversion of storage during application update
Add lifecycle code to read secrets from PVC mounted to running
vault-manager, and vault-manager code for conversion of storage from PVC
to k8s secrets.

The lifecycle code is added because the previous version of
vault-manager does not respond to SIGTERM from kubernetes for
termination, and yet that pod will be terminating when the new
vault-manager pod runs.  Reading the PVC data in lifecycle code before
helm updates the charts simplifies the process when vault-manager is
running during application-update.

The new vault-manager also handles the case where the application is not
running at the time the application is updated, such as if the
application is removed, deleted, uploaded and applied.

In general the procedure for conversion of the storage from PVC to k8s
secrets is (a sketch of the copy-then-confirm step follows the list):
 - read the data from PVC
 - store the data in k8s secrets
 - validate the data
 - confirm the stored data is the same as what was in PVC
 - delete the original data only when the copy is confirmed
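
A sketch of the copy-then-confirm step for one shard file (secret
name, data key and paths are illustrative):

  kubectl -n vault create secret generic cluster-key-0 \
      --from-file=strdata=/mnt/pvc/key-shard-0.json
  orig="$( sha256sum /mnt/pvc/key-shard-0.json | awk '{print $1}' )"
  copy="$( kubectl -n vault get secret cluster-key-0 \
           -o jsonpath='{.data.strdata}' | base64 -d \
           | sha256sum | awk '{print $1}' )"
  if [ "$orig" = "$copy" ]; then
      rm -f /mnt/pvc/key-shard-0.json   # delete only after confirming
  fi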

The solution employs a 'mount-helper', an incarnation of init.sh that
mounts the PVC resource so that vault-manager can read it, then waits
to be terminated.

Test plan:
PASS  vault sanity
PASS  vault sanity via application-update
PASS  vault sanity update via application remove, delete, upload, apply
      (update testing requires version bump similar to change 881754)
PASS  unit test of the code
PASS  bashate, flake8, bandit
PASS  tox

Story: 2010930
Task: 48846

Change-Id: Iace37dad256b50f8d2ea6741bca070b97ec7d2d2
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-11-02 15:12:47 +00:00
Tae Park
ef1b8f663b Split key shards in json
Split the key shard file gained from vault initialization into separate
files. Each key shard (plus the root token) is now stored separately.
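
For illustration, the split could look like this (assumes jq is
available; Vault's init response carries "keys", "keys_base64" and
"root_token"):

  for i in 0 1 2 3 4; do   # five shards with the default key-shares
      jq -c "{keys: [.keys[$i]], keys_base64: [.keys_base64[$i]]}" \
          init.json > "key-shard-${i}.json"
  done
  jq -r '.root_token' init.json > root-token.txt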

Test Plan:
PASS vault sanity
PASS bashate

Story: 2010930
Task: 48847

Change-Id: I8a007e505ea7ee9764301e494f4801a25cb194ce
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-10-31 14:02:10 -04:00
Michel Thebeau
87bf94a0c5 fix print of new log level
When changing the log level, the logged new level was reported as the
log parameter rather than the configured log level.

Refactor the case statement as its own function so it can be used in
several places.  Use the conversion function to print the correct
configured log level.

Also print the user-friendly text when detecting an invalid log level.

Misc other changes including comments, line lengths, local declarations.

Test Plan:
PASS  Unit test log functions
PASS  changing configured log level
PASS  bashate on the rendered init.sh

Story: 2010930
Task: 48842

Change-Id: I6c96f2e5193d722bb9e4cd32eb66c2cd2f65a503
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-10-26 14:03:13 +00:00
Michel Thebeau
3684074d88 exit_on_trap: adjust default and debug behaviours
It is observed that the exit_on_trap is not working under normal
operation - vault-manager takes 30 seconds to exit when the application
is removed.  The default behaviour is to exit when the trap file exists.
Not exiting when the trap file exists only occurs when the trap file has
content.  This latter behaviour is a debugging option.

Add the conditional for empty trap file, and always exit if the trap
file is empty.

For the debugging feature it is helpful for the procedure to remember
the trap number set in the trap file.  Use the DEBUGGING_TRAP global
variable to remember the debugging trap requested, and exit whenever
that exit_on_trap call is run.

Also:
Refactor the parameter variable as 'trap' for readability. Adjust all
the logs for exit_on_trap to permit search for "exit_on_trap".  And log
at INFO level when exiting vault-manager.
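
A minimal sketch of the resulting behaviour (not the shipped code; the
trap file path variable is hypothetical):

  exit_on_trap() {  # $1: the trap number at this call site
      local trap="$1"
      [ -f "$TRAP_FILE" ] || return 0    # no trap file: keep running
      if [ ! -s "$TRAP_FILE" ]; then
          log INFO "exit_on_trap: $trap" # empty trap file: always exit
          exit 0
      fi
      # debug feature: the file names one trap; remember it and exit
      # only when that call site is reached
      if [ -z "$DEBUGGING_TRAP" ]; then
          DEBUGGING_TRAP="$( cat "$TRAP_FILE" )"
      fi
      [ "$trap" = "$DEBUGGING_TRAP" ] && exit 0
      return 0
  }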

Test Plan:
PASS  unit test exit_on_trap
PASS  default behavior; vault-manager responds promptly to termination
PASS  debug feature exits on matching trap (remembers debug trap)
PASS  bashate on the rendered init.sh

Story: 2010930
Task: 48843

Change-Id: Id67a89e063daa18ba7627553ac2a19ca673ff00b
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-10-25 19:58:13 +00:00
Tae Park
11c11f1e0c Support storing key shards in k8s secrets
Replaces the current implementation of storing key shards in PVC with
k8s secrets. Includes additional improvements to existing vault-manager
code, such as added error checking.

Test Plan:
PASS vault sanity test
PASS bashate

Story: 2010930
Task: 48845

Change-Id: Ie0e5fe9749fa871d73d7b52600a8905abcb31887
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-10-23 11:11:46 -04:00
Tae Park
9b9e0e8f62 Move Template Values to Top
Organizing the helm template values so that each call is unique. This
should allow for more efficient management of helm values within
init.sh. Also does some additional code cleanup.

TEST PLAN:
PASS vault sanity test
PASS bashate

Story: 2010930
Task: 48844

Change-Id: Ia8e7820b9c86e307991f9affda7035bb89dfcc57
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-10-11 16:46:13 -04:00
Tae Park
7267389382 Add manual exit points for catching SIGTERM
Adds new catch points for SIGTERM sent to the vault manager. Allows the
vault manager to exit gracefully. This includes debugging capability to
exit on a specific point.

Test Plan:
PASS vault sanity test
PASS bashate
PASS vault manager pod exits and restarts with no errors when an
     exit_on_trap file is created with the specified exit point number.
     The file must be located in the vault manager work directory.

Story: 2010930
Task: 48843

Change-Id: I3dadccfca554b448d729d37132c8af17324368f1
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-10-06 21:17:00 +00:00
Tae Park
a56dc6dfbb Add log levels to vault manager
Adding new log levels in vault-manager to make the logs easier to
diagnose. There are five levels (DEBUG, INFO, WARNING, ERROR, FATAL).
Existing logs are assigned to one of the above levels, and some of the
echo commands are removed since the log levels now fulfill their role.
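
A minimal sketch of level-filtered logging (not the shipped code):

  LOG_LEVELS="DEBUG INFO WARNING ERROR FATAL"
  LOG_LEVEL="INFO"    # default level

  log_rank() {  # map a level name to a numeric rank
      local i=0 l
      for l in $LOG_LEVELS; do
          [ "$l" = "$1" ] && { echo "$i"; return; }
          i=$((i+1))
      done
      echo 1          # treat unknown levels as INFO
  }

  log() {  # usage: log ERROR "message"
      local level="$1"; shift
      [ "$(log_rank "$level")" -lt "$(log_rank "$LOG_LEVEL")" ] && return
      echo "$(date '+%F %T') ${level} $*"
  }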

Test Plans:
PASS vault sanity test
PASS observe new log levels within vault manager log
PASS assign a new default log level and observe new log reflecting it
PASS bashate

Story: 2010930
Task: 48842

Change-Id: I03679ade6e1a6dcc51d13e76264f6c05d132f7c7
Signed-off-by: Tae Park <tae.park@windriver.com>
2023-10-04 16:48:43 -04:00
Tae Park
896008fb73 vault-manager wait for one server only when initialized
Modify the vault-manager initialization logic so that it waits for the
number of active pods to equal the replica value only if the raft is
not yet initialized; once initialized, it waits for a single server.

TEST PLAN:
 - In a 2 controller, 1 worker setup,
 - Upload and apply vault
 - Lock the host that vault-manager is running on
 - Vault manager should restart
 - Within the logs, there should not be a repetition of " Waiting for sva-vault statefulset running pods..."
 - Vault Sanity test in AIO-SX
 - Bashate of rendered init.sh

Closes-bug: 2029375

Signed-off-by: Tae Park <tae.park@windriver.com>
Change-Id: I41990b87395a5d5364ef91c048f740d0f0675d6b
2023-08-08 15:31:29 -04:00
Alan Bandeira
08305a2286 Add core affinity labels to each vault pod
This commit adds support for core affinity labels for
vault. The label 'app.starlingx.io/component' indicates
to k8s whether to run the application pods on 'platform'
or 'application' cores.

The default value for the 'app.starlingx.io/component'
label is 'platform', but the label accepts the values
'application' and 'platform'. The override has to be
performed when vault is in the uploaded state, after
application remove or before the first apply. This
behavior is required to ensure that no vault pod is
restarted in an improper manner.

Test plan:

PASS: In an AIO-SX system upload and apply the vault app. When apply
      is finished, run "kubectl -n vault describe po sva | grep
      platform" and the output should be three instances of
      "app.starlingx.io/component=platform", indicating that the
      default configuration is applied for each pod.

PASS: In an AIO-SX, where the vault app is in the applied state, run
      "system application-remove vault" and override the
      'app.starlingx.io/component' label with the 'application' value
      via the helm api. After the override, apply vault and verify the
      'app.starlingx.io/component' label is 'application' in the pods
      describe output, similar to the previous test.

PASS: In an AIO-SX, where the vault app is in the applied state, run
      "system application-remove vault" and override the
      'app.starlingx.io/component' label with any value other
      than 'platform' or 'application' and after the apply check that
      the default value of 'platform' was used for the pod labels.

PASS: In a Standard configuration with one worker node, upload and
      apply the vault app. When apply is finished, run 'kubectl -n
      vault describe po sva | grep -b3 "app.starlingx.io/component"'
      and check in the output that the 'app.starlingx.io/component'
      label has the default value of 'platform' for each pod, with
      every vault server pod having the label.

PASS: In a Standard configuration with one worker node, remove vault
      and override the 'app.starlingx.io/component' label with any
      value, valid or not, and after the override, apply vault. With
      vault in the applied state, verify the replica count override is
      kept and check the pods in a similar way to the previous test to
      validate that the HA configuration is maintained. The number of
      pod replicas should reflect the configuration.

Story: 2010612
Task: 48252

Change-Id: If729ab8bb8fecddf54824f5aa59326960b66942a
Signed-off-by: Alan Bandeira <Alan.PortelaBandeira@windriver.com>
2023-06-20 16:45:27 -03:00
Michel Thebeau
82478326fe update vault helm chart to 0.24.1
Replace references to 0.19.0 with 0.24.1.  Refresh the patches for
vault-manager and agent image reference. Update the image tags to match
new vault chart.

Test plan:
 PASS AIO-sx and Standard 2+2
 PASS vault aware and un-aware applications
 PASS HA tests
 PASS test image pulls from private registry with external network
      restriction

 Story: 2010393
 Task: 48109

Change-Id: Ib6b4d0a6f7d3a54676563c59f60d93d129c81c1c
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-06-14 15:05:32 -04:00
Michel Thebeau
363529d1fc delete old chart build files
These have not been needed for a while and do not impact the build of
this application.

The Makefile remains as the necessary component of the build.

Test plan:
  PASS  compare chart files before/after to ensure no changes
  PASS  compare all of stx-vault-helm package before/after

Story: 2010393
Task: 47164

Change-Id: I97025ceee2875a6fc588d72436b55e7f5ac59062
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-06-07 12:45:45 +00:00
Michel Thebeau
27f4742d8d remove extra vault-manager patch
This patch was part of the CentOS build, which was removed with
commit 20167fc54f7d3111762c52a4e8a4fe1e7c8ead5a

Whereas the patch was copied for debian here:
commit d96e143a34392324457f92019947d9af91ef803e

Test Plan:
PASS: debian build unaffected (there is no CentOS build)

Story: 2010393
Task: 47232

Change-Id: If90017b58f6220bca82e554e2fb50bd655d240ec
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-05-26 13:19:35 +00:00
Michel Thebeau
198f4e5164 set images to pull from configured registries
Add yaml to the fluxcd manifest which is compatible with the platform's
image pull and service parameter registry override handling.  The
platform will pull the image and populate registry.local, and the vault
injector agent will pull from registry.local.

Story: 2010393
Task: 47927

Test Plan:
PASS: sysinv.log shows that agentImage image is pulled when vault
      server image is hardcoded differently
PASS: agent image pulls when public network is blocked
PASS: agent image pulls when it is different than vault server image
PASS: vault app test, including vault un-aware application

Change-Id: Idd1215744bb31881127a6be23cf570166c79fad8
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-05-05 18:56:31 -04:00
Michel Thebeau
1dab2dbf8f update vault-manager image tag
The new image has updated packages for CVE fixes, no other changes.

Test Plan:
PASS - apply vault application (inspect vault-manager pod)

Story: 2010710
Task: 47905

Change-Id: I83848d12baf0558edc0a2e4cd9a964f781edec56
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-05-03 16:53:59 -04:00
Michel Thebeau
82d5d9abdc update vault-manager statefulset
Update the statefulset so the update strategy prompts a restart.  The
config map is updated in previous commits, for which we want
vault-manager to restart.

Test Plan:
PASS - sw-patch upload/apply/install/remove
PASS - manual revert to original tarball (system application-update)

Story: 2010393
Task: 47731

Change-Id: Ib52d019170763d066c730d679067b91ed4d59bb5
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-03-31 10:31:46 -04:00
Michel Thebeau
dc79220541 add health query timeout for HA
This change causes vault-manager to not pause for long periods when a
configured vault server is not responsive.

Use curl --connect-timeout for queries to vault server /sys/health.
During HA recovery it is known that the server is non-responsive, so
vault-manager should not wait the default time, which is 60s or 5m
depending on the google search result.
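
Schematically (the address variable is illustrative):

  curl -s --connect-timeout 2 "${server_addr}/v1/sys/health"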

It is observed that vault-manager appears to hang for long periods
during HA recovery. Watching the $PVCDIR/pods.txt confirms that
vault-manager is inactive for minutes at a time.  This changes the
default behavior to timeout within 2 seconds during the HA recovery
scenario.

In addition to not waiting, the vault-manager log will show the 'sealed'
status as empty string when the query times-out.

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime
PASS - rates including 1, 5
PASS - wait intervals including 0, 1, 3, 15
PASS - kubectl delete 2 vault pods
PASS - kubectl delete 3 (all) vault pods

Story: 2010393
Task: 47701

Change-Id: I4fd916033f6dd5210078126abb065393d25851cd
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-23 11:52:37 -04:00
Michel Thebeau
02184560c5 fix bashate reports in vault-manager init.sh
Run tox against the rendered init.sh from vault-init.yaml; fix most of
the reports except for some long lines from jsonpath templates.

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime

Story: 2010393
Task: 47700

Change-Id: I844c5de510e8a7a3724852d4e6500eec6c327aba
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-23 10:36:59 -04:00
Michel Thebeau
a046dca09c add option to delay server unseal
This change delays unsealing recovering vault servers for 15 seconds.

vault-manager automatically unseals vault servers in a perpetual loop
after initial configuration.  This final loop is the HA recovery
procedure.

It is observed that vault-manager will unseal a recovering vault server
when the active server has not started to send heartbeats to the new
pod.  The result is that the recovering server will timeout waiting for
heartbeats and start an election.  Although the active and standby
server will reject the election, there being a leader already, the
recovering vault will increment 'term' and restart election until
heartbeats are received, or until it wins election.

Although the raft algorithm is resilient to this, the procedure is not
appropriate.  It is better to unseal the vault server after the active
vault sends heartbeats to the new pod.

It is observed that the heartbeat interval reduces promptly from less
than 1 second per heartbeat to ~10-12 seconds for a failed vault server.
So it is reasonable for vault-manager to wait 12 seconds before
unsealing the recovering vault.  This also assumes the vault-manager and
active vault server would receive updated pod information at about the
same time and the latest heartbeat was issued immediately prior to the
update.

The options are configurable in helm overrides.  The defaults, for
example:
  manager:
    statusCheckRate: 5
    unsealWaitIntervals: 3

statusCheckRate is the rate at which vault-manager will check pod
status, in seconds.  unsealWaitIntervals is the number of intervals to
wait before unsealing the server.

Default is 5 s/interval * 3 intervals == 15 seconds

When unsealWaitIntervals is set to 0 there is no delay in unsealing the
recovering vault servers.  This is equivalent to the existing behaviour
before this change when statusCheckRate is also set to 5, which is the
value hard-coded before this change.
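
For example, the defaults could be set explicitly through a helm
override (assuming the app, chart and namespace are all named vault):

  system helm-override-update vault vault vault \
      --set manager.statusCheckRate=5 \
      --set manager.unsealWaitIntervals=3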

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime
PASS - rates including 1, 5
PASS - wait intervals including 0, 1, 3, 15
PASS - not reproduced with default values (many attempts)

Story: 2010393
Task: 47701

Change-Id: I763f6becee3e1a17e838a4f8ca59b2b0d33ba639
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-23 10:36:55 -04:00
Michel Thebeau
686cf0606f make vault-manager rate configurable
Add a chart override for the rate at which vault-manager checks vault
pod status.  Leave the default at previously hard-coded 5s.

Move all of the hard-coded sleep values to variables so they would be
more visible.

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime
PASS - rates including 1, 5

Story: 2010393
Task: 47701

Change-Id: I1de647760f6fe1806b0b1450c0e8f1117ad725ea
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-22 17:30:00 -04:00
Michel Thebeau
f3e9831823 compress the vault-manager log
Track the seal status of vault server pods so that logs can be omitted
when there is no change.

The converted loop no longer ignores pods without IP addresses.  Add an
explicit test for empty IP address field coming from getVaultPods().

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime

Story: 2010393
Task: 47700

Change-Id: Ic75c397046a3e183faf5ecc5b37dc8abefc7af64
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-22 17:29:47 -04:00
Michel Thebeau
e066354ade add timestamp to vault-manager logs
Enhance debugging with dated logs for vault-manager pod.  Allows
correlating the logs with other pods.

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime

Story: 2010393
Task: 47700

Change-Id: I4a877b8c0fc8ddc2626aaccc15196c30b6fb4b82
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-22 17:29:35 -04:00
Michel Thebeau
6b80a23f50 fix AND logic in log of vault-manager
This was probably supposed to be '&&' for AND logic and not an
intention to background the grep of pods.txt.  The symptom of this
mistake is somewhat random log output - sometimes the grep output is
printed before and sometimes after the "Sealed status is" log.
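
Schematically:

  grep "$row" pods.txt &      # '&' backgrounds grep; its output races
                              # the next log line
  grep "$row" pods.txt && \
      echo "Sealed status is ..."   # '&&' sequences the two as intended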

Test Plan:
PASS - vault ha 3 replicas
PASS - vault 1 replica
PASS - kubectl exec kill vault process
PASS - kubectl delete vault pod
PASS - short network downtime
PASS - long network downtime

Story: 2010393
Task: 47700

Change-Id: Ia4358ca7ed7ca7af3b116934c4491a5887871853
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2023-03-22 17:27:30 -04:00
Manoel Benedito Neto
45de5de069 Update debian packages for pkg-versioning
The Debian packaging meta_data file has been changed to reflect all
the latest git commits under the directory, pointed as usable, and to
improve pkg-versioning by using the first commit as the starting point
for building packages.

This ensures that any new code submissions under those
directories will increment the versions.

Test Plan:
PASS: Verify package versions are updated as expected.
PASS: build-pkgs -c -p vault-helm,python3-k8sapp-vault,stx-vault-helm

Story: 2010550
Task: 47501

Signed-off-by: Manoel Benedito Neto <Manoel.BeneditoNeto@windriver.com>
Change-Id: I999b1d96146fb1e2ac931641620621f445cbda71
2023-02-28 10:51:17 -03:00
Michel Thebeau
1a7c490264 Update vault to use new stx-vault-manager image
Test Plan:
PASS: Standard, dedicated storage, vault HA
PASS: Simplex
PASS: app sanity

Story: 2010393
Task: 47157

Depends-on: https://review.opendev.org/c/starlingx/root/+/871667

Change-Id: I844aaefb31e172c91134eb0b46d2911a88ba8508
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2023-01-25 17:26:50 -05:00
Michel Thebeau
20167fc54f remove CentOS build of vault
Disables the packages for the CentOS build, as well as the vault
manager image.  Conversion of the docker image to Debian will happen
at a later date (task 46869).

Test Plan:
PASS: centos build
PASS: debian build

Story: 2010393
Task: 46868

Change-Id: I827352122460976b07b436fb022741f7d89e5548
Signed-off-by: Michel Thebeau <michel.thebeau@windriver.com>
2022-11-17 19:11:32 +00:00
Greg Waines
428454dc96 Upversioning VAULT to v0.19.0 in order to support k8s v1.22 and higher.
Story: 2010393
Task: 46786

https://github.com/hashicorp/vault-helm/releases/tag/v0.19.0
Vault image default 1.9.2
Vault K8s image default 0.14.2
Vault CSI Provider image default 0.4.0

Testing
- built vault application in Deb build environment
- on nov 6 nightly build
    * system application-upload ./vault-1.0-1.tgz
    * system application-apply vault
    * configured vault with
      https://docs.starlingx.io/security/kubernetes/configure-vault-using-the-cli.html
    * system application-remove vault
    * system application-delete vault



Signed-off-by: Greg Waines <greg.waines@windriver.com>
Change-Id: I75f4bdc93e3a8dc630f3ade45f53d150a3945f37
2022-11-14 12:48:48 +00:00
Thiago Brito
b04ebb5b79 Fix vault-app to use upversioned cert-manager
In [1] and [2] cert-manager was migrated to fluxcd and upversioned
to version 1.7.1, but the vault helm-charts are still creating
CRs with apiVersion v1beta2. This commit fixes that.

[1] https://review.opendev.org/c/starlingx/cert-manager-armada-app/+/831956
[2] https://review.opendev.org/c/starlingx/cert-manager-armada-app/+/838590

TEST PLAN
PASS build vault-fluxcd app
PASS Upload
PASS Apply (verified created resources)
PASS Remove
PASS Delete

Logs: https://paste.opendev.org/show/bxn3yZEzas1o9bODJ5RO/

Story: 2009837
Task: 45363

Signed-off-by: Thiago Brito <thiago.brito@windriver.com>
Change-Id: I4d61f65f453cdd55f514e8bd45c2c43ce5e45cc3
2022-05-13 11:13:55 -03:00
Michel Thebeau
2115404c7d Add FluxCD version of the vault app
Add new manifest files to the vault app to enable FluxCD support.

The new spec will now generate 2 rpms:
- the original one that contains the armada manifest yaml
- a new one that contains the new FluxCD yaml

Add missing serviceName for vault-manager statefulset, which is required
by newer versions of helm.

TEST PLAN:
- build, ISO image includes in progress fluxcd commits
- verify the armada app version of vault
- verify the fluxcd app version of vault
- test case for both includes asserting that vault is effective at
  storing secrets per the starlingx example
- Debian: build-pkgs -p stx-vault-helm

Story: 2009138
Task: 44485

Change-Id: I120c7a375ab586cf652bfff22557d4873d59bded
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2022-03-07 14:04:22 +00:00
Yue Tao
d96e143a34 vault-helm: remove dl_hook
Add "dl_path" to download the source tarball and untar it to build
directory directly other than a sub-directory. No need the
$(VAULT_CHART_DIR) in rules.

Add "src_files" to copy local files to build directory

Use debian method to apply local patch instead of applying in rules.

No longer need dl_hook

Test Plan:

Pass: successfully build vault-helm.
Pass: No difference compared with the result of dl_hook

Story: 2009101
Task: 44277

Signed-off-by: Yue Tao <Yue.Tao@windriver.com>
Change-Id: Idb225ffcbb8455af0ed965e6e3087ec9d4cf0484
2022-01-13 12:19:31 +08:00
Tracey Bogue
ccdb17d296 Add Debian packaging for vault app
Create Debian packages for vault-helm, python-k8sapp-vault
and stx-vault-helm.

Signed-off-by: Tracey Bogue <tracey.bogue@windriver.com>
Change-Id: Ifb0c1de001e75e01e501c0078d85e562dc802d84
2021-12-03 09:12:06 -06:00
Rei Oliveira
276e4f1e9b Add toleration to vault Pod objects
A toleration needs to be added to all resources that create pods since
the node-role.kubernetes.io/master taint will be restored to all master
nodes. This ensures that the pods will run on the master node.

This adds toleration to vault statefulset and deployment objects

Test cases:

PASSED: Verify that vault pods are able to run on a tainted node

PASSED: Verify that other pods, without the taint toleration on,
fail to schedule at the tainted node and that a 'kubectl describe'
of them shows a Warning of 'node(s) had taint
{node-role.kubernetes.io/master: }, that the pod didn't tolerate.'

PASSED: Verify that system application-update from a previous
version to this version works fine

PASSED: Verify that disabling the taint has no effect on vault
running pods

PASSED: Verify that enabling the taint has no effect on vault
running pods

PASSED: Verify that vault is working by creating a vault secret
using vault's '/secret/basic-secret' api

PASSED: Verify that vault is working by reading a vault secret
using vault's '/secret/basic-secret' api

Story: 2009232
Task: 43386

Signed-off-by: Rei Oliveira <Reinildes.JoseMateusOliveira@windriver.com>
Change-Id: Ida9787e059e8c8b97f8b45d829c531f4cee1115a
2021-09-30 18:02:18 -04:00
Michel Thebeau
c85f980a6d vault-manager: use image from values
The vault-manager does not pull its image from registry.local.

Add values to the manifest and use them in the vault-manager chart so
that armada will prepend registry.local to the image:tag.

Also edit the values.yaml in the vault-helm tarball with the
vault-manager's repository and tag.

Closes-Bug: 1921008
Change-Id: If7a086c9dd10c3b5b961e0275be56bfb117e6a1d
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2021-04-05 13:39:11 -04:00
Michel Thebeau
b851751970 fix Add pull secret for registry.local
Commit 53ad52c956314ebc00665f656ab2c4c4f49ff3e2 did not address the
issue.

Instead, set the global value of imagePullSecrets in the manifest.

Fix vault-init.yaml where imagePullSecrets had fallen into another
section.

Refrain from patching the vault tarball (revert the commit 53ad52c9)

Closes-Bug: 1912696
Change-Id: Ia9e7cb52055ba9da342ea32f3c2bd3f24ce06630
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2021-03-25 13:51:59 -04:00
Michel Thebeau
53ad52c956 Add pull secret for registry.local
Pulling vault images from registry.local will fail without
imagePullSecrets set.  containerd.log shows "failed to resolve
reference" for the image:tag.

Closes-Bug: 1912696
Change-Id: I9a4c791aa7517a5bef6b58ceadcfcc765c8fa94e
Signed-off-by: Michel Thebeau <Michel.Thebeau@windriver.com>
2021-01-26 20:58:29 -05:00
Cole Walker
cda9a32082 Improve vault-manager cluster join logic
The init script could fail when trying to join a vault pod to the
cluster if the leader was not ready. Added logic to retry joining until
the leader is ready.
Also includes some formatting cleanup.
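
An illustrative retry loop (not the shipped code; addresses are
assumptions), using Vault's raft join endpoint:

  until curl -sf --connect-timeout 2 \
          --data "{\"leader_api_addr\": \"http://${leader_addr}:8200\"}" \
          "http://${pod_addr}:8200/v1/sys/storage/raft/join"; do
      sleep 5    # wait for the leader to become ready, then retry
  done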

Closes-Bug: 1889136

Change-Id: Ia74600bca46cd9ccdbabb48dcf3455431a4ba908
Signed-off-by: Cole Walker <cole.walker@windriver.com>
2020-07-31 11:55:57 -04:00