904 Commits

Author SHA1 Message Date
Robert Church
6132aa7317 Remove minimal PV support on AIO/workers
To support long running patch-able systems that don't require a
reinstall, the entire root disk will be allocated to the cgts-vg volume
group as part of installation.

This update simply removes the use of MINIMUM_PLATFORM_PV_SIZE and
ensures that the 'platform_pv' uses all available space.

NOTE: A followup commit will be provided to clean up the large, small,
      tiny disk references and provide an accurate log checking for and
      displaying minimal disk size based on default logical volume
      sizes.

Test Plan:
PASS - Install AIO-SX, bootstrap, unlock
PASS - Install 2+2+2, bootstrap, unlock

Change-Id: I3a50f2305b781de1cf9b80c5aed62b03bebc4790
Story: 2010444
Task: 46981
Signed-off-by: Robert Church <robert.church@windriver.com>
2022-12-03 12:36:45 -06:00
Kyle MacLeod
f15661bcf6 Allow e2fsck exit codes of 0,1
From the e2fsck man pages, the exit codes of 0, 1, 2 should
not be treated as failures.  We should never see exit code 2 though,
since it only occurs when e2fsck is run against a mounted filesystem.

The solution is to extend the check to only fail on exit code > 1.
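
The rule above can be sketched in a few lines. This is an illustrative Python version only, not the kickstart's actual shell logic; the function name `e2fsck_ok` is hypothetical.

```python
def e2fsck_ok(exit_code: int) -> bool:
    """Return True when an e2fsck exit code should not fail the install.

    Follows the rule stated in the commit message: fail only when the
    exit code is greater than 1, so 0 and 1 are accepted.
    """
    return exit_code <= 1
```

Under this rule exit code 2 still counts as a failure, which matches the expectation that it should never be seen on an unmounted filesystem.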

Test Plan
PASS: Verify e2fsck exit code handling during subcloud install
      with resized partition

Closes-Bug: 1998611

Change-Id: Ie22fd77e3d2e2d631ba467b818bdc77c77f0d8b8
Signed-off-by: Kyle MacLeod <kyle.macleod@windriver.com>
2022-12-02 11:18:32 -05:00
Robert Church
1796ed8740 Update wipedisk for LVM based rootfs
Now that the root filesystem is based on an LVM logical volume, discover
the root disk by searching for the boot partition.

Changes include:
 - remove detection of rootfs_part/rootfs and adjust rootfs related
   references with boot_disk.
 - run bashate on the script and resolve indentation and syntax related
   errors. Leave long-line errors alone for improved readability.

Test Plan:
PASS - run 'wipedisk', answer prompts, and ensure all partitions are
       cleaned up except for the platform backup partition
PASS - run 'wipedisk --include-backup', answer prompts, and ensure all
       partitions are cleaned up
PASS - run 'wipedisk --include-backup --force' and ensure all partitions
       are cleaned up

Change-Id: I036ce745353b6a26bc2615ffc6e3b8955b4dd1ec
Closes-Bug: #1998204
Signed-off-by: Robert Church <robert.church@windriver.com>
2022-11-29 05:04:38 -06:00
Robert Church
b0066dcd27 Remove all volume groups by UUID
In cases when wipedisk isn't run or isn't working correctly,
pre-existing volume groups, physical volumes, and logical volumes will
be present on the root disk. Depending on the sizes and layout of the
previous install along with partial or aborted cleanup activities, this
may lead to [unknown] PVs with duplicate volume group names.

Adjust the cleanup logic to:
- Discover existing volume groups by UUID so that duplicate volume
  groups (i.e. two occurrences of cgts-vg) can be handled individually.
- Ignore [unknown] physical volumes in a volume group as they cannot be
  removed. Cleaning up existing physical volumes across all volume
  groups will resolve any [unknown] physical volumes.
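
A rough sketch of the two ideas above, written in Python for illustration rather than as the %pre-part hook's shell code. It assumes the input text has one "<vg_uuid> <vg_name>" pair per line (the kind of listing LVM reporting tools can produce); the function names and sample values are hypothetical.

```python
from typing import Dict, List


def group_vgs_by_uuid(vgs_output: str) -> Dict[str, str]:
    """Map each volume group UUID to its name.

    Keying on the UUID keeps two volume groups that share a name
    (for example two occurrences of cgts-vg) as separate entries.
    """
    groups: Dict[str, str] = {}
    for line in vgs_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        vg_uuid, vg_name = fields
        groups[vg_uuid] = vg_name
    return groups


def removable_pvs(pv_names: List[str]) -> List[str]:
    """Drop the '[unknown]' placeholder, which cannot be removed directly."""
    return [pv for pv in pv_names if pv != "[unknown]"]
```

Keying on the UUID is what lets each duplicate be cleaned up on its own; keying on the name would collapse them into one entry.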

In addition, unify if/then for/do syntax in the %pre-part hook

Test Plan:
PASS - create a scenario with multiple partitions along with a
       nova-local and cgts-vg volume group that result in an [unknown]
       physical volume and a duplicate cgts-vg. Do not wipe the disks
       and install an ISO with the above changes. Observe proper cleanup
       and install.
PASS - Perform consecutive installs without wipedisk and observe proper
       cleanup and install

Change-Id: Idf845cf00ca3c009d72dedef0805a77d94fa3d97
Partial-Bug: #1998204
Signed-off-by: Robert Church <robert.church@windriver.com>
2022-11-29 05:04:06 -06:00
Robert Church
651bd76566 Ensure magic strings that are visible for libblkid are erased
In the case when the root disk partition table is wiped but individual
partitions are not wiped correctly, this will leave previous physical
volume metadata intact on the disk.

When a new LVM partition is created and assigned as a newly created
physical volume the old LVM metadata on the disk partition will prevent
the cgts-vg volume group from being created.

This update will wipe all the magic strings present in the new physical
volume partition established by the kickstart by executing 'wipefs -a'
prior to creating the cgts-vg.

Test Plan:
PASS - Successfully install an ISO with this change on a system that did
       not clean up the LVM metadata from a previous install. Log in to
       the installed system and confirm that the cgts-vg is properly
       configured.

Change-Id: I63f4235a27cb40a4283f0f4c34f63564a4f18cdd
Partial-Bug: #1998204
Signed-off-by: Robert Church <robert.church@windriver.com>
2022-11-29 04:40:42 -06:00
Al Bailey
d4aaeb5836 Debian: Fix ostree remote for patching on workers
The sw_version was uninitialized for workers.
This led to a 404 error when doing ostree pull
during a patch installation on worker nodes.

The problem was introduced by
https://review.opendev.org/c/starlingx/metal/+/864930

Test Plan:
  Build / Install / Deploy Duplex env with a worker
  Successfully apply a patch on the worker

Closes-Bug: 1997130
Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: If40466b0ac9ffe0ce1ae068e948682eafa3703e5
2022-11-24 19:04:13 +00:00
Zuul
d614eda8fc Merge "Fix failure to set instdev parameter when we use ks-setup.cfg" 2022-11-24 01:25:40 +00:00
Shrikumar Sharma
15e971b59e Fix failure to set instdev parameter when we use ks-setup.cfg
When we create the prestage iso with an external script,
ks-setup.cfg, we may not provide the rootfs_device or
boot_device parameter. This is a valid scenario where these
parameters are defined in ks-setup.cfg. An installation failure
is observed in this case.

The cause of the failure is that the prestage code is handled in
a pre-part hook. This commit moves it to a ks-early hook.

In this commit, a provision for the execution of a custom script
named ks-addon.cfg is also made. This script is a bash script
that must execute in the last post hook.

Test Plan:
PASS: Verify that the installation succeeds when the rootfs
      and boot device parameters are only specified via
      the ks-setup.cfg.

PASS: Verify that the external script, ks-addon.cfg, is executed
      after the install and configurations are done.

PASS: Verify that the logs from the execution of ks-addon.cfg
      are present in kickstart.log.

Closes-bug: 1997305

Signed-off-by: Shrikumar Sharma <shrikumar.sharma@windriver.com>
Change-Id: Ica1735aef3ab457cf0609ebee6aac45671e97987
2022-11-23 23:06:05 +00:00
Zuul
f80c698031 Merge "Make var and root filesystems LVM based" 2022-11-22 19:05:10 +00:00
Robert Church
c5c6f5353a Make var and root filesystems LVM based
Move the /var and /root partition based filesystems into the cgts-vg so
that they can be resized as required at runtime in the future.

This change includes:
- Update pxeboot network personality files to add installer command line
  parameters inst_ostree_root and inst_ostree_var to allow specifying the
  root and var devices to be created and populated by the installer.
- Update the StarlingX grub.cfg file to add a new single option booting
  that drops the rollback boot option (not working) and adds grub
  options ostree_root, rd.lvm.lv, and ostree_var to enable mounting the
  root and var filesystems at boot time.
- Update the kickstart/miniboot config files to:
  - Remove support for lat/lat-disk partition size variables and
    refactor the hooks to use specific PART_SZ_* and LV_SZ_* variables.
  - Increase /boot partition size to 2GB from 500M to provide some
    additional space for future patching scenarios that may require
    staging multiple ostree deployments prior to reboot and cleanup.
  - Create logical volumes for root and var set to the current 20GB
    values.
  - Adjust the minimum physical volume size used on AIO and worker
    personalities to include the new root and var logical volumes.
  - Adjust normal install disk thresholds to 219GB for AIOs and 120GB
    for workers.
  - Fix mkfs hook to ensure that the aio vs. std sizes are correctly
    reflected on hook execution.

Test Plan:
- PASS: BIOS AIO-SX
- PASS: UEFI AIO-SX
- PASS: BIOS 2+2+2
- SKIP: secure boot, not ready for Stx8.0
- PASS: AIO-SX upgrade
- PASS: AIO-DX upgrade
- PASS: DC subcloud install (virtual test)

Change-Id: I5f77266336b53d178eaae0e6fbb556bbea6400e8
Depends-On: https://review.opendev.org/c/starlingx/integ/+/865076
Story: 2010444
Task: 46881
Signed-off-by: Robert Church <robert.church@windriver.com>
2022-11-22 13:05:13 +02:00
Zuul
87a645b8ea Merge "Remove normal/rollback toggle code from stx grub menu" 2022-11-21 20:28:31 +00:00
Zuul
fde03b15d7 Merge "Debian: metal: update debian_iso_image.inc" 2022-11-21 19:49:54 +00:00
Eric MacDonald
04e9723dbb Remove normal/rollback toggle code from stx grub menu
Modify the stx grub template file to remove the
normal / rollback image switching/toggle algorithm.
Also remove the temporary sed based method in the
kickstart code.

Effectively, this moved the previous change introduced by

  https://review.opendev.org/c/starlingx/metal/+/861461

... to a grub.cfg 'code block remove' rather
than 'on the fly sed modification' by the kickstart.

Test Plan:

PASS: Verify build and install
PASS: Verify on target code removed from /boot/efi/EFI/BOOT/grub.cfg
PASS: Verify normal image is selected after 10 back to back reboots

Story: 2009968
Task: 46886
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: Id8799dff6eef7ef8aa6f66180d6ed971c005618d
2022-11-20 23:19:53 +00:00
Zuul
d24c008374 Merge "Create new pxeboot feed refresh script and service" 2022-11-20 18:48:23 +00:00
Zuul
10a4bece22 Merge "Debian: Fix ostree remote pulls for IPv6 workers" 2022-11-20 16:00:40 +00:00
Eric MacDonald
b5d22ef3e7 Create new pxeboot feed refresh script and service
This update introduces a new script that can be called
by patching to refresh the kernel, initrd and other
system node install feed staged files in support of
kernel patching.

This update also introduces and enables a new service file
that triggers the creation of the pxeboot feeds or refreshes
the pxeboot feeds if what they contain does not match the
content in /boot.
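
The "refresh only on mismatch" decision can be illustrated as below. This is a hedged Python sketch, not the pxeboot_feed.sh implementation (the test plan suggests the real script relies on rsync); the function name, directory arguments, and file names are assumptions, and every listed file is assumed to exist in the source directory.

```python
import filecmp
import os
from typing import Iterable


def feed_needs_refresh(boot_dir: str, feed_dir: str,
                       file_names: Iterable[str]) -> bool:
    """Return True if any staged feed file is missing or differs from /boot."""
    for name in file_names:
        src = os.path.join(boot_dir, name)
        dst = os.path.join(feed_dir, name)
        if not os.path.isfile(dst):
            return True  # staged copy is missing
        if not filecmp.cmp(src, dst, shallow=False):
            return True  # staged copy has different content
    return False
```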

Both the new script and the service file are added to the
pxe-network-installer package so they get installed
into the filesystem properly.

Lastly, there are 2 kickstart changes implemented.
 1. The kickstart code that copied the kickstart files from
      /var/www/pages/feed/rel-xx.xx/
      to
      /var/www/pages/feed/rel-xx.xx/kickstart
    is removed in favor of the pxe-network-installer package
    doing that automatically.
 2. The kickstart is modified to remove the previous pxeboot
    feed fetch and creation function.
    One exception to this is the efi.img file, its fetch remains.
    Note the efi image is currently not included in the /boot dir.

Test Plan:

PASS: Verify Debian build and AIO DX install (cd and pxe installs)
PASS: Verify Debian Standard 2+1 DX system install
PASS: In above cases verify end-to-end handling of the following
      test case staging.
PASS: Verify pxeboot feed staging on subcloud controller-0 install
PASS: Verify pxeboot feed file positioning in
      - /var/pxeboot/rel-xx.xx (kernel and initrd images)
      - /var/www/pages/feed/rel-xx.xx/pxeboot (kernel/initrd images)
      - /var/www/pages/feed/rel-xx.xx/pxeboot/EFI/BOOT (other files)
      - /var/pxeboot and /var/www/pages/feed/rel-xx.xx (efi.img)
PASS: Verify rsync bypass for the above cases when the files match
      - complete and partial cases
PASS: Verify staging when the stage dirs are missing
      - complete and partial cases
PASS: Verify staging when stage files mismatch
      - complete and partial cases
PASS: Verify service enable on controllers for AIO and STD configs
PASS: Verify kickstart file position change
PASS: Verify shellcheck static analysis
PASS: Verify pxeboot_feed.sh script error handling
PASS: Verify pxeboot_feed.sh script logging

Story: 2009968
Task: 46789
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: Ic98b2686c417103749cb777adb28ac73ac1d397c
2022-11-20 15:36:23 +00:00
Al Bailey
32feb4a89a Debian: Fix ostree remote pulls for IPv6 workers
The entry for the ostree remote in computes is using
pxecontroller. This is an ipv4 address, and therefore
will not be accessible on an unlocked IPv6 Worker
(or storage).

The fix is to use 'controller' instead of 'pxecontroller'.
That address exists in both ipv4 and ipv6.

Test Plan:
 Debian: Successfully apply a patch to a worker node

Closes-Bug: 1997130

Signed-off-by: Al Bailey <al.bailey@windriver.com>
Change-Id: Idbc1e2728582ab3cd5c73761790cdd9fbc6d951a
2022-11-19 01:19:16 +00:00
Shrikumar Sharma
d19af3331d Copy efi.img to /var/pxeboot on subcloud for multinode support
To create a duplex subcloud with multiple nodes with different
personalities, pxeboot must be supported by controller-0 of the
subcloud.

Here, pxeboot is enabled on controller-0 by copying efi.img
from the mounted miniboot iso image to /var/pxeboot. This will
allow the installation of controller-1 and the computes of the
subcloud via pxeboot.

Test Plan:
PASSED: Verify that all nodes in the subcloud install, come online
      and are unlocked, enabled and available by the end of the
      installation process.

PASSED: Verify that multinode install completes successfully with
      prestaged ostree_repo.

Depends-On: https://review.opendev.org/c/starlingx/metal/+/862619/
Story: 2010118
Task: 46754

Signed-off-by: Shrikumar Sharma <shrikumar.sharma@windriver.com>
Change-Id: I0a6789b5a86f89da5e86581ab7b3eed950361ce7
2022-11-17 17:24:13 +00:00
Yue Tao
18c7435e39 Debian: metal: update debian_iso_image.inc
Move the packages of "metal" from stx-std.lst to debian_iso_image.inc

Test Plan:

Pass: build-pkgs -c -a
Pass: build-image
Pass: boot

Story: 2008862
Task: 46844

Signed-off-by: Yue Tao <yue.tao@windriver.com>
Change-Id: Ib284ae6f1762b0f3ca2fea242b49c1b75846286d
2022-11-16 12:06:51 +08:00
Zuul
1132443626 Merge "Deprecated sysinv-fpga-agent service cleanup" 2022-11-14 14:59:03 +00:00
Davi Frossard
9df5d206df Deprecated sysinv-fpga-agent service cleanup
Removing setup for pmon files. The service sysinv-fpga-agent
doesn't exist anymore. So this change is only a cleanup.

Test plan (AIO-SX):
PASS: Build, boot, bootstrap and unlock.

Story: 2010087
Task: 45628

Depends-on: https://review.opendev.org/c/starlingx/integ/+/864133
Signed-off-by: Davi Frossard <dbarrosf@windriver.com>
Change-Id: I0e56483f49be3a64bcb8047934df5bbb13fe1490
2022-11-11 18:40:38 +00:00
Shrikumar Sharma
67a31d1c6b Revert "Enable Multinode Subcloud in Distributed Cloud"
This reverts commit d09313ff0b527efdcfd2c03bdfb950eb1432be10.

While the code here itself is functionally correct and tested,
a download in the code is dependent on a location on the
active System Controller that is overridden by a drbd2 mount
on /var/www/pages/iso.

This drbd2 mount masks the pxeboot related files which were
placed there during System Controller installation.

Reverting this change until the drbd mount issue on
/var/www/pages/iso on the active System Controller is resolved.

Signed-off-by: Shrikumar Sharma <shrikumar.sharma@windriver.com>
Change-Id: Ie91fde9a09f693d133fa484782a7df28ffd29faf
2022-11-11 18:33:27 +00:00
Eric MacDonald
0e7024f9a7 Grub file modifications for Debian signed UEFI installs
Initial delivery of UEFI system node installs did not
use the signed boot loader. As a result Secure Boot
of system nodes was not supported. This update changes
that by swapping in the signed bootx64.efi boot loader
in a puppet update; see Depends-On.

This update modifies the pxe-network-installer
and kickstart to support a robust UEFI system node
install that supports Secure Boot.

The first change creates and uses an stx template
file from the LAT grub file. This is done to avoid ongoing
and difficult to implement LAT grub file hack changes
from the kickstart.

This new grub.cfg.stx file is packaged in the
pxe-network-installer.

The kickstarts are modified to replace the LAT grub.cfg
file with the new stx template file grub.cfg.stx. As far
as this update goes, this template file is a null change
from the LAT grub file and represents what the LAT grub
file looked like at the time the template was created.

Moving forward, further changes to the system node
install grub file will be made to this new grub.cfg.stx
template file.

The second change is to modify existing stx unprovisioned
default pxe-grub.cfg files to look for the new mac based
config file with the '.cfg' extension.

The system node install mac-based grub files are dynamically
created with no signature file. To work around that, this
update exports the LAT environment variable 'skip_check_cfg'
which instructs LAT to 'skip' the grub menu signature 'check'
for these dynamically created grub files.

An additional change is made to handle timer reload on menu
refresh if the new node remains unprovisioned after timeout.

Test Plan:

PASS: Verify the default LAT file is renamed and the new
      template file positioned in its place.
PASS: Verify Debian pxe-network-installer package update
PASS: Verify Debian AIO DX UEFI Install
PASS: Verify CentOS kickstarts do not require the kickstart change

PASS: Verify build and UEFI install
      - Debian
      - CentOS
PASS: Verify unprovisioned grub menu reload handling with
      recurring timeout until the node is provisioned.

Regression:

PASS: Verify host-delete and host-update install and unlock
PASS: Verify host-reinstall and host-unlock
PASS: Verify lock/unlock controller-1 and controller-0
PASS: Verify lock/delete/reinstall/unlock controller-1
PASS: Verify swact to controller-1
PASS: Verify lock/delete/reinstall/unlock controller-0

Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/863776

Story: 2009968
Task: 46701
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: Id073842ac1b29acf54c999022a9e37d4c2366031
2022-11-10 23:12:53 +00:00
Kyle MacLeod
c9fbb076db Handle persistent_size variations for subcloud install
The persistent_size variable is passed in via the dcmanager
install-values file, allowing the customer to specify
a non-default size for the platform-backup partition.
Default size is 30000 MiB.

When deviating from the default size we must handle the
following installation cases:

1. New Partition
- persistent_size must be >= default (30000 MiB)

2. Existing Partition
- persistent_size must be >= default (30000 MiB)
- persistent_size must be >= existing partition size
- if persistent_size > existing, then we must also
  extend the filesystem to match the new persistent_size
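
The cases listed above amount to a small decision table. The following Python sketch illustrates it under the stated 30000 MiB default; the function name, return values, and error messages are hypothetical and do not come from the kickstart itself.

```python
from typing import Optional

DEFAULT_PERSISTENT_SIZE_MIB = 30000  # default stated in the commit message


def check_persistent_size(requested_mib: int,
                          existing_mib: Optional[int] = None) -> str:
    """Validate persistent_size and return the action to take.

    Returns 'create' for a new partition, 'keep' when the existing
    partition already matches, and 'extend' when the partition and its
    filesystem must grow. Raises ValueError for the rejected cases.
    """
    if requested_mib < DEFAULT_PERSISTENT_SIZE_MIB:
        raise ValueError("persistent_size is smaller than the default")
    if existing_mib is None:
        return "create"
    if requested_mib < existing_mib:
        raise ValueError("persistent_size is smaller than the existing partition")
    if requested_mib > existing_mib:
        return "extend"
    return "keep"
```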

Story: 2010118
Task: 46698

Test Plan:

PASS:
- Fail installation if persistent_size is < default
- New Partition:
    - Installation with unspecified persistent_size
    - Installation with specified persistent_size == default value
    - Installation with specified persistent_size > default
- Existing Partition:
    - Installation with unspecified persistent_size
    - Installation with specified persistent_size == default value
    - Fail installation if persistent_size is < existing partition size
    - Installation with specified persistent_size > default
        - Verify that existing filesystem is extended to match
          new partition size

Signed-off-by: Kyle MacLeod <kyle.macleod@windriver.com>
Change-Id: I8d06ee585ad96acf1076d4b140d7c516a17f15ea
2022-11-09 17:40:39 -05:00
Zuul
81142e5b00 Merge "Enable Multinode Subcloud in Distributed Cloud" 2022-11-09 22:35:13 +00:00
Zuul
b8f7d1ecac Merge "Basic time sync if required on subcloud install" 2022-11-09 21:19:17 +00:00
Shrikumar Sharma
d09313ff0b Enable Multinode Subcloud in Distributed Cloud
To create a duplex subcloud with multiple nodes with different
personalities, pxeboot must be supported by controller-0 of the
subcloud.

Here, pxeboot is enabled on controller-0 by downloading files
required for pxeboot from the System Controller, and copying
them to the relevant locations under /var/www/pages/feed/rel-id
and /var/pxeboot.

Test Plan:
PASS: Verify that all nodes in the subcloud install, come online
      and are unlocked, enabled and available by the end of the
      installation process.

PASS: Verify that multinode install completes successfully with
      prestaged ostree_repo.

Depends-On: https://review.opendev.org/c/starlingx/metal/+/862619/
Story: 2010118
Task: 46754

Change-Id: I8cfda9688d41d1f6f5997ac81f9b6e21d7f3ebe6
Signed-off-by: Shrikumar Sharma <shrikumar.sharma@windriver.com>
2022-11-09 20:33:53 +00:00
Eric MacDonald
884ff2aae8 Debian: Stage subcloud install feed as /var/www/pages/iso
There is a lighttpd rule that prevents a subcloud install from
accessing the system controller's feed directory for the purpose
of setting up its own feed dir in prep for its own system node
installs.

This update enhances the kickstart.cfg file to stage the
feed content for a subcloud controller-0 install.

The new subcloud feed dir is /var/www/pages/iso/rel-xx.xx

Test Plan:

PASS: Verify /var/www/pages/iso file content for
      - usb install
      - pxe install
      - controller system node install
PASS: Verify subcloud can install controller-1 from iso feed

Regression:

PASS: Verify /var/pxeboot content for
      - usb install
      - pxe install
      - controller system node install
PASS: Verify /var/www/pages/feed file content for
      - usb install
      - pxe install
      - controller system node install
PASS: Verify Debian DX System Install
      - usb install then controller-1 as system node install
      - pxe install then controller-1 as system node install
PASS: Verify UEFI and Legacy BIOS Install

Story: 2009968
Task: 46648
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: Ia1511813c5673762ad386122cf9a1666e6392b30
2022-11-09 17:52:16 +00:00
Zuul
2ae23916a8 Merge "Prestage container images at /opt/platform-backup" 2022-11-08 20:47:49 +00:00
Zuul
0866448442 Merge "Improve remote install robustness" 2022-11-08 20:29:47 +00:00
Kyle MacLeod
54cba044f1 Basic time sync if required on subcloud install
This commit avoids install/bootstrap issues when the hwclock
is very far out of date with the system controller.

If the hwclock differs from the system controller by more than
approximately 20m, then we set the hwclock based on the system date.

LAT initializes the system date based on the 'instdate' boot parameter.
The instdate boot parameter is the timestamp applied when the miniboot
bootimage.iso file is created on the system controller. It is close
enough to avoid any major out-of-sync system clock on the subcloud.
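
The drift check can be pictured as a simple threshold comparison. The Python sketch below is only an illustration: it assumes both clocks are available as epoch seconds and treats the 20 minute figure as an exact threshold, and the names are hypothetical.

```python
DRIFT_THRESHOLD_SECONDS = 20 * 60  # "approximately 20m" from the commit text


def hwclock_needs_update(hwclock_epoch: float, system_epoch: float,
                         threshold: float = DRIFT_THRESHOLD_SECONDS) -> bool:
    """Return True when the hardware clock is too far from the system date.

    The system date is assumed to have been initialized already from the
    'instdate' boot parameter, so it serves as the reference.
    """
    return abs(hwclock_epoch - system_epoch) > threshold
```

Taking the absolute difference covers a hardware clock that is either behind or ahead of the reference.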

Secondary change: Optimize the interface assignment prior to ostree pull
This was mentioned during review
https://review.opendev.org/c/starlingx/metal/+/861017
Rather than employ sleeps, the /sys/class/net/${mgmt_dev}/operstate
file is used to determine when the interface has settled.
This is much more efficient than sleeps. We timeout after 60s.
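
A polling loop of this kind might look like the following Python sketch. It is not the miniboot shell code; it assumes that an operstate of "up" means the interface has settled, and the function names and poll interval are hypothetical.

```python
import time
from typing import Callable


def sysfs_operstate_reader(dev: str) -> Callable[[], str]:
    """Build a reader for /sys/class/net/<dev>/operstate."""
    path = f"/sys/class/net/{dev}/operstate"

    def read() -> str:
        try:
            with open(path) as handle:
                return handle.read()
        except OSError:
            return "unknown"  # interface not present yet

    return read


def wait_for_operstate(read_state: Callable[[], str],
                       timeout_s: float = 60.0,
                       poll_s: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the interface reports 'up' or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if read_state().strip() == "up":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A caller would use something like `wait_for_operstate(sysfs_operstate_reader(mgmt_dev))`; the reader, sleep, and clock are injectable so the loop can be exercised without real hardware.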

Test Plan:
PASS:
- Install with subcloud in relative sync with system controller
  (within seconds). No hwclock change is applied to subcloud.
- Install when subcloud hwclock is more than 20m out of sync with
  system controller.
    - Verify that the hwclock is updated on the subcloud before
      ostree pull is initiated.
    - Verify that system date is proper on the first post-miniboot
      boot into the ostree installation.
- Test interface assignment wait/timeout functionality before
  ostree pulls. This is done on hardware subclouds (success path)
  and in sushy subclouds where failure mode testing was done
  by simulating stuck interface operstate values.

Closes-Bug: 1995643
Signed-off-by: Kyle MacLeod <kyle.macleod@windriver.com>
Change-Id: Ieddc774f962878f3c7f5886148310b87d4ffddfe
2022-11-08 12:46:43 -05:00
Li Zhu
eaf07202a9 Improve remote install robustness
Adding retries to handle the following types of failure:
1. Create communication session failed - Failed to create session.
2. Unable to establish Redfish client connections to BMC at <ip address>
(Server not reachable, return code: 503).
3. Fail to set System Power State to On/Off.
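
The general shape of such a retry wrapper is sketched below in Python. The attempt count, delay, and function name are assumptions made for illustration; they are not taken from the rvmc script.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3,
          delay_s: float = 5.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation until it succeeds or attempts are exhausted.

    Re-raises the last exception when every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # pause before the next try
    assert last_error is not None
    raise last_error
```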

Test Plan:
PASS: Retries work properly when session creation fails.
PASS: Retries work properly when Unable to establish Redfish client
      connection to BMC.
PASS: Retries work properly when returning 500 error in the "Power Off
      Host" stage.
PASS: rvmc script executed successfully without above errors.

Story: 2010144
Task: 46761

Signed-off-by: Li Zhu <li.zhu@windriver.com>
Change-Id: I6bb2e0822a51770b181181b49a86fb51d6dca18b
2022-11-08 15:52:54 +00:00
Shrikumar Sharma
b36d22a8b5 Prestage container images at /opt/platform-backup
To save on network resources, it is required to prestage the
ostree_repo and the container images. When installation of a
subcloud is done, it is preferred that the prestaged ostree
repo and the prestaged container images are used, instead of
downloading them.

The prestaged container images are copied over to
/opt/platform-backup from the mounted media containing the
prestage iso.

Test Plan:
PASS: Verify that the prestaged container images are copied to
      /opt/platform-backup/<release version>.

PASS: Verify that the ostree_repo is not pulled from the system
      controller but from /opt/platform-backup for installation.

Story: 2010120
Task: 46709

Signed-off-by: Shrikumar Sharma <shrikumar.sharma@windriver.com>
Change-Id: Ie2276d65e44f51a36dfcff922afaf5a9bdd0cf89
2022-11-08 13:35:23 +00:00
Zuul
26fcaf8b9f Merge "Debian: clean machine-id generated during installation" 2022-11-03 13:58:15 +00:00
Andre Kantek
1d82d29073 Debian: clean machine-id generated during installation
The Debian installations are generating the same machine-id if using
the same BUILD_ID. This ID is used to generate the value of random
MACs for SRIOV's VF interfaces. Since it is the same across the same
BUILD_ID, the network cards generate the exact same MAC if the
NIC is on the same pci-slot across multiple nodes.

This change removes the existing files so each installation's systemd
can generate an exclusive value.

Test Plan (Debian)
[PASS]  install multiple nodes and verify that each one contains an
        exclusive /etc/machine-id content
[PASS]  reboot node to validate that machine-id does not change on
        subsequent boots

Closes-Bug: 1995505

Signed-off-by: Andre Kantek <andrefernandozanella.kantek@windriver.com>
Change-Id: I702d1cc0353d0d19149fdd1ac1ec4bd16e674119
2022-11-03 09:03:35 -03:00
Zuul
4d3fcda735 Merge "Align miniboot.cfg with recent kickstart.cfg changes" 2022-11-02 20:47:41 +00:00
Zuul
052e04bc5f Merge "kickstart update for patched iso" 2022-11-02 18:06:26 +00:00
Kyle MacLeod
94452bffd3 Align miniboot.cfg with recent kickstart.cfg changes
Incorporate the following commits into miniboot.cfg:
- https://review.opendev.org/c/starlingx/metal/+/857894
- https://review.opendev.org/c/starlingx/metal/+/862669

The current redfish installs are broken because the sysadmin user
is no longer properly setup for password-change required on first login.
This commit pulls in the required change from kickstart.cfg.

Test Plan:

PASS:
- Verify that the sysadmin/sysadmin password is required to change
  on first login via sushy boot
- Verify unsuccessful installation in sushy emulator without this fix
- Verify successful installation in sushy emulator with this fix
- Verify the FD deletion error is no longer present

Story: 2010118
Task: 46714

Signed-off-by: Kyle MacLeod <kyle.macleod@windriver.com>
Change-Id: I1c39184c05442946b55ec375e643e94e6dc89fd6
2022-11-02 13:51:37 -04:00
Shrikumar Sharma
5ee4654b4d Install from prestaged ostree_repo if exists in /opt/platform-backup

To make installation more efficient, we need to prestage the
ostree_repo in /opt/platform-backup.

When we add a subcloud, the ostree_repo will not be fetched from
the System Controller, but from the prestaged ostree_repo, if it
exists.

Test Plan:
PASS: Verify that the ostree_repo is not pulled from the system
      controller but from /opt/platform-backup for installation.

Story: 2010120
Task: 46691
Change-Id: I6a00b28abe96f5d254a77331fc231cd9edac7bea
Signed-off-by: Shrikumar Sharma <shrikumar.sharma@windriver.com>
2022-11-01 18:10:10 +00:00
Zuul
6c85ea114b Merge "Debian: fix the bug of getting the dev's fd" 2022-10-27 18:59:00 +00:00
Wentao Zhang
5744685eca Debian: fix the bug of getting the dev's fd
Some pipe files are generated when assigning variables
and getting the process's fd, and are removed when done.
Since the pipe's fd is not the dev's fd, filter out the pipe fd.

Test Plan:
PASS: qemu iso installed successfully
PASS: lab iso installed successfully

Closes-Bug: 1991816
Signed-off-by: Wentao Zhang<Wentao.Zhang@windriver.com>
Change-Id: Icfdd4922dcc7e3c2e5003296c90519f8b7624d88
2022-10-27 16:48:52 +08:00
Zuul
d4a5bea226 Merge "debian: Add user/groups workarounds" 2022-10-27 00:57:46 +00:00
Zuul
51b5cdf9d3 Merge "Support bootstrap_vlan subcloud install value" 2022-10-25 21:44:31 +00:00
Eric MacDonald
654e18e9db Prevent kickstart hang when ipv6 dhclient fails to resolve
The interface setup post phase of the kickstart issues a
dhclient (dhcp) request for IP address. Normally this executes
fine and an ip address (lease) is acquired.

However, in a failure case in ipv6 mode, that dhclient
request will hang, waiting until the dhcp server responds.
So, if there is a network configuration error that precludes
dhclient from getting a response the kickstart and therefore
the entire installation process hangs.

This is a known issue/behavior in dhclient that is typically
worked around with the -1 option.
- https://bugzilla.redhat.com/show_bug.cgi?id=585047
- https://linux.die.net/man/8/dhclient

Rather than using the -1 option, which changes the behavior
with a fixed 30 second timeout, this update uses the linux 'timeout'
command with a chosen 60 second upper bound on the vlan dhclient
(dhcp) request. If the request does not complete in that time
then it is terminated in error, allowing the kickstart to proceed.
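
The same "bounded wait" idea can be shown in Python with the standard library. This is an illustration only, not the kickstart code, which uses the 'timeout' command; the function name is hypothetical, and the value 124 mirrors the exit status that coreutils timeout reports when the limit is hit.

```python
import subprocess
from typing import Sequence

TIMEOUT_EXIT_CODE = 124  # status coreutils 'timeout' uses on expiry


def run_bounded(cmd: Sequence[str], limit_s: float = 60.0) -> int:
    """Run cmd, killing it if it exceeds limit_s seconds.

    Returns the command's exit code, or TIMEOUT_EXIT_CODE when the
    limit was reached, so the caller can log the failure and move on.
    """
    try:
        return subprocess.run(cmd, timeout=limit_s).returncode
    except subprocess.TimeoutExpired:
        return TIMEOUT_EXIT_CODE
```

The caller decides what to do with a non-zero result; the point is that a hung request can no longer stall the whole install.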

Test Plan: Change does not affect CentOS in any way.

PASS: Verify Debian build and iso install for ipv6 and ipv4.
PASS: Verify success path with and without vlan in ipv4 and ipv6
PASS: verify failure path handling in ipv6 vlan case
PASS: Verify logging

Closes-Bug: 1993342
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: I2853f52b79e0f82c0a2e645fdeb9e7b7aa4f0a9e
2022-10-25 18:50:26 +00:00
Kyle MacLeod
29d4047d14 Support bootstrap_vlan subcloud install value
Add Debian support for the bootstrap_vlan install parameter
in the subcloud install values file. For use in redfish installs.

The vlan interface is configured during the initial boot,
prior to the initial ostree pull. We also configure the
interfaces in /etc/network/interfaces.d for the post-ostree
reboot.

Story: 2010118
Task: 46538

Test Plan:

PASS:
- Install and provision subcloud with bootstrap_vlan value
  configured via subcloud install values - IPv6 hardware
    - Verify that subcloud installs, bootstraps, and is
      properly unlocked
- Tested in libvirt only: IPv4 configuration, including
  bootstrap_vlan value in IPv4 format. This test
  ensured that configuration is properly applied.
- Install and provision subcloud without bootstrap_vlan
  (regression case)
- Failure mode: verify that bootstrap fails if invalid
  vlan is configured

Signed-off-by: Kyle MacLeod <kyle.macleod@windriver.com>
Change-Id: If7e400359ad36cfb6835a8aff0f2ebd7d8e1817d
2022-10-25 14:42:21 -04:00
Zuul
03e953d8df Merge "Debian: Make Mtce offline handler more resilient to slow shutdowns" 2022-10-24 19:41:06 +00:00
Eric MacDonald
da398e0c5f Debian: Make Mtce offline handler more resilient to slow shutdowns
The current offline handler assumes the node is offline after
'offline_search_count' reaches 'offline_threshold' count
regardless of whether mtcAlive messages were received during
the search window.

The offline algorithm requires that no mtcAlive messages
be seen for the full offline_threshold count.

During a slow shutdown the mtcClient runs for longer than
it should and as a result can lead to maintenance seeing
the node as recovered before it should.

This update manages the offline search counter to ensure that
it only reaches the count threshold after seeing no mtcAlive
messages for the full search count. Any mtcAlive message seen
during the count triggers a count reset.
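
The counter management described above can be modelled as follows. This Python sketch is a simplified illustration, not the maintenance code; the class and method names are hypothetical, and one call to `audit` stands for one pass of the offline search.

```python
class OfflineSearch:
    """Declare a node offline only after a full quiet search window."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # stands in for offline_threshold
        self.count = 0              # stands in for offline_search_count

    def audit(self, mtc_alive_seen: bool) -> bool:
        """Process one search pass; return True once the node is offline."""
        if mtc_alive_seen:
            self.count = 0  # any mtcAlive message restarts the window
            return False
        self.count += 1
        return self.count >= self.threshold
```

With a threshold of 3, the sequence quiet, quiet, alive, quiet, quiet, quiet reports offline only on the final pass.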

This update also
1. Adjusts the reset retry cadence from 7 to 12 secs
   to prevent unnecessary reboot thrash during
   the current shutdown.
2. Clears the hbsClient ready event at the start of the
   subfunction handler so the heartbeat soak is only
   started after seeing heartbeat client ready events
   that follow the main config.

Test Plan:

PASS: Debian and CentOS Build and DX install
PASS: Verify search count management
PASS: Verify issue does not occur over lock/unlock soak (100+)
      - where the same test without update did show issue.
PASS: Monitor alive logs for behavioral correctness
PASS: Verify recovery reset occurs after expected extended time.

Closes-Bug: 1993656
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: If10bb75a1fb01d0ecd3f88524d74c232658ca29e
2022-10-24 15:57:43 +00:00
Eric MacDonald
492dfeec48 Debian: Package instead of fetch pxeboot utilities
This update packages the pxeboot utilities into the
pxe-network-installer and modifies the kickstart to
no longer fetch them.

Once packaged they no longer need to be staged in feed.
Once packaged then they can be patched.

Test Plan:

PASS: Build and install AIO DX
PASS: Compare /var/pxeboot dir before and after update
PASS: Verify system host-reinstall controller-0 from controller-1
PASS: Verify unlock of reinstalled controller-0
PASS: Verify kickstart logs now exclude pxeboot utility staging

Story: 2009968
Task: 46619
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
Change-Id: I75bcfe06724cbfc7203b187cb7c131694de69920
2022-10-23 11:07:47 +00:00
Charles Short
1ba1894990 debian: Add user/groups workarounds
Remove the user and group workarounds from the LAT
build and place them in the kickstart.cfg when the
ISO is initialized.

Depends-On: https://review.opendev.org/c/starlingx/tools/+/857893

Test Plan
Build platform-kickstarts package
Build ISO
Boot ISO
Login as sysadmin user

Story: 2009964
Task: 44285

Signed-off-by: Charles Short <charles.short@windriver.com>
Change-Id: I5267f228bd114a79d9acadf7ffb74a04eeb87df1
2022-10-20 10:32:38 -04:00
Zuul
b88d7456b1 Merge "Mtce: Add ActionInfo extension support for reset operations." 2022-10-18 15:54:40 +00:00