128 Commits

Author SHA1 Message Date
Adam Williamson
5561372a5f openqa/worker: drop edk2-arm package install
It no longer exists and we're no longer doing 32-bit ARM tests.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-05-07 10:39:33 -07:00
Michal Konecny
2ec055db6f Use first uppercase letter for all handlers
This unifies all the handlers so their names start with an uppercase
letter, which stops ansible-lint from complaining.

I went through all `notify:` occurrences and fixed them by running
```
set TEXT "text_to_replace"; set REPLACEMENT "replacement_text"
git grep -rlz "$TEXT" . | xargs -0 sed -i "s/$TEXT/$REPLACEMENT/g"
```

Then I went through all the changes and removed the ones that weren't
expected to be changed.

Fixes https://pagure.io/fedora-infrastructure/issue/12391

Signed-off-by: Michal Konecny <mkonecny@redhat.com>
2025-02-10 20:31:49 +00:00
Ryan Lerch
47c68f478d ansiblelint fixes - fqcn[action-core] - template to ansible.builtin.template
Replaces references to template: with ansible.builtin.template

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 11:30:29 +10:00
Ryan Lerch
3c41882bb0 ansiblelint fixes - fqcn[action-core] - shell to ansible.builtin.shell
Replaces references to shell: with ansible.builtin.shell

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 11:29:10 +10:00
Ryan Lerch
25391e95b7 ansiblelint fixes - fqcn[action-core] - package to ansible.builtin.package
Replaces many references to package: with ansible.builtin.package

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 11:28:00 +10:00
Ryan Lerch
462176464b ansiblelint fixes-- fqcn[action-core] - command to ansible.builtin.command
Replaces many references to command: with ansible.builtin.command

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 11:26:47 +10:00
Ryan Lerch
6a3816dfdc ansiblelint fixes-- fqcn[action-core] - copy to ansible.builtin.copy
Replaces many references to 'copy' with ansible.builtin.copy

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 10:43:31 +10:00
Ryan Lerch
62952df107 ansiblelint fixes-- fqcn[action-core] - file to ansible.builtin.file
Replaces many references to file: with ansible.builtin.file

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 10:41:52 +10:00
Ryan Lerch
691adee6ee Fix name[casing] ansible-lint issues
Fix 1900 failures of the following casing rule:

`name[casing]: All names should start with an uppercase letter.`

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-14 20:20:07 +10:00
Ryan Lerch
89f6f1fc32 Fix majority of remaining yamllint warnings and errors
Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2024-11-28 17:31:45 +10:00
Adam Williamson
4d801444a9 openqa: set up a side repo for prod as well as lab
Sometimes we want to deploy something to prod before it goes
stable (or even to u-t).

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-11-25 17:06:34 -08:00
Adam Williamson
cb026b4120 openqa/worker: kill stuck qemu processes daily
This is an awful hack to deal with
https://github.com/os-autoinst/os-autoinst/issues/2549 while we
try and fix it properly. This finds stuck qemu processes by
parsing the journal messages of the workers, and kills them.
Workers stuck in the broken state should then recover on the
next checkin with the server. I tested this manually on all the
worker hosts and it...seemed to work, mostly. I'll keep an eye on
things after deploying it.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-10-15 13:13:42 -07:00
Adam Williamson
2dbf99e280 openqa/worker: bump load average threshold for big worker hosts
This is a new feature in openQA that prevents worker hosts
picking up new jobs if their load average is above a certain
threshold. It defaults to 40. Our big worker hosts tend to run
above this, so let's bump it on those.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-08-30 23:27:48 -07:00
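
The threshold check described in commit 2dbf99e280 can be sketched roughly as follows. This is an illustration only, not openQA's actual code: the helper name `may_accept_jobs` is hypothetical, and only the default value of 40 comes from the commit message.

```python
DEFAULT_THRESHOLD = 40.0  # default value stated in the commit message


def may_accept_jobs(load_average, threshold=DEFAULT_THRESHOLD):
    """Hypothetical helper: a worker host may pick up new jobs only
    while its load average is not above the threshold."""
    return load_average <= threshold
```

A big worker host that routinely runs at a load of 55 would be refused new jobs under the default, which is why the commit raises the value on those hosts; `may_accept_jobs(55, threshold=60)` shows the effect of such a bump (60 is an arbitrary example value, not the one deployed).
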
Adam Williamson
4743c3fdce openqa/worker: transition all tap workers to NM-based setup
This seems to be working fine in testing, so let's deploy it
everywhere.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-07-25 14:54:03 -07:00
Adam Williamson
762b23ef7d openqa/worker tap-setup-nm: tweak some quoting, drop tunctl
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-07-25 14:06:52 -07:00
Adam Williamson
690a5eb951 openqa/worker: add NM-based tap setup and test on p09-worker01
network-scripts-openvswitch was removed in f40 and network-scripts
is going away in f41; we really need to get off using them.
This attempts to implement the same setup using NetworkManager,
based on a few different NM/ovs references, and the source of
openQA upstream's os-autoinst-setup-multi-machine . It might
need a bit of tweaking, so for now, we make it a separate task
and use it only on p09-worker01 for testing. This doesn't handle
tearing down the old network-scripts-based config as that's
pretty complex and will only need to happen once; I'll do it
manually before trying this out.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-07-25 13:50:39 -07:00
Adam Williamson
7f73f1253e openqa/worker: don't explicitly pull in ffmpeg-free on aarch64
We don't want it there - see earlier commits - but I didn't
notice it's actually explicitly listed here for all arches,
which breaks stuff on aarch64 now we told dnf to exclude it.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-06-11 11:08:36 -07:00
Adam Williamson
826fd32330 openqa/worker: don't force createhdds off non-standard branch
Using the same approach as we do for the tests and fedora_openqa.
I wish I'd done this *before* I ran the playbook on lab and it
wiped every...single...goddamn...disk image.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-03-26 15:35:33 -07:00
Adam Williamson
7cfe3d61e6 openqa/worker: block ffmpeg-free on aarch64
Encoding with ffmpeg rather than os-autoinst's built-in encoder
gives us less broken videos, but on aarch64 it seems to cause
problems, especially on stg's old, busted worker hosts - I think
it's more CPU-intensive and they just can't handle the load. So,
let's block it.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-03-26 15:07:27 -07:00
Adam Williamson
d3b6d1bafd openqa/worker: install ffmpeg-free
I'm adding this as a Recommends: for os-autoinst, but want to
get it on the workers now. Having it installed gives us better
videos of test runs (the internal video encoder is a bit wonky
and produces videos that have errors which make jumping around
within the video not work properly).

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-01-03 10:38:11 -08:00
Adam Williamson
1cb0c0cdc6 Put openqa-lab-repo.repo in worker role as well as server role
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2023-10-27 18:09:24 -07:00
Adam Williamson
530f69d967 openqa: use an external side repo for test builds
It's overall simpler and more idempotent to just use a side repo
maintained outside of ansible than re-create one on each system
on each run of the plays.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2023-10-27 11:20:58 -07:00
Adam Williamson
f1e0e0d037 Fix openqa_tap truthiness checks
Sigh, |bool doesn't do what you might think it does:
https://medium.com/opsops/wft-bool-filter-in-ansible-e7e2fd7a148f

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-11-25 14:58:36 -08:00
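
To illustrate the surprise behind commit f1e0e0d037: the sketch below approximates, in plain Python, how Ansible's `bool` filter behaved in releases of that period as far as I understand it (newer ansible-core versions may differ). The function name `ansible_style_bool` is made up for this example and is not part of Ansible.

```python
def ansible_style_bool(value):
    """Approximation of the historic `bool` filter: only a short list of
    values counts as true; any other string is false."""
    if value is None or isinstance(value, bool):
        return value
    if isinstance(value, str):
        value = value.lower()
    return value in ("yes", "on", "1", "true", 1)


if __name__ == "__main__":
    # A non-empty string is truthy in Python, but not under this filter.
    print(bool("tap2"), ansible_style_bool("tap2"))
```

The point is that `|bool` is a conversion of specific spellings, not a general truthiness test, so a variable holding an arbitrary non-empty string evaluates to false.
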
Adam Williamson
28110d34be openqa/worker: prepare to handle multiple tap worker classes
I'm going to try splitting the tap jobs across multiple worker
hosts. We have quite a lot of tap jobs now, and I have sometimes
seen a situation where all non-tap jobs have been run and
the non-tap worker hosts are sitting idle, but the single tap
worker host has a long queue of tap jobs to get through.

We can't just put multiple hosts per instance into the tap
class, because then we might get a case where job A from a tap
group is run on one host and job B from a tap group is run on
a different host, and they can't communicate. It's actually
possible to set this up so it works, but it needs yet more
complex networking stuff I don't want to mess with. So instead
I'm just gonna split the tap job groups across two classes,
'tap' and 'tap2'. That way we can have one 'tap' worker host
and one 'tap2' worker host per instance and arch, and they will
each get about half the tap jobs.

Unfortunately since we only have one aarch64 worker for lab it
will still have to run all the jobs, but for all other cases we
do have at least two workers, so we can split the load.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-11-25 14:11:58 -08:00
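
The splitting scheme from commit 28110d34be can be sketched like this. It is a minimal illustration under assumptions: the function name and the job group names are hypothetical, and simple alternation is just one way to get "about half" of the groups into each class.

```python
from itertools import cycle


def assign_worker_classes(job_groups, classes=("tap", "tap2")):
    """Map each tap job group to exactly one worker class, alternating
    between classes so each class gets about half the groups."""
    return dict(zip(job_groups, cycle(classes)))


if __name__ == "__main__":
    # Hypothetical group names, for illustration only.
    print(assign_worker_classes(["group_a", "group_b", "group_c"]))
```

Because the assignment is per group rather than per job, every job in a group carries the same class and, with one worker host per class per instance and arch, lands on the same host, so the jobs can still communicate.
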
Adam Williamson
f122367c34 openqa/worker: change name on kernel override config file
It really needs to be called exactly 60-block-scheduler.rules
as it's overriding a file of the same name in `/usr`.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-08-17 14:28:18 -04:00
Adam Williamson
ca8e3db401 Add file missed from previous commit
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-08-17 13:46:16 -04:00
Adam Williamson
5b49611201 openqa/worker: override kernel scheduler config
This applies only within Fedora infra for now, as we're not sure
whether worker hosts on different hardware hit this bug. It's
intended to work around:
https://bugzilla.redhat.com/show_bug.cgi?id=2009585
a bug which results in the infra worker hosts hanging after a
short time when running kernels newer than 5.11.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-08-17 13:34:15 -04:00
Adam Williamson
e5c5cc336f openqa: fix confusion between openqa_nfs_{worker,client}s
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-06-07 13:26:01 -07:00
Adam Williamson
bf4f704096 openqa: improve how we do the git config thing
The background to this is
https://bugzilla.redhat.com/show_bug.cgi?id=2073414 , in response
to which git was changed to die if a user runs git commands
on a repo which it doesn't own. In openQA, the test directory
is a git repo and openQA itself likes to run git commands on it,
but this is often going to be as a different user than the owner
of the directory. In fact on the worker hosts, the user that owns
the directory (geekotest on the server box) doesn't even exist.

This just sets the config by copying a file in place rather than
running a git command (which is hard to get to be idempotent) and
uses `/etc/gitconfig` so we don't wind up with a file in the
_openqa-worker user's home directory, which is meant to be empty.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-05-27 10:24:34 -07:00
Adam Williamson
3d148f5e7f openqa/worker: handle git 'safety' check for test dir
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-05-27 09:05:06 -07:00
Adam Williamson
f869c0f643 Revert "openqa/worker: handle git 'safety' check for test dir"
This reverts commit 34b3d3a5cc. On
second thoughts it's kinda ugly and I need to think about other
options...
2022-05-26 15:23:13 -07:00
Adam Williamson
34b3d3a5cc openqa/worker: handle git 'safety' check for test dir
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-05-26 15:05:00 -07:00
Adam Williamson
38888162ea openQA: remove swtpm-teardown now the work is done
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2021-12-06 14:18:46 -08:00
Adam Williamson
7a5d7f59fb openQA: Drop already-done step from swtpm-teardown
This just cleans up the mess left by the bad parameter from
earlier: the run of this play broke halfway through, and we need to
do the remaining half without choking on this part.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2021-12-06 14:12:43 -08:00
Adam Williamson
ca2684c711 openQA: fix stupid semodule argument
gah.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2021-12-06 14:05:14 -08:00
Adam Williamson
224e28131d openQA: prepare for prod deployment of latest releases
This unifies prod and stg onto the ways of doing things for the
latest packages, and rejigs the swtpm stuff a bit to tear down
more (we shouldn't need the custom SELinux policy any more).

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2021-12-06 10:40:33 -08:00
Adam Williamson
5889d3a9ae openQA: untag the swtpm-teardown task for stg now it's run
Keeping it around to run on prod when needed, then we'll take it
out.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2021-11-26 14:03:33 -08:00
Adam Williamson
7f3f19035f openQA: test new os-autoinst scratch build on lab
This also tears down our swtpm systemd service setup, as
os-autoinst should now handle swtpm device setup for us.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2021-11-26 12:34:41 -08:00
Adam Williamson
fc3f87b646 Revert "openQA: deploy new qemu build with qxl snapshot fix"
This reverts commit 92e66bb444 and
follow-up commits. We don't need it now we're back on virtio
graphics.
2021-11-12 15:40:00 -08:00
Adam Williamson
d00fdb03eb openqa/worker: install latest qemu-common
This makes the last change actually work. It is a temporary change.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2021-11-08 12:21:21 -08:00
Adam Williamson
faa8f6c27b openqa/worker: install packages used by tests
A recent test has a couple of perl deps, we need to ensure these
are installed on the worker hosts.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2021-06-18 08:56:17 -07:00
Adam Williamson
ca112a1922 openQA: update some branch names to 'main' not 'master'
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2021-06-01 13:39:02 -07:00
Adam Williamson
95f062c07a openQA: allow all workers NFS write access, other tweaks
The main goal of these changes is to allow all workers in each
deployment NFS write access to the factory share. This is because
I want to try using os-autoinst's at-job-run-time decompression
of disk images instead of openQA's at-asset-download-time
decompression; it avoids some awkwardness with the asset file
name, and should also actually allow us to drop the decompression
code from openQA I think.

I also rejigged various other things at the same time as they
kinda logically go together. It's mostly cleanups and tweaks to
group variables. I tried to handle more things explicitly with
variables, as it's better for use of these plays outside of
Fedora infra.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-11-05 16:10:32 -08:00
Adam Williamson
be8dc36f7f openqa/worker: sigh restarted not started
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-10-30 14:36:12 -07:00
Adam Williamson
c2023d5560 openQA: try to make NFS mount changes more robust
On client end, restart mount unit (with daemon-reload) if mount
file changes. On server end, run exportfs -r if export config
file changes.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-10-30 14:06:07 -07:00
Adam Williamson
13f59ad0eb openqa/worker: have swtpm service restart on success
This is because swtpm is designed not to be persistent, it's
sort of tied to a single "system" (VM in this case). We can't
expect an instance will stick around after it's been "used", it
doesn't do that, it exits successfully. So we need to restart it
when that happens.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-09-08 10:56:12 -07:00
Adam Williamson
a2bef634cf openqa/worker: use include_tasks not import_tasks
Using `when` with `import_tasks` doesn't actually skip the import
entirely, it just imports the tasks and skips them one by one.
Which reads oddly. `include_tasks` is properly dynamic so seems
better here.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-07-24 14:11:21 -07:00
Adam Williamson
d9f5530046 openqa/worker: configure to use 172. IP range not 10.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-07-23 17:27:19 -07:00
Adam Williamson
6b196e70ab openqa/worker: set up swtpm service on tap worker hosts
swtpm is a TPM emulator we want to use for testing Clevis on
IoT (and potentially other things in future). We're implementing
this by having os-autoinst just add the qemu args but expect
swtpm itself to be running already - that's counted as the
sysadmin's responsibility. My approach to this is to have openQA
tap worker hosts also be tpm worker hosts, meaning they run one
instance of swtpm per worker instance (as a systemd service) and
are added to a 'tpm' worker class which tests can use to ensure
they run on a suitably-equipped worker. This sets up all of that.
We need a custom SELinux policy module to allow systemd to run
swtpm - this is blocked by default.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-06-24 16:59:11 -07:00
Adam Williamson
be6a4937ea openqa/worker: revert br0 netmask
os-autoinst *really really* wants it to be this. The helper
service fails if it isn't.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2020-06-10 17:05:24 -07:00