This unifies all the handler names to start with an uppercase letter,
so that ansible-lint stops complaining.
I went through all `notify:` occurrences and fixed them by running
```
set TEXT "text_to_replace"; set REPLACEMENT "replacement_text"
git grep -rlz "$TEXT" . | xargs -0 sed -i "s/$TEXT/$REPLACEMENT/g"
```
Then I went through all the changes and reverted the ones that weren't
expected to be changed.
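For illustration, a minimal sketch of the kind of change involved (the
task, paths and handler below are hypothetical, not taken from this
repo). Because `notify:` refers to a handler by its name, the handler
and every task that notifies it have to be renamed together:

```yaml
# Hypothetical role fragment; names and paths are illustrative only.

# tasks/main.yml
- name: Install httpd config
  ansible.builtin.copy:
    src: example.conf
    dest: /etc/httpd/conf.d/example.conf
    mode: "0644"
  notify:
    - Reload httpd        # previously "reload httpd"

# handlers/main.yml
- name: Reload httpd      # previously "reload httpd"
  ansible.builtin.service:
    name: httpd
    state: reloaded
```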
Fixes https://pagure.io/fedora-infrastructure/issue/12391
Signed-off-by: Michal Konecny <mkonecny@redhat.com>
Fix 1900 failures of the following casing rule:
`name[casing]: All names should start with an uppercase letter.`
Signed-off-by: Ryan Lerch <rlerch@redhat.com>
This is an awful hack to deal with
https://github.com/os-autoinst/os-autoinst/issues/2549 while we
try and fix it properly. This finds stuck qemu processes by
parsing the journal messages of the workers, and kills them.
Workers stuck in the broken state should then recover on the
next checkin with the server. I tested this manually on all the
worker hosts and it...seemed to work, mostly. I'll keep an eye on
things after deploying it.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This is a new feature in openQA that prevents worker hosts
picking up new jobs if their load average is above a certain
threshold. It defaults to 40. Our big worker hosts tend to run
above this, so let's bump it on those.
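As a rough sketch of what the bump could look like, assuming the
threshold is the worker setting that upstream documents as
`CRITICAL_LOAD_AVG_THRESHOLD` in the `[global]` section of
`workers.ini` (the setting name, file path and the value 60 are
assumptions for illustration, not what this commit uses):

```yaml
# Assumed setting name and placeholder value; adjust to the real config.
- name: Raise the openQA worker load threshold on big hosts
  community.general.ini_file:
    path: /etc/openqa/workers.ini
    section: global
    option: CRITICAL_LOAD_AVG_THRESHOLD
    value: "60"
    mode: "0644"
```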
Signed-off-by: Adam Williamson <awilliam@redhat.com>
network-scripts-openvswitch was removed in f40 and network-scripts
is going away in f41; we really need to get off using them.
This attempts to implement the same setup using NetworkManager,
based on a few different NM/ovs references, and the source of
openQA upstream's os-autoinst-setup-multi-machine. It might
need a bit of tweaking, so for now, we make it a separate task
and use it only on p09-worker01 for testing. This doesn't handle
tearing down the old network-scripts-based config as that's
pretty complex and will only need to happen once; I'll do it
manually before trying this out.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We don't want it there - see earlier commits - but I didn't
notice it's actually explicitly listed here for all arches,
which breaks stuff on aarch64 now we told dnf to exclude it.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Using the same approach as we do for the tests and fedora_openqa.
I wish I'd done this *before* I ran the playbook on lab and it
wiped every...single...goddamn...disk image.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Encoding with ffmpeg rather than os-autoinst's built-in encoder
gives us less broken videos, but on aarch64 it seems to cause
problems, especially on stg's old, busted worker hosts - I think
it's more CPU-intensive and they just can't handle the load. So,
let's block it.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
I'm adding this as a Recommends: for os-autoinst, but want to
get it on the workers now. Having it installed gives us better
videos of test runs (the internal video encoder is a bit wonky
and produces videos that have errors which make jumping around
within the video not work properly).
Signed-off-by: Adam Williamson <awilliam@redhat.com>
It's overall simpler and more idempotent to just use a side repo
maintained outside of ansible than re-create one on each system
on each run of the plays.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
I'm going to try splitting the tap jobs across multiple worker
hosts. We have quite a lot of tap jobs now, and I have sometimes
seen a situation where all non-tap jobs have been run and
the non-tap worker hosts are sitting idle, but the single tap
worker host has a long queue of tap jobs to get through.
We can't just put multiple hosts per instance into the tap
class, because then we might get a case where job A from a tap
group is run on one host and job B from a tap group is run on
a different host, and they can't communicate. It's actually
possible to set this up so it works, but it needs yet more
complex networking stuff I don't want to mess with. So instead
I'm just gonna split the tap job groups across two classes,
'tap' and 'tap2'. That way we can have one 'tap' worker host
and one 'tap2' worker host per instance and arch, and they will
each get about half the tap jobs.
Unfortunately since we only have one aarch64 worker for lab it
will still have to run all the jobs, but for all other cases we
do have at least two workers, so we can split the load.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
It really needs to be called exactly 60-block-scheduler.rules
as it's overriding a file of the same name in `/usr`.
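A minimal sketch of the override, assuming the rules file is shipped
from the role's `files/` directory and that the packaged copy lives
under `/usr/lib/udev/rules.d` (both are assumptions). udev lets a file
in `/etc/udev/rules.d` replace a file of the same name under
`/usr/lib/udev/rules.d`, which is why the name has to match exactly:

```yaml
# The destination file name must match the packaged one for the
# override to take effect.
- name: Override the packaged block scheduler udev rules
  ansible.builtin.copy:
    src: 60-block-scheduler.rules
    dest: /etc/udev/rules.d/60-block-scheduler.rules
    owner: root
    group: root
    mode: "0644"
```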
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This applies only within Fedora infra for now, as we're not sure
whether worker hosts on different hardware hit this bug. It's
intended to work around:
https://bugzilla.redhat.com/show_bug.cgi?id=2009585
a bug which results in the infra worker hosts hanging after a
short time when running kernels newer than 5.11.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The background to this is
https://bugzilla.redhat.com/show_bug.cgi?id=2073414 , in response
to which git was changed to die if a user runs git commands
on a repo which it doesn't own. In openQA, the test directory
is a git repo and openQA itself likes to run git commands on it,
but this is often going to be as a different user than the owner
of the directory. In fact on the worker hosts, the user that owns
the directory (geekotest on the server box) doesn't even exist.
This just sets the config by copying a file in place rather than
running a git command (which is hard to get to be idempotent) and
uses `/etc/gitconfig` so we don't wind up with a file in the
_openqa-worker user's home directory, which is meant to be empty.
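One way this could look, assuming the setting in question is git's
`safe.directory`. The directory path below is a placeholder (the real
file may list a different path, or `*`, which newer git releases
accept), and inline `content:` is used here only to keep the sketch
self-contained:

```yaml
# Writing the file directly is idempotent, unlike running
# `git config --add`, which would append on every run.
- name: Mark the openQA test repo as safe for git
  ansible.builtin.copy:
    dest: /etc/gitconfig
    owner: root
    group: root
    mode: "0644"
    content: |
      [safe]
          directory = /var/lib/openqa/share/tests/fedora
```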
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This just cleans up the mess left by the bad parameter from
earlier. The run of this play broke halfway through, and I need
to do the remaining half without choking on this part.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This unifies prod and stg onto the ways of doing things for the
latest packages, and rejigs the swtpm stuff a bit to tear down
more (we shouldn't need the custom SELinux policy any more).
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This also tears down our swtpm systemd service setup, as
os-autoinst should now handle swtpm device setup for us.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
A recent test has a couple of Perl deps; we need to ensure these
are installed on the worker hosts.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The main goal of these changes is to allow all workers in each
deployment NFS write access to the factory share. This is because
I want to try using os-autoinst's at-job-run-time decompression
of disk images instead of openQA's at-asset-download-time
decompression; it avoids some awkwardness with the asset file
name, and should also actually allow us to drop the decompression
code from openQA, I think.
I also rejigged various other things at the same time as they
kinda logically go together. It's mostly cleanups and tweaks to
group variables. I tried to handle more things explicitly with
variables, as it's better for use of these plays outside of
Fedora infra.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
On the client end, restart the mount unit (with daemon-reload) if
the mount file changes. On the server end, run exportfs -r if the
export config file changes.
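The pattern can be sketched as follows. The unit name, template name
and handler names are hypothetical; only the `daemon-reload` plus
restart and the `exportfs -r` behaviour come from this commit:

```yaml
# Client side, tasks: writing the mount unit notifies the handler.
- name: Install the NFS mount unit
  ansible.builtin.template:
    src: openqa-share.mount.j2
    dest: /etc/systemd/system/var-lib-openqa-share.mount
    mode: "0644"
  notify:
    - Restart openqa share mount

# Client side, handlers: reload systemd, then restart the mount unit.
- name: Restart openqa share mount
  ansible.builtin.systemd:
    name: var-lib-openqa-share.mount
    state: restarted
    daemon_reload: true

# Server side, handlers: notified by the task that writes the
# export config file.
- name: Re-export NFS shares
  ansible.builtin.command: exportfs -r
  changed_when: true
```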
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This is because swtpm is designed not to be persistent; it's
sort of tied to a single "system" (a VM in this case). We can't
expect an instance to stick around after it's been "used": it
doesn't do that, it exits successfully. So we need to restart it
when that happens.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Using `when` with `import_tasks` doesn't actually skip the import
entirely; it just imports the tasks and skips them one by one,
which reads oddly. `include_tasks` is properly dynamic, so it
seems better here.
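To make the difference concrete, here is a small sketch (the file
name `tap-setup.yml` and the variable `openqa_tap` are hypothetical
stand-ins, not necessarily what this repo uses):

```yaml
# Static import: the tasks are always inlined at parse time and the
# `when` is applied to each of them, so every task is reported as
# skipped individually.
- name: Import tap setup tasks
  ansible.builtin.import_tasks: tap-setup.yml
  when: openqa_tap | default(false) | bool

# Dynamic include: the `when` is evaluated once on the include itself,
# and the file is not loaded at all if the condition is false.
- name: Include tap setup tasks
  ansible.builtin.include_tasks: tap-setup.yml
  when: openqa_tap | default(false) | bool
```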
Signed-off-by: Adam Williamson <awilliam@redhat.com>
swtpm is a TPM emulator we want to use for testing Clevis on
IoT (and potentially other things in future). We're implementing
this by having os-autoinst just add the qemu args but expect
swtpm itself to be running already - that's counted as the
sysadmin's responsibility. My approach to this is to have openQA
tap worker hosts also be tpm worker hosts, meaning they run one
instance of swtpm per worker instance (as a systemd service) and
are added to a 'tpm' worker class which tests can use to ensure
they run on a suitably-equipped worker. This sets up all of that.
We need a custom SELinux policy module to allow systemd to run
swtpm - this is blocked by default.
Signed-off-by: Adam Williamson <awilliam@redhat.com>