This is an awful hack to deal with
https://github.com/os-autoinst/os-autoinst/issues/2549 while we
try and fix it properly. This finds stuck qemu processes by
parsing the journal messages of the workers, and kills them.
Workers stuck in the broken state should then recover on the
next checkin with the server. I tested this manually on all the
worker hosts and it...seemed to work, mostly. I'll keep an eye on
things after deploying it.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The file really needs to be named exactly 60-block-scheduler.rules,
as it's overriding a file of the same name in `/usr`.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The background to this is
https://bugzilla.redhat.com/show_bug.cgi?id=2073414 , in response
to which git was changed to die if a user runs git commands
on a repo they don't own. In openQA, the test directory
is a git repo and openQA itself likes to run git commands on it,
but this is often going to be as a different user than the owner
of the directory. In fact on the worker hosts, the user that owns
the directory (geekotest on the server box) doesn't even exist.
This just sets the config by copying a file in place rather than
running a git command (which is hard to get to be idempotent) and
uses `/etc/gitconfig` so we don't wind up with a file in the
_openqa-worker user's home directory, which is meant to be empty.
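
As a rough illustration of why copying a file is easier to keep
idempotent than running a git command, here is a minimal sketch. It
assumes the setting involved is git's `safe.directory`, and the
test directory path and the `ensure_file` helper are hypothetical,
not taken from the playbook.

```python
import os
import tempfile

# Assumption: the real test directory path is not given in this message.
TEST_DIR = "/var/lib/openqa/share/tests/fedora"

# Assumption: the config being deployed marks the test repo as safe.
GITCONFIG = f"[safe]\n\tdirectory = {TEST_DIR}\n"


def ensure_file(path: str, content: str) -> bool:
    """Write content to path only if it differs; return True if changed."""
    try:
        with open(path, encoding="utf-8") as fh:
            if fh.read() == content:
                return False  # already in the desired state, do nothing
    except FileNotFoundError:
        pass
    # Write a temp file in the same directory, then rename it into place,
    # so readers never see a half-written config.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(content)
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)
    return True
```

Calling `ensure_file("/etc/gitconfig", GITCONFIG)` twice reports a
change the first time and no change the second, which is the property
that is awkward to get from a command that appends to the config.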
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This unifies prod and stg onto the ways of doing things for the
latest packages, and rejigs the swtpm stuff a bit to tear down
more (we shouldn't need the custom SELinux policy any more).
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This is because swtpm is designed not to be persistent: it's
sort of tied to a single "system" (a VM in this case). We can't
expect an instance to stick around after it's been "used"; it
doesn't do that, it exits successfully. So we need to restart it
when that happens.
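
The restart-on-clean-exit behaviour can be sketched like this. It is
only an illustration of the logic: this message doesn't say how the
restart is actually done, and for a systemd service it would more
likely be a restart setting on the unit. `run_and_restart` and
`DEMO_CMD` are hypothetical names, and `DEMO_CMD` is a stand-in for
the real swtpm command line.

```python
import subprocess
import sys

# Stand-in for the swtpm command: a process that exits successfully at once.
DEMO_CMD = [sys.executable, "-c", "pass"]


def run_and_restart(cmd, max_runs):
    """Run cmd repeatedly, starting it again after each clean exit.

    Stops early if cmd exits non-zero. max_runs bounds the loop so the
    sketch terminates. Returns (number_of_runs, last_return_code).
    """
    runs = 0
    returncode = 0
    while runs < max_runs:
        returncode = subprocess.run(cmd).returncode
        runs += 1
        if returncode != 0:
            break  # a real failure is not something to paper over
    return runs, returncode
```

With the stand-in command, `run_and_restart(DEMO_CMD, 3)` runs it
three times, since every exit is a successful one.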
Signed-off-by: Adam Williamson <awilliam@redhat.com>
swtpm is a TPM emulator we want to use for testing Clevis on
IoT (and potentially other things in future). We're implementing
this by having os-autoinst just add the qemu args but expect
swtpm itself to be running already - that's counted as the
sysadmin's responsibility. My approach to this is to have openQA
tap worker hosts also be tpm worker hosts, meaning they run one
instance of swtpm per worker instance (as a systemd service) and
are added to a 'tpm' worker class which tests can use to ensure
they run on a suitably-equipped worker. This sets up all of that.
We need a custom SELinux policy module to allow systemd to run
swtpm - this is blocked by default.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
It shouldn't need anything but 10.0.2.*, and hopefully this will
stop it interfering with the rest of the infra network...
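
For reference, "10.0.2.*" corresponds to the network 10.0.2.0/24. A
small sketch of what that range does and doesn't cover (the names
`ALLOWED` and `in_allowed_range` are hypothetical, and this is not
the actual config being changed):

```python
import ipaddress

# 10.0.2.* written as a CIDR network.
ALLOWED = ipaddress.ip_network("10.0.2.0/24")


def in_allowed_range(addr: str) -> bool:
    """Return True if addr falls inside 10.0.2.0/24."""
    return ipaddress.ip_address(addr) in ALLOWED
```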
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This provides a mechanism for deploying scratch builds, and also
for controlling whether or not to install openQA and os-autoinst
from updates-testing.
I have been doing the scratch build thing for years already, just
manually by ssh'ing into the boxes. This is getting tiring now
that we have about 15 worker hosts.
The scratch build mechanism isn't properly idempotent, but fixing
that would be hard and I really only intend to use it transiently
when I'm updating the packages, so I don't think it's worth the
effort.
This also adds a notification for restarting openQA worker
services when the packages or config are updated, and fixes the
worker playbook to enable the last worker service.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
For some reason /dev/kvm has 0600 perms after boot on the ppc64
worker host. Also, on ppc64, qemu won't run unless SMT is turned
off. I've just been doing this manually every time the box got
restarted, but that's dumb, so let's make it happen on boot with
a script and a service to run it.
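
The two steps amount to something like the sketch below. It is a
Python rendering for illustration only, not the script being
deployed. The target mode of 0666 is an assumption (this message only
says 0600 is wrong), and `ensure_mode`, `disable_smt` and `main` are
hypothetical names. `ppc64_cpu` comes from powerpc-utils and needs
root.

```python
import os
import stat
import subprocess


def ensure_mode(path: str, mode: int = 0o666) -> bool:
    """Set path's permission bits to mode; return True if they changed."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == mode:
        return False
    os.chmod(path, mode)
    return True


def disable_smt() -> None:
    """Turn SMT off on a ppc64 host (requires root and powerpc-utils)."""
    subprocess.run(["ppc64_cpu", "--smt=off"], check=True)


def main() -> None:
    # Order is not significant; both need doing before qemu is started.
    ensure_mode("/dev/kvm")
    disable_smt()
```

A oneshot service run at boot would call `main()`; nothing here runs
it automatically.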
Signed-off-by: Adam Williamson <awilliam@redhat.com>