I'm going to try splitting the tap jobs across multiple worker
hosts. We have quite a lot of tap jobs, now, and I have seen
sometimes a situation where all non-tap jobs have been run and
the non-tap worker hosts are sitting idle, but the single tap
worker host has a long queue of tap jobs to get through.
We can't just put multiple hosts per instance into the tap
class, because then we might get a case where job A from a tap
group is run on one host and job B from a tap group is run on
a different host, and they can't communicate. It's actually
possible to set this up so it works, but it needs yet more
complex networking stuff I don't want to mess with. So instead
I'm just gonna split the tap job groups across two classes,
'tap' and 'tap2'. That way we can have one 'tap' worker host
and one 'tap2' worker host per instance and arch, and they will
each get about half the tap jobs.
Unfortunately since we only have one aarch64 worker for lab it
will still have to run all the jobs, but for all other cases we
do have at least two workers, so we can split the load.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This reverts commit 892453da7e.
openQA still had problems with the very long request, so I just
did an ugly hack to get the request under the limit instead.
The openQA job scheduler was hitting 414 errors today because
an update has so many builds there are more than 8190 characters
(the default limit) in the POST request. Let's bump the limit
to 16000.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
It really needs to be called exactly 60-block-scheduler.rules
as it's overriding a file of the same name in `/usr`.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This applies only within Fedora infra for now, as we're not sure
whether worker hosts on different hardware hit this bug. It's
intended to work around:
https://bugzilla.redhat.com/show_bug.cgi?id=2009585
a bug which results in the infra worker hosts hanging after a
short time when running kernels newer than 5.11.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Upstream implemented a feature that we can use to do the same
thing using just a test variable, so we're switching to that.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Ugh, we delegate for the assetsize stuff too and there's tons of
that, splitting it would be awful. Let's try a different approach
with a new optional variable for the delegate target.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Using the machine's own hostname works for the ansible delegate
stuff but doesn't work for openQA itself (if you try and access
the DB by hostname like this, postgres denies access; you have
to use 'localhost' for postgres to allow it). Using 'localhost'
works for postgres but doesn't do the right thing for delegation.
Let's use 'localhost' and split the two play steps into
delegated and non-delegated versions.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The config file should treat these as optional, not every openQA
instance wants to report results.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We don't want to include this section if the vars aren't set.
Not every openQA server has to be an AMQP publisher.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The background to this is
https://bugzilla.redhat.com/show_bug.cgi?id=2073414 , in response
to which git was changed to die if a user runs git commands
on a repo which it doesn't own. In openQA, the test directory
is a git repo and openQA itself likes to run git commands on it,
but this is often going to be as a different user than the owner
of the directory. In fact on the worker hosts, the user that owns
the directory (geekotest on the server box) doesn't even exist.
This just sets the config by copying a file in place rather than
running a git command (which is hard to get to be idempotent) and
uses `/etc/gitconfig` so we don't wind up with a file in the
_openqa-worker user's home directory, which is meant to be empty.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We were hiding these because in the past the only ISO assets
were those from the compose under test, and we wanted to avoid
people downloading them from openQA when we'd rather they get
them from dl.fp.o or the mirror system. But these days we have
tests that generate ISOs (update netinst and live image build
tests) and we often want to download the generated images to
test them locally.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This sets up the openQA lab instance to report to the new stg
instance of resultsdb, and use authentication. The scheduler
config file is now mode 0600 because it has a password in it.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We need to treat it and the x86_64 update group separately to
do this, but it really doesn't need 200G. We have images from
three weeks ago, and we don't need that kind of buffer, and space
is a bit tight.
Note: there is no aarch64 updates group as we do not currently
run updates tests on aarch64.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We've been using the httpd_can_network_connect boolean for years
to allow httpd to connect to the openQA server processes. This
is an unnecessarily large hammer when we only need it to be
able to connect to exactly the two openQA ports. This uses a
custom SELinux policy to allow connecting to those ports only,
and ensures the boolean is set back to off.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Several of these requirements are old ones that were only needed
for createhdds, when we ran createhdds on the servers. All of
those can go. Also make the list line-by-line for easier git
blame tracking in future (and add comments for the remaining
entries so we know why they're there).
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This is just cleaning up the mess of the bad parameter from
earlier, run of this play broke halfway through, need to do the
remaining half without choking on this part.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This unifies prod and stg onto the ways of doing things for the
latest packages, and rejigs the swtpm stuff a bit to tear down
more (we shouldn't need the custom SELinux policy any more).
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This also tears down our swtpm systemd service setup, as
os-autoinst should now handle swtpm device setup for us.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This sets us up for scheduling FCOS tests from messages, not
using a cron job. Also reduces some duplication of variables
between openqa-servers-common and the dispatcher role defaults.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
A recent test has a couple of perl deps, we need to ensure these
are installed on the worker hosts.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
We never use the auditing stuff, so let's turn it off (and set
short limits for audit event duration so we can run the cleanup
and get rid of existing audit events). Let's also use the new
setting that only runs asset cleanup if free space is low.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Just can't get Apache config Alias to work for some reason, so
let's go with the flow and stick the file in openQA's public
directory. This works!
Signed-off-by: Adam Williamson <awilliam@redhat.com>
The main goal of these changes is to allow all workers in each
deployment NFS write access to the factory share. This is because
I want to try using os-autoinst's at-job-run-time decompression
of disk images instead of openQA's at-asset-download-time
decompression; it avoids some awkwardness with the asset file
name, and should also actually allow us to drop the decompression
code from openQA I think.
I also rejigged various other things at the same time as they
kinda logically go together. It's mostly cleanups and tweaks to
group variables. I tried to handle more things explicitly with
variables, as it's better for use of these plays outside of
Fedora infra.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
On client end, restart mount unit (with daemon-reload) if mount
file changes. On server end, run exportfs -r if export config
file changes.
Signed-off-by: Adam Williamson <awilliam@redhat.com>