425 Commits

Author SHA1 Message Date
Adam Williamson
ae2cd3530b roles/openqa/server: drop OpenID auth support
We've been using OAuth2 for prod and stg for some time now, so
let's clean this up.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2026-01-26 15:40:50 -08:00
Adam Williamson
4e4a12f2c3 roles/openqa/server: show more builds on the front page
We only have two job groups, so the front page is a bit sad and
empty. Let's show 10 builds per group, not 3.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2026-01-26 15:37:24 -08:00
Adam Williamson
804efd40d1 Update pagure.io/fedora-qa to forge.fedoraproject.org/quality
Quality org has completed moving repos to Forgejo (all but one),
so let's update all of these.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2026-01-23 15:31:46 -08:00
Adam Williamson
021c63e9df Update some Forgejo-migrated repo URLs
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2026-01-09 18:51:10 -08:00
Adam Williamson
bd8ca3c2ca openqa/worker: drop the whole 'no-ffmpeg-on-aarch64' thing
it never worked anyway (ffmpeg always showed up, *somehow*) and
on the new workers it doesn't seem to be an issue.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-11-17 15:31:53 -08:00
Adam Williamson
e1d8d18c3e openqa/server: update and correct aarch64 update asset quota
We weren't actually applying the quota we had defined (only) for
lab, so both lab and prod had the upstream default (which now
seems to be 200GB, not 100GB). Let's fix it so we do apply the
value, and set it to 250GB for both prod and stg, because we're
now aiming to have full parity in the update test sets between
aarch64 and x86_64, and we have the space on the rdu3 hosts.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-09-03 14:19:23 -07:00
Adam Williamson
7387394898 openqa/worker: fix nfs hostname
d'oh.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-07-21 10:14:34 -07:00
Adam Williamson
666196bbed openqa/worker: don't start worker unless NFS mount is up
There's this annoying pattern where the NFS mount fails on boot
and then the worker services all start up and take jobs, but they
instafail because the share isn't there.

Ideally we could handle this very easily with Restart= directives
but systemd has...*opinions* about this:

https://github.com/systemd/systemd/issues/4468
https://github.com/systemd/systemd/issues/1312

so we have to do some fairly awkward hacks to just express:

* Retry the NFS mount if it fails
* Don't start the workers unless the NFS mount is up
* Retry the workers after a while if they were blocked

It's ugly, but in testing this same config on one worker it seems
to work...

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-07-10 19:07:54 -07:00
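
The three requirements listed in the commit above (retry the mount, hold the workers until the mount is up, retry blocked workers later) can be illustrated outside systemd. What follows is a minimal Python sketch of the gating logic only, not the role's actual systemd configuration. The function names, the retry count, and the delay are all hypothetical.

```python
import os
import time
from typing import Callable


def wait_for_mount(
    path: str,
    retries: int = 5,
    delay: float = 1.0,
    is_mounted: Callable[[str], bool] = os.path.ismount,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `path` is a mount point, False if retries run out."""
    for attempt in range(retries):
        if is_mounted(path):
            return True
        # Not mounted yet: pause before checking again, except after the last try.
        if attempt < retries - 1:
            sleep(delay)
    return False


def start_worker_if_ready(path: str, start: Callable[[], None], **kwargs) -> bool:
    """Start the worker only when the share is mounted; report whether it started."""
    if wait_for_mount(path, **kwargs):
        start()
        return True
    # Blocked: the caller is expected to try again after a while.
    return False
```

The `is_mounted` and `sleep` parameters exist only so the logic can be exercised without a real NFS share.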
Adam Williamson
099406f1b9 openqa/worker tap: set CAP_NET_ADMIN on qemu
I have no idea why we didn't need this before, but we seem to
need it now.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-07-02 20:08:18 -07:00
Adam Williamson
9d931214ea Revert "openQA: rename openvswitch bridge device to avoid conflict"
This reverts commit 4dc01bc892 and
a follow-up commit. I'm having trouble getting things to work
and want to see if it works if we go back to having the openQA
bridge be br0, and rename the bridge used for the system's bonded
network connection to something else instead.
2025-07-02 17:25:18 -07:00
Adam Williamson
10b68ac01f openqa/worker: remove old unused files
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-07-02 17:23:42 -07:00
Adam Williamson
b343d8de52 Try and fix openQA bridge config
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-07-02 16:20:39 -07:00
Adam Williamson
f7ca68a38e openqa/dispatcher: install resultsdb_conventions
I thought/assumed/knew/something? that resultsdb_conventions_fedora
required resultsdb_conventions, but right now it seems it doesn't.
It *should*, but I can't fix it right now as the buildsystem is
down, so let's just install it here...

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-07-02 15:20:09 -07:00
Adam Williamson
6606be7010 openqa tap: tell os-autoinst-openvswitch to use the right bridge
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-07-02 08:37:35 -07:00
Adam Williamson
69c9099615 openqa/worker tap: 'nogroup' is no more, must use 'nobody'
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-07-01 17:40:25 -07:00
Adam Williamson
4dc01bc892 openQA: rename openvswitch bridge device to avoid conflict
On the new rdu3 worker hosts, br0 already exists and is the main
system 'interface' (it's a bridge on two bonded physical interfaces
connected to different switches, to make networking upgrades
easier). So we can't call our openvswitch bridge 'br0' any more.
Let's try calling it 'openqabr0' and see if anything explodes.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-06-30 16:14:01 -07:00
Aurélien Bompard
8248acbe1e RabbitMQ: Drop the zmq.topic exchange, fedmsg has been retired
Signed-off-by: Aurélien Bompard <aurelien@bompard.org>
2025-06-26 10:58:07 +02:00
Adam Williamson
5561372a5f openqa/worker: drop edk2-arm package install
It no longer exists and we're no longer doing 32-bit ARM tests.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-05-07 10:39:33 -07:00
Adam Williamson
5da2faac67 openqa/server: allow OAuth2 authentication, enable on lab
OpenID support in FAS is going away. openQA has OAuth2 support.
I've tested this config to work with manual edits on lab, now
ansiblizing it (for lab only to start with).

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2025-03-28 13:40:57 -07:00
Michal Konecny
2ec055db6f Use first uppercase letter for all handlers
This will unify all the handlers to use first uppercase letter for
ansible-lint to stop complaining.

I went through all `notify:` occurrences and fixed them by running
```
set TEXT "text_to_replace"; set REPLACEMENT "replacement_text"; git grep -rlz "$TEXT" . | xargs -0 sed -i "s/$TEXT/$REPLACEMENT/g"
```

Then I went through all the changes and removed the ones that weren't
expected to be changed.

Fixes https://pagure.io/fedora-infrastructure/issue/12391

Signed-off-by: Michal Konecny <mkonecny@redhat.com>
2025-02-10 20:31:49 +00:00
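
The renaming rule in the commit above (first letter uppercase, rest of the handler name untouched) is small enough to sketch. This is an illustration in Python, not the command the author ran. The function name and the sample handler name are hypothetical.

```python
def capitalize_first(name: str) -> str:
    """Uppercase only the first character and leave the rest unchanged."""
    return name[:1].upper() + name[1:]


# Hypothetical handler name, for illustration only.
print(capitalize_first("restart openQA worker"))
```

Slicing is used rather than `str.capitalize()`, because `capitalize()` also lowercases the remaining characters and would turn `openQA` into `openqa`.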
Kevin Fenzi
6c38d7b61a various: fix some more shell variables that were accidentally converted to builtin.shell
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2025-01-15 17:26:50 -08:00
Ryan Lerch
47c68f478d ansiblelint fixes - fqcn[action-core] - template to ansible.builtin.template
Replaces references to template: with ansible.builtin.template

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 11:30:29 +10:00
Ryan Lerch
3c41882bb0 ansiblelint fixes - fqcn[action-core] - shell to ansible.builtin.shell
Replaces references to shell: with ansible.builtin.shell

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 11:29:10 +10:00
Ryan Lerch
25391e95b7 ansiblelint fixes - fqcn[action-core] - package to ansible.builtin.package
Replaces many references to package: with ansible.builtin.package

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 11:28:00 +10:00
Ryan Lerch
462176464b ansiblelint fixes-- fqcn[action-core] - command to ansible.builtin.command
Replaces many references to command: with ansible.builtin.command

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 11:26:47 +10:00
Ryan Lerch
6a3816dfdc ansiblelint fixes-- fqcn[action-core] - copy to ansible.builtin.copy
Replaces many references to 'copy' with ansible.builtin.copy

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 10:43:31 +10:00
Ryan Lerch
62952df107 ansiblelint fixes-- fqcn[action-core] - file to ansible.builtin.file
Replaces many references to file: with ansible.builtin.file

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 10:41:52 +10:00
Ryan Lerch
691adee6ee Fix name[casing] ansible-lint issues
fix 1900 failures of the following case issue:

`name[casing]: All names should start with an uppercase letter.`

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-14 20:20:07 +10:00
Ryan Lerch
89f6f1fc32 Fix majority of remaining yamllint warnings and errors
Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2024-11-28 17:31:45 +10:00
Adam Williamson
4d801444a9 openqa: set up a side repo for prod as well as lab
Sometimes we want to deploy something to prod before it goes
stable (or even to u-t).

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-11-25 17:06:34 -08:00
Adam Williamson
1a537f38ce openqa/server: correct scratchrepo removal
d'oh. this has been broken for some time...

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-11-20 17:27:57 -08:00
Adam Williamson
cb026b4120 openqa/worker: kill stuck qemu processes daily
This is an awful hack to deal with
https://github.com/os-autoinst/os-autoinst/issues/2549 while we
try and fix it properly. This finds stuck qemu processes by
parsing the journal messages of the workers, and kills them.
Workers stuck in the broken state should then recover on the
next checkin with the server. I tested this manually on all the
worker hosts and it...seemed to work, mostly. I'll keep an eye on
things after deploying it.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-10-15 13:13:42 -07:00
Adam Williamson
1325a7ab15 adamverse: add --no-deps to pip install commands
In various roles I maintain I use `python3 -m pip install` to
directly install a Python project (usually a fedora-messaging
consumer), to avoid the pointless bureaucracy of packaging them.
The roles install all the deps of these projects as packages
first, so pip doesn't have to install any deps, it only installs
the project itself. Well...that's the idea. It's possible for
this to go wrong (say I forget to update the roles when adding
a dep to the project), and in that case I think we'd rather have
things blow up (so I know something's wrong) than have pip
silently install some random upstream wheel system-wide to make
it work. The intent is that all the deps still come from proper
Fedora packages, only these projects themselves get installed
directly.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-09-20 11:10:09 -07:00
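
The install pattern described in the commit above can be sketched as a command builder. This is a minimal Python illustration, not the role's actual task. The function name and the project path are hypothetical; `--no-deps` is the pip option named in the commit subject.

```python
from typing import List


def pip_install_command(project_dir: str) -> List[str]:
    """Build a pip command that installs only the project, never its dependencies."""
    return ["python3", "-m", "pip", "install", "--no-deps", project_dir]


# Hypothetical checkout path, for illustration only.
print(" ".join(pip_install_command("/srv/checkout/some-consumer")))
```

With `--no-deps`, pip installs the project itself and does not fetch anything else, so every dependency has to be present already as a Fedora package.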
Adam Williamson
2dbf99e280 openqa/worker: bump load average threshold for big worker hosts
This is a new feature in openQA that prevents worker hosts
picking up new jobs if their load average is above a certain
threshold. It defaults to 40. Our big worker hosts tend to run
above this, so let's bump it on those.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-08-30 23:27:48 -07:00
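
The threshold check described in the commit above amounts to a single comparison. This Python sketch shows the idea only and is not openQA's implementation. Which load average openQA samples, and whether its comparison is strict, are assumptions here, and the function names are hypothetical. The default of 40 comes from the commit message.

```python
import os


def may_accept_jobs(load_average: float, threshold: float = 40.0) -> bool:
    """Return True when the host load is at or below the threshold."""
    return load_average <= threshold


def current_load() -> float:
    """One-minute load average of this host (Unix only)."""
    return os.getloadavg()[0]
```

A big worker host that routinely runs at a load of 55 would be refused jobs at the default threshold, but accepted once the threshold is raised, for example `may_accept_jobs(55.0, threshold=80.0)`.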
Adam Williamson
4743c3fdce openqa/worker: transition all tap workers to NM-based setup
This seems to be working fine in testing, so let's deploy it
everywhere.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-07-25 14:54:03 -07:00
Adam Williamson
762b23ef7d openqa/worker tap-setup-nm: tweak some quoting, drop tunctl
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-07-25 14:06:52 -07:00
Adam Williamson
690a5eb951 openqa/worker: add NM-based tap setup and test on p09-worker01
network-scripts-openvswitch was removed in f40 and network-scripts
is going away in f41; we really need to get off using them.
This attempts to implement the same setup using NetworkManager,
based on a few different NM/ovs references, and the source of
openQA upstream's os-autoinst-setup-multi-machine . It might
need a bit of tweaking, so for now, we make it a separate task
and use it only on p09-worker01 for testing. This doesn't handle
tearing down the old network-scripts-based config as that's
pretty complex and will only need to happen once; I'll do it
manually before trying this out.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-07-25 13:50:39 -07:00
Adam Williamson
7f73f1253e openqa/worker: don't explicitly pull in ffmpeg-free on aarch64
We don't want it there - see earlier commits - but I didn't
notice it's actually explicitly listed here for all arches,
which breaks stuff on aarch64 now we told dnf to exclude it.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-06-11 11:08:36 -07:00
Adam Williamson
44c5c79ad7 openqa/dispatcher: add a cron job to send missed test results
This works around an annoying problem where, for some reason, we
sometimes just miss sending completed test results to resultsdb.
I've never been able to figure out why this happens, but this
should band-aid it by looking, daily, for updates stuck in
waiting gating status, checking for cases where a test finished
but we didn't send a result, and sending it.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-05-27 15:25:25 -07:00
Adam Williamson
826fd32330 openqa/worker: don't force createhdds off non-standard branch
Using the same approach as we do for the tests and fedora_openqa.
I wish I'd done this *before* I ran the playbook on lab and it
wiped every...single...goddamn...disk image.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-03-26 15:35:33 -07:00
Adam Williamson
7cfe3d61e6 openqa/worker: block ffmpeg-free on aarch64
Encoding with ffmpeg rather than os-autoinst's built-in encoder
gives us less broken videos, but on aarch64 it seems to cause
problems, especially on stg's old, busted worker hosts - I think
it's more CPU-intensive and they just can't handle the load. So,
let's block it.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-03-26 15:07:27 -07:00
Adam Williamson
da391c4ba2 openQA: trim default routing keys for scheduler consumer
With Bodhi 8 we no longer need to listen to request.testing or
update.edit messages. Yay.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-02-09 12:25:55 -08:00
Adam Williamson
d3b6d1bafd openqa/worker: install ffmpeg-free
I'm adding this as a Recommends: for os-autoinst, but want to
get it on the workers now. Having it installed gives us better
videos of test runs (the internal video encoder is a bit wonky
and produces videos that have errors which make jumping around
within the video not work properly).

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2024-01-03 10:38:11 -08:00
Adam Williamson
1cb0c0cdc6 Put openqa-lab-repo.repo in worker role as well as server role
Signed-off-by: Adam Williamson <awilliam@redhat.com>
2023-10-27 18:09:24 -07:00
Adam Williamson
504b8217d3 openqa etc.: use pip for local installs, not setuptools
On Fedora 39, we ran into an issue with setuptools that isn't
immediately resolvable:
https://github.com/pypa/setuptools/issues/3797#issuecomment-1783613895
using pip like this seems to avoid it.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2023-10-27 17:23:53 -07:00
Adam Williamson
530f69d967 openqa: use an external side repo for test builds
It's overall simpler and more idempotent to just use a side repo
maintained outside of ansible than re-create one on each system
on each run of the plays.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2023-10-27 11:20:58 -07:00
Adam Williamson
374956365e openqa: drop the results_min_free_disk_space_percentage cleanup
It is extremely slow to run, and we figured out that the problem
on openqa01 was excessive space being used by Netapp snapshots,
so we don't need this any more. It was actually deleting old
jobs before their time, because it had already wiped every
video file and didn't know what else to do...

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2023-07-25 15:13:07 -07:00
Adam Williamson
1e26a28c2c openqa/server: try setting a limit on test result disk usage
We're having issues with test results eating up all the disk
space we can throw at them (prod is over 4T, stg is over 2T -
I don't know why prod is bigger, that's odd, but it may be an
odd effect of having more arches on stg, maybe aarch64 and
ppc64le tests generally have smaller videos, or something).
This config setting should make openQA keep the space usage
on the partition at a max of 85%, by deleting videos from older
tests as required.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2023-07-21 10:19:54 -07:00
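
The cleanup behaviour described in the commit above (delete videos from older tests until usage is back under 85%) can be sketched as a selection function. This is a minimal Python illustration under stated assumptions, not openQA's cleanup code. The function name, the `(name, size)` tuples, and the oldest-first ordering of the input are hypothetical.

```python
from typing import List, Tuple


def videos_to_delete(
    total_bytes: int,
    used_bytes: int,
    videos: List[Tuple[str, int]],
    max_used_fraction: float = 0.85,
) -> List[str]:
    """Pick videos to delete, oldest first, until usage is at or below the cap.

    `videos` is a list of (name, size_in_bytes), ordered oldest first.
    """
    limit = total_bytes * max_used_fraction
    doomed = []
    for name, size in videos:
        if used_bytes <= limit:
            break
        doomed.append(name)
        used_bytes -= size
    return doomed
```

If the videos alone cannot free enough space, the function returns every video and usage still sits above the cap.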
Adam Williamson
a5c322b4ee More cleanup on the openQA AMQP stuff
nirik and I went around and around a bit today and ended up back
where we started, but with a clearer understanding of where that
is. This explains it a bit better, and makes what's actually
going on in various places clearer with the use of appropriate
shared variables. This should not actually *change* anything at
all when deployed.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2023-06-22 23:21:28 +02:00
Adam Williamson
de979123fa openQA: don't install the fedoraupdaterestart plugin any more
We don't need it, we use upstream RETRY now.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
2022-12-19 16:16:11 -08:00