I think the recent git builds are good and I want them everywhere
so I can merge a bunch of things and clean up before the weekend.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Upgraded proxies and builders to f37. We have a reduced timeframe to get
this done before the holidays, so this time we just upgraded them in
place. Usually we do a full reinstall. We will try to do that next
cycle.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
The new waiverdb container image starts 8 gunicorn web worker processes
by default, via the WEB_CONCURRENCY environment variable. This causes
memory usage to spike (over 500MiB) and workers to be terminated.
Instead of increasing memory limits, a better solution is to decrease
the number of processes and increase the number of threads, since the
app mostly waits on DB requests to finish and waiverdb workers
themselves are thread-safe.
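For illustration only: gunicorn reads its settings from a Python config file, so a change along these lines could be sketched as follows. The specific numbers (2 worker processes with 4 threads each) are assumptions, not the values actually deployed. They keep the total at 8 concurrent requests while running a quarter of the processes. This also assumes gunicorn is not given a `--workers` flag on the command line, which would override the config file. Lowering WEB_CONCURRENCY in the deployment would have a similar effect.

```python
# gunicorn.conf.py: hypothetical sketch; the numbers are assumptions, not
# the values actually deployed for waiverdb.

# Fewer processes. Each worker process loads its own copy of the app, so
# memory use grows roughly with this number. Setting it here overrides the
# default gunicorn would otherwise derive from WEB_CONCURRENCY.
workers = 2

# More threads per process. Requests spend most of their time waiting on
# the database, and the workers are thread-safe, so threads can share that
# waiting cheaply inside one process.
threads = 4

# The threaded worker class. gunicorn would also switch from "sync" to
# "gthread" on its own once threads > 1.
worker_class = "gthread"
```

The design trade-off is that threads share one process's memory, whereas each extra process duplicates it. An I/O-bound, thread-safe app can therefore keep the same concurrency at a much lower memory cost.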
Our openshift 3.11 cluster(s) served us long and well.
Now that everything has finally moved to the openshift 4 clusters (fas2
was the last holdout), we can retire this. :)
🎉🥂
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
The awk helper responsible for extracting IP addresses from the
resolvectl output could handle only 2 of them.
It turns out that api.openshift.com now has 4 A records, so this method
became flaky: it added only 2 addresses to the IP set, and if the osbuild
plugin used one of the 2 ignored addresses, the call failed.
This commit solves it by introducing a different method of parsing the
resolvectl output:
We now use an ugly but working sed command that erases everything from
each line except for the IPv4 address. For this to work, I had to quote
the echo argument feeding the new sed command so that it receives proper
multiline input. I also limited resolvectl to IPv4 only, because the new
script cannot handle IPv6 properly. This doesn't cause any harm, because
api.openshift.com isn't actually accessible over IPv6. Sigh...
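The parsing step can also be sketched in Python. The commit itself uses sed in a shell script, so this is a swapped-in illustration and not the actual helper. It assumes IPv4-only `resolvectl query` output shaped like the sample below, with one address per line followed by trailing `--` comment lines. It collects every IPv4 address rather than a fixed number of them. The sample addresses are documentation-range placeholders, not the real records for api.openshift.com.

```python
import ipaddress
import re

# Anything that looks like a dotted quad; validated with ipaddress below so
# that strings such as "999.1.1.1" are rejected.
DOTTED_QUAD = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ipv4(resolvectl_output: str) -> list[str]:
    """Return every IPv4 address in the output, in order, without duplicates.

    Unlike a parser that reads a fixed number of fields, this handles any
    number of A records (2, 4, or more).
    """
    found: list[str] = []
    for line in resolvectl_output.splitlines():
        # Skip the trailing "-- Information acquired via ..." comment lines.
        if line.lstrip().startswith("--"):
            continue
        for candidate in DOTTED_QUAD.findall(line):
            try:
                addr = str(ipaddress.IPv4Address(candidate))
            except ipaddress.AddressValueError:
                continue
            if addr not in found:
                found.append(addr)
    return found


# Assumed output shape; the addresses are documentation-range placeholders.
SAMPLE = """\
api.openshift.com: 192.0.2.10                   -- link: eth0
                   192.0.2.11                   -- link: eth0
                   198.51.100.7                 -- link: eth0
                   203.0.113.42                 -- link: eth0

-- Information acquired via protocol DNS in 12.3ms.
-- Data is authenticated: no
"""

print(extract_ipv4(SAMPLE))
# ['192.0.2.10', '192.0.2.11', '198.51.100.7', '203.0.113.42']
```

Because every address is kept regardless of how many records DNS returns, the IP set stays complete when the record count changes again.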
Some of the openqa workers are encrypted and some aren't (this is a bit of a
mess that's partly a result of all the redeployments we did around
https://bugzilla.redhat.com/show_bug.cgi?id=2009585 ). We should only run
the nbde_client role on workers which are encrypted. Hopefully this gets that
right.
Signed-off-by: Adam Williamson <awilliam@redhat.com>
I'm going to try splitting the tap jobs across multiple worker
hosts. We have quite a lot of tap jobs now, and I have sometimes
seen a situation where all non-tap jobs have been run and the
non-tap worker hosts are sitting idle, but the single tap worker
host has a long queue of tap jobs to get through.
We can't just put multiple hosts per instance into the tap
class, because then we might get a case where job A from a tap
group is run on one host and job B from a tap group is run on
a different host, and they can't communicate. It's actually
possible to set this up so it works, but it needs yet more
complex networking stuff I don't want to mess with. So instead
I'm just gonna split the tap job groups across two classes,
'tap' and 'tap2'. That way we can have one 'tap' worker host
and one 'tap2' worker host per instance and arch, and they will
each get about half the tap jobs.
Unfortunately, since we only have one aarch64 worker for lab, it
will still have to run all the jobs, but in all other cases we
do have at least two workers, so we can split the load.
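Here is a small sketch of the constraint described above. Each tap group is mapped as a whole to exactly one class, so its jobs never end up on different hosts. This message does not say how the groups were actually divided. The greedy largest-first balancing, the group names, and the job counts below are all assumptions for illustration.

```python
def split_tap_groups(
    group_sizes: dict[str, int],
    classes: tuple[str, ...] = ("tap", "tap2"),
) -> dict[str, str]:
    """Assign each tap job group, as a whole, to exactly one worker class.

    Keeping a group inside one class means all of its jobs run on that
    class's single host, so they can still reach each other.
    """
    load = {cls: 0 for cls in classes}  # jobs assigned to each class so far
    assignment: dict[str, str] = {}
    # Greedy heuristic: place the biggest groups first, each onto the class
    # with the fewest jobs so far (ties go to the earlier class).
    for group, size in sorted(group_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(classes, key=lambda cls: load[cls])
        assignment[group] = target
        load[target] += size
    return assignment


# Hypothetical groups and job counts, for illustration only.
example = {"group-a": 3, "group-b": 2, "group-c": 2, "group-d": 1}
print(split_tap_groups(example))
# {'group-a': 'tap', 'group-b': 'tap2', 'group-c': 'tap2', 'group-d': 'tap'}
```

In this example each class ends up with 4 jobs. A single host, like the lab aarch64 worker, would simply have to take both 'tap' and 'tap2', so it still runs everything.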
Signed-off-by: Adam Williamson <awilliam@redhat.com>