Commit Graph

212 Commits

Author SHA1 Message Date
Aurélien Bompard
f185573c41 Do stuff on iad2_internal also on rdu3_internal
Signed-off-by: Aurélien Bompard <aurelien@bompard.org>
2025-06-23 19:02:44 +02:00
Aurélien Bompard
d22bde741d Nagios: template the mail_queue.cfg file
Signed-off-by: Aurélien Bompard <aurelien@bompard.org>
2025-06-23 18:11:28 +02:00
Aurélien Bompard
9007df7619 Don't change the template name, or it will be the name of the remote file
Signed-off-by: Aurélien Bompard <aurelien@bompard.org>
2025-06-23 10:27:03 +02:00
Kevin Fenzi
aeaa0811c4 nagios: fix task to match the real template name
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2025-06-21 11:45:49 -07:00
Kevin Fenzi
ad3533e506 nagios: try and split out all hostgroups into _iad2 and _rdu3
We want to monitor iad2 from noc01.iad2 and rdu3 from noc01.rdu3, so
try and split this out into separate all groups for each datacenter.
This will likely miss some things that aren't split out into separate
_iad2 and _rdu3 groups, but we can hopefully fix those.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2025-06-21 11:26:38 -07:00
Kevin Fenzi
7113cce4ec nagios_server: fix missing = in when
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2025-06-21 11:06:35 -07:00
Kevin Fenzi
3be2d89e66 nagios: also add these templates in rdu3
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2025-06-20 22:34:45 -07:00
Kevin Fenzi
4ca8fa862c nagios: adjust when clause for rdu3
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2025-06-20 22:11:57 -07:00
Kevin Fenzi
a42481a782 nagios/rdu3: need templates and other config in rdu3 also
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2025-06-20 21:49:17 -07:00
Kevin Fenzi
449385c8b0 nagios: move rdu3 hosts over to noc01.rdu3
Also open firewalls to allow noc03.rdu3 to access them.
Also enable nagios_server on noc01.rdu3.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2025-06-20 20:29:24 -07:00
Kevin Fenzi
2b3441492a nagios: add rdu3-hosts template to be deployed
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2025-06-20 18:05:31 -07:00
Michal Konecny
f63e839698 [nagios-server] Move the datanommer checks to noc01
There were a few fedora-messaging datanommer checks that were running on
busgateway01. As this machine is part of fedmsg it will be
decommissioned. Let's move the checks to noc01.

Signed-off-by: Michal Konecny <mkonecny@redhat.com>
2025-02-14 09:45:39 +00:00
Michal Konecny
6428f8f772 Sunset github2fedmsg and fedmsg
This commit removes all the fedmsg-related stuff from the ansible
repository.

Signed-off-by: Michal Konecny <mkonecny@redhat.com>
2025-02-13 10:08:51 +00:00
Michal Konecny
2ec055db6f Use first uppercase letter for all handlers
This will unify all the handlers to use first uppercase letter for
ansible-lint to stop complaining.

I went through all `notify:` occurrences and fixed them by running
```
set TEXT "text_to_replace"; set REPLACEMENT "replacement_text"; git grep \
  -rlz "$TEXT" . | xargs -0 sed -i "s/$TEXT/$REPLACEMENT/g"
```

Then I went through all the changes and removed the ones that weren't
expected to be changed.

Fixes https://pagure.io/fedora-infrastructure/issue/12391

Signed-off-by: Michal Konecny <mkonecny@redhat.com>
2025-02-10 20:31:49 +00:00
Kevin Fenzi
22f3d8832f handlers: more renaming fixes
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2025-01-24 14:06:11 -08:00
Ryan Lerch
47c68f478d ansiblelint fixes - fqcn[action-core] - template to ansible.builtin.template
Replaces references to template: with ansible.builtin.template

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 11:30:29 +10:00
Ryan Lerch
25391e95b7 ansiblelint fixes - fqcn[action-core] - package to ansible.builtin.package
Replaces many references to  package: with ansible.builtin.package

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 11:28:00 +10:00
Ryan Lerch
462176464b ansiblelint fixes-- fqcn[action-core] - command to ansible.builtin.command
Replaces many references to  command: with ansible.builtin.command

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 11:26:47 +10:00
Ryan Lerch
6a3816dfdc ansiblelint fixes-- fqcn[action-core] - copy to ansible.builtin.copy
Replaces many references to 'copy' with ansible.builtin.copy

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 10:43:31 +10:00
Ryan Lerch
62952df107 ansiblelint fixes-- fqcn[action-core] - file to ansible.builtin.file
Replaces many references to  file: with ansible.builtin.file

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-15 10:41:52 +10:00
Ryan Lerch
691adee6ee Fix name[casing] ansible-lint issues
fix 1900 failures of the following case issue:

`name[casing]: All names should start with an uppercase letter.`

Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2025-01-14 20:20:07 +10:00
Ryan Lerch
89f6f1fc32 Fix majority of remaining yamllint warnings and errors
Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2024-11-28 17:31:45 +10:00
Kevin Fenzi
ef8a734d69 nagios: also make sure the service is running and enabled
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2024-11-21 12:53:00 -08:00
Kevin Fenzi
160a909053 noc: install ipmitool as well
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2024-11-19 13:22:48 -08:00
Kevin Fenzi
c84b99223c osbs: raise a glass for its service
This removes osbs and almost all its associated playbooks and files.

It served long and well, but we no longer need it.
flatpaks are building with a koji-flatpak plugin.
base/minimal/toolbox containers are building with kiwi.
We aren't building any other containers right now, and if we did they
could be added to kiwi.

This is the end of an era... I look with nostalgia on
ansible-ansible-openshift-ansible (a role to setup ansible on a control
host and run it from our ansible).

Good bye osbs!

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2024-03-28 12:52:07 -07:00
Leo Puvilland
18e4f51c61 Make only the nagios group able to execute the matrix-notify script
Signed-off-by: Leo Puvilland <leo@craftcat.dev>
2023-12-21 14:46:02 -08:00
Leo Puvilland
11b56e8551 Fix path to Matrix-Notify script
Signed-off-by: Leo Puvilland <leo@craftcat.dev>
2023-12-20 10:29:15 -08:00
Leo Puvilland
e04948b31a Fix template file not being copied (matrix-notify script)
Signed-off-by: Leo Puvilland <leo@craftcat.dev>
2023-12-19 09:18:16 -08:00
Leo Puvilland
05bff0da9f nagios matrix notify: use full filename for script in role
Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2023-12-19 10:13:23 +10:00
Leo Puvilland
48d7982ebf Correct syntax error in nagios role
Signed-off-by: Ryan Lerch <rlerch@redhat.com>
2023-12-19 10:08:07 +10:00
Leo Puvilland
5aafc6a1d2 Move nagios notifications to Matrix
Signed-off-by: Leo Puvilland <leo@craftcat.dev>
2023-12-18 15:55:30 -08:00
Kevin Fenzi
d727ee47ea nagios: remove another old notifs remnant
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2023-11-15 13:12:36 -08:00
Kevin Fenzi
fdf34aab57 nagios_server / noc02: set seboolean to allow certgetter to work
noc02 needs to be able to proxy to certgetter for the acme challenge for
ssl certs. So, set this there to allow that.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2023-11-08 15:40:23 -08:00
Kevin Fenzi
22dde8163b unbound: remove and retire unbound servers
These instances served long and well as fallback resolvers for
dnssec-trigger. This is no longer needed or used, so let's remove them.
See https://pagure.io/fedora-infrastructure/issue/11415

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2023-07-24 14:40:43 -07:00
Kevin Fenzi
71cdddf55b nagios: move the ipv6 specific ping config to a ping-ipv6.cfg file
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-11-17 16:39:11 -08:00
Kevin Fenzi
28fc20056a nagios: fix typo
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-11-17 16:30:06 -08:00
Kevin Fenzi
b9b35a09ed nagios: move ping.cfg to a template so it works for both nagios servers
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-11-17 16:19:50 -08:00
Stephen Smoogen
e36f982263 This should allow ansible to correctly build the templates for noc01/noc02.
2022-11-17 12:06:00 -05:00
Seddik Alaoui Ismaili
9af427e1bf add ipv6 check for fedorapeople
2022-11-17 01:40:25 +00:00
Kevin Fenzi
b388a003b4 nagios: add checks for ssl certs on fcos and ocp4 endpoints, change to just checking proxy01
Add checks for ssl certs on fcos openshift endpoints.
Add checks for ocp4 wildcard certs.
Change check to only use proxy01/proxy01.stg instead of all proxies.
Ideally we really do want to check all proxies, but in practice this
results in like 70 alerts anytime the cert is going to expire.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2022-02-02 15:47:23 -08:00
Silvie Chlupova
6fa2999dbf copr: use already existing copr.cfg
2022-01-20 13:23:31 +01:00
Silvie Chlupova
8c5dc50c7e copr: move copr nagios services into separate file
2022-01-20 12:14:48 +01:00
Pavel Raiskup
6062eec80b nagios: drop copr_external.cfg from services
2021-08-10 09:59:23 +02:00
Pavel Raiskup
d2f9b772e9 nagios: move copr-ping to internal
2021-08-10 08:51:55 +02:00
Pavel Raiskup
f76859775c nagios: pick up copr_external.cfg services
2021-08-09 13:50:30 +02:00
Pavel Raiskup
29fb33bbb7 copr-be: test remaining results storage space
2021-07-28 13:51:16 +02:00
Rick Elrod
dcc53bd63b add crl check to nagios + nrpe + facl perms for nrpe
Signed-off-by: Rick Elrod <relrod@redhat.com>
2020-08-06 15:32:09 -05:00
Kevin Fenzi
6371dd26c3 nagios / server: fix check_koji plugin name
As it was it copied the check_koji.j2 template in ansible to
check_koji.j2 on the server, which meant that check_koji the actual
script wasn't on noc01 and the check couldn't work.

Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-07-09 16:49:07 -07:00
Stephen Smoogen
35f1746c3f things become clearer when we find a missing internal on something that says for internal
2020-07-01 18:20:14 -04:00
Stephen Smoogen
6e218c7031 a box not on the vpn has a hard time testing for boxes on the vpn
2020-07-01 18:14:02 -04:00