Clement Verna
bb24183f46
oci-registry: Update nagios to monitor the correct directory for disk space
Signed-off-by: Clement Verna <cverna@tutanota.com>
2019-05-30 20:06:56 +02:00
Kevin Fenzi
b68a3cf906
nagios / bodhi: change masher to composer
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2019-05-29 02:57:01 +00:00
Randy Barlow
2843dd8422
bodhi: Remove another nagios check.
Signed-off-by: Randy Barlow <randy@electronsweatshop.com>
2019-05-28 23:14:33 +00:00
Randy Barlow
4cf1624c76
bodhi: Upgrade production to Bodhi 4.0.0.
Signed-off-by: Randy Barlow <randy@electronsweatshop.com>
2019-05-28 15:58:52 +00:00
Kevin Fenzi
4b31ac5152
ansible: Change all our group names from foo-bar to foo_bar or foo-bar-baz to foo_bar_baz
In ansible 2.8 the - character isn't supposed to be valid in group names.
While we could override this, we might as well just bite the bullet and change it.
So, just switch all group names to use _ instead of -
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2019-05-20 17:38:09 +00:00
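The rename described in this commit amounts to replacing every `-` with `_` in a group name. A minimal sketch of that mapping in Python follows; the helper `normalize_group_name` is hypothetical and is not part of the repository or of ansible itself.

```python
def normalize_group_name(name: str) -> str:
    """Replace '-' with '_' (hypothetical helper, for illustration only)."""
    return name.replace("-", "_")


# Example group names taken from the commit subject.
renamed = [normalize_group_name(n) for n in ("foo-bar", "foo-bar-baz")]
print(renamed)
```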
Kevin Fenzi
df0cd4014a
nagios_client: The plugin is in the plugins dir.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2019-05-09 15:43:07 +00:00
Clement Verna
e220bb4867
Fedbadges: set a higher threshold for fedmsg hub backlog nagios alerts.
The fedbadges consumer has to consume many more messages and its queries to datanommer
are slower. We need to allow the consumer a bigger backlog before raising an
alert.
Generally, above 35000 messages in the backlog it will be difficult for the consumer
to catch up; in that case it might be better to flush the backlog and restart from 0.
Signed-off-by: Clement Verna <cverna@tutanota.com>
2019-05-05 13:59:31 +02:00
Stephen Smoogen
c432675c74
[nagios] add checks for datanommer monitoring
2019-04-25 18:13:43 +00:00
Clement Verna
93d0eeaf54
Nagios: monitor that resultsdb sends messages on the bus
Signed-off-by: Clement Verna <cverna@tutanota.com>
2019-04-24 11:22:46 +02:00
Kevin Fenzi
1a40dd5142
nagios: drop askbot fedmsg check.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2019-02-21 15:34:31 +00:00
Kevin Fenzi
a45be3147b
supybot: ignore freenode's 'frigg' bot that sends privmessages on connect
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2019-02-12 22:07:09 +00:00
Rick Elrod
51be1fa32d
Remove osbs check here too /cc cverna
Signed-off-by: Rick Elrod <relrod@redhat.com>
2019-02-12 00:10:39 +00:00
Clement Verna
ae70d3d6d3
Remove OSBS build check from nagios
Signed-off-by: Clement Verna <cverna@tutanota.com>
2019-02-11 17:59:45 +01:00
Rick Elrod
4c8cf933fc
make odcs-backend check for fedmsg-hub-3 instead (infra #7526)
Signed-off-by: Rick Elrod <relrod@redhat.com>
2019-01-28 08:54:46 +00:00
Patrick Uiterwijk
ac65c80b07
Drop the variant-specific config again, hoping for ref=
Signed-off-by: Patrick Uiterwijk <patrick@puiterwijk.org>
2019-01-21 17:49:56 +01:00
Patrick Uiterwijk
3f4bf6db2b
Update file for nagios check
Signed-off-by: Patrick Uiterwijk <patrick@puiterwijk.org>
2019-01-21 17:23:41 +01:00
Patrick Uiterwijk
18b0acc8f3
Monitor ostree summary on proxies
Signed-off-by: Patrick Uiterwijk <patrick@puiterwijk.org>
2019-01-21 16:57:26 +01:00
Patrick Uiterwijk
2ded08f111
Add 24-hour check for bodhi compose start
Signed-off-by: Patrick Uiterwijk <patrick@puiterwijk.org>
2018-12-20 20:44:10 +01:00
Stephen Smoogen
cacbb74b61
This change will update monitoring and repoSpanner service
The monitoring needs to see that the service is run by the repoSpanner user.
The service needs to have a larger limit of open files to work.
2018-12-19 13:41:48 +00:00
Stephen Smoogen
5dd7924887
just make it simple and see if it works
2018-12-18 16:24:14 +00:00
Stephen Smoogen
d936a22544
this will fix a missing _proc in the file name.
2018-12-17 15:49:53 +00:00
Stephen Smoogen
3bbc0031f4
This will add minimal monitoring for repospanner on pkgs01.stg. This only says it is running or not.
2018-12-17 15:44:31 +00:00
Clement Verna
eb5e9a5138
NAGIOS: Update check_osbs_builds check to use Openshift cli instead of osbs-client
Signed-off-by: Clement Verna <cverna@tutanota.com>
2018-12-13 17:07:07 +01:00
Randy Barlow
4422e2bb2d
Monitor fedmsg-hub-3 on Bodhi instead of fedmsg-hub.
Signed-off-by: Randy Barlow <randy@electronsweatshop.com>
2018-11-19 21:33:37 +00:00
Randy Barlow
ce86a667b7
Configure check_fedmsg_cp_bodhi_backend02_hub to use fedmsg-hub-3.
Signed-off-by: Randy Barlow <randy@electronsweatshop.com>
2018-11-19 21:29:14 +00:00
Randy Barlow
2911286a3a
Rename check_fedmsg_masher_proc to check_fedmsg_composer_proc and have it check fedmsg-hub-3.
Signed-off-by: Randy Barlow <randy@electronsweatshop.com>
2018-11-19 21:20:15 +00:00
Stephen Smoogen
8190e9bb6f
this should fix the nagios check on raid
2018-11-11 21:22:15 +00:00
Patrick Uiterwijk
7a8de6026e
Silly me... Also add this to the list to actually sync out :)
Signed-off-by: Patrick Uiterwijk <patrick@puiterwijk.org>
2018-10-20 19:35:16 +02:00
Patrick Uiterwijk
3fc57e699b
Enable nagios checks for ticketkey, and stop emailing puiterwijk
Signed-off-by: Patrick Uiterwijk <patrick@puiterwijk.org>
2018-10-20 15:36:00 +02:00
Kevin Fenzi
2d7ac321c7
a few tagger stragglers
2018-10-03 17:56:00 +00:00
Kevin Fenzi
935b25decc
Up this check to 8 hours.
2018-09-29 20:13:51 +00:00
Patrick Uiterwijk
0854930115
We have no jenkins service anymore
Signed-off-by: Patrick Uiterwijk <puiterwijk@redhat.com>
2018-07-03 02:28:37 +02:00
Miroslav Suchý
a35851b271
check free space on retrace
retrace's /srv volume is 9.6TB. So 5% (warning) is 480GB and 1% (critical) is 96GB
2018-06-18 12:11:31 +02:00
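The threshold arithmetic in this commit message can be checked with a short sketch. It assumes decimal units (1 TB = 1000 GB), which is what the figures in the message imply; the function name `free_space_threshold_gb` is made up for illustration and does not correspond to any nagios plugin option.

```python
VOLUME_GB = 9600  # 9.6 TB from the commit message, assuming 1 TB = 1000 GB


def free_space_threshold_gb(volume_gb: int, percent: int) -> int:
    """Return the free space, in GB, that corresponds to `percent` of the volume."""
    return volume_gb * percent // 100


warning_gb = free_space_threshold_gb(VOLUME_GB, 5)   # warning threshold at 5% free
critical_gb = free_space_threshold_gb(VOLUME_GB, 1)  # critical threshold at 1% free
print(warning_gb, critical_gb)
```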
Ralph Bean
ed44e7f0dc
Only alert if we haven't seen a greenwave message in 2 days.
2018-05-12 14:20:48 +00:00
Kevin Fenzi
a8714caab3
first cut at changing all the old |changed to is changed per ansible deprecations
2018-05-07 23:51:48 +00:00
Kevin Fenzi
e51b6285a6
Shelve summershum
2018-04-10 21:39:56 +00:00
Clement Verna
3c69f743ba
Increase the warning and critical threshold for packages fedmsg backlog
Signed-off-by: Clement Verna <cverna@tutanota.com>
2018-03-08 08:32:00 +01:00
Kevin Fenzi
be38d5fcd9
increase greenwave datanommer fedmsg check from 4 hours to 6 hours as it has been alerting some nights with low build activity
2018-02-28 17:38:20 +00:00
Kevin Fenzi
c8e5316fb7
adjust more
2018-02-07 01:32:17 +00:00
Kevin Fenzi
8a0c1f266f
and setup check on mailman01 nrpe side
2018-02-07 01:24:52 +00:00
Kevin Fenzi
3cea53b5b5
perhaps this needs quoted
2018-02-06 23:16:37 +00:00
Kevin Fenzi
1cb897a769
Clean up nagios client for old stuff that no longer matters.
Add a mailman api check. It gets a 401 now, but at least that tells us it's working.
2018-02-06 23:12:14 +00:00
Kevin Fenzi
0b138f9111
add mdapi and greenwave monitoring. tickets 6639 and 6643
2018-01-19 21:32:19 +00:00
Patrick Uiterwijk
a89e7984fa
Add quotes
Signed-off-by: Patrick Uiterwijk <puiterwijk@redhat.com>
2018-01-12 21:48:40 +00:00
Patrick Uiterwijk
2b879d49e6
tags
Signed-off-by: Patrick Uiterwijk <puiterwijk@redhat.com>
2018-01-12 21:48:01 +00:00
Patrick Uiterwijk
f46144bd78
Add mirrorlist container selinux policy
Signed-off-by: Patrick Uiterwijk <puiterwijk@redhat.com>
2018-01-12 21:47:07 +00:00
Patrick Uiterwijk
d3ea8120ee
Add some more selinux policy to fi-nrpe
Signed-off-by: Patrick Uiterwijk <puiterwijk@redhat.com>
2018-01-12 21:47:07 +00:00
Patrick Uiterwijk
ca798ca07d
Add check_mirrorlist_cache.cfg
Signed-off-by: Patrick Uiterwijk <puiterwijk@redhat.com>
2018-01-12 20:44:44 +00:00
Patrick Uiterwijk
b58aec5fdb
Perform mirrorlist cache check against proxies
Signed-off-by: Patrick Uiterwijk <puiterwijk@redhat.com>
2018-01-12 20:37:25 +00:00
Kevin Fenzi
7d265c9bf9
switch openqa machines to alert on disk only when 90% or higher instead of 85%
2017-11-29 21:52:49 +00:00