Something is broken with smtp_tls_connection_reuse = yes, so disable it
for now. Also, set up a tls_policy map file and tell it not to use TLS
for mx2.redhat.com. Plain SMTP connection reuse works just fine, so
this will keep mail flowing until we can figure out why TLS
connection reuse is busted.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
Recently, redhat.com changed internal MX servers. The new servers
have rate limits on incoming emails from one IP, and the admins there
don't want to add a bunch of exceptions, so we need to adjust our end to
not flood them with connections. Currently, connections burst up to 100
(the Postfix default for smtp), which goes over their limits and causes
the internal MX to reject emails from us for a while.
So, this change:
* Adds some domains to fast_flush. This allows us to use postqueue -s
domain to flush emails to a particular domain.
* Changes the smtp limit to 40. This is under the redhat.com limit.
* Has ansible actually install the master.cf.gateway on bastion servers.
Until now they were using the stock/default one.
* Enables the tlsproxy service, which is actually needed to get TLS
connection reuse working.
After these changes, we keep only a few connections to the redhat.com MX
open, but we reuse them and send more emails over existing connections.
No 'too many connections' emails have shown up since the changes.
The queue seems to be slowly draining.
Since this was causing an email outage, I have already applied these
changes to bastion01, but I'd like to make sure it matches what's in
ansible.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
Everything should now be using linux-system-roles/network, so we drop
our hacky nmcli calls and everything that referred to them, including
exclude variables. Also, let's just let NM handle resolv.conf so it's no
longer wrong after reboots.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
Found the reason that the definitions I had put in were not
working. There were two different ones and I was looking at the wrong
one. Gave the two tasks the same logic so things should work no
matter which one is run.
We need to always run these, even in check mode, because they register
things used in the last one of them. So, these tasks could change things
even in check mode if we modify them. Be careful!
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
Just writing a config file isn't enough, apparently. We need to
really call update-crypto-policies. This attempts to do so, but
only if it's really necessary, by using some handy check args.
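
The "only if necessary" logic can be sketched outside of ansible as
below. This is an illustration, not the task from this commit: it
assumes that `update-crypto-policies --is-applied` exits 0 when the
configured policy is already in effect, and the function names and the
injectable `run` parameter are made up for the example.

```python
import subprocess


def policy_is_applied(run=subprocess.call):
    """Return True when the configured crypto policy is already applied.

    Assumption: `update-crypto-policies --is-applied` exits 0 in that case.
    """
    return run(["update-crypto-policies", "--is-applied"]) == 0


def apply_policy_if_needed(run=subprocess.call):
    """Run update-crypto-policies only when the check says it is needed.

    Returns True if the policy was (re)applied, False if nothing was done.
    """
    if policy_is_applied(run):
        return False
    rc = run(["update-crypto-policies"])
    if rc != 0:
        raise RuntimeError("update-crypto-policies failed with exit code %d" % rc)
    return True
```

Passing a fake `run` callable makes it possible to try the logic on a
host that does not have crypto-policies installed.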
Signed-off-by: Adam Williamson <awilliam@redhat.com>
This is VASTLY better than the hack we have in base now to try and set up
ifcfg files. It uses a standard role that has lots of options and does
the right thing with NetworkManager. Ideally we would switch everything
to this, but let's try it here first and see how it goes. It should work
with bridges, etc. as well.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
This task seems to fail with a "nameserver failed to answer" message when
you provision a bunch of hosts at once. Try running it on just one host at
a time and see if it helps any.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
At some point not too long ago we set 'logrotate.timer' as a dist
enabled service. This mostly works fine as all supported Fedora and RHEL
releases have this. However, we still have some old unsupported hosts
(like notifs-backend01) and this caused playbooks to fail on them.
So, let's make it conditional so it only applies to newer ones and we can
still run playbooks on the EOL ones.
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
In 2017, I (Stephen Smoogen) put in a change to copy
roles/base/files/rsyslog/rsyslog-limits.conf to /etc/systemd on
log01. This was to make sure we have adequate limits on the logrunner
on log01. However, I missed the fact that all *.conf files are copied
over to /etc/rsyslog.d/ in a previous section. So this file has been
copied over to every system since 2017, which was ok when rsyslogd just
ignored the syntax. However, on EL8 it dies and kills rsyslogd, so
servers are not able to run.
Fix: change the file name to one which won't get globbed. Remove the
file from /etc/rsyslog.d on all systems.
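
A quick way to check that a new name escapes the *.conf glob is shown
below. This is only a sketch: the renamed file name is hypothetical
(the real new name is not given here), and Python's fnmatch stands in
for whatever globbing the copy task actually does.

```python
import fnmatch


def copied_by_glob(filename, pattern="*.conf"):
    """Return True if the file name would be picked up by the glob."""
    return fnmatch.fnmatchcase(filename, pattern)


# The original name matches, so it lands in /etc/rsyslog.d/ everywhere.
old_name = "rsyslog-limits.conf"
# Hypothetical replacement name that no longer ends in ".conf".
new_name = "rsyslog-limits.conf.systemd"

for name in (old_name, new_name):
    print("{0}: globbed={1}".format(name, copied_by_glob(name)))
```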
Finally take fed-cloud* out, along with all playbooks associated with the
old cloud (and the attempts to make a new one).
This cloud was a pain at times, but it served long and well. We salute it!
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
The reason we do this is so we can run a few scripts (like nag-once)
as python2 on python2 hosts and python3 on rhel8 hosts.
Note that this depends on the scripts working under either interpreter.
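
For illustration, a script that works under either interpreter sticks to
the syntax both accept. The sketch below is not nag-once itself; the
function names, host name, and message are made up for the example.

```python
import sys


def interpreter_label(version_info=None):
    """Return 'python2' or 'python3' for the given (or running) interpreter."""
    if version_info is None:
        version_info = sys.version_info
    return "python{0}".format(version_info[0])


def format_nag(host, pending):
    """Build a message using only syntax that both major versions accept."""
    return "{0}: {1} pending item(s), checked under {2}\n".format(
        host, pending, interpreter_label())


def main():
    # sys.stdout.write avoids the print statement/function difference.
    sys.stdout.write(format_nag("example-host", 3))
    return 0
```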
Signed-off-by: Kevin Fenzi <kevin@scrye.com>