Commit Graph

228 Commits

Author SHA1 Message Date
Nils Philippsen
a3203d29d9 Get rid of implicit message topic prefix
Callers of simple_message_to_bus need to set and export MSGTOPIC_PREFIX
explicitly.

This decouples the fedora-messaging-utils and web-data-analysis roles.

Additionally, don't assume /bin/sh is /bin/bash.

Signed-off-by: Nils Philippsen <nils@redhat.com>
2021-09-09 12:40:56 +02:00
Nils Philippsen
0b518a7e88 Fix typo
Signed-off-by: Nils Philippsen <nils@redhat.com>
2021-09-09 10:38:38 +00:00
Nils Philippsen
ecd8ab8383 Merge syncing and combining logs into one cronjob
This should prevent race conditions in which logs are combined before
syncing those of individual hosts has finished.

Signed-off-by: Nils Philippsen <nils@redhat.com>
2021-09-09 10:38:38 +00:00
Nils Philippsen
a766ec6416 Merge awstats role into web-data-analysis
This is to enable running the syncing and combining scripts in
series rather than from independently scheduled cron jobs.

Signed-off-by: Nils Philippsen <nils@redhat.com>
2021-09-09 10:38:38 +00:00
Nils Philippsen
5e09dce82d Import fedora-messaging-utils role
Importing the role rather than listing it in the playbook lets its tasks
have the tags used in the importing role, i.e. should ensure they are
run when the things that need simple_message_to_bus are installed.

Additionally, don't attempt to install it manually from
web-data-analysis (it isn't found because it lives in a different role).

Signed-off-by: Nils Philippsen <nils@redhat.com>
2021-08-16 06:02:37 +00:00
Nils Philippsen
6e62fcbe69 Don't drop temporary files all over the place
When renaming a file over another name that is a hard link to the same
file, the rename is a no-op. This left many temporary files in
/var/log/hosts because each file is synced (and thus hard-linked between
dated and undated file names) over a couple of days. The solution is to
do what the `ln` command does: rename, then unlink the temporary file.

Signed-off-by: Nils Philippsen <nils@redhat.com>
2021-08-12 09:45:49 +00:00
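
The rename-then-unlink fix described in 6e62fcbe69 can be illustrated with a minimal Python sketch. The function name `replace_and_cleanup` and the file names in `demo` are hypothetical, chosen for illustration; this is not the code from the commit.

```python
import os
import tempfile


def replace_and_cleanup(tmp_path, dest_path):
    """Rename tmp_path over dest_path, then make sure tmp_path is gone."""
    os.rename(tmp_path, dest_path)
    # If both names are hard links to the same file, POSIX rename() does
    # nothing and tmp_path still exists, so remove it explicitly.
    try:
        os.unlink(tmp_path)
    except FileNotFoundError:
        pass  # the rename really moved the file; nothing is left to clean up


def demo():
    """Hard-link a temporary name to an existing file, then replace it."""
    with tempfile.TemporaryDirectory() as d:
        dated = os.path.join(d, "access.log-20210812")  # hypothetical name
        tmp = os.path.join(d, ".access.log.tmp")        # hypothetical name
        with open(dated, "w") as f:
            f.write("log line\n")
        os.link(dated, tmp)  # both names now refer to the same file
        replace_and_cleanup(tmp, dated)
        return sorted(os.listdir(d))
```

Calling `demo()` lists the directory after the replacement. Without the explicit unlink, the temporary name would remain next to the dated file.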
Adam Saleh
db936062b3 Add more message-based tracing to log01 scripts 2021-08-11 11:18:17 +00:00
Nils Philippsen
f703e7a771 Add and use optimized http log syncing script
The previous one synced all hosts serially and ran rsync for each log
file. This reimplements the shell script in Python, with these changes:

- Run rsync on whole directories of log files, with much reduced
  overhead.
- Use a pool of five workers which process hosts in parallel.

Additionally, remove download-rdu01.vpn.fedoraproject.org from the list
of synced hosts.

Signed-off-by: Nils Philippsen <nils@redhat.com>
2021-08-05 16:44:47 +00:00
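
The two changes listed in f703e7a771 (one rsync run per directory, five workers handling hosts in parallel) might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual script: the function names, the directory layout, and the choice of the `-a` rsync option are all hypothetical, and the `run` parameter exists only so the logic can be exercised without a network.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_rsync_command(host, remote_dir, local_dir):
    """One rsync invocation for a whole directory instead of one per file."""
    # Hypothetical layout: each host's logs land in local_dir/<host>/.
    return ["rsync", "-a", f"{host}:{remote_dir}/", f"{local_dir}/{host}/"]


def sync_host(host, remote_dir, local_dir, run=subprocess.run):
    """Sync one host and report its rsync exit status."""
    result = run(build_rsync_command(host, remote_dir, local_dir))
    return host, result.returncode


def sync_all(hosts, remote_dir, local_dir, run=subprocess.run, workers=5):
    """Process hosts in parallel with a fixed-size pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda h: sync_host(h, remote_dir, local_dir, run), hosts
        )
        return dict(results)
```

A thread pool is used here because each worker spends its time waiting on an external rsync process. Whether the real script uses threads or processes is not stated in the commit message.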
Stephen Smoogen
33df23d457 this will give copies of these emails to asaleh and nils so they can see how the cron jobs are working 2021-08-05 06:46:17 -04:00
Stephen Smoogen
b78179ed3c remove an email to smooge@smoogespace.com as debug is done 2021-08-04 08:38:42 -04:00
Adam Saleh
7a013fe511 Send tracing messages to the bus in syncHttpLogs
While at it, fix a typo, which reduces stdout spam.

Signed-off-by: Adam Saleh <asaleh@redhat.com>
Signed-off-by: Nils Philippsen <nils@redhat.com>
2021-07-28 11:23:10 +02:00
Nils Philippsen
c782eceae1 Move syncHttpLogs.sh into web-data-analysis role
Signed-off-by: Nils Philippsen <nils@redhat.com>
2021-07-23 13:06:23 +02:00
Stephen Smoogen
9a54f23d1e Fix a lot of unknown arches in mirrorlist.py. Take a stab at fixing some of the graphs in mirrors-data.gp. Let the team figure out a better way to fix the rest. 2021-07-15 05:06:56 -04:00
Kevin Fenzi
b7a031c9fd fedoraloveskde.org: add site and pipeline to deploy it and dns zone
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2021-06-14 12:49:11 -07:00
Stephen Smoogen
e21c30a720 Make the git countme email me at home so I can see if there is an error
Signed-off-by: Stephen Smoogen <smooge@smoogespace.com>
2021-05-27 12:37:51 -04:00
Stephen Smoogen
255b10c922 Add in roles for f34-f39 and epel9 for counting with old stat program 2021-04-06 13:11:31 -04:00
Stephen Smoogen
625441f66b remove wwoods and put mattdm as owner of this script.
Signed-off-by: Stephen Smoogen <smooge@smoogespace.com>
2021-03-29 08:43:25 -04:00
Stephen Smoogen
e1c6dccd5b update the dates for web data analysis gnuplots til 2021-06-30. should give enough time for next project 2020-12-08 17:56:37 -05:00
Will Woods
9a4201efc1 suppress 'nothing added to commit..' messages from countme-update.sh
Right now countme-update.sh tries to `git commit -a` whether or not
anything has changed, which results in this output whenever there are no
new changes to commit:

    On branch master
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
            raw.db
            totals.db

    nothing added to commit but untracked files present (use "git add" to track)

This commit tweaks `countme-update.sh` so that it only attempts `git commit`
if there are changes to be committed - i.e. when `git diff` returns 1.

Signed-off-by: Will Woods <wwoods@redhat.com>
2020-11-17 13:31:52 -05:00
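
The "commit only when something changed" check from 9a4201efc1 could be sketched in Python as follows. The commit message does not say which `git diff` options the script uses. This sketch assumes `git diff --quiet`, which exits with status 1 when tracked files differ and 0 when they do not. The function name and the `run` parameter are hypothetical.

```python
import subprocess


def commit_if_changed(repo_dir, message, run=subprocess.run):
    """Run `git commit -a` only if tracked files have uncommitted changes."""
    # Assumption: --quiet is used so the exit status reports differences
    # (1 = changes present, 0 = no changes).
    diff = run(["git", "-C", repo_dir, "diff", "--quiet"])
    if diff.returncode != 1:
        return False  # nothing to commit (or git failed); stay silent
    run(["git", "-C", repo_dir, "commit", "-a", "-m", message], check=True)
    return True
```

Untracked files such as raw.db and totals.db do not affect `git diff`, so they no longer trigger a commit attempt.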
Will Woods
3dadedeb26 web-data-analysis: fix countme-update
So it turns out that pip3 installs scripts to /usr/local/bin and cron
jobs don't have /usr/local/bin in the path.

This commit adds /usr/local/bin to PATH in countme-update.sh.

For Maximum Correctness we should probably get pip to tell us where it
installed countme-update-{rawdb,totals}.sh but this'll work just fine
as long as pip keeps installing scripts to /usr/bin or /usr/local/bin.

Signed-off-by: Will Woods <wwoods@redhat.com>
2020-11-10 17:57:56 -05:00
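
The PATH adjustment described in 3dadedeb26 amounts to appending a directory to the search path if it is not already there. A minimal sketch of that idea, with a hypothetical helper name, assuming the directory is appended rather than prepended:

```python
import os


def path_with_dir(path_value, extra_dir="/usr/local/bin"):
    """Return path_value with extra_dir appended unless already present."""
    parts = [p for p in path_value.split(os.pathsep) if p]
    if extra_dir not in parts:
        parts.append(extra_dir)
    return os.pathsep.join(parts)
```

A wrapper could apply this with `os.environ["PATH"] = path_with_dir(os.environ.get("PATH", ""))` before launching the pip-installed scripts.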
Stephen Smoogen
8d7d4f5389 when trying to find out why something is failing.. remove the /dev/null so that you can see why it is failing. Also add wwoods so he can delight in this. 2020-11-06 10:15:17 -05:00
Stephen Smoogen
1096f9b35f try to get the cron output to see why this is failing. 2020-11-06 10:07:15 -05:00
Kevin Fenzi
1cd00567f2 web-data-analysis: pip is pip3
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-10-28 08:33:02 -07:00
Kevin Fenzi
1736849caa web-data-analysis: add countme group before trying to add it to countme user
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-10-28 08:25:53 -07:00
Will Woods
f46768ec6b countme: add .gitconfig
This gives the web-data-analysis `countme` user a .gitconfig file so the
commits it makes in its local git repo have a proper user name and
email address. (Also it makes git stop complaining..)

The email address might not actually be valid, but this repo doesn't
currently go anywhere public so it shouldn't really matter.
2020-10-13 16:17:00 +00:00
Will Woods
f8a5720535 add 'countme' stuff to web-data-analysis role
This should automate running the "countme" scripts every day to parse
new log data and publish updated totals.

Here's what I've added to the ansible role:

* install package deps for `mirrors-countme`
* make "countme" user with home /srv/countme
* clone 'prod' branch of https://pagure.io/mirrors-countme to /srv/countme
  * if changed: pip install /srv/countme/mirrors-countme
* make web subdir /var/www/html/csv-reports/countme
* make local data dir /var/lib/countme
* install `countme-update.sh` to /usr/local/bin
* install `countme-update.cron` to /etc/cron.d
  * runs /usr/local/bin/countme-update.sh daily, as user `countme`

That should make sure `countme-update.sh` runs every day.
That script works like this:

1. Run `countme-update-rawdb.sh`
  * parse new mirrors.fp.o logs in /var/log/hosts/proxy*
  * write data to /var/lib/countme/raw.db
2. Run `countme-update-totals.sh`
  * parse raw data from /var/lib/countme/raw.db
  * write updated totals to /var/lib/countme/totals.{db,csv}
3. Track changes in updated totals
  * set up /var/lib/countme as git repo (if needed)
  * commit new `totals.csv` (if changed)
4. Make updated totals public
  * Copy totals.{db,csv} to /var/www/html/csv-reports/countme

For safety's sake, I've tried to set up everything so it runs as the
`countme` user rather than running everything as `root`. This might be
an unnecessary complication but it seemed like the right thing to do.

Similarly, keeping totals.csv in a git repo isn't _required_, but it
seemed like a good idea to keep historical records in case we want/need
to change the counting algorithm or something.

I checked the YAML with ansible-lint and tested that all the scripts
work as expected when run as `wwoods`, so unless I've missed something
this should do the trick.
2020-10-13 16:17:00 +00:00
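
The four steps that f8a5720535 lists for `countme-update.sh` can be sketched in Python as below. This is an illustration of the described flow, not a port of the script. The function name, the `run` parameter, the commit message text, and the assumption that the two helper scripts take no arguments are all hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def countme_update(data_dir, web_dir, run=subprocess.run):
    """Follow the four steps from the commit message, in order."""
    data_dir, web_dir = Path(data_dir), Path(web_dir)
    git = ["git", "-C", str(data_dir)]

    # Steps 1 and 2: refresh the raw data, then the totals.
    # Assumption: both scripts are called without arguments.
    run(["countme-update-rawdb.sh"], check=True)
    run(["countme-update-totals.sh"], check=True)

    # Step 3: set up the git repo if needed, commit totals.csv if changed.
    if not (data_dir / ".git").exists():
        run(git + ["init"], check=True)
    run(git + ["add", "totals.csv"], check=True)
    staged = run(git + ["diff", "--cached", "--quiet"])
    if staged.returncode == 1:  # 1 means staged changes exist
        run(git + ["commit", "-m", "Update totals"], check=True)

    # Step 4: publish whichever totals files exist.
    published = []
    for name in ("totals.db", "totals.csv"):
        src = data_dir / name
        if src.exists():
            shutil.copy2(src, web_dir / name)
            published.append(name)
    return published
```

With the paths from the commit message this would be called as `countme_update("/var/lib/countme", "/var/www/html/csv-reports/countme")`, running as the `countme` user.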
Stephen Smoogen
28aa22994e this file was needed for older log analysis and not for the current one 2020-08-18 14:01:12 -04:00
Stephen Smoogen
e86fb420fd remove a playbook and job which should not be run on log01 2020-08-04 08:14:48 -04:00
Stephen Smoogen
ebfeeecc83 remove moving average from hotspot as it will not work in EL8 2020-08-03 15:11:38 -04:00
Stephen Smoogen
a694ee6042 again remove ending backslash from last line before unset 2020-08-03 07:25:15 -04:00
Stephen Smoogen
11b735ce27 move all latest start dates to the same date, 2018-01-01 2020-07-31 18:20:46 -04:00
Stephen Smoogen
348f7e76cd unknown_releases is currently getting a large number of hits because the open cisco repository is being seen. need to rerun 6 months of data 2020-07-31 18:19:15 -04:00
Stephen Smoogen
64f85b6657 remove moving average graphs as they don't work 2020-07-31 18:08:26 -04:00
Stephen Smoogen
e60973c6cb fix a trailing \, in the gnuplot which does not look right 2020-07-31 18:03:12 -04:00
Stephen Smoogen
98b38667f0 add fedora 30 to 33 to the mirrors-data.gp. These are the last releases which can be added to this software without some major changes in how the csv and db are stored. 2020-07-31 16:41:21 -04:00
Stephen Smoogen
4766e57e53 fix the data-analysis graphs to plot past 2019-12-31 to 2020-12-31 2020-07-31 13:17:35 -04:00
Stephen Smoogen
5f7864f24f put in web-data-analysis changes to get proxies to work 2020-07-31 11:08:44 -04:00
Stephen Smoogen
b37734873b remove awstats and other cron jobs we aren't using 2020-07-10 17:10:11 -04:00
Stephen Smoogen
184f33b98a turn off moving averages as I don't have time to rewrite the code with newer pandas syntax and python3 2020-07-02 14:24:10 -04:00
Stephen Smoogen
dd3d48189f need to have condense-mirrorlogs do its sort in /tmp and not /srv/tmp 2020-07-01 13:36:52 -04:00
Kevin Fenzi
b012b97f8a log01 / web-data-analysis: it is python3-pandas in rhel8
Signed-off-by: Kevin Fenzi <kevin@scrye.com>
2020-06-17 19:14:36 -07:00
Tim Flink
369aec85f2 web-data-analysis: removing taskotron.fp.o 2020-06-17 17:19:16 +00:00
Stephen Smoogen
f05aa2a046 start excising boot from websites 2020-04-24 21:34:23 +02:00
Stephen Smoogen
870eee6e99 Revert "remove boot.fedoraproject.org from proxy data files"
This reverts commit d6ecabe44506a1ca82125e17d1dc6a7f05ac7a2a.
2020-04-24 21:34:23 +02:00
Stephen Smoogen
9d3706d64d remove boot.fedoraproject.org from proxy data files 2020-04-24 21:34:23 +02:00
Patrick Uiterwijk
85407ea90a Remove puiterwijk from access to data-analysis
Signed-off-by: Patrick Uiterwijk <patrick@puiterwijk.org>
2020-04-24 21:34:22 +02:00
Stephen Smoogen
10ed78b366 add pingou to see analysis 2020-04-24 21:34:21 +02:00
Stephen Smoogen
bd8cdf0133 move the logs analysis up one more day and see how that works. 2020-04-24 21:34:12 +02:00
Stephen Smoogen
6dbfcf8b18 move the merge down 1 day to try and catch up with actual date 2020-04-24 21:34:12 +02:00
Stephen Smoogen
c045053e9b double meh 2020-04-24 21:34:09 +02:00