15:58:54 <GeKo> #startmeeting network-health 12/19/2022
15:58:54 <MeetBot> Meeting started Mon Dec 19 15:58:54 2022 UTC. The chair is GeKo. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:58:54 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:59:00 <GeKo> okay, let's get started
15:59:18 <GeKo> on to the last team sync this year :)
15:59:32 <GeKo> the pad is, as usual, at: http://kfahv6wfkbezjyg4r6mlhpmieydbebr5vkok5r34ya464gqz6c44bnyd.onion/p/tor-nethealthteam-2021.1-keep
16:00:06 <GeKo> juga: hiro: ^
16:00:18 <GeKo> i wonder whether we have a ggus here for s112
16:00:26 <hiro> o/
16:01:25 <juga> o/
16:02:39 <GeKo> alright
16:03:01 <GeKo> hiro: how is it going the the infra issues we have all over the metrics place?
16:03:05 <GeKo> *with the
16:03:29 <hiro> so we had an issue with onionoo on friday
16:03:34 <hiro> someone was doing a lot of queries
16:03:43 <hiro> anarcat identified 3 or 4 azure ips
16:03:53 <hiro> and shut those down
16:04:04 <hiro> and it has been ok since then.
16:04:11 <GeKo> i guess that's still kind of ongoing given our alerts today in the morning?
16:04:16 <GeKo> ah, okay
16:04:41 <hiro> then there is an issue I think with getting bridges from polyanthum
16:05:01 <hiro> I got a collector alert I think on saturday and it was linked with an EOF error on one of the archives
16:05:18 <hiro> onionoo is running correctly but sometimes the bridge list is delayed
16:05:24 <hiro> I have an issue for the faulty archive
16:06:24 <GeKo> yeah
16:06:28 <hiro> it's very difficult to dig up issues because we just know some of the index is delayed
16:06:43 <hiro> I think we should make some time in the new year to have better log handling or something
16:07:15 <hiro> maybe a log browser like that loki service for prometheus
16:07:32 <hiro> https://grafana.com/oss/loki/
16:07:36 <hiro> we have a ticket open for this
16:07:52 <GeKo> sounds good
16:07:58 <hiro> I think we need a place that when we see an alert we can explore what is happening in all our services because things are very interconnected
16:08:06 <GeKo> monitoring-and-alerting#4
16:08:18 * GeKo nods
16:08:19 <hiro> yep
16:08:42 <hiro> also the nl7 onionperf instance is back
16:08:49 <hiro> greenhost told me they powercycled it
16:08:51 <GeKo> i hope things are settle down a bit over the holidays :)
16:08:57 <GeKo> nice \o/
16:09:05 <hiro> I tried a few times too over sunday morning but it wasn't coming back
16:09:07 <GeKo> *settling
16:10:41 <GeKo> hiro: do we have to kick the onionperf instance additionally?
16:10:48 <GeKo> it's not showing up in https://grafana2.torproject.org/d/TmYimmx7k/onionperf-clients-status?orgId=1 anymore
16:10:53 <hiro> uhm
16:11:00 <hiro> I logged into the machine and it was running
16:11:04 <hiro> I'll check again
16:11:33 <GeKo> k
16:12:27 <hiro> maybe it's prometheus that needs to be kicked
16:12:31 <hiro> on the machine I mean
16:12:51 <GeKo> yep
16:13:37 <GeKo> hiro: https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40078 contains a bunch of requests
16:13:52 <hiro> yeah I have those for this week
16:13:54 <GeKo> i am not sure whether we should do all of that now, like in this week
16:14:04 <hiro> was talking to richard last week
16:14:09 <GeKo> because given our lack of time
16:14:10 <hiro> yeah not sure we can
16:14:28 <GeKo> but maaaybe it is smart to touch the website things while we are at it
16:14:32 <hiro> but maybe let's see how much we can fix... especially on the app stats for the locale
16:14:39 <GeKo> right
16:15:03 <GeKo> having some kind of priority and making sure we are addressing the most important items seems like a good way to go
16:15:22 <GeKo> i mean there have been tickets around for enhancements for... quite some time :)
16:15:34 <hiro> I would like to finish having the data imported on metrics-psqlts-01 first and then do the rest
16:15:52 <GeKo> and it's not obvious why they should be suddenly such a high prio to squeeze them into the remaining days
16:15:57 <GeKo> +1
16:16:29 <GeKo> alright
16:16:33 <hiro> uhm I wanted to have a look at the locale thing first to be honest because I'd like to have a look at the webstats queries since we have issues with those
16:16:56 <hiro> if it's quick to fix the issue with the apps then also good
16:16:58 <GeKo> yeah, locale sounds like the most important of those items imo
16:17:23 <GeKo> right
16:18:02 <GeKo> okay, nothing marked in bold, do we still have stuff to discuss?
16:18:24 * juga is fine
16:18:50 * hiro is groot
16:19:00 <GeKo> just a small s112 update from my side:
16:19:21 <GeKo> i sent the criteria for O2.1 to bekeela so she is getting them to drl for some feedback
16:19:26 <GeKo> let's see how that goes
16:19:54 <juga> nice
16:20:10 <GeKo> and then i spent a lot of my time on analyzing an fd overload we saw in november
16:20:23 <GeKo> that's for the dos part of O3
16:21:04 <GeKo> might still need tomorrow to answer the most interesting questions
16:21:17 <GeKo> but then i should be able to focus on other stuff :)
16:21:29 <GeKo> seems we have no ggus here for today
16:21:43 <GeKo> i guess we can just call it then and get back to work
16:21:56 <GeKo> note: the next sync will be 1/9/2023
16:22:02 <juga> ack
16:22:14 <GeKo> a nice week everyone ü/
16:22:16 <GeKo> hah
16:22:18 <GeKo> o/
16:22:22 <juga> o/
16:22:22 <GeKo> #endmeeting