16:00:00 <GeKo> #startmeeting network-health 09/06/2021
16:00:00 <MeetBot> Meeting started Mon Sep  6 16:00:00 2021 UTC.  The chair is GeKo. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:00 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:04 <GeKo> hello everyone!
16:00:14 <GeKo> let's see how many folks are around today
16:00:25 <GeKo> i think we won't have a gaba nor a dgoulet today
16:00:30 <GeKo> and gus is out, too
16:00:43 <GeKo> but i guess hiro is around :)
16:00:47 <GeKo> pad is at http://kfahv6wfkbezjyg4r6mlhpmieydbebr5vkok5r34ya464gqz6c44bnyd.onion/p/tor-nethealthteam-2021.1-keep#L65
16:01:02 <GeKo> or better
16:01:04 <GeKo> kfahv6wfkbezjyg4r6mlhpmieydbebr5vkok5r34ya464gqz6c44bnyd.onion/p/tor-nethealthteam-2021.1-keep
16:01:09 <hiro> o/
16:01:20 <GeKo> hihi
16:05:35 <GeKo> okay, let's go
16:05:55 <GeKo> hiro: do you have anything we should chat about?
16:06:20 <hiro> uhm I'd like to close that support document about metricsport if that's possible and deploy those website changes
16:06:30 <GeKo> right
16:06:31 <hiro> but I guess we can take more time thinking about it
16:06:41 <GeKo> i meant to ping dgoulet tomorrow
16:06:45 <hiro> we don't want to send the wrong information to people
16:06:54 <GeKo> figuring out whether he can give us some sentences
16:07:06 <hiro> irl was also mentioning that it was a good topic for a blog post
16:07:21 <GeKo> hrm
16:07:37 <GeKo> fine with me
16:07:52 <GeKo> i guess we should at least write a mail to tor-relays@, though
16:08:07 <GeKo> once we merged all the related changes and things are live
16:08:19 <GeKo> i am not sure whether we have the time for a blog post
16:08:23 <hiro> yes I was thinking tor-project but also tor-relays makes sense I wasn't sure about the blog post
16:08:36 <GeKo> but i am not opposed to it
16:08:37 <hiro> hehe yes same
16:08:40 <GeKo> :)
16:09:23 <GeKo> but, anyway, we should aim for this week getting all the work done here and merged
16:09:32 <hiro> sounds good
16:09:40 <GeKo> there is no need to drag this longer
16:10:14 <hiro> I have also started outlining okrs. I created a milestone in the network health team space. I hope that's ok.
16:11:00 <GeKo> if that helps you working on that part, fine with me :)
16:11:15 <hiro> I wasn't sure if that should have gone in the metrics space instead
16:11:18 <hiro> but I guess it can be moved
16:11:50 <GeKo> it's good having it at the network-health level
16:12:59 <GeKo> hiro: do you have any opinion on https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40021 ?
16:13:15 <GeKo> i'd just move it to base and would trust irl here
16:13:28 <GeKo> in that nothing breaks in the other projects
16:14:09 <hiro> yeah I think that is ok
16:14:19 <GeKo> great
16:14:34 <GeKo> i'll write a patch then and put it on your review plate later this week
16:14:48 <hiro> sounds good I'll check if we set that somewhere in some other metrics project
16:14:54 <GeKo> k
16:15:15 <GeKo> the final thing i had was the collector issue
16:15:22 <GeKo> how do we proceed here?
16:15:26 <hiro> yes
16:15:39 <GeKo> just waiting until rob stops flooding and seeing whether that's been the problem?
16:15:50 <hiro> so I spent some time during the past few days thinking of how we could improve logging of the checker
16:16:04 <hiro> but even so, it can be used to just confirm what we can read in the logs of the downloader
16:16:16 <hiro> it's taking more time to download server descriptor and extra-info descriptor
16:16:26 <GeKo> right
16:16:28 <hiro> when does rob's experiment finish?
16:16:55 <GeKo> the current iteration is supposed to stop on 09/08
16:16:59 <GeKo> this wed
16:17:15 <GeKo> then a week for the advertized bw to get back to "normal" levels"
16:17:19 <GeKo> *levels
16:17:25 <GeKo> then one week off
16:17:30 <GeKo> and then we'd start again
16:17:38 <GeKo> with two weeks flooding
16:18:04 <hiro> uhm
16:18:34 <hiro> so I was reading through https://research.torproject.org/techreports/modern-collector-2018-12-19.pdf where irl and karsten did put together issues with the current collection model that we have
16:18:57 <hiro> and being i/o intensive is one of them. the main metrics services rely a lot on disk and network
16:19:45 <hiro> and what we see is in line with an excessive load imo
16:20:00 <GeKo> what does "Missing too many referenced descriptors" actually mean? do we really lose them?
16:20:21 <GeKo> in the sense that they aren't collected and don't show up in our arhived data?
16:20:22 <hiro> it means they couldn't be downloaded for some reason
16:20:31 <hiro> and collector will queue and try again later
16:20:33 <GeKo> *archived
16:20:38 <GeKo> aha, okay
16:20:50 <GeKo> so it's not as bad as actually losing them?
16:22:15 <hiro> I am not sure if at the end of the data data on those relays is recovered somehow or we just lost it
16:22:33 <hiro> I guess we should go over the archived data to know and see if we are missing
16:22:35 <hiro> anything
16:22:47 <hiro> s/data/day
16:23:30 <GeKo> i guess that would help me at least understanding how severe the problem is
16:24:09 <hiro> ok then this is what I am going to do tomorrow. I guess it makes sense to know for the next experiment
16:24:16 <hiro> even if it is just the experiment
16:24:47 <GeKo> yep, thanks
16:25:08 <GeKo> sounds like a good next step
16:25:32 <GeKo> and then we can wait this week and check whether things get better once the experiment is off
16:26:57 <GeKo> okay, that's all i had
16:27:04 <GeKo> do you have anything else?
16:27:09 <hiro> nope I am fine
16:27:14 <GeKo> great!
16:27:21 <GeKo> thanks and ttyl o/
16:27:25 <GeKo> #endmeeting