16:58:54 <ahf> #startmeeting Network team meeting, 31st May 2022
16:58:54 <MeetBot> Meeting started Tue May 31 16:58:54 2022 UTC.  The chair is ahf. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:58:54 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:58:58 <ahf> yoyo
16:59:07 <jnewsome> o/
16:59:11 <ahf> welcome to the last meeting in may, which means on monday we have the s61 meeting on its own
16:59:17 <ahf> our pad is at https://pad.riseup.net/p/tor-netteam-2022.1-keep
16:59:26 <dgoulet> o/
16:59:36 <Diziet> o/
16:59:46 <mikeperry> o/
16:59:57 <nickm> hi all!
17:00:13 <ahf> o/
17:00:37 * ahf is completely off today since it's tuesday
17:00:58 <ahf> how are folks doing with their boards:     Board: https://gitlab.torproject.org/groups/tpo/core/-/boards
17:01:17 <dgoulet> stable
17:01:50 <ahf> LOL, that is a good way to look at it
17:01:54 <ahf> exponential growth this week
17:02:35 <nickm> okay with me
17:02:46 <ahf> we are entering the last month of Q2 now, so this week is a good week to look a bit at whether the Q2 situation for yourself looks good
17:02:48 <nickm> hoping that i can find more stuff to work on actually :)
17:02:59 <ahf> nice
17:03:16 <dgoulet> good problem to have! :)
17:03:43 <ahf> nickm: at some point i would really love to have a sync with you, dgoulet, and gaba about tpo/core/tor and see how many tickets we can close and/or turn into arti/torspec specific tickets
17:03:53 <ahf> we have ~850 tickets in tpo/core/tor right now
17:04:09 <dgoulet> +1
17:04:47 <ahf> ok, don't see anything off with the board
17:04:53 <ahf> dgoulet: anything on tor releases this week?
17:05:08 <dgoulet> nope
17:05:09 <ahf> i think you are waiting for some merges from me if we want to roll out any updates there?
17:05:12 <ahf> cool!
17:05:24 <dgoulet> yeah, no hurry, but for sure a 045 and 046 are coming
17:05:29 <ahf> ya
17:05:31 <ahf> excellent
17:05:52 <ahf> we don't have anything incoming
17:06:30 <ahf> we have a discussion item:
17:06:32 <ahf> [2022-05-31] [nickm] I've been asked when we plan to build the improved MyFamily. Thoughts?
17:07:03 <ahf> i do not have any specific thoughts there other than i remember you had a design for it
17:07:31 <nickm> It is in exactly the wrong spot wrt fundable-ness: it is too small to be its own proposal, but too big to just do in a day.
17:07:49 <dgoulet> prop321 ?
17:07:54 <nickm> I guess it would take around two weeks to do it in C and arti.
17:08:04 <nickm> dgoulet: yes
17:08:31 <nickm> the benefit is that (eventually) it saves a lot of bandwidth, and that it enables bridges to meaningfully belong to families.
17:08:41 <nickm> It is a prerequisite for walking onions
17:09:16 <ahf> would you be interested in trying to fit it in, since you are looking for some tasks right now?
17:09:55 * juga joined
17:10:16 <ahf> o/ juga
17:10:26 <nickm> Hm.  I'll think about it, but I think it would be good to work ahead on arti stuff instead.
17:10:43 <nickm> If I wind up way ahead of schedule on Q2 arti stuff then I'll consider?
17:10:56 <ahf> nickm: oki, i have no rush on this at all. wouldn't it be something we could batch up with the walking onion grant proposal(s)?
17:10:58 <ahf> ya
17:10:59 <ahf> sounds good
17:11:30 <ahf> i think mikeperry can do s61 next then?
17:11:40 <mikeperry> ok
17:12:21 <mikeperry> so hiro and I took a look at the onionperf instances with congestion control: https://gitlab.torproject.org/tpo/network-health/analysis/-/issues/37
17:12:56 <mikeperry> it looks like they are only 1.5-2X faster than non-congestion control, which is less than shadow predicted. (it predicted 3-4X)
17:13:33 <mikeperry> I am not sure if this is because everyone has not upgraded yet, or for other reasons
17:14:01 <mikeperry> it is def odd that consumed bandwidth has not risen, despite TBB upgrade: https://metrics.torproject.org/bandwidth-flags.html?start=2022-05-20&end=2022-05-31
17:14:53 <mikeperry> we can make the congestion control params more aggressive.. that is one option
17:15:21 <mikeperry> we could also see what shadow says if we set our exact exit upgrade fraction, but hold back all non-perfclients from upgrading
17:15:52 <GeKo> the consumed bw data on metrics goes until (and including) 05/28, hrm
17:16:17 <mikeperry> it is a bit of a head scratcher
17:16:56 <ahf> since the TB upgrade is being distributed now, does it make sense to wait a little bit and see if this corrects itself (the consumed bw gets a larger %)?
17:17:24 <GeKo> even if tor browser usage were not the main usage of tor, i think it should still be visible on the consumed bw graph
17:17:31 <ahf> ok
17:17:50 <GeKo> so, yeah, maybe waiting a bit more?
17:18:17 <GeKo> otherwise we could maybe look at the data manually and figure out whether we have a bug in the consumed bw graph?
17:18:31 <jnewsome> the sim idea is interesting; would give us a bound for what "low uptake" looks like in shadow
17:18:56 <mikeperry> jnewsome: is it possible with shadow to put all markovclients on 0.4.6, but all exits and perfclients on 0.4.7? that might be a similar situation to what we have now
17:19:13 <mikeperry> juga: can you run the exit consensus upgrade check script real quick?
17:19:28 <juga> 0.84
17:19:32 <jnewsome> mikeperry: I don't remember if that option exists in the pipeline now, but if not it'd be easy to add
17:19:44 <mikeperry> juga: thanks
17:19:49 <juga> np :)
17:20:44 <jnewsome> do we know that exits are mostly on 0.4.7? I guess we have that from the relay descriptors?
17:21:09 <mikeperry> jnewsome: that's the number juga just gave. we're at 84% on 0.4.7 by consensus weight
17:21:22 <jnewsome> ah cool
17:21:56 <mikeperry> so we could input that fraction for Exits on 0.4.7 in shadow, and just hold back all markov clients, and see what that looks like
17:22:02 <juga> mikeperry: not sure how important it is, but i don't see longclaw approximating to gabelmoo at https://metrics.torproject.org/totalcw.html and it's already using CC exits (since bwscanner_cc param was changed)
17:22:02 <jnewsome> does that mean we'd expect e.g. 84% of the onionperf measurements to be through an upgraded exit?
17:22:12 <dgoulet> hhmmm I have 63.46%
17:22:13 <dgoulet> 3946: 0.4.7             [63.46 %] (MAJOR)
17:22:19 <dgoulet> (that is weighted ^)
17:22:34 <dgoulet> (we can confirm after the meeting)
17:22:47 <juga> dgoulet: i filter the ones that allow exiting to 443 and don't have BAD flag
17:23:09 <ahf> hm, interesting with running a sim like that
17:23:12 <dgoulet> you filter out those that allow 443 ?
17:23:15 <mikeperry> there is also descriptor weight vs sbws "w Bandwidth=N Measured=1" line
17:23:45 <juga> dgoulet: yes, and i take consensus weight
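A minimal sketch of how such a weighted fraction can be computed with stem, assuming the filters described above (exits that allow port 443 and are not flagged BadExit, weighted by consensus bandwidth); the exact version threshold is an assumption, and this is not the actual network-health or sbws script:

    # Fraction of exit consensus weight on tor 0.4.7.x, restricted to exits
    # that allow port 443 and do not carry the BadExit flag.
    import stem.descriptor.remote
    from stem import Flag
    from stem.version import Version

    THRESHOLD = Version('0.4.7.0')  # assumption: any 0.4.7.x counts as upgraded

    total = 0
    upgraded = 0
    for relay in stem.descriptor.remote.get_consensus().run():
        if Flag.EXIT not in relay.flags or Flag.BADEXIT in relay.flags:
            continue
        if relay.exit_policy is None or not relay.exit_policy.can_exit_to(port=443):
            continue
        weight = relay.bandwidth or 0
        total += weight
        if relay.version is not None and relay.version >= THRESHOLD:
            upgraded += weight

    print('0.4.7 exit fraction by consensus weight: %.2f' % (upgraded / total))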
17:24:08 <jnewsome> can we directly check the onionperf logs to see which/how-many of the measurements went through an upgraded exit?
17:25:21 <mikeperry> it is some work. we'd need onionperf to listen to CIRC_BW events, and cross-reference those with CIRC events
17:25:41 <jnewsome> ok, yeah maybe not worth it yet then
17:26:16 <jnewsome> though if we have it start listening to CIRC_BW now the data will be there if we decide it's worth checking
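As a rough illustration of that idea (a sketch assuming stem and a local control port on 9051, not onionperf code): CIRC events carry the circuit path and CIRC_BW events carry per-circuit byte counts, so cross-referencing the two ties each measurement circuit to its exit:

    from stem.control import Controller, EventType

    exit_by_circ = {}   # circuit id -> exit fingerprint (from CIRC events)
    bytes_by_exit = {}  # exit fingerprint -> bytes read (from CIRC_BW events)

    def on_circ(event):
        # Remember the last hop of each circuit once it is built.
        if event.status == 'BUILT' and event.path:
            exit_by_circ[event.id] = event.path[-1][0]

    def on_circ_bw(event):
        # Attribute this circuit's byte counts to the exit recorded for it.
        exit_fp = exit_by_circ.get(event.id)
        if exit_fp is not None:
            bytes_by_exit[exit_fp] = bytes_by_exit.get(exit_fp, 0) + event.read

    with Controller.from_port(port=9051) as controller:
        controller.authenticate()
        controller.add_event_listener(on_circ, EventType.CIRC)
        controller.add_event_listener(on_circ_bw, EventType.CIRC_BW)
        input('collecting; press enter to stop\n')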
17:26:39 <mikeperry> but yeah that could be a factor. it did not look similar to the 10%, 25%, or 50% upgrade runs in terms of CDFs
17:26:51 <mikeperry> it looked like a larger fraction than that
17:27:44 <mikeperry> juga: longclaw and gabelmoo are both on 0.4.7.7 and sbws 1.5.2 now?
17:27:52 <dgoulet> juga: strange that I don't get the same :S ... would be curious to see your script so we can fix the health team helper scripts
17:28:16 <juga> mikeperry: i think gabelmoo is not using sbws 1.5.2 yet, i can check
17:28:39 <GeKo> (and i don't think it's on 0.4.7.7 either)
17:28:56 <juga> dgoulet: yes, i'm looking at exits with 2 in flowctrl, not the tor version, can pass you the link to the code in some secs
17:28:59 <GeKo> i'd assume sebastian would have notified the dir-auth thread otherwise
17:30:25 <juga> dgoulet: you would need to dig into the other functions too :/ https://gitlab.torproject.org/tpo/network-health/sbws/-/blob/m15/sbws/core/flowctrl2.py#L183
17:30:39 <dgoulet> awesome, thanks
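For reference, the check above keys off the consensus "pr" line (FlowCtrl=2) rather than the tor version; a hedged sketch of that filter, assuming stem exposes the "pr" line as a protocol-to-versions mapping (the linked flowctrl2.py is the authoritative implementation):

    def supports_cc(relay):
        # Assumption: relay.protocols maps protocol names to lists of
        # supported version numbers parsed from the consensus "pr" line.
        return 2 in (relay.protocols or {}).get('FlowCtrl', [])

    # In the loop from the earlier sketch, supports_cc(relay) can replace the
    # version comparison to reproduce the FlowCtrl-based filter instead.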
17:31:26 <mikeperry> so for sbws, yeah let's get those two upgraded. is bastet still yoloing?
17:31:43 <juga> faravahar you mean?
17:32:00 <mikeperry> I thought bastet was on 0.4.7 but not sbws 1.5.x
17:32:08 <juga> ah!
17:32:09 <GeKo> i think that's right
17:32:12 <juga> yes, i think so
17:32:45 <mikeperry> it is interesting that all of them went up around the time that cc_alg=2 was set
17:32:56 <mikeperry> maybe they are all yoloing on 0.4.7, despite the ask
17:33:06 <juga> i can check
17:33:23 <mikeperry> in which case, might as well just get them to yolo onto sbws 1.5.2 too :)
17:33:36 <juga> lol
17:34:24 <juga> only 2 using 0.4.7.7
17:34:30 <juga> (guess longclaw and bastet)
17:34:56 <jnewsome> mikeperry: you should be able to do the sim you want now with PL_TORV2_EXIT_BW_UP_FRAC and PL_TORV2_BG_CLIENT_BW_UP_FRAC
17:35:31 <mikeperry> very strange.. this could mean that the upgrade to congestion control freed up capacity for sbws to measure even with 0.4.6
17:35:47 <mikeperry> that could mean that in some cases, 0.4.6 can in fact out-compete congestion control
17:36:01 <mikeperry> which might explain some of this behavior
17:36:27 <mikeperry> jnewsome: very nice, live shadow hacks!
17:36:35 <mikeperry> ok I can try a sim after the meeting
17:36:38 <ahf> nice
17:36:43 <juga> mikeperry: yes, re. longclaw and bastet, only longclaw is using sbws 1.5.x
17:38:30 <mikeperry> GeKo: did the overload we saw last week go down? there was a spike but I think it was onion service noise again
17:38:48 <mikeperry> curious about that and any other net-health reports
17:39:05 <GeKo> yeah, as mentioned on another channel here is an updated graph:
17:39:07 <GeKo> https://share.riseup.net/#0_wdcsiggs-LI9ptk9LeWQ
17:39:17 <GeKo> so the guard overload seems indeed to go down
17:39:51 <ahf> hmmm
17:39:54 <GeKo> there is a spike in exit overloads where about 100 additional ones got added on 05/28
17:39:55 <mikeperry> GeKo: ooh but exit overload is increasing as of tbb update
17:40:02 <GeKo> i don't think so
17:40:03 <mikeperry> oh, so unrelated?
17:40:17 <GeKo> i looked at it and that's niftybunny's relays
17:40:18 <mikeperry> they got added and immediately were overloaded?
17:40:24 <GeKo> which are still on 0.4.6.8 o_O
17:40:43 <GeKo> and that version still had some bugs we fixed later on
17:40:43 <mikeperry> ohh so that had false positives in it still
17:40:50 <mikeperry> nifty indeed
17:41:02 <GeKo> yeah, i am inclined to think this is cc unrelated
17:41:35 <GeKo> so from the overload side things look okay-ish imo
17:41:54 <GeKo> i got no new reports from relay operators etc. complaining on irc or anywhere else either
17:42:31 <mikeperry> ok.. so I will run a sim or two with jnewsome's bg client hax and we can see if perf is similar to what we see now on live. in which case we should not mess with things
17:42:57 <mikeperry> but if shadow says it still should be faster.. hrmm.. I might get itchy to jack up the cc params, esp if overload stays low
17:43:42 <mikeperry> dgoulet: also I see https://gitlab.torproject.org/tpo/core/tor/-/issues/40620.. that one might be annoying to find.. the function it is in gets called from all over the place :/
17:44:07 <mikeperry> connection_start_reading()
17:44:25 <mikeperry> so one or more callpoints is probably not checking the XOFF state first
17:44:50 <dgoulet> right
17:44:52 <dgoulet> :S
17:45:25 <mikeperry> anyway it is doing the "right thing" there.. maybe it just should be rate limited
17:45:55 <mikeperry> or logged at info, idk
17:46:52 <mikeperry> ok I think that is all I have for s61
17:47:32 <ahf> nice
17:47:35 <ahf> anything else for today?
17:47:41 * juga is good
17:48:13 <GeKo> <- too
17:48:38 <ahf> let's call it then, thanks all for joining!
17:48:41 <ahf> #endmeeting