14:58:48 <karsten> #startmeeting metrics team meeting
14:58:48 <MeetBot> Meeting started Thu May 28 14:58:48 2020 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:58:48 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:58:56 <karsten> https://pad.riseup.net/p/tor-metricsteam-2020.1-keep <- agenda pad
14:59:11 <karsten> please add topics you want to discuss today.
14:59:41 <acute> hi!
14:59:46 <karsten> hi!
15:00:07 <gaba> hi!
15:00:11 <karsten> hi!
15:01:45 <karsten> giving another minute or so for people to add more topics.
15:01:55 <gaba> ok
15:02:44 <karsten> alright.
15:02:50 <karsten> let's start!
15:02:55 <karsten> Find out why onion service measurements have gotten slower (#34303)
15:03:09 <karsten> so, it looks like this is soon going to be resolved.
15:03:21 <karsten> right not it's in network team land, I think.
15:03:21 <gaba> nice!
15:03:25 <karsten> right now*
15:03:30 <acute> really good catch!
15:03:35 <karsten> :)
15:03:53 <karsten> if you're still running op-ab, I think you can stop that now. but don't delete the instance just yet.
15:04:06 <acute> understood
15:04:12 <karsten> thanks!
15:04:20 <karsten> so, the question on the pad:
15:04:25 <karsten> How do we catch future '#34303's?
15:04:42 <karsten> monitoring with expected bounds?
15:05:05 <karsten> where we would have to define these bounds per instance.
15:05:25 <karsten> and if measurements get faster or slower, we'll put out a warning.
15:05:33 <dennis_jackson> This would be my suggestion. There's obviously a lot of incoming time series, there's no way the metrics team can catch them all by manual inspection
15:05:34 <karsten> something like that?
15:05:34 <acute> we could have some instances running the latest op/tor code, that get updated when the software does?
15:05:57 <dennis_jackson> I think that's a really nice idea. Diversity in Tor version and/or config
15:06:45 <dennis_jackson> #34257 is also quite related to this - where Karsten's eye catches some strange behaviour
15:07:34 <acute> we would at least see if software changes produce changes in our results
15:08:01 <karsten> note that the current state of things is that we don't have any real monitoring in place.
15:08:32 <karsten> monitoring the three long-running instances is a matter of me running a local script that fetches the latest tgen logs, greps them, and I look at the last heartbeat message.
15:09:07 <karsten> but yes, it's good to keep this in mind when we improve our monitoring capabilities.
15:09:43 <acute> hmm, reminds me of #28271
15:09:46 <karsten> and yes, #34257 could be part of this, too. another metric to keep an eye on. though harder to spot.
15:10:35 <karsten> yeah, #28271 needs more attention.
15:11:33 <karsten> it's on the list, so we're not going to forget about it.
15:12:15 <karsten> okay. noted as something to keep in mind in the future! moving on?
15:12:59 <karsten> Analyze unusual distribution of time to extend to first hop in circuit (#34257)
15:13:10 <karsten> this one is yet unresolved.
15:13:33 <karsten> one question here: do we still need that other hong kong instance?
15:13:52 <karsten> and do we need other measurements?
15:14:28 <karsten> and do we need other measurements? <- last thing I wrote before you left.
15:14:39 <dennis_jackson> Sorry, lost connection for a moment
15:15:02 <dennis_jackson> To my mind, not immediately
15:15:29 <karsten> what's the best way forward to further investigate this?
15:16:15 <dennis_jackson> I was thinking of sitting down with the raw logs and looking at the actual initiation times
15:16:24 <karsten> this is also less urgent than #34303.
15:16:36 <karsten> yes, that makes sense.
15:16:49 <dennis_jackson> To me, it really feels like a onionperf bug, because the performance is definitely not this bad with Tor Browser
15:17:21 <karsten> do you have times to compare?
15:17:28 <karsten> circuit build times, that is?
15:17:52 <karsten> but why would those be different?
15:18:01 <dennis_jackson> I think I have start2req times from GCP Instances in Hong Kong and they look normal
15:18:18 <karsten> ah, I was thinking of the time to build the first hop.
15:18:41 <dennis_jackson> Well I think the error for start2req is much larger in magnitude
15:18:46 <karsten> the start2req suffer from #34303, of course.
15:19:00 <karsten> in the onion case.
15:19:56 <dennis_jackson> hm.could #34303 not also be impacting the timings here?
15:20:05 <karsten> yes!
15:20:14 <karsten> well, the start2req.
15:20:26 <karsten> unclear about circuit build times.
15:20:54 <karsten> the impact could be that newly built circuits are different from preemtively built circuits.
15:20:59 <karsten> well, I don't know.
15:21:19 <dennis_jackson> me neither, but I think we are on the same page that we need to cast the net deeper rather than wider
15:21:22 <karsten> should I change the ec2 hong kong instance to run a #34303-patched tor version?
15:21:35 <dennis_jackson> Ah, that would be great
15:21:52 <karsten> okay, I'll do that and let it collect measurements over the next days.
15:22:26 <karsten> great!
15:22:34 <dennis_jackson> Fantastic, I next hope to scrape together a few hours for Tor analysis on Saturday morning and happy to have a look if there's some data available by then
15:22:37 <karsten> so many mysteries. and we're only starting here.
15:22:45 <dennis_jackson> Haha, indeed
15:23:02 <karsten> yup, will add measurements by friday evening then.
15:23:24 <dennis_jackson> great
15:23:37 <karsten> cool! moving on:
15:23:41 <karsten> Harmonize TTFB/TTLB definitions with Tor Metrics plots (#34215)
15:23:59 <karsten> maybe we can decide what to do here.
15:24:40 <karsten> I'd like to change the onionperf graphs to show the same TTFB/TTLB as the metrics website graphs.
15:25:00 <karsten> if we don't do that, we'll need to make a new plan.
15:25:19 <dennis_jackson> I have never used onionperf to do plotting, so I can't stake much of a comment
15:25:20 <karsten> any objections here to make that change? (the patch is trivial.)
15:25:24 <dennis_jackson> Harmonising sounds great though
15:25:35 <karsten> ok.
15:25:42 <karsten> acute: what do you think?
15:26:01 <acute> have just had a look at this
15:26:53 <acute> I think it makes sense to include tor part of the measurement in the total time, so I'd say we should do it
15:27:01 <karsten> great!
15:27:29 <karsten> I'll go ahead then. thanks!
15:27:36 <karsten> Split visualizations into public server vs. v2 onion server vs. v3 onion server measurements (#34216)
15:27:50 <karsten> this is another important change to the visualizations to make them actually useful.
15:28:10 <karsten> before this change, all measurements would be plotted together; but that doesn't work so well with public+onion measurements.
15:28:26 <karsten> this is less about the decision to do it, but about the code to review.
15:28:34 <acute> happy to review this
15:28:38 <karsten> it touches all visualizations in the onionperf code.
15:28:50 <karsten> that would be wonderful!
15:29:02 <karsten> at least the changes are pretty much the same for all graphs there.
15:29:14 <acute> cool, I'll accept it :D
15:29:21 <karsten> yay! :)
15:29:31 <karsten> thanks!
15:29:46 <karsten> Update metrics-web to only plot "official" data (#33397)
15:30:01 <karsten> I had this on the agenda for last week, but we ran out of time.
15:30:22 <karsten> I was thinking that we might want to reconsider archiving all measurements in collector.
15:30:38 <karsten> we're doing that with long-running instances, and we should keep doing that.
15:30:48 <karsten> but I'm less sure about experimental measurements.
15:30:59 <karsten> like the ones I did for #34303 and #34257.
15:31:16 <karsten> if we want to archive them, we'll want to archive more than just the .json files.
15:31:37 <karsten> I only found the issue in #34303 by reading the tor logs, for example.
15:32:14 <karsten> the question is whether we should define some guidelines for ourselves rather than build a tool.
15:32:38 <karsten> we could say that we archive a tarball of the onionperf-data/ directory after running an experiment and put that somewhere.
15:33:14 <karsten> it's just a thought.
15:33:17 <dennis_jackson> I think it is not unlikely that there would be a need for long term non-plotted measurements. But maybe there is no rush to do the work required to support that
15:33:29 <acute> experimental measurements tend to generally be more short-lived
15:33:42 <acute> so we should think about how long we keep the data for as well
15:34:33 <dennis_jackson> But experiments ran for one purpose can be useful for others
15:35:06 <karsten> I'm not yet sure about long term non-plotted measurements.
15:35:13 <dennis_jackson> E.g. when I looked back at latency measurements in the early 2010s, I would have loved to have additional high resolution samples for shorter periods.
15:35:16 <karsten> how would they differ from short-term measurements?
15:36:27 <dennis_jackson> Well, maybe you want to run OnionPerf on {X,Y,Z} Tor versions with a set of different configs
15:36:55 <karsten> yes, but we could do that.
15:37:02 <dennis_jackson> But maybe only the current release with normal config should be plotted as official?
15:37:21 <karsten> right now, we tell collector which onionperf .json files to fetch and archive.
15:37:29 <karsten> and everything that collector archives goes on the metrics website.
15:37:54 <karsten> these other long-term measurements would then run, but not be archived by collector.
15:38:03 <karsten> the files could still be available via their own web server.
15:38:05 <dennis_jackson> How would they be distributed?
15:38:56 <dennis_jackson> Okay, well, I do think having things live in Collector is easier for downstream users, but I totally see it would be effort to implement
15:39:33 <acute> karsten: this sounds like a good compromise
15:39:37 <karsten> okay. I guess we'll have to reconsider as we learn which are our main use cases.
15:39:46 <karsten> good to hear. :)
15:40:11 <karsten> okay, moving to the last topic:
15:40:12 <karsten> Fix message logging and filtering (#29369)
15:41:00 <karsten> this is "implementation-ready". :) but I'm not sure if you're looking for more work right now.
15:41:10 <karsten> maybe I should ask phw if he's interested.
15:41:21 <karsten> I hear friday is his onionperf day.
15:41:42 <karsten> let me try that.
15:41:56 <karsten> Anybody need any help with anything?
15:42:01 <karsten> last topic on the agenda.
15:42:06 <karsten> good question!
15:42:07 <phw> karsten: sure, i can take that
15:42:12 <karsten> hey!
15:42:22 <karsten> perfect!
15:42:42 <acute> :)
15:42:58 <karsten> anything we can do to unblock anyone here?
15:43:33 <acute> things are ok for me at the moment, thank you very much for all the feedback!
15:43:44 <karsten> thank you for all the input! :)
15:43:52 <dennis_jackson> All good here. Could just do with another few days in the week
15:43:58 <acute> haha
15:44:17 <karsten> that would be cool!
15:44:30 <karsten> but you would turn them into weekdays, not weekend days? ok.
15:44:50 <karsten> great!
15:45:00 <dennis_jackson> I think it'd let me turn the weekends back into actual weekends but yes :P
15:45:03 <karsten> if something comes up before the next meeting, just use email or trac.
15:45:12 <karsten> heh, good point!
15:45:28 <acute> dennis_jackson: indeed
15:45:47 <karsten> thanks, everyone! have a good rest of the week and a wonderful weekend!
15:45:52 <karsten> bye! o/
15:45:59 <acute> bye!
15:46:03 <dennis_jackson> o/ :)
15:46:17 <karsten> #endmeeting
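
Editor's note, not part of the meeting log: the "monitoring with expected bounds" idea raised at 15:04 (bounds defined per instance, with a warning when measurements get faster or slower) might look roughly like the sketch below. The instance names, the bound values, the choice of the median as the summary statistic, and the function name `check_instance` are all assumptions made for illustration; none of them come from OnionPerf or from the discussion.

```python
from statistics import median

# Hypothetical per-instance bounds, in seconds, for some timing metric.
# Names and numbers are placeholders, not real instances or measured values.
EXPECTED_BOUNDS = {
    "op-example-1": (0.5, 3.0),
    "op-example-2": (0.3, 2.0),
}


def check_instance(instance, samples, bounds=EXPECTED_BOUNDS):
    """Return a warning string if the median sample leaves the bounds, else None."""
    if not samples:
        # An instance that reports nothing is also worth a warning.
        return f"{instance}: no measurements"
    low, high = bounds[instance]
    m = median(samples)
    if m < low:
        return f"{instance}: median {m:.2f}s is faster than expected (< {low}s)"
    if m > high:
        return f"{instance}: median {m:.2f}s is slower than expected (> {high}s)"
    return None


if __name__ == "__main__":
    # Print a warning only when an instance drifts out of its bounds.
    warning = check_instance("op-example-1", [4.0, 5.0, 6.0])
    if warning:
        print(warning)
```

Warning in both directions follows karsten's remark that a change toward faster results is as noteworthy as one toward slower results.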