14:58:26 <karsten> #startmeeting metrics team meeting
14:58:27 <MeetBot> Meeting started Thu Sep 10 14:58:26 2020 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:58:27 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:58:30 <karsten> okay, let's start!
14:58:32 <mikeperry> o/
14:58:33 <karsten> https://pad.riseup.net/p/tor-metricsteam-2020.1-keep
14:58:36 <karsten> hi mikeperry!
14:58:55 <karsten> are there more topics for today's agenda?
14:59:48 <karsten> unfortunately, acute cannot be here today. we briefly talked before the meeting. she'll read the notes.
15:00:27 <karsten> okay, let's start with the agenda.
15:00:33 <karsten> Priority on OnionPerf project to finish in the next week.
15:00:48 <karsten> it's indeed just 1.5 weeks until I go on leave.
15:01:09 <karsten> we should try to get one more onionperf version out by then. and deployed.
15:01:36 <karsten> we should probably try to get all features merged by next thursday.
15:01:48 <mikeperry> I successfully built 0.7ish. started it up last night
15:01:52 <karsten> and postpone everything that's not ready.
15:01:58 <karsten> yay!
15:02:12 <karsten> 0.7ish means master?
15:02:16 <mikeperry> yah
15:02:21 <karsten> okay, cool!
15:02:30 <mikeperry> c8275b25e4afda9328634ec6be56ff46c7ee1cfe
15:02:48 <karsten> yep.
15:02:52 <karsten> any surprises?
15:03:33 <mikeperry> the path to tgen was slightly off from the docs. there is an additional 'src' in my tgen build
15:03:40 <mikeperry> otherwise it was mostly cut+paste
15:04:01 <mikeperry> https://gitlab.torproject.org/tpo/metrics/onionperf#starting-and-stopping-measurements is the section with the tgen path
15:04:03 <karsten> will check that 'src' thing. there was a change between tgen 0.0.1 and 1.0.0.
15:04:15 <karsten> ok.
15:04:42 <mikeperry> i am on 048bcc8e2421320c9a27763612b82e86c3c6e683 for tgen fwiw
15:05:05 <karsten> yep, that's the latest.
15:05:12 <karsten> I'll check that.
15:05:49 <karsten> let us know how this works out, including analysis/visualization.
15:06:29 <karsten> shall we go through the open issues for 0.8?
15:06:56 <karsten> https://gitlab.torproject.org/tpo/metrics/onionperf/-/boards
15:07:14 <karsten> tpo/metrics/onionperf#33260 is almost done.
15:07:50 <karsten> acute and I discussed some final tweaks today. we're good there.
15:07:55 <karsten> just a final revision and review.
15:08:25 <karsten> tpo/metrics/onionperf#33421 has a comment from mikeperry that I just replied to.
15:08:44 <karsten> I think the concept is clear now, this needs some code cleanup, then review, possibly revisions, and the merge.
15:08:51 <karsten> on track for 0.8, I'd say.
15:09:10 <karsten> tpo/metrics/onionperf#40001 is also in good shape.
15:09:45 <karsten> there might still be placeholders in the new documentation for parts that we'll write later, but the important stuff will be documented.
15:09:52 <karsten> also in time for 0.8.
15:10:11 <karsten> tpo/metrics/onionperf#33420 is more critical.
15:10:20 <karsten> we may not have enough time to do this properly.
15:10:45 <karsten> I'd like to leave it at the end just in case there's still time to do it.
15:10:54 <karsten> but we shouldn't rush this, or we won't do it right.
15:10:54 <mikeperry> is it possible to have onionperf just record the BUILDTIMEOUT_SET line directly on the side?
15:11:07 <mikeperry> that might be enough for initial testing/experiments
15:11:19 <karsten> onionperf should already log that event.
15:11:33 <karsten> I'll have to check, but I believe it's written to the .torctl.log files.
15:11:42 <karsten> that would be in `onionperf measure`.
15:11:54 <mikeperry> ok
15:11:56 <karsten> onionperf would not do anything with that event though.
15:12:14 <karsten> in the analyze/visualize modes.
15:12:31 <karsten> you could check in your logs.
15:13:16 <karsten> okay, that's all about those last remaining four issues.
15:13:22 <mikeperry> that might be ok. we should not be changing the tuning parameter (cbtquantile) that much.. logs may suffice
15:13:31 <mikeperry> so it is fine for that one to be lower priority
15:13:37 <karsten> sounds good.
15:14:04 <karsten> in fact, having some real data would make it easier to implement that feature.
15:14:16 <karsten> so, in 2-3 months it might be easier to build it.
15:14:42 <karsten> let's briefly discuss one thing about tpo/metrics/onionperf#40001 here:
15:15:06 <karsten> acute and I have been talking about serving measurement data tarballs from somewhere.
15:15:21 <karsten> those tarballs are ~100G right now and more in the future.
15:15:34 <karsten> one option would be asking for a tp.o host.
15:15:38 <karsten> another option would be S3.
15:16:08 <karsten> where S3 would be related to also moving instances to AWS.
15:16:21 <mikeperry> and that is the data needed to re-graph and filter results?
15:16:43 <karsten> no, that data is tiny in comparison.
15:16:46 <karsten> it would be the full logs.
15:17:04 <karsten> tarballs of the onionperf-data/ directories produced by `onionperf measure`.
15:17:28 <karsten> those would be relevant for extracting other parts using `onionperf analyze` than we're extracting right now.
15:17:37 <karsten> or for grepping/parsing the logs for other things.
15:18:14 <karsten> the json files required for filtering/re-graphing would still be archived by collector.
15:19:00 <karsten> is this a discussion to have with the admins?
15:20:01 <karsten> I'll bring it up there. :)
15:20:27 <mikeperry> hrm my tbb is failing to download the instructions.md from 40001
15:21:02 <mikeperry> just says failed in the tbb download manager :/
15:21:15 <karsten> :(
15:21:21 <karsten> what about the .html?
15:22:14 <mikeperry> aha I was in some other downloads directory other than system one. permissions issue :)
15:23:16 <karsten> okay.
15:23:35 <karsten> we'll work on making those instructions even more accessible then. ;)
15:23:49 <mikeperry> so my main concern right now is how do I graph and examine my custom onionperf instance data
15:23:57 <mikeperry> do I need my own collector instance for that?
15:24:02 <karsten> no!
15:24:17 <karsten> just use `onionperf visualize` for that.
15:24:29 <karsten> README.md has some instructions.
15:24:40 <mikeperry> ok and if I want to add any custom graphs? are there examples?
15:24:41 <karsten> that mode produces a PDF file and a CSV file.
15:25:17 <karsten> hmm. you'll probably want to look at onionperf/visualization.py and go from the existing code.
15:25:40 <karsten> or you could take the CSV file and use another graphing tool that you're more familiar with.
15:26:13 <karsten> and if you need even more, you could always ask acute or phw_ for help at these meetings.
15:26:19 <mikeperry> I am a newbie-level grapher.. so any examples, especially python ones, will help me get going
15:26:51 <mikeperry> I've used python things before.. stuff in numpy I think
15:27:20 <karsten> take a look through https://gitlab.torproject.org/tpo/metrics/onionperf/-/blob/master/onionperf/visualization.py
15:27:30 <karsten> the part about extracting data is craziness.
15:27:43 <karsten> but the parts about visualization things are quite readable.
15:28:08 <karsten> I think it's easiest to modify the code there and re-run the visualize mode.
15:28:09 <mikeperry> does an 'onionperf analyze' step always have to run before 'visualize'?
15:28:14 <karsten> no.
15:28:33 <karsten> the output from `onionperf analyze` is the json file that is the input to `onionperf visualize`.
15:28:52 <karsten> or json file_s_. you can have directories of those as input to `onionperf visualize`.
15:29:20 <mikeperry> has dennis_jackson worked with this data? he is very good at dataviz I have noticed. perhaps I can pester him too
15:29:27 <karsten> oh, right!
15:29:33 <karsten> he can for sure help.
15:30:50 <karsten> okay, let's move to the second agenda item?
15:31:22 <alsmith> sounds good to me.
15:31:26 <gaba> if nothing else you could use https://app.rawgraphs.io/ with a csv :P
15:31:28 <gaba> sounds good
15:31:47 <karsten> Funding proposal for next OnionPerf phase (link in the mail that was sent).
15:32:14 <alsmith> we have an opportunity to apply for amazon’s ‘AWS imagine grant’
15:32:14 <karsten> so, this funding would be for AWS resources and development?
15:32:21 <karsten> please go ahead.
15:32:45 <alsmith> yes - as karsten said, this would be for a grant that involves AWS resources and development.
15:33:40 <alsmith> i know we would like to increase the geographic diversity of the network by spinning up new onionperf instances — and we could pay for the work involved with that, plus a year of aws hosting as one objective
15:33:52 <karsten> yay!
15:34:07 <gaba> that would be awesome
15:34:14 <alsmith> but there’s more $ available. i spoke with gaba and we brainstormed — would it be possible to take the onionperf work from the old MOSS proposal (work that hasn’t been done yet) and glue these ideas together
15:34:29 <alsmith> ?*
15:34:54 <karsten> good question.
15:35:09 <karsten> regarding hosting, does there have to be a one year limit?
15:35:19 <alsmith> in the pad, i took two objectives from our old versions of the MOSS proposal and copy/pasted
15:35:19 <karsten> stated differently, what happens when that year is over?
15:35:42 <alsmith> karsten - yes, it is limited to a year. so we would need to be sure we can pay for the ~$4k out of tor’s pocket when that is over
15:36:34 <karsten> regarding development work, we already did a lot in the past few months.
15:36:51 <karsten> the parts that I suggested in my mail were related to large-scale deployment and monitoring.
15:37:16 <karsten> given how long we took last week to set up the latest set of onionperf instances,
15:37:33 <karsten> it would be really important to automate that more if we go from 3 to 9 instances or more.
15:38:10 <alsmith> ok, that makes sense. so step 0 to ‘increasing onionperf geographical diversity’ is automation work
15:38:11 <karsten> same with monitoring that those 9 instances stay online.
15:38:18 <karsten> yes, I think so.
15:39:21 <alsmith> is there visualization improvement work that we could include? i think making the results of this project publicly consumable will be important
15:40:26 <karsten> maybe we should work on the Tor Metrics graphs for this.
15:40:32 <mikeperry> how about stability and monitoring work? things like failover instances, data merging in the event of failure, etc?
15:40:53 <karsten> right now they're designed for 3 instances, but we already reach a limit there with changing sets of 3 instances over time.
15:41:38 <mikeperry> missing gaps in our measurements is a big problem I have had while casually digging on https://metrics.torproject.org
15:42:08 <karsten> I think stability has become better in the past few months.
15:42:16 <mikeperry> idk if/what more can be done
15:42:27 <karsten> monitoring is important.
15:42:32 <karsten> we can do more there.
15:42:47 <alsmith> your mail talks about Monit
15:42:51 <karsten> yep.
15:43:30 <karsten> other than that I think we should have enough data that we can tolerate missing data from single failing instances.
15:43:38 <karsten> well, make sure we have enough data, that is.
15:44:19 <karsten> in all these scaling considerations we'll have to keep in mind that resources are available for 1 year only.
15:44:35 <mikeperry> ah yes
15:44:40 <karsten> if we scale too much now, we'll have to scale down more in 1 year.
15:44:40 <alsmith> right, yes
15:45:11 <gaba> or it will be hard to maintain.
15:45:13 <karsten> I mentioned serving data in my mail.
15:45:26 <karsten> we'll want to serve measurement data of all these instances.
15:45:36 <karsten> we need processes and documentation and guidelines for that.
15:46:05 <karsten> we already need this for our three instances, but it will be more work for 9 or even more instances.
15:46:21 <karsten> this will be clearer once tpo/metrics/onionperf#40001 is a thing.
15:46:23 <alsmith> would that kind of work go under a ‘visualization’ objective? sorry, i’m not totally sure what serving the data means — like getting it to metrics.tpo?
15:46:42 <karsten> making sure that tarballs go to S3 and are linked on the right pages.
15:47:06 <karsten> together with configuration details, maybe after passing a validation script that everything in the tarball is good data.
15:47:19 <karsten> this will be even more important for experiments.
15:47:24 <alsmith> i see
15:47:33 <alsmith> processing
15:47:39 <karsten> the boring part of doing experiments: documenting what you did.
15:48:33 <gaba> right now we have 4 objectives that would be important (they are in the meeting pad). Where would documentation go?
15:48:39 <gaba> nevermind
15:48:41 <gaba> obj 4
15:48:54 <alsmith> i added it to obj 4 — but that can be moved if it doesn’t make sense
15:48:58 <karsten> what's the difference between 1 and 2?
15:49:29 <alsmith> my understanding is that we need to develop automated deployment tools
15:49:33 <karsten> 1 is writing the scripts, and 2 is executing them and making sure everything's deployed?
15:49:35 <alsmith> then use them to deploy 9 instances
15:49:51 <alsmith> right, yes
15:50:03 <alsmith> does that make sense?
15:50:04 <gaba> right. They could be combined in one obj
15:50:16 <alsmith> got it
15:50:24 <karsten> hmm.
15:50:38 <karsten> they can be separate, I just didn't understand the difference. I do understand now.
15:50:47 <karsten> maybe objective 2 should be at the start.
15:50:58 <karsten> that's what we want to do: have more measurements in more places.
15:51:22 <karsten> automating this should happen early in the project, but we might start with setting up things manually and improving automation over time.
15:51:36 <karsten> same with monitoring. we would start with the simple monitoring we do now and improve over time.
15:51:48 <karsten> the important thing is to start doing measurements as soon as we have the resources available.
15:52:03 <karsten> and note how we set up a new set of measurement instances every month right now.
15:52:20 <karsten> we would likely start with a manual setup on day 1, even if that takes the whole day.
15:52:29 <karsten> and be happy how it only takes 2 hours the month after.
15:52:42 <gaba> ok
15:53:25 <karsten> we should include acute in this conversation.
15:53:40 <karsten> she probably has many ideas on the automation part.
15:54:09 <karsten> phw_: do you have thoughts on scaling up monitoring if we have 9 or more onionperf instances?
15:54:31 <mikeperry> ooh could we use aws for large shadow simulations?
15:54:40 <karsten> oh, maybe!
15:54:43 <mikeperry> that could be a temporary usage of the 1yr capacity
15:54:49 <karsten> absolutely.
15:55:02 <mikeperry> I will need machines for sims like that
15:55:22 <gaba> as next step alsmith: you are ok adding all this to the google doc where we have the proposal? and we all continue writing it there?
15:55:29 <mikeperry> the previous plan was to beg pastly but I bet that would be complicated because NRL
15:55:40 <alsmith> gaba - yes
15:56:07 <karsten> mikeperry: maybe we'll need an objective for making sure that shadow and onionperf results are comparable in some regard.
15:56:26 <karsten> rather than just "give us resources so that we can run large simulations."
15:56:36 <phw> karsten: right now we have a single monitor that checks our instances, right?
15:56:53 <karsten> phw: we have a monitor on each instance, and they all check themselves and each other.
15:56:57 <alsmith> what’s the best way to flesh out these objectives and activities outside of the meeting, once we move it to the google doc? i need to jump to another meeting in 2 min
15:57:12 <karsten> phw: but we do not have a central monitoring instance.
15:57:36 <karsten> alsmith: can we invite everyone who participated in this discussion, plus acute, and talk more via mail/gdoc?
15:57:40 <phw> karsten: i have nothing useful to add off the top of my head. my monit experience is based on a central monitoring instance
15:57:44 <gaba> alsmith: what about changing the objectives and proposal based on what we talked about now and then continue in the email?
15:57:58 <karsten> phw: okay!
15:57:59 <alsmith> gaba & karsten - sounds good
15:58:05 <karsten> we should also ask hiro!
15:58:08 <gaba> we can call a voice meeting if we need to after the discussion on email
15:58:14 <alsmith> ok!
15:58:17 <karsten> great!
15:58:21 <karsten> gotta end the meeting now.
15:58:28 <karsten> thanks, everyone! bye! o/
15:58:31 <alsmith> thank you everyone o/
15:58:34 <gaba> o/
15:58:34 <karsten> clearing the channel in 5, 4, ...
15:58:38 <karsten> #endmeeting
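
Notes after the meeting:

Related to the 15:10-15:13 exchange about BUILDTIMEOUT_SET: a minimal sketch for checking whether an instance's .torctl.log files already contain that control event, as karsten suggested. This is not part of OnionPerf itself; the onionperf-data/ path and the "*torctl*" filename pattern are assumptions about the default layout and may need adjusting to your own setup.

    # Sketch: scan torctl logs produced by `onionperf measure` for
    # BUILDTIMEOUT_SET control events. Path and filename pattern are
    # assumptions about the default onionperf-data/ layout.
    import glob

    for path in sorted(glob.glob("onionperf-data/**/*torctl*", recursive=True)):
        with open(path, errors="replace") as log:
            for line in log:
                if "BUILDTIMEOUT_SET" in line:
                    print(f"{path}: {line.rstrip()}")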
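Related to the 15:24-15:27 exchange about custom graphs: a minimal Python sketch (not from the OnionPerf docs) for drawing a custom graph from the CSV file that `onionperf visualize` writes, as an alternative to modifying onionperf/visualization.py. The file name "onionperf.viz.csv" and the column names "start" and "time_to_last_byte" are placeholders; check the real CSV header first and substitute the columns you actually want to plot.

    # Sketch: plot one column of the visualize-mode CSV with pandas/matplotlib.
    # File name and column names below are placeholders, not the real schema.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("onionperf.viz.csv")
    print(df.columns.tolist())                       # inspect available columns

    df["start"] = pd.to_datetime(df["start"])        # placeholder column name
    ax = df.plot(x="start", y="time_to_last_byte",   # placeholder column name
                 style=".", figsize=(10, 4))
    ax.set_ylabel("time to last byte (s)")
    plt.tight_layout()
    plt.savefig("custom-graph.png")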