14:58:26 <karsten> #startmeeting metrics team meeting
14:58:27 <MeetBot> Meeting started Thu Sep 10 14:58:26 2020 UTC.  The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:58:27 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:58:30 <karsten> okay, let's start!
14:58:32 <mikeperry> o/
14:58:33 <karsten> https://pad.riseup.net/p/tor-metricsteam-2020.1-keep
14:58:36 <karsten> hi mikeperry!
14:58:55 <karsten> are there more topics for today's agenda?
14:59:48 <karsten> unfortunately, acute cannot be here today. we briefly talked before the meeting. she'll read the notes.
15:00:27 <karsten> okay, let's start with the agenda.
15:00:33 <karsten> Priority on OnionPerf project to finish in the next week.
15:00:48 <karsten> it's indeed just 1.5 weeks until I go on leave.
15:01:09 <karsten> we should try to get one more onionperf version out by then. and deployed.
15:01:36 <karsten> we should probably try to get all features merged by next thursday.
15:01:48 <mikeperry> I successfully built 0.7ish. started it up last night
15:01:52 <karsten> and postpone everything that's not ready.
15:01:58 <karsten> yay!
15:02:12 <karsten> 0.7ish means master?
15:02:16 <mikeperry> yah
15:02:21 <karsten> okay, cool!
15:02:30 <mikeperry> c8275b25e4afda9328634ec6be56ff46c7ee1cfe
15:02:48 <karsten> yep.
15:02:52 <karsten> any surprises?
15:03:33 <mikeperry> the path to tgen was slightly off from the docs. there is an additional 'src' in my tgen build
15:03:40 <mikeperry> otherwise it was mostly cut+paste
15:04:01 <mikeperry> https://gitlab.torproject.org/tpo/metrics/onionperf#starting-and-stopping-measurements is the section with the tgen path
15:04:03 <karsten> will check that 'src' thing. there was a change between tgen 0.0.1 and 1.0.0.
15:04:15 <karsten> ok.
15:04:42 <mikeperry> i am on 048bcc8e2421320c9a27763612b82e86c3c6e683 for tgen fwiw
15:05:05 <karsten> yep, that's the latest.
15:05:12 <karsten> I'll check that.
15:05:49 <karsten> let us know how this works out, including analysis/visualization.
15:06:29 <karsten> shall we go through the open issues for 0.8?
15:06:56 <karsten> https://gitlab.torproject.org/tpo/metrics/onionperf/-/boards
15:07:14 <karsten> tpo/metrics/onionperf#33260 is almost done.
15:07:50 <karsten> acute and I discussed some final tweaks today. we're good there.
15:07:55 <karsten> just a final revision and review.
15:08:25 <karsten> tpo/metrics/onionperf#33421 has a comment from mikeperry that I just replied to.
15:08:44 <karsten> I think the concept is clear now, this needs some code cleanup, then review, possibly revisions, and the merge.
15:08:51 <karsten> on track for 0.8, I'd say.
15:09:10 <karsten> tpo/metrics/onionperf#40001 is also in good shape.
15:09:45 <karsten> there might still be placeholders in the new documentation for parts that we'll write later, but the important stuff will be documented.
15:09:52 <karsten> also in time for 0.8.
15:10:11 <karsten> tpo/metrics/onionperf#33420 is more critical.
15:10:20 <karsten> we may not have enough time to do this properly.
15:10:45 <karsten> I'd like to leave it at the end just in case there's still time to do it.
15:10:54 <karsten> but we shouldn't rush this, or we won't do it right.
15:10:54 <mikeperry> is it possible to have onionperf just record the BUILDTIMEOUT_SET line directly on the side?
15:11:07 <mikeperry> that might be enough for initial testing/experiments
15:11:19 <karsten> onionperf should already log that event.
15:11:33 <karsten> I'll have to check, but I believe it's written to the .torctl.log files.
15:11:42 <karsten> that would be in `onionperf measure`.
15:11:54 <mikeperry> ok
15:11:56 <karsten> onionperf would not do anything with that event though.
15:12:14 <karsten> in the analyze/visualize modes.
15:12:31 <karsten> you could check in your logs.
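The log-checking idea above can be sketched in a few lines. This assumes the BUILDTIMEOUT_SET events land in the .torctl.log files roughly in the Tor control-port event format; the sample lines below are made up for illustration, not actual OnionPerf output.

```python
import re

# Hypothetical .torctl.log lines; the exact layout of OnionPerf's
# torctl logs is an assumption here, not confirmed from the tool.
log_lines = [
    "2020-09-10 14:58:26 650 BUILDTIMEOUT_SET COMPUTED TOTAL_TIMES=124 TIMEOUT_MS=1500 XM=310 CUTOFF_QUANTILE=0.80",
    "2020-09-10 15:10:00 650 BUILDTIMEOUT_SET COMPUTED TOTAL_TIMES=180 TIMEOUT_MS=1350 XM=300 CUTOFF_QUANTILE=0.80",
]

def extract_timeouts(lines):
    """Pull TIMEOUT_MS values out of BUILDTIMEOUT_SET event lines."""
    timeouts = []
    for line in lines:
        if "BUILDTIMEOUT_SET" not in line:
            continue
        match = re.search(r"TIMEOUT_MS=(\d+)", line)
        if match:
            timeouts.append(int(match.group(1)))
    return timeouts

print(extract_timeouts(log_lines))  # [1500, 1350]
```

Grepping like this would be enough for the initial cbtquantile experiments mentioned below, without any changes to the analyze/visualize modes.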
15:13:16 <karsten> okay, that's all about those last remaining four issues.
15:13:22 <mikeperry> that might be ok. we should not be changing the tuning parameter (cbtquantile) that much.. logs may suffice
15:13:31 <mikeperry> so it is fine for that one to be lower priority
15:13:37 <karsten> sounds good.
15:14:04 <karsten> in fact, having some real data would make it easier to implement that feature.
15:14:16 <karsten> so, in 2-3 months it might be easier to build it.
15:14:42 <karsten> let's briefly discuss one thing about tpo/metrics/onionperf#40001 here:
15:15:06 <karsten> acute and I have been talking about serving measurement data tarballs from somewhere.
15:15:21 <karsten> those tarballs are ~100G right now and more in the future.
15:15:34 <karsten> one option would be asking for a tp.o host.
15:15:38 <karsten> another option would be S3.
15:16:08 <karsten> where S3 would be related to also moving instances to AWS.
15:16:21 <mikeperry> and that is the data needed to re-graph and filter results?
15:16:43 <karsten> no, that data is tiny in comparison.
15:16:46 <karsten> it would be the full logs.
15:17:04 <karsten> tarballs of the onionperf-data/ directories produced by `onionperf measure`.
15:17:28 <karsten> those would be relevant for extracting parts with `onionperf analyze` other than the ones we're extracting right now.
15:17:37 <karsten> or for grepping/parsing the logs for other things.
15:18:14 <karsten> the json files required for filtering/re-graphing would still be archived by collector.
15:19:00 <karsten> is this a discussion to have with the admins?
15:20:01 <karsten> I'll bring it up there. :)
15:20:27 <mikeperry> hrm my tbb is failing to download the instructions.md from 40001
15:21:02 <mikeperry> just says failed in the tbb download manager :/
15:21:15 <karsten> :(
15:21:21 <karsten> what about the .html?
15:22:14 <mikeperry> aha I was in some other downloads directory other than system one. permissions issue :)
15:23:16 <karsten> okay.
15:23:35 <karsten> we'll work on making those instructions even more accessible then. ;)
15:23:49 <mikeperry> so my main concern right now is how do I graph and examine my custom onionperf instance data
15:23:57 <mikeperry> do I need my own collector instance for that?
15:24:02 <karsten> no!
15:24:17 <karsten> just use `onionperf visualize` for that.
15:24:29 <karsten> README.md has some instructions.
15:24:40 <mikeperry> ok and if I want to add any custom graphs? are there examples?
15:24:41 <karsten> that mode produces a PDF file and a CSV file.
15:25:17 <karsten> hmm. you'll probably want to look at onionperf/visualization.py and go from the existing code.
15:25:40 <karsten> or you could take the CSV file and use another graphing tool that you're more familiar with.
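For the CSV route suggested above, even the Python standard library is enough for quick summaries. The column names in this mock CSV are assumptions for illustration, not the exact header that `onionperf visualize` emits.

```python
import csv
import io
import statistics

# A tiny stand-in for the CSV produced by `onionperf visualize`;
# the column names here are hypothetical.
csv_text = """source,transfer_size,time_to_last_byte
op-ab,51200,2.4
op-ab,51200,3.1
op-cd,51200,5.0
"""

# Read rows as dicts keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))
ttlb = [float(row["time_to_last_byte"]) for row in rows]
print(statistics.median(ttlb))  # 3.1
```

From there, any graphing tool that accepts CSV (or a list of floats) can take over, which avoids touching the OnionPerf code at all.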
15:26:13 <karsten> and if you need even more, you could always ask acute or phw_ for help at these meetings.
15:26:19 <mikeperry> I am a newbie-level grapher.. so any examples, especially python ones, will help me get going
15:26:51 <mikeperry> I've used python things before.. stuff in numpy I think
15:27:20 <karsten> take a look through https://gitlab.torproject.org/tpo/metrics/onionperf/-/blob/master/onionperf/visualization.py
15:27:30 <karsten> the part about extracting data is craziness.
15:27:43 <karsten> but the parts about visualization things are quite readable.
15:28:08 <karsten> I think it's easiest to modify the code there and re-run the visualize mode.
15:28:09 <mikeperry> does an 'onionperf analyze' step always have to run before 'visualize'?
15:28:14 <karsten> no.
15:28:33 <karsten> the output from `onionperf analyze` is the json file that is the input to `onionperf visualize`.
15:28:52 <karsten> or json file_s_. you can have directories of those as input to `onionperf visualize`.
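The analyze-to-visualize pipeline described above can be mimicked for custom graphs: load the JSON, walk it, collect the numbers you care about. The structure below is a minimal made-up stand-in, not OnionPerf's actual analysis schema.

```python
# A hypothetical, simplified analysis document as `onionperf analyze`
# might produce; field names are illustrative assumptions only.
analysis = {
    "data": {
        "op-ab": {
            "tgen": {
                "transfers": {
                    "t1": {"elapsed_seconds": 2.4},
                    "t2": {"elapsed_seconds": 5.1},
                }
            }
        }
    }
}

def transfer_times(doc):
    """Collect elapsed transfer times across all sources in the document."""
    times = []
    for source in doc["data"].values():
        for transfer in source["tgen"]["transfers"].values():
            times.append(transfer["elapsed_seconds"])
    return sorted(times)

print(transfer_times(analysis))  # [2.4, 5.1]
```

With real files, the same walk would run after `json.load()` on each analysis file in the input directory, matching how visualize accepts one file or a directory of them.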
15:29:20 <mikeperry> has dennis_jackson worked with this data? he is very good at dataviz I have noticed. perhaps I can pester him too
15:29:27 <karsten> oh, right!
15:29:33 <karsten> he can for sure help.
15:30:50 <karsten> okay, let's move to the second agenda item?
15:31:22 <alsmith> sounds good to me.
15:31:26 <gaba> if nothing else you could use https://app.rawgraphs.io/ with a csv :P
15:31:28 <gaba> sounds good
15:31:47 <karsten> Funding proposal for next OnionPerf phase (link in the mail that was sent).
15:32:14 <alsmith> we have an opportunity to apply for amazon’s ‘AWS imagine grant’
15:32:14 <karsten> so, this funding would be for AWS resources and development?
15:32:21 <karsten> please go ahead.
15:32:45 <alsmith> yes - as karsten said, this would be for a grant that involves AWS resources and development.
15:33:40 <alsmith> i know we would like to increase the geographic diversity of the network by spinning up new onionperf instances — and we could pay for the work involved with that, plus a year of aws hosting as one objective
15:33:52 <karsten> yay!
15:34:07 <gaba> that would be awesome
15:34:14 <alsmith> but there’s more $ available. i spoke with gaba and we brainstormed — would it be possible to take the onionperf work from the old MOSS proposal (work that hasn’t been done yet) and glue these ideas together?
15:34:54 <karsten> good question.
15:35:09 <karsten> regarding hosting, does there have to be a one year limit?
15:35:19 <alsmith> in the pad, i took two objects from our old versions of the MOSS proposal and copy/pasted
15:35:19 <karsten> stated differently, what happens when that year is over?
15:35:42 <alsmith> karsten - yes, it is limited to a year. so we would need to be sure we can pay for the ~$4k out of tor’s pocket when that is over
15:36:34 <karsten> regarding development work, we already did a lot in the past few months.
15:36:51 <karsten> the parts that I suggested in my mail were related to large-scale deployment and monitoring.
15:37:16 <karsten> given how long we took last week to set up the latest set of onionperf instances,
15:37:33 <karsten> it would be really important to automate that more if we go from 3 to 9 instances or more.
15:38:10 <alsmith> ok, that makes sense. so step 0 to ‘increasing onionperf geographical diversity’ is automation work
15:38:11 <karsten> same with monitoring that those 9 instances stay online.
15:38:18 <karsten> yes, I think so.
15:39:21 <alsmith> is there visualization improvement work that we could include? i think making the results of this project publicly consumable will be important
15:40:26 <karsten> maybe we should work on the Tor Metrics graphs for this.
15:40:32 <mikeperry> how about stability and monitoring work? things like failover instances, data merging in the event of failure, etc?
15:40:53 <karsten> right now they're designed for 3 instances, but we already reach a limit there with changing sets of 3 instances over time.
15:41:38 <mikeperry> gaps in our measurements are a big problem I have had while casually digging on https://metrics.torproject.org
15:42:08 <karsten> I think stability has become better in the past few months.
15:42:16 <mikeperry> idk if/what more can be done
15:42:27 <karsten> monitoring is important.
15:42:32 <karsten> we can do more there.
15:42:47 <alsmith> your mail talks about Monit
15:42:51 <karsten> yep.
15:43:30 <karsten> other than that I think we should have enough data that we can tolerate missing data from single failing instances.
15:43:38 <karsten> well, make sure we have enough data, that is.
15:44:19 <karsten> in all these scaling considerations we'll have to keep in mind that resources are available for 1 year only.
15:44:35 <mikeperry> ah yes
15:44:40 <karsten> if we scale too much now, we'll have to scale down more in 1 year.
15:44:40 <alsmith> right, yes
15:45:11 <gaba> or it will be hard to maintain.
15:45:13 <karsten> I mentioned serving data in my mail.
15:45:26 <karsten> we'll want to serve measurement data of all these instances.
15:45:36 <karsten> we need processes and documentation and guidelines for that.
15:46:05 <karsten> we already need this for our three instances, but it will be more work for 9 or even more instances.
15:46:21 <karsten> this will be clearer once tpo/metrics/onionperf#40001 is a thing.
15:46:23 <alsmith> would that kind of work go under a ‘visualization’ objective? sorry, i’m not totally sure what serving the data means — like getting it to metrics.tpo?
15:46:42 <karsten> making sure that tarballs go to S3 and are linked on the right pages.
15:47:06 <karsten> together with configuration details, maybe after passing a validation script that everything in the tarball is good data.
15:47:19 <karsten> this will be even more important for experiments.
15:47:24 <alsmith> i see
15:47:33 <alsmith> processing
15:47:39 <karsten> the boring part of doing experiments: documenting what you did.
15:48:33 <gaba> right now we have 4 objectives that would be important (they are in the meeting pad). Where would documentation go?
15:48:39 <gaba> nevermind
15:48:41 <gaba> obj 4
15:48:54 <alsmith> i added it to obj 4 — but that can be moved if it doesn’t make sense
15:48:58 <karsten> what's the difference between 1 and 2?
15:49:29 <alsmith> my understanding is that we need to develop automated deployment tools
15:49:33 <karsten> 1 is writing the scripts, and 2 is executing them and making sure everything's deployed?
15:49:35 <alsmith> then use them to deploy 9 instances
15:49:51 <alsmith> right, yes
15:50:03 <alsmith> does that make sense?
15:50:04 <gaba> right. They could be combined in one obj
15:50:16 <alsmith> got it
15:50:24 <karsten> hmm.
15:50:38 <karsten> they can be separate, I just didn't understand the difference. I do understand now.
15:50:47 <karsten> maybe objective 2 should be at the start.
15:50:58 <karsten> that's what we want to do: have more measurements in more places.
15:51:22 <karsten> automating this should happen early in the project, but we might start with setting up things manually and improving automation over time.
15:51:36 <karsten> same with monitoring. we would start with the simple monitoring we do now and improve over time.
15:51:48 <karsten> the important thing is to start doing measurements as soon as we have the resources available.
15:52:03 <karsten> and note how we set up a new set of measurement instances every month right now.
15:52:20 <karsten> we would likely start with a manual setup on day 1, even if that takes the whole day.
15:52:29 <karsten> and be happy how it only takes 2 hours the month after.
15:52:42 <gaba> ok
15:53:25 <karsten> we should include acute in this conversation.
15:53:40 <karsten> she probably has many ideas on the automation part.
15:54:09 <karsten> phw_: do you have thoughts on scaling up monitoring if we have 9 or more onionperf instances?
15:54:31 <mikeperry> ooh could we use aws for large shadow simulations?
15:54:40 <karsten> oh, maybe!
15:54:43 <mikeperry> that could be a temporary usage of the 1yr capacity
15:54:49 <karsten> absolutely.
15:55:02 <mikeperry> I will need machines for sims like that
15:55:22 <gaba> as next step alsmith: you are ok adding all this to the google doc where we have the proposal? and we all continue writing it there?
15:55:29 <mikeperry> the previous plan was to beg pastly but I bet that would be complicated because NRL
15:55:40 <alsmith> gaba - yes
15:56:07 <karsten> mikeperry: maybe we'll need an objective for making sure that shadow and onionperf results are comparable in some regard.
15:56:26 <karsten> rather than just "give us resources so that we can run large simulations."
15:56:36 <phw> karsten: right now we have a single monitor that checks our instances, right?
15:56:53 <karsten> phw: we have a monitor on each instance, and they all check themselves and each other.
15:56:57 <alsmith> what’s the best way to flesh out these objectives and activities outside of the meeting, once we move it to the google doc? i need to jump to another meeting in 2 min
15:57:12 <karsten> phw: but we do not have a central monitoring instance.
15:57:36 <karsten> alsmith: can we invite everyone who participated in this discussion, plus acute, and talk more via mail/gdoc?
15:57:40 <phw> karsten: i have nothing useful to add off the top of my head. my monit experience is based on a central monitoring instance
15:57:44 <gaba> alsmith: what about changing the objectives and proposal based on what we talked about now and then continuing over email?
15:57:58 <karsten> phw: okay!
15:57:59 <alsmith> gaba & karsten - sounds good
15:58:05 <karsten> we should also ask hiro!
15:58:08 <gaba> we can set up a voice meeting if we need to after the discussion on email
15:58:14 <alsmith> ok!
15:58:17 <karsten> great!
15:58:21 <karsten> gotta end the meeting now.
15:58:28 <karsten> thanks, everyone! bye! o/
15:58:31 <alsmith> thank you everyone o/
15:58:34 <gaba> o/
15:58:34 <karsten> clearing the channel in 5, 4, ...
15:58:38 <karsten> #endmeeting