16:03:52 <asn> #startmeeting
16:03:52 <MeetBot> Meeting started Tue Dec 23 16:03:52 2014 UTC.  The chair is asn. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:03:52 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:03:59 <asn> (a bit late)
16:04:00 <ohmygodel> first, i did add a small section to the tech report
16:04:13 <ohmygodel> karsten, um, how do i do a pull request ?
16:04:29 <asn> ohmygodel: push your branch to the internet, and pass him the repo url and branch name.
16:04:29 <karsten> ohmygodel: post a link to your repo on the ticket, and I'll pull.
16:04:52 <ohmygodel> can you link the ticket
16:05:00 * karsten finds it
16:05:10 <asn> syverson: hello
16:05:14 <ohmygodel> i don't see it obviously on https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
16:05:26 <syverson> asn: hi
16:05:37 <ohmygodel> syverson im doing my status update
16:05:51 <ohmygodel> im asking karsten how to request my changes to the tech report be pulled in
16:05:54 <syverson> OK, sorry if I missed anything
16:06:31 <asn> karsten: is there a ticket for the tech report itself?
16:06:38 <asn> i know of #13509 which is for the proposal
16:06:39 <ohmygodel> asn: push my branch to the internet ?
16:06:43 <karsten> looking. please carry on with reports.
16:06:49 <ohmygodel> sorry if im dumb at this
16:06:50 <asn> ohmygodel: yeah, push it on github or whatever you use.
16:07:22 <asn> ohmygodel: since the upstream repo of the tech report is on torproject.org, you can't really use github's pull request (tm) thing.
16:07:30 <ohmygodel> ok got it
16:07:55 <asn> so you just need to push your branch to your public git repo, and post its url to karsten.
16:08:02 <asn> ohmygodel: please proceed with the status report :)
16:08:09 <ohmygodel> hm yeah so about that
16:08:17 <robgjansen> ha!
16:08:21 <karsten> hmm, okay, maybe there's no ticket.
16:08:44 <ohmygodel> i probably shouldnt publish anything under my name publicly
16:09:13 <ohmygodel> so can i push it to a private repo
16:09:16 <ohmygodel> i use bitbucket
16:09:23 <asn> why not publish with your name?
16:09:23 <ohmygodel> you can merge it if you desire, out of my hands
16:09:27 <asn> sure
16:09:33 <syverson> ohmygodel: at least not without a long lead time
16:09:33 <karsten> ohmygodel: or send me `git format-patch` files.
16:09:34 <ohmygodel> and dont put my name on until i get approval
16:09:45 <ohmygodel> i have to get approval to publish publicly under my name
16:09:50 <robgjansen> send the patch
16:09:53 <ohmygodel> yes it sucks. welcome to the government
16:10:11 <ohmygodel> ok cool ill do either the private repo link or the patch
16:10:14 <ohmygodel> probably the patch actually
16:10:20 <ohmygodel> would be easier
16:10:22 <asn> reat
16:10:22 <karsten> sure
16:10:23 <asn> great
16:10:23 <robgjansen> the patch that karsten can look over and take inspiration from, and then create his own commit
16:10:33 <ohmygodel> ok next for me
16:10:33 <karsten> a very similar one, yes.
16:10:41 <robgjansen> ;)
16:10:46 <ohmygodel> i thought a bunch about the stats and how to report them
16:10:56 <ohmygodel> and id like to discuss options for them during discussion
16:11:06 <asn> #topic stats and how to report them
16:11:07 <ohmygodel> ok thats it for me
16:11:17 * asn shrugs at MeetBot
16:11:19 <asn> great
16:11:20 <asn> next ?
16:11:32 <karsten> I can go next.
16:11:38 <asn> karsten: go for it!
16:11:39 <karsten> - Helped a bit with merging code and preparing announcement.
16:11:42 <karsten> - Looked at David’s tracepoints code and logs and tried to explain strange clusters.
16:11:45 <karsten> - Discussed detecting hidden-service crawlers with Donncha.
16:11:48 <karsten> - Helped Paul set up a mailing list.
16:11:50 <karsten> done
16:11:58 <asn> nice
16:12:08 <asn> the donncha discussion is those two mails in tor-assistants, right?
16:12:13 <karsten> yes.
16:12:18 <asn> or is it in an ml somewhere?
16:12:19 <asn> ok great
16:12:24 <asn> what about the clusters?
16:12:26 <asn> anything found?
16:12:44 <dgoulet> I need to do some more tracing with new tracepoints that karsten suggested
16:12:44 <karsten> we added another tracepoint for *sending* cells.
16:12:47 <syverson> is the donncha discussion only on tor-assistants?
16:12:56 <karsten> the one we had was for *receiving* cells.
16:12:57 <asn> syverson: unfortunately it is.
16:13:15 <karsten> turns out the delay doesn't happen at the introduction point but on the way back to the service.
16:13:22 <asn> karsten: o_o
16:13:26 <karsten> next step is to add another tracepoint to see where on that way.
16:13:32 <asn> karsten: you mean in one of the hops of the circuit?
16:13:34 <karsten> tracepoint for relaying a cell, that is.
16:13:37 <DonnchaC> O
16:13:38 <karsten> yes.
16:13:43 <asn> curazy
16:13:47 <asn> ok.
16:13:48 <DonnchaC> *I'm here
16:13:49 <dgoulet> pretty weird yah
16:13:51 <robgjansen> nice
16:13:57 <asn> next?
16:14:06 * dgoulet can go
16:14:09 <karsten> oh, and about stats:
16:14:11 <karsten> very quickly
16:14:13 <asn> dgoulet: sec
16:14:14 <karsten> hidserv-dir-onions-seen: -19.00 26.50 71.00 92.75 165.00
16:14:14 <karsten> hidserv-rend-relayed-cells: -4709 44534 129374 4021317 14015109
16:14:14 <asn> karsten: yes
16:14:22 <karsten> that's min, q1, median, q3, max.
16:14:26 <asn> karsten: where is that from?
16:14:28 <karsten> more in the discussion part maybe.
16:14:36 <karsten> that's the 42 stats we have by now.
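karsten's two summary lines give min, q1, median, q3 and max over the per-relay reports. Below is a minimal Python sketch of computing such a summary. The input list is invented for illustration and is not the 42 real reports. The quartile method is also an assumption, because the log does not say which convention was used: the sketch calls `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between order statistics.

```python
import statistics


def five_number_summary(values):
    """Return (min, q1, median, q3, max), as in karsten's summary lines."""
    if len(values) < 2:
        raise ValueError("need at least two reports")
    # "inclusive" interpolates linearly between order statistics.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical hidserv-dir-onions-seen values, NOT the 42 real reports.
# Negative entries are possible because relays add noise before publishing.
reports = [-19, 8, 24, 40, 71, 88, 120, 165]
print(five_number_summary(reports))  # (-19, 20.0, 55.5, 96.0, 165)
```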
16:14:38 <asn> karsten: great.
16:14:44 <karsten> dgoulet: go.
16:14:46 <dgoulet> karsten: hrm... you don't have mine?
16:14:48 <asn> dgoulet: go for it
16:15:01 <dgoulet> my relay is at 7mil cells right now and last one was at 8mil
16:15:01 <karsten> dgoulet: should be in there.
16:15:16 <karsten> what's the fingerprint again?
16:15:22 <dgoulet> karsten: ah top one is 14mil sorry
16:15:27 <karsten> ah
16:16:04 <asn> karsten: a question I had is, should non-HSDirs publish onion-seen?
16:16:14 <dgoulet> ok so quick, most of the sponsor work I did was on some tickets here https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
16:16:22 <asn> karsten: however, the relays themselves dont know if they are HSDirs...
16:16:26 <asn> karsten: lets keep it for discussion.
16:16:28 <asn> dgoulet: yep
16:16:29 <karsten> ok
16:16:30 <dgoulet> new tracepoints also from karsten, published all the info about it
16:16:56 <dgoulet> my next step now is to run the intro. point experiment and graph the new times from the cells in/out
16:16:59 <dgoulet> done
16:17:24 <robgjansen> ok me
16:17:41 <robgjansen> last week i worked with dgoulet to try to get the tracepoint code running in shadow
16:18:11 <robgjansen> we made progress, lttng was running correctly in shadow but we were not able to get any of the tracepoints printed
16:18:41 <robgjansen> so still no useful data on the shadow front
16:18:51 <dgoulet> robgjansen: on that, I completely failed to build shadow here ... :( (but we can talk later about that)
16:18:52 <karsten> but getting closer!
16:19:16 <robgjansen> other things came up for me, but i will have more time to spend in the next few days
16:19:18 <robgjansen> done
16:19:26 <syverson> me now
16:19:29 <asn> great
16:19:30 <asn> syverson: go
16:19:54 <syverson> talked to ohmygodel about stats, which he will tell more about in discussion
16:20:11 <syverson> set up mailing list with Karsten, will be sending out invites after this meeting
16:20:13 <syverson> done
16:20:19 <asn> ack
16:20:29 <asn> and we are done?
16:20:36 <asn> move on to phase 2?
16:20:40 <asn> or did i forget someone
16:20:42 <asn> ?
16:21:06 <asn> ok
16:21:12 <asn> i guess we can go to discussion
16:21:13 <asn> some topics:
16:21:23 <asn> - aajohnson talks about how to gather statistics
16:21:34 <asn> - karsten and me should discuss how to extrapolate from these statistics to network totals
16:21:52 <asn> - we should discuss what to do about HS count when a relay is not HSDir.
16:22:15 <asn> - we should talk maybe a bit about tech report?
16:22:16 <asn> anything else?
16:22:33 <karsten> sounds like fine topics.
16:22:36 <asn> ok
16:22:42 <asn> ohmygodel: wanna start?
16:22:51 <ohmygodel> so i looked over the tech report this week
16:23:07 <ohmygodel> and a challenge i see
16:23:26 <ohmygodel> is the statistics for which knowing the reporting relay causes a privacy problem
16:23:28 <ohmygodel> for example
16:23:49 <ohmygodel> if relays publish the number of HS descriptor fetches they receive
16:24:14 <ohmygodel> then if you know the .onion, then you know the responsible HSDirs, and then you can see how much larger or smaller
16:24:23 <ohmygodel> the number of fetches tends to be from the responsible HSDirs over time
16:24:43 <ohmygodel> and then you can infer the number of clients typically connecting to that HSDir
16:24:58 <ohmygodel> another good example is introduction points
16:25:15 <ohmygodel> if a relay reports the number of INTRODUCE1 cells it receives
16:25:43 <ohmygodel> then if you know the .onion, you can get the IPs from the descriptor, and then you can see how much larger or smaller the number of INTRODUCE1 cells its IPs tend to see
16:25:53 <ohmygodel> again revealing the number of connections to that specific HS
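The inference ohmygodel describes can be sketched as follows. An observer who knows which relays are responsible for a given .onion in each period compares those relays' reported counts with everyone else's. This is a simplified illustration with invented numbers and hypothetical names. It ignores noise, and it assumes the observer already knows the responsible relays for each period.

```python
from statistics import mean


def estimate_excess(reports, responsible):
    """Average gap between relays responsible for a target onion and the rest.

    reports:     {period: {relay_fingerprint: reported_count}}
    responsible: {period: set of fingerprints responsible for the target onion}
    """
    gaps = []
    for period, counts in reports.items():
        resp = [c for relay, c in counts.items() if relay in responsible[period]]
        rest = [c for relay, c in counts.items() if relay not in responsible[period]]
        if resp and rest:
            gaps.append(mean(resp) - mean(rest))
    return mean(gaps)


# Hypothetical data: the responsible HSDirs change between periods,
# but the onion's extra load follows them.
reports = {
    1: {"A": 150, "B": 140, "C": 50, "D": 60},
    2: {"A": 55, "B": 45, "C": 160, "D": 130},
}
responsible = {1: {"A", "B"}, 2: {"C", "D"}}
print(estimate_excess(reports, responsible))  # 92.5
```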
16:25:57 <asn> yes
16:26:15 <ohmygodel> but there are some ways around this
16:26:33 <ohmygodel> ideal, of course, would be an aggregation procedure where the total number across all relays just pops out
16:26:41 <ohmygodel> e.g. secure multi party computation
16:26:47 <ohmygodel> but i dont think we necessarily need that
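For reference, the simplest form of "the total just pops out" aggregation is additive secret sharing. Each relay splits its count into random shares, one per aggregating party, and only the sum of all counts is ever reconstructed. This sketch was not proposed in the meeting. It assumes honest-but-curious parties, assumes that every party stays online, and uses a hypothetical public modulus `MODULUS`.

```python
import secrets

# Assumed public modulus; must exceed any total we expect to compute.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split a non-negative count into n additive shares (mod MODULUS)."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def aggregate(per_relay_shares):
    """Party i sums the i-th share of every relay; only partial sums are combined."""
    n_parties = len(per_relay_shares[0])
    partial_sums = [
        sum(shares[i] for shares in per_relay_shares) % MODULUS
        for i in range(n_parties)
    ]
    return sum(partial_sums) % MODULUS


relay_counts = [120, 45, 300]  # hypothetical per-relay counts
per_relay_shares = [share(c, n_parties=3) for c in relay_counts]
print(aggregate(per_relay_shares))  # 465, the total
```

Each party only ever sees one uniformly random share per relay, so no single party learns an individual relay's count. Negative (noisy) counts would wrap around the modulus and would need an offset.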
16:26:58 <ohmygodel> we can anonymize the reports to hide the reporting relay
16:27:19 <ohmygodel> of course, this brings up the issue of authenticating that the report is from a valid relay
16:27:34 <asn> oh anonymizing reports before they even reach us.
16:27:39 <ohmygodel> yes exactly
16:28:04 <ohmygodel> you can deal with the authentication issue by providing blind signatures from DirAuths to relays that allow them to report a single stat
16:28:22 <asn> hah plausible
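The blind-signature idea can be illustrated with textbook RSA blinding. The relay obtains an authority signature on a token that the authority never sees. A later anonymous report carrying that signed token then proves it came from some relay without revealing which one. This is a toy sketch with a tiny, insecure key, and RSA is swapped in purely for illustration. It is not the ed25519-based variant nickm mentions, and it omits the rule that an authority should sign only one token per relay per period.

```python
import hashlib
import math
import secrets

# Toy textbook-RSA authority key (p=61, q=53): insecure, for illustration only.
N, E, D = 3233, 17, 2753


def to_int(token: bytes) -> int:
    """Map a token into Z_N with SHA-256 (toy full-domain hash)."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N


def blind(m: int):
    """Relay side: hide m behind a random factor r^E."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return (m * pow(r, E, N)) % N, r


def sign_blinded(m_blinded: int) -> int:
    """Authority side: sign without learning m (it sees only m * r^E)."""
    return pow(m_blinded, D, N)


def unblind(s_blinded: int, r: int) -> int:
    """Relay side: strip r to get an ordinary signature on m."""
    return (s_blinded * pow(r, -1, N)) % N


def verify(m: int, s: int) -> bool:
    """Anyone: check the signature with the authority's public key."""
    return pow(s, E, N) == m


# Hypothetical one-time reporting token chosen by the relay.
m = to_int(b"relay-report-token-example")
m_blinded, r = blind(m)
s = unblind(sign_blinded(m_blinded), r)
print(verify(m, s))  # True
```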
16:28:25 <ohmygodel> you can report anonymously via Tor, of course
16:28:35 <ohmygodel> or you could do something simpler and more efficient
16:28:41 <ohmygodel> and run a shuffle over the DirAuths
16:28:58 <asn> for some stats, the anonymity set can be reduced by looking at the path selection probabilities though.
16:29:09 <syverson> ohmygodel: you'll come to delaying reporting I assume?
16:29:28 <ohmygodel> so my point is, there is a solution that seems very implementable in the near term, and would yield numbers that we really want but that do not lend themselves to the model of each relay individually and identifiably reporting their stats
16:29:39 <ohmygodel> yes syverson its on my list
16:29:52 <asn> aha. which solution is that?
16:30:03 <ohmygodel> uh, the one i just outlined :-)
16:30:10 <asn> the blind sigs?
16:30:16 <ohmygodel> yse
16:30:18 <ohmygodel> *yes
16:30:31 <asn> hm. that does not immediately seem very implementable to me.
16:30:37 <ohmygodel> why not
16:30:43 <karsten> but better than multi-party things.
16:31:12 <asn> it seems easier than the multiparty thing indeed
16:31:15 <asn> but still tricky.
16:31:18 <asn> and needs security analysis.
16:31:25 <asn> i don't think the cell stats can be anonymized like that for example.
16:31:30 <asn> only big relays have big number of cells.
16:31:31 * nickm perks up at the mention of adding more crypto primitives
16:31:37 <asn> also this.
16:31:58 <asn> also, more types of directory documents.
16:32:06 <asn> and more authority crypto.
16:32:10 <syverson> shuffling through authorities is simpler still, with a little shifting around of trust going on.
16:32:31 <asn> syverson: yes. that's even simpler but a bit awkward.
16:32:32 <ohmygodel> asn that is a good point
16:32:44 <asn> "ehm now the DAs also know the popularity of HSes"
16:33:08 <syverson> threshold shuffle ;)
16:33:18 <ohmygodel> although in the shuffle model you could avoid that by having each relay submit a fixed number of votes
16:33:22 <ohmygodel> each vote is a zero or one
16:33:33 <ohmygodel> they submit the number of ones that is the count they wish to report
16:33:41 <ohmygodel> this applies to counts and to histograms
16:33:52 <asn> hah
16:34:07 <asn> like publish it in batches?
16:34:07 <ohmygodel> btw shuffles have been implemented n+1 times; moreover, it is a separate system whose security and insecurity would be fairly orthogonal to the rest of Tor
16:34:20 <asn> "i saw 800 hits. i send 200 to this DA, 100 to that DA, and 500 to that DA?"
16:34:36 <ohmygodel> perhaps, but not exactly what i had in mind
16:34:54 <ohmygodel> everybody submits 100K votes to DirAuth1
16:34:56 <asn> it still puts more trust to the DAs. i don't really like that.
16:35:03 <ohmygodel> DA1 shuffle, sends to DA2, who repeats, etc.
16:35:09 <ohmygodel> finally all votes are revealed and tallied
16:35:16 <ohmygodel> this is 90s technology, :-P
16:35:29 <ohmygodel> it is completely distributed trust
16:35:34 <ohmygodel> only one DA needs to be trustworthy
16:36:26 <syverson> ohmygodel: say more about trust assumptions you're making
16:36:41 <ohmygodel> what more can i say
16:36:42 <ohmygodel> ohmygodel: only one DA needs to be trustworthy
16:37:16 <karsten> whichever model we use, it needs to handle single failing DAs.
16:37:25 <karsten> this happens much more often than one would think.
16:37:27 <asn> karsten: good point.
16:37:52 <syverson> ohmygodel: are you saying all aggregation happens at DirAuth1 and that's it?
16:37:56 <syverson> I don't like that.
16:38:03 <ohmygodel> no
16:38:14 <syverson> so please say more.
16:38:26 <ohmygodel> everybody sends k votes, each representing a 0 or a 1
16:38:32 <ohmygodel> each dirauth shuffles in turn
16:38:37 <syverson> to DirAuth1?
16:38:56 <ohmygodel> dirauth1 is the first to shuffle
16:39:02 <asn> what is shuffle in this context?
16:39:05 * asn does not know things
16:39:07 <ohmygodel> but all relays can verify that they received all votes DA1 received
16:39:29 <asn> all DAs? or all relays?
16:39:33 <syverson> is this provable shuffles?
16:39:34 <ohmygodel> e.g. via commits to all DAs, or signatures
16:39:47 <ohmygodel> not necessarily, you could use a Dissent v1-style accountability mechanism
16:40:03 <syverson> OK good, that's what I was hoping.
16:40:12 <ohmygodel> but why not
16:40:21 <ohmygodel> provable shuffles are good too
16:40:33 <ohmygodel> anyway, i dont have all details worked out, obviously
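The counting part of ohmygodel's shuffle can be sketched in a few lines. Every relay submits the same number k of 0/1 votes, with as many ones as its count. Each authority shuffles the whole batch in turn, and the final tally is the network total. The counts, k and the number of authorities are hypothetical. This minimal sketch leaves out the encryption and the provable or accountable shuffle that a real cascade needs. Without those, the first authority would see which relay sent which batch.

```python
import random


def encode_count(count, k):
    """A relay turns its count into exactly k votes: `count` ones, the rest zeros."""
    if not 0 <= count <= k:
        raise ValueError("count must lie in [0, k]")
    return [1] * count + [0] * (k - count)


def mix_cascade(votes, n_authorities, rng):
    """Each authority in turn shuffles the whole batch and passes it on.

    In a real cascade the votes would be encrypted and each authority would
    decrypt or re-encrypt before shuffling; that layer is omitted here.
    """
    batch = list(votes)
    for _ in range(n_authorities):
        rng.shuffle(batch)
    return batch


relay_counts = [3, 0, 7]  # hypothetical per-relay counts
k = 10                    # every relay submits the same number of votes
all_votes = [v for c in relay_counts for v in encode_count(c, k)]
mixed = mix_cascade(all_votes, n_authorities=9, rng=random.Random(2014))
print(len(mixed), sum(mixed))  # 30 10
```

Fixing k means batch sizes reveal nothing about the counts, and a histogram can be handled the same way with one set of votes per bin.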
16:40:34 <syverson> Grab some implementation from a voting application then?
16:40:39 <asn> i think maybe
16:40:40 <asn> ohmygodel:
16:40:48 <asn> the best way would be to make a tor-dev post?
16:40:54 <asn> you don't need to have all details
16:41:12 <asn> but at least a brief outline, and a brief security analysis would be nice as a mailing list post
16:41:20 <ohmygodel> but it seems to me that there is a solution in the near term, and that doing so is extremely important because the most useful statistics we are not getting have this privacy issue
16:41:22 <karsten> I'd also want to learn more about the other options.
16:41:32 <karsten> like blind signatures, and whatever else comes to mind.
16:41:43 <asn> from what I see, and I don't really understand the whole thing, this does not look like something that can be implemented in say the next month.
16:41:53 <ohmygodel> yeah i can think of at least a couple of different ways you could do it
16:41:56 <asn> it looks like something that can maybe be implemented before summer.
16:42:01 <asn> maybe
16:42:01 <ohmygodel> there is also the issue of poising the aggregate stats
16:42:12 <ohmygodel> because your stats are no longer segregated to yourself
16:42:40 <ohmygodel> which is an important issue, although perhaps one that need not be solved immediately
16:42:50 <karsten> I could imagine that you publish anonymously, check at a later time that your stat is contained, and then publish non-anonymously that this is the case.
16:42:50 <ohmygodel> my suggestion there would be to use robust stats
16:43:10 <ohmygodel> for example, median instead of average
16:43:25 <karsten> hmm, no, that won't work as I first thought.
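Here is a quick illustration of why ohmygodel suggests robust statistics against poisoning. A single absurd report drags the mean arbitrarily far, while the median barely moves. The numbers are invented, and the median only helps while poisoned reports stay a minority.

```python
from statistics import mean, median

# Hypothetical per-relay extrapolations of the same network total.
honest = [100, 110, 95, 105, 90]
# One relay (or one forged anonymous report) claims an absurd value.
poisoned = honest + [1_000_000]

print(mean(honest), median(honest))      # both 100
print(mean(poisoned), median(poisoned))  # mean jumps to 166750, median is 102.5
```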
16:43:31 * nickm volunteers to implement the blind-signature tweaks on top of ed25519, if somebody tells me what those tweaks are.
16:43:40 <ohmygodel> *poisoning the aggregate stats, sorry
16:43:41 * nickm has already gotten waist-deep in ed25519-ref
16:43:47 <syverson> nickm: cool
16:44:23 <ohmygodel> ok asn so i think a tor-dev post is in order
16:44:28 <asn> ohmygodel: yes please
16:44:29 <karsten> cool.
16:44:30 <ohmygodel> thanks for the suggestion
16:44:35 <asn> ok that's good.
16:44:38 <ohmygodel> i would also like to bring up some safety issues that
16:44:41 <ohmygodel> syverson and i discussed
16:44:42 <asn> shall we move to the next topic?
16:44:47 <asn> since we are approaching one hour.
16:44:56 <ohmygodel> ok i can go to the end of the line
16:45:06 <asn> ohmygodel: go for it.
16:45:08 <asn> let that be the next topic
16:45:38 <ohmygodel> ok great
16:45:45 <ohmygodel> so one issue is that of time delay
16:46:02 <ohmygodel> some stats may refer to relays that continue to act in the capacity in which they were reported
16:46:05 <ohmygodel> for example
16:46:20 <ohmygodel> if a relay reports on the number of ESTABLISH_INTRO cells it receives
16:46:27 <ohmygodel> it may still be serving as that intro point
16:47:01 <ohmygodel> it seems harmless to utility
16:47:12 <ohmygodel> to add a time delay to many statistics
16:47:32 <ohmygodel> so they wont be reported (or released) until relays will no longer be used in certain roles
16:47:36 <karsten> yep.
16:47:37 <asn> i agree
16:47:50 <ohmygodel> the risk is small, but the utility loss is negligible, so why not
16:47:50 <asn> ohmygodel: but what's the danger to the example you described?
16:47:56 <asn> sure.
16:48:11 <ohmygodel> ok so im discussing that in the section i added to the tech report
16:48:21 <ohmygodel> which is titled “obfuscation techniques"
16:48:34 <ohmygodel> for a similar reason
16:48:37 <asn> ok.
16:48:46 <ohmygodel> statistics about circuits shouldnt be reported
16:48:50 <ohmygodel> until after the circuit has been destroyed
16:49:28 <ohmygodel> e.g. dont report cells on a circuit that still exists
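The reporting rule ohmygodel suggests could look like the sketch below: release circuit statistics only after the circuit is gone, plus an extra delay. asn notes that tor already holds back circuit stats this way. This Python class is a hypothetical illustration, not tor's C implementation, and the class name, methods and one-hour delay are assumptions.

```python
class DelayedCircuitStats:
    """Hold per-circuit cell counts until a circuit is destroyed and a
    further delay has passed. All names are hypothetical, not tor's code."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.open_circuits = {}  # circuit id -> cells seen so far
        self.pending = []        # (earliest release time, cell count)

    def cell_seen(self, circ_id):
        self.open_circuits[circ_id] = self.open_circuits.get(circ_id, 0) + 1

    def circuit_destroyed(self, circ_id, now):
        count = self.open_circuits.pop(circ_id, 0)
        self.pending.append((now + self.delay, count))

    def releasable_total(self, now):
        """Cells from circuits that closed at least `delay` seconds ago."""
        return sum(count for release_at, count in self.pending if release_at <= now)


stats = DelayedCircuitStats(delay_seconds=3600)
for _ in range(3):
    stats.cell_seen("circ-1")
for _ in range(5):
    stats.cell_seen("circ-2")  # still open, so never released below
stats.circuit_destroyed("circ-1", now=0)
print(stats.releasable_total(now=0), stats.releasable_total(now=3600))  # 0 3
```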
16:49:37 <asn> we do this currently.
16:49:53 <ohmygodel> ok great, that wasn’t clear to me in the tech report
16:50:09 <asn> but the stats we do now are innocuous enough
16:50:14 <asn> that this danger is not very real.
16:50:28 <ohmygodel> i agree
16:50:44 <asn> and tbh i'm hoping that the stats we will do in the future will also be equally innocuous.
16:50:45 <ohmygodel> its just a suggested refinement
16:50:50 <karsten> are we sure this report will be ready by jan 12?
16:51:03 <karsten> is it a problem if it's not?
16:51:04 <ohmygodel> there will be something ready by jan 12 :-)
16:51:08 <asn> karsten: some sort of draft yes.
16:51:11 <asn> karsten: i don't think so.
16:51:12 <karsten> ok.
16:51:15 <syverson> define ready.
16:51:26 <karsten> because it seems it will become even better with more time.
16:51:28 <ohmygodel> yeah i say we give them what we have at that point
16:51:32 <karsten> ready as in we won't change it ever again.
16:51:38 <karsten> (but instead write a new one....)
16:51:41 <asn> ah
16:51:42 <syverson> Oh, then no.
16:51:45 <asn> probably no.
16:51:57 <karsten> sounds good.
16:52:05 <asn> i still haven't thought of a nice format for the tech report :(
16:52:14 <karsten> pdf?
16:52:20 <karsten> ;)
16:52:26 <asn> sorry. i meant that the details/risk/benefits format is a bit misleading.
16:52:40 <ohmygodel> asn: i actually liked that
16:52:42 <karsten> ah that, true.
16:52:47 <asn> it just ends up being a big paragraph on weird benefits that are not really benefits
16:52:52 <asn> and with risks "no real risk"
16:52:57 <asn> which makes it look like the stat is actually useful
16:53:08 <ohmygodel> imo that will just take some effort
16:53:12 <asn> but in reality it's something random like "time from INTRODUCE1 to INTRODUCE_ACK" or something
16:53:32 <asn> which is pretty useless imo
16:53:40 <ohmygodel> for example syverson and i went through the first three stats in Sec. 4.2
16:53:49 <karsten> we'll have to define common criteria for what we think is useful or harmful.
16:53:51 <ohmygodel> and discussed several new risks for each
16:53:56 <karsten> and then evaluate all stats using those criteria.
16:54:11 <asn> ohmygodel: true
16:54:13 <ohmygodel> i could imagine treating various timing stats all at once though
16:54:38 <asn> it's also not clear how these stats are going to be reported?
16:54:51 <ohmygodel> e.g. some subset of times between the following sequence of events...
16:54:52 <asn> is it "all the times from INTRODUCE to INTRODUCE_ACK" or "average time" or "median time" or...
16:54:59 <asn> every decision has very different risks.
16:55:03 <karsten> asn: agreed.
16:55:08 <ohmygodel> yes that also needs to be detailed
16:55:18 <ohmygodel> i have listed all stats that i think should be reported as distributions
16:55:21 <ohmygodel> in the section i added
16:55:26 <ohmygodel> it includes all time-based stats
16:56:01 <ohmygodel> btw i have an outline of how to report such distributions safely
16:56:13 <ohmygodel> tl;dr use a noisy histogram
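A "noisy histogram" for a time-based statistic can be sketched as follows: bin the observations, then add independent Laplace noise to every bin count. This sketch assumes that adding or removing one observation changes a single bin by 1 (L1 sensitivity 1), so each bin gets Laplace noise with scale 1/epsilon. The bin edges, the epsilon value and the sample timings are hypothetical.

```python
import bisect
import random


def laplace(scale, rng):
    """Laplace(0, scale) sample: the difference of two Exp(1/scale) draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def histogram(values, bin_edges):
    """Count values into half-open bins [edge_i, edge_{i+1}); others are dropped."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        i = bisect.bisect_right(bin_edges, v) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def noisy_histogram(values, bin_edges, epsilon, rng):
    """Add Laplace(1/epsilon) noise to every bin.

    Assumes adding or removing one observation changes one bin by 1
    (L1 sensitivity 1).
    """
    return [c + laplace(1 / epsilon, rng) for c in histogram(values, bin_edges)]


# Hypothetical INTRODUCE1 -> INTRODUCE_ACK times in seconds.
times = [0.4, 0.7, 1.1, 1.3, 2.5, 3.9, 7.2]
edges = [0, 1, 2, 4, 8]
print(histogram(times, edges))  # [2, 2, 2, 1]
print(noisy_histogram(times, edges, epsilon=0.3, rng=random.Random(1)))
```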
16:56:44 <asn> curious to read about it
16:57:04 <ohmygodel> ok yes perhaps we can discuss next week when you have had a chance
16:57:12 <asn> sure
16:57:15 <asn> so, next topic please?
16:57:29 <asn> karsten: we now need to work on how to extrapolate from those stats
16:57:35 <karsten> right.
16:57:39 <ohmygodel> oh on this topic
16:57:43 <asn> and the part I'm very curious about: how to understand how much noise we added to the stats.
16:57:47 <karsten> I only started looking at stats half an hour before the meeting.
16:57:47 <ohmygodel> syverson and i wrote up how to do this for a bunch of stats
16:58:04 <karsten> including the two we just implemented?
16:58:05 <asn> ideally in the end, we should be able to precisely upper and lower bound the stats we got.
16:58:08 <ohmygodel> can i send the writeup somewhere
16:58:12 <ohmygodel> attached to a ticket perhaps?
16:58:26 <asn> ohmygodel: ticket or mailing list all work fine.
16:58:55 <karsten> so, I think I'd start exploring the data we got by ignoring noise.
16:59:03 <asn> karsten: so we should be able to say we have 50k to 85k HSes. not "we have somewhere around 70k HSes"
16:59:09 <ohmygodel> not including the number of relay cells
16:59:17 <karsten> I'd like to look what fraction of observations we'd expect a certain relay to see.
16:59:30 <asn> karsten: yep. that's going to be a bit tricky.
16:59:35 <asn> karsten: please look into it!
17:00:03 <karsten> for .onions, we should look at the time since the relay first got the HSDir flag,
17:00:11 <karsten> "distance" to other HSDirs, etc.
17:00:23 <karsten> for cells, it's consensus weight fraction during the stats interval,
17:00:33 <asn> karsten: hah even distance from other HSDirs?
17:00:35 <karsten> relevant flags that clients consider when selecting rendezvous points, etc.
17:00:41 <karsten> asn: well, why not.
17:00:45 <karsten> asn: worth a try.
17:00:46 <asn> karsten: i would just consider it uniform by assumption. but distance is better.
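Once each relay's expected fraction of observations is known, the extrapolation step itself is short. The sketch assumes `fraction` is the probability that a given observation (a cell, an .onion) is seen by that relay during the stats interval. It shows two common estimators: the ratio of sums, and the median of per-relay extrapolations. Neither estimator was decided on in the meeting. Computing the fractions themselves (HSDir flag time, distance to other HSDirs, consensus weight, flags) is the hard part and is left as input here.

```python
from statistics import median


def extrapolate(reports):
    """Extrapolate a network-wide total from per-relay reports.

    reports: list of (reported_value, fraction) pairs, where `fraction` is the
    estimated probability that a given observation was seen by that relay
    during the stats interval (for cells, e.g., its mean consensus weight
    fraction). Returns (ratio_estimate, median_of_per_relay_estimates).
    """
    usable = [(v, f) for v, f in reports if f > 0]
    if not usable:
        raise ValueError("no relay with a positive fraction")
    ratio = sum(v for v, _ in usable) / sum(f for _, f in usable)
    per_relay = median(v / f for v, f in usable)
    return ratio, per_relay


# Hypothetical (reported value, fraction) pairs.
print(extrapolate([(100, 0.125), (300, 0.25), (250, 0.125)]))  # (1300.0, 1200.0)
```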
17:01:03 <asn> the whole process will include reading old consensuses.
17:01:12 <asn> and also probably include being able to calculate the RP selection probability.
17:01:18 <karsten> right.
17:01:26 <asn> which might be different from all the other probs we already calculate.
17:01:30 <karsten> do you know how clients select RPs?
17:01:47 <karsten> ohmygodel: do you remember from writing TorPS?
17:01:49 <asn> karsten: i have some notes
17:02:04 <ohmygodel> karsten: i didnt pay attention to HS code
17:02:08 <karsten> ok.
17:02:10 <ohmygodel> i thought they were selected as middle relays are
17:02:29 <asn> ***** Sep 04 18:51:25.000 [warn] RP circuit flags:
17:02:29 <asn> need_uptime = 1  need_capacity = 1  need_guard = 0  allow_invalid = 1  weight_for_exit = 0  need_desc = 1
17:02:46 <asn> i think that's very similar to middle relays.
17:03:06 <asn> i think invalid relays are not considered for middle relays. though.
17:03:43 <ohmygodel> i thought they were
17:03:55 <karsten> are there invalid relays in the consensus?
17:03:58 <asn> the magic is in rend_services_introduce()
17:04:08 <asn> karsten: i think that's non-Valid?
17:04:12 <asn> karsten: don't remember.
17:04:24 <asn> router_crn_flags_t flags = CRN_NEED_UPTIME|CRN_NEED_DESC;
17:04:24 <asn> if (get_options()->AllowInvalid_ & ALLOW_INVALID_INTRODUCTION)
17:04:24 <asn> flags |= CRN_ALLOW_INVALID;
17:04:24 <asn> node = router_choose_random_node(exclude_nodes,
17:04:24 <asn> options->ExcludeNodes, flags);
17:04:33 <karsten> ok. step 1: consensus weight fraction, step 2: worry about flags.
17:04:39 <asn> yes
17:04:43 <karsten> and Wxx values.
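Following karsten's step 1 and step 2, a relay's chance of being picked can be approximated as its consensus weight divided by the total weight of relays that carry the required flags. This sketch rests on stated assumptions. It maps need_uptime to the Stable flag and need_capacity to the Fast flag, it ignores the Wxx bandwidth-weights and exclusion rules, and it uses a made-up three-relay consensus.

```python
from dataclasses import dataclass, field


@dataclass
class Relay:
    nickname: str
    consensus_weight: int
    flags: set = field(default_factory=set)


def selection_probabilities(relays, required_flags):
    """Per-relay pick probability, proportional to consensus weight among
    relays that carry every required flag. Ignores the Wxx bandwidth-weights."""
    eligible = [r for r in relays if required_flags <= r.flags]
    total = sum(r.consensus_weight for r in eligible)
    if total == 0:
        return {}
    return {r.nickname: r.consensus_weight / total for r in eligible}


# Hypothetical consensus excerpt.
relays = [
    Relay("A", 100, {"Fast", "Stable"}),
    Relay("B", 300, {"Fast", "Stable", "Guard"}),
    Relay("C", 600, {"Fast"}),
]
# Assumption: need_uptime -> Stable and need_capacity -> Fast.
print(selection_probabilities(relays, {"Fast", "Stable"}))  # {'A': 0.25, 'B': 0.75}
```

Averaging these per-consensus probabilities over all consensuses in a stats interval would give the fraction used for extrapolation.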
17:04:47 <asn> in any case, we don't really need to find out the exact process now.
17:04:48 <dgoulet> asn: that's for IP no?
17:04:53 <asn> dgoulet: yes, i'm stupid.
17:04:55 <karsten> yup
17:04:55 <asn> dgoulet: thanks.
17:05:09 <karsten> we can figure that out.
17:05:16 <dgoulet> circuit_get_best()
17:05:17 <ohmygodel> i checked in TorPS, and the Valid flag is only checked for guard and exit nodes, not middles. That was intentional, although I could have gotten that wrong.
17:05:23 <asn> karsten: we will need to work on this over the next week or so.
17:05:41 <asn> karsten: so that we are not very surprised at the beginning of jan when we need to do this for real.
17:05:51 <karsten> asn: right.
17:06:05 <asn> ok
17:06:08 <asn> so.
17:06:11 <asn> we are almost done?
17:06:16 * karsten puts that on the high-priority list.
17:06:26 <asn> karsten: do you want to work on that?
17:06:36 <asn> karsten: and i work on how to remove the noise to get the extrapolation error rate?
17:06:38 <ohmygodel> btw i attached that writeup to https://trac.torproject.org/projects/tor/ticket/13509
17:07:08 <ohmygodel> asn you are correct in that you should really output a distribution, not a single value
17:07:12 <karsten> asn: sounds good. I don't have good plans for handling the noise.
17:07:47 <asn> ohmygodel: ye.. we can output a single value for non-technical people. but we should be able to present a range for technical people.
17:08:00 <karsten> we know the distribution, right?
17:08:22 <ohmygodel> there actually is a good general technique to do that: bayesian inference via metropolis-hastings sampling
17:08:25 <karsten> so we only need to output variance or something.
17:08:30 <asn> ohmygodel: o.o
17:08:34 <ohmygodel> that way you dont need to attempt to do an explicit calculation
17:09:02 <karsten> and variance depends on the factor we applied to the originally reported value.
17:09:03 <ohmygodel> the question is, given the output, how likely is each input to have yielded that value
17:09:08 <ohmygodel> you generally assume a uniform prior
17:10:25 <ohmygodel> i dont expect you to necessarily do that, but maybe its easier than you think
17:11:02 <ohmygodel> a relevant paper on doing this for differentially-private statistics: “Probabilistic Inference and Differential Privacy” by
17:11:03 <ohmygodel> Oliver Williams and Frank McSherry, NIPS 2010, http://research.microsoft.com/apps/pubs/default.aspx?id=142363
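Here is a minimal version of the Metropolis-Hastings approach ohmygodel mentions, for a single reported value. It assumes the only noise is Laplace with a known scale and puts a uniform prior on the true value. It then samples the posterior and reads off a central credible interval. The scale, prior bounds, step size and burn-in are hypothetical choices, and binning is ignored.

```python
import math
import random


def log_likelihood(true_value, observed, scale):
    """Log-density of `observed` if Laplace(0, scale) noise was added to true_value."""
    return -abs(observed - true_value) / scale - math.log(2 * scale)


def mh_posterior(observed, scale, lo, hi, n_samples, step, rng):
    """Random-walk Metropolis-Hastings over the true value.

    Prior: uniform on [lo, hi]. The Gaussian proposal is symmetric and the
    prior is flat inside its support, so the acceptance ratio reduces to the
    likelihood ratio; proposals outside [lo, hi] have zero prior and are rejected.
    """
    current = min(max(observed, lo), hi)
    current_ll = log_likelihood(current, observed, scale)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0, step)
        if lo <= proposal <= hi:
            proposal_ll = log_likelihood(proposal, observed, scale)
            # 1 - random() lies in (0, 1], so the log is always defined.
            if math.log(1.0 - rng.random()) < proposal_ll - current_ll:
                current, current_ll = proposal, proposal_ll
        samples.append(current)
    return samples


def credible_interval(samples, mass=0.95):
    """Central interval holding `mass` of the samples."""
    s = sorted(samples)
    return s[int(len(s) * (1 - mass) / 2)], s[int(len(s) * (1 + mass) / 2) - 1]


rng = random.Random(42)
draws = mh_posterior(observed=1000, scale=50, lo=0, hi=10_000,
                     n_samples=20_000, step=50, rng=rng)[2_000:]  # drop burn-in
print(credible_interval(draws))
```

With a flat prior on non-negative values, a negative report such as the -19 in karsten's summary still yields a posterior confined to [0, hi]. This avoids having to clamp or subtract noise by hand.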
17:11:18 <asn> ok i will look into it
17:11:38 <asn> just doing the noise removal for binning is easy
17:11:44 <asn> that is finding the upper and lower bound.
17:11:57 <asn> i need to think a bit more what happens when it's combined with the additive noise.
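One way to sketch the bounds asn describes: suppose a relay rounds its true count up to the next multiple of a bin size and then adds Laplace noise. The binning then contributes a fixed one-bin uncertainty, and the noise contributes a tail bound at a chosen level alpha. The rounding-up model, parameter names and values are assumptions for illustration. Combining many relays' intervals into a network-wide range is the part asn says still needs thought.

```python
import math


def true_value_bounds(reported, bin_size, laplace_scale, alpha):
    """Interval that contains the true (pre-obfuscation) value with
    probability at least 1 - alpha.

    Assumes the relay rounded its true value up to the next multiple of
    bin_size and then added Laplace(0, laplace_scale) noise, so
        true value in (binned - bin_size, binned]  and
        P(|noise| > t) = exp(-t / laplace_scale).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    t = -laplace_scale * math.log(alpha)  # noise bound at level alpha
    lower = max(0.0, reported - t - bin_size)  # counts cannot be negative
    upper = reported + t
    return lower, upper


# Hypothetical parameters, not the ones relays actually use.
print(true_value_bounds(reported=1000, bin_size=8, laplace_scale=10, alpha=0.05))
```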
17:12:02 <asn> thx for the links btw
17:12:28 <asn> ok guys.
17:12:30 <karsten> ok. I'll explore the stats we received and let you know in a few days.
17:12:36 <asn> i think we should call it a day for today.
17:12:38 <ohmygodel> ill email out a patch
17:12:40 <ohmygodel> to karsten right ?
17:12:46 <karsten> yes
17:12:53 <ohmygodel> great thx
17:13:03 <asn> karsten: and we can be in contact over IRC.
17:13:10 <asn> for the extrapolation activity.
17:13:10 <karsten> be sure to edit the git author to whatever you want to see there.
17:13:18 <asn> and dgoulet, you do ... ?
17:13:18 <karsten> asn: sure.
17:13:20 <ohmygodel> ah ic thx
17:13:21 <asn> it's somewhere up in the backlog.
17:13:27 <dgoulet> asn: I do .. ?
17:13:32 <asn> dgoulet: i don't remember.
17:13:36 <asn> but it's somewhere up in the backlog.
17:13:47 <asn> dgoulet: what will you be doing next week? :)
17:14:17 <dgoulet> asn: more analysis on in/out cells, also start preparing a compact summary that Roger will need for Jan. meeting for his 15 min talk :P
17:14:25 <asn> there are also some christianity festivities going on this week.
17:14:27 <dgoulet> (also will be a bit afk for holidays)
17:14:30 <asn> dgoulet: great
17:14:51 <asn> ok. i think we have some sort of plan laid down for the next week.
17:14:54 <karsten> asn: we can also talk more at 31c3.
17:14:57 <robgjansen> hopefully getting dgoulet's tracing running correctly in shadow
17:14:58 <nickm> dgoulet: I just had a question on your #13667
17:14:59 <asn> karsten: you are coming?
17:15:03 <asn> karsten: that's great!
17:15:08 <karsten> asn: yes, last two days.
17:15:08 <ohmygodel> ok sayonara amigos
17:15:10 <dgoulet> nickm: ack
17:15:20 <asn> karsten: fantastic. i'm not going to be there the last day.
17:15:23 <dgoulet> thanks all!
17:15:25 <asn> karsten: but we meet the second to last.
17:15:28 <asn> thanks!
17:15:32 * asn relocates
17:15:33 <karsten> thanks!
17:15:34 <asn> bbl
17:15:37 <karsten> asn: sounds good.
17:15:38 <asn> #endmeeting