17:00:53 #startmeeting Tor Congestion Control Research
17:00:53 Meeting started Wed Aug 15 17:00:53 2018 UTC. The chair is mikeperry. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:53 Useful Commands: #action #agreed #help #info #idea #link #topic.
17:01:08 taking a sec to look over the agenda
17:01:38 Yeah, reminder it is at https://pad.riseup.net/p/TorCongestionControl-keep
17:02:05 and it is now reloading in my window so I can't see it :)
17:03:22 so anyway the first thing that I wanted to do was try to consolidate some of the folklore attacks that have been floating around wrt datagram transports
17:04:39 Note that "congestion control" is not necessarily the same as "datagram transport"
17:04:43 we've had a few tech reports about various approaches, and a couple research implementations, but there is not a real comprehensive treatment of security+privacy issues with this approach
17:04:52 i am just barely here. am sending my two additions to the research question list via email.
17:05:54 iang: yes, right. and we also need to figure out if any of the folklore generalizes to all congestion control, or if some of it is inherently a property of datagrams
17:06:24 Just to clarify: are the concerns and issues about circuit-wide datagram transport? Or hop-by-hop?
17:06:51 some (e.g. hopper) works better however Tor gets more responsive, regardless of what's happening at the network
17:07:10 and many of these things actually apply to Tor as is.
but the channels are just more noisy
17:07:20 correct
17:07:43 so after we all go through this list, a key thing for later discussion will be how to measure this so that it is comparable to current Tor
17:07:51 and I don't think the answer to that is "let's keep Tor performance bad so that these attacks don't get better"
17:08:19 we should address the attacks in a better way
17:09:01 i think it is a good point to distinguish datagram transports from congestion control, and it changes how we address concerns
17:09:20 agree
17:10:07 ok sounds good
17:10:23 A lot of the problems listed on that list are datagram-specific
17:10:54 So basically I see the Part I.B questions 3-4 being related to datagram vs non-datagram congestion control
17:11:11 ideally we should look at both, but they could be analyzed separately (which hopefully will help clarify pros/cons on either side)
17:11:53 I can speak to the datagram piece. datagram and drop-signaled congestion control allows us to put bounds on the memory consumption and queues of relays -- no more OOM/Sniper attacks. Those become congestion attacks instead.
17:12:35 IMO the problem with queues is performance, not OOM.
17:12:37 also, moving away from TCP onto a more secure connection model eliminates "Man-on-the-side" connection termination vectors.. which I think may be easier to pull off against Tor than we have considered..
17:12:44 mikeperry: that
17:12:53 s for the end-to-end datagram model
17:13:05 is that the only model on the table? In Rome, we talked about both
17:13:06 yes
17:13:11 i agree with nickm, based on experiences troubleshooting bufferbloat-related issues
17:13:15 and arma's email just now says both
17:14:02 nickm: we have seen a couple research papers now that deliberately trigger the circuit OOM killer as part of their attacks
17:14:54 hm, fair enough
17:15:17 But I don't think that the reason to move to datagram is to avoid that: it creates far more sidechannels than it closes.
17:15:22 At least, so it seems
17:15:33 I think the big reason people have been talking about datagram is for performance, right?
17:15:45 iang: if you want to enumerate alternatives on the pad, that would be useful
17:16:34 nickm: yes. that's also part of why I want to have this meeting. because it's clear we need to study more than just performance
17:17:02 right
17:17:53 We can assume passive traffic confirmation to be easier, but we're fine with that right?
17:17:53 nickm: I also think that the types of "new" side channels in datagram are forms of side channels that already exist in Tor, but are more noisy.. like from an information-theoretic perspective, a drop is a less noisy form of "pause throughput for a while"
17:18:11 I think of these as high-bandwidth
17:18:18 *high-bandwidth sidechannels
17:18:25 and I think we should find a way to measure this bandwidth
17:19:33 because if we introduce padding that is not end-to-end and unacked, the channel becomes more noisy, which means less bandwidth is available. at what padding rate does that start to approach delay-based side channels?
17:19:40 at europoakland, george danezis had a neat paper closing some of these side channels. the downside is that it only protected you against 3rd parties, not the person you're communicating with
17:20:02 could you link that in the pad?
17:21:36 mikeperry: so I tried to think about it, and I think there are a few ways to measure the bandwidth, and we need to think about more than one...
17:22:02 would it be fair to say that one of the first questions that should be answered is if datagram transports can be safely used?
17:22:02 one important one, practically speaking, is the total number of bits you can send per circuit.
17:22:11 another would be bits per cell
17:22:14 iang: that still sounds interesting.. in the end-to-end model the partners would be client and exit, with the other nodes being "third parties"..
I think in some ways side channels that the exit can induce are kinda equivalent to many ways they could muck with current TCP and sendme delivery already..
17:22:17 another would be bits per second
17:23:08 all of those matter, i think
17:24:08 but we're still learning here: it would not be shocking if somebody comes up with a better way to advertise these in a year or two
17:24:30 komlo: I am not sure that answer can be definitive without good metrics for this stuff that can be compared to Tor as it is today
17:24:53 komlo: and even then, there are tweaks to all of these designs that may change things
17:25:06 s/advertise/measure/
17:25:11 sorry, my brain is messed up
17:25:26 *but we're still learning here: it would not be shocking if somebody comes up with a better way to *measure these in a year or two
17:26:30 right, so should this meeting concentrate on the merits of datagram transport or congestion control?
17:26:46 nickm: I see us needing to resort to noisy-channel communication theory to get those values.. and they may be a function of other activity
17:27:07 it seems like it started with the latter and moved on to the former
17:27:19 I think it started with talking about QUIC
17:27:37 what do you say, mikeperry ?
17:27:50 has anyone here been communicating with the qatar group working on quictor specifically?
17:28:20 i think we should think about whether we want to look at congestion/latency performance improvements as opposed to datagram transport specifically
17:28:34 i.e., I.B.4 in the agenda
17:28:46 iang: I believe all they had was an abstract. I think both QUUX and qutor studied replacing TLS with QUIC
17:29:04 mikeperry: end-to-end, or just on the link?
17:29:17 nickm: just the link, unfortunately.
17:29:26 much easier to analyze, though
17:29:28 iiuc, they have (working?) code at least.
17:29:29 but yea
17:30:37 yeah, the code I saw replaced connection_or.c with quic things
17:31:41 Do we have a list of congestion-control ideas we want to look at?
17:32:06 Would you be interested in an end-to-end implem? I may have some funding which would allow me to do this
17:32:08 I'm hearing "datagram" vs "other", but I bet we all have different stuff in mind for "other"
17:32:28 stef's on a plane right now (I think? maybe just at the airport?) but her CC paper is approaching ready
17:32:56 Jaym: that's a good question, and one we're going to have to figure out.
17:32:58 and I have some proposed hacks to datagram to mitigate drop+reorder side channels
17:33:22 it keeps the hop-by-hop tcp, but uses the queues to do rate-based CC
17:33:57 mikeperry: maybe one action item from this meeting can be to identify the metrics needed to determine if datagram transports can be used safely? (this might fit under research frameworks)
17:33:58 interesting
17:34:03 iang: is that an explicit signal sent as a cell by each hop or inferred?
17:34:33 explicit
17:35:46 komlo: yeah
17:36:29 one thing we should consider wrt metrics is how they behave under lab conditions vs the tor network
17:37:04 i started an action items section at the bottom of the pad, adding this
17:37:15 so, what's next?
:)
17:37:25 like how do we model noise introduced by other traffic, as well as the law of large numbers/base rate issues
17:37:30 mikeperry: which is part of a much larger task of building measurement tools representative of the live network
17:38:50 like I see naive bits-per-second metrics being incomparable to ones that properly take into account various forms of noise -- organic and deliberate
17:39:27 for example: how much capacity do drop-based side channels have in the end-to-end model if the adversary cannot tell which packets are unacked padding vs end-to-end acked data
17:40:55 komlo: sorry, I can't make my bullet points match yours :/
17:40:58 then there is also circuit level encryption -- can we use order preserving encryption to middle hops such that they can correct reordering or convert it into drops
17:41:27 and can we also detect tagging attacks early with that
17:41:32 I don't see OPE being of benefit there?
17:42:09 iang: like if there was an OPE field that conveyed ordering to middle hops, they could prevent out-of-order packets from making it to the last hop
17:43:10 couldn't you just have a per-circuit sequence number in the clear (under the hop-by-hop encryption) in that case? If all middle nodes see it, isn't that the same?
17:43:25 Can that be done in the reverse path?
17:43:40 exit->client, with encryption
17:44:19 the cleartext version of course can. the ope version would be trickier, since the exit doesn't share keys with the middle or guard
17:44:24 we're coming up on 60 minutes. what else do we want to answer today?
17:45:05 iang: the question was for the OPE field onion-encrypted
17:45:37 iang: I think that cleartext sequence numbers allow the endpoint to have a cleaner signal as to which of their drops were applied to padding vs which actually were done to end-to-end traffic
17:46:11 I think we need to have this design written down to analyze it.
There are a lot of moving parts
17:46:22 the endpoint here is the exit node, which *has* to know that, since it's the one shoving the result into a TCP stream.
17:46:25 (same goes for most designs)
17:47:57 nickm: +1
17:47:57 so yeah, it seems that one action item is to enumerate the designs. naive QUIC; padded QUIC; padded+OPE QUIC; semi-reliable hop-by-hop (ian's I.A.5.a); and ECN-under-current-Tor
17:48:47 didn't sjmurdoch long ago have a tech report listing a whole lot of such options?
17:49:03 of course it's somewhat outdated
17:49:11 I think that's from back in the svn days; it should still be around
17:49:33 iang: but if the sequence numbers are visible to all hops, if I'm the guard and I deliberately drop packet N, the exit sees that packet N was a sequence number that made it to the end and was dropped.. but if N is not visible to me at the guard, I don't know if my drop survived at the exit, or was actually an organic drop
17:50:25 iang: yah, it examined basically our options for datagram transports using then-existing userland congestion control, but did not have any of this side channel analysis
17:50:28 the guard still knows it's the nth packet on the circuit. the difference isn't that big, I'd think?
17:50:42 right. the side-channel analysis is key here
17:51:19 murdoch report from November 2011, I think (can link in pad)
17:51:31 yes please
17:52:34 iang: if some percentage of packets are being dropped at the middle, there will be a probability that your attempted drop is a drop of a padding cell.. there will also be a probability of organic drops also being present
17:52:43 both of steven's tech reports are on https://research.torproject.org/techreports.html
17:52:52 so you basically lose bits in your side channel and have to perform some kind of error correction
17:53:06 (and by both apparently i mean all five)
17:53:33 mikeperry: sure. I don't see that as hurting the attack, though, I guess.
17:54:18 since the attacker is basically trying to get ~1 bit through.
17:54:49 iang: I think there is a point where the cost of doing that error correction starts to make the side channel comparable to existing delay-based side channels in Tor today
17:55:41 which will depend on the padding rate and the organic drop rate
17:55:44 that's true. (Though it's not like we're really comfortable with that existing channel either.)
17:56:14 +1
17:56:26 * iang needs to be at his next meeting. I'll keep the window open and scroll back after if this meeting continues.
17:57:16 ok I think as far as action items, we have: 1. create a better sketch of design alternatives
17:57:51 2. Try to break our questions into research topics
17:59:14 wrt the delay-based side channel -- is there anything we can do to mitigate it? and if we could, why would that not also apply to the end-to-end datagram case?
18:00:39 good question.
18:00:52 mikeperry: i think with each design, enumerating the assumptions and potential safety tradeoffs will help determine what mitigations/further work are necessary?
18:01:10 I would be happy to try to write up what we've been calling the "side channel" issues here, though they extend far beyond side channels
18:01:47 I'd also like to write up all the similar issues in current Tor; especially if mikeperry can help me enumerate them.
18:02:00 yes! more precision about what you consider side-channels and what is not would be neat
18:02:14 I am a bit confused by the difference between "side channel" and I.A.1'/extended issues
18:03:01 IMO a side channel is when you can send more information in the protocol than is intended.
18:03:02 komlo: +1
18:03:05 bram, back in the day, really thought we should use libutp style delay-based congestion control, rather than drop-based, and rather than the weird thing we do now. steven tried to set up a real experiment, but couldn't get stuff working.
18:03:17 nickm: +1 on writing up/enumerating similar issues in Tor
18:03:22 +1 to doing a cross-comparison between designs. also addressing which threats should be mitigated/are outside of scope
18:03:46 I.A.1' is something different: we aren't just sending information, we're provoking different behavior by a communications partner
18:04:06 "active side channel"? maybe there is a better name
18:05:32 my preference is for us to arrive at a metrics framework under which these things are all information leaks that can be comparable in terms of the amount of information provided to the adversary
18:06:11 That's desirable, but only to the extent that it reflects some ground truth
18:07:41 I can try to do the writeup on attacks next week, if that's worthwhile
18:08:22 That seems useful to me
18:08:31 Agreed..
18:08:57 (signing off but will read scrollback later)
18:08:58 yeah
18:09:02 can somebody else take on coming up with a list (and breakdown) of existing designs in this field?
18:09:15 I can do this
18:09:29 mikeperry: also can you send me a list of the attacks in current tor that you're thinking of some time this week?
18:09:38 or before monday?
18:09:38 I'll also try to enumerate some research-paper sized topics that we have so far
18:11:03 ok
18:11:46 so I will take a stab at C, F, and G on the action items list
18:12:01 and then send that around, perhaps on a fresh pad.
18:12:52 and then from there we can ponder if we want to discuss metrics, research topics, or frameworks in the next meeting in a few weeks
18:13:20 maybe waiting for a preprint of stephanie and ian's paper for further comparison
18:14:18 I think we're good for now, then?
18:14:24 * Samdney is watching and reading the backlog ...
18:14:26 I think!
18:14:45 seems like a good start
18:14:51 ready to #endmeeting?
18:15:04 yep!
18:15:08 #endmeeting
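
The 17:52:34 to 17:55:41 exchange argues that padding and organic drops make a drop-based side channel noisy, so its usable bandwidth depends on the padding rate and the organic drop rate. A minimal sketch of that idea in Python follows. Everything in it is an assumption for illustration and none of it was specified in the meeting: each packet slot is treated as one use of a binary channel, the attacker signals a 1 by dropping the packet in that slot, an attempted drop is only observable when it hits acked data rather than unacked padding, and an organic drop of a data packet looks like a signalled 1. The function names are hypothetical, and the grid search over input priors only approximates the capacity from below.

```python
import math


def h2(p):
    """Binary entropy in bits; taken as 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def binary_channel_capacity(p_hit, p_false, steps=1000):
    """Approximate capacity (bits per slot) of a binary asymmetric channel.

    p_hit   = P(observer sees a drop | attacker tried to drop)
    p_false = P(observer sees a drop | attacker did nothing)
    The maximum of I(X;Y) is searched over a grid of input priors.
    """
    best = 0.0
    for i in range(steps + 1):
        prior = i / steps  # probability that the attacker sends a 1
        p_y1 = prior * p_hit + (1.0 - prior) * p_false
        # I(X;Y) = H(Y) - H(Y|X)
        info = h2(p_y1) - prior * h2(p_hit) - (1.0 - prior) * h2(p_false)
        best = max(best, info)
    return best


def drop_channel_capacity(padding_fraction, organic_drop_rate):
    """Toy model (assumption): drops that land on padding are invisible."""
    p_hit = 1.0 - padding_fraction
    p_false = organic_drop_rate * (1.0 - padding_fraction)
    return binary_channel_capacity(p_hit, p_false)


if __name__ == "__main__":
    for padding in (0.0, 0.25, 0.5, 0.75):
        cap = drop_channel_capacity(padding, organic_drop_rate=0.01)
        print(f"padding={padding:.2f} capacity={cap:.3f} bits/slot")
```

A per-slot figure like this is only one of the three metrics raised at 17:22 (bits per circuit, bits per cell, bits per second), and it ignores the point made at 17:54:18 that the attacker may need only about one bit to get through.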