15:59:17 #startmeeting anticensorship meeting
15:59:17 Meeting started Thu Dec 3 15:59:17 2020 UTC. The chair is cohosh. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:17 Useful Commands: #action #agreed #help #info #idea #link #topic.
15:59:37 o/
15:59:53 here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
15:59:55 Hi all, I'm Diogo Barradas and I'm attending today's meeting per Cecylia and David's invitation
16:00:04 dmbb: yay! welcome :)
16:00:09 dmbb: welcome and thanks for joining us!
16:00:09 i'm cecylia
16:00:36 welcome!
16:00:47 we usually go through our regular meeting agenda first
16:00:54 and the reading group follows at the end of the meeting
16:00:55 Thank you all, it is good to be part of the resistance =)
16:01:30 8)
16:01:43 Certainly, I'll wait for the proper time to contribute to the discussion
16:01:51 phw: i think the first announcement is yours?
16:02:29 yes, so rdsys and bridgestrap are now deployed and we have https://bridges.torproject.org/status?id=FINGERPRINT which allows bridge operators to look up the status of their bridge
16:02:48 we currently only test obfs2, obfs3, obfs4, and scramblesuit because rdsys doesn't yet have a parser for vanilla bridges
16:03:22 nice \(^-^)/
16:03:23 tor is going to log this url, so bridge operators can simply click on it to learn if their setup works
16:03:37 hey everyone, I'll be standing in for anto over the next few months :)
16:03:41 (don't mind me lurking)
16:03:59 (you can also provide a hashed fingerprint, so you can share the url with others)
16:04:08 hi dunqan and thanks for joining! o/
16:04:19 thanks for having me o/
16:04:42 that's it for bridge testing. i may as well do the announcement too
16:04:47 dunqan: welcome!
16:05:03 please take a look at our november 2020 report and change/add items as you see fit: https://pad.riseup.net/p/U4o0LNYPgm7SCxuF-1Sm
16:05:12 my plan is to publish it later today
16:06:21 will do
16:06:33 any other discussion announcements?
16:06:42 if not we can move on to reviews
16:07:05 i'm good
16:08:29 cool
16:08:38 i need a review of snowflake!21
16:09:33 phw needs snowflake!22 and bridgestrap#10
16:09:40 i can take both of those phw
16:10:16 happy to review snowflake!21 unless dcf (who doesn't seem to be here) wants to
16:10:41 agix: do you need any reviews?
16:10:55 agix: i took a look at tpo/anti-censorship/rdsys#5 yesterday and haven't made up my mind yet on how to approach this
16:11:18 oh ok thanks, take your time
16:11:26 in particular, i was thinking about moving the code to a dedicated persistence layer. i'll add my thoughts to the ticket
16:11:48 do you have any other ticket I can help you with regarding rdsys?
16:11:59 oh for sure, let me take a look at the open issues
16:12:11 cool
16:13:19 i wonder if we should wait a bit to see if dcf shows up before the reading group
16:13:38 tpo/anti-censorship/rdsys#6 or testing more broadly is an important issue
16:13:56 but i can also think of a few other issues that i haven't gotten around to filing yet
16:14:00 i'll let you know
16:14:17 ok thanks
16:16:41 okay that's it for our regular agenda it seems
16:17:31 hm
16:17:34 i don't mind waiting a few minutes for dcf as well, if you prefer
16:18:23 let's wait like 5 more minutes and then start?
16:18:31 sounds good. just enough time to make coffee
16:18:35 good call cohosh
16:18:51 * cohosh makes tea
16:23:44 okay let's get started \o/
16:23:55 i have a quick high level summary
16:24:01 oh yay dcf1
16:24:07 that is perfect timing
16:24:24 lol
16:24:26 right on time
16:25:16 alright, pasting summary:
16:25:43 Protozoa is a new anti-censorship system design and implementation that uses existing WebRTC services as covert channels for censorship resistance traffic.
16:25:46 It works by having a client and a proxy visit the same WebRTC service through the Protozoa-modified Chromium browser.
16:25:49 One user creates a password-protected conferencing room and shares that URL with the other user out of band.
16:25:52 The camera and mic of each user record video, but these frames are replaced with covert traffic after they are encoded but before they are sent on the wire by hooks in the WebRTC stack. A corresponding downstream hook retrieves the covert traffic and replaces it in turn with a blank keyframe to prevent software malfunction.
16:25:57 This covert channel behaves like a SOCKS5 proxy and allows the tunneling of arbitrary traffic.
16:26:00 The authors evaluate the throughput and detection resistance of this channel by using statistical properties of the packet flows. They send a variety of downloaded YouTube videos over the channel and try to detect when these videos are being replaced with covert traffic.
16:26:04 The results are good: both high throughput and high detection resistance
16:26:07 They also test their tool in the wild to evade censorship in China, Russia, and India.
16:26:10
16:26:44 I got in touch with the authors and they say they are working on a proposal to make Protozoa into a Tor pluggable transport
16:26:55 Did any of the authors come to the meeting?
16:26:56 yup dmbb is here :)
16:27:03 great
16:27:04 Yes, this is Diogo :)
16:27:27 Regarding this proposal:
16:28:17 It is something that we intend to discuss with you. I will prepare a well-formatted document and share with you
16:29:08 nice!
16:30:03 So I think it's clear that the key innovative idea in this work is the "encoded media tunneling"
16:30:26 While the overall plumbing of Protozoa would require some adaptation to the PT API, I think we can do some more work on the bridge infrastructure part of the thing
16:30:51 Where you don't try to encode payloads as pixel data that has to survive transcoding, but just replace the existing pixel data entirely
16:31:48 yes, that is correct
16:31:59 For me it brought up similarities to Slitheen. Protozoa cover video :: Slitheen overt user simulator; Protozoa replaces all video bitstream data :: Slitheen replaces payloads of leaf resources
16:32:08 dmbb: what changes to the PT API were you envisioning?
16:32:27 In both cases, the result is to yield strongly indistinguishable packet length and timing distributions
16:32:42 yeah, it's like end to end traffic replacement
16:33:02 personally, i think this approach is a lot cleaner than slitheen
16:33:08 which requires a state machine
16:33:42 and is subject to a lot of bandwidth loss due to out of order packets and packet boundaries splitting headers
16:34:11 phw: I would not say changes to the PT API itself, but changes to the current way we are picking up and packaging IP packets in Protozoa's covert channel. For now, we are picking up IP packets within a network namespace, and we can do this part better by directly hooking Tor's port to feed its packets to Protozoa
16:35:00 for reference, we did try doing video frame replacement in slitheen and we had some trouble: https://uwspace.uwaterloo.ca/bitstream/handle/10012/13595/Bocovich_Cecylia.pdf#subsection.4.2.2
16:35:19 i really like the protozoa approach
16:35:19 dmbb: gotcha
16:35:26 Right, currently they have a VPN-like model where the circumvention transport carries raw IP datagrams (encoded), so the OS kernels at both ends are responsible for reliability and retransmission
16:36:06 Whereas Tor PT is all userspace and application layer, transports have to implement their own reliability if they run over lossy carriers
16:37:46 dmbb I wanted to ask about your experience in modifying Chromium
16:37:48 dcf1: This is right. We also did several experiments with different network conditions and the covert channel's throughput does not seem to be largely affected. Whether this + Tor circuitry would prove to slow down the channel, is something that we need to ascertain
16:38:11 Formerly, in Snowflake, we used the WebRTC stack of Chromium, separated into a standalone library: https://github.com/keroserene/go-webrtc
16:39:04 But this approach became an extreme maintenance burden and was preventing ports to platforms that were not supported by Chromium's cross-compiling (e.g. Windows)
16:39:48 What's your impression, would the changes you had to make be likely to re-apply cleanly after a Chromium major version upgrade, say?
16:40:45 dcf1: Well, I modified Chromium's WebRTC stack for two particular versions (about a year and a half prior to the submission and then just a few months before). There were indeed some changes to the code, but nothing too harsh to accommodate.
16:41:19 In particular, they refactored a few functions that deal with media, which forced me to create a couple of getters/setters to the portion of the frames we were replacing
16:41:48 The hooks we placed within Chromium's code are quite simple
16:42:58 For instance, for encoding covert data, one hook tells Protozoa the size of the frame. Protozoa then packs IP packets up to that length, and gives it back so that the hook can replace the media content
16:43:56 So, in my experience, I would say we probably cannot rely completely on a "perfect" compatibility between versions
16:44:42 But the effort for adapting between versions seems rather small. And this is because the media containers kung fu seem to be rather static portions of the code
16:45:04 .
16:45:08 I see, thanks
16:45:12 dmbb: did you have an idea about how to do proxy distribution for the PT version of protozoa?
16:45:24 I don't think there would be a problem in representing Protozoa bridges in rdsys, it would be probably a service name and a chat room name.
16:45:26 from what i can tell, the design in the paper requires a lot of manual set up
16:45:55 But in terms of anti-enumeration, rdsys may not provide enough protection, because besides the WebRTC obfuscation, Protozoa's model does not really differ from obfs4's
16:46:23 hmm yeah
16:46:25 I.e., volunteers set up their Protozoa bridges (at relatively static IP addresses), censored users learn about them and use them
16:47:10 As I understand it, the Whereby servers are only used for signaling and STUN, the actual video connection is direct peer-to-peer between client and bridge
16:47:38 there has to be a TURN server in use then
16:47:51 cohosh: definitely. In our paper we largely dismiss the proxy distribution problem and show how the system could work if you had a friend/family outside the censored region. One way we thought about to do this better under the PT model, was to use these chat room names and rely on some kind of trusted bridge infrastructure where bridges were able to rotate their IP quite often
16:48:12 (Perhaps there are alternative servers that do route all the peer-to-peer connections through their own servers, which would provide some degree of collateral damage, but then those servers would also *not* have to try to decode the video stream they are passing through.)
16:48:29 we do rely on STUN only for our paper
16:48:33 dmbb: sounds like a good fit for rdsys then :)
16:48:40 hm
16:48:45 dcf1: i checked your thoughts on discord
16:48:51 yeah i wonder how whereby deals with incompatible NATs
16:49:06 about the use of a central server that inspects the content of discord connections
16:49:20 the advantage of protozoa is that there's a collateral damage factor here that snowflake does not have
16:49:26 it makes me think about WebRTC gateways (used to re-encode, inspect, forward) media data
16:49:54 Yeah I found this blog post: https://medium.com/tenable-techblog/lets-reverse-engineer-discord-1976773f4626 that says Discord uses WebRTC but not peer-to-peer, a Discord middlebox intercepts and decodes all connections
16:50:06 If we could have a service that relies on a WebRTC gateway to forward data between peers *without* exposing their IPs, that could be useful for hiding bridges' IPs
16:50:31 (but of course, which did not inspect our traffic for malformed content)
16:50:36 cohosh: collateral damage on the Whereby web, STUN, and signaling servers maybe, but not on the bridge IP addresses, as I understand it
16:51:04 dcf1: yeah i mean collateral damage for whatever central server they are using to proxy between users with incompatible NATs
16:51:07 Whereby itself uses one of these WebRTC gateways (with 4+ participants), but I'm not sure about the kind of inspection / validation they perform
16:51:27 A censor could harvest rdsys for Protozoa bridge addresses and block them, without blocking Whereby
16:51:48 cohosh: oh I see what you're saying and agree, if the service has a TURN server or something similar
16:52:10 yup, the same is true for snowflake to an extent. a censor can continuously poll the broker for all the snowflake addresses
16:52:23 the advantage there is how lightweight the proxies are so we hope to get a lot of them
16:52:56 but i would be interested to know how much blocking of existing proxies is required to degrade the performance enough to make it unusable anyway
16:53:25 yes, that's why we thought about setting up an infrastructure of bridges which could rotate their IP (for instance, after each videocall), or to take advantage of one of these TURN/WebRTC gateway boxes
16:53:52 yeah it's good to think about
16:54:07 and also worth thinking in mind that the system doesn't have to be 100% proof against 100% of censors to be useful
16:54:13 *keeping in mind
16:54:35 yup
16:54:55 indeed. Still related to snowflake, how aware are you of attempts to fingerprint its traffic?
16:55:18 dmbb: we haven't noticed it yet
16:55:32 i've stumbled across some work that tried to take a deeper look at it (https://arxiv.org/pdf/2008.03254.pdf)
16:55:35 only by researchers so far
16:55:38 the most blocking we've had is china blocking google's STUN server which used to be the default
16:55:55 and then some blocking of individual proxies back when we only had like 10
16:55:59 dcf1: is that the one you know about, or is there more research on that?
16:56:14 Yes, we have a fingerprinting wiki page and that paper is already listed
16:56:15 https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/wikis/Fingerprinting
16:56:15 that's the only one i've seen
16:56:31 We have had some conversations with Kyle MacMillan in these meetings in the past
16:56:41 cohosh: about the STUN servers, are you able to use any STUN server for Snowflake?
16:56:48 dmbb: yeah
16:56:52 say, a server controlled by google or Whereby?
16:57:11 definitely
16:57:20 Hum, so there's not any kind of app-dependent auth, right
16:57:41 no, other than stun we don't use any third party infrastructure
16:57:56 we use our own domain-fronted service for signaling
16:58:57 dcf1: thank you for the link
16:59:28 dmbb: so for the PT, to run a proxy, the user would install and use the protozoa-modified chromium browser?
17:00:50 cohosh: as far as we thought about it, yes. There may also be the possibility to do this kind of change in the Tor Browser itself, I think?
17:01:07 Tor Browser doesn't have webrtc enabled
17:01:35 right, so I think we would need a separate browser, then
17:01:40 though i definitely think that getting this incorporated into an existing browser would be a good way to go
17:02:11 you can ask the tor browser team about the difficulties of maintaining even a patchset on top of an existing browser :)
17:02:53 at first we tried to take a look at browser extensions to see if it was possible to change some of the WebRTC inner workings through browser extensions
17:03:22 but then we gave up on that as we really need to control the native code
17:03:25 i'm guessing the don'
17:03:28 blah
17:03:31 yeah
17:03:42 guessing they don't expose the hooks you need to replace the video frames
17:03:55 Yes, nothing like that, really
17:04:13 We get to interact (open webrtc sessions and the like) but not much else
17:04:18 maybe brave would be open to something like this?
17:04:35 One thing we did not do was to change audio frames
17:05:12 i believe it's also doable, but probably gives us less bandwidth. And coordinating video + audio delivery may not prove to give that many benefits
17:06:09 cohosh: if brave were interested in doing this, I think it could be easier to deploy a PT, for sure (at least between updates)
17:06:10 could you use turbotunnel to coordinate it?
17:06:47 i'll have to check your docs first, sorry :)
17:06:54 we found dcf1's turbotunnel pretty much a requirement for snowflake
17:07:17 alright, i will take a look at it!
17:07:39 though i suspect that protozoa would be more reliable than snowflake is just because of using an existing service for NAT punching
17:08:05 I thought that the network namespace + kernel IP would take care of any coordination you need, but yes, something in userspace could make it more portable, though probably less efficient
17:08:40 dcf1: exactly, i was thinking about the need to depend on the kernel
17:09:25 another thing to consider: are there any merits to applying the video frame replacement idea to snowflake?
17:09:48 yeah, the big benefit IMO is media streams vs. data streams
17:10:02 dcf1: some other difficulties I faced when deploying protozoa hooks within chromium were that "the code is the doc" and that there were two concurrent implementations of the video engine (one being slowly phased out)
17:10:41 Section 2 of https://arxiv.org/pdf/1605.08805.pdf "Media vs. data transport"
17:10:43 lol this was my experience with modifying firefox to do video replacement for web-based video streams too
17:10:43 cohosh: i would say resistance to fingerprinting may be another
17:11:05 ^^'
17:12:00 dmbb: we use https://github.com/pion/webrtc for snowflake and the developer of that project is very keen on censorship resistance efforts fwiw
17:12:27 i suspect he would be amenable to implementing the hooks needed to manipulate video and audio frames upstream
17:12:56 regarding other issues like DDoS, I feel like Protozoa is in the same boat as Snowflake. I see you are working on Salmon to perform a judicious distribution of bridges
17:12:57 he has already upstreamed patches for snowflake
17:13:02 It would be a possibility to create a dummy session with e.g. Whereby before starting the Snowflake peer-to-peer connection, in case a sudden peer-to-peer connection out of the blue is identifiable
17:13:26 dcf1: oh that's a good idea
17:13:28 But it couldn't fully take the place of the broker, even Protozoa has an out-of-band data transfer at the beginning to do what the Snowflake broker does
17:14:16 dmbb: yeah, phw is working on the salmon implementation :)
17:14:51 cohosh: I think Protozoa would benefit from it as well, yes
17:16:07 dcf1: yes, we also require something similar to a broker, unless the client does know someone outside the censored area. I don't know how common it is to do this kind of connection via WebRTC data vs. video channels
17:16:07 so it looks like there's two paths for integrating protozoa work into tor: 1) a protozoa PT that integrates the proxy into an existing browser and uses rdsys(+salmon) to distribute protozoa bridges, and 2) using some ideas from protozoa to enhance snowflake
17:16:15 and both paths are worth pursuing
17:17:02 seems like rdsys would fill the role of the broker
17:17:04 Unfortunately I don't think the encoded media tunneling can apply to Snowflake, because a browser extension won't have the necessary level of access to the media stream
17:17:26 dcf1: it could if we talk to pion
17:17:27 At least with browser-based proxies
17:17:33 aha right
17:17:39 just the standalone proxies would have it
17:17:45 yes, so 1) is kind of the subject of our proposal I was telling you about the other day. I can share it with you by tomorrow
17:18:13 Would definitely be interested in having a chat with you, after you got the chance to look at it
17:18:38 dmbb: yes let's do that! we started an email chain about it earlier this week
17:18:46 dcf1: do you want to be a part of the discussion?
17:18:51 (or anyone else here)
17:19:17 i'm sure roger will also be interested and i just realized he's away this and next week
17:19:37 cohosh: yes, thanks! I will reply to that thread with our draft
17:20:13 I'm afraid I may be not so available over the next week, but you can Cc me in the thread
17:20:24 cool, we can continue to coordinate after the reading group ends
17:20:31 Did anyone write a longer summary of the paper this week? agix? If not, I have one to post to bbs.
17:20:32 perfect
17:21:01 dcf1 i didn't prepare one, so feel free to use yours
17:21:19 dcf1: let me thank you again for the summary, it's rally on point
17:21:27 really*
17:21:36 np
17:21:57 just trying to build the constructive and cooperative research field I want
17:22:03 :D
17:22:41 hehe. does anyone have some other question about the paper?
17:23:07 dmbb: this was really impressive work, i appreciated the attention to detail with the implementation and the quality of the performance and detection evaluations
17:23:16 dmbb yeah a quick one from me. considering that Protozoa uses 98.8% of the available frame space for transmitting covert data, do you see any opportunities to enhance the throughput in the future?
17:24:00 cohosh: thank you! I've really been learning a lot from all the work you people have been doing!
17:24:17 agix: one possibility would be to use the audio channel as well
17:24:44 agix: i suppose the same kind of replacement can also be done there
17:25:11 agix: we did focus on video only since the majority of the bandwidth is allocated to video anyway
17:26:08 agix: for now, I don't see many other possibilities for us to replace more content from the video frames. We need the header for knowing which frame size to replace at the receiver
17:27:12 cohosh: I hope the code and artifacts may also be useful. I tried to build a full walkthrough for performing our experiments at https://github.com/dmbb/Protozoa
17:27:21 dmbb thanks for the info and great work btw!
17:28:06 agix: thank you!
17:29:21 alright, it seems discussion is winding down and it's been about an hour
17:29:30 i'll wait another minute and then close the meeting
17:30:11 dcf1: i agreed to review https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/21 but would leave it to you if you're interested
17:30:53 I am afraid I won't have time this week, you review it please
17:31:01 will do
17:31:04 thanks phw
17:31:12 #endmeeting