16:00:17 <onyinyang[m]> #startmeeting tor anti-censorship meeting
16:00:17 <MeetBot> Meeting started Thu Nov 9 16:00:17 2023 UTC. The chair is onyinyang[m]. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:28 <onyinyang[m]> hello everyone!
16:00:28 <onyinyang[m]> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:00:34 <cohosh> hi
16:00:36 <meskio> hello
16:00:36 <shelikhoo> hi~
16:00:58 <onyinyang[m]> sorry for the late start, I blame DST
16:01:08 <shelikhoo> Do we want to try a private pad? or try that in the next meeting?
16:01:20 <shelikhoo> private=read only for public
16:01:38 <meskio> shelikhoo: let's explain a bit the problem:
16:01:39 <onyinyang[m]> since we only discussed it last week and didn't land on a solution, perhaps it would be best to discuss it fully in this meeting?
16:01:56 <meskio> we have our pad vandalized regularly and have to recover it manually
16:02:20 <meskio> we could try to have a public link that is read only and share the edit link with the people that usually participate in this meeting
16:02:30 <meskio> riseup pads do support that
16:02:46 <meskio> ahh, it was discussed last week, sorry I was not around
16:02:59 <meskio> I think we should give it a try next week
16:04:44 <onyinyang[m]> we discussed the issue last week but we were going to look into some things over the course of this week
16:05:10 <meskio> ahh, cool
16:05:11 <onyinyang[m]> I think there has been some discussion here: https://gitlab.torproject.org/tpo/community/hackweek/-/issues/16#note_2964041
16:05:39 <onyinyang[m]> I'm not sure if anything has been finalized yet though as I haven't been following the issue very closely since I posted on it
16:05:54 <meskio> I'm ok changing tools if needed, but we might not even need to do that, the only *problem* I see is that read-only links in riseup pads are not human friendly
16:06:12 <dcf1> I never knew about the read-only share link, thanks. There it is right in the toolbar.
16:06:30 <meskio> dcf1: exactly, it's in the share icon
16:06:47 <shelikhoo> yes, we also had a look at other tools, and etherpad with read only is the step that makes the least change and still fulfills our requirement
16:07:44 <meskio> onyinyang[m]: in hackweek#16 we didn't reach any conclusions, more like exploring things, and I've been experimenting with pad backups
16:07:57 <onyinyang[m]> ok so is the proposed course of action: read-only links for riseup pad shared publicly with edit links for regular attendees starting next week
16:08:30 <onyinyang[m]> and then possibly move to one of the other ideas (etherpad/cryptpad) if that doesn't work as expected?
16:09:01 <shelikhoo> I think this is the right move...
16:09:20 <meskio> sounds good, I think I'm next week's facilitator, I can take care of setting it up
16:09:21 <cohosh> sounds good to me
16:09:29 <onyinyang[m]> sounds reasonable to me :)
16:10:18 <onyinyang[m]> ok so we're a bit out of order, but that's fine.
Going back up to the top, we have a Fastly discussion point:
16:10:21 <onyinyang[m]> Fastly to block domain fronting in February 2024 https://lists.torproject.org/pipermail/anti-censorship-team/2023-October/000328.html
16:10:48 <onyinyang[m]> hmm, sorry about that formatting
16:10:55 <meskio> not sure if there is anything concrete to discuss on that topic, but I added a related topic:
16:11:00 <onyinyang[m]> this is the link: https://lists.torproject.org/pipermail/anti-censorship-team/2023-October/000328.html
16:11:05 <meskio> azure is giving a new date for the domain front closing
16:11:09 <meskio> January 8
16:11:19 <meskio> https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/33#note_2963884
16:11:24 <meskio> it was supposed to be yesterday
16:11:35 <onyinyang[m]> right
16:12:15 <meskio> I guess this is in the hands of cohosh to investigate and we might need to wait for her results to react
16:12:29 <cohosh> i don't have any updates yet
16:13:05 <meskio> we still have a couple of months
16:13:35 <meskio> EOF from my side
16:14:17 <cohosh> same from me
16:14:51 <onyinyang[m]> ok I think that's all of the discussion points
16:15:29 <onyinyang[m]> There are a couple of interesting links, namely video recordings from FOCI & PETS 2023
16:15:58 <onyinyang[m]> and a forum post about Snowflake
16:16:05 <onyinyang[m]> https://forum.torproject.org/t/snowflake-daily-operations-october-2023-update/10106
16:16:46 <onyinyang[m]> is there anything in particular anyone would like to mention about any of those?
16:17:37 <onyinyang[m]> If not, we can move to the reading group discussion
16:18:21 <onyinyang[m]> Ok, let's move on.
16:18:39 <onyinyang[m]> The paper we decided to discuss today is: On Precisely Detecting Censorship Circumvention in Real-World Networks
16:18:53 <onyinyang[m]> It can be found here: https://www.robgjansen.com/publications/precisedetect-ndss2024.html
16:18:59 <rwails> For the reading group.. hi! Rob Jansen and I (Ryan Wails) are here to aid with discussion :)
16:19:07 <robgjansen[m]> 👋
16:19:11 <cohosh> welcome!
16:19:14 <dcf1> oh great you're here
16:19:16 <shelikhoo> hi~ welcome!
16:19:21 <meskio> nice to have you around, congrats for the paper, it's pretty good
16:19:25 <onyinyang[m]> Great! Thank you for coming :)
16:21:11 <dcf1> So this paper takes another look at past work that has claimed to be able to classify circumvention traffic with high precision
16:21:22 <dcf1> notably Wang et al. 2015 https://censorbib.nymity.ch/#Wang2015a
16:22:01 <dcf1> The biggest problem, of course, is the base rate: since circumvention flows are only a very small proportion of traffic, classifiers need to have very low false positives
16:23:06 <dcf1> In this work, they use notation with a λ to quantify the traffic mix. λ is how many non-circumventing flows there are for each circumventing flow. λ = 1 is an equal mix. λ = 100 means roughly 1% of flows are circumventing.
16:23:31 <dcf1> Take a look at Table I on page 6 https://www.robgjansen.com/publications/precisedetect-ndss2024.pdf#page=6
16:24:16 <dcf1> It shows how precision/recall figures that look rosy at λ = 1 become pretty dire at even λ = 1000 (which still probably implies a wildly high share of circumvention traffic compared to real-world conditions)
16:25:00 <dcf1> They build hand-tuned classifiers that are better than Wang et al.'s, and then a deep learning classifier that does even better.
16:25:26 <dcf1> But even that, they say, has too many false positives to be useful at λ > 10,000 or so.
16:26:06 <dcf1> So to mitigate the low per-flow precision, they propose "host-based analysis" (Section VI), where you watch multiple flows to the same IP address over time.
16:26:41 <dcf1> Snowflake naturally mitigates this kind of host-based analysis (it is believed), because of the way the proxies are not at consistent IP addresses.
16:26:52 <dcf1> That's the end of my summary.
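Editor's sketch (not part of the meeting log): the base-rate effect in the summary above can be worked through numerically. With λ non-circumventing flows per circumventing flow, a classifier with true-positive rate TPR and false-positive rate FPR has precision TPR / (TPR + λ·FPR). The rates below are hypothetical round numbers chosen for illustration; they are not taken from Table I of the paper, and the function names are made up here.

```python
def circumventing_share(lam: float) -> float:
    """Fraction of all flows that are circumventing when there are
    `lam` non-circumventing flows per circumventing flow."""
    return 1.0 / (1.0 + lam)


def precision(tpr: float, fpr: float, lam: float) -> float:
    """Precision of a per-flow classifier at traffic mix `lam`.

    Per circumventing flow we expect `tpr` true positives and
    `lam * fpr` false positives.
    """
    true_pos = tpr
    false_pos = lam * fpr
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier (assumed values, not from the paper):
# 99% true-positive rate, 1% false-positive rate.
TPR, FPR = 0.99, 0.01

for lam in (1, 100, 1000, 10000):
    print(
        f"lambda={lam:>6}  "
        f"share={circumventing_share(lam):.4%}  "
        f"precision={precision(TPR, FPR, lam):.4f}"
    )
```

Under these assumed rates, precision drops from 0.99 at λ = 1 to about 0.50 at λ = 100 and below 0.01 at λ = 10,000, which is why the false-positive rate has to be extremely low for per-flow detection to be usable.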
16:27:08 <meskio> the nice thing of the conclusions is that not only does snowflake mitigate that
16:27:10 <onyinyang[m]> Thanks for that great summary dcf1 !
16:27:22 <meskio> also our new PTs: conjure and webtunnel do mitigate it
16:27:28 <meskio> conjure by using ephemeral hosts
16:27:29 <dcf1> Besides the general research, this paper is interesting to us, because it looks specifically at obfs4 (and a hacked entropy-reduced obfs4 called obfs4*) as well as Snowflake rendezvous and data transfer.
16:27:39 <meskio> and webtunnel by having other traffic on the same ip:port
16:28:55 <dcf1> By the way, I wrote to the authors of "Covertness Analysis of Snowflake Proxy Request" (https://ieeexplore.ieee.org/document/10152736) that we linked a few weeks ago
16:29:10 <dcf1> and asked what their effective λ was in evaluation
16:29:41 <shelikhoo> I think there is another paper being published that shows traffic shape analysis may be able to identify tls in tls traffic, as in the case of webtunnel
16:29:51 <dcf1> they said λ = 3.97 (4100 negative to 1032 positive)
16:30:28 <cohosh> i'm not sure in practice how ephemeral conjure hosts are
16:31:10 <cohosh> theoretically, the unused IP address space changes, but in practice it might not be enough or a large enough space if this technique were deployed
16:31:17 <dcf1> shelikhoo: yes, good point. In fact the DL classifier in this paper does not use all the features it conceivably could: it just uses traffic sizes and directions.
16:31:34 <dcf1> V-E: "It is perhaps surprising that the CNN classifiers outperform the classical approaches using only packet sizes and directions."
16:32:11 <meskio> cohosh: in this paper the researchers call "host" the combination of IP+port, so as long as conjure uses different ports it might work
16:32:31 <meskio> snowflake is nice there as it does use a different port per connection
16:32:32 <shelikhoo> from real users' feedback, I am aware that many users are reporting websocket-tls-vmess traffic blocked by IP address or port, which reinforces the idea of censors using host-based censorship analysis
16:33:01 <shelikhoo> I was not aware of a way to reliably reproduce such censorship
16:33:22 <cohosh> at the moment, conjure uses fixed ports, but it could depend on the transport being used
16:33:38 <cohosh> if they eventually support a webrtc transport, for example, this could change
16:33:59 <meskio> ohh, I see, it might be a nice thing to improve...
16:34:14 <shelikhoo> and this is one of the challenges of dealing with host-based analysis and censorship: the difficulty of reproducing it reliably in a real-world environment
16:34:19 <rwails> it may not be necessary for a censor to use ports, btw, but it was the easiest way for us to isolate flows; for the host-based analysis we proposed to work, the censor needs a reliable way to capture sets of flows corresponding to one protocol/activity
16:35:11 <rwails> (but the censor is not limited to such choices)
16:35:46 <meskio> sure, but then having hosts that have other services will make the censor's life harder
16:35:54 <rwails> right
16:35:57 <meskio> like if I host an obfs4 bridge in a server where I also have a website
16:36:35 <rwails> yes, if the censor looks across flows only for the IP address, then the website flows may serve as confusion
16:36:55 <shelikhoo> At the same time, creating a website does increase the cost of creating a bridge....
16:37:10 <dcf1> There is a piece of somewhat related research, coming out of a Chinese research lab in 2020
16:37:19 <dcf1> https://ieeexplore.ieee.org/document/9408011 "Towards Aggregated Features: A Novel Proxy Detection Method Using NetFlow Data"
16:38:24 <dcf1> They use NetFlow data, which is aggregated and fairly information-poor. To compensate for the low quality of the data, they look at multiple measurements of the same IP/port tuple over time, like the host-based analysis of this paper.
16:38:36 <dcf1> > Although NetFlow data is widely available today, it also brings about some challenging problems. The main reason is that NetFlow data is collected by sampling, resulting in the inability to obtain comprehensive information. Meanwhile, the statistical attributes of the sampled data lose original representation meaning. Therefore, we adopt NetFlow data aggregation method to overcome the challenges
16:38:42 <dcf1> imposed by using NetFlow data to achieve better proxy detection effect. In addition, through the approach, we aggregate statistics across multiple flows, which is not possible in a single flow.
16:38:46 <dcf1> > In order to deal with this problem, we design effective features from raw NetFlow data by data aggregation. In order to extract the aggregated features, we should first select the aggregation key and aggregation time window. In this paper, the IP (source IP/destination IP) is treated as a keyword, and the choice of the appropriate time window will be discussed in depth in section III-F2. Then all
16:38:52 <dcf1> NetFlow records with the key IP in the time window are aggregated to generate a feature vector.
16:39:20 <rwails> oh that's really interesting
16:40:33 <dcf1> Tunneled traffic features are something we are going to have to start paying more serious attention to. Now that we have more empirically informed threat models for it, we can do better than the best-effort attempts of ScrambleSuit and obfs4.
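Editor's sketch (not part of the meeting log): a rough illustration of aggregating per-flow classifier outputs by host, in the spirit of the host-based analysis discussed above. It is a simplified stand-in, not the method of either paper: the (IP, port) key follows the discussion, but the function name, the thresholds, and the sample flows are all hypothetical.

```python
from collections import defaultdict


def flag_hosts(flows, min_flows=5, min_fraction=0.8):
    """Return the set of (ip, port) hosts whose flows are mostly flagged.

    `flows` is an iterable of (ip, port, flagged) tuples, where `flagged`
    is the boolean output of some per-flow classifier. Both thresholds
    are illustrative assumptions.
    """
    counts = defaultdict(lambda: [0, 0])  # host -> [flagged, total]
    for ip, port, flagged in flows:
        entry = counts[(ip, port)]
        entry[0] += int(flagged)
        entry[1] += 1
    return {
        host
        for host, (pos, total) in counts.items()
        if total >= min_flows and pos / total >= min_fraction
    }


# Hypothetical observations, using documentation-range addresses.
# A dedicated bridge: nearly every flow is flagged.
dedicated_bridge = [("192.0.2.10", 443, True)] * 9 + [("192.0.2.10", 443, False)]
# An ordinary website: only an occasional false positive.
busy_website = [("198.51.100.20", 443, True)] + [("198.51.100.20", 443, False)] * 99
# A bridge sharing ip:port with a real website: flagged flows are diluted.
shared_host = [("203.0.113.30", 443, True)] * 10 + [("203.0.113.30", 443, False)] * 90

suspects = flag_hosts(dedicated_bridge + busy_website + shared_host)
print(suspects)
```

With these made-up inputs only the dedicated bridge is flagged. The shared host escapes this particular rule because benign flows dilute the flagged fraction, which mirrors the point made in the meeting about hosting other services on the same ip:port; a censor could of course choose a different aggregation rule.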
16:41:54 <shelikhoo> one of the quick fixes for this issue is traffic multiplexing, just tunneling more than one payload connection in a proxy connection
16:42:06 <rwails> maybe related -- here's a link to new work appearing at ACM SIGCOMM where they develop efficient classifiers able to detect tunneled connections: <https://dl.acm.org/doi/10.1145/3603269.3604840>
16:42:06 <dcf1> I have a vague concept of introducing a list of traffic shaping "challenges" to encourage developers to start making the changes to their programs that will be necessary for more sophisticated traffic shaping
16:42:06 <shelikhoo> this does create other issues such as performance
16:43:14 <dcf1> My thinking is, there's currently a chicken-and-egg problem: current tools don't support arbitrary shaping, and also no one knows what a "good" shape should be, even if you could achieve it
16:44:13 <dcf1> My idea was to propose challenges that are admittedly not "good", but still provide something to target to make changes that have to be made anyway. And then, with better tool support, the community is in a better position overall to do experiments and find that "good" family of traffic schedules.
16:44:41 <dcf1> I have ambitions to write up something more complete about it, but I did post a sketch to give you an idea: https://github.com/net4people/bbs/issues/281#issuecomment-1724755111
16:45:12 <shelikhoo> https://github.com/3andne/restls/blob/main/Restls-Script%3A%20Hide%20Your%20Proxy%20Traffic%20Behavior.md
16:45:41 <shelikhoo> I think there are some tools that are trying to achieve scriptable padding
16:45:43 <dcf1> rwails, robgjansen[m]: my intuition is that, for example, if you had something like obfs4, but the server does the first send, rather than the client, it would confuse a classifier based on direction/size. Is that right?
16:46:35 <rwails> yes, I do think these DL classifiers are fairly sensitive to perturbations like that
16:46:59 <meskio> robgjansen[m]: could proteus do this kind of thing?
16:47:06 <robgjansen[m]> i would expect some resilience to a single bit-flip though
16:47:13 <rwails> we found that obfs4 had other identifiable features, specifically in the size of packets sent, but I think it would help
16:47:25 <dcf1> shelikhoo: because yeah, the packet-at-a-time padding/chopping is not good enough, we're learning
16:48:18 <robgjansen[m]> yeah we aim to generate protocols in proteus (github.com/unblockable/proteus) that can do many different handshake patterns including server-sends-first
16:48:20 <rwails> like Rob is alluding to though, I would imagine it would be easy to re-train the classifier to learn features that are more robust if the censor was able to collect flows from the modified obfs4 instance, if only bit flips were considered
16:49:26 <cohosh> the strength of proteus being that you can more easily provide a moving target of the features you'd need to learn, right?
16:49:37 <robgjansen[m]> yeah any changes to the PTs will always be public and hence the adversary can always retrain
16:49:42 <meskio> dcf1: does it make sense to add some of that to our research ideas wiki page? https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Research-ideas
16:49:57 <robgjansen[m]> so i think a very important design point is being able to adapt quickly
16:50:16 <dcf1> meskio: I will write it up at some point.
16:50:17 <shelikhoo> or procedurally generated protocols, so that it is difficult to ban them all
16:50:18 <rwails> cohosh: that's the idea :) make identifiable features tweakable
16:50:36 <meskio> dcf1: thanks :)
16:50:44 <dcf1> robgjansen[m]: well, adapt quickly, or else match something valuable to the censor so well that it doesn't get blocked.
16:50:54 <robgjansen[m]> hopefully both
16:51:05 <shelikhoo> yeah..
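Editor's sketch (not part of the meeting log): one common way to turn a flow into classifier input from packet sizes and directions only is a fixed-length sequence of signed sizes. The encoding below (positive for client-to-server, negative for server-to-client, zero padding) is an assumption for illustration and is not confirmed to be the paper's exact input format; the packet sizes are invented and are not real obfs4 handshake sizes.

```python
def to_signed_sizes(packets, length=6):
    """Encode a flow as a fixed-length list of signed packet sizes.

    `packets` is a list of (direction, size) pairs, where direction is
    "c2s" (client to server) or "s2c" (server to client). The result is
    truncated or zero-padded to `length` entries.
    """
    seq = [size if direction == "c2s" else -size for direction, size in packets]
    return (seq + [0] * length)[:length]


# Hypothetical handshakes: same packets, different party sends first.
client_first = [("c2s", 1400), ("s2c", 1200), ("c2s", 300)]
server_first = [("s2c", 1200), ("c2s", 1400), ("c2s", 300)]

print(to_signed_sizes(client_first))  # [1400, -1200, 300, 0, 0, 0]
print(to_signed_sizes(server_first))  # [-1200, 1400, 300, 0, 0, 0]
```

The two encodings differ in their leading entries, which gives a feel for why a model trained only on client-first flows might be thrown off by a server-sends-first variant, and also why, as noted in the discussion, a censor who collects flows from the modified protocol could retrain on the new pattern.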
16:51:07 <dcf1> It's the same as in the "Grounding Circumvention in Empiricism": the two strategies are polymorphic and steganographic
16:51:23 <dcf1> same for traffic signature as for protocol payload features, as I see it.
16:51:56 <dcf1> But yeah, that's part of my idea, every time this topic comes up, everyone gets bogged down in talking about specifics, and no progress is made.
16:52:20 <dcf1> I'm hoping to cut through that impasse by giving developers some concrete targets, even if those targets are not directly useful for circumvention.
16:52:58 <robgjansen[m]> also not all censors behave the same so while we maybe only have some types of protocols that work in one country (eg because those protocols match well to something and have high collateral damage), in other countries maybe a wider set of protocols are still useful
16:53:22 <dcf1> good point robgjansen[m]
16:55:04 <meskio> yes, that will make an interesting challenge on how we classify the protocols that work in each country to make sure to distribute working ones in each country
16:55:22 <dcf1> robgjansen[m], I remember you were in the audience for Wang et al. 2015 at CCS (so was I)
16:55:34 <meskio> or we could just ignore that problem completely and head towards the lox idea of testing bridges and know what is blocked where without checking what protocol is there
16:56:15 <dcf1> And you asked a question afterward like, "so all these systems are totally broken, what do we do now?"
16:56:35 <robgjansen[m]> i think protocol adaptation to find the ones that work best in a target environment, and then using a non-traffic shaped approach like obfs4 still has legs, but the next step will be traffic shaping as mentioned earlier. traffic shaping is more complicated but definitely on our research plan. i do wonder how long it will take censors to just move to allowlisting and i fear we are accelerating toward that point.
16:57:57 <robgjansen[m]> > "so all these systems are totally broken, what do we do now?"
16:57:57 <robgjansen[m]> Such a good question :D
16:57:58 <shelikhoo> let's say in Turkmenistan, they are blocking an entire /24 when proxies are discovered in that ip range
16:58:33 <shelikhoo> as a result, there are fewer and fewer ip ranges that are reachable
16:58:39 <dcf1> sure, but Turkmenistan is hardly a representative example
16:59:15 <shelikhoo> yes, it was just an extreme example of what could happen in an allowlist future
16:59:24 <cohosh> i had a followup question about the utility of webtunnel (based on HTTPT) against host-based attacks
16:59:25 <dcf1> Allowlisting has a lot of downsides for a censor, it's not free to "just" move to allowlisting. It's only feasible in an environment like TM, where the network is so little valued they don't care about breaking it further.
16:59:37 <shelikhoo> that being said I have no idea how long it will take or if that will actually happen in other regions
17:00:06 <cohosh> if you have two (or maybe more) different shapes of traffic going to the same ip:port, can this attack be adapted to deal with that?
17:00:39 <cohosh> and how much of the benign, probe-resisting traffic would you need to throw it off?
17:00:48 <dcf1> As in a lot of activism, we have to contend with our opponent's degree of sociopathy
17:01:08 <dcf1> And it's probably true that "the censor can stay sociopathic longer than you can remain solvent" :D
17:01:47 <robgjansen[m]> sure it could possibly learn the two modes very tightly, and maybe even learn that those are actually two protocols on the same ip:port.
17:01:47 <shelikhoo> dcf1: yes, i agree with you that allowlist based censorship is a costly move, so it is not like it is imminent
17:02:05 <robgjansen[m]> I think that's what the ggfast paper is doing maybe?
17:02:36 <cohosh> what is the ggfast paper?
17:02:46 <meskio> one misfeature of webtunnel is that λ might actually be 1 or lower for the traffic into that host from that censored country
17:02:54 <dcf1> ggfast https://dl.acm.org/doi/10.1145/3603269.3604840
17:02:57 <rwails> cohosh: I think that would help if the censor is using a classifier that is tuned to only one of the traffic shapes. We did think about possibly adapting a host-based aggregation method that allows for background traffic to exist, but it's hard to be confident that a host is participating in circumvention if there is a large fraction of flows that are not doing so
17:03:08 <cohosh> thanks for the link
17:03:45 <cohosh> one thing about webtunnel is that the probe-resistant shape can be modified without involving a protocol update in the client
17:04:08 <cohosh> it's potentially more adaptable than proteus because you don't need to ship a new bridge line with the new spec
17:04:20 <cohosh> but getting traffic to it is the tricky part
17:04:31 <cohosh> it's just there if a censor is probing it, not for regular use
17:05:08 <cohosh> which is what meskio mentioned above, i think
17:05:26 <rwails> is there a link you can drop where we can read more about webtunnel?
17:05:48 <meskio> I mean, I expect most people to host webtunnels in real websites, but those websites might not have much traffic from the censored places...
17:06:00 <cohosh> rwails: it's based on HTTPT https://censorbib.nymity.ch/#Frolov2020b
17:06:01 <shelikhoo> webtunnel is an alias of HTTPT
17:06:18 <rwails> oh ok, got it, thanks :)
17:06:20 <meskio> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/webtunnel/
17:06:50 <shelikhoo> although I am not so sure we can update its shape without an update to the client when it is used as a proxy
17:07:20 <shelikhoo> we are able to send traffic to the website it is fronting with
17:07:20 <cohosh> shelikhoo: not the shape of the circumvention flows to it, the shape of the benign flows
17:07:32 <shelikhoo> cohosh: yes
17:08:28 <dcf1> webtunnel forum posts (incl. setup guide) https://forum.torproject.org/t/tor-relays-announcement-webtunnel-a-new-pluggable-transport-for-bridges-now-available-for-deployment/8180
17:08:32 <dcf1> https://forum.torproject.org/t/call-for-testers-webtunnel-a-new-way-to-bypass-censorship-with-tor-browser/9855
17:08:44 <cohosh> i guess the shape is limited by changing the website it serves
17:10:13 <robgjansen[m]> cohosh: i don't understand your point about webtunnel enough to defend proteus the way i want, but i think the general point is either you ship a bunch of configuration choices ahead of time and have an algorithm on the client/server for choosing the right one, or you eventually have to update something
17:10:37 <robgjansen[m]> both are viable strategies for any PT i think
17:10:51 <cohosh> oh i didn't mean to say that webtunnel is strictly better than proteus
17:11:23 <shelikhoo> (BTW: research frontier in China is mostly about deniable anti-censorship like ShadowTLSv3/restls or throttle resistant proxy like hysteria2)
17:11:42 <cohosh> webtunnel is still limited in the adaptability of the circumvention protocol which seems like a pretty big shortcoming in light of this work
17:11:54 <cohosh> i was just trying to understand if its other features were useful here
17:12:52 <robgjansen[m]> ahh ok. yeah i think adaptability is huge in our game of cat-and-mouse. hopefully it's already on dcf1 's list of dev challenges :)
17:12:53 <shelikhoo> and a tool to enable traffic shaping could be one of the next
17:14:00 <onyinyang[m]> it seems like the discussion is winding down a bit and we're ~15min over time
17:14:20 <onyinyang[m]> does anyone have any final thoughts or questions?
17:14:37 <onyinyang[m]> Otherwise, perhaps we can move further discussion to #tor-anticensorship:matrix.org ?
17:15:07 <cohosh> rwails: robgjansen[m]: this paper was really great
17:15:15 <cohosh> thanks for writing it and discussing with us
17:15:21 <dcf1> agreed, quality research
17:15:30 <dcf1> I appreciate the point you make in the conclusion:
17:15:35 <dcf1> "We focus on exploring realistic censorship adversaries *in service of understanding how to develop stronger CRSes*."
17:15:43 <meskio> yes, was a great paper and conversation
17:15:56 <shelikhoo> thanks for your work! nice paper!
17:16:30 <onyinyang[m]> yes! Great job and hopefully this will spur future research in helpful directions for censorship resistance!
17:16:56 <rwails> thanks! we're glad it was useful :) happy to have any follow up discussion too if there are any other questions
17:17:11 <robgjansen[m]> Thanks for the nice comments and great discussion! Of course we're available if you have any more questions later or what to discuss further.
17:17:24 <robgjansen[m]> s/what/want/
17:18:26 <onyinyang[m]> Great, thank you both for joining in the discussion today :)
17:18:28 <onyinyang[m]> With that, I will end the meeting now
17:18:31 <onyinyang[m]> #endmeeting