16:00:02 #startmeeting tor anti-censorship meeting
16:00:02 Meeting started Thu Mar 5 16:00:02 2026 UTC. The chair is Shelikhoo[mds]. Information about MeetBot at https://wiki.debian.org/MeetBot.
16:00:02 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:02 here is our meeting pad: https://pad.riseup.net/p/r.9574e996bb9c0266213d38b91b56c469
16:00:02 editable link available on request
16:00:08 hi~
16:00:10 hello
16:00:21 hi
16:00:33 sorry I am 2 seconds late
16:00:41 hi
16:01:03 or maybe it is the meeting bot that is 2 seconds late
16:01:04 hhhhhaaaa
16:03:36 okay, we can start the discussion about the first topic
16:03:47 I will wait for everyone to finish editing before sending the email
16:04:02 don't worry if you have yet to finish your log
16:04:35 The first topic is from me:
16:04:36 Should we prioritize better matching logic on the snowflake broker?
16:04:46 https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40077#note_3361718
16:04:46 See also: https://datatracker.ietf.org/doc/html/rfc5780
16:05:05 i think this is a good idea
16:05:32 so during the investigation of why there are so many failed matches (from Iran)
16:05:32 and that it probably explains the failures theodorsm found while testing covertdtls
16:06:37 I discovered one of the potential non-censorship-related reasons why we are seeing so many failed matches
16:07:12 while it is true that in regions with censorship, this will be compounded with blocking by IP
16:07:43 so this might not be the only reason we are seeing so many failed matches
16:07:47 and draining our proxy pool
16:07:51 this will certainly help
16:08:23 maybe even to the point that we don't need to tell clients that failed to make a match to cool down
16:09:07 which was one of the alternatives we were considering to work around the proxy pool getting drained
16:09:39 this sounds like something we might want to prioritize, as we've been hit by this problem twice in half a year already
16:10:47 In this ticket, I also mentioned that there are some websites that support NAT tests that can determine NAT types more accurately
16:12:35 I think by combining these two approaches, we will be able to fix (or reduce) the NAT-matching-related connection failures
16:14:09 yes i think we should try this out
16:14:22 we'll have to see how blocking-resistant these other NAT behaviour testing options are
16:14:33 It is worth mentioning that our current NAT testing for proxies might determine endpoint-dependent (but not port-dependent) filtering as "unrestricted"
16:15:11 i'm working on a refactor of the broker pools that will make it easier to add more pools if that's what we need
16:15:14 https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/663
16:17:06 so if the client's STUN server is blocked and it cannot send any candidate, it might not be able to connect to an "unrestricted" proxy
16:17:07 so that is one thing we might want to improve
16:17:42 thanks cohosh, I will first do more investigation and create a plan for how to proceed
16:17:52 before asking everyone in the team to review it
16:18:04 since it seems like a rather big task
16:18:09 over
16:18:39 okay, anything more we want to discuss on this topic?
16:19:11 thanks cohosh, that refactor will be super helpful >~<
16:19:12 not from me, thanks for looking into this Shelikhoo[mds]
16:19:12 proposal to get a separate domain name for bridge reachability tests
16:19:22 add a record for each known *.torproject.net bridge
16:19:22 https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40504#note_3359873
16:19:28 from cohosh ?
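The "unrestricted" misclassification mentioned at 16:14:33 corresponds to the RFC 5780 filtering tests: a STUN response from an alternate IP and port implies endpoint-independent filtering, a response only from the same IP on an alternate port implies address-dependent filtering, and neither implies address-and-port-dependent filtering. A minimal Go sketch of that decision (the type and function names are hypothetical, not snowflake's actual code):

```go
package main

import "fmt"

// FilteringBehavior is a hypothetical classification of a NAT's
// filtering policy, following the RFC 5780 filtering tests.
type FilteringBehavior int

const (
	EndpointIndependent     FilteringBehavior = iota // "unrestricted"
	AddressDependent                                 // responds only to contacted IPs
	AddressAndPortDependent                          // needs a matched peer
	Blocked                                          // no STUN response at all
)

// ClassifyFiltering maps the three test outcomes to a filtering
// behavior: gotBaseline reports a response from the primary STUN
// address, gotAltIP a response requested from an alternate IP and
// port, gotAltPort a response from the same IP on an alternate port.
func ClassifyFiltering(gotBaseline, gotAltIP, gotAltPort bool) FilteringBehavior {
	switch {
	case !gotBaseline:
		return Blocked
	case gotAltIP:
		return EndpointIndependent
	case gotAltPort:
		return AddressDependent
	default:
		return AddressAndPortDependent
	}
}

func main() {
	// A response from the alternate port but not the alternate IP is
	// address-dependent filtering, not "unrestricted".
	fmt.Println(ClassifyFiltering(true, false, true)) // prints 1 (AddressDependent)
}
```

The point of distinguishing the last two cases is exactly the one raised in the meeting: a proxy that only answers address-dependent probes should not be matched as if any client could reach it.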
16:20:09 oh, yeah, we discussed this a few meetings ago
16:20:58 and decided proxies should do a reachability check on snowflake.torproject.net (which i just realized is actually hardcoded as snowflake.bamsoftware.com in orbot)
16:21:22 but arma2 brought up the point that this bridge could go down, and we don't want all of our proxies to stop polling if this happens
16:22:05 he had a good solution to the problem with our current design pattern and not wanting proxies to have to know about all available bridges
16:22:31 to have a domain with multiple records, one for each known bridge
16:22:50 i think this will work, but does anyone else here with more DNS knowledge see a problem with that?
16:23:14 the problem there is if a censor decides to block the domain name of the bridge, but not the one we use to test....
16:23:17 I do think this plan has some tradeoffs we need to be explicit about
16:23:23 hmm
16:23:24 yes
16:23:32 firstly, a censor could do SNI blocking
16:23:35 yes
16:23:45 what if it's a CNAME record?
16:23:47 or maybe they can block all but one bridge
16:23:50 instead of an A record
16:24:22 a CNAME record does not impact SNI, sadly
16:24:43 so the SNI will still be the test domain
16:24:59 i see
16:24:59 even if the DNS is an alias
16:25:31 also, this will make it really hard for us to run more snowflake bridges in the future
16:25:44 since we need to trust each bridge to the point where we share a common certificate with it
16:25:59 maybe another option would be to have the proxy get CNAME records for the test domain and then try a websocket probe to each of those domains
16:26:39 proxies could ask the broker for the list of bridges
16:27:02 yeah, this might work, but some DNS recursive resolvers only allow plain A or AAAA records
16:27:08 but I like simpler options that don't depend on the broker; what you propose, requesting the CNAME and using it manually, sounds good if it's not too hard to implement
16:27:13 yeah that's another option, the broker has that bridge list json and can push information to proxies
16:27:35 I do have another option
16:27:49 that is, when the proxy tries the bridge's url and fails
16:28:19 it will report this error and reduce its poll frequency
16:28:37 as if it were asked to back off by the broker
16:29:13 with something like exponential backoff
16:29:19 if one bridge goes down, doesn't that mean that all proxies will back off?
16:29:39 but as soon as it successfully connects to a bridge, it will recover from that backoff
16:30:06 if only a single bridge goes down, the backoff will be very tiny
16:30:41 but if all bridges are inaccessible, the backoff will quickly increase to avoid spamming the network
16:31:01 i see, you're not talking about a probe at all, just polling, and if the connection fails backing off a bit
16:31:12 i think this will clash with https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/25598
16:31:57 the broker poll frequency can be combined with this backoff
16:32:07 or have a max() of these two
16:32:33 so it will poll after at least the broker backoff seconds or the bridge connection backoff seconds
16:32:39 whichever is higher
16:32:47 i guess another question is: if one of the two current bridges is blocked for a proxy, do we want it to poll at all?
16:33:03 because in this backoff scenario it would still poll and then fail roughly half of its connections
16:34:07 we could in theory let the proxy send the list of servers it cannot connect to recently
16:35:07 that sounds like an increase in the complexity of the matching logic that defeats the purpose of our current relay name superset pattern
16:35:21 yes....
16:35:23 which we could consider
16:35:35 so maybe we want it to poll?
16:36:02 even if it cannot reach some of the snowflake servers
16:36:37 i think i'm liking the simpler shorter-term solution to do a manual DNS lookup of the test domain and reachability checks for each CNAME record
16:36:45 this would be an easier design in my opinion
16:36:46 50% failure is not the end of the world, and with the backoff this proxy will not poll that frequently, so maybe it's not that bad
16:37:23 but I also like the CNAME record option
16:38:00 cohosh: do you think this cname plan would also work for the browser?
16:38:13 I think this will require a DoH request
16:38:14 oh good point
16:38:44 we could use a TXT record, but not sure if this is available in the browser
16:39:14 these CNAME and TXT lookups will work in the browser over DoH
16:39:46 but implementing DoH in js seems a little complex
16:40:09 unless there is some nice library we want to import
16:41:17 ok i can look into it, if there isn't an easy option i think the poll frequency backoff is a good idea
16:42:03 yeah, let's do some research about it outside the meeting
16:42:14 i guess it should be additive increase / additive decrease to avoid losing the pool if a bridge goes down for everyone?
16:42:17 there are also things like CORS for DoH
16:43:51 my original thought was exponential backoff + recovery to the initial 1 sec after a single success
16:44:20 but I think both AIAD and exponential backoff + instant recovery would work
16:44:39 ok, this could use some more thinking and testing too
16:44:40 we might need to run some numbers to find out the best option
16:44:48 thanks for the discussion
16:44:55 nice!
16:45:13 anything more we want to discuss in this meeting?
16:45:52 Thanks everyone!
16:45:53 #endmeeting