16:00:02 <Shelikhoo[mds]> #startmeeting tor anti-censorship meeting
16:00:02 <MeetBot> Meeting started Thu Mar 5 16:00:02 2026 UTC. The chair is Shelikhoo[mds]. Information about MeetBot at https://wiki.debian.org/MeetBot.
16:00:02 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:02 <Shelikhoo[mds]> here is our meeting pad: https://pad.riseup.net/p/r.9574e996bb9c0266213d38b91b56c469
16:00:02 <Shelikhoo[mds]> editable link available on request
16:00:08 <Shelikhoo[mds]> hi~
16:00:10 <meskio[mds]> hello
16:00:21 <gaba> hi
16:00:33 <Shelikhoo[mds]> sorry I am 2 seconds late
16:00:41 <cohosh> hi
16:01:03 <Shelikhoo[mds]> or maybe it is the meeting bot that is 2 seconds late
16:01:04 <Shelikhoo[mds]> hhhhhaaaa
16:03:36 <Shelikhoo[mds]> okay, we can start the discussion on the first topic
16:03:47 <Shelikhoo[mds]> I will wait until everyone finishes editing before sending the email
16:04:02 <Shelikhoo[mds]> don't worry if you have yet to finish your log
16:04:35 <Shelikhoo[mds]> The first topic is from me:
16:04:36 <Shelikhoo[mds]> Should we prioritize better matching logic on the snowflake broker?
16:04:46 <Shelikhoo[mds]> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40077#note_3361718
16:04:46 <Shelikhoo[mds]> See also: https://datatracker.ietf.org/doc/html/rfc5780
16:05:05 <cohosh> i think this is a good idea
16:05:32 <Shelikhoo[mds]> so during the investigation of why there are so many failed matches (from Iran)
16:05:32 <cohosh> and that it probably explains the failures theodorsm found while testing covertdtls
16:06:37 <Shelikhoo[mds]> I discovered one of the potential non-censorship-related reasons why we are seeing so many failed matches
16:07:12 <Shelikhoo[mds]> while it is true that in regions with censorship, this will be compounded with blocking by IP
16:07:43 <Shelikhoo[mds]> so this might not be the only reason we are seeing so many failed matches
16:07:47 <Shelikhoo[mds]> and draining our proxy pool
16:07:51 <Shelikhoo[mds]> fixing this will certainly help
16:08:23 <Shelikhoo[mds]> maybe even to the point that we don't need to tell clients that failed to make a match to cool down
16:09:07 <Shelikhoo[mds]> which was one of the alternatives we were considering to work around the proxy pool getting drained
16:09:39 <meskio[mds]> this sounds like something we might want to prioritize, as we've been hit by this problem twice already in half a year
16:10:47 <Shelikhoo[mds]> In this ticket, I also mentioned that there are some websites that support NAT tests that can determine NAT types more accurately
16:12:35 <Shelikhoo[mds]> I think by combining these two approaches, we will be able to fix (or reduce) the NAT-matching-related connection failures
16:14:09 <cohosh> yes i think we should try this out
16:14:22 <cohosh> we'll have to see how blocking-resistant these other NAT behaviour testing options are
16:14:33 <Shelikhoo[mds]> It is worth mentioning that our current NAT testing for proxies might determine endpoint (not port) dependent filtering as "unrestricted"
16:15:11 <cohosh> i'm working on a refactor of the broker pools that will make it easier to add more pools if that's what we need
16:15:14 <cohosh> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/663
16:17:06 <Shelikhoo[mds]> so if the client's STUN server is blocked and it cannot send any candidate, it might not be able to connect to an "unrestricted" proxy
16:17:07 <Shelikhoo[mds]> so that is one thing we might want to improve
16:17:42 <Shelikhoo[mds]> thanks cohosh, I will first do more investigation and create a plan for how to proceed
16:17:52 <Shelikhoo[mds]> before asking everyone in the team to review it
16:18:04 <Shelikhoo[mds]> since it seems like a rather big task
16:18:09 <Shelikhoo[mds]> over
16:18:39 <Shelikhoo[mds]> okay, anything more we want to discuss in this topic?
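[Editor's sketch: the distinction raised above — a NAT whose filtering is endpoint (address) dependent being misreported as "unrestricted" — follows from the three filtering probes in RFC 5780. A minimal classification sketch, assuming hypothetical names (this is not snowflake's actual probe code); the actual probes would use STUN responses from the primary and alternate server addresses:]

```go
package main

import "fmt"

// FilteringType follows the RFC 5780 filtering-behavior categories.
type FilteringType string

const (
	EndpointIndependent     FilteringType = "endpoint-independent"     // "unrestricted" in snowflake terms
	AddressDependent        FilteringType = "address-dependent"        // replies from a new IP are dropped
	AddressAndPortDependent FilteringType = "address-and-port-dependent" // "restricted"
)

// classifyFiltering maps the results of the three RFC 5780 filtering
// probes to a filtering type. recvSame: a reply from the same IP:port
// arrived; recvAltPort: a reply from the same IP but a different port
// arrived; recvAltIP: a reply from a different IP arrived.
func classifyFiltering(recvSame, recvAltPort, recvAltIP bool) FilteringType {
	switch {
	case recvAltIP:
		return EndpointIndependent
	case recvAltPort:
		return AddressDependent
	default:
		return AddressAndPortDependent
	}
}

func main() {
	// A NAT that passes replies from a new port but drops replies from a
	// new IP is address-dependent: a test that only probes the alternate
	// port would wrongly report it as unrestricted.
	fmt.Println(classifyFiltering(true, true, false))
}
```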
16:19:11 <Shelikhoo[mds]> thanks cohosh, that refactor will be super helpful >~<
16:19:12 <cohosh> not from me, thanks for looking into this Shelikhoo[mds]
16:19:12 <Shelikhoo[mds]> proposal to get a separate domain name for bridge reachability tests
16:19:22 <Shelikhoo[mds]> add a record for each known *.torproject.net bridge
16:19:22 <Shelikhoo[mds]> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40504#note_3359873
16:19:28 <Shelikhoo[mds]> from cohosh ?
16:20:09 <cohosh> oh, yeah, we discussed this a few meetings ago
16:20:58 <cohosh> and decided proxies should do a reachability check on snowflake.torproject.net (which i just realized is actually hardcoded as snowflake.bamsoftware.com in orbot)
16:21:22 <cohosh> but arma2 brought up the point that this bridge could go down, and we don't want all of our proxies to stop polling if this happens
16:22:05 <cohosh> he had a good solution to the problem, given our current design pattern and not wanting proxies to have to know about all available bridges
16:22:31 <cohosh> to have a domain with multiple records, one for each known bridge
16:22:50 <cohosh> i think this will work, but does anyone else here with more DNS knowledge see a problem with that?
16:23:14 <meskio[mds]> the problem there is if a censor decides to block the domain name of the bridge, but not the one we use to test....
16:23:17 <Shelikhoo[mds]> I do think this plan has some tradeoffs we need to be explicit about
16:23:23 <cohosh> hmm
16:23:24 <cohosh> yes
16:23:32 <Shelikhoo[mds]> firstly, a censor could do SNI blocking
16:23:35 <Shelikhoo[mds]> yes
16:23:45 <cohosh> what if it's a CNAME record?
16:23:47 <Shelikhoo[mds]> or maybe they can block all but one bridge
16:23:50 <cohosh> instead of an A record
16:24:22 <Shelikhoo[mds]> a CNAME record does not impact SNI, sadly
16:24:43 <Shelikhoo[mds]> so the SNI will still be the test domain
16:24:59 <cohosh> i see
16:24:59 <Shelikhoo[mds]> even if the DNS record is an alias
16:25:31 <Shelikhoo[mds]> also, this will make it really hard for us to run more snowflake bridges in the future
16:25:44 <Shelikhoo[mds]> since we would need to trust each bridge to the point of sharing a common certificate with it
16:25:59 <cohosh> maybe another option would be to have the proxy get CNAME records for the test domain and then try a websocket probe to each of those domains
16:26:39 <meskio[mds]> proxies could ask the broker for the list of bridges
16:27:02 <Shelikhoo[mds]> yeah, this might work, but some recursive DNS servers only allow plain A or AAAA records
16:27:08 <meskio[mds]> but I like simpler options that don't depend on the broker, what you propose of requesting the CNAME and using it manually sounds good if it is not too hard to implement
16:27:13 <cohosh> yeah that's another option, the broker has that bridge list json and can push information to proxies
16:27:35 <Shelikhoo[mds]> I do have another option
16:27:49 <Shelikhoo[mds]> that is, when the proxy tries the bridge's URL and fails
16:28:19 <Shelikhoo[mds]> it will report this error and reduce its poll frequency
16:28:37 <Shelikhoo[mds]> as if it were asked to back off by the broker
16:29:13 <Shelikhoo[mds]> with something like an exponential backoff
16:29:19 <meskio[mds]> if one bridge goes down, doesn't that mean that all proxies will back off?
16:29:39 <Shelikhoo[mds]> but as soon as it successfully connects to a bridge, it will recover from that backoff
16:30:06 <Shelikhoo[mds]> if only a single bridge goes down, the backoff will be very tiny
16:30:41 <Shelikhoo[mds]> but if all bridges are inaccessible, the backoff will quickly increase to avoid spamming the network
16:31:01 <cohosh> i see, you're not talking about a probe at all, just polling, and if the connection fails, backing off a bit
16:31:12 <cohosh> i think this will clash with https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/25598
16:31:57 <Shelikhoo[mds]> the broker poll frequency can be combined with this backoff
16:32:07 <Shelikhoo[mds]> or have a max() of these two
16:32:33 <Shelikhoo[mds]> so it will poll after at least the broker backoff seconds or the bridge connection backoff seconds
16:32:39 <Shelikhoo[mds]> whichever is higher
16:32:47 <cohosh> i guess another question is: if one of the two current bridges is blocked for a proxy, do we want it to poll at all?
16:33:03 <cohosh> because in this backoff scenario it would still poll and then fail roughly half of its connections
16:34:07 <Shelikhoo[mds]> we could in theory let the proxy send the list of servers it could not connect to recently
16:35:07 <cohosh> that sounds like an increase in complexity of the matching logic that defeats the purpose of our current relay name superset pattern
16:35:21 <Shelikhoo[mds]> yes....
16:35:23 <cohosh> which we could consider
16:35:35 <Shelikhoo[mds]> so maybe we want it to poll?
16:36:02 <Shelikhoo[mds]> even if it cannot reach some of the snowflake servers
16:36:37 <cohosh> i think i'm liking the simpler shorter-term solution to do a manual DNS lookup of the test domain and reachability checks for each CNAME record
16:36:45 <Shelikhoo[mds]> this would be an easier design in my opinion
16:36:46 <meskio[mds]> 50% failure is not the end of the world, and with the backoff this proxy will not poll that frequently, so maybe it is not that bad
16:37:23 <meskio[mds]> but I also like the CNAME record option
16:38:00 <Shelikhoo[mds]> cohosh: do you think this CNAME plan would also work for the browser?
16:38:13 <Shelikhoo[mds]> I think this will require a DoH request
16:38:14 <cohosh> oh good point
16:38:44 <meskio[mds]> we could use a TXT record, but not sure if this is available in the browser
16:39:14 <Shelikhoo[mds]> these CNAME and TXT lookups will work in the browser over DoH
16:39:46 <Shelikhoo[mds]> but implementing DoH in js seems a little complex
16:40:09 <Shelikhoo[mds]> unless there is some nice library we want to import
16:41:17 <cohosh> ok i can look into it, if there isn't an easy option i think the poll frequency backoff is a good idea
16:42:03 <Shelikhoo[mds]> yeah, let's do some research on it outside the meeting
16:42:14 <cohosh> i guess it should be additive increase / additive decrease to avoid losing the pool if a bridge goes down for everyone?
16:42:17 <Shelikhoo[mds]> there are also things like CORS for DoH
16:43:51 <Shelikhoo[mds]> my original thought was exponential backoff + recovery to the initial 1 sec after a single success
16:44:20 <Shelikhoo[mds]> but I think both AIAD and exponential backoff + instant recovery would work
16:44:39 <cohosh> ok, this could use some more thinking and testing too
16:44:40 <Shelikhoo[mds]> we might need to run some numbers to find out the best option
16:44:48 <cohosh> thanks for the discussion
16:44:55 <Shelikhoo[mds]> nice!
16:45:13 <Shelikhoo[mds]> anything more we want to discuss in this meeting?
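[Editor's sketch: the additive-increase / additive-decrease alternative cohosh floated above differs from exponential + instant recovery in how it recovers: each success subtracts one fixed step rather than resetting to the baseline, so a bridge outage affecting everyone drains the pool gradually instead of all at once. A minimal sketch with hypothetical names and constants; the step sizes here are illustrative, not tuned values:]

```go
package main

import (
	"fmt"
	"time"
)

const (
	aiadBase = 1 * time.Second // floor: the normal poll delay
	aiadStep = 5 * time.Second // fixed step added per failure, removed per success
	aiadMax  = 5 * time.Minute // ceiling during a total outage
)

// aiadBackoff holds the current poll delay under additive increase /
// additive decrease.
type aiadBackoff struct {
	delay time.Duration
}

// onFailure adds one step, capped at aiadMax.
func (b *aiadBackoff) onFailure() {
	b.delay += aiadStep
	if b.delay > aiadMax {
		b.delay = aiadMax
	}
}

// onSuccess removes one step, floored at aiadBase; unlike instant
// recovery, one success does not erase a long failure history.
func (b *aiadBackoff) onSuccess() {
	b.delay -= aiadStep
	if b.delay < aiadBase {
		b.delay = aiadBase
	}
}

func main() {
	b := &aiadBackoff{delay: aiadBase}
	b.onFailure() // +5s
	b.onFailure() // +5s
	b.onSuccess() // -5s: still one step above baseline
	fmt.Println(b.delay)
}
```

Which of the two recovery schemes keeps the pool healthier when one of several bridges goes down is exactly the "run some numbers" question left open in the meeting.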
16:45:52 <Shelikhoo[mds]> Thanks everyone!
16:45:53 <Shelikhoo[mds]> #endmeeting