16:00:02 <Shelikhoo[mds]> #startmeeting tor anti-censorship meeting
16:00:02 <MeetBot> Meeting started Thu Mar  5 16:00:02 2026 UTC.  The chair is Shelikhoo[mds]. Information about MeetBot at https://wiki.debian.org/MeetBot.
16:00:02 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:02 <Shelikhoo[mds]> here is our meeting pad: https://pad.riseup.net/p/r.9574e996bb9c0266213d38b91b56c469
16:00:02 <Shelikhoo[mds]> editable link available on request
16:00:08 <Shelikhoo[mds]> hi~
16:00:10 <meskio[mds]> hello
16:00:21 <gaba> hi
16:00:33 <Shelikhoo[mds]> sorry I am 2 seconds late
16:00:41 <cohosh> hi
16:01:03 <Shelikhoo[mds]> or maybe it is the meeting bot that is 2 seconds late
16:01:04 <Shelikhoo[mds]> hhhhhaaaa
16:03:36 <Shelikhoo[mds]> okay we can start the discussion about the first topic
16:03:47 <Shelikhoo[mds]> I will wait until everyone finishes editing before sending the email
16:04:02 <Shelikhoo[mds]> don't worry if you have yet to finish your log
16:04:35 <Shelikhoo[mds]> The first topic is from me:
16:04:36 <Shelikhoo[mds]> Should we prioritize better matching logic on the snowflake broker
16:04:46 <Shelikhoo[mds]> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40077#note_3361718
16:04:46 <Shelikhoo[mds]> See also: https://datatracker.ietf.org/doc/html/rfc5780
16:05:05 <cohosh> i think this is a good idea
16:05:32 <Shelikhoo[mds]> so during the investigation of why there are so many failed matches (from iran)
16:05:32 <cohosh> and that it probably explains the failures theodorsm found while testing covertdtls
16:06:37 <Shelikhoo[mds]> I discovered one of the potential non-censorship-related reasons why we are seeing so many failed matches
16:07:12 <Shelikhoo[mds]> while it is true that in regions with censorship, this will be compounded with blocking by IP
16:07:43 <Shelikhoo[mds]> so this might not be the only reason we are seeing so many failed matches
16:07:47 <Shelikhoo[mds]> and draining our proxy pool
16:07:51 <Shelikhoo[mds]> this will certainly help
16:08:23 <Shelikhoo[mds]> maybe to the point we don't need to tell clients that failed to match to cool down
16:09:07 <Shelikhoo[mds]> which was one of the alternatives we were considering to work around the proxy-pool-draining issue
16:09:39 <meskio[mds]> this sounds like something we might want to prioritize, as we've been hit by this problem already twice in half a year
16:10:47 <Shelikhoo[mds]> In this ticket, I also mentioned that there are some websites supporting NAT tests that can determine NAT types more accurately
16:12:35 <Shelikhoo[mds]> I think by combining these two approaches, we will be able to fix (or reduce) the NAT-matching-related connection failures
16:14:09 <cohosh> yes i think we should try this out
16:14:22 <cohosh> we'll have to see how blocking resistant these other NAT behaviour testing options are
16:14:33 <Shelikhoo[mds]> It is worth mentioning that our current NAT testing for proxies might determine endpoint (not port) dependent filtering as "unrestricted"
16:15:11 <cohosh> i'm working on a refactor of the broker pools that will make it easier to add more pools if that's what we need
16:15:14 <cohosh> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/663
16:17:06 <Shelikhoo[mds]> so if the client's STUN server is blocked and it cannot send any candidates, it might not be able to connect to an "unrestricted" proxy
16:17:07 <Shelikhoo[mds]> so that is one thing we might want to improve
16:17:42 <Shelikhoo[mds]> thanks cohosh, I will first do more investigation and create a plan for how to proceed
16:17:52 <Shelikhoo[mds]> before asking everyone in the team to review it
16:18:04 <Shelikhoo[mds]> since it seems like a rather big task
16:18:09 <Shelikhoo[mds]> over
16:18:39 <Shelikhoo[mds]> okay, anything more we want to discuss on this topic?
16:19:11 <Shelikhoo[mds]> thanks cohosh, that refactor will be super helpful >~<
16:19:12 <cohosh> not from me, thanks for looking into this Shelikhoo[mds]
16:19:12 <Shelikhoo[mds]> proposal to get a separate domain name for bridge reachability tests
16:19:22 <Shelikhoo[mds]> add a record for each known *.torproject.net bridge
16:19:22 <Shelikhoo[mds]> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40504#note_3359873
16:19:28 <Shelikhoo[mds]> from cohosh ?
16:20:09 <cohosh> oh, yeah, we discussed this a few meetings past
16:20:58 <cohosh> and decided proxies should do a reachability check on snowflake.torproject.net (which i just realized is actually hardcoded as snowflake.bamsoftware.com in orbot)
16:21:22 <cohosh> but arma2 brought up the point that this bridge could go down and we don't want all of our proxies to stop polling if this happens
16:22:05 <cohosh> he had a good solution to the problem with our current design pattern and not wanting proxies to have to know about all available bridges
16:22:31 <cohosh> to have a domain with multiple records, for each known bridge
16:22:50 <cohosh> i think this will work, but does anyone else here with more DNS knowledge see a problem with that?
16:23:14 <meskio[mds]> the problem there is if a censor decides to block the domain name of the bridge, but not the one we use to test....
16:23:17 <Shelikhoo[mds]> I do think this plan has some tradeoffs we need to be explicit
16:23:23 <cohosh> hmm
16:23:24 <cohosh> yes
16:23:32 <Shelikhoo[mds]> firstly, a censor could do SNI blocking
16:23:35 <Shelikhoo[mds]> yes
16:23:45 <cohosh> what if it's a CNAME record?
16:23:47 <Shelikhoo[mds]> or maybe they can block all but one bridge
16:23:50 <cohosh> instead of an A record
16:24:22 <Shelikhoo[mds]> CNAME record does not impact SNI sadly
16:24:43 <Shelikhoo[mds]> so the sni will still be the test domain
16:24:59 <cohosh> i see
16:24:59 <Shelikhoo[mds]> even if the dns is an alias
16:25:31 <Shelikhoo[mds]> also, this will make it really hard for us to run more snowflake bridges in the future
16:25:44 <Shelikhoo[mds]> since we need to trust each bridge to the point we share a common certificate with it
16:25:59 <cohosh> maybe another option would be to have the proxy get CNAME records for the test domain and then try a websocket probe to each of those domains
16:26:39 <meskio[mds]> proxies could ask the broker for the list of bridges
16:27:02 <Shelikhoo[mds]> yeah, this might work, but some DNS recursive resolvers only allow plain A or AAAA records
16:27:08 <meskio[mds]> but I like simpler options that don't depend on the broker; what you propose, requesting the CNAME and using it manually, sounds good if it's not too hard to implement
16:27:13 <cohosh> yeah that's another option, the broker has that bridge list json and can push information to proxies
16:27:35 <Shelikhoo[mds]> I do have another option
16:27:49 <Shelikhoo[mds]> that is, when the proxy tries the bridge's URL and fails
16:28:19 <Shelikhoo[mds]> it will report this error and reduce its poll frequency
16:28:37 <Shelikhoo[mds]> as if it was asked to back off by the broker
16:29:13 <Shelikhoo[mds]> with something like an exponential backoff
16:29:19 <meskio[mds]> if one bridge goes down, doesn't that mean that all proxies will backoff?
16:29:39 <Shelikhoo[mds]> but as soon as it successfully connects to a bridge, it will recover from that backoff
16:30:06 <Shelikhoo[mds]> if only a single bridge goes down, the backoff will be very tiny
16:30:41 <Shelikhoo[mds]> but if all bridges are inaccessible, the backoff will quickly increase to avoid spamming the network
16:31:01 <cohosh> i see, you're not talking about a probe at all, just polling and if the connection fails backing off a bit
16:31:12 <cohosh> i think this will clash with https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/25598
16:31:57 <Shelikhoo[mds]> the broker poll frequency can be combined with this backoff
16:32:07 <Shelikhoo[mds]> or take the max() of the two
16:32:33 <Shelikhoo[mds]> so it will poll after at least the broker backoff interval or the bridge connection backoff interval
16:32:39 <Shelikhoo[mds]> whichever is higher
16:32:47 <cohosh> i guess another question is: if one of the two current bridges is blocked for a proxy, do we want it to poll at all?
16:33:03 <cohosh> because in this backoff scenario it would still poll and then fail roughly half of its connections
16:34:07 <Shelikhoo[mds]> we could in theory let the proxy send the list of servers it could not connect to recently
16:35:07 <cohosh> that sounds like an increase in complexity of the matching logic that defeats the purpose of our current relay name superset pattern
16:35:21 <Shelikhoo[mds]> yes....
16:35:23 <cohosh> which we could consider
16:35:35 <Shelikhoo[mds]> so maybe we want it to poll?
16:36:02 <Shelikhoo[mds]> even if it cannot reach some of the snowflake servers
16:36:37 <cohosh> i think i'm liking the simpler shorter term solution to do a manual DNS lookup of the test domain and reachability checks for each CNAME record
16:36:45 <Shelikhoo[mds]> this would be an easier design in my opinion
16:36:46 <meskio[mds]> 50% failure is not the end of the world, and with the backoff this proxy will not poll that frequently; maybe it's not that bad
16:37:23 <meskio[mds]> but I also like the CNAME record option
16:38:00 <Shelikhoo[mds]> cohosh: do you think this CNAME plan would also work in the browser?
16:38:13 <Shelikhoo[mds]> I think this will require a DoH request
16:38:14 <cohosh> oh good point
16:38:44 <meskio[mds]> we could use a TXT record, but not sure if this is available in the browser
16:39:14 <Shelikhoo[mds]> these CNAME and TXT lookups will work in the browser over DoH
16:39:46 <Shelikhoo[mds]> but implementing DoH in JS seems a little complex
16:40:09 <Shelikhoo[mds]> unless there is some nice library we want to import
16:41:17 <cohosh> ok i can look into it, if there isn't an easy option i think the poll frequency backoff is a good idea
16:42:03 <Shelikhoo[mds]> yeah, let's do some research about it outside the meeting
16:42:14 <cohosh> i guess it should be additive increase / additive decrease to avoid losing the pool if a bridge goes down for everyone?
16:42:17 <Shelikhoo[mds]> there are also things like CORS for DoH
16:43:51 <Shelikhoo[mds]> my original thought was Exponential backoff + recover to initial 1 sec after a single success
16:44:20 <Shelikhoo[mds]> but I think both AIAD and Exponential backoff + instant recover would work
16:44:39 <cohosh> ok, this could use some more thinking and testing too
16:44:40 <Shelikhoo[mds]> we might need to run some numbers to find out the best option
16:44:48 <cohosh> thanks for the discussion
16:44:55 <Shelikhoo[mds]> nice!
16:45:13 <Shelikhoo[mds]> anything more we want to discuss in this meeting?
16:45:52 <Shelikhoo[mds]> Thanks everyone!
16:45:53 <Shelikhoo[mds]> #endmeeting