16:00:12 <cohosh> #startmeeting tor anti-censorship meeting
16:00:12 <MeetBot> Meeting started Thu May 28 16:00:12 2026 UTC.  The chair is cohosh. Information about MeetBot at https://wiki.debian.org/MeetBot.
16:00:12 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:20 <cohosh> hello
16:00:23 <meskio[mds]> hello
16:00:35 <cohosh> here is our meeting pad: https://pad.riseup.net/p/r.9574e996bb9c0266213d38b91b56c469
16:00:41 <onyinyang[mds]> hihi
16:00:47 <cohosh> let us know if you'd like a link to the editable pad
16:01:15 <gaba[mds]> hi
16:01:48 <cohosh> first we have an announcement
16:01:57 <Shelikhoo[mds]> hi~hi~
16:02:01 <cohosh> from meskio[mds] i think?
16:02:05 <meskio[mds]> yes
16:02:18 <meskio[mds]> I put it as an announcement, as I don't think we need to discuss it
16:02:36 <meskio[mds]> anyway, I've been cleaning up and evaluating the list of signaling channels we have collected in the wiki
16:02:48 <meskio[mds]> https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Signaling-Channels/channels
16:02:53 <meskio[mds]> and I wrote some conclusions:
16:03:05 <meskio[mds]> https://gitlab.torproject.org/tpo/anti-censorship/team/-/work_items/192#note_3418348
16:03:16 <meskio[mds]> feel free to comment in the issue if you have comments on them
16:03:31 <cohosh> thanks!
16:03:40 <cohosh> next up is a discussion
16:03:57 <cohosh> oops i missed another announcement
16:03:59 <Shelikhoo[mds]> The next announcement is from me: here is a written report on the TLS fingerprint diversification/imitation system
16:03:59 <Shelikhoo[mds]> feedback is more than welcome
16:04:03 <Shelikhoo[mds]> sorry....
16:04:10 <cohosh> thanks
16:04:20 <Shelikhoo[mds]> I typed it too late, it was my mistake
16:04:25 <cohosh> ok now onto the discussion
16:04:44 <cohosh> Guardian project asked me if we'd be willing to allow some new proxy type names for orbot
16:04:45 <Shelikhoo[mds]> yes
16:04:48 <cohosh> to show up in our metrics
16:05:08 <cohosh> so far, orbot proxies have reported the "iptproxy" proxy type
16:05:37 <cohosh> but now that more applications are using iptproxy and that library has exposed a means to change it, they are interested in setting more specific proxy type names
16:06:08 <cohosh> they proposed having not just a single "orbot" proxy name, but "orbot-ios" and "orbot-android"
16:07:10 <cohosh> i don't have a strong opinion, these metrics are useful to us for distinguishing different implementations, because we can track down bugs or configuration updates
16:07:50 <cohosh> the main thing i'd worry about is the storage requirements on prometheus metrics, but we could truncate these type strings to "orbot" for those
16:07:53 <Shelikhoo[mds]> I can't think of a reason not to add these two names either. I am happy with adding these 2 new names
16:08:42 <meskio[mds]> yes, the prometheus scalability if we keep adding names is worrisome
16:08:49 <cohosh> well they wanted potentially:
16:08:52 <cohosh> android/386
16:08:52 <cohosh> android/amd64
16:08:52 <cohosh> android/arm
16:08:52 <cohosh> android/arm64
16:08:54 <cohosh> darwin/amd64
16:08:57 <cohosh> darwin/arm64
16:08:58 <meskio[mds]> two more names should be fine as we don't have so many, but I wonder if we should have two separate ones
16:08:59 <cohosh> ios/amd64
16:09:02 <cohosh> ios/arm64
16:09:05 <cohosh> which seems a bit excessive to me
16:09:19 <cohosh> we could ask them to stick with android/darwin/ios
16:09:36 <Shelikhoo[mds]> oh, that is a little too much...
16:09:51 <dcf1> haha we're not running a telemetry service
16:10:07 <Shelikhoo[mds]> considering for each name, we are storing maybe 10 new items
16:10:11 <meskio[mds]> we could report to prometheus only orbot and keep in the collector metrics the full list....
16:10:23 <Shelikhoo[mds]> and not just a single counter for that specific type
16:10:46 <meskio[mds]> because we do produce collector metrics with the proxies, don't we?
16:10:56 <cohosh> yeah, i think we should allow new identifiers in our metrics to the extent they are useful to us for debugging
16:11:08 <meskio[mds]> I mean, if those metrics will be useful for guardian project maybe it is good to collect them...
16:11:28 <Shelikhoo[mds]> or we can support the name prefix thing I suggested earlier, and add maybe 2-3 new prefixes
16:11:38 <cohosh> their motivation is to see trends that would help them direct their outreach efforts
16:12:24 <meskio[mds]> we had discussed the idea of supporting a prefix in the past, maybe it is time to do it
16:12:25 <Shelikhoo[mds]> so they can use the long detailed name in their codebase
16:12:39 <dcf1> so they want to count the proportion of os/arch among users who have turned on kindness mode?
16:13:01 <cohosh> dcf1: yes
16:13:03 <Shelikhoo[mds]> and we can, if proven necessary, add more prefixes or names to count in a more detailed way
16:13:45 <Shelikhoo[mds]> or have varying levels of detail for different metrics
16:13:58 <Shelikhoo[mds]> like have only a count for longer names
16:14:10 <dcf1> I mean, do they have to do that counting via tor metrics descriptors? If it's fine-grained user counting, maybe they should have their users opt in to that and do it themselves?
16:14:23 <Shelikhoo[mds]> and aggregated prefix for detailed match/nat info
16:14:47 <cohosh> yeah i think asking them to do their own telemetry is a reasonable thing to do
16:15:00 <dcf1> like, in that case why not add system light/dark theme to the id string
16:16:01 <dcf1> I don't mean to shut it down out of hand. But we should reflect on what purpose the proxy type distinctions are actually meant to serve, that will help guide our decisions.
16:16:33 <Shelikhoo[mds]> I do think adding a consent step will make the data less useful, as different users may have different tendencies toward telemetry
16:16:41 <cohosh> yep, for us the proxy types have been useful to track down bugs and feature rollouts
16:16:46 <dcf1> Like, we used it to make graphs in the research paper, we're using it for whatever we're using it for now, what do we need it to do, and is gp's proposed change compatible with that?
16:17:08 <cohosh> we don't distinguish between chrome and firefox webext installs for example, but that might actually be useful for debugging purposes
16:17:17 <cohosh> because of the different webrtc library implementations
16:17:27 <cohosh> and requirements for the installation of webextensions
16:17:50 <cohosh> so i could see a world in which distinguishing between platforms on a high level (mac vs ios vs android) would be useful
16:18:18 <cohosh> there's also differences in how users update the app on different app stores
16:18:35 <cohosh> and we've used these metrics to help debug rollouts of changes to proxy configurations in the past
16:19:20 <cohosh> given this, i'd be okay with something like orbot-[android|ios|mac] in CollecTor metrics and just the orbot prefix in prometheus
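
The split proposed above (full names in CollecTor metrics, only the prefix in prometheus) could look roughly like the sketch below. This is only an illustration: the allowlist, the function names, and the "unknown" fallback are assumptions, not the broker's actual code.

```python
# Hypothetical allowlist. The four existing types and the orbot names are taken
# from the discussion; how the broker really stores them is not shown here.
KNOWN_TYPES = {
    "standalone", "webext", "badge", "iptproxy",
    "orbot", "orbot-android", "orbot-ios", "orbot-mac",
}


def collector_type(reported: str) -> str:
    """Name used in CollecTor metrics: the full string if it is allowlisted.

    Falling back to "unknown" for anything else is an assumption.
    """
    return reported if reported in KNOWN_TYPES else "unknown"


def prometheus_type(reported: str) -> str:
    """Coarse label for prometheus: only the part before the first "-"."""
    return collector_type(reported).split("-", 1)[0]


if __name__ == "__main__":
    for name in ("orbot-ios", "webext", "android/arm64"):
        print(name, collector_type(name), prometheus_type(name))
```

With this shape, adding a platform-specific name grows the CollecTor output by one entry but adds no new prometheus label value.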
16:21:26 * cohosh tracks down issue where these metrics came in handy for debugging
16:21:42 <Shelikhoo[mds]> yeah, I think we can maybe add the prefix support as requested, and initially use a few prefixes
16:22:18 <Shelikhoo[mds]> and if necessary, add longer and narrower prefixes
16:22:32 <cohosh> https://gitlab.torproject.org/tpo/anti-censorship/team/-/work_items/142
16:22:44 <dcf1> as a question of design, there's the question of whether to break it into separate fields, rather than have the N*M*...*Q*R combinatorial explosion of field1-field2-field3
16:23:06 <cohosh> here's one where we lost a bunch of snowflake proxies ^
16:24:53 <dcf1> otherwise it becomes a poorly specified string embedding of structured data inside of what is already structured data (the tor descriptor)
16:25:27 <dcf1> or I guess you *do* want fine-grained counts of every possible field1-field2-field3 combination, not just a count of field1, a count of field2, etc.
16:25:34 <cohosh> dcf1: can you elaborate on what that would look like? at the moment all of our metrics are (-) separated fields
16:26:06 <cohosh> would this be an overhaul of metrics output to make it more structured?
16:26:27 <dcf1> ok so first let me say the way we do the snowflake descriptors has always struck me as wrong
16:26:40 <dcf1> snowflake-ips-iptproxy 37682
16:26:40 <dcf1> snowflake-ips-standalone 4888
16:26:40 <dcf1> snowflake-ips-webext 88772
16:26:40 <dcf1> snowflake-ips-badge 10219
16:26:40 <dcf1> snowflake-ips-total 139706
16:26:58 <dcf1> I think it's obvious that this should be more like how snowflake-ips works
16:27:03 <dcf1> snowflake-ips DE=51628,US=16865,IN=10444
16:27:07 <dcf1> I.e.
16:27:43 <dcf1> snowflake-ips-type iptproxy=37682,standalone=4888,webext=88772,badge=10219
16:28:11 <dcf1> Why? because we also have separate metrics which are not supposed to mix with the above ones:
16:28:22 <dcf1> snowflake-ips-nat-restricted 111526
16:28:22 <dcf1> snowflake-ips-nat-unrestricted 18181
16:28:22 <dcf1> snowflake-ips-nat-unknown 52827
16:28:33 <dcf1> This should obviously be
16:28:51 <dcf1> snowflake-ips-nat restricted=111526,unrestricted=18181,unknown=52827
16:29:49 <dcf1> Why? Because how is a parser supposed to know that "nat-restricted" is not just another proxy type, like "iptproxy", "standalone", etc.? There's no way to tell, that information has to be hardcoded in the parser.
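
A minimal sketch of how a parser might read the key=value layout suggested above. The function name is hypothetical and the "snowflake-ips-type" and "snowflake-ips-nat" lines are only the proposal from this discussion, not an existing descriptor format.

```python
def parse_keyed_counts(line: str) -> tuple[str, dict[str, int]]:
    """Split 'keyword k1=v1,k2=v2' into the keyword and a dict of integer counts."""
    keyword, _, rest = line.strip().partition(" ")
    counts: dict[str, int] = {}
    for item in rest.split(","):
        if not item:
            continue  # tolerate a line with no counts
        key, _, value = item.partition("=")
        counts[key] = int(value)
    return keyword, counts


if __name__ == "__main__":
    print(parse_keyed_counts(
        "snowflake-ips-nat restricted=111526,unrestricted=18181,unknown=52827"
    ))
```

The keyword alone names the pool of statistics, so a new key such as another proxy type needs no change to the parser.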
16:30:39 <cohosh> ok, i am in support of this change
16:30:51 <dcf1> There's similar confusion with client-denied-count, client-unrestricted-denied-count / client-ampcache-count, client-sqs-count, etc. Parsers just have to know which counts belong to different pools.
16:30:57 <cohosh> time for a @type snowflake-stats 2.0 maybe
16:31:00 <dcf1> It may be too late to change it now.
16:31:03 <dcf1> I don't know.
16:31:57 <dcf1> But if suddenly there are dozens of variations on snowflake-ips-orbot-ios-amd64, snowflake-ips-orbot-android-amd64, on to N*M*..., all that information has to be encoded in parsers.
16:32:22 <meskio[mds]> I think this change is a great idea, but we'll need to coordinate with the metrics team so they update their side before we do it
16:32:26 <dcf1> Or, like, institute heuristics like "if it starts with 'nat-', then it belongs to the nat pool of statistics"
16:33:11 <dcf1> What I was suggesting earlier, of splitting into separate fields, I think doesn't work. Sorry, I hadn't fully thought it out.
16:33:37 <cohosh> ok no worries
16:34:02 <cohosh> i wouldn't mind an overhaul of snowflake metrics output if it's possible but we can leave that for a separate discussion
16:34:06 <dcf1> What I was thinking is, if the proposal is that each proxy should report some string like "orbot-ios-amd64", then maybe it would be better to report that as different fields, like "type=orbot","os=ios","arch=amd64"
16:34:41 <cohosh> got it, this is more like what prometheus allows
16:34:43 <cohosh> with labels
16:34:55 <cohosh> but we have a state space explosion problem there
16:34:59 <dcf1> It still makes sense for individual proxy reporting to split it into structured fields instead of having a poorly specified "stringly typed" thing, but for aggregate statistics counting I guess you would want to smoosh them all back together anyway.
16:35:22 <dcf1> I'll show you how the parsing code works in snowflake-graphs
16:36:07 <dcf1> https://gitlab.torproject.org/dcf/snowflake-graphs/-/blob/799d931084b428fa4612dc1e947a9432b9ac35a8/snowflake-stats#L146
16:36:35 <dcf1> Basically the parser needs an exhaustive whitelist of every descriptor field that might be observed, because it has to know, for each one, which statistics it affects.
16:37:13 <dcf1> E.g. "snowflake-ips-badge"? That alters desc.snowflake_type_ips. "snowflake-ips-nat-restricted"? That alters desc.snowflake_nat_type_ips.
16:37:30 <dcf1> Which is a consequence of the "bag of words" way in which snowflake-stats descriptors are currently structured.
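
The whitelist problem described above can be illustrated with a small sketch. It is not the snowflake-graphs code: the table, the function name, and the dict-based descriptor are hypothetical, and only the field names and the two pool names come from the discussion.

```python
# Hypothetical lookup table: every flat keyword must be listed by hand with the
# pool it belongs to, because the keyword itself does not say.
FIELD_POOLS = {
    "snowflake-ips-iptproxy": ("snowflake_type_ips", "iptproxy"),
    "snowflake-ips-standalone": ("snowflake_type_ips", "standalone"),
    "snowflake-ips-webext": ("snowflake_type_ips", "webext"),
    "snowflake-ips-badge": ("snowflake_type_ips", "badge"),
    "snowflake-ips-nat-restricted": ("snowflake_nat_type_ips", "restricted"),
    "snowflake-ips-nat-unrestricted": ("snowflake_nat_type_ips", "unrestricted"),
    "snowflake-ips-nat-unknown": ("snowflake_nat_type_ips", "unknown"),
}


def parse_flat(lines):
    """Sort 'keyword count' lines into pools; return (pools, unrecognized keywords)."""
    desc = {"snowflake_type_ips": {}, "snowflake_nat_type_ips": {}}
    unrecognized = []
    for line in lines:
        keyword, _, value = line.strip().partition(" ")
        if keyword not in FIELD_POOLS:
            unrecognized.append(keyword)  # the parser cannot guess the pool
            continue
        pool, key = FIELD_POOLS[keyword]
        desc[pool][key] = int(value)
    return desc, unrecognized


if __name__ == "__main__":
    print(parse_flat([
        "snowflake-ips-badge 10219",
        "snowflake-ips-nat-restricted 111526",
        "snowflake-ips-orbot-ios 5",  # made-up count for a not-yet-listed type
    ]))
```

Every new proxy type name means another table entry in each parser that consumes the descriptors.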
16:37:36 <cohosh> yeah i have worked with that code a little bit and it can be painful
16:37:43 <dcf1> Sorry for the rant.
16:37:48 <cohosh> no problem at all
16:38:36 <cohosh> i see we have two orthogonal problems: the (lack of) conventions we're currently using for metrics data, and which fields to allow
16:39:46 <cohosh> because no matter what we have a problem when we allow an arbitrary number of nested fields and identifiers, for prometheus at least, we have some constraints
16:40:03 <cohosh> i can make an issue to discuss a better metrics structure, that could be a longer term project
16:40:17 <dcf1> Currently I'm not able to generate new snowflake stats CSVs because of https://gitlab.torproject.org/tpo/network-health/metrics/collector-rs/-/issues/48, but I expect that I will have to add new field support for the other new proxy type we added.
16:41:19 <cohosh> the bloco type, yeah
16:42:06 <cohosh> ok so the two things we could tell orbot are: we're willing to add just 'orbot' as a new proxy type, to distinguish from general 'iptproxy' use
16:42:09 <cohosh> or
16:42:32 <cohosh> we tell them we'll allow some small number of specific strings like orbot-[ios|mac|android]
16:42:42 <cohosh> or we tell them we're blocked on a metrics overhaul
16:43:07 <cohosh> sorry that's 3 different options heh
16:43:35 <dcf1> I think option 1, add 'orbot', is not objectionable. It should probably have been that from the start.
16:43:48 <dcf1> And maybe option 2 is ok, too.
16:44:06 <meskio[mds]> I don't have strong opinions here
16:44:10 <dcf1> Otherwise, if it's potentially dozens of new distinct types, yes, we probably want to change our metrics format.
16:44:13 <Shelikhoo[mds]> In general I think maybe adding 3 new names is fine...
16:44:37 <cohosh> ok
16:44:38 <Shelikhoo[mds]> I don't think we want to block them on a metrics overhaul as... it might take a while
16:44:51 <cohosh> thanks for the perspective on this
16:47:31 <cohosh> i'll tell them orbot is ok for now
16:48:05 <meskio[mds]> +1
16:48:23 <cohosh> and we can add more specifics later if the tradeoffs are worth it
16:48:31 <cohosh> ok let's end with interesting links
16:48:37 <cohosh> https://github.com/net4people/bbs/issues/603#issuecomment-4529373091
16:49:20 <theodorsm> I added it, rumors about relay traffic in russia, might block all p2p traffic not going through approved TURN servers
16:49:51 <Shelikhoo[mds]> oh no..
16:50:14 <cohosh> thanks for sharing it
16:50:53 <Shelikhoo[mds]> thanks for sharing this!
16:51:12 <cohosh> ok, anything else before we end the meeting for today?
16:52:01 <Shelikhoo[mds]> EOF from me
16:52:12 <meskio[mds]> nothing from me
16:52:26 <cohosh> #endmeeting