16:00:03 <meskio> #startmeeting tor anti-censorship meeting
16:00:03 <MeetBot> Meeting started Thu May 30 16:00:03 2024 UTC.  The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:04 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:07 <shelikhoo> hi~
16:00:08 <meskio> hello everyone!!!
16:00:11 <meskio> here is our meeting pad: https://pad.riseup.net/p/r.9574e996bb9c0266213d38b91b56c469
16:00:13 <meskio> ask me in private to give you the link of the pad to be able to edit it if you don't have it
16:00:15 <meskio> I'll wait a few minutes for everybody to add what you've been working on and put items on the agenda
16:00:23 <onyinyang> hello o/
16:04:14 <meskio> we have one discussion topic today:
16:04:17 <meskio> Add nginx rate limiting to address "TTP-03-001 WP1: Snowflake broker vulnerability"
16:04:22 <meskio> shelikhoo: is it you?
16:04:23 <shelikhoo> yes! It is from me
16:04:42 <shelikhoo> I just discovered this task was assigned to me
16:05:00 <shelikhoo> and I think the best way forward is to set it up on nginx
16:05:19 <meskio> yes, I think that makes sense
16:05:36 <meskio> I did assign it to you thinking that you could fix it while deploying the new broker
16:05:43 <shelikhoo> I will set up the necessary configuration on nginx
16:05:50 <meskio> great
16:06:03 <shelikhoo> yes, it is not automatic, but only a few lines of config away
16:06:21 <meskio> :)
16:06:26 <shelikhoo> we still need to move the broker before it will be actually applied
16:06:34 <shelikhoo> for which we don't have a set date yet
16:06:52 <shelikhoo> I will have a check to see if ACME is working as expected
16:07:12 <meskio> I guess this is the next discussion point:
16:07:20 <meskio> Is the broker migration ready to apply yet?
16:07:47 <shelikhoo> I think it will be ready once I have finished setting the IP rate limit and checked ACME renewal
16:07:49 <dcf1> Yes, let me know when you would like me to apply the VM changes.
16:08:09 <dcf1> What I have to do is reallocate the RAM resources from the old broker to the new broker under our usage limit, and restart them.
16:08:09 <shelikhoo> we also need to move the domain and then apply a new certificate
16:08:59 <shelikhoo> or we could copy the certificate to avoid downtime
16:09:07 <dcf1> Something I just thought of is we need to use the same Let's Encrypt credentials, on the new broker, because there are CAA DNS records on torproject.org that limit which accounts can make new certificates
16:09:13 <dcf1> https://gitlab.torproject.org/tpo/tpa/team/-/issues/41462
16:09:21 <dcf1> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40319
16:09:43 <shelikhoo> associated with domain renewal
16:09:54 <dcf1> I had to write a tool, the last time this was a problem, to extract the account information: https://gitlab.torproject.org/dcf/autocert-account-id
16:10:20 <dcf1> Yeah copying the existing certificate into the certificate cache is a good idea regardless.
16:10:44 <dcf1> There's no downtime for renewal (it's a few seconds) -- as long as nothing goes wrong.
16:11:46 <dcf1> If something goes wrong you'll want to have the admin team ready to possibly revert the DNS change
16:12:34 <dcf1> Okay, then just tell me on what date you'd like me to do the resource reallocation on the VMs and restart both.
16:12:47 <dcf1> The old broker can keep running with reduced RAM resources until the migration happens.
16:12:54 <meskio> we can warn TPA and make sure they will be around when we do the change
16:13:25 <shelikhoo> yes, or we can add the caa record necessary for the new broker to issue the certificate
16:13:41 <dcf1> yeah either way
16:13:44 <shelikhoo> then copy the certificate to avoid the downtime
16:13:45 <meskio> yes
16:14:15 <shelikhoo> once we move the domain name, we have until the certificate expires to fix any issues with certificate renewal
16:15:00 <shelikhoo> I will also have a look if any other services are hosted on the broker
16:15:08 <shelikhoo> such as the probetest
16:15:26 <meskio> I recall probetest to be hosted with the broker
16:15:28 <cohosh> probetest is running on the broker iirc
16:15:41 <shelikhoo> okay, it also needs to be moved...
16:15:56 <dcf1> Check the existing installation guide -- anything running on the broker is documented there.
16:16:16 <shelikhoo> yes... I think I will also need to move it first
16:16:33 <shelikhoo> so 3 tasks, ip rate limit, acme check, and probetest
16:16:39 <dcf1> Oh nvm it's not in the installation guide :(
16:16:40 <dcf1> https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Survival-Guides/Snowflake-Broker-Installation-Guide
16:16:43 <dcf1> (probetest)
16:16:50 <cohosh> shelikhoo: did you end up setting up sqs for the new broker?
16:17:00 <cohosh> i remember that was yet to be done the last time this came up
16:17:13 <cohosh> probetest has its own installation guide: https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Survival-Guides/Snowflake-Probetest-Installation-Guide
16:17:30 <shelikhoo> cohosh: no, it is not set up, otherwise it would begin to receive production traffic
16:17:47 <cohosh> ah i see, you're right
16:19:13 <shelikhoo> anyway I will proceed with the tasks listed
16:19:15 <shelikhoo> over
16:19:21 <cohosh> thanks shelikhoo
16:19:29 <shelikhoo> no problem
16:20:53 <meskio> nice, I guess shelikhoo will work on it and come back to dcf1 and others when it is ready to do the change
16:21:53 <meskio> anything more on this topic?
16:22:22 <meskio> or any other topic to discuss before we move to the reading group?
16:23:21 <meskio> we have the paper "Communication Breakdown: Modularizing Application Tunneling for Signaling Around Censorship" for our reading group
16:23:24 <meskio> https://petsymposium.org/popets/2024/popets-2024-0027.php
16:23:56 <meskio> which comes out at a good time, as we've been discussing how to deal with signaling channels
16:24:17 <meskio> I can try to do a short summary if no one has a ready one
16:24:31 <dcf1> I read it but I don't have a summary ready.
16:24:32 <arma2> oh neat, the two-six people wrote a paper, and got it published
16:24:57 <meskio> the authors created a library of signaling channels designed to be reused in different contexts
16:25:38 <meskio> with signaling channels they mean censorship resistant channels that can be used to bootstrap other more stable, faster, cheaper transport channels
16:25:44 <meskio> (not their words...)
16:26:33 <meskio> they had some interesting design choices, like allowing each direction of the communication to happen over a different channel
16:27:08 <meskio> or to support channels implemented in many programming languages as plugins in their system
16:27:48 <meskio> they divided the plugins in three components that can be remixed:
16:28:06 <meskio> user model. that describes a pattern so the channel data looks real to an observer
16:28:16 <meskio> transport. the transport layer of the channel itself
16:28:27 <meskio> encoding. the encoding of the data inside the transport
16:28:54 <meskio> those are the highlights I have in mind from the paper, maybe we can get into the discussion
16:29:24 <dcf1> In large part this is a systematization/formalization paper. So there is a lot of terminology introduced and decomposition of things into smaller components.
16:29:55 <dcf1> The division of a "channel" into the 3 components user model, transport, encoding (what meskio just said) seemed like the most significant one to me.
16:30:42 <dcf1> Table 1 on page 10 has examples of user model/transport/encoding, including ones that are newly implemented in their Raceboat framework.
16:31:08 <onyinyang> yeah, I agree with that analysis.
16:31:17 <dcf1> And in Table 2 on page 11 they demonstrate mixing and matching different upstream and downstream tunnels to effect a Lox exchange.
16:32:02 <dcf1> The explicit attention to user model is interesting, and they have thought-provoking things to say about it. This would be analogous to the overt user simulator of Slitheen.
16:32:20 <meskio> I've been wondering how much the user model is needed as a separate piece from the transport, as it looks like it adds a lot of complexity to the system
16:33:01 <dcf1> They say the user model can be as elaborate / high-fidelity as you like, but it doesn't necessarily have to be. Table 1 has an "OnDemand" user model which is basically the no-op user model commonly used now.
16:33:24 <arma2> in race, there was a lot of emphasis on user models, see e.g. the raven paper, https://petsymposium.org/2022/files/papers/issue3/popets-2022-0068.pdf
16:33:27 <shelikhoo> there are a lot of trade-offs when it comes to the user model
16:33:44 <shelikhoo> let's say most users don't send a lot of emails
16:33:47 <dcf1> But they also say a user model is not only for the censor: it may also be as simple as a rate limit to stay under application server limitations, etc.
16:34:10 <shelikhoo> while in many cases it would take a few round trips to finish the signaling process
16:34:23 <dcf1> I haven't read the Raven paper yet, but yes, it seems very relevant. Good to see attention paid to it.
16:34:57 <shelikhoo> I think it is in general the traffic shaping issue's equivalent in signaling space
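The three-way split discussed above (user model, transport, encoding) can be illustrated with a minimal Python sketch. It assumes the simplest user model mentioned in the discussion, a plain rate limit, and none of the names below come from the paper or from Raceboat: `Channel`, `RateLimitUserModel`, `ListTransport` and `Base64Encoding` are hypothetical, and the transport is a stand-in that appends to a list instead of touching the network.

```python
import base64
from typing import List


class Base64Encoding:
    """Encoding: how payload bytes are represented inside the transport."""

    def encode(self, payload: bytes) -> bytes:
        return base64.b64encode(payload)


class ListTransport:
    """Transport: hypothetical stand-in that records messages in a list."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, message: bytes) -> None:
        self.sent.append(message)


class RateLimitUserModel:
    """User model: allows at most max_sends sends per window seconds."""

    def __init__(self, max_sends: int, window: float) -> None:
        self.max_sends = max_sends
        self.window = window
        self._times: List[float] = []

    def may_send(self, now: float) -> bool:
        # Forget sends that have aged out of the window.
        self._times = [t for t in self._times if now - t < self.window]
        if len(self._times) >= self.max_sends:
            return False
        self._times.append(now)
        return True


class Channel:
    """A channel composed of three independently swappable parts."""

    def __init__(self, user_model, transport, encoding) -> None:
        self.user_model = user_model
        self.transport = transport
        self.encoding = encoding

    def try_send(self, payload: bytes, now: float) -> bool:
        # The user model decides *when* data may go out; the encoding and
        # transport decide *how* it goes out.
        if not self.user_model.may_send(now):
            return False
        self.transport.send(self.encoding.encode(payload))
        return True
```

Swapping `RateLimitUserModel` for a richer behavioural model would leave the transport and encoding untouched, which is the kind of remixing the decomposition is meant to allow.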
16:35:09 <cohosh> regardless of whether it's worth it to put time and complexity into user models, at some point we have to make a decision about which transports to use when, and where
16:35:33 <cohosh> for example, in tor browser at the moment we just try to find one domain front that works everywhere
16:35:50 <cohosh> but that is becoming less and less viable, and just not a great idea from a single point of failure standpoint
16:36:23 <cohosh> so this user model can also be a simple algorithm for selecting from available signaling channels and configurations
16:36:36 <meskio> yes, it doesn't look like this is looking into that problem, this framework seems to leave to the application the problem of finding a combination that works
16:37:07 <meskio> AFAIK the user model is used by the transport to select when to transmit data, not to select transports
16:37:52 <dcf1> I'm planning to talk to the authors about what would be involved in doing a channel based on encrypted DNS (https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/25874). The only thing that makes that nontrivial is the limited capacity of DNS messages, which requires some kind of packetization and multiple transmission, which they say their framework supports.
16:37:53 <cohosh> ah i see, this would be like a meta user model across all transports
16:38:24 <arma2> "combination of protocols" is complex in the user model world -- even in raven we focused on "when to send emails" but left out things like "when you do a connection to your imap server to see if you have emails"
16:39:19 <shelikhoo> to be fair I think 26 is not aiming for the exact same thing when it comes to anti-censorship
16:39:29 <meskio> dcf1: they define 4 communication modes, it could be that DNS implements only 1 and 2 and not socket...
16:39:45 <dcf1> Good point cohosh. Their figure 2 shows each channel containing its own user model (which could be reused in different channels, as in figure 7). But there could be a meta-model over multiple coupled channels.
16:40:14 <cohosh> i am interested in the question of, if we have multiple signaling channels, do we: a) try everything all at once, b) try things sequentially until they work, c) try to maintain some kind of location-dependent mapping in tor browser about which things to try where, ..etc
16:40:35 <richard> 👀
16:40:35 <cohosh> there is a chicken and egg problem here at some point ofc
16:40:40 <meskio> a) sounds scary
16:40:52 <meskio> a combination of b and c could make sense
16:40:53 <shelikhoo> it is not really practical to simulate much user action when it comes to the signaling channel user model, as it would often introduce too much delay in messages
16:41:35 <cohosh> maybe d) if something works, we save that and try it first next time until it doesn't work anymore
16:41:44 <shelikhoo> in the same way, their hiding of messages in images won't really work in a real-world deployment
16:42:00 <shelikhoo> I think d is better, remember what worked last time
16:42:21 <arma2> cohosh: i like "d but with low probability start over each time", so people don't get wedged on the most-reliable-but-most-costly transport over time
16:42:33 <shelikhoo> as the network environment is different for everyone
16:42:46 <dcf1> shelikhoo: To be fair to the authors, I think they are upfront about the tradeoffs of user models. Section 4.1 says "In some use cases, true behavioral independence is unnecessary and introduces significant performance loss. ... may only need to avoid exceeding some sending limit rather than make its messages adhere to a rich model of real user behavior."
16:42:48 <cohosh> arma2: that's a good idea
16:42:51 <meskio> yes, something like b + d sounds good
16:43:15 <arma2> related tor ticket: https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/111
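The "b + d, with a low probability of starting over" idea from the exchange above can be sketched in Python as follows. This is only an illustration under stated assumptions: the channel names in the usage note, the `restart_probability` default of 0.05, and the `try_channel` callback are all hypothetical, and nothing here reflects how Tor Browser currently selects a signaling channel.

```python
import random
from typing import Callable, List, Optional, Sequence


def order_channels(
    channels: Sequence[str],
    last_good: Optional[str],
    restart_probability: float,
    rng: random.Random,
) -> List[str]:
    """Return the order in which to try channels.

    `channels` is the default order (for example cheapest first). The
    channel that worked last time is moved to the front, except that with
    probability `restart_probability` we start over from the default
    order, so clients do not stay wedged on a costly fallback forever.
    """
    order = list(channels)
    if last_good in order and rng.random() >= restart_probability:
        order.remove(last_good)
        order.insert(0, last_good)
    return order


def connect(
    channels: Sequence[str],
    try_channel: Callable[[str], bool],
    last_good: Optional[str] = None,
    restart_probability: float = 0.05,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try channels one at a time and return the first that works."""
    rng = rng or random.Random()
    for name in order_channels(channels, last_good, restart_probability, rng):
        if try_channel(name):
            return name  # The caller would persist this as the new last_good.
    return None
```

A caller might run `connect(["domain-front", "sqs", "dns"], probe, last_good=saved)` with its own `probe` function and store the returned name for the next start.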
16:43:33 <meskio> I think there are a lot of interesting ideas in this paper to copy, I feel RACEBOAT is too complex for what we want to do and we might want anyway to implement our own simpler library
16:43:52 <shelikhoo> dcf1: yes, that's true. It was more about what kind of research effort time is spent on...
16:44:56 <shelikhoo> I think we could borrow some ideas, but their library is more for the special agent use case, rather than the mobile user use case
16:45:23 <dcf1> I'm curious about the implementation but it's not online yet. https://github.com/tst-race/raceboat-pets2024
16:46:02 <arma2> shelikhoo: in terms of anonymity, should even the special agents use the mobile user use case? so they don't stand out as special agents? :) "depends how good the user models really are"
16:46:28 <dcf1> The systematization and decomposition of the problem is quite valuable, I think. But I do get the feeling, probably like you meskio, that it's a mix of concepts that are intrinsic to the problem, and design choices made by the framework.
16:47:12 <dcf1> In any case, it's clear there's a lot of thought and consideration behind this research, with a good connection to the past and the current status quo.
16:47:21 <meskio> yes, I agree
16:47:21 <shelikhoo> arma2: I actually mean it is okay for a special agent to have something that only works on their laptop, but for users the solution needs to work on their mobile phone
16:47:39 <cohosh> yeah i really liked a lot of the terminology and framing
16:47:47 <shelikhoo> I think their library will be quite heavy...
16:47:59 <meskio> I tend to prefer not to overengineer software and evolve it with the needs of the real world, and this seems to be the other way around, trying to anticipate any problems before actually using it
16:48:18 <arma2> a lot of the reason their library was going to be heavy was that some of their signaling channels brought in many gigabytes of model data
16:48:27 <meskio> shelikhoo: yes, they do support java, python, go and c++ :D
16:48:36 <arma2> but even without that, yeah it is a lot of engineering
16:48:43 <dcf1> arma2: oh interesting, like a generative language model, as in Meteor?
16:48:55 <shelikhoo> I think it is more about pushing the limit of research, which is really valuable
16:49:03 <shelikhoo> but not easy to deploy
16:49:14 <arma2> dcf1: each individual signaling channel did its own thing, e.g. they had one from a different race team that embedded its messages in images that it posted to an image board, etc
16:49:28 <shelikhoo> just recall 50 MB limit
16:49:38 <dcf1> They have a comment in 7.1 about high-latency high-bandwidth channels: "Higher latencies also do not always imply low bandwidth: e.g. steganographic videos uploaded to 3rd-party streaming sites can involve lengthy encoding and upload delays but provide megabytes of data transfer at a time"
16:49:49 <arma2> (i have not read the pets paper, so i don't know what they included in the paper vs what was just part of race)
16:50:14 <dcf1> arma2: oh, I thought you were referring to large models being *transferred* over the signaling channel.
16:50:29 <arma2> ah ha. nope, it's for both sides to be synchronized and 'realistic'
16:50:43 <meskio> arma2: the paper includes very few transports
16:50:50 <meskio> from their list of transports something we could look into is the S3Bucket, it might have the same properties as SQS, but could be nice to have it in mind if amazon blocks our SQS setup
16:50:58 <arma2> but yeah it would be cool to live in a world where marionette's vision of agreeing on a channel during the channel setup works
16:51:46 <shelikhoo> I see there is also a redis protocol
16:52:09 <shelikhoo> and with so many database-as-a-service things
16:52:18 <arma2> meskio: they also had a cool mechanism where there are companies that do automated reactions to new cloud content, e.g. you post something to google cloud and the company auto mirrors it to amazon cloud,
16:52:39 <arma2> and the idea would be that the client could post to google cloud, and the bridge could fetch from the other cloud, and it would all be auto synced on the backend by these helper companies
16:52:39 <shelikhoo> we might be able to have a dozen of them if we support enough kinds of databases
16:52:53 <dcf1> We've previously talked about Meteor which requires presharing synchronized copies of a model about 5 GB in size: https://meetbot.debian.net/tor-meeting/2022/tor-meeting.2022-02-03-16.00.log.html#l-67
16:53:44 <dcf1> https://github.com/net4people/bbs/issues/104
16:53:45 <meskio> shelikhoo: I wonder what cloud providers will give us databases in a domain/ip that is not specific to our account
16:54:00 <meskio> or how will that be used? public databases? is that a thing?
16:54:41 <shelikhoo> The domain ip typically will not be specific to account, as I recall
16:54:44 <meskio> arma2: yes, the idea in this paper of using a different channel for each direction is interesting and we might need to consider it in the future
16:54:54 <arma2> meskio: https://petsymposium.org/2014/papers/paper_68.pdf might have suggestions on your S3 topic
16:55:00 <shelikhoo> but they do limit the number of concurrent connections, typically
16:55:02 <meskio> shelikhoo: nice, I don't know that much of AWS and the like
16:55:03 <arma2> ("cloudtransport")
16:55:55 <meskio> arma2: yes, I think they mention that paper, I'll try to look into it
16:56:01 <shelikhoo> meskio: I have yet to check aws, it was more about other providers for mongodb, redis...
16:56:33 <meskio> ohh, true, that's an interesting idea
16:56:37 * arma2 , mindful of time, stops introducing distractions :)
16:57:14 <meskio> :)
16:57:42 <meskio> any more on this?
16:58:08 <meskio> I guess we'll need to continue talking about signaling channels, but that was a very useful talk
16:58:14 <arma2> cohosh: does your meta signaling channel order-of-operations discussion have a gitlab ticket? or is it a subset of the 'unified signaling channel library' topic
16:58:19 <meskio> it is a pity I will not be at PETS to meet the authors
16:58:55 <cohosh> arma2: there are some tor browser tickets i think, it was also the thesis of the session i ran last week at the tor meeting, heh
16:59:04 <arma2> ah ha, great
16:59:16 <cohosh> i am in the process of writing up better notes
17:00:31 <cohosh> here's one issue: https://gitlab.torproject.org/tpo/applications/tor-browser/-/issues/42436
17:00:47 <cohosh> but i don't think there is a multiple channel issue yet
17:00:52 <arma2> meskio: i do plan to be at pets, but i have already found myself in the weird in-the-middle position between two-six and anticensorship team, so i'm not sure how much more i can be useful. but let me know if you want to ask/say something specific and i will do it
17:01:35 <dcf1> It is interesting that they take the position that rendezvous requires no shared secrets (which they call the property of "public addressability"). It's pretty close to what we wrote in the Snowflake paper, though we stopped short of saying it was a necessary property of all rendezvous systems: https://www.bamsoftware.com/papers/snowflake/#p23
17:01:39 <meskio> arma2: I'll think about it, but I think there will be others from the team at PETS
17:02:48 <meskio> seeing that we are already on the hour, if no one has anything else I will close the meeting
17:02:54 <shelikhoo> eof from me
17:03:31 <meskio> #endmeeting