#tor-dev log

02:03:25 <isis> #startmeeting
02:03:25 <MeetBot> Meeting started Wed May 13 02:03:25 2015 UTC.  The chair is isis. Information about MeetBot at http://wiki.debian.org/MeetBot.
02:03:25 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
02:03:40 <isabela> :)
02:03:55 <isabela> isis: is it bi-weekly or weekly?
02:04:14 <isis> bi-weekly, but both Yawning and i missed the last one
02:04:37 <isabela> ok, next one in two weeks? I should add it to the calendar
02:04:48 <isis> my right index finger is currently being held together with superglue… so i might be typing slow
02:04:59 <isabela> !
02:05:01 <isis> isabela: that would be great! thanks
02:06:12 <isabela> what happened to your finger?
02:06:29 <isis> so, i'll reportback first, and then whoever wants to go next can go, then questions/discussion/brainstorming/etc
02:06:50 * isabela has a point for discussion time (roadmaps)
02:06:51 <isis> isabela: uh, i got in a fight with a very sharp knife and lost
02:07:17 <isis> apparently, this is why i can't have nice things :)
02:07:28 <isabela> :( fuen
02:07:29 <isis> it's healing okay though
02:09:12 <isis> these last few weeks i worked on finishing #12505. in the process, because the tasks turned out to be much more intertwined than i had expected, i finished #12029, #11330, #1839, and very nearly all of #12506
02:10:26 <isis> now i am currently a tiny bit stuck on whether i should do a bit of bending over backwards to keep the new hashring structures compatible with the old database schema
02:12:27 <isis> or if i should update the schema and transition the data to the new one (this would give us sub-hashring persistence, so e.g. bridges for email-riseup.net sub-hashring would always go to riseup.net users and bridges for email-gmail.com sub-hashring would always go to gmail.com users)
02:13:02 <isis> or if i should just start doing #12030 now
02:13:22 <isis> and switch to the new databases outlined in prop#226
02:13:53 <isis> i'd kind of prefer option #3, but that is a lot of changes to be making at once on a live system
02:15:16 <isis> okay, i think that is it for me
02:15:35 <isis> who would like to go next?
02:17:15 <mikeperry> do we have a dcf? I have some domain fronting questions/thoughts
02:17:18 <isis> okay, perhaps i am not doing so well at corralling people into attending meetings…
02:18:05 <isis> mikeperry: i do not see a dcf, no.
02:19:27 <isis> #action isis send out announcement email for pt+bridges meeting a day or so in advance
02:19:54 <mikeperry> the main takeaways from my last tor-dev post were: a) if we make domain fronting the default way of getting bridges in Tor Launcher, it will be used by TBB, Tails, TorBirdy, Tor Messenger, and probably also OrBot. We're going to need a lot more bridges for that than 20% of the current HTTPS pool size, I think. IMO, this should be the biggest pool, even if separated from the HTTPS one
02:20:58 <mikeperry> and b) can we get analytics on the probing? What if Tor Launcher gave you a request parameter that said the last set of bridges you gave it just failed.. like &justfailed=obfs4
02:21:53 <mikeperry> if you could break the count of that parameter by GeoIP country and export that statistic, it might tell us which countries are either harvesting lots of IPs, or blocking some transport by DPI
02:24:59 <isis> hmm, so you wouldn't tell me which bridges in particular failed, but it would be like POST /report?justfailed=obfs4&cc=cn ?
02:25:38 <isis> yes, i can easily do that, once prop#226 is done
02:26:21 <isis> currently the structure for persisting, collecting, and studying data is… um… nice words nice words… "lacking"
02:28:03 <mikeperry> yes, though you would be the one inferring the country code
02:28:58 <isis> okay, so the connection looks like: TorLauncher → BridgeDB's Domain Front → BridgeDB
02:29:13 <isis> and from #13171, i pull out the header with the IP address
02:29:17 <mikeperry> with an X-DomainFronted-For header with the original IP
02:30:06 <isis> okay, yeah, that is a great idea
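
A minimal sketch of how the report request discussed above might be parsed, assuming the `justfailed` query parameter and the `X-DomainFronted-For` header exactly as proposed in the log. The function name `parse_failure_report` and the `country_lookup` callable are hypothetical; the lookup could be something like pygeoip's `country_code_by_addr`. This is not BridgeDB code.

```python
from urllib.parse import parse_qs, urlsplit


def parse_failure_report(url, headers, country_lookup):
    """Return (transport, country_code) for a failure report, or None.

    The country is inferred server-side from the fronted client IP,
    so the client never sends a country code itself.
    """
    query = parse_qs(urlsplit(url).query)
    transports = query.get("justfailed")
    client_ip = headers.get("X-DomainFronted-For")
    if not transports or not client_ip:
        return None
    # Only the first value is used if the parameter is repeated.
    return transports[0], country_lookup(client_ip)
```

The client IP is used only for the lookup and is not part of the return value, which matches the concern raised later in the meeting about not storing IP addresses.
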
02:31:26 <isis> and would TorLauncher also want/need a way to query like GET /recommended_transport ?
02:32:08 <mikeperry> you may want a filter that ensures this request can't be counted more frequently than the current IP turnover rate for the hash rings, both to guard against DoS and to not overcount users who keep trying over and over again with the same bridge lines because they don't know any better
02:32:25 <isis> (meta note: i guess the floor is open for discussion/questions/brainstorming now, but if anyone else shows up and would like to reportback, please feel free to jump in and do so)
02:32:26 <mikeperry> yes! /recommended_transport could use GeoIP to decide the answer
02:32:42 <mikeperry> which would greatly simplify the UX for deciding what transport to use for your situation
02:32:58 <mikeperry> that is a great idea
02:34:01 <isis> oh… mapping IPs to the time they last queried is harder…
02:34:32 <isis> it means i would have to store IP addresses… which there really isn't any way to do that privately
02:34:37 <mikeperry> oh right, the hashring mapping is stateless.. hrmm :/
02:34:56 <isis> hashring mapping?
02:35:15 <isis> oh, you mean like IP → which bridges
02:35:16 <isis> ?
02:35:19 <mikeperry> yes
02:35:40 <isis> uh… that is "pseudo-deterministic"
02:36:06 <isis> meaning that, yes, it would be entirely deterministic iff we didn't add/remove items from the hashring
02:37:38 <isis> but since we re-parse bridge descriptors and update the hashring every 30 minutes, there is a chance inversely proportional to the size of the hashring and the size of the set difference over the old and new hashrings, that the new "deterministic" mapping of IP → which bridges could produce a different answer
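
To illustrate the "pseudo-deterministic" behaviour described above, here is a generic consistent-hashing sketch. It is not BridgeDB's actual hashring: the class name `Hashring`, the use of plain SHA-256, and the default of three bridges per answer are all assumptions made for the example.

```python
import bisect
import hashlib


def _position(key):
    """Map a string to a point on the ring (an integer from SHA-256)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class Hashring:
    def __init__(self, bridges):
        # Each bridge sits at a fixed point determined by its own name.
        self._ring = sorted((_position(b), b) for b in bridges)
        self._points = [p for p, _ in self._ring]

    def get(self, client_ip, n=3):
        """Return the n bridges that follow the client's point on the ring."""
        if not self._ring:
            return []
        start = bisect.bisect_left(self._points, _position(client_ip))
        count = min(n, len(self._ring))
        size = len(self._ring)
        return [self._ring[(start + i) % size][1] for i in range(count)]
```

For a fixed ring the same IP always gets the same answer. Adding or removing a bridge only changes the answer for clients whose window on the ring happens to include the changed position.
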
02:37:45 <mikeperry> what about a post-processing step then? if you recorded the statistics as (hashring_position, hashring_time_epoch, GeoIP_country) tuples for each transport, you could just count uniques
02:38:14 <mikeperry> ugh
02:38:23 <mikeperry> so add a reparse_epoch to that tuple? ;)
02:38:52 <isis> there is a reparse_epoch, see bridgedb.schedule.ScheduledInterval
02:41:40 <mikeperry> though if reparse_epoch changes much more frequently than hashring_time_epoch, we may have problems counting uniques :/
02:45:05 <mikeperry> maybe we just ignore reparse_epoch? how much does reparsing change what bridges you get?
02:45:16 <isis> by "reparse_epoch", you mean the frequency by which BridgeDB reparses bridge descriptors (i.e. 30 minutes)
02:45:21 <mikeperry> yes
02:46:15 <isis> and by "hashring_time_epoch", you mean the current rotation interval for any rotating sub-hashring the distributor might have, i think
02:46:29 <mikeperry> yes, for that transport
02:50:22 <isis> for that transport? the sub-hashrings (and sub-sub-hashrings, ad infinitum) are *mostly* bridge-type agnostic
02:51:35 <str4d> The idea (IIUC) is to be able to track censorship levels for each transport type, and BridgeDB is just a convenient place to do that.
02:53:34 <mikeperry> isis: does the sub-sub-hashring name/type also need to be included in the tuple then, in addition to the position? or is the position alone enough?
02:53:35 <str4d> So I guess the only reason for tracking it per-subring would be if the subrings have non-overlapping user sets...?
02:54:04 <isis> so i am imagining a redis store which has a SET for each transport type, and then we use mikeperry's idea to use "(hashring_position, interval, country_code)" as a unique string for a report during that period that those bridges were blocked
02:55:10 <mikeperry> str4d: yes. also, the subrings as I understand them mostly represent different types of restrictions on the bridges you get (like TCP port, IPv4 vs IPv6, etc). I thought that transport type was also something that caused a subring to be made, but I really only barely understand bridgedb's operation ;)
02:55:14 <isis> then, at the end of each interval, we clear each set, and increment a counter for "(transport_type, country_code)", iff there was a matching item in the set
02:56:28 <isis> oh, yeah, if we're tracking per sub-hashring, then the unique string should be like "(hashring_name, hashring_position, interval, country_code)"
02:57:07 <isis> i mean, that is only if we want this system to also apply to the HTTPS and email distributors
02:58:09 <isis> which we don't, i guess, because then we'd need something like a way to reportback "hey! everything you just gave me doesn't work!" over email… and ugh
02:59:35 <isis> mikeperry: okay, that is totally doable
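
The counting scheme sketched in the discussion above could look roughly like this. In-memory Python sets stand in for the redis SETs, and the class name `BlockageStats` and its method names are hypothetical. One interpretation is assumed here: at the end of an interval the counter goes up once per unique report string, which is one reading of "count uniques".

```python
from collections import Counter, defaultdict


class BlockageStats:
    def __init__(self):
        # transport type -> set of unique report tuples for this interval
        self.pending = defaultdict(set)
        # (transport_type, country_code) -> running count across intervals
        self.counts = Counter()

    def report(self, transport, hashring_name, hashring_position,
               interval, country_code):
        """Record a failure report; repeats in one interval collapse."""
        key = (hashring_name, hashring_position, interval, country_code)
        self.pending[transport].add(key)

    def end_interval(self):
        """Fold the unique reports into the counters, then clear the sets."""
        for transport, reports in self.pending.items():
            for _name, _position, _interval, country_code in reports:
                self.counts[(transport, country_code)] += 1
        self.pending.clear()
```

Because the set key holds a hashring position rather than an IP address, a client retrying with the same bridge lines is counted once per interval without any address being stored.
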
03:00:14 <isis> mikeperry: will users ever be hitting BridgeDB's domain front over Tor?
03:03:39 <mikeperry> hrmm.. maybe? I think we shouldn't rule it out.. how are Tor users handled by the HTTPS distributor again?
03:03:54 <mikeperry> are they treated like a single IP, or otherwise any differently than any other non-Tor IP?
03:04:20 <mikeperry> (maybe we could count them as country code "Proxies" or something)
03:05:56 <mikeperry> my guess is that it is an edge case that we shouldn't spend a lot of time making perfect, but at minimum we should ensure that the Tor IPs don't affect our country counts for non-Tor IPs
03:06:57 <mikeperry> the web tells me that MaxMind has a special GeoIP country code for proxies
03:07:03 <mikeperry> so it might "just work"
03:07:38 <mrphs> GeKo, mikeperry: just upgraded to 5.0a1 -- still have issues with the resize thing :/ had to disable it.
03:07:55 <isis> currently, they are treated like a single IP with four disjoint subgroups (the subgroup is deterministically computed from the client's exit node, so using "New Tor Circuit For This Site" would give a client up to four sets of bridges in a time period)
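
A hedged sketch of how a Tor client's subgroup could be derived deterministically from its exit node, as described above. The function name `tor_subgroup`, the HMAC-SHA256 construction, and the key are assumptions for illustration; only the number of subgroups (four) comes from the log.

```python
import hashlib
import hmac

N_TOR_SUBGROUPS = 4  # from the log: "four disjoint subgroups"


def tor_subgroup(exit_ip, key=b"hypothetical-secret"):
    """Map an exit node's IP to one of the four subgroups."""
    digest = hmac.new(key, exit_ip.encode("ascii"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % N_TOR_SUBGROUPS
```

The same exit always lands in the same subgroup, so switching circuits can reach at most four distinct sets of bridges in a time period.
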
03:08:00 <mikeperry> hell yeah, represent nation "A1" ;)
03:08:03 <mikeperry> https://dev.maxmind.com/geoip/legacy/codes/iso3166/
03:09:42 <isis> no, A1 is not applied consistently, nor are Tor exits given the A1 classification by MaxMind
03:09:54 <isis> iirc
03:12:14 <isis> nope:
03:12:18 <isis> In [81]: import pygeoip
03:12:23 <isis> In [82]: geo4 = pygeoip.GeoIP('/usr/share/GeoIP/GeoIP.dat')
03:12:25 <isis> In [83]: geo4.country_code_by_addr('106.187.37.158')
03:12:26 <isis> Out[83]: 'JP'
03:12:33 <isis> that's my exit
03:12:39 <isis> and it is in japan
03:13:17 <mikeperry> was the package that installed /usr/share/GeoIP/GeoIP.dat updated since you started that exit on that IP?
03:13:32 <mikeperry> (and when was the upstream version of that package released?)
03:14:46 <isis> that exit has been running for like three or four years, so yeah
03:16:15 <isis> and that DB is the latest version from maxmind from 1 April 2015
03:16:36 <isis> anyway
03:17:22 <isis> there's a few more things we'd need to iron out for the design of the TorLauncher Distributor, but the blockage statistics is definitely doable
03:17:59 <mikeperry> also /recommended_transport ftw
03:18:18 <isis> yeah, less probing
03:19:07 <mikeperry> maybe in aggregate, but I suspect the statistics will show us that IP blocking is common, and will be the main reason why users try another transport type
03:19:30 <mikeperry> but we should get the data for this before just guessing and changing things, IMO
03:21:18 <isis> would it be mean to respond to e.g. US clients asking for /recommended_transport with "obfs3" or "scramblesuit" first?
03:21:35 <mikeperry> recommended_transport mainly gives us more agility and localized approaches. that's why I'm excited about it
03:22:36 <isis> rather than going straight to the latest-and-greatest for a country which only does occasional blocking (mostly depending on ISP and corporate/university network, etc.)
03:22:39 <mikeperry> good question. do we conserve our favorite transports for other users in other countries?
03:22:53 <mikeperry> I wonder if we can find other metrics to help answer that..
03:23:26 <mikeperry> I bet actually though, US censorship is pretty fucking hardcore where it happens
03:23:42 <mikeperry> we have the best gear after all... everyone is buying their shit from us
03:23:53 <isis> (meta note: i'm going to end the meeting at 03:30 UTC)
03:24:12 <mikeperry> and it's used in libraries, schools, companies, etc
03:25:21 <mikeperry> so maybe we don't try to guess anything about recommended_transport without more data, but the API should be there so we can adjust as needed. like if Iran suddenly blocks SSL or anything that looks encrypted during their elections, then at that point we'll want to tell .IR users to use FTE
03:26:24 <isis> ack
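
A minimal sketch of the /recommended_transport idea as it stands at this point in the discussion: no guessing per country by default, but a table that can be adjusted when data or events call for it. The function name, the `overrides` table, and the default of "obfs4" are assumptions for the example.

```python
DEFAULT_TRANSPORT = "obfs4"  # assumed default, not decided in the meeting


def recommended_transport(country_code, overrides=None,
                          default=DEFAULT_TRANSPORT):
    """Return the transport to suggest for a GeoIP country code.

    `overrides` maps upper-case country codes to transport names and
    is meant to be edited by operators as conditions change.
    """
    if not country_code or not overrides:
        return default
    return overrides.get(country_code.upper(), default)
```

For the election scenario mentioned above, an operator would call it with an overrides table such as `{"IR": "fte"}`, and every other country would keep getting the default.
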
03:26:51 <isis> bridgedb is almost going to need an admin interface :P
03:26:54 <isis> (kidding)
03:27:47 <isis> okay, more questions? comments?
03:28:04 <isis> isabela: did you have a request for roadmapping?
03:28:32 <isis> any more Tor Browser resizing bug reports? ;)
03:33:30 <isabela> sorry
03:33:41 <isabela> was distracted
03:34:18 <isabela> I am pinging teams to update their roadmaps ->> https://trac.torproject.org/projects/tor/wiki/org/roadmaps/TorObfuscation
03:34:32 <isabela> I will add a status column there for you folks
03:35:10 <isabela> we missed april, so I am asking you to look at march and april - move things around if you didn't get a chance to do them and you are either doing them in may or will do them in another month
03:35:24 <isabela> that's it
03:36:18 <isis> isabela: okay, will do
03:36:39 <isabela> tx
03:36:58 <isis> isabela: i'll make sure Yawning knows too
03:37:13 <isabela> :)
03:38:12 <isis> i forgot to mention that i also revised BridgeDB's error pages to make them cuter: http://static.inky.ws/image/5205/error.png http://static.inky.ws/image/5206/not-found.png http://static.inky.ws/image/5207/maintenance.png
03:39:07 <isabela> aww
03:39:27 <isis> i am excited about the maintenance page because i get like 5 false bug reports every time the thing goes down for reparsing
03:39:55 <isabela> good point
03:40:12 <isis> :)
03:40:33 <isis> okay, i think the meeting is done?
03:40:44 <isis> #endmeeting