13:59:46 #startmeeting OONI Community meeting 2020-03-31
13:59:46 Meeting started Tue Mar 31 13:59:46 2020 UTC. The chair is hellais. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:59:46 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:00:22 Hello everyone, welcome to the March 2020 OONI Community Meeting! :) :ooni: :party_parrot:
14:00:46 Hope you and your loved ones are staying safe and healthy!
14:01:03 In these crazy times, we'll probably see more of each other online. :)
14:01:11 hello everyone
14:01:13 Yep, thanks!
14:01:16 Hello everyone.
14:01:45 Please feel encouraged to introduce yourselves asynchronously as you join.
14:02:11 Meanwhile, we have 5 topics on the agenda today: https://pad.riseup.net/p/ooni-community-meeting-keep
14:02:24 If there's anything else you'd like to discuss, please add it to the pad. :)
14:05:01 Hi, I’m Art. I'm just interested in OONI. I contribute a bit to the test lists and translations, but not very consistently.
14:05:28 Hi everyone. I am Sigi (ARTICLE 19 EA). Happy to meet everyone and eager to learn. :wave::skin-tone-2:
14:05:29 Hi, I'm Simone, I work for OONI, mainly on the client / measurement engine side
14:05:58 Great, thanks so much to all of you for joining us today! :)
14:06:08 As we have quite a packed agenda, let's get started. :)
14:06:19 #topic 1. Updates on Blocked Biafra Websites in Nigeria (Tunde)
14:06:21 did know before about this, well, honestly. just happened to be around lol.
14:06:33 *did not
14:06:47 Hi, I'm Tunde. I'm a Researcher at Paradigm Initiative, and also a current Open Technology Fund Fellow researching the use of tools like VPNs to circumvent censorship in Nigeria, Cameroon, Uganda and Zimbabwe
14:07:16 @babatunde.okunoye would you like to lead the discussion of the topic you proposed?
14:07:18 Hello, I am Louis, a digital rights lawyer and FOE consultant based in East Africa
14:07:45 Yes, I've been testing the blocked Biafra sites in Nigeria.
14:07:47 wow, it's starting.
14:08:36 On 3 ISPs, I've found that about 9-10 sites still remain blocked while 3 are now accessible. Results are uploaded to the OONI website
14:09:04 @babatunde.okunoye do you mind sharing a bit more context about the websites (for those who may not be familiar with Biafra)?
14:09:13 Ok
14:10:18 The websites carry information about the secessionist movement of Biafra, relating to the Nigeria-Biafra civil war in the 1960s
14:10:57 Tensions sometimes flare up and the government is keen to block access to online content on this theme
14:11:26 @babatunde.okunoye it's interesting to hear though that 3 are now accessible. Do we perhaps know why or what changed..?
14:12:06 I'll have to do some research to understand what has changed. I don't have the answers now
14:12:27 Sorry for being a little late. I am Saptak, web developer. I am currently helping with UI implementation in ooni.org and also with some features in OONI Explorer
14:13:33 @babatunde.okunoye if I recall correctly, the Nigerian government ordered the blocking of these sites, right? (they were included in a blocklist)
14:13:48 Yes
14:14:43 Does changing the URL (http:// <--> https://, or www.website.com <--> website.com, i.e. with and without www.) give different results?
14:15:20 @babatunde.okunoye can you share the list of sites you are testing?
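One way to explore the scheme and `www.` question raised above is to test every combination for each site. The following is a minimal sketch, not part of OONI Probe: `url_variants` is a hypothetical helper that expands a hostname or homepage URL into the four variants, which could then be pasted into a custom test list.

```python
from typing import List


def url_variants(host: str) -> List[str]:
    """Return the http/https and www/non-www homepage URLs for one site.

    Hypothetical helper for illustration; it only handles homepage URLs.
    """
    bare = host.lower().strip().rstrip("/")
    # Drop any scheme that was included in the input.
    for scheme in ("http://", "https://"):
        if bare.startswith(scheme):
            bare = bare[len(scheme):]
    # Drop a leading "www." so both forms can be regenerated consistently.
    if bare.startswith("www."):
        bare = bare[len("www."):]
    return [
        f"{scheme}://{prefix}{bare}/"
        for scheme in ("http", "https")
        for prefix in ("", "www.")
    ]


if __name__ == "__main__":
    for url in url_variants("biafraradio.com"):
        print(url)
```

Comparing the results of the four variants on the same network can hint at whether blocking is keyed on the hostname (all variants fail) or on something narrower, such as plain HTTP only.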
14:17:08 In general, you can find website test results from Nigeria here: https://explorer.ooni.org/search?until=2020-04-01&probe_cc=NG&test_name=web_connectivity
14:17:43 Some examples that @babatunde.okunoye mentions (re: blocking of biafra sites) include:
14:17:45 https://explorer.ooni.org/measurement/20200330T115801Z_AS0_1V6YAX9ppALYXcv2B3NmQqABBmdKdhnqX7QfbbAlA66iLKy1GV?input=http%3A%2F%2Fbiafraradio.com%2F
14:17:53 https://explorer.ooni.org/measurement/20200330T115801Z_AS0_1V6YAX9ppALYXcv2B3NmQqABBmdKdhnqX7QfbbAlA66iLKy1GV?input=https%3A%2F%2Fwww.ipobgovernment.org%2F
14:18:00 https://explorer.ooni.org/measurement/20200330T115801Z_AS0_1V6YAX9ppALYXcv2B3NmQqABBmdKdhnqX7QfbbAlA66iLKy1GV?input=http%3A%2F%2Fbiafraland.com%2F
14:20:04 I see that many of these measurements show "AS0" => That may be an error on our side, or perhaps the user has disabled the collection of ASN in the settings. Just a note that if you disable this, it's not possible to know which Internet Service Provider in the country is implementing the censorship.
14:20:52 The network which shows that is MTN
14:21:14 Apparently the resolver is returning 192.0.0.1, which is clearly wrong
14:21:51 @fortuna Sure I'll share the list as soon as possible
14:22:03 file under: bogons detection!
14:23:07 @babatunde.okunoye, you should keep running OONI, but you can also try http://www.blocktest.io/injection/biafraradio.com to check for DNS injection. Or use the script at https://github.com/Jigsaw-Code/net-analysis/tree/master/netanalysis/blocktest to do some local verification.
14:23:29 @babatunde.okunoye did you have other updates to share or questions to ask?
14:24:27 The resolver IP points to a google IP, so it’s likely triggered by an injection
14:24:57 No Maria, nothing as of now
14:25:29 +20 to that!
14:25:31 Alright, thank you @babatunde.okunoye. And thank you for your very important testing!
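The "bogons detection" remark above refers to flagging DNS answers that fall in address ranges which should never host a public website, such as the 192.0.0.1 answer seen on MTN. A minimal sketch of such a check follows, using only Python's standard `ipaddress` module. The `SPECIAL_PURPOSE` list and the `is_bogon` helper are illustrative assumptions, not OONI's implementation, and the list is deliberately not exhaustive.

```python
import ipaddress

# A few IANA special-purpose IPv4 ranges. Illustrative only: a real bogon
# list is longer and also covers IPv6.
SPECIAL_PURPOSE = [
    ipaddress.ip_network(net)
    for net in (
        "0.0.0.0/8",        # "this network"
        "10.0.0.0/8",       # private use
        "100.64.0.0/10",    # shared address space (carrier-grade NAT)
        "127.0.0.0/8",      # loopback
        "169.254.0.0/16",   # link local
        "172.16.0.0/12",    # private use
        "192.0.0.0/24",     # IETF protocol assignments
        "192.0.2.0/24",     # documentation (TEST-NET-1)
        "192.168.0.0/16",   # private use
        "198.18.0.0/15",    # benchmarking
        "198.51.100.0/24",  # documentation (TEST-NET-2)
        "203.0.113.0/24",   # documentation (TEST-NET-3)
        "240.0.0.0/4",      # reserved
    )
]


def is_bogon(addr: str) -> bool:
    """Return True if addr falls inside one of the ranges listed above."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in SPECIAL_PURPOSE)


if __name__ == "__main__":
    for answer in ("192.0.0.1", "8.8.8.8"):
        print(answer, "bogon" if is_bogon(answer) else "looks routable")
```

A bogon answer for a public website is a strong hint of DNS tampering, but the absence of one proves nothing, since injected answers can also point at ordinary routable addresses.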
14:25:44 I'll first need to study the test results in depth
14:26:15 @babatunde.okunoye and please reach out with any questions you may have.
14:26:25 #topic 2. Running of probe-cli for a custom list (by country or by local list generated by users). [name?]
14:26:30 Ok
14:26:34 Who proposed this topic?
14:27:37 Yes we have an issue for this, here it is: https://github.com/ooni/probe/issues/936
14:27:47 Yeah, we could totally assert blocking in that case.
14:28:33 It’s definitely a high priority feature for the next gen OONI Probe Desktop app and it will probably be part of one of the first releases after the official launch
14:29:36 In the meantime, if you are a power user and need to do that now, I would suggest you build yourself a version of probe-engine and use the command line option `-input` in the `miniooni` tool: https://github.com/ooni/probe-engine
14:30:35 Alternatively, an easier way to test your own list of websites is by copy-pasting your list into the URL slots of OONI Run (https://run.ooni.io/) and then opening the generated link with your OONI Probe mobile app.
14:30:58 You can also test websites of your choice by tapping on the Websites card of the OONI Probe mobile app, and selecting the "Choose websites" button.
14:31:32 And as mentioned by @hellais, custom URL testing is an upcoming feature for the OONI Probe desktop app.
14:32:13 #topic 3. Future of automatically testing in raspberry pi and desktop [name?]
14:32:26 Is the person who proposed this topic here? :)
14:34:01 Similarly, we have plans to add support for automatic testing in both the OONI Probe mobile and desktop apps.
14:34:33 This is tracked in this epic: https://github.com/ooni/probe/issues/955
14:34:51 So far we have only done a bit of cursory research on how this could be achieved on macOS and Windows: https://github.com/ooni/probe/issues/1011
14:35:10 Some of you may remember "Lepidopter", the OONI Probe distribution for Raspberry Pis.
This project has since been discontinued because purchasing and shipping hardware around the world wasn't something our small team could keep doing in the long term. :P
14:36:01 If you are interested in doing automatic testing on desktop now, you can try out some of the code snippets which are documented inside of the doc referenced in ooni/probe#1011
14:36:26 I shared it here for convenience: https://docs.google.com/document/d/1baCWl-mPNlcOkth7Ap-SrukQUKMJ5Rqb7fMM5i0ekiM/edit#heading=h.83pd7oav038d
14:37:19 WRT Raspberry Pi support, we currently don’t support building the new probe-cli on ARM architectures (the processors used by Raspberry Pi devices), mostly due to issues with building the measurement-kit C++ codebase there
14:37:35 @hellais thanks for sharing
14:37:44 We are tracking the work related to supporting the ARM architecture in this issue: https://github.com/ooni/probe/issues/807
14:39:13 However, unless somebody picks this up, we don’t have plans to figure out how to build measurement-kit on ARM architectures, but rather will wait until probe-engine is purely in Go, at which point it will be pretty easy to make ARM builds
14:39:18 regarding supporting ARM, though, we're rewriting most code in Go, so this will change the feasibility of OONI for Raspberry Pi
14:39:23 lol, we wrote the same
14:39:27 :)
14:39:28 We're trying to find a replacement for Raspberry Pi deployments, and so we're evaluating various options... Would having an old/cheap Android phone for the purpose of automatic OONI Probe testing be something reasonable/convenient for you?
14:41:30 Meanwhile, we can proceed to the next topic since we're quite tight on time
14:41:40 #topic 4. Storage data format that is easier to consume [fortuna] - We are interested again in loading some of the data into BigQuery, but the current format doesn't allow us to do that easily.
Revisit issue https://github.com/ooni/backend/issues/191
14:42:19 cc @federico ^
14:42:55 @fortuna is it correct that the main issue you are having in consuming the data primarily pertains to how the files are laid out in the S3 buckets and not the format itself?
14:43:21 Hi there. So here at Jigsaw we are trying to get OONI into a BigQuery public dataset, so that people can easily query it. First, I have a few questions: How large is the dataset? How large is it without the body data? How fast is it growing?
14:43:50 (that would be lovely)
14:44:23 It's a large dataset, and the format is really not suitable for big data tools, so consumers are forced to roll out their own solution for error handling and recovery, which is very hard to get right, and also hard to make fast
14:44:38 o/
14:47:56 We currently store some features of the data inside of a PostgreSQL database which we also make available publicly here: https://github.com/ooni/sysadmin/blob/master/docs/metadb-sharing.md, which may be useful to estimate an answer to your question
14:47:58 @fortuna websites 2-7, 9, 12-14, 16, 18, 22
14:48:56 This database does not include the HTTP response bodies and makes use of the native PostgreSQL row compression, so keep that in mind
14:49:06 The data directory for the database currently is at 864 GB
14:50:19 @fortuna do you refer to the contents of the database only or the cans as well (e.g. the contents of webpages)?
14:50:52 It does not include the full response body content, but only the hash, simhash and headers of them
14:51:08 I don't need the body data, so if we can ignore it it would be great
14:51:52 We have done a lot of work on our database in the past weeks, so the growth needs to be taken with a grain of salt, yet the size of the same database on 2020-03-05 was 789 GB
14:52:23 We can keep looking at it.
Meanwhile, my other question is whether it's possible to revisit issue #191 (https://github.com/ooni/backend/issues/191). One of the big challenges is that we have to consume the entire dataset (all countries, all types of tests, with body data) in order to extract what we want. If we could partition the data in a way that I only read what I want, that would already be a huge win
14:52:27 So that is approximately 2.9 GB per day
14:54:13 I imagine that's for all types of tests. I wonder how it breaks down by type of test
14:54:27 Maybe an easy thing to try out would be to use the PostgreSQL database dump as a starting point to ingest all the metrics into BigQuery
14:55:48 @federico was also telling me that he was doing some experiments at reprocessing the OONI data and estimates it would take about 24h (excluding database writing times) to read all the data from cans using an EC2 cluster
14:57:02 (depending on how big the cluster is!)
14:57:21 I think it took me a week to transfer all the data back when I first did it, and something like $500 per transfer, when using the bucket
14:57:28 It is a bit tricky and requires quite a bit of effort for us to change the can/autoclaved data format. One thing which could be doable, though, is perhaps generating another output format which is more suitable, though that involves duplicating the data
14:58:59 It is however likely that it’s going to be network intensive and slower to do the processing outside of AWS
14:59:03 For the SQL dump, it would be great if it was available in columnar format somewhere
14:59:29 We would be more than happy to also host a mirror on Google Cloud if we could get it sponsored there too :)
15:00:47 I'm trying to load it into BigQuery directly. We could try to request cloud storage, but I think just making the data available in a format people won't use won't help much.
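The "approximately 2.9 GB per day" figure quoted above can be reproduced from the two database sizes mentioned in the discussion. The sketch below assumes the 864 GB size was measured on the meeting date (2020-03-31), which the log implies with "currently" but does not state outright. `daily_growth_gb` is a hypothetical helper name.

```python
from datetime import date


def daily_growth_gb(start_gb: float, end_gb: float, start: date, end: date) -> float:
    """Average growth in GB per day between two size measurements."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end date must be after start date")
    return (end_gb - start_gb) / days


if __name__ == "__main__":
    # 789 GB on 2020-03-05; 864 GB assumed to be measured on 2020-03-31.
    rate = daily_growth_gb(789, 864, date(2020, 3, 5), date(2020, 3, 31))
    print(f"{rate:.1f} GB/day")  # 75 GB over 26 days
```

As noted in the discussion, recent database work inflates this number, so it is better read as a rough upper bound than as a steady-state growth rate.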
15:01:03 For those who may be interested, this blog post explains how to fetch OONI data from the Amazon S3 bucket: https://ooni.org/post/mining-ooni-data/
15:01:54 @fortuna what format do you think people would find more usable?
15:02:58 Something that makes it easy to load in bulk and analyze. The SQL format is probably ok. The problem is getting the dump.
15:03:33 Restructuring to allow reading only what you need and removing the bodies would be very helpful too
15:03:39 In future we might provide dumps in Parquet / protobuf / arrow / hdf formats. Any strong preference?
15:05:01 I've used protobuf a lot, and it's a nice compact format. However, a columnar format like Parquet is probably better, since it allows you to only read the columns you care about.
15:05:35 BigQuery supports loading from Parquet, so that's good
15:06:43 Maybe the easiest right now would be to provide dumps of the SQL database on a regular basis.
15:07:02 In JSON, protobuf or Parquet format
15:07:20 Then we could load it in BigQuery easily
15:08:19 Yes, we already provide a dump of our SQL database which is going to be automatically updated to the latest snapshot of the data: https://github.com/ooni/sysadmin/blob/master/docs/metadb-sharing.md
15:08:58 What's the data format?
15:09:15 It’s a PostgreSQL database
15:10:18 Can I convert it to JSON without having to create a VM?
15:10:29 That’s currently the way we suggest to people interested in doing batch analysis of OONI data
15:10:43 I know people in here have a copy of it set up and can maybe provide you access if interested
15:11:02 #\endmeating? :)
15:11:28 If you need JSON files we already provide those inside of the JSONL tree of the S3 storage
15:11:58 @xhdix you're right to point out we're running late. :) If you and others are available, we could stick around for the 5th/final agenda topic?
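The two requests made above (read only the slice you need, and leave out the bodies) can be illustrated with a small partitioning sketch. This is not OONI's pipeline: `write_partitioned_jsonl` is a hypothetical helper, the `body` field name is an assumption, and the `probe_cc` and `test_name` keys are borrowed from the OONI Explorer query parameters. It writes newline-delimited JSON, a format BigQuery can load, into `key=value` directories so that a consumer can fetch a single country and test type.

```python
import json
import os
from typing import Dict, Iterable, List, Tuple


def write_partitioned_jsonl(
    records: Iterable[Dict],
    out_dir: str,
    drop_fields: Tuple[str, ...] = ("body",),  # assumed name of the heavy field
) -> List[str]:
    """Write records as JSONL under out_dir/probe_cc=XX/test_name=YY/.

    Returns the sorted list of files written.
    """
    handles = {}
    try:
        for rec in records:
            part = os.path.join(
                out_dir,
                f"probe_cc={rec['probe_cc']}",
                f"test_name={rec['test_name']}",
            )
            if part not in handles:
                os.makedirs(part, exist_ok=True)
                path = os.path.join(part, "measurements.jsonl")
                handles[part] = open(path, "w", encoding="utf-8")
            # Keep everything except the fields the consumer does not need.
            slim = {k: v for k, v in rec.items() if k not in drop_fields}
            handles[part].write(json.dumps(slim, sort_keys=True) + "\n")
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(os.path.join(p, "measurements.jsonl") for p in handles)
```

With a layout like this, someone interested only in web_connectivity results from Nigeria would read one directory instead of the whole dataset. A columnar format such as Parquet would additionally allow reading only selected columns, as pointed out in the discussion.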
15:12:04 Maybe we can chat about this out of band and I can help you set up a copy of the database or look into some other option
15:12:07 Oops, we can take this offline
15:12:58 Ah I see that the 5th topic is handled out of band.
15:13:17 I mean we can also discuss it here, if @xhdix wishes
15:13:40 I was just trying to do some out of order execution to make sure we don't miss it
15:13:58 @xhdix would you like to discuss the 5th topic that you proposed?
15:16:00 They seem to be offline and we're 16 minutes late, so I guess we can discuss the last topic here at a later stage.
15:16:20 Thank you everyone for joining us today! Hope you stay safe and healthy.
15:16:35 We look forward to connecting more with you online over the coming months. :)
15:16:59 And we look forward to chatting more with you during the next community meeting on 28th April 2020 (they're usually on the last Tuesday of the month).
15:17:11 Until then, take care, and hope you have a great day/night! :)
15:17:12 #endmeeting