13:03:00 #startmeeting network-health 2026-01-19
13:03:00 Meeting started Mon Jan 19 13:03:00 2026 UTC. The chair is hiro. Information about MeetBot at https://wiki.debian.org/MeetBot.
13:03:00 Useful Commands: #action #agreed #help #info #idea #link #topic.
13:03:11 o/
13:03:21 #link https://pad.riseup.net/p/tor-nethealthteam-2026-keep
13:03:24 the new pad everyone!
13:04:07 all right, who wants to start with this week's updates?
13:04:13 the pad is dead, long live the new pad!
13:04:18 i can go
13:04:24 lol
13:04:37 i did more work on p183 wrt the anomaly report writing
13:04:49 and created a project for the documentation
13:05:09 which i put in my private namespace for now, as we need to think about where we want the final work to live
13:05:36 we have the result on gitlab pages: https://tor-anomaly-detection-59beaa.pages.torproject.net/
13:05:43 nice, that was great GeKo (IRC)
13:05:45 i plan to add more stuff over the next weeks
13:05:50 nice
13:05:55 as i still have plenty of material
13:06:03 but for the sponsor part i think we are done
13:06:20 I have been wondering if this could be a better format for our documentation than the current wiki
13:06:32 +1 :)
13:06:33 yeah, we could do that as well
13:06:34 but this can be a wider discussion for an in-person meetup maybe
13:07:19 i've started thinking about what data we need for the anomaly algorithms we want to implement
13:07:43 hiro: what do you and sarthikg[mds] plan for the user data on the new website?
13:08:05 right now we have some .csv data available for the graphs
13:08:14 I am creating the user data computing logic in parser-rs
13:08:15 how is that supposed to look with website 2.0?
13:08:29 so far so good, but I am not able to properly estimate snowflake users
13:08:45 I spent all friday trying to understand why that was
13:08:59 i guess i need to look at that then
13:09:14 because that's basically what we need for what juga is working on
13:09:37 i was wondering whether we would ideally have some materialized table or view
13:09:47 ok, for the tuning of the algorithm we can still use what we have in the csv from the current metrics.tpo
13:09:59 holding that data so we can easily query it for arbitrary timeframes
13:10:30 yes, so https://gitlab.torproject.org/tpo/network-health/metrics/datastore/-/blob/main/stats_tables.sql?ref_type=heads these would be the tables
13:10:40 yeah, but ideally we would query the db for that i think
13:11:02 and they should map to what we have right now in the csv (I mean same columns)
13:11:23 okay, great, we already have that available then
13:11:38 so daily_relay_users has accurate numbers
13:11:52 but not daily_bridge_users
13:12:15 okay, interesting. i'll take a closer look at those numbers
13:12:28 oh well, it does, but when there is a transport that is more predominant in one country, it seems I am making a mistake when estimating the interval windows
13:12:39 because i was concerned we calculate them in a way that would differ from what we get on the website
13:13:06 so if you look at russia for example, the snowflake estimation is completely wrong, but not the other transports
13:13:07 the input for the tool only needs date and number of users/clients
13:13:44 yep
13:13:59 okay, i'll update team#393 accordingly
13:13:59 Uhm, which one of [tpo/anti-censorship/team, tpo/applications/team, tpo/community/team, tpo/core/team, tpo/network-health/metrics/team, tpo/network-health/team, tpo/operations/team, tpo/team, tpo/tpa/team, tpo/ux/team, tpo/web/team] did you mean?
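A minimal sketch of the idea discussed above: a per-day user-count table that can be queried for arbitrary timeframes, yielding exactly the two columns the anomaly tool needs (date and number of users). This uses an in-memory sqlite database for illustration only; the table and column names (`daily_relay_users`, `date`, `country`, `users`) are assumptions loosely inspired by the linked stats_tables.sql, not the real datastore schema.

```python
import sqlite3

# Illustrative stand-in for the datastore; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_relay_users (date TEXT, country TEXT, users INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_relay_users VALUES (?, ?, ?)",
    [
        ("2026-01-17", "us", 120000),
        ("2026-01-17", "ru", 45000),
        ("2026-01-18", "us", 118000),
        ("2026-01-18", "ru", 47000),
    ],
)

def users_per_day(conn, start, end):
    """Return (date, total_users) rows for an arbitrary timeframe --
    the input shape the anomaly tool needs."""
    return conn.execute(
        "SELECT date, SUM(users) FROM daily_relay_users "
        "WHERE date BETWEEN ? AND ? GROUP BY date ORDER BY date",
        (start, end),
    ).fetchall()

rows = users_per_day(conn, "2026-01-17", "2026-01-18")
# rows == [("2026-01-17", 165000), ("2026-01-18", 165000)]
```

In the real setup this aggregation would live in the database itself (a materialized table or view, as suggested in the meeting), so the tool queries it directly instead of reading csv exports.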
13:14:04 so I am trying to figure out why
13:14:19 sorry, tor my friend: network-health/team#393
13:14:29 that's all from me
13:14:37 thank you!
13:14:43 on my side, i'm just focused now on the anomaly tool; so far i'm getting different values in each function for r and python
13:15:01 my update: i was migrating the code to the new tables (json ingested with rust), but there were a bunch of changes in the tables, so tracking down each change requires a whole rewrite of the structs representing the json. I am mostly done with that, and subsequently the getters to the right types.
13:15:07 also, i did a small experiment with changing the aggregator to use clickhouse streams instead of batches. the performance is about 20x faster. both changes paired are almost done, and in local testing.
13:15:13 lastly, I have tried to simplify the logic for calculating each field based on whether the router is new or old. it would be a great help if everyone could review this, and suggest if i am missing any edge cases. (descriptorParser doesn't handle stuff correctly at times)
13:15:13 https://gitlab.torproject.org/tpo/network-health/metrics/aggregator-rs/-/blob/feat/error-handling/src/service/bridge/resolver/is_running.rs?ref_type=heads
13:15:58 @juga check those "nan" values as we said; in my experience, moving from R (or other mathematical programming environments) to some programming language is where one finds issues
13:16:35 hiro: yeah, i'm checking that, but even without nan as input or output, i'm getting different values...
13:16:52 probably the functions i use; i wonder if they're close enough, will continue investigating
13:17:13 sarthikg: I think GeKo (IRC) would be the person to help you with that... and file the appropriate issues on the descriptorParser side if that's the case too ;)
13:18:50 hiro: sure, i'll check with GeKo (IRC) once!
13:19:13 sounds good!
13:21:06 ooook! Rohithh do you have an update?
I do not want to pressure you, but since you are around you might want to give yours too?
13:21:41 on my side I'm still working on the nsa bandwidth route, will keep updating
13:22:43 about the progress
13:22:53 ok, thank you!
13:23:23 on my side I am working on the stats for relay and bridge users and solving a bunch of issues with the current metrics.tpo that I hadn't finished last week.
13:24:00 alright, does anyone have anything else for this week's meeting?
13:24:18 * juga is good
13:24:31 me too
13:24:50 * hiro is groot too
13:24:55 me too
13:25:05 ok, if everyone is groot, I'll end the meeting
13:25:05 #endmeeting
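A footnote on the r-vs-python "different values" issue juga raised: purely as an illustrative sketch (not the team's actual anomaly code), two numeric environments often disagree in the last floating-point bits, so exact equality reports spurious differences, while a relative-tolerance check such as `math.isclose` separates harmless rounding noise from real porting bugs.

```python
import math

# Two correct computations of the same quantity can differ in the last
# bits of an IEEE-754 double, e.g. depending on summation order.
r_value = 0.1 + 0.2   # 0.30000000000000004
py_value = 0.3        # a differently-rounded representation

naive_equal = r_value == py_value                          # False
close_enough = math.isclose(r_value, py_value, rel_tol=1e-9)  # True
```

If values still differ beyond a sensible tolerance, the discrepancy is likely algorithmic (different underlying routines or defaults), not float noise.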