16:17:47 <h01ger> #startmeeting snapshot.debian.org
16:17:47 <MeetBot> Meeting started Mon May  6 16:17:47 2024 UTC.  The chair is h01ger. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:17:47 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:17:48 <noahm> Hey all. I'm here for the meeting, representing both the cloud team and my employer, who would like to provide a mirror of snapshot.d.o in the Microsoft Azure cloud.
16:17:53 <h01ger> #chair ln5
16:17:53 <MeetBot> Current chairs: h01ger ln5
16:18:04 <h01ger> ln5: so you can also say
16:18:12 <ln5> ok tnx
16:18:16 <h01ger> #topic agenda is at https://pad.sigsum.org/p/2024-05-06_snapshot.do
16:18:26 <h01ger> is the agenda ok or do we need anything else?
16:18:47 <h01ger> #info can be used by anyone to add noteworthy stuff to the log
16:18:53 <ln5> let's add noahm's (hi!) offer
16:19:16 <h01ger> oh, nice one!
16:19:22 <noahm> updated the agenda to note that
16:19:27 <ln5> noahm: thanks
16:19:45 <h01ger> shall we start?
16:19:50 <ln5> so, feel free to add to the agenda if you figure out more
16:19:58 <ln5> h01ger: let's go
16:19:59 <weasel> h01ger: please.
16:20:07 <h01ger> #topic status updates
16:20:55 <ln5> what's the protocol here? i'm kinda lost :)
16:21:24 <weasel> somebody needs to run this meeting.
16:21:32 <weasel> ln5: do you want to?  h01ger, you?
16:21:38 <ln5> h01ger: would you run the meeting please?
16:22:22 <ln5> ok, i'll do something
16:22:37 <ln5> lucas said earlier: update about hw for new primary site: the machine has been built and mostly tested; ETA for DSA access to BMC is a week from today
16:22:40 <ln5> meh
16:22:45 <ln5> cut'n'paste fail
16:23:00 <ln5> lucas said earlier: hi, I made progress on my side, which I documented in https://lists.debian.org/debian-snapshot/2024/05/msg00000.html
16:23:13 <h01ger> (sorry, was fetching tee as i expected people with updates share them)
16:23:14 <weasel> that document is about s3 backend for snapshot
16:23:28 <h01ger> tea
16:23:30 <weasel> lucas: great news, and great summary.
16:23:42 <weasel> lucas: do you want us to consider the open questions here now?
16:23:56 <ln5> he's not around, will read backlog
16:24:06 <ln5> so let's take a stab at them now
16:24:37 <noahm> many of the questions that lucas is facing will also apply to an azure-hosted service, fwiw
16:24:56 <noahm> it will be very similar to an S3-backed implementation
16:25:05 <weasel> seems likely
16:25:24 <ln5> so one question is if we should come up with another kind of "snapshot mirror"?
16:25:44 <ln5> ie, not the type already running at lw
16:26:05 <weasel> yes, my gut feeling is that a sane approach would be for the primary site to just import things into its local storage,
16:26:16 <ln5> which iiuc has some quirks like files and db not appearing at the same time
16:26:18 <weasel> and then, once an import is finished, various mirror things can be triggered
16:26:50 <waldi> so option A?
16:26:52 <ln5> weasel: like what we have today, right?
16:27:00 <weasel> right now the only kind of mirror is to sync individual files to a secondary site, but it could just as easily be a (parallel) upload to s3/azure
16:27:11 <weasel> waldi: yes
16:27:20 <ln5> individual files plus the database
16:27:27 <weasel> as far as farm content is concerned
16:27:52 <weasel> if we run full VMs in azure, it could also host a DB replica
16:28:11 <h01ger> if there are 3 implementations, we would have 2 mirrors, no?
16:28:23 <weasel> not sure if AWS would have anything to run the/a DB
16:28:34 <waldi> does the snapshot stuff have a log of changed things? because trying to find out which objects exist is no good idea on s3 and similar stores
16:28:52 <weasel> waldi: yes, there's something like that IIRC.  and if not, it's easy to add
16:29:03 <waldi> either a local db on the web node or rds
16:29:04 <ln5> a journal?
16:29:08 <waldi> ln5: yes
16:29:09 <weasel> (things also never change, there just are new things)
16:30:00 <weasel> right now the way to get the database copied to another place is pg streaming replication
16:30:03 <h01ger> arent there removals from snapshot.d.o for legal reasons too?
16:30:12 <weasel> they only get marked unreadable
16:30:17 <h01ger> ah, ic
16:30:30 <ln5> the upside of this (A) is that there's one and only one view
16:30:31 <weasel> (and yes, also there's a trivial list which an s3 backend could use)
16:31:55 <weasel> I don't really know if we want the DB in amazon and azure as well, or if we just want the storage parts there.
16:32:00 <ln5> what are the downsides with keeping with only a single primary?
16:32:03 <weasel> does anyone know?
16:32:40 <waldi> yes, there should be db copies
16:32:49 <weasel> ln5: if things break, stuff is broken until somebody gets around to fixing things.
16:33:07 <weasel> waldi: ok.  can be done, makes it more challenging.
16:33:20 <weasel> is this something we want from the start?
16:33:54 <noahm> is the db involved in serving of content, or just in tracking and generating repo metadata?  (sorry, not super familiar with the architecture)
16:34:52 <weasel> the DB is what has the file structure.   the mapping of (archive, timestamp, path, filename) to <sha1 of the file>.
16:35:07 <weasel> the storage just has blobs named like their sha1
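The data model weasel describes here can be sketched in a few lines of Python. This is an illustrative model only (the real snapshot schema is a PostgreSQL database, and all names below are hypothetical): the DB maps (archive, timestamp, path) to a SHA-1, and the farm stores each unique blob once under that SHA-1.

```python
import hashlib

# Hypothetical sketch of snapshot's split between metadata and storage:
# the database maps (archive, timestamp, path) to the SHA-1 of the file
# content, while the "farm" holds blobs named after that SHA-1.
# A file imported under many names/timestamps is stored only once.

farm = {}   # sha1 hex digest -> file content (the blob store)
db = {}     # (archive, timestamp, path) -> sha1 hex digest

def import_file(archive, timestamp, path, content):
    """Record one file from a mirror run; deduplicate by content."""
    digest = hashlib.sha1(content).hexdigest()
    farm.setdefault(digest, content)           # blob stored at most once
    db[(archive, timestamp, path)] = digest    # name -> content mapping
    return digest

def lookup(archive, timestamp, path):
    """Resolve a snapshot URL back to the file content via the DB."""
    return farm[db[(archive, timestamp, path)]]
```

This also shows why the DB sits in the critical path for serving, as discussed next: the URL alone does not name the blob.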
16:35:54 <noahm> so the db is in the critical path for file access, since it needs to map URIs to blobs?
16:35:58 <noahm> How big is the db?
16:35:58 <weasel> yes
16:36:12 <ln5> noahm: depends on what the client knows -- the db is needed to map file path (in url) to file content
16:36:28 <ln5> noahm: 60-70G iirc
16:36:35 <weasel> noahm: << 100gb on disk
16:37:23 <weasel> and right now our method of "mirroring" that is postgres streaming replication (backed by wal shipping)
16:37:52 <weasel> so there's a tight sync between primary and replica(s), including version and arch constraints
16:38:14 <waldi> what might be challenging as well: high throughput requires that the web frontend only issues redirects to the storage. but the storage does not yet know the file name and content type it can tell the client
16:38:38 <weasel> (there is an older way of getting a secondary database from the days before PG had wal shipping, where we dump the metadata of each mirror run to a text file and then import it on the other side.  not sure if it has rotten)
16:38:46 <waldi> weasel: logical replication is easy in the meantime
16:38:56 <weasel> waldi: yes, also an option
16:39:04 <waldi> and does not have such limitations
16:39:15 <weasel> right
16:39:43 <weasel> in general, a web frontend really wants a local copy of the DB
16:40:13 <weasel> we don't have that right now at leaseweb (the DB we use at leaseweb.nl is actually at manda in .de), but that's because of local hw constraints.
16:40:56 <ln5> other considerations for which alternatives make sense?
16:41:10 <ln5> lucas seems to prefer starting at C, for example
16:41:24 <noahm> I wonder how hard it would be to cache the entire db in a giant nginx config file with a bunch of "location" directives.  I'm not sure that nginx would like loading 100 GB of config data, but I don't love the idea of a db in the critical path if it can be avoided.
16:42:11 <weasel> most (all?) of the dynamically created stuff can be cached for <long time>
16:42:51 <waldi> noahm: no, you don't have that amount of memory
16:42:52 <weasel> the location directives would probably be many.  #files × #mirrorruns?
16:43:02 <ln5> needs quite frequent updates though, but yes -- an append-only config file for each "client" is basically what's needed
16:43:22 <noahm> there are hosts with 100+ gb of memory
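noahm's idea of freezing the DB into static nginx configuration could look roughly like the following sketch (hypothetical paths and entries, not actual snapshot tooling): one exact-match `location` per DB entry, each returning a redirect to the checksum-named blob.

```python
# Sketch of dumping the path -> checksum mapping into static nginx
# "location" directives, so the web server can redirect to blob storage
# without consulting the database at request time. Illustrative only.

def nginx_locations(db):
    """Yield one exact-match location block per (archive, timestamp, path)."""
    for (archive, timestamp, path), sha1 in sorted(db.items()):
        yield (f"location = /archive/{archive}/{timestamp}/{path} "
               f"{{ return 302 /file/{sha1}; }}")
```

The scale concern raised above is real: at roughly #files × #mirrorruns entries (on the order of 80k runs × 1.5M files, per figures later in this log), the generated config would be far larger than the DB itself.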
16:43:37 <waldi> weasel: just to think about: it might be required to store the filename with the objects, which means the checksum is now over content and filename
16:44:08 <weasel> that would be a radical change
16:44:57 <jas4711> hi! fwiw, i am importing data into git lfs effectively creating another variant of snapshot.debian.org.  not yet sure it can scale to snapshot.debian.org sizes, but archive.d.o 2TB is no problem
16:45:01 <ln5> anyhow, let's continue design discussions in the "other" section?
16:45:10 <waldi> yes
16:45:20 <ln5> to get "updates" done
16:45:45 <ln5> jas4711: hi! please add an entry in "other" in https://pad.sigsum.org/p/2024-05-06_snapshot.do and we'll get there
16:45:47 <jas4711> offering this as an "alternative idea".  however i really hope you get current snapshot into better hosting so will not disturb :)
16:46:31 <ln5> cue next update item: i said earlier that the server for a new primary site is almost built and will be ready for DSA in ~1w
16:47:11 <weasel> \o/
16:47:17 <ln5> it's a 2x12x20TB machine very much like the one we specced earlier
16:47:44 <h01ger> \o/
16:47:55 <ln5> i will exchange ip addr/s and wg keys and whatnot with DSA later this week/early next
16:48:02 <ln5> any other updates?
16:48:37 <ln5> #topic open questions
16:49:07 <ln5> i guess some open questions from lucas report fit in this section; cf. https://lists.debian.org/debian-snapshot/2024/05/msg00000.html
16:49:29 <ln5> but we've talked about them in the previous already and will do more in next, maybe
16:49:51 <ln5> there was a question prior to the meeting about the ETA, let me find it
16:51:16 <ln5> axhn asked "does 'rough ETA for resumed imports of all archives, "before october"' still hold?"
16:51:32 <ln5> and i think it holds fine
16:51:46 <axhn> thanks
16:51:58 <ln5> more open questions?
16:52:06 <axhn> I'll keep asking the next time :)
16:52:20 <ln5> :)
16:52:34 <h01ger> the software will - for now - stay more or less the same, or?
16:52:54 <h01ger> (eg sha1 but also that it exists :)
16:53:03 <ln5> i guess that depends on who's going to own this... :)
16:53:09 <weasel> I'm more than open for anyone to change it,
16:53:27 <weasel> I don't expect to find any time to do redesigns myself.
16:53:38 <h01ger> ok, thats good enough as an answer for this for now.
16:53:41 <ln5> i mean, i'd love to fix things but don't have the time for that for a long while
16:53:42 <weasel> so whoever owns snapshot going forward gets to decide whether to redesign, change things.
16:53:47 * h01ger nods
16:54:08 <weasel> I expect replacing sha1 with sha256/512 might be high on the list of things one would want,
16:54:14 <weasel> but it's probably not the most pressing issue
16:54:38 <weasel> which is why i'm sceptical of radical changes that require changing the addressing system
16:54:43 <ln5> i'm willing to try to own snapshot but will be busy getting things running in its current incarnation, before making changes
16:55:35 <ln5> if anyone else has more time and want to rebuild the thing i can be part of supporting things i understand, but not much more atm
16:56:03 <axhn> I could check my old databases (ten years) whether we ever had a md5 collision, and I strongly doubt that. So, going away from SHA-1 should happen some day but it's not really urgent.
16:57:07 <h01ger> .oO( and nobody was ever fired for buying IBM, so why change that? )
16:57:16 <h01ger> any other open questions?
16:57:17 <weasel> h01ger: it's MS today :)
16:57:26 <h01ger> weasel: gugle
16:57:28 <ln5> more open questions? or we move further
16:57:46 <ln5> #topic other
16:58:02 <waldi> i would just say that importing into azure/aws is not useful right now. because we can't use the storage format long enough
16:58:20 <weasel> "long enough"?
16:58:38 <h01ger> weasel: -v?
16:58:46 <h01ger> i meant waldi, sorry
16:59:21 <waldi> h01ger: what i just said. for high throughput we need to do redirects to the storage. for this the storage needs to tell the client what the filename is
16:59:41 <waldi> or clients will just store files named after the checksum
16:59:49 <h01ger> ic, thx
16:59:50 <waldi> but we also can't update existing objects
17:00:28 <waldi> so import now, fixup later is also not easy
17:00:29 <h01ger> apt should query for objects with sha256 hashes indeed, and stop using filenames :)
17:00:49 <waldi> h01ger: we have a web interface, so people use it manually, or?
17:01:09 <weasel> right now the web application sends a redirect to /file/<sha1> and that redirect is then magically dealt with in varnish and apache
17:01:16 <weasel> the client never sees the redirect
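The flow weasel describes can be sketched as a tiny resolver (all names here are illustrative, not the real snapshot web application): the app maps the browsable path to `/file/<sha1>`, and that redirect is consumed internally by the varnish/apache layer, so the client only ever sees the final content.

```python
# Hedged sketch of the serving flow: the application resolves the
# browsable snapshot path to an internal /file/<sha1> target, which the
# cache/front-end layer follows without exposing it to the client.

def resolve(db, archive, timestamp, path):
    """Return (status, internal redirect target) for a snapshot URL."""
    sha1 = db.get((archive, timestamp, path))
    if sha1 is None:
        return "404 Not Found", None
    # varnish/apache restart the request on this target internally;
    # the client never observes the /file/<sha1> indirection.
    return "302 Found", f"/file/{sha1}"
```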
17:01:21 <h01ger> maybe we should give noahm some space to explain their plans
17:02:58 <noahm> would it not be possible to store the files as (for example) /pool/p/<checksum/pkg.deb  We can reference the file by its actual name in the Packages files, so apt will still work, and we've still got checksum-based deduplication.
17:03:37 <noahm> err, small typo, that was /pool/p/<checksum>/pkg.deb
17:03:55 <noahm> it would definitely be a rearchitecture of things, so not trivial.
17:04:23 <weasel> noahm: file names do not usually uniquely refer to content.  archive and time are also factors
17:04:57 <ln5> noahm: given the current architecture, would you want to host farms only or farms and the db?
17:05:01 <weasel> a given foo.deb name may not change its content, but snapshot makes no such assumptions (and I think we had such cases in the past)
17:05:23 <waldi> the archive does not make such promises
17:05:28 <noahm> right, which is why we still encode the checksum in the URI path
17:05:50 <noahm> just not as the filename itself, since apt cares about that.
17:06:22 <weasel> and then rewrite Packages files?
17:06:28 <noahm> yes
17:06:35 <weasel> snapshot doesn't do that right now
17:06:57 <weasel> it gives you a file system tree as it was on import time.  it doesn't particularly care that it's a debian archive
17:07:10 <weasel> sure, could be done, but that's a different piece of software :)
17:07:35 <weasel> (and it's probably not entirely trivial.  not all archives look exactly like the ftp.debian.org main one)
17:07:36 <noahm> right. I don't mean to suggest that this would be trivial.  But IMO it seems like it might scale better by virtue of taking the db out of the critical path.
17:07:46 <weasel> would it?
17:07:54 <weasel> the Packages file would have to be built somewhere
17:08:03 <noahm> the db would be involved in generating the packages files, but that's asynchronous.
17:08:14 <noahm> a db outage does not prevent clients from accessing the archive
17:08:19 <noahm> as it does today
17:08:52 <noahm> it also means that replica sites don't need a local db
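noahm's proposal amounts to rewriting the `Filename:` fields of the Packages index so each entry points at a checksum-addressed pool path. A minimal sketch of that rewrite, assuming simple one-line deb822 fields (the real index format has multi-line fields this ignores, and as noted below it would also invalidate the existing Release signatures):

```python
# Sketch of the /pool/p/<checksum>/pkg.deb scheme: keep the original
# filename (apt cares about it) but encode the content checksum in the
# path, then point the Packages index at the rewritten location.
# Purely illustrative; not existing snapshot tooling.

def rewrite_stanza(stanza):
    """Rewrite Filename: in one simple Packages stanza to a checksum path."""
    fields = dict(line.split(": ", 1) for line in stanza.splitlines())
    name = fields["Filename"].rsplit("/", 1)[-1]   # e.g. pkg.deb
    fields["Filename"] = f"pool/p/{fields['SHA256']}/{name}"
    return "\n".join(f"{k}: {v}" for k, v in fields.items())
```

With this layout the DB is only needed when (re)generating indexes, not when serving blobs, which is the scaling argument being made here.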
17:08:56 <olasd> my experience with serving redirects to cloud storage buckets is you have to generate a somewhat short lived access signature with your bucket key, and within that signature you tell the bucket the filename / content type that you want it to present to the client
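The pattern olasd describes can be sketched generically with an HMAC (this is NOT the real S3/Azure signing algorithm, just the shape of it, with a hypothetical secret): the redirect URL carries a short-lived signature over the blob key, an expiry, and the filename/content type the storage should present.

```python
import hashlib
import hmac
import time

# Generic sketch of short-lived signed redirects to a storage bucket.
# The signature binds the download filename and content type, so the
# bucket can serve a checksum-named blob under a human-friendly name.

SECRET = b"hypothetical-bucket-key"

def signed_redirect(sha1, filename, content_type, ttl=300):
    """Build a redirect URL with an expiring signature over its fields."""
    expires = int(time.time()) + ttl
    payload = f"{sha1}|{expires}|{filename}|{content_type}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return (f"/blob/{sha1}?expires={expires}"
            f"&filename={filename}&type={content_type}&sig={sig}")

def verify(sha1, expires, filename, content_type, sig):
    """Check the signature and expiry on the storage side."""
    payload = f"{sha1}|{expires}|{filename}|{content_type}"
    good = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(good, sig) and int(expires) > time.time()
```

As noted in the following lines, real stores differ: S3 supports overriding the response filename/content type this way, others do not.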
17:08:59 <weasel> sure, it's one option, but that's not the software we have.:)
17:09:26 <noahm> olasd: or you just make them all public.  since this is all public data anyway.
17:09:26 <weasel> a storage object also does not uniquely refer to one file
17:09:35 <waldi> olasd: s3 is able to do that, others not
17:09:36 <weasel> we have plenty of objects that are known by different names
17:10:24 <jas4711> re-generating packages files would also break pgp signatures and validation by apt so this is indeed a rather different approach and needs tooling to work
17:10:25 <noahm> yeah, and blob storage systems don't usually support a notion of symlinks.
17:11:30 <weasel> (and not all apts support http redirects)
17:11:47 <weasel> but maybe it's ok to ignore those in this day and age
17:12:13 <ln5> with 5m left of the meeting, i'd like to jump to "next meeting" and then back to jas4711 offer
17:12:48 <ln5> i propose monday june 10 at 1600 UTC for a sync like this one
17:13:28 <weasel> +1
17:14:02 <ln5> going....
17:14:30 <h01ger> works for me
17:14:33 <ln5> ... going, gone. 2024-06-10 16:00Z
17:14:35 <noahm> +1
17:14:44 <ln5> great, thanks
17:15:06 <ln5> noahm: want to explain more about what you'd like to do? with current or future software?
17:16:49 <weasel> jas4711: re git lfs.  I don't expect sheer size of the blobs to be an issue.  the question is how does it look over 80k commits of filesystem trees of 1.5M files each
17:17:15 <ln5> #agreed next meeting 2024-06-10 16:00Z
17:17:58 <waldi> jas4711: is one dated snapshot one commit, or do you have one tree with all snapshots?
17:18:32 <jas4711> weasel: i don't know yet, it is an experiment.  i'm playing with it on e.g. https://gitlab.com/debdistutils/archives/debian/ftp.debian.org
17:19:06 <noahm> ln5: the goal is to support a snapshot-like service as a scalable production service (e.g. can support an arbitrary number of Debian systems pointing apt at it) such that clients can be configured to point to admin-determined versions of the repo.  The admin can then roll forward to a new repo version in a controlled fashion to test and deploy updates.
17:19:27 <jas4711> for snapshot i would expect that you would have all files available in sub-directories debian/20240501T024440Z/, debian/20240501T024441Z/, debian/20240501T024442Z/ etc
17:20:01 <weasel> jas4711: that's not really how things could work, though, is it?
17:20:22 <weasel> jas4711: your "source" would be a debian mirror having /debian/, and you add it, then it changes, you update, etc
17:21:41 <jas4711> i think it is possible to cut in two ways: 1) one git repository with commits showing the evolution of the archives, 2) one git repository with exploded view a'la current snapshot.d.o
17:22:12 <jas4711> i'm not sure git lfs handle 90M files well. but 1.5M files is no problem
17:23:17 <jas4711> or even the 3M files of archive.debian.org, works fast on my laptop
17:23:27 <h01ger> in coordination with ln5 i intend to close the meeting in 3min, unless something super interesting comes up :)
17:23:45 <h01ger> (everyone can continue talking after the meeting, there just wont be logging)
17:23:46 <weasel> h01ger: I think closing it is fine
17:24:24 <ln5> thanks h01ger, and thanks all. i need to move away from keyboard.
17:24:50 <weasel> cheers
17:25:44 <h01ger> thank you all!
17:25:47 <h01ger> #endmeeting