URL Discovery and Storage #4

Closed
opened 2026-04-23 01:32:53 +02:00 by myrmidex · 0 comments
Owner
  • Migration: discovered_urls (url, source, source_instance, source_post_id, engagement_count, first_seen_at, provenance json)
  • DiscoveredUrl model
  • Extract URLs from Mastodon Note content (HTML parsing)
  • Extract URLs from Lemmy Page objects (url + body)
  • URL normalization (strip tracking params, lowercase host)
  • Deduplicate within batch
  • Queue job ProcessFediverseBatch for extraction + storage
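
The extraction, normalization, and in-batch deduplication tasks above could be sketched as follows. This is a minimal illustration in Python using only the standard library, not the project's implementation. The function names (`extract_urls`, `normalize_url`, `dedupe_batch`) and the list of tracking parameters are assumptions, since the issue does not specify which parameters to strip. Mention and hashtag links inside Mastodon Note HTML are also anchors, and this sketch does not filter them out.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the issue does not say which ones to strip.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


class _HrefCollector(HTMLParser):
    """Collects href values from <a> tags in a Note's HTML content."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_urls(html):
    """Return http(s) links found in an HTML fragment, in document order."""
    collector = _HrefCollector()
    collector.feed(html)
    collector.close()
    return [h for h in collector.hrefs if urlsplit(h).scheme in ("http", "https")]


def normalize_url(url):
    """Lowercase the scheme and host, and drop assumed tracking parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_NAMES
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # Only the host[:port] part is lowercased; userinfo, if any, is left alone.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, urlencode(kept), parts.fragment)
    )


def dedupe_batch(urls):
    """Normalize each URL and keep the first occurrence, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    note_html = (
        '<p>Read <a href="https://Example.COM/post?id=7&amp;utm_source=x">this</a> '
        'and <a href="https://example.com/post?id=7">this again</a></p>'
    )
    print(dedupe_batch(extract_urls(note_html)))
```

Deduplicating on the normalized form means two posts linking to the same page with different tracking parameters collapse to a single entry within the batch. How that interacts with `engagement_count` and `provenance` across batches is left open here, as the issue does not describe it.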
myrmidex added this to the v0.1 milestone 2026-04-23 01:32:53 +02:00
myrmidex added the enhancement label 2026-04-26 01:28:09 +02:00
Reference: lvl0/trove#4