Write posted_at from fediverse post to pages row #33

Closed
opened 2026-04-29 23:56:30 +02:00 by myrmidex · 0 comments
Owner

Context

FediversePost::publishedAt carries the timestamp of the fediverse post that surfaced the URL. This is a freshness signal — a URL shared on Mastodon yesterday is more interesting than one shared three years ago.

Currently, UrlDiscoveredListener calls RegisterDiscoveredPageAction, which does a firstOrCreate on pages but never writes posted_at. The column exists in the schema but is always null.

Acceptance criteria

  • The UrlDiscovered event already carries discoveredAt (a CarbonImmutable). Use this as posted_at on the pages row, and set it on INSERT only (do not overwrite it if the page was previously discovered via a different post)
  • RegisterDiscoveredPageAction (or UrlDiscoveredListener) updated to pass posted_at as a create-only attribute in firstOrCreate
  • Tests confirming: posted_at is set on first discovery; re-discovery from a different instance does NOT overwrite posted_at
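
The create-only behaviour described in the criteria above can be illustrated with a small sketch. This is not the project's Laravel code: it uses Python with an in-memory SQLite table, and the `register_discovered_page` function and the two-column `pages` table are hypothetical stand-ins. It mimics what passing posted_at in the second (create-only) argument of Eloquent's `firstOrCreate` should achieve: the value is written when the row is inserted and ignored when the row already exists.

```python
import sqlite3
from datetime import datetime, timezone


def register_discovered_page(conn, url, discovered_at):
    """Insert the page if it is new; leave an existing row untouched.

    Hypothetical stand-in for RegisterDiscoveredPageAction.
    """
    # INSERT OR IGNORE only writes when no row with this url exists,
    # so posted_at is set on first discovery and never overwritten.
    conn.execute(
        "INSERT OR IGNORE INTO pages (url, posted_at) VALUES (?, ?)",
        (url, discovered_at.isoformat()),
    )
    row = conn.execute(
        "SELECT posted_at FROM pages WHERE url = ?", (url,)
    ).fetchone()
    return row[0]


# Assumed minimal schema: the real pages table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, posted_at TEXT)")

first = datetime(2026, 4, 1, 12, 0, tzinfo=timezone.utc)
later = datetime(2026, 4, 29, 21, 56, tzinfo=timezone.utc)

# First discovery sets posted_at; re-discovery must not change it.
posted_first = register_discovered_page(conn, "https://example.org/a", first)
posted_again = register_discovered_page(conn, "https://example.org/a", later)
```

The two calls correspond to the two test cases in the criteria: the first discovery stores the timestamp, and the second discovery of the same URL returns the original value unchanged.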

Notes

  • discoveredAt on UrlDiscovered ≈ the poll time, not the original post time. FediversePost::publishedAt is the actual post timestamp but it's not currently threaded through to the event. Decide at implementation: use discoveredAt (simpler, already available) or thread publishedAt through (more accurate). Document the decision.
  • posted_at will later inform ranking (fresher = higher) and re-crawl priority.
myrmidex added this to the v0.2 milestone 2026-04-29 23:56:30 +02:00
myrmidex self-assigned this 2026-04-29 23:56:30 +02:00
Reference: lvl0/trove#33