Backfill command for crawler queue (catch pages missed by observer) #27
Context
Ticket #8 wires queue population via an Eloquent `Page::created` observer. That covers the happy path: every newly created `pages` row gets a corresponding `page_crawls` row inserted synchronously.

Failure mode: anything that creates a `Page` without firing the `created` event misses the queue. Examples:
- `DB::table('pages')->insert(...)`, restoring from backup, manual SQL
- `insertOrIgnore` (skips Eloquent events for performance)

A scheduled backfill command catches these orphans.
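To make the failure mode concrete, the sketch below contrasts a write that goes through the model with writes that go through the query builder. It assumes the model lives at `App\Models\Page` and that `pages` has a mass-assignable `url` column; neither is confirmed by this ticket.

```php
<?php

use App\Models\Page;                 // assumed model location
use Illuminate\Support\Facades\DB;

// Goes through the model: Eloquent fires `created`, so the observer
// from #8 inserts the matching page_crawls row.
Page::create(['url' => 'https://example.com/a']); // `url` column is an assumption

// Goes through the query builder: no model is instantiated, no `created`
// event fires, and no page_crawls row is written.
DB::table('pages')->insert(['url' => 'https://example.com/b']);

// Same story for insertOrIgnore, even when called on the model,
// because Eloquent forwards it straight to the query builder.
Page::insertOrIgnore([
    ['url' => 'https://example.com/c'],
    ['url' => 'https://example.com/d'],
]);
```

Rows `b`, `c` and `d` are the orphans the backfill is meant to pick up.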
Acceptance
- `App\Console\Commands\PopulateCrawlQueueCommand`, invoked as `crawler:populate-queue` (or similar)
- `App\Actions\PopulateCrawlQueueAction` that finds `pages` with `status=Discovered` AND no pending `page_crawls` row (`outcome IS NULL`), and inserts a `page_crawls` row for each
- `firstOrCreate` semantics or equivalent on `(page_id)` filtered by `outcome IS NULL`
- `everyFifteenMinutes()->withoutOverlapping(5)->runInBackground()` in `routes/console.php`: backfill cadence; observer handles the hot path
- `UrlService::host`), command exit codes
Notes
- `App\Services\UrlService::host()` from #8: no duplicate URL parsing
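
A minimal sketch of how the action could look, not a definitive implementation. Several names here are assumptions that this ticket does not confirm: the `App\Models\Page` and `App\Models\PageCrawl` models, a backed enum `App\Enums\PageStatus`, the `pages.url` and `page_crawls.host` columns, and `UrlService::host()` being an instance method that takes a URL string. It also assumes `page_id`, `outcome` and `host` are mass-assignable on `PageCrawl`.

```php
<?php

namespace App\Actions;

use App\Enums\PageStatus;      // hypothetical enum for pages.status
use App\Models\Page;           // assumed model location
use App\Models\PageCrawl;      // assumed model for page_crawls
use App\Services\UrlService;

class PopulateCrawlQueueAction
{
    public function __construct(private UrlService $urls)
    {
    }

    /**
     * Queue a crawl for every Discovered page that has no pending one.
     *
     * @return int number of page_crawls rows inserted by this run
     */
    public function handle(): int
    {
        $inserted = 0;

        Page::query()
            ->where('status', PageStatus::Discovered->value)
            // Anti-join: skip pages that already have a pending crawl.
            ->whereNotExists(function ($query) {
                $query->selectRaw('1')
                    ->from('page_crawls')
                    ->whereColumn('page_crawls.page_id', 'pages.id')
                    ->whereNull('page_crawls.outcome');
            })
            // chunkById pages by primary key, so inserting page_crawls rows
            // while iterating does not shift the remaining result set.
            ->chunkById(500, function ($pages) use (&$inserted) {
                foreach ($pages as $page) {
                    // Re-check per page in case the observer queued it
                    // between the select and this insert.
                    $crawl = PageCrawl::firstOrCreate(
                        ['page_id' => $page->id, 'outcome' => null],
                        // Assumed signature: host(string $url): string
                        ['host' => $this->urls->host($page->url)],
                    );

                    if ($crawl->wasRecentlyCreated) {
                        $inserted++;
                    }
                }
            });

        return $inserted;
    }
}
```

The command would then be a thin wrapper: resolve the action, call `handle()`, print the count, and return `self::SUCCESS`. With the `Schedule` facade, the entry in `routes/console.php` would read roughly `Schedule::command('crawler:populate-queue')->everyFifteenMinutes()->withoutOverlapping(5)->runInBackground();`. Note that `firstOrCreate` is a select followed by an insert, so it is not atomic on its own. If the observer and the backfill can race on the same page, a database-level guard (for example a partial unique index on `page_id` where `outcome IS NULL`, on databases that support one) would be needed to rule out duplicates.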