myrmidex
  • Joined on 2026-02-08
myrmidex pushed to release/0.1.0 at lvl0/trove 2026-04-27 23:54:11 +02:00
cda1414cd8 9 - Add robots.txt handling with cache and politeness integration
myrmidex closed issue lvl0/trove#9 2026-04-27 23:53:57 +02:00
Crawler: robots.txt handling
myrmidex pushed to release/0.1.0 at lvl0/trove 2026-04-27 01:39:35 +02:00
264180cd36 chore - Move outcome → status mapping into CrawlOutcomeEnum methods
1538ceeb6e 11 - Gate ProcessCrawlJob with per-domain politeness lock
7171348370 11 - Add PolitenessService and crawler delay config
69aa5d9d3e 10 - Add /bot page with crawler identity and opt-out instructions
c80be24e6e chore - Extract mockFetchPageAction helper in ProcessCrawlJobTest
myrmidex closed issue lvl0/trove#11 2026-04-27 01:26:04 +02:00
Crawler: Per-domain politeness
myrmidex closed issue lvl0/trove#10 2026-04-27 00:41:29 +02:00
Crawler: User agent and /bot page
myrmidex closed issue lvl0/trove#14 2026-04-27 00:18:27 +02:00
Crawler: Queue worker
myrmidex closed issue lvl0/trove#12 2026-04-26 19:49:18 +02:00
Crawler: HTTP fetcher and content extraction
myrmidex opened issue lvl0/trove#29 2026-04-26 16:28:26 +02:00
Build first-party HTML content extractor (replace fivefilters/readability.php)
myrmidex opened issue lvl0/trove#28 2026-04-26 16:24:05 +02:00
URL-pattern pre-filter: skip non-HTML extensions before page row creation
myrmidex closed issue lvl0/trove#8 2026-04-26 16:09:22 +02:00
Crawler: Queue population
myrmidex opened issue lvl0/trove#27 2026-04-26 14:31:42 +02:00
Backfill command for crawler queue (catch pages missed by observer)
myrmidex closed issue lvl0/trove#7 2026-04-26 14:25:25 +02:00
Crawler Data Model
myrmidex closed issue lvl0/trove#5 2026-04-26 11:59:11 +02:00
URL submission form
myrmidex opened issue lvl0/trove#26 2026-04-26 11:53:40 +02:00
Rotate Livewire release_token per deploy for cache-busting
myrmidex opened issue lvl0/trove#25 2026-04-26 11:48:23 +02:00
URL submission: rate-limit message UI polish
myrmidex opened issue lvl0/trove#24 2026-04-26 11:48:02 +02:00
URL submission: tighten validation (max length, reject loopback/private IPs)
myrmidex opened issue lvl0/trove#23 2026-04-26 11:47:44 +02:00
URL normalization on pages.url (strip tracking params, canonicalize)
myrmidex opened issue lvl0/trove#22 2026-04-26 03:53:54 +02:00
Test environment hardening: APP_KEY override and Postgres test runs
myrmidex opened issue lvl0/trove#21 2026-04-26 03:53:43 +02:00
page_links FK on-delete behavior decision
myrmidex opened issue lvl0/trove#20 2026-04-26 03:53:35 +02:00
UrlDiscoveredListener: add tries and failed() handler