URL submission: tighten validation (max length, reject loopback/private IPs) #24

Closed
opened 2026-04-26 11:48:02 +02:00 by myrmidex · 0 comments

Context

App\Livewire\UrlSubmissionForm::submit() validates with ['required', 'url:http,https']. Two gaps:

  1. No max length — an attacker can submit a 64KB URL and we'll happily try to write it. Affects DB row size + index size on pages.url (which is text, no inherent cap).
  2. Loopback / private IPs accepted: http://127.0.0.1, http://192.168.x.x, http://10.x.x.x, http://[::1] all pass url:http,https. Inert in v0.1 (we don't fetch yet), but becomes an SSRF hole once the crawler lands (ticket #12).

Acceptance

  • Add max:2048 to the URL validation rule (covers 99.9% of legit URLs; defends against pathological inputs)
  • Custom validation rule (or closure) rejecting:
    • IPv4 loopback, private, and link-local ranges: 127.0.0.0/8 (loopback), 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (RFC 1918 private), 169.254.0.0/16 (link-local)
    • IPv6 loopback (::1), link-local (fe80::/10), unique-local (fc00::/7)
    • localhost hostname literal
  • Same rule reused at UrlDiscoveredListener write site (extracted URLs from fediverse posts could also point at private addresses)
  • Tests covering each rejection case, plus a positive test that public IP literals (https://1.1.1.1) are still accepted (judgment call: probably yes for v0.1, can revisit)
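The acceptance criteria above can be sketched as a single reusable check. This is a minimal, language-neutral sketch in Python using only urllib.parse and ipaddress. The app itself is Laravel, where the same logic would live in a custom validation rule.

The following names are hypothetical:
- MAX_URL_LENGTH
- rejection_reason
- the reason strings

Two small additions beyond the ticket's list are assumptions:
- "localhost." (trailing dot) is treated like localhost.
- IPv4-mapped IPv6 literals such as ::ffff:127.0.0.1 are unmapped before the range check.

```python
from ipaddress import ip_address, ip_network
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical constant mirroring the ticket's max:2048 rule.
MAX_URL_LENGTH = 2048

# The ranges listed in the ticket's acceptance criteria.
BLOCKED_NETWORKS = [
    ip_network(cidr)
    for cidr in (
        "127.0.0.0/8",     # IPv4 loopback
        "10.0.0.0/8",      # RFC 1918 private
        "172.16.0.0/12",   # RFC 1918 private
        "192.168.0.0/16",  # RFC 1918 private
        "169.254.0.0/16",  # IPv4 link-local
        "::1/128",         # IPv6 loopback
        "fe80::/10",       # IPv6 link-local
        "fc00::/7",        # IPv6 unique-local
    )
]


def rejection_reason(url: str) -> Optional[str]:
    """Return a short reason why `url` should be rejected, or None if it passes."""
    if len(url) > MAX_URL_LENGTH:
        return "too long"
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased, IPv6 brackets and userinfo stripped
    except ValueError:  # e.g. an unbalanced "[" in the host
        return "unparseable"
    if parts.scheme not in ("http", "https") or not host:
        return "not an http(s) URL"
    # Assumption: treat "localhost." (trailing dot) like the bare literal.
    if host.rstrip(".") == "localhost":
        return "localhost"
    try:
        addr = ip_address(host)
    except ValueError:
        return None  # not an IP literal Python recognises: left to the fetch-time guard
    # Assumption (not in the ticket): unmap ::ffff:a.b.c.d to a.b.c.d.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    if any(addr in net for net in BLOCKED_NETWORKS):
        return "private or loopback address"
    return None
```

Returning a reason instead of a bare boolean has two benefits. The submission form and the UrlDiscoveredListener write site can share one function and still show a specific error. Each rejection case also maps to one test assertion. Hostnames that are not IP literals pass here by design, because only a fetch-time check can see what they resolve to.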

Notes

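  • The crawler will need its OWN guard at fetch time: a hostname that passes this check can still resolve to a private address when the crawler fetches it (e.g. via DNS rebinding). This validation is the first line of defense, not the only one.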
  • Consider using spatie/url or a similar lib rather than rolling host parsing by hand — parse_url is famously misleading.
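The fetch-time guard from the first note can be built as follows. This is one common approach:
1. Resolve the host once.
2. Reject the URL if any resolved address falls in a blocked range.
3. Have the crawler connect to a vetted IP instead of resolving the name a second time.

Below is a minimal sketch in Python, assuming the standard-library resolver (socket.getaddrinfo). The names resolve_public_addresses and BlockedAddressError are hypothetical, and the range list repeats the ticket's. Rejecting when any address is blocked, rather than only the one used, is a deliberately strict choice.

Two parts are not shown:
- Wiring the pinned IP into the crawler's HTTP client, while keeping the original hostname for the Host header and TLS SNI. This depends on the client.
- Checking redirects, which would need the same check on every hop.

```python
import socket
from ipaddress import ip_address, ip_network

# Same ranges as the submission-time rule (repeated so this block stands alone).
BLOCKED_NETWORKS = [ip_network(c) for c in (
    "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "169.254.0.0/16", "::1/128", "fe80::/10", "fc00::/7",
)]


class BlockedAddressError(Exception):
    """Raised when a host resolves to a loopback, private, or link-local address."""


def resolve_public_addresses(host: str, port: int) -> list[str]:
    """Resolve `host` once; return every address, or raise if any is blocked.

    The crawler would then connect to one of these IPs directly instead of
    resolving `host` again, so a rebinding DNS server cannot hand out a
    private address between this check and the connection.
    socket.gaierror propagates if the name does not resolve.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        # sockaddr[0] is the IP string; drop any IPv6 zone suffix like "%eth0".
        addr = ip_address(sockaddr[0].split("%", 1)[0])
        if addr.version == 6 and addr.ipv4_mapped is not None:
            addr = addr.ipv4_mapped  # assumption: treat ::ffff:a.b.c.d as IPv4
        if any(addr in net for net in BLOCKED_NETWORKS):
            raise BlockedAddressError(f"{host} resolves to blocked address {addr}")
        addresses.append(str(addr))
    return addresses
```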
myrmidex added this to the v0.2 milestone 2026-04-26 11:48:02 +02:00
myrmidex self-assigned this 2026-04-26 11:48:02 +02:00
myrmidex added the enhancement label 2026-05-01 01:01:59 +02:00
Reference: lvl0/trove#24