URL submission: tighten validation (max length, reject loopback/private IPs) #24

Closed

opened 2026-04-26 11:48:02 +02:00 by myrmidex · 0 comments

Context

App\Livewire\UrlSubmissionForm::submit() validates with ['required', 'url:http,https']. Two gaps:

1. No max length: an attacker can submit a 64 KB URL and we will try to store it. That inflates both row size and index size on pages.url (a text column with no inherent cap).
2. Loopback / private IPs accepted: http://127.0.0.1, http://192.168.x.x, http://10.x.x.x, and http://[::1] all pass url:http,https. This is inert in v0.1 because we don't fetch yet, but it becomes an SSRF vector once the crawler lands (ticket #12).

Acceptance

- [ ] Add max:2048 to the URL validation rule (covers virtually all legitimate URLs; defends against pathological inputs)
- [ ] Custom validation rule (or closure) rejecting:
  - IPv4 loopback (127.0.0.0/8), private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), and link-local (169.254.0.0/16)
  - IPv6 loopback (::1), link-local (fe80::/10), unique-local (fc00::/7)
  - the localhost hostname literal
- [ ] Same rule reused at the UrlDiscoveredListener write site (URLs extracted from fediverse posts could also point at private addresses)
- [ ] Tests covering each rejection case, plus a positive test that public IP literals (https://1.1.1.1) are still accepted (judgment call: probably yes for v0.1, can revisit)
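
As one possible shape for the host check above, here is a minimal plain-PHP sketch. The names BLOCKED_CIDRS, ipInCidr, and isBlockedHost are hypothetical and not taken from the codebase. The function expects a host that has already been extracted from the URL, so the choice of URL parser stays open. It only inspects IP literals and the localhost name; it does not resolve DNS, and it does not try to catch alternative IPv4 spellings such as decimal or hex integers.

```php
<?php
declare(strict_types=1);

// Ranges listed in the acceptance criteria (hypothetical constant name).
const BLOCKED_CIDRS = [
    '127.0.0.0/8',    // IPv4 loopback
    '10.0.0.0/8',     // IPv4 private
    '172.16.0.0/12',  // IPv4 private
    '192.168.0.0/16', // IPv4 private
    '169.254.0.0/16', // IPv4 link-local
    '::1/128',        // IPv6 loopback
    'fe80::/10',      // IPv6 link-local
    'fc00::/7',       // IPv6 unique-local
];

/** True when the IP literal $ip falls inside $cidr (same address family only). */
function ipInCidr(string $ip, string $cidr): bool
{
    [$subnet, $bits] = explode('/', $cidr);
    $ipBin = inet_pton($ip);
    $subnetBin = inet_pton($subnet);
    if ($ipBin === false || $subnetBin === false || strlen($ipBin) !== strlen($subnetBin)) {
        return false; // invalid input, or IPv4 compared against IPv6
    }

    $bits = (int) $bits;
    $fullBytes = intdiv($bits, 8);
    $remainder = $bits % 8;

    // Compare the whole bytes covered by the prefix first.
    if (substr($ipBin, 0, $fullBytes) !== substr($subnetBin, 0, $fullBytes)) {
        return false;
    }
    if ($remainder === 0) {
        return true;
    }

    // Then compare the leading bits of the next, partially covered byte.
    $mask = (0xFF << (8 - $remainder)) & 0xFF;
    return (ord($ipBin[$fullBytes]) & $mask) === (ord($subnetBin[$fullBytes]) & $mask);
}

/** True when $host is "localhost" or an IP literal in one of the blocked ranges. */
function isBlockedHost(string $host): bool
{
    // Normalise: drop IPv6 brackets, a trailing dot, and case differences.
    $host = rtrim(strtolower(trim($host, '[]')), '.');

    if ($host === 'localhost') {
        return true;
    }
    if (filter_var($host, FILTER_VALIDATE_IP) === false) {
        return false; // ordinary hostname: out of scope for this check
    }
    foreach (BLOCKED_CIDRS as $cidr) {
        if (ipInCidr($host, $cidr)) {
            return true;
        }
    }
    return false;
}
```

In the Livewire component this could be wrapped in a rule object or closure and appended to the existing rules next to max:2048. The same callable could then be shared with UrlDiscoveredListener, so both write sites apply one definition of a blocked host.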

Notes

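- The crawler will need its own guard at fetch time, because a hostname that passes this check can still resolve to a private address later (DNS rebinding). This validation is the first line of defense, not the only one.
- Consider using spatie/url or a similar library rather than parsing hosts by hand: parse_url does not validate its input and can return misleading results for malformed URLs.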
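
The fetch-time guard from the first note could look roughly like the sketch below. It is an illustration under assumptions, not the crawler's design: vetResolvedAddresses is a hypothetical name, and both the resolver and the blocked-address predicate are injected so the logic can be tested without real DNS. The idea is to resolve once, reject the fetch if any resolved address is blocked, and hand back the vetted addresses so the caller connects to one of them instead of resolving a second time.

```php
<?php
declare(strict_types=1);

/**
 * Resolve $host once and refuse it if any resolved address is blocked.
 *
 * @param callable(string): array $resolve   returns the IP strings for a host,
 *                                           e.g. fn ($h) => gethostbynamel($h) ?: []
 * @param callable(string): bool  $isBlocked returns true for addresses that must not be fetched
 * @return array the vetted addresses, to be used for the actual connection
 */
function vetResolvedAddresses(string $host, callable $resolve, callable $isBlocked): array
{
    $addresses = $resolve($host);
    if ($addresses === []) {
        throw new RuntimeException("Could not resolve {$host}");
    }

    foreach ($addresses as $address) {
        if ($isBlocked($address)) {
            // One bad record is enough: the client could pick any of them.
            throw new RuntimeException("{$host} resolves to blocked address {$address}");
        }
    }

    return $addresses;
}
```

The important design point is that the address that was checked is the address that gets connected to. If the HTTP client resolves the hostname again on its own, a rebinding attacker gets a second chance to answer with a private address.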
