Crawler: User agent and /bot page #10
Context
Original text was lightweight and accurate. This rewrite just locks in the design choices.
The crawler currently sends User-Agent: TroveBot/0.1 (+https://trove.lvl0.xyz/bot) (set during #12 as a placeholder), but that URL 404s. This ticket makes the URL real and ensures the UA reaches every outbound HTTP request.
Locked-in decisions
- TroveBot/0.1 (+https://trove.lvl0.xyz/bot). No Mozilla/5.0 (compatible; ...) prefix: we don't pretend to be a browser. Honest identity.
- /bot page tech: plain Blade view via Route::view('/bot', 'bot'). Uses the existing <x-layout> component. No Livewire (no interactivity needed). No controller.
- User-agent: TroveBot rules in robots.txt. The worker doesn't get pointed at arbitrary domains until #9 (robots.txt handling) lands, so the claim is true as of the actual production rollout. Accept that v0.1 is dev-only at this stage.
- https://forge.lvl0.xyz/lvl0/trove/issues. No email: avoids harvesting, transparent process.
Page content
- robots.txt (under User-agent: TroveBot), per-domain rate limit (TBD per #11), follows ≤5 redirects, fetches only HTML, ignores non-HTML responses
- https://forge.lvl0.xyz/lvl0/trove/issues
- https://forge.lvl0.xyz/lvl0/trove
Acceptance
- Route::view('/bot', 'bot') registered in routes/web.php
- resources/views/bot.blade.php: uses <x-layout>, contains all content sections above
- config('crawler.user_agent') is the final v0.1 string (already is: 'TroveBot/0.1 (+https://trove.lvl0.xyz/bot)')
- GET /bot returns 200, contains the UA string and the robots.txt opt-out example
- FetchPageAction includes the User-Agent header (already covered? verify; if not, add an explicit assertion)
- /bot route and the bot's public contract
Out of scope
- /bot page can stay vague about exact rate; reference "polite, configurable" or similar
- / still serves Laravel default welcome until search UI lands
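For reference, a minimal sketch of the robots.txt opt-out example that the /bot page is expected to show. The ticket does not spell out the exact snippet, so the Disallow: / rule (blocking the whole site for this bot) is an assumption; only the User-agent: TroveBot token comes from the decisions above.

```text
# Assumed example: opts the whole site out of TroveBot crawling.
# "TroveBot" is the product token from the UA string; the path "/" is illustrative.
User-agent: TroveBot
Disallow: /
```

A site owner who wants to exclude only part of a site would replace / with a narrower path prefix, for example Disallow: /private/.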