Add The Guardian as an RSS feed #37

Closed
opened 2025-07-17 08:06:25 +02:00 by myrmidex · 0 comments
myrmidex commented 2025-07-17 08:06:25 +02:00 (Migrated from codeberg.org)

Summary

Add The Guardian as a new feed provider using RSS parsing.

Tasks

  • Implement RSS feed parsing in ArticleFetcher::getArticlesFromRssFeed() (currently a TODO that returns an empty collection)
  • Create Guardian parser classes (HomepageParserAdapter, ArticleParser, ArticlePageParser) following the existing VRT/Belga patterns
  • Add a guardian provider config in config/feed.php with the RSS feed URL(s) (e.g. https://www.theguardian.com/international/rss, https://www.theguardian.com/world/rss)
  • Update StoreFeedRequest validation to allow guardian as a provider
  • Register the Guardian parser in ArticleParserFactory
  • Fix the Belga provider config: change type from 'rss' to 'website' (otherwise Belga breaks once RSS fetching is implemented)
  • Write tests
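For the first task, here is a minimal sketch of the RSS-parsing step in plain PHP using SimpleXML. It assumes the Guardian feeds are standard RSS 2.0, with `channel`/`item` elements carrying `title`, `link` and `pubDate`. The function name `parseRssItems` and the array shape it returns are hypothetical and not part of the existing codebase. Fetching the feed over HTTP, wrapping the result in a collection inside `ArticleFetcher::getArticlesFromRssFeed()`, and mapping it to whatever the VRT/Belga parsers expect are all left to the real implementation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the existing codebase): turns an RSS 2.0
 * document into a list of ['title', 'url', 'published_at'] arrays.
 * Returns an empty list when the XML is malformed or has no <channel>.
 */
function parseRssItems(string $xml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    // LIBXML_NOCDATA merges CDATA sections (common in feed titles) into plain text.
    $feed = simplexml_load_string($xml, SimpleXMLElement::class, LIBXML_NOCDATA);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($feed === false || !isset($feed->channel)) {
        return [];
    }

    $articles = [];
    foreach ($feed->channel->item as $item) {
        $url = trim((string) $item->link);
        if ($url === '') {
            // Without a link there is no article page to fetch, so skip the item.
            continue;
        }

        // RSS 2.0 pubDate uses RFC 822 dates, e.g. "Thu, 17 Jul 2025 06:06:25 GMT".
        $publishedAt = null;
        $pubDate = trim((string) $item->pubDate);
        if ($pubDate !== '') {
            try {
                $publishedAt = new DateTimeImmutable($pubDate);
            } catch (Exception $e) {
                $publishedAt = null; // Unparseable date: keep the item, drop the date.
            }
        }

        $articles[] = [
            'title' => trim((string) $item->title),
            'url' => $url,
            'published_at' => $publishedAt,
        ];
    }

    return $articles;
}
```

Skipping link-less items and tolerating bad dates keeps one malformed entry from discarding the whole feed. Whether the real fetcher should instead log or reject such items is an open design choice.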

Notes

  • The Guardian offers official RSS feeds, which are more reliable than HTML scraping
  • Belga is currently configured as type: 'rss' but actually uses website-style parsing; this must be fixed before RSS fetching goes live, or Belga will break
myrmidex changed title from Add The Guardian as a feed to Add The Guardian as an RSS feed 2026-03-08 10:12:55 +01:00
myrmidex self-assigned this 2026-03-08 11:03:53 +01:00
No milestone
No project
No assignees
1 participant

No due date set.

Dependencies

No dependencies set.

Reference: lvl0/fedi-feed-router#37