ResultsMotivated.com

tt-rss: great project with high potential

<2023-12-31>

1. RSS

RSS is a minimal format, so basic that on its own it doesn't meet most people's needs. What you really need to use RSS fruitfully is something that solves the problem of syncing your RSS feeds across devices (at a bare minimum). There have been numerous attempts to sell a "managed RSS feed" over the years. I have no doubt that some of these products are worthy, but I'm not here to talk about them. Rather, I'm more interested in the free software project known as tt-rss. Fundamentally, tt-rss is a viable, self-hostable solution to the "sync my RSS feeds across devices" problem.

tt-rss appeals to me because it puts valuable data in my hands. The role of broadly defined "data science" in the world seems to be expanding rapidly right now, and personally, data science is definitely eating my life. I'm noticing that people who never had to care about data science before now have to: not just the nerds, but also non-technical people. None of this is news; it has been the trend for many years.

2. Requirements

Let's start off with "what do you want?"

  • Sync my RSS feeds across devices
    • I want to read the news on my phone or my computer.
  • Be free software
    • I like to own my own tools as much as possible
    • I have ideas for ways to script my RSS experience that I want to try
  • Have good reader apps
    • No ads, no bullshit, really needs to be usable

That's what I'm hoping to get out of tt-rss.

3. Hosting it

I'm hosting tt-rss on an AWS EC2 t3.micro instance for roughly $10/month. I'm a cheapskate, so this stings a little. The cost is one of the reasons self-hosting an application is sometimes viewed as "bad." I have come to think that this school of thought is misguided. As an engineer, I have to use AWS at my job, so getting good at hosting things will pay off for me anyway. My attitude is: "Just suck it up and accept The Cloud, because The Cloud is everywhere these days, and you need it to stay competitive." If the deployment blows up, cull the instance and start over.

No time to yell at the cloud

4. Is tt-rss good?

Yes. It's good, but not perfect.

The API is obtuse and overly complicated. However, I've been able to work around my problems with it by generating the OPML files I want and then importing them.
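For anyone who wants to try the same workaround, here's a minimal sketch of generating an OPML 2.0 subscription list in JavaScript (Node.js). The input shape, an array of `{ title, xmlUrl, htmlUrl, category }` objects, is my own assumption; feeds are grouped into one folder `<outline>` per category, which is a common OPML convention. The resulting file can then be imported into tt-rss.

```javascript
// Escape the five XML special characters so titles/URLs are safe in attributes.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build an OPML 2.0 document with one folder <outline> per category.
// `feeds` is a hypothetical array of { title, xmlUrl, htmlUrl, category }.
function buildOpml(feeds, docTitle = "My feeds") {
  const byCategory = new Map();
  for (const feed of feeds) {
    const cat = feed.category || "Uncategorized";
    if (!byCategory.has(cat)) byCategory.set(cat, []);
    byCategory.get(cat).push(feed);
  }

  const lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<opml version="2.0">',
    `  <head><title>${escapeXml(docTitle)}</title></head>`,
    "  <body>",
  ];
  for (const [cat, items] of byCategory) {
    lines.push(`    <outline text="${escapeXml(cat)}" title="${escapeXml(cat)}">`);
    for (const f of items) {
      // htmlUrl is optional in OPML, so only emit it when we have one.
      const html = f.htmlUrl ? ` htmlUrl="${escapeXml(f.htmlUrl)}"` : "";
      lines.push(
        `      <outline type="rss" text="${escapeXml(f.title)}" title="${escapeXml(f.title)}"` +
          ` xmlUrl="${escapeXml(f.xmlUrl)}"${html}/>`
      );
    }
    lines.push("    </outline>");
  }
  lines.push("  </body>", "</opml>");
  return lines.join("\n") + "\n";
}

// Example: print an OPML file with a single (hypothetical) feed.
console.log(buildOpml([
  { title: "Example blog", xmlUrl: "https://example.com/feed.xml",
    htmlUrl: "https://example.com/", category: "Emacs" },
]));
```

Writing the string to a file (for example with Node's `fs.writeFileSync`) gives you something to import.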

There are attempts to rewrite or fork tt-rss, and perhaps viable alternatives exist:

I haven't felt any need to jump ship from tt-rss, so I haven't tried these projects and don't really understand them. I wouldn't rule out the possibility that there is something better out there.

5. The future of tt-rss

tt-rss is great and I'm happy it exists. Just for fun, and to inspire progress in this direction, I'd like to publish a few ideas about where it could be improved.

5.1. Some simple recommender systems

Here are some random ideas for what can be done with simple auto-tagging rules:

  • When an article mentions any of the 50 US states, tag it with the name of that state
    • Similar idea: Tag it with the name of any countries that are mentioned by name
  • When an article mentions a specific programming language, tag it with the name of that programming language
  • [ blah blah blah, insert stupid ChatGPT idea here ]

This might even be achievable with existing tt-rss features; to be honest, I haven't looked into it yet. There are thousands of ways to approach this problem, and simple rules of thumb can probably get you decent results.
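As an example of such a rule of thumb, here's a minimal sketch of keyword-based auto-tagging in JavaScript. It is not a tt-rss feature, and the `TAG_RULES` table is hypothetical (a handful of states, countries, and languages rather than all 50 states). Matching is plain case-sensitive, whole-word regex matching, so expect false positives: an article about corrosion will still get tagged `rust`.

```javascript
// Hypothetical tagging rules: tag name -> phrases that trigger it.
// A real version would list all 50 states, many countries, more languages.
const TAG_RULES = {
  maryland: ["Maryland"],
  virginia: ["Virginia"],
  texas: ["Texas"],
  france: ["France"],
  japan: ["Japan"],
  python: ["Python"],
  rust: ["Rust"],
  "emacs-lisp": ["Emacs Lisp", "Elisp"],
};

// Escape regex metacharacters so phrases are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Precompile one whole-word (\b ... \b) regex per tag.
const COMPILED = Object.entries(TAG_RULES).map(([tag, phrases]) => ({
  tag,
  re: new RegExp(`\\b(?:${phrases.map(escapeRegExp).join("|")})\\b`),
}));

// Return the sorted list of tags whose phrases appear in the title or content.
function autoTag({ title = "", content = "" }) {
  const text = `${title}\n${content}`;
  return COMPILED.filter(({ re }) => re.test(text))
    .map(({ tag }) => tag)
    .sort();
}

console.log(autoTag({
  title: "Rust adoption grows in Texas",
  content: "A Virginia startup rewrote its Python services.",
}));
// → [ 'python', 'rust', 'texas', 'virginia' ]
```

Because of the word boundaries, "Virginians" does not match `virginia`; whether that's desirable depends on the rule.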

5.2. Do what big tech does

Big tech companies like Walmart, eBay, Netflix, and Amazon all use recommendation algorithms to get consumers to spend more money. What if tt-rss had a feature like that? The use cases I'm imagining are things like:

  • Show me the good articles at the top, and bury the bad ones
  • Pick out the top 5 from last week based on some criteria.

This idea is basically equivalent to "magically know what I want and give it to me." If Emacs can do it, why not tt-rss?
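The big-tech versions rely on machine learning, but a crude approximation of "show me the good ones first" and "top 5 from last week" fits in a few lines. Here's a minimal sketch in JavaScript, built on my own assumptions rather than anything tt-rss does: each article is scored by summing hypothetical per-keyword weights (which you might derive from articles you've starred), articles older than seven days are dropped, and the rest are sorted best first.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical interest weights: positive = show me more, negative = bury it.
// A Map avoids surprises from inherited object keys like "constructor".
const WEIGHTS = new Map([
  ["emacs", 3],
  ["opensearch", 2],
  ["rss", 2],
  ["celebrity", -3],
]);

// Score an article by summing the weights of the words in its title and content.
function score(article, weights = WEIGHTS) {
  const text = `${article.title} ${article.content ?? ""}`.toLowerCase();
  const words = text.match(/[a-z0-9]+/g) ?? [];
  return words.reduce((sum, w) => sum + (weights.get(w) ?? 0), 0);
}

// Return the top `n` articles published within the last `days` days, best first.
function topArticles(articles, { n = 5, days = 7, now = Date.now(), weights = WEIGHTS } = {}) {
  const cutoff = now - days * DAY_MS;
  return articles
    .filter((a) => a.published.getTime() >= cutoff)
    .map((a) => ({ ...a, score: score(a, weights) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}

// Example with made-up articles and a fixed "now" so the result is reproducible.
const now = Date.parse("2024-01-08T00:00:00Z");
const sample = [
  { title: "Emacs RSS reader tips", published: new Date("2024-01-05T00:00:00Z") },
  { title: "Celebrity gossip roundup", published: new Date("2024-01-06T00:00:00Z") },
  { title: "OpenSearch for Emacs users", published: new Date("2023-12-01T00:00:00Z") },
];
console.log(topArticles(sample, { n: 2, now }).map((a) => a.title));
// → [ 'Emacs RSS reader tips', 'Celebrity gossip roundup' ]
```

The third article would score highest, but it falls outside the seven-day window, which is the "from last week" part of the use case.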

5.3. Ingest the RSS feed output by tt-rss into OpenSearch

tt-rss itself exposes an RSS feed that is basically the aggregate of all your curated RSS feeds. Presumably it exists mainly so that dumber RSS readers can subscribe to it. Having used OpenSearch extensively, I know that with a little bit of plumbing it'd be possible to load that bad boy into OpenSearch and then use OpenSearch to exploit that data set. This sounds like fun to me, so someone please write a blog post showing me how to do this. (I'll write one if I get to it first.)
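To give a sense of the plumbing, here's a minimal sketch in JavaScript (Node.js 18+, for the built-in `fetch`). It assumes the tt-rss feed has already been parsed into objects with `link`, `title`, `updated`, and `content` fields (XML parsing is left out), and that a local test OpenSearch node accepts unauthenticated requests at the hypothetical `http://localhost:9200`, with a hypothetical index named `tt-rss`. OpenSearch's `_bulk` API expects newline-delimited JSON: an action line, then the document, with a trailing newline at the end.

```javascript
// Build an OpenSearch _bulk request body (NDJSON): one action line plus one
// document line per article, ending with the trailing newline _bulk requires.
function buildBulkBody(articles, index = "tt-rss") {
  const lines = [];
  for (const a of articles) {
    // Using the article URL as _id means re-ingesting overwrites, not duplicates.
    lines.push(JSON.stringify({ index: { _index: index, _id: a.link } }));
    lines.push(JSON.stringify({
      title: a.title,
      link: a.link,
      updated: a.updated,
      content: a.content,
    }));
  }
  return lines.join("\n") + "\n";
}

// Send the articles to a (hypothetical) local, unauthenticated OpenSearch node.
async function ingest(articles, baseUrl = "http://localhost:9200") {
  const res = await fetch(`${baseUrl}/_bulk`, {
    method: "POST",
    headers: { "Content-Type": "application/x-ndjson" },
    body: buildBulkBody(articles),
  });
  const result = await res.json();
  // _bulk can return HTTP 200 while individual items fail, so check `errors` too.
  if (!res.ok || result.errors) {
    throw new Error(`bulk ingest failed (HTTP ${res.status})`);
  }
  return result.items.length;
}
```

Running `ingest` periodically (say, from cron) over freshly parsed items would keep the index roughly in sync with tt-rss.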

5.4. Improve the discoverability of RSS feeds

The list of RSS feeds I subscribe to looks like this:

  • 5000 random Emacs blogs, most of them not updated in over a year
  • 15 "real" news outlets
  • 3 organizations publishing press releases

There is a need for a better way to discover new RSS feeds that are worth subscribing to.

6. Scraping feedly

One option for finding RSS feeds is to scrape https://feedly.com/i/all. I tried this and found that Feedly has measures to thwart scraping. With persistence, however, it is still possible to get a reasonable chunk of the data without paying. Here's a JavaScript one-liner, to run in the browser console on pages such as this one, to get you started:

console.log(document.location.href+"\n"+"#+begin_example\n"+[...document.querySelectorAll(".DiscoverFeed__metadata a")].map(x => x.href).join("\n")+"\n#+end_example")

The output looks like:

https://dcist.com/
https://www.popville.com/
https://ggwash.org/
https://www.washingtonpost.com/
https://www.washingtonian.com/
https://dc.eater.com/
https://www.arlnow.com/
https://thehillishome.com/
https://dc.urbanturf.com/
https://ghostsofdc.org/
http://pqliving.com/
https://www.bizjournals.com/
http://bloomingdaleneighborhood.blogspot.com/
https://www.petworthnews.org/
https://www.nbcwashington.com/news/local/
http://georgetownmetropolitan.com/
https://wamu.org/
http://www.streetsofwashington.com/
https://wtop.com/
https://capitolhillcorner.org/
https://barredindc.com/
https://northernvirginiamag.com/
https://thedcline.org/
https://www.sourceofthespring.com/
http://clarendonnights.blogspot.com/
https://www.kidfriendlydc.com/
https://www.huffpost.com/news/topic/washington-dc
https://www.titanoftrinidad.com/
https://washingtoncitypaper.com/
https://districtdig.com/
https://dcbeer.com/
https://www.alxnow.com/
https://www.washingtontimes.com/news/local/?utm_medium=RSS
https://www.reddit.com/r/washingtondc/new
https://exposedbrickdc.com/
https://www.foresthillsconnection.com/
https://www.wusa9.com/
https://moco360.media/
http://robertdyer.blogspot.com/
https://columbiaheightsinsider.com/
https://www.washingtonexaminer.com/tag/dc
http://thesouthwester.com/
https://mocoshow.com/
https://www.tysonsreporter.com/
https://www.hyattsvillewire.com/
https://www.fcnp.com/
https://www.hillrag.com/
https://boundarystones.weta.org/
https://washingtonlife.com/
https://www.ffxnow.com/
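Most of these are site homepages rather than feed URLs. One way to turn them into something subscribable is RSS/Atom autodiscovery: many sites advertise their feed with a `<link rel="alternate" type="application/rss+xml" href="...">` tag in the page's `<head>`. Here's a minimal sketch in JavaScript (Node.js 18+, for `fetch` and `URL`) that fetches a homepage and extracts those links. The regex-based tag matching is a crude stand-in for a real HTML parser and assumes quoted attribute values; sites that don't advertise a feed simply yield an empty list.

```javascript
const FEED_TYPES = new Set(["application/rss+xml", "application/atom+xml"]);

// Parse key="value" / key='value' attributes out of a single tag string.
function parseAttrs(tag) {
  const attrs = {};
  for (const m of tag.matchAll(/([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g)) {
    attrs[m[1].toLowerCase()] = m[2] ?? m[3];
  }
  return attrs;
}

// Return absolute feed URLs advertised via <link rel="alternate"> in `html`.
function findFeedLinks(html, pageUrl) {
  const feeds = [];
  for (const [tag] of html.matchAll(/<link\b[^>]*>/gi)) {
    const a = parseAttrs(tag);
    const rels = (a.rel ?? "").toLowerCase().split(/\s+/);
    const type = (a.type ?? "").toLowerCase();
    if (rels.includes("alternate") && FEED_TYPES.has(type) && a.href) {
      try {
        // Resolve relative hrefs like "/feed/" against the page URL.
        feeds.push(new URL(a.href, pageUrl).href);
      } catch {
        // Skip malformed hrefs rather than failing the whole page.
      }
    }
  }
  return [...new Set(feeds)]; // de-duplicate, preserving order
}

// Fetch a homepage and discover its advertised feeds.
async function discoverFeeds(pageUrl) {
  const res = await fetch(pageUrl);
  if (!res.ok) return [];
  // res.url is the final URL after redirects, which relative hrefs resolve against.
  return findFeedLinks(await res.text(), res.url || pageUrl);
}
```

Looping `discoverFeeds` over the scraped list (politely, with some delay between requests) gives candidate feed URLs to subscribe to.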

Keywords: rss

Modified: 2024-09-10 10:09:59 EDT

Emacs 29.1.50 (Org mode 9.7.6)