This post was submitted on 07 Sep 2021.

Open Source

Hi there, I'm looking for an alternative to news.google.com that simply isn't a Google product. I know it's not open source per se, but I'm just curious.

[email protected] 6 points 3 years ago

On my phone I use Feeder (Android; not sure if it's on iOS). On my computer I use newspaper3k (https://newspaper.readthedocs.io/en/latest/). I built some additional summary and NLTK tools on top of it that let me find articles on a similar topic from sources with different biases, plus some named entity extraction that links easily into DBpedia. I intend to contribute the additional features I've added but haven't done so yet because the code is rough.
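The comment doesn't say how the similar-topic matching works. As one possible approach, assuming each article already carries a keyword set (for example newspaper3k's `Article.keywords` after calling `article.nlp()`) and that you tag each source with a bias label yourself, a plain Jaccard overlap on keywords can surface coverage of the same story from differently leaning sources. The `Article` record, `similar_other_bias`, the bias labels and the 0.3 threshold are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical record; keywords could come from newspaper3k's Article.keywords.
    url: str
    source: str
    bias: str  # hypothetical label you assign per source, e.g. "left" / "right"
    keywords: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Keyword overlap |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_other_bias(target: Article, pool: list[Article],
                       threshold: float = 0.3) -> list[tuple[float, Article]]:
    """Return (score, article) pairs from sources whose bias label differs
    from the target's, best keyword overlap first."""
    matches = []
    for other in pool:
        # Skip the article itself and anything from the same bias bucket.
        if other.url == target.url or other.bias == target.bias:
            continue
        score = jaccard(target.keywords, other.keywords)
        if score >= threshold:
            matches.append((score, other))
    # Sort on the score only, so Article objects are never compared.
    return sorted(matches, key=lambda pair: pair[0], reverse=True)
```

Keyword overlap is crude (synonyms and paraphrases won't match), so embedding or TF-IDF similarity would be natural upgrades; the bias filter stays the same either way.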

[email protected] 3 points 3 years ago

I've been thinking about using NLP to deal with my feeds.

Are you happy with your solution? Can you share a bit more about your pipeline?

[email protected] 4 points 3 years ago (last edited 3 years ago)

I'm not happy with it yet, but that's because I want it to be perfect and it never will be. I do find that I engage with more content, and more varied content, than when I go to a single source.

I use newspaper's NLTK features for keyword extraction, plus the trending sources, to monitor a few hundred sources. Currently I store all the metadata, links (URLs), and Wikipedia links in a pandas DataFrame (which is becoming a problem) and visualize trends and data about the news in a Jupyter notebook.

For the enhanced summaries and named entity extraction I use spaCy (https://spacy.io/). From there I use SPARQL (https://en.wikipedia.org/wiki/SPARQL) to query DBpedia (https://en.wikipedia.org/wiki/DBpedia) and augment what I know about each entity (e.g., adding a company's size and industry, or summary explanations of scientific concepts). The named entity matching and augmentation is the part that needs the most work.

Newspaper has some nice caching features, so I query all sources every day but only pull in new articles.
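A minimal sketch of the spaCy-to-DBpedia step, assuming you already have an entity string (e.g. `ent.text` from spaCy's `doc.ents`) and want its English abstract from the public endpoint at https://dbpedia.org/sparql. Matching on the exact `rdfs:label` is a simplification that will miss many entities, so a real pipeline needs smarter matching; the function names are hypothetical and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
JSON_RESULTS = "application/sparql-results+json"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def abstract_query(label: str, lang: str = "en") -> str:
    """SPARQL that finds a resource by exact label and returns its abstract."""
    lit = escape_literal(label)
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?resource ?abstract WHERE {{
  ?resource rdfs:label "{lit}"@{lang} ;
            dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "{lang}")
}}
LIMIT 1"""


def build_request(query: str) -> urllib.request.Request:
    """GET request asking the endpoint for SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": JSON_RESULTS})
    return urllib.request.Request(f"{DBPEDIA_ENDPOINT}?{params}",
                                  headers={"Accept": JSON_RESULTS})


def first_abstract(results: dict) -> Optional[str]:
    """Pull the first ?abstract value out of a SPARQL JSON results document."""
    bindings = results.get("results", {}).get("bindings", [])
    return bindings[0]["abstract"]["value"] if bindings else None


def fetch_abstract(label: str, timeout: float = 10.0) -> Optional[str]:
    """Network call: look up one entity's abstract (None if no exact match)."""
    with urllib.request.urlopen(build_request(abstract_query(label)),
                                timeout=timeout) as resp:
        return first_abstract(json.load(resp))
```

For example, `fetch_abstract("Google")` would query for a resource labelled `"Google"@en`. Keeping query building and result parsing separate from the network call makes it easy to cache responses, which matters when augmenting thousands of entities a day.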

I might experiment with moving portions of the data into a graph DB and with better ways to query based on concepts. Right now I just write Python code to query the pandas DataFrame based on different parameters.
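Before committing to a graph database, the concept-based querying can be prototyped in memory. This hypothetical `EntityGraph` links articles to the named entities extracted from them, so "related concepts" falls out as a two-hop walk (entity, then the articles mentioning it, then the other entities in those articles). A graph DB would hold the same two node types and edges and express the walk as a query; this sketch only illustrates the data model.

```python
from __future__ import annotations

from collections import defaultdict


class EntityGraph:
    """Bipartite article <-> entity graph kept as two adjacency maps."""

    def __init__(self) -> None:
        self.entities_of: dict[str, set[str]] = defaultdict(set)  # url -> entities
        self.articles_of: dict[str, set[str]] = defaultdict(set)  # entity -> urls

    def add_mention(self, article_url: str, entity: str) -> None:
        # Store the edge in both directions so either side can be walked.
        self.entities_of[article_url].add(entity)
        self.articles_of[entity].add(article_url)

    def articles_about(self, entity: str) -> set[str]:
        # .get() avoids creating empty entries for unknown entities.
        return set(self.articles_of.get(entity, ()))

    def related_entities(self, entity: str) -> dict[str, int]:
        """Entities co-mentioned with `entity`, counted by shared articles."""
        counts: dict[str, int] = defaultdict(int)
        for url in self.articles_of.get(entity, ()):
            for other in self.entities_of[url]:
                if other != entity:
                    counts[other] += 1
        return dict(counts)
```

The co-mention counts give a cheap ranking of related concepts; DBpedia facts could be attached as extra entity-to-entity edges in the same structure.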

[email protected] 3 points 3 years ago

Wow, that's quite developed.

So you consume the content in a Jupyter notebook? Or are you interfacing this with an RSS reader?

From what I read, the next step is to move it into a real database.

[email protected] 2 points 3 years ago

I consume the analytics and identify topics I'm interested in via Jupyter; sometimes I just use IPython if I don't want to leave the terminal. I need to build more of a frontend, but I haven't gotten there yet. I mostly read the articles in the terminal. And yup, my plan is to find a good DB, but I'm not sure what to use yet.
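One low-friction option, offered as an assumption rather than what the commenter chose, is SQLite through Python's built-in `sqlite3` module. Making the URL the primary key and inserting with `INSERT OR IGNORE` means a daily run can re-offer every scraped article and only new ones get stored. The table layout and the comma-separated `keywords` column are hypothetical.

```python
import sqlite3
from collections.abc import Iterable

SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url       TEXT PRIMARY KEY,  -- dedup key: one row per article
    source    TEXT NOT NULL,
    title     TEXT,
    published TEXT,              -- ISO 8601 string, so it sorts correctly
    keywords  TEXT               -- hypothetical comma-separated keyword list
);
CREATE INDEX IF NOT EXISTS idx_articles_source ON articles(source);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_articles(conn: sqlite3.Connection, rows: Iterable[dict]) -> int:
    """Insert rows, silently skipping URLs already stored; return the number added."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO articles (url, source, title, published, keywords) "
            "VALUES (:url, :source, :title, :published, :keywords)",
            rows,
        )
    return conn.total_changes - before


def by_keyword(conn: sqlite3.Connection, word: str) -> list[tuple]:
    """Newest-first (url, title) rows whose keyword list contains `word`."""
    cur = conn.execute(
        "SELECT url, title FROM articles WHERE keywords LIKE ? ORDER BY published DESC",
        (f"%{word}%",),
    )
    return cur.fetchall()
```

The notebook side can keep using pandas by loading query results with `pandas.read_sql_query(sql, conn)`, so the analysis code changes little while storage stops growing in memory.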

[email protected] 3 points 3 years ago

You could probably repackage your upgraded feed into RSS format and serve it locally. But that may be more hassle than it's worth.
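A sketch of that suggestion using only the standard library: `build_rss` writes a minimal RSS 2.0 document (a channel with `title`, `link` and `description`, plus one `item` per article), and `serve` exposes it over HTTP on localhost. The item dictionary keys, feed title and port are hypothetical.

```python
from __future__ import annotations

import http.server
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime  # RFC 822-style dates, as RSS expects


def build_rss(items: list[dict], title: str = "Enriched news feed",
              link: str = "http://127.0.0.1:8000/",
              built_at: datetime | None = None) -> bytes:
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Articles re-published from a local news pipeline"
    stamp = built_at or datetime.now(timezone.utc)
    ET.SubElement(channel, "lastBuildDate").text = format_datetime(stamp)
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["url"]
        ET.SubElement(node, "guid", isPermaLink="true").text = item["url"]
        # The enhanced summary (and entity notes) can go in the description.
        ET.SubElement(node, "description").text = item.get("summary", "")
        if item.get("published") is not None:
            ET.SubElement(node, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


def serve(feed: bytes, port: int = 8000) -> None:
    """Serve the feed bytes for every GET until interrupted (blocks)."""
    class FeedHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "application/rss+xml; charset=utf-8")
            self.send_header("Content-Length", str(len(feed)))
            self.end_headers()
            self.wfile.write(feed)

    with http.server.HTTPServer(("127.0.0.1", port), FeedHandler) as httpd:
        httpd.serve_forever()
```

Calling `serve(build_rss(items))` would publish the feed at http://127.0.0.1:8000/ for a reader on the same machine. Binding to 127.0.0.1 keeps it off the network, so a phone reader such as Feeder would need a different bind address.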

Thanks for the info; it encouraged me to try that sometime :)

[email protected] 1 point 3 years ago (last edited 3 years ago)

Yep, Feeder is awesome, and it is on iOS.