I am not happy with it yet, but that's mostly because I want it to be perfect and it never will be. I do find that I engage with content at a larger scale, and with more variety, than when I go to a single source.

I am using the nltk features from newspaper for keyword extraction, plus the trending sources, to monitor a few hundred sources. Currently I store all the metadata + links (URLs) + Wikipedia links in a pandas DataFrame (which is becoming a problem) and visualize trends and data about the news in a Jupyter notebook. For the enhanced summaries + named entity extraction I use spaCy (https://spacy.io/), and from there I use SPARQL (https://en.wikipedia.org/wiki/SPARQL) to query DBpedia (https://en.wikipedia.org/wiki/DBpedia) to augment entity knowledge (e.g. adding data about a company's size and industry, or summary explanations of scientific concepts). The named entity matching and augmentation is the part that needs the most work. Newspaper has some nice caching features, so I query all sources every day but only pull in new articles.
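Roughly, a minimal version of that pipeline looks like the sketch below. It is not the actual code: the source URL is a placeholder, the entity-to-DBpedia-URI mapping is deliberately naive (that's exactly the matching step that needs the most work), and the specific DBpedia properties queried are assumptions.

```python
# Sketch of a newspaper -> spaCy -> DBpedia -> pandas pipeline (illustrative only).
# pip install newspaper3k spacy SPARQLWrapper pandas && python -m spacy download en_core_web_sm
# newspaper's nlp() also needs the nltk "punkt" data downloaded once.
import newspaper
import pandas as pd
import spacy
from SPARQLWrapper import SPARQLWrapper, JSON

nlp = spacy.load("en_core_web_sm")
sparql = SPARQLWrapper("https://dbpedia.org/sparql")

def dbpedia_lookup(entity_text):
    """Naive entity -> DBpedia resource mapping (spaces to underscores).
    dbo:abstract / dbo:industry are example properties, not the thread's exact choices."""
    resource = entity_text.strip().replace(" ", "_")
    sparql.setQuery(f"""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?abstract ?industry WHERE {{
            OPTIONAL {{ <http://dbpedia.org/resource/{resource}> dbo:abstract ?abstract .
                        FILTER (lang(?abstract) = "en") }}
            OPTIONAL {{ <http://dbpedia.org/resource/{resource}> dbo:industry ?industry }}
        }} LIMIT 1
    """)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return bindings[0] if bindings else {}

rows = []
# memoize_articles=True is newspaper's caching: re-running only yields unseen articles.
paper = newspaper.build("https://example-news-site.com", memoize_articles=True)  # placeholder source
for article in paper.articles[:20]:
    article.download()
    article.parse()
    article.nlp()                       # nltk-based keyword + summary extraction
    doc = nlp(article.text)             # spaCy named entity recognition
    entities = {ent.text for ent in doc.ents if ent.label_ in ("ORG", "PERSON", "GPE")}
    rows.append({
        "url": article.url,
        "title": article.title,
        "published": article.publish_date,
        "keywords": article.keywords,
        "summary": article.summary,
        "entities": list(entities),
        "dbpedia": {e: dbpedia_lookup(e) for e in entities},
    })

df = pd.DataFrame(rows)  # the "pandas DB" the thread talks about
```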
I might play around with moving portions of the data into a graph DB, and with better ways to query by concept. Right now I just write Python code to query the pandas DataFrame based on different parameters.
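"Querying by parameters" here mostly ends up as boolean-mask filters over the DataFrame; a quick sketch, where the column names are assumptions matching the pipeline sketch above rather than the real schema:

```python
# Hypothetical parameter-style queries over the DataFrame built above.
import pandas as pd

recent = df[pd.to_datetime(df["published"], errors="coerce") >= "2021-09-01"]
about_spacy = df[df["keywords"].apply(lambda kws: "spacy" in [k.lower() for k in kws])]
mentions_mozilla = df[df["entities"].apply(lambda ents: "Mozilla" in ents)]
```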
Wow, that's quite developed.
So you consume the content in a Jupyter notebook? Or are you interfacing this with an RSS reader?
From what I read, the next step is to move it into a real database.
I consume the analytics and identify topics I am interested in via Jupyter; sometimes I just use IPython if I don't want to leave the terminal. I need to build more of a frontend, but I haven't got there yet. I mostly read the articles in the terminal. And yup, my plan is to find a good DB, but I am not sure what to use yet.
You could probably repackage your upgraded feed into an RSS feed that you serve locally. But that can be more hassle than it's worth.
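If you do go that route, something like feedgen plus a throwaway local HTTP server is usually enough; a sketch, assuming feedgen as the library and the column names from the DataFrame sketch above:

```python
# Sketch: repackage the augmented DataFrame as a local RSS feed (feedgen is an assumed choice).
from feedgen.feed import FeedGenerator

fg = FeedGenerator()
fg.title("Augmented news feed")
fg.link(href="http://localhost:8000/feed.xml", rel="alternate")
fg.description("Keyword/entity-augmented articles")

for _, row in df.iterrows():
    fe = fg.add_entry()
    fe.title(row["title"])
    fe.link(href=row["url"])
    fe.description(row["summary"])

fg.rss_file("feed.xml")  # then e.g. `python -m http.server 8000` and point any RSS reader at it
```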
Thanks for the info, it's encouraged me to try that sometime :)