I am not happy with it yet, but that's mostly because I want it to be perfect and it never will be. I do find that I engage with content at a larger scale, and with more variety, than when I go to a single source.

I am using the nltk features from newspaper for keyword extraction, plus the trending sources, to monitor a few hundred sources. Currently I store all the metadata + links (URLs) + Wikipedia links in a pandas DataFrame (which is becoming a problem) and visualize trends and data about the news in a Jupyter notebook. For the enhanced summaries + named entity extraction I use spaCy (https://spacy.io/), and from there I use SPARQL (https://en.wikipedia.org/wiki/SPARQL) to query DBpedia (https://en.wikipedia.org/wiki/DBpedia) to augment entity knowledge (e.g. adding data about a company's size and industry, or summary explanations of scientific concepts). The named entity matching and augmentation is the part that needs the most work. Newspaper has some nice caching features, so I query all sources every day but only pull in new articles.
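Roughly, a minimal version of that pipeline looks like the sketch below. It is not the actual code: the source URL is a placeholder, the entity-to-DBpedia-URI mapping is deliberately naive (that's exactly the matching step that needs the most work), and the specific DBpedia properties queried are assumptions.

```python
# Sketch of a newspaper -> spaCy -> DBpedia -> pandas pipeline (illustrative only).
# pip install newspaper3k spacy SPARQLWrapper pandas && python -m spacy download en_core_web_sm
# newspaper's nlp() also needs the nltk "punkt" data downloaded once.
import newspaper
import pandas as pd
import spacy
from SPARQLWrapper import SPARQLWrapper, JSON

nlp = spacy.load("en_core_web_sm")
sparql = SPARQLWrapper("https://dbpedia.org/sparql")

def dbpedia_lookup(entity_text):
    """Naive entity -> DBpedia resource mapping (spaces to underscores).
    dbo:abstract / dbo:industry are example properties, not the thread's exact choices."""
    resource = entity_text.strip().replace(" ", "_")
    sparql.setQuery(f"""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?abstract ?industry WHERE {{
            OPTIONAL {{ <http://dbpedia.org/resource/{resource}> dbo:abstract ?abstract .
                        FILTER (lang(?abstract) = "en") }}
            OPTIONAL {{ <http://dbpedia.org/resource/{resource}> dbo:industry ?industry }}
        }} LIMIT 1
    """)
    sparql.setReturnFormat(JSON)
    bindings = sparql.query().convert()["results"]["bindings"]
    return bindings[0] if bindings else {}

rows = []
# memoize_articles=True is newspaper's caching: re-running only yields unseen articles.
paper = newspaper.build("https://example-news-site.com", memoize_articles=True)  # placeholder source
for article in paper.articles[:20]:
    article.download()
    article.parse()
    article.nlp()                       # nltk-based keyword + summary extraction
    doc = nlp(article.text)             # spaCy named entity recognition
    entities = {ent.text for ent in doc.ents if ent.label_ in ("ORG", "PERSON", "GPE")}
    rows.append({
        "url": article.url,
        "title": article.title,
        "published": article.publish_date,
        "keywords": article.keywords,
        "summary": article.summary,
        "entities": list(entities),
        "dbpedia": {e: dbpedia_lookup(e) for e in entities},
    })

df = pd.DataFrame(rows)  # the "pandas DB" the thread talks about
```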
I might play around with moving portions of the data into a graph DB, and with better ways to query by concept. Right now I just write Python code to query the pandas DataFrame based on different parameters.
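"Querying by parameters" here mostly ends up as boolean-mask filters over the DataFrame; a quick sketch, where the column names are assumptions matching the pipeline sketch above rather than the real schema:

```python
# Hypothetical parameter-style queries over the DataFrame built above.
import pandas as pd

recent = df[pd.to_datetime(df["published"], errors="coerce") >= "2021-09-01"]
about_spacy = df[df["keywords"].apply(lambda kws: "spacy" in [k.lower() for k in kws])]
mentions_mozilla = df[df["entities"].apply(lambda ents: "Mozilla" in ents)]
```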
Wow, that's quite developed.
So you consume the content in a Jupyter notebook? Or are you interfacing this with an RSS reader?
From what I read, the next step is to move it into a real database.
I consume the analytics and identify topics I am interested in via Jupyter; sometimes I just use IPython if I don't want to leave the terminal. I need to build more of a frontend, but I haven't got there yet. I mostly read the articles in the terminal. And yup, my plan is to find a good DB, but I am not sure what to use yet.
You could probably repackage your upgraded feed into an RSS feed that you serve locally. But that can be more hassle than it's worth.
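If you do go that route, something like feedgen plus a throwaway local HTTP server is usually enough; a sketch, assuming feedgen as the library and the column names from the DataFrame sketch above:

```python
# Sketch: repackage the augmented DataFrame as a local RSS feed (feedgen is an assumed choice).
from feedgen.feed import FeedGenerator

fg = FeedGenerator()
fg.title("Augmented news feed")
fg.link(href="http://localhost:8000/feed.xml", rel="alternate")
fg.description("Keyword/entity-augmented articles")

for _, row in df.iterrows():
    fe = fg.add_entry()
    fe.title(row["title"])
    fe.link(href=row["url"])
    fe.description(row["summary"])

fg.rss_file("feed.xml")  # then e.g. `python -m http.server 8000` and point any RSS reader at it
```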
Thanks for the info, it's encouraged me to try that sometime :)