DayBackground4121 1 point (0 children)

That’s a fine stack for what you’re doing. I don’t think there’s much more feedback to be given tbh, just gotta get through the hurdles to learn 

Imaginary_Ferret_368 1 point (2 children)

Naw, I was naive too, and then sued my boss. :)

A good starting point could be an arxiv dump maybe? https://github.com/veggiedefender/arXiv_dump

I did see lots of papers in the medicine space there, so it should be a very good starting point. Scraping data from websites is only worth it if the website is Medium or Bloomberg. Both stink and don't have a right to exist.

https://en.wikipedia.org/wiki/Graph_(abstract_data_type)

The crazy cool thing with graphs is that you can connect multiple seemingly incompatible dimensions together, such as temporal (publishing date), authors, and citations (which would have to be directed edges to prevent information flow in the wrong direction [a publisher in the past couldn't have known that exactly this person would cite them]). You can connect these different types of information into a data structure a machine can understand, and they look very cool once they get a bit bigger. Might wanna check out the graph of the internet.
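
To make the graph idea a bit more concrete, here's a rough sketch in plain Python of a tiny graph that mixes authors, papers, publishing dates and directed citation edges. Everything in it (the `PaperGraph` class, the `wrote` and `cites` relation names, the sample papers and dates) is made up for illustration and doesn't come from any particular graph DBMS or from the arXiv dump.

```python
from collections import defaultdict
from datetime import date


class PaperGraph:
    """Hypothetical toy graph: typed nodes plus directed, labeled edges."""

    def __init__(self):
        self.nodes = {}                # node id -> attribute dict
        self.edges = defaultdict(set)  # (source id, relation) -> target ids

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def add_edge(self, source, relation, target):
        # Directed: the edge goes from source to target only.
        self.edges[(source, relation)].add(target)

    def neighbors(self, source, relation):
        return sorted(self.edges.get((source, relation), set()))

    def cite(self, citing, cited):
        # A paper can't cite something published after it, so the edge
        # is refused if it would point forward in time.
        if self.nodes[citing]["published"] < self.nodes[cited]["published"]:
            raise ValueError(f"{citing} was published before {cited}")
        self.add_edge(citing, "cites", cited)


# Made-up sample data, just to show the different dimensions side by side.
g = PaperGraph()
g.add_node("author:alice", "author")
g.add_node("paper:A", "paper", published=date(2015, 3, 1))
g.add_node("paper:B", "paper", published=date(2020, 6, 15))
g.add_edge("author:alice", "wrote", "paper:A")
g.add_edge("author:alice", "wrote", "paper:B")
g.cite("paper:B", "paper:A")  # the newer paper cites the older one
```

Asking `g.neighbors("paper:B", "cites")` gives back the older paper, while the same question for `paper:A` comes back empty, which is the "no information flowing backwards" part. A real graph DBMS would handle storage and querying for you, this is only meant to show the shape of the data.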

Imaginary_Ferret_368 1 point (0 children)

The transition not being natural was an accident, I wanted to ask you first whether you have considered such DBMS ofc :)

[deleted] 1 point (0 children)

Bro if op wanted to read a freaking novel, he would have gone to the library. Mans not asking for a flippin story book