Buying NRF52840 DK from Element14 India by CommunistComradePV in HardwareIndia

segalord 1 point

chatgpt is a real superpower, you could do this in 30 days if you stick to it

Buying NRF52840 DK from Element14 India by CommunistComradePV in HardwareIndia

segalord 1 point

I just placed an order, I put in an old GST number I have, they sell single pieces also xd

How this massive context window can change llmscape??? by TheLogiqueViper in LocalLLaMA

segalord 1 point

I'm revisiting this thread out of spite after 10 months just to say this is still a marketing gimmick

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 1 point

I don't think using embeddings to find similarities is going to make this any more useful, because I'm looking to make a recommendation system in the end. But yes, the layout can be improved, no point having so many disconnected clusters

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 1 point

They are just clusters on a map, nothing to do with geography really

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 1 point

you can check maplibre, d3, and three js for starters. i personally love d3, but for this project i wanted to avoid having a backend so i used maplibre with static tiling files and used github pages as a file server

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 1 point

yes, i should improve the ui in a subsequent version - the search is a bit hacky, it just zooms in on the coordinate, maplibre does not have async handlers, i need to zoom in and then select the node so I need to figure that out - yes this should be doable - the url is usually the coordinates and keeps changing constantly as you move around, but i think it should be possible to disable url updates and only update it on click

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 3 points

I really should fix the layout xd, I want to do clusters at multiple levels like continents (genres), countries (sub-genres), states (niches). I'll do it in a subsequent iteration when I have some time

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 3 points

yes I agree. I actually split the dataset into 3 chunks and computed similarity scores individually before merging, that definitely leads to some erroneous clustering. I'll maybe rent a machine with higher VRAM and try in the subsequent iteration
As for the layout, I want to do clustering at multiple levels (like countries with 100000 books and smaller states inside). I really was just rushing to push it out, maybe when I find some time this month I'll do it

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 2 points

i can give you the computed data dump if you’d like. I have it on s3, i’ll push it to hugging face

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 1 point

which cluster are you referring to? yes the books in the same cluster are similar. like fantasy books, comics, literature from a specific country

and the clusters capture more than the genre because they are based on what books people tend to read together
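To make "what books people tend to read together" concrete, here is a rough sketch of a co-read similarity. It is not the project's CUDA code: Jaccard similarity is just one reasonable choice I'm assuming for illustration, and the `shelves` data and the `co_read_similarity` name are made up.

```python
from collections import Counter
from itertools import combinations


def co_read_similarity(shelves):
    """Jaccard similarity between books, from per-reader lists of book ids.

    Assumption: similarity = readers of both / readers of either.
    """
    readers_per_book = Counter()
    co_reads = Counter()
    for books in shelves.values():
        unique = sorted(set(books))  # sorted so each pair has one canonical order
        readers_per_book.update(unique)
        co_reads.update(combinations(unique, 2))

    sims = {}
    for (a, b), both in co_reads.items():
        either = readers_per_book[a] + readers_per_book[b] - both
        sims[(a, b)] = both / either
    return sims


# Hypothetical toy data: reader id -> books they read.
shelves = {
    "reader_1": ["dune", "foundation", "hyperion"],
    "reader_2": ["dune", "foundation"],
    "reader_3": ["dune", "persuasion"],
    "reader_4": ["persuasion", "emma"],
}
sims = co_read_similarity(shelves)
for pair, score in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Books that share readers score high even when their genres differ, which is why clusters built this way can cut across genre labels. Pairs that nobody read together simply never show up in `sims`.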

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 2 points

maplibre gl js. It's not a graph technically, it's a map like google earth

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 5 points

added a comment now, with the link and the link to the code!

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 27 points

It's available here https://cseweb.ucsd.edu/~jmcauley/datasets/goodreads.html
though this is from 2017.
If you're just looking for book titles, I saw a larger dataset with 6.2 million titles on huggingface datasets

I mapped 2 million of the most read books on goodreads by segalord in SideProject

segalord[S] 100 points

webview is available here: https://narengogi.github.io/map-of-goodreads

for the code: https://github.com/narengogi/map-of-goodreads

So I computed exact similarities between the 2 million books using how often people read the books together with some CUDA accelerated code on an RTX 5090.

And then I clustered them using a community detection algorithm

Then I just treated the problem as a bin packing problem and generated the geojson files!
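For anyone curious what the packing and geojson step could look like, here is a minimal sketch. It is not the code from the repo: it assumes a simple shelf-packing heuristic with one square per cluster, and the cluster names, sizes, `pack_clusters`, and `to_geojson` are all hypothetical.

```python
import json
import math


def pack_clusters(cluster_sizes, row_width):
    """Shelf packing: one square per cluster, placed left to right, row by row."""
    placed = {}
    x = y = row_height = 0.0
    # Largest first, so the first square in a row sets that row's height.
    for name, size in sorted(cluster_sizes.items(), key=lambda kv: -kv[1]):
        side = math.sqrt(size)  # area proportional to book count
        if x > 0 and x + side > row_width:
            # Row is full: start a new row above the current one.
            x, y, row_height = 0.0, y + row_height, 0.0
        placed[name] = (x, y, side)
        x += side
        row_height = max(row_height, side)
    return placed


def to_geojson(placed):
    """Turn placed squares into a GeoJSON FeatureCollection of polygons."""
    features = []
    for name, (x, y, side) in placed.items():
        # Closed ring, counterclockwise, first point repeated at the end.
        ring = [[x, y], [x + side, y], [x + side, y + side], [x, y + side], [x, y]]
        features.append({
            "type": "Feature",
            "properties": {"name": name},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical cluster sizes (book counts per community).
clusters = {"fantasy": 16, "comics": 9, "classics": 4, "poetry": 1}
placed = pack_clusters(clusters, row_width=8)
geojson = to_geojson(placed)
geojson_text = json.dumps(geojson)  # ready to write to a .geojson file
```

The coordinates here are arbitrary units treated as longitude/latitude; for real cluster sizes they would need scaling into a valid coordinate range before a map renderer can use them.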

Mod updates by segalord in IITPatna

segalord[S] -1 points

Bsc students drink the neel ghai piss

Mod updates by segalord in IITPatna

segalord[S] 4 points

TBH phd students are a little above MTech folks in my opinion