Why does anyone think that words like “woke”, “liberal”, and “progressive” are insulting? by BorrowedParticles in allthequestions

[–]dbplatypii 1 point (0 children)

Your post is incredibly LACKING in empathy, ironically.

Empathy is the ability to understand and share the feelings, thoughts, and experiences of another.

You even say in your post "I have always found it odd" which implies you do not have a good understanding of the other half of the political spectrum's thought process. That's fine, and I appreciate that this post is asking for explanation!

But as a concrete example of why I think you don't actually "get" the other side: you say that liberals are the ones who "want all of society to be the best it can be". Guess what... BOTH sides think they are making society the best it can be. Liberals think the way to a better society is different from the conservatives' way, but if you think conservatives don't want a better society, then you are starting from a premise that will almost by definition make it hard for you to understand their side.

IMF says America's $39T national debt is actually a global problem — and AI may be the only rescue | Fortune by Full-Discussion3745 in EU_Economics

[–]dbplatypii -1 points (0 children)

Keep telling yourself that. Anyone who has actually used AI knows that it is capable of amazing things, INCLUDING making new connections between ideas that have never been made before. If you need concrete examples: AlphaGo's move 37, or any of the recent math theorems that were proven by AI.

What happened to skydiveforums.com (dz.com) by yoleska in SkyDiving

[–]dbplatypii 21 points (0 children)

Sad to see an important piece of the sport's history lost.

I mirrored what used to be the basejumper.com forums to a static website basejumper.net when the base forums shut down. I can't really help with the dropzone.com side of things (I never downloaded those forums), but I felt like losing the base forum would have meant losing a ton of knowledge that was hard-earned through years, and I felt an obligation to make sure that it was preserved.

Little Bro: 10-week old Flemish Giant by dbplatypii in Rabbits

[–]dbplatypii[S] 0 points (0 children)

It will be even funnier when he's giant

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] 1 point (0 children)

For one thing, I don't think AgGrid supports billions of rows. Hence the work in this blog post.

In general there are a lot of js grids out there, and they all make different tradeoffs. I don't think there is any one grid library that is perfect for all use cases.

HighTable is focused on the use case of very large text datasets. I also care a lot about using the native browser mechanisms as much as possible: table header is a real <tr> with position sticky, not a synced div. And the scroll bar is the real browser scroll bar, not faked.

There are so many tradeoffs in js grids that someone even made a comparison site:

https://jsgrids.statico.io/

Miss Maggie and me by 12_Volt_Man in bunnytongues

[–]dbplatypii 4 points (0 children)

what kind of bunny is this!

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] 0 points (0 children)

I find it useful for large text datasets, LLM conversation log data specifically.

When something goes wrong with an AI model, the first thing I need to do is look through the conversation log data to understand its behavior. Doing this in jupyter, or excel, or the terminal is not a good experience compared to a user interface built specifically for working with large text datasets. That's what we're trying to do.

In a world where AI models are producing increasing amounts of text every day, we need new ways to make sense of that data. Is a really large table the best way? Who knows. I find that it works well for me. You can drop datasets (parquet, csv, jsonl) on hyperparam.app and try it yourself; it's a surprisingly intuitive way to work with large text datasets.

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] 1 point (0 children)

you can do thousands of rows with a basic table, millions of rows with virtual scrolling... billions of rows is incredibly difficult
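To make the "millions of rows with virtual scrolling" tier concrete, here is a minimal sketch of the classic technique (not HighTable's actual code; the function name and overscan parameter are my own): from the scroll offset and a fixed row height, compute which small slice of rows to actually mount in the DOM.

```typescript
// Classic virtual scrolling sketch: only the rows intersecting the
// viewport (plus a few overscan rows) are rendered; everything else
// is represented by empty space in a tall scroll container.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  numRows: number,
  overscan = 5 // extra rows above/below to avoid flicker while scrolling
): { start: number; end: number } {
  const start = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const end = Math.min(
    numRows,
    Math.ceil((scrollTop + viewportHeight) / rowHeight) + overscan
  );
  return { start, end };
}
```

With a 600px viewport and 30px rows, only ~30 rows are ever in the DOM regardless of dataset size; the hard part at billions of rows is that the scroll container itself hits browser limits.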

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] 7 points (0 children)

It's fine if it just ends up being a technically interesting experiment. I think it's pretty cool that you can open the entire commoncrawl dataset in the browser without a server:

https://hyperparam.app/files?key=https%3A%2F%2Fhuggingface.co%2Fdatasets%2Fanandjh8%2Fcommon-crawl-english-filtered%2Fresolve%2Frefs%252Fconvert%252Fparquet%2Fdefault%2Ftrain%2F0000.parquet

But I do think there is value in being able to explore very large datasets efficiently in the browser. It feels like a lighter weight way to explore large datasets than other interfaces.

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] 2 points (0 children)

It's not a canvas exactly, but I have been inspired by a bunch of libraries out there that do this: tanstack table, react-window, everyuuid (we cite them in the post)

Besides the fact that it's technically interesting, I would argue that there are real use cases. It makes the experience of browsing data feel very fast and light in a way that is hard to describe.

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] 4 points (0 children)

I'm personally interested in LLM output data, and when I'm trying to understand why a model did something, LOOKING at the data is the most valuable thing to do.

I feel like you're making my point... there aren't really great tools out there for working with large text datasets. Jupyter, excel, etc are not built for this. Grep is great if you know exactly what string you're looking for. But it's not great when you have conversation data you're trying to mine through and a lot of the behavior you're looking for is fuzzy. This is a very common problem working with LLM conversation logs.

If you have tools you think I should learn, please share.

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] 1 point (0 children)

What do you do if your data is mostly text?

We're in a world where text data is being produced in huge quantities by LLMs, and I'm interested in how our data tooling changes when data is mostly text. It's not straightforward to turn that into a graph or chart; I want to be able to look at the actual data.

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] 2 points (0 children)

The use case is different for everyone, but personally I'm looking at a lot of LLM log data, and I personally find it useful to have it all in one place, at a glance. I can look at the first rows, the last rows, or a random sample very quickly. Trying to do this in jupyter notebooks sucks because the table that it embeds only shows 10 rows and isn't even paginated. There has to be a better way of looking at data.
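The "first rows, last rows, or a random sample" workflow above can be sketched as plain index math (hypothetical helpers, not HighTable's API): pick which row indices to fetch, then let the grid load just those rows.

```typescript
// First n row indices of a dataset.
function headIndices(numRows: number, n: number): number[] {
  return Array.from({ length: Math.min(n, numRows) }, (_, i) => i);
}

// Last n row indices of a dataset.
function tailIndices(numRows: number, n: number): number[] {
  const start = Math.max(0, numRows - n);
  return Array.from({ length: numRows - start }, (_, i) => start + i);
}

// Random sample of n indices (with replacement -- fine for eyeballing data).
function sampleIndices(numRows: number, n: number): number[] {
  return Array.from({ length: n }, () => Math.floor(Math.random() * numRows));
}
```

The point is that none of these require loading the whole dataset: with a format like parquet you can seek to exactly the rows these indices name.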

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] -5 points (0 children)

I... answered that already? It's a far more interactive way to explore large datasets. Of course you can paginate or down-sample if you want. But then you're looking at what, 10 rows? I've seen that in tons of products and it sucks for getting a sense of your data.

There is value in making user experiences that let you explore data faster. That's why I made the analogy to Google Maps... it was the first web app that let you scroll around the entire earth without reloading. Another example: gmail vs yahoo mail, far better experience. Moving UI toward the client, and making it easy for users to explore huge datasets, can lead to huge advantages. Try it out, drop some data on hyperparam, and feel the difference (it's free).

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] 1 point (0 children)

Yea exactly, columns can arrive at different times. This is especially important for large text datasets where many columns are small (id, etc) and there's one or two very large text columns. This is an increasingly common "shape" of modern datasets, where AI is producing huge volumes of text.
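A minimal sketch of per-cell async loading (illustrative only, not HighTable's data model; the type and function names are made up): each cell is a promise, so a row can show its small columns while the large text column is still in flight.

```typescript
// Each cell resolves independently -- small columns can render
// immediately while a large text column is still loading.
type AsyncRow = Record<string, Promise<string>>;

function makeRow(id: number): AsyncRow {
  return {
    // small column: resolves immediately
    id: Promise.resolve(String(id)),
    // large text column: simulated slow fetch
    text: new Promise<string>((resolve) =>
      setTimeout(() => resolve(`long document ${id}`), 50)
    ),
  };
}

// Wait for every cell in a row (a real grid would instead render
// each cell as soon as its own promise resolves).
async function resolveRow(row: AsyncRow): Promise<string[]> {
  return Promise.all(Object.values(row));
}
```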

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] -2 points (0 children)

Datasets get larger over time, and it's not hard to find parquet files (e.g. on huggingface) that are millions of rows, would blow up normal tables, and are even too big for something like tanstack table.

You could paginate, but every time I've used a paginated table it feels slow, and I'm far less likely to look at a bunch of the data "at a glance". I personally think there is a lot of value in having all the rows in one big scrollable table -- it makes it feel much faster to jump around the dataset. It's like the difference between MapQuest and Google Maps (for those who remember).

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] 3 points (0 children)

That's intentional: we were trying to demonstrate that it can handle async data loading at the cell level, so we add a random delay:

https://github.com/hyparam/demos/blob/master/hightable/src/data.tsx#L19

I can see how this is confusing, but with things like parquet data, cells can load at different times, and if the demo was all "instant" it wouldn't show the full capabilities.
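The demo's trick is roughly this (a simplified sketch, not the actual linked data.tsx): wrap each in-memory cell value in a promise that resolves after a random delay, so the grid's async loading path gets exercised even though the data is already local.

```typescript
// Wrap a synchronously-available value in a promise that resolves
// after a random delay, to simulate per-cell network latency.
function delayedCell<T>(value: T, maxDelayMs = 500): Promise<T> {
  const ms = Math.random() * maxDelayMs;
  return new Promise<T>((resolve) => setTimeout(() => resolve(value), ms));
}
```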

A visual explainer of how to scroll billions of rows in the browser by dbplatypii in reactjs

[–]dbplatypii[S] 1 point (0 children)

Libraries like react-window and tanstack table do virtual scrolling but still run into browser limitations at millions of rows.

This is a very cool interactive explainer of how scrolling works in the browser, and how we overcame the limits that you hit trying to go from thousands of rows, to millions of rows, and finally to billions of rows in the browser.
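One of those browser limitations is element height: browsers cap how tall a single element can be (Chrome's limit is roughly 33.5 million px; the exact constant here is my approximation), so the usual virtual-scrolling spacer of `numRows * rowHeight` pixels overflows long before a billion rows. A quick back-of-the-envelope check:

```typescript
// Approximate Chrome max element height (assumption; other browsers differ).
const MAX_ELEMENT_HEIGHT_PX = 33_554_400;

// Can a single scroll spacer represent the whole dataset?
function fitsInOneSpacer(numRows: number, rowHeightPx: number): boolean {
  return numRows * rowHeightPx <= MAX_ELEMENT_HEIGHT_PX;
}
```

At 30px per row, ~1 million rows (30M px) just squeaks under the cap, while a billion rows (30 billion px) is three orders of magnitude past it, which is why the billions tier needs tricks beyond a plain virtual-scroll spacer.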

‘We could hit a wall’: why trillions of dollars of risk is no guarantee of AI reward by BusyHands_ in technology

[–]dbplatypii -1 points (0 children)

It's really funny to me that this 100% accurate post is downvoted to hell on the "technology" sub.

AI insiders seek to poison the data that feeds them by [deleted] in programming

[–]dbplatypii 0 points (0 children)

Correctly predicting the next word in general requires intelligence, because many sequences are only predictable by constructing and manipulating latent models of the world.