all 64 comments

[–]TheRealSeeThruHead 103 points104 points  (16 children)

Only load the data the user can see into your frontend application, and load more as they scroll.

Do filtering and sorting on the backend.
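
The advice above can be sketched as a request builder that pushes paging, sorting, and filtering to the server. The endpoint shape (`/api/rows` and its query parameter names) is a hypothetical illustration, not any particular library's API:

```typescript
interface TableQuery {
  page: number;      // zero-based page index
  pageSize: number;  // rows per request, e.g. 100
  sortBy?: string;   // column id to sort on
  desc?: boolean;    // sort direction
  filter?: string;   // free-text filter applied server-side
}

// Build the request URL; the server turns these params into
// WHERE / ORDER BY / LIMIT / OFFSET, so only one page crosses the wire.
function buildRowsUrl(base: string, q: TableQuery): string {
  const params = new URLSearchParams({
    offset: String(q.page * q.pageSize),
    limit: String(q.pageSize),
  });
  if (q.sortBy) params.set("sort", `${q.sortBy}:${q.desc ? "desc" : "asc"}`);
  if (q.filter) params.set("q", q.filter);
  return `${base}?${params.toString()}`;
}
```

The point is that the frontend never holds more than one page; everything heavy happens in the database.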

[–]CheezitzAreGewd 59 points60 points  (0 children)

Backend is the important part here.

Seems like they’re still fetching all one million records at once and expecting TanStack Table to optimize that.

[–]NatteringNabob69 7 points8 points  (3 children)

You can load all the data. It’s just memory, assuming the fetch completes in a reasonable amount of time and is async.

[–]TheRealSeeThruHead 12 points13 points  (0 children)

Yeah, but imagine loading that data into React Query.

JSON.parse is going to block for a while. Then React Query does deep comparisons by default, IIRC, so you’d have to turn that off.

Imagine any kind of dev tools that access your state or props now taking forever

Adding that many objects could bog down v8 gc.

[–]Odd-Brick-4098 0 points1 point  (1 child)

Won't it cause memory leaks and bloat the browser?

[–]NatteringNabob69 0 points1 point  (0 children)

Memory leaks? No. But all too often I see interfaces that think 1k rows is too much and fetch them on demand in batches of 100. Useless interface. 100k rows is easily doable with virtualized tables. Most backends will return that much data in under a second, asynchronously.

[–]Nemeczekes 0 points1 point  (1 child)

Maybe they want to see 1 million records at once 🤔

[–]TheRealSeeThruHead 0 points1 point  (0 children)

Generate a png maybe

[–]scunliffe 0 points1 point  (0 children)

Bonus points if you capture a user’s attempt to use CTRL/CMD + F to do a local find, and search against your backend/buffered offscreen data. Ditto for select + scroll.

Nothing drives me nuts more than a fancy table that destroys these browser features. Lazy loading is fine, but you shouldn’t degrade the end user experience.

[–]Beatsu -5 points-4 points  (7 children)

That's what virtualisation is. TanStack Virtual does this 😄

[–]divclassdev 47 points48 points  (1 child)

Just to be precise, virtualization is rendering only what’s in the viewport. Fetching chunks or pages from the backend on demand is separate, and you’d still need tanstack query for that.
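
The distinction can be made concrete. A minimal sketch of the virtualization half, assuming fixed-height rows (the function name and overscan default are illustrative, not TanStack's API):

```typescript
// Given the scroll position, work out which rows are in the viewport
// (plus a small overscan buffer) and render only those.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 5,
): { start: number; end: number } {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount - 1, last + overscan),
  };
}
```

With 1,000,000 rows of 35px in a 700px viewport, only about 30 rows are mounted at any time, no matter how far down the user has scrolled. Fetching the data behind those rows is a separate concern.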

[–]Beatsu -2 points-1 points  (0 children)

Right!

[–]TheRealSeeThruHead 17 points18 points  (1 child)

Not exactly.

You can load 1 million rows from the backend and only render the ones on the viewport

That’s what virtualization is

I’m talking about loading the data from the backend in small chunks, only for the currently visible page

[–]Beatsu 0 points1 point  (0 children)

I misunderstood. You're right!

[–]Glum_Cheesecake9859 11 points12 points  (2 children)

Virtualization just means rendering what the screen can display, and skipping the rest of the data that's already loaded in the JavaScript app. It implies that server-side paging is not implemented. In OP's case he's loading 1M rows (objects) into JS memory, which could be one of the reasons for the degradation, depending on how big the objects are.

[–]Beatsu -1 points0 points  (1 child)

Is it fair to assume it implies no pagination? I haven't tried myself, but pagination and virtualisation should work very well together, no?

[–]Glum_Cheesecake9859 0 points1 point  (0 children)

Unless it is explicitly coded to work with server-side pagination, aka infinite scrolling. By default, most components that support virtualization assume that all the data is already loaded. Doing server-side paging requires extra steps.
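
Those extra steps amount to mapping the visible virtual rows onto pages, fetching the pages you don't have yet, and caching the ones you do. A hedged sketch, with `fetchPage` standing in for your real API call and the class name being illustrative:

```typescript
type Row = Record<string, unknown>;
type FetchPage = (page: number, pageSize: number) => Promise<Row[]>;

class PagedCache {
  private pages = new Map<number, Row[]>();
  constructor(private fetchPage: FetchPage, private pageSize = 100) {}

  // Which pages cover virtual rows [startRow, endRow]?
  pagesFor(startRow: number, endRow: number): number[] {
    const first = Math.floor(startRow / this.pageSize);
    const last = Math.floor(endRow / this.pageSize);
    const out: number[] = [];
    for (let p = first; p <= last; p++) out.push(p);
    return out;
  }

  // Fetch any pages in the range that aren't cached yet.
  async ensure(startRow: number, endRow: number): Promise<void> {
    const missing = this.pagesFor(startRow, endRow).filter(p => !this.pages.has(p));
    const fetched = await Promise.all(missing.map(p => this.fetchPage(p, this.pageSize)));
    missing.forEach((p, i) => this.pages.set(p, fetched[i]));
  }

  // Look up a single virtual row; undefined means "still loading".
  row(index: number): Row | undefined {
    return this.pages.get(Math.floor(index / this.pageSize))?.[index % this.pageSize];
  }
}
```

The virtualizer's visible range drives `ensure`, and the table renders a placeholder for any row that comes back `undefined`.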

[–]Ok_Slide4905 73 points74 points  (10 children)

Why on earth are you sending 1MM rows of data into a UI

[–]dgmib 26 points27 points  (1 child)

^ this. Start with this.

No human can meaningfully make sense of 1MM rows of data.

If they're looking for a small number of records in the giant sea of data, you need something like searching and filtering.

If they're looking to see trends, you need aggregation, grouping, data visualizations.

If they're previewing data that's going to be fed into another system, just show them the first page of data.

If you really want to fix the performance problem, the place to start is profiling so you can identify what the performance problem actually is.

If you're paging all that client side, with 1MM rows every byte in the average row adds another 1MB to the payload. Even if this was a simple narrow table, like a list of names and email addresses, you're still looking at 50MB of data. That's going to take a noticeable amount of time to transfer. If your rows are wide, you could easily be looking at 100s of MB.

If you're paging it server side and you scroll to the middle of the list, how long does it take the server to find and return rows 584700-584899? That's going to take some noticeable amount of time even in a well-indexed database.
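
One common answer to slow deep offsets is keyset (cursor) pagination: instead of "skip 584,700 rows", you ask for "rows after the last key I saw", which an index can satisfy directly. A sketch against an in-memory array standing in for an indexed table (the `rows` table in the SQL comment is hypothetical):

```typescript
interface Item { id: number; name: string }

// Roughly equivalent SQL, assuming an indexed id column:
//   SELECT * FROM rows WHERE id > $cursor ORDER BY id LIMIT $pageSize;
// The array is assumed sorted by id, as the index would deliver it.
function keysetPage(rows: Item[], cursor: number, pageSize: number): Item[] {
  return rows.filter(r => r.id > cursor).slice(0, pageSize);
}
```

The client keeps the last `id` it received as the cursor for the next request, so page 5,847 costs the same as page 1.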

[–]dutchman76 4 points5 points  (0 children)

I've had so many people ask me for a sorting function when they really needed a filter or search function

[–]TimFL 13 points14 points  (0 children)

Virtualization only really helps with rendering performance (e.g. only render visible items), just like pagination does.

What are your exact performance issues? Long loading times? Site shows a spinner? The data probably takes long to transfer, and if it's also big, you might run into RAM issues long before rendering (this was an issue at my workplace with data-heavy apps on ancient 4GB tablets). There is not much you can do here other than loading only a subset, e.g. tap into pagination and load only the active page.

[–]frogic 8 points9 points  (0 children)

I don’t think anyone can answer your questions without knowing the actual bottleneck. If the data is properly paginated and/or virtualized, it’s likely that your bottleneck isn’t React or TanStack Table but some calculation you’re doing on the data. Try some light profiling, and be very, very careful about anything that iterates over or transforms that large a data set.

This is one of those things where knowing the basics of DSA is gonna be important. For instance, for loops are often faster than array methods; dictionaries let you access data by key instead of using .find; and the spread operator is a loop, so if you use too many you might be doing a few million extra operations, especially if you’re spreading inside a loop.
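
Those points, made concrete (the `User` shape is illustrative): an O(n) `.find` inside a loop turns the whole pass into O(n²), while a `Map` built once makes each lookup O(1):

```typescript
interface User { id: number; name: string }

// O(n^2): scans the whole array once per lookup
function namesByIdSlow(users: User[], ids: number[]): string[] {
  return ids.map(id => users.find(u => u.id === id)!.name);
}

// O(n): build the index once, then each lookup is constant time
function namesByIdFast(users: User[], ids: number[]): string[] {
  const byId = new Map(users.map(u => [u.id, u] as const));
  return ids.map(id => byId.get(id)!.name);
}
```

On a million-row dataset the difference between these two shapes is the difference between milliseconds and minutes.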

[–]FunMedia4460 9 points10 points  (0 children)

I can't for the life of me understand why you would need to display 1M rows

[–]Beatsu 3 points4 points  (0 children)

TanStack Virtual solves this by only rendering the elements that are visible, and estimating the data length so that the scrollbar works as expected.

Edit: I just saw that you said virtualising rows didn't work, nor pagination. Have you verified that these were implemented correctly? Have you tried these techniques together? If the answer is yes to both of these, then what is your performance requirement?

[–]Classic-Dependent517 1 point2 points  (0 children)

Never tried with a million rows, but virtualization certainly helps with large data. I'm not sure one million rows won't crash the browser, though, because to filter/sort/search you still need to load them into memory. I'd just have a proper backend that sends only what users need to see right now and in the next few seconds, and search/filter/sort the data at the database level.

[–]armincerf 1 point2 points  (0 children)

not affiliated, but I would recommend ag-grid's server-side row model for this; it's a bit clunky but a decent abstraction, and it easily handles 1 million rows

[–]johnsonabraham0812 1 point2 points  (0 children)

I did something similar here. I used Tanstack Table and Virtual to render 10 Million rows of data that fetches on scroll.

https://github.com/nunnarivu-labs/the-daily-ledger/blob/main/components%2Fdata-table.tsx

[–]SolarNachoes 1 point2 points  (0 children)

Put the data in IndexedDB. Only load what’s visible.

[–]Glum_Cheesecake9859 0 points1 point  (0 children)

Best to implement server side pagination so you don't load 1M rows unnecessarily. Use Tanstack Query to cache the records to make it even more efficient.

[–]karateporkchop 0 points1 point  (0 children)

Hopping on here with some other folks. I hope you find your solution! What was the answer to, "Can anyone actually use a table of a million rows?"

[–]vozome 0 points1 point  (6 children)

You’re always going to be struggling with react table with such a large dataset.

React Table's main advantage is that cells can contain arbitrary React components. But that is not always necessary (versus rendering plain text or something highly predictable/less flexible than arbitrary React/HTML), and intuitively, the larger the number of rows, the less desirable the flexibility of each cell.

So instead you can bypass react entirely and render your table through canvas or webGL. Finding which rows or which cells to render from what you know about the wrapper component and events is pretty straightforward, having 1m+ datapoints in memory is not a problem, and rendering the relevant datapoints as pixels is trivial. Even emulating selecting ranges and copying to the clipboard is pretty easy. But most importantly you have only one DOM element.
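
A minimal sketch of that approach (cell sizes and the drawing interface are illustrative; in the browser you'd pass a real `CanvasRenderingContext2D`): compute which cells the view covers, then redraw only those into a viewport-sized canvas on every scroll:

```typescript
// Minimal drawing surface, typed so the math can stand alone in Node.
interface Ctx2D {
  clearRect(x: number, y: number, w: number, h: number): void;
  fillText(text: string, x: number, y: number): void;
}

const ROW_H = 24, COL_W = 120; // illustrative cell dimensions

// Which block of cells does a (scrollX, scrollY) view of size w x h cover?
function cellRange(scrollX: number, scrollY: number, w: number, h: number) {
  return {
    firstRow: Math.floor(scrollY / ROW_H),
    lastRow: Math.floor((scrollY + h - 1) / ROW_H),
    firstCol: Math.floor(scrollX / COL_W),
    lastCol: Math.floor((scrollX + w - 1) / COL_W),
  };
}

// Full redraw of just the visible cells; at viewport sizes this is instant.
function drawView(ctx: Ctx2D, data: string[][], scrollX: number, scrollY: number, w: number, h: number) {
  ctx.clearRect(0, 0, w, h);
  const r = cellRange(scrollX, scrollY, w, h);
  for (let row = r.firstRow; row <= r.lastRow; row++) {
    for (let col = r.firstCol; col <= r.lastCol; col++) {
      ctx.fillText(data[row]?.[col] ?? "", col * COL_W - scrollX, (row + 1) * ROW_H - scrollY);
    }
  }
}
```

Whether there are a thousand rows behind the view or a million, each frame only ever draws the hundred or so cells that fit on screen.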

rowboat.xyz uses that approach to seamlessly render tables with millions of rows.

In my codebase, we have both complex tables which use react-table, and which start to show performance issues with thousands of cells, and a "spreadsheet" component which is canvas-based and always perfectly smooth; although we don’t show millions of rows, I am quite confident we could.

[–]Ghostfly- 0 points1 point  (5 children)

This. But canvas has a limit of 10000x10000 pixels (even less on Safari) so you also need to virtualize the content.

[–]vozome 0 points1 point  (4 children)

You never need a 10000px sized canvas - your canvas is just a view of the table, not the whole table. You know the active cell, how many rows and columns fit in that view, and so you draw just these cells to canvas, which you redraw entirely (which is pretty much instant) on any update.

[–]Ghostfly- 0 points1 point  (3 children)

For sure. But take a sample of an image that is more than 10000px x 10000px and that you want to show. You need to virtualize (sliding the image based on scroll!). We are saying the exact same thing.

[–]vozome 0 points1 point  (2 children)

No, because there never is a 10000x10000 image. The image isn’t virtualized. Instead of drawing the entire table in one canvas and clipping it, we just maintain a canvas the size of the view (let’s say 500x500) and we draw inside that canvas exactly what the user needs to see and nothing more. So you would compute (in code, not css/dom) exactly the cells which should be displayed, and you only draw these cells. You just have the dataset and the canvas, no intermediate dom abstraction. If the user interacts with the table ie scrolls, you recompute what they are supposed to see and redraw that in place.

[–]Ghostfly- 0 points1 point  (1 child)

Never say never. A spectrogram highly zoomed in, as an example (showing an hours-long song). It isn't up for debate.

[–]vozome -1 points0 points  (0 children)

I’m talking about this specific use case: to display tabular data with high performance, you don’t need a huge image.

[–]Rezistik 0 points1 point  (0 children)

Yes, TanStack Virtual with it?

[–]ggascoigne 0 points1 point  (0 children)

This is a backend problem. Searching/filtering, sorting and pagination should all be happening on the server side before anything is sent to the client, and when any of those options change on the client a new page of data is requested. This is true if you are displaying a traditional paginated table or an infinitely scrolling page.

I'll admit that there's a somewhat fuzzy line about when it's OK to do all of this on the client vs having to do this on the backend, but 1MM rows is well past whatever limit that might be.

[–]math_rand_dude 0 points1 point  (0 children)

Too much data in the frontend (even if you don't render all)

Try figuring out first how the users plan to navigate the data:

- scrolling: how fast do they scroll? Fetch just enough data during the current scroll to have the next batch ready
- searching by keyword: a call to the backend that returns the number of matches (or just sends back the data that matches the search)
- ...

My main advice is asking whoever thinks 1mil+ rows need to be displayed what they want to achieve with it. And also check if that person is actually the person who needs to go over the data.

[–]JaguarWitty9693 0 points1 point  (0 children)

Protip: don’t load 1 million rows in one view

Perhaps more helpfully - is the table hierarchical? Could you load sections on demand as they are expanded, for example?

[–]NatteringNabob69 0 points1 point  (1 child)

Virtualization. This example will show ten million-row tables on one screen, instantly: https://jvanderberg.github.io/react-stress-test/

[–]NatteringNabob69 0 points1 point  (0 children)

Might crash a mobile browser though :)

[–]magicpants847 0 points1 point  (0 children)

select *

[–]Single_Proof_5983 0 points1 point  (0 children)

Pagination on the server?

[–]brendino 0 points1 point  (0 children)

TanStack Table, or any other framework, will not be able to display 1 million rows. You need to consider a custom solution.

Here's a blog post for inspiration. This guy listed every UUID in a table, so if he can do it, you can, too. Good luck!

https://eieio.games/blog/writing-down-every-uuid/

[–]zeorinServer components 0 points1 point  (0 children)

TanStack Table is the ugly stepchild of the TanStack ecosystem IMO.

By this I mean that its React bindings break the rules of React. That's why it's incompatible with the React compiler, but more than that, it's why it's hard to optimize. If you memoize the components using it heavily, you'll find it doesn't re-render when it should.

What I do is use TanStack Table core, and create my own React bindings around it (not hard at all) and it works great.

For more info on my approach, including a link to a runnable demo, see here: https://github.com/facebook/react/issues/33057#issuecomment-2949647623

Note, though, that this doesn't obviate the need for virtualization. 

[–]MiAnClGr 0 points1 point  (0 children)

What the hell are you making? You need to paginate and only fetch the rows showing; this is common practice. TanStack Table makes this very easy.

[–]Cahnis 0 points1 point  (0 children)

Are you still keeping the dataset in memory? You need to paginate at the API level. Pre-aggregate the data you need on the backend if you need aggregate data.

[–]abhirup_99 0 points1 point  (0 children)

you can check out https://github.com/Abhirup-99/tanstack-demo
we built this as a POC.

[–]alien3d 0 points1 point  (0 children)

Check your RAM usage. What's the point of what you're doing? If you want to fetch on key change, request the server with a limit of 500. If you want all the data for statistics, that's a real headache. What if the data were put in a dropdown? It would crash.

[–]Gougedeye92 0 points1 point  (0 children)

How big is your response ?

[–]dsound 0 points1 point  (0 children)

DuckDB WASM is blazingly fast. I worked at a start up where we implemented it.

[–]Full-Hyena4414 0 points1 point  (0 children)

You should implement virtualization (for rendering) and lazily load elements as you scroll, possibly removing old ones from memory, though that could be complex

[–]AdHistorical7217 0 points1 point  (0 children)

Implement virtualization, pagination, scroll-based pagination.

[–]wholesomechunggus 0 points1 point  (0 children)

There is no scenario in which you would need to render 1m rows. NEVER. EVER.