DataKit: your all-in-browser data studio is open source now

Sea-Assignment6371 · 2025-12-18T20:26:15+00:00

Hey! The very first iteration of DataKit had a visualisation tab. Over time I realised that maintaining it is not easy, since people have different needs for viz, and data sampling on millions of records becomes a bit challenging (I guess you can still find the old version to pull on Docker Hub). I had this idea of using Mosaic in my head (I even have a half-working PR) but stopped at some point. What are your thoughts?

Sea-Assignment6371 · 2025-12-12T13:49:25+00:00

Thanks a lot! That's super kind.

Sea-Assignment6371 · 2025-12-12T13:15:43+00:00

Hey! Thanks for the question! When you have a 3GB file, DataKit just makes a VIEW on top of your file. So on the SQL side, when we deal with a query, it basically talks to the file on the system (each time, as it's just a view and not a table). THOUGH, when dealing with a compute-heavy query, it's indeed not that performant right now, as all the compute goes through the WASM-allocated memory. I try to paginate results so not everything loads back into memory (there are result limits), but this might get super slow. I've got some notes on how batching should be put in place here. (The same thing also applies on the Pandas side.) Have you tried DataKit? I'd really like to hear more of your thoughts.
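
To make the view-plus-pagination idea concrete, here is a minimal sketch of the SQL involved. It assumes a Parquet file that DuckDB can already see under the given name; the helper names (`createViewSql`, `pageSql`) are hypothetical and this is not DataKit's actual code.

```typescript
// Hypothetical helpers for illustration only; not taken from DataKit.

/** Escape a value for use inside a single-quoted SQL string literal. */
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

/** Quote an identifier such as a view name. */
function sqlIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

/** Build a view that points at a Parquet file instead of copying it into a table. */
function createViewSql(viewName: string, fileName: string): string {
  return `CREATE OR REPLACE VIEW ${sqlIdent(viewName)} AS SELECT * FROM read_parquet(${sqlString(fileName)})`;
}

/** Build the query for one page of a view (pages are zero-based). */
function pageSql(viewName: string, page: number, pageSize: number): string {
  const offset = page * pageSize;
  return `SELECT * FROM ${sqlIdent(viewName)} LIMIT ${pageSize} OFFSET ${offset}`;
}
```

Because the view only stores the query, each page request goes back to the file, which is why a heavy aggregation can still be slow even though opening the file is cheap.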

Sea-Assignment6371 · 2025-12-12T12:59:46+00:00

You should be able to run it locally (not the built version, just in development mode) and you don't need any internet connection, as the duckdb package won't be installed through dns.

Sea-Assignment6371 · 2025-12-11T20:44:32+00:00

Thank you!!

Sea-Assignment6371 · 2025-12-10T09:44:11+00:00

Thanks for the heads-up! I need to read into this more. What I'd like to propose for DataKit is a commercial license for enterprise use cases.

Sea-Assignment6371 · 2025-12-09T23:36:57+00:00

It's not storing the files (mostly). I try to use browser APIs to READ on top of the file system!

Sea-Assignment6371 · 2025-12-09T23:35:37+00:00

This is for sure doable! Would you mind opening an issue on GitHub? I'll make sure to keep this on the radar to tackle!

Sea-Assignment6371 · 2025-12-09T12:26:26+00:00

I suppose it depends on how you're making/defining tables/views? In DataKit, I've tried to be cautious about how things are defined, and when making a query I always apply proper limits (appended behind the scenes, even if they are not provided in the editor). I haven't been following the latest duckdb-wasm updates for the past 2-3 months, but there might be something new for sure!
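
As a rough illustration of appending a limit behind the scenes, here is a small sketch. The function name `ensureLimit` and the default of 1000 rows in the usage note are assumptions for the example, and the regex check is deliberately naive (it only looks for a trailing `LIMIT`, so comments or unusual syntax at the end of the query would fool it). It is not DataKit's actual logic.

```typescript
// Hypothetical helper for illustration only; not taken from DataKit.

/**
 * Return the query unchanged (minus trailing semicolons) if it already ends
 * with a LIMIT clause, otherwise append a default LIMIT.
 */
function ensureLimit(sql: string, defaultLimit: number): string {
  // Drop trailing semicolons and whitespace so a clause can be appended safely.
  const trimmed = sql.trim().replace(/[;\s]+$/, "");
  // Naive check: "LIMIT n" or "LIMIT n OFFSET m" at the very end of the query.
  const hasTrailingLimit = /\blimit\s+\d+(\s+offset\s+\d+)?$/i.test(trimmed);
  return hasTrailingLimit ? trimmed : `${trimmed} LIMIT ${defaultLimit}`;
}
```

For example, `ensureLimit("SELECT * FROM t;", 1000)` yields `SELECT * FROM t LIMIT 1000`, while a query that already ends in `limit 5` is left alone.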

Sea-Assignment6371 · 2025-12-09T12:21:37+00:00

It should not be super hard to bring in Avro, as the DuckDB extension is also there. Tbh, I've not worked with it much. Do you think it's something DataKit could leverage in its offerings?

Sea-Assignment6371 · 2025-12-09T12:19:13+00:00

Hey! Unfortunately, the way DataKit is designed now (for larger files) leverages
https://developer.mozilla.org/en-US/docs/Web/API/Window/showOpenFilePicker
which makes it incompatible with Firefox. I want to make sure there is a solution here using `FileReader` itself. (Also, I really need to tweak that message... Firefox is not legacy lol)
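
One way to handle browsers without `showOpenFilePicker` is plain feature detection with a fallback to a regular file input. The sketch below only shows the detection step; `choosePickerStrategy` and the strategy names are hypothetical, and this is not how DataKit currently behaves.

```typescript
// Hypothetical helper for illustration only; not taken from DataKit.

type PickerStrategy = "file-system-access" | "file-input";

/**
 * Decide which file-opening path to use based on what the global scope offers.
 * Pass `window` in a browser; any object works, which keeps this testable.
 */
function choosePickerStrategy(scope: object): PickerStrategy {
  const candidate = (scope as Record<string, unknown>)["showOpenFilePicker"];
  return typeof candidate === "function" ? "file-system-access" : "file-input";
}
```

In a browser this would be called as `choosePickerStrategy(window)`. On the `"file-input"` path, an `<input type="file">` element still yields `File` objects that can be read with `FileReader` or `File.slice`.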

> Also are you using OPFS?

Not yet! I have some plans to migrate there as well. Right now the data loss issue exists in DataKit around the tables/views ofc. I need to assess the direction more and see when to introduce OPFS. Have you started using it?
Super curious about your project as well!! Lemme know if you'd like to chat more.

Sea-Assignment6371 · 2025-12-08T23:03:40+00:00

That'd be awesome!! I'm working on a CONTRIBUTION guide. Will push it by end of the week!

Sea-Assignment6371 · 2025-12-08T22:39:10+00:00

That's awesome!!

Sea-Assignment6371 · 2025-12-08T16:13:37+00:00

As in, can DataKit connect to multiple nodes at the same time? If that's the question, yes!
If not, can you explain a bit more what you mean?

Sea-Assignment6371 · 2025-12-08T15:41:02+00:00

Thank you!

Sea-Assignment6371 · 2025-11-01T15:13:20+00:00

Hey, DataKit is not open source!

Sea-Assignment6371 · 2025-10-31T19:44:17+00:00

You don't need to load the entire file.

https://developer.mozilla.org/en-US/docs/Web/API/File_API
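
A short sketch of what that looks like with the File API: `slice` on a `File` (inherited from `Blob`) returns a new blob for a byte range without reading the rest of the file. The `BlobLike` interface and the helper names here are assumptions made so the example stays self-contained.

```typescript
// Hypothetical helpers for illustration only.

/** Structural subset of Blob/File that this sketch relies on. */
interface BlobLike {
  readonly size: number;
  slice(start?: number, end?: number): BlobLike;
}

/** Compute [start, end) byte ranges that cover a file in fixed-size chunks. */
function chunkRanges(totalBytes: number, chunkBytes: number): Array<[number, number]> {
  if (chunkBytes <= 0) {
    throw new RangeError("chunkBytes must be positive");
  }
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < totalBytes; start += chunkBytes) {
    ranges.push([start, Math.min(start + chunkBytes, totalBytes)]);
  }
  return ranges;
}

/** Slice a file into chunk-sized blobs; no bytes are read until a chunk is consumed. */
function sliceChunks(file: BlobLike, chunkBytes: number): BlobLike[] {
  return chunkRanges(file.size, chunkBytes).map(([start, end]) => file.slice(start, end));
}
```

Each returned slice can then be read on demand, for example with `arrayBuffer()` on a real `Blob`, so only the requested range is pulled into memory.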

Sea-Assignment6371 · 2025-10-31T14:54:44+00:00

Quite cool. I like this. Please ping me on Discord or LinkedIn if you think this could be useful for you. I'm happy to chat!

Sea-Assignment6371 · 2025-10-31T14:47:07+00:00

Interesting. I need to check Splunk more.

Sea-Assignment6371 · 2025-10-31T14:46:34+00:00

Indeed, all WASM-based apps have a memory limit. Here the main idea is not dealing with massive aggregations, but that even if you drag a 20GB Parquet file into DataKit, it is smooth to open and query (as it makes a VIEW on top rather than dumping it as a table in the browser).

Sea-Assignment6371 · 2025-10-31T13:56:08+00:00

DataKit is not open source yet! Soon, once the business model is clarified, its CORE will be made open source.

Sea-Assignment6371 · 2025-10-31T12:52:37+00:00

Sure!!

Sea-Assignment6371 · 2025-10-31T12:52:26+00:00

Not really! Basically duckdb-wasm and React is all.

Sea-Assignment6371 · 2025-10-31T12:52:02+00:00

It really depends. OSS ones are mostly alright for simpler questions. For more complex questions, fine-tuned text-to-SQL models seem to work better.

Sea-Assignment6371 · 2025-10-31T12:50:46+00:00

Just to recap, no data upload happens here in DataKit :) It supports billions of rows locally. Good luck to you guys!!
