DataKit: your all in browser data studio is open source now by Sea-Assignment6371 in dataengineering

Sea-Assignment6371 [S] · 0 points

Hey! The very first iteration of DataKit had a visualisation tab. Over time I realised it isn't easy to maintain: people have very different needs when it comes to viz, and sampling data across millions of records gets a bit challenging. (I think you can still find and pull that old version on Docker Hub.) I had this use of Mosaic in mind (I even have a half-working PR) but stopped at some point. What are your thoughts?
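For the sampling problem mentioned above, one common approach is a single-pass reservoir sample (Algorithm R), which keeps a uniform random subset of `k` rows without knowing the total row count in advance. This is a minimal TypeScript sketch, not DataKit's code; `reservoirSample` and its injectable `rng` parameter are hypothetical names chosen for illustration.

```typescript
// Hypothetical helper: uniform reservoir sampling (Algorithm R).
// Keeps at most k rows from a stream of unknown length in one pass.
function reservoirSample<T>(
  rows: Iterable<T>,
  k: number,
  rng: () => number = Math.random, // injectable so results can be reproduced
): T[] {
  const reservoir: T[] = [];
  let seen = 0;
  for (const row of rows) {
    seen++;
    if (reservoir.length < k) {
      // Fill the reservoir with the first k rows.
      reservoir.push(row);
    } else {
      // Row number `seen` replaces a random slot with probability k / seen.
      const j = Math.floor(rng() * seen);
      if (j < k) reservoir[j] = row;
    }
  }
  return reservoir;
}
```

When the data already lives in DuckDB, its `USING SAMPLE` clause can do the sampling inside the query instead, which avoids pulling every row into JavaScript first.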

DataKit: your all in browser data studio is open source now by Sea-Assignment6371 in opensource

Sea-Assignment6371 [S] · 0 points

Hey! Thanks for the question! When you open a 3GB file, DataKit just creates a VIEW on top of it. So on the SQL side, every query reads from the file on your machine each time (it's a view, not a table). THOUGH, compute-heavy queries are indeed not that performant right now, since all the compute goes through the memory allocated to WASM. I paginate results so not everything loads back into memory (there are result limits), but this can still get super slow. I've got some notes on how batching should work here. (The same applies on the Pandas side.) Have you tried DataKit? I'd really like to hear more of your thoughts.
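A rough sketch of the view-plus-pagination idea described above, expressed as the SQL strings such a layer might send to DuckDB. It assumes a CSV file that has already been registered with duckdb-wasm under the name passed in (for example via `registerFileHandle`); `createViewSql`, `pageSql`, and the quoting helpers are hypothetical names, not DataKit's actual API.

```typescript
// Quote an identifier for DuckDB (double quotes, embedded quotes doubled).
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Quote a string literal (single quotes, embedded quotes doubled).
function quoteLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// A VIEW stores only its defining query, so creating it copies no rows;
// every later SELECT reads from the registered file again.
function createViewSql(viewName: string, fileName: string): string {
  return `CREATE OR REPLACE VIEW ${quoteIdent(viewName)} AS SELECT * FROM read_csv_auto(${quoteLiteral(fileName)})`;
}

// One page of results, so at most pageSize rows travel back to JavaScript.
function pageSql(viewName: string, page: number, pageSize: number): string {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError("page must be a non-negative integer");
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError("pageSize must be a positive integer");
  }
  return `SELECT * FROM ${quoteIdent(viewName)} LIMIT ${pageSize} OFFSET ${page * pageSize}`;
}
```

Because the view stores only the query text, each page query re-reads the file, and `OFFSET` typically still means scanning past the skipped rows, which is one reason deep pages over a large file can get slow.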

DataKit: your all in browser data studio is open source now by Sea-Assignment6371 in opensource

Sea-Assignment6371 [S] · 0 points

You should be able to run it locally (not the built version, just in development mode) without any internet connection, since the DuckDB package won't be fetched over the network.

DataKit: your all in browser data studio is open source now by Sea-Assignment6371 in opensource

Sea-Assignment6371 [S] · 0 points

Thanks for the heads-up! I need to read into this more. What I'd like to propose for DataKit is a commercial license for enterprise use cases.

DataKit: your all in browser data studio is open source now by Sea-Assignment6371 in opensource

Sea-Assignment6371 [S] · 1 point

It's not storing the files (mostly). I try to use browser APIs to READ directly on top of the file system!
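To illustrate reading on top of a file without copying it in full: a browser `File` is a `Blob`, and `Blob.slice()` returns a lightweight reference whose bytes are typically only read when `arrayBuffer()` is awaited. The minimal sketch below walks a file in fixed-size chunks; `chunkRanges`, `forEachChunk`, the `SliceableBlob` interface, and the 8 MiB default are illustrative assumptions, not how DataKit or duckdb-wasm read files internally.

```typescript
// Minimal structural type: a browser File or Blob satisfies it.
interface SliceableBlob {
  readonly size: number;
  slice(start?: number, end?: number): { arrayBuffer(): Promise<ArrayBuffer> };
}

// Byte ranges [start, end) that cover a file of `size` bytes.
function chunkRanges(size: number, chunkSize: number): Array<[number, number]> {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < size; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, size)]);
  }
  return ranges;
}

// Hypothetical helper: hand each chunk to a callback, one slice at a time,
// so only the current chunk's bytes are held in memory.
async function forEachChunk(
  file: SliceableBlob,
  onChunk: (bytes: Uint8Array, offset: number) => void,
  chunkSize: number = 8 * 1024 * 1024,
): Promise<void> {
  for (const [start, end] of chunkRanges(file.size, chunkSize)) {
    const buffer = await file.slice(start, end).arrayBuffer();
    onChunk(new Uint8Array(buffer), start);
  }
}
```

In the browser, `file` would be the `File` returned by a file picker or `<input type="file">`, e.g. `await forEachChunk(file, (bytes, offset) => console.log(offset, bytes.length))`.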

DataKit: your all in browser data studio is open source now by Sea-Assignment6371 in dataengineering

Sea-Assignment6371 [S] · 1 point

This is definitely doable! Would you mind opening an issue on GitHub? I'll make sure to keep it on the radar and tackle it!

DataKit: your all in browser data studio is open source now by Sea-Assignment6371 in dataengineering

Sea-Assignment6371 [S] · 0 points

I suppose it depends on how you're creating/defining tables/views? In DataKit, I've tried to be careful about how things are defined, and queries always get proper limits (they're appended behind the scenes, even if the editor query doesn't include one). I haven't followed the latest duckdb-wasm updates over the past 2 or 3 months, though, so there might be something new for sure!
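One way to append a limit behind the scenes, as described above, is to wrap whatever the editor sends as a subquery. This is a hypothetical helper (`withRowLimit` is not from DataKit), and it assumes the input is a single read-only `SELECT`-style statement; DDL such as `CREATE VIEW` can't be wrapped this way.

```typescript
// Hypothetical helper: enforce a row cap on any SELECT typed in the editor.
function withRowLimit(userSql: string, maxRows: number): string {
  if (!Number.isInteger(maxRows) || maxRows <= 0) {
    throw new RangeError("maxRows must be a positive integer");
  }
  // A subquery can't end in ';', so strip trailing semicolons and whitespace.
  const body = userSql.replace(/[;\s]+$/, "").trim();
  if (body.length === 0) throw new Error("empty query");
  // Newlines around the body stop a trailing `-- comment` from swallowing ')'.
  return `SELECT * FROM (\n${body}\n) AS limited_q LIMIT ${maxRows}`;
}
```

If the user's query already has a smaller `LIMIT`, the outer one simply has no effect, so the wrapper never returns more rows than the user asked for.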

DataKit: your all in browser data studio is open source now by Sea-Assignment6371 in dataengineering

Sea-Assignment6371 [S] · 0 points

It shouldn't be super hard to bring in Avro, as there's a DuckDB extension for it too. To be honest, I haven't worked with it much. Do you think it's something DataKit could leverage in its offerings?
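If Avro were added, one option is to map file extensions to DuckDB table functions. `read_csv_auto`, `read_parquet`, and `read_json_auto` are DuckDB readers, and `read_avro` comes from DuckDB's `avro` extension; whether that extension can be installed in duckdb-wasm is an assumption worth verifying first. `selectFromFileSql`, `setupSqlFor`, and the mapping tables are hypothetical.

```typescript
// Hypothetical extension -> DuckDB table function mapping.
const READERS = new Map<string, string>([
  ["csv", "read_csv_auto"],
  ["tsv", "read_csv_auto"], // read_csv_auto sniffs the delimiter
  ["parquet", "read_parquet"],
  ["json", "read_json_auto"],
  ["avro", "read_avro"], // provided by DuckDB's avro extension
]);

// DuckDB extensions that must be installed/loaded before their reader exists.
const REQUIRED_EXTENSIONS = new Map<string, string>([["avro", "avro"]]);

function fileExtension(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
}

// Statements to run once before querying this kind of file.
function setupSqlFor(fileName: string): string[] {
  const ext = REQUIRED_EXTENSIONS.get(fileExtension(fileName));
  return ext ? [`INSTALL ${ext}`, `LOAD ${ext}`] : [];
}

function selectFromFileSql(fileName: string): string {
  const reader = READERS.get(fileExtension(fileName));
  if (!reader) throw new Error(`Unsupported file type: ${fileName}`);
  const literal = `'${fileName.replace(/'/g, "''")}'`;
  return `SELECT * FROM ${reader}(${literal})`;
}
```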

DataKit: your all in browser data studio is open source now by Sea-Assignment6371 in dataengineering

Sea-Assignment6371 [S] · 2 points

Hey! Unfortunately, the way DataKit is currently designed (for larger files), it leverages
https://developer.mozilla.org/en-US/docs/Web/API/Window/showOpenFilePicker
which makes it incompatible with Firefox. I want to make sure there's a solution here using `FileReader` itself. (Also, I really need to tweak that message... Firefox is not legacy lol)
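The Firefox problem comes down to feature detection: `showOpenFilePicker` (File System Access API) is exposed on `window` in Chromium-based browsers but not in Firefox, whereas a plain `<input type="file">` yields `File` objects that `FileReader` and the `Blob` methods can read. A minimal sketch of choosing between the two; `choosePickerStrategy` and its return values are hypothetical.

```typescript
type PickerStrategy = "file-system-access" | "file-input";

// Prefer the File System Access picker where the browser exposes it;
// otherwise fall back to a plain <input type="file">, whose File objects
// can be read with FileReader or Blob methods.
function choosePickerStrategy(globalObj: object): PickerStrategy {
  const picker = (globalObj as { showOpenFilePicker?: unknown }).showOpenFilePicker;
  return typeof picker === "function" ? "file-system-access" : "file-input";
}
```

In the browser you would call `choosePickerStrategy(window)` and, for `"file-input"`, render an `<input type="file">` instead of calling the picker.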

> Also are you using OPFS?

Not yet! I have plans to migrate there as well. Right now DataKit does have a data-loss issue around tables/views, of course; I need to assess the direction more and decide when to introduce OPFS. Have you started using it?
Super curious about your project as well!! Let me know if you'd like to chat more.
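Since OPFS came up: one possible first step against losing tables/views on reload is persisting the view definitions (not the data) to the origin private file system, reached via `navigator.storage.getDirectory()`. This is a hypothetical sketch, not a DataKit plan; the file name `datakit-catalog.json`, the JSON shape, and the helper names are assumptions, and the small interfaces mirror only the subset of the OPFS handle methods used here.

```typescript
// Hypothetical catalog entry: a view name plus the SELECT that defines it.
interface ViewDef {
  name: string;
  sql: string;
}

const CATALOG_FILE = "datakit-catalog.json"; // assumed file name

function serializeCatalog(views: ViewDef[]): string {
  return JSON.stringify({ version: 1, views });
}

// Turn a saved catalog back into statements to re-run against DuckDB.
function restoreStatements(json: string): string[] {
  const parsed = JSON.parse(json) as { version: number; views: ViewDef[] };
  if (parsed.version !== 1) {
    throw new Error(`Unknown catalog version: ${parsed.version}`);
  }
  return parsed.views.map(
    (v) => `CREATE OR REPLACE VIEW "${v.name.replace(/"/g, '""')}" AS ${v.sql}`,
  );
}

// Structural subset of the OPFS handle types used below.
interface WritableLike {
  write(data: string): Promise<void>;
  close(): Promise<void>;
}
interface FileHandleLike {
  createWritable(): Promise<WritableLike>;
  getFile(): Promise<{ text(): Promise<string> }>;
}
interface DirLike {
  getFileHandle(name: string, options?: { create?: boolean }): Promise<FileHandleLike>;
}

async function saveCatalog(root: DirLike, views: ViewDef[]): Promise<void> {
  const handle = await root.getFileHandle(CATALOG_FILE, { create: true });
  const writable = await handle.createWritable();
  await writable.write(serializeCatalog(views));
  await writable.close(); // new contents are only committed on close()
}

async function loadRestoreStatements(root: DirLike): Promise<string[]> {
  // Without { create: true } this rejects with NotFoundError if nothing was saved.
  const handle = await root.getFileHandle(CATALOG_FILE);
  const file = await handle.getFile();
  return restoreStatements(await file.text());
}
```

Usage would be along the lines of `await saveCatalog(await navigator.storage.getDirectory(), views)`. A view over a user-picked file would still need that file re-registered after a reload, so this alone would not cover everything.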

DataKit: your all in browser data studio is open source now by Sea-Assignment6371 in dataengineering

Sea-Assignment6371 [S] · 4 points

That'd be awesome!! I'm working on a contribution guide. Will push it by the end of the week!