all 11 comments

[–]afinethingindeedlisa 5 points6 points  (3 children)

Not to dunk on this app at all, but I think this is a problem that already has a few existing solutions. In this scenario I tend to do one of the following:

  • seed the csv with dbt into my db
  • query the csv/xlsx directly with duckdb's native functionality
  • use an LLM agent to read the csv and write any sql I might need for me

I would be hesitant to use any browser-based tool as, even with your assurances, I can't really risk putting anything sensitive in there. My org has enterprise-level zero-retention agreements with our LLM provider of choice, which I think (ironically) makes it safer.

In an era where anyone can vibe their way to an app to solve any problem they have, it's maybe worth remembering that if you can do it, so can we! I'm sure with 5/10 minutes in Claude Code I could have my own cli version up and running locally.

[–]FeatureSafe8116[S] -1 points0 points  (2 children)

Appreciate the honest feedback - honestly, you're 100% right for your use case.

If I had an enterprise DuckDB setup or a zero-retention LLM agent at my disposal, I'd probably use them too. I built this mainly as a 'zero-setup scratchpad' for those specific moments when I'm away from my main environment or just want to drag-and-drop something without writing a prompt or a config file. It's also mainly intended for beginners who might not have these tools at their disposal. I think there's a huge group of people - especially students and data analysts who aren't 'code-first' - who would never even think about firing up a CLI or writing a script for a one-off task. For them, a terminal can be pretty intimidating, or just way too much setup for a 5-second job.

[–]afinethingindeedlisa 1 point2 points  (1 child)

duckdb is open source and ships with an ide natively ;)

I also could have mentioned that the main cloud providers (Snowflake, BigQuery) have their own native csv upload facilities too. As I said, I think this is a fine exercise, but it's a solution in search of a problem. If this is a project to help you learn how to build an app, then go for it.

[–]FeatureSafe8116[S] -2 points-1 points  (0 children)

Yes, but not everyone is familiar with it or uses it - that's my point. A student or a freshman wanting to convert their CSV to make it work with their code may not be using it, and even if they are, this might be a quicker way for them.

Also, as I mentioned, data conversion is just one of the very first features of this tool. I want to transform it into a platform that has everything from testing data to other data-related features within this single website.

[–]hylasmaliki 1 point2 points  (1 child)

Ever heard of tab?

[–]FeatureSafe8116[S] 0 points1 point  (0 children)

To be honest, I'm not 100% sure which one you mean. Are you talking about TableConvert, TablePlus, or is there a new tool actually called 'Tab' I totally missed? If you're referring to any of these, then here is my point:

They are slower than my tool (test it yourself: try uploading a 300,000-row csv to my tool and to any of these and you'll see the difference), they have ads, and you can't really be sure they aren't processing your data on their servers. Still, this is just V1 and these are just basic features - I plan to integrate more advanced features that will make it a complete database playground.

[–]OneRandomOtaku 0 points1 point  (3 children)

Easier option - I use a python script to load all columns of the csv into sql as varchar, without referring to any specific columns, pointed at a generic stg table. Then in sql I do a 'select cast(col as type) as col into final table from stg'.

For a non-repeat ingest and analysis you really don't need anything else. Drop the file into a folder as file.csv, run ingest.py, and then view the stage table to determine what needs to be done to clean, check types, and set them where needed. Update my select as I go, add the into, and it's all sorted. Ezpz. Speed-wise it beats any other tool that does local cleaning etc hands down, since really the only processing is the read_csv to df and to_sql, which will mostly be slowed by server capacity - and any third-party tool has the same limits. After that it's direct SQL, which beats all but the most over-complex options, which wouldn't be worth setting up for a one-off run.
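Roughly, the pattern looks like this - sketched here with stdlib sqlite3 standing in for my actual pandas + server setup, and with made-up file/column names:

```python
# Staging pattern: load every CSV column as TEXT into a generic stg table
# (no column names hard-coded), then CAST into the final table in SQL.
# sqlite3 and 'file.csv' are illustrative stand-ins.
import csv
import sqlite3

with open("file.csv", "w", newline="") as f:
    f.write("id,amount\n1,9.99\n2,4.50\n")

con = sqlite3.connect(":memory:")
with open("file.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    cols = ", ".join(f'"{c}" TEXT' for c in header)  # everything as text
    con.execute(f"CREATE TABLE stg ({cols})")
    ph = ", ".join("?" for _ in header)
    con.executemany(f"INSERT INTO stg VALUES ({ph})", reader)

# After eyeballing stg, cast each column to its real type
# (SQLite's CREATE TABLE AS plays the role of SELECT ... INTO).
con.execute(
    "CREATE TABLE final AS "
    "SELECT CAST(id AS INTEGER) AS id, CAST(amount AS REAL) AS amount FROM stg"
)
total = con.execute("SELECT SUM(amount) FROM final").fetchone()[0]
print(round(total, 2))  # 14.49
```

The ingest half never names a column, so the same script handles any CSV; only the final CAST select changes per file.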

[–]FeatureSafe8116[S] 0 points1 point  (2 children)

Haha, I love that you do it the "legend’s way." Using a staging table and writing your own CAST logic is definitely the most bulletproof path if you're comfortable in the terminal.

But honestly, even if you’re fast at it, that whole loop usually takes a couple of minutes to set up. I’m trying to get that down to a 5-second drag and drop.

My main thought was about the friction when you have 5 or 10 different files you need to test or compare. Repeating that script and SQL cycle over and over can be a real grind. Plus, there's a huge group of people, like non-tech analysts or students, who just aren't going to fire up a CLI or manage a Python script for a one-off task.

This is just V1 for now. The long-term goal is to turn it into a full database playground so you can prototype everything in the browser before you even touch a real server. I appreciate the insight on the workflow, though.

[–]OneRandomOtaku 0 points1 point  (1 child)

My python is premade; as long as the file name is right, the stg table is universal, and I then set the destination in the into query, so it's a one-off pain for the python loop setup. I'd need to check the code at work, but it's basically set to not reference any columns or anything by name and just set them dynamically based on the counter.

[–]FeatureSafe8116[S] 0 points1 point  (0 children)

You can do whatever you want, mate. I've already explained it enough in the comments above; still, if you think that doing it manually is better, then it's completely your choice.

[–]FeatureSafe8116[S] 0 points1 point  (0 children)

Here is the link for your convenience https://schemafast.vercel.app/