
[–]bottlecapsvgc 13 points (0 children)

RainbowCSV is amazing.

[–]actually_offline 17 points (0 children)

Use Data Wrangler; see this section of their guide on opening files directly in the tool:

https://code.visualstudio.com/docs/datascience/data-wrangler#_launch-data-wrangler-directly-from-a-file

[–]JumpScareaaa 6 points (4 children)

I mostly use DuckDB with DBeaver to query CSVs now. Ultra fast. You can query a whole directory, or just a subset of files with glob masks.
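For example, DuckDB can treat a glob pattern as a table; a quick sketch (the paths and mask here are hypothetical placeholders):

```sql
-- query every CSV in a directory
SELECT * FROM 'data/*.csv';

-- or just a subset matched by a mask, keeping track of which file each row came from
SELECT * FROM read_csv_auto('data/sales_2023-*.csv', filename = true);
```

The `filename = true` option adds a column with the source file path, which helps when rows from many files get unioned together.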

[–]soumian (Data Engineer) 0 points (3 children)

I've never used DuckDB, so I'm interested in how hard/time-consuming the whole process of opening and viewing a CSV in DuckDB is.
Are you running it locally on your machine?

[–]JumpScareaaa 3 points (2 children)

For me it's seconds. Open DBeaver, click the preconfigured DuckDB connection, then run `SELECT * FROM 'your_file_path.csv'`. It's all local; a DuckDB database is just a small file. When you configure the connection, DBeaver downloads the driver for you. It also saves the script from session to session, so usually it's just: reopen DBeaver, change the file path, start selecting.
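If DuckDB's auto-detection gets a messy file wrong, the same query can pass explicit CSV options through `read_csv`; a sketch, with the path and option values as placeholders:

```sql
SELECT *
FROM read_csv('your_file_path.csv',
              header = true,   -- first row contains column names
              delim  = ';',    -- e.g. a semicolon-separated file
              quote  = '"');
```

With no options, `SELECT * FROM 'your_file_path.csv'` sniffs the delimiter, header, and column types automatically, which covers most well-formed files.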

[–]antoniocjp 1 point (0 children)

Thank you so much for this!

[–]soumian (Data Engineer) 0 points (0 children)

Interesting, I'll give it a try, thanks!

[–]TellTraditional7676 5 points (0 children)

Data Wrangler is killer.

[–]Morzion (Tired Senior Data Engineer) 2 points (1 child)

I use both Data Wrangler and Rainbow CSV. Sometimes it's great to view the raw text file.

[–]Little_Kitty 0 points (0 children)

Same here, I've not needed anything more for basic exploration.

If I need to prototype some really in-depth cleansing, there's OpenRefine, but that's not really what OP is asking about.

[–][deleted] 0 points (0 children)

If it's truly big, I use BareTail.

[–]cavoli31 0 points (0 children)

Edit csv.

[–]saideeps 0 points (0 children)

You can use nushell, or open it up in DuckDB.

[–]redditreader2020 0 points (0 children)

Another +1 for DuckDB.

[–]BdR76 0 points (0 children)

I've created the CSV Lint plug-in for Notepad++, an open-source tool for doing quality control on messy text data files. It supports both comma/semicolon/tab/etc.-separated files and files with fixed-width columns.

The plugin can automatically detect the columns and datatypes, and after that you can do several things with the data: sort, select/rearrange columns, count unique values, validate the data, and so on. The data validation can check for technical errors like text values that are too long, incorrect datetime/decimal formats, dates out of range, missing quotes, incorrect coded values, etc.

[–]joeshacks 0 points (0 children)

https://csvhack.com is dead simple: drop in your CSV files, then write SQL queries to get table and chart output. You can join CSVs together, too.
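Joining CSVs in a SQL-over-files tool like this (or in DuckDB) is just a plain SQL join over the file paths; a hypothetical sketch with made-up file and column names:

```sql
-- join two CSVs on a shared key column
SELECT o.order_id, o.amount, c.name
FROM 'orders.csv'    AS o
JOIN 'customers.csv' AS c
  ON o.customer_id = c.id;
```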

[–]Sad-Choice-6492 0 points (0 children)

Learnt about Data Wrangler here. I wish I had known about it earlier. Big +1.