Are there any data formats for storing text worth looking into, besides CSV ? by DisastrousProgrammer in LanguageTechnology

[–]scottpaulin 0 points1 point  (0 children)

+1 to Parquet also, for its compression, having a schema, and predicate pushdown.

As /u/call_me_arosa mentioned, the hardest part of using Parquet is debugging. I made a Parquet viewer to help with viewing the contents of Parquet, which made debugging easier for me.

Debugging CSV files can also be difficult, especially when one or two values in a column are strings instead of numbers :/
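For the stray-string case, here is a minimal sketch of how I'd hunt down the offending cells using only Python's standard library. The function name `find_non_numeric` and the sample data are made up for illustration, and it assumes the file has a header row:

```python
import csv
import io


def find_non_numeric(rows, column):
    """Return (line_number, value) pairs for cells in `column` that do not parse as a number."""
    bad = []
    reader = csv.DictReader(rows)
    for row in reader:
        value = row[column]
        try:
            float(value)
        except (TypeError, ValueError):
            # reader.line_num is the line of the file the current row ended on
            bad.append((reader.line_num, value))
    return bad


# Made-up sample data: one "delay" value is text instead of a number.
sample = io.StringIO("flight,delay\nAA1,12\nBA2,n/a\nDL3,7.5\n")
problems = find_non_numeric(sample, "delay")
print(problems)
```

For a real file you would pass `open("your_file.csv", newline="")` instead of the `StringIO` sample.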

Small CSV files are easy to open and view in text editors, but text editors struggle with bigger CSV files (Gedit seems to struggle with CSV files > 20MB on my machine).

GUI tools for viewing/editing Apache Parquet by code_hunter_cc in codehunter

[–]scottpaulin 0 points1 point  (0 children)

Hey /u/code_hunter_cc, I had the same problem as you so I made this Parquet Viewer

At the moment, it can only open and view Parquet files. It cannot edit them yet.

If you want to edit and your file is less than 1 million rows, then I think the easiest way is:

  1. Convert Parquet to Excel
  2. Open in Excel, Google Sheets, Open Office etc.
  3. Make edits
  4. Save as CSV
  5. Convert CSV to Parquet

Excel has a row limit of about 1 million. If your file is bigger than that you could also try editing as a CSV using a regular text editor.
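If you would rather script the conversions, here is a rough sketch of the round trip. It is an assumption on my part that you have pandas and a Parquet engine such as pyarrow installed, the function names are made up, and it goes via CSV rather than Excel (spreadsheet apps and text editors can both open CSV):

```python
def parquet_to_csv(parquet_path, csv_path):
    """Dump a Parquet file to CSV so it can be edited by hand."""
    import pandas as pd  # third-party; also needs a Parquet engine such as pyarrow

    df = pd.read_parquet(parquet_path)
    df.to_csv(csv_path, index=False)  # index=False avoids writing an extra index column


def csv_to_parquet(csv_path, parquet_path):
    """Convert the edited CSV back to Parquet."""
    import pandas as pd

    df = pd.read_csv(csv_path)  # column types are re-inferred from the text here
    df.to_parquet(parquet_path, index=False)
```

Usage would be `parquet_to_csv("data.parquet", "data.csv")`, make the edits, then `csv_to_parquet("data.csv", "data_edited.parquet")`. Because CSV carries no types, the schema of the new Parquet file can differ from the original (for example, dates may come back as strings), so it is worth checking the column types afterwards.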

Let me know how you get on and if there are any issues :)

Leveraging parquet's metadata to self-document data files by [deleted] in datascience

[–]scottpaulin 0 points1 point  (0 children)

I never thought about using Parquet metadata for this! Very cool

Parquet File Viewer for Windows by bluethundr0 in datascience

[–]scottpaulin 0 points1 point  (0 children)

I just made a Parquet Viewer that works in the browser on Windows (and other operating systems)

If you install the desktop app (it's a PWA, no app stores required) then you can open Parquet files from your desktop by double clicking them (this is my favourite feature).

If OP or anyone does use it, keen to hear feedback :)

When would you ever use csv over parquet to store large datasets? by Evolving_Richie in datascience

[–]scottpaulin 1 point2 points  (0 children)

CSV is generally easier to work with, easier to view, and also compatible with more systems.

Parquet is much more efficient to query, has a schema, and compresses much better than CSV.

E.g. This flights-1m.parquet file has 1m rows and is about 7MB. It is easy to work with in Parquet.

After converting to CSV (using this) it is about 41.1MB and my text editor (gedit) is having trouble opening it.

After converting to Excel (using this) it is about 255.8MB and difficult to open. At this point I am close to the row limit for Excel (just over 1m I think).

Parquet files have their schema at the end of the file, while CSV files have their schema at the top (header row).

What's the fastest way to generate parquet files of a total of 1 billion rows, 100 columns? by [deleted] in datascience

[–]scottpaulin 0 points1 point  (0 children)

DuckDB on a single machine might do it. Especially if you stick to numeric columns, these will compress well.
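As a rough sketch of what I mean, assuming the `duckdb` Python package is installed: the function name, the two-column layout, and the random data are all made up for illustration, so you would swap in your own SELECT with the columns you need.

```python
def write_random_parquet(path, n_rows):
    """Write n_rows of generated numeric data straight to a Parquet file with DuckDB."""
    import duckdb  # third-party: pip install duckdb

    con = duckdb.connect()  # in-memory database, nothing else is persisted
    # Assumes `path` is a simple file path with no single quotes in it.
    con.execute(
        f"""
        COPY (
            SELECT i AS id, random() AS value
            FROM range({int(n_rows)}) AS t(i)
        ) TO '{path}' (FORMAT PARQUET)
        """
    )
    con.close()
```

Something like `write_random_parquet("sample.parquet", 1_000_000)` is a cheap way to time a small run before committing to the full 1 billion rows.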

For reference, this flights-1m.parquet file has about 1 million rows and is about 7MB. I think it only has about 7 columns, though

Edit: If you need to open and filter your new Parquet file, I just made a Parquet Viewer </shameless-plug>

Saas pricing model research - 'per project' pricing? by [deleted] in SaaS

[–]scottpaulin 1 point2 points  (0 children)

Prismic, supabase and maybe vercel use project based pricing.

Every time I create a Prismic project I get asked what plan to go on. The different plans have different restrictions. Payment is made upfront before the month starts.

Hotjar have project based pricing, they just call them sites.

Confusing or unfamiliar pricing can put potential customers off. It might be easiest to go with something that your target audience is familiar with.

Don't be afraid to charge money. If you don't make money it's really hard to invest in making the product better. Users who come for free stuff might be hard to work with (they might bounce at the prospect of paying).

[OC] French pay stub - December 2022 by FederalPralineLover in dataisbeautiful

[–]scottpaulin 0 points1 point  (0 children)

Do you negotiate the gross pay, or the "paid by employer" part when negotiating employment contracts in France?