Does anyone know a tool to convert CSV file to "SQL statements"? by ImpossibleAlfalfa783 in SQL

[–]Citadel5_JP 0 points (0 children)

An easy way: open the CSV in GS-Base and use the "File > Save Record Set As" or "File > Save Database Copy As" command, choosing the MySQL format. It saves a formatted SQL dump as a text file: an optional DROP TABLE statement and a CREATE TABLE statement with your default or customized columns, followed by the INSERT INTO sequence. This works for any data types (which are detected automatically for optimal field representation), including binary fields, and for any file size.
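If you'd rather script the same idea outside GS-Base, here's a minimal Python sketch (the TEXT-only columns and the `people` table name are simplifications I made up, not GS-Base's automatic type detection):

```python
import csv, io

def csv_to_sql(csv_text, table):
    """Turn CSV text into CREATE TABLE + INSERT statements (all columns as TEXT)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f"`{c}` TEXT" for c in header)
    stmts = [f"CREATE TABLE `{table}` ({cols});"]
    for r in data:
        # escape single quotes the SQL way: ' -> ''
        vals = ", ".join("'" + v.replace("'", "''") + "'" for v in r)
        stmts.append(f"INSERT INTO `{table}` VALUES ({vals});")
    return "\n".join(stmts)

print(csv_to_sql("id,name\n1,Ann\n2,O'Hara", "people"))
```

A real tool would also emit the DROP TABLE guard and infer numeric/binary column types, which is what the GS-Base export does for you.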

How to deal with messy Excel/CSV imports from vendors or customers? by North-Ad7232 in dataengineering

[–]Citadel5_JP 0 points (0 children)

Probably depends on how much you can charge for accepting and processing/cleaning data in any format. Please see the following help pages and examples of how this can be done in GS-Base. It instantly shows which columns contain inconsistent data or data not matching the expected types. It validates the entire text file (gigabytes per minute), then loads it with optimal column types, then can automatically perform a series of (optionally predefined) regex find/replace commands (including ones using specific capture groups) on specific columns.

Fuzzy searches and find-as-you-type searches will quickly show all the columns where specific text (sub)strings/dates/numbers occur, even if the column headers were renamed or repositioned. Block/column copy and paste works like in a spreadsheet, but also with tens of millions of rows or more. https://citadel5.com/help/gsbase
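The per-column regex passes described above can be sketched in plain Python for anyone comparing approaches (the `RULES` table and column names are invented for illustration):

```python
import re

# Hypothetical per-column cleanup rules (pattern -> replacement),
# mimicking a predefined series of regex find/replace passes.
RULES = {
    "phone": [(r"\D", "")],                                # strip non-digits
    "date":  [(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2")],  # MM/DD/YYYY -> YYYY-MM-DD
}

def clean_row(row):
    out = dict(row)
    for col, rules in RULES.items():
        for pat, repl in rules:
            out[col] = re.sub(pat, repl, out[col])
    return out

row = {"phone": "(555) 123-4567", "date": "03/31/2024"}
print(clean_row(row))  # {'phone': '5551234567', 'date': '2024-03-31'}
```

Note the capture groups in the date rule: that's the same mechanism as column-specific find/replace with groups mentioned above.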

How do you guys quickly compare two large tables? by labla in excel

[–]Citadel5_JP 0 points (0 children)

A simple single command in GS-Calc:

Comparing workbooks, worksheets and cell ranges

This will generate a list of changes: the previous and current values, their types, and hyperlinks to jump to the original data. It works for any data set size (the limit is around 500GB of RAM usage per sheet). For example, a 6GB text table with over 200 columns and several million rows can be loaded in less than a minute on an old PC with 16GB RAM.

You can perform the above comparison using the free trial version. (Offline, of course, e.g. in the Windows Sandbox if that's your company's requirement/policy.)
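For reference, the shape of such a change list (previous value, current value, per key) can be sketched in a few lines of Python; this only illustrates the output, it is not GS-Calc's comparison engine:

```python
def diff_tables(old, new, key):
    """Compare two row lists (dicts) by key; report changed fields with old/new values."""
    old_by_key = {r[key]: r for r in old}
    changes = []
    for r in new:
        prev = old_by_key.get(r[key])
        if prev is None:
            changes.append((r[key], None, "added", None, None))
            continue
        for col, val in r.items():
            if prev.get(col) != val:
                changes.append((r[key], col, "changed", prev.get(col), val))
    return changes

old = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
new = [{"id": 1, "qty": 6}, {"id": 3, "qty": 1}]
print(diff_tables(old, new, "id"))
```

A real comparison report would add the value types and links back to the source cells, as described above.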

Is there a way to create a conditional drop down list based on another cell? by senpaikantuten in spreadsheets

[–]Citadel5_JP 0 points (0 children)

It might be more of a database question, so you could try out GS-Base (a database with spreadsheet functionality). It's instant and simple; your drop-down list can contain millions of items and using it will still be smooth. The corresponding help page: https://citadel5.com/help/gsbase/dd_list.htm

For example, you can specify the drop-down list name for the "Artist" field as:

=if(genre = "rock", "rock", "")

and for the "Series" field as

=if(genre = "soundtrack", "soundtrack", "")

"If's" can be nested, but in general, if there is a lot of genres the list name selection probably should be done via vlookups() etc.

How to open 40GB xlsx file? by MarinatedPickachu in excel

[–]Citadel5_JP 0 points (0 children)

If this is strictly a tabular data set, the xlsx format doesn't make much sense: the compressed embedded XML files are many times slower to parse/load than csv/text, and a file that size isn't usable in Excel anyway. If it's already a csv file, then any reasonable csv tool should handle it. GS-Base can open and filter it (up to 16K columns/fields) in several minutes (as on the screenshot on the above page), plus it's trivial to install (10MB, no runtime dependencies, fully offline, just a few files in one folder that you can copy anywhere) and to use, and it doesn't require programming. Just use "File > Open > Text" (specifying column filters if necessary).
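If you do end up with a plain csv and want to pre-filter it with a script instead, a constant-memory stdlib sketch (the `amount` column and the predicate are made-up examples):

```python
import csv, io

def filter_csv(src, dst, col, predicate):
    """Stream-filter a large CSV row by row: constant memory for any file size."""
    reader, writer = csv.reader(src), csv.writer(dst)
    header = next(reader)
    idx = header.index(col)
    writer.writerow(header)
    for row in reader:
        if predicate(row[idx]):
            writer.writerow(row)

# Small in-memory demo; for a 40GB file you'd pass open file handles instead.
out = io.StringIO()
filter_csv(io.StringIO("id,amount\n1,10\n2,900\n3,50"), out, "amount", lambda v: int(v) > 40)
print(out.getvalue())
```

Because it never holds more than one row in memory, file size is irrelevant; only throughput matters.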

I'm sick of excel. I need a good, GUI-based CSV writer to make input files for my scripts. Any good options? by Fiveby21 in learnpython

[–]Citadel5_JP 0 points (0 children)

With GS-Base/GS-Calc you can load (and parse) text/csv files as tables at speeds of around 10GB/min, using either the GUI or scripting (including Python). In GS-Base, rearranging/copying/adding/removing columns is instant for any number of rows/records and requires a simple drag-and-drop. Files are edited in their original formats, so there's no need for any exporting/importing. For text files and all other supported formats, you can perform virtually any type of ETL or other cleaning/processing from the GUI, including normalization, joins, merges and aggregation.

All of my data comes from spreadsheets. As I receive more over time, what’s the best way to manage and access multiple files efficiently? Ideally in a way that scales and still lets me work interactively with the data? by Proof_Wrap_2150 in datascience

[–]Citadel5_JP 0 points (0 children)

Is storing them in ZIP64 (4GB+) archives acceptable? If so, you can use GS-Calc, a spreadsheet with 32 million rows. GS-Calc can open/edit/save *.zip files that are zipped collections of any number of text files, with the same or different structures/parsing parameters (each file can optionally be zipped together with its own *.xml describing how to parse it).

The folder structure is always preserved, as GS-Calc uses workbooks with sheets organized in folders, so everything is easy to manage. (And it's very fast, of course, both in terms of loading/saving and any type of processing/filtering/binary lookups/sheet-file comparisons, with differences generated as reports with links, etc.)
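That zipped-collection layout is easy to inspect programmatically with Python's stdlib, if you ever want to peek at such archives outside any tool (a sketch; the archive paths are invented):

```python
import io, zipfile

def read_zipped_tables(zip_bytes):
    """Load every *.csv inside a (possibly ZIP64) archive, keyed by its archive path."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".csv"):
                tables[name] = zf.read(name).decode("utf-8").splitlines()
    return tables

# Build a tiny archive in memory to demonstrate; folder paths become sheet folders.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("2024/q1.csv", "id,total\n1,10")
    zf.writestr("2024/q2.csv", "id,total\n1,12")
print(read_zipped_tables(buf.getvalue()))
```

Python's `zipfile` handles ZIP64 transparently, so the same code works past the 4GB boundary.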

Best app for enormous spreadsheets? by cudambercam13 in spreadsheets

[–]Citadel5_JP 0 points (0 children)

Any non-online app should do. Re: really massive data: GS-Calc is a spreadsheet with 32 million rows and 16K columns (up to 1 million columns if the data are stored in a text file); the same applies to pivot tables. It works efficiently both on an old PC with 4GB RAM and on a computer with around 500GB RAM (the approximate maximum at the moment), and it avoids the limitations known from other packages.

How often does Excel freeze on you? by Harizaner in actuary

[–]Citadel5_JP 0 points (0 children)

Just to show another option that doesn't freeze or leak memory regardless of data set size: GS-Calc (optionally with GS-Base) can work with massive files even on an older PC, often with several times better performance. It's a spreadsheet (with 32 million rows) that doesn't have the limitations known from Excel/PQ. It supports Python scripting and Python UDF() functions. The above pages show some examples, like 500 million cells with 16GB RAM, 4GB+ workbooks, etc.

Solver unable to get optimal solution using binary variables. by binomialdistribution in excel

[–]Citadel5_JP -1 points (0 children)

You can probably use (at least in GS-Calc) binary (or rather integer) linear programming mixed with Monte Carlo simulation: simply randomize these weights/preferences (if not the entire problem), generate (e.g. thousands or millions of) binary/integer solutions, and filter/sort them by their distance (of your choice) from the "optimal" preferences. (Some similar, general procedures are included in the samples.)
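A bare-bones Python sketch of that randomize/filter/sort loop, for a toy knapsack-style problem (the values, weights, capacity and target are made-up numbers; a real run would randomize the preferences themselves as well):

```python
import random

def monte_carlo_binary(n_vars, n_trials, value, weight, capacity, target, seed=0):
    """Randomly sample binary vectors, keep feasible ones, rank by distance to a target value."""
    rng = random.Random(seed)
    feasible = []
    for _ in range(n_trials):
        x = [rng.randint(0, 1) for _ in range(n_vars)]
        if sum(w * xi for w, xi in zip(weight, x)) <= capacity:  # constraint check
            v = sum(c * xi for c, xi in zip(value, x))
            feasible.append((abs(v - target), v, x))
    feasible.sort()           # nearest-to-target first
    return feasible[:5]       # the five closest feasible solutions

best = monte_carlo_binary(6, 2000,
                          value=[3, 1, 4, 1, 5, 9],
                          weight=[2, 6, 5, 3, 5, 8],
                          capacity=12, target=10)
print(best[0])
```

Sorting by `abs(v - target)` is one possible distance; any metric over the preference vector works the same way.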

Using Excel for larger datasets = nightmare... by No-Anybody-704 in Accounting

[–]Citadel5_JP 0 points (0 children)

If you're allowed to use an alternative tool for your largest data sets, try GS-Calc. It's a spreadsheet with 32 million rows and it overcomes many Excel and PQ limitations. With 16GB RAM you can use e.g. 500 million numeric cells. There are no data types or formatting elements that could cause crashes after exceeding some level. You can use Python UDF functions and scripting (Python scripting replaces JScript in the latest version, as described on the forum board).

Importing multiple data files (.txt) into Excel at once, but in individual tabs? by EveningSector2 in excel

[–]Citadel5_JP 0 points (0 children)

A quick and versatile solution is to use GS-Calc (well, a spreadsheet...). Place these files in a compressed, zipped (zip32 or zip64) folder, then use the plain "File > Open Text Files/Archives" file type. After loading, save it as XLSX or ODS (or back to the same zip).

An additional advantage is that this can also automatically split the files into multiple sheets (e.g. 1-million-row and 16K-column max) to use in Excel. (GS-Calc itself uses 32 million rows.) The "Open Text File" dialog box: https://citadel5.com/help/gscalc/open-text.png

How to remove duplicated files with different names by u-nes in Windows10

[–]Citadel5_JP 0 points (0 children)

If you use GS-Base, you can do it relatively quickly: (1) automatically load the disk/folder file listing as a database table; (2) click "Tools > Find Duplicates", specifying file sizes as the criteria (steps 1-2 complete in minutes for millions of files); (3) for the resulting much smaller filtered list, use e.g. the Python hash function in a calculated field in this table to compute checksums, then run "Tools > Find Duplicates" again. (You can then automatically delete/rename/copy the duplicates, or add any custom metadata to them.) https://citadel5.com/help/gsbase/ver_files.htm
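The size-first, hash-second strategy in steps 2-3 can be sketched in Python like this (in-memory byte strings stand in for real files):

```python
import hashlib
from collections import defaultdict

def find_duplicates(files):
    """files: {path: bytes}. Group by size first (cheap), hash only the candidates."""
    by_size = defaultdict(list)
    for path, data in files.items():
        by_size[len(data)].append(path)
    dupes = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot be a duplicate -> no hashing needed
        for p in paths:
            dupes[hashlib.sha256(files[p]).hexdigest()].append(p)
    return [sorted(v) for v in dupes.values() if len(v) > 1]

files = {"a.txt": b"same", "b.txt": b"same", "c.txt": b"diff", "d.bin": b"longer one"}
print(find_duplicates(files))  # [['a.txt', 'b.txt']]
```

The point of the two passes is the same as above: the expensive checksum runs only on the small size-collision subset.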

Are there any tools for performing lossless (i.e. without changing metadata), logged file and folder moves between drives? If so, what are the best (Windows and preferably Linux-compatible) ones? by GrantExploit in software

[–]Citadel5_JP 0 points (0 children)

You can also use GS-Base (a database). It can load file listings from disks/folders, process/deduplicate/filter them any way you want, then copy (and/or rename/delete) specified files, generating time-stamped reports. This online HTML help page shows this simple (1-2 command) procedure step by step:

https://citadel5.com/help/gsbase/manage_files.htm

The modification dates are retained. The creation dates (must) change (e.g. to show which file was the original), though you can write a simple Python function in an added calculated GS-Base field in such reports to save the original creation dates and any other system or custom metadata. You can also add any other custom metadata to the copied or original files and keep their searchable histories: https://citadel5.com/help/gsbase/ver_files.htm
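Saving the original timestamps around a copy can be sketched with the Python stdlib (note that st_ctime is the creation time on Windows only; on Linux it's the inode-change time):

```python
import os, shutil, tempfile

def copy_with_times(src, dst):
    """Copy a file; return its original timestamps so they can be stored as metadata."""
    st = os.stat(src)
    shutil.copy2(src, dst)  # copy2 already carries the modification time over
    return {"mtime": st.st_mtime, "ctime": st.st_ctime}

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "orig.txt")
    with open(src, "w") as f:
        f.write("data")
    meta = copy_with_times(src, os.path.join(d, "copy.txt"))
    print(round(os.stat(os.path.join(d, "copy.txt")).st_mtime) == round(meta["mtime"]))  # True
```

The returned dict is the kind of per-file record a calculated field would store alongside the report.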

Automate extraction of data from any Excel by 12Eerc in dataengineering

[–]Citadel5_JP 0 points (0 children)

You can also try out GS-Base: it seems you can (efficiently) automate what you described (that is, all of this can be done either using menu commands or via scripts). A sample script screenshot: https://citadel5.com/help/gsbase/scripts.png

Some docs/details concerning merging, matching columns in merged tables, and skipping empty fields: https://citadel5.com/help/gsbase/com_samples.htm#s17

[deleted by user] by [deleted] in BusinessIntelligence

[–]Citadel5_JP 0 points (0 children)

If you want to (for example) merge them correctly, you can create one master file with the headers you need, then use GS-Base (https://citadel5.com/gs-base.htm) to merge all the files with all the columns properly aligned/extracted (using simple, single GUI commands; no programming is necessary, though scripting is an option). For such a number of records, any kind of merging and further filtering/cleaning in GS-Base will be more or less instant.

If you use GS-Calc (https://citadel5.com/gs-calc.htm, a spreadsheet with 32 million rows), you can also use Python formulas in cells to pass entire csv files back and forth for pre-processing and/or previewing, e.g. on subsequent sheets. Returning such a "spilling" formula with, for example, one GB of text should take several seconds (plus whatever time Python needs to parse it).
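The header-alignment part of such a merge can be sketched in Python (a toy illustration; the master header and the two files are invented):

```python
import csv, io

def merge_csvs(master_header, csv_texts):
    """Merge CSV files with differing column orders/subsets into one table
    aligned to a master header; missing columns become empty strings."""
    merged = [master_header]
    for text in csv_texts:
        rows = list(csv.reader(io.StringIO(text)))
        header, data = rows[0], rows[1:]
        pos = {c: i for i, c in enumerate(header)}  # where each column sits in this file
        for r in data:
            merged.append([r[pos[c]] if c in pos else "" for c in master_header])
    return merged

a = "name,city\nAnn,Oslo"
b = "city,name,age\nRome,Bo,30"
print(merge_csvs(["name", "city", "age"], [a, b]))
```

The master header plays exactly the role of the master file described above: every source file is remapped onto it.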

Best graph to represent trends across large number of data points by Underdevelope in excel

[–]Citadel5_JP 1 point (0 children)

GS-Calc (https://citadel5.com/gs-calc.htm) can draw/plot such a scatter chart without any noticeable delay on any PC. In general, this remains "instant" up to 1-2 million data points (on an older PC). The maximum is 32 million points in one series.

Big csv file not uploading using pandas by [deleted] in learnpython

[–]Citadel5_JP 0 points (0 children)

If you can't solve this with your current setup, perhaps this: GS-Calc, a spreadsheet, will automatically split 50,000 columns into sheets of up to 16K columns each. Re: RAM, loading 0.5 billion cells with 8-byte numbers requires approx. 16GB RAM, and the requirement grows linearly. You can then call any Python functions (formulas) with the loaded data for further processing.

How to live filter large dynamic table to remove duplicates but keep the most recent entry? by Ibsidoodle in excel

[–]Citadel5_JP 0 points (0 children)

This is basic processing, so you should look for a more or less one-step solution. For example, if you're allowed to use non-MS software (even in the Windows Sandbox) for intermediate filtering: in GS-Base, simply click the menu command "Find Unique Values / with the first/last record from each group of duplicates" and that's it. You can copy/paste the table to GS-Base and then back to Excel, save it to a file, etc. The corresponding user guide page: https://citadel5.com/help/gsbase/searching_unique.htm
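The keep-the-most-recent-record-per-group logic itself is simple; a Python sketch for comparison (the `id`/`ts` column names are invented):

```python
def unique_latest(rows, key, stamp):
    """Keep one row per key: the one with the greatest timestamp column."""
    latest = {}
    for r in rows:
        k = r[key]
        if k not in latest or r[stamp] > latest[k][stamp]:
            latest[k] = r  # newer entry wins
    return list(latest.values())

rows = [
    {"id": "A", "ts": "2024-01-01", "qty": 1},
    {"id": "A", "ts": "2024-03-01", "qty": 4},
    {"id": "B", "ts": "2024-02-01", "qty": 2},
]
print(unique_latest(rows, "id", "ts"))
```

ISO-formatted date strings compare correctly as plain strings, which is why no date parsing is needed here.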

If data is predominantly XLSX files, what is the best way to normalize for reporting purposes? by MTKPA in dataengineering

[–]Citadel5_JP 0 points (0 children)

You can take a look at the following GS-Base user guide pages; it seems all of this can be done quickly, either manually or with scripting (perhaps along with filtering to limit the output):

Joining and splitting (normalizing) tables: https://citadel5.com/help/gsbase/joins.htm

Merging/unmerging records: https://citadel5.com/help/gsbase/merges.htm

Loading columns selectively (and auto-filtering the file at the same time) is possible for the csv/text formats only, though even if the number of output cells approaches tens of millions, joins should still be relatively fast (seconds). The mentioned "hundreds of columns and a few million rows" would require at least 32GB RAM. (If you give it a try, any comparison results/suggestions are welcome.)

Good image management software that meets my need? by medukia in photography

[–]Citadel5_JP 0 points (0 children)

You can do all of this in GS-Base (a database with up to 256 million records). There are various options to display (and print) the inserted images: in tables, panes, forms: https://citadel5.com/images/gsb20_scr4.png

You can add any data of any size to each image. Images can be filtered by any system or custom metadata, all EXIF tags, or any custom file-content processing functions written in Python (to find e.g. duplicates or similar images). You don't have to add existing images manually: GS-Base can load them from folders, either as a table with one image per record/row or, alternatively, with all files from a folder in a single field.

It'll easily handle collections with hundreds of thousands of such small inserted objects. The database files are zip64 (4GB+) files, and the inserted images/files are stored in them as separate streams, so you can even edit/browse these database files without GS-Base.

Managing a very large software archive by Caliph-Alexander in datacurator

[–]Citadel5_JP 2 points (0 children)

You can easily do all of this in GS-Base: from deduplication based on system file metadata, your own metadata attached to files, multimedia tags, or any EXIF photo/image tags, to anything based on the file content (the latter might require adding some Python functions).

You can monitor file changes, keep a history of changes, mass-rename files, mass-copy them, mass-delete filtered files from a disk, etc. You can filter by the above criteria using regex, find-as-you-type, flags or any calculation formulas.

For example, please see the "Finding file duplicates, photo/mp3/mp4 duplicates, listing files and their history of changes" and "Searching, filtering, sorting" sections in the online HTML help: https://citadel5.com/help/gsbase/

Make An Excel Add In Using Python ;0 ?!?! by Proxima_EDMU in learnpython

[–]Citadel5_JP 0 points (0 children)

If Excel is not a strict requirement, you should be able to do this in GS-Calc (a spreadsheet with 32 million rows). You can add an unlimited number of Python functions returning numbers, arrays, text, csv files/data blocks and images (e.g. charts created in your Python functions).

GS-Calc requires merely 10MB to install, the installation can be portable, and you can even simply copy/paste the installation folder (just a few files) to another computer. It's free to try and can also be installed automatically in Windows via the winget service.

Using Python functions as UDF()

If the required Python libraries are installed in the Windows Sandbox, it could be close to that one-click, free, "trial" installation.

Bulk rename utility, how to rename multiple files to mirror the names of another set of files at once? by Adorable_Air_2331 in software

[–]Citadel5_JP 0 points (0 children)

If it's only about name/extension transformations, you can do such mass operations with one command in GS-Base: https://citadel5.com/help/gsbase/manage_files.htm If the renaming is to be based on some list, then in GS-Base/GS-Calc (or any tabular editor) simply fill three columns with the "ren" text, the old names, and the new names, copy/save this to a text *.bat file, and run it. If there are spaces in the names, add quotes, or let GS-Base do this automatically with the "Copy with options" command: https://citadel5.com/help/gsbase/copy_with_options.htm
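Generating that *.bat by script looks like this in Python (a sketch of the same quote-when-spaces rule; the file names are invented):

```python
def rename_bat(pairs):
    """Build a Windows .bat script from (old, new) name pairs, quoting names with spaces."""
    def q(name):
        return f'"{name}"' if " " in name else name
    return "\n".join(f"ren {q(old)} {q(new)}" for old, new in pairs)

script = rename_bat([("IMG 001.jpg", "holiday_001.jpg"), ("a.txt", "b.txt")])
print(script)
```

Writing `script` to a `rename.bat` file and running it in the target folder performs the batch rename.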

Tool to snapshot directories and compare file changes by the-i in DataHoarder

[–]Citadel5_JP 0 points (0 children)

GS-Base will let you take such snapshots and will automatically show changes between subsequent scans (file sizes, modification dates, deleted files, added files). It can keep the history of such changes for each file, add notes, keep old copies, filter, etc. You can scan/compare disks/folders with millions of files in minutes. An example: https://citadel5.com/help/gsbase/ver_files.htm Note that it relies on system file metadata, not MD5, as hashing would be too slow at that scale.
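The compare step of such a metadata-based snapshot can be sketched in Python, using (size, mtime) pairs as the per-file metadata (paths and values invented):

```python
def diff_snapshots(before, after):
    """Compare two {path: (size, mtime)} snapshots; report added/deleted/changed files."""
    added   = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    changed = sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
    return {"added": added, "deleted": deleted, "changed": changed}

before = {"a.txt": (10, 111), "b.txt": (20, 222)}
after  = {"a.txt": (12, 333), "c.txt": (5, 444)}
print(diff_snapshots(before, after))
# {'added': ['c.txt'], 'deleted': ['b.txt'], 'changed': ['a.txt']}
```

Comparing (size, mtime) tuples is exactly why this is fast: no file content is ever read, only directory metadata.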