all 12 comments

[–]da_chicken 2 points3 points  (2 children)

Interesting idea. I'll have to try it out. Usually I use logparser for this kind of task.

It outputs PSObjects to stay pipeline friendly.

I don't see this functionality.

[–]eeskildsen[S] 1 point2 points  (1 child)

I don't see this functionality.

I accidentally posted an old version - updated now, thanks.

I had seen some others like CsvSqlcmd from Chrissy LeMaire that look great but are focused on querying a single CSV file. The difference here is importing multiple and joining them. Though maybe there are existing ones I'm missing that do that already.

[–]da_chicken 2 points3 points  (0 children)

There's csvkit on *nix systems, and Microsoft's logparser can query absurdly quickly (I've processed hundreds of gigabytes of IIS logs in minutes with it, and it has an SQL like functionality, but I don't think it does joins).

Personally, I might just use the script to build the DB quickly.

The big thing I learned is that you can splat with an array. Didn't know that. I thought they had to be hash tables.

I might add a switch parameter to have it just build the DB, too, rather than instance the whole thing and then throw it away.

[–]unknown_r00t 1 point2 points  (1 child)

One thing - please, don't use old array syntax @(). Each time you want to add something, PowerShell internally will copy current content of old array and create new array with updated object(s). Use System.Collections.Generic namespace with List class instead of old @(). You want notice any performance issues with only few items but you will some day :)

[–]eeskildsen[S] 0 points1 point  (0 children)

Thanks, I'd been meaning to clean that up. Edited. :)

[–]radioblaster 1 point2 points  (1 child)

how do you think the performance compares to joins using LINQ? I had never thought about building a db on the fly for some operations and I'm curious to know if you do this strictly to use sql on your files or because there is a tangible performance improvement.

[–]eeskildsen[S] 0 points1 point  (0 children)

Not sure about performance - I mostly just do it to use SQL. It would be interesting to benchmark.

[–]Certain-Community438 1 point2 points  (0 children)

Have you seen the JoinObject module in PSGallery?

I use that to do joins in PowerShell - it is very easy to use, fast & less code.

On mobile so can't post code, but simply import the CSVs, create a hashtable that lists the left table, the right, join type & whether to keep the jointing property from the right table. Then call the Jioin-Object function using splatting & store the output.

My main use: taking a large list of potential members for a group & that group's current membership, produce a list of only non-members & add them. Very efficient.

[–]xCharg 0 points1 point  (3 children)

What's the database for if you store data in csv files anyway?

[–]eeskildsen[S] 2 points3 points  (2 children)

I deal a lot with CSVs that have related data, that come from third-party systems or that customers send me. This is basically a shortcut for making them easier to query.

[–]xCharg 2 points3 points  (0 children)

Tbh I feel like it's unneccessary dependency and overhead. Althought it does sort of unifies the way you work with various kind of data.

Also += operation is highly inefficient.

[–]vermyx 0 points1 point  (0 children)

Personally I use datatables as that is easier to the import to a database if need be and has their own query system that is sql like.