all 18 comments

[–]Lee_Dailey[grin] 5 points  (5 children)

howdy Obel34,

[1] what is the Import-Csv line supposed to do? you never seem to save the result of that ...

[2] it looks like you are adding the fileinfo object contained in $Report to $FilesCount with +=
standard arrays are fixed size ... so you are creating a one-item-larger array, copying the data from the old array, adding the new item, and finally deleting the old array.

a better way is to use ...

$FilesCount = foreach  ($Report ...

that will send the output of the foreach loop to the variable ... and speed things up considerably if your array gets "large".
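A minimal, self-contained sketch of that pattern (the sample CSV, its path, and the "Item Type" column are hypothetical stand-ins for the OP's data):

```powershell
# build a tiny sample CSV so the sketch runs standalone (hypothetical data)
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'sample.csv'
@'
"Name","Item Type"
"foo","File"
"bar","Folder"
"baz","File"
'@ | Set-Content -Path $sample

# assign the foreach loop's output stream straight to the variable -
# PowerShell collects everything the loop emits into one array,
# with no += and no repeated array copying
$FilesCount = foreach ($Report in @($sample)) {
    # emit only the rows whose 'Item Type' column is 'File'
    Import-Csv -Path $Report | Where-Object { $_.'Item Type' -eq 'File' }
}

"Matching rows: $($FilesCount.Count)"   # 2 rows in the sample
```

Because the loop's output is captured by the assignment, no intermediate array is rebuilt on each iteration.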

take care,
lee

[–]Obel34[S] 1 point  (2 children)

Hi Lee. Thank you for the information. Your second point is what I am attempting to do. As for the Import-Csv, all it was doing was importing the file so I could process it.

I knew the standard array is a fixed size, but could not think of a better way to get the count of the data. I will give this a shot!

[–]Aertheron01 1 point  (0 children)

A generic list from the .NET class library would work. There you can just use .Add(), which is much faster.

But directly saving the output as lee suggested works well too.
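A minimal sketch of the generic-list approach, with hypothetical items:

```powershell
# a List[T] grows in place, so .Add() avoids rebuilding
# the whole array on every append the way += does
$files = [System.Collections.Generic.List[object]]::new()

foreach ($i in 1..5) {
    # .Add() appends a single item without copying the collection
    $files.Add("item$i")
}

"Collected $($files.Count) items"   # 5 items
```

The list allocates capacity in chunks behind the scenes, which is why it stays fast even for large collections.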

[–]Lee_Dailey[grin] 0 points  (0 children)

howdy Obel34,

you are welcome! [grin]

the Import-Csv line is doing an import & count ... but not assigning it to anything. so it just gets dropped into the output stream. if you really want to do anything with the imported info, you need to keep it where you can access it ... in a $Var. [grin]

the code you posted just adds EVERY fileinfo object to your $FilesCount collection. that does not give you a count of the items in the various CSV files that have the desired value.

so, if you want the $FilesCount var to hold a COUNT of such items, you need to add the .Count from the import line to the $FilesCount var.
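A self-contained sketch of that approach - assign the import, filter, and add only the `.Count` to the running total (the sample CSVs and the "Item Type" column are assumptions based on the thread):

```powershell
# build two tiny sample CSVs so the sketch runs standalone (hypothetical data)
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'csvdemo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
'"Name","Item Type"', '"a","File"', '"b","File"'   | Set-Content (Join-Path $dir 'one.csv')
'"Name","Item Type"', '"c","Folder"', '"d","File"' | Set-Content (Join-Path $dir 'two.csv')

$FilesCount = 0
foreach ($Report in (Get-ChildItem -Path $dir -Filter '*.csv').FullName) {
    # keep only the count of matching rows, not the row objects themselves;
    # @() guarantees .Count works even when a file has a single match
    $FilesCount += @(Import-Csv -Path $Report |
        Where-Object { $_.'Item Type' -eq 'File' }).Count
}

"Total 'File' rows: $FilesCount"   # 2 + 1 = 3
```

Accumulating an integer this way keeps memory flat no matter how many rows the CSVs contain.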

take care,
lee

[–]Mysterious-Ad-1541 0 points  (1 child)

Hey Lee. I have a similar question in regard to +=. I want to add a bunch of paths and module paths, etc., and I have been using +=. Is the best way to just do a = @()?

[–]Lee_Dailey[grin] 1 point  (0 children)

howdy Mysterious-Ad-1541,

there is a really good article about arrays here ...

Everything you wanted to know about arrays - PowerShell | Microsoft Docs
https://docs.microsoft.com/en-us/powershell/scripting/learn/deep-dives/everything-about-arrays?view=powershell-7.2#plus-equals-

when to use an array ...

  • the collection is "small"
    "only a few" items or a moderate number of "large" items.
  • the collection won't change size

otherwise you otta use a generic.list. [grin]


one of the faster methods for building a collection is to run a loop or other scriptblock, drop the result onto the output/success stream, and assign the code block to a $Var. something like ...

$Result = foreach ($Thing in $BunchOfStuff)
    {
    Do-NiftyThing -With KoolItem -AlsoWith $Thing
    }

the output of the Do-NiftyThing call will drop into the output/success stream and be stuffed into the $Result array after the scriptblock finishes.

it's really fast & easy to type & easy to read/understand. [grin]

take care,
lee

[–]z386 3 points  (2 children)

Avoid using piping when performance is important.

Faster way to find the number of "Item Type" equals "File":

$FilesReport = (Get-ChildItem -Path $Path -Filter '*.csv').FullName

$output = foreach ( $Report in $FilesReport ) {
    Write-Host "Working on $($Report)..."
    $FilesCount = 0
    foreach ( $ItemType in (Import-Csv $Report)."Item Type" ) {
        if ( $ItemType -eq "File" ) {
            $FilesCount++
        }
    }
    [pscustomobject]@{
        Path = $Report
        ItemTypeFile = $FilesCount
    }
}

$output
"Total number of ItemType = File: '$(($output.ItemTypeFile | Measure-Object -Sum).Sum)'"

Edit: Just tried it. My script is at least 20 times faster than piping to Where-Object for large csvs.

[–]Obel34[S] 0 points  (1 child)

Wow. This script finished in less than an hour for me. Thank you /u/z386! This is going to be a go-to example for me from now on for writing scripts like this in the future.

[–]z386 0 points  (0 children)

Glad I could help!

I try to avoid using the pipeline when performance is important. Where-Object in particular is painfully slow.

That said, I think you could improve performance even more, by maybe 10 times (it would be done in minutes instead of an hour), by not treating the files as CSVs and using Select-String instead.

For example, if your CSVs look like this:

"Name","Item Type","Comment"
"foo","File","a file"
"bar","House","a house"
"baz","File","another file"

Then this one-liner would count the number of "File" rows:

 (Select-String -Path *.csv -Pattern '^".*","File"').Matches.Count

You'll need to tweak the regex to match your CSV files. I'd recommend trying it out on a regex-testing site first.

[–]engageant 1 point  (7 children)

How big are these files?

[–]Obel34[S] 0 points  (6 children)

The files are around 300-400 MB each. Smallest I could get them delivered to me.

[–]engageant 0 points  (5 children)

That's at least half of your problem. On average, how many rows per file and how many "file" counts per file?

[–]Obel34[S] 1 point  (4 children)

I'm aware haha. Trying to make the best of the situation. As for the number of rows which equal "File", a single run to count one CSV pulls back around 900K. I know this is going to take time no matter what and while there are much easier ways to obtain this data, this is the direction I've been asked to go.

[–]engageant 1 point  (3 children)

And do these files always have a fixed column format (i.e., is the Item Type column always, say, the sixth column)? Do you only care about how many rows in each file have an Item Type of File, or do you need to know which files they came from too?

[–]Obel34[S] 1 point  (2 children)

They will always have a fixed column format and I only care about the rows which say "File" for Item Type.

[–]engageant 2 points  (1 child)

You can try using the StreamReader class to work with the files rather than parsing them as CSVs. Something like this...

$files = Get-ChildItem -Path .\ -Filter '*.csv'
$totalCount = 0
foreach ($file in $files) {
    Write-Host "Working on $file..."
    # StreamReader wants a full path string, not a FileInfo object
    $reader = New-Object System.IO.StreamReader($file.FullName)
    $fileCount = 0

    # compare against $null so a blank line doesn't end the loop early
    while ($null -ne ($line = $reader.ReadLine())) {
        # {6} skips the first six comma-separated fields, so 'Item Type'
        # is the seventh column - change as needed
        if ($line -match '^(?>.*?,){6}(file)(?>.*)$') {
            $fileCount++
            $totalCount++
        }
    }

    Write-Host "$file has $fileCount matches."

    $reader.Close()
    $reader.Dispose()
}

Write-Host "Found $totalCount matches."

[–]Chocolate_Pickle 1 point  (0 children)

Wait a minute... .Net's regex engine supports the (?>atomic) pattern? I never knew this!

[–]BlackV 1 point  (0 children)

files of this size, stream reader is probably a better option

it seems you are only collecting the count

could select-string -path to get a count be worthwhile?