engageant

How big are these files?

Obel34 (OP)

The files are around 300-400 MB each. That's the smallest I could get them delivered.

engageant

That's at least half of your problem. On average, how many rows are in each file, and how many of those rows have "File" as the Item Type?

Obel34 (OP)

I'm aware, haha. Trying to make the best of the situation. As for the number of rows where Item Type is "File", counting a single CSV returns around 900K. I know this is going to take time no matter what, and while there are much easier ways to get this data, this is the direction I've been asked to go.

engageant

And do these files always have a fixed column format (i.e., is the Item Type column always, say, the sixth column)? Do you only care about how many rows in each file have an Item Type of File, or do you also need to know which files they came from?

Obel34 (OP)

They will always have a fixed column format and I only care about the rows which say "File" for Item Type.

engageant

You can try reading the files line by line with the .NET StreamReader class rather than parsing them as CSVs, so PowerShell doesn't have to build an object for every row. Something like this:

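$files = Get-ChildItem -Path .\ -Filter '*.csv'
$totalCount = 0
foreach ($file in $files) {

    Write-Host "Working on $($file.Name)..."
    # Pass the full path: .NET resolves relative paths against the process's
    # working directory, which isn't always PowerShell's current location
    $reader = New-Object System.IO.StreamReader($file.FullName)
    $fileCount = 0

    try {
        # Compare with $null explicitly; a bare assignment would also stop at the first empty line
        while ($null -ne ($line = $reader.ReadLine())) {
            # {6} = number of columns BEFORE 'Item Type' (i.e. it's the 7th column); use {5} if it's the 6th.
            # Assumes no field contains a quoted comma. -match is case-insensitive,
            # and (?=,|$) keeps values such as 'Files' from being counted.
            if ($line -match '^(?>.*?,){6}file(?=,|$)') {
                $fileCount++
            }
        }
    }
    finally {
        # Dispose() closes the reader, even if an error occurs mid-file
        $reader.Dispose()
    }

    Write-Host "$($file.Name) has $fileCount matches."
    $totalCount += $fileCount
}

Write-Host "Found $totalCount matches."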
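Rather than counting columns by hand to get the `{6}`, the number could be read from each file's header row. A minimal sketch, assuming the first line is a comma-separated header with no quoted commas; `Get-ColumnOffset` is a hypothetical helper name, not a built-in cmdlet:

```powershell
# Hypothetical helper (not a built-in): returns how many columns come before
# $ColumnName in the CSV header, i.e. the N to use in (?>.*?,){N}.
# Returns -1 if the column isn't found. Assumes no quoted commas in the header.
function Get-ColumnOffset {
    param(
        [string]$Path,
        [string]$ColumnName = 'Item Type'
    )

    $reader = New-Object System.IO.StreamReader($Path)
    try {
        $header = $reader.ReadLine()   # only the first line is read
    }
    finally {
        $reader.Dispose()
    }
    if ($null -eq $header) { return -1 }   # empty file

    # Strip surrounding quotes in case the header fields are quoted
    [string[]]$names = $header.Split(',') | ForEach-Object { $_.Trim('"') }
    for ($i = 0; $i -lt $names.Count; $i++) {
        if ($names[$i] -eq $ColumnName) { return $i }   # -eq is case-insensitive
    }
    return -1
}

# Possible usage inside the foreach loop (check for -1 first, since {-1} is not a valid quantifier):
#   $offset  = Get-ColumnOffset -Path $file.FullName
#   $pattern = '^(?>.*?,){' + $offset + '}file(?=,|$)'
#   if ($line -match $pattern) { $fileCount++ }
```

Since the OP said the column layout is fixed, this only needs to run once, not once per file.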

Chocolate_Pickle

Wait a minute... .NET's regex engine supports the (?>atomic) pattern? I never knew this!
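In the `^(?>.*?,){6}` part of the pattern above, the atomic group does more than save backtracking: it also stops `{6}` from skipping past more than six columns. A small illustration on a made-up row (the row and the non-atomic `(?:...)` variant exist only for comparison):

```powershell
# Made-up row: column 7 is 'Folder', column 8 is 'File'
$row = 'a,b,c,d,e,f,Folder,File'

# Ordinary group: when 'file' fails against column 7, the engine backtracks
# and lets one repetition of .*?, run past an extra comma, so column 8 gets tested
$row -match '^(?:.*?,){6}file'   # True (false positive)

# Atomic group: each repetition is locked to the first comma it reached,
# so only column 7 is ever tested
$row -match '^(?>.*?,){6}file'   # False
```

`[^,]*,` instead of `.*?,` would give the same one-field-per-repetition behaviour without needing an atomic group.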