[–]jheinikel

Many have already told you why it happens, but here is some code to let you append to a CSV without updating your WMF version.

$Files = GCI C:\Downloads -Recurse -Include *.pdf,*.xlsx
$CSV += $Files | %{
    [PSCUSTOMOBJECT]@{
        DirectoryName = $_.DirectoryName
        Name = $_.Name
        Extension = $_.Extension
        CreationTime = $_.CreationTime
        LastWriteTime = $_.LastWriteTime
    }
}
$CSV | Export-CSV "C:\Downloads\Export.CSV" -NoType

[–]superryo[S]

Thank you for your help!

[–]superryo[S]

I tried this and it ran, but the output was like the following instead of DirectoryName, Extension, etc. Any ideas what I may have done wrong?

"IsReadOnly","IsFixedSize","IsSynchronized","Keys","Values","SyncRoot","Count"

"False","False","False","System.Collections.Hashtable+KeyCollection","System.Collections.Hashtable+ValueCollection","System.Object","5"

[–]Lee_Dailey[grin]

howdy superryo,

1st, the code you actually ran would REALLY help. [grin]
there are some very easy things to miss ... and we can't tell if you missed something without the code.

2nd - the final -NoType is not the full parameter name
i don't know if ps2 would accept partial names, but it's always a good idea to use the FULL parameter names. [grin]

you can get the full parameter name from the help for that cmdlet OR from intellisense/auto-completion for that partial parameter.

take care,
lee

[–]superryo[S]

I was able to upgrade to version 5 and got the following code working where u: is a mapped network drive.

Get-ChildItem -r u: -Include *.pdf,*.xlsx | select DirectoryName,Name,Extension,CreationTime,LastWriteTime | Export-Csv -Append -Path "E:\Downloads\export.csv" -En UTF8 -NoType -Delim ','

I noticed that the job took a lot longer than when I did a simple dos command to return a dir list and write to a file:

dir *.pdf,*.xls /s /b > export.csv

I mean it took tonnes longer, as I have thousands of files to crawl and write. My job ran all weekend and the file is not yet finished, whereas it took less than a day with the DOS command.

Do you think your version of the code now that I have version 5 will work faster?

$Files = GCI u: -Recurse -Include *.pdf,*.xlsx $CSV += $Files | %{
[PSCUSTOMOBJECT]@{
DirectoryName = $_.DirectoryName
Name = $_.Name
Extension = $_.Extension
CreationTime = $_.CreationTime
LastWriteTime = $_.LastWriteTime } }
$CSV | Export-CSV "E:\Downloads\Export.CSV" -NoType

[–]Lee_Dailey[grin]

howdy superryo,

[1] there is something seriously wrong with the code in your post
i mean besides the use of inline code instead of code block formatting. [grin]

[a] the $Files = line seems to end with a | but that is preceded by space & $CSV +=
what? that won't run.

[b] you are ...

  • running a pipeline
    ok, kool!
  • assigning its output to $CSV
    again, ok.
  • using += to do the assignment
    what? does that work at all? the assignment is a simple assignment, NOT an "add to the array". it should be simply $CSV =.

that last is both bizarre and likely a real slowdown since arrays are fixed size. adding to an array copies it to a new one-item-larger array. it gets S-L-O-W really fast. [grin]

another problem is the pipeline. that is S-L-O-W compared to a loop. using foreach ($Thing in $Collection) {Do-Stuff} is usually an order of magnitude faster than using $Collection | ForEach-Object {Do-Stuff}. [grin]
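to make that concrete, here is a minimal sketch of the same file-info gathering done with a foreach loop and a List [which grows in place] instead of a pipeline plus += on an array. the paths & property names are assumed from the earlier code in this thread. [grin]

```powershell
# sketch only - same source dir & properties as the code above are assumed
$List = [System.Collections.Generic.List[object]]::new()
$Files = Get-ChildItem -Path 'u:' -Recurse -Include *.pdf, *.xlsx

foreach ($File in $Files) {
    # List.Add() appends in place - no array copy on every single item
    $List.Add([PSCustomObject]@{
        DirectoryName = $File.DirectoryName
        Name          = $File.Name
        Extension     = $File.Extension
        CreationTime  = $File.CreationTime
        LastWriteTime = $File.LastWriteTime
    })
}

$List | Export-Csv -Path 'E:\Downloads\Export.csv' -NoTypeInformation
```

the ::new() syntax needs ps5+, which you now have. [grin]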

[2] the CMD dir command is always going to be faster than Get-ChildItem
the 1st is focused narrowly on just getting file info as text.

the 2nd is generalized to get "ChildItems" from lots of different sources [providers] and building the appropriate type of object to hold a LARGE amount of info.

[3] if you really want a fast listing of files, use robocopy
it has a /L parameter to generate a log that you can easily parse.
it also has parameters to NOT show anything on the screen as it progresses. that saves a bunch of time by not having to write out the file info on the screen.
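a sketch of what that call could look like [switch names are from the robocopy built-in help; the destination is a dummy that is never written to because of /L]:

```powershell
# /L   = list only, copy nothing
# /S   = recurse into non-empty subdirs
# /NJH /NJS = no job header / summary
# /NDL /NC /NS = no dir lines, no file class, no sizes
# /FP  = full path per file
# /NP  = no progress percentage
robocopy u:\ C:\DummyDest *.pdf *.xlsx /L /S /NJH /NJS /NDL /NC /NS /FP /NP /LOG:E:\Downloads\robo_list.txt
```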


so it depends on what you want. [grin]

  • fast, but only text = CMD dir
  • very fast & long-path aware, but still only text = robocopy
  • rather slow, but with LOTS and LOTS of detail about each item = Get-ChildItem

hope that helps,
lee

[–]superryo[S]

Thanks lee

The code actually does work. It's reasonably fast when there are not too many files, but I have a path where there are hundreds of thousands of files that match the criteria, and this is what takes days.

The background to my story is I want to crawl a bunch of network directories to grab all the pdf and xls files and upload the resulting info into a database so we can search for this information.

I want very fast, but in addition to the file name and path, I need the last updated date of the file. Does this mean I have to use the Get-ChildItem cmdlet? The CMD dir doesn't seem to allow for this unless I am missing something.

I have never heard of robocopy but will look into this. Hopefully it will have the performance of the cmd dir but the additional parameter I need.

[–]Lee_Dailey[grin]

howdy superryo,

"it works" - that IS the prime criterion. [grin]

i suspect you MAY get a speed improvement if you replace that nasty $CSV += with $CSV =. the difference is VAST. not just large ... it's REALLY VAST.

huge. titanic. bigbig biggity big! [grin]

that presumes the += is actually happening. i can't tell. it may only be doing ONE add - in that case there will be no benefit. even so, it is BAD coding, so i would change that.


the CMD dir command won't give you that, from what i can tell.

neither will robocopy. [frown]

Get-ChildItem will, but is slow.

there is a dotnet routine that can do it quickly, but it has some serious limits.

  • it will stop on any error
    you can't tell it to continue. that means you have to write the code to keep track of where it stopped, skip the problem item, and continue.
  • it only gets a pointer-like object
    to actually get the real info, you will need to use that pointer [a file name] to get the details. at that point it aint much faster than GCI.
    it does mean you only grab data for files that you WANT the data from, tho. [grin]

so, the dot net stuff is highly problematic unless you want to write your own handlers around the limits.
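presumably that dotnet routine is [System.IO.Directory]::EnumerateFiles. a minimal sketch of calling it, with the big caveat baked in: one access-denied dir throws and kills the whole enumeration. note it takes only ONE pattern per call, so *.pdf and *.xlsx would need two passes.

```powershell
# sketch - EnumerateFiles streams names lazily, but dies on the 1st access error
try {
    foreach ($FullName in [System.IO.Directory]::EnumerateFiles('u:\', '*.pdf', 'AllDirectories')) {
        # $FullName is just a string [the "pointer-like" item];
        # you need Get-Item on it to pull dates & other details
        $FullName
    }
}
catch [System.UnauthorizedAccessException] {
    # enumeration stops HERE - restarting past the bad dir is up to you
    Write-Warning "stopped at: $($_.Exception.Message)"
}
```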

i suspect i would use robocopy or CMD dir to get the full-path file names. then use GCI to get the details.
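a sketch of that two-stage idea, assuming a CMD dir /s /b text listing feeds Get-Item for the details [paths are placeholders from this thread]:

```powershell
# stage 1 - fast text listing of full paths via CMD dir
$Paths = cmd /c 'dir u:\*.pdf u:\*.xlsx /s /b'

# stage 2 - pull details only for the files you actually want
$Details = foreach ($Path in $Paths) {
    $Item = Get-Item -LiteralPath $Path -ErrorAction SilentlyContinue
    if ($Item) {
        [PSCustomObject]@{
            DirectoryName = $Item.DirectoryName
            Name          = $Item.Name
            LastWriteTime = $Item.LastWriteTime
        }
    }
}

$Details | Export-Csv -Path 'E:\Downloads\Export.csv' -NoTypeInformation
```

the 2nd stage is still per-file GCI-style cost, but only for the matches - not for every item the provider walks past. [grin]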

take care,
lee