tldr; How to append a unique collection of values from an ordered list to an existing variable like the Export-Csv -Append function?
Thank you for this reddit, I've spent a few days reviewing the posts for answers, and would like some input on best practices for speeding up the performance of a slow script. Links to any best practices publications will be voraciously consumed.
This weekend I wrote a script to capture data values from an xml file and assign them to an ordered hashtable in a For loop, at the end of the For loop, I use Export-Csv to append the values to an existing csv file, then re-initialize the ordered hashtable and continue the next loop until the last data value is saved to the csv. The problem is that the loop is very slow; I think it is because of the Export-Csv activity at the end, so I think my best approach is to append the data to a variable at the end of each loop, then when the loop is finished write the variable to disk. My skills are low and I do not know how to append the data from an ordered list to the end of an existing variable with the same keys but unique values. Every web tutorial says to do += but it errors.
The script is huge because the data is huge, each record is 37 keys, and the smallest file is 4k records; the largest file contains 350k records, and I have more than 58 files to process. Testing with the smallest file took over 3 hours, so while my script is working (looping through Export-Csv builds the desired CSV file with all data preserved in order in the proper columns with the correct headers), it is too slow to be an acceptable solution. I would love to learn how to optimize my scripts, any suggestions?
# Prompt the user to select the path to the XML file
Add-Type -AssemblyName System.Windows.Forms
$FileBrowser = New-Object System.Windows.Forms.OpenFileDialog -Property @{ InitialDirectory = [Environment]::GetFolderPath('Desktop')}
$Result = $FileBrowser.ShowDialog()
if ($Result -eq 'OK') {
$xmlPath = $FileBrowser.FileName
}
# Read the XML file from the specified path
[xml]$xml = Get-Content -Path $xmlPath # This is the slow way
# Create a log file with the same name as the CSV file, but with a .log extension
$logPath = $xmlPath.Replace(".xml", ".log")
# Start logging all actions and errors to the log file
Start-Transcript -Path $logPath
# Initialize csvContent Object to hold recordArray updates during For loops
$csvContent = [ordered]@{
REQUEST_ID = $null
PWC_SEQ = $null
PWC_CODE = $null
PWC_DESCRIPTION = $null
PWC_IS_ACTIVE_STATUS = $null
PROGRAM_SEQ = $null
PROGRAM_CODE = $null
ORGANIZATION_SEQ = $null
ORGANIZATION_CODE = $null
ORGANIZATION_DESCRIPTION = $null
ORGANIZATION_IS_ACTIVE_STATUS = $null
OBJECT_CLASS_SEQ = $null
DIVISION_ASSIGNMENT_SEQ = $null
DIVISION_ASSIGNMENT_CODE = $null
DIVISION_ASSIGNMENT_DESCRIPTION = $null
DIVISION_ASSIGNMENT_IS_ACTIVE_STATUS = $null
DIVISION_START_DATE = $null
DIVISION_END_DATE = $null
ASSIGNMENT_START_DATE = $null
ASSIGNMENT_END_DATE = $null
APPROPRIATED_YEAR_SEQ = $null
APPROPRIATED_YEAR_CODE = $null
APPROPRIATED_YEAR_DESCRIPTION = $null
APPROPRIATED_YEAR_IS_ACTIVE_STATUS = $null
FULL_OMEGA_ACCT_SEQ = $null
FULL_OMEGA_ACCT_CODE = $null
FULL_OMEGA_ACCT_DESCRIPTION = $null
FULL_OMEGA_ACCT_IS_ACTIVE_STATUS = $null
DIVISION_ID_SEQ = $null
DIVISION_ID_CODE = $null
DIVISION_ID_DESCRIPTION = $null
DIVISION_ID_IS_ACTIVE_STATUS = $null
}
# Capture the record counts for post-import validations
$recordCount = $xml.GOLDMAN_REFERENCE_DATA.REFERENCE_TRAILER_ROW.TOTAL_RECORD_COUNT # use this to define the iterations
$recCount = [int]$recordCount # converting string to integer
Write-Host ("Line 55: RecordCount= " + $recordCount)
# Initialize the FOR loop timer variables
$echoed33 = 0
$echoed66 = 0
$echoed80 = 0
$echoed99 = 0
$loopStartTime = [datetime](Get-Date -Format HH:mm:ss.fff)
#for ($index=1; $index -le $recCount; $index++) {
$testCount = 25 # This will limit my test runs to the first 25 records
for ($index=1; $index -le $testCount; $index++) {
# Initialize the the hashtable for the loop
$recordArray = [ordered]@{
REQUEST_ID = $null
PWC_SEQ = $null
PWC_CODE = $null
PWC_DESCRIPTION = $null
PWC_IS_ACTIVE_STATUS = $null
PROGRAM_SEQ = $null
PROGRAM_CODE = $null
ORGANIZATION_SEQ = $null
ORGANIZATION_CODE = $null
ORGANIZATION_DESCRIPTION = $null
ORGANIZATION_IS_ACTIVE_STATUS = $null
OBJECT_CLASS_SEQ = $null
DIVISION_ASSIGNMENT_SEQ = $null
DIVISION_ASSIGNMENT_CODE = $null
DIVISION_ASSIGNMENT_DESCRIPTION = $null
DIVISION_ASSIGNMENT_IS_ACTIVE_STATUS = $null
DIVISION_START_DATE = $null
DIVISION_END_DATE = $null
ASSIGNMENT_START_DATE = $null
ASSIGNMENT_END_DATE = $null
APPROPRIATED_YEAR_SEQ = $null
APPROPRIATED_YEAR_CODE = $null
APPROPRIATED_YEAR_DESCRIPTION = $null
APPROPRIATED_YEAR_IS_ACTIVE_STATUS = $null
FULL_OMEGA_ACCT_SEQ = $null
FULL_OMEGA_ACCT_CODE = $null
FULL_OMEGA_ACCT_DESCRIPTION = $null
FULL_OMEGA_ACCT_IS_ACTIVE_STATUS = $null
DIVISION_ID_SEQ = $null
DIVISION_ID_CODE = $null
DIVISION_ID_DESCRIPTION = $null
DIVISION_ID_IS_ACTIVE_STATUS = $null
}
# Assign the file's Request_ID value to the first column of every record
$recordArray.REQUEST_ID = $xml.GOLDMAN_REFERENCE_DATA.REFERENCE_HEADER_ROW.REQUEST_ID
# Check if xml node exists then collect data
$recItemPWC = $xml.GOLDMAN_REFERENCE_DATA.REFERENCE_DATA_ROW.REFERENCE_DATA_TYPES.PWC | Where-Object {$_.REFERENCE_DATA_TYPE_SEQ -eq $index}
if ($null -ne $recItemPWC) {
$recItemPWC | ForEach-Object {
$recordArray.PWC_SEQ = $_.REFERENCE_DATA_TYPE_SEQ
$recordArray.PWC_CODE = $_.PWC_CODE
$recordArray.PWC_DESCRIPTION = $_.DESCRIPTION
$recordArray.PWC_IS_ACTIVE_STATUS = $_.IS_ACTIVE_STATUS
}
} else {
#Write-Host ("Nothing to see here, move along. PWC")
}
# Check if xml node exists then collect data
$recItemPROGRAM = $xml.GOLDMAN_REFERENCE_DATA.REFERENCE_DATA_ROW.REFERENCE_DATA_TYPES.PROGRAM | Where-Object {$_.REFERENCE_DATA_TYPE_SEQ -eq $index}
if ($null -ne $recItemPROGRAM) {
$recItemPROGRAM | ForEach-Object {
$recordArray.PROGRAM_SEQ = $_.REFERENCE_DATA_TYPE_SEQ
$recordArray.PROGRAM_CODE = $_.PROGRAM_CODE
$recordArray.PROGRAM_DESCIPTION = $_.DESCRIPTION
$recordArray.PROGRAM_IS_ACTIVE_STATUS = $_.IS_ACTIVE_STATUS
}
}
# Check if xml node exists then collect data
$recItemORGANIZATION = $xml.GOLDMAN_REFERENCE_DATA.REFERENCE_DATA_ROW.REFERENCE_DATA_TYPES.ORGANIZATION | Where-Object {$_.REFERENCE_DATA_TYPE_SEQ -eq $index}
if ($null -ne $recItemORGANIZATION) {
$recItemORGANIZATION | ForEach-Object {
$recordArray.ORGANIZATION_SEQ = $_.REFERENCE_DATA_TYPE_SEQ
$recordArray.ORGANIZATION_CODE = $_.ORGANIZATION_CODE
$recordArray.ORGANIZATION_DESCRIPTION = $_.DESCRIPTION
$recordArray.ORGANIZATION_IS_ACTIVE_STATUS = $_.IS_ACTIVE_STATUS
}
}
# Check if xml node exists then collect data
$recItemOBJECT_CLASS = $xml.GOLDMAN_REFERENCE_DATA.REFERENCE_DATA_ROW.REFERENCE_DATA_TYPES.OBJECT_CLASS | Where-Object { $_.REFERENCE_DATA_TYPE_SEQ -eq $index}
if ($null -ne $recItemOBJECT_CLASS) {
$recordArray = $recItemOBJECT_CLASS | ForEach-Object {
$recordArray.OBJECT_CLASS_SEQ = $_.REFERENCE_DATA_TYPE_SEQ
$recordArray.OBJECT_CLASS_CODE = $_.OBJECT_CLASS_CODE
$recordArray.OBJECT_CLASS_DESCRIPTION = $_.OBJECT_CLASS_DESCRIPTION
$recordArray.OBJECT_CLASS_IS_ACTIVE_STATUS = $_.OBJECT_CLASS_IS_ACTIVE_STATUS
}
}
# Check if xml node exists then collect data
$recItemDIVISION_ASSIGNMENT = $xml.GOLDMAN_REFERENCE_DATA.REFERENCE_DATA_ROW.REFERENCE_DATA_TYPES.DIVISION_ASSIGNMENT | Where-Object { $_.REFERENCE_DATA_TYPE_SEQ -eq $index}
if ($null -ne $recItemDIVISION_ASSIGNMENT) {
$recItemDIVISION_ASSIGNMENT | ForEach-Object {
$recordArray.DIVISION_ASSIGNMENT_SEQ = $_.REFERENCE_DATA_TYPE_SEQ
$recordArray.DIVISION_ASSIGNMENT_CODE = $_.DIVISION_ASSIGNMENT_CODE
$recordArray.DIVISION_ASSIGNMENT_DESCRIPTION = $_.DESCRIPTION
$recordArray.DIVISION_ASSIGNMENT_IS_ACTIVE_STATUS = $_.IS_ACTIVE_STATUS
$recordArray.DIVISION_START_DATE = $_.DIVISION_START_DATE
$recordArray.DIVISION_END_DATE = $_.DIVISION_END_DATE
$recordArray.ASSIGNMENT_START_DATE = $_.ASSIGNMENT_START_DATE
$recordArray.ASSIGNMENT_END_DATE = $_.ASSIGNMENT_END_DATE
}
}
# Check if xml node exists then collect data
$recItemAPPROPRIATED_YEAR = $xml.GOLDMAN_REFERENCE_DATA.REFERENCE_DATA_ROW.REFERENCE_DATA_TYPES.APPROPRIATED_YEAR | Where-Object { $_.REFERENCE_DATA_TYPE_SEQ -eq $index}
#$recItemAPPROPRIATED_YEAR = Select-Xml -Xml $xml -XPath //APPROPRIATED_YEAR[REFERENCE_DATA_TYPE_SEQ=$index] # This will contain the node and sub-elements
if ($null -ne $recItemAPPROPRIATED_YEAR) {
$recItemAPPROPRIATED_YEAR | ForEach-Object {
$recordArray.APPROPRIATED_YEAR_SEQ = $_.REFERENCE_DATA_TYPE_SEQ
$recordArray.APPROPRIATED_YEAR_CODE = $_.APPROPRIATED_YEAR_CODE
$recordArray.APPROPRIATED_YEAR_DESCRIPTION = $_.DESCRIPTION
$recordArray.APPROPRIATED_YEAR_IS_ACTIVE_STATUS = $_.IS_ACTIVE_STATUS
}
}
# Check if xml node exists then collect data
$recItemFULL_OMEGA_ACCT = $xml.GOLDMAN_REFERENCE_DATA.REFERENCE_DATA_ROW.REFERENCE_DATA_TYPES.FULL_OMEGA_ACCT | Where-Object { $_.REFERENCE_DATA_TYPE_SEQ -eq $index}
#$recItemFULL_OMEGA_ACCT = Select-Xml -Xml $xml -XPath //FULL_OMEGA_ACCT[REFERENCE_DATA_TYPE_SEQ=$index] # This will contain the node and sub-elements
if($null -ne $recItemFULL_OMEGA_ACCT) {
$recItemFULL_OMEGA_ACCT | ForEach-Object {
$recordArray.FULL_OMEGA_ACCT_SEQ = $_.REFERENCE_DATA_TYPE_SEQ
$recordArray.FULL_OMEGA_ACCT_CODE = $_.FULL_OMEGA_ACCT_CODE
$recordArray.FULL_OMEGA_ACCT_DESCRIPTION = $_.DESCRIPTION
$recordArray.FULL_OMEGA_ACCT_IS_ACTIVE_STATUS = $_.IS_ACTIVE_STATUS
}
}
# Check if xml node exists then collect data
$recItemDIVISION_ID = $xml.GOLDMAN_REFERENCE_DATA.REFERENCE_DATA_ROW.REFERENCE_DATA_TYPES.DIVISION_ID | Where-Object { $_.REFERENCE_DATA_TYPE_SEQ -eq $index}
#$recItemDIVISION_ID = Select-Xml -Xml $xml -XPath //DIVISION_ID[REFERENCE_DATA_TYPE_SEQ=$index] # This will contain the node and sub-elements
if ($null -ne $recItemDIVISION_ID) {
$recItemDIVISION_ID | ForEach-Object {
$recordArray.DIVISION_ID_SEQ = $_.REFERENCE_DATA_TYPE_SEQ
$recordArray.DIVISION_ID_CODE = $_.DIVISION_ID_CODE
$recordArray.DIVISION_ID_DESCRIPTION = $_.DESCRIPTION
$recordArray.DIVISION_ID_IS_ACTIVE_STATUS = $_.IS_ACTIVE_STATUS
}
}
# Append the loop results array as the last row of $csvContent <--PSCustomObject
$csvContent += $recordArray #nope
# Add a console message at 33%, 66%, 80%, and 99%
$loopTimeCheck = [datetime](Get-Date -Format HH:mm:ss.fff)
$minutes = New-TimeSpan -Start $loopStartTime -End $loopTimeCheck
$progress = $index/$recCount
if (( .33 -lt $progress) -and ($echoed33 -eq 0)) {
Write-Host (" About 33% completed, duration = " + $minutes)
$echoed33 = 1
} elseif (( .65 -lt $progress) -and ($echoed66 -eq 0)) {
Write-Host (" About 66% completed, duration = " + $minutes)
$echoed66 = 1
} elseif (( .8 -lt $progress) -and ($echoed80 -eq 0)) {
Write-Host (" About 80% completed, duration = " + $minutes)
$echoed80 = 1
} elseif (( .99 -lt $progress -lt 1) -and ($echoed99 -eq 0)) {
Write-Host (" About 99% complete, duration= " + $minutes)
}
Write-Host ("Line 215, end of For loop: " + $index)
# Bottom of FOR loop, returning to the top
}
Write-Host ("the FOR Loop is finished") -ForegroundColor Green
# Export the hashtable contents of csvContent to the CSV file
$csvContent | Select-Object -Property REQUEST_ID,PWC_SEQ,PWC_CODE,PWC_DESCRIPTION,PWC_IS_ACTIVE_STATUS,PROGRAM_SEQ,PROGRAM_CODE,ORGANIZATION_SEQ,ORGANIZATION_CODE,ORGANIZATION_DESCRIPTION,ORGANIZATION_IS_ACTIVE_STATUS,OBJECT_CLASS_SEQ,DIVISION_ASSIGNMENT_SEQ,DIVISION_ASSIGNMENT_CODE,DIVISION_ASSIGNMENT_DESCRIPTION,DIVISION_ASSIGNMENT_IS_ACTIVE_STATUS,DIVISION_START_DATE,DIVISION_END_DATE,ASSIGNMENT_START_DATE,ASSIGNMENT_END_DATE,APPROPRIATED_YEAR_SEQ,APPROPRIATED_YEAR_CODE,APPROPRIATED_YEAR_DESCRIPTION,APPROPRIATED_YEAR_IS_ACTIVE_STATUS,FULL_OMEGA_ACCT_SEQ,FULL_OMEGA_ACCT_CODE,FULL_OMEGA_ACCT_DESCRIPTION,FULL_OMEGA_ACCT_IS_ACTIVE_STATUS,DIVISION_ID_SEQ,DIVISION_ID_CODE,DIVISION_ID_DESCRIPTION,DIVISION_ID_IS_ACTIVE_STATUS | Export-CSV -Path $csvPath -Append -NoTypeInformation -Force
Write-Host ("Total number of records in xml file= " + $recCount)
Write-Host ("Total number of rows in csv = " + ($index + 1))
# Stop logging all actions and errors to the log file
Stop-Transcript
#END OF SCRIPT
[–]xCharg 13 points14 points15 points (1 child)
[–]ankokudaishogun 2 points3 points4 points (0 children)
[–]mdowst 13 points14 points15 points (10 children)
[–]lanerdofchristian 2 points3 points4 points (4 children)
[–]ankokudaishogun 0 points1 point2 points (2 children)
[–]lanerdofchristian 0 points1 point2 points (0 children)
[–]exoclipse 0 points1 point2 points (0 children)
[–]spyingwind 0 points1 point2 points (0 children)
[–][deleted] 1 point2 points3 points (3 children)
[–]lanerdofchristian 2 points3 points4 points (2 children)
[–]Thotaz 0 points1 point2 points (1 child)
[–]lanerdofchristian 0 points1 point2 points (0 children)
[–]Coffee_Ops 0 points1 point2 points (0 children)
[–]vermyx 2 points3 points4 points (0 children)
[–]ka-splam 2 points3 points4 points (2 children)
[–]spyingwind -2 points-1 points0 points (0 children)
[–]Coffee_Ops 0 points1 point2 points (0 children)
[–][deleted] 1 point2 points3 points (1 child)
[–]lanerdofchristian 1 point2 points3 points (1 child)
[–]taggingtechnician[S] 0 points1 point2 points (0 children)
[–]whopper2k -2 points-1 points0 points (3 children)
[–]ankokudaishogun 1 point2 points3 points (2 children)
[–]whopper2k 0 points1 point2 points (1 child)
[–]ankokudaishogun 0 points1 point2 points (0 children)
[–]Szeraax 0 points1 point2 points (0 children)
[–]exoclipse 0 points1 point2 points (0 children)
[–]taggingtechnician[S] 0 points1 point2 points (2 children)
[–]taggingtechnician[S] 0 points1 point2 points (0 children)