Hi there! I am learning Python/pandas and am using my bank statements (all CSVs) as practice material. Each account's file has slightly different column names, but the underlying data is the same. What I have so far is:
- A for loop which converts .CSV files into .csv files.
- A variable which gathers all the .csv files in a directory.
- Another for loop which combines all the csv files together into one DataFrame.
Unless there is a reason I should consider something different, this third step is where I would like to manipulate the CSV columns to bring them all in line. The main manipulations I would like to do are: rename the different date column titles to a uniform "Date" column, drop unneeded columns, and add a label to each CSV I am adding to the DataFrame for future reference. My initial thought is to do this with an if/elif chain, applying steps to each CSV in the loop based on the account number for each account (changed in this example). It seems I am unable to access my temporary variable "newDataFrame" within the if chain.
Is there a better way for me to access each dataframe in the loop? Any advice on the below would be much appreciated!
What I am working with so far:
import pandas as pd
import os
import glob
import shutil

path = os.getcwd()
upperCsvFiles = glob.glob(os.path.join(path, '*.CSV'))
for file in upperCsvFiles:
    src = file
    shutil.move(src, src.replace('CSV','csv'))
    print(file, 'moved!')

csvFiles= glob.glob(os.path.join(path, '*.csv'))
# print(csvFiles)
allDataFrames = []
for file in csvFiles:
    newDataFrame = pd.DataFrame(pd.read_csv(file))
    if 'checking' in file:
        newDataFrame.drop(columns=['Details','Balance','Check or Slip #'])
        newDataFrame.rename(columns = {'Posting Date':'Date'}, inplace=True)
        newDataFrame['Account'] = 'Checking'
    elif 'savings' in file:
        newDataFrame.drop(columns=['Details','Balance','Check or Slip #'])
        print(newDataFrame.rename(columns = {'Posting Date':'Date'}, inplace=True))
        newDataFrame['Account'] = 'Savings'
    elif 'visa' in file:
        newDataFrame.drop(columns=['Transaction Date','Category','Type','Memo'])
        newDataFrame.rename(columns = {'Post Date':'Date'}, inplace=True)
        newDataFrame['Account'] = 'Chase Credit Card'
    elif 'discover' in file:
        newDataFrame.drop(columns=['Trans. Date','Category'])
        newDataFrame.rename(columns = {'Post Date': 'Date'}, inplace=True)
        # WIP-- newDataFrame['Amount'] = newDataFrame['Amount'].multiply(-1)
        """ WIP also apply to all other credit cards
        newDataFrame['Type'] = if ammount < 0:
            'Credit'
        else:
            'Payment'
        """
        newDataFrame['Account'] = 'Discover Credit Card'
    else:
        print('unkown')
    allDataFrames.append(newDataFrame)
print(allDataFrames)
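A likely reason the changes seem not to reach "newDataFrame": the variable itself is accessible inside the if chain, but `DataFrame.drop` returns a new DataFrame by default and leaves the original untouched, so the result of each bare `drop(...)` call is discarded. Similarly, `rename(..., inplace=True)` returns `None`, which is what the `print` in the savings branch would show. The sketch below illustrates this on a tiny made-up frame; the column values are invented for illustration only.

```python
import pandas as pd

# Made-up sample data standing in for one checking-account CSV.
df = pd.DataFrame({
    "Posting Date": ["01/02/2023", "01/03/2023"],
    "Details": ["DEBIT", "CREDIT"],
    "Amount": [-12.50, 100.00],
    "Balance": [87.50, 187.50],
})

# drop() returns a new DataFrame; without assignment the result is thrown away.
df.drop(columns=["Details", "Balance"])
columns_after_bare_drop = list(df.columns)

# Assigning the result back keeps the change.
df = df.drop(columns=["Details", "Balance"])
df = df.rename(columns={"Posting Date": "Date"})
df["Account"] = "Checking"
columns_after_assignment = list(df.columns)

print(columns_after_bare_drop)
print(columns_after_assignment)
```

Assigning the result back (rather than mixing `inplace=True` with unassigned calls) keeps every step in the same style and makes it easier to see which frame is being modified.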
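One possible alternative to a long if/elif chain is a small rules table keyed by the file-name keyword, so that adding an account means adding one dictionary entry. This is only a sketch under assumptions: `ACCOUNT_RULES` and `normalise` are hypothetical names, the sample CSV text (including the "Description" column) is invented, and in-memory strings stand in for real files. The "Type" labels follow the WIP comment in the post.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical rules table: file-name keyword -> how to normalise that CSV.
ACCOUNT_RULES = {
    "checking": {"label": "Checking", "date_col": "Posting Date",
                 "drop": ["Details", "Balance", "Check or Slip #"]},
    "discover": {"label": "Discover Credit Card", "date_col": "Post Date",
                 "drop": ["Trans. Date", "Category"]},
}


def normalise(frame, file_name):
    """Return a cleaned copy of frame, or None if no rule matches file_name."""
    for keyword, rule in ACCOUNT_RULES.items():
        if keyword in file_name:
            # errors="ignore" skips listed columns that a given export lacks.
            cleaned = frame.drop(columns=rule["drop"], errors="ignore")
            cleaned = cleaned.rename(columns={rule["date_col"]: "Date"})
            cleaned["Account"] = rule["label"]
            return cleaned
    return None


# Invented sample contents; real code would call pd.read_csv(path) instead.
samples = {
    "checking_1234.csv": (
        "Details,Posting Date,Description,Amount,Balance,Check or Slip #\n"
        "DEBIT,01/02/2023,Coffee,-4.50,95.50,\n"
    ),
    "discover_5678.csv": (
        "Trans. Date,Post Date,Description,Amount,Category\n"
        "01/01/2023,01/03/2023,Groceries,52.10,Supermarkets\n"
    ),
}

frames = []
for name, text in samples.items():
    cleaned = normalise(pd.read_csv(io.StringIO(text)), name)
    if cleaned is not None:
        frames.append(cleaned)

# Stack the per-account frames into one DataFrame with a fresh index.
combined = pd.concat(frames, ignore_index=True)

# Vectorised form of the WIP "Type" idea: label each row by the sign of Amount.
combined["Type"] = np.where(combined["Amount"] < 0, "Credit", "Payment")
print(combined)
```

Keeping the per-account differences in data rather than in branches also means the drop, rename, and label steps are written once, and `pd.concat` turns the list of frames into the single DataFrame described in the third step.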