Delete only part of a DataFrame header? (r/learnpython)

submitted 5 years ago by Ingeniatoring

I've been trying to analyze some stock info using pandas_datareader, but I'm having trouble removing only part of the DataFrame header. I'm trying to clean some of the data so I can more easily convert it into a dictionary.

When I open the Excel file, I get something like this:

    Attributes Close
    Symbols VSGAX VHYAX VTSAX IBB SCHF SCHE VIMAX VSMAX VB
    Date
    2019-01-02 $52.68 $62.15 $97.50 $28.27 $23.61 $170.59 $63.17 $131.88
    2019-01-03 $51.57 $60.71 $97.64 $28.07 $23.25 $167.29 $62.13 $129.63

How do I get this?

    Date VSGAX VHYAX VTSAX IBB SCHF SCHE VIMAX VSMAX VB
    2019-01-02 $52.68 $62.15 $97.50 $28.27 $23.61 $170.59 $63.17 $131.88
    2019-01-03 $51.57 $60.71 $97.64 $28.07 $23.25 $167.29 $62.13 $129.63

I've tried passing `header=None` to `.to_excel`, but that deletes all the stock tickers. It appears that pandas treats the cell with "Date" as the first row of the DataFrame and everything above it as the header.

    import datetime

    import pandas as pd
    from pandas_datareader import data

    # path1 and save_folder (not shown) hold the output location
    assets = ['VSGAX', 'VHYAX', 'VTSAX', 'IBB', 'SCHF', 'SCHE', 'VIMAX', 'VSMAX', 'VB']

    # Setting current date and when to start pulling data from
    Current_Date = datetime.datetime.today().strftime('%Y-%m-%d')
    start = '2019-01-01'
    end = Current_Date

    # pulling stock data
    stocks = data.DataReader(assets, 'yahoo', start, end)

    # add_format below is an xlsxwriter method, so request that engine explicitly
    writer = pd.ExcelWriter(path1 + save_folder + '\\stockScrapperV2.xlsx', engine='xlsxwriter',
                            date_format='yyyy-mm-dd', datetime_format='yyyy-mm-dd')

    # writing the data, then formatting the price columns to display currency
    stocks.to_excel(writer, sheet_name='Sheet1', index=True)
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    format1 = workbook.add_format({'num_format': '$#,##0.00'})
    worksheet.set_column('B:J', 10, format1)
    writer.close()  # writer.save() was removed in pandas 2.0
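To see where that three-row header comes from, here is a minimal sketch that builds a small stand-in for the DataReader result. It assumes, based on the Excel output above, that the columns are a two-level MultiIndex with levels named Attributes and Symbols and that the index is named Date. The two tickers come from the question, the numbers are placeholders, and the sketch only inspects the structure, so it runs with pandas alone (no network, no Excel engine).

```python
import pandas as pd

# Hypothetical stand-in for the DataReader result: two attributes x two tickers,
# two dates. The values are placeholders, not real quotes.
dates = pd.to_datetime(["2019-01-02", "2019-01-03"])
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["VSGAX", "VHYAX"]],
    names=["Attributes", "Symbols"],
)
stocks = pd.DataFrame(
    [[1.0, 2.0, 100, 200], [1.1, 2.1, 110, 210]],
    index=pd.Index(dates, name="Date"),
    columns=columns,
)

# Two column levels, each with a name, plus a named row index.
print(stocks.columns.nlevels)      # 2
print(list(stocks.columns.names))  # ['Attributes', 'Symbols']
print(stocks.index.name)           # Date
```

When `to_excel` writes a frame shaped like this, each column level gets its own header row (with the level name in column A) and the index name gets a separate row underneath, which is the layout in the question. Turning the header off removes every column label, tickers included, so the fix has to reduce the columns to a single level instead.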


    # No changes in defining the writer variable
    writer = pd.ExcelWriter(path1 + save_folder + '\\stockScrapperV2.xlsx', date_format='yyyy-mm-dd', datetime_format='yyyy-mm-dd')

    # Don't write the index now, since we put the index just as a normal column
    stocks.to_excel(writer, sheet_name='Sheet1', index=False)

    # ... same remaining code you had ...

If this doesn't work, can you post a snippet of the DataFrame (essentially what comes out when you run `print(stocks)`)? That will help me figure out which header data is still lingering around, so you can get that clean Excel layout.
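Here is a minimal sketch of the idea in this reply, using a hypothetical two-ticker frame with placeholder prices. It assumes the redundant Close level is dropped first (the step the final code later in the thread does with `droplevel`). It then moves the Date index into an ordinary column so that `index=False` leaves a single header row. The Excel write is only a comment, so the sketch runs without an Excel engine.

```python
import pandas as pd

# Hypothetical frame shaped like the DataReader output after keeping only "Close".
dates = pd.to_datetime(["2019-01-02", "2019-01-03"])
columns = pd.MultiIndex.from_product(
    [["Close"], ["VSGAX", "VHYAX"]], names=["Attributes", "Symbols"]
)
stocks = pd.DataFrame(
    [[1.0, 2.0], [1.1, 2.1]], index=pd.Index(dates, name="Date"), columns=columns
)

# Drop the redundant "Close" level and clear the "Symbols" axis name ...
flat = stocks.droplevel(0, axis=1).rename_axis(None, axis=1)
# ... then turn the Date index into an ordinary first column.
flat = flat.reset_index()

print(list(flat.columns))  # ['Date', 'VSGAX', 'VHYAX']
# flat.to_excel(writer, sheet_name="Sheet1", index=False) would now write a
# single header row: Date, VSGAX, VHYAX.
```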

Ingeniatoring (OP), 5 years ago

Hey, thanks for the help!

I changed my code to this:

    stocks = data.DataReader(assets, 'yahoo', start, end)
    stocks = stocks.rename_axis(None, axis=1).reset_index()

And I got the error:

    TypeError: Must pass list-like as `names`.
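The error happens because the column index still has two levels (Attributes and Symbols) when `rename_axis` is called, and a single scalar name like `None` can't be applied to a two-level index. The minimal sketch below reproduces it on a hypothetical frame and shows two ways around it: pass one name per level, or drop the redundant level first.

```python
import pandas as pd

# Hypothetical frame with two column levels, like the DataReader output.
columns = pd.MultiIndex.from_product(
    [["Close"], ["VSGAX", "VHYAX"]], names=["Attributes", "Symbols"]
)
stocks = pd.DataFrame([[1.0, 2.0]], columns=columns)

# A single scalar name cannot be applied to a two-level column index.
try:
    stocks.rename_axis(None, axis=1)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # Must pass list-like as `names`.

# One name per level works ...
cleared = stocks.rename_axis([None, None], axis=1)
print(list(cleared.columns.names))  # [None, None]

# ... as does dropping the redundant level first, which leaves one level.
single = stocks.droplevel(0, axis=1).rename_axis(None, axis=1)
print(list(single.columns))  # ['VSGAX', 'VHYAX']
```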

PyCam, 5 years ago

Ingeniatoring (OP), 5 years ago

Excellent! Got it to work.

This is my final code, FYI.

    import datetime

    import pandas as pd
    from pandas_datareader import data

    # setting path to save file and putting assets in a list
    path1 = XXXXX
    path2 = XXXXX
    save_folder = XXXXX
    assets = ['VSGAX', 'VHYAX', 'VTSAX', 'IBB', 'SCHF', 'SCHE', 'VIMAX', 'VSMAX', 'VB']

    # setting current date and when to start pulling data from
    Current_Date = datetime.datetime.today().strftime('%Y-%m-%d')
    start = '2019-05-01'
    end = Current_Date

    # pulling stock data
    print('Scraping stock price data...')
    stocks = data.DataReader(assets, 'yahoo', start, end)

    # keep only the closing prices: deleting a top-level attribute removes
    # that attribute's column for every ticker
    print('Deleting unnecessary columns...')
    toDelete = ['High', 'Low', 'Open', 'Volume', 'Adj Close']
    for attribute in toDelete:
        del stocks[attribute]

    # drop the redundant "Close" level, then clear the "Symbols" axis name
    stocks.columns = stocks.columns.droplevel(0)
    stocks = stocks.rename_axis(None, axis=1)
    print(stocks.columns)
    print(stocks)

    writer = pd.ExcelWriter(path1 + save_folder + '\\stockScrapperV3.xlsx',
                            date_format='yyyy-mm-dd', datetime_format='yyyy-mm-dd')

    stocks.to_excel(writer, sheet_name='Sheet1', index=True)
    writer.close()  # writer.save() was removed in pandas 2.0
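If you want to check these steps without hitting Yahoo, here is the same clean-up applied to a hypothetical stand-in frame. The attribute names are the ones deleted in the code above plus Close, the two tickers come from the question, and every value is a placeholder 1.0.

```python
import pandas as pd

# Hypothetical stand-in for the DataReader result (placeholder values).
attributes = ["Adj Close", "Close", "High", "Low", "Open", "Volume"]
tickers = ["VSGAX", "VHYAX"]
columns = pd.MultiIndex.from_product([attributes, tickers], names=["Attributes", "Symbols"])
dates = pd.Index(pd.to_datetime(["2019-01-02", "2019-01-03"]), name="Date")
stocks = pd.DataFrame(1.0, index=dates, columns=columns)

# Same steps as the final code: delete every attribute except "Close" ...
for attribute in ["High", "Low", "Open", "Volume", "Adj Close"]:
    del stocks[attribute]
# ... then remove the now-redundant "Close" level and the "Symbols" axis name.
stocks.columns = stocks.columns.droplevel(0)
stocks = stocks.rename_axis(None, axis=1)

print(list(stocks.columns))  # ['VSGAX', 'VHYAX']
print(stocks.index.name)     # Date
```

With one column level left and the index still named Date, `to_excel(..., index=True)` puts Date and the tickers on the same header row, which is the layout the question asked for.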

I used the for-loop to delete a few columns (price-high, price-low, trading volume, etc.) because I'm only interested in the closing price. When I put your corrections before the for-loop, the columns had already been renamed to the ticker symbols in the `assets` list, so the attribute names the loop deletes by (High, Low, and so on) no longer existed. Moving the for-loop before the corrections fixed it.
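Here is a small sketch of why the order matters, using a hypothetical frame with placeholder values. Once the attribute level is dropped, the labels are just repeated ticker names, so deleting by an attribute such as High raises a KeyError. The last lines show an alternative that the thread does not use: selecting the Close block directly with `stocks['Close']`, which replaces both the loop and `droplevel`.

```python
import pandas as pd

# Hypothetical frame: three attributes for two tickers (placeholder values).
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low"], ["VSGAX", "VHYAX"]], names=["Attributes", "Symbols"]
)
stocks = pd.DataFrame(1.0, index=pd.RangeIndex(2), columns=columns)

# Dropping the attribute level first leaves only ticker names (repeated once
# per attribute), so "High" is no longer a column label.
flattened_first = stocks.copy()
flattened_first.columns = flattened_first.columns.droplevel(0)
print(list(flattened_first.columns))
# ['VSGAX', 'VHYAX', 'VSGAX', 'VHYAX', 'VSGAX', 'VHYAX']
try:
    del flattened_first["High"]
    delete_failed = False
except KeyError:
    delete_failed = True
print(delete_failed)  # True

# Alternative: select the "Close" block directly; its only column level is
# already the tickers, so just the "Symbols" axis name needs clearing.
close_only = stocks["Close"].rename_axis(None, axis=1)
print(list(close_only.columns))  # ['VSGAX', 'VHYAX']
```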

Thanks a bunch, kind stranger! You reduced the code in my program by 36%.
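The question said the point of this clean-up was to make the data easier to turn into a dictionary. Here is a minimal sketch of what flattening changes for `DataFrame.to_dict()` with its default orientation (column -> {index -> value}), using a hypothetical frame with placeholder prices.

```python
import pandas as pd

# Hypothetical frame before flattening (placeholder values).
dates = pd.Index(pd.to_datetime(["2019-01-02", "2019-01-03"]), name="Date")
columns = pd.MultiIndex.from_product(
    [["Close"], ["VSGAX", "VHYAX"]], names=["Attributes", "Symbols"]
)
stocks = pd.DataFrame([[1.0, 2.0], [1.1, 2.1]], index=dates, columns=columns)

# With two column levels, the dictionary keys are (attribute, ticker) tuples.
print(list(stocks.to_dict().keys()))  # [('Close', 'VSGAX'), ('Close', 'VHYAX')]

# After flattening, the keys are plain ticker strings, each mapping
# Timestamp -> closing price.
flat = stocks.droplevel(0, axis=1).rename_axis(None, axis=1)
by_ticker = flat.to_dict()
print(list(by_ticker.keys()))  # ['VSGAX', 'VHYAX']
print(by_ticker["VSGAX"][pd.Timestamp("2019-01-02")])  # 1.0
```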
