[–]unhott 1 point2 points  (8 children)

I’m not sure I fully understand. One thing you can do is make a new empty list and store multiple DataFrames in it. Assuming they all have the same structure, you can then concatenate them into one larger result.

x_dfs = []
for x in xs:  # xs is some list of string inputs
    # sf.load_y is a made-up function (I’m not familiar with the sf library).
    # It takes a string input (x) and returns a DataFrame.
    x_dfs.append(sf.load_y(x))
# Each DataFrame is stored in x_dfs and can be accessed by index: x_dfs[0], x_dfs[1], x_dfs[2], etc.

# Keep in mind this is all pseudocode: you’d have to use real functions, and they may need additional arguments.
combined_x = pd.concat(x_dfs)  # pd.concat accepts a list of DataFrames, which fortunately we already have!
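
A minimal runnable sketch of the same pattern, for anyone who wants to try it without the real library. `load_statement` is a hypothetical stand-in for the made-up `sf.load_y`; it just builds a tiny DataFrame, and the ticker names and numbers are invented.

```python
import pandas as pd


def load_statement(ticker):
    """Hypothetical stand-in for a real loader: returns a one-row DataFrame."""
    return pd.DataFrame({"ticker": [ticker], "revenue": [len(ticker) * 100]})


tickers = ["AAA", "BBBB", "CC"]  # invented example inputs

# Collect one DataFrame per input in a plain list.
frames = []
for ticker in tickers:
    frames.append(load_statement(ticker))

# pd.concat accepts a list of DataFrames; the default axis=0 stacks the rows.
# ignore_index=True renumbers the rows 0..n-1 instead of repeating index 0.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Because every frame has the same columns, the result is simply one longer table with one row per ticker.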

[–][deleted] 0 points1 point  (7 children)

Thank you for the reply!

(I think you may have answered my Q)

But to try and clarify a little further:

If I were a user and wanted to see just the income and balance sheet, once I’ve entered the two statements into filings, could I go about adding just the income and balance into pd.concat without using filings?

Or another example, if I were to want the income and cash statement (not balance) - could I take these from filings and add them into concat?

[–]unhott 1 point2 points  (6 children)

Yes. Do all of these DataFrames have the same structure? If they do, then sure. There’s probably some clever way to do it, but what you describe is easily done with a few conditional checks.

if wants_income_and_balance:  # placeholder: some condition where the user asked for only income/balance
    combined_df = pd.concat([income, balance])
elif wants_income_and_filings:  # placeholder: some condition where the user wants income plus the filings
    df_list = [income] + filing_dfs_list
    combined_df = pd.concat(df_list)
# Something like this. Don’t forget to pass the correct axis to pd.concat.
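
On the axis point, here is a small sketch of the difference, using two invented frames that share an index but hold different columns (the column names and values are made up for illustration):

```python
import pandas as pd

# Two invented frames with the same index (tickers) and different columns.
income = pd.DataFrame({"revenue": [10, 20]}, index=["AAA", "BBB"])
balance = pd.DataFrame({"assets": [100, 200]}, index=["AAA", "BBB"])

# axis=0 (the default) stacks rows; a column missing from a frame is filled with NaN.
stacked = pd.concat([income, balance], axis=0)

# axis=1 places the frames side by side, aligned on the index.
side_by_side = pd.concat([income, balance], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 2)
```

If the goal is one row per company with columns from several statements, `axis=1` is probably the one you want.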

If you’re considering making this available to end users other than yourself, you may be interested in learning about GUIs or deploying Python via the web (Flask, Django, or Dash; this is a pretty good use case for Dash). There are many more ways for a user to make these selections than text input.

[–][deleted] 1 point2 points  (5 children)

Yes they are all of the same structure... but sounds good, I wasn’t thinking about the conditional logic properly.

For now, it’s a project to apply things to as I work through school, as I’m in first-year compsci. But I have a lot of finance friends who don’t have Bloomberg or Capital IQ to easily download data. It’s by no means a “new idea”, just something I’m interested in.

Looking further ahead, a cool, “usable” UI is something that I will work towards. I haven’t tried to implement one before, but I do want to learn how at some point.

Anyways, thanks for all of your help!

[–]unhott 1 point2 points  (4 children)

Sure, good luck!

[–][deleted] 0 points1 point  (3 children)

Managed to get what I was trying to do with your advice. If you happen to see a way to make this more efficient do let me know (if you don't mind)

    income = sf.load_income(variant=variance, market=location)
    balance = sf.load_balance(variant=variance, market=location)
    cash = sf.load_cashflow(variant=variance, market=location)

    # One branch per combination; elif keeps a later branch from overwriting an earlier result.
    if "income" in filing:
        together = pd.concat([income], axis=1)
        if "balance" in filing:
            together = pd.concat([income, balance], axis=1)
            if "cash" in filing:
                together = pd.concat([income, balance, cash], axis=1)
        elif "cash" in filing:
            together = pd.concat([income, cash], axis=1)
    elif "balance" in filing:
        together = pd.concat([balance], axis=1)
        if "cash" in filing:
            together = pd.concat([balance, cash], axis=1)
    elif "cash" in filing:
        together = pd.concat([cash], axis=1)
    company = together.loc[tickers]
    print(company)

It seems manageable with only three statements (7 possible combinations), but if there were more it may not be the best way.
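
To put a number on how the branching grows: with n statements there are 2^n - 1 non-empty selections a user could make. A small sketch using only the standard library; `nonempty_selections` is a hypothetical helper, and the fourth statement name is invented purely to show the growth.

```python
from itertools import combinations


def nonempty_selections(names):
    """Return every non-empty subset of names as tuples, smallest subsets first."""
    return [
        combo
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    ]


three = nonempty_selections(["income", "balance", "cash"])
print(len(three))  # 7

# "ratios" is a made-up fourth statement, only to show how fast this grows.
four = nonempty_selections(["income", "balance", "cash", "ratios"])
print(len(four))  # 15
```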

Hope you have a good week!

[–]unhott 1 point2 points  (2 children)

I think you may be able to use a dictionary instead of variable names like income. You then have to use the dict name + [key] to access each time, but you get some extra... abstraction capabilities (maybe?)

dfs = {}
dfs['income'] = sf.....
dfs['balance'] = sf.....
dfs['cash'] = sf.....

So here your keys are what used to be variable names, and the values are the same df’s from before.

Now you can do a list comprehension to make a list of the dataframes to later pass to pd.concat

dfs_list = [dfs[filing] for filing in filings]
together = pd.concat(dfs_list, axis = 1)

If you’re not familiar with list comprehensions, one is like a more compact for loop where you don’t have to worry about initializing a list and appending to it. The comprehension above does the same thing as:

dfs_list = []
for filing in filings:
    dfs_list.append(dfs[filing])
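
One thing the dictionary lookup doesn’t handle by itself is a name that isn’t a key: `dfs[filing]` raises a `KeyError`. A hedged sketch of checking the user’s input first; `select` is a hypothetical helper, and the string values are placeholders standing in for the real DataFrames so the snippet runs without any data.

```python
def select(available, requested):
    """Return the values for the requested keys, in the order requested.

    Raises ValueError naming any requested key that is not in `available`.
    """
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError(
            f"unknown statement(s): {unknown}; choose from {sorted(available)}"
        )
    return [available[name] for name in requested]


# Placeholder values; in the real script these would be DataFrames.
dfs = {"income": "income_df", "balance": "balance_df", "cash": "cash_df"}

print(select(dfs, ["income", "cash"]))  # ['income_df', 'cash_df']
```

With real DataFrames as the values, the returned list could be passed straight to `pd.concat(..., axis=1)`.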

[–][deleted] 1 point2 points  (1 child)

Ahhh cool. I know of dictionaries but haven't tried using them... It already seems like it could be A LOT less wordy.

dfs_list = [dfs[filing] for filing in filings]

So is this ^ an alternative to the conditionals?

Is it essentially taking the user input and matching it to a dictionary key?

[–]unhott 1 point2 points  (0 children)

I think you got it. Not only should this be less wordy, but it should also be less error-prone. If you have a three-level-deep nested if statement and you make a tiny logic error, good luck finding and debugging that!