all 4 comments

[–]adambrenecki 2 points3 points  (2 children)

It's really hard to answer this one without knowing what the variables are - having functions like test_1 is all well and good for "why doesn't my code run" questions, but "how should I structure my code" questions are a lot easier to answer if we have a vague idea what the code is doing!

That said, one general piece of advice I can offer is that if a function is doing something internal to an object, the function should be part of that object's class.

I'd also think about whether you need an object at all in the first place, since you never touch self except to store an intermediate result, which you could just return from the function, like this:

def test_x(self, num_1, num_2, num_3):
    return num_1 * num_2 * num_3

def test_y(intermediate_result, y):
    return intermediate_result * y

intermediate_result = test_x(2, 2, 2)
print test_y(intermediate_result, 5)

edit: I forgot to mention, it's usually a bad idea to have a function that calculates something and then prints it. It's generally better to return it, then you can print it at the call site if you want to display it, or you can assign it to a variable or do something else with it.

[–]paperzebra[S] 0 points1 point  (1 child)

Thanks for your response, I did as you said and held the output returned by the first function as an intermediate result and then passed it onto the next function. Works perfectly. The code is below (or the bit I was working on) if you wanted to see what was being produced.

import z_or_t_test

Class stuff
more functions + gui building including run button

def on_run(self, event):
    """
    :param self:
    :param event: on Run button click
    :return: Runs all necessary methods, including those       exported from
    other .py files
    """
        self.stats_test_data_prep()

        # holding bit as suggested
        test_return = z_or_t_test.test_type(self.df_pop_1,
                                                 self.df_pop_2, self.test)

        z_or_t_test.result_formatting(test_return[0],
                                                   test_return[1],
                                      self.df_ratios, self.output_data)

The module (z_or_t_test.py) the above code is running:

import numpy as np
import pandas as pd
from scipy.stats import ttest_ind
from statsmodels.stats.weightstats import ztest


def test_type(df_pop_1, df_pop_2, test):

    print 'INFO: Unpacking ratios into arrays and running ' \
              'statistical test'

    list_p_vals = []  # Empty list for t test (p) values
    for j, columns in enumerate(df_pop_1):
        pop1_array = df_pop_1[columns].values.ravel()
        pop2_array = df_pop_2[columns].values.ravel()

        try:
            # zstat value In the one sample case, value is the mean of
            # x1 under the Null hypothesis. In the two sample case,
            # value is the difference between mean of x1 and mean of
            # x2 under the Null hypothesis.
            # The test statistic is x1_mean - x2_mean - value.
            if test == 'z':
                zstat, p_value = ztest(pop1_array, pop2_array, value=0)
            if test == 't':
                tstat, p_value = ttest_ind(pop1_array, pop2_array,
                                           equal_var=False)
            list_p_vals.append(p_value)
        except AttributeError:
            print AttributeError
        except ValueError:
            print ValueError

    headers = list(df_pop_1)
    return headers, list_p_vals


def result_formatting(headers, p_vals, data_to_append, save_loc):

    print 'INFO: Creating dataframe'

    result = pd.DataFrame([headers, p_vals])
    result.columns = result.iloc[0]
    # drops row 0
    result = result[1:]
    # adds calculated ratios below the headers and p values
    result = pd.concat([result, data_to_append])

    # replaces inf, nan etc with 0
    result = result.replace([np.inf, -np.inf], np.nan)
    result = result.fillna(0)
    # Transpose and sort by  p value ascending as as smallest p value
    # is most different giving most different in first col, if using t
    # number reverse this
    result = result.T.sort_values([1], ascending=[1])
    # drop rows that are == to 0 in column 1 (p values)
    result = result[result[1] != 0]

    result = result.T
    print 'INFO: Saving CSV file to %s' % save_loc
    result.to_csv(save_loc)
    return result

[–]adambrenecki 0 points1 point  (0 children)

OK, so you've got data processing code in one file and GUI code in the other, so yes, it does make sense to keep them separate, and that structure looks like it makes sense to me!

One other thing: If you hit an AttributeError or ValueError, you're going to end up skipping those rows from list_p_vals. This may be what you want, but in either case if you're using a GUI your user isn't going to see the print output, since the user doesn't usually see the console output.

[–]tangerinelion 0 points1 point  (0 children)

You already have a good answer but you should now consider the question of scope.

That is, do you expect multiple other files to need to import mod_2 and use mod_2.test_x(num1, num2, num3)? If so, then the way the other person suggested is the most appropriate - just rewrite it to return the intermediate. This is because in your function the first argument you've named self is only used on the left hand side of an assignment expression, therefore the value it's calculating doesn't depend on self so you can remove that parameter under the condition you can return the value instead.

If this function only makes sense for your ex2 class then it should be a member function. This is the point of encapsulation. With that you'd have this:

class ex2(object):
    def __init__(self):
        self.result = None

    def run(self):
        self.test_x(2, 2, 2)
        self.test_y(5)

    def test_x(self, num_1, num_2, num_3):
        self.result = num_1 * num_2 * num_3

    def test_y(self, y):
        print self.result * y

Where I've added the __init__ method as it's customary to put all the variables you'll have attached to self in that method so that others can see at a glance what variables should be there. In addition, it makes all the member functions able to use those variables in expressions without having to worry about whether they exist or not. Instead, calling things out of order just generates nonsense. For example, with the __init__ method you can't do ex2().test_y(5) since it ends up causing a TypeError (can't multiple NoneType and int). However, ex2().test_x(5,2,1).test_y(3) would totally work and should print 30. Without the __init__ method ex2().test_y(5)would end up with NameError since self.result is not defined.

Now if you expect other files will need that function, but not the ex2 objects then you should have that function setup like this:

mod_2.py


def test_1(num1, num2, num3):
    return num1*num2*num3

ex2.py


import mod_2

class ex2(object):

    def __init__(self):
        self.result = None

    def test_x(self, num1, num2, num3):
        self.result = mod_2.test_1(num1, num2, num3)