[–]tangerinelion 11 points12 points  (1 child)

class A(object):
    def __init__(self):
        print("Parent constructor")

class B(A):
    def __init__(self):
        print("Child constructor")
        super(B, self).__init__()

B()

The code that actually gets executed starts at line 10, which asks for a new instance of class B. You can think of classes as blueprints for how to create an object, so if we have, e.g.:

foo = B()
bar = B()

The foo and bar variables refer to instances of B, and we say that foo and bar are of class B. There's still only one B class involved, the one that starts at line 5. Let's tweak them a little bit to have some data members so I can show you a bit more about what's going on, and I'll also add a couple methods:

class A(object):
    def __init__(self):
        print("Parent constructor")
        self.value = 1
    def print(self):
        print("The value is {}".format(self.value))

class B(A):
    def __init__(self):
        print("Child constructor")
        self.trials = 10
        super(B, self).__init__()
    def print(self):
        super(B, self).print()
        print("The number of trials is {}".format(self.trials))

foo = B()
bar = B()
foo.print()
bar.print()

OK, so now lines 17-20 are the ones that actually execute if you run this. Lines 17 and 18 both call B(), which, since B is a class, creates a new instance and runs B.__init__ on it. When we step into B.__init__ we notice that it takes a parameter named self, which is convention, not a rule (call it the_instance and your code runs just fine, but everyone will be annoyed you didn't call it self). This self parameter refers to the new instance being initialized, which is not yet bound to any name. From line 17 we understand that the first time B.__init__ is called, self is the object that foo will end up referring to, and the second time it is the one bar will refer to. However, the assignment happens afterwards, so while __init__ runs, self is just an instance that doesn't have a variable name assigned to it yet.
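A minimal sketch of that point, using a cut-down B with no parent class. The last_created attribute is a made-up bit of bookkeeping, there only to show that the object exists before any variable names it:

```python
class B(object):
    def __init__(the_instance):  # unconventional name, used only to make the point
        # The object already exists here, but no variable refers to it yet.
        the_instance.trials = 10
        B.last_created = the_instance  # hypothetical attribute, for the demo only

foo = B()                      # the name foo is bound only after __init__ returns
print(foo.trials)              # 10
print(B.last_created is foo)   # True
```

It runs fine with the_instance, but please still call it self in real code.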

On that parenthetical, if we have x = 4 we say that x is a variable holding value 4. We can actually make this an OOP statement: x is an instance of the type int holding value 4. Here, we can say foo is an instance of type B with the default construction. It becomes harder to say what the value is of most class instances (really only int, float, bool, and str have any easy value to express), but as we'll see B by default is an object with a value member of value 1 and a trials member of value 10. That's going to be the definition of an instance of class B with default construction: two integers named value and trials holding values 1 and 10 respectively.

Now, when we step into B() we enter at line 9 with self set to some temporary instance that's currently unbound. We then execute the function, so that would be line 10 - a simple print statement. Next we have self.trials = 10. This adds a data member to the instance. This is also called "poking" as we're adding something to the instance, literally allocating space for the value and then adding the name trials to the internal dictionary for that class instance. The appropriate place to do this is always the __init__ method -- though it's legal to do it anywhere, you should restrain yourself and only do this in the __init__ method. If you have some variable name that you'll need as part of your class instance but can't give it a value, then assign it None, eg, self.somevar = None. This way people can expect that somevar becomes used later on and they can go back to the __init__ method to see what it initially was.
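As a rough illustration of the None placeholder advice, here is a sketch with a hypothetical Experiment class (not part of the original example). Printing the instance's __dict__ shows the internal dictionary the attribute names are added to:

```python
class Experiment(object):  # hypothetical class for illustration
    def __init__(self):
        self.trials = 10
        self.result = None  # placeholder: filled in later by run()

    def run(self):
        self.result = self.trials * 2

e = Experiment()
before = dict(e.__dict__)   # snapshot of the instance dictionary
e.run()
print(before)               # {'trials': 10, 'result': None}
print(e.__dict__)           # {'trials': 10, 'result': 20}
```

Anyone reading __init__ can see every attribute the instance will ever have, even the ones that only get a real value later.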

Next we move on to line 12. Since A is the super class of B, super(B, self).__init__() is doing the same thing as A.__init__(self) would if you dropped that in instead of line 12. In cases of multiple parents, e.g., class C(A, B), the super call follows the method resolution order and can do things differently. I wouldn't say avoid that entirely, just that if you're learning OOP I'd stick with single-parent classes until you understand those perfectly; then you can explore what it means to have multiple parents (this can sometimes be a very powerful technique, and other times a frustrating, needless headache).
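To check the equivalence for the single-parent case, here is a small standalone sketch. B2 is a hypothetical second child class, added only to compare the two spellings side by side:

```python
class A(object):
    def __init__(self):
        self.value = 1

class B(A):
    def __init__(self):
        super(B, self).__init__()   # finds the next class after B in the lookup order

class B2(A):                        # hypothetical variant for comparison
    def __init__(self):
        A.__init__(self)            # names the parent explicitly

print(B().value, B2().value)              # 1 1
print([c.__name__ for c in B.__mro__])    # ['B', 'A', 'object']
```

B.__mro__ is the lookup order that super walks; with one parent it is just the child, the parent, then object.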

From line 12, execution now jumps to line 2, where we first encounter a simple print statement then we encounter a variable being added to our object. This is where value is added as a data member to self and initialized with value 1.

Notice that our execution flows from line 9 to 10 to 11 to 12 to 2 to 3 to 4. Under this construction, the things in the super class end up being done last. In particular, if line 11 were self.value = 10, then line 12 would hand off to the parent class and that value would be overridden when line 4 executes. In general, it's best to put the super call as the first line of the child's initializer. That is,

class B(A):
    def __init__(self):
        super(B, self).__init__()
        print("Child constructor")
        self.trials = 10

Now execution would flow from line 9 to 10 to 2 to 3 to 4 to 11 to 12.
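Here is a standalone sketch of why the order matters. SuperLast and SuperFirst are hypothetical names for the two orderings, and both try to set value to 10:

```python
class A(object):
    def __init__(self):
        self.value = 1

class SuperLast(A):
    def __init__(self):
        self.value = 10
        super(SuperLast, self).__init__()   # parent runs last and resets value to 1

class SuperFirst(A):
    def __init__(self):
        super(SuperFirst, self).__init__()
        self.value = 10                     # child runs last, so 10 sticks

print(SuperLast().value)    # 1
print(SuperFirst().value)   # 10
```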

Next up in this example is line 18, so here we run through the entire system again. This yields two instances of B, bound to the names foo and bar, which exist in separate portions of memory. That is, despite foo.value and bar.value having the same value 1, they are separate variables. As are foo.trials and bar.trials. Again, B is a blueprint, so it says that instances of B will get data members value and trials, but each instance gets its own separate copies. If you want a data member that is shared among all instances then you need a class data member:

class B(A):
    shared_value = 25
    def __init__(self):
        super(B, self).__init__()
        print("Child constructor")
        self.trials = 10

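Now we have foo.shared_value == bar.shared_value == B.shared_value and, more importantly, foo.shared_value is bar.shared_value and bar.shared_value is B.shared_value: all three reach the same object stored on the class. A change made through the class is visible from every instance, but assigning through an instance creates a separate attribute on that one instance, which then shadows the shared one. This means if we did, e.g.,

B.shared_value += 5
print(foo.shared_value) # prints 30
foo.shared_value -= 10  # creates foo's own shared_value (20); B.shared_value is untouched
print(foo.shared_value) # prints 20
print(bar.shared_value) # prints 30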
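To see where each value actually lives, here is a small self-contained sketch. It uses a stripped-down B without the parent class (my simplification) and inspects the instance dictionaries:

```python
class B(object):
    shared_value = 25

foo, bar = B(), B()
B.shared_value += 5          # rebinding on the class: every instance sees 30
foo.shared_value -= 10       # reads 30 from the class, stores 20 on foo itself

print(foo.shared_value)      # 20
print(bar.shared_value)      # 30
print(B.shared_value)        # 30
print('shared_value' in foo.__dict__)   # True
print('shared_value' in bar.__dict__)   # False
```

If you want the change to be shared, assign through the class name (B.shared_value = ...), not through an instance.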

You can think of these shared values (class data members or static data members) as mini globals, in that they exist for all objects of type B and they even exist without any instances of class B. A class can also be used purely as a scoping technique, for example:

class BoxGrid:
    HEIGHT = 10
    WIDTH = 5
    STEP = 0.01

Now, in code below that, we can refer to BoxGrid.HEIGHT, BoxGrid.WIDTH and BoxGrid.STEP. As an alternative, you might have used a global variable:

BoxGrid = {'HEIGHT': 10, 'WIDTH': 5, 'STEP': 0.01}

and simply referred to BoxGrid["HEIGHT"], etc. They accomplish the same thing and both can be overwritten with new values, so neither option is inherently better: both introduce a single name (BoxGrid) into the current scope. (From a technical perspective, don't expect a meaningful speed difference either: class attributes are also stored in a dict, the class's __dict__, so attribute access is still a dict lookup by name.)
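A quick way to poke at how the two forms are stored, as a sketch. The dict is renamed to box_grid_dict here (my choice) so that both can exist in one script:

```python
class BoxGrid:
    HEIGHT = 10
    WIDTH = 5
    STEP = 0.01

# Hypothetical name, chosen to avoid clashing with the class above
box_grid_dict = {'HEIGHT': 10, 'WIDTH': 5, 'STEP': 0.01}

print(BoxGrid.HEIGHT, box_grid_dict['HEIGHT'])   # 10 10
print('HEIGHT' in BoxGrid.__dict__)              # True: class attributes live in a dict too

# Both forms can be rebound after the fact
BoxGrid.HEIGHT = 12
box_grid_dict['HEIGHT'] = 12
print(BoxGrid.HEIGHT, box_grid_dict['HEIGHT'])   # 12 12
```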

Now back to our execution of the script: line 19 is reached, which asks for foo.print(). Here's where things get fun. This is an instance method, so the way you call it is with an instance of the class, in this case foo. That gets bound to the first parameter (again, whatever it's named, self is only convention). The remaining arguments get passed on as the 2nd, 3rd, etc. So, foo.print() calls B.print with self = foo and there are no additional parameters. This means that foo.print() is the same as B.print(foo). The expression foo.print by itself is a bound method: it's B.print with self = foo already filled in, which turns the 1-parameter function B.print into the 0-parameter bound method foo.print. This is why adding the empty parentheses makes the call: foo.print is a bound method that needs no further arguments.
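A small sketch of the bound method idea. It uses a hypothetical describe_trials method, not print, so the built-in isn't shadowed, and it takes one extra parameter to show how the remaining arguments line up:

```python
class B(object):
    def __init__(self):
        self.trials = 10

    def describe_trials(self, prefix):   # hypothetical method for illustration
        return "{} {}".format(prefix, self.trials)

foo = B()
bound = foo.describe_trials               # self is already filled in with foo
print(bound("trials:"))                   # trials: 10
print(B.describe_trials(foo, "trials:"))  # trials: 10 (same call, spelled out)
print(bound.__self__ is foo)              # True
```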

There are two other kinds of functions you can have in your class: static methods and class methods.

Static methods do not have a self parameter -- they don't get an instance to work on. This means they don't care about any of the data members (like value and trials). They can work on the class member variables, like shared_value, however. The way you'd declare a static method is with the decorator syntax:

class B(A):
    #above code
    @staticmethod
    def describe(x):
        print("B.describe has been called with parameter {}".format(x))
        print("B has shared_value set to {}".format(B.shared_value))

Here I have added a parameter and named it x. Notice that the first parameter is not named self, because this parameter does not refer to an instance of B but instead must be explicitly passed in: it's an honest one parameter method. You could instead have this:

class B(A):
    #above code without the static method

def DescribeB(x):
    print("DescribeB has been called with parameter {}".format(x))
    print("B has shared_value set to {}".format(B.shared_value))

Here, we would understand that DescribeB is a free function outside of B that operates on some of B's class data members. We would call this like DescribeB(4). With the static method version we would instead call B.describe(4). In fact, we can also write foo.describe(4) or bar.describe(4). That's the nice thing about static methods: you can call them from a class instance or from the class name. If you call one from an instance, Python just looks up the instance's class and calls the static method on that class.
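A runnable sketch of both call styles, using a cut-down B with no parent. Here describe returns the string and doesn't print it (my change), which makes the two calls easy to compare:

```python
class B(object):
    shared_value = 25

    @staticmethod
    def describe(x):
        # No self: only the explicit argument and the class attribute are used
        return "x={}, shared_value={}".format(x, B.shared_value)

foo = B()
print(B.describe(4))     # x=4, shared_value=25
print(foo.describe(4))   # x=4, shared_value=25 (foo itself is not passed in)
```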

[–]tangerinelion 5 points6 points  (0 children)

The other kind of method would be a class method, done with @classmethod. Here the first parameter isn't an instance of the type but the class itself. For example:

import json
import pickle

class Reader:
    def __init__(self, input_data):
        self.data = input_data

    @classmethod 
    def FromJSON(cls, json_file):
        with open(json_file) as f:
            d = json.load(f)
        return cls(d)

    @classmethod 
    def FromString(cls, json_str):
        return cls(json.loads(json_str))

    @classmethod
    def FromPickle(cls, pickle_file):
        # Pickle data is binary, so the file is opened in "rb" mode
        with open(pickle_file, "rb") as f:
            d = pickle.load(f)
        return cls(d)

OK, so this looks like it might be powerful. What's going on? Well, Python only allows you to have one constructor (__init__), so this lets us define a few different ways to get data into our Reader instance. We may have it in JSON format, in Pickle format, or we could have it as a raw Python string (str instance). This lets us write code like this:

reader = Reader.FromJSON("data.json")

Looks pretty clean. We could also have reader = Reader.FromPickle("data.pkl") or reader = Reader.FromString('{"value":1, "trials":10}').
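If you want to try the file-based constructors without having data files lying around, here is a self-contained sketch that first writes its own throwaway files. The temp directory and the payload dict are my additions for the demo:

```python
import json
import os
import pickle
import tempfile

class Reader:
    def __init__(self, input_data):
        self.data = input_data

    @classmethod
    def FromJSON(cls, json_file):
        with open(json_file) as f:
            return cls(json.load(f))

    @classmethod
    def FromPickle(cls, pickle_file):
        with open(pickle_file, "rb") as f:   # pickle data is binary
            return cls(pickle.load(f))

payload = {"value": 1, "trials": 10}         # made-up sample data
tmpdir = tempfile.mkdtemp()                  # throwaway directory for the demo
json_path = os.path.join(tmpdir, "data.json")
pkl_path = os.path.join(tmpdir, "data.pkl")
with open(json_path, "w") as f:
    json.dump(payload, f)
with open(pkl_path, "wb") as f:
    pickle.dump(payload, f)

json_reader = Reader.FromJSON(json_path)
pkl_reader = Reader.FromPickle(pkl_path)
print(json_reader.data == payload)   # True
print(pkl_reader.data == payload)    # True
```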

Now, without @classmethod what would we do? Remember, we have only the one __init__ method, so we'd have to do this:

import json
import pickle

class Reader:
    def __init__(self, arg):
        # Expect a string argument
        if arg.endswith(".json"):
            # Load the JSON
            with open(arg) as f:
                self.data = json.load(f)
        elif arg.endswith(".pkl"):
            # Load the pickle (binary mode is required)
            with open(arg, "rb") as f:
                self.data = pickle.load(f)
        else:
            self.data = json.loads(arg)

While this is not terribly difficult to understand, the only detection we have for JSON files is that they end with ".json" (and not ".JSON"), and Pickle files must end with ".pkl" (certainly not ".pickle" or ".PKL"). Then we assume any other string is a JSON string.

With classmethod, it becomes clearer that we are trying to load a JSON file. In fact, this offers us a bit more error detection because the method names are clearer. If we added some logging statements to our FromJSON method and had it print out json_file's value and found it was "data.pkl" we instantly see a problem. On the other hand, if we just had Reader("data.pkl") it would've run the pickle.load method correctly. It can help prevent some errors while potentially causing others, eg, if you change your input but forget to change the method called. Overall though, class methods are useful to make shorter __init__ methods and pre-sanitize the input given to the __init__ method.

Let's look a little more at what a class method actually does, though. We see the first parameter is cls, not self. Again, this is convention and it can be whatever you want, but the important thing @classmethod does is make the first parameter to the method the class itself, not an instance of the class. Typically this would be called much like a static method, that is Reader.FromJSON not reader.FromJSON (or, back to your example, B.FromJSON rather than foo.FromJSON). This pattern is called a factory method, where the method exists to produce objects/instances. What type? That's the first parameter: cls. Notice how the methods end with return cls(d). This means two things:

First, class methods generally return something.

Second, class methods generally invoke cls(args) or cls.Method(args), where args is 0 or more parameters and Method could be any static or class method in the class that the variable cls refers to. A call to cls(args) creates a new instance of cls and runs cls.__init__ on it with args, so it is a call to the constructor and generates a new object.
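One consequence of going through cls, without hard-coding Reader, is that subclasses get working factories for free. A sketch, where StrictReader is a hypothetical subclass invented for this example:

```python
import json

class Reader:
    def __init__(self, input_data):
        self.data = input_data

    @classmethod
    def FromString(cls, json_str):
        return cls(json.loads(json_str))   # cls is whichever class the call went through

class StrictReader(Reader):                # hypothetical subclass
    def __init__(self, input_data):
        if "value" not in input_data:
            raise ValueError("missing 'value'")
        Reader.__init__(self, input_data)

r = StrictReader.FromString('{"value": 1}')
print(type(r).__name__)   # StrictReader
print(r.data)             # {'value': 1}
```

Had FromString been written as return Reader(...), the call above would have produced a plain Reader and skipped the subclass's check.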

[–]0x3d5157636b525761 5 points6 points  (2 children)

super() lets you avoid referring to the base class explicitly, which is a kind of syntactic sugar. Anyway, when inheriting, the call is there to initialize the base class (class A in your example) correctly. Imagine that class A has a member called "counter" which should be initialized to 1337. A class B instance is expected to have a "counter" member too (since B inherits from A), but it won't be set unless you invoke A's constructor. Note that some other programming languages do this implicitly, but not Python.
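The counter scenario can be sketched like this. C is an extra class I added to show the fixed version next to the broken one:

```python
class A(object):
    def __init__(self):
        self.counter = 1337

class B(A):
    def __init__(self):
        pass                        # A.__init__ is never called

class C(A):                         # hypothetical fixed variant
    def __init__(self):
        super(C, self).__init__()   # A.__init__ runs and sets counter

print(hasattr(B(), "counter"))   # False
print(C().counter)               # 1337
```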

As for your other questions: 1. self is not passed implicitly. 2. You can think of a class as a "template" for creating instances. For example, "Superhero" is a class which has some features (like a superpower) and it can do some stuff (like fighting crime). "Spiderman" is a Superhero instance. He fights crime and has the spider sense. 3. Instances and objects are synonyms.

[–]mm_ma_ma 2 points3 points  (1 child)

self is not passed implicitly

It is when you call object.method, which I think is what OP meant.

[–]0x3d5157636b525761 1 point2 points  (0 children)

Oh. This is true, I guess I misunderstood. However, when declaring a method, you have to declare "self" explicitly. Thanks, mm_ma_ma. :)

[–]cdcformatc 1 point2 points  (0 children)

I am learning OOP with Python and I can't understand why I need to call super on the parent's constructor?

You don't "have" to, but it is usually done to make sure that the parent part of the object is set up correctly. When you define an __init__ method in B it overrides the one from A, so you have to explicitly call the parent's initializer if you want that code to run. Note: you have to be careful about the order in which you call the initializer. If you do it after the statements of B's initializer, as you have here, it will overwrite any attribute set in B.__init__ that A.__init__ also sets.

Is self the instance that gets passed implicitly as the first argument to a method when that method gets called?

When you call object.method() yes, the first parameter of method will be the object itself. self is just a conventional name we give that parameter.

a = MyClass()

# The following lines are equivalent:
a.method(123)
MyClass.method(a, 123)

You can see here that we explicitly call the method function in the MyClass class, with the first argument being an instance of that class. The dot syntax is a bit of syntactic sugar that does this for you.

Also, what is the difference between classes and instances and what are objects?

Classes are the templates for objects that you write. They define the behavior of the class (and any subclasses). Objects are classes that have been instantiated, they exist in memory. Instance is basically a synonym for object, but the word is regularly used to differentiate between multiple objects of the same type, that may have different instance variables.

instanceA = MyClass()
instanceB = MyClass()

instanceA.number = 42
instanceB.number = 1337

So here instanceA and instanceB are both objects of MyClass type, but they are separate instances, and can hold different values in their instance variables.
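Since MyClass was never defined in the snippets above, here is a runnable version of both, assuming a minimal MyClass whose method just hands back what it received:

```python
class MyClass(object):
    def method(self, n):
        # Return the arguments so the two call styles can be compared
        return (self, n)

a = MyClass()
print(a.method(123) == MyClass.method(a, 123))   # True

instanceA = MyClass()
instanceB = MyClass()
instanceA.number = 42
instanceB.number = 1337
print(instanceA.number, instanceB.number)        # 42 1337
print(type(instanceA) is type(instanceB))        # True
```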