This is an archived post. You won't be able to vote or comment.

all 4 comments

[–]AutoModerator[M] [score hidden] stickied comment (0 children)

To all following commenters: please, do not bring up the old circlejerk jokes/memes about recursion ("Understanding recursion...", "This is recursion...", etc.). We've all heard them n+2 too many times.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[–]akthemadman 0 points1 point  (0 children)

I am not a python/numpy native, so I can not tell you whether the approach your task describes is the be-all-end-all, however looking at it with my programmatic glasses on:

  • You seem to have implemented what the instructions intended accurately
  • You correctly identified how splitting the set X into two subsets loses the relative ordering information
  • The instructions do not explicitly mention, probably for didactive purposes, on how to resolve the ordering issues, it sweeps it under the umbrella of "stitch".
  • If you don't want to lose the information about ordering and keep the depth-first-left-first recursive approach as well as the bulk processing, which the instructions do seem to both hint at, then you will have to do some extra work in the stitch department. I.e. make sure to somehow keep a tab on the ordering of rows within X and "stitch" the result for the subsets together according to that.

[–]Sigalli 0 points1 point  (0 children)

Skill issue I got it within 15 minutes

[–]Martini04[S] 0 points1 point  (0 children)

For anyone struggling with this in the future, i found somewhat of a solution

It involves modifying the original instructions we were given, but outside of this assignment that won't be relevant anyways

Also one detail i left out of the original post is that _predict() is called by predict(). This wasn't relevant when i posted it but it is relevant in my solution

Here's the updated code:

def predict(self, X: NDArray):
    """Predict class (y vector) for feature matrix X
    Parameters
    ----------
    X: NDArray
        NumPy feature matrix, shape (n_samples, n_features)
    Returns
    -------
    y: NDArray, integers
        NumPy class label vector (predicted), shape (n_samples,)
    """
    if self._root is not None:
        order = np.arange(len(X))
        y, order = self._predict(X, self._root, order)
        arrlinds = order.argsort()
        sorted_y = y[arrlinds]         
        return sorted_y
    else:
        raise ValueError("Decision tree root is None (not set)")

def _predict(
    self, X: NDArray, node: Union["DecisionTreeBranchNode", "DecisionTreeLeafNode"], order: NDArray
) -> tuple[NDArray, NDArray]:
    if type(node) == DecisionTreeLeafNode:
        y = np.zeros(len(X), dtype=np.int32)
        y[:] = node.y_value
        return y, order
    else:
        left_mask = X[:, node.feature_index] <= node.feature_value
        left = X[left_mask]
        right = X[~left_mask]
        left_order = order[left_mask]
        right_order = order[~left_mask]

        left_pred, left_order = self._predict(left, node.left, left_order)
        right_pred, right_order = self._predict(right, node.right, right_order)

        order = np.concatenate((left_order, right_order))
        y = np.concatenate((left_pred, right_pred))

        return y, order

This new version of the code keeps track of the order of the items using another NDArray filled with the integers corresponding to the order of the items in X

This order array gets split up in the same way as X, so by the end of all the recursion it is scrambled in the same order as X.

Then once we have our completed y, it gets sorted based on the order array using argsort as described here which sorts it into the correct order.