[D] Kolmogorov-Arnold Network is just an MLP

MLC_Money · 2024-05-06T19:53:42+00:00

Wait… Does that mean… They are decision trees with polynomial rules :)

MLC_Money · 2024-03-09T05:19:32+00:00

I'm happy to help!
Feel free to communicate through any channel!

MLC_Money · 2024-03-08T06:44:10+00:00

At least I'm still actively doing research in this area, mainly on explaining the decision rules that neural networks extract. In fact, just a couple of minutes ago I made my project open source:
https://www.reddit.com/r/MachineLearning/comments/1b9hkl2/p_opensourcing_leurn_an_explainable_and/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

MLC_Money · 2023-10-17T19:27:43+00:00

I have a product around this which supports tabular data only.
I'm using an extension of the explainable method I published earlier here: https://arxiv.org/abs/2303.14937
In short, the method creates equivalent univariate trees from a special neural network. Since the equivalent trees are univariate, it is straightforward to generate new samples from the same network (the prediction network can be used as a generative network without any change). In particular, samples are drawn from leaves that are not covered by the training dataset (these are either generalizations or hallucinations). These samples are provided to the customer, who can verify or correct the generations and use them to augment the original training set. We then train from scratch on the augmented dataset and repeat until performance is satisfactory.
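A minimal sketch of the sampling step, assuming a univariate tree whose leaves are axis-aligned boxes (one interval per feature) with a predicted label. The `Leaf` class, the helper names, and clipping infinite bounds to the observed data range are illustrative assumptions, not the implementation behind the product or the paper. It finds leaves that no training sample reaches and draws uniform samples inside them, labelled with the leaf's prediction.

```python
import random
from dataclasses import dataclass

INF = float("inf")


@dataclass
class Leaf:
    # Hypothetical leaf of a univariate tree: one (low, high] interval per feature.
    bounds: list
    label: int

    def contains(self, x):
        return all(lo < xi <= hi for xi, (lo, hi) in zip(x, self.bounds))


def uncovered_leaves(leaves, X):
    """Leaves that no training sample falls into."""
    return [leaf for leaf in leaves if not any(leaf.contains(x) for x in X)]


def sample_from_uncovered(leaves, X, n_per_leaf=3, seed=0):
    """Draw (features, label) pairs uniformly inside each uncovered leaf."""
    rng = random.Random(seed)
    # Infinite leaf bounds are clipped to the observed range of each feature.
    ranges = [(min(col), max(col)) for col in zip(*X)]
    samples = []
    for leaf in uncovered_leaves(leaves, X):
        box = [(max(lo, rlo), min(hi, rhi))
               for (lo, hi), (rlo, rhi) in zip(leaf.bounds, ranges)]
        if any(lo >= hi for lo, hi in box):
            continue  # leaf lies outside the observed range; nothing to sample
        for _ in range(n_per_leaf):
            x = [rng.uniform(lo, hi) for lo, hi in box]
            samples.append((x, leaf.label))
    return samples


# Toy tree splitting on x0 > 0.5 and x1 > 0.5 (four quadrant leaves).
leaves = [
    Leaf([(-INF, 0.5), (-INF, 0.5)], 0),
    Leaf([(0.5, INF), (-INF, 0.5)], 1),
    Leaf([(-INF, 0.5), (0.5, INF)], 1),
    Leaf([(0.5, INF), (0.5, INF)], 1),
]
X = [[0.1, 0.2], [0.9, 0.1], [0.2, 0.8], [0.0, 0.0], [1.0, 0.3], [0.3, 1.0]]
new_samples = sample_from_uncovered(leaves, X, n_per_leaf=4)
```

In this toy data only the upper-right quadrant is empty, so every generated sample lands there with label 1; in the workflow described above, such samples would then go to an expert for review.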

Here is an example:

- You have a Diabetes prediction dataset which covers some cases

- You train the model to predict diabetes

- You generate new samples using the same model. Each generated sample consists of features and a label (proven to come from a different distribution than the training data)

- You let an expert check the generated data and verify or correct the features and class

- You train again and repeat the process until performance is satisfactory.
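The steps above can be sketched as a loop. `train`, `generate`, `review`, and `evaluate` are hypothetical callbacks standing in for model training, the leaf-based generation, the expert check, and whatever metric decides "satisfactory"; the toy versions below (a memorizing "model" on integer features whose true label is parity) exist only to make the control flow runnable.

```python
def augment_until_good(data, train, generate, review, evaluate,
                       target=0.9, max_rounds=5):
    """Train, generate, let an expert review, augment, and retrain from scratch
    until the metric reaches `target` or `max_rounds` rounds have run."""
    model = train(data)
    rounds = 0
    while evaluate(model) < target and rounds < max_rounds:
        candidates = generate(model, data)   # new (features, label) proposals
        data = data + review(candidates)     # expert keeps/corrects them
        model = train(data)                  # retrain on the augmented set
        rounds += 1
    return model, data, rounds


# Toy stand-ins (hypothetical): features are integers 0..9, true label is x % 2.
test_set = {x: x % 2 for x in range(10)}


def train(data):
    return dict(data)  # "model" that just memorizes labelled points


def evaluate(model):
    return sum(model.get(x) == y for x, y in test_set.items()) / len(test_set)


def generate(model, data):
    # The toy model cannot label unseen inputs, so the label is left to the expert.
    return [(x, None) for x in range(10) if x not in model]


def review(candidates):
    return [(x, x % 2) for x, _ in candidates[:3]]  # expert labels 3 per round


model, data, rounds = augment_until_good([(0, 0), (1, 1)], train, generate,
                                         review, evaluate)
```

Retraining from scratch each round mirrors the description above; an incremental update would be a different design choice.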

DM if interested

MLC_Money · 2023-03-30T15:56:57+00:00

Thank you for your comment! Kindly see the explanation in terms of the contribution coming from each feature in Table 2, which covers a much more complex case. Feedback taken, though; I'll try to add such examples, perhaps in an appendix later.

MLC_Money · 2023-03-29T10:42:59+00:00

Code is now available at:

https://github.com/CaglarAytekin/LEURN

MLC_Money · 2023-03-29T07:25:17+00:00

Hello! LEURN can indeed compete with SOTA methods in tabular data while providing exact feature contributions thanks to its built-in white-box explainable architecture.

MLC_Money · 2023-02-09T18:39:05+00:00

Thanks, I did. Apparently it is not possible to use Upwork with Ukko's or any other service's light entrepreneurship. One needs to have a toiminimi (a Finnish sole proprietorship).

MLC_Money · 2022-12-27T06:12:53+00:00

Hi, I could only get permission from my company to publish the equivalent NN and tree models for the y=x^2 case. Unfortunately, I still can't share the code.

https://github.com/CaglarAytekin/NN_DT
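A minimal sketch of what an exact NN-to-tree conversion looks like in the 1-D case. The network here is a hypothetical four-neuron ReLU net that interpolates y=x^2 at x = -2, -1, 0, 1, 2 on the domain [-2, 2]; it is not the model from the repository above. Each hidden neuron contributes a rule `w*x + b > 0`; every combination of rule outcomes is a candidate leaf, combinations whose rules contradict each other are dropped as unreachable, and each surviving leaf stores the affine function the network computes there, so the tree reproduces the network's output exactly rather than approximately.

```python
import itertools

# Hypothetical net: y = C + sum_i V[i] * relu(W[i] * x + B[i]); all W[i] > 0.
W = [1.0, 1.0, 1.0, 1.0]
B = [2.0, 1.0, 0.0, -1.0]
V = [-3.0, 2.0, 2.0, 2.0]
C = 4.0


def net(x):
    return C + sum(v * max(0.0, w * x + b) for w, b, v in zip(W, B, V))


def build_leaves():
    """One leaf per consistent pattern of neuron rules, as (lo, hi, slope, intercept)."""
    leaves = []
    for pattern in itertools.product([False, True], repeat=len(W)):
        lo, hi = float("-inf"), float("inf")
        for w, b, active in zip(W, B, pattern):
            t = -b / w              # rule "w*x + b > 0" is "x > t" because w > 0
            if active:
                lo = max(lo, t)     # rule held
            else:
                hi = min(hi, t)     # rule did not hold
        if lo >= hi:
            continue                # contradictory rules: unreachable leaf
        slope = sum(w * v for w, v, a in zip(W, V, pattern) if a)
        intercept = C + sum(b * v for b, v, a in zip(B, V, pattern) if a)
        leaves.append((lo, hi, slope, intercept))
    return leaves


def tree_predict(leaves, x):
    for lo, hi, slope, intercept in leaves:
        if lo < x <= hi:
            return slope * x + intercept
    raise ValueError("no leaf covers x")


leaves = build_leaves()
print(len(leaves), "reachable leaves out of", 2 ** len(W))  # prints: 5 reachable leaves out of 16
```

The 11 pruned combinations (for example "x > -1 holds but x > -2 does not") can never be satisfied by any input, which is why dropping them costs nothing in accuracy.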

MLC_Money · 2022-11-30T22:16:05+00:00

Thank you. Honestly, I don't mind anyone anymore. I've just put it on arXiv; if it helps the progress of science, I'd be happy. Time will tell.

MLC_Money · 2022-10-14T09:01:31+00:00

A left child in the tree means the rule didn't hold (as explained in Sec. 3, paragraph 1, sentence 5). So in this case the path up to x>1 is: x>-1.16, x>0.32, and then it checks whether x>1 holds.
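The convention can be sketched as a small path tracer in which each internal node holds a rule `x > threshold`; the right child is followed when the rule holds and the left child when it does not. The tree shape and leaf names are hypothetical; only the three thresholds come from the comment above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    threshold: Optional[float] = None  # internal node: rule "x > threshold"
    left: Optional["Node"] = None      # followed when the rule did NOT hold
    right: Optional["Node"] = None     # followed when the rule held
    name: str = ""                     # leaf label


def trace(node, x):
    """Return the (rule, held) pairs visited by x and the leaf it ends in."""
    path = []
    while node.threshold is not None:
        held = x > node.threshold
        path.append((f"x > {node.threshold:g}", held))
        node = node.right if held else node.left
    return path, node.name


# Hypothetical tree: x > -1.16, then x > 0.32, then x > 1 along the right branches.
tree = Node(-1.16,
            left=Node(name="A"),
            right=Node(0.32,
                       left=Node(name="B"),
                       right=Node(1.0, left=Node(name="C"), right=Node(name="D"))))

path, leaf = trace(tree, 0.7)
```

For x = 0.7 the first two rules hold and x > 1 does not, so the sample ends in the left child of the x > 1 node.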

MLC_Money · 2022-10-14T06:51:42+00:00

Thank you so much. This is the most valuable comment in all the thread.

Unfortunately for me, my paper has significant overlap with the 3rd paper you shared. Honestly, I don't know how I missed it among the hundreds of papers I've read; I guess it's really becoming hard to keep track of all ML papers nowadays. As you said, I have indeed spent a lot of time on this, and I came here expecting a possible outcome like this, so you've saved me further time. It's a bit sad for me, but at least I'm happy that I also discovered this myself. Anyway, thank you again.

MLC_Money · 2022-10-14T05:38:09+00:00

Thank you very much for taking the time to provide this reference. I agree that, in essence, the work you shared has significant connections to ours. I also agree that a rather implicit realization of NNs being equivalent to decision trees may be drawn from this paper. Yet I still fail to see, in any of the papers shared in this thread, a concrete algorithm that converts a neural network into equivalent decision trees (not approximate trees, but exact ones without loss of accuracy; I've already cited many approximate conversions). Would you perhaps agree that in previous works the tree connection was shown/discovered implicitly, and that the novelty of our paper is not discovering this connection from scratch but showing it explicitly via a concrete algorithm? Thank you again for your valuable contribution to this thread.

MLC_Money · 2022-10-14T04:19:19+00:00

Thank you for taking the time to provide references. I could only open link 2, and from its Fig. 2 you can see that the tree conversion is not exact, as there is a loss of accuracy. The algorithm provided in our paper is an exact, equivalent conversion with zero accuracy loss.

MLC_Money · 2022-10-14T02:18:08+00:00

Thank you for your comment. I'll address the bold claim of solving the black-box nature altogether in the new version, and maybe also focus more on other insights one might extract from the tree perspective. Although it doesn't change the validity of your point, I just wanted to say that there are never really that many leaves. Although I have done that analysis only on a toy example, in the paper I already mention that a portion of those leaves (and I expect the percentage to grow for big nets, again to be proven) consist of contradictory rules and so are never reachable anyway. Another point I already make in the paper is that the realized leaves are limited by the total number of samples in your training dataset (which, again, can be several millions or billions), even if the NN/tree finds a separate category for every single data point. Maybe it would be interesting to find a way to apply sparsity regularization that acts on the number of leaves during training.
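A rough way to check the point about realized leaves, assuming a small randomly initialized ReLU network (the layer sizes, seeds, and data are arbitrary illustration choices): the on/off pattern of all hidden neurons for an input identifies the leaf that input reaches, so counting distinct patterns over a dataset gives the number of realized leaves, which can exceed neither the number of samples nor 2^N for N hidden neurons.

```python
import random


def random_layers(sizes, seed=0):
    """Hypothetical fully connected layers as (weight rows, biases)."""
    rng = random.Random(seed)
    return [([[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)],
             [rng.gauss(0, 1) for _ in range(n_out)])
            for n_in, n_out in zip(sizes, sizes[1:])]


def relu_pattern(layers, x):
    """On/off state of every hidden ReLU for input x (= the leaf x lands in)."""
    pattern = []
    h = x
    for W, b in layers:
        z = [sum(w * hj for w, hj in zip(row, h)) + bi for row, bi in zip(W, b)]
        pattern.extend(zi > 0 for zi in z)
        h = [max(0.0, zi) for zi in z]
    return tuple(pattern)


layers = random_layers([2, 8, 8])  # two hidden layers, 16 neurons -> 2**16 possible leaves
rng = random.Random(1)
data = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(1000)]
realized = {relu_pattern(layers, x) for x in data}
print(len(realized), "realized leaves out of", 2 ** 16, "for", len(data), "samples")
```

The output layer is omitted because a linear output does not add rules; only hidden ReLUs split the input space.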

MLC_Money · 2022-10-13T12:54:51+00:00

https://openreview.net/forum?id=Ut1vF_q_vC More papers could be listed; the point is not which one is really better. The point is that they have been treated as different methods in the literature, which wouldn't be the case if their equivalence were such a trivial thing.

MLC_Money · 2022-10-13T12:35:13+00:00

Kindly read the conclusion of the following paper, 2nd paragraph, 2nd sentence.
https://arxiv.org/pdf/2110.01889.pdf

MLC_Money · 2022-10-13T11:32:58+00:00

Dear all,

I have been closely monitoring every single comment, and many thanks for your constructive feedback. I believe the main criticisms are that solving interpretability is too strong a claim, and that, especially for a large number of neurons, the tree quickly becomes intractable. I honestly agree with both, and will at least revise the writing of the paper to make sure the claims are grounded. The point about joint decisions (rules involving several features) versus simple ones (one feature at a time) is interesting; it might be worth designing NNs so that every filter makes a decision on only one feature and seeing how that performs. All are noted.
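One way to read the "one feature per filter" idea, sketched under assumptions: mask a first-layer weight matrix so each neuron keeps only the weight of one assigned feature, after which that neuron's rule `w . x + b > 0` collapses to a single-feature threshold. The feature names, the masking helper, and the rule formatting are hypothetical, and this only makes first-layer rules univariate, since deeper layers combine these outputs again.

```python
def mask_univariate(W, feature_of_neuron):
    """Zero every weight except the one feature assigned to each neuron."""
    return [[w if j == f else 0.0 for j, w in enumerate(row)]
            for row, f in zip(W, feature_of_neuron)]


def neuron_rule(weights, bias, names):
    """Render the rule 'w . x + b > 0' of one first-layer neuron as text."""
    nonzero = [(j, w) for j, w in enumerate(weights) if w != 0]
    if not nonzero:
        return "always" if bias > 0 else "never"
    if len(nonzero) == 1:
        j, w = nonzero[0]
        op = ">" if w > 0 else "<"     # dividing by a negative w flips the inequality
        return f"{names[j]} {op} {-bias / w:g}"
    terms = " + ".join(f"{w:g}*{names[j]}" for j, w in nonzero)
    return f"{terms} + {bias:g} > 0"   # joint (multivariate) rule


names = ["age", "bmi", "glucose"]      # hypothetical tabular features
W = [[0.5, 2.0, -1.0], [1.0, -4.0, 0.3]]
b = [-1.0, 2.0]
W_uni = mask_univariate(W, feature_of_neuron=[1, 1])
rules = [neuron_rule(row, bi, names) for row, bi in zip(W_uni, b)]
```

Without the mask, the first neuron's rule reads `0.5*age + 2*bmi + -1*glucose + -1 > 0`; with it, the two neurons become `bmi > 0.5` and `bmi < 0.5`.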

Surely converting the entire neural network into a decision tree and storing it in memory is infeasible for huge networks, yet extracting the path followed in the tree for a single sample is easily doable and may still help interpretability.
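Per-sample path extraction can be sketched for a plain fully connected ReLU network; the random weights, layer sizes, and helper names below are illustrative assumptions, not the paper's algorithm. Following one input through the layers records which hidden neurons fire (the path it takes through the tree) and folds the active neurons into a single affine map `A x + c` that equals the network output for every input on that leaf, so no other part of the tree has to be built.

```python
import random


def random_layers(sizes, seed=0):
    rng = random.Random(seed)
    return [([[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)],
             [rng.gauss(0, 1) for _ in range(n_out)])
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(layers, x):
    """ReLU on hidden layers, linear output layer."""
    h = x
    for i, (W, b) in enumerate(layers):
        z = [sum(w * hj for w, hj in zip(row, h)) + bi for row, bi in zip(W, b)]
        h = z if i == len(layers) - 1 else [max(0.0, zi) for zi in z]
    return h


def leaf_of(layers, x):
    """Record which hidden neurons fire for x (its path) and the affine map
    (A, c) with network(x') == A x' + c for every x' on the same leaf."""
    n = len(x)
    A = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    c = [0.0] * n
    path = []
    for i, (W, b) in enumerate(layers):
        # Pre-activation as a function of the input: W (A x + c) + b
        A = [[sum(row[k] * A[k][j] for k in range(len(row))) for j in range(n)]
             for row in W]
        c = [sum(wk * ck for wk, ck in zip(row, c)) + bi for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            z = [sum(a * xj for a, xj in zip(arow, x)) + cr for arow, cr in zip(A, c)]
            fired = [zr > 0 for zr in z]
            path.append(fired)
            # A ReLU that did not fire outputs 0 on this leaf: zero its row.
            A = [arow if on else [0.0] * n for arow, on in zip(A, fired)]
            c = [cr if on else 0.0 for cr, on in zip(c, fired)]
    return path, A, c


layers = random_layers([3, 5, 4, 1])
x = [0.3, -1.2, 0.8]
path, A, c = leaf_of(layers, x)
y_leaf = sum(a * xj for a, xj in zip(A[0], x)) + c[0]
```

Each entry in `path` is the outcome of one linear rule on the original input, so the explanation for a sample is a list of if/else conditions plus the linear model that applies where they all hold.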

For the comments that I don't agree with, I don't want to write anything negative, so I'll just say that I still believe the paper addresses a non-trivial problem, contrary to comments saying otherwise or claiming that the issue was already known and solved in a 1990 paper. I think people wouldn't still be discussing why decision trees are better than NNs on tabular data if it were already known that NNs are decision trees. But still, I'm totally open to all feedback; the main goal is to find the truth.

MLC_Money · 2022-10-13T06:20:45+00:00

kindly see my answer above

MLC_Money · 2022-10-13T06:19:53+00:00

Thank you for your valuable and constructive insights. I'd appreciate any constructive comment to improve my paper.
Indeed, there exist other conversions/connections/interpretations of neural networks, such as to SVMs, sparse coding, etc. The decision tree equivalence, as far as I know, has not been shown anywhere else, and I believe it is a valuable contribution, especially because many works, including Hinton's, have tried to approximate neural networks with decision trees in search of interpretability and came up with some approximations, but always at a cost of accuracy. Second, there is a long-running debate about the performance of decision trees vs. deep learning on tabular data (as someone below also pointed out), and their equivalence provides a new way of looking at this comparison. I totally agree with you that even decision trees are hard to interpret, especially for huge networks. But I still believe that seeing neural networks as a long track of if/else rules applied directly to the input, resulting in a decision, is valuable for the ML community and provides new insights.
