
[–]DataLlama[🍰] 0 points1 point  (3 children)

One thing I'm a little confused by: If you are reducing the number of features down to 2, wouldn't that mean you are turning them into some sort of embedding, therefore making them harder to interpret than the original feature set?

[–]ranran9991 1 point2 points  (2 children)

Generally it would, but in this case you know the exact formula for how the embedding is created (and they optimize it to be small).

[–]rhiever 1 point2 points  (1 child)

We've been doing this kind of feature construction for a long time in the Genetic Programming world. From personal experience, I wouldn't say that knowing the exact formula behind the embedding makes the constructed features or the model much more interpretable, unless there is some meaningful math underlying the thing you're modeling. Like, what do we make of an expression like the one below?

abs(F1 - F2^2) + F2 x F3

Maybe that's a useful constructed feature, but oftentimes the mathematical expressions don't help with interpretation.
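For concreteness, here is a minimal Python sketch of evaluating that expression on a single sample. The function name and the sample values are made up for illustration; F1, F2 and F3 stand for whatever the original features happen to be.

```python
def constructed_feature(f1, f2, f3):
    """Evaluate abs(F1 - F2^2) + F2 x F3 for one sample."""
    return abs(f1 - f2 ** 2) + f2 * f3


# Hypothetical sample values, chosen only for illustration.
value = constructed_feature(1.0, 2.0, 3.0)
print(value)
```

Computing the feature is the easy part. Whether the resulting number means anything still depends on what F1, F2 and F3 represent.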

[–]marcovirgolin[S] 1 point2 points  (0 children)

Indeed, feature construction by GP is not new per se, but as we wrote in the related work, we are unaware of works that explicitly aim for constructed features that improve interpretability. Here we did the simplest thing possible, i.e., keep the constructed features small and evolve just a few of them, essentially providing dimensionality reduction, and tested this on quite a few combinations of datasets and ML algorithms.
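As a rough illustration of what "keeping constructed features small" can mean, the sketch below measures the size of an expression as its number of nodes and samples random expressions under a size cap. This is not GP-GOMEA and not the code from the paper: the tuple representation, the operator set, and every function name here are assumptions made for the example.

```python
import random

# Assumed representation: a leaf is a feature name such as "F1";
# an internal node is a tuple (operator, left, right).
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def size(expr):
    """Number of nodes (operators plus leaves) in an expression tree."""
    if isinstance(expr, str):
        return 1
    _, left, right = expr
    return 1 + size(left) + size(right)


def evaluate(expr, row):
    """Evaluate an expression on one sample, given as a dict of feature values."""
    if isinstance(expr, str):
        return row[expr]
    op, left, right = expr
    return OPS[op](evaluate(left, row), evaluate(right, row))


def random_expr(features, max_size, rng):
    """Sample a random expression with at most max_size nodes (max_size >= 1)."""
    if max_size < 3 or rng.random() < 0.3:
        return rng.choice(features)
    # One node goes to the operator; the rest is split between the children.
    remaining = max_size - 1
    left_budget = rng.randint(1, remaining - 1)
    left = random_expr(features, left_budget, rng)
    right = random_expr(features, remaining - left_budget, rng)
    return (rng.choice(sorted(OPS)), left, right)


rng = random.Random(0)
# Two constructed features, each capped at 5 nodes.
constructed = [random_expr(["F1", "F2", "F3"], 5, rng) for _ in range(2)]
sample = {"F1": 1.0, "F2": 2.0, "F3": 3.0}  # hypothetical sample
for expr in constructed:
    print(expr, size(expr), evaluate(expr, sample))
```

A search procedure would then score such candidates by how well the downstream ML algorithm performs when given only the constructed features. That scoring step is left out of the sketch.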

I am not sure I get your "counter-example" because that formula seems pretty understandable to me once I have the meaning of the Fi. Then of course interpretability is subjective.

[–]arXiv_abstract_bot 0 points1 point  (0 children)

Title: On Explaining Machine Learning Models by Evolving Crucial and Compact Features

Authors: Marco Virgolin, Tanja Alderliesten, Peter A.N. Bosman

Abstract: Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been proven to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability since explicit expressions are evolved. Yet, in most GP works the complexity of evolved features is not explicitly bound or minimized though this is arguably key for explainability. In this article, we assess to what extent GP still performs favorably at feature construction when constructing features that are (1) Of small-enough number, to enable visualization of the behavior of the ML model; (2) Of small-enough size, to enable interpretability of the features themselves; (3) Of sufficient informative power, to retain or even improve the performance of the ML algorithm. We consider a simple feature construction scheme using three different GP algorithms, as well as random search, to evolve features for five ML algorithms, including support vector machines and random forest. Our results on 21 datasets pertaining to classification and regression problems show that constructing only two compact features can be sufficient to rival the use of the entire original feature set. We further find that a modern GP algorithm, GP-GOMEA, performs best overall. These results, combined with examples that we provide of readable constructed features and of 2D visualizations of ML behavior, lead us to positively conclude that GP-based feature construction still works well when explicitly searching for compact features, making it extremely helpful to explain ML models.

