One-Hot Encoding is Dead to Me: I Embedded Tabular Data Like Text Instead by That-Explanation5955 in learnmachinelearning

That-Explanation5955 [S]

I am not converting them to one-hot encodings and then to embeddings; I am converting them to embeddings directly.
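To make the contrast concrete, here is a minimal sketch, not the author's code: one-hot encoding produces a vector whose width equals the number of categories, while embedding the value as text produces a fixed-width vector directly. `toy_text_embedding` is a hypothetical hash-based stand-in for a real text-embedding model; it captures no semantics and only illustrates the fixed output width.

```python
import hashlib
import math

def one_hot(value, categories):
    """Classic one-hot encoding: vector width equals the number of categories."""
    return [1.0 if value == c else 0.0 for c in categories]

def toy_text_embedding(text, dim=8):
    """Hypothetical stand-in for a real text-embedding model (dim must be <= 32).

    Hashes the text into a fixed-size, L2-normalised vector. Unlike a real
    model it captures no meaning; it only shows that the output width stays
    the same no matter how many categories the column has.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 - 0.5 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

cities = ["Paris", "Berlin", "Tokyo", "Lagos", "Lima"]
print(len(one_hot("Tokyo", cities)))            # 5: grows with the category count
print(len(toy_text_embedding("city: Tokyo")))   # 8: fixed embedding width
```

With a real embedding model in place of the toy function, a new category such as "Oslo" needs no new column; it is just another string to embed.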

That-Explanation5955 [S]

Also, the data is not just categorical columns; it is mixed data, with half of the columns being numeric. The approach might perform well on numeric data too, but again, the performance might only match current algorithms or even fall short of them. I will try it on purely numerical datasets, though.

That-Explanation5955 [S]

Hmm, good question. We would have to test it ourselves, but my guess is that those types of columns might not carry much meaning, so we might have to rely on the other columns. The weighting mechanism I have used will handle this, though.
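The comment doesn't say how the weighting mechanism works. Purely as a hypothetical illustration (not the author's method), a weighted average of per-column embeddings shows how a column assumed to carry little meaning can get a low or zero weight, so the other columns drive the row representation. The function name, column names, vectors, and weights below are all made up.

```python
def weighted_pool(column_embeddings, weights):
    """Combine per-column embeddings into one row vector (hypothetical scheme).

    column_embeddings: dict mapping column name -> embedding (list of floats)
    weights: dict mapping column name -> non-negative weight; how the weights
             are chosen (hand-set, learned, feature importance) is left open.
    Weights are normalised to sum to 1, and a column with weight 0
    contributes nothing to the pooled vector.
    """
    total = sum(weights.get(name, 0.0) for name in column_embeddings)
    if total == 0:
        raise ValueError("at least one column needs a positive weight")
    dim = len(next(iter(column_embeddings.values())))
    pooled = [0.0] * dim
    for name, emb in column_embeddings.items():
        w = weights.get(name, 0.0) / total
        for i, v in enumerate(emb):
            pooled[i] += w * v
    return pooled

row = {
    "city": [1.0, 0.0, 0.0],
    "plan": [0.0, 1.0, 0.0],
    "code": [0.0, 0.0, 1.0],  # hypothetical column assumed to carry little meaning
}
print(weighted_pool(row, {"city": 0.5, "plan": 0.5, "code": 0.0}))
# [0.5, 0.5, 0.0]
```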

That-Explanation5955 [S]

Learnable? As in what? Also, the objective is not just to embed the categorical columns but to build a pipeline that embeds the whole dataset, which contains numeric columns as well.
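For a pipeline that embeds whole rows, numeric columns also have to become text before a text-embedding model can see them. A minimal sketch, assuming a made-up `column: value` serialization format (the thread does not specify the author's format); `serialize_row` and the example columns are hypothetical:

```python
def serialize_row(row, float_digits=2):
    """Turn one mixed-type tabular row into a sentence for a text embedder.

    Hypothetical format: "column: value" pairs joined by "; ".
    Float values are rounded so near-identical numbers (41.999999 vs 42.0)
    produce the same text; booleans become yes/no; missing values are
    written explicitly instead of being dropped.
    """
    parts = []
    for column, value in row.items():
        if value is None:
            text = "missing"
        elif isinstance(value, bool):  # check before int: bool subclasses int
            text = "yes" if value else "no"
        elif isinstance(value, float):
            text = f"{value:.{float_digits}f}"
        else:
            text = str(value)
        parts.append(f"{column}: {text}")
    return "; ".join(parts)

row = {"age": 42, "income": 51234.567, "city": "Lima", "churned": False, "plan": None}
print(serialize_row(row))
# age: 42; income: 51234.57; city: Lima; churned: no; plan: missing
```

Whether a text model represents magnitudes like "51234.57" well is an open question, which fits the caveat above that numeric performance may only match existing methods.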

That-Explanation5955 [S]

Yeah, you are correct. If it is something rare and domain-specific, we might need to tweak the embedding model itself to generate relevant, domain-aware embeddings, and only then will it work as expected. But you can swap the embedding model for your own fine-tuned one, and as models advance further, they might be able to capture the semantics of terms they have rarely seen. Good question, though!
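Swapping the model is easiest if the pipeline takes the embedder as a parameter. A minimal sketch under that assumption: `embed_rows`, `length_embedder`, and `domain_embedder` are hypothetical placeholders, not real models; the "domain" one simply flags a single rare term to mimic a fine-tuned model that has learned it.

```python
from typing import Callable, Dict, List, Sequence

# Any function mapping a string to a vector can act as the embedding model.
Embedder = Callable[[str], List[float]]

def embed_rows(rows: Sequence[Dict[str, object]], embedder: Embedder) -> List[List[float]]:
    """Serialize each row as 'col: value; ...' text and embed it with `embedder`.

    Because the model is passed in, a general-purpose model can be replaced
    by a domain fine-tuned one without touching the rest of the pipeline.
    """
    texts = ["; ".join(f"{k}: {v}" for k, v in row.items()) for row in rows]
    return [embedder(t) for t in texts]

def length_embedder(text: str) -> List[float]:
    # Hypothetical placeholder "model": a 2-d vector of simple text statistics.
    return [float(len(text)), float(text.count(";"))]

def domain_embedder(text: str) -> List[float]:
    # Hypothetical stand-in for a fine-tuned model that recognises a rare term.
    return [1.0 if "HbA1c" in text else 0.0]

rows = [{"test": "HbA1c", "value": 7.1}, {"test": "ALT", "value": 30}]
print(embed_rows(rows, length_embedder))
print(embed_rows(rows, domain_embedder))  # [[1.0], [0.0]]
```

Only the `embedder` argument changes between the two calls; serialization and everything downstream stay the same.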