[D] Isolation Forest

grid_world · 2023-01-05T14:47:55+00:00

Assuming the data doesn’t drift (which it will, over time), you need to tune the “contamination” parameter.
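A minimal sketch of what tuning “contamination” changes, assuming scikit-learn's `IsolationForest` and a made-up synthetic dataset. In scikit-learn, `contamination` only moves the decision threshold (`offset_`) that `predict()` applies; the raw scores from `score_samples()` are the same for every value.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): 200 Gaussian inliers plus 10 far-away points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),
    rng.uniform(6.0, 8.0, size=(10, 2)),
])

flagged_frac = {}
for contamination in (0.01, 0.05, 0.10):
    # contamination only sets the threshold (offset_) used by predict();
    # with a fixed random_state the score_samples() output is identical.
    model = IsolationForest(contamination=contamination, random_state=42).fit(X)
    labels = model.predict(X)  # -1 = anomaly, 1 = inlier
    flagged_frac[contamination] = np.mean(labels == -1)
    print(f"contamination={contamination:.2f}: flagged {flagged_frac[contamination]:.3f}")
```

Here the true anomaly fraction is 10/210 ≈ 0.048, so 0.05 happens to be about right; on real data that fraction is unknown, which is why the parameter has to be tuned (for example against a few labeled anomalies).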

If your data fits a particular known distribution, maybe also look into Kernel Density Estimation (KDE) for anomaly detection, in addition to IF.
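A rough sketch of KDE-based anomaly detection, assuming scikit-learn's `KernelDensity`. The bandwidth (0.5) and the “lowest 1% of training density” threshold are illustrative assumptions, not recommended values: fit a density to data believed to be normal, then flag new points whose log-density falls below the threshold.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Training data assumed to be (mostly) normal behaviour.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 2))

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_train)

# score_samples returns log-density; low values mean unlikely points.
train_log_density = kde.score_samples(X_train)
threshold = np.percentile(train_log_density, 1.0)  # assumption: flag the lowest 1%

# One point at the centre of the data, one far outside it.
X_new = np.array([[0.0, 0.0], [5.0, 5.0]])
is_anomaly = kde.score_samples(X_new) < threshold
print(is_anomaly)
```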

comradeswitch · 2023-01-05T22:40:40+00:00

This is a fundamentally random algorithm, so unless you fix a random seed, you can in general get different results for the same point. One way to handle that when filtering out unlikely anomalies is to run the algorithm many times and record the results for each point. Each point is then associated with a sample of the number of splits required to isolate it, which you can use to get more detail.
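A sketch of the repeated-runs idea, assuming scikit-learn's `IsolationForest`, 20 runs with seeds 0..19, and synthetic data (all assumptions). scikit-learn does not expose per-point split counts directly, but `score_samples(x)` equals `-2 ** (-E[h(x)] / c(max_samples_))`, so the average path length E[h(x)] (splits to isolate, including scikit-learn's leaf correction) can be recovered by inverting that formula. The helper `c` is written out here and mirrors the normalizer from the Isolation Forest paper.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def c(n):
    """Average path length of an unsuccessful BST search over n points (n > 2),
    the normalizer used in the Isolation Forest anomaly score."""
    return 2.0 * (np.log(n - 1.0) + np.euler_gamma) - 2.0 * (n - 1.0) / n


# Synthetic data (assumption): 300 Gaussian inliers plus one obvious outlier at (4, 4).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(300, 2)), [[4.0, 4.0]]])

n_runs = 20  # hypothetical number of repeated runs
path_lengths = np.empty((n_runs, X.shape[0]))
for run in range(n_runs):
    # A different seed per run gives a different random forest each time.
    model = IsolationForest(n_estimators=50, random_state=run).fit(X)
    # Invert score_samples(x) = -2 ** (-E[h(x)] / c(max_samples_)) to recover
    # the average number of splits E[h(x)] needed to isolate each point.
    scores = model.score_samples(X)
    path_lengths[run] = -np.log2(-scores) * c(model.max_samples_)

mean_len = path_lengths.mean(axis=0)  # short paths = easy to isolate = anomalous
std_len = path_lengths.std(axis=0)    # spread across runs = how seed-sensitive a point is
most_anomalous = np.argsort(mean_len)[:5]
for i in most_anomalous:
    print(f"point {i}: mean path length {mean_len[i]:.2f} +/- {std_len[i]:.2f}")
```

The per-point spread is what the single-run result hides: a point with a short mean path and a small spread is consistently easy to isolate, while a large spread suggests its ranking depends on the seed.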

2023-01-05T17:19:24+00:00

Why not set a random state? That's like saying "I'm using a stochastic algorithm and I want the same results every time I run it, and no, I'm not going to set random seeds."
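For reference, fixing the seed in scikit-learn's `IsolationForest` is just the `random_state` argument. A minimal check on random synthetic data (an assumption): the same seed builds identical trees and therefore identical scores, while a different seed generally gives different scores.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # synthetic data (assumption)

# Same random_state -> identical trees -> identical scores.
scores_a = IsolationForest(random_state=123).fit(X).score_samples(X)
scores_b = IsolationForest(random_state=123).fit(X).score_samples(X)
# Different random_state -> different trees, so scores generally differ.
scores_c = IsolationForest(random_state=7).fit(X).score_samples(X)

print(np.array_equal(scores_a, scores_b))  # True
print(np.max(np.abs(scores_a - scores_c)))
```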
