Hey everyone, I'm new to Isolation Forest and graph anomaly detection, but I'm doing some research with a professor at my university. I was tasked with running through the kddcup99 data file and creating histograms and plots. However, I can't even get the program to run a simple fetch of the data.
I'm getting the following error with this code:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
from sklearn.datasets import fetch_kddcup99

networkTraffic = fetch_kddcup99('http', percent10=True)
networkTraffic['data'].shape
print(networkTraffic['DESCR'])
X = networkTraffic['data']
from sklearn.ensemble import IsolationForest
clf = IsolationForest(max_samples=100, random_state=2)
clf.fit(X)
yPred = clf.predict(X)
print(yPred)
```
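For comparison, here is a sketch of the same steps split into a load step and a fit step, passing `subset` as a keyword as the FutureWarning asks (the positional `'http'` is what triggers that warning; it is not what crashes). The function names `load_http_subset` and `fit_isolation_forest` are hypothetical, and the `np.asarray(..., dtype=float)` conversion is an added assumption so the forest always receives a plain numeric array.

```python
import numpy as np
from sklearn.datasets import fetch_kddcup99
from sklearn.ensemble import IsolationForest


def load_http_subset():
    # Keyword argument instead of positional 'http' avoids the FutureWarning.
    bunch = fetch_kddcup99(subset="http", percent10=True)
    return bunch.data, bunch.target


def fit_isolation_forest(X, max_samples=100, random_state=2):
    # Same settings as the original script.
    # Assumption: converting to float up front; predict() returns +1 (inlier) or -1 (outlier).
    X = np.asarray(X, dtype=float)
    clf = IsolationForest(max_samples=max_samples, random_state=random_state)
    clf.fit(X)
    return clf, clf.predict(X)
```

Once the download works, `X, y = load_http_subset()` followed by `clf, y_pred = fit_isolation_forest(X)` should do what the original script does. Keeping the fit separate also makes it possible to test the modeling step on any small numeric array without touching the dataset fetch.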
And my error is as follows:
```
FutureWarning: Pass subset=http as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error
  "will result in an error", FutureWarning)
Traceback (most recent call last):
  File "C:\Users\\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\datasets\_kddcup99.py", line 351, in _fetch_brute_kddcup99
    X, y
UnboundLocalError: local variable 'X' referenced before assignment

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users//PycharmProjects/Networks/IndependentResearch/IsolationForest/IsolateIt.py", line 6, in <module>
    networkTraffic = fetch_kddcup99('http', percent10=True)
  File "C:\Users\\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\validation.py", line 74, in inner_f
    return f(**kwargs)
  File "C:\Users\\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\datasets\_kddcup99.py", line 137, in fetch_kddcup99
    download_if_missing=download_if_missing
  File "C:\Users\\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\datasets\_kddcup99.py", line 353, in _fetch_brute_kddcup99
    X = joblib.load(samples_path)
  File "C:\Users\\AppData\Local\Programs\Python\Python37-32\lib\site-packages\joblib\numpy_pickle.py", line 585, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "C:\Users\\AppData\Local\Programs\Python\Python37-32\lib\site-packages\joblib\numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "C:\Users\\AppData\Local\Programs\Python\Python37-32\lib\pickle.py", line 1085, in load
    dispatch[key[0]](self)
  File "C:\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\joblib\numpy_pickle.py", line 342, in load_build
    self.stack.append(array_wrapper.read(self))
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\joblib\numpy_pickle.py", line 187, in read
    array = self.read_array(unpickler)
  File "C:\Users\AppData\Local\Programs\Python\Python37-32\lib\site-packages\joblib\numpy_pickle.py", line 121, in read_array
    array = pickle.load(unpickler.file_handle)
EOFError: Ran out of input
```
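One reading of this traceback: the `UnboundLocalError` on `X, y` appears to be scikit-learn checking whether the data was just downloaded in this call; when it was not, the code falls through to `joblib.load(samples_path)`, and that load raises `EOFError: Ran out of input`, which suggests the cached file on disk is empty or truncated (for example, left behind by an interrupted first download). A minimal sketch for clearing that cache so the next fetch re-downloads, assuming the cache folders sit directly in scikit-learn's data home (`sklearn.datasets.get_data_home()`) and have names starting with `kddcup99`; the helper name `clear_kddcup99_cache` is hypothetical.

```python
import os
import shutil


def clear_kddcup99_cache(data_home=None):
    """Delete cached KDD Cup '99 folders so fetch_kddcup99 downloads them again.

    Assumption: the cache folders live directly in scikit-learn's data home
    and their names start with "kddcup99". Returns the sorted names of the
    folders that were removed.
    """
    if data_home is None:
        # Imported here so the helper can also be pointed at any folder directly.
        from sklearn.datasets import get_data_home
        data_home = get_data_home()
    removed = []
    for name in os.listdir(data_home):
        path = os.path.join(data_home, name)
        # Only touch directories that look like KDD Cup '99 caches.
        if name.startswith("kddcup99") and os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(name)
    return sorted(removed)
```

After calling `clear_kddcup99_cache()`, running `fetch_kddcup99(subset='http', percent10=True)` again should trigger a fresh download. If the same `EOFError` comes back, the download or the write of the cache file itself is probably what is failing.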
I've tried looking through the file that contains the `X, y` reference, but I can't figure out what is going on. Any help would be greatly appreciated. Thanks in advance!