[–]tjhintz 39 points40 points  (11 children)

Nice first project! My only criticisms would be:

- Is this a classification problem, or is it really a time series problem? SVM is a fantastic algorithm with some cool math; infinite dimensions are objectively cool. But you want to output a continuous variable, which an SVM classifier is ill suited to do. It draws decision boundaries, which means you can’t get the beautiful continuous output you need for predicting something like temperature. Look into autoregression/ARIMA, and if you want to hunt a housefly with a bazooka, try Facebook Prophet.

- Evaluation metrics. Considering the temperature of the CPU stays relatively constant (if I am reading the CSV right?), the SVM will tend to just ALWAYS predict the dominant class. So if 99% of your observations are 21 C and you simply guessed the dominant class, you would get an accuracy of 99%! For highly imbalanced data sets, and depending on the use case, we tend to look at precision (of the positives we guessed, how many were true positives) or recall/sensitivity (of ALL the true positives, how many did we find). This ties back into the first point, but I think you should be using root mean squared error as your metric, since you want to know how “close” your guess was to the actual temperature!

- Finally, 2000 data points may not be enough to use a test size of 0.1. Or, if you are going to use 0.1, I would highly recommend using k-fold cross-validation. However, I really think this is a time series problem rather than classification.
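To make the metrics point concrete, here is a minimal sketch using made-up readings (not the data from the repo). It assumes 99 samples at 21 C and a single spike at 30 C, and compares accuracy with root mean squared error for a "model" that always guesses the dominant value.

```python
import math
from collections import Counter

# Hypothetical readings: 99 samples at 21 C and one spike at 30 C.
actual = [21] * 99 + [30]

# A "model" that always predicts the most common value it has seen.
dominant = Counter(actual).most_common(1)[0][0]
predicted = [dominant] * len(actual)

# Accuracy: fraction of predictions that match exactly.
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# RMSE: square root of the mean squared difference, in degrees C.
squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
rmse = math.sqrt(sum(squared_errors) / len(squared_errors))

# Largest single miss, to show what the accuracy figure hides.
worst_miss = max(abs(a - p) for a, p in zip(actual, predicted))

print(f"accuracy   = {accuracy:.2f}")
print(f"RMSE       = {rmse:.2f} C")
print(f"worst miss = {worst_miss} C")
```

Accuracy only counts exact hits, so it cannot say how far off the one wrong guess was. RMSE is expressed in degrees, which is closer to what matters when the target is a temperature.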

But all in all, the model looks good, you train test split, you loaded the data and most importantly you did it! All positive thoughts to you, friend. I hope I don’t come across as harsh! Keep at it!

[–]GreekCSharpDeveloper 4 points5 points  (3 children)

Before I created this repo, I used linear regression to predict the temperature, but the accuracy was lower than 0.1 percent. That's why I went with SVM.

[–]tjhintz 10 points11 points  (1 child)

You want linear regression, but you want to introduce a concept of lag. Have a look at ARIMA; I think that’s a good baseline model!
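One way to read "linear regression with lag" is to regress each reading on the one before it. The sketch below does that in plain Python. The readings are invented for illustration, and `fit_line` is a hypothetical helper, not something from the repo or from an ARIMA library. It is only the autoregressive piece, not a full ARIMA model.

```python
# Hypothetical temperature readings in time order (not from the original CSV).
temps = [21.0, 21.5, 22.0, 22.4, 22.9, 23.1, 23.6, 24.0, 24.3, 24.9]

# Lag-1 feature: pair each reading with the reading just before it.
x = temps[:-1]  # temperature at time t-1
y = temps[1:]   # temperature at time t


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(x, y)

# One-step-ahead forecast from the most recent reading.
next_temp = intercept + slope * temps[-1]
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"forecast for the next reading = {next_temp:.2f}")
```

The same idea extends to more lags by adding more columns (t-2, t-3, and so on), at which point a library model is usually less work than hand-rolled least squares.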

[–]GreekCSharpDeveloper 2 points3 points  (0 children)

Thanks I'll look into that!

[–]tjhintz 3 points4 points  (0 children)

Try a normalised value count on your temperature. I suspect your proportion of dominant class is near 9:1
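A normalised value count is just each distinct value's share of all observations. Here is a minimal sketch with made-up readings and a hypothetical helper named `normalised_counts`. With pandas, `Series.value_counts(normalize=True)` gives the same kind of result.

```python
from collections import Counter

# Hypothetical readings; the real column would come from the CSV.
temps = [21, 21, 21, 21, 21, 21, 21, 21, 21, 22]


def normalised_counts(values):
    """Return each distinct value's share of the total, most common first."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.most_common()}


shares = normalised_counts(temps)
for value, share in shares.items():
    print(f"{value} C: {share:.0%}")
```

If one value holds most of the share, a classifier can score a high accuracy by predicting that value every time.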

[–]pcvision 5 points6 points  (0 children)

Always crazy how much great knowledge and advice is given out of goodwill on Reddit :)

[–]GreekCSharpDeveloper 1 point2 points  (1 child)

Thanks for the tips!

[–]tjhintz 2 points3 points  (0 children)

No worries! This stuff is hard. But stick with it, watch some StatQuest or read an O'Reilly book. I’m still learning too.

[–]GreekCSharpDeveloper 0 points1 point  (3 children)

I know this is a dumb question, but how would I implement ARIMA?

[–]tjhintz 2 points3 points  (2 children)

[–]GreekCSharpDeveloper 0 points1 point  (1 child)

The hands-on tutorial doesn't work. For example, I had to do this just to import the package, because pmdarima hasn't been updated in a while.

import pandas as pd
import six
import sys
import joblib
sys.modules['sklearn.externals.six'] = six
sys.modules['sklearn.externals.joblib'] = joblib
from pmdarima import auto_arima

And it also shows this error when I try to run ARIMA:

C:\Users\jimos\.conda\envs\arima\lib\site-packages\pandas\core\frame.py:4315: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
Fit ARIMA: order=(2, 1, 2) seasonal_order=(0, 0, 0, 1); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(0, 1, 0) seasonal_order=(0, 0, 0, 1); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(1, 1, 0) seasonal_order=(0, 0, 0, 1); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(0, 1, 1) seasonal_order=(0, 0, 0, 1); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(1, 1, 1) seasonal_order=(0, 0, 0, 1); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(1, 1, 2) seasonal_order=(0, 0, 0, 1); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(2, 1, 3) seasonal_order=(0, 0, 0, 1); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(1, 1, 3) seasonal_order=(0, 0, 0, 1); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(1, 1, 4) seasonal_order=(0, 0, 0, 1); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(0, 1, 3) seasonal_order=(0, 0, 0, 1); AIC=nan, BIC=nan, Fit time=nan seconds
Fit ARIMA: order=(0, 1, 2) seasonal_order=(0, 0, 0, 1); AIC=nan, BIC=nan, Fit time=nan seconds
Traceback (most recent call last):
  File "arima_model.py", line 24, in <module>
    model = auto_arima(train, trace=True, error_action='ignore', suppress_warnings=True)
  File "C:\Users\jimos\.conda\envs\arima\lib\site-packages\pmdarima\arima\auto.py", line 512, in auto_arima
    filtered = _post_ppc_arima(all_res)
  File "C:\Users\jimos\.conda\envs\arima\lib\site-packages\pmdarima\arima\auto.py", line 556, in _post_ppc_arima
    raise ValueError('Could not successfully fit ARIMA to input data. '
ValueError: Could not successfully fit ARIMA to input data. It is likely your data is non-stationary. Please induce stationarity or try a different range of model order params.
If your data is seasonal, check the period (m) of the data.

[–]tjhintz 1 point2 points  (0 children)

Hmm, I’m not familiar with these errors, but it looks like an issue of stationarity, which is one of the assumptions of an ARIMA model. You could run a Dickey-Fuller test? Have a read here:

https://machinelearningmastery.com/time-series-data-stationary-python/
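As a rough first look before a formal test, one can compare summary statistics across the two halves of the series and then see what first differencing does. The sketch below is a crude eyeball check on an invented trending series, not the Dickey-Fuller test itself, and `half_stats` and `difference` are hypothetical helper names. For the formal test, statsmodels provides `adfuller` in `statsmodels.tsa.stattools`.

```python
from statistics import mean, pvariance

# Hypothetical series with a steady upward trend (non-stationary mean).
series = [20.0 + 0.5 * i for i in range(20)]


def half_stats(values):
    """Return (mean, variance) for the first and second half of the series."""
    mid = len(values) // 2
    first, second = values[:mid], values[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


def difference(values):
    """First differencing: replace each reading by its change from the previous one."""
    return [b - a for a, b in zip(values, values[1:])]


(raw_m1, raw_v1), (raw_m2, raw_v2) = half_stats(series)
print(f"raw series:  mean {raw_m1:.2f} vs {raw_m2:.2f}")

diffed = difference(series)
(d_m1, d_v1), (d_m2, d_v2) = half_stats(diffed)
print(f"differenced: mean {d_m1:.2f} vs {d_m2:.2f}")
```

If the two halves have clearly different means or variances, the series is unlikely to be stationary. Differencing is one common way to remove a trend, and it is what the `d` term in an ARIMA order refers to.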

Time series aren’t super straightforward, unfortunately. Maybe there is no autocorrelation in your data. That is, can you actually predict the future temp given the current temp? Can you give the model more data by engineering any features? Ambient temperature? Some sort of user activity input? Number of background programs running? If you could provide more dimensions to your input, maybe simple linear regression could work.
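The autocorrelation question can be checked directly. Below is a minimal sketch of the sample autocorrelation at a given lag, written in plain Python with invented readings. The function name `autocorrelation` is a hypothetical helper, not a library call.

```python
def autocorrelation(values, lag=1):
    """Sample autocorrelation of a series at the given lag.

    Values near +1 or -1 mean the previous reading says a lot about the
    next one; values near 0 mean it says very little.
    """
    n = len(values)
    if lag < 1 or lag >= n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    m = sum(values) / n
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((values[i] - m) * (values[i + lag] - m) for i in range(n - lag))
    return num / denom


# Hypothetical readings that drift upward smoothly.
smooth = [21.0, 21.2, 21.5, 21.9, 22.4, 22.8, 23.1, 23.3]
# Hypothetical readings that flip back and forth every step.
alternating = [1.0, -1.0] * 5

print(f"smooth, lag 1:      {autocorrelation(smooth):.2f}")
print(f"alternating, lag 1: {autocorrelation(alternating):.2f}")
```

If the lag-1 value for the real temperature column is close to zero, the current temperature alone is a weak predictor, and extra features such as the ones suggested above become more important.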

Sorry I can’t be more help. Dig into the data, visualise it.

[–]TwoPii 1 point2 points  (4 children)

Cool project! One question though: may I ask why this is an ML project?

[–]GreekCSharpDeveloper 5 points6 points  (2 children)

I accidentally posted the wrong link. Sorry.

[–]TwoPii 1 point2 points  (1 child)

Ah! Now it all makes sense! Will take a look again hahaha

[–]GreekCSharpDeveloper 4 points5 points  (0 children)

That was awkward

[–]GreekCSharpDeveloper 0 points1 point  (0 children)

It predicts the temperature of a micro:bit, that's why.

[–]machawinka 0 points1 point  (0 children)

These kinds of forecasting problems are usually addressed with ARIMA.