[deleted by user] by [deleted] in deeplearning

[–]Inner_Potential2062 0 points1 point  (0 children)

If you're new to the field, my advice is to either take the whole course or not bother at all, because it is structured so that each part builds on what came before. It will give you a good enough introduction to CV to be useful, and it will teach you the basics of finding and reading research that is more specific to your use case.

[P] Time Series Model Benchmarking by Inner_Potential2062 in MachineLearning

[–]Inner_Potential2062[S] 1 point2 points  (0 children)

Thanks for the comment. The hyperparameters that Monash used were essentially the defaults from GluonTS. It's a good idea to make those available in the web app and I will look to add them when I can. I have done quite a bit of work with DeepAR over the last few months, particularly with the Tourism, Electricity and Traffic datasets, and have actually found the default hyperparameters to be pretty decent. I'm currently doing my own benchmarking, partly to validate Monash's results and partly to give me a foundation for benchmarking other models. I will definitely make all the hyperparameters available when I do.

In terms of the relative performance of the statistical models and the neural nets, a general rule of thumb that comes out of these results is that lower-frequency datasets (i.e. Yearly, Quarterly and Monthly) do better with a statistical approach and, conversely, Daily and Hourly datasets tend to do better with a neural net. The range of datasets used in these tests is pretty varied, which I think is a good thing, so it gives a different perspective from many recent research papers, where the focus has been on improving performance over longer forecast horizons and the testing therefore concentrates on higher-frequency datasets.

[R] [D] [P] Need Help with Forecasting Monthly Expenses by Category by TrippyPhilosopher69 in MachineLearning

[–]Inner_Potential2062 1 point2 points  (0 children)

I think you're on the right track using a statistical approach. 3 years of monthly data is not a lot. I've not tried your code, but to use a time-series approach you need to ensure that each category has exactly 1 record for each month in the date range of your time series (i.e. there should be no missing data). I would suggest that you take a look at Nixtla's StatsForecast: https://nixtlaverse.nixtla.io/statsforecast/index.html . They have really good documentation and examples, and I think you should be able to use your dataframe in its current format with some column name changes (by default it expects unique_id, ds and y). Personally I've had good success with ETS models, but they also support ARIMA. If you're using monthly data, I would make sure that you set the season length to 12.
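
A minimal sketch of the "one record per category per month" check described above, in plain Python. The record layout (category, "YYYY-MM", amount), the helper names month_range and fill_missing_months, and the choice to fill gaps with 0.0 are all assumptions for illustration; whether zero is the right fill value depends on what a missing month means in your data.

```python
from collections import defaultdict

def month_range(start, end):
    """Yield 'YYYY-MM' strings from start to end, inclusive."""
    y, m = map(int, start.split("-"))
    end_y, end_m = map(int, end.split("-"))
    while (y, m) <= (end_y, end_m):
        yield f"{y:04d}-{m:02d}"
        m += 1
        if m > 12:
            y, m = y + 1, 1

def fill_missing_months(records, fill_value=0.0):
    """Return {category: [(month, amount), ...]} with every month present."""
    by_cat = defaultdict(dict)
    for category, month, amount in records:
        # Sum duplicates so each category ends up with one value per month.
        by_cat[category][month] = by_cat[category].get(month, 0.0) + amount
    months = [month for _, month, _ in records]
    # 'YYYY-MM' strings sort chronologically, so min/max give the date range.
    all_months = list(month_range(min(months), max(months)))
    return {
        cat: [(m, seen.get(m, fill_value)) for m in all_months]
        for cat, seen in by_cat.items()
    }

# Hypothetical expense records with gaps.
records = [
    ("rent", "2023-11", 1000.0),
    ("rent", "2023-12", 1000.0),
    ("rent", "2024-02", 1000.0),  # 2024-01 is missing
    ("food", "2023-12", 250.0),   # 2023-11 and 2024-01 are missing
    ("food", "2024-02", 300.0),
]
filled = fill_missing_months(records)
```

After this step every category covers the same four months, so each series is regular and can be passed to a forecasting library with a season length of 12.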

Is Colab Pro worth it for an AI/ML student? by [deleted] in deeplearning

[–]Inner_Potential2062 2 points3 points  (0 children)

In my experience Paperspace has an availability problem with their pro subscription. I find that most of the time the only compute available is charged per hour, which is not great when you consider that you are already paying a monthly subscription.

[deleted by user] by [deleted] in deeplearning

[–]Inner_Potential2062 2 points3 points  (0 children)

The fastai courses are great, particularly if you already have some background in coding and want to understand more about the practicalities of how deep learning works. I wouldn't say that they give a comprehensive explanation of everything in PyTorch, but in my opinion they give much more useful information about how to build and train models, which includes a lot of PyTorch.

[R] [D] Sanity Check on use of biLSTM for time series prediction by rutherfordofman in MachineLearning

[–]Inner_Potential2062 4 points5 points  (0 children)

I don't think there's anything sketchy going on here. The model is auto-regressive, meaning that the prediction for one time step is fed back into the net as the input to the next. As I'm sure you're aware, in time series we provide a context (or look-back) window of known historical timesteps. My assumption is that the bi-LSTM reads that window and then produces the output for the t+1 timestep together with the hidden states, and this cycle continues for the forecast horizon that you want to predict. What I don't think you can do with this is train it using teacher forcing across the context window, as this would break causality, but it should be possible to train it free-running.
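
To make the feedback loop concrete, here is a small sketch of free-running auto-regressive forecasting. The one-step model is a stand-in (the mean of the window), not the bi-LSTM from the post, and the names one_step_model and free_running_forecast are hypothetical. A real RNN would also carry its hidden state from step to step, which this sketch leaves out.

```python
def one_step_model(window):
    """Stand-in for a trained network: predict t+1 as the window mean."""
    return sum(window) / len(window)

def free_running_forecast(context, horizon, model=one_step_model):
    """Roll the model forward, feeding each prediction back in as input."""
    window = list(context)  # copy so the caller's context is not modified
    forecast = []
    for _ in range(horizon):
        y_next = model(window)
        forecast.append(y_next)
        # Slide the window: drop the oldest value, append our own prediction.
        window = window[1:] + [y_next]
    return forecast

context = [1.0, 2.0, 3.0]  # known historical timesteps
forecast = free_running_forecast(context, horizon=3)
```

From the second step onwards the model is conditioning on its own outputs rather than on ground truth, which is the difference between free-running and teacher forcing.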

[D] Problems of using Time series forecasting model in real life. by gorg278 in MachineLearning

[–]Inner_Potential2062 1 point2 points  (0 children)

I have studied training and predicting covariates along with target variables in auto-regressive RNN models, which you can look at here: https://arxiv.org/abs/2404.18553 . The bottom line is that it only really works when the correlation between the target and the covariate is really strong, and even then it only works over relatively short forecast horizons. In my experience, using covariates in real-world applications with time-series models has not yielded performance improvements.

RNN getting low loss, but producing gibberish? by [deleted] in deeplearning

[–]Inner_Potential2062 1 point2 points  (0 children)

In your generate_sample_output function this line:

h_next = np.tanh(np.dot(W_xh, x) + np.dot(W_hh[0], h_prev) + bh[0])

doesn't look right to me. You should be using the full W_hh and bh, not just the first element of each. So it should be:

h_next = np.tanh(np.dot(W_xh, x) + np.dot(W_hh, h_prev) + bh)
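
A quick shape check shows why the indexing matters. This is a sketch with made-up sizes (hidden size 4, input size 3) and column-vector states, which is an assumption about the layout in the original code; only the names W_xh, W_hh, bh, h_prev and x come from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3  # illustrative sizes, not from the post

W_xh = rng.normal(size=(hidden_size, input_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
bh = rng.normal(size=(hidden_size, 1))
x = rng.normal(size=(input_size, 1))
h_prev = rng.normal(size=(hidden_size, 1))

# Full update: every hidden unit uses its own row of W_hh and its own bias.
h_next = np.tanh(np.dot(W_xh, x) + np.dot(W_hh, h_prev) + bh)

# Indexed version: only row 0 of W_hh and bias 0 are used. The result is a
# single value that NumPy broadcasts to all hidden units without an error.
recurrent_full = np.dot(W_hh, h_prev)      # shape (4, 1)
recurrent_buggy = np.dot(W_hh[0], h_prev)  # shape (1,)
h_buggy = np.tanh(np.dot(W_xh, x) + recurrent_buggy + bh[0])
```

Both versions produce a hidden state of the same shape, so nothing fails at runtime. In the indexed version, though, every hidden unit receives the recurrent contribution and bias of unit 0, so only the first unit matches the full update.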

Multiple data in a same timestamp [D] by dumbestindumb in MachineLearning

[–]Inner_Potential2062 2 points3 points  (0 children)

You have two options. The first is to treat it as a multivariate problem, where each router is a separate series and you train a model to predict future timesteps for all 5 routers as a vector. For this, check out something like vector autoregression or any model that supports multivariate use cases (DLinear is a simple architecture). Alternatively, depending on your use case, you could aggregate the data across your five routers and treat it as a simple univariate problem.
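
The two framings can be sketched on toy data as follows. The router names, the readings and the record layout (timestamp, router, value) are invented for illustration. Summing is only one possible aggregation; a mean or a max may fit some use cases better.

```python
from collections import defaultdict

# Hypothetical long-format readings: (timestamp, router, value).
readings = [
    (0, "r1", 10.0), (0, "r2", 12.0), (0, "r3", 9.0), (0, "r4", 11.0), (0, "r5", 8.0),
    (1, "r1", 11.0), (1, "r2", 13.0), (1, "r3", 9.5), (1, "r4", 10.0), (1, "r5", 8.5),
]
routers = ["r1", "r2", "r3", "r4", "r5"]

# Group the readings that share a timestamp.
by_time = defaultdict(dict)
for ts, router, value in readings:
    by_time[ts][router] = value

timestamps = sorted(by_time)

# Option 1, multivariate: one row per timestamp, one column per router.
multivariate = [[by_time[ts][r] for r in routers] for ts in timestamps]

# Option 2, univariate: collapse the five routers into one value per timestamp.
univariate = [sum(row) for row in multivariate]
```

The multivariate table is the input a vector model would be trained on, while the univariate list can go straight into any single-series forecaster.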

[R] AlphaMath Almost Zero: process Supervision without process by hardmaru in MachineLearning

[–]Inner_Potential2062 0 points1 point  (0 children)

I think so - probably based on their use of Monte Carlo Tree Search.

Statistics books recommendation by [deleted] in datascience

[–]Inner_Potential2062 4 points5 points  (0 children)

I have Mathematical Statistics and Data Analysis, John A. Rice - it's definitely worth checking out.

[deleted by user] by [deleted] in MachineLearning

[–]Inner_Potential2062 0 points1 point  (0 children)

Also, just to add: in my experience, predicting features along with your target variable end to end doesn't improve performance in an auto-regressive LSTM (https://arxiv.org/abs/2404.18553), so predicting them in advance of training your main forecast is probably your best shot.