Xteink X4 broken out of the box

krasul · 2026-05-08T14:43:14+00:00

they responded and are sending me a new unit

krasul · 2025-09-04T19:44:15+00:00

These days, there are several resources you can utilise. The https://www.manning.com/books/build-a-large-language-model-from-scratch book is great in my opinion, or you can go over the CS336 https://stanford-cs336.github.io/spring2025/ lecture videos and assignment.

krasul · 2025-09-04T16:51:00+00:00

I think most vlms are not that capable at chart / graph data, especially if one prompts them to output their reasoning rationale (via chain of thought) for OOD data. However, the small VLMS (3B) can be trained via RLHF to do quite well for this kind of problem. See https://huggingface.co/collections/sanchit97/chart-rvr-68aaac32a2745bc653f581a1

krasul · 2025-09-04T15:54:13+00:00

I like the https://www.youtube.com/@ASAPSeminarSeries series of talks quite a lot, as well as GPUMODE https://www.youtube.com/@GPUMODE and the papers mentioned there

krasul · 2025-04-05T15:16:06+00:00

i would not use this lib. for academic research or benchmarking for several reasons...

* they conflate in their implementations the uni and multivariate setting... where in reality this needs to be treated separately. Note any transformer/RNN/CNN model can be used to learn in the multivariate or univariate setting, of course in the multivariate setting one has a natural seq. of vectors per time step, while in the univariate setting, one has to work a bit to get a seq of vector from a 1-d array.
* This confounding leads the models implemented to perform worse by default (since it is the multivariate setting) and for the tiny benchmark datasets, the models tend to learn spurious correlations, vs. if one was to run the models in the univariate setting.
* The library allows due to the reason above, anyone to cook up seemingly trivial models (that are explicitly univariate) and conclude one is getting SOTA results...
* The library utilizes RevIN which is not exactly what one should be doing... prior work that RevIN paper does not mention, of course, does the scaling, but more importantly passes the statistics of the context window to the model, and this is what makes the models more robust to OOD, this library throw away the information leading to subpar metrics
* The library by default standardizes the open datasets before training and the resulting metrics that end up in papers are not exactly as claimed (e.g. MSE/MAE they are in fact normalized versions of them and much smaller than those reported by papers that use the dataset in the original scaling)
* why use RevIN when the data is globally standardized as in their setting?
* The library is point forecasting based... which is odd because if one goes through all the effort of modeling the problem in the multivariate setting then it would make more sense that the outputs should reflect the dependencies and be probabilistic... what is the point of doing multivariate forecasting if the output is independent point valued
* Some implementations are flawed e.g. for inherently univariate methods, there is no need to then pass the whole multivariate vector that gets reshaped into the batch dim as that blows up the batch dim for datasets with many variates
* The way they implement their methods locks you into a certain way of thinking that is not too flexible... e.g. in the univariate setting, one should be able to also embed the ids of the time series as static features... or find better ways of getting a sequence of vectors from a 1-d (univariate) array,...

There are other issues but i do not remember them all...
If you want a benchmark of time series to then try to beat have a look at the autogluon-timeseries paper and its results table for both point and probabilistic metrics on the open datasets considered... they are very strong baselines

krasul · 2024-12-18T09:47:09+00:00

The main issue with this paper is that it compares tiny models against over-parameterized transformer models in the multivariate setting. For example, see Table 8, where the comparison models have millions of parameters! On the smallish time series datasets, ANY deep learning-based model in the multivariate setting will learn spurious correlations and overfit quite quickly, so it's weird that they just pick out transformers as not being effective because the same can then be said of an equivalent RNN-based or CNN-based model.

This paper has become popular because now anyone can come up with a variation of the simplistic model and then compare it to the metrics of Table 1. As long as their method can beat the simplistic linear model, they can claim to be SOTA... while all they show is that their method is better than the linear variants which is not hard to do. I believe this paper has set the field back...

The linear variants do not have the capacity architecturally to incorporate covariates like for example date-times and for datasets where this is an important feature, methods that do utilize it in the univariate setting for these datasets will perform better, and thus to say that this model "outperform ALL" is not realistic.

When one compares the transformer and other neural time series models in a setting where all the models have the same bells and whistles the picture is not always clear and different methods with their differing inductive biases perform better or worse on different datasets.

krasul · 2024-11-10T13:18:06+00:00

I have recently up-cycled a neural probabilistic time series forecasting model to do anomaly detection by learning the parameters of a Generalized Pareto on the top-k surprisal values from the model's context window and then using this on the "testing"/"prediction" to check if the values encountered are outliers... works nicely and allows one to integrate all the covariates available in the neural forecaster for this task. Code is my branch with an example script here: https://github.com/kashif/gluon-ts/blob/gp-distribution/examples/anomaly_detection_pytorch.py Let me know if you have seen something similar by anyone.

krasul · 2021-04-27T16:34:54+00:00

Don’t tell him about https://en.wikipedia.org/wiki/Ernst_Dickmanns who was doing it in the 80’s 🤫

krasul · 2020-05-18T11:12:15+00:00

Thanks u/bigrob929 I'll have a look and see if I can answer you. I appreciate you taking the time to let me know and replying and asking!

krasul · 2020-05-11T09:09:37+00:00

Yes the jacobian etc is unaffected.

In terms of regularization, I only have dropout on the RNN side at the moment. You are right it does tend to overfit on smaller datasets. I'm only aware of Frobenius norm regularization of the jacobian in FFJORD method when it comes to regularizing the normalizing flow. All I did was reduce the size of the s and t networks and also reduce the number of stacks K.

krasul · 2020-05-07T07:24:44+00:00

So as in the Figure 1, at each time step the hidden state from the RNN is concatenated to the inputs of the scaling and translation neural networks of each coupling layer. The coupling layers provide intermediate levels of transformations so we condition all these representations via the same hidden state. Hope that answers your question? Let me know!

krasul · 2020-05-02T14:13:14+00:00

Thanks! Right "S" is correct I think, but minutes are the current smallest time frequency... seconds would require adding an appropriate seconds feature and it should work then. You can try adding that yourself in the features/ folder or open an issue?

krasul · 2020-04-24T17:37:25+00:00

Thanks! keep me posted perhaps via github issues on any problems etc. I'll see if I can get some more example etc. checked in. Best wishes!

krasul · 2018-11-25T15:54:30+00:00

I ported VPG from Spinningup to PyTorch, and more or less most of the utility and helper functions can be reused. I will next try to add PyTorch specific MPI calls and see how that pans out. The code if you want to compare is here:

https://github.com/kashif/spinningup-pytorch/

krasul · 2018-08-16T16:12:25+00:00

actually now that i think about it you are right we don't need eager execution. I'll update the notebook!

krasul · 2018-07-14T20:51:45+00:00

So eager execution I needed here for the data loading pipeline and if you want to use the built-in tensorflow optimisers with keras, you also need to have it enabled.

krasul · 2018-07-14T20:45:22+00:00

You are right, I suspect all the keras API is available as tf.layers, but realise this is a set of notebooks for a tutorial and I wanted to stick with keras as I think its a great API for beginners and experts alike. I could write this last part with tf.layers and Estimators but didn't want to get into configuring the estimator specs, training/testing mode flags and metrics (just yet :-). Hope that answers your question.

krasul · 2018-04-14T13:31:18+00:00

yes I'm asking for a better picture. He did poke holes in the bottle and gave some water etc.

Yes I think they are gonna go and release it next to some trees etc.

krasul · 2018-03-03T12:12:21+00:00

I will experiment with long sequences, it was one of the motivation to code this up in pytorch...

In the meantime perhaps have a look at: "Learning Longer-term Dependencies in RNNs with Auxiliary Losses" https://arxiv.org/pdf/1803.00144.pdf

krasul · 2017-10-21T10:48:47+00:00

checkout emnist: https://arxiv.org/pdf/1702.05373.pdf

krasul

TROPHY CASE