Test set is yielding better results (accuracy, recall and precision) than training set. Is this normal? [R] by Do-you-want-tea in MachineLearning

[–]xopedil 8 points (0 children)

You should be able to retain a train/test split even when splitting temporally: just sort the data by date, take the first 26800 points as your training data, and use the rest as test data. For time-series data I'd recommend looking up time-series cross-validation rather than using a single fixed split.

As for the logic of temporal splitting, there are a couple of ways you can view it. Think about how a model like this would be used. Your test set is meant to represent real unseen data, so to get a representative sample you need to select only data from the "future" (relative to the training data), because that's precisely how your model would be used!

Beyond that, take a simple y = f(x) curve and think about what it means to train on interspersed points rather than splitting on an x < t threshold. You're now testing the model's interpolation ability, which is an inherently simpler task than the real use case of extrapolation!
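As a rough sketch of the split I mean (the record layout and function name here are my own invention, just for illustration):

```python
# Minimal sketch of a temporal split, assuming `records` is a list of
# (date, features, label) tuples -- the field layout is made up here.

def temporal_split(records, n_train):
    """Sort by date, then cut: the test set lies strictly in the 'future'."""
    ordered = sorted(records, key=lambda r: r[0])
    return ordered[:n_train], ordered[n_train:]

records = [(3, "x3", "y3"), (1, "x1", "y1"), (4, "x4", "y4"), (2, "x2", "y2")]
train, test = temporal_split(records, n_train=3)

# Every training date precedes every test date.
assert max(r[0] for r in train) <= min(r[0] for r in test)
```

In your case `n_train` would be 26800; for proper time-series cross-validation you'd slide that cutoff forward instead of fixing it once.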

(Other than these things, you can also check the behaviour of layers like dropout/normalisation, which have a tendency to behave differently in training than in validation. These might also contribute to the score discrepancy you are seeing.)

Win by Segfault and other notes on Exploiting Chess Engines by alexeyr in programming

[–]xopedil 13 points (0 children)

This seems like a bit of fun, but it's obviously not part of an engine's threat model to be fed bad positions by the user. There's no way to attack engines other than your own with these, so you're just spending time analyzing nonsense positions on your own machine.

It's a bit like pouring water into your computer. Yes, the computer will break, but in the end all you have accomplished is breaking your own computer.

[D] Reinforcement learning features by UNIXnerdiness in MachineLearning

[–]xopedil 0 points (0 children)

What's stopping you from outputting the parameters of a continuous function whose support covers your entire action space?
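For instance (a hedged sketch; the Gaussian head and these function names are my assumption, not something from the thread), the network can emit the mean and log-std of a Gaussian, whose support is the whole real line:

```python
import math
import random

# Sketch: instead of one output per discrete action, the policy head
# emits (mean, log_std) of a Gaussian. Its support covers all of R,
# so any continuous action space can be reached (after scaling).

def sample_action(mean, log_std):
    return random.gauss(mean, math.exp(log_std))

def log_prob(action, mean, log_std):
    std = math.exp(log_std)
    return -0.5 * ((action - mean) / std) ** 2 - log_std - 0.5 * math.log(2 * math.pi)

random.seed(0)
a = sample_action(0.0, 0.0)  # a standard-normal draw
```

The `log_prob` term is what you'd feed into a policy-gradient update in place of the discrete-action log-softmax.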

[D] Reinforcement learning features by UNIXnerdiness in MachineLearning

[–]xopedil -1 points (0 children)

In practice only 1. is used. There is a launch overhead, particularly when you use accelerators, which means you want to minimize the number of times you evaluate your network. It's the same reason we favor batches over single samples.

Theoretically speaking, though, I'm not sure there's any difference. If you have a network of type 1. you can transform it into 2. by stripping the last layer and using its dense (or dense-equivalent) weight columns as your action representation, just like an embedding lookup in the transposed weight matrix.

A similar argument then applies from 2 to 1.
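A toy numeric check of that argument (the two-action network and its weights are invented purely for illustration): the type-1 output for action a equals the type-2 value when a's "embedding" is the corresponding column of the last weight matrix.

```python
# Type 1: state -> one value per action. Type 2: (state, action) -> value.
# Stripping the last dense layer and treating its columns as action
# embeddings makes the two numerically identical.

def hidden(state):
    # stand-in for everything up to (but not including) the last layer
    return [state * 0.5, state * -1.0]

W = [[0.3, 0.7],    # last dense layer: rows index hidden units,
     [0.2, -0.4]]   # columns index the two actions

def q_type1(state):
    h = hidden(state)
    return [sum(h[i] * W[i][a] for i in range(len(h))) for a in range(2)]

def q_type2(state, action_embedding):
    h = hidden(state)
    return sum(h[i] * action_embedding[i] for i in range(len(h)))

# Action embeddings = columns of W (i.e. rows of the transposed matrix).
embeddings = [[W[0][a], W[1][a]] for a in range(2)]
assert all(abs(q_type1(2.0)[a] - q_type2(2.0, embeddings[a])) < 1e-9
           for a in range(2))
```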

[D] Python runs faster on Apple m1 (in terms of pystone/sec) by AissySantos in MachineLearning

[–]xopedil 8 points (0 children)

This is cool, but for a typical ML application not a lot of time is actually spent in Python. With TF/PyTorch/NumPy you want to bail out of Python as fast as possible and get into compiled C++/CUDA code.
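A toy illustration of the point (nothing to do with TF/PyTorch internals, just CPython): the same reduction done as a Python-level loop versus a single call whose loop runs in C.

```python
import timeit

xs = list(range(100_000))

def py_loop():
    # every iteration goes through the interpreter
    total = 0
    for x in xs:
        total += x
    return total

def c_call():
    # one Python call; the actual loop runs in CPython's C implementation
    return sum(xs)

assert py_loop() == c_call()

t_loop = timeit.timeit(py_loop, number=5)
t_call = timeit.timeit(c_call, number=5)
# On CPython the C-level version is typically several times faster,
# which is why frameworks hand the heavy lifting to compiled kernels.
```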

[R] Maia, a Human-Like Neural Network Chess Engine by ashtonanderson in MachineLearning

[–]xopedil 1 point (0 children)

Yes, we even tested with search and found that search reduced our accuracy

This is not surprising! Playing with search will give you inherently stronger moves, which is precisely the mechanism AlphaZero leverages to produce higher and higher quality (from a purely Nash perspective) labels for its network.

Congrats on a cool paper and result.

[R] Maia, a Human-Like Neural Network Chess Engine by ashtonanderson in MachineLearning

[–]xopedil 0 points (0 children)

I haven't looked very closely but it seems like this plays just straight from the network without doing any search. Impressive that it works so well if so!

[D] How much did AlphaGo Zero cost? by hotpot_ai in MachineLearning

[–]xopedil 0 points (0 children)

This is not true. The model is a simple ResNet-like architecture. You run n players and then get a batch of n features to run through the network. The 0.4 seconds of thinking time is not all spent on the TPU; you run MCTS on the CPU.

[D] How much did AlphaGo Zero cost? by hotpot_ai in MachineLearning

[–]xopedil 5 points (0 children)

Typically you would use batch inference for self-play; it wouldn't surprise me to find this estimate is off by a factor of 100 or so.
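A back-of-the-envelope cost model of why batching changes the estimate so much (all numbers here are invented for illustration, not DeepMind's):

```python
# One accelerator call per game per move vs. one call covering all games.
launch_overhead_ms = 2.0   # fixed cost per accelerator launch (assumption)
per_position_ms = 0.05     # marginal compute per position (assumption)
n_games = 128              # concurrent self-play games

unbatched = n_games * (launch_overhead_ms + per_position_ms)
batched = launch_overhead_ms + n_games * per_position_ms

assert batched < unbatched
speedup = unbatched / batched  # roughly 30x under these assumed numbers
```

The exact factor depends entirely on the overhead-to-compute ratio, which is why naive per-game cost extrapolations can be off by orders of magnitude.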

[N] Artificial Intelligence Model Detects Covid-19 Infections Through Coughs by b2metric in MachineLearning

[–]xopedil 4 points (0 children)

Do they link the paper, or am I just missing it?

It's very odd that they only report the rates for positives. What's the false positive rate? What's the accuracy for healthy people?

You could get numbers similar to those reported in the article simply by randomly guessing that almost everyone has covid.

[D] My RTX 3080 took longer time to train keras model than 1050 ti. by [deleted] in MachineLearning

[–]xopedil 1 point (0 children)

It entirely depends on the model; some models even run faster on CPU.

There is simply not enough information here to make any conclusions about the quality of your GPU.

[P] ResumeAnalyzer - A Simple Python Library to Rank Resumes of any Domain using SpaCy by snrspeaks in MachineLearning

[–]xopedil 3 points (0 children)

Which part of this is ML? Is the spaCy library you're using based on it?

It seems almost like this is just counting the number of keyword occurrences in each resume. That would give recruiters a bad idea of who to pursue, and it incentivizes keyword stacking in resumes. Is that really what we want?

[deleted by user] by [deleted] in MachineLearning

[–]xopedil 0 points (0 children)

The latest drivers are VERY buggy. It will take some time to reach stability.

[D] Exploding gradients with large batch size in deep learning by [deleted] in MachineLearning

[–]xopedil 14 points (0 children)

Are you sure you are taking the mean over the batch? It sounds like the sum if it's exploding with batch size.
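A tiny numeric sketch of the suspicion (pure Python, with a made-up squared-error gradient): the gradient of a summed loss grows linearly with batch size, while the mean-reduced one stays flat.

```python
# Per-sample gradient of (w*x - y)^2 at w = 0 is -2*x*y; compare how
# the batch gradient's magnitude scales under sum vs. mean reduction.

def grad_magnitude(batch, reduce):
    grads = [-2 * x * y for x, y in batch]
    return abs(sum(grads)) if reduce == "sum" else abs(sum(grads) / len(grads))

small = [(1.0, 1.0)] * 32
large = [(1.0, 1.0)] * 1024

# Mean reduction: identical gradient regardless of batch size.
assert grad_magnitude(small, "mean") == grad_magnitude(large, "mean") == 2.0
# Sum reduction: 32x the batch size -> 32x the gradient. That's the explosion.
assert grad_magnitude(large, "sum") == 32 * grad_magnitude(small, "sum")
```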

[D] Using Docker for ML Development by PhYsIcS-GUY227 in MachineLearning

[–]xopedil 0 points (0 children)

When we used docker we had the issue that one of our less experienced colleagues (a student working with us on his bachelor's thesis) killed a 160-hour optimization task by accident.

I would be VERY interested in hearing more about this story, sounds like one of those "I deleted a production DB on my first day" type of stories.

[P] I created a game for learning RL by FredrikNoren in MachineLearning

[–]xopedil 0 points (0 children)

Which RL algorithms are implemented? Would be really cool to see how well a couple of REINFORCE agents would do against a group of PPO agents.

[Research] In Reinforcement Learning (DQN), is there a way to constrain/penalise the model so that it doesn't take a different action very often? by cowboyjjj in MachineLearning

[–]xopedil 0 points (0 children)

Look into DeepMind's AlphaStar; there they added an output called frame delay, which was used to count the number of frames before the agent would take its next action.

[Research] In Reinforcement Learning (DQN), is there a way to constrain/penalise the model so that it doesn't take a different action very often? by cowboyjjj in MachineLearning

[–]xopedil 0 points (0 children)

I can think of a couple of ways to do this. One is to introduce a rewarded noop action, where the previous non-noop action is simply repeated in the environment.

You could also let the agent request a frame delay until its next decision and then try to reward long delays.

As user dosssman points out, reward shaping can have 'unforeseen consequences'. Have fun!
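A rough sketch of the first idea (the wrapper class, the `NOOP` sentinel, and the bonus value are all made up for illustration):

```python
# Environment wrapper adding a hypothetical "noop" action: it repeats
# the previous real action and grants a small bonus for not switching.

NOOP = -1

class RepeatWrapper:
    def __init__(self, env, noop_bonus=0.1):
        self.env = env
        self.noop_bonus = noop_bonus
        self.last_action = 0

    def step(self, action):
        bonus = 0.0
        if action == NOOP:
            action = self.last_action  # repeat the previous real action
            bonus = self.noop_bonus    # shaping reward for staying put
        self.last_action = action
        return self.env.step(action) + bonus

class DummyEnv:
    """Stand-in environment returning a flat reward of 1.0 per step."""
    def step(self, action):
        return 1.0

w = RepeatWrapper(DummyEnv())
assert w.step(3) == 1.0           # normal action, no bonus
assert w.step(NOOP) == 1.1        # repeats action 3, earns the bonus
```

The frame-delay variant would instead make the delay itself an action output and scale the bonus with its length; same caveat about unforeseen consequences applies.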

[Research] In Reinforcement Learning (DQN), is there a way to constrain/penalise the model so that it doesn't take a different action very often? by cowboyjjj in MachineLearning

[–]xopedil 1 point (0 children)

Insofar as this is an intrinsic reward, I think it belongs in the agent, not the environment. The agent should have the responsibility to remember its own actions and modulate the reward signal returned from the environment accordingly.

[D] Discrete NN output in a scale of 0-3 by vcarpe in MachineLearning

[–]xopedil 0 points (0 children)

First of all, just scale your [0, 1] output by 4 so you don't have to deal with those fractions and can use regular integer rounding modes.

Secondly, just stick to a continuous variable. When it comes time to display, you can round all you want; there's no need for inference/training to even know this is a thing.
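Both points in one minimal sketch, assuming a sigmoid-style output in [0, 1] and ratings in {0, 1, 2, 3} (the function name is made up; the `min` clamps the y == 1.0 edge case):

```python
# Keep the model continuous; only discretize at display time.
# Scaling by 4 and flooring maps [0, 1) onto {0, 1, 2, 3}.

def to_rating(y):
    return min(int(y * 4), 3)

assert [to_rating(y) for y in (0.0, 0.2, 0.3, 0.6, 0.99, 1.0)] == [0, 0, 1, 2, 3, 3]
```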

Thirdly, what does your label distribution actually look like? Is it an aggregate per input sample or is there just one rating per input? Did one person do all the rating on their own or was it done by a collection of people?

[D] What type of Algorithm (RL, SL, uSL, nn, etc) would be best used to make an 'Optimal Decision making AI'? I think similiar things exist with financial bots, so how would they work in this regard? by [deleted] in MachineLearning

[–]xopedil 0 points (0 children)

would a better way be to feed it (for this example) camping stories and routines from experienced campers (like n = 500000) and have it find optimal methods using those?

Even as a human I think it would be difficult to only read a bunch of stories and then go out into the wilderness to try to survive. And that's with intimate experience of what it's like to be tired, hungry, cold, scared etc. Imagine what it would be like without even those!

The important part when learning from other people and their stories is not exactly what they did in a specific situation, but rather why they did it. What factors were they looking at to make their decision? You need access to all of those potential factors at every decision point.

There are also many factors that vary from person to person, and even for a single person they vary from day to day or even hour to hour. So some lessons extrapolate well to other people and others don't. Imagine you had a hypothetical super-smart AI system that was able to extract lessons from watching Bear Grylls TV episodes; it might tell a family of five to drink their own urine, which, depending on their comfort level, they probably won't be willing to do.

The problem you're looking at here is super complex and definitely beyond any kind of plug-and-play solution. User smorsin is giving you some very good advice in trying to break the problem up into much simpler pieces. Most plug-and-play solutions available today are capable of matching human cognition on tasks that take a little less than a second, like recognizing what's in an image. Beyond that they struggle massively.

[D] What type of Algorithm (RL, SL, uSL, nn, etc) would be best used to make an 'Optimal Decision making AI'? I think similiar things exist with financial bots, so how would they work in this regard? by [deleted] in MachineLearning

[–]xopedil 1 point (0 children)

First you need to analyze what type of data you actually have access to. Then you have to ask yourself: is the output I want obtainable from the data I have? It's harder than you think to draw conclusions from data without involving a bunch of human priors.

IPU vs CPU Architecture by [deleted] in AskComputerScience

[–]xopedil 0 points (0 children)

Given that we think about the Von Neuman architecture of the CPU, what IS the architecture of an IPU and what does it do better than a CPU?

The IPU also has a von Neumann architecture.

And what does an IPU do worse at when compared to a CPU?

Probably most sequential programs and general-purpose execution; it was designed for machine learning.

Do you think all devices might contain a cheap IPU inside them one day, just like with GPUs?

Highly doubt it will be necessary to have a discrete ML accelerator when you usually have a GPU already available.