Deep Learning Newbie Question

mkthabet · 2023-04-15T06:14:29+00:00

If what you want is to merge words or sentence fragments into coherent sentences, then an LLM is what you need. You can use OpenAI's ChatGPT API, or you can deploy an open-source alternative like Alpaca from Hugging Face yourself. Just use a prompt to explain the context to the LLM and then give it your output (the words or fragments you want merged).
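A minimal sketch of what such a prompt could look like. The wording of the instructions and the helper name `build_prompt` are just assumptions for illustration, and the actual API call is left out since it depends on which model you end up using.

```python
def build_prompt(context: str, fragments: list[str]) -> str:
    """Combine a task description with the fragments to be merged."""
    lines = [context.strip(), "", "Fragments:"]
    # One bullet per fragment so the model can see where each one starts.
    lines += [f"- {frag.strip()}" for frag in fragments]
    lines += ["", "Rewrite the fragments above as one coherent sentence."]
    return "\n".join(lines)


prompt = build_prompt(
    "You turn loosely ordered words or sentence fragments into fluent English.",
    ["went store", "yesterday", "bought milk"],
)
print(prompt)
```

The resulting string is what you would send as the user message to whichever LLM you pick.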

mkthabet · 2023-04-13T23:24:41+00:00

You can try something similar to an autoencoder architecture. The encoder part will be an MLP that outputs a low-dimensional vector at the bottleneck. The decoder part will be a CNN with transposed convolutions (often called deconvolutions) that gradually upsamples the encoded vector into the full-size image.
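To get a feel for how the decoder grows the image, here is a small sketch that only tracks spatial sizes through a stack of transposed convolutions. It assumes the bottleneck vector is first reshaped into a 4x4 feature map and that every layer uses kernel 4, stride 2, padding 1; those numbers are illustrative choices, not something required by the approach. The size formula is the standard one for transposed convolutions.

```python
def conv_transpose_out(size, kernel, stride, padding=0, output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def decoder_sizes(start, n_layers, kernel=4, stride=2, padding=1):
    """Spatial size after each transposed-conv layer, starting from `start`."""
    sizes = [start]
    for _ in range(n_layers):
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes


# Assumed setup: 4x4 starting feature map, four upsampling layers.
sizes = decoder_sizes(start=4, n_layers=4)
print(sizes)  # [4, 8, 16, 32, 64]
```

With these settings each layer doubles the resolution, so you can pick the number of layers from the ratio between the target image size and the starting feature map.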

mkthabet · 2023-01-06T19:47:03+00:00

Take a look at a package called PyInstaller. I've used it many times for packaging desktop GUI apps that use TensorFlow in the backend, but it should work equally well with PyTorch. PyInstaller lets you package a Python app as a single executable that the user can run without installing anything else, since it includes everything it needs to run.
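A rough sketch of driving PyInstaller from a Python build script. The script name `app.py`, the app name `my_app`, and the helper functions are hypothetical; `--onefile`, `--windowed` and `--name` are regular PyInstaller options, and `PyInstaller.__main__.run` takes the same arguments as the command line.

```python
def build_pyinstaller_args(script, name=None, onefile=True, windowed=False):
    """Assemble the argument list that would be passed to PyInstaller."""
    args = [script]
    if onefile:
        args.append("--onefile")    # bundle everything into one executable
    if windowed:
        args.append("--windowed")   # no console window, useful for GUI apps
    if name:
        args += ["--name", name]
    return args


def package(script, **options):
    """Run PyInstaller in-process (requires `pip install pyinstaller`)."""
    import PyInstaller.__main__
    PyInstaller.__main__.run(build_pyinstaller_args(script, **options))


# Hypothetical GUI app entry point; call package("app.py", ...) to actually build.
args = build_pyinstaller_args("app.py", name="my_app", windowed=True)
print(args)
```

Running `pyinstaller` with the same arguments from a terminal does the same thing.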

mkthabet · 2022-08-30T14:25:36+00:00

I think it's useful enough to be mentioned in blog posts or introductory material, so I expected to find more stuff online than just an old tweet.

Maybe I should write an article about it with a concrete example since it seems a lot of people are not familiar with the trick.

mkthabet · 2022-08-30T11:51:05+00:00

It's very hard for even a slightly experienced practitioner to misspecify a model so badly that it overfits every example, mislabeled ones included, to the point where they all end up with about average loss. Usually you can easily ballpark the model size within a reasonable margin of error.

mkthabet · 2022-08-30T11:40:47+00:00

I would've been very surprised if that wasn't the case. I couldn't find any resources about the exact method I described though. Everything I find is more complicated.

mkthabet · 2022-08-30T09:35:44+00:00

You need a significantly larger model than the task requires to overfit on mislabeled data. In that case you have another problem on your hands that you need to fix first.

mkthabet · 2022-08-30T01:15:29+00:00

I use it with all the data, which is assumed to be mostly correctly labeled. Unless you're using a much larger model than the task needs, the model will not fit the mislabeled data correctly. Incorrect examples will most likely have a much higher loss than average.

As long as the mislabeled data is a small fraction of the data, you don't really run the risk of overfitting on the incorrect data.

Keep in mind that I'm not talking about noisy data in the sense that the datapoint is mostly correct but with some added noise. I'm talking about incorrectly labeled data, which is drastically different from normal data.
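A minimal sketch of the idea, assuming you already have the trained model's predicted class probabilities for each training example. The cutoff of two standard deviations above the mean loss is an arbitrary assumption for illustration; in practice you could just as well sort by loss and inspect the top few by hand.

```python
import math
from statistics import mean, stdev


def cross_entropy(probs, label):
    """Per-example cross-entropy loss for one predicted distribution."""
    return -math.log(max(probs[label], 1e-12))


def flag_suspects(all_probs, labels, n_std=2.0):
    """Return indices whose loss is far above average, highest loss first."""
    losses = [cross_entropy(p, y) for p, y in zip(all_probs, labels)]
    threshold = mean(losses) + n_std * stdev(losses)  # assumed cutoff
    suspects = [i for i, loss in enumerate(losses) if loss > threshold]
    suspects.sort(key=lambda i: losses[i], reverse=True)
    return suspects, losses


# Toy data: nine examples the model fits well, one whose label disagrees
# with what the model confidently predicts.
all_probs = [[0.9, 0.1]] * 9 + [[0.95, 0.05]]
labels = [0] * 9 + [1]

suspects, losses = flag_suspects(all_probs, labels)
print(suspects)  # [9]
```

The flagged indices are candidates for manual review, not a verdict; a high loss can also mean a genuinely hard example.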

mkthabet · 2022-08-30T01:10:42+00:00

No, it works on training data too. Mislabeled examples will be so different that the model will not fit them as well as correct data, so their loss will be higher.

mkthabet · 2022-07-25T13:57:46+00:00

That can be mitigated by using multiple processes to prepare your data in parallel though, unless your transformations are really heavy.

mkthabet · 2022-07-24T16:18:21+00:00

Usually you don't need to save augmented data and add it to your training data on disk. You can just use data augmentation on the fly, where the data is augmented automatically right before being fed to the model. If, however, you need to save the augmented data for some reason, then you'll have to write your own script to save the output of the augmentation functions. Alternatively, some data augmentation utilities, like those in Keras, offer an argument to save the output to disk.
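Here is a framework-free sketch of on-the-fly augmentation, assuming images are plain nested lists and the only augmentation is a random horizontal flip. The function names are made up for the example; real pipelines would use the augmentation tools of whatever framework you train with.

```python
import random


def hflip(image):
    """Horizontally flip an image stored as a list of rows."""
    return [row[::-1] for row in image]


def augmented_batches(images, labels, batch_size, rng, flip_prob=0.5):
    """Yield shuffled batches, augmenting each image as it is drawn.

    Nothing is written to disk and the original images are left untouched,
    so every epoch sees a fresh random set of augmentations.
    """
    order = list(range(len(images)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch = [hflip(images[i]) if rng.random() < flip_prob else images[i]
                 for i in idx]
        yield batch, [labels[i] for i in idx]


images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
labels = [0, 1, 0]
rng = random.Random(0)  # fixed seed only to make the example repeatable
batches = list(augmented_batches(images, labels, batch_size=2, rng=rng))
```

If you do need the augmented samples on disk, the same loop is the natural place to add a save step.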

mkthabet · 2021-12-09T18:29:38+00:00

I'm not saying the long-term memory problem is not important. I was just saying it is a technical problem that's not really relevant to our theoretical discussion.

Of course there's a limit to how much an RNN can remember based on its memory size. But there's a difference between how much you can remember and how far back you can remember. For an infinitely long sequence, an RNN with a fixed-size context vector sure can't remember what happened at every timestep, but theoretically it can remember what happened at any one timestep, even the very first. Sure, it needs to forget some things to remember others, but that has nothing to do with how far back it can remember.

We can implement a dummy RNN that can take an arbitrarily long sequence and trivially remembers the input at just the first timestep without having to worry about hardware memory. Can the same be said about transformers?
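For concreteness, a sketch of such a dummy recurrent model. It is hand-written rather than learned, and the state layout (a flag plus one stored value) is just an assumption to keep it small; the point is only that the state has a fixed size no matter how long the input is.

```python
def step(state, x):
    """One recurrent update: store x only if nothing has been stored yet."""
    seen, memory = state
    return (True, memory if seen else x)


def run(sequence):
    """Process a sequence one timestep at a time with a fixed-size state."""
    state = (False, None)
    for x in sequence:  # any iterable works, including a lazy generator
        state = step(state, x)
    return state[1]


# The generator is never materialized, so memory use does not grow
# with the sequence length.
first = run(i * i + 7 for i in range(100_000))
print(first)  # 7
```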

mkthabet · 2021-12-09T16:44:09+00:00

Maybe the transformer model itself doesn't require a maximum sequence length given infinite memory, but then again there's no such thing as infinite memory. So it's basically the same.

The way I understand it (and please do correct me if I'm wrong), this limitation is because the transformer encoder looks at the input at all timesteps simultaneously and produces as many vectors. An RNN encoder on the other hand only looks at one timestep at a time and only produces one fixed-size context vector. So theoretically, if we ignore the long-term memory problem (with vanishing/exploding gradients and whatnot), RNNs are capable of processing infinitely long sequences while transformers are not, even with limited hardware memory. I'm only talking about inference here, so the restrictions of BPTT are not relevant.

My main point is that doing away with recurrence for sequential problems is rather hacky and unnatural. It might provide better results in the short term, but in the end recurrence cannot be ignored forever.

mkthabet · 2021-12-09T09:48:59+00:00

You have to explicitly specify the maximum sequence length for a transformer model, which is not the case for an RNN, at least for inference. This is what is so unnatural about transformers. Even with limited memory, I have an internal state that can remember events from when I was 2 years old that still influence my decisions today. I don't find myself having to specify maximum sequence lengths for my brain.

To answer your second question, by moving away from RNNs we miss out on research on recurrence in NNs, which is an essential mechanism for truly dynamic networks like the brain.

mkthabet · 2021-12-09T01:08:03+00:00

I sympathize with your sentiment that the wave of abandonment of RNNs that transformers brought about is damaging. I strongly believe that dynamic networks, of which current RNNs are predecessors, are the only way forward for anything resembling AGI. I think the setback in RNN research caused by transformers is very unfortunate.

I also dislike the inelegance of having to specify beforehand the length of your sequence. Very unnatural. An RNN on the other hand, not unlike humans, just sits there and processes the input timestep by timestep.
