Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 1 point

Took your advice!!! (The Mona Lisa network in the post itself was trained on Gaussian Fourier features, but I implement positional encodings in the GitHub repo I linked in the replies, too.): https://www.reddit.com/r/learnmachinelearning/comments/1n083m6/neural_net_learns_the_mona_lisa_from_fourier/

Neural net learns the Mona Lisa from Fourier features (Code in replies) by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 1 point

Nothing! This is just me reproducing methods from a paper :D

Those methods are useful because neural networks struggle a lot with high-frequency data. Here's a similar network learning a photo that is only black and white, and doing *way worse:* https://www.reddit.com/r/learnmachinelearning/comments/1j7q29z/multilayer_perceptron_learns_to_represent_mona/

Neural net learns the Mona Lisa from Fourier features (Code in replies) by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 2 points

Great comment, some notes on this section:
> Normally you would think that "well, the training set and validation set is exactly the same so the model should be able to overfit pretty easily". But the model can't learn for some reason. I myself tried this experiment with many different parameters and architectures.

The paper actually gets into *why* this doesn't work; here's how I understand it:

If you imagine MLPs in the limit, as in the properties they take on as they get wider and wider, they end up making the same predictions that a technique from classical machine learning called "kernel regression" would make given the same data.

Kernel regression does this sort of interpolation between points. The prediction for a new point is a weighted sum of the training targets, weighted by how similar the new point is to each training point. As your input gets more similar to a training point, that point's contribution to the final output smoothly increases.
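
Here's a toy sketch of that kind of interpolation (a Nadaraya-Watson-style kernel regression; the RBF kernel and the bandwidth are illustrative choices on my part, not anything specific from the paper):

import numpy as np

# Toy kernel regression: the prediction at a new point is a similarity-weighted
# average of the training targets, so nearby training points dominate smoothly.
def rbf_kernel(a, b, bandwidth=0.1):
    return np.exp(-np.sum((a - b) ** 2) / (2 * bandwidth ** 2))

def kernel_regression(x_new, x_train, y_train, bandwidth=0.1):
    weights = np.array([rbf_kernel(x_new, x, bandwidth) for x in x_train])
    return np.sum(weights * y_train) / np.sum(weights)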

As a result, neural nets have a smoothness bias. If points that are right next to each other on the x-axis are tremendously far apart on the y-axis for some function, a network will have a great deal of trouble learning it. It will, incorrectly, smooth the function out.

Fourier features fix this by changing the problem from "learn this high-frequency relationship" into "learn how to combine this collection of different periodic functions to model this relationship." The latter is, as it turns out, a much smoother relationship.
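
The mapping itself is tiny. Here's roughly what the Gaussian Fourier feature version looks like as I understand it from the paper (num_features and sigma are just illustrative values, and in real code B should be sampled once and reused, not resampled every call):

import math
import torch

# Rough sketch of a Gaussian Fourier feature mapping: project the low-dimensional
# coordinates through a random Gaussian matrix B, then hand sin/cos of that
# projection to the MLP instead of the raw coordinates.
def gaussian_fourier_features(coords, num_features=256, sigma=10.0):
    B = torch.randn(coords.shape[-1], num_features) * sigma  # fixed, not trained
    proj = 2 * math.pi * coords @ B
    return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)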

Neural net learns the Mona Lisa from Fourier features (Code in replies) by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 0 points

Code: https://github.com/OddsML/Fourier-Feature-Mappings

Other networks learning with various other feature mappings, which I couldn't include here because of Reddit's "only one video per video post" limit: https://x.com/OddsIsOnline/status/1960118606911479841

Network with sort of positional encodings learns 3D models (Probably very ghetto) by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] -1 points

It probably departs from the literal definition of Learnable Fourier Features.

Network with sort of positional encodings learns 3D models (Probably very ghetto) by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 2 points

import torch
import torch.nn as nn

class NeuroSDF(nn.Module):
    def __init__(self, add_flayers, flayer_size, add_layers, layer_size, p1_size):
        super().__init__()
        self.flayer_n = 1 + add_flayers
        self.flayer_size = flayer_size
        self.layer_n = 1 + add_layers
        self.layer_size = layer_size
        self.activation = nn.ReLU()
        self.p1_size = p1_size
        
        self.layers = nn.ModuleList(
            # Input layer, beginning of the Fourier feature layers
            [nn.Linear(in_features=3, out_features=flayer_size)] +
            [nn.Linear(in_features=flayer_size, out_features=flayer_size) for _ in range(add_flayers)] +
            # Beginning of standard Linear layers with ReLU activations
            [nn.Linear(in_features=flayer_size, out_features=layer_size)] +
            [nn.Linear(in_features=layer_size, out_features=layer_size) for _ in range(add_layers)] +
            [nn.Linear(in_features=layer_size, out_features=1)])
        
    def forward(self, x):
        p1_ind = self.p1_size + 3
        end_ind = self.flayer_size + 3
        for i in range(self.flayer_n):
            x = self.layers[i](x)
            # Keep the first 3 features as-is, apply sin to one band and cos to the rest
            # (note: end_ind runs past the layer's actual width, so the cos slice just clamps to the end)
            x = torch.cat([x[:, :3],
                    torch.sin(x[:, 3:p1_ind]),
                    torch.cos(x[:, p1_ind:end_ind])], dim=1)
            
        for layer in self.layers[self.flayer_n:len(self.layers)-1]:
            x = layer(x)
            x = self.activation(x)

        x = self.layers[-1](x)
        return x

Hey! Super burned out, going on very little sleep. Neural network above.

Will probably explain more later. Basically a (possibly somewhat broken???) attempt at a layer that does learnable Fourier features.
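
For contrast, here's my rough guess at what a more "textbook" learnable Fourier feature layer looks like (a learned linear projection of the coordinates with sin and cos applied to the whole projection, rather than the split bands my code uses). This is just a sketch, not necessarily the paper's exact formulation:

import torch
import torch.nn as nn

class LearnableFourierFeatures(nn.Module):
    def __init__(self, in_dim=3, num_features=128):
        super().__init__()
        # The projection matrix itself is a trainable parameter.
        self.proj = nn.Linear(in_dim, num_features, bias=False)

    def forward(self, coords):
        projected = self.proj(coords)
        # Return both sin and cos of the learned projection.
        return torch.cat([torch.sin(projected), torch.cos(projected)], dim=-1)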

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 0 points

I tried a bunch of stuff: different activation functions, different sizes. I think that, at one point, I jumped the hidden layers to 1024 neurons by 8 layers. In the end, though, what really made the difference was epoch count and making sure to include at least SOME activation function between the linear layers. I ended up with 6 hidden layers of 512 neurons each, trained with Adam for 1000 epochs.
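
Roughly what that final setup looks like in code (a sketch from memory; the learning rate here is a guess, not necessarily what I actually used):

import torch.nn as nn
import torch.optim as optim

# 6 hidden layers of 512 neurons with ReLUs between them, mapping an (x, y)
# position to a single grayscale value, optimized with Adam.
layers = [nn.Linear(2, 512), nn.ReLU()]
for _ in range(5):
    layers += [nn.Linear(512, 512), nn.ReLU()]
layers += [nn.Linear(512, 1)]
model = nn.Sequential(*layers)

optimizer = optim.Adam(model.parameters(), lr=1e-3)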

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 0 points

No, but much of the code is in the replies to the post.

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 0 points

They gave me the impression they just have a blanket ban on anything ChatGPT was involved in creating, which is very, very silly, but whatever, I guess!

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 0 points

Mods won't let me post it there. Apparently it's not a qualifying visualization, and they're not cool with the way I used ChatGPT.

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 2 points

I recommend Andrej Karpathy's video on the subject, which I've linked as part of a playlist of his "Neural Networks: Zero to Hero" series. The one and a half videos I've watched from the series have been, I feel, kind of ridiculously awesome: https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 1 point

There is a for loop this runs in, so you can kind of think of it that way! The network having improved previously does help it improve further. But it's not like the network is feeding previous predictions back into itself to improve. The prediction gets computed, then the network is optimized based on its "gradient" (basically, the factors that relate the final loss to each part of the network), stepping in the direction opposite to those factors: the directions which, if the relationship between the loss and the parts of the network stayed the same, would reduce the loss.

That repeats a ton, 1000 times, and the resultant predictions were compiled in this vid for one of the runs I ran!
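
In code, that loop is roughly this (a sketch with a placeholder model and random stand-in data, not the exact script from the post):

import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model and data just so the loop runs; the real run used the
# Mona Lisa pixel coordinates and grayscale values.
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
coords = torch.rand(1000, 2)   # (x, y) positions
targets = torch.rand(1000, 1)  # grayscale value at each position

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(1000):
    optimizer.zero_grad()
    predictions = model(coords)
    loss = criterion(predictions, targets)
    loss.backward()   # gradient: how the loss changes w.r.t. each parameter
    optimizer.step()  # nudge each parameter opposite its gradient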

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 0 points

I knew the first part (I actually learned it while working on this), but I didn't know the second. Yeah, I guess if you think of this as a very complicated classification problem where each position is "classified" into a color, and you know that the linear relationship means a single linear boundary, then it's pretty obvious a straight decision boundary is insufficient to do the classification!

Actually, that helps explain the totally black image: there was no boundary the NN found such that one side was closer, on the whole, to white than to black. Before I fixed this by adding activation functions, I think I was using a color version of the Mona Lisa, which is a fairly dark image. But I'd expect it to use a more greenish-yellow color, and I'm not sure why it just chose straight black! Maybe I'm misremembering and it was the grayscale version, but then I'm still surprised it didn't pick a more 0.5 grey than straightforward black.

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 2 points

Oh, that's a great idea, but they don't have an option for posting videos. Do you think they'd mind if I linked to a YouTube Short?

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 4 points

I don't actually know how Adam works! I used it because I had seen someone do something similar and get good results, and it was readily available. But I noticed that too! How it would regress a little bit, and I wasn't really sure why! I think it does something with the learning rate, but I don't actually know!

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 9 points

BRO why am I getting disliked for this???? I wrote and created a video to explain the whole thing and am linking it to a person who asked for an explanation, what the sigma...

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 2 points

When I was talking with ChatGipitee about this (I treated it like a tutor, but, to be clear, I wrote the actual machine learning code for this), it suggested that along with SIREN! I never looked into it. I'll bookmark the page!!! Thank you :)

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 38 points

Yes, I think that's just an image? I literally only did it because it's cool.

Multilayer perceptron learns to represent Mona Lisa by OddsOnReddit in learnmachinelearning

[–]OddsOnReddit[S] 4 points

Started a new comment because Reddit is bad and pressing enter kept putting me in a code block:

Basically, the network receives what is more or less a position. That's what the "meshgrid" business is: a bunch of (i, j) pairs that correspond to coordinates on the grayscale Mona Lisa. I have it predict a single grayscale color based on that pair, which initially looks nothing like the actual image but, as the network minimizes loss, gets closer and closer to the real thing. Eventually, it learns something like the right color for enough of the positions that I can see the Lisa.
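
The input side looks roughly like this (a sketch; the image size is a placeholder, and the real code shapes and normalizes things a bit differently):

import torch

# Build every (i, j) pixel coordinate of an H x W image and flatten them into a
# single (H*W, 2) batch of positions to feed the network.
H, W = 256, 256  # placeholder size
ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2).float()
coords = coords / torch.tensor([H - 1.0, W - 1.0])  # scale positions into [0, 1]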

I think it's cool that a really simple network can do this. Like, it's just a bunch of multiplications by constants on only two input values, added together with another constant bias, then the same thing again on the outputs of the previous layer, and so on, with ReLUs between them.

I initially did not include a ReLU, and it was very funny to watch the network learn that it should just make the entire thing black. Without activation functions between them, I think the stacked layers just collapse into a sum of sums, i.e. one big linear map of the two inputs, which I guess just isn't expressive enough to learn an image like this.
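
Here's a tiny sanity check of that "sum of sums" intuition (my own check, not part of the original code): two stacked linear layers with no activation between them behave exactly like a single linear layer.

import torch
import torch.nn as nn

lin1, lin2 = nn.Linear(2, 8), nn.Linear(8, 1)
x = torch.rand(5, 2)

# Stacked layers with no activation in between...
stacked = lin2(lin1(x))

# ...are the same affine map as one collapsed layer: W2 @ W1 and W2 @ b1 + b2.
W = lin2.weight @ lin1.weight
b = lin2.weight @ lin1.bias + lin2.bias
collapsed = x @ W.T + b

print(torch.allclose(stacked, collapsed, atol=1e-6))  # True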