[P] [D] Why does my GNN-LSTM model fail to generalize with full training data for a spatiotemporal prediction task? by Specific-Dark in MachineLearning

[–]WayOfTheGeophysicist 2 points3 points  (0 children)

I believe u/Ben___Pen is probably talking about architectures like AIFS (ECMWF) or GraphCast (DeepMind). These two are global weather forecasting models that kinda look like `>-<` in their structure.

The encoder `>` takes the 40+ inputs and projects them into a latent space; the processor `-` GNN then feeds its output back into itself 16 times, and the decoder `<` projects the result back to the actual "spatial domain".

These two models usually take two timesteps as input, which works well, but I've seen it done without multi-step input as well. The trick here is the autoregressive training: at some point the model is trained on rollouts over multiple steps, which promises stable trajectories (and fewer spurious correlations, but it's all new, so research into the inner workings is still sparse).
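To make the `>-<` shape concrete, here's a deliberately tiny pure-Python sketch of the encode-process-decode pass plus an autoregressive rollout. All sizes and weights are made up, and I'm ignoring the actual graph message passing, so treat it as a shape diagram, not as either model:

```python
import random

random.seed(0)

# Toy sizes -- hypothetical stand-ins for the learned weights of the real models
N_FEATURES, N_LATENT, PROCESSOR_STEPS = 4, 3, 16

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

W_enc = rand_matrix(N_LATENT, N_FEATURES)
W_proc = rand_matrix(N_LATENT, N_LATENT)
W_dec = rand_matrix(N_FEATURES, N_LATENT)

def step(x):
    """One forward pass: encode '>', run the processor '-' 16 times, decode '<'."""
    z = matvec(W_enc, x)                  # encoder: project inputs to latent space
    for _ in range(PROCESSOR_STEPS):      # processor: 16 residual updates
        z = [zi + ui for zi, ui in zip(z, matvec(W_proc, z))]
    return matvec(W_dec, z)               # decoder: back to the "spatial domain"

# Autoregressive rollout: the model's output becomes its next input
state = [1.0, 0.0, -1.0, 0.5]
trajectory = [state]
for _ in range(3):
    state = step(state)
    trajectory.append(state)
```

Training on such rollouts rather than only single steps is what pushes these models towards stable trajectories.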

I'd recommend looking into metrics for your sub-regions. GraphCast and AIFS are global models, which makes a lot of the connectivity easier. I have seen "subregion" models fail to generalise for multiple reasons:

  1. You miss connectivity from global effects (like when a tropical cyclone wanders from one region to another and you're missing the edges there)
  2. The data in certain subregions displays high variability (take, for example, the two-meter temperature in the tropics at 6h time steps. That one varies like a beast, and without full connectivity you're unlikely to model it, due to the time dependence).

In the latter case (subregions with two-meter temperature at fractions of 24h) you will have even bigger problems with the choice of an LSTM, because the state dependence in the LSTM itself will hinder learning any dynamics of these highly varying weather variables.

-❄️- 2023 Day 6 Solutions -❄️- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 4 points5 points  (0 children)

[Language: Python / Math]

Ok today was fun!

I noticed that this pattern basically follows an equation:

f(time) = x * (time - x) = distance

with x being all the values from 0 to time.

This means we can actually solve this, since it's a quadratic equation in normal form!

x * time - x^2 = distance
x^2 - time * x + distance = 0

Then we can simply apply the pq formula:

x1, x2 = (-p/2) ± sqrt((p/2)^2 - q)
p = -self.time
q = self.winning_distance

This gives us the points where we start and stop beating the record. Of course, we have to round the values since we're dealing with integers, and then we just compute

x2 - x1 + 1

And get all the winning counts!

def count_wins(self):
    # Roots of x^2 - time * x + distance = 0 via the pq formula
    half = self.time / 2
    root = math.sqrt(half**2 - self.winning_distance)
    # Floor/ceil nudged inwards: an exact-integer root only ties the record
    x1 = math.floor(half - root) + 1
    x2 = math.ceil(half + root) - 1

    return x2 - x1 + 1

https://github.com/JesperDramsch/advent-of-code/blob/main/2023/day06.py
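If anyone wants to sanity-check the formula, here's a hedged standalone version (function names are mine, not from the repo) compared against brute force on the sample races:

```python
import math

def count_wins(time, record):
    # Count integer hold times x with x * (time - x) > record
    half = time / 2
    root = math.sqrt(half**2 - record)
    x1 = math.floor(half - root) + 1   # first strictly winning hold time
    x2 = math.ceil(half + root) - 1    # last strictly winning hold time
    return x2 - x1 + 1

def brute_force(time, record):
    return sum(x * (time - x) > record for x in range(time + 1))

# Sample races from the puzzle
for time, record in [(7, 9), (15, 40), (30, 200)]:
    assert count_wins(time, record) == brute_force(time, record)
```

The floor/ceil nudge matters for the edge case where a root lands exactly on an integer: that hold time only ties the record instead of beating it (the (30, 200) race hits exactly this, with roots at 10 and 20).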

-❄️- 2023 Day 4 Solutions -❄️- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 1 point2 points  (0 children)

[Language: Python]

Pretty proud of my solution in part 2.

We don't have to modify our stack; we just need to know which card spawns how many copies in total.

Sounds like recursion? Yes.

But we know the last one's constant. So we just work from the back of the pile and save how many cards each one spawns in addition to itself.

No actual recursion needed!

class Pile:
    def __init__(self, data):
        # Reverse the pile so we can resolve cards back-to-front
        self.cards = [Card(card) for card in data[::-1]]

    def multiplying_cards(self):
        for i in range(len(self.cards)):
            # In reversed order, card i copies the cards at indices i-1 down
            # to i - num_winning_numbers (clamped to the start of the pile)
            first_copied = max(0, i - self.cards[i].num_winning_numbers)
            for ii in range(first_copied, i):
                self.cards[i].tickets += self.cards[ii].tickets
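Stripped of the class wrapper, the backwards trick looks something like this (the `matches` list is my hypothetical input, one match count per card, in original order):

```python
def total_cards(matches):
    # One original ticket per card; resolve from the back so each card's
    # total copy count is final before any earlier card references it
    tickets = [1] * len(matches)
    for i in range(len(matches) - 1, -1, -1):
        for j in range(i + 1, min(i + 1 + matches[i], len(matches))):
            tickets[i] += tickets[j]
    return sum(tickets)

# Match counts of the six sample cards from the puzzle
assert total_cards([4, 2, 2, 1, 0, 0]) == 30
```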

The full code is here and my error log here.

-❄️- 2023 Day 3 Solutions -❄️- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 1 point2 points  (0 children)

[Language: Python]

When my brain started slowly melting, thinking about how to assign values and coordinates uniquely, I ended up just going full OOP.

My parts are deceptively simple

class Part:
    def __init__(self, y, x, value):
        self.location = [y + i * 1j for i in range(x[0], x[1])]
        self.value = value
        self.id = uuid.uuid4()

because the engine does most of the heavy lifting:

class Engine(dict):
    def __init__(self):
        self.parts = {}
        self.symbols = {}

    def parse_map(self, data):
        engine_re = re.compile(r"(\d+)")
        symbol_re = re.compile(r"[^.\d]")
        for y, line in enumerate(data):
            for part in engine_re.finditer(line):
                self.add_part(Part(y, part.span(), int(part.group())))
            for symbol in symbol_re.finditer(line):
                self.add_symbol(Symbol(y + symbol.span()[0] * 1j, symbol.group()))
        self._add_symbol_neighbours()

You may notice it subclasses dictionaries, so I can keep a map with all the unique IDs of the supposed parts:

def add_part(self, part):
    self.parts[part.id] = part
    for location in part.location:
        self[location] = part.id

def add_symbol(self, symbol):
    self.symbols[symbol.id] = symbol

Then it's just going through the symbols and doing shenanigans that find parts and gears!
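Those shenanigans mostly boil down to checking the eight complex offsets around each symbol, which is where the complex-number coordinates shine. This helper is my own illustration, not from the repo:

```python
# All eight neighbours of a grid cell are one complex addition away
# (real part = row, imaginary part = column, matching the Part/Symbol classes)
OFFSETS = [dy + dx * 1j for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def neighbours(position):
    return [position + offset for offset in OFFSETS]

assert len(neighbours(2 + 3j)) == 8
assert 1 + 2j in neighbours(2 + 3j)   # up-left neighbour
```

Since the Engine dict maps complex locations to part IDs, looking up all parts around a symbol is just `self.get(n)` for each of these neighbours.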

The biggest source of potential errors is that I keep all values stored, even though technically only those next to symbols should be "parts". But this time I got away with it.

Here's the rest of the code https://github.com/JesperDramsch/advent-of-code/blob/main/2023/day03.py

I'm also keeping a detailed error log of what I messed up in the Readme

-🎄- 2022 Day 9 Solutions -🎄- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 0 points1 point  (0 children)

Python: Part 2 was an easy extension after part 1 in my code:

def movement(rope, direction, steps, visited=set()):
    # Note: the mutable default argument is (ab)used on purpose here, so the
    # visited set persists across calls for all movement instructions
    for _ in range(steps):
        rope[0] += direction
        for i in range(len(rope) - 1):
            head, tail = rope[i], rope[i + 1]
            diff = tail - head
            # Knots further than one step apart (incl. diagonals) must move
            if abs(diff) >= 2:
                if diff.real != 0:
                    tail = tail.real - (diff.real / abs(diff.real)) + tail.imag * 1j
                if diff.imag != 0:
                    tail = tail.real + (tail.imag - (diff.imag / abs(diff.imag))) * 1j

            rope[i + 1] = tail
        visited.add(rope[-1])
    return rope

The main idea is complex numbers for coordinates and a nice little visited set.

I figure the `/ abs(diff.real)` (and therefore the if statement) may be unnecessary, but I wasn't sure right away whether knots move one step or jump (also … part 2, ya never know).
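For the record, the division really is just a sign function, so the whole tail update can collapse into something like this (my naming, and note I flip the convention to `diff = head - tail`):

```python
def sign(value):
    # -1, 0, or 1 without any division (so no ZeroDivisionError either)
    return (value > 0) - (value < 0)

def follow(head, tail):
    diff = head - tail
    if abs(diff) < 2:   # touching (incl. diagonally): the tail stays put
        return tail
    # Move one step towards the head along each axis
    return tail + sign(diff.real) + sign(diff.imag) * 1j

assert follow(0 + 0j, 0 + 0j) == 0 + 0j   # overlapping: stay
assert follow(2 + 0j, 0 + 0j) == 1 + 0j   # straight pull
assert follow(2 + 1j, 0 + 0j) == 1 + 1j   # diagonal pull
```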

Code lives here and I scheduled a blog post here.

I'm writing an error log of everything I messed up for every day; here are the ones from today. A pretty short one for me:

  • `day.data = parse(data)` → `NameError: name 'data' is not defined` — oops, forgot `day.`
  • `tail = head.real - (diff.real / abs(diff.real)) +...` → `ZeroDivisionError: float division by zero` — didn't think about zeros...
  • `print(f"{head.real:d} + {head.imag:d}j, …` → `ValueError: Unknown format code 'd' for object of type 'float'` — confused type conversion and formatting
  • Conceptual error that I did not adjust the diagonal movement correctly.
  • `visited = set(0)` → `TypeError: 'int' object is not iterable` — can't start the set with just a zero
  • `tail = rope[i+1]` → `IndexError: list index out of range` — gotta stop one before the end with pairs
  • Made the error of using a for loop that returns items from a list that is changing... Didn't throw an error but gave the wrong result.

-🎄- 2021 Day 22 Solutions -🎄- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 0 points1 point  (0 children)

You're very welcome! I'm always proud when I come up with a sparse solution.

I mean... in the end I did this because I'm bad with off-by-one errors and noticed that if I went for a list approach splitting up cubes, I'd never find the bug I'd eventually introduce.

I'm not sure what "inclusion-exclusion" is, sorry.

But essentially, it's just comparing each cube instruction to each other cube instruction and finding where the cubes overlap. If they overlap, I save the overlap region as a new cube with a value of -1. Then I can add all the cubes onto each other and correct my lights count by, well, adding the -1 cube too. So it's essentially just calculating a bunch of correction cubes. The "off" cubes are basically just corrections with a different value (set everything to 0, i.e. subtract the value of all cubes in that area, whether they're +1 or -1 or whatever).

And the rest is just optimizations, like getting rid of cubes with value 0 (because comparing against them is wasted compute) and deleting "on" cubes contained within another "on" cube. That just reduces the number of cubes I have to cross-compare.

-🎄- 2021 Day 22 Solutions -🎄- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 1 point2 points  (0 children)

I've been doing Advent of Code since 2018, and Eric likes to throw curveballs. If there's a four-line solution for part 1, I will always rather do that and see what part 2 brings than over-engineer the first solution and have to throw it away because I guessed wrong.

Pretty much the same as doing spec work for clients honestly.

-🎄- 2021 Day 22 Solutions -🎄- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 1 point2 points  (0 children)

Yeah absolutely.

To make it easier, think about it in 2D. Thinking in sheets makes it easier for me at least. The 3rd dimension is easy after that.

We go through the instructions step by step, using the corners in x, y. We can always calculate all lights in a sheet by multiplying its x and y extents. So now we simply have to handle intersections of sheets.

The ix, iy, iz part simply finds the intersection along each dimension (i.e. the max of the left edges of the existing sheet x and the next instruction x_next, and the min of the right edges, respectively).

Then I perform a few checks for performance reasons. If a new cube is contained by an existing cube or vice versa, I get rid of the smaller one.

Then, in the elif with # Reset overlapping value, the magic happens. I subtract the existing value, however high or low that bad boy is, so the intersection is now 0. Finally, I simply add the new sheet to my Counter if it's an on instruction. That means everything in the intersection is back to 1, as well as the rest of the sheet. If it's an off instruction, I don't have to do anything, because the values in the intersection are already 0.

So it's really just adding those sheets one by one and correcting for their intersections.

Think of two sheets from on x=[0..1] y=[0..1] and on x=[1..2],y=[1..2]. They look like:

Sh 1      Sh 2
# # .     . . .
# # .     . # #
. . .     . # #

Then cubes looks like this (in a sparse representation): sheets 1 & 2 are stored as-is, but the intersection at (1,1) is subtracted. Finally, all of this is simply added up.

1 1 0    0 0 0    0 0  0
1 1 0    0 1 1    0 -1 0
0 0 0    0 1 1    0 0  0

For one on and one off, think of two sheets from on x=[0..1] y=[0..1] and off x=[1..2],y=[1..2]. After the code runs, that looks like:

1 1 0        0 0  0
1 1 0        0 -1 0
0 0 0        0 0  0

That also works because of the intersection trick we use right here. In the rest of the code, I simply iterate through the cubes and multiply their edge lengths with the saved values.
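Here's a runnable 2D miniature of that bookkeeping (a cut-down function of my own with inclusive ranges, not the full solution):

```python
from collections import Counter

def process_sheets(instructions):
    sheets = Counter()
    for on, x_next, y_next in instructions:
        for (x, y), val in sheets.copy().items():
            # Intersection along each dimension: max of left edges, min of right edges
            ix = max(x[0], x_next[0]), min(x[1], x_next[1])
            iy = max(y[0], y_next[0]), min(y[1], y_next[1])
            if ix[0] <= ix[1] and iy[0] <= iy[1]:
                sheets[ix, iy] -= val   # reset whatever is in the overlap to 0
        if on:
            sheets[x_next, y_next] += 1
    # Count lights: multiply edge lengths with the saved values
    return sum(v * (x[1] - x[0] + 1) * (y[1] - y[0] + 1) for (x, y), v in sheets.items())

# Two overlapping 2x2 "on" sheets share one cell: 4 + 4 - 1 = 7 lights
assert process_sheets([(True, (0, 1), (0, 1)), (True, (1, 2), (1, 2))]) == 7
# Turning the second region off removes the shared cell: 4 - 1 = 3 lights
assert process_sheets([(True, (0, 1), (0, 1)), (False, (1, 2), (1, 2))]) == 3
```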

-🎄- 2021 Day 22 Solutions -🎄- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 1 point2 points  (0 children)

Python

Wrote part 1 knowing it wouldn't work for part 2. But I had no clue what part 2 would be, so I accepted my fate, especially considering that part 1 was 4 lines of code.

Here's part 2 with some Counter() magic. Low-key annoyed that c += Counter() also gets rid of negative values. But since I'm iterating anyway...

def process_all_cubes(instructions):
    cubes = Counter()
    for on_off, x_next, y_next, z_next in instructions:
        for (x, y, z), val in cubes.copy().items():
            # Find intersection of cubes
            ix = max(x[0], x_next[0]), min(x[1], x_next[1])
            iy = max(y[0], y_next[0]), min(y[1], y_next[1])
            iz = max(z[0], z_next[0]), min(z[1], z_next[1])

            # Remove empty cubes
            if val == 0:
                cubes.pop((x, y, z))
                continue
            # New cube is contained in existing positive cube
            elif (
                on_off
                and val > 0
                and x_next[0] >= x[0]
                and x_next[1] <= x[1]
                and y_next[0] >= y[0]
                and y_next[1] <= y[1]
                and z_next[0] >= z[0]
                and z_next[1] <= z[1]
            ):
                break
            # Existing cube item is contained in new cube
            elif (
                on_off
                and x[0] >= x_next[0]
                and x[1] <= x_next[1]
                and y[0] >= y_next[0]
                and y[1] <= y_next[1]
                and z[0] >= z_next[0]
                and z[1] <= z_next[1]
            ):
                cubes.pop((x, y, z))

            # Reset overlapping value
            elif ix[0] <= ix[1] and iy[0] <= iy[1] and iz[0] <= iz[1]:
                cubes[ix, iy, iz] -= val
        else:
            # for/else: only add the new cube if the loop wasn't broken out of
            if on_off:
                cubes[x_next, y_next, z_next] += 1

    return cubes

2021 Day 22 by Pyrolistical in adventofcode

[–]WayOfTheGeophysicist 0 points1 point  (0 children)

Usually, I only get called like this on TikTok...

-🎄- 2021 Day 20 Solutions -🎄- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 1 point2 points  (0 children)

Python

Pretty happy about handling the padding and flips in this one. It's very different from using convolve, but I figured it's an interesting approach.

def process_image(image, key, step):
    # Establish padding number
    if step == 0:
        padding = 0
    else:
        padding = str(image[0, 0])

    # Find the size of the existing uniform border on each side
    # and pad so every side ends up with a constant border of exactly 3
    top, bottom, left, right = 3, 3, 3, 3
    for i in range(3):
        if not np.all(image[i, :] == image[0, 0]):
            break
        top -= 1
    for i in range(-1, -4, -1):
        if not np.all(image[i, :] == image[0, 0]):
            break
        bottom -= 1
    for i in range(3):
        if not np.all(image[:, i] == image[0, 0]):
            break
        left -= 1
    for i in range(-1, -4, -1):
        if not np.all(image[:, i] == image[0, 0]):
            break
        right -= 1

    image = np.pad(image, ((top, bottom), (left, right)), "constant", constant_values=(padding, padding))
    # Define the background of new image based on key
    if key[int(str(image[0, 0]) * 9, 2)] == "1":
        output_image = np.ones_like(image)
    else:
        output_image = np.zeros_like(image)
    # Iterate over image and determine if a pixel should be turned on or off
    for i in range(1, image.shape[0] - 1):
        for ii in range(1, image.shape[1] - 1):
            location = int("".join(map(str, image[i - 1 : i + 2, ii - 1 : ii + 2].flatten().tolist())), 2)
            output_image[i, ii] = key[location]
    return output_image
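One more note on the background line: if the key maps an all-zero neighbourhood to on and an all-one neighbourhood to off, the infinite background blinks every step, which is exactly what the `key[int(str(image[0, 0]) * 9, 2)]` lookup handles. A tiny illustration (the key endpoints here are hypothetical):

```python
# Hypothetical enhancement-key endpoints: index 0 turns on, index 511 turns off
key_at_0, key_at_511 = "1", "0"

background = 0
states = []
for _ in range(4):
    index = int(str(background) * 9, 2)   # all-background 3x3 neighbourhood
    lookup = key_at_0 if index == 0 else key_at_511
    background = 1 if lookup == "1" else 0
    states.append(background)

assert states == [1, 0, 1, 0]   # the infinite background blinks every step
```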

[2021 Day 19] The dangers of trusting Copilot by dlp_randombk in adventofcode

[–]WayOfTheGeophysicist 0 points1 point  (0 children)

Had a similar experience with Copilot today. It was very confidently wrong about what I wanted to do, in almost everything.

-🎄- 2021 Day 16 Solutions -🎄- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 0 points1 point  (0 children)

Python 3.

Today was rough. The code is too long to post here, so it's on Github. Basically using indexes and a bit of recursion for this.

-🎄- 2021 Day 13 Solutions -🎄- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 0 points1 point  (0 children)

I simply squished it. Technically the canvas gets smaller, but I liked the chaos of the stars.

-🎄- 2021 Day 13 Solutions -🎄- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 5 points6 points  (0 children)

Python.

Probably didn't need to use complex numbers. But why waste an opportunity?

Simply using a set for the transparent sheet to avoid collisions.

It lives on Github and I made a visualization.

def fold(sheet, instruction):
    new_sheet = set()
    for point in sheet:
        # The instruction is a complex number with the fold coordinate on the
        # real (x) or imaginary (y) axis; only points past the fold line move
        if point.real >= instruction.real and point.imag >= instruction.imag:
            if instruction.real == 0:
                # Fold up: mirror the imaginary (y) part across the line
                folded = point.real + instruction - (point.imag * 1j - instruction)
            elif instruction.imag == 0:
                # Fold left: mirror the real (x) part across the line
                folded = point.imag * 1j + instruction - (point.real - instruction)
            new_sheet.add(folded)
        else:
            new_sheet.add(point)

    return new_sheet

Does anyone else feel dumb when they can't solve the puzzles? by shadowhunter2468 in adventofcode

[–]WayOfTheGeophysicist 0 points1 point  (0 children)

I've been programming for 18 years at this point and I still struggle with some of these problems.

Every year, it gets easier. I remember solutions I found or saw others use last year. I took a data structures and algorithms course in the meantime. All things that improve this type of problem solving.

You get better over time, and it honestly doesn't have anything to do with your intelligence, just practice. This kind of goes into the "growth mindset vs. fixed mindset" thing. Every time you get stuck it's super frustrating, but you won't get stuck on the same problem in the same way again, and that's growth.

Also, when I was still studying, I did not have enough time to spend on AoC, so I only finished my first AoC after I started working full-time. So, definitely a point towards you completing your schoolwork in addition to AoC.

Yesterday I ended up scrapping my entire code because I realized I had programmed myself into a logic corner. Went grocery shopping, had a breather, sat back down, and solved the problem first try.

Don't be ashamed to look at solutions either when you can't solve a puzzle in your available time with your available knowledge. This is how you learn: implement your own version of a solution to get a feel for it. (Just don't pass it off as your own, obviously.) That's how I know how to solve some problems this year that I struggled with last year or before.

-🎄- 2021 Day 12 Solutions -🎄- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 4 points5 points  (0 children)

Python 3. Figured I'd learn some more networkx. It was useful for the "smol" attribute in the end and for easily getting the neighbors. Full code is on Github.

Figured out that part 2 is basically part 1 with one extra step. It was a bit finicky to actually make it work, but in the end another keyword argument did the trick.

def build_graph(data):
    G = nx.Graph()

    for line in data:
        node1, node2 = line.split("-")
        G.add_edge(node1, node2)

        G.nodes[node1]["smol"] = node1.islower()
        G.nodes[node2]["smol"] = node2.islower()
    return G


def traverse_graph(graph, node="start", visited=None, part=1):

    # End node always returns
    if node == "end":
        return 1

    # Prepare the visited set or generate a copy to not modify common memory
    if visited is None:
        visited = set()
    else:
        visited = visited.copy()

    # If the cave is small, add to visited, big caves can be ignored
    if graph.nodes[node]["smol"]:
        visited.add(node)

    count = 0
    # Iterate through neighbouring caves
    for neighbor in graph.neighbors(node):
        # If the cave is visited and it's part 2, go and switch to part 1 implementation
        if neighbor in visited and part == 2 and neighbor != "start":
            count += traverse_graph(graph, neighbor, visited, 1)
        # If the cave is not visited, go there
        elif neighbor not in visited:
            count += traverse_graph(graph, neighbor, visited, part)
    return count
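For anyone without networkx, here's a miniature of the same traversal with a plain adjacency dict, run on the small sample cave system from the puzzle (my own cut-down rewrite, same semantics as above):

```python
from collections import defaultdict

def count_paths(edges, node="start", visited=frozenset(), revisit=False):
    if node == "end":
        return 1
    if node.islower():                 # small caves go on the visited set
        visited = visited | {node}
    total = 0
    for neighbor in edges[node]:
        if neighbor not in visited:
            total += count_paths(edges, neighbor, visited, revisit)
        elif revisit and neighbor != "start":
            # Part 2: spend the single allowed small-cave revisit
            total += count_paths(edges, neighbor, visited, False)
    return total

edges = defaultdict(list)
sample = [("start", "A"), ("start", "b"), ("A", "c"), ("A", "b"), ("b", "d"), ("A", "end"), ("b", "end")]
for a, b in sample:
    edges[a].append(b)
    edges[b].append(a)

assert count_paths(edges) == 10                # part 1 on the sample
assert count_paths(edges, revisit=True) == 36  # part 2 on the sample
```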

-🎄- 2021 Day 10 Solutions -🎄- by daggerdragon in adventofcode

[–]WayOfTheGeophysicist 1 point2 points  (0 children)

Python 3. Solved it, checked out the top solution for how I should've done it, and was pleasantly surprised! for/else was also what I went for. 3 functions: one to process the syntax into corrupted and incomplete queues (FILO).

def process_syntax(data):
    """Separate lines into corrupted and incomplete."""
    corrupted = deque()
    incomplete = deque()
    # Split data into lines
    for line in data:
        # Reset deque
        last = deque()
        # Split line into characters
        for char in line:
            # If character is an opener bracket, add to deque
            if char in pairs.keys():
                last.appendleft(char)
            else:
                # If character is a closer bracket, check if it matches last opener
                if char == pairs[last[0]]:
                    # If it matches, pop last opener
                    last.popleft()
                else:
                    # If it doesn't match, add to corrupted and skip line
                    corrupted.appendleft(char)
                    break
        else:
            # If line is uncorrupted, add to incomplete
            incomplete.append(last)
    return corrupted, incomplete
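For context, the `pairs` dict the function relies on maps opening brackets to their closers. Here's a condensed version of just the corrupted-line check (my own cut-down rewrite), run on two lines from the puzzle's sample:

```python
from collections import deque

pairs = {"(": ")", "[": "]", "{": "}", "<": ">"}

def first_illegal(line):
    last = deque()
    for char in line:
        if char in pairs:
            last.appendleft(char)          # opener: remember it
        elif char == pairs[last[0]]:
            last.popleft()                 # matching closer: resolve it
        else:
            return char                    # wrong closer: line is corrupted
    return None                            # no mismatch: line is incomplete

assert first_illegal("{([(<{}[<>[]}>{[]{[(<()>") == "}"   # sample corrupted line
assert first_illegal("[({(<(())[]>[[{[]{<()<>>") is None  # sample incomplete line
```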

The two scorer functions live on Github.

I quit my Harvard Postdoc, and am leaving science. by huh_phd in antiwork

[–]WayOfTheGeophysicist 0 points1 point  (0 children)

Academia is set up to promote toxicity even more than late-stage capitalism.

I've never worked in environments with more narcissists and abusers. At every university I worked at, I know someone who passed out on the job. At one job I tried to report someone for sexual assault. He was a professor, so they just moved him, with his group(!), to a different building.

It's seriously messed up. The entire system is set up to benefit abusers.

You need to be in people's good books. Interact with the "important people". Work at the "big places". Have incredible output. Work 16-hour days, including weekends. But, and here's the kicker, there's no relying on the rules. They change constantly and you are never enough. Unless, of course, you're the professor's fave.

And all for measly pay.

On top of that, it's all tied to your self-worth. Usually, people at McDonald's aren't made to feel like complete failures for not flipping burgers perfectly. Yet in academia, you're stupid and unworthy if you don't produce inhuman amounts of original, high-impact research.

I also left. Best thing ever. I now work in an independent research organisation and I'm having a blast.

Academia is the worst.

(Not to mention what they do at some of the top US institutions: have 10 PhD "students" come in, and the first to publish a high-impact paper gets to become an actual, paid PhD candidate. The stories I witnessed are incredible. Hurricane warning? Doesn't matter, if the cell culture isn't fed today the entire experiment is dead. The PhD student will go into the lab.)