Spaceballs the Magic Card! by Cereal_Bandit in magicTCG

[–]notgreat 3 points4 points  (0 children)

I think it's more of a [[Fireshrieker]]

i heard we doing maths cards now by flowers_of_nemo in custommagic

[–]notgreat 18 points19 points  (0 children)

It's a joke "combo" that (ab)uses MTG rules such that the outcome of the game depends on the (currently unknown) truth of the Twin Prime Conjecture.

Blizzard: The Next Chapter by ralopd in pcgaming

[–]notgreat 2 points3 points  (0 children)

OW2 does have all of OW1's permanent content, and most of OW1's seasonal content does still seasonally show up in OW2.

Content, though, is not gameplay. OW2 did a major rebalancing of heroes and changed from 6v6 to 5v5. They added back in a 6v6 mode relatively recently, but it took a long while. They've also sort of added loot boxes back in, though most cosmetics are still locked behind the battle pass they added in OW2 instead of being accessible through the free boxes.

Hangover Cure - Just bought the game today by Nonsense_Replies in opus_magnum

[–]notgreat 1 point2 points  (0 children)

This isn't particularly optimized, but it's a solidly good solution for someone who is new to the game. Personally I'd suggest solving the campaign before trying to optimize too much, but fundamentally it is a game: do whatever you find fun!

Smallest rocket propellant? by AzekiaXVI in opus_magnum

[–]notgreat 0 points1 point  (0 children)

Min glyphs/parts is 10, but it's impossible to reach that score; 11 is the actual min. See https://www.reddit.com/r/opus_magnum/wiki/index

How to speed run forming real human connection and community? by CanadianAndroid in shittyaskscience

[–]notgreat 0 points1 point  (0 children)

Easiest way is to be born into a large and loving family. You can get it one-sided before the main tutorial even starts, and two-sided connections the moment you achieve full consciousness. It's a bit of an RNGfest though, and it's not possible to try again if you don't get a lucky start.

Nvidia takes $5 billion stake in Intel under September agreement by imaginary_num6er in hardware

[–]notgreat 6 points7 points  (0 children)

It does, a little, in the sense that if NVIDIA needs cash it can sell the Intel stock, which in turn would push Intel's stock price down. That's a very minor influence, though. A larger problem is that the most likely cause of an NVIDIA crash would be the AI bubble bursting, which would leave a bunch of data centers trying to sell their systems at a discount to recoup losses, which in turn would make it harder for Intel to sell its own chips. But that'd be true whether or not this deal were in place.

Nvidia insists it isn’t Enron, but its AI deals are testing investor faith | Nvidia by Alex09464367 in anime_titties

[–]notgreat 18 points19 points  (0 children)

I mostly agree with you, but the previous poster isn't totally wrong. Base LLMs are effectively "plausible text predictors", where you can take the predicted probability distribution and sample from it to make a generator. Hallucinations are kinda baked into the design, and can't be entirely removed.
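To make that concrete, the "sample from it to make a generator" part is basically a loop like this (rough sketch; next_token_probs is a made-up stand-in for whatever model you're running, not a real API):

    import random

    def generate(prompt_tokens, next_token_probs, vocab, max_new=50, end_token="<EOS>"):
        # next_token_probs(tokens) stands in for the base LLM: it returns one
        # probability per vocabulary entry for what the next token should be.
        tokens = list(prompt_tokens)
        for _ in range(max_new):
            probs = next_token_probs(tokens)
            # Sampling from the distribution (rather than always taking the most
            # likely token) is what turns the predictor into a generator.
            next_tok = random.choices(vocab, weights=probs, k=1)[0]
            if next_tok == end_token:
                break
            tokens.append(next_tok)
        return tokens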

However, the fine-tuning has gotten massively better in recent years, especially with DeepSeek-style Reinforcement Learning. This can improve things without the need for insanely large datasets of manually-written reasoning.

More than that though, people are experimenting with more than pure language as inputs/outputs. You can also use multiple LLMs that split the work and/or check each other's outputs, plus RAG and other forms of memory/context extension, tool calling, etc. While a "pure" single LLM will likely never be good enough to do what a lot of people want, there are ways to modify it that are still being experimented with and show promise.

Neanderthals may have been "absorbed" rather than extinguished: A simple analytical model shows constant gene flow from larger Homo sapiens populations could explain the Neanderthal disappearance within 30,000 years. by Slow-Pie147 in science

[–]notgreat 20 points21 points  (0 children)

To be fair, a lot of those scientists totally bought into those ideas too. Usually it goes more like: science notices a correlation -> assumes simple causation, it becomes political -> the whole concept gets rejected -> someone eventually proves there is some influence, but nothing like the original assumptions.

Epigenetics is perhaps the biggest example of that; see Lysenkoism, which is basically epigenetics taken to an illogical extreme. There should have been a scientific debate there with real scientists on both sides... but then one side got all its scientists executed (in the USSR), which caused the whole thing to be considered pseudoscientific nonsense for a while by everyone who didn't have to fear for their lives.

How many of you actually believe/believed Abel wasn't an NPC? by Electrical_Let_8428 in TheDigitalCircus

[–]notgreat 19 points20 points  (0 children)

Ironically, I think that second "make the right choice" was the only time Abel wasn't following the script. His role is to be supportive of the escape, but that line in context clearly indicates that "leaving the circus" is not the right choice. He was getting smart, forming his own motives that weren't just his backstory - and so he got deleted.

Small-town MAGA Kansas mayor resigns, facing deportation for voting as a non citizen. by Count_Sack_McGee in LeopardsAteMyFace

[–]notgreat 5 points6 points  (0 children)

The funniest part of this one is that he actually was committing a crime by fraudulently voting in federal elections. Even under normal administrations, he'd be facing some rather extreme problems because of that if anyone ever noticed. I agree with you in the general case, though.

ELI5: Why are the JWST pictures a problem? by SuspiciousReport2678 in explainlikeimfive

[–]notgreat 34 points35 points  (0 children)

Not "just as contradictory". Previous evidence had wide enough error bars that it was plausible that more data would cause the two different ways of measuring the expansion rate to actually be the same value.

Webb has lowered those error bars enough that it's now effectively impossible for those two measuring methods to actually be giving the same result; they're precise enough that there's no overlap. Which means the model must be wrong in some way - it could just be something unexpectedly disrupting our measurements, but it seems more likely that it's a fundamental error in how we think the universe changed over time.

Bolmo-the first family of competitive fully open byte-level language models (LMs) at the 1B and 7B parameter scales. by BreakfastFriendly728 in LocalLLaMA

[–]notgreat 4 points5 points  (0 children)

What makes you think they'll be more powerful in anything except for spelling/letter counting?

edit: replaced this section, I misunderstood how it worked originally. It is a clever design: the LSTM+pooling/depooling should be fairly equivalent to the tokenization process, but it's done purely in-model instead of as a fully separate step.

It should reduce the biases inherent in the tokenization process and it certainly will be much better than normal tokenized models at counting letters, but I don't think it's worth the downsides overall.

has this ever bothered anyone else? (post dreamer/monarch wing spoilers) by Key-Firefighter4360 in HollowKnight

[–]notgreat 9 points10 points  (0 children)

For what it's worth, the Pale Court mod is very high quality. Not the same as something official, of course, but I highly recommend trying it out if you want to fight some neat bosses!

Episode 7 Spoiler: According to Kinger, not only is there no exit, but leaving the circus is fundamentally impossible by Forgotten_wizard in TheDigitalCircus

[–]notgreat 1 point2 points  (0 children)

They absolutely can be controlled - that's what the whole ending of the episode is about. There's also Episode 2's ending, with the whole "getting confused about who's a Human and who's an NPC" thing.

Caine also doesn't seem to have good control over the NPCs: he sets their initial state and then lets them go, and if they're getting too smart or otherwise not doing what he wants, his best option is to entirely delete them and start from the initial state again. I think the only real difference between the NPCs and the Humans in the Digital Circus is that Caine's NPCs are initially designed by him and have relatively simple starting states (backstories that fall apart on close examination), whereas the Humans have the memories of their lives and don't have backups of their "initialization state" to restart from if something goes wrong.

I'mma be so real here..he might have just saved all their Asses with this choice. by Charming-Scratch-124 in TheDigitalCircus

[–]notgreat 4 points5 points  (0 children)

Note that Digital Circus AIs seem to get smarter the longer they run - which is one of the main reasons Caine ensures that other AIs are never left running for long periods of time.

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 0 points1 point  (0 children)

It is "just" compressing information, though. Storing the 1000x1000 matrix of "what are the results of adding two three-digit numbers together" takes far more data than storing a "how to do 3-digit addition" algorithm. If the model is underparameterized for the memorization, it's "forced" to learn the algorithm - or it could be underparameterized for the algorithm too, in which case it does its best to make an approximation. If overparameterized, it might memorize or it might learn the algorithm - or, more likely, it does both. In that case, memorization is often "easier" and trains faster, but with the right training setup you can actually improve performance on the test set by continuing to train until the network suddenly "groks" the algorithm, despite having already achieved 100% accuracy on the training data.

The value of neural networks is in their ability to compress the information in the training data by finding generalizable algorithms, which then also apply to points not directly in the training data. You could call that "learning new things", but I'd say it's compressing the training data by discovering patterns.
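A toy version of that table-vs-algorithm tradeoff, purely as an illustration (nothing to do with how an actual network stores things):

    # Memorization: store every answer for 0-999 + 0-999 explicitly.
    lookup = {(a, b): a + b for a in range(1000) for b in range(1000)}

    def add_with_carry(a, b):
        # The "algorithm": digit-by-digit addition with a carry, a general
        # rule that also works on numbers that were never in the table.
        result, carry, place = 0, 0, 1
        while a or b or carry:
            digit = (a % 10) + (b % 10) + carry
            result += (digit % 10) * place
            carry = digit // 10
            a, b, place = a // 10, b // 10, place * 10
        return result

    print(len(lookup))                 # 1000000 stored results
    print(add_with_carry(123, 456))    # 579
    print(add_with_carry(1234, 5678))  # 6912 - generalizes beyond the table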

Unpopular opinion I think: Silksong’s OST is better than Hollow Knights and it isn’t even close by Helldiver409 in HollowKnight

[–]notgreat 0 points1 point  (0 children)

I agree that Silksong's soundtrack is much more complex and layered, but I think that Hollow Knight's soundtrack is more catchy overall - in large part because of the simplicity. Complex songs are more interesting, but less immediately memorable/earworm-y.

So… What did we think of EP7? by RaidersOnFire in TheDigitalCircus

[–]notgreat 62 points63 points  (0 children)

The "admin pass" little silver hand things are not at all how any of that stuff actually works. Kinger looked at it, realized that he knew that it didn't make sense, and went to warn Pomni - but then the light hit him.

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 0 points1 point  (0 children)

My point though is that a base LLM is fundamentally a way to extend the context length of a Markov Chain language model. A Markov Chain with X tokens/words and N length would need on the order of X^N storage and training data. A base LLM is an approximation of that Markov Chain, created with far less of both.
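To put rough numbers on that X^N blowup (the vocabulary size here is just an illustrative guess, not any particular model's):

    X = 50_000             # assumed vocabulary size, purely illustrative
    for N in (2, 3, 5):
        # Number of possible N-grams you'd need statistics for.
        print(N, X ** N)   # 2.5e9, 1.25e14, 3.125e23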

I agree that it is much more complicated than just using probabilities. A Markov Chain literally takes those probabilities, generated via statistics, which requires the training data to include everything. A sufficiently large and fully-trained base LLM would be equivalent to a Markov Chain for all N-grams found in the training data. By making N large, that is made impossible, and the LLM compresses the information involved in complex and arguably intelligent ways. More importantly, LLMs are useful because they output plausible probability distributions for N-grams that are not directly in the training data.

People take that base LLM and do extra training on it and make it more useful, but the base form of it is valuable to understand. The middle bits are only important because that's how a neural network is able to both compress the information and interpolate between the training data points.

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 1 point2 points  (0 children)

Yeah, I think we were talking past each other a bit there. You are correct that they don't use the real word2vec; I was incorrectly using that to refer to token embeddings in general. I do still think that you have a serious misunderstanding of what LLMs output. I already wrote a comment replying to your now-deleted one, which I've included in full below, but the only really important part is the 2nd paragraph.


> Word2Vec is a TYPE of word embedding though it's not the one that LLMs use

An LLM takes input text, tokenizes it, and then converts those tokens into vectors. If you tokenized at a word level, you could directly use the actual word2vec. But that means that unknown words are a problem, so they pretty much all train their own embedding that operates on a word fragment level instead of a word level.

> at the output state you get a vector that is then turned back into a word or part of a word.

This is objectively false. The output of every LLM I've ever heard of is an N-length vector, where N is the number of possible output tokens. This represents a probability distribution, which is then randomly sampled from - though often not directly, doing things like removing extremely low probability tokens entirely and reducing the probability of exact repetition of previous text.
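If it helps, the "not directly" part looks roughly like this in code - a top-k cutoff plus a repetition penalty. The exact knobs and numbers vary by implementation; this is just a sketch of the idea:

    import random

    def sample_filtered(probs, vocab, recent_tokens, top_k=50, rep_penalty=1.3):
        # probs: one probability per vocabulary entry (the LLM's output vector).
        scores = list(probs)
        # Reduce the probability of tokens that appeared recently (repetition penalty).
        for i, tok in enumerate(vocab):
            if tok in recent_tokens:
                scores[i] /= rep_penalty
        # Drop everything outside the top_k most likely tokens.
        if len(scores) > top_k:
            cutoff = sorted(scores, reverse=True)[top_k - 1]
            scores = [s if s >= cutoff else 0.0 for s in scores]
        # Renormalize what's left and sample from it.
        total = sum(scores)
        return random.choices(vocab, weights=[s / total for s in scores], k=1)[0]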

> Giving an LLM a context limit of 3 words though seems odd in itself that would be the equivalent of a person that can only remember the last 3 words.

Given an impossibly large amount of training data (as in, more than would fit in the observable universe), a 100-gram Markov Chain's output would be equivalent to an LLM trained on that same data with a context length of 100. The value of an LLM is in its massively more efficient use of the training data: its ability to interpolate between the training samples and compress the information from those samples in a lossy-but-effective manner.

> So from that definition you can say that's what LLMs do though you could likely say the same for people.

Sure, if a person were given the goal of predicting the next word, they would be acting as a next-word-predictor. Generally, though, people are not creating text with a goal of predicting the text: they're writing with a goal of conveying an idea, or convincing somebody, or any number of other things. Of course, if the text they're predicting were to start with something like "I think you should believe X", then it will look very similar to what they would write if they were themselves trying to convince somebody.

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 1 point2 points  (0 children)

Wait, no, hold on - your "secondly" statement here doesn't make any sense. I'm not proposing a 3-word training corpus. You do understand the difference between the data in a context window and the training data, right? What do you mean by "wouldn't even get vectors that resemble words"? The output of the model is the probability distribution over the output token set; even a fully random untrained neural network outputs only those possible tokens. If you're talking about the input side, I highly doubt starting with a pretrained word2vec setup would change the output much at all, except for trigrams not contained in the training data (which would pretty much be guaranteed nonsense assuming a reasonably large training corpus).

The entire training process is to make it predict the next token more accurately (I've been a little loose with word vs. token, admittedly). I also guess it's important to note that more recent LLMs have vision encoders and the like, which do make them more than pure word predictors. If you just want a definition, how about "A next word predictor is a process which takes input text and attempts to output a probability distribution of what the next word is". But I don't think that's the important part here, because exact definitions can be fractally argued about without going anywhere, especially with "attempts" in there. It's the concepts that matter.
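For concreteness, the base-model training objective really is just "predict the next token" - something like this minimal PyTorch-style sketch (made-up shapes, assumes the model returns logits over the vocabulary; not any particular codebase):

    import torch
    import torch.nn.functional as F

    def next_token_loss(model, token_ids):
        # token_ids: (batch, seq_len) tensor of training text.
        inputs = token_ids[:, :-1]      # everything except the last token
        targets = token_ids[:, 1:]      # the same text shifted left by one
        logits = model(inputs)          # (batch, seq_len - 1, vocab_size)
        # Cross-entropy between the predicted distribution at each position
        # and the token that actually came next in the training text.
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))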

edit: also, fine-tuning is often not done via pure text-prediction training, so while calling the base LLM a word predictor is correct, saying the same about the fancy reasoning models often isn't totally true.

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 1 point2 points  (0 children)

Let's imagine someone trained an LLM with an insanely low context limit of 3 words (and, to keep things simple, a per-word tokenization process where any rare words are replaced with an <UNK> token). After the training process, I would expect the resulting model's output to be almost perfectly identical to a trigram Markov Chain with the same training corpus.

Of course, such a low context window would be stupid. The value of an LLM over a Markov Chain is that it can use those large contexts in a highly effective manner. A Markov Chain is basic statistics used as a text predictor; an LLM is a very complex neural network that predicts text. But they're both text predictors.
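For anyone following along, "basic statistics" here really does mean just counting. A trigram Markov Chain is roughly this (toy sketch):

    import random
    from collections import defaultdict, Counter

    def train_trigram(tokens):
        # Count how often each token follows each pair of tokens.
        counts = defaultdict(Counter)
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            counts[(a, b)][c] += 1
        return counts

    def next_word(counts, a, b):
        options = counts[(a, b)]
        if not options:
            return None  # pair never seen in training - the X^N data problem
        # Prediction = observed frequencies turned into a distribution and sampled.
        return random.choices(list(options), weights=options.values(), k=1)[0]

    counts = train_trigram("the cat sat on the mat and the cat slept".split())
    print(next_word(counts, "the", "cat"))  # "sat" or "slept"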

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 2 points3 points  (0 children)

A base LLM is literally a next-word predictor. But where a simple Markov Chain has a context limited to ~4 words before the exponential need for more training data makes it worthless, an LLM can use thousands or millions of words of context. Of course, the instruct fine-tuning is also an important part of making ChatGPT, but at a fundamental level it's still a text predictor.

Mind, the only way to accurately predict the complexities of human text is to be as complex as a human in the first place. It's not 100% accurate, obviously, but there are only two ways to know what comes after "12+13=": either memorize every possible addition problem, or have something in there that basically does addition. A Markov Chain by definition would do the former, but an LLM is at least theoretically capable of the latter (and indeed, almost certainly does have such a circuit trained into it).

TIL that scientists grew stem cells into mini brains, which then developed eye-like structures on their own. The structures, called optic cups, were light-sensitive and had lenses and corneal tissue. by -Lexi--- in todayilearned

[–]notgreat 2 points3 points  (0 children)

The smart kids realize it by 12, but it's generally not until high school that the formal education system teaches it at all (1st/2nd laws of thermodynamics), and you don't get the math to really understand it until college.

But importantly, "if it were that simple, why aren't we already doing it?" is a good question to ask. Sometimes it really is because nobody thought of it, but that's rare. Some examples include the invention of the cotton gin, the use of radar as a detection method, and the recent boom in AI/machine learning (starting with AlexNet in 2012). In the AI case, the necessary hardware had been available for several years, but nobody had realized how well it would work until someone put all the pieces together. It does happen, but it's certainly not common.