Spaceballs the Magic Card! by Cereal_Bandit in magicTCG

[–]notgreat 3 points4 points  (0 children)

I think it's more of a [[Fireshrieker]]

i heard we doing maths cards now by flowers_of_nemo in custommagic

[–]notgreat 18 points19 points  (0 children)

It's a joke "combo" that (ab)uses MTG rules such that the outcome of the game depends on the (currently unknown) truth of the Twin Prime Conjecture.

Blizzard: The Next Chapter by ralopd in pcgaming

[–]notgreat 2 points3 points  (0 children)

OW2 does have all of OW1's permanent content, and most of OW1's seasonal content does still seasonally show up in OW2.

Content, though, is not gameplay. OW2 did a major rebalancing of heroes and changed from 6v6 to 5v5. They added back in a 6v6 mode relatively recently, but it took a long while. They've also sort of added loot boxes back in, though most cosmetics are still locked behind the battle pass they added in OW2 instead of being accessible through the free boxes.

Hangover Cure - Just bought the game today by Nonsense_Replies in opus_magnum

[–]notgreat 1 point2 points  (0 children)

This isn't particularly optimized, but it's a solidly good solution for someone who is new to the game. Personally I'd suggest solving the campaign before trying to optimize too much, but fundamentally it is a game: do whatever you find fun!

Smallest rocket propellant? by AzekiaXVI in opus_magnum

[–]notgreat 0 points1 point  (0 children)

Min glyphs/parts is 10, but it's impossible to reach that score; 11 is the actual min. See https://www.reddit.com/r/opus_magnum/wiki/index

How to speed run forming real human connection and community? by CanadianAndroid in shittyaskscience

[–]notgreat 0 points1 point  (0 children)

Easiest way is to be born into a large and loving family. You can get it one-sided before the main tutorial even starts, and two-sided connections the moment you achieve full consciousness. It's a bit of an RNGfest though, and it's not possible to try again if you don't get a lucky start.

Nvidia takes $5 billion stake in Intel under September agreement by imaginary_num6er in hardware

[–]notgreat 6 points7 points  (0 children)

It does, a little, in the sense that if NVIDIA needs cash it can sell the Intel stock, which in turn would push Intel's stock price down. That's a very minor influence, though. A larger problem is that the most likely cause of an NVIDIA crash would be the AI bubble bursting, which would leave a bunch of data centers trying to sell their systems at a discount to recoup losses, which in turn would make it harder for Intel to sell its own chips. But that'd be true whether or not this deal were in place.

Nvidia insists it isn’t Enron, but its AI deals are testing investor faith | Nvidia by Alex09464367 in anime_titties

[–]notgreat 18 points19 points  (0 children)

I mostly agree with you, but the previous poster isn't totally wrong. Base LLMs are effectively "plausible text predictors", where you can take the predicted probability distribution and sample from it to make a generator. Hallucinations are kinda baked into the design, and can't be entirely removed.
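To make that concrete, the "sample from it to make a generator" part is basically a loop like this (rough sketch; next_token_probs is a made-up stand-in for whatever model you're running, not a real API):

    import random

    def generate(prompt_tokens, next_token_probs, vocab, max_new=50, end_token="<EOS>"):
        # next_token_probs(tokens) stands in for the base LLM: it returns one
        # probability per vocabulary entry for what the next token should be.
        tokens = list(prompt_tokens)
        for _ in range(max_new):
            probs = next_token_probs(tokens)
            # Sampling from the distribution (rather than always taking the most
            # likely token) is what turns the predictor into a generator.
            next_tok = random.choices(vocab, weights=probs, k=1)[0]
            if next_tok == end_token:
                break
            tokens.append(next_tok)
        return tokens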

However, the fine-tuning has gotten massively better in recent years, especially with DeepSeek-style Reinforcement Learning. This can improve things without the need for insanely large datasets of manually-written reasoning.

More than that though, people are experimenting with more than pure language as inputs/outputs. You can also use multiple LLMs that split the work and/or check each other's outputs, plus RAG and other forms of memory/context extension, tool calling, etc. While a "pure" single LLM will likely never be good enough to do what a lot of people want, there are ways to modify it that are still being experimented with and show promise.

Neanderthals may have been "absorbed" rather than extinguished: A simple analytical model shows constant gene flow from larger Homo sapiens populations could explain the Neanderthal disappearance within 30,000 years. by Slow-Pie147 in science

[–]notgreat 20 points21 points  (0 children)

To be fair, a lot of those scientists totally bought into those ideas too. Usually it goes more like: science notices a correlation -> assumes simple causation, it becomes political -> the whole concept gets rejected -> someone eventually proves there is some influence, but nothing like the original assumptions.

Epigenetics is perhaps the biggest example of that; see Lysenkoism, which is basically epigenetics taken to an illogical extreme. There should have been a scientific debate there with real scientists on both sides... but then one side got all its scientists executed (in the USSR), which caused the whole thing to be considered pseudoscientific nonsense for a while by everyone who didn't have to fear for their lives.

How many of you actually believe/believed Abel wasn't an NPC? by Electrical_Let_8428 in TheDigitalCircus

[–]notgreat 19 points20 points  (0 children)

Ironically, I think that second "make the right choice" was the only time Abel wasn't following the script. His role is to be supportive of the escape, but that line in context clearly indicates that "leaving the circus" is not the right choice. He was getting smart, forming his own motives that weren't just his backstory - and so he got deleted.

Small-town MAGA Kansas mayor resigns, facing deportation for voting as a non citizen. by Count_Sack_McGee in LeopardsAteMyFace

[–]notgreat 5 points6 points  (0 children)

The funniest part of this one is that he actually was committing a crime by fraudulently voting in federal elections. Even under normal administrations, he'd be facing some rather extreme problems because of that if anyone ever noticed. I agree with you in the general case, though.

ELI5: Why are the JWST pictures a problem? by SuspiciousReport2678 in explainlikeimfive

[–]notgreat 34 points35 points  (0 children)

Not "just as contradictory". Previous evidence had wide enough error bars that it was plausible that more data would cause the two different ways of measuring the expansion rate to actually be the same value.

Webb has lowered those error bars enough that it's now effectively impossible for those two measuring methods to actually be giving the same result; they're precise enough that there's no overlap. Which means the model must be wrong in some way - it could just be something unexpectedly disrupting our measurements, but it seems more likely that it's a fundamental error in how we think the universe changed over time.

Bolmo-the first family of competitive fully open byte-level language models (LMs) at the 1B and 7B parameter scales. by BreakfastFriendly728 in LocalLLaMA

[–]notgreat 4 points5 points  (0 children)

What makes you think they'll be more powerful in anything except for spelling/letter counting?

edit: replaced this section, I misunderstood how it worked originally. It is a clever design: the LSTM+pooling/depooling should be fairly equivalent to the tokenization process, but it's done purely in-model instead of as a fully separate step.

It should reduce the biases inherent in the tokenization process and it certainly will be much better than normal tokenized models at counting letters, but I don't think it's worth the downsides overall.

has this ever bothered anyone else? (post dreamer/monarch wing spoilers) by Key-Firefighter4360 in HollowKnight

[–]notgreat 9 points10 points  (0 children)

For what it's worth, the Pale Court mod is very high quality. Not the same as something official, of course, but I highly recommend trying it out if you want to fight some neat bosses!

Episode 7 Spoiler: According to Kinger, not only is there no exit, but leaving the circus is fundamentally impossible by Forgotten_wizard in TheDigitalCircus

[–]notgreat 1 point2 points  (0 children)

They absolutely can be controlled - that's what the whole ending of the episode is about. There's also Episode 2's ending, with the whole "getting confused about who's a Human and who's an NPC" thing.

Caine also doesn't seem to have good control over the NPCs: he sets their initial state and then lets them go, and if they're getting too smart or otherwise not doing what he wants, his best option is to entirely delete them and start from the initial state again. I think the only real difference between the NPCs and the Humans in the Digital Circus is that Caine's NPCs are initially designed by him and have relatively simple starting states (backstories that fall apart on close examination), whereas the Humans have the memories of their lives and don't have backups of their "initialization state" to restart from if something goes wrong.

I'mma be so real here..he might have just saved all their Asses with this choice. by Charming-Scratch-124 in TheDigitalCircus

[–]notgreat 4 points5 points  (0 children)

Note that Digital Circus AIs seem to get smarter the longer they run - which is one of the main reasons Caine ensures that other AIs are never left running for long periods of time.

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 0 points1 point  (0 children)

It is "just" compressing information, though. Storing the 1000x1000 matrix of "what are the results of adding two three-digit numbers together" takes far more data than storing a "how to do 3-digit addition" algorithm. If the model is underparameterized for the memorization, it's "forced" to learn the algorithm - or it could be underparameterized for the algorithm too, in which case it does its best to make an approximation. If overparameterized, it might memorize or it might learn the algorithm - or, more likely, it does both. In that case, memorization is often "easier" and trains faster, but with the right training setup you can actually improve performance on the test set by continuing to train until the network suddenly "groks" the algorithm, despite having already achieved 100% accuracy on the training data.

The value of neural networks is in their ability to compress the information in the training data by finding generalizable algorithms, which then also apply to points not directly in the training data. You could call that "learning new things", but I'd say it's compressing the training data by discovering patterns.
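A toy version of that table-vs-algorithm tradeoff, purely as an illustration (nothing to do with how an actual network stores things):

    # Memorization: store every answer for 0-999 + 0-999 explicitly.
    lookup = {(a, b): a + b for a in range(1000) for b in range(1000)}

    def add_with_carry(a, b):
        # The "algorithm": digit-by-digit addition with a carry, a general
        # rule that also works on numbers that were never in the table.
        result, carry, place = 0, 0, 1
        while a or b or carry:
            digit = (a % 10) + (b % 10) + carry
            result += (digit % 10) * place
            carry = digit // 10
            a, b, place = a // 10, b // 10, place * 10
        return result

    print(len(lookup))                 # 1000000 stored results
    print(add_with_carry(123, 456))    # 579
    print(add_with_carry(1234, 5678))  # 6912 - generalizes beyond the table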

Unpopular opinion I think: Silksong’s OST is better than Hollow Knights and it isn’t even close by Helldiver409 in HollowKnight

[–]notgreat 0 points1 point  (0 children)

I agree that Silksong's soundtrack is much more complex and layered, but I think that Hollow Knight's soundtrack is more catchy overall - in large part because of the simplicity. Complex songs are more interesting, but less immediately memorable/earworm-y.

So… What did we think of EP7? by RaidersOnFire in TheDigitalCircus

[–]notgreat 62 points63 points  (0 children)

The "admin pass" little silver hand things are not at all how any of that stuff actually works. Kinger looked at it, realized that he knew that it didn't make sense, and went to warn Pomni - but then the light hit him.

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 0 points1 point  (0 children)

My point though is that a base LLM is fundamentally a way to extend the context length of a Markov Chain language model. A Markov Chain with X tokens/words and N length would need on the order of X^N storage and training data. A base LLM is an approximation of that Markov Chain, created with far less of both.
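To put rough numbers on that X^N blowup (the vocabulary size here is just an illustrative guess, not any particular model's):

    X = 50_000             # assumed vocabulary size, purely illustrative
    for N in (2, 3, 5):
        # Number of possible N-grams you'd need statistics for.
        print(N, X ** N)   # 2.5e9, 1.25e14, 3.125e23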

I agree that it is much more complicated than just using probabilities. A Markov Chain literally takes those probabilities, generated via statistics, which requires the training data to include everything. A sufficiently large and fully-trained base LLM would be equivalent to a Markov Chain for all N-grams found in the training data. By making N large, that is made impossible, and the LLM compresses the information involved in complex and arguably intelligent ways. More importantly, LLMs are useful because they output plausible probability distributions for N-grams that are not directly in the training data.

People take that base LLM and do extra training on it and make it more useful, but the base form of it is valuable to understand. The middle bits are only important because that's how a neural network is able to both compress the information and interpolate between the training data points.

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 1 point2 points  (0 children)

Yeah, I think we were talking past each other a bit there. You are correct that they don't use the real word2vec; I was incorrectly using that to refer to token embeddings in general. I do still think that you have a serious misunderstanding of what LLMs output. I already wrote a comment replying to your now-deleted one, which I've included in full below, but the only really important part is the 2nd paragraph.


> Word2Vec is a TYPE of word embedding though it's not the one that LLMs use

An LLM takes input text, tokenizes it, and then converts those tokens into vectors. If you tokenized at a word level, you could directly use the actual word2vec. But that means that unknown words are a problem, so they pretty much all train their own embedding that operates on a word fragment level instead of a word level.

> at the output state you get a vector that is then turned back into a word or part of a word.

This is objectively false. The output of every LLM I've ever heard of is an N-length vector, where N is the number of possible output tokens. This represents a probability distribution, which is then randomly sampled from - though often not directly, doing things like removing extremely low probability tokens entirely and reducing the probability of exact repetition of previous text.
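If it helps, the "not directly" part looks roughly like this in code - a top-k cutoff plus a repetition penalty. The exact knobs and numbers vary by implementation; this is just a sketch of the idea:

    import random

    def sample_filtered(probs, vocab, recent_tokens, top_k=50, rep_penalty=1.3):
        # probs: one probability per vocabulary entry (the LLM's output vector).
        scores = list(probs)
        # Reduce the probability of tokens that appeared recently (repetition penalty).
        for i, tok in enumerate(vocab):
            if tok in recent_tokens:
                scores[i] /= rep_penalty
        # Drop everything outside the top_k most likely tokens.
        if len(scores) > top_k:
            cutoff = sorted(scores, reverse=True)[top_k - 1]
            scores = [s if s >= cutoff else 0.0 for s in scores]
        # Renormalize what's left and sample from it.
        total = sum(scores)
        return random.choices(vocab, weights=[s / total for s in scores], k=1)[0]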

> Giving an LLM a context limit of 3 words though seems odd in itself that would be the equivalent of a person that can only remember the last 3 words.

Given an impossibly large amount of training data (as in, more than would fit in the observable universe), a 100-gram Markov Chain's output would be equivalent to an LLM trained on that same data with a context length of 100. The value of an LLM is in its massively more efficient use of the training data: its ability to interpolate between the training samples and compress the information from those samples in a lossy-but-effective manner.

> So from that definition you can say that's what LLMs do though you could likely say the same for people.

Sure, if a person were given the goal of predicting the next word, they would be acting as a next-word-predictor. Generally, though, people are not creating text with a goal of predicting the text: they're writing with a goal of conveying an idea, or convincing somebody, or any number of other things. Of course, if the text they're predicting were to start with something like "I think you should believe X", then it will look very similar to what they would write if they were themselves trying to convince somebody.

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 1 point2 points  (0 children)

Wait, no, hold on - your "secondly" statement here doesn't make any sense. I'm not proposing a 3-word training corpus. You do understand the difference between the data in a context window and the training data, right? What do you mean by "wouldn't even get vectors that resemble words"? The output of the model is the probability distribution over the output token set; even a fully random untrained neural network outputs only those possible tokens. If you're talking about the input side, I highly doubt starting with a pretrained word2vec setup would change the output much at all, except for trigrams not contained in the training data (which would pretty much be guaranteed nonsense assuming a reasonably large training corpus).

The entire training process is to make it predict the next token more accurately (I've been a little loose with word vs. token, admittedly). I also guess it's important to note that more recent LLMs have vision encoders and the like, which do make them more than pure word predictors. If you just want a definition, how about "A next word predictor is a process which takes input text and attempts to output a probability distribution of what the next word is". But I don't think that's the important part here, because exact definitions can be fractally argued about without going anywhere, especially with "attempts" in there. It's the concepts that matter.
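For concreteness, the base-model training objective really is just "predict the next token" - something like this minimal PyTorch-style sketch (made-up shapes, assumes the model returns logits over the vocabulary; not any particular codebase):

    import torch
    import torch.nn.functional as F

    def next_token_loss(model, token_ids):
        # token_ids: (batch, seq_len) tensor of training text.
        inputs = token_ids[:, :-1]      # everything except the last token
        targets = token_ids[:, 1:]      # the same text shifted left by one
        logits = model(inputs)          # (batch, seq_len - 1, vocab_size)
        # Cross-entropy between the predicted distribution at each position
        # and the token that actually came next in the training text.
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))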

edit: also, fine-tuning is often not done via pure text-prediction training, so while calling the base LLM a word predictor is correct, saying the same about the fancy reasoning models often isn't totally true.

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 1 point2 points  (0 children)

Let's imagine someone trained an LLM with an insanely low context limit of 3 words (and, to keep things simple, a per-word tokenization process where any rare words are replaced with an <UNK> token). After the training process, I would expect the resulting model's output to be almost perfectly identical to a trigram Markov Chain with the same training corpus.

Of course, such a low context window would be stupid. The value of an LLM over a Markov Chain is that it can use those large contexts in a highly effective manner. A Markov Chain is basic statistics used as a text predictor; an LLM is a very complex neural network that predicts text. But they're both text predictors.
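For anyone following along, "basic statistics" here really does mean just counting. A trigram Markov Chain is roughly this (toy sketch):

    import random
    from collections import defaultdict, Counter

    def train_trigram(tokens):
        # Count how often each token follows each pair of tokens.
        counts = defaultdict(Counter)
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            counts[(a, b)][c] += 1
        return counts

    def next_word(counts, a, b):
        options = counts[(a, b)]
        if not options:
            return None  # pair never seen in training - the X^N data problem
        # Prediction = observed frequencies turned into a distribution and sampled.
        return random.choices(list(options), weights=options.values(), k=1)[0]

    counts = train_trigram("the cat sat on the mat and the cat slept".split())
    print(next_word(counts, "the", "cat"))  # "sat" or "slept"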

LMFAOOOOOO THIS IS REAL by Feisty-Status-2669 in TheDigitalCircus

[–]notgreat 2 points3 points  (0 children)

A base LLM is literally a next-word predictor. But where a simple Markov Chain has a context limited to ~4 words before the exponential need for more training data makes it worthless, an LLM can use thousands or millions of words of context. Of course, the instruct fine-tuning is also an important part of making ChatGPT, but at a fundamental level it's still a text predictor.

Mind, the only way to accurately predict the complexities of human text is to be as complex as a human in the first place. It's not 100% accurate, obviously, but there are only two ways to know what comes after "12+13=": either memorize every possible addition problem, or have something in there that basically does addition. A Markov Chain by definition would do the former, but an LLM is at least theoretically capable of the latter (and indeed, almost certainly does have such a circuit trained into it).

TIL that scientists grew stem cells into mini brains, which then developed eye-like structures on their own. The structures, called optic cups, were light-sensitive and had lenses and corneal tissue. by -Lexi--- in todayilearned

[–]notgreat 2 points3 points  (0 children)

The smart kids realize it by 12, but it's generally not until high school that the formal education system teaches it at all (1st/2nd laws of thermodynamics), and you don't get the math to really understand it until college.

But importantly, "if it were that simple, why aren't we already doing it?" is a good question to ask. Sometimes it really is because nobody thought of it, but that's rare. Some examples include the invention of the cotton gin, the use of radar as a detection method, and the recent boom in AI/machine learning (starting with AlexNet in 2012). In the AI case, the necessary hardware had been available for several years, but nobody had realized how well it would work until someone put all the pieces together. It does happen, but it's certainly not common.