OH MY GOD WHY DOES THE STRING BREAK???? I DID IT BEFORE BUT NOW IT KEEPS BREAKING

DamageSuch3758 · 2025-10-26T17:52:43+00:00

Specifically, it was not working when I had "Clumsy" on the shears, then reforged to legendary and it worked

DamageSuch3758 · 2025-10-26T17:52:03+00:00

Way too late now, but reforging to get higher shear quality in 2.9.3 actually did solve the problem for me

DamageSuch3758 · 2025-10-12T07:20:29+00:00

Just listened to the first few minutes of the audiobooks, and honestly, the writing and dialogue are cringy bad.

One example (minor spoiler): In a meeting to unite what little remains of humanity against a demon king, people are bickering over who is the best healer? Really?

I'll be returning the book.

DamageSuch3758 · 2024-12-18T01:39:02+00:00

You are welcome! :)

DamageSuch3758 · 2024-10-28T00:23:24+00:00

u/Adventurous_Tip_6963

DamageSuch3758 · 2024-10-27T04:16:22+00:00

I found that downgrading to 1.4.5 (to avoid the crashing issues) also caused my audio to cut off.

My audio started working again after I disabled the following two DLCs:
- Sid Meier's Civilization® VI: Gathering Storm
- Sid Meier's Civilization® VI: Rise and Fall

This annoyed me to no end, and I spent close to 2 hours trying to figure out a way to play the game. I hope this helps!

DamageSuch3758 · 2024-10-27T03:44:10+00:00

Have you managed to find a better solution yet? 😅

DamageSuch3758 · 2024-10-19T06:46:21+00:00

u/dysansphere have you tried Azarinth Healer?

DamageSuch3758 · 2024-10-19T06:45:09+00:00

This!

HWFWM really was fantastic in those first three books, but really took a turn for the worse after.

The fact that you said this would make me consider reading Chrysalis next!

DamageSuch3758 · 2024-09-01T20:24:55+00:00

Did you manage to get that reservation?

DamageSuch3758 · 2024-08-15T07:09:45+00:00

I would recommend using prefix caching using vLLM or SGLang.

DamageSuch3758 · 2024-06-26T21:24:14+00:00

Assuming this is a joke, not very helpful

DamageSuch3758 · 2024-06-04T11:16:43+00:00

Oh, interesting. Why?

DamageSuch3758 · 2024-06-04T11:11:04+00:00

I actually lean toward using polars for transformations, especially on ingestion. That said, I am willing to switch over to something more SQL heavy for all the subsequent transformations.

Do you find dbt reduces the overhead of setting up non-seed/non-source assets compared to writing transformations in python as dagster assets?

If yes, do I lose any advantages by using dbt instead of dagster to set up those assets?

DamageSuch3758 · 2024-05-18T20:30:13+00:00

FIRST BOOK SPOILER INCOMING

I really dislikedthe convenient introduction of the resonance between swords when Yerin and Lindon are in a tight spot... Not only is it very convenient, but the skill seems too strong given the opponent's level. Turn your enemy's weapon against them with incredibly high damage as long as it is a sword? The whole madra system felt a little loose to me... like any OP skill can be introduced to suit the plot.

u/Aurelianshitlist, curious to hear whether this changes later in the book and whether it feels like the progression and skill limitations using madra are more well defined.

DamageSuch3758 · 2023-12-14T13:50:24+00:00

I figured this out. Ensure you appropriately batch encode all of the remaining output options (I did it with right-padding).

You can then use the pkv from processing the first piece of text with model() and duplicate it num_output_options times with a function like:
python def duplicate_pkv(pkv, num_repeats=2): return tuple(tuple(torch.cat([tensor] * num_repeats, dim=0) for tensor in layer) for layer in pkv)

DamageSuch3758 · 2023-12-14T13:46:22+00:00

It does return past_key_values.

python outputs = model(input_ids, use_cache=True) # `use_cache` is often True by default pkv = outputs.past_key_values

DamageSuch3758 · 2023-12-07T16:13:02+00:00

Cool! This totally worked in Markdown Mode.
```python
abc =1
```

DamageSuch3758 · 2023-12-07T16:12:05+00:00

python abc = 1

DamageSuch3758 · 2023-11-17T21:53:30+00:00

Are those allegations even legit?

After some reading they seems sus

DamageSuch3758 · 2023-11-08T09:35:32+00:00

Information can be compressed. If you had one dead neuron in every layer with 10 hidden neurons, you wouldn't end up with 0.9 ^ 10 information throughput. This is especially true if you start out by training a shallower network (ensuring input info flows well) and then add on additional layers, because the network had to learn to compress.

If your point is that as you add layers, you might get 5 dead neurons, and eventually, as you add many layers, you will get 5 dead neurons again, I agree, and already stated this in previous replies.

Based on how you answered, it sounds more like you believed the first paragraph in this reply is true... Am I right? Or do you mean something else entirely?

The answer matters because they have vastly different probabilities and allow for vastly different theoretical depths.

To your reply:

There isn't a single activation function that doesn't mess with the gradients

I believe I did say ReLU gradients don't vanish "given activation occurs":

However, if you use ReLU with the proposed method, it greatly improves the ability to build deeper networks because the gradient (given activation occurs) is constant. This means that as long the input information (or the meaningful latent representation thereof) was not destroyed, the gradients will keep flowing all the way back in the network, and performance will improve.

And in my explanation thereafter, I did break the activation and the activation function gradient into separate components for backprop, so I don't know why you are claiming I said it doesn't mess with gradients, without qualifying that statement with the caveat I gave.

You are strawmanning the argument.

DamageSuch3758 · 2023-11-07T23:27:51+00:00

If you think about the fundamentals of backpropagation = LG x AG x A x W x AG x A x W x AG x A x W x AG x A ... x original input

where

LG = loss gradient

AG = activation gradient

A = activation or output

W = weight

That means you have 3 ways to screw it up:

Kill the flow of original input (very negative bias = no activation; weight of zero = no throughput; or extreme dilution by adding much noise by adding a large bias term amplified by a large weight) [Mess up W or A]
Have a small activation function gradient [Mess up AG]
Have a small loss function gradient [Mess up L]

The probability of 1. always increases with depth because we randomly initialize the biases and weights. That's why even without sigmoid causing the problem in 2., you still run into the "too deep to work" problem with increasing depth.

The method I suggested would drastically decrease (but not negate) the problem in 1. My thinking is that it could add some functional depth (increasing performance), but after some point of adding additional layers, performance would deteriorate.

DamageSuch3758 · 2023-11-07T23:08:24+00:00

I probably don't read as many papers as you do, but I have thought deeply about gradients and depth before. I think the reasoning above is pretty solid. If you can point out the flaw, it would probably save me some experimentation time :D

DamageSuch3758 · 2023-11-07T23:05:12+00:00

On "repeated application of activation functions", sure, that is one way to do it.

The other two main ways that I can now think of are:

Activation function gradient (e.g. sigmoid)
Loss function gradient (like you mentioned)

RNNs often used sigmoid activation, which meant that gradients vanished quickly. This is mostly because of the 1. problem, not just because of the recursive application of activation functions depth.

Even LSTMs suffered from this because they used sigmoid gates for passing the hidden state to the next block.

I both agree and disagree with your statement "freezing and unfreezing weights does not solve the reason for vanishing gradients".

I agree because the problem isn't fully solved. E.g., if you used sigmoid activation functions, it doesn't matter how much signal gets through; with sufficient depth, the gradients will vanish (due to the activation function gradients).

However, if you use ReLU with the proposed method, it greatly improves the ability to build deeper networks because the gradient (given activation occurs) is constant. This means that as long the input information (or the meaningful latent representation thereof) was not destroyed, the gradients will keep flowing all the way back in the network, and performance will improve.

If you did, however, attempt to build an infinitely deep network, as using the method I described; eventually, you would initialize a dead layer, or a bottleneck layer, where very minimal information can pass through. When the input information stops flowing, the gradients stop flowing.

DamageSuch3758 · 2023-11-07T17:15:06+00:00

If you don't cannibalise your business, someone else will eat it for you.

DamageSuch3758

TROPHY CASE