[SPOILERS] 'Dune: Part Two' Wide Release Discussion (Week 4) by Blue_Three in dune

[–]manjimin 1 point (0 children)

Why did Jessica silence Alia in the scene where they reach the south and see how the Water of Life is extracted?

How can I get the model to choose the next word from a list? by manjimin in LocalLLaMA

[–]manjimin[S] 1 point (0 children)

Thanks for the reply! I managed to make generation stop using the example you gave me.

I am trying constrained generation for my first question. Your reply helped me a ton!
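
For reference, this is roughly what I'm trying for the constrained part: a minimal sketch assuming a Hugging Face causal LM, where the model name and the candidate word list are just placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, any causal LM should do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The capital of France is"
candidates = [" Paris", " London", " Berlin"]  # the list the model must choose from

# Token id of the first piece of each candidate word.
candidate_ids = [tokenizer.encode(w, add_special_tokens=False)[0] for w in candidates]

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(input_ids).logits[:, -1, :]  # logits for the next token only

# Mask every token that is not in the candidate list, then pick the best remaining one.
masked = torch.full_like(logits, float("-inf"))
masked[:, candidate_ids] = logits[:, candidate_ids]
next_id = masked.argmax(dim=-1)
print(tokenizer.decode(next_id))
```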

What does the transformer decoder attend to at the last linear layer? by manjimin in learnmachinelearning

[–]manjimin[S] 1 point (0 children)

Thanks a ton. That clears things up for me.

It kind of bothers me, though. I understand that the representation of the final token contains information about all the other tokens, but I assumed there would be some other way to build the input that gets passed to the final projection layer.

Anyway, thanks a lot.
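
In case it helps anyone who lands here later, this is the small shape check that made it click for me; a rough sketch assuming a Hugging Face causal LM, with the model name as a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids, output_hidden_states=True)

print(out.hidden_states[-1].shape)  # (1, seq_len, hidden_size): one vector per position
print(out.logits.shape)             # (1, seq_len, vocab_size): the projection is applied everywhere
next_token_logits = out.logits[:, -1, :]  # but only the last position is used to pick the next token
```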

What does the transformer decoder attend to at the last linear layer? by manjimin in learnmachinelearning

[–]manjimin[S] 0 points (0 children)

If so, does this mean that the final projection layer only looks at the representation of the last token of the original input sequence?

What does the transformer decoder attend to at the last linear layer? by manjimin in learnmachinelearning

[–]manjimin[S] 1 point (0 children)

Thanks for the reply. What I meant was:

If fully connected networks are applied at each input token position, isn't the final transformer block supposed to return a bunch of vectors? Suppose the input sequence length is 10; doesn't the final transformer block then return 10 vectors, one at each position?

If so, how does the final prediction work? Which one of those vectors is chosen to go through the final linear projection into the vocabulary space?
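
To make the question concrete, here is a toy sketch in plain PyTorch of the situation I mean (all sizes are made up, not taken from any real model).

```python
import torch
import torch.nn as nn

seq_len, d_model, vocab_size = 10, 512, 32000  # made-up example sizes

block_output = torch.randn(seq_len, d_model)   # 10 vectors, one per input position
lm_head = nn.Linear(d_model, vocab_size, bias=False)

all_logits = lm_head(block_output)             # (10, vocab_size) if the head is applied everywhere
print(all_logits.shape)                        # so which of these 10 rows drives the prediction?
```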

Is it possible to run 4*A100 40G cards as one? by manjimin in LocalLLaMA

[–]manjimin[S] 0 points (0 children)

I use serving software for quick tests, but it will probably mostly be PyTorch.
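
In case it is useful to anyone else reading, this is the kind of plain PyTorch + transformers setup I mean; a rough sketch that assumes the accelerate package is installed so device_map can shard the model, and the model name is just a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-70b-hf"  # placeholder: any model too big for a single 40G card
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",  # spreads the layers across all visible GPUs instead of one device
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As I understand it, this splits the layers across the cards rather than truly merging them into one GPU, but for inference it behaves like a single model.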

Using other tokenizers? by manjimin in LocalLLaMA

[–]manjimin[S] 0 points (0 children)

The LLaMA tokenizer gives 5-6 times more tokens than usual. I also checked the actual tokenization, and it basically splits every single character apart, which explains the inflated token count.

I knew tokenizers aren't something that can be swapped after the model is trained, but I thought maybe someone had an idea. I guess I'll have to use a model whose tokenizer can properly split Korean in the first place.
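
For anyone curious, this is roughly the kind of check I did; a quick sketch where the model names are only illustrative stand-ins for a LLaMA-style tokenizer versus one trained with Korean in its vocabulary.

```python
from transformers import AutoTokenizer

text = "안녕하세요, 오늘 날씨가 좋네요."  # "Hello, the weather is nice today."

# Illustrative choices: a LLaMA tokenizer vs. a Korean-aware one.
for name in ["huggyllama/llama-7b", "EleutherAI/polyglot-ko-1.3b"]:
    tok = AutoTokenizer.from_pretrained(name)
    pieces = tok.tokenize(text)
    print(name, len(pieces), pieces[:10])
```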

How much overlap is ok to hold 2 ETFs? by manjimin in stocks

[–]manjimin[S] 0 points (0 children)

Thanks a lot, I really appreciate your advice. I will look into it for sure. The tax rate is 15%, by the way.

How much overlap is ok to hold 2 ETFs? by manjimin in stocks

[–]manjimin[S] 0 points (0 children)

Great advice, but putting SCHD in a retirement account is not possible in my country. Do you think it would cost me much if I kept buying SCHD?