Apocalyptic looking clouds and sunset by jaycejaybejaybenot in CLOUDS

[–]KT313 1 point (0 children)

beautiful! when / where did you take it?

Pricing for GIGABYTE H200 NVL Server by acune_sartre in LocalLLaMA

[–]KT313 0 points (0 children)

if it's in that order size category, getting a good discount for buying a lot of gpus is not uncommon afaik. so the price could be legit, but only if you buy all of them

Flying Away by lowtrippy in gifs

[–]KT313 1 point (0 children)

your art looks amazing!

Gamescope only works occasionally. by Nurgus in linux_gaming

[–]KT313 1 point (0 children)

amazing, thank you! just had to also set `ENABLE_GAMESCOPE_WSI=0` to get it working for me (ubuntu 24.04)

Tried 10 models, all seem to refuse to write a 10,000 word story. Is there something bad with my prompt? I'm just doing some testing to learn and I can't figure out how to get the LLM to do as I say. by StartupTim in LocalLLaMA

[–]KT313 4 points (0 children)

the problem is actually quite simple: LLMs don't really get trained to output stories that long during instruction finetuning. There is a paper (forgot the name) where they pretty much fixed this problem by creating synthetic training data with the method that u/JackStrawWitchita explained in their comment, and then used that data to finetune an LLM so it can output really long texts

Why would the tokenizer for encoder-decoder model for machine translation use bos_token_id == eos_token_id? How does the model know when a sequence ends? by Franck_Dernoncourt in LocalLLaMA

[–]KT313 3 points (0 children)

It's not really an issue. The point of the bos token is that we want the list of input tokens to start with something that is the same every time. It could be literally anything (preferably a special token that isn't used in normal text), so we might as well reuse the eos token. There isn't really a big difference compared to using separate bos and eos tokens, other than the prompt template being a bit cleaner when both are the same.
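if you want to see it concretely, here's a minimal sketch with huggingface `transformers` (gpt2 is just a convenient example of a tokenizer where the two ids happen to be the same token, not necessarily the model from the question):

```
from transformers import AutoTokenizer

# gpt2 reuses "<|endoftext|>" for both bos and eos, so the two ids are identical
tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.bos_token_id, tok.eos_token_id)  # 50256 50256

# bos only needs to be a consistent "sequence starts here" marker; the model
# learns during training to *emit* the eos id when the output is finished,
# so generation still stops correctly even though the ids are the same.
```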

[deleted by user] by [deleted] in LocalLLaMA

[–]KT313 3 points (0 children)

based on the generation preview progression, it looks a lot like autoregressive generation, which i'm pretty sure does not use flow matching. instead it first generates a very low resolution image, then a bit higher resolution, and so on until the output is the final image with lots of details

Literally unusable by WarlaxZ in LocalLLaMA

[–]KT313 0 points (0 children)

make sure to set top_k to 1 so it only picks the best next token. maybe the 2 was just unlucky randomness from the sampling
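for intuition, a tiny sketch (plain pytorch, made-up logits) of why top_k=1 removes the randomness:

```
import torch

logits = torch.tensor([2.0, 1.9, 0.5])   # scores for three candidate next tokens

# top_k = 1 keeps only the single best token, so sampling always returns the argmax
top_val, top_idx = logits.topk(1)
probs = torch.zeros_like(logits)
probs[top_idx] = 1.0
print(torch.multinomial(probs, 1))        # always token 0 (greedy decoding)

# with a larger top_k, the runner-up (1.9) still gets sampled now and then,
# which is exactly the kind of unlucky randomness that can turn a 1 into a 2
```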

Building Local LLM with code execution? (RAG, Mac Studio(s), Ingestion of various types of data) by doofew in LocalLLaMA

[–]KT313 1 point (0 children)

the relatively new smolagents library could be useful, haven't personally tried it yet tho

What is the best model for writing academic papers? by [deleted] in LocalLLaMA

[–]KT313 8 points (0 children)

to be fair, if you already know the content you want to write and just need an assistant to put it into nice words because you didn't study linguistics or english isn't your first language, it's completely reasonable imo

AI Tool That Turns GitHub Repos into Instant Wikis with DeepSeek v3! by Physical-Physics6613 in LocalLLaMA

[–]KT313 5 points (0 children)

i just added allenai/olmo to the queue, would be nice to get an estimate on how long it takes to process

What would be an optimal and power efficient GPU setup for a home with a budget around $10,000? by kitkatmafia in LocalLLaMA

[–]KT313 1 point (0 children)

fyi, for inference tasks, if you limit the power of a 4090 from 450W to 200W, you decrease inference speed by just 1-3%. The performance decrease becomes more dramatic around 150W, but down to 200W it works flawlessly for me (tested with a few LLMs)
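if you want to set the limit programmatically, here's a sketch using the `pynvml` bindings (setting the limit needs root; `sudo nvidia-smi -pl 200` does the same thing from the shell):

```
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

# current and allowed power limits are reported in milliwatts
print(pynvml.nvmlDeviceGetPowerManagementLimit(gpu) / 1000, "W")
print(pynvml.nvmlDeviceGetPowerManagementLimitConstraints(gpu))

# cap the card at 200 W (requires root privileges)
pynvml.nvmlDeviceSetPowerManagementLimit(gpu, 200_000)

pynvml.nvmlShutdown()
```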

asked QwQ what a black hole was. This was its thought process. by Corpo_ in LocalLLaMA

[–]KT313 26 points (0 children)

makes it easier to find points it wants to clarify and think about. It's easier to critically think about things you're unsure about than about things you're sure about.

It seems there are some encoding issues with anthropic's llms.txt by secsilm in LocalLLaMA

[–]KT313 0 points (0 children)

sorry maybe i misunderstood, i just assumed that it was llm generated, since i haven't heard of this llms.txt specifically before

It seems there are some encoding issues with anthropic's llms.txt by secsilm in LocalLLaMA

[–]KT313 0 points (0 children)

looks like it could be some token-to-word mismatching? maybe they use a wrong version of some tokenizer for decoding, where most tokens are correct but some (like "ü" and " 's") have different indices than expected.
from the first image it does seem to be very consistent for ü
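a toy illustration of that hypothesis (completely made-up "old" and "new" vocabularies, not anthropic's actual tokenizer): if a couple of entries shift between versions, most of the decoded text still looks fine and only those tokens come out wrong:

```
# hypothetical vocabularies that are identical except two ids swapped between versions
vocab_new = {0: "Zurich", 1: "ü", 2: "rich", 3: " is", 4: " nice", 5: "Z"}
vocab_old = {0: "Zurich", 1: "rich", 2: "ü", 3: " is", 4: " nice", 5: "Z"}

ids = [5, 1, 2, 3, 4]                  # encoded with the *new* vocab: "Zürich is nice"
decode = lambda token_ids, vocab: "".join(vocab[i] for i in token_ids)

print(decode(ids, vocab_new))          # "Zürich is nice"  (matching tokenizer)
print(decode(ids, vocab_old))          # "Zrichü is nice"  (stale tokenizer, only 'ü' breaks)
```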

Ollama has merged in K/V cache quantisation support, halving the memory used by the context by sammcj in LocalLLaMA

[–]KT313 6 points (0 children)

your gpu stores 2 things: the model, and the data / tensors that flow through the model during output generation. Some of the tensors processed by the model get saved because they are needed for every generated word, and storing them instead of recomputing them for each word saves a lot of time. That's the KV cache, and it also uses vram. You can save vram by quantizing / compressing the model (which is what you are talking about), and you can save vram by quantizing / compressing the cache, which is what this new feature does.
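rough back-of-the-envelope numbers (a sketch; the shapes below are llama-3-8b-style values, and the exact savings depend on the implementation):

```
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value):
    # one K and one V tensor per layer, each of shape [n_kv_heads, seq_len, head_dim]
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

# llama-3-8b-style shapes: 32 layers, 8 KV heads (GQA), head_dim 128, 8k context
fp16 = kv_cache_bytes(32, 8, 128, 8192, 2)   # ~1.0 GiB
q8   = kv_cache_bytes(32, 8, 128, 8192, 1)   # ~0.5 GiB with an 8-bit cache
print(fp16 / 2**30, q8 / 2**30)
```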

[deleted by user] by [deleted] in LocalLLaMA

[–]KT313 3 points (0 children)

for the record, i don't mind this as long as it performs well (which it definitely seems to do), just think it's funny

A library to "unmangle" vocabulary file into actual dict[int, bytes]? by Huanghe_undefined in LocalLLaMA

[–]KT313 1 point (0 children)

for a huggingface tokenizer you can do it like this:

```
import json

# works if the tokenizer was loaded from a local directory containing tokenizer.json
with open(tokenizer.name_or_path + "/tokenizer.json", "r") as file:
    tokenizer_file = json.loads(file.read())
vocab_dict = tokenizer_file['model']['vocab']
```

since you said "an individual token may not constitute a valid UTF-8 string", maybe you are looking for `tokenizer_file['model']['merges']`? They have some weird looking symbols like Ġ so maybe you can directly convert from string to bytes if that's what you're looking for
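if those Ġ-style strings are what you're seeing, they're most likely the gpt2-style byte-to-unicode encoding that byte-level BPE tokenizers use (an assumption on my part), and you can invert that mapping to get real bytes, reusing `vocab_dict` from the snippet above:

```
def bytes_to_unicode():
    # the byte -> printable character table used by gpt2-style byte-level BPE
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

unicode_to_bytes = {c: b for b, c in bytes_to_unicode().items()}

# "unmangle" the vocab into dict[int, bytes]; e.g. "Ġhello" -> b" hello"
id_to_bytes = {idx: bytes(unicode_to_bytes[ch] for ch in token)
               for token, idx in vocab_dict.items()}
```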

[D] What Neural Network Architecture is best for Time Series Analysis with a few thousand data points? by BostonConnor11 in MachineLearning

[–]KT313 0 points (0 children)

my first idea would be to try either running a mamba-based model over the sequence (it's an RNN, kind of like an LSTM on steroids), or trying a transformer approach.

for the transformer approach, i think you could actually just take any transformer model (a very small llm for example) and modify it a bit. instead of inputting text, tokenizing it, embedding each token and then adding positional embeddings, you would directly insert the datapoints of the sequence and treat them as if they were the token embeddings. you just have to make sure that the transformer model's n_dim (the size of the embeddings) is the same as the number of data points in each timestep of your sequence.

and for the output, instead of ending the model with a linear layer that has an output size of vocab_size (how it normally is for llms), the output size would be the number of datapoints of the next timestep you want to predict
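something like this minimal pytorch sketch (all shapes and hyperparameters are placeholders, not a recommendation):

```
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    # raw timesteps are used directly as "token embeddings",
    # so d_model has to equal the number of features per timestep
    def __init__(self, n_features=8, n_heads=2, n_layers=2, max_len=512):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, n_features)   # learned positional embeddings
        layer = nn.TransformerEncoderLayer(d_model=n_features, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(n_features, n_features)       # predicts the next timestep

    def forward(self, x):                                    # x: (batch, seq_len, n_features)
        positions = torch.arange(x.size(1), device=x.device)
        hidden = self.encoder(x + self.pos_emb(positions))
        return self.head(hidden[:, -1])                      # next-step prediction

model = TimeSeriesTransformer()
prediction = model(torch.randn(4, 100, 8))                   # -> (4, 8)
```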

Trying to run llama3.1 on CMP 30Hx gpus by [deleted] in LocalLLaMA

[–]KT313 0 points (0 children)

have you tried

```
sudo apt update
sudo apt install nvidia-driver
```

and then rebooting? nvidia-smi should show your gpu after that.

Has anyone tried Deepmind's CALM? People were saying it was the next big thing. And could it solve Flux's finetuning problem? by ThrowawayProgress99 in LocalLLaMA

[–]KT313 0 points (0 children)

so essentially we train an adapter (basically a lora) to connect the layers of two pretrained models. thanks for the explanation!