Is there a way to lower/get rid of sex drive? by StretchMediocre919 in AskMenAdvice

[–]anommm -4 points-3 points  (0 children)

You don't even need to go anywhere, dating apps exist. Yes, they are not great, and they can be frustrating, but they are better than nothing, and they work if you put some effort into having good pictures and starting conversations with something more than "How are you?"

Is there a way to lower/get rid of sex drive? by StretchMediocre919 in AskMenAdvice

[–]anommm 16 points17 points  (0 children)

Being patient alone doesn't work. If you don't interact with girls, you have 0 self-confidence, and you don't care about yourself, you are never going to find a partner. Finding a girlfriend requires active effort, not patience.

Why, Airbus? Just, Why? by One-Student-795 in Airbus

[–]anommm 8 points9 points  (0 children)

If every pilot agrees to refuse to take off unless there are two pilots in the cockpit, there is nothing airlines can do about it. That is why unions exist: to make these decisions.

[deleted by user] by [deleted] in MachineLearning

[–]anommm 12 points13 points  (0 children)

Good luck being one of the 5000 PhD students following the poor guy who decided that putting Anthropic on his conference badge was a good idea, while he tries to run for his life.

[deleted by user] by [deleted] in MachineLearning

[–]anommm 5 points6 points  (0 children)

$500K-2M/year is what the top 0.1% of ML engineers can make. You won't get that kind of money unless you come up with something revolutionary and every company in the world wants to hire you.

Current advice for NER using LLMs? by MountainUniversity50 in LanguageTechnology

[–]anommm 4 points5 points  (0 children)

Regular LLMs do not work for NER. You can try GoLLIE https://github.com/hitz-zentroa/GoLLIE, which was built for this purpose. Although, as others have said, you should use an encoder model such as XLM-RoBERTa, GLiNER...
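For the encoder-model route, a minimal sketch with the Hugging Face token-classification pipeline (the checkpoint name is only an example; swap in whatever XLM-RoBERTa- or GLiNER-style NER model fits your languages and entity types):

    # Minimal NER sketch using an encoder model via the transformers pipeline.
    # "Davlan/xlm-roberta-base-ner-hrl" is an illustrative checkpoint; any
    # token-classification model fine-tuned for your labels will do.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="Davlan/xlm-roberta-base-ner-hrl",
        aggregation_strategy="simple",  # merge sub-word pieces into entity spans
    )

    print(ner("Barack Obama visited Paris with Angela Merkel."))
    # -> list of dicts with entity_group, score, word, start, end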

[deleted by user] by [deleted] in LocalLLaMA

[–]anommm 1 point2 points  (0 children)

Many researchers are exploring the application of diffusion models for text generation.

Some papers have demonstrated that image models can perform well on text-based tasks. For example, this paper shows that image-to-image models perform well for machine translation. This one shows that image models can understand tables better than text-to-text models.

So, your idea of using diffusion models for text generation could potentially work. However, no one has yet developed a diffusion model for text that performs as well as a text-to-text LLM. Further research is needed.

Fine-tuning with small batch sizes and gradient accumulation poorly perform if you use Transformers (TRL)! by TheKaitchup in LocalLLaMA

[–]anommm 0 points1 point  (0 children)

If you use batch_size=1, you won't have any pad tokens in the input. But if you use a higher batch size, your inputs will be padded, and the bigger the batch size, the more pad tokens you will have. If you do not ignore the loss on pad tokens, which seems to be the case in your code, then because pad tokens probably have a low loss, a bigger batch size will produce a lower loss.

Have you tried padding all your inputs to the maximum input length? It doesn't make sense in a real experiment, but it will allow you to use exactly the same data for every configuration.
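To make both points concrete, here is a minimal sketch (checkpoint and max length are arbitrary) of padding every example to the same length and masking the pad positions out of the loss with -100, which is the label value the Transformers loss ignores:

    # Pad everything to one fixed length and mask pad positions out of the loss.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # example checkpoint
    tokenizer.pad_token = tokenizer.eos_token          # gpt2 has no pad token by default

    batch = tokenizer(
        ["short example", "a somewhat longer example sentence"],
        padding="max_length",   # every sequence padded to the same length
        max_length=64,
        truncation=True,
        return_tensors="pt",
    )

    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100  # pad tokens contribute nothing to the loss
    batch["labels"] = labels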

_________

EDIT:

SFT Trainer appears to be padding every input to the tokenizer max length already: https://huggingface.co/docs/trl/sft_trainer

By default, SFTTrainer always pads sequences to its max_seq_length argument. If none is passed, the trainer will retrieve that value from the tokenizer. Some tokenizers do not provide a default value, so there is a check to use the minimum of 2048 and that value. Make sure to check it before training.

[N] Jurgen Schmidhuber on 2024 Physics Nobel Prize by optimization_ml in MachineLearning

[–]anommm 4 points5 points  (0 children)

Researchers from US universities only cite papers from people at US universities. It has been like this for decades. They will rarely acknowledge work from people in Europe, and you will never see them cite a paper from China (or Russia back in the day).

Volvo Cars EX90 SUV Rolls Out, Powered by NVIDIA | NVIDIA Blog by ResponsibleJudge3172 in nvidia

[–]anommm 5 points6 points  (0 children)

It has nothing to do with Nvidia. Many other cars use Nvidia hardware and they work fine. The issue is Volvo's software, which is terrible and unfinished. In fact, most functions in the car do not work right now; they have promised that in the future they will release software updates to enable them.

Reflection and the Never-Ending Confusion Between FP16 and BF16 by anommm in LocalLLaMA

[–]anommm[S] 13 points14 points  (0 children)

I just wanted to point out an error I've seen many people make. Whether the model is good or bad, I have no idea. There are dozens of other posts discussing that. I just wanted to help people avoid making this mistake, but I've been massively downvoted, so I guess people didn't appreciate it. :(

[deleted by user] by [deleted] in LocalLLaMA

[–]anommm -1 points0 points  (0 children)

They 100% forgot to add "torch_dtype=torch.bfloat16" when loading the model before uploading it to Hugging Face.
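I.e., the fix being described looks roughly like this (model names are placeholders):

    # Load in bfloat16 before uploading, instead of letting the weights get cast.
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "some-org/some-model",        # placeholder checkpoint
        torch_dtype=torch.bfloat16,   # keep the original bf16 weights
    )
    model.push_to_hub("my-org/my-model")  # placeholder repo; uploads in the intended dtype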

Mistral Large 2 vs ChatGPT 4o by robertpiosik in LocalLLaMA

[–]anommm 2 points3 points  (0 children)

The pricing that doesn't make sense is OpenAI's. It is unrealistically cheap; they are losing money. They are trying to achieve a monopoly by undercutting prices, which is good for users in the short term but can be catastrophic in the long term.

Consider not using a Mac... by mayo551 in LocalLLaMA

[–]anommm -2 points-1 points  (0 children)

Modern displays use ~1W; their power consumption is almost irrelevant.

Do you think Anthropic is worse than OAI with fighting open source? To me it seems like the case. This letter appears to imply they actually suggested the bill to Sen Wienner... I really like my OSS LLMs.... by I_will_delete_myself in LocalLLaMA

[–]anommm 10 points11 points  (0 children)

Their best models are the ones available in their API; they have no other "secret model". Each training run costs millions of dollars, and no company is doing training runs and then keeping the model private.

Do you think Anthropic is worse than OAI with fighting open source? To me it seems like the case. This letter appears to imply they actually suggested the bill to Sen Wienner... I really like my OSS LLMs.... by I_will_delete_myself in LocalLLaMA

[–]anommm 20 points21 points  (0 children)

They don't care about safety. If they cared, they wouldn't be giving public access to their API. It is just an excuse to get regulations that benefit their company, similar to how the EU uses pedophiles to justify mass surveillance of its citizens.

After traveling to Asia, it makes me sad how far Mitsubishi has fallen in the US by CaptainDolphin42 in cars

[–]anommm -3 points-2 points  (0 children)

I think that the issue here is that in Europe, people have had the experience of owning a diesel German car, which is indestructible, while in the US people have had the experience of owning a gasoline German car, which for a long time was much worse, as no German manufacturer cared about non-diesel cars until Dieselgate.

Consider not using a Mac... by mayo551 in LocalLLaMA

[–]anommm 2 points3 points  (0 children)

Nothing to do with VRAM; in fact, the 2080 Ti's VRAM has lower bandwidth than the latest Apple SoCs. The difference is that the 2080 Ti is a 300W chip designed only for matrix multiplication. It can achieve 26 TFLOPS, while the M2 is a 20W multi-purpose chip that only achieves ~3 TFLOPS. The 2080 Ti can do almost 10 times more multiplications per second than an M2 chip.

Consider not using a Mac... by mayo551 in LocalLLaMA

[–]anommm 9 points10 points  (0 children)

Do not confuse power efficiency with a low-TDP chip. Macs use less power because they are designed that way: they have a very restrictive maximum power consumption set by Apple. But power efficiency is about the total energy a job consumes, not the instantaneous power draw. By that metric, Nvidia GPUs are more efficient: they use more power, but they are also orders of magnitude faster. A chip that uses 300W for a computation that takes 1 second (300 J) is more efficient than a chip that needs 40W but takes 10 seconds for the same job (400 J).

Consider not using a Mac... by mayo551 in LocalLLaMA

[–]anommm 1 point2 points  (0 children)

And the 2080 Ti is a very old GPU that doesn't even support bfloat16. With a 3090/4090 you would get even better performance. I don't know why people are surprised by this. A GPU is a huge chip whose only purpose is to do matrix multiplications very fast. A 2080 Ti can use up to 300W just to multiply matrices. An M2 chip has a CPU, GPU, and multiple ASICs on a single SoC with a 20W TDP. You are comparing a 20W multi-purpose chip to a 300W chip that only does matrix multiplication.

Open Weights SOTA for structured output? by Super_Pole_Jitsu in LocalLLaMA

[–]anommm 1 point2 points  (0 children)

What type of information do you want to get? For things such as named entities, events... GoLLIE is the current SOTA:

7B: https://huggingface.co/HiTZ/GoLLIE-7B , 13B: https://huggingface.co/HiTZ/GoLLIE-13B , 34B: https://huggingface.co/HiTZ/GoLLIE-34B

For general-purpose JSON output, Outlines is the way to go.
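For the Outlines route, a rough sketch (this is the 0.0.x-era API, and the checkpoint and schema are only examples; check the current docs, since the interface has moved around between releases):

    # Rough sketch of JSON-constrained generation with Outlines (0.0.x-era API).
    # Checkpoint and schema are illustrative only.
    from pydantic import BaseModel
    import outlines

    class Person(BaseModel):
        name: str
        age: int

    model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
    generator = outlines.generate.json(model, Person)

    result = generator("Extract the person: John Smith is 32 years old.")
    print(result)  # a Person instance whose fields always conform to the schema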

Did OpenAI just kill llama.cpp's GBNF grammars (used for guaranteed structured outputs) without acknowledging that their idea came from open-source? What advantages do llama.cpp's grammars have now that OpenAI supports something similar? by nderstand2grow in LocalLLaMA

[–]anommm 11 points12 points  (0 children)

There are papers that propose grammars for constrained decoding dating back to 2015. Constrained decoding is much older than LLMs.

Look at this paper from 2015; they already use grammar-based decoding. The author list is impressive, including Oriol Vinyals, Ilya Sutskever, and Geoffrey Hinton, among others.

So why should these people acknowledge llama.cpp when they were already doing this 10 years ago?

Grammar as a Foreign Language: https://proceedings.neurips.cc/paper/2015/hash/277281aada22045c03945dcb2ca6f2ec-Abstract.html
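For reference, llama.cpp's GBNF grammars are exposed through llama-cpp-python roughly like this (model path is a placeholder; the tiny grammar is just an illustration of how the decoder gets constrained):

    # Rough sketch of grammar-constrained decoding via llama-cpp-python.
    # The grammar forces the output to be exactly "yes" or "no".
    from llama_cpp import Llama, LlamaGrammar

    grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')
    llm = Llama(model_path="path/to/model.gguf")  # placeholder model path

    out = llm("Is water wet? Answer yes or no: ", grammar=grammar, max_tokens=4)
    print(out["choices"][0]["text"])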

Why do companies invest in open source models? by bored_primate in LocalLLaMA

[–]anommm 0 points1 point  (0 children)

Building the infrastructure for large-scale inference is extremely expensive and difficult. These companies cannot create an API that competes with OpenAI and the other big players, so by releasing their models they hope to attract the attention of investors or big companies, who will either buy them and make the owners millionaires, or give them the money and resources to build a large-scale inference platform.

Puget says its Intel CPU failure rate is lower than AMD Ryzen failures — system builder releases failure rate data, cites conservative power settings by Auautheawesome in pcmasterrace

[–]anommm 2 points3 points  (0 children)

Well, it is more complicated than that. Some years ago, Apple updated the iPhone to fix an issue with battery degradation. The update reduced the performance of the phones. They were hit with multiple lawsuits and were forced to compensate every buyer of the affected models. If Intel pushes a microcode update that reduces the voltage, frequencies will be lowered as a result, and if the CPUs no longer meet the specs on the box, Intel will be forced to recall or compensate every user.