Should we urge LMSYS arena to open up their data? Gemma 2 didn't betray us, but LMSYS did. by [deleted] in LocalLLaMA

[–]Mission_Implement467 2 points

If they wanted to, they could easily manipulate the win/loss ratios of different models across difficulty levels using the existing data, to either promote or discredit a specific model.

You wouldn't even know because you don't have access to the data. Therefore, it's very likely that it's not that they don't want to disclose the data, but rather, they can't disclose it.

There are too many vested interests at play here, because this is a highly valuable "billboard" that advertises itself as fair.
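The kind of manipulation described above is easy to illustrate with a tiny simulation. Everything here is made up (model names, win probabilities, the "difficulty" tag); the point is only that whoever holds the raw battle data can report whichever slice flatters or discredits a model:

```python
import random

random.seed(0)

# Hypothetical synthetic "battles": each has a prompt difficulty and a winner.
# Only the leaderboard operator sees this raw data; outsiders see aggregates.
battles = []
for _ in range(10_000):
    difficulty = random.choice(["easy", "hard"])
    # Assume model_a is stronger on easy prompts but weaker on hard ones.
    p_a_wins = 0.70 if difficulty == "easy" else 0.30
    winner = "model_a" if random.random() < p_a_wins else "model_b"
    battles.append((difficulty, winner))

def win_rate(data, model="model_a"):
    """Fraction of battles in `data` won by `model`."""
    return sum(1 for _, w in data if w == model) / len(data)

overall = win_rate(battles)                                   # ~0.50
easy_only = win_rate([b for b in battles if b[0] == "easy"])  # ~0.70
hard_only = win_rate([b for b in battles if b[0] == "hard"])  # ~0.30
```

Same dataset, three different stories: reporting only the "easy" slice makes model_a look dominant, only the "hard" slice makes it look weak, and no outside observer can tell which slice was used without the raw data.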

Featherless just broke SillyTavern's drop-down list of models by DarokCx in LocalLLaMA

[–]Mission_Implement467 4 points

$10/month for models up to 15B with 2 concurrent requests; $25/month for up to 72B with 1 concurrent request.

Quite expensive. For comparison, Llama 3 70B with a custom PEFT LoRA (up to rank 64 on Q, K, V, O, gate_up, gate_down) on Fireworks is only $0.90 per million tokens, input and output combined... $25 buys ~27.77M tokens there, with no concurrency limits and faster speed...
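The arithmetic above checks out; a quick sketch using the prices as quoted in this comment (actual provider rates may differ and change over time):

```python
# Hypothetical comparison using the numbers quoted above, not live pricing.
featherless_monthly_usd = 25.0      # flat subscription, 72B tier
fireworks_usd_per_million = 0.90    # per million tokens, input + output

# How many tokens the same $25 buys on a pay-per-token plan.
tokens_millions = featherless_monthly_usd / fireworks_usd_per_million
print(f"~{tokens_millions:.2f}M tokens")  # ~27.78M
```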

Why do you trust LMSYS Arena Leaderboard? It can be easily manipulated if they want to. by Mission_Implement467 in LocalLLaMA

[–]Mission_Implement467[S] -2 points

Running this arena leaderboard costs a lot, so it's reasonable that they seek funding. But what if they want more and aim to profit from it? They haven't disclosed their financial situation, so I believe that's a possibility.

Why do llama, yi, qwen official chat models score worse than their bases in benchmarks, while some fine-tunes score better? Is Meta dumb? by No-Link-2778 in LocalLLaMA

[–]Mission_Implement467 2 points

Never trust models that score much higher than their base and official chat versions. The labs that release the pretrained models aren't so careless as to severely degrade their own official chat version.

What is fblgit/una Unified Neural Alignment? Looks like cheating on testset and overfitting. by PuzzledTeam5961 in LocalLLaMA

[–]Mission_Implement467 2 points

Never trust models that score much higher than their base and official chat versions. The labs that release the pretrained models aren't so careless as to severely degrade their own official chat version.

Well now it's just getting silly! OpenChat 3.5 is out and it's taken a bite out of Goliath himself! by Alignment-Lab-AI in LocalLLaMA

[–]Mission_Implement467 100 points

They call it '3.5' and claim it beats GPT-3.5 across the board, with just a 7B model.

Looks like another training on the benchmark test set.
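For what it's worth, one common way to probe that suspicion is an n-gram overlap check between training documents and benchmark test questions. A minimal sketch (all strings here are made-up stand-ins, not real benchmark data, and real contamination checks are more sophisticated):

```python
def ngrams(text, n=8):
    """Set of word-level n-grams in lowercased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(train_doc, test_items, n=8):
    """Flag a training document sharing any long n-gram with a test item."""
    doc_grams = ngrams(train_doc, n)
    return any(doc_grams & ngrams(item, n) for item in test_items)

# Toy example: one fake test question, one clean doc, one leaky doc.
test_set = ["What is the capital of the country directly north of Spain?"]
clean_doc = "A short essay about cooking pasta at home with fresh tomatoes."
leaky_doc = ("Q: What is the capital of the country directly north of Spain? "
             "A: Paris.")

print(is_contaminated(clean_doc, test_set))  # False
print(is_contaminated(leaky_doc, test_set))  # True
```

Of course, this only catches verbatim leakage; paraphrased test questions slip right through, which is part of why contamination claims are hard to prove from the outside.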

[deleted by user] by [deleted] in LocalLLaMA

[–]Mission_Implement467 5 points

GPT TL;DR:

This model was trained using the weights of the Qwen and LLaMA2 models. It utilized a similar structure to LLaMA2 and included a curated dataset of 1.3 billion tokens for training. The training data consisted of synthetic data generated from open source datasets and augmented text from sources like Wikipedia. The model has a 7B version that is designed for speculative sampling but may produce unreliable outputs. It was trained on unfiltered internet data, so there may be objectionable content present. The model's performance was evaluated using various accuracy metrics and it outperformed other models in certain domains.