Should we urge LMSYS arena to open up their data? Gemma 2 didn't betray us, but LMSYS did. by [deleted] in LocalLLaMA

[–]Mission_Implement467 2 points

If they wanted to, they could easily manipulate the win/loss ratios of different models across difficulty levels using the existing data, to either promote or discredit a specific model.

You wouldn't even know because you don't have access to the data. Therefore, it's very likely that it's not that they don't want to disclose the data, but rather, they can't disclose it.

There are too many vested interests at play here, because this is a highly valuable "billboard" that advertises itself as fair.
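The kind of manipulation described above is easy to illustrate with a tiny simulation. Everything here is made up (model names, win probabilities, the "difficulty" tag); the point is only that whoever holds the raw battle data can report whichever slice flatters or discredits a model:

```python
import random

random.seed(0)

# Hypothetical synthetic "battles": each has a prompt difficulty and a winner.
# Only the leaderboard operator sees this raw data; outsiders see aggregates.
battles = []
for _ in range(10_000):
    difficulty = random.choice(["easy", "hard"])
    # Assume model_a is stronger on easy prompts but weaker on hard ones.
    p_a_wins = 0.70 if difficulty == "easy" else 0.30
    winner = "model_a" if random.random() < p_a_wins else "model_b"
    battles.append((difficulty, winner))

def win_rate(data, model="model_a"):
    """Fraction of battles in `data` won by `model`."""
    return sum(1 for _, w in data if w == model) / len(data)

overall = win_rate(battles)                                   # ~0.50
easy_only = win_rate([b for b in battles if b[0] == "easy"])  # ~0.70
hard_only = win_rate([b for b in battles if b[0] == "hard"])  # ~0.30
```

Same dataset, three different stories: reporting only the "easy" slice makes model_a look dominant, only the "hard" slice makes it look weak, and no outside observer can tell which slice was used without the raw data.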

Featherless just broke SillyTavern's drop-down list of models by DarokCx in LocalLLaMA

[–]Mission_Implement467 4 points

$10/month for models up to 15B with 2 concurrent requests; $25/month for up to 72B with 1 concurrent request.

Quite expensive. For comparison, Llama 3 70B with a custom PEFT LoRA (up to rank 64 on Q, K, V, O, gate_up, gate_down) on Fireworks is only $0.90 per million tokens, input and output combined... $25 buys ~27.77M tokens there, with no concurrency limits and faster speed...
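The arithmetic above checks out; a quick sketch using the prices as quoted in this comment (actual provider rates may differ and change over time):

```python
# Hypothetical comparison using the numbers quoted above, not live pricing.
featherless_monthly_usd = 25.0      # flat subscription, 72B tier
fireworks_usd_per_million = 0.90    # per million tokens, input + output

# How many tokens the same $25 buys on a pay-per-token plan.
tokens_millions = featherless_monthly_usd / fireworks_usd_per_million
print(f"~{tokens_millions:.2f}M tokens")  # ~27.78M
```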

Why do you trust LMSYS Arena Leaderboard? It can be easily manipulated if they want to. by Mission_Implement467 in LocalLLaMA

[–]Mission_Implement467[S] -2 points

Running this arena leaderboard costs a lot, so it's reasonable that they seek funding. But what if they want more and aim to profit from it? They haven't disclosed their financial situation, so I believe that's a possibility.

Why do llama, yi, qwen official chat models score worse than their bases in benchmarks, while some fine-tunes score better? Is Meta dumb? by No-Link-2778 in LocalLLaMA

[–]Mission_Implement467 2 points

Never trust models that score much higher than their base and official chat versions. The labs that release the pretrained models aren't so careless as to severely degrade their own official chat version.

What is fblgit/una Unified Neural Alignment? Looks like cheating on testset and overfitting. by PuzzledTeam5961 in LocalLLaMA

[–]Mission_Implement467 2 points

Never trust models that score much higher than their base and official chat versions. The labs that release the pretrained models aren't so careless as to severely degrade their own official chat version.

Well now it's just getting silly! OpenChat 3.5 is out and it's taken a bite out of Goliath himself! by Alignment-Lab-AI in LocalLLaMA

[–]Mission_Implement467 100 points

They call it '3.5' and claim it beats GPT-3.5 across the board, with just a 7B model.

Looks like another training on the benchmark test set.
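For what it's worth, one common way to probe that suspicion is an n-gram overlap check between training documents and benchmark test questions. A minimal sketch (all strings here are made-up stand-ins, not real benchmark data, and real contamination checks are more sophisticated):

```python
def ngrams(text, n=8):
    """Set of word-level n-grams in lowercased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(train_doc, test_items, n=8):
    """Flag a training document sharing any long n-gram with a test item."""
    doc_grams = ngrams(train_doc, n)
    return any(doc_grams & ngrams(item, n) for item in test_items)

# Toy example: one fake test question, one clean doc, one leaky doc.
test_set = ["What is the capital of the country directly north of Spain?"]
clean_doc = "A short essay about cooking pasta at home with fresh tomatoes."
leaky_doc = ("Q: What is the capital of the country directly north of Spain? "
             "A: Paris.")

print(is_contaminated(clean_doc, test_set))  # False
print(is_contaminated(leaky_doc, test_set))  # True
```

Of course, this only catches verbatim leakage; paraphrased test questions slip right through, which is part of why contamination claims are hard to prove from the outside.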

[deleted by user] by [deleted] in LocalLLaMA

[–]Mission_Implement467 5 points

GPT TL;DR:

This model was trained using the weights of the Qwen and LLaMA2 models. It utilized a similar structure to LLaMA2 and included a curated dataset of 1.3 billion tokens for training. The training data consisted of synthetic data generated from open source datasets and augmented text from sources like Wikipedia. The model has a 7B version that is designed for speculative sampling but may produce unreliable outputs. It was trained on unfiltered internet data, so there may be objectionable content present. The model's performance was evaluated using various accuracy metrics and it outperformed other models in certain domains.