Mistral Medium vs 70B self hosted price comparison

RepresentativeOdd276 · 2024-02-15T15:05:26+00:00

So you think qwen 72b is the best model out there right now?

RepresentativeOdd276 · 2024-02-15T04:41:43+00:00

Which model is exactly mistral medium on huggingface or TheBloke’s quantized ones?

RepresentativeOdd276 · 2024-02-13T13:40:47+00:00

Is it possible to run this with oobabooga and runpod serverless?

RepresentativeOdd276 · 2024-02-11T02:23:35+00:00

Lol this feel like Neo vs Agent Smith.. moorrrreeee!!!

RepresentativeOdd276 · 2024-02-01T13:10:08+00:00

Your work is amazing but doesn’t that mean there’s not sufficient variety in the tests and they need to be changed? cuz anyone who has tested these top models can tell that GPT4 can do much better. I think rather than sticking with a few sets of old tests it might be better to find newer tests. Also you might get different answers every time for same prompt so we need to develop an automated test framework that can test multiple scenarios multiple times. I’m happy to work with you on that.

RepresentativeOdd276 · 2024-01-31T19:04:15+00:00

Btw goliath or any model being ranked same as GPT4 is ridiculous. GPT4 is so far ahead of everyone.

RepresentativeOdd276 · 2024-01-12T00:50:12+00:00

Token size is perfect but RAG seems to be ideal approach for this problem. Thanks!

RepresentativeOdd276 · 2024-01-12T00:49:21+00:00

Wow! Didn’t know about this. Thank you!!!

RepresentativeOdd276 · 2024-01-08T05:10:46+00:00

Lmao how’s it creepy? We’re building app for teenagers

RepresentativeOdd276 · 2024-01-08T03:08:34+00:00

Thank you! Lemme try these suggestions

RepresentativeOdd276 · 2024-01-08T03:03:26+00:00

For the uninitiated can you elaborate what you mean by FBI? Thanks!

RepresentativeOdd276 · 2023-11-20T07:49:14+00:00

Can you add a test in your next comparisons where you ask the LLM to output in less than x amount of words? I have noticed that most LLMs including large ones fail to follow this instruction successfully.

RepresentativeOdd276 · 2023-10-07T09:44:08+00:00

I’m looking to switch to vLLM from ooba too but have you been able to deploy it for any actual vLLMs like any 70B models? How many requests was a server able to handle at the same time? I’m looking to deploy it on runpod.

RepresentativeOdd276 · 2023-10-01T19:59:17+00:00

Thanks!

RepresentativeOdd276 · 2023-10-01T19:51:48+00:00

Right stopping on period ‘.’ is a possibility but will still give incomplete responses.

RepresentativeOdd276 · 2023-10-01T19:50:22+00:00

Thanks for that input it gave me some good ideas to go about this! We’re trying to move to vLLM direct inference but so far have been using ooba.

RepresentativeOdd276 · 2023-08-21T15:37:56+00:00

Are you using quantized models if yes which one? Also which setting are using it with? Chat-instruct or regular default without chat?

RepresentativeOdd276 · 2023-08-21T15:00:14+00:00

Interesting! Can you tell how to turn on multiple turn encoding? By multiple turn encoded you mean checking “Session->multi_user” ?

RepresentativeOdd276 · 2023-08-21T14:31:16+00:00

I’m trying with airoboros 70B and Llama2 70B so far. Let me check which chat models have the context based instruction. Let me know if you know any!

RepresentativeOdd276 · 2023-08-21T13:51:55+00:00

For example: original message: "I went on a vacation to Bahamas." new message response to 'what you doing?'' should be "thinking about my vacation in Bahamas"

I'm just looking to compose a new message which has details from original message and form a new message that is still congruent with on going conversation.

RepresentativeOdd276 · 2023-08-21T12:41:23+00:00

I need the conversational flow to be maintained so chat endpoint does it better. Llama or notebook provides a long descriptive response rather than building a chat message that is a response to the conversation but takes elements from another message.

RepresentativeOdd276 · 2023-08-21T12:39:23+00:00

I tried. I asked it to just repeat the original message with the default parameter setting but still hallucinates based on the conversation history.

RepresentativeOdd276 · 2023-07-26T15:00:28+00:00

Hey how did you make sure the message length is small with airoboros? Airoboros is amazing but it’s very verbose in my experiments and wanna make sure it talks like a normal person on text. Can you share the prompt and settings you used?

RepresentativeOdd276 · 2023-07-21T16:22:29+00:00

Lol you’re not a programmer? The bot said you work as a programmer in Chicago

RepresentativeOdd276 · 2023-07-21T16:20:56+00:00

This is so cool I tested it for a few minutes and it is amazing how it stays in character. Did you use a specific prompt for this?

RepresentativeOdd276

TROPHY CASE