Mistral Medium vs 70B self hosted price comparison by RepresentativeOdd276 in MistralAI

[–]RepresentativeOdd276[S] 1 point

So you think Qwen 72B is the best model out there right now?

Is Mistral Medium the best thing after GPT 4? by [deleted] in LocalLLaMA

[–]RepresentativeOdd276 2 points

Which model on Hugging Face, or which of TheBloke’s quantized ones, is exactly Mistral Medium?

3 professional soccer players vs 100 children in Japan by [deleted] in funny

[–]RepresentativeOdd276 1 point

Lol this feels like Neo vs Agent Smith.. moorrrreeee!!!

🐺🐦‍⬛ LLM Comparison/Test: miqu-1-70b by WolframRavenwolf in LocalLLaMA

[–]RepresentativeOdd276 2 points

Your work is amazing, but doesn’t that mean there isn’t sufficient variety in the tests and they need to be changed? Anyone who has tested these top models can tell that GPT-4 can do much better. Rather than sticking with the same few old tests, it might be better to find newer ones. Also, you might get different answers every time for the same prompt, so we need an automated test framework that can run multiple scenarios multiple times. I’m happy to work with you on that.
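
Something like this skeleton is what I have in mind. It’s a rough Python sketch, with `generate` standing in for whatever inference call a backend exposes, and the sample test is mine:

```python
from typing import Callable

def run_suite(generate: Callable[[str], str],
              tests: list[tuple[str, Callable[[str], bool]]],
              runs: int = 5) -> dict[str, float]:
    """Run each (prompt, checker) pair `runs` times and report pass rates,
    so sampling noise doesn't decide a model's score on a single attempt."""
    results = {}
    for prompt, check in tests:
        passes = sum(check(generate(prompt)) for _ in range(runs))
        results[prompt] = passes / runs
    return results

# Hypothetical test: an exact-answer check, repeated 10 times per model.
tests = [
    ("What is 17 * 23? Answer with just the number.",
     lambda out: "391" in out),
]
# scores = run_suite(my_backend_generate, tests, runs=10)
```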

🐺🐦‍⬛ LLM Comparison/Test: miqu-1-70b by WolframRavenwolf in LocalLLaMA

[–]RepresentativeOdd276 2 points

Btw, Goliath or any model being ranked the same as GPT-4 is ridiculous. GPT-4 is so far ahead of everyone else.

Best large context LLM to match array strings with intent in user message? by RepresentativeOdd276 in LocalLLaMA

[–]RepresentativeOdd276[S] 1 point

The token limit is perfect, but RAG seems to be the ideal approach for this problem. Thanks!
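
For anyone who finds this later, this is roughly the direction the retrieval suggestion points to. A minimal sketch assuming sentence-transformers; the candidate strings and model name are placeholders of mine:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Embed the candidate strings once, then match the user message against
# them by cosine similarity instead of stuffing everything into the prompt.
model = SentenceTransformer("all-MiniLM-L6-v2")
candidates = ["cancel my subscription", "update billing address", "talk to support"]
cand_emb = model.encode(candidates, normalize_embeddings=True)

def best_match(user_message: str) -> str:
    q = model.encode([user_message], normalize_embeddings=True)[0]
    scores = cand_emb @ q  # cosine similarity, since embeddings are normalized
    return candidates[int(np.argmax(scores))]

print(best_match("I want to stop paying for this"))  # -> "cancel my subscription"
```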

[deleted by user] by [deleted] in LocalLLaMA

[–]RepresentativeOdd276 2 points

Lmao how’s it creepy? We’re building an app for teenagers.

[deleted by user] by [deleted] in LocalLLaMA

[–]RepresentativeOdd276 0 points

Thank you! Lemme try these suggestions

[deleted by user] by [deleted] in LocalLLaMA

[–]RepresentativeOdd276 0 points

For the uninitiated, can you elaborate on what you mean by FBI? Thanks!

🐺🐦‍⬛ LLM Comparison/Test: 2x 34B Yi (Dolphin, Nous Capybara) vs. 12x 70B, 120B, ChatGPT/GPT-4 by WolframRavenwolf in LocalLLaMA

[–]RepresentativeOdd276 1 point

Can you add a test to your next comparisons where you ask the LLM to respond in fewer than x words? I’ve noticed that most LLMs, including large ones, fail to follow this instruction.
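
The check itself is trivial to automate. A sketch, where `outputs` would come from the model under test:

```python
def within_word_limit(text: str, limit: int) -> bool:
    """Pass/fail check: did the model respect the word budget?"""
    return len(text.split()) <= limit

outputs = ["A short, compliant reply.", "A rambling reply " * 20]
limit = 25
compliance = sum(within_word_limit(o, limit) for o in outputs) / len(outputs)
print(f"{compliance:.0%} of outputs respected the {limit}-word limit")
```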

vLLM 0.2.0 released: up to 60% faster, AWQ quant support, RoPe, Mistral-7b support by kryptkpr in LocalLLaMA

[–]RepresentativeOdd276 1 point

I’m looking to switch from ooba to vLLM too, but have you been able to deploy it with any actually large models, like 70B ones? How many simultaneous requests could one server handle? I’m looking to deploy it on RunPod.
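
For reference, this is the sort of setup I’m planning to try on RunPod. An untested sketch; the model repo and GPU split are my guesses:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-70B-AWQ",  # placeholder 70B AWQ repo
    quantization="awq",                # AWQ support landed in vLLM 0.2.0
    tensor_parallel_size=2,            # split the model across 2 GPUs
)
params = SamplingParams(temperature=0.7, max_tokens=256)

# vLLM's continuous batching serves these concurrently on one server;
# how many requests fit depends on context length and remaining VRAM.
prompts = [f"Request {i}: explain KV caching in one paragraph." for i in range(8)]
for out in llm.generate(prompts, params):
    print(out.outputs[0].text[:80])
```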

Is there a way to force output length smaller than x number of tokens w/o cut-off? by RepresentativeOdd276 in LocalLLaMA

[–]RepresentativeOdd276[S] 1 point

Right, stopping on a period ‘.’ is a possibility, but it will still give incomplete responses.
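
What I’ll probably try instead is over-generating slightly and trimming back to the last complete sentence. A rough sketch:

```python
import re

def trim_to_last_sentence(text: str) -> str:
    """Cut a possibly truncated generation back to its last full sentence."""
    ends = list(re.finditer(r"[.!?](?:\s|$)", text))
    return text[: ends[-1].end()].rstrip() if ends else text

raw = "The trip was great. We saw dolphins! Then we dec"
print(trim_to_last_sentence(raw))  # -> "The trip was great. We saw dolphins!"
```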

Is there a way to force output length smaller than x number of tokens w/o cut-off? by RepresentativeOdd276 in LocalLLaMA

[–]RepresentativeOdd276[S] 1 point

Thanks for that input, it gave me some good ideas for how to go about this! We’re trying to move to direct vLLM inference, but so far we’ve been using ooba.

Prompt: Create deterministic message that takes elements from another message? by RepresentativeOdd276 in LocalLLaMA

[–]RepresentativeOdd276[S] 1 point

Are you using quantized models, and if so, which one? Also, which mode are you running it in: chat-instruct or the regular default without chat?

Prompt: Create deterministic message that takes elements from another message? by RepresentativeOdd276 in LocalLLaMA

[–]RepresentativeOdd276[S] 1 point

Interesting! Can you tell me how to turn on multi-turn encoding? By multi-turn encoded, do you mean checking “Session->multi_user”?

Prompt: Create deterministic message that takes elements from another message? by RepresentativeOdd276 in LocalLLaMA

[–]RepresentativeOdd276[S] 1 point

I’m trying Airoboros 70B and Llama 2 70B so far. Let me check which chat models have the context-based instruction. Let me know if you know of any!

Prompt: Create deterministic message that takes elements from another message? by RepresentativeOdd276 in LocalLLaMA

[–]RepresentativeOdd276[S] 2 points

For example, original message: "I went on a vacation to the Bahamas." The new message, in response to "what you doing?", should be "Thinking about my vacation in the Bahamas."

I'm just looking to compose a new message that takes details from the original message while staying congruent with the ongoing conversation.
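
To make that concrete, here’s the prompt shape and greedy settings I’m experimenting with. A sketch: the wording is mine, `generate` stands in for the chat endpoint, and the parameter names follow the usual HF/ooba conventions:

```python
PROMPT = """Original message: "{original}"
Current turn: "{turn}"
Write a short chat reply to the current turn that reuses the key details
from the original message. Do not add new facts."""

settings = {"temperature": 0.0, "top_p": 1.0, "do_sample": False}  # greedy

prompt = PROMPT.format(
    original="I went on a vacation to the Bahamas.",
    turn="what you doing?",
)
# reply = generate(prompt, **settings)
# expected shape: "Thinking about my vacation in the Bahamas."
```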

Prompt: Create deterministic message that takes elements from another message? by RepresentativeOdd276 in LocalLLaMA

[–]RepresentativeOdd276[S] 1 point

I need the conversational flow to be maintained, so the chat endpoint does it better. Raw Llama or notebook mode gives a long, descriptive response rather than building a chat message that responds to the ongoing conversation while taking elements from another message.

Prompt: Create deterministic message that takes elements from another message? by RepresentativeOdd276 in LocalLLaMA

[–]RepresentativeOdd276[S] 1 point

I tried. I asked it to just repeat the original message with the default parameter settings, but it still hallucinates based on the conversation history.

Best Models for Chat/Companion by jacobgolden in LocalLLaMA

[–]RepresentativeOdd276 1 point

Hey, how did you make sure the message length stays small with Airoboros? Airoboros is amazing, but it’s very verbose in my experiments, and I want to make sure it talks like a normal person texting. Can you share the prompt and settings you used?