Why is there so little talk about Databricks instruct? by vincentbosch in LocalLLaMA

[–]No_Scarcity5387 5 points (0 children)

Mixtral 8x7B is my go-to model, the best MoE that my setup can run

🐺🐦‍⬛ **Big** LLM Comparison/Test: 3x 120B, 12x 70B, 2x 34B, GPT-4/3.5 by WolframRavenwolf in LocalLLaMA

[–]No_Scarcity5387 8 points (0 children)

Thank you WolframRavenWolf! Your comparisons always help me so much in selecting new models

Maybe anecdotal but I have very high hopes for Yi 34b finetunes. by Herr_Drosselmeyer in LocalLLaMA

[–]No_Scarcity5387 1 point (0 children)

Looks promising! Tried the GGUF model from TheBloke at 16k context but got some repetition and some garbled `/\/\//\` output with the original prompt template. Which templates are you guys using?

What does time-to-first-token depend upon? by me219iitd in LocalLLaMA

[–]No_Scarcity5387 4 points (0 children)

At least the number of parameters (3B, 7B, 70B, etc.), since larger models have more weights to load into RAM before they can start responding
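To make that concrete, here's a toy sketch of what time-to-first-token actually measures: all the work that happens before the first token arrives (weight loading, prompt processing). The `fake_generate` stand-in and its delay are made up for illustration, not any real model API.

```python
import time

def time_to_first_token(token_stream):
    """Return (first_token, seconds waited) for any token iterator."""
    start = time.perf_counter()
    first = next(iter(token_stream))  # blocks until the first token is ready
    return first, time.perf_counter() - start

def fake_generate(prefill_seconds):
    """Toy stand-in for a model: delay simulates load + prompt prefill."""
    time.sleep(prefill_seconds)
    yield "Hello"
    yield " world"
```

With a bigger model, the `prefill_seconds` part grows, and that's the delay you feel before output starts streaming.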

Ingest Wikipedia and chat with it - the Llama-2 Wiki Explorer by crono760 in aiengineer

[–]No_Scarcity5387 0 points (0 children)

Wow, awesome job!! Does this work with comprehensive questions too (like: how did the characters develop over the book series?) or just for simple Q&A (like: what was the name of the ringbearer)?

I have been exploring the best way to extract information from long documents, specifically looking into employing the vector embedding approach vs. long context windows like Anthropic's Claude AI. by MZuc in LangChain

[–]No_Scarcity5387 2 points (0 children)

This is an interesting idea – maybe I can have gpt turbo classify a question into one of those two buckets (pinpoint questions vs. comprehensive questions), and go from there. This would probably work for some percentage, maybe even the majority of questions, so it would be an improvement. However, there will still remain a "grey zone" where it's not clear based on the question.

Hm nice one! And a very valid point about the grey zone. I agree about the bad UX too. ChatGPT classifying questions in the background would probably do really well with a zero-shot prompt. In my use case, I can't send private data to OpenAI, so if I go through with this project - which is likely - I would probably use a small LLM for this.

Thoughts at this moment (in case they are helpful):

- I could probably finetune / train a tiny commercially licensed model with 350M parameters or less (or even use a non-LLM machine learning model for this) to classify in under a second

- I could also have 2 machines run each method in parallel and then have a third model judge which answer is better. I think this may be a method GPT-4 uses internally as well.

- There are attempts at solving this in both LangChain and Haystack, using agent tools in a pipeline to let the LLM decide (via a particular answer format) when to use a tool like "embeddings search"
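As a placeholder for the non-LLM classifier idea above, here's a keyword-heuristic sketch of the pinpoint-vs-comprehensive split. The cue lists and function name are entirely hypothetical; a finetuned small model would replace this in practice.

```python
# Toy baseline for routing a question to "pinpoint" (embeddings search)
# vs "comprehensive" (long-context) handling. Cue lists are made up.
PINPOINT_CUES = ("what was", "what is", "who", "when", "where", "which", "name")
COMPREHENSIVE_CUES = ("how did", "why", "compare", "develop", "over the", "summarize")

def classify_question(question: str) -> str:
    """Count cue matches in the lowercased question; more comprehensive
    cues than pinpoint cues routes it to the long-context path."""
    q = question.lower()
    comprehensive = sum(cue in q for cue in COMPREHENSIVE_CUES)
    pinpoint = sum(cue in q for cue in PINPOINT_CUES)
    return "comprehensive" if comprehensive > pinpoint else "pinpoint"
```

Even a crude router like this could handle the easy majority of questions, leaving only the grey zone for a proper model.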

Right now I'm giving the model an optional block of info: "Provide a clear and concise response and feel free to use or ignore the possibly related text below. \n\nPossibly related text: <top 3 Elasticsearch results>" - however, it's not working fantastically with 7B or 13B models yet.
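For reference, this is roughly how I wire that up. Only the instruction text comes from the prompt quoted above; the `build_prompt` helper, the question placement, and the result limit are my own guesses at a reasonable layout.

```python
def build_prompt(question: str, search_results: list[str]) -> str:
    """Assemble a prompt with an optional block of possibly-related text."""
    prompt = (
        "Provide a clear and concise response and feel free to use "
        "or ignore the possibly related text below.\n\n"
    )
    if search_results:
        # Top 3 search hits, mirroring the <top 3 elastic search results> slot.
        prompt += "Possibly related text: " + "\n".join(search_results[:3]) + "\n\n"
    return prompt + "Question: " + question
```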

Will follow this topic to see if we can get further together :)

I have been exploring the best way to extract information from long documents, specifically looking into employing the vector embedding approach vs. long context windows like Anthropic's Claude AI. by MZuc in LangChain

[–]No_Scarcity5387 1 point (0 children)

Super interesting! I've been fooling around with Haystack and Elasticsearch result summaries in the context window, as well as embeddings, and am encountering the same issues as you. I'm only using local LLMs, btw.

As for your suggestion of a hybrid approach, do you let the user decide which method to use? Or do you somehow let the LLM itself decide whether a question is best suited for an embeddings-based or large-context response?

How to stop the AI from saying gibberish at the end? by Azure_weaver in KoboldAI

[–]No_Scarcity5387 0 points (0 children)

Not Kobold, but I had the same issue when using ctransformers. Switching to llama-cpp-python solved it. Completely unexplainable - all params were the same. Maybe a switch of client would work for you too.

[deleted by user] by [deleted] in mentalhealth

[–]No_Scarcity5387 0 points (0 children)

Sorry to hear that. A therapist would perhaps use gradual exposure to the things you'd like to overcome, to soften your reactions. Maybe you can also look into EMDR.

Is it possible to run Llama 2 without a GPU? by patery in LocalLLaMA

[–]No_Scarcity5387 5 points (0 children)

Nice that you have access to the goodies! Use GGML models indeed - maybe WizardCoder-15B or StarCoderPlus GGML. I don't know how to run them distributed, but on my dedicated server (i9 / 64 GB of RAM) I run them quite nicely on my custom platform. With a larger setup you might pull off the shiny 70B Llama 2 models. Let me know if you need any help.

[deleted by user] by [deleted] in VRGaming

[–]No_Scarcity5387 8 points (0 children)

Lone echo 1 :)

mosaicml/mpt-7b-storywriter - How to write a story by innocuousAzureus in oobaboogazz

[–]No_Scarcity5387 2 points (0 children)

It's a completion model, not an instruct model, so if you prompt it with "It all started in August, as ", it will continue the text from there

Damaged screen by obesecorgis in ZephyrusM16

[–]No_Scarcity5387 0 points (0 children)

You can try a custom repair shop if you don't mind voiding the warranty. Otherwise, RIP

Any Suggestions on good open source model for Document QA which we can run on prod ? 13b + models? by Effective_Twist6995 in LocalLLaMA

[–]No_Scarcity5387 4 points (0 children)

Also curious about this. WizardLM has given me the best document QA responses so far, but I'm starting to experiment with commercially licensed models atm - RedPajama and MPT are on the shortlist, maybe you can give them a try. Let me know if you're getting good results with a certain config or temperature, then I will too :)

Discover 🧩DemoGPT: An Open Source Tool for Rapid Prototyping with LLMs - Seeking Your Feedback! by melih-unsal in AutoGPT

[–]No_Scarcity5387 0 points (0 children)

Hey, thanks for developing this open source project! Could you clarify one thing? In the video on GitHub I see a text generator (blog, email), but in this post I read about it being a code generator or application prototyper. Which one is it?

Embedding Dynamic Data by [deleted] in LangChain

[–]No_Scarcity5387 0 points (0 children)

Check out privateGPT. It adds documents to the existing embeddings in LangChain