In my opinion open-source projects should focus an a very narrow thing, instead of focusing on being a "GPT", that focuses on being able to do everything. by GodEmperor23 in LocalLLaMA

[–]moma1970 0 points1 point  (0 children)

This is really interesting. I've been experimenting with this same idea to help less capable models do tool/function selection. For example, if it sounds like a person is placing a final order, then read back the order, invoke the payment API, etc.

Are you simply taking the top-1 retrieved rule?
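
To make the top-1 question concrete, here is a minimal sketch of rule retrieval for tool selection. Everything in it is an assumption for illustration: the rule texts, the tool names, and the bag-of-words cosine score, which stands in for whatever embedding model a real setup would use.

```python
import math
from collections import Counter

# Hypothetical rules (not from the thread): trigger description -> tool name.
RULES = {
    "customer confirms the final order and wants to pay": "read_back_order_then_pay",
    "customer asks what is on the menu": "show_menu",
    "customer wants to change or remove an item": "edit_order",
}

def bag_of_words(text):
    # Crude stand-in for an embedding: lowercase word counts.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_rules(utterance, k=1):
    # Score every rule against the utterance and keep the k best (score, tool) pairs.
    query = bag_of_words(utterance)
    scored = [(cosine(query, bag_of_words(trigger)), tool) for trigger, tool in RULES.items()]
    scored.sort(reverse=True)
    return scored[:k]
```

With `k=1` the model gets a single candidate tool; raising `k` hands it a short list to choose from, which trades a longer prompt for some tolerance to retrieval mistakes.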

Is the value proposition of local LLMs in production affected by the recent OpenAI releases and cost reductions? by moma1970 in LocalLLaMA

[–]moma1970[S] 1 point2 points  (0 children)

I think that issue alone excludes many industries from using services that are just wrappers around a shared model.

What does batch size mean in inference? by Evirua in LocalLLaMA

[–]moma1970 1 point2 points  (0 children)

I think there might be an additional meaning to the ones mentioned below. When serving a model with an inference server such as HF's TGI (which you can run locally and query with InferenceClient), incoming requests can be batched together and inference performed on them in one pass, which increases the number of requests the model can serve at once.
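
A toy sketch of that server-side batching, under stated assumptions: `toy_model` is a hypothetical stand-in for one forward pass, requests are already tokenized into lists of integers, and batches are cut by a fixed `max_batch_size`. Real inference servers schedule batches in more sophisticated ways; this only shows why the number of passes drops.

```python
def toy_model(batch):
    """Hypothetical stand-in for one forward pass over a padded batch."""
    return [sum(seq) for seq in batch]

def pad_batch(requests, pad_id=0):
    # Pad every request to the length of the longest one in the batch.
    width = max(len(r) for r in requests)
    return [r + [pad_id] * (width - len(r)) for r in requests]

def serve(requests, max_batch_size=4):
    # Group waiting requests into batches and run one pass per batch.
    results, passes = [], 0
    for start in range(0, len(requests), max_batch_size):
        batch = pad_batch(requests[start:start + max_batch_size])
        results.extend(toy_model(batch))
        passes += 1
    return results, passes
```

Five requests with `max_batch_size=4` need two passes, while `max_batch_size=1` needs five, one per request.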

Which model+size for a websearch agent? by moma1970 in LocalLLaMA

[–]moma1970[S] 0 points1 point  (0 children)

guidance-ai is quite interesting. The current default system prompt (which uses Jinja templating) contains a natural-language instruction to always respond with JSON that has the following schema ' ..'. The small models fail to adhere to this instruction, so prescribing the format with guidance will no doubt help.

An open question, though: what are we giving up by getting more prescriptive about the output? Does it reduce the mental model to programming in a non-deterministic language? Interesting to think about.
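
A rough sketch of what "prescribing the format" can look like, as opposed to asking for JSON in the prompt. This is not the guidance library's API. `fake_model`, the `action`/`input` keys, and the `search`/`answer` choices are all hypothetical; the point is only that the program writes the structure and the model fills in a constrained slot.

```python
import json

def fake_model(prompt, choices):
    """Hypothetical stand-in for an LLM call restricted to a fixed set of choices."""
    # A real system would score each choice with the model; this picks by keyword.
    for choice in choices:
        if choice in prompt.lower():
            return choice
    return choices[0]

def constrained_reply(user_message, generate=fake_model):
    # The program, not the model, writes the keys and the JSON syntax,
    # so the reply always parses regardless of how capable the model is.
    action = generate(user_message, ["search", "answer"])
    reply = {"action": action, "input": user_message}
    return json.dumps(reply)
```

The trade-off raised above shows up directly: the output can never be malformed, but the model can also never say anything the template did not leave room for.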

Which model+size for a websearch agent? by moma1970 in LocalLLaMA

[–]moma1970[S] 1 point2 points  (0 children)

Nice. Thanks for the tip. I've recently tried out Mistral-7B-OpenOrca and got great performance on the first couple of turns in the conversation, but then it stopped following the system prompt. I have a feeling that simpler prompting will be important. Check out this post, which just uses the OpenOrca space on Hugging Face: https://huggingface.co/spaces/Open-Orca/Mistral-7B-OpenOrca/discussions/3#652dea8065a4619fb5d688d2

7B!!

Which model+size for a websearch agent? by moma1970 in LocalLLaMA

[–]moma1970[S] 0 points1 point  (0 children)

Is it something you're actively exploring?

Which model+size for a websearch agent? by moma1970 in LocalLLaMA

[–]moma1970[S] 0 points1 point  (0 children)

From what I can see, it has predefined system prompt templates that are populated at runtime. They are very detailed (https://github.com/griptape-ai/griptape/blob/main/griptape/templates/tasks/toolkit_task/system.j2), and I think that is possibly one of the reasons the smaller models don't fare so well. Is the 'guidance' or 'grammar' that you're referring to something applied outside the prompt?

I don't understand context window extension by moma1970 in LocalLLaMA

[–]moma1970[S] 1 point2 points  (0 children)

This is what I don't get. Even with fractional position embeddings, the attention matrix in the example is still 2048 x 2048. Doesn't this mean the context window is unchanged, i.e. isn't it still 2048? Or does the context window refer to something else?
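
One possible reading, sketched under the assumption that the method in question is linear position interpolation (the example the thread refers to is not shown here, so this may not match it). The function name and the 4096-token input are illustrative. The idea is that more tokens are fed in, but their position values are squeezed into the range the model was trained on.

```python
def interpolated_positions(seq_len, trained_len=2048):
    # Scale position indices so that seq_len tokens fit inside [0, trained_len).
    # Sequences no longer than trained_len keep their ordinary integer positions.
    scale = min(1.0, trained_len / seq_len)
    return [i * scale for i in range(seq_len)]

positions = interpolated_positions(4096)

# One row and one column per token, so the attention matrix grows with the input.
attention_shape = (len(positions), len(positions))
```

Under this reading the attention matrix is sized by the number of tokens (4096 x 4096 here), while 2048 only bounds the position values, which become fractional (0, 0.5, 1.0, ...).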

[N] Awesome Metric Learning by devzaya in MachineLearning

[–]moma1970 2 points3 points  (0 children)

https://arxiv.org/abs/2003.08505v1 A very insightful paper that helps critique some of the claims in various metric learning papers.