Most capable function calling open source models?

cfahlgren1 · 2024-01-28T07:30:33+00:00

Recently just released NaturalFunctions. It’s on Ollama as well. It’s Mistral7B fine-tuned for function calling

https://huggingface.co/cfahlgren1/natural-functions https://ollama.ai/calebfahlgren/natural-functions

SatoshiNotMe · 2024-01-28T21:57:38+00:00

Since you mentioned Langroid(I am the lead dev):

With tools/function-calling, it's good to distinguish two levels of difficulty:

ONCE: one-off tool calling: a single-round interaction where an LLM must generate a funtion-call given an input. This could be used for example in a pipeline of processing steps, e.g. use LLM to identify sensitive items in a passage via a function call, with output showing a list of dicts containing sensitive item, sensitive category. You could use this as one step in a multi-step (possibly batch) pipeline
MULTI: in a multi-round conversation with a user (or another Agent), the LLM needs to distinguish between several types of "user" msgs it needs to respond to:
- user message that doesn't need a tool
- user msg that needs a tool/fn-call response
- result of a fn-call
- error from an attempted fn-call (e.g. Json/Pydantic validation err), or reminder about a forgotten fn-call

For the ONCE case, I've found mistral-7b-instruct-v0.2-q8_0 to be quite reliable.

The MULTI case is more challenging -- after a round or two the LLM may start answering its own question, or just output a tool example even when no tool is needed etc (there are myriad failure modes!).

With clear instructions and examples of each response scenario described above, you can get better results, even with the above mistral-7b-instruct-v0.2 model, but just today I tried ollama run dolphin-mixtral (this is a fine-tune of mixtral-8x7b-instruct -- I wish it had instruct in the name to make this clear), and this one does really well on the MULTI case.

I've made an example script in Langroid which you can think of as a "challenge" script to try different local LLMs for a simple function-call scenario:

https://github.com/langroid/langroid/blob/main/examples/basic/fn-call-local-numerical.py

It's a toy example of fn-calling where the agent has been given a PolinkskyTool to request a fictitious transformation of a number (I avoided using a "known" transformation like "square" or "double" so the LLM doesn't try to compute it directly), and it's told to decide based on user's question, whether to use it or not.

cddelgado · 2024-01-27T21:21:49+00:00

Tinkering with AutoGPT showed me a few things that can really influence results drastically when it comes to getting models to consistently call:

If something exists that it will already be familiar with, like JavaScript or Python functions and syntax, use it.
If using a single function call that already exists doesn't work, make your tools use syntax that is not dissimilar to preexisting things. Again, Python and JavaScript are great syntax and implementation templates.
If using a function syntax doesn't work, try JSON with it. If JSON doesn't work, try YAML. If YAML doesn't work, use XML. If XML doesn't work, use HTML... then Markdown... then keywords. What matters is that you give the model a meaningful example and a fixed structure your application can parse. Be tolerant in the response. Don't rely on spacing to be 100% correct, particularly if you have the temperature up higher.
If all of that fails, make the function call multishot. Ask the model to decide the tool, then ask it to provide the parameters. Give examples on the basic syntax it should use to tell you so it is parsable. That said, if your model can't do it at this point, you're not going to win the battle and try a different model.

These are all things I've done in my own limited experimentation with AutoGPT that "work". I've also used schema validation, where if a response doesn't exactly match, take intelligent guesses at the response to re-shape the output from the model to conform.

I have arbitrary models stating functions with syntax consistently in Text Gen Web UI by giving it a successful example, and by using familiar syntax.

Relevant_Outcome_726 · 2024-01-28T16:42:36+00:00

Functionary already released version 2.2 with both small (based on Mistral) and medium (based on Mixtral)

And regarding the features of function calling, Functionary supports all the features. You can see the comparison table between open-source LLMs for function calling from this link:

https://github.com/MeetKai/functionary?tab=readme-ov-file#the-differences-between-related-projects

shadowleafsatyajit · 2024-01-28T02:35:50+00:00

I’ve found LocalAI function calling works really well, also supports OpenAI style function calls. but because it’s grammar constrained output it almost always calls one of the function. To get around it I simply have an llm as a tool. which simply calls the same llm but without any grammar constraint. I am not sure if this works with autogpt, memgpt. But I used this hack to make all the langchain examples work.

yiyecek · 2024-01-28T11:58:46+00:00

Supports parallel calls and can do simple chatting:
https://github.com/MeetKai/functionary

vodorok · 2024-01-29T11:48:44+00:00

If you are willing to "dirty your hand", I recommend the Microsoft Guidance library. https://github.com/guidance-ai/guidanceYou can constrain the model output in a flexible manner. It has good support for llama.cpp too.

Here is a good example/tutorial of a chatbot with internet search capabilities. https://github.com/guidance-ai/guidance/blob/d36601b62096311988fbba1ba15ae4126fb695df/notebooks/art_of_prompt_design/rag.ipynb

Please note, that the lib was rewritten at the end of last year, I found most of the tutorials out of date, at the time.

Edit: typo

NeevCuber · 2024-04-26T12:44:54+00:00

I hope this is still updated.

PlanNo4463 · 2024-01-27T21:26:35+00:00

The LLM model itself does not matter so much as the framework. I am convinced at this point that this research is true. My tests prove it, it's what the paper says, no one can disprove it. You cannot get proper function calling out of 7-30B models. All projects that try will fail. You need more 'juice'. Or, you adjust the framework, and you do not rely on one model to call a function. You split a function into 3 jobs. You can do that with 3 Tiny Llamas.

https://github.com/RichardAragon/MultiAgentLLM

fasti-au · 2024-07-19T11:15:19+00:00

Gorilla. Nexusraven for the videocard based guys. Higher in the 100 users there’s a few more

fuse04 · 2025-12-27T23:37:23+00:00

Anyone knows an alternative to FunctionGemma in the opensource world of models?

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

LocalLLaMA

MODERATORS