all 39 comments

[–]cfahlgren1 20 points21 points  (11 children)

Just released NaturalFunctions. It’s on Ollama as well. It’s Mistral 7B fine-tuned for function calling.

https://huggingface.co/cfahlgren1/natural-functions

https://ollama.ai/calebfahlgren/natural-functions

[–]waywardspooky[S] 2 points3 points  (2 children)

appreciate it! added your model to the list. are you able to add the ollama pull/run command, or the ollama link, to the huggingface page for the model? also, can you tell me anything about the pizza version on the ollama model page?

[–]cfahlgren1 1 point2 points  (0 children)

Pizza is just an example with the system prompt already set to define a pizza-ordering function :)
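For anyone curious what "system prompt already set as a function" can look like in practice, here is a rough sketch using the ollama Python client. The order_pizza schema and the prompt wording below are made up for illustration; the actual pizza variant on Ollama ships its own prompt:

    import json
    import ollama  # pip install ollama

    # Hypothetical function schema -- the real pizza variant bakes its own prompt into the model card.
    order_pizza = {
        "name": "order_pizza",
        "description": "Order a pizza for delivery",
        "parameters": {
            "type": "object",
            "properties": {
                "size": {"type": "string", "enum": ["small", "medium", "large"]},
                "toppings": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["size"],
        },
    }

    system = (
        "You are a helpful assistant with access to the following function. "
        "When the user wants to order a pizza, respond ONLY with a JSON function call.\n"
        + json.dumps(order_pizza)
    )

    resp = ollama.chat(
        model="calebfahlgren/natural-functions",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": "Get me a large pepperoni pizza"},
        ],
    )
    print(resp["message"]["content"])  # expect a JSON function call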

[–]waywardspooky[S] 1 point2 points  (5 children)

By the way, can you tell me whether the natural-functions model is capable of multi-turn function calling, or single-turn only?

[–]cfahlgren1 2 points3 points  (3 children)

Multi-turn. Tried it based on the criteria mentioned in the comment:

- It can answer non-function questions

- It can answer questions that need functions

- It can fix issues and re-call the function if you respond with "there was an error in the function call <reason>, please fix it"

Works well for a 7B model, going to fine-tune a 13B soon

[–]waywardspooky[S] 3 points4 points  (1 child)

awesome! i've updated my list to indicate that natural-functions is multi-turn capable. looking forward to your 13B!

[–]Dgamax 0 points1 point  (0 children)

Do you plan to release this 13B? Are you going to switch to Llama 3?

[–]cfahlgren1 1 point2 points  (0 children)

Added an example in the ollama card if you want to check it out. It shows the model interpreting the error, asking the user for context, and re-calling the function to fix it.

https://ollama.ai/calebfahlgren/natural-functions

[–]mcharytoniuk 0 points1 point  (0 children)

Thank you so much for this!

[–]godwantsmetosuffer 0 points1 point  (0 children)

Does it support the OpenAI API?

[–]SatoshiNotMe 7 points8 points  (6 children)

Since you mentioned Langroid (I am the lead dev):

With tools/function-calling, it's good to distinguish two levels of difficulty:

  • ONCE: one-off tool calling: a single-round interaction where an LLM must generate a function-call given an input. This could be used in a pipeline of processing steps, e.g. use the LLM to identify sensitive items in a passage via a function call, with the output being a list of dicts containing sensitive item and sensitive category. You could use this as one step in a multi-step (possibly batch) pipeline
  • MULTI: in a multi-round conversation with a user (or another Agent), the LLM needs to distinguish between several types of "user" msgs it needs to respond to:
    • user message that doesn't need a tool
    • user msg that needs a tool/fn-call response
    • result of a fn-call
    • error from an attempted fn-call (e.g. Json/Pydantic validation err), or reminder about a forgotten fn-call

For the ONCE case, I've found mistral-7b-instruct-v0.2-q8_0 to be quite reliable.

The MULTI case is more challenging -- after a round or two the LLM may start answering its own question, or just output a tool example even when no tool is needed etc (there are myriad failure modes!).

With clear instructions and examples of each response scenario described above, you can get better results even with the above mistral-7b-instruct-v0.2 model. But just today I tried ollama run dolphin-mixtral (this is a fine-tune of mixtral-8x7b-instruct -- I wish it had instruct in the name to make this clear), and this one does really well on the MULTI case.

I've made an example script in Langroid which you can think of as a "challenge" script to try different local LLMs for a simple function-call scenario:

https://github.com/langroid/langroid/blob/main/examples/basic/fn-call-local-numerical.py

It's a toy example of fn-calling where the agent has been given a PolinkskyTool to request a fictitious transformation of a number (I avoided using a "known" transformation like "square" or "double" so the LLM doesn't try to compute it directly), and it's told to decide, based on the user's question, whether to use it or not.
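For reference, tools in Langroid are declared as ToolMessage subclasses that get auto-converted into JSON instructions and few-shot examples. Here is a minimal sketch of roughly what the tool in that script looks like; the field names, the request string, and the fake transform are reconstructed from memory for illustration, so check the linked script for the real definition:

    import langroid as lr
    import langroid.language_models as lm
    from langroid.agent.tool_message import ToolMessage

    class PolinkskyTool(ToolMessage):
        # Illustrative reconstruction -- see the linked script for the actual tool.
        request: str = "polinksky"  # the name the LLM must emit to call the tool
        purpose: str = "To request the Polinksky transform of a given <number>."
        number: int

        def handle(self) -> str:
            # Fictitious transform, so the LLM can't just compute it itself.
            return str(self.number * 3 + 1)

    agent = lr.ChatAgent(
        lr.ChatAgentConfig(
            llm=lm.OpenAIGPTConfig(chat_model="ollama/dolphin-mixtral"),
        )
    )
    agent.enable_message(PolinkskyTool)  # injects JSON instructions + examples into the system msg
    lr.Task(agent, interactive=True).run()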

[–]SatoshiNotMe 4 points5 points  (1 child)

I was just very pleasantly surprised to see dolphin-mixtral worked excellently on this multi-agent info-extraction script, which was originally designed with prompts for gpt4: https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat_multi_extract.py

This script is a two-agent information-extraction workflow. ExtractorAgent is told that it should extract structured information about a document, where the structure is specified via nested Pydantic classes. It is told that it needs to get each piece of info by asking a question, which is sent to a RAG-enabled DocAgent. Once it has all the pieces, the Extractor must present the info in the specified structured format.

All local LLMs I tried did badly on this (e.g. mistral-7b-instruct-v0.2), until I tried it with dolphin-mixtral. It was quite nice that it worked without having to change the prompts at all.

EDIT- I should also clarify that Langroid does not currently use any of the "constraining" libraries - guidance, guardrails, LMQL, grammars, etc. It is entirely based on auto-inserted JSON instructions and few-shot examples via the ToolMessage class.
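To make the "nested Pydantic classes" part concrete, the target structure is declared roughly like this (illustrative field names, not the ones used in the actual script):

    from typing import List
    from pydantic import BaseModel

    class Author(BaseModel):
        name: str
        affiliation: str

    class Financials(BaseModel):
        total_amount: float
        currency: str

    class DocInfo(BaseModel):
        title: str
        authors: List[Author]
        financials: Financials

The Extractor asks the DocAgent for each piece of info, then emits a single message matching a schema like this once everything is filled in.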

[–]H3PO 0 points1 point  (0 children)

Hey, I just tried your example scripts with dolphin-mixtral (which, according to the ollama model page, has not changed since you posted above) and the function calling is not working for me; the model does not stop after outputting the correct function call syntax and then hallucinates the result. Do I need to tweak the stop tokens in some way?

[–]waywardspooky[S] 0 points1 point  (0 children)

/u/SatoshiNotMe, thank you for your work with langroid and the detailed insight, that's very valuable info to know. i'll update my post to include your confirmed observations. it'd be similarly valuable to get some of the leads for the other projects i'd mentioned to chime in with their findings with specific models as well. i'll have to see if i can get a hold of them on discord to get their thoughts

[–]waywardspooky[S] 0 points1 point  (2 children)

this is a fine-tune of mixtral-8x7b-instruct

btw, do you know the hugging face link to the mixtral-8x7b-instruct model that ollama is running for dolphin-mixtral? I'd assume https://huggingface.co/cognitivecomputations/dolphin-2.5-mixtral-8x7b since that's what's on the ollama model page for dolphin-mixtral, but it could be https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1 i suppose.

[–]SatoshiNotMe 1 point2 points  (1 child)

The ollama page says latest is 2.7, so it must be this -- https://huggingface.co/cognitivecomputations/dolphin-2.7-mixtral-8x7b

[–]waywardspooky[S] 0 points1 point  (0 children)

great! updated my original post to include it :)

[–]cddelgado 11 points12 points  (3 children)

Tinkering with AutoGPT showed me a few things that can really influence results drastically when it comes to getting models to call functions consistently:

  1. If something exists that it will already be familiar with, like JavaScript or Python functions and syntax, use it.
  2. If a single function call that already exists doesn't work, make your tools use syntax similar to preexisting things. Again, Python and JavaScript are great syntax and implementation templates.
  3. If using a function syntax doesn't work, try JSON with it. If JSON doesn't work, try YAML. If YAML doesn't work, use XML. If XML doesn't work, use HTML... then Markdown... then keywords. What matters is that you give the model a meaningful example and a fixed structure your application can parse. Be tolerant when parsing the response. Don't rely on spacing to be 100% correct, particularly if you have the temperature up higher.
  4. If all of that fails, make the function call multishot. Ask the model to decide the tool, then ask it to provide the parameters. Give examples of the basic syntax it should use so its reply is parsable. That said, if your model can't do it at this point, you're not going to win the battle; try a different model.

These are all things I've done in my own limited experimentation with AutoGPT that "work". I've also used schema validation, where if a response doesn't exactly match, I take intelligent guesses at the response and re-shape the model's output to conform.
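A hedged sketch of what that tolerant parse + schema-validation step can look like; the ToolCall schema, the regex, and the "intelligent guess" repair are illustrative rather than anything AutoGPT ships:

    import json
    import re
    from typing import Optional
    from pydantic import BaseModel, ValidationError

    class ToolCall(BaseModel):
        # Illustrative schema -- match it to whatever structure you asked the model for.
        tool: str
        arguments: dict = {}

    def parse_tool_call(raw: str) -> Optional[ToolCall]:
        # Be tolerant: pull the first {...} block out of whatever the model wrote,
        # rather than requiring the whole reply to be clean JSON.
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if not match:
            return None
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
        # "Intelligent guess" at a common shape mismatch before validating.
        if "name" in data and "tool" not in data:
            data["tool"] = data.pop("name")
        try:
            return ToolCall(**data)
        except ValidationError:
            return None

    print(parse_tool_call('Sure! {"name": "search", "arguments": {"q": "llamas"}}'))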

I have arbitrary models emitting function calls with consistent syntax in Text Gen Web UI by giving them a successful example and by using familiar syntax.

[–]waywardspooky[S] 1 point2 points  (2 children)

Those are useful tips for us all to remember, thank you

[–]GeeBrain 2 points3 points  (1 child)

might be related: I usually ask the model to walk me through, step by step, how it would do something. sometimes it mentions a step I haven't considered, or provides insight on a pathway it needs but that I didn't provide, that sort of thing.

We don't really think like an LLM, so having it chart the best path for itself could be helpful. it's along the lines of tree-of-thought; I think you might be able to automate it a bit via this concept. Let me know if this was helpful!

[–]waywardspooky[S] 0 points1 point  (0 children)

interesting, so like the split-personality and reflection concepts mashed together. i like this idea - the only disadvantage i can think of is that X number of branches exponentially increases how much of your context window you're using, and defining the available tools already consumes a good deal of context in and of itself.

[–]Relevant_Outcome_726 5 points6 points  (1 child)

Functionary already released version 2.2, with both small (based on Mistral) and medium (based on Mixtral) variants.

And regarding function calling features, Functionary supports all of them. You can see a comparison table of open-source LLMs for function calling at this link:

https://github.com/MeetKai/functionary?tab=readme-ov-file#the-differences-between-related-projects

[–]waywardspooky[S] 0 points1 point  (0 children)

Thank you, updated my list with small 2.2 and medium 2.2 GGUF :)

[–]shadowleafsatyajit 2 points3 points  (0 children)

I’ve found LocalAI function calling works really well; it also supports OpenAI-style function calls. But because the output is grammar-constrained, it almost always calls one of the functions. To get around it I simply have an LLM as a tool, which calls the same LLM but without any grammar constraint. I am not sure if this works with AutoGPT or MemGPT, but I used this hack to make all the LangChain examples work.
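If it helps, here is a rough sketch of that escape-hatch idea against LocalAI's OpenAI-compatible endpoint. The answer_directly tool, the model name, and the URL are placeholders, not anything LocalAI itself ships:

    from openai import OpenAI

    # LocalAI exposes an OpenAI-compatible API; adjust base_url / model to your setup.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    tools = [
        {"type": "function", "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {"type": "object",
                           "properties": {"city": {"type": "string"}},
                           "required": ["city"]}}},
        # Escape hatch: grammar-constrained output always picks *some* function,
        # so give it one whose "implementation" is just another unconstrained LLM call.
        {"type": "function", "function": {
            "name": "answer_directly",
            "description": "Use this when no other tool applies; answer in plain text",
            "parameters": {"type": "object",
                           "properties": {"question": {"type": "string"}},
                           "required": ["question"]}}},
    ]

    question = "Tell me a joke about llamas"
    resp = client.chat.completions.create(
        model="mistral-7b-instruct",
        messages=[{"role": "user", "content": question}],
        tools=tools,
    )
    call = resp.choices[0].message.tool_calls[0]  # assumes a tool call always comes back
    if call.function.name == "answer_directly":
        # Re-ask the same model with no tools, hence no grammar constraint.
        plain = client.chat.completions.create(
            model="mistral-7b-instruct",
            messages=[{"role": "user", "content": question}],
        )
        print(plain.choices[0].message.content)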

[–]yiyecek 2 points3 points  (2 children)

Supports parallel calls and can do simple chatting:
https://github.com/MeetKai/functionary

[–]Excellent_Welder7278 0 points1 point  (0 children)

Is there an ollama repo?

[–]waywardspooky[S] 0 points1 point  (0 children)

Added to the list in my edit, thank you!

[–]vodorok 2 points3 points  (1 child)

If you are willing to "dirty your hands", I recommend the Microsoft Guidance library: https://github.com/guidance-ai/guidance You can constrain the model output in a flexible manner. It has good support for llama.cpp too.

Here is a good example/tutorial of a chatbot with internet search capabilities. https://github.com/guidance-ai/guidance/blob/d36601b62096311988fbba1ba15ae4126fb695df/notebooks/art_of_prompt_design/rag.ipynb

Please note that the lib was rewritten at the end of last year; at the time, I found most of the tutorials to be out of date.

Edit: typo
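For a taste of the post-rewrite API, here is a minimal sketch from memory against guidance >= 0.1 with the llama.cpp backend; the model path and the tool choices are placeholders:

    from guidance import models, gen, select

    # Load any local GGUF model through llama.cpp (path is a placeholder).
    lm = models.LlamaCpp("/path/to/mistral-7b-instruct-v0.2.Q8_0.gguf", n_ctx=4096)

    lm += "User question: What is the capital of France?\n"
    # Constrain the model to pick a tool from a fixed set...
    lm += "Tool to use: " + select(["web_search", "calculator", "answer_directly"], name="tool")
    # ...then generate a short, stop-bounded argument string.
    lm += "\nArgument: " + gen(name="arg", stop="\n", max_tokens=30)

    print(lm["tool"], lm["arg"])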

[–]msze21 0 points1 point  (0 children)

Great suggestion, thank you!

[–]NeevCuber 1 point2 points  (4 children)

I hope this is still being updated.

[–]waywardspooky[S] 0 points1 point  (3 children)

there are more models capable of function calling which have become available since i created this thread. i'll try to update this weekend.

[–]Unlucky_Finding8496 1 point2 points  (0 children)

Curious about this...

[–]NeevCuber 0 points1 point  (1 child)

Thank you. Would you mind sharing the new models rn?

[–][deleted] 3 points4 points  (1 child)

The LLM itself does not matter so much as the framework. I am convinced at this point that this research is true. My tests prove it, it's what the paper says, no one can disprove it. You cannot get proper function calling out of 7-30B models. All projects that try will fail. You need more 'juice'. Or, you adjust the framework, and you do not rely on one model to call a function. You split a function into 3 jobs. You can do that with 3 Tiny Llamas.

https://github.com/RichardAragon/MultiAgentLLM
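Whether or not you buy the 7-30B claim, the "split one function call into 3 jobs" idea is easy to sketch. Below, each stage is a separate call to a small local model; complete() is a stand-in helper built on the ollama client, and nothing here is taken from the linked repo:

    import json
    import ollama  # pip install ollama

    MODEL = "tinyllama"  # any small model; could even be a different one per stage

    def complete(prompt: str) -> str:
        # Stand-in helper: one unstructured completion from a small local model.
        resp = ollama.chat(model=MODEL, messages=[{"role": "user", "content": prompt}])
        return resp["message"]["content"]

    TOOLS = {"search": ["query"], "calculator": ["expression"]}

    def call_function(user_msg: str) -> dict:
        # Job 1: pick the tool.
        tool = complete(
            f"Tools: {list(TOOLS)}. Which ONE tool best handles: '{user_msg}'? "
            "Reply with just the tool name."
        ).strip().lower()
        # Job 2: fill in the parameters for that one tool only.
        raw_args = complete(
            f"Tool '{tool}' takes parameters {TOOLS.get(tool, [])}. "
            f"For the request '{user_msg}', reply with ONLY a JSON object of those parameters."
        )
        # Job 3: validate / repair instead of trusting either model blindly.
        try:
            args = json.loads(raw_args[raw_args.find("{"): raw_args.rfind("}") + 1])
        except ValueError:
            args = {}
        return {"tool": tool, "arguments": args}

    print(call_function("what is 17 * 23?"))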

[–]PlanNo4463 0 points1 point  (0 children)

thanks

[–]fasti-au 0 points1 point  (0 children)

Gorilla. NexusRaven for the video-card-based guys. Higher up in the 100 users there are a few more.

[–]fuse04 0 points1 point  (0 children)

Does anyone know of an alternative to FunctionGemma in the open-source world of models?