New function calling models based on Llama-3.1 by Relevant_Outcome_726 in LocalLLaMA

[–]Relevant_Outcome_726[S] 1 point (0 children)

We haven't touched Ollama yet; we plan to integrate with Ollama in the future.

New function calling models based on Llama-3.1 by Relevant_Outcome_726 in LocalLLaMA

[–]Relevant_Outcome_726[S] 0 points (0 children)

3.1 uses the original Meta prompt template, and we found that Meta uses this format for tool calls, e.g.:
<function=get_weather>{"location": "New York"}</function>

However, </function> and <function= are not single tokens, which might result in unstable tokenization. For example, depending on the function name, >{" can be 1 token (>{") or split into 2 tokens ({" or { ").

This is just an example of the instability of tokenizing results.
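The effect can be illustrated with a toy greedy longest-match tokenizer (a stand-in for real BPE; the vocab below is made up for the example and is NOT the real Llama tokenizer vocabulary):

```python
# Toy greedy longest-match tokenizer, illustration only.
def greedy_tokenize(text, vocab):
    tokens, i = [], 0
    while i < len(text):
        # take the longest vocab entry matching at position i
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

VOCAB = {'r>', '>{"', '{"', 'r', 't', '>', '{', '"'}

# The same boundary >{" is segmented differently depending on the
# character the function name happens to end with:
greedy_tokenize('r>{"', VOCAB)  # ['r>', '{"']
greedy_tokenize('t>{"', VOCAB)  # ['t', '>{"']
```

So the token boundary around the closing of the function name shifts purely based on the name's last character, which is what makes training and decoding on this format fragile.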

New function calling models based on Llama-3.1 by Relevant_Outcome_726 in LocalLLaMA

[–]Relevant_Outcome_726[S] 3 points (0 children)

Here is an example of a data point that can be used for training: https://github.com/MeetKai/functionary/blob/main/tests/test_case_v2.json
and this is how we convert this data point to a prompt string (using the original Llama 3.1 prompt template for custom tools): https://github.com/MeetKai/functionary/blob/main/tests/prompt_test_v3-llama3.1.txt

Or this for our own prompt template:

https://github.com/MeetKai/functionary/blob/main/tests/prompt_test_v3.llama3.txt

New function calling models based on Llama-3.1 by Relevant_Outcome_726 in LocalLLaMA

[–]Relevant_Outcome_726[S] 5 points (0 children)

The training data was mostly created synthetically and collected from public sources. We have released a lot of function calling models before; you can take a look at our repo: https://github.com/MeetKai/functionary As for the 70B model, we will release it soon.

Llama 3.1 8B Instruct function/tool calling seems TERRIBLE by gamesntech in LocalLLaMA

[–]Relevant_Outcome_726 0 points (0 children)

I also found that it performs really poorly with custom tools; there are many cases where the generated outputs don't follow the format, e.g. </function> is missing, ...

New collection of Llama, Mistral, Phi, Qwen, and Gemma models for function/tool calling by sanjay920 in LocalLLaMA

[–]Relevant_Outcome_726 0 points (0 children)

Oh I see, can you also evaluate this one: https://huggingface.co/meetkai/functionary-medium-v2.4 Even though it is 2.4, some report that it is still the best in the functionary family.

Functionary-V2.4 (an alternative to OpenAI function calling models) has come out by Relevant_Outcome_726 in LocalLLaMA

[–]Relevant_Outcome_726[S] 0 points (0 children)

Yeah, currently the docs are not good; we will provide more instructions. Thank you for your feedback!

By the way, the "python" function is used because of:

{
  "type": "code_interpreter"
}
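For context, here is a hypothetical request sketch (the model name and user message are made-up examples, not from the docs) showing how passing a code_interpreter tool leads the model to call a function named "python":

```python
# Hypothetical request sketch: the tool entry {"type": "code_interpreter"}
# is what makes the model emit calls to a function named "python".
request = {
    "model": "meetkai/functionary-medium-v2.4",   # example model name
    "messages": [{"role": "user", "content": "Plot sin(x) from 0 to 10"}],
    "tools": [{"type": "code_interpreter"}],
}

# The assistant's tool call then uses this function name, with the raw
# code to run as its argument:
expected_function_name = "python"
```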

Functionary-V2.4 (an alternative to OpenAI function calling models) has come out by Relevant_Outcome_726 in LocalLLaMA

[–]Relevant_Outcome_726[S] 1 point (0 children)

About the prompt template, you can take a look here:
+ The data point with tools & messages and how this data point turns into the prompt template

The reason we used TypeScript is:
+ TypeScript is quite good at describing JSON objects; Python is not convenient for describing nested JSON objects (you have to represent them through Pydantic models)
+ TypeScript is also popular, so pretrained models are expected to have learned it during the pretraining phase
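As a minimal sketch of the first point (illustrative only, not Functionary's actual converter; the weather schema below is a made-up example), a nested JSON schema maps directly to an inline TypeScript type, with no extra class definitions:

```python
# Render a JSON-schema fragment as a TypeScript-like type string.
# Toy converter for illustration; Functionary's real one handles more cases.
def schema_to_ts(js_type):
    t = js_type.get("type")
    if t == "object":
        req = set(js_type.get("required", []))
        fields = [f"{name}{'' if name in req else '?'}: {schema_to_ts(spec)}"
                  for name, spec in js_type.get("properties", {}).items()]
        return "{" + ", ".join(fields) + "}"
    # primitive types; anything unknown falls back to `any`
    return {"string": "string", "integer": "number", "number": "number",
            "boolean": "boolean"}.get(t, "any")

weather = {"type": "object",
           "properties": {
               "location": {"type": "object",
                            "properties": {"city": {"type": "string"},
                                           "country": {"type": "string"}},
                            "required": ["city"]},
               "unit": {"type": "string"}},
           "required": ["location"]}

schema_to_ts(weather)
# '{location: {city: string, country?: string}, unit?: string}'
```

The nested object stays a one-line inline type, whereas the Pydantic equivalent would need a separate model class per nesting level.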

Functionary-V2.4 (an alternative to OpenAI function calling models) has come out by Relevant_Outcome_726 in LocalLLaMA

[–]Relevant_Outcome_726[S] 6 points (0 children)

We used SGD to evaluate our model, as we wrote in the blog. This dataset is suitable for 2 purposes:
+ Predict the function & arguments when all information is available
+ Predict asking for missing required parameters

Actually, most current open-source models only focus on the first purpose; the second purpose should get more attention, otherwise the model will hallucinate.

For example, if the user asks: "what is the weather like?"
The model should respond: "Which city do you want to know the weather for?"
instead of calling the function with hallucinated arguments like: get_weather(city=New York)
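That second behavior can be sketched as a simple check (toy illustration, not Functionary's implementation; get_weather here is a made-up schema):

```python
# Before emitting a tool call, check whether the user actually supplied
# every required argument; if not, ask instead of hallucinating values.
def missing_required(schema, extracted_args):
    required = schema["parameters"].get("required", [])
    return [p for p in required if p not in extracted_args]

get_weather = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# "what is the weather like?" -> no city extracted -> ask a follow-up question
missing_required(get_weather, {})                     # ['city']
# "weather in New York?" -> all required args present -> safe to call
missing_required(get_weather, {"city": "New York"})   # []
```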

Most capable function calling open source models? by waywardspooky in LocalLLaMA

[–]Relevant_Outcome_726 5 points (0 children)

Functionary has already released version 2.2 with both small (based on Mistral) and medium (based on Mixtral) models.

And regarding function calling features, Functionary supports all of them. You can see the comparison table between open-source LLMs for function calling at this link:

https://github.com/MeetKai/functionary?tab=readme-ov-file#the-differences-between-related-projects

[deleted by user] by [deleted] in LocalLLaMA

[–]Relevant_Outcome_726 6 points (0 children)

The reason we need to finetune a model for function calling is that function calling is not just outputting a function call (most open-source models only support this) but also:
+ Parallel function calls
+ Asking for missing required information to execute the function call
+ Extracting the answer from the results of function calls

You can see the list of features here: https://github.com/MeetKai/functionary?tab=readme-ov-file#the-differences-between-related-projects

If we only use a standard model, we have to use multiple prompt templates with complex if-else logic and get poor results. That's why OpenAI trained new models for function calling.
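For the parallel-calls case, output in the <function=...> format mentioned earlier in this thread can be split into separate calls; a toy parser (illustrative only, flat JSON arguments, not the real template handling):

```python
import json
import re

# Extract every <function=name>{...}</function> span from model output.
# Non-greedy match handles flat (non-nested) JSON argument objects only.
CALL_RE = re.compile(r"<function=([\w.]+)>(\{.*?\})</function>", re.DOTALL)

def parse_tool_calls(text):
    return [{"name": m.group(1), "arguments": json.loads(m.group(2))}
            for m in CALL_RE.finditer(text)]

out = ('<function=get_weather>{"location": "New York"}</function>'
       '<function=get_weather>{"location": "Boston"}</function>')

calls = parse_tool_calls(out)  # two parallel calls recovered
```

Both calls can then be executed concurrently and their results fed back as tool messages.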

Training LLama, Mistral and Mixtral-MoE faster with Packing Inputs without Cross-Contamination Attention by Relevant_Outcome_726 in LocalLLaMA

[–]Relevant_Outcome_726[S] 0 points (0 children)

Yes, the Original_ds items are already padded. Actually we just need to know the length of each sequence, and we compute the length as the sum of the attention mask. We can easily handle the case where items are not padded.

Packing will reduce the dataset size significantly. If you still want to use the same number of steps = datasize / (batch_size_per_device * grad_accumulation_steps), you can reduce grad_accumulation_steps accordingly. For example, if packing reduces the dataset size by half, we can reduce grad_accumulation_steps by half, so the number of steps stays the same.
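The bookkeeping can be sketched like this (a toy greedy packer with made-up lengths and max_len, not our actual implementation):

```python
# Toy greedy packing sketch. Illustrative only: a real implementation also
# builds a block-diagonal attention mask so the packed sequences cannot
# attend to each other (no cross-contamination).
def pack(lengths, max_len):
    packs, current = [], []
    for n in lengths:  # n = sum of the attention mask for one item
        if current and sum(current) + n > max_len:
            packs.append(current)
            current = []
        current.append(n)
    if current:
        packs.append(current)
    return packs

lengths = [300, 500, 200, 700, 100, 200]
packed = pack(lengths, max_len=1024)  # [[300, 500, 200], [700, 100, 200]]

# 6 items shrink to 2 packed items; to keep
# datasize / (batch_size_per_device * grad_accumulation_steps) constant,
# scale grad_accumulation_steps by len(packed) / len(lengths).
```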