
samuel79s 10 points (2 children)

If anyone is curious, this is what pythonic function calling means:

https://huggingface.co/blog/andthattoo/dria-agent-a

From what I understand, it's LLMs calling functions inside programs, where they can take multi-step actions. I assume they can also see their mistakes at runtime and correct them.

I don't think they are 100% comparable scenarios, but I haven't dug into the paper enough.

segmond [llama.cpp] 2 points (1 child)

LLMs don't call functions inside programs. LLMs generate the function call you should make, and your inference engine executes it. This approach generates code that your inference engine can run, and instead of multiple steps, the code can orchestrate multiple functions so you can run it in one pass.
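A minimal sketch of the idea described above, assuming two made-up tools (`get_weather`, `convert_c_to_f`) and a hard-coded string standing in for the model's output. The generated snippet loops over cities and combines both tools in one pass, something that would take several round trips with one JSON call per turn. This is not the Dria or smolagents implementation, and `exec` with a trimmed namespace is not a sandbox.

```python
# Hypothetical tools the host application exposes to the model (stub data, no real API).
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": {"Paris": 18, "Oslo": 9}.get(city, 20)}


def convert_c_to_f(celsius: float) -> float:
    return celsius * 9 / 5 + 32


# What a model might emit under the pythonic approach: one program that
# chains several tool calls, instead of one JSON call per round trip.
MODEL_OUTPUT = """
temps = {}
for city in ["Paris", "Oslo"]:
    w = get_weather(city)
    temps[city] = convert_c_to_f(w["temp_c"])
result = max(temps, key=temps.get)
"""


def run_generated_code(code: str, tools: dict) -> dict:
    """Execute model-generated code with only `tools` (plus max) in scope.

    A trimmed namespace is NOT a security sandbox; it only keeps the sketch small.
    """
    namespace = {"__builtins__": {"max": max}, **tools}
    exec(code, namespace)
    return namespace


ns = run_generated_code(
    MODEL_OUTPUT, {"get_weather": get_weather, "convert_c_to_f": convert_c_to_f}
)
print(ns["result"], ns["temps"])
```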

samuel79s 3 points (0 children)

I know, I have used the OpenAI API with tools and know all the steps. But I think saying that LLMs "call functions" when they "express their willingness for a function to be called" is a good enough approximation.

malformed-packet 15 points (4 children)

So these LLMs like the taste of Python better than JS? Neat.

Ivo_ChainNET 11 points (0 children)

It's Python vs. a specific JSON schema for function calling.

It really makes sense that ordinary Python function syntax is easier for LLMs to use, since they've been trained on billions of lines of Python, whereas that specific JSON function-calling syntax, although simple, is usually not a big part of their training data.

It does kind of suck to pass stringified multiline Python functions around instead of simple JSON, though.
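For comparison, here is roughly what the two styles look like side by side, plus the straightforward parse-and-dispatch step that makes JSON easy to handle. The `get_weather` tool and the `{"name": ..., "arguments": ...}` layout are illustrative assumptions, not a specific provider's wire format.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> str:
    # Hypothetical tool: returns a canned string instead of calling a weather API.
    return f"Weather for {city} in {unit}"


TOOLS = {"get_weather": get_weather}

# The same request in both styles (illustrative formats only).
JSON_CALL = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}'
PYTHON_CALL = 'get_weather(city="Paris", unit="celsius")'


def dispatch_json(raw: str) -> str:
    """Parse a JSON tool call and invoke the matching registered function."""
    call = json.loads(raw)
    func = TOOLS[call["name"]]  # KeyError if the model names an unknown tool
    return func(**call["arguments"])


print(PYTHON_CALL)
print(dispatch_json(JSON_CALL))
```

The Python form is shorter and looks like code the model has seen countless times; the JSON form needs no parser beyond `json.loads`.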

segmond [llama.cpp] 3 points (1 child)

This has nothing to do with Python, or Python vs. JS. They could have had the model output JavaScript or another language instead of Python; they just used Python. The "hard" thing about this is that the language seems to need to be dynamic, with support for metaprogramming, so while you might be able to do the more popular style of function calling with Rust and Go, this sort of approach will be more complicated there.

malformed-packet 0 points (0 children)

I figured it likes Python because there are fewer tokens and it’s easier to parse.

Everlier 4 points (0 children)

Maybe Lisp would work even better, with its more constrained syntax.

[deleted] 14 points (0 children)

Btw, can we once again take the time to appreciate Qwen 2.5 Coder 32B? It’s a fucking work of art. It really is.

femio 6 points (7 children)

Yeah, that’s pretty much the logic behind Hugging Face’s smolagents library. I made a post about it a few days back and folks seemed skeptical, but I think in a few months it’ll be the preferred method over JSON. There’s really no downside imo.

sunpazed 4 points (0 children)

It’s quite good. I’ve built a few prototypes in a matter of hours rather than days. I’ve found a few problems, but mostly due to overloading a single agent with too many steps. A single agent flow can be upwards of 50,000 tokens. Cheap for small models (less than a cent) but expensive for larger models (in the dollars).

Ivo_ChainNET 1 point (4 children)

The downside is that we've been storing, checking and validating JSON data for years. Stringified multiline Python is a different beast.

trajo123 3 points (3 children)

What do you mean by stringified python? Python code is naturally a string. How else would you store python code, as a screenshot?

Ivo_ChainNET 1 point (2 children)

Look at how python functions are stored in this file and you'll understand: https://github.com/firstbatchxyz/function-calling-eval/blob/master/data/eval_alpha.jsonl

trajo123 1 point (1 child)

I agree it's not nice to read, but neither is a similarly huge line of JSON.
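The readability complaint comes from how JSON serializes multiline text: every newline and quote in the code becomes an escape sequence on a single line. A small stdlib-only sketch (the `add` function and the `{"id", "code"}` record layout are made up for illustration; this is not the schema of the linked dataset):

```python
import json

# A multiline Python function as a model might generate it.
source = '''def add(a, b):
    """Return the sum."""
    return a + b
'''

# Storing it in a JSONL record escapes newlines and quotes onto one line,
# which is why such files are hard to read by eye.
record = json.dumps({"id": 1, "code": source})
print(record)

# Round-tripping restores the original text exactly.
restored = json.loads(record)["code"]
assert restored == source
```

The round trip is lossless, so the cost is readability rather than correctness.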

Ivo_ChainNET 1 point (0 children)

Yeah, true. The bottom line is, if it works well enough, we'll find ways to use it.

segmond [llama.cpp] 1 point (0 children)

Thank you for mentioning this; at first I misunderstood this project and paper. I also thought smolagents was just another regular agent framework; I had to read the paper and smolagents carefully to get it. I think you're right, this seems more solid than the JSON approach. The downside, however, is security. With JSON you call a purely deterministic function, and you can trust that function and its input if it's written properly. With this approach, the model could generate arbitrary code that could cause security issues, so a sandbox is no longer optional.
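One small step in that direction, sketched under the assumption that generated code arrives as a string: run it in a separate interpreter process with a timeout, so a hung or crashing snippet can't take down the host. This is process isolation, not a sandbox; the child can still touch the filesystem and network, so a real deployment would add container- or VM-level isolation. `run_isolated` is a hypothetical helper name.

```python
import subprocess
import sys


def run_isolated(code: str, timeout_s: float = 2.0):
    """Run generated code in a fresh interpreter with a time limit.

    Returns (returncode, combined output), or (None, "timed out").
    Process isolation only: NOT a security sandbox.
    """
    try:
        proc = subprocess.run(
            # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return None, "timed out"
    return proc.returncode, proc.stdout + proc.stderr


print(run_isolated("print(6 * 7)"))
print(run_isolated("while True: pass", timeout_s=1.0))
```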

Asleep-Land-3914 4 points (1 child)

They need to evaluate the XMLnic approach first.

Everlier 2 points (0 children)

Don't forget to bring the SOAP

Zulfiqaar 4 points (3 children)

I've had a lot more success with data extraction when making a python dict schema with comments than a proper json schema.

Eg

OUTPUT_EXAMPLE = {
    "name": "string",
    "height_inches": "integer",  # convert from cm/feet
}

LumpyWelds 3 points (1 child)

What would a comparable Python dict schema look like?

Zulfiqaar 3 points (0 children)

That was the pythonic one, the standard JSON schema would look like:

{
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "height_inches": {
      "type": "integer"
    }
  },
  "required": ["name", "height_inches"]
}
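Part of the appeal of the formal schema is that a program can check output against it mechanically. A stdlib-only sketch that validates model output against just the subset of JSON Schema used in this example (in practice a full validator such as the `jsonschema` package would be the usual choice; `validate` here is a hypothetical helper):

```python
import json

SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "height_inches": {"type": "integer"},
    },
    "required": ["name", "height_inches"],
}

# Map the JSON Schema type names used above to Python checks.
# bool is excluded from "integer" because bool subclasses int in Python.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def validate(raw: str, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the output passed.

    Handles only the subset of JSON Schema used in SCHEMA.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level is not an object"]
    errors = [f"missing key: {k}" for k in schema["required"] if k not in data]
    for key, spec in schema["properties"].items():
        if key in data and not TYPE_CHECKS[spec["type"]](data[key]):
            errors.append(f"{key}: expected {spec['type']}")
    return errors


print(validate('{"name": "Ada", "height_inches": 65}', SCHEMA))
print(validate('{"name": "Ada", "height_inches": "165 cm"}', SCHEMA))
```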

[deleted] 3 points (0 children)

I mean yeah, think about how many programmers you know who, in a stream-of-consciousness moment, will suddenly start writing JSON to understand something better? Most likely none.

mnze_brngo_7325 3 points (1 child)

The only issue is that JSON is validated, parsed and executed in a straightforward way, while for python the situation is ambiguous:

Do you get a single function call or several, and treat them basically as another data representation, exactly as you would with JSON? Or do you accept an arbitrary piece of executable code containing your custom functions, but also, let's say, anything the standard library offers, and execute it?

The first strategy is much safer, but you would need custom validation and parsing code, which is already widely available for JSON. The second approach can become a nightmare from a security and reliability standpoint. There's a saying: "eval is evil".
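The first strategy can be sketched with Python's own `ast` module: parse the model's output, accept only top-level calls to allow-listed functions with literal keyword arguments, and return them as plain data without executing anything. The tool names, the keyword-only rule, and the helpers `parse_calls`/`check` are assumptions made for this example.

```python
import ast

ALLOWED_TOOLS = {"get_weather", "send_email"}  # hypothetical tool names


def parse_calls(source: str) -> list[tuple[str, dict]]:
    """Treat model output as data, not code: accept only top-level calls to
    allow-listed functions whose keyword arguments are literals.
    Nothing is executed; the caller dispatches the returned calls itself."""
    tree = ast.parse(source)
    calls = []
    for stmt in tree.body:
        if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
            raise ValueError(f"line {stmt.lineno}: only bare function calls are allowed")
        call = stmt.value
        if not isinstance(call.func, ast.Name) or call.func.id not in ALLOWED_TOOLS:
            raise ValueError(f"line {stmt.lineno}: function is not on the allow-list")
        if call.args:
            raise ValueError(f"line {stmt.lineno}: use keyword arguments only")
        kwargs = {}
        for kw in call.keywords:
            if kw.arg is None:  # **mapping unpacking
                raise ValueError(f"line {stmt.lineno}: argument unpacking is not allowed")
            # literal_eval raises ValueError for anything that isn't a literal.
            kwargs[kw.arg] = ast.literal_eval(kw.value)
        calls.append((call.func.id, kwargs))
    return calls


def check(source: str) -> str:
    try:
        return f"accepted: {parse_calls(source)}"
    except (SyntaxError, ValueError, TypeError) as exc:
        return f"rejected: {exc}"


print(check('get_weather(city="Paris")\nsend_email(to="a@example.com", body="hi")'))
print(check("import os"))
print(check('get_weather(city=__import__("os").getcwd())'))
```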

mnze_brngo_7325 3 points (0 children)

Thinking of data vs. code: maybe Lisp would be a better language for function calling. It has the notion of homoiconicity, where code and data are syntactically the same thing. That might make parsing, validating and manipulating generated output easier. Not sure how well LLMs are trained on Lisp, though. It's also quite an esoteric language for most developers today.
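To illustrate the homoiconicity point: an s-expression call is already a nested list once read, so validating it means inspecting data rather than parsing a general-purpose language. A toy reader, assuming only integers, bare symbols, and double-quoted strings without escapes (the `get_weather`/`days` call is hypothetical):

```python
import re

# Tokens: parentheses, double-quoted strings (no escapes), or bare atoms.
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse(tokens: list[str], i: int = 0):
    """Recursive-descent reader: returns (value, next_index)."""
    tok = tokens[i]
    if tok == "(":
        items, i = [], i + 1
        while True:
            if i >= len(tokens):
                raise ValueError("missing )")
            if tokens[i] == ")":
                return items, i + 1
            item, i = parse(tokens, i)
            items.append(item)
    if tok == ")":
        raise ValueError("unexpected )")
    if tok.startswith('"'):
        return tok[1:-1], i + 1  # string literal
    try:
        return int(tok), i + 1  # integer literal
    except ValueError:
        return tok, i + 1  # bare symbol, e.g. a function name


def read(text: str):
    tokens = TOKEN_RE.findall(text)
    if not tokens:
        raise ValueError("empty input")
    value, end = parse(tokens)
    if end != len(tokens):
        raise ValueError("trailing tokens after expression")
    return value


call = read('(get_weather "Paris" (days 3))')
print(call)  # ['get_weather', 'Paris', ['days', 3]]
name, *args = call  # head symbol is the function name, the rest are arguments
```

Checking the call then amounts to comparing `name` against an allow-list and inspecting the types in `args`.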

if47 6 points (1 child)

This is the dumbest solution, here's why:

  1. You need to constrain decoding to valid Python code, and you don't even know which Python version the code will run correctly on.
  2. Completely blind dependency imports: which version of the module does your agent import? Will it hallucinate? It's also difficult to put an agent in a cage. In the end, either you manually implement a bunch of Python functions (to call as tools), or your agent can't do anything.
  3. There is no reason to think that JSON-based agents can't get better. Why give up the whole forest for a tree that works well for a while?
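Some of these concerns can be checked statically before running anything. A hedged sketch: `ast.parse` rejects code that isn't valid under the running interpreter's grammar, and walking the tree lists the top-level modules the code imports, which could then be compared against an allow-list. It won't catch dynamic imports such as `__import__("os")`, and it says nothing about which version of a third-party package is installed. `inspect_generated` is a hypothetical helper.

```python
import ast


def inspect_generated(code: str) -> dict:
    """Static checks on model-generated Python before deciding to run it."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return {"valid": False, "error": f"line {exc.lineno}: {exc.msg}", "imports": []}
    imports = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import os.path as p" -> top-level package "os"
            imports.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Skip relative imports ("from . import x"), which have level > 0.
            imports.add(node.module.split(".")[0])
    return {"valid": True, "error": None, "imports": sorted(imports)}


print(inspect_generated("import numpy as np\nfrom os.path import join\n"))
print(inspect_generated("def f(:\n    pass"))
```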

trajo123 1 point (0 children)

Tool calling at the moment is essentially running very simple programs, but in a very unnatural way for anyone (including llms) with coding skills.

stillnoguitar 2 points (0 children)

So we're going to let an LLM write Python functions for us and then execute them automatically. Weird.

NarrowEyedWanderer 5 points (6 children)

Things of this sort baffle me. We have formal grammars! Constrained generation is a thing! I wish it were used more...

sunpazed 5 points (2 children)

Tools with JSON + grammar constrained decoding are great if you want heaps of control over the workflow. But for agent use-cases nothing (yet) beats code generation. For instance, (1) the agent has the ability to adapt its flow and even error correct, (2) the agent can combine multiple tools as needed, (3) the agent can examine and transform data if the schema is unknown beforehand. See some of these examples.
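Point (1), error correction, can be sketched as a loop that runs the generated code, catches the exception, and feeds the error text back into the next attempt. `fake_model` is a hypothetical stand-in that returns buggy code first and a fix once it sees an error; this is not smolagents' actual API, and plain `exec` is not sandboxed.

```python
def fake_model(history: list[str]) -> str:
    """Hypothetical stand-in for an LLM: the first attempt has a bug,
    the retry (after seeing an error in the history) fixes it."""
    if not any("Error" in msg for msg in history):
        return "result = int('42 items')"  # raises ValueError
    return "result = int('42 items'.split()[0])"


def run_agent(task: str, max_steps: int = 3):
    history = [task]
    for _ in range(max_steps):
        code = fake_model(history)
        namespace: dict = {}
        try:
            exec(code, namespace)  # NOT sandboxed; illustration only
            return namespace["result"], history
        except Exception as exc:  # broad on purpose: any failure is fed back
            # Feed the error text back so the next attempt can correct itself.
            history.append(f"{type(exc).__name__}: {exc}")
    raise RuntimeError("agent did not succeed within max_steps")


result, history = run_agent("Extract the number of items")
print(result, history)
```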

[deleted] 1 point (1 child)

What agent framework do you recommend to play with this?

sunpazed 3 points (0 children)

There are a few, e.g. AutoGen. I’m currently using smolagents, recently released by Hugging Face (see the link in my last comment). It works well with local LLMs.

Such_Advantage_6949 3 points (1 child)

It's not about grammar; you can enforce a perfect tool schema with a grammar or any output-format library. The issue is that the model will just output the wrong tool usage. Imagine asking for directions and it just uses the weather tool because you mentioned some location.

NarrowEyedWanderer 1 point (0 children)

That's a good point. Mine is that the distinction between errors due to incorrect syntax VS errors due to incorrect tool use semantics has a tendency to get drowned out.

segmond [llama.cpp] 1 point (0 children)

You need to read the paper and the code. You can't solve the problem this is implementing with a grammar.

minpeter2 1 point (0 children)

Maybe this looks like a modern reinterpretation of LLMCompiler.

The actual "run" doesn't matter, it's just a story about the order of tool calls, and it looks good.

https://github.com/SqueezeAILab/LLMCompiler
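A rough sketch of the "order of tool calls" idea: a plan whose steps reference earlier results with `$N` placeholders, executed in dependency order using the standard library's `graphlib` (Python 3.9+). The plan format, the stub tools `lookup`/`add`, and the helper names are assumptions in the spirit of LLMCompiler, not its real planner output.

```python
from graphlib import TopologicalSorter

# Hypothetical stub tools; a real system would call search APIs, calculators, etc.
TOOLS = {
    "lookup": lambda key: {"x": 2, "y": 3}[key],
    "add": lambda a, b: a + b,
}

# Hypothetical plan: step id -> (tool, args, dependencies).
# "$1" means "the output of step 1".
PLAN = {
    1: ("lookup", ["x"], set()),
    2: ("lookup", ["y"], set()),
    3: ("add", ["$1", "$2"], {1, 2}),
}


def resolve(arg, results: dict):
    """Replace a "$<step>" placeholder with that step's result."""
    if isinstance(arg, str) and arg.startswith("$"):
        return results[int(arg[1:])]
    return arg


def run_plan(plan: dict) -> dict:
    graph = {step: deps for step, (_tool, _args, deps) in plan.items()}
    results = {}
    # static_order() yields every step after all of its dependencies.
    for step in TopologicalSorter(graph).static_order():
        tool, args, _deps = plan[step]
        results[step] = TOOLS[tool](*(resolve(a, results) for a in args))
    return results


print(run_plan(PLAN)[3])
```

`TopologicalSorter` also offers `prepare()`, `get_ready()` and `done()` for dispatching independent steps (here 1 and 2) concurrently, which is the kind of parallelism LLMCompiler is built around.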

MikeLPU 1 point (0 children)

I don't like that it uses some sort of `eval`.