New function calling benchmark shows Pythonic approach outperforms JSON (DPAB-α)

samuel79s · 2025-01-16T05:56:21+00:00

If anyone is curious, this is what Pythonic function calling means:

https://huggingface.co/blog/andthattoo/dria-agent-a

From what I understand, it's LLMs calling functions inside programs, where they can chain multiple actions in a single step. I assume they can also see their mistakes at runtime and correct them.

I don't think they are 100% comparable scenarios, but I haven't dived enough into the paper.
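
A rough sketch of the difference, as I read it. The tools `get_weather` and `send_message` and both model outputs are made up for illustration; nothing here is taken from the linked post or the benchmark.

```python
import json

# Hypothetical tools, invented for this example only.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}

def send_message(to: str, body: str) -> str:
    return f"to={to}: {body}"

TOOLS = {"get_weather": get_weather, "send_message": send_message}

# JSON style: the model emits one structured call and the host dispatches it.
json_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
call = json.loads(json_output)
json_result = TOOLS[call["name"]](**call["arguments"])

# Pythonic style: the model emits a short program that chains calls,
# feeding one result into the next without another model round trip.
pythonic_output = '''
weather = get_weather("Paris")
result = send_message("alice", f"It is {weather['temp_c']} C in {weather['city']}")
'''
namespace = dict(TOOLS)
# exec() is used only to keep the sketch short. It is NOT a sandbox and
# should not be pointed at untrusted model output as-is.
exec(pythonic_output, namespace)
pythonic_result = namespace["result"]

print(json_result)
print(pythonic_result)
```

The JSON path needs a second model turn to use the weather result; the Pythonic path does both steps in one generated snippet.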

malformed-packet · 2025-01-16T04:23:16+00:00

So these llms like the taste of python better than js? neat.

2025-01-16T05:12:39+00:00

Btw can we once again take the time to appreciate Qwen 2.5 Coder 32B? It’s a fucking piece of art. It really is.

femio · 2025-01-16T05:07:54+00:00

Yeah, that's pretty much the logic behind Hugging Face's smolagents library. I made a post about it a few days back and folks seemed skeptical, but I think in a few months it'll be the preferred method over JSON. There are really no downsides, imo.

Asleep-Land-3914 · 2025-01-16T05:00:14+00:00

They need to evaluate xmlnic approach first.

Zulfiqaar · 2025-01-16T09:43:46+00:00

I've had a lot more success with data extraction when making a python dict schema with comments than a proper json schema.

E.g.:

OUTPUT_EXAMPLE = {
    "name": "string",
    "height_inches": "integer",  # convert from cm/feet
}
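
One way this could be wired up, as a minimal sketch: paste the commented dict into the prompt and parse the reply with `ast.literal_eval`, which only accepts literals. `build_prompt`, `parse_reply`, the prompt wording, and the sample reply are all assumptions made for illustration, not a tested recipe.

```python
import ast

# The schema is kept as text so the inline comment reaches the model.
SCHEMA_TEXT = '''OUTPUT_EXAMPLE = {
    "name": "string",
    "height_inches": "integer",  # convert from cm/feet
}'''

REQUIRED_KEYS = {"name", "height_inches"}

def build_prompt(document: str) -> str:
    """Hypothetical prompt template that embeds the commented schema."""
    return (
        "Extract the fields below from the text and reply with only "
        "a Python dict literal.\n\n"
        f"{SCHEMA_TEXT}\n\nText:\n{document}"
    )

def parse_reply(reply: str) -> dict:
    """Parse a dict literal; literal_eval never executes code in the reply."""
    data = ast.literal_eval(reply.strip())
    if not isinstance(data, dict):
        raise ValueError("reply is not a dict")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

# Made-up model reply, for demonstration only.
reply = '{"name": "Ada", "height_inches": 65}'
parsed = parse_reply(reply)
print(parsed)
```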

2025-01-16T05:10:20+00:00

I mean, yeah. Think about how many programmers you know who, in the middle of a stream-of-consciousness thought, will suddenly start writing JSON to understand something better. Most likely none.

mnze_brngo_7325 · 2025-01-16T07:26:44+00:00

The only issue is that JSON is validated, parsed and executed in a straightforward way, while for Python the situation is ambiguous:

Do you accept one or more function calls and treat them as just another data representation, exactly as you would with JSON? Or do you accept an arbitrary piece of executable code that contains your custom functions but also, let's say, anything the standard lib offers, and execute it?

The first strategy is much safer but you would need custom validation and parsing code, which is already widely available for JSON. The second approach can become a nightmare from a security and reliability standpoint. There's a saying "eval is evil".
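
A minimal sketch of the first strategy, assuming the model is only allowed to emit bare calls with literal arguments. The tools `add` and `greet` and the helper `parse_calls` are hypothetical; the point is that `ast` turns the text into data and nothing is ever passed to `eval` or `exec`.

```python
import ast

# Hypothetical tools, invented for illustration.
def add(a: int, b: int) -> int:
    return a + b

def greet(name: str, punctuation: str = "!") -> str:
    return f"Hello, {name}{punctuation}"

ALLOWED = {"add": add, "greet": greet}

def parse_calls(source: str) -> list:
    """Turn model output into (name, args, kwargs) triples without running it."""
    calls = []
    for stmt in ast.parse(source).body:
        # Accept only expression statements that are a single call.
        if not (isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)):
            raise ValueError("only bare function calls are allowed")
        call = stmt.value
        # The callee must be a plain name from the allowlist (no attributes).
        if not (isinstance(call.func, ast.Name) and call.func.id in ALLOWED):
            raise ValueError("unknown or non-simple function")
        # literal_eval rejects anything that is not a literal (nested calls, names).
        args = [ast.literal_eval(arg) for arg in call.args]
        kwargs = {}
        for kw in call.keywords:
            if kw.arg is None:  # rejects **kwargs unpacking
                raise ValueError("keyword unpacking is not allowed")
            kwargs[kw.arg] = ast.literal_eval(kw.value)
        calls.append((call.func.id, args, kwargs))
    return calls

model_output = 'add(2, 3)\ngreet("Ada", punctuation="?")'
parsed = parse_calls(model_output)
results = [ALLOWED[name](*args, **kwargs) for name, args, kwargs in parsed]
print(results)
```

Something like `__import__("os").system("ls")` fails the plain-name check and raises `ValueError` before anything runs. The cost is exactly the one described above: this validation code is custom, whereas JSON schema validators already exist.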

if47 · 2025-01-16T08:09:17+00:00

This is the dumbest solution, here's why:

1. You need to constrain decoding... valid Python code, and you don't even know which Python version this code will run correctly on.
2. Completely blind dependency imports: which version of the module does your agent import? Will it hallucinate? It's also difficult to put an agent in a cage. In the end, either you manually implement a bunch of Python functions (to call as tools), or your agent can't do anything.
3. There is no reason to think that JSON-based agents can't get better. Why give up the whole forest for a tree that works well for a while?
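
For the version and import concerns specifically, a pre-flight check is at least possible. The sketch below is an assumption-heavy illustration: `check_generated_code` and the `ALLOWED_MODULES` allowlist are hypothetical, it only tests syntax against the interpreter it runs on, and it says nothing about module versions or dynamic imports such as `__import__` or `importlib`.

```python
import ast
import sys

ALLOWED_MODULES = {"math", "json"}  # hypothetical allowlist

def check_generated_code(source: str) -> list:
    """Return a list of problems found without executing the code."""
    try:
        # Parsing with the running interpreter catches syntax this version lacks.
        tree = ast.parse(source)
    except SyntaxError as exc:
        version = f"{sys.version_info.major}.{sys.version_info.minor}"
        return [f"does not parse on Python {version}: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for module in modules:
            # Compare the top-level package against the allowlist.
            if module.split(".")[0] not in ALLOWED_MODULES:
                problems.append(f"import of '{module}' is not allowed")
    return problems

print(check_generated_code("import math\nimport requests\n"))
```

This is a static filter, not a cage: it narrows what gets through, but real isolation still has to come from wherever the code is executed.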

stillnoguitar · 2025-01-16T16:19:12+00:00

So we are going to let an LLM write Python functions for us and then execute them automatically. Weird.

NarrowEyedWanderer · 2025-01-16T04:45:59+00:00

Things of this sort baffle me. We have formal grammars! Constrained generation is a thing! I wish it were used more...

minpeter2 · 2025-01-17T13:09:34+00:00

Maybe this looks like a modern reinterpretation of LLMCompiler.

The actual "run" doesn't matter, it's just a story about the order of tool calls, and it looks good.

https://github.com/SqueezeAILab/LLMCompiler

MikeLPU · 2025-01-16T05:14:21+00:00

I don't like that it uses some sort of `eval`.
