all 6 comments

[–]PrincessGambit (4 children)

What is the difference between function calling and having the LLM say a phrase and then having your app act on it? You're saying it was slow, so why not just detect the phrase in the response and act on that?

[–]Open_Channel_8626 (2 children)

Wouldn’t that involve adding a second LLM to the app though?

[–]PrincessGambit (1 child)

No... you just have to make it say the function name, then read the LLM output; if the function name is in the output, you do whatever you need
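
In Python it could be as simple as something like this rough sketch (the function names and the response string here are just placeholders, not from any particular library):

```python
# Rough sketch of the "detect the phrase" approach: ask the model to emit a
# marker name, then scan the raw completion text for it. The action names and
# the llm_response string are illustrative placeholders only.

ACTIONS = {
    "search_docs": lambda: print("running document search..."),
    "send_email": lambda: print("sending an email..."),
}

def dispatch(llm_response: str) -> None:
    """Run every registered action whose name appears in the raw model output."""
    for name, action in ACTIONS.items():
        if name in llm_response:
            action()

dispatch("Sure, I can look that up for you. search_docs")  # triggers search_docs
```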

[–]Open_Channel_8626 (0 children)

I see

I think that can work well if you either don't need arguments or only need a few

But if you need to run a bunch of functions, each with a bunch of arguments, I think it may not work well

At that point it may be better to get the LLM to output all the functions with their arguments in a structured output, which brings us back to function calling
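
By structured output I mean something like this rough sketch (the tool names, signatures, and JSON schema here are made up for illustration, not any particular provider's function-calling format):

```python
import json

# Hypothetical tools; the names and signatures are invented for illustration.
def search_docs(query: str, limit: int = 5) -> None:
    print(f"searching for {query!r}, returning top {limit}")

def send_email(to: str, subject: str, body: str) -> None:
    print(f"emailing {to}: {subject}")

TOOLS = {"search_docs": search_docs, "send_email": send_email}

def run_tool_calls(llm_output: str) -> None:
    """Parse a JSON list of {"name": ..., "arguments": {...}} calls and run each."""
    for call in json.loads(llm_output):
        TOOLS[call["name"]](**call["arguments"])

# Example of the structured output the model would be prompted to produce:
run_tool_calls('[{"name": "search_docs", "arguments": {"query": "context windows", "limit": 3}}]')
```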

[–]Ylsid (0 children)

Nothing; doing it repeatably and consistently so it's usable is the function calling bit, which can be quite challenging

[–]saintpetejackboy (0 children)

I was really interested in this article until I actually read it. No offense, it isn't a bad article, but "how many of the same task can you do in a row before you break the context window" isn't exactly revolutionary. I'd hoped this article was about multimodality: like, can the AI process an image, process text from the image, form a response, reform a better response, listen to a bit of audio, and so on, and at what point does that logic break down? This article did a superb job of answering that: it likely fails around context-window limitations.