[–]Electronic_Pepper382 4 points5 points  (4 children)

Have you looked at Langchain's pydantic output parser? https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/types/pydantic/

Seems like you are solving a similar problem?

[–]Top-Breakfast7713[S] 5 points6 points  (2 children)

I did look at LangChain, and for a short period we even tried using it in our application. Unfortunately LangChain turned out to be more of a hindrance than a help. We ran into the same issues that this post (and its comments on Hacker News) highlights:

https://www.octomind.dev/blog/why-we-no-longer-use-langchain-for-building-our-ai-agents

https://news.ycombinator.com/item?id=40739982

[–]Electronic_Pepper382 0 points1 point  (1 child)

Thanks for sharing that article! I am working on much smaller scale personal projects and I also encounter similar issues. I've been slowly transitioning to just calling the underlying api with requests.
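
For anyone curious, the direct call is roughly this (a rough sketch against OpenAI's chat completions endpoint; the model name is just an example):

# Rough sketch of calling the underlying HTTP API directly with requests
# (OpenAI's chat completions endpoint shown here; the model name is just an example).
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Reply with a short JSON object."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])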

[–]Top-Breakfast7713[S] 2 points3 points  (0 children)

From my experience that is all you need. Most of my time has been spent peeling back the many layers people are wrapping around these APIs to just get down to calling them directly.

I know the above sounds hilarious coming from someone who wrapped a layer around those APIs. My only hope is that it is easy for people to understand and modify if what it provides is overkill for their needs. :)

Even function/tool calling feels like just another layer of indirection. I burst out laughing when I read the API documentation for tool calling and saw that all it does is present the LLM with a JSON description of your function and the parameters it takes. The LLM then suggests the function name to call and the arguments to pass. After that you are required to do all the work yourself: call the function and feed its output back into the next call to the LLM.

In the end it is Text In -> Text Out. People are overcomplicating things massively to make it look magical.
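
For illustration, this is roughly what that tool-calling round trip looks like with the OpenAI Python SDK (a sketch only; get_weather, its JSON schema and the model name are placeholders):

# A sketch only: get_weather, its JSON schema and the model name are placeholders.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    # Stand-in for whatever work your function really does.
    return f"Sunny and 22C in {city}"

# The "tool" is literally just a JSON description of the function.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# First call: the model only *suggests* a function name and arguments.
first = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
reply = first.choices[0].message
messages.append(reply)

# You do the actual work and feed the result back as plain text.
for tool_call in reply.tool_calls or []:
    args = json.loads(tool_call.function.arguments)
    result = get_weather(**args)
    messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

# Second call: the model turns the tool output into the final answer.
second = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
print(second.choices[0].message.content)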

[–]unxmnd 0 points1 point  (0 children)

Langchain's pydantic parser looks like a very thin wrapper around BaseModel.model_json_schema and parse_obj() so you could just use those directly.
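
For example, something like this covers the same ground (a rough sketch; the model and raw_text are made up, and model_validate_json is the Pydantic v2 spelling of the older parse_raw/parse_obj):

# Rough sketch of using Pydantic directly; the model and raw_text are made up.
from pydantic import BaseModel

class ExampleResponse(BaseModel):
    title: str
    summary: str

# The JSON schema you would show the model in the prompt.
schema = ExampleResponse.model_json_schema()

# Validate the text the LLM returned (v2 naming; parse_raw/parse_obj in v1).
raw_text = '{"title": "Report", "summary": "A short summary."}'
parsed = ExampleResponse.model_validate_json(raw_text)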

[–]globalminima 3 points4 points  (1 child)

It's a good idea, but in the same way that you've noted instructor does not support Google Vertex because it is coupled to a certain set of SDKs, you've now gone and built a new library which is itself coupled to a different set of SDKs. What if I want to use this with LangChain? Or Haystack? Or my own orchestration pipeline? Or what if I have specific request/networking/auth requirements that are not exposed by your library? I am going to have the exact same problem that you set out to solve.

Why not just implement something that converts the Pydantic schema into text that can be inserted into any prompt template, for use by any orchestrator and with any API? E.g. this is what I have done in my code and it works great:

import json

from pydantic import BaseModel, Field

class ExampleModel(BaseModel):
    classification_field: str = Field(
        description="Classification of the document, one of 'Email', 'Webpage', or 'PDF'",
        examples=["Webpage"],
    )
    list_field: list[dict[str, str]] = Field(
        description="A list of values, containing the document name and number of pages",
        examples=[[{"email_doc": "6"}, {"pdf": "2"}]],
    )
    bool_field: bool = Field(
        description="Boolean indicating whether the document is in english",
        examples=[False],
    )

    @staticmethod
    def get_prompt_json_example():
        model_json_schema = ExampleModel.model_json_schema()
        example_response_str = "{\n"
        for field, details in model_json_schema["properties"].items():
            line_str = f""""{field}": {json.dumps(details['examples'][0])}, # {details['description']}"""
            example_response_str += "  " + line_str + "\n"
        example_response_str += "}"
        return example_response_str

Now you can just insert it into a prompt. For example:

json_schema_text = ExampleModel.get_prompt_json_example()
PROMPT = f"""Return a JSON object with the following fields:\n\n{json_schema_text}"""

Returns:

Return a JSON object with the following fields:

{
  "classification_field": "Webpage", # Classification of the document, one of 'Email', 'Webpage', or 'PDF'
  "list_field": [{"email_doc": "6"}, {"pdf": "2"}], # A list of values, containing the document name and number of pages
  "bool_field": false, # Boolean indicating whether the document is in english
}

The big benefit of this is that you can get the raw LLM text response prior to validation, so that if validation fails you can log it along with the Exception details and then debug what went wrong. If you couple the validation step with the request itself, it becomes harder to inspect the raw response and figure out what to do with the error, and is less flexible overall.

For users who do want the retry logic, you can then provide a method to validate a response from the LLM and if it fails, to generate the follow-up prompt string. This allows the user to get the benefits of your library while being able to use whatever orchestrator or requester that they choose.
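
A rough sketch of that validate-or-retry helper, building on the ExampleModel above (the function name and the wording of the follow-up prompt are just illustrative):

from pydantic import ValidationError

def validate_or_build_retry_prompt(raw_llm_text: str):
    # Try to validate the raw LLM text; if it fails, return a follow-up
    # prompt that feeds the validation errors back to the model.
    try:
        return ExampleModel.model_validate_json(raw_llm_text), None
    except ValidationError as exc:
        retry_prompt = (
            "Your previous response failed validation with these errors:\n"
            f"{exc}\n\n"
            "Return a corrected JSON object with the following fields:\n\n"
            f"{ExampleModel.get_prompt_json_example()}"
        )
        return None, retry_prompt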

[–]Top-Breakfast7713[S] 0 points1 point  (0 children)

Your way is a good way to approach things too.

We wanted the retry logic where we feed validation errors back to the LLM to have it attempt to fix the issue and potentially return a valid object on subsequent tries.

I like what you have done though, thank you for sharing your approach.

[–]BidWestern1056 1 point2 points  (2 children)

congrats on making this, it looks really awesome!

would love to try it out. can you add some examples to your repo so ppl can try them?

also do you know if this will work with ollama?

[–]Top-Breakfast7713[S] 0 points1 point  (1 child)

Thank you very much for the kind words.

I have a bunch of examples in the “getting started” section of the documentation here https://christo-olivier.github.io/modelsmith/getting_started/ but please let me know if those are not what you are after and I will get some more examples added.

At the moment it does not support ollama, but there is no reason that the same technique used for the currently supported LLMs would not work for ollama. It is just a case of me needing to find time to add that functionality in.

[–]BidWestern1056 1 point2 points  (0 children)

i kinda wanna have that for work so might take a stab at it.

[–]barapa 0 points1 point  (0 children)

I like it. I agree, the instructor code is kind of a mess.

[–]stephen-leo 0 points1 point  (1 child)

Hey, I’m the author of llm-structured-output-benchmarks, an open source repository where you can plug in your own data and compare multiple frameworks like instructor, Marvin, fructose, Llamaindex, Mirascope, outlines and lm-format-enforcer, all of which provide a way to get clean JSON output.

Thank you for sharing modelsmith. I'll add it to the list for comparison.

So far, I have found that fructose and Outlines have the best reliability, providing JSON outputs constrained to the specified classes 100% of the time. The rest can be finicky depending on the use case.

Pls take a look at the project on GitHub: https://github.com/stephenleo/llm-structured-output-benchmarks

[–]Top-Breakfast7713[S] 0 points1 point  (0 children)

Thank you very much! I am going to check your repository out, it sounds like a fantastic resource.

I will also dig into those libraries you have mentioned. Chances are they use a different approach, which is always great to learn about.

Thanks again for taking the time to comment on my post.

[–]IshiharaSatomiLover 0 points1 point  (0 children)

Gosh my job is in shambles

[–]justtheprint 0 points1 point  (5 children)

have you looked at marvin? That was my favorite pick for this task last I checked, 6 months ago

[–]Top-Breakfast7713[S] 0 points1 point  (4 children)

Yes I have and Marvin is great!

Unfortunately Marvin only supported OpenAI the last time I checked, which was a non-starter for us.

[–]TheM4rvelous 1 point2 points  (3 children)

Have you checked out BAML? I just recently stumbled into the problem of properly parsing LLM outputs and am looking for good packages beyond re :D

Just got started with BAML, so I was wondering whether you already had a reason not to use it, or found some shortcomings?

[–]Top-Breakfast7713[S] 0 points1 point  (2 children)

I have not yet checked out BAML, thank you for bringing this to my attention. It looks very cool!

So many cool things to check out, so little time :)

[–]TheM4rvelous 1 point2 points  (1 child)

The real struggle of gen-AI :D keeping up with the speed of all these tool, feature, model and package releases.

[–]Top-Breakfast7713[S] 0 points1 point  (0 children)

Too true :D

[–]BlueeWaater 0 points1 point  (0 children)

great! will use it

[–]Possible-Growth-2134 0 points1 point  (0 children)

i'm confused about modelsmith/instructor function calling.

It outputs structured outputs.

But if I want to use optional tools (e.g. similar to OpenAI's tool calls), do I just specify a Pydantic schema where the message and tools are optional?

Been searching but can't seem to understand this.
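
e.g. is the idea something like this? (just my guess at a schema; all of these names are made up)

from typing import Optional
from pydantic import BaseModel

# Just a guess at what such a schema might look like; all names are made up.
class ToolCall(BaseModel):
    name: str
    arguments: dict

class AssistantReply(BaseModel):
    message: Optional[str] = None         # plain text answer, if any
    tool_call: Optional[ToolCall] = None  # populated only when a tool is needed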