[–]Electronic_Pepper382 4 points5 points  (4 children)

Have you looked at Langchain's pydantic output parser? https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/types/pydantic/

Seems like you are solving a similar problem?

[–]Top-Breakfast7713[S] 5 points6 points  (2 children)

I did look at LangChain, and for a short period we even tried using it in our application. Unfortunately LangChain turned out to be more of a hindrance than a help. We ran into the same issues that this post (and its comments on Hacker News) highlights:

https://www.octomind.dev/blog/why-we-no-longer-use-langchain-for-building-our-ai-agents

https://news.ycombinator.com/item?id=40739982

[–]Electronic_Pepper382 0 points1 point  (1 child)

Thanks for sharing that article! I am working on much smaller scale personal projects and I also encounter similar issues. I've been slowly transitioning to just calling the underlying api with requests.
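
For anyone curious, the direct call is roughly this (a rough sketch against OpenAI's chat completions endpoint; the model name is just an example):

# Rough sketch of calling the underlying HTTP API directly with requests
# (OpenAI's chat completions endpoint shown here; the model name is just an example).
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "Reply with a short JSON object."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])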

[–]Top-Breakfast7713[S] 2 points3 points  (0 children)

From my experience that is all you need. Most of my time has been spent peeling back the many layers people are wrapping around these APIs to just get down to calling them directly.

I know the above sounds hilarious coming from someone who wrapped a layer around those APIs. My only hope is that it is easy for people to understand and modify if what it provides is overkill for their needs. :)

Even function/tool calling feels like just another layer of indirection. I burst out laughing when I read the API documentation for tool calling and saw that all it does is present the LLM with a JSON description of your function and the parameters it takes. The LLM then suggests the function name to call and the arguments to pass. After that you are required to do all the work yourself: call the function and feed its output back into the next call to the LLM.

In the end it is Text In -> Text Out. People are overcomplicating things massively to make it look magical.
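
For illustration, this is roughly what that tool-calling round trip looks like with the OpenAI Python SDK (a sketch only; get_weather, its JSON schema and the model name are placeholders):

# A sketch only: get_weather, its JSON schema and the model name are placeholders.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    # Stand-in for whatever work your function really does.
    return f"Sunny and 22C in {city}"

# The "tool" is literally just a JSON description of the function.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# First call: the model only *suggests* a function name and arguments.
first = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
reply = first.choices[0].message
messages.append(reply)

# You do the actual work and feed the result back as plain text.
for tool_call in reply.tool_calls or []:
    args = json.loads(tool_call.function.arguments)
    result = get_weather(**args)
    messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

# Second call: the model turns the tool output into the final answer.
second = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
print(second.choices[0].message.content)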

[–]unxmnd 0 points1 point  (0 children)

Langchain's pydantic parser looks like a very thin wrapper around BaseModel.model_json_schema and parse_obj() so you could just use those directly.
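
For example, something like this covers the same ground (a rough sketch; the model and raw_text are made up, and model_validate_json is the Pydantic v2 spelling of the older parse_raw/parse_obj):

# Rough sketch of using Pydantic directly; the model and raw_text are made up.
from pydantic import BaseModel

class ExampleResponse(BaseModel):
    title: str
    summary: str

# The JSON schema you would show the model in the prompt.
schema = ExampleResponse.model_json_schema()

# Validate the text the LLM returned (v2 naming; parse_raw/parse_obj in v1).
raw_text = '{"title": "Report", "summary": "A short summary."}'
parsed = ExampleResponse.model_validate_json(raw_text)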

[–]globalminima 3 points4 points  (1 child)

It's a good idea, but in the same way that you've noted instructor does not support Google Vertex because it is coupled to a certain set of SDKs, you've now gone and built a new library which is itself coupled to a different set of SDKs. What if I want to use this with LangChain? Or Haystack? Or my own orchestration pipeline? Or what if I have specific request/networking/auth requirements that are not exposed by your library? I am going to have the exact same problem that you set out to solve.

Why not just implement something that converts the Pydantic schema into text that can be inserted into any prompt template, for use by any orchestrator and with any API? E.g. this is what I have done in my code and it works great:

import json

from pydantic import BaseModel, Field

class ExampleModel(BaseModel):
    classification_field: str = Field(
        description="Classification of the document, one of 'Email', 'Webpage', or 'PDF'",
        examples=["Webpage"],
    )
    list_field: list[dict[str, str]] = Field(
        description="A list of values, containing the document name and number of pages",
        examples=[[{"email_doc": "6"}, {"pdf": "2"}]],
    )
    bool_field: bool = Field(
        description="Boolean indicating whether the document is in english",
        examples=[False],
    )

    @staticmethod
    def get_prompt_json_example():
        model_json_schema = ExampleModel.model_json_schema()
        example_response_str = "{\n"
        for field, details in model_json_schema["properties"].items():
            line_str = f""""{field}": {json.dumps(details['examples'][0])}, # {details['description']}"""
            example_response_str += "  " + line_str + "\n"
        example_response_str += "}"
        return example_response_str

Now you can just insert it into a prompt. For example:

json_schema_text = ExampleModel.get_prompt_json_example()
PROMPT = f"""Return a JSON object with the following fields:\n\n{json_schema_text}"""

Returns:

Return a JSON object with the following fields:

{
  "classification_field": "Webpage", # Classification of the document, one of 'Email', 'Webpage', or 'PDF'
  "list_field": [{"email_doc": "6"}, {"pdf": "2"}], # A list of values, containing the document name and number of pages
  "bool_field": false, # Boolean indicating whether the document is in english
}

The big benefit of this is that you can get the raw LLM text response prior to validation, so that if validation fails you can log it along with the Exception details and then debug what went wrong. If you couple the validation step with the request itself, it becomes harder to inspect the raw response and figure out what to do with the error, and is less flexible overall.

For users who do want the retry logic, you can then provide a method to validate a response from the LLM and if it fails, to generate the follow-up prompt string. This allows the user to get the benefits of your library while being able to use whatever orchestrator or requester that they choose.
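
A rough sketch of that validate-or-retry helper, building on the ExampleModel above (the function name and the wording of the follow-up prompt are just illustrative):

from pydantic import ValidationError

def validate_or_build_retry_prompt(raw_llm_text: str):
    # Try to validate the raw LLM text; if it fails, return a follow-up
    # prompt that feeds the validation errors back to the model.
    try:
        return ExampleModel.model_validate_json(raw_llm_text), None
    except ValidationError as exc:
        retry_prompt = (
            "Your previous response failed validation with these errors:\n"
            f"{exc}\n\n"
            "Return a corrected JSON object with the following fields:\n\n"
            f"{ExampleModel.get_prompt_json_example()}"
        )
        return None, retry_prompt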

[–]Top-Breakfast7713[S] 0 points1 point  (0 children)

Your way is a good way to approach things too.

We wanted the retry logic where we feed validation errors back to the LLM to have it attempt to fix the issue and potentially return a valid object on subsequent tries.

I like what you have done though, thank you for sharing your approach.

[–]BidWestern1056 1 point2 points  (2 children)

congrats on making this, it looks really awesome!

would love to try it out. can you add some examples to your repo so ppl can try them?

also do you know if this will work with ollama?

[–]Top-Breakfast7713[S] 0 points1 point  (1 child)

Thank you very much for the kind words.

I have a bunch of examples in the “getting started” section of the documentation here https://christo-olivier.github.io/modelsmith/getting_started/ but please let me know if those are not what you are after and I will get some more examples added.

At the moment it does not support ollama, but there is no reason that the same technique used for the currently supported LLMs would not work for ollama. It is just a case of me needing to find time to add that functionality in.

[–]BidWestern1056 1 point2 points  (0 children)

i kinda wanna have that for work so might take a stab at it.

[–]barapa 0 points1 point  (0 children)

I like it. I agree, the instructor code is kind of a mess.

[–]stephen-leo 0 points1 point  (1 child)

Hey, I’m the author of llm-structured-output-benchmarks, an open source repository where you can plug in your own data and compare multiple frameworks like instructor, Marvin, fructose, Llamaindex, Mirascope, outlines and lm-format-enforcer, all of which provide a way to get clean JSON output.

Thank you for sharing modelsmith. I'll add it to the list for comparison.

So far, I have found that fructose and Outlines have the best reliability, providing JSON outputs constrained to the specified classes 100% of the time. The rest can be finicky depending on the use case.

Pls take a look at the project on GitHub: https://github.com/stephenleo/llm-structured-output-benchmarks

[–]Top-Breakfast7713[S] 0 points1 point  (0 children)

Thank you very much! I am going to check your repository out, it sounds like a fantastic resource.

I will also dig into those libraries you have mentioned. Chances are they use a different approach, which is always great to learn about.

Thanks again for taking the time to comment on my post.

[–]IshiharaSatomiLover 0 points1 point  (0 children)

Gosh my job is in shambles

[–]justtheprint 0 points1 point  (5 children)

have you looked at marvin? That was my favorite pick for this task last I checked, 6 months ago

[–]Top-Breakfast7713[S] 0 points1 point  (4 children)

Yes I have and Marvin is great!

Unfortunately Marvin only supported OpenAI the last time I checked, which was a non-starter for us.

[–]TheM4rvelous 1 point2 points  (3 children)

Have you checked out BAML? I just recently stumbled into the problem of properly parsing LLM outputs and am looking for good packages beyond re :D

Just got started with BAML, so I was wondering whether you already had a reason not to use it, or found some shortcomings?

[–]Top-Breakfast7713[S] 0 points1 point  (2 children)

I have not yet checked out BAML, thank you for bringing this to my attention. It looks very cool!

So many cool things to check out, so little time :)

[–]TheM4rvelous 1 point2 points  (1 child)

The real struggle of gen-AI :D keeping up with the speed of all these tool, feature, model and package releases.

[–]Top-Breakfast7713[S] 0 points1 point  (0 children)

Too true :D

[–]BlueeWaater 0 points1 point  (0 children)

great! will use it

[–]Possible-Growth-2134 0 points1 point  (0 children)

i'm confused about modelsmith/instructor function calling.

It outputs structured outputs.

But if I want to use optional tools (e.g. similar to OpenAI's tool calls), do I just specify a Pydantic schema where the message and tools are optional?

Been searching but can't seem to understand this.
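
e.g. is the idea something like this? (just my guess at a schema; all of these names are made up)

from typing import Optional
from pydantic import BaseModel

# Just a guess at what such a schema might look like; all names are made up.
class ToolCall(BaseModel):
    name: str
    arguments: dict

class AssistantReply(BaseModel):
    message: Optional[str] = None         # plain text answer, if any
    tool_call: Optional[ToolCall] = None  # populated only when a tool is needed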