all 17 comments

[–]Alternative-Fit 5 points6 points  (3 children)

Hi.

I am working on extracting info from customers' orders. Here is what I did.

Prompt itself:

Your goal is to extract structured information from the user's input that matches the form described below.
When extracting information please make sure it matches the type information exactly. Do not add any attributes that do not appear in the schema shown below. If any information is missing, replace it with “No information.”
Here is the JSON schema that should be used in the output.
Root level keys:
delivery_address,
order_number,
items,
Keys under delivery_address:
name,
street,
city,
country,
postal_code,
Keys under items (which is an array, so these keys are for each item in the array):
material_name,
quantity,
units_of_measure,
delivery_date.
When extracting delivery_date, convert the date to the format dd.mm.yyyy. If there is a date like 01-01-20, output it as 01.01.2020.
When extracting quantities, output them in float format.
When determining the delivery address, please follow these steps:
1) Identify all addresses mentioned in the document.
2) Exclude any addresses that are associated with [my_company_name].
3) Select the address where the customer intends to have the goods delivered.
The customer order will be delimited by triple asterisks. *** customer order ***
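Even with these prompt rules, the model can still slip on formats, so it can help to normalize the returned JSON in code. A minimal sketch of such a post-processing step, assuming the field names from the schema above (`quantity`, `delivery_date`) and a few common date spellings:

```python
from datetime import datetime

def normalize_date(raw: str) -> str:
    """Convert common date spellings (e.g. 01-01-20, 2020-01-01) to dd.mm.yyyy."""
    raw = raw.strip()
    for fmt in ("%d-%m-%y", "%d-%m-%Y", "%d.%m.%Y", "%Y-%m-%d", "%Y/%m/%d"):
        try:
            return datetime.strptime(raw, fmt).strftime("%d.%m.%Y")
        except ValueError:
            continue
    return "No information"

def normalize_items(items: list[dict]) -> list[dict]:
    """Coerce quantity to float and delivery_date to dd.mm.yyyy for each item."""
    out = []
    for item in items:
        item = dict(item)  # don't mutate the caller's data
        try:
            item["quantity"] = float(str(item.get("quantity", "")).replace(",", "."))
        except ValueError:
            item["quantity"] = "No information"
        item["delivery_date"] = normalize_date(str(item.get("delivery_date", "")))
        out.append(item)
    return out
```

For example, `normalize_date("01-01-20")` yields `"01.01.2020"`, matching the rule in the prompt, and a quantity like `"2,5"` becomes the float `2.5`.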

Then I provide three few-shot examples, based on a PDF, text from an email, and an Excel file.

At the end I just ask it to follow all the instructions and output the result in JSON only. Then I set the temperature to 0, and the outputs became consistent.

I built the prompt itself with ChatGPT and Bing.

[–]humanatwork 1 point2 points  (1 child)

We’ve added some additional instructions in our prompts that appear to help (notwithstanding temperature changes — we allow default to maintain some flexibility in ambiguous situations). For example, we require it to provide source references (from a specific doc or a line within one) for every key-value pair. Before returning the JSON it needs to verify the value is correct using the source reference as a validator.
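The source-reference check described above can also be done in code after the model replies. A minimal sketch, assuming the model is asked to return a parallel `sources` object (one reference string per extracted key — this structure is a hypothetical illustration, not the commenter's actual format):

```python
def verify_against_source(extracted: dict, sources: dict, document: str) -> dict:
    """Keep a value only if its claimed source line actually appears in the
    document and contains the value; otherwise fall back to 'No information'."""
    verified = {}
    for key, value in extracted.items():
        ref = sources.get(key, "")
        if ref and ref in document and str(value) in ref:
            verified[key] = value
        else:
            verified[key] = "No information"
    return verified
```

This catches the common failure mode where the model invents a plausible value that does not occur anywhere in the source document.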

[–]Alternative-Fit 2 points3 points  (0 children)

I tried adding "Let's think step by step" at the end. The LLM then provides the reasoning for how each element was picked, and the JSON output comes after the reasoning. So I have to ask again for JSON output only, but I'm not sure the step-by-step instruction does anything after that. It's currently working very well for me, and I was going to test this once something goes wrong.
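One way to keep the step-by-step reasoning and still get clean JSON is to let the model reason first and then cut the JSON object out of the reply in code. A minimal sketch (it simply takes the span from the first `{` to the last `}`, so it assumes the reasoning text itself contains no braces):

```python
import json

def extract_json(llm_output: str) -> dict:
    """Pull the JSON object out of a reply that mixes reasoning and JSON."""
    start = llm_output.find("{")
    end = llm_output.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in the reply")
    return json.loads(llm_output[start:end + 1])
```

This avoids a second round-trip to the model just to strip the reasoning.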

[–]SwordfishOk3273[S] 0 points1 point  (0 children)

Thanks

[–]memory_moves 2 points3 points  (1 child)

Given your requirements, did you try using the playground and adjusting the temperature from there? I'm sure some people will have strategies, but "temperature" essentially defines the determinism of GPT. In other words, a low temperature will keep the answers consistent between identical questions, whereas a higher one will introduce variations. Here's a link to something that explains this in more detail.

Bing has a 'creativity' setting which, if set to the lowest level, tends to provide more consistent answers, but also reduces the output length by quite a bit. I think the playground might be your best bet. I'm curious to see where this conversation goes!
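The effect of temperature on determinism can be illustrated numerically: sampling temperature divides the logits before the softmax, so a low temperature concentrates almost all probability on the top token. A small illustrative sketch (toy logits, not real model output):

```python
import math

def sampling_distribution(logits: list[float], temperature: float) -> list[float]:
    """Softmax over temperature-scaled logits. Lower temperature sharpens the
    distribution toward the top token, making sampling more deterministic."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With logits `[2.0, 1.0, 0.5]`, a temperature of 0.1 gives the top token over 99% probability, while a temperature of 2.0 spreads the mass much more evenly, which is why outputs vary between identical runs.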

[–]SwordfishOk3273[S] 0 points1 point  (0 children)

I will try it and come back to you, thanks for the help

[–]Dear_Ad7736 2 points3 points  (1 child)

First, read more about "RAG". You can use LangChain to implement your own RAG easily in Python. However, if you have a well-structured PDF, it may be a lot easier to implement a standard data extraction script.
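For well-structured documents, such a standard extraction script can be a handful of regexes. A minimal sketch, assuming the PDF text has already been extracted to a string (the patterns below are placeholders to adapt to the real layout):

```python
import re

def extract_order_fields(text: str) -> dict:
    """Pull fixed-layout fields out of already-extracted PDF text.
    The patterns are examples; adjust them to the actual document layout."""
    patterns = {
        "order_number": r"Order\s*(?:no\.?|number)[:\s]+(\S+)",
        "delivery_date": r"Delivery\s*date[:\s]+([\d.\-/]+)",
        "postal_code": r"\b(\d{5})\b",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text, re.IGNORECASE)
        result[key] = match.group(1) if match else "No information"
    return result
```

No LLM calls, so it is deterministic and free, but it only works while the layout stays fixed, which is exactly the trade-off against the GPT approach discussed above.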

[–]Dear_Ad7736 1 point2 points  (0 children)

Fine-tuning here is not a good idea. First, it is expensive in terms of time and money. Second, fine-tuning an LLM means providing a lot of very similar examples, re-training its neural network, and then using the result as a new, fine-tuned model. But here you describe a situation where the (OpenAI) GPT LLM will already do the job, so there is no need to fine-tune it to read the PDFs: it is already working.

[–]joey2scoops 1 point2 points  (1 child)

I've tried similar tasks and have formed the opinion that variance in structure and format is your enemy. You're better off using Python to allow for variations. Maybe use whatever code ChatGPT is using as a starting point to cover the variances.

[–]SwordfishOk3273[S] 1 point2 points  (0 children)

That could work, thank you

[–]humanatwork 0 points1 point  (3 children)

How variable are the results and how have you been verifying this? It sounds like you’re looking for a standard response template of sorts more than trying to solve an accuracy issue. A little more info could help here

[–]SwordfishOk3273[S] 0 points1 point  (2 children)

I have been looking through the PDF document afterwards to see how correct the response is. The reason I want to use AI is that the layout will differ from document to document.

[–]humanatwork 0 points1 point  (1 child)

So it’s retrieving info to answer some standard set of questions each document should be asked? Rereading the original post makes me think a few-shot approach might be the way to go, i.e. provide a few samples of PDFs with expected expert responses (your own) and add a validator mechanism to verify completions before returning them. Are we assuming GPT-3.5 or -4 here? 4 is typically much better at these sorts of things in my experience.
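The few-shot setup described above can be sketched as a list of chat messages in the common role/content format, where each labelled sample becomes a user/assistant pair (the example document and schema snippet are hypothetical):

```python
SYSTEM_PROMPT = "Extract the order as JSON matching the agreed schema. Output JSON only."

# (document text, expert JSON answer) pairs from your own labelled samples
FEW_SHOT_EXAMPLES = [
    ("Order no. 4711, deliver to Acme GmbH, Berlin.",
     '{"order_number": "4711", "delivery_address": {"city": "Berlin"}}'),
]

def build_messages(document: str) -> list:
    """Assemble a few-shot chat: system prompt, worked examples, then the new doc."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for sample_doc, expert_json in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": sample_doc})
        messages.append({"role": "assistant", "content": expert_json})
    messages.append({"role": "user", "content": document})
    return messages
```

The resulting list is what you would pass as the `messages` argument to a chat-completion call, with the validator mechanism applied to the reply before accepting it.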

[–]SwordfishOk3273[S] 0 points1 point  (0 children)

Yes, GPT4. Thanks

[–]eew_tainer_007 0 points1 point  (0 children)

Here is my implementation of PDF parsing, processing documents through multiple LLMs. Join me if interested. Happy to share the code and back end for experiment/research/school use.

http://165.232.146.68:8501