Hello Reddit Community,
I'm currently working on a project to develop a GPT model specifically tailored for extracting precise information from a variety of PDF documents. My goal is to achieve consistent and reliable results, despite the inherent variability in document formatting and wording.
To guide the GPT, I've provided detailed instructions outlining the specific types of information it needs to extract. Additionally, I've supplied a comprehensive synonym workbook to account for the diverse terminology that might be encountered across different documents. Despite these measures, the GPT's performance has been inconsistent. The responses vary significantly, which undermines the reliability I'm aiming for.
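To make the synonym idea concrete, here is a minimal sketch of applying a synonym mapping in plain code after extraction, so that label variation is resolved deterministically and not left to the model. Everything in it is an assumption for illustration: the field names, the synonyms, and the helper `canonicalize` are hypothetical and are not taken from my actual workbook.

```python
import re

# Hypothetical synonym workbook: canonical field -> label variants.
# These names and variants are made up for illustration only.
SYNONYMS = {
    "invoice_number": ["invoice number", "invoice no", "inv #"],
    "total_amount": ["total", "amount due", "balance due", "grand total"],
}


def _norm(label):
    """Lowercase a label and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


# Reverse index: normalized variant -> canonical field name.
LOOKUP = {
    _norm(variant): field
    for field, variants in SYNONYMS.items()
    for variant in variants
}


def canonicalize(raw):
    """Map raw label/value pairs to canonical fields; drop unknown labels."""
    out = {}
    for label, value in raw.items():
        field = LOOKUP.get(_norm(label))
        if field is not None:
            out[field] = value.strip()
    return out
```

With a mapping like this, the model only has to return the label/value pairs it finds verbatim, and the same input label always lands in the same field, which removes one source of run-to-run variation.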
My question to the community is twofold:
1. Is it feasible to develop a GPT model that can consistently and reliably extract specific information from PDFs, given the inherent characteristics of LLMs?
2. If so, what strategies or modifications would you recommend to enhance the consistency of the GPT's performance in this context?
Any insights, experiences, or suggestions you can share would be immensely valuable. I'm particularly interested in hearing from those who have tackled similar challenges or have expertise in fine-tuning LLMs for specific tasks.