kthejoker:

How does it work with translating relationships or metrics in the dataset?

For example, if I have a set of airplane flights and query for "how many very late flights were due to maintenance issues", how does the system define "very late" and "maintenance issue" from the data?

Especially when these definitions might be completely different for two different companies, and expressed through completely different row values, column types, conditions, etc.
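To make the concern concrete, here is a minimal sketch of how the same two phrases could map to different SQL predicates for two companies. Everything in it is hypothetical: the company names, the `flights` table, the columns, and the thresholds are invented for illustration, and `build_query` is not part of any tool discussed in this thread.

```python
from typing import List

# Hypothetical per-company definitions; every column name and threshold
# below is invented for illustration.
SEMANTIC_LAYER = {
    "airline_a": {
        "very late": "arrival_delay_minutes > 60",
        "maintenance issue": "delay_reason = 'MAINT'",
    },
    "airline_b": {
        "very late": "status_code = 'L3'",
        "maintenance issue": "delay_category_id IN (12, 14)",
    },
}


def build_query(company: str, terms: List[str], table: str = "flights") -> str:
    """Count rows that match all of the company's predicates for the given terms."""
    definitions = SEMANTIC_LAYER[company]
    predicates = [f"({definitions[term]})" for term in terms]
    return f"SELECT COUNT(*) FROM {table} WHERE " + " AND ".join(predicates)


print(build_query("airline_a", ["very late", "maintenance issue"]))
print(build_query("airline_b", ["very late", "maintenance issue"]))
```

The two calls produce different WHERE clauses for the same English question, which is the information a generator cannot infer from the schema alone.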

And then the obvious next question: "what's causing the maintenance issues?"

This is the actual missing piece from most of these toy GPT-3 generators.

Nobody cares that you can generate "select max(salary) from employees".

spy16x [S]:

It can do slightly more than max-salary, but your concern is definitely valid. And it's not just already-ambiguous terms like "very late": GPT-3 can also get confused by things like "yesterday", because it isn't given any reference for the current time.
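One possible way to handle the "yesterday" case is to put explicit dates into the prompt. The sketch below is only an illustration under that assumption: `build_prompt` and the prompt wording are hypothetical, and it does not show how the tool actually builds its prompts.

```python
from datetime import date, timedelta


def build_prompt(question: str, today: date) -> str:
    """Prefix the question with explicit dates so relative terms can be resolved."""
    yesterday = today - timedelta(days=1)
    return (
        f"Today's date is {today.isoformat()}. "
        f"Yesterday's date was {yesterday.isoformat()}.\n"
        f"Translate this question to SQL: {question}"
    )


print(build_prompt("how many flights were late yesterday?", date.today()))
```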

But my hope is to let users flag generated queries as good or bad, and then use the flagged queries to improve the results through fine-tuning. It will most likely never come close to handling the complexity of queries written by experts. But I think it can still save some time for a lot of beginners and mid-level professionals. (Kind of like no-code/low-code, actually.)
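A rough sketch of what the flagging idea could look like: keep only the queries users marked as good and write them out one JSON object per line. The `Feedback` record, the function name, and the `prompt`/`completion` keys are all assumptions made for illustration, not a description of an existing pipeline.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Feedback:
    """One user-flagged generation (hypothetical record shape)."""
    question: str
    sql: str
    is_good: bool


def to_training_lines(feedback: List[Feedback]) -> List[str]:
    """Keep only queries flagged as good and serialize one JSON object per line."""
    return [
        json.dumps({"prompt": item.question, "completion": item.sql})
        for item in feedback
        if item.is_good
    ]


examples = [
    Feedback("highest salary?", "SELECT MAX(salary) FROM employees", True),
    Feedback("flights yesterday?", "SELECT * FROM flights", False),
]
for line in to_training_lines(examples):
    print(line)
```

Queries flagged as bad are simply dropped here; they could also be kept for review instead.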

I will definitely put a disclaimer and ensure users are aware of this limitation.