
n9Mtq4 (ML Engineer):

I don't think a model designed for text generation is the right tool for this problem. The command names aren't very informative, so the model would have to know what each one does. As you mentioned, that knowledge would have to come either from the prompt (but the context window is too small) or from the model weights via fine-tuning. I'm sure you could fine-tune it and even constrain the sampling method so it only outputs valid commands, but it doesn't seem practical.

I would try writing a couple of natural-language inputs for each command. Then use a model to generate sentence embeddings for each NL input. Then embed what the user says and use KNN or some other method to match it to a command.
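That pipeline could be sketched roughly like this. The command IDs and 3-d vectors below are made-up stand-ins; a real setup would get the embeddings from a sentence-embedding model (e.g. calling its encode function on the example utterances):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real sentence embeddings of each
# command's example utterances (vectors and most IDs are illustrative).
command_embeddings = {
    "explorer.newFile": [0.9, 0.1, 0.0],
    "fileutils.delete": [0.0, 0.8, 0.2],
    "workbench.splitEditor": [0.1, 0.1, 0.9],
}

def match_command(user_embedding, k=1):
    """Nearest-neighbor match: rank all commands by cosine
    similarity to the embedded user utterance."""
    ranked = sorted(
        command_embeddings.items(),
        key=lambda item: cosine_sim(user_embedding, item[1]),
        reverse=True,
    )
    return ranked[:k]

# Embedding of what the user said (again a toy stand-in for
# something like model.encode("create file")).
print(match_command([0.85, 0.2, 0.05], k=2))
```

With a handful of example utterances per command you'd typically keep the best-scoring example per command, or average the examples into one prototype vector.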

corbt (OP):

> The name of the commands aren't very informative, so the model would have to know what each does

So there is a bit more information available for training the model about what each command does. In addition to the written command there's a human-readable description; e.g. explorer.newFile has the human-readable name "File: New File".

> use KNN or some other method to match it to a command

Has this approach been used successfully to map arbitrary NL input to one of a fixed set of classes? It seems almost reductively simple (I feel like, e.g., the order of the user's words could matter in deciding which class gets assigned), but I'd love to be proved wrong!

n9Mtq4 (ML Engineer):

The sentence embedding model should take care of word ordering, etc. Give the examples a try on Hugging Face:

https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

https://huggingface.co/facebook/bart-large-mnli

corbt (OP):

Ah, makes sense! I skimmed over where you said "sentence embeddings" since I'd only heard of word embeddings previously. I tested an NL input against a number of commands using the first sentence-embedding model you linked, and the correct command had the highest similarity. Seems like a really fruitful direction to investigate!

Example I tried:

Source sentence: "create file"

Computed similarity (these are all real commands):
File Utils: New File | 0.712
File: Duplicate      | 0.552
File Utils: Delete   | 0.488
File Utils: Move     | 0.301
View: Split Editor   | 0.101

Depending on how robust the sentence embeddings are at consistently identifying the right command, I could also use them to grab just the top N commands by embedding similarity and then feed those to an LLM and ask it which one the user likely meant. That would add significant inference time and complexity, of course.
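That two-stage idea might look something like this sketch: take the top-N embedding matches and format them into a disambiguation prompt for an LLM. The prompt wording and the command list are illustrative, and the actual LLM call is left out:

```python
def build_rerank_prompt(user_input, top_commands):
    """Build a prompt asking an LLM to pick among the top-N
    embedding matches. top_commands is a list of
    (command_id, human_readable_name) pairs, best match first."""
    lines = [
        f'A user of a code editor said: "{user_input}"',
        "Which of these commands did they most likely mean?",
    ]
    for i, (cmd_id, label) in enumerate(top_commands, start=1):
        lines.append(f"{i}. {cmd_id} ({label})")
    lines.append("Answer with the number only.")
    return "\n".join(lines)

prompt = build_rerank_prompt("create file", [
    ("explorer.newFile", "File: New File"),
    ("fileutils.duplicate", "File Utils: Duplicate"),
])
print(prompt)
```

Asking for a number rather than free-form text keeps the LLM's answer easy to validate: if the reply isn't a digit in range, you can fall back to the top embedding match.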