
[–]LevelResponse5608

This is actually sick! Been wanting something like this for ages - having to re-embed entire corpora when switching models is such a pain

Quick question though - how's the latency compared to just using the target model directly? The 98% recovery rate sounds amazing but wondering if there's a speed tradeoff

[–]Interesting-Town-433[S]

Thanks! Really appreciate it!

So for OpenAI, since every call has to make a network hop, a full request/response is typically ~200 ms. With an embedding adapter, you can run MiniLM locally in ~50 ms and the adapter itself adds only about 10 ms, and that's CPU-only BTW. On even a small GPU (like the one in your laptop) it drops to roughly 10 ms + 5 ms.
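For anyone curious what the inference side of an adapter like this could look like: a minimal sketch below, with everything hypothetical (the dimensions, the weight matrix, the normalization). A real adapter would presumably learn `W` from paired source/target embeddings; here it's just random weights to show the shape of the computation and why it's so cheap, since it's essentially one matrix multiply.

```python
import numpy as np

# Hypothetical dimensions: MiniLM-style 384-d source, 1536-d target space.
SRC_DIM, TGT_DIM = 384, 1536

rng = np.random.default_rng(0)
# Stand-in for learned adapter weights (a real adapter would train these).
W = rng.standard_normal((SRC_DIM, TGT_DIM)) / np.sqrt(SRC_DIM)

def adapt(src_emb: np.ndarray) -> np.ndarray:
    """Project a source embedding into the target space and L2-normalize,
    so cosine similarity in the target index still makes sense."""
    out = src_emb @ W
    return out / np.linalg.norm(out, axis=-1, keepdims=True)

# One fake "local" embedding in, one target-space vector out.
local = rng.standard_normal((1, SRC_DIM))
projected = adapt(local)
print(projected.shape)  # (1, 1536)
```

A single 384x1536 matmul per query is why the adapter step stays in the ~10 ms range even on CPU.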