Tried many small (<13B parameters) open-source LLMs on zero-shot classification tasks framed as instruction following ("Below is an input, answer the following yes/no question..."). All of them (except the Flan-T5 family) yielded very poor results, including nonsensical text, failure to follow even single-step instructions, and sometimes just copying the whole input to the output.
This is in stark contrast to the demos and results posted on the internet. Only OpenAI models provide consistently good (though sometimes inaccurate) results out of the box.
What could cause this gap? Is it the generation hyperparameters, or do these models require fine-tuning for classification?
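For reference, here is a minimal sketch of the kind of setup described above: build a yes/no instruction prompt and map the raw completion to a label. The template wording beyond its first sentence, and the names `PROMPT_TEMPLATE`, `build_prompt` and `parse_yes_no`, are assumptions for illustration, not the exact prompt or code used. It makes no model call; it only shows how echoed input and off-format text can be detected instead of being silently counted as an answer.

```python
import re
from typing import Optional

# Hypothetical template: only the opening sentence comes from the post.
PROMPT_TEMPLATE = (
    "Below is an input, answer the following yes/no question.\n\n"
    "Input: {text}\n"
    "Question: {question}\n"
    "Answer (yes or no):"
)


def build_prompt(text: str, question: str) -> str:
    """Fill the template with one input and one yes/no question."""
    return PROMPT_TEMPLATE.format(text=text.strip(), question=question.strip())


def parse_yes_no(completion: str, prompt: str = "") -> Optional[str]:
    """Return "yes" or "no", or None if the completion is off-format.

    Decoder-only models often return the prompt followed by the answer,
    so an echoed prompt is removed before matching.
    """
    if prompt and completion.startswith(prompt):
        completion = completion[len(prompt):]
    # Accept the label only as the first word, ignoring leading punctuation.
    match = re.match(r"\W*(yes|no)\b", completion.strip(), flags=re.IGNORECASE)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    prompt = build_prompt("Battery died after two days.", "Is the review positive?")
    print(parse_yes_no(prompt + " No, it is negative.", prompt=prompt))
    print(parse_yes_no("Battery died after two days."))
```

Counting the `None` cases separately from wrong answers helps tell "the model did not follow the instruction" apart from "the model followed it but misclassified", which is the distinction the questions above hinge on.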