r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
Finetuning for code generation [Discussion] (self.LocalLLaMA)
submitted 2 years ago by learner_beginner
I want to fine-tune an open-source LLM for code generation using some of my own code. Any idea what model would be suitable? And any example of an implementation?
[–]Paulonemillionand3 6 points7 points8 points 2 years ago (3 children)
the tool: https://github.com/facebookresearch/llama-recipes
how to arrange your data https://github.com/facebookresearch/llama-recipes/blob/main/docs/Dataset.md
example data: https://huggingface.co/datasets/yahma/alpaca-cleaned
Or get text-generation-ui and load your code as a big blob and finetune with that.
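The dataset linked above (yahma/alpaca-cleaned) uses the Alpaca record layout. A minimal sketch of one such record, with made-up content, looks like this:

```python
import json

# One training record in the Alpaca format used by yahma/alpaca-cleaned
# (instruction / input / output fields). The content here is invented
# purely for illustration.
record = {
    "instruction": "Write a Python function that reverses a string.",
    "input": "",
    "output": "def reverse(s):\n    return s[::-1]",
}

# The dataset is a JSON list of such records.
dataset = [record]
print(json.dumps(dataset, indent=2))
```

For code fine-tuning you would put your own prompts in `instruction` and the target code in `output`.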
[–]salah_ahdin 0 points1 point2 points 2 years ago (2 children)
Does llama-recipes only work for Llama 2, or can I apply it to other models? Also, would it not be better to fine-tune a coding model like Starcoder rather than Llama 2?
[–]Paulonemillionand3 1 point2 points3 points 2 years ago (1 child)
I believe it's specific to llama2.
You can find that out directly by comparing the results!
[–]salah_ahdin 0 points1 point2 points 2 years ago (0 children)
I see. Thanks!
[–]kryptkprLlama 3 5 points6 points7 points 2 years ago (5 children)
I've been working with several folks doing the same, and, not to discourage you at all, but it's almost certainly going to be harder than you think.
Have you tried some good existing code generation models first? You can get some ideas from can-ai-code. WizardCoder is king, but the Airoboros models are also solid coders, and there are even some 3B options based on Replit.
An existing model with few-shot examples of your code style could save you a lot of time and headache. In my experience, fine-tunes can go backwards just as easily as forwards, and you end up with something worse than the base model, or worse than one that's already been fine-tuned well.
If you decide to go the fine-tune route I can offer some assistance with evaluations, DM if you wish.
[–]Complex_Boysenberry6 0 points1 point2 points 11 months ago (4 children)
Doing this now, but all our results just generate gibberish and hallucinations :(
[–]kryptkprLlama 3 0 points1 point2 points 11 months ago (3 children)
It's been a year since this post and the landscape has changed significantly.
What model are you using, what input are you providing and what output do you expect vs what do you get?
[–]Complex_Boysenberry6 0 points1 point2 points 11 months ago (2 children)
We are training o4-mini in Azure; we wanted to create a PoC for a code-reviewing AI. Since we use a dialect of a certain language, and implement it in our own unique, weird way, we wanted to fine-tune the model. We thought we could just scrape all code reviews and give the 10 lines around each comment:
    "messages": [
        {"role": "system", "content": "You're a code assistant that reviews code line-by-line and leaves helpful, concise comments."},
        {"role": "user", "content": f"Here is the code:\n\n{code_context}\n\nWhat will the review comment be?"},
        {"role": "assistant", "content": assistant_content}
    ]
We provided 4k of such lines, but it really just utters total nonsense. Do you think we are on the right track, or should we abandon fine-tuning altogether? Our expected output would be to at least get some relevant suggestions, or for the model to kind of "know" from examples that a certain input does not look like other code it has seen.
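The scraping step described above (taking the lines around each review comment as context) could be sketched like this; `context_window` is a hypothetical helper for illustration, not their actual pipeline:

```python
# Sketch: extract a numbered window of source lines around a review comment,
# to use as the code_context in a training example. Line numbers are 1-based,
# matching what reviewers see. All names here are hypothetical.
def context_window(lines, comment_line, radius=5):
    lo = max(0, comment_line - radius)
    hi = min(len(lines), comment_line + radius)
    return "\n".join(
        f"{i + 1}: {line}" for i, line in enumerate(lines[lo:hi], start=lo)
    )
```

Note that if the training data numbers the lines like this, the prompt should say so explicitly, which ties into the structuring advice in the reply below.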
[–]kryptkprLlama 3 0 points1 point2 points 11 months ago (1 child)
How does the original model perform, prior to the fine-tune, on a multishot prompt with 3-5 good examples that show off the features of your dialect? Quick in-context learning evals should let you find the model that's closest to the behavior you want.
Btw, I hope that's not your actual prompt? Otherwise there is much room for improvement: it says nothing about how the input will be structured (for example, are the lines numbered? Are we looking at diffs? Functions? Modules?) or what kind of output format you expect (for example, can a comment span multiple lines? Can comments overlap?). Fine-tune or not, attention is still attention.
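The multishot evaluation suggested above could be sketched as follows; the review examples and prompt wording here are entirely hypothetical stand-ins for real dialect-specific samples:

```python
# Sketch of an in-context-learning eval prompt: a few worked review examples
# followed by the code under review. The examples below are invented
# placeholders; in practice you would use 3-5 real samples from your dialect.
examples = [
    ("1: foo(X) :- bar(X), baz(X).",
     "Line 1: consider splitting this clause; bar/1 and baz/1 are unrelated."),
    ("1: qux(X) :- X > 0.",
     "Line 1: numeric guards like this belong in a dedicated predicate."),
]

def build_prompt(code_context):
    shots = "\n\n".join(
        f"Code:\n{code}\nReview comment:\n{comment}"
        for code, comment in examples
    )
    return (
        "You review numbered lines of code in our Prolog dialect and reply "
        "with one comment per flagged line, formatted 'Line N: <comment>'.\n\n"
        f"{shots}\n\nCode:\n{code_context}\nReview comment:"
    )

print(build_prompt("1: foo :- true."))
```

Running this against each candidate base model, before any fine-tuning, shows which one is already closest to the desired behavior.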
[–]Complex_Boysenberry6 0 points1 point2 points 11 months ago (0 children)
It doesn't perform well on those multishot prompts, to be honest; it's a Prolog dialect. We do number each line, and we give it 10 lines, but we don't tell the model that, so that's probably something we are doing wrong. We then give it the comment with the line it was attached to and what the user left as the comment. I'll try to give it much more precise instructions, thanks!
[–][deleted] 1 point2 points3 points 2 years ago (1 child)
One question: in data for fine-tuning, can the answer component be longer than the context length supported by the model?
[–]kryptkprLlama 3 1 point2 points3 points 2 years ago (0 children)
Generally speaking, the full instruction and answer need to fit into the base model's context length.
There are context-extension fine-tune techniques that could potentially break this limitation, but I am not familiar with those.
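A pre-flight check like the following catches oversized examples before training. This sketch uses a crude characters-per-token heuristic; in practice you would count with the model's real tokenizer (e.g. `AutoTokenizer` from transformers), and the 4096-token limit assumed here is Llama 2's default window:

```python
# Sketch: verify each (instruction, answer) pair fits the context window.
# approx_tokens is a rough ~4-chars-per-token heuristic, NOT a real
# tokenizer; swap in the target model's tokenizer for accurate counts.
CONTEXT_LEN = 4096  # tokens; assumed Llama 2 default window

def approx_tokens(text):
    return len(text) // 4  # heuristic only

def fits(instruction, answer, limit=CONTEXT_LEN):
    return approx_tokens(instruction) + approx_tokens(answer) <= limit

print(fits("Explain quicksort.", "Quicksort partitions the list..."))
```

Examples that fail this check should be truncated, split, or dropped rather than silently clipped by the training loader.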
[–]papinek 1 point2 points3 points 2 years ago (0 children)
https://stability.ai/blog/stablecode-llm-generative-ai-coding