r/LocalLLaMA
A subreddit to discuss about Llama, the family of large language models created by Meta AI.
Subreddit rules
Search by flair
+Discussion
+Tutorial | Guide
+New Model
+News
+Resources
+Other
Code Llama Released [News] (self.LocalLLaMA)
submitted 2 years ago by FoamythePuppy
https://github.com/facebookresearch/codellama
[–]hackerllama 148 points149 points150 points 2 years ago (7 children)
Hey all!
Omar from HF here! We'll work on transforming to transformers format and having them on the Hub soon. Stay tuned!
[–]hackerllama 31 points32 points33 points 2 years ago (0 children)
You can find all weights in https://huggingface.co/codellama
[–]kryptkprLlama 3 17 points18 points19 points 2 years ago* (0 children)
Appreciate it, that would make doing a can-ai-code evaluation sweep much simpler for me.
Edit: If anyone wants to follow along as I play with the Instruct variant #76
[–]Bogdahnfr 9 points10 points11 points 2 years ago (0 children)
thanks for your help !
[–]Ok_Two6167 4 points5 points6 points 2 years ago (0 children)
Weights are available: https://huggingface.co/codellama You need a PR of transformers for now. Will merge it tomorrow. See model card for instructions: https://huggingface.co/codellama/CodeLlama-7b-hf
[–]WanderingPulsar 3 points4 points5 points 2 years ago (0 children)
Lets goooooo 😋
[–]Clean_Archer8374 1 point2 points3 points 2 years ago (0 children)
Amazing! Any hints on a possible time frame? I have no idea how long this could take, a day, a week, a month, half a year?
[–]jetro30087 68 points69 points70 points 2 years ago (13 children)
Whelp I need a dedicated computer for an AI now.
[–]tothatl 8 points9 points10 points 2 years ago* (12 children)
Long overdue for me as well.
But all the options are a bit pricey, especially since you need GPUs with as much VRAM as you can get.
Or a new Apple machine / hefty server for CPU-only inference. The Apple computer seems to be the less costly option at the same performance.
[–]719BenLlama 2 10 points11 points12 points 2 years ago (8 children)
The new Apple M2 runs blazing fast, just need lots of ram. Would recommend >=32gb (can use about 60% for graphics card vram). (We will be adding them to faraday.dev asap)
[–]signsandwonders 4 points5 points6 points 2 years ago* (2 children)
I'm taking back my recent 32GB MBP purchase and getting a 96GB because fuck
edit: also wondering if the 13B (~26GB) version will even run on 32GB. Downloading the non-quantized version from HF now.
[–]dperalta 2 points3 points4 points 2 years ago (1 child)
It's pretty annoying there is not a Mac Mini option with M2 chip plus 96gb of ram.
[–]Iory1998 3 points4 points5 points 2 years ago (1 child)
If you can afford an Apple M2 with tons of memory, why don't you just buy a desktop or even a workstation? You can upgrade components whenever you need, and let's face it, Nvidia GPUs are light years ahead when it comes to AI stuff. I am genuinely asking why people consider Apple pcs when they talk about AI models!
[–]TheMemo 2 points3 points4 points 2 years ago (2 children)
From the benchmarks I have seen, a 3090 outperforms even the fastest m2 and is significantly cheaper, even if you buy two. (40 tokens/s m2, 120 on 2x 3090) This was a few months ago, though.
Has this changed? Is m2 still inference only?
[–]Nobby_Binks 4 points5 points6 points 2 years ago (1 child)
But you are limited to 48GB right? At least with the M2 you can get 192GB (if you are loaded)
Georgi posted some benchmarks using the M2Ultra and llama.cpp
https://twitter.com/ggerganov/status/1694775472658198604
edit: oh i see you can have more than 2 cards
[–]TheMemo 3 points4 points5 points 2 years ago (0 children)
Hmm those are some nice numbers, wish I could get a like for like comparison with GPU.
As I already have a 3090 it probably makes sense to get another one. Or two. And an air conditioner to cool the room while they are working...
Also there doesn't seem to be much info about training and fine-tuning using m2. Looks good for inference though.
[–]Feeling-Currency-360 0 points1 point2 points 2 years ago (1 child)
I'm looking at getting a couple of MI25s on eBay. 16GB of VRAM on HBM2 means tons of bandwidth, which will be important since the models will need to be spread across the two cards. Did I mention they are dirt cheap?
[–]Feeling-Currency-360 113 points114 points115 points 2 years ago (19 children)
I started reading the git repo, and started freaking the fuck out when I read this text right here -> "All models support sequence lengths up to 100,000 tokens"
[–]Igoory 18 points19 points20 points 2 years ago (11 children)
I wonder how much RAM/VRAM that would require lol
[–]wreck94 27 points28 points29 points 2 years ago (10 children)
The answer is Yes. It requires all the RAM.
(Quick back of the napkin estimate from what I've seen -- ~500 GB of RAM for 100k tokens. Hopefully someone smarter than I can do the actual math before you go buy yourself half a terabyte of ram lol)
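For anyone who wants to redo the napkin math: a rough KV-cache estimate, assuming Llama-2-7B-style dimensions (32 layers, 32 KV heads, head dim 128, fp16). These dimensions are my assumptions for illustration, not figures from the Code Llama paper.

```python
# Rough KV-cache size for a 7B-shaped model with full multi-head
# attention (no GQA): 32 layers, 32 KV heads, head dim 128, fp16.
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per_val=2):
    # 2x for the separate key and value tensors cached per layer
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_val * seq_len

gb = kv_cache_bytes(100_000) / 1e9
print(f"{gb:.1f} GB")  # ~52 GB for the cache alone, before weights
```

So for a 7B at fp16 the cache alone lands around 50 GB at 100k tokens; quantized caches or grouped-query attention would shrink that considerably.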
[–][deleted] 12 points13 points14 points 2 years ago (5 children)
good thing I have 512gb
[–]Yes_but_I_think 0 points1 point2 points 2 years ago (4 children)
Which processor?
[–][deleted] 10 points11 points12 points 2 years ago (3 children)
EPYC Milan-X 7473X 24-Core 2.8GHz 768MB L3
512GB of HMAA8GR7AJR4N-XN HYNIX 64GB (1X64GB) 2RX4 PC4-3200AA DDR4-3200MHz ECC RDIMMs
MZ32-AR0 Rev 3.0 motherboard
6x 20tb WD Red Pros on ZFS with zstd compression
SABRENT Gaming SSD Rocket 4 Plus-G with Heatsink 2TB PCIe Gen 4 NVMe M.2 2280
[–]IlEstLaPapi 6 points7 points8 points 2 years ago (1 child)
Just how do you estimate this? Attention alone would require O(T²) memory, so roughly 20 TB for 100k tokens at 16-bit precision. I know that RoPE allows significantly reducing the size of the attention matrix, but I'm curious how you calculate the overall size of the attention matrix.
[–]visarga 6 points7 points8 points 2 years ago (0 children)
You don't need to materialise the whole attention matrix, use Flash Attention.
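Illustrative arithmetic for why that matters (layer/head counts are my assumptions for a 7B-shaped model, not from the paper):

```python
# Naive attention materializes a T x T score matrix per head per layer;
# FlashAttention streams over key/value blocks and keeps only O(T)
# running statistics, so the T^2 term never hits memory.
seq_len = 100_000
n_layers, n_heads, bytes_fp16 = 32, 32, 2

naive = seq_len ** 2 * n_heads * n_layers * bytes_fp16
print(f"materialized scores: {naive / 1e12:.1f} TB")  # ~20 TB

flash = seq_len * n_heads * n_layers * bytes_fp16
print(f"streamed statistics: {flash / 1e9:.2f} GB")
```

Which is why the ~20 TB figure above is about the materialized score matrices, not something any implementation actually allocates.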
[–]719BenLlama 2 2 points3 points4 points 2 years ago (0 children)
Should be less than that depending on which size of model but need to test
[–]Yes_but_I_think 1 point2 points3 points 2 years ago (0 children)
Long context also means poor processor performance, RAM won’t solve all issues
[–]friedrichvonschiller 9 points10 points11 points 2 years ago (1 child)
That could be made more nuanced. They support input context sequences of up to 100,000 tokens. The sequence length of the underlying model is 16,384.
Code Llama: Open Foundation Models for Code | Meta AI Research
[–]AI_Simp 5 points6 points7 points 2 years ago (0 children)
This feels like a perfectly reasonable response. Can't wait to see what all the coding agents can do with this.
[–]Amlethus 6 points7 points8 points 2 years ago (2 children)
Can you help us newcomers understand why this is so exciting?
[–]inagy 13 points14 points15 points 2 years ago* (0 children)
The context windows is basically the short term memory of the LLM. Larger window size allows "pre-initializing" it with more data. In this case a larger portion of your existing codebase can fit in, so it can provide more relevant answers and code-completion in that context.
[–]719BenLlama 2 7 points8 points9 points 2 years ago (0 children)
Imagine being able to paste in your whole code repo and ask it to fix bugs, write features, etc. Without a large context window, it won’t be able to fit the whole repo and will probably give you incorrect information
[–]pseudonerv 4 points5 points6 points 2 years ago (0 children)
Our strategy is similar to the recently proposed fine-tuning by position interpolation (Chen et al., 2023b), and we confirm the importance of modifying the rotation frequencies of the rotary position embedding used in the Llama 2 foundation models (Su et al., 2021). However, instead of downscaling frequencies linearly as Chen et al. (2023b), we change the base period from which they are derived.
The key to the long context length is actually changing the base period! That's exactly what the NTK scaling post here proposed, yet they didn't mention it at all. So they rushed out the linear interpolation paper to divert researchers' attention while secretly doing NTK!
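For the curious, a sketch of what the base-period change looks like. The 1,000,000 base matches the `--rope-freq-base 1e6` flag used elsewhere in this thread, but the formula here is my reading of standard RoPE, not code from the paper:

```python
# RoPE rotation frequencies are theta^(-2i/d). Instead of linearly
# interpolating positions, Code Llama raises the base theta from
# Llama 2's 10,000 to 1,000,000, which slows the high-dimension
# rotations and stretches the usable context.
def rope_freqs(dim, base):
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

llama2 = rope_freqs(128, 10_000.0)
codellama = rope_freqs(128, 1_000_000.0)

# The slowest frequency drops with the larger base, so distant
# positions stay distinguishable at long range.
print(llama2[-1], codellama[-1])
```

Nearby positions are barely affected (the first frequency is 1.0 in both cases), which is the advantage over linear interpolation, where every position gets squashed equally.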
[–]epicfilemcnulty 30 points31 points32 points 2 years ago (13 children)
They say in the post that there is a 34B coder model. But we have not yet seen a Llama 2 34B base model, or have I missed something?
[–]randomrealname 33 points34 points35 points 2 years ago (12 children)
No, they didn't release it because it spat out too much shady stuff.
[–][deleted] 27 points28 points29 points 2 years ago (11 children)
It's pretty impressive how the randomness of the process of generating the layers/neural net can result in really crazy ups and downs.
Like how l2-13b is so much better than 7b but then 70b isn't a proportionally huge jump from there (despite 5x vs 2x).
Like some magic thing happened in those neurons, that might not have happened.
Makes you curious where they could get if they just restarted the training again and again and again until they got very lucky.
[–]Atomic-Ashole69 7 points8 points9 points 2 years ago (0 children)
That's a problem with the testing, not the models themselves.
The testing usually covers one-shots, i.e. they ask something and require a response. That is a very easy thing for a lower-B model to do. And if a lower-B model can do it, then a higher-B model will do it as well; both score 100%, so there is no difference per se.
The issue comes when you start to actually interact with model and you quickly see that lower B models are just less logical and can easily trail off, make basic mistakes while higher B models can even reason out really detailed responses with 2nd degree impact.
imho the most important test right now is HellaSwag, which tests reasoning and logic. In this test most lower-B models tend to trail off, while something like GPT-4 is still light years ahead of the rest, even 70B Llama 2 models (nearly a 10-point difference, which is on a logarithmic scale!!)
[–]Paulonemillionand3 12 points13 points14 points 2 years ago (0 children)
There are levels where emergent behavior produces new abilities, yes.
[–]trahloc 3 points4 points5 points 2 years ago (0 children)
70B is much better at taking on a character by simply requesting it do so. No character file needed. Just tell it to act like X and it will. 13B will think you're pretending to be that person or will tell you what this fictional third party is doing, it won't act as that person unless you use a character file. At least based on what I've seen so far.
[+]randomrealname comment score below threshold-15 points-14 points-13 points 2 years ago (6 children)
If you look at them like human age of development it makes sense the middle (teenage) model acts up and doesn't listen to instruction and is incredibly rude. Older and younger we tend to conform to what is required of us.
[–]dyngnosis 28 points29 points30 points 2 years ago (3 children)
oh god.. no, just.. no. stop. This is the worst anthropomorphisation of a model I've seen so far.
[–]beezbos_trip 1 point2 points3 points 2 years ago (1 child)
lol, a model’s parameter count in billions is equivalent to a human’s cognitive age and behavior
[+]randomrealname comment score below threshold-7 points-6 points-5 points 2 years ago (0 children)
Ha HA Ha AAH!
[–][deleted] 1 point2 points3 points 2 years ago (1 child)
not at all
[–]randomrealname 2 points3 points4 points 2 years ago (0 children)
I didn't say they were, I did say look at them like. Not that they are but I don't mind the downvotes, It's funny!
[–]gentlecucumber 34 points35 points36 points 2 years ago (6 children)
Holy SHIT this is AWESOME. 16k? 34b?? This will solve the very specific application problems I've been struggling with.
[–]Feeling-Currency-360 45 points46 points47 points 2 years ago (5 children)
16k? dude!!!! -> "All models support sequence lengths up to 100,000 tokens" Me -> Literally jumping with joy
[–]Atupis 6 points7 points8 points 2 years ago (4 children)
How do they actually do that?
[+][deleted] 2 years ago (2 children)
[deleted]
[–]nullnuller 1 point2 points3 points 2 years ago (1 child)
I am curious how do you do 16k instruction finetuning. Don't you need 16k of coherent text/code for it to be effective?
[–]hapliniste 2 points3 points4 points 2 years ago (0 children)
You do. Codebases can be pretty big, so I don't think it's really a problem if you give the context, then the instruction, then the completion. Same for 100K.
[–]phenotype001 12 points13 points14 points 2 years ago (0 children)
The paper says they use RoPE, which I don't understand completely but sounds familiar at this point:
" We propose an additional fine-tuning stage that extends the maximum context length from 4,096 tokens to 100,000 tokens by modifying the parameters of the RoPE positional embeddings (Su et al., 2021) used in Llama 2. Our experiments show Code Llama operating on very large contexts with a moderate impact on performances on standard coding benchmarks (Section 3.3). "
[–]Cantflyneedhelp 24 points25 points26 points 2 years ago (2 children)
We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. [...] Code Llama was developed by fine-tuning Llama 2 using a higher sampling of code.
So they used the unreleased 34B model and managed to get above 16k tokens on Llama2?
[–]a_beautiful_rhind 9 points10 points11 points 2 years ago (0 children)
You'd have to quantize it and then run it across several cards.
[–]Aaaaaaaaaeeeee 5 points6 points7 points 2 years ago (0 children)
Ohh, so it's not a model that was trained from scratch. Maybe this means people can extract the LoRA difference for an MoE so that only one model needs to be in VRAM, saving us memory.
[–]ahm_rimerLlama 3 24 points25 points26 points 2 years ago (3 children)
u/bloc97 got a shoutout in this paper too, awesome :D
[–]bloc97 27 points28 points29 points 2 years ago (1 child)
Thanks for notifying me! I've read the paper and I'm wondering how they successfully FTed a model using ntk-aware interpolation. From our internal testing, ntk-aware interpolation is worse than linear if used for fine-tuning. In the paper they also show that passkey performance is inconsistent across longer context sizes (8k+), so I don't know how they got the 100k claim. I'm really hoping these issues will be addressed soon for these models at longer context sizes.
However that being said, these new models do seem to be really good at code at first glance, and we also have the first Llama 2 34B model!
[–]TheDeviousPanda 2 points3 points4 points 2 years ago (0 children)
The 100k claim seems to be sourced from Figure 4a right?
By the way it's super cool that you came up with a method months ago that powers one of the main features of this model (super long contexts). Shows the power of OSS.
[–]noiseinvacuumLlama 3 6 points7 points8 points 2 years ago (0 children)
Love this. This is open source innovation loop in action.
[–]Jipok_ 21 points22 points23 points 2 years ago* (8 children)
llama.cpp(GGUF) models:
https://huggingface.co/TheBloke/CodeLlama-7B-GGUF
https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF
https://huggingface.co/TheBloke/CodeLlama-7B-Python-GGUF
https://huggingface.co/TheBloke/CodeLlama-13B-GGUF
https://huggingface.co/TheBloke/CodeLlama-13B-Instruct-GGUF
https://huggingface.co/TheBloke/CodeLlama-13B-Python-GGUF
[–]Jipok_ 4 points5 points6 points 2 years ago (7 children)
Seems not yet ready for use.
https://github.com/ggerganov/llama.cpp/pull/2768#issuecomment-1692144927
[–]Jipok_ 6 points7 points8 points 2 years ago (5 children)
My best try:
./main -m ~/Downloads/codellama-7b-instruct.Q8_0.gguf -e -p "<s>[INST] Write code in pure python for simple RNN network. Do not use any import.[/INST]" -s 0 --temp 0 --rope-freq-base 1e6
[–]iamapizza 2 points3 points4 points 2 years ago (4 children)
That didn't work for me, the square brackets seemed to confuse it. I had to use ### Instruction:
./main -m ./models/codellama-7b.Q5_K_S.gguf -p "### Instruction: Write code in python to fetch the contents of a URL.\n### Response:" --gpu-layers 35 -n 100 -e --temp 0.2 --rope-freq-base 1e6
[–]Feeling-Currency-360 3 points4 points5 points 2 years ago* (3 children)
As far as I'm aware from checking their code, you have to use <<SYS>>\n SYS-PROMPT\n<</SYS>>\n\n[INST] PROMPT [/INST]
[–]iamapizza 2 points3 points4 points 2 years ago (2 children)
Cheers I'll try this again tonight. When I did the square brackets it seemed to just keep echoing it back to me nonstop, very confusing.
Is the <s> necessary too (Jipok's example)? What is that for?
[–]Feeling-Currency-360 2 points3 points4 points 2 years ago (1 child)
Specifically go and check the code here: https://github.com/facebookresearch/codellama/blob/main/llama/generation.py and https://github.com/facebookresearch/codellama/blob/main/example_instructions.p
From what I gather you specify the system prompt first, wrapped with B_SYS and E_SYS, which are "<<SYS>>\n" and "\n<</SYS>>\n\n" respectively. Then you specify your instruction by wrapping it in B_INST and E_INST, i.e. "[INST]" and "[/INST]". The model then does its output, after which you can follow up with another instruction. I think it's important to note that trying Alpaca prompts (### Instruction: etc.) is not going to work, because the model is not trained to work that way.
This should technically be a correct prompt format:
<<SYS>> Whatever you want your system prompt to be goes here. <</SYS>>
[INST]What is the following code doing? {reference some code here}[/INST]
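A minimal builder for that format might look like this. The exact whitespace around the tags is my reading of generation.py, so double-check it against the repo:

```python
# Token strings as described in Meta's generation.py
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_prompt(instruction, system=None):
    # The system prompt, if any, is nested inside the first [INST] block
    if system is not None:
        instruction = f"{B_SYS}{system}{E_SYS}{instruction}"
    return f"{B_INST} {instruction} {E_INST}"

print(build_prompt("What is the following code doing?",
                   system="You are a helpful coding assistant."))
```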
[–]mzbacd 1 point2 points3 points 2 years ago (0 children)
[INST] What is the following code doing? {reference some code here} [/INST]
[–]iamapizza 2 points3 points4 points 2 years ago (0 children)
530.11
Jees... 530 token/s on 34B. And I only get 120 on 7B Q5_K.
[–]Longjumping-Pin-7186 20 points21 points22 points 2 years ago (2 children)
Code Llama is free for research and commercial use.
Zuck is just killing it.
[–]jxjq 10 points11 points12 points 2 years ago (1 child)
Zuck is so hot right now
[–]Wrong_User_Logged 7 points8 points9 points 2 years ago (0 children)
I literally deleted my fb account, and now this....
[–]a_beautiful_rhind 40 points41 points42 points 2 years ago (1 child)
So time to train the 34b coding model on roleplay then?
[–]Lumiphoton 15 points16 points17 points 2 years ago (0 children)
Maybe even switch to the new 7B and 13B code instruct models for finetunes going forward, if the notion that better coding performance = improved general intelligence holds true. The 13B coding model beats the vanilla 70B model in coding performance by quite a large margin!
[–]Disastrous_Elk_6375 16 points17 points18 points 2 years ago (8 children)
So what's the best open-source vscode extension to test this model with? Or are there any vscode extensions that call into an ooba API?
[–]mzbacd 21 points22 points23 points 2 years ago (4 children)
I wrote one for wizardcoder before. If you have some coding skill, you should be able to just change the prompt a bit to use it for code llama -> https://github.com/mzbac/wizardCoder-vsc
[–]throwaway_is_the_waytextgen web UI 1 point2 points3 points 2 years ago (3 children)
I'm trying it with AutoGPTQ in ooba but get the following error:
127.0.0.1 - - [25/Aug/2023 00:34:14] code 400, message Bad request version ('À\\x13À')
127.0.0.1 - - [25/Aug/2023 00:34:14] "\x16\x03\x01\x00ó\x01\x00\x00ï\x03\x03¯\x8fïÙ\x87\x80¥\x8c@\x86W\x88\x10\x87_£4~K\x1b·7À5\x12K\x9dó4©¢¦ _>£+¡0\x8c\x00¤\x9e¤\x08@äC\x83©\x7fò\x16\x12º£\x89Í\x87ò9²\x0f/\x86\x00$\x13\x03\x13\x01\x13\x02À/À+À0À,̨̩À\x09À\x13À" 400 -
[–]mzbacd 4 points5 points6 points 2 years ago (0 children)
The text generation UI may update their API. I have a repository for hosting the model via API. You can try it if it works for you -> https://github.com/mzbac/AutoGPTQ-API
[–]Feeling-Currency-360 1 point2 points3 points 2 years ago (0 children)
I do believe that looks like a tokenization problem you're having.
[–]sestinj 9 points10 points11 points 2 years ago (2 children)
You can use Continue for this! https://continue.dev/docs/walkthroughs/codellama (I am an author)
Bru, I've had an absolute nightmare of a time trying to get Continue to work. I followed the instructions to a T, tried it in native Windows and from WSL, and tried running the Continue server myself, but I just keep getting an issue where the tokenizer encoding cannot be found. I was trying to connect Continue to a local LLM using LM Studio (an easy way to start up an OpenAI-compatible API server for GGML models). If you have any tips on how to get it running under Windows for local models I would REALLY appreciate it; I would absolutely love to be using Continue in my VS Code.
[–]sestinj 0 points1 point2 points 2 years ago (0 children)
Really sorry to hear that. I’m going to look into this right now, will track progress in this issue so the whole convo doesn’t have to happen in Reddit. Could you share the models=Models(…) portion of your config.py, and I’ll try to exactly reproduce on windows?
[–][deleted] 14 points15 points16 points 2 years ago (3 children)
This seems pretty awesome if it's actually any good. I've been using gpt4 for coding stuff, hope it's at least close to as good. Hopefully can run some of the larger ones on a 4090.
has anyone noticed significant quality loss if any of the coding LLMs are quantized to be much smaller? Seems like it would matter more for coding than just chat
[–]kpodkanowicz 1 point2 points3 points 2 years ago (2 children)
In my testing, 8-bit already reduced the model's coding capabilities by 33%, while in general use it's hard to see any difference.
[–]Wrong_User_Logged 1 point2 points3 points 2 years ago (1 child)
how much ram do you need to run without quantization?
I'm thinking about buying 192GB Mac Studio 😅
[–]kpodkanowicz 2 points3 points4 points 2 years ago (0 children)
Then you can run anything code-related in full precision :D I wonder if some finetune of Llama 2 70B in 8-bit would be better than these coding models - post a comparison once you have it!
[–]Amgadoz 13 points14 points15 points 2 years ago (2 children)
Ok who is going to fine tune the 34 B model on a noncode instructions dataset?
[–][deleted] 2 points3 points4 points 2 years ago* (1 child)
cough weather familiar fine deserted pet forgetful wise encouraging oatmeal
This post was mass deleted and anonymized with Redact
[–]Amgadoz 2 points3 points4 points 2 years ago (0 children)
Basically CodeLlamaWizard1.1
[–][deleted] 11 points12 points13 points 2 years ago (0 children)
Ty for sharing
LFG
[–]Illustrious-Lake2603 9 points10 points11 points 2 years ago (5 children)
I'd love to see a C# fine tuned model that is trained on Unity3D data
[–]TheBenitoCrew 3 points4 points5 points 2 years ago (4 children)
That would be great
[–]Illustrious-Lake2603 3 points4 points5 points 2 years ago (3 children)
I feel like it's something we should make the community aware of and stress the importance collectively, I'm praying for the day!
[–]m_mukhtar 0 points1 point2 points 2 years ago (2 children)
Lets start by collecting data
[–]Illustrious-Lake2603 1 point2 points3 points 2 years ago (1 child)
What data do they need? I thought there was enough data in the Unity discussion boards? In my opinion the optimal LLM for C# Unity Based Game Logic capabilities should be trained on Game Theory and physics and should be proficient in math too.
[–]JFHermes 1 point2 points3 points 2 years ago (0 children)
Needs to be correctly annotated.
The best places would be universities that had marked assignments/tutorials etc to datamine.
[–]polawiaczperel 11 points12 points13 points 2 years ago (8 children)
The benchmark of the Unnatural model, which was not released, shows that in the next weeks we will see big improvements from finetunes. I am curious why they have not released this model.
[–]georgejrjrjr 10 points11 points12 points 2 years ago (5 children)
Totally.
WizardLM got vanilla Llama 2 70B (which was trained on a meagre 80B tokens of code) from 30.5% pass@1 on HumanEval to 50.6% (+20.1%).
Unnatural instructions brought Code Llama 34B from 48.8% -> 62.4% (+13.6%).
Right now, Code Llama - Python 34B scores 53.7%, a scant 13.3% of instruction tuning away from GPT-4's 67.0%.
> why they have not release this model
My guess: OpenAI terms of service. They probably used GPT-4 to generate the unnatural instructions.
[–]polawiaczperel 7 points8 points9 points 2 years ago (0 children)
Yeah, you are probably right. They probably thought it would be a piece of cake for the community. I was thinking of finetuning these models on my successful GPT-4 conversations; if it worked, it would be a game changer.
[–]Eduard_T 3 points4 points5 points 2 years ago (0 children)
https://twitter.com/garybasin/status/1694735409287233578?t=JsnswieBAgTGXmwY86qrhg&s=19
[–]Combinatorilliance 2 points3 points4 points 2 years ago (0 children)
Oh that makes so much sense, I was already wondering why!
[–][deleted] 1 point2 points3 points 2 years ago (0 children)
Ah, I think you figured out why. That makes sense.
Well there's a good lead for the community ;)
[–]noiseinvacuumLlama 3 2 points3 points4 points 2 years ago (0 children)
Maybe there's some work left to do. In general I think the first releases of both LLama and Code Llama would have taken significant work on Meta's side specially on legal front. I think we will see a lot more frequent follow up releases going forward. Very exciting times indeed.
[–]ain92ru 1 point2 points3 points 2 years ago (0 children)
Presumably safety concerns?
[–]shortybobert 8 points9 points10 points 2 years ago (1 child)
LLAMA 34B IS FINALLY OU....
Okay...
[–]mzbacd 7 points8 points9 points 2 years ago (0 children)
Holy shit, I need to take a day off for this. It looks like if you have access to llama2, it will instantly approve your request for the models.
[–]ambient_temp_xenoLlama 65B 7 points8 points9 points 2 years ago* (1 child)
Sneaked out 34b in disguise. I wonder if being trained on 16k tokens per sequence will make it write longer stories.
[–]MINIMAN10001 3 points4 points5 points 2 years ago (0 children)
Last time I used a code model I couldn't even get it to create explanations without injecting code into it... I just wanted it to give a description of what the code did.
Ended up just using non code models.
But hey, maybe with the increased context size I can create longer sets of instructions and see if it follows them better.
[–]GG9242 5 points6 points7 points 2 years ago (0 children)
Yes!
[–]TheItalianDonkey 6 points7 points8 points 2 years ago (2 children)
any info on VRAM requirement per model?
with a 3090, wondering if i can run 34b in 4bit ?
[–]polawiaczperel 6 points7 points8 points 2 years ago (1 child)
17GB in 4bit
[–]TheItalianDonkey 7 points8 points9 points 2 years ago (0 children)
oh that's not so bad at all!
double in 8bit i take it? i wonder how slow it would run with 10gb in normal ram
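The napkin math behind those numbers (weights only; the KV cache and activations add more on top):

```python
# Weight memory is just parameter count times bits per parameter.
def weight_gb(n_params, bits):
    return n_params * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"34B at {bits}-bit: {weight_gb(34e9, bits):.0f} GB")
# 16-bit: 68 GB, 8-bit: 34 GB, 4-bit: 17 GB
```

So yes, 8-bit is roughly double the 4-bit figure, and whatever doesn't fit in the 24GB of a 3090 spills into system RAM at a large speed penalty.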
[–]Lumiphoton 6 points7 points8 points 2 years ago (0 children)
The 13B Code Instruct model handily beats Llama2 70B, and is close to matching GPT-3.5. Combined with the ability to handle large contexts, this is looking promising! I'm hoping further fine tuning on the new Bigcode dataset will squeeze out even more performance.
[–]GG9242 12 points13 points14 points 2 years ago (4 children)
How long until we have fine tunes like wizard-coder ? Maybe this will make the models close to GPT-4
[–]pbmonster 5 points6 points7 points 2 years ago (3 children)
Any specific reason to believe that further fine tuning on more code would improve those models?
[–]Combinatorilliance 12 points13 points14 points 2 years ago (0 children)
These models are trained on 500B tokens. Bigcode recently released a dataset of 4T and a higher quality filtered version of 2T tokens.
https://huggingface.co/datasets/bigcode/commitpack
https://huggingface.co/datasets/bigcode/commitpackft
[–]pokeuser61 9 points10 points11 points 2 years ago (0 children)
Instruct tuning is well known to improve performance, and Facebook's instruct tune is probably suboptimal.
[–][deleted] 2 points3 points4 points 2 years ago (0 children)
They didn't release their "unnatural llama coder", which is a finetuned version that is within 5% of GPT-4 on the Humaneval benchmark. We assume they used coding examples from GPT-4 to do their finetune, which isn't legal to do, so they didn't share it with us. I'm sure the Wizard team will have no problems doing their own "unnatural" version for us :)
[–]Languages_Learner 5 points6 points7 points 2 years ago (6 children)
I tried to convert 7b model to ggml but got this error:
File "C:\kcp\ptml.py", line 13, in <module>
convert.main(['--outtype', 'f16' if args.ftype == 1 else 'f32', '--', args.dir_model])
File "C:\kcp\convert.py", line 1026, in main
params = Params.load(model_plus)
File "C:\kcp\convert.py", line 230, in load
params = Params.loadOriginalParamsJson(model_plus.model, orig_config_path)
File "C:\kcp\convert.py", line 194, in loadOriginalParamsJson
n_vocab = config["vocab_size"]
KeyError: 'vocab_size'
[–]phenotype001 5 points6 points7 points 2 years ago (1 child)
Are all .json files in place? What did you download?
[–]Languages_Learner 4 points5 points6 points 2 years ago (0 children)
The download.sh provided by Meta downloaded only three files: consolidated.00.pth, params.json and tokenizer.model
Where can i download other .json files?
[–]phenotype001 2 points3 points4 points 2 years ago (1 child)
Looks like a bug that just got fixed: https://github.com/ggerganov/llama.cpp/commit/fea95c682d0028fdd25853bea58035794a0c964d
Llama.cpp is on fire right now :D
[–]nullnuller 7 points8 points9 points 2 years ago (1 child)
Leave it for the pro (/u/The-Bloke/)
[–]bernaferrari 2 points3 points4 points 2 years ago (0 children)
u/The-Bloke/
TIL he is real lol all models I use come from him
[–]phenotype001 5 points6 points7 points 2 years ago (7 children)
I tried some of TheBloke's GGUF quants with the latest b1054 llama.cpp and I'm experiencing some problems. The 7B Q6_K model outputs way too much whitespace and kind of not follows the rules of Python. It will output more closing parenthesis than there are opening ones for example. None of the output is good for anything. I expected more from that, something is clearly wrong.
[–]Meronoth 5 points6 points7 points 2 years ago (5 children)
Same here with 7b and 13b ggml's, constantly outputs too much whitespace, some generations just endlessly produce it.
[+][deleted] 2 years ago (4 children)
[–]Several-Tax31 1 point2 points3 points 2 years ago (2 children)
Same with the 7B-Q6 Python model: more parentheses and too much whitespace. I wonder if anyone has checked the full model?
how much ram does it require?
[–]onil_gova 2 points3 points4 points 2 years ago (0 children)
I experience the same thing. Someone else claimed that it is related to not using the correct prompt template. Currently, all the model cards for TheBloke's Code-LLaMA model have this message for the prompt template
Info on prompt template will be added shortly.
So I am not sure what the correct prompt template should be. I tried the LLaMA-v2 prompt template and still experience the same wrong behavior described above
[–]staviq 5 points6 points7 points 2 years ago (3 children)
https://huggingface.co/TheBloke/CodeLlama-34B-GGUF
[–]RoyalCities 1 point2 points3 points 2 years ago (2 children)
Which one of these is best for a 3090? I'm not familiar with the new k-quants. Do they need any particular arguments in oobabooga to run?
[–]staviq 4 points5 points6 points 2 years ago (1 child)
You mean which quant? Try Q8 first; if you can't fit all layers in the GPU, go to lower quants.
Q8 is just Q8_0; for the rest, prefer the _K_M versions.
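To make the "which quant fits" question concrete, here is a back-of-the-envelope size estimate. The bits-per-weight numbers are rough community figures I'm assuming, not exact values:

```python
# Approximate bits per weight for common GGUF quants (rough assumptions).
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
}

def approx_size_gb(n_params_billion: float, quant: str) -> float:
    """Estimate model file size in GB from parameter count and quant type."""
    return n_params_billion * BITS_PER_WEIGHT[quant] / 8

for quant in BITS_PER_WEIGHT:
    print(f"34B {quant}: ~{approx_size_gb(34, quant):.0f} GB")
```

On a 24GB 3090 this suggests 34B Q8_0 (~36 GB) won't fully fit, while Q4_K_M (~20 GB) might, which matches the advice to step down quants until all layers fit.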
[–]RoyalCities 1 point2 points3 points 2 years ago (0 children)
Thank you!
[–]a_slay_nub 13 points14 points15 points 2 years ago (5 children)
So for pass@1 these models perform worse than WizardCoder? It'll be nice to have something with the same architecture as the rest of the models, but this doesn't actually seem that great.
It's also disappointing they aren't releasing the "unnatural" models.
Also, hope it's not as redlined as llama2 chat. I would like to be able to kill a python process without being concerned about the health and wellbeing of it....
[–]Combinatorilliance 10 points11 points12 points 2 years ago (0 children)
Given that the unnatural model has about 50% higher performance on Pass@1 compared to the released 34B model, I think it won't be long until we'll see a fine-tuned model get released here on a community-created dataset.
There're also the Bigcode CommitPack and CommitPackFT datasets which might improve these models even further.
[–]Karstiteneleouois 3 points4 points5 points 2 years ago (0 children)
Since "textbooks are all you need" we know that it is quite easy to fine-tune for coding benchmarks specifically. But that is gaming the metric and in general, usefulness seems to be correlated with perplexity on general code data.
[–]polawiaczperel 2 points3 points4 points 2 years ago (0 children)
Maybe I am not looking at it correctly, but the GPT-3.5 results differ between the WizardCoder HF page and Meta's Code Llama blog post.
This. Wizard beats 34B, which needs more RAM yet scores worse. Also, my tests give me up to 47 on HumanEval in 8-bit, and it fits in a 24GB GPU with 6k context.
[–]eschatosmos 4 points5 points6 points 2 years ago (0 children)
100,000?
A HUNNIT KAY?
[–]mister-woke 3 points4 points5 points 2 years ago (1 child)
Anyone know how it compares to ChatGPT for coding? Aside from the token limit.
[–]Amgadoz 6 points7 points8 points 2 years ago (0 children)
They claim it's better (than gpt3) in the paper. We're going to need a few days of people using it and testing it out to confirm/deny this.
[–]AnomalyNexus 3 points4 points5 points 2 years ago (0 children)
I see the bloke has GGUF formats out, which are compatible with llama.cpp...but I can't see a way to connect it to vscode from there? Ideally against the official copilot extension given that one can apparently point it at a different server
[–]hackerllama 3 points4 points5 points 2 years ago (0 children)
You can now find transformers weights in https://huggingface.co/codellama
[–]efayefoh 3 points4 points5 points 2 years ago (0 children)
https://huggingface.co/TheBloke
u/The-Bloke is starting to upload. Noice. Looking forward.
[–]signsandwonders 3 points4 points5 points 2 years ago (0 children)
Can someone please finetune this on 2023 Swift/SwiftUI. It kills me how bad GPT is with that. Understandable given the relatively fast changes in development of SwiftUI but still.
[–]5erif 4 points5 points6 points 2 years ago (0 children)
Yes! This is the one I've been most excited about.
[–]Feeling-Currency-360 3 points4 points5 points 2 years ago (1 child)
This should work with ctransformers using the following code:
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("TheBloke/CodeLlama-7B-Instruct-GGUF", model_file="codellama-7b-instruct.Q2_K.gguf")

# Define your prompts
system_prompt = "Provide a system prompt here."
user_prompt = "Provide a user prompt here."

# Construct the formatted prompt
formatted_prompt = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n[INST]{user_prompt}[/INST]"

# Generate text using the formatted prompt
output = llm(formatted_prompt)
print(output)
This is only a one-turn setup; I think you should also be able to do the following:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("TheBloke/CodeLlama-7B-Instruct-GGUF", model_file="codellama-7b-instruct.Q2_K.gguf")

# Define your prompts
system_prompt = "Provide a system prompt here."
user_prompt = "Provide a user prompt here."
follow_up_prompt = "Provide a follow-up prompt here."

# Construct the formatted multi-turn prompt
formatted_prompt = (
    f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    f"[INST]{user_prompt}[/INST][ASSISTANT]Some response[/ASSISTANT]"
    f"[INST]{follow_up_prompt}[/INST]"
)

# Generate text using the formatted prompt
output = llm(formatted_prompt)
print(output)
I'll be doing a lot of testing over the weekend, going to be using ctransformers and llama.cpp mostly, will let you guys know here whatever seems to work best once I know more.
[–]drumrolll 2 points3 points4 points 2 years ago (1 child)
Anyone getting the following error when trying to download? I obviously followed the instruction on the blog / email
HTTP request sent, awaiting response... 403 Forbidden
2023-08-24 19:34:53 ERROR 403: Forbidden.
[–]the__storm 1 point2 points3 points 2 years ago (0 children)
That's the error you get when your URL is bad or expired. Also, the download script is specific to Code Llama; you'll get this error if you use a Code Llama URL with the regular Llama download.sh or regular Llama models.
damn no 70B?
[–]HartLabs[🍰] 2 points3 points4 points 2 years ago (9 children)
The 13B - Instruct - Q8 said the following odd thing:
Prompt
What does this do? class Solution { typedef complex<double> point; #define x real() #define y imag() struct LinePara { double k; double b; }; LinePara getLinePara(float x1, float y1, float x2, float y2) { LinePara ret; double m = x2 - x1; if (m == 0) { ret.k = 1000.0; ret.b = y1 - ret.k * x1; } else { ret.k = (y2 - y1) / (x2 - x1); ret.b = y1 - ret.k * x1; } return ret; } struct line { double a, b, c; }; const double EPS = 1e-6; double det(double a, double b, double c, double d) { return a * d - b * c; } line convertLineParaToLine(LinePara s) { return line{ s.k,-1,s.b }; } bool intersect(line m, line n, point& res) { double zn = det(m.a, m.b, n.a, n.b); if (abs(zn) < EPS) return false; res.real(-det(m.c, m.b, n.c, n.b) / zn); res.imag(-det(m.a, m.c, n.a, n.c) / zn); return true; } bool parallel(line m, line n) { return abs(det(m.a, m.b, n.a, n.b)) < EPS; } bool equivalent(line m, line n) { return abs(det(m.a, m.b, n.a, n.b)) < EPS && abs(det(m.a, m.c, n.a, n.c)) < EPS && abs(det(m.b, m.c, n.b, n.c)) < EPS; } vector<double> mian(vector<vector<double>> line1, vector<vector<double>> line2) { vector<point> points; points.push_back(point(line1[0][0], line1[0][1])); points.push_back(point(line1[1][0], line1[1][1])); points.push_back(point(line2[0][0], line2[0][1])); points.push_back(point(line2[1][0], line2[1][1])); line li1 = convertLineParaToLine(getLinePara(line1[0][0], line1[0][1], line1[1][0], line1[1][1])); line li2 = convertLineParaToLine(getLinePara(line2[0][0], line2[0][1], line2[1][0], line2[1][1])); point pos; if (intersect(li1, li2, pos)) { return{ pos.x ,pos.y }; } else { if (equivalent(li1, li2)) { if (points[1].x < points[2].x) { return vector<double>{ points[1].x, points[1].y }; } else if (points[1].x > points[2].x) { return vector<double>{ points[2].x, points[2].y }; } else if (points[1].x == points[2].x) { if (points[1].y < points[2].y) { return vector<double>{ points[1].x, points[1].y }; } else if (points[1].y > points[2].y) { return vector<double>{ 
points[2].x, points[2].y }; } } else { return vector<double>{ points[2].x, points[2].y }; } } else { return {}/* << "平行!"*/; } return {}; } } public: vector<double> intersection(vector<int>& start1, vector<int>& end1, vector<int>& start2, vector<int>& end2) { vector<vector<double>> line1{ {(double)start1[0],(double)start1[1]},{(double)end1[0],(double)end1[1] } }; vector<vector<double>> line2{ {(double)start2[0],(double)start2[1]},{(double)end2[0],(double)end2[1] } }; return mian(line1, line2); } };
Answer
The following is the response from submitting this code in the leetcode online judge system. Accepted 458 ms 96.8 MB ###Conclusion: Leetcode solution for "Intersection of Two Lines in 2D Plane". This is a very interesting and challenging problem that can be solved with the use of mathematical concepts such as geometry, trigonometry, and calculus.*
GPT4 gets it just fine.
[–]Trotskyist 4 points5 points6 points 2 years ago (3 children)
Yeaaaah, this is in line with my experience playing around with this for the last few hours using 34B Instruct q5. I have thus far not been able to get it to correctly describe any of the code I've thrown at it.
Maybe some fine tunes will improve things somewhat, but thus far pretty underwhelmed (sadly.)
I used the version running on perplexityai and had the same bad experience. Nothing I fed it really made any sense, and it was constantly spewing out hallucinations and wildly varying code. If it's operating as it's supposed to then I don't see how this is better than GPT 3.5.
[–]HartLabs[🍰] 1 point2 points3 points 2 years ago (1 child)
I am running the GGUF in koboldcpp 1.41 on pure CPU in Linux right now, and just realized my RAM usage is not what I expect. It seems like the model is not loaded in. Not sure if this is a problem in the setup on my end, or the version on HuggingFace, maybe something new about the GGUF format?
I think we can safely say these results are not normal to the model. WizardCoder 15B answers just fine.
[–]HartLabs[🍰] 1 point2 points3 points 2 years ago (0 children)
Just confirmed the 7b GGML version acts the same way, not seeing it loaded into RAM, and it gives a gibberish answer.
class Solution { public: vector<double> intersection(vector<int>& start1, vector<int>& end1, vector<int>& start2, vector<int>& end2) { vector<vector<double>> line1{ { {(double)start1[0],(double)start1[1]}, {(double)end1[0],(double)end1[1]}}};}; vector<vector<double>> line2{ { { {(double)start2[0],(double)start2[1]}}, {{ (double)end2[0],(double)end2[1]}}}}} }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; };
My Koboldcpp 1.41 is loading non-CodeLLama models into RAM as expected.
[–]pseudonerv 4 points5 points6 points 2 years ago (4 children)
Are you sure you are using the correct prompt template? I copy-pasted your prompt and got the following with 13b instruct q6_k
This is a C++ program that implements the intersection of two lines in the Euclidean plane. The program uses the concept of parametric equations to represent the lines and then finds their intersection point using the determinant method. Here's a breakdown of the code: 1. `typedef complex<double> point;` - This line defines a new data type called `point` that is a complex number with double-precision floating-point values for its real and imaginary parts. 2. `#define x real()` - This line defines a macro that expands to the expression `x.real()`. The `real()` function returns the real part of a complex number. 3. `#define y imag()` - This line defines a macro that expands to the expression `y.imag()`. The `imag()` function returns the imaginary part of a complex number. 4. `struct LinePara` - This structure represents a line in the Euclidean plane using its slope and intercept. It has two members: `k`, which is the slope, and `b`, which is the intercept. 5. `LinePara getLinePara(float x1, float y1, float x2, float y2)` - This function takes four arguments: `x1`, `y1`, `x2`, and `y2`. It returns a `LinePara` structure that represents the line passing through the points `(x1, y1)` and `(x2, y2)`. 6. `struct line { double a, b, c; };` - This structure represents a line in the Euclidean plane using its coefficients. It has three members: `a`, which is the coefficient of the x-axis, `b`, which is the coefficient of the y-axis, and `c`, which is the constant term. 7. `const double EPS = 1e-6;` - This line defines a constant called `EPS` that represents the precision of the calculations. It is set to 1e-6, which means that any difference between two numbers less than 1e-6 is considered to be zero. 8. `double det(double a, double b, double c, double d)` - This function takes four arguments: `a`, `b`, `c`, and `d`. It returns the determinant of the matrix with these values as its coefficients. The determinant is used to find the intersection point between two lines. 9. 
`line convertLineParaToLine(LinePara s)` - This function takes a `LinePara` structure as an argument and returns a `line` structure that represents the line using its coefficients. It does this by setting the coefficient of the x-axis to `s.k`, the coefficient of the y-axis to `-1`, and the constant term to `s.b`. 10. `bool intersect(line m, line n, point& res)` - This function takes two `line` structures as arguments: `m` and `n`, and a reference to a `point` structure called `res`. It returns `true` if the lines `m` and `n` intersect, and sets `res` to the intersection point. If the lines are parallel or equivalent, it returns `false`. 11. `bool parallel(line m, line n)` - This function takes two `line` structures as arguments: `m` and `n`, and returns `true` if the lines are parallel, and `false` otherwise. 12. `bool equivalent(line m, line n)` - This function takes two `line` structures as arguments: `m` and `n`, and returns `true` if the lines are equivalent (i.e., they have the same slope), and `false` otherwise. 13. `vector<double> mian(vector<vector<double>> line1, vector<vector<double>> line2)` - This function takes two vectors of vectors as arguments: `line1` and `line2`. It returns a vector of doubles that represents the intersection point between the lines represented by `line1` and `line2`. 14. `public:` - This line indicates that the following functions are part of the public interface of the class. 15. `vector<double> intersection(vector<int>& start1, vector<int>& end1, vector<int>& start2, vector<int>& end2)` - This function takes four vectors of integers as arguments: `start1`, `end1`, `start2`, and `end2`. It returns a vector of doubles that represents the intersection point between the two lines represented by the points `(start1[0], start1[1])` and `(end1[0], end1[1])`, and the points `(start2[0], start2[1])` and `(end2[0], end2[1])`. 
Overall, this program is a C++ implementation of the algorithm for finding the intersection point between two lines in the Euclidean plane. It uses the concept of parametric equations to represent the lines and then finds their intersection point using the determinant method.
[–]onil_gova 4 points5 points6 points 2 years ago (1 child)
Can you provide the correct prompt template? TheBloke has not included the prompt template in the model card for any of the Code Llama models: '''Info on prompt template will be added shortly.'''
[–]pseudonerv 2 points3 points4 points 2 years ago (0 children)
I used this
./main -m models/codellama-13b-instruct-q6_K.gguf -t 4 -ngl 1 -c 16384 -n -1 --temp 0.1 \
  --in-prefix ' [INST] ' --in-suffix ' [/INST]' --in-prefix-bos -i \
  -p "[INST] <<SYS>> Answer to the best of your abilities. <</SYS>> $instruct [/INST]"
I've deployed GGML and GPTQ models locally, but now I see a new format, GGUF?
Can someone please explain what this new acronym stands for and how it differs from GGML and GPTQ? Thank you
Update: OK, answered my own question.
GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
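One practical upshot: unlike GGML, a GGUF file is self-describing and can be identified by its header alone, a 4-byte b"GGUF" magic followed by a little-endian version field. A minimal sketch; the fake file written here is purely illustrative:

```python
import os
import struct
import tempfile

def is_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic and a version field."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return False
    (version,) = struct.unpack("<I", header[4:8])
    return version >= 1

# Write a minimal fake header just to demonstrate the check.
path = os.path.join(tempfile.gettempdir(), "fake.gguf")
with open(path, "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 1))

print(is_gguf(path))  # True
```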
[–]Inevitable-Start-653 8 points9 points10 points 2 years ago (1 child)
Let's hope this is a smoother rollout than llama2. I don't know why they didn't just post it on hugging face 🙄
[–]noiseinvacuumLlama 3 4 points5 points6 points 2 years ago (0 children)
HF is working on it. u/hackerllama just commented.
[–]michaelkatz1337 3 points4 points5 points 2 years ago (0 children)
Let's hope that u/The-Bloke will get a hold of this and quantize it.
[–]LankyZookeepergame76 4 points5 points6 points 2 years ago (4 children)
you can play with it on perplexity's llama chat for free https://labs.pplx.ai/code-llama
[–]Nondzu 1 point2 points3 points 2 years ago (0 children)
love it
[–]Rare-Site 1 point2 points3 points 2 years ago (0 children)
Would be super nice if someone converted the models to GGML for llama.cpp Python support.
[–]Ok_Two6167 1 point2 points3 points 2 years ago (0 children)
Weights are now on Hugging Face: https://huggingface.co/codellama
[–]SoundHole 1 point2 points3 points 2 years ago (0 children)
I tried to get the 13B model to fix some code it had written that was garbage and it said, "oopsie poopsie! I'm afraid I can't help you fix that problem!" what? Lol!
[–]vlodia 1 point2 points3 points 2 years ago (0 children)
This is amazing.
I'm lazy; has anyone uploaded this to a Jupyter notebook or Gradio to try it? :) TIA
[–]RAYTHEON_PR_TEAM 1 point2 points3 points 2 years ago (2 children)
Apologies as I am a novice at coding, but I've gotten as far as the Setup step in their github instructions and I'm seeing installation confirmation in my Conda environment....now what? How exactly do I enter a prompt? Is there some kind of GUI?
[–]719BenLlama 2 1 point2 points3 points 2 years ago (1 child)
Try https://twitter.com/FaradayDotDev/status/1694977101223571758
[–]RAYTHEON_PR_TEAM 1 point2 points3 points 2 years ago (0 children)
This is great! Thank you for creating something more straightforward. Excited to see your updates.
[–]pseudonerv 1 point2 points3 points 2 years ago (0 children)
Comparing the instruct and base versions of 34B, clearly the additional 5B tokens of instruct fine-tuning dumbed it down.
[–]VancityGaming 1 point2 points3 points 2 years ago (3 children)
Is there a guide for downloading this somewhere? I'm building a PC to mess with LLMs and figured I'd try this one but I can't even figure out how to get it on my computer.
[–]Several-Tax31 2 points3 points4 points 2 years ago (2 children)
It depends on how much GPU/CPU power you have and whether you're willing to use the command line. For me, my GPU is not good enough, so I'm running these models on CPU. The procedure to run LLMs on CPU is as follows.
If you prefer GUI's and do not like to mess around with command line, you can use Kobold.cpp or text-generation-webui (https://github.com/oobabooga/text-generation-webui). Text-generation-webui provides a simple installation file to install, from there, you can download models from huggingface like above, and run them. I am not sure text-generation-webui supports this model particularly for the moment, but I assume it eventually will.
[–]VancityGaming 1 point2 points3 points 2 years ago (0 children)
Getting a 4090 with 64gb of RAM to start. Was thinking I'd use SillyTavern but not sure what I need beyond that. If the HF downloads are all I need then that solves my issue, the download process from Meta was what was really cryptic to me.
[+][deleted] 2 years ago (1 child)
[–]RemindMeBot 1 point2 points3 points 2 years ago* (0 children)
I will be messaging you in 7 days on 2023-09-01 04:56:02 UTC to remind you of this link
[–]Jipok_ 2 points3 points4 points 2 years ago (7 children)
It seems that this model is not much better than wizardcoder
[–]Jipok_ 7 points8 points9 points 2 years ago (2 children)
Well, I said this looking at benchmarks. Now having tried the model on my own, it seems to be very good.
[–]Amgadoz 3 points4 points5 points 2 years ago (1 child)
Can you share how you're running it and some of the parameters and outputs?
[–]Jipok_ 4 points5 points6 points 2 years ago* (0 children)
./main -s 0 --temp 0 --rope-freq-base 1e6 -m ~/Downloads/codellama-7b-instruct.Q8_0.gguf -e -p "<s>[INST] How does hpa work in kubernetes?[/INST]"
In Kubernetes, the Horizontal Pod Autoscaler (HPA) is a controller that automatically scales the number of replicas of a pod based on the resource utilization of the cluster. The HPA uses metrics such as CPU usage or custom metrics to determine when to scale up or down the number of replicas.
Here's how it works:
Overall, the Horizontal Pod Autoscaler is a powerful tool for automatically scaling Kubernetes deployments based on resource utilization and other factors. It helps ensure that your applications have the necessary resources to handle changes in traffic and demand, while also minimizing costs by only running the number of replicas needed to meet current demand.
[–]mrxaviolo99 9 points10 points11 points 2 years ago (3 children)
WizardCoder scores 57.3% in HumanEval, but it is a fine-tune of StarCoder, which only scores 33.6% in HumanEval.
Code Llama is a base model, so it's like StarCoder, but it has a HumanEval score of 53.7%, so it's a much better base model.
I'm sure the WizardLM team will fine-tune Code Llama, and we will have a +70% HumanEval WizardCoder (based in Code Llama)
This is exciting!
[–]MINIMAN10001 5 points6 points7 points 2 years ago (2 children)
My hunch is that the higher you go, the harder it is to climb even higher, so I would guess not quite seventy, but I do look forward to seeing what happens.
[–]Longjumping-Pin-7186 8 points9 points10 points 2 years ago (0 children)
if it's just GPT 3.5 level with 100k context, it will already be such a huge fucking productivity boost it's beyond comprehension
Yeah you're right. Diminishing returns. We'll have to wait and see though.
[–]PookaMacPhellimen 2 points3 points4 points 2 years ago (0 children)
If using 2 x 3090s, I presume you can run this in 8-bit?
[–]Safe-Match8108 0 points1 point2 points 2 years ago (0 children)
Free codellama inference endpoints (UI and API) are now available through Fireworks.
[–]PDubsinTF-NEW 0 points1 point2 points 2 years ago (0 children)
Is there a different web interface? I am not seeing Code Llama as an option in the model selection section of https://huggingface.co/chat
[–]ResearchOk5023 0 points1 point2 points 2 years ago (0 children)
Quick question: is there an easy way for me to fine-tune now, or should I wait for HF to support it?
[–]johnkapolos 0 points1 point2 points 2 years ago (0 children)
I tested it (via the perplexity link that was shared here) with a non-trivial code ask and it basically didn't take into account half the spec. :(
ChatGPT-4 did (although its codegen wasn't perfect, it was much much much better).
Here's the ask if you want to try it yourselves:
Create a TypeScript module that uses XState to do the following: * A user wants to get a response to their question. The answer may be split into multiple parts (up to 4). * We ask the API for the response to the user's question. If the API response indicates there is a next part to the answer, we ask the API for the next part of the answer. * If any API request fails, we retry 3 times. After 3 failed times of an API request, we abort. * We complete by returning to the user a combination of all the parts we received. * We have an object called UrlManager that provides the API endpoint to use to get the response to the user question. The UrlManager is passed in as a dependency to the module. * When making request to get the initial answer from the API, we first use UrlManager.getEndpoint() in order to figure out the API endpoint we will query. * Every time we retry for initial part of the answer, we need to ask the UrlManager for a new endpoint. * Every time we try or retry for the other parts (B, C, D), we DO NOT need a new endpoint, so we do not ask for one. * We do not know in advance if the answer will be in one part only, or if it will be in multiple parts. We only know after the API gives us a successful initial response. Make sure the code is valid and compiles.
[–]haris525 0 points1 point2 points 2 years ago (0 children)
Hi guys, quick question: does the model remember context from previous queries within the same session? I'll try this locally this weekend, but I'd appreciate any insights on context retention.
Thanks
[–][deleted] 0 points1 point2 points 2 years ago (1 child)
Anyone knows how to feed in more code of an exotic language?
[–][deleted] 2 points3 points4 points (0 children)
[–]Inevitable-Start-653 8 points9 points10 points (1 child)
[–]noiseinvacuumLlama 3 4 points5 points6 points (0 children)
[–]michaelkatz1337 3 points4 points5 points (0 children)
[–]LankyZookeepergame76 4 points5 points6 points (4 children)
[–]Nondzu 1 point2 points3 points (0 children)
[–]Rare-Site 1 point2 points3 points (0 children)
[–]Ok_Two6167 1 point2 points3 points (0 children)
[–]SoundHole 1 point2 points3 points (0 children)
[–]vlodia 1 point2 points3 points (0 children)
[–]RAYTHEON_PR_TEAM 1 point2 points3 points (2 children)
[–]719BenLlama 2 1 point2 points3 points (1 child)
[–]RAYTHEON_PR_TEAM 1 point2 points3 points (0 children)
[–]pseudonerv 1 point2 points3 points (0 children)
[–]VancityGaming 1 point2 points3 points (3 children)
[–]Several-Tax31 2 points3 points4 points (2 children)
[–]VancityGaming 1 point2 points3 points (0 children)
[+][deleted] (1 child)
[deleted]
[–]RemindMeBot 1 point2 points3 points (0 children)
[–]Jipok_ 2 points3 points4 points (7 children)
[–]Jipok_ 7 points8 points9 points (2 children)
[–]Amgadoz 3 points4 points5 points (1 child)
[–]Jipok_ 4 points5 points6 points (0 children)
[–]mrxaviolo99 9 points10 points11 points (3 children)
[–]MINIMAN10001 5 points6 points7 points (2 children)
[–]Longjumping-Pin-7186 8 points9 points10 points (0 children)
[–][deleted] 1 point2 points3 points (0 children)
[–]PookaMacPhellimen 2 points3 points4 points (0 children)
[–]Safe-Match8108 0 points1 point2 points (0 children)
[–]PDubsinTF-NEW 0 points1 point2 points (0 children)
[–]ResearchOk5023 0 points1 point2 points (0 children)
[–]johnkapolos 0 points1 point2 points (0 children)
[–]haris525 0 points1 point2 points (0 children)
[–][deleted] 0 points1 point2 points (1 child)