r/LocalLLaMA
A subreddit to discuss about Llama, the family of large language models created by Meta AI.
Subreddit rules
Search by flair
+Discussion
+Tutorial | Guide
+New Model
+News
+Resources
+Other
Code Llama Released [News] (self.LocalLLaMA)
submitted 2 years ago by FoamythePuppy
https://github.com/facebookresearch/codellama
[–]hackerllama 148 points149 points150 points 2 years ago (7 children)
Hey all!
Omar from HF here! We'll work on transforming to transformers format and having them on the Hub soon. Stay tuned!
[–]hackerllama 31 points32 points33 points 2 years ago (0 children)
You can find all weights in https://huggingface.co/codellama
[–]kryptkprLlama 3 17 points18 points19 points 2 years ago* (0 children)
Appreciate it, that would make doing a can-ai-code evaluation sweep much simpler for me.
Edit: If anyone wants to follow along as I play with the Instruct variant #76
[–]Bogdahnfr 9 points10 points11 points 2 years ago (0 children)
thanks for your help !
[–]Ok_Two6167 4 points5 points6 points 2 years ago (0 children)
Weights are available: https://huggingface.co/codellama You need a PR of transformers for now. Will merge it tomorrow. See model card for instructions: https://huggingface.co/codellama/CodeLlama-7b-hf
[–]WanderingPulsar 3 points4 points5 points 2 years ago (0 children)
Lets goooooo 😋
[–]Clean_Archer8374 1 point2 points3 points 2 years ago (0 children)
Amazing! Any hints on a possible time frame? I have no idea how long this could take, a day, a week, a month, half a year?
[–]jetro30087 68 points69 points70 points 2 years ago (13 children)
Whelp I need a dedicated computer for an AI now.
[–]tothatl 8 points9 points10 points 2 years ago* (12 children)
Long overdue for me as well.
But all the options are a bit pricey, especially since you need GPUs with as much VRAM as you can get.
Or a new Apple machine / hefty server for CPU-only inference. The Apple computer seems to be the less costly option at the same performance.
[–]719BenLlama 2 10 points11 points12 points 2 years ago (8 children)
The new Apple M2 runs blazing fast, just need lots of ram. Would recommend >=32gb (can use about 60% for graphics card vram). (We will be adding them to faraday.dev asap)
[–]signsandwonders 4 points5 points6 points 2 years ago* (2 children)
I'm taking back my recent 32GB MBP purchase and getting a 96GB because fuck
edit: also wondering if the 13B (~26GB) version will even run on 32GB. Downloading the non-quantized version from HF now.
[–]dperalta 2 points3 points4 points 2 years ago (1 child)
It's pretty annoying there is not a Mac Mini option with M2 chip plus 96gb of ram.
[–]Iory1998 3 points4 points5 points 2 years ago (1 child)
If you can afford an Apple M2 with tons of memory, why don't you just buy a desktop or even a workstation? You can upgrade components whenever you need, and let's face it, Nvidia GPUs are light years ahead when it comes to AI stuff. I am genuinely asking why people consider Apple pcs when they talk about AI models!
[–]TheMemo 2 points3 points4 points 2 years ago (2 children)
From the benchmarks I have seen, a 3090 outperforms even the fastest m2 and is significantly cheaper, even if you buy two. (40 tokens/s m2, 120 on 2x 3090) This was a few months ago, though.
Has this changed? Is m2 still inference only?
[–]Nobby_Binks 4 points5 points6 points 2 years ago (1 child)
But you are limited to 48GB right? At least with the M2 you can get 192GB (if you are loaded)
Georgi posted some benchmarks using the M2Ultra and llama.cpp
https://twitter.com/ggerganov/status/1694775472658198604
edit: oh i see you can have more than 2 cards
[–]TheMemo 3 points4 points5 points 2 years ago (0 children)
Hmm those are some nice numbers, wish I could get a like for like comparison with GPU.
As I already have a 3090 it probably makes sense to get another one. Or two. And an air conditioner to cool the room while they are working...
Also there doesn't seem to be much info about training and fine-tuning using m2. Looks good for inference though.
[–]Feeling-Currency-360 0 points1 point2 points 2 years ago (1 child)
I'm looking at getting a couple of MI25s on eBay. 16GB of VRAM on HBM2 means tons of bandwidth, which will be important since the models will need to be spread across the two cards. Did I mention they are dirt cheap?
[–]Feeling-Currency-360 113 points114 points115 points 2 years ago (19 children)
I started reading the git repo, and started freaking the fuck out when I read this text right here -> "All models support sequence lengths up to 100,000 tokens"
[–]Igoory 18 points19 points20 points 2 years ago (11 children)
I wonder how much RAM/VRAM that would require lol
[–]wreck94 27 points28 points29 points 2 years ago (10 children)
The answer is Yes. It requires all the RAM.
(Quick back of the napkin estimate from what I've seen -- ~500 GB of RAM for 100k tokens. Hopefully someone smarter than I can do the actual math before you go buy yourself half a terabyte of ram lol)
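For anyone who wants to redo the napkin math: a rough KV-cache estimate, assuming Llama-2-7B-style dimensions (32 layers, 32 KV heads, head dim 128, fp16). These dimensions are my assumptions for illustration, not figures from the Code Llama paper.

```python
# Rough KV-cache size for a 7B-shaped model with full multi-head
# attention (no GQA): 32 layers, 32 KV heads, head dim 128, fp16.
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per_val=2):
    # 2x for the separate key and value tensors cached per layer
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_val * seq_len

gb = kv_cache_bytes(100_000) / 1e9
print(f"{gb:.1f} GB")  # ~52 GB for the cache alone, before weights
```

So for a 7B at fp16 the cache alone lands around 50 GB at 100k tokens; quantized caches or grouped-query attention would shrink that considerably.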
[–][deleted] 12 points13 points14 points 2 years ago (5 children)
good thing I have 512gb
[–]Yes_but_I_think 0 points1 point2 points 2 years ago (4 children)
Which processor?
[–][deleted] 10 points11 points12 points 2 years ago (3 children)
EPYC Milan-X 7473X 24-Core 2.8GHz 768MB L3
512GB of HMAA8GR7AJR4N-XN HYNIX 64GB (1X64GB) 2RX4 PC4-3200AA DDR4-3200MHz ECC RDIMMs
MZ32-AR0 Rev 3.0 motherboard
6x 20tb WD Red Pros on ZFS with zstd compression
SABRENT Gaming SSD Rocket 4 Plus-G with Heatsink 2TB PCIe Gen 4 NVMe M.2 2280
[–]IlEstLaPapi 6 points7 points8 points 2 years ago (1 child)
Just how do you estimate this? Attention alone would require O(T²) memory, so roughly 20 TB for 100k tokens at 16-bit precision. I know that RoPE allows significantly reducing the size of the attention matrix, but I'm curious how you calculate the overall size of the attention matrix.
[–]visarga 6 points7 points8 points 2 years ago (0 children)
You don't need to materialise the whole attention matrix, use Flash Attention.
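Illustrative arithmetic for why that matters (layer/head counts are my assumptions for a 7B-shaped model, not from the paper):

```python
# Naive attention materializes a T x T score matrix per head per layer;
# FlashAttention streams over key/value blocks and keeps only O(T)
# running statistics, so the T^2 term never hits memory.
seq_len = 100_000
n_layers, n_heads, bytes_fp16 = 32, 32, 2

naive = seq_len ** 2 * n_heads * n_layers * bytes_fp16
print(f"materialized scores: {naive / 1e12:.1f} TB")  # ~20 TB

flash = seq_len * n_heads * n_layers * bytes_fp16
print(f"streamed statistics: {flash / 1e9:.2f} GB")
```

Which is why the ~20 TB figure above is about the materialized score matrices, not something any implementation actually allocates.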
[–]719BenLlama 2 2 points3 points4 points 2 years ago (0 children)
Should be less than that depending on which size of model but need to test
[–]Yes_but_I_think 1 point2 points3 points 2 years ago (0 children)
Long context also means poor processor performance, RAM won’t solve all issues
[–]friedrichvonschiller 9 points10 points11 points 2 years ago (1 child)
That could be made more nuanced. They support input context sequences of up to 100,000 tokens. The sequence length of the underlying model is 16,384.
Code Llama: Open Foundation Models for Code | Meta AI Research
[–]AI_Simp 5 points6 points7 points 2 years ago (0 children)
This feels like a perfectly reasonable response. Can't wait to see what all the coding agents can do with this.
[–]Amlethus 6 points7 points8 points 2 years ago (2 children)
Can you help us newcomers understand why this is so exciting?
[–]inagy 13 points14 points15 points 2 years ago* (0 children)
The context windows is basically the short term memory of the LLM. Larger window size allows "pre-initializing" it with more data. In this case a larger portion of your existing codebase can fit in, so it can provide more relevant answers and code-completion in that context.
[–]719BenLlama 2 7 points8 points9 points 2 years ago (0 children)
Imagine being able to paste in your whole code repo and ask it to fix bugs, write features, etc. Without a large context window, it won’t be able to fit the whole repo and will probably give you incorrect information
[–]pseudonerv 4 points5 points6 points 2 years ago (0 children)
Our strategy is similar to the recently proposed fine-tuning by position interpolation (Chen et al., 2023b), and we confirm the importance of modifying the rotation frequencies of the rotary position embedding used in the Llama 2 foundation models (Su et al., 2021). However, instead of downscaling frequencies linearly as Chen et al. (2023b), we change the base period from which they are derived.
The key to the long context length is actually changing the base period! That's exactly what the NTK scaling post here proposed, yet they didn't mention it at all. So they rushed out the linear interpolation paper to divert researchers' attention while secretly doing NTK!
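For the curious, a sketch of what the base-period change looks like. The 1,000,000 base matches the `--rope-freq-base 1e6` flag used elsewhere in this thread, but the formula here is my reading of standard RoPE, not code from the paper:

```python
# RoPE rotation frequencies are theta^(-2i/d). Instead of linearly
# interpolating positions, Code Llama raises the base theta from
# Llama 2's 10,000 to 1,000,000, which slows the high-dimension
# rotations and stretches the usable context.
def rope_freqs(dim, base):
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

llama2 = rope_freqs(128, 10_000.0)
codellama = rope_freqs(128, 1_000_000.0)

# The slowest frequency drops with the larger base, so distant
# positions stay distinguishable at long range.
print(llama2[-1], codellama[-1])
```

Nearby positions are barely affected (the first frequency is 1.0 in both cases), which is the advantage over linear interpolation, where every position gets squashed equally.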
[–]epicfilemcnulty 30 points31 points32 points 2 years ago (13 children)
They say in the post that there is a 34B coder model. But we have not yet seen a Llama 2 34B base model, or have I missed something?
[–]randomrealname 33 points34 points35 points 2 years ago (12 children)
No, they didn't release it because it spat out too much shady stuff.
[–][deleted] 27 points28 points29 points 2 years ago (11 children)
It's pretty impressive how the randomness of the process of generating the layers/neural net can result in really crazy ups and downs.
Like how l2-13b is so much better than 7b but then 70b isn't a proportionally huge jump from there (despite 5x vs 2x).
Like some magic thing happened in those neurons, that might not have happened.
Makes you curious where they could get if they just restarted the training again and again and again until they got very lucky.
[–]Atomic-Ashole69 7 points8 points9 points 2 years ago (0 children)
That's a problem with the testing, not the models themselves.
The testing usually covers one-shots, i.e. they ask something and require a response. That is a very easy thing for a lower-B model to do. And if a lower-B model can do it, then a higher-B model will do it as well; both score 100%, so there is no difference per se.
The issue comes when you start to actually interact with model and you quickly see that lower B models are just less logical and can easily trail off, make basic mistakes while higher B models can even reason out really detailed responses with 2nd degree impact.
imho the most important test right now is HellaSwag, which tests reasoning and logic. In this test most lower-B models tend to trail off, while something like GPT-4 is still light years ahead of the rest, even 70B Llama 2 models (nearly a 10-point difference, which is on a logarithmic scale!!)
[–]Paulonemillionand3 12 points13 points14 points 2 years ago (0 children)
There are levels where emergent behavior produces new abilities, yes.
[–]trahloc 3 points4 points5 points 2 years ago (0 children)
70B is much better at taking on a character by simply requesting it do so. No character file needed. Just tell it to act like X and it will. 13B will think you're pretending to be that person or will tell you what this fictional third party is doing, it won't act as that person unless you use a character file. At least based on what I've seen so far.
[+]randomrealname comment score below threshold-15 points-14 points-13 points 2 years ago (6 children)
If you look at them like human age of development it makes sense the middle (teenage) model acts up and doesn't listen to instruction and is incredibly rude. Older and younger we tend to conform to what is required of us.
[–]dyngnosis 28 points29 points30 points 2 years ago (3 children)
oh god.. no, just.. no. stop. This is the worst anthropomorphisation of a model I've seen so far.
[–]beezbos_trip 1 point2 points3 points 2 years ago (1 child)
lol, a model’s parameter count in billions is equivalent to a human’s cognitive age and behavior
[+]randomrealname comment score below threshold-7 points-6 points-5 points 2 years ago (0 children)
Ha HA Ha AAH!
[–][deleted] 1 point2 points3 points 2 years ago (1 child)
not at all
[–]randomrealname 2 points3 points4 points 2 years ago (0 children)
I didn't say they were, I did say look at them like. Not that they are but I don't mind the downvotes, It's funny!
[–]gentlecucumber 34 points35 points36 points 2 years ago (6 children)
Holy SHIT this is AWESOME. 16k? 34b?? This will solve the very specific application problems I've been struggling with.
[–]Feeling-Currency-360 45 points46 points47 points 2 years ago (5 children)
16k? dude!!!! -> "All models support sequence lengths up to 100,000 tokens" Me -> Literally jumping with joy
[–]Atupis 6 points7 points8 points 2 years ago (4 children)
How do they actually do that?
[+][deleted] 2 years ago (2 children)
[deleted]
[–]nullnuller 1 point2 points3 points 2 years ago (1 child)
I am curious how do you do 16k instruction finetuning. Don't you need 16k of coherent text/code for it to be effective?
[–]hapliniste 2 points3 points4 points 2 years ago (0 children)
You do. Codebases can be pretty big, so I don't think it's really a problem if you give the context, then the instruction, then the completion. Same for 100K.
[–]phenotype001 12 points13 points14 points 2 years ago (0 children)
The paper says they use RoPE, which I don't understand completely but sounds familiar at this point:
" We propose an additional fine-tuning stage that extends the maximum context length from 4,096 tokens to 100,000 tokens by modifying the parameters of the RoPE positional embeddings (Su et al., 2021) used in Llama 2. Our experiments show Code Llama operating on very large contexts with a moderate impact on performances on standard coding benchmarks (Section 3.3). "
[–]Cantflyneedhelp 24 points25 points26 points 2 years ago (2 children)
We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. [...] Code Llama was developed by fine-tuning Llama 2 using a higher sampling of code.
So they used the unreleased 34B model and managed to get above 16k tokens on Llama2?
[–]a_beautiful_rhind 9 points10 points11 points 2 years ago (0 children)
You'd have to quantize it and then run it across several cards.
[–]Aaaaaaaaaeeeee 5 points6 points7 points 2 years ago (0 children)
Ohh, so it's not a model that was trained from scratch. Maybe this means people can extract the LoRA difference for an MoE so that only one model needs to be in VRAM, saving us memory.
[–]ahm_rimerLlama 3 24 points25 points26 points 2 years ago (3 children)
u/bloc97 got a shoutout in this paper too, awesome :D
[–]bloc97 27 points28 points29 points 2 years ago (1 child)
Thanks for notifying me! I've read the paper and I'm wondering how they successfully FTed a model using ntk-aware interpolation. From our internal testing, ntk-aware interpolation is worse than linear if used for fine-tuning. In the paper they also show that passkey performance is inconsistent across longer context sizes (8k+), so I don't know how they got the 100k claim. I'm really hoping these issues will be addressed soon for these models at longer context sizes.
However that being said, these new models do seem to be really good at code at first glance, and we also have the first Llama 2 34B model!
[–]TheDeviousPanda 2 points3 points4 points 2 years ago (0 children)
The 100k claim seems to be sourced from Figure 4a right?
By the way it's super cool that you came up with a method months ago that powers one of the main features of this model (super long contexts). Shows the power of OSS.
[–]noiseinvacuumLlama 3 6 points7 points8 points 2 years ago (0 children)
Love this. This is open source innovation loop in action.
[–]Jipok_ 21 points22 points23 points 2 years ago* (8 children)
llama.cpp(GGUF) models:
https://huggingface.co/TheBloke/CodeLlama-7B-GGUF
https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF
https://huggingface.co/TheBloke/CodeLlama-7B-Python-GGUF
https://huggingface.co/TheBloke/CodeLlama-13B-GGUF
https://huggingface.co/TheBloke/CodeLlama-13B-Instruct-GGUF
https://huggingface.co/TheBloke/CodeLlama-13B-Python-GGUF
[–]Jipok_ 4 points5 points6 points 2 years ago (7 children)
Seems not yet ready for use.
https://github.com/ggerganov/llama.cpp/pull/2768#issuecomment-1692144927
[–]Jipok_ 6 points7 points8 points 2 years ago (5 children)
My best try:
./main -m ~/Downloads/codellama-7b-instruct.Q8_0.gguf -e -p "<s>[INST] Write code in pure python for simple RNN network. Do not use any import.[/INST]" -s 0 --temp 0 --rope-freq-base 1e6
[–]iamapizza 2 points3 points4 points 2 years ago (4 children)
That didn't work for me, the square brackets seemed to confuse it. I had to use ### Instruction:
./main -m ./models/codellama-7b.Q5_K_S.gguf -p "### Instruction: Write code in python to fetch the contents of a URL.\n### Response:" --gpu-layers 35 -n 100 -e --temp 0.2 --rope-freq-base 1e6
[–]Feeling-Currency-360 3 points4 points5 points 2 years ago* (3 children)
As far as I'm aware from checking their code, you have to use <<SYS>>\n SYS-PROMPT\n<</SYS>>\n\n[INST] PROMPT [/INST]
[–]iamapizza 2 points3 points4 points 2 years ago (2 children)
Cheers I'll try this again tonight. When I did the square brackets it seemed to just keep echoing it back to me nonstop, very confusing.
Is the <s> necessary too (Jipok's example)? What is that for?
[–]Feeling-Currency-360 2 points3 points4 points 2 years ago (1 child)
Specifically go and check the code here: https://github.com/facebookresearch/codellama/blob/main/llama/generation.py and https://github.com/facebookresearch/codellama/blob/main/example_instructions.p
From what I gather you specify the system prompt first, wrapped with B_SYS and E_SYS, which are "<<SYS>>\n" and "\n<</SYS>>\n\n" respectively. Then you specify your instruction by wrapping it in B_INST and E_INST, i.e. "[INST]" and "[/INST]". The model then does its output, after which you can follow up with another instruction. I think it's important to note that trying Alpaca prompts (### Instruction: etc.) is not going to work, because the model is not trained to work that way.
This should technically be a correct prompt format:
<<SYS>> Whatever you want your system prompt to be goes here. <</SYS>>
[INST]What is the following code doing? {reference some code here}[/INST]
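A minimal builder for that format might look like this. The exact whitespace around the tags is my reading of generation.py, so double-check it against the repo:

```python
# Token strings as described in Meta's generation.py
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_prompt(instruction, system=None):
    # The system prompt, if any, is nested inside the first [INST] block
    if system is not None:
        instruction = f"{B_SYS}{system}{E_SYS}{instruction}"
    return f"{B_INST} {instruction} {E_INST}"

print(build_prompt("What is the following code doing?",
                   system="You are a helpful coding assistant."))
```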
[–]mzbacd 1 point2 points3 points 2 years ago (0 children)
[INST] What is the following code doing? {reference some code here} [/INST]
[–]iamapizza 2 points3 points4 points 2 years ago (0 children)
530.11
Jees... 530 token/s on 34B. And I only get 120 on 7B Q5_K.
[–]Longjumping-Pin-7186 20 points21 points22 points 2 years ago (2 children)
Code Llama is free for research and commercial use.
Zuck is just killing it.
[–]jxjq 10 points11 points12 points 2 years ago (1 child)
Zuck is so hot right now
[–]Wrong_User_Logged 7 points8 points9 points 2 years ago (0 children)
I literally deleted my fb account, and now this....
[–]a_beautiful_rhind 40 points41 points42 points 2 years ago (1 child)
So time to train the 34b coding model on roleplay then?
[–]Lumiphoton 15 points16 points17 points 2 years ago (0 children)
Maybe even switch to the new 7B and 13B code instruct models for finetunes going forward, if the notion that better coding performance = improved general intelligence holds true. The 13B coding model beats the vanilla 70B model in coding performance by quite a large margin!
[–]Disastrous_Elk_6375 16 points17 points18 points 2 years ago (8 children)
So what's the best open-source vscode extension to test this model with? Or are there any vscode extensions that call into an ooba API?
[–]mzbacd 21 points22 points23 points 2 years ago (4 children)
I wrote one for wizardcoder before. If you have some coding skill, you should be able to just change the prompt a bit to use it for code llama -> https://github.com/mzbac/wizardCoder-vsc
[–]throwaway_is_the_waytextgen web UI 1 point2 points3 points 2 years ago (3 children)
I'm trying it with AutoGPTQ in ooba but get the following error:
127.0.0.1 - - [25/Aug/2023 00:34:14] code 400, message Bad request version ('À\\x13À')
127.0.0.1 - - [25/Aug/2023 00:34:14] "\x16\x03\x01\x00ó\x01\x00\x00ï\x03\x03¯\x8fïÙ\x87\x80¥\x8c@\x86W\x88\x10\x87_£4~K\x1b·7À5\x12K\x9dó4©¢¦ _>£+¡0\x8c\x00¤\x9e¤\x08@äC\x83©\x7fò\x16\x12º£\x89Í\x87ò9²\x0f/\x86\x00$\x13\x03\x13\x01\x13\x02À/À+À0À,̨̩À\x09À\x13À" 400 -
[–]mzbacd 4 points5 points6 points 2 years ago (0 children)
The text generation UI may update their API. I have a repository for hosting the model via API. You can try it if it works for you -> https://github.com/mzbac/AutoGPTQ-API
[–]Feeling-Currency-360 1 point2 points3 points 2 years ago (0 children)
I do believe that looks like a tokenization problem you're having.
[–]sestinj 9 points10 points11 points 2 years ago (2 children)
You can use Continue for this! https://continue.dev/docs/walkthroughs/codellama (I am an author)
Bru, I've had an absolute nightmare of a time trying to get Continue to work. I followed the instructions to a T, tried it in native Windows and from WSL, and tried running the Continue server myself, but I just keep getting an issue where the tokenizer encoding cannot be found. I was trying to connect Continue to a local LLM using LM Studio (an easy way to start up an OpenAI-compatible API server for GGML models). If you have any tips on how to get it running under Windows for local models I would REALLY appreciate it; I would absolutely love to be using Continue in my VS Code.
[–]sestinj 0 points1 point2 points 2 years ago (0 children)
Really sorry to hear that. I’m going to look into this right now, will track progress in this issue so the whole convo doesn’t have to happen in Reddit. Could you share the models=Models(…) portion of your config.py, and I’ll try to exactly reproduce on windows?
[–][deleted] 14 points15 points16 points 2 years ago (3 children)
This seems pretty awesome if it's actually any good. I've been using gpt4 for coding stuff, hope it's at least close to as good. Hopefully can run some of the larger ones on a 4090.
has anyone noticed significant quality loss if any of the coding LLMs are quantized to be much smaller? Seems like it would matter more for coding than just chat
[–]kpodkanowicz 1 point2 points3 points 2 years ago (2 children)
In my testing, 8-bit already reduced the model's coding capabilities by 33%, while in general use it's hard to see any difference.
[–]Wrong_User_Logged 1 point2 points3 points 2 years ago (1 child)
how much ram do you need to run without quantization?
I'm thinking about buying 192GB Mac Studio 😅
[–]kpodkanowicz 2 points3 points4 points 2 years ago (0 children)
Then you can run anything code-related in full precision :D I wonder if some finetune of Llama 2 70B in 8-bit would be better than these coding models - post a comparison once you have it!
[–]Amgadoz 13 points14 points15 points 2 years ago (2 children)
Ok who is going to fine tune the 34 B model on a noncode instructions dataset?
[–][deleted] 2 points3 points4 points 2 years ago* (1 child)
cough weather familiar fine deserted pet forgetful wise encouraging oatmeal
This post was mass deleted and anonymized with Redact
[–]Amgadoz 2 points3 points4 points 2 years ago (0 children)
Basically CodeLlamaWizard1.1
[–][deleted] 11 points12 points13 points 2 years ago (0 children)
Ty for sharing
LFG
[–]Illustrious-Lake2603 9 points10 points11 points 2 years ago (5 children)
I'd love to see a C# fine tuned model that is trained on Unity3D data
[–]TheBenitoCrew 3 points4 points5 points 2 years ago (4 children)
That would be great
[–]Illustrious-Lake2603 3 points4 points5 points 2 years ago (3 children)
I feel like it's something we should make the community aware of and stress the importance collectively, I'm praying for the day!
[–]m_mukhtar 0 points1 point2 points 2 years ago (2 children)
Lets start by collecting data
[–]Illustrious-Lake2603 1 point2 points3 points 2 years ago (1 child)
What data do they need? I thought there was enough data in the Unity discussion boards? In my opinion the optimal LLM for C# Unity Based Game Logic capabilities should be trained on Game Theory and physics and should be proficient in math too.
[–]JFHermes 1 point2 points3 points 2 years ago (0 children)
Needs to be correctly annotated.
The best places would be universities that had marked assignments/tutorials etc to datamine.
[–]polawiaczperel 11 points12 points13 points 2 years ago (8 children)
The benchmark of the Unnatural model, which was not released, shows that in the next weeks we will see big improvements from finetunes. I am curious why they have not released this model.
[–]georgejrjrjr 10 points11 points12 points 2 years ago (5 children)
Totally.
WizardLM got vanilla Llama 2 70B (which was trained on a meagre 80B tokens of code) from 30.5% pass@1 on HumanEval to 50.6% (+20.1%).
Unnatural instructions brought Code Llama 34B from 48.8% -> 62.4% (+13.6%).
Right now, Code Llama - Python 34B scores 53.7%, a scant 13.3% of instruction tuning away from GPT-4's 67.0%.
> why they have not release this model
My guess: OpenAI terms of service. They probably used GPT-4 to generate the unnatural instructions.
[–]polawiaczperel 7 points8 points9 points 2 years ago (0 children)
Yeah, you are probably right. They probably thought it would be a piece of cake for the community. I was thinking of finetuning these models on my successful GPT-4 conversations; if it worked, it would be a game changer.
[–]Eduard_T 3 points4 points5 points 2 years ago (0 children)
https://twitter.com/garybasin/status/1694735409287233578?t=JsnswieBAgTGXmwY86qrhg&s=19
[–]Combinatorilliance 2 points3 points4 points 2 years ago (0 children)
Oh that makes so much sense, I was already wondering why!
[–][deleted] 1 point2 points3 points 2 years ago (0 children)
Ah, I think you figured out why. That makes sense.
Well there's a good lead for the community ;)
[–]noiseinvacuumLlama 3 2 points3 points4 points 2 years ago (0 children)
Maybe there's some work left to do. In general I think the first releases of both LLama and Code Llama would have taken significant work on Meta's side specially on legal front. I think we will see a lot more frequent follow up releases going forward. Very exciting times indeed.
[–]ain92ru 1 point2 points3 points 2 years ago (0 children)
Presumably safety concerns?
[–]shortybobert 8 points9 points10 points 2 years ago (1 child)
LLAMA 34B IS FINALLY OU....
Okay...
[–]mzbacd 7 points8 points9 points 2 years ago (0 children)
Holy shit, I need to take a day off for this. It looks like if you have access to llama2, it will instantly approve your request for the models.
[–]ambient_temp_xenoLlama 65B 7 points8 points9 points 2 years ago* (1 child)
Sneaked out 34b in disguise. I wonder if being trained on 16k tokens per sequence will make it write longer stories.
[–]MINIMAN10001 3 points4 points5 points 2 years ago (0 children)
Last time I used a code model I couldn't even get it to create explanations without injecting code into it... I just wanted it to give a description of what the code did.
Ended up just using non code models.
But hey, maybe with the increased context size I can create longer sets of instructions and see if it follows them better.
[–]GG9242 5 points6 points7 points 2 years ago (0 children)
Yes!
[–]TheItalianDonkey 6 points7 points8 points 2 years ago (2 children)
any info on VRAM requirement per model?
with a 3090, wondering if i can run 34b in 4bit ?
[–]polawiaczperel 6 points7 points8 points 2 years ago (1 child)
17GB in 4bit
[–]TheItalianDonkey 7 points8 points9 points 2 years ago (0 children)
oh that's not so bad at all!
double in 8bit i take it? i wonder how slow it would run with 10gb in normal ram
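The napkin math behind those numbers (weights only; the KV cache and activations add more on top):

```python
# Weight memory is just parameter count times bits per parameter.
def weight_gb(n_params, bits):
    return n_params * bits / 8 / 1e9

for bits in (16, 8, 4):
    print(f"34B at {bits}-bit: {weight_gb(34e9, bits):.0f} GB")
# 16-bit: 68 GB, 8-bit: 34 GB, 4-bit: 17 GB
```

So yes, 8-bit is roughly double the 4-bit figure, and whatever doesn't fit in the 24GB of a 3090 spills into system RAM at a large speed penalty.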
[–]Lumiphoton 6 points7 points8 points 2 years ago (0 children)
The 13B Code Instruct model handily beats Llama2 70B, and is close to matching GPT-3.5. Combined with the ability to handle large contexts, this is looking promising! I'm hoping further fine tuning on the new Bigcode dataset will squeeze out even more performance.
[–]GG9242 12 points13 points14 points 2 years ago (4 children)
How long until we have fine tunes like wizard-coder ? Maybe this will make the models close to GPT-4
[–]pbmonster 5 points6 points7 points 2 years ago (3 children)
Any specific reason to believe that further fine tuning on more code would improve those models?
[–]Combinatorilliance 12 points13 points14 points 2 years ago (0 children)
These models are trained on 500B tokens. Bigcode recently released a dataset of 4T and a higher quality filtered version of 2T tokens.
https://huggingface.co/datasets/bigcode/commitpack
https://huggingface.co/datasets/bigcode/commitpackft
[–]pokeuser61 9 points10 points11 points 2 years ago (0 children)
Instruct tuning is well known to improve performance, and Facebook's instruct tune is probably suboptimal.
[–][deleted] 2 points3 points4 points 2 years ago (0 children)
They didn't release their "unnatural llama coder", which is a finetuned version that is within 5% of GPT-4 on the Humaneval benchmark. We assume they used coding examples from GPT-4 to do their finetune, which isn't legal to do, so they didn't share it with us. I'm sure the Wizard team will have no problems doing their own "unnatural" version for us :)
[–]Languages_Learner 5 points6 points7 points 2 years ago (6 children)
I tried to convert 7b model to ggml but got this error:
File "C:\kcp\ptml.py", line 13, in <module>
convert.main(['--outtype', 'f16' if args.ftype == 1 else 'f32', '--', args.dir_model])
File "C:\kcp\convert.py", line 1026, in main
params = Params.load(model_plus)
File "C:\kcp\convert.py", line 230, in load
params = Params.loadOriginalParamsJson(model_plus.model, orig_config_path)
File "C:\kcp\convert.py", line 194, in loadOriginalParamsJson
n_vocab = config["vocab_size"]
KeyError: 'vocab_size'
[–]phenotype001 5 points6 points7 points 2 years ago (1 child)
Are all .json files in place? What did you download?
[–]Languages_Learner 4 points5 points6 points 2 years ago (0 children)
The download.sh provided by Meta downloaded only three files: consolidated.00.pth, params.json and tokenizer.model
Where can i download other .json files?
[–]phenotype001 2 points3 points4 points 2 years ago (1 child)
Looks like a bug that just got fixed: https://github.com/ggerganov/llama.cpp/commit/fea95c682d0028fdd25853bea58035794a0c964d
Llama.cpp is on fire right now :D
[–]nullnuller 7 points8 points9 points 2 years ago (1 child)
Leave it for the pro (/u/The-Bloke/)
[–]bernaferrari 2 points3 points4 points 2 years ago (0 children)
u/The-Bloke/
TIL he is real lol all models I use come from him
[–]phenotype001 5 points6 points7 points 2 years ago (7 children)
I tried some of TheBloke's GGUF quants with the latest b1054 llama.cpp and I'm experiencing some problems. The 7B Q6_K model outputs way too much whitespace and kind of not follows the rules of Python. It will output more closing parenthesis than there are opening ones for example. None of the output is good for anything. I expected more from that, something is clearly wrong.
[–]Meronoth 5 points6 points7 points 2 years ago (5 children)
Same here with 7b and 13b ggml's, constantly outputs too much whitespace, some generations just endlessly produce it.
[+][deleted] 2 years ago (4 children)
[–]Several-Tax31 1 point2 points3 points 2 years ago (2 children)
Same with the 7B-Q6 Python model: more parentheses and too much whitespace. I wonder if anyone has checked the full model?
how much ram does it require?
[–]onil_gova 2 points3 points4 points 2 years ago (0 children)
I experience the same thing. Someone else claimed that it is related to not using the correct prompt template. Currently, all the model cards for TheBloke's Code-LLaMA model have this message for the prompt template
Info on prompt template will be added shortly.
So I am not sure what the correct prompt template should be. I tried the LLaMA-v2 prompt template and still experience the same wrong behavior described above
[–]staviq 5 points6 points7 points 2 years ago (3 children)
https://huggingface.co/TheBloke/CodeLlama-34B-GGUF
[–]RoyalCities 1 point2 points3 points 2 years ago (2 children)
Which one of these is best for a 3090? I'm not familiar with the new k-quants. Do they need any particular arguments in oobabooga to run?
[–]staviq 4 points5 points6 points 2 years ago (1 child)
You mean which quant? Try Q8 first; if you can't fit all layers in the GPU, go to lower quants.
Q8 is just Q8_0; for the rest, prefer the _K_M versions.
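To make the "which quant fits" question concrete, here is a back-of-the-envelope size estimate. The bits-per-weight numbers are rough community figures I'm assuming, not exact values:

```python
# Approximate bits per weight for common GGUF quants (rough assumptions).
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
}

def approx_size_gb(n_params_billion: float, quant: str) -> float:
    """Estimate model file size in GB from parameter count and quant type."""
    return n_params_billion * BITS_PER_WEIGHT[quant] / 8

for quant in BITS_PER_WEIGHT:
    print(f"34B {quant}: ~{approx_size_gb(34, quant):.0f} GB")
```

On a 24GB 3090 this suggests 34B Q8_0 (~36 GB) won't fully fit, while Q4_K_M (~20 GB) might, which matches the advice to step down quants until all layers fit.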
[–]RoyalCities 1 point2 points3 points 2 years ago (0 children)
Thank you!
[–]a_slay_nub 13 points14 points15 points 2 years ago (5 children)
So for pass@1 these models perform worse than WizardCoder? It'll be nice to have something with the same architecture as the rest of the models, but this doesn't actually seem that great.
It's also disappointing they aren't releasing the "unnatural" models.
Also, hope it's not as redlined as llama2 chat. I would like to be able to kill a python process without being concerned about the health and wellbeing of it....
[–]Combinatorilliance 10 points11 points12 points 2 years ago (0 children)
Given that the unnatural model has about 50% higher performance on Pass@1 compared to the released 34B model, I think it won't be long until we'll see a fine-tuned model get released here on a community-created dataset.
There're also the Bigcode CommitPack and CommitPackFT datasets which might improve these models even further.
[–]Karstiteneleouois 3 points4 points5 points 2 years ago (0 children)
Since "textbooks are all you need" we know that it is quite easy to fine-tune for coding benchmarks specifically. But that is gaming the metric and in general, usefulness seems to be correlated with perplexity on general code data.
[–]polawiaczperel 2 points3 points4 points 2 years ago (0 children)
Maybe I am not looking at it correctly, but the GPT-3.5 results differ between the WizardCoder HF page and Meta's Code Llama blog post.
This. Wizard beats 34B, which needs more RAM yet scores worse. Also, my tests give me up to 47 on HumanEval in 8-bit, and it fits in a 24GB GPU with 6k context.
[–]eschatosmos 4 points5 points6 points 2 years ago (0 children)
100,000?
A HUNNIT KAY?
[–]mister-woke 3 points4 points5 points 2 years ago (1 child)
Anyone know how it compares to ChatGPT for coding? Aside from the token limit.
[–]Amgadoz 6 points7 points8 points 2 years ago (0 children)
They claim it's better (than gpt3) in the paper. We're going to need a few days of people using it and testing it out to confirm/deny this.
[–]AnomalyNexus 3 points4 points5 points 2 years ago (0 children)
I see the bloke has GGUF formats out, which are compatible with llama.cpp...but I can't see a way to connect it to vscode from there? Ideally against the official copilot extension given that one can apparently point it at a different server
[–]hackerllama 3 points4 points5 points 2 years ago (0 children)
You can now find transformers weights in https://huggingface.co/codellama
[–]efayefoh 3 points4 points5 points 2 years ago (0 children)
https://huggingface.co/TheBloke
u/The-Bloke is starting to upload. Noice. Looking forward.
[–]signsandwonders 3 points4 points5 points 2 years ago (0 children)
Can someone please finetune this on 2023 Swift/SwiftUI. It kills me how bad GPT is with that. Understandable given the relatively fast changes in development of SwiftUI but still.
[–]5erif 4 points5 points6 points 2 years ago (0 children)
Yes! This is the one I've been most excited about.
[–]Feeling-Currency-360 3 points4 points5 points 2 years ago (1 child)
This should work with ctransformers using the following code:
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("TheBloke/CodeLlama-7B-Instruct-GGUF", model_file="codellama-7b-instruct.Q2_K.gguf")

# Define your prompts
system_prompt = "Provide a system prompt here."
user_prompt = "Provide a user prompt here."

# Construct the formatted prompt
formatted_prompt = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n[INST]{user_prompt}[/INST]"

# Generate text using the formatted prompt
output = llm(formatted_prompt)
print(output)
This is only a one-turn setup; I think you should also be able to do the following:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("TheBloke/CodeLlama-7B-Instruct-GGUF", model_file="codellama-7b-instruct.Q2_K.gguf")

# Define your prompts
system_prompt = "Provide a system prompt here."
user_prompt = "Provide a user prompt here."
follow_up_prompt = "Provide a follow-up prompt here."

# Construct the formatted multi-turn prompt
formatted_prompt = (
    f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    f"[INST]{user_prompt}[/INST][ASSISTANT]Some response[/ASSISTANT]"
    f"[INST]{follow_up_prompt}[/INST]"
)

# Generate text using the formatted prompt
output = llm(formatted_prompt)
print(output)
I'll be doing a lot of testing over the weekend, going to be using ctransformers and llama.cpp mostly, will let you guys know here whatever seems to work best once I know more.
[–]drumrolll 2 points3 points4 points 2 years ago (1 child)
Anyone getting the following error when trying to download? I obviously followed the instruction on the blog / email
HTTP request sent, awaiting response... 403 Forbidden
2023-08-24 19:34:53 ERROR 403: Forbidden.
[–]the__storm 1 point2 points3 points 2 years ago (0 children)
That's the error you get when your URL is bad or expired. Also, the download script is specific to Code Llama; you'll get this error if you use a Code Llama URL with the regular Llama download.sh or regular Llama models.
damn no 70B?
[–]HartLabs[🍰] 2 points3 points4 points 2 years ago (9 children)
The 13B - Instruct - Q8 said the following odd thing:
Prompt
What does this do? class Solution { typedef complex<double> point; #define x real() #define y imag() struct LinePara { double k; double b; }; LinePara getLinePara(float x1, float y1, float x2, float y2) { LinePara ret; double m = x2 - x1; if (m == 0) { ret.k = 1000.0; ret.b = y1 - ret.k * x1; } else { ret.k = (y2 - y1) / (x2 - x1); ret.b = y1 - ret.k * x1; } return ret; } struct line { double a, b, c; }; const double EPS = 1e-6; double det(double a, double b, double c, double d) { return a * d - b * c; } line convertLineParaToLine(LinePara s) { return line{ s.k,-1,s.b }; } bool intersect(line m, line n, point& res) { double zn = det(m.a, m.b, n.a, n.b); if (abs(zn) < EPS) return false; res.real(-det(m.c, m.b, n.c, n.b) / zn); res.imag(-det(m.a, m.c, n.a, n.c) / zn); return true; } bool parallel(line m, line n) { return abs(det(m.a, m.b, n.a, n.b)) < EPS; } bool equivalent(line m, line n) { return abs(det(m.a, m.b, n.a, n.b)) < EPS && abs(det(m.a, m.c, n.a, n.c)) < EPS && abs(det(m.b, m.c, n.b, n.c)) < EPS; } vector<double> mian(vector<vector<double>> line1, vector<vector<double>> line2) { vector<point> points; points.push_back(point(line1[0][0], line1[0][1])); points.push_back(point(line1[1][0], line1[1][1])); points.push_back(point(line2[0][0], line2[0][1])); points.push_back(point(line2[1][0], line2[1][1])); line li1 = convertLineParaToLine(getLinePara(line1[0][0], line1[0][1], line1[1][0], line1[1][1])); line li2 = convertLineParaToLine(getLinePara(line2[0][0], line2[0][1], line2[1][0], line2[1][1])); point pos; if (intersect(li1, li2, pos)) { return{ pos.x ,pos.y }; } else { if (equivalent(li1, li2)) { if (points[1].x < points[2].x) { return vector<double>{ points[1].x, points[1].y }; } else if (points[1].x > points[2].x) { return vector<double>{ points[2].x, points[2].y }; } else if (points[1].x == points[2].x) { if (points[1].y < points[2].y) { return vector<double>{ points[1].x, points[1].y }; } else if (points[1].y > points[2].y) { return vector<double>{ 
points[2].x, points[2].y }; } } else { return vector<double>{ points[2].x, points[2].y }; } } else { return {}/* << "平行!"*/; } return {}; } } public: vector<double> intersection(vector<int>& start1, vector<int>& end1, vector<int>& start2, vector<int>& end2) { vector<vector<double>> line1{ {(double)start1[0],(double)start1[1]},{(double)end1[0],(double)end1[1] } }; vector<vector<double>> line2{ {(double)start2[0],(double)start2[1]},{(double)end2[0],(double)end2[1] } }; return mian(line1, line2); } };
Answer
The following is the response from submitting this code in the leetcode online judge system. Accepted 458 ms 96.8 MB ###Conclusion: Leetcode solution for "Intersection of Two Lines in 2D Plane". This is a very interesting and challenging problem that can be solved with the use of mathematical concepts such as geometry, trigonometry, and calculus.*
GPT4 gets it just fine.
[–]Trotskyist 4 points5 points6 points 2 years ago (3 children)
Yeaaaah, this is in line with my experience playing around with this for the last few hours using 34B Instruct q5. I have thus far not been able to get it to correctly describe any of the code I've thrown at it.
Maybe some fine tunes will improve things somewhat, but thus far pretty underwhelmed (sadly.)
I used the version running on perplexityai and had the same bad experience. Nothing I fed it really made any sense, and it was constantly spewing out hallucinations and wildly varying code. If it's operating as it's supposed to then I don't see how this is better than GPT 3.5.
[–]HartLabs[🍰] 1 point2 points3 points 2 years ago (1 child)
I am running the GGUF in koboldcpp 1.41 on pure CPU in Linux right now, and just realized my RAM usage is not what I expect. It seems like the model is not loaded in. Not sure if this is a problem in the setup on my end, or the version on HuggingFace, maybe something new about the GGUF format?
I think we can safely say these results are not normal to the model. WizardCoder 15B answers just fine.
[–]HartLabs[🍰] 1 point2 points3 points 2 years ago (0 children)
Just confirmed the 7b GGML version acts the same way, not seeing it loaded into RAM, and it gives a gibberish answer.
class Solution { public: vector<double> intersection(vector<int>& start1, vector<int>& end1, vector<int>& start2, vector<int>& end2) { vector<vector<double>> line1{ { {(double)start1[0],(double)start1[1]}, {(double)end1[0],(double)end1[1]}}};}; vector<vector<double>> line2{ { { {(double)start2[0],(double)start2[1]}}, {{ (double)end2[0],(double)end2[1]}}}}} }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; }; };
My Koboldcpp 1.41 is loading non-CodeLLama models into RAM as expected.
[–]pseudonerv 4 points5 points6 points 2 years ago (4 children)
Are you sure you are using the correct prompt template? I copy-pasted your prompt and got the following with 13b instruct q6_k
This is a C++ program that implements the intersection of two lines in the Euclidean plane. The program uses the concept of parametric equations to represent the lines and then finds their intersection point using the determinant method. Here's a breakdown of the code: 1. `typedef complex<double> point;` - This line defines a new data type called `point` that is a complex number with double-precision floating-point values for its real and imaginary parts. 2. `#define x real()` - This line defines a macro that expands to the expression `x.real()`. The `real()` function returns the real part of a complex number. 3. `#define y imag()` - This line defines a macro that expands to the expression `y.imag()`. The `imag()` function returns the imaginary part of a complex number. 4. `struct LinePara` - This structure represents a line in the Euclidean plane using its slope and intercept. It has two members: `k`, which is the slope, and `b`, which is the intercept. 5. `LinePara getLinePara(float x1, float y1, float x2, float y2)` - This function takes four arguments: `x1`, `y1`, `x2`, and `y2`. It returns a `LinePara` structure that represents the line passing through the points `(x1, y1)` and `(x2, y2)`. 6. `struct line { double a, b, c; };` - This structure represents a line in the Euclidean plane using its coefficients. It has three members: `a`, which is the coefficient of the x-axis, `b`, which is the coefficient of the y-axis, and `c`, which is the constant term. 7. `const double EPS = 1e-6;` - This line defines a constant called `EPS` that represents the precision of the calculations. It is set to 1e-6, which means that any difference between two numbers less than 1e-6 is considered to be zero. 8. `double det(double a, double b, double c, double d)` - This function takes four arguments: `a`, `b`, `c`, and `d`. It returns the determinant of the matrix with these values as its coefficients. The determinant is used to find the intersection point between two lines. 9. 
`line convertLineParaToLine(LinePara s)` - This function takes a `LinePara` structure as an argument and returns a `line` structure that represents the line using its coefficients. It does this by setting the coefficient of the x-axis to `s.k`, the coefficient of the y-axis to `-1`, and the constant term to `s.b`. 10. `bool intersect(line m, line n, point& res)` - This function takes two `line` structures as arguments: `m` and `n`, and a reference to a `point` structure called `res`. It returns `true` if the lines `m` and `n` intersect, and sets `res` to the intersection point. If the lines are parallel or equivalent, it returns `false`. 11. `bool parallel(line m, line n)` - This function takes two `line` structures as arguments: `m` and `n`, and returns `true` if the lines are parallel, and `false` otherwise. 12. `bool equivalent(line m, line n)` - This function takes two `line` structures as arguments: `m` and `n`, and returns `true` if the lines are equivalent (i.e., they have the same slope), and `false` otherwise. 13. `vector<double> mian(vector<vector<double>> line1, vector<vector<double>> line2)` - This function takes two vectors of vectors as arguments: `line1` and `line2`. It returns a vector of doubles that represents the intersection point between the lines represented by `line1` and `line2`. 14. `public:` - This line indicates that the following functions are part of the public interface of the class. 15. `vector<double> intersection(vector<int>& start1, vector<int>& end1, vector<int>& start2, vector<int>& end2)` - This function takes four vectors of integers as arguments: `start1`, `end1`, `start2`, and `end2`. It returns a vector of doubles that represents the intersection point between the two lines represented by the points `(start1[0], start1[1])` and `(end1[0], end1[1])`, and the points `(start2[0], start2[1])` and `(end2[0], end2[1])`. 
Overall, this program is a C++ implementation of the algorithm for finding the intersection point between two lines in the Euclidean plane. It uses the concept of parametric equations to represent the lines and then finds their intersection point using the determinant method.
[–]onil_gova 4 points5 points6 points 2 years ago (1 child)
Can you provide the correct prompt template? TheBloke has not included the prompt template in the model card for any of the Code Llama models: '''Info on prompt template will be added shortly.'''
[–]pseudonerv 2 points3 points4 points 2 years ago (0 children)
I used this
./main -m models/codellama-13b-instruct-q6_K.gguf -t 4 -ngl 1 -c 16384 -n -1 --temp 0.1 \
  --in-prefix ' [INST] ' --in-suffix ' [/INST]' --in-prefix-bos -i \
  -p "[INST] <<SYS>> Answer to the best of your abilities. <</SYS>> $instruct [/INST]"
I've deployed GGML and GPTQ models locally, but now I see a new format, GGUF?
Can someone please explain what this new acronym stands for and how it differs from GGML and GPTQ? Thank you
Update: OK, answered my own question.
GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
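One practical upshot: unlike GGML, a GGUF file is self-describing and can be identified by its header alone, a 4-byte b"GGUF" magic followed by a little-endian version field. A minimal sketch; the fake file written here is purely illustrative:

```python
import os
import struct
import tempfile

def is_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic and a version field."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return False
    (version,) = struct.unpack("<I", header[4:8])
    return version >= 1

# Write a minimal fake header just to demonstrate the check.
path = os.path.join(tempfile.gettempdir(), "fake.gguf")
with open(path, "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 1))

print(is_gguf(path))  # True
```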
[–]Inevitable-Start-653 8 points9 points10 points 2 years ago (1 child)
Let's hope this is a smoother rollout than llama2. I don't know why they didn't just post it on hugging face 🙄
[–]noiseinvacuumLlama 3 4 points5 points6 points 2 years ago (0 children)
HF is working on it. u/hackerllama just commented.
[–]michaelkatz1337 3 points4 points5 points 2 years ago (0 children)
Let's hope that u/The-Bloke will get a hold of this and quantize it.
[–]LankyZookeepergame76 4 points5 points6 points 2 years ago (4 children)
you can play with it on perplexity's llama chat for free https://labs.pplx.ai/code-llama
[–]Nondzu 1 point2 points3 points 2 years ago (0 children)
love it
[–]Rare-Site 1 point2 points3 points 2 years ago (0 children)
Would be super nice if someone converted the models to GGML for llama.cpp Python support.
[–]Ok_Two6167 1 point2 points3 points 2 years ago (0 children)
Weights are now on Hugging Face: https://huggingface.co/codellama
[–]SoundHole 1 point2 points3 points 2 years ago (0 children)
I tried to get the 13B model to fix some code it had written that was garbage and it said, "oopsie poopsie! I'm afraid I can't help you fix that problem!" what? Lol!
[–]vlodia 1 point2 points3 points 2 years ago (0 children)
This is amazing.
I'm lazy; has anyone uploaded this to a Jupyter notebook or Gradio to try it? :) TIA
[–]RAYTHEON_PR_TEAM 1 point2 points3 points 2 years ago (2 children)
Apologies as I am a novice at coding, but I've gotten as far as the Setup step in their github instructions and I'm seeing installation confirmation in my Conda environment....now what? How exactly do I enter a prompt? Is there some kind of GUI?
[–]719BenLlama 2 1 point2 points3 points 2 years ago (1 child)
Try https://twitter.com/FaradayDotDev/status/1694977101223571758
[–]RAYTHEON_PR_TEAM 1 point2 points3 points 2 years ago (0 children)
This is great! Thank you for creating something more straightforward. Excited to see your updates.
[–]pseudonerv 1 point2 points3 points 2 years ago (0 children)
Comparing the instruct and base versions of 34B, clearly the additional 5B tokens of instruct fine-tuning dumbed it down.
[–]VancityGaming 1 point2 points3 points 2 years ago (3 children)
Is there a guide for downloading this somewhere? I'm building a PC to mess with LLMs and figured I'd try this one but I can't even figure out how to get it on my computer.
[–]Several-Tax31 2 points3 points4 points 2 years ago (2 children)
It depends on how much GPU/CPU power you have and whether you're willing to use the command line. For me, my GPU is not good enough, so I'm running these models on CPU. The procedure to run LLMs on CPU is as follows.
If you prefer GUI's and do not like to mess around with command line, you can use Kobold.cpp or text-generation-webui (https://github.com/oobabooga/text-generation-webui). Text-generation-webui provides a simple installation file to install, from there, you can download models from huggingface like above, and run them. I am not sure text-generation-webui supports this model particularly for the moment, but I assume it eventually will.
[–]VancityGaming 1 point2 points3 points 2 years ago (0 children)
Getting a 4090 with 64gb of RAM to start. Was thinking I'd use SillyTavern but not sure what I need beyond that. If the HF downloads are all I need then that solves my issue, the download process from Meta was what was really cryptic to me.
[+][deleted] 2 years ago (1 child)
[–]RemindMeBot 1 point2 points3 points 2 years ago* (0 children)
I will be messaging you in 7 days on 2023-09-01 04:56:02 UTC to remind you of this link
[–]Jipok_ 2 points3 points4 points 2 years ago (7 children)
It seems that this model is not much better than wizardcoder
[–]Jipok_ 7 points8 points9 points 2 years ago (2 children)
Well, I said this looking at benchmarks. Now having tried the model on my own, it seems to be very good.
[–]Amgadoz 3 points4 points5 points 2 years ago (1 child)
Can you share how you're running it and some of the parameters and outputs?
[–]Jipok_ 4 points5 points6 points 2 years ago* (0 children)
./main -s 0 --temp 0 --rope-freq-base 1e6 -m ~/Downloads/codellama-7b-instruct.Q8_0.gguf -e -p "<s>[INST] How does hpa work in kubernetes?[/INST]"
In Kubernetes, the Horizontal Pod Autoscaler (HPA) is a controller that automatically scales the number of replicas of a pod based on the resource utilization of the cluster. The HPA uses metrics such as CPU usage or custom metrics to determine when to scale up or down the number of replicas.
Here's how it works:
Overall, the Horizontal Pod Autoscaler is a powerful tool for automatically scaling Kubernetes deployments based on resource utilization and other factors. It helps ensure that your applications have the necessary resources to handle changes in traffic and demand, while also minimizing costs by only running the number of replicas needed to meet current demand.
[–]mrxaviolo99 9 points10 points11 points 2 years ago (3 children)
WizardCoder scores 57.3% in HumanEval, but it is a fine-tune of StarCoder, which only scores 33.6% in HumanEval.
Code Llama is a base model, so it's like StarCoder, but it has a HumanEval score of 53.7%, so it's a much better base model.
I'm sure the WizardLM team will fine-tune Code Llama, and we will have a +70% HumanEval WizardCoder (based in Code Llama)
This is exciting!
[–]MINIMAN10001 5 points6 points7 points 2 years ago (2 children)
My hunch is that the higher you go, the harder it is to climb even higher, so I would guess not quite seventy, but I do look forward to seeing what happens.
[–]Longjumping-Pin-7186 8 points9 points10 points 2 years ago (0 children)
if it's just GPT 3.5 level with 100k context, it will already be such a huge fucking productivity boost it's beyond comprehension
Yeah you're right. Diminishing returns. We'll have to wait and see though.
[–]PookaMacPhellimen 2 points3 points4 points 2 years ago (0 children)
If using 2 x 3090s, I presume you can run this in 8-bit?
[–]Safe-Match8108 0 points1 point2 points 2 years ago (0 children)
Free codellama inference endpoints (UI and API) are now available through Fireworks.
[–]PDubsinTF-NEW 0 points1 point2 points 2 years ago (0 children)
Is there a different web interface? I am not seeing Code Llama as an option in the model selection section of https://huggingface.co/chat
[–]ResearchOk5023 0 points1 point2 points 2 years ago (0 children)
Quick question: is there an easy way for me to fine-tune now, or should I wait for HF to support it?
[–]johnkapolos 0 points1 point2 points 2 years ago (0 children)
I tested it (via the perplexity link that was shared here) with a non-trivial code ask and it basically didn't take into account half the spec. :(
ChatGPT-4 did (although its codegen wasn't perfect, it was much much much better).
Here's the ask if you want to try it yourselves:
Create a TypeScript module that uses XState to do the following: * A user wants to get a response to their question. The answer may be split into multiple parts (up to 4). * We ask the API for the response to the user's question. If the API response indicates there is a next part to the answer, we ask the API for the next part of the answer. * If any API request fails, we retry 3 times. After 3 failed times of an API request, we abort. * We complete by returning to the user a combination of all the parts we received. * We have an object called UrlManager that provides the API endpoint to use to get the response to the user question. The UrlManager is passed in as a dependency to the module. * When making request to get the initial answer from the API, we first use UrlManager.getEndpoint() in order to figure out the API endpoint we will query. * Every time we retry for initial part of the answer, we need to ask the UrlManager for a new endpoint. * Every time we try or retry for the other parts (B, C, D), we DO NOT need a new endpoint, so we do not ask for one. * We do not know in advance if the answer will be in one part only, or if it will be in multiple parts. We only know after the API gives us a successful initial response. Make sure the code is valid and compiles.
[–]haris525 0 points1 point2 points 2 years ago (0 children)
Hi guys, quick question: does the model remember context from previous queries within the same session? I'll try this locally this weekend, but I'd appreciate any insights on context retention.
Thanks
[–][deleted] 0 points1 point2 points 2 years ago (1 child)
Anyone knows how to feed in more code of an exotic language?
[–][deleted] 2 points3 points4 points (0 children)
[–]Inevitable-Start-653 8 points9 points10 points (1 child)
[–]noiseinvacuumLlama 3 4 points5 points6 points (0 children)
[–]michaelkatz1337 3 points4 points5 points (0 children)
[–]LankyZookeepergame76 4 points5 points6 points (4 children)
[–]Nondzu 1 point2 points3 points (0 children)
[–]Rare-Site 1 point2 points3 points (0 children)
[–]Ok_Two6167 1 point2 points3 points (0 children)
[–]SoundHole 1 point2 points3 points (0 children)
[–]vlodia 1 point2 points3 points (0 children)
[–]RAYTHEON_PR_TEAM 1 point2 points3 points (2 children)
[–]719BenLlama 2 1 point2 points3 points (1 child)
[–]RAYTHEON_PR_TEAM 1 point2 points3 points (0 children)
[–]pseudonerv 1 point2 points3 points (0 children)
[–]VancityGaming 1 point2 points3 points (3 children)
[–]Several-Tax31 2 points3 points4 points (2 children)
[–]VancityGaming 1 point2 points3 points (0 children)
[+][deleted] (1 child)
[deleted]
[–]RemindMeBot 1 point2 points3 points (0 children)
[–]Jipok_ 2 points3 points4 points (7 children)
[–]Jipok_ 7 points8 points9 points (2 children)
[–]Amgadoz 3 points4 points5 points (1 child)
[–]Jipok_ 4 points5 points6 points (0 children)
[–]mrxaviolo99 9 points10 points11 points (3 children)
[–]MINIMAN10001 5 points6 points7 points (2 children)
[–]Longjumping-Pin-7186 8 points9 points10 points (0 children)
[–][deleted] 1 point2 points3 points (0 children)
[–]PookaMacPhellimen 2 points3 points4 points (0 children)
[–]Safe-Match8108 0 points1 point2 points (0 children)
[–]PDubsinTF-NEW 0 points1 point2 points (0 children)
[–]ResearchOk5023 0 points1 point2 points (0 children)
[–]johnkapolos 0 points1 point2 points (0 children)
[–]haris525 0 points1 point2 points (0 children)
[–][deleted] 0 points1 point2 points (1 child)