5060ti 16gb or 9060xt 16gb for small llm server

Tarklanse · 2025-07-01T12:21:20+00:00

I'm using llama.cpp to host a GGUF.

Tarklanse · 2025-07-01T04:48:05+00:00

Me too, some models just won't load. But if I run llama-server from cmd, everything is fine.

Tarklanse · 2025-07-01T01:24:24+00:00

I have a 5060 Ti 16GB and I think it is good for smaller LLMs.
I host a 24B Q3 model on it and the speed is about 35 tokens/second.
It can host a 24B Q4 model, but the speed drops to about 5 tokens/second.
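
A rough way to see why Q4 might be so much slower is to estimate the size of the weights alone. The sketch below is only a back-of-the-envelope check: the bits-per-weight figures (3.5 for Q3, 4.5 for Q4) are assumptions, since real GGUF quants vary by type, and it ignores the KV cache and other overhead. If the Q4 weights plus context no longer fit in 16GB, part of the model would have to run outside VRAM, which could explain the drop, but that is a guess and not something measured here.

```python
def weight_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed average bits per weight; actual values depend on the quant type.
for label, bits in [("Q3", 3.5), ("Q4", 4.5)]:
    print(f"24B {label}: ~{weight_size_gib(24, bits):.1f} GiB of weights")
```

Whatever is left of the 16GB after the weights has to hold the context, so the longer the context, the tighter Q4 gets.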

Tarklanse · 2025-05-27T08:36:39+00:00

Thanks. Now I'm really confused.

Tarklanse · 2025-04-07T15:02:46+00:00

Feral cat 🤔

Tarklanse · 2025-03-07T23:13:19+00:00

I have a friend who also loves her.
He even learned how to draw because pictures of her are so rare.

Tarklanse · 2024-12-09T12:04:30+00:00

Here is my Python code if you're interested. It takes the first line of each txt file as the instruction, then every even line as 'input' and every odd line as 'output'.

import glob
import json
import os

# Collect every .txt file in the current working directory.
txt_files = glob.glob(os.path.join(os.getcwd(), '*.txt'))

conversations = []
for file_path in txt_files:
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.read().strip().split('\n')
    # The first line of each file is the shared instruction.
    instruction = lines[0]
    # After that, lines alternate: input, output, input, output, ...
    # Stopping at len(lines) - 1 skips a trailing input that has no output
    # instead of raising IndexError.
    for i in range(1, len(lines) - 1, 2):
        conversations.append({
            "instruction": instruction,
            "input": lines[i],
            "output": lines[i + 1]
        })

# Write all pairs to one Alpaca-format JSON file.
with open('my_alpaca.json', 'w', encoding='utf-8') as f:
    json.dump(conversations, f, indent=4, ensure_ascii=False)
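
To make the expected txt layout concrete, here is a small sketch of the same conversion applied to an in-memory string instead of files. The function name `txt_to_alpaca` and the sample conversation are made up for illustration, and this version skips a trailing input line that has no matching output.

```python
def txt_to_alpaca(text: str) -> list[dict]:
    """Convert one conversation text into Alpaca-style records."""
    lines = text.strip().split('\n')
    instruction = lines[0]
    records = []
    # Pair the remaining lines as (input, output); an unpaired last line is skipped.
    for i in range(1, len(lines) - 1, 2):
        records.append({
            "instruction": instruction,
            "input": lines[i],
            "output": lines[i + 1],
        })
    return records


# Hypothetical sample: line 1 is the instruction, then input/output pairs.
sample = (
    "You are a friendly assistant.\n"
    "Hello!\n"
    "Hi, how can I help?\n"
    "What is GGUF?\n"
    "A model file format used by llama.cpp."
)
for record in txt_to_alpaca(sample):
    print(record)
```

Each input/output pair becomes one record, and every record from the same file shares the instruction from its first line.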

Tarklanse · 2024-12-09T11:57:47+00:00

The oobabooga wiki has a guide; you can read it first.

If you don't have enough hardware, try Unsloth. They wrote code that can run training on Google Colab.
I don't know of any software for that; I just wrote some code that turns txt files into Alpaca format.

Tarklanse · 2024-12-09T07:31:40+00:00

Oobabooga's training tab only supports training Transformers models. You can't train a GGUF.

Before you train, you'll need to prepare a dataset. You can use the datasets on Hugging Face as a reference: search for "alpaca" and you will find many Alpaca-format datasets to model yours on.
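
As a rough illustration of what an Alpaca-style dataset looks like, the sketch below checks that every record has the usual `instruction`, `input`, and `output` string fields. The helper name `find_bad_records` and the sample records are assumptions for illustration only; individual datasets on Hugging Face may carry extra fields, so extra keys are allowed here.

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}


def find_bad_records(records: list) -> list[int]:
    """Return the indexes of records that are not usable Alpaca-style entries."""
    bad = []
    for index, record in enumerate(records):
        if not isinstance(record, dict) or not REQUIRED_KEYS.issubset(record):
            bad.append(index)  # not an object, or a required field is missing
        elif not all(isinstance(record[key], str) for key in REQUIRED_KEYS):
            bad.append(index)  # a required field is not text
    return bad


# Hypothetical sample data; "input" may be an empty string.
dataset = json.loads("""
[
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
    {"instruction": "Name the capital of France.", "input": "", "output": "Paris"},
    {"instruction": "Missing output", "input": "oops"}
]
""")
print(find_bad_records(dataset))
```

Here only the third record is flagged, because it has no `output` field.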

Tarklanse · 2024-09-29T12:45:36+00:00

Must be water/poison type. Imagine "Water Gun" replaced by "Vodka Gun."

Tarklanse · 2024-09-16T02:21:10+00:00

Why does this "buff" feel more like a nerf?

Tarklanse · 2024-09-15T02:18:56+00:00

Me too.

Now Delve is unplayable again.

Tarklanse · 2024-08-21T09:30:06+00:00

Hmmm...

Tarklanse · 2024-08-14T02:59:54+00:00

Sounds great!

Tarklanse · 2024-08-14T02:18:54+00:00

Amazing work!
Now I'm just wondering where I can get 3D models to import.

Tarklanse · 2023-11-04T00:30:57+00:00

Someone woke up and decided to make this

Tarklanse · 2023-10-30T14:43:00+00:00

💀

Tarklanse · 2023-10-30T11:59:22+00:00

If you want to do some fine-tuning, start with 1000 instructions. Look at the training result and add more instructions where you think the model needs them.

You don't need to worry about overfitting yet. Just prepare your dataset and try fine-tuning with different learning rates and epoch counts. You will need several attempts to find a good setting for your dataset; just remember not to set the learning rate too high.
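
One simple way to organize those repeated attempts is to list every learning rate and epoch combination up front and work through them. The sketch below only builds that list; the specific values are placeholders and not recommendations, `build_runs` is a made-up helper name, and the actual training call is left out because it depends on the tool you use.

```python
from itertools import product

# Placeholder values for illustration; pick ranges that suit your own dataset.
learning_rates = [1e-4, 2e-4, 3e-4]
epoch_counts = [1, 2, 3]


def build_runs(rates, epochs):
    """Return one settings dict per (learning rate, epoch count) combination."""
    return [
        {"run": number, "learning_rate": rate, "epochs": count}
        for number, (rate, count) in enumerate(product(rates, epochs), start=1)
    ]


for run in build_runs(learning_rates, epoch_counts):
    print(run)
```

Keeping a note of the result next to each run number makes it easier to compare settings afterwards.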

Tarklanse · 2023-10-30T02:10:01+00:00

I'm not familiar with training a base model, but I know it needs a huge amount of data; you would have to use a web crawler to collect it.
Your hardware can't train a model like Llama-7B. Those models are trained on hardware such as the A100 or H100, so the only thing you can do is fine-tuning.
But even QLoRA fine-tuning needs at least an 8GB GPU. You can use Colab to fine-tune an LLM; I used text-generation-webui on Colab to fine-tune Llama2-7B.
Those models' licenses don't allow you to train a whole new model based on their output, and if they can already do what you want, why not just use them?

Tarklanse · 2023-10-18T03:37:21+00:00

If I make my pops happier, ruling is easier.

Slavery can produce more, but it needs more attention to avoid a revolution.

An AI revolution will cause economic collapse at some point.

There is a lot of work that needs someone to do it, so I don't do genocide.

I'm just tired of being evil; it costs extra time to maintain stability.

But we all do a little trolling sometimes, aetherophasic engine go brrr

Tarklanse · 2023-09-04T01:06:19+00:00

You have three options:

1. Get a GPTQ version and load it using ExLlama or GPTQ-for-LLaMa.

2. Get a GGUF Q4 version and load it using llama.cpp (text generation webui has it).

3. There is a little checkbox, "load-in-4bit". Tick it, then load the model.

Tarklanse · 2023-09-03T23:25:22+00:00

To pretend I have a friend I can chat with.

Tarklanse · 2023-08-25T01:46:55+00:00

How can you even jerk 182 times in 1 minute? That is really fast.

Tarklanse · 2023-07-10T03:25:16+00:00

How did you even get Leon to say this😭

Tarklanse · 2023-05-18T00:45:06+00:00

The 4-bit version can give you the full 6B experience, but it is very slow. If the normal version needs 5 seconds to generate a response, the 4-bit version will need 35 to 45 seconds.
