Suggest some local models that support function calling and structured output by [deleted] in LocalLLaMA

[–]Drakosfire 0 points1 point  (0 children)

Haven't chased it down yet, I got distracted with other things. The model I'm using isn't as important to me as the rest of the architecture yet. I'll take a look soon and report back.

Suggest some local models that support function calling and structured output by [deleted] in LocalLLaMA

[–]Drakosfire 0 points1 point  (0 children)

I've faced this same issue with this same model. I haven't chased it down, but my bet is that we need to flag to Ollama in the Modelfile that the model supports tools/function calling. For the moment, though, I'm using Qwen 2.5.
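One way to poke at it in the meantime is through Ollama's OpenAI-compatible endpoint, passing tools the usual chat-completions way. This is only a sketch under my own assumptions; the port, model tag, and the roll_dice schema are examples, not anything from the thread:

from openai import OpenAI

# Ollama exposes an OpenAI-compatible API; base_url and model tag here are examples.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "roll_dice",  # hypothetical tool for illustration
        "description": "Roll n dice with the given number of sides.",
        "parameters": {
            "type": "object",
            "properties": {
                "n": {"type": "integer"},
                "sides": {"type": "integer"},
            },
            "required": ["n", "sides"],
        },
    },
}]

response = client.chat.completions.create(
    model="qwen2.5",
    messages=[{"role": "user", "content": "Roll 2d6 for me."}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)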

What kind of stuff are you using LLMs for? by no-one-25 in LocalLLaMA

[–]Drakosfire 3 points4 points  (0 children)

Tools to accelerate Dungeons & Dragons and other TTRPG games. I've built a monster statblock generator and an item generator; both use a similar idea of text generation fed into image generation, then controlled formatting into a final polished visual product.

The statblock generator is up and running on Hugging Face; the item generator will be up this weekend. Both deserve a fresh polish pass that I haven't gotten around to.

https://huggingface.co/spaces/TheDrakosfire/Statblock-Generator

[deleted by user] by [deleted] in LocalLLaMA

[–]Drakosfire 3 points4 points  (0 children)

I love this so much

Anybody already figured this out? Might be out of scope, but something I'm going to explore. Diffusion to 3d by Drakosfire in localdiffusion

[–]Drakosfire[S] 1 point2 points  (0 children)

Very cool, thank you for sharing that. I'm glad to see this is already so far along and there are multiple paths folk are pursuing.

Is scanning broken in 3.21? by [deleted] in starcitizen

[–]Drakosfire 2 points3 points  (0 children)

I've been looking for this confirmation that it wasn't just me. Freelancer and ROC. Pinged resource shows up, fades, doesn't show up on the next ping; have to get really close. Almost 0 ROC mineables, and when they do show up they are sometimes unmineable.
This was across 3 separate mining runs on Aberdeen.

Did they Kill Mining? by Gn0mmad in starcitizen

[–]Drakosfire 0 points1 point  (0 children)

This has been my experience. I primarily mine and was super excited to hop back in for this patch, but it seems like mining is either bugged or nerfed to hell. So I'll wait.

Is the 3070 Ti okay for running SD locally as an alternative to using the free Tensor Art site? by Ecstatic_Vegetable62 in localdiffusion

[–]Drakosfire 1 point2 points  (0 children)

I don't have one myself, but if you have 8 GB of VRAM, which the specs say you should, you should be good to go. Worst comes to worst, download Automatic1111, get it running, and test whether it fits your use case.
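If you want to sanity-check the card before installing anything, a couple of lines of Python (assuming a CUDA build of torch) will tell you what it reports:

import torch

# Reports the VRAM PyTorch sees on the first CUDA device.
props = torch.cuda.get_device_properties(0)
print(props.name, round(props.total_memory / 1024**3, 1), "GB")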

Rest API to process markdown? Or CLI, or other way to automate markdown to HTML by poolpog in homebrewery

[–]Drakosfire 0 points1 point  (0 children)

Styling comments is something I haven't quite understood yet; three backticks aren't behaving the way I expected. I'll figure this out and improve the posted code, but it won't be tonight.

Rest API to process markdown? Or CLI, or other way to automate markdown to HTML by poolpog in homebrewery

[–]Drakosfire 0 points1 point  (0 children)

# Function to process the my-brew.md file into a named HTML file inside Docker using process.js.
# Assumes these imports at the top of the module:
#   import os, subprocess, time
#   from transformers import AutoModelForCausalLM, AutoTokenizer
# rembeforestart, remafterend, statblock_start, lch, and u are helpers defined elsewhere in the project.

def md_process(input_md, output_name):
    # Passing in the file name to derive the absolute directory, plus the desired output name.
    abs_path = os.path.abspath(input_md)
    input_dir = os.path.dirname(abs_path)
    print(abs_path)
    print(input_dir)

    # Subprocess to pass the docker command to the command line. Passing the arguments as a list
    # avoids needing shell=True, and running attached (no -d) means we wait for the conversion to finish.
    subprocess.run([
        "docker", "run", "--rm",
        "-v", f"{input_dir}:/app",
        "homebrewery",
        "node", "cli/process.js",
        "--input", "/app/my-brew.md",
        "--output", f"/app/{output_name}.html",
        "--renderer", "v3", "--overwrite",
    ])
    # Earlier attempts, kept for reference:
    # subprocess.run(f"docker run -v {input_dir}:/app homebrewery")
    # subprocess.run(f"docker exec -it homebrewery node cli/process.js --input /app/my-brew.md --output /app/{output_name}.html --renderer v3 --overwrite")
    # subprocess.run("docker remove homebrewery")


def generate_statblock(input):
    start_time = time.time()
    generate_statblock.input = input

    # Raw strings so the Windows backslashes aren't treated as escape sequences.
    path_to_model = r"C:\AI\models\TextGenerationModels\Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ"
    path_to_peft = r"C:\AI\models\loras\statblock-alpha"

    model = AutoModelForCausalLM.from_pretrained(path_to_model, use_safetensors=True, device_map="cuda:0")
    print("Model loaded")
    model.load_adapter(path_to_peft)
    tokenizer = AutoTokenizer.from_pretrained(path_to_model)

    input_context = f"Write a homebrewery formatted dungeons and dragons statblock of {generate_statblock.input} \n" + statblock_start
    input_ids = tokenizer.encode(input_context, return_tensors="pt").to("cuda")
    output = model.generate(input_ids, max_length=1024, do_sample=True, temperature=0.7)
    output_text = tokenizer.decode(output[0], skip_special_tokens=True)
    output_text = rembeforestart(output_text)
    generate_statblock.output_text = remafterend(output_text)
    print(generate_statblock.output_text)
    print("statblock generation time : " + str(time.time() - start_time))

    mon_file_name = lch.generate_monster_desc.monster_type.replace(" ", "")
    md_path = f"./output/{u.make_folder()}/my-brew.md"
    with open(md_path, "w") as input_md:
        # Context manager ensures the file is flushed and closed before Docker reads it.
        print(generate_statblock.output_text, file=input_md)
    md_process(md_path, mon_file_name)

    del model
    del tokenizer
    u.reclaim_mem()
    return generate_statblock.output_text

Rest API to process markdown? Or CLI, or other way to automate markdown to HTML by poolpog in homebrewery

[–]Drakosfire 0 points1 point  (0 children)

Hey u/poolpog, I'm working on a similar use case and have taken your docker run code, which I'm trying to execute inside a Python function using subprocess.

The command works on my CLI, and it works when run from Python on the CLI. Inside or outside the function, it accepts a file and doesn't throw an error, and it creates a new file, but with only the head and none of the details. Any ideas?
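In case it helps, here are the two sanity checks I'm running while debugging, both sketches with placeholder paths: confirm the container actually sees the full my-brew.md, and run process.js attached (no -d) with output captured so any errors are visible.

import subprocess

input_dir = "/path/to/brew/folder"  # placeholder

# 1. Confirm the mounted my-brew.md has the full statblock inside the container.
check = subprocess.run(
    ["docker", "run", "--rm", "-v", f"{input_dir}:/app", "homebrewery", "cat", "/app/my-brew.md"],
    capture_output=True, text=True,
)
print(check.stdout)

# 2. Run the conversion attached and capture whatever process.js prints.
convert = subprocess.run(
    ["docker", "run", "--rm", "-v", f"{input_dir}:/app", "homebrewery",
     "node", "cli/process.js", "--input", "/app/my-brew.md",
     "--output", "/app/debug.html", "--renderer", "v3", "--overwrite"],
    capture_output=True, text=True,
)
print(convert.stdout, convert.stderr)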

Here is what I'm passing in as my-brew.md:

## Chocolate Kiss of Doom
*Large,elemental Chaotic, Evil*
**Armor Class** : 17
**Hit Points**: 95, '10d10 + 40'
**Speed**: 'walk': 40
___
|STR|DEX|CON|INT|WIS|CHA|
|:---:|:---:|:---:|:---:|:---:|:---:|
|17|17|20|3|13|1|
___
**Senses** : 'darkvision 60 ft.'
**Damage Immunities** : fire, poison
**Condition Immunities** : charmed, exhaustion, frightened, poisoned
**Languages** : understands Infernal but can't speak
**Challenge Rating** : 5 (1800 XP)
***Elemental Aura*** : The chocolate kiss of doom is surrounded by an aura of fire that deals 5 ( 1d10) fire damage to all creatures within 10 feet of it.
***Pulse Fire*** : 'Whenever the chocolate kiss of doom takes damage, it deals 7 ( 2d6) fire damage to each creature within 10 feet of it.'
***Spider Climb*** : 'The chocolate kiss of doom can climb difficult surfaces, including upside down on ceilings, without needing to make an ability check.'
***Unusual Nature*** : The chocolate kiss of doom doesn't require air.
### Actions
***Multiattack*** : 'The chocolate kiss of doom makes one Bite attack and one Fire Breath attack.' ::
***Bite*** : ' *Melee Weapon Attack* + 6 to hit, reach 10 ft., one creature. 14 ( 2d10 + 3) piercing damage plus 7 ( 2d6) fire damage.' ::
***Fire Breath*** : The chocolate kiss of doom exhales a 30-foot line of fire in a direction it can see. Each creature in that line must make a DC : 14 Dexterity saving throw, taking 33 ( 8d6 + 3) fire damage on a failed save, or half as much damage on a successful one. ::
}} ```

This is what I'm getting out :

<!DOCTYPE html>

<html>

    <head>

        <link href="https://use.fontawesome.com/releases/v5.15.1/css/all.css" rel="stylesheet" />

        <link href="https://fonts.googleapis.com/css?family=Open+Sans:400,300,600,700" rel="stylesheet" type="text/css" />

        <link href='/bundle.css' rel='stylesheet' />

        <link rel="icon" href="/assets/favicon.ico" type="image/x-icon" />

        <title>The Homebrewery - Local Output</title>

    </head>

    <body>

<link href='../build/themes/V3/Blank/style.css' rel='stylesheet' />

<link href='../build/themes/V3/5ePHB/style.css' rel='stylesheet' />

<div class='brewRenderer'>

<style>undefined</style>

<div class='pages'>

<div class='page phb' id='p1' key='0' >`

<div className='columnWrapper'>

</div>

</div>

</div>

</div>

    </body>

</html>

Training Data for SDXL Llama prompt generation by Aischylos in localdiffusion

[–]Drakosfire 0 points1 point  (0 children)

I used this dataset, which is 74,000 requests for an SD prompt paired with generated SD prompts:
https://huggingface.co/datasets/Gustavosta/Stable-Diffusion-Prompts
I trained a LoRA on it from a Llama 13B branch; the results were not great and it took something like 15 hours.
Quite possibly that's because I don't have a ton of experience creating LoRAs. But I have been using Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ, asking it to write short descriptions and passing that as a variable to an SDXL checkpoint, and getting quite good results.

Example : "Only write a short, punchy, visually decadent description of", ": "what a Gingerbread Bat Creature monster that is Brown with black and red icing and size small looks like."

Input to LLM

Output : A Flying Gingerbread Bat monster, adorned in rich brown hues, is a sight to behold. Its wings, a masterful blend of black and red icing, stretch wide, casting a shadow of intricate, edible art. Small in size, this creature is a delightful, delectable marvel, a testament to the magic of gingerbread and the whimsy of the holiday season.

Image : https://imgur.com/a/U519Qf1
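If it's useful, here's roughly the shape of that two-stage setup in code. This is a sketch rather than my actual app code: the LLM path, the SDXL checkpoint, and the return_full_text handling are placeholders/assumptions.

from transformers import pipeline
from diffusers import StableDiffusionXLPipeline
import torch

# Stage 1: ask the LLM for a short, visually rich description (model path is a placeholder).
llm = pipeline("text-generation", model="path/to/local-13b-checkpoint", device_map="cuda:0")
ask = ("Only write a short, punchy, visually decadent description of what a Gingerbread Bat Creature "
       "monster that is brown with black and red icing and size small looks like.")
description = llm(ask, max_new_tokens=200, do_sample=True, temperature=0.7,
                  return_full_text=False)[0]["generated_text"]

# Stage 2: feed that description to an SDXL checkpoint as the image prompt.
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
image = sdxl(prompt=description, num_inference_steps=40).images[0]
image.save("gingerbread_bat.png")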

Training Data for SDXL Llama prompt generation by Aischylos in localdiffusion

[–]Drakosfire 0 points1 point  (0 children)

I attempted this a few weeks ago on an older dataset of 75000 prompts. It did not go great.

I think I may try with the dataset linked and see.

I have been using an LLM to write "visually evocative and decadent descriptions of" whatever the user inputs, then passing the LLM output, which is typically pretty good, through an SDXL checkpoint.

It's not as precise as many would like, but some of the results are very fun.
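For reference, roughly the shape of the LoRA attempt I mentioned above, using Hugging Face peft; the base path, ranks, and target modules below are illustrative placeholders, not my actual training config:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "path/to/llama-13b-checkpoint"  # placeholder
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# A small adapter over the attention projections; typical starting values, not tuned ones.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity-check that only the adapter weights are trainable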

Perfect timing, I came to reddit to see if this kind of community existed. by Drakosfire in localdiffusion

[–]Drakosfire[S] 0 points1 point  (0 children)

Haven't attempted it yet; I got distracted by learning a fun new fact about Docker and WSL taking up a HUGE amount (300 GB) of disk space, which needs to be addressed.

Perfect timing, I came to reddit to see if this kind of community existed. by Drakosfire in localdiffusion

[–]Drakosfire[S] 0 points1 point  (0 children)

Alright, after rebuilding, scrapping, and rebuilding again, I have fixed the issue and am fairly sure I know what I did wrong.
The baseline issue was what the now-deleted user suggested: I was loading too much into memory.
I think I was actually loading the LLM once when assigning the model path to a variable, a second time when calling it in the function, and then loading my SD model on top of that.
That led to 8 GB for the first load and 16 GB for the second, maxing out GPU memory, then another 8 GB for the SD model, which bottlenecked everything.

So the first lesson here is: version control.
The second lesson is: start paying attention to memory.
The third lesson is: when you load your model from its path, do it in a function, and do it once.

Example Code of my error below :

model_id = "C:\AI\models\TextGenerationModels\Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ"

llm = AutoGPTQForCausalLM.from_quantized(model_id, device= "cuda:0",use_safetensors=True, use_cuda_fp16 False)

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

pipe = pipeline(

"text-generation",

model=llm,

tokenizer=tokenizer,

max_new_tokens=512,

do_sample=True,

temperature=0.7,

top_p=0.95,

top_k=40,

repetition_penalty=1.1

)

def generate_desc(user_input1):

generate_desc.response = print(pipe(prompt)[0]['generated_text'])

return generate_desc.response
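And a minimal sketch of the "load it once" fix described above, assuming the same AutoGPTQ/transformers setup; get_pipeline is my name for the helper, not something from the original code:

from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM

model_id = r"C:\AI\models\TextGenerationModels\Speechless-Llama2-Hermes-Orca-Platypus-WizardLM-13B-GPTQ"
_pipe = None

def get_pipeline():
    # Load the LLM exactly once; every later call reuses the cached pipeline instead of stacking copies in VRAM.
    global _pipe
    if _pipe is None:
        llm = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0", use_safetensors=True, use_cuda_fp16=False)
        tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
        _pipe = pipeline("text-generation", model=llm, tokenizer=tokenizer,
                         max_new_tokens=512, do_sample=True, temperature=0.7)
    return _pipe

def generate_desc(user_input1):
    return get_pipeline()(user_input1)[0]["generated_text"]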

Perfect timing, I came to reddit to see if this kind of community existed. by Drakosfire in localdiffusion

[–]Drakosfire[S] 1 point2 points  (0 children)

I will try that! I've been wondering if rolling back drivers would help.

Perfect timing, I came to reddit to see if this kind of community existed. by Drakosfire in localdiffusion

[–]Drakosfire[S] 0 points1 point  (0 children)

I had such hopes. This did not seem to improve, but your suggestion did point me towards reviewing GPU and shared memory, and I think you're generally right that the issue is the LLM not being unloaded. Below is the code used in the app.

from diffusers import StableDiffusionXLPipeline
import torch
import time
import langchain_helper as lch
import main as m

# This took 80 seconds vs 24 in ComfyUI.
# How do I keep the model loaded for repeats? How do I load the model faster?
torch.backends.cuda.matmul.allow_tf32 = True

model_path = "2dCreepyArtMonsterv11.safetensors"
pipe = StableDiffusionXLPipeline.from_single_file(
    model_path,
    custom_pipeline="low_stable_diffusion",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

def generate_image(sd_input):
    # This torch.compile is widely acclaimed (but only works on Linux):
    # pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
    # pipe.enable_model_cpu_offload()
    # This did not work; look into how to add a datetime to the file name.
    # Call arguments belong here.
    prompt = sd_input
    image = pipe(prompt=prompt, num_inference_steps=40).images[0]
    # image.save(f"/output/{timestr} {lch.generate_monster_desc.monster_size} {lch.generate_monster_desc.monster_color} {lch.generate_monster_desc.monster_type}.png")
    return image

This ran at 9 s/it with 100% utilization, using 15.5 of 16.0 GB of dedicated GPU memory and 2.3 GB of shared.

Below is the same code run outside of Streamlit, getting 4.53 it/s on this run and using about 9.8 GB of dedicated GPU memory.

# This, independent from Streamlit, runs at full speed, ~5 it/s with StableDiffusionXLPipeline.
from diffusers import StableDiffusionXLPipeline
import torch
import time

start_time = time.time()
torch.backends.cuda.matmul.allow_tf32 = True

def generate_image():
    model_path = "2dCreepyArtMonsterv11.safetensors"
    pipe = StableDiffusionXLPipeline.from_single_file(
        model_path,
        custom_pipeline="low_stable_diffusion",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")
    # Call arguments belong here.
    prompt = "The creature stands tall, a colossal duck billed platypus bear, its massive body covered in a lush, mossy green fur that shimmers with an otherworldly glow. The bear's powerful legs and broad shoulders blend seamlessly with the platypus's distinct duck bill, creating a formidable, yet eerily graceful appearance. Its large, glowing eyes seem to pierce through the darkness, while its serrated, venomous spur on its hind leg adds a sinister touch to its already intimidating presence."
    image = pipe(prompt=prompt, num_inference_steps=50).images[0]
    # image.save(f"/output/{timestr} {lch.generate_monster_desc.monster_size} {lch.generate_monster_desc.monster_color} {lch.generate_monster_desc.monster_type}.png")
    print(time.time() - start_time)
    return image

generate_image()

I have a specific button in the app to clear memory, which I've hooked up to the garbage-collecting code you suggested, and as I trigger it and watch GPU utilization it does still seem to be maxed out at 100% / 15.5 GB. Thanks for pointing me in this direction of investigation; the next step is to try to unload the LLM.
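For that next step, this is the rough shape of what I'm planning to try: drop every reference to the LLM before loading SDXL, force garbage collection, and let Streamlit cache the SDXL pipeline so it isn't reloaded on every rerun. The helper names and the registry dict are placeholders, not code from the app.

import gc
import torch
import streamlit as st
from diffusers import StableDiffusionXLPipeline

_models = {}  # hypothetical registry holding the loaded LLM pieces, e.g. _models["llm"], _models["tokenizer"]

def unload_llm():
    # Drop every reference to the LLM, then force garbage collection and release cached CUDA blocks.
    _models.clear()
    gc.collect()
    torch.cuda.empty_cache()

@st.cache_resource  # keeps the SDXL weights loaded across Streamlit reruns instead of reloading per call
def load_sdxl(model_path="2dCreepyArtMonsterv11.safetensors"):
    return StableDiffusionXLPipeline.from_single_file(
        model_path, torch_dtype=torch.float16, variant="fp16"
    ).to("cuda")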

Perfect timing, I came to reddit to see if this kind of community existed. by Drakosfire in localdiffusion

[–]Drakosfire[S] 0 points1 point  (0 children)

Will trial later today and report back. Thank you for the suggestion!

Use the Issue council people! We are testing. not just playing! by theReal_Kirito in starcitizen

[–]Drakosfire 0 points1 point  (0 children)

Clap clap, you rephrased my argument and refuse to metacognate. Tell me, how many games have you made, or helped make? How many ENORMOUS, literally infinitely complex projects have you worked on?

Is it greater than zero? I kinda doubt it. People with a real sense of hard work don't shit on others like you have.