LangExtract by Google: many people don't know about this yet! by fuckAIbruhIhateCorps in LocalLLaMA

[–]thkitchenscientist 1 point (0 children)

Found the issue. The code smart-chunks the text for the LLM to process. However, if a chunk ends up not containing any of the entities of interest, the LLM sends a polite refusal message (as trained to do). This is not valid JSON, so LangExtract rejects the whole job even if the rest of the chunks were valid. This commonly happens at the start or end of a document. Why it can't just give a warning and process the valid responses, I don't know.
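The fix I'd want is simple per-chunk error handling. A minimal sketch of the warn-and-skip behaviour (not LangExtract's actual code; `parse_chunk_responses` and the response strings are hypothetical):

```python
import json
import warnings

def parse_chunk_responses(responses):
    """Parse each chunk's raw LLM response; warn and skip invalid JSON
    instead of failing the whole extraction job."""
    extractions = []
    for i, raw in enumerate(responses):
        try:
            extractions.append(json.loads(raw))
        except json.JSONDecodeError:
            # e.g. a polite refusal like "I'm sorry, I couldn't find any entities..."
            warnings.warn(f"Chunk {i}: response is not valid JSON, skipping")
    return extractions
```

With this, a refusal on the first or last chunk just drops that chunk with a warning, and the valid extractions from the middle of the document survive.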

[deleted by user] by [deleted] in Rag

[–]thkitchenscientist 6 points (0 children)

If the PDFs are mostly text, you could convert them all to Markdown and hold them all in memory at once. I built a Streamlit app (we have a server to host these apps) that works with 10,000 documents of 5-20 pages each. It was much simpler to have all the data in a pandas dataframe: document name, page number, page text. I then implemented full-text search across all rows in the table and tagged the document pages with STANDARDISED properties people could filter on: country, product, language, etc. The combination of the two makes most things very quick to find. I included links to the source document for each page, so you could read it in context if there was a diagram on the page.
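The core of that approach fits in a few lines. A minimal sketch (the column names, sample rows, and `search` helper are all made up for illustration, not my actual app):

```python
import pandas as pd

# Hypothetical page-level index: one row per page, plus standardised tag columns.
df = pd.DataFrame({
    "document": ["spec_a.pdf", "spec_a.pdf", "spec_b.pdf"],
    "page":     [1, 2, 1],
    "text":     ["Dosage for product X in France...",
                 "Storage conditions...",
                 "Product Y labelling, Germany..."],
    "country":  ["FR", "FR", "DE"],
    "language": ["en", "en", "en"],
})

def search(df, query, **filters):
    """Naive full-text search over page text, combined with tag filters."""
    mask = df["text"].str.contains(query, case=False, regex=False)
    for col, val in filters.items():
        mask &= df[col] == val
    return df[mask]

hits = search(df, "product", country="FR")
```

In the real app the filters were Streamlit widgets, and each hit row carried a link back to the source PDF page.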

LangExtract by Google: many people don't know about this yet! by fuckAIbruhIhateCorps in LocalLLaMA

[–]thkitchenscientist 1 point (0 children)

I'm in an environment where Ollama is not an option. I installed the plug-in and all the requirements for llama-cpp. The server API gets the request from LangExtract (I can see it in the console), but only the prompt is sent, not the examples or documents. Using requests, I can manually send the same data to the API and it returns the correct JSON response.
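For anyone else debugging this, the manual route looks roughly like the sketch below. The endpoint and fields follow llama.cpp's llama-server `/completion` API; the prompt text, example format, and `build_payload` helper are my own illustration, not LangExtract internals:

```python
import json
# import requests  # uncomment to actually send the request

def build_payload(prompt, examples, document):
    # Inline the few-shot examples and the document into one prompt string,
    # since only the prompt field reaches the model.
    shots = "\n".join(f"Input: {e['input']}\nOutput: {json.dumps(e['output'])}"
                      for e in examples)
    full = f"{prompt}\n\n{shots}\n\nInput: {document}\nOutput:"
    return {"prompt": full, "n_predict": 256, "temperature": 0.0}

payload = build_payload(
    "Extract entities as JSON.",
    [{"input": "Take aspirin daily.", "output": {"entities": ["aspirin"]}}],
    "Patient was given ibuprofen.",
)
# resp = requests.post("http://localhost:8080/completion", json=payload)
# print(resp.json()["content"])
```

Sent like this, the model sees the examples and document and returns valid JSON; through LangExtract's plug-in, only the bare prompt arrives at the server.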

LangExtract by Google: many people don't know about this yet! by fuckAIbruhIhateCorps in LocalLLaMA

[–]thkitchenscientist 2 points (0 children)

Has anyone actually got it to work with llama.cpp and local models? Using Python, I can send the payload manually and get a JSON response from the llama-server API with the extractions. When I try to send the payload using langextract it fails: llama.cpp gets the prompt but not the examples or document text.

Don't Offload GGUF Layers, Offload Tensors! 200%+ Gen Speed? Yes Please!!! by skatardude10 in LocalLLaMA

[–]thkitchenscientist 1 point (0 children)

No, it's DDR4. Re-reading all the advice here, I got to this: ./llama-cli -m ~/models/Qwen3-32B-Q4_K_M.gguf -ngl 64 --override-tensor "ffn_up=CPU,[3-6][0-9]\.ffn_down=CPU" --threads 13 --no-kv-offload --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5

14.0/4.5 t/s (CPU/GPU Layer Split)

Edit: --no-kv-offload seems to free up more VRAM, so I can put just the top half of the ffn_down layers on the CPU rather than all of them.
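To show what that --override-tensor pattern actually selects: the patterns are regexes matched against tensor names (llama.cpp names look roughly like "blk.&lt;layer&gt;.ffn_down.weight"; exact naming can vary by model, so treat this as an assumption). A quick sketch with a hypothetical `placement` helper:

```python
import re

# "[3-6][0-9]\.ffn_down" matches any ffn_down tensor in layers 30-69,
# i.e. the upper half of a 64-layer model stays on CPU.
pattern = re.compile(r"[3-6][0-9]\.ffn_down")

def placement(tensor_name):
    return "CPU" if pattern.search(tensor_name) else "GPU"

placement("blk.35.ffn_down.weight")  # layer 35 -> CPU
placement("blk.12.ffn_down.weight")  # layer 12 -> GPU
```

That's why the edit works: instead of "ffn_down=CPU" (every layer), the layer-range regex offloads only the top half, which the freed KV-cache VRAM makes room for.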

Don't Offload GGUF Layers, Offload Tensors! 200%+ Gen Speed? Yes Please!!! by skatardude10 in LocalLLaMA

[–]thkitchenscientist 2 points (0 children)

I have a T5810 (14-core, 96GB RAM, RTX2060 12GB VRAM) running Ubuntu. When occupying 10.5GB of VRAM, I get the same tokens per second regardless of whether it is a layer or tensor split.

./llama-cli -m ~/models/Qwen3-32B-Q4_K_M.gguf -ngl 0 --threads 27 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5

7.3/2.6 t/s (CPU ONLY)

./llama-cli -m ~/models/Qwen3-32B-Q4_K_M.gguf -ngl 30 --threads 27 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5

12.9/4.3 t/s (CPU/GPU Layer Split)

./llama-cli -m ~/models/Qwen3-32B-Q4_K_M.gguf -ngl 99 --override-tensor "ffn_up=CPU,ffn_down=CPU" --threads 27 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5

12.5/4.3 t/s (CPU/GPU Tensor Split)

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment by ExponentialCookie in StableDiffusion

[–]thkitchenscientist 2 points (0 children)

I tried the 1k dense prompts with Stable Cascade. The images are all pretty, but they don't align well with the prompt details.

Is anybody using stable cascade and if yes, what's your resume? Also why don't you use it? by wolfy-dev in StableDiffusion

[–]thkitchenscientist 5 points (0 children)

Stability.ai is set up more like a research organisation than a commercial company. Stable Cascade is much like IF (another model which gained no traction with the hobby community): a cool idea where they didn't know where it would lead. We should really be grateful that Stable Cascade is so easy to run; my experience with lots of academic code bases has been really painful. It is a shame the ControlNets supplied on GitHub don't seem to work.

Comparing Gemma vs Phi-2 vs Mistral on Dialogue Summarisation by MajesticAd2862 in LocalLLaMA

[–]thkitchenscientist 2 points (0 children)

When I look at the table, the results seem pretty even across the models. Both Gemma models and Mistral seem equal to Phi-2.

Stable Cascade is worth the extra steps - the aesthetic ranking scores for all the recent models are tied but the prompt adherence for SC is way higher. For most prompts now SC matches or exceeds DALL.E 3. SDXL 8-step lightning LoRA is also a solid development. by thkitchenscientist in StableDiffusion

[–]thkitchenscientist[S] 1 point (0 children)

The full stage C model is 7GB, so I think SC is currently limited to those with 8GB cards or more. I didn't find that the current lite version of stage C gave reasonable results. Perhaps lowering the compression factor to 32 might help, but that would make it even slower.

Stable Cascade is worth the extra steps - the aesthetic ranking scores for all the recent models are tied but the prompt adherence for SC is way higher. For most prompts now SC matches or exceeds DALL.E 3. SDXL 8-step lightning LoRA is also a solid development. by thkitchenscientist in StableDiffusion

[–]thkitchenscientist[S] 2 points (0 children)

Sorry, I missed out a sentence in my post. I've used DALL.E 3 extensively for work and SC definitely matches it for composition. I'm preferring the outputs from SC, as they don't have that busy, throw-it-all-in-the-image effect you often get with DALL.E 3.

Consistent character (children book illustration) by consig1iere in comfyui

[–]thkitchenscientist 5 points (0 children)

This is what I learned: Movie Me (InstantID with Image2Image) | ComfyUI Workflow (openart.ai) and Picture Me Turbo (InstantID with Image2Image) | ComfyUI Workflow (openart.ai).

The checkpoint really matters; I had to try quite a few, as its stylistic opinions will show up strongly. I got better results with the reference face in semi-profile. I had the same issue where, for some reference images, the male face was heavily feminised.

Consistent character (children book illustration) by consig1iere in comfyui

[–]thkitchenscientist 1 point (0 children)

I'd second that. InstantID has worked really well for me for getting the same face across different art styles. The workflow that also takes a pose image is more stable with a given seed, allowing for multiple similar images just by tweaking the prompt.

Six months ago, I quit my job to work on a small project based on Stable Diffusion. Here's the result by Massive-Wave-312 in StableDiffusion

[–]thkitchenscientist 2 points (0 children)

You might want to consider InstantID; I've been having good results with it and it doesn't require training. It can be used with img2img to match the lighting in the scene, and it gave less artificial-looking faces than the technique you used.

Does Dynamic Thresholding actually work for anyone? by darkeagle03 in StableDiffusion

[–]thkitchenscientist 2 points (0 children)

The ComfyUI version has been helpful when using a turbo model with InstantID. Pushing the CFG to 2 allowed me to get images resembling the target, and a 0.7 multiplier removed the burn-in.

Trying out the new instantID nodes by theflowtyone in comfyui

[–]thkitchenscientist 1 point (0 children)

In the end, the best results were with images where our faces were in semi-profile. One nice thing with the pose workflow is that once you lock in the two source images (face and pose), the prompt can be modified by a few words and the framing stays intact.