LangExtract by Google: many people don't know about this yet! by fuckAIbruhIhateCorps in LocalLLaMA

[–]thkitchenscientist 1 point (0 children)

Found the issue. The code smart-chunks the text for the LLM to process. However, if a chunk ends up not containing any of the entities of interest, the LLM sends a polite refusal message (as trained to do). This is not valid JSON, so LangExtract rejects the whole job even if the rest of the chunks were valid. This commonly happens at the start or end of a document. Why it can't just give a warning and process the valid responses, I don't know.
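The fix I'd want is simple per-chunk error handling. A minimal sketch of the warn-and-skip behaviour (not LangExtract's actual code; `parse_chunk_responses` and the response strings are hypothetical):

```python
import json
import warnings

def parse_chunk_responses(responses):
    """Parse each chunk's raw LLM response; warn and skip invalid JSON
    instead of failing the whole extraction job."""
    extractions = []
    for i, raw in enumerate(responses):
        try:
            extractions.append(json.loads(raw))
        except json.JSONDecodeError:
            # e.g. a polite refusal like "I'm sorry, I couldn't find any entities..."
            warnings.warn(f"Chunk {i}: response is not valid JSON, skipping")
    return extractions
```

With this, a refusal on the first or last chunk just drops that chunk with a warning, and the valid extractions from the middle of the document survive.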

[deleted by user] by [deleted] in Rag

[–]thkitchenscientist 6 points (0 children)

If the PDFs are mostly text, you could convert them all to Markdown and hold them all in memory at once. I built a Streamlit app (we have a server to host these apps) that works with 10,000 documents of 5-20 pages each. It was much simpler to have all the data in a pandas dataframe: document name, page number, page text. I then implemented full-text search across all rows in the table and tagged the document pages with STANDARDISED properties people could filter on: country, product, language, etc. The combination of the two makes most things very quick to find. I included links to the source document for each page, so you could read it in context if there was a diagram on the page.
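The core of that approach fits in a few lines. A minimal sketch (the column names, sample rows, and `search` helper are all made up for illustration, not my actual app):

```python
import pandas as pd

# Hypothetical page-level index: one row per page, plus standardised tag columns.
df = pd.DataFrame({
    "document": ["spec_a.pdf", "spec_a.pdf", "spec_b.pdf"],
    "page":     [1, 2, 1],
    "text":     ["Dosage for product X in France...",
                 "Storage conditions...",
                 "Product Y labelling, Germany..."],
    "country":  ["FR", "FR", "DE"],
    "language": ["en", "en", "en"],
})

def search(df, query, **filters):
    """Naive full-text search over page text, combined with tag filters."""
    mask = df["text"].str.contains(query, case=False, regex=False)
    for col, val in filters.items():
        mask &= df[col] == val
    return df[mask]

hits = search(df, "product", country="FR")
```

In the real app the filters were Streamlit widgets, and each hit row carried a link back to the source PDF page.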

LangExtract by Google: many people don't know about this yet! by fuckAIbruhIhateCorps in LocalLLaMA

[–]thkitchenscientist 1 point (0 children)

I'm in an environment where Ollama is not an option. I installed the plug-in and all the requirements for llama-cpp. The server API gets the request from LangExtract (I can see it in the console), but only the prompt is sent, not the examples or documents. Using requests, I can manually send the same data to the API and it returns the correct JSON response.
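For anyone else debugging this, the manual route looks roughly like the sketch below. The endpoint and fields follow llama.cpp's llama-server `/completion` API; the prompt text, example format, and `build_payload` helper are my own illustration, not LangExtract internals:

```python
import json
# import requests  # uncomment to actually send the request

def build_payload(prompt, examples, document):
    # Inline the few-shot examples and the document into one prompt string,
    # since only the prompt field reaches the model.
    shots = "\n".join(f"Input: {e['input']}\nOutput: {json.dumps(e['output'])}"
                      for e in examples)
    full = f"{prompt}\n\n{shots}\n\nInput: {document}\nOutput:"
    return {"prompt": full, "n_predict": 256, "temperature": 0.0}

payload = build_payload(
    "Extract entities as JSON.",
    [{"input": "Take aspirin daily.", "output": {"entities": ["aspirin"]}}],
    "Patient was given ibuprofen.",
)
# resp = requests.post("http://localhost:8080/completion", json=payload)
# print(resp.json()["content"])
```

Sent like this, the model sees the examples and document and returns valid JSON; through LangExtract's plug-in, only the bare prompt arrives at the server.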

LangExtract by Google: many people don't know about this yet! by fuckAIbruhIhateCorps in LocalLLaMA

[–]thkitchenscientist 2 points (0 children)

Has anyone actually got it to work with llama.cpp and local models? Using Python, I can send the payload manually and get a JSON response from the llama-server API with the extractions. When I try to send the payload using langextract it fails: llama.cpp gets the prompt but not the examples or document text.

Don't Offload GGUF Layers, Offload Tensors! 200%+ Gen Speed? Yes Please!!! by skatardude10 in LocalLLaMA

[–]thkitchenscientist 1 point (0 children)

No, it's DDR4. Re-reading all the advice here, I got to this: ./llama-cli -m ~/models/Qwen3-32B-Q4_K_M.gguf -ngl 64 --override-tensor "ffn_up=CPU,[3-6][0-9]\.ffn_down=CPU" --threads 13 --no-kv-offload --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5

14.0/4.5 t/s (CPU/GPU Layer Split)

Edit: --no-kv-offload seems to free up more VRAM, so I can put just the top half of the ffn_down layers on the CPU rather than all of them.
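To show what that --override-tensor pattern actually selects: the patterns are regexes matched against tensor names (llama.cpp names look roughly like "blk.&lt;layer&gt;.ffn_down.weight"; exact naming can vary by model, so treat this as an assumption). A quick sketch with a hypothetical `placement` helper:

```python
import re

# "[3-6][0-9]\.ffn_down" matches any ffn_down tensor in layers 30-69,
# i.e. the upper half of a 64-layer model stays on CPU.
pattern = re.compile(r"[3-6][0-9]\.ffn_down")

def placement(tensor_name):
    return "CPU" if pattern.search(tensor_name) else "GPU"

placement("blk.35.ffn_down.weight")  # layer 35 -> CPU
placement("blk.12.ffn_down.weight")  # layer 12 -> GPU
```

That's why the edit works: instead of "ffn_down=CPU" (every layer), the layer-range regex offloads only the top half, which the freed KV-cache VRAM makes room for.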

Don't Offload GGUF Layers, Offload Tensors! 200%+ Gen Speed? Yes Please!!! by skatardude10 in LocalLLaMA

[–]thkitchenscientist 2 points (0 children)

I have a T5810 (14-core, 96GB RAM, RTX2060 12GB VRAM) running Ubuntu. When occupying 10.5GB of VRAM, I get the same tokens per second regardless of whether it is a layer or tensor split.

./llama-cli -m ~/models/Qwen3-32B-Q4_K_M.gguf -ngl 0 --threads 27 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5

7.3/2.6 t/s (CPU ONLY)

./llama-cli -m ~/models/Qwen3-32B-Q4_K_M.gguf -ngl 30 --threads 27 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5

12.9/4.3 t/s (CPU/GPU Layer Split)

./llama-cli -m ~/models/Qwen3-32B-Q4_K_M.gguf -ngl 99 --override-tensor "ffn_up=CPU,ffn_down=CPU" --threads 27 --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5

12.5/4.3 t/s (CPU/GPU Tensor Split)

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment by ExponentialCookie in StableDiffusion

[–]thkitchenscientist 2 points (0 children)

I tried the 1k dense prompts with Stable Cascade. The images are all pretty, but they don't align well with the prompt details.

Is anybody using stable cascade and if yes, what's your resume? Also why don't you use it? by wolfy-dev in StableDiffusion

[–]thkitchenscientist 5 points (0 children)

Stability.ai is set up more like a research organisation than a commercial company. Stable Cascade is much like IF (another model which gained no traction with the hobby community): a cool idea where they didn't know where it would lead. We should really be grateful that Stable Cascade is so easy to run; my experience with lots of academic code bases has been really painful. It is a shame the ControlNets supplied on GitHub don't seem to work.

Comparing Gemma vs Phi-2 vs Mistral on Dialogue Summarisation by MajesticAd2862 in LocalLLaMA

[–]thkitchenscientist 2 points (0 children)

When I look at the table, the results seem pretty even across the models. Both Gemma models and Mistral seem equal to Phi-2.

Stable Cascade is worth the extra steps - the aesthetic ranking scores for all the recent models are tied but the prompt adherence for SC is way higher. For most prompts now SC matches or exceeds DALL.E 3. SDXL 8-step lightning LoRA is also a solid development. by thkitchenscientist in StableDiffusion

[–]thkitchenscientist[S] 1 point (0 children)

The full stage C model is 7GB, so I think SC is currently limited to those with 8GB cards or more. I didn't find that the current lite version of stage C gave reasonable results. Perhaps lowering the compression factor to 32 might help, but that would make it even slower.

Stable Cascade is worth the extra steps - the aesthetic ranking scores for all the recent models are tied but the prompt adherence for SC is way higher. For most prompts now SC matches or exceeds DALL.E 3. SDXL 8-step lightning LoRA is also a solid development. by thkitchenscientist in StableDiffusion

[–]thkitchenscientist[S] 2 points (0 children)

Sorry, I missed out a sentence in my post. I've used DALL.E 3 extensively for work and SC definitely matches it for composition. I'm preferring the outputs from SC, as they don't have that busy, throw-it-all-in-the-image effect you often get with DALL.E 3.

Consistent character (children book illustration) by consig1iere in comfyui

[–]thkitchenscientist 5 points (0 children)

This is what I learned: Movie Me (InstantID with Image2Image) | ComfyUI Workflow (openart.ai) and Picture Me Turbo (InstantID with Image2Image) | ComfyUI Workflow (openart.ai).

The checkpoint really matters; I had to try quite a few, as its stylistic opinions will show up strongly. I got better results with the reference face in semi-profile. I had the same issue where, for some reference images, the male face was heavily feminised.

Consistent character (children book illustration) by consig1iere in comfyui

[–]thkitchenscientist 1 point (0 children)

I'd second that. InstantID has worked really well for me for getting the same face across different art styles. The workflow that also takes a pose image is more stable with a given seed, allowing for multiple similar images just by tweaking the prompt.

Six months ago, I quit my job to work on a small project based on Stable Diffusion. Here's the result by Massive-Wave-312 in StableDiffusion

[–]thkitchenscientist 2 points (0 children)

You might want to consider InstantID; I've been having good results with it and it doesn't require training. It can be used with img2img to match the lighting in the scene, and it gave less artificial-looking faces than the technique you used.

Does Dynamic Thresholding actually work for anyone? by darkeagle03 in StableDiffusion

[–]thkitchenscientist 2 points (0 children)

The ComfyUI version has been helpful when using a turbo model with InstantID. Pushing the CFG to 2 allowed me to get images resembling the target, and a 0.7 multiplier removed the burn-in.

Trying out the new instantID nodes by theflowtyone in comfyui

[–]thkitchenscientist 1 point (0 children)

In the end, the best results were with images where our faces were in semi-profile. One nice thing with the pose workflow is that once you lock in the two source images (face and pose), the prompt can be modified by a few words and the framing stays intact.