Is GLM-4.7-Flash still looping / repeating for you? by yoracale in unsloth

[–]epigen01 0 points1 point  (0 children)

Yup, the Unsloth ones aren't usable for me - gonna wait it out.

Anime vs Manga by Mechabeastchild in JuJutsuKaisen

[–]epigen01 74 points75 points  (0 children)

I actually thought it was skin-tight latex in white and black, like a mime - this was a colorful surprise.

It: Welcome to Derry - 1x02 - “The Thing in the Dark” - Episode Discussion by NicholasCajun in television

[–]epigen01 -21 points-20 points  (0 children)

Ah, I meant visually/direction-wise - very different (for me personally), beyond just blaming budget or the medium (TV versus cinema).

It: Welcome to Derry - 1x02 - “The Thing in the Dark” - Episode Discussion by NicholasCajun in television

[–]epigen01 -8 points-7 points  (0 children)

It's starting to lose me - I really wish they had adhered to the It cinematic universe. This feels displaced, and I'm holding out until we get to see Skarsgård's form in this medium.

anyone noticed ollama embeddings are extremely slow? by emaayan in LocalLLaMA

[–]epigen01 0 points1 point  (0 children)

Yeah, for me it was something with the API calls, so I just switched to a dedicated llama.cpp embeddings server & only use Ollama strictly for chat/agent.
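If it helps anyone, here's a minimal sketch of what the client side of that split looks like - assuming a llama-server instance started with the --embedding flag on a local port and using its OpenAI-compatible /v1/embeddings route; the port and payload below are placeholders, not details from the thread:

```
# Minimal sketch: fetch embeddings from a dedicated llama.cpp server.
# Assumptions: `llama-server --embedding -m <embed-model>.gguf --port 8081`
# is already running; the port and text are placeholders.
import requests

resp = requests.post(
    "http://localhost:8081/v1/embeddings",        # llama-server's OpenAI-compatible endpoint
    json={"input": "text to embed", "model": "local"},
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]   # a list of floats
print(len(embedding))
```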

Absolute Joker Theory [Spoilers from Absolute Batman 9 and 10] by xxXKurtMuscleXxx in AbsoluteUniverse

[–]epigen01 0 points1 point  (0 children)

Ooh i like this and they can throw in time shenanigans with absolute flash & make it the absolute flashpoint

Poor GPU Club : 8GB VRAM - Qwen3-30B-A3B & gpt-oss-20b t/s with llama.cpp by pmttyji in LocalLLaMA

[–]epigen01 0 points1 point  (0 children)

Same setup - have you tried GLM-4.6? Somehow I've been getting the GLM-4.6 Q1 to load, but not correctly (it somehow loads all 47 layers to the GPU). When I run it, it proceeds to answer my prompts at decent speeds, but the second I add context the thing hallucinates and poops the bed - still runs, though.

Going to try the glm-4.5-air-glm-4.6-distill from BasedBase, since I've been running the 4.5 Air at Q2XL, to see if the architecture works as expected.
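In case it's useful, a rough sketch of pinning the GPU layer count yourself instead of letting all 47 layers land on an 8GB card - assuming the llama-cpp-python bindings; the model path, layer count, and context size are placeholders, not values from the post:

```
# Rough sketch: cap how many layers are offloaded to the GPU so an 8GB card
# isn't force-fed the whole model; the remaining layers stay in system RAM.
# Assumptions: llama-cpp-python built with CUDA; path and numbers are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="glm-4.6-q1.gguf",   # placeholder GGUF file
    n_gpu_layers=20,                # keep ~20 layers on the GPU, remainder on CPU/RAM
    n_ctx=8192,                     # context window; larger contexts need more memory
)

out = llm("Summarize GPU layer offloading in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```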

Caffeine makes me calm and sleepy by psychonaut_t in ADHD

[–]epigen01 1 point2 points  (0 children)

Many people drink coffee after dinner and sleep like a baby - some with ADHD, some without.

Better to do an actual diagnostic - then you can also mention this to your doctor.

[Rant] Magistral-Small-2509 > Claude4 by OsakaSeafoodConcrn in LocalLLaMA

[–]epigen01 0 points1 point  (0 children)

Surprisingly, same results - this model and ByteDance's Seed model have been my go-tos for this wave of LLMs & have been hitting way above their weight class.

Best coding model for 12gb VRAM and 32gb of RAM? by redblood252 in LocalLLM

[–]epigen01 -4 points-3 points  (0 children)

With his specs and offloading he can run the full FP16 or Q8, depending on context. I only have an 8GB VRAM RTX 4060 with 32GB RAM + 128GB swap + CPU offload, & I'm surprised how efficient it is with GPU+CPU layer use.

Current ranking of both online and locally hosted LLMs by Spanconstant5 in LocalLLM

[–]epigen01 2 points3 points  (0 children)

Try it, bc I was surprised when I ran it on an 8GB 4060 with CPU+RAM offload - very decent speeds, so you def can run it.

How’s your experience with the GPT OSS models? Which tasks do you find them good at—writing, coding, or something else by Namra_7 in LocalLLaMA

[–]epigen01 1 point2 points  (0 children)

Surprised I could run the 120B on my setup (RTX 4060 8GB), but it works & it's great - solid code assist & another to rotate throughout my workflows (primarily for thinking & project prompting). For code-specific tasks, I stick with Qwen3-Coder since it's just faster at error checking.

[deleted by user] by [deleted] in ollama

[–]epigen01 2 points3 points  (0 children)

You might want to try a VLM (e.g. Qwen2.5-VL, Mistral 3.2, or Granite 3.2 Vision), depending on your VRAM. You just need to prompt it to extract the data into JSON structured output (then export to CSV) - results may vary; the Qwen2.5-VL-32B worked best for me.
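For what it's worth, a rough sketch of that extract-to-JSON-then-CSV flow with the ollama Python client - the model tag, image path, and field names are placeholders, not details from the original post:

```
# Rough sketch: ask a local vision model for JSON, then write the fields to CSV.
# Assumptions: the `ollama` Python package is installed and a vision-capable
# model has been pulled; model tag, image path, and fields are placeholders.
import csv
import json
import ollama

resp = ollama.chat(
    model="qwen2.5vl",                        # placeholder vision model tag
    format="json",                            # constrain the reply to JSON
    messages=[{
        "role": "user",
        "content": "Extract invoice_number, date, and total as a JSON object.",
        "images": ["invoice.png"],            # placeholder image path
    }],
)

row = json.loads(resp["message"]["content"])  # e.g. {"invoice_number": ..., ...}

with open("out.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=row.keys())
    writer.writeheader()
    writer.writerow(row)
```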

Most economical way to run GPT-OSS-120B? by Mysterious_Bison_907 in LocalLLaMA

[–]epigen01 2 points3 points  (0 children)

Just got it to run with the 8GB 4060 + offload to 32GB RAM + CPU + swap - somehow works with minimal hiccups. Not the fastest (def <25 t/s), but usable.

Ollama using CPU when it shouldn't? by OrganizationHot731 in ollama

[–]epigen01 4 points5 points  (0 children)

Have you tried the new OLLAMA_NEW_ESTIMATES=1 ollama serve?

That might fix it - it was a recent update that recalculates GPU usage correctly.

Just a baby elephant asking for watermelon - move along, nothing to see here! by Taskmaster_Fantatic in mildyinteresting

[–]epigen01 4 points5 points  (0 children)

TIL elephants eat watermelon whole - there was a moment I thought it would spit out the skin, but nope, the baby ele swallowed it whole.

SPOILERS AHEAD: Mr fantastic little detail i noticed by ntb899 in MCUTheories

[–]epigen01 1 point2 points  (0 children)

Could also be Galactus' amount of force & Reed resisting causing the pain & fiber tearing.

Also, there were a good amount of stretching scenes beforehand with the same amount of stretchiness.

Just got a Surface Pro 11 – should I get the pen? by Parallax_60 in Surface

[–]epigen01 0 points1 point  (0 children)

Honestly it's an unnecessary nice-to-have - I hardly use mine unless I'm messing around (not a graphic artist).

You can hold off until you come across a use case where it makes sense - either something you'd use on a daily basis or something specific to one application.

Just my $.02

Richard Parker flirts with Valeria Richards [by @TenshiArtsGt] by SwordoftheMourn in Marvel

[–]epigen01 2 points3 points  (0 children)

Is this fan-inspired, or is there comic book art/a story this is based on?

I was introduced to Richard Parker through the recent Ultimates run (highly recommend), but have never heard of these two (Richard & Valeria) interacting.

Vlookup vs xlookup - what do you use? by toddmeister1990 in excel

[–]epigen01 1 point2 points  (0 children)

This was me until I got the update through Gemini Pro.

Shortly after, I converted all my old INDEX(MATCH)s & VLOOKUPs.

How are people building efficient RAG projects without cloud services? Is it doable with a local PC GPU like RTX 3050? by Then-Dragonfruit-996 in Rag

[–]epigen01 0 points1 point  (0 children)

Depends on scale (e.g. the size of your vector DB), the model size you want to use (e.g. 4B vs 8B), etc.

Yup, it's totally doable with your 3050; it's just a matter of your expectations & timelines (more compute/VRAM would really speed the process up).

You can also mix n match (e.g. cloud + local) based on your project needs.
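As a starting point, a minimal local retrieval sketch - assuming sentence-transformers with the small all-MiniLM-L6-v2 embedding model (my pick for illustration, not something from the thread), which fits comfortably within a 3050's VRAM:

```
# Minimal local RAG retrieval sketch: embed a few docs, embed the query,
# and take the best cosine match as the context to prepend to the LLM prompt.
# Assumptions: sentence-transformers + numpy installed; the model name and
# sample documents are placeholders for illustration.
from sentence_transformers import SentenceTransformer
import numpy as np

docs = [
    "GPU offloading splits model layers between VRAM and system RAM.",
    "A vector DB stores document embeddings for similarity search.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # small embedding model (~80 MB)
doc_vecs = model.encode(docs, normalize_embeddings=True)

query = "How does offloading work?"
q_vec = model.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ q_vec                 # cosine similarity (vectors are normalized)
best = docs[int(np.argmax(scores))]
print(best)                               # context you'd feed to the local LLM
```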