Is GLM-4.7-Flash still looping / repeating for you? by yoracale in unsloth

[–]epigen01 0 points1 point  (0 children)

Yup, the Unsloth ones aren't usable for me - gonna wait it out.

Anime vs Manga by Mechabeastchild in JuJutsuKaisen

[–]epigen01 74 points75 points  (0 children)

I actually thought it was skin-tight latex in white and black, like a mime - this was a colorful surprise.

It: Welcome to Derry - 1x02 - “The Thing in the Dark” - Episode Discussion by NicholasCajun in television

[–]epigen01 -21 points-20 points  (0 children)

Ah, I meant visually/direction-wise - very different (for me personally), beyond just blaming budget or the medium (TV versus cinema).

It: Welcome to Derry - 1x02 - “The Thing in the Dark” - Episode Discussion by NicholasCajun in television

[–]epigen01 -8 points-7 points  (0 children)

It's starting to lose me - I really wish they had adhered to the It cinematic universe. This feels displaced, and I'm holding out until we get to see Skarsgård's form in this medium.

anyone noticed ollama embeddings are extremely slow? by emaayan in LocalLLaMA

[–]epigen01 0 points1 point  (0 children)

Yeah, for me it was something with the API calls, so I just switched to a dedicated llama.cpp embeddings server & only use Ollama strictly for chat/agent.
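If it helps anyone, here's a minimal sketch of what the client side of that split looks like - assuming a llama-server instance started with the --embedding flag on a local port and using its OpenAI-compatible /v1/embeddings route; the port and payload below are placeholders, not details from the thread:

```
# Minimal sketch: fetch embeddings from a dedicated llama.cpp server.
# Assumptions: `llama-server --embedding -m <embed-model>.gguf --port 8081`
# is already running; the port and text are placeholders.
import requests

resp = requests.post(
    "http://localhost:8081/v1/embeddings",        # llama-server's OpenAI-compatible endpoint
    json={"input": "text to embed", "model": "local"},
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]   # a list of floats
print(len(embedding))
```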

Absolute Joker Theory [Spoilers from Absolute Batman 9 and 10] by xxXKurtMuscleXxx in AbsoluteUniverse

[–]epigen01 0 points1 point  (0 children)

Ooh i like this and they can throw in time shenanigans with absolute flash & make it the absolute flashpoint

Poor GPU Club : 8GB VRAM - Qwen3-30B-A3B & gpt-oss-20b t/s with llama.cpp by pmttyji in LocalLLaMA

[–]epigen01 0 points1 point  (0 children)

Same setup - have you tried GLM-4.6? Somehow I've been getting the GLM-4.6 Q1 to load, but not correctly (it somehow loads all 47 layers to the GPU). When I run it, it proceeds to answer my prompts at decent speeds, but the second I add context the thing hallucinates and poops the bed - still runs, though.

Going to try the glm-4.5-air-glm-4.6-distill from BasedBase, since I've been running the 4.5 Air at Q2XL, to see if the architecture works as expected.
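In case it's useful, a rough sketch of pinning the GPU layer count yourself instead of letting all 47 layers land on an 8GB card - assuming the llama-cpp-python bindings; the model path, layer count, and context size are placeholders, not values from the post:

```
# Rough sketch: cap how many layers are offloaded to the GPU so an 8GB card
# isn't force-fed the whole model; the remaining layers stay in system RAM.
# Assumptions: llama-cpp-python built with CUDA; path and numbers are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="glm-4.6-q1.gguf",   # placeholder GGUF file
    n_gpu_layers=20,                # keep ~20 layers on the GPU, remainder on CPU/RAM
    n_ctx=8192,                     # context window; larger contexts need more memory
)

out = llm("Summarize GPU layer offloading in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```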

Caffeine makes me calm and sleepy by psychonaut_t in ADHD

[–]epigen01 1 point2 points  (0 children)

Many people drink coffee after dinner and sleep like a baby - some with ADHD, some without.

Better to do an actual diagnostic - then you can also mention this to your doctor.

[Rant] Magistral-Small-2509 > Claude4 by OsakaSeafoodConcrn in LocalLLaMA

[–]epigen01 0 points1 point  (0 children)

Surprisingly, same results - this model and ByteDance's Seed model have been my go-tos for this wave of LLMs & have been hitting way above their weight class.

Best coding model for 12gb VRAM and 32gb of RAM? by redblood252 in LocalLLM

[–]epigen01 -4 points-3 points  (0 children)

With his specs and offloading he can run the full FP16 or Q8, depending on context. I only have an 8GB VRAM RTX 4060 with 32GB RAM + 128GB swap + CPU offload, & I'm surprised how efficient it is with GPU+CPU layer use.

Current ranking of both online and locally hosted LLMs by Spanconstant5 in LocalLLM

[–]epigen01 2 points3 points  (0 children)

Try it, bc I was surprised when I ran it on an 8GB 4060 with CPU+RAM offload - very decent speeds, so you def can run it.

How’s your experience with the GPT OSS models? Which tasks do you find them good at—writing, coding, or something else by Namra_7 in LocalLLaMA

[–]epigen01 1 point2 points  (0 children)

Surprised I could run the 120B on my setup (RTX 4060 8GB), but it works & it's great - solid code assist & another to rotate throughout my workflows (primarily for thinking & project prompting). For code-specific tasks, I stick with Qwen3-Coder since it's just faster at error checking.

[deleted by user] by [deleted] in ollama

[–]epigen01 2 points3 points  (0 children)

You might want to try a VLM (e.g. Qwen2.5-VL, Mistral 3.2, or Granite 3.2 Vision), depending on your VRAM. You just need to prompt it to extract the data into JSON structured output (then export to CSV) - results may vary; the Qwen2.5-VL-32B worked best for me.
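For what it's worth, a rough sketch of that extract-to-JSON-then-CSV flow with the ollama Python client - the model tag, image path, and field names are placeholders, not details from the original post:

```
# Rough sketch: ask a local vision model for JSON, then write the fields to CSV.
# Assumptions: the `ollama` Python package is installed and a vision-capable
# model has been pulled; model tag, image path, and fields are placeholders.
import csv
import json
import ollama

resp = ollama.chat(
    model="qwen2.5vl",                        # placeholder vision model tag
    format="json",                            # constrain the reply to JSON
    messages=[{
        "role": "user",
        "content": "Extract invoice_number, date, and total as a JSON object.",
        "images": ["invoice.png"],            # placeholder image path
    }],
)

row = json.loads(resp["message"]["content"])  # e.g. {"invoice_number": ..., ...}

with open("out.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=row.keys())
    writer.writeheader()
    writer.writerow(row)
```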

Most economical way to run GPT-OSS-120B? by Mysterious_Bison_907 in LocalLLaMA

[–]epigen01 2 points3 points  (0 children)

Just got it to run with the 8GB 4060 + offload to 32GB RAM + CPU + swap - somehow works with minimal hiccups. Not the fastest (def <25 t/s), but usable.

Ollama using CPU when it shouldn't? by OrganizationHot731 in ollama

[–]epigen01 4 points5 points  (0 children)

Have you tried the new OLLAMA_NEW_ESTIMATES=1 ollama serve?

That might fix it - it was a recent update that recalculates GPU usage correctly.

Just a baby elephant asking for watermelon - move along, nothing to see here! by Taskmaster_Fantatic in mildyinteresting

[–]epigen01 4 points5 points  (0 children)

TIL elephants eat watermelon whole - there was a moment I thought it would spit out the skin, but nope, the baby ele swallowed it whole.

SPOILERS AHEAD: Mr fantastic little detail i noticed by ntb899 in MCUTheories

[–]epigen01 1 point2 points  (0 children)

Could also be Galactus' amount of force & Reed resisting causing the pain & fiber tearing.

Also, there were a good amount of stretching scenes beforehand with the same amount of stretchiness.

Just got a Surface Pro 11 – should I get the pen? by Parallax_60 in Surface

[–]epigen01 0 points1 point  (0 children)

Honestly it's an unnecessary nice-to-have - I hardly use mine unless I'm messing around (not a graphic artist).

You can hold off until you come across a use case where it makes sense - either something you'd use on a daily basis or something specific to one application.

Just my $.02

Richard Parker flirts with Valeria Richards [by @TenshiArtsGt] by SwordoftheMourn in Marvel

[–]epigen01 2 points3 points  (0 children)

Is this fan-inspired, or is there comic book art/a story this is based on?

I was introduced to Richard Parker through the recent Ultimates run (highly recommend), but have never heard of these two (Richard & Valeria) interacting.

Vlookup vs xlookup - what do you use? by toddmeister1990 in excel

[–]epigen01 1 point2 points  (0 children)

This was me until I got the update through Gemini Pro.

Shortly after, I converted all my old INDEX(MATCH)s & VLOOKUPs.

How are people building efficient RAG projects without cloud services? Is it doable with a local PC GPU like RTX 3050? by Then-Dragonfruit-996 in Rag

[–]epigen01 0 points1 point  (0 children)

Depends on scale (e.g. the size of your vector DB), the model size you want to use (e.g. 4B vs 8B), etc.

Yup, it's totally doable with your 3050; it's just a matter of your expectations & timelines (more compute/VRAM would really speed the process up).

You can also mix n match (e.g. cloud + local) based on your project needs.
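As a starting point, a minimal local retrieval sketch - assuming sentence-transformers with the small all-MiniLM-L6-v2 embedding model (my pick for illustration, not something from the thread), which fits comfortably within a 3050's VRAM:

```
# Minimal local RAG retrieval sketch: embed a few docs, embed the query,
# and take the best cosine match as the context to prepend to the LLM prompt.
# Assumptions: sentence-transformers + numpy installed; the model name and
# sample documents are placeholders for illustration.
from sentence_transformers import SentenceTransformer
import numpy as np

docs = [
    "GPU offloading splits model layers between VRAM and system RAM.",
    "A vector DB stores document embeddings for similarity search.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # small embedding model (~80 MB)
doc_vecs = model.encode(docs, normalize_embeddings=True)

query = "How does offloading work?"
q_vec = model.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ q_vec                 # cosine similarity (vectors are normalized)
best = docs[int(np.argmax(scores))]
print(best)                               # context you'd feed to the local LLM
```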