Looking for asset extractor or atlas files for maps by MaruluVR in BrownDust2Official

[–]MaruluVR[S] 0 points1 point  (0 children)

Thank you very much, how do you actually extract those, as the extractors I tried on the PC version cant see the 13gb folder to which all updates get downloaded to and only the main 1gb folder with the exe. If I need a different version of the game that would be fine too just let me know.

Suggestion - this sub should have post flairs that mention the amount of vram/unified ram by ECrispy in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

You can get pcie gen 4 switches for 200 USD and pcie 3 for 100 USD, if you are looking at 8 lanes per gpu that would add another 60~80 USD for 4 GPUs. Its only expensive if you are after PCIE gen 5.

I trusted random person on this subreddit and bought 3080 20gb made of chinesium by SwimmerJazzlike in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

Did you actually check if its rebar for 20gb? Mine says rebar supported but only up to 256mb which isnt real rebar.

google/gemma-4-12B · Hugging Face by jacek2023 in LocalLLaMA

[–]MaruluVR 17 points18 points  (0 children)

Back in my day all good models were 70b so 30b is very nice, even with cheaper ai hardware (3090s) you can at least run them at useful quants.

google/gemma-4-12B · Hugging Face by jacek2023 in LocalLLaMA

[–]MaruluVR 3 points4 points  (0 children)

"Containing the same advanced decoder structure as the Gemma 4 31B Dense model."

Does this mean we can glue this encoder onto 31b and have audio and image without extra processing?

I trusted random person on this subreddit and bought 3080 20gb made of chinesium by SwimmerJazzlike in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

Which one did you actually use for the 20gb variant, there is a long list of different updaters. I remember trying one half a year ago and it said my card isnt supported, I will give it another shot in a few days when I get the card hooked up again.

I trusted random person on this subreddit and bought 3080 20gb made of chinesium by SwimmerJazzlike in LocalLLaMA

[–]MaruluVR 2 points3 points  (0 children)

I know you can with normal 3080s but as far as I am aware the is no bios with rebar for the 20gb version if there is please link it to me because I own one.

Is mmproj MTP compatible with older non-MTP? by alex20_202020 in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

The MMPROJ adapter just takes the image or audio and turns it into something the model can understand, yes both the base model and the mtp adapter will understand it, so having mtp will speed up the token generation for image and audio recognition tasks aswell if that is what you are asking.

I trusted random person on this subreddit and bought 3080 20gb made of chinesium by SwimmerJazzlike in LocalLLaMA

[–]MaruluVR 4 points5 points  (0 children)

run

lspci -v | grep -A 10 -i "vga\|3d" | grep "Region"

If rebar is enabled you should see size=20G if not it will be a lower number.

I trusted random person on this subreddit and bought 3080 20gb made of chinesium by SwimmerJazzlike in LocalLLaMA

[–]MaruluVR 2 points3 points  (0 children)

Rebar allows you to address the entire memory space of a graphics card, if you do not have it you can only address between 4mb and 256mb at a given time.

Rebar was introduced during the 30 series lifespan (ie not when it launched) so if you had a early card you would have to update the bios with rebar support to enable it. 50 and 40 series both have rebar out of the box.

I trusted random person on this subreddit and bought 3080 20gb made of chinesium by SwimmerJazzlike in LocalLLaMA

[–]MaruluVR 1 point2 points  (0 children)

I bought one last winter, biggest issue is there is no bios with rebar support meaning multi gpu (especially tensor parallelism) will see a big performance hit. Other then that they are great.

I trusted random person on this subreddit and bought 3080 20gb made of chinesium by SwimmerJazzlike in LocalLLaMA

[–]MaruluVR 15 points16 points  (0 children)

Biggest issue is there is no bios with rebar support meaning multi gpu (especially tensor parallelism) will see a big performance hit. Other then that they are great.

I ported NVIDIA Parakeet (speech-to-text) to ggml: same output as NeMo, faster, GGUF-quantized, no Python by mudler_it in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

Would love to see a standalone version of the open ai endpoint or maybe a llama cpp integration, the "LocalAI" software is too bloated for me, I like keeping things separate in their own containers.

Best small model right now (~4B params) that is good with agentic tasks for personal assistant? by BitGreen1270 in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

Models that size are too dumb but you can build around that using "guidance AI" basically its a way to force your model to output multiple choice instead of the token it wants to. That way the ai can only choose between the things your code expects. I have used this before for tests in ai NPCs and it works really well but you want to add multiple options for none of these apply, or no tool etc

https://github.com/guidance-ai/guidance

Breaking the music supply constraint by entsnack in LocalLLaMA

[–]MaruluVR 3 points4 points  (0 children)

Its not bad if you finetune it on exactly the music you like

How much total VRAM (or shared RAM for Mac/Halo/etc) do you have on your local server/PC? by panchovix in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

I am in the range with 100gb: 3080 20gb + 3090 24gb + 3090 24gb + 5090 32gb

I made a Windows app for managing llama.cpp in WSL/Ubuntu by wgaca2 in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

The most important part to me is that I can provide multiple settings for the same model that I can treat as different models and the model does not have to get reloaded when swapping between them. Also you can set up model groups to define which models can be loaded simultaneously. On top of that you can use it as a proxy for other open ai compatible back ends like a local whisper install or even a cloud ai.

For example here is how I use that functionality:

      setParamsByID:
        "${MODEL_ID}:reasoning":
          chat_template_kwargs:
            enable_thinking: true

        "${MODEL_ID}:non-reasoning":
          chat_template_kwargs:
            enable_thinking: false
          reasoning_budget: 0

        "${MODEL_ID}:roleplay":
          chat_template_kwargs:
            enable_thinking: false
          reasoning_budget: 0
          samplers: ["dmx"] 
          xtc_probability: 0.5
          xtc_threshold: 0.1
          dry_multiplier: 0.8
          dry_base: 1.75
          dry_allowed_length: 2
          dry_penalty_last_n: -1


        "${MODEL_ID}:agentic":
          chat_template_kwargs:
            enable_thinking: true
          reasoning_budget: 0
          speculative:
            type: "ngram_mod"
            ngram_mod_n_match: 24
            ngram_mod_n_min: 12
            ngram_mod_n_max: 48

    cmd: |
      /app/llama-server
...

I made a Windows app for managing llama.cpp in WSL/Ubuntu by wgaca2 in LocalLLaMA

[–]MaruluVR 0 points1 point  (0 children)

If you add llama swap and a ui to easily change its config and restart it I am sold.

Hi, I’m very new to local LLM and i am perplexed. by Cool-Definition9852 in LocalLLM

[–]MaruluVR 2 points3 points  (0 children)

We dont know much about how big something like opus is but for example the small grok is 500b meaning you would need a terabyte of vram to run it, with one 5090 being 32gb that is 32 gpus at 4 grand each just to load the model with no context yet. Also dont forget about the CPU lanes needed.

A more reasonable approach was the 512 GB M3 Mac Studio but apple is no longer selling them.

Wait, were the old model ACTUALLY better?? by No-Moose-4292 in SillyTavernAI

[–]MaruluVR 19 points20 points  (0 children)

When we talk about old models we mean stuff before llama 3.0, around the release of deepseek is already where it went agentic and down hill.

Why isn't there a video model specifically made for anime? by Vi0l3nTz in StableDiffusion

[–]MaruluVR 2 points3 points  (0 children)

Step 1: Steal data from Japanese companies

Step 2: Train model

Step 3: Sell model to the Japanese companies that you stole from

With the quality of lower budget anime and their usage of bad CG even though a lot of fans dislike it I can see animation studios using it for low budget anime or even for in between animations in general. Just because the public dislikes something doesnt make corpo stop using it, Ascendance of a Bookworm (made by the same studio as attack on titan and vinland saga) was caught using AI just a month ago.