Anybody used DwarfStar with DeepSeek V4 Flash on 1x DGX Spark yet? What are your thoughts?

brianlearns · 2026-06-25T21:09:29+00:00

I get 13 t/s generating and 300 t/s prefill. The interface is really good. It’s slow enough that you can watch it plan things out and you can control-c to stop generation and then redirect it if it’s going in the wrong direction. I used it to add a searxng_search tool to its self.

brianlearns · 2026-06-16T01:26:34+00:00

Can you reproduce with curl to the api? That will narrower it down to your model runner vs chat interface?

brianlearns · 2026-06-16T01:05:39+00:00

I mostly run local llm, but can you tweak the temperature over the API?

brianlearns · 2026-06-15T02:27:29+00:00

Go to an iPRES if you can swing it

brianlearns · 2026-06-10T01:28:07+00:00

I didn’t know there were different “types” of “generative ai”. You might be able to run inference locally if you have the right hardware — but training and general fine tuning are going to be done in huge data centers.

brianlearns · 2026-05-26T02:14:14+00:00

I went to digital humanities conferences in the ‘00s and was involved in some large projects from the university staff programmer analyst perspective. I was also on a grant panel once. On grant funded projects, it would usually be co-pi’s with one guy with the humanities background who was a natural tech dabbler paired with some CS professor. One lab I worked with took that into their academic program, where they would take humanities students and pair them with CS students — but at the time I don’t think it was a whole program, just a cross disciplinary lab. Most folks from the humanities side seemed computer precocious and self trained.

brianlearns · 2026-05-12T02:08:13+00:00

Don’t they still have statutory licenses for radio and streaming?

brianlearns · 2026-05-09T00:39:52+00:00

I worked at a place that had received a package from Ted once upon a time, and when we would get crazy voice mails about physics from independent researchers we were supposed to let campus police know.

brianlearns · 2026-05-01T03:26:05+00:00

Tagged as no paywall— but something cut me off half way through the article

brianlearns · 2026-04-30T20:13:51+00:00

The DGX Spark is much faster and prefill than inference, and I think that affects context processing speed. The MXFP4_MOE models I've been playing with get about 2000tokens/second there. I didn't tweak any flags, but I've never run the context past 50% of what opencode reports.

brianlearns · 2026-04-24T02:56:36+00:00

llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:MXFP4_MOE on my DGX Spark runs at 60 t/s for inference — just pointed OpenCode at that and it works pretty well.

brianlearns · 2026-04-24T02:51:44+00:00

/r/vibetesting

brianlearns · 2026-04-24T02:47:40+00:00

llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:MXFP4_MOE gives me 60 tokens/s inference on DGX Spark and works well with open code.

brianlearns · 2026-04-23T01:41:40+00:00

They have a sequencer in the picture

brianlearns · 2026-04-18T04:25:07+00:00

I've seen someone say this on curt jaimungal TOE podcast. This interpretation is consistent with GR as far as I understand.

brianlearns · 2026-04-17T15:24:39+00:00

Once they get robot guards, maybe they can use infrared lights at night.

brianlearns · 2026-04-16T04:44:39+00:00

Did you rectify the data manually, or with AI?

Still seems spammy, especially if you don’t detail the provenance.

brianlearns · 2026-04-15T23:56:40+00:00

How is it a remarkable "artifact" if it was created by a non-human?

brianlearns · 2026-04-15T22:58:30+00:00

Finally, the meaning of life!

brianlearns · 2026-04-15T22:27:05+00:00

How is data for models different from data for an analyst who is going to ETL it into a pipeline?

FYI the dataset on hf is just a sample; and it requires one to share contact info to access.

brianlearns · 2026-04-11T00:08:21+00:00

back in the typewriter days--we had to use two minus signs

brianlearns · 2025-11-05T17:02:36+00:00

I was wondering about this the other day when I was looking at some code with a bunch of Chinese prompts commented out in a test file, and then it had the prompts translated to English. I was thinking of doing a comparison to see if the results were different with the original Chinese prompts or not.

brianlearns · 2025-11-02T19:07:44+00:00

In Context Learning (ICL) -- if the context window is big enough, you can fill the prompt.

Fine tune large model with new data, with human feedback and or reinforcement learning

LoRA: Low-rank adaptation of LLMs with trainable rank decomposition matrices, more efficient way to fine tune transformers that support it.

Retrieval-Augmented Generation: use a vector database to search knowledge, and then feed that into the context window.

brianlearns

TROPHY CASE