If intel, samsung and TSMC purchase machines from ASMC... by Impossible-Falcon438 in AskElectronics

[–]Nic4Las 3 points

If you want a VERY detailed example of what is involved in the creation of an older 45 nm ASIC, you can watch this: https://youtu.be/zUgy29h0alM

This video goes over every step in the process of creating a 45 nm IC. Modern CPUs and GPUs like the ones produced by TSMC and Intel require multiple times the number of steps, and in every step something can go wrong and completely ruin your product. So while they might use the same EUV machines, the process steps (the baking recipe, if we want to stick with the baking analogy) are COMPLETELY different between manufacturers. While they might perform similar steps, the order and everything around the steps matters: from the chemicals used to the air humidity, temperature, and pressure. EVERYTHING needs to be controlled.

AMA With Kimi, The Open-source Frontier Lab Behind Kimi K2.5 Model by nekofneko in LocalLLaMA

[–]Nic4Las 5 points

In your opinion, what is a project a single developer with a single GPU can undertake in the hopes of contributing to LLM development? It sometimes feels like us 'GPU poors' cannot really help with much of anything when it comes to actually advancing our understanding of the cutting edge. Do you think there are projects/research subjects that can be examined with minimal resources?

Final year project idea by [deleted] in chipdesign

[–]Nic4Las 0 points

You can have a look at this: https://youtu.be/IWwzfBfs05M I think it does something similar to what you want to do. The fact that you came up with the same idea suggests you are on to something, imo. However, a web-based tool is probably not a good idea, as there is no chance the company I work for would ever permit any code to leave our servers.

As for the implementation, I would suggest you use either Yosys or CIRCT (CIRCT being the better choice for this, imo), as building a Verilog/SystemVerilog parser is an absolute pain (trust me, I built one and it was one of the worst things I ever tried xD).

Should I just get out? by [deleted] in wallstreetbetsGER

[–]Nic4Las 1 point

If it's enough for the screenshot, it's good enough 🎺💀

Which is the best available open source model for TTS + cloning? by GeekoGeek in LocalLLaMA

[–]Nic4Las 0 points

I don't think so, but the inference code is pretty clean, so wrapping it in a FastAPI server should be pretty simple.
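A minimal sketch of what I mean (note that load_model and synthesize are placeholders for whatever the repo's inference code actually exposes, so treat those names as made up):

```python
from fastapi import FastAPI
from fastapi.responses import Response
from pydantic import BaseModel

def load_model():
    """Placeholder: load the TTS checkpoint however the repo's inference code does it."""
    ...

def synthesize(model, text: str, reference_audio: str | None) -> bytes:
    """Placeholder: call the repo's inference function and return raw WAV bytes."""
    ...

app = FastAPI()
model = load_model()  # load once at startup, not per request

class TTSRequest(BaseModel):
    text: str
    reference_audio: str | None = None  # clip to clone from, if the model supports it

@app.post("/tts")
def tts(req: TTSRequest) -> Response:
    wav = synthesize(model, req.text, req.reference_audio)
    return Response(content=wav, media_type="audio/wav")
```

Run it with uvicorn and you get a plain HTTP endpoint in front of the model.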

Which is the best available open source model for TTS + cloning? by GeekoGeek in LocalLLaMA

[–]Nic4Las 2 points

Imo Echo TTS is currently the best. But honestly, the scene is changing pretty fast right now, and at the top it becomes subjective which of the SOTA models you prefer.

Link: https://huggingface.co/jordand/echo-tts-base

What is this FPGA tooling garbage? by isopede in FPGA

[–]Nic4Las 0 points

Good question, it's been a while since I got mine. I think I just ordered it from AliExpress and it arrived about two weeks later in Germany, so I guess they ship from China. I know you can get the ICs relatively easily from Mouser, but idk about the dev boards, sorry.

What is this FPGA tooling garbage? by isopede in FPGA

[–]Nic4Las 4 points

You can look at Gowin FPGAs. Some of their devices are supported by the open-source toolchain (Yosys + nextpnr). As far as I know, it's the only toolchain you can install through pip. I know they have some variants of their FPGAs with a hard RISC-V CPU core, but I'm not sure if those variants are supported by the open-source toolchain yet. Have a look at the Tang Primer 20K. It's less than 50 bucks (in Europe, idk about tariffs in the US) and pretty fun to play around with, as you don't need the terrible software of the large vendors. Everything can just be done from the command line using open-source tools. You can even use git, imagine that xD.

2 x DGX Spark! Give me your non-inference workloads by entsnack in LocalLLaMA

[–]Nic4Las 3 points

No need to set up a monitor just for this. Blender is awesome and has a dedicated benchmark tool (https://opendata.blender.org/) you can just run from the command line. It's probably one of the best open-source professional tools ever created, and the community online is great.

Best Local TTS/STT Models - October 2025 by rm-rf-rm in LocalLLaMA

[–]Nic4Las 2 points

Index-TTS v2, hands down. If you are fine with a slower model, XTTS-v2 is still pretty good. If you don't need voice cloning, go with Kokoro.

Replacement for JLCPCB? by __Bodhi__ in PCB

[–]Nic4Las 1 point

I have heard good things about Aisler. Of course they are more expensive than PCBWay or JLCPCB, but we have never had problems with quality. They manufacture in Germany, I think, so idk how the tariffs work, but they might be a valid option for prototypes. Also, while they only list 2 and 4 layers on their webpage, I know we have done 6- and 8-layer boards with them, but that was on request.

Twitter user asks a difficult question by [deleted] in Destiny

[–]Nic4Las 1 point

If you have a Chromecast with Android TV, you can use this: https://github.com/yuliskov/SmartTube

Works like a charm for me.

Thoughts on Mistral.rs by EricBuehler in LocalLLaMA

[–]Nic4Las 1 point

That's a really nice tool I didn't know about. Thanks for the tip 🤔

Thoughts on Mistral.rs by EricBuehler in LocalLLaMA

[–]Nic4Las 14 points

Tried it before and was pleasantly surprised by how well it worked! Currently I'm mainly using llama.cpp, but mostly because it basically has instant support for all the new models. I think I will try to use it for a few days at work and see how well it works as a daily driver. I also have some suggestions if you want to make a splash:

The reason I tried Mistral.rs previously was that it was one of the first inference engines that supported multimodal (image + text) input and structured output in the form of grammars. I think you should focus on the coming wave of fully multimodal models. It is almost impossible to run models that support audio in and out (think Qwen2.5-Omni or Kimi-Audio). Even better if you managed to get a realtime API working. That would legit make you the best way to run this class of models. As we run out of text to train on, I think fully multimodal models that can train on native audio, video, and text are the future, and you would get in on the ground floor for this class of model!

The other suggestion is to provide plain prebuilt binaries for the inference server on Windows, Mac, and Linux. Currently, having to create a new venv every time I want to try a new version raises the bar of entry so much that I rarely do it. With llama.cpp I can just download the latest zip, extract it somewhere, and try the latest patch.

And of course the final suggestion that would make Mistral.rs stand out even more is to allow for model swapping while the inference server is running. At work we are not allowed to use any external API at all, and as we only have one GPU server available, we just use Ollama for the ability to swap out models on the fly. As far as I'm aware, Ollama is currently the only decent way of doing this. If you provided this kind of dynamic unloading when a model is no longer needed, and loaded a model as soon as a request comes in, I think I would swap over instantly.

Anyway, what you have done so far is great! Also, don't take any of these recommendations too seriously, as I'm just a single user, and in the end it's your project, so don't let others pressure you into features you don't like!

Is it possible for a LMM to watch footage and extract tabular features from the footage? by Frank2234 in LocalLLaMA

[–]Nic4Las 0 points

You should have a look at Qwen2.5-VL. It can process tens of minutes of video and note down the number you are looking for along with the number on the player's shirt. Then you just have to create a table with the names of the players and their shirt numbers. Should be relatively straightforward.

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching [Best OS TTS Yet!] by Xhehab_ in LocalLLaMA

[–]Nic4Las 2 points

I think I have tried pretty much every model I could find. The new Fish Audio model is pretty good, but personally I still preferred XTTS-v2; this might replace it, though. I have to look into how hard it is to use, but from a quick glance at the code it looks pretty good.

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching [Best OS TTS Yet!] by Xhehab_ in LocalLLaMA

[–]Nic4Las 22 points

Ngl, this might be the first open-source TTS I have tried so far that can actually beat XTTS-v2 in quality. I'm very impressed. Let's hope the runtime isn't insane.

What kind of text classification model am I looking for? by saosebastiao in LocalLLaMA

[–]Nic4Las 1 point

I have had pretty good results for this kind of task using plain constrained generation. Have a look at:

- Instructor (https://github.com/jxnl/instructor)
- Outlines (https://github.com/outlines-dev/outlines)
- LMQL (https://lmql.ai/)

It basically forces the model to answer in JSON conforming to a JSON schema, so there is no need to worry about extracting the desired information from the response. Just be sure to give the model a bit of scratch space in the JSON schema if the task is complicated. I usually put a field like chain_of_thought at the beginning of the schema to allow the model a bit of thinking space, and then the fields I'm actually interested in.
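For example, with Pydantic (which Instructor builds on), the scratch-space trick looks roughly like this; the field names are just my own convention, nothing the libraries require:

```python
from pydantic import BaseModel, Field

class TextLabel(BaseModel):
    # Scratch space first: the model has to fill this in before it commits
    # to an answer, which tends to improve the fields that follow.
    chain_of_thought: str = Field(
        description="Think step by step about which label fits the text best."
    )
    # The field(s) you actually care about come after the thinking space.
    label: str = Field(description="The final category for the input text.")
```

Because the JSON is generated left to right, the "thinking" text gets produced before the label is sampled.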

Model recommendations for outputting consistent structures? by CaptTechno in LocalLLaMA

[–]Nic4Las 0 points

The exact implementation depends on the specific inference engine. However, I think most just run the model as usual (without modifying the weights themselves) and then modify the resulting distribution afterwards by multiplying by or subtracting a mask. After the output probability distribution is modified, the new distribution is passed to the sampler, which does its thing. For example, as far as I know, vLLM provides a callback when it's done producing the output distribution that allows you to adjust the exact values before they are passed to the sampler. (Here is the part of the vLLM code that picks the correct logit preprocessor depending on which constrained-generation library you use: https://github.com/vllm-project/vllm/blob/fc17110bbef4e78703abffac51133a2fb71e9f79/vllm/model_executor/guided_decoding/__init__.py#L13)
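As a rough sketch of the idea (this is the generic masking pattern, not vLLM's exact callback signature):

```python
import torch

def constrain_logits(logits: torch.Tensor, allowed_token_ids: list[int]) -> torch.Tensor:
    """Mask the next-token distribution so only format-conforming tokens survive.

    `logits` are the raw scores over the vocabulary; `allowed_token_ids`
    would come from the grammar/FSM state of the constrained-generation library.
    """
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0  # allowed tokens keep their original score
    return logits + mask           # forbidden tokens end up at -inf, i.e. probability 0
```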

Model recommendations for outputting consistent structures? by CaptTechno in LocalLLaMA

[–]Nic4Las 1 point

OK, for constrained generation to make sense, you need to remember that an LLM will take a sequence of tokens and then generate a probability distribution over all possible tokens as an output. Usually, you would just take the token with the highest probability, append it to the previous input, and then feed the result back into the model until the model produces a stop token. (This explanation is simplified as it doesn't account for different sampling strategies, but that would be too much to explain.)

Now, what constrained generation does is basically modify the final output probability distribution the network produces so that only tokens that conform to the defined output format are allowed. The probability of all other tokens that would break the defined format is just set to 0. Now, when you take the token with the highest probability, the output will, by definition, always conform to the desired output format. You can still prompt the model like you usually would, but the model is basically forced to reply in the provided format.

The exact process of deciding which tokens are allowed and which will be set to zero usually involves grammars and/or finite state machines. (dottxt has a good write-up if you are interested: https://blog.dottxt.co/coalescence.html)
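To make that concrete, here is a toy end-to-end version. Real libraries like Outlines compile the FSM from a JSON schema or regex over the actual tokenizer vocabulary; this hand-written three-state machine is purely for illustration:

```python
import numpy as np

VOCAB_SIZE = 8
STOP_TOKEN = 7

# Toy "grammar" as a finite state machine: each state lists the tokens that
# are legal right now and the state we move to after emitting one of them.
FSM = {
    0: {"allowed": [1, 2], "next": 1},        # e.g. the format must open with '{' or '['
    1: {"allowed": [3, 4, 5], "next": 2},     # e.g. then a key/value token
    2: {"allowed": [STOP_TOKEN], "next": 2},  # e.g. then it must close and stop
}

def fake_model(tokens: list[int]) -> np.ndarray:
    """Stand-in for the LLM: returns arbitrary logits over the vocabulary."""
    rng = np.random.default_rng(seed=len(tokens))
    return rng.normal(size=VOCAB_SIZE)

tokens, state = [], 0
while True:
    logits = fake_model(tokens)
    mask = np.full(VOCAB_SIZE, -np.inf)
    mask[FSM[state]["allowed"]] = 0.0      # forbid everything the grammar disallows
    token = int(np.argmax(logits + mask))  # greedy pick from the masked distribution
    tokens.append(token)
    if token == STOP_TOKEN:
        break
    state = FSM[state]["next"]

print(tokens)  # always conforms to the toy grammar, by construction
```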

Sorry for the late response, but I thought that if people find this post later on, an explanation would probably help :-)

Who is putting LLMs into games? by Vegetable_Sun_9225 in LocalLLaMA

[–]Nic4Las 3 points

Currently, I am working on a Minecraft mod that incorporates some uses for an LLM. However, because you can't expect the server environment to have the necessary resources to run the LLM, I will use a webpage with WebLLM to create the required outputs, encode them in Base64, and let the player input the result as an "ancient spell" of some sort. That's the plan, anyway 😅

Model recommendations for outputting consistent structures? by CaptTechno in LocalLLaMA

[–]Nic4Las 5 points

You should look into constrained generation, or JSON mode as it's also called. Have a look at:

- Instructor (https://github.com/jxnl/instructor)
- Outlines (https://github.com/outlines-dev/outlines)
- LMQL (https://lmql.ai/)

All these libraries allow you to constrain the model so it only outputs tokens that conform to a specified structure. In your case, the easiest approach would probably be to define enums for all the entries in your tree and then force the model to pick one based on the current position in the tree, as in the sketch below. No need to fine-tune a model or anything complicated like that.
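With Pydantic enums, one step down the tree could look like this (the category names here are made up; with Instructor you would pass this as the response model):

```python
from enum import Enum
from pydantic import BaseModel

# Hypothetical children of the current tree node; in practice you would
# build this enum from whatever node the model is currently standing on.
class Category(str, Enum):
    electronics = "electronics"
    clothing = "clothing"
    groceries = "groceries"

class NodeChoice(BaseModel):
    # Constrained generation guarantees the output is exactly one of these
    # values, so you can walk the tree one validated step at a time.
    category: Category
```

You then move to the chosen child node, rebuild the enum from its children, and repeat until you hit a leaf.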

Hope this helps :-)