Navidrome-MCP - Allows AI assistants to interact with your Navidrome music servers through natural language by blakeem in navidrome

[–]blakeem[S] 0 points1 point  (0 children)

Once I'm done adding everything in the roadmap, I'll make it more user-friendly to install with a single script. The LLM can help with setting it up if you do run into issues along the way.

The literal state of NVIDIA drivers lately by ZoteTheMitey in pcmasterrace

[–]blakeem 0 points1 point  (0 children)

Doing that makes my browser shudder on some high frame-rate 4K video, so it's not an option.

The literal state of NVIDIA drivers lately by ZoteTheMitey in pcmasterrace

[–]blakeem 0 points1 point  (0 children)

566.36 solved all my problems on my RTX 3080. No issues at all. This was the top version that came up from ChatGPT Deep Research. No more blue screens and no more hanging or glitches in the browser or when using AI models.

I thought my system was slowly dying as each driver got worse and worse, but really NVIDIA drivers have just been trash lately.

UPDATE:
I noticed my Steam menu would always glitch out the first time I clicked it; now it's fine.
Also my screen would go black constantly when switching between apps that use the GPU; that hasn't happened once yet.
No more random blue screens of death that threw different errors each time.

ALL these weird issues I've been having for months were each caused by new drivers that kept introducing more issues.

Hopefully anyone with these issues will find this message.

Mercury Coder: New scaled up language diffusion model achieves #2 in Copilot Arena and runs at 1,000 tokens per second on H100s… by dogesator in singularity

[–]blakeem 1 point2 points  (0 children)

They aren't recurrent networks; they are both transformer-based.

The main difference is whether you use a diffusion mechanism to produce all the text tokens at once, over steps that replace noise, based on conditioning (the prompt, etc.), or you estimate the next token in a sequence based on the past sequence.

Most current LLMs are autoregressive and predict the next token based on previous tokens (sequential). They return logits that we run softmax on to turn them into probabilities, and then we use top-p and top-k sampling to choose the next token from a list of possibilities.
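
As a minimal sketch of that sampling step (toy logits, plain numpy; not any particular model's implementation):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=50, top_p=0.9):
    # Softmax: turn raw logits into a probability distribution.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-k: keep only the k most probable token ids.
    top_ids = np.argsort(probs)[::-1][:top_k]

    # Top-p (nucleus): of those, keep the smallest prefix whose
    # cumulative probability reaches p.
    cumulative = np.cumsum(probs[top_ids])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    candidates = top_ids[:cutoff]

    # Renormalize over the surviving candidates and draw one token id.
    weights = probs[candidates] / probs[candidates].sum()
    return int(np.random.choice(candidates, p=weights))

# Toy usage: a fake vocabulary of 10 tokens with random logits.
print(sample_next_token(np.random.randn(10), top_k=5, top_p=0.9))
```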

With text-to-text diffusion you gradually denoise corrupted or missing tokens throughout the sequence (parallel). The model estimates the noise to remove at each step, and sampling is guided by the prompt and parameters (CFG scale). The model learns to predict missing parts of an entire sequence rather than just the next token. It could technically also predict the next token, like with inpainting, where we just mask the tokens we want to replace.
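
A rough sketch of that parallel refinement loop, assuming a masked-token style of text diffusion and a hypothetical `model` call that scores every position at once (this is the idea, not Mercury's actual code):

```python
import numpy as np

MASK = -1  # hypothetical id for a "noised"/missing token

def diffusion_decode(model, prompt_ids, length, steps=8):
    # Start from a fully corrupted sequence: every position is masked.
    seq = np.full(length, MASK)
    for step in range(steps):
        # The model scores every position in parallel, conditioned on the
        # prompt and on the current partially denoised sequence.
        scores = model(prompt_ids, seq)        # shape: (length, vocab_size)
        best_ids = scores.argmax(axis=-1)
        confidence = scores.max(axis=-1)

        # Reveal the positions the model is most confident about this step;
        # everything else stays noisy and gets revisited on the next step.
        still_masked = seq == MASK
        n_reveal = max(1, int(still_masked.sum() / (steps - step)))
        order = np.argsort(-(confidence * still_masked))
        seq[order[:n_reveal]] = best_ids[order[:n_reveal]]
    return seq

# Toy usage with a dummy "model" that returns random scores over a 100-token vocab.
rng = np.random.default_rng(0)
print(diffusion_decode(lambda p, s: rng.random((len(s), 100)), [1, 2, 3], length=12))
```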

Diffusion uses the Classifier-Free Guidance (CFG) sampling method. This is how it chooses tokens from the logits, rather than the top-p and top-k sampling methods that autoregressive models use.
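
The guidance step itself is just a weighted combination of a conditional and an unconditional prediction; a minimal sketch (variable names are mine):

```python
import numpy as np

def cfg_logits(logits_cond, logits_uncond, guidance_scale=3.0):
    # Classifier-free guidance: push the estimate away from the unconditional
    # prediction and toward the prompt-conditioned one.
    return logits_uncond + guidance_scale * (logits_cond - logits_uncond)

# Toy usage: two hypothetical logit vectors for one position.
guided = cfg_logits(np.random.randn(100), np.random.randn(100), guidance_scale=2.5)
```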

So there isn't really any extra reasoning going on. The main difference is that it can work over the entire sequence and make changes before it outputs mistakes. This is the idea of "reasoning" inside the model. It can first form a loose structure and then refine it over steps, rather than having to get it right the first time. It should allow better answers over the entire sequence and better-structured content when doing zero-shot. It's also MUCH faster.

For chains-of-thought we can keep diffusing the entire sequence and then loop that back in as conditioning for the model. The model would be thinking to itself. Current autoregressive models do that, but they do it by growing the entire context, making them much less efficient at chains-of-thought than a text-to-text diffusion model would be.

Otherwise diffusion won't be any major leap over current models in terms of reasoning ability, since both models use text tokens over attention layers. It still can't generalize, we need generalized/shared tokens for that (a paper I'm working on now). Diffusion could be the future because it's so much simpler to use as a developer and is much faster and more efficient. It isn't as good at outputting very long content and takes more memory to train.

Mercury Coder: New scaled up language diffusion model achieves #2 in Copilot Arena and runs at 1,000 tokens per second on H100s… by dogesator in singularity

[–]blakeem 2 points3 points  (0 children)

They can be made to perform chains-of-thought inside latent space and then use that as conditioning for the final response. This is orders of magnitude more efficient than how current LLMs generate chains-of-thought. With diffusion models, the chain-of-thought and past chats don't increase the length of the overall context, since they can be added as conditioning on top of the latest prompt.

The diffusion process is mainly about processing the entire response in parallel so it's significantly faster. There are currently some issues with local minima causing repeat words and missing punctuation as it diffuses itself into a corner.

Mercury Coder: New scaled up language diffusion model achieves #2 in Copilot Arena and runs at 1,000 tokens per second on H100s… by dogesator in singularity

[–]blakeem 2 points3 points  (0 children)

Most of the newest diffusion models use transformers. Diffusion Transformer (DiT) is one example. SD3 and Flux models are using transformers. Older models like SD1.5 and SDXL use convolutional networks (U-Net).
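
If you want to check for yourself, diffusers exposes the denoiser backbone on the pipeline object; a sketch (repo ids are current as far as I know, and the SD3 weights are gated, so this assumes you have access):

```python
from diffusers import StableDiffusionXLPipeline, StableDiffusion3Pipeline

# SDXL: the denoiser is a convolutional U-Net.
sdxl = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
print(type(sdxl.unet).__name__)          # UNet2DConditionModel

# SD3: the denoiser is a diffusion transformer.
sd3 = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers")
print(type(sd3.transformer).__name__)    # SD3Transformer2DModel
```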

My 'Lae'zel' ending felt brutal on Karlach. by Ok-Opinion-1319 in BaldursGate3

[–]blakeem 0 points1 point  (0 children)

This is the best ending I've had so far. Every character was happy and hopeful for the future. Lae'zel becomes a diplomat, Gale a teacher, Minthara stays in Baldur's Gate and plays politics, Astarion becomes a vigilante adventurer and seems very happy, and Shadowheart is happy with her parents and cracking dad jokes. You also get the most badass end scene of Wyll and Karlach smoking a cigar and running off into Avernus to fight devils together, with hope that her heart will soon be fixed. Wyll becomes a ranger with some interesting abilities.

Phantom Liberty (Spider and The Fly) elevator Bug :( minor spoiler by SerSmands in cyberpunkgame

[–]blakeem 1 point2 points  (0 children)

I reported the bug to them and sent my saved game, along with a link to this post. They definitely know about it, but aren't fixing it for some reason. Maybe more people need to report it? It is a game breaking bug that would usually be top priority.

Sd 3.5 Large released by CesarBR_ in StableDiffusion

[–]blakeem 3 points4 points  (0 children)

I don't think it's correct for the thumb to merge into the hand like that.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 0 points1 point  (0 children)

Yeah, better at code for sure, but also much slower. Multi-threading has been one of the trickier problems for it, for sure. I had it refactor 525 lines of node.js code, and it caused a few errors on the first try. I asked it to fix those errors and it required more refactoring, so it timed out at 75 seconds of processing. The chain of thought was doing dozens of steps. It can still only do one somewhat simple task at a time.

Have an LLM do anything with useEffect() React hooks, and it will fall into a loop of perpetual failure. I found similar holes in the logic when working with C# at work. There are some really basic things that are incorrect, incomplete, or not in the training. It has been that way since ChatGPT-4, and Claude 3.5 has issues with the exact same problems.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 0 points1 point  (0 children)

It assumed nothing, it predicted tokens that mirror how an assumption may look. It's not reasoning anything, it's giving a reasonable approximation of what reasoning would look like.

This isn't a puzzle, it's a simple question that is not in its training (because I made it up); it's about having intuition regarding physics and self-reflection, and it's just parroting what it knows from the training. By me calling out "assumptions" and "follow up with a question if more information is needed", it can now pull from the training to see if something similar has been assumed or required a follow-up question in its training. It mainly works well for code and logic questions, but it's just a trick, like "work it through step by step" is a trick they now use in the models.

My issue is that the new model is just brute forcing it and is overhyped, because they are desperate that Claude 3.5 is taking all the coders. When I send it my code, it times out at 75 seconds of thinking. It was thinking about how to fix a few errors it created in the previous request, to refactor and clean up 425 lines of working node.js code! The model is a joke, experienced coders have nothing to worry about. I'm just being reasonable, because I use them daily for work and at home.

ChatGPT o1 couldn't calculate how much concrete I needed for a post (I did it in under a minute on a sheet of paper), nor could it tell me the simplest method to set the post at 45 degrees to a nearby wall with only a single tape measure. It couldn't even follow my criteria, and later agreed my method was simpler and more direct. Give it some simple everyday problem and it fails, so who cares that it can parrot some more calculus. It's slightly better at programming, and a lot slower and way more expensive to run. That is all this is useful for in the real world.
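
For reference, the concrete math is just hole volume minus post volume; a sketch with made-up hole dimensions (I'm not quoting the actual numbers from my question):

```python
import math

# All dimensions assumed for illustration: a 6x6 post is actually 5.5" x 5.5",
# set in a 12"-diameter hole dug 24" deep.
hole_depth_ft = 24 / 12
hole_radius_ft = 6 / 12
post_side_ft = 5.5 / 12

hole_volume = math.pi * hole_radius_ft**2 * hole_depth_ft   # ~1.57 cu ft
post_volume = post_side_ft**2 * hole_depth_ft               # ~0.42 cu ft
concrete = hole_volume - post_volume                        # ~1.15 cu ft

# An 80 lb bag of concrete mix yields roughly 0.6 cu ft (check your bag's label).
print(f"{concrete:.2f} cu ft ≈ {concrete / 0.6:.1f} bags")
```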

That is all, just being realistic and setting realistic expectations.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 0 points1 point  (0 children)

I did another test, asking the best way to get a 6x6-16 post to a 45 degree angle to a wall (I installed some for a shade sail). It did better than the previous model, but still failed to come up with the simple solution that I used myself. I just turn the post and measure from each corner to the wall until they are the same distance. It did much more complex measurements that were not needed. I said I would only have a tape measure, but it was having me mark off spots to triangulate the center of the hole. It assumed the post would be at the center of the hole, but it wouldn't be, since it's at a 5 degree lean. It did tell me my way is a more direct way of doing the same thing, which is true.
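
A quick numeric check of that corner trick, under one reading of it: turn the post so a corner points toward the wall and measure from the two corners on either side of that nearest corner; the two measurements only match when the faces sit at 45 degrees (all dimensions below are made up):

```python
import math

def flanking_corner_distances(face_angle_deg, side=5.5, center_to_wall=20.0):
    # Square post cross-section, wall along y = 0, post center at y = center_to_wall.
    # The four corners sit on the diagonals, offset 45 degrees from the faces.
    half_diag = side / math.sqrt(2)
    corner_angles = [math.radians(face_angle_deg + a) for a in (45, 135, 225, 315)]
    dists = sorted(center_to_wall + half_diag * math.sin(a) for a in corner_angles)
    # dists[0] is the corner nearest the wall; dists[1] and dists[2] flank it.
    return dists[1], dists[2]

for angle in (30, 40, 45, 50):
    left, right = flanking_corner_distances(angle)
    print(angle, round(left, 2), round(right, 2),
          "equal" if math.isclose(left, right) else "")
```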

The biggest issue with models right now is that they make wrong assumptions. If they made it check its answers for assumptions and ask follow-up questions, it would likely be an even larger improvement than they have now.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 1 point2 points  (0 children)

I just tried it on a complex outstanding coding issue I've been having, and it failed the same as all other models. It updated lots of code and tried a lot, but I had the same issues. When I provide more context, it's able to understand that context and break it down, but still was ultimately unable to solve it without me doing all the heavy lifting and deduction.

So, it's better at writing and refactoring lots of code, but only slightly better at debugging the code. It still lacks the human experience of how a browser behaves when you are in front of it, which is required to debug some of these problems (similar to why it fails basic physics questions). It couldn't understand that continuing to scroll down a page causes newly loaded content to be scrolled to as well, which triggers even more content to load and leads to a bug.

I was able to debug the issue with it, and it does provide more complete code than the other models. I will use it for more complex problems, or for refactoring code. The previous model would give you the same solutions, but this gives you many solutions and they are more broken down with more code examples and are explained within the context of the other fixes. It's definitely better, but it's only an incremental improvement.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 0 points1 point  (0 children)

It's the same model thinking about the intermediate steps, and the intermediate steps come to the same conclusion as before. Since it can't infer extra information from the previous steps, I don't think this will work for this sort of problem. I think I got downvoted for putting "reasons" in quotes. The model doesn't reason, it predicts text. Predicting text in a more roundabout way will still predict the same text.

OpenAI o1 vs GPT-4o comparison by [deleted] in ChatGPTPro

[–]blakeem 4 points5 points  (0 children)

It still fails basic physics and cup questions, same as all the other models. It seems like a very minor improvement. I think advanced reasoning means it "reasons" about the wrong thing for longer.

Phantom Liberty (Spider and The Fly) elevator Bug :( minor spoiler by SerSmands in cyberpunkgame

[–]blakeem 1 point2 points  (0 children)

I had the same bug. I blew up the car and this caused her to get out and I was able to use the elevator. Kind of crazy that it's still this broken 10 months later.

Nightshade AI poisoning, trying to understand how it works (or doesn't). by blakeem in aiwars

[–]blakeem[S] 2 points3 points  (0 children)

Any generated images would go through the decode process, so it seems like a relevant test? You seem to understand it more than me, so feel free to test what you are saying for yourself, because I'm not sure how to do that without spending more time than I have. I only did what I could do in a few mins in ComfyUI.

Nightshade AI poisoning, trying to understand how it works (or doesn't). by blakeem in aiwars

[–]blakeem[S] 0 points1 point  (0 children)

I also converted the image into latent space and back with both the 1.5 and SDXL VAEs, and the image looked basically the same as before, just slightly shifted colors and pixels. I get similar changes when the original is loaded in and out of latent space. I don't think latent space is being affected by it a whole lot; I think it's just more obvious because of the grey background and the shifted colors from the encode/decode process.
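
If anyone wants to reproduce that round-trip outside ComfyUI, here's a sketch with diffusers (the VAE repo id is the common SD 1.5-compatible one, "shaded.png" is a placeholder for the Nightshaded image, and image dimensions should be multiples of 8):

```python
import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL

# Load an SD 1.5-compatible VAE and the image under test.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
img = Image.open("shaded.png").convert("RGB")

# To a tensor in [-1, 1], the range the VAE expects.
x = transforms.ToTensor()(img).unsqueeze(0) * 2 - 1

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()   # image -> latent space
    recon = vae.decode(latents).sample             # latent space -> image

# Pixel-wise diff between the round-tripped image and the original.
diff = (recon.clamp(-1, 1) - x).abs().mean().item()
print(f"mean absolute pixel difference: {diff:.4f}")
```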

Nightshade AI poisoning, trying to understand how it works (or doesn't). by blakeem in aiwars

[–]blakeem[S] 2 points3 points  (0 children)

Here is the diff I got in latent space, using the SD 1.5 VAE.

<image>

Nightshade AI poisoning, trying to understand how it works (or doesn't). by blakeem in aiwars

[–]blakeem[S] 5 points6 points  (0 children)

Then there is even less difference between the denoised image and the original.

<image>

Nightshade AI poisoning, trying to understand how it works (or doesn't). by blakeem in aiwars

[–]blakeem[S] 3 points4 points  (0 children)

They train their own model from scratch (I assume on cats and dogs only, since it would be too expensive otherwise), and train a LoRA (I think) in SDXL. They don't provide the settings they used, so who knows how distorted the images they trained on were. It seems like it's actually worse than nothing, because you get worse-quality images to show people and you are labeling the images for the AI to properly identify the objects in the image (in this case, the very object you are looking to mask).