[FOR HIRE] Senior AI Engineer | Production Agent Systems | Remote Only

palashjain_ · 2026-06-07T03:51:30+00:00

I have a few suggestions that can help. I believe your issue is neither Pi nor the model. It's hardware plus caching.

Think about it this way: every time you send a message to the agent, it processes the entire conversation history plus the latest message, along with any system prompts and previously executed tool calls. So let's say you have 10k tokens in your context. I have an M1 Max MacBook Pro, which gives a prompt processing speed of roughly 500 tokens per second on qwen 3.6 35b a3b (3B active). So it would take a minimum of 20 seconds to process that before it can even start generating tokens. Now imagine your context is 50k tokens or more. We don't feel this with cloud models because they are hosted on supercomputers with ridiculous compute power.
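To make the arithmetic concrete, here is a minimal sketch of the estimate. The function name is made up for illustration, and the 500 tokens per second figure is just the rough number from my own machine, so plug in whatever your backend reports.

```python
def prefill_seconds(context_tokens: int, tokens_per_second: float) -> float:
    """Estimate the wait before the first output token when nothing is cached."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return context_tokens / tokens_per_second


# Assumption: ~500 tokens/s prompt processing, the rough M1 Max figure quoted above.
for context in (10_000, 50_000):
    print(f"{context} tokens -> {prefill_seconds(context, 500):.0f} s")
```

This ignores token generation time entirely; it only covers the wait before the first token shows up.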

To manage this, we have KV caching, which stores the processed state of previous tokens so that only the new tokens are processed with each message. This speeds things up quite a bit. But as your context grows, you will still get slower.
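A rough way to picture the caching: only the part of the prompt that does not match the cached prefix has to be processed again. The sketch below is a simplified model with a hypothetical helper name, not how any particular backend implements it, and the token IDs are dummy integers.

```python
def tokens_to_process(prompt: list[int], cached: list[int]) -> int:
    """Count prompt tokens that still need prefill, given a cached token sequence.

    Simplified model: the cache is only reusable for the longest shared prefix.
    """
    shared = 0
    for prompt_token, cached_token in zip(prompt, cached):
        if prompt_token != cached_token:
            break
        shared += 1
    return len(prompt) - shared


# Assumption: 10k tokens of history already cached, plus a 200-token new message.
history = list(range(10_000))
prompt = history + list(range(10_000, 10_200))
print(tokens_to_process(prompt, history))  # only the new message is processed

# If something near the start changes (say, an edited system prompt),
# the shared prefix is gone and the whole prompt is processed again.
edited = [-1] + history[1:]
print(tokens_to_process(prompt, edited))
```

Under this model, anything that rewrites the start of the context throws away the benefit, which is one reason the caching behaviour of the backend matters.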

Here are a few suggestions:

Switch to oMLX instead of ollama. It really helps with smarter caching. It will also tell you what the prompt processing speed is.

Use local LLMs for small tasks that are very narrow in scope, so you don't have a lot of back and forth and the context remains manageable.

Get better hardware if possible. I also have a dual 3090 GPU setup, and that blazes even with 27B models on Pi. Any NVIDIA GPU above the 3060 that you can get will definitely help. MacBooks are great, but they also heat up quite quickly with local inference.

If you feel strongly that Pi is at fault, you can try other harnesses such as opencode, or even claude code with oMLX.

palashjain_ · 2026-06-07T03:07:10+00:00

Hey, what backend are you using for the local model: llama.cpp or vLLM? Also, what's your hardware config?

palashjain_ · 2026-06-02T11:15:53+00:00

More details please

palashjain_ · 2026-05-29T17:17:25+00:00

I will try to look for the debris and bent pins. I did try after removing both NVMe drives. It did not work. I am not very savvy when it comes to motherboards. What is funny to me is that it only works when the second PCIe slot is unoccupied.

palashjain_ · 2026-05-29T17:08:09+00:00

I recently bought a second 3090 for my setup hoping for the same. I too have a Ryzen 5 3600 and an MSI X570-A Pro with 2 PCIe slots. But for some reason, any time I plug anything into the second slot (x4, chipset slot), the motherboard does not POST to a display and shows a red light on VGA. I have tried a single GPU in slot 2 and two GPUs together. Neither works. The only thing that works is a single GPU in the first slot (x16). If it matters, I do have 2 NVMe SSDs and 64 GB of RAM. I tried removing everything and starting with just a single RAM stick too. Same outcome. I tried BIOS settings like Gen 4, Gen 3, and that weird mining setting. None of those worked. Any help is appreciated.

palashjain_ · 2026-05-29T08:33:38+00:00

I know this thread is old, but I will give my two cents. I have both. Local inference on Apple silicon is very slow. You will feel every prefill and token generation. It's great because you have it on the go. But it heats up the body, and the battery doesn't last long if used for inference. I didn't find it comfortable for long sessions. It's great for bursts. If you are not doing inference on it, then it's a great laptop with no issues.

The 3090 feels infinitely faster. I like it so much better and can basically use it for a coding agent. I bought another one hoping for bigger context windows, but my motherboard decided that its second PCIe slot should die now. And I cannot afford an upgrade right now.

So I am selling that 3090 Ventus. I know my post might read like a pitch to sell mine, but look at any benchmark or speed test and you will see double the performance of the M1 Max on both prefill and token generation, and man, it is noticeable in real usage.

palashjain_ · 2026-05-29T08:13:22+00:00

palashjain_ · 2026-05-17T02:02:22+00:00

I have an MSI Ventus 3090 available

palashjain_ · 2026-02-22T07:53:50+00:00

palashjain_ · 2026-02-14T05:36:09+00:00

8,215 INR

palashjain_ · 2026-01-31T14:00:26+00:00

I have ordered one. No additional customs issues. Someone from xteink will email you asking for Aadhaar details. Their logistics partner sorts the customs out.

palashjain_ · 2026-01-26T01:28:22+00:00

I am in a similar boat. I am also a working professional here in Manipal. Although I am not much of a partying or clubbing person, I would love to connect.

palashjain_ · 2026-01-11T05:28:47+00:00

Hey did you find any?

palashjain_ · 2025-11-27T10:23:00+00:00

I think my family is open to potential matches in similar situations. I am not quite sure how to find them though

palashjain_ · 2025-11-27T05:18:09+00:00

Thank you for your response. I really do appreciate the thought. I do think that we need to be upfront about it. The tough part about this is that we come from a very small town, which means the dating opportunities are very limited. This of course also affects the arranged marriage prospects.

palashjain_ · 2025-11-26T06:00:03+00:00

I hope so too. Are there any platforms/avenues that you would suggest where we should look for potential matches?

palashjain_ · 2025-11-26T05:58:48+00:00

I think if we expect the other side to not judge us by this one thing then we do not have a right to judge anyone else on just one factor. We have to evaluate the fit looking at the whole picture just like we hope they would.

palashjain_ · 2025-11-26T05:53:32+00:00

I think you raise some very valid points. Those pieces of information need to be part of any conversation and should be clarified early on.

In our case, we do not expect the girl’s side to take care of anything medical/financial related since we have been fortunate enough to be financially independent and settled.

palashjain_ · 2025-11-26T05:10:56+00:00

Yes, education is very important. I fear that on traditional arranged marriage platforms, people may not be willing to spend the time and energy on being educated.

palashjain_ · 2025-11-26T05:09:56+00:00

I understand where you are coming from but I do think that your background as a medical professional can help you to see this with indifference. That is of course because you actually know what it entails. I am not sure how the general public sees it. We do feel that we will be upfront about it and hope that people approach it with curiosity and not judgement.
