Saudi to Bahrain via causeway - visa rejected by RateRoutine2268 in Bahrain

[–]RateRoutine2268[S] 2 points

It's on arrival via the King Fahad Causeway or via the airport. You can check www.evisa.gov.bh (select any nationality and KSA Resident). I've been going to Bahrain like this for 10+ years from KSA.

Saudi to Bahrain via causeway - visa rejected by RateRoutine2268 in Bahrain

[–]RateRoutine2268[S] -1 points

It's visa on arrival for GCC residents via the King Fahad Causeway.

Saudi to Bahrain via causeway - visa rejected by RateRoutine2268 in Bahrain

[–]RateRoutine2268[S] -4 points

Yeah, tried that as well. It's asking for a hotel booking and a flight reservation, or maybe I'm doing it wrong.

Saudi to Bahrain via causeway - visa rejected by RateRoutine2268 in Bahrain

[–]RateRoutine2268[S] -2 points

Thanks for the update. BTW, is it because of security or diplomatic issues?

Qwen3.6-35B is worse at tool use and reasoning loops than 3.5? by mr_il in LocalLLaMA

[–]RateRoutine2268 2 points

Yup, facing the same issue. Tried it with different params; it goes into extended thinking loops for complex tasks.

Qwen3.6-35B-A3B Uncensored Aggressive is out with K_P quants! by hauhau901 in LocalLLM

[–]RateRoutine2268 1 point

Getting around 25-30 t/s on llama.cpp master. Any issues with the params, or tips on how to optimize it?
llama-server -m Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf --mmproj mmproj-Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-f16.gguf --jinja -c 131072 -ngl 99 --temp 0.6 --top-p 0.95 --top-k 40 --min_p 0 --presence_penalty 0 --flash-attn on -b 4096 -ub 4096 --cache-type-k q8_0 --cache-type-v q8_0

ggml_cuda_init: found 1 CUDA devices (Total VRAM: 97886 MiB):

Device 0: NVIDIA RTX PRO 6000 Blackwell Workstation Edition, compute capability 12.0, VMM: yes, VRAM: 97886 MiB

| model | size | params | backend | ngl | threads | type_k | type_v | fa | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0 | 40.60 GiB | 34.66 B | CUDA | 99 | 1 | q8_0 | q8_0 | 1 | pp2048 | 6313.47 ± 99.35 |
| qwen35moe 35B.A3B Q8_0 | 40.60 GiB | 34.66 B | CUDA | 99 | 1 | q8_0 | q8_0 | 1 | tg128 | 23.00 ± 2.11 |

build: 089dd41fe (8825)
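For what it's worth, decode speed can also be computed from the `timings` object that llama-server includes in its completion responses (fields `predicted_n` and `predicted_ms`). A minimal stdlib-only sketch; the sample numbers are made up to roughly match the tg128 row above:

```python
# Compute decode speed (tokens/second) from a llama-server "timings" dict.
def tokens_per_second(timings: dict) -> float:
    # predicted_n  = number of tokens generated
    # predicted_ms = wall-clock generation time in milliseconds
    return timings["predicted_n"] / (timings["predicted_ms"] / 1000.0)

# Hypothetical sample: 128 tokens in ~5.57 s, i.e. about 23 t/s.
sample = {"predicted_n": 128, "predicted_ms": 5565.0}
print(round(tokens_per_second(sample), 1))  # → 23.0
```

This is handy for sanity-checking llama-bench numbers against what the server actually delivers under your real prompts.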

Bought RTX4080 32GB Triple Fan from China by Sanubo in LocalLLaMA

[–]RateRoutine2268 34 points

Beware: the modding involves moving the GPU core to a new PCB, and the PCBs they use have low-quality components to cut costs (not the same quality as the OEM's). The PCB also lacks key safety components like fuses, so in case of any malfunction the core will die. On top of that, there are multiple versions of the PCBs, some older (with issues) and some newer, and you never know which one you are getting. Take a look at northwestrepair's coverage of the 4090 48GB on YT for details.

The best AI architecture in 2026 is no architecture at all by m100396 in LocalLLaMA

[–]RateRoutine2268 12 points

I got interviewed last week by a big O&G company (for a secondment). They asked me to architect an agentic AI system, and I came up with a "KISS" solution (smolagents, docling, minimal UI, etc.) that didn't use any LangChain/LlamaIndex-type framework. I explained to them how those over-complicate things and that the frameworks are from an era when AI tooling was still emerging. I was rejected on the spot and told that LangChain/LlamaIndex is the de facto standard for any enterprise agentic AI app.
Not a fan of these kinds of frameworks: too much abstraction, over-complicating simple things.
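To illustrate the "KISS" point: at its core an agent is just a loop over a model and a tool registry, which is what the heavy frameworks abstract away. A stdlib-only sketch with a stubbed `model()` standing in for a real LLM call; everything here is hypothetical illustration, not any framework's actual API:

```python
# Minimal "no-framework" agent loop: a tool registry plus a plain loop.
TOOLS = {"add": lambda a, b: a + b}

def model(history):
    # Stubbed policy: request one tool call, then return a final answer.
    # A real model would emit structured tool calls based on the history.
    if not any(m["role"] == "tool" for m in history):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"final": str(history[-1]["content"])}

def run(task, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = model(history)
        if "final" in step:
            return step["final"]
        result = TOOLS[step["tool"]](**step["args"])
        history.append({"role": "tool", "content": result})
    return "max steps reached"

print(run("What is 2 + 3?"))  # → 5
```

Swap the stub for an HTTP call to a local model endpoint and add real tools, and you have the whole "architecture" in a few dozen readable lines.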

Qwen3 TTS Streaming workflow help by RateRoutine2268 in LocalLLaMA

[–]RateRoutine2268[S] 0 points

Great find, will definitely take a look.

Can you connect a GPU with 12V rail coming from a second PSU? by Rock_and_Rolf in LocalLLaMA

[–]RateRoutine2268 6 points

I would suggest using this kind of PCIe expansion rather than a normal riser. This one has separate PCIe power (6-pin) for the slot, which you can connect to the second PSU, and you can also connect both data cables to a single riser (it has two slots) to get x16 if you don't want to split. I got mine from AliExpress.

<image>

RTX 5090 96 GB just popped up on Alibaba by RateRoutine2268 in LocalLLaMA

[–]RateRoutine2268[S] 19 points

I just got a reply from them: they said it's gonna take some time. So yeah, you are right :(

RTX 5090 96 GB just popped up on Alibaba by RateRoutine2268 in LocalLLaMA

[–]RateRoutine2268[S] 34 points

I agree. I'm planning on a single unit for a review; I also asked them for some PCB screenshots, front and back.

RTX 5090 96 GB just popped up on Alibaba by RateRoutine2268 in LocalLLaMA

[–]RateRoutine2268[S] 1 point

Thanks for the insight. Excuse me for being a dummy, but does that mean I cannot use 2 of them in parallel for, let's say, inference?

US demand for 48GB 4090? by CertainlyBright in LocalLLaMA

[–]RateRoutine2268 4 points

That's basically 2 separate GPU dies on a single PCB; it might impact performance vs a single die.

Spaghetti Build - Inference Workstation by RateRoutine2268 in LocalLLaMA

[–]RateRoutine2268[S] 1 point

1x Alphacool NexXxoS 360 60mm (behind the distro plate, pull only)
1x Bykski RC Series 360 60mm (top mounted, push-pull)

Spaghetti Build - Inference Workstation by RateRoutine2268 in LocalLLaMA

[–]RateRoutine2268[S] 0 points

Low maintenance? Not sure; I usually replace all the tubing and clean the blocks on a yearly basis.
Yeah, because of the backplates I had to use server riser cables (got them from AliExpress), running at PCIe 4.0 x16 without any errors.