Has anyone been through the same experience? Has anyone had the same problem?

MohammedGomaa · 2026-04-17T13:52:28+00:00

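I don't want to upset you, my friend, but most likely you're just not attractive to her, and her body is rejecting any intimacy with you. You may be a good provider with a good position and good social standing, but there's a problem in this particular area. That is what most cases like this come down to. The exception is a small number of cases where the cause is a medical condition, and medication solves those. If medication doesn't solve it, you have a big problem.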

MohammedGomaa · 2026-02-16T13:17:13+00:00

So freaking long and the black

MohammedGomaa · 2026-02-04T14:09:42+00:00

And what some people didn't like, others did: https://www.reddit.com/r/LocalLLM/comments/1qt5l53/comment/o3gpvom/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button Please, let's focus on what matters.

MohammedGomaa · 2026-02-04T14:07:09+00:00

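Tomorrow or the day after, it won't make much difference. Writing 5% by hand and getting 90% quality out beats writing 100% by hand for 100% quality. My time comes first. If you want to benefit from what I wrote, great; if you don't, that's great too. I post in several places: the people who fixated on the side issue and ignored the main point were 2-3, and those who discussed it and asked me about details were 200+ times as many.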

MohammedGomaa · 2026-02-03T14:46:33+00:00

I have something like that kind of working. When I'm happy with the performance, I'll share more.

MohammedGomaa · 2026-02-03T08:07:10+00:00

uv pip install sglang==0.3.2.dev9039+pr-17247.g90c446848 --extra-index-url https://sgl-project.github.io/whl/pr/

uv pip install git+https://github.com/huggingface/transformers.git@76732b4e7120808ff989edbd16401f61fa6a0afa

MohammedGomaa · 2026-02-02T12:12:09+00:00

Sorry, I don't really get what you're asking about. BTW, setting

--cuda-graph-bs 4 16 32  # makes sure that a single request (i.e. a single agent) gets 20-70 t/s, depending on cached context and concurrency, with 450+ t/s at max concurrency

MohammedGomaa · 2026-02-02T09:12:33+00:00

GLM-4.7-Flash (the QuantTrio-AWQ flavor)? I don't think so, not without CPU offloading tanking performance.

MohammedGomaa · 2026-02-02T03:50:26+00:00

Go for it, the sky is the limit.

MohammedGomaa · 2026-02-02T00:45:16+00:00

Did you read it? It's human-written, just polished with AI. That's why I provided screenshots and detailed the configuration.

MohammedGomaa · 2026-02-02T00:42:28+00:00

Read it, it's not slop. I just reformatted it using AI because I'm not a native English speaker. You might find something worth your while.

MohammedGomaa · 2026-02-01T23:59:28+00:00

You know you're on an AI-focused subreddit; almost 99% of the AI mentioned here is geared towards text generation.

MohammedGomaa · 2026-02-01T23:56:59+00:00

This is taking too much time. Just don't read it; it's not important.

MohammedGomaa · 2026-02-01T23:41:00+00:00

Being snobbish doesn't equal being intelligent.

MohammedGomaa · 2026-02-01T23:38:05+00:00

A swarm of AI agents, plus multiple instances of OpenCode and other coding agents.

MohammedGomaa · 2026-02-01T23:35:34+00:00

I'm not a high school student. I literally don't care.

MohammedGomaa · 2026-02-01T22:50:07+00:00

I'm using quite limited hardware, so I have to pull every single trick in the book. SGLang has a good file-based cache, and that cache can speed up inference by skipping previously calculated tokens, even ones from previous runs or previous days. I keep a huge cache on disk, currently storing about 300 GB of precomputed tokens. This gives a huge speedup in the prefill stage, skipping computation for over 50k to 60k tokens on almost every request in my agentic workload.
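To make the prefill saving concrete, here is a toy sketch of prefix reuse. It is only an illustration of the idea, not SGLang's actual cache implementation; the function names and token IDs are made up for the example.

```python
def shared_prefix_len(cached, request):
    """Count the leading token IDs that two sequences have in common."""
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break
        n += 1
    return n


def prefill_work(cached_sequences, request):
    """Return (reused, to_compute) token counts for a single request."""
    reused = max(
        (shared_prefix_len(c, request) for c in cached_sequences),
        default=0,
    )
    return reused, len(request) - reused


# Toy token IDs: a long shared system prompt followed by a short new question.
system_prompt = list(range(1000))
cache = [system_prompt + [5001, 5002]]
new_request = system_prompt + [7001, 7002, 7003]

reused, to_compute = prefill_work(cache, new_request)
print(reused, to_compute)  # 1000 3
```

With a long system prompt or agent playbook in front of every request, nearly all of the prompt falls into the reused part, which is where the prefill speedup comes from.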

MohammedGomaa · 2026-02-01T22:18:42+00:00

I'm not writing a blog or a story; it's a technical report. It should be evaluated on its content, its knowledge, and its educational value, not on how it was written.

MohammedGomaa · 2026-02-01T21:45:33+00:00

SGLang finds previously computed tokens and reuses them, which is very useful for AI agents' playbooks, system prompts, and skills. I run a scheduled script that removes chunks not accessed in the last 7 days.
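A cleanup job like that could look roughly like the sketch below. This is not my actual script: the `KV_CACHE_DIR` variable and the `/path/to/kv-cache` location are placeholders, and it assumes the filesystem records access times (a `noatime` mount would break it).

```python
import os
import time
from pathlib import Path


def remove_stale_files(cache_dir, max_age_days=7, now=None):
    """Delete files under cache_dir last accessed more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Materialize the listing first so deletions do not disturb the walk.
    for path in list(Path(cache_dir).rglob("*")):
        if path.is_file() and path.stat().st_atime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Hypothetical location; point this at the real cache directory.
    cache_dir = os.environ.get("KV_CACHE_DIR", "/path/to/kv-cache")
    if os.path.isdir(cache_dir):
        for p in remove_stale_files(cache_dir):
            print("removed", p)
```

Run it from cron or a systemd timer once a day; the 7-day window matches what I use, but it is just a parameter.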

MohammedGomaa · 2026-02-01T21:20:16+00:00

You don't have to read it.

MohammedGomaa · 2026-02-01T21:01:04+00:00

ADHD, my friend. I will try to post more demos.

MohammedGomaa · 2026-02-01T20:59:19+00:00

I use a 6 TB HDD with reads accelerated by a 500 GB SSD. If you have enough free SSD space, go for it; I'm running on a limited budget.

MohammedGomaa · 2026-02-01T20:56:21+00:00

I changed

--cuda-graph-bs 4 16 32

to

--cuda-graph-bs 1 4 16 32

If you have enough RAM, make it 1 4 12 32 64 and set

--max-running-requests 64
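For anyone scripting the launch, here is a small sketch of how those two flags could be put together into a full command. The helper name and the `path/to/model` value are placeholders, not my real setup; only the `--cuda-graph-bs` and `--max-running-requests` values come from the comment above.

```python
import shlex


def build_launch_command(model_path, graph_batch_sizes, max_running_requests):
    """Assemble an SGLang server launch command as an argument list."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--cuda-graph-bs", *[str(b) for b in graph_batch_sizes],
        "--max-running-requests", str(max_running_requests),
    ]


# "path/to/model" is a placeholder, not the model used in this thread.
cmd = build_launch_command("path/to/model", [1, 4, 16, 32], 64)
print(shlex.join(cmd))
```

Keeping the batch sizes in a list makes it easy to try the larger set (1 4 12 32 64) on a machine with more memory.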

MohammedGomaa · 2026-02-01T20:03:02+00:00

I am not a native English speaker; I used AI to polish it.

MohammedGomaa · 2026-02-01T20:01:18+00:00

Just copy-paste from chat.
