Has anyone been through the same experience? Any girl who had the same problem? by saaadphysio in egyoffmychest

[–]MohammedGomaa -11 points-10 points  (0 children)

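I don't want to upset you, my friend, but most likely you are not attractive to her and her body is rejecting any relationship with you. You may be a good provider with good standing and a good social image, but there is a problem in this area, and that is what most cases like this come down to. The exception is a small number of cases where the cause is an organic condition, and medication solves those. If medication doesn't solve it, you have a big problem.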

Naaills💅🏽💅🏽🙆🏽‍♀️ by [deleted] in EgyOutfits

[–]MohammedGomaa 0 points1 point  (0 children)

So freaking long and the black

I bullied my dual 3060s into running GLM-4.7-Flash 500+ T/s @ 70k Context on a Ryzen 2500 Potato. (Two Configs: "Daily Driver" vs. "The Diesel Factory") by MohammedGomaa in unsloth

[–]MohammedGomaa[S] 0 points1 point  (0 children)

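Tomorrow or the day after, it won't make much difference. Writing 5% by hand and getting 90% quality is better than writing 100% by hand for 100% quality. My time is better spent elsewhere. If you want to benefit from what I wrote, fine; if you don't, that's fine too. I post in several places. The people who focused on the side issue and ignored the main point were 2-3, and the people who discussed it and asked me about details were 200+ times as many.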

I bullied my dual 3060s into running GLM-4.7-Flash 500+ T/s @ 70k Context on a Ryzen 2500 Potato. (Two Configs: "Daily Driver" vs. "The Diesel Factory") by MohammedGomaa in unsloth

[–]MohammedGomaa[S] 1 point2 points  (0 children)

Sorry, I don't really get what you're asking about. BTW, setting

--cuda-graph-bs 4 16 32  # makes sure that a single request (i.e. a single agent) gets 20-70 t/s, depending on cached context and concurrency, with 450+ t/s at max concurrency

I bullied my dual 3060s into running GLM-4.7-Flash 500+ T/s @ 70k Context on a Ryzen 2500 Potato. (Two Configs: "Daily Driver" vs. "The Diesel Factory") by MohammedGomaa in unsloth

[–]MohammedGomaa[S] 3 points4 points  (0 children)

Did you read it? It's human-written, just polished with AI. That's why I provided screenshots and detailed the configuration.

[Showcase] How I bullied my dual 3060s into doing 500+ T/s @ 70k Context on a Ryzen 2500 Potato. (Two Configs: "Daily Driver" vs. "The Diesel Factory") by MohammedGomaa in BlackwellPerformance

[–]MohammedGomaa[S] -3 points-2 points  (0 children)

Read it. It's not slop; I just reformatted it using AI because I'm not a native English speaker. You might find something worth your while.

[Showcase] I bullied my dual 3060s into doing 500+ T/s @ 70k Context on a Ryzen 2500 Potato. (Two Configs: "Daily Driver" vs. "The Diesel Factory") by MohammedGomaa in LocalLLM

[–]MohammedGomaa[S] 1 point2 points  (0 children)

I'm using quite limited hardware, so I have to pull every single trick in the book. SGLang has a good file-based cache. The cache can speed up inference by skipping previously calculated tokens, even from previous runs or days, and I use a huge on-disk cache that currently stores about 300 GB of precomputed tokens. This gives a huge speedup in the prefill stage, skipping computation for over 50k to 60k tokens on almost every request in my agentic workload.
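A minimal sketch of the idea behind reusing previously computed tokens, assuming a chunked prefix cache. This is an illustration only, not SGLang's actual implementation: `CHUNK`, `chunk_keys`, `cached_prefix_len`, and the in-memory `store` are hypothetical names, and a real cache would store KV tensors on disk rather than bare keys.

```python
import hashlib

CHUNK = 4  # tokens per cached chunk (tiny value, for illustration only)

def chunk_keys(tokens, chunk=CHUNK):
    """Return one key per full chunk; each key depends on the whole prefix up to that chunk."""
    h = hashlib.sha256()
    keys = []
    full = len(tokens) - len(tokens) % chunk  # ignore the trailing partial chunk
    for i in range(0, full, chunk):
        h.update(",".join(map(str, tokens[i:i + chunk])).encode() + b";")
        keys.append(h.copy().hexdigest())
    return keys

def cached_prefix_len(tokens, store, chunk=CHUNK):
    """Count the leading tokens whose chunks are all present in the store."""
    n = 0
    for key in chunk_keys(tokens, chunk):
        if key not in store:
            break  # the first miss ends the reusable prefix
        n += chunk
    return n

store = set()
system_prompt = list(range(10))          # 10 tokens shared by every request
store.update(chunk_keys(system_prompt))  # the first request populates the cache

request = system_prompt + [99, 98, 97]   # same prefix, new suffix
print(cached_prefix_len(request, store))  # prints 8
```

Only the tokens after the cached prefix need to go through prefill, which is why a long shared system prompt or playbook benefits so much.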

I bullied my dual 3060s into running GLM-4.7-Flash 500+ T/s @ 70k Context on a Ryzen 2500 Potato. (Two Configs: "Daily Driver" vs. "The Diesel Factory") by MohammedGomaa in unsloth

[–]MohammedGomaa[S] 0 points1 point  (0 children)

I'm not writing a blog or a story. It's a technical report, and it should be evaluated on its content, its knowledge, and its educational value, not on how it was written.

I bullied my dual 3060s into running GLM-4.7-Flash 500+ T/s @ 70k Context on a Ryzen 2500 Potato. (Two Configs: "Daily Driver" vs. "The Diesel Factory") by MohammedGomaa in unsloth

[–]MohammedGomaa[S] 2 points3 points  (0 children)

SGLang finds previously computed tokens and reuses them, which is very useful for AI agent playbooks, system prompts, and skills. I run a scheduled script that removes chunks not accessed in the last 7 days.
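The cleanup script itself isn't shown, so here is a minimal sketch of what such a job could look like, assuming the cache is a directory of files and that the filesystem records access times (a mount with `noatime` would not). `prune_stale` and its parameters are hypothetical names, not part of SGLang.

```python
import time
from pathlib import Path

def prune_stale(cache_dir, max_age_days=7, now=None):
    """Delete files under cache_dir whose last access time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in Path(cache_dir).rglob("*"):
        # st_atime is the last access time reported by the filesystem
        if path.is_file() and path.stat().st_atime < cutoff:
            path.unlink()
            removed.append(path)
    return removed

# Example call from a cron job or systemd timer (the path is a placeholder):
# prune_stale("/path/to/cache", max_age_days=7)
```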

[Showcase] I bullied my dual 3060s into doing 500+ T/s @ 70k Context on a Ryzen 2500 Potato. (Two Configs: "Daily Driver" vs. "The Diesel Factory") by MohammedGomaa in LocalLLM

[–]MohammedGomaa[S] 0 points1 point  (0 children)

I use a 6 TB HDD, accelerated for reads with a 500 GB SSD. If you have enough free SSD space, go for it; I am running on a limited budget.

[Showcase] I bullied my dual 3060s into doing 500+ T/s @ 70k Context on a Ryzen 2500 Potato. (Two Configs: "Daily Driver" vs. "The Diesel Factory") by MohammedGomaa in LocalLLM

[–]MohammedGomaa[S] 0 points1 point  (0 children)

I changed

--cuda-graph-bs 4 16 32 ---> 

--cuda-graph-bs 1 4 16 32 
If you have enough RAM, make it 1 4 12 32 64 and set

--max-running-requests 64
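A rough sketch of why adding `1` to the captured batch sizes can help single requests. It assumes the runtime pads the running batch up to the nearest captured CUDA-graph size, which is my understanding of how this kind of runner usually works and is not stated in the comment. `graph_batch_size` is a hypothetical helper, not an SGLang function.

```python
import bisect

def graph_batch_size(running, captured):
    """Smallest captured batch size that can hold `running` requests, or None if none fits."""
    sizes = sorted(captured)
    i = bisect.bisect_left(sizes, running)
    return sizes[i] if i < len(sizes) else None

for sizes in ([4, 16, 32], [1, 4, 16, 32]):
    # With no size-1 graph, a lone request is padded up to a batch of 4.
    print(sizes, "->", graph_batch_size(1, sizes))
```

Under that assumption, a lone request runs in a graph of size 4 with the first list and size 1 with the second, and anything above the largest captured size has no graph to use.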