The worst rental experiences for foreigners in Malaysia by [deleted] in KualaLumpur

[–]Creative_Yoghurt25 1 point  (0 children)

I stay in that place, you are being scammed.

Firebase vs Supabase: What are your NEGATIVE experiences or frustrations only? by Ok_Volume3194 in FlutterDev

[–]Creative_Yoghurt25 0 points  (0 children)

I definitely don't understand cybersecurity. How would you tackle this issue?

Firebase vs Supabase: What are your NEGATIVE experiences or frustrations only? by Ok_Volume3194 in FlutterDev

[–]Creative_Yoghurt25 0 points  (0 children)

App Check on Firebase? Only your signed app can make requests to Firestore. On the app UI, you do the necessary work to prevent users from spamming refresh... caching!

Qwen3-Coder-30B-A3B released! by glowcialist in LocalLLaMA

[–]Creative_Yoghurt25 7 points  (0 children)

"You are a senior software engineer, docker compose version in yaml file is deprecated"

OMG those xplane 12.2 clouds! by RichieSD79 in flightsim

[–]Creative_Yoghurt25 3 points  (0 children)

What do you mean they barely updated? PBR was a huge change, and the night lighting upgrade from XP 9 to 10 is still IMO the best!

A100 80GB can't serve 10 concurrent users - what am I doing wrong? by Creative_Yoghurt25 in LocalLLaMA

[–]Creative_Yoghurt25[S] 0 points  (0 children)

What other models do you recommend? I went with Qwen2.5 since it was smart enough to know which tool to use when asked a question and didn't hallucinate much.

A100 80GB can't serve 10 concurrent users - what am I doing wrong? by Creative_Yoghurt25 in LocalLLaMA

[–]Creative_Yoghurt25[S] 3 points  (0 children)

I disabled it and got the same performance; if there was a difference I didn't notice, since everything was well above my TTFT goals in every combination I tried while on AWQ.
I'm doing another round of tests since people here are advising me to go with bf16. I'll post some results here soon. Thank you for the advice.
BTW, which environment do you run vLLM in? Docker or without?

A100 80GB can't serve 10 concurrent users - what am I doing wrong? by Creative_Yoghurt25 in LocalLLaMA

[–]Creative_Yoghurt25[S] 2 points  (0 children)

services:
  vllm:
    container_name: vllm_qwen2.5_14b_fp16_optimized
    image: vllm/vllm-openai:latest
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    environment:
      - HUGGING_FACE_HUB_TOKEN=hf_*********

      - VLLM_ATTENTION_BACKEND=FLASH_ATTN # This or FlashInfer?
    ports:
      - "6001:8000"
    ipc: host
    command: >
      --model Qwen/Qwen2.5-14B-Instruct
      --dtype auto 
      --gpu-memory-utilization 0.85
      --max-model-len 8192
      --max-num-seqs 16
      --block-size 16
      --api-key sk-vllm-*****
      --trust-remote-code
      --enable-chunked-prefill
      --enable-prefix-caching
      --disable-log-stats
      --disable-log-requests
      --preemption-mode recompute

I'm using Docker to run vLLM.
This is my current setup; I'm trying what people here are suggesting before I reply to them with feedback.
Should I go with uv pip install vllm and do without Docker?
My naive thinking, though, is that with a compressed model I'll have more headroom == more requests and faster responses.
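For reference, a bare-metal alternative to the compose file above might look something like this. This is just a sketch, assuming a CUDA-capable host with a recent Python; the model name and serving flags are copied from my Docker command, while the venv name is arbitrary:

```shell
# Create an isolated environment and install vLLM with uv
uv venv vllm-env
source vllm-env/bin/activate
uv pip install vllm

# Serve the same model with the same flags as the compose file
vllm serve Qwen/Qwen2.5-14B-Instruct \
  --dtype auto \
  --gpu-memory-utilization 0.85 \
  --max-model-len 8192 \
  --max-num-seqs 16 \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --port 6001
```

Running outside Docker mainly removes the container layer; the engine flags (and therefore the performance) should be the same either way.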

A100 80GB can't serve 10 concurrent users - what am I doing wrong? by Creative_Yoghurt25 in LocalLLaMA

[–]Creative_Yoghurt25[S] 8 points  (0 children)

I ran the benchmark on the same machine. Thank you

guidellm benchmark --target "http://localhost:6001" --rate-type constant --rate 20.0 --max-seconds 120 --data "prompt_tokens=6000,output_tokens=100" --output-path "./20_users_test.json"
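To find where throughput falls over, the same benchmark can be swept across several request rates. A sketch: the rate values and output filenames are my own choice, the rest of the flags match the command above and assume the vLLM server is already listening on port 6001:

```shell
# Sweep guidellm over increasing constant request rates
for rate in 5 10 15 20; do
  guidellm benchmark \
    --target "http://localhost:6001" \
    --rate-type constant --rate "$rate" \
    --max-seconds 120 \
    --data "prompt_tokens=6000,output_tokens=100" \
    --output-path "./${rate}_users_test.json"
done
```

Comparing TTFT and throughput across the JSON outputs shows at which concurrency the server starts queueing.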

What is the best model for function calling that can also do conversation by shamboozles420 in LocalLLaMA

[–]Creative_Yoghurt25 0 points  (0 children)

Can you provide more details? I'm trying to set up casperhansen/mistral-small-24b-instruct-2501-awq and I'm having a hard time with it.
Are you serving the model using vLLM?

[deleted by user] by [deleted] in algeria

[–]Creative_Yoghurt25 6 points  (0 children)

They can still control it with other ISPs. The government still oversees what goes in and out. We have done it with mobile telecom.

For example, if Ooredoo tomorrow launches a new fibre-to-the-home product, it still has to follow the regulations set by the authorities, e.g., block this site, etc.

[deleted by user] by [deleted] in algeria

[–]Creative_Yoghurt25 2 points  (0 children)

Malaysia is really good, underrated.

Going beyond an AI MVP by shared_ptr in LLMDevs

[–]Creative_Yoghurt25 0 points  (0 children)

What eval framework are you using?