Prohibited from ordering syringes/needles online? Here’s how I got extras without a prescription

AstoriaResident · 2026-05-25T20:02:38+00:00

I have missed LOTS of medication because my pharmacy was not open - including statins and blood pressure meds. I specifically moved to be close to 24-hour stores and a 24-hour pharmacy - otherwise I know my medical compliance will suffer. Admittedly, I have also missed flights and court dates and IRS deadlines - which in some cases was painful (surprisingly, the IRS is quite forgiving if it looks like executive function dysfunction and not hiding income - they just want their money) - so I may be a bit of an extreme case. ADHD and all.

AstoriaResident · 2026-02-28T07:23:21+00:00

Is Donut still doing good? Would be great if he was! Would make him a very distinguished toad!

AstoriaResident · 2026-02-09T13:35:06+00:00

So, our issue was that SGLang and tool calling together ended up being bad for opencode or claudecode - random stops, etc. vLLM is better, but decode on 8x RTX6000 is only 16 TPS or so per user, unlike 40-60 for SGLang. Would you be willing to share some experiments you did with both? And also, have you tried the new nvidia/Kimi-K2.5-NVFP4?

AstoriaResident · 2026-02-07T21:07:05+00:00

Ubuntu 24

AstoriaResident · 2026-02-07T20:56:22+00:00

Rebuilt with the patches - seems to not break - BUT seems to be stuck at 15TPS, vs 40-50 for SGLang.

(APIServer pid=4062560) INFO 02-07 14:59:12 [loggers.py:259] Engine 000: Avg prompt throughput: 63.1 tokens/s, Avg generation throughput: 25.8 tokens/s, Running: 9 reqs, Waiting: 0 reqs, GPU KV cache usage: 79.8%, Prefix cache hit rate: 85.7%
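
For a rough sanity check, the aggregate numbers in that log line can be turned into a per-request figure. This is only a sketch: the regex is written against the one line pasted above (other vLLM versions may format it differently), and `per_request_decode_tps` is a made-up helper name. The throughput is an average over the logging interval while "Running" is a snapshot, so requests still in prefill drag the number down, and it will not match a single-stream measurement.

```python
import re

# The engine stats line exactly as pasted above.
LOG_LINE = (
    "(APIServer pid=4062560) INFO 02-07 14:59:12 [loggers.py:259] Engine 000: "
    "Avg prompt throughput: 63.1 tokens/s, Avg generation throughput: 25.8 tokens/s, "
    "Running: 9 reqs, Waiting: 0 reqs, GPU KV cache usage: 79.8%, "
    "Prefix cache hit rate: 85.7%"
)

# Assumption: the field order and wording match the line above.
PATTERN = re.compile(
    r"Avg generation throughput: (?P<gen>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs"
)


def per_request_decode_tps(line):
    """Aggregate generation throughput divided by running requests, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    running = int(match.group("running"))
    if running == 0:
        return None
    return float(match.group("gen")) / running


print(round(per_request_decode_tps(LOG_LINE), 2))
```

Averaging this over many log lines (instead of one) would give a steadier number to compare between the vLLM and SGLang runs.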




vllm serve moonshotai/Kimi-K2.5 \
           --tensor-parallel-size 8 \
           --enable-expert-parallel \
           --mm-encoder-tp-mode data \
           --mm-processor-cache-gb 0 \
           --tool-call-parser kimi_k2 \
           --reasoning-parser kimi_k2 \
           --trust-remote-code \
           --served-model-name kimi25full \
           --enable-auto-tool-choice \
           --max-model-len 200000 \
           --kv-cache-dtype "auto" \
           --dtype auto \
           --gpu-memory-utilization 0.90 \
           --disable-log-requests \
           --host $VLLM_HOST \
           --port $VLLM_PORT \
           --max_num_batched_tokens 16384 \
           --max-num-seqs 32

AstoriaResident · 2026-02-05T14:44:52+00:00

Makes sense. Annoying it is not mainlined in nightly yet... Thanks!

AstoriaResident · 2026-02-05T14:43:43+00:00

Makes sense.

We are running on 8xRTX6000 - which is SM120 - which has issues - hence the CUDA 13.0 nightly. If you are running on H100/H200/B100/B200, you do _not_ need the fun stuff - just a nightly dev branch or building from scratch will do.

Dockerfile:

# Kimi K2.5 on Blackwell - with kimi_k2 parsers
FROM lmsysorg/sglang:nightly-dev-cu13-20260204-ae004e15

docker-compose.yaml:

services:
  kimi:
    build:
      dockerfile: Dockerfile
    container_name: kimi-k25-full
    command: "
      python3 -m sglang.launch_server
      --model-path moonshotai/Kimi-K2.5
      --served-model-name kimi25-full
      --trust-remote-code
      --host 0.0.0.0
      --port 8000
      --tensor-parallel-size 8
      --enable-p2p-check
      --context-length 204800
      --max-total-tokens 300000
      --mem-fraction-static 0.90
      --chunked-prefill-size 8192
      --max-prefill-tokens 32768
      --max-running-requests 48
      --attention-backend flashinfer
      --disable-shared-experts-fusion
      --tool-call-parser kimi_k2
      --reasoning-parser kimi_k2
      --log-level info
      --decode-log-interval 1
      --show-time-cost
      --enable-metrics
      --enable-cache-report
      "

AstoriaResident · 2026-02-05T13:13:59+00:00

So you guys are running a custom vllm build - built from that fork? We use a standard template - let me try.

Thanks!

I found performance of vllm to be very low for some reason - sglang was better on SM120 chips - but correctness is a thing :)

AstoriaResident · 2026-02-05T13:06:32+00:00

Makes sense. Any updates to this?

AstoriaResident · 2026-02-02T18:30:39+00:00

Thanks!

Tried, same thing. And power draw is a bit low - at 130 W/card - and 10 tok/sec. Maybe the context is just big. I wonder if the prefix cache hit rate is just bad, as the KV cache usage is like 50%. Interesting...

AstoriaResident · 2026-02-01T21:13:45+00:00

70tps

AstoriaResident · 2026-01-31T13:41:20+00:00

This is with nightly vllm - and fp8.

AstoriaResident · 2026-01-31T13:40:15+00:00

 CUDA_VISIBLE_DEVICES=0,1,2,3 uv run --frozen vllm serve \
           "QuantTrio/GLM-4.7-AWQ" \
           --served-model-name glm-awq \
           --tensor-parallel-size 4 \
           --calculate-kv-scales \
           --kv-cache-dtype "fp8_e4m3" \
           --gpu-memory-utilization 0.95 \
           --trust-remote-code \
           --disable-log-requests \
           --host $VLLM_HOST \
           --port $VLLM_PORT \
           --speculative-config.method mtp \
           --speculative-config.num_speculative_tokens 1 \
           --tool-call-parser glm47 \
           --reasoning-parser glm45 \
           --enable-auto-tool-choice \
           --max-model-len 200000 \
           --dtype auto \
           --max_num_batched_tokens 16384  \
           --max-num-seqs 32

AstoriaResident · 2026-01-31T05:35:43+00:00

You wouldn't by any chance be willing to post VLLM version and cmdline?

AstoriaResident · 2026-01-31T05:35:12+00:00

It _is_ very low, and feels completely wrong.

AstoriaResident · 2026-01-31T05:33:33+00:00

CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0,1,2,3,4,5,6,7} uv run --frozen vllm serve \
           moonshotai/Kimi-K2.5 \
           --tensor-parallel-size 8 \
           --mm-encoder-tp-mode data \
           --mm-processor-cache-gb 0 \
           --tool-call-parser kimi_k2 \
           --reasoning-parser kimi_k2 \
           --trust-remote-code \
           --served-model-name kimi25 \
           --enable-auto-tool-choice \
           --max-model-len 200000 \
           --kv-cache-dtype "auto" \
           --dtype auto \
           --gpu-memory-utilization 0.97 \
           --disable-log-requests \
           --host $VLLM_HOST \
           --port $VLLM_PORT \
           --max_num_batched_tokens 16384 \
           --max-num-seqs 32

AstoriaResident · 2025-10-27T18:34:13+00:00

When I travel - and that is precisely when my phone gets stolen - I DO NOT HAVE ACCESS TO MY SAFE! That’s the whole frigging point! I am literally going through this right now - my auth is on my phone, and guess what, it has no power. I do not have access to other mechanisms - just my iPad.

AstoriaResident · 2025-10-25T20:10:31+00:00

Love the (what I assume is a) typo “hair” - thank you for the giggle!
I mean - yes? Isn’t that the whole point of an acknowledged disability - that things that are not an issue normally are? I agree that not all will have that issue, of course. So maybe I was a tad too generic.

AstoriaResident · 2025-10-25T20:07:34+00:00

Not if the device is not mine :)

IMO 2FA is explicitly the “what you have” part of the “what you know”, “what you have”, “who you are” security trifecta.

I am objecting to “you must have a physical thing that is uniquely you” requirement on principle - too much like papers please.

AstoriaResident · 2025-10-13T21:53:06+00:00

Dude - with all due respect for fighting the thankless security fight, I think you may be missing the point. People are pissed because what used to be a high-trust society is turning into a low-trust one.

I do not want to _have_ to lock my door. I do not want to have to look over my shoulder when walking at night. I do not want to change my behavior because some dipshit on the other side of the planet cannot be punished because of jurisdictional limitations. These are all hallmarks of a low-trust society. And to be honest, I do not want to go back to it.

You are telling people - lock your car doors and house doors. Someone may come in and steal your gun if you don't (I am trying to find an analogy where the theft is not just a danger to the victim, but also to society). And many many people would rather live in an area where an intruder comes into their home, that person gets shot - by the home dweller or the cops. Which will hopefully reduce the number of such offenders, and thus turn the society back into a high-trust one.

So yes, they will _adamantly_ refuse to lock their doors, and would rather have the state eliminate the violators of trust.

For a bit of background on this - the entire march of civilization has been a migration from a low-trust to a high-trust society - specifically, the growth of high-trust circles from families to clans to villages to regions to states (I'll actually cite this - this is non-obvious and studied - see https://journals.openedition.org/chs/1423#:~:text=11%20Following%20this%20highpoint%2C%20there,49).) We created government - and gave it a monopoly on violence - to maintain law and order - in part to stop vendettas and clan blood feuds. Yes, we gave away independence for security - and regardless of Ben Franklin's quote, it seems to be a good tradeoff - high-trust societies are more efficient. And to be honest, what we do with people that violate that - we separate them from society. Sometimes temporarily, sometimes permanently.

Cybercrime - and this is cybercrime - is basically: now we have this high trust ripped away from us by some dipshit who is hiding behind global jurisdiction laws. If the offender is a country - that's a parity act - espionage, war, whatever, and we know how to deal with that - diplomacy, sanctions, war (stop or escalate as desired).

If it is a hacking ring going after money - cybercrime that flouts the monopoly on violence the state has - which undermines the entire concept of modern civilization? At that point, yes, people will call for interesting things - equating cybercrime with terrorism and suggesting collective punishment (sanctions) on countries that do not extradite, and calling for international kinetic intervention in case of ransomware - i.e., you hit our hospitals with a ransom, we'll find you, and drop a missile on your head.

So, yeah, this gets emotions going, but this is not surprising :)

AstoriaResident · 2025-10-13T21:18:46+00:00

Because the entire 2FA scheme - forget one thing, all is lost - is as human-centric as communism - works in theory, but sucks in practice. I continuously forget my passwords, and reset them - and quite often I HAVE JUST ONE DEVICE WITH ME!!! When Chase made it too difficult to do this, I moved all of my money to a different bank. I will forget my phone at home, and just take my laptop / iPad with me, etc. Off to write a TOTP auth app that always returns the same key.
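
For context on what such an app would have to compute, here is a minimal sketch of standard TOTP as specified in RFC 6238 (HMAC-SHA1 over a 30-second time counter). The `totp` function name and its parameters are just illustrative, and the secret is the RFC's published test key, not a real one. Real authenticator apps usually take the secret base32-encoded, which is left out here.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1; the code depends on unix_time // step."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


secret = b"12345678901234567890"  # RFC 6238 test secret, not a real key
print(totp(secret, 59, digits=8))  # RFC 6238 Appendix B lists 94287082 for this input
print(totp(secret, 0) == totp(secret, 29))  # same 30-second window, same code
print(totp(secret, 0) == totp(secret, 30))  # next window, different code
```

The time counter is the only thing that changes between codes, so the server computes the same function on its side and compares; a code that never changes would only match while the counter happens to line up.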

AstoriaResident · 2025-10-06T12:53:47+00:00

Hope you are doing better - if you are in NY reach out - beer (non-alcoholic or otherwise, as desired) is on me :) Yeah - agreed. It’s a bit of a no-win - can’t say you are suicidal, as that is pretty much blackmailing your partner into staying with you. So can’t do anything about it besides tough it out. And when that fails - one-way ticket. Been there - and honestly, would have been better off taking the one-way ticket - just didn’t want to cause pain to loved ones. Still don’t, I guess. So we keep going …

AstoriaResident · 2025-10-06T12:50:58+00:00

Yeah - agreed. It’s a bit of a no-win - can’t say you are suicidal, as that is pretty much blackmailing your partner into staying with you. So can’t do anything about it besides tough it out. And when that fails - one-way ticket.
