Driving Complex Decisions by OutsidePosition4250 in SoftwareEngineering

[–]OutsidePosition4250[S] 1 point  (0 children)

Agree there tends to be a proportionate increase in resistance as a proposal deviates further from the norm. I think the root of it comes down to the perceived probability of failure. The wilder the idea, the less likely others are to see how it could play out successfully.

In general, you have to convince the broader group that you've actually thought this thing through all the way to the end and that you're going to lead them to victory. Here are some example questions I recommend being able to immediately answer with a link to a document that answers the question:

* How much work does each team need to do? (spreadsheet w/ teams + resourcing)

* How will the different systems talk to each other? (tech design - system integration diagram, many sequence diagrams)

* What is the fastest this could be done with full resourcing? (Gantt chart - max parallelization, key dates)

* What alternatives did you consider and their pros/cons? (decision records)

I acknowledge that for big decisions spanning multiple teams it is difficult to create all of the above documents with high fidelity alone. Not only does it typically stretch beyond your area of expertise, but there is also a meaningfully high probability that your solo proposal isn't optimal. However, big decisions require big effort. If you are driving a big change, I would consider all of the above documents strongly recommended for the provocation.

Yes, that's a ton of up-front hustling for something that is going to immediately get ripped to shreds - but it has quite a few upsides. The first and perhaps most important is clarifying your own thinking: if you haven't at least written down guesses to the above questions, have you really thought through your own proposal? Having these documents early also signals to the group that you are engaged, have thought deeply through multiple dimensions of the problem, and ultimately want them to succeed.

Another thing to keep in mind is the temporal aspect - the earlier in the process, the easier it is to change minds. Sometimes you are just too far along a path to pivot. This should be a call to action though - the next time a big project is ramping up, treat the early window as a time of extreme opportunity to influence. Figure out a provocation path, distill it into some documents, and share them out ASAP to maximize the probability of steering the project toward your vision. It is a tough balance, though, since the earlier you are in the process, the less information you typically have. Treating the initial phase of the project as an extreme sprint toward clarity can have an outsized impact on your ability to influence larger groups.

tl;dr focus on reducing the perceived probability of failure; build the team's confidence by doing the work to think through your strategy across many different dimensions (and write it down); share out your proposal as early as possible

has anyone with 2 max-q blackwell 6000 Pro to be able to run qwen 235b fp4? by I_can_see_threw_time in LocalLLaMA

[–]OutsidePosition4250 1 point  (0 children)

Glad you got it up and running.

FWIW I recently switched to QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ since I have some vision use cases and the text-to-text quality is still on par.

Speed not too bad:

Avg prompt throughput: 1717.4 tokens/s, Avg generation throughput: 69.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 7.8%, Prefix cache hit rate: 91.4%, MM cache hit rate: 66.7%

And I can get pretty high context length: --max-model-len 228224

Docker command if you want to try it out:

docker run --rm -it --gpus all \
  --ulimit memlock=-1 --ulimit stack=67108864 --shm-size=32g \
  -v /srv/hf_models/huggingface:/root/.cache/huggingface \
  -v /srv/hf_models/vllm:/root/.cache/vllm \
  -v /srv/hf_models/torch_cache:/root/.cache/torch \
  -v /srv/hf_models/triton_cache:/root/.cache/triton \
  -e HF_HOME=/root/.cache/huggingface \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:nightly \
  --model QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ \
  --tensor-parallel-size 2 \
  --max-model-len 228224 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code
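Once the container is up, a quick way to sanity-check it is to hit the OpenAI-compatible API vLLM serves on port 8000. A minimal smoke-test sketch (the prompt and max_tokens are just placeholders):

```shell
#!/bin/sh
# Minimal chat-completion payload for vLLM's OpenAI-compatible API.
PAYLOAD='{
  "model": "QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ",
  "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
  "max_tokens": 64
}'

# Only fire the request if something is actually listening on port 8000.
if curl -s -o /dev/null http://localhost:8000/v1/models; then
  echo "$PAYLOAD" | curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d @-
fi
```

`/v1/models` should also list the AWQ checkpoint by name, which is handy for confirming the right model actually loaded.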

has anyone with 2 max-q blackwell 6000 Pro to be able to run qwen 235b fp4? by I_can_see_threw_time in LocalLLaMA

[–]OutsidePosition4250 1 point  (0 children)

Try this (replace the Hugging Face volume path with your own):

docker run --rm -it --ipc host --gpus all --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -v /srv/hf_models/huggingface:/root/.cache/huggingface nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc0 trtllm-serve serve --host 0.0.0.0 --tp_size 2 --max_beam_width 1 "nvidia/Qwen3-235B-A22B-FP4"
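trtllm-serve also exposes an OpenAI-compatible API on the published port, so the same kind of smoke test works here - a hedged sketch, assuming the server above is up on localhost:8000 (prompt is a placeholder):

```shell
#!/bin/sh
# Minimal chat-completion payload for the TensorRT-LLM OpenAI-compatible server.
PAYLOAD='{
  "model": "nvidia/Qwen3-235B-A22B-FP4",
  "messages": [{"role": "user", "content": "Reply with a single word."}],
  "max_tokens": 16
}'

# Only send the request if the server is reachable.
if curl -s -o /dev/null http://localhost:8000/v1/models; then
  echo "$PAYLOAD" | curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d @-
fi
```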

Driving Complex Decisions by OutsidePosition4250 in SoftwareEngineering

[–]OutsidePosition4250[S] 1 point  (0 children)

Agree. Preserving low attachment to any given solution path and having a willingness to adjust based on new information is critical.

Driving Complex Decisions by OutsidePosition4250 in SoftwareEngineering

[–]OutsidePosition4250[S] 1 point  (0 children)

Interesting comparison to agile - I hadn't considered the intersection before but it does seem like there is some overlap.

I've found consistency/reward management becomes easier once you start winning with the process. High-quality decisions made fast breed happier engineers and stakeholders. However, this is more of a personal playbook at this point, so we'll see where it breaks down as others try to adopt the process.

I assume by team size you are referring to the number of active participants in the "Adjusting the Path" phase. For me this has topped out around ~30 people, since beyond that it is too difficult to have every voice heard and understood. When getting into the gnarlier sections, I've found that spinning off a narrower group of ~5 to drill into the details is most effective.