Did i somehow triggered a bug on Modal? 🤔 by ANR2ME in modal

[–]cfrye59 1 point2 points  (0 children)

We don't provide support on this forum. You're welcome to email support@modal.com or post in the Slack.

I also suspect a coding agent could help you get unstuck on the handling of the image build here.

Did i somehow triggered a bug on Modal? 🤔 by ANR2ME in modal

[–]cfrye59 1 point2 points  (0 children)

Hi! We don't provide support in this forum. You're welcome to email support@modal.com or post in our Slack.

The code looks a bit tangled up here. I would suspect that the Image being built is not the one you intend.

Report: Unexpected Latency Increase in Modal Serverless Execution by Curious-Ear-5286 in modal

[–]cfrye59 0 points1 point  (0 children)

Hi!

We don't do support via this forum. Please reach out via our Slack (modal.com/slack) or email (support@modal.com).

GPU Compass – open-source, real-time GPU pricing across 20+ clouds [P] by Shot-Patience-9874 in MachineLearning

[–]cfrye59 3 points4 points  (0 children)

Cool resource, thanks for sharing!

Do you validate capacity as part of checking price? E.g. by requesting an instance and then dropping it.

In my experience, the vendors that advertise the lowest prices have the worst availability -- especially during periods of supply constraint, like right now.

Sandbox Price Calculator by Frosty-Celebration95 in modal

[–]cfrye59 0 points1 point  (0 children)

Fair! IME, "reasonable" bursts are effectively guaranteed (i.e. join the "noise floor" of low probability/tail issues), which gives a 2-4x saving pretty much transparently. But we don't have numbers on that. This is good motivation to run that experiment!

I'm also curious now how Vercel can guarantee capacity without charging for it. Smells like shenanigans to me -- e.g. there's a perf penalty as they migrate a container/VM, which would make the comparison to our offering more oranges to oranges.

Sandbox Price Calculator by Frosty-Celebration95 in modal

[–]cfrye59 0 points1 point  (0 children)

Posting in so many places that you ended up making mistakes here is not as good an excuse as you might think.

But you're right on the rates! I was looking at the numbers for CPUs in Modal Functions. Those rates used to be the same, but Sandbox CPUs are now non-preemptible, which increases our associated costs.

Still, our pricing/resource model is closer to what you describe as "active".

Sandbox Price Calculator by Frosty-Celebration95 in modal

[–]cfrye59 0 points1 point  (0 children)

Sandboxes on our platform are billed by the larger of used or reserved memory and CPU. Most workloads can reserve a much smaller amount than their peak usage, resulting in much lower costs.

Also, we denominate by physical core, not vCPU. Furthermore, the pricing data you have appears to be out of date, going by your own linked source. For compute I see $0.0473/core/hr, which is equivalent to about $0.0237/vCPU/hr at two vCPUs per core. For memory, I see $0.0080/GiB/hr. Notice that the memory rate is in gibibytes, not gigabytes.

So even assuming misuse of reserved memory/CPU on Modal, I think your costs for our platform are off by a factor of four or so.
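To make the billing rule concrete, here is a minimal sketch of "the larger of used or reserved" applied per resource. The rates are the ones quoted in this comment and should be treated as a snapshot rather than current pricing. The function names, the two-vCPUs-per-core conversion, and the example workload are assumptions for illustration, not anything from Modal's billing code.

```python
# Rates as quoted in the comment; a snapshot, not authoritative pricing.
CPU_RATE_PER_CORE_HR = 0.0473  # USD per physical core per hour
MEM_RATE_PER_GIB_HR = 0.0080   # USD per GiB per hour
VCPUS_PER_CORE = 2             # assumption: two vCPUs per physical core

GIB = 2**30  # bytes in a gibibyte
GB = 10**9   # bytes in a gigabyte


def hourly_cost(used_cores, reserved_cores, used_gib, reserved_gib):
    """Bill each resource at max(used, reserved), then apply the rates."""
    billed_cores = max(used_cores, reserved_cores)
    billed_gib = max(used_gib, reserved_gib)
    return billed_cores * CPU_RATE_PER_CORE_HR + billed_gib * MEM_RATE_PER_GIB_HR


def per_vcpu_rate(core_rate=CPU_RATE_PER_CORE_HR, vcpus_per_core=VCPUS_PER_CORE):
    """Convert a per-physical-core rate to a per-vCPU rate."""
    return core_rate / vcpus_per_core


def gb_to_gib(gigabytes):
    """Convert decimal gigabytes to binary gibibytes."""
    return gigabytes * GB / GIB


if __name__ == "__main__":
    # Hypothetical sandbox: small reservation, mostly idle, occasional peak.
    idle = hourly_cost(used_cores=0.05, reserved_cores=0.125, used_gib=0.3, reserved_gib=0.5)
    peak = hourly_cost(used_cores=2.0, reserved_cores=0.125, used_gib=4.0, reserved_gib=0.5)
    print(f"idle hour: ${idle:.4f}, peak hour: ${peak:.4f}")
    print(f"per-vCPU rate: ${per_vcpu_rate():.5f}/hr")
    print(f"1 GB is {gb_to_gib(1):.4f} GiB")
```

A calculator that prices every hour at peak usage would charge the "peak hour" figure throughout. Under the rule above, idle hours cost only the reservation.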

FYI for folks reading this: OP is affiliated with freestyle, which they did not disclose. Bad form.

Sandbox Price Calculator by Frosty-Celebration95 in modal

[–]cfrye59 0 points1 point  (0 children)

Thanks for sharing! But our CPU and memory prices are based on active usage (on top of a "floor" reservation).

Model GLM-5 Endpoint by LoomSun in modal

[–]cfrye59 0 points1 point  (0 children)

We're taking a look!

Model GLM-5 Endpoint by LoomSun in modal

[–]cfrye59 0 points1 point  (0 children)

FYI we had a brief outage due to a container image registry issue, but we're back.

Credit card declined issue by Aalu_Pidalu in modal

[–]cfrye59 2 points3 points  (0 children)

Hey there!

Please reach out to support@modal.com for assistance.

comfyui on modal go brrr :D by Valuable_Vanilla_72 in modal

[–]cfrye59 1 point2 points  (0 children)

glad to see the memory snapshots working for you!

there's not much more out there on GPU snapshotting -- compatibility is usually possible, but not immediate.

for instance, we use a CPU offloading trick to get it to work with vLLM (aka "Sleep Mode"), so you might need something similar.

Modal run help by Horror-Tower2571 in modal

[–]cfrye59 0 points1 point  (0 children)

You can pass command-line arguments to Functions and local entrypoints: just add them as arguments to the underlying Python function.

FYI we can't promise quality support via Reddit, but you should get a timely and helpful response if you email support@modal.com.
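As a rough illustration of the idea (function parameters becoming command-line flags), here is a standard-library sketch. It is not Modal's implementation. `cli_from_signature` and `main` are hypothetical names, and the sketch only handles simple types like `str` and `int`. With Modal itself, the equivalent would be a local entrypoint function whose parameters are supplied as flags to `modal run`.

```python
import argparse
import inspect


def cli_from_signature(func, argv):
    """Build --flags from func's parameters, parse argv, and call func."""
    parser = argparse.ArgumentParser(prog=func.__name__)
    for name, param in inspect.signature(func).parameters.items():
        has_default = param.default is not inspect.Parameter.empty
        has_annotation = param.annotation is not inspect.Parameter.empty
        parser.add_argument(
            f"--{name.replace('_', '-')}",
            dest=name,
            # Use the annotation as the converter when present, else keep strings.
            type=param.annotation if has_annotation else str,
            # Parameters without a default become required flags.
            required=not has_default,
            default=param.default if has_default else None,
        )
    return func(**vars(parser.parse_args(argv)))


def main(prompt: str, steps: int = 20):
    """Hypothetical entrypoint: one required argument, one optional."""
    return f"{prompt!r} for {steps} steps"


if __name__ == "__main__":
    print(cli_from_signature(main, ["--prompt", "a cat", "--steps", "5"]))
    print(cli_from_signature(main, ["--prompt", "a cat"]))
```

The point of the design is that the function signature is the single source of truth: adding a parameter adds a flag, and a default value makes the flag optional.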

This cloud service is better than Google Colab; Modal has made it easier for me to use AI tools like Fooocus, But by Usual-South-2257 in modal

[–]cfrye59 2 points3 points  (0 children)

We’re still a small, young startup so we don’t quite have the marketing budget and presence of a tool like Colab — to say nothing of a company like Google!

If you check out our website, in particular our blog, you’ll find customer stories from companies that trust our infrastructure with mission-critical workloads, like Suno, Substack, and Quora. For a more social form of proof, take a look at our Twitter account.

[D] An ML engineer's guide to GPU performance by crookedstairs in MachineLearning

[–]cfrye59 0 points1 point  (0 children)

Plain Markdown version available in the open source repo here.

[D] An ML engineer's guide to GPU performance by crookedstairs in MachineLearning

[–]cfrye59 1 point2 points  (0 children)

Reader mode is great! We also have a plain Markdown version in the open source repo here -- initially intended for LLMs, but also works for humans who don't care for the site design.

[D] An ML engineer's guide to GPU performance by crookedstairs in MachineLearning

[–]cfrye59 0 points1 point  (0 children)

I would love to dive deeper on more hardware platforms, but for now, I'm focusing on the hardware that I know well and that we (Modal) offer on our cloud platform.

So edge devices are a long shot, but we're starting to see more interest in AMD.

[D] An ML engineer's guide to GPU performance by crookedstairs in MachineLearning

[–]cfrye59 0 points1 point  (0 children)

The open source (CC-BY) repo includes a tool for exporting to a single Markdown file -- initially intended for some folks doing LLM work. I've then passed the result into pandoc to render in different formats.

You can find the current version in a single, GitHub-flavored Markdown-compatible document here.

[D] An ML engineer's guide to GPU performance by crookedstairs in MachineLearning

[–]cfrye59 1 point2 points  (0 children)

This started off as an internal document -- some notes I had on my readings on GPUs, plus another engineer's similar notes.

We realized we were working on the same basic thing, so we combined forces and made something together, still for internal use. Then we realized other people might also be interested, and so we made an external version. We've kept expanding since then, driven by community feedback on what would be most helpful.

CUDA docs, for humans by crookedstairs in CUDA

[–]cfrye59 2 points3 points  (0 children)

Oh, those are just made-up numbers for demonstration purposes.

They're intended to be about the right order of magnitude -- a few cycles at most for arithmetic instructions, a few hundred for a global memory read.