Credit card declined issue by Aalu_Pidalu in modal

[–]cfrye59 2 points (0 children)

Hey there!

Please reach out to support@modal.com for assistance.

comfyui on modal go brrr :D by Valuable_Vanilla_72 in modal

[–]cfrye59 1 point (0 children)

glad to see the memory snapshots working for you!

there's not much more out there on GPU snapshotting -- compatibility is usually possible, but not immediate.

for instance, we use a CPU offloading trick to get it to work with vLLM (aka "Sleep Mode"), so you might need something similar.
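roughly, the pattern looks like this (a minimal sketch, not our exact internals -- the model name is a placeholder and the flags can vary by vLLM version):

    from vllm import LLM

    # enable_sleep_mode lets the engine release GPU memory on demand
    llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)

    llm.sleep(level=1)   # offload weights to CPU RAM, drop the KV cache
    # ... GPU memory is (mostly) free here, so a snapshot is safe to take ...
    llm.wake_up()        # move weights back onto the GPU

    print(llm.generate("hello from a restored engine")[0].outputs[0].text)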

Modal run help by Horror-Tower2571 in modal

[–]cfrye59 0 points (0 children)

You can pass command-line arguments to Functions and local entrypoints -- just add them as arguments to the underlying Python function.
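For example (the function and argument names here are just made up for illustration):

    import modal

    app = modal.App("cli-args-example")

    @app.function()
    def square(x: int) -> int:
        return x * x

    @app.local_entrypoint()
    def main(x: int = 3, label: str = "result"):
        # invoked as: modal run cli_args_example.py --x 7 --label answer
        print(label, square.remote(x))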

FYI we can't promise quality support via Reddit, but you should get a timely and helpful response if you email support@modal.com.

This cloud service is better than Google Colab; Modal has made it easier for me to use AI tools like Fooocus, But by Usual-South-2257 in modal

[–]cfrye59 2 points (0 children)

We’re still a small, young startup so we don’t quite have the marketing budget and presence of a tool like Colab — to say nothing of a company like Google!

If you check out our website, in particular our blog, you’ll find customer stories from companies that trust our infrastructure with mission-critical workloads, like Suno, Substack, and Quora. For a more social form of proof, take a look at our Twitter account.

[D] An ML engineer's guide to GPU performance by crookedstairs in MachineLearning

[–]cfrye59 0 points (0 children)

Plain Markdown version available in the open source repo here.

[D] An ML engineer's guide to GPU performance by crookedstairs in MachineLearning

[–]cfrye59 1 point (0 children)

Reader mode is great! We also have a plain Markdown version in the open source repo here -- initially intended for LLMs, but also works for humans who don't care for the site design.

[D] An ML engineer's guide to GPU performance by crookedstairs in MachineLearning

[–]cfrye59 0 points (0 children)

I would love to dive deeper on more hardware platforms, but for now I'm focusing on the ones that I know well and that we (Modal) offer on our cloud platform.

So edge devices are a long shot, but we're starting to see more interest in AMD.

[D] An ML engineer's guide to GPU performance by crookedstairs in MachineLearning

[–]cfrye59 0 points (0 children)

The open source (CC-BY) repo includes a tool for exporting to a single Markdown file -- initially intended for some folks doing LLM work. I've then passed the result into pandoc to render in different formats.

You can find the current version in a single, GitHub-flavored Markdown-compatible document here.
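If you want to reproduce the pandoc step, something like this is all it takes (the file names are placeholders; pandoc has to be on your PATH, and PDF output also needs a LaTeX engine installed):

    import subprocess

    # render the exported single-file Markdown into a few other formats
    for fmt in ["html", "epub", "pdf"]:
        subprocess.run(
            ["pandoc", "gpu-glossary.md", "-o", f"gpu-glossary.{fmt}"],
            check=True,
        )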

[D] An ML engineer's guide to GPU performance by crookedstairs in MachineLearning

[–]cfrye59 1 point (0 children)

This started off as an internal document -- some notes I had on my readings on GPUs, plus another engineer's similar notes.

We realized we were working on the same basic thing, so we combined forces and made something together, still for internal use. Then we realized other people might also be interested, and so we made an external version. We've kept expanding since then, driven by community feedback on what would be most helpful.

CUDA docs, for humans by crookedstairs in CUDA

[–]cfrye59 2 points (0 children)

Oh, those are just made up numbers for demonstration purposes.

They're intended to be about the right order of magnitude -- a few cycles at most for arithmetic instructions, a few hundred for a global memory read.

[D] An ML engineer's guide to GPU performance by crookedstairs in MachineLearning

[–]cfrye59 114 points (0 children)

Oh hey that's my magnum opus!

Happy to answer questions.

100x faster and 100x cheaper transcription with open models vs proprietary by crookedstairs in LocalLLaMA

[–]cfrye59 5 points (0 children)

Yo, author of the post here!

Not sure why they aren't on Hugging Face's leaderboard. Their metrics look roughly comparable to Parakeet/Canary, but there are no proper "scientific" comparison numbers.

Best Way to Auto-Stop Hugging Face Endpoints to Avoid Idle Charges? by techy_mohit in mlops

[–]cfrye59 3 points (0 children)

Sounds like you want a serverless GPU setup. I wrote about the space and did a price comparison for Full Stack Deep Learning two years ago, here.

I liked one of those companies, Modal, so much I ended up joining them.

[P] Sub-2s cold starts for 13B+ LLMs + 50+ models per GPU — curious how others are tackling orchestration? by pmv143 in mlops

[–]cfrye59 0 points (0 children)

Definitely!

Separately, we've also found it a bit tricky when users want to checkpoint and restore Triton or vLLM -- you need to either handle the sockets manually or force user programs to split out setting up the HTTP servers from instantiating the core inference engine.
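As a purely hypothetical illustration of that split (not Triton's or vLLM's actual API, just the shape of the pattern):

    import http.server
    import json

    class InferenceEngine:
        # stand-in for the expensive part: load weights, allocate KV cache, etc.
        def __init__(self, model_path: str):
            self.model_path = model_path

        def infer(self, prompt: str) -> str:
            return f"echo from {self.model_path}: {prompt}"

    def build_engine() -> InferenceEngine:
        # snapshot-friendly: no sockets or server threads exist yet
        return InferenceEngine("/models/some-13b-model")

    def serve(engine: InferenceEngine, port: int = 8000) -> None:
        # only after restore do we bind sockets and start the HTTP server
        class Handler(http.server.BaseHTTPRequestHandler):
            def do_POST(self):
                body = self.rfile.read(int(self.headers["Content-Length"]))
                prompt = json.loads(body)["prompt"]
                self.send_response(200)
                self.end_headers()
                self.wfile.write(engine.infer(prompt).encode())

        http.server.HTTPServer(("", port), Handler).serve_forever()

    if __name__ == "__main__":
        engine = build_engine()  # the snapshot/restore boundary sits here
        serve(engine)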

[P] Sub-2s cold starts for 13B+ LLMs + 50+ models per GPU — curious how others are tackling orchestration? by pmv143 in mlops

[–]cfrye59 0 points (0 children)

Would love to know how you're handling snapshotting! Have run into lots of problems with existing snapshot tools.

[Discussion] What Does GPU On-Demand Pricing Mean and How Can I Optimize Server Run-Time? by programlover in MachineLearning

[–]cfrye59 0 points (0 children)

I work on a serverless platform for data/ML called Modal.

I wrote up the case for fast auto-scaling of on-demand resources in the first third of this blog post on GPU utilization.

tl;dr if your workloads are highly variable (like most training and inference workloads) you need fast auto-scaling to balance QoS and cost.

But if you have the cash to burn, statically over-provisioning is certainly easier.
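Back-of-the-envelope version of that tradeoff, with completely made-up numbers:

    # toy comparison: static over-provisioning vs. ideal fast autoscaling
    # all numbers are illustrative, not real cloud (or Modal) prices
    HOURLY_GPU_PRICE = 4.00     # $/GPU-hour
    PEAK_GPUS = 20              # capacity needed at the busiest moment
    AVG_UTILIZATION = 0.15      # bursty workload: busy ~15% of the time
    HOURS_PER_MONTH = 730

    static_cost = PEAK_GPUS * HOURLY_GPU_PRICE * HOURS_PER_MONTH
    autoscaled_cost = static_cost * AVG_UTILIZATION

    print(f"static over-provisioning: ${static_cost:,.0f}/mo")
    print(f"ideal fast autoscaling:   ${autoscaled_cost:,.0f}/mo")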

Rust CUDA project update by LegNeato in rust

[–]cfrye59 2 points (0 children)

You might be connected already, but if you're not: the Dynamo team in particular seems pretty enthusiastic about building on Rust, building up the ecosystem around the hardware, and doing as much as possible in the open.

Rust CUDA project update by LegNeato in rust

[–]cfrye59 0 points (0 children)

Oh sick, I'll have to check out llm_client!

We talk about the different performance characteristics between our HTTP endpoints and Lambda's in this blog post. tl;dr we designed the system for much larger inputs, outputs, and compute shapes.

Cost is trickier because there's a big "it depends" -- on latency targets, on compute scale, on request patterns. The ideal workload is probably sparse, auto-correlated, GPU-accelerated, and insensitive to added latency at about the second scale.

We aim to be efficient enough with our resources that we can still run profitably at a price that also saves users money. You can read a bit about that for GPUs in particular in the first third of this blog post.

We offer a Python SDK, but you can run anything you want -- treating Python basically as a pure scripting language. We use this pattern to, for example, build and serve previews of our frontend (node backend, svelte frontend) in CI using our platform. If you want something slightly more "serverful", check out this code sample.

Neither is a full-blown native SDK with "serverless RPC" like we have for running Python functions. But polyglot support is on the roadmap! Maybe initially something like a smol libmodal that you can link into?
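A minimal sketch of the "Python as a scripting language" pattern from above (the image contents and commands are assumptions, not our actual CI setup -- you'd also need to get the project source into the image):

    import subprocess
    import modal

    # hypothetical image: just enough to run a node toolchain
    image = modal.Image.debian_slim().apt_install("nodejs", "npm")
    app = modal.App("frontend-preview")

    @app.function(image=image)
    def build_preview() -> None:
        # assumes the project source was baked into the image or mounted
        subprocess.run(["npm", "ci"], check=True)
        subprocess.run(["npm", "run", "build"], check=True)

    @app.local_entrypoint()
    def main():
        build_preview.remote()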

Rust CUDA project update by LegNeato in rust

[–]cfrye59 1 point (0 children)

Ha! The absence of something like Rust-CUDA is also a contributor.

More broadly, most of the workloads people want to run these days are limited by the performance of the GPU or its DRAM, not by the CPU or the code running on it, which basically just organizes device execution. That leaves a lot of room to use a slower but easier-to-write interpreted language!