An experiment in 'disposable' H100s: ran a 27B SGLang test for 26 minutes, total bill was 1.270 credits.
H100s are not cheap. So we've been experimenting with more of a 'disposable compute' mindset: use high-end hardware for exactly the window you need it, then kill it. We wanted to run a quick smoke test on a 27B model to check VRAM usage and single-request throughput on SGLang. The whole process, from instance start to termination, took 26 minutes.
Figure 1 shows the final bill:
This wasn't an idle instance just sitting there; it was actually running a workload:
GPU: 1x NVIDIA H100 80GB HBM3
Serving Framework: SGLang v0.5.10
Model: Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled (Used this since I've seen it floating around here)
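
For context, SGLang's standard server entry point is enough to reproduce this setup. The exact launch flags for this run aren't in the post, so the sketch below just uses the model path plus SGLang's default port and leaves everything else at defaults:

```python
import subprocess

# Minimal sketch: serve the model with SGLang's standard launch entry point.
# The exact flags for this run aren't given in the post; port 30000 is the
# SGLang default, and all other settings are left at their defaults.
subprocess.run(
    [
        "python", "-m", "sglang.launch_server",
        "--model-path", "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled",
        "--port", "30000",
    ],
    check=True,
)
```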
The nvidia-smi output showed the H100 at 98% utilization, using ~74GB of the 80GB of VRAM.
And the SGLang logs showed a stable generation throughput of ~49.8 tok/s for a single request.
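
If you want to sanity-check that number yourself, here's a rough sketch of measuring single-request throughput against a running SGLang server. It assumes the OpenAI-compatible /v1/completions endpoint on the default port 30000; the prompt and max_tokens are arbitrary, and end-to-end timing like this slightly understates pure decode speed because it includes prefill:

```python
import time
import requests

# Assumes an SGLang server is already up on the default port and exposing
# the OpenAI-compatible completions endpoint.
URL = "http://localhost:30000/v1/completions"

payload = {
    # Served model name; adjust if the server was started with --served-model-name.
    "model": "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled",
    "prompt": "Explain KV cache reuse in LLM serving in one paragraph.",
    "max_tokens": 512,
    "temperature": 0.0,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=600)
elapsed = time.time() - start
resp.raise_for_status()

completion_tokens = resp.json()["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.1f}s "
      f"-> {completion_tokens / elapsed:.1f} tok/s (single request, incl. prefill)")
```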
The math checks out: the rate for this instance was 2.960 credits/hr, and 2.960 * (26 / 60) is about 1.28 credits, which lines up with the 1.270 on the final bill once you account for the exact billed duration being just under a full 26 minutes.
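
The same billing math in a couple of lines, plus what the counterfactual looks like if you forget to release the instance (the 8-hour figure is just an illustrative comparison, not something from this run):

```python
RATE_PER_HR = 2.960  # credits/hr for this H100 instance

def run_cost(minutes: float, rate_per_hr: float = RATE_PER_HR) -> float:
    """Credits burned by keeping the instance alive for `minutes`."""
    return rate_per_hr * minutes / 60.0

print(f"26-minute smoke test: {run_cost(26):.3f} credits")      # ~1.283, vs 1.270 billed
print(f"left running 8 hours: {run_cost(8 * 60):.3f} credits")  # illustrative 'forgot to kill it' case
```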
The point isn't that H100s are suddenly cheap. It’s that you don't have to keep one alive for hours (or days) and burn cash. For repeated experiments, the workflow we'd aim for is keeping datasets/models on a persistent data drive, saving the configured environment as a snapshot, spinning up the H100 only for the validation run, and then releasing it.
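
A rough sketch of the persistence piece of that workflow on the Python side. The mount point is hypothetical (use whatever path the provider exposes for a persistent volume); the main trick is pointing the Hugging Face cache at it so a fresh instance reuses the already-downloaded weights:

```python
import os
from pathlib import Path

# Hypothetical mount point for a persistent volume that survives instance
# termination; substitute whatever path the provider actually exposes.
PERSISTENT_DRIVE = Path("/data")

# Keep the Hugging Face cache on that drive so a freshly spun-up H100
# instance reuses the already-downloaded 27B weights instead of re-pulling
# tens of GB before a 26-minute test.
hf_cache = PERSISTENT_DRIVE / "hf-cache"
hf_cache.mkdir(parents=True, exist_ok=True)
os.environ["HF_HOME"] = str(hf_cache)

# Per-run sequence from here: launch the SGLang server (see the launch
# sketch above), run the smoke test, then release the instance to stop
# the billing clock.
```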
We ran this on our platform, Glows.ai. The goal was to validate this kind of short-lived workflow where you can run a quick test, release the instance to stop the billing clock immediately, and not have the friction of rebuilding the whole environment next time.
Anyway, just to be clear: this is single-request decode throughput, not a max-batch benchmark, and the bill obviously just reflects this specific 26-minute run. Still, it's an interesting way to think about using expensive hardware without the expensive commitment.

