Capture pipeline · end-to-end · measured

Photos in. 77 MB compressed splat out.

One Modal job. Upload a folder of photos; receive a Scaffold-GS QAT-Bundle splat. No PLY required, no client-side training, no hand-holding through COLMAP. Measured on bonsai (292 photos, Mip-NeRF 360 sequence): 49.5 min end-to-end, $2.07 Modal spend.

Source: Mip-NeRF 360 (bonsai) · 292 photos · baseline 32.89 dB → 33.41 dB (+0.52 dB) · 130 MB → 77 MB (40.5% saved)
  1. Stage 1

    COLMAP

    sparse 3D from photos
    Wall: 5.4 min · Modal $: $0.05

    Feature extraction (CPU SIFT) → exhaustive matcher → mapper. Outputs sparse/0/{cameras,images,points3D}.bin, which Scaffold-GS consumes via -s <workspace>. Input: 292 photos sharing a single SIMPLE_PINHOLE intrinsic.
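    For readers who want to reproduce this stage locally, here is a minimal sketch of the three COLMAP invocations as argument lists. The workspace layout (photos in <workspace>/images, database at <workspace>/database.db) and the helper name colmap_commands are assumptions, not the pipeline's actual code. GPU flag names differ between COLMAP releases, so check `colmap feature_extractor -h` on your install.

```python
import shlex
from pathlib import PurePosixPath


def colmap_commands(workspace: str, use_gpu: bool = False) -> list[list[str]]:
    """Build feature_extractor, exhaustive_matcher and mapper commands.

    Hypothetical helper: the directory layout below is assumed.
    """
    ws = PurePosixPath(workspace)
    db = str(ws / "database.db")
    images = str(ws / "images")
    # The mapper writes its first model to <sparse>/0; the directory
    # must already exist before the mapper runs.
    sparse = str(ws / "sparse")
    gpu = "1" if use_gpu else "0"
    return [
        ["colmap", "feature_extractor",
         "--database_path", db,
         "--image_path", images,
         "--ImageReader.single_camera", "1",           # one shared intrinsic
         "--ImageReader.camera_model", "SIMPLE_PINHOLE",
         "--SiftExtraction.use_gpu", gpu],              # "0" = CPU SIFT
        ["colmap", "exhaustive_matcher",
         "--database_path", db,
         "--SiftMatching.use_gpu", gpu],
        ["colmap", "mapper",
         "--database_path", db,
         "--image_path", images,
         "--output_path", sparse],
    ]


# Print the commands instead of running them, so they can be reviewed first.
for cmd in colmap_commands("workspace"):
    print(shlex.join(cmd))
```

    Each list can be passed to subprocess.run(cmd, check=True) once the paths match your workspace.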

  2. Stage 2

    Scaffold-GS

    30k iterations · anchor representation
    Wall: 37.4 min · Modal $: $1.56

    Upstream Scaffold-GS train.py at voxel_size=0.001, ratio=1, n_offsets=10. Trained PSNR 32.83 dB (test eval @ 30k iter). 411k anchors. Output is a 130 MB anchor PLY + 3 MLPs (color, cov, opacity).
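    As a sanity check on the 130 MB figure, the sketch below estimates the anchor PLY size from the anchor count. The per-anchor property layout (position, normal, offsets, a 32-dim feature, opacity, 6-dim scaling, quaternion, all float32) is an assumption based on the upstream Scaffold-GS defaults and is not stated on this page. The PLY header and the MLP weights are ignored.

```python
# Assumed per-anchor layout, float32 per value (not confirmed by this page).
N_OFFSETS = 10   # from the training config above
FEAT_DIM = 32    # assumption: upstream default anchor feature size


def floats_per_anchor(n_offsets: int = N_OFFSETS, feat_dim: int = FEAT_DIM) -> int:
    """Count float32 properties stored per anchor under the assumed layout."""
    xyz, normals = 3, 3
    offsets = 3 * n_offsets          # one xyz offset per neural Gaussian
    opacity, scaling, rotation = 1, 6, 4
    return xyz + normals + offsets + feat_dim + opacity + scaling + rotation


def ply_megabytes(anchors: int) -> float:
    """Payload size in MB (10^6 bytes), header excluded."""
    return anchors * floats_per_anchor() * 4 / 1e6


anchors = 411_066
print(floats_per_anchor())                 # 79
print(round(ply_megabytes(anchors), 1))    # 129.9
print(anchors * N_OFFSETS)                 # 4110660 Gaussians at most
```

    Under these assumptions the estimate lands on the reported 129.9 MB baseline, and n_offsets=10 caps the scene at roughly 4.1M neural Gaussians spawned from 411k anchors.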

  3. Stage 3

    QAT-Bundle

    quant-aware finetune + encode
    Wall: 6.6 min · Modal $: $0.46

    5,000-iter QAT finetune at lr_init=2e-4, lr_final=1e-5, with per-channel int4 quantization on anchor + offset and constant-strip on opacity. Net PSNR delta: +0.521 dB (33.41 dB post-QAT vs 32.89 dB pre-QAT). PLY size reduction: 40.5%.
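    To make "per-channel int4" concrete, here is a minimal sketch of the quantize/dequantize round trip that a QAT forward pass simulates. It assumes symmetric scaling with one scale per channel. The page does not say whether the real encoder is symmetric or asymmetric, and the function names are illustrative only. Gradient handling (for example a straight-through estimator) is left out.

```python
def fake_quant_int4(rows: list[list[float]]) -> tuple[list[list[int]], list[float]]:
    """Symmetric per-channel int4 quantization.

    rows: one row per anchor, one column per channel.
    Returns integer codes in [-8, 7] and one scale per channel.
    """
    n_channels = len(rows[0])
    scales = []
    for c in range(n_channels):
        peak = max(abs(r[c]) for r in rows)
        # Map the largest magnitude in the channel to code 7.
        scales.append(peak / 7 if peak > 0 else 1.0)
    codes = [
        [max(-8, min(7, round(r[c] / scales[c]))) for c in range(n_channels)]
        for r in rows
    ]
    return codes, scales


def dequant(codes: list[list[int]], scales: list[float]) -> list[list[float]]:
    """Reconstruct approximate float values from codes and per-channel scales."""
    return [[q * s for q, s in zip(row, scales)] for row in codes]


# Two channels with very different ranges: each gets its own scale.
rows = [[0.70, -10.0], [-0.10, 2.5], [0.30, 5.0]]
codes, scales = fake_quant_int4(rows)
approx = dequant(codes, scales)
for original, restored in zip(rows, approx):
    print(original, "->", [round(v, 3) for v in restored])
```

    The per-channel scale keeps a narrow-range channel from being flattened by a wide-range one; the round-trip error in each channel is at most half that channel's scale.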

  4. Stage 4

    Upload

    Vercel Blob → public URL
    Wall: 0.1 min · Modal $: $0.00

    The 77 MB output PLY is uploaded to Vercel Blob. The customer receives a public URL via the same API callback as the PLY-in / PLY-out flow.

Total wall 49.5 min
Total Modal spend $2.07
PSNR Δ vs Scaffold-GS-baseline +0.52 dB
PLY size reduction 40.5%
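The totals can be re-derived from the per-stage figures. The short check below uses only numbers reported on this page and recomputes the wall time, spend, PSNR delta and size reduction.

```python
# (wall minutes, Modal dollars) per stage, copied from the stage cards.
stages = {
    "COLMAP":      (5.4, 0.05),
    "Scaffold-GS": (37.4, 1.56),
    "QAT-Bundle":  (6.6, 0.46),
    "Upload":      (0.1, 0.00),
}

total_min = round(sum(wall for wall, _ in stages.values()), 1)
total_usd = round(sum(cost for _, cost in stages.values()), 2)
psnr_delta = round(33.41 - 32.89, 2)                      # post-QAT minus baseline
size_saving = round((129.9 - 77.3) / 129.9 * 100, 1)      # percent of baseline PLY

print(total_min, total_usd, psnr_delta, size_saving)      # 49.5 2.07 0.52 40.5
```

Training dominates both axes: Stage 2 accounts for about three quarters of the wall time and of the spend.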

The output

QAT-Bundle PLY

Anchors: 411,066
Final PLY: 77.3 MB
Baseline PLY: 129.9 MB
Baseline PSNR: 32.89 dB
Post-QAT PSNR: 33.41 dB
Finetune iterations: 5,000
Download bonsai_qat_bundle.ply (77.3 MB)

Single-PLY Scaffold-GS artifact. Drop into any Scaffold-GS renderer (Inria reference, gsplat, splat-transform) or re-encode through SplatForge's web-mobile preset for a progressive-streaming bundle.

Reproduce

# 1. Pack your photos
zip photos.zip my-photos/*.jpg

# 2. Upload + dispatch (via splatforge CLI)
splatforge capture submit \
  --photos photos.zip \
  --preset splatforge-qat-bundle \
  --out ./out.ply

# 3. Or POST direct to the Modal app
curl -X POST https://montabano1--splatforge-capture-enqueue.modal.run/ \
  -H 'content-type: application/json' \
  -d '{
    "job_id": "demo-001",
    "preset": "capture-and-compress",
    "blob_url": "<vercel-blob-url>",
    "filename": "photos.zip",
    "inner_preset": "splatforge-qat-bundle",
    "training_iters": 30000,
    "callback_url": "<your-webhook>"
  }'

Job IDs are customer-supplied; the callback URL receives per-phase progress (fetching, colmap, training, encoding, uploading) and a terminal {status, output_url, metrics} POST.
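On the receiving side, a webhook has to tell progress messages from the terminal one. The sketch below is one way to do that, under stated assumptions: the phase names and the terminal keys (status, output_url, metrics) come from the description above, while the "phase" key name and the handle_callback helper are hypothetical. Check a real callback body before relying on this shape.

```python
# Phase order as listed above.
PHASES = ("fetching", "colmap", "training", "encoding", "uploading")


def handle_callback(payload: dict) -> str:
    """Classify one callback POST body and return a one-line summary.

    Assumption: progress bodies carry a "phase" key; the terminal body
    carries "status", "output_url" and "metrics" and no "phase".
    """
    phase = payload.get("phase")
    if phase is not None:
        if phase in PHASES:
            step = PHASES.index(phase) + 1
            return f"progress: {phase} ({step}/{len(PHASES)})"
        return f"progress: unrecognized phase {phase!r}"
    if "status" in payload:
        url = payload.get("output_url") or "no output"
        return f"done: {payload['status']} ({url})"
    return "unknown callback"


# Example bodies with made-up values, for illustration only.
print(handle_callback({"phase": "training"}))
print(handle_callback({
    "status": "succeeded",
    "output_url": "https://example.com/out.ply",
    "metrics": {},
}))
```

Keeping the classifier a pure function makes it easy to unit-test separately from whichever HTTP framework serves the webhook.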

Honest input requirements

COLMAP is the bottleneck on capture quality. For a successful run we recommend:

The Mip-NeRF 360 bonsai input (above) is a benchmark-grade sequence: a well-lit, textured indoor scene with a clean orbit. Real-world phone captures land within ±2 dB of these PSNR numbers, depending on how closely they match these conditions.

Run this on your own captures

The /capture endpoint is gated behind the Design Partner tier ($0/mo, capacity-capped) while we baseline real-world phone captures. Drop your email + one example zip and we'll wire it up.

Apply for the Design Partner tier →