QAT-3DGS Bundle (premium — full SH retrain)
The splatforge-qat-3dgs-bundle preset is the
retrain leg of the QAT-3DGS recipe for vanilla
Inria 3DGS PLYs. It accepts a bundle (PLY + COLMAP cameras + GT
images) and runs a 5000-iter int8 quant-aware finetune on A100
against the 45 f_rest_* SH coefficients — the
73%-of-bytes channel that the lossless single-PLY tier cannot
compress. Projected PLY save: ~55%.
Two tiers, one recipe
The QAT-3DGS recipe has two tiers; they share the same on-disk output format (a smaller Inria 3DGS PLY) but trade different constraints for different savings:
| Tier | Input | Output PLY save | ΔPSNR | Time / Cost |
|---|---|---|---|---|
| splatforge-qat-3dgs | Single PLY | ~5% (live, validated) | 0 dB (lossless) | ~30 s, free |
| splatforge-qat-3dgs-bundle | PLY + cameras + images | ~55% (projected) | ≥ −0.3 dB target | ~5 min A100, premium |
The single-PLY tier is strictly lossless — the encoder asserts a bit-exact round-trip before emitting and refuses to ship anything it can't prove is reversible. The bundle tier is not lossless: it pushes f_rest onto an int8 lattice and uses the 5000-iter finetune to absorb the quant noise into the other Gaussian parameters. The callback returns the honest PSNR delta vs the customer's pre-finetune state — some scenes will land slightly positive (the finetune over-corrects for geometry drift the original training under-resolved), some will land slightly negative.
Why ~55% (the headline)
A vanilla Inria 3DGS PLY at SH degree 3 has 62 fp32 columns per
vertex: x/y/z, nx/ny/nz,
f_dc_0..2, f_rest_0..44,
opacity, scale_0..2,
rot_0..3. The 45 f_rest_* SH
coefficients dominate — 45 × 4 = 180 bytes/vertex,
or 73% of the 248-byte per-vertex footprint. The remaining 27%
is geometry (xyz + scale + rot), alpha (opacity), DC color
(f_dc), and the zeros-only normals.
Switching the 45 f_rest columns from fp32 to int8 with
per-channel symmetric scales compresses that 73% slice by 4×.
Projected save on the 287 MB bonsai PLY: 1.16M vertices ×
45 channels × 3 bytes saved per channel ≈ 156 MB. The
lossless 4.84% from the single-PLY tier (stripped
nx/ny/nz) stacks on top. Total: ~55% of the
original PLY.
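The byte accounting above can be reproduced in a few lines. This is only an arithmetic sketch: the column counts come from the property list above, and the vertex count is the exact bonsai figure from the smoke-target table further down.

```python
# Per-vertex column counts of a vanilla Inria 3DGS PLY at SH degree 3.
COLUMNS = {
    "xyz": 3,
    "normals": 3,
    "f_dc": 3,
    "f_rest": 45,
    "opacity": 1,
    "scale": 3,
    "rot": 4,
}
FP32_BYTES = 4
INT8_BYTES = 1

n_columns = sum(COLUMNS.values())
bytes_per_vertex = n_columns * FP32_BYTES
f_rest_share = COLUMNS["f_rest"] * FP32_BYTES / bytes_per_vertex
normals_share = COLUMNS["normals"] * FP32_BYTES / bytes_per_vertex

# Bytes saved by storing f_rest as int8 instead of fp32.
n_vertices = 1_157_141  # bonsai vertex count from the smoke-target table
f_rest_saved = n_vertices * COLUMNS["f_rest"] * (FP32_BYTES - INT8_BYTES)

print(f"{n_columns} columns, {bytes_per_vertex} bytes per vertex")
print(f"f_rest share of the file:  {f_rest_share:.1%}")
print(f"normals share of the file: {normals_share:.2%}")
print(f"f_rest bytes saved:        {f_rest_saved / 1e6:.1f} MB")
```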
Naive post-hoc int8 quantization of f_rest_* destroys render
quality: the SH coefficients drive the view-dependent SH→RGB
color, so int8 quant noise propagates straight to pixel error. The
finetune absorbs that noise: the forward pass applies
fake_quant_int8(f_rest) with a straight-through
estimator, and the backward pass receives the full fp32 gradient. AdamW with
cosine LR decay finetunes f_dc + f_rest (in its int8
representation) + opacity + scale + rot for 5000 iters, with an
L1 + SSIM loss between the renderer output and the customer's GT
images.
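A minimal sketch of per-channel symmetric int8 fake-quant with a straight-through estimator follows. It is written in plain Python so it runs without dependencies; the actual finetune presumably operates on PyTorch tensors. fake_quant_int8 is the name used above, while channel_scales, ste_backward, and the max-abs/127 scale rule are assumptions made for illustration.

```python
def channel_scales(rows):
    """One symmetric scale per column: the largest |value| maps to 127.

    Assumption: scales are derived from the max absolute value per channel.
    """
    scales = []
    for c in range(len(rows[0])):
        peak = max(abs(row[c]) for row in rows)
        scales.append(peak / 127.0 if peak > 0 else 1.0)
    return scales


def fake_quant_int8(rows, scales):
    """Forward pass: snap each value to its channel's int8 lattice, keep it as float."""
    return [
        [max(-127, min(127, round(v / s))) * s for v, s in zip(row, scales)]
        for row in rows
    ]


def ste_backward(grad_output):
    """Straight-through estimator: treat rounding as the identity function.

    The incoming gradient is passed through unchanged. In PyTorch the same
    effect is commonly written as x + (fake_quant(x) - x).detach().
    """
    return grad_output


# Two vertices, two channels: every value moves by at most half a lattice step.
f_rest = [[0.5, -2.0], [-0.25, 1.0]]
scales = channel_scales(f_rest)
for row in fake_quant_int8(f_rest, scales):
    print(row)
```

Because the rounding error is bounded by half a step per channel, the optimizer only has to compensate for small, structured perturbations rather than arbitrary noise.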
Bundle layout (required)
Pack a tar / tar.gz / tgz with the following structure. The
encoder accepts both flat layout and one level of nesting, so
tar -czf bundle.tar.gz bonsai/ works without
flattening:
bundle.tar.gz
├── point_cloud.ply          # vanilla Inria 3DGS PLY (any iteration)
├── sparse/
│   └── 0/                   # COLMAP sparse model
│       ├── cameras.bin (or cameras.txt)
│       ├── images.bin (or images.txt)
│       └── points3D.bin
└── images/                  # GT images referenced by sparse/0/images
    ├── DSCF...JPG
    └── ...

The endpoint validates the layout up-front and surfaces a customer-actionable error via the callback if anything is missing. Minimum 8 GT images required (the train/test split needs both legs). 1 GB hard cap on bundle size; bundles larger than that should pre-resize images or split scenes.
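A bundle can be checked locally before upload. The sketch below is not the endpoint's validator; it mirrors the documented rules under a few assumptions: the 1 GB cap is read as 2**30 bytes, points3D.bin is required exactly as shown in the tree, and every file under images/ counts as a GT image. The names validate_layout and validate_bundle are hypothetical.

```python
import os
import tarfile

MAX_BUNDLE_BYTES = 1 << 30  # assumption: the "1 GB" cap read as 2**30 bytes
MIN_IMAGES = 8              # minimum GT image count stated above


def _check(rel_names):
    """Check file paths, relative to the bundle root, against the documented layout."""
    files = set(rel_names)
    problems = []
    if "point_cloud.ply" not in files:
        problems.append("missing point_cloud.ply")
    for stem in ("cameras", "images"):
        if not {f"sparse/0/{stem}.bin", f"sparse/0/{stem}.txt"} & files:
            problems.append(f"missing sparse/0/{stem}.bin (or .txt)")
    if "sparse/0/points3D.bin" not in files:
        problems.append("missing sparse/0/points3D.bin")
    n_images = sum(1 for f in files if f.startswith("images/") and not f.endswith("/"))
    if n_images < MIN_IMAGES:
        problems.append(f"found {n_images} GT images, need at least {MIN_IMAGES}")
    return problems


def validate_layout(names):
    """Accept a flat layout or one wrapping directory; return a list of problems."""
    names = [n[2:] if n.startswith("./") else n for n in names]
    prefixes = [""] + sorted({n.split("/", 1)[0] + "/" for n in names if "/" in n})
    best = None
    for prefix in prefixes:
        rel = [n[len(prefix):] for n in names if n.startswith(prefix)]
        problems = _check(rel)
        if not problems:
            return []
        if best is None or len(problems) < len(best):
            best = problems  # remember the closest match for error reporting
    return best


def validate_bundle(path):
    """Validate a tar / tar.gz / tgz on disk; an empty list means the bundle looks OK."""
    if os.path.getsize(path) > MAX_BUNDLE_BYTES:
        return ["bundle exceeds the 1 GB cap"]
    with tarfile.open(path, "r:*") as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
    return validate_layout(names)
```

Calling validate_bundle("bundle.tar.gz") returns an empty list for a well-formed bundle and a list of readable messages otherwise.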
What happens server-side
- Browser uploads the bundle to Vercel Blob via a presigned PUT.
- Worker validates the preset and forwards { preset: "splatforge-qat-3dgs-bundle", blob_url, callback_url } to the private Modal /qat-3dgs-bundle endpoint.
- Endpoint extracts + validates the bundle layout. Layout violations surface via the callback before any GPU time is consumed.
- Inria Scene loads the customer's PLY + COLMAP scene. A baseline PSNR is computed with the un-patched renderer for the honest delta-report.
- The renderer's render() is monkey-patched to apply per-channel symmetric int8 fake-quant on pc._features_rest with a straight-through estimator. All other Gaussian parameters flow unchanged.
- 5000-iter AdamW finetune on the customer's GT images. L1 + SSIM loss with lambda_dssim = 0.2; cosine LR decay on f_rest (5e−5), f_dc (2.5e−4), opacity (5e−3), scale (5e−4), rot (1e−4); xyz frozen so the customer's layout doesn't drift.
- _features_rest is permanently snapped to the int8 lattice and saved via the canonical Inria save_ply.
- Inria render.py + metrics.py run a canonical eval pass with the un-patched renderer. This is the customer-facing PSNR: what a downstream consumer of the saved PLY would actually see.
- The fp32-on-int8-lattice PLY is packed through the int8-column codec: 45 f_rest_* properties switch from float to char (int8), per-channel scales encoded in a comment quantized_field f_rest int8 channels=45 scale_b64=... header line. Round-trip is verified before upload.
- Result is uploaded to Vercel Blob and returned via the callback with the honest per-scene numbers.
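The int8-column packing step can be sketched as follows. The byte layout of scale_b64 is not specified here, so this example assumes the per-channel scales are packed as little-endian float32 before base64 encoding; all function names are hypothetical, and the round-trip check only illustrates the idea of verifying reversibility before upload.

```python
import base64
import struct


def f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def quantized_field_comment(scales):
    """Build the header comment line. Assumption: scales are little-endian float32."""
    raw = struct.pack(f"<{len(scales)}f", *scales)
    b64 = base64.b64encode(raw).decode("ascii")
    return f"comment quantized_field f_rest int8 channels={len(scales)} scale_b64={b64}"


def decode_scales(comment_line):
    """Recover the per-channel scales from the comment line."""
    fields = dict(tok.split("=", 1) for tok in comment_line.split() if "=" in tok)
    raw = base64.b64decode(fields["scale_b64"])
    return list(struct.unpack(f"<{int(fields['channels'])}f", raw))


def to_int8(values, scale):
    """Map fp32 values that already sit on the lattice to their int8 codes."""
    return [round(v / scale) for v in values]


def from_int8(codes, scale):
    """Map int8 codes back to fp32 values."""
    return [f32(q * scale) for q in codes]


def roundtrip_ok(values, scale):
    """True if encoding then decoding reproduces the fp32 column exactly."""
    codes = to_int8(values, scale)
    in_range = all(-127 <= q <= 127 for q in codes)
    return in_range and from_int8(codes, scale) == values


scales = [f32(0.01 * (i + 1)) for i in range(45)]
line = quantized_field_comment(scales)
column = [f32(q * scales[0]) for q in (-127, 0, 5, 127)]
print(line[:72] + "...")
print("round-trip ok:", roundtrip_ok(column, scales[0]))
```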
Projected smoke target
Bench target on the canonical
bonsai_mipnerf360_iter7k.ply (same scene as the
single-PLY tier's smoke):
| Field | Value |
|---|---|
| scene | bonsai (Mip-NeRF 360, Inria 3DGS iter 7k) |
| n_vertices | 1,157,141 |
| sh_channels | 45 |
| size_bytes_in | 286.97 MB |
| projected size_bytes_out | ~128 MB |
| projected ply_save_pct | ~55% |
| finetune_iters | 5,000 |
| ΔPSNR target | ≥ −0.3 dB (ship gate) |
Per-scene variation expected. Some scenes may land neutral or slightly positive (bonsai bench target); some indoor scenes with extreme view-dependent specular reflections may land at the lower end of the target band as the int8 noise compresses the SH coefficients that encode those highlights. The callback always reports honest numbers — never the projected target.
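A client might sanity-check a finished job against the ship gate along these lines. The field names are those of the callback payload documented under API callback shape; the check_callback helper, the tolerances, and the reading of delta_psnr_db as psnr_canonical minus psnr_baseline are assumptions (the last one is consistent with the example payload).

```python
SHIP_GATE_DB = -0.3  # ΔPSNR target from the table above


def check_callback(payload):
    """Cross-check the reported numbers and compare ΔPSNR with the ship gate."""
    # Assumption: delta_psnr_db = psnr_canonical - psnr_baseline.
    delta = payload["psnr_canonical"] - payload["psnr_baseline"]
    save_pct = 100.0 * (1.0 - payload["size_bytes_out"] / payload["size_bytes_in"])
    return {
        "delta_matches": abs(delta - payload["delta_psnr_db"]) < 0.01,
        "save_matches": abs(save_pct - payload["ply_save_pct"]) < 0.1,
        "passes_gate": payload["delta_psnr_db"] >= SHIP_GATE_DB,
    }


example = {
    "size_bytes_in": 286968700,
    "size_bytes_out": 128400000,
    "ply_save_pct": 55.2,
    "delta_psnr_db": -0.12,
    "psnr_baseline": 28.81,
    "psnr_canonical": 28.69,
}
print(check_callback(example))
```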
API callback shape
{
"status": "done",
"output_url": "https://...vercel-storage.com/jobs/<id>/scene_qat3dgs_bundle.ply",
"size_bytes_in": 286968700,
"size_bytes_out": 128400000,
"ply_save_pct": 55.2,
"delta_psnr_db": -0.12,
"psnr_baseline": 28.81,
"psnr_canonical": 28.69,
"ssim_canonical": 0.881,
"lpips_canonical": 0.143,
"lossless": false,
"preset": "splatforge-qat-3dgs-bundle",
"n_vertices": 1157141,
"sh_channels": 45,
"f_rest_bytes_in": 208285380,
"f_rest_bytes_out": 52071345,
"n_images_used": 31,
"finetune_iters": 5000,
"train_wall_secs": 312.4
}
Reader compatibility
The encoded PLY remains a valid PLY file — it just declares
char (i1) instead of float (f4) for the
45 f_rest_* properties. Any PLY parser that respects
per-property dtype declarations (plyfile, gsplat, SplatForge) reads
the int8 values correctly. Decoders that hard-code "f_rest is
always f4" will see bogus values; the SplatForge plugin is the
reference implementation that round-trips properly via the
quantized_field header marker.
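To illustrate what respecting per-property dtype declarations means, here is a small sketch of a reader that derives the vertex record layout from the header instead of assuming every column is f4. It is not the SplatForge plugin's code: it assumes a binary little-endian PLY, ignores list properties, and uses hypothetical function names.

```python
import struct

# PLY scalar type names mapped to struct format codes.
PLY_TYPES = {
    "char": "b", "int8": "b", "uchar": "B", "uint8": "B",
    "short": "h", "int16": "h", "ushort": "H", "uint16": "H",
    "int": "i", "int32": "i", "uint": "I", "uint32": "I",
    "float": "f", "float32": "f", "double": "d", "float64": "d",
}


def parse_header(header_text):
    """Return (vertex_count, [(name, struct_code), ...]) for the vertex element."""
    count, props, in_vertex = 0, [], False
    for line in header_text.splitlines():
        tok = line.split()
        if not tok:
            continue
        if tok[0] == "element":
            in_vertex = tok[1] == "vertex"
            if in_vertex:
                count = int(tok[2])
        elif tok[0] == "property" and in_vertex and tok[1] != "list":
            props.append((tok[2], PLY_TYPES[tok[1]]))
    return count, props


def vertex_struct(props):
    """Packed little-endian record layout built from the declared types."""
    return struct.Struct("<" + "".join(code for _, code in props))


header = "\n".join([
    "ply",
    "format binary_little_endian 1.0",
    "element vertex 2",
    "property float x",
    "property char f_rest_0",
    "property float opacity",
    "end_header",
])
count, props = parse_header(header)
record = vertex_struct(props)
print(count, props, record.size)
```

A reader built this way picks up the 1-byte f_rest columns automatically; multiplying the int8 codes by the per-channel scales from the quantized_field comment then restores the fp32 values.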