QAT-3DGS Bundle (premium — full SH retrain)

The catetus-qat-3dgs-bundle preset is the retrain leg of the QAT-3DGS recipe for vanilla Inria 3DGS PLYs. It accepts a bundle (PLY + COLMAP cameras + GT images) and runs a 5000-iter int8 quant-aware finetune on an A100 against the 45 f_rest_* SH coefficients, the channel that holds 73% of the bytes and that the lossless single-PLY tier cannot compress. Projected PLY save: ~55%.

Two tiers, one recipe

The QAT-3DGS recipe has two tiers; they share the same on-disk output format (a smaller Inria 3DGS PLY) but trade different constraints for different savings:

| Tier | Input | Output PLY save | ΔPSNR | Time / Cost |
|---|---|---|---|---|
| `catetus-qat-3dgs` | Single PLY | ~5% (live, validated) | 0 dB (lossless) | ~30 s, free |
| `catetus-qat-3dgs-bundle` | PLY + cameras + images | ~55% (projected) | ≥ −0.3 dB target | ~5 min A100, premium |

The single-PLY tier is strictly lossless — the encoder asserts a bit-exact round-trip before emitting and refuses to ship anything it can't prove is reversible. The bundle tier is not lossless: it pushes f_rest onto an int8 lattice and uses the 5000-iter finetune to absorb the quant noise into the other Gaussian parameters. The callback returns the honest PSNR delta vs the customer's pre-finetune state — some scenes will land slightly positive (the finetune over-corrects for geometry drift the original training under-resolved), some will land slightly negative.

Why ~55% (the headline)

A vanilla Inria 3DGS PLY at SH degree 3 has 62 fp32 columns per vertex: x/y/z, nx/ny/nz, f_dc_0..2, f_rest_0..44, opacity, scale_0..2, rot_0..3. The 45 f_rest_* SH coefficients dominate — 45 × 4 = 180 bytes/vertex, or 73% of the 248-byte per-vertex footprint. The remaining 27% is geometry (xyz + scale + rot), alpha (opacity), DC color (f_dc), and the zeros-only normals.
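The byte budget above can be re-derived in a few lines. This is a minimal sketch that only restates the column counts listed in the paragraph; the dictionary keys are illustrative group names, not PLY property names.

```python
# Per-vertex column groups of a vanilla Inria 3DGS PLY at SH degree 3,
# as listed above. Every column is stored as a 4-byte fp32.
COLUMNS = {
    "xyz": 3,
    "normals": 3,   # nx/ny/nz
    "f_dc": 3,
    "f_rest": 45,
    "opacity": 1,
    "scale": 3,
    "rot": 4,
}
BYTES_PER_FP32 = 4

n_columns = sum(COLUMNS.values())
bytes_per_vertex = n_columns * BYTES_PER_FP32
f_rest_bytes = COLUMNS["f_rest"] * BYTES_PER_FP32
f_rest_share = f_rest_bytes / bytes_per_vertex

print(n_columns, bytes_per_vertex, f_rest_bytes)  # 62 248 180
print(f"{f_rest_share:.1%}")                      # 72.6%
```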

Switching the 45 f_rest columns from fp32 to int8 with per-channel symmetric scales compresses that 73% slice by 4×. Projected save on the 287 MB bonsai PLY: 1.16M vertices × 45 channels × 3 bytes saved per channel ≈ 156 MB. The lossless 4.84% from the single-PLY tier (stripped nx/ny/nz) stacks on top. Total: ~55% of the original PLY.
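As a rough check of the f_rest term, the sketch below plugs in the bonsai vertex count from the bench table further down (1,157,141). It ignores the PLY header and the per-channel scale metadata, and it does not model how the two tiers combine in the shipped file, so treat it as back-of-envelope arithmetic rather than the encoder's accounting.

```python
N_VERTICES = 1_157_141      # bonsai vertex count from the bench table
F_REST_CHANNELS = 45
BYTES_PER_VERTEX = 62 * 4   # 248-byte fp32 row

payload_in = N_VERTICES * BYTES_PER_VERTEX
f_rest_in = N_VERTICES * F_REST_CHANNELS * 4    # fp32 storage
f_rest_out = N_VERTICES * F_REST_CHANNELS * 1   # int8 storage
f_rest_saved = f_rest_in - f_rest_out
normals_bytes = N_VERTICES * 3 * 4              # nx/ny/nz as fp32

print(f_rest_in, f_rest_out)                 # 208285380 52071345
print(f_rest_saved)                          # 156214035
print(f"{f_rest_saved / payload_in:.1%}")    # 54.4%
print(f"{normals_bytes / payload_in:.2%}")   # 4.84%
```

The two f_rest byte counts match the `f_rest_bytes_in` and `f_rest_bytes_out` fields in the example callback.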

Naive post-hoc int8 quantization of f_rest_* destroys render quality: the SH coefficients feed the SH→RGB evaluation directly, so int8 quantization noise propagates straight to pixel error. The finetune absorbs that noise. The forward pass applies fake_quant_int8(f_rest) with a straight-through estimator; the backward pass receives the full fp32 gradient. AdamW with cosine LR decay finetunes f_dc, f_rest (seen by the renderer through its fake-quantized values), opacity, scale, and rot for 5000 iters, with an L1 + SSIM loss between the renderer output and the customer's GT images.
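To make the fake-quant idea concrete, here is a framework-free sketch of per-channel symmetric int8 quantization with a hand-written straight-through backward. It is not the service's code: the ±127 clamp, the max-abs scale rule, and the helper names `channel_scale` and `fake_quant_backward` are assumptions chosen for illustration, and a real training loop would express the same thing through an autograd framework.

```python
def channel_scale(values):
    """Symmetric per-channel scale: map the largest magnitude to 127 (assumed rule)."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0

def fake_quant_int8(values, scale):
    """Forward pass: round onto the int8 lattice, then map back to float values."""
    out = []
    for v in values:
        q = max(-127, min(127, round(v / scale)))  # integer lattice index
        out.append(q * scale)                      # value the renderer sees
    return out

def fake_quant_backward(grad_output):
    """Straight-through estimator: treat rounding as identity, so the
    upstream gradient reaches the fp32 weights unchanged."""
    return list(grad_output)

channel = [0.50, -0.25, 0.126, 0.0]   # one f_rest channel, toy values
scale = channel_scale(channel)
quantized = fake_quant_int8(channel, scale)
grads = fake_quant_backward([0.1, -0.2, 0.3, 0.0])
```

Each quantized value differs from its input by at most half a lattice step (`scale / 2`), which is the noise the finetune is asked to absorb.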

Bundle layout (required)

Pack a tar / tar.gz / tgz with the following structure. The encoder accepts both flat layout and one level of nesting, so tar -czf bundle.tar.gz bonsai/ works without flattening:

bundle.tar.gz
├── point_cloud.ply         # vanilla Inria 3DGS PLY (any iteration)
├── sparse/
│   └── 0/                  # COLMAP sparse model
│       ├── cameras.bin     (or cameras.txt)
│       ├── images.bin      (or images.txt)
│       └── points3D.bin
└── images/                 # GT images referenced by sparse/0/images
    ├── DSCF...JPG
    └── ...

The endpoint validates the layout up-front and surfaces a customer-actionable error via the callback if anything is missing. Minimum 8 GT images required (the train/test split needs both legs). 1 GB hard cap on bundle size; bundles larger than that should pre-resize images or split scenes.
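A client can run a similar check before uploading. The sketch below mirrors the tree above literally and is an assumption about what the endpoint enforces, not its actual validator; `validate_bundle` and `strip_single_root` are hypothetical helpers. It takes the file member names of the archive, for example `[m.name for m in tar.getmembers() if m.isfile()]` from the standard `tarfile` module.

```python
import posixpath

REQUIRED_SPARSE = [
    ("cameras.bin", "cameras.txt"),
    ("images.bin", "images.txt"),
    ("points3D.bin",),
]
MIN_IMAGES = 8

def strip_single_root(names):
    """Accept one level of nesting: drop a top-level directory shared by all files."""
    roots = {n.split("/", 1)[0] for n in names}
    if len(roots) == 1 and all("/" in n for n in names):
        prefix = roots.pop() + "/"
        return [n[len(prefix):] for n in names]
    return list(names)

def validate_bundle(names):
    """Return a list of human-readable layout errors (empty if the layout is valid)."""
    normalized = [posixpath.normpath(n) for n in names]  # drops a leading "./"
    files = set(strip_single_root(normalized))
    errors = []
    if "point_cloud.ply" not in files:
        errors.append("missing point_cloud.ply")
    for alternatives in REQUIRED_SPARSE:
        if not any(f"sparse/0/{alt}" in files for alt in alternatives):
            errors.append("missing sparse/0/" + " or ".join(alternatives))
    n_images = sum(1 for f in files if f.startswith("images/"))
    if n_images < MIN_IMAGES:
        errors.append(f"need at least {MIN_IMAGES} GT images, found {n_images}")
    return errors

flat = ["point_cloud.ply", "sparse/0/cameras.bin", "sparse/0/images.bin",
        "sparse/0/points3D.bin"] + [f"images/{i:04d}.JPG" for i in range(8)]
nested = ["bonsai/" + n for n in flat]

print(validate_bundle(flat))       # []
print(validate_bundle(nested))     # []
print(validate_bundle(flat[:-1]))  # ['need at least 8 GT images, found 7']
```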

What happens server-side

1. Browser uploads the bundle to Vercel Blob via a presigned PUT.
2. Worker validates the preset and forwards { preset: "catetus-qat-3dgs-bundle", blob_url, callback_url } to the private Modal /qat-3dgs-bundle endpoint.
3. Endpoint extracts and validates the bundle layout. Layout violations surface via the callback before any GPU time is consumed.
4. Inria Scene loads the customer's PLY + COLMAP scene. A baseline PSNR is computed with the un-patched renderer for the honest delta report.
5. The renderer's render() is monkey-patched to apply per-channel symmetric int8 fake-quant on pc._features_rest with a straight-through estimator. All other Gaussian parameters flow unchanged.
6. 5000-iter AdamW finetune on the customer's GT images. L1 + SSIM loss with lambda_dssim = 0.2; cosine LR decay on f_rest (5e−5), f_dc (2.5e−4), opacity (5e−3), scale (5e−4), rot (1e−4); xyz frozen so the customer's layout doesn't drift.
7. _features_rest is permanently snapped to the int8 lattice and saved via the canonical Inria save_ply.
8. Inria render.py + metrics.py run a canonical eval pass with the un-patched renderer. This is the customer-facing PSNR: what a downstream consumer of the saved PLY would actually see.
9. The fp32-on-int8-lattice PLY is packed through the int8-column codec: the 45 f_rest_* properties switch from float to char (int8), and the per-channel scales are encoded in a header comment line of the form `comment quantized_field f_rest int8 channels=45 scale_b64=...`. Round-trip is verified before upload.
10. Result is uploaded to Vercel Blob and returned via the callback with the honest per-scene numbers.
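The per-group learning-rate schedule in the finetune step can be sketched as follows. The base rates are the ones listed above; decaying to zero with no warmup is an assumption, since the text only says "cosine LR decay", and `cosine_lr` is a hypothetical helper rather than a function from the Inria codebase.

```python
import math

TOTAL_ITERS = 5000
# Base learning rates per parameter group, as listed in the finetune step.
# xyz is absent on purpose: it is frozen.
BASE_LR = {
    "f_rest": 5e-5,
    "f_dc": 2.5e-4,
    "opacity": 5e-3,
    "scale": 5e-4,
    "rot": 1e-4,
}

def cosine_lr(base_lr, step, total=TOTAL_ITERS, final_lr=0.0):
    """Cosine decay from base_lr at step 0 to final_lr at step `total`."""
    progress = min(max(step / total, 0.0), 1.0)
    return final_lr + 0.5 * (base_lr - final_lr) * (1.0 + math.cos(math.pi * progress))

for step in (0, 2500, 5000):
    rates = {name: cosine_lr(lr, step) for name, lr in BASE_LR.items()}
    print(step, rates)
```

Under these assumptions each group starts at its base rate, passes through half of it at iteration 2500, and reaches roughly zero at iteration 5000.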

Projected smoke target

Bench target on the canonical bonsai_mipnerf360_iter7k.ply (same scene as the single-PLY tier's smoke):

| Field | Value |
|---|---|
| `scene` | bonsai (Mip-NeRF 360, Inria 3DGS iter 7k) |
| `n_vertices` | 1,157,141 |
| `sh_channels` | 45 |
| `size_bytes_in` | 286.97 MB |
| `projected size_bytes_out` | ~128 MB |
| `projected ply_save_pct` | ~55% |
| `finetune_iters` | 5,000 |
| `ΔPSNR target` | ≥ −0.3 dB (ship gate) |

Expect per-scene variation. Some scenes may land neutral or slightly positive (the bonsai bench target); indoor scenes with extreme view-dependent specular reflections may land at the lower end of the target band, because int8 noise perturbs the SH coefficients that encode those highlights. The callback always reports honest numbers, never the projected target.

API callback shape

{
  "status": "done",
  "output_url": "https://...vercel-storage.com/jobs/<id>/scene_qat3dgs_bundle.ply",
  "size_bytes_in": 286968700,
  "size_bytes_out": 128400000,
  "ply_save_pct": 55.2,
  "delta_psnr_db": -0.12,
  "psnr_baseline": 28.81,
  "psnr_canonical": 28.69,
  "ssim_canonical": 0.881,
  "lpips_canonical": 0.143,
  "lossless": false,
  "preset": "catetus-qat-3dgs-bundle",
  "n_vertices": 1157141,
  "sh_channels": 45,
  "f_rest_bytes_in": 208285380,
  "f_rest_bytes_out": 52071345,
  "n_images_used": 31,
  "finetune_iters": 5000,
  "train_wall_secs": 312.4
}
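A receiving service might sanity-check the payload before accepting the output. The sketch below is one possible client-side handler, not part of the API: `summarize_callback` is a hypothetical helper, the handling of any status other than "done" is an assumption, and the sample payload is a trimmed copy of the example above.

```python
import json

SHIP_GATE_DB = -0.3  # ΔPSNR target from the tier table

def summarize_callback(payload: str) -> dict:
    """Parse a callback body and derive a few consistency checks from it."""
    body = json.loads(payload)
    if body.get("status") != "done":
        return {"ok": False, "reason": f"status={body.get('status')!r}"}
    delta = body["psnr_canonical"] - body["psnr_baseline"]
    saved = 1.0 - body["size_bytes_out"] / body["size_bytes_in"]
    return {
        "ok": True,
        "delta_psnr_db": round(delta, 2),  # recomputed from the two PSNR fields
        "meets_ship_gate": body["delta_psnr_db"] >= SHIP_GATE_DB,
        "save_pct": round(100.0 * saved, 1),
        "f_rest_ratio": body["f_rest_bytes_in"] / body["f_rest_bytes_out"],
    }

example = json.dumps({
    "status": "done",
    "size_bytes_in": 286968700,
    "size_bytes_out": 128400000,
    "delta_psnr_db": -0.12,
    "psnr_baseline": 28.81,
    "psnr_canonical": 28.69,
    "f_rest_bytes_in": 208285380,
    "f_rest_bytes_out": 52071345,
})
summary = summarize_callback(example)
print(summary["meets_ship_gate"], summary["f_rest_ratio"])  # True 4.0
```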

Reader compatibility

The encoded PLY remains a valid PLY file — it just declares char (i1) instead of float (f4) for the 45 f_rest_* properties. Any PLY parser that respects per-property dtype declarations (plyfile, gsplat, Catetus) reads the int8 values correctly. Decoders that hard-code "f_rest is always f4" will see bogus values; the Catetus plugin is the reference implementation that round-trips properly via the quantized_field header marker.
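What "respecting per-property dtype declarations" means in practice can be shown with a small header parser. This is an illustrative sketch, not the Catetus plugin: it assumes a binary_little_endian PLY, uses a heavily trimmed toy header, and only derives the per-vertex record layout. Turning the int8 values back into SH coefficients would additionally need the per-channel scales from the quantized_field comment, whose encoding is not specified here.

```python
import struct

# PLY scalar type names mapped to struct format codes.
PLY_TO_STRUCT = {
    "char": "b", "int8": "b", "uchar": "B", "uint8": "B",
    "short": "h", "int16": "h", "ushort": "H", "uint16": "H",
    "int": "i", "int32": "i", "uint": "I", "uint32": "I",
    "float": "f", "float32": "f", "double": "d", "float64": "d",
}

def vertex_record_format(header_text: str):
    """Build a little-endian struct format from the declared vertex properties."""
    names, codes = [], []
    in_vertex = False
    for line in header_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "element":
            in_vertex = parts[1] == "vertex"
        elif parts[0] == "property" and in_vertex and parts[1] != "list":
            codes.append(PLY_TO_STRUCT[parts[1]])  # honor the declared type
            names.append(parts[2])
        elif parts[0] == "end_header":
            break
    return names, "<" + "".join(codes)

# Toy header: two float columns, two int8 f_rest columns, one float column.
HEADER = """ply
format binary_little_endian 1.0
comment quantized_field f_rest int8 channels=45 scale_b64=...
element vertex 2
property float x
property float f_dc_0
property char f_rest_0
property char f_rest_1
property float opacity
end_header
"""

names, fmt = vertex_record_format(HEADER)
print(fmt, struct.calcsize(fmt))  # <ffbbf 14
```

A reader that instead assumed four bytes for every f_rest_* column would compute the wrong record size and misalign every vertex after the first.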
