Unifi3D: A Study on 3D Representations for Generation and Reconstruction in a Common Framework

TMLR 2025

Nina Wiedemann*, Sainan Liu*, Quentin Leboutet*, Katelyn Gao, Benjamin Ummenhofer, Michael Paulitsch, Kai Yuan

Intel Corp.

* Equal contribution

Paper · Code · Video

Figure: Unifi3D compares tensorial representations within 3D generative pipelines.

Abstract

Unifi3D is a unified framework for evaluating the reconstruction and generation performance of 3D representations. We compare these representations based on multiple criteria: quality, computational efficiency, and generalization performance. Beyond standard model benchmarking, our experiments aim to derive best practices over all steps involved in the 3D generation pipeline, including preprocessing, mesh reconstruction, compression with autoencoders, and generation. Our findings highlight that reconstruction errors significantly impact overall performance, underscoring the need to evaluate generation and reconstruction jointly.

Key Contributions

Unified Comparison

First comprehensive comparison of six tensorial 3D representations (SDF, Voxel, Triplane, NeRF, DualOctree, Shape2VecSet) within a single framework.

Pipeline Analysis

Systematic study of all pipeline components: preprocessing, mesh extraction, autoencoder compression, and diffusion-based generation.

Best Practices

Actionable insights on representation selection, architecture choices, and training strategies for optimal 3D generation quality.

Method Overview

Figure: Overview of the diffusion-based 3D generation pipeline analyzed in Unifi3D.

Diffusion-based 3D generation pipelines share a common structure that we systematically analyze (a minimal sketch of the resulting sampling loop follows the list):

  1. Preprocessing: The input mesh is transformed into a suitable 3D representation (SDF, Voxel, Triplane, NeRF, DualOctree, or Shape2VecSet).
  2. Compression: An autoencoder is pre-trained to compress the representation into a compact latent vector.
  3. Generation: A diffusion model (U-Net or DiT) is trained to denoise latents, so that new latents can be generated from pure noise.
  4. Reconstruction: The latent is decoded back to the target representation and converted to a mesh using Marching Cubes or similar algorithms.
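To make this structure concrete, here is a minimal sketch of the sampling loop at inference time, assuming a diffusers-style noise scheduler; `denoiser`, `decoder`, and `latent_shape` are hypothetical stand-ins rather than Unifi3D's actual interfaces:

```python
import torch
from skimage import measure  # Marching Cubes for mesh extraction

@torch.no_grad()
def sample_mesh(denoiser, decoder, scheduler, latent_shape):
    """Sample one shape: denoise a random latent, decode it to an SDF
    grid, and extract the zero level set as a triangle mesh."""
    z = torch.randn(1, *latent_shape)              # start from pure noise
    for t in scheduler.timesteps:                  # reverse diffusion
        eps = denoiser(z, t)                       # predicted noise
        z = scheduler.step(eps, t, z).prev_sample  # one denoising update
    sdf = decoder(z).squeeze().cpu().numpy()       # dense (D, H, W) SDF grid
    verts, faces, _, _ = measure.marching_cubes(sdf, level=0.0)
    return verts, faces
```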

3D Representations Compared

We evaluate six major tensorial representations used in modern 3D generative models:

SDF Grid

Dense signed distance field on a regular 3D grid. Excellent reconstruction quality and out-of-distribution generalization.
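As an illustration, a dense SDF grid can be sampled from a watertight mesh along the lines of the sketch below (using trimesh here for brevity; the actual preprocessing may use a different tool and sign convention):

```python
import numpy as np
import trimesh

def mesh_to_sdf_grid(mesh: trimesh.Trimesh, res: int = 64) -> np.ndarray:
    """Evaluate signed distances on a regular grid over [-1, 1]^3.
    Assumes the mesh is watertight and normalized to the unit cube."""
    lin = np.linspace(-1.0, 1.0, res)
    pts = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), -1).reshape(-1, 3)
    # trimesh convention: positive inside the surface, negative outside.
    sdf = trimesh.proximity.signed_distance(mesh, pts)
    return sdf.reshape(res, res, res)
```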

Voxel Grid

Binary or continuous occupancy grid. Simple but effective, with good balance of quality and efficiency.
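A hypothetical occupancy-grid conversion, again sketched with trimesh's built-in voxelizer (file name and grid resolution are placeholders):

```python
import trimesh

# Rasterize a mesh into a binary occupancy grid; the pitch sets the
# voxel edge length, here chosen to yield roughly a 64^3 grid.
mesh = trimesh.load("chair.obj")                  # hypothetical input mesh
pitch = mesh.extents.max() / 64
occupancy = mesh.voxelized(pitch).fill().matrix   # boolean 3D array
```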

Triplane

Three axis-aligned feature planes. Memory efficient but struggles with out-of-distribution shapes.
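The sketch below shows how triplane features are typically queried at a 3D point: project onto the three planes, bilinearly interpolate, and aggregate (summation here; concatenation is another common choice, and the triplane AE in this framework may differ in detail):

```python
import torch
import torch.nn.functional as F

def query_triplane(planes: torch.Tensor, pts: torch.Tensor) -> torch.Tensor:
    """planes: (3, C, R, R) feature planes for xy, xz, yz.
    pts: (N, 3) coordinates in [-1, 1]^3. Returns (N, C) features."""
    coords = torch.stack([pts[:, [0, 1]],    # project onto xy plane
                          pts[:, [0, 2]],    # xz plane
                          pts[:, [1, 2]]])   # yz plane -> (3, N, 2)
    grid = coords.unsqueeze(1)               # (3, 1, N, 2) sampling grid
    feats = F.grid_sample(planes, grid, align_corners=True)  # (3, C, 1, N)
    return feats.squeeze(2).sum(dim=0).t()   # sum the three projections
```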

NeRF

Neural radiance field with density prediction. Flexible but lower reconstruction fidelity.
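In its simplest form, such a density field is an MLP from coordinates to a non-negative density; a mesh can then be extracted by evaluating it on a grid and thresholding. Positional encodings and view-dependent color are omitted in this hypothetical sketch:

```python
import torch
import torch.nn as nn

class DensityField(nn.Module):
    """Minimal coordinate MLP predicting volume density."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.net(pts))  # (N, 1) non-negative density
```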

DualOctree

Hierarchical adaptive octree structure. Best generation metrics but limited generalization.
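The adaptivity can be illustrated with a toy refinement rule: subdivide a cell only if the surface (the SDF zero level set) can pass through it. This is purely illustrative; the actual DualOctree additionally builds a dual graph over the adaptive cells:

```python
import numpy as np

def adaptive_cells(sdf_fn, center, size, depth=0, max_depth=6):
    """Recursively subdivide cells that may intersect the surface.
    sdf_fn maps a 3D point to its signed distance to the surface."""
    half = size / 2.0
    # If the distance at the center exceeds the cell's half-diagonal,
    # the zero level set cannot cross this cell: keep it coarse.
    if abs(sdf_fn(center)) > half * np.sqrt(3) or depth == max_depth:
        return [(center, size)]
    cells = []
    for dx in (-0.5, 0.5):
        for dy in (-0.5, 0.5):
            for dz in (-0.5, 0.5):
                child = center + half * np.array([dx, dy, dz])
                cells += adaptive_cells(sdf_fn, child, half, depth + 1, max_depth)
    return cells
```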

Shape2VecSet

Cross-attention based point set encoding. Resolution-independent with good quality.
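The core encoding idea can be sketched as a learnable query set cross-attending to point features, a deliberately simplified, hypothetical variant of the 3DShape2VecSet encoder:

```python
import torch
import torch.nn as nn

class VecSetEncoder(nn.Module):
    """Compress a point cloud into a fixed-size latent set."""
    def __init__(self, num_latents: int = 128, dim: int = 256):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_latents, dim))
        self.point_proj = nn.Linear(3, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        kv = self.point_proj(pts)                    # (B, N, dim) point features
        q = self.queries.expand(pts.size(0), -1, -1)
        latents, _ = self.attn(q, kv, kv)            # queries attend to points
        return latents                               # (B, num_latents, dim)
```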

Key Results

Unconditional Generation (ShapeNet)

Best per-representation models evaluated on ShapeNet categories. COV (coverage) measures diversity, MMD (minimum matching distance) measures quality, and 1-NNA (1-nearest-neighbor accuracy) measures distribution similarity (optimal at 0.5). A sketch of how these metrics are computed follows the table and figures.

Method                     COV ↑    MMD ↓    1-NNA → 0.5
DualOctree (VAE, U-Net)    0.365    0.031    0.824
SDF (AE, DiT)              0.357    0.032    0.860
3DShape2VecSet             0.344    0.033    0.864
Triplane (AE, U-Net)       0.297    0.036    0.921
Voxel (AE, DiT)            0.319    0.040    0.937
Figures: comparison of unconditional generation results and visualization of generation quality metrics.
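For reference, the three metrics can be computed from pairwise shape distances (e.g., Chamfer) along these lines; this follows the standard definitions and is not Unifi3D-specific code:

```python
import numpy as np

def generation_metrics(d_gr, d_gg, d_rr):
    """COV, MMD, and 1-NNA from distance matrices: d_gr (G, R) between
    generated and reference shapes, d_gg (G, G), d_rr (R, R)."""
    G, R = d_gr.shape
    # COV: fraction of reference shapes that are the nearest neighbor
    # of at least one generated shape (diversity).
    cov = len(np.unique(d_gr.argmin(axis=1))) / R
    # MMD: mean distance from each reference shape to its closest
    # generated shape (fidelity).
    mmd = d_gr.min(axis=0).mean()
    # 1-NNA: leave-one-out 1-NN classification accuracy on the merged
    # sets; 0.5 means the two distributions are indistinguishable.
    d = np.block([[d_gg, d_gr], [d_gr.T, d_rr]]).astype(float)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    labels = np.array([0] * G + [1] * R)
    nna = (labels[d.argmin(axis=1)] == labels).mean()
    return cov, mmd, nna
```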

Reconstruction Quality (ShapeNet)

Averages over the airplane, car, and chair categories (F-score ↑, Chamfer distance (CD) ↓, normal consistency (NC) ↑). Out-of-distribution (OOD) generalization is evaluated by training on chairs and testing on airplanes (Chair→Airplane). A sketch of the CD and F-score computation follows the table.

Representation     F-score          CD (×1e−4)    NC            OOD F-score      OOD CD        OOD NC
SDF AE             88.434 ± 6.58    0.012 ± 0.00  0.827 ± 0.06  91.123 ± 6.02    0.010 ± 0.01  0.843 ± 0.05
Voxel AE           85.666 ± 10.54   0.016 ± 0.01  0.787 ± 0.06  85.602 ± 9.48    0.017 ± 0.01  0.800 ± 0.05
Shape2VecSet AE    79.37 ± 17.04    0.023 ± 0.02  0.776 ± 0.07  75.338 ± 8.87    0.022 ± 0.01  0.717 ± 0.07
Triplane AE        66.445 ± 16.06   0.028 ± 0.02  0.759 ± 0.08  41.69 ± 11.57    0.073 ± 0.03  0.688 ± 0.07
DualOctree VAE     76.122 ± 13.44   0.020 ± 0.01  0.766 ± 0.07  48.38 ± 11.39    0.047 ± 0.02  0.677 ± 0.08
NeRF AE            58.44 ± 13.22    0.034 ± 0.02  0.723 ± 0.07  26.229 ± 11.64   0.107 ± 0.04  0.589 ± 0.05
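Chamfer distance and F-score are computed from points sampled on the predicted and ground-truth surfaces, roughly as below; the threshold tau and the squared-vs-unsquared Chamfer convention vary between papers, and the exact settings here may differ:

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_and_fscore(p, q, tau=0.01):
    """p, q: (N, 3) and (M, 3) surface samples; returns (CD, F-score)."""
    d_pq = cKDTree(q).query(p)[0]     # nearest-neighbor distances p -> q
    d_qp = cKDTree(p).query(q)[0]     # nearest-neighbor distances q -> p
    chamfer = (d_pq ** 2).mean() + (d_qp ** 2).mean()
    precision = (d_pq < tau).mean()   # fraction of predicted points near GT
    recall = (d_qp < tau).mean()      # fraction of GT points near prediction
    fscore = 2 * precision * recall / max(precision + recall, 1e-8)
    return chamfer, fscore
```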

Key Findings

  - SDF grids deliver the strongest reconstruction quality and out-of-distribution generalization.
  - DualOctree achieves the best unconditional generation metrics but generalizes poorly beyond its training category.
  - Reconstruction errors propagate through the pipeline, so generation and reconstruction quality must be evaluated jointly.

Generated 3D Samples

Examples of unconditionally generated 3D meshes from our best-performing models.

Citation

@article{unifi3d,
  title={{Unifi3D: A Study on 3D Representations for Generation and Reconstruction in a Common Framework}},
  author={Wiedemann, Nina and Liu, Sainan and Leboutet, Quentin and Gao, Katelyn and Ummenhofer, Benjamin and Paulitsch, Michael and Yuan, Kai},
  journal={Transactions on Machine Learning Research},
  year={2025},
  url={https://openreview.net/forum?id=GQpTWpXILA},
}