FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking

1Tampere University 2Aalto University 3University of Oulu

Abstract

The development of large-scale 3D scene reconstruction and novel view synthesis methods mostly relies on datasets comprising perspective images with narrow fields of view (FoV). While effective for small-scale scenes, these datasets require large image sets and extensive structure-from-motion (SfM) processing, limiting scalability. To address this, we introduce a fisheye image dataset tailored for scene reconstruction tasks. Using dual 200-degree fisheye lenses, our dataset provides full 360-degree coverage of 5 indoor and 5 outdoor scenes. Each scene has a sparse SfM point cloud and a precise LIDAR-derived dense point cloud that can be used as geometric ground truth, enabling robust benchmarking under challenging conditions such as occlusions and reflections. While the baseline experiments focus on vanilla Gaussian Splatting and the NeRF-based Nerfacto method, the dataset supports diverse approaches for scene reconstruction, novel view synthesis, and image-based rendering.

Scene Thumbnails

Indoor scenes: Hall_In, Kitchen_In, MeetingRoom_In, Upstairs_In, Building_In
Outdoor scenes: Building_Out, Corridor_Out, Bridge_Out, Road_Out, Night_Out

Dataset Overview

Our dataset contains still fisheye images of 5 indoor and 5 outdoor environments, captured with an Insta360 RS-One Inch camera mounted on a tripod. Sample images from the available scene types are shown above; the dimensions of the scenes are reported in our paper. The dataset also includes two types of point clouds for each scene: a sparse point cloud generated via Structure-from-Motion (SfM) and a dense point cloud captured with a Faro LIDAR scanner.

The dataset structure is as follows:

Important: The raw .insp format images from the camera come out upside-down, as can be observed from the raw image files. If this is a problem, the images can be rotated using the provided .sh script or other software tools. The upside-down images do not change the reconstructed geometry of the scenes; however, the rotation and translation parameters will differ. When Nerfacto or GS-based results are rendered, simply rotating the rendered results works just as well. For convenience, correctly rotated images will also be provided.
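If the provided .sh script is not convenient, a 180-degree rotation can also be done with a few lines of Python, for example using Pillow. The sketch below is only an illustration: the folder names are placeholders, and it assumes the fisheye images have already been exported from .insp to a standard format such as JPEG.

```python
# Minimal sketch: rotate exported fisheye images by 180 degrees with Pillow.
# Paths are hypothetical placeholders; .insp files should first be exported
# to a standard image format that Pillow can read.
from pathlib import Path
from PIL import Image

src = Path("raw_images")        # hypothetical input folder
dst = Path("rotated_images")    # hypothetical output folder
dst.mkdir(exist_ok=True)

for image_path in sorted(src.glob("*.jpg")):
    with Image.open(image_path) as img:
        # ROTATE_180 flips the upside-down capture without resampling artifacts.
        img.transpose(Image.Transpose.ROTATE_180).save(dst / image_path.name)
```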

Data Capturing

[Figures: Data Capture, Data Processing]

The Insta360 camera was mounted on a tripod and placed at a fixed location to take a single shot. It was then repositioned with minimal movement and rotation (less than 10 cm of translation between consecutive camera positions and less than 60 degrees of rotation to either side about the camera's lateral axis) to another location within the scene, and another image was taken. This process was repeated systematically until the entire scene was covered, with each lens consistently covering the same side of the scene at all times. This consistency, combined with the minimal rotation and movement between captures, ensured sufficient overlap between the images, which played a key role in the accuracy of the subsequent SfM-based sparse point cloud generation step.

To avoid disruptions from moving objects, such as people in indoor environments or cars in outdoor settings, images were captured at times and locations with minimal activity. The photographer stayed out of the frame by standing in occluded areas or by capturing the two fisheye images sequentially from the same fixed position: first taking a shot while remaining outside the field of view (FoV) of one lens, then moving to avoid the FoV of the second lens before capturing the next image.

A Faro Focus 3D LIDAR scanner, fixed on a tripod, was also used to capture high-resolution XYZRGB dense point clouds, which serve as geometric ground truth for the captured scenes. This ground truth can be used for point cloud alignment, for benchmarking 3D scene reconstruction and novel view synthesis models, and for accuracy validation. Scene reconstruction methods that incorporate depth information may also find these dense point clouds useful.
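As an illustration of how the dense point clouds can serve as geometric ground truth, the sketch below compares a reconstructed point cloud against the LIDAR scan using Open3D. The file names and the 5 cm tolerance are assumptions, and both clouds are assumed to already be expressed in the same coordinate frame.

```python
# Minimal sketch: evaluate a reconstructed point cloud against the LIDAR
# ground truth with Open3D. File names are hypothetical placeholders and
# both clouds are assumed to be aligned in a common coordinate frame.
import numpy as np
import open3d as o3d

reconstruction = o3d.io.read_point_cloud("reconstruction.ply")  # e.g. GS/NeRF export
ground_truth = o3d.io.read_point_cloud("lidar_dense.ply")       # Faro LIDAR scan

# Point-to-point distance from each reconstructed point to its nearest
# ground-truth neighbor (an "accuracy"-style metric).
distances = np.asarray(reconstruction.compute_point_cloud_distance(ground_truth))
print(f"mean error: {distances.mean():.4f} m, median: {np.median(distances):.4f} m")

# Fraction of reconstructed points within a tolerance of the ground truth.
tolerance = 0.05  # 5 cm, an arbitrary example threshold
print(f"precision@{tolerance} m: {(distances < tolerance).mean():.3f}")
```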

Data Processing

[Figure: Data Flow]

For each scene, the Insta360 camera is used to obtain still fisheye images and the Faro scanner is used to capture the dense 3D point cloud as a geometry reference. Prior to image capture, the two fisheye lenses of the Insta360 camera are calibrated, and the estimated calibration parameters are fed to COLMAP, along with the still fisheye images, to create the sparse point cloud of each scene. The sparse point cloud generated by the COLMAP Structure-from-Motion pipeline is then aligned with the dense 3D point cloud obtained from the LIDAR scanner, enabling direct comparison and evaluation for downstream 3D scene reconstruction methods.
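The exact alignment procedure is described in the paper; as a rough illustration, the sketch below registers an SfM point cloud to the LIDAR cloud with Open3D's ICP, estimating a similarity transform with scale since SfM reconstructions are not metric. The file names, correspondence threshold, and identity initialization are illustrative assumptions.

```python
# Minimal sketch: align a COLMAP sparse point cloud to the LIDAR dense cloud
# with ICP in Open3D. File names, thresholds, and the identity initialization
# are illustrative assumptions, not the exact procedure used for the dataset.
import numpy as np
import open3d as o3d

sparse_sfm = o3d.io.read_point_cloud("colmap_sparse.ply")  # SfM output (arbitrary scale)
dense_lidar = o3d.io.read_point_cloud("lidar_dense.ply")   # metric LIDAR scan

# Point-to-point ICP with scale estimation; in practice a coarse manual or
# feature-based initialization is usually needed instead of the identity.
result = o3d.pipelines.registration.registration_icp(
    sparse_sfm, dense_lidar,
    max_correspondence_distance=0.5,  # meters, example value
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(
        with_scaling=True),
)
print("fitness:", result.fitness, "inlier RMSE:", result.inlier_rmse)
sparse_aligned = sparse_sfm.transform(result.transformation)
```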

BibTeX

@inproceedings{gunes2025fiord,
  author    = {Gunes, Ulas and Turkulainen, Matias and Ren, Xuqian and Solin, Arno and Kannala, Juho and Rahtu, Esa},
  title     = {FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking},
  booktitle = {SCIA},
  year      = {2025},
}