
Teaching 3D Spaces to Understand Themselves: Exploring Semantic Intelligence at NavVis

Written by Tim Runge | Aug 21, 2025

NavVis has always aimed to help people better capture and explore indoor and outdoor spaces. Our scanners and software turn real places into accurate, detailed 3D digital spaces. But what if these digital spaces could do more than just show what is there? What if they could also understand their own content?

At our recent CodeJam, engineers experimented with an early prototype that takes a clear step in that direction: enhancing 3D Gaussian Splatting with semantic features.

The Idea: Bringing Meaning into 3D

Today, powerful AI models recognize and label objects reliably in 2D photos. You can show a photo to these models, and they instantly detect chairs, tables, doors, and more. In 3D scenes, however, this kind of understanding lags far behind.

That’s why NavVis’ Manuel Dahnert, a machine learning expert who previously pursued a PhD at the Technical University of Munich, asked: Can we transfer that 2D knowledge into 3D environments – specifically 3D Gaussian Splatting scenes – so users can interact with objects, not just geometry?

Manuel’s hypothesis was simple but ambitious. If we could apply the understanding from 2D images directly to 3D digital environments, NavVis' users could eventually interact with spaces in richer ways. They could query a digital twin by object type, ask for quick inventory checks, or even filter a 3D model to focus only on specific elements.

Why Gaussians, Not Point Clouds?

Gaussian Splatting models a scene as thousands of overlapping, colored “blobs” instead of millions of discrete points. Each blob carries a position, size, orientation, color, and opacity, making it easier to connect with the original camera images.

This structure lets semantic labels from 2D images transfer more naturally into 3D. It also keeps files lighter, renders faster, and handles smooth surfaces and occlusions better – all while preserving the original NavVis point cloud underneath for accurate measurement.

In short: Gaussians offer a more expressive and efficient foundation for adding meaning to 3D data.
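
To make that concrete, here is a minimal sketch in Python of what one semantically enriched blob could hold. The layout and field names are illustrative assumptions, not NavVis' actual data format:

    from dataclasses import dataclass

    import numpy as np

    @dataclass
    class SemanticGaussian:
        """One "blob" in a splatted scene, extended with a semantic label (illustrative)."""
        mean: np.ndarray        # (3,) world-space position of the blob's center
        covariance: np.ndarray  # (3, 3) matrix encoding the blob's size and orientation
        color: np.ndarray       # (3,) RGB color of the blob
        opacity: float          # blending weight in [0, 1]
        label: int              # semantic class id, e.g. 0 = chair, 1 = table, 2 = door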

Building the Prototype

The prototype chained three existing components:

  1. The Segment Anything Model (SAM) isolated objects in each image.
  2. CLIP vision–language features associated those objects with categories such as chair or door.
  3. The labeled pixels were lifted into the splatted scene, coloring every blob that belonged to the same object type.
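
A simplified sketch of that flow is below. The SAM and CLIP calls follow the public segment-anything and OpenAI CLIP APIs; the class vocabulary, checkpoint path, and the final lifting step are assumptions for illustration, since the prototype's actual projection code isn't public:

    import clip
    import numpy as np
    import torch
    from PIL import Image
    from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

    CLASSES = ["chair", "table", "door"]  # example vocabulary, not NavVis' actual one

    device = "cuda" if torch.cuda.is_available() else "cpu"
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth").to(device)
    mask_generator = SamAutomaticMaskGenerator(sam)
    clip_model, preprocess = clip.load("ViT-B/32", device=device)

    with torch.no_grad():
        text_feats = clip_model.encode_text(clip.tokenize(CLASSES).to(device))
        text_feats /= text_feats.norm(dim=-1, keepdim=True)

    def label_masks(image: np.ndarray):
        """Steps 1 and 2: segment one HxWx3 uint8 capture image, label each mask via CLIP."""
        results = []
        for m in mask_generator.generate(image):  # step 1: SAM proposes object masks
            ys, xs = np.nonzero(m["segmentation"])
            crop = Image.fromarray(image[ys.min():ys.max() + 1, xs.min():xs.max() + 1])
            with torch.no_grad():
                img_feat = clip_model.encode_image(preprocess(crop).unsqueeze(0).to(device))
                img_feat /= img_feat.norm(dim=-1, keepdim=True)
            label = int((img_feat @ text_feats.T).argmax())  # step 2: closest CLIP text match
            results.append((m["segmentation"], label))
        return results

    # Step 3, in pseudocode: for every labeled pixel, find the Gaussians that
    # rendered it and accumulate a class vote on each one. lift_to_gaussians()
    # is a hypothetical helper standing in for that camera-to-splat projection:
    #
    #   for mask, label in label_masks(image):
    #       for gaussian in lift_to_gaussians(mask, camera_pose):
    #           gaussian.votes[label] += 1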

In Manuel’s prototype, different types of objects appeared consistently in different colors: all chairs showed up in one color, tables in another, and doors in yet another. While he wasn’t yet able to add text-based queries, the results clearly demonstrated the potential of this method. The digital scene was not only realistic but also semantically organized.


Figure 1: Examples of 3D Gaussian Splatting using the NavVis HQ kitchen as a test environment. Different colors indicate different types of objects.

Practical Value

  • Filter a digital twin to view only critical equipment.
  • Compare captured furniture against a design model in seconds.
  • Count specific objects without manual annotation.

Semantic understanding in 3D scenes opens many possibilities. Imagine walking through a digital twin of a building and quickly filtering to see only fire safety equipment. Or consider a construction manager instantly identifying which pieces of a half-finished building match or differ from their BIM models. These capabilities can save significant time and effort in industries like construction, facilities management, and safety inspections.
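
As a rough sketch of what such queries could look like once labels are attached to the scene, building on the illustrative SemanticGaussian layout above:

    from collections import Counter

    CHAIR, TABLE, DOOR = 0, 1, 2  # example class ids, matching the sketch above

    def filter_scene(scene, wanted_label):
        """Keep only the blobs of one object type, e.g. to render just the doors."""
        return [g for g in scene if g.label == wanted_label]

    def count_blobs_per_class(scene):
        """Rough per-class blob counts. Counting object *instances* would also
        require grouping blobs per object, which the prototype does not yet do."""
        return Counter(g.label for g in scene)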

In other words, we believe that combining digital spaces with semantic understanding can lead to smarter navigation, faster searches, and better decision-making.

Shared Foundations: Cleaner Panoramas

NavVis' exploration of semantic understanding isn't limited to 3D spaces. Another team at our CodeJam used similar segmentation models to automatically remove operators from the panoramas captured by our NavVis MLX laser scanners.

This cleaner output means less manual editing, clearer images, and an overall improved user experience. It also shows how practical and valuable semantic segmentation can be today, even without full 3D integration.
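
The mechanism can be illustrated in a few lines. The person mask would come from a segmentation model such as the SAM setup above; the constant fill is a stand-in for whatever background reconstruction the real pipeline performs, which isn't public:

    import numpy as np

    def remove_operator(panorama: np.ndarray, person_mask: np.ndarray) -> np.ndarray:
        """Blank out the pixels a boolean "person" mask covers in an HxWx3 panorama.

        A production pipeline would inpaint the hole, for example from
        overlapping captures, instead of filling it with a constant color.
        """
        cleaned = panorama.copy()
        cleaned[person_mask] = 0  # placeholder fill, not the actual reconstruction
        return cleaned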

Current Limits and Open Questions

  • File size: the semantic scene is still heavy; optimization is in progress.
  • Domain coverage: open models handle everyday objects well but struggle with specialized industrial items; additional training data may be required.
  • Cost envelope: cloud processing must stay within budgets acceptable to current customers.
  • User interface: NavVis IVION will need ways to expose semantic filters and queries.

While the early results are promising, Manuel and the rest of the team recognize that the work is still in an early phase. Moving this from prototype to product will require continued testing on real-world scenes, particularly in complex industrial environments.

The goal is to ensure that any future semantic features meet the same expectations for accuracy, reliability, and performance that apply to all NavVis deliverables.

Looking Ahead

Innovation here follows a predictable path: prototype → measure → iterate. Some ideas graduate to product planning. Others inform future research. This work sits at the research stage, but aligns with clear customer needs and wider industry direction.

Our engineers are continuing to trim model size, wire up basic text queries, and test on larger, more varied datasets. As these milestones fall into place, the capability can move closer to pilot projects with real customer data.

NavVis will be ready for the moment semantic understanding becomes an everyday expectation.