Why glTF is the JPEG for the metaverse and digital twins


The JPEG file format played a crucial role in moving the web from a world of text to a visual experience by providing an open, efficient container for sharing images. Now, the Graphics Language Transmission Format (glTF) promises to do the same for 3D objects in the metaverse and digital twins.

JPEG used various compression tricks to shrink images dramatically compared to other formats such as GIF. The latest version of glTF similarly uses techniques for compressing both the geometry and the textures of 3D objects. glTF is already playing a key role in ecommerce, as evidenced by Adobe’s push into the metaverse.

To find out more about what glTF means for enterprises, VentureBeat spoke to Neil Trevett, president of the Khronos Group, which stewards the glTF standard. He is also VP of Developer Ecosystems at Nvidia, where his job is to make it easier for developers to use GPUs. He explains how glTF relates to other digital twin and metaverse formats such as USD, how to use it, and where it is going.

VentureBeat: What is glTF and how does it fit into the ecosystem of file formats related to the metaverse and digital twins?

Neil Trevett: At Khronos, we’ve put a lot of effort into 3D APIs like OpenGL, WebGL and Vulkan. We’ve found that every application that uses 3D needs to import assets at some point or another. The glTF file format is widely adopted and is very complementary to USD, which is becoming the standard for creation and authoring on platforms such as Omniverse. USD is the place to be if you want to put multiple tools together in sophisticated pipelines and create very high-quality content, including movies. That’s why Nvidia is investing heavily in USD for the Omniverse ecosystem.

glTF, on the other hand, focuses on being efficient and easy to use as a delivery format. It is a lightweight, streamlined and easy-to-process format that any platform or device can use, all the way down to a web browser on a mobile phone. The tagline we use as an analogy is “glTF is the JPEG of 3D.”

It also complements the file formats used in authoring tools. For example, Adobe Photoshop uses PSD files for editing images. No professional photographer would edit JPEGs because a lot of the information has been lost. PSD files are more sophisticated than JPEGs and support multiple layers. However, you would not send a PSD file to my mom’s cellphone. You need JPEG to get it out to a billion devices as efficiently and quickly as possible. USD and glTF complement each other in a similar way.

VentureBeat: How do you move from one to the other?

Trevett: There needs to be a seamless distillation process from USD assets to glTF assets. Nvidia is investing in a glTF connector for Omniverse so that we can seamlessly import and export glTF assets into and out of Omniverse. At the glTF Working Group at Khronos, we are happy that USD meets the industry’s requirements for an authoring format, because that is a huge amount of work. The goal is for glTF to be the perfect distillation target for USD to support widespread deployment.

The authoring format and the delivery format have quite different design requirements. USD’s design is all about flexibility. This helps when composing things to make a movie or a VR environment. If you want to bring in another asset and blend it with the existing scene, you need to retain all the design information. And you want everything at ground-truth levels of resolution and quality.

The design of a transmission format is different. For example, with glTF, the vertex data is not very flexible for re-authoring. But it is transmitted in precisely the form that the GPU needs to run that geometry as efficiently as possible through a 3D API like WebGL or Vulkan. So glTF puts a lot of design effort into compression to reduce download times. For example, Google has contributed its Draco 3D mesh compression technology, and Binomial has contributed its Basis Universal texture compression technology. We are also beginning to put a lot of effort into level of detail (LOD) management so that you can download models very efficiently.
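To make the idea of GPU-ready vertex data concrete, here is a minimal sketch in Python that builds a one-triangle glTF 2.0 document by hand. The triangle coordinates are made up for illustration, and this is only a sketch of the core buffer, bufferView and accessor layout, not something taken from the interview. The raw bytes are the same 32-bit floats a 3D API would upload as a vertex buffer.

```python
import base64
import json
import struct

# Hypothetical geometry: three vertex positions forming one triangle.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Pack as little-endian 32-bit floats, the layout a GPU vertex buffer expects.
blob = b"".join(struct.pack("<3f", *p) for p in positions)

gltf = {
    "asset": {"version": "2.0"},
    "scene": 0,
    "scenes": [{"nodes": [0]}],
    "nodes": [{"mesh": 0}],
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}}]}],
    "buffers": [{
        "byteLength": len(blob),
        # Embed the binary data as a base64 data URI so the file is self-contained.
        "uri": "data:application/octet-stream;base64,"
               + base64.b64encode(blob).decode("ascii"),
    }],
    "bufferViews": [{
        "buffer": 0,
        "byteOffset": 0,
        "byteLength": len(blob),
        "target": 34962,  # ARRAY_BUFFER, i.e. vertex attribute data
    }],
    "accessors": [{
        "bufferView": 0,
        "componentType": 5126,  # FLOAT
        "count": len(positions),
        "type": "VEC3",
        # POSITION accessors must declare per-component bounds.
        "min": [min(p[i] for p in positions) for i in range(3)],
        "max": [max(p[i] for p in positions) for i in range(3)],
    }],
}

document = json.dumps(gltf, indent=2)
print(document)
```

Because the accessor describes the bytes exactly as stored, a viewer can hand the buffer to the GPU without reinterpreting it, which is the trade-off against re-authoring flexibility described above.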

Distillation helps to move from one file format to another. A big part of it is stripping out the design and authoring information that you no longer need. But you don’t want to reduce the visual quality unless you really have to. With glTF, you can retain the visual fidelity, but you also have the option to compress things down when you are aiming at low-bandwidth deployments.

VentureBeat: How small can you make it without losing too much fidelity?

Trevett: It’s like JPEG, where you have a dial for increasing compression with an acceptable loss of image quality, except that glTF has the same thing for both geometry and texture compression. If it is a geometry-intensive CAD model, the geometry will be the bulk of the data. But if it is more of a consumer-oriented model, the texture data can be much larger than the geometry.

With Draco, shrinking data by 5 to 10 times is reasonable without any significant drop in quality. There is something similar for textures too.
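One way to picture the compression dial for geometry is coordinate quantization: storing each coordinate on an integer grid with fewer bits. The Python sketch below is a simplified illustration of that single idea and is not Draco's actual algorithm, which combines quantization with prediction and entropy coding. The coordinate values and the bit depths tried are assumptions chosen for the example.

```python
def quantize(values, bits):
    """Map floats onto an integer grid with 2**bits levels spanning their range."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]


def max_error(values, bits):
    """Largest absolute difference between the original and restored values."""
    codes, lo, scale = quantize(values, bits)
    restored = dequantize(codes, lo, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical x-coordinates of a model, in arbitrary units.
xs = [i * 0.0137 for i in range(1000)]

# Turning the dial: fewer bits means smaller data but larger position error.
for bits in (8, 11, 14):
    print(f"{bits} bits per coordinate, max error {max_error(xs, bits):.6f}")
```

The error is bounded by half a grid step, so each extra bit roughly halves it. Picking the bit depth is the geometry equivalent of choosing a JPEG quality setting.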

Another factor is the amount of memory it takes up, which is a precious resource in mobile phones. Before we implemented Basis Universal texture compression in glTF, people were just sending JPEGs, which is great because they are relatively small. But unpacking a JPEG into a full-sized texture can take hundreds of megabytes even for a simple model, which can hurt the power and performance of a mobile phone. glTF textures let you take a JPEG-sized supercompressed texture and immediately unpack it into a GPU-native texture, so it never grows to full size. As a result, you reduce both the data transmission and the memory required by 5 to 10 times. That helps if you are downloading assets in a browser on a cell phone.
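The memory argument can be checked with some back-of-the-envelope arithmetic. The Python sketch below assumes a hypothetical model with five 4096 x 4096 texture maps and compares uncompressed RGBA8 (32 bits per pixel) with GPU block-compressed formats, which typically store 8 bits per pixel (for example BC7 or ASTC 4x4) or 4 bits per pixel (for example BC1 or ETC1). The model size is an assumption for illustration, and mipmaps are ignored.

```python
MIB = 1024 * 1024


def texture_bytes(width, height, bits_per_pixel):
    """Size in bytes of one texture at the given bit rate (base level only)."""
    return width * height * bits_per_pixel // 8


# Hypothetical model: five 4096 x 4096 texture maps.
WIDTH = HEIGHT = 4096
MAPS = 5

formats = {
    "uncompressed RGBA8 (32 bits/pixel)": 32,
    "GPU block format at 8 bits/pixel": 8,
    "GPU block format at 4 bits/pixel": 4,
}

# Total GPU memory for all maps under each storage format.
totals = {
    name: MAPS * texture_bytes(WIDTH, HEIGHT, bpp)
    for name, bpp in formats.items()
}

for name, total in totals.items():
    print(f"{name}: {total / MIB:.0f} MiB")
```

Under these assumptions the decoded JPEG route needs 320 MiB, while keeping the textures in a GPU-native compressed format needs 80 MiB or 40 MiB, a 4x to 8x reduction that is in line with the range quoted above.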

VentureBeat: How do people effectively represent the texture of a 3D object?

Trevett: Well, there are two basic classes of texture. The most common is image-based textures, such as mapping a logo image onto a T-shirt. The other is procedural textures, where you generate a pattern like marble, wood or stone just by running an algorithm.

There are many algorithms you can use. For example, Allegorithmic, which Adobe recently acquired, developed an interesting technique for generating textures that is now used in Adobe Substance Designer. You often bake such a texture into an image because that is easier to process on client devices.
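As a toy illustration of a procedural texture being baked into an image, the Python sketch below generates a ring pattern loosely reminiscent of wood grain and writes it out as a greyscale PGM image in memory. The function names, the ring formula and the 64-pixel size are all made up for this example, and it bears no relation to the techniques used in Substance Designer.

```python
import math


def wood_rings(size, rings=8.0):
    """Return a size x size grid of 0-255 grey values forming concentric rings."""
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            # Centre the pixel coordinates in the range [-1, 1].
            u = 2.0 * x / (size - 1) - 1.0
            v = 2.0 * y / (size - 1) - 1.0
            r = math.hypot(u, v)
            # A sine of the distance from the centre gives alternating bands.
            value = 0.5 + 0.5 * math.sin(2.0 * math.pi * rings * r)
            row.append(round(255 * value))
        rows.append(row)
    return rows


def to_pgm(rows):
    """Bake the grid into a binary PGM (portable graymap) image."""
    height, width = len(rows), len(rows[0])
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(v for row in rows for v in row)


image = to_pgm(wood_rings(64))
print(len(image), "bytes")
```

Once baked, the client only has to sample pixels, which is why the image form is easier to handle on a phone than running the generator at render time.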

Once you have a texture, you can do more than just slap it on the model like a piece of wrapping paper. You can use those texture images to drive a more sophisticated material appearance. For example, physically based rendering (PBR) materials are where you go as far as you can to emulate the characteristics of real-world materials. Is it metallic, which makes it look shiny? Is it translucent? Does it reflect light? Some of the more sophisticated PBR algorithms can use up to 5 or 6 different texture maps feeding in parameters that characterize how shiny or translucent the material is.
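For a sense of what several texture maps feeding one material looks like in practice, here is a sketch of a glTF 2.0 material expressed as a Python dictionary. The property names follow the core glTF 2.0 material schema, while the material name, factor values and texture indices are hypothetical. Translucency effects are handled by separate material extensions rather than these core properties, so they are left out here.

```python
import json

# A glTF 2.0 material using the five texture slots defined in the core spec.
# Each "index" refers to an entry in the file's top-level "textures" array.
material = {
    "name": "brushed_metal",  # hypothetical name
    "pbrMetallicRoughness": {
        "baseColorTexture": {"index": 0},          # surface colour
        "metallicRoughnessTexture": {"index": 1},  # how metallic and how rough
        "metallicFactor": 1.0,
        "roughnessFactor": 0.4,
    },
    "normalTexture": {"index": 2},     # fine surface bumps
    "occlusionTexture": {"index": 3},  # baked ambient shadowing
    "emissiveTexture": {"index": 4},   # self-illuminated areas
    "emissiveFactor": [1.0, 1.0, 1.0],
}


def texture_slots(mat):
    """List the texture slots used at the top level and inside pbrMetallicRoughness."""
    slots = [key for key in mat if key.endswith("Texture")]
    slots += [key for key in mat.get("pbrMetallicRoughness", {})
              if key.endswith("Texture")]
    return sorted(slots)


print(json.dumps(material, indent=2))
print(texture_slots(material))
```

Each map is an ordinary image, so every slot benefits from the same texture compression discussed earlier.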

VentureBeat: How has glTF progressed on the scene graph side to represent the relationships within objects, such as how a car’s wheels spin, or to connect multiple objects?

Trevett: This is an area where USD is a long way ahead of glTF. Most glTF use cases so far have been satisfied by a single asset in a single asset file. 3D commerce is a leading use case, where you want to bring up a chair and drop it into your living room, as Ikea does. That is a single glTF asset, and many use cases have been satisfied with that. As we move towards the metaverse and VR and AR, people want to create scenes with multiple assets for deployment. An active area of discussion in the working group is how best to implement multi-glTF scenes and assets and how to link them. It may not be as sophisticated as USD because the focus is on transmission and delivery rather than authoring. But glTF will have something to enable multi-asset composition and linking in the next 12 to 18 months.

VentureBeat: How will glTF evolve to support more metaverse and digital twin use cases?

Trevett: We need to start bringing in things beyond just physical appearance. We have geometry, textures and animations today in glTF 2.0. The current glTF does not say anything about physical properties, sounds or interactions. I think a lot of the next generation of extensions for glTF will put in those kinds of behaviors and properties.

The industry is kind of deciding right now that it is going to be USD and glTF going forward. Although there are older formats like OBJ, they are beginning to show their age. There are popular formats like FBX that are proprietary. USD is an open-source project, and glTF is an open standard. People can participate in both ecosystems and help evolve them to meet their customer and market needs. I think both formats are going to evolve side by side. Now the goal is to keep them aligned and keep this efficient distillation process between the two.
