Building the Open Metaverse

From Unity to the Metaverse: Aras Pranckevičius on Building 3D Worlds

Aras Pranckevičius, Unity's fourth employee, discusses game engine evolution, AI in programming, the staying power of simple 3D file formats, and the promise of Gaussian splatting. He praises Blender's community and ponders the maturity of 3D on the web.

Guests

Aras Pranckevičius
Computer Programmer

Announcer:

Today on Building the Open Metaverse.

Aras Pranckevičius:
The problem within Unity is that some of the teams actually do very good work and very useful work and very amazing work, and almost no one except themselves knows what they're doing.

Announcer:
Welcome to Building the Open Metaverse, where technology experts discuss how the community is building the open metaverse together.

Hosted by Patrick Cozzi and Marc Petit.

Marc Petit:
Hello, and welcome back, metaverse builders, dreamers, and pioneers. I'm Marc Petit, and this is my co-host, Patrick Cozzi.

Patrick Cozzi:
Hey Marc, great to be here. I really love the technical episodes; we're in for a treat today.

Marc Petit:
You're listening to Building the Open Metaverse season five. This podcast is your portal into virtual worlds and spatial computing. It brings you the people and the projects that are on the front lines of building the immersive internet of the future, the open and interoperable metaverse for all.

Patrick Cozzi:
Today we have a special guest joining us on that mission, Aras P. He's an expert in software and game engine architecture. I believe he was employee number four at Unity, having worked there from 2006 to 2021. He's done so many articles and open-source projects on software architecture, rendering, game engines, file formats.

I personally, Aras, have learned so much from you over the years. I think I've been following you on social media since 2009. Nowadays, I see you're doing great work contributing to Blender and we're very inspired by your article and examples on Gaussian Splatting.

I think we first met probably in 2010 when Kevin Ring and I were working on the virtual globe book, 3D Engine Design for Virtual Globes, and we asked you to review some of it, so I wanted to thank you again for doing that and joining us on the podcast. We like to kick things off by asking folks to describe their career journey.

Aras Pranckevičius:
I started doing something with computer graphics in the previous millennium, around '95 or something. Of course, sort of just as a hobbyist and so on and so forth. Then I was shortly involved in the European demo scene and a few small game development studios that no one has heard about because we didn't really ship anything notable. In some cases, didn't ship anything at all.

But then some things led to other things and I joined Unity in 2006, back when Unity was a product that basically no one had heard about, and we were just a tiny startup trying to make a fairly simple game engine. Back then, it was Mac only, of all things, and Mac was not a popular platform at all. Back then the iPhone didn't exist, and Mac was still on the PowerPC architecture.

I worked at Unity for 16 years. I left Unity in late 2021 and I've been doing sort of random projects since then.

Basically, that's my journey. I remember we met, Patrick, I think it was GDC maybe 2010 or 2000 something. My memory of that is that you asked opinions about what became glTF 1.0 and I was like, yeah, this makes sense, but there's pretty much zero chance to make a new file format. Here we are.

Marc Petit:
Aras, I think we're all curious. Unity has had an impressive trajectory, and you were their employee number four in the early days.

So can you share with us some of the most vivid memories of the early days?

Aras Pranckevičius:
A lot of these memories are sort of hazy at this point. I do remember that typical startup life back when we were just a handful of people; everyone was doing sort of everything. I do remember that the CEO at the time, David Helgason, used to cook lunch for us and clean the office because we couldn't afford cleaners. I was the one fixing the website on Internet Explorer 6 because I was the one with a Windows PC; everyone else used Macs.

These were the fun times of wearing many hats. For example, I was originally hired as a graphics programmer and then eventually I made the web browser plugin, back when web browser plugins still existed. I was one of the two people to port the Unity editor to Windows because, as I mentioned initially, it was Mac only, and there was a lot of work actually ripping out all the Mac-specific stuff and replacing that with something that is multi-platform.

Of course, once we added Windows editor support, then sort of all the floodgates of Windows users opened and Unity kind of grew in popularity. The other growth was when we added iPhone support at the very start of the mobile gaming sort of revolution, you could say.

Marc Petit:
I want to switch over to a different topic, which is soft skills. I know you were involved in mentoring young programmers at Unity, and you said something interesting, that learning to program on the team can be challenging for us as engineers, we have to learn how to collaborate and communicate effectively.

I think you said something like that, a skill that is not taught in the university. Do you think that is still the case now?

Aras Pranckevičius:
It's very rare that you're working on some problem just by yourself, especially at a modern scale of software complexity. It's very nice if you like working on something yourself and you can work on something solo. I mean there are products and complete games made by just one or two people, right? 

Something that is not communication per se, but what I've seen, for example, within Unity is that some of the teams actually do very good work and very useful work and very amazing work, and almost no one except themselves knows what they're doing. This is not so much communication as almost marketing, so to speak: if you're doing something important or something that is just cool, some people for some reason don't really know how to tell others, potentially interested parties or your users or your customers, about it.

I've seen that over many products, mostly within Unity, because that's the majority of my work experience. For example, Unity as an engine technology company would do something that is actually very useful and very cool and almost not tell anyone about it. It's kind of, yeah, we have GPU-driven rendering now, which is fairly sort of recent for Unity, and it's hidden in one forum thread on the Unity forum somewhere.

Patrick Cozzi:
I'm going back 10 or 15 years, but when I was teaching at the University of Pennsylvania, I remember when the graphics program actually moved from Gamebryo to Unity, and I think the students both liked Unity the engine and liked programming in C#.

I wanted to ask you a bit about what you think the future of scripting or creation is. There's C#, there's Blueprint in Unreal, there's Lua in Roblox, and Roblox is also doing some AI-assist for creating materials and coding. How do you see that playing out?

Aras Pranckevičius:
It will change things drastically somehow once the dust settles. With all big changes, it usually takes a decade for the dust to settle, for the workflows to actually emerge, and for the right products and the right technologies to emerge. We're probably in the middle of that with AI. Now whether programming should be text-based or node-based or some other type of paradigm, I don't know.

On the other hand, I know a lot of technical artists and other people who are more visual than I am, I guess, where they're exactly the opposite. They understand nodes, but they don't understand text-based code. It probably depends on the type of person plus their experience or their education.

For some things, nodes are obviously the preferred way of expressing intent, something that's sort of more high level, or especially if it has timing components. If this door opens, then place whatever this VFX instance is there, then after three seconds do this or that; that's just sort of boring.

With AI it seems that everyone is trying to do some sort of AI-based tooling, Roblox has their thing. Unity just recently announced something called Sentis, or Muse, that will do something. It's pretty vague what exactly that will do, but they have some marketing material and I'm pretty sure that every sort of large company is looking into that in one way or another.

Marc Petit:
Tim Sweeney from Epic has chosen to go a different route and to create a new language from scratch to cater to the specifics of the metaverse. It's a double-edged sword argument: you have LLMs, so it's going to be easier to learn a new language, or, why create a new language when you can use LLMs to program?

What do you think about that?

Aras Pranckevičius:
I would say that making a new programming language and having it be successful has almost zero chance of succeeding. But I said the same about glTF to Patrick, so probably I'm not a good predictor.

Patrick Cozzi:
Aras, maybe that's a good segue into one of our favorite topics, which is 3D file formats and open standards. We know you've done a ton of work there over the years, and recently you've really optimized OBJ support within Blender.

We would love to hear how you're viewing the landscape today looking at glTF, USD, OBJ, PLY, just whatever's on your mind.

Aras Pranckevičius:
I was building a house, and I got a 3D model of the house from an architect, and I was like, oh, I know 3D graphics; I'm going to just throw this into Unity or into something else and do a visualization before it's built just because why not? I think I got a COLLADA file of all the things out of them.

Remember COLLADA? It used to exist. I think I tried to put that into Unity and that didn't go very smoothly because it had half a million objects or something like that, and 95% of that were individual planks around the fence around the property, which has nothing to do with the house itself.

Then I tried to put that into Blender, and after a while, that was okay. I don't remember how I ended up in OBJ land; maybe I tried to export that into OBJ or something and I noticed that it felt slow, and I started profiling, and Blender being open source, one thing led to another and I ended up optimizing it a bit.

Surprisingly enough, OBJ as a file format is one of these formats that shouldn't exist at this point because it was done in the late eighties, I think, or early nineties. It doesn't actually have a specification. It has only a README document that came with it; I don't know whether that was even Maya or something before Maya.

OBJ still exists probably because of simplicity; similar to that, Stanford PLY also still exists. Probably also because of simplicity, and also because it has a really simple binary format. If you have a 10 million vertex 3D scan, then OBJ, or any sort of purely text-based format, is not the right call. PLY has a binary format.

Most of the 3D scanning community and Gaussian Splatting community at this point uses PLY. Again, I guess because of simplicity.
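
To illustrate the point about PLY's simple binary form, here is a minimal Python sketch that encodes vertex positions as a binary little-endian PLY and compares the payload with the same data as OBJ-style text lines. The function names `encode_binary_ply` and `text_size` are hypothetical, and real scan or splat files carry many more properties per vertex than the three floats assumed here.

```python
import struct


def encode_binary_ply(vertices):
    """Encode (x, y, z) float triples as a binary little-endian PLY file."""
    header = (
        "ply\n"
        "format binary_little_endian 1.0\n"
        f"element vertex {len(vertices)}\n"
        "property float x\n"
        "property float y\n"
        "property float z\n"
        "end_header\n"
    )
    # Each vertex becomes exactly 12 bytes: three 32-bit floats.
    body = b"".join(struct.pack("<fff", x, y, z) for x, y, z in vertices)
    return header.encode("ascii") + body


def text_size(vertices):
    """Bytes needed for the same positions as OBJ-style 'v x y z' lines."""
    return sum(len(f"v {x:.6f} {y:.6f} {z:.6f}\n") for x, y, z in vertices)


if __name__ == "__main__":
    verts = [(0.5, 1.0, -2.0), (0.0, 0.25, 4.0)]
    data = encode_binary_ply(verts)
    print(len(data), "bytes as binary PLY,", text_size(verts), "bytes of text for positions alone")
```

The binary payload is a fixed 12 bytes per vertex and needs no number parsing on load, which is the practical advantage for multi-million-vertex scans.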

Marc Petit:
Let's switch gears, because we want to talk about Gaussian Splatting. We talked a lot about the past in glTF, OBX, and even OBJ in the late eighties.

Something incredible is happening with Gaussian Splatting, I think. The research paper was published last summer and we are now seeing commercial implementations of it. I know you've been contributing a lot of code to the community.

For our listeners, can you tell us what Gaussian Splatting is?

Aras Pranckevičius:
I like to describe it as blobs in space. If you know a point cloud, a point cloud is a bunch of points in space. In some point clouds, every point also has a color or something like that. If you take a lot of these points, you can describe a surface or a mesh or a scene with them. It will take a lot of points, right? Because a point is tiny.

What Gaussian Splatting does is make every point a blob. It's a bunch of blobs that have scale and rotation in space, and these are called sort of splats, and Gaussian just means it's a blob, a blobby shape. Each of these splats also has transparency and color, and somewhat crucially, color can change depending on direction. That's encoded using spherical harmonics, which is a fairly standard sort of way of encoding directional values.
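
The "blobs in space" description can be sketched as a small data structure. This is an illustrative Python sketch, not the layout of any particular implementation: the `Splat` class and its field names are hypothetical, and the direction-dependent color is simplified to a single RGB value where real data stores spherical harmonics coefficients. The covariance formula, rotation times squared scale times transposed rotation, is the standard way to turn a scale and a rotation into a 3D Gaussian.

```python
import math
from dataclasses import dataclass


@dataclass
class Splat:
    position: tuple  # (x, y, z) center of the blob
    scale: tuple     # (sx, sy, sz) extent along the blob's own axes
    rotation: tuple  # unit quaternion (w, x, y, z)
    opacity: float   # transparency of the blob
    color: tuple     # RGB; real data stores spherical harmonics instead


def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(splat):
    """3x3 covariance of the blob: R * diag(scale^2) * R^T."""
    r = quat_to_matrix(splat.rotation)
    s2 = [s * s for s in splat.scale]
    return [
        [sum(r[i][k] * s2[k] * r[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def falloff(splat, point):
    """Unnormalized Gaussian weight of the blob at a world-space point."""
    r = quat_to_matrix(splat.rotation)
    d = [point[i] - splat.position[i] for i in range(3)]
    # Rotate the offset into the blob's local frame, then divide by scale.
    local = [sum(r[i][k] * d[i] for i in range(3)) / splat.scale[k] for k in range(3)]
    return math.exp(-0.5 * sum(v * v for v in local))
```

A splat with identity rotation and scale (1, 2, 3) has a diagonal covariance of 1, 4, and 9; rotating it by 90 degrees around the z axis swaps the first two entries.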

There was this paper this summer at SIGGRAPH; I think it was published somewhere or maybe made online shortly before SIGGRAPH somewhere. They found a way of taking a bunch of photos or pictures of a real thing and then figuring out how to create a million or so of these Gaussian Splats in space so that they kind of represent these photos when looked at from various angles.

The key insight from them was how to make a decently fast renderer for them that is also differentiable. Whatever differentiable means, I don't exactly know, but that allows them to do a stochastic gradient descent to do this training process of figuring out where the splat should be.
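
As a rough illustration of why differentiability matters, here is a toy Python sketch that fits a single 1D Gaussian blob to target values by gradient descent. It is only an analogy under simplifying assumptions: it uses plain full-batch gradient descent with hand-derived gradients rather than the stochastic, renderer-based training described in the paper, and the names `blob` and `fit_blob` are hypothetical.

```python
import math


def blob(x, mu, sigma):
    """Value at x of a unit-height Gaussian centered at mu with width sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def fit_blob(xs, target, mu=0.0, sigma=0.5, lr=0.1, steps=2000):
    """Fit the center and width of one blob to target values.

    Minimizes the mean squared error. Because the blob is differentiable
    in mu and sigma, the error gradient says which way to nudge each one.
    """
    log_sigma = math.log(sigma)  # optimize log(sigma) so sigma stays positive
    n = len(xs)
    for _ in range(steps):
        sigma = math.exp(log_sigma)
        grad_mu = 0.0
        grad_log_sigma = 0.0
        for x, t in zip(xs, target):
            f = blob(x, mu, sigma)
            err = 2.0 * (f - t) / n          # derivative of the loss w.r.t. f
            u = (x - mu) / sigma
            grad_mu += err * f * u / sigma   # df/dmu = f * (x - mu) / sigma^2
            grad_log_sigma += err * f * u * u  # df/dlog(sigma) = f * u^2
        mu -= lr * grad_mu
        log_sigma -= lr * grad_log_sigma
    return mu, math.exp(log_sigma)


if __name__ == "__main__":
    xs = [-1.0 + 0.02 * i for i in range(101)]
    target = [blob(x, 0.3, 0.2) for x in xs]
    print(fit_blob(xs, target))
```

The real training does the same kind of nudging for millions of splat parameters at once, with the rendered image compared against the input photos.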

The actual Gaussian Splatting is not new; it's based on a paper from 2002 where it was called elliptical weighted average (EWA) splatting. They use exactly the same math, sort of; they project the splats onto the screen in exactly the same way as that paper from 20 years ago. It's just that seemingly everyone forgot about that idea until now.

It's not exactly that everyone forgot. There have been implementations that use some sort of splats or particles or blobs in space to render things. Most famous in real time is probably Media Molecule's Dreams for PlayStation, where they used a bunch of things. They used signed distance fields, and voxels, and splatting, and probably something else to do the rendering, which is fairly unique among real-time renderers.

Marc Petit:
NeRF was the big thing a few months ago. Can you tell us the difference? I think it's fundamentally different, between NeRFs and Gaussian Splat, but can we go through that quickly?

Aras Pranckevičius:
I know that it's trying to solve the same problem, the same sort of thing that the Gaussian Splatting is trying to solve. At this point, it's basically how do you capture real objects into something that you can view from any angle and it looks kind of realistic.

The way I understand it is that NeRFs encode that information in some sort of interconnected neuron layers where it's not intuitive to understand what exactly happens inside except you get the pretty picture in the end.

I'm not a machine learning expert, as you can tell. Some of the NeRF implementations, as far as I know, are fairly slow to render in real time, or if they're fast enough to render in real time, then the quality is kind of not that good. And also, depending on the NeRF implementation, the data sizes are between fairly small but not good-looking and very large and very slow.

Now Gaussian Splatting is conceptually much easier to understand because there's no sort of magic anyway, it's just blobs in space, that's it. For example, what I've been playing around with recently is some tooling within Unity to do rudimentary editing of them. You can crop out things that you don't want. You can remove things because, again, it's just points in space; it's easy to understand, it's easy to manipulate.

I've seen some people starting to do some VFX stuff where they apply some additional deformation, or change the colors or the textures of the splats, to make them look like the scene is burning or whatever. With NeRFs, I think that's much harder to do because, again, the internal representation is something that is hard to reason about.

Marc Petit:
For me, it's flabbergasting to see that we're talking about a research paper that went out weeks ago and now you have an implementation. Luma has an implementation, Polycam has an implementation. People are actually using it today on their phones and in production, so it's amazing to witness that happening in real time. The size of it as well, and optimizing; it's early days, so do you think we can optimize much further and make a really viable production technique out of that?

Aras Pranckevičius:
I'm pretty sure that there's probably a ton of research done in generic point cloud compression somewhere, but I don't know any of that, so I just rolled with whatever I thought makes sense to me. Which is partially inspired by Media Molecule’s Dreams, the way they encode the splat into a non-infinite amount of memory.

The way I do compression is basically chop up the whole scene into very sort of small spatially coherent chunks and then quantize the data of the splats inside these chunks to as few bits as possible. If they're sort of spatially coherent, then I can get away with just spending, whatever, 10 bits per position, and things like that.
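
The chunk-and-quantize idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual implementation discussed here: the function names and the chunk size of 256 are hypothetical, only positions are handled, and the input is assumed to be already ordered so that neighboring entries are close together in space.

```python
def quantize_chunk(positions, bits=10):
    """Store a chunk as its bounding box plus `bits`-bit integers per axis."""
    levels = (1 << bits) - 1
    lo = [min(p[a] for p in positions) for a in range(3)]
    hi = [max(p[a] for p in positions) for a in range(3)]
    codes = []
    for p in positions:
        code = []
        for a in range(3):
            extent = hi[a] - lo[a]
            # Map the coordinate to 0..1 inside the chunk bounds.
            t = (p[a] - lo[a]) / extent if extent > 0 else 0.0
            code.append(round(t * levels))
        codes.append(tuple(code))
    return lo, hi, codes


def dequantize_chunk(lo, hi, codes, bits=10):
    """Recover approximate positions from the chunk bounds and codes."""
    levels = (1 << bits) - 1
    return [
        tuple(lo[a] + (c[a] / levels) * (hi[a] - lo[a]) for a in range(3))
        for c in codes
    ]


def compress(positions, chunk_size=256, bits=10):
    """Split positions into fixed-size chunks and quantize each one."""
    return [
        quantize_chunk(positions[i:i + chunk_size], bits)
        for i in range(0, len(positions), chunk_size)
    ]
```

Because each chunk covers only a small region, the 1,024 steps available per axis are spread over a short distance, so the rounding error stays at most half a step of that small extent.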

For capturing individual objects in my approach, the final data size is between three megabytes and 10 megabytes, which I would imagine is a reasonable size for a use case when you want to actually view a single object like you could imagine on a webpage in e-commerce or whatever. When you have a real object and you just want to look at it from various angles, under 10 megabytes is fine probably. For capturing whole scenes, yeah, I don't know. We'll see where that goes, I guess.

Within, say, games, or something like that, I think a much larger problem with Gaussian Splats and with NeRFs actually, is that in games we want to change lighting conditions, when with these technologies we just capture reality with the lighting as it is.

Marc Petit:
I'm glad you mentioned Dreams twice because I think it's an underrated effort, it was very innovative. My interpretation is that by the virtue of just being on the PlayStation, it did not see the pickup that it could have, but it's my personal opinion.

It was a fantastic platform to create content, way ahead of its time.

Patrick Cozzi:
So Aras, we have one more topic for you, and that's the potential future of 3D on the web. I mean, over the last 12 years at Cesium, I think we've written about 500,000 lines of JavaScript code using WebGL. Then on the podcast here, Marc and I have hosted folks from the Google Chrome team, Ken Russell, Corentin Wallez, Brandon Jones, and my former student Kai, and they've done amazing work on WebGL and WebGPU and WebAssembly.

I'm curious about your perspective on 3D on the web, where it could go, and whether it will become as advanced as what you see with the game engines today.

Aras Pranckevičius:
It's exciting to me to see WebGPU finally starting to become a thing because, let's face it, WebGL was kind of getting fairly old. WebGL 2 is basically OpenGL ES 3, which is at least a decade old at this point. But for any practical purpose, you cannot use WebGPU just yet because the market penetration is just not there; that will probably change in a year or two or three. I think within various niches, 3D on the web is obviously already here, and it's here to stay, but what's the future of it? I don't know.

Actually, for this, my own Gaussian Splatting thing, this way of compressing them and whatever. Now I'm kind of thinking about, “so which of the web Gaussian Splat viewers should I try to put my compression into right now, or should I just make one from scratch?”

Patrick Cozzi:
Aras, we love to wrap up the episodes asking if you want to give a shout-out to any person or organization.

Aras Pranckevičius:
I would probably like to do a shout-out to Blender and the whole community around it. To me, that's one of the open-source projects that seems to be doing most of the things right. Because let's face it, making an actual product fully open source is a very hard thing to do and a hard thing to do correctly.

Blender, against all odds, is one of the success stories, I guess. Especially considering how old it actually is; the Blender codebase started in '94 or something.

It's still lacking in terms of being industry scale in a lot of aspects, which is a reason why Maya is still the king within the large productions. But Blender is sort of moving at a fairly impressive speed.

Marc Petit:
Well Aras, thank you so much for being with us today, employee number four at Unity, depths of knowledge in graphics and a level of humility that is fine to see. Thank you so much for being with us today.

Also thank you to our ever-growing audience. You can reach us for feedback on our website, BuildingtheOpenMetaverse.org as well, and subscribe to our LinkedIn page, our YouTube channel, and all the podcast platforms.

Aras, thank you so much for being with us today. Patrick, thank you, too. We'll see you soon for another episode of Building the Open Metaverse. Thank you.