Immersive Storytelling in the Metaverse

Season 4 episode 5

March 28, 2023

Ben Grossmann, Oscar-winning visual effects specialist and co-founder of Magnopus, joins Patrick Cozzi (Cesium) and Marc Petit (Epic Games) to discuss AR/VR, mixed reality, and immersive storytelling in the Metaverse.

Guests

Ben Grossmann

Co Founder and CEO, Magnopus

Announcer:

Today on Building the Open Metaverse.

Ben Grossmann:

Whether it's Fortnite or Roblox or Gorilla Tag, they're in iPads and VR headsets and mobile phones and laptops, which gets into this metaverse idea, the free and interconnected experience of things across physical and digital.

Announcer:

Welcome to Building the Open Metaverse, where technology experts discuss how the community is building the open metaverse together. Hosted by Patrick Cozzi from Cesium and Marc Petit from Epic Games.

Marc Petit:

Hello, I'm Marc Petit from Epic Games, and my co-host is Patrick Cozzi from Cesium. Hey, Patrick. How are you today?

Patrick Cozzi:

Hey Marc, I'm doing fantastic. We're in for a real treat today. I'm ready to learn a lot.

Marc Petit:

Today, we're super happy to welcome Ben Grossmann, an Oscar-winning Digital Effects Specialist, to the show.

Ben is known for his work on Star Trek Into Darkness, Hugo, from Martin Scorsese, and The Lion King, with Jon Favreau. Ben has also worked on... We'll cover some of them, a full spectrum of projects, from AR/VR, to theme parks, like Harry Potter, and a lot of other things with his company, Magnopus. So Ben, we're super happy to have you on the podcast.

Welcome.

Ben Grossmann:

Thank you, Marc, Patrick, good to be here. It's an honor.

Marc Petit:

We usually like to start the show by asking our guests about their backgrounds. Are you from Alaska?

Ben Grossmann:

I traveled around a lot as a little kid because my dad was in the military and then he decided to get out of doing crazy stuff in the military and settle down in Alaska in the middle of nowhere.

That's mostly where I was raised.

Marc Petit:

How did you find your way from Alaska to the metaverse?

Ben Grossmann:

Well, eventually, it gets too cold, and you decide you've had enough. Just throw all your stuff in the trunk of a car and drive south until it gets warm enough. That's what happened to me.

I think one of the things that's interesting about Alaska is that there are very few people there, so you have to become a bit of an expert in everything. And so that developed a personality of curiosity and learning. I mean, it's like even if you work at a bank, then you have to know how to run a chainsaw, and you have to know how to repair the starter on a truck. As a result, you're probably good at dealing with bears and all this other eccentric stuff that comes as a part of living on the frontier.

There's not really anyone to tell you you can't do anything. You've developed this mindset of having to do everything. That got me into media, computers, technology, and figuring things out.

I was a photojournalist. I worked in journalism, and then I got into television journalism. I was a weatherman in Fairbanks, Alaska. Then I think at a certain point I was like, oh, well, this stuff's cool. I got one of the first Avids in the state of Alaska, learned how to fix it because I couldn't afford the support contract, and became an editor, motion graphics, After Effects, all that kind of stuff.

Then one day, I was like, I'm running out of things to do in this area of work. I really got annoyed with how cold it was one particular day; my car broke down, and the starter died. So I got the car push started because I knew that I couldn't get the starter for a long time. I drove straight to California, and I stopped in San Francisco, but I couldn't find parking. So, I kept going to Los Angeles, and then I decided to get into Hollywood.

Patrick Cozzi:

I think I learned the temperature is nice in the metaverse as well.

Ben Grossmann:

It's wherever you want it to be in the metaverse.

I guess I only got halfway there. If I was going to get all the way to the metaverse, I did a big detour in visual effects, which is foundational, I think, education for having a conversation about the metaverse. But I got to LA, nobody would hire me for anything because I was from Alaska. Apparently, it turns out that if you work in Los Angeles, nothing that you did anywhere else really matters. You have to start from scratch.

I got a job as a secretary and a receptionist. I would answer phone calls and route things to the front desk, and then I would fix problems for people because that's what I inherently am attracted towards. Then, somebody, one day, said, "You'd probably be good at visual effects. It's got all that stuff."

I was like, "Well, I don't really know too much about it." And they said, "Well, nobody does." Visual Effects is the department of the movie business where nobody understands what's going on. If it's something they can't figure out some other way, they just dump it on Visual Effects, and then those guys have to figure it out. I don't know how it works. I was like, "That sounds interesting." I like chaos.

I started working back office for a Hollywood talent agent, Bob Coleman, who represents visual effects artists. I was doing websites and formatting people's resumes and compressing demo reels and stuff like that.

The deal was I'll do all this office work, but someday if you get an opportunity to put me in for an interview for a visual effects job, please consider me if none of your clients are available. One day that chance came; I started as a roto paint artist, where my job was to paint out dust and wires, and other undesirable items from film scans for visual effects movies. And then, yeah, I don't know, 10 or 15 years of that and won some awards, and then I was exhausted.

Patrick Cozzi:

Speaking of those awards, in 2012, you won an Oscar with Rob Legato for your work on Hugo, and then you created Magnopus and ended up working with Jon Favreau on the groundbreaking new version of The Lion King.

Virtual production has existed perhaps since the first Avatar movie in 2008, but you helped virtual production make a huge leap forward with this recent Lion King. Could you describe to us what you built?

Ben Grossmann:

Virtual production is really just everybody's dream in visual effects because when you're in visual effects, you're like, why is this so slow, and why does this take so long? There's a movie called Wag the Dog, which actually fooled me into working in visual effects; I watched the movie Wag the Dog, and there all the visual effects are real-time. I was like, awesome, real-time visual effects; this is super cool. You just say whatever you want, and it appears.

When I actually got into the visual effects business, I kept being like, where's the room where the visual effects happen in real-time? After five years, they were like, there's no room where anything happens in real-time. This takes months to render. I was like, well, this sucks.

So I spent most of my visual effects career trying to be like, how do we make this faster so that people can just feel what they want instead of having to articulate it in an email that gets written into notes that get assigned to 600 people?

Along the journey of visual effects, we were always trying to find whatever little tricks we could do to make visual effects happen in real-time so that we could avoid having to do visual effects in the first place. Virtual production is really just that in a nutshell. It fills that spectrum between physical production, traditional ways of making movies, and digital production, which we call visual effects. You just fill in that gap, and that's now these techniques of virtual production, bring the visual effects into real-time as much as you can, bring the interfaces of traditional filmmaking as much as you can, and put them in the middle.

Eventually, I think virtual production will just go away because it'll just be called production, but today it's a gap. On The Lion King, we had actually worked on The Jungle Book, and on The Jungle Book, if you actually see what it takes to make a really complicated movie like The Jungle Book, it is an immense accomplishment.

Thousands of people are just working night and day trying to make sense of the same problem that the metaverse has: how do you reconcile the physical world we can see with a digital world that we cannot? The director, Jon Favreau, was so mentally exhausted from trying to imagine what the blue was going to become someday that even the film crew was like, "Wait, I don't get it, is that the bear, or is that the giraffe?", "Wait, what's going on? Is the boy on a tree, or is he on a rock?"

When you look at that problem, you're like, "man, this sucks." Then the problem is that visual effects is such a low-margin business. There's no financial availability for innovation and when you do innovate some little piece of technology, you can't really productize it into a solution.

Surprisingly, we would be going to all these SIGGRAPH conferences, and we'd see all these amazing things. I started in roto paint; I take roto and paint very seriously. When I was a visual effects supervisor on set, I'd be like, no, no, no. We need better tracking marks there because when I was a match mover, that was really hard, and we need better green screen illumination there because that hair detail’s going to have to get painted back in frame-by-frame by an artist because that was my job.

But the problem was that I'd see all these research papers at SIGGRAPH, and I'd be like, but they solved that problem five years ago. We just don't have it in visual effects because no one productized it.

In Jungle Book, it was really tough, and I saw how much of a struggle it was for everybody on the crew, and they did amazing work. If you saw that movie, you're just like, wow, how is it that we have a real boy integrated into a completely digital world? That was visual effects artists, and that was grips, and that was direction, and that was editorial; that was everybody working together, and it just almost killed them all.

On The Lion King, Jon was like, I don't know, how do we do this? Is there a better way? And it was like, why don't we just try throwing out all the visual effects software and starting over, and let's build the world almost like it's a video game, and then let's make a video game called Make a Movie, and we'll build the world of the movie inside the video game, and then we'll shoot the movie inside the video game. It was like, is that possible? It's like, eh, probably.

Most people were not really happy about "probably." Eventually, we had to upgrade that to “absolutely, a hundred percent, no problem.” It took a little bit of work, but instead of working with visual effects technology that had evolved in fits and starts over decades, we were able to start over in a game engine that was written for games and start building things that way. We just wrote all the physical filmmaking tools into the game engine, then the filmmakers could decide I could put on a VR headset, and I could be in Africa looking at lions, or I can just look at a monitor like I would on a movie set, and I can see the same kind of experience. I can operate physical equipment, cranes, dollies, cameras, drones, whatever, that I would operate on a movie set, but they're all synchronized into this virtual environment in the game engine.

It almost was deceptively simple. By the time you were done, you were just like, it really wasn't that hard. The process of making the movie was really pretty easy. People could come on set who were traditional film crew people, and at first, they'd be like, what the hell is going on in this stage? You're like, it's actually a lot simpler than it looks. You just do the job you always have done, and here's the equipment that you've always used. It just happens to be synchronized between physical and digital.

That was fun because even though we couldn't achieve the quality level that we really wanted, and we couldn't make it totally photo-real and all that stuff, it was enough that editorial knew what they were cutting to as opposed to on The Jungle Book where it was like, okay, wait a second, for this one shot, I got seven plates that I have to soft cut together.

They're like, what are we doing? It's crazy.

We would take things in The Lion King, and make them indicatively close enough so that editorial knew what they were doing. Then the visual effects artists stood a better chance of going home at night at a reasonable time to have dinner because they were like, oh, this is what the director wants. This is what the cinematographer shot, so now we just need to make this look photo-real and add all the polish and beauty to it.

Even lighting direction and stuff, we're doing it in real-time on set. It served as a blueprint. To a certain extent, it kind of took visual effects, which is a chaotic process that has been commoditized like a factory. It gets treated like a factory, but each product the factory makes is a unique prototype that there only exists one of.

It takes that process, and it makes it more like the way Apple manufactures phones and computers: let's do all the crazy design in real-time over here with the people who can make decisions and commit to them, and then we can execute those designs with much more predictability because they're not going to suddenly change the camera or go in a different direction and all that kind of stuff. That was the thing that we tried to accomplish there, and we were super excited about it. Jon Favreau wanted to take that further on his next project; that was even more virtual production than anybody anticipated.

Marc Petit:

The Lion King system was more of a design system, and then you guys, Jon Favreau, and you had this vision of trying to generate the final pixel with the game engine. Using the game engine to generate the final pixels in an LED volume for that super-secret show at the time called The Mandalorian. The virtual production and the post-production work ended up being done by ILM, and I don't think that people know the role that you played, at least in the inception, in the vision.

Ben Grossmann:

I am only but a tiny little piece of a much larger team full of really talented people. I will also return some credit to the team over at Epic that Kim put together in special projects because I think without them, The Mandalorian would never have existed that way. It would've been a visual effects show, and that's that.

There's no question in my mind that without the special projects team in Epic proving that that theory was true, that never would've happened.

But Jon was like, okay, I love this. I love what we're doing here on The Lion King. I like being able to go back and forth between VR and monitors and whatever, but the next project that I'm going to work on has more physical sets and more real people. It was funny; at one point, somebody said, do we just put the actors in VR headsets and then paint out the VR headsets later?

I was like, here's a better idea. How about instead of doing that, and Rob Legato and Jon and everybody were all brainstorming on this, why don't we just instead build a stage that's a VR headset and shoot inside that? Then it was like, well, hang on a second. Is that even possible? It's like, yeah, there are a lot of little technical things to overcome, but you just have to decide that you're going to do it and prove that those things are correct, and then it probably works. The Mandalorian just needed to be proved.

We actually kind of proved the concept while we were working on Lion King by taking a television set and a camera and filming little miniature scenes in front of the television set. Rob set up all this stuff and said, okay, well, here are the problems we have to solve.

We have to get camera tracking. We have to get this thing fed, and then we have to do that times a really big stage. That was super awesome, but it was really as simple as, well, sure, we just have to build a VR headset that's as big as a stage and then film inside that.

It does kind of go back to the movie Wag the Dog, which is now you're in a magic room where in theory, as technology continues to improve, you could just click a button and do location changes. It's not that simple today, but it will never get that simple if we don't start solving all those problems and keep working on it.

Patrick Cozzi:

You were interested in following where consumers were going. For example, they realized that maybe immersiveness is more important than visual fidelity. Could you tell us more?

Ben Grossmann:

Oh, yeah. It's really simple, and I don't recommend this to anybody, but the first step is to have a kid. As soon as you have a kid, you suddenly realize that that thing that you were working on for all those years is no longer important anymore and it's actually immaterial. I was like, yeah, working in the movie business, famous directors, super high-quality stuff, winning awards, doing badass shit. Then I had a kid who just didn't give a single shit about any of that. This little girl was just like, Hmm.

She was much more interested in interacting with things than watching stuff. And when I watched her watch stuff, it was almost like her brain shut off, and she just started to drool because she was just staring at a television. I started to have this sort of existential crisis where, as a parent, I was like, oh my God, what have we done?

We're creating these spectacles for televisions that just suck people in and make them just couch potatoes or just stare and consume and blah.

When I would watch my daughter play with toys or even play with her iPad, it was like she was learning, she was engaged, and she was doing cool stuff.

I don't want to diminish the value of one-direction storytelling and cinema and all that stuff, but I was like, when we were kids, those things didn't exist. The world that we lived in was experiential by default. We were always running around in the woods, running around in the city, and we were falling and hurting ourselves. We were experiencing the world. But these days, now, digital media is everywhere, and if you're just consuming that, it's two-dimensional, usually, it's linear, your brain just bluhhh.

I wonder what happens if we do what we've been doing with visual effects for all these decades, but in the real world instead of for filmmakers; what if we could put children inside the television and let them run around?

What if we could take the world that's beautiful and amazing, all these wonderful places that get created in movies, and just put them into the world around us, whether it's parks or shopping malls or your street by your house? What would that be like? That problem that we talked about with SIGGRAPH where it's like you have all this really great research that doesn't actually get put into people's hands. We looked at all of the research, and we were like, it's all there, but no one is making it available to people. Some company needs to exist in the middle of this.

Every time we would go work for a visual effects company, you'd end up working for visual effects, and you just keep doing the same thing. You work for a movie studio, and you just keep making movies. When you're a carpenter, every problem just looks like a nail because that's a hammer you're holding.

We decided we needed to create a new company that just could exist in that gap and try to pull those two things together and say, what if we could make these non-immersive worlds actually more immersive? What if we could make these really high-quality experiences have the interactivity of low-quality games? What would happen? What if you could do physical things with digital? What if you could do one experience that goes across multiple devices? That's kind of our jam. It really just comes down to trying to fill that gap and pull those things closer together.

Because now, particularly, I think because people are into the internet, apparently that's a thing that's going to stick around for a while. All of these different things that used to be very separate are now converging, like sports and music and movies and games.

They're all sort of headed towards each other, and we still treat them like they're separate industries and separate experiences, but children don't. Now, that little girl that was a kind of whiny, complaining little brat when she was like three is now 12, and she still complains a lot, but at least she can articulate her feature requests much better than she did when she was three.

She and all of her friends move back and forth constantly as a group into all these different experiences, whether it's Fortnite or Roblox, or Gorilla Tag; they're in iPads and VR headsets and mobile phones and laptops. They're on all the platforms simultaneously, and they're doing all different things. Sometimes they're watching, sometimes they're gaming, sometimes they're playing, sometimes they're learning, and they follow that freely, which kind of gets into this metaverse idea, which is really just the free and interconnected experience of things across physical and digital.

That's what everybody's kind of working on. Everybody has different descriptions, nobody agrees, but generally speaking, you just have to watch a bunch of 12-year-olds, and you'll figure out exactly what the metaverse is.

Marc Petit:

It looks like creatives from old media are converging on the same set of tools. They're getting to speak not the same but similar languages, particularly game engines.

Now we have, as you mentioned, some of them; we have platforms to actually share the 3D content.

Take out your crystal ball. What can we expect from that fusion and have everybody with the same tools literally on the same platforms?

Ben Grossmann:

It's funny because when you say Roblox and Fortnite, it's interesting. I consider them very early glimpses of just one piece of what the metaverse promises. I guess I should probably take a minute to define the metaverse, at least the way I think about it.

I kind of did, using the 12-year-old kid's metaphor, but a lot of people interpret the metaverse concept from where they come from. If you come from games, then you think the metaverse is just interoperable games. It's a network of game worlds. That's how you think about it. If you come from crypto, then your entire existence is predicated on the notion that there needs to be a decentralized blockchain or your crypto coins are worth nothing. You tend to think the metaverse has to be a decentralized, whatever it is. If you come from movies, then the metaverse is, what is this? Just movies in VR.

Everybody has this different angle on it, and all of them are correct, but only partially. If you think about the metaverse, the way that the internet works, what's on the internet? Information, experiences, entertainment, content. You could play games, you could watch movies, you can read Wikipedia, you can learn, you can go to school.

The internet is a connected network of all these kinds of experiences. You use it for work, you use it for entertainment, it does all those things. You watch sports on it. So what is the metaverse? It's just a cross-platform, cross-reality, and spatialized version of that.

I think about it like this, whenever I'm trying to explain this to my mom for the fifth time, I say the internet today, as you know, is a series of connected pages. It uses a newspaper metaphor.

That's how it was designed. You have pictures and words on a page, and then you can change pages. The only advancement over newspapers from a hundred years ago or 200 years ago is that you can click on the words, and they take you to other pages, or you can click on the pictures, and they move into videos. But that's how the internet is architected. The current thinking for the metaverse is just, well, we live in a physical world of spaces.

Bedroom goes to living room, living room goes to garage, car, office, gym, movie theater. Those are all spaces connected. If the current internet is a series of connected pages, then the next internet or the metaverse is a series of connected spaces. Those spaces can be in the physical world or the digital world or anywhere in between.

When we think about creating content for something like that, you suddenly realize, oh, well, music is going to need to use the same architecture that games use, which needs to use the same quality bar that movies use, which needs to use the same interactivity level and passion level that education uses.

All of those things kind of smash into each other; you realize everybody needs the same thing, but they're just calling it different shit. All we got to do is agree on the same simple standards that we had when we invented the internet with HTML. Okay, that's the next one. And off we go.

What's possible is you're immediately going to have a million problems. Anytime you throw out the foundations of a civilization and bring in a completely new one, somebody's going to lose, and somebody's going to win. The bigger problem is really content creation, which is like the number one problem because if you look at the visual effects business and the movie business, what's the problem there? The cost and the time it takes to make great experiences. Look at the games industry. How many years do you work on a big game launch?

They're worth billions of dollars, but it takes so much time. The irony is people are freaking out about AI, and they're just like, oh, no, AI's going to take our jobs. I'm like, you couldn't have kept up with the demand that is just around the corner if you wanted to. The reason why I can tell you, because this is the line that Magnopus lives in, I can tell you for the last 10 years, people have wanted things from us that we couldn't deliver because they were too expensive to produce.

People would come in with an amazing idea, and we'd say, that is a genius idea. It's at least $20 million worth of labor. They'd be like, oh, well, I don't know if I can make... Shoot, I don't know. Then they would exit through the sad door.

As we start to innovate all these pieces of technology that make content production scalable and interactive and around common standards and common file formats so that a particular thing that you made can be used in sports or education or entertainment or whatever in the physical world or the digital world, then suddenly, now, anybody can afford to have these great experiences that start to mirror the things that we as humans have been doing for hundreds, thousands of years. Now, you're into some cool shit.

I think, what does it look like? I mean, it's like a whole bunch of chaos and turmoil and bitching and moaning, but eventually, just resigning ourselves to the fact that people will now start to get used to living in one world instead of two. Right now, they live in two. When I'm walking around on the street, it's a physical world. When I'm sitting here doing this at two o'clock in the morning, doom scrolling on Insta, whatever, that's the digital world, and we're just going to put those two things together, and it'll be a generally much better experience.

Marc Petit:

You said something about how, in virtual production, actors get more context.

Music is something that stands, I believe, to benefit from the metaverse; you worked on the Madison Beer virtual concert with Sony.

What's the potential for performers in the metaverse? What opportunity does it open?

Ben Grossmann:

Music is a crazy place.

We all kind of remember Napster, and some of us remember records to cassettes, to CDs, and that transition. When you think about games, think about computers, you think about retail, you can think about all these things. I don't think there's a single industry in the world that has been on a bigger rollercoaster than music. If you just go back to the history of the world and just look at music, it is like woo, woo, woo, all over the place.

I really feel for musicians, and what they've been going through because there's never been any stability in the music business. It's like from one generation to the next, it's a completely different industry. It's a thing that everybody agrees that they love and they hate. It's a part of everyone's, like, everyday life.

People listen to music constantly and constantly trying to reinvent and tell stories and do all kinds of crazy stuff.

In the metaverse, I actually think that going through a phase where music videos were the thing, it wasn't even a song if it didn't have a music video, and you had to go to MTV, you watched the music video; now the TV's doing reality television, and, like, nobody watches. It's all kind of weird and changed. For musicians, bless their little hearts for continuing to give us their art, despite the fact that we have somehow decided we no longer want to pay for it and the music industry is all in this chaos and turmoil. Musicians become artists much more than just through people's ears. They now become an identity and a culture rather than a song and a sound.

These new possibilities of the metaverse give them a much broader canvas to create with, because musicians are extremely creative and inspiring individuals who really aren't just limited to creating music.

They're an expression of art in many other ways, but they just haven't been given a really good canvas to do that because it's very cost prohibitive. Music videos are like a million dollars, and the economics of them very rarely work out because no one buys music videos; they’re just marketing for songs. But now no one buys songs anymore. So what are we doing exactly? People subscribe. What do you do? Your budget for your music video is 2% of 20% of 50% of the gross. You're like, how do I budget that?

The metaverse gives people an opportunity as we democratize content creation, it gives musicians an opportunity to create communities where they are in a bidirectional relationship with their fans in a much more sort of, and I'll use the term, the Silicon Valley term scalable way because right now, the way they do that is they kill themselves touring around the world until there's nothing left.

They're exhausted. They can't even sing anymore, and they can't possibly connect with all the fans as intimately as they want to. In some way, well, I'll go back to my daughter as an example. My daughter has a VR headset, she's got all the tech she wants. She went to a Taylor Swift concert in VR; in her experience, when she was recounting that to somebody else, the VR wasn't a part of it. She didn't say, I went to a Taylor Swift concert in VR. She said, I went to a Taylor Swift concert. Actually, wait, I take it back, it was Billie Eilish, not Taylor Swift. She actually went to a Taylor Swift concert in real life, Billie Eilish in VR.

To her, she was at the concert, she saw the experience, and she saw the stuff that everybody else saw. She was in front row seats, all this kind of stuff.

The VR wasn't really a material part of it, it was just an accessibility thing. She had a really sweet performance because Billie Eilish kind of looked right at her and gave this experience that my daughter just said, yeah, that's the same as a physical experience. Now obviously, we know that there are differences, and so does she, but are they material enough? No.

They made it so that this performance could be more individual and could be unique and could be better, and could reach a larger audience without as much of the sort of soul-draining exhaustion of trying to get in every show in Cleveland before you move on to Spokane or whatever it is. I think that this opportunity will allow people, and right now, the business model hasn’t kept up with the possibilities. Like the business has to keep up with the possibilities, or we can't do this at all.

Because if people can't make money, then they can't invest in making content. If they can't invest in making content, then there's no content to consume. There's no point in buying a device because there's nothing there to see. As we do all of these things iteratively and build them all up, musicians as artists will create amazing spectacles, and they will be able to have a better relationship with their fans. They will be able to have better economics for themselves, and they'll be able to reach more people. The economics just have to be there to support that transition.

Right now, the economics for a musician are if you're not touring, you're not making money. If you're not selling merch, you're not making money, because you're not selling music because that doesn't happen anymore. I think music is a place where musicians deserve a better experience, they can have a better experience, and we just have to make that possible and then let them run with it.

Patrick Cozzi:

Ben, we're hoping to talk a little bit about theme parks and digital-physical integration.

At Magnopus, you just received the Time Magazine Invention of the Year award for your work on the Dubai 2020 World Expo. Could you tell us a bit about it?

Ben Grossmann:

You've heard me say a lot about filling in the gap between physical and digital. This is a sort of tricky thing because most people actually just focus on digital because it's easy. Even our developers here at Magnopus always gravitate towards digital first, because it's easier to do digital first. It's really hard to build things in the physical world. It takes years, whereas in the computer, you can just be like, well, I did this in a week, and then, over here, you're like, well, that took five years, and we're not done yet.

The physical world is the world that we live in. It's what we're most familiar with. If you want to change the future, you've got to go where people are. Start there.

Many people are like, no, no, no. We're going to build this amazing island out in space, and we're going to lure everybody into it with the Sandbox or Decentraland.

I'm like, no one's there. No one cares because you're trying to build someplace and then attract people to come there. You got to build where they are. So, we build from the physical world up, rather than from the digital world down.

A lot of our experiences, even though we've done certainly tons of VR and completely digital worlds, a lot of the thinking behind our experiences starts from adapting the physical world. We had been doing a lot of little prototypes and experiments; we had been thinking, okay, we kind of need to build a platform that isn't an AR platform and isn't a VR platform and isn't a web platform. We had all these customers who were like, well, I want to build this thing, but then eventually, I need to adapt it to this, and to this, and to that. We started focusing on building something that was all of those things at the same time.

We don't want to make a game engine because there are great game engines out there, as you well know. We just want to make the connective technology to make it so that when somebody invests in making an experience that works in VR, it also works in AR. If it works in those places, it also works on the web, and so on and so forth.

Reducing the barriers, getting this metaverse idea out there, we'd been doing a lot of prototyping and experimenting with saying, Hey, AR and VR, now this is, I think, accepted as self-evident, but at the time, it was a bit of a…people kept telling us we were idiots. AR and VR are basically the same thing. It's just what television you're watching it on. In AR, it's the same content, but it has a physical world background. And in VR it's the same content, but it has a digital copy of that background.

You could look at it that way, and that's how you're going to bridge this digital and physical gap.

We had done a bunch of experimentation with a bunch of different people, and we'd said, okay, it's time to start experimenting at scale instead of just, we know we can do this with 10 people at a time, but what would happen if we did hundreds of thousands? What would happen if instead of doing this room, we did a small city?

We made a prototype back in 2016, '17, and '18. It was constant iterations; we basically made a digital twin of a room, and we made it accessible on AR glasses, on a VR headset, on a webpage, and on an iPad. I think that was it because, from all those platforms, you pretty much have everything. Desktop application was also one of those. We made them all in sync so that wherever you were, all those users could see each other from all those platforms.

Wherever they were, they would move around appropriately. It was super trippy, and we're like, awesome. This works at small scale. Let's try to build something big. We knew that there was a World Expo, and World Expos are kind of a great opportunity. They used to be showing the world new technologies. Now, they're showing the world food and culture and cuisine because technology happens now on the internet and in real-time. You don't have to go to a World Expo to learn about the latest advancements in steam engines anymore.

Fortunately, this expo was in Dubai, which was a place that not very many people, at least in the western world, would get to go, even though it's one of the biggest travel hubs of the world, particularly for Africa and Asia. The founders of the expo, their message was connecting minds and creating the future.

We were like, this is perfect because this is exactly what the metaverse is really about. They didn't have anything around that strategy as formally part of their mission.

We made about 600 pages of presentations and spent about literally a year working with them to kind of create a vision to say, what if we made a digital twin of the expo in Dubai and we made it accessible and bidirectionally accessible to people who are onsite and offsite.

AR content would be viewed onsite through AR, and remote content for remote visitors would be viewed on top of a digital twin. Then because of the pandemic, which happened right in the middle of building all this complicated stuff, because of the pandemic, we said we actually need to lower the quality of everything so that we could increase the breadth of everything, so we can get more content in there accessible on lower end devices. Because it became obvious that the pandemic was going to impact people who couldn't afford to travel; now it was even more complicated. We needed to make it more accessible.

We ended up doing exactly that. We built... It was accurate to within three centimeters, including a positioning system that would go from GPS satellites down to Wi-Fi spectrum, down to visual positioning, which is the phone sees something, creates a 3D fingerprint of it, checks it in the cloud against a 3D fingerprint of the known area that it was in based on GPS, then reconciles exactly where the phone is on the physical site to within centimeters and then can pull all the models and content.
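The coarse-to-fine idea described here can be illustrated with a minimal Python sketch. It is not Magnopus's implementation: the `Estimate` type, the `refine` and `localize` helpers, and all coordinates and uncertainty radii are hypothetical, chosen only to show how each finer positioning stage could be accepted when it agrees with the coarser one before it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Estimate:
    x_m: float       # east offset from a hypothetical site origin, in meters
    y_m: float       # north offset from a hypothetical site origin, in meters
    radius_m: float  # uncertainty radius of this fix, in meters


def refine(coarse: Estimate, fine: Estimate) -> Estimate:
    """Accept the finer fix only if it is more precise and consistent with the coarser one."""
    dist = ((fine.x_m - coarse.x_m) ** 2 + (fine.y_m - coarse.y_m) ** 2) ** 0.5
    if dist <= coarse.radius_m and fine.radius_m < coarse.radius_m:
        return fine
    return coarse  # keep the coarser fix when the finer one disagrees with it


def localize(gps: Estimate,
             wifi: Optional[Estimate],
             visual: Optional[Estimate]) -> Estimate:
    """Start from GPS and narrow the estimate with each stage that is available."""
    estimate = gps
    for stage in (wifi, visual):
        if stage is not None:
            estimate = refine(estimate, stage)
    return estimate


# Illustrative numbers only (assumptions, not measurements from the Expo).
gps_fix = Estimate(100.0, 200.0, 10.0)
wifi_fix = Estimate(103.0, 198.0, 3.0)
visual_fix = Estimate(102.4, 198.9, 0.03)

best = localize(gps_fix, wifi_fix, visual_fix)
```

In this sketch a visual fix that lands far outside the Wi-Fi estimate is rejected, and the system falls back to the last stage that was consistent, which is one simple way to keep a bad fingerprint match from placing content in the wrong spot.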

I know this seems a little heavy now, but it basically was like you could have a digital user seeing a physical user, bidirectionally, in this large digital twin, and we failed so many times.

The entire process is just a journey of failure until time runs out, and then you ship what you've got, which was, I think, better than anybody else has ever done. It was a really hard problem, and we learned a ton from it.

As a result, that Time Magazine thing that you talked about was us saying, cool, now that we have paid the iron price for making this stuff work and, courtesy of the vision of the team at the World Expo for supporting us through a very rocky journey, we can now say, okay, we have a little bit of the glue that can plug all these pieces in so that other people could do this. I don't think there's many people who have spent quite as much time and money as that took on trying to put something together like that. That was nuts.

We're excited, but that's pandemic aside and the current aversion to maybe Mark Zuckerberg personally and the crypto community in general, maybe the metaverse having a little bit of an identity crisis right now. But in general, we know that these technologies just need to be made available and, as much as possible, open-sourced and standardized so that people can say, hey, I have a digital twin in my house. I use it for things that I would use my regular house for. I invite guests over. I also do planning and things like… there's a million things that you can do with these things. But right now, the technologies are just not accessible to the average person.

Marc Petit:

When you look out at the next 10 years of content creation, where are your bets on the table in terms of the technologies you think are very promising?

Ben Grossmann:

All of these things are good for specific purposes, and I feel like sometimes a lot of what frustrates people in business or whatever is that when they come to the metaverse, they keep thinking that there's going to be this simple concept they can wrap their heads around and then say, "we actually do everything like that."

I'll give you an example, 360 video. When 360 video was all the rage in 2015, 2016, everyone was like, everything's going to be 360 video. It was like we did a bunch of 360 video just so that we could understand it. I do think that 360 video has a role to play in the future, but 360 video is kind of a shit experience. 360 video is an example of something where everyone was like, okay, we got to start covering everything in 360 video. Then they got disillusioned by it because they're like, hang on a second, this is expensive and time-consuming and not necessarily a great experience. Why are we doing it?

It's like, yes, 360 video is one slice of the pizza, but it's not everything.

Different experiences require different solutions. Volumetric video is another one of those examples. Is volumetric video the universal thing that's going to unlock it? No, it's not. It works great in a specific use case, and it can be mixed in with many other things. There are times when you need MetaHumans, there are times when you need volumetric video, there are times when you need 360 video, and there's times when you need 2D video. This is why one of the sort of essential ingredients of this metaverse construct is that it supports all of those things because all of them are the right thing at different times. There are a million examples of why you might want all of those in one experience or why you might be able to only do sort of one of them.

I think that's why it was so important for us. People used to get really upset sometimes at Magnopus. They'd be like, we never do the same thing twice; for god's sake, can we just do one thing and then do another version of the same thing? It was like, no, we're doing all these different things because we have to be able to support all of them into a common experience because that's how the world is going to work in the future. It's not just going to be connected pages with words and pictures that move. It's going to be all these kinds of media across physical and digital, some of them interactive and some of them not.

When it comes to the metaverse, it is really important for us to have non-licensed technologies because you can really kill an entire industry if you make these things proprietary.

I think, for example, Pixar's USD, Universal Scene Description, is a really important way to hierarchically organize scenes with many internal and external dependencies and reference them in an open way that can be interpreted. It's not very good for real-time right now because it can be very slow when you're trying to assemble all these things, and there are a lot of improvements that it needs, but something like that. Then you start to get into things like a variety of different 3D model formats and schemas.

I think the question that you really asked me was trending a little bit along the lines of what aspects of those are most exciting and most important to content creation and unlocking the metaverse. I'm going to throw out something that's a little bit controversial right now, and it's not quite ready for primetime. It took me a long time to understand it, and I had to really play with it a lot before I kind of started to figure it out.

Obviously, me and my team and everybody else in the world has got many decades of experience building 3D content in today's methodology for creating 3D content, which is I have a mesh. If it needs to move, it has a skeleton or a rig. It has UV coordinates so that I can apply textures to it. Those textures have some schema of auxiliary variables that will allow a renderer or shaders to communicate how it should behave under light from different camera angles. Then I put them into an environment that renders the calculations of all those meshes and textures.

This is the most efficient thing that we have today, and it is the foundation of all computer graphics, but it is extremely hand-built and time-consuming, and it's a barrier to scalability because our entire computer graphics industry is architected around this. In many senses, it's because we are paying the price for decisions we made 20 years ago about how this all should work.

I think that in order to truly unlock the possibilities of content creation, we're going to have to do something very scary that's going to freak a lot of people out, which is that we're going to have to let go of that, and we're going to say, we have achieved as much as we could achieve with this model of physics that is meshes and rigs and textures and renderers, and we are going to have to abandon it all for a new model that is based on neural rendering.

When I started to see the first examples that researchers were kind of just accidentally mucking around with in the area of neural radiance fields, and these terms are not going to last, they have to change, because the research community... I would read a white paper on how to name white papers, because the names that people come up with in the research community just crack me up to no end.

Neural radiance fields, if you don't know and you look into them, you've seen them on Twitter, probably. There's some really great stuff. Tons of problems, tons of problems right now, no dynamic content, no this and that, no reanimatability, blah blah blah. But essentially, what a neural radiance field is, and I'm oversimplifying for the audience and myself 'cause I'm not that smart, is that the computer understands this object and what it has learned about that object.

If you basically just say, "Hey, I'd like to look at that object from this angle," it goes, "Great, here's what that would look like." I know that that sounds like a very oversimplified version of what a neural radiance field actually is, but when you look at the properties of a neural radiance field, the storage implications of them are actually way smaller than a 3D model and a mesh and a texture and a rig and a renderer.
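To make "look at that object from this angle" a little more concrete: in the research literature, a neural radiance field is a learned function from a 3D position and viewing direction to a density and a color, and a pixel is produced by compositing those values along a camera ray. The Python sketch below shows only that compositing step. The `toy_field` function is a hypothetical hand-written stand-in for a trained network (a red sphere), and the sample counts and ray bounds are arbitrary assumptions.

```python
import math


def toy_field(point, view_dir):
    """Hypothetical stand-in for a trained network: returns (density, rgb).

    The "scene" is a solid red sphere of radius 1 at the origin. A real
    neural radiance field would learn this mapping from photographs.
    """
    x, y, z = point
    inside = x * x + y * y + z * z <= 1.0
    density = 5.0 if inside else 0.0
    return density, (1.0, 0.0, 0.0)


def render_ray(field, origin, direction, near=0.0, far=4.0, n_samples=64):
    """Composite color along one ray by sampling the field at evenly spaced points."""
    delta = (far - near) / n_samples
    transmittance = 1.0  # fraction of light that has not been absorbed so far
    color = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta  # midpoint of the i-th segment
        point = tuple(o + t * d for o, d in zip(origin, direction))
        density, rgb = field(point, direction)
        alpha = 1.0 - math.exp(-density * delta)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return tuple(color)


# One ray that passes through the sphere and one that misses it.
hit = render_ray(toy_field, (0.0, 0.0, -2.0), (0.0, 0.0, 1.0))
miss = render_ray(toy_field, (3.0, 0.0, -2.0), (0.0, 0.0, 1.0))
```

Changing the viewing angle just means casting rays from a different origin and direction into the same field; there is no mesh, UV map, or texture to move.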

Although they're hard to calculate at first, there are a lot of aspects of research around the area of neural radiance fields that have optimized how computers can learn about 3D objects. It used to take many days to calculate one, and now it's getting down to hours and minutes.

Okay, cool. Once we have that, then essentially your knowledge of an object is not, here's a 3D model, and here's a bunch of textures, and here's a bunch of rigs, and here's a bunch of renderings, schematics, and all this shit, it's actually just, here's a knowledge of that object and what its capabilities are. Now, you can layer on top of that some semantic understanding of its value, and of its variables and of its variations, and its utilization. Then you can start to layer on top of that more layers of understanding so that in the future, a child, like my daughter, can say, I want a pony, and it creates a pony, and it's a 3D pony, and you can ride that pony.

It's like, no, no, no, a pink pony. A rainbow unicorn. And it's like, sure, we could do all that stuff, and we're not moving meshes, and we're not trying to select pieces from an asset catalog and assemble them because we are re-architecting our GPU and our graphics pipelines around the synthesis of volumetric information in real-time using knowledge and learning rather than using simulated photons and reflective surfaces and meshes.

I think that that's going to be a very hard thing for people to let go of, but it's going to be necessary for us to break into that new era of, oh my god, we're creating insane amounts of content that people are having a great time experiencing.

Patrick Cozzi:

To wrap up, I want to ask if there was a person or people or organization that you wanted to give a shout-out to?

Ben Grossmann:

We were just talking about AI, and I just kind of was making the case that this is going to be critical to the next generation of awesome or at least suck less. So how about a person?

I was recently on a project where I was doing a bunch of really complicated research for a bunch of really big stuff that, hopefully, you'll all get to appreciate. I got a chance to meet a woman named Timnit Gebru, and she was fantastic, and she was responsible for ethical artificial intelligence research at Google before she left with many other people in her team, rather controversially, which you can look up, but that's not really her claim to fame. Her claim to fame is the work that she's doing around addressing systemic bias in artificial intelligence development and the ethicality of many of the things that we take for granted about the way artificial intelligence works.

I think the biggest risk that we have as a civilization today is that we need to harness the power of artificial intelligence for all of the things that we need to do in life and also to help us protect us from ourselves with climate change and other things like that.

If we move too fast and do things just because it's cool and because we can, then we will wipe ourselves off the face of the planet, and listening to voices like Timnit's is the way to do it. It's the way to do it right, at least. I would suggest everybody take a moment and go listen to some of the things she said and some of the things she's written because I think that that's going to be one of the most important things to make the world a better place for tomorrow.

Marc Petit:

Ben Grossmann, you won an Oscar for your visual effects work. You created such a trailblazing company, experimenting with great projects, advancing the state of the art in moviemaking.

Thanks for sharing your vision and your passion with us today. It's been amazing. Thank you so much.

Ben Grossmann:

Thank you for inviting me, and I have only done those things because I happen to be standing next to a bunch of really smart people who did most of the work. Stick close to talented people and try to keep up.

Thanks, Marc. Thanks, Patrick. Good talking to you.

Marc Petit:

Thanks to the audience. Season four is doing very, very well.

We have a website now, buildingtheopenmetaverse.org; we have an email where you can chat, feedback@buildingtheopenmetaverse.org, so do not hesitate.

Let us know what you think, let us know what you want to hear about, and we'll be back with the next episode soon.

Thank you very much, Ben. Thanks, Patrick. Thanks, everybody.