Podcast
The Future Of Generative AI in Robotics
In this episode of The Future Of, Daniel Rosenstein, Group Product Manager of Advanced Autonomy and Applied Robotics at Microsoft, joins host Jeff Dance to discuss the future of generative AI in robotics. They discuss why generative AI matters for robotics, particularly the integration of visual libraries and video analysis, how generative AI enhances human-robot collaboration, and the role robotics will play in our everyday lives in the near future.
Dan – 00:00:01:
So having the brain on top of the machine, like the LLM and the LMM, the large multimodal model, are absolutely acting like a large brain on multiple machines, like it’s not just one machine. But on top of that, the ability, not just for the LLM and the LMM to be able to do the translation, but also do reasoning on that translation. And furthermore, the ability to orchestrate that reasoning across a set of plugins, either through Semantic Kernel or through LangChain,
Jeff – 00:00:32:
Welcome to The Future Of, a podcast by Fresh Consulting, where we discuss and learn about the future of different industries, markets, and technology verticals. Together, we’ll chat with leaders and experts in the field and discuss how we can shape the future human experience. I’m your host, Jeff Dance. In this episode of The Future Of, we’re joined by Microsoft’s robotics expert, Dan Rosenstein, Group Product Manager of Advanced Autonomy and Applied Robotics. He’s here with us today to explore the future of generative AI in robotics. So grateful to have you.
Dan – 00:01:10:
I’m super happy to be here, Jeff. When you asked for me to be on the podcast a few weeks ago, I actually didn’t even know about the podcast and started listening to it and now it’s in my regular rotation. So I get to hear your voice every few weeks, which is awesome.
Jeff – 00:01:25:
Great to hear that. You know, I was so impressed in looking over your background. You’ve been at Microsoft for over 25 years. You have a comp sci degree from Washington University. You later got a master’s in engineering from UDub with a focus on robotics. But tell us more about your journey. You know, you’ve been at Microsoft for so long. You’ve probably seen so much evolution, so much change. Tell us more about your background also, how you got involved deeper in robotics as a leader there.
Dan – 00:01:57:
Yeah, no, thank you for that, Jeff. So, my journey starts like everybody’s, when I was a little kid and had first impressions. I’ve been in love with robots since I was a little kid. I happened to be the son of an engineer, for better or for worse. And so my dad is an aerospace engineer, or was an aerospace engineer, I should say, and did software companies while I was growing up. So I’ve been around technology my entire life. And when I went to undergrad, I went for electrical and mechanical because I wanted to do robots, and then ended up switching to computer science because I couldn’t take electrical or mechanical classes my freshman year. Ended up loving computer science, although I thought I didn’t want to do it because my dad was in software. Didn’t think I’d be at Microsoft for more than a year. And as you noted, it’s been 25 years for me. I’m an engineer’s engineer. I know that. I like leadership, it’s great. I’ve managed multiple times in my career, but I really like being hands-on with the technology. And I’m a dreamer, I’m a thinker. I’m a doer as well. I like to make stuff. I like to build stuff with my hands. What I think about is not always what I’m able to create. And then I heard that the Xbox project was happening and got my way onto that project. And my big claim to fame is I was the youngest engineer, full-time engineer, on the original Xbox and Xbox Live. I parlayed off of Xbox and Xbox Live to more media work. And then basically every job that I had since then got me one step closer to robotics in one way or another. I worked on face detection in photos. I worked on photo deduplication in our photos platform in Windows, so that was all the application of AI technology at the time. And at the same time, I got active with FIRST, For Inspiration and Recognition of Science and Technology. I ended up being a board member, as you had noted there. I ended up being a control systems inspector and a robot inspector at the state level. And then I parlayed a lot of that work into partner applications and experiences work, which is the team that I was on. I worked on the IoT team for a number of years, ran the Maker team, which was how we brought Maker technologies from Microsoft to the Maker community, along with Intel and a number of other partners. I worked with Massimo Banzi from Arduino and Eben Upton from the Raspberry Pi Foundation and worked on Azure Edge devices. I worked on Azure Percept. I was lucky enough to be the group product manager there, where I got even deeper into the IoT well. And as that project came to a public preview, my boss came to my office and said, hey, you’ve wanted to work on robots your entire career.
Jeff – 00:04:37:
I love it. No, thank you for that background. Also, to hear about the Percept side with computer vision, the IoT side, the AI side, your journey with mechanical engineering, electrical engineering through education, through interests, and then all the years, decades now of computer science experience. I can see why that all converges into being a robotics leader today. Thank you also for, go ahead.
Dan – 00:05:06:
The one other thing, Jeff, is as I did that FIRST work, I’m also a head coach for a high school team as well. And so outside of work and inside of work, like my hobby is control systems. Like that’s what I love.
Jeff – 00:05:22:
I was going to say thank you for volunteering with FIRST, you know, helping other kids develop this passion and grow their robotics interests. That’s awesome. One more question before we dive into the topic at hand. I heard that you’re working on a DeLorean. Tell me more about this.
Dan – 00:05:42:
Yeah, so I’m like.
Jeff – 00:05:43:
Are we going back to the future, or what’s going on?
Dan – 00:05:45:
I mean, it is going through an electrical conversion, so I am following Doc Brown’s plans. So I’ve had a DeLorean since 2000. I’ve always wanted a DeLorean. I talked about how pop culture influenced me for robotics. A lot of my personality, a lot of the way I talk, the way I act is actually influenced by Back to the Future. I absolutely love that movie for every reason that people love that movie. And one of them is the DeLorean. And I had said that when I get older, I wanted a DeLorean. So back in 2015, I made the decision that I want to convert the car to electric. So what better way to do that than pull the engine out, because that forces the project? I came up with multiple different designs between 2015 and 2019. None of them worked, and none of them were going to be successful. And I say that I’m a weird duck in that way, that control systems really is my hobby. I work on my car outside of work. I work on pinball machines outside of work. My girls had a shelf where, when they were younger, if one of their toys would break, they’d put their toy on the shelf and the next day it would just magically be fixed.
Jeff – 00:06:58:
So there’s no plutonium involved?
Dan – 00:06:59:
So far there’s no plutonium, but there will be a little bit less than 1.21 gigawatts. And I do want to get… Yeah, it’s a great project. It keeps, you know, for the times that I’m not hands-on at work, it keeps me hands-on and grounded at home.
Jeff – 00:07:15:
So. Nice. Nice. Thanks. Thanks for sharing that. Let’s start with why. You know, I was talking to some of the Microsoft team at a recent Northwest Robotics Alliance event, and they were saying they did a massive pivot, you know, a year or so ago due to generative AI. AI is not new to robotics, but why is generative AI a big deal for robotics now?
Dan – 00:07:37:
Yeah, so generative AI really creates this, let’s call it a universal translator. You know, generative AI at the end of the day, and I’m not trying to discount it, like it’s an extremely powerful technology, it’s really predictive text, it’s text prediction. But the power of that is, you know, language itself is text prediction. And the way that language is built, it’s all about like words have meaning, strings of words have meaning, sentences have meaning, and the ability to do prediction on text, well, when you’re not only talking about English text, but you’re also talking about other languages, but then also computer code. When you’re talking about data structures, when you’re talking about data formats, like JSON formats as an example. The ability to predict that text in that context is actually pretty powerful. And because of that, being able to both generate, but also extract semantics from the syntax, becomes extremely capable. And we’ve seen this, you know, with GPT-2, GPT-3, GPT-4, other models as well. And, you know, GPT-4 being just the most recent version of it. And so what that’s created is this ability to have, you know, for lack of a better term, a universal translator. And that universal translator can go from code to code, it can go from language to code, it can go from data structures to language, and any permutation in between. So the ability to do translation of human to human conversation, the ability to do translation of machine to human conversation, and the ability to do machine to machine conversation has now been very much unlocked. And we’re at the very early stages of that.
Jeff – 00:09:26:
Thanks for the explanation. As we think about generative AI, underneath the neural net are these LLMs. And we’ve been learning more about the library of knowledge that’s underneath. But as I understand it, with generative AI, we also have sort of the visual library. And we’re using video as well. And so we have this other aspect of these large models. Why is that a big deal for robotics?
Dan – 00:09:55:
Yeah, it’s a big deal for a couple reasons. The first one is what I would consider the most obvious, which is we as humans perceive the world, and animals as well, perceive the world through vision. It’s one of our major sensory inputs. And the ability to have the combination between syntactical and semantic structure with language, with grammar, with words, with code, with data structures, as we talked about before, and then being able to augment that and or enhance that with the perception of the physical world, that’s part of what makes a robot a robot. What makes a robot a robot is A, that it can sense, and B, that it can think, and C, that it can act into the physical environment. You don’t want to act into the physical environment without knowing what’s going on in the physical environment. You need to be able to sense and think in order to be able to do that. And so the ability to bring vision and the ability to bring video analysis, which is moving vision, really, at the end of the day, but also bring other data streams into the large language models. What it does is it creates context translation, awareness between different domains. Let me give a quick example. When I first started playing, when LLMs started to show up, I, like everybody else, went and just tried them out. And I’m not one to jump on a technology fad. Technology fads come and go. I’ve worked for Microsoft for 25 years. I’ve seen many of them. The reason I got excited about generative AI was when I asked the LLM that I was working with a trick question, if you will. Find me the largest country that has red in its flag that is above this GPS latitude.
Jeff – 00:11:51:
Coordinate?
Dan – 00:11:51:
Yeah, coordinate, exactly. And generate a picture for me of the most well-known icon or a building from that. And it came back with the Great Wall of China. And the fact that it generated an image of the Great Wall of China from hundreds or thousands or millions of images that were fed in, that’s when I realized that, A, the code that I would have had to write to generate that query across multiple different databases and data sets, it would have been days and weeks of writing code for me to do something like that. And it just did it. It came back with a reasonable answer. And when I saw that power, that’s when I got excited. And that’s when I saw that this wasn’t just a fad technology and it wasn’t just large language models, but this ability to bring multimodal models together from disparate data sets. What we would do in programming and have to build out a whole software engineering infrastructure to go across datasets and databases and do inner joins and all that stuff, it was just doing it for me. Did I get a lot of bad information and hallucinations as I was doing my experimentation? Absolutely. But it was through that learning process that I realized the power there. Yeah.
Jeff – 00:13:30:
Thank you. Part of generative AI, as I understand it, is having a neural net, having this sort of like brain, if we can use the analogy, on top of this library of information, being able to do things like you just mentioned. That wouldn’t be a hard ask for us to kind of think through pretty quickly, but like you said, it’s different if you’re having to write a mountain of code and query a whole bunch of different databases versus something that has reasoning and sort of logic and can kind of pull from those different databases and have inference and context. So is the advance of generative AI in robotics as simple as, like, having a smarter brain on top of machines?
Dan – 00:14:17:
Um. Yes, but not only. It’s necessary, but not sufficient, if you will. So a couple of things there. One of the things we’ve seen in talking with partners and customers and folks in the industry is that everybody feels that they need to train the model on their own data sets. And what we found is the models are actually pretty strong, especially when given grounding, one-shot grounding, or a good set of prompts to start off with. And the ability now to pass in data grounding sources, documents, SharePoint libraries, you know, OneDrive or Google Drive document repositories. That becomes even more powerful because now you’re able to give context to the model. So having the brain on top of the machine, like the LLM and the LMM, the large multimodal model, they are absolutely acting like a large brain on the machine, on multiple machines. Like it’s not just one machine. But on top of that, the ability, not just for the LLM and the LMM to be able to do the translation, but also do reasoning on that translation. And furthermore, the ability to orchestrate that reasoning across a set of plugins, either through Semantic Kernel or through LangChain, that orchestration and plugin model now is even more than just the brain on top of the machine. That gives the ability to bring context and relevance to the domain space that you’re trying to solve. And so Unix for years had a really good model of little languages and tools that did very specific tasks that you could string together through pipes and through the command line. And it was practically an art form in college to be able to string together all these things through pipes. We then went to an app world where each app had a very specific function or design that did one thing or a couple things really, really well. And then you’d use a chain of apps together to accomplish something. With cloud services we moved to this microservice architecture, where now these apps are distributed and each microservice has a very specific function. And what we’re going to see with these brains on machines, as you said, is the ability for that brain to then call out to specific skills, like I said, through LangChain or Semantic Kernel skills or functions, to be able to do very specific tasks and then be able to orchestrate those together. Now, when we talk about those tasks, those tasks can be computer to computer tasks. They can be computer to physical world tasks. They can be human to physical world and every combination in between. And so what that starts to create is more than just the brain, but the ability to orchestrate functionality.
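[Editor’s note: below is a minimal, hypothetical Python sketch of the plugin orchestration pattern Dan describes, where an LLM names which registered skill to call and a thin orchestrator executes it. The skill functions, the registry, and the canned model reply are illustrative assumptions, not actual Semantic Kernel or LangChain APIs.]

```python
import json

def get_gps_from_maps(place: str) -> dict:
    # Stand-in for a grounding call-out to a maps service (assumption, not a real API).
    return {"place": place, "lat": 40.6892, "lon": -74.0445}

def send_waypoints(waypoints: list) -> str:
    # Stand-in for a robot or flight-controller interface (assumption).
    return f"uploaded {len(waypoints)} waypoints"

SKILLS = {"get_gps_from_maps": get_gps_from_maps,
          "send_waypoints": send_waypoints}

def call_llm(prompt: str) -> str:
    # Placeholder for a chat-completion call. A real orchestrator would ask the
    # model to answer with JSON naming a skill and its arguments; here we return
    # a canned reply so the sketch runs end to end.
    return json.dumps({"skill": "get_gps_from_maps",
                       "args": {"place": "Statue of Liberty"}})

def orchestrate(user_request: str):
    prompt = ("You can call these skills: " + ", ".join(SKILLS) + ". "
              'Reply with JSON like {"skill": ..., "args": {...}}. '
              "Request: " + user_request)
    decision = json.loads(call_llm(prompt))               # the model picks a skill
    return SKILLS[decision["skill"]](**decision["args"])  # the orchestrator runs it

print(orchestrate("Plan a flight to the Statue of Liberty"))
```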
Jeff – 00:17:17:
Thank you. Let’s give some basic examples. I think you’re doing a good job explaining the significance and how this works. But let’s just talk about basic examples. What does this allow us to do today? If we were to give three use cases, maybe including one at home or if we have a robot at home or something in the workplace, can you talk through some basic use cases?
Dan – 00:17:44:
Yeah, the robot at home, we’ve got a baby boomer population, which is becoming elderly and needs help at home or in the facilities that they live in. The idea or concept of having robots for the aging baby boomers is not a new thing. What generative AI is enabling is the ability to not only have language to text and then to have context to semantic and syntactic relevance for tasks for these robots, creating a natural user interface for our elderly, but also allowing for context relevance into the environment that the robot operates in. Before, you could have slapped a Cortana or a Siri or Alexa onto one of these robots to do the voice communication piece of it and be able to tell the robot, go forward, go back. But the ability to use higher level orchestration, higher level commands to accomplish a task, get me a Coke, find a place where I can heat something up, heat up my burrito, find me my pills. The ability to semantically reason across that and actually accomplish that task at a higher level. That’s actually where the generative AI is becoming powerful in a consumer experience. Another example is in utilities inspection. We’ve got power plants and we have train stations and we have oil rigs, all of which, some are outside, some are indoors, need regular ongoing inspection. The ability to send a robot out or a fleet of robots out and do that inspection and report back, that’s not a new concept. But the ability to use generative AI to have contextual relevance of what’s happening in the environment as the environment changes, because a lot of these things are outside or are very dynamic and potentially harsh environments, like in the middle of the ocean. The ability to use generative AI to be able to get sensory data, going back to the computer vision question you had, be able to look at video feed sensory data, thermal data, IR data, as well as depth data, it creates a much richer opportunity at the sensory level, at the think level, but then finally at the act level.
Jeff – 00:20:26:
Those are great examples. I really like the, you know, find me my pills as we’re thinking about some of the aging generation, or something as simple as, hey, get me a drink. These are very basic examples and, you know, do my laundry, mow the lawn, you know, come next in my mind. You know, we’ve been following a lot of the cloud providers that seem to have all taken sort of a deep interest in this space. Not only at the, you know, LLM level, like OpenAI, but also at the robotics level. And you said LMM, you know, the multimodal models, we’re combining these visual models as well that allow these robots to see and add that to their reasoning, their logic, you know, their perception, their context. You know, we noticed Google is deep in the space with PaLM-E. Meta is deep in the space with their visual cortex model, the artificial visual cortex. And things like GenOGG and SAM, and we know some startups are getting into the space as well. But it seems like the cloud providers continue to lead because they have an edge given their cloud and processing power. What is unique as we think about Microsoft’s focus in this space? We know they’re a big player in their partnership and investment in OpenAI. And we see that rolling out into kind of nearly all the products, it seems like. But, you know, what’s unique about Microsoft’s focus in robotics?
Dan – 00:21:57:
Yeah, let me start by saying that I have deep respect for, you know, every one of the companies that you mentioned, whether they’re considered a competitor or a partner, you know, coopetition, as we like to say in FIRST. You know, every single one of them, Google, Meta, Amazon, they’re all doing good work. And you know, I take a very pragmatic approach, which is this is a situation where a rising tide raises every ship. We all rise together in this. And it’s not just the big players, like you said, there’s a giant startup ecosystem. And in addition to the startups and the big cloud providers, there’s also a set of customers and partners that are retooling for this specific space as well. Generative AI generally and generative AI with robotics specifically. A lot of companies, a lot of partners of ours, have come at this as an opportunity to take the next step of their digital transformation or digital evolution. What makes the Microsoft offerings, you know, differentiated, let’s say, is that, number one, our Azure OpenAI is your own instance of the OpenAI functionality with all the connectors that you would expect from our cloud, from Azure. And the key, most important thing, very simply, is Microsoft runs on trust. We are a company that’s built on partners and a partner ecosystem and working with others, and running on trust like that has been something consistent in 25 years. We can talk about things that have changed at the company. But Microsoft runs on trust has been there since day one when I showed up in 1999. And the reason I’m bringing that up is because the trust that we have is that your data is your data. You put it into Azure OpenAI, it stays your data. Only you, your company, and those that you give access to, on your behalf, have access to that data. And so that segmentation, that ability to keep that data your own, along with the connective tissue to the rest of the Azure services, which actually allows something to be done with that generative AI, is one of the things that we hold most dear. The second thing, as an extension of that, we connect to other products, offerings, and services. I think six years ago, I created a slide that had Dynamics, the Dynamics cloud, that had the Azure cloud, and had, at the time it was Office 365, the balance of the Microsoft cloud. And I said robotics is going to have a play in every single one of these, and when we bring all of these together, we’re going to help make the world better. Now, every single one of those clouds, and all the products that they represent, from silicon to edge devices, to infrastructure along 5G and space and orbital, all the connective tissue, all the products, everything from Microsoft, going back to that retooling you were talking about, we’re building up copilots for every single one of our major products. And the great thing is, it’s not just a copilot for Microsoft 365, or a copilot for Teams, or a copilot for Excel, but what happens is these copilots, because they’re orchestrated through Semantic Kernel, the fact that they’re orchestrated through our substrate allows the power of each of those tools and the copilots that go with them to talk to the other copilots. That’s the machine to machine coordination. So you can actually, 30 years ago, object linking and embedding, the ability to take some data from Excel and put it in Word, or vice versa, that was a pretty powerful thing. 
This is the next big evolution of that. And so the reality is this, if you’re already on the Microsoft ecosystem, your solutions are only going to get that much better with the Azure OpenAI offering we have. If you’re not on Microsoft solutions, our Azure OpenAI is interoperable with other clouds. There’s work that’s happening there. It’s not one cloud for all. There are many reasons why you want to have cloud interop. And yeah, so.
Jeff – 00:26:32:
Thank you. That’s deep. But you know, ethics, you know, cloud interop, the connectivity across the ecosystem, the partnership model, those are some of the key things sort of that stood out to me. Thanks for that.
Dan – 00:26:46:
You brought up ethics, Jeff. Let me double click on that for a sec. I talked about trust. One of the places where Microsoft has invested very heavily in artificial intelligence is on responsible AI. And there are many organizations that are investing here, but our foundational commitment to trust and to security is extended to safety and responsible AI and ethics. We have a whole lot of really good work there. And anything we do in generative AI goes through responsible AI review and processes, so that not only can we assure that your data remains your data, but also that we’re using AI for ethical reasons. As we start talking about generative AI in robotics and automation and physical systems, one of the things that I’m specifically looking at along with my team is how we, whether you want to call it extend or add to or replicate, our responsible AI efforts, specifically how they get augmented for autonomy in physical systems. That’s a core area where we’re focusing a lot of time. That goes from a policy perspective, a procedural perspective, an ethics perspective, but also from a technical perspective as well.
Ashish – 00:28:09:
Hello everyone, my name is Ashish Kapoor and I am a co-founder and CEO of Scaled Foundations, where our mission is to enable safe general robot intelligence. Prior to Scaled Foundations, I was the general manager heading the autonomous systems and robotics group at Microsoft Research. Without the ability to sense, reason, and act appropriately, robots are nothing more than a collection of expensive parts. And I have spent the last decade thinking about and building technologies that infuse intelligence into robots rapidly, with a special emphasis on safety. Generative AI is completely changing how we build and program robots. Besides data generation, natural language will be the most common way to affect the behavior of a robotic system.
Ashish – 00:28:56:
You can already see this in our GRID platform, where natural language can be used to chain robot capabilities so that they can carry out complex missions. I soon expect robots to simulate and synthesize missions before they execute them in the real world, as if they are assessing feasibility, safety, and efficiency beforehand. Similar to the way we talk to ourselves in order to reason about complex processes, I do think generative AI will give robots such long-horizon reasoning capabilities.
Jason – 00:29:30:
I’m Jason Kelly, CEO of Fremont Robotics. For a lot of my career, I’ve been working at the intersection of hardware and software, and that’s what’s led me to where I am now. I have led teams building products in those areas at Microsoft, at smaller companies, and as a founder, products from everything from smartwatches to voice control software to greenhouse automation. And now at Fremont Robotics, our team is focused on applying software and AI to the industrial world. The mission we’re centered around is helping people automate those dull, dirty, and dangerous jobs in industry by taking the best in robotics technology and building the software and AI tools around it that are needed to deploy it into industry. So now, with this rapid growth of generative AI, we have an amazing new tool in the toolbox to help us do that. In 10 to 20 years, how do we think generative AI will be used? It is so exciting to think that far ahead. I’ll say that generative AI is receiving far more than its share of hype today, so I’m a bit skeptical of a lot of the hype around it. But I do believe we’re working here with a foundational technology that is at the very beginning of its real-world applications, so we’re going to see an explosion in every industry. One area I’m really excited for is bringing gen AI to industrial research and simulation and data mining. If you think about predictive maintenance as an example, over the last 10 years or more we’ve seen IoT come in, become mature, and create a huge growth in operational data and the ability to predict failures and not just respond to them. And that’s brought the development of really powerful digital twins and simulation tools that you can run experiments with. But the challenge is those tools are really complex and really difficult to master and get actionable results out of. It takes a lot of time and study to learn the tool set, and they’re always changing and advancing and it’s hard to keep track of them. So, as a domain expert, I think this is where generative AI could help those people really access the tools available to them in a much more seamless way. We’ll be able to create intelligent agents that learn the data, learn the tools that are available, can make decisions about how to apply the tools, and really scale up the impact that domain experts can have in analyzing the data available to them. And this is already happening today with today’s models like GPT-4. You can prompt it to use tools, to call APIs, generate domain-specific language, and transform the results into input for the next stage. So I’m really excited about being part of pushing that forward, and empowering workers on the floor with the power of robotic sensing and data analysis, without requiring them to also be roboticists and software engineers and data scientists.
Jeff – 00:33:04:
I noted that you’re speaking next week at the Robotics Application Conference on how AI is improving and expanding automation. Any additional key points that you want to bring up that you’re planning to cover from that event next week?
Dan – 00:33:18:
Yeah, I’d like to thank our partner in VIA, who invited me to speak at that conference along with their CTO Rand. So it was nice of them. To be fair, what I’m going to be speaking about there is actually very much aligned with what we’re discussing today. It’s where generative AI is going and how it intersects with autonomy and robotics. And one of the key points that we haven’t brought up yet, but that we do talk about in that talk, is, when I talk about responsible AI and the technical aspect of it, not only the policy piece, one of the technical things that we’re looking at is how do you work to ensure that what the AI has generated, A, is not hallucinated, B, is accurate, and C, is not harmful. And there’s a whole work stream that we have on generative AI for robotics where we’re looking at how can we use simulation, how can we use code validation. We’re also looking at whether we should be generating code or data that is then interpreted by end robots, or a combination of both, and what are the steps that we can take to drive higher confidence that what the LLM has produced is actually a reasonable solution to the problem. And so then we also talk about supervised autonomy and the role of humans in the loop for that supervision. I had the honor of being at an event at Fresh the other day, and one of the things that we talked about is the responsibility that, even if we can drive the confidence from a technical perspective that the AI is producing valid results, we still have an obligation to individuals, as well as public perception, as well as government regulation, to ensure that we can show and concretely demonstrate that the generative AI is doing the right thing before any physical action is taken. This is similar to what’s happening in the automotive industry with autonomous cars and the six levels of autonomy.
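[Editor’s note: a hypothetical sketch of the “check before you act” gate Dan describes. Simple geofence and waypoint-count checks stand in for simulation and code validation, and nothing is dispatched until a human approves. All names and limits are illustrative assumptions.]

```python
GEOFENCE = {"lat": (40.0, 41.0), "lon": (-75.0, -73.0)}  # assumed operating area
MAX_WAYPOINTS = 50

def validate_plan(waypoints):
    # Automated checks standing in for simulation and code validation.
    if not waypoints or len(waypoints) > MAX_WAYPOINTS:
        return False, "plan is empty or too long"
    for lat, lon in waypoints:
        if not (GEOFENCE["lat"][0] <= lat <= GEOFENCE["lat"][1]
                and GEOFENCE["lon"][0] <= lon <= GEOFENCE["lon"][1]):
            return False, f"waypoint ({lat}, {lon}) leaves the geofence"
    return True, "plan passes automated checks"

def supervised_execute(waypoints, approve):
    ok, reason = validate_plan(waypoints)
    if not ok:
        return f"rejected: {reason}"
    if not approve(waypoints):           # human in the loop signs off
        return "held: awaiting operator approval"
    return "dispatched to the robot"     # only now would anything physically move

plan = [(40.6892, -74.0445), (40.70, -74.01)]
print(supervised_execute(plan, approve=lambda p: True))
```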
Jeff – 00:35:47:
Yeah, thank you. Very applicable. You know, I think, yeah, when we start thinking about hallucinations, there’s a lot of pushback, you know, because of the fact that we’re generating new things and sometimes it can be inaccurate, it can be really creative, too creative, essentially. When you start connecting that AI model and neural net that can have, you know, some of these differences, hallucinations, then you anticipate the worst, essentially. Oh, what happens when a big machine hallucinates, you know? But when you couple that with these principles of supervised autonomy, of, you know, giving all of the context that’s needed, you know, the accurate context so that the highly accurate model can perform its function, and then if you add in the sensors that are built in for safety on the physical device, you know, I think that gives a lot of peace of mind to know, hey, there’s a plan, and those that do this right, like Microsoft is doing, you know, can be both accurate and consider the consequences. It’s great to hear there’s a lot of thinking happening right now, because I think our history with technology, with these sorts of devices, you know, has been, we move really fast and then we don’t realize some of the consequences to the humans thereafter.
Dan – 00:37:04:
Yeah, thank you for saying that, first of all. And let me give two examples of some of the power of generative AI and robotics that I’ve personally seen recently. They both happened at a trade show for a customer that we had, where the customer brought together numerous different partners who were all looking at various different aspects of robotics, autonomy, AI, generative AI, et cetera. And we were honored and lucky enough to come there. And I sat down with somebody from the customer, and I showed them a chat session. And I said, hey, let’s pretend we’re flying to the Statue of Liberty. I didn’t provide a GPS coordinate. I didn’t train the model on anything special. I just said, you are a flight controller, and you want to fly to the Statue of Liberty. Okay? And so the LLM came back, and it filled in a flight plan with GPS coordinates. And I went and pulled the GPS coordinate, and it was the Statue of Liberty. And I was like, oh my god, that’s super cool. And it sounds profound that it did that, but the reason for that is because there’s clearly somewhere in the training data on the internet a good correlation between a GPS coordinate of the Statue of Liberty and the Statue of Liberty. So it found it pretty well. Another example was we were in some remote part of North Carolina. And so I said, fly to the center of the city. And it got close, but it wasn’t exact. It was like a mile and a half off. And the reason for that is because there’s no good training data in the data set around this area where we were and the GPS coordinate, but there was some correlation somewhere in the training. So it was close. Okay. But with a Semantic Kernel plugin that knows how to go to, let’s say, Bing Maps, give a location, and programmatically get back from Bing Maps the GPS coordinate, and by making that Semantic Kernel function, skill, excuse me, available as a capability, the GPS coordinate came back correctly every time. And so there’s been a lot of work on this concept of prompt crafting, creating better and better prompts to help guide the LLM. But along with prompt crafting, the ability to ground the data through call-outs to other systems is gonna be pretty powerful. Another example.
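[Editor’s note: a hypothetical sketch of the grounding pattern Dan describes. When a verified coordinate can be looked up, it is injected into the prompt instead of trusting the model’s recall; when it cannot, the plan is flagged for an operator. The geocode and call_llm functions are illustrative stand-ins, not Bing Maps or Semantic Kernel APIs.]

```python
def geocode(place: str):
    # Stand-in for a programmatic maps lookup (assumption, not a real maps API).
    known = {"statue of liberty": (40.6892, -74.0445)}
    return known.get(place.lower())

def call_llm(prompt: str) -> str:
    # Placeholder for a chat-completion call; echoes the prompt so the sketch runs.
    return "[model plans a route from this prompt] " + prompt

def grounded_flight_prompt(place: str) -> str:
    coord = geocode(place)
    if coord is not None:
        # Ground the request with a verified coordinate rather than relying on
        # whatever the model happens to recall from training data.
        return (f"You are a flight controller. Plan a route to {place} at "
                f"latitude {coord[0]}, longitude {coord[1]}.")
    # No grounding available: fall back, but flag the plan for human review.
    return (f"You are a flight controller. Plan a route to {place}. "
            "No verified coordinate was found; an operator must confirm the plan.")

print(call_llm(grounded_flight_prompt("Statue of Liberty")))
```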
Jeff – 00:40:00:
I would just say, I would assume that over time, the prompts become more and more human, more and more natural, because the knowledge base and these connected ecosystems make it that much easier to perceive and understand. And so it seems like we’re talking a lot about prompting these days, but I think the reality is the models keep getting better, they keep getting more data, they keep getting more connections. And then, you said it’s the universal translator, human to human, human to machine. And I would envision that we won’t have to get as crafty in the future.
Dan – 00:40:34:
So it’s perfect you said that, so let me give you another example. It’s a quick one. There happened to be a company at the same trade show, it was an outdoor field event, who happened to have drones. Their drones just do inspection. This is not a company that builds control systems or drones, they use commercial off-the-shelf drones. And so I went to them and I said, hey, have you ever used generative AI or LLMs? And the guy was like, yeah, I’ve played around with ChatGPT. And I was like, let’s do a quick experiment. Do you have the ability that if I gave the drone a set of GPS coordinates, it would be able to fly that path? And he was like, yeah, we happen to be able to do that with the systems that we use. And so we sat down, and in 45 minutes we were able to get their drone to take off, fly a figure-8 pattern in the sky, and then land. Forty-five minutes from not having anything to the drone actually flying. The reason that was interesting is because, and this goes to your point about prompting and getting to that very, very natural language, we were using the term, you know, draw a figure-8, draw a figure-8, and it kind of was doing it, but not exactly. And so we asked the LLM, do you know what an infinity sign is? And it was like, yes, I do. And we said, plot the GPS coordinates within 500 feet of an infinity sign. And it did it. It was able to do it. And so it’s A, understanding some of those little nuances, but then, in the future, the ability to say an infinity sign or a figure eight and it’ll just figure that out. And so there’s these little nuances that really unlock the power of what these LLMs can do.
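[Editor’s note: a hypothetical sketch of what “plot the GPS coordinates of an infinity sign within 500 feet” works out to mathematically: sample a figure-eight (a lemniscate) around a center point and convert the metre offsets to latitude and longitude. The center coordinate and waypoint count are placeholders.]

```python
import math

def infinity_waypoints(center_lat, center_lon, span_ft=500, n=24):
    half_m = (span_ft / 2) * 0.3048           # half the span, in metres
    waypoints = []
    for i in range(n):
        t = 2 * math.pi * i / n
        # Lemniscate of Gerono: a figure-eight traced by a single parameter.
        dx = half_m * math.sin(t)                  # east-west offset, metres
        dy = half_m * math.sin(t) * math.cos(t)    # north-south offset, metres
        dlat = dy / 111_320                        # metres per degree of latitude
        dlon = dx / (111_320 * math.cos(math.radians(center_lat)))
        waypoints.append((center_lat + dlat, center_lon + dlon))
    return waypoints

# Example over an arbitrary field; the center coordinate is a placeholder.
for lat, lon in infinity_waypoints(35.7596, -79.0193, span_ft=500, n=8):
    print(f"{lat:.6f}, {lon:.6f}")
```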
Jeff – 00:42:26:
Amazing.
Dan – 00:42:27:
Just a couple examples.
Ashish – 00:42:29:
We are seeing rapid advances in AI, in particular deep learning, that over the last few years have had a profound impact on how we build robots. Similar to many other subfields of AI, be it computer vision, natural language, or speech, foundation models, which are large pre-trained models, are the key to rapid intelligence in robots. Such pre-trained models provide a great starting point for many different kinds of robots and applications, and reduce the effort that used to be months long to merely a few hours. But training such pre-trained models is non-trivial for robots, because unlike text and images, there exists no such web-scale data to mine. Plus, gathering such large quantities of data is quite cost prohibitive. So enter generative models: specifically, our ability to synthesize artificial robot data with appropriate dynamics, sensor modalities, and various environmental processes unlocks the key to foundation models for robots. And furthermore, the multimodal nature of this problem also means that sensing, robot action, and language are intertwined. And this leads us to a unique combination of technologies that is completely changing how we build intelligence into robots. Listeners are welcome to try out the alpha release of our platform. It is accessible from our website, scaledfoundations.ai, and various scenarios ranging from wildfire mitigation with autonomous systems to infrastructure inspection with robots are implemented.
Jason – 00:44:12:
Fremont Robotics is building end-to-end solutions that put mobile robots to work in industrial settings. We’re currently really focused on inspections and maintenance data collection and analysis. Every day there are thousands of industrial engineers doing manual inspections in factories, oil plants, chemical processing plants, and many more. A lot of these are routine busy work. A lot of them are in dangerous places or hard-to-get-to places. Some of this has been automated with IoT sensors connected to a network, but those aren’t everywhere, and they can be really expensive to outfit in a large facility. So with mobile robots like wheeled rovers and drones, and even four-legged robot dogs, they can carry sensors everywhere they need to be and acquire data like photos, videos, thermal readings, and even gas detection or acoustic imaging. But taking these robots and putting them into practice is really hard today. You have to be a roboticist and a software engineer, and you still have to do your day job as an industrial engineer, all to get this done. And that’s where we’re leveraging gen AI too, to really make robots accessible to the domain experts at these facilities. We’re putting it to work in three stages: first, acquiring the data; then understanding, analyzing, and reporting on the data; and then lastly, taking action on the analysis of that data. So the first stage is putting the robots to work and giving them a mission to navigate along a path, to transport their sensors to the right place and acquire data. Today that’s often done by manually driving the robot around and recording a route, or placing waypoints carefully on a map. We’re using GPT-4 today and giving it the context of a facility, its layout and floor plan, all of the equipment, and information about the robots and the sensors that are available. And given that context and a simple command from someone, it can generate an optimal patrol route, collect all that sensor data, and bring it back to the system. We’re also using GPT in the next stage to let users ask questions about the data, a question like: what’s the 95th percentile temperature reading each day from pump 27? Today, that often requires some complex query language or a lot of trial and error by a user who does not want to be a data scientist. With the language model, if you think of that as one step in a pipeline, you can really break this friction down. The user asks a question, the model generates a query and sends it to a time-series database, data is returned, and then the model in turn generates a really useful report from the data. The user is not worrying about how to get the data, but how their question has been answered and what the next step is. And down the road, the next stage we’re working towards is taking action on that data. I see this turning into a really tight feedback loop that’s generating actions and insights, and then using that data in a feedback loop to self-optimize: identify areas that need to be inspected more often, or suggest cost-saving measures or ways to improve efficiency. So what we’re working towards, in other words, is using generative AI not to just automate everything, but to take human knowledge and expertise as its input and leverage it exponentially.
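[Editor’s note: a hypothetical sketch of the pipeline Jason describes: the user’s question goes to a language model that emits a query, the query runs against a time-series store, and the result is summarized back for the user. The query text, run_query, and call_llm are illustrative stubs, not a specific database or the GPT-4 API.]

```python
def call_llm(prompt: str) -> str:
    # Placeholder for a chat-completion call. Here it returns a canned query so
    # the sketch runs; a real system would prompt the model to emit this format.
    return "SELECT temperature FROM readings WHERE asset = 'pump-27' GROUP BY day"

def run_query(query: str):
    # Stand-in for a time-series database; returns fake daily readings.
    return {"2024-05-01": [71.2, 73.5, 80.1],
            "2024-05-02": [70.8, 76.4, 83.0]}

def answer(question: str) -> str:
    query = call_llm(f"Write a query for this question: {question}")  # step 1
    rows = run_query(query)                                           # step 2
    # Step 3: summarize. A real pipeline would hand this back to the model to
    # phrase the report; with three samples per day, the max stands in for the
    # 95th percentile here.
    per_day = {day: round(max(values), 1) for day, values in rows.items()}
    return f"Approximate daily 95th percentile for pump 27: {per_day}"

print(answer("What's the 95th percentile temperature each day from pump 27?"))
```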
Jeff – 00:48:31:
So as we think to the future, I think those that have been in robotics, you’ve been in it a long time. We’ve been in it long enough to know that, you know, it’s been frustrating to see how slow things move sometimes as well. You know, and robots have historically been in cages, you know, kind of separated from humans. But we all realize, like, there’s this confluence of things that are happening that really are changing the game. Do you see generative AI being the game changer for human robot integration in the next five to 10 years? Do you think that’s gonna be the unlock for us to have the acceleration we kind of anticipated 10 years ago, but really haven’t seen yet today?
Dan – 00:49:09:
Yeah, for sure it will be one of them, one of the major ones. I don’t want to have the hubris to say that it will be the one ring to rule them all. But it will absolutely be a game changer. I like to think that right now it is the game changer, but I don’t think so long term. I think there will be other technology advancements as well. The note on robots being in cages, the reason that was done is because a lot of these robots operated open loop, okay? They didn’t have much sensory understanding and perception of the physical world, nor were they operating with that to ensure that the humans around them were safe. And even if a robot did have that, the proper decision was, let’s not take any risk and let’s ensure that the humans are kept safe, because that’s job number one, okay? And so I frankly think that we will continue to have a future where some robots are in cages, because that’s the right thing to do. There will be, you know, especially as we move outdoors with robots for, you know, energy inspection, as I talked about before, wildland firefighting as well, the necessity to operate in environments that are non-deterministic, that are dynamic, and even in some cases contested. If you’re working in a wildland firefight and you’ve got 5G connectivity, but the fire causes the 5G connectivity to go out, you’re in a hostile environment created by Mother Nature, if you will. And so the ability for generative AI to at least give us the option to make the decision as humans, when we want to move the robot out of a cage, or when we believe we can use sensory closed-loop perception and understanding of the environment to have robots more in contact and working with humans, you know, that for sure is gonna happen. As an example, the Boston Dynamics Spot robot, I’ve been using it at work for a few years now, and I’d seen videos of it before I got my hands on it, and the thing that I’m continuously most impressed about with it is how it’s implemented hierarchical control within the system. Although there’s no generative AI within Spot, the fact that it has a low-level controller to make sure that it can stably walk, and navigate and move on the terrain that it’s walking on, whether it’s indoors or outdoors, and it has a safety system within it so that if it gets near a human or a wall, it’s got a built-in buffer, and no matter how fast it’s running, it will stop before it hits a wall or what have you. And so we’re able to drive it and operate it in a way where we can still feel safe around it, but we can also use generative AI concepts, et cetera, other techniques on it, to understand where the technology can go with safety systems underneath it. So, kind of an example of moving out of cages.
Jeff – 00:52:33:
Great example. How do you see generative AI evolving in the next five years? We talk about, Bill Gates once said, hey, we underestimate what happens in 10 years and we overestimate what happens in two years. But I would think of that more like months with generative AI. We might overestimate what can happen in the next couple of months. We underestimate what can happen in 10 months. And it’s just hard to keep up with how fast things are moving. So how do you see things continuing to evolve from where we are today?
Dan – 00:53:06:
Yeah, it’s a fantastic question. So a couple of things. I fundamentally believe, and I use the example with the Spot robot, that robotics and AI is really a system of systems, hierarchical control systems from the lowest level built into silicon to the highest level operating in the cloud or the multi-cloud. The fact that generative AI, we’ve already shown, can do certain levels of orchestration, can do certain levels of planning of what a robot’s mission is to be, whether it’s an inspection scenario or a flight controller, as we had talked about. These are all examples of the same thing. It’s all missions for a robot to execute. And as such, as these systems are able to communicate with other systems, as we’ve talked about before, as humans and robots are able to work better together because of generative AI, what you’re going to see is not just multiple LLMs and LMMs talking to each other via copilot interfaces or backend Semantic Kernel plugins. What you’re also going to see is systems of systems start to get formed, where you have a heterogeneous ecosystem of what’s called a fleet of robots and their generative AI, and another fleet of robots and their generative AI, and they’ll be able to talk to each other as well. And so this combination of robot orchestration, robot mission planning, and having heterogeneous fleets of robots and humans being able to work together. Like humans do certain things well. We don’t like to do dull, dirty, and dangerous work, and we shouldn’t be doing some of that work. Robots are really good at some of that stuff. But on the other hand, picking up a cup and drinking some coffee out of it, robots are not very good at that. They’ve gotten better, but it might be a while before they get better at that. And so there’s plenty of examples of robots and humans working together on a one-to-one basis, a one-to-many basis, or a many-to-many basis. And that’s where generative AI … If there’s one thing that generative AI is unlocking, that’s what it is.
Jeff – 00:55:18:
Thank you. You know, generative AI, I think one of the interesting things for human beings is it also can be kind of playful and creative and fun. There’s a creation aspect that we haven’t really seen with machines, where it’s creating visuals or it’s creating poems or it’s creating stories. And when you start connecting generative AI into machines, do you think that will translate into robots and machines having a playful or creative or fun aspect, to the extent that they are a robot?
Dan – 00:55:49:
Yeah, I do. I think there’s going to be a few different work streams or lanes that’ll happen there. The first one is, Sony tried to bring AIBO to market a couple of times so you could have your robotic dog. I actually think that now we’re closer to having a robotic dog or a robotic pet with generative AI than ever before. And that goes right into play and creativity and fun. And we talked about the elderly and the baby boomers and robotics there. But also for the young, I’ve always had the dream of having a robot buddy who played video games with me and talked trash to me, made fun of the game, just somebody to play with when I didn’t have somebody else to play with. I was a latchkey kid growing up. And so probably that’s where that comes from. So that’s one. The second one is the creative and fun. There’s two work streams there that are both equally important. The first one is the generative AI is being trained on data. And I don’t want to take a stance or a position on the ownership or the ethics of where that data is coming from. It’s a great thing to discuss. I don’t think it’s in the purview of this specific podcast. But it’s something that, back to our responsible AI, Microsoft takes very seriously. The other side of that is the technical aspects of being able to do creative and fun things. And we’re already seeing robots do art, actually physically paint what DALL-E or other systems are coming up with. I just saw an article about generative AI generating a robot that can move in a way that a human would have never come up with before, like the kinematics behind it and the way it worked. It was a totally interesting story. And so the fact that we’re showing creativity in the digital world, we as humans know how to bring that creativity into the physical world. And the first thing that’s going to happen is we’re going to wire the robots up to mimic that. The ability to move a paintbrush or 3D print something. I mean, from a technical perspective, it’s a really cool thought experiment that a generative AI can come up with a mechanical object, or a mechanical system even, that it can then 3D print based on a human’s prompt. And the human watching that thing be printed expects a certain result, and what generative AI is coming up with is not what humans would normally come up with. One of my absolute favorite things about being a volunteer mentor and coach with students is I constantly look at the solutions that they come up with to the problems that they have. And I go, wow, I’ve been a professional engineer for 15 years or 20 years or 25 years, and I would have never come up with that solution. And wow, is it elegant and smart, and it works. And it’s that type of innovation where we want the machines, with ethics and responsibility and safety and security in mind, to be able to take a creative direction and a fun direction.
Jeff – 00:59:09:
Thank you. Two other questions before we kind of wrap up. One is, we’ve interviewed a lot of serious roboticists, you know, CTOs, founders of other robot startups and companies. You know, one of the patterns I discovered is that, you know, we all kind of believe that the advancement in robotics will kind of help humans be human. And I think they mean, hey, it’ll unlock some of our more creative, innovative, kind of social, meaningful aspects of ourselves, the higher order of ourselves. But do you have any thoughts on that? Do you also believe that? Because I think a competing line of thought here is that, you know, if robots can be creative too, then is that competitive in nature? Or is that now something more like us, that we can trust, that we can work with, that can help us, versus something that’s been a bit frustrating, if we think about how long it takes to get, you know, robots to do what we want them to do.
Dan – 01:00:10:
Yeah, it’s interesting that as you were asking the question, you talked about trust and ability almost in the same breath. And I think that’s exactly right. So I do absolutely believe that robots will help scale humanity. They will help scale humanity at the global level. They will help scale humanity at an individual level, and in groups or collections of humans as well. I believe that robots are going to enable humans to take on less dull, dirty, and dangerous jobs and go more to what we’re good at, which is higher cognitive thinking and helping with the orchestration and collaboration with the robots. So rather than, when I work on a project, grabbing a screwdriver and driving a screw to fasten something, maybe I’ll have a robotic co-pilot who’s helping me with that, with the proper safety and ethics and responsibility and security that we’ve talked about before. Going back to the question on creativity, I don’t think it’s an or. I don’t think it’s that robots are creative or humans are creative. I think that the combination of the two is going to help humans be more human, and humans work with other humans. At the end of the day, robots and the generative AI that powers them is a tool to make us better as humanity, better as a people. We have to think through, and we are thinking through, and we have a lot more work to do on, the responsibility and the ethics and what all those things mean. There are great theoretical problems, going all the way back to Isaac Asimov and further, on what happens in various different situations. They’re all good thought exercises. They need to happen. The ability for humans to be more human is the reason I’ve got this innate pursuit of robotics: I feel that that’s going to unlock human potential like nothing else has been able to unlock before. To me, the investment in robotics my entire career, even before robotics was a key buzzword, is because I believe that robotics is the true intersection of all technology to empower humanity.
Ashish – 01:02:52:
Safety in robotic systems is a fairly important issue. Deep AI or deep machine learning, due to its complex mathematical design, does not yet provide any guarantees. So if you’re using deep machine learning as a foundation for robot intelligence, the big question is, how do we make these systems safe? Safety is a loaded word, and depending upon who you talk to in the AI community, you’ll get a different answer. But I personally believe that most of the current approaches to safety are no more than a band-aid approach, where engineers and researchers are making post-hoc changes to the system to address corner cases. Ground-up AI that emphasizes safety is a holy grail that we aspire towards. Additional mechanisms such as responsible AI licenses are also an important way to address and mitigate the issues, which we are also looking into.
Jason – 01:03:48:
With the exploding usage of generative AI, there are some serious ethical concerns I see. There’s many out there. With robotics hardware more specifically, safety really takes on a lot more importance than in a lot of other applications of gen AI. You’ve got robots navigating the physical world and manipulating the world around them, often in close proximity with humans. So there’s already a really high bar for designing safety into the system. And there’s already a lot of automation out there, but today’s automation tends to be quite deterministic, and the safety standards and the safety validations are designed to manage risk in a really deterministic system like that. But gen AI comes in and adds a bit of unpredictability to it, from the human perspective. So we have to be very careful about designing safety controls that work with that. And there are some organizations like ASTM working hard on that, to stay abreast of these new developments and evolve the safety standards appropriately. But I think the ethical concern here is how much control do you cede to the model, and how much do you still take advantage of human judgment in the loop? And as a technologist, it’s really tempting, kind of common, to push the boundaries there. But with human safety in the mix, we need to keep our foot on the brake about that and keep safety at the forefront of what we’re doing here.
Jeff – 01:05:35:
It’s interesting if we think about how fast ChatGPT has moved. We have 3 billion people that have now experienced that and realize there’s something special here. I think as we make that the universal connector to machines and have machines truly be smart, versus just having an input and an output, and have them be able to assist us or collaborate with us, I’m a believer as well that the future is a higher order of humanity. And I think that’s hard at the individual level when we think about change. But if we think about the macro level of what computers have done, where we had a similar fear, to advance humanity, to solve diseases, to improve humanity, I’m definitely with you. You brought up ethics a few times. And as we think about the future, part of the intent of this podcast is to think about how do we design the future with more intent and think about the problems and the ethics. And how do we start putting more emphasis, essentially, on those places that will decide whether that technology is good or bad? Because technology can have both sides. I just wanted to ask you if you have any other thoughts around ethics and designing the future with intent, especially as it relates to robotics and generative AI.
Dan – 01:07:13:
Yeah, I’ve got a few. So the first one is, you know, we have a responsibility to not just think about what generative AI can do, or what robotics or any new technology for that matter can do, quantum computing is another good example. We have a responsibility as we bring these technologies to market, and not even just to market, but to the community, even when they’re in early experimentation. We have the obligation to think through what we call the harms: not just the positive outcomes, but also, red teaming is another example, what could go wrong. What are the negative effects? What are the ways this can be abused? And looking at how we can take pre-action, let’s say in the case of generative AI, to ensure that only certain valid inputs can go into an LLM, maybe even use LLMs to do that. But then also, as we’ve talked a little bit about, using simulation and validation, various verification methods, to ultimately provide information to a human to make a decision, from a supervised autonomy perspective, on whether what we’re trying to do with the LLM should be done or can be done. And working through both sides of that, as well as putting the LLM, or whatever the technology is, on the proper rails. Those are some of the responsibilities. In addition, you know, we as Microsoft are a company that really, as I said, runs on trust. We run on partnership. One of the things I’m most proud of is the global scale of partnership that we have. And as part of that, knowing what our partners and customers intend to do with some of these technologies is important. You know, the risks can and will be real. And we need to be ahead of that and thinking through it.
Jeff – 01:09:26:
Love it. Well, Dan, thank you so much for being with us today. There are a lot of amazing insights that you shared, a lot of nuggets I actually want to revisit personally, being also someone who’s super passionate about robotics and technology and the ethics of it. So, appreciate your time and investment in today’s conversation. The Future Of podcast is brought to you by Fresh Consulting. To find out more about how we pair design and technology together to shape the future, visit us at freshconsulting.com. Make sure to search for The Future Of on Apple Podcasts, Spotify, Google Podcasts, or anywhere else podcasts are found. Make sure to click subscribe so you don’t miss any of our future episodes. And on behalf of our team here at Fresh, thank you for listening.