A Faster Way to Teach a Robot

One way to retrain a robot for a specific task is imitation learning. The user could demonstrate the correct task to teach the robot what to do. If a user tries to teach a robot to pick up a mug and demonstrates with a white mug, the robot could learn that all mugs are white. It may then fail to pick up a red, blue, or “Tim-the-Beaver-brown” mug.


Moving forward, the researchers hope to test this framework on real robots. They also want to reduce the time it takes the system to create new data using generative machine-learning models.

“We want robots to do what humans do, and we want them to do it in a semantically meaningful way,” Peng says. “Humans tend to operate in this abstract space, where they don’t think about every single property in an image. At the end of the day, this is really about enabling a robot to learn a good, human-like representation at an abstract level.”

Peng and her collaborators at MIT, New York University, and the University of California-Berkeley created a framework that enables humans to quickly teach a robot what they want it to do, with minimal effort.

First published July 18, 2023, on MIT News.

In this way, the user could demonstrate picking up one mug, but the system would produce demonstrations showing the desired action with thousands of different mugs by altering the color. It uses these data to fine-tune the robot.


When a robot fails, the system uses an algorithm to generate counterfactual explanations that describe what needs to change for the robot to succeed. For instance, maybe the robot would have been able to pick up the mug if the mug were a certain color. It shows these counterfactuals to the human and asks for feedback on why the robot failed. Then, the system uses this feedback and the counterfactual explanations to generate new data it uses to fine-tune the robot.


Training a robot to recognize that a mug is a mug, regardless of its color, could take thousands of demonstrations.

The researchers tested this technique in simulations and found that it could teach a robot more efficiently than with other methods. The robots trained with this framework performed better, while the training process consumed less of a human’s time.



Lay users can understand why it failed, then fine-tune it

Peng, the lead author, is joined by co-authors Aviv Netanyahu, an EECS graduate student; Mark Ho, an assistant professor at the Stevens Institute of Technology; Tianmin Shu, an MIT postdoc; Andreea Bobu, a graduate student at UC-Berkeley; and senior authors Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Pulkit Agrawal, a professor in CSAIL. The research will be presented at the International Conference on Machine Learning.

On-the-job training

Then they applied their framework to three simulations where robots were tasked with 1) navigating to a goal object; 2) picking up a key and unlocking a door; and 3) picking up a desired object and then placing it on a tabletop. In each instance, their method enabled the robot to learn faster than with other techniques, while requiring fewer user demonstrations.

To accomplish this, the researchers’ system determines what specific object the user cares about (e.g., a mug) and what elements aren’t important for the task (perhaps the color of the mug doesn’t matter). It uses this information to generate new, synthetic data by changing these “unimportant” visual concepts. This process is known as data augmentation.
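As a rough sketch of what this augmentation step might look like, assuming demonstrations are stored as simple attribute dictionaries — the attribute names, values, and the `augment` helper below are illustrative, not taken from the paper:

```python
import random

# Hypothetical demonstration record: the task-relevant object and action,
# plus a visual attribute the user said does not matter.
base_demo = {"object": "mug", "color": "white", "action": "pick_up"}

# Attributes marked as unimportant, with the values to sweep over.
unimportant = {"color": ["white", "red", "blue", "brown"]}

def augment(demo, unimportant, n=1000, seed=0):
    """Generate synthetic demonstrations by resampling only the
    attributes that do not affect the task."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        new_demo = dict(demo)
        for attr, values in unimportant.items():
            new_demo[attr] = rng.choice(values)
        synthetic.append(new_demo)
    return synthetic

demos = augment(base_demo, unimportant)
```

Each synthetic demonstration keeps the task-relevant parts ("pick up the mug") fixed and varies only what the user said is irrelevant — the essence of data augmentation.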

Because their work seeks to put the human in the training loop, the researchers tested their technique with human users. They first conducted a study in which they asked people if counterfactual explanations helped them identify elements that could be changed without affecting the task.

The system shows these counterfactuals to the user and asks for feedback to determine which visual concepts don’t affect the desired action. Then it uses this human feedback to generate many new augmented demonstrations.

This framework could help robots learn faster in new environments without requiring a user to have technical knowledge. In the long run, this could be a step toward enabling general-purpose robots to efficiently perform daily tasks for the elderly or individuals with disabilities in a variety of settings.

Creating counterfactual explanations and soliciting feedback from the user are critical for the technique to succeed, Peng says.

From human reasoning to robot reasoning

This research is supported in part by a National Science Foundation Graduate Research Fellowship, Open Philanthropy, an Apple AI/ML Fellowship, Hyundai Motor Corp., the MIT-IBM Watson AI Lab, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions.

The framework has three steps: First, it shows the user the task that caused the robot to fail. Then it collects a demonstration from the user of the desired actions. Finally, it generates counterfactuals by searching over all features in the space to show what needed to change for the robot to succeed.

Robots often fail due to distribution shift: The robot is presented with objects and spaces it didn’t see during training, and it doesn’t understand what to do in this new environment.

“Right now, the way we train these robots, when they fail we don’t really know why,” says Andi Peng, an electrical engineering and computer science (EECS) graduate student at MIT. “So you would just throw up your hands and say, ‘OK, I guess we have to start over.’ A critical component that’s missing from this system is enabling the robot to demonstrate why it’s failing so the user can give it feedback.” 

“I don’t want to have to demonstrate with 30,000 mugs,” Peng says. “I want to demonstrate with just one mug. But then I need to teach the robot so it recognizes that it can pick up a mug of any color.”

“It was so clear right off the bat,” says Peng. “Humans are so good at this type of counterfactual reasoning. And this counterfactual step is what allows human reasoning to be translated into robot reasoning in a way that makes sense.”

Fine-tuning involves tweaking a machine-learning model that has already been trained to perform one task so it can perform a second, similar task.
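In code, fine-tuning usually means continuing gradient updates from the pretrained weights rather than training from scratch, often with a smaller learning rate. Here is a toy illustration on a one-feature logistic model — the data, hyperparameters, and `train` helper are all invented for the sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, w, b, lr, epochs):
    """Logistic regression by gradient descent, starting from (w, b)."""
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w * x + b)
            grad = p - y            # derivative of log-loss wrt the logit
            w -= lr * grad * x
            b -= lr * grad
    return w, b

# "Pretraining": learn the first task, classifying x > 0 as positive.
pretrain_data = [(-2, 0), (-1, 0), (1, 1), (2, 1)]
w, b = train(pretrain_data, w=0.0, b=0.0, lr=0.5, epochs=200)

# "Fine-tuning": adapt to a second, similar task (boundary shifted toward
# x = 1), starting from the pretrained weights with a smaller learning rate.
finetune_data = [(0.0, 0), (0.5, 0), (1.5, 1), (2.0, 1)]
w, b = train(finetune_data, w, b, lr=0.05, epochs=100)
```

The second call reuses the weights learned in the first, so the model only has to adjust its decision boundary rather than relearn the task from nothing.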


Source: https://www.qualitydigest.com/inside/customer-care-article/faster-way-teach-robot-081723.html


Imagine purchasing a robot to perform household tasks. This robot was built and trained in a factory on a certain set of tasks and has never seen the items in your home. When you ask it to pick up a mug from your kitchen table, it might not recognize your mug (perhaps because this mug is painted with an unusual image, say, of MIT’s mascot, Tim the Beaver). So, the robot fails.