Robotic agents can learn a wide range of tasks by simulating many years of interaction with the environment, something that is infeasible with real robots. With abundant replay data and simulators of increasing fidelity that model complex physical interaction between robot and environment, agents can be trained on tasks that would otherwise take a lifetime to master. However, the real benefit of such training is realized only if it transfers to real machines. Although simulation provides a safe and effective environment for training and testing agents, policies trained in simulation often transfer poorly to the real world in robotics. This difficulty is compounded by the fact that deep-learning-based optimization algorithms frequently exploit simulator flaws to reap higher rewards. We therefore apply commonly used reinforcement learning algorithms to train a simulated agent modelled on the Aldebaran NAO humanoid robot.
The problem of transferring simulated experience to the real world is known as the reality gap. To bridge this gap between the simulated and real agents, we employ a Difference model that learns the difference between the state distributions of the real and simulated agents. The robot is trained on two basic tasks: navigation and bipedal walking. Deep reinforcement learning algorithms such as Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) are used to achieve proficiency in these tasks. We then evaluate the performance of the learned policies and transfer them to a real robot using a Difference model built as an extension of the DDPG algorithm.
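To make the Difference-model idea concrete, one common formulation is to learn a correction term that maps a simulated transition to its real-world counterpart, so the corrected simulator prediction better matches the real robot's next state. The minimal sketch below is illustrative only and not the thesis's implementation: the 1-D dynamics, function names, and the linear least-squares fit are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D dynamics: the "real" system differs from the
# simulator by an unmodelled gain on the action and a constant offset.
def sim_step(s, a):
    return s + 0.10 * a

def real_step(s, a):
    return s + 0.13 * a + 0.02  # systematic sim-to-real mismatch

# Collect paired transitions and the residual between real and sim.
S = rng.uniform(-1, 1, size=(200, 1))
A = rng.uniform(-1, 1, size=(200, 1))
residual = real_step(S, A) - sim_step(S, A)

# Fit a linear Difference model: residual ~ [s, a, 1] @ w.
X = np.hstack([S, A, np.ones_like(S)])
w, *_ = np.linalg.lstsq(X, residual, rcond=None)

# Corrected prediction: simulator output plus the learned residual.
def corrected_step(s, a):
    x = np.array([s, a, 1.0])
    return sim_step(s, a) + float(x @ w)

print(abs(corrected_step(0.5, 0.8) - real_step(0.5, 0.8)) < 1e-6)
```

In practice the linear model would be replaced by a neural network trained on real-robot rollouts, and the corrected transitions would be fed back into policy training (e.g. DDPG) so the policy adapts to real-world dynamics.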
Inhaltsverzeichnis (Table of Contents)
- Introduction
- Problem Statement
- Motivation
- Scope of the Project
- Literature Review
- Previous Work on Humanoid Robot Control
- Reinforcement Learning for Robotics
- Reinforcement Learning (RL)
- Markov Decision Process (MDP)
- Value Iteration and Policy Iteration
- Deep Reinforcement Learning (DRL)
- Deep Q-Network (DQN)
- Deep Deterministic Policy Gradient (DDPG)
- Trust Region Policy Optimization (TRPO)
- Simulation Environment
- Webots
- Gym
- Custom Environment Development
- Agent Testing Framework
- Agent Environment Interface
- Robot Environment Controller Model
- Results and Discussion
- Simulation Results
- Real-world Experiments
- Performance Evaluation
- Limitations
- Conclusion
Zielsetzung und Themenschwerpunkte (Objectives and Key Themes)
This project aims to develop and evaluate a novel approach for controlling humanoid robots using reinforcement learning techniques. The focus is on bridging the gap between simulated and real-world environments and thereby improving the performance of learned policies in real-world applications.
- Humanoid robot control using reinforcement learning
- Simulation-to-reality transfer of learned policies
- Developing a robust agent testing framework
- Evaluating the effectiveness of different DRL algorithms
- Addressing the challenges of real-world robot control
Zusammenfassung der Kapitel (Chapter Summaries)
- Introduction: This chapter introduces the problem of controlling humanoid robots and motivates the use of reinforcement learning for this purpose. It outlines the scope and objectives of the project.
- Literature Review: This chapter presents a comprehensive review of existing research on humanoid robot control and reinforcement learning for robotics.
- Reinforcement Learning (RL): This chapter provides an overview of reinforcement learning, including fundamental concepts, algorithms, and their application to robot control.
- Simulation Environment: This chapter describes the simulation environment used in the project, highlighting its capabilities and how it facilitates robot control experiments.
- Agent Testing Framework: This chapter introduces the agent testing framework developed for evaluating the performance of trained agents in simulated and real-world settings.
- Results and Discussion: This chapter presents and discusses the results of experiments conducted using the proposed approach, analyzing the performance of learned policies in both simulation and real-world environments.
Schlüsselwörter (Keywords)
This work explores the application of reinforcement learning for humanoid robot control. The research centers on bridging the gap between simulated and real-world environments, evaluating different deep reinforcement learning algorithms, and establishing a robust agent testing framework for real-world deployment. The study incorporates concepts like Markov Decision Process (MDP), Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO), and simulation-to-reality transfer.
- Quote paper
- Suman Deb (Author), 2019, Humanoid robot control policy and interaction design. A study on simulation to machine deployment, Munich, GRIN Verlag, https://www.hausarbeiten.de/document/493652