Marcos Cedenilla

Neuroevolution

Published on September 5, 2024

This project explores the use of neuroevolution techniques to solve challenging reinforcement learning environments such as Lunar Lander and Bipedal Walker. Both have continuous observation spaces, and Bipedal Walker adds a continuous action space, making neuroevolution a promising alternative to traditional Deep Q-Learning methods.

Introduction

Reinforcement learning (RL) has gained significant attention in recent years, particularly through its applications in robotics, gaming, and autonomous systems. In this project, I first explored gradient-based deep RL methods, specifically the Actor-Critic and Deep Deterministic Policy Gradient (DDPG) algorithms, before turning to neuroevolution. My primary goal was to tackle continuous control problems, with a focus on the hardcore variant of BipedalWalker-v3.

What is Neuroevolution?

Neuroevolution involves training neural networks using evolutionary algorithms instead of gradient-based optimization. By evolving a population of networks over generations and applying genetic operators like mutation and crossover, the goal is to optimize performance in challenging environments. The performance is measured by a fitness function, which is typically the cumulative reward obtained in the environment.
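As a minimal illustration (a sketch rather than the project's actual code), the genetic operators can act directly on flat weight vectors; the names `mutate` and `crossover` and the rates below are assumptions:

```python
import numpy as np

def mutate(chromosome, rate=0.1, scale=0.5):
    """Add Gaussian noise to a random subset of genes (weights). Illustrative rates."""
    mask = np.random.rand(chromosome.size) < rate
    return chromosome + mask * np.random.normal(0.0, scale, chromosome.size)

def crossover(parent_a, parent_b):
    """Uniform crossover: each gene is taken from one of the two parents at random."""
    mask = np.random.rand(parent_a.size) < 0.5
    return np.where(mask, parent_a, parent_b)
```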

Why Neuroevolution?

For complex environments like Lunar Lander and Bipedal Walker, neuroevolution offers advantages over traditional methods by promoting diverse exploration and improving generalization. Despite extensive efforts with Q-Learning and gradient-based deep RL techniques such as Actor-Critic and DDPG, the benchmarks were not met. Neuroevolution provided a simpler yet effective alternative that successfully overcame these challenges.

Part 1: Lunar Lander with Neuroevolution

Before tackling the more complex Bipedal Walker environment, I tested neuroevolution in the Lunar Lander environment, which, while still having a continuous observation space, operates in a discrete action space. This simpler scenario allowed for experimentation and refinement of the neuroevolution approach.

Key Features of Lunar Lander

The Lunar Lander environment simulates the landing of a spacecraft, where the primary goal is to control the spacecraft's thrusters to achieve a safe landing without crashing. The discrete variant involves only four actions: do nothing, fire the left orientation engine, fire the main engine, and fire the right orientation engine, making it an ideal testing ground for neuroevolution.
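For reference, the spaces can be inspected with the Gymnasium API (the environment version string may differ between Gym and Gymnasium releases):

```python
import gymnasium as gym

env = gym.make("LunarLander-v2")
print(env.action_space)       # Discrete(4): no-op, left engine, main engine, right engine
print(env.observation_space)  # Box(8,): position, velocity, angle, angular velocity, leg contacts
```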

Neural Network Adapted to Neuroevolution

To meet the requirements of neuroevolution, the neural network framework was modified with methods that build a network from a chromosome: a flat, encoded representation of the network's weights and biases.
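A minimal sketch of that decoding step, assuming a small fully connected PyTorch network (the layer sizes and helper names here are illustrative, not the project's actual code):

```python
import torch
import torch.nn as nn

def build_network(sizes):
    """A small fully connected policy network (layer sizes are illustrative)."""
    layers = []
    for i in range(len(sizes) - 2):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
    layers.append(nn.Linear(sizes[-2], sizes[-1]))
    return nn.Sequential(*layers)

def set_weights_from_chromosome(net, chromosome):
    """Copy a flat gene vector into the network's weights and biases, in order."""
    offset = 0
    with torch.no_grad():
        for p in net.parameters():
            n = p.numel()
            p.copy_(torch.tensor(chromosome[offset:offset + n],
                                 dtype=torch.float32).view_as(p))
            offset += n
    assert offset == chromosome.size  # chromosome length must match the network
```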

Training Function and Fitness Evaluation

The training process involves creating a population of random individuals, evolving them over generations, and evaluating their performance using a fitness function. The fitness function measures cumulative rewards, guiding the selection of individuals for reproduction. Elitism ensures that the highest-performing individuals are preserved across generations.
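Putting the pieces together, a hedged sketch of the evolutionary loop might look as follows, reusing the helpers from the sketches above; the population size, elite count, and network shape are illustrative assumptions:

```python
import gymnasium as gym
import numpy as np
import torch

def fitness(chromosome, net, episodes=3):
    """Fitness = average cumulative reward over a few episodes."""
    set_weights_from_chromosome(net, chromosome)
    env = gym.make("LunarLander-v2")
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            with torch.no_grad():
                logits = net(torch.tensor(obs, dtype=torch.float32))
            obs, reward, terminated, truncated, _ = env.step(int(torch.argmax(logits)))
            total += reward
            done = terminated or truncated
    env.close()
    return total / episodes

def evolve(pop_size=50, generations=100, n_elite=5):
    net = build_network([8, 32, 32, 4])  # 8 observations in, 4 discrete actions out
    n_genes = sum(p.numel() for p in net.parameters())
    population = [np.random.normal(0.0, 1.0, n_genes) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c, net) for c in population]
        ranked = [c for _, c in sorted(zip(scores, population),
                                       key=lambda t: t[0], reverse=True)]
        elite = ranked[:n_elite]  # elitism: the best survive unchanged
        children = []
        while len(children) < pop_size - n_elite:
            a = elite[np.random.randint(n_elite)]
            b = elite[np.random.randint(n_elite)]
            children.append(mutate(crossover(a, b)))
        population = elite + children
    return ranked[0]  # best individual of the last evaluated generation
```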

Execution Examples for Lunar Lander

Across 1,000 test episodes, the neuroevolution approach achieved a success rate of 98.10%, demonstrating the effectiveness of this method on the Lunar Lander environment.

Part 2: Bipedal Walker with Neuroevolution

Building on the success with Lunar Lander, I applied neuroevolution to the more complex Bipedal Walker environment, which challenges a control algorithm to keep a two-legged robot balanced as it walks across mostly flat terrain.

Key Features of Bipedal Walker

The Bipedal Walker environment features continuous action and observation spaces, plus a reward function that encourages forward progress while penalizing falls and excessive energy use. The objective is to develop a control policy that enables the robot to walk as far as possible without falling.
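Because the actions are continuous, the evaluation rollout differs from the Lunar Lander version only in how the network output is used. In this sketch (again assuming the helpers above; a 24-input, 4-output network matches BipedalWalker-v3's spaces), four outputs are squashed into the [-1, 1] torque range with tanh:

```python
import gymnasium as gym
import torch

def walker_fitness(chromosome, net):
    """Cumulative reward of one BipedalWalker-v3 episode; actions lie in [-1, 1]."""
    set_weights_from_chromosome(net, chromosome)
    env = gym.make("BipedalWalker-v3")
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        with torch.no_grad():
            action = torch.tanh(net(torch.tensor(obs, dtype=torch.float32)))
        obs, reward, terminated, truncated, _ = env.step(action.numpy())
        total += reward
        done = terminated or truncated
    env.close()
    return total
```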

Challenges and Future Directions

While neuroevolution was successful in the standard Bipedal Walker scenario, it struggled with the hardcore mode: the larger networks it demands mean much longer chromosomes and a sharply larger search space. Future work will explore reusing architectures that succeeded in the standard environment, or retraining them in the more complex one.

Conclusion

Neuroevolution proved to be a powerful tool in tackling the challenges posed by Lunar Lander and Bipedal Walker. By evolving neural networks through genetic algorithms, previously unattainable results were achieved. This approach opens up new possibilities for solving complex problems in continuous action spaces, providing a strong foundation for further exploration and refinement.
