Reinforcement Learning: Maze Games and OpenAI Gymnasium
OpenAI’s Gymnasium: Cart Pole Balancing Agent
The goal of this coursework is to train an agent to balance a pole attached by a frictionless joint to a moving, frictionless cart by applying a fixed-magnitude force to the cart in either the left or right direction. The coursework uses OpenAI’s Gymnasium Cart Pole environment [1].
The aim is to train a Deep Q-Network (DQN) agent to keep the pole balanced (upright) for as many steps as possible by controlling only the direction of the force applied to the cart.
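For reference, below is a minimal sketch of the interaction loop the agent operates in, with a random policy standing in for the trained DQN agent. The observation and action conventions are taken from the Gymnasium documentation [1]:

```python
import gymnasium as gym

# Create the Cart Pole environment from reference [1].
env = gym.make("CartPole-v1")

# The observation is a 4-vector: cart position, cart velocity,
# pole angle, pole angular velocity. The action space is {0, 1}:
# push the cart left (0) or right (1) with a fixed force.
obs, info = env.reset(seed=0)
total_steps = 0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    total_steps += 1

print(f"Random policy balanced the pole for {total_steps} steps")
env.close()
```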
Hyperparameter Tuning Strategy
| Hyperparameter | Value |
|---|---|
| Hidden Layer Size | 5 |
| Number of Hidden Layers | 2 |
| Learning Rate | 0.01 with adaptive learning |
| Replay Buffer Size | 30000 |
| Number of Training Episodes | 3000 |
| $\epsilon$ (for $\epsilon$-greedy policy) | 0.2 with $\epsilon$-decay |
| Reward Discount ($\gamma$) | 1 |
| Batch Size (sampled from Replay Buffer) | 100 |
| Target Network Update Frequency | 50 |
| Decay Rate for $\epsilon$-greedy policy | 0.95 |
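A minimal sketch of how these hyperparameters could be wired together is shown below. PyTorch, the ReLU activation, and the Adam optimiser (as the "adaptive learning" mechanism) are assumptions; the report only specifies the values in the table:

```python
import torch
import torch.nn as nn

# Q-network matching the table: two hidden layers of size 5,
# mapping the 4-dimensional Cart Pole state to Q-values for the
# two actions. ReLU is an assumption; no activation is specified.
class QNetwork(nn.Module):
    def __init__(self, state_dim=4, hidden_size=5, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, hidden_size), nn.ReLU(),
            nn.Linear(hidden_size, n_actions),
        )

    def forward(self, x):
        return self.net(x)

q_net = QNetwork()
target_net = QNetwork()
target_net.load_state_dict(q_net.state_dict())

# Learning rate 0.01 with adaptive learning; Adam is a common
# adaptive choice, assumed here.
optimizer = torch.optim.Adam(q_net.parameters(), lr=0.01)

# Epsilon-greedy schedule: start at 0.2 and multiply by the decay
# rate 0.95 (applied once per episode here, an assumption).
epsilon, decay = 0.2, 0.95
for episode in range(3000):
    # ... collect experience, sample batches of 100 from a replay
    # buffer of capacity 30000, train with discount factor 1 ...
    epsilon *= decay
    if episode % 50 == 0:  # target network update frequency
        target_net.load_state_dict(q_net.state_dict())
```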
An overview of the optimal policy found by the agent is shown below:
Maze Game
This coursework’s goal is to solve a maze environment modelled as a Markov Decision Process (MDP) using RL techniques that do not require full knowledge of the environment’s dynamics: Monte Carlo and Temporal Difference (TD) learning. In the maze, yellow squares represent obstacles, while the dark/purple and green squares represent absorbing states that carry specific rewards. Absorbing states are terminal: there is no transition from an absorbing state to any other state.
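As an illustration of the TD family, below is a minimal tabular Q-learning sketch on a toy grid maze. The 4x4 layout, the obstacle and absorbing-state positions, the rewards, and the hyperparameters are all illustrative assumptions, not the coursework’s actual maze; a Monte Carlo variant would replace the bootstrapped update with full-episode returns:

```python
import numpy as np

rng = np.random.default_rng(0)

SIZE = 4
OBSTACLES = {5, 10}          # states the agent cannot enter (hypothetical)
GOAL, TRAP = 15, 3           # absorbing states with rewards +1 and -1
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(s, a):
    r, c = divmod(s, SIZE)
    dr, dc = ACTIONS[a]
    nr, nc = r + dr, c + dc
    # Stay in place if the move leaves the grid or hits an obstacle.
    if not (0 <= nr < SIZE and 0 <= nc < SIZE) or nr * SIZE + nc in OBSTACLES:
        nr, nc = r, c
    ns = nr * SIZE + nc
    if ns == GOAL:
        return ns, 1.0, True
    if ns == TRAP:
        return ns, -1.0, True
    return ns, -0.04, False  # small step cost encourages short paths

Q = np.zeros((SIZE * SIZE, len(ACTIONS)))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for _ in range(5000):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        a = rng.integers(4) if rng.random() < epsilon else int(np.argmax(Q[s]))
        ns, reward, done = step(s, a)
        # TD(0) update: bootstrap from the greedy value of the next
        # state; absorbing states contribute no future value.
        Q[s, a] += alpha * (reward + gamma * np.max(Q[ns]) * (not done) - Q[s, a])
        s = ns

print(np.argmax(Q, axis=1).reshape(SIZE, SIZE))  # greedy policy per cell
```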
Optimal policies found using different methods are shown below:
References
[1] OpenAI (n.d.). OpenAI’s Gymnasium: Cart Pole Environment. [online] gymnasium.farama.org. Available at: https://gymnasium.farama.org/environments/classic_control/cart_pole/