Model-Based Reinforcement Learning for Cart-Pole Swing-Up

Implemented a Model Predictive Controller (MPC) to balance the cart-pole. The MPC has two components: the first learns the environment dynamics, and the second plans a policy (a sequence of actions) from the current state.
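
The two-component loop can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the 1-D linear environment `true_step`, the least-squares model, and the quadratic cost are all assumptions chosen to keep the example short, and the planner here is plain random shooting standing in as a simple placeholder.

```python
import random

def true_step(s, u):
    # Hypothetical 1-D environment standing in for the cart-pole (assumption).
    return 0.9 * s + 0.5 * u

def fit_linear_model(transitions):
    # Component 1: learn the dynamics s' ~ a*s + b*u by least squares,
    # solving the 2x2 normal equations directly.
    sss = sum(s * s for s, u, n in transitions)
    ssu = sum(s * u for s, u, n in transitions)
    suu = sum(u * u for s, u, n in transitions)
    ssn = sum(s * n for s, u, n in transitions)
    sun = sum(u * n for s, u, n in transitions)
    det = sss * suu - ssu * ssu
    a = (ssn * suu - ssu * sun) / det
    b = (sss * sun - ssu * ssn) / det
    return a, b

def plan(model, state, horizon=5, n_candidates=200, rng=random):
    # Component 2: pick the action sequence with the lowest predicted cost.
    # Random shooting is used here only as a simple stand-in planner.
    a, b = model
    best_cost, best_seq = float("inf"), None
    for _ in range(n_candidates):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, cost = state, 0.0
        for u in seq:
            s = a * s + b * u      # roll out the learned model
            cost += s * s          # assumed cost: squared distance from 0
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq

def run_mpc(model, state, steps=20, rng=random):
    # Receding horizon: replan at every step, execute only the first action.
    for _ in range(steps):
        u = plan(model, state, rng=rng)[0]
        state = true_step(state, u)
    return state

rng = random.Random(0)
data, s = [], 1.0
for _ in range(50):                # collect transitions with random actions
    u = rng.uniform(-1.0, 1.0)
    n = true_step(s, u)
    data.append((s, u, n))
    s = n

model = fit_linear_model(data)
final_state = run_mpc(model, 2.0, rng=rng)
print(model, final_state)
```

Only the first action of each plan is executed before replanning, which is what lets the controller correct for model error at every step.
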
The planner uses the Cross-Entropy Method (CEM), which iteratively optimizes a set of Gaussian distributions, one for each action in a trajectory.
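
A minimal sketch of this kind of planner is shown below, assuming a scalar action per time step and a diagonal Gaussian over the action sequence. The function name `cem_plan`, its parameters, and the toy `rollout_cost` (a hypothetical linear model with a quadratic cost) are illustrative assumptions, not the code used in this project.

```python
import math
import random

def cem_plan(cost_fn, horizon, n_samples=100, n_elite=10, n_iters=10, rng=None):
    rng = rng or random.Random()
    # One independent Gaussian (mean, std) per action in the trajectory.
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(n_iters):
        # Sample candidate action sequences from the current distributions.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        # Keep the lowest-cost sequences (the elite set).
        samples.sort(key=cost_fn)
        elite = samples[:n_elite]
        # Refit each Gaussian to the elite actions at that time step.
        for t in range(horizon):
            vals = [seq[t] for seq in elite]
            mean[t] = sum(vals) / n_elite
            var = sum((v - mean[t]) ** 2 for v in vals) / n_elite
            std[t] = math.sqrt(var) + 1e-6   # small floor avoids zero std
    return mean

def rollout_cost(actions, s0=1.0):
    # Hypothetical dynamics model s' = 0.9*s + 0.5*u (assumption);
    # the cost is the summed squared distance of the state from 0.
    s, cost = s0, 0.0
    for u in actions:
        s = 0.9 * s + 0.5 * u
        cost += s * s
    return cost

best_actions = cem_plan(rollout_cost, horizon=5, rng=random.Random(0))
baseline = rollout_cost([0.0] * 5)
print(rollout_cost(best_actions), baseline)
```

In an MPC setting, `cost_fn` would roll the candidate actions through the learned dynamics model, and only the first action of the returned sequence would be executed before planning again.
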

The first video shows a random policy in action; the second shows the performance of the agent after training.