Reinforcement Learning Glossary

  1. Environment - The world our agent lives in and interacts with.
  2. Agent - The character interacting with the world.
  3. State - A complete description of the world. No information about the world is hidden from the state.
  4. Observation - A partial or complete view of the state, given as input to the Agent
    1. Fully Observed Environment - The observation is the complete state
    2. Partially Observed Environment - The observation is an incomplete view of the state
  5. Action Spaces - The set of all valid actions in an environment (both kinds are shown in the sketch below)
    1. Discrete Action Spaces - A finite number of moves
    2. Continuous Action Spaces - Real-valued vectors
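To make the two kinds concrete, a minimal sketch using the Gymnasium library (assumed installed as `gymnasium`); `CartPole-v1` and `Pendulum-v1` are standard environments with a discrete and a continuous action space respectively:

```python
import gymnasium as gym

# Discrete action space: a finite number of moves (push the cart left or right).
discrete_env = gym.make("CartPole-v1")
print(discrete_env.action_space)        # Discrete(2)

# Continuous action space: a real-valued vector (torque applied to the pendulum).
continuous_env = gym.make("Pendulum-v1")
print(continuous_env.action_space)      # Box(-2.0, 2.0, (1,), float32)

# Either kind can produce a random valid action the same way:
print(discrete_env.action_space.sample(), continuous_env.action_space.sample())
```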
  6. Policies - The rule an agent uses to decide which action to take
    1. Stochastic Policies - When the action is sampled from a probability distribution
      1. Two common kinds
        1. Categorical Policies - Used in Discrete Action Spaces
        2. Diagonal Gaussian Policies - Used in Continuous Action Spaces
      2. Need to be able to sample actions from the policy and compute log-likelihoods of particular actions (see the sketch below)
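A minimal sketch of both kinds with PyTorch's `torch.distributions`; the logits, mean, and log-std below stand in for the outputs of a policy network:

```python
import torch
from torch.distributions import Categorical, Normal

# Categorical policy (discrete action spaces): one logit per action.
logits = torch.tensor([1.0, 0.5, -0.2])        # hypothetical network output
cat_pi = Categorical(logits=logits)
action = cat_pi.sample()                       # sample an action
print(action, cat_pi.log_prob(action))         # log-likelihood of that action

# Diagonal Gaussian policy (continuous action spaces): a mean per dimension,
# with log-stds often kept as standalone learned parameters.
mu = torch.zeros(2)                            # hypothetical network output
log_std = torch.full((2,), -0.5)
gauss_pi = Normal(mu, log_std.exp())
action = gauss_pi.sample()
print(action, gauss_pi.log_prob(action).sum(-1))  # sum per-dimension log-probs
```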
  7. Trajectories/Episodes/Rollouts - A sequence of states and actions in the world (see the rollout sketch below)
    1. The first state is sampled from the start-state distribution
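A sketch of collecting one trajectory with Gymnasium; the random action is a placeholder for a learned policy:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
state, _ = env.reset()                  # first state, sampled from the start-state distribution
trajectory = []
done = False
while not done:
    action = env.action_space.sample()  # placeholder for a learned policy
    next_state, reward, terminated, truncated, _ = env.step(action)
    trajectory.append((state, action, reward))
    state = next_state
    done = terminated or truncated
# trajectory is now the episode's sequence of (state, action, reward) tuples
```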
  8. Reward - A value describing how good a transition was; in general it depends on the current state, the action taken, and the next state.
  9. Return - Cumulative reward (both kinds are computed in the sketch below)
    1. Finite-Horizon Undiscounted Return - Sum of rewards obtained in a fixed window of steps
    2. Infinite-Horizon Discounted Return - Sum of all rewards ever obtained, but discounted by how far in the future they are obtained
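Both definitions in a few lines; the reward list is made up for illustration:

```python
rewards = [1.0, 0.0, 2.0, 1.0]   # hypothetical rewards r_0 .. r_3

# Finite-horizon undiscounted return: plain sum over a fixed window of steps.
finite_return = sum(rewards)                                  # 4.0

# Infinite-horizon discounted return: sum of gamma**t * r_t with gamma in (0, 1),
# so rewards further in the future count for less.
gamma = 0.99
discounted_return = sum(gamma**t * r for t, r in enumerate(rewards))
```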
  10. Value Functions - Expected return if you start in a state or state-action pair, then act according to a particular policy (see the tabular sketch below)
    1. On-Policy Value Function - Expected return if you start in state s and always act according to the policy
    2. On-Policy Action-Value Function - Expected return if you start in state s, take an arbitrary action, and then forever after act according to the policy - also known as the Q-Function
    3. Optimal Value Function - Expected return if you start in a state and always act according to the optimal policy
    4. Optimal Action-Value Function - Expected return if you start in a state, take an arbitrary action, and then forever after act according to the optimal policy.
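A tabular sketch relating these definitions; the Q-values and policy probabilities are made up:

```python
import numpy as np

Q = np.array([[1.0, 3.0],        # Q[s, a]: hypothetical on-policy action values Q^pi(s, a)
              [0.5, 2.5]])
pi = np.array([[0.8, 0.2],       # pi[s, a]: probability of taking action a in state s
               [0.5, 0.5]])

V = (pi * Q).sum(axis=1)         # on-policy value function: V^pi(s) = E_{a~pi}[Q^pi(s, a)]
V_star = Q.max(axis=1)           # if Q were Q*, the optimal value function is its max over actions
best_a = Q.argmax(axis=1)        # the optimal policy acts greedily with respect to Q*
```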
  11. Advantage Functions - How much better an action is compared to the policy's average
    1. How much better it is to select a specific action over sampling an action from the policy, and then following the same policy forever after (see the sketch below)
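The standard form is A^pi(s, a) = Q^pi(s, a) - V^pi(s); a one-state sketch with made-up numbers:

```python
import numpy as np

Q = np.array([1.0, 3.0])    # Q^pi(s, a) for the two actions in one state s
pi = np.array([0.8, 0.2])   # the policy's action probabilities in s
V = (pi * Q).sum()          # V^pi(s) = 1.4, the policy's average value in s

A = Q - V                   # advantages: [-0.4, 1.6]
# A positive advantage means the action beats sampling from the policy in s.
```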
  12. Model-Based Algorithms - The agent has access to (or learns) a model of the environment.
  13. Model-Free Algorithms - The agent doesn't have access to a model of the environment.
  14. Policy Optimization - Optimize the parameters representing the policy directly. Mostly done on-policy, meaning the data for each update is collected while acting on the most recent version of the policy. A sketch of the core update follows.
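A sketch of the core on-policy update: push up the log-likelihood of actions in proportion to the return they led to. In a real implementation the log-probs come from the policy network, so gradients reach its parameters; here they are leaf tensors for illustration:

```python
import torch

# log pi(a_t | s_t) for steps from fresh rollouts of the current policy.
log_probs = torch.tensor([-0.2, -1.1, -0.7], requires_grad=True)
returns = torch.tensor([5.0, 2.0, 3.0])     # returns observed for those steps

loss = -(log_probs * returns).mean()        # descending this ascends expected return
loss.backward()                             # gradients accumulate in log_probs.grad
```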
  15. Q-Learning Optimization - Learn an approximation of the optimal action-value function. Usually performed off-policy: updates can reuse data collected at any point during training. A tabular sketch follows.
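A tabular sketch of the off-policy update at the heart of Q-learning; the max over next actions does not depend on which action the behavior policy actually took:

```python
import numpy as np

n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))   # the approximation being learned
alpha, gamma = 0.1, 0.99              # learning rate and discount factor

def q_update(s, a, r, s_next):
    # Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(0, 1, 1.0, 2)                # one update from a hypothetical transition
```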