Environment - The world in which the agent lives and with which it interacts.
Agent - The character interacting with the world.
State - A complete description of the environment/world. No information about the world is hidden from the state.
Observation - A partial or complete view of the state, given as input to the agent.
Fully Observed Environment - The observation is the complete state.
Partially Observed Environment - The observation is only a partial view of the state.
Action Spaces - Set of all valid actions in an environment
Discrete Action Spaces - Finite Number of Moves
Continuous Action Spaces - Real Valued Vectors
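A minimal sketch of the two kinds, assuming the Gymnasium library (sizes and bounds here are illustrative):

    from gymnasium import spaces

    discrete = spaces.Discrete(4)                             # e.g. up/down/left/right
    continuous = spaces.Box(low=-1.0, high=1.0, shape=(6,))   # e.g. six joint torques

    discrete.sample()      # an integer in {0, 1, 2, 3}
    continuous.sample()    # a real-valued vector of length 6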
Policies - The rule the agent uses to decide which action to take.
Stochastic Policies - The action is sampled from a probability distribution.
Two common kinds
Categorical Policies - Used in Discrete Action Spaces
Diagonal Gaussian Policies - Used in Continuous Action Spaces
Need to be able to sample actions from policies and compute log likelihoods of particular actions
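A minimal sketch of both operations, assuming PyTorch's distributions module (the logits, mean, and log-std values below are placeholders that a policy network would normally output):

    import torch
    from torch.distributions import Categorical, Normal

    # Categorical policy (discrete action spaces): one logit per action.
    logits = torch.tensor([0.5, 1.2, -0.3])
    cat_pi = Categorical(logits=logits)
    a = cat_pi.sample()            # sample an action
    logp_a = cat_pi.log_prob(a)    # log-likelihood of that action

    # Diagonal Gaussian policy (continuous action spaces): mean vector plus
    # per-dimension log-stds; the diagonal covariance makes the dimensions
    # independent, so log-probs sum over action dimensions.
    mu = torch.tensor([0.1, -0.4])
    log_std = torch.tensor([-0.5, -0.5])
    gauss_pi = Normal(mu, log_std.exp())
    a = gauss_pi.sample()
    logp_a = gauss_pi.log_prob(a).sum()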
Trajectories/Episodes/Rollouts - Sequence of States and Actions in the world
The first state is sampled from the start-state distribution
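A minimal rollout loop, assuming the Gymnasium API (the random action is a stand-in for a real policy):

    import gymnasium as gym

    env = gym.make("CartPole-v1")
    obs, info = env.reset()            # first state ~ start-state distribution
    trajectory, done = [], False
    while not done:
        action = env.action_space.sample()   # stand-in for policy(obs)
        next_obs, reward, terminated, truncated, info = env.step(action)
        trajectory.append((obs, action, reward))
        obs, done = next_obs, terminated or truncated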
Reward - The scalar value returned by the environment; it depends on the current state, the action taken, and the next state.
Return - Cumulative Reward
Finite-Horizon Undiscounted Return - Sum of rewards obtained in a fixed window of steps.
Infinite-Horizon Discounted Return - Sum of all rewards ever obtained, discounted by how far in the future they are received.
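A minimal sketch computing both returns from a list of rewards (the reward values and gamma are illustrative):

    rewards = [1.0, 0.0, 2.0, 1.0]
    gamma = 0.99

    undiscounted = sum(rewards)                                      # r_0 + r_1 + ... + r_{T-1}
    discounted = sum(gamma**t * r for t, r in enumerate(rewards))    # sum_t gamma^t * r_t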
Value Functions - Expected return if you start in a state or a state-action pair and then act according to a particular policy.
On-Policy Value Function - Expected return if you start in a state and forever after act according to the policy.
On-Policy Action-Value Function - Expected return if you start in a state, take an arbitrary action (which may not have come from the policy), and then forever after act according to the policy. Also known as the Q-function.
Optimal Value Function - Expected return if you start in a state and always act according to the optimal policy.
Optimal Action-Value Function - Expected return if you start in a state, take an arbitrary action, and then forever after act according to the optimal policy.
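In symbols (standard definitions; tau denotes a trajectory and R(tau) its return):

    V^{\pi}(s)   = \mathbb{E}_{\tau \sim \pi}[\, R(\tau) \mid s_0 = s \,]
    Q^{\pi}(s,a) = \mathbb{E}_{\tau \sim \pi}[\, R(\tau) \mid s_0 = s, a_0 = a \,]
    V^{*}(s)     = \max_{\pi} V^{\pi}(s)
    Q^{*}(s,a)   = \max_{\pi} Q^{\pi}(s,a)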
Advantage Functions - How much better a particular action is compared to the average action under the policy.
Equivalently, how much better it is to select a specific action in a state than to select an action at random according to the policy, and then follow that same policy forever after.
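In symbols, the advantage of action a in state s under policy pi:

    A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)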
Model-Based Algorithms - The agent has access to (or learns) a model of the environment.
Model-Free Algorithms - The agent doesn't have access to a model of the environment.
Policy Optimization - Optimize the parameters representing the policy directly; this is usually done on-policy, meaning the data for each update is collected while acting according to the most recent version of the policy.
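A minimal sketch of the core idea, assuming PyTorch (the log-probabilities and returns below are placeholder tensors standing in for a batch of on-policy data):

    import torch

    logp_a = torch.randn(32, requires_grad=True)   # log pi_theta(a_t | s_t) for the batch
    returns = torch.randn(32)                      # returns (or advantages) for the same timesteps

    loss = -(logp_a * returns).mean()              # minimizing this ascends expected return
    loss.backward()                                # gradients flow into the policy parameters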
Q-Learning Optimization - Learn an approximation of the optimal action-value function. Usually performed off-policy, meaning each update can use data collected at any point during training.
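A minimal sketch of the tabular Q-learning update toward the optimal action-value function (table sizes and hyperparameters are illustrative):

    import numpy as np

    n_states, n_actions = 5, 3           # illustrative sizes
    Q = np.zeros((n_states, n_actions))  # table approximating Q*(s, a)
    alpha, gamma = 0.1, 0.99             # learning rate and discount factor

    def q_update(s, a, r, s_next, done):
        # Bootstrapped target takes a max over next actions, so the transition
        # (s, a, r, s_next) can come from any behavior policy (off-policy).
        target = r if done else r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])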