First of all we have a vector of y_i, outputs of connected layers of neurons aka weights and features - dot product.
[1, 2, 3, 4]
sum_of_all_e_exp = e^1 + e^2 + e^3 + e^4
the first output is
p_0 = e^1 / sum_of_all_e_exp
Explore <s, a> ---> s' reads: move from current state s to s' via action a. Through the action a reward is received, it ...