Hi

The instructions in the code qdn_learn.py on computing the Bellman error (d_error) don't make its exact definition clear. In class we learned that the delta error equals Q_estimated - Q_expected, where Q_expected = R + gamma * max_a' Q'(s', a'). Is that correct? However, in the Atari paper they take the square of the delta error (Bellman error).
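For reference, here is a minimal sketch of the per-sample delta error as defined in class, written in plain Python so it runs without PyTorch. The function name `bellman_errors` and the layout of `next_q_values` (one list of target-network values per next state) are illustrative assumptions, not taken from qdn_learn.py. Terminal transitions are deliberately not special-cased.

```python
from typing import List


def bellman_errors(q_estimated: List[float],
                   rewards: List[float],
                   next_q_values: List[List[float]],
                   gamma: float) -> List[float]:
    """Per-sample delta error: delta_i = Q_estimated_i - Q_expected_i.

    Q_expected_i = r_i + gamma * max_a' Q'(s'_i, a'), where next_q_values[i]
    holds the (hypothetical) target-network value of every action in s'_i.
    Terminal transitions are not handled in this sketch.
    """
    errors = []
    for q, r, q_next in zip(q_estimated, rewards, next_q_values):
        q_expected = r + gamma * max(q_next)  # bootstrapped target
        errors.append(q - q_expected)         # estimate minus target
    return errors


# Two transitions with two actions each in the next state.
print(bellman_errors([1.0, 0.5], [1.0, 0.0], [[0.0, 2.0], [1.0, 0.5]], gamma=0.5))
```

For the first transition Q_expected = 1 + 0.5 * 2 = 2, so delta = 1 - 2 = -1. For the second, Q_expected = 0 + 0.5 * 1 = 0.5, so delta = 0.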

Should we follow the paper and use the squared version, or follow the code instructions, which take the Bellman error as is, without squaring it?
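The check below may help frame the squared-versus-unsquared question. With the common 0.5 convention (an assumption; the paper's exact scaling may differ), the derivative of the squared error 0.5 * delta**2 with respect to Q_estimated is exactly the unsquared delta. So a raw delta and a squared loss are not necessarily competing choices: the raw value can be read as the gradient of the squared one. The finite-difference helper is hypothetical and exists only to check this numerically.

```python
def squared_bellman_loss(q_estimated: float, q_expected: float) -> float:
    """0.5 * delta**2 with delta = Q_estimated - Q_expected (0.5 is a convention)."""
    delta = q_estimated - q_expected
    return 0.5 * delta ** 2


def numerical_grad(f, x: float, h: float = 1e-6) -> float:
    """Central finite difference df/dx, used only to check the analytic result."""
    return (f(x + h) - f(x - h)) / (2 * h)


q_est, q_exp = 3.0, 1.25
delta = q_est - q_exp
# d/dQ_estimated of 0.5 * (Q_estimated - Q_expected)**2 should equal delta.
grad = numerical_grad(lambda q: squared_bellman_loss(q, q_exp), q_est)
print(delta, round(grad, 6))
```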

Second, if we do follow the Atari paper (sum_i d_error[i] ** 2, where i indexes the elements of the batch), it is not clear why we need to multiply d_error by -1 (as the code says: "don't forget to clip the error between [-1,1], multiply is by -1 (since pytorch minimizes)").
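One way the -1 can arise is sketched below. This assumes the code defines the raw error as target minus estimate (Q_expected - Q_estimated), the opposite sign of the class definition, and treats d_error as a gradient rather than a loss value. Under that assumption, the derivative of 0.5 * (Q_estimated - Q_expected)**2 with respect to Q_estimated is -(Q_expected - Q_estimated), which is where the factor of -1 comes from. Clipping that gradient to [-1, 1] matches the derivative of the Huber loss with threshold 1. In PyTorch, such a per-sample vector could be passed as the `gradient` argument of `Tensor.backward`. Whether qdn_learn.py does exactly this is an assumption. The helper names below are hypothetical.

```python
def clip(x: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clamp x into [low, high]."""
    return max(low, min(high, x))


def d_error(q_estimated: float, q_expected: float) -> float:
    """Clipped gradient of 0.5 * (Q_estimated - Q_expected)**2 w.r.t. Q_estimated.

    Assumes the raw error is defined as target minus estimate, so its sign
    must be flipped to become the gradient that a minimizer follows.
    """
    bellman_error = q_expected - q_estimated  # target minus estimate (assumed)
    return -1.0 * clip(bellman_error)


def huber_grad(q_estimated: float, q_expected: float, kappa: float = 1.0) -> float:
    """Derivative of the Huber loss (threshold kappa) w.r.t. Q_estimated."""
    diff = q_estimated - q_expected
    if abs(diff) <= kappa:
        return diff  # quadratic region: plain error
    return kappa * (1.0 if diff > 0 else -1.0)  # linear region: constant slope


for q_est, q_exp in [(0.2, 0.5), (3.0, 0.0), (-4.0, 1.0)]:
    print(d_error(q_est, q_exp), huber_grad(q_est, q_exp))
```

The two columns agree. Inside [-1, 1] the clipped value is the plain squared-error gradient, and outside it the magnitude is capped at 1.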

Finally, using the formula above, should we take the average (divide by the batch size, i.e., treat it as a mean/expectation)?
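To make the sum-versus-mean choice concrete, the sketch below computes the per-sample gradients of the (0.5-scaled, an assumed convention) squared loss under both reductions. Averaging only rescales every gradient by 1/N. For plain SGD this is equivalent to dividing the learning rate by the batch size. The function name `loss_grads` is illustrative.

```python
from typing import List


def loss_grads(q_estimated: List[float], q_expected: List[float],
               reduction: str) -> List[float]:
    """Gradient of the squared-error loss w.r.t. each Q_estimated[i].

    reduction="sum":  L = sum_i 0.5 * delta_i**2        -> dL/dQ_i = delta_i
    reduction="mean": L = (1/N) * sum_i 0.5 * delta_i**2 -> dL/dQ_i = delta_i / N
    """
    n = len(q_estimated)
    deltas = [q - y for q, y in zip(q_estimated, q_expected)]
    if reduction == "sum":
        return deltas
    if reduction == "mean":
        return [d / n for d in deltas]
    raise ValueError(f"unknown reduction: {reduction}")


q_est = [1.0, 2.0, 3.0, 4.0]
q_exp = [0.0, 2.0, 5.0, 4.5]
print(loss_grads(q_est, q_exp, "sum"))   # per-sample deltas
print(loss_grads(q_est, q_exp, "mean"))  # same deltas divided by N = 4
```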

Thanks,

Rafi