Recent Forum Posts

Thanks, it was fixed.

In Q1, last section it should be gamma squared in the denominator.

I still don't understand the first equation. What formula did you use? Because
\phi(a|s;\theta)=P_t*a+(1-a)(1-P_t)

so

log(\phi(a|s;\theta))=log (P_t*a+(1-a)(1-P_t))

and you can't expand that expression.
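
One possible way to see it, assuming the action is binary, a \in \{0,1\} (an assumption, the thread does not state it): exactly one of the two terms in \phi is non-zero, so the log can be taken case by case.

\begin{align} a=1:&\quad \log\phi(a|s;\theta)=\log P_t \\ a=0:&\quad \log\phi(a|s;\theta)=\log(1-P_t) \\ \Rightarrow&\quad \log\phi(a|s;\theta)=a\log P_t+(1-a)\log(1-P_t) \end{align}

This says nothing about how P_t depends on \theta, so it does not by itself settle the question about \nabla_\theta(P_t).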

Guy is right. The original text was correct (it wasn't a typo).
The following formula:

(1)
\begin{align} V_{t+1}^*(b) =& \max_a [r(b,a)+\gamma\sum_o \Pr[o|b,a]V_t^*(T(b,a,o))] \end{align}

Can be written as the following 3 lines:

(2)
\begin{align} V_{t+1}^*(b)=&\max_a V_{t}^a(b) \\ V_{t}^a(b)=&\sum_o V^{a,o}_{t}(b) \\ V^{a,o}_{t}(b)=&\frac{r(b,a)}{|O|}+\gamma\Pr[o|b,a]V_t^*(T(b,a,o)) \end{align}

Why is this a mistake?
It also appears in the lecture, on slide 30.
It seems to be just a different notation for the formula in the first row.
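
One way to check that the two forms agree is to sum the per-observation term over o. A short sketch, assuming |O| is the number of observations and that the value function inside the per-observation term is V_t^*:

\begin{align} \sum_o V^{a,o}_t(b) =& \sum_o \frac{r(b,a)}{|O|} + \gamma\sum_o \Pr[o|b,a]V_t^*(T(b,a,o)) \\ =& r(b,a) + \gamma\sum_o \Pr[o|b,a]V_t^*(T(b,a,o)) \end{align}

The reward is split evenly over the |O| terms only so that it adds back up to r(b,a). Taking \max_a of this sum gives the backup with the observation probability \Pr[o|b,a] written explicitly.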

Did you upload that? Because it still looks the same.

So you need to change the description of the question in recitation 7, because it says that gamma is 0.5.

Recitation 7 solves the question with $\gamma=1$, while recitation 8 solves the question with $\gamma=0.5$.

Thanks, the typo was fixed.

Thanks for pointing out the error - the scribe was fixed and clarified.

Recitation 11, bottom of page 2.

Three lines before the end there is

V^*_{t+1}=\max_a(V^*_t(V^a_t(b)))

Isn't it supposed to be

V^*_{t}=\max_a(V^*_t(V^a_t(b)))?

Hello,

In recitation 9, page 4, last line, I understand neither the first equation nor the second.

If I understand correctly \phi(a|s;\theta)=P_t*a+(1-a)(1-P_t)

So how did it get to the first equation?

And for the second equation, why is \nabla_\theta(P_t)=-x(s)?

YES, you are right.

1. Yes, you can say that.
2. Thank you for finding these typos.

Does recitation 7 end with the same exercise that recitation 8 begins with?

Why are the answers to the TD-lambda question different?

Did you mean
"If we move right to X=2, we need to first return to X=1, which takes E[T], and then return to X=0, which is another E[T]."

because I don't understand the meaning of what you wrote.

THANKS!

Posted in the WhatsApp group.

Yes, if you can explain delta_v (Part 2, solution 1) in terms of the definitions of Part 1. Would it be correct to say that \Delta V(s) = \alpha * E_t(s) * \delta_t?

In addition, I think there is a missing gamma (as a multiplier) on the second expression in the bottom line of 1-3:

\Delta V(s_1) = \alpha (r_1 + \gamma V_{T-1}(s_2) - V_{T-1}(s_1)) + \alpha (r_2 + \gamma V_{T-1}(s_3) - V_{T-1}(s_2)) [* \gamma ?]

Otherwise V_{T-1}(s_2) would not cancel.

There is another missing gamma in 1-4, in the last update of \Delta V(s_2):

\Delta V(s_2) = \alpha (r_2 + \gamma V_{T-1}(s_3) - V_{T-1}(s_2)) + \alpha (r_3 + \gamma V_{T-1}(s_F) - V_{T-1}(s_3)) [* \gamma ?]

Otherwise V_{T-1}(s_3) would not cancel.

thanks
Rafi
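
The cancellation argument in this post can be checked numerically. Below is a minimal sketch, not the recitation's solution: the episode s1 -> s2 -> s3 -> sF, the rewards, the values, alpha and gamma are all made-up numbers, and the weight gamma on the second TD error assumes lambda = 1 (the post does not state lambda).

```python
# Hypothetical episode s1 -> s2 -> s3 -> sF; every number here is invented.
alpha, gamma = 0.1, 0.5
rewards = [1.0, 2.0, 3.0]         # r1, r2, r3
values = [0.4, -0.7, 1.3, 0.0]    # V_{T-1}(s1), V_{T-1}(s2), V_{T-1}(s3), V_{T-1}(sF)


def td_error(k):
    """One-step TD error r_{k+1} + gamma * V(next) - V(current), zero-based k."""
    return rewards[k] + gamma * values[k + 1] - values[k]


def delta_v_s1(second_term_weight):
    """Update for s1 from the first two TD errors, with a weight on the second."""
    return alpha * (td_error(0) + second_term_weight * td_error(1))


def two_step_update():
    """alpha * (r1 + gamma*r2 + gamma^2*V(s3) - V(s1)); V(s2) does not appear."""
    return alpha * (rewards[0] + gamma * rewards[1]
                    + gamma ** 2 * values[2] - values[0])


with_gamma = delta_v_s1(gamma)     # second TD error multiplied by gamma
without_gamma = delta_v_s1(1.0)    # second TD error left unweighted

print(with_gamma, without_gamma, two_step_update())
```

With the gamma multiplier the V(s2) terms cancel and the update matches the two-step form. Without it, a leftover (gamma - 1) * V(s2) term remains, so the two values differ whenever gamma is not 1 and V(s2) is not 0.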

Thanks!
