The Passive ADP Agent converges much faster than the PassiveTDAgent, which is as it should be.
The Alpha symbol was a bit confusing till I recognized it was the probability normalizing constant!(I mistook it for a learning constant like the ones in neural networks).
Now onto Q Learning!
No comments:
Post a Comment