The Curated Data Platform is a 2-hour video training aimed at giving you a 30,000-foot overview of the data platform space. In this course, I take you through a variety of data platform technologies—such as relational databases, document databases, caching technologies, data lakes, and graph databases. I show you use cases in which these technologies can be great fits, as well as which companies and products are most relevant in that space today. This includes on-premises technologies as well as major services in Amazon Web Services and Azure.
Get This Course for Free! (Limited Time Offer)
Through Sunday, July 25, 2021, you can register for this course for free by using the coupon code FIRSTMOVER when you check out. My one request: if you use the coupon code, please leave feedback on the course—things you liked, as well as things you wanted to see but didn't. I intend to update this course over time to make it better, based in part on learner feedback.
This won't be a post diving into the details of how reinforcement learning works; Andrej does that far better than I possibly could, so read the post. Instead, the purpose of this post is to provide a minor update to Andrej's code to switch it from Python 2 to Python 3. In doing this, I went with the most convenient answer over a potentially better solution (e.g., switching xrange() to range() rather than reworking the code), but it does work. I also bumped up the learning rate a little to speed up training.
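To illustrate the mechanical nature of those changes, here is a hypothetical sketch (not Andrej's actual code) of the two most common Python 2→3 fixes involved in this kind of port:

```python
# 1. xrange() no longer exists in Python 3; range() is already lazy there,
#    so a direct rename is usually safe.
total = 0
for i in range(5):       # the Python 2 version would read: for i in xrange(5):
    total += i

# 2. print is a function in Python 3, not a statement.
print(total)             # the Python 2 version would read: print total
```

A more thorough port might replace the index-based loops entirely, but as noted above, the direct rename is the convenient answer and it works.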
The code is available as a GitHub Gist, which I’ve reproduced below.
H = 200  # number of hidden layer neurons
batch_size = 10  # after how many episodes do we do a parameter update?
gamma = 0.99  # discount factor for reward
decay_rate = 0.99  # decay factor for RMSProp leaky sum of grad^2
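To give a sense of what the gamma hyperparameter does, here is a minimal sketch of the standard discounted-return computation used in policy gradient methods. This mirrors the idea behind the discounting in Andrej's code, though it is not his exact function (his version also resets the running sum at Pong game boundaries):

```python
import numpy as np

gamma = 0.99  # discount factor, matching the hyperparameter above

def discount_rewards(r, gamma):
    """Credit each reward back to earlier actions, decayed by gamma per step."""
    discounted = np.zeros_like(r, dtype=float)
    running = 0.0
    for t in reversed(range(len(r))):
        running = running * gamma + r[t]
        discounted[t] = running
    return discounted

# A sparse reward at the end (e.g., winning a point) propagates backward:
returns = discount_rewards(np.array([0.0, 0.0, 0.0, 1.0]), gamma)
# returns: 0.970299, 0.9801, 0.99, 1.0 — earlier steps get 0.99x the next step's credit
```

This is what lets the agent assign credit to paddle movements made several frames before a point is actually scored.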
After running the code for a solid weekend, I was able to build an agent which can hold its own against the CPU, though it won't dominate the game. Still, it's nice to see an example of training a computer to perform a reasonably complex task (deflecting a ball into the opponent's goal while preventing the same) when all you provide is a set of possible instructions on how to act (move the paddle up or down) and an indication of how you did in the prior round.