
Course Catalog: Prediction and Control with Function Approximation Training
Course Outline:

    Prediction and Control with Function Approximation Training


Welcome to the Course!

Welcome to the third course in the Reinforcement Learning Specialization:

Prediction and Control with Function Approximation, brought to you by the University of Alberta,

Onlea, and Coursera.

In this pre-course module, you'll be introduced to your instructors,

and get a flavour of what the course has in store for you.

Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

On-policy Prediction with Approximation

This week you will learn how to estimate a value function for a given policy,

when the number of states is much larger than the memory available to the agent.

You will learn how to specify a parametric form of the value function,

how to specify an objective function, and how gradient descent can be used to estimate values from interaction with the world.
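As a rough illustration of that idea (not code from the course itself), here is a minimal sketch of semi-gradient TD(0) prediction with a linear value function. The environment interface, policy, feature function, and step-size values are all assumptions made for the example.

```python
import numpy as np

def semi_gradient_td0(env, policy, features, num_features,
                      episodes=100, alpha=0.01, gamma=0.99):
    """Estimate v_pi(s) ~= w . x(s) for a fixed policy.

    Assumed interfaces: env.reset() -> state, env.step(a) -> (next_state, reward, done),
    policy(state) -> action, features(state) -> NumPy feature vector x(s).
    """
    w = np.zeros(num_features)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            next_state, reward, done = env.step(policy(state))
            x = features(state)
            # Bootstrap from the current weights; a terminal state has value 0.
            v_next = 0.0 if done else np.dot(w, features(next_state))
            td_error = reward + gamma * v_next - np.dot(w, x)
            # Semi-gradient update: differentiate the estimate, not the TD target.
            w += alpha * td_error * x
            state = next_state
    return w
```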

Constructing Features for Prediction

The features used to construct the agent’s value estimates are perhaps the most crucial part of a successful learning system.

In this module we discuss two basic strategies for constructing features: (1) fixed basis functions that form an exhaustive partition of the input,

and (2) adapting the features while the agent interacts with the world via Neural Networks and Backpropagation.
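To make strategy (1) concrete, the following small sketch builds a fixed basis by state aggregation, an exhaustive partition of a one-dimensional state range into equal-width bins. The range and number of bins are illustrative assumptions, not values from the course.

```python
import numpy as np

def state_aggregation_features(state, low=0.0, high=1.0, num_bins=10):
    """One-hot feature vector for a scalar state, using a fixed partition of
    [low, high) into num_bins equal-width bins (illustrative values)."""
    x = np.zeros(num_bins)
    bin_index = int((state - low) / (high - low) * num_bins)
    # Clip so states on the boundary still fall into a valid bin.
    x[min(max(bin_index, 0), num_bins - 1)] = 1.0
    return x
```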

In this week’s graded assessment you will solve a simple but infinite-state prediction task with a Neural Network and TD learning.

Control with Approximation

This week,

you will see that the concepts and tools introduced in modules two and three allow straightforward extension of classic

TD control methods to the function approximation setting. In particular,

you will learn how to find the optimal policy in infinite-state MDPs by simply combining semi-gradient

TD methods with generalized policy iteration, yielding classic control methods like Q-learning and Sarsa.
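As a hedged sketch of that combination, the code below implements episodic semi-gradient Sarsa with linear action-value features. The environment interface, feature function, and epsilon-greedy helper are assumptions for illustration, not the course's own assignment code.

```python
import numpy as np

def epsilon_greedy(w, features, state, actions, epsilon):
    """Assumed helper: random action with probability epsilon, otherwise greedy."""
    if np.random.rand() < epsilon:
        return actions[np.random.randint(len(actions))]
    q_values = [np.dot(w, features(state, a)) for a in actions]
    return actions[int(np.argmax(q_values))]

def semi_gradient_sarsa(env, features, num_features, actions,
                        episodes=200, alpha=0.01, gamma=0.99, epsilon=0.1):
    """Episodic semi-gradient Sarsa with q(s, a) ~= w . x(s, a).

    Assumed interfaces: env.reset() -> state, env.step(a) -> (next_state, reward, done),
    features(state, action) -> NumPy feature vector x(s, a).
    """
    w = np.zeros(num_features)
    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(w, features, state, actions, epsilon)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            x = features(state, action)
            if done:
                # No bootstrapping past a terminal state.
                td_target = reward
            else:
                next_action = epsilon_greedy(w, features, next_state, actions, epsilon)
                td_target = reward + gamma * np.dot(w, features(next_state, next_action))
            w += alpha * (td_target - np.dot(w, x)) * x
            if not done:
                state, action = next_state, next_action
    return w
```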

We conclude with a discussion of a new problem formulation for RL---average reward---which will undoubtedly

be used in many applications of RL in the future.

Policy Gradient

Every algorithm you have learned about so far estimates

a value function as an intermediate step towards the goal of finding an optimal policy.

An alternative strategy is to directly learn the parameters of the policy.

This week you will learn about these policy gradient methods, and their advantages over value-function based methods.

You will also learn how policy gradient methods can be used

to find the optimal policy in tasks with both continuous state and action spaces.
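As a rough sketch of the policy-gradient idea for the discrete-action case (continuous actions would instead parameterize a distribution such as a Gaussian), the following REINFORCE-style update learns the parameters of a linear softmax policy directly from sampled returns. The environment interface and feature function are assumptions for illustration.

```python
import numpy as np

def softmax_policy(theta, features, state, actions):
    """pi(a|s) proportional to exp(theta . x(s, a)) for a linear softmax policy."""
    prefs = np.array([np.dot(theta, features(state, a)) for a in actions])
    prefs -= prefs.max()              # subtract the max for numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()

def reinforce(env, features, num_features, actions,
              episodes=500, alpha=0.01, gamma=0.99):
    """Monte Carlo policy gradient (REINFORCE) with a linear softmax policy.

    Assumed interfaces: env.reset() -> state, env.step(a) -> (next_state, reward, done),
    features(state, action) -> NumPy feature vector x(s, a).
    """
    theta = np.zeros(num_features)
    for _ in range(episodes):
        # Generate one episode under the current policy.
        trajectory, state, done = [], env.reset(), False
        while not done:
            probs = softmax_policy(theta, features, state, actions)
            a_idx = np.random.choice(len(actions), p=probs)
            next_state, reward, done = env.step(actions[a_idx])
            trajectory.append((state, a_idx, reward))
            state = next_state
        # Compute the return G_t for every time step, working backwards.
        returns, G = [], 0.0
        for _, _, reward in reversed(trajectory):
            G = reward + gamma * G
            returns.append(G)
        returns.reverse()
        # REINFORCE update: theta += alpha * gamma^t * G_t * grad log pi(A_t|S_t).
        for t, (state, a_idx, _) in enumerate(trajectory):
            probs = softmax_policy(theta, features, state, actions)
            # grad log pi(a|s) = x(s, a) - sum_b pi(b|s) x(s, b) for a linear softmax policy.
            grad_log_pi = (features(state, actions[a_idx])
                           - sum(p * features(state, a) for p, a in zip(probs, actions)))
            theta += alpha * (gamma ** t) * returns[t] * grad_log_pi
    return theta
```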
