Lecture topic: Reinforcement Learning in Nonstationary Environments
Speaker: Chengchun Shi (史成春)
Date: 2023-09-18  Time: 09:00
Venue: Room 206, School of Mathematical Sciences
Host: School of Mathematical Sciences
Speaker bio: Chengchun Shi is an Associate Professor at the London School of Economics and Political Science. He serves as an associate editor of JRSSB, JASA (T&M) and the Journal of Nonparametric Statistics. His research focuses on developing statistical learning methods in reinforcement learning, with applications to healthcare, ridesharing, video-sharing and neuroimaging. He received the Royal Statistical Society Research Prize in 2021 and has received IMS travel awards in three separate years. Research interests: reinforcement learning, statistical inference.
Abstract: This talk considers offline reinforcement learning (RL) in possibly nonstationary environments. Many existing RL algorithms in the literature rely on the stationarity assumption, which requires the system transition and the reward function to remain constant over time. In practice, however, stationarity is restrictive and is likely to be violated in many applications, including traffic signal control, robotics, and mobile health. In this work, we propose a consistent procedure to test the nonstationarity of the optimal policy based on pre-collected historical data, without requiring additional online data collection. Building on the proposed test, we further develop a sequential change point detection method that can be naturally coupled with existing state-of-the-art RL methods for policy optimization in nonstationary environments. The usefulness of our approach is illustrated by theoretical results, simulation studies, and a real data example from the 2018 Intern Health Study. A Python implementation of the proposed method is available at https://github.com/limengbinggz/CUSUM-RL.
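To give a rough sense of the change point idea described above, the following is a minimal, hypothetical Python sketch, not the authors' CUSUM-RL procedure (which tests nonstationarity of the optimal policy itself): it scans a one-dimensional reward sequence with a normalized CUSUM-type statistic, reports the most significant candidate change point, and would hand only the post-change data to a standard offline RL method. The function names, threshold, and use of raw rewards are all illustrative assumptions.

import numpy as np

def cusum_statistic(x, kappa):
    # Normalized CUSUM-type contrast between the segment means before and
    # after candidate change point kappa (a toy scalar stand-in for the
    # policy-value-based test discussed in the talk).
    n = len(x)
    weight = np.sqrt(kappa * (n - kappa) / n)
    return weight * abs(x[:kappa].mean() - x[kappa:].mean()) / (np.std(x) + 1e-8)

def detect_change(x, min_seg=20, threshold=3.0):
    # Scan all candidate change points; return the one with the largest
    # statistic if it is significant, otherwise None (sequence looks stationary).
    n = len(x)
    candidates = range(min_seg, n - min_seg)
    stats = np.array([cusum_statistic(x, k) for k in candidates])
    k_best = candidates[int(np.argmax(stats))]
    return k_best if stats.max() > threshold else None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated offline reward sequence whose mean shifts at t = 300.
    rewards = np.concatenate([rng.normal(0.0, 1.0, 300),
                              rng.normal(1.0, 1.0, 200)])
    cp = detect_change(rewards)
    print("estimated change point:", cp)
    # In a nonstationary-RL pipeline, only the post-change data would be
    # passed to a standard offline RL algorithm for policy optimization.
    recent = rewards[cp:] if cp is not None else rewards
    print("samples retained for policy learning:", recent.size)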
Faculty and students are welcome to attend!