Offline Deep Reinforcement Learning and Off-Policy Evaluation for Personalized Basal Insulin Control in Type 1 Diabetes.
Zhu T., Li K., Georgiou P.
Recent advancements in hybrid closed-loop systems, also known as the artificial pancreas (AP), have been shown to optimize glucose control and reduce the self-management burden for people living with type 1 diabetes (T1D). AP systems adjust the basal infusion rates of insulin pumps, facilitated by real-time communication with continuous glucose monitoring. Empowered by deep neural networks, deep reinforcement learning (DRL) has introduced new paradigms of basal insulin control algorithms. However, all the existing DRL-based AP controllers require a large number of random online interactions between the agent and the environment. While such controllers can be validated in T1D simulators, this requirement makes them impractical in real-world clinical settings. To this end, we propose an offline DRL framework that can develop and validate models for basal insulin control entirely offline. It comprises a DRL model based on the twin delayed deep deterministic policy gradient and behavior cloning, as well as off-policy evaluation (OPE) using fitted Q evaluation. We evaluated the proposed framework on an in silico dataset containing 10 virtual adults and 10 virtual adolescents, generated by the UVA/Padova T1D simulator, and on the OhioT1DM dataset, a clinical dataset with 12 real T1D subjects. The performance on the in silico dataset shows that the offline DRL algorithm significantly increased time in range while reducing time below range and time above range for both adult and adolescent groups. The high Spearman's rank correlation coefficients between actual and estimated policy values indicate that the OPE estimates are accurate. We then used the OPE to estimate model performance on the clinical dataset, where a notable increase in policy values was observed for each subject. The results demonstrate that the proposed framework is a viable and safe method for improving personalized basal insulin control in T1D.
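
As an illustration of the evaluation quantities named above, the following minimal Python sketch computes time in range, time below range, and time above range from a glucose trace, and Spearman's rank correlation between actual and estimated policy values. It is not the authors' code. The 70 to 180 mg/dL target range is an assumption (the abstract does not state its thresholds), the function names `time_in_ranges` and `spearman_rho` are hypothetical, the sample numbers are made up for demonstration, and the rank computation assumes there are no tied values.

```python
import numpy as np


def time_in_ranges(glucose_mg_dl, low=70.0, high=180.0):
    """Return the fraction of readings in, below, and above the target range.

    The default bounds (70-180 mg/dL) are an assumed target range,
    not values taken from the abstract.
    """
    g = np.asarray(glucose_mg_dl, dtype=float)
    return {
        "TIR": float(np.mean((g >= low) & (g <= high))),  # time in range
        "TBR": float(np.mean(g < low)),                   # time below range
        "TAR": float(np.mean(g > high)),                  # time above range
    }


def spearman_rho(x, y):
    """Spearman's rank correlation, assuming no tied values in x or y."""
    # argsort of argsort yields the 0-based rank of each element.
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    # Spearman's rho is the Pearson correlation of the ranks.
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


# Made-up glucose readings (mg/dL), for demonstration only.
cgm = [65, 90, 120, 150, 200, 175, 110, 60, 185, 140]
metrics = time_in_ranges(cgm)

# Made-up "actual" and "OPE-estimated" values for four hypothetical policies.
actual_values = [0.2, 0.5, 0.9, 0.4]
estimated_values = [1.0, 2.5, 3.1, 2.0]
rho = spearman_rho(actual_values, estimated_values)

print(metrics, rho)
```

A rank correlation is a natural check for OPE because model selection only needs the estimator to order candidate policies correctly, even if the estimated values sit on a different scale from the actual ones, as in the made-up example here.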