College of Mathematics and Computer Science, Zhejiang Normal University: "Machine Learning and Vision" Seminar Series, Talk 2
Title: On Training Implicit Models
Speaker: Zhouchen Lin (Peking University)
Time: December 6, 2021, 09:50-10:30
Venue: Tencent Meeting ID 699-492-742 (Session 3 of the ZJNU MLV Seminar Series)
Abstract: Implicit models have emerged as deep networks with, in effect, infinitely many layers. They have good mathematical properties and can achieve performance competitive with traditional deep networks. However, training them is a major challenge. Previous works employ implicit differentiation and solve for the exact gradient in the backward pass. However, is it necessary to compute such an exact gradient (which is usually quite expensive) for training? To this end, we propose a novel gradient estimate for these implicit models, named the phantom gradient, that 1) forgoes the costly computation of the exact gradient; and 2) provides an update direction that is (empirically) preferable for training implicit models. We theoretically analyze the condition under which a descent direction of the loss landscape can be found, and provide two specific instantiations of the phantom gradient based on unrolling and the Neumann series. Experiments on large-scale vision tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward pass in training implicit models (roughly a 1.7× speedup) and even boost performance over approaches based on the exact gradient.
Host: Zhonglong Zheng
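
For context, the mechanics behind the phantom gradient can be sketched briefly. An implicit model computes an equilibrium z* = f(z*, x; θ), and exact implicit differentiation requires solving a linear system with the matrix I − J, where J = ∂f/∂z at z*. The Neumann-series instantiation truncates (I − J)⁻¹ = Σ_{k≥0} J^k after K terms, while the unrolling instantiation backpropagates through K damped fixed-point steps started from the (detached) equilibrium. The PyTorch snippet below is a minimal sketch of the unrolling variant only, not the speaker's implementation; the function name phantom_gradient_unroll, the toy layer f, and the defaults steps=5 and damping=0.5 are illustrative assumptions.

```python
import torch

def phantom_gradient_unroll(f, z_star, x, steps=5, damping=0.5):
    # Unrolling-based phantom gradient (illustrative sketch).
    # f:       layer defining the fixed point z = f(z, x)
    # z_star:  approximate equilibrium from any black-box forward solver
    # steps:   number of unrolled steps K (hypothetical default)
    # damping: interpolation factor of the damped fixed-point iteration
    z = z_star.detach()  # discard the forward solver's computation graph
    for _ in range(steps):
        # Differentiable damped step; autograd through these K steps
        # is what yields the phantom gradient on the backward pass.
        z = damping * f(z, x) + (1.0 - damping) * z
    return z

# Toy usage: f(z, x) = tanh(z W^T + x U^T), equilibrium by plain iteration.
torch.manual_seed(0)
W = torch.nn.Parameter(0.3 * torch.randn(8, 8) / 8 ** 0.5)  # contractive scale
U = torch.nn.Parameter(torch.randn(8, 8) / 8 ** 0.5)
f = lambda z, x: torch.tanh(z @ W.T + x @ U.T)

x = torch.randn(4, 8)
with torch.no_grad():  # forward pass: solve for the equilibrium gradient-free
    z_star = torch.zeros(4, 8)
    for _ in range(50):
        z_star = f(z_star, x)

loss = phantom_gradient_unroll(f, z_star, x).pow(2).mean()
loss.backward()  # backward traces only the K unrolled steps
print(W.grad.norm(), U.grad.norm())
```

Note that the backward pass here traces only the K unrolled steps, so its cost is independent of how many iterations the forward solver needed, which is where the acceleration described in the abstract comes from.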