Upcoming seminar: On the influence of stochastic rounding bias in implementing gradient descent with applications in low-precision training – Lu Xia (Eindhoven University of Technology)

July 13, 2023 by Fabio Durastante

Venue

Dipartimento di Matematica, Aula Magna.
18 July 2023, 14:00 – 15:00

Abstract

When neural networks are trained with the gradient descent method (GD) in low-precision arithmetic, deterministic rounding errors often cause stagnation or otherwise degrade the convergence of the optimizer. Unbiased stochastic rounding (SR) can, with a certain probability, partially capture gradient updates that are smaller than the minimum rounding precision. We provide a theoretical explanation for the stagnation observed in GD when training neural networks with low-precision computation, and we analyze the impact of floating-point roundoff errors on the convergence behavior of GD, with a particular focus on convex problems. Two biased stochastic rounding methods, signed-SR_ε and SR_ε, are proposed; they are shown to eliminate the stagnation of GD and to converge significantly faster than SR in low-precision floating-point computation.
We validate our theoretical analysis by training a binary logistic regression model on the CIFAR-10 dataset and a 4-layer fully connected neural network on the MNIST dataset, using a 16-bit floating-point representation and various rounding techniques.
The experiments demonstrate that signed-SR_ε and SR_ε may achieve higher classification accuracy than round-to-nearest (RN) and SR for the same number of training epochs. They also show that the new rounding methods with a 16-bit floating-point representation may converge faster than RN with a 32-bit floating-point representation.
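As a rough illustration of the stagnation effect described in the abstract, the following minimal sketch compares round-to-nearest (RN) with unbiased stochastic rounding (SR) in a toy gradient descent run. It is not the speaker's implementation: the uniform rounding grid (instead of a true 16-bit floating-point format), the objective f(x) = x²/2, and all names and parameter values are assumptions chosen for brevity. The biased variants proposed in the talk are not reproduced here.

```python
import math
import random


def round_nearest(x, spacing):
    """Deterministic rounding to the nearest multiple of `spacing`."""
    return spacing * round(x / spacing)


def round_stochastic(x, spacing, rng):
    """Unbiased stochastic rounding on a uniform grid (assumed toy format).

    x is rounded up with probability equal to its fractional position
    between the two neighbouring grid points, so E[result] = x.
    """
    scaled = x / spacing
    lower = math.floor(scaled)
    frac = scaled - lower
    return spacing * (lower + (1 if rng.random() < frac else 0))


def gradient_descent(x0, step, iters, rounder):
    """GD on f(x) = x^2 / 2 (gradient is x), rounding after every update."""
    x = x0
    for _ in range(iters):
        x = rounder(x - step * x)
    return x


rng = random.Random(0)   # fixed seed, for reproducibility only
spacing = 2.0 ** -4      # hypothetical grid spacing standing in for low precision
x0, step, iters = 1.0, 0.01, 2000

x_rn = gradient_descent(x0, step, iters, lambda v: round_nearest(v, spacing))
x_sr = gradient_descent(x0, step, iters, lambda v: round_stochastic(v, spacing, rng))

print("RN:", x_rn)  # the update 0.01 is below half the spacing, so RN never moves
print("SR:", x_sr)  # SR occasionally steps down and makes progress
```

With these values, every RN update is rounded back to the starting point, whereas SR accepts a downward step with probability proportional to the update size and therefore keeps descending on average.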

Further information is available on the event page on the Indico platform.