Yesterday, our friends at Unsloth shared an issue with gradient accumulation that affects the Transformers Trainer. The first report came from @bnjmn_marie (kudos to him!). Gradient accumulation is supposed to be the mathematical equivalent of full-batch training. However, the losses did not match between training runs with the setting turned on and off. Where did it come [...]
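To make the claimed equivalence concrete, here is a minimal sketch (a hypothetical illustration, not the Trainer's actual code) of why it can break: averaging the per-micro-batch mean losses only reproduces the full-batch mean when every micro-batch contributes the same number of elements. With variable-length sequences, micro-batches hold different token counts, and the naive average drifts away from the true full-batch loss unless each micro-batch is weighted by its size.

```python
# Hypothetical per-token losses for one "full batch" of 6 tokens.
def full_batch_mean(losses):
    # Mean over every token in the whole batch (the reference value).
    return sum(losses) / len(losses)

def naive_accumulated_mean(micro_batches):
    # Average of per-micro-batch means: only correct when sizes are equal.
    return sum(sum(mb) / len(mb) for mb in micro_batches) / len(micro_batches)

def weighted_accumulated_mean(micro_batches):
    # Weight each micro-batch by its size: always matches the full batch.
    total = sum(len(mb) for mb in micro_batches)
    return sum(sum(mb) for mb in micro_batches) / total

per_token_losses = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
equal = [per_token_losses[:3], per_token_losses[3:]]    # 3 + 3 tokens
unequal = [per_token_losses[:1], per_token_losses[1:]]  # 1 + 5 tokens

print(full_batch_mean(per_token_losses))   # 7.0
print(naive_accumulated_mean(equal))       # 7.0 -> matches
print(naive_accumulated_mean(unequal))     # 5.0 -> diverges
print(weighted_accumulated_mean(unequal))  # 7.0 -> matches again
```

The same reasoning carries over to gradients, since the gradient of a mean loss is the mean of the per-token gradients.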
The post Modifying gradient accumulation first appeared on Versa AI hub.
from Blog - Versa AI hub https://versaaihub.com/modifying-gradient-accumulation/?utm_source=rss&utm_medium=rss&utm_campaign=modifying-gradient-accumulation