Literature review of advanced machine learning technologies (such as ChatGPT)

Fine-Tuning Language Models from Human Preferences: https://arxiv.org/abs/1909.08593
Learning to summarize text: https://arxiv.org/pdf/2009.01325.pdf
    https://openai.com/research/learning-to-summarize-with-human-feedback
InstructGPT: https://arxiv.org/abs/2203.02155
PPO Optimization: https://arxiv.org/abs/1707.06347
    https://github.com/openai/baselines
Proximal Policy Explained: https://www.youtube.com/watch?v=HrapVFNBN64
Deeper look: https://www.youtube.com/watch?v=MKb4orC58-M
Summary: https://blog.tylertaewook.com/post/proximal-policy-optimization
Tutorial: https://medium.com/deepgamingai/proximal-policy-optimization-tutorial-part-2-2-gae-and-ppo-loss-22337981f815
