Large Language Models
Reinforcement Learning with Human Feedback and Instruction Following
Research Scientist
Training large language models on high-quality human feedback data has been one of the key advances in LLM development over the past year.
However, building high-quality human feedback infrastructure is challenging: it requires a combination of technical, operational, and data expertise. On the technical side, Cohere needed easy-to-use labeling tools and robust quality-control algorithms. On the operational side, it needed large teams of workers sophisticated enough to teach its models a diverse range of language-based skills. On the data side, it needed guidelines that capture the many types of human feedback required to make its language models effective in real-world use cases.
Cohere evaluated several data labeling vendors but found that they lacked real-world experience with large language models and that their quality didn’t meet Cohere’s bar.
After learning of Surge AI’s research and experience in the large language model space, Cohere began leveraging Surge AI’s LLM platform to train and evaluate their models on high-quality human data.
“Cohere’s Command Beta model gained the top spot in the Stanford HELM (Holistic Evaluation of Language Models) earlier this month. The startup’s generative model that’s conditioned to respond well to single-statement commands stood out among 36 LLM models, including Meta’s Galactica, OpenAI’s Davinci, Google’s Flan, Bloom and others.”