Edwin Chen
DALL·E 3 and Midjourney Fail Astral Codex Ten's Image Generation Bet
Edwin Chen
AI
How Anthropic uses Surge AI’s RLHF platform to train their LLM Assistant on Human Feedback
Edwin Chen
Large Language Models
How RLHF Shifts LLMs from Autocompletion to Conversational Understanding
Edwin Chen
AI
Introduction to Reinforcement Learning with Human Feedback
Edwin Chen
Large Language Models
2022 Blog Recap: Trends in AI, Language, & Data
Edwin Chen
AI
We Evaluated ChatGPT vs. Google on 500 Search Queries
Edwin Chen
Large Language Models
AI Red Teams for Adversarial Training: How to Make ChatGPT and LLMs Adversarially Robust
Edwin Chen
Large Language Models
HellaSwag or HellaBad? 36% of this popular LLM benchmark contains errors
Edwin Chen
Large Language Models
How TikTok is Evolving the Next Generation of Search
Edwin Chen
Social Media
Evaluating Generative AI: Did Astral Codex Ten Win His Bet on AI Progress?
Edwin Chen
AI
The $250K Inverse Scaling Prize and Human-AI Alignment
Edwin Chen
Large Language Models
Human Evaluation of Large Language Models: How Good is Hugging Face's BLOOM?
Edwin Chen
Large Language Models
30% of Google's Emotions Dataset is Mislabeled
Edwin Chen
AI
Search Behind-the-Scenes: How Neeva Uses Human Evaluation to Measure Search Quality
Edwin Chen
Case Studies
Humans vs. Gary Marcus vs. Slate Star Codex: When is an AI failure actually a failure?
Edwin Chen
AI
How Surge AI Built OpenAI's GSM8K Dataset of 8,500 Math Problems
Edwin Chen
AI
10 Egregious Failures in Gmail Spam Detection
Edwin Chen
Content Moderation
We asked 100 humans to draw the DALL·E prompts
Edwin Chen
AI
Google Search is Falling Behind
Edwin Chen
Engineering
Holy $#!t: Are popular toxicity models simply profanity detectors?
Edwin Chen
Content Moderation
Is Google Search Deteriorating? Measuring Google's Search Quality in 2022
Edwin Chen
Human Evaluation
The AI Bottleneck: High-Quality, Human-Powered Data
Edwin Chen
Product