2022 showcased the need for amazing data to build rich, next-gen AI – exactly the story we tell on our blog. So let’s take a look back at the year by recapping our most popular content!
Popular AI & Data Themes
Theme #1: The agonizing death of Google Search
The frustrations with Google have been simmering since the start of the year, when we published a human evaluation study measuring the decline in Google’s search quality – our first post to reach #1 on Hacker News!
Since then:
- YC partners have joined the debate.
- New startups like Neeva, You.com, Andi, Perplexity AI, and Kagi have risen to take advantage of the holes Google is leaving.
- This culminated in the greatest existential threat Google has faced since Facebook: ChatGPT – which we measured as matching and even beating Google in several domains.
Theme #2: The importance of rich human data for the next wave of AI
Reinforcement learning from human feedback (RLHF) has driven a dramatic leap in the usability and performance of LLMs. A big part of the advancement behind ChatGPT is simply better, higher-quality human feedback!
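For a concrete sense of what that feedback looks like, here’s a minimal, illustrative sketch – the prompt and field names are hypothetical, not drawn from any real dataset – of the kind of preference comparison an RLHF pipeline collects from human labelers:

```python
# A minimal, hypothetical sketch of one RLHF preference comparison.
# A human labeler reads a prompt and two candidate model responses,
# then records which response is better and why.
preference_example = {
    "prompt": "Explain photosynthesis to a ten-year-old.",
    "response_a": "Plants use sunlight, water, and air to make their own food.",
    "response_b": "Photosynthesis is a chlorophyll-mediated electron-transport process.",
    "preferred": "response_a",  # clearer and pitched at the right reading level
    "rationale": "Response A matches the requested audience.",
}

# Many comparisons like this train a reward model, and the LLM is then
# fine-tuned (e.g., with PPO) to produce responses the reward model prefers.
```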
Traditional data labeling companies take an outdated view on data, and are focused on simple image problems – like drawing bounding boxes around cars. That’s why we designed our platform and quality control technology from the ground up, focusing on the richness needed to train future generations of AI.
We discussed how we partnered with OpenAI to create a mathematics dataset to teach GPT to solve math problems.
We called out the excruciating failures in Google’s ML datasets, and how those errors affect the performance of Google’s models.
And we explained why low-quality, mislabeled data in popular large language model benchmarks has been setting the field back for years.
Theme #3: Injecting human values into technology
With great power comes great responsibility. How do we make sure that the superintelligent AI models of the future share our values, and don’t spread toxicity, violence, and misinformation – like News Feed systems accidentally did?
We've been excited to partner with the leading AI safety organizations for their human data needs, like:
- OpenAI on their values-targeted datasets
- Anthropic on their harmless assistants
- Redwood Research on adversarial robustness
We also talked about strategies to optimize machine learning algorithms for human values.
And we analyzed why wholesome, human-aligned and human-inspired content is a major reason TikTok is succeeding over the clickbait optimizations of Instagram and Reels.
Theme #4: The rise of generative AI
InstructGPT made coaxing good text out of language models much easier. Goodbye, contorted, autocomplete-based prompt engineering! And DALL·E and Stable Diffusion turned image generation into a mainstream creative medium.
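To illustrate the shift (with made-up prompts, purely as a sketch): before instruction tuning, you coaxed a model by framing your task as text for it to continue; with InstructGPT-style models, you can simply ask.

```python
# Hypothetical prompts illustrating the shift from completion-style prompting
# to instruction-style prompting. Neither is taken from a real system.

# Old style: trick an autocomplete model into continuing a pattern you set up.
completion_style_prompt = (
    "English: Where is the library?\n"
    "French: Où est la bibliothèque ?\n"
    "English: I would like a coffee, please.\n"
    "French:"
)

# Instruction style: just state the task directly.
instruction_style_prompt = "Translate into French: I would like a coffee, please."
```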
People turned to our posts on:
- AI-generated (and AI-illustrated!) children’s stories
- Building machine learning models with Copilot – Surgers train many of the advanced new programming assistants too!
- The spicy Slate Star Codex vs. Gary Marcus debates over the “intelligence” behind generative AI
We also replaced all our blog images with generative ones, and explained why the rise of creative, generative AI means we need new human evaluation methods to replace static benchmarks.
Theme #5: The mirrored needs of AI safety and content moderation
We’ve seen the potential dangers of technology through sites like Twitter and Facebook. AI will likely be a transformative force for good in the world, but in the same way it has the potential to be greatly misused – just think of even the prosaic worries about students using ChatGPT to cheat.
In many ways, content moderation and AI safety are mirror images of each other!
On the content moderation side…
We covered why popular toxicity models like Google’s Jigsaw are merely profanity detectors – it’s bad data all the way down!
We explored the terrible (and obvious) violence, racism, and sexism that Twitter’s moderation systems fail to detect.
And we measured the amount of Twitter spam for Elon.
We also created several large, open-source datasets of hate speech and misinformation (reach out!), and our safety expertise was featured in outlets from the Wall Street Journal to Bloomberg and the Washington Post!
On the AI Alignment and Safety side…
We discussed adversarial methods for training robust LLMs.
We covered the importance of human-AI alignment.
And we collaborated with Anthropic on researching new methods for scalable human/AI oversight.
Most Popular Blog Posts
To wrap up, here are our top 10 most popular articles of 2022!
- Google’s Existential Threat: We Evaluated ChatGPT vs. Google on 500 search queries
- How We Built OpenAI's GSM8K Dataset of 8,500 Math Problems
- Holy $#!t: Are popular toxicity models simply profanity detectors?!
- 30% of Google’s Emotions dataset is mislabeled
- Why Instagram is Losing Gen Z: We Asked 100 Users to Compare TikTok vs. Reels
- Is Elon right? We labeled 500 Twitter users to measure the amount of Spam
- The Violence, Racism, and Sexism Uncaught by Twitter's Content Moderation Systems
- Evaluating Generative Image AI: Did Astral Codex Ten Win His Bet on AI Progress?
- Human Evaluation of Large Language Models: How Good is Hugging Face's BLOOM?
- AI Red Teams for Adversarial Training: How to Make ChatGPT and LLMs Adversarially Robust
Data Labeling 2.0 for Rich, Creative AI
Superintelligent AI, meet your human teachers. Our data labeling platform is designed from the ground up to train the next generation of AI — whether it’s systems that can code in Python, summarize poetry, or detect the subtleties of toxic speech. Use our powerful data labeling workforce and tools to build the rich, human-powered datasets you need today.