Google Search is Falling Behind

Is Google resting on its laurels?

Imagine you’re a programmer, and you forget the syntax to delete a file in Python. So you search “python delete file”, and get the following:

"python delete file" on Google

It’s not a terrible search results page... But:

  • The caption for the first search result describes removing a folder, not a file.
  • You still have to spend time browsing to find a useful answer. No intelligent question-answering or extraction here!

Is this page any better than what you’d have gotten in 2010? What if it looked like this instead?

"python delete file" on Neeva
"python delete file" on You.com

In both cases, a great piece of code (syntax-highlighted!) sits front and center, which means you never even have to click into the blue links. As an ex-Googler, I’ve historically been skeptical of all the other search engines, but go try them yourself: the whole experience feels so much fresher and snappier.
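
For reference, the kind of answer these pages surface fits in a few lines. Here's a minimal sketch of our own (not the exact snippet from either screenshot); the `delete_file` helper name is made up for illustration, while `os.remove` is the standard-library call doing the work:

```python
import os


def delete_file(path):
    """Delete the file at `path`.

    Returns True if a file was deleted, False if there was nothing to delete.
    `delete_file` is a hypothetical helper name used only for this example.
    """
    try:
        os.remove(path)  # removes a single file (not a directory)
        return True
    except FileNotFoundError:
        # The file was already gone, so there is nothing to do.
        return False
```

If you prefer `pathlib`, `Path(path).unlink(missing_ok=True)` (Python 3.8+) does much the same thing in one call.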

Remember Google's past innovations, like blending Images, Videos, and Maps onto the same page?! Now we’re lucky if we can find a pancake recipe without wading through 7 ads and 3 pages of storytime first.

Seriously! Here’s the top search result for “blueberry pancakes”:

3 in-your-face ads...
Followed by even more ads and filler!

And that's before I even get to the recipe. What happened to page quality factors in ranking?

Where else is Google starting to fall behind, and how could competitors chip away at its edge? Human evaluation of search quality is one of our flagship use cases at Surge, so let’s dive into three key Search verticals — Programming, Cooking, and Travel — and find out.

Programming

We’re a human intelligence platform for rich, contextual AI; as part of that, we have high-quality “labelers” with software engineering backgrounds who train AI models to answer programming questions and learn how to code. We asked them to collect searches they performed in their normal workday, and then evaluate how well Google, Bing, Neeva, You.com, Kagi, and DuckDuckGo handled each one.

Here are more examples of programming queries where Google is lagging behind.

Example #1

Rater Background (Jason L.): “I graduated with a degree in Computer Science and a minor in Business last year, and now I’m bootstrapping a passive income business that helps artists sell their work online.”

Query: “python throw exception”

Intent: “I normally use Java and Go, but wanted to use Python for a quick script, and I forgot how to throw an exception. The ideal search result would be an example of the syntax.”
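
For readers who don't write Python, that syntax is short: the keyword is `raise`. Here's a minimal sketch of our own, where `parse_age` is a made-up function used only for illustration:

```python
def parse_age(value):
    """Convert a string to a non-negative age (hypothetical example function)."""
    age = int(value)
    if age < 0:
        # Python uses `raise`, where Java would use `throw`.
        raise ValueError(f"age must be non-negative, got {age}")
    return age


try:
    parse_age("-1")
except ValueError as err:
    # Handle the error instead of letting the script crash.
    print(f"Caught: {err}")
```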

Google Evaluation

"python throw exception" on Google. Are 10 blue links the best it can do?

Google rating: “This search result page is just okay. I don't like the first search result (I already know what exceptions are in programming, so the official Python documentation is way more than I need). I have to scroll halfway down before finding the information that I want.

In a world of personalization, could Google tell this is an experienced programmer who doesn't want a long-winded, introductory tutorial?

The second search result isn’t great either (I don’t want to read a long-winded blog post). Luckily, the third StackOverflow search result is exactly what I was looking for, but pretty much every other search engine performed better.”

Neeva Evaluation

Neeva rating: “Neeva made it very easy to see an example piece of code on the right hand side. The UI also felt very fresh and snappy compared to Google!

The only way it could be better is if the snippet in the first search result were more relevant (“x = -1” isn’t exactly useful).”

You.com Evaluation

You.com rating: “I liked the way that You allowed me to open up the SO answer in the side panel, so that I didn’t need to leave the page.”

“You's code complete module generated Mandarin PHP in this case (but I’ve found it useful in other cases, like writing an email extraction regex)”
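
For a sense of scale, an email extraction regex of the kind mentioned above can be quite short. The sketch below is our own, not You.com's output; the pattern is deliberately simplified (it won't cover every valid address), and `extract_emails` is a made-up helper name:

```python
import re

# Simplified pattern (an assumption for illustration): local part, "@",
# domain, then a dot and a top-level domain of at least two letters.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return every substring of `text` that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


print(extract_emails("Contact alice@example.com or bob.smith@mail.example.org today."))
```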

"python throw exception" in You.com's code completion module
"python extract email from string" in You.com's code complete module

Bing Evaluation

Bing rating: “Bing also did a better job than Google. It seemed to understand my query better, and the box at the top reminded me that the keyword is “raise”, and not “throw” like in Java.

It extracted a StackOverflow code sample nicely too. The issue is that this was the 5th search result and should have been ranked higher.”

"python throw exception" on Bing
Bing automatically extracts a nice code sample from StackOverflow

Kagi Evaluation

Kagi rating: “Kagi also did a better job than Google. Like Neeva and You.com, surfacing code snippets and the StackOverflow answer on the right-hand panel made it much faster for me to find what I wanted.”

"python throw exception" on Kagi

Example #2

Rater background (Ning X.): “I’m finishing up my senior year in college. I also work as a research assistant in a neuroscience lab, where I use a lot of R and Python, and I’ll be starting a master’s in the fall.”

Query: “filter pandas dataframe date”

Intent: “I recently started learning how to use the pandas library in Python, and I was trying to figure out how to filter a date column in my data frame.”
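
To make the task concrete, here's roughly what such a filter can look like. This is our own sketch rather than the snippet from any of the screenshots, and the column names, dates, and values are invented for illustration:

```python
import pandas as pd

# Hypothetical data: the column names and values are made up for this example.
df = pd.DataFrame({
    "date": ["2022-01-15", "2022-03-01", "2022-04-10"],
    "value": [1, 2, 3],
})

# Parse the strings into real datetimes so comparisons are chronological.
df["date"] = pd.to_datetime(df["date"])

# Keep only the rows that fall inside the (inclusive) date range.
start = pd.Timestamp("2022-02-01")
end = pd.Timestamp("2022-03-31")
filtered = df[(df["date"] >= start) & (df["date"] <= end)]

print(filtered)
```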

Google vs. Neeva Evaluation

Rating: “Google’s first search result was very relevant, and the same as Neeva’s. However, Neeva’s UI was better since it extracted exactly the code sample I needed onto the search page itself.

It’s even syntax highlighted, which makes it much easier to read!”

"filter pandas dataframe date" on Google: just 10 blue links...
"filter pandas dataframe date" on Neeva, with intelligent code extraction

Coding searches are clearly one area where Google is starting to lag behind its competitors. Programmers found the code extraction and StackOverflow features on other sites especially helpful, in addition to a generally slicker UI.

Cooking

What about cooking queries? For these, we sampled our general pool of raters.

Example #1

Rater background (Daniela M.): “I’m a high school History teacher. Besides History, I also enjoy creative writing and playing the piano. While I love teaching my students, it’s long hours and the pay isn’t great, so I’ve been exploring other fields and may make a move next year.”

Query: “biryani recipe”

Intent: “I went to an Indian restaurant with some friends and really enjoyed the biryani we ordered, so I wanted to learn how to make it myself.”

Google Evaluation

Google rating: “Google’s results weren’t anything special. The pictures at the top looked pretty gross, to be honest, and very low-quality. Not what I want to see when looking for a recipe to make.”

"biryani recipe" on Google

Bing Evaluation

Bing rating: “Bing’s results were the best by far. The pictures at the top are so much more appetizing!

"biryani recipe" on Bing

I’ve found the Bing UI distracting in other cases, but it really shone here. I didn’t know much about biryani, so I spent a little time on the search page learning more about it, and I like the way the videos are laid out.

Bing's "biryani recipe" UI

I also liked how it automatically extracted the recipes and videos on the right side. I get a little anxiety when clicking on recipe websites, since you never know what you’re going to find - most, especially recipes from personal bloggers, are incredibly annoying and full of ads and annoying prose.

The only downside was that only the ingredients and nutritional info were extracted, but not the recipe directions themselves. Hopefully they can fix that soon.

Bing automatically extracts recipe information from the webpage

The filters at the top were also a nice touch. In particular, the “Quick” filter was very useful, since I don’t want to spend over an hour cooking. Overall, Bing gets a 5/5 and leaves Google in the dust!”

Bing's recipe filters

Neeva Evaluation

Neeva rating: “The one thing missing from Bing was having the full recipe directions automatically extracted too (not just the ingredients). So I really liked this feature on Neeva.

The carousel on the bottom also made it very fast to quickly browse through recipes, which was helpful because some of the recipes required way more ingredients than I had.”

Travel

Finally, let’s look at Travel queries.

Example #1

Rater background (Alex S.): “I’m a part-time musician and music teacher while I’m staying at home to take care of the kids. Recently, I’ve been trying to learn French and learn more about AI.”

Query: “what to do in los angeles”

Intent: “I’m going to Los Angeles next week for the first time. I don’t have any plans yet and am looking for fun things to do.”

Google Evaluation

Google rating: “Google’s search results were good. The first two lists were great, and I liked the pictures of the top sights.”

"what to do in los angeles" on Google

Bing Evaluation

Bing rating: “But I liked Bing’s even more. The images and filters at the top made it easy to discover fun attractions. Its page seems to be surprisingly “smarter” than Google’s and more advanced.”

"what to do in los angeles" on Bing

You.com and Kagi Evaluation

Ratings: “I’m a sucker for other people’s recommendations, so I also appreciated the “forum” sections that You.com and Kagi had, though the replies were a little sparse.

[author's note: it would be really interesting to see what these search engines could do with intelligent parsing and scoring of forum threads]

"what to do in los angeles" on You.com
"what to do in los angeles" on Kagi

DuckDuckGo Evaluation

DuckDuckGo rating: “As much as people talk about DuckDuckGo, its Maps suggestions at the top were odd. But the rest of the results were fine.”

"what to do in los angeles" on DuckDuckGo

In short, Google was already pretty good, but Bing was even better, thanks to its visual UI, its filters, and its map. I also liked the You.com and Kagi concepts of extracting “social” or “forum” elements, and I'm excited to see how they could add even more community intelligence features.”

Summary

To be clear, we love Google and Search; our team used to work on these problems at Google, Facebook, and Bing. As the Internet changes from what it looked like in 1997 (whether due to the walled gardens of social media, the proliferation of image and video, or the rise of SEO spam and AI-generated content), we just want to make sure the platforms advance alongside it and are optimized for human values.

And if you enjoyed these examples, send us an email or follow @WeRateGoogle for showcases of misperforming search results and ways to make them better!

https://twitter.com/WeRateGoogle/status/1513640424530845696
Edwin Chen

Edwin oversees Surge AI's Engineering and Research teams — whether it's helping customers train large language models on human feedback, building content moderation algorithms to detect hate speech and spam, or scaling up an elite data labeling workforce. He previously led AI, Data Science, and Human Computation teams at Google, Facebook, and Twitter, and studied mathematics and linguistics at MIT.
