Neeva - Ad-free, private search
https://neeva.com/
Search Engine
Custom Labeling Teams, Dataset Generation, Quality Controls, Template Editor, API, Search Evaluation, Dashboards
Neeva is the world’s first ad-free, private search engine, founded by a legendary team of ex-Googlers.
To succeed, Neeva needs to understand what users think of its search engine and how its capabilities stack up against incumbents. Rather than relying on proxy metrics like clicks, Neeva wanted to gather this data directly by running search evaluations where users rate the quality and relevance of Neeva’s search rankings.
High-quality search evaluations are notoriously hard for companies to run on their own. As Neeva puts it, “when you’re building a search engine, evaluation is one of the most important tricky components to get right. Search is a very human need and so you need unbiased human raters to tell you how well you’re doing.”
We built Surge AI to make it easy for every company to run the same search evaluations that Google depends on (and many other data labeling use cases too!).
After discussing Neeva's needs with their search ranking team, we designed, ran, and delivered a series of search evaluations, including personalized search evaluations, vertical-specific evaluations, and side-by-side evaluations comparing Neeva against Google.
Let’s break our process down into three phases: building a team of Neeva search evaluators, generating and labeling data, and final quality checks.
One of Neeva’s goals was to evaluate their search engine on a specific domain: technical programming queries. Not everyone has the domain expertise to understand what makes a good search result for a query about debugging TensorFlow, so we built a custom labeling team of Surgers with software engineering backgrounds.
These custom teams are a key feature of the Surge AI platform. They ensure that only Surgers with the required skills work on a particular project, and that your labeling team learns the nuances of your tasks as it stays with you over time.
While most data labeling tasks involve pre-existing data that needs to be labeled, our Search customers often need datasets created from scratch. In these cases, Surgers must both gather and label the data, resulting in a highly customized, one-of-a-kind dataset.
For Neeva’s Search Evaluation project, we asked Surgers to do the following:
1. Pick a programming query that they had recently searched for (e.g., “python split string into characters”)
2. Explain the query intent
3. Search for the query on Google
4. Rate the Google search result page on a 5-point Likert scale
5. Explain their rating
6. Search for the query on Neeva
7. Rate the Neeva search result page on the same 5-point Likert scale
8. Explain their rating
9. Compare Google vs. Neeva on a 5-point Likert scale
10. Explain their rating
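One way to picture the output of the steps above is as one record per query. The following is a minimal sketch, not the schema Surge AI or Neeva actually used: the class name SideBySideRating, its field names, the summarize helper, and the convention that a comparison score of 1 means "Google much better" and 5 means "Neeva much better" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SideBySideRating:
    """One Surger's evaluation of a single query (hypothetical schema)."""
    query: str
    intent: str
    google_rating: int           # 1-5 Likert rating of Google's result page
    google_explanation: str
    neeva_rating: int            # 1-5 Likert rating of Neeva's result page
    neeva_explanation: str
    comparison: int              # assumed scale: 1 = Google much better, 3 = about the same, 5 = Neeva much better
    comparison_explanation: str

    def __post_init__(self):
        # Reject any rating that falls outside the 5-point scale.
        for name in ("google_rating", "neeva_rating", "comparison"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


def summarize(ratings):
    """Return mean page ratings and win/tie/loss counts from the comparison score."""
    if not ratings:
        raise ValueError("need at least one rating to summarize")
    return {
        "mean_google": mean(r.google_rating for r in ratings),
        "mean_neeva": mean(r.neeva_rating for r in ratings),
        "neeva_wins": sum(r.comparison > 3 for r in ratings),
        "ties": sum(r.comparison == 3 for r in ratings),
        "google_wins": sum(r.comparison < 3 for r in ratings),
    }
```

Keeping the free-text explanations next to the numeric ratings matters here, since the explanations are what let a reader see why one result page beat the other.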
As part of this process, we created a series of quality controls to ensure that our search engine raters were performing well. These quality controls included custom search rating examinations, where Surgers rated a series of <query, search result> pairs. We measured their responses and read through the explanations of their ratings to ensure that their judgments were thoughtful and sound.
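A rating examination like this could be scored by comparing each Surger's answers against reference ratings. The sketch below is only an illustration under stated assumptions: it assumes a set of expert-assigned "gold" ratings on a 1-5 scale, a tolerance of one point, and a pass threshold of 80%. None of those details come from the project itself, and the function names score_exam and passes are hypothetical.

```python
def score_exam(answers, gold, tolerance=1):
    """Return the fraction of gold items the rater scored within `tolerance` points.

    answers, gold: dicts mapping a (query, result_url) pair to a 1-5 rating.
    Gold items the rater skipped count as misses.
    """
    if not gold:
        raise ValueError("gold set must not be empty")
    hits = sum(
        1
        for pair, gold_rating in gold.items()
        if pair in answers and abs(answers[pair] - gold_rating) <= tolerance
    )
    return hits / len(gold)


def passes(answers, gold, threshold=0.8, tolerance=1):
    """True if the rater's exam score meets the (assumed) pass threshold."""
    return score_exam(answers, gold, tolerance) >= threshold
```

A numeric score of this kind would only complement the manual review of written explanations described above, not replace it.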
For example, Neeva learned that when it outperformed Google, 80% of the time it was because of Neeva’s search widgets and answer features. When Google beat Neeva, it was because Google did better on long-tail queries.
Insights like these allow Neeva to better understand where they are succeeding, and where they need to focus their efforts next. These search evaluations also inspired a range of additional evaluation projects that Neeva and Surge are now partnering on to uncover additional insights.