Using Surge AI with a Hybrid Cloud Model
There are two ways to upload your dataset into the Surge platform. The most popular option is uploading data directly using our web UI, API, or Python SDK. This is the easiest and most convenient way to have data annotated, but for organizations that are unable to upload data for security or compliance reasons, we also offer a hybrid cloud option with Amazon Web Services S3. With the hybrid cloud model, you can take advantage of our labeling tools, workforce, and quality controls while still hosting your own dataset.
There are three steps to set up a hybrid cloud project on Surge, each described below: store your data as JSON in an AWS S3 bucket, enable CORS, and upload a CSV of URLs. If you have any questions or need help setting up your hybrid cloud project, you can reach us at security@surgehq.ai.
Store your data in an S3 Bucket
Each row of data that needs to be annotated should be in its own JSON object. The object should contain the data in an unnested key-value format. For example, if you are annotating company websites and have two rows of data, your JSON objects would look like this:
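A sketch of what those two JSON objects might contain — the file names, company names, and keys here are hypothetical; the only requirement is that the keys are unnested:

row1.json:

```json
{"company_name": "Acme Corp", "website_url": "https://www.acme.example"}
```

row2.json:

```json
{"company_name": "Globex Inc", "website_url": "https://www.globex.example"}
```

Each object lives in its own file in the bucket, so each row can be fetched by a single URL.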
Enable CORS for Surge
In order to make sure the data loads properly from the Surge site, CORS (Cross-Origin Resource Sharing) needs to be configured on the S3 bucket containing the data. Navigate to your bucket in the S3 Console, click the Permissions tab, and scroll to the bottom to find the Cross-origin resource sharing (CORS) policy. Add https://app.surgehq.ai to the allowed origins so approved workers can access your data. A sample configuration is below:
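A minimal sketch of a CORS policy in the JSON format the S3 Console expects — this allows read-only (GET) access from the Surge app origin; adjust it to your own security requirements:

```json
[
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET"],
        "AllowedOrigins": ["https://app.surgehq.ai"],
        "ExposeHeaders": []
    }
]
```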
Upload the URLs to the Surge Platform
The final step is to upload a CSV file with the header "json_url" containing the JSON object URLs for each of the rows you want labeled. Using the same data as the example above, we would have a CSV file with two rows, one for each of the JSON files containing the data to be annotated:
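Continuing the hypothetical example (the bucket name "my-company-data" is a placeholder), the CSV would look like this:

```
json_url
https://my-company-data.s3.amazonaws.com/row1.json
https://my-company-data.s3.amazonaws.com/row2.json
```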
If you need additional security, you can instead create and upload presigned URLs generated with AWS, which grant time-limited access to each object. When you upload your data, you should see a notification indicating that the hybrid cloud workflow is enabled.
If you are creating projects programmatically, you can also upload the URLs using the Python SDK.
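If you are generating the "json_url" CSV programmatically before uploading it (through the UI or the SDK), a minimal sketch using only the standard library — the bucket name and object keys are placeholders:

```python
import csv

bucket = "my-company-data"          # hypothetical bucket name
keys = ["row1.json", "row2.json"]   # hypothetical object keys

with open("json_urls.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["json_url"])   # the header Surge expects
    for key in keys:
        writer.writerow([f"https://{bucket}.s3.amazonaws.com/{key}"])
```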
Set up your labeling template
Once you have uploaded your URLs, you can set up your labeling interface on the Surge platform. Because the project data stays on your server and is only pulled into the labeling interface, Surge can't show you which data keys are available, but you can still use the same Handlebars syntax to inject data into your template. Continuing the example above, if we uploaded JSON files with "website_url" as a key, then we would use {{website_url}} to dynamically insert the value anywhere in the labeling template. Here is how it would look in the template editor:
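As an illustration, a simplified slice of what such a template might contain — the question wording here is made up for this example:

```html
<p>Please review this company's website:
   <a href="{{website_url}}">{{website_url}}</a></p>
<p>What industry is {{company_name}} in?</p>
```

At labeling time, {{website_url}} and {{company_name}} are replaced with the values from the JSON object behind each row's URL.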
And then this is how it would look to the worker annotating the data (you can see this view yourself using the “Preview Task” link):
Once your URLs are uploaded and the template is created you are ready to send the project to our workforce and get results! If you are having any issues with your project or have any additional questions, you can reach us at security@surgehq.ai.
—
Surge AI is a data labeling platform that provides world-class data to top AI companies and researchers. We're built from the ground up to tackle the extraordinary challenges of natural language understanding — with an elite data labeling workforce, stunning quality, rich labeling tools, and modern APIs. Want to improve your model with context-sensitive data and domain-expert labelers? Schedule a demo with our team today!