ML School: Neural Networks

Bradley Webb
Dec 2, 2021
Welcome to ML School, where we break down key ML concepts and explain their importance. Let's get started with a fundamental building block of machine learning.


What are neural networks?

Neural networks process information in a similar way to the human brain, using a series of layers of artificial neurons. The first layer, called the input layer, is similar to a human’s eyes and ears: it takes in raw data (like an image of a traffic sign) broken into a numerical vector of features (like the RGB values of all the image’s pixels). Then that data flows through a series of hidden layers that find combinations of features that are useful to make a prediction (for example, this is a stop sign, not a yield sign), which is returned through the model’s final, or output layer.

In biological neural networks like the brain, a neuron only fires if enough of the neurons connected to it fire. A similar process occurs as data flows through each hidden layer of an artificial neural network:



  1. First each artificial neuron receives a vector of inputs from all the neurons in the layer behind it. These inputs are fed into a linear equation that multiplies each input by a weight reflecting its importance, then adds a constant called the bias that represents how easily that particular neuron should fire.
  2. Second, the neuron applies a nonlinear activation function to decide whether to “fire”, i.e. whether to pass that value along to the next layer of neurons. The simplest activation function just returns 1 if the answer from step 1 is positive and 0 if it’s negative; other popular choices include the sigmoid, tanh, and softmax functions. Without these activation functions, neural networks would basically just be a complicated form of linear regression, since each layer would be a linear combination of linear equations.
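The two steps above can be sketched in a few lines of plain Python. The inputs, weights, and bias here are made-up values for illustration:

```python
import math

def step(z):
    # Simplest activation: "fire" (return 1) if the weighted sum is positive, else 0
    return 1 if z > 0 else 0

def sigmoid(z):
    # A smooth alternative that squashes z into the range (0, 1)
    return 1 / (1 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    # Step 1: linear equation -- weight each input by its importance, then add the bias
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 2: nonlinear activation decides whether/how strongly to fire
    return activation(z)

# Example with hand-picked values: z = 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1 > 0, so the neuron fires
print(neuron([1.0, 2.0], [0.5, -0.25], 0.1, step))  # -> 1
```

Swapping `step` for `sigmoid` gives the same neuron a graded output instead of an all-or-nothing one.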

This process is called feedforward, because each layer feeds input to the layer in front of it until the output layer spits out a final prediction. But how does the neural network learn what values for each neuron’s weights and biases will make those predictions accurate? 
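Stacking layers of such neurons gives the full feedforward pass. Here is a minimal sketch with one hidden layer and one output neuron; all the weight and bias values are arbitrary placeholders, since a real network would learn them during training:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(inputs, weights, biases):
    # Each neuron receives the full vector of outputs from the previous layer
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def feedforward(x, layers):
    # Data flows layer by layer until the output layer produces a prediction
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

# 2 input features -> hidden layer of 2 neurons -> 1 output neuron
hidden = ([[0.5, -0.5], [0.3, 0.8]], [0.0, -0.1])
output = ([[1.0, -1.0]], [0.0])
prediction = feedforward([1.0, 0.5], [hidden, output])
```

With a sigmoid output, `prediction[0]` lands between 0 and 1 and can be read as a probability (say, "how likely is this a stop sign?").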

If you had to take a practice test for a subject you barely understood, you might start by guessing the answers, checking them against the answer key, then gradually changing your approach on the problems where you were most mistaken. Similarly, training a neural network starts by making an initial prediction based on randomly assigned weights, then fine-tuning their values using two functions:

  1. An error function that expresses the model’s overall error as a function of the weights at each neuron. The error gradients are the first-order derivatives of that error function with respect to each weight.
  2. An optimization function that minimizes the error function, i.e. finds the value for each weight that makes the model as accurate as possible. Stochastic gradient descent, an especially popular choice, uses a calculus technique called back-propagation to calculate error gradients for the nodes in the hidden layers.
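To make this concrete, here is a toy training loop for a single sigmoid neuron learning a made-up OR-style dataset with squared error and gradient descent. (For reproducibility this sketch starts the weights at zero rather than random values, and the chain-rule gradient is worked out by hand; real networks use back-propagation to do the same thing across many layers.)

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical dataset: inputs and target outputs (logical OR)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w, b = [0.0, 0.0], 0.0   # initial weights and bias (zeros here, random in practice)
lr = 1.0                  # learning rate: the size of each descent step

for epoch in range(2000):
    for x, target in data:
        z = w[0] * x[0] + w[1] * x[1] + b
        pred = sigmoid(z)
        # Error gradient for squared error E = (pred - target)^2 / 2, via the
        # chain rule (the core idea behind back-propagation):
        #   dE/dw_i = (pred - target) * sigmoid'(z) * x_i,
        # where sigmoid'(z) = pred * (1 - pred)
        delta = (pred - target) * pred * (1 - pred)
        # Nudge each weight downhill along its error gradient
        w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
        b -= lr * delta
```

After training, the neuron's rounded predictions match the targets: the loop has found weights that make the model as accurate as possible on this tiny dataset.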

Why are neural networks important?

Feedforward neural networks are still widely used, especially for pattern recognition tasks, and were the starting point for developing more specialized neural networks, including:

  • Recurrent neural networks (RNNs), which use loops between hidden layers to analyze sequential data like speech or text  
  • Convolutional neural networks (CNNs), where each hidden layer detects progressively higher-level features for image processing and object recognition 
  • Long short-term memory (LSTM) networks, a type of RNN that’s not as biased towards recent information

Stay tuned for more ML School lessons!

Surge AI is a data labeling workforce and platform that provides world-class data to top AI companies and researchers. We're built from the ground up to tackle the extraordinary challenges of natural language understanding — with an elite data labeling workforce, stunning quality, rich labeling tools, and modern APIs. Want to improve your model with context-sensitive data and domain-expert labelers? Schedule a demo with our team today!

Bradley Webb

Bradley runs Surge AI's Product and Growth teams. He previously led Integrity and Data Operations teams at Facebook, and graduated from Dartmouth.
