Build a Neural Network From Scratch in Less Than 5 Minutes

No TensorFlow. No PyTorch. Just you, NumPy, and 20-ish lines of code.

We’re going straight to the core: how a neural network actually learns — and we’ll teach it the classic XOR problem.

The Problem: XOR

We want this network to learn the XOR rule:

0 XOR 0 = 0  
0 XOR 1 = 1  
1 XOR 0 = 1  
1 XOR 1 = 0

If either A or B is 1, the output is 1… unless both are 1, in which case the output is 0. In other words, the output is 1 exactly when the two inputs differ.

It’s a simple pattern, but it isn’t linearly separable: no single straight line through the input space puts the two 1-outputs on one side and the two 0-outputs on the other, so a single-layer perceptron fails here. With one hidden layer, it works.
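
If you want to see that failure for yourself, here’s a quick standalone sketch (my addition, not part of the main code below; the names w and b are mine) that trains a single sigmoid neuron on the same data. Its decision boundary is a straight line, so no amount of training gets all four cases right.

import numpy as np

X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])

def sigmoid(z): return 1 / (1 + np.exp(-z))

np.random.seed(1)
w = np.random.randn(2, 1)
b = np.zeros((1, 1))

for _ in range(10000):
    a = sigmoid(X @ w + b)
    grad = (a - y) * a * (1 - a)                   # squared-error gradient at the output
    w -= 0.5 * X.T @ grad
    b -= 0.5 * np.sum(grad, axis=0, keepdims=True)

print((sigmoid(X @ w + b) > 0.5).astype(int).ravel())  # at most 3 of the 4 come out right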

Step 1: Setup and architecture

Let’s define our data and our tiny network.

import numpy as np

# XOR input and labels
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])

# Define network architecture
input_size = 2
hidden_size = 4
output_size = 1

We’ve got:

  • 2 input features (x1, x2)
  • 4 neurons in the hidden layer
  • 1 output (for binary classification)

Step 2: Initialize weights

Random weights, zero biases. The random weights break the symmetry between hidden neurons (if they all started out identical, they would stay identical), and that’s all this toy problem needs.

np.random.seed(1)
W1 = np.random.randn(input_size, hidden_size)
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size)
b2 = np.zeros((1, output_size))

We’ll learn these weights as we train.
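
If you want a quick sanity check before moving on, print the shapes and confirm they match the architecture we just described:

print(W1.shape, b1.shape)   # (2, 4) (1, 4)
print(W2.shape, b2.shape)   # (4, 1) (1, 1)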

Step 3: Activation functions

We’ll use sigmoid for both layers — good enough for this toy example.

def sigmoid(z): return 1 / (1 + np.exp(-z))
def sigmoid_deriv(a): return a * (1 - a)

sigmoid_deriv is the derivative, written in terms of the activation itself: if a = sigmoid(z), the slope is a * (1 - a). Backprop uses it to scale how much each weight gets nudged.
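
If you want to convince yourself the formula is right, here’s a quick finite-difference check (run it right after the definitions above; z and eps are just my throwaway names). The two numbers should agree to several decimal places.

z = 0.3
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # numerical slope at z
analytic = sigmoid_deriv(sigmoid(z))                          # note: pass the activation, not z
print(numeric, analytic)                                      # both come out around 0.244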

Step 4: Train it

Here’s the full training loop: forward pass, backpropagation, and gradient descent. The loss being minimized is (implicitly) the squared error, which is where the (A2 - y) term in the gradients comes from.

learning_rate = 0.1
epochs = 1000   # increase this (e.g. to 10000) if the predictions in Step 5 aren't all correct

for epoch in range(epochs):
    # Forward pass
    A1 = sigmoid(X @ W1 + b1)      # hidden layer
    A2 = sigmoid(A1 @ W2 + b2)     # output layer

    # Backpropagation (compute gradients)
    dA2 = (A2 - y) * sigmoid_deriv(A2)    # error term at the output layer
    dA1 = dA2 @ W2.T * sigmoid_deriv(A1)  # error term at the hidden layer

    # Gradient descent (update weights and biases)
    W2 -= learning_rate * A1.T @ dA2
    b2 -= learning_rate * np.sum(dA2, axis=0, keepdims=True)
    W1 -= learning_rate * X.T @ dA1
    b1 -= learning_rate * np.sum(dA1, axis=0, keepdims=True)

This is the heart of every neural net:

  • Forward pass: make a guess
  • Backward pass: see how wrong you were
  • Update: adjust weights to do better next time
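
One thing worth adding while you experiment: track the loss so you can watch it fall. Here’s a lightly modified version of the same loop (my tweak, not the original; run it in place of the loop above) that prints the mean squared error every 100 epochs. If it hasn’t flattened out by the end, train longer.

for epoch in range(epochs):
    # Forward pass
    A1 = sigmoid(X @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)

    # Track the squared-error loss the gradients are minimizing
    loss = np.mean((A2 - y) ** 2)
    if epoch % 100 == 0:
        print(f"epoch {epoch:4d}  loss {loss:.4f}")

    # Backpropagation
    dA2 = (A2 - y) * sigmoid_deriv(A2)
    dA1 = dA2 @ W2.T * sigmoid_deriv(A1)

    # Gradient descent
    W2 -= learning_rate * A1.T @ dA2
    b2 -= learning_rate * np.sum(dA2, axis=0, keepdims=True)
    W1 -= learning_rate * X.T @ dA1
    b1 -= learning_rate * np.sum(dA1, axis=0, keepdims=True)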

Step 5: Make predictions

Let’s see if it learned XOR.

preds = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5

print("Predictions:\n", preds.astype(int))

Output:

[[0]
 [1]
 [1]
 [0]]

It works! (If your run hasn’t quite converged with these settings, training for more epochs or nudging the learning rate up usually does the trick.)
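
If you’re curious how confident the network actually is, look at the raw sigmoid outputs before thresholding (probs is just my name for them). Values close to 0 or 1 mean it has really nailed the pattern; values hovering around 0.5 mean it needs more training.

probs = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print("Probabilities:\n", probs.round(3))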

Where to go from here

You just built a functioning neural network from scratch.

Here’s what you can try next:

  • Replace sigmoid with ReLU in the hidden layer (see the sketch after this list)
  • Add a second hidden layer
  • Swap out the loss function for cross-entropy
  • Wrap this into a class and build your own mini framework
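
As an example of the first suggestion, here’s a rough sketch of the lines that change when the hidden layer uses ReLU instead of sigmoid (relu, relu_deriv, and Z1 are my names, and treat this as a starting point: with a network this tiny you may need to tweak the learning rate or the weight scale, since ReLU units can get stuck at zero).

def relu(z): return np.maximum(0, z)
def relu_deriv(z): return (z > 0).astype(float)

# Forward pass: ReLU hidden layer, sigmoid output
Z1 = X @ W1 + b1
A1 = relu(Z1)
A2 = sigmoid(A1 @ W2 + b2)

# Backward pass: relu_deriv takes the pre-activation Z1, not A1
dA2 = (A2 - y) * sigmoid_deriv(A2)
dA1 = dA2 @ W2.T * relu_deriv(Z1)

The weight and bias updates stay exactly the same.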

Final words

Learning how this stuff works under the hood is powerful.

You’ll never look at TensorFlow or PyTorch the same again.

No magic.
Just math.
Just code.
