No TensorFlow. No PyTorch. Just you, NumPy, and 20-ish lines of code.
We’re going straight to the core: how a neural network actually learns — and we’ll teach it the classic XOR problem.
The Problem: XOR
We want this network to learn the XOR rule:
0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0
In words: if exactly one of A or B is 1, the output is 1; if they're equal (both 0 or both 1), the output is 0.
It's a simple pattern, but it isn't linearly separable: no single straight line through the input plane can put the two 1-cases on one side and the two 0-cases on the other. A single-layer perceptron fails here, but with one hidden layer it works.
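Don't take my word for it. Here's a quick, optional sanity check (my own addition, not part of the network we're about to build): it scans a coarse grid of weights and biases for a single linear threshold unit and reports whether any of them classifies all four XOR points correctly.
import numpy as np
from itertools import product

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Try every (w1, w2, b) on a coarse grid as a linear classifier:
# predict 1 if w1*x1 + w2*x2 + b > 0
grid = np.linspace(-2, 2, 21)
found = any(
    np.array_equal((X @ np.array([w1, w2]) + b > 0).astype(int), y)
    for w1, w2, b in product(grid, grid, grid)
)
print("Linear separator found:", found)  # False: no line separates XOR
Since XOR isn't linearly separable, the search comes up empty no matter how fine you make the grid.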
Step 1: Setup and architecture
Let’s define our data and our tiny network.
import numpy as np
# XOR input and labels
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])
# Define network architecture
input_size = 2
hidden_size = 4
output_size = 1
We’ve got:
- 2 input features (x1, x2)
- 4 neurons in the hidden layer
- 1 output (for binary classification)
Step 2: Initialize weights
Random weights, zero biases. Simple and effective.
np.random.seed(1)
W1 = np.random.randn(input_size, hidden_size)
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size)
b2 = np.zeros((1, output_size))
We’ll learn these weights as we train.
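If you want to double-check that the matrices line up with the architecture from Step 1, a quick shape printout (optional, assuming the variables above are in scope) does the trick:
# 2 inputs -> 4 hidden units -> 1 output
print(W1.shape, b1.shape)  # (2, 4) (1, 4)
print(W2.shape, b2.shape)  # (4, 1) (1, 1)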
Step 3: Activation functions
We’ll use sigmoid for both layers — good enough for this toy example.
def sigmoid(z): return 1 / (1 + np.exp(-z))
def sigmoid_deriv(a): return a * (1 - a)
sigmoid_deriv is the derivative of the sigmoid, written in terms of the activation a = sigmoid(z) rather than z itself. During backprop it tells us how much each neuron's output changes for a small change in its input, which is exactly what we need to scale the adjustments.
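That identity, sigmoid'(z) = a * (1 - a), is easy to verify numerically. Here's a tiny optional check (my own, not part of the tutorial's code) using a central finite difference:
z = 0.7  # arbitrary test point
a = sigmoid(z)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)  # central difference
analytic = sigmoid_deriv(a)  # a * (1 - a)
print(abs(numeric - analytic) < 1e-8)  # True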
Step 4: Train it
Here’s the full training loop. Forward pass, backprop, and gradient descent.
learning_rate = 0.1
epochs = 10000  # XOR with sigmoid and this learning rate needs plenty of iterations to converge
for epoch in range(epochs):
    # Forward pass
    A1 = sigmoid(X @ W1 + b1)   # hidden layer
    A2 = sigmoid(A1 @ W2 + b2)  # output layer
    # Backpropagation (compute gradients)
    dA2 = (A2 - y) * sigmoid_deriv(A2)
    dA1 = dA2 @ W2.T * sigmoid_deriv(A1)
    # Gradient descent (update weights and biases)
    W2 -= learning_rate * A1.T @ dA2
    b2 -= learning_rate * np.sum(dA2, axis=0, keepdims=True)
    W1 -= learning_rate * X.T @ dA1
    b1 -= learning_rate * np.sum(dA1, axis=0, keepdims=True)
This is the heart of every neural net:
- Forward pass: make a guess
- Backward pass: see how wrong you were
- Update: adjust weights to do better next time
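One thing the loop above never shows you is whether the network is actually improving. A small optional addition (mine, not part of the original code) is a mean-squared-error helper you can print from inside the loop:
def mse(pred, target):
    # Mean squared error over all training examples
    return np.mean((pred - target) ** 2)

# Inside the training loop, after computing A2, something like:
#     if epoch % 1000 == 0:
#         print(epoch, mse(A2, y))
Early on the loss sits near 0.25 (the network is guessing around 0.5 for everything), and it should drop steadily once training takes hold.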
Step 5: Make predictions
Let’s see if it learned XOR.
preds = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5
print("Predictions:\n", preds.astype(int))
Output:
[[0]
[1]
[1]
[0]]
It works!
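If you're curious what the network produces before the > 0.5 threshold, print the raw sigmoid outputs. The exact values depend on the seed and hyperparameters, but after successful training they should sit close to 0 and 1:
probs = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print("Raw outputs:\n", probs.round(3))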
Where to go from here
You just built a functioning neural network from scratch.
Here’s what you can try next:
- Replace sigmoid with ReLU in the hidden layer (a minimal sketch follows this list)
- Add a second hidden layer
- Swap out the loss function for cross-entropy
- Wrap this into a class and build your own mini framework
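For that first suggestion, here's a minimal sketch of the ReLU pair you'd need. One assumption to flag: unlike sigmoid_deriv, relu_deriv takes the pre-activation Z1 = X @ W1 + b1 rather than the activation, so you'd keep Z1 around in the forward pass and compute the hidden error as dA1 = dA2 @ W2.T * relu_deriv(Z1).
def relu(z):
    # Element-wise max(0, z)
    return np.maximum(0, z)

def relu_deriv(z):
    # Derivative of ReLU with respect to the pre-activation z: 1 where z > 0, else 0
    return (z > 0).astype(float)
Keep sigmoid on the output layer so the prediction still lands in (0, 1).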
Final words
Learning how this stuff works under the hood is powerful.
You’ll never look at TensorFlow or PyTorch the same again.
No magic.
Just math.
Just code.