No TensorFlow. No PyTorch. Just you, NumPy, and 20-ish lines of code.
We’re going straight to the core: how a neural network actually learns — and we’ll teach it the classic XOR problem.
The Problem: XOR
We want this network to learn the XOR rule:
0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0
In words: if exactly one of A or B is 1, the output is 1; if they're equal (both 0 or both 1), the output is 0.
It's a simple pattern, but it isn't linearly separable: no single straight line through the input plane can put the two 1-cases on one side and the two 0-cases on the other. A single-layer perceptron fails here, but with one hidden layer it works.
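Don't take my word for it. Here's a quick, optional sanity check (my own addition, not part of the network we're about to build): it scans a coarse grid of weights and biases for a single linear threshold unit and reports whether any of them classifies all four XOR points correctly.
import numpy as np
from itertools import product

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Try every (w1, w2, b) on a coarse grid as a linear classifier:
# predict 1 if w1*x1 + w2*x2 + b > 0
grid = np.linspace(-2, 2, 21)
found = any(
    np.array_equal((X @ np.array([w1, w2]) + b > 0).astype(int), y)
    for w1, w2, b in product(grid, grid, grid)
)
print("Linear separator found:", found)  # False: no line separates XOR
Since XOR isn't linearly separable, the search comes up empty no matter how fine you make the grid.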
Step 1: Setup and architecture
Let’s define our data and our tiny network.
import numpy as np
# XOR input and labels
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])
# Define network architecture
input_size = 2
hidden_size = 4
output_size = 1
We’ve got:
- 2 input features (x1, x2)
- 4 neurons in the hidden layer
- 1 output (for binary classification)
Step 2: Initialize weights
Random weights, zero biases. Simple and effective.
np.random.seed(1)
W1 = np.random.randn(input_size, hidden_size)
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size)
b2 = np.zeros((1, output_size))
We’ll learn these weights as we train.
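If you want to double-check that the matrices line up with the architecture from Step 1, a quick shape printout (optional, assuming the variables above are in scope) does the trick:
# 2 inputs -> 4 hidden units -> 1 output
print(W1.shape, b1.shape)  # (2, 4) (1, 4)
print(W2.shape, b2.shape)  # (4, 1) (1, 1)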
Step 3: Activation functions
We’ll use sigmoid for both layers — good enough for this toy example.
def sigmoid(z): return 1 / (1 + np.exp(-z))
def sigmoid_deriv(a): return a * (1 - a)
sigmoid_deriv is the derivative of the sigmoid, written in terms of the activation a = sigmoid(z) rather than z itself. During backprop it tells us how much each neuron's output changes for a small change in its input, which is exactly what we need to scale the adjustments.
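That identity, sigmoid'(z) = a * (1 - a), is easy to verify numerically. Here's a tiny optional check (my own, not part of the tutorial's code) using a central finite difference:
z = 0.7  # arbitrary test point
a = sigmoid(z)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)  # central difference
analytic = sigmoid_deriv(a)  # a * (1 - a)
print(abs(numeric - analytic) < 1e-8)  # True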
Step 4: Train it
Here’s the full training loop. Forward pass, backprop, and gradient descent.
learning_rate = 0.1
epochs = 10000  # XOR with sigmoid and this learning rate needs plenty of iterations to converge
for epoch in range(epochs):
    # Forward pass
    A1 = sigmoid(X @ W1 + b1)   # hidden layer
    A2 = sigmoid(A1 @ W2 + b2)  # output layer
    # Backpropagation (compute gradients)
    dA2 = (A2 - y) * sigmoid_deriv(A2)
    dA1 = dA2 @ W2.T * sigmoid_deriv(A1)
    # Gradient descent (update weights and biases)
    W2 -= learning_rate * A1.T @ dA2
    b2 -= learning_rate * np.sum(dA2, axis=0, keepdims=True)
    W1 -= learning_rate * X.T @ dA1
    b1 -= learning_rate * np.sum(dA1, axis=0, keepdims=True)
This is the heart of every neural net:
- Forward pass: make a guess
- Backward pass: see how wrong you were
- Update: adjust weights to do better next time
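One thing the loop above never shows you is whether the network is actually improving. A small optional addition (mine, not part of the original code) is a mean-squared-error helper you can print from inside the loop:
def mse(pred, target):
    # Mean squared error over all training examples
    return np.mean((pred - target) ** 2)

# Inside the training loop, after computing A2, something like:
#     if epoch % 1000 == 0:
#         print(epoch, mse(A2, y))
Early on the loss sits near 0.25 (the network is guessing around 0.5 for everything), and it should drop steadily once training takes hold.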
Step 5: Make predictions
Let’s see if it learned XOR.
preds = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5
print("Predictions:\n", preds.astype(int))
Output:
[[0]
[1]
[1]
[0]]
It works!
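If you're curious what the network produces before the > 0.5 threshold, print the raw sigmoid outputs. The exact values depend on the seed and hyperparameters, but after successful training they should sit close to 0 and 1:
probs = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print("Raw outputs:\n", probs.round(3))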
Where to go from here
You just built a functioning neural network from scratch.
Here’s what you can try next:
- Replace sigmoid with ReLU in the hidden layer (a minimal sketch follows this list)
- Add a second hidden layer
- Swap out the loss function for cross-entropy
- Wrap this into a class and build your own mini framework
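For that first suggestion, here's a minimal sketch of the ReLU pair you'd need. One assumption to flag: unlike sigmoid_deriv, relu_deriv takes the pre-activation Z1 = X @ W1 + b1 rather than the activation, so you'd keep Z1 around in the forward pass and compute the hidden error as dA1 = dA2 @ W2.T * relu_deriv(Z1).
def relu(z):
    # Element-wise max(0, z)
    return np.maximum(0, z)

def relu_deriv(z):
    # Derivative of ReLU with respect to the pre-activation z: 1 where z > 0, else 0
    return (z > 0).astype(float)
Keep sigmoid on the output layer so the prediction still lands in (0, 1).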
Final words
Learning how this stuff works under the hood is powerful.
You’ll never look at TensorFlow or PyTorch the same again.
No magic.
Just math.
Just code.