End to end · Part 9 · 65 min · advanced

Mastery

Assemble the complete neural network class, data pipeline, training loop, evaluation, and dashboard.


Neural Network Fundamentals

Part 9: Full Implementation - Mastery

The Brain's Decision Committee - Chapter 9

The Complete Journey

We've come a long way! From understanding matrices to building neurons, from single perceptrons to multi-layer networks, from training basics to handling deep learning challenges - now it's time to bring everything together.

"The complete, trained committee works in harmony. All the lessons learned, all the challenges overcome, unified into one elegant solution."

What You'll Learn in Part 9

By the end of this notebook, you will have:

A Complete Neural Network Class - All concepts unified in clean, documented code
A Full Data Pipeline - Train/validation/test splits with proper handling
A Robust Training Pipeline - With validation monitoring and early stopping
Complete Evaluation - All metrics, confusion matrix, and saliency visualization
Interactive Dashboard - Experiment with hyperparameters in real-time
The Final V/H Classifier - Our mission accomplished!

Prerequisites

This is the culmination notebook - you should have completed:

Parts 0-1: Matrices and fundamentals
Part 2: Single neurons
Part 3: Activation functions
Part 4: The Perceptron
Part 5: Training
Part 6: Evaluation
Part 7: Hidden layers
Part 8: Deep learning challenges

Concepts We're Unifying

Part	Concept	How We'll Use It
1	Matrices, dot product	Data representation, weight operations
2	Neuron anatomy	Building blocks of our network
3	Activation functions	ReLU for hidden, sigmoid for output
4	Forward pass	Making predictions
5	Loss, gradients, backprop	Learning from mistakes
6	Metrics, saliency	Evaluating and understanding
7	Hidden layers	Multiple specialists
8	Overfitting prevention	Early stopping, proper sizing

Setup: Import Dependencies

# =============================================================================
# PART 9: FULL IMPLEMENTATION - SETUP
# =============================================================================

import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, clear_output

# Try to import ipywidgets for interactive features
try:
    import ipywidgets as widgets
    WIDGETS_AVAILABLE = True
except ImportError:
    WIDGETS_AVAILABLE = False
    print("Note: ipywidgets not installed. Interactive features will be limited.")

# Set up matplotlib style: use the first style this matplotlib version knows
style_options = ['seaborn-v0_8-whitegrid', 'seaborn-whitegrid', 'ggplot', 'default']
for style in style_options:
    try:
        plt.style.use(style)
        break
    except OSError:
        continue

plt.rcParams['figure.figsize'] = [10, 6]
plt.rcParams['font.size'] = 12

print("="*70)
print("PART 9: FULL IMPLEMENTATION")
print("The Complete V/H Line Classifier")
print("="*70)

9.1 The Complete Neural Network Class

This is the unified implementation incorporating everything we've learned:

Feature	Part Learned	Implementation
Activation functions	Part 3	ReLU for hidden, Sigmoid for output
Forward propagation	Parts 4, 7	Matrix operations through layers
Loss function	Part 5	Binary Cross-Entropy
Backpropagation	Parts 5, 7	Chain rule through all layers
Validation monitoring	Part 8	Track train/val metrics
Early stopping	Part 8	Stop when val loss fails to improve for a set number of epochs (the patience)

Why This Architecture?

Input (9) → Hidden (8, ReLU) → Output (1, Sigmoid)

Layer	Size	Activation	Why?
Input	9	None	One neuron per pixel (3×3 = 9)
Hidden	8	ReLU	Enough specialists without overfitting; ReLU helps avoid vanishing gradients
Output	1	Sigmoid	Binary classification needs probability in (0,1)
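
To make the layer sizes concrete, the sketch below counts the trainable parameters for the 9 → 8 → 1 layout. It assumes the same weight shapes as the NeuralNetwork class later in this notebook (W1 is (n_hidden, n_inputs), W2 is (n_outputs, n_hidden)). The helper name count_parameters is hypothetical and exists only for this illustration.

```python
def count_parameters(n_inputs, n_hidden, n_outputs=1):
    """Count weights and biases for Input -> Hidden -> Output."""
    hidden = n_hidden * n_inputs + n_hidden      # W1 entries plus b1 entries
    output = n_outputs * n_hidden + n_outputs    # W2 entries plus b2 entries
    return hidden, output, hidden + output

hidden, output, total = count_parameters(9, 8, 1)
print(f"Hidden layer: {hidden}, output layer: {output}, total: {total}")
# Hidden layer: 80, output layer: 9, total: 89
```

With 180 training samples and 89 parameters, the network is small relative to the data, which is one reason 8 hidden neurons is a reasonable size here.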

Why Two Different Initializations?

We use different initialization strategies for different activations:

Initialization	Formula	Used For	Why?
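He	$w \sim \mathcal{N}(0,\ 2 / n_{\text{in}})$	ReLU layers	ReLU zeroes out roughly half the pre-activations (negative z), so the weight variance is doubled to compensate
Xavier (fan-in form)	$w \sim \mathcal{N}(0,\ 1 / n_{\text{in}})$	Sigmoid/Tanh	These do not zero out half their inputs, so the 2× correction is not needed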

Using the wrong initialization can cause:

Too small: Signals shrink through layers (vanishing)
Too large: Signals explode through layers (exploding)
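
The effect of the two scalings can be checked numerically. The following is a minimal sketch, not the notebook's own code: the helper init_weights, the layer sizes, and the use of np.random.default_rng are assumptions made for illustration. It pushes unit-variance inputs through one ReLU layer and reports the mean squared activation under each scheme.

```python
import numpy as np

def init_weights(n_out, n_in, kind, rng):
    """Draw a weight matrix with He or fan-in Xavier scaling (illustrative helper)."""
    scale = {"he": 2.0, "xavier": 1.0}[kind]
    return rng.standard_normal((n_out, n_in)) * np.sqrt(scale / n_in)

rng = np.random.default_rng(0)
n_in = 512
x = rng.standard_normal((1000, n_in))     # unit-variance inputs

for kind in ("he", "xavier"):
    W = init_weights(256, n_in, kind, rng)
    z = x @ W.T                           # pre-activations
    a = np.maximum(0, z)                  # ReLU
    print(f"{kind:6s} weight std={W.std():.4f}  mean(a^2)={np.mean(a**2):.3f}")
```

Under these assumptions the mean squared activation should come out near 1.0 with He scaling and near 0.5 with Xavier scaling: after a ReLU, only the He variant keeps the signal at roughly the same magnitude as the input.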


# =============================================================================
# THE COMPLETE NEURAL NETWORK CLASS
# =============================================================================

class NeuralNetwork:
    """
    Complete Neural Network implementation for binary classification.
    
    This class unifies all concepts from Parts 1-8:
    - Matrix operations (Part 1)
    - Neuron anatomy (Part 2)
    - Activation functions (Part 3)
    - Forward propagation (Part 4)
    - Training with backprop (Part 5)
    - Evaluation metrics (Part 6)
    - Hidden layers (Part 7)
    - Overfitting prevention (Part 8)
    
    Architecture: Input → Hidden (ReLU) → Output (Sigmoid)
    """
    
    # =========================================================================
    # ACTIVATION FUNCTIONS (Part 3)
    # =========================================================================
    
    @staticmethod
    def sigmoid(z):
        """Sigmoid: maps to (0, 1) - used for output layer (Part 3.3)"""
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))
    
    @staticmethod
    def sigmoid_derivative(z):
        """Derivative of sigmoid: σ(z) * (1 - σ(z)) (Part 3.3.1)"""
        s = NeuralNetwork.sigmoid(z)
        return s * (1 - s)
    
    @staticmethod
    def relu(z):
        """ReLU: max(0, z) - used for hidden layers (Part 3.5)"""
        return np.maximum(0, z)
    
    @staticmethod
    def relu_derivative(z):
        """Derivative of ReLU: 1 if z > 0, else 0 (Part 3.5)"""
        return (z > 0).astype(float)
    
    # =========================================================================
    # INITIALIZATION (Part 7 - Xavier/He initialization)
    # =========================================================================
    
    def __init__(self, n_inputs, n_hidden, n_outputs=1, seed=None):
        """
        Initialize the neural network.
        
        Parameters:
            n_inputs: Number of input features (9 for 3x3 images)
            n_hidden: Number of hidden neurons (the "specialists")
            n_outputs: Number of outputs (1 for binary classification)
            seed: Random seed for reproducibility
        """
        if seed is not None:
            np.random.seed(seed)
        
        self.n_inputs = n_inputs
        self.n_hidden = n_hidden
        self.n_outputs = n_outputs
        
        # He initialization for ReLU layers (Part 8 - proper initialization)
        self.W1 = np.random.randn(n_hidden, n_inputs) * np.sqrt(2.0 / n_inputs)
        self.b1 = np.zeros(n_hidden)
        
        # Xavier initialization for sigmoid output
        self.W2 = np.random.randn(n_outputs, n_hidden) * np.sqrt(1.0 / n_hidden)
        self.b2 = np.zeros(n_outputs)
        
        # Cache for forward pass (needed for backprop)
        self.cache = {}
        
        # Training history
        self.train_loss_history = []
        self.val_loss_history = []
        self.train_acc_history = []
        self.val_acc_history = []
        
        # Best model weights (for early stopping)
        self.best_weights = None
        self.best_val_loss = float('inf')
        self.best_epoch = 0
    
    # =========================================================================
    # FORWARD PROPAGATION (Parts 4, 7)
    # =========================================================================
    
    def forward(self, X):
        """
        Forward pass: Input → Hidden (ReLU) → Output (Sigmoid)
        
        The "Committee Meeting" - each specialist examines the evidence,
        then the final decision maker combines their opinions.
        """
        # Ensure X is 2D
        X = np.atleast_2d(X)
        
        # Layer 1: Input → Hidden (with ReLU - Part 3.5)
        self.cache['X'] = X
        self.cache['Z1'] = np.dot(X, self.W1.T) + self.b1  # (batch, n_hidden)
        self.cache['A1'] = self.relu(self.cache['Z1'])      # ReLU activation
        
        # Layer 2: Hidden → Output (with Sigmoid - Part 3.3)
        self.cache['Z2'] = np.dot(self.cache['A1'], self.W2.T) + self.b2  # (batch, n_outputs)
        self.cache['A2'] = self.sigmoid(self.cache['Z2'])                  # Sigmoid for probability
        
        return self.cache['A2']
    
    def predict(self, X):
        """Make binary predictions (0 or 1)."""
        probs = self.forward(X)
        return (probs >= 0.5).astype(int).flatten()
    
    # =========================================================================
    # LOSS FUNCTION (Part 5.3 - Binary Cross-Entropy)
    # =========================================================================
    
    def compute_loss(self, y_true, y_pred):
        """
        Binary Cross-Entropy loss (Part 5.3)
        
        Measures "surprise" - how unexpected the predictions are.
        """
        epsilon = 1e-15  # Prevent log(0)
        y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
        y_true = y_true.reshape(-1, 1)
        loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
        return loss
    
    # =========================================================================
    # BACKPROPAGATION (Parts 5.6, 5.7, 7.4)
    # =========================================================================
    
    def backward(self, y_true, learning_rate):
        """
        Backpropagation: Compute gradients and update weights.
        
        The "Blame Assignment" - tracing errors back through the committee.
        """
        m = len(y_true)
        y_true = y_true.reshape(-1, 1)
        
        # Output layer gradients (Part 5.6)
        dZ2 = self.cache['A2'] - y_true  # (batch, n_outputs)
        dW2 = np.dot(dZ2.T, self.cache['A1']) / m
        db2 = np.mean(dZ2, axis=0)
        
        # Hidden layer gradients (Part 7.4 - chain rule)
        dA1 = np.dot(dZ2, self.W2)
        dZ1 = dA1 * self.relu_derivative(self.cache['Z1'])
        dW1 = np.dot(dZ1.T, self.cache['X']) / m
        db1 = np.mean(dZ1, axis=0)
        
        # Update weights (Gradient Descent - Part 5.4)
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
    
    # =========================================================================
    # EVALUATION (Part 6)
    # =========================================================================
    
    def evaluate(self, X, y):
        """Compute loss and accuracy on a dataset."""
        y_pred = self.forward(X)
        loss = self.compute_loss(y, y_pred)
        predictions = (y_pred >= 0.5).astype(int).flatten()
        accuracy = np.mean(predictions == y)
        return loss, accuracy
    
    def confusion_matrix(self, X, y):
        """Compute confusion matrix (Part 6.3)."""
        predictions = self.predict(X)
        TP = np.sum((predictions == 1) & (y == 1))
        TN = np.sum((predictions == 0) & (y == 0))
        FP = np.sum((predictions == 1) & (y == 0))
        FN = np.sum((predictions == 0) & (y == 1))
        return {'TP': TP, 'TN': TN, 'FP': FP, 'FN': FN}
    
    # =========================================================================
    # TRAINING WITH EARLY STOPPING (Parts 5.8, 8.2)
    # =========================================================================
    
    def train(self, X_train, y_train, X_val=None, y_val=None, 
              learning_rate=0.1, epochs=100, early_stopping_patience=10,
              verbose=True):
        """
        Train the neural network with optional early stopping.
        
        Parameters:
            X_train, y_train: Training data
            X_val, y_val: Validation data (for early stopping)
            learning_rate: Step size for gradient descent (Part 5.5)
            epochs: Maximum training iterations
            early_stopping_patience: Stop if val loss doesn't improve (Part 8.2)
            verbose: Print progress
        """
        self.train_loss_history = []
        self.val_loss_history = []
        self.train_acc_history = []
        self.val_acc_history = []
        
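        # Reset early-stopping state so a repeated train() call starts fresh
        self.best_weights = None
        self.best_val_loss = float('inf')
        self.best_epoch = 0
        patience_counter = 0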
        
        for epoch in range(epochs):
            # Forward pass
            self.forward(X_train)
            
            # Backward pass (learning)
            self.backward(y_train, learning_rate)
            
            # Evaluate training
            train_loss, train_acc = self.evaluate(X_train, y_train)
            self.train_loss_history.append(train_loss)
            self.train_acc_history.append(train_acc)
            
            # Evaluate validation (if provided)
            if X_val is not None:
                val_loss, val_acc = self.evaluate(X_val, y_val)
                self.val_loss_history.append(val_loss)
                self.val_acc_history.append(val_acc)
                
                # Early stopping check (Part 8.2)
                if val_loss < self.best_val_loss:
                    self.best_val_loss = val_loss
                    self.best_epoch = epoch
                    self.best_weights = {
                        'W1': self.W1.copy(), 'b1': self.b1.copy(),
                        'W2': self.W2.copy(), 'b2': self.b2.copy()
                    }
                    patience_counter = 0
                else:
                    patience_counter += 1
                
                if patience_counter >= early_stopping_patience:
                    if verbose:
                        print(f"\n  Early stopping at epoch {epoch+1}!")
                        print(f"  Best epoch was {self.best_epoch+1} with val_loss={self.best_val_loss:.4f}")
                    self._restore_best_weights()
                    break
            
            # Progress output
            if verbose and (epoch + 1) % 20 == 0:
                msg = f"  Epoch {epoch+1:3d}: Train Loss={train_loss:.4f}, Train Acc={train_acc*100:.1f}%"
                if X_val is not None:
                    msg += f", Val Loss={val_loss:.4f}, Val Acc={val_acc*100:.1f}%"
                print(msg)
        
        if verbose:
            final_acc = self.train_acc_history[-1]
            print(f"\nTraining complete! Final train accuracy: {final_acc*100:.1f}%")
            if X_val is not None:
                print(f"Best validation loss: {self.best_val_loss:.4f} at epoch {self.best_epoch+1}")
        
        return self
    
    def _restore_best_weights(self):
        """Restore weights from best epoch."""
        if self.best_weights is not None:
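            # Copy so later in-place updates (e.g. self.W1 -= ...) cannot
            # modify the saved best weights
            self.W1 = self.best_weights['W1'].copy()
            self.b1 = self.best_weights['b1'].copy()
            self.W2 = self.best_weights['W2'].copy()
            self.b2 = self.best_weights['b2'].copy()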

print("NeuralNetwork class defined!")
print("This combines ALL concepts from Parts 1-8.")

Understanding Key Implementation Details

Why do we use a cache dictionary?

During backpropagation, we need values from the forward pass:

X - the input, needed to compute gradients for W1
Z1 - pre-activation of hidden layer, needed for ReLU derivative
A1 - hidden activations, needed to compute gradients for W2
Z2, A2 - output layer values for computing output gradients

Without caching, we'd have to recompute the forward pass during the backward pass (wasteful!).
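
One way to convince yourself that the cached values are used correctly is a gradient check: compare the analytic gradient for W1 against a finite-difference estimate. The sketch below is self-contained and re-implements the same forward and backward formulas as the class on random data; the function names, the random inputs, and the step size eps are assumptions chosen for illustration.

```python
import numpy as np

def forward_loss(params, X, y):
    """Forward pass and mean binary cross-entropy for a one-hidden-layer net."""
    W1, b1, W2, b2 = params
    Z1 = X @ W1.T + b1
    A1 = np.maximum(0, Z1)                 # ReLU
    Z2 = A1 @ W2.T + b2
    A2 = 1 / (1 + np.exp(-Z2))             # sigmoid
    y = y.reshape(-1, 1)
    loss = -np.mean(y * np.log(A2) + (1 - y) * np.log(1 - A2))
    return loss, (Z1, A1, A2)

def analytic_grad_W1(params, X, y):
    """Backprop gradient for W1, built from the cached Z1 and A2."""
    W1, b1, W2, b2 = params
    _, (Z1, A1, A2) = forward_loss(params, X, y)
    dZ2 = A2 - y.reshape(-1, 1)            # sigmoid + BCE simplification
    dZ1 = (dZ2 @ W2) * (Z1 > 0)            # chain rule through ReLU
    return dZ1.T @ X / len(y)

def numeric_grad_W1(params, X, y, eps=1e-6):
    """Central finite differences on every entry of W1."""
    W1, b1, W2, b2 = params
    grad = np.zeros_like(W1)
    for idx in np.ndindex(*W1.shape):
        Wp, Wm = W1.copy(), W1.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        loss_plus, _ = forward_loss((Wp, b1, W2, b2), X, y)
        loss_minus, _ = forward_loss((Wm, b1, W2, b2), X, y)
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.random((20, 9))                    # 20 random "images"
y = rng.integers(0, 2, size=20)            # random 0/1 labels
params = (rng.standard_normal((8, 9)) * np.sqrt(2 / 9), np.zeros(8),
          rng.standard_normal((1, 8)) * np.sqrt(1 / 8), np.zeros(1))

diff = np.max(np.abs(analytic_grad_W1(params, X, y) - numeric_grad_W1(params, X, y)))
print(f"max |analytic - numeric| = {diff:.2e}")
```

A very small difference suggests the backward formulas match the forward pass. The check can be unreliable when a pre-activation sits almost exactly at the ReLU kink, which is unlikely with random data like this.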

Why save best_weights separately?

Early stopping works by:

Training for many epochs
Saving weights whenever validation loss improves
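Restoring the best weights once validation loss has gone early_stopping_patience epochs without improving

If we only kept the current weights, the best model would be overwritten as training continues past the optimal point. Note that train restores the saved weights only when early stopping triggers; if training runs for the full epochs budget, the final weights are kept.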
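
The patience logic can be replayed on a plain list of validation losses, without any network. This is a minimal sketch mirroring the counter used in train; the function name early_stopping_trace and the loss values are made up for illustration.

```python
def early_stopping_trace(val_losses, patience):
    """Replay the patience counter on per-epoch validation losses.

    Returns (best_epoch, stop_epoch), both 0-indexed. stop_epoch is None
    when patience never runs out.
    """
    best_loss = float("inf")
    best_epoch = 0
    counter = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter
            best_loss, best_epoch, counter = loss, epoch, 0
        else:
            counter += 1
        if counter >= patience:
            return best_epoch, epoch
    return best_epoch, None

# Hypothetical validation losses with a temporary bump at epochs 4-5
losses = [0.70, 0.55, 0.48, 0.45, 0.46, 0.47, 0.44, 0.45, 0.46, 0.47]
print(early_stopping_trace(losses, patience=3))   # (6, 9)
print(early_stopping_trace(losses, patience=2))   # (3, 5)
```

With a patience of 2 the run stops during the temporary bump and never sees the better loss at epoch 6, which is why the patience value is itself a hyperparameter worth tuning.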

Why use np.atleast_2d(X)?

This ensures our math works for both:

Single sample: shape (9,) → (1, 9)
Batch of samples: shape (batch, 9) → unchanged

The rest of the class assumes a leading batch dimension, so a single sample is promoted to a batch of one and both cases run through the same code.
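
A quick shape check illustrates the point. The arrays below are placeholders (all zeros and ones), used only to show the shapes; W1 follows the (n_hidden, n_inputs) layout of the class.

```python
import numpy as np

single = np.zeros(9)          # one flattened 3x3 image
batch = np.zeros((5, 9))      # five flattened images
W1 = np.ones((8, 9))          # (n_hidden, n_inputs), as in the class

for name, X in (("single", single), ("batch", batch)):
    X2 = np.atleast_2d(X)
    Z1 = X2 @ W1.T            # always (batch, n_hidden)
    print(f"{name}: {X.shape} -> {X2.shape}, Z1 shape {Z1.shape}")
# single: (9,) -> (1, 9), Z1 shape (1, 8)
# batch: (5, 9) -> (5, 9), Z1 shape (5, 8)
```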

9.2 The Complete Data Pipeline

A proper data pipeline includes:

Step	Purpose	Part Referenced
Data Generation	Create V/H line images	Part 4
Train/Val/Test Split	Separate data for different purposes	Parts 6, 8
Shuffling	Prevent order-based patterns	Part 5

Why Three Splits?

Split	Purpose	Used For
Training (60%)	Learn patterns	Backpropagation
Validation (20%)	Tune hyperparameters	Early stopping, model selection
Test (20%)	Final evaluation	Report true performance

Key Rule: NEVER use test data during training or tuning!

Why These Specific Percentages?

60/20/20 is a common starting point, but it depends on your data:

Dataset Size	Recommended Split	Reasoning
Small (<500)	60/20/20	Need enough validation/test for reliable estimates
Medium (500-10K)	70/15/15	Can afford more training data
Large (>10K)	80/10/10	Even 10% gives over a thousand test samples

For our 300 samples:

180 training (60%) → Enough to learn V/H patterns
60 validation (20%) → Enough to detect overfitting
60 test (20%) → Enough for reliable accuracy estimate
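
To put a rough number on "reliable", the sketch below reproduces the split arithmetic and then estimates the binomial standard error of an accuracy measured on the test split. The helper names are hypothetical, and the 0.95 accuracy is an assumed value chosen only to illustrate the calculation, not a result from this notebook.

```python
import math

def split_sizes(n_total, train_frac=0.6, val_frac=0.2):
    """Return (n_train, n_val, n_test); the test split takes the remainder."""
    n_train = int(n_total * train_frac)
    n_val = int(n_total * val_frac)
    return n_train, n_val, n_total - n_train - n_val

def accuracy_standard_error(accuracy, n_samples):
    """Binomial standard error of an accuracy measured on n_samples."""
    return math.sqrt(accuracy * (1 - accuracy) / n_samples)

n_train, n_val, n_test = split_sizes(300)
print(f"train={n_train}, val={n_val}, test={n_test}")

# 0.95 is a hypothetical accuracy, used only to show the arithmetic
se = accuracy_standard_error(0.95, n_test)
print(f"standard error on {n_test} test samples: {se:.3f}")
```

Under that assumption, an accuracy measured on 60 samples carries an uncertainty of roughly 3 percentage points (one standard error), so small differences between models on this test set should be read with caution.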

Why Shuffle the Data?

Without shuffling, disaster can strike!

Imagine our data is generated in order:

Samples 1-150:   All VERTICAL
Samples 151-300: All HORIZONTAL

If we split 60/20/20 without shuffling:

Training (1-180): 150 vertical, 30 horizontal (imbalanced!)
Validation (181-240): 0 vertical, 60 horizontal (all one class!)
Test (241-300): 0 vertical, 60 horizontal (all one class!)

The model would learn wrong patterns and evaluation would be meaningless!

Shuffling ensures each split has a representative mix of both classes.
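
The ordered-data scenario above can be reproduced with labels alone. This is a small sketch under stated assumptions: class_counts is a hypothetical helper, and it shuffles with np.random.default_rng rather than the np.random.seed / np.random.permutation calls used in the notebook's pipeline.

```python
import numpy as np

def class_counts(y, n_train=180, n_val=60):
    """Count (vertical, horizontal) labels in each 60/20/20 slice of y."""
    parts = {
        "train": y[:n_train],
        "val": y[n_train:n_train + n_val],
        "test": y[n_train + n_val:],
    }
    return {name: (int(p.sum()), int(len(p) - p.sum())) for name, p in parts.items()}

# 150 vertical (1) followed by 150 horizontal (0), as in the ordered example
y_ordered = np.array([1] * 150 + [0] * 150)

rng = np.random.default_rng(42)
y_shuffled = y_ordered[rng.permutation(len(y_ordered))]

print("ordered: ", class_counts(y_ordered))
print("shuffled:", class_counts(y_shuffled))
```

The ordered labels give 150/30 in training and single-class validation and test sets, matching the counts above, while the shuffled labels spread both classes across all three splits.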


# =============================================================================
# THE COMPLETE DATA PIPELINE
# =============================================================================

def generate_line_dataset(n_samples=100, noise_level=0.0, seed=None):
    """
    Generate vertical (1) and horizontal (0) line images.
    
    This is the dataset we've been working with throughout the series.
    Our "mission" from Part 0: classify these images correctly!
    
    Parameters:
        n_samples: Total number of images to generate
        noise_level: Std. dev. of Gaussian pixel noise (0.0 = clean, 0.3 = noisy)
        seed: Random seed for reproducibility
    
    Returns:
        X: Array of flattened 3x3 images, shape (n_samples, 9)
        y: Labels (1=vertical, 0=horizontal), shape (n_samples,)
    """
    if seed is not None:
        np.random.seed(seed)
    
    X, y = [], []
    
    for i in range(n_samples):
        image = np.zeros((3, 3))
        
        if i < n_samples // 2:
            # Vertical line - can be in ANY column
            col = np.random.randint(0, 3)
            image[:, col] = 1
            label = 1
        else:
            # Horizontal line - can be in ANY row
            row = np.random.randint(0, 3)
            image[row, :] = 1
            label = 0
        
        # Add noise if specified
        if noise_level > 0:
            image = np.clip(image + np.random.randn(3, 3) * noise_level, 0, 1)
        
        X.append(image.flatten())  # Flatten to 1D (Part 2)
        y.append(label)
    
    X, y = np.array(X), np.array(y)
    
    # Shuffle (Part 5)
    shuffle_idx = np.random.permutation(n_samples)
    return X[shuffle_idx], y[shuffle_idx]

def create_train_val_test_split(n_total=300, noise_level=0.1, seed=42):
    """
    Create proper train/validation/test splits.
    
    Split ratios: 60% train, 20% validation, 20% test
    """
    np.random.seed(seed)
    
    # Generate all data
    X, y = generate_line_dataset(n_total, noise_level=noise_level, seed=seed)
    
    # Calculate split indices
    n_train = int(n_total * 0.6)
    n_val = int(n_total * 0.2)
    
    # Split
    X_train, y_train = X[:n_train], y[:n_train]
    X_val, y_val = X[n_train:n_train+n_val], y[n_train:n_train+n_val]
    X_test, y_test = X[n_train+n_val:], y[n_train+n_val:]
    
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# Create our datasets
print("="*70)
print("CREATING THE COMPLETE DATASET")
print("="*70)

(X_train, y_train), (X_val, y_val), (X_test, y_test) = create_train_val_test_split(
    n_total=300, noise_level=0.15, seed=42
)

print(f"\nDataset created with 15% noise:")
print(f"  Training:   {len(X_train)} samples ({sum(y_train)} vertical, {len(y_train)-sum(y_train)} horizontal)")
print(f"  Validation: {len(X_val)} samples ({sum(y_val)} vertical, {len(y_val)-sum(y_val)} horizontal)")
print(f"  Test:       {len(X_test)} samples ({sum(y_test)} vertical, {len(y_test)-sum(y_test)} horizontal)")
print(f"\nTotal: {len(X_train) + len(X_val) + len(X_test)} samples")

cell 009


# =============================================================================
# VISUALIZE SAMPLE IMAGES FROM OUR DATASET
# =============================================================================

fig, axes = plt.subplots(2, 5, figsize=(12, 5))

# Show 5 vertical and 5 horizontal examples
v_indices = np.where(y_train == 1)[0][:5]
h_indices = np.where(y_train == 0)[0][:5]

for i, idx in enumerate(v_indices):
    ax = axes[0, i]
    ax.imshow(X_train[idx].reshape(3, 3), cmap='Blues', vmin=0, vmax=1)
    ax.set_title('VERTICAL', fontsize=10)
    ax.axis('off')

for i, idx in enumerate(h_indices):
    ax = axes[1, i]
    ax.imshow(X_train[idx].reshape(3, 3), cmap='Oranges', vmin=0, vmax=1)
    ax.set_title('HORIZONTAL', fontsize=10)
    ax.axis('off')

plt.suptitle('Our Mission: Classify These 3x3 Images\n(With 15% Noise)', 
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("""
OUR MISSION (from Part 0):
════════════════════════════════════════════════════════════════════════

Build a neural network that can correctly classify these images as:
  • VERTICAL (1) - line goes up-down
  • HORIZONTAL (0) - line goes left-right

The challenge: Noise makes the patterns harder to detect!
The committee must learn to see through the noise.
""")

9.3 Training the Complete Network

Now we train our neural network using everything we've learned:

Setting	Value	Why (Part Reference)
Hidden neurons	8	Enough for patterns, not too many (Part 8 - overfitting)
Learning rate	0.5	Fast but stable (Part 5)
Epochs	200	Enough to learn, with early stopping (Part 8)
Early stopping patience	20	Stop if no improvement for 20 epochs
Activation (hidden)	ReLU	Prevents vanishing gradients (Parts 3, 8)
Activation (output)	Sigmoid	Gives probability (Part 3)

How We Chose These Values

Hidden neurons = 8:

Our data has 9 inputs and 2 classes. Rule of thumb:

Minimum: 2-4 (can represent basic patterns)
Our choice: 8 (room for multiple pattern detectors)
Maximum: ~20 for 180 training samples (avoid overfitting)

Why 8 works: We need neurons to detect "left column", "middle column", "right column" for vertical, plus "top row", "middle row", "bottom row" for horizontal. 6-8 neurons can capture these patterns.

Learning rate = 0.5:

Learning Rate	Behavior
Too low (0.001)	Very slow, may not converge in 200 epochs
Good (0.1 - 1.0)	Learns quickly, stable
Too high (5.0)	Overshoots, unstable, may diverge

For small networks with BCE loss, 0.5 is often a good starting point.
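
To get a feel for the three rows in the table, here is a minimal sketch on a toy problem rather than on our network: plain gradient descent on f(w) = w², where every step multiplies w by (1 - 2·lr). The function name `descend` and the toy loss are illustrative assumptions, and the exact point where a rate becomes "too high" is different for the real network (on this toy loss anything above 1.0 already diverges).

```python
def descend(lr, steps=20, w0=1.0):
    """Run gradient descent on f(w) = w**2 and return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # standard gradient-descent update
    return abs(w)

# Smaller |w| means we ended closer to the minimum at w = 0
for lr in (0.001, 0.1, 0.5, 1.5):
    print(f"lr={lr}: |w| after 20 steps = {descend(lr):.4g}")
```

The tiny rate has barely moved after 20 steps, the middle rates reach the minimum, and the large rate grows on every step instead of shrinking.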

Epochs = 200 with patience = 20:

200 is the maximum "budget" of epochs (full passes over the training data)
Patience of 20 means: "Stop if validation doesn't improve for 20 epochs"
This combination lets us train long enough to converge, but stops early if we're overfitting
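
The patience rule can be sketched in a few lines. This is a simplified illustration, not the `NeuralNetwork.train` code itself: the helper name `early_stopping_epoch` is hypothetical, it works on a ready-made list of validation losses, and it leaves out saving and restoring the best weights.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch) as 0-based indices into val_losses."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch      # patience ran out: stop early
    return best_epoch, len(val_losses) - 1    # epoch budget exhausted

losses = [0.70, 0.55, 0.48, 0.50, 0.49, 0.51, 0.47, 0.52]
print(early_stopping_epoch(losses, patience=3))  # stops before reaching 0.47
print(early_stopping_epoch(losses, patience=5))  # waits long enough to find 0.47
```

A short patience stops at the first plateau; a longer one tolerates the plateau and finds the later, better epoch. That is the trade-off behind choosing 20.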

Understanding Parameter Count

Total parameters = (input × hidden) + hidden + (hidden × output) + output
                 = (9 × 8) + 8 + (8 × 1) + 1
                 = 72 + 8 + 8 + 1 = 89 parameters

Rule of thumb: You want at least 10× more training samples than parameters.

We have 180 training samples
We have 89 parameters
Ratio: 180/89 ≈ 2× (well below the 10× guideline, which is why we use early stopping!)
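
The same arithmetic as a small helper, so you can check other hidden-layer sizes before trying them in the dashboard. The function name `count_parameters` is an illustrative assumption; the formula is the one given above.

```python
def count_parameters(n_inputs, n_hidden, n_outputs):
    """Weights plus biases for a network with a single hidden layer."""
    hidden_layer = n_inputs * n_hidden + n_hidden      # W1 and b1
    output_layer = n_hidden * n_outputs + n_outputs    # W2 and b2
    return hidden_layer + output_layer

n_train = 180
for n_hidden in (2, 8, 32):
    n_params = count_parameters(9, n_hidden, 1)
    print(f"hidden={n_hidden:2d}: {n_params:3d} parameters, "
          f"{n_train / n_params:.1f} samples per parameter")
```

With 32 hidden neurons there are more parameters than training samples, which is one reason to watch the train/validation gap closely at that size.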

cell 011


# =============================================================================
# TRAIN THE COMPLETE NETWORK
# =============================================================================

print("="*70)
print("TRAINING THE NEURAL NETWORK")
print("="*70)

# Create the network
model = NeuralNetwork(
    n_inputs=9,      # 3x3 image = 9 pixels
    n_hidden=8,      # 8 specialists in our committee
    n_outputs=1,     # Binary output (V or H)
    seed=42
)

print(f"\nNetwork Architecture:")
print(f"  Input layer:  {model.n_inputs} neurons (one per pixel)")
print(f"  Hidden layer: {model.n_hidden} neurons (ReLU activation)")
print(f"  Output layer: {model.n_outputs} neuron (Sigmoid activation)")
print(f"  Total parameters: 9*8 + 8 + 8*1 + 1 = {9*8 + 8 + 8*1 + 1}")

print("\n" + "-"*70)
print("Training with early stopping...")
print("-"*70)

# Train!
model.train(
    X_train, y_train,
    X_val, y_val,
    learning_rate=0.5,
    epochs=200,
    early_stopping_patience=20,
    verbose=True
)

print("\n" + "="*70)
print("TRAINING COMPLETE!")
print("="*70)

How Training Works: The Complete Flow

Here's what happens during each training epoch:

┌─────────────────────────────────────────────────────────────────────┐
│                        ONE TRAINING EPOCH                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. FORWARD PASS (Make predictions)                                │
│     Input X → [W1×X + b1] → ReLU → [W2×H + b2] → Sigmoid → Output │
│                    ↓                      ↓                        │
│               Cache Z1, A1            Cache Z2, A2                 │
│                                                                     │
│  2. COMPUTE LOSS                                                   │
│     BCE = -mean(y×log(ŷ) + (1-y)×log(1-ŷ))                        │
│                                                                     │
│  3. BACKWARD PASS (Compute gradients)                              │
│     ∂L/∂W2 ← output error × hidden activations (from cache)       │
│     ∂L/∂W1 ← hidden error × input (chain rule through ReLU)       │
│                                                                     │
│  4. UPDATE WEIGHTS                                                 │
│     W1 ← W1 - lr × ∂L/∂W1                                         │
│     W2 ← W2 - lr × ∂L/∂W2                                         │
│                                                                     │
│  5. EVALUATE                                                       │
│     Compute train loss/accuracy                                    │
│     Compute val loss/accuracy                                      │
│                                                                     │
│  6. EARLY STOPPING CHECK                                           │
│     If val_loss improved → save weights                            │
│     If no improvement for `patience` epochs → stop & restore best │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

This process repeats until:

Maximum epochs reached, OR
Early stopping triggers (no validation improvement)
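
Steps 1-4 of the diagram can be sketched in plain NumPy as follows. This is a stand-alone illustration under stated assumptions, not the `NeuralNetwork` class itself: `train_step` is a hypothetical name, the weights use the (n_hidden, n_inputs) layout, updates are full-batch, and steps 5-6 (evaluation and early stopping) are left out. The toy data and the conservative learning rate of 0.1 are also assumptions made for this demo.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(X, y, W1, b1, W2, b2, lr):
    """One full-batch pass through steps 1-4; returns the loss before the update."""
    n = X.shape[0]
    y = y.reshape(-1, 1)

    # 1. Forward pass (keep Z1, A1, A2 for the backward pass)
    Z1 = X @ W1.T + b1            # (n, n_hidden)
    A1 = np.maximum(0, Z1)        # ReLU
    A2 = sigmoid(A1 @ W2.T + b2)  # (n, 1) predicted probabilities

    # 2. Binary cross-entropy, clipped to avoid log(0)
    p = np.clip(A2, 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # 3. Backward pass
    dZ2 = (A2 - y) / n            # sigmoid + BCE simplify to (y_hat - y)
    dW2 = dZ2.T @ A1              # (1, n_hidden)
    db2 = dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2) * (Z1 > 0)   # chain rule through ReLU
    dW1 = dZ1.T @ X               # (n_hidden, n_inputs)
    db1 = dZ1.sum(axis=0)

    # 4. Gradient-descent update (modifies the arrays in place)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2
    return loss

# Toy data: 20 random "images" whose label depends on the first pixel
rng = np.random.default_rng(0)
X = rng.random((20, 9))
y = (X[:, 0] > 0.5).astype(float)
W1, b1 = rng.normal(0, 0.5, (8, 9)), np.zeros(8)
W2, b2 = rng.normal(0, 0.5, (1, 8)), np.zeros(1)

losses = [train_step(X, y, W1, b1, W2, b2, lr=0.1) for _ in range(300)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Note that the biases b1 and b2 are updated alongside W1 and W2, using the column sums of the same error terms.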

cell 013


# =============================================================================
# VISUALIZE TRAINING PROGRESS
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

epochs = range(1, len(model.train_loss_history) + 1)

# Plot 1: Loss curves
ax = axes[0]
ax.plot(epochs, model.train_loss_history, 'b-', label='Training Loss', linewidth=2)
ax.plot(epochs, model.val_loss_history, 'r-', label='Validation Loss', linewidth=2)
ax.axvline(x=model.best_epoch+1, color='green', linestyle='--', linewidth=2,
           label=f'Best epoch ({model.best_epoch+1})')
ax.set_xlabel('Epoch', fontsize=12)
ax.set_ylabel('Loss (BCE)', fontsize=12)
ax.set_title('Training Progress: Loss', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

# Plot 2: Accuracy curves
ax = axes[1]
ax.plot(epochs, [a*100 for a in model.train_acc_history], 'b-', 
        label='Training Accuracy', linewidth=2)
ax.plot(epochs, [a*100 for a in model.val_acc_history], 'r-', 
        label='Validation Accuracy', linewidth=2)
ax.axvline(x=model.best_epoch+1, color='green', linestyle='--', linewidth=2,
           label=f'Best epoch ({model.best_epoch+1})')
ax.set_xlabel('Epoch', fontsize=12)
ax.set_ylabel('Accuracy (%)', fontsize=12)
ax.set_title('Training Progress: Accuracy', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
ax.set_ylim(40, 105)

plt.tight_layout()
plt.show()

print("""
TRAINING INSIGHTS:
════════════════════════════════════════════════════════════════════════

• Training and validation curves should stay close (no overfitting!)
• Early stopping saved the best model before potential overfitting
• The committee learned the V/H pattern effectively
""")

9.4 Complete Evaluation

Now we evaluate our trained model on the test set - data it has NEVER seen during training or validation. This is the true measure of generalization.

Evaluation Metrics (Part 6)

Metric	What It Measures
Accuracy	Overall correctness
Precision	Of predicted positives, how many are correct?
Recall	Of actual positives, how many did we find?
F1 Score	Harmonic mean of precision and recall
Confusion Matrix	Detailed breakdown of TP, TN, FP, FN

What Do "Good" Values Look Like?

Metric	Poor	Okay	Good	Excellent
Accuracy	<60%	60-75%	75-90%	>90%
F1 Score	<0.5	0.5-0.7	0.7-0.9	>0.9

For our V/H classifier:

With 15% noise, >85% accuracy is quite good
Balanced precision/recall indicates no systematic bias
Similar train/val/test accuracy indicates good generalization

Reading the Confusion Matrix for Insights

The confusion matrix tells us not just HOW MANY errors, but WHAT KIND:

Scenario	Meaning	Possible Cause
High FP (false alarm)	Saying "vertical" too often	Model is too sensitive to vertical patterns
High FN (misses)	Missing vertical lines	Model isn't detecting vertical patterns well
Balanced errors	FP ≈ FN	Model is "confused" by noise, not biased

Ideal: Most values on the diagonal (TN, TP), minimal off-diagonal (FP, FN).
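
As a quick worked example of how these numbers relate, here is a stand-alone sketch that derives all four counts and the metrics from two label lists. The helper name `binary_metrics` and the eight made-up labels are assumptions for illustration; vertical (1) is treated as the positive class, as in the rest of this part.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics; 1 (vertical) is the positive class."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

# Made-up example: the model says "vertical" a little too often
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
result = binary_metrics(y_true, y_pred)
print(result)
```

Here FP (2) is larger than FN (1), so precision (0.60) ends up lower than recall (0.75): the "high FP" row of the table above.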

cell 015


# =============================================================================
# COMPLETE EVALUATION ON TEST SET
# =============================================================================

print("="*70)
print("FINAL EVALUATION ON TEST SET")
print("(Data the model has NEVER seen!)")
print("="*70)

# Get predictions
test_predictions = model.predict(X_test)

# Confusion matrix
cm = model.confusion_matrix(X_test, y_test)

# Calculate metrics
accuracy = (cm['TP'] + cm['TN']) / len(y_test)
precision = cm['TP'] / (cm['TP'] + cm['FP']) if (cm['TP'] + cm['FP']) > 0 else 0
recall = cm['TP'] / (cm['TP'] + cm['FN']) if (cm['TP'] + cm['FN']) > 0 else 0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0

print(f"\n📊 PERFORMANCE METRICS:")
print("-"*40)
print(f"  Accuracy:  {accuracy*100:.1f}%")
print(f"  Precision: {precision*100:.1f}%")
print(f"  Recall:    {recall*100:.1f}%")
print(f"  F1 Score:  {f1*100:.1f}%")

print(f"\n📋 CONFUSION MATRIX:")
print("-"*40)
print(f"                  Predicted")
print(f"              HORIZ    VERT")
print(f"  Actual HORIZ  {cm['TN']:3d}     {cm['FP']:3d}")
print(f"  Actual VERT   {cm['FN']:3d}     {cm['TP']:3d}")

print(f"\n  True Negatives (TN):  {cm['TN']:3d} - Correctly identified horizontal")
print(f"  True Positives (TP):  {cm['TP']:3d} - Correctly identified vertical")
print(f"  False Positives (FP): {cm['FP']:3d} - Horizontal wrongly called vertical")
print(f"  False Negatives (FN): {cm['FN']:3d} - Vertical wrongly called horizontal")

cell 016


# =============================================================================
# VISUALIZE EVALUATION RESULTS
# =============================================================================

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot 1: Confusion Matrix Heatmap
ax = axes[0]
cm_matrix = np.array([[cm['TN'], cm['FP']], [cm['FN'], cm['TP']]])
im = ax.imshow(cm_matrix, cmap='Blues')
ax.set_xticks([0, 1])
ax.set_yticks([0, 1])
ax.set_xticklabels(['HORIZ (0)', 'VERT (1)'])
ax.set_yticklabels(['HORIZ (0)', 'VERT (1)'])
ax.set_xlabel('Predicted', fontsize=12)
ax.set_ylabel('Actual', fontsize=12)
ax.set_title('Confusion Matrix', fontsize=14, fontweight='bold')

# Add text annotations
for i in range(2):
    for j in range(2):
        text = ax.text(j, i, cm_matrix[i, j], ha='center', va='center', 
                      fontsize=20, fontweight='bold',
                      color='white' if cm_matrix[i, j] > cm_matrix.max()/2 else 'black')

# Plot 2: Metrics Bar Chart
ax = axes[1]
metrics = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
values = [accuracy*100, precision*100, recall*100, f1*100]
colors = ['#2ecc71', '#3498db', '#9b59b6', '#e74c3c']
bars = ax.bar(metrics, values, color=colors)
ax.set_ylim(0, 105)
ax.set_ylabel('Percentage (%)', fontsize=12)
ax.set_title('Performance Metrics', fontsize=14, fontweight='bold')
for bar, val in zip(bars, values):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1, 
            f'{val:.1f}%', ha='center', fontsize=11, fontweight='bold')

# Plot 3: Sample Predictions
ax = axes[2]
ax.axis('off')

# Show some predictions
sample_text = "SAMPLE PREDICTIONS:\n" + "="*40 + "\n\n"
for i in range(min(6, len(X_test))):
    actual = "VERT" if y_test[i] == 1 else "HORIZ"
    predicted = "VERT" if test_predictions[i] == 1 else "HORIZ"
    prob = model.forward(X_test[i:i+1])[0, 0]
    status = "✓" if actual == predicted else "✗"
    sample_text += f"  {status} Actual: {actual:5s}  Predicted: {predicted:5s}  (prob={prob:.2f})\n"

ax.text(0.05, 0.5, sample_text, fontsize=11, family='monospace',
        verticalalignment='center', transform=ax.transAxes,
        bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.9))

plt.tight_layout()
plt.show()

9.5 Saliency: What Did the Network Learn?

Let's peek inside the trained committee's brain - what features do the hidden neurons look for?

Each hidden neuron learned to detect specific patterns. By visualizing their weights (reshaped to 3x3), we can see what they're "looking for."

How to Read the Saliency Visualizations

Each 3×3 grid shows ONE hidden neuron's "template":

Color	Weight	Meaning
Red	Positive	"I get excited when this pixel is bright"
Blue	Negative	"I get suppressed when this pixel is bright"
White/Gray	Near zero	"I don't care about this pixel"

Patterns to Look For

Good learning: Hidden neurons specialize in different features:

Pattern Type	What You'll See	What It Detects
Column detector	One column red, others blue	Vertical lines in that column
Row detector	One row red, others blue	Horizontal lines in that row
Edge detector	Mixed red/blue pattern	Edges or transitions
General detector	Mostly red or mostly blue	Overall brightness level

Signs of good learning:

Different neurons have different patterns (diversity!)
Some neurons clearly detect vertical patterns
Some neurons clearly detect horizontal patterns
The W2 weights show which neurons "vote" for which class

Signs of poor learning:

All neurons look similar (no specialization)
Random-looking patterns (didn't converge)
All weights near zero (vanishing gradients)
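
If you would rather not judge every heatmap by eye, a rough heuristic is to compare how much the average weight varies across columns versus across rows. The helper `describe_template` below is a hypothetical sketch, not part of the notebook's `NeuralNetwork` class, and it is deliberately crude: it ignores edge-like or mixed patterns and only looks at excitatory (largest mean) rows and columns.

```python
import numpy as np

def describe_template(weights_flat):
    """Label a 9-weight template by whether its columns or its rows vary more."""
    w = np.asarray(weights_flat, dtype=float).reshape(3, 3)
    col_means = w.mean(axis=0)   # one value per column
    row_means = w.mean(axis=1)   # one value per row
    col_spread = col_means.max() - col_means.min()
    row_spread = row_means.max() - row_means.min()
    if col_spread > row_spread:
        return f"column detector (column {int(np.argmax(col_means))})"
    if row_spread > col_spread:
        return f"row detector (row {int(np.argmax(row_means))})"
    return "no clear row/column preference"

# Hand-made templates: middle column positive, then top row positive
middle_column = [-1, 2, -1,
                 -1, 2, -1,
                 -1, 2, -1]
top_row = [ 2,  2,  2,
           -1, -1, -1,
           -1, -1, -1]
print(describe_template(middle_column))
print(describe_template(top_row))
print(describe_template(np.zeros(9)))
```

On a trained model you could apply it to each row of the input-to-hidden weight matrix, but treat the labels as a hint to check against the heatmaps, not as a verdict.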

cell 018


# =============================================================================
# SALIENCY: VISUALIZE WHAT THE NETWORK LEARNED
# =============================================================================

print("="*70)
print("INSIDE THE COMMITTEE'S BRAIN: What Each Specialist Looks For")
print("="*70)

# Get the input-to-hidden weights
W1 = model.W1  # Shape: (n_hidden, n_inputs) = (8, 9)

# Get the hidden-to-output weights (tells us how each specialist contributes to final decision)
W2 = model.W2.flatten()  # Shape: (8,)

fig, axes = plt.subplots(2, 4, figsize=(14, 7))

for i in range(model.n_hidden):
    ax = axes[i // 4, i % 4]
    
    # Reshape this neuron's weights to 3x3
    weights = W1[i].reshape(3, 3)
    
    # Visualize
    im = ax.imshow(weights, cmap='RdBu_r', vmin=-np.abs(weights).max(), vmax=np.abs(weights).max())
    
    # Title with contribution direction
    direction = "→VERT" if W2[i] > 0 else "→HORIZ"
    ax.set_title(f'Specialist {i+1} {direction}\n(W2={W2[i]:.2f})', fontsize=10)
    ax.axis('off')
    
    # Add a colorbar to the last plot in the top row (each plot has its own color scale)
    if i == 3:
        plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)

plt.suptitle('Hidden Neuron Weights: Red = positive, Blue = negative\n'
             '→VERT means this neuron votes for VERTICAL, →HORIZ for HORIZONTAL', 
             fontsize=12, fontweight='bold')
plt.tight_layout()
plt.show()

print("""
INTERPRETATION:
════════════════════════════════════════════════════════════════════════

Each 3x3 heatmap shows what ONE hidden neuron "looks for":
  • RED pixels: This neuron gets EXCITED when these pixels are bright
  • BLUE pixels: This neuron gets INHIBITED when these pixels are bright

The "→VERT" or "→HORIZ" shows how this specialist votes in the final decision:
  • →VERT specialists contribute to "vertical" prediction when activated
  • →HORIZ specialists contribute to "horizontal" prediction when activated

Look for patterns! Some specialists might look for:
  • Vertical column patterns (bright red in one column)
  • Horizontal row patterns (bright red in one row)
  • Edge detectors (mixed red/blue patterns)
""")

9.6 Interactive Dashboard: Experiment Yourself!

Try different hyperparameters and see how they affect performance.

Hyperparameter	What It Controls	Trade-off
Hidden neurons	Model complexity	More = can learn more, but risk overfitting
Learning rate	Step size	Higher = faster but less stable
Noise level	Data difficulty	Higher = harder to learn

Experiments to Try

Experiment 1: Varying Model Complexity

Hidden Neurons	Expected Result
2	May underfit - not enough capacity
8	Good balance - our default
32	May overfit - watch train/val gap

Experiment 2: Varying Learning Rate

Learning Rate	Expected Result
0.01	Very slow convergence
0.5	Fast, stable (our default)
2.0	May oscillate or diverge

Experiment 3: Varying Noise Level

Noise Level	Expected Result
0.0	Near-perfect accuracy (too easy!)
0.15	Challenging but learnable
0.4	Very difficult, accuracy drops

What to Watch For

Healthy training: Train and val curves decrease together, then flatten
Overfitting: Train keeps improving, val gets worse (gap grows)
Underfitting: Both curves stay high and flat
Instability: Curves jump around wildly (reduce learning rate)

cell 020 (full lab recommended)


# =============================================================================
# INTERACTIVE DASHBOARD: EXPERIMENT WITH HYPERPARAMETERS
# =============================================================================

def run_experiment(n_hidden=8, learning_rate=0.5, noise_level=0.15, n_samples=300, seed=42):
    """Run a complete experiment with given hyperparameters."""
    
    print("="*70)
    print(f"EXPERIMENT: hidden={n_hidden}, lr={learning_rate}, noise={noise_level}")
    print("="*70)
    
    # Create data
    (X_tr, y_tr), (X_v, y_v), (X_te, y_te) = create_train_val_test_split(
        n_total=n_samples, noise_level=noise_level, seed=seed
    )
    
    # Create and train model
    exp_model = NeuralNetwork(n_inputs=9, n_hidden=n_hidden, n_outputs=1, seed=seed)
    exp_model.train(X_tr, y_tr, X_v, y_v, 
                    learning_rate=learning_rate, epochs=200, 
                    early_stopping_patience=20, verbose=False)
    
    # Evaluate
    test_loss, test_acc = exp_model.evaluate(X_te, y_te)
    
    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    
    epochs = range(1, len(exp_model.train_loss_history) + 1)
    
    ax = axes[0]
    ax.plot(epochs, exp_model.train_loss_history, 'b-', label='Train')
    ax.plot(epochs, exp_model.val_loss_history, 'r-', label='Val')
    ax.axvline(exp_model.best_epoch+1, color='g', linestyle='--', label=f'Best: {exp_model.best_epoch+1}')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Loss')
    ax.set_title(f'Training Progress\nFinal Test Acc: {test_acc*100:.1f}%', fontweight='bold')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    ax = axes[1]
    ax.plot(epochs, [a*100 for a in exp_model.train_acc_history], 'b-', label='Train')
    ax.plot(epochs, [a*100 for a in exp_model.val_acc_history], 'r-', label='Val')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Accuracy (%)')
    ax.set_title(f'Accuracy Progress\nStopped at epoch {len(epochs)}', fontweight='bold')
    ax.legend()
    ax.grid(True, alpha=0.3)
    ax.set_ylim(40, 105)
    
    plt.tight_layout()
    plt.show()
    
    return test_acc

# Interactive widgets (if available)
if WIDGETS_AVAILABLE:
    print("Interactive dashboard available! Adjust sliders and click 'Run Experiment'.\n")
    
    hidden_slider = widgets.IntSlider(value=8, min=2, max=32, step=2, description='Hidden:')
    # step=0.01 keeps the default value (0.5) on the slider's grid, which starts at min=0.01
    lr_slider = widgets.FloatSlider(value=0.5, min=0.01, max=2.0, step=0.01, description='Learn Rate:')
    noise_slider = widgets.FloatSlider(value=0.15, min=0.0, max=0.5, step=0.05, description='Noise:')
    
    def on_button_click(b):
        clear_output(wait=True)
        display(widgets.VBox([hidden_slider, lr_slider, noise_slider, run_button]))
        run_experiment(hidden_slider.value, lr_slider.value, noise_slider.value)
    
    run_button = widgets.Button(description='Run Experiment')
    run_button.on_click(on_button_click)
    
    display(widgets.VBox([hidden_slider, lr_slider, noise_slider, run_button]))
else:
    print("Widgets not available. Running preset experiments instead.\n")
    
    # Run a few preset experiments
    print("\n" + "="*70)
    print("PRESET EXPERIMENTS")
    print("="*70)
    
    experiments = [
        {"n_hidden": 4, "learning_rate": 0.5, "noise_level": 0.1, "desc": "Simple model, low noise"},
        {"n_hidden": 16, "learning_rate": 0.5, "noise_level": 0.3, "desc": "Complex model, high noise"},
    ]
    
    for exp in experiments:
        print(f"\n>>> {exp['desc']}")
        run_experiment(exp['n_hidden'], exp['learning_rate'], exp['noise_level'])

Part 9 Summary: The Complete Journey

Mission Accomplished!

We set out in Part 0 to build a neural network that could classify vertical and horizontal lines. Now we have:

Component	Implementation	Part Referenced
Data representation	3x3 images → 9-element vectors	Part 1 (Matrices)
Network architecture	9 → 8 (ReLU) → 1 (Sigmoid)	Parts 2, 3, 7
Forward propagation	Matrix operations + activations	Parts 4, 7
Loss function	Binary Cross-Entropy	Part 5
Training	Backpropagation + Gradient Descent	Part 5
Evaluation	Accuracy, Confusion Matrix, F1	Part 6
Overfitting prevention	Early stopping + proper sizing	Part 8
Interpretability	Weight visualization	Part 6
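
The architecture, forward propagation, and loss rows above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the notebook's `NeuralNetwork` class: the function names (`forward`, `bce_loss`), the weight scales, the random initial values, and the label convention (vertical = 1, horizontal = 0) are all assumptions made for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward pass for a 9 -> 8 (ReLU) -> 1 (sigmoid) network.

    X has shape (n_samples, 9); the result has shape (n_samples, 1).
    """
    hidden = relu(X @ W1 + b1)        # (n_samples, 8)
    return sigmoid(hidden @ W2 + b2)  # (n_samples, 1)

def bce_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; clipping by eps keeps log() away from zero."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

# Untrained, randomly initialized weights (illustrative values only)
rng = np.random.default_rng(42)
W1 = rng.normal(0.0, np.sqrt(2.0 / 9), size=(9, 8))
b1 = np.zeros(8)
W2 = rng.normal(0.0, np.sqrt(1.0 / 8), size=(8, 1))
b2 = np.zeros(1)

# Two flattened 3x3 images: a vertical line (middle column)
# and a horizontal line (middle row)
X = np.array([
    [0, 1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
], dtype=float)
y = np.array([[1.0], [0.0]])  # assumed convention: vertical = 1, horizontal = 0

probs = forward(X, W1, b1, W2, b2)
loss = bce_loss(y, probs)
print(probs.ravel(), loss)
```

Because the weights here are untrained, the printed probabilities carry no meaning yet; training would adjust `W1`, `b1`, `W2`, and `b2` to push the loss down.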

The Committee Analogy Complete

Part	Committee Story
0	Introduced the committee concept
1	Learned the language (matrices)
2	First committee member joins
3	Member learns to vote (activation)
4	First attempt at decisions
5	Learning from mistakes
6	Evaluating performance
7	Full committee assembled
8	Growing pains addressed
9	Complete, working committee!

Key Takeaways

Neural networks are simple at their core - Just matrix multiplications and non-linear functions
Training is optimization - Find weights that minimize loss on training data
Generalization is the goal - Performance on unseen data is what matters
Architecture matters - Right-sized models with proper activations work best
Monitoring is essential - Track train AND validation metrics

Common Mistakes to Avoid

Mistake	Consequence	How to Avoid
No validation set	Can't detect overfitting	Always split your data
Using test data to tune	Overly optimistic results	Keep test data completely separate
Wrong activation for output	Invalid predictions	Sigmoid for binary, softmax for multi-class
Too large model for data	Overfitting	Start small, increase if underfitting
No shuffling	Biased splits	Always shuffle before splitting
Ignoring learning curves	Miss problems	Plot train/val loss every time
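
The "No shuffling" and "No validation set" rows can be illustrated with a small splitting helper. This is a sketch under stated assumptions, not the notebook's `create_train_val_test_split`: the function name `shuffle_and_split`, the 70/15/15 proportions, and the synthetic data are hypothetical.

```python
import numpy as np

def shuffle_and_split(X, y, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle once, then cut into train / validation / test partitions."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))          # random ordering of row indices
    n_test = int(round(len(X) * test_frac))
    n_val = int(round(len(X) * val_frac))
    test_idx = order[:n_test]
    val_idx = order[n_test:n_test + n_val]
    train_idx = order[n_test + n_val:]
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

# Labels are sorted by class on purpose: slicing without shuffling
# would leave a single class in the held-out partitions.
X = np.arange(100 * 9, dtype=float).reshape(100, 9)
y = np.array([0] * 50 + [1] * 50)

(X_tr, y_tr), (X_v, y_v), (X_te, y_te) = shuffle_and_split(X, y)
print(len(X_tr), len(X_v), len(X_te))
```

The test partition is carved out first and never touched again until final evaluation, which is one simple way to keep it completely separate from tuning decisions.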

The Complete Neural Network Checklist

Before Training:

Data shuffled and split (train/val/test)
Model architecture chosen (appropriate size)
Activation functions set (ReLU hidden, sigmoid output)
Weights initialized (He for ReLU, Xavier for sigmoid)
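
The initialization item above can be made concrete with the standard formulas. The sketch below assumes a normal-distribution variant of He initialization and a uniform variant of Xavier (Glorot) initialization; the helper names `he_init` and `xavier_init` are hypothetical and may not match what the notebook's `NeuralNetwork` class does internally.

```python
import numpy as np

def he_init(fan_in, fan_out, rng):
    """He initialization: zero-mean normal with variance 2 / fan_in (suits ReLU layers)."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def xavier_init(fan_in, fan_out, rng):
    """Xavier/Glorot uniform initialization: variance 2 / (fan_in + fan_out)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(42)
W1 = he_init(9, 8, rng)      # input -> hidden (ReLU)
W2 = xavier_init(8, 1, rng)  # hidden -> output (sigmoid)
print(W1.shape, W2.shape)
```

Both schemes scale the random weights by the layer's fan-in (and, for Xavier, fan-out) so that activations neither blow up nor shrink toward zero as they pass through the layer.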

During Training:

Monitoring both train AND validation loss
Early stopping configured
Learning rate reasonable (start with 0.1-1.0)
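
The early-stopping item can be sketched as a patience rule applied to a validation-loss history. This is an illustrative stand-alone function, not the logic inside the notebook's `train` method: the name `early_stopping_epoch`, the `min_delta` parameter, and the loss values are assumptions.

```python
def early_stopping_epoch(val_losses, patience=20, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 0-indexed.

    Training stops at the first epoch where validation loss has not
    improved by more than min_delta for `patience` consecutive epochs.
    If that never happens, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Made-up validation losses: improvement until epoch 3, then a steady rise
val_losses = [0.70, 0.55, 0.48, 0.45, 0.46, 0.47, 0.49, 0.52]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(best_epoch, stop_epoch)  # 3 6
```

In a real training loop the weights from `best_epoch` would typically be saved and restored after stopping, so the final model is the one with the lowest validation loss rather than the last one trained.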

After Training:

Evaluate on TEST set (not validation!)
Check confusion matrix for error patterns
Visualize learned features if possible
Compare train/val/test accuracy for overfitting signs
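
The post-training checks can be sketched with plain Python. The example below computes confusion-matrix counts and the derived metrics for a binary classifier; the function names and the toy label lists are hypothetical, and treating class 1 as the "positive" class is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1, guarding against division by zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy test-set labels: one false negative and one false positive
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = binary_metrics(y_true, y_pred)
print(counts, metrics)
```

Running the same function on the train, validation, and test predictions and comparing the results is a quick way to spot the overfitting signs mentioned in the last checklist item.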

Knowledge Check

cell 022

# =============================================================================
# KNOWLEDGE CHECK - Part 9 (Final Review)
# =============================================================================

print("FINAL KNOWLEDGE CHECK - Complete Neural Network Understanding")
print("="*70)

questions = [
    {
        "q": "1. In our complete network (9→8→1), what does the '8' represent?",
        "options": [
            "A) The number of training examples",
            "B) The number of hidden neurons (specialists)",
            "C) The learning rate",
            "D) The number of epochs"
        ],
        "answer": "B",
        "explanation": "The 8 represents hidden neurons - the 'specialists' in our committee who detect different patterns in the input."
    },
    {
        "q": "2. Why do we use ReLU for hidden layers and Sigmoid for output?",
        "options": [
            "A) Random choice - they're interchangeable",
            "B) ReLU mitigates vanishing gradients; Sigmoid gives probability output",
            "C) Sigmoid is faster than ReLU",
            "D) ReLU only works for hidden layers"
        ],
        "answer": "B",
        "explanation": "ReLU has derivative 1 for positive inputs, so gradients do not shrink as they pass through active neurons, which mitigates vanishing gradients in deep networks. Sigmoid maps to (0,1), which we interpret as a probability."
    },
    {
        "q": "3. What is the purpose of the validation set?",
        "options": [
            "A) Extra training data",
            "B) Final performance evaluation",
            "C) Tune hyperparameters and detect overfitting",
            "D) Test the code works"
        ],
        "answer": "C",
        "explanation": "Validation set is used during training to tune hyperparameters and detect overfitting (early stopping). Test set is for final evaluation."
    },
    {
        "q": "4. What does early stopping prevent?",
        "options": [
            "A) Underfitting",
            "B) Overfitting",
            "C) Slow training",
            "D) Memory issues"
        ],
        "answer": "B",
        "explanation": "Early stopping halts training once validation loss has stopped improving for a set number of epochs (the patience), before the model starts memorizing training data (overfitting)."
    },
    {
        "q": "5. In the saliency visualization, what do red pixels in a hidden neuron's weights mean?",
        "options": [
            "A) Errors in that pixel",
            "B) The neuron is broken",
            "C) The neuron gets excited when those pixels are bright",
            "D) Those pixels are ignored"
        ],
        "answer": "C",
        "explanation": "Positive (red) weights mean the neuron responds strongly when those input pixels are bright. Negative (blue) weights mean inhibition."
    },
    {
        "q": "6. What's the complete pipeline for using a neural network?",
        "options": [
            "A) Train → Test → Deploy",
            "B) Data → Train → Evaluate → Deploy",
            "C) Code → Train → Done",
            "D) Data (split) → Train (with val monitoring) → Evaluate (on test) → Interpret"
        ],
        "answer": "D",
        "explanation": "The complete pipeline: Split data (train/val/test), train with validation monitoring, evaluate on test set, then interpret/deploy."
    }
]

for q in questions:
    print(f"\n{q['q']}")
    for opt in q["options"]:
        print(f"   {opt}")

print("\n" + "="*70)
print("Scroll down for answers...")
print("="*70)

cell 023

# ANSWERS
print("ANSWERS - Final Knowledge Check")
print("="*70)
for i, q in enumerate(questions, 1):
    print(f"\n{i}. Answer: {q['answer']}")
    print(f"   {q['explanation']}")

What's Next?

Congratulations! You've completed the full implementation of a neural network from scratch!

You now understand:

How neural networks represent and process data
How they learn through backpropagation
How to evaluate and interpret their decisions
How to prevent common pitfalls like overfitting

Coming Up in Part 10: The Future

The final notebook will explore:

What other problems can neural networks solve?
CNNs - Convolutional Neural Networks for images
RNNs - Recurrent Neural Networks for sequences
Transformers - The architecture behind modern AI
Resources for continued learning

Continue to Part 10: part_10_whats_next.ipynb

Congratulations!

You've built a working neural network from absolute scratch!

              🎉 MISSION ACCOMPLISHED! 🎉
    
    From matrices to mastery, Parts 0 through 9:
    
    Part 0: The Mission          → Introduced the problem
    Part 1: Matrices             → The language of data
    Part 2: Single Neuron        → The building block
    Part 3: Activations          → Making decisions
    Part 4: Perceptron           → First predictions
    Part 5: Training             → Learning from mistakes
    Part 6: Evaluation           → Measuring success
    Part 7: Hidden Layers        → The full committee
    Part 8: Challenges           → Overcoming obstacles
    Part 9: Implementation       → COMPLETE SYSTEM!
    
    You are now ready for deep learning frameworks
    like PyTorch and TensorFlow!

"The Brain's Decision Committee is fully operational."