Training a Machine
Introduction
A day ago I learned how to train a machine to close in on a prediction.
It turned out to be easy: I just coded the math that narrows down on a prediction based on the loss, i.e. the inaccuracy.
The best part was that I had expected it to be very complicated, but once I understood it, I did the tedious math by hand and coded it myself.
The steps were as follows:
- Analyze the problem and decide on a prediction function.
- Assume certain values for the bias, constants, and step.
- Plug the input into the prediction.
- Calculate the loss.
- Calculate the gradients for each of the constants and bias.
- Update the constants and the bias by subtracting the respective gradients scaled by the step.
- Now repeat from step 3 till the loss gets (close to) 0.
From the above steps, there are critical things that we need to note. For a simple linear model, the prediction is
y' = wx + b
the loss is the sum of squared errors,
loss = ∑(y' - y)² = ∑(wx + b - y)²
and differentiating the loss gives the gradients,
gradient w.r.t. w = ∑2(wx + b - y)(x) = ∑2x(wx + b - y)
gradient w.r.t. b = ∑2(wx + b - y)
which lead to the update rules
w = w - ((gradient w.r.t. w) * step)
b = b - ((gradient w.r.t. b) * step)
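Before moving to the actual problem, here is a minimal sketch of these steps and update rules for the linear case. The data, starting values, step size, and epoch count below are made up purely for illustration:

# Minimal gradient-descent sketch for the linear model y' = w*x + b (illustrative values only)
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]                 # the true relation here is y = 2x
w, b = 0.0, 0.0
step = 0.01                          # the "step" from the update rules above

for epoch in range(1000):
    grad_w, grad_b = 0.0, 0.0
    for xi, yi in zip(x, y):
        diff = (w * xi + b) - yi     # y' - y
        grad_w += 2 * xi * diff      # gradient w.r.t. w
        grad_b += 2 * diff           # gradient w.r.t. b
    w = w - (step * grad_w)          # w = w - ((gradient w.r.t. w) * step)
    b = b - (step * grad_b)          # b = b - ((gradient w.r.t. b) * step)

print(w, b)                          # w ends up close to 2, b close to 0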
Why do we subtract the gradients?
Fig 1: The loss vs. weight curve
From the loss function, we can see that the loss is a quadratic polynomial in the weight, which gives a parabolic curve. In a parabola, the smallest value sits at the lowest portion of the curve, and that is exactly what we want: the least value of the loss. From Fig 1 we can see that as the weight w decreases, the loss decreases up to a certain point and then increases again from that point onwards. We need to bring the constants to the position of minimum loss so that we can predict the values.
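As a quick numeric check (with made-up numbers), take a single sample x = 1, y = 2 and the model y' = wx with the current weight w = 5. One update that subtracts the gradient moves us down the parabola:

# Tiny check that stepping against the gradient lowers the loss (illustrative values only)
x, y = 1.0, 2.0
w = 5.0
step = 0.1

loss_before = (w * x - y) ** 2       # (5 - 2)^2 = 9
grad = 2 * x * (w * x - y)           # 2 * 1 * (5 - 2) = 6
w = w - step * grad                  # 5 - 0.1 * 6 = 4.4
loss_after = (w * x - y) ** 2        # (4.4 - 2)^2 = 5.76

print(loss_before, loss_after)       # 9.0 5.76 -> the loss went down

Had we added the gradient instead, w would have moved to 5.6 and the loss would have grown to 12.96, which is why the gradients are subtracted.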
The Problem that I solved
| Area (sq ft) (x) | Price ($) (y) |
|---|---|
| 2,104 | 399,900 |
| 1,600 | 329,900 |
| 2,400 | 369,000 |
For the code we scale the prices down to thousands, giving
x = [2104, 1600, 2400]
y = [399.9, 329.9, 369]
and we have the prediction function y' as
y' = w₁x² + w₂x + b
From the above function, we know that there are 2 weight constants, w₁ and w₂, and a bias b. Based on this, we will have 3 gradients to vary w₁, w₂, and b respectively.
We'll write the squared-error loss (summed over the samples in the code) as,
loss = (y' - y)² = (w₁x² + w₂x + b - y)²
Now, applying the chain rule, we'll find the gradient w.r.t. w₁ as,
gradient w.r.t. w₁ = 2x²(w₁x² + w₂x + b - y)
Similarly, the gradients w.r.t. w₂ and b are as follows,
gradient w.r.t. w₂ = 2x(w₁x² + w₂x + b - y)
gradient w.r.t. b = 2(w₁x² + w₂x + b - y)
Now that we have the loss function and the gradients for controlling the loss, we can start coding the logic explained above.
The Code
import math

# Training data: area in sq ft (x) and price in thousands of dollars (y)
x = [2104, 1600, 2400]
y = [399.9, 329.9, 369]

# Starting values for the weights and the bias
w1 = 1
w2 = 1
bias = 0
error_prev = 0

# Step size; kept tiny because the x**2 terms are huge
decent = 1 * (10 ** -13.78)
x_input = 2000
total_epoch = 100

# Prediction function: y' = w1*x^2 + w2*x + b
def fwd(x_param):
    return ((x_param ** 2) * w1) + (x_param * w2) + bias
print("Prediction before training:", fwd(x_input))
# Squared-error loss for a single sample
def loss(x_param, y_param):
    y_prediction = fwd(x_param)
    return (y_prediction - y_param) ** 2
# Gradient of the loss w.r.t. w1 (flag 0), w2 (flag 1), or bias (flag 2)
def gradient(x_param, y_param, flag):
    if flag == 0:
        return 2 * (x_param ** 2) * (((x_param ** 2) * w1) + (x_param * w2) + bias - y_param)
    elif flag == 1:
        return 2 * x_param * (((x_param ** 2) * w1) + (x_param * w2) + bias - y_param)
    elif flag == 2:
        return 2 * (((x_param ** 2) * w1) + (x_param * w2) + bias - y_param)
    return 0
for epoch in range(total_epoch):
    # Accumulate the loss and the gradients over all training samples
    gw1 = 0
    gw2 = 0
    gb = 0
    error = 0
    for x_val, y_val in zip(x, y):
        error += loss(x_val, y_val)
        gw1 += gradient(x_val, y_val, 0)
        gw2 += gradient(x_val, y_val, 1)
        gb += gradient(x_val, y_val, 2)
    print("Progress:", math.floor((epoch * 100) / total_epoch), "%", "When w1 =", w1, "When w2 =", w2, "Bias =", bias,
          "Prediction:", fwd(x_input), "Loss =", error)
    # Stop if the loss starts increasing (the step has overshot the minimum)
    if epoch != 0 and error_prev < error:
        break
    error_prev = error
    # Move each parameter against its gradient, scaled by the step size
    w1 = w1 - (decent * gw1)
    w2 = w2 - (decent * gw2)
    bias = bias - (decent * gb)
print("Prediction after training:", fwd(x_input))
Output
- We do not want to increase the step too much, as the parameters will move far away from the desired point and the loss will start to increase. For this, I have added code to stop when the loss starts to increase (a rough sketch of this overshoot is shown after the output below).
- We don't have the resources to calculate numbers with more than 32-bit values.
w1 = -1.7639253531609754e-05
w2 = 0.21248294140374407
bias = 0.00021487994761540023
Prediction before training: 354.40908356099675
Progress: 0 % When w1 = -1.7639253531609754e-05 When w2 = 0.21248294140374407 Bias = 0.00021487994761540023 Prediction: 354.40908356099675 Loss = 3735.977235008907
Progress: 1 % When w1 = -1.7639253938957483e-05 When w2 = 0.21248294229112952 Bias = 0.00021487994849988702 Prediction: 354.4090837063776 Loss = 3735.9771875601227
Progress: 2 % When w1 = -1.7639254346305185e-05 When w2 = 0.21248294317851493 Bias = 0.0002148799493843738 Prediction: 354.40908385175857 Loss = 3735.9771401113376
Progress: 3 % When w1 = -1.763925475365291e-05 When w2 = 0.21248294406590035 Bias = 0.0002148799502688606 Prediction: 354.4090839971393 Loss = 3735.9770926625533
Progress: 4 % When w1 = -1.763925516100059e-05 When w2 = 0.21248294495328576 Bias = 0.0002148799511533474 Prediction: 354.4090841425203 Loss = 3735.9770452137677
Progress: 5 % When w1 = -1.7639255568348356e-05 When w2 = 0.21248294584067115 Bias = 0.00021487995203783416 Prediction: 354.4090842879009 Loss = 3735.9769977649876
...
Progress: 97 % When w1 = -1.7639293044322988e-05 When w2 = 0.21248302748009715 Bias = 0.0002148800334105857 Prediction: 354.4090976629358 Loss = 3735.9726324802295
Progress: 98 % When w1 = -1.7639293451670497e-05 When w2 = 0.21248302836748187 Bias = 0.00021488003429507178 Prediction: 354.40909780831606 Loss = 3735.9725850315153
Progress: 99 % When w1 = -1.7639293859017786e-05 When w2 = 0.2124830292548666 Bias = 0.00021488003517955786 Prediction: 354.4090979536972 Loss = 3735.972537582807
Prediction after training: 354.4090980990777
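To illustrate the overshoot mentioned in the notes above, here is a rough, self-contained sketch that reuses the same data and model but a deliberately oversized, made-up step (1e-12 instead of 10 ** -13.78). The loss grows right after the first update, which is exactly what the early-stop check in the training loop guards against:

# Illustrative sketch: an oversized step overshoots the minimum and the loss grows
x = [2104, 1600, 2400]
y = [399.9, 329.9, 369]
w1, w2, bias = 1, 1, 0
big_step = 1e-12                      # made-up value, far larger than 10 ** -13.78

for epoch in range(5):
    gw1 = gw2 = gb = err = 0
    for xi, yi in zip(x, y):
        diff = (xi ** 2) * w1 + xi * w2 + bias - yi
        err += diff ** 2
        gw1 += 2 * (xi ** 2) * diff
        gw2 += 2 * xi * diff
        gb += 2 * diff
    print(epoch, err)                 # the loss climbs from the second epoch onward
    w1 -= big_step * gw1
    w2 -= big_step * gw2
    bias -= big_step * gb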