
CS231n Lecture 03. Loss Functions and Optimization

by 자근기적밍기적 2024. 1. 16.

Contents

  • Loss Functions (SVM, softmax)
  • Regularization
  • Optimization (Random Search, Gradient descent)

 

 

After setting up the classifier f(x, W), we need to ...

  1. Define a loss function that quantifies our unhappiness with the scores across the training data.
  2. Come up with a way of efficiently finding the parameters that minimize the loss function. (optimization)

 

 

Loss Function

Given a dataset of examples $\{(x_i, y_i)\}^N_{i=1}$, where $x_i$ is the image and $y_i$ is the label,

the loss over the dataset is the average of the per-example losses:

$L = \frac{1}{N}\sum_i{L_i(f(x_i,W), y_i)}$

 

 

(multiclass) Hinge loss:

Given an example ($x_i$, $y_i$) where $x_i$ is the image and where $y_i$ is the label,

and using the shorthand for the scores vector: s = f($x_i$, W)

 

Hinge loss is 

$L_i = \sum_{j\neq y_i}max(0, s_j - s_{y_i}+1)$

 

+) The 1 is the margin: we want the correct-class score to be higher than every other class score by at least this margin.

    The exact value 1 is somewhat arbitrary, since the overall scale of the scores depends on the scale of W anyway.

 

Source: cs231n_2017_lecture3.pdf

 

Example

Source: cs231n_2017_lecture3.pdf

The SVM loss of cat 
= the loss from the car + the loss from the frog
= max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) 
= max(0, 2.9) + max(0, -3.9)
= 2.9 + 0
= 2.9
The SVM loss of car
= the loss from the cat + the loss from the frog
= max(0, 1.3-4.9+1) + max(0, 2.0-4.9+1) 
= max(0, -2.6) + max(0, -1.9)
= 0 + 0 
= 0
The SVM loss of frog
= the loss from the cat + the loss from the car
= max(0, 2.2-(-3.1)+1) + max(0, 2.5-(-3.1)+1)
= max(0, 6.3) + max(0, 6.6)
= 6.3 + 6.6
= 12.9
The average SVM loss over full dataset 
= $\frac{1}{3}(2.9+0+12.9)$
= 5.27
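
As a quick check (my own NumPy sketch, not the lecture's code), the per-example losses and their average can be recomputed directly from the score columns above:

import numpy as np

# score matrix from the slide: one column per image, one row per class (cat, car, frog)
scores = np.array([[3.2, 1.3,  2.2],
                   [5.1, 4.9,  2.5],
                   [-1.7, 2.0, -3.1]])
labels = np.array([0, 1, 2])   # correct class of each image: cat, car, frog

losses = []
for i, y in enumerate(labels):
    s = scores[:, i]
    margins = np.maximum(0, s - s[y] + 1)   # hinge terms for every class
    margins[y] = 0                          # the correct class contributes nothing
    losses.append(margins.sum())

print(losses)           # -> 2.9, 0.0, 12.9 (up to floating point)
print(np.mean(losses))  # -> 5.27 (approximately)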

 

 

Questions of SVM

Q1: What happens to the loss if the car scores change a bit?

       No change in the loss for the car image, because the correct (car) score is already higher than the other scores by more than the margin of 1.

Q2: What is the min/max possible loss?

       0 ~ infinity

Q3: At initialization W is small, so all scores s are approximately 0. What is the loss?

       # of classes - 1, since each incorrect class contributes max(0, 0 - 0 + 1) = 1. A useful sanity check at the start of training.

Q4: What if the sum was over all classes? (including j = y_i)

       The loss increases by 1.

Q5: What if we used mean instead of sum?

        No change in which W is best; the loss is only rescaled by a constant.

Q6: What if we used the squared term? ($L_i = \sum_{j\neq y_i}\max(0, s_j - s_{y_i}+1)^2$)

       It changes: the squared hinge loss penalizes large violations much more heavily, so it is a different classification algorithm.

 

 

Code for SVM loss

import numpy as np

def L_i_vectorized(x, y, W):
    # class scores: s = W x
    scores = W.dot(x)
    # hinge terms max(0, s_j - s_{y_i} + 1) for every class j
    margins = np.maximum(0, scores - scores[y] + 1)
    # the correct class should not contribute to the loss
    margins[y] = 0
    loss_i = np.sum(margins)
    return loss_i
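
As a toy sanity check (my own, not from the slides), choose W and x so that W.dot(x) reproduces the cat column of the example above; the function then returns the 2.9 computed by hand:

W = np.array([[3.2], [5.1], [-1.7]])   # picked so that W.dot(x) = [3.2, 5.1, -1.7]
x = np.array([1.0])                    # dummy 1-dimensional "image"
print(L_i_vectorized(x, y=0, W=W))     # 2.9, the cat loss from the worked example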

 

 

Suppose that we found a W such that L = 0. Is this W unique?  No! 2W also has L = 0!
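
A small numerical illustration of this (my own): take the car column of the example, where the loss is already 0, and double the scores as if W had been doubled; the margins only grow, so the loss stays 0.

import numpy as np

s = np.array([1.3, 4.9, 2.0])            # car column, correct class y = 1, loss is 0
for scale in (1.0, 2.0):                 # scaling W by 2 scales all scores by 2
    m = np.maximum(0, scale * s - scale * s[1] + 1)
    m[1] = 0
    print(scale, m.sum())                # prints a loss of 0.0 for both scales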

 


 

Regularization

Among competing hypotheses,
the simplest is the best.
- William of Ockham
  • Data loss: Model predictions should match training data. $\frac{1}{N}\sum_{i=1}^NL_i(f(x_i, W), y_i)$
  • Regularization: Model should be simple, so it works on test data. $\lambda R(W)$

 

Therefore, the loss function is 

$L(W)=\frac{1}{N}\sum_{i=1}^NL_i(f(x_i, W), y_i)+\lambda R(W)$
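
A minimal sketch of this objective in code, assuming L2 regularization $R(W)=\sum_k\sum_l W_{k,l}^2$ and reusing the L_i_vectorized hinge loss from above (my own illustration, not the lecture's code):

import numpy as np

def full_loss(X, y, W, lam):
    # X: D x N data matrix (one column per example), y: N labels, W: C x D weights
    N = X.shape[1]
    data_loss = sum(L_i_vectorized(X[:, i], y[i], W) for i in range(N)) / N
    reg_loss = lam * np.sum(W * W)   # L2 regularization: sum of squared weights
    return data_loss + reg_loss

Here lam stands for the $\lambda$ above; it trades off fitting the training data against keeping W simple.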

 

Types of Regularization

  • L2 regularization: $R(W)=\sum_k\sum_l W_{k,l}^2$
  • L1 regularization: $R(W)=\sum_k\sum_l |W_{k,l}|$
  • Elastic net (L1 + L2)
  • Dropout, Batch normalization, ...

 

 

+++ Listen to the part about L2 regularization one more time

 

 


 

Softmax Classifier 

 

Scores = unnormalized log probabilities of the classes 

$P(Y=k|X=x_i)= \frac{e^{s_k}}{\sum_je^{s_j}}\ \textup{(softmax function)} \ where \ s=f(x_i;W)$

 

Want to maximize the log likelihood or to minimize the negative log likelihood of the correct class:

$L_i = -logP(Y=y_i|X=x_i)$

 

In summary,

$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$
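
A small NumPy sketch of this loss (my own, not the lecture's code). Subtracting the maximum score before exponentiating does not change the probabilities but avoids numerical overflow:

import numpy as np

def softmax_cross_entropy(s, y):
    # s: score vector for one example, y: index of the correct class
    s = s - np.max(s)                   # shift for numerical stability (probabilities unchanged)
    p = np.exp(s) / np.sum(np.exp(s))   # softmax probabilities
    return -np.log(p[y])                # negative log likelihood of the correct class

# sanity check: with all-zero scores every class gets probability 1/C,
# so the loss is log(# of classes)
print(softmax_cross_entropy(np.zeros(10), y=0))   # log(10) ≈ 2.30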

 

 

Example

1. unnormalized log probabilities

Source: cs231n_2017_lecture3.pdf

 

 

2. Convert the log probabilities to (unnormalized) probabilities with the exponential function

       unnormalized log prob.    unnormalized prob.    prob.
cat    3.2                       e^{3.2} = 24.5
car    5.1                       e^{5.1} = 164.0
frog   -1.7                      e^{-1.7} = 0.18

 

 

3. Normalize them to get probabilities

       unnormalized log prob.    unnormalized prob.    prob.
cat    3.2                       e^{3.2} = 24.5        24.5 / (24.5 + 164.0 + 0.18) = 0.13
car    5.1                       e^{5.1} = 164.0       164.0 / (24.5 + 164.0 + 0.18) = 0.87
frog   -1.7                      e^{-1.7} = 0.18       0.18 / (24.5 + 164.0 + 0.18) = 0.00

 

 

4. Compute loss with "-log(prob)"

    L_cat = -log(0.13) = 0.89
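
Checking these numbers with NumPy (my own check; note that np.log is the natural log, which gives about 2.04 here, while the base-10 log reproduces the 0.89 shown above):

import numpy as np

s = np.array([3.2, 5.1, -1.7])       # cat, car, frog scores
p = np.exp(s) / np.sum(np.exp(s))
print(p)                              # ≈ [0.13, 0.87, 0.00]
print(-np.log(p[0]))                  # ≈ 2.04 with the natural log
print(-np.log10(p[0]))                # ≈ 0.89 with the base-10 log, matching the value above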

 

 

Questions of cross-entropy loss

Q1: What is the min/max possible loss L_i?

      0 ~ infinity (0 is only approached when the correct class gets probability 1)

Q2: Usually at initialization W is small, so all scores s are approximately 0. What is the loss?

      log(# of classes), since every class then gets probability 1 / (# of classes). A useful sanity check at the start of training.

 

 

Difference between SVM and Softmax

Q. Suppose I take a datapoint and jiggle it a bit. What happens to the loss in both cases?

A. With the hinge loss, once the correct-class score beats every other score by more than the margin of 1, small changes make no difference: the loss stays 0.

    With the cross-entropy loss, the loss always changes: it keeps pushing the probability of the correct class towards 1, so it always wants the correct score to go higher and the others lower.
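
A tiny illustration of this difference (my own example): start from scores where the correct class already wins by more than the margin, then push the correct score even higher. The hinge loss is 0 before and after, while the cross-entropy loss keeps shrinking.

import numpy as np

def hinge(s, y):
    m = np.maximum(0, s - s[y] + 1)
    m[y] = 0
    return m.sum()

def cross_entropy(s, y):
    p = np.exp(s - np.max(s)) / np.sum(np.exp(s - np.max(s)))
    return -np.log(p[y])

s1 = np.array([5.0, 2.0, 1.0])   # correct class 0 already beats the rest by more than 1
s2 = np.array([7.0, 2.0, 1.0])   # jiggle: correct score pushed even higher
print(hinge(s1, 0), hinge(s2, 0))                   # 0.0 0.0  -> hinge loss unchanged
print(cross_entropy(s1, 0), cross_entropy(s2, 0))   # ≈ 0.066 -> ≈ 0.009, still decreasing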

 


 

Optimization

Q. Then, how do we find the W that minimizes the loss?  Optimization!

 

1. Random search: try many random W's and keep the best one. A bad idea, since the resulting accuracy is poor.

2. Follow the slope: gradient descent. The gradient of the loss tells us the direction of steepest increase, so we repeatedly step in the opposite direction (see the sketches below).

 

 

How to compute the gradient? Numerically, with finite differences (approximate and slow, but easy to write), or analytically, with calculus (exact and fast, but more error-prone to derive).
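
A minimal sketch of the numerical (finite-difference) gradient, assuming some loss function f that takes a float weight array W (my own illustration):

import numpy as np

def numerical_gradient(f, W, h=1e-5):
    # centered differences: dL/dW_i ≈ (f(W + h e_i) - f(W - h e_i)) / (2h)
    grad = np.zeros_like(W)
    for i in range(W.size):
        old = W.flat[i]
        W.flat[i] = old + h
        f_plus = f(W)
        W.flat[i] = old - h
        f_minus = f(W)
        W.flat[i] = old                  # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * h)
    return grad

Each coordinate needs two loss evaluations, which is exactly the problem raised below.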

 

 

But... the numerical gradient needs a loss evaluation for every single parameter, which is approximate and far too slow. In practice we use the analytic gradient, and keep the numerical gradient only as a correctness check (gradient check).
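
With a gradient in hand, vanilla gradient descent is just the repeated update W ← W − step_size · ∇L(W). A minimal runnable sketch, where a toy quadratic loss stands in for the real SVM/softmax loss (my own illustration):

import numpy as np

def toy_loss(W):
    return np.sum(W * W)        # stand-in for the real training loss

def toy_grad(W):
    return 2 * W                # its analytic gradient

W = 0.0001 * np.random.randn(10, 3073)   # small random init (CIFAR-10-sized W)
step_size = 1e-2                          # learning rate: a key hyperparameter

for t in range(100):
    W = W - step_size * toy_grad(W)       # step against the gradient

print(toy_loss(W))                        # the loss has shrunk towards 0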


Reference

https://www.youtube.com/watch?v=h7iBpEHGVNc&list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv