Author: ๋˜ฅํด๋ฒ 
๋ฐ˜์‘ํ˜•

 

Table of Contents


  • Cost function graph
  • Applying the gradient descent algorithm
  • Applying the optimizer

 

Cost function graph


[Figure 1] Simplified hypothesis

 ์ด์ „ ํฌ์ŠคํŠธ์—์„œ ์šฐ๋ฆฌ๋Š” Cost function์˜ ๋ฏธ๋ถ„์„ ๊ฐ„๋‹จํ•˜๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด์„œ ์œ„์™€ ๊ฐ™์€ ์ถ•์•ฝ ์‹์„ ์‚ฌ์šฉํ•˜์˜€๋‹ค.

2019/08/07 - [Development/Machine Learning] - [Machine Learning Basics] 03. How the cost minimization algorithm for Linear Regression works

 


 

 ํŒŒ์ด์ฌ์„ ํ†ตํ•ด์„œ ์ง์ ‘ Cost function ์‹์„ ํ‘œํ˜„ํ•˜๊ณ  ๊ทธ๋ž˜ํ”„๋กœ ๊ทธ๋ ค ๋ณผ ๊ฒƒ์ธ๋ฐ, ๊ทธ๋ž˜ํ”„๋กœ ๊ทธ๋ฆฌ๊ธฐ ์œ„ํ•ด์„  mtplotlib์ด๋ผ๋Š” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์„ค์น˜๊ฐ€ ํ•„์š”ํ•˜๋‹ค.

 

Installing matplotlib

python -m pip install -U pip
python -m pip install -U matplotlib

 

 ํŒŒ์ด์ฌ ์ฝ”๋“œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

import tensorflow as tf
import matplotlib.pyplot as plt

X = [1, 2, 3]
Y = [1, 2, 3]

# W is a placeholder so a different value can be fed in on each run
W = tf.placeholder(tf.float32)

# Simplified hypothesis: H(x) = W * x (no bias term)
hypothesis = X * W

# cost(W) = mean((H(x) - y)^2)
cost = tf.reduce_mean(tf.square(hypothesis - Y))

sess = tf.Session()
sess.run(tf.global_variables_initializer())

W_val = []
cost_val = []

# Evaluate cost(W) for W = -3.0, -2.9, ..., 4.9
for i in range(-30, 50):
    feed_W = i * 0.1
    curr_cost, curr_W = sess.run([cost, W], feed_dict={W: feed_W})
    W_val.append(curr_W)
    cost_val.append(curr_cost)

plt.plot(W_val, cost_val)
plt.show()

To explain briefly: the code builds the hypothesis exactly as in the [Figure 1] formula, sweeps W from -3 to 5 in steps of 0.1, and plots the resulting cost curve.
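Note that this code uses the TensorFlow 1.x API (tf.placeholder and tf.Session were removed in 2.x). As a minimal sketch, assuming TensorFlow 2.x with eager execution, the same sweep could be written like this:

import tensorflow as tf
import matplotlib.pyplot as plt

X = tf.constant([1.0, 2.0, 3.0])
Y = tf.constant([1.0, 2.0, 3.0])

W_val, cost_val = [], []
for i in range(-30, 50):
    feed_W = i * 0.1
    # cost(W) is evaluated eagerly; no Session or placeholder needed
    cost = tf.reduce_mean(tf.square(feed_W * X - Y))
    W_val.append(feed_W)
    cost_val.append(cost.numpy())

plt.plot(W_val, cost_val)
plt.show()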

 

[Figure 2] cost(W) graph

๊ทธ๋ฆผ์„ ๋ณด๋ฉด ์•Œ๊ฒ ์ง€๋งŒ cost๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” W์˜ ๊ฐ’์€ 1์ด๋‹ค.


Applying the gradient descent algorithm


 

[Figure 3] Gradient descent

 Gradient descent algorithm์˜ ์‹์€ [๊ทธ๋ฆผ 3]๊ณผ ๊ฐ™๊ณ  ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•œ๋‹ค๋Š” ๊ฒƒ์€ W๊ฐ’์„ ์กฐ์ •ํ•ด ๋‚˜๊ฐ€๋ฉฐ cost์˜ ์ตœ์†Ÿ๊ฐ’์„ ์ฐพ์•„๋‚˜๊ฐ„๋‹ค๋ฅผ ์˜๋ฏธํ•œ๋‹ค.

 

learning_rate = 0.1
# dcost/dW (up to a constant factor): mean((W*x - y) * x)
gradient = tf.reduce_mean((W * X - Y) * X)
# One gradient descent step: W <- W - alpha * gradient
descent = W - learning_rate * gradient
update = W.assign(descent)

[๊ทธ๋ฆผ 3]์˜ ์‹์—์„œ ์•ŒํŒŒ ๊ฐ’์€ learning rate๋ฅผ ์˜๋ฏธํ•˜๊ณ  ํŒŒ์ด์ฌ ์ฝ”๋“œ๋กœ ๋‚˜ํƒ€๋‚ด๋ฉด ์œ„์™€ ๊ฐ™๋‹ค.

In TensorFlow, the assignment operator (=) cannot be used to give W a new value; instead, the assign method writes the value into the variable.
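As a minimal illustration of how assign behaves in TF 1.x graph mode (the variable v here is hypothetical, just for demonstration):

v = tf.Variable(0.0)
assign_op = v.assign(5.0)   # builds an op; v is unchanged until it runs
with tf.Session() as s:
    s.run(tf.global_variables_initializer())
    s.run(assign_op)        # now v holds 5.0
    print(s.run(v))         # -> 5.0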

 

Gradient descent algorithm์„ ์ ์šฉํ•œ ์ „์ฒด ์ฝ”๋“œ๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

import tensorflow as tf

x_data = [1, 2, 3]
y_data = [1, 2, 3]

# W starts from a random value and is updated on every step
W = tf.Variable(tf.random_normal([1]), name='weight')
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

hypothesis = X * W

cost = tf.reduce_mean(tf.square(hypothesis - Y))

# Hand-written gradient descent update (the formula from [Figure 3])
learning_rate = 0.1
gradient = tf.reduce_mean((W * X - Y) * X)
descent = W - learning_rate * gradient
update = W.assign(descent)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for step in range(21):
    sess.run(update, feed_dict={X: x_data, Y: y_data})
    print(step, sess.run(cost, feed_dict={X: x_data, Y: y_data}), sess.run(W))

learning_rate (the alpha value) is set to 0.1, and after that the formula from [Figure 3] is expressed directly in code.

๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

0 0.6850602 [0.61685693]
1 0.1948616 [0.79565704]
2 0.05542727 [0.8910171]
3 0.01576599 [0.94187576]
4 0.0044845473 [0.9690004]
5 0.0012756082 [0.98346686]
6 0.00036283964 [0.9911823]
7 0.000103206636 [0.99529725]
8 2.9357458e-05 [0.99749184]
9 8.350654e-06 [0.9986623]
10 2.3756713e-06 [0.99928653]
11 6.756982e-07 [0.9996195]
12 1.9224537e-07 [0.99979705]
13 5.4676246e-08 [0.99989176]
14 1.5574232e-08 [0.99994224]
15 4.4351474e-09 [0.9999692]
16 1.2629471e-09 [0.99998355]
17 3.5721945e-10 [0.99999124]
18 9.976494e-11 [0.99999535]
19 2.984753e-11 [0.9999975]
20 7.716494e-12 [0.9999987]

Gradient descent algorithm์„ ์ ์šฉํ•ด ๋‚˜๊ฐˆ์ˆ˜๋ก cost๋Š” 0(์ตœ์†Ÿ๊ฐ’)์— ๊ฐ€๊นŒ์›Œ์ง€๊ณ  W๋Š” 1์— ์ˆ˜๋ ดํ•˜๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค.


Optimizer ์ ์šฉ


 ์œ„์—์„œ ์šฐ๋ฆฌ๋Š” Gradient descent algorithm์„ ์ ์šฉํ•˜๊ธฐ์œ„ํ•ด ์•„๋ž˜์™€ ๊ฐ™์ด ๋ฏธ๋ถ„์‹์„ ์ฝ”๋“œ๋กœ ํ‘œํ˜„ํ•˜์˜€๋‹ค.

learning_rate = 0.1
gradient = tf.reduce_mean((W * X - Y) * X)
descent = W - learning_rate * gradient
update = W.assign(descent)

But working out the derivative by hand and translating it into code every time is quite tedious.

That is why TensorFlow provides tf.train.GradientDescentOptimizer; with it, the same thing can be expressed as simply as this:

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train = optimizer.minimize(cost)

Just create a GradientDescentOptimizer and pass the cost function into the optimizer's minimize method!
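Under the hood, minimize simply combines two steps that the tf.train.Optimizer API also exposes separately; as a sketch:

# Equivalent to train = optimizer.minimize(cost), split into its two halves
grads_and_vars = optimizer.compute_gradients(cost)  # list of (gradient, variable) pairs
train = optimizer.apply_gradients(grads_and_vars)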


 ์ด๋ฅผ ์ ์šฉํ•œ ์ „์ฒด ์ฝ”๋“œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

import tensorflow as tf

X = [1, 2, 3]
Y = [1, 2, 3]

# Start W far from the answer (at 5.0) so we can watch it converge
W = tf.Variable(5.0)

hypothesis = X * W

cost = tf.reduce_mean(tf.square(hypothesis - Y))

# The optimizer differentiates cost and updates W for us
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train = optimizer.minimize(cost)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for step in range(10):
    print(step, sess.run(W))
    sess.run(train)

The results are below.

0 5.0
1 1.2666664
2 1.0177778
3 1.0011852
4 1.000079
5 1.0000052
6 1.0000004
7 1.0
8 1.0
9 1.0

ํ•™์Šต์„ ๊ฑฐ๋“ญํ•  ์ˆ˜๋ก ์šฐ๋ฆฌ๊ฐ€ ์ฐพ๊ณ ์ž ํ•˜๋Š” W๊ฐ’์— ๊ฐ€๊นŒ์›Œ์ง€๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค!!! :)


์ฐธ๊ณ ์ž๋ฃŒ


Sung Kim, ML lab 03: TensorFlow implementation of Linear Regression cost minimization
https://youtu.be/Y0EF9VqRuEA

 

๋ฐ˜์‘ํ˜•