๊ธ€ ์ž‘์„ฑ์ž: ๋˜ฅํด๋ฒ .
๋ฐ˜์‘ํ˜•

๋ชฉ์ฐจ


  • Machine Learning(๋จธ์‹ ๋Ÿฌ๋‹) ์ด๋ž€?
  • Supervised Learning / Unsupervised Learning
  • Supervised learning์˜ ์ข…๋ฅ˜
    •  Regression(ํšŒ๊ท€)
    • Classification(๋ถ„๋ฅ˜)

 

Machine Learning(๋จธ์‹ ๋Ÿฌ๋‹) ์ด๋ž€?


๊ธฐ๊ณ„ ํ•™์Šต(ๆฉŸๆขฐๅญธ็ฟ’) ๋˜๋Š” ๋จธ์‹  ๋Ÿฌ๋‹(์˜์–ด: machine learning)์€ ์ธ๊ณต ์ง€๋Šฅ์˜ ํ•œ ๋ถ„์•ผ๋กœ, ์ปดํ“จํ„ฐ๊ฐ€ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ ๊ธฐ์ˆ ์„ ๊ฐœ๋ฐœํ•˜๋Š” ๋ถ„์•ผ๋ฅผ ๋งํ•œ๋‹ค. 
                                  -https://ko.wikipedia.org/wiki/%EA%B8%B0%EA%B3%84_%ED%95%99%EC%8A%B5

 ๋จธ์‹ ๋Ÿฌ๋‹์€ ์ผ์ข…์˜ ์†Œํ”„ํŠธ์›จ์–ด(ํ”„๋กœ๊ทธ๋žจ)์ด๋‹ค.

์‚ฌ์šฉ์ž์˜ ์ž…๋ ฅ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณด์—ฌ์ฃผ๋„๋ก ํ•˜๋Š” ํ”„๋กœ๊ทธ๋ž˜๋ฐ, ๊ฐœ๋ฐœ์ž๊ฐ€ '์ด๋Ÿฐ ํ™˜๊ฒฝ์—์„œ ์ด๋ ‡๊ฒŒ ๋ฐ˜์‘ํ•˜๊ณ  ๋˜ ๋‹ค๋ฅธ ํ™˜๊ฒฝ์—์„œ๋Š” ์ €๋ ‡๊ฒŒ ๋ฐ˜์‘ํ•˜๋ผ.'์™€ ๊ฐ™์ด ์„ค๊ณ„ํ•œ ํ”„๋กœ๊ทธ๋ž˜๋ฐ์„ explicit programming์ด๋ผ ํ•œ๋‹ค.

๊ทธ๋Ÿฐ๋ฐ ์–ด๋–ค ๋ถ€๋ถ„์—์„œ๋Š” explicit ํ•˜๊ฒŒ ๊ตฌํ˜„ํ•˜๊ธฐ ์–ด๋ ค์šด ๊ฒฝ์šฐ๊ฐ€ ์žˆ๋‹ค.

 

์˜ˆ๋ฅผ ๋“ค์–ด, ์ด๋ฉ”์ผ ์ŠคํŒธ ํ•„ํ„ฐ ๊ฐ™์€ ๊ฒฝ์šฐ๋Š” ์ŠคํŒธ์— ํ•ด๋‹น๋˜๋Š” ๊ทœ์น™๋“ค์ด ๋„ˆ๋ฌด ๋งŽ๊ธฐ ๋•Œ๋ฌธ์— ๊ฐœ๋ฐœ์ž๊ฐ€ ์˜ˆ์ƒํ•˜๊ธฐ ์–ด๋ ต๋‹ค.

๋˜ ๋‹ค๋ฅธ ์˜ˆ๋กœ, ์‚ฌ๋žŒ ์—†์ด ์ž๋™์ฐจ๊ฐ€ ์•Œ์•„์„œ ์ฃผํ–‰ํ•˜๋Š” ์ž๋™ ์ฃผํ–‰ ์ฐจ๋Ÿ‰ ๊ฐ™์€ ๊ฒƒ์€ ๊ฐœ๋ฐœ์ž๊ฐ€ ๊ทœ์น™์„ ์•Œ๊ณ  ์„ค๊ณ„ํ•˜๊ธฐ์—” ๊ทœ์น™๋“ค์ด ๋„ˆ~~๋ฌด๋‚˜ ๋งŽ๋‹ค.

 

 1959๋…„๋„ ๋ฏธ๊ตญ ์ปดํ“จํ„ฐ ๊ณผํ•™์ž Arthur Samuel์ด๋ผ๋Š” ์‚ฌ๋žŒ์€ ์ด๋Ÿฌํ•œ ์ƒ๊ฐ์„ ํ–ˆ๋‹ค.

'์šฐ๋ฆฌ๊ฐ€ ์ผ์ผ์ด ํ”„๋กœ๊ทธ๋ž˜๋ฐํ•˜๋Š”๊ฒŒ ์•„๋‹ˆ๋ผ, 

์–ด๋–ค ์ž๋ฃŒ์—์„œ ๋˜๋Š” ์–ด๋–ค ํ˜„์ƒ์—์„œ ์ž๋™์ ์œผ๋กœ ๋ฐฐ์šฐ๋ฉด ์–ด๋–จ๊นŒ..? '

(Field of study that gives computers the ability to learn without being explicity programmed)

์ด๊ฒƒ์ด ๋ฐ”๋กœ ๋จธ์‹  ๋Ÿฌ๋‹์ด๋‹ค.

์ฆ‰, ๊ฐœ๋ฐœ์ž๊ฐ€ ์•„๋‹Œ ํ”„๋กœ๊ทธ๋žจ ์ž์ฒด๊ฐ€ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณด๊ณ  ํ•™์Šตํ•ด์„œ ๋ฐฐ์šฐ๋Š” ๊ฒƒ์„ ๋จธ์‹ ๋Ÿฌ๋‹์ด๋ผ ํ•œ๋‹ค.

 

Supervised Learning, training set / Unsupervised Learning


๋จธ์‹  ๋Ÿฌ๋‹์„ ํ•  ๋•Œ ํ”„๋กœ๊ทธ๋žจ์€ ํ•™์Šต์„ ํ•ด์•ผํ•œ๋‹ค. ํ•™์Šต์„ ํ•˜๊ธฐ ์œ„ํ•ด์„  ๋ฐ์ดํ„ฐ๊ฐ€ ์ฃผ์–ด์ ธ์•ผ ํ•˜๋Š”๋ฐ ํ•™์Šต์„ ํ•˜๋Š” ๋ฐฉ๋ฒ•์—๋Š”

Supervised learning, Unsupervised learning์ด ์žˆ๋‹ค.

 

  • Supervised learning

label๋“ค์ด ์ •ํ•ด์ ธ์žˆ๋Š” ๋ฐ์ดํ„ฐ(= training set)๋ฅผ ๊ฐ€์ง€๊ณ  ํ•™์Šต์„ ํ•˜๋Š” ๊ฒƒ์„ Supervised learning์ด๋ผ ํ•œ๋‹ค.

[๊ทธ๋ฆผ1] Supervised learning์˜ ์˜ˆ (http://cs231n.github.io/classification/)

 ์˜ˆ๋ฅผ ๋“ค์–ด, ์–ด๋–ค ์ด๋ฏธ์ง€๋ฅผ ์ฃผ๋ฉด ๊ทธ ์ด๋ฏธ์ง€๊ฐ€ ๊ณ ์–‘์ด์ผ๊นŒ? ๊ฐœ์ผ๊นŒ? ์ž๋™์œผ๋กœ ์•Œ์•„๋‚ด๋Š” ํ”„๋กœ๊ทธ๋žจ์ด ์žˆ๋‹ค ๊ฐ€์ •ํ•˜์ž.

(์ด ํ”„๋กœ๊ทธ๋žจ ์—ญ์‹œ ๋จธ์‹ ๋Ÿฌ๋‹์„ ํ†ตํ•ด ๋งŒ๋“ค์–ด์ง„ ํ”„๋กœ๊ทธ๋žจ์ผ ๊ฒƒ์ด๋‹ค)

์‚ฌ๋žŒ๋“ค์ด ๊ณ ์–‘์ด ๊ทธ๋ฆผ์„ ์—ฌ๋Ÿฌ ๊ฐœ ๋ชจ์•„์„œ ์ฃผ๊ณ  ์ด ์‚ฌ์ง„์€ ๊ณ ์–‘์ด๋ผ๊ณ (cat์ด๋ผ๋Š” label๋ฅผ ๋ถ™์ž„) ํ”„๋กœ๊ทธ๋žจ์—๊ฒŒ ํ•™์Šต์‹œํ‚ค๊ณ  ๊ฐœ ๊ทธ๋ฆผ์„ ์—ฌ๋Ÿฌ ๊ฐœ ๋ชจ์•„์„œ ์ด ์‚ฌ์ง„์€ ๊ฐœ(dog๋ผ๋Š” label์„ ๋ถ™์ž„)๋ผ๊ณ  ํ”„๋กœ๊ทธ๋žจ์—๊ฒŒ ํ•™์Šต์„ ์‹œํ‚ค๋Š” ๋ฐฉ์‹์„ Supervised learning์ด๋ผ ํ•  ์ˆ˜ ์žˆ๊ฒ ๋‹ค.

 

  • Training set
X Y
3, 6, 9 3
2, 5, 7 2
2, 3, 5 1

์œ„ ํ‘œ์™€ ๊ฐ™์ด Label(Y)์ด ์ •ํ•ด์ ธ ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ํ”„๋กœ๊ทธ๋žจ์ด ํ•™์Šต์„ ํ•˜๋ฉด Model์ด ์ƒ๊ธด๋‹ค.

ํ•™์Šตํ•œ Model์— ๋Œ€ํ•ด์„œ, ์šฐ๋ฆฌ๊ฐ€ ๋ชจ๋ฅด๋Š” X๊ฐ€ ์žˆ๊ณ  ๊ทธ ๊ฐ’์ด X = [9, 3, 6] ๋ผ๊ณ  ๊ฐ€์ •ํ•˜๋ฉด ์ด X๋ฅผ ํ”„๋กœ๊ทธ๋žจ์—๊ฒŒ ์ „๋‹ฌํ•˜๋ฉด ํ”„๋กœ๊ทธ๋žจ์€ ์•„๋งˆ Y๋Š” 3์ด๋ผ๊ณ  ๋‹ตํ•  ๊ฒƒ์ด๋‹ค.

 ์ด๋Ÿฌํ•œ ํ•™์Šต ๋ฐฉ๋ฒ•์„ Supervised learning์ด๋ผ ํ•  ์ˆ˜ ์žˆ๊ณ , ํ•™์Šตํ•œ ๋ฐ์ดํ„ฐ(ํ‘œ)๋ฅผ training set์ด๋ผ ํ•œ๋‹ค.

 

  • Unsupervised learning

 ์œ„์˜ Supervised learning๊ณผ ๋ฐ˜๋Œ€๋กœ ์ผ์ผ์ด label์„ ์ค„ ์ˆ˜ ์—†๋Š” ํ˜•ํƒœ๋ฅผ Unsupervised learning์ด๋ผ ํ•œ๋‹ค.

์˜ˆ๋ฅผ ๋“ค์–ด, Google New๊ฐ€ ์žˆ๋Š”๋ฐ,  ์ด ๋‰ด์Šค๋Š” ์ž๋™์ ์œผ๋กœ ์œ ์‚ฌํ•œ ๋‰ด์Šค๋“ค์„ ๊ทธ๋ฃนํ•‘ํ•œ๋‹ค. ์ด๋Ÿฐ ๊ฒฝ์šฐ์—๋Š” label์„ ๋‹ค ์ •ํ•ด์ฃผ๊ธฐ ์–ด๋ ต๋‹ค. ๋”ฐ๋ผ์„œ ์•Œ์•„์„œ ์œ ์‚ฌํ•œ ๋‰ด์Šค๋ผ๋ฆฌ ๋ชจ์•„์•ผ ํ•œ๋‹ค.

๋˜ ๋‹ค๋ฅธ ์˜ˆ๋กœ, ๋‹จ์–ด๋“ค ๊ฐ€์šด๋ฐ๋„ ๋น„์Šทํ•œ ๋‹จ์–ด๋“ค์„ ๋ชจ์œผ๋Š” ๋จธ์‹ ๋Ÿฌ๋‹ ํ”„๋กœ๊ทธ๋žจ์ด ์žˆ๋‹ค๋ฉด ์ด ๊ฒฝ์šฐ๋Š” ์—ญ์‹œ label์„ ๋งŒ๋“ค์–ด์„œ ํ•™์Šต์„ ์‹œํ‚ค๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ ๋‹จ์–ด๋“ค์„ ๊ฐ€์ง€๊ณ  ์Šค์Šค๋กœ ํ•™์Šตํ•˜๋ฏ€๋กœ Unsupervised learning์ด๋ผ ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

Supervised learning์˜ ์ข…๋ฅ˜


Supervised learning์€ ๋ฌธ์ œ์˜ ํƒ€์ž…์— ๋”ฐ๋ผ Regression problem, Classification problem๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ๋‹ค.

์„ค๋ช…ํ•˜๊ธฐ ์‰ฝ๊ฒŒ ์‹œํ—˜์˜ ์„ฑ์ ์„ ์˜ˆ์ธกํ•˜๋Š” ์‹œ์Šคํ…œ์ด ์žˆ๋‹ค๊ณ  ๊ฐ€์ •ํ•ด๋ณด์ž.

 

  • Regression

์‹œํ—˜ ์ ์ˆ˜์˜ ๋ฒ”์œ„๋Š” 0์ ๋ถ€ํ„ฐ 100์ ๊นŒ์ง€ ๋„“์€ ๋ฒ”์œ„๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์„ ํ…๋ฐ(continous ํ•œ ์—ฐ์†์„ฑ์„ ๊ฐ–๊ณ  ์žˆ๋Š”๋ฐ) ์ด๋Ÿฐ ์˜ˆ์ธก์„ ํ•˜๋Š” ๊ฒƒ์„ Regression์ด๋ผ ํ•  ์ˆ˜ ์žˆ๋‹ค.

x (hours) y (score)
10 90
9 80
3 50
2 30

๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šต์‹œ์ผฐ์„ ๋•Œ, ์šฐ๋ฆฌ๊ฐ€ x = 7์ด๋ผ๊ณ  ์งˆ๋ฌธํ•˜๋ฉด ํ”„๋กœ๊ทธ๋žจ์€ y = 75์™€ ๊ฐ™์ด ์˜ˆ์ธกํ•  ์ˆ˜ ์žˆ๋Š”,

์ฆ‰ Continouse ํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ง€๊ณ  ํ•™์Šตํ•˜์—ฌ ๊ฒฐ๊ณผ๋ฅผ ์˜ˆ์ธกํ•  ์ˆ˜ ์žˆ๋Š” ๋ชจ๋ธ์„ Regression ๋ชจ๋ธ์ด๋ผ ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

  • Classification

Classification์€ Regression๊ณผ๋Š” ๋‹ฌ๋ฆฌ ์—ฐ์†์„ฑ์ด ์—†์ด ๋ช‡ ๊ฐ€์ง€ ๊ฐ’์œผ๋กœ ๋Š์–ด์ง€๋Š” ๊ฒฝ์šฐ๋‹ค.

Classification์˜ ์ข…๋ฅ˜๋„ Binary classification, Multi-label classification์œผ๋กœ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋‹ค.

Binary classification

๋ฌธ์ œ๋ฅผ ๋‹จ์ˆœํ™”์‹œ์ผœ์„œ ์ ์ˆ˜๋กœ ๋ถ„๋ฅ˜ํ•˜์ง€ ๋ง๊ณ , ์‹œํ—˜์ด PASS๋ƒ FAIL์ด๋ƒ ๋‘ ๊ฐ€์ง€๋กœ ๋‚˜๋ˆ„์–ด์„œ ์˜ˆ์ธก์„ ํ•˜๋Š” ๊ฒƒ์„ Binary classification์ด๋ผ ํ•œ๋‹ค.

x (hours) y (pass/fail)
10 P
9 P
3 F
2 F

Multi-label classification

์ด๋ฒˆ์—๋Š” ์‹œํ—˜์„ ๋‹ค์Œ๊ณผ ๊ฐ™์ด A, B, C, E, F๋ผ๋Š” ํ•™์ ์œผ๋กœ, ์˜ˆ์ธกํ•  ์ˆ˜ ์žˆ๋Š” ๋ฒ”์œ„๋ฅผ ์ข€ ๋” ์—ฌ๋Ÿฌ ๊ฐœ๋กœ ๋‚˜๋ˆ„๋Š” ๊ฒƒ์„ Multi-label classification์ด๋ผ ํ•œ๋‹ค.

x (hours) y (gradel)
10 A
9 B
3 D
2 F

 

์ฐธ๊ณ  ์ž๋ฃŒ


https://www.youtube.com/watch?v=qPMeuL2LIqY
KimSung๋‹˜ - ML lec 01. ๊ธฐ๋ณธ์ ์ธ Machine Learning ์˜ ์šฉ์–ด์™€ ๊ฐœ๋… ์„ค๋ช…

https://bcho.tistory.com/966
์กฐ๋Œ€ํ˜‘๋‹˜ ๋ธ”๋กœ๊ทธ

 

๋ฐ˜์‘ํ˜•