
project:eve

Daily

23.02.05 Ensemble

eveee 2023. 2. 5. 22:43

Studying here: Maxim Plant Hannam

 

I've almost finished studying machine learning. What's left is ensembles and Naive Bayes. Ensembles feel like "real" machine learning, so I'm enjoying both the sense of accomplishment as the studying winds down and the feeling of getting completely absorbed in the world of machine learning. So let's write today's post in a good mood too.

 

Ensemble


An ensemble is a technique that either reuses the analysis data several times or uses several models. In other words, you apply several models to one dataset, or one model to several datasets (samples), and combine the results into a single, stronger model.
Because of that, an ensemble generally performs better than a single model, and most of the top entries in data analysis competitions are said to use ensemble techniques.
The downsides are that training is slow and the results can be hard to interpret.

 

(**Books and some blog posts say that an ensemble gathers weak learners to build a strong learner. I didn't understand what a 'weak learner' was at first, but since an ensemble combines the results of several models, I think each of those individual models is what gets called a weak learner. My guess is that they're called 'weak' to contrast with the strong learner you get after combining them.

Couldn't we just say that we combine the results of ordinary learners to make a stronger one? Calling them 'weak' makes every model I've learned so far sound defective.)

 

 

 

1) Types

 

1- Voting

Voting is a technique that applies several different kinds of machine learning models to the same data and combines their results.

(TMI, but I'm studying with the book '파이썬 한 권으로 끝내기', and even as a data-analysis newbie I often spot wrong content, and in bad cases concepts that look copy-pasted, which is pretty annoying... 😡 In the ensemble chapter, too, voting is explained inside the bagging section. As far as I can tell bagging and voting are completely different concepts, and Google results explain them as separate categories, so I don't know why the book lumps them together.)

 

There are two kinds of voting: hard voting and soft voting. Hard voting takes a majority vote over the models' predicted classes and returns the class with the most votes. Soft voting averages the models' predicted probabilities for each class and returns the class with the highest average. It is much easier to understand if you think of them as deciding with the predict values and the predict_proba values, respectively.
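The difference between the two rules can be sketched without any library. This is only an illustration: the three "models" and their probabilities below are made up, and the helper names hard_vote and soft_vote are hypothetical.

```python
from collections import Counter

# Hypothetical predict_proba outputs of three classifiers for ONE sample.
# Each row is [P(class 0), P(class 1)]; the numbers are invented.
probas = [
    [0.45, 0.55],  # model A leans slightly toward class 1
    [0.40, 0.60],  # model B leans slightly toward class 1
    [0.95, 0.05],  # model C is very sure about class 0
]


def hard_vote(probas):
    # Each model casts one vote for its most probable class (its predict value).
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas):
    # Average each class probability over the models, then pick the best class.
    n_classes = len(probas[0])
    means = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=means.__getitem__)


hard_result = hard_vote(probas)
soft_result = soft_vote(probas)
print(hard_result, soft_result)
```

Here the two rules disagree: two of three models vote for class 1, but the one confident model pulls the averaged probability toward class 0. In scikit-learn the same choice is made with the voting='hard' or voting='soft' argument of VotingClassifier.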

 

 

 

2- Bagging

Bagging draws several samples from the data, trains a separate copy of the same kind of model on each sample, and combines the outputs. The default base algorithm is a decision tree. For classification, the final class is decided by voting over the outputs. Random forest is the best-known model built on bagging.

 

 

 

3- Boosting

Where bagging trains its models in parallel, boosting trains them sequentially. After the first model is fit, the cases where its output differs from the actual value are corrected for (for example, by giving them more weight) before the second model is fit. Repeating this pushes the accuracy as high as possible.

*However, boosting carries a risk of overfitting the training data.
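One concrete form this "correction" can take is the sample reweighting used by classic discrete AdaBoost. The sketch below shows a single round on five made-up samples; it is meant to show the idea, not to reproduce any library's exact numbers (scikit-learn's AdaBoostClassifier uses a multi-class variant, so its values can differ).

```python
import math

# Hypothetical outcome of the first weak learner on 5 training samples:
# True = classified correctly, False = misclassified.
correct = [True, True, False, True, True]

n = len(correct)
weights = [1 / n] * n  # every sample starts with the same weight

# Weighted error rate of this learner.
err = sum(w for w, ok in zip(weights, correct) if not ok)

# How much say this learner gets in the final vote.
alpha = 0.5 * math.log((1 - err) / err)

# Raise the weight of misclassified samples, lower the others, renormalise.
weights = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
total = sum(weights)
weights = [w / total for w in weights]

print(round(alpha, 3), [round(w, 3) for w in weights])
```

After this round the single misclassified sample carries half of the total weight, so the second learner is pushed to get it right.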

 

4- Random forest

A technique that injects more randomness into how the models are trained than bagging or boosting does. It uses a large number of decision trees. At first glance it looks no different from bagging: bagging also draws its samples by bootstrapping, and its default algorithm is also a decision tree. The difference is said to be the following.

 

*** Difference between bagging and random forest

Like bagging, random forest uses decision trees as its algorithm, but every time a split is considered inside a tree it does not look at all of the independent variables; it splits on only a randomly selected subset of them.
Suppose the data has one very strong predictor a and some moderate predictors b, c, d... Most of the bagged decision trees will split on a near the top, so even though their samples differ, their outputs end up similar (highly correlated). Combining such similar outputs is said to do little to reduce variance.
In a random forest, by contrast, the group of independent variables available for choosing each split differs from split to split, so the trees are less alike, and the combined result varies less and is more stable. Put the other way round, if a random forest considers all of the independent variables at every split, it amounts to bagging.

 

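3) Main attributes : I've only listed the ones used most often in these functions

n_estimators : number of base estimators. The default depends on the class (in scikit-learn: 100 for random forest, 10 for bagging, 50 for AdaBoost). Increasing it means more models are fit, which usually improves accuracy up to a point; after that the gain levels off while training time keeps growing.
base_estimator : (bagging, boosting) the algorithm to use. The default is a decision tree. From scikit-learn 1.2 this parameter is named estimator.

feature_importances_ : (boosting, random forest) outputs the importance of each independent variable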
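To get a feel for how n_estimators behaves, the following sketch fits random forests of different sizes and prints the test accuracy for each. It assumes scikit-learn's bundled load_breast_cancer data (not the CSV used in this post), and the sizes 1, 10, 100 and 300 are arbitrary choices for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, train_size=0.7, random_state=1, stratify=y
)

scores = {}
for n in (1, 10, 100, 300):
    clf = RandomForestClassifier(n_estimators=n, random_state=0)
    # Accuracy on the held-out 30% for a forest of n trees.
    scores[n] = clf.fit(train_x, train_y).score(test_x, test_y)
    print(n, round(scores[n], 3))
```

Typically the jump from one tree to a few dozen matters far more than the jump from 100 to 300, which is why adding trees indefinitely mostly costs time.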

 

 

 

4) Code

 

1- Bagging : I analysed the breast cancer dataset (also bundled with scikit-learn), here read from a CSV file.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the breast cancer data (the path is local to my machine)
cancer = pd.read_csv('/Users/eve/Downloads/jupyter notebook/files/classification/breast-cancer.csv')


# Column types and non-null counts (info() prints by itself)
cancer.info()


# Class balance of the target
plt.figure()
sns.histplot(x='diagnosis', data=cancer, hue='diagnosis')

# Two of the features, coloured by diagnosis
sns.relplot(x='area_mean', y='texture_mean', hue='diagnosis', data=cancer)

There don't seem to be any missing values.

 

Set diagnosis as the dependent variable and split the data (7:3).

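from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier

# train_x, test_x, train_y, test_y come from the 7:3 split described above.
# The keyword is `estimator` in scikit-learn >= 1.2 (it was `base_estimator` before).
clf = BaggingClassifier(estimator=DecisionTreeClassifier())

# Fit on the training data, then predict the test data
pred = clf.fit(train_x, train_y).predict(test_x)

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

# Precision, recall, F1 and accuracy in one report
clf_report = classification_report(test_y, pred)

# Counts of correct and incorrect predictions per class
cm = confusion_matrix(test_y, pred)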

 

First import bagging and the decision tree that bagging will use as its algorithm. Then fit the model on the training data and predict on the test data.

To check accuracy and the other performance metrics I called classification_report. With this one call you can check accuracy, precision, recall and so on all at once, which is much more convenient than calling each metric separately.


Whoa.. the results are too accurate.

 

The book didn't use all of the independent variables; it picked just 2. I fed in every variable and the results came out too good..... At first I thought I'd made a mistake, but other data gave normal-looking values, so I think it's because this breast cancer dataset is small and of good quality. Or is it proof that ensembles really are this powerful? lol

 

 

There is a concept called out of bag (OOB). To draw samples from the training data for bagging, a method called bootstrapping (simple random sampling with replacement) is used. Each bootstrap sample contains only about 63% of the distinct rows; the remaining 37% or so are never drawn for that sample and are called out-of-bag data. Because a model never saw its own out-of-bag rows during training, those rows can be used to evaluate its predictions, much like a validation set. To use this, you must set oob_score=True when creating the bagging model.

The code below fits on the original breast cancer data as-is and prints only the accuracy. Here too the accuracy comes out as 1.

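# `estimator` is the scikit-learn >= 1.2 name for `base_estimator`
clf_oob = BaggingClassifier(estimator=DecisionTreeClassifier(), oob_score=True)

# oob_score_ is the accuracy measured on each tree's out-of-bag rows
oob = clf_oob.fit(X, y).oob_score_

print(oob) # -> 1.0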
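Where the 63% / 37% split comes from can be checked with a quick simulation using only the standard library. The dataset size of 10,000 rows is an arbitrary assumption for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable
n = 10_000      # hypothetical number of training rows

# Bootstrap: draw n row indices WITH replacement.
sample = [random.randrange(n) for _ in range(n)]

in_bag = len(set(sample)) / n   # share of distinct rows that were drawn
out_of_bag = 1 - in_bag         # share of rows never drawn

# Theoretical value: a given row is missed n times in a row with
# probability (1 - 1/n)^n, which approaches 1/e as n grows.
limit = 1 - (1 - 1 / n) ** n

print(round(in_bag, 3), round(out_of_bag, 3), round(limit, 3))
```

The simulated share lands close to the theoretical 1 - 1/e, which is where the "about 63%" figure comes from.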

 

 

This time let's analyse regression data. The basic framework is mostly the same as for classification, so I'll put it all in one block. For the data I used car prices by vehicle spec.

import pandas as pd
from sklearn.model_selection import train_test_split

carprice = pd.read_csv('/Users/eve/Downloads/jupyter notebook/files/regression/CarPrice_Assignment.csv')


# Keep numeric columns only, then leave out car_ID, symboling and the target price
foruse = carprice.select_dtypes(['number'])
X_features = foruse.columns.difference(['car_ID', 'symboling', 'price'])


X = foruse[X_features]
y = carprice['price']


train_x, test_x, train_y, test_y = train_test_split(X, y, train_size=0.7,
                                                   random_state=1)


from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor


# `estimator` is the scikit-learn >= 1.2 name for `base_estimator`
reg = BaggingRegressor(estimator=DecisionTreeRegressor(), oob_score=True)

pred = reg.fit(train_x, train_y).predict(test_x)

from sklearn.metrics import mean_squared_error, mean_absolute_error

print('mae : {0}'.format(mean_absolute_error(test_y, pred)))
print('mse : {0}'.format(mean_squared_error(test_y, pred)))

 

 

 

2- Boosting

For boosting I use AdaBoost, through scikit-learn's AdaBoostClassifier. As in the bagging section, I analysed the breast cancer data.

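import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

cancer = pd.read_csv('/Users/eve/Downloads/jupyter notebook/files/classification/breast-cancer.csv')

# Keep columns 1-9 (these include diagnosis); copy so the assignment below is safe
cancer = cancer.iloc[:, 1:10].copy()
# Encode the target: 'M' becomes 1, everything else 0
cancer['diagnosis'] = np.where(cancer['diagnosis']=='M', 1, 0)

X = cancer.drop(columns=['diagnosis'])
y = cancer['diagnosis']

train_x, test_x, train_y, test_y = train_test_split(X, y, train_size=0.7,
                                                   random_state=1, stratify=y)


from sklearn.ensemble import AdaBoostClassifier


# `estimator` is the scikit-learn >= 1.2 name for `base_estimator`
clf = AdaBoostClassifier(estimator=DecisionTreeClassifier())

# Fit once, then reuse the fitted model for both outputs
clf.fit(train_x, train_y)
pred = clf.predict(test_x)
pred_proba = clf.predict_proba(test_x)[:, 1]

cr = classification_report(test_y, pred)
cm = confusion_matrix(test_y, pred)

print(cr)
print('\n')
print(cm)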

Performance seems slightly lower than with bagging.

Boosting exposes a per-variable importance attribute (feature_importances_), which BaggingClassifier does not provide directly.

imp = clf.feature_importances_

fig = plt.figure(figsize=(15, 6))
plt.bar(x=X.columns, height=imp)

You can see that concave points affects the result the most.

(TMI: concave points is said to be the number of dented-in spots on the cell)
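A bar chart with many columns can be hard to read, so it can help to print the importances as a sorted ranking as well. This is a minimal sketch, assuming scikit-learn's bundled load_breast_cancer data and a default AdaBoostClassifier rather than the exact model fitted above, so the ranking will not necessarily match this post's chart.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier

data = load_breast_cancer()
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(data.data, data.target)

# Pair each feature name with its importance, then sort from most to least important.
ranked = sorted(
    zip(data.feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

# Show the five most important variables.
for name, score in ranked[:5]:
    print(f"{name:25s} {score:.3f}")
```

The importances are normalised to sum to 1, so each value can be read as that variable's share of the total.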

 

 

 

For regression I again used the car spec vs. price data, as in the bagging regression, so the overlapping parts are omitted.

from sklearn.ensemble import AdaBoostRegressor

# No base model is given, so the default is used (a depth-3 decision tree)
rg = AdaBoostRegressor()

pred = rg.fit(train_x, train_y).predict(test_x)

from sklearn.metrics import mean_absolute_error, mean_squared_error

print(round(mean_absolute_error(test_y, pred),2))
print(round(mean_squared_error(test_y, pred), 2))

# For a regressor, score() returns R^2, not classification accuracy
print('r2:{0}%'.format(round(rg.score(test_x, test_y)*100, 2)))

Again, it seems to perform a little worse than bagging.
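For reference, the regression metrics used in this post can be computed by hand. The sketch below uses four made-up price pairs (not values from the car data) to show what MAE, MSE, RMSE and the R² returned by a regressor's score() actually measure.

```python
import math

# Hypothetical actual and predicted values, invented for illustration.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / n   # mean absolute error
mse = sum(e ** 2 for e in errors) / n   # mean squared error
rmse = math.sqrt(mse)                   # same units as the target

# R^2: 1 minus (squared error of the model / squared error of always
# predicting the mean of y_true).
mean_y = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(mae, mse, round(rmse, 3), round(r2, 3))
```

Unlike accuracy, R² can be negative when a model does worse than simply predicting the mean.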

 

 

 

3- Random forest

X = cancer.drop(columns=['diagnosis'])
y = cancer['diagnosis']

train_x, test_x, train_y, test_y = train_test_split(X, y, train_size=0.7, random_state=1,
                                                  stratify=y)

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=500, min_samples_split=5)

# Fit once, then reuse the fitted model for both outputs
clf.fit(train_x, train_y)
pred = clf.predict(test_x)
pred_proba = clf.predict_proba(test_x)

clf.score(test_x, test_y)

import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, RocCurveDisplay, roc_auc_score

clf_report = classification_report(test_y, pred)
print(clf_report)

# plot_roc_curve was removed in scikit-learn 1.2; RocCurveDisplay replaces it
fig, ax = plt.subplots(figsize=(10, 6))
RocCurveDisplay.from_estimator(clf, test_x, test_y, ax=ax)
plt.show()

# AUC from the predicted probability of class 1
auc_score = roc_auc_score(test_y, pred_proba[:, 1])
print(auc_score)

# Importance of each independent variable
imp = clf.feature_importances_
collst = X.columns

fig = plt.figure(figsize=(15, 6))
plt.bar(x=collst, height=imp)
plt.show()

The analysis steps are the same as for the other algorithms; this time I showed several performance metrics.

Judging by the ROC curve and the AUC score, the predictive performance is quite high. Among the independent variables, concave points again has the strongest influence, but unlike in the bagging and boosting analyses, the other variables now carry some influence too. That is probably because a random forest keeps changing the group of candidate split variables.
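As a reminder of what the AUC score means: it equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The sketch below computes it that way on six made-up predictions; the function name auc_by_ranking is hypothetical.

```python
def auc_by_ranking(y_true, scores):
    # Split the predicted scores by their true class.
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities of class 1.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

auc = auc_by_ranking(y_true, scores)
print(round(auc, 3))
```

Eight of the nine positive/negative pairs are ordered correctly here, so the AUC is 8/9. A value of 0.5 would mean the ranking is no better than chance.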

 

 

Regression works in a similar way.

import numpy as np

# Keep numeric columns only
usedata = carprice.select_dtypes('number')

X = usedata.drop(columns=['price'])
y = usedata['price']

train_x, test_x, train_y, test_y = train_test_split(X, y, train_size=0.7, random_state=1)

from sklearn.ensemble import RandomForestRegressor

rg = RandomForestRegressor(n_estimators=500, max_depth=5)

pred = rg.fit(train_x, train_y).predict(test_x)

# For a regressor, score() returns R^2, not classification accuracy
print('r2 score :\t{0}'.format(round(rg.score(test_x, test_y),2)))

from sklearn.metrics import mean_absolute_error, mean_squared_error

print('mae :\t{0}'.format(mean_absolute_error(test_y, pred)))
print('mse :\t{0}'.format(mean_squared_error(test_y, pred)))
print('rmse :\t{0}'.format(np.sqrt(mean_squared_error(test_y, pred))))

 

The results are similar to bagging, maybe slightly better. The accuracy is almost the same, but the variance seems to have gone down a lot.

 

 


That wraps up the ensemble part.. I think writing this took half a day, but reviewing let me go over the parts I didn't know once more, so I'm satisfied!