
[ResNet] Deep Residual Learning for Image Recognition

by coderSohyun 2024. 2. 4.

https://arxiv.org/pdf/1512.03385.pdf

 

ResNet์ด๋ž€?

ResNet (Residual Network) is a convolutional neural network model, trained on the large-scale ImageNet dataset.

VGG ๋ชจ๋ธ๊ณผ ํ•ฉ์„ฑ๊ณฑ ๊ณ„์ธต, ReLU, ํ’€๋ง, ์™„์ „ ์—ฐ๊ฒฐ ๊ณ„์ธต ๋“ฑ์„ ์ด์šฉํ•ด ๊ตฌ์„ฑํ•จ.

VGG ๋ชจ๋ธ ๊ฐ™์€ ๊ฒฝ์šฐ์—๋Š” ๋” ์ž‘์€ ํฌ๊ธฐ์˜ ํ•„ํ„ฐ๋ฅผ ์‚ฌ์šฉํ•ด ๊ณ„์‚ฐ ํšจ์œจ์„ฑ์„ ํ–ฅ์ƒ์‹œ์ผฐ์ง€๋งŒ, ๊นŠ์€ ์‹ ๊ฒฝ๋ง ๊ตฌ์กฐ๋กœ ์ธํ•ด ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•จ.

๊ทธ๋ž˜์„œ ๋ ˆ์ฆˆ๋„ท์€ ์ด๋Ÿฌํ•œ ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด์„œ

์ž”์ฐจ ์—ฐ๊ฒฐ(Residual Connection), ํ•ญ๋“ฑ ์‚ฌ์ƒ(Identity Mapping), ์ž”์ฐจ ๋ธ”๋ก(Residual Block)์„ ํ†ตํ•ด

๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ  ๊ณ„์‚ฐ ํšจ์œจ์„ฑ์„ ๋†’์ž„ 

๋ ˆ์ฆˆ๋„ท์€ ๊ณ„์ธต์˜ ์ˆ˜์— ๋”ฐ๋ผ ResNet-18. 34. 50, 101, 152์˜ ํ˜•ํƒœ๋กœ ์ œ๊ณต๋จ

 

๋ ˆ์ฆˆ๋„ท์˜ ๊ธฐ๋ณธ ๊ตฌ์กฐ๋Š”

- ์ž…๋ ฅ์ธต

- ํ•ฉ์„ฑ๊ณฑ ๊ณ„์ธต

- ๋ฐฐ์น˜ ์ •๊ทœํ™” ๊ณ„์ธต

- ํ™œ์„ฑํ™” ํ•จ์ˆ˜ 

- ์ž”์ฐจ ๋ธ”๋ก

- ํ‰๊ท ๊ฐ’ ํ’€๋ง ๊ณ„์ธต

- ์™„์ „ ์—ฐ๊ฒฐ ๊ณ„์ธต

- ์ถœ๋ ฅ์ธต

์œผ๋กœ ์ด๋ค„์ ธ ์žˆ์Œ 

๋ ˆ์ฆˆ๋„ท์—๋Š” 34, 50, 101, 152๊ฐœ์˜ ๊ณ„์ธต์œผ๋กœ ๊ตฌ์„ฑ๋œ ๋„คํŠธ์›Œํฌ๊ฐ€ ์žˆ์Œ 

๋ชจ๋ธ์€ ์ž”์ฐจ ๋ธ”๋ก์˜ ๊ฐœ์ˆ˜์— ๋”ฐ๋ผ ๊ฒฐ์ •๋จ

 

๋ ˆ์ฆˆ๋„ท์€ ๋‘ ๊ฐœ์˜ ํ•ฉ์„ฑ๊ณฑ ๊ณ„์ธต๊ณผ ๋‹จ์ถ• ์—ฐ๊ฒฐ๋กœ ์ด๋ค„์ ธ ์žˆ์Œ

๋‹จ์ถ• ์—ฐ๊ฒฐ์€ ์ด์ „ ๊ณ„์ธต์˜ ์ถœ๋ ฅ๊ฐ’์„ ํ˜„์žฌ ๊ณ„์ธต์˜ ์ž…๋ ฅ๊ฐ’๊ณผ ๋”ํ•ด์ฃผ๋Š” ๋ฐฉ์‹์œผ๋กœ ๊ตฌํ˜„๋จ 

(๊ธฐ์กด ์ˆœ๋ฐฉํ–ฅ ์‹ ๊ฒฝ๋ง ๋ฐฉ์‹์€ ์ด์ „ ๊ณ„์ธต์˜ ์ •๋ณด๊ฐ€ ํ˜„์žฌ ๊ณ„์ธต์—๋งŒ ์˜ํ–ฅ์„๋ผ์นœ ๋ฐ˜๋ฉด,

๋ ˆ์ฆˆ๋„ท์€ ์ด์ „ ๊ณ„์ธต์—์„œ ๋ฐœ์ƒํ•œ ์ •๋ณด๋ฅผ ๋‹ค์Œ ๊ณ„์ธต์— ์ „๋‹ฌํ•จ)  

 

์ด์ „ ๊ณ„์ธต์—์„œ ๋ฐœ์ƒํ•œ ์ •๋ณด๋ฅผ ๊ณ„์† ์ „๋‹ฌํ•œ๋‹ค๋ฉด

๋ชจ๋ธ์ด ๊นŠ์–ด์ง€๋”๋ผ๋„ ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๊ณ  ์ •๋ณด๊ฐ€ ์†์‹ค๋˜๋Š” ํ˜„์ƒ์„ ๋ฐฉ์ง€ํ•  ์ˆ˜ ์žˆ์Œ.

์ผ๋ฐ˜์ ์ธ ํ•ฉ์„ฑ๊ณฑ ์‹ ๊ฒฝ๋ง์€ ํ˜„์žฌ ๊ณ„์ธต์—์„œ ์ •๋ณด๊ฐ€ ์†์‹ค๋˜๋ฉด ๋‹ค์Œ ๊ณ„์ธต์—์„œ ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•จ.

 

๋˜ํ•œ ๊ณ„์ธต์ด ๋งŽ์•„์ ธ ๋ชจ๋ธ์ด ๊นŠ์–ด์ง€๋ฉด ๊ธฐ์šธ๊ธฐ๊ฐ€ ์—ญ์ „ํŒŒ ๊ณผ์ •์—์„œ ์ ์ฐจ ์ž‘์•„์ง€๋Š” ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•จ

๋ ˆ์ฆˆ๋„ท์˜ ๋‹จ์ถ• ์—ฐ๊ฒฐ์€ ์ด์ „ ๊ณ„์ธต์˜ ์ถœ๋ ฅ๊ฐ’์„ ํ˜„์žฌ ๊ณ„์ธต์˜ ์ž…๋ ฅ๊ฐ’๊ณผ ๋”ํ•ด

์ด์ „ ๊ณ„์ธต์—์„œ ๋ฐœ์ƒํ•œ ์ •๋ณด๋ฅผ ๊ณ„์† ์ „๋‹ฌํ•จ

์ด๋ ‡๊ฒŒ ๋”ํ•ด์ง„ ๊ธฐ์šธ๊ธฐ๋Š” ์ผ์ • ์ˆ˜์ค€ ์ด์ƒ ์œ ์ง€ํ•  ์ˆ˜ ์žˆ์Œ

๋‹จ์ถ• ์—ฐ๊ฒฐ์„ ํ†ตํ•ด ๊นŠ์€ ๋ชจ๋ธ์—์„œ ๋ฐœ์ƒํ•˜๋Š” ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ 

์ •๋ณด๋ฅผ ์œ ์ง€ํ•จ์œผ๋กœ์จ ๋ชจ๋ธ์ด ํŠน์ • ๊ฐ€์ค‘์น˜์— ์ˆ˜๋ ดํ•˜๋Š” ์†๋„๋ฅผ ๋‹จ์ถ•์‹œํ‚ฌ ์ˆ˜ ์žˆ์Œ 
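A minimal sketch of why the addition keeps gradients alive (my own illustration, assuming PyTorch, not code from the post):

```python
import torch

# The block computes H(x) = F(x) + x, so the local derivative is
# dH/dx = dF/dx + 1: the identity path always contributes 1, and the
# gradient through the block cannot vanish even when F's gradient does.
x = torch.tensor(2.0, requires_grad=True)
w = 0.001        # a tiny weight: F's gradient is nearly zero
fx = w * x       # F(x), a near-vanishing transformation
hx = fx + x      # shortcut: H(x) = F(x) + x
hx.backward()
print(x.grad)    # tensor(1.0010) -- w + 1, stays near 1
```

Without the `+ x` term, `x.grad` would be just `w = 0.001`, and stacking many such layers multiplies these tiny factors together.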

 

The degradation problem

๊นŠ์€ ๊ตฌ์กฐ์˜ ๋ชจ๋ธ์„ ์„ค๊ณ„ํ•œ๋‹ค๋ฉด๋” ๋งŽ์€ ํŠน์ง• ๋ฒกํ„ฐ๋ฅผ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ์–ด, 

๊ณ„์ธต๋งˆ๋‹ค ๋” ์„ธ๋ฐ€ํ•œ ์ง€์—ญ ํŠน์ง•๊ณผ ์ „์—ญ ํŠน์ง•์„ ๊ตฌ๋ณ„ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋œ๋‹ค.

์ด๋Š” ๋ชจ๋ธ์˜ ํ‘œํ˜„๋ ฅ ํ–ฅ์ƒ์œผ๋กœ ์ด์–ด์ง€๋ฏ€๋กœ ๋” ๋ณต์žกํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋จ

 

๋ ˆ์ฆˆ๋„ท์€ ์ด๋Ÿฌํ•œ ์›๋ฆฌ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๊นŠ์€ ๊ณ„์ธต์„ ์Œ“๋Š” ์‹คํ—˜์„ ์ง„ํ–‰ํ•จ

 

์‹คํ—˜์—์„œ๋Š” 20๊ฐœ์˜ ๊ณ„์ธต๊ณผ 56๊ฐœ์˜ ๊ณ„์ธต์œผ๋กœ ๊ตฌ์„ฑ๋œ ๋ ˆ์ฆˆ๋„ท ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•ด ์ •ํ™•๋„๋ฅผ ์ธก์ •ํ•ด๋ด„

์‹คํ—˜ ๊ฒฐ๊ณผ, ์˜คํžˆ๋ ค 56๊ฐœ์˜ ๊ณ„์ธต์ด ์ •ํ™•๋„๊ฐ€ ๋” ๋‚ฎ๊ฒŒ ๋‚˜์˜ด

-> ๊ธฐ์šธ๊ธฐ ์ €ํ•˜ ๋ฌธ์ œ (Degration problem) : ์ผ์ • ์ˆ˜์ค€ ์ด์ƒ์œผ๋กœ ๊ณ„์ธต์„ ๊นŠ๊ฒŒ ์Œ“์œผ๋ฉด ์˜คํžˆ๋ ค ํ•™์Šต๋˜์ง€ ์•Š๋Š” ํ˜„์ƒ

์ด ๋ฌธ์ œ๋Š” ๊ธฐ์šธ๊ธฐ ํญ์ฃผ๋‚˜ ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค์˜ ๋ฌธ์ œ๋กœ ํ•ด๊ฒฐํ•  ์ˆ˜ ์—†์Œ

๋ ˆ์ฆˆ๋„ท์€ ์ž…๋ ฅ๊ณผ ์ถœ๋ ฅ ์‚ฌ์ด์˜ ์ฐจ์ด๋งŒ ํ•™์Šตํ•ด ๊ธฐ์šธ๊ธฐ ์ €ํ•˜ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•จ 

 

Residual Learning

Enter identity mapping: to pin down the cause of degradation, the authors ran an identity-mapping experiment.

Since deeper models are harder to train, a shallow model was trained first,

then extra layers initialized as identity mappings were added on top to make the model deeper.

 

์ด๋ฏธ ํ•™์Šต๋œ ๋ชจ๋ธ์˜ ๊ฒฐ๊ณผ๋ฅผ ๊ทธ๋Œ€๋กœ ์ถœ๋ ฅํ•˜๋Š” ๊ตฌ์กฐ์ด๋ฏ€๋กœ 

์„ฑ๋Šฅ์ด ํ•˜๋ฝํ•˜์ง€ ์•Š์„ ๊ฒƒ์ด๋ผ๊ณ  ์ƒ๊ฐํ•˜์ง€๋งŒ 

์ž…๋ ฅ๊ณผ ์ถœ๋ ฅ์„ ๋™์ผํ•˜๊ฒŒ ์ฃผ์—ˆ์Œ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  ๊ธฐ์šธ๊ธฐ ์ €ํ•˜ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•จ

์ด๋ฅผ ํ†ตํ•ด ๋‹จ์ˆœํžˆ ๊ณ„์ธต์„ ๊นŠ๊ฒŒ๋งŒ ๊ตฌ์„ฑํ•˜๋”๋ผ๋„ ๊ธฐ์šธ๊ธฐ ์ €ํ•˜๊ฐ€ ๋ฐœ์ƒํ•˜๋Š” ๊ฒƒ์„ ํŒŒ์•…ํ•จ

 

To solve this problem, ResNet applies the residual learning technique.

Residual learning means having the model learn only the difference (residual) between input and output.

 

๊ธฐ์กด ์ธ๊ณต ์‹ ๊ฒฝ๋ง์€ ์ด์ „ ๊ณ„์ธต์—์„œ ํ™œ์„ฑํ™”๋œ ๊ฐ’์„ ๋‹ค์Œ ๊ณ„์ธต์œผ๋กœ ์ „๋‹ฌํ•œ๋‹ค 

์ด ๋ฐฉ๋ฒ•์€ H(x) ๊ฐ’์„ ์ตœ์ ํ™”ํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ํ•™์Šต์„ ์ง„ํ–‰ํ•œ๋‹ค

 

๊ทธ๋Ÿฌ๋‚˜ ๊ณ„์ธต์ด ๊นŠ์–ด์งˆ์ˆ˜๋ก ๊ธฐ์šธ๊ธฐ ์ €ํ•˜ ๋ฌธ์ œ๋กœ ์ธํ•ด H(x)๋ฅผ ์ตœ์ ํ™”ํ•˜๊ธฐ ์–ด๋ ค์›Œ์ง„๋‹ค. 

 

๊ทธ๋Ÿฌ๋ฏ€๋กœ ๋ ˆ์ฆˆ๋„ท์—์„œ๋Š” H(x)๋ฅผ F(x)+x๋กœ ๋ณ€๊ฒฝํ•œ๋‹ค

์ด ๊ตฌ์กฐ๋ฅผ ๋นŒ๋”ฉ ๋ธ”๋ก(Building Block)์ด๋ผ ํ•œ๋‹ค

์ด ๊ตฌ์กฐ์—์„œ๋Š” x๋Š” ํ•ญ๋“ฑ ์‚ฌ์ƒ์ด๋ฏ€๋กœ ์ด์ „ ๊ณ„์ธต์—์„œ ํ•™์Šต๋œ ๊ฒฐ๊ณผ๋ฅผ ๊ทธ๋Œ€๋กœ ๊ฐ€์ ธ์˜จ๋‹ค 

๊ทธ๋Ÿฌ๋ฏ€๋กœ x๋Š” ์ด๋ฏธ ์ •ํ•ด์ง„ ๊ณ ์ •๊ฐ’์œผ๋กœ ๋ณผ ์ˆ˜ ์žˆ์Œ

 

ํ•˜์ง€๋งŒ ๋ ˆ์ฆˆ๋„ท์€ ์ž”์ฐจ ์—ฐ๊ฒฐ์„ ํ†ตํ•ด ์ž…๋ ฅ๊ฐ’ x๊ฐ€ ์ถœ๋ ฅ๊ฐ’์— ๋”ํ•ด์ ธ

์ด์ „ ๊ณ„์ธต์—์„œ ํ•™์Šต๋œ ์ •๋ณด๊ฐ€ ๋ณด์กด๋˜๊ณ  

์ƒˆ๋กœ์šด ์ •๋ณด๋ฅผ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ์Œ

 

์ด๋ฅผ ํ†ตํ•ด ๋ ˆ์ฆˆ๋„ท์€ ์ž…๋ ฅ๊ณผ ์ถœ๋ ฅ์˜ ์ฐจ์ด๋ฅผ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ 

ํ•™์Šต ๋Šฅ๋ ฅ์ด ํ–ฅ์ƒ๋œ๋‹ค 

 

Residual connections

A residual connection, also called a skip connection or shortcut connection,

is a connection where the input is added to the output after passing through the network layers.

 

์ผ๋ฐ˜์ ์ธ ๋”ฅ๋Ÿฌ๋‹ ์‹ ๊ฒฝ๋ง์—์„œ๋Š” ์ž…๋ ฅ๊ณผ ์ถœ๋ ฅ์„ ์ง์ ‘ ์—ฐ๊ฒฐํ•˜์—ฌ ์ •๋ณด๋ฅผ ์ „๋‹ฌํ•จ

์ด ๊ฒฝ์šฐ ๋„คํŠธ์›Œํฌ๊ฐ€ ๊นŠ์–ด์งˆ์ˆ˜๋ก ์ž…์ถœ๋ ฅ ๊ฐ„์˜ ๊ฑฐ๋ฆฌ๊ฐ€ ๋ฉ€์–ด์ ธ ์ •๋ณด์˜ ์†์‹ค ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์•„์ง 

 

๋ถ‰์€์ƒ‰ ๊ณก์„ ์ด ์ž”์ฐจ ์—ฐ๊ฒฐ์„ ์˜๋ฏธํ•˜๋Š” ๊ฒƒ์ž„

์ด ์—ฐ๊ฒฐ์„ ํ†ตํ•ด ์ž…๋ ฅ๊ฐ’๊ณผ ์ถœ๋ ฅ๊ฐ’ ๊ฐ„์˜ ๊ฑฐ๋ฆฌ๊ฐ€ ์ค„์–ด๋“ค์–ด ํ•™์Šต์ด ์ˆ˜์›”ํ•ด์ง

์ •๋ณด์˜ ์†์‹ค์ด ์ค„์–ด๋“ค์–ด ๋” ๋‚˜์€ ์„ฑ๋Šฅ์„ ์–ป์„ ์ˆ˜ ์žˆ์Œ

The residual connection formula

๋ ˆ์ฆˆ๋„ท์—์„œ ์ž”์ฐจ ์—ฐ๊ฒฐ์€ ๋ง์…ˆ ์—ฐ์‚ฐ์œผ๋กœ ๋งŒ๋“ค์–ด์ง

๊ทธ๋Ÿฌ๋ฏ€๋กœ ๋‹ค์Œ ๊ณ„์ธต์—์„œ F(x)+x์™€ ๊ฐ™์€ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜ด

x๋Š” ์ด์ „ ๊ณ„์ธต์˜ ์ถœ๋ ฅ๊ฐ’์ด๋ฉฐ

Wi๋Š” ํ˜„์žฌ ๊ณ„์ธต์„ ์˜๋ฏธ

์ด๋•Œ F์˜ ์ถœ๋ ฅ๊ฐ’๊ณผ x์˜ ์ฐจ์›์ด ๋™์ผํ•˜๋‹ค๋ฉด ๋ง์…ˆ ์—ฐ์‚ฐ์ด ๊ฐ€๋Šฅํ•จ

F์˜ ์ถœ๋ ฅ๊ฐ’๊ณผ x์˜ ์ฐจ์›์ด ๋™์ผํ•˜์ง€ ์•Š์œผ๋ฉด 8.2 ์™€ ๊ฐ™์ด ์ฒ˜๋ฆฌํ•จ

Ws๋Š” F์˜ ์ถœ๋ ฅ๊ฐ’์˜ ์ฐจ์›์„ ๋งž์ถ”๊ธฐ ์œ„ํ•ด x์— ์ ์šฉํ•˜๋Š” ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ์ž„ 

(I don't fully understand the calculation part below yet.)

 

๋ ˆ์ฆˆ๋„ท์€ ์•ž์„  ๊ทธ๋ฆผ 8.6์ฒ˜๋Ÿผ ๊ธฐ๋ณธ์ ์œผ๋กœ 2๊ฐœ์˜ ํ•ฉ์„ฑ๊ณฑ ๊ณ„์ธต์ด ์—ฐ๊ฒฐ๋˜์–ด ๋นŒ๋”ฉ ๋ธ”๋ก์„ ๊ตฌ์„ฑํ•จ 

ํ•˜์ง€๋งŒ ๋” ๊นŠ์€ ๊ตฌ์กฐ๋กœ ๋ชจ๋ธ์„ ๊ตฌ์„ฑํ•˜๋ฉด ์—ฐ์‚ฐ๋Ÿ‰์ด ๋Š˜์–ด๋‚˜ ํ•™์Šต์— ์–ด๋ ค์›€์„ ๊ฒช์Œ

~ (CNN๋ž€ ๋ฐ ๊ตฌ์กฐ ๋ฐ ์ž‘๋™ ์›๋ฆฌ ๋‹ค์‹œ ์ •๋ฆฌํ•˜๊ณ  ๋Œ์•„์˜ค๊ธฐ)

 

๋ชจ๋ธ ๊ตฌํ˜„ 

 

class BasicBlock(nn.Module) : defines a class named BasicBlock.

The class inherits from nn.Module.

In PyTorch, custom layers and architectures must inherit from nn.Module.

 

expansion = 1

In architectures like ResNet, the term expansion denotes how much the channel count grows relative to the input.

Here expansion is set to 1, meaning the block keeps the same number of channels as its input.

 

def __init__(self, inplanes, planes, stride=1) : the constructor of the BasicBlock class, which initializes the instance variables.

- inplanes is the number of input channels

- planes is the number of output channels (the number of filters/kernels in the convolutional layers)

- stride is the stride used by the convolutional layers; it defaults to 1

 

super().__init__() : calls the parent class constructor (nn.Module) so that BasicBlock is initialized correctly.
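Putting the pieces described above together, here is what BasicBlock likely looks like in full (my reconstruction from the walkthrough, not the book's exact code):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Basic residual block: two 3x3 convolutions plus a shortcut."""
    expansion = 1  # output channels = planes * expansion (unchanged here)

    def __init__(self, inplanes, planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut (Ws) when the shapes of F(x) and x differ
        self.downsample = None
        if stride != 1 or inplanes != planes * self.expansion:
            self.downsample = nn.Sequential(
                nn.Conv2d(inplanes, planes * self.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * self.expansion),
            )

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)  # the residual addition
```

For example, `BasicBlock(64, 64)` keeps shapes unchanged and uses the identity shortcut, while `BasicBlock(64, 128, stride=2)` halves the spatial size and triggers the 1x1 projection.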

 

Reading the code, I realized that to understand it properly I need to know the layers in ResNet's architecture.

This is basic deep-learning CNN material.

I'll go over it once more.

 

 

Paper contents

Abstract

์‹ ๊ฒฝ๋ง์€ ๊นŠ์–ด์งˆ์ˆ˜๋ก ํ•™์Šต์‹œํ‚ค๋Š”๊ฒŒ ์–ด๋ ต๋‹ค. ๊ทธ๋ž˜์„œ ์ด ๋…ผ๋ฌธ์—์„œ๋Š” ์ž”์ฐจ ํ•™์Šต ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•ด์„œ ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ ์ž ํ•œ๋‹ค. ์ž”์ฐจ ํ•™์Šต์„ ์‚ฌ์šฉํ•จ์œผ๋กœ์จ ๋ ˆํผ๋Ÿฐ์Šค๊ฐ€ ์กด์žฌํ•˜๋Š” ์ธํ’‹์„ ๋ ˆ์ด์–ด์˜ ์ž”์ฐจ ํ•™์Šต ํ•จ์ˆ˜์— ์ง‘์–ด๋„ฃ์Œ์œผ๋กœ์จ ๊ณ„์ธต์„ ์žฌ๊ตฌ์„ฑํ•˜๋ดค๋‹ค. ์ด ๋…ผ๋ฌธ์—์„œ๋Š” ์‹คํ—˜์ ์ธ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด ์ž”์ฐจ ๋„คํŠธ์›Œํฌ๊ฐ€ ์ •๊ทœํ™”ํ•˜๊ธฐ ๋” ์‰ฝ๊ณ , ๋” ๊นŠ์€ ๊นŠ์ด์—์„œ ์ƒ๋Œ€์ ์œผ๋กœ ๋” ๋†’์€ ์ •ํ™•๋„๋ฅผ ๋ณด์—ฌ์ฃผ๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. - ์ดํ•˜ ์‹คํ—˜ ๊ฒฐ๊ณผ ํ†ตํ•œ ์ž๋ž‘ ์ƒ๋žต..ใ…Ž -

 

Introduction

๊นŠ์€ ์‹ ๊ฒฝ๋ง์€ image classification์ด๋‚˜ ์‚ฌ์†Œํ•œ visual recognition ์ผ๋“ค์„ ์ฒ˜๋ฆฌํ•˜๋Š”๋ฐ ๋„์›€์„ ์ฃผ๊ณ  ์žˆ๋‹ค. 

 

Deep networks naturally integrate low/mid/higher level features and classifiers in an end-to-end multilayer fashion, and the "levels" of features can be enriched by the number of stacked layers(depth)

 

Question raised: does stacking deeper layers in a deep learning model always improve performance?

 

ํ•˜์ง€๋งŒ ์ด ์งˆ๋ฌธ์„ ๋‹ตํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” gradient vanishing/exploding ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•ด์•ผํ–ˆ๋‹ค. 

๋‹คํ–‰์ด๋„ ์ด ๋ฌธ์ œ๋Š” ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•๋“ค๋กœ ๊ฐœ์„ ๋˜์–ด์™”๋‹ค. 

(by normalized initialization and intermediate normalization layers, which enable networks with tens of layers to start converging for stochastic gradient descent(SGD) with back-propagation.)

 

์œ„์˜ ๋ฌธ์ œ ์ œ๊ธฐ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด์„œ ๊ณ„์† ์–ธ๊ธ‰๋˜๋Š” ๋ฌธ์ œ๋Š” Degradation problem์ด๋‹ค.

์ด๋Š” ์˜ค๋ฒ„ํ”ผํŒ… ๋ฌธ์ œ๋Š” ์•„๋‹ˆ๋‹ค. 

์™œ๋ƒํ•˜๋ฉด ์˜ค๋ฒ„ํ”ผํŒ… ๋ฌธ์ œ์ผ ๊ฒฝ์šฐ, train accuracy๋Š” ๋†’๊ณ  test accuracy๋Š” ๋‚ฎ์•„์•ผํ•˜๋Š”๋ฐ, 

์ด ๊ฒฝ์šฐ์—๋Š” ๋‘ accuracy๊ฐ€ ๋ชจ๋‘ ๋‚ฎ๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. 

๋˜ํ•œ ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ์ƒ ๋ ˆ์ด์–ด๋ฅผ ๊นŠ์ด ์Œ“์•˜์„ ๋•Œ ์ตœ์ ํ™”๊ฐ€ ์ž˜ ์•ˆ๋˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

 

The degradation (of training accuracy) tells us that not all systems are similarly easy to optimize.

(The paper views this degradation as a side effect of optimization becoming more complex as more layers are stacked, and sets out to address it.)

To probe this, a shallower architecture is compared with a deeper one,

where the deeper architecture is the shallow one with extra layers added as identity mappings.

Since the added layers are identity mappings, the deeper model should show training error no worse than the shallow one,

but this constructed solution turned out not to work that well in practice.

 

๊ทธ๋ž˜์„œ ์ด ๋…ผ๋ฌธ์—์„œ๋Š” deep residual learning framework๋ฅผ ์†Œ๊ฐœํ•œ๋‹ค. 

Instead of hoping each few stacked layers directly fit a desired underlying mapping,

we explicitly let these layers fit a residual mapping. 

 

Formally, denoting the desired underlying mapping as H(x), we let the stacked nonlinear layers fit another mapping of F(x) := H(x)−x. The original mapping is recast into F(x)+x. We hypothesize that it is easier to optimize the residual mapping than to optimize the original, unreferenced mapping. To the extreme, if an identity mapping were optimal, it would be easier to push the residual to zero than to fit an identity mapping by a stack of nonlinear layers.

 

๊ณต์‹์ ์œผ๋กœ ์›ํ•˜๋Š” ๊ธฐ๋ณธ ๋งคํ•‘์„ H(x)๋กœ ํ‘œ์‹œํ•˜๊ณ , ์Šคํƒ๋œ ๋น„์„ ํ˜• ๋ ˆ์ด์–ด๋ฅผ F(x) := H(x)-x์˜ ๋‹ค๋ฅธ ๋งคํ•‘์— ๋งž์ถ”๋„๋ก ํ•ฉ๋‹ˆ๋‹ค. ์›๋ž˜ ๋งคํ•‘์€ F(x)+x๋กœ ๋‹ค์‹œ ์บ์ŠคํŒ…๋ฉ๋‹ˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ์ฐธ์กฐ๋˜์ง€ ์•Š์€ ์›๋ณธ ๋งคํ•‘์„ ์ตœ์ ํ™”ํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค ์ž”์—ฌ ๋งคํ•‘์„ ์ตœ์ ํ™”ํ•˜๋Š” ๊ฒƒ์ด ๋” ์‰ฝ๋‹ค๊ณ  ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค. ๊ทน๋‹จ์ ์œผ๋กœ ๋งํ•˜์ž๋ฉด, ์•„์ด๋ดํ‹ฐํ‹ฐ ๋งคํ•‘์ด ์ตœ์ ์ด๋ผ๋ฉด ๋น„์„ ํ˜• ๋ ˆ์ด์–ด ์Šคํƒ์œผ๋กœ ์•„์ด๋ดํ‹ฐํ‹ฐ ๋งคํ•‘์„ ๋งž์ถ”๋Š” ๊ฒƒ๋ณด๋‹ค ์ž”์—ฌ๋ฅผ 0์œผ๋กœ ๋ฐ€์–ด๋ถ™์ด๋Š” ๊ฒƒ์ด ๋” ์‰ฌ์šธ ๊ฒƒ์ž…๋‹ˆ๋‹ค. 

-> ํ•˜๋Š” ์ด์œ ๋Š” ์ตœ์ ํ™”๋ฅผ ๋” ์‰ฝ๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด์„œ 

Function์— ๋Œ€ํ•ด์„œ ๋งž์ถ”๋Š” ๊ฒƒ๋ณด๋‹ค 0์ด๋ผ๋Š” ์ˆซ์ž ๊ฐœ๋…์œผ๋กœ ์ž”์ฐจ๋ฅผ ์ˆ˜๋ ดํ•˜๊ฒŒ ํ•˜๋Š” ๊ฒƒ์ด ๋” ์‰ฌ์šธ ๊ฒƒ์ž„. why?

 

feedforward neural network

 

F(x)+x is realized with shortcut connections,

which let the signal skip one or more layers (x is added back in).

Here the shortcut connections are identity mappings, which is what creates the skip.

 

Identity shortcut connections have the advantage of requiring

neither extra parameters nor extra multiplication operations.

 

๊ทธ๋ž˜์„œ ์ด์ œ๋ถ€ํ„ฐ ์‹คํ—˜์ ์ธ ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด degradation ๋ฌธ์ œ๋ฅผ ๋ณด์ด๊ณ  

์ด ๋…ผ๋ฌธ์˜ ๋ฐฉ๋ฒ•์„ ํ‰๊ฐ€ํ•˜๋Š” ๊ฒƒ์ด ๋‚˜์˜ด

๋ชฉํ‘œ๋Š” 2๊ฐ€์ง€ 

1) plain net๊ณผ ๋‹ค๋ฅด๊ฒŒ residual net์€ ๋” ์‰ฝ๊ฒŒ ์ตœ์ ํ™”ํ•  ์ˆ˜ ์žˆ์Œ 

2) residual net์ด ๋” ์‰ฝ๊ฒŒ ์ •ํ™•๋„๋ฅผ ๋†’์ธ ๊ฒƒ์œผ๋กœ ๋ณด์ž„ 

 

 

 

 

Related Work

Residual Representation

 

Shortcut Connections 

 

Deep Residual Learning

Residual Learning

 

Identity Mapping by Shortcut 

 

Network Architectures

 

Implementation

 
