๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ

๐“’๐“ช๐“ฝ๐“ฎ๐“ฐ๐“ธ๐“ป๐”‚26

High-Resolution Image Synthesis with Latent Diffusion Models — Stable Diffusion is a latent diffusion model: instead of running the diffusion process in pixel space, it runs it in a compressed latent space learned by an autoencoder. Stable Diffusion is the open-sourced, large-scale release of this latent diffusion approach. 2024. 2. 9.
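A minimal sketch of the latent-diffusion idea above: an autoencoder compresses the image into a small latent, and the diffusion (noising/denoising) process operates on that latent rather than on pixels. The encoder module and all shapes here are illustrative assumptions, not the paper's or Stable Diffusion's actual code.

```python
import torch

class TinyVAEEncoder(torch.nn.Module):
    """Stand-in encoder: 512x512x3 pixels -> 64x64x4 latents (8x downsampling),
    mirroring the compression ratio Stable Diffusion uses. Not real VAE weights."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(3, 4, kernel_size=8, stride=8)

    def forward(self, x):
        return self.net(x)

vae_enc = TinyVAEEncoder()
image = torch.randn(1, 3, 512, 512)      # dummy RGB image
z = vae_enc(image)                       # latent: (1, 4, 64, 64)

# Forward diffusion q(z_t | z_0): mix the latent with Gaussian noise.
alpha_bar = torch.tensor(0.5)            # noise-schedule value at some step t
noise = torch.randn_like(z)
z_t = alpha_bar.sqrt() * z + (1 - alpha_bar).sqrt() * noise

print(image.shape, "->", z_t.shape)      # diffusion runs on the small latent
```

The point of the design: denoising a 64x64x4 latent is far cheaper than denoising a 512x512x3 image, which is what makes high-resolution synthesis tractable.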
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT) References: 1. 파이토치 트랜스포머를 활용한 자연어 처리와 컴퓨터비전 심층학습, pp. 601–623 2. Shusen Wang - Vision Transformer for Image Classification (YouTube) 3. Korea University DSBA - [Paper Review] ViT. The Transformer from natural language processing has had a major influence on computer vision as well. Earlier vision research mostly grafted the Transformer's self-attention module onto convolutional neural networks, whereas ViT (Vision Transformer) is the first study to apply the Transformer architecture itself to computer vision. Where a CNN's convolutional layers extract local features for image classification, ViT uses self-attention to process the entire image at once.. 2024. 2. 7.
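A sketch of the "image as 16x16 words" idea: the image is cut into fixed-size patches, each patch is linearly projected to a token embedding, and self-attention then attends across all patches at once. The sizes below follow the ViT-Base configuration but are illustrative, not the paper's reference code.

```python
import torch
import torch.nn as nn

# Patch embedding: a strided convolution that cuts a 224x224 image into
# 16x16 patches and projects each patch to a 768-dim token.
patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)

x = torch.randn(1, 3, 224, 224)               # dummy input image
tokens = patch_embed(x)                       # (1, 768, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)    # (1, 196, 768): 196 patch tokens

# Prepend a learnable [CLS] token and add positional embeddings.
cls = nn.Parameter(torch.zeros(1, 1, 768))
tokens = torch.cat([cls, tokens], dim=1)      # (1, 197, 768)
pos = nn.Parameter(torch.zeros(1, 197, 768))
tokens = tokens + pos

# Unlike a local convolution, self-attention sees every patch at once.
attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
out, _ = attn(tokens, tokens, tokens)
print(out.shape)                              # torch.Size([1, 197, 768])
```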
CNN(ํ•ฉ์„ฑ๊ณฑ ์‹ ๊ฒฝ๋ง) 3์ฐจ์› ๋ฐ์ดํ„ฐ์˜ ํ•ฉ์„ฑ๊ณฑ ์—ฐ์‚ฐ - ๊ฐ€์ค‘์น˜์™€ ํŽธํ–ฅ ์ด๋ฏธ์ง€๋Š” ์„ธ๋กœ, ๊ฐ€๋กœ, ์ฑ„๋„์˜ 3์ฐจ์› ๋ฐ์ดํ„ฐ์ด๋‹ค ์ด๋ฏธ์ง€์™€ ๊ฐ™์€ 3์ฐจ์› ๋ฐ์ดํ„ฐ์˜ ํ•ฉ์„ฑ๊ณฑ ์—ฐ์‚ฐ์„ ์‚ดํŽด๋ณด๊ฒ ๋‹ค 2์ฐจ์›์ผ ๋•Œ์™€ ๋น„๊ตํ•˜๋ฉด, ๊ธธ์ด ๋ฐฉํ–ฅ(์ฑ„๋„ ๋ฐฉํ–ฅ)์œผ๋กœ ํŠน์ง• ๋งต์ด ๋Š˜์–ด๋‚ฌ์Šต๋‹ˆ๋‹ค. ์ฑ„๋„์ชฝ์œผ๋กœ ํŠน์ง• ๋งต์ด ์—ฌ๋Ÿฌ ๊ฐœ ์žˆ๋‹ค๋ฉด ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์™€ ํ•„ํ„ฐ์˜ ํ•ฉ์„ฑ๊ณฑ ์—ฐ์‚ฐ์„ ์ฑ„๋„๋งˆ๋‹ค ์ˆ˜ํ–‰ํ•˜๊ณ , ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ๋”ํ•ด์„œ ํ•˜๋‚˜์˜ ์ถœ๋ ฅ์„ ์–ป์Šต๋‹ˆ๋‹ค. 3์ฐจ์›์˜ ํ•ฉ์„ฑ๊ณฑ ์—ฐ์‚ฐ์—์„œ ์ฃผ์˜ํ•  ์ ์€ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์˜ ์ฑ„๋„ ์ˆ˜์™€ ํ•„ํ„ฐ์˜ ์ฑ„๋„ ์ˆ˜๊ฐ€ ๊ฐ™์•„์•ผ ํ•œ๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ํ•œํŽธ, ํ•„ํ„ฐ ์ž์ฒด์˜ ํฌ๊ธฐ๋Š” ์›ํ•˜๋Š” ๊ฐ’์œผ๋กœ ์„ค์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‹จ, ๋ชจ๋“  ์ฑ„๋„์˜ ํ•„ํ„ฐ๊ฐ€ ๊ฐ™์€ ํฌ๊ธฐ์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด ์˜ˆ์—์„œ๋Š” ํ•„ํ„ฐ์˜ ํฌ๊ธฐ๊ฐ€ (3,3)์ด์ง€๋งŒ, ์›ํ•œ๋‹ค๋ฉด (2,2)๋‚˜ (1,1), (5,5) ๋“ฑ์œผ๋กœ ์„ค์ •ํ•ด๋„ ๋ฉ๋‹ˆ๋‹ค. ์œ„ ๊ทธ๋ฆผ์—์„œ๋Š” ์ถœ๋ ฅ.. 2024. 2. 7.
[ResNet] Deep Residual Learning for Image Recognition https://arxiv.org/pdf/1512.03385.pdf What is ResNet? ResNet (Residual Network) is a convolutional neural network model trained on the large-scale ImageNet dataset. Like VGG, it is built from convolutional layers, ReLU, pooling, and fully connected layers. VGG improved computational efficiency by using smaller filters, but its deep architecture suffers from the vanishing gradient problem. ResNet addresses this through residual connections, identity mappings, and residual blocks, solving the vanishing gradient problem while improving computational efficiency. Depending on the number of layers, ResNet comes in variants such as ResNet-18.. 2024. 2. 4.
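A minimal residual-block sketch in PyTorch showing the skip connection described above: the block learns a residual F(x) and adds the identity x back, which keeps a direct gradient path through deep stacks. Channel counts are illustrative; this is the basic-block shape, not the paper's full bottleneck variant.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: out = ReLU(F(x) + x), where x is the identity."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                          # the skip (identity mapping)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                  # residual connection
        return self.relu(out)

block = ResidualBlock(64)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)                         # torch.Size([1, 64, 56, 56])
```

Because the identity term passes through unchanged, the gradient of the loss reaches earlier layers even when the learned branch's gradients are small — this is the mechanism that mitigates vanishing gradients.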
[์ˆœํ™˜์‹ ๊ฒฝ๋ง] RNN์˜ ๋ฌธ์ œ์  RNN์ด๋ž€? RNN(Recurrent Neural Network)์€ ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ๋ฅผ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ๊ณ ์•ˆ๋œ ๋ชจ๋ธ๋กœ ์•„๋ž˜ ๊ทธ๋ฆผ์—์„œ์™€ ๊ฐ™์ด ์ด์ „ ์‹œ๊ฐ(๊ณ„์ธต)์˜ ์ถœ๋ ฅ ๊ฐ’(์€๋‹‰ ๊ฐ’)์ด ๋‹ค์Œ ์‹œ๊ฐ(๊ณ„์ธต)์œผ๋กœ ์ „ํŒŒ๋˜์–ด ์ฆ‰, ๊ณผ๊ฑฐ ์ •๋ณด๋ฅผ ๊ณ„์Šนํ•˜์—ฌ ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ์— ๋Œ€์‘ํ•˜๋Š” ์‹ ๊ฒฝ๋ง์ž„. ์€๋‹‰์ธต์—์„œ ๋‚˜์˜จ ๊ฒฐ๊ณผ๊ฐ’์ด ๋‹ค์‹œ ์€๋‹‰์ธต์œผ๋กœ ๋Œ์•„๊ฐ€ ์ƒˆ๋กœ์šด ์ž…๋ ฅ ๊ฐ’๊ณผ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๋Š” ์ˆœํ™˜ ๊ตฌ์กฐ์ž„. ์™„์ „์—ฐ๊ฒฐ์ธต(Fully Connected Layer), CNN(Convolutional Neural Network)์€ ์€๋‹‰์ธต์—์„œ ๋‚˜์˜จ ๊ฒฐ๊ณผ๊ฐ’์ด ์ถœ๋ ฅ์ธต ๋ฐฉํ–ฅ์œผ๋กœ ์ด๋™ํ•˜์ง€๋งŒ, RNN์€ ์€๋‹‰์ธต์œผ๋กœ ๋˜๋Œ์•„๊ฐ€ ์ˆœํ™˜ํ•œ๋‹ค๋Š” ์ ์—์„œ ํฐ ์ฐจ์ด๊ฐ€ ์žˆ์Œ. ์•„๋ž˜ ๊ทธ๋ฆผ์€ ์€๋‹‰์ธต์—์„œ ๊ฒฐ๊ณผ ๊ฐ’์ด ๋‚˜์™€ ๋‹ค์‹œ ์€๋‹‰์ธต์œผ๋กœ ๋˜๋Œ์•„๊ฐ€๋Š” ์ˆœํ™˜ ๊ตฌ์กฐ๋ฅผ ํ˜•์„ฑํ•˜๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Œ ์œ„ ๊ทธ๋ฆผ.. 2024. 2. 3.
Attention is All You Need https://arxiv.org/pdf/1706.03762.pdf Background. Problems with RNNs: long-term dependencies, vanishing gradients, and exploding gradients. Problems with Seq2Seq models: information loss in the fixed context vector, and dependence on RNNs. The Transformer solves these through the self-attention mechanism. - What the long-term dependency problem is, why it occurs and at which stage, and how it can be addressed - Likewise for the vanishing and exploding gradient problems - The context-vector problem in Seq2Seq and its overall limitations.. 2023. 12. 6.
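A minimal scaled dot-product self-attention sketch, the mechanism the paper uses in place of recurrence: every position attends to every other position directly, so there is no long recurrent path for information or gradients to decay along. Dimensions are illustrative, and for brevity Q, K, V skip the learned projections a real layer would apply.

```python
import math
import torch

def self_attention(x):
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d)) V."""
    d = x.size(-1)
    q, k, v = x, x, x                                 # projections omitted
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)   # (seq, seq) affinities
    weights = scores.softmax(dim=-1)                  # each row sums to 1
    return weights @ v                                # mix values by affinity

x = torch.randn(6, 32)                        # 6 tokens, 32-dim embeddings
out = self_attention(x)
print(out.shape)                              # torch.Size([6, 32])
# Token 0 attends to token 5 in a single step — no recurrence needed,
# which is how self-attention sidesteps the long-term dependency problem.
```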