
All Posts 26

Blog Update I always start writing posts full of enthusiasm, but it seems most of them fizzle out before I finish them.. Once the semester ends, I need to organize what I have studied properly on the blog so I can quickly refresh my memory when I need it.. :) On that note.. please let my semester end already ㅠㅠㅠㅠ 2024. 6. 14.
[Overview] Image Formation Image Formation : Projection of a 3D scene onto a 2D plane : we need to understand the geometric and photometric relations between the scene and the image - geometric : given a point in the scene, how that point is represented in the image - photometric : how the brightness and appearance of the scene are represented in the image Topics : (1) Pinhole and Perspective Projection - a look at the pinhole camera, the most basic camera and one with a long history - it has many advantages (can produce great clarity), but it has a problem gathering light .. 2024. 4. 6.
[Paper Presentation] NeRF : Representing Scenes as Neural Radiance Fields for View Synthesis I recently read the NeRF paper, so I will briefly explain what NeRF is and then present its architecture and implementation in detail. NeRF stands for Neural Radiance Field. As the title says, NeRF is used to perform view synthesis. Here, view synthesis is the task of training on photos of an object taken from several views and then working out how the object looks from a new view. In earlier work, view synthesis either performed poorly or required large datasets and therefore too much memory. So this paper proposes "NeRF" as a new approach that can solve these problems.. 2024. 3. 13.
Lecture 2. Image Classification Image Classification What is Image Classification? Example : Input : a photo of a cat. Given a predetermined set of labels, the computer computes and outputs the label that matches the input. Output : Cat Semantic Gap Definition : the gap between the meaning an image actually carries and the pixel values the computer sees. We can easily look at a cat and classify it as a "cat", but to a computer an image appears as a gigantic grid of numbers, so it cannot immediately associate the image with a cat. Challenges : Viewpoin.. 2024. 2. 16.
Lecture 13. Generative Models Overview - Unsupervised Learning - Generative Models PixelRNN and PixelCNN Variational Autoencoders (VAE) Generative Adversarial Networks (GAN) Classification : Input : Image Output : Text (Label) Object Detection : Input : Image Output : Bounding Boxes of instances Semantic Segmentation (having label for every pixel) : ? Image Captioning : Input : Image Output : Caption (form of natural languag.. 2024. 2. 13.
[Paper Study] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT) Background of the paper In natural language processing, RNNs have largely been set aside, and the transformer is now so well established that it can be called the NLP standard. There were many efforts to apply it to Image Classification in computer vision, but most of the resulting models still depended on CNNs. Models that used only transformers were efficient in theory, but because they relied on specialized attention patterns they had not yet scaled effectively on modern hardware accelerators. So this paper proposes the ViT (Vision Transformer) model, which drops the CNN structure and performs Image Classification using only a transformer. Paper model st.. 2024. 2. 9.
High-Resolution Image Synthesis with Latent Diffusion Models Stable Diffusion is a latent diffusion model: it is an open-sourced release of the latent diffusion approach. 2024. 2. 9.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT) 1. The book "Deep Learning for Natural Language Processing and Computer Vision with PyTorch Transformers", pp. 601-623 2. Shusen Wang - Vision Transformer for Image Classification (YouTube) 3. Korea University DSBA - [Paper Review] ViT The transformer from natural language processing has also had a large influence on computer vision. Much of the earlier computer vision research attached the transformer's self-attention module to a convolutional neural network, whereas ViT (Vision Transformer) is the first study to apply the transformer architecture itself to computer vision. Where the convolutional layers of a CNN extract local features for image classification, ViT uses self-attention to process the whole image at once.. 2024. 2. 7.
CNN (Convolutional Neural Network) Convolution on 3D data - weights and biases An image is 3D data: height, width, and channels. Let's look at the convolution operation on 3D data such as images. Compared with the 2D case, the feature maps now extend along the depth (channel) direction. When there are several feature maps along the channel direction, the convolution of the input data and the filter is performed for each channel, and the results are added together to give a single output. The point to watch in 3D convolution is that the input data and the filter must have the same number of channels. The filter size itself, on the other hand, can be set to any value, provided the filter has the same size in every channel. In this example the filter size is (3,3), but it could just as well be (2,2), (1,1), (5,5), and so on. In the figure above, the output.. 2024. 2. 7.
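The per-channel convolution described in the CNN excerpt above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the code from the original post: the function name `conv3d_single_filter`, the (channels, height, width) layout, stride 1, and no padding are all choices made here. As in most deep learning texts, "convolution" means sliding the filter without flipping it (cross-correlation).

```python
import numpy as np

def conv3d_single_filter(x, w, b=0.0):
    """Apply one (C, FH, FW) filter to a (C, H, W) input; stride 1, no padding.

    Hypothetical helper for illustration only.
    """
    c, h, wd = x.shape
    fc, fh, fw = w.shape
    # The input and the filter must have the same number of channels.
    if c != fc:
        raise ValueError("input and filter must have the same number of channels")
    out_h, out_w = h - fh + 1, wd - fw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the window by the filter in every channel,
            # then sum over all channels and positions into one value.
            window = x[:, i:i + fh, j:j + fw]
            out[i, j] = np.sum(window * w) + b
    return out

# A 3-channel 4x4 input and a 3-channel 3x3 filter give a single 2x2 output map.
x = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)
w = np.ones((3, 3, 3))
y = conv3d_single_filter(x, w)
print(y.shape)  # (2, 2)
```

Changing the filter to shape (3, 2, 2) or (3, 1, 1) works the same way and only changes the output size; changing its channel count to anything other than 3 raises the error.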