
Attention is All You Need

by coderSohyun 2023. 12. 6.

https://arxiv.org/pdf/1706.03762.pdf

 

๋ชจ๋ธ ์ƒ์„ฑ ๋ฐฐ๊ฒฝ 

Problems with existing RNNs: long-term dependency, gradient vanishing, and gradient exploding
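A quick NumPy sketch of my own (not from the paper) to make the vanishing/exploding point concrete: backprop through T time steps multiplies T recurrent Jacobians together. With the nonlinearity dropped for simplicity, the gradient norm is governed by the spectral radius of the recurrent weight matrix; the sizes and radii below are arbitrary toy values.

```python
import numpy as np

# Toy illustration (not from the paper): backprop through T time steps of an
# RNN multiplies T recurrent Jacobians together.  The nonlinearity is dropped
# here for simplicity, so each Jacobian is just the recurrent matrix W_h and
# the product's size is governed by the spectral radius of W_h:
# radius < 1 -> the gradient vanishes, radius > 1 -> it explodes.
np.random.seed(0)
hidden, T = 8, 50

def backprop_norm(spectral_radius):
    W_h = np.random.randn(hidden, hidden)
    W_h *= spectral_radius / np.abs(np.linalg.eigvals(W_h)).max()  # set radius
    grad = np.eye(hidden)                  # gradient at the last time step
    for _ in range(T):
        grad = W_h.T @ grad                # one step of backprop through time
    return np.linalg.norm(grad)

print("radius 0.9:", backprop_norm(0.9))   # shrinks toward zero (vanishing)
print("radius 1.1:", backprop_norm(1.1))   # grows step by step  (exploding)
```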

Problems with the Seq2Seq model: information loss in the context vector, and heavy reliance on RNNs
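A rough sketch (my own, with made-up sizes) of where that information loss comes from: a vanilla seq2seq encoder compresses the entire source sentence into one fixed-size context vector, no matter how long the input is.

```python
import numpy as np

# Toy sketch (my own, hypothetical sizes): a vanilla seq2seq encoder squeezes
# the whole source sentence into ONE fixed-size context vector, no matter how
# long the input is -- that single vector is all the decoder gets to see.
np.random.seed(0)
emb, hidden = 16, 32
W_x = np.random.randn(hidden, emb) * 0.1      # input-to-hidden weights
W_h = np.random.randn(hidden, hidden) * 0.1   # hidden-to-hidden weights

def encode(source_embeddings):
    h = np.zeros(hidden)
    for x in source_embeddings:               # read the tokens one by one
        h = np.tanh(W_x @ x + W_h @ h)
    return h                                  # last hidden state = context vector

short_src = np.random.randn(5, emb)           # a 5-token sentence
long_src = np.random.randn(100, emb)          # a 100-token sentence
print(encode(short_src).shape, encode(long_src).shape)   # both (32,) -> bottleneck
```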

 

The Transformer addresses these issues through the self-attention mechanism.
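Below is a small NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V from Eq. (1) of the paper (toy dimensions, single head, no mask). Because every position attends directly to every other position, distant tokens exchange information in one step instead of through a long recurrent chain, and the decoder is no longer limited to a single fixed context vector.

```python
import numpy as np

# Toy NumPy sketch of scaled dot-product attention (Eq. 1 of the paper):
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
# Every position attends to every other position directly, so no information
# has to survive a long recurrent chain or a single fixed context vector.
np.random.seed(0)
seq_len, d_model, d_k = 6, 16, 16

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v       # project the same input three ways
    scores = Q @ K.T / np.sqrt(d_k)           # pairwise similarities, scaled
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted sum of value vectors

X = np.random.randn(seq_len, d_model)         # one vector per input token
W_q, W_k, W_v = [np.random.randn(d_model, d_k) * 0.1 for _ in range(3)]
print(self_attention(X, W_q, W_k, W_v).shape) # (6, 16): one output per token
```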

 

- What long-term dependency is, why and at which stage it occurs, and what the solutions are

- Same for the gradient vanishing and exploding problems

- Understand the context-vector problem of Seq2Seq and its overall limitations

- And figure out how the Transformer tries to solve these issues

๋ชจ๋ธ ๊ตฌ์กฐ ์„ค๋ช… 

 

Summary of the paper

 

Questions

What are the Transformer's weaknesses?

Where is the Transformer used?