4.2.3 Internal Computation Logic of the Self-Attention Layer