Compare each token with its aligned counterpart by passing the concatenated pair through a feed-forward network $g$:
$V_{A,i} = g([a_i, \beta_i])$, $V_{B,j} = g([b_j, \alpha_j])$
Aggregate via sum pooling, collapsing each variable-length sequence into a fixed-size vector:
$v_A = \sum_i V_{A,i}$, $v_B = \sum_j V_{B,j}$
Classify by feeding the concatenated pooled vectors through a final feed-forward network $h$:
$\hat{y} = h([v_A, v_B])$
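The full attend–compare–aggregate–classify pipeline can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the weight matrices are random stand-ins, `mlp` is a generic two-layer ReLU network playing the role of both $g$ and $h$, and all dimensions (`d`, `hidden`, `classes`) are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

def mlp(x, W1, W2):
    # Two-layer feed-forward net with ReLU; stands in for g and h.
    return np.maximum(x @ W1, 0) @ W2

# Hypothetical sizes: d-dim token embeddings, sequences of different lengths.
d, len_a, len_b, hidden, classes = 8, 5, 7, 16, 3
A = rng.normal(size=(len_a, d))  # token embeddings of sentence A
B = rng.normal(size=(len_b, d))  # token embeddings of sentence B

# Attend: soft alignment from dot-product scores e_ij = a_i . b_j.
E = A @ B.T                       # (len_a, len_b)
beta = softmax(E, axis=1) @ B     # (len_a, d): B-subphrase aligned to each a_i
alpha = softmax(E, axis=0).T @ A  # (len_b, d): A-subphrase aligned to each b_j

# Compare: V_{A,i} = g([a_i, beta_i]), V_{B,j} = g([b_j, alpha_j]).
Wg1, Wg2 = rng.normal(size=(2 * d, hidden)), rng.normal(size=(hidden, hidden))
V_A = mlp(np.concatenate([A, beta], axis=1), Wg1, Wg2)   # (len_a, hidden)
V_B = mlp(np.concatenate([B, alpha], axis=1), Wg1, Wg2)  # (len_b, hidden)

# Aggregate: sum pooling yields fixed-size vectors regardless of length.
v_A = V_A.sum(axis=0)
v_B = V_B.sum(axis=0)

# Classify: y_hat = h([v_A, v_B]).
Wh1, Wh2 = rng.normal(size=(2 * hidden, hidden)), rng.normal(size=(hidden, classes))
logits = mlp(np.concatenate([v_A, v_B]), Wh1, Wh2)
y_hat = int(np.argmax(logits))
```

Note that nothing in the pipeline requires `len_a == len_b`: the attention matrix `E` mediates between the two lengths, and sum pooling erases them entirely before classification.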
Key insight: attention provides a soft alignment between the two sequences, so sequences of different lengths can be compared without any recurrence.
Parikh et al. (2016), "A Decomposable Attention Model for Natural Language Inference", EMNLP.