Graph Consideration Community (GAT)
Graph Consideration Community (GAT), as launched in [1], intently follows the work from [3] in its consideration setup. GAT formulation additionally holds many similarities to the now notorious transformer paper [4], with each papers having been printed months away from one another.
Consideration in graphs is used to rank or weigh the relative significance of each neighboring node (keys) with respect to every supply node (question). These consideration scores are calculated for each node function within the graph and its respective neighbors. Node options, denoted by
undergo a linear transformation with a weight matrix denoted
earlier than consideration mechanism is utilized. With linearly remodeled node options, uncooked consideration rating is calculated as proven in Equation (1). To calculate normalized consideration scores, softmax operate is used as proven in Equation (2) just like consideration calculation in [4].
Within the paper (GAT), the eye mechanism Consideration( ) used is a single-layer feedforward neural community parameterized by a adopted by a LeakyReLU non-linearity, as proven in Equation (3). The || image denotes concatenation alongside the function dimension. Be aware: multi-head consideration formulation is deliberately skipped on this article because it holds no relevance to consideration formulation itself. Each GAT and GATv2 leverage multi-headed consideration of their implementations.
As it may be seen, the learnable consideration parameter a is launched as a linear mixture to the remodeled node options Wh. As elaborated within the upcoming sections, this setup is called static consideration and is the primary limiting issue of GAT, although for causes that aren’t instantly apparent.
Static Consideration
Contemplate the next graph beneath the place node h1 is the question node with the next neighbors (keys) {h2, h3, h4, h5}.
Calculating the uncooked consideration rating between the question node and h2 following the GAT formulation is proven in Equation (4).
As talked about earlier, learnable consideration parameter a is mixed linearly with the concatenated question and key nodes. Which means that the contributions of a with respect to Wh1 and Wh2 are linearly separable as a = [a1 || a2]. Utilizing a1 and a2, Equation (4) might be restructured as the next (Equation (5)).
Calculating the uncooked consideration scores for the remainder of the neighborhood with respect to the question node h1, a sample begins to emerge.
From Equation (6), it may be seen that the question time period
is repeated every time within the calculation of consideration scores e. Which means that whereas the question time period is technically included within the consideration calculation, it primarily impacts all neighbors equally and doesn’t have an effect on their relative ordering. Solely the important thing phrases
decide the relative order of consideration scores with respect to one another.
Such a consideration known as static consideration by [2]. This design implies that the rating of neighbors’ significance
is set globally throughout all nodes unbiased of the particular question nodes. This limitation prevents GAT from capturing regionally nuanced relationships the place totally different nodes would possibly prioritize totally different subsets of neighbors. As said in [2], “[static attention] can’t mannequin conditions the place totally different keys have totally different relevance to totally different queries”.