Executive Summary
GNNs (graph neural networks) are a type of neural network designed for graph data. Unlike text, images, and numerical data, graph data has no fixed structure: nodes may or may not have connections, and relations can be directed or undirected. If we were to use RNNs to consume graph data, we would have to convert the graph into a sequence of nodes, but that would discard the most important information about graph data: its structure.
The development of CNNs inspired and led to the development of GNNs, because convolutional layers have properties that carry over naturally to graphs, for example local connectivity and shift invariance. With graph data, we assume local connectivity in the sense that a node's neighbours should be highly relevant or correlated to the central node. Graph convolutions should also be shift invariant, meaning each neighbourhood is treated the same way regardless of where it sits within the graph.
There are two main ways of applying convolutions in graphs:

Spatial – similar to standard convolution in that they move spatially over the nodes of the graph. A spatial graph convolution aggregates a node and its neighbouring nodes into a single new node representation.

Spectral – defined via the eigendecomposition of the graph Laplacian; the main limitation here is scalability, as computing this decomposition gives these methods high time complexity.
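The spatial approach above can be sketched in a few lines. This is a minimal, illustrative implementation (not any particular library's API): each node mean-aggregates its own features together with its neighbours' features, then a shared weight matrix is applied. The function name, the toy path graph, and the weight initialisation are all assumptions for demonstration.

```python
import numpy as np

def spatial_graph_conv(A, X, W):
    """One spatial graph convolution step: each node averages its own
    features with its neighbours' features (local connectivity), then a
    shared linear transform W is applied (shift invariance)."""
    n = A.shape[0]
    A_hat = A + np.eye(n)                      # add self-loops so each node keeps its own features
    deg = A_hat.sum(axis=1, keepdims=True)     # neighbourhood sizes, including self
    H = (A_hat / deg) @ X                      # mean-aggregate each neighbourhood
    return np.maximum(H @ W, 0.0)              # shared weights + ReLU

# Toy graph: 4 nodes connected in a path 0-1-2-3 (hypothetical example).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)                                  # one-hot node features
W = np.random.default_rng(0).normal(size=(4, 2))
H = spatial_graph_conv(A, X, W)
print(H.shape)                                 # one 2-dim representation per node
```

Note that the same `W` is applied to every neighbourhood, which is exactly the shift-invariance property borrowed from CNNs.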
Different tasks within GNNs

Node prediction – Predicting the node’s value or label

Graph prediction – Predicting the graph’s value or label (e.g. properties of biological molecules)

Graph generation – Generate graphs

Edge prediction – Predicting the edge’s value or label
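The difference between these tasks mostly comes down to the prediction head placed on top of the learned node embeddings. The sketch below is a hypothetical illustration (random embeddings stand in for a trained GNN's output; all dimensions and class counts are assumptions): node prediction classifies each embedding separately, graph prediction pools the embeddings first, and edge prediction scores a pair of endpoints.

```python
import numpy as np

# Stand-in for a trained GNN's output: one embedding per node.
rng = np.random.default_rng(42)
node_embeddings = rng.normal(size=(5, 8))        # 5 nodes, 8-dim embeddings

# Node prediction: a shared classifier applied to each node embedding.
W_node = rng.normal(size=(8, 3))                 # 3 hypothetical node classes
node_logits = node_embeddings @ W_node           # shape (5, 3): one prediction per node

# Graph prediction: pool all node embeddings into one graph embedding, then classify.
graph_embedding = node_embeddings.mean(axis=0)   # mean pooling over nodes
W_graph = rng.normal(size=(8, 2))                # 2 hypothetical graph classes
graph_logits = graph_embedding @ W_graph         # shape (2,): one prediction per graph

# Edge prediction: score a candidate edge (0, 1) from its endpoint embeddings.
score_01 = node_embeddings[0] @ node_embeddings[1]
```

Graph generation has no simple head of this form, since it requires producing a whole adjacency structure rather than a label.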
Spatiotemporal GNNs
This is the space that handles time-series forecasting problems using GNNs. A spatiotemporal graph can handle multiple graphs, each representing a different time step. Because of this sequential nature, we could also consider RNNs and CNNs. However, the main issue with CNNs is that they do not handle non-local correlation (global information) well. Graph convolutions tend to cover only first-order neighbouring nodes, and although stacking multiple convolution layers reaches higher orders of neighbouring nodes, those layers operate at different input sizes that cannot be resized cleanly, so nodes do not receive global information about the graph.
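The point about receptive fields can be made concrete: one graph convolution only mixes first-order neighbours, and each extra stacked layer extends the reach by one hop. The sketch below (a hypothetical 5-node path graph with one-hot features, so we can track exactly whose information a node has seen) shows node 0's receptive field growing one hop per layer, which illustrates why a fixed, shallow stack never covers the whole graph.

```python
import numpy as np

# Path graph 0-1-2-3-4; one-hot features so column j tracks node j's influence.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_hat = A + np.eye(n)                            # self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True)     # one mean-aggregation step

H = np.eye(n)
for k in range(1, 4):
    H = P @ H                                    # apply one more convolution layer
    reach = np.flatnonzero(H[0])                 # nodes whose features node 0 has mixed in
    print(f"after {k} layer(s), node 0 sees nodes {reach.tolist()}")
# After 1 layer node 0 sees only {0, 1}; each extra layer adds one more hop.
```

With 3 layers, node 0 still has no information from node 4, which is the locality limitation the paragraph above describes.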