deeprobust.graph.defense package

Submodules

deeprobust.graph.defense.adv_training module

class AdvTraining(model, adversary=None, device='cpu')[source]

Adversarial training framework for defending against attacks.

Parameters:
  • model – the model to protect, e.g., GCN
  • adversary – attack model
  • device (str) – ‘cpu’ or ‘cuda’
adv_train(features, adj, labels, idx_train, train_iters, **kwargs)[source]

Start adversarial training.

Parameters:
  • features – node features
  • adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
  • labels – node labels
  • idx_train – node training indices
  • idx_val – node validation indices. If not given (None), the GCN training process will not adopt early stopping
  • train_iters (int) – number of training epochs
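
Examples

A minimal usage sketch (illustrative, not from the source docs): Random from deeprobust.graph.global_attack is assumed to satisfy the adversary interface; any compatible attack model can be plugged in.

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import GCN
>>> from deeprobust.graph.defense.adv_training import AdvTraining
>>> from deeprobust.graph.global_attack import Random
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train = data.idx_train
>>> victim = GCN(nfeat=features.shape[1], nhid=16,
          nclass=labels.max().item() + 1, device='cpu')
>>> trainer = AdvTraining(victim, adversary=Random(), device='cpu')
>>> trainer.adv_train(features, adj, labels, idx_train, train_iters=100)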

deeprobust.graph.defense.gcn module

class GCN(nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device=None)[source]

2 Layer Graph Convolutional Network.

Parameters:
  • nfeat (int) – size of input feature dimension
  • nhid (int) – number of hidden units
  • nclass (int) – size of output dimension
  • dropout (float) – dropout rate for GCN
  • lr (float) – learning rate for GCN
  • weight_decay (float) – weight decay coefficient (l2 normalization) for GCN. When with_relu is True, weight_decay will be set to 0.
  • with_relu (bool) – whether to use relu activation function. If False, GCN will be linearized.
  • with_bias (bool) – whether to include bias term in GCN weights.
  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load dataset and then train GCN.

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import GCN
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> gcn = GCN(nfeat=features.shape[1],
          nhid=16,
          nclass=labels.max().item() + 1,
          dropout=0.5, device='cpu')
>>> gcn = gcn.to('cpu')
>>> gcn.fit(features, adj, labels, idx_train) # train without earlystopping
>>> gcn.fit(features, adj, labels, idx_train, idx_val, patience=30) # train with earlystopping
>>> gcn.test(idx_test)
fit(features, adj, labels, idx_train, idx_val=None, train_iters=200, initialize=True, verbose=False, normalize=True, patience=500, **kwargs)[source]

Train the GCN model. When idx_val is not None, pick the best model according to the validation loss.

Parameters:
  • features – node features
  • adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
  • labels – node labels
  • idx_train – node training indices
  • idx_val – node validation indices. If not given (None), the GCN training process will not adopt early stopping
  • train_iters (int) – number of training epochs
  • initialize (bool) – whether to initialize parameters before training
  • verbose (bool) – whether to show verbose logs
  • normalize (bool) – whether to normalize the input adjacency matrix.
  • patience (int) – patience for early stopping, only valid when idx_val is given
initialize()[source]

Initialize parameters of GCN.

predict(features=None, adj=None)[source]

By default, the inputs should be the unnormalized adjacency matrix.

Parameters:
  • features – node features. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.
  • adj – adjacency matrix. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.
Returns:

output (log probabilities) of GCN

Return type:

torch.FloatTensor
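
For instance (a usage sketch): since the output is log-softmax, hard predictions and probabilities can be recovered directly.

>>> output = gcn.predict()    # reuses the features/adj stored during fit()
>>> preds = output.argmax(1)  # hard class prediction per node
>>> probs = output.exp()      # probabilities, since output holds log probabilities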

test(idx_test)[source]

Evaluate GCN performance on test set.

Parameters: idx_test – node testing indices
class GraphConvolution(in_features, out_features, with_bias=True)[source]

Simple GCN layer, similar to https://github.com/tkipf/pygcn

forward(input, adj)[source]

Graph Convolutional Layer forward function
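
A sketch of the forward computation, following the pygcn reference above (the weight and bias attribute names are assumptions):

>>> import torch
>>> support = torch.mm(input, self.weight)  # H @ W: transform node features
>>> output = torch.spmm(adj, support)       # A_hat @ H @ W: aggregate over neighbors
>>> # when with_bias is True, self.bias is added to output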

deeprobust.graph.defense.gcn_preprocess module

class GCNJaccard(nfeat, nhid, nclass, binary_feature=True, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device='cpu')[source]

GCNJaccard first preprocesses the input graph by dropping dissimilar edges and then trains a GCN on the processed graph. See more details in Adversarial Examples on Graph Data: Deep Insights into Attack and Defense, https://arxiv.org/pdf/1903.01610.pdf.

Parameters:
  • nfeat (int) – size of input feature dimension
  • nhid (int) – number of hidden units
  • nclass (int) – size of output dimension
  • dropout (float) – dropout rate for GCN
  • lr (float) – learning rate for GCN
  • weight_decay (float) – weight decay coefficient (l2 normalization) for GCN. When with_relu is True, weight_decay will be set to 0.
  • with_relu (bool) – whether to use relu activation function. If False, GCN will be linearized.
  • with_bias (bool) – whether to include bias term in GCN weights.
  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load dataset and then train GCNJaccard.

>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import GCNJaccard
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = GCNJaccard(nfeat=features.shape[1],
          nhid=16,
          nclass=labels.max().item() + 1,
          dropout=0.5, device='cpu').to('cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, threshold=0.03)
drop_dissimilar_edges(features, adj, metric='similarity')[source]

Drop dissimilar edges (faster version using numba).
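
For binary features the similarity is the Jaccard score between feature rows. A per-edge sketch (u, v, and threshold are stand-ins; features is assumed to be a scipy sparse matrix):

>>> import numpy as np
>>> a = features[u].toarray().ravel()  # binary feature row of node u
>>> b = features[v].toarray().ravel()  # binary feature row of node v
>>> intersection = np.count_nonzero(a * b)
>>> union = np.count_nonzero(a) + np.count_nonzero(b) - intersection
>>> # drop the edge (u, v) when the Jaccard score falls below the threshold
>>> drop_edge = (intersection / union) < threshold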

fit(features, adj, labels, idx_train, idx_val=None, threshold=0.01, train_iters=200, initialize=True, verbose=True, **kwargs)[source]

First drop dissimilar edges with similarity smaller than the given threshold, then train the GCN model on the processed graph. When idx_val is not None, pick the best model according to the validation loss.

Parameters:
  • features – node features. The format can be numpy.array or scipy matrix
  • adj – the adjacency matrix.
  • labels – node labels
  • idx_train – node training indices
  • idx_val – node validation indices. If not given (None), the GCN training process will not adopt early stopping
  • threshold (float) – similarity threshold for dropping edges. If two connected nodes have similarity smaller than the threshold, the edge between them will be removed.
  • train_iters (int) – number of training epochs
  • initialize (bool) – whether to initialize parameters before training
  • verbose (bool) – whether to show verbose logs
predict(features=None, adj=None)[source]

By default, the inputs should be the unnormalized adjacency matrix.

Parameters:
  • features – node features. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.
  • adj – adjacency matrix. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.
Returns:

output (log probabilities) of GCNJaccard

Return type:

torch.FloatTensor

class GCNSVD(nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device='cpu')[source]

GCNSVD is a 2 Layer Graph Convolutional Network with Truncated SVD as preprocessing. See more details in All You Need Is Low (Rank): Defending Against Adversarial Attacks on Graphs, https://dl.acm.org/doi/abs/10.1145/3336191.3371789.

Parameters:
  • nfeat (int) – size of input feature dimension
  • nhid (int) – number of hidden units
  • nclass (int) – size of output dimension
  • dropout (float) – dropout rate for GCN
  • lr (float) – learning rate for GCN
  • weight_decay (float) – weight decay coefficient (l2 normalization) for GCN. When with_relu is True, weight_decay will be set to 0.
  • with_relu (bool) – whether to use relu activation function. If False, GCN will be linearized.
  • with_bias (bool) – whether to include bias term in GCN weights.
  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load dataset and then train GCNSVD.

>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import GCNSVD
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = GCNSVD(nfeat=features.shape[1],
          nhid=16,
          nclass=labels.max().item() + 1,
          dropout=0.5, device='cpu').to('cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, k=20)
fit(features, adj, labels, idx_train, idx_val=None, k=50, train_iters=200, initialize=True, verbose=True, **kwargs)[source]

First perform a rank-k approximation of the adjacency matrix via truncated SVD, then train the GCN model on the processed graph. When idx_val is not None, pick the best model according to the validation loss.

Parameters:
  • features – node features
  • adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
  • labels – node labels
  • idx_train – node training indices
  • idx_val – node validation indices. If not given (None), the GCN training process will not adopt early stopping
  • k (int) – number of singular values and vectors to compute.
  • train_iters (int) – number of training epochs
  • initialize (bool) – whether to initialize parameters before training
  • verbose (bool) – whether to show verbose logs
predict(features=None, adj=None)[source]

By default, the inputs should be the unnormalized adjacency matrix.

Parameters:
  • features – node features. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.
  • adj – adjacency matrix. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.
Returns:

output (log probabilities) of GCNSVD

Return type:

torch.FloatTensor

truncatedSVD(data, k=50)[source]

Truncated SVD on input data.

Parameters:
  • data – input matrix to be decomposed
  • k (int) – number of singular values and vectors to compute.
Returns:

reconstructed matrix.

Return type:

numpy.array
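
This corresponds to the following rank-k reconstruction (a sketch using scipy, assuming a sparse input matrix):

>>> import numpy as np
>>> from scipy.sparse.linalg import svds
>>> U, S, Vt = svds(adj.asfptype(), k=20)  # top-k singular triplets
>>> low_rank_adj = U @ np.diag(S) @ Vt     # dense rank-k reconstruction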

deeprobust.graph.defense.pgd module

class PGD(params, proxs, alphas, lr=required, momentum=0, dampening=0, weight_decay=0)[source]

Proximal gradient descent.

Parameters:
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
  • proxs (iterable) – iterable of proximal operators
  • alphas (iterable) – iterable of coefficients for proximal gradient descent
  • lr (float) – learning rate
  • momentum (float) – momentum factor (default: 0)
  • weight_decay (float) – weight decay (L2 penalty) (default: 0)
  • dampening (float) – dampening for momentum (default: 0)
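
A minimal usage sketch, assuming PGD follows the standard torch.optim stepping interface and that the module exposes a prox_operators instance of ProxOperators (both assumptions, not guaranteed by this page):

>>> import torch
>>> from deeprobust.graph.defense.pgd import PGD, prox_operators
>>> S = torch.zeros(10, 10, requires_grad=True)  # e.g. an adjacency estimate
>>> optimizer = PGD([S], proxs=[prox_operators.prox_l1],
          alphas=[0.0005], lr=0.01)
>>> loss = (S - torch.eye(10)).pow(2).sum()      # a toy smooth objective
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()  # gradient step followed by the proximal mapping
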
class ProxOperators[source]

Proximal Operators.

prox_l1(data, alpha)[source]

Proximal operator for l1 norm.

prox_nuclear(data, alpha)[source]

Proximal operator for nuclear norm (trace norm).
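
Both operators are soft-thresholdings. A sketch of the standard closed forms (data and alpha are stand-ins, not the exact source):

>>> import torch
>>> # prox_l1: entrywise soft-thresholding, sign(x) * max(|x| - alpha, 0)
>>> shrunk = torch.sign(data) * torch.clamp(data.abs() - alpha, min=0)
>>> # prox_nuclear: the same shrinkage applied to the singular values
>>> U, S, Vh = torch.linalg.svd(data, full_matrices=False)
>>> low_rank = U @ torch.diag(torch.clamp(S - alpha, min=0)) @ Vh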

class SGD(params, lr=required, momentum=0, dampening=0, weight_decay=0, nesterov=False)[source]
step(closure=None)[source]

Performs a single optimization step.

Parameters: closure (callable, optional) – A closure that reevaluates the model and returns the loss.

deeprobust.graph.defense.prognn module

class EstimateAdj(adj, symmetric=False, device='cpu')[source]

Provides a PyTorch parameter matrix for the estimated adjacency matrix and the corresponding operations.
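
A construction sketch (the estimated_adj attribute name follows the Pro-GNN reference implementation and is an assumption here):

>>> estimator = EstimateAdj(adj, symmetric=True, device='cpu')
>>> # the learnable matrix is exposed as a torch.nn.Parameter
>>> # (e.g. estimator.estimated_adj), which PGD can optimize directly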

class ProGNN(model, args, device)[source]

ProGNN (Properties Graph Neural Network). See more details in Graph Structure Learning for Robust Graph Neural Networks, KDD 2020, https://arxiv.org/abs/2005.10203.

Parameters:
  • model – the backbone GNN model in ProGNN
  • args – model configs
  • device (str) – ‘cpu’ or ‘cuda’.

Examples

See details in https://github.com/ChandlerBang/Pro-GNN.
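
A minimal training sketch based on the signatures below; args must carry the Pro-GNN hyperparameters (learning rates and regularization coefficients) defined in the repository's training script:

>>> from deeprobust.graph.defense import GCN, ProGNN
>>> gcn = GCN(nfeat=features.shape[1], nhid=16,
          nclass=labels.max().item() + 1, device='cpu')
>>> prognn = ProGNN(gcn, args, device='cpu')
>>> prognn.fit(features, adj, labels, idx_train, idx_val)
>>> prognn.test(features, labels, idx_test)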

fit(features, adj, labels, idx_train, idx_val, **kwargs)[source]

Train Pro-GNN.

Parameters:
  • features – node features
  • adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
  • labels – node labels
  • idx_train – node training indices
  • idx_val – node validation indices
test(features, labels, idx_test)[source]

Evaluate the performance of ProGNN on the test set.

deeprobust.graph.defense.r_gcn module

Robust Graph Convolutional Networks Against Adversarial Attacks. KDD 2019.
http://pengcui.thumedialab.com/papers/RGCN.pdf
Author’s Tensorflow implementation:
https://github.com/thumanlab/nrlweb/tree/master/static/assets/download
class GGCL_D(in_features, out_features, dropout)[source]

Graph Gaussian Convolution Layer (GGCL) when the input is a distribution

class GGCL_F(in_features, out_features, dropout=0.6)[source]

Graph Gaussian Convolution Layer (GGCL) when the input is features

class GaussianConvolution(in_features, out_features)[source]

[Deprecated] Alternative Gaussian convolution layer.

class RGCN(nnodes, nfeat, nhid, nclass, gamma=1.0, beta1=0.0005, beta2=0.0005, lr=0.01, dropout=0.6, device='cpu')[source]

Robust Graph Convolutional Networks Against Adversarial Attacks. KDD 2019.

Parameters:
  • nnodes (int) – number of nodes in the input graph
  • nfeat (int) – size of input feature dimension
  • nhid (int) – number of hidden units
  • nclass (int) – size of output dimension
  • gamma (float) – hyper-parameter for RGCN. See more details in the paper.
  • beta1 (float) – hyper-parameter for RGCN. See more details in the paper.
  • beta2 (float) – hyper-parameter for RGCN. See more details in the paper.
  • lr (float) – learning rate for GCN
  • dropout (float) – dropout rate for GCN
  • device (str) – ‘cpu’ or ‘cuda’.
fit(features, adj, labels, idx_train, idx_val=None, train_iters=200, verbose=True, **kwargs)[source]

Train RGCN.

Parameters:
  • features – node features
  • adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
  • labels – node labels
  • idx_train – node training indices
  • idx_val – node validation indices. If not given (None), the RGCN training process will not adopt early stopping
  • train_iters (int) – number of training epochs
  • verbose (bool) – whether to show verbose logs

Examples

We can first load dataset and then train RGCN.

>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import RGCN
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = RGCN(nnodes=perturbed_adj.shape[0], nfeat=features.shape[1],
                 nclass=labels.max()+1, nhid=32, device='cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val,
              train_iters=200, verbose=True)
>>> model.test(idx_test)
predict()[source]
Returns: output (log probabilities) of RGCN
Return type: torch.FloatTensor
test(idx_test)[source]

Evaluate the performance on the test set.

Module contents

class GAT(nfeat, nhid, nclass, heads=8, output_heads=1, dropout=0.5, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)[source]

2 Layer Graph Attention Network based on pytorch geometric.

Parameters:
  • nfeat (int) – size of input feature dimension
  • nhid (int) – number of hidden units
  • nclass (int) – size of output dimension
  • heads (int) – number of attention heads
  • output_heads (int) – number of attention output heads
  • dropout (float) – dropout rate for GAT
  • lr (float) – learning rate for GAT
  • weight_decay (float) – weight decay coefficient (l2 normalization) for GAT.
  • with_bias (bool) – whether to include bias term in GAT weights.
  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load dataset and then train GAT.

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import GAT
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> gat = GAT(nfeat=features.shape[1],
          nhid=8, heads=8,
          nclass=labels.max().item() + 1,
          dropout=0.5, device='cpu')
>>> gat = gat.to('cpu')
>>> pyg_data = Dpr2Pyg(data) # convert deeprobust dataset to pyg dataset
>>> gat.fit(pyg_data, patience=100, verbose=True) # train with earlystopping
fit(pyg_data, train_iters=1000, initialize=True, verbose=False, patience=100, **kwargs)[source]

Train the GAT model. When idx_val is not None, pick the best model according to the validation loss.

Parameters:
  • pyg_data – pytorch geometric dataset object
  • train_iters (int) – number of training epochs
  • initialize (bool) – whether to initialize parameters before training
  • verbose (bool) – whether to show verbose logs
  • patience (int) – patience for early stopping, only valid when idx_val is given
initialize()[source]

Initialize parameters of GAT.

predict()[source]
Returns: output (log probabilities) of GAT
Return type: torch.FloatTensor
test()[source]

Evaluate GAT performance on test set.

Parameters: idx_test – node testing indices
train_with_early_stopping(train_iters, patience, verbose)[source]

Early stopping based on the validation loss.

class ChebNet(nfeat, nhid, nclass, num_hops=3, dropout=0.5, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)[source]

2 Layer ChebNet based on pytorch geometric.

Parameters:
  • nfeat (int) – size of input feature dimension
  • nhid (int) – number of hidden units
  • nclass (int) – size of output dimension
  • num_hops (int) – number of hops in ChebConv
  • dropout (float) – dropout rate for ChebNet
  • lr (float) – learning rate for ChebNet
  • weight_decay (float) – weight decay coefficient (l2 normalization) for ChebNet.
  • with_bias (bool) – whether to include bias term in ChebNet weights.
  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load dataset and then train ChebNet.

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import ChebNet
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> cheby = ChebNet(nfeat=features.shape[1],
          nhid=16, num_hops=3,
          nclass=labels.max().item() + 1,
          dropout=0.5, device='cpu')
>>> cheby = cheby.to('cpu')
>>> pyg_data = Dpr2Pyg(data) # convert deeprobust dataset to pyg dataset
>>> cheby.fit(pyg_data, patience=10, verbose=True) # train with earlystopping
fit(pyg_data, train_iters=200, initialize=True, verbose=False, patience=500, **kwargs)[source]

Train the ChebNet model. When idx_val is not None, pick the best model according to the validation loss.

Parameters:
  • pyg_data – pytorch geometric dataset object
  • train_iters (int) – number of training epochs
  • initialize (bool) – whether to initialize parameters before training
  • verbose (bool) – whether to show verbose logs
  • patience (int) – patience for early stopping, only valid when idx_val is given
initialize()[source]

Initialize parameters of ChebNet.

predict()[source]
Returns: output (log probabilities) of ChebNet
Return type: torch.FloatTensor
test()[source]

Evaluate ChebNet performance on test set.

Parameters: idx_test – node testing indices
train_with_early_stopping(train_iters, patience, verbose)[source]

Early stopping based on the validation loss.

class SGC(nfeat, nclass, K=3, cached=True, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)[source]

SGC based on pytorch geometric. Simplifying Graph Convolutional Networks.

Parameters:
  • nfeat (int) – size of input feature dimension
  • nclass (int) – size of output dimension
  • K (int) – number of propagation steps in SGC
  • cached (bool) – whether to set the cache flag in SGConv
  • lr (float) – learning rate for SGC
  • weight_decay (float) – weight decay coefficient (l2 normalization) for SGC.
  • with_bias (bool) – whether to include bias term in SGC weights.
  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load dataset and then train SGC.

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import SGC
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> sgc = SGC(nfeat=features.shape[1], K=3, lr=0.1,
          nclass=labels.max().item() + 1, device='cuda')
>>> sgc = sgc.to('cuda')
>>> pyg_data = Dpr2Pyg(data) # convert deeprobust dataset to pyg dataset
>>> sgc.fit(pyg_data, train_iters=200, patience=200, verbose=True) # train with earlystopping
fit(pyg_data, train_iters=200, initialize=True, verbose=False, patience=500, **kwargs)[source]

Train the SGC model. When idx_val is not None, pick the best model according to the validation loss.

Parameters:
  • pyg_data – pytorch geometric dataset object
  • train_iters (int) – number of training epochs
  • initialize (bool) – whether to initialize parameters before training
  • verbose (bool) – whether to show verbose logs
  • patience (int) – patience for early stopping, only valid when idx_val is given
initialize()[source]

Initialize parameters of SGC.

predict()[source]
Returns: output (log probabilities) of SGC
Return type: torch.FloatTensor
test()[source]

Evaluate SGC performance on test set.

Parameters: idx_test – node testing indices
train_with_early_stopping(train_iters, patience, verbose)[source]

Early stopping based on the validation loss.

class SimPGCN(nnodes, nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, lambda_=5, gamma=0.1, bias_init=0, with_bias=True, device=None)[source]

SimP-GCN: Node similarity preserving graph convolutional networks. https://arxiv.org/abs/2011.09643

Parameters:
  • nnodes (int) – number of nodes in the input graph
  • nfeat (int) – size of input feature dimension
  • nhid (int) – number of hidden units
  • nclass (int) – size of output dimension
  • lambda_ (float) – coefficient for the SSL loss in SimP-GCN
  • gamma (float) – coefficient for the adaptive learnable self-loops
  • bias_init (float) – bias init for the score
  • dropout (float) – dropout rate for GCN
  • lr (float) – learning rate for GCN
  • weight_decay (float) – weight decay coefficient (l2 normalization) for GCN. When with_relu is True, weight_decay will be set to 0.
  • with_bias (bool) – whether to include bias term in GCN weights.
  • device (str) – ‘cpu’ or ‘cuda’.

Examples

We can first load dataset and then train SimPGCN.

See the detailed hyper-parameter setting in https://github.com/ChandlerBang/SimP-GCN.

>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import SimPGCN
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> model = SimPGCN(nnodes=features.shape[0], nfeat=features.shape[1],
    nhid=16, nclass=labels.max()+1, device='cuda')
>>> model = model.to('cuda')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, train_iters=200, verbose=True)
>>> model.test(idx_test)
initialize()[source]

Initialize parameters of SimPGCN.

myforward(fea, adj)[source]

Output the embedding and the log_softmax prediction.

predict(features=None, adj=None)[source]

By default, the inputs should be the unnormalized data.

Parameters:
  • features – node features. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.
  • adj – adjacency matrix. If features and adj are not given, this function will use the previously stored features and adj from training to make predictions.
Returns:

output (log probabilities) of SimPGCN

Return type:

torch.FloatTensor

test(idx_test)[source]

Evaluate SimPGCN performance on the test set.

Parameters: idx_test – node testing indices
class Node2Vec[source]

node2vec: Scalable Feature Learning for Networks. KDD’16. To use this model, you need to “pip install node2vec” first.

Examples

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.global_attack import NodeEmbeddingAttack
>>> from deeprobust.graph.defense import Node2Vec
>>> data = Dataset(root='/tmp/', name='cora_ml', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # set up attack model
>>> attacker = NodeEmbeddingAttack()
>>> attacker.attack(adj, attack_type="remove", n_perturbations=1000)
>>> modified_adj = attacker.modified_adj
>>> print("Test Node2vec on clean graph")
>>> model = Node2Vec()
>>> model.fit(adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test Node2vec on attacked graph")
>>> model = Node2Vec()
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
node2vec(adj, embedding_dim=64, walk_length=30, walks_per_node=10, workers=8, window_size=10, num_neg_samples=1, p=4, q=1)[source]

Compute Node2Vec embeddings for the given graph.

Parameters:
  • adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph
  • embedding_dim (int, optional) – Dimension of the embedding
  • walks_per_node (int, optional) – Number of walks sampled from each node
  • walk_length (int, optional) – Length of each random walk
  • workers (int, optional) – Number of threads (see gensim.models.Word2Vec)
  • window_size (int, optional) – Window size (see gensim.models.Word2Vec)
  • num_neg_samples (int, optional) – Number of negative samples (see gensim.models.Word2Vec)
  • p (float) – The return hyperparameter p in node2vec; a larger p makes the walk less likely to immediately revisit the previous node
  • q (float) – The in-out hyperparameter q in node2vec; q > 1 biases walks toward local (BFS-like) exploration, q < 1 toward outward (DFS-like) exploration
class DeepWalk(type='skipgram')[source]

DeepWalk: Online Learning of Social Representations. KDD’14. The implementation is modified from https://github.com/abojchevski/node_embedding_attack

Examples

>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.global_attack import NodeEmbeddingAttack
>>> from deeprobust.graph.defense import DeepWalk
>>> data = Dataset(root='/tmp/', name='cora_ml', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # set up attack model
>>> attacker = NodeEmbeddingAttack()
>>> attacker.attack(adj, attack_type="remove", n_perturbations=1000)
>>> modified_adj = attacker.modified_adj
>>> print("Test DeepWalk on clean graph")
>>> model = DeepWalk()
>>> model.fit(adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test DeepWalk on attacked graph")
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test DeepWalk SVD")
>>> model = DeepWalk(type="svd")
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
deepwalk_skipgram(adj, embedding_dim=64, walk_length=80, walks_per_node=10, workers=8, window_size=10, num_neg_samples=1)[source]

Compute DeepWalk embeddings for the given graph using the skip-gram formulation.

Parameters:
  • adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph
  • embedding_dim (int, optional) – Dimension of the embedding
  • walks_per_node (int, optional) – Number of walks sampled from each node
  • walk_length (int, optional) – Length of each random walk
  • workers (int, optional) – Number of threads (see gensim.models.Word2Vec)
  • window_size (int, optional) – Window size (see gensim.models.Word2Vec)
  • num_neg_samples (int, optional) – Number of negative samples (see gensim.models.Word2Vec)
deepwalk_svd(adj, window_size=10, embedding_dim=64, num_neg_samples=1, sparse=True)[source]

Compute DeepWalk embeddings for the given graph using the matrix factorization formulation.

Parameters:
  • adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph
  • window_size (int) – Size of the window
  • embedding_dim (int) – Size of the embedding
  • num_neg_samples (int) – Number of negative samples
  • sparse (bool) – Whether to perform sparse operations
Returns:

Embedding matrix.

Return type:

np.ndarray, shape [num_nodes, embedding_dim]
svd_embedding(x, embedding_dim, sparse=False)[source]

Computes an embedding by selecting the top embedding_dim largest singular values/vectors.

Parameters:
  • x (sp.csr_matrix or np.ndarray) – The matrix that we want to embed
  • embedding_dim (int) – Dimension of the embedding
  • sparse (bool) – Whether to perform sparse operations
Returns:

Embedding matrices.

Return type:

np.ndarray, shape [?, embedding_dim], np.ndarray, shape [?, embedding_dim]