deeprobust.graph.defense package
Submodules
deeprobust.graph.defense.adv_training module
class AdvTraining(model, adversary=None, device='cpu')
Adversarial training framework for defending against attacks.
Parameters: - model – model to protect, e.g., GCN
- adversary – attack model
- device (str) – ‘cpu’ or ‘cuda’
adv_train(features, adj, labels, idx_train, train_iters, **kwargs)
Start adversarial training.
Parameters: - features – node features
- adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
- labels – node labels
- idx_train – node training indices
- idx_val – node validation indices. If not given (None), the GCN training process will not use early stopping
- train_iters (int) – number of training epochs
deeprobust.graph.defense.gcn module
class GCN(nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device=None)
2-layer Graph Convolutional Network.
Parameters: - nfeat (int) – size of input feature dimension
- nhid (int) – number of hidden units
- nclass (int) – size of output dimension
- dropout (float) – dropout rate for GCN
- lr (float) – learning rate for GCN
- weight_decay (float) – weight decay coefficient (L2 regularization) for GCN. When with_relu is False, weight_decay will be set to 0.
- with_relu (bool) – whether to use relu activation function. If False, GCN will be linearized.
- with_bias (bool) – whether to include bias term in GCN weights.
- device (str) – ‘cpu’ or ‘cuda’.
Examples
We can first load a dataset and then train GCN.
>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.defense import GCN
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> gcn = GCN(nfeat=features.shape[1], nhid=16, nclass=labels.max().item() + 1, dropout=0.5, device='cpu')
>>> gcn = gcn.to('cpu')
>>> gcn.fit(features, adj, labels, idx_train)  # train without early stopping
>>> gcn.fit(features, adj, labels, idx_train, idx_val, patience=30)  # train with early stopping
>>> gcn.test(idx_test)
fit(features, adj, labels, idx_train, idx_val=None, train_iters=200, initialize=True, verbose=False, normalize=True, patience=500, **kwargs)
Train the GCN model. When idx_val is not None, the best model is picked according to the validation loss.
Parameters: - features – node features
- adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
- labels – node labels
- idx_train – node training indices
- idx_val – node validation indices. If not given (None), the GCN training process will not use early stopping
- train_iters (int) – number of training epochs
- initialize (bool) – whether to initialize parameters before training
- verbose (bool) – whether to show verbose logs
- normalize (bool) – whether to normalize the input adjacency matrix.
- patience (int) – patience for early stopping, only valid when idx_val is given
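The normalize flag controls whether the adjacency matrix is normalized before propagation. As a minimal sketch, assuming the standard GCN renormalization D̃^{-1/2}(A + I)D̃^{-1/2} from Kipf & Welling (the library's own routine may differ in details such as sparse-tensor handling), the computation on a small dense matrix looks like this; normalize_adj is a hypothetical helper name.

```python
import numpy as np

def normalize_adj(adj: np.ndarray) -> np.ndarray:
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a dense adjacency matrix (hypothetical helper)."""
    a_tilde = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_tilde.sum(axis=1)                 # degrees of A + I (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale rows and columns by D~^{-1/2}; broadcasting avoids building diagonal matrices.
    return d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]

adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
print(normalize_adj(adj))
```

Adding self-loops before normalizing keeps each node's own features in the aggregation and guarantees no zero degrees.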
predict(features=None, adj=None)
By default, the input adjacency matrix should be unnormalized.
Parameters: - features – node features. If features and adj are not given, this function uses the features and adj stored during training to make predictions.
- adj – adjacency matrix. If features and adj are not given, this function uses the features and adj stored during training to make predictions.
Returns: output (log probabilities) of GCN
Return type: torch.FloatTensor
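Because predict returns log-probabilities (one row per node), class predictions come from a row-wise argmax, and exponentiating a row recovers probabilities that sum to 1. A minimal sketch using plain Python lists as a stand-in for the returned torch.FloatTensor (with a real tensor you would typically call output.argmax(1)); both helper names are hypothetical.

```python
import math

def predictions_from_log_probs(log_probs):
    """Row-wise argmax over per-node log-probabilities (hypothetical helper)."""
    return [max(range(len(row)), key=row.__getitem__) for row in log_probs]

def accuracy(log_probs, labels, idx):
    """Fraction of nodes in idx whose predicted class matches labels (hypothetical helper)."""
    preds = predictions_from_log_probs(log_probs)
    return sum(preds[i] == labels[i] for i in idx) / len(idx)

# Toy log-softmax output for 3 nodes and 2 classes.
log_probs = [[math.log(0.9), math.log(0.1)],
             [math.log(0.2), math.log(0.8)],
             [math.log(0.6), math.log(0.4)]]
labels = [0, 1, 1]
print(predictions_from_log_probs(log_probs))  # [0, 1, 0]
print(accuracy(log_probs, labels, idx=[0, 1, 2]))
```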
class GraphConvolution(in_features, out_features, with_bias=True)
Simple GCN layer, similar to https://github.com/tkipf/pygcn
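The layer follows the pygcn design referenced above: transform the node features with a weight matrix, then aggregate over neighbors. A hedged NumPy sketch of that forward pass, assuming adj_norm is an already-normalized dense adjacency matrix (no dropout, no weight initialization; graph_convolution is a hypothetical name):

```python
import numpy as np

def graph_convolution(x, adj_norm, weight, bias=None):
    """One GCN layer in the pygcn style: adj_norm @ (x @ weight) + bias."""
    support = x @ weight              # transform node features: (n, in) @ (in, out)
    out = adj_norm @ support          # aggregate over (normalized) neighbors
    if bias is not None:              # mirrors the with_bias option
        out = out + bias
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))       # 4 nodes, 3 input features
weight = rng.standard_normal((3, 2))  # in_features=3, out_features=2
adj_norm = np.eye(4)                  # identity "graph": each node sees only itself
out = graph_convolution(x, adj_norm, weight, bias=np.zeros(2))
print(out.shape)  # (4, 2)
```

Multiplying x @ weight first keeps the dense product small when out_features is smaller than in_features.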
deeprobust.graph.defense.gcn_preprocess module
class GCNJaccard(nfeat, nhid, nclass, binary_feature=True, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device='cpu')
GCNJaccard first preprocesses the input graph by dropping dissimilar edges and then trains a GCN on the processed graph. See more details in Adversarial Examples on Graph Data: Deep Insights into Attack and Defense, https://arxiv.org/pdf/1903.01610.pdf.
Parameters: - nfeat (int) – size of input feature dimension
- nhid (int) – number of hidden units
- nclass (int) – size of output dimension
- dropout (float) – dropout rate for GCN
- lr (float) – learning rate for GCN
- weight_decay (float) – weight decay coefficient (L2 regularization) for GCN. When with_relu is False, weight_decay will be set to 0.
- with_relu (bool) – whether to use relu activation function. If False, GCN will be linearized.
- with_bias (bool) – whether to include bias term in GCN weights.
- device (str) – ‘cpu’ or ‘cuda’.
Examples
We can first load a dataset and then train GCNJaccard.
>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import GCNJaccard
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = GCNJaccard(nfeat=features.shape[1], nhid=16, nclass=labels.max().item() + 1, dropout=0.5, device='cpu').to('cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, threshold=0.03)
drop_dissimilar_edges(features, adj, metric='similarity')
Drop dissimilar edges (faster version using numba).
fit(features, adj, labels, idx_train, idx_val=None, threshold=0.01, train_iters=200, initialize=True, verbose=True, **kwargs)
First drop edges whose endpoint similarity is smaller than the given threshold, and then train the GCN model on the processed graph. When idx_val is not None, the best model is picked according to the validation loss.
Parameters: - features – node features. The format can be numpy.array or scipy matrix
- adj – the adjacency matrix.
- labels – node labels
- idx_train – node training indices
- idx_val – node validation indices. If not given (None), the GCN training process will not use early stopping
- threshold (float) – similarity threshold for dropping edges. If the similarity between two connected nodes is smaller than threshold, the edge between them is removed.
- train_iters (int) – number of training epochs
- initialize (bool) – whether to initialize parameters before training
- verbose (bool) – whether to show verbose logs
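The threshold rule can be illustrated with a plain-Python sketch of the idea from the cited paper, assuming binary node features: compute the Jaccard similarity between the feature sets of an edge's two endpoints and drop the edge when it falls below threshold. This is not the numba-accelerated drop_dissimilar_edges; drop_edges_by_jaccard and the edge-list/feature-set input layout are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of active (nonzero) binary feature indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def drop_edges_by_jaccard(edges, features, threshold):
    """Keep edges (u, v) whose endpoint Jaccard similarity is >= threshold.

    edges: iterable of (u, v) node pairs
    features: list where features[i] is the set of nonzero feature indices of node i
    """
    kept, removed = [], 0
    for u, v in edges:
        if jaccard(features[u], features[v]) < threshold:
            removed += 1          # dissimilar endpoints: treated as a likely adversarial edge
        else:
            kept.append((u, v))
    return kept, removed

features = [{0, 1, 2}, {1, 2, 3}, {7, 8}]
edges = [(0, 1), (0, 2), (1, 2)]
kept, removed = drop_edges_by_jaccard(edges, features, threshold=0.03)
print(kept, removed)  # [(0, 1)] 2
```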
predict(features=None, adj=None)
By default, the input adjacency matrix should be unnormalized.
Parameters: - features – node features. If features and adj are not given, this function uses the features and adj stored during training to make predictions.
- adj – adjacency matrix. If features and adj are not given, this function uses the features and adj stored during training to make predictions.
Returns: output (log probabilities) of GCNJaccard
Return type: torch.FloatTensor
class GCNSVD(nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, with_relu=True, with_bias=True, device='cpu')
GCNSVD is a 2-layer Graph Convolutional Network with truncated SVD as preprocessing. See more details in All You Need Is Low (Rank): Defending Against Adversarial Attacks on Graphs, https://dl.acm.org/doi/abs/10.1145/3336191.3371789.
Parameters: - nfeat (int) – size of input feature dimension
- nhid (int) – number of hidden units
- nclass (int) – size of output dimension
- dropout (float) – dropout rate for GCN
- lr (float) – learning rate for GCN
- weight_decay (float) – weight decay coefficient (L2 regularization) for GCN. When with_relu is False, weight_decay will be set to 0.
- with_relu (bool) – whether to use relu activation function. If False, GCN will be linearized.
- with_bias (bool) – whether to include bias term in GCN weights.
- device (str) – ‘cpu’ or ‘cuda’.
Examples
We can first load a dataset and then train GCNSVD.
>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import GCNSVD
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = GCNSVD(nfeat=features.shape[1], nhid=16, nclass=labels.max().item() + 1, dropout=0.5, device='cpu').to('cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, k=20)
fit(features, adj, labels, idx_train, idx_val=None, k=50, train_iters=200, initialize=True, verbose=True, **kwargs)
First compute a rank-k approximation of the adjacency matrix via truncated SVD, and then train the GCN model on the processed graph. When idx_val is not None, the best model is picked according to the validation loss.
Parameters: - features – node features
- adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
- labels – node labels
- idx_train – node training indices
- idx_val – node validation indices. If not given (None), the GCN training process will not use early stopping
- k (int) – number of singular values and vectors to compute.
- train_iters (int) – number of training epochs
- initialize (bool) – whether to initialize parameters before training
- verbose (bool) – whether to show verbose logs
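The rank-k preprocessing can be sketched with NumPy's full SVD on a small dense matrix (a real implementation on large sparse graphs would use a truncated sparse solver instead; low_rank_adj is a hypothetical name). Note that the result is a dense, real-valued matrix rather than a 0/1 adjacency matrix.

```python
import numpy as np

def low_rank_adj(adj, k):
    """Rank-k approximation of a dense adjacency matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(adj)                 # s is sorted in descending order
    return (u[:, :k] * s[:k]) @ vt[:k, :]         # keep only the top-k singular triplets

# Two disjoint triangles: a block-structured graph.
block = np.ones((3, 3)) - np.eye(3)
adj = np.block([[block, np.zeros((3, 3))],
                [np.zeros((3, 3)), block]])
adj[0, 3] = adj[3, 0] = 1.0                        # one "perturbation" edge across blocks
approx = low_rank_adj(adj, k=2)
print(np.linalg.matrix_rank(approx))              # 2
```

By the Eckart–Young theorem this is the best rank-k approximation in Frobenius norm, which is why small, high-rank perturbations tend to be discarded.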
predict(features=None, adj=None)
By default, the input adjacency matrix should be unnormalized.
Parameters: - features – node features. If features and adj are not given, this function uses the features and adj stored during training to make predictions.
- adj – adjacency matrix. If features and adj are not given, this function uses the features and adj stored during training to make predictions.
Returns: output (log probabilities) of GCNSVD
Return type: torch.FloatTensor
deeprobust.graph.defense.pgd module
class PGD(params, proxs, alphas, lr=required, momentum=0, dampening=0, weight_decay=0)
Proximal gradient descent.
Parameters: - params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
- proxs (iterable) – iterable of proximal operators
- alphas (iterable) – iterable of coefficients for proximal gradient descent
- lr (float) – learning rate (required; there is no default value)
- momentum (float) – momentum factor (default: 0)
- weight_decay (float) – weight decay (L2 penalty) (default: 0)
- dampening (float) – dampening for momentum (default: 0)
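Each proximal gradient step takes an ordinary gradient step and then applies a proximal operator. As an illustrative sketch (not this class's implementation), here is one step on a list of scalar parameters using the L1 proximal operator, soft thresholding; scaling the threshold as lr * alpha is an assumption, and both function names are hypothetical. The ProGNN paper uses operators of this kind to keep the estimated adjacency matrix sparse.

```python
def soft_threshold(x, tau):
    """Proximal operator of tau * |x| (L1 norm): shrink toward zero by tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

def proximal_gradient_step(params, grads, lr, alpha):
    """Gradient step followed by the L1 prox with threshold lr * alpha (assumed scaling)."""
    stepped = [p - lr * g for p, g in zip(params, grads)]
    return [soft_threshold(p, lr * alpha) for p in stepped]

params = [0.5, -0.05, 0.02]
grads = [0.0, 0.0, 0.0]
print(proximal_gradient_step(params, grads, lr=0.1, alpha=1.0))
```

Small entries are set exactly to zero, which a plain L1 penalty inside ordinary gradient descent would not do.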
deeprobust.graph.defense.prognn module
class EstimateAdj(adj, symmetric=False, device='cpu')
Provides a PyTorch parameter matrix for the estimated adjacency matrix, together with the corresponding operations.
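A learned adjacency matrix has to stay a valid graph. A hedged sketch of two common constraints, assuming (as in the ProGNN paper) that a symmetric estimate is obtained by averaging the matrix with its transpose and that entries are projected back to [0, 1] after updates; symmetrize_and_project is a hypothetical helper, not this class's API.

```python
def symmetrize_and_project(s):
    """Average S with its transpose, then clip every entry to [0, 1] (hypothetical helper)."""
    n = len(s)
    return [[min(1.0, max(0.0, (s[i][j] + s[j][i]) / 2)) for j in range(n)]
            for i in range(n)]

est = [[0.0, 1.5, -0.5],
       [0.9, 0.0, 0.25],
       [0.25, 0.75, 0.0]]
print(symmetrize_and_project(est))  # [[0.0, 1.0, 0.0], [1.0, 0.0, 0.5], [0.0, 0.5, 0.0]]
```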
class ProGNN(model, args, device)
ProGNN (Properties Graph Neural Network). See more details in Graph Structure Learning for Robust Graph Neural Networks, KDD 2020, https://arxiv.org/abs/2005.10203.
Parameters: - model – the backbone GNN model in ProGNN
- args – model configs
- device (str) – ‘cpu’ or ‘cuda’.
Examples
See details in https://github.com/ChandlerBang/Pro-GNN.
deeprobust.graph.defense.r_gcn module
- Robust Graph Convolutional Networks Against Adversarial Attacks. KDD 2019.
- http://pengcui.thumedialab.com/papers/RGCN.pdf
- Author’s TensorFlow implementation:
- https://github.com/thumanlab/nrlweb/tree/master/static/assets/download
class GGCL_D(in_features, out_features, dropout)
Graph Gaussian Convolution Layer (GGCL) when the input is a distribution.
class GGCL_F(in_features, out_features, dropout=0.6)
Graph Gaussian Convolution Layer (GGCL) when the input is a feature matrix.
class GaussianConvolution(in_features, out_features)
[Deprecated] Alternative Gaussian convolution layer.
class RGCN(nnodes, nfeat, nhid, nclass, gamma=1.0, beta1=0.0005, beta2=0.0005, lr=0.01, dropout=0.6, device='cpu')
Robust Graph Convolutional Networks Against Adversarial Attacks. KDD 2019.
Parameters: - nnodes (int) – number of nodes in the input graph
- nfeat (int) – size of input feature dimension
- nhid (int) – number of hidden units
- nclass (int) – size of output dimension
- gamma (float) – hyper-parameter for RGCN. See more details in the paper.
- beta1 (float) – hyper-parameter for RGCN. See more details in the paper.
- beta2 (float) – hyper-parameter for RGCN. See more details in the paper.
- lr (float) – learning rate for RGCN
- dropout (float) – dropout rate for RGCN
- device (str) – ‘cpu’ or ‘cuda’.
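In the RGCN paper, node representations are Gaussian distributions, and gamma controls a variance-based attention: features with large variance are down-weighted by exp(-gamma * variance) before aggregation. A hedged sketch of only that weighting step on plain lists (variance_attention is a hypothetical name; the layer's adjacency normalization and weight matrices are omitted):

```python
import math

def variance_attention(mean, var, gamma=1.0):
    """Down-weight high-variance features: attention = exp(-gamma * var).

    Scaling a Gaussian's mean by a scales its variance by a**2,
    so the variance is multiplied by the squared attention.
    """
    att = [math.exp(-gamma * v) for v in var]
    new_mean = [m * a for m, a in zip(mean, att)]
    new_var = [v * a * a for v, a in zip(var, att)]
    return new_mean, new_var

mean, var = [1.0, 1.0], [0.0, 2.0]
print(variance_attention(mean, var, gamma=1.0))
```

With gamma=0 the attention is 1 everywhere and the inputs pass through unchanged; larger gamma suppresses uncertain (possibly attacked) features more aggressively.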
fit(features, adj, labels, idx_train, idx_val=None, train_iters=200, verbose=True, **kwargs)
Train RGCN.
Parameters: - features – node features
- adj – the adjacency matrix. The format could be torch.tensor or scipy matrix
- labels – node labels
- idx_train – node training indices
- idx_val – node validation indices. If not given (None), the GCN training process will not use early stopping
- train_iters (int) – number of training epochs
- verbose (bool) – whether to show verbose logs
Examples
We can first load a dataset and then train RGCN.
>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import RGCN
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> # train defense model
>>> model = RGCN(nnodes=perturbed_adj.shape[0], nfeat=features.shape[1], nclass=labels.max()+1, nhid=32, device='cpu')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, train_iters=200, verbose=True)
>>> model.test(idx_test)
Module contents
class GAT(nfeat, nhid, nclass, heads=8, output_heads=1, dropout=0.5, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)
2-layer Graph Attention Network based on PyTorch Geometric.
Parameters: - nfeat (int) – size of input feature dimension
- nhid (int) – number of hidden units
- nclass (int) – size of output dimension
- heads (int) – number of attention heads
- output_heads (int) – number of attention output heads
- dropout (float) – dropout rate for GAT
- lr (float) – learning rate for GAT
- weight_decay (float) – weight decay coefficient (L2 regularization) for GAT.
- with_bias (bool) – whether to include bias term in GAT weights.
- device (str) – ‘cpu’ or ‘cuda’.
Examples
We can first load a dataset and then train GAT.
>>> from deeprobust.graph.data import Dataset, Dpr2Pyg
>>> from deeprobust.graph.defense import GAT
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> gat = GAT(nfeat=features.shape[1], nhid=8, heads=8, nclass=labels.max().item() + 1, dropout=0.5, device='cpu')
>>> gat = gat.to('cpu')
>>> pyg_data = Dpr2Pyg(data)  # convert deeprobust dataset to pyg dataset
>>> gat.fit(pyg_data, patience=100, verbose=True)  # train with early stopping
fit(pyg_data, train_iters=1000, initialize=True, verbose=False, patience=100, **kwargs)
Train the GAT model. When idx_val is not None, pick the best model according to the validation loss.
Parameters: - pyg_data – pytorch geometric dataset object
- train_iters (int) – number of training epochs
- initialize (bool) – whether to initialize parameters before training
- verbose (bool) – whether to show verbose logs
- patience (int) – patience for early stopping, only valid when idx_val is given
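The patience parameter follows the usual early-stopping rule: training stops once the validation loss has not improved for patience consecutive epochs, and the best epoch's model is kept. A framework-free sketch of that rule over a precomputed sequence of validation losses (early_stopping_epoch is hypothetical and only tracks epoch indices, not model weights):

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch) under a patience-based early-stopping rule.

    val_losses: validation loss observed after each epoch, in order.
    Training stops after `patience` consecutive epochs without improvement.
    """
    best_loss, best_epoch, bad_epochs = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0   # new best: reset counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return best_epoch, epoch                         # stop early
    return best_epoch, len(val_losses) - 1                       # ran all epochs

losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.71, 0.9]
print(early_stopping_epoch(losses, patience=3))  # (2, 5)
```

When patience is at least train_iters, the rule never triggers and training runs for all epochs.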
class ChebNet(nfeat, nhid, nclass, num_hops=3, dropout=0.5, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)
2-layer ChebNet based on PyTorch Geometric.
Parameters: - nfeat (int) – size of input feature dimension
- nhid (int) – number of hidden units
- nclass (int) – size of output dimension
- num_hops (int) – number of hops in ChebConv
- dropout (float) – dropout rate for ChebNet
- lr (float) – learning rate for ChebNet
- weight_decay (float) – weight decay coefficient (L2 regularization) for ChebNet.
- with_bias (bool) – whether to include bias term in ChebNet weights.
- device (str) – ‘cpu’ or ‘cuda’.
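ChebConv filters node features with Chebyshev polynomials of the scaled graph Laplacian L̂, computed by the recursion T_0 X = X, T_1 X = L̂X, T_k X = 2L̂ T_{k-1} X - T_{k-2} X; each term then gets its own learned weight matrix and the results are summed. A NumPy sketch of computing those terms, assuming L̂ = L - I (the symmetric normalized Laplacian with λ_max approximated by 2); how num_hops maps onto the number of terms inside PyTorch Geometric is not asserted here, and chebyshev_terms is a hypothetical name.

```python
import numpy as np

def chebyshev_terms(adj, x, num_terms):
    """Return [T_0(L_hat) x, ..., T_{num_terms-1}(L_hat) x] for a dense adjacency matrix.

    Assumes L_hat = L - I with L the symmetric normalized Laplacian (lambda_max = 2),
    so L_hat = -D^{-1/2} A D^{-1/2}. Assumes every node has degree >= 1.
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    l_hat = -(d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :])
    terms = [x]
    if num_terms > 1:
        terms.append(l_hat @ x)
    for _ in range(2, num_terms):
        terms.append(2 * l_hat @ terms[-1] - terms[-2])   # Chebyshev recursion
    return terms

adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)
x = np.eye(3)
terms = chebyshev_terms(adj, x, num_terms=3)
print([t.shape for t in terms])
```

Term k only mixes information from nodes at most k hops away, so the number of terms bounds the receptive field of one layer.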
Examples
We can first load a dataset and then train ChebNet.
>>> from deeprobust.graph.data import Dataset, Dpr2Pyg
>>> from deeprobust.graph.defense import ChebNet
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> cheby = ChebNet(nfeat=features.shape[1], nhid=16, num_hops=3, nclass=labels.max().item() + 1, dropout=0.5, device='cpu')
>>> cheby = cheby.to('cpu')
>>> pyg_data = Dpr2Pyg(data)  # convert deeprobust dataset to pyg dataset
>>> cheby.fit(pyg_data, patience=10, verbose=True)  # train with early stopping
fit(pyg_data, train_iters=200, initialize=True, verbose=False, patience=500, **kwargs)
Train the ChebNet model. When idx_val is not None, pick the best model according to the validation loss.
Parameters: - pyg_data – pytorch geometric dataset object
- train_iters (int) – number of training epochs
- initialize (bool) – whether to initialize parameters before training
- verbose (bool) – whether to show verbose logs
- patience (int) – patience for early stopping, only valid when idx_val is given
class SGC(nfeat, nclass, K=3, cached=True, lr=0.01, weight_decay=0.0005, with_bias=True, device=None)
SGC (Simplifying Graph Convolutional Networks) based on PyTorch Geometric.
Parameters: - nfeat (int) – size of input feature dimension
- nclass (int) – size of output dimension
- K (int) – number of propagation steps in SGC
- cached (bool) – whether to set the cache flag in SGConv
- lr (float) – learning rate for SGC
- weight_decay (float) – weight decay coefficient (L2 regularization) for SGC.
- with_bias (bool) – whether to include bias term in SGC weights.
- device (str) – ‘cpu’ or ‘cuda’.
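SGC removes the nonlinearities between layers, so K propagation steps collapse into precomputing S^K X with S = D̃^{-1/2}(A + I)D̃^{-1/2}, followed by a single linear classifier. A NumPy sketch of the propagation part on a dense toy graph (sgc_features is a hypothetical name; the classifier is omitted):

```python
import numpy as np

def sgc_features(adj, x, k):
    """Precompute S^K X with S = D~^{-1/2} (A + I) D~^{-1/2}, as in SGC."""
    a_tilde = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    s = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
    for _ in range(k):                               # K propagation steps, no nonlinearity
        x = s @ x
    return x                                         # fed to a single linear layer afterwards

adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.eye(4)
print(sgc_features(adj, x, k=3).round(3))
```

Because this propagation has no learnable parameters, it can be computed once and reused across epochs; the cached option of PyTorch Geometric's SGConv stores exactly this result.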
Examples
We can first load a dataset and then train SGC.
>>> from deeprobust.graph.data import Dataset, Dpr2Pyg
>>> from deeprobust.graph.defense import SGC
>>> data = Dataset(root='/tmp/', name='cora')
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> sgc = SGC(nfeat=features.shape[1], K=3, lr=0.1, nclass=labels.max().item() + 1, device='cuda')
>>> sgc = sgc.to('cuda')
>>> pyg_data = Dpr2Pyg(data)  # convert deeprobust dataset to pyg dataset
>>> sgc.fit(pyg_data, train_iters=200, patience=200, verbose=True)  # train with early stopping
fit(pyg_data, train_iters=200, initialize=True, verbose=False, patience=500, **kwargs)
Train the SGC model. When idx_val is not None, pick the best model according to the validation loss.
Parameters: - pyg_data – pytorch geometric dataset object
- train_iters (int) – number of training epochs
- initialize (bool) – whether to initialize parameters before training
- verbose (bool) – whether to show verbose logs
- patience (int) – patience for early stopping, only valid when idx_val is given
class SimPGCN(nnodes, nfeat, nhid, nclass, dropout=0.5, lr=0.01, weight_decay=0.0005, lambda_=5, gamma=0.1, bias_init=0, with_bias=True, device=None)
SimP-GCN: Node similarity preserving graph convolutional networks. https://arxiv.org/abs/2011.09643
Parameters: - nnodes (int) – number of nodes in the input graph
- nfeat (int) – size of input feature dimension
- nhid (int) – number of hidden units
- nclass (int) – size of output dimension
- lambda_ (float) – coefficient for the SSL loss in SimP-GCN
- gamma (float) – coefficient for the adaptive learnable self-loops
- bias_init (float) – initial value of the bias for the score
- dropout (float) – dropout rate for GCN
- lr (float) – learning rate for GCN
- weight_decay (float) – weight decay coefficient (L2 regularization) for GCN.
- with_bias (bool) – whether to include bias term in GCN weights.
- device (str) – ‘cpu’ or ‘cuda’.
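As described in the SimP-GCN paper, each layer roughly mixes the normalized original graph A_n with a normalized kNN graph Af_n built from feature similarity, using a learned per-node score s = sigmoid(H w_s + b_s) that scales the rows of each graph, and adds a learnable self-loop term weighted by gamma: H' = (s * A_n + (1 - s) * Af_n) H W + gamma * diag(H w_k + b_k) H W. The NumPy sketch below shows one such step under stated assumptions: dense matrices, a cosine-similarity kNN graph, no activation or dropout, and the SSL loss weighted by lambda_ omitted. All function names are hypothetical, and reading bias_init as the initial value of b_s is an assumption.

```python
import numpy as np


def sym_norm(adj: np.ndarray) -> np.ndarray:
    """D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


def knn_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: symmetric kNN graph from cosine similarity of node features."""
    x = features / np.maximum(np.linalg.norm(features, axis=1, keepdims=True), 1e-12)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)                # never pick a node as its own neighbor
    nearest = np.argsort(-sim, axis=1)[:, :k]     # k most similar nodes per row
    adj = np.zeros_like(sim)
    adj[np.repeat(np.arange(sim.shape[0]), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)                 # make the graph undirected


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def simp_layer(h, a_norm, af_norm, w, w_s, b_s, w_k, b_k, gamma):
    """Hypothetical SimP-GCN-style propagation step (activation left to the caller)."""
    s = sigmoid(h @ w_s + b_s)                # (n, 1) per-node score in (0, 1)
    p = s * a_norm + (1.0 - s) * af_norm      # row i mixes structure and feature graph by s_i
    self_loop = h @ w_k + b_k                 # (n, 1) learnable self-loop weights
    hw = h @ w
    return p @ hw + gamma * self_loop * hw    # last term equals gamma * diag(self_loop) @ hw


rng = np.random.default_rng(0)
n, f, d = 5, 4, 3
adj = np.zeros((n, n))
adj[0, 1] = adj[1, 0] = adj[2, 3] = adj[3, 2] = 1.0
h = rng.random((n, f))
a_norm, af_norm = sym_norm(adj), sym_norm(knn_graph(h, k=2))
W, W_s, W_k = rng.random((f, d)), rng.random((f, 1)), rng.random((f, 1))
out = simp_layer(h, a_norm, af_norm, W, W_s, 0.0, W_k, 0.0, gamma=0.1)
print(out.shape)  # (5, 3)
```

A score close to 1 makes a node rely on the original structure, and a score close to 0 on the feature kNN graph, which is how the model can preserve feature similarity when the structure has been perturbed.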
Examples
We can first load the dataset and then train SimPGCN.
See the detailed hyper-parameter settings at https://github.com/ChandlerBang/SimP-GCN.
>>> from deeprobust.graph.data import PrePtbDataset, Dataset
>>> from deeprobust.graph.defense import SimPGCN
>>> # load clean graph data
>>> data = Dataset(root='/tmp/', name='cora', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # load perturbed graph data
>>> perturbed_data = PrePtbDataset(root='/tmp/', name='cora')
>>> perturbed_adj = perturbed_data.adj
>>> model = SimPGCN(nnodes=features.shape[0], nfeat=features.shape[1], nhid=16, nclass=labels.max()+1, device='cuda')
>>> model = model.to('cuda')
>>> model.fit(features, perturbed_adj, labels, idx_train, idx_val, train_iters=200, verbose=True)
>>> model.test(idx_test)
-
predict
(features=None, adj=None)
By default, the inputs should be unnormalized data.
Parameters: - features – node features. If features and adj are not given, this function will use previous stored features and adj from training to make predictions.
- adj – adjcency matrix. If features and adj are not given, this function will use previous stored features and adj from training to make predictions.
Returns: output (log probabilities) of GCN
Return type: torch.FloatTensor
-
class
Node2Vec
node2vec: Scalable Feature Learning for Networks. KDD’15. To use this model, you need to run “pip install node2vec” first.
Examples
>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.global_attack import NodeEmbeddingAttack
>>> from deeprobust.graph.defense import Node2Vec
>>> data = Dataset(root='/tmp/', name='cora_ml', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # set up attack model
>>> attacker = NodeEmbeddingAttack()
>>> attacker.attack(adj, attack_type="remove", n_perturbations=1000)
>>> modified_adj = attacker.modified_adj
>>> print("Test Node2vec on clean graph")
>>> model = Node2Vec()
>>> model.fit(adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test Node2vec on attacked graph")
>>> model = Node2Vec()
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
-
node2vec
(adj, embedding_dim=64, walk_length=30, walks_per_node=10, workers=8, window_size=10, num_neg_samples=1, p=4, q=1)
Compute Node2Vec embeddings for the given graph.
Parameters: - adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph
- embedding_dim (int, optional) – Dimension of the embedding
- walks_per_node (int, optional) – Number of walks sampled from each node
- walk_length (int, optional) – Length of each random walk
- workers (int, optional) – Number of worker threads (see gensim.models.Word2Vec)
- window_size (int, optional) – Window size (see gensim.models.Word2Vec)
- num_neg_samples (int, optional) – Number of negative samples (see gensim.models.Word2Vec)
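- p (float) – The return parameter p in node2vec (a larger p makes the walk less likely to step straight back to the previous node)
- q (float) – The in-out parameter q in node2vec (q > 1 keeps the walk close to the previous node, q < 1 pushes it outward)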
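The p and q parameters bias each step of a second-order random walk. The pure-Python sketch below follows the transition rule from the node2vec paper: when the walk has just moved from t to v, a neighbor x of v gets weight 1/p if x = t, weight 1 if x is also a neighbor of t, and weight 1/q otherwise. It is an unoptimized illustration with a hypothetical name (node2vec_walk), not the implementation behind this class.

```python
import random
from typing import Dict, List


def node2vec_walk(graph: Dict[int, List[int]], start: int, walk_length: int,
                  p: float, q: float, rng: random.Random) -> List[int]:
    """Hypothetical helper: one biased (second-order) node2vec random walk."""
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = graph[cur]
        if not neighbors:                      # dead end: stop the walk early
            break
        if len(walk) == 1:                     # first step has no previous node: uniform
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        prev_neighbors = set(graph[prev])
        weights = []
        for x in neighbors:
            if x == prev:
                weights.append(1.0 / p)        # return to the previous node
            elif x in prev_neighbors:
                weights.append(1.0)            # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)        # move outward, distance 2 from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
rng = random.Random(15)
walks = [node2vec_walk(graph, node, walk_length=30, p=4.0, q=1.0, rng=rng)
         for node in graph for _ in range(10)]
print(len(walks), len(walks[0]))  # 40 30
```

With the defaults p=4 and q=1 from the signature above, stepping straight back gets weight 0.25 while every other neighbor gets weight 1, so walks tend to move on rather than backtrack.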
-
class
DeepWalk
(type='skipgram')
DeepWalk: Online Learning of Social Representations. KDD’14. The implementation is modified from https://github.com/abojchevski/node_embedding_attack
Examples
>>> from deeprobust.graph.data import Dataset
>>> from deeprobust.graph.global_attack import NodeEmbeddingAttack
>>> from deeprobust.graph.defense import DeepWalk
>>> data = Dataset(root='/tmp/', name='cora_ml', seed=15)
>>> adj, features, labels = data.adj, data.features, data.labels
>>> idx_train, idx_val, idx_test = data.idx_train, data.idx_val, data.idx_test
>>> # set up attack model
>>> attacker = NodeEmbeddingAttack()
>>> attacker.attack(adj, attack_type="remove", n_perturbations=1000)
>>> modified_adj = attacker.modified_adj
>>> print("Test DeepWalk on clean graph")
>>> model = DeepWalk()
>>> model.fit(adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test DeepWalk on attacked graph")
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
>>> print("Test DeepWalk SVD")
>>> model = DeepWalk(type="svd")
>>> model.fit(modified_adj)
>>> model.evaluate_node_classification(labels, idx_train, idx_test)
-
deepwalk_skipgram
(adj, embedding_dim=64, walk_length=80, walks_per_node=10, workers=8, window_size=10, num_neg_samples=1)
Compute DeepWalk embeddings for the given graph using the skip-gram formulation.
Parameters: - adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph
- embedding_dim (int, optional) – Dimension of the embedding
- walks_per_node (int, optional) – Number of walks sampled from each node
- walk_length (int, optional) – Length of each random walk
- workers (int, optional) – Number of worker threads (see gensim.models.Word2Vec)
- window_size (int, optional) – Window size (see gensim.models.Word2Vec)
- num_neg_samples (int, optional) – Number of negative samples (see gensim.models.Word2Vec)
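The skip-gram formulation has two stages: sample truncated uniform random walks (walks_per_node walks of length walk_length from every node), then treat each walk as a sentence and train skip-gram with negative sampling on (center, context) pairs that lie within window_size of each other. The plain-Python sketch below covers the walk sampling and the pair extraction only; the helper names are hypothetical, the window is fixed for simplicity, and the embedding training itself (delegated to gensim.models.Word2Vec, per the parameter descriptions above) is not reproduced.

```python
import random
from typing import Dict, List, Tuple


def uniform_walks(graph: Dict[int, List[int]], walk_length: int,
                  walks_per_node: int, rng: random.Random) -> List[List[int]]:
    """Hypothetical helper: DeepWalk-style truncated uniform random walks."""
    walks = []
    for _ in range(walks_per_node):
        starts = list(graph)
        rng.shuffle(starts)                    # visit start nodes in random order each pass
        for start in starts:
            walk = [start]
            while len(walk) < walk_length and graph[walk[-1]]:
                walk.append(rng.choice(graph[walk[-1]]))   # uniform next hop
            walks.append(walk)
    return walks


def skipgram_pairs(walk: List[int], window_size: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: (center, context) pairs within a fixed window."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window_size), min(len(walk), i + window_size + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
walks = uniform_walks(graph, walk_length=80, walks_per_node=10, rng=random.Random(15))
pairs = [pair for walk in walks for pair in skipgram_pairs(walk, window_size=10)]
print(len(walks), len(walks[0]))  # 40 80
```

Uniform walks are the special case p = q = 1 of the node2vec walk, which is why the two classes share most of their parameters.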
-
deepwalk_svd
(adj, window_size=10, embedding_dim=64, num_neg_samples=1, sparse=True)
Compute DeepWalk embeddings for the given graph using the matrix factorization formulation.
Parameters: - adj (sp.csr_matrix, shape [n_nodes, n_nodes]) – Adjacency matrix of the graph
- window_size (int) – Size of the window
- embedding_dim (int) – Size of the embedding
- num_neg_samples (int) – Number of negative samples
- sparse (bool) – Whether to perform sparse operations
Returns: Embedding matrix.
Return type: np.ndarray, shape [num_nodes, embedding_dim]
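The matrix factorization formulation rests on the result of Qiu et al. (WSDM 2018, "Network Embedding as Matrix Factorization") that DeepWalk with window size T and b negative samples implicitly factorizes log(vol(G) / (b * T) * (P^1 + ... + P^T) * D^{-1}), where P = D^{-1} A is the transition matrix and vol(G) is the sum of all entries of A. The dense NumPy sketch below builds that matrix and truncates the logarithm at zero via log(max(M, 1)). Mapping window_size to T and num_neg_samples to b is an assumption about this implementation, and deepwalk_matrix is a hypothetical name.

```python
import numpy as np


def deepwalk_matrix(adj: np.ndarray, window_size: int = 10, num_neg_samples: int = 1) -> np.ndarray:
    """Hypothetical helper: truncated log of the DeepWalk (NetMF) matrix, dense version."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    deg[deg == 0] = 1.0                        # guard against isolated nodes
    d_inv = 1.0 / deg
    P = d_inv[:, None] * adj                   # transition matrix D^{-1} A
    power, total = np.eye(adj.shape[0]), np.zeros_like(adj)
    for _ in range(window_size):               # accumulate P^1 + ... + P^T
        power = power @ P
        total += power
    # Scale by vol(G) / (b * T) and right-multiply by D^{-1} (column scaling).
    M = adj.sum() / (num_neg_samples * window_size) * total * d_inv[None, :]
    return np.log(np.maximum(M, 1.0))          # entries with M <= 1 become 0


two_nodes = np.array([[0.0, 1.0], [1.0, 0.0]])
print(deepwalk_matrix(two_nodes, window_size=1))  # off-diagonal entries log(2), diagonal 0
```

The embedding is then read off a truncated SVD of this matrix, which is the role of svd_embedding.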
-
svd_embedding
(x, embedding_dim, sparse=False)
Compute an embedding by selecting the top embedding_dim largest singular values/vectors.
Parameters: - x (sp.csr_matrix or np.ndarray) – The matrix that we want to embed
- embedding_dim (int) – Dimension of the embedding
- sparse (bool) – Whether to perform sparse operations
Returns: Embedding matrices.
Return type: two np.ndarray, each of shape [?, embedding_dim]
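A rank-k truncated SVD x ≈ U_k S_k V_k^T can be split symmetrically into a row embedding U_k S_k^{1/2} and a column embedding V_k S_k^{1/2}, whose product is the best rank-k approximation of x. The dense NumPy sketch below assumes this symmetric split, which fits the two embedding matrices listed under Returns but is not confirmed by the text above; svd_embedding_sketch is a hypothetical name, and a sparse variant could call scipy.sparse.linalg.svds instead of np.linalg.svd.

```python
from typing import Tuple

import numpy as np


def svd_embedding_sketch(x: np.ndarray, embedding_dim: int) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: row and column embeddings from a truncated SVD."""
    U, s, Vt = np.linalg.svd(x, full_matrices=False)   # singular values in descending order
    sqrt_s = np.sqrt(s[:embedding_dim])
    rows = U[:, :embedding_dim] * sqrt_s              # U_k S_k^{1/2}, shape [n_rows, k]
    cols = Vt[:embedding_dim].T * sqrt_s              # V_k S_k^{1/2}, shape [n_cols, k]
    return rows, cols


rng = np.random.default_rng(0)
x = rng.random((6, 4))
rows, cols = svd_embedding_sketch(x, embedding_dim=2)
print(rows.shape, cols.shape)  # (6, 2) (4, 2)
```

Splitting the singular values evenly between the two factors keeps rows @ cols.T equal to the rank-k reconstruction while giving both embeddings the same scale.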
-