# GraphMix: Improved Training of GNNs for Semi-Supervised Learning

AAAI 2021, pp. 10024–10032

Abstract

We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization. Further, we provide a theoretical analysis of how GraphMix improves the...

Introduction

- Due to the presence of graph-structured data across a wide variety of domains, such as biological networks, citation networks and social networks, there have been several attempts to design neural networks, known as graph neural networks (GNNs), that can process arbitrarily structured graphs (Qu, Bengio, and Tang 2019; Gao and Ji 2019; Ma et al. 2019), among others.
- Many of these approaches are designed for addressing the problem of semi-supervised learning over graph-structured data (Zhou et al 2018).
- The authors instead propose an architecture-agnostic method for regularized training of GNNs for semi-supervised node classification.

Highlights

- We show that with our proposed method, we can achieve state-of-the-art performance even when using simpler GNN architectures such as Graph Convolutional Networks (Kipf and Welling 2017), with no additional memory cost and with minimal additional computation cost
- We conduct a theoretical analysis to demonstrate the effectiveness of the proposed method over the underlying GNNs
- An important question is how these more discriminative node representations can be transferred to the GNN. One potential approach could involve maximizing the mutual information between the hidden states of the Fully-Connected Network (FCN) and the GNN, using formulations similar to those proposed by Hjelm et al. (2019) and Sun et al. (2020)
- We propose parameter sharing between FCN and GNN to facilitate the transfer of discriminative node representations from the FCN to the GNN
- We observe that GraphMix always improves the accuracy of the underlying GNNs such as GCN, GAT and Graph-U-Net across all the datasets, with GraphMix(GCN) achieving the best results

Methods

- The authors first describe GraphMix at a high level, then give a more formal description. GraphMix augments the vanilla GNN with a Fully-Connected Network (FCN).
- One potential approach could involve maximizing the mutual information between the hidden states of the FCN and the GNN using formulations similar to those proposed by (Hjelm et al 2019; Sun et al 2020)
- Using the more discriminative representations of the nodes from FCN, as well as the graph structure, the GNN loss is computed in the usual way to further refine the node representations
- In this way the authors can exploit the improved representations from Manifold Mixup for training GNNs.
- An input dropout rate of 0.5 and a hidden dropout rate of 0.5 work best for Cora and Citeseer, while an input dropout rate of 0.2 and a hidden dropout rate of 0.2 work best for Pubmed
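The joint training scheme described above (an FCN trained with interpolation-based regularization, sharing its weights with the GNN) can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the paper applies Manifold Mixup to hidden states, whereas this sketch mixes inputs for brevity, and the single shared layer and the loss weight `gamma` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy problem sizes (illustrative only).
n_nodes, n_feats, n_classes = 6, 4, 3
W = rng.normal(size=(n_feats, n_classes))   # SHARED weights: the key mechanism
X = rng.normal(size=(n_nodes, n_feats))     # node features
A = np.eye(n_nodes)                         # placeholder normalized adjacency
y = np.eye(n_classes)[rng.integers(0, n_classes, n_nodes)]  # one-hot labels

def fcn_logits(x):
    # The FCN ignores the graph: a per-node transform using the shared W.
    return x @ W

def gcn_logits(x, a):
    # The GCN aggregates over neighbours, reusing the SAME W,
    # which is how the FCN's representations transfer to the GNN.
    return a @ x @ W

# Mixup on the FCN branch: interpolate node features and targets.
lam = rng.beta(1.0, 1.0)                    # lambda ~ Beta(alpha, alpha)
perm = rng.permutation(n_nodes)
x_mix = lam * X + (1 - lam) * X[perm]
y_mix = lam * y + (1 - lam) * y[perm]

def cross_entropy(p, t):
    return -(t * np.log(p + 1e-12)).sum(axis=1).mean()

loss_fcn = cross_entropy(softmax(fcn_logits(x_mix)), y_mix)
loss_gnn = cross_entropy(softmax(gcn_logits(X, A)), y)
gamma = 1.0                                 # assumed loss-weighting hyperparameter
total_loss = loss_gnn + gamma * loss_fcn
```

Because the two branches share parameters, minimizing the mixup loss on the FCN regularizes the very weights the GNN uses, with no extra parameter memory.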

Results

- The authors provide results on three recently proposed datasets which are relatively larger than standard benchmark datasets (Cora/Citeseer/Pubmed).
- The authors use Cora-Full dataset proposed in (Bojchevski and Günnemann 2018) and Coauthor-CS and Coauthor-Physics datasets proposed in (Shchur et al 2018).
- GraphMix(GCN) improves the results over GCN for all three datasets by a significant margin.
- The details of the datasets are given in Appendix A.4

Conclusion

- GraphMix is a simple and efficient regularizer for semi-supervised node classification using graph neural networks.
- The authors' extensive experiments demonstrate state-of-the-art performance using GraphMix on benchmark datasets.
- The authors' theoretical analysis compares generalization bounds of GraphMix vs the underlying GNNs. The strong empirical results of GraphMix suggest that in parallel to designing new architectures, exploring better regularization for graph-structured data is a promising avenue for research.
- A future research direction is to jointly model the node features and edges of the graph such that they can be further used for generating the synthetic interpolated nodes and their corresponding connectivity to the other nodes in the graph

- Table1: Results of node classification (% test accuracy) on the standard split of datasets. [*] means the results are taken from the corresponding papers. We conduct 100 trials and report mean and standard deviation over the trials (refer to Table 8 in the Appendix for comparison with other methods on standard Train/Validation/Test split)
- Table2: Results of node classification (% test accuracy) using 10 random Train/Validation/Test split of datasets. We conduct 100 trials and report mean and standard deviation over the trials
- Table3: Comparison of GraphMix with other methods (% test accuracy), for Cora-Full, Coauthor-CS, Coauthor-Physics. ∗ refers to the results reported in (Shchur et al 2018)
- Table4: Ablation study results using 10 labeled samples per class (% test accuracy). We report mean and standard deviation over ten trials. See Section A.5 for the meaning of methods in leftmost column
- Table5: Results on Link Classification (%F1 score). ∗ means the results are taken from the corresponding papers
- Table6: Dataset statistics
- Table7: Dataset statistics for Larger datasets
- Table8: Comparison of GraphMix with other methods (% test accuracy), for Cora, Citeseer and Pubmed
- Table9: Results using fewer labeled samples (% test accuracy). K refers to the number of labeled samples per class

Funding

- Despite its simplicity, we demonstrate that GraphMix can consistently improve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Coauthor-CS and Coauthor-Physics

Reference

- Bartlett, P. L.; and Mendelson, S. 2002. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research 3(Nov): 463–482.
- Beckham, C.; Honari, S.; Verma, V.; Lamb, A.; Ghadiri, F.; Devon Hjelm, R.; Bengio, Y.; and Pal, C. 2019. On Adversarial Mixup Resynthesis. arXiv e-prints arXiv:1903.02709.
- Belkin, M.; Niyogi, P.; and Sindhwani, V. 2006. Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. J. Mach. Learn. Res. 7: 2399–2434. ISSN 1532-4435. URL http://dl.acm.org/citation.cfm?id=1248547.1248632.
- Berthelot, D.; Carlini, N.; Goodfellow, I.; Papernot, N.; Oliver, A.; and Raffel, C. 2019. MixMatch: A Holistic Approach to Semi-Supervised Learning. arXiv e-prints arXiv:1905.02249.
- Blum, A.; and Mitchell, T. 1998. Combining Labeled and Unlabeled Data with Co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT’ 98, 92–100. New York, NY, USA: ACM. ISBN 1-58113-057-0. doi:10.1145/279943.279962. URL http://doi.acm.org/10.1145/279943.279962.
- Bojchevski, A.; and Günnemann, S. 2018. Deep Gaussian Embedding of Graphs: Unsupervised Inductive Learning via Ranking. In International Conference on Learning Representations. URL https://openreview.net/forum?id=r1ZdKJ-0W.
- Bruna, J.; Zaremba, W.; Szlam, A.; and LeCun, Y. 2013. Spectral Networks and Locally Connected Networks on Graphs. CoRR abs/1312.6203.
- Chapelle, O.; Schlkopf, B.; and Zien, A. 2010. Semi-Supervised Learning. The MIT Press, 1st edition. ISBN 0262514125, 9780262514125.
- Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Lee, D. D.; Sugiyama, M.; Luxburg, U. V.; Guyon, I.; and Garnett, R., eds., Advances in Neural Information Processing Systems 29, 3844–3852.
- Deng, Z.; Dong, Y.; and Zhu, J. 2019. Batch Virtual Adversarial Training for Graph Convolutional Networks. CoRR abs/1902.09192. URL http://arxiv.org/abs/1902.09192.
- Devries, T.; and Taylor, G. W. 2017. Improved Regularization of Convolutional Neural Networks with Cutout. CoRR abs/1708.04552. URL http://arxiv.org/abs/1708.04552.
- Ding, M.; Tang, J.; and Zhang, J. 2018. Semi-supervised Learning on Graphs with Generative Adversarial Nets. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM ’18, 913– 922. New York, NY, USA: ACM. ISBN 978-1-4503-6014-2. doi:10.1145/3269206.3271768. URL http://doi.acm.org/10.1145/3269206.3271768.
- Feng, F.; He, X.; Tang, J.; and Chua, T. 2019. Graph Adversarial Training: Dynamically Regularizing Based on Graph Structure. CoRR abs/1902.08226. URL http://arxiv.org/abs/1902.08226.
- Gao, H.; and Ji, S. 2019. Graph U-Nets. In Chaudhuri, K.; and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, 2083–2092. Long Beach, California, USA: PMLR. URL http://proceedings.mlr.press/v97/gao19a.html.
- Gilmer, J.; Schoenholz, S. S.; Riley, P. F.; Vinyals, O.; and Dahl, G. E. 2017. Neural Message Passing for Quantum Chemistry. In ICML.
- Gori, M.; Monfardini, G.; and Scarselli, F. 2005. A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., volume 2, 729–734. IEEE.
- Grandvalet, Y.; and Bengio, Y. 2005. Semi-supervised Learning by Entropy Minimization. In Saul, L. K.; Weiss, Y.; and Bottou, L., eds., Advances in Neural Information Processing Systems 17, 529–536.
- Hamilton, W.; Ying, Z.; and Leskovec, J. 2017. Inductive representation learning on large graphs. In NIPS.
- Henaff, M.; Bruna, J.; and LeCun, Y. 2015. Deep Convolutional Networks on Graph-Structured Data. ArXiv abs/1506.05163.
- Hjelm, R. D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Bachman, P.; Trischler, A.; and Bengio, Y. 2019. Learning deep representations by mutual information estimation and maximization. In International Conference on Learning Representations. URL https://openreview.net/forum?id=Bklr3j0cKX.
- Jeong, J.; Verma, V.; Hyun, M.; Kannala, J.; and Kwak, N. 2020. Interpolation-based semi-supervised learning for object detection.
- Kipf, T. N.; and Welling, M. 2016. Variational graph autoencoders. arXiv preprint arXiv:1611.07308.
- Kipf, T. N.; and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.
- Ko, T.; Peddinti, V.; Povey, D.; and Khudanpur, S. 2015. Audio augmentation for speech recognition. In INTERSPEECH.
- Kumar, S.; Hooi, B.; Makhija, D.; Kumar, M.; Faloutsos, C.; and Subrahmanian, V. 2018. Rev2: Fraudulent user prediction in rating platforms. In WSDM.
- Kumar, S.; Spezzano, F.; Subrahmanian, V.; and Faloutsos, C. 2016. Edge weight prediction in weighted signed networks. In ICDM.
- Laine, S.; and Aila, T. 2016. Temporal Ensembling for Semi-Supervised Learning. CoRR abs/1610.02242. URL http://arxiv.org/abs/1610.02242.
- Lee, D.-H. 2013. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks.
- Li, Q.; Han, Z.; and Wu, X.-M. 2018. Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning. In AAAI.
- Lu, Q.; and Getoor, L. 2003. Link-based Classification. In Proceedings of the Twentieth International Conference on International Conference on Machine Learning, ICML’03, 496–503. AAAI Press. ISBN 1-57735-189-4. URL http://dl.acm.org/citation.cfm?id=3041838.3041901.
- Ma, J.; Cui, P.; Kuang, K.; Wang, X.; and Zhu, W. 2019. Disentangled Graph Convolutional Networks. In ICML.
- Miyato, T.; Maeda, S.-i.; Koyama, M.; and Ishii, S. 2018. Virtual Adversarial Training: a Regularization Method for Supervised and Semi-supervised Learning. IEEE transactions on pattern analysis and machine intelligence.
- Monti, F.; Boscaini, D.; Masci, J.; Rodola, E.; Svoboda, J.; and Bronstein, M. M. 2016. Geometric deep learning on graphs and manifolds using mixture model CNNs. CoRR abs/1611.08402. URL http://arxiv.org/abs/1611.08402.
- Park, D. S.; Chan, W.; Zhang, Y.; Chiu, C.-C.; Zoph, B.; Cubuk, E. D.; and Le, Q. V. 2019. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. arXiv e-prints arXiv:1904.08779.
- Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. Deepwalk: Online learning of social representations. In KDD.
- Qu, M.; Bengio, Y.; and Tang, J. 2019. GMNN: Graph Markov Neural Networks. In Chaudhuri, K.; and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, 5241–5250. Long Beach, California, USA: PMLR.
- Scarselli, F.; Gori, M.; Tsoi, A. C.; Hagenbuchner, M.; and Monfardini, G. 2009. The Graph Neural Network Model. Trans. Neur. Netw. 20(1): 61–80. ISSN 1045-9227. doi:10.1109/TNN.2008.2005605. URL http://dx.doi.org/10.1109/TNN.2008.2005605.
- Shchur, O.; Mumme, M.; Bojchevski, A.; and Günnemann, S. 2018. Pitfalls of Graph Neural Network Evaluation. CoRR abs/1811.05868. URL http://arxiv.org/abs/1811.05868.
- Sun, F.-Y.; Hoffman, J.; Verma, V.; and Tang, J. 2020. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. In International Conference on Learning Representations. URL https://openreview.net/forum?id=r1lfF2NYvH.
- Tarvainen, A.; and Valpola, H. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems 30, 1195–1204.
- Taskar, B.; Wong, M.-F.; Abbeel, P.; and Koller, D. 2004. Link prediction in relational data. In NIPS.
- van der Maaten, L.; and Hinton, G. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9: 2579– 2605. URL http://www.jmlr.org/papers/v9/vandermaaten08a.html.
- Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2018. Graph Attention Networks. In ICLR.
- Velickovic, P.; Fedus, W.; Hamilton, W. L.; Liò, P.; Bengio, Y.; and Hjelm, R. D. 2019. Deep graph infomax. In ICLR.
- Verma, S.; and Zhang, Z.-L. 2019. Stability and generalization of graph convolutional neural networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 1539–1548.
- Verma, V.; Lamb, A.; Beckham, C.; Najafi, A.; Mitliagkas, I.; Lopez-Paz, D.; and Bengio, Y. 2019a. Manifold Mixup: Better Representations by Interpolating Hidden States. In Chaudhuri, K.; and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, 6438–6447. Long Beach, California, USA: PMLR. URL http://proceedings.mlr.press/v97/verma19a.html.
- Verma, V.; Lamb, A.; Kannala, J.; Bengio, Y.; and Lopez-Paz, D. 2019b. Interpolation Consistency Training for Semi-supervised Learning. In Kraus, S., ed., Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019. ijcai.org. doi:10.24963/ijcai.2019. URL https://doi.org/10.24963/ijcai.2019.
- Weston, J.; Ratle, F.; Mobahi, H.; and Collobert, R. 2012. Deep Learning via Semi-Supervised Embedding. In Montavon, G.; Orr, G.; and Müller, K. R., eds., In Neural Networks: Tricks of the Trade. Springer, second edition.
- Xie, Z.; Wang, S. I.; Li, J.; Lévy, D.; Nie, A.; Jurafsky, D.; and Ng, A. Y. 2017. Data Noising as Smoothing in Neural Network Language Models. ArXiv abs/1703.02573.
- Xu, K.; Li, C.; Tian, Y.; Sonobe, T.; Kawarabayashi, K.-i.; and Jegelka, S. 2018. Representation Learning on Graphs with Jumping Knowledge Networks. In Dy, J.; and Krause, A., eds., Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, 5453–5462. Stockholmsmässan, Stockholm Sweden: PMLR. URL http://proceedings.mlr.press/v80/xu18c.html.
- Yang, Z.; Cohen, W.; and Salakhudinov, R. 2016. Revisiting Semi-Supervised Learning with Graph Embeddings. In ICML.
- Zhang, H.; Cisse, M.; Dauphin, Y. N.; and Lopez-Paz, D. 2018. mixup: Beyond Empirical Risk Minimization. International Conference on Learning Representations URL https://openreview.net/forum?id=r1Ddp1-Rb.
- Zhou, J.; Cui, G.; Zhang, Z.; Yang, C.; Liu, Z.; and Sun, M. 2018. Graph Neural Networks: A Review of Methods and Applications. CoRR abs/1812.08434. URL http://arxiv.org/abs/1812.08434.
- Zhu, X.; and Ghahramani, Z. 2002. Learning from Labeled and Unlabeled Data with Label Propagation. Technical report.
- Zhu, X.; Ghahramani, Z.; and Lafferty, J. D. 2003. Semi-supervised learning using gaussian fields and harmonic functions. In ICML.
- A recently proposed method for obtaining accurate target predictions for unlabeled data uses the average of predicted targets across K random augmentations of the input sample (Berthelot et al. 2019). Along these lines, GraphMix computes the predicted targets as the average of predictions made by the GNN on K dropout versions of the input sample.
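The averaging step just described can be sketched as follows. This is a toy NumPy version under stated assumptions: a single random linear classifier stands in for the GNN, and the function names, shapes, and `k=10` are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def predict_with_dropout(x, w, drop_rate, rng):
    """One stochastic forward pass: randomly zero input features
    (inverted dropout, so activations keep the same scale), then classify."""
    mask = rng.random(x.shape) > drop_rate
    return softmax((x * mask / (1.0 - drop_rate)) @ w)

def averaged_pseudo_targets(x, w, k=10, drop_rate=0.5, rng=rng):
    """Average predictions over K dropout versions of the input sample."""
    preds = [predict_with_dropout(x, w, drop_rate, rng) for _ in range(k)]
    return np.mean(preds, axis=0)

X_unlab = rng.normal(size=(5, 8))   # unlabeled node features (toy data)
W = rng.normal(size=(8, 3))         # stand-in for the trained model
targets = averaged_pseudo_targets(X_unlab, W, k=10)
# Each row of `targets` is a probability distribution usable as a soft target.
```

Averaging over K stochastic passes reduces the variance of the pseudo-targets compared with a single noisy prediction.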
- For semi-supervised link classification, we use two datasets Bitcoin Alpha and Bitcoin OTC from (Kumar et al. 2016, 2018). The nodes in these datasets correspond to the bitcoin users and the edge weights between them correspond to the degree of trust between the users. Following (Qu, Bengio, and Tang 2019), we treat edges with weights greater than 3 as positive instances, and edges with weights less than -3 are treated as negative ones. Given a few labeled edges, the task is to predict the labels of the remaining edges. The statistics of these datasets as well as the number of training/validation/test nodes is presented in Appendix A.4.
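The edge-labeling rule above (following Qu, Bengio, and Tang 2019) reduces to a simple threshold on the trust weight. A short sketch with hypothetical edge data; the helper name and the toy triples are illustrative:

```python
def edge_label(weight):
    """Map a Bitcoin trust weight to a link label per the rule above:
    weight > 3 -> positive (1), weight < -3 -> negative (0),
    anything in between is excluded (None)."""
    if weight > 3:
        return 1
    if weight < -3:
        return 0
    return None

# Hypothetical (user_u, user_v, trust_weight) triples.
edges = [(0, 1, 5), (1, 2, -7), (2, 3, 1)]
labeled = [(u, v, edge_label(w)) for u, v, w in edges if edge_label(w) is not None]
# -> [(0, 1, 1), (1, 2, 0)]; the edge with weight 1 is dropped.
```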
