
# Uncertainty Aware Semi-Supervised Learning on Graph Data

NeurIPS 2020

Abstract

Thanks to graph neural networks (GNNs), semi-supervised node classification has shown state-of-the-art performance on graph data. However, GNNs have not considered the different types of uncertainty associated with class probabilities, which is needed to minimize the risk of misclassification under uncertainty in real life. In this work, we pro…

Introduction

- Inherent uncertainties derived from different root causes have been recognized as serious hurdles to finding effective solutions for real-world problems.
- Graph neural networks (GNNs) [16, 32] have received tremendous attention in the data science community.
- Despite their superior performance in semi-supervised node classification and regression, GNNs have not considered various types of uncertainty in their decision process.
- The authors are the first to consider multidimensional uncertainty types from both the deep learning (DL) and belief/evidence theory domains for node-level classification, misclassification detection, and out-of-distribution (OOD) detection tasks.
- By leveraging the learning capability of GNNs and considering multidimensional uncertainties, the authors propose an uncertainty-aware estimation framework by quantifying

Highlights

- We observe the following performance order: Dissonance > Entropy ≈ Aleatoric > Vacuity ≈ Epistemic, which is consistent with our conjecture that high dissonance, which signals a conflicting prediction, is the strongest indicator of misclassification
- We proposed a multi-source uncertainty framework of graph neural networks (GNNs) for semi-supervised node classification
- We leveraged various types of uncertainty estimates from both the deep learning (DL) and evidence/belief theory domains
- We found that dissonance-based detection yielded the best performance in misclassification detection, while vacuity-based detection performed best in OOD detection, compared to other competitive counterparts
- Applying Graph-based Kernel Dirichlet distribution Estimation (GKDE) and the Teacher network further improved the accuracy of node classification and uncertainty estimates
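The highlights contrast vacuity and dissonance. A minimal sketch of how both can be computed for a single node, assuming the subjective-logic mapping from Dirichlet parameters (α_k = e_k + 1, beliefs b_k = e_k / S, vacuity K / S with S = Σα_k) and the dissonance measure of Jøsang et al. [12]; the function names are hypothetical, not the authors' code:

```python
from typing import List, Tuple


def opinion_from_dirichlet(alpha: List[float]) -> Tuple[List[float], float]:
    """Map Dirichlet parameters to subjective-logic beliefs and vacuity.

    Assumes alpha_k = e_k + 1, so belief b_k = e_k / S and vacuity u_v = K / S,
    where S = sum(alpha) and K is the number of classes.
    """
    num_classes = len(alpha)
    strength = sum(alpha)
    beliefs = [(a - 1.0) / strength for a in alpha]
    vacuity = num_classes / strength
    return beliefs, vacuity


def balance(b_j: float, b_k: float) -> float:
    """Relative balance of two belief masses: 1 if equal, 0 if one of them is zero."""
    if b_j + b_k == 0.0:
        return 0.0
    return 1.0 - abs(b_j - b_k) / (b_j + b_k)


def dissonance(beliefs: List[float]) -> float:
    """Dissonance: each belief b_k weighted by its average balance with the other beliefs."""
    total = 0.0
    for k, b_k in enumerate(beliefs):
        others = [b_j for j, b_j in enumerate(beliefs) if j != k]
        other_mass = sum(others)
        if other_mass == 0.0:
            continue  # no competing belief, so class k contributes no conflict
        weighted_balance = sum(b_j * balance(b_j, b_k) for b_j in others)
        total += b_k * weighted_balance / other_mass
    return total


if __name__ == "__main__":
    # Hypothetical alpha vectors for a 3-class node.
    for name, alpha in [("no evidence", [1.0, 1.0, 1.0]),
                        ("confident", [11.0, 1.0, 1.0]),
                        ("conflicting", [51.0, 51.0, 1.0])]:
        b, u_v = opinion_from_dirichlet(alpha)
        print(f"{name}: vacuity={u_v:.3f}, dissonance={dissonance(b):.3f}")
```

With no evidence the node has vacuity 1 and dissonance 0 (the OOD-like case); with evidence split evenly between two classes, vacuity is small while dissonance is close to 1, and in that example u_v + u_diss equals 1 exactly.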

Methods

- The authors conduct experiments on misclassification and OOD detection to answer the following questions for semi-supervised node classification:
- Q1. Misclassification detection: What type of uncertainty is the most promising indicator of high confidence in node classification predictions?
- Dissonance is more effective than other uncertainty estimates in misclassification detection.
- Vacuity is more effective than other uncertainty estimates in OOD detection.
- GKDE can help improve the estimation quality of node-level Dirichlet distributions, resulting in better OOD detection performance.

Results

- The misclassification detection experiment involves detecting whether a given prediction is incorrect using an uncertainty estimate.
- The advantage of dissonance-based detection over the other uncertainty estimates is substantial.
- This confirms that low dissonance is key to maximizing the accuracy of node classification predictions.
- OOD detection: this experiment involves detecting whether an input example is out-of-distribution (OOD) given an estimate of uncertainty.
- Due to space constraints, the experimental setup for OOD detection is detailed in Appendix B.3.
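Both tasks are scored with AUROC and AUPR (Tables 1 and 2). A minimal sketch of the misclassification-detection evaluation, assuming misclassified nodes are the positive class and an uncertainty value (e.g. dissonance) is the detection score; AUPR is approximated here by average precision, and the helper names and toy numbers are hypothetical rather than taken from the paper:

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random positive scores above a random negative.

    Ties count as one half (Mann-Whitney formulation). O(P * N), fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise AUPR estimate: mean precision at the rank of each positive (ties keep input order)."""
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("need at least one positive example")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_pos


if __name__ == "__main__":
    # Hypothetical predictions, ground truth, and per-node uncertainty scores.
    predicted = [0, 1, 2, 1]
    truth = [1, 1, 2, 0]
    uncertainty = [0.9, 0.5, 0.2, 0.1]
    # Positive class = misclassified node; the detector ranks nodes by uncertainty.
    misclassified = [int(p != t) for p, t in zip(predicted, truth)]
    print(auroc(uncertainty, misclassified), average_precision(uncertainty, misclassified))
```

For OOD detection the same two functions apply unchanged, with OOD nodes as the positive class and vacuity (or another uncertainty) as the score.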

Conclusion

- The authors proposed a multi-source uncertainty framework of GNNs for semi-supervised node classification.
- The authors' proposed framework provides an effective approach to node classification and out-of-distribution detection that accounts for multiple types of uncertainty.
- The authors leveraged various types of uncertainty estimates from both DL and evidence/belief theory domains.
- Applying GKDE and the Teacher network further improved the accuracy of node classification and uncertainty estimates.
- Based on GKDE, the evidence contribution for node i and a training node l ∈ {1, …
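The conclusion credits part of the improvement to the Teacher network. One way such a teacher can be wired in is a distillation-style penalty in the spirit of Hinton et al. [10]; the sketch below assumes a KL divergence between the student's expected class probabilities (the Dirichlet mean α/S) and a teacher's softmax output. The KL direction, the loss weight, and all names are assumptions, not details given in this summary:

```python
import math
from typing import List


def dirichlet_mean(alpha: List[float]) -> List[float]:
    """Expected class probabilities under Dir(alpha): alpha_k / S."""
    strength = sum(alpha)
    return [a / strength for a in alpha]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats; terms with p_k = 0 contribute nothing, q_k is floored at eps."""
    return sum(pk * math.log(pk / max(qk, eps)) for pk, qk in zip(p, q) if pk > 0.0)


def teacher_penalty(alpha: List[float], teacher_probs: List[float]) -> float:
    """Hypothetical distillation term: KL between the student's Dirichlet mean and the teacher's softmax."""
    return kl_divergence(dirichlet_mean(alpha), teacher_probs)


if __name__ == "__main__":
    # Hypothetical student alphas and teacher probabilities for one 2-class node.
    print(teacher_penalty([2.0, 2.0], [0.5, 0.5]))
    print(teacher_penalty([3.0, 1.0], [0.5, 0.5]))
```

In training, a term like this would be added with some weight to the student's main loss, pulling the Dirichlet mean toward the teacher's prediction; the weight is a hyperparameter not specified here.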

- Table1: AUROC and AUPR for the Misclassification Detection
- Table2: AUROC and AUPR for the OOD Detection
- Table3: Ablation study of our proposed models: (1) S-GCN: Subjective GCN with vacuity and dissonance estimation; (2) S-BGCN: S-GCN with Bayesian framework; (3) S-BGCN-T: S-BGCN with a Teacher Network; (4) S-BGCN-T-K: S-BGCN-T with GKDE to improve uncertainty estimation
- Table4: Description of datasets and their experimental setup for the node classification prediction
- Table5: Description of datasets and their experimental setup for the OOD detection
- Table6: Big-O time complexity of our method and baseline GCN
- Table7: Hyperparameter configurations of S-BGCN-T-K model
- Table8: Ablation experiment on AUROC and AUPR for the Misclassification Detection
- Table9: Ablation experiment on AUROC and AUPR for the OOD Detection
- Table10: Hyper-parameters of S-BGAT-T-K model
- Table11: Semi-supervised node classification accuracy based on GAT
- Table12: Table 12
- Table13: Epistemic uncertainty for semi-supervised image classification
- Table14: Comparison with DropEdge on misclassification detection

Related work

- DL research has mainly considered aleatoric uncertainty (AU) and epistemic uncertainty (EU) using BNNs for computer vision applications. AU consists of homoscedastic uncertainty (i.e., constant errors for different inputs) and heteroscedastic uncertainty (i.e., different errors for different inputs) [5]. A Bayesian DL framework was presented to simultaneously estimate both AU and EU in regression (e.g., depth regression) and classification (e.g., semantic segmentation) tasks [14]. Later, distributional uncertainty was defined based on the mismatch between the testing and training data distributions [20]. Dropout variational inference [7] was used for approximate inference in BNNs to estimate epistemic uncertainty, similar to DropEdge [24]. Other algorithms have considered overall uncertainty in node classification [3, 18, 34]. However, no prior work has considered uncertainty decomposition in GNNs.
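The aleatoric/epistemic split mentioned above can be estimated from dropout samples [7]. A minimal sketch, assuming T stochastic forward passes (dropout kept on at test time) each yield a class-probability vector for one node; the decomposition into expected entropy plus mutual information follows the form used by Depeweg et al. [2], the function names are hypothetical, and the normalization used in the paper may differ:

```python
import math
from typing import Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats (0 * log 0 treated as 0)."""
    return sum(pk * math.log(1.0 / pk) for pk in p if pk > 0.0)


def decompose_uncertainty(samples: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Split predictive entropy into aleatoric and epistemic parts from T sampled probability vectors.

    total     = H(mean_t p_t)         predictive entropy
    aleatoric = mean_t H(p_t)         expected entropy
    epistemic = total - aleatoric     mutual information, >= 0 by concavity of entropy
    """
    num_samples = len(samples)
    num_classes = len(samples[0])
    mean = [sum(s[k] for s in samples) / num_samples for k in range(num_classes)]
    total = entropy(mean)
    aleatoric = sum(entropy(s) for s in samples) / num_samples
    return total, aleatoric, total - aleatoric


if __name__ == "__main__":
    # Hand-written stand-ins for dropout samples of one 2-class node.
    print(decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]]))  # samples agree, each is unsure
    print(decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]]))  # each sample is sure, samples disagree
```

The two toy inputs separate the notions: identical uncertain samples give purely aleatoric uncertainty, while confident but disagreeing samples give zero aleatoric and ln 2 epistemic uncertainty.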

Funding

- This work is supported by the National Science Foundation (NSF) under Grant Nos. 1815696 and 1750911

Study subjects and analysis

real network datasets: 6

By collecting evidence from the given labels of training nodes, the Graph-based Kernel Dirichlet distribution Estimation (GKDE) method is designed to accurately predict node-level Dirichlet distributions and detect out-of-distribution (OOD) nodes. We validated that our proposed model outperforms state-of-the-art counterparts in misclassification detection and OOD detection on six real network datasets. We found that dissonance-based detection yielded the best results in misclassification detection, while vacuity-based detection was the best for OOD detection.
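A minimal sketch of the GKDE idea described above, under loudly stated assumptions: evidence flows from each labeled training node to every node it can reach, weighted by a Gaussian kernel of the shortest-path (hop) distance with a hypothetical bandwidth `sigma`, and per-class evidence becomes Dirichlet parameters via α = evidence + 1. The kernel form, bandwidth, and function names are illustrative choices, not the paper's specification:

```python
import math
from collections import deque
from typing import Dict, List


def hop_distances(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Breadth-first search: shortest hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def gaussian_kernel(d: float, sigma: float) -> float:
    """Assumed kernel: Gaussian density of the hop distance d with bandwidth sigma."""
    return math.exp(-d * d / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def gkde_alpha(adj: Dict[int, List[int]], train_labels: Dict[int, int],
               num_nodes: int, num_classes: int, sigma: float = 1.0) -> List[List[float]]:
    """Hypothetical GKDE-style prior: each labeled node adds kernel-weighted evidence to its own class."""
    evidence = [[0.0] * num_classes for _ in range(num_nodes)]
    for node, label in train_labels.items():
        for i, d in hop_distances(adj, node).items():
            evidence[i][label] += gaussian_kernel(d, sigma)
    # Unreachable nodes receive no evidence, so their alpha stays all ones (vacuity 1).
    return [[e + 1.0 for e in row] for row in evidence]


if __name__ == "__main__":
    # Path 0-1-2-3 plus an isolated node 4; nodes 0 and 3 are labeled with classes 0 and 1.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
    for i, a in enumerate(gkde_alpha(adj, {0: 0, 3: 1}, num_nodes=5, num_classes=2)):
        print(i, [round(x, 3) for x in a], "vacuity =", round(len(a) / sum(a), 3))
```

In the toy graph, node 4 is disconnected from every labeled node, keeps α = [1, 1], and therefore has vacuity 1, in line with the intuition that an OOD-like node should carry high uncertainty under GKDE.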

real graph datasets: 6

We demonstrate via a theoretical analysis that an OOD node may have high predictive uncertainty under GKDE. Comprehensive experiments validating the performance of our proposed framework: based on the six real graph datasets, we compared the performance of our proposed framework with that of other competitive counterparts. We found that dissonance-based detection yielded the best results in misclassification detection, while vacuity-based detection performed best in OOD detection.

datasets: 6

6.1 Experiment Setup. Datasets: We used six datasets, including three citation network datasets [26] (i.e., Cora, Citeseer, Pubmed) and three new datasets [30] (i.e., Coauthor Physics, Amazon Computer, and Amazon Photo). We summarized the description and experimental setup of the used datasets in Appendix B.2.

datasets: 3

We observe the following performance order: Dissonance > Entropy ≈ Aleatoric > Vacuity ≈ Epistemic, which is consistent with our conjecture that high dissonance, which signals a conflicting prediction, is the strongest indicator of misclassification. We also conducted experiments on three additional datasets and observed similar trends, as demonstrated in Appendix C.

network datasets: 6

Due to space constraints, the experimental setup for OOD detection is detailed in Appendix B.3. In Table 2, across six network datasets, our vacuity-based detection significantly outperformed the other competitive methods, exceeding the performance of epistemic uncertainty and other types of uncertainty. This demonstrates that the vacuity-based model is more effective for OOD detection than counterparts based on other uncertainty estimates.

Reference

- C. W. De Silva. Intelligent control: fuzzy logic applications. CRC Press, 1995.
- S. Depeweg, J.-M. Hernandez-Lobato, F. Doshi-Velez, and S. Udluft. Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning. In International Conference on Machine Learning, pages 1184–1193. PMLR, 2018.
- D. Eswaran, S. Günnemann, and C. Faloutsos. The power of certainty: A Dirichlet-multinomial model for belief propagation. In Proceedings of the 2017 SIAM International Conference on Data Mining, pages 144–152. SIAM, 2017.
- T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, pages 861–874, 2006.
- Y. Gal. Uncertainty in deep learning. University of Cambridge, 2016.
- Y. Gal and Z. Ghahramani. Bayesian convolutional neural networks with Bernoulli approximate variational inference. arXiv preprint arXiv:1506.02158, 2015.
- Y. Gal and Z. Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050–1059, 2016.
- X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.
- D. Hendrycks and K. Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136, 2016.
- G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
- A. Josang. Subjective logic. Springer, 2016.
- A. Josang, J.-H. Cho, and F. Chen. Uncertainty characteristics of subjective opinions. In 2018 21st International Conference on Information Fusion (FUSION), pages 1998–2005. IEEE, 2018.
- A. Kendall, V. Badrinarayanan, and R. Cipolla. Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv preprint arXiv:1511.02680, 2015.
- A. Kendall and Y. Gal. What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems, pages 5574–5584, 2017.
- D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (ICLR), 2017.
- D.-H. Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, 2013.
- Z.-Y. Liu, S.-Y. Li, S. Chen, Y. Hu, and S.-J. Huang. Uncertainty aware graph Gaussian process for semi-supervised learning. In AAAI, pages 4957–4964, 2020.
- L. v. d. Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, pages 2579–2605, 2008.
- A. Malinin and M. Gales. Predictive uncertainty estimation via prior networks. In Advances in Neural Information Processing Systems, pages 7047–7058, 2018.
- J. McAuley, C. Targett, Q. Shi, and A. Van Den Hengel. Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 43–52, 2015.
- T. Miyato, S.-i. Maeda, M. Koyama, and S. Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1979–1993, 2018.
- N. J. Nilsson. Probabilistic logic. Artificial Intelligence, pages 71–87, 1986.
- Y. Rong, W. Huang, T. Xu, and J. Huang. DropEdge: Towards deep graph convolutional networks on node classification. In International Conference on Learning Representations, 2019.
- S. Ryu, Y. Kwon, and W. Y. Kim. Uncertainty quantification of molecular property prediction with Bayesian neural networks. arXiv preprint arXiv:1903.08375, 2019.
- P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad. Collective classification in network data. AI Magazine, pages 93–106, 2008.
- M. Sensoy, L. Kaplan, and M. Kandemir. Evidential deep learning to quantify classification uncertainty. In Advances in Neural Information Processing Systems, pages 3179–3189, 2018.
- K. Sentz, S. Ferson, et al. Combination of evidence in Dempster-Shafer theory, volume 4015. Sandia National Laboratories Albuquerque, 2002.
- G. Shafer. A mathematical theory of evidence, volume 42. Princeton University Press, 1976.
- O. Shchur, M. Mumme, A. Bojchevski, and S. Günnemann. Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868, 2018.
- A. Tarvainen and H. Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems, pages 1195–1204, 2017.
- P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. Graph Attention Networks. International Conference on Learning Representations, 2018.
- Z. Yang, W. Cohen, and R. Salakhutdinov. Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning, pages 40–48. PMLR, 2016.
- Y. Zhang, S. Pal, M. Coates, and D. Ustebay. Bayesian graph convolutional neural networks for semi-supervised classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 5829–5836, 2019.
- J. Zhou, G. Cui, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun. Graph neural networks: A review of methods and applications. arXiv preprint arXiv:1812.08434, 2018.
- 1. General relations on all prediction scenarios: (a) $u_v + u_{diss} \le 1$; (b) $u_v > u_{epis}$.
- 2. Special relations on the OOD and the CP.
- 1. Dissonance has an upper bound with $u_{diss} =$
- 2. Similarly, for epistemic uncertainty, which decreases monotonically as K increases (Lemma 1), the maximum epistemic uncertainty can be shown to be
- 2. Then we have, ualea ≥ ln 2 2 > 1 − 2 ln 2 ≥ uepis
- 2. We get its derivative, $g'(x)$
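Relation 1(a) can be checked in a few lines, assuming the subjective-logic definitions behind vacuity and dissonance (Jøsang et al. [12]): $\alpha_k = e_k + 1$, $S = \sum_k \alpha_k$, $b_k = e_k / S$, $u_v = K / S$, with the conventions $\mathrm{Bal}(0, 0) = 0$ and a zero-denominator term taken as 0. This is a sketch of one route to the bound, not the paper's own proof.

$$
u_{diss} = \sum_{k=1}^{K} b_k \,\frac{\sum_{j \neq k} b_j \,\mathrm{Bal}(b_j, b_k)}{\sum_{j \neq k} b_j},
\qquad
\mathrm{Bal}(b_j, b_k) = 1 - \frac{|b_j - b_k|}{b_j + b_k} \in [0, 1].
$$

Each fraction is a weighted average of $\mathrm{Bal}$ values and is therefore at most 1, so

$$
u_{diss} \le \sum_{k=1}^{K} b_k = \frac{\sum_{k} e_k}{S} = \frac{S - K}{S} = 1 - u_v
\quad\Longrightarrow\quad u_v + u_{diss} \le 1.
$$

Equality holds when all nonzero beliefs are equal, e.g. two classes with the same evidence and no evidence for the rest.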
