Text Analysis and Visualization Research on the Hetu Dangse During the Qing Dynasty of China

  • Zhiyu Wang Harbin Institute of Technology
  • Jingyu Wu Liaoning University
  • Guang Yu Harbin Institute of Technology
  • Zhiping Song Liaoning University


In traditional historical research, historical documents are interpreted subjectively and manually, which leads to problems such as one-sided understanding, selective analysis, and one-way knowledge connection. In this study, we use machine learning to analyze and explore historical documents automatically from the perspective of text analysis and visualization. This approach addresses the difficulty of analyzing large-scale historical data that humans cannot easily read or understand intuitively. As our data samples, we use the Qing Dynasty Hetu Dangse documents preserved in the Archives of Liaoning Province. China's Hetu Dangse is the world's largest Qing Dynasty thematic archive written in Manchu and Chinese. Using word frequency analysis, correlation analysis, co-word clustering, the word2vec model, and support vector machine (SVM) algorithms, we visualize the historical documents, reveal the relationships between the functions of government departments in the Shengjing area of the Qing Dynasty, classify the historical archives automatically, improve the efficient use of historical materials, and build connections between pieces of historical knowledge. These results can give archivists practical guidance in managing and compiling historical materials.
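Word frequency analysis and co-word analysis both start from simple counts: how often each keyword occurs, and how often two keywords appear together in the same record. The following is a minimal Python sketch of those two counts, assuming each archival record has already been segmented into a list of keywords. The keyword lists are hypothetical English glosses, not data from the Hetu Dangse, and the sketch does not reproduce the authors' own preprocessing, clustering, word2vec, or SVM steps.

```python
from collections import Counter
from itertools import combinations


def word_frequencies(docs):
    """Count how often each keyword appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc)
    return counts


def co_word_counts(docs):
    """Count how many documents contain each unordered pair of keywords."""
    pairs = Counter()
    for doc in docs:
        # Deduplicate and sort so each pair is counted at most once per
        # document and ("a", "b") and ("b", "a") share the same key.
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs


def pair_count(pairs, a, b):
    """Look up a pair's co-occurrence count regardless of argument order."""
    return pairs[tuple(sorted((a, b)))]


# Hypothetical keyword lists standing in for segmented archive records;
# they are illustrative only and not taken from the Hetu Dangse.
docs = [
    ["revenue", "grain", "banner"],
    ["revenue", "grain", "general"],
    ["general", "banner", "grain"],
]

freq = word_frequencies(docs)
pairs = co_word_counts(docs)
print(freq["grain"])
print(pair_count(pairs, "revenue", "grain"))
```

The resulting pair counts form the kind of co-occurrence matrix that co-word clustering and network visualization tools typically take as input; normalizing the counts (for example, by association strength) is a separate step not shown here.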

Author Biographies

Zhiyu Wang, Harbin Institute of Technology

School of Management, Harbin Institute of Technology

School of History, Liaoning University

Jingyu Wu, Liaoning University

School of History, Liaoning University

Zhiping Song, Liaoning University

School of History, Liaoning University


How to Cite
Wang, Z., Wu, J., Yu, G., & Song, Z. (2021). Text Analysis and Visualization Research on the Hetu Dangse During the Qing Dynasty of China. Information Technology and Libraries, 40(3). https://doi.org/10.6017/ital.v40i3.13279