Classifying research papers according to their research topics is an important task to improve their retrievability, assist the creation of smart analytics, and support a variety of approaches for analysing and making sense of the research environment. In this page, we present the CSO Classifier, a new unsupervised approach for automatically classifying research papers according to the Computer Science Ontology (CSO), a comprehensive ontology of research areas in the field of Computer Science.
Try out our CSO Classifier
The CSO Classifier takes as input the metadata associated with a research paper (title, abstract, keywords) and returns a selection of research concepts drawn from the ontology. It consists of two main components: (i) the syntactic module and (ii) the semantic module. Figure 1 depicts its architecture. The syntactic module parses the input documents and identifies CSO concepts that are explicitly referred in the document. The semantic module uses part-of-speech tagging to identify promising terms and then exploits word embeddings to infer semantically related topics. Finally, the CSO Classifier combines the results of these two modules, discards outliers, and enhances them by including relevant super-areas.
We developed the classifier in Python 3 and we release it under Apache 2.0 Licence.
If you want to know more about this research initiative please refer to the following papers:
- Salatino, A.A., Thanapalasingam, T., Mannocci, A., Osborne, F. and Motta, E. 2018. Classifying Research Papers with the Computer Science Ontology. ISWC-P&D-Industry-BlueSky 2018 (2018). Read from ORO
- Salatino, A.A., Osborne, F., Thanapalasingam, T., Motta, E.: The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles. In: TPDL 2019: 23rd International Conference on Theory and Practice of Digital Libraries. Springer. Read from ORO
- Salatino, A.A., Osborne, F., Motta, E.: CSO Classifier 3.0: A Scalable Unsupervised Method for Classifying Documents in Terms of Research Topics. In International Journal on Digital Libraries, Springer, 2021. Read from ORO
Download and Install
The CSO Classifier is an ongoing project. You can follow its development through our Github repository https://github.com/angelosalatino/cso-classifier, or you can download the latest release from Zenodo:
You can also install the CSO Classifier using the package manager for Python:
- Ensure you have Python 3.6 or above installed. Download latest version.
- Use pip to install the classifier:
pip install cso-classifier
- Download English package for spaCy using
python -m spacy download en_core_web_sm
Try the CSO Classifier
The CSO Classifier is also available as a web app for assisting users in selecting the best set of research topics to describe a scientific article. This web app integrates the CSO Classifier, and GROBID, a machine learning tool for extracting metadata from research papers in PDF.
Feel free to run the classifier on your own papers (also in PDF).
Go to the classifier.