Information Retrieval for Software Code and Documentation
The growth of crowd-sourced online software development documentation and advances in Web search have changed how we learn to develop software. How do we preserve this valuable documentation? How do we integrate it with our day-to-day software development processes, tools, and ecosystems? How do we use the knowledge it contains to build better software, systems, and networks? The primary objective is to retrieve knowledge from crowd-sourced software documentation and to use that knowledge to help improve software development tools and ecosystems.
Related Publications
2019
Modeling Stack Overflow tags and topics as a hierarchy of concepts
Developers rely on online Q&A forums to look up technical solutions, to pose questions on implementation problems, and to enhance their community profile by contributing answers. Many popular developer communication platforms, such as the Stack Overflow Q&A forum, require threads of discussion to be tagged by their contributors for easier lookup in both asking and answering questions. In this paper, we propose to leverage Stack Overflow’s tags to create a hierarchical organization of concepts discussed on this platform. The resulting concept hierarchy couples tags with a model of their relevancy to prospective questions and answers. For this purpose, we configure and apply a supervised multi-label hierarchical topic model to Stack Overflow questions and demonstrate the quality of the model in several ways: by identifying tag synonyms, by tagging previously unseen Stack Overflow posts, and by exploring how the hierarchy could aid exploratory searches of the corpus. The results suggest that when traversing the inferred hierarchical concept model of Stack Overflow the questions become more specific as one explores down the hierarchy and more diverse as one jumps to different branches. The results also indicate that the model is an improvement over the baseline for the detection of tag synonyms and that the model could enhance existing ensemble methods for suggesting tags for new questions. The paper indicates that the concept hierarchy as a modeling imperative can create a useful representation of the Stack Overflow corpus. This hierarchy can be in turn integrated into development tools which rely on information retrieval and natural language processing, and thereby help developers more efficiently navigate crowd-sourced online documentation.
@article{CHEN2019283,
  title={Modeling stack overflow tags and topics as a hierarchy of concepts},
  journal={Journal of Systems and Software},
  volume={156},
  pages={283 - 299},
  year={2019},
  issn={0164-1212},
  doi={https://doi.org/10.1016/j.jss.2019.07.033},
  url={http://www.sciencedirect.com/science/article/pii/S0164121219301499},
  author={Chen, Hui and Coogle, John and Damevski, Kostadin},
  keywords={Concept hierarchy, Hierarchical topic model, Stack overflow, Tag synonym identification, Tag prediction, Entropy-based search evaluation}
}