Modelling and Analytics of Software and Systems

Building secure, efficient, and usable applications, systems, and networks

About

The Laboratory for Modelling and Analytics of Software and Systems (MASS Lab) is Professor Hui Chen's research group at the City University of New York. The group is investigating a multi-pronged approach to engineering large and complex software and networked systems. For instance, analyzing how developers interact with IDEs allows us to predict development activities and to improve these productivity tools; observing changes in software projects gives us the opportunity not only to discover software architecture and common bugs but also to suggest coding solutions.

The group welcomes collaborations with industry and with motivated graduate and undergraduate students. Please contact Professor Hui Chen if you are interested in joining the research group.

Active and Recent Projects

Hierarchical Usage Context for Software Exceptions

The primary objective is to provide usage contexts for software faults manifested as software exceptions. The modelling tools are unsupervised probabilistic graphical models. The datasets of interest combine interaction traces and software crash reports. The output of the models includes a tree, or hierarchy, of usage contexts and the probabilistic association of software exceptions with that tree. This contributes to a debugging methodology called “debugging in the large”, a postmortem analysis of large amounts of usage data to recognize patterns of bugs.

Publication

  1. Chen, H., Damevski, K., Shepherd, D., & Kraft, N. A. (2019). Modeling hierarchical usage context for software exceptions based on interaction data. Automated Software Engineering. https://doi.org/10.1007/s10515-019-00265-3

    Traces of user interactions with a software system, captured in production, are commonly used as an input source for user experience testing. In this paper, we present an alternative use, introducing a novel approach of modeling user interaction traces enriched with another type of data gathered in production—software fault reports consisting of software exceptions and stack traces. The model described in this paper aims to improve developers’ comprehension of the circumstances surrounding a specific software exception and can highlight specific user behaviors that lead to a high frequency of software faults. Modeling the combination of interaction traces and software crash reports to form an interpretable and useful model is challenging due to the complexity and variance in the combined data source. Therefore, we propose a probabilistic unsupervised learning approach, adapting the nested hierarchical Dirichlet process, which is a Bayesian non-parametric hierarchical topic model originally applied to natural language data. This model infers a tree of topics, each of whom describes a set of commonly co-occurring commands and exceptions. The topic tree can be interpreted hierarchically to aid in categorizing the numerous types of exceptions and interactions. We apply the proposed approach to large scale datasets collected from the ABB RobotStudio software application, and evaluate it both numerically and with a small survey of the RobotStudio developers.

    @article{chen2019modeling,
      author = {Chen, Hui and Damevski, Kostadin and Shepherd, David and Kraft, Nicholas A.},
      title = {Modeling hierarchical usage context for software exceptions based on interaction data},
      journal = {Automated Software Engineering},
      year = {2019},
      month = aug,
      day = {13},
      issn = {1573-7535},
      doi = {10.1007/s10515-019-00265-3},
      url = {https://doi.org/10.1007/s10515-019-00265-3}
    }
    

Information Retrieval for Software Code and Documentation

The growing body of crowd-sourced online software development documentation and advances in Web search have changed how we learn to develop software. How do we preserve this valuable software documentation? How do we integrate the documentation with our day-to-day software development process, tools, and ecosystem? How do we use the knowledge in the documentation to build better software, systems, and networks? The primary objective is to retrieve knowledge from crowd-sourced informational software documentation and to use that knowledge to help improve software development tools and ecosystems.

Publication

  1. Chen, H., Coogle, J., & Damevski, K. (2019). Modeling stack overflow tags and topics as a hierarchy of concepts. Journal of Systems and Software, 156, 283–299. https://doi.org/10.1016/j.jss.2019.07.033

    Developers rely on online Q&A forums to look up technical solutions, to pose questions on implementation problems, and to enhance their community profile by contributing answers. Many popular developer communication platforms, such as the Stack Overflow Q&A forum, require threads of discussion to be tagged by their contributors for easier lookup in both asking and answering questions. In this paper, we propose to leverage Stack Overflow’s tags to create a hierarchical organization of concepts discussed on this platform. The resulting concept hierarchy couples tags with a model of their relevancy to prospective questions and answers. For this purpose, we configure and apply a supervised multi-label hierarchical topic model to Stack Overflow questions and demonstrate the quality of the model in several ways: by identifying tag synonyms, by tagging previously unseen Stack Overflow posts, and by exploring how the hierarchy could aid exploratory searches of the corpus. The results suggest that when traversing the inferred hierarchical concept model of Stack Overflow the questions become more specific as one explores down the hierarchy and more diverse as one jumps to different branches. The results also indicate that the model is an improvement over the baseline for the detection of tag synonyms and that the model could enhance existing ensemble methods for suggesting tags for new questions. The paper indicates that the concept hierarchy as a modeling imperative can create a useful representation of the Stack Overflow corpus. This hierarchy can be in turn integrated into development tools which rely on information retrieval and natural language processing, and thereby help developers more efficiently navigate crowd-sourced online documentation.

    @article{CHEN2019283,
      title = {Modeling stack overflow tags and topics as a hierarchy of concepts},
      journal = {Journal of Systems and Software},
      volume = {156},
      pages = {283--299},
      year = {2019},
      issn = {0164-1212},
      doi = {10.1016/j.jss.2019.07.033},
      url = {http://www.sciencedirect.com/science/article/pii/S0164121219301499},
      author = {Chen, Hui and Coogle, John and Damevski, Kostadin},
      keywords = {Concept hierarchy, Hierarchical topic model, Stack overflow, Tag synonym identification, Tag prediction, Entropy-based search evaluation}
    }
    

Interaction-Aware Recommendation Systems for Software Developers

The primary objective is to integrate developer activity modeling into recommendation systems for software developers. Software development is a complex cognitive task. We can reduce software developers’ cognitive load by providing effective tools. These tools should be capable of recognizing developers’ activities and thereby making recommendations, such as the best IDE commands or plugins to choose from and situated learning of best practices.

Publications

  1. Chen, H., Ciborowska, A., & Damevski, K. (2019). Using Automated Prompts for Student Reflection on Computer Security Concepts. In Proceedings of the 2019 ACM Conference on Innovation and Technology in Computer Science Education (pp. 506–512). New York, NY, USA: ACM. https://doi.org/10.1145/3304221.3319731
    @inproceedings{Chen2019UAP_3304221_3319731,
      author = {Chen, Hui and Ciborowska, Agnieszka and Damevski, Kostadin},
      title = {Using Automated Prompts for Student Reflection on Computer Security Concepts},
      booktitle = {Proceedings of the 2019 ACM Conference on Innovation and Technology in Computer Science Education},
      series = {ITiCSE '19},
      year = {2019},
      isbn = {978-1-4503-6301-3},
      location = {Aberdeen, Scotland, UK},
      pages = {506--512},
      numpages = {7},
      url = {http://doi.acm.org/10.1145/3304221.3319731},
      doi = {10.1145/3304221.3319731},
      acmid = {3319731},
      publisher = {ACM},
      address = {New York, NY, USA},
      keywords = {automated reflection, reflection, reflection prompt}
    }
    
  2. Damevski, K., Chen, H., Shepherd, D. C., Kraft, N. A., & Pollock, L. (2018). Predicting Future Developer Behavior in the IDE Using Topic Models. IEEE Transactions on Software Engineering, 44(11), 1100–1111. https://doi.org/10.1109/TSE.2017.2748134
    @article{DAMEVSKI_8024001,
      author = {Damevski, Kostadin and Chen, Hui and Shepherd, David C. and Kraft, Nicholas A. and Pollock, Lori},
      journal = {IEEE Transactions on Software Engineering},
      title = {Predicting Future Developer Behavior in the IDE Using Topic Models},
      year = {2018},
      volume = {44},
      number = {11},
      pages = {1100--1111},
      keywords = {program debugging;recommender systems;software engineering;future developer behavior;early software command recommender systems;negative user reaction;unusually complex applications;command recommendations;recommendation generation;user experience;command recommenders;future task context;debug OR;future development commands;software development interaction data;predicting future IDE commands;empirically-interpretable observations;Natural languages;Data models;Analytical models;Predictive models;Visualization;Adaptation models;Data analysis;Command recommendation systems;IDE interaction data},
      doi = {10.1109/TSE.2017.2748134},
      url = {https://dx.doi.org/10.1109/TSE.2017.2748134},
      issn = {0098-5589},
      month = nov
    }
    
  3. Damevski, K., Chen, H., Shepherd, D., & Pollock, L. (2016). Interactive Exploration of Developer Interaction Traces Using a Hidden Markov Model. In Proceedings of the 13th International Workshop on Mining Software Repositories (pp. 126–136). New York, NY, USA: ACM. https://doi.org/10.1145/2901739.2901741
    @inproceedings{Damevski_2016_IED_2901739_2901741,
      author = {Damevski, Kostadin and Chen, Hui and Shepherd, David and Pollock, Lori},
      title = {Interactive Exploration of Developer Interaction Traces Using a
              Hidden Markov Model},
      booktitle = {Proceedings of the 13th International Workshop on Mining
              Software Repositories},
      series = {MSR '16},
      year = {2016},
      isbn = {978-1-4503-4186-8},
      location = {Austin, Texas},
      pages = {126--136},
      numpages = {11},
      url = {http://doi.acm.org/10.1145/2901739.2901741},
      doi = {10.1145/2901739.2901741},
      acmid = {2901741},
      publisher = {ACM},
      address = {New York, NY, USA},
      keywords = {IDE usage data, field studies, hidden-markov model}
    }
    

Accountable and Secure Systems

Preventative countermeasures and accountability are two complementary approaches to computer security. Preventative countermeasures have been the primary approach in practice and in research. Real-world experience with security indicates that accountability is not only a complement to preventative countermeasures but also a necessity, particularly as online privacy becomes a growing concern to individuals and societies with the advent of sophisticated cross-site referencing tools and algorithms. The primary objective is thus to build accountable systems and networks that address real-world computer security.

Publications

  1. Xiao, Y., Zeng, L., Chen, H., & Li, T. (2019). Prototyping Flow-net Logging for Accountability Management in Linux Operating Systems for IoTs. IEEE Access, 1–1. https://doi.org/10.1109/ACCESS.2019.2937637
    @article{8813093,
      author = {Xiao, Yang and Zeng, Lei and Chen, Hui and Li, Tieshan},
      journal = {IEEE Access},
      title = {Prototyping Flow-net Logging for Accountability Management in Linux Operating Systems for IoTs},
      year = {2019},
      pages = {1-1},
      keywords = {Linux;Access control;Kernel;Internet of Things;Computer security;Computer Security;Accountability;Logging;Auditing;Flow-net;IoT},
      doi = {10.1109/ACCESS.2019.2937637},
      issn = {2169-3536}
    }
    
  2. Fu, B., Xiao, Y., & Chen, H. (2018). FNF: Flow-net based fingerprinting and its applications. Computers & Security, 75, 167–181. https://doi.org/10.1016/j.cose.2018.02.005
    @article{FU2018167,
      title = {{FNF}: Flow-net based fingerprinting and its applications},
      journal = {Computers \& Security},
      volume = {75},
      pages = {167--181},
      month = jun,
      year = {2018},
      issn = {0167-4048},
      doi = {10.1016/j.cose.2018.02.005},
      url = {http://www.sciencedirect.com/science/article/pii/S0167404818300877},
      author = {Fu, Bo and Xiao, Yang and Chen, Hui},
      keywords = {Flow-net, Logging, Fingerprint, Intrusion detection, Computer networks, Computer systems}
    }
    
  3. Zeng, L., Chen, H., & Xiao, Y. (2017). Accountable Administration in Operating Systems. International Journal of Information and Computer Security, 9(3), 157–179. https://doi.org/10.1504/IJICS.2017.10005900
    @article{zeng2017accountable,
      author = {Zeng, Lei and Chen, Hui and Xiao, Yang},
      title = {Accountable Administration in Operating Systems},
      journal = {International Journal of Information and Computer Security},
      year = {2017},
      volume = {9},
      number = {3},
      pages = {157--179},
      doi = {10.1504/IJICS.2017.10005900},
      url = {https://dx.doi.org/10.1504/IJICS.2017.10005900},
      publisher = {Inderscience}
    }
    
  4. Zeng, L., Xiao, Y., Chen, H., Sun, B., & Han, W. (2016). Computer operating system logging and security issues: a survey. Security and Communication Networks, 9(17), 4804–4821. https://doi.org/10.1002/sec.1677
    @article{ZENG_SEC1677,
      author = {Zeng, Lei and Xiao, Yang and Chen, Hui and Sun, Bo and Han, Wenlin},
      title = {Computer operating system logging and security issues: a survey},
      journal = {Security and Communication Networks},
      volume = {9},
      number = {17},
      issn = {1939-0122},
      url = {https://dx.doi.org/10.1002/sec.1677},
      doi = {10.1002/sec.1677},
      pages = {4804--4821},
      keywords = {logging, operating system, Linux, Unix, security},
      year = {2016}
    }
    
  5. Zeng, L., Xiao, Y., & Chen, H. (2015). Auditing overhead, auditing adaptation, and benchmark evaluation in Linux. Security and Communication Networks, 8(18), 3523–3534. https://doi.org/10.1002/sec.1277
    @article{ZENG_SEC1277,
      author = {Zeng, Lei and Xiao, Yang and Chen, Hui},
      title = {Auditing overhead, auditing adaptation, and benchmark evaluation in {Linux}},
      journal = {Security and Communication Networks},
      volume = {8},
      number = {18},
      issn = {1939-0122},
      url = {https://dx.doi.org/10.1002/sec.1277},
      doi = {10.1002/sec.1277},
      pages = {3523--3534},
      keywords = {logging, overhead, Linux, auditing},
      year = {2015}
    }
    
  6. Xiao, Z., Xiao, Y., & Chen, H. (2014). An Accountable Framework for Sensing-oriented Mobile Cloud Computing. Journal of Internet Technology, 15(5), 813–822. Retrieved from https://dx.doi.org/10.6138%2fJIT.2014.15.5.11
    @article{Xiaozf2014,
      title = {An Accountable Framework for Sensing-oriented Mobile Cloud Computing},
      author = {Xiao, Zhifeng and Xiao, Yang and Chen, Hui},
      journal = {Journal of Internet Technology},
      volume = {15},
      number = {5},
      pages = {813--822},
      month = sep,
      year = {2014},
      url = {https://dx.doi.org/10.6138\%2fJIT.2014.15.5.11}
    }