Feature Engineering for Enhanced Model Performance in Software Effort Estimation
Sreekumar P. Pillai1, Radharamanan T.2, Madhukumar S.D.3

1Sreekumar P. Pillai, Research Scholar, School of Management Studies, National Institute of Technology Calicut, Kozhikode, Kerala, India.
2Dr. T. Radharamanan, Head of Department, School of Management Studies, NITC, Kozhikode, Kerala, India.
3Dr. S.D. Madhukumar, Professor, Dean (SW), Dept. of Computer Science, NITC, Kozhikode, Kerala, India.

Manuscript received on 03 August 2019. | Revised Manuscript received on 09 August 2019. | Manuscript published on 30 September 2019. | PP: 6053-6063 | Volume-8 Issue-3 September 2019 | Retrieval Number: C5602098319/2019©BEIESP | DOI: 10.35940/ijrte.C5602.098319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Many new methodologies have been defined in the last two decades in the domain of Software Effort Estimation. They include manual methods based on expert judgment, analogy-based models, parametric models, regression models, machine learning models, and more recently, deep learning models. Except for manual methods, all of these models depend heavily on data. The lack of quality data in this domain is a motivation to explore ways of making the most of the sparse data available. Machine learning algorithms depend on domain features, and on the ability of those features to represent and model the domain, to solve a problem, irrespective of whether the task is classification or regression, or image or voice synthesis. Research continues into how best to represent a problem through the right feature space. While most traditional research relies on the original dataset and concentrates on feature selection, modern approaches explore creating additional features that have the potential to extend the model's representational space. This research builds on our previous work and explores the potential to improve Software Effort Estimation accuracy by employing engineered features in addition to the original ones. The features are created manually based on the literature. The engineered features capture additional representational information, such as missingness and the proportion of categorical data available in the dataset. We present the rationale for the features generated and compare the prediction accuracy of a model using the original dataset with that of a model using the engineered dataset. Our experiments in Feature Engineering are innovative in the Software Estimation domain, and the results are conclusive, establishing its usefulness in predicting Software Effort. We report an accuracy improvement of 38% with engineered features at PRED(15) and of 11% at PRED(20).
The quantitative gain in accuracy is promising enough for Feature Engineering to be adopted as a standard in future research on the subject and in practical applications.
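As an illustration only, the two kinds of engineered feature named in the abstract and the PRED(k) criterion can be sketched in a few lines of Python. The abstract does not give exact definitions for the features, so the function names (`missingness_ratio`, `categorical_coverage`, `pred`), the field names, and the per-record formulation below are assumptions rather than the authors' implementation. PRED(k) is used in its usual sense: the share of estimates whose magnitude of relative error is at most k percent.

```python
from typing import Mapping, Optional, Sequence


def missingness_ratio(record: Mapping[str, Optional[object]]) -> float:
    """Fraction of fields in one project record that are missing (None).

    Assumed definition; the abstract only names "missingness" as a feature.
    """
    if not record:
        return 0.0
    return sum(value is None for value in record.values()) / len(record)


def categorical_coverage(
    record: Mapping[str, Optional[object]],
    categorical_fields: Sequence[str],
) -> float:
    """Fraction of the categorical fields that are populated in one record.

    Assumed reading of "proportion of categorical data available".
    """
    if not categorical_fields:
        return 0.0
    present = sum(record.get(name) is not None for name in categorical_fields)
    return present / len(categorical_fields)


def pred(actual: Sequence[float], predicted: Sequence[float], k: float) -> float:
    """PRED(k): share of estimates with relative error <= k percent.

    Assumes every actual effort value is strictly positive.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    hits = sum(
        abs(a - p) / abs(a) <= k / 100.0
        for a, p in zip(actual, predicted)
    )
    return hits / len(actual)


# Hypothetical project record: one of four fields is missing.
project = {"team_size": 5, "language": None, "platform": "PC", "duration_months": 7}

print(missingness_ratio(project))                             # 0.25
print(categorical_coverage(project, ["language", "platform"]))  # 0.5

# Hypothetical actual vs. estimated effort (person-hours).
actual_effort = [100.0, 200.0, 300.0, 400.0]
estimated_effort = [110.0, 260.0, 290.0, 600.0]
print(pred(actual_effort, estimated_effort, 15))              # 0.5
```

Features of this kind would be appended as extra columns to the original dataset before training, so that a model trained with them can be compared with one trained without them at PRED(15) and PRED(20).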
Keywords: Software Effort Estimation, Software Cost Estimation, Effort Prediction, Feature Engineering, Generalized Linear Model, Artificial Neural Network, Random Forests.

Scope of the Article:
Software Engineering Methodologies