Predictive learning analytics with data pipeline variability
Monte Carlo simulation studies are used to examine how eight factors affect predictions of a binary target outcome in data science pipelines: (1) the choice among four DMMs [Logistic Regression (LR), Elastic Net Regression (GLMNET), Random Forest (RF), Extreme Gradient Boosting (XGBoost)], (2) the choice among three filter-based preprocessing feature selection techniques [Correlation Attribute Evaluation (CAE), Fisher's Scoring Algorithm (FSA), Information Gain Attribute Evaluation (IG)], (3) the number of training observations, (4) the number of features, (5) measurement error, (6) class imbalance magnitude, (7) missing data pattern, and (8) feature selection cutoff. The findings are consistent with the literature on which data properties and algorithms yield the best predictive performance. Measurement error degraded pipeline performance across all other factors, DMMs, and feature selection techniques.
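The general shape of such a Monte Carlo study can be illustrated with a much smaller design. The sketch below is not the study's pipeline: it crosses only two of the eight factors (number of training observations and measurement error), and it uses the Pearson correlation between a noisy feature and the binary target as a stand-in performance score, loosely in the spirit of a correlation-based filter. All function names, factor levels, the effect size of 1.5, and the replication count are hypothetical choices made for illustration.

```python
import itertools
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def simulate_once(n_obs, error_sd, rng):
    """One replication: score a feature observed with measurement error."""
    # Latent "true" feature values (assumed standard normal).
    latent = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    # Binary target drawn through a logistic link (assumed slope of 1.5).
    target = [
        1 if rng.random() < 1.0 / (1.0 + math.exp(-1.5 * z)) else 0
        for z in latent
    ]
    # Observed feature = latent value + normally distributed measurement error.
    observed = [z + rng.gauss(0.0, error_sd) for z in latent]
    return pearson(observed, target)


def run_study(n_obs_levels, error_sd_levels, n_reps, seed):
    """Fully crossed design: mean score per (n_obs, error_sd) condition."""
    rng = random.Random(seed)  # fixed seed so the study is reproducible
    results = {}
    for n_obs, error_sd in itertools.product(n_obs_levels, error_sd_levels):
        scores = [simulate_once(n_obs, error_sd, rng) for _ in range(n_reps)]
        results[(n_obs, error_sd)] = statistics.fmean(scores)
    return results


if __name__ == "__main__":
    summary = run_study([200, 1000], [0.0, 1.0, 2.0], n_reps=50, seed=1)
    for (n_obs, error_sd), score in sorted(summary.items()):
        print(f"n_obs={n_obs:5d}  error_sd={error_sd:.1f}  mean_r={score:.3f}")
```

Under these assumptions the mean correlation should shrink as the error standard deviation grows, which mirrors the reported pattern that measurement error degrades performance. A fuller study would add the remaining factors to the `itertools.product` call and replace the correlation score with a fitted model evaluated on held-out data.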