Research Insights

Business Statistics

AdaBoost Semiparametric Model Averaging Prediction for Multiple Categories

Jialiang Li, Alan Wan
Forthcoming in the Journal of the American Statistical Association

As the business world embraces big data, sophisticated learning tools built on massive amounts of data are being developed rapidly for many real-life problems. How can cutting-edge statistical methodology make these data mining tools better? This is the question that CityU statistician Alan Wan tries to answer in his new article in the Journal of the American Statistical Association.

Together with Jialiang Li of the National University of Singapore, Alan Wan, Professor and Head of the Department of Management Sciences, developed a new statistical methodology known as "semiparametric model averaging prediction." They combined this method with adaptive boosting (AdaBoost), a popular machine learning tool, to further enhance predictive performance, especially for classification problems.

"The methodology developed in this article was motivated by a problem frequently encountered in transportation planning and operations, namely, automobile classification, which is important for surveillance, traffic congestion and accident prevention," Alan Wan says.

Rigorous theoretical analysis and extensive numerical experiments show that the new method has excellent predictive ability compared with traditional statistical models and off-the-shelf machine learning classification tools.

Another important merit of the new method is that it overcomes an often-cited criticism of semiparametric models: the difficulty of choosing a suitable index variable. Rather than committing to a single index, the method averages multiple sub-models, each with a different index, and the model weights automatically adjust the relative importance of these sub-models. In addition, the method can be applied without assuming a true model form, which is hard to postulate for big data.
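The article does not describe how the sub-models or their weights are estimated. As a rough illustration only, the sketch below shows the generic final step of model averaging for a multi-category outcome: each sub-model outputs a probability for every class, and a weighted average of those probabilities picks the predicted class. The function names, the three vehicle classes and the fixed weights are hypothetical. In the authors' method the sub-models are semiparametric and the weights are determined from the data, and neither part is reproduced here.

```python
def average_probabilities(sub_model_probs, weights):
    """Weighted average of class-probability vectors from several sub-models.

    sub_model_probs[k][c] is sub-model k's predicted probability of class c
    for one observation; weights[k] is sub-model k's weight (hypothetical
    fixed values here, not the data-driven weights of the paper).
    """
    if len(sub_model_probs) != len(weights):
        raise ValueError("need one weight per sub-model")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    n_classes = len(sub_model_probs[0])
    combined = [0.0] * n_classes
    for probs, w in zip(sub_model_probs, weights):
        # Normalise weights so the combined vector is still a probability vector.
        for c, p in enumerate(probs):
            combined[c] += (w / total) * p
    return combined


def predict_class(sub_model_probs, weights):
    """Return the index of the class with the highest averaged probability."""
    combined = average_probabilities(sub_model_probs, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical example: three sub-models (each built on a different index),
# three vehicle classes: 0 = car, 1 = bus, 2 = truck.
probs = [
    [0.6, 0.3, 0.1],  # sub-model A
    [0.2, 0.5, 0.3],  # sub-model B
    [0.3, 0.3, 0.4],  # sub-model C
]
print(predict_class(probs, [0.5, 0.3, 0.2]))  # favours sub-model A
print(predict_class(probs, [0.1, 0.7, 0.2]))  # favours sub-model B
```

With weights (0.5, 0.3, 0.2) the averaged probabilities are about (0.42, 0.36, 0.22), so class 0 is predicted; shifting weight toward sub-model B gives about (0.26, 0.44, 0.30) and the prediction changes to class 1. This is the sense in which the weights adjust the relative importance of the sub-models.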