    Title: 基於支持向量機計算的相互熵之特徵選取
    A Feature Selection Study based on SVM and Mutual Entropy
    Authors: 游上葦
    Yu, Shang-Wei
    Contributors: 周珮婷
    Yu, Shang-Wei
    Keywords: Machine learning
    Feature selection
    Dimension reduction
    Support Vector Machine
    Shannon Entropy
    Mutual Entropy
    Date: 2019
    Issue Date: 2019-09-05 15:42:06 (UTC+8)
    Abstract: Feature selection plays a significant role in machine learning. Selecting features (variables) appropriately not only reduces computation time, labor, and monetary cost, but also helps prevent overfitting and underfitting. Although many feature selection methods have been developed over the decades, no single method suits all types of data sets, so we propose a new method in the hope of offering more options for feature selection. The method calculates the correlation between variables based on the Shannon entropy from information theory and an SVM classifier. Variables are grouped into several clusters and selected using this new correlation measure. In addition, for the purpose of assessing the results, we define the importance of a variable by the test statistic of the KS test, using a Gaussian mixture model and the EM algorithm. The performance of the proposed method is demonstrated on two simulated data sets and five real data sets and compared with other feature selection methods. Predictions made on the reduced data sets selected by the proposed method are stable.
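The abstract does not specify how the SVM classifier enters the proposed correlation measure, so the following is only a minimal sketch of its standard information-theoretic ingredient: the Shannon entropy of a discrete variable and the mutual information between two discrete variables. The function names and the toy data are illustrative assumptions and are not taken from the thesis.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two paired discrete sequences."""
    joint = list(zip(xs, ys))  # each pair is one observation of (X, Y)
    return shannon_entropy(xs) + shannon_entropy(ys) - shannon_entropy(joint)

# Toy data (hypothetical): y duplicates x, while z is independent of x in this sample.
x = [0, 0, 1, 1]
y = [0, 0, 1, 1]
z = [0, 1, 0, 1]

mi_xy = mutual_information(x, y)  # redundant pair: high mutual information
mi_xz = mutual_information(x, z)  # unrelated pair: no shared information
```

Under a measure of this kind, a pair such as `x` and `y` would be a natural candidate for the same cluster, since one variable carries all the information in the other, whereas `x` and `z` share none.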
