Experimental Report on Factors Related to Graduate Entrance Examination Admission
I. Research Objective
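Using the 1999 graduate entrance examination scores and admission outcomes of the Institute of International Economics at Nankai University, this report introduces a dummy variable for admitted versus not admitted and compares the forecast accuracy of the linear probability model, the Probit model, and the Logit model.
II. Model Specification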
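Table 1. Examination scores and admission outcomes of the 1999 graduate applicants, Institute of International Economics, Nankai University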
 obs  Y  SCORE     obs  Y  SCORE     obs  Y  SCORE     obs  Y  SCORE
   1  1    401      26  0    347      51  0    303      76  0    256
   2  1    401      27  0    347      52  0    299      77  0    252
   3  1    392      28  0    344      53  0    297      78  0    252
   4  1    387      29  0    339      54  0    294      79  0    245
   5  1    384      30  0    338      55  0    293      80  0    243
   6  1    379      31  0    338      56  0    293      81  0    242
   7  1    378      32  0    336      57  0    292      82  0    241
   8  1    378      33  0    334      58  0    291      83  0    239
   9  1    376      34  0    332      59  0    291      84  0    235
  10  1    371      35  0    332      60  0    287      85  0    232
  11  1    362      36  0    332      61  0    286      86  0    228
  12  1    362      37  0    331      62  0    286      87  0    219
  13  1    361      38  0    330      63  0    282      88  0    219
  14  0    359      39  0    328      64  0    282      89  0    214
  15  0    358      40  0    328      65  0    282      90  0    210
  16  1    356      41  0    328      66  0    278      91  0    204
  17  0    356      42  0    321      67  0    275      92  0    198
  18  0    355      43  0    321      68  0    273      93  0    189
  19  0    354      44  0    318      69  0    273      94  0    188
  20  0    354      45  0    318      70  0    272      95  0    182
  21  0    353      46  0    316      71  0    267      96  0    166
  22  0    350      47  0    308      72  0    266      97  0    123
  23  0    349      48  0    308      73  0    263
  24  0    349      49  0    304      74  0    261
  25  0    348      50  0    303      75  0    260

Variable definitions: SCORE is the applicant's examination score; Y equals 1 if the applicant was admitted and 0 otherwise. The table above lists the sample observations.
1. Linear probability model
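Based on the data above, a linear probability model is specified that regresses Y on a constant and SCORE.
Estimating it in EViews gives the following regression output: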
Dependent Variable: Y
Method: Least Squares
Date: 12/10/10   Time: 20:38
Sample: 1 97
Included observations: 97

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C              -0.847407     0.159663     -5.307476   0.0000
SCORE           0.003297     0.000521      6.325970   0.0000

R-squared             0.296390    Mean dependent var       0.144330
Adjusted R-squared    0.288983    S.D. dependent var       0.353250
S.E. of regression    0.297866    Akaike info criterion    0.436060
Sum squared resid     8.428818    Schwarz criterion        0.489147
Log likelihood       -19.14890    F-statistic              40.01790
Durbin-Watson stat    0.359992    Prob(F-statistic)        0.000000
The parameter estimates are:
$\hat{Y}_i = -0.847407 + 0.003297\,SCORE_i$
Se = (0.159663) (0.000521)
t = (-5.307476) (6.325970)
p = (0.0000) (0.0000)
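To make the behaviour of the estimated linear probability model concrete, the following minimal sketch evaluates the fitted equation at a few scores. The coefficients are copied from the least-squares output; the function name `lpm_fitted` and the choice of scores are illustrative assumptions, not part of the original analysis.

```python
# Coefficients copied from the least-squares output (C and SCORE).
B0 = -0.847407
B1 = 0.003297

def lpm_fitted(score):
    """Fitted value of the linear probability model for a given score."""
    return B0 + B1 * score

# 123 and 401 are the lowest and highest scores in Table 1;
# 300 and 356 are arbitrary interior values chosen for illustration.
for s in (123, 300, 356, 401):
    print(s, round(lpm_fitted(s), 4))
```

With these coefficients the fitted value is negative at the lowest score and stays below 0.5 even at the highest score, which shows the familiar weakness of the linear probability model: its fitted values are not constrained to behave like probabilities.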
Forecast accuracy:
Forecast: YF
Actual: Y
Forecast sample: 1 97
Included observations: 97

Root Mean Squared Error           0.294780
Mean Absolute Error               0.233437
Mean Absolute Percentage Error    8.689503
Theil Inequality Coefficient      0.475786
   Bias Proportion                0.000000
   Variance Proportion            0.294987
   Covariance Proportion          0.705013
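The root mean squared error of an in-sample forecast is tied directly to the regression's sum of squared residuals. The sketch below assumes the forecast is the in-sample fitted value, so that RMSE is the square root of the sum of squared residuals divided by the number of observations. The helper name `rmse_from_ssr` is hypothetical.

```python
import math

def rmse_from_ssr(ssr, n):
    """RMSE of in-sample fitted values, given the sum of squared residuals."""
    return math.sqrt(ssr / n)

# Sum squared resid and number of observations from the least-squares output.
lpm_rmse = rmse_from_ssr(8.428818, 97)
print(round(lpm_rmse, 6))
```

Under that assumption the computed value agrees with the reported Root Mean Squared Error of about 0.2948.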
2. Logit model
Dependent Variable: Y
Method: ML - Binary Logit (Quadratic hill climbing)
Date: 12/10/10   Time: 21:38
Sample: 1 97
Included observations: 97
Convergence achieved after 11 iterations
Covariance matrix computed using second derivatives

Variable     Coefficient   Std. Error   z-Statistic   Prob.
C              -243.7362     125.5564     -1.941248   0.0522
SCORE           0.679441     0.350492      1.938536   0.0526

Mean dependent var      0.144330    S.D. dependent var       0.353250
S.E. of regression      0.115440    Akaike info criterion    0.123553
Sum squared resid       1.266017    Schwarz criterion        0.176640
Log likelihood         -3.992330    Hannan-Quinn criter.     0.145019
Restr. log likelihood  -40.03639    Avg. log likelihood     -0.041158
LR statistic (1 df)     72.08812    McFadden R-squared       0.900282
Probability(LR stat)    0.000000

Obs with Dep=0          83          Total obs                97
Obs with Dep=1          14
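The estimated Logit model is
$p_i = F(y_i) = \dfrac{1}{1 + e^{-(-243.7362 + 0.6794 x_i)}}$
where $y_i = -243.7362 + 0.6794 x_i$. The inflection point of the curve is at approximately (358.7, 0.5).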
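The estimated Logit curve and its inflection point can be checked with a short sketch. The coefficients come from the Logit output; the function names `logistic` and `logit_prob` are illustrative assumptions. The inflection point of a logistic curve lies where the index equals zero, that is, at a score of minus the intercept divided by the slope.

```python
import math

# Coefficients copied from the Binary Logit output (C and SCORE).
A = -243.7362
B = 0.679441

def logistic(z):
    """Logistic CDF, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit_prob(score):
    """Estimated admission probability for a given score."""
    return logistic(A + B * score)

# Score at which the index is zero and the probability equals 0.5.
inflection = -A / B
print(round(inflection, 2), logit_prob(inflection))
```

Because the slope is steep, the estimated probability moves from close to 0 to close to 1 within a few points on either side of that score.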
Forecast accuracy
Forecast: YF
Actual: Y
Forecast sample: 1 97
Included observations: 97

Root Mean Squared Error           0.114244
Mean Absolute Error               0.025502
Mean Absolute Percentage Error    1.275122
Theil Inequality Coefficient      0.153748
   Bias Proportion                0.000000
   Variance Proportion            0.025338
   Covariance Proportion          0.974662
3. Probit model
Dependent Variable: Y
Method: ML - Binary Probit (Quadratic hill climbing)
Date: 12/10/10   Time: 21:40
Sample: 1 97
Included observations: 97
Convergence achieved after 11 iterations
Covariance matrix computed using second derivatives

Variable     Coefficient   Std. Error   z-Statistic   Prob.
C              -144.4560     70.19809     -2.057833   0.0396
SCORE           0.402868     0.196186      2.053504   0.0400

Mean dependent var      0.144330    S.D. dependent var       0.353250
S.E. of regression      0.116277    Akaike info criterion    0.122406
Sum squared resid       1.284441    Schwarz criterion        0.175493
Log likelihood         -3.936702    Hannan-Quinn criter.     0.143872
Restr. log likelihood  -40.03639    Avg. log likelihood     -0.040585
LR statistic (1 df)     72.19938    McFadden R-squared       0.901672
Probability(LR stat)    0.000000

Obs with Dep=0          83          Total obs                97
Obs with Dep=1          14
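The final estimated Probit model is
$p_i = F(y_i) = F(-144.456 + 0.4029 x_i)$
where $F$ is the standard normal cumulative distribution function. The inflection point of the curve is at approximately (358.5, 0.5).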
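The Probit probabilities can be evaluated in the same way using the standard normal CDF, which can be written in terms of the error function available in Python's standard library. The coefficients are taken from the Probit output; the names `normal_cdf` and `probit_prob` are illustrative assumptions.

```python
import math

# Coefficients copied from the Binary Probit output (C and SCORE).
A = -144.4560
B = 0.402868

def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_prob(score):
    """Estimated admission probability for a given score."""
    return normal_cdf(A + B * score)

# Score at which the index is zero and the probability equals 0.5.
inflection = -A / B
print(round(inflection, 2), probit_prob(inflection))
```

With the unrounded coefficients the zero-index score comes out near 358.57, close to the inflection point reported for the rounded equation and to the Logit value of about 358.7.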
Forecast accuracy
Forecast: YF
Actual: Y
Forecast sample: 1 97
Included observations: 97

Root Mean Squared Error           0.115072
Mean Absolute Error               0.025387
Mean Absolute Percentage Error    1.216791
Theil Inequality Coefficient      0.154476
   Bias Proportion                0.000084
   Variance Proportion            0.020837
   Covariance Proportion          0.979080
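Conclusion on forecast accuracy:

Model                       RMSE        MAE         MAPE
Linear probability model    0.294780    0.233437    8.689503
Logit model                 0.114244    0.025502    1.275122
Probit model                0.115072    0.025387    1.216791

The RMSE, MAE, and MAPE of the linear probability model are all far larger than those of the Logit and Probit models, so its forecast errors are much larger and its forecast accuracy is correspondingly much lower. The Logit and Probit models differ only slightly on all three measures: the Logit RMSE is marginally smaller, while the Probit MAE and MAPE are marginally smaller, so their accuracy is nearly identical. In summary, for these data the Logit or Probit model should be used in place of the linear probability model.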
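RMSE, MAE, and MAPE measure forecast error rather than the share of correct predictions. As a complementary check, the sketch below classifies each applicant as admitted when the estimated probability exceeds 0.5 and counts the hits. The 0.5 cutoff, the function names, and the hit-rate measure itself are assumptions added for illustration. The data are the 97 observations of Table 1 and the coefficients are those reported in the three outputs.

```python
import math

# SCORE for observations 1..97, in the order of Table 1.
SCORES = [
    401, 401, 392, 387, 384, 379, 378, 378, 376, 371, 362, 362, 361, 359,
    358, 356, 356, 355, 354, 354, 353, 350, 349, 349, 348, 347, 347, 344,
    339, 338, 338, 336, 334, 332, 332, 332, 331, 330, 328, 328, 328, 321,
    321, 318, 318, 316, 308, 308, 304, 303, 303, 299, 297, 294, 293, 293,
    292, 291, 291, 287, 286, 286, 282, 282, 282, 278, 275, 273, 273, 272,
    267, 266, 263, 261, 260, 256, 252, 252, 245, 243, 242, 241, 239, 235,
    232, 228, 219, 219, 214, 210, 204, 198, 189, 188, 182, 166, 123,
]
# Y = 1 for observations 1-13 and 16, and 0 otherwise (Table 1).
ADMITTED_OBS = set(range(1, 14)) | {16}
Y = [1 if i in ADMITTED_OBS else 0 for i in range(1, len(SCORES) + 1)]

def lpm(score):
    return -0.847407 + 0.003297 * score

def logit(score):
    z = -243.7362 + 0.679441 * score
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probit(score):
    z = -144.4560 + 0.402868 * score
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def hit_rate(prob, cutoff=0.5):
    """Share of observations whose predicted class matches Y."""
    hits = sum(1 for s, y in zip(SCORES, Y) if (prob(s) > cutoff) == (y == 1))
    return hits / len(SCORES)

for name, model in (("LPM", lpm), ("Logit", logit), ("Probit", probit)):
    print(name, round(hit_rate(model), 4))
```

Under the 0.5 cutoff the linear probability model never predicts admission, so it is right only for the 83 applicants who were not admitted, whereas the Logit and Probit rules both misclassify only observations 14 and 16 (95 of 97 correct). This is consistent with the ranking given by the error measures.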
Comparison of the linear model with the Probit and Logit models



