Basic usage of LightGBM in Python

March 12, 2018

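After a few days of tedious data preprocessing, I finally have reasonably usable data. Running that data through LightGBM for feature evaluation gives each feature an importance score, which prepares the ground for parameter tuning later.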

Let's look at how LightGBM is used.

The code is below. The data is the sample data that ships with LightGBM.

    import json
    import os

    import lightgbm as lgb
    import pandas as pd
    from sklearn.metrics import roc_auc_score

    path = "/Users/shuubiasahi/Documents/githup/LightGBM/examples/regression/"

    print("load data")
    df_train = pd.read_csv(path + "regression.train", header=None, sep='\t')
    # evaluate on the held-out file rather than on the training file again
    df_test = pd.read_csv(path + "regression.test", header=None, sep='\t')

    # column 0 is the label, the remaining columns are the features
    y_train = df_train[0].values
    y_test = df_test[0].values
    X_train = df_train.drop(0, axis=1).values
    X_test = df_test.drop(0, axis=1).values

    # create dataset for lightgbm
    lgb_train = lgb.Dataset(X_train, y_train)
    lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

    # specify your configurations as a dict
    params = {
        'task': 'train',
        'boosting_type': 'gbdt',
        'objective': 'binary',
        # a list keeps the metric order fixed; a set has no defined order
        'metric': ['l2', 'auc'],
        'num_leaves': 31,
        'learning_rate': 0.05,
        'feature_fraction': 0.9,
        'bagging_fraction': 0.8,
        'bagging_freq': 5,
        'verbose': 0
    }

    print('Start training...')
    # train
    # early stopping is passed as a callback here, because the
    # early_stopping_rounds keyword was removed from lgb.train in LightGBM 4.0
    gbm = lgb.train(params,
                    lgb_train,
                    num_boost_round=20,
                    valid_sets=[lgb_eval],
                    callbacks=[lgb.early_stopping(stopping_rounds=5)])

    print('Save model...')
    # save model to file (create the output directory first if it is missing)
    os.makedirs('lightgbm', exist_ok=True)
    gbm.save_model('lightgbm/model.txt')

    print('Start predicting...')
    # predict: with the binary objective this returns probabilities
    y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

    # eval
    print(y_pred)
    print('The ROC AUC of prediction is:', roc_auc_score(y_test, y_pred))

    print('Dump model to JSON...')
    # dump model to json (and save to file)
    model_json = gbm.dump_model()
    with open('lightgbm/model.json', 'w') as f:
        json.dump(model_json, f, indent=4)

    print('Feature names:', gbm.feature_name())

    print('Calculate feature importances...')
    # feature importances
    print('Feature importances:', gbm.feature_importance().tolist())
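The script above scores its predictions with roc_auc_score. To make clear what that number means, here is a minimal pure-Python sketch of the same quantity: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The function name roc_auc and the toy data are my own illustration, not part of LightGBM or scikit-learn, and the quadratic loop is only meant for small inputs.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 negatives, 2 positives
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)  # 3 of the 4 pairs are ordered correctly -> 0.75
```

A value of 0.5 means the scores rank positives no better than chance, and 1.0 means every positive is ranked above every negative.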

 

The run output is omitted here.

Finally, the feature importances are as follows:

Feature importances: [26, 10, 2, 34, 8, 53, 9, 0, 1, 31, 5, 6, 1, 27, 8, 4, 2, 7, 4, 7, 1, 24, 63, 3, 53, 90, 56, 65]
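A raw list like this is hard to read, so here is a small sketch that ranks the scores printed above. Two assumptions on my part: the names Column_0, Column_1, ... are the default names I would expect LightGBM to assign when it is trained from plain NumPy arrays, and the scores are split counts (the default importance type). The second assumption fits the numbers: they sum to 600, which matches 20 boosting rounds with 30 splits per tree (num_leaves is 31).

```python
# Importance scores copied from the output printed in this post
importances = [26, 10, 2, 34, 8, 53, 9, 0, 1, 31, 5, 6, 1, 27,
               8, 4, 2, 7, 4, 7, 1, 24, 63, 3, 53, 90, 56, 65]

# Assumed default feature names (hypothetical if your model has custom names)
feature_names = [f"Column_{i}" for i in range(len(importances))]

# Sort by score, highest first; sorted() is stable, so ties keep column order
ranked = sorted(zip(feature_names, importances),
                key=lambda pair: pair[1], reverse=True)

total = sum(importances)
for name, score in ranked[:5]:
    print(f"{name}: {score} ({score / total:.1%} of all splits)")

# Features that no tree ever split on are candidates for removal
unused = [name for name, score in ranked if score == 0]
print("Never used in a split:", unused)
```

With a trained model you would replace the two hard-coded lists with gbm.feature_name() and gbm.feature_importance().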

lyssom
