
Information Gain and Information Gain Ratio

In information theory and in probability and statistics, entropy is a measure of the uncertainty of a random variable.

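Let X be a discrete random variable that takes finitely many values, with probability distribution

$$P(X = x_i) = p_i, \quad i = 1, 2, \ldots, n$$

The entropy of X is then defined as

$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

with the convention that 0 log 0 = 0.

When the random variable takes only two values (0 and 1), its distribution is called a Bernoulli distribution. Writing P(X = 1) = p, its entropy is

$$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$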
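The definition is easy to check numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `entropy` is illustrative and does not appear in the original code. It uses base-2 logarithms (so the result is in bits) and the convention 0 log 0 = 0.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution (hypothetical helper)."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # convention: terms with p = 0 contribute nothing
    # sum of p * log2(1/p), which equals -sum of p * log2(p)
    return float((p * np.log2(1.0 / p)).sum())

print(entropy([0.5, 0.5]))  # fair coin: maximal uncertainty for two outcomes
print(entropy([0.9, 0.1]))  # biased coin: lower uncertainty
print(entropy([1.0, 0.0]))  # deterministic outcome: no uncertainty
```

The fair coin gives exactly 1 bit and the deterministic outcome gives 0, which are the two extremes of the Bernoulli entropy.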

The plot below shows how the entropy of a Bernoulli distribution varies with the probability p:

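# H(p) curve of the 0-1 (Bernoulli) distribution
from math import log

import numpy as np
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt

# Probabilities strictly inside (0, 1), so log(0) is never evaluated
p = np.arange(0.01, 1, 0.01)
Hp = []
for pi in p:
    Hp.append(-pi*log(pi, 2) - (1-pi)*log(1-pi, 2))

plt.plot(p, Hp, 'r')
plt.xlabel('p')
plt.ylabel('H(p)')
plt.show()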

Computing information gain and information gain ratio
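For reference, these are the standard definitions that the functions `CalcEntropy`, `CalcHentropy` and `InfoGain` appear to implement. The notation (D for the labelled data set, A for a feature, D_i for the subset of D on which A takes its i-th value, g and g_R for gain and gain ratio) is an assumption chosen here; the source does not spell it out.

```latex
% Conditional entropy of the labels given feature A
H(D \mid A) = \sum_{i} \frac{|D_i|}{|D|} \, H(D_i)

% Information gain: reduction in label entropy from knowing A
g(D, A) = H(D) - H(D \mid A)

% Entropy of the feature's own value distribution
H_A(D) = -\sum_{i} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|}

% Information gain ratio
g_R(D, A) = \frac{g(D, A)}{H_A(D)}
```

In the code, `y_entr` plays the role of H(D), `H_entr` of H(D | A), and `feat_entr` of H_A(D).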

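# Functions for computing information gain and information gain ratio: InfoGain()

# Entropy of a categorical column
def CalcEntropy(col):
    # Relative frequency of each distinct value, stored in a column named 'percent'
    colP = pd.crosstab(col, 'percent')/len(col)

    def Entr(p):
        # Convention: 0 * log(0) = 0
        if p == 0:
            entr = 0
        else:
            entr = -p*1.0*log(p, 2)
        return entr

    entropy = list(map(Entr, colP.percent))
    entropy = sum(entropy)
    return entropy

# Conditional entropy H(y | feature)
# Note: the inner function reads only x[0] and x[1], so y must have exactly two classes
def CalcHentropy(feature, y):
    # Relative frequency of each feature value
    featP = pd.crosstab(feature, 'percent')/len(feature)
    # Contingency table of counts: one row per feature value, one column per class of y
    crossP = np.array(pd.crosstab(feature, y))

    def entr(x):
        # A row with a single class present is pure, so its entropy is 0
        if x[0] == 0 or x[1] == 0:
            h = 0
        else:
            h = -x[0]*1.0/(x[0]+x[1])*log(x[0]*1.0/(x[0]+x[1]), 2) \
                -x[1]*1.0/(x[0]+x[1])*log(x[1]*1.0/(x[0]+x[1]), 2)
        return h

    hentropy = list(map(entr, crossP))
    # Weight each row's entropy by the frequency of its feature value
    hentropy = np.dot(featP.percent, hentropy)
    return hentropy

# Information gain and information gain ratio
def InfoGain(feature, y):
    feat_entr = CalcEntropy(feature)
    y_entr = CalcEntropy(y)
    H_entr = CalcHentropy(feature, y)
    infogain = y_entr - H_entr
    # Undefined (division by zero) when the feature takes a single value
    infogainrate = infogain/feat_entr
    return infogain, infogainrate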

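# Example data
# (category strings translated to English; the computed values do not depend on the labels)
A1 = np.array([elem for elem in ["young", "middle-aged", "old"]
               for i in range(5)])
A2 = np.array(["no", "no", "yes", "yes", "no", "no", "no", "yes",
               "no", "no", "no", "no", "yes", "yes", "no"])
A3 = np.array(["no", "no", "no", "yes", "no", "no", "no", "yes",
               "yes", "yes", "yes", "yes", "no", "no", "no"])
A4 = np.array(["fair", "good", "good", "fair", "fair", "fair",
               "good", "good", "very good", "very good", "very good",
               "good", "good", "very good", "fair"])
Y = np.array(["no", "no", "yes", "yes", "no", "no", "no", "yes", "yes",
              "yes", "yes", "yes", "yes", "yes", "no"])

data = DataFrame({'A1': A1, 'A2': A2, 'A3': A3, 'A4': A4, 'Y': Y})

infogain, infogainrate = InfoGain(A1, Y)
print(round(infogain, 3), round(infogainrate, 3))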

That is, feature A1 has an information gain of 0.083 and an information gain ratio of 0.052.
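As a cross-check on these figures, here is a self-contained sketch that recomputes them without assuming a binary label. The function names `entropy_of` and `info_gain` are illustrative and not part of the original code, and the category strings are English stand-ins for the sample data; only the pattern of repeated values matters for the result.

```python
import numpy as np
import pandas as pd

def entropy_of(series):
    """Empirical entropy in bits of a categorical pandas Series (hypothetical helper)."""
    p = series.value_counts(normalize=True).to_numpy()
    p = p[p > 0]
    return float((p * np.log2(1.0 / p)).sum())

def info_gain(feature, y):
    """Return (information gain, gain ratio) of `feature` with respect to `y`."""
    feature = pd.Series(feature).reset_index(drop=True)
    y = pd.Series(y).reset_index(drop=True)
    weights = feature.value_counts(normalize=True)
    # H(y | feature): entropy of y within each feature value, weighted by its frequency
    cond = float(sum(weights[v] * entropy_of(y[feature == v]) for v in weights.index))
    gain = entropy_of(y) - cond
    split = entropy_of(feature)
    # The ratio is undefined for a feature with a single value
    ratio = gain / split if split > 0 else float("nan")
    return gain, ratio

A1 = ["young"] * 5 + ["middle-aged"] * 5 + ["old"] * 5
Y = ["no", "no", "yes", "yes", "no", "no", "no", "yes", "yes",
     "yes", "yes", "yes", "yes", "yes", "no"]

gain, ratio = info_gain(A1, Y)
print(round(gain, 3), round(ratio, 3))  # prints: 0.083 0.052
```

Because the conditional entropy is computed per group with `entropy_of`, the same function also works when the label has more than two classes, which the two-column contingency approach does not handle.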

  • Original link: http://kuaibao.qq.com/s/20180429G0TTNJ00?refer=cp_1026
