Dlimeng
Machine Learning Series: Bayesian Classification Algorithms
Published 2023-06-29 16:28:30
Included in the column: 开源心路
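Introduction
Bayesian classification is the collective name for a broad family of classification algorithms.
Bayesian classifiers assign a sample to a class according to the probability that the sample belongs to that class.
Naive Bayes is the simplest member of this family.
Note: "naive" refers to the assumption that the features are conditionally independent given the class.
Understanding this point properly requires some background in probability theory.
The features x1, x2, x3, x4 are conditionally independent given A when P(x1x2x3x4|A)=p(x1|A)*p(x2|A)*p(x3|A)*p(x4|A)
For two variables x and y that are conditionally independent given z: P(xy|z)=p(xyz)/p(z)=p(xz)/p(z)*p(yz)/p(z)=p(x|z)*p(y|z)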
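The identity P(xy|z)=p(x|z)*p(y|z) can be checked numerically. The following is a minimal sketch, not part of the original article: the object name and all probabilities are hypothetical values chosen for illustration. It builds a joint table p(x, y, z) and then verifies the identity using only that table.

```scala
object ConditionalIndependenceCheck {
  // Hypothetical numbers, chosen only for illustration.
  val pZ: Map[Int, Double] = Map(0 -> 0.4, 1 -> 0.6)
  // Keys are (x, z) and (y, z) respectively.
  val pXgivenZ: Map[(Int, Int), Double] =
    Map((0, 0) -> 0.2, (1, 0) -> 0.8, (0, 1) -> 0.5, (1, 1) -> 0.5)
  val pYgivenZ: Map[(Int, Int), Double] =
    Map((0, 0) -> 0.9, (1, 0) -> 0.1, (0, 1) -> 0.3, (1, 1) -> 0.7)

  // Joint table p(x, y, z), constructed so that x and y are independent given z.
  val joint: Map[(Int, Int, Int), Double] =
    (for (x <- 0 to 1; y <- 0 to 1; z <- 0 to 1)
      yield ((x, y, z), pZ(z) * pXgivenZ((x, z)) * pYgivenZ((y, z)))).toMap

  // Sum of the joint probabilities whose key satisfies the predicate.
  private def sumWhere(keep: ((Int, Int, Int)) => Boolean): Double =
    joint.filter { case (k, _) => keep(k) }.values.sum

  // p(xy|z) = p(xyz) / p(z)
  def pXYgivenZ(x: Int, y: Int, z: Int): Double =
    joint((x, y, z)) / sumWhere(_._3 == z)

  // p(x|z) = p(xz) / p(z), recovered from the joint table alone
  def pXfromJoint(x: Int, z: Int): Double =
    sumWhere(k => k._1 == x && k._3 == z) / sumWhere(_._3 == z)

  // p(y|z) = p(yz) / p(z), recovered from the joint table alone
  def pYfromJoint(y: Int, z: Int): Double =
    sumWhere(k => k._2 == y && k._3 == z) / sumWhere(_._3 == z)

  // True when p(xy|z) = p(x|z) * p(y|z) for every combination, up to rounding.
  def holds: Boolean =
    (for (x <- 0 to 1; y <- 0 to 1; z <- 0 to 1)
      yield math.abs(pXYgivenZ(x, y, z) - pXfromJoint(x, z) * pYfromJoint(y, z)) < 1e-12
    ).forall(identity)
}
```

For a joint table that was not built this way, `holds` would generally be false, which is exactly the case where the naive assumption is only an approximation.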
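Algorithm
If, given that certain attribute conditions hold, the probability that an item belongs to A is greater than the probability that it belongs to B, the item is classified as A.
Formula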
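The rule above is usually written with Bayes' theorem. The block below is a sketch in standard notation, not taken from the original article: the symbols x_1, ..., x_n (feature values) and the class variable c are assumptions introduced here.

```latex
\[
P(A \mid x_1, \dots, x_n)
  = \frac{P(x_1, \dots, x_n \mid A)\, P(A)}{P(x_1, \dots, x_n)}
  \;\propto\; P(A) \prod_{i=1}^{n} P(x_i \mid A)
\]
\[
\hat{c} = \arg\max_{c \in \{A, B, \dots\}} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
\]
```

The denominator is the same for every class, so it can be dropped when only the most probable class is needed. The product form relies on the conditional independence assumption.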
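Steps
1. Extract the features from the labeled (prior) sample data of each class.
2. For each class, compute the conditional probability of each feature.
(For example: the probability that feature 1 appears in class A, p(feature 1|A); in class B, p(feature 1|B); in class C, p(feature 1|C); and so on.)
3. Extract the features of the sample to be classified (feature 1, feature 2, feature 3, feature 4, ...).
4. For each class, multiply the class prior by the conditional probabilities of those features, as follows:
Score for class A: p(A)*p(feature 1|A)*p(feature 2|A)*p(feature 3|A)*p(feature 4|A)...
Score for class B: p(B)*p(feature 1|B)*p(feature 2|B)*p(feature 3|B)*p(feature 4|B)...
Score for class C: p(C)*p(feature 1|C)*p(feature 2|C)*p(feature 3|C)*p(feature 4|C)...
5. The class with the largest score is the class assigned to the sample.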
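In practice the product in step 4 is usually computed as a sum of logarithms, because multiplying many small probabilities underflows to zero in floating point. A minimal sketch of the effect follows; the object name and the 400 probabilities of 0.1 are hypothetical values for illustration.

```scala
object LogSpaceSketch {
  // 400 hypothetical per-feature probabilities of 0.1 each.
  val probs: Array[Double] = Array.fill(400)(0.1)

  // The direct product is about 1e-400, below the smallest positive Double, so it becomes 0.0.
  val product: Double = probs.product

  // The sum of logs stays finite (about 400 * ln(0.1)) and preserves the ordering of scores.
  val logSum: Double = probs.map(x => math.log(x)).sum
}
```

Because the logarithm is monotonic, comparing log scores picks the same class as comparing the products would.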
Code
import scala.collection.mutable.ArrayBuffer

object NaiveBayes {

  /**
   * Prior (training) data: six tokenized posts and their class labels.
   */
  def dataSet(): (Array[Array[String]], Array[Int]) = {
    val dataList = Array(
      Array("my", "dog", "has", "flea", "problems", "help", "please"),
      Array("maybe", "not", "take", "him", "to", "dog", "park", "stupid"),
      Array("my", "dalmation", "is", "so", "cute", "I", "love", "him"),
      Array("stop", "posting", "stupid", "worthless", "garbage"),
      Array("mr", "licks", "ate", "my", "steak", "how", "to", "stop", "him"),
      Array("quit", "buying", "worthless", "dog", "food", "stupid"))
    // Class label (0 or 1) of each post
    val dataType = Array(0, 1, 0, 1, 0, 1)
    (dataList, dataType)
  }

  /**
   * Converts the words of one document into a 0/1 vector over the vocabulary.
   * @param dataList the vocabulary
   * @param inputSet the words of the input document
   */
  def setWordsType(dataList: Array[String], inputSet: Array[String]): Array[Int] = {
    /***
     * Prior data: during training, dataList is the same vocabulary on each of the six calls:
     * ArrayBuffer(quit, buying, worthless, dog, food, stupid, mr, licks, ate, my, steak, how, to, stop, him, posting, garbage, dalmation, is, so, cute, I, love, maybe, not, take, park, has, flea, problems, help, please)
     */
    val returnList = new Array[Int](dataList.length)
    val dataIndex = dataList.zipWithIndex
    for (word <- inputSet) {
      if (dataList.contains(word)) {
        // println(dataIndex.filter(_._1 == word).toBuffer)
        // Set the position of every vocabulary word that occurs in inputSet to 1
        returnList(dataIndex.filter(_._1 == word)(0)._2) = 1
      } else {
        println(s"the word: $word is not in my Vocabulary!")
      }
    }
    returnList
  }

  /**
   * Trains on the prior data.
   * @param trainData training vectors (one 0/1 vector per document)
   * @param trainType training labels
   * @return (log p(w|class 1), log p(w|class 0), p(class 1))
   */
  def trainSet(trainData: Array[Array[Int]], trainType: Array[Int]): (Array[Double], Array[Double], Double) = {
    /**
     * trainData holds six Int arrays of length 32:
     * ArrayBuffer(0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
     * ArrayBuffer(0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
     * ArrayBuffer(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
     * ArrayBuffer(0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
     * ArrayBuffer(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
     * ArrayBuffer(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
     */
    val trainLength = trainData.length
    val wordsNum = trainData(0).length
    // Prior probability of class 1; there are only two classes (0/1), so p(class 0) = 1 - pType
    val pType = trainType.sum / trainLength.toDouble
    // Counts start at 1 and denominators at 2.0 so that no probability is zero
    var p0Num = Array.fill(wordsNum)(1)
    var p1Num = Array.fill(wordsNum)(1)
    var p0Denom = 2.0
    var p1Denom = 2.0
    /**
     * Denominators at the start of each loop iteration (i = 0 to 5):
     * p0Denom:2.0  p1Denom:2.0
     * p0Denom:9.0  p1Denom:2.0
     * p0Denom:9.0  p1Denom:10.0
     * p0Denom:17.0 p1Denom:10.0
     * p0Denom:17.0 p1Denom:15.0
     * p0Denom:26.0 p1Denom:15.0
     */
    for (i <- 0 until trainLength) {
      if (trainType(i) == 1) {
        // Add this document's word indicators to the class-1 counts
        p1Num = p1Num.zip(trainData(i)).map { case (count, x) => count + x }
        p1Denom += trainData(i).sum
      } else {
        // Add this document's word indicators to the class-0 counts
        p0Num = p0Num.zip(trainData(i)).map { case (count, x) => count + x }
        p0Denom += trainData(i).sum
      }
    }
    /**
     * p1Num
     * ArrayBuffer(2, 2, 3, 3, 2, 4, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1)
     * p1Denom 21.0
     *
     * p0Num
     * ArrayBuffer(1, 1, 1, 2, 1, 1, 2, 2, 2, 4, 2, 2, 2, 2, 3, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 2)
     * p0Denom 26.0
     */
    (p1Num.map(x => Math.log(x / p1Denom)), p0Num.map(x => Math.log(x / p0Denom)), pType)
  }

  def classifyNB(vec2Classify: Array[Int], p0Vec: Array[Double], p1Vec: Array[Double], pClass1: Double): Int = {
    // log(p(w|c1)p(c1)) = log(p(w|c1)) + log(p(c1)) = sum(vec2Classify * p1Vec) + log(pClass1)
    val p1 = vec2Classify.zip(p1Vec).map { case (x, logP) => x * logP }.sum + math.log(pClass1)
    // log(p(w|c0)p(c0)) = log(p(w|c0)) + log(p(c0)) = sum(vec2Classify * p0Vec) + log(1.0 - pClass1)
    val p0 = vec2Classify.zip(p0Vec).map { case (x, logP) => x * logP }.sum + math.log(1.0 - pClass1)
    if (p1 > p0) 1 else 0
  }

  def main(args: Array[String]): Unit = {
    val (listOPosts, listClasses) = dataSet()
    val myVocabList = listOPosts.reduce((a1, a2) => a1.++:(a2)).distinct
    /**
     * Contents of myVocabList:
     * ArrayBuffer(quit, buying, worthless, dog, food, stupid, mr, licks, ate, my, steak, how, to, stop, him, posting, garbage, dalmation, is, so, cute, I, love, maybe, not, take, park, has, flea, problems, help, please)
     */
    val trainMat = new ArrayBuffer[Array[Int]](listOPosts.length)
    listOPosts.foreach(postinDoc => trainMat.append(setWordsType(myVocabList, postinDoc)))

    // Train on the training set
    val (p1V, p0V, pAb) = trainSet(trainMat.toArray, listClasses)

    val testEntry = Array("love", "my", "dalmation")
    val thisDoc = setWordsType(myVocabList, testEntry)
    println(testEntry.mkString(",") + " classified as: " + classifyNB(thisDoc, p0V, p1V, pAb))

    val testEntry2 = Array("stupid", "garbage")
    val thisDoc2 = setWordsType(myVocabList, testEntry2)
    println(testEntry2.mkString(",") + " classified as: " + classifyNB(thisDoc2, p0V, p1V, pAb))
  }
}
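setWordsType scans the whole vocabulary for every input word. For larger vocabularies a precomputed word-to-index map avoids the repeated scan. The following is a minimal sketch of that alternative, not the author's code: the object and method names are hypothetical, and the vocabulary is assumed to contain no duplicates (as is the case after `distinct`).

```scala
object VocabIndexSketch {
  // Map each vocabulary word to its position; assumes the vocabulary has no duplicates.
  def buildIndex(vocab: Array[String]): Map[String, Int] =
    vocab.zipWithIndex.toMap

  // Build the 0/1 vector for one document; words outside the vocabulary are skipped silently.
  def toVector(index: Map[String, Int], words: Array[String]): Array[Int] = {
    val vec = new Array[Int](index.size)
    for (w <- words; i <- index.get(w)) vec(i) = 1
    vec
  }
}
```

For example, `VocabIndexSketch.toVector(VocabIndexSketch.buildIndex(myVocabList), testEntry)` would yield the same kind of vector that `setWordsType(myVocabList, testEntry)` builds, without the warning message for unknown words.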
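This article is part of the Tencent Cloud self-media syndication program and is shared from the author's personal site/blog.
Originally published: 2018-09-09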