聚类算法-DBSCAN-C++实现

最后更新于:2022-04-01 20:31:24

程序流程图: ![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187cfa19827.jpg) DBSCAN核心功能函数,计算每个point的eps范围内的point数量pts; 对于所有pts >Minpts的point,记为Core point; 对于所有的corepoint,将其eps范围内的core point下标添加到vector::corepts中; 对于所有的corepoint,采用深度优先的方式遍历core point的所有cluster,使得相互连接的core point具有相同的cluster编号; 计算所有pts < Minpts且在Core point范围内的,记为Borderpoint; 将所有Borderpoint加入到任意一个关联的core point; 剩余的point的为Noise point,文件写写入时忽略; 将point信息写入clustering文件,程序结束。 ~~~ /* DBSCAN Algorithm 15S103182 Ethan */ #include #include #include #include #include #include #include #include #include using namespace std; class point{ public: float x; float y; int cluster=0; int pointType=1;//1 noise 2 border 3 core int pts=0;//points in MinPts vector corepts; int visited = 0; point (){} point (float a,float b,int c){ x = a; y = b; cluster = c; } }; float stringToFloat(string i){ stringstream sf; float score=0; sf<>score; return score; } vector openFile(const char* dataset){ fstream file; file.open(dataset,ios::in); if(!file) { cout <<"Open File Failed!" < a; return a; } vector data; int i=1; while(!file.eof()){ string temp; file>>temp; int split = temp.find(',',0); point p(stringToFloat(temp.substr(0,split)),stringToFloat(temp.substr(split+1,temp.length()-1)),i++); data.push_back(p); } file.close(); cout<<"successful!"< dataset,float Eps,int MinPts){ int len = dataset.size(); //calculate pts cout<<"calculate pts"< corePoint; for(int i=0;i=MinPts) { dataset[i].pointType = 3; corePoint.push_back(dataset[i]); } } cout<<"joint core point"< ps; if(corePoint[i].visited == 1) continue; ps.push(&corePoint[i]); point *v; while(!ps.empty()){ v = ps.top(); v->visited = 1; ps.pop(); for(int j=0;jcorepts.size();j++){ if(corePoint[v->corepts[j]].visited==1) continue; corePoint[v->corepts[j]].cluster = corePoint[i].cluster; corePoint[v->corepts[j]].visited = 1; ps.push(&corePoint[v->corepts[j]]); } } } cout<<"border point,joint border point to core point"< dataset = openFile("dataset3.txt"); DBSCAN(dataset,1.5,2); return 0; } ~~~ 数据文件格式:(x,y) ![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187cfa2ea61.jpg) 运行结果格式:(x,y,cluster) ![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187d363a488.jpg) 图形化展现: 特殊形状: ![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187d364d035.jpg) 桥梁: ![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187d3671f34.jpg) 变化密度: ![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187d6ddaa44.jpg) 总结: DBSCAN算法能够很好处理不同形状与大小的数据,并且抗噪音数据。但对于变化的密度,DBSCAN算法具有局限性。在实现Core point连接时,遇到了一点小小的麻烦,很难将相互连接的core point的cluster编号统一,后来通过给每个core point增加一个数组用于记录相连core point的下标信息,并采用深度优先进行遍历的方式,不仅提高了计算速度,同时也保证了准确性。对于实验数据结果,如果不对其进行图形化展现,很难看出聚类的效果,采用WPF(C#)技术对数据点进行处理,对不同cluster编号的点,赋予不同的颜色,进行图形展现。
';