聚类算法-DBSCAN-C++实现
最后更新于:2022-04-01 20:31:24
程序流程图:
![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187cfa19827.jpg)
DBSCAN核心功能函数,计算每个point的eps范围内的point数量pts;
对于所有pts >Minpts的point,记为Core point;
对于所有的corepoint,将其eps范围内的core point下标添加到vector::corepts中;
对于所有的corepoint,采用深度优先的方式遍历core point的所有cluster,使得相互连接的core point具有相同的cluster编号;
计算所有pts < Minpts且在Core point范围内的,记为Borderpoint;
将所有Borderpoint加入到任意一个关联的core point;
剩余的point的为Noise point,文件写写入时忽略;
将point信息写入clustering文件,程序结束。
~~~
/*
DBSCAN Algorithm
15S103182
Ethan
*/
#include
#include
#include
#include
#include
#include
#include
#include
#include
using namespace std;
class point{
public:
float x;
float y;
int cluster=0;
int pointType=1;//1 noise 2 border 3 core
int pts=0;//points in MinPts
vector corepts;
int visited = 0;
point (){}
point (float a,float b,int c){
x = a;
y = b;
cluster = c;
}
};
float stringToFloat(string i){
stringstream sf;
float score=0;
sf<>score;
return score;
}
vector openFile(const char* dataset){
fstream file;
file.open(dataset,ios::in);
if(!file)
{
cout <<"Open File Failed!" < a;
return a;
}
vector data;
int i=1;
while(!file.eof()){
string temp;
file>>temp;
int split = temp.find(',',0);
point p(stringToFloat(temp.substr(0,split)),stringToFloat(temp.substr(split+1,temp.length()-1)),i++);
data.push_back(p);
}
file.close();
cout<<"successful!"< dataset,float Eps,int MinPts){
int len = dataset.size();
//calculate pts
cout<<"calculate pts"< corePoint;
for(int i=0;i=MinPts) {
dataset[i].pointType = 3;
corePoint.push_back(dataset[i]);
}
}
cout<<"joint core point"< ps;
if(corePoint[i].visited == 1) continue;
ps.push(&corePoint[i]);
point *v;
while(!ps.empty()){
v = ps.top();
v->visited = 1;
ps.pop();
for(int j=0;jcorepts.size();j++){
if(corePoint[v->corepts[j]].visited==1) continue;
corePoint[v->corepts[j]].cluster = corePoint[i].cluster;
corePoint[v->corepts[j]].visited = 1;
ps.push(&corePoint[v->corepts[j]]);
}
}
}
cout<<"border point,joint border point to core point"< dataset = openFile("dataset3.txt");
DBSCAN(dataset,1.5,2);
return 0;
}
~~~
数据文件格式:(x,y)
![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187cfa2ea61.jpg)
运行结果格式:(x,y,cluster)
![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187d363a488.jpg)
图形化展现:
特殊形状:
![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187d364d035.jpg)
桥梁:
![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187d3671f34.jpg)
变化密度:
![](https://docs.gechiui.com/gc-content/uploads/sites/kancloud/2016-04-21_57187d6ddaa44.jpg)
总结:
DBSCAN算法能够很好处理不同形状与大小的数据,并且抗噪音数据。但对于变化的密度,DBSCAN算法具有局限性。在实现Core point连接时,遇到了一点小小的麻烦,很难将相互连接的core point的cluster编号统一,后来通过给每个core point增加一个数组用于记录相连core point的下标信息,并采用深度优先进行遍历的方式,不仅提高了计算速度,同时也保证了准确性。对于实验数据结果,如果不对其进行图形化展现,很难看出聚类的效果,采用WPF(C#)技术对数据点进行处理,对不同cluster编号的点,赋予不同的颜色,进行图形展现。
';