Clustering algorithms using stochastic analysis and ensemble techniques.
split_by_gaps(vec, num_gaps = 1, index = None)
Aggregates the indices of a vector based on gaps between its values. The vector is sorted, and the `num_gaps` largest gaps between consecutive sorted values are used as cluster boundaries, so the result contains `num_gaps + 1` clusters.
| Arguments | Type | Description |
|---|---|---|
| `vec` | `np.ndarray` | A one-dimensional array of values. |
| `num_gaps` | `int` (keyword) | The number of gaps to use to break `vec` into clusters. |
| `index` | `Index` (keyword) | The `Index` that labels the indices of `vec` and that will be the item set of the returned `Aggregation`. |
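
To make the gap rule concrete, here is a minimal, dependency-free sketch of the idea described above. It is not stoclust's implementation: the function name `split_by_gaps_sketch` is hypothetical, and it returns plain lists of indices rather than an `Aggregation`. It only assumes that the cuts are placed at the largest gaps between consecutive sorted values.

```python
def split_by_gaps_sketch(values, num_gaps=1):
    """Hypothetical stand-in for illustration; not stoclust's implementation.

    Returns a list of clusters, each a list of indices into `values`,
    ordered from the smallest values to the largest.
    """
    if not 0 <= num_gaps < len(values):
        raise ValueError("num_gaps must satisfy 0 <= num_gaps < len(values)")
    # Indices of `values`, ordered by ascending value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # gaps[k] is the distance between the k-th and (k+1)-th sorted values.
    gaps = [values[order[k + 1]] - values[order[k]]
            for k in range(len(order) - 1)]
    # Positions of the `num_gaps` largest gaps, restored to ascending order.
    largest = sorted(range(len(gaps)), key=lambda k: gaps[k], reverse=True)
    cuts = sorted(largest[:num_gaps])
    # Cut the sorted index list just after each selected gap.
    clusters, start = [], 0
    for k in cuts:
        clusters.append(order[start:k + 1])
        start = k + 1
    clusters.append(order[start:])
    return clusters


vec = [0.10, 0.90, 0.15, 0.50, 0.95, 0.60]
print(split_by_gaps_sketch(vec, num_gaps=2))  # [[0, 2], [3, 5], [1, 4]]
```

In this toy input the two largest gaps fall between 0.15 and 0.50 and between 0.60 and 0.90, so the six indices split into three clusters of two.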
A visual example works best for understanding gap clustering. Below we generate a random vector of 50 components, apply gap clustering with 3 gaps, and plot the sorted vector components colored by cluster.
```python
import stoclust.clustering as clust
import stoclust.visualization as viz
import numpy as np

# Random vector of 50 components, clustered using the 3 largest gaps.
vec = np.random.rand(50)
agg = clust.split_by_gaps(vec, num_gaps=3)

# rank[i] is the position of vec[i] in the sorted order of vec.
rank = np.empty_like(vec)
rank[np.argsort(vec)] = np.arange(len(vec))

# Plot value against sorted rank, with points colored by cluster.
fig = viz.scatter2D(
    rank, vec, agg=agg, mode='markers',
    layout=dict(
        title='Sorted vector components (gap clustering)',
        xaxis=dict(
            title='Sorted rank'
        ),
        yaxis=dict(
            title='Value'
        ),
        legend=dict(
            title='Cluster'
        )
    )
)
fig.show()
```