stoclust

Clustering algorithms using stochastic analysis and ensemble techniques.

split_by_gaps

  ↳ clustering

split_by_gaps(vec, num_gaps = 1, index = None)

Aggregates the indices of a vector according to the gaps between its values. The values are sorted, and the num_gaps largest gaps between consecutive sorted values are used as boundaries that split the indices into clusters. The result is returned as an Aggregation.
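The gap rule described above can be sketched in plain NumPy. This is an illustration of the idea only, not stoclust's implementation: the helper name gap_labels is hypothetical, and it returns an array of integer cluster labels rather than an Aggregation.

```python
import numpy as np

def gap_labels(vec, num_gaps=1):
    """Hypothetical helper: label each entry of vec by its gap cluster."""
    vec = np.asarray(vec, dtype=float)
    if not 0 <= num_gaps < len(vec):
        raise ValueError("num_gaps must satisfy 0 <= num_gaps < len(vec)")

    order = np.argsort(vec, kind="stable")   # indices that sort vec
    gaps = np.diff(vec[order])               # gaps between consecutive sorted values

    # Positions (in sorted order) of the num_gaps largest gaps
    cut = np.sort(np.argsort(gaps, kind="stable")[len(gaps) - num_gaps:])

    # A new cluster starts immediately after each chosen gap
    starts = np.zeros(len(vec), dtype=int)
    starts[cut + 1] = 1

    # Map cluster numbers from sorted order back to the original positions
    labels = np.empty(len(vec), dtype=int)
    labels[order] = np.cumsum(starts)
    return labels

vec = np.array([0.10, 0.90, 0.15, 0.50, 0.95, 0.55])
labels = gap_labels(vec, num_gaps=2)
print(labels)  # [0 2 0 1 2 1]
```

In this sketch, cutting at 2 gaps produces 3 groups: {0.10, 0.15}, {0.50, 0.55}, and {0.90, 0.95}. Labels are numbered from the smallest values upward.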

Arguments

Argument   Kind      Type         Description
vec                  np.ndarray   A one-dimensional array of values.
num_gaps   Keyword   int          The number of gaps to use to break vec into clusters.
index      Keyword   Index        The Index which labels the indices of vec, and which will be the item set of the returned Aggregation.

Example

A visual example works best for understanding gap clustering. Below we generate a random vector of 50 components, apply gap clustering with 3 gaps, and plot the sorted vector components colored by cluster.

import stoclust.clustering as clust
import stoclust.visualization as viz
import numpy as np

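# 50 random values drawn uniformly from [0, 1)
vec = np.random.rand(50)
# Cluster the indices of vec using the 3 largest gaps
agg = clust.split_by_gaps(vec, num_gaps=3)

# rank[i] is the position of vec[i] in sorted order
rank = np.empty_like(vec)
rank[np.argsort(vec)] = np.arange(len(vec))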

fig = viz.scatter2D(
    rank,vec,agg=agg,mode='markers',
    layout=dict(
        title = 'Sorted vector components (gap clustering)',
        xaxis = dict(
            title='Sorted rank'
        ),
        yaxis = dict(
            title='Value'
        ),
        legend = dict(
            title='Cluster'
        )
    )
)

fig.show()
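
The rank array in the example is the inverse of the sorting permutation: rank[i] gives the position that vec[i] would occupy after sorting, which is what places each point on the "Sorted rank" axis. A minimal sketch on a hypothetical four-element vector, using only NumPy:

```python
import numpy as np

vec = np.array([0.7, 0.1, 0.4, 0.9])

order = np.argsort(vec)             # order[k] = index of the k-th smallest value
rank = np.empty_like(vec)           # float array, matching the example above
rank[order] = np.arange(len(vec))   # rank[i] = sorted position of vec[i]

print(rank)  # [2. 0. 1. 3.]
```

Applying np.argsort twice, as in np.argsort(np.argsort(vec)), gives the same ranks as integers. The scatter assignment used here avoids the second sort.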