split_by_gaps

↳ clustering

split_by_gaps(vec, num_gaps = 1, index = None)

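Aggregates the indices of a vector into clusters based on gaps between the vector's values. The values are sorted, and the `num_gaps` largest gaps between consecutive sorted values are used as cluster boundaries, so `num_gaps` gaps split the vector into `num_gaps + 1` clusters.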
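The idea can be sketched in plain NumPy. This is an illustrative sketch only, not stoclust's implementation: the helper name `gap_labels_sketch` is hypothetical, and it returns an array of integer cluster labels rather than an `Aggregation`.

```python
import numpy as np

def gap_labels_sketch(vec, num_gaps=1):
    """Hypothetical helper (not part of stoclust): label entries by gap-based cluster."""
    vec = np.asarray(vec, dtype=float)
    order = np.argsort(vec, kind="stable")   # indices that sort vec in ascending order
    gaps = np.diff(vec[order])               # gaps between consecutive sorted values

    # Positions (in sorted order) of the num_gaps largest gaps
    start = max(len(gaps) - num_gaps, 0)
    cut_positions = np.sort(np.argsort(gaps, kind="stable")[start:])

    # A new cluster begins at the sorted position just after each chosen gap
    sorted_labels = np.zeros(len(vec), dtype=int)
    sorted_labels[cut_positions + 1] = 1
    sorted_labels = np.cumsum(sorted_labels)

    # Map the labels from sorted order back to the original positions
    labels = np.empty(len(vec), dtype=int)
    labels[order] = sorted_labels
    return labels

values = [0.1, 0.15, 0.9, 0.2, 5.0, 5.1, 0.95]
print(gap_labels_sketch(values, num_gaps=2))  # [0 0 1 0 2 2 1]
```

In this sketch, labels increase with value: cluster 0 holds the smallest values, and each chosen gap starts the next cluster.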

Arguments

Argument	Kind	Type	Description
`vec`	Positional	`np.ndarray`	A one-dimensional array of values.
`num_gaps`	Keyword	`int`	The number of gaps used to break `vec` into clusters. Defaults to `1`.
`index`	Keyword	`Index`	The `Index` that labels the indices of `vec` and serves as the item set of the returned `Aggregation`. Defaults to `None`.

Example

A visual example works best for understanding gap clustering. Below we generate a random vector of 50 components, apply gap clustering with 3 gaps, and plot the sorted vector components colored by cluster.

import stoclust.clustering as clust
import stoclust.visualization as viz
import numpy as np

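# Random vector of 50 components, clustered using the 3 largest gaps
vec = np.random.rand(50)
agg = clust.split_by_gaps(vec, num_gaps=3)

# rank[i] is the position of vec[i] in the sorted order
rank = np.empty_like(vec)
rank[np.argsort(vec)] = np.arange(len(vec))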

fig = viz.scatter2D(
    rank,vec,agg=agg,mode='markers',
    layout=dict(
        title = 'Sorted vector components (gap clustering)',
        xaxis = dict(
            title='Sorted rank'
        ),
        yaxis = dict(
            title='Value'
        ),
        legend = dict(
            title='Cluster'
        )
    )
)

fig.show()
