ruckus.sampling module

class ruckus.sampling.KernelHerd(p, rkhs, size, w_init=None, domain=None)

Bases: object

Iterator which produces the kernel herd [1] from an RKHS and a distribution embedded in it.

Caution: this method only works well if the embeddings \(\phi(x)\) have little overlap with one another!

Given a distribution embedding:

\[\mu_p = \int \phi(x) p(x) dx \ ,\]

a deterministic chaotic sequence \((x_t)\) may be generated by the following algorithm, beginning with a vector \(w_0\in H\):

\[\begin{split}x_{t} = \underset{x}{\mathrm{argmax}}\left<\phi(x),w_t\right> \\ w_{t+1} = w_t + \mu_p - \phi(x_t)\end{split}\]
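Unrolling the update for \(w\) over the first \(T\) steps gives a closed form, stated here only as a direct consequence of the recursion above:

\[w_T = w_0 + T\mu_p - \sum_{t=0}^{T-1} \phi(x_t)\]

so \(w_T\) accumulates the discrepancy between \(T\) copies of \(\mu_p\) and the embeddings of the points selected so far.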

Over time, the kernel embedding of the sequence

\[\hat{\mu}_T = \frac{1}{T}\sum_{t=0}^{T-1} \phi(x_t)\]

converges to \(\mu_p\), with the error \(\lVert \hat{\mu}_T - \mu_p \rVert\) decaying as \(O(T^{-1})\) [1].
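The herding recursion above can be illustrated with a minimal NumPy sketch. It is not the ruckus implementation: the function name `herd` and the `features` array are hypothetical, and the sketch assumes that the feature map is available as explicit finite-dimensional vectors over a finite candidate set, so the argmax reduces to a search over rows. The toy example uses one-hot features, which do not overlap at all, in line with the caution above.

```python
import numpy as np

def herd(mu_p, features, size, w_init=None):
    """Greedy herding over a finite candidate set (illustrative sketch).

    mu_p     : (d,) embedding of the target distribution.
    features : (n, d) array; row i is an explicit feature vector phi(x_i).
    size     : number of samples to generate.
    w_init   : optional (d,) starting vector w_0; defaults to mu_p.
    Returns the indices of the selected candidates, in order.
    """
    w = mu_p.copy() if w_init is None else np.asarray(w_init, dtype=float).copy()
    picks = np.empty(size, dtype=int)
    for t in range(size):
        # x_t = argmax_x <phi(x), w_t>, restricted to the candidate rows
        picks[t] = int(np.argmax(features @ w))
        # w_{t+1} = w_t + mu_p - phi(x_t)
        w += mu_p - features[picks[t]]
    return picks

# Toy setup (an assumption for illustration): three points with one-hot features.
features = np.eye(3)
p = np.array([0.5, 0.3, 0.2])
mu_p = features.T @ p                    # embedding of p under the one-hot map

picks = herd(mu_p, features, size=100)
mu_hat = features[picks].mean(axis=0)    # empirical embedding of the herd
err = np.linalg.norm(mu_hat - mu_p)      # distance to the target embedding
counts = np.bincount(picks, minlength=3)
```

In this toy case the selection counts track `100 * p` closely, which is what the convergence of \(\hat{\mu}_T\) to \(\mu_p\) amounts to for one-hot features.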

  1. Chen, Y., Welling, M., Smola, A. “Super-Samples from Kernel Herding.” Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI 2010).

Parameters
  • p (numpy.ndarray) – The embedded vector \(\mu_p\).

  • rkhs (RKHS) – The fitted RKHS in which \(\mu_p\) is embedded.

  • size (int) – Number of samples to be generated.

  • w_init (numpy.ndarray) – The initial vector \(w_0\) for the herding algorithm. If None, this is set to p.

  • domain (numpy.ndarray) – The set of candidate values from which samples are drawn. If None, this is set to rkhs.X_fit_.