ruckus.embedding.RandomFourierRBF class

class ruckus.embedding.RandomFourierRBF(n_components=100, gamma=None, complex=False, engine=None, engine_params=None, take=None, filter=None, copy_X=True)

Bases: ruckus.base.RKHS

RandomFourierRBF generates an embedding map \(\phi:X\rightarrow H\) by constructing random Fourier phase signals; that is,

\[\begin{split}\phi(x) = \frac{1}{\sqrt{K}}\begin{bmatrix} e^{i x\cdot w_1} \\ \vdots \\ e^{i x\cdot w_K} \end{bmatrix}\end{split}\]

where \(K\) is the specified n_components and each of \(w_1,\dots,w_K\) is drawn from a multivariate normal distribution with covariance matrix \(\mathrm{diag}(\gamma,\dots,\gamma)\). As a result, the kernel \(k(x,y) = \left<\phi(x),\phi(y)\right>\) is approximately a Gaussian RBF with scale parameter \(\gamma\) [1].
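The feature map above can be sketched in plain NumPy. This is an illustrative sketch, not the ruckus implementation: the helper names `rff_complex` and `rff_real` are hypothetical, the frequencies are drawn with ordinary pseudo-random sampling, and the interleaved layout of the real-valued variant is an assumption. With \(w_k \sim N(0, \gamma I)\), the expected kernel is \(\exp(-\gamma\|x-y\|^2/2)\); whether ruckus uses exactly this convention for "scale parameter" is not confirmed here.

```python
import numpy as np

def rff_complex(X, W):
    """Complex features phi(x)_k = exp(i x.w_k) / sqrt(K), one row per sample."""
    K = W.shape[0]
    return np.exp(1j * X @ W.T) / np.sqrt(K)

def rff_real(X, W):
    """Real-valued variant: (real, imag) pairs interleaved (layout is an assumption)."""
    Z = rff_complex(X, W)
    out = np.empty((Z.shape[0], 2 * Z.shape[1]))
    out[:, 0::2] = Z.real
    out[:, 1::2] = Z.imag
    return out

rng = np.random.default_rng(0)
n_features, K, gamma = 3, 20000, 0.5

# Rows of W play the role of w_1, ..., w_K, drawn from N(0, gamma * I).
W = rng.normal(scale=np.sqrt(gamma), size=(K, n_features))
X = rng.normal(size=(5, n_features))

# Kernel matrix from the complex features: k(x, y) = <phi(x), phi(y)>.
Zc = rff_complex(X, W)
K_complex = (Zc @ Zc.conj().T).real

# The real-valued features give the same kernel through a plain dot product.
Zr = rff_real(X, W)
K_real = Zr @ Zr.T

# Gaussian kernel that these features approximate under the convention above.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-0.5 * gamma * sq_dists)

max_err = np.abs(K_complex - K_exact).max()
print(f"max abs kernel error with K={K}: {max_err:.4f}")
```

The real-valued variant doubles the feature count but avoids complex arithmetic downstream, which is presumably the motivation for the complex=False default.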

Rather than drawing a truly random set of phase vectors (for which the approximation error decreases as \(O(n^{-1/2})\) in the number of samples \(n\)), we use quasi-Monte Carlo sampling via scipy.stats.qmc.QMCEngine(), for which the error decreases as \(O((\log n)^d n^{-1})\), where \(d\) is the number of features in \(X\).
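One common way to obtain quasi-random Gaussian frequencies is to draw low-discrepancy points in the unit cube and push them through the inverse normal CDF. The sketch below does this with scipy.stats.qmc.Sobol and scipy.stats.norm.ppf; the helper name `qmc_gaussian_frequencies` is hypothetical, and whether ruckus transforms its QMC samples in exactly this way is an assumption.

```python
import numpy as np
from scipy.stats import norm, qmc

def qmc_gaussian_frequencies(n_components, n_features, gamma, seed=0):
    """Sobol points mapped to N(0, gamma * I) via the inverse normal CDF."""
    sampler = qmc.Sobol(d=n_features, scramble=True, seed=seed)
    U = sampler.random(n_components)        # points in the unit cube
    eps = np.finfo(float).eps
    U = np.clip(U, eps, 1.0 - eps)          # keep norm.ppf finite at the edges
    return np.sqrt(gamma) * norm.ppf(U)

gamma = 0.5
# A power of two keeps the Sobol sequence balanced.
W_qmc = qmc_gaussian_frequencies(n_components=1024, n_features=3, gamma=gamma)

# Check the kernel estimate for one displacement d = x - y.
d = np.array([0.3, -0.2, 0.5])
k_approx = np.cos(W_qmc @ d).mean()           # real part of <phi(x), phi(y)>
k_exact = np.exp(-0.5 * gamma * (d @ d))      # Gaussian kernel for w ~ N(0, gamma * I)
print(k_approx, k_exact)
```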

[1] Rahimi, A., Recht, B. “Random Features for Large-Scale Kernel Machines.” Advances in Neural Information Processing Systems 20 (NIPS 2007).

Parameters

n_components (int) – Default = 100. The number of random Fourier features to generate.
gamma (float) – Default = None. Specifies the scale parameter of the Gaussian kernel to be approximated. If None, set to 1/n_features.
complex (bool) – Default = False. If False, the output vector has shape (n_samples,2*n_components), where real and imaginary parts are written in pairs.
engine (child class of scipy.stats.qmc.QMCEngine()) – Default = None. The sampler class to use. If None, set to scipy.stats.qmc.Sobol().
engine_params (dict) – Default = None. Initialization parameters to use for engine.
take (numpy.ndarray of dtype int or bool, or tuple of numpy.ndarray instances of type int, or None) – Default = None. Specifies which values to take from the datapoint for transformation. If None, the entire datapoint will be taken in its original shape. If bool array, acts as a mask setting values marked False to 0 and leaving values marked True unchanged. If int array, the integers specify the indices (along the first feature dimension) which are to be taken, in the order/shape of the desired input. If tuple of int arrays, allows for drawing indices across multiple dimensions, similar to passing a tuple to a numpy array.
filter (numpy.ndarray of dtype float or None) – Default = None. Specifies a linear preprocessing of the data. Applied after take. If None, no changes are made to the input data. If the same shape as the input datapoints, filter and the datapoint are multiplied elementwise. If filter has a larger dimension than the datapoint, then its first dimensions will be contracted with the datapoint via numpy.tensordot(). The final shape is determined by the remaining dimensions of filter.
copy_X (bool) – Default = True. If True, input X is copied and stored by the model in the X_fit_ attribute. If no further changes will be done to X, setting copy_X=False saves memory by storing a reference.
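The take and filter descriptions above can be illustrated on a single datapoint with plain NumPy. This is only a sketch of the semantics as documented, not the library's internal code; all variable names are illustrative.

```python
import numpy as np

# One datapoint of shape (2, 2).
x = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# take as a bool array: values marked False are set to 0, shape is unchanged.
mask = np.array([[True, False],
                 [False, True]])
taken_mask = np.where(mask, x, 0.0)        # [[1, 0], [0, 4]]

# take as an int array: indices along the first feature dimension.
idx = np.array([1, 0])
taken_rows = x[idx]                        # [[3, 4], [1, 2]]

# take as a tuple of int arrays: indices across several dimensions.
idx_pair = (np.array([0, 1]), np.array([1, 0]))
taken_elems = x[idx_pair]                  # [2, 3]

# filter with the same shape as the datapoint: elementwise product.
filt_same = np.array([[0.5, 0.5],
                      [2.0, 2.0]])
filtered_same = filt_same * x              # [[0.5, 1], [6, 8]]

# filter with more dimensions: its leading dimensions are contracted with
# the datapoint, and the remaining dimension (3,) is the output shape.
filt_big = np.ones((2, 2, 3))
filtered_big = np.tensordot(x, filt_big, axes=x.ndim)   # [10, 10, 10]
```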

Attributes

ws_ (numpy.ndarray of shape (n_components,n_features)) – Randomly-selected phase coefficients used to generate Fourier features.
shape_in_ (tuple) – The required shape of the input datapoints, aka the shape of the domain space \(X\).
shape_out_ (tuple) – The final shape of the transformed datapoints, aka the shape of the Hilbert space \(H\).
X_fit_ (numpy.ndarray of shape (n_samples,)+self.shape_in_) – The data which was used to fit the model.

fit(X, y=None)

Fit the model from data in X.

Parameters

X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Training vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take and self.filter.
y (Ignored) – Not used, present for API consistency by convention.

Returns

The instance itself

Return type

RKHS

transform(X)

Transform X.

Parameters: X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Data vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. These must match self.shape_in_.
Returns: The transformed data
Return type: numpy.ndarray of shape (n_samples,)+self.shape_out_
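A minimal usage sketch based only on the signature and methods documented above. It assumes ruckus is installed; the wrapper `embed_with_ruckus` is a hypothetical helper, and the import is placed inside it so that the snippet can be loaded without the package.

```python
import numpy as np

def embed_with_ruckus(X, n_components=64, gamma=0.5):
    """Fit RandomFourierRBF on X and return the fitted model and the embedding."""
    # Imported here so that defining the helper does not require ruckus.
    from ruckus.embedding import RandomFourierRBF

    model = RandomFourierRBF(n_components=n_components, gamma=gamma, complex=False)
    model.fit(X)                  # per the docs, returns the instance itself
    Z = model.transform(X)        # documented shape: (n_samples,) + model.shape_out_
    return model, Z

# Example input: 200 samples with 3 features each.
X_demo = np.random.default_rng(0).normal(size=(200, 3))
```

Calling `embed_with_ruckus(X_demo)` should, per the parameter description for complex=False, return an array of shape (200, 128), so that `Z @ Z.T` approximates the Gaussian kernel matrix of `X_demo`.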