ruckus.base.RKHS class

class ruckus.base.RKHS(*, take=None, filter=None, copy_X=True)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Base instance of a Reproducing Kernel Hilbert Space [1]. An RKHS consists of a Hilbert space \(H\), a feature mapping \(\phi:X \rightarrow H\) from the data space \(X\) into \(H\), and a kernel \(k(x,y)\) on \(X^2\) defined by \(k(x,y) = \left<\phi(x),\phi(y)\right>_H\). This base RKHS sets \(H=X\) by default, with \(\phi(x)=x\) and \(k(x,y)=x^T y\).

Certain functions \(f\) may be represented in \(H\) by a vector \(F\) satisfying \(\left<F,\phi(x)\right>_H=f(x)\) for all \(x \in X\). This representation can be found using ridge regression [2]. The set of representable functions depends on \(H\) and \(k\). This base RKHS class can only represent linear functions.
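As a worked form of the ridge regression step, assume \(H\) is finite-dimensional and its elements are flattened to vectors. Here \(n\) is the number of samples, \(\Phi\) is the matrix whose \(i\)-th row is \(\phi(x_i)\), \(y\) collects the values \(y_i=f(x_i)\), and \(\alpha>0\) is the ridge parameter; \(\Phi\), \(n\), and \(G\) are notation introduced for this sketch. The ridge estimate of \(F\) then has the standard closed form:

\[
F = \arg\min_{G \in H} \sum_{i=1}^{n} \left(\left<G,\phi(x_i)\right>_H - y_i\right)^2 + \alpha \left\|G\right\|_H^2 = \left(\Phi^T \Phi + \alpha I\right)^{-1} \Phi^T y
\]

For the base class, \(\phi(x)=x\), so \(\Phi\) is simply the data matrix and \(F\) is an ordinary linear ridge coefficient vector.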

The fit() method will typically determine the dimensions and shapes of \(H\) and \(X\), as well as any other parameters needed to determine the feature mapping \(\phi\). The transform() method will implement the feature mapping \(\phi\). The kernel() method will evaluate the kernel \(k\). The fit_function() method will find the representation of a function \(f\) given the vector \(y\) of its values \(y_i=f(x_i)\) on the predictor data.

RKHS instances can be combined with one another via composition, direct sum, and tensor product. These produce the compound RKHS classes CompositeRKHS, DirectSumRKHS, and ProductRKHS. The combinations can be instantiated with the corresponding class, or generated from arbitrary RKHS instances using the operators @ for composition, + for direct sum, and * for tensor product. See the corresponding classes for further details.
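The direct sum and tensor product can be illustrated with the standard kernel identities: concatenating two feature maps adds their kernels, and taking the outer product of two feature maps multiplies them. The sketch below uses plain NumPy and does not call ruckus; the feature maps phi1 and phi2 and the helper gram are hypothetical stand-ins, and the compound classes may organize their outputs differently. Composition is left out because its ordering convention is documented in CompositeRKHS.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 samples, 3 features

# Two hypothetical feature maps standing in for two RKHS instances.
def phi1(X):
    return X          # identity map, as in the base RKHS

def phi2(X):
    return X ** 2     # illustrative elementwise square

def gram(P, Q):
    # Inner products between every row of P and every row of Q.
    return P @ Q.T

K1 = gram(phi1(X), phi1(X))
K2 = gram(phi2(X), phi2(X))

# Direct sum: concatenate the feature vectors; the kernels add.
phi_sum = np.concatenate([phi1(X), phi2(X)], axis=1)
K_sum = gram(phi_sum, phi_sum)

# Tensor product: outer product of the feature vectors (flattened here);
# the kernels multiply elementwise.
phi_prod = np.einsum("ni,nj->nij", phi1(X), phi2(X)).reshape(len(X), -1)
K_prod = gram(phi_prod, phi_prod)

print(np.allclose(K_sum, K1 + K2), np.allclose(K_prod, K1 * K2))
```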

  1. Aronszajn, N. “Theory of reproducing kernels.” Trans. Amer. Math. Soc. 68 (1950), 337-404.

  2. Murphy, K. P. “Machine Learning: A Probabilistic Perspective.” The MIT Press (2012), Section 14.4.3, pp. 492-493.

Parameters
  • take (numpy.ndarray of dtype int or bool, or tuple of numpy.ndarray instances of dtype int, or None) – Default = None. Specifies which values to take from the datapoint for transformation. If None, the entire datapoint will be taken in its original shape. If a bool array, acts as a mask, setting values marked False to 0 and leaving values marked True unchanged. If an int array, the integers specify the indices (along the first feature dimension) which are to be taken, in the order/shape of the desired input. If a tuple of int arrays, allows for drawing indices across multiple dimensions, similar to indexing a numpy array with a tuple.

  • filter (numpy.ndarray of dtype float or None) – Default = None. Specifies a linear preprocessing of the data. Applied after take. If None, no changes are made to the input data. If filter has the same shape as the input datapoints, filter and the datapoint are multiplied elementwise. If filter has more dimensions than the datapoint, then its first dimensions will be contracted with the datapoint via numpy.tensordot(). The final shape is determined by the remaining dimensions of filter.

  • copy_X (bool) – Default = True. If True, input X is copied and stored by the model in the X_fit_ attribute. If no further changes will be made to X, setting copy_X=False saves memory by storing a reference.
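The take and filter options can be mimicked on a single datapoint with ordinary NumPy operations. The sketch below is one reading of the descriptions above, not the library's implementation; the array x and all variable names are hypothetical.

```python
import numpy as np

# One datapoint with two feature dimensions, shape (3, 4).
x = np.arange(12.0).reshape(3, 4)

# take as an int array: pick indices along the first feature dimension.
take_int = np.array([2, 0])
x_int = x[take_int]                      # shape (2, 4)

# take as a bool array: keep True entries, set False entries to 0.
take_bool = (x % 2 == 0)
x_bool = np.where(take_bool, x, 0.0)     # shape (3, 4)

# take as a tuple of int arrays: index across several dimensions at once.
take_tuple = (np.array([0, 1, 2]), np.array([3, 2, 1]))
x_tuple = x[take_tuple]                  # x[0, 3], x[1, 2], x[2, 1]

# filter with the same shape as the datapoint: elementwise product.
filt_same = np.full(x.shape, 0.5)
x_scaled = filt_same * x                 # shape (3, 4)

# filter with more dimensions: contract all axes of the datapoint with the
# leading axes of the filter; the filter's remaining axes give the output shape.
filt_big = np.ones(x.shape + (2,))
x_contracted = np.tensordot(x, filt_big, axes=x.ndim)  # shape (2,)
```

In the last case every entry of x_contracted is the sum of all entries of x, because the hypothetical filter is all ones.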

Attributes
  • shape_in_ (tuple) – The required shape of the input datapoints, aka the shape of the domain space \(X\).

  • shape_out_ (tuple) – The final shape of the transformed datapoints, aka the shape of the Hilbert space \(H\).

  • X_fit_ (numpy.ndarray of shape (n_samples,)+self.shape_in_) – The data which was used to fit the model.

fit(X, y=None)

Fit the model from data in X.

Parameters
  • X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Training vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take and self.filter.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns

The instance itself

Return type

RKHS

fit_function(y, X=None, regressor=None, alpha=1)

Fit a function using its values on the predictor data and a regressor.

Parameters
  • y (numpy.ndarray of shape (n_samples, n_targets)) – Target vector, where n_samples is the number of samples and n_targets is the number of target functions.

  • X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Default = None. Training vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. These must match self.shape_in_. If None, self.X_fit_ is used.

  • regressor (sklearn.base.BaseEstimator) – Default = None. The regressor object to use to fit the function. If None, a sklearn.linear_model.Ridge instance is used with fit_intercept=False and the alpha specified below.

  • alpha (float) – Default = 1. The ridge parameter used in the default Ridge regressor.

Returns

The regressor, fitted to provide the function representation.

Return type

object
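For the base class, where the feature map is the identity, the default fit corresponds to ridge regression without an intercept on the raw data. The sketch below reproduces that computation in plain NumPy through the ridge normal equations rather than through fit_function() or sklearn; the data, W_true, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                 # (n_samples, n_features)
W_true = np.array([[1.0, 0.0],
                   [-2.0, 0.5],
                   [0.0, 3.0]])              # hypothetical linear targets
y = X @ W_true                               # (n_samples, n_targets)

alpha = 1.0                                  # ridge parameter

# Ridge without intercept: solve (X^T X + alpha I) F = X^T y.
# Each column of F represents one target function.
F = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Evaluating the fitted functions is an inner product with the features.
y_hat = X @ F
print(F.shape, y_hat.shape)
```

Each column of F plays the role of the vector \(F\) for one target function. With alpha > 0 the estimate is shrunk slightly toward zero relative to W_true.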

fit_transform(X, y=None)

Fit the model from data in X and transform X.

Parameters

X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Training vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take and self.filter.

Returns

The transformed data

Return type

numpy.ndarray of shape (n_samples,)+self.shape_out_

kernel(X, Y=None)

Evaluates the kernel on X and Y (or X and X).

Parameters
  • X (numpy.ndarray of shape (n_samples_1, n_features_1,...,n_features_d)) – Data vector, where n_samples_1 is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. These must match self.shape_in_.

  • Y (numpy.ndarray of shape (n_samples_2, n_features_1,...,n_features_d)) – Default = None. Data vector, where n_samples_2 is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. These must match self.shape_in_. If None, X is used.

Returns

The matrix K[i,j] = k(X[i],Y[j])

Return type

numpy.ndarray of shape (n_samples_1,n_samples_2)
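The shape of the returned matrix can be checked against a hand-computed linear kernel. The sketch below uses plain NumPy rather than calling kernel(), and it assumes that for multi-dimensional datapoints the base kernel \(x^T y\) sums products over all feature axes; the arrays and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2, 3))   # 4 samples, datapoints of shape (2, 3)
Y = rng.normal(size=(6, 2, 3))   # 6 samples, same datapoint shape

# Linear kernel: contract every feature axis, leaving the two sample axes.
K = np.tensordot(X, Y, axes=([1, 2], [1, 2]))    # shape (4, 6)

# With Y omitted, the kernel is evaluated on X with itself.
K_XX = np.tensordot(X, X, axes=([1, 2], [1, 2]))  # shape (4, 4), symmetric

print(K.shape, K_XX.shape)
```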

transform(X)

Transform X.

Parameters

X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Data vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. These must match self.shape_in_.

Returns

The transformed data

Return type

numpy.ndarray of shape (n_samples,)+self.shape_out_