ruckus.base.ProductRKHS class

class ruckus.base.ProductRKHS(factors, *, copy_X=True)[source]

Bases: ruckus.base.RKHS

Given a sequence of RKHS’s with Hilbert spaces \(H_1\), …, \(H_n\) and feature maps \(\phi_1\), …, \(\phi_n\), their composition lives in the tensor product Hilbert space \(H_1\otimes \dots \otimes H_n\) and has feature map \(\phi_1 \otimes \dots \otimes \phi_n\) [1]. Correspondingly, the shape_out_ of a ProductRKHS instance is the concatenation (tuple sum) of the shape_out_ tuples of its factors, in the order given by factors, while all its factors share the same shape_in_.
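The tensor product construction can be illustrated with plain NumPy. This is a minimal sketch, not the ruckus implementation: the arrays phi_1 and phi_2 stand in for the already-transformed outputs of two hypothetical factors, and tensor_features is an illustrative helper name.

```python
import numpy as np

def tensor_features(phi_1, phi_2):
    """Row-wise tensor (outer) product of two feature arrays.

    phi_1 has shape (n_samples, d1) and phi_2 has shape (n_samples, d2);
    the result has shape (n_samples, d1, d2).
    """
    return np.einsum("ni,nj->nij", phi_1, phi_2)

rng = np.random.default_rng(0)
phi_1 = rng.normal(size=(5, 3))  # stand-in for factor 1, output shape (3,)
phi_2 = rng.normal(size=(5, 4))  # stand-in for factor 2, output shape (4,)

phi = tensor_features(phi_1, phi_2)

# Per-sample output shape is the concatenation (3,) + (4,) = (3, 4).
out_shape = phi.shape[1:]

# Inner products in the tensor product space factor into the product of
# the per-factor inner products, so the factor kernels multiply.
K_product_space = np.einsum("nij,mij->nm", phi, phi)
K_factorwise = (phi_1 @ phi_1.T) * (phi_2 @ phi_2.T)
```

The two Gram matrices agree up to floating-point error, which is why a product RKHS can evaluate its kernel factor by factor without forming the full tensor features.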

Product RKHS’s are particularly useful for working with kernel embeddings of distributions and their conditional probabilities [2]. A ProductRKHS can be reduced to its marginal along a set of factors using the marginal() method, and can be reduced into a marginal space paired with a ridge-regressed conditional map using the conditional() method.

  1. Aronszajn, N. “Theory of reproducing kernels.” Trans. Amer. Math. Soc. 68 (1950), 337-404.

  2. Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B. “Kernel Mean Embedding of Distributions: A Review and Beyond.” Foundations and Trends in Machine Learning: Vol. 10: No. 1-2, pp. 1-141 (2017)

Parameters
  • factors (list of RKHS objects) – The factor RKHS objects, listed in the order that their dimensions will appear in indexing.

  • copy_X (bool) – Default = True. If True, input X is copied and stored by the model in the X_fit_ attribute. If no further changes will be done to X, setting copy_X=False saves memory by storing a reference.

Attributes
  • shape_in_ (tuple) – The required shape of the input datapoints, aka the shape of the domain space \(X\).

  • shape_out_ (tuple) – The final shape of the transformed datapoints, aka the shape of the Hilbert space \(H\).

  • X_fit_ (numpy.ndarray of shape (n_samples,)+self.shape_in_) – The data which was used to fit the model.

conditional(predictor_inds, response_inds, regressor=None, alpha=1.0)[source]

Returns a pair of outputs. The first is a sklearn.pipeline.Pipeline consisting of the marginal RKHS of predictor_inds followed by a regressor that represents the conditional distribution embedding. The second is the marginal RKHS of response_inds.

For two systems \(X\) and \(Y\), embedded in Hilbert spaces \(H_1\) and \(H_2\) respectively, the conditional distribution embedding is a linear map \(C_{Y|X}:H_1\rightarrow H_2\) such that \(C_{Y|X}\phi_1(x)\) gives the kernel embedding of the distribution of \(Y\) conditioned on \(X=x\). This is typically determined by using a ridge regression, though we allow the user to pass a custom regressor for model selection purposes. See [1] for details.

  1. Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B. “Kernel Mean Embedding of Distributions: A Review and Beyond.” Foundations and Trends in Machine Learning: Vol. 10: No. 1-2, pp. 1-141 (2017)
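The ridge-regression estimate of \(C_{Y|X}\) can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code used by conditional(): it uses explicit one-hot features on a toy discrete system, a closed-form ridge solve without intercept, and a hypothetical helper name fit_conditional_map. With row-vector features, the fitted matrix W plays the role of the transpose of \(C_{Y|X}\).

```python
import numpy as np

def fit_conditional_map(phi_x, phi_y, alpha=1.0):
    """Ridge-regress response features on predictor features (no intercept).

    Returns W of shape (d_x, d_y) such that phi_x @ W approximates phi_y.
    """
    d_x = phi_x.shape[1]
    gram = phi_x.T @ phi_x + alpha * np.eye(d_x)
    return np.linalg.solve(gram, phi_x.T @ phi_y)

# Toy discrete system: X takes values in {0, 1}, Y in {0, 1, 2}.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([0, 0, 1, 2, 2, 2, 2, 1])
phi_x = np.eye(2)[x]  # one-hot features for X
phi_y = np.eye(3)[y]  # one-hot features for Y

# A very small alpha keeps the estimate close to the empirical frequencies.
W = fit_conditional_map(phi_x, phi_y, alpha=1e-8)

# Embedding of Y conditioned on X = 0. With one-hot features this is
# approximately the empirical conditional probability vector of Y.
cond_given_0 = np.eye(2)[[0]] @ W
cond_given_1 = np.eye(2)[[1]] @ W
```

Larger values of alpha shrink the estimate toward zero, trading bias for stability when the predictor features are high-dimensional or the sample is small.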

Parameters
  • predictor_inds (array-like of int) – List of indices of the factors in self.factors on which the response_inds will be conditioned.

  • response_inds (array-like of int) – List of indices of the factors in self.factors which are to be conditioned on the predictor_inds.

  • regressor (sklearn.base.BaseEstimator) – The regressor object to use to fit the conditional embedding. If None, a sklearn.linear_model.Ridge instance is used with fit_intercept=False and alpha specified below.

  • alpha (float) – The ridge parameter used in the default Ridge regressor.

Returns

(pipe, response), where pipe is a pipeline consisting of the marginal of predictor_inds and the fitted regressor, and response is the marginal of response_inds.

Return type

(sklearn.pipeline.Pipeline, ProductRKHS)

fit(X, y=None)[source]

Fit the model from data in X.

Parameters

X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Training vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in fac.take and fac.filter for each fac in self.factors.

Returns

The instance itself

Return type

RKHS

kernel(X, Y=None)[source]

Evaluates the kernel on X and Y (or X and X) by multiplying the kernels of the factors.

Parameters
  • X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Data vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. These must match self.shape_in_.

  • Y (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Default = None. Data vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. These must match self.shape_in_. If None, X is used.

Returns

The matrix K[i,j] = k(X[i],Y[j])

Return type

numpy.ndarray of shape (n_samples_1,n_samples_2)
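Multiplying factor kernels can be sketched as follows. This is a NumPy-only illustration, not the library's implementation: it assumes every factor uses a Gaussian (RBF) kernel with a shared gamma, and the helper names rbf_kernel and product_kernel are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def product_kernel(X_parts, Y_parts=None, gamma=1.0):
    """Elementwise product of the factor kernels.

    X_parts and Y_parts are lists holding one 2-D array per factor.
    If Y_parts is None, the kernel is evaluated on X_parts against itself.
    """
    if Y_parts is None:
        Y_parts = X_parts
    K = np.ones((X_parts[0].shape[0], Y_parts[0].shape[0]))
    for A, B in zip(X_parts, Y_parts):
        K *= rbf_kernel(A, B, gamma)
    return K

rng = np.random.default_rng(0)
X_parts = [rng.normal(size=(6, 2)), rng.normal(size=(6, 3))]
Y_parts = [rng.normal(size=(4, 2)), rng.normal(size=(4, 3))]

K_xy = product_kernel(X_parts, Y_parts)  # shape (6, 4)
K_xx = product_kernel(X_parts)           # shape (6, 6), symmetric
```

For Gaussian factors with a shared gamma, the product kernel equals a single Gaussian kernel on the concatenated features, which gives a convenient sanity check for the sketch.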

marginal(var_inds, copy_X=False)[source]

Construct a ProductRKHS from only the factors specified by var_inds. Use this only on a ProductRKHS that has already been fit; it reuses the fitted factors so that no refit is needed.

Parameters
  • var_inds (array-like of int) – List of indices of the factors in self.factors from which to construct the marginal ProductRKHS.

  • copy_X (bool) – Default = False. If True, self.X_fit_ is copied and stored as the new model’s X_fit_ attribute. If no further changes will be done to X, setting copy_X=False saves memory by storing a reference.

Returns

The marginal ProductRKHS of the var_inds.

Return type

ProductRKHS

transform(X)[source]

Transform X.

Parameters

X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Data vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. These must match self.shape_in_.

Returns

The transformed data

Return type

numpy.ndarray of shape (n_samples,)+self.shape_out_