ruckus.cv_wrappers module

class ruckus.cv_wrappers.ConditionalMapWrapper(prod_rkhs, predictor_inds, response_inds, regressor=None, alpha=1.0, scoring=None)

Bases: sklearn.base.BaseEstimator

Cross-validation wrapper for constructing a ProductRKHS and conditioning some of its factor spaces on the others.

For two systems \(X\) and \(Y\), embedded in Hilbert spaces \(H_1\) and \(H_2\) respectively, the conditional distribution embedding is a linear map \(C_{Y|X}:H_1\rightarrow H_2\) such that \(C_{Y|X}\phi_1(x)\) gives the kernel embedding of the distribution of \(Y\) conditioned on \(X=x\). The map is typically estimated by ridge regression, though the user may pass a custom regressor for model selection purposes. See [1] for details.

  1. Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B. “Kernel Mean Embedding of Distributions: A Review and Beyond.” Foundations and Trends in Machine Learning: Vol. 10: No. 1-2, pp 1-141 (2017)
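The ridge-regression estimate of \(C_{Y|X}\) described above can be illustrated without ruckus itself. The following is a minimal sketch, assuming random Fourier features from scikit-learn's RBFSampler as finite-dimensional stand-ins for \(\phi_1\) and \(\phi_2\); the toy data, the variable names, and the feature dimensions are illustrative choices and do not reflect how ProductRKHS builds its embeddings.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy paired samples: Y depends on X through a noisy sine.
x = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(x) + 0.1 * rng.normal(size=(200, 1))

# Assumption: random Fourier features approximate the feature maps
# phi_1 (for X) and phi_2 (for Y) of two Gaussian kernels.
phi_1 = RBFSampler(gamma=1.0, n_components=50, random_state=0).fit(x)
phi_2 = RBFSampler(gamma=1.0, n_components=40, random_state=1).fit(y)

# Ridge regression from phi_1(x) to phi_2(y), without an intercept,
# yields a matrix estimate of the linear map C_{Y|X}.
ridge = Ridge(alpha=1.0, fit_intercept=False)
ridge.fit(phi_1.transform(x), phi_2.transform(y))

# Estimated embedding of the distribution of Y conditioned on X = 0.5.
x_new = np.array([[0.5]])
embedding = ridge.predict(phi_1.transform(x_new))

print(ridge.coef_.shape)  # matrix mapping 50 predictor features to 40 response features
print(embedding.shape)
```

Here `ridge.coef_` plays the role of \(C_{Y|X}\) and `embedding` plays the role of \(C_{Y|X}\phi_1(x)\) in the approximate feature space.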

Parameters
  • prod_rkhs (ProductRKHS) – The ProductRKHS instance to fit to the data.

  • predictor_inds (array-like of int) – List of indices of the factors in prod_rkhs.factors on which the response_inds will be conditioned.

  • response_inds (array-like of int) – List of indices of the factors in prod_rkhs.factors which are to be conditioned on the predictor_inds.

  • regressor (sklearn.base.BaseEstimator) – The regressor object to use to fit the conditional embedding. If None, a sklearn.linear_model.Ridge instance is used with fit_intercept=False and alpha specified below.

  • alpha (float) – The ridge parameter used in the default Ridge regressor.

  • scoring (callable) – The scoring function which will be applied to the regressor. If None, joint_probs_hilbert_schmidt_scorer() is used.
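The default described for regressor, a Ridge with fit_intercept=False, corresponds to the closed-form ridge solution \((\Phi_X^\top\Phi_X + \alpha I)^{-1}\Phi_X^\top\Phi_Y\). A minimal sketch of that correspondence, assuming plain NumPy arrays stand in for the embedded predictor and response features (feats_x and feats_y are illustrative names, not part of ruckus):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Assumption: rows are samples, columns are embedded feature coordinates.
feats_x = rng.normal(size=(100, 5))  # stand-in for predictor embeddings
feats_y = rng.normal(size=(100, 3))  # stand-in for response embeddings
alpha = 1.0

# The default regressor as described in the parameter list.
default_regressor = Ridge(alpha=alpha, fit_intercept=False)
default_regressor.fit(feats_x, feats_y)

# Closed-form ridge solution: (X^T X + alpha I)^{-1} X^T Y.
gram = feats_x.T @ feats_x
closed_form = np.linalg.solve(gram + alpha * np.eye(5), feats_x.T @ feats_y)

# scikit-learn stores coefficients as (n_targets, n_features),
# so the closed-form matrix is transposed for the comparison.
matches = np.allclose(default_regressor.coef_, closed_form.T)
print(matches)
```

Omitting the intercept keeps the fitted map purely linear, which matches the definition of \(C_{Y|X}\) as a linear map between the two Hilbert spaces.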

Attributes
  • conditional_map (sklearn.pipeline.Pipeline) – A pipeline consisting of the marginal of predictor_inds and the fitted regressor.

  • marginal_response (ProductRKHS) – The marginal of response_inds.

fit(X, y=None)

Fit the model from data in X.

Parameters

X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Training vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in fac.take and fac.filter for each fac in prod_rkhs.factors.

Returns

The instance itself.

Return type

ConditionalMapWrapper

score(X)

Scores the model’s performance on data X using the specified scoring function.

Parameters

X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Data to score, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in fac.take and fac.filter for each fac in prod_rkhs.factors.

Returns

The score.

Return type

float
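Because the class derives from sklearn.base.BaseEstimator and exposes fit(X, y=None) and score(X), it presumably plugs into scikit-learn's model selection tools. The sketch below illustrates that pattern with a hypothetical stand-in class, ToyConditionalMap, which is not part of ruckus: it regresses some columns of X on others and scores by negative mean squared error rather than by joint_probs_hilbert_schmidt_scorer(). Its score method accepts an unused y argument purely as a precaution for the scikit-learn calling convention.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV


class ToyConditionalMap(BaseEstimator):
    """Hypothetical stand-in: regresses response columns on predictor columns."""

    def __init__(self, predictor_inds=(0,), response_inds=(1,), alpha=1.0):
        self.predictor_inds = predictor_inds
        self.response_inds = response_inds
        self.alpha = alpha

    def fit(self, X, y=None):
        # Fit an intercept-free ridge from predictor columns to response columns.
        self.regressor_ = Ridge(alpha=self.alpha, fit_intercept=False)
        self.regressor_.fit(
            X[:, list(self.predictor_inds)], X[:, list(self.response_inds)]
        )
        return self

    def score(self, X, y=None):
        # Higher is better, so return the negative mean squared error.
        pred = self.regressor_.predict(X[:, list(self.predictor_inds)])
        return -float(np.mean((pred - X[:, list(self.response_inds)]) ** 2))


rng = np.random.default_rng(0)
col0 = rng.normal(size=(120, 1))
col1 = 2.0 * col0 + 0.1 * rng.normal(size=(120, 1))
X = np.hstack([col0, col1])

# Search over the ridge parameter; no y is needed because the estimator
# splits X into predictors and responses internally.
alphas = [0.01, 1.0, 100.0]
search = GridSearchCV(ToyConditionalMap(), {"alpha": alphas}, cv=3)
search.fit(X)

print(search.best_params_)
```

With the real wrapper, the same search would vary alpha (or parameters of a custom regressor), with the score computed by the configured scoring function.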