ruckus.cv_wrappers module

class ruckus.cv_wrappers.ConditionalMapWrapper(prod_rkhs, predictor_inds, response_inds, regressor=None, alpha=1.0, scoring=None)

Bases: sklearn.base.BaseEstimator

Cross-validation wrapper for constructing a ProductRKHS and conditioning some of its factor spaces on the others.

For two systems \(X\) and \(Y\), embedded in Hilbert spaces \(H_1\) and \(H_2\) respectively, the conditional distribution embedding is a linear map \(C_{Y|X}:H_1\rightarrow H_2\) such that \(C_{Y|X}\phi_1(x)\) is the kernel embedding of the distribution of \(Y\) conditioned on \(X=x\), where \(\phi_1\) denotes the feature map of \(X\) into \(H_1\). The map is typically estimated by ridge regression, but a custom regressor may be passed for model selection purposes. See [1] for details.

[1] Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B. “Kernel Mean Embedding of Distributions: A Review and Beyond.” Foundations and Trends in Machine Learning, Vol. 10, No. 1-2, pp. 1-141 (2017).
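As a rough illustration of the ridge-regression estimate described above, the following NumPy sketch computes the weights \(\beta(x) = (K + n\lambda I)^{-1} k_X(x)\), so that the estimated embedding of \(Y\) given \(X=x\) is \(\sum_i \beta_i(x)\phi_2(y_i)\). It does not use ruckus. The Gaussian kernel, the helper names gaussian_gram and conditional_embedding_weights, the bandwidth, and the \(n\lambda\) scaling of the regularizer are all assumptions made for the example.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth=1.0):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def conditional_embedding_weights(x_train, x_query, ridge=1e-3, bandwidth=1.0):
    """Return beta with columns beta(x) = (K + n * ridge * I)^{-1} k_X(x)."""
    n = x_train.shape[0]
    gram = gaussian_gram(x_train, x_train, bandwidth)   # (n, n)
    cross = gaussian_gram(x_train, x_query, bandwidth)  # (n, n_query)
    return np.linalg.solve(gram + n * ridge * np.eye(n), cross)


# Synthetic data (assumption): Y = sin(X) + small Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(x) + 0.1 * rng.normal(size=x.shape)

x_query = np.array([[0.0], [1.5]])
beta = conditional_embedding_weights(x, x_query, ridge=1e-3, bandwidth=0.5)

# Applying the estimated embedding to a function g estimates E[g(Y) | X = x].
# Here g(y) = y, which gives a simple sanity check against sin(x).
cond_mean = beta.T @ y  # shape (n_query, 1)
```

Only the weights depend on the query point, so the same beta can be reused to estimate the conditional expectation of any other function of \(Y\) evaluated at the training responses.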

Parameters

prod_rkhs (ProductRKHS) – The ProductRKHS instance to fit to the data.
predictor_inds (array-like of int) – Indices of the factors in prod_rkhs.factors on which the factors in response_inds will be conditioned.
response_inds (array-like of int) – Indices of the factors in prod_rkhs.factors which are to be conditioned on the factors in predictor_inds.
regressor (sklearn.base.BaseEstimator) – The regressor object to use to fit the conditional embedding. If None, a sklearn.linear_model.Ridge instance is used with fit_intercept=False and alpha specified below.
alpha (float) – The ridge parameter used in the default Ridge regressor.
scoring (callable) – The scoring function which will be applied to the regressor. If None, joint_probs_hilbert_schmidt_scorer() is used.
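The default regressor described above can be sketched on its own with scikit-learn. The example below builds a Ridge instance with fit_intercept=False and fits it as a multi-output linear map. The arrays predictor_features and response_features are synthetic stand-ins for embedded features; how the wrapper actually passes features to the regressor is not stated here and is an assumption of the example.

```python
import numpy as np
from sklearn.linear_model import Ridge

# The default described in the parameter list: no intercept, ridge parameter alpha.
alpha = 1.0
default_regressor = Ridge(alpha=alpha, fit_intercept=False)

# Synthetic stand-ins (assumption) for predictor and response feature arrays.
rng = np.random.default_rng(0)
predictor_features = rng.normal(size=(50, 4))
true_map = rng.normal(size=(4, 3))
response_features = predictor_features @ true_map

# Multi-output ridge regression: one linear map from predictors to responses.
default_regressor.fit(predictor_features, response_features)

# coef_ has shape (n_response_features, n_predictor_features).
estimated_map = default_regressor.coef_
```

With fit_intercept=False the fitted model is a purely linear map, which matches the description of the conditional embedding as a linear operator between the two spaces.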

Attributes

conditional_map (sklearn.pipeline.Pipeline) – A pipeline consisting of the marginal of predictor_inds and the fitted regressor.
marginal_response (ProductRKHS) – The marginal of response_inds.

fit(X, y=None)

Fit the model from data in X.

Parameters: X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Training vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in fac.take and fac.filter for each fac in prod_rkhs.factors.
Returns: The instance itself
Return type: ConditionalMapWrapper

score(X)

Scores the model’s performance on data X using the specified scoring function.

Parameters: X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Data to score, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in fac.take and fac.filter for each fac in prod_rkhs.factors.
Returns: The score.
Return type: float
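The fit(X, y=None) and score(X) signatures above follow the scikit-learn convention for estimators that are scored on X alone, which is what lets such an estimator be used in a cross-validated search without a target array. The sketch below shows that pattern with a toy class, ToyUnsupervisedEstimator, which is hypothetical and not part of ruckus. Its shrunken-mean model and its scoring rule are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV


class ToyUnsupervisedEstimator(BaseEstimator):
    """Hypothetical stand-in (not part of ruckus) with fit(X, y=None) and score(X)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y=None):
        n = X.shape[0]
        # Ridge-style shrinkage of the sample mean toward zero.
        self.mean_ = X.sum(axis=0) / (n + self.alpha)
        return self

    def score(self, X):
        # Higher is better: negative mean squared distance to the fitted mean.
        return -float(np.mean(np.sum((X - self.mean_) ** 2, axis=1)))


rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, size=(120, 3))

# With scoring=None, GridSearchCV falls back on the estimator's own score method.
search = GridSearchCV(
    ToyUnsupervisedEstimator(),
    {"alpha": [0.1, 10.0, 1000.0]},
    cv=3,
)
search.fit(X)  # no y is passed, so each fold calls score(X_test)
best_alpha = search.best_params_["alpha"]
```

An estimator exposing the same two methods, such as the wrapper documented here, should be usable in the same way, though the parameter names available to the search depend on its constructor.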