ruckus.embedding.EigenRKHS class
- class ruckus.embedding.EigenRKHS(use_kernel='rbf', *, gamma=None, degree=3, coef0=1, kernel_params=None, n_jobs=None, n_nystrom_samples=1.0, sample_method='random', sample_iter=300, n_components=None, centered=False, eigen_solver='auto', tol=0, max_iter=None, iterated_power='auto', remove_zero_eig=False, random_state=None, take=None, filter=None, copy_X=True)
Bases: sklearn.decomposition._kernel_pca.KernelPCA, ruckus.base.RKHS

EigenRKHS is a child class of sklearn.decomposition._kernel_pca.KernelPCA that adapts it to our RKHS class formula, allowing it to interoperate with other RKHSs. We also add options for centering and for Nyström sampling, the latter for efficiency. Because of this dependency, our code and documentation inherit heavily from those of KernelPCA, particularly in methods where only minor revisions were made.

EigenRKHS is initialized with a kernel \(k(x,y)\), which now defaults to a Gaussian RBF, and computes the eigendecomposition \(k(x,y) = \sum_a \lambda_a \phi_a(x)\phi_a(y)\) to determine the feature mappings \(\phi(x)\) into the Hilbert space \(\mathcal{H}\). Because computing the eigenvectors scales cubically with the number of samples, we have added options for Nyström sampling, which selects a smaller subset of the data for the eigenvector computation and then uses those eigenvectors to transform the remaining data [1].

- Parameters
use_kernel (str or callable) – Default = "rbf". See sklearn.decomposition._kernel_pca.KernelPCA for kernel options.
gamma (float) – Default = None. Kernel coefficient for rbf, poly and sigmoid kernels. Ignored by other kernels. If gamma is None, then it is set to 1/n_features.
degree (int) – Default = 3. Degree for poly kernels. Ignored by other kernels.
coef0 (float) – Default = 1. Independent term in poly and sigmoid kernels. Ignored by other kernels.
kernel_params (dict) – Default = None. Parameters (keyword arguments) and values for a kernel passed as a callable object. Ignored by other kernels.
n_jobs (int) – Default = None. Number of parallel jobs to run. See sklearn.decomposition._kernel_pca.KernelPCA for details.
n_nystrom_samples (int or float) – Default = 1.0. The number of samples to draw from X to compute the SVD. If int, then draw n_nystrom_samples samples. If float, then draw n_nystrom_samples * X.shape[0] samples.
sample_method (str) – Default = "random". How to draw the Nyström samples. If "random", then subsample randomly with replacement. If "kmeans", then find the n_nystrom_samples optimal means.
sample_iter – Default = 300. If sample_method = "kmeans", the number of times to iterate the algorithm.
n_components (int) – Default = None. Number of components. If None, all non-zero components are kept.
centered – Default = False. Whether to center the kernel before computing the SVD. This must be False for embeddings of distributions to be valid.
eigen_solver ({"auto", "dense", "arpack", "randomized"}) – Default = "auto". Solver to use for eigenvector computation. See sklearn.decomposition._kernel_pca.KernelPCA for details.
tol (float) – Default = 0. Convergence tolerance for arpack. If 0, the optimal value will be chosen by arpack.
max_iter (int) – Default = None. Maximum number of iterations for arpack. If None, the optimal value will be chosen by arpack.
iterated_power (int >= 0 or "auto") – Default = "auto". Number of iterations for the power method, used when eigen_solver == "randomized". When "auto", it is set to 7 when n_components < 0.1 * min(X.shape); otherwise it is set to 4.
remove_zero_eig (bool) – Default = False. If True, then all components with zero eigenvalues are removed, so that the number of components in the output may be < n_components (and sometimes even zero due to numerical instability). When n_components is None, this parameter is ignored and components with zero eigenvalues are removed regardless.
random_state (int) – Used when eigen_solver == "arpack" or "randomized". See sklearn.decomposition._kernel_pca.KernelPCA for more details.
take (numpy.ndarray of dtype int or bool, or tuple of numpy.ndarray instances of type int, or None) – Default = None. Specifies which values to take from the datapoint for transformation. If None, the entire datapoint will be taken in its original shape. If bool array, acts as a mask, setting values marked False to 0 and leaving values marked True unchanged. If int array, the integers specify the indices (along the first feature dimension) which are to be taken, in the order/shape of the desired input. If tuple of int arrays, allows for drawing indices across multiple dimensions, similar to passing a tuple to a numpy array.
filter (numpy.ndarray of dtype float or None) – Default = None. Specifies a linear preprocessing of the data. Applied after take. If None, no changes are made to the input data. If the same shape as the input datapoints, filter and the datapoint are multiplied elementwise. If filter has a larger dimension than the datapoint, then its first dimensions will be contracted with the datapoint via numpy.tensordot(). The final shape is determined by the remaining dimensions of filter.
copy_X (bool) – Default = True. If True, input X is copied and stored by the model in the X_fit_ attribute. If no further changes will be made to X, setting copy_X=False saves memory by storing a reference.
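
The note on centered can be made concrete with a small NumPy sketch. It is an illustration written for this page, not code taken from EigenRKHS: the data, the kernel width and all variable names are assumptions. The empirical mean embedding \(\mu = \frac{1}{n}\sum_i \phi(x_i)\) has squared norm equal to the mean of the kernel matrix, and centering the kernel forces that mean to zero, which is one plausible reading of why centered must stay False when distributions are being embedded.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))          # toy data, chosen only for illustration
gamma = 1.0 / X.shape[1]              # mirrors the documented default 1/n_features

# Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||X[i] - X[j]||^2)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq_dists)

# Standard kernel centering: K_c = H K H with H = I - (1/n) * ones
n = K.shape[0]
H = np.eye(n) - np.full((n, n), 1.0 / n)
K_centered = H @ K @ H

# Squared RKHS norm of the empirical mean embedding, before and after centering
mean_norm_sq = K.mean()
mean_norm_sq_centered = K_centered.mean()

print(mean_norm_sq, mean_norm_sq_centered)
```

After centering, the mean embedding of the training sample is the zero vector up to rounding, so it no longer carries information about the distribution.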
- Attributes
eigenvalues_ (numpy.ndarray of shape (n_components,)) – Eigenvalues of the centered kernel matrix in decreasing order. If n_components and remove_zero_eig are not set, then all values are stored.
eigenvectors_ (numpy.ndarray of shape (n_samples, n_components)) – Eigenvectors of the kernel matrix. If n_components and remove_zero_eig are not set, then all components are stored.
shape_in_ (tuple) – The required shape of the input datapoints, i.e. the shape of the domain space \(X\).
shape_out_ (tuple) – The final shape of the transformed datapoints, i.e. the shape of the Hilbert space \(\mathcal{H}\).
X_fit_ (numpy.ndarray of shape (n_samples,)+self.shape_in_) – The data which was used to fit the model.
X_nys_ (numpy.ndarray of shape (n_nystrom_samples, n_features_in_)) – The Nyström subsamples of the data used to fit the model.
n_features_in_ (int) – The size of the features after preprocessing.
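
The Nyström idea described for this class can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the EigenRKHS implementation: the helper names rbf_kernel and feature_map are hypothetical, the scaling of the feature map follows the usual kernel PCA convention, and the subsample is drawn without replacement to keep the small kernel matrix well conditioned (the documented "random" method samples with replacement).

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))         # toy data, for illustration only
gamma = 1.0 / X.shape[1]              # mirrors the documented default 1/n_features
n_nystrom, n_components = 40, 10      # illustrative sizes

# 1. Draw the Nyström subsample (assumption: without replacement).
idx = rng.choice(X.shape[0], size=n_nystrom, replace=False)
X_nys = X[idx]

# 2. Eigendecompose the small kernel matrix and keep the leading components.
K_mm = rbf_kernel(X_nys, X_nys, gamma)
eigvals, eigvecs = np.linalg.eigh(K_mm)            # eigh returns ascending order
top = np.argsort(eigvals)[::-1][:n_components]
eigvals, eigvecs = eigvals[top], eigvecs[:, top]

# 3. Extend the eigenvectors to arbitrary points:
#    phi(x) = Lambda^{-1/2} U^T k(X_nys, x)
def feature_map(X_new):
    return rbf_kernel(X_new, X_nys, gamma) @ eigvecs / np.sqrt(eigvals)

Phi = feature_map(X)                  # features for all 200 points
Phi_nys = feature_map(X_nys)          # features for the subsample itself

# Inner products of the features approximate the full kernel matrix.
K_exact = rbf_kernel(X, X, gamma)
rel_err = np.linalg.norm(Phi @ Phi.T - K_exact) / np.linalg.norm(K_exact)
print(Phi.shape, rel_err)
```

Only the 40 x 40 matrix is eigendecomposed, so the cubic cost applies to the subsample size rather than to the full sample count.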
- fit(X, y=None)
Fit the model from data in X. This method filters the data, determines whether it is to be centered, and takes the specified Nyström subsamples. After this, sklearn.decomposition._kernel_pca.KernelPCA._fit_transform() is invoked.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Training vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take and self.filter. Final filtered data will be flattened on the feature axes.
y (Ignored) – Not used, present for API consistency by convention.
- Returns
The instance itself
- Return type
RKHS
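
The preprocessing that fit applies through take and filter can be mimicked on a single datapoint with ordinary NumPy operations. The sketch below follows the parameter descriptions above; it is an illustration under that reading, not the library's code, and every variable name in it is hypothetical.

```python
import numpy as np

# One datapoint with feature shape (2, 3)
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# take as a bool array: a mask that zeroes entries marked False
mask = np.array([[True, False, True],
                 [False, True, True]])
taken_bool = np.where(mask, x, 0.0)

# take as an int array: indices along the first feature dimension
rows = np.array([1, 0, 1])
taken_int = x[rows]                                  # shape (3, 3)

# take as a tuple of int arrays: indexing across several dimensions
taken_tuple = x[(np.array([0, 1]), np.array([2, 0]))]

# filter with the same shape as the datapoint: elementwise product
weights = np.full(x.shape, 0.5)
filtered_elementwise = x * weights

# filter with more dimensions: contract its leading axes with the datapoint;
# the remaining axes of the filter give the output shape
bank = np.ones((2, 3, 4))
filtered_contracted = np.tensordot(x, bank, axes=x.ndim)   # shape (4,)

# Flattening on the feature axes, as described for the final filtered data
flat = filtered_elementwise.reshape(-1)                    # shape (6,)
```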
- fit_transform(X, y=None)
Fit the model from data in X and transform X.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Training vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take and self.filter.
- Returns
The transformed data
- Return type
numpy.ndarray of shape (n_samples,)+self.shape_out_
- kernel(X, Y=None)
Applies self.take and self.filter to data, then calls _get_kernel() for kernel evaluation.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Data vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take and self.filter.
Y (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Default = None. Data vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take and self.filter. If None, X is used.
- Returns
The matrix K[i,j] = k(X[i],Y[j])
- Return type
numpy.ndarray of shape (n_samples_1, n_samples_2)
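
To make the shape convention of the returned matrix concrete, here is a small stand-in written in NumPy. It assumes the default Gaussian RBF kernel, no take or filter, and gamma = 1/n_features computed on the flattened features; the function rbf_gram is a hypothetical helper and not part of ruckus.

```python
import numpy as np

def rbf_gram(X, Y=None, gamma=None):
    """Return K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2) on flattened features."""
    X2 = X.reshape(X.shape[0], -1)                 # flatten the feature axes
    Y2 = X2 if Y is None else Y.reshape(Y.shape[0], -1)
    if gamma is None:
        gamma = 1.0 / X2.shape[1]                  # assumed default: 1/n_features
    diff = X2[:, None, :] - Y2[None, :, :]
    return np.exp(-gamma * (diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2, 3))    # 5 samples with feature shape (2, 3)
Y = rng.normal(size=(4, 2, 3))    # 4 samples with the same feature shape

K = rbf_gram(X, Y)                # shape (5, 4): rows index X, columns index Y
K_self = rbf_gram(X)              # Y=None falls back to X, giving shape (5, 5)
```

With Y omitted, the result is the symmetric Gram matrix of X, with ones on the diagonal for an RBF kernel.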
- transform(X)
Transform X. This differs from sklearn.decomposition._kernel_pca.KernelPCA.transform() in the data preprocessing.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Data vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. These must match self.shape_in_.
- Returns
The transformed data
- Return type
numpy.ndarray of shape (n_samples,)+self.shape_out_