ruckus.embedding.EigenRKHS class¶
- class ruckus.embedding.EigenRKHS(use_kernel='rbf', *, gamma=None, degree=3, coef0=1, kernel_params=None, n_jobs=None, n_nystrom_samples=1.0, sample_method='random', sample_iter=300, n_components=None, centered=False, eigen_solver='auto', tol=0, max_iter=None, iterated_power='auto', remove_zero_eig=False, random_state=None, take=None, filter=None, copy_X=True)[source]¶
Bases: sklearn.decomposition._kernel_pca.KernelPCA, ruckus.base.RKHS
EigenRKHS is a child class of sklearn.decomposition._kernel_pca.KernelPCA, which adapts it to our RKHS class format, allowing interactivity with other RKHS's. We also add new options regarding centering and Nyström sampling for efficiency. Because of this dependency, our code and documentation inherit notably from those of KernelPCA, particularly in methods where only minor revisions were made.

EigenRKHS is initialized with a kernel \(k(x,y)\) (which now defaults to a Gaussian RBF) and computes the eigenvector decomposition \(k(x,y) = \sum_a \lambda_a \phi_a(x)\phi_a(y)\) to determine the feature mappings \(\phi(x)\) into the Hilbert space \(\mathcal{H}\). Because computing the eigenvectors scales cubically with the number of samples, we have added options for Nyström sampling, which selects a smaller subset of the data to use for the eigenvector computation and then uses those eigenvectors to transform the remaining data [1]; see the construction sketch after the parameter list below.
- Parameters
  - use_kernel (str or callable) – Default = "rbf". See sklearn.decomposition._kernel_pca.KernelPCA for kernel options.
  - gamma (float) – Default = None. Kernel coefficient for rbf, poly and sigmoid kernels. Ignored by other kernels. If gamma is None, then it is set to 1/n_features.
  - degree (int) – Default = 3. Degree for poly kernels. Ignored by other kernels.
  - coef0 (float) – Default = 1. Independent term in poly and sigmoid kernels. Ignored by other kernels.
  - kernel_params (dict) – Default = None. Parameters (keyword arguments) and values for a kernel passed as a callable object. Ignored by other kernels.
  - n_jobs (int) – Default = None. Number of parallel jobs to run. See sklearn.decomposition._kernel_pca.KernelPCA for details.
  - n_nystrom_samples (int or float) – Default = 1.0. The number of samples to draw from X to compute the SVD. If int, then draw n_nystrom_samples samples. If float, then draw n_nystrom_samples * X.shape[0] samples.
  - sample_method (str) – Default = "random". How to draw the Nyström samples. If "random", then subsample randomly with replacement. If "kmeans", then find the n_nystrom_samples optimal means.
  - sample_iter (int) – Default = 300. If sample_method = "kmeans", the number of times to iterate the algorithm.
  - n_components (int) – Default = None. Number of components. If None, all non-zero components are kept.
  - centered (bool) – Default = False. Whether to center the kernel before computing the SVD. This must be False for embeddings of distributions to be valid.
  - eigen_solver ({"auto", "dense", "arpack", "randomized"}) – Default = "auto". Solver to use for eigenvector computation. See sklearn.decomposition._kernel_pca.KernelPCA for details.
  - tol (float) – Default = 0. Convergence tolerance for arpack. If 0, an optimal value will be chosen by arpack.
  - max_iter (int) – Default = None. Maximum number of iterations for arpack. If None, an optimal value will be chosen by arpack.
  - iterated_power (int >= 0 or "auto") – Default = "auto". Number of iterations for the power method computed by svd_solver == "randomized". When "auto", it is set to 7 when n_components < 0.1 * min(X.shape); otherwise it is set to 4.
  - remove_zero_eig (bool) – Default = False. If True, then all components with zero eigenvalues are removed, so that the number of components in the output may be less than n_components (and sometimes even zero due to numerical instability). When n_components is None, this parameter is ignored and components with zero eigenvalues are removed regardless.
  - random_state (int) – Default = None. Used when eigen_solver == "arpack" or "randomized". See sklearn.decomposition._kernel_pca.KernelPCA for more details.
  - take (numpy.ndarray of dtype int or bool, or tuple of numpy.ndarray instances of type int, or None) – Default = None. Specifies which values to take from the datapoint for transformation. If None, the entire datapoint will be taken in its original shape. If a bool array, acts as a mask, setting values marked False to 0 and leaving values marked True unchanged. If an int array, the integers specify the indices (along the first feature dimension) which are to be taken, in the order/shape of the desired input. If a tuple of int arrays, allows for drawing indices across multiple dimensions, similar to passing a tuple to a numpy array.
  - filter (numpy.ndarray of dtype float or None) – Default = None. Specifies a linear preprocessing of the data, applied after take. If None, no changes are made to the input data. If the same shape as the input datapoints, filter and the datapoint are multiplied elementwise. If filter has a larger dimension than the datapoint, then its first dimensions will be contracted with the datapoint via numpy.tensordot(). The final shape is determined by the remaining dimensions of filter.
  - copy_X (bool) – Default = True. If True, input X is copied and stored by the model in the X_fit_ attribute. If no further changes will be done to X, setting copy_X=False saves memory by storing a reference.
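A minimal construction sketch (assuming ruckus is installed and EigenRKHS is importable from ruckus.embedding as in the signature above; the data and parameter values below are purely illustrative):

    import numpy as np
    from ruckus.embedding import EigenRKHS

    # 500 samples of 4-dimensional data (illustrative only).
    X = np.random.default_rng(0).normal(size=(500, 4))

    # Gaussian RBF kernel, keeping 10 components; the eigendecomposition is
    # computed from 100 Nyström subsamples chosen by k-means rather than from
    # the full 500-sample kernel matrix.
    emb = EigenRKHS(
        use_kernel="rbf",
        gamma=0.5,
        n_components=10,
        n_nystrom_samples=100,
        sample_method="kmeans",
        random_state=0,
    )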
- Attributes
  - eigenvalues_ (numpy.ndarray of shape (n_components,)) – Eigenvalues of the centered kernel matrix in decreasing order. If n_components and remove_zero_eig are not set, then all values are stored.
  - eigenvectors_ (numpy.ndarray of shape (n_samples, n_components)) – Eigenvectors of the kernel matrix. If n_components and remove_zero_eig are not set, then all components are stored.
  - shape_in_ (tuple) – The required shape of the input datapoints, aka the shape of the domain space \(X\).
  - shape_out_ (tuple) – The final shape of the transformed datapoints, aka the shape of the Hilbert space \(\mathcal{H}\).
  - X_fit_ (numpy.ndarray of shape (n_samples,) + self.shape_in_) – The data which was used to fit the model.
  - X_nys_ (numpy.ndarray of shape (n_nystrom_samples, n_features_in_)) – The Nyström subsamples of the data used to fit the model.
  - n_features_in_ (int) – The size of the features after preprocessing.
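The sketch below (with illustrative data and a hypothetical choice of take) shows how these attributes can be inspected after fitting:

    import numpy as np
    from ruckus.embedding import EigenRKHS

    X = np.random.default_rng(1).normal(size=(200, 6))

    # take keeps only features 0, 2 and 4 of each datapoint before the kernel is applied.
    emb = EigenRKHS(n_components=5, take=np.array([0, 2, 4]), random_state=1)
    emb.fit(X)

    print(emb.shape_in_)       # required shape of the input datapoints
    print(emb.shape_out_)      # shape of the embedded (transformed) datapoints
    print(emb.n_features_in_)  # feature size after take/filter preprocessing
    print(emb.eigenvalues_.shape, emb.eigenvectors_.shape)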
- fit(X, y=None)[source]¶
Fit the model from data in X. This method filters the data, determines whether it is to be centered, and takes the specified Nyström subsamples. After this, sklearn.decomposition._kernel_pca.KernelPCA._fit_transform() is invoked.
- Parameters
  - X (numpy.ndarray of shape (n_samples, n_features_1, ..., n_features_d)) – Training vector, where n_samples is the number of samples and (n_features_1, ..., n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take and self.filter. Final filtered data will be flattened on the feature axes.
  - y (Ignored) – Not used, present for API consistency by convention.
- Returns
The instance itself
- Return type
RKHS
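A short fit sketch (illustrative data; the Nyström settings are example values):

    import numpy as np
    from ruckus.embedding import EigenRKHS

    X = np.random.default_rng(2).normal(size=(300, 3))

    # fit() returns the instance itself, so construction and fitting can be chained.
    emb = EigenRKHS(n_components=4, n_nystrom_samples=50, random_state=2).fit(X)

    print(emb.X_fit_.shape)  # all 300 training points are stored
    print(emb.X_nys_.shape)  # only the 50 Nyström subsamples enter the eigendecomposition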
- fit_transform(X, y=None)[source]¶
Fit the model from data in X and transform X.
- Parameters
  - X (numpy.ndarray of shape (n_samples, n_features_1, ..., n_features_d)) – Training vector, where n_samples is the number of samples and (n_features_1, ..., n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take and self.filter.
- Returns
The transformed data
- Return type
numpy.ndarray of shape (n_samples,) + self.shape_out_
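A short sketch of the expected output shape (illustrative data):

    import numpy as np
    from ruckus.embedding import EigenRKHS

    X = np.random.default_rng(3).normal(size=(150, 5))

    emb = EigenRKHS(n_components=8, random_state=3)
    Phi = emb.fit_transform(X)

    # One embedded vector per sample: (n_samples,) + shape_out_.
    print(Phi.shape, (X.shape[0],) + emb.shape_out_)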
- kernel(X, Y=None)[source]¶
Applies self.take and self.filter to the data, then calls _get_kernel() for kernel evaluation.
- Parameters
  - X (numpy.ndarray of shape (n_samples, n_features_1, ..., n_features_d)) – Data vector, where n_samples is the number of samples and (n_features_1, ..., n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take and self.filter.
  - Y (numpy.ndarray of shape (n_samples, n_features_1, ..., n_features_d)) – Default = None. Data vector, where n_samples is the number of samples and (n_features_1, ..., n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take and self.filter. If None, X is used.
- Returns
The matrix K[i,j] = k(X[i], Y[j])
- Return type
numpy.ndarray of shape (n_samples_1, n_samples_2)
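A sketch of evaluating the kernel matrix directly (illustrative data; the model is fit first so any take/filter preprocessing is in place):

    import numpy as np
    from ruckus.embedding import EigenRKHS

    X = np.random.default_rng(4).normal(size=(20, 3))
    Y = np.random.default_rng(5).normal(size=(30, 3))

    emb = EigenRKHS(use_kernel="rbf", gamma=1.0).fit(X)

    K_xy = emb.kernel(X, Y)  # K_xy[i, j] = k(X[i], Y[j]); shape (20, 30)
    K_xx = emb.kernel(X)     # Y = None reuses X, giving the symmetric Gram matrix

    print(K_xy.shape)
    print(np.allclose(K_xx, K_xx.T))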
- transform(X)[source]¶
Transform X. This differs from sklearn.decomposition._kernel_pca.KernelPCA.transform() in the data preprocessing.
- Parameters
  - X (numpy.ndarray of shape (n_samples, n_features_1, ..., n_features_d)) – Data vector, where n_samples is the number of samples and (n_features_1, ..., n_features_d) is the shape of the input data. These must match self.shape_in_.
- Returns
The transformed data
- Return type
numpy.ndarray of shape (n_samples,) + self.shape_out_
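A sketch of transforming data not seen at fit time (illustrative data):

    import numpy as np
    from ruckus.embedding import EigenRKHS

    rng = np.random.default_rng(6)
    X_train = rng.normal(size=(200, 4))
    X_new = rng.normal(size=(25, 4))

    emb = EigenRKHS(n_components=6, random_state=6).fit(X_train)

    # New datapoints must match shape_in_; the output has one embedded vector per sample.
    Phi_new = emb.transform(X_new)
    print(Phi_new.shape)  # (25,) + emb.shape_out_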