ruckus.embedding.OneHotRKHS class
- class ruckus.embedding.OneHotRKHS(axis=0, *, take=None, copy_X=True)
Bases: ruckus.base.RKHS
OneHotRKHS is for processing categorical data. If \(X\) is a discrete set, this generates an embedding map \(\phi:X\rightarrow H\) into a Hilbert space \(H\) whose dimension is the cardinality of \(X\), such that \(\phi(x)\) is a one-hot vector whose 1-valued component lies in the dimension that uniquely corresponds to \(x\). This is particularly advantageous when working with kernel embeddings of distributions, as the embedded distribution vector is itself a probability vector (nonnegative components that sum to 1).
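The probability-vector property can be checked by hand. The following is a minimal sketch in plain NumPy, not a call into ruckus: it builds one-hot vectors for a small categorical sample and averages them, which is how an empirical distribution is embedded. The variable names (`alphabet`, `codes`, `phi`, `mean_embedding`) are illustrative assumptions.

```python
import numpy as np

# Six categorical observations over a three-letter alphabet.
X = np.array(["a", "b", "a", "c", "a", "b"])

# Sorted unique symbols, plus the index of each observation's symbol.
alphabet, codes = np.unique(X, return_inverse=True)

# Row i is the one-hot vector phi(X[i]); its length is the alphabet size.
phi = np.eye(len(alphabet))[codes]

# Averaging the embedded samples gives the empirical distribution embedding.
mean_embedding = phi.mean(axis=0)
```

Each entry of `mean_embedding` is the relative frequency of one symbol, so the entries are nonnegative and sum to 1.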
- Parameters
axis (int or tuple of ints) – Default = 0. Specifies the axis or axes along which unique entries will be determined. The alphabet will be taken as the unique subarrays indexed by the given axes, and the transformed vector will have the shape of the given axes + an additional axis indexing the alphabet. The 0 axis (that is, the sample axis) will always be included, even if not given.
take (numpy.ndarray of dtype int or bool, or tuple of numpy.ndarray instances of type int, or None) – Default = None. Specifies which values to take from the datapoint for transformation. If None, the entire datapoint will be taken in its original shape. If a bool array, acts as a mask setting values marked False to 0 and leaving values marked True unchanged. If an int array, the integers specify the indices (along the first feature dimension) which are to be taken, in the order/shape of the desired input. If a tuple of int arrays, allows for drawing indices across multiple dimensions, similar to indexing a numpy array with a tuple.
copy_X (bool) – Default = True. If True, input X is copied and stored by the model in the X_fit_ attribute. If no further changes will be made to X, setting copy_X=False saves memory by storing a reference.
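The three non-None forms of take can be mimicked with ordinary NumPy indexing. This is a sketch of the behaviour as described above, assuming the selection is applied to every sample alike; it is not the library's internal code, and all variable names are illustrative.

```python
import numpy as np

# Two samples, each datapoint of shape (2, 3).
X = np.arange(12).reshape(2, 2, 3)

# bool array: entries marked False are set to 0, the rest are unchanged.
mask = np.array([[True, False, True],
                 [False, True, False]])
masked = np.where(mask, X, 0)

# int array: indices along the first feature dimension, in the given order.
idx = np.array([1, 0])
taken = X[:, idx]

# tuple of int arrays: one index array per feature dimension.
rows = np.array([0, 1])
cols = np.array([2, 0])
picked = X[(slice(None), rows, cols)]
```

Here `masked` and `taken` keep the datapoint shape (2, 3), while `picked` reduces each datapoint to the two selected entries.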
- Attributes
alphabet_ (numpy.ndarray of objects) – The unique elements from self.X_fit_.
shape_in_ (tuple) – The required shape of the input datapoints, aka the shape of the domain space \(X\).
shape_out_ (tuple) – The final shape of the transformed datapoints, aka the shape of the Hilbert space \(H\).
X_fit_ (numpy.ndarray of shape (n_samples,)+self.shape_in_) – The data which was used to fit the model.
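To make the relationship between these quantities concrete, the sketch below recomputes them by hand for the default axis=0, under the assumption that each whole datapoint is then treated as one symbol. It uses only NumPy; the ordering that np.unique gives the alphabet is not guaranteed to match the ordering ruckus uses, and the local names merely mirror the attribute names.

```python
import numpy as np

# Four samples, each datapoint of shape (2,).
X = np.array([[0, 1],
              [1, 1],
              [0, 1],
              [1, 0]])

# Unique datapoints (rows) form the alphabet; codes index into it.
alphabet, codes = np.unique(X, axis=0, return_inverse=True)
codes = codes.ravel()  # flatten; the inverse's shape differs across NumPy versions

shape_in = X.shape[1:]          # shape of one input datapoint
shape_out = (len(alphabet),)    # one dimension per alphabet entry

# One-hot rows of shape (n_samples,) + shape_out.
transformed = np.eye(len(alphabet))[codes]
```

With three distinct rows in `X`, `shape_out` is `(3,)` and `transformed` has shape `(4, 3)`.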
- fit(X, y=None)
Fit the model from data in X.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Training vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. Must be consistent with preprocessing instructions in self.take.
y (Ignored) – Not used, present for API consistency by convention.
- Returns
The instance itself
- Return type
RKHS
- transform(X)
Transform X.
- Parameters
X (numpy.ndarray of shape (n_samples, n_features_1,...,n_features_d)) – Data vector, where n_samples is the number of samples and (n_features_1,...,n_features_d) is the shape of the input data. These must match self.shape_in_.
- Returns
The transformed data
- Return type
numpy.ndarray of shape (n_samples,)+self.shape_out_
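The fit/transform contract can be illustrated with a small stand-in class. TinyOneHot below is hypothetical and is not the ruckus implementation: it assumes the default axis=0, ignores take and copy_X, and maps symbols unseen during fitting to the zero vector, a choice this page does not specify for OneHotRKHS.

```python
import numpy as np


class TinyOneHot:
    """Hypothetical stand-in mirroring the documented fit/transform shapes."""

    def fit(self, X, y=None):
        X = np.asarray(X)
        self.alphabet_ = np.unique(X, axis=0)       # unique datapoints
        self.shape_in_ = X.shape[1:]                # shape of one datapoint
        self.shape_out_ = (len(self.alphabet_),)    # one slot per symbol
        return self                                 # allows call chaining

    def transform(self, X):
        X = np.asarray(X)
        if X.shape[1:] != self.shape_in_:
            raise ValueError("datapoint shape must match shape_in_")
        # Flatten datapoints, then compare every sample with every symbol.
        flat_X = X.reshape(len(X), -1)
        flat_A = self.alphabet_.reshape(len(self.alphabet_), -1)
        match = (flat_X[:, None, :] == flat_A[None, :, :]).all(axis=-1)
        return match.astype(float)


X_train = np.array([[0, 1], [1, 1], [0, 1]])
model = TinyOneHot().fit(X_train)
Z = model.transform(np.array([[1, 1], [0, 1]]))
```

The result `Z` has shape `(n_samples,) + model.shape_out_`, matching the return type documented above.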