mlinsights.search_rank#
SearchEngineVectors#
- class mlinsights.search_rank.search_engine_vectors.SearchEngineVectors(**pknn)[source]#
Implements a kind of local search engine which looks for similar results, assuming they are vectors. The class uses sklearn.neighbors.NearestNeighbors to find the nearest neighbors of a vector and follows the same API. The class populates members:
features_: vectors used to compute the neighbors
knn_: parameters for the sklearn.neighbors.NearestNeighbors
metadata_: metadata, can be None
- Parameters:
pknn – list of parameters, see sklearn.neighbors.NearestNeighbors
- fit(data=None, features=None, metadata=None)[source]#
Every vector comes with a list of metadata.
- Parameters:
data – a dataframe, or None if the features and the metadata are specified with an array and a dictionary
features – features columns or an array
metadata – metadata columns or a dictionary
- kneighbors(X, n_neighbors=None)[source]#
Searches for neighbors close to X.
- Parameters:
X – features
- Returns:
score, ind, meta
score is an array holding the distances to the nearest points, ind contains the indices of the nearest points in the population matrix, meta is the metadata.
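The behaviour described above can be approximated directly with sklearn.neighbors.NearestNeighbors. The following is a minimal sketch and not the class's actual implementation: the toy vectors, the list-of-dictionaries metadata and the helper name `search` are assumptions made for illustration, and the helper only handles a single query vector.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data: four 2-D vectors, one metadata record per vector.
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
metadata = [{"name": "a"}, {"name": "b"}, {"name": "c"}, {"name": "d"}]

# Keyword arguments play the role of **pknn in the class constructor.
knn = NearestNeighbors(n_neighbors=2).fit(features)


def search(x, n_neighbors=None):
    """Return (score, ind, meta) for a single query vector x (illustrative helper)."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    dist, ind = knn.kneighbors(x, n_neighbors=n_neighbors, return_distance=True)
    score, ind = dist.ravel(), ind.ravel()
    # Metadata is looked up with the indices of the nearest vectors.
    meta = [metadata[i] for i in ind]
    return score, ind, meta


score, ind, meta = search([0.9, 0.1])
```

Here `ind` refers to rows of `features`, and `meta` lists the matching metadata records in the same order as `score`.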
- static read_zip(zipfilename, **kwargs)[source]#
Restores the features and the metadata into a SearchEngineVectors.
- Parameters:
zipfilename – a zipfile.ZipFile or a filename
kwargs – parameters for pandas.read_csv
- Returns:
SearchEngineVectors
- to_zip(zipfilename, **kwargs)[source]#
Saves the features and the metadata into a zipfile. The function does not save the k-nn.
- Parameters:
zipfilename – a zipfile.ZipFile or a filename
kwargs – parameters for pandas.DataFrame.to_csv (for the metadata)
- Returns:
zipfilename
The function relies on function to_zip. It only works for Python 3.6+.
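To illustrate the idea of storing features and metadata side by side in one archive, here is a minimal round-trip sketch based only on the standard library. It does not reproduce the library's file layout: the member names `features.csv` and `metadata.csv` and the helpers `save_archive` and `load_archive` are hypothetical, and an in-memory buffer stands in for a file name. As in the description above, no k-nn is stored.

```python
import csv
import io
import zipfile


def save_archive(zf, features, metadata):
    """Write features and metadata as two CSV members of an open ZipFile."""
    for name, rows in (("features.csv", features), ("metadata.csv", metadata)):
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        zf.writestr(name, buf.getvalue())  # str content is encoded as UTF-8


def load_archive(zf):
    """Read both members back; feature values are converted back to floats."""
    def read(name):
        text = zf.read(name).decode("utf-8")
        return list(csv.reader(io.StringIO(text, newline="")))

    features = [[float(v) for v in row] for row in read("features.csv")]
    return features, read("metadata.csv")


features = [[0.0, 1.5], [2.0, 3.25]]
metadata = [["a.png", "cat"], ["b.png", "dog"]]

archive = io.BytesIO()  # in-memory stand-in for a file name
with zipfile.ZipFile(archive, "w") as zf:
    save_archive(zf, features, metadata)
with zipfile.ZipFile(archive, "r") as zf:
    restored_features, restored_metadata = load_archive(zf)
```

Because only the data is stored in this sketch, a nearest neighbors model has to be fitted again on the restored features before searching.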
SearchEnginePredictions#
- class mlinsights.search_rank.search_engine_predictions.SearchEnginePredictions(fct, fct_params=None, **knn)[source]#
Extends class SearchEngineVectors: the neighbors of a vector X are searched by looking for the neighbors of f(X) and not X. f can be any function which converts a vector into another one, or a machine learned model. In the latter case, f is set to a default behavior, see function mlinsights.mlmodel.ml_featurizer.model_featurizer().
- Parameters:
fct – function f applied before looking for neighbors, it can also be a machine learned model
fct_params – parameters sent to function mlinsights.mlmodel.ml_featurizer.model_featurizer()
knn – list of parameters, see sklearn.neighbors.NearestNeighbors
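The effect of searching in the space of f(X) rather than X can be sketched with plain scikit-learn. This is an illustration under assumptions, not the class itself: the featurizer `f` below (which keeps only the direction of each vector) and the toy data are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def f(X):
    """Hypothetical featurizer: normalize each row to unit length."""
    X = np.asarray(X, dtype=float)
    return X / np.linalg.norm(X, axis=1, keepdims=True)


X = np.array([[1.0, 0.0], [10.0, 1.0], [0.0, 3.0], [1.0, 1.0]])
query = np.array([[0.0, 0.4]])

# Index f(X) instead of X, and transform the query the same way.
knn_f = NearestNeighbors(n_neighbors=1).fit(f(X))
_, ind_f = knn_f.kneighbors(f(query))

# For comparison: the same search on the raw vectors.
knn_raw = NearestNeighbors(n_neighbors=1).fit(X)
_, ind_raw = knn_raw.kneighbors(query)
```

With this data the two searches disagree: in the transformed space the query matches the vector pointing in the same direction (row 2), whereas on raw vectors the closest point is row 0.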
SearchEnginePredictionImages#
- class mlinsights.search_rank.search_engine_predictions_images.SearchEnginePredictionImages(fct, fct_params=None, **knn)[source]#
Extends class SearchEnginePredictions. Vectors come from images. The metadata must contain information about path names. We assume all images fit in memory. An example can be found in notebook Search images with deep learning (torch).
- fit(iter_images, n=None)[source]#
Processes images through the model and fits a k-nn.
- Parameters:
iter_images – iterator over the images
n – takes n images (or len(iter_images))
- kneighbors(iter_images, n_neighbors=None)[source]#
Searches for neighbors close to the first image returned by iter_images. It returns the neighbors only for the first image.
- Parameters:
iter_images – iterator over images, only the first one is used
n_neighbors – number of neighbors
- Returns:
score, ind, meta
score is an array holding the distances to the nearest points, ind contains the indices of the nearest points in the population matrix, meta is the metadata.
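The two steps above (fit on at most n images, then search with the first image of an iterator) can be sketched as follows. Everything here is an assumption made for illustration: `featurize` is a stand-in for a deep model, `make_images` yields hypothetical (image, metadata) pairs, and the way the real class extracts path names from its iterator may differ.

```python
from itertools import islice

import numpy as np
from sklearn.neighbors import NearestNeighbors


def featurize(img):
    """Hypothetical stand-in for a deep model: mean value of each channel."""
    return img.reshape(-1, img.shape[-1]).mean(axis=0)


def make_images():
    """Yield (image, metadata) pairs; each image is a constant 4x4 RGB array."""
    for i, level in enumerate([0.0, 0.2, 0.9, 1.0]):
        yield np.full((4, 4, 3), level), {"path": f"img{i}.png"}


# fit: consume at most n images, keep features and metadata in memory.
n = 3
pairs = list(islice(make_images(), n))
features = np.vstack([featurize(img) for img, _ in pairs])
metadata = [meta for _, meta in pairs]
knn = NearestNeighbors(n_neighbors=2).fit(features)

# kneighbors: only the first image produced by the iterator is used as query.
queries = iter([np.full((4, 4, 3), 0.8), np.full((4, 4, 3), 0.1)])
first = next(queries)
score, ind = knn.kneighbors(featurize(first).reshape(1, -1))
score, ind = score.ravel(), ind.ravel()
meta = [metadata[i] for i in ind]
```

The metadata returned for each neighbor carries the path name, which is what makes the result usable for displaying the matching images.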