mlinsights.search_rank

SearchEngineVectors

class mlinsights.search_rank.search_engine_vectors.SearchEngineVectors(**pknn)

Implements a local search engine which looks for similar results, assuming the items are vectors. The class uses sklearn.neighbors.NearestNeighbors to find the nearest neighbors of a vector and follows the same API. The class populates members:

Parameters:

pknn – list of parameters, see sklearn.neighbors.NearestNeighbors

fit(data=None, features=None, metadata=None)

Every vector comes with a list of metadata.

Parameters:
  • data – a dataframe, or None if the features and the metadata are specified with an array and a dictionary

  • features – features columns or an array

  • metadata – metadata columns or a dictionary

kneighbors(X, n_neighbors=None)

Searches for neighbors close to X.

Parameters:
  • X – features

Returns:

score, ind, meta

score is an array of the distances to the nearest points, ind contains the indices of the nearest points in the population matrix, and meta is the metadata.
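The relationship between score, ind and meta can be illustrated with a dependency-free sketch. This is not the library's implementation (which delegates to sklearn.neighbors.NearestNeighbors); the helper names sketch_fit and sketch_kneighbors are hypothetical, and a brute-force Euclidean search is assumed.

```python
import math

def sketch_fit(features, metadata):
    """Keep vectors and metadata aligned by row index (hypothetical helper)."""
    if len(features) != len(metadata):
        raise ValueError("features and metadata must have the same length")
    return {"features": [list(v) for v in features], "metadata": list(metadata)}

def sketch_kneighbors(index, x, n_neighbors=1):
    """Return (score, ind, meta) for the n_neighbors vectors closest to x."""
    # Brute-force Euclidean distances, sorted in increasing order.
    ranked = sorted((math.dist(x, v), i) for i, v in enumerate(index["features"]))
    best = ranked[:n_neighbors]
    score = [d for d, _ in best]
    ind = [i for _, i in best]
    # The metadata rows are looked up with the same indices.
    meta = [index["metadata"][i] for i in ind]
    return score, ind, meta

index = sketch_fit(
    [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]],
    [{"name": "a"}, {"name": "b"}, {"name": "c"}],
)
score, ind, meta = sketch_kneighbors(index, [0.9, 0.0], n_neighbors=2)
```

Here ind lists the rows of the population matrix from closest to farthest, and meta holds the metadata found at those rows.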

static read_zip(zipfilename, **kwargs)

Restores the features and the metadata into a SearchEngineVectors.

Parameters:
Returns:

SearchEngineVectors

to_zip(zipfilename, **kwargs)

Saves the features and the metadata into a zipfile. The function does not save the k-nn.

Parameters:
  • zipfilename – a zipfile.ZipFile or a filename

  • kwargs – parameters for pandas.to_csv (for the metadata)

Returns:

zipfilename

The function relies on function to_zip. It only works for Python 3.6+.
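The idea of storing the features and the metadata in one archive, and rebuilding them later, can be sketched with the standard library alone. This is an illustration under assumptions, not the format used by to_zip and read_zip: the helper names sketch_save and sketch_load and the entry names features.csv and metadata.csv are hypothetical.

```python
import csv
import io
import zipfile

def sketch_save(zipfilename, features, metadata):
    """Write features and metadata as two CSV entries of one zip archive."""
    with zipfile.ZipFile(zipfilename, "w") as zf:
        buf = io.StringIO()
        csv.writer(buf).writerows(features)
        zf.writestr("features.csv", buf.getvalue())  # hypothetical entry name
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(metadata[0]))
        writer.writeheader()
        writer.writerows(metadata)
        zf.writestr("metadata.csv", buf.getvalue())  # hypothetical entry name
    return zipfilename

def sketch_load(zipfilename):
    """Read back the two entries; a k-nn would have to be fitted again."""
    with zipfile.ZipFile(zipfilename) as zf:
        text = zf.read("features.csv").decode("utf-8")
        features = [[float(c) for c in row]
                    for row in csv.reader(io.StringIO(text, newline=""))]
        text = zf.read("metadata.csv").decode("utf-8")
        metadata = [dict(row)
                    for row in csv.DictReader(io.StringIO(text, newline=""))]
    return features, metadata

# An in-memory buffer stands in for a file on disk.
archive = io.BytesIO()
sketch_save(archive, [[0.5, 1.0], [2.0, 3.0]],
            [{"path": "a.png"}, {"path": "b.png"}])
features, metadata = sketch_load(archive)
```

Only the data is stored in this sketch, which mirrors the note above that the k-nn itself is not saved.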

SearchEnginePredictions

class mlinsights.search_rank.search_engine_predictions.SearchEnginePredictions(fct, fct_params=None, **knn)

Extends class SearchEngineVectors: the neighbors of a vector X are found by searching for the neighbors of f(X) rather than X. f can be any function which converts a vector into another one, or a machine learned model. In the latter case, f is set to a default behavior; see function mlinsights.mlmodel.ml_featurizer.model_featurizer().

Parameters:
fit(data=None, features=None, metadata=None)

Every vector comes with a list of metadata.

Parameters:
  • data – a dataframe, or None if the features and the metadata are specified with an array and a dictionary

  • features – features columns or an array

  • metadata – metadata columns or a dictionary

Returns:

self

kneighbors(X, n_neighbors=None)

Searches for neighbors close to X.

Parameters:
  • X – features

Returns:

score, ind, meta

score is an array of the distances to the nearest points, ind contains the indices of the nearest points in the population matrix, and meta is the metadata.
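Searching on f(X) instead of X can change which point is considered the closest. The sketch below shows this with a brute-force search in plain Python; it does not use the library, and make_search and sign_pattern are hypothetical names chosen for the illustration.

```python
import math

def make_search(fct, features, metadata):
    """Index fct(v) for every stored vector v and return a search function."""
    transformed = [fct(v) for v in features]

    def kneighbors(x, n_neighbors=1):
        fx = fct(x)  # the query goes through the same transformation
        ranked = sorted((math.dist(fx, t), i) for i, t in enumerate(transformed))
        best = ranked[:n_neighbors]
        score = [d for d, _ in best]
        ind = [i for _, i in best]
        return score, ind, [metadata[i] for i in ind]

    return kneighbors

def sign_pattern(v):
    """Toy stand-in for a model output: keeps only the sign of each coordinate."""
    return [1.0 if c >= 0 else -1.0 for c in v]

features = [[10.0, 10.0], [-0.1, 0.2], [0.2, -0.3]]
metadata = ["far", "near_1", "near_2"]

raw_search = make_search(lambda v: v, features, metadata)
fct_search = make_search(sign_pattern, features, metadata)

raw_result = raw_search([0.1, 0.1])
fct_result = fct_search([0.1, 0.1])
```

With the identity function the closest point is the second row; after the transformation the first row becomes the closest, because it shares the query's sign pattern.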

SearchEnginePredictionImages

class mlinsights.search_rank.search_engine_predictions_images.SearchEnginePredictionImages(fct, fct_params=None, **knn)

Extends class SearchEnginePredictions. Vectors come from images. The metadata must contain information about path names. We assume all images fit in memory. An example can be found in the notebook Search images with deep learning (torch).

fit(iter_images, n=None)

Processes images through the model and fits a k-nn.

Parameters:
  • iter_images – Iterator

  • n – takes n images (or len(iter_images))

kneighbors(iter_images, n_neighbors=None)

Searches for neighbors close to the first image returned by iter_images. It returns the neighbors only for the first image.

Parameters:
  • iter_images – Iterator

  • n_neighbors – number of neighbors

Returns:

score, ind, meta

score is an array of the distances to the nearest points, ind contains the indices of the nearest points in the population matrix, and meta is the metadata.
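The two ways an iterator is consumed, taking at most n items when fitting and only the first item when searching, can be sketched with the standard library. The helpers take and first and the generator fake_images are hypothetical and only mimic the behavior described above; they are not part of mlinsights.

```python
from itertools import islice

def take(iter_images, n=None):
    """Materialise at most n items (all of them when n is None)."""
    return list(islice(iter_images, n))

def first(iter_images):
    """Return the first item only; later items are never consumed."""
    return next(iter(iter_images))

def fake_images():
    # Hypothetical stand-in for an image iterator: yields (path, pixels) pairs.
    for k in range(5):
        yield f"img{k}.png", [float(k)] * 4

batch = take(fake_images(), n=3)   # what a fit limited to n=3 would see
query = first(fake_images())       # what a search would use as the query
everything = take(fake_images())   # n=None keeps every image
```

Because every selected image is materialised in a list, this sketch shares the assumption stated above that all images fit in memory.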