teachcompute.fctmr

Simple map/reduce functions

combiner

teachcompute.fctmr.simplefctmr.combiner(fctkey1: Callable, gen1: Iterable, fctkey2: Callable, gen2: Iterable, how: str = 'inner') → Iterable

Joins (or combines) two generators. The function is built on two reducers. It is more efficient when the groups of the second ensemble gen2 are small, since each of them is held in memory.

Parameters:
  • fctkey1 – function which returns the key for gen1

  • gen1 – generator for the first element

  • fctkey2 – function which returns the key for gen2

  • gen2 – generator for the second element

  • how – inner, outer, left, right

Returns:

generator

combiner or join

<<<

from teachcompute.fctmr import combiner


def c0(el):
    return el[0]


ens1 = [("a", 1), ("b", 2), ("a", 3)]
ens2 = [("a", 10), ("b", 20), ("a", 30)]
res = combiner(c0, ens1, c0, ens2)
print(list(res))

>>>

    [(('a', 1), ('a', 10)), (('a', 1), ('a', 30)), (('a', 3), ('a', 10)), (('a', 3), ('a', 30)), (('b', 2), ('b', 20))]
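To illustrate why only the groups of gen2 need to be held in memory, here is a minimal sketch of an inner join built from two sort-and-group passes (a sort-merge join). It is not the library's implementation: the name toy_inner_join is hypothetical, only how='inner' is covered, keys are assumed to be mutually comparable, and sorted() materializes both inputs, which a streaming version would avoid if the inputs were already sorted by key.

```python
from itertools import groupby
from typing import Callable, Iterable, Iterator, Tuple


def toy_inner_join(fctkey1: Callable, gen1: Iterable,
                   fctkey2: Callable, gen2: Iterable) -> Iterator[Tuple]:
    """Hypothetical sketch of an inner join on two grouped streams."""
    # Sort each input by its key, then group equal keys (two "reducers").
    groups1 = groupby(sorted(gen1, key=fctkey1), key=fctkey1)
    groups2 = groupby(sorted(gen2, key=fctkey2), key=fctkey2)
    k1, g1 = next(groups1, (None, None))
    k2, g2 = next(groups2, (None, None))
    while g1 is not None and g2 is not None:
        if k1 < k2:
            # Key only present in gen1: an inner join drops it.
            k1, g1 = next(groups1, (None, None))
        elif k2 < k1:
            # Key only present in gen2: dropped as well.
            k2, g2 = next(groups2, (None, None))
        else:
            # Same key: only the current group of gen2 is kept in memory.
            group2 = list(g2)
            for el1 in g1:
                for el2 in group2:
                    yield el1, el2
            k1, g1 = next(groups1, (None, None))
            k2, g2 = next(groups2, (None, None))


def c0(el):
    return el[0]


ens1 = [("a", 1), ("b", 2), ("a", 3)]
ens2 = [("a", 10), ("b", 20), ("a", 30)]
print(list(toy_inner_join(c0, ens1, c0, ens2)))
```

On this input the sketch yields the same pairs as combiner. A left, right or outer join would additionally emit the unmatched groups in the two branches that currently skip them; how the library represents the missing side is not shown here.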

ffilter

teachcompute.fctmr.simplefctmr.ffilter(fct: Callable, gen: Iterable) → Iterable

Filters a generator, keeping only the elements for which fct returns True.

Parameters:
  • fct – function

  • gen – generator

Returns:

generator

filter

<<<

from teachcompute.fctmr import ffilter

res = ffilter(lambda x: x % 2 == 0, [4, 5])
print(list(res))

>>>

    [4]
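Because ffilter returns a generator, it can be applied to an infinite stream as long as only a finite number of elements is consumed. The sketch below uses a hypothetical toy_ffilter written as a generator expression, which behaves like the built-in filter; that ffilter is implemented the same way is an assumption.

```python
import itertools
from typing import Callable, Iterable, Iterator


def toy_ffilter(fct: Callable, gen: Iterable) -> Iterator:
    # Lazily keep the elements for which the predicate returns True.
    return (el for el in gen if fct(el))


# itertools.count() never ends; islice stops after three kept elements.
evens = toy_ffilter(lambda x: x % 2 == 0, itertools.count())
print(list(itertools.islice(evens, 3)))  # [0, 2, 4]
print(list(toy_ffilter(lambda x: x % 2 == 0, [4, 5])))  # [4]
```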

mapper

teachcompute.fctmr.simplefctmr.mapper(fct: Callable, gen: Iterable) → Iterable

Applies function fct to every element of a generator.

Parameters:
  • fct – function

  • gen – generator

Returns:

generator

mapper

<<<

from teachcompute.fctmr import mapper

res = mapper(lambda x: x + 1, [4, 5])
print(list(res))

>>>

    [5, 6]
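Since mapper returns a generator, fct is expected to run only when the result is consumed, which is why the example wraps res in list(). The sketch below makes this visible with a hypothetical toy_mapper written as a generator function and a function that records each call; it assumes mapper is lazy in the same way as the built-in map.

```python
from typing import Callable, Iterable, Iterator

calls = []


def add_one(x):
    calls.append(x)  # record every evaluation of the function
    return x + 1


def toy_mapper(fct: Callable, gen: Iterable) -> Iterator:
    # Hypothetical lazy mapper: fct is called once per element, on demand.
    for el in gen:
        yield fct(el)


res = toy_mapper(add_one, [4, 5])
print(calls)      # [] : nothing has been computed yet
print(list(res))  # [5, 6]
print(calls)      # [4, 5]
```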

Difference between an iterator and a generator?

An iterator and a generator both produce the elements of a set. The difference is that an iterator walks through the elements of a set that exists in memory, whereas a generator produces, or computes, the elements of a set that does not exist in memory. Consequently, traversing a set twice with an iterator costs O(n), whereas with a generator the cost of computing each element a second time must be added. That cost is unpredictable, and it is sometimes preferable to cache the elements in order to traverse them several times: this amounts to turning a generator into an iterator. In Python, a generator is often defined as follows:

<<<

def generate(some_iterator):
    for el in some_iterator:
        yield el


g = generate([4, 5])
print(list(g))
print(g.__class__.__name__)

>>>

    [4, 5]
    generator
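The cost argument above can be checked directly: a generator object can be traversed only once, while caching its elements in a list makes repeated passes possible without recomputing them. The generate function is the one from the example; the data is arbitrary.

```python
def generate(some_iterator):
    for el in some_iterator:
        yield el


g = generate([4, 5])
print(list(g))  # [4, 5]
print(list(g))  # [] : the generator is exhausted after one pass

cached = list(generate([4, 5]))  # cache the elements in memory once
print(sum(cached), max(cached))  # 9 5 : two passes, no recomputation
```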

reducer

teachcompute.fctmr.simplefctmr.reducer(fctkey: Callable, gen: Iterable, asiter: bool = True, sort: bool = True) → Iterable

Implements a reducer: groups the elements of gen by the key returned by fctkey.

Parameters:
  • fctkey – function which returns the key

  • gen – generator

  • asiter – if True, returns an iterator over the elements of each group, otherwise returns the group itself

  • sort – sort elements by key before grouping

Returns:

generator

reducer

<<<

from teachcompute.fctmr import reducer

res = reducer(lambda x: x[0], [("a", 1), ("b", 2), ("a", 3)], asiter=False)
print(list(res))

>>>

    [('a', [('a', 1), ('a', 3)]), ('b', [('b', 2)])]
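A reducer of this kind can be sketched with itertools.groupby, which only merges consecutive elements with equal keys; this is why sorting comes first. The name toy_reducer is hypothetical, and that reducer behaves exactly like this sketch when sort=False is an assumption.

```python
from itertools import groupby
from typing import Callable, Iterable, Iterator, Tuple


def toy_reducer(fctkey: Callable, gen: Iterable, asiter: bool = True,
                sort: bool = True) -> Iterator[Tuple]:
    """Hypothetical sketch: group the elements of gen by fctkey(el)."""
    if sort:
        # Sorting puts equal keys next to each other (and materializes gen).
        gen = sorted(gen, key=fctkey)
    for key, group in groupby(gen, key=fctkey):
        # A groupby group is only valid until the next group is requested.
        yield key, (group if asiter else list(group))


data = [("a", 1), ("b", 2), ("a", 3)]
print(list(toy_reducer(lambda x: x[0], data, asiter=False)))
# [('a', [('a', 1), ('a', 3)]), ('b', [('b', 2)])]
print(list(toy_reducer(lambda x: x[0], data, asiter=False, sort=False)))
# [('a', [('a', 1)]), ('b', [('b', 2)]), ('a', [('a', 3)])]
```

Without sorting, key 'a' appears twice because its elements are not adjacent. With asiter=True, each group must be consumed before moving to the next one.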

take

teachcompute.fctmr.simplefctmr.take(gen: Iterable, count: int = 5, skip: int = 0) → Iterable

Skips and takes elements from a generator.

Parameters:
  • gen – generator

  • count – number of elements to return

  • skip – number of elements to skip first

Returns:

generator

take

<<<

from teachcompute.fctmr import take

res = take([4, 5, 6, 7, 8, 9], 2, 2)
print(list(res))

>>>

    [6, 7]
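The same skip-then-take behavior is provided by the standard itertools.islice, which also stops early on an infinite generator. The helper toy_take below is hypothetical and only mirrors the parameters of take.

```python
import itertools
from typing import Iterable, Iterator


def toy_take(gen: Iterable, count: int = 5, skip: int = 0) -> Iterator:
    # Skip `skip` elements, then yield at most `count` elements.
    return itertools.islice(gen, skip, skip + count)


print(list(toy_take([4, 5, 6, 7, 8, 9], 2, 2)))  # [6, 7]
# Works on an infinite generator because islice stops after `count` items.
print(list(toy_take(itertools.count(), 3)))  # [0, 1, 2]
```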

Parallel

fast_parallel_mapper

teachcompute.fctmr.fast_parallel_fctmr.fast_parallel_mapper(fct: Callable, gen: Iterable, chunk_size: int = 100000, parallel: bool = True, nogil: bool = False, nopython: bool = True, sigin: str | None = None, sigout: str | None = None) → Iterable

Parallelizes a mapper with numba, more specifically with Automatic parallelization with @jit. That page of the numba documentation indicates what numba optimizes when it parallelizes a map.

Parameters:
  • fct – function

  • gen – generator

  • chunk_size – chunk size

  • parallel – see parallel

  • nopython – see nopython

  • nogil – see nogil

  • sigin – signature of input type

  • sigout – signature of output type

Returns:

generator

Parallelization can only happen once the array is known, so the function splits the input into chunks of size chunk_size. This attempt is not very efficient because the mapper is generic, and Python is not a good language for this. See the unit test test_parallel_fctmr.py.
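The chunking step described above can be sketched in pure Python. In fast_parallel_mapper each chunk is handed to numba; here a plain Python function fct_chunk stands in for the compiled kernel, so the sketch only shows how the stream is cut and reassembled, not any speed-up. The names chunks, toy_chunked_mapper and double_chunk are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunks(gen: Iterable, chunk_size: int) -> Iterator[List]:
    # Cut any iterable into lists of at most chunk_size elements.
    it = iter(gen)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def toy_chunked_mapper(fct_chunk: Callable[[List], List], gen: Iterable,
                       chunk_size: int = 100000) -> Iterator:
    # Each chunk is a finite list known in full: this is the point where a
    # compiled, parallel loop (numba in the real function) could take over.
    for chunk in chunks(gen, chunk_size):
        yield from fct_chunk(chunk)


def double_chunk(chunk):
    return [x * 2 for x in chunk]


print(list(chunks(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
print(list(toy_chunked_mapper(double_chunk, range(5), chunk_size=2)))  # [0, 2, 4, 6, 8]
```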

create_array_numba

teachcompute.fctmr.fast_parallel_fctmr.create_array_numba(nb: int, sig: str) → ndarray

Creates an array of size nb knowing its signature.

Parameters:
  • nb – number of elements

  • sig – signature, e.g. 'f8'

Returns:

container
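'f8' is the NumPy type code for an 8-byte float (float64), and 'i4' for a 4-byte integer. Assuming create_array_numba simply allocates an uninitialized NumPy array of that type (its return annotation is ndarray), a hypothetical equivalent toy_create_array could look like this:

```python
import numpy as np


def toy_create_array(nb: int, sig: str) -> np.ndarray:
    # np.empty allocates without initializing: values are arbitrary until set.
    return np.empty(nb, dtype=sig)


arr = toy_create_array(4, "f8")
print(arr.shape, arr.dtype)  # (4,) float64
print(toy_create_array(3, "i4").dtype)  # int32
```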

pyparallel_mapper

teachcompute.fctmr.pyparallel_fctmr.pyparallel_mapper(fct: Callable, gen: Iterable, threads: int | None = None) → Iterable

Applies function fct to a generator. Relies on ThreadPool.

Parameters:
  • fct – function

  • gen – generator

  • threads – number of threads

Returns:

generator

If the number of threads is None, it is replaced by os.cpu_count() or 1 (see multiprocessing.pool).

mapper

<<<

from teachcompute.fctmr.pyparallel_fctmr import pyparallel_mapper

res = pyparallel_mapper(lambda x: x + 1, [4, 5])
print(list(res))

>>>

    /usr/lib/python3.10/multiprocessing/pool.py:268: ResourceWarning: unclosed running multiprocessing pool <multiprocessing.pool.ThreadPool state=RUN pool_size=8>
      _warn(f"unclosed running multiprocessing pool {self!r}",
    ResourceWarning: Enable tracemalloc to get the object allocation traceback
    [5, 6]

Unfortunately, the parallelization does not follow the map/reduce concept, in the sense that the function builds an intermediate list and then creates an iterator on it.
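Both the intermediate list and the ResourceWarning in the output above can be illustrated with a minimal sketch built directly on multiprocessing.pool.ThreadPool. The names toy_pyparallel_mapper and toy_lazy_mapper are hypothetical, not the library's code. Using the pool as a context manager terminates it on exit, which avoids the unclosed-pool warning. toy_lazy_mapper uses Pool.imap, which yields results in input order as they become available instead of returning a full list, although the pool may still read ahead in the input.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterable, Iterator, Optional


def toy_pyparallel_mapper(fct: Callable, gen: Iterable,
                          threads: Optional[int] = None) -> Iterator:
    # ThreadPool(None) picks a default thread count from the CPU count.
    with ThreadPool(threads) as pool:
        # pool.map builds the full intermediate list of results.
        results = pool.map(fct, gen)
    # Leaving the with block terminated the pool: no unclosed-pool warning.
    return iter(results)


def toy_lazy_mapper(fct: Callable, gen: Iterable,
                    threads: Optional[int] = None) -> Iterator:
    with ThreadPool(threads) as pool:
        # imap yields results one by one, in input order.
        yield from pool.imap(fct, gen)


print(list(toy_pyparallel_mapper(lambda x: x + 1, [4, 5])))  # [5, 6]
print(list(toy_lazy_mapper(lambda x: x + 1, [4, 5], threads=2)))  # [5, 6]
```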