teachcompute.fctmr

simple map reduce functions

combiner

teachcompute.fctmr.simplefctmr.combiner(fctkey1: Callable, gen1: Iterable, fctkey2: Callable, gen2: Iterable, how: str = 'inner') → Iterable

Joins (or combines) two generators. The implementation relies on two reducers. It is more efficient when the groups of the second ensemble gen2 are the shorter ones, because each of them is held in memory.

Parameters:

fctkey1 – function which returns the key for gen1
gen1 – generator producing the first ensemble
fctkey2 – function which returns the key for gen2
gen2 – generator producing the second ensemble
how – type of join: 'inner', 'outer', 'left' or 'right'

Returns:

generator

combiner or join

<<<

from teachcompute.fctmr import combiner


def c0(el):
    return el[0]


ens1 = [("a", 1), ("b", 2), ("a", 3)]
ens2 = [("a", 10), ("b", 20), ("a", 30)]
res = combiner(c0, ens1, c0, ens2)
print(list(res))

>>>

    [(('a', 1), ('a', 10)), (('a', 1), ('a', 30)), (('a', 3), ('a', 10)), (('a', 3), ('a', 30)), (('b', 2), ('b', 20))]
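To make the idea of a join built on grouping more concrete, here is a minimal sketch of an inner join written with itertools only. The name sketch_inner_join is hypothetical and the code is not the library's implementation. As a simplification it keeps every group of the second input in a dictionary, whereas the description above suggests the real function holds one group of gen2 at a time.

```python
from itertools import groupby, product


def sketch_inner_join(fctkey1, gen1, fctkey2, gen2):
    """Hypothetical inner join; not the library's actual code."""
    # Sort first so that groupby sees each key exactly once.
    groups1 = groupby(sorted(gen1, key=fctkey1), key=fctkey1)
    # Assumption: all groups of the second input fit in memory.
    groups2 = {
        key: list(group)
        for key, group in groupby(sorted(gen2, key=fctkey2), key=fctkey2)
    }
    for key, group1 in groups1:
        # Pair every element of a group with every element sharing its key.
        yield from product(group1, groups2.get(key, []))


def c0(el):
    return el[0]


ens1 = [("a", 1), ("b", 2), ("a", 3)]
ens2 = [("a", 10), ("b", 20), ("a", 30)]
res = list(sketch_inner_join(c0, ens1, c0, ens2))
print(res)
```

On this input the sketch produces the same pairs as the combiner example above. Keys present in only one input are dropped, which is what an inner join is expected to do.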

ffilter

teachcompute.fctmr.simplefctmr.ffilter(fct: Callable, gen: Iterable) → Iterable

Filters a generator: keeps the elements for which fct returns True.

Parameters:

fct – function deciding whether an element is kept
gen – generator

Returns:

generator

filter

<<<

from teachcompute.fctmr import ffilter

res = ffilter(lambda x: x % 2 == 0, [4, 5])
print(list(res))

>>>

    [4]

mapper

teachcompute.fctmr.simplefctmr.mapper(fct: Callable, gen: Iterable) → Iterable

Applies function fct to every element of a generator.

Parameters:

fct – function
gen – generator

Returns:

generator

mapper

<<<

from teachcompute.fctmr import mapper

res = mapper(lambda x: x + 1, [4, 5])
print(list(res))

>>>

    [5, 6]

Difference between an iterator and a generator?

An iterator and a generator both produce elements from a set. The difference is that an iterator walks through the elements of a set that exists in memory, whereas a generator produces or computes the elements of a set that does not exist in memory. Consequently, going through a set twice with an iterator costs $O(n)$, whereas with a generator the cost of computing each element a second time must be added. That cost is unpredictable, and it is sometimes preferable to cache the elements in order to go through them several times: this amounts to turning a generator into an iterator. In Python, a generator is often defined as follows:

<<<

def generate(some_iterator):
    for el in some_iterator:
        yield el


g = generate([4, 5])
print(list(g))
print(g.__class__.__name__)

>>>

    [4, 5]
    generator
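The point about going through a set twice can be illustrated with a short sketch. The function squares is a hypothetical example, not part of the library. It shows that a generator object can be consumed only once, and that caching its elements in a list allows several passes without recomputing them.

```python
def squares(n):
    # Each value is computed on demand; nothing is stored.
    for i in range(n):
        yield i * i


g = squares(4)
first_pass = list(g)   # consumes the generator
second_pass = list(g)  # nothing is left to produce

# Caching the elements once allows as many passes as needed.
cached = list(squares(4))
total_twice = sum(cached) + sum(cached)

print(first_pass)
print(second_pass)
print(total_twice)
```

The second pass returns an empty list: to iterate again without a cache, the generator has to be created again, and every element is computed a second time.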

reducer

teachcompute.fctmr.simplefctmr.reducer(fctkey: Callable, gen: Iterable, asiter: bool = True, sort: bool = True) → Iterable

Implements a reducer: groups the elements of a generator by key.

Parameters:

fctkey – function which returns the key
gen – generator
asiter – returns an iterator over the elements of each group instead of the group itself
sort – sort elements by key before grouping

Returns:

generator

reducer

<<<

from teachcompute.fctmr import reducer

res = reducer(lambda x: x[0], [("a", 1), ("b", 2), ("a", 3)], asiter=False)
print(list(res))

>>>

    [('a', [('a', 1), ('a', 3)]), ('b', [('b', 2)])]
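The role of the sort parameter is easier to see on a small sketch built on itertools.groupby. The function sketch_reducer is hypothetical and only mimics the documented parameters. How the library behaves with sort=False, and the exact type returned when asiter=True, are assumptions here.

```python
from itertools import groupby


def sketch_reducer(fctkey, gen, asiter=True, sort=True):
    """Hypothetical reducer; the library's real code may differ."""
    # groupby only merges consecutive equal keys, hence the optional sort.
    items = sorted(gen, key=fctkey) if sort else gen
    for key, group in groupby(items, key=fctkey):
        # Assumption: asiter=True yields a lazy iterator, asiter=False a list.
        yield (key, group) if asiter else (key, list(group))


data = [("a", 1), ("b", 2), ("a", 3)]
with_sort = list(sketch_reducer(lambda x: x[0], data, asiter=False))
without_sort = list(sketch_reducer(lambda x: x[0], data, asiter=False, sort=False))
print(with_sort)
print(without_sort)
```

Without sorting, the key "a" appears in two separate groups because its elements are not adjacent in the input. Sorting costs more but guarantees one group per key.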

take

teachcompute.fctmr.simplefctmr.take(gen: Iterable, count: int = 5, skip: int = 0) → Iterable

Skips the first elements of a generator, then takes the following ones.

Parameters:

gen – generator
count – number of elements to take
skip – number of elements to skip at the beginning

Returns:

generator

take

<<<

from teachcompute.fctmr import take

res = take([4, 5, 6, 7, 8, 9], 2, 2)
print(list(res))

>>>

    [6, 7]
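The same skip-then-take behavior can be sketched with itertools.islice from the standard library. The function sketch_take is a hypothetical equivalent written for illustration, not the library's code.

```python
from itertools import islice


def sketch_take(gen, count=5, skip=0):
    # islice(iterable, start, stop) drops `skip` items,
    # then yields at most `count` items, all lazily.
    return islice(gen, skip, skip + count)


taken = list(sketch_take(iter([4, 5, 6, 7, 8, 9]), 2, 2))
print(taken)
```

Because islice is lazy, the elements after the requested window are never read from the input.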

Parallel

fast_parallel_mapper

teachcompute.fctmr.fast_parallel_fctmr.fast_parallel_mapper(fct: Callable, gen: Iterable, chunk_size: int = 100000, parallel: bool = True, nogil: bool = False, nopython: bool = True, sigin: str | None = None, sigout: str | None = None) → Iterable

Parallelizes a mapper with numba, more specifically with the feature described in Automatic parallelization with @jit. That page of the numba documentation explains what numba optimizes when it parallelizes a map.

Parameters:

fct – function
gen – generator
chunk_size – chunk size
parallel – see numba option parallel
nopython – see numba option nopython
nogil – see numba option nogil
sigin – signature of input type
sigout – signature of output type

Returns:

generator

Parallelization is only possible when the array is known, so the function splits the input into chunks of size chunk_size. This attempt is not very efficient because the mapper is generic, and Python is not well suited to this task. See unit test test_parallel_fctmr.py.
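The chunking step can be sketched with the standard library alone. The helpers iter_chunks and sketch_chunked_mapper are hypothetical names, and the sketch deliberately leaves numba out: where the real function would presumably hand each chunk to a compiled kernel, the code below uses a plain Python loop.

```python
from itertools import islice


def iter_chunks(gen, chunk_size):
    """Yields successive lists of at most chunk_size elements."""
    it = iter(gen)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def sketch_chunked_mapper(fct, gen, chunk_size=3):
    for chunk in iter_chunks(gen, chunk_size):
        # Assumption: a compiled, parallel kernel would process the
        # whole chunk here; this sketch maps element by element.
        for value in chunk:
            yield fct(value)


chunks = list(iter_chunks(range(7), 3))
mapped = list(sketch_chunked_mapper(lambda x: x + 1, range(7), chunk_size=3))
print(chunks)
print(mapped)
```

Only one chunk is held in memory at a time, which keeps the generator-like behavior while giving a parallel kernel an array of known size to work on.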

create_array_numba

teachcompute.fctmr.fast_parallel_fctmr.create_array_numba(nb: int, sig: str) → ndarray

Creates an array of size nb knowing its signature.

Parameters:

nb – number of elements
sig – signature, e.g. 'f8'

Returns:

container

pyparallel_mapper

teachcompute.fctmr.pyparallel_fctmr.pyparallel_mapper(fct: Callable, gen: Iterable, threads: int | None = None) → Iterable

Applies function fct to every element of a generator. Relies on ThreadPool.

Parameters:

fct – function
gen – generator
threads – number of threads

Returns:

generator

If the number of threads is None, it is replaced by os.cpu_count() or 1 (see multiprocessing.pool).

mapper

<<<

from teachcompute.fctmr.pyparallel_fctmr import pyparallel_mapper

res = pyparallel_mapper(lambda x: x + 1, [4, 5])
print(list(res))

>>>

    /usr/lib/python3.12/multiprocessing/pool.py:268: ResourceWarning: unclosed running multiprocessing pool <multiprocessing.pool.ThreadPool state=RUN pool_size=20>
      _warn(f"unclosed running multiprocessing pool {self!r}",
    ResourceWarning: Enable tracemalloc to get the object allocation traceback
    [5, 6]

Unfortunately, the parallelization does not follow the map/reduce concept, in the sense that the function builds an intermediate list and then creates an iterator over it.
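A minimal sketch of a thread-based mapper shows where the intermediate list comes from. The function sketch_thread_mapper is hypothetical and is not the library's code. It uses multiprocessing.pool.ThreadPool inside a with block, so the pool is closed once the work is done.

```python
from multiprocessing.pool import ThreadPool


def sketch_thread_mapper(fct, gen, threads=None):
    """Hypothetical thread-based mapper; not the library's code."""
    # The with block shuts the pool down when the mapping is finished.
    with ThreadPool(threads) as pool:
        # pool.map consumes the whole input and returns a complete list:
        # this is the intermediate list mentioned above.
        results = pool.map(fct, gen)
    return iter(results)


mapped = list(sketch_thread_mapper(lambda x: x + 1, [4, 5], threads=2))
print(mapped)
```

Every result exists in memory before the first one is returned, so the output is an iterator over a list rather than a true generator.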