teachcompute.fctmr¶
simple map reduce functions¶
combiner¶
- teachcompute.fctmr.simplefctmr.combiner(fctkey1: Callable, gen1: Iterable, fctkey2: Callable, gen2: Iterable, how: str = 'inner') → Iterable [source]¶
Joins (or combines) two generators. The function is implemented with two reducers. It is more efficient when the groups of the second set gen2 are short, as each of them is held in memory.
- Parameters:
fctkey1 – function which returns the key for gen1
gen1 – generator for the first element
fctkey2 – function which returns the key for gen2
gen2 – generator for the second element
how – inner, outer, left or right (a sketch of a left join follows the example below)
- Returns:
generator
combiner or join
<<<
from teachcompute.fctmr import combiner


def c0(el):
    return el[0]


ens1 = [("a", 1), ("b", 2), ("a", 3)]
ens2 = [("a", 10), ("b", 20), ("a", 30)]
res = combiner(c0, ens1, c0, ens2)
print(list(res))
>>>
[(('a', 1), ('a', 10)), (('a', 1), ('a', 30)), (('a', 3), ('a', 10)), (('a', 3), ('a', 30)), (('b', 2), ('b', 20))]
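The how parameter selects the join type. Below is a minimal sketch of a left join, under the assumption that every element of ens1 is kept even when it has no match in ens2; how the unmatched side is represented is not documented here, so the sketch simply prints whatever the function yields.
<<<
from teachcompute.fctmr import combiner


def c0(el):
    return el[0]


ens1 = [("a", 1), ("b", 2), ("c", 3)]   # ("c", 3) has no match in ens2
ens2 = [("a", 10), ("b", 20)]

# "left" presumably keeps every element of ens1, matched or not
res = combiner(c0, ens1, c0, ens2, how="left")
print(list(res))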
ffilter¶
- teachcompute.fctmr.simplefctmr.ffilter(fct: Callable, gen: Iterable) → Iterable [source]¶
Filters out elements from a generator.
- Parameters:
fct – function
gen – generator
- Returns:
generator
filter
<<<
from teachcompute.fctmr import ffilter

res = ffilter(lambda x: x % 2 == 0, [4, 5])
print(list(res))
>>>
[4]
mapper¶
- teachcompute.fctmr.simplefctmr.mapper(fct: Callable, gen: Iterable) → Iterable [source]¶
Applies function fct to a generator.
- Parameters:
fct – function
gen – generator
- Returns:
generator
mapper
<<<
from teachcompute.fctmr import mapper

res = mapper(lambda x: x + 1, [4, 5])
print(list(res))
>>>
[5, 6]
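Because mapper returns a generator, the computation is deferred until the result is consumed. A small sketch illustrating that laziness; the trace function is only an illustration, not part of the library.
<<<
from teachcompute.fctmr import mapper


def trace(x):
    print("computing", x)
    return x + 1


res = mapper(trace, [4, 5])
print("before consumption")  # if mapper is lazy, trace has not run yet
print(list(res))             # the calls to trace happen while the list is built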
What is the difference between an iterator and a generator?
An iterator and a generator both produce elements drawn from a collection. The difference is that an iterator walks through the elements of a collection that already exists in memory, whereas a generator produces or computes the elements of a collection that does not exist in memory. Consequently, traversing a collection twice with an iterator only costs reading it again, while with a generator every element has to be computed a second time. That cost is unpredictable, and it is sometimes better to cache the elements so they can be traversed several times: this amounts to turning the generator into an iterator (a sketch follows the example below). A generator is often defined as follows in Python:
<<<
def generate(some_iterator):
    for el in some_iterator:
        yield el


g = generate([4, 5])
print(list(g))
print(g.__class__.__name__)
>>>
[4, 5]
generator
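When the elements need to be traversed several times, the generator can be cached, for instance into a list, which turns it into an ordinary in-memory iterable; a minimal sketch:
<<<
def generate(some_iterator):
    for el in some_iterator:
        yield el


g = generate([4, 5])
cached = list(g)   # materializes (caches) every element once
print(cached)      # the list can be traversed again at no extra cost
print(cached)
print(list(g))     # [] : the generator itself is exhausted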
reducer¶
- teachcompute.fctmr.simplefctmr.reducer(fctkey: Callable, gen: Iterable, asiter: bool = True, sort: bool = True) → Iterable [source]¶
Implements a reducer.
- Parameters:
fctkey – function which returns the key
gen – generator
asiter – returns an iterator over each group instead of the group itself (a sketch with asiter=True follows the example below)
sort – sort elements by key before grouping
- Returns:
generator
reducer
<<<
from teachcompute.fctmr import reducer

res = reducer(lambda x: x[0], [("a", 1), ("b", 2), ("a", 3)], asiter=False)
print(list(res))
>>>
[('a', [('a', 1), ('a', 3)]), ('b', [('b', 2)])]
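With asiter=True (the default), each group is assumed to come back as a (key, iterator) pair rather than a (key, list) pair, so it has to be materialized to be displayed; a sketch under that assumption:
<<<
from teachcompute.fctmr import reducer

res = reducer(lambda x: x[0], [("a", 1), ("b", 2), ("a", 3)], asiter=True)
# each group is assumed to be an iterator; convert it to a list to print it
print([(key, list(group)) for key, group in res])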
take¶
- teachcompute.fctmr.simplefctmr.take(gen: Iterable, count: int = 5, skip: int = 0) → Iterable [source]¶
Skips and takes elements from a generator.
- Parameters:
gen – generator
count – number of elements to consider
skip – skip the first elements
- Returns:
generator
take
<<<
from teachcompute.fctmr import take

res = take([4, 5, 6, 7, 8, 9], 2, 2)
print(list(res))
>>>
[6, 7]
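Since every function above returns a generator, they can be chained lazily; a minimal sketch combining ffilter, mapper, reducer and take, where nothing is computed until the final generator is consumed:
<<<
from teachcompute.fctmr import ffilter, mapper, reducer, take

data = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]

kept = ffilter(lambda el: el[1] % 2 == 0, data)        # keep even values
shifted = mapper(lambda el: (el[0], el[1] + 1), kept)  # increment them
grouped = reducer(lambda el: el[0], shifted, asiter=False)
print(list(take(grouped, count=2)))                    # first two groups only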
Parallel¶
fast_parallel_mapper¶
create_array_numba¶
pyparallel_mapper¶
- teachcompute.fctmr.pyparallel_fctmr.pyparallel_mapper(fct: Callable, gen: Iterable, threads: int | None = None) → Iterable [source]¶
Applies function fct to a generator. Relies on ThreadPool.
- Parameters:
fct – function
gen – generator
threads – number of threads
- Returns:
generator
If the number of threads is None, it is replaced by os.cpu_count() or 1 (see multiprocessing.pool).
mapper
<<<
from teachcompute.fctmr.pyparallel_fctmr import pyparallel_mapper

res = pyparallel_mapper(lambda x: x + 1, [4, 5])
print(list(res))
>>>
/usr/lib/python3.10/multiprocessing/pool.py:268: ResourceWarning: unclosed running multiprocessing pool <multiprocessing.pool.ThreadPool state=RUN pool_size=20>
  _warn(f"unclosed running multiprocessing pool {self!r}",
ResourceWarning: Enable tracemalloc to get the object allocation traceback
[5, 6]
Unfortunately, this parallelization does not follow the map/reduce concept, in the sense that the function builds an intermediate list and then returns an iterator on it.
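The number of threads can also be set explicitly; a short sketch with two worker threads:
<<<
from teachcompute.fctmr.pyparallel_fctmr import pyparallel_mapper

# two worker threads instead of os.cpu_count()
res = pyparallel_mapper(lambda x: x * x, [1, 2, 3, 4], threads=2)
print(list(res))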