Data sources

Wikipedia

mlstatpy.data.wikipedia.download_dump(country, name, folder='.', unzip=True, timeout=-1, overwrite=False)

Parameters:

mlstatpy.data.wikipedia.download_pageviews(dt, folder='.', unzip=True, timeout=-1, overwrite=False)

Downloads the Wikipedia page view counts for a given date and hour. The URL follows the pattern:

https://dumps.wikimedia.org/other/pageviews/%Y/%Y-%m/pagecounts-%Y%m%d-%H0000.gz

Parameters:

Returns:

filename

More information is available on the pageviews page.
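To illustrate the URL pattern above, the following minimal sketch expands it with `datetime.strftime` for a given hour. The helper `pageviews_url` is hypothetical and not part of mlstatpy; it only reproduces the pattern exactly as documented here.

```python
from datetime import datetime

# URL pattern as documented for download_pageviews.
PAGEVIEWS_PATTERN = (
    "https://dumps.wikimedia.org/other/pageviews/"
    "%Y/%Y-%m/pagecounts-%Y%m%d-%H0000.gz"
)


def pageviews_url(dt: datetime) -> str:
    """Hypothetical helper: fill the documented pattern for the hour of ``dt``."""
    # strftime replaces %Y, %m, %d and %H; minutes and seconds are ignored
    # because the pattern hard-codes "0000" after the hour.
    return dt.strftime(PAGEVIEWS_PATTERN)


print(pageviews_url(datetime(2016, 5, 1, 10, 45)))
# prints https://dumps.wikimedia.org/other/pageviews/2016/2016-05/pagecounts-20160501-100000.gz
```

Because the pattern stops at the hour, two datetimes within the same hour map to the same file.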

mlstatpy.data.wikipedia.download_titles(country, folder='.', unzip=True, timeout=-1, overwrite=False)

Parameters:

country: country
folder: where to download
unzip: whether to unzip the file
timeout: timeout
overwrite: whether to overwrite an existing file
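The `folder`, `unzip`, `timeout` and `overwrite` parameters shared by these download functions suggest a download-then-decompress workflow. The sketch below illustrates that workflow with the standard library only. `download_and_unzip` is a hypothetical helper, not mlstatpy's implementation, and it assumes that a negative `timeout` means no explicit timeout and that `overwrite=False` reuses files already present in `folder`.

```python
import gzip
import os
import shutil
import urllib.request


def download_and_unzip(url, folder=".", unzip=True, timeout=-1, overwrite=False):
    """Hypothetical helper: download ``url`` into ``folder``, optionally gunzip it."""
    os.makedirs(folder, exist_ok=True)
    name = os.path.join(folder, url.split("/")[-1])
    # Assumption: a negative timeout means "use urllib's default (no explicit timeout)".
    kwargs = {} if timeout is None or timeout < 0 else {"timeout": timeout}
    # Download only if the file is missing or overwriting is requested.
    if overwrite or not os.path.exists(name):
        with urllib.request.urlopen(url, **kwargs) as resp, open(name, "wb") as out:
            shutil.copyfileobj(resp, out)
    if not unzip or not name.endswith(".gz"):
        return name
    # Decompress next to the archive, dropping the ".gz" suffix.
    target = name[:-3]
    if overwrite or not os.path.exists(target):
        with gzip.open(name, "rb") as src, open(target, "wb") as out:
            shutil.copyfileobj(src, out)
    return target
```

Returning the path of the decompressed file (rather than the archive) when `unzip=True` is a design choice of this sketch.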

mlstatpy.data.wikipedia.enumerate_titles(filename, norm=True, encoding='utf8')

Enumerates titles from a file.

Parameters:

filename: name of the file to read
norm: whether to normalize the titles inside the function
encoding: encoding of the file
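A generator is a natural fit for enumerating titles from a potentially large file one line at a time. The sketch below assumes the file holds one title per line. `enumerate_titles_sketch` is hypothetical, and its normalization (replacing underscores with spaces and stripping surrounding whitespace) is an assumption, not necessarily what `norm=True` does in mlstatpy.

```python
def enumerate_titles_sketch(filename, norm=True, encoding="utf8"):
    """Hypothetical sketch: yield one title per non-empty line of ``filename``."""
    with open(filename, "r", encoding=encoding) as f:
        for line in f:
            title = line.rstrip("\r\n")
            if norm:
                # Assumption: normalization = underscores to spaces, then strip.
                title = title.replace("_", " ").strip()
            # Skip empty lines (an assumption of this sketch).
            if title:
                yield title
```

Since titles are yielded lazily, the whole file never needs to fit in memory.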