Data sources
Wikipedia
- mlstatpy.data.wikipedia.download_dump(country, name, folder='.', unzip=True, timeout=-1, overwrite=False)
Downloads Wikipedia dumps.
- Parameters:
country – country
name – name of the stream to download
folder – where to download
unzip – unzip the file
timeout – timeout
overwrite – overwrite
- mlstatpy.data.wikipedia.download_pageviews(dt, folder='.', unzip=True, timeout=-1, overwrite=False)
Downloads Wikipedia page counts for a specific date and hour. The URL follows the pattern:
https://dumps.wikimedia.org/other/pageviews/%Y/%Y-%m/pagecounts-%Y%m%d-%H0000.gz
- Parameters:
dt – datetime (date and hour of the file to download)
folder – where to download
unzip – unzip the file
timeout – timeout
overwrite – overwrite
- Returns:
filename
More information is available on the pageviews page.
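The URL pattern above uses `strftime` directives, so it can be expanded directly from a `datetime`. The sketch below only illustrates that expansion; the helper name `pageviews_url` is hypothetical and is not part of mlstatpy, and nothing is downloaded.

```python
from datetime import datetime

# URL pattern quoted in the documentation above (strftime directives).
PAGEVIEWS_PATTERN = (
    "https://dumps.wikimedia.org/other/pageviews/"
    "%Y/%Y-%m/pagecounts-%Y%m%d-%H0000.gz"
)


def pageviews_url(dt: datetime) -> str:
    """Hypothetical helper: expand the pattern for a given date and hour."""
    # Minutes and seconds of dt are ignored: the pattern hard-codes "0000".
    return dt.strftime(PAGEVIEWS_PATTERN)


url = pageviews_url(datetime(2016, 5, 1, 13))
print(url)
```

Because the pattern stops at the hour, two datetimes within the same hour map to the same file.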
- mlstatpy.data.wikipedia.download_titles(country, folder='.', unzip=True, timeout=-1, overwrite=False)
Downloads Wikipedia titles from dumps.wikimedia.org/frwiki/latest/latest-all-titles-in-ns0.gz.
- Parameters:
country – country
folder – where to download
unzip – unzip the file
timeout – timeout
overwrite – overwrite
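All three functions accept `unzip` and `overwrite`, and the files they fetch are `.gz` archives. The sketch below shows what such a decompression step could look like with the standard library alone. It is an assumption about the general behavior, not mlstatpy's implementation, and the helper name `gunzip_file` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path: str, overwrite: bool = False) -> str:
    """Hypothetical helper: decompress `name.gz` to `name`, return the new path."""
    src = Path(gz_path)
    dest = src.with_suffix("")  # drops the trailing ".gz"
    # Assumed semantics: keep an existing file unless overwrite is requested.
    if dest.exists() and not overwrite:
        return str(dest)
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # streams, so large dumps fit in memory
    return str(dest)
```

Streaming with `shutil.copyfileobj` avoids loading the whole archive at once, which matters for dump files that can be large.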