Set of utility functions supporting the frenchtext library.

A lot of the code in this module is adapted from the excellent fastai2 library.

fastai2 was originally written by Jeremy Howard and Sylvain Gugger. Thanks, Jeremy and Sylvain!

External dependencies:

pip install requests

pip install fastprogress

Config

class Config

Config()

config.datasets
PosixPath('/home/laurent/.frenchtext/datasets')
config.libdata
PosixPath('/mnt/c/Users/laure/OneDrive/Dev/Python/frenchtext/frenchtext/data')
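
The two attributes shown above are plain pathlib paths: one per-user folder for downloaded datasets, and one folder for data files shipped with the library. As a rough illustration only, a class exposing the same two attributes could look like the sketch below. The name ConfigSketch, its constructor parameters, and the ensure_dirs helper are hypothetical and are not part of frenchtext; the real Config may compute its paths differently.

```python
from pathlib import Path


class ConfigSketch:
    """Hypothetical stand-in for Config: exposes two directory paths."""

    def __init__(self, home=None, package_dir=None):
        # Both parameters are assumptions made for this sketch, so that the
        # paths can be redirected (for example to a temporary folder).
        home = Path(home) if home is not None else Path.home()
        package_dir = Path(package_dir) if package_dir is not None else Path.cwd()
        # Per-user folder for downloaded datasets: <home>/.frenchtext/datasets
        self.datasets = home / ".frenchtext" / "datasets"
        # Data files bundled with the library, assumed to live in "data"
        self.libdata = package_dir / "data"

    def ensure_dirs(self):
        """Create the datasets folder if it is missing, then return self."""
        self.datasets.mkdir(parents=True, exist_ok=True)
        return self
```

Because both attributes are Path objects, they combine with the / operator, as in config.datasets / "assurance.dataset.zip" further down this page.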

Downloading

download_url

download_url(url, dest, file_size=0, overwrite=False, pbar=None, show_progress=True, chunk_size=1048576, timeout=10, retries=3)

Download url to dest, unless dest already exists and overwrite is False.
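
To make the skip-if-present, chunked-read and retry behaviour concrete, here is a minimal sketch of a function with a similar contract. It is not the library's implementation. It uses the standard library's urllib.request instead of requests so that it runs without extra packages, it leaves out the progress bar and the file_size check, and the name download_url_sketch, its return value, and the ".part" temporary file are assumptions made for this example.

```python
import time
import urllib.request
from pathlib import Path


def download_url_sketch(url, dest, overwrite=False, chunk_size=1024 * 1024,
                        timeout=10, retries=3):
    """Download url to dest unless dest exists and overwrite is False.

    Returns True if a download happened, False if it was skipped.
    """
    dest = Path(dest)
    if dest.exists() and not overwrite:
        return False  # keep the existing file untouched
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a failed download never leaves
    # a truncated file under the final name.
    tmp = dest.with_name(dest.name + ".part")
    last_error = None
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response, \
                    open(tmp, "wb") as out:
                while True:
                    chunk = response.read(chunk_size)  # at most chunk_size bytes
                    if not chunk:
                        break
                    out.write(chunk)
            tmp.replace(dest)  # move into place once the file is complete
            return True
        except OSError as error:  # URLError and socket timeouts are OSErrors
            last_error = error
            if attempt < retries - 1:
                time.sleep(1)  # short pause before the next attempt
    raise RuntimeError(
        f"could not download {url} after {retries} attempts") from last_error
```

Reading in fixed-size chunks keeps memory use bounded no matter how large the archive is, which matters for dataset files of several hundred megabytes.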

source = "https://onedrive.live.com/download?cid=196F0B5AFCED95CA&resid=196F0B5AFCED95CA%21468236&authkey=AJ_vuj54LGPSenQ"
dest = config.datasets / "assurance.dataset.zip"
file_size = 18223939

download_url(source,dest,file_size)
Extracting assurance.dataset.zip (this may last several seconds) ...
OK
!ls -l {config.datasets}
total 12487508
-rw-rw-rw- 1 laurent laurent   91136056 Mar  1 18:27 assurance.dataset.feather
-rw-rw-rw- 1 laurent laurent  144616048 Feb 15 21:45 banque.dataset.feather
-rw-rw-rw- 1 laurent laurent  186553136 Feb 15 21:47 bourse.dataset.feather
-rw-rw-rw- 1 laurent laurent  145062520 Feb 15 21:48 comparateur.dataset.feather
-rw-rw-rw- 1 laurent laurent   11902488 Feb 15 21:48 crédit.dataset.feather
-rw-rw-rw- 1 laurent laurent  962874856 Feb 15 21:50 forum.dataset.feather
-rw-rw-rw- 1 laurent laurent   31609912 Feb 15 21:50 institution.dataset.feather
-rw-rw-rw- 1 laurent laurent  921930504 Feb 15 21:51 presse-1.dataset.feather
-rw-rw-rw- 1 laurent laurent  855158864 Feb 15 21:54 presse-2.dataset.feather
-rw-rw-rw- 1 laurent laurent  809591952 Feb 15 21:55 presse-3.dataset.feather
-rw-rw-rw- 1 laurent laurent  958970872 Feb 15 21:56 presse-4.dataset.feather
-rw-rw-rw- 1 laurent laurent 1153696120 Feb 15 22:01 presse-5.dataset.feather
-rw-rw-rw- 1 laurent laurent 1452331608 Feb 15 22:09 presse-6.dataset.feather
-rw-rw-rw- 1 laurent laurent  543178032 Feb 15 22:10 siteinfo.dataset.feather
-rw-rw-rw- 1 laurent laurent  549966224 Feb 15 22:10 wikipedia-1.dataset.feather
-rw-rw-rw- 1 laurent laurent  729072464 Feb 15 22:12 wikipedia-2.dataset.feather
-rw-rw-rw- 1 laurent laurent 1069759688 Feb 15 22:13 wikipedia-3.dataset.feather
-rw-rw-rw- 1 laurent laurent 1086691712 Feb 15 22:16 wikipedia-4.dataset.feather
-rw-rw-rw- 1 laurent laurent 1083060912 Feb 15 22:17 wikipedia-5.dataset.feather
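
The !ls -l line above is IPython shell syntax and depends on a Unix-like shell. On other platforms, a similar listing can be produced with pathlib alone. The helper below is a small sketch; the name list_datasets and the default pattern are assumptions for this example, not part of frenchtext.

```python
from pathlib import Path


def list_datasets(folder, pattern="*.dataset.feather"):
    """Return (file name, size in bytes) pairs for matching files, sorted by name."""
    return sorted((p.name, p.stat().st_size) for p in Path(folder).glob(pattern))
```

For example, `for name, size in list_datasets(config.datasets): print(f"{size:>12,d}  {name}")` prints one line per dataset file with its size in bytes.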