Installation and run
Installation
Depdendencies can be installed using pip.
pip install -r requirements.txt
Run
In order to run the analysis we simply execute the python program. This will create a folder named output containing the parquet database and the final html with the interactive visualization.
python -m COVID19_project
Run tests
Each transformation has its corresponding test. It is possible to run them with:
python -m unittest tests/test_*.py
Run in docker
It is possible to build a docker container from the dockerfile suministrated in the repository. This docker image is build uppon the jupyter-spark image and it comes with a jupyter lab interface. In order to build the image you can run:
docker build caviri/covid19:latest .
The in order to run the docker image you need to tunnel the ports. Jupyter uses 8888 port, and the pySpark UI uses 4040 ports.
docker run -p 10001:8888 -p 4041:4040 caviri/covid19
As an alternative to build your own image it is possible to pull a image from docker hub:
docker pull caviri/covid19:latest