Installation and run

Installation

Dependencies can be installed using pip.

pip install -r requirements.txt

Run

To run the analysis, execute the Python package as a module. This creates a folder named output containing the Parquet database and the final HTML file with the interactive visualization.

python -m COVID19_project
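
Once the run finishes, the contents of the output folder can be checked with a short script. The following is a minimal sketch and not part of the project: the helper name list_outputs is hypothetical, and the only assumption is that the folder is called output, as described above. No particular file names inside it are assumed.

```python
from pathlib import Path


def list_outputs(folder="output"):
    """Return the relative paths of all files under `folder`, sorted.

    `list_outputs` is a hypothetical helper, not part of COVID19_project.
    """
    root = Path(folder)
    if not root.is_dir():
        # The analysis has not been run yet, or it wrote somewhere else.
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    for name in list_outputs():
        print(name)
```

An empty listing suggests that the run did not complete or that it was started from a different working directory.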

Run tests

Each transformation has a corresponding test. Run them all with:

python -m unittest tests/test_*.py
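
To illustrate the one-test-per-transformation layout, here is a minimal sketch of what such a test file could look like. The transformation cumulative_to_daily and the test class are hypothetical and written in plain Python so the example runs without Spark; the actual transformations and tests in the repository may be structured differently.

```python
import unittest


def cumulative_to_daily(cumulative):
    """Hypothetical transformation: turn cumulative counts into daily increments."""
    daily = []
    previous = 0
    for value in cumulative:
        daily.append(value - previous)
        previous = value
    return daily


class TestCumulativeToDaily(unittest.TestCase):
    """Hypothetical test case covering the transformation above."""

    def test_increments(self):
        self.assertEqual(cumulative_to_daily([1, 3, 6]), [1, 2, 3])

    def test_empty_input(self):
        self.assertEqual(cumulative_to_daily([]), [])
```

Saved under a name that matches the tests/test_*.py pattern, a file like this would be picked up by the command above.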

Run in docker

A Docker image can be built from the Dockerfile provided in the repository. The image is built upon the jupyter-spark image and comes with a JupyterLab interface. To build the image, run:

docker build -t caviri/covid19:latest .

Then, to run the Docker image, you need to publish the container ports on the host. Jupyter uses port 8888 and the PySpark UI uses port 4040; the command below maps them to host ports 10001 and 4041, respectively.

docker run -p 10001:8888 -p 4041:4040 caviri/covid19
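
With the container running, a quick way to confirm the port mapping is to try a TCP connection to each host port. The following is a minimal sketch using only the standard library. The helper name port_is_open is hypothetical, and the host ports 10001 and 4041 are taken from the docker run command above. Note that the Spark UI normally answers only while a Spark session is active, so a closed 4041 does not necessarily indicate a wrong mapping.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Host ports taken from the docker run command above.
    for name, port in [("Jupyter", 10001), ("PySpark UI", 4041)]:
        state = "open" if port_is_open(port) else "closed"
        print(f"{name} on port {port}: {state}")
```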

As an alternative to building your own image, you can pull an image from Docker Hub:

docker pull caviri/covid19:latest