transform module

transform.py

This Python module contains all transformations applied to the data.

transform.calc_daily_difference(df: DataFrame, input_name: str, output_name: str) → DataFrame

Calculate the daily difference of a column.

Parameters
  • df – Input Spark DataFrame.

  • input_name – Name of the column to process.

  • output_name – Name of the column in which to store the result. If input_name is equal to output_name, the column will be overwritten.

Returns

Transformed DataFrame.
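
The module's implementation is not reproduced here. As a rough illustration of the arithmetic only, the plain-Python sketch below assumes that "daily difference" means each day's value minus the previous day's value, with rows already ordered by date and no difference defined for the first row. The helper name daily_difference is hypothetical and operates on a list rather than a Spark DataFrame.

```python
from typing import List, Optional


def daily_difference(values: List[float]) -> List[Optional[float]]:
    """Return values[i] - values[i - 1] for each position.

    Assumption: the first row has no predecessor, so its difference is None.
    """
    if not values:
        return []
    result: List[Optional[float]] = [None]
    # Pair each value with its predecessor and subtract.
    for previous, current in zip(values, values[1:]):
        result.append(current - previous)
    return result


cumulative = [10, 12, 15, 15]  # hypothetical cumulative counts, one per day
print(daily_difference(cumulative))  # [None, 2, 3, 0]
```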

transform.calc_rolling_mean(df: DataFrame, temporal_window: int, input_name: str, output_name: str) → DataFrame

Calculate the rolling mean of a column.

Parameters
  • df – Input Spark DataFrame.

  • temporal_window – Size of the window used to calculate the rolling mean.

  • input_name – Name of the column to process.

  • output_name – Name of the column in which to store the result. If input_name is equal to output_name, the column will be overwritten.

Returns

Transformed DataFrame.
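
The documentation does not say whether the window is trailing or centered, or how the first rows are handled. The plain-Python sketch below assumes a trailing window of temporal_window rows (the current row and the rows before it) that simply shrinks at the start of the series. The helper name rolling_mean is hypothetical, and the sketch works on a list rather than a Spark DataFrame.

```python
from typing import List


def rolling_mean(values: List[float], temporal_window: int) -> List[float]:
    """Trailing rolling mean over the last `temporal_window` values.

    Assumption: at the start of the series the window contains only the
    values available so far.
    """
    if temporal_window < 1:
        raise ValueError("temporal_window must be at least 1")
    result = []
    for i in range(len(values)):
        start = max(0, i - temporal_window + 1)
        window = values[start:i + 1]
        result.append(sum(window) / len(window))
    return result


print(rolling_mean([1, 2, 3, 4, 5], 3))  # [1.0, 1.5, 2.0, 3.0, 4.0]
```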

transform.transform_col_date_to_datetime(df: DataFrame, input_name: str, output_name: str) → DataFrame

Transform a date column into a column of datetime type.

Parameters
  • df – Input Spark DataFrame.

  • input_name – Name of the column to update.

  • output_name – Name of the column in which to store the result. If input_name is equal to output_name, the column will be overwritten.

Returns

Transformed Spark DataFrame.

transform.transform_col_string_to_date(df: DataFrame, input_name: str, output_name: str) → DataFrame

Transform a column of date strings into a column of date type.

Parameters
  • df – Input Spark DataFrame.

  • input_name – Name of the column to update.

  • output_name – Name of the column in which to store the result. If input_name is equal to output_name, the column will be overwritten.

Returns

Transformed Spark DataFrame.
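
The expected string format is not stated in this documentation. As a per-value illustration, the sketch below assumes ISO-style strings such as "2020-03-15"; both the format and the helper name string_to_date are assumptions, and the module itself operates on a whole Spark column rather than a single string.

```python
from datetime import date, datetime

DATE_FORMAT = "%Y-%m-%d"  # assumption: the real input format may differ


def string_to_date(text: str, fmt: str = DATE_FORMAT) -> date:
    """Parse a date string and return a datetime.date."""
    return datetime.strptime(text, fmt).date()


print(string_to_date("2020-03-15"))  # 2020-03-15
```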

transform.transform_data(df: DataFrame) → DataFrame

Transform the original dataset.

Parameters

df – Input Spark DataFrame.

Returns

Transformed Spark DataFrame.

transform.transform_item_date_to_datetime(date: date) → datetime

Transform a date into a datetime.

It takes the minimum possible time of day and combines it with the date.

Parameters

date – Input date.

Returns

Datetime combining the input date with the minimum time.
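
A minimal sketch of this conversion using only the standard library is shown below. It assumes that the "minimum time" is datetime.min.time(), which is midnight. The helper name item_date_to_datetime and the parameter name day are hypothetical, and the module's own implementation may differ.

```python
from datetime import date, datetime


def item_date_to_datetime(day: date) -> datetime:
    """Combine a date with the minimum time of day (00:00:00)."""
    return datetime.combine(day, datetime.min.time())


print(item_date_to_datetime(date(2020, 3, 15)))  # 2020-03-15 00:00:00
```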