transform module
transform.py
This Python module contains all transformations applied to the data.
- transform.calc_daily_difference(df: DataFrame, input_name: str, output_name: str) → DataFrame
Calculation of the daily difference.
- Parameters
df – Input Spark DataFrame.
input_name – Name of the column to process.
output_name – Name of the column in which to store the result. If input_name equals output_name, the column is overwritten.
- Returns
Transformed DataFrame.
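The arithmetic behind a daily difference can be illustrated without Spark. The sketch below is not the module's implementation: it uses a plain Python list, the helper name `daily_difference` is hypothetical, and it assumes that rows are already ordered by date and that the first row, which has no predecessor, yields `None`.

```python
from typing import List, Optional


def daily_difference(values: List[float]) -> List[Optional[float]]:
    """Return values[i] - values[i - 1] for each position i.

    Hypothetical helper: assumes the values are already sorted by date.
    """
    if not values:
        return []
    # Assumption: the first day has no previous day, so its difference is None.
    diffs: List[Optional[float]] = [None]
    for previous, current in zip(values, values[1:]):
        diffs.append(current - previous)
    return diffs


# Example: a cumulative count turned into day-over-day increments.
cumulative = [10, 15, 15, 22]
print(daily_difference(cumulative))  # [None, 5, 0, 7]
```

In a Spark DataFrame the same idea is typically expressed by comparing each row with the previous row in date order; how the module handles the first row is not stated here.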
- transform.calc_rolling_mean(df: DataFrame, temporal_window: int, input_name: str, output_name: str) → DataFrame
Calculation of the rolling mean.
- Parameters
df – Input Spark DataFrame.
temporal_window – The size of the window when calculating the rolling mean.
input_name – Name of the column to process.
output_name – Name of the column in which to store the result. If input_name equals output_name, the column is overwritten.
- Returns
Transformed DataFrame.
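To make the role of `temporal_window` concrete, here is a minimal pure-Python sketch of a rolling mean. It is an illustration only, not the module's Spark code: the helper name `rolling_mean` is hypothetical, and it assumes a trailing window (the current row plus the preceding `temporal_window - 1` rows) that is simply shorter at the start of the series. The actual function may define the window differently.

```python
from typing import List


def rolling_mean(values: List[float], temporal_window: int) -> List[float]:
    """Mean of each value and up to temporal_window - 1 preceding values.

    Hypothetical helper: assumes a trailing window over date-ordered values.
    """
    if temporal_window < 1:
        raise ValueError("temporal_window must be at least 1")
    means: List[float] = []
    for i in range(len(values)):
        # Assumption: near the start, the window holds fewer values.
        start = max(0, i - temporal_window + 1)
        window = values[start:i + 1]
        means.append(sum(window) / len(window))
    return means


print(rolling_mean([2, 4, 6, 8], temporal_window=2))  # [2.0, 3.0, 5.0, 7.0]
```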
- transform.transform_col_date_to_datetime(df: DataFrame, input_name: str, output_name: str) → DataFrame
Convert a date column to a column of datetime type.
- Parameters
df – Input Spark DataFrame.
input_name – Name of the column to update.
output_name – Name of the column in which to store the result. If input_name equals output_name, the column is overwritten.
- Returns
Transformed Spark DataFrame.
- transform.transform_col_string_to_date(df: DataFrame, input_name: str, output_name: str) → DataFrame
Convert a date column stored as strings to a column of date type.
- Parameters
df – Input Spark DataFrame.
input_name – Name of the column to update.
output_name – Name of the column in which to store the result. If input_name equals output_name, the column is overwritten.
- Returns
Transformed Spark DataFrame.
- transform.transform_data(df: DataFrame) → DataFrame
Transform original dataset.
- Parameters
df – Input Spark DataFrame.
- Returns
Transformed Spark DataFrame.
- transform.transform_item_date_to_datetime(date: date) → datetime
Convert a date to a datetime.
It takes the minimum possible time and combines it with the date.
- Parameters
date – Input date.
- Returns
Datetime combining the input date with the minimum time.
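The description above maps naturally onto the standard library's `datetime.combine`. The following is a minimal sketch under that assumption, not necessarily the module's actual code; the function name `item_date_to_datetime` is hypothetical.

```python
from datetime import date, datetime


def item_date_to_datetime(day: date) -> datetime:
    """Combine a date with the minimum time of day (00:00:00).

    Hypothetical stand-in for transform_item_date_to_datetime.
    """
    # datetime.min.time() is the time part of the smallest datetime: midnight.
    return datetime.combine(day, datetime.min.time())


print(item_date_to_datetime(date(2021, 3, 14)))  # 2021-03-14 00:00:00
```

A per-item function like this is the kind of helper that a column-level transformation can apply to every value in a date column.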