DataCleaner
- class qf_lib.common.utils.data_cleaner.DataCleaner(dataframe: qf_lib.containers.dataframe.simple_returns_dataframe.SimpleReturnsDataFrame, threshold: float = 0.05)
Bases: object
Cleans data that is partially incomplete, e.g. data that has gaps.
- Parameters
dataframe (SimpleReturnsDataFrame) – DataFrame of simple returns. Any column whose share of missing values exceeds the threshold is removed from the result.
threshold (float) – upper limit on missing data. If the share of missing data in a series exceeds this limit, the series is removed. It is a relative value (e.g. 0.02 corresponds to 2% of the data in the series).
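The threshold rule described above can be illustrated with plain pandas. This is a minimal sketch and not the qf_lib implementation: the helper name `drop_sparse_columns` and the sample data are hypothetical, and an ordinary `pandas.DataFrame` stands in for `SimpleReturnsDataFrame`.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(returns: pd.DataFrame, threshold: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: keep columns whose missing share does not exceed threshold."""
    # isna().mean() gives the fraction of NaN values in each column.
    missing_fraction = returns.isna().mean()
    return returns.loc[:, missing_fraction <= threshold]


returns = pd.DataFrame({
    "a": [0.01, 0.02, -0.01, 0.00],     # no gaps
    "b": [0.01, np.nan, np.nan, 0.02],  # half of the values are missing
})

# With a 25% limit, column "b" exceeds the threshold and is removed.
cleaned = drop_sparse_columns(returns, threshold=0.25)
```

Because the threshold is relative, the same value applies regardless of how long the series is.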
Methods
proxy_using_regression(benchmark_tms, …) – Removes columns from the DataFrame which have too many missing values.
proxy_using_value(proxy_value) – Removes columns from the DataFrame which have too many missing values.
- proxy_using_regression(benchmark_tms: qf_lib.containers.series.qf_series.QFSeries, columns_type: type) → qf_lib.containers.dataframe.simple_returns_dataframe.SimpleReturnsDataFrame
Removes columns from the DataFrame which have too many missing values. Then, the missing data in the remaining columns is completed using regression with the benchmark.
- Parameters
benchmark_tms (QFSeries) – benchmark used indirectly to proxy the missing data in the DataFrame.
columns_type (type) – type of each column (e.g. PricesSeries, LogReturnsSeries)
- Returns
The completed DataFrame. It can still contain missing data, because regression cannot always fill every gap (e.g. when a value is missing in the original series and there is no corresponding benchmark value either).
- Return type
SimpleReturnsDataFrame
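To make the idea of completing gaps by regression against a benchmark concrete, here is a minimal sketch in plain pandas and NumPy. It is an assumption about how such a proxy could work, not the qf_lib implementation: the helper `fill_with_regression`, the ordinary least squares fit via `numpy.polyfit`, and the sample data are all hypothetical. It also shows why some gaps can remain: a missing value cannot be estimated on a date where the benchmark is missing too.

```python
import numpy as np
import pandas as pd


def fill_with_regression(returns: pd.DataFrame, benchmark: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: fill gaps with values predicted from a linear fit on the benchmark."""
    filled = returns.copy()
    benchmark = benchmark.reindex(returns.index)
    for name in returns.columns:
        column = returns[name]
        both_present = column.notna() & benchmark.notna()
        if both_present.sum() < 2:
            continue  # too few common observations to fit a line
        # Degree-1 polyfit returns (slope, intercept) of the least squares line.
        slope, intercept = np.polyfit(
            benchmark[both_present].to_numpy(), column[both_present].to_numpy(), deg=1
        )
        # Only gaps with an available benchmark value can be estimated.
        fillable = column.isna() & benchmark.notna()
        filled.loc[fillable, name] = intercept + slope * benchmark[fillable]
    return filled


dates = pd.date_range("2020-01-01", periods=5)
benchmark = pd.Series([0.01, 0.02, 0.03, np.nan, 0.05], index=dates)
returns = pd.DataFrame({"fund": [0.02, 0.04, np.nan, np.nan, 0.10]}, index=dates)

filled = fill_with_regression(returns, benchmark)
```

In this sample the third value is estimated from the fitted line, while the fourth stays missing because the benchmark has no value on that date.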
- proxy_using_value(proxy_value: float) → qf_lib.containers.dataframe.simple_returns_dataframe.SimpleReturnsDataFrame
Removes columns from the DataFrame which have too many missing values. Then, the missing data in the remaining columns is completed using a given proxy_value.
- Parameters
proxy_value (float) – value with which all the missing data should be filled
- Returns
completed dataframe without missing data
- Return type
SimpleReturnsDataFrame
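The two steps described for proxy_using_value (drop sparse columns, then fill what remains with a constant) can be sketched in plain pandas as follows. This is an illustration under stated assumptions rather than the library's code: the helper `fill_with_value` and the sample data are hypothetical, and a regular `pandas.DataFrame` is used in place of `SimpleReturnsDataFrame`.

```python
import numpy as np
import pandas as pd


def fill_with_value(returns: pd.DataFrame, proxy_value: float, threshold: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: drop columns with too many gaps, then fill the rest."""
    missing_fraction = returns.isna().mean()
    kept = returns.loc[:, missing_fraction <= threshold]
    # Every remaining NaN is replaced by the same constant.
    return kept.fillna(proxy_value)


returns = pd.DataFrame({
    "a": [0.01, 0.02, -0.01, 0.00],        # no gaps
    "b": [0.01, np.nan, 0.03, 0.02],       # 25% missing, kept and filled
    "c": [np.nan, np.nan, np.nan, 0.02],   # 75% missing, removed
})

completed = fill_with_value(returns, proxy_value=0.0, threshold=0.3)
```

A proxy value of 0.0 is a natural choice for simple returns, since it treats a missing observation as "no change", but any float can be passed.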