1. What are the nine types of missing value imputation methods based on methodology?
The nine types of missing value imputation methods based on methodology are: (1) deletion methods, (2) neighbour-based methods, (3) constraint-based methods, (4) regression-based methods, (5) statistical-based methods, (6) matrix factorization-based methods, (7) expectation-maximization-based methods, (8) multi-layer perceptron-based methods, and (9) methods based on deep learning (DL).
2. What scenarios were evaluated in the MLM task for imputing missing values in the time series data?
Two scenarios were evaluated in the MLM (masked language modeling) task for imputing missing values in the time series data. Scenario 1 imputed a single missing value at a specific position in the sequence. Scenario 2 imputed a missing value when all values after a given position in the sequence were also missing. Both scenarios assess the model's ability to predict and complete missing values in the time series data.
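The two scenarios can be illustrated with a minimal Python sketch. The function names, the sample values, and the choice to mask the given position itself in Scenario 2 are assumptions made for illustration; they are not taken from the study.

```python
MASK = "[MASK]"  # BERT's standard mask token


def mask_single(tokens, position):
    """Scenario 1 (sketch): hide one value at `position`, keep all others."""
    masked = list(tokens)
    masked[position] = MASK
    return masked


def mask_tail(tokens, position):
    """Scenario 2 (sketch): hide the value at `position` and every value after it.

    Assumption: the given position is itself masked; the source does not say.
    """
    return list(tokens[:position]) + [MASK] * (len(tokens) - position)


# Hypothetical irradiance readings, already converted to string tokens.
sequence = ["120", "135", "150", "142", "98"]
scenario_1 = mask_single(sequence, 2)
scenario_2 = mask_tail(sequence, 2)
```

In Scenario 1 the model sees context on both sides of the gap, while in Scenario 2 it only has the values that precede it, which makes the second case closer to forecasting.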
3. How does BERT tokenize sentences?
BERT tokenizes sentences by splitting them into small units called tokens, which can be whole words or sequences of characters. The model compares each token against all other tokens to gather contextual information, which is stored as embeddings (numerical representations of the data). Figure 1 shows the BERT input for a sequence of irradiance values. BERT also uses special tokens: '[CLS]' marks the beginning of the sequence, '[UNK]' stands for unknown words, '[PAD]' fills empty positions so that sequences reach a common length, and '[SEP]' separates sentences. These special tokens help the model interpret the structure of the input.
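The role of the special tokens can be sketched with a toy example in Python. This is not BERT's actual WordPiece tokenizer: `build_input`, the tiny vocabulary, and the sample values are hypothetical, and each irradiance value is assumed to map to exactly one token.

```python
CLS, SEP, PAD, UNK = "[CLS]", "[SEP]", "[PAD]", "[UNK]"


def build_input(values, vocab, max_len):
    """Wrap a sequence of values in BERT-style special tokens (toy sketch)."""
    # Values missing from the vocabulary are replaced by [UNK].
    body = [str(v) if str(v) in vocab else UNK for v in values]
    # Leave room for [CLS] and [SEP].
    body = body[: max_len - 2]
    tokens = [CLS] + body + [SEP]
    # Pad up to the fixed length.
    tokens += [PAD] * (max_len - len(tokens))
    # 1 for real tokens, 0 for padding, so padding can be ignored.
    attention_mask = [0 if t == PAD else 1 for t in tokens]
    return tokens, attention_mask


# Hypothetical vocabulary and readings; 999 is deliberately out of vocabulary.
vocab = {"120", "135", "150"}
tokens, attention_mask = build_input([120, 135, 999], vocab, max_len=8)
```

The attention mask is included because padded positions carry no information and are normally excluded when tokens are compared against each other.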
4. What is the time resolution of the first solar irradiance dataset?
The time resolution of the first solar irradiance dataset is 10 minutes. This dataset contains records from 112 meteorological stations in Galicia, stored in a tabular .csv file format, with information spanning two years (from February 2017 to February 2019). The variables observed at the stations include temperature, atmospheric pressure, precipitation, wind speed, wind direction, and solar irradiance. The dataset matters for the study because it provides detailed solar irradiance records over a specific geographical area and time period.