Dynamic inferential NOx emission prediction model with delay estimation for SCR de-NOx process in coal-fired power plants

The selective catalytic reduction (SCR) decomposition of nitrogen oxide (de-NOx) process in coal-fired power plants not only displays nonlinearity, large inertia and time variation but also a lag in NOx analysis; hence, it is difficult to obtain an accurate model that can be used to control NH3 injection during changes in the operating state. In this work, a novel dynamic inferential model with delay estimation was proposed for NOx emission prediction. First, k-nearest neighbour mutual information was used to estimate the time delay of the descriptor variables, followed by reconstruction of the phase space of the model data. Second, multi-scale wavelet kernel partial least square was used to improve the prediction ability, and this was followed by verification using benchmark dataset experiments. Finally, the delay time difference method and feedback correction strategy were proposed to deal with the time variation of the SCR de-NOx process. Through the analysis of the experimental field data in the steady state, the variable state and the NOx analyser blowback process, the results proved that this dynamic model has high prediction accuracy during state changes and can realize advance prediction of the NOx emission.

Nitrogen oxides (NOx) emitted from coal-fired power plants are harmful to human health and the environment. Meeting pollutant discharge regulations using traditional combustion control is difficult, so selective catalytic reduction (SCR) systems have been widely installed in the flue for the decomposition of nitrogen oxide (de-NOx) [1]. The efficiency of the SCR de-NOx process can easily be affected by factors such as the NH3 injection, dilution air, reaction temperature and catalyst activity. It is difficult to ensure the optimal ratio of NH3 to NOx when the coal feed rate changes and the command of the automatic generation control fluctuates rapidly. The reasons for this are as follows: firstly, the SCR de-NOx process is nonlinear, has a large inertia and varies with time; secondly, the response of the NOx analyser has a large time delay of approximately 1 min; thirdly, every 50 min, the NOx analyser performs a blowback process lasting approximately 5 min. When the measured NOx emission value, which is maintained by the control processor during blowback, is too high or too low, the action of the proportional-integral-derivative (PID) control generally leads to an imbalance between the NH3 injection and the required NOx reduction, so the NOx emission suddenly increases or decreases after blowback. This work aims to provide a method to predict the NOx emission in a timely manner from the operating variables in coal-fired power plants.
Many data-driven modelling techniques have recently emerged that establish black-box models based on measured data from the SCADA system. Zambrano et al. [2] adopted the Hammerstein-Wiener model to optimize NH3 injection. Krijnsen et al. [3] used neural networks (NN), nonlinear autoregressive exogenous (NARX) models and polynomial fitting to predict the NOx emission of a diesel engine. For coal-fired boilers, Peng et al. [4] established a hybrid ARX model with Gaussian radial basis function network-style coefficients under the steady state. Safdarnejad et al. [5] developed a data-driven model based on the recurrent NN model and the dynamic particle swarm optimizer to simultaneously estimate NOx and CO emissions. Tuttle et al. [6] presented a unique NN model using swappable synapse weights and a hybrid optimization approach in a combustion optimization system. For the SCR de-NOx process, Si et al. [7] used an improved online support vector regression (SVR) technique for modelling. Wu et al. [8] used an NOx emission prediction model that was related only to the NH3 injection; however, such a model cannot correctly reflect the other factors that affect the NOx emissions.
For complex chemical processes, the high dimensionality and collinearity of the measured data make modelling difficult. The radial basis function kernel partial least square (RBF-KPLS) model can deal with the high dimensionality and collinearity of data [9]. If the sample features contain heterogeneous information, the use of a single kernel for mapping all the samples is not reasonable. Bao et al. [10] used a multi-scale kernel to improve the prediction accuracy of the support vector machine (SVM) model. For industrial process modelling, it is difficult to realize accurate results using the RBF kernel model. Zhang et al. [11] proposed the Morlet wavelet kernel SVR and verified, on mathematical test functions, that it has a smaller prediction error than the RBF kernel SVR.
Because of the lag associated with NOx analysers, the determined NOx emission does not reflect the NH3 flow in real time. The phase space of the model sample can be reconstructed by estimating the descriptor variable's delay time to improve the prediction accuracy [12]. In general, the delay time is estimated by field experiments, so its accuracy is usually low. The mutual information (MI) parameter can be used to analyse linear and nonlinear correlations [13]. For the SCR de-NOx process, the coal feed rate, inlet flue gas flow and inlet flue gas temperature affect the NOx emission, and there are interactions between the inlet flue gas flow and the inlet flue gas temperature.
To improve the accuracy of the NOx emission prediction model and solve the time-varying problem for the SCR de-NOx process, a novel dynamic inferential model is proposed in this paper. First, the k-nearest neighbour MI (knnMI) is used to estimate the time delay and reconstruct the model sample. Then, the model brings the Morlet wavelet kernel, which is able to effectively characterize data variation, into a multi-scale KPLS. Finally, the delay time difference (DTD) method is used to update the model and a feedback correction strategy is used to correct the model. This paper is organized as follows: the theory of the knnMI estimator and the KPLS model are introduced in §2; §3 describes data preprocessing, delay estimation and model reconstruction, the model update and correction approach and the framework of the dynamic inferential model; in §4, to evaluate the accuracy of the multi-scale wavelet kernel partial least square (mwKPLS) predictions, it is compared with the RBF-KPLS, multi-scale RBF-KPLS (mRBF-KPLS), wavelet KPLS (wKPLS), particle swarm optimization back propagation (PSO-BP) and SVR based on cross-validation optimization (CV-SVR) models using benchmark datasets; §5 details the experimental results of the dynamic inferential model for the SCR de-NOx process; finally, concluding remarks are provided in §6.

k-nearest neighbour mutual information estimator
Estimation of MI derives from the concept of entropy in information theory. As a measure of information, MI reflects the statistical dependence between two variables. Basic histogram and kernel MI estimators are based on probability density estimation; however, they have weaknesses such as computational complexity, low precision and a large amount of calculation in higher dimensions. The knnMI estimator avoids the shortcoming of exact probability density estimation; it is simple and requires only a small amount of calculation, and can be summarized as follows [14].
Suppose a space Z = (x, y); here, the vectors x and y are each formed by 1 column and n samples. The estimate for the MI of vectors x and y is then

I(x, y) = Ψ(k) + Ψ(n) − 〈Ψ(n_x(i) + 1) + Ψ(n_y(i) + 1)〉,    (2.1)

where n_x(i) is the number of sample points x_j whose distance from x_i is strictly less than ε_i/2, ε_i/2 is denoted as the distance from x_i to its kth neighbour, and n_y(i) is obtained similarly with y instead of x, i ∈ [1, …, n]. Ψ(x) is the digamma function, Ψ(x) = Γ(x)^{−1} dΓ(x)/dx; it satisfies the recursion Ψ(x + 1) = Ψ(x) + 1/x, with Ψ(1) ≈ −0.5772156. The symbol 〈···〉 indicates the mean of the variables in it.
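The estimator above can be sketched in a few lines. The following is a minimal NumPy/SciPy implementation of the k-nearest-neighbour MI estimate with the max-norm in the joint space; the function name and the default k are illustrative choices, not from the paper:

```python
import numpy as np
from scipy.special import digamma

def knn_mi(x, y, k=3):
    """k-nearest-neighbour MI estimate:
    I(x, y) = psi(k) + psi(n) - <psi(n_x + 1) + psi(n_y + 1)>.
    Distances use the max (Chebyshev) norm in the joint space z = (x, y)."""
    x = np.asarray(x, float).reshape(-1, 1)
    y = np.asarray(y, float).reshape(-1, 1)
    n = len(x)
    dx = np.abs(x - x.T)                 # pairwise distances in the x marginal
    dy = np.abs(y - y.T)                 # pairwise distances in the y marginal
    dz = np.maximum(dx, dy)              # joint-space (max-norm) distance
    np.fill_diagonal(dz, np.inf)         # a point is not its own neighbour
    eps = np.sort(dz, axis=1)[:, k - 1]  # eps_i/2: distance to the kth neighbour
    nx = (dx < eps[:, None]).sum(axis=1) - 1  # points strictly within eps_i/2
    ny = (dy < eps[:, None]).sum(axis=1) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

For independent samples the estimate is close to zero, while strongly dependent samples give a large positive value, which is the behaviour exploited later for delay estimation.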

Kernel partial least square model
Assume the descriptor variable matrix X ∈ R^{n×m} and the response variable vector Y ∈ R^{n×1}, i = 1, 2, …, n. For a kernel matrix K^0, its centralized form is K^1. X and Y are z-score normalized as X^1 and Y^1.
The estimation of the KPLS model from the training set is described as follows [9]:
1. Normalize the training set X_tr^0 and Y_tr^0 to get X_tr^1 and Y_tr^1.
2. Calculate the training kernel matrix

K_tr^0 = k(x_tr, x_tr).    (2.2)

3. Centre the training kernel matrix

K_tr^1 = (I − (1/n) 1_n) K_tr^0 (I − (1/n) 1_n),    (2.3)

where I is a unit matrix and 1_n is a matrix where all the elements are 1 with dimensions of n.
4. Let L be the number of principal components; for i from 1 to L, randomly initialize the score vector u_i of X_tr^1.
5. Calculate the score vector t_i:

t_i = K_tr^1 u_i,   t_i ← t_i/‖t_i‖.    (2.4)

6. Calculate the weight vector c_i:

c_i = (Y_tr^1)^T t_i.    (2.5)

7. Calculate the score vector u_i:

u_i = Y_tr^1 c_i,   u_i ← u_i/‖u_i‖.    (2.6)

8. Then steps (5)-(7) are repeated until t_i converges.
9. The matrices K_tr^1 and Y_tr^1 are deflated,

K_tr^1 ← (I − t_i t_i^T) K_tr^1 (I − t_i t_i^T),   Y_tr^1 ← Y_tr^1 − t_i t_i^T Y_tr^1,    (2.7)

until all L pairs of score vectors t and u are extracted.
The regression coefficients are then

B = U (T^T K_tr^1 U)^{−1} T^T Y_tr^1,    (2.8)

where T and U are matrices that are composed of the score vectors t and u, and the fitted training output is Ŷ_tr = K_tr^1 B.    (2.9)
The prediction for the test set by the KPLS model is similar to that for the training set, except for the computation of the test kernel matrix K_te^0 and the centralization of K_te^0:

K_te^0 = k(x_te, x_tr)    (2.10)

and

K_te^1 = (K_te^0 − (1/n) 1_{nt×n} K_tr^0)(I − (1/n) 1_n),    (2.11)

where nt is the number of samples in the test set. The test-set prediction is then Ŷ_te = K_te^1 B.
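The training and prediction steps above can be sketched compactly in NumPy. An RBF kernel is used here purely for illustration (any Mercer kernel can be substituted); the function names, the initialization of u from a column of Y and the convergence tolerance are our own choices:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)); a placeholder Mercer kernel."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kpls_fit(K0, Y1, L):
    """Train KPLS on an uncentred n x n kernel K0 and a normalized Y1 (n x 1).
    Returns the centred kernel K1 and the regression coefficients B."""
    n = K0.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n        # centring matrix I - (1/n) 1_n
    K1 = C @ K0 @ C
    Kd, Yd, T, U = K1.copy(), Y1.copy(), [], []
    for _ in range(L):
        u = Yd[:, [0]].copy()                  # init u from the Y residual
        for _ in range(500):                   # iterate until t converges
            t = Kd @ u
            t /= np.linalg.norm(t)
            c = Yd.T @ t
            u_new = Yd @ c
            u_new /= np.linalg.norm(u_new)
            if np.linalg.norm(u_new - u) < 1e-10:
                u = u_new
                break
            u = u_new
        P = np.eye(n) - t @ t.T                # deflate K and Y
        Kd, Yd = P @ Kd @ P, Yd - t @ (t.T @ Yd)
        T.append(t)
        U.append(u)
    T, U = np.hstack(T), np.hstack(U)
    B = U @ np.linalg.solve(T.T @ K1 @ U, T.T @ Y1)
    return K1, B

def kpls_predict(K0_te, K0_tr, B):
    """Centre the nt x n test kernel against the training kernel and predict."""
    n, nt = K0_tr.shape[0], K0_te.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    K1_te = (K0_te - np.ones((nt, n)) @ K0_tr / n) @ C
    return K1_te @ B
```

The fitted training output is then `K1 @ B`, matching the final regression step described above.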
3. Dynamic inferential model with delay estimation

Data preprocessing
Data preprocessing includes outlier elimination and data filtering, which are useful for building a stable model structure.
In this paper, the Pauta criterion was used to eliminate outliers. The formula for this is

|x_t − x̄_t| > 3σ_t,    (3.1)

where x_t is the suspected outlier at time t, x̄_t is the sample mean at time t and σ_t is the standard deviation of the sample at time t. If equation (3.1) is satisfied, the outlier is eliminated and replaced with a linearly interpolated value.
To realize dynamic elimination of outliers, x̄_t and σ_t in equation (3.1) were computed over a moving window [15],

x̄_t = (1/n) Σ_{i=t−n+1}^{t} x_i    (3.2)

and

σ_t = ((1/(n − 1)) Σ_{i=t−n+1}^{t} (x_i − x̄_t)²)^{1/2},    (3.3)

where n is the sample size. In addition, a Butterworth filter was used to filter the data.
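The moving-window 3σ test described above can be sketched as follows; the window length and function name are illustrative, and flagged points are replaced by linear interpolation as the text describes:

```python
import numpy as np

def pauta_clean(x, n=50):
    """Dynamic Pauta (3-sigma) outlier elimination: the mean and standard
    deviation of the criterion are computed over the n most recent samples,
    and flagged points are replaced by linear interpolation afterwards."""
    x = np.asarray(x, float).copy()
    bad = np.zeros(len(x), bool)
    for t in range(n, len(x)):
        w = x[t - n:t]                       # trailing window of n samples
        mu, sigma = w.mean(), w.std(ddof=1)
        if sigma > 0 and abs(x[t] - mu) > 3 * sigma:
            bad[t] = True                    # suspected outlier at time t
    good = ~bad
    x[bad] = np.interp(np.flatnonzero(bad), np.flatnonzero(good), x[good])
    return x, bad
```

A spike far outside the local 3σ band is detected and replaced by a value interpolated from its good neighbours, while ordinary process variation passes through unchanged.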

Delay estimation and model samples reconstruction
Because the time delay between each descriptor variable vector x_·i and the response variable vector y is different, the phase space for each x_·i is reconstructed by inserting a different time delay τ_i ∈ [min(τ_i), max(τ_i)] (min(τ_i) and max(τ_i) are determined by field measurements). The MI value is related to the dimension w of x_·i; a suitable w should cover the data of x_·i that are most relevant to y. Hence, the delay time τ_i and dimension w_i at time t are calculated as

max_{τ_i, w_i} MI([x_·i(t − τ_i − w_i + 1), …, x_·i(t − τ_i − 1), x_·i(t − τ_i)]^T, [y(t − w_i + 1), …, y(t − 1), y(t)]^T)
s.t. min(τ_i) ≤ τ_i ≤ max(τ_i), 1 ≤ w_i ≤ T_max − τ_i,    (3.4)

where T_max is the maximum reaction time of the SCR process. The above equation is a constrained multivariable nonlinear optimization problem: for m set variables, there are 2m variables that need to be optimized. Thus, within the scope of the above constraints, a global searcher based on a PSO algorithm maximizes the objective function, thereby obtaining the optimal delays τ' = [τ'_1, …, τ'_m].
royalsocietypublishing.org/journal/rsos R. Soc. open sci. 7: 191647
By estimating the time delay τ'_i of each set variable vector x_·i, the reconstructed descriptor variable matrix X_rc is assumed as follows:

X_rc = [x_·1(t − τ'_1), x_·2(t − τ'_2), …, x_·m(t − τ'_m)].    (3.5)
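For a single descriptor variable with embedding dimension fixed at 1, the search in equation (3.4) reduces to a one-dimensional scan over the delay. The sketch below uses a simple histogram MI estimate for speed; the paper itself optimizes τ_i and w_i jointly with PSO and the knnMI estimator, so this is only an illustration of the idea:

```python
import numpy as np

def binned_mi(a, b, bins=16):
    """Histogram estimate of MI (in nats) between two 1-D series."""
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

def estimate_delay(x, y, tau_max):
    """Return the delay tau in [0, tau_max] maximizing MI(x(t - tau), y(t))."""
    scores = []
    for tau in range(tau_max + 1):
        xs = x[:len(x) - tau] if tau else x   # x shifted back by tau samples
        scores.append(binned_mi(xs, y[tau:]))
    return int(np.argmax(scores)), max(scores)
```

On a series where the response simply lags the input, the MI curve peaks sharply at the true lag, which is the value used to shift that column of the reconstructed matrix X_rc.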

Multi-scale wavelet kernel partial least square
The Morlet wavelet kernel adopted in this paper has a strong capability for characterizing data variation and can be used to construct an admissible multi-dimensional tensor-product wavelet kernel. The mother function is

ψ(x) = cos(1.75x) exp(−x²/2).    (3.6)

To prove that the Morlet mother wavelet yields an admissible support vector kernel, the following definitions are first introduced.
Definition 3.1 (Mercer's condition [16]). In the doubly infinite-dimensional square-integrable space L²(Ω), the necessary condition for a kernel k(x, z) to realize a dot product in a feature space is that, for any g ∈ L²(Ω),

∫∫ k(x, z) g(x) g(z) dx dz ≥ 0.    (3.7)

The wavelet kernel is represented by the dot product as

k(x, z) = 〈ψ(x), ψ(z)〉.    (3.8)

The tensor-product wavelet kernel that satisfies the translation-invariance theorem according to definition 3.2 is expressed as

k(x, z) = ∏_{j=1}^{N} cos(1.75 (x_j − z_j)/a) exp(−(x_j − z_j)²/(2a²)),    (3.9)

where a is the wavelet kernel width and N is the input dimension.

Theorem 3.1. The Morlet wavelet kernel of equation (3.9) satisfies Mercer's condition.

Proof. According to definition 3.1 and equation (3.7), let w(x) ∈ R and w(x) ≠ 0. Substituting equation (3.9) into the Mercer integral, each factor cos(1.75(x_j − z_j)/a) exp(−(x_j − z_j)²/(2a²)) has a non-negative Fourier transform F; because w(x) ≠ 0, F > 0 can be obtained, and therefore Mercer's condition is satisfied. ▪

Theorem 3.2. On the basis of the Morlet mother wavelet, the translation-invariant wavelet kernel of equation (3.9) is an admissible support vector kernel.

Proof. The Fourier transform of k(x) = ∏_{j=1}^{N} cos(1.75 x_j/a) exp(−x_j²/(2a²)) is a product of sums of Gaussians centred at ±1.75/a in each dimension. Because a > 0 and N ≥ 1, F > 0. According to definition 3.3, the Morlet wavelet kernel is a permissible support vector kernel. ▪
The multi-scale kernel takes into account the distribution characteristics of the samples in the original input space. Therefore, it improves the sparsity of the solution in the high-dimensional feature space. Based on the Morlet wavelet kernel, the multi-scale wavelet kernel is represented by

k_ms(x, z) = Σ_{i=1}^{c} k_{a_i}(x, z),    (3.10)

where c is the scale parameter, a_i is the wavelet kernel width and i = 1, …, c.
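A sketch of the tensor-product Morlet kernel and one possible multi-scale combination follows. The exact weighting of the scales is not shown in the text, so an equally weighted average over the widths a_1, …, a_c is assumed here; since a sum of Mercer kernels is again a Mercer kernel, the combination remains admissible:

```python
import numpy as np

def morlet_kernel(A, B, a):
    """Tensor-product Morlet wavelet kernel:
    k(x, z) = prod_j cos(1.75 (x_j - z_j) / a) * exp(-(x_j - z_j)^2 / (2 a^2))."""
    D = A[:, None, :] - B[None, :, :]          # pairwise coordinate differences
    return (np.cos(1.75 * D / a) * np.exp(-D ** 2 / (2 * a ** 2))).prod(axis=-1)

def multiscale_morlet_kernel(A, B, widths):
    """Equally weighted combination of Morlet kernels over the scale widths
    a_1, ..., a_c (the paper's exact combination rule is an assumption here)."""
    return sum(morlet_kernel(A, B, a) for a in widths) / len(widths)
```

The resulting kernel matrix is symmetric with a unit diagonal, and its eigenvalues are non-negative up to numerical precision, consistent with the positive semi-definiteness argued below.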
To prove that the multi-scale wavelet kernel preserves the finitely positive semi-definite kernel property, the following theorem is introduced.

Theorem 3.3. The kernel matrix is a positive semi-definite matrix.

Proof. Let the kernel matrix K = [k(x_i, x_j)] = [〈ψ(x_i), ψ(x_j)〉], i, j = 1, …, n. Then any vector α ∈ R^n satisfies

α^T K α = Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j 〈ψ(x_i), ψ(x_j)〉 = ‖Σ_{i=1}^{n} α_i ψ(x_i)‖² ≥ 0.    (3.11)

Hence, the multi-scale kernel matrix K is positive semi-definite. ▪

A kernel function with a certain kernel width is suitable for mapping learning samples with a certain feature into a high-dimensional feature space; hence, the number of feature distributions can be used as the optimal scale parameter. In this paper, fuzzy c-means (FCM) clustering was used to partition the sample feature distribution, so that the optimal classification is selected as the scale parameter.
If the descriptor variable matrix X ∈ R^{n×m} has c cluster centres, the fuzzy classification matrix U_{c×n} denotes that the n samples are partitioned into c classifications. In the corresponding cluster-centre matrix Z_{c×s}, the sth index value is the average of that index over the samples of the cth classification,

Z_{is} = (Σ_{j=1}^{n} u_{ij}^m x_{js})/(Σ_{j=1}^{n} u_{ij}^m).    (3.12)

Then the objective function is constructed as

J = Σ_{i=1}^{c} Σ_{j=1}^{n} u_{ij}^m ‖X_j − Z_i‖².    (3.13)

The optimal fuzzy classification matrix U and the corresponding cluster-centre matrix Z are solved so that the objective function J reaches a minimum. Here, ‖X_j − Z_i‖ represents the Euclidean distance between the jth sample and the ith cluster centre.
The fuzzy classification uncertainty is measured by the partition coefficient

W_c(U) = (1/n) Σ_{i=1}^{c} Σ_{j=1}^{n} u_{ij}².    (3.14)

If equation (3.14) is close to 1, the classification ambiguity is low and the FCM clustering effect is better. The optimal classification c* is realized when equation (3.14) reaches its maximum, that is,

c* = argmax_c W_c(U).    (3.15)
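The FCM update and the scale selection by W_c(U) can be sketched as follows; this is a minimal alternating-update FCM with the standard membership formula (function names, iteration count and seed are our own):

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means. Returns the membership matrix U (c x n, columns
    sum to 1) and the cluster-centre matrix Z, alternating between the
    weighted-average centre update (equation (3.12)) and the standard
    membership update that minimizes the FCM objective."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)
    for _ in range(iters):
        Um = U ** m
        Z = (Um @ X) / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - Z[:, None, :], axis=-1) + 1e-12
        w = d ** (-2.0 / (m - 1.0))
        U = w / w.sum(axis=0)
    return U, Z

def partition_coefficient(U):
    """W_c(U) = (1/n) * sum_ij u_ij^2 (equation (3.14)); nearer 1 = crisper."""
    return float((U ** 2).sum() / U.shape[1])
```

On data with two well-separated groups, W_2(U) is close to 1 and exceeds W_3(U), so c* = 2 would be selected.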

Dynamic model update method
In this paper, dynamic modelling was realized by increasing the inputs to the model: the historical inputs x(t − 1), …, x(t − w + 1) and outputs y(t − 1), …, y(t − w + 1) were added as new inputs. The time difference (TD) method can solve the variable drift problem and achieve improved prediction accuracy compared with other update methods; furthermore, a data-driven model based on the TD method does not require frequent reconstruction and parameter updates [18,19]. The TD method first calculates the first-order TD between adjacent sampling data for the input and output, where Δx(t) and Δy(t) can be calculated as

Δx(t) = x(t) − x(t − 1) and Δy(t) = y(t) − y(t − 1).    (3.16)

In this paper, because of the time-varying SCR de-NOx process and the large time delay of the NOx analysis, the DTD update method and a feedback correction strategy are proposed. In §3.2, the model matrix is reconstructed by delay estimation; therefore, the training regression model becomes

Δy(t) = f(Δx_·1(t − τ'_1), …, Δx_·m(t − τ'_m)).    (3.17)

Similarly, the DTD of the output can be predicted by

ŷ(t) = y(t − 1) + f(Δx_·1(t − τ'_1), …, Δx_·m(t − τ'_m)),    (3.18)

and the feedback-corrected output is

ỹ(t) = ŷ(t) + ρ(y(t − 1) − ŷ(t − 1)),    (3.19)

where ρ is 0.3, ỹ(t) is the corrected dynamic model output, ŷ(t) is the dynamic model output and y(t) is the real value.
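The TD transform and the feedback correction can be sketched as follows. The exact correction formula is not fully reproduced in the text, so the form ỹ(t) = ŷ(t) + ρ(y(t − 1) − ŷ(t − 1)) with ρ = 0.3 is assumed here, and the delay shifts of the DTD method are omitted for brevity:

```python
import numpy as np

def td_transform(x, y, delta=1):
    """First-order time differences between samples delta steps apart
    (adjacent samples when delta = 1)."""
    return x[delta:] - x[:-delta], y[delta:] - y[:-delta]

def feedback_correct(y_hat, y_real, rho=0.3):
    """Shift each prediction by rho times the previous step's prediction
    error (assumed correction form); the first sample has no history and
    is left unchanged."""
    y_tilde = np.asarray(y_hat, float).copy()
    y_tilde[1:] += rho * (y_real[:-1] - y_hat[:-1])
    return y_tilde
```

For a model output with a persistent bias, this correction shrinks the error by a factor of (1 − ρ) at each step once the bias has been observed, which is the behaviour exploited during state changes.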

Framework for the dynamic inferential model
The framework for the dynamic inferential model mainly includes data preprocessing, delay estimation, model sample reconstruction, the multi-scale wavelet kernel partial least square (mwKPLS) model, the DTD update and feedback correction (figure 1). The steps of the algorithm are as follows:
1. Acquire the measured data for the selected variables before time t and confirm the raw samples.
2. Preprocess the raw samples by outlier elimination and filtering.
3. Estimate the time delay of each descriptor variable using the knnMI estimator.
4. Reconstruct the model samples by phase space reconstruction and compute the delay time differences.
5. Carry out FCM clustering on the reconstructed descriptor variable matrix X_rc to determine the optimal scale parameter c*.
6. Normalize the training set and carry out estimation using the mwKPLS model.
7. Predict the NOx emission using the mwKPLS model based on a_1, …, a_c*.
8. Acquire the measured data for the selected variables at time t + 1 and correct the predicted NOx emission value based on the feedback.
9. Repeat steps 2-8.

Benchmark dataset experiments
In this paper, two benchmark datasets, the concrete slump dataset [20] and the polymer dataset [21], were used to verify the prediction ability of the mwKPLS model. The parameters of the datasets are shown in table 1.
The following models were used for comparison with the mwKPLS model: RBF-KPLS, multi-scale RBF-KPLS (mRBF-KPLS), wKPLS, back propagation NN (BP-NN) based on PSO optimization (PSO-BP) and CV-SVR. The 10-fold CV method was adopted for parameter optimization in all the models except PSO-BP. To avoid parameters in a local optimum, a grid search was used to optimize the kernel width under the same search range, and the root mean square error (RMSE) was used as the evaluation index for model accuracy. The results of the experiment and the parameters of the algorithm that were optimized are shown in tables 2 and 3 (b indicates the number of hidden layer nodes, p indicates penalty parameter and σ indicates RBF kernel width).
(1) The wKPLS algorithm had a smaller RMSE value than the KPLS one for the training and test sets at the same c. The Morlet mother wavelet kernel is nearly orthogonal to the RBF kernel; hence, the fitting and generalizability of the wKPLS algorithm were improved. (2) For the concrete slump dataset, W_c(U) reached a maximum of 0.7334 when c = 2. However, when c = 3, the RMSE value of the training set decreased and the RMSE value of the test set increased. This indicates that the optimal c is related to the sample features: if c is too large, the training accuracy of the model can be improved, but the generalizability of the model may not be. Therefore, FCM clustering was used to determine the optimal c effectively. For the polymer dataset, the c determined by the FCM clustering was at most 2. The mwKPLS algorithm had a smaller RMSE value than the wKPLS one for the training and test sets, and its prediction accuracy was the highest. This indicates that the Morlet wavelet kernel is suitable for samples with multiple feature distributions.
(3) Compared with the PSO-BP and CV-SVR algorithms, mwKPLS had the highest prediction accuracy.
This indicated that the CV-SVR brought unnecessary redundancy or noise into the training model, resulting in the low prediction accuracy of the model. Because many parameters (except b) need to be optimized, the output of the PSO-BP model was not necessarily optimal.

SCR de-NOx process
In coal-fired power plants, the SCR de-NOx reaction is carried out in a reactor that is vertically installed between the boiler economizer and the air preheater. NH3 is mixed with air, and the mixture passes through the ammonia injection grille in the upper part of the reactor, reacts over the catalyst with the flue gas from the outlet of the economizer and then passes into the air preheater. Finally, the de-NOx exhaust gas is discharged into the atmosphere through the chimney. The flow chart for the SCR de-NOx process is shown in figure 2.

The selection of model variables and samples
The NOx emission is related to many factors, such as the NH3 injection, the dilution air volume, the reaction temperature and the catalyst activity. In addition, boiler load changes, coal quality and combustion conditions (such as the O2 content) cause large fluctuations in the inlet NOx concentration. The selection of the descriptor variables is generally based on the mechanism of the process; therefore, this paper mainly considers the steps of NOx formation and the mechanism of the SCR de-NOx reaction. NOx in the flue gas is mainly in the form of NO, with a smaller portion of NO2. The main reactions in the SCR de-NOx process are as follows:

4NH3 + 4NO + O2 → 4N2 + 6H2O    (5.1)

and

4NH3 + 2NO2 + O2 → 3N2 + 6H2O.    (5.2)

From the above reactions, the inlet NOx concentration and the NH3 injection flow directly reflect the NH3/NOx molar ratio that affects the de-NOx efficiency and the NH3 slip. Furthermore, the SCR reaction is affected by the inlet O2 content. The NH3 injection flow is mainly controlled via the NH3 valve to adapt to different boiler loads. The inlet O2 content directly affects the NOx emission concentration and the de-NOx efficiency. Further, a boiler load change often affects the inlet flue gas flow, resulting in a change of the flue gas temperature by heat exchange; the change of the inlet flue gas temperature affects the speed of the SCR de-NOx reaction and the activity of the catalyst.
The experimental field data were continuously recorded in the DCS database of the coal-fired power plant. Assuming that the coal quality was constant, the state of the unit covers the steady state and the variable state, in which the load varied between 700 and 900 MW, and the selected data should be continuous. One-dimensional linear interpolation was performed on the measured NOx emission during the blowback process, and any abnormal operating condition should be avoided. The sampling period was 10 s and a total of 2100 samples were collected. Table 4 shows the range of the selected model variables.

Analysis of the data preprocessing results
An assumption of the Pauta criterion is that the data are normally distributed. Although the operational data of a practical industrial process rarely conform to a normal distribution, this does not affect the effectiveness of the outlier elimination: the probability that the numerical distribution of industrial process data lies within (μ − 3σ, μ + 3σ) is 0.9973. Taking the NOx emission concentration as an example, outlier elimination was performed using the Pauta criterion. Figure 3a shows that the Pauta criterion was able to detect some obvious outliers, such as the data at 100, 250 and 318 min. These outliers were consistently mismatched with the baseline population, which adversely affected the statistical properties of the entire dataset. Figure 3b shows that the data before filtering contain a large amount of high-frequency noise, which does not help stabilize the model. In this work, the order of the Butterworth filter was 8 and the cut-off frequency was 0.9. After filtering, the high-frequency noise was eliminated to a large extent, and the filtered data could still capture the change of the trend. Therefore, using adaptive filtering to process the raw data was beneficial for the predictive model.
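The filtering step above can be sketched with SciPy. The paper's cut-off of 0.9 is quoted without units, so the normalized cut-off (as a fraction of the Nyquist frequency) is treated here as a tunable parameter with an illustrative default; zero-phase (forward-backward) filtering is used so the smoothed trend is not delayed:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def butterworth_smooth(x, order=8, wn=0.1):
    """8th-order low-pass Butterworth applied with zero-phase filtering.
    wn is the normalized cut-off (fraction of Nyquist), an assumed value;
    second-order sections are used for numerical stability at high order."""
    sos = butter(order, wn, btype="low", output="sos")
    return sosfiltfilt(sos, x)
```

On a slow trend buried in high-frequency noise, the filtered signal is markedly closer to the underlying trend than the raw data, which is the stabilizing effect described above.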

Analysis of the delay estimation result
According to the field test results, the delay of the SCR denitration reaction is approximately 120-400 s, and the maximum delay with which the boiler load affects the inlet NOx concentration is 600 s. In this work, the sampling period was 10 s and T_max in equation (3.4) was 120, and the range of each time delay is shown in table 5. As an example, the delay estimation results for each descriptor variable at time t = 250 min are also shown in table 5.

Data correlation analysis
To analyse whether the relationship between the descriptor variables x_i and the response variable y is nonlinear and whether there is multi-collinearity between the descriptor variables, correlation analysis was performed on the normalized data. The correlation structures and the Pearson correlation coefficients |r| are shown in figure 4. It can be seen from figure 4 that the relationships between x_i and y are nonlinear. Furthermore, |r| < 0.39, so there is a low linear correlation between x and y. The unit load, inlet O2 content and inlet flue gas temperature display very weak correlations with the NOx emission concentration, while the inlet NOx concentration shows a slightly stronger correlation. Therefore, the NOx emission concentration has a nonlinear relationship with the boiler load and the inlet flue gas temperature; hence, the NOx emission may also increase as the boiler load decreases. In addition, a Pearson correlation coefficient |r| greater than 0.7 was observed between several descriptor variable vectors, including between the total coal feed rate, the inlet flue gas flow and the unit load; between the inlet NOx concentration and the NH3 injection flow; and between the inlet flue gas flow and the inlet flue gas temperature, indicating high multiple correlation. For example, the inlet flue gas flow can cause a change of the flue gas temperature, with a greater inlet flue gas flow resulting in a higher inlet flue gas temperature.

First, multi-scale characteristic analysis of the training set was performed. FCM clustering was used to determine the scale c, and W_c(U) was compared to obtain the optimal scale c*.
It can be seen from table 6 that when c = 2, W_c(U) reached its maximum and the clustering effect of the training set after FCM clustering was the best; therefore, c = 2 was used as the multi-scale parameter in this paper. Secondly, L was determined by leave-one-out cross-validation. The relationship between L and R²_k(Y) is shown in figure 5. It can be seen from figure 5 that when k = 4, the explained variance was R²_k(Y) = 0.0975 and the total explained variance R²(Y) reached 93.17%. Noise would be included in the model if too many components were extracted, which would affect the prediction accuracy; therefore, L = 4 was selected for this work.
To analyse the effects of different variables and phase space reconstruction on model performance, a training sample of n = 500 and a test sample of nt = 200 were used. The comparison results are shown in table 7.
(1) Dynamic modelling strategies often add lagged variables to bring the system's dynamic characteristics into the model. For the mwKPLS model, if the descriptor variables only add x(t − 1), the fitting accuracy and the prediction accuracy are both reduced. When y(t − 1) is added, the fitting accuracy and the prediction accuracy both improve; similar results were obtained for the knnMI-mwKPLS model. (2) With the performance of the mwKPLS and knnMI-mwKPLS models improved by adding the y(t − 1) variable, the influence of phase space reconstruction was then further analysed. From the results in table 7, it can be verified that the fitting accuracy on the training set and the prediction accuracy on the test set are both improved by the reconstruction.

Dynamic inferential model analysis
In this paper, the dynamic model and the corrected dynamic model were analysed, and the model update performance was verified with field data for different operating states, including the steady state, the variable state and the blowback process of the NOx analyser.

The moving window method was used to select the steady-state and variable-state samples from the preprocessed operating data. The steady-state determination criterion was evaluated using the stability factor (SF),

SF = (x_max − x_min)/x̄ ≤ δ_0,    (5.3)

where N is the length of the moving window, x_max and x_min are the maximal and minimal values, respectively, of the samples (x_i, i = 1, …, N) in the moving window, x̄ is their mean and δ_0 is the SF threshold given previously. In this work, the boiler load was chosen as the feature variable for the state judgement; δ_0 was set to 0.083, N was 200 and the sampling period was 10 s. Finally, the steady-state and variable-state samples were obtained according to the above criterion, as shown in figure 6. It can be seen from figure 6a and table 8 that when the unit is in the steady state, the load is relatively stable; therefore, the dynamic model showed good predictive accuracy, with a low RMSE of 1.4540 mg m−3 and a high coefficient of determination (Q²) of 0.9038. The configuration parameters of the model include L, c*, a_1 and a_2, with selected values of 4, 2, 4 and 20, respectively. After feedback correction, the corrected dynamic model showed slightly improved predictive ability. For the variable state, the boiler load gradually increased and a large amount of NOx was produced; the predictive ability of the dynamic model is lower at the peak of the NOx emission curve in figure 6b. Here, the model configuration parameters were the same as those for the steady state. After feedback correction, the corrected dynamic model demonstrated a clear improvement in its predictive ability.
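The moving-window state judgement can be sketched as follows. The SF is computed here as the window range divided by the window mean, an assumed form consistent with the dimensionless threshold δ_0 = 0.083; the function names are illustrative:

```python
import numpy as np

def stability_factor(w):
    """SF of one window: range normalized by the mean (assumed form)."""
    return (np.max(w) - np.min(w)) / np.mean(w)

def steady_mask(x, N=200, delta0=0.083):
    """Mark sample t as steady when the trailing window of N samples of the
    feature variable (here the boiler load) satisfies SF <= delta0."""
    mask = np.zeros(len(x), bool)
    for t in range(N - 1, len(x)):
        mask[t] = stability_factor(x[t - N + 1:t + 1]) <= delta0
    return mask
```

With these settings, a load that fluctuates mildly around 800 MW is classified as steady, while a ramp between 700 and 900 MW is classified as a variable state, matching the partition used in figure 6.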
The NOx analyser periodically performs a blowback operation to ensure the cleanliness of the sampling system. At this time, the last measured value of the NOx emission concentration is held until the end of the blowback process; therefore, this is an important scenario for the dynamic inferential model. The corrected dynamic model uses y(t) for modelling and y(t + 1) for correction; however, these values are not available at this point because y(t) and y(t + 1) are in the self-hold state during blowback. Therefore, it is necessary to substitute the predicted value ŷ(t) for the real value y(t) to calculate the predicted value ŷ(t + 1) at time t + 1. Assuming that the NOx analyser is in the blowback process from t = 300 min to t = 350 min and the model configuration parameters are L = 4, c* = 2, a_1 = 2 and a_2 = 18, the results in figure 7 and table 9 show that the dynamic model can still maintain high accuracy: the deviation between the predicted value and the real value is small, and the model effectively tracks the change in the curve, even at the highest and lowest points. When the NOx analyser reverts from the blowback process to normal operation, there is only a small disturbance to the model output.
In addition, the dynamic model does not need frequent reconstruction and parameter updates, which is similar to the TD method. Analysis of numerous experimental results showed that even if the dynamic model adopts different parameters, this has minimal effect on the accuracy of the model's predictions; therefore, the dynamic model used fixed parameters. The average time for model training was only 3.47 s for each update of the model, which meets engineering requirements.

Conclusion
In this paper, the multi-scale kernel and the Morlet wavelet kernel were combined to propose a new kernel function. The prediction accuracy of the mwKPLS model based on the new kernel function was further improved, as confirmed via verification using benchmark datasets.
Due to the response lag of the NOx analyser and the large inertia of the SCR reaction, the knnMI estimator was used to realize delay estimation so that the model samples could be reconstructed. Therefore, the dynamic inferential model was able to accurately predict the NOx emissions one sampling period in advance.
In practice, abnormal operating conditions of the boiler and the SCR system should be avoided; in particular, the continuous emission monitoring system should be in its normal working mode to ensure the accuracy of the measured data. Under normal operating conditions, the dynamic inferential model could better track the NOx emission trend under conditions with large fluctuations. If the deviation between the predicted value and the set value is large, or the NOx analyser is in the blowback process, the NH3 injection can be adjusted in time to adapt to load changes, which is beneficial for improving the de-NOx efficiency and reducing the NH3 slip, and lays the foundation for the design of a controller.
Data accessibility. This article does not contain any additional data.

Figure 7. Predicted values using the dynamic model during the NOx analyser blowback process.