Deep learning based on small sample dataset: prediction of dielectric properties of SrTiO3-type perovskite with doping modification

The perovskite crystal structure represents a semiconductor material poised for widespread application, underpinned by attributes encompassing heightened efficiency, cost-effectiveness and remarkable flexibility. Notably, strontium titanate (SrTiO3)-type perovskite, a prototypical ferroelectric dielectric material, has emerged as a pre-eminent matrix material for enhancing the energy storage capacity of perovskite. Typically, the strategy involves augmenting its dielectric constant through doping to enhance energy storage density. However, SrTiO3 doping data are plagued by significant dispersion, and the small sample size poses a formidable research hurdle, hindering the investigation of dielectric property and energy storage density enhancements. This study endeavours to address this challenge: our foundation lies in the compilation of 200 experimental records related to SrTiO3-type perovskite doping, constituting a small dataset. Subsequently, an interactive framework harnesses deep neural network models and a one-dimensional convolutional neural network model to predict and scrutinize the dataset. Distinctively, the mole percentages of the doping elements exclusively serve as input features, yielding significantly enhanced accuracy in dielectric performance prediction. Lastly, rigorous comparisons with a traditional machine learning model, specifically gradient boosting regression, validate the superiority and reliability of the deep learning models. This research advances a novel, effective methodology and offers a valuable reference for designing and optimizing perovskite energy storage materials.


Introduction
Perovskite materials have attracted widespread attention owing to their high photoluminescence quantum yield, high colour purity, tunable bandgap, wide colour gamut, high carrier mobility and long carrier diffusion length [1]. In the realm of complex oxide heterostructures with ABO3-type perovskite structures, notable physical phenomena have been observed. These phenomena encompass significant magnetoresistance, metal-insulator transitions, high-Tc superconductivity and two-dimensional electron gases. Strontium titanate (SrTiO3) stands as a quintessential inorganic compound with an ABO3-type perovskite structure. It serves as a widely used electronic functional ceramic material, boasting a high dielectric constant, low dielectric loss and excellent thermal stability. It finds extensive applications in the electronics, mechanics and ceramics industries [2]. SrTiO3 exhibits a broad bandgap (3.4 eV), excellent photocatalytic activity and distinct electromagnetic properties, along with redox catalytic activity. Consequently, it has found extensive use in photocatalytic domains, including photocatalytic water splitting for hydrogen production, the degradation of organic pollutants and photochemical cells [3-7]. Doping is a commonly used method to modify the electronic structure and magnetism of SrTiO3, allowing for its functionalization as a semiconductor, ferroelectric and magnetic material [8-10].
However, the impact of doping on the dielectric properties and energy storage density of SrTiO3 is not yet clear and requires extensive experimental data and theoretical calculations for support. With the rapid development of computers, various computational methods such as density functional theory [11,12], first-principles calculations [13], molecular dynamics [14] and finite element models [15] have been widely applied and used for high-throughput screening. Although there have been advances in algorithms and computing conditions, performing large-scale first-principles calculations is still very expensive. In the meantime, the abundance of accessible databases in materials science has made the use of data-driven machine learning (ML) methods possible, which are increasingly being used to bypass these calculations [16,17]. Examples of perovskite property prediction include band characteristics [18], optimal composition [19], dielectric performance [20], bandgap energy [21] and dielectric breakdown strength [22].
According to the choice of fitting algorithms, ML research is generally divided into two categories. The first category includes traditional ML methods [23], such as support vector regression, Gaussian process regression (GPR) and other non-neural network models. Stanev et al. [24] modelled the critical temperatures (Tc) of over 12 000 known superconductors available via the SuperCon database using several ML methods. Lin et al. [20] and Li et al. [25], respectively, predicted the polycrystalline dielectric constant and the bandgap value of perovskites using traditional ML methods. All these models demonstrated extremely low prediction error with excellent predictive performance; however, a large amount of the data in their datasets came from publicly available databases calculated using mathematical models, which allowed the traditional ML models to achieve good fitting performance. In contrast, these models exhibited low prediction accuracy, slow convergence speed and poor generalization ability when dealing with experimental data. This is because, in the field of perovskites and materials, research funding for experiments is expensive, leading to dispersed and sparse experimental data, small sample sizes and difficulties in collecting and unifying datasets, which severely obstructs the study of the relationships between perovskite composition, structure and properties.
At this time, the proposal of neural networks with strong data fitting and feature extraction capabilities led researchers to explore deep neural networks (DNNs) with many hidden layers that can be trained like the human nervous system; this is another type of ML method, deep learning (DL). The DNN model is an ML model based on an artificial neural network, as shown in figure 1, where circles represent neurons, arrows represent connections between neurons, Wj represents the weight matrix of the jth layer and the number hi in parentheses represents the number of neurons in the ith layer. To deal with the difficulty of collecting experimental data, building small sample datasets and relying on reliable experimental data, neural networks can describe material properties in high-dimensional space as functions of composition and process parameters [26], effectively overcoming the shortcomings of traditional ML models, improving prediction accuracy and convergence speed and enhancing generalization ability. DNNs have been widely applied in the field of materials science. In terms of predicting material properties, a DNN can learn the relationship between material structure and properties to quickly and accurately predict the physical, chemical, mechanical, electronic and other properties of materials [27,28]. For example, DNNs can predict the elastic modulus of various types of materials such as metal alloys, ceramics and polymers [27], as well as optoelectronic properties such as the bandgap and carrier mobility of perovskite materials [28]. In terms of constructing phase diagrams, a DNN can automatically construct phase diagrams of multi-component systems by learning the relationship between material stable phases and formation energies at different compositions and temperatures [29]. For example, DNNs can construct phase diagrams of complex ternary systems such as Al-Ni-Co and Al-Ni-Zr [29] and verify them against experimental or first-principles computational results. In terms of material structure characterization, a DNN can accelerate the design and characterization of material physical properties by learning the relationship between material structure features at different scales and target functions. For example, a DNN model can accelerate the design and characterization of the mechanical properties of non-uniform cellular materials when combined with finite element analysis [30].
Although these methods have demonstrated that DNNs can learn complex nonlinear mappings from large amounts of data and achieve or exceed human-level performance on multiple tasks, there are still very few studies on predicting small sample datasets in the field of materials science. This is because, unlike traditional ML models, DNN models cannot directly make predictions or evaluations simply by calling the model. The difficulty of training a DNN lies in the fact that as the number of hidden layers (i.e. the network depth) increases, gradients gradually vanish, leading to traps of poor local minima [31]. Feng et al. [26] and Yu et al. [32] both avoided local minima using stacked autoencoders (SAE) and fine-tuning, respectively, and achieved higher accuracy and smaller errors than traditional ML models in predicting the solidification cracking susceptibility of material defects and the mechanical properties of aluminium alloys based on small sample datasets. However, the problem of parameter explosion in DNNs remains unsolved. Since a DNN adopts a fully connected form, the connections in the structure bring orders of magnitude more weight parameters, which not only easily leads to overfitting but also easily causes traps in local optima. Moreover, pre-training and fine-tuning algorithms demand substantial computer science knowledge and typically require interdisciplinary researchers spanning both computer and materials science, resulting in significant economic and time costs. Another neural network architecture belonging to the deep learning family is the convolutional neural network (CNN), which mainly alleviates the problem of parameter explosion in DNNs. The network structure is shown in figure 2 [33], where not all upper- and lower-layer neurons are directly connected; instead, they are connected through 'convolution kernels' acting as intermediaries (partial connections). Owing to this characteristic of limiting the number of parameters and mining local structures, most research on CNNs has focused on image recognition [34-37]. However, the CNN's role in data regression problems is also extremely powerful. Cao et al. [38] combined CNN and long short-term memory (LSTM) models to achieve high-accuracy prediction of water plant operation data; Malek et al. [39] used a one-dimensional CNN to extract features from spectral data, combined with a support vector machine (SVM) and GPR, for accurate regression prediction; Kołodziej et al. [40] estimated heart rate variability using a one-dimensional CNN and achieved higher accuracy than a multilayer perceptron (MLP) and SVM. These applications have all demonstrated the CNN's strong data fitting and feature extraction capabilities. However, there are very few studies on data regression in the field of materials science. Therefore, constructing a CNN model to predict material properties can fill this gap and achieve improved prediction accuracy and convergence speed.
In this work, experiments were conducted to investigate the doping flexibility around SrTiO3 (STO), with Sr2+ ions being doped at A-sites by Ba2+, Ca2+, Pb2+, Bi3+, etc., and Ti4+ being doped at B-sites by Zr4+, Al3+, etc. [41-43]. Furthermore, some rare earth elements such as Nd3+, Sm3+, Pr3+ and Dy3+ have been shown to improve the performance of the STO system [44-48]. Reliable experimental data from the literature and data collected by the research group on STO doping modification were used to construct a small dataset based on the STO perovskite system, which includes process parameters such as element molar ratio, sintering temperature and preparation method, as well as physical descriptors such as cell parameters, cell volume and microstructure [49]. CNN and DNN models were constructed to predict the dielectric properties of the STO doping system, achieving better prediction accuracy and generalization performance than a traditional ML model (gradient boosting regression, GBR). Both the CNN and DNN models demonstrated high predictive performance on the small dataset, providing a new modelling approach for studying the correlation between composition, structure and properties in STO data.

Methodology
The construction of traditional ML models, such as the GBR model discussed in this article, can be divided into the following five steps: (i) collecting a dataset containing the target attribute; (ii) generating a feature set based on prior knowledge and intuition to describe the characteristics of specific materials; (iii) identifying important features highly correlated with the target attribute through feature selection; (iv) evaluating candidate ML algorithms and selecting the best algorithm; and (v) testing the effectiveness of the model by applying it to new data outside the dataset [23]. Unlike traditional ML models, which mainly focus on feature selection for model prediction, the emphasis and difficulty in deep learning methods lie in constructing predictive models suitable for the dataset. Currently, popular deep learning models include CNNs, DNNs and recurrent neural networks (RNNs). The Visual Geometry Group (VGG) network [50], proposed by a research team at the University of Oxford, is among those that have achieved tremendous success in practical applications. It is characterized by its use of small convolution kernels, which increases the depth of the model and improves its accuracy. GoogLeNet [51], proposed by a research team at Google, uses an Inception module that allows the model to increase its depth and accuracy without adding parameters.
In the present study, all construction steps can be divided into data collection, data preprocessing, building the GBR, DNN and CNN models, prediction and regression analysis (as shown in figure 3). The GBR model used in this work was implemented in the Python programming language, which is commonly used in ML because it is free and has numerous open-source packages that aid in ML, such as pandas [52], numpy [23], scipy [20] and scikit-learn [25]. The architectures of the DNN and CNN models were built using Matlab R2021a [53], which has a powerful deep learning toolbox [54] to assist researchers in constructing deep learning models suited to their data, including mainstream models like VGG and GoogLeNet.

Data collection
Vendik et al. [55,56] proposed analytical equations for calculating the complex dielectric constant of ferroelectric and paraelectric materials under different temperatures and electric fields. The equations are based on traditional Landau theory, taking into account four energy dissipation mechanisms. Liu et al. [57] proposed a formula, equation (2.1), for calculating the complex dielectric constant of ceramic BaxSr1-xTiO3. In this formula, G(E, T, x, ξs) is the real part of the ferroelectric body's dielectric response Green's function, x represents the proportion of barium, T represents temperature and f represents the operating frequency of the bias field. ε00(x) is an analogue of the Curie-Weiss constant C and can be expressed as ε00 = C/TC. ξs is the statistical dispersion of the bias field (also known as the defect factor), which reflects the 'quality' of the material and corresponds to defects in the material (including oxygen vacancies and non-uniformities). Γ1,2,3,4 represent the four energy dissipation (loss) mechanisms considered in the original model. Although this formula only applies to the BaxSr1-xTiO3 (BST) system, it also provides a reference for collecting data on the STO system. Starting from this formula, we established an experimental database of STO ceramic materials by selecting literature and research group experimental data. For non-data-oriented literature, where the data are displayed in graphical rather than numerical format, we used WebPlotDigitizer [58] to extract the data from the graphs. The dielectric performance of ceramics refers to the degree of polarization and loss in an electric field, which is influenced by factors such as the material composition, structure and lattice defects of the ceramics. The dielectric constant of SrTiO3 reflects its degree of polarization in an electric field, which is influenced by factors such as temperature, frequency and doping, as evidenced by experimental data reported in the literature. This is why we chose the dielectric constant data of the STO system to establish our database. In experimental testing of dielectric constants, different sample thicknesses and preparation process parameters are used, making it challenging to compare and draw consistent conclusions from data obtained from different studies. However, this is not an issue for DNNs. On the contrary, when we add these features to the dataset, the probability of CNNs, DNNs and other deep learning models discovering complex hidden relationships increases with the increase in data space variations.
In this study, a total of 200 sets of dielectric constant data for doped and modified SrTiO3 were collected from the literature and our research group. Unlike traditional ML approaches, the dataset used in this study is solely derived from experimental data, bypassing the complex and often resource-intensive feature extraction process typically required in conventional ML. This approach eliminates the need for extensive computational resources and time-consuming feature selection procedures. By using only the material compositions, specifically the stoichiometric ratios between the STO system and the dopants (Sr, Ti, O, Sm, Ba, Er, Hf, Zn, B, Si, Nd, Mn, Zr, Bi, Mg, Ca, Al and Sn), transformed into unified molar ratios expressed in mol%, this study achieves remarkable predictive performance within the framework of deep learning models.

Data screening and processing
The Pearson correlation coefficient was used to screen the features in the dataset. For two variables X and Y, it is given by equation (2.2):

r = \frac{\sum_{i}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i}(X_i - \bar{X})^2 \sum_{i}(Y_i - \bar{Y})^2}}, (2.2)

where \bar{X} and \bar{Y} are the means of X and Y, respectively. The calculated Pearson correlation coefficients range from -1 to 1, where a value close to 1 indicates a positive correlation between the two variables, a value close to -1 indicates a negative correlation and a value closer to 0 indicates little correlation between the two variables. Features with high correlation are removed, while those with low Pearson correlation coefficients are retained as inputs for ML. As shown in figure 4, when considering elements in the STO system, the Pearson correlation coefficients between the Sr, Ti and O elements and the dopant element molar ratios were low, and these elements were, therefore, all retained as feature inputs for ML. Unlike previous literature that mainly focused on 'good' data, i.e. data obtained by improving the dielectric properties of the STO system through doping modification, this study also collected data that failed to improve, or even decreased, the dielectric constant of the STO system. These so-called 'bad' data, which are not suitable for prediction with traditional ML models, were also added to the database and used to train the model. In summary, this measurement database comprises 16 distinct doping elements, totalling 200 data points (see electronic supplementary material), serving as the doping dataset for the STO system. As emphasized in this work, most of the data in this database represent STO composites; however, a part represents other components such as BST.
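As a minimal illustration of this screening step, the coefficient of equation (2.2) can be computed and used to drop highly inter-correlated features. This is a sketch under our own assumptions (the feature names and the 0.9 threshold are hypothetical, not taken from the study):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def screen_features(features, threshold=0.9):
    """Keep a feature only if its |r| with every already-kept feature stays
    below the threshold; highly correlated features are removed."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept
```

In this sketch, a feature perfectly correlated with one already retained (r = 1) would be dropped, while weakly correlated columns, such as the Sr, Ti and O mole fractions versus the dopant ratios, would all survive the screen.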

Data analysis
The histogram in figure 5

Data preprocessing

Data normalization
To conform to ML input requirements, this study converted the collected mass ratios of STO and doping materials into molar ratios and normalized the data. Normalization is an essential preprocessing step in ML that helps the model converge faster. In this work, all features were normalized before the training process. The Matlab function mapminmax can be used to normalize the data for preprocessing by adjusting its parameters, and it can also be used to perform inverse normalization for regression prediction. Formula (2.3) shows the principle of this function for normalization:

Y = \frac{(Y_{\max} - Y_{\min})(X - X_{\min})}{X_{\max} - X_{\min}} + Y_{\min}, (2.3)

where X represents the input data to be processed, X_{\max} and X_{\min} are the maximum and minimum values, respectively, and Y represents the normalized output matrix. The values of Y_{\max} and Y_{\min} are generally set to 1 and -1, respectively.
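A minimal Python sketch of the same min-max mapping performed by Matlab's mapminmax (formula (2.3)), together with the inverse mapping used to denormalize network predictions; the function names are our own, not part of any library:

```python
def minmax_normalize(x, y_min=-1.0, y_max=1.0):
    """Map data linearly to [y_min, y_max] per formula (2.3); also returns
    the (x_min, x_max) settings needed for the inverse mapping."""
    x_min, x_max = min(x), max(x)
    scale = (y_max - y_min) / (x_max - x_min)
    y = [(v - x_min) * scale + y_min for v in x]
    return y, (x_min, x_max)

def minmax_inverse(y, settings, y_min=-1.0, y_max=1.0):
    """Undo the mapping, e.g. to recover dielectric constants from the
    normalized network output."""
    x_min, x_max = settings
    scale = (x_max - x_min) / (y_max - y_min)
    return [(v - y_min) * scale + x_min for v in y]
```

Keeping the (x_min, x_max) record from the training data is what allows predictions on the test set to be denormalized back into physical units, mirroring the forward/reverse use of mapminmax described above.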

Division of dataset
To achieve an unbiased division of training and test data, we employed a stratified sampling approach. Initially, all data were grouped, and within each group, random selections were made for inclusion in the test set. This produced a random 9:1 split between training and test data, using 9/10 of the dataset (180 data points) for training. The remaining 1/10 of the data (20 data points) was held out to assess the generalization performance of the trained neural network, specifically its prediction accuracy on unseen data.
It is important to note that during the training process, as depicted in figure 3a of the human-machine interaction framework for constructing the DNN, the training set was further divided into training set 1 and a validation set in a 7:2 ratio. This division was also applied during the training of the CNN, ensuring the precision of the constructed DNN and CNN models.
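The grouped-then-random split described above can be sketched as follows. This is a simplified illustration only: the grouping key, the seed and the handling of small groups are our assumptions, not the study's exact procedure:

```python
import random

def stratified_split(records, key, test_frac=0.1, seed=0):
    """Group records by a key, then draw the test fraction at random from
    within each group, approximating a stratified 9:1 split."""
    rng = random.Random(seed)
    groups = {}
    for rec in records:
        groups.setdefault(key(rec), []).append(rec)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        # singleton groups contribute no test points in this sketch
        n_test = max(1, round(len(members) * test_frac)) if len(members) > 1 else 0
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

The same helper could be applied a second time to the returned training set to carve out the validation portion, mirroring the further 7:2 division used for the DNN and CNN.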

Deep neural networks

Framework for constructing deep neural networks
Figure 1 shows the model architecture of the DNN, which generally consists of an input layer, hidden layers and an output layer. While the input and output layers each consist of only one layer, the number of hidden layers is generally not fixed. The relationship between the number of hidden layers and the performance of the DNN model is shown in table 1 [31].
In theory, deeper hidden layers should enhance the fitting ability of the model and yield better results. However, in practice, too many hidden layers may lead to overfitting, increase the training difficulty and make the model difficult to converge. Therefore, when using backpropagation (BP) neural networks, it is best to refer to existing models with excellent performance. If there are no such models, start with one or two hidden layers, as shown in table 1, and try to avoid using too many layers. One should not disregard practical considerations and blindly stack multiple neural networks together. It is also helpful to use transfer learning and fine-tuning [23] of pre-trained models, which can yield significant performance improvements. In materials science regression problems, some published models have shown excellent performance with a DNN structure consisting of four hidden layers (21-(6-5-4-3)-1) for predicting solidification cracking susceptibility (SCS) [26], while others have used a three-hidden-layer DNN structure (14-(24-16-8)-1) for predicting the mechanical properties of aluminium alloys [32]. Building on these studies, we established the best DNN models with 1-5 hidden layers and compared the prediction accuracy for different numbers of hidden layers.
Although the DNN model has shown strong generalization ability for these material defects and properties, this also indicates that the number of hidden layers and the number of nodes in each layer of a DNN model are not fixed when dealing with different small datasets. Currently, there is no scientific or universal method for determining the number of hidden-layer nodes. The basic principle for selecting the number of hidden-layer nodes is to use a compact structure with as few hidden nodes as possible while meeting the accuracy requirements. The number of hidden nodes is related not only to the number of nodes in the input/output layers but also to the complexity of the problem to be solved, the type of activation function used in the transformation and the characteristics of the sample data. The number of nodes in the hidden layer is generally less than N - 1, where N is the number of training samples.
However, since the DNN model requires the determination of a large number of parameters, it also incurs significant time and economic costs. Therefore, the real challenge is to develop effective DNN models that fit small-sample experimental data well. In this work, we proposed a human-machine interactive learning framework, as shown in figure 3a. Based on the theoretical model (2.1) established from human knowledge of the STO doping system, we collected and screened experimental datasets. We initialized the model parameters, such as the number of hidden layers, the activation function of each layer and the number of nodes per layer, using existing models or literature experience. We pre-trained the DNN model according to the dataset and the initial parameters set by human knowledge, and adjusted the model parameters based on the training results, including the number of hidden-layer nodes, to achieve model 'reconstruction'. As shown in the framework of figure 3a, we learned and established the DNN model, obtained the optimal parameters through pre-training, and reconstructed the DNN to obtain the best DNN model structure for prediction and regression analysis of this small dataset.
The DNN construction method provided in this study, based on training data, is precisely the method used to determine the number of hidden layers and the corresponding nodes. The number of nodes per hidden layer is determined using the empirical formula (2.4):

h = \sqrt{m + n} + a, (2.4)

where h is the number of hidden-layer nodes, m is the number of input-layer nodes, n is the number of output-layer nodes and a is generally an integer between 1 and 10. The number of hidden layers ranges from 1 to 5, and the candidate activation functions are Y = tansig(x), Y = purelin(x), Y = logsig(x) and ReLU (y = max(0, x)). Our proposed framework combines the inherent learning abilities of DL algorithms, which excel at quickly learning from large amounts of data, with the analytical knowledge of humans, who abstract across different domains to predict new situations and capture even the slightest changes.
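Under the usual reading of this empirical rule (flooring the square root before adding a; the rounding convention is our assumption), the candidate node counts for this study's 18-input, 1-output network can be enumerated directly:

```python
import math

def hidden_node_candidates(m, n, a_range=range(1, 11)):
    """Candidate hidden-layer sizes h = floor(sqrt(m + n)) + a for a = 1..10,
    per the empirical formula (2.4)."""
    base = math.floor(math.sqrt(m + n))
    return [base + a for a in a_range]

# 18 input features (element mole percentages) and 1 output (dielectric constant)
print(hidden_node_candidates(18, 1))  # → [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```

This reproduces the 5-14 search range for the hidden-node count reported later in the pre-training experiments.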

Construct deep neural network model
Based on the human-machine interaction framework shown in figure 3a, a DNN model is constructed. The DNN model is initialized with 1-5 hidden layers; a three-hidden-layer architecture is used as an example, as shown in figure 3b. The pre-training phase involves training the model for 1000 epochs with a learning rate of 0.01 and a target minimum error of 0.000001. The mean squared error (MSE) on the validation set is calculated as the prediction accuracy during the pre-training phase, and it is used to determine the optimal number of nodes in the hidden layers. Based on the optimal number of hidden layers and nodes, DNN models with 1-5 hidden layers are established and trained again with the same network parameters as in the pre-training phase. The trained models are then used for simulation, and the predicted results are denormalized. The denormalized results are compared with the true values of the test set, and the calculated error values are used as the prediction error of the model.

Convolutional neural network

Framework for constructing convolutional neural network
Figure 2 shows the structure and operating principle of a CNN, which generally includes an input layer, convolutional layers, pooling layers, fully connected layers and an output layer. The output layer is the regression layer for regression prediction. The steps for constructing a CNN model are shown in figure 3d. After preprocessing, data are fed into the input layer and then processed by the convolutional layer and activation function. In this study, we used one-dimensional valid convolution as an example, with a tensor I of length 5 and a kernel K of length 3. I is the input data matrix, and K is the convolution kernel. K moves along I sequentially, and at each fixed position, the corresponding values are multiplied and summed. Valid convolution only considers positions where K completely overlaps I, that is, K moves entirely within I, as shown in figure 3c.
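The sliding-window operation just described can be written out directly. Note that, as in most deep learning frameworks, this sketch computes cross-correlation (the kernel is not flipped); the example values of I and K are our own, chosen only to match the stated lengths:

```python
def conv1d_valid(signal, kernel):
    """One-dimensional 'valid' convolution: the kernel only visits positions
    where it fully overlaps the input, so the output has
    len(signal) - len(kernel) + 1 values."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# A length-5 tensor I and a length-3 kernel K, as in the example in the text
I = [1, 2, 3, 4, 5]
K = [1, 0, -1]
print(conv1d_valid(I, K))  # → [-2, -2, -2]
```

With a length-5 input and a length-3 kernel, the valid output has exactly three values, illustrating how the partial connections limit the parameter count relative to a fully connected layer.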
After the data are processed by the convolution operation, they are passed through the activation function before entering the pooling layer. The activation function is a nonlinear mapping of the output of the convolutional layer. If no activation function is used (equivalent to f(x) = x), the output of each layer is a linear function of the input of the previous layer. In that case, no matter how many layers the neural network has, the output is a linear combination of the input, which is the same as having no hidden layer at all, i.e. the most primitive perceptron.
Pooling, also known as under- or down-sampling, is mainly used for feature dimensionality reduction, compressing the amount of data and the number of parameters, reducing overfitting and improving the model's fault tolerance. There are two main types, max pooling and average pooling, and we used the former in this study.
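Max pooling can be sketched in a few lines; the window size of 2 and stride of 2 here are illustrative choices, not parameters taken from the study's model:

```python
def max_pool1d(x, pool=2, stride=2):
    """Down-sample a 1-D feature vector by taking the maximum of each window."""
    return [max(x[i:i + pool]) for i in range(0, len(x) - pool + 1, stride)]

print(max_pool1d([3, 1, 4, 1, 5, 9]))  # → [3, 4, 9]
```

Each window keeps only its strongest activation, which is how pooling compresses the feature length while retaining the most salient responses.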
When the data reach the fully connected layer, all neurons between the two layers are connected with weights; fully connected layers are usually placed at the end of the CNN. That is, the neurons are connected in the same way as in a traditional neural network, as shown in figure 1, but an activation function needs to be applied between the fully connected layers.

Construct convolutional neural network model
The CNN model employed in this research comprises eight layers in its basic architecture, as illustrated in table 2. It includes an input layer, convolutional layers, pooling layers and an output layer, as well as two activation function layers and a fully connected layer. It uses the adaptive moment estimation (ADAM) algorithm as its optimization function. ADAM is characterized by its adaptability, fast convergence, low memory requirements and robustness, making it a widely adopted choice in the field of deep learning. The maximum number of training epochs is set at 1000, with a batch size of 24 for each iteration. The gradient threshold is set to 1, controlling the magnitude of gradients and truncating them when they exceed the threshold. The initial learning rate is set at 0.005.
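The ADAM update rule underlying this choice can be sketched for a single parameter. The 0.005 learning rate is the one stated above; the decay rates β1 = 0.9 and β2 = 0.999 are the standard defaults and are assumed here, since the paper does not specify them:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.005, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM update: exponential moving averages of the gradient and its
    square are maintained, bias-corrected by the step count t, then used to
    take a scaled gradient step."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (uncentred variance)
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

On the first step the bias-corrected moments recover the raw gradient, so the parameter moves by roughly the learning rate in the downhill direction; the per-parameter scaling by the second moment is what gives ADAM its adaptability.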

Principles of the gradient boosting regression model
The GBR model improves the predictive accuracy of the final regression result by minimizing a loss function during the training process. The final model is an ensemble of individually optimized decision trees, built in a staged additive manner, that infers f(X) → E. The target property of interest, denoted 'E', is predicted using input data and features of the perovskite material, represented as 'X'. The relationship between the input and output is modelled by a function, denoted 'f(X)'. ML aims to determine this function using a well-defined set of labelled perovskite data (known as the training set) and to use the trained model to predict the target property E of new perovskite materials not included in the training set. Typically, the algorithm weights several weak learners for predicting E, each obtained from a separate training stage. The ensemble can be represented in the form of equation (2.5):

F_N(X) = \sum_{n=1}^{N} f(X; w_n). (2.5)

Equation (2.6) expresses the optimal weight coefficients after minimization using the loss function:

w_n = \arg\min_{w} \sum_{i} L\big(E_i, F_{n-1}(X_i) + f(X_i, w)\big), (2.6)

where n is the training stage, X is the input data, E is the target property, w_n is the distribution weight vector, f(X_i, w_n) is the regression function and F_{n-1}(X_i) is the current model. The loss function L uses either the squared error or the absolute error.

Parameters for the gradient boosting regression model
The GBR model employed in this study underwent hyperparameter optimization to tailor it for data regression prediction on small datasets. Among these parameters, the maximum number of iterations was set to 2000, and the learning rate, typically tuned in conjunction with the former, was set to 0.1 in this study. The decision tree's maximum depth was set at 2, the minimum number of samples required for internal node splitting was 2 and the loss function used was the MSE.
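The boosting procedure these hyperparameters control can be sketched in pure Python. This is a simplified illustration: depth-1 trees (decision stumps) stand in for the depth-2 trees used in the study, the round count is reduced, and the data are synthetic:

```python
def fit_stump(x, y):
    """Fit a one-split regression tree: choose the threshold minimizing the
    squared error of the two leaf means."""
    best = None
    for t in sorted(set(x))[1:]:
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((yi - lm) ** 2 for yi in left) + sum((yi - rm) ** 2 for yi in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi < t else rm

def gbr_fit(x, y, n_rounds=50, lr=0.1):
    """Stagewise additive boosting per equations (2.5)-(2.6): each new
    stump is fitted to the residuals of the current ensemble."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(s(xi) for s in stumps)
```

Each round shrinks the residuals by a factor governed by the 0.1 learning rate, which is why the rate and the iteration budget (2000 in the study) are tuned together.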

Prediction accuracy
For each ML model, three standard errors are used to assess the performance accuracy of the prediction exercise. They include the mean absolute error (MAE), the root mean square error (RMSE) and the R-square or coefficient of determination (R²). The MAE, equation (2.7), measures the average magnitude of errors in the set of predictions:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|E_i - \hat{E}_i\right| \quad (2.7)$$

The RMSE, equation (2.8), has the benefit of penalizing large errors and is useful when large errors are particularly undesirable:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(E_i - \hat{E}_i\right)^2} \quad (2.8)$$

The R² metric, equation (2.9), is a statistical measure of how closely the data fit the regression line, expressed as a percentage. Both the MAE and RMSE express average model prediction errors in the units of the variable and are negatively oriented scores, meaning the lower, the better [60].

royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 11: 231464

$$R^2 = 1 - \mathrm{RSE} = 1 - \frac{\sum_{i=1}^{n}\left(E_i - \hat{E}_i\right)^2}{\sum_{i=1}^{n}\left(E_i - \bar{E}\right)^2} \quad (2.9)$$

where RSE is the relative square error, $\hat{E}_i$ denotes the predicted value of the variable E and $\bar{E}$ is its mean value, with summation over the n samples, i = 1, 2, ..., n.
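The three metrics follow directly from equations (2.7)-(2.9); a minimal NumPy sketch with illustrative toy values:

```python
import numpy as np

def mae(E, E_hat):
    """Mean absolute error, eq. (2.7)."""
    return np.mean(np.abs(E - E_hat))

def rmse(E, E_hat):
    """Root mean square error, eq. (2.8)."""
    return np.sqrt(np.mean((E - E_hat) ** 2))

def r2(E, E_hat):
    """Coefficient of determination, eq. (2.9): R^2 = 1 - RSE."""
    rse = np.sum((E - E_hat) ** 2) / np.sum((E - np.mean(E)) ** 2)
    return 1.0 - rse

# Toy target values and predictions.
E = np.array([1.0, 2.0, 3.0, 4.0])
E_hat = np.array([1.1, 1.9, 3.2, 3.8])
```

Here `mae(E, E_hat)` gives 0.15 and `r2(E, E_hat)` gives 0.98, illustrating the negatively oriented error scores and positively oriented fit score.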

Training GBR/DNN/CNN
In addition, this research conducted training and testing of the GBR model to validate the accuracy advantage of DNN and CNN. The GBR model was trained and used for prediction on the preprocessed input data, using the Python language and the GBR implementation from the scikit-learn library. The structure of the DNN is illustrated in figure 3b. It takes all elements as input, and the DNN thus constructed consists of 18 input neurons (each corresponding to one input variable), hidden neurons (with varying numbers) and one output neuron. The transfer function for the output layer is the linear function purelin, and the training algorithm used for the model is trainlm. During the pre-training phase, using the human-machine interaction framework proposed in this study (depicted in figure 3a), a DNN with the structure 18-(N1)-1 was trained, where N1, the number of nodes in the hidden layer, is determined from empirical formula (2.2) and ranges from 5 to 14. In the pre-training phase, the other parameter values were set as follows: maximum training epochs, 1000; learning rate, 0.01; target minimum error, 0.000001. Figure 6 illustrates the variation of the pre-training error as the number of nodes in the hidden layer increases; the optimal number of hidden nodes is 5, with a corresponding MSE of 0.84306. Thus, the final DNN structure with one hidden layer is determined as 18-(5)-1.
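Formula (2.2) itself is not reproduced in this excerpt. One commonly used empirical rule that reproduces the stated range of 5-14 hidden nodes for an 18-input, 1-output network is N1 = ⌊√(n_in + n_out)⌋ + a with a ∈ [1, 10]; this specific form is an assumption, shown here only as a sketch.

```python
import math

def hidden_node_candidates(n_in, n_out, a_range=range(1, 11)):
    """Candidate hidden-layer sizes from the empirical rule
    N1 = floor(sqrt(n_in + n_out)) + a, a = 1..10 (assumed form of eq. (2.2))."""
    base = math.floor(math.sqrt(n_in + n_out))
    return [base + a for a in a_range]

# For the 18-input, 1-output DNN of this study:
candidates = hidden_node_candidates(18, 1)
```

Each candidate N1 is then pre-trained in turn and the one with the lowest MSE (here, N1 = 5) is retained.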
Similarly, DNNs with different configurations were trained using random initialization and pre-training techniques such as SAE and fine-tuning: two hidden layers with 13 and 12 nodes, structured as 18-(13-12)-1, and DNNs with three, four and five hidden layers structured as 18-(8-7-6)-1, 18-(11-10-10-9)-1 and 18-(11-10-10-9-9)-1, respectively. The numbers inside the parentheses represent the node counts of the corresponding hidden layers. The optimal DNN models for the entire input dataset were confirmed through the human-computer interaction framework and pre-training. CNNs were trained using error analysis on a validation set, and the lowest RMSE of 0.0695 was achieved after 1000 rounds of training. The trained models were then used for data regression prediction. More than 100 iterations of training with different random seeds were performed for each GBR and DNN configuration, and the best-performing DNN, GBR and CNN models were selected for comparison. The GBR model used fivefold cross-validation to prevent overfitting, and data normalization was applied to the DNN and CNN for the same purpose. The GBR model in this study was implemented using the scikit-learn library in Python, covering data preprocessing and regression prediction. All computations of the deep learning models were conducted in Matlab R2021a using the statistics and ML toolbox, neural network toolbox and regression learners. Owing to the small dataset and pre-training, the training time for the DNN and CNN configurations on a personal computer ranged from a few seconds to several tens of seconds, significantly shorter than training deep learning models for image recognition, which requires multiple graphics processing units (GPUs) and many hours or even days. Finally, the MAE, RMSE and R² values between the computed target values and the predicted values from GBR, DNN and CNN were used as evaluation metrics for model prediction accuracy.
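The fivefold partition underlying the GBR cross-validation can be sketched as follows; the helper is illustrative, not the scikit-learn internals.

```python
import numpy as np

def fivefold_indices(n_samples, seed=0):
    """Shuffle the sample indices and partition them into five folds; each
    fold serves once as the validation set while the remaining four folds
    are used for training."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return np.array_split(idx, 5)

# For the 200-sample dataset of this study each fold holds 40 samples.
folds = fivefold_indices(200)
```

Averaging the validation error over the five folds gives a less optimistic estimate of generalization than a single train/test split, which matters on a dataset this small.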

Comparison of model prediction accuracy

3.2.1. Original dataset
Using the established DNN and CNN models, as well as the GBR model, regression prediction was performed on the unseen portion of the original dataset. The results are presented in table 3.
From the table, we can discern that the DNN model constructed using the human-machine interaction framework performs exceptionally well, particularly the DNN model with four hidden layers, which exhibits the best fit. It achieves an impressively high R² value of 0.87515, surpassing the GBR, a traditional ML model, which achieves an R² value of 0.8648. However, the one-layer DNN model demonstrates very poor predictive accuracy on the original dataset, rendering its R² value meaningless. Furthermore, its prediction results exhibit significant errors, with both MAE and RMSE values even surpassing those of the GBR model. This discrepancy can be attributed to the inherent complexity of the STO dataset's data space, underscoring that a single-hidden-layer DNN model is unsuitable for data regression prediction on this dataset.
As the number of hidden layers increases, the model accuracy initially increases and then decreases rather than rising linearly, similar to the trend observed for the number of hidden-layer nodes. This finding is consistent with the results of other researchers. Firstly, when the number of neurons in the hidden layer is increased, the training accuracy of the neural network easily reaches high scores, but the testing accuracy remains limited. Secondly, blindly increasing the number of hidden layers does not linearly increase the prediction accuracy of the DNN model and may even increase the errors and decrease the accuracy. This is one of the main reasons why researchers pursue higher testing accuracy of DNNs, which represents true learning ability (training accuracy can be understood as memorization ability).
It is worth noting that the CNN model's performance on the original dataset is not satisfactory, with lower predictive accuracy than the GBR model. Furthermore, in the predictions on the original dataset, although the R² values for all three models (excluding the one-layer DNN) are relatively high, their MAE and RMSE values are around 500 and 1000, respectively, indicating significant errors. This suggests that all models perform poorly within the original dataset's range. Additionally, the deep learning models, DNN and CNN, do not demonstrate better regression fitting capabilities for the original dataset than the traditional ML methods commonly used in materials-related applications. This discrepancy can be attributed to the fact that the input features of this study's dataset consist solely of the molar ratios of doping elements in STO, representing complete experimental data rather than the extensive computational material descriptors used in other literature. Moreover, the dataset contains a large amount of colossal dielectric constant data, which also contributes to the moderate performance of ML on this dataset.
The predictions of the GBR model, the optimal DNN model and the CNN model on the original dataset are illustrated in figure 7. The horizontal axis represents the true values ('Target') of the original dataset, while the vertical axis represents the predictions of the three models ('Prediction'). In the figure, the solid red line represents the linear regression equation; the closer the cyan data points are to the yellow dashed line, the more accurate the predictions. The table in the figure displays the equation, intercept, slope, sum of squared residuals, Pearson's r value and sample size (N) corresponding to each panel. However, in contrast to traditional ML models that rely solely on hyperparameter optimization, the deep learning models built through the human-machine interaction framework proposed in this study exhibit respectable predictive accuracy on the original dataset. Additionally, owing to their more intuitive model architecture, they show significant potential for data regression prediction on small datasets.

Small dataset
To further validate the robust fitting and generalization ability of DNN and CNN on small-sample datasets, we reselected the data by removing the doping components related to the giant dielectric constant, particularly the Mn element, resulting in a new small-sample dataset with the input features reduced to 17 and a dataset size of 125 samples. Meanwhile, the GBR and CNN model architectures used in this section remained unchanged, with the number of input features or nodes decreasing accordingly. However, the DNN model was reconstructed using the human-computer interaction framework shown in figure 3a, which further demonstrates the versatility and user-friendliness of the framework proposed in this study. The DNN model was first pre-trained on the small-sample dataset, and figure 8 shows the variation of the pre-training error as the number of hidden-layer nodes increases. From the figure, it can be observed that the optimal number of hidden-layer nodes is 5, with a corresponding MSE of 0.36397. Therefore, the optimal structure for the DNN with one hidden layer is 17-(5)-1. Finally, the GBR, DNN and CNN models were retrained on the small-sample dataset, and their performance on the final unseen dataset was tested, as shown in figure 9. Compared with the models used on the original dataset, the predictive models for the small dataset exclude the ineffective one-layer DNN model.
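The reselection step described above can be sketched as follows; the column names and toy data are hypothetical placeholders standing in for the real 18-feature, 200-sample dataset.

```python
import numpy as np

# Illustrative reselection: drop samples that contain the colossal-dielectric
# dopant (here, a nonzero Mn mole fraction) and remove that feature column.
feature_names = ['Sr', 'Ti', 'O', 'Ba', 'Mg', 'Mn']  # 18 columns in the real dataset
X = np.array([[0.90, 1.0, 3.0, 0.10, 0.00, 0.0],
              [0.80, 1.0, 3.0, 0.00, 0.00, 0.2],
              [0.95, 1.0, 3.0, 0.00, 0.05, 0.0]])

mn = feature_names.index('Mn')
keep = X[:, mn] == 0.0                     # rows without Mn doping
X_small = np.delete(X[keep], mn, axis=1)   # Mn column removed from kept rows
```

On the real data this reduces the feature count from 18 to 17 and the sample count from 200 to 125.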
In figure 9, the green portion on the left represents the predictive accuracy of the GBR model. The GBR model has a lower MAE than the two-layer DNN model but a higher MAE than the three-, four- and five-layer DNN models, and a significantly higher one than the CNN. Its RMSE value is the highest among the three model types, and its R² value is the lowest. Additionally, in contrast to the GBR model applied to the original dataset, the GBR model on the small dataset exhibits a performance decline. This suggests that the GBR model has relatively poorer data-fitting capability on small datasets than the deep learning models. However, it still demonstrates stable predictive performance when compared with the unoptimized DNN (the one-layer DNN). This indirectly confirms the scientific validity of using traditional ML models for material property prediction in the field of materials. While stable predictive performance is important, especially considering the extensive time and economic costs associated with material computations, achieving higher data-fitting capability on small datasets requires the integration of deep learning models, which are indispensable in computer science for significant applications. The middle yellow section of figure 9 displays the changes in the R² value, MAE and RMSE of the DNN model's predictions on the small dataset as the number of hidden layers increases. The results indicate that as the number of hidden layers increases from two to four, the DNN model's R² value rises from 0.85203 to its peak of 0.8986. However, as the number of hidden layers further increases to five, the R² value decreases to 0.88513. The patterns for RMSE and MAE are the opposite: with an increase in the number of hidden layers from two to five, the errors first decrease and then increase. It is noteworthy that the RMSE reaches its minimum with the four-layer DNN model, while the MAE achieves its minimum with the three-layer DNN structure; the latter's overall accuracy is still not as high as that of the four-layer DNN model. In summary, the 17-(13-13-12-11)-1 structure exhibits the highest accuracy among the DNN models, consistent with the predictive results of the DNN model applied to the original dataset.
The purple part on the right-hand side of figure 9 represents the prediction accuracy and error of the CNN model. The CNN model has the smallest prediction errors in terms of MAE and RMSE among all the models and the highest R² value, indicating that the CNN model delivers the best prediction performance on the small-sample dataset of the STO system in this study. This not only proves that deep learning has a much better data-fitting ability than traditional ML models but also demonstrates that the CNN model has better data regression ability for small-sample datasets in materials science than the DNN model. Table 4 compares the prediction accuracy and errors of the traditional ML model GBR, the best DNN model and the CNN model. Among them, the CNN model has the highest R² value, 0.9715, and the smallest prediction errors. The DNN model with the structure 17-(13-13-12-11)-1, which contains four hidden layers, has the best DNN accuracy, with an R² value of 0.8986. Both deep learning models have higher testing accuracy than the GBR model, with improvements of 0.1423 and 0.0694, respectively. The prediction errors are also significantly reduced relative to the GBR model, with the largest MAE reduction being 46.2065 and the largest RMSE reduction being 89.1024. Compared with the original dataset error results in table 3, the MAE and RMSE values of the four-layer DNN model are reduced from 605.093 to 91.2099 and from 1099.3045 to 115.9266, respectively. At the same time, the MAE and RMSE values of the CNN model's predictions are reduced from 745.9119 to 48.617 and from 1239.3661 to 61.3909, respectively, a remarkable error reduction, with the R² value increasing from 0.84131 to 0.9715. This demonstrates the powerful data-fitting and predictive performance of deep learning models on small datasets.
Figure 10 shows the linear regression plots of the target values versus the predicted values of GBR, the best DNN model and the CNN model on the unseen test dataset. The horizontal axis, labelled 'Target', represents the actual values of the dielectric constants in the test dataset, while the vertical axis, labelled 'Prediction', corresponds to the predicted values from the GBR, DNN and CNN models. The solid red line represents the linear regression equation; the closer the data points are to the yellow dashed line, the more accurate the predictions. As shown in the figure, both the DNN and CNN models exhibit strong data regression capabilities on this small-sample dataset, with a significant reduction in errors compared with the original dataset. Moreover, both deep learning models achieve higher accuracy than the GBR model.

Conclusions
In this study, we have established a small-sample dataset of SrTiO3-based perovskite materials whose dielectric constants are modified through doping. This dataset includes 'good' and 'bad' data, and we have successfully predicted the energy storage performance using CNN and DNN models, improving performance prediction accuracy.
A vast array of scattered experimental data is succinctly quantified through deep learning regression to represent the influence of doping element molar ratios on target properties, with these ratios serving as inputs for deep learning.Compared with the common traditional ML methods applied in the field of materials science, deep learning regression bypasses the need for intricate feature computation, extraction and selection.It achieves higher predictive accuracy and generalization performance for small materials domain datasets by relying on straightforward material element ratios.
Although DNN and CNN are mostly used on large datasets, they remain highly accurate models when dealing with small datasets in the field of materials. Small datasets are common in materials research; therefore, deep learning models are more suitable than traditional ML models for solving materials-related problems, such as the study of perovskite doping modification of the room temperature dielectric constant in this paper. Constructing deep learning models for small datasets using more suitable and efficient methods is both effective and necessary. This study demonstrates that deep learning models constructed with small datasets and human-computer interaction frameworks have broad application prospects in materials research, particularly for dispersed, multi-variate nonlinear problems. In comparison with the DNN model used for the small dataset of the STO system, this study also demonstrates that the CNN model exhibits lower errors and superior data-fitting and regression capabilities. This suggests a novel modelling approach for future research in materials science, particularly for small-sample studies.
Ethics. This work did not require ethical approval from a human subject or animal welfare committee.

Data accessibility. The data used in this article and its references are provided in the electronic supplementary material [61].

Declaration of AI use. We have not used AI-assisted technologies in creating this article.

Authors' contributions. Q.L.: data curation, investigation, methodology, writing-original draft; H.H.: writing-review and editing; H.L.: supervision.

Figure 2.
Figure 1. DNN model architecture diagram. The DNN model consists of an input layer, hidden layers and an output layer, with circles representing nodes in each layer. DNN, deep neural network.

Figure 3. Overall framework of this study. This schematic introduces the details of data collection and partitioning, as well as the process of model construction and prediction, including GBR, DNN, CNN and regression analysis of the results. (a) Human-machine interaction learning framework for building DNN models. Theoretical models are first built on human knowledge; pre-training is then used to adjust the model parameters and the number of nodes in the hidden layers to achieve 'reconstructed' models. The DNN model obtains the optimal structure for this small-sample dataset through pre-training and reconstruction. (b) Example of a three-layer DNN model architecture. The numbers of nodes in the input layer, hidden layers and output layer are 18, 10, 10, 9 and 1, respectively. (c) One-dimensional valid convolution calculation process in the convolutional layer of the CNN model. (d) CNN model construction process. The model architecture mainly includes the input layer, convolutional layer (+ activation function), pooling layer, fully connected layer (+ activation function) and output layer (regression layer). CNN, convolutional neural network; DNN, deep neural network; GBR, gradient boosting regression.

Figure 4. Pearson correlation coefficient diagram of the 18 variables in the dataset.

Figure 5. Data analysis of the dataset. Histograms of the 18 variables in the final dataset (sample size: 200), with statistical information such as the mean, minimum, maximum and standard deviation displayed on the histograms.

Figure 6. Model error during the pre-training phase on the original dataset as the number of nodes in the hidden layers varies. The optimal node counts are 5, 13, 8, 11 and 11, with corresponding errors of 0.84306, 0.006623, 0.004504, 0.00668 and 0.003511.

Figure 7. Prediction results of the GBR, DNN and CNN models on unseen data in the original dataset. CNN, convolutional neural network; DNN, deep neural network; GBR, gradient boosting regression.

Figure 8. Model error during the pre-training phase on the small dataset as the number of nodes in the hidden layers varies. The optimal node counts are 5, 9, 12, 13 and 12, with corresponding errors of 0.36397, 0.09005, 0.072151, 0.062461 and 0.079323.

Figure 10.
Figure 9. Comparison of the prediction accuracy and error of the GBR/DNN/CNN models on the small dataset. CNN, convolutional neural network; DNN, deep neural network; GBR, gradient boosting regression; MAE, mean absolute error; RMSE, root mean square error.
displays the minimum, maximum and average values of the 18 variables in the final dataset. Most of the variables, except for the Sr, Ti, O, Ba and Mg elements, do not have a good distribution and are not suitable for modelling. However, with the powerful data-fitting and feature extraction capabilities of deep learning models, the connections among these variables can be discovered and good regression performance can be achieved. Feature selection was conducted based on the Pearson correlation coefficient, as shown in figure 4. Highly correlated features were discarded, and data with excessive missing parameters were deleted, resulting in a final dataset size of 200.

Table 1. Relationship between the number of hidden layers in the DNN model and the results.

Finally, the output layer performs regression prediction and calculates the MSE value of the model to obtain the final prediction result. The entire model training process is then complete.

Table 2. CNN model architecture and parameter settings.

Table 3. Prediction accuracy and error comparison results of the models on the original dataset.

Table 4. Prediction accuracy and error comparison results of the GBR model, optimal DNN model and CNN model on the small dataset.