Design of cinnamaldehyde amino acid Schiff base compounds based on the quantitative structure–activity relationship

Cinnamaldehyde amino acid Schiff base (CAAS) compounds are a new class of safe, bioactive compounds that could be developed as antifungal agents for treating fungal infections. To design new CAAS compounds with high bioactivity, the quantitative structure–activity relationships (QSARs) for CAAS compounds against Aspergillus niger (A. niger) and Penicillium citrinum (P. citrinum) were analysed. The QSAR models (R² = 0.9346 for A. niger, R² = 0.9590 for P. citrinum) were constructed and validated. The models indicated that molecular polarity and the maximum atomic orbital electronic population had a significant effect on antifungal activity. Based on the best QSAR models, two new compounds were designed and synthesized. Antifungal activity tests showed that both have strong bioactivity against the selected fungi.


Introduction
Primary and opportunistic fungal infections are a severe threat to human life and health [1]. As fungal resistance increases, many antifungal compounds have become ineffective [2]. It is therefore necessary to explore novel antifungal formulations to control fungal infections [3]. Natural products and modified naturally derived compounds continue to play a highly significant role in the discovery of antifungal agents [4], and researchers have modified natural antifungal compounds to meet key requirements for practical applications. Cinnamon oil is a natural antifungal substance whose main component is cinnamaldehyde [5]. Numerous studies have reported that cinnamaldehyde can inhibit the growth of the pathogenic microorganisms Aspergillus niger, Trametes versicolor and Staphylococcus aureus [6]. Cinnamaldehyde has also exhibited potential anti-tumour [7] and anti-diabetes [8] properties. In addition, cinnamaldehyde is generally recognized as safe and is permitted as a food additive or antimicrobial agent by the US Food and Drug Administration (FDA) [9]. However, cinnamaldehyde has many practical limitations as either an antimicrobial agent or a food additive, largely because of its high volatility and strong odour [10].
Hence, many researchers have shifted their attention to cinnamaldehyde derivatives. Sharma et al. [11] synthesized cinnamaldehyde derivatives and cinnamaldehyde Schiff bases. Their results indicated that the presence of a methoxyl group on the cinnamaldehyde benzene ring and the cinnamaldehyde Schiff base led to a noticeable improvement in antifungal activity. Cinnamaldehyde Schiff bases are an important class of cinnamaldehyde derivatives with excellent bioactivity and can be synthesized by a simple method [12]; the synthesis route is shown in figure 1. The bioactivity of cinnamaldehyde Schiff base compounds has been reported by many researchers. Zahan et al. [13] studied the dithiocarbazate cinnamaldehyde Schiff base compound and its metal complex. The bioactivity test showed that the Schiff base and its metal complex exhibited activity comparable to that of cinnamaldehyde. Wei et al. [14] published a study on cinnamaldehyde amino acid Schiff bases. Their results indicated that the Schiff base compounds were more active than the reference benzoic acid against Bacillus subtilis, Escherichia coli and Saccharomyces cerevisiae. Hence, it is worthwhile to explore and design new cinnamaldehyde Schiff base compounds with favourable bioactivity. In a previous study, the antimicrobial activity of several cinnamaldehyde amino acid Schiff base compounds was studied [15]; the results implied that these compounds possessed excellent antifungal activity, good water solubility and an odour. Cinnamaldehyde amino acid Schiff bases therefore have the potential to be antifungal agents. An initial analysis of structure and activity showed that their antifungal activity was significantly influenced by their chemical structure. A comprehensive study of the relationship between activity and structure should therefore be conducted to guide the design of new cinnamaldehyde compounds.
One approach is to design compounds using computational methods such as quantitative structure-activity relationship (QSAR) analysis [16]. QSAR provides a mathematically quantified relationship between a molecule's structural descriptors and its bioactivity at the molecular level, and can predict the activity of compounds, including those not yet synthesized [17]. With this approach, there is no need to synthesize every compound to discover those that possess the desired activity; promising compounds can be screened computationally and then synthesized in the laboratory.
This paper focuses on the use of QSAR for cinnamaldehyde amino acid Schiff base compounds to present a comprehensive analysis on the relationship between the bioactivity and structures of cinnamaldehyde amino acid Schiff base (CAAS) compounds. Under the guidance of QSAR models, two designed cinnamaldehyde compounds were synthesized and their antifungal activities were determined.

Determination of antifungal activity
The antifungal activity of all CAAS compounds against A. niger and P. citrinum was determined by the paper disc method. In brief, the procedure was as follows. Potato dextrose agar (PDA) medium with 2% agar was prepared and sterilized. The Petri dishes and PDA medium were then sterilized under UV irradiation for 20 min. The strain suspension was thoroughly mixed with the molten medium, and the mixture was poured into the Petri dishes and allowed to solidify. The autoclaved discs (approx. 8 mm) were dipped into the test solution (concentration: 0.125 mol l⁻¹) for 10 min. The discs were then placed on the surface of the solidified mixture of medium and strain suspension. The test samples were cultured at 28°C for 2 days. All tests were carried out in triplicate, and the diameter of the inhibition zone was taken as the average of the three replicates. The well-known commercial antifungal compound fluconazole served as the control. The antifungal activity rate (AR) was calculated using the following equation:
$$\mathrm{AR} = \frac{d_T}{d_C} \times 100\%$$
where $d_T$ and $d_C$ are the diameters of the inhibition zone for the test compound and fluconazole, respectively. The antifungal activity rates (ARs) and the log10 AR (lgAR) of all CAAS compounds are listed in tables 1 and 2. The lgAR was used to compute the relationship between antifungal activity and the structure of the cinnamaldehyde compounds. The values of $d_C$ for the control compound fluconazole were 18.7 mm and 13 mm against A. niger and P. citrinum, respectively.
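The conversion from inhibition-zone diameters to lgAR can be sketched in a few lines of Python. This is only an illustration, assuming that AR is the ratio of the test and control diameters expressed as a percentage; the function names and the triplicate readings are hypothetical, and only the fluconazole diameters are taken from the text.

```python
import math

# Control inhibition-zone diameters (mm) for fluconazole, as reported in the text.
D_CONTROL_MM = {"A. niger": 18.7, "P. citrinum": 13.0}


def activity_rate(d_test_mm, d_control_mm):
    """Antifungal activity rate in percent, assuming AR = dT / dC * 100."""
    if d_control_mm <= 0:
        raise ValueError("control diameter must be positive")
    return d_test_mm / d_control_mm * 100.0


def lg_activity_rate(replicate_diameters_mm, d_control_mm):
    """Average the replicate diameters, then return the pair (AR, lgAR)."""
    d_test = sum(replicate_diameters_mm) / len(replicate_diameters_mm)
    ar = activity_rate(d_test, d_control_mm)
    return ar, math.log10(ar)


# Hypothetical triplicate readings (mm) for one compound against A. niger.
ar, lg_ar = lg_activity_rate([20.0, 21.0, 19.0], D_CONTROL_MM["A. niger"])
print(f"AR = {ar:.1f}%, lgAR = {lg_ar:.4f}")
```

Under this assumption, a compound whose inhibition zone equals that of fluconazole has AR = 100% and lgAR = 2, so lgAR values above 2 indicate a larger inhibition zone than the control.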

The method used for quantitative structure-activity relationship calculations
First, the three-dimensional structures of the compounds were drawn using the ChemBio 3D 12.0 software and imported into the AMPAC Agui 9.2.1 software for geometry optimization [18]. Second, the output files containing the structural information of the 21 compounds, together with the lgAR values, were imported into the CODESSA 2.7.16 software to compute molecular descriptors. Third, the 'best multilinear regression' function in CODESSA 2.7.16 was used to calculate the regression relationship between chemical structure and antifungal activity. The number of descriptors and the optimal QSAR models were then determined by the 'breaking point' rule [19]: the squared correlation coefficient (R²) of the model increases sharply with the number of descriptors up to a certain point, after which the increase is much less significant [20]. That point is the 'breaking point'. To analyse the descriptors, the charge distribution and density of the cinnamaldehyde compounds were calculated by the
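The 'breaking point' rule is normally applied by inspecting a plot, but its logic can be sketched in Python. The sketch below is an assumption-laden illustration: the threshold `min_gain` and the R² values are hypothetical and are not taken from this study.

```python
def breaking_point(r2_by_count, min_gain=0.02):
    """Return the descriptor count after which the gain in R^2 drops below min_gain.

    r2_by_count maps the number of descriptors to the R^2 of the best model
    of that size. min_gain is a hypothetical cut-off; the rule in the text is
    applied visually rather than with a fixed threshold.
    """
    counts = sorted(r2_by_count)
    for current, following in zip(counts, counts[1:]):
        if r2_by_count[following] - r2_by_count[current] < min_gain:
            return current
    # No flattening observed: fall back to the largest model examined.
    return counts[-1]


# Illustrative R^2 values only (not data from this study).
r2 = {1: 0.62, 2: 0.78, 3: 0.88, 4: 0.93, 5: 0.94, 6: 0.945}
print(breaking_point(r2))
```

A fixed threshold is a simplification; in practice the choice is also constrained by the sample size, since too many descriptors for 21 compounds would overfit.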

The validation of optimal models
Model validation was conducted using internal validation and 'leave one out' cross-validation [21]. For internal validation, the 21 compounds were divided into three groups, termed a, b and c, with seven compounds in each group. Pairs of groups were combined to form the training sets A (a + b), B (a + c) and C (b + c), and the corresponding remaining groups c, b and a were used as the test sets. Using the same descriptors as the best model, multilinear regression was conducted to obtain a regression model for each training set, and this model was used to predict the lgAR of the corresponding test set. The statistical results, namely the squared correlation coefficient (R²), Fisher value (F) and standard deviation (s²), are listed in table 4. The 'leave one out' cross-validation was similar to the internal validation and was conducted as follows. Every fourth compound (4, 8, 12, 16 and 20) was assigned to the external test set d, and the other compounds formed the training set D. A training set model was again obtained by multilinear regression with the same descriptors as the best model, and this model was used to predict the external test set.
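The three-group internal validation described above can be sketched with ordinary least squares in NumPy. This is not the CODESSA implementation: the data are synthetic, the helper names are hypothetical, and assigning every third compound to a group is an assumption, because the text does not say how the groups were formed.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predict responses from a coefficient vector produced by fit_mlr."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against observed values."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def three_group_validation(X, y):
    """Train on two of the groups a, b, c and test on the remaining one."""
    idx = np.arange(len(y))
    # Assumption: compounds are assigned to groups in rotation.
    groups = {"a": idx[0::3], "b": idx[1::3], "c": idx[2::3]}
    # Training sets A (a + b), B (a + c), C (b + c) with test sets c, b, a.
    splits = {"A": ("a", "b", "c"), "B": ("a", "c", "b"), "C": ("b", "c", "a")}
    results = {}
    for name, (g1, g2, held_out) in splits.items():
        train = np.concatenate([groups[g1], groups[g2]])
        test = groups[held_out]
        coef = fit_mlr(X[train], y[train])
        results[name] = (
            r_squared(y[train], predict_mlr(coef, X[train])),
            r_squared(y[test], predict_mlr(coef, X[test])),
        )
    return results


# Synthetic stand-in for 21 compounds with 4 descriptors (not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(21, 4))
y = 2.0 + X @ np.array([0.5, -0.3, 0.2, 0.1]) + rng.normal(scale=0.05, size=21)
results = three_group_validation(X, y)
for name, (r2_train, r2_test) in results.items():
    print(f"training set {name}: R2(train) = {r2_train:.3f}, R2(test) = {r2_test:.3f}")
```

Comparing the training and test R² for each split gives a rough check on whether the four-descriptor model generalizes beyond the compounds it was fitted to.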

Results and discussion
3.1. Establishing the optimal quantitative structure-activity relationship models
A series of QSAR models was obtained after performing the 'best multilinear regression' procedure. The optimal model was determined by the 'breaking point' rule (figure 3), which was applied by plotting the squared correlation coefficient (R²) of the obtained models against their number of descriptors. In figure 3, R² increased rapidly up to the point corresponding to four descriptors; after this point, the increase was much smaller. Hence, this point is the 'breaking point', and the QSAR model corresponding to it is regarded as the optimal QSAR model. Additionally, the number of descriptors should meet the requirement of multilinear regression: where N is the sample number (21) and D is the number of descriptors in the final QSAR models [22]. Therefore, the optimal QSAR models were selected with four descriptors. The value of each descriptor of the optimal models is listed in tables 1 and 2. The four descriptors and the statistical data for the optimal QSAR models are listed in table 3, and the definition and analysis of each descriptor are presented later in this section. According to the statistical data, the optimal QSAR models are described by the fitted multilinear regression equations (3.2) and (3.3). In these equations, the descriptors (D) are the independent variables and lgAR is the calculated value for each compound. For the CAAS compounds, the predicted value (lgAR_calc) was calculated from these equations, and the relationships between the experimental value (lgAR_exp) and the predicted value (lgAR_calc) are presented for A. niger and P. citrinum (figure 4). In figure 4, lgAR_exp and lgAR_calc fall close to the line y = x, with R² of 0.9572 and 0.9301 against A. niger and P. citrinum, respectively, which implies that the best QSAR models have good predictive ability. All the validation results in table 4 are satisfactory, and the averages of the statistical results are very close to those of the best models.
According to the optimal QSAR model against A. niger, there were four structural descriptors that affected the antifungal activity of the CAAS compounds. The most statistically significant descriptor was the polarity parameter/square distance (D1). This is an electrostatic descriptor defined by the following equation [23,24]:
$$P = \frac{Q_{\max} - Q_{\min}}{R_{mm}^{2}}$$
where $Q_{\min}$ and $Q_{\max}$ are the most negative and the most positive atomic partial charges in the molecule, respectively, and $R_{mm}$ is the distance between the most positive and the most negative atomic partial charges in the molecule. The polarity parameter reflects the polarity and the characteristics of the charge distribution of the molecule. A compound with suitable polarity can readily penetrate the fungal cell wall or cell membrane and interact with an active target. The value of P changed with the structure of the cinnamaldehyde compounds; for example, compound 7 had a P value of 0.1249. The charge distribution changed when −OCH3 was introduced on the benzene ring (figure 5). This change in charge distribution led to an increase in $Q_{\min}$ and a decrease in P (2.5943 × 10⁻³).
The second descriptor is the relative negative charge (RNCG, D2), a quantum-chemical descriptor. RNCG is defined as the most negative charge divided by the total negative charge [25]:
$$\mathrm{RNCG} = \frac{Q^{-}_{\max}}{Q^{-}}$$
where $Q^{-}_{\max}$ is the most negative charge and $Q^{-}$ is the total negative charge. The RNCG value varies with the structure of the cinnamaldehyde compounds; for instance, compounds 2 and 3 differ only in the 4-substituent on the benzene ring: compound 2 has a methoxyl group and compound 3 has a chlorine atom in the p-position. This difference in substituent groups leads to a lower RNCG value for compound 3 (0.1624) than for compound 2 (0.2546).
The charge distributions of the optimized structures of compounds 2 and 3, calculated with Gaussian 09, could explain this difference (figure 6). In figure 6, the oxygen atom in the methoxyl group had the most negative charge (−0.554) in compound 2; both the total negative charge and the most negative charge changed when the substituent group was changed to a Cl atom. The most negative charge of compound 3 was −0.495.
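To make the two charge-based descriptors concrete, the sketch below computes them from a list of atomic partial charges and coordinates. It assumes that the polarity parameter/square distance is the difference between the most positive and most negative partial charges divided by the squared distance between those two atoms, and that RNCG is the most negative charge divided by the sum of all negative charges, as described in the text. The function names, charges and coordinates are hypothetical toy values, not output from AMPAC, CODESSA or Gaussian 09.

```python
import numpy as np


def polarity_over_square_distance(charges, coords):
    """Assumed form: P = (Q_max - Q_min) / R_mm**2.

    R_mm is the distance between the atoms carrying the most positive
    and the most negative partial charges.
    """
    charges = np.asarray(charges, dtype=float)
    coords = np.asarray(coords, dtype=float)
    i_max, i_min = int(np.argmax(charges)), int(np.argmin(charges))
    r_mm = np.linalg.norm(coords[i_max] - coords[i_min])
    return (charges[i_max] - charges[i_min]) / r_mm ** 2


def rncg(charges):
    """Relative negative charge: most negative charge / total negative charge."""
    charges = np.asarray(charges, dtype=float)
    negative = charges[charges < 0]
    if negative.size == 0:
        raise ValueError("no negatively charged atoms")
    return negative.min() / negative.sum()


# Toy molecule with five atoms (hypothetical charges, coordinates in angstroms).
toy_charges = [0.40, -0.50, 0.10, -0.20, 0.20]
toy_coords = [
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [3.0, 1.0, 0.0],
    [1.0, -1.0, 0.0],
]
print(polarity_over_square_distance(toy_charges, toy_coords))
print(rncg(toy_charges))
```

With these definitions, concentrating the negative charge on one atom raises RNCG, while spreading it over several atoms lowers it, which is the kind of shift the comparison of compounds 2 and 3 describes.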
The third most important descriptor is the ESP-HA-dependent HDCA-1 (H-acceptor dependent H-donor charged surface area, D3), a quantum-chemical descriptor [26] that represents the hydrogen-bond donor ability of the CAAS compounds [27]. CAAS compounds form hydrogen bonds more readily as HDCA-1 increases [28]. In equation (3.2), the negative coefficient for HDCA-1 indicates that the ability of the CAAS compounds to form hydrogen bonds might be detrimental to antifungal activity.
The last descriptor is the maximum total interaction for a C−O bond (D4), a semi-empirical descriptor that can be used to measure the strength of the bond between the two atoms involved [29]. The positive coefficient implies that the strength of the C−O bond contributes positively to the antifungal activity of cinnamaldehyde compounds against A. niger.
According to the optimal QSAR model against P. citrinum, the most statistically significant descriptor was the maximum atomic orbital electronic population (D5). It is an electrostatic descriptor and an index of nucleophilicity for cinnamaldehyde compounds [30]. The positive coefficient in the model indicates that an increase in D5 enhances the antifungal activity of cinnamaldehyde compounds against P. citrinum.
The second descriptor is the maximum electrophilic reactivity index for a C atom (D6), a quantum-chemical descriptor [31] that reflects the electrophilic reactivity of the C atoms in cinnamaldehyde compounds. For a given atomic species A, the electrophilic reactivity index for an A atom is defined as [32]
$$E_A = \sum_{i=1}^{n_A} \frac{C_{\mathrm{LUMO},i}^{2}}{\varepsilon_{\mathrm{LUMO}} + 10}$$
where $\varepsilon_{\mathrm{LUMO}}$ is the energy of the lowest unoccupied molecular orbital (LUMO) and $C_{\mathrm{LUMO},i}$ is the ith orbital coefficient of atom A in the LUMO. The summation runs over all valence atomic orbitals i of atom A ($i = 1, \ldots, n_A$). In the best QSAR model against P. citrinum, the positive coefficient indicates that the antifungal activity increases as the magnitude of D6 increases. The third descriptor is PNSA-2, the total charge weighted PNSA (D7). This descriptor is defined as the total negative charge multiplied by the partial negative solvent-accessible surface area, and it indicates the influence of the negative charge distribution on the antifungal activity of cinnamaldehyde compounds [33].
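The electrophilic reactivity index can be illustrated with a short Python sketch. It assumes the commonly quoted CODESSA-style form, in which the squared LUMO coefficients of an atom's valence orbitals are summed and divided by (ε_LUMO + 10), and it takes the maximum over all atoms of one element. The function names, coefficients and LUMO energy below are hypothetical toy values, not results for any compound in this study.

```python
def electrophilic_index(lumo_coefficients, e_lumo):
    """Assumed form: E_A = sum_i C_LUMO,i**2 / (e_LUMO + 10) for one atom."""
    return sum(c * c for c in lumo_coefficients) / (e_lumo + 10.0)


def max_electrophilic_index(atoms, e_lumo, element="C"):
    """Largest E_A over all atoms of the chosen element.

    atoms is a list of (element_symbol, lumo_coefficients) pairs, where
    lumo_coefficients holds the LUMO coefficients of that atom's valence orbitals.
    """
    values = [
        electrophilic_index(coeffs, e_lumo)
        for symbol, coeffs in atoms
        if symbol == element
    ]
    if not values:
        raise ValueError(f"no {element} atoms in the molecule")
    return max(values)


# Hypothetical LUMO coefficients for a three-atom fragment; e_lumo in eV.
toy_atoms = [
    ("C", [0.3, 0.4]),
    ("C", [0.1, 0.2]),
    ("O", [0.6, 0.0]),
]
print(max_electrophilic_index(toy_atoms, e_lumo=-0.5))
```

The carbon atom with the largest share of the LUMO dominates the descriptor, which matches its interpretation as the most electrophilically reactive C site.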
The last descriptor is the maximum 1-electron reactivity index for an O atom (D8), a quantum-chemical descriptor [34]. It is an important descriptor selected by the 'best multilinear regression' procedure from about 400 descriptors. In equation (3.3), D8 has a positive coefficient, which shows that an increase in the magnitude of D8 increases the antifungal activity of the cinnamaldehyde compounds against P. citrinum.

Design of new compounds
According to the analysis of the two best QSAR models, the most important factors for antifungal activity were the polarity parameter (D1) against A. niger and the maximum atomic orbital electronic population (D5) against P. citrinum. For the QSAR model against A. niger, structural factors such as the number of COO− groups and the substituent groups on the benzene ring significantly decreased the value of the polarity parameter, and this decrease was very beneficial to antifungal activity. Hence, these structural factors were chosen as structural characteristics of the newly designed compounds. In addition, the number of COO− groups in cinnamaldehyde Schiff base compounds changes the polarity of the compounds: an increase in the number of COO− groups increases the polarity of the compounds, which improves their water solubility and broadens their range of applications. For the QSAR model against P. citrinum, the value of D5 of the cinnamaldehyde compounds contributed positively to the antifungal activity against P. citrinum. Structural characteristics such as the number of COO− groups and halogen atoms markedly increased the value of D5. However, halogenated hydrocarbons are generally believed to be highly toxic [35], so this structural characteristic was not considered in the design. Overall, the numbers of −COO− and −OCH3 groups were selected as the key structural factors in the design of the compounds. On this basis, two new compounds were designed and synthesized. Structural characterization results are shown as follows.
The structures of the newly designed compounds are shown in figure 7; the AR of the new compounds was determined by the method described in the Material and methods section, and the results are listed in table 5.
The predicted lgAR of the new compounds was obtained as follows. First, the structures of the new compounds were drawn and imported into the AMPAC 9.2.1 software for geometry optimization, and the optimized structure files were saved. Then, the optimized structure files were imported into the CODESSA 2.7.16 software to calculate the molecular descriptors. Finally, the prediction function was run with the best model, and the calculated values (Cal.lgAR) are listed in table 5. In table 5, the Cal.lgAR values of the two compounds for both fungi were greater than that of the control compound fluconazole. The Cal.lgAR value of compound Da against A. niger was greater than those of all the compounds used for establishing the model. The experimental antifungal activity results showed that the new compounds exhibited better bioactivity than the compounds listed in table 1. As table 5 shows, the Exp.lgAR values were very close to the Cal.lgAR values for both new compounds against the two fungi.
The average absolute error and average relative error were 0.0545 and 2.55% against A. niger, and 0.2374 and 11.55% against P. citrinum. These small errors imply that the two best QSAR models have good predictive ability. From another perspective, the two designed compounds can be treated as an external test set for validating the best QSAR models, and the small errors indicate that the two models are reliable.
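The error statistics quoted above can be reproduced from a table of experimental and calculated lgAR values with a few lines of Python. The sketch assumes that the relative error is taken with respect to the experimental value and that both errors are averaged over the compounds; the function name and the numbers are hypothetical and are not the values in table 5.

```python
def prediction_errors(exp_values, calc_values):
    """Return (mean absolute error, mean relative error in percent).

    Assumption: relative error = |exp - calc| / |exp| * 100 for each compound.
    """
    if len(exp_values) != len(calc_values) or not exp_values:
        raise ValueError("need two non-empty sequences of equal length")
    abs_errors = [abs(e - c) for e, c in zip(exp_values, calc_values)]
    rel_errors = [a / abs(e) * 100.0 for a, e in zip(abs_errors, exp_values)]
    n = len(abs_errors)
    return sum(abs_errors) / n, sum(rel_errors) / n


# Hypothetical Exp.lgAR and Cal.lgAR values for two designed compounds.
mean_abs, mean_rel = prediction_errors([2.0, 2.5], [2.1, 2.4])
print(f"mean absolute error = {mean_abs:.4f}, mean relative error = {mean_rel:.2f}%")
```

Because lgAR is a logarithm, an absolute error of 0.05 in lgAR corresponds to roughly a 12% difference in AR itself, which is worth keeping in mind when judging how small these errors are.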

Conclusion
Two QSAR models of CAAS compounds against A. niger and P. citrinum with good statistical results were obtained and validated. The definition and analysis of the important descriptors revealed the chemical structural characteristics that influence antifungal activity. The results indicated that the molecular polarity and negative charge distribution of cinnamaldehyde compounds had an important influence on antifungal activity. Analysis of the descriptors of the two models provided guidance on chemical structure for the design of new cinnamaldehyde compounds. The two designed compounds exhibited excellent antifungal activity against both fungi, and the experimental values were very close to the predicted values. All the results indicated that the two best QSAR models have good predictive ability.
Ethics. Our research does not require any ethical approval from a local ethics committee because we carried out our work based on other sources.
Data accessibility. The datasets supporting this article have been uploaded as part of the electronic supplementary material.