Prediction of Monomer Reactivity in Radical Copolymerizations from Transition State Quantum Chemical Descriptors

In comparison with the Q-e scheme, the Revised Patterns Scheme: the U, V Version (the U-V scheme) has greatly improved both its accessibility and its accuracy in interpreting and predicting the reactivity of a monomer in free-radical copolymerizations. Quantitative structure-activity relationship (QSAR) models were developed to predict the reactivity parameters u and v of the U-V scheme, by applying genetic algorithm (GA) and support vector machine (SVM) techniques. Quantum chemical descriptors used for QSAR models were calculated from transition state species with structures CH 3 –CHR• or •CH 2 –CH 2 R (formed from vinyl monomers CH=CHR + H•), using density functional theory (DFT), at the UB3LYP level of theory with 6-31G(d) basis set. The optimum support vector regression (SVR) model of the reactivity parameter u based on Gaussian radial basis function (RBF) kernel (C = 10, ε = 10 and γ = 1.0) produced root-mean-square (rms) errors for the training, validation and prediction sets being 0.220, 0.326 and 0.345, respectively. The optimal SVR model for v with the RBF kernel (C = 20, ε = 10 and γ = 1.2) produced rms errors for the training set of 0.123, the validation set of 0.206 and the prediction set of 0.238. The feasibility of applying the transition state quantum chemical descriptors to develop SVM models for reactivity parameters u and v in the U-V scheme has been demonstrated.


Introduction
The relationship between the composition of a binary mixture of the monomer feed and that of the resulting copolymer is one of the most important aspects in copolymerization studies [1] .For the copolymerization of monomers 1 and 2 (or a radical M 1 with a monomer M 2 ), the copolymer composition equation can be expressed as [1,2] p m 12 m 21 m where R m is the ratio of [M 1 ] to [M 2 ] in the monomer mixture and R p is the ratio of [M 1 ] to [M 2 ] in the polymer formed, r 12 and r 21 are the monomer reactivity ratios.Therefore Equation 1 is extremely useful in predicting and controlling the composition of any copolymer produced from any pair of monomers at any concentration ratios [1,2] .But Equation 1 may be limited because of the shortage of the values of r 12 and r 21 .The Q-e scheme can be used to estimate the monomer reactivity ratios with following equations [1][2][3] [ ] [ ] where Q 1 and Q 2 denote the conjugative effects of M 1 and M 2 respectively, e 1 and e 2 describe their respective polarity.Alfrey and Price [4] assumed that the parameter Q may reflect the general reactivity of a monomer (or a radical), that is, the energetic property or the thermodynamic property, as it governs reactivity in all chemical processes.In addition, the parameter e may reflect the supposed permanent electric charge resulting in mutual attraction or repulsion between the two monomers (or radicals).Published studies show that the parameter Q is dependent on the reaction free energy of the free-radical reaction and the electronegativity of the monomer (or the average electronegativity of the monomer and the radical); and the parameter e is related to the electronegativity of the monomer or both the monomer and the corresponding radical [1][2][3] .Although very widely used, the Q-e scheme has serious shortcomings.For example, the assumption that permanent electric charges exist on all the species involved, including hydrocarbons, is very unlikely.Moreover, the assumption that the polarity of a monomer being identical to that of the corresponding radical derived from that monomer is under debate [1,[3][4][5] .

-Prediction of monomer reactivity in radical copolymerizations from transition state quantum chemical descriptors
Recently, the Revised Patterns Scheme, the U-V scheme, has greatly improved both its accessibility and its accuracy, which can be expressed by Equation 4 [3][4][5][6] .
where r 1s is the monomer reactivity ratio of the monomer 1 (M 1 ) and styrene; u 2 , v 2 , and π 1 are the counterparts of e 2 , Q 2 , and e 1 in the Q-e scheme, respectively.Thus, u 2 represents the polarity of the double bond in the monomer, arising from the influence of substituents, and accounts for electronic effects, dipole effects, or specific interactions between monomers.v 2 describes the intrinsic reactivity of the monomer M 2 (i.e., the energetic property or the reaction free energy of the free-radical reaction).In addition, the U-V scheme can be used for the prediction of transfer constants (C 2 ) by using the relationship: The U-V scheme may be limited when u and v values of the monomer of interest are unknown.Therefore, the development of reliable quantitative structure-activity relationship (QSAR) models for the prediction of the basic parameters u and v is of real interest, particularly for new monomers for which experimental investigation would be expensive.QSAR approaches can conserve resources and accelerate the process of development of new molecules [7][8][9][10][11] .Yi et al. developed QSAR models for parameters u and v with quantum chemical descriptors calculated from radicals C 1 H 3 -C 2 HR 3 •.Correlation coefficients for the training sets were 0.941 for the parameter u and 0.947 for the parameter v; and correlation coefficients for the test sets were 0.947 for u and 0.934 for v [12] .
Reactivity parameters, such as u and v, are related to the reaction rate constants and activation energies.This means molecular descriptors from the transition state complexes C 1 H 3 -C 2 HR   3 ) and predict the u and v values in the U-V scheme.

Methods
Tables 1 and 2 shows experimental values of parameters u and v of 50 vinyl monomers with structures C 1 H 2 =C 2 HR 3 [6] .The entire set of reactivity parameters u ranged from -3.50 to 1.18 and v ranged from -2.06 to 1.44.Moreover, the entire sets were characterized by a high degree of structural variety.For example, the monomers included halides, ketones, sulfides, esters, ethers, aromatic rings, and so on.The experimental data of 50 reactivity parameters u and v in Tables 1 and 2 were randomly divided into three sets: a training set (30 monomers, Nos.1-30), a validation set (10 monomers, Nos.31-40), and a test set (10 monomers, Nos.41-50).
The training set was used to build models, the validation set was used to optimize the parameters of models, and the test set was used to evaluate the prediction ability.
The transition state complexes 3 ) derived from the addition of vinyl monomers (C 1 H 2 =C 2 HR 3 ) with the radical H• were fully optimized and calculated with density functional theory (DFT) in Gaussian 09 program (Revision A.02), at the UB3LYP level of theory with 6-31G(d) basis set.Frequency calculations show that each transition state complex had a single imaginary vibrational frequency [13] .
Totally, 23 quantum chemical descriptors [14][15][16] were calculated for each transition state complex.These descriptors include the average molecular polarizability (α), the total dipole moment (µ), the energies of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) of alpha spin states (E αHOMO and E αLUMO ), the energies of HOMO and LUMO of beta spin states (E βHOMO and E βLUMO ), the energy gap between HOMO and LUMO of alpha spin states (E αg ), the energy gap between HOMO and LUMO of beta spin states (E βg ), Mulliken atomic charges of C 1 , C 2 and X 3 (Q MC 1, Q MC 2 and Q MX 3), Mulliken charges of C 1 , C 2 and X 3 with hydrogens summed into heavy atoms (q MC 1, q MC 2 and q MC 3), Mulliken atomic spin densities (D MC 1, D MC 2 and D MX 3), atomic polar tensor (APT) charges (Q AC 1, Q AC 2 and Q AX 2), and APT charges with hydrogens summed into heavy atoms (q AC 1, q AC 2 and q AX 3).Here X 3 is the atom joining directly to C 2 .The descriptor α was defined as: Where α xx , α yy , and α zz are principal components of the polarizability tensor and can reflect electric perturbation in the x-, y-, and z-coordinates.APT charge on an atom is related to trace of the corresponding tensor of derivatives of dipole moment with respect to Cartesian coordinates of that atom [17] .
Support vector machine (SVM) is a set of learning algorithm mainly used to resolve the classification and regression problem [8,9,[18][19][20][21][22][23] .In SVM, systems use the input data into a high dimensional feature space and subsequently carry out the linear regression in the feature space.For a given data set (x 1 , y 1 ), (x 2 , y 2) , …, (x l , y l ), where x i ∈ R n , y i ∈ R (i = 1, 2,…, l), the linear critical function of support vector regression (SVR) is listed as below [10,18,19,22] : where n is the total number of input-output pairs, ϕ(x) is called as the feature mapping function, x is the input space, f(x) is the output, and w and b are the coefficients.SVR problem is equivalent to the solution of quadratic convex programming: subject to: * ( ) where C is a regularized constant determining the tradeoff between the training error and the model flatness, ε is Thus, Equation 7 can be rewritten as: where α i and α i * are the introduced Lagrange multipliers.Through selecting the appropriate kernel function, the entire problem can be solved in the input space itself: where s is the number of input data having nonzero values of (α i and α i * ).The kernel function K(.,.) must satisfy the condition of Mercer's theorem so that it corresponds to some type of inner product in the high-dimensional feature space.In general, the Gaussian radial basis function (RBF) is taken as the kernel function of SVM models: The adjustable parameter γ plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.So three parameters C, ε and γ in ε-SVR models should be adjusted.All SVM models from the present paper were obtained with winSVM (http:// www.cs.ucl.ac.uk/staff/M.Sewell/winsvm/).Genetic algorithms (GA) are optimization algorithms that mimic natural biological evolution.At each generation, a new set of approximations is created by the process of selecting individuals according to their level of fitness and breeding them together using genetic operators inspired by natural genetics, i.e. random mutation, crossover and selection procedures.This process leads to better models or solutions from an originally random starting population or sample [24,25] .GA together with multiple linear regression (MLR) analysis has become an effective and powerful tool in selecting variables for QSARs.Thus GA-MLR technique in the BuildQSAR program [26] was used in this work.Next, the optimal descriptor sets were used as the input files of SVM models.
The accuracy of a model was evaluated with the rootmean-square (rms) error, which can be expressed as 2 (  ) where f i is the calculated value, y i is the experimental value for the ith monomer and N is the total number of samples used.The smaller the rms value, the more accuracy the model will be.

Results and Discussion
By analyzing the parameters u and v with respect to the 23 descriptors with the GA-MLR technique in the BuildQSAR program [26] , respective optimal subset of descriptors in the model of parameters u and v was obtained.The optimal subset of descriptors for the parameter u comprises the average molecular polarizability (α), the HOMO energy of alpha spin states (E αHOMO ), and APT charge of C 1 atom (Q AC 1).The optimal descriptor subset for the parameter v consists of Mulliken atomic charge of X 3 (Q MX 3), Mulliken atomic spin density of X 3 (D MX 3), and the energy gap between HOMO and LUMO of beta spin states (E βg ).The values of these descriptors are shown in Tables 1 and 2. The definitions and standardized coefficients of these descriptors in each MLR model are listed in Table 3.A larger absolute value of beta coefficients means the corresponding descriptor is more significant.Thus, E αHOMO and E βg are the most significant descriptors in the models of parameters u and v, respectively.
The parameter u denotes the polarity of a monomer.A large u value means less polarity of a monomer.The frontier molecular orbital descriptors, such as E HOMO , E LUMO , and E g ( = E LUMO -E HOMO ) play major roles in governing many chemical reactions [7] .These descriptors were used widely in describing molecular reactivity, stability [27,28] , or polarity [7] .According to the frontier molecular orbital theory of chemical reactivity [7] , E HOMO describes the susceptibility of the molecule toward attack by electrophiles and thus is correlated with the ionization potential; and E LUMO , characterizing the susceptibility of the molecule toward attack by nucleophiles, is directly related to the electron affinity.A higher E HOMO value means the stronger electron-donating ability and the smaller electronegativity [2] , which results in a smaller electronic effect and molecular polarity.Thus E αHOMO is positively correlated with the parameter u.
Local electron densities or charges are important in many chemical reactions and physicochemical The energy gap between HOMO and LUMO of beta spin states.
-0.419 properties of compounds.They are also used widely for the description of the molecular polarity of molecules [7] .Molecular polarity is dependent on bond polarity and the molecular geometry.Generally, for vinyl monomers C 1 H 2 =C 2 HR 3 , the APT charge of C 1 , Q AC 1 is less than the charge of C 2 , Q AC 2. A monomer with the larger descriptor Q AC 1 suggests that the polar bonds (i.e., the double bond in the monomer) is relatively evenly (or symmetrically) distributed, which results in a less molecular polarity.So it is easy to understand that the descriptor Q AC 1 is positively related to the parameter u.The last descriptor appearing in the model of u is α, i.e., the average polarizability.α increases with the size of the species either as a result of an increase with the number of electrons or by the expansion of the molecular radius.A large α indicates a large size of substituent group R 3 in a vinyl monomer, which may lower the molecular symmetry and lead to a large molecular polarity and a small parameter u.Therefore, α is negatively correlated with the parameter u.
The parameter v describes the intrinsic reactivity of a monomer.A high reactive monomer, that has a large conjugative effect and a large v value, may lower the activation energy gained on adding the radical to the double bond of the monomer.Table 3 shows that v increases with decreasing E βg .The reason is that a large E g means high stability for the molecule in the sense of its lower reactivity in chemical reactions [27] .A transition state species ( 3 ) possessing a small E βg value suggests that the corresponding monomer is prone to forming a transition state structure and has a large parameter v value.As stated above, atomic charge descriptors can reflect molecular chemical reactivity (or intermolecular interactions) [7] .A large Q MX 3 (Mulliken atomic charge of X 3 ) or small D MX 3 (Mulliken atomic spin density of X 3 ) implies that the monomer is relatively easy to form a transition state structure and has a large v value.Thus, both Q MX 3 and D MX 3 are related to the reactivity parameter v.
The program winSVM was used to develop SVM models for u and v.In order to get satisfactory models, the regularized constant C, the width of the non-penalized tube ε and the bandwidth parameter γ of the RBF kernel function should be selected properly [22] .We take the training of SVM models of u as an example.Firstly, the training set of u (in Table 1) was selected as the input file to obtain 100 models after 100 iterations.The initial optimization results show that a model with SVM parameters of C = 10, ε = 10 -4 and γ = 1.0 produced a low rms error.Thus, these SVM parameters were used for further optimization with the validation set.By training the SVM models of u with different γ values of 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, and 1.4 under the condition of C = 10 and ε = 10 -4 , the validation set produced the rms errors of 0.403, 0.353, 0.327, 0.345, 0.373, 0.390, and 0.405, respectively.Thus, the optimal γ corresponding to the minimal rms error (0.327) was set to 1.0.Subsequently, by applying γ = 1.0 and ε = 10 -4 , the second parameter C was optimized with C being equal to 7, 8, 9, 10, 11, 12, and 13.The validation set rms errors based on different C are 0.395, 0.362, 0337, 0.327, 0.328, 0.334, and 0.343, respectively, so the optimal C was equal to 10.Similarly, the third parameter ε under the condition of C of 10 and γ of 1.0, was optimized with ε = 10 -6 , ε = 10 -5 , ε = 10 -4 , ε = 10 -3 , ε = 10 -2 , and ε = 10 -1 .The validation set rms errors are 0.326, 0.326, 0.327, 0.327, 0.331, and 0.388, respectively.Thus, the optimal ε equals 10 -5 .
In the end, the optimum ε-SVR model of u with the RBF kernel (C= 10, ε = 10 -5 and γ = 1.0) was tested by the prediction set in Table 1.The u values calculated with the optimal SVR model are listed in Table 1 and   The optimal ε-SVR model for v produced rms errors for the training set of 0.123, the validation set of 0.206 and the prediction set of 0.238.The mean rms error and correlation coefficient for the parameter v of 50 monomers are 0.170 and 0.981, respectively, which are comparable to the values of existing models [12] .The calculated v values from the optimal SVR model are listed in Table 2 and depicted in Figure 2.
It should be noted that there are significant experimental errors for reactivity parameters such as Q, e, u, and v.For example, as long as the correlation coefficient R between the experimental and calculated e values is greater than 0.876 (rms = 0.326), then a good fit has been achieved [2,12] .This means these QSAR models of reactivity parameters (u and v) are acceptable if their correlation coefficients are close to or above 0.9.Our models in this paper have correlation coefficients of 0.972 for u and 0.981 for v, denoting that our results for the model e are satisfactory and acceptable.

Conclusions
QSAR models of the reactivity parameters u and v in the U-V scheme used for the prediction of reactivity ratios and transfer constants for vinyl monomers in radical copolymerization were developed, by applying GA and SVM techniques.Quantum chemical descriptors used to build SVR models, were calculated from transition state species with structures C 1 H 3 -C 2 HR 3 • or •C 1 H 2 -C 2 H 2 R 3 , formed from vinyl monomer C 1 H 2 =C 2 HR 3 + H•.The models were proved to be accurate with mean rms errors of 0.272 (R = 0.972) for u and 0.170 (R = 0.981) for v, which demonstrate that calculating descriptors from transition state structures to develop SVM models for reactivity parameters u and v is feasible.

depicted in Figure 1 .Figure 2 .
Figure 2. Plot of the experimental versus calculated v values.

Figure 1 .
Figure 1.Plot of the experimental versus calculated u values. 3

should be related to u and v parameters. The purpose of this work is to calculate quantum chemical descriptors from transition state structures (C 1 H 3
• or •C 1 H 2 -C 2 H 2 R 3 (formed from the vinyl monomer + H•)

Table 1 .
Transition state descriptors and u values for 50 monomers.

Table 2 .
Transition state descriptors and v values for 50 monomers.are positive slack variables for the data points.Usually, SVR uses the ε-insensitive loss function to measure the empirical risk (training error): *

Table 3 .
Descriptors selected for models, meaning and beta coefficients.