** Burkhard Kirste**

Institut für Organische Chemie der Freien Universität Berlin, Takustr. 3, D-W-1000 Berlin 33, Germany

* Keywords:* EPR; spectrum fitting; computer analysis

Several methods have been proposed which may aid in the analysis of high-resolution EPR spectra: cepstral analysis [1,2], spectrum contraction (successive elimination of HFCs) [3,4], extraction of HFCs [5], correlation methods [6-9], maximum entropy method [10], significance plots [11,12] and "roll-up" transformation [13]. These methods will be more or less useful in providing estimates for the hyperfine coupling constants. However, the obtained set of parameters should be verified by spectrum simulation and comparison with the experimental spectrum. Moreover, plain simulations are of rather limited use, since minor errors in parameters (linewidths, HFCs) may lead to a poor match with the experimental spectrum. Therefore spectrum fitting, providing an automated refinement of the input parameters, is considered an indispensable component in any scheme for automated spectrum analysis.

Spectrum fitting programs require a module capable of providing a spectrum simulation based on a model which is physically adequate to the particular problem under study, a criterion for judging the goodness of fit, and an algorithm for improving the initial set of parameters. The method is based on the assumption that the "best" parameters will produce a simulation with an absolute minimum of the error function; it should be noted, however, that an unambiguous solution is not always achievable, particularly when the experimental spectrum is poorly resolved. A variety of more or less sophisticated techniques has been developed for that purpose. In principle, the problem consists in performing a nonlinear least-squares fit in the multidimensional parameter space. Common procedures are the Marquardt or multidimensional Gauss-Newton method [14,15], the simplex algorithm of Nelder and Mead [16,17], Powell's method [18] and an evolutionary Monte Carlo method [19]. General problems of fitting algorithms are failure of convergence or convergence to an undesired local minimum.

In the present paper the current status of two programs designed for the iterative least-squares fitting of high-resolution EPR spectra with constant or varying linewidths, EPRFT and HFFIT [19], will be described. Moreover, schemes for an automated analysis of EPR spectra by using these programs will be discussed.

All spectra were transferred by PC-NFS to a local workstation (SUN SPARC station IPC (4/40), 25 MHz, FPU). Spectra were evaluated by means of program DATA, yielding hyperfine couplings and linewidths from ENDOR spectra and an autocorrelation from EPR spectra. EPR spectrum simulations were performed on the workstation or on an AT-486 by means of the programs EPRFT and HFFIT. For comparison, the simulation programs were run on an Atari Mega ST4, a VAX station 3100, an IBM RT-6150, an IBM RS/6000 (Model 530) and a CDC-4680 (MIPS 6000).

Whereas EPRFT and HFFIT were originally programmed in Fortran 77,
employing machine-dependent (HP-1000/A600) and therefore
non-portable file-handling subroutines, recent versions have been
ported to the C language
[22] and all current programming
is done in C. Specifically, the compilers Turbo C 2.0 for the
Atari ST, Borland C++ for IBM PC and different variants of UNIX C
have been employed. Although somewhat inferior to Fortran
with respect to purely numerical work, C offers much more flexibility
regarding user interfaces and can be adapted to different platforms
by means of conditional compilation (making the source code a bit
ugly, though).
- Programs are available by anonymous FTP from
`ftp.fu-berlin.de` or from the author
on request.

For convenience, calculations are usually performed using the cgs unit G. Conversion of units: 1 G = 0.1 mT = 2.80247 MHz *(g/g_e), where g and g_e denote the g values of the radical and of the free electron, respectively.

As an introductory example, EPR and ENDOR spectra of Coppinger's
radical are depicted in
fig. 1. Note that both are absorption
spectra, but in the first derivative representation.
The lineshape of individual signals is usually (first-derivative)
Lorentzian or Gaussian or some mixture of these.
The hyperfine pattern of the EPR spectrum, a doublet of quintets (with
two overlapping lines at the center), is due to hyperfine interaction
of the unpaired electron with the central methine proton and four
equivalent ring protons; the hyperfine splitting due to the 36
* tert*-butyl protons is not resolved. The two hyperfine
coupling constants (HFCs) can be extracted easily (in units of mT)
from the line spacings within the two quintets and the separation
of the two quintets, respectively. Note that the smallest HFC is
always accessible as the distance of the outermost signal to the
next-nearest neighbor, with the intensity ratio yielding the
number of equivalent nuclei (irrespective of the nuclear spin),
* provided* that they are not buried under the noise. The
"wiggles" in the wings are so-called ^{13} C satellite lines
due to ^{13} C in natural abundance
[23].
Obviously it would not be very difficult to build an
"expert system" capable of fully
analyzing this kind of spectra from scratch, although such an attempt has
not been performed in the present study.

The ENDOR spectrum consists of three pairs of lines centered about the free proton Larmor frequency (nu_H ca. 14.5 MHz). The hyperfine coupling constants (in units of MHz) are given as the line separations of each pair (a[MHz] = 28.025 a[mT]) [20,21]. Note, however, that the ENDOR signal intensities are quite misleading: one proton belongs to the largest splitting, four to the second largest, and 36 (!) to the smallest.

Program DATA (original Fortran version by T. Rakowsky [24]), ported to the C language and modified to handle ENDOR spectra as well, serves for an analysis of EPR and ENDOR spectra. First a rough estimate of the peak-to-peak noise is calculated from the first and last ten data points. Then signals are searched and accepted if their peak-to-peak height exceeds a given signal-to-noise limit; signal positions are calculated from the interpolated zero passage (or failing that, from the point of steepest descent). Thus, the experimental spectrum is reduced to a stick spectrum, and a listing of peak positions, peak-to-peak heights and peak-to-peak widths is generated. The spectrum center is refined by means of a symmetry test, and signals failing the symmetry test are canceled. A further treatment of ^1 H ENDOR spectra provides a listing of HFCs in units of MHz and G (1 G = 0.1 mT). In the case of EPR spectra, an autocorrelation is (optionally) performed, reporting line spacings and the number of their occurrences; spacings differing by less than half an average linewidth are grouped together.

Program MANIP allows various data manipulations: baseline correction (spline [25] or polynomial), smoothing (9-point smooth [26]), integration, differentiation [26], addition or subtraction of spectra, normalization, shift, fast Fourier transform (FFT) [25], resolution enhancement (Lorentzian-to-Gaussian transformation [27]), data reduction (via FFT) and file type conversion (e.g., ASCII or binary; 16-bit integers with either low or high byte first). Program VIEW (for IBM PC, using BGI) and a similar program for the Atari ST allow display, expansion and comparison of spectra; on UNIX machines running X windows (X11), program XGRAPH [28] is employed (after data conversion to ASCII x,y pairs). Program FPLOT providing HPGL output (or some scientific graphics package) is used for making hardcopies on a plotter or laser printer.

* Isotropic EPR spectra with constant linewidths. Program*
EPRFT. An early version of program EPRFT (written in
Fortran for an HP-1000/A600 computer) has been described
previously
[19]. Basic features of the current
version (ported to C) have remained unchanged and will be summarized
briefly. In module ` simul`, a stick spectrum is calculated
to first order (high-field approximation)
from the input parameters (hyperfine coupling constants,
nuclear spin, number of equivalent nuclei,
position of first data point relative to center and scaling of abscissa).
Within limits set by global constants (e.g., 12 sets of nuclei,
maximum of 50 components in each set) any spin system can be treated.
First the relative intensities ` p` and line positions ` xm`
within each multiplet (i.e., set of equivalent nuclei) are calculated.
The total (theoretical) number of lines ` ks` is equal to
the product of components within the multiplets,

(set i, N_i equivalent nuclei, nuclear spin I_i).
Then each of the k_s hyperfine components is added to the stick
spectrum (kept in array ` y` ), i.e., the product of relative
intensities is added to the proper array element; the channel
number is calculated from the sum of the shifts due to each
multiplet splitting and the position of the spectrum center.
Actually two adjacent elements with appropriate
weights are used to minimize rounding errors.
In module ` func` , first the stick spectrum is calculated
by calling ` simul` . Then convolution of the lineshape is
achieved by multiplication of the Fourier transforms
( ` realft`
[25])
of the stick spectrum and the lineshape function followed by
generation of the first derivative and inverse
Fourier transform
[29]. Pure Lorentzian (f^L),
Gaussian (f^G) or an arbitrary mixture
(x_{lgf} f^L + (1-x_{lgf}) f^G) may be used.
For simplicity, only the real parts of the Fourier transforms
of the lineshape functions are taken into account,

where b denotes the first derivative peak-to-peak
linewidth, c is the scan range (corresponding to n data points)
and r the number of the * complex* data point;
in the discrete FFT algorithm applied
[25], there are
n/2 complex data points, real and imaginary parts alternate
in the n-membered array. Neglect of the imaginary part of the
lineshape function presents no problem provided that the scan range
extends a few linewidths beyond the outermost lines or the spectrum
is centered. (Otherwise the truncated part is folded to the other
end of the spectrum.) The first derivative is obtained by
multiplication with 2 pi ir/n.

Finally the intensity of the calculated spectrum is adjusted to the experimental spectrum by multiplication with N,

Function ` func` returns the sum of the squared deviations
(y_r^{exp} - Ny_r^{calc})
which is equivalent to the well-known chi-square (chi^2),

provided that all data points have the same uncertainties As criterion for the goodness of fit, the standard deviation (root mean square error, sigma) is calculated;

The return value of ` func` , chi^2 is used by the minimizer,
either the evolutionary Monte Carlo method (option C),
the simplex method (option S) or the Marquardt method (option M).
The evolutionary Monte Carlo method
[30] has been
described previously
[19]; in the zeroth iteration,
a reference value of sigma (sigma_0) is calculated.
Then the parameters are varied by small but random amounts,

(where rnd(x) is the random function, 0...1)
and the new set is retained if sigma < sigma_0 otherwise
the previous set is restored. In the simplex method,
first the n_{var}+1 vertices of the initial simplex are
constructed from the original set of parameters and n_var
sets in which one of the parameters is modified by adding Delta a_j;
minimization is achieved by calling routine
` amoeba`
[25]. The Marquardt method (or
Levenberg-Marquardt method)
[14,15,
25] is a standard
least-squares routine and need not be described in detail here.
The partial derivatives with respect to n_{var} variable
parameters are calculated numerically by the difference method,
using increments delta a_k of 1/20 of the peak-to-peak
linewidth (for the linewidth itself, one tenth of this value is
used). Corrections to the parameters are applied with a
damping factor ( ` mfact` ) <= 1 to improve the convergence
behavior. The procedure is stopped if convergence of parameters is
reached, judging from corrections by less than a fraction (0.004) of the
linewidth, or when it diverges. Alternatively, any of the iterative
methods is stopped when the maximum number of iterations is
exceeded.

Program EPRFT may be run interactively or (on UNIX machines) in background. In the interactive mode (call with no parameters), the program requests all input parameters from the user. These parameters may be stored in an ASCII file which can be used in future runs. The progress of iterative fitting may be followed on the monitor, and the procedure can be stopped any time. On suitable platforms, visual comparison of experimental and simulated spectra is possible via calls to external programs (VIEW or XGRAPH, see the section on data analysis). The program reports the optimized parameters, maximum amplitude and calculated area under the absorption curve; this may be used in estimates of radical concentrations as an alternative to double integration of the experimental spectrum. For background operation (UNIX), program EPRFT must be called with the following parameters: name of parameter file, maximum number of iterations, method (C, S or M), [Marquardt damping factor] (and ampersand &); it may be terminated gracefully by an interrupt signal.

Generally speaking, demand regarding computing power, storage allocation and required quality of input parameters increases in the series Monte Carlo method, simplex method and Marquardt method. As a rule of thumb, it may be stated that the Marquardt method fails if the errors in HFCs exceed half a linewidth. If it works, it converges fastest and has the additional advantage of providing error estimates of the fitted parameters. (These are returned as Delta a_j.) The convergence behavior of the simplex method and particularly the Monte Carlo method can be tuned by a proper choice of the variations Delta a_j which should reflect the estimated errors. Regarding the latter method, the variations should be scaled down for achieving a final refinement of the parameters; this is currently not done automatically. The evolutionary Monte Carlo method is inherently slow, typically a few hundred iterations are required. Usually the simplex method is the best choice, regarding a compromise between speed of convergence and sensitivity towards input parameters.

The execution time of program EPRFT is mainly determined by the two Fourier transforms involved and proportional to the number of data points. To provide an impression, execution times have been measured on a number of different platforms, if possible by means of the UNIX time command; programs were compiled for speed (UNIX: -O option). It should be emphasized that the following results are not meant to be benchmark tests. The execution times refer to a single Monte Carlo iteration, using triphenylmethyl radical and 4 K data points as an example. IBM XT-286 (6 MHz, no coprocessor): 140 s, Atari ST (8 MHz, coprocessor 68881): 35 s, HP-1000/A600 (Fortran version): 28 s, IBM AT-286 (8 MHz, coprocessor 80287): 22 s, IBM RT-6150 (FPU-1): 9.9 s, VAX station 3100: 1.5 s, AT-486 (25 MHz): 0.74 s, SUN SPARC-IPC (25 MHz, FPU): 0.26 s, CDC-4680 (MIPS 6000): 0.059 s, IBM RS/6000 (Model 530): 0.038 s.

Isotropic EPR spectra with varying linewidths. Programs hffit and hffits . Program EPRFT discussed above assumes constant linewidths. Due to the FFT algorithm it is particularly efficient in simulations involving thousands of hyperfine components. However, in many cases EPR spectra exhibit linewidth variations, either an alternating linewidth effect due to internal dynamics (which shall not be dealt with in this paper; see [15, 19]) or an asymmetric (M_I-dependent) linewidth effect due to slow molecular tumbling. In that case programs HFFIT or HFFITS may be employed [31]. Here convolution of the lineshape is achieved conventionally by superposition of tabulated truncated lineshape functions, i.e. first-derivative Lorentzian or Gaussian signals; cutoff is at +- 2 (Gaussian) or +- 6 (Lorentzian) first-derivative linewidths from the center.

In program HFFIT, the following M_I-dependence of linewidths b is assumed,

where all parameters refer to peak-to-peak linewidths. This relation is particularly useful if one nucleus dominates the asymmetric linebroadening, e.g., the nitrogen in nitroxide radicals. The program asks the user for estimates of the B and C parameters for each nucleus, all A values are summarized in the global linewidth parameter. However, it must be emphasized that this program does not take cross-terms of the type E_{ij}M_{Ii}M_{Ij} into account.

Program HFFITS allows all linewidths to vary individually without any restriction by an underlying theory (adjusting intensities to keep areas constant). Results are stored on files for further investigation. The maximum number of lines is controlled by a global constant (e.g., 1000).

Both programs use the evolutionary Monte Carlo method for minimization; for a limited number of variables (e.g., 100) the simplex method may be used alternatively in program HFFITS. Note that the Monte Carlo method requires merely one-dimensional arrays, thus the storage demand increases linearly with the number of variables, whereas both the simplex and the Marquardt method require two-dimensional arrays with a quadratic dependence on the number of parameters. Although the Monte Carlo method can cope with a large number of variables, the time required for convergence will increase correspondingly.

Execution time is proportional to the number of data points, the number of hyperfine components and the linewidth; furthermore, it depends on the lineshape (Lorentzian taking longer because of the chosen cutoff conditions). In simple cases, HFFIT and HFFITS are much faster than EPRFT; e.g., in the case of Coppinger's radical (see fig. 1) with 2 K data points, EPRFT requires 0.114 s and HFFITS 0.0244 s for one iteration on a SPARC station.

Fig. 2 shows three examples for problems often encountered in EPR spectroscopy. On the top, the spectrum of 2-methyl-1,3-bis(biphenylylene)propenyl radical (methyl derivative of Koelsch's radical) is depicted. From theory, 4*5^4 = 2500 hyperfine components are expected. Although the resolution is apparently quite good, program DATA recognizes only about 114 lines. The outermost signals are buried under the noise; they are actually cut off in the spectrum shown. Without additional information, it is hardly possible to determine a single hyperfine coupling accurately by direct means. An autocorrelation analysis (cutoff at 3 G) yields the following result, quoting the ten most abundant spacings (in G; 1 G = 0.1 mT; counts are given in parentheses): 0.118 (102), 1.824 (94), 0.234 (93), 1.941 (90), 1.706 (90), 0.351 (86), 1.589 (85), 2.057 (81), 1.472 (81), 0.469 (79). Obviously this list is not very helpful to begin with.

The second spectrum, due to slowly tumbling Coppinger's radical
(fig. 2, center), provides an example for an
asymmetric linewidth effect. An analysis would have to take into
account the fact that the * amplitudes* are no longer
representative of intensities.
The third spectrum, due to zinc
5-cyclohexyl-10,15,20-triphenylporphyrin cation radical
(fig. 2, bottom), is an example of a poorly resolved spectrum.
Only the largest splitting (due to the nitrogens) might
be extracted from this spectrum, determination of proton
splittings is hopeless without an ENDOR experiment.

* Brute-force attempts* . In principle, the evolutionary
Monte Carlo method might be able to track down to the global
minimum, given the spin system, an * arbitrary* set of
starting parameters, large variations Delta a_j of the
parameters and a huge number of iterations. With modern
workstations, it is quite feasible to perform a million
iterations. This idea was first tested with a relatively easy
problem, namely Coppinger's radical. Input parameters were
approximately correct values for the linewidth and the
spectrum center, arbitrary values for the two HFCs in the
range of 0 to 10 G, variations Delta a_j of 5 G and
a limit of 10000 iterations (taking about 20 min on a SPARC
station). Out of a series of 20 attempts, the program found
acceptable results in 19 cases (errors being a fraction
of the linewidth) and had only one complete failure.
(The signs were reversed in a number of cases, but they are
not relevant in these simulations.) However, if the program
happens to get trapped in a deep local minimum, it will hardly find
out in practice. An example for such a case is shown in
fig. 3,
bottom. The dotted trace gives the correct simulation, the solid
line illustrates the deadlock situation (here the spectrum
center had been chosen poorly); actually the program
jumped out towards the global minimum after almost 10000 iterations
when nothing happened.

To provide a more demanding test, the method was applied to the 2-methyl-1,3-bis(biphenylylene)propenyl radical (fig. 2, top). Input parameters were approximately correct values for the linewidth and the spectrum center, 2 HFCs of 2.0 G (variation 0.25 G, 4 H each), 1 HFC of 1.0 G (variation 0.25 G, 3 H), 2 HFCs of 0.5 G (variation of 0.2 G, 4 H each) and a limit of 100000 iterations (taking about 9 hours on a SPARC station). The procedure ended up in a local minimum (HFCs of 1.942, 1.823, 1.237, 0.355 and 0.124 G). In summary, it can be stated that this method may be used successfully to find one or two unknown splittings, but does not provide a general solution to the problem.

* Using ENDOR data.* In our laboratory, accurate values of
hyperfine coupling constants are usually available from
ENDOR experiments. In many cases, ENDOR spectra can be evaluated
rigorously by computer methods (e.g., using program DATA). ENDOR spectra
with strongly overlapping lines can be analyzed by a program
named COMPASS
[32]. However, it is not always clear
how many equivalent nuclei belong to each HFC. Therefore program
EPRGEN has been developed which generates parameter sets to be used
with program EPRFT. One option consists in systematically varying
the number of nuclei within given bounds. A single simulation and
comparison with the experimental spectrum (calculation of sigma)
might yield the proper decision. However, the HFCs determined
by ENDOR, though "accurate", are usually obtained under somewhat
different experimental conditions and may give a rather
unsatisfactory simulation. Moreover, the determination of the
spectrum center and particularly the linewidths by direct methods
is usually not precise enough. Therefore it is advisable to
combine the method with a fitting procedure.

As an example, the triphenylmethyl radical shall be taken, see
fig. 4. Analysis of the ENDOR spectrum
(fig. 4, top) by program
DATA yielded three HFCs: 7.991 MHz (2.849 G), 7.300 MHz (2.603 G)
and 3.201 MHz (1.141 G). To perform a rigorous test, numbers
of contributing nuclei were varied between 0 and 9, yielding a
total of 1000 trial sets. Approximately correct linewidth,
lineshape (Lorentz-Gauss with a Lorentz fraction of 0.3) and
spectrum center were provided, and a maximum of 100 iterations
according to the simplex method were allowed (taking about
4 h on a SPARC station in total). The following final
HFCs were obtained in the best simulation: 2.8423 G, 2.6022 G
and 1.1427 G, assignments (number of nuclei) and values
(in parentheses) of the best five simulations were as follows:
3,6,6 (120.6), 3,8,6 (166.9), 3,6,8 (177.7), 3,6,4 (205.5),
and 3,4,6 (207.1). So the result is unambiguous, but it should
be mentioned that a simulation using the original ENDOR data and
the optimum values for all other parameters yields a sigma
value of 245.9! Therefore a procedure * without*
refinement of parameters might well lead to wrong conclusions.

At this point the relevance of the absolute value of the parameter shall be addressed. In theory, it should be equal to the rms noise figure if a perfect match of simulated and experimental spectra has been achieved. This is indeed the outcome when using synthetic spectra with added noise. However, in the present case at best a sigma value of 117.5 was achieved, whereas the rms noise is only about 20 (from the last 200 data points). This discrepancy is illustrated by the difference spectrum (fig. 4, bottom); the reasons are neglect of ^{13} C satellite lines, neglect of second-order effects and imperfections in the lineshape (due to modulation broadening and other experimental problems). Therefore the absolute value of sigma does not provide a suitable criterion for the correctness of the fit.

* Combining Monte Carlo search and fitting.*
As has been noted above, application of any of the available
fitting procedures from scratch, i.e., without any reasonable
initial parameters, does not generally solve the problem.
If approximate HFCs are not available by some other means
(ENDOR, direct analysis), they must be guessed. That means,
the pertinent range of HFCs for each set of nuclei must be searched.
This task could be done systematically by constructing
a multidimensional grid of starting values or randomly
by means of the Monte Carlo method. Here the second way was chosen,
and an option for randomly generating HFCs within given bounds
was integrated in the parameter generator EPRGEN.

In the following, the results for the test cases will be reported. On input, approximately correct values for the linewidths and the spectrum center were provided as well as ranges of HFCs. Usually the simplex method with a limit of 100 iterations was chosen as minimizer in the fitting process (variations for constructing the initial simplex were roughly equal to the linewidth). The results were sorted in order of ascending sigma values, and the number of correct results found in succession was counted. A simulation was accepted if errors in HFCs were less than a fraction of the linewidth. Because of the limit imposed on the number of iterations, optimum values of all parameter were usually not reached. Approximate execution times refer to a SPARC station IPC.

1. Coppinger's radical, input ranges: 5 +- 5 G (1 H) and 5 +- 5 G (4 H), result: 5.813 and 1.382 G, 8 hits out of 500 (simplex, 70 min). 2. Triphenylmethyl radical, input ranges: 3 +- 0.5 G (3 H), 3 +- 0.5 G (6 H) and 1 +- 0.5 G (6 H), result: 2.812, 2.604 and 1.143 G, 1 hit out of 100 (simplex, 20 min). 3. 2-Methyl-1,3-bis(biphenylylene)propenyl radical, input ranges: 2 +- 0.25 G (4 H), 2 +- 0.25 G (4 H), 1 +- 0.25 G (3 H), 0.5 +- 0.2 G (4 H) and 0.5 +- 0.2 G (4 H), result: 1.946, 1.822, 1.234, 0.471 and 0.345 G, 5 hits out of 500 (simplex, 4 h); 2 hits out of 500 (Marquardt, damping factor 0.5, 80 min). The latter simulation, after final refinement of the parameters, is depicted in fig. 5 (top).

In the present paper, only the problem of high-resolution EPR spectra of single species has been addressed. Clearly these methods should be extended to cover mixtures, powder samples or dynamic processes. Moreover, they should also be useful in the field of high-resolution ^1 H NMR.

Financial support by the Fonds der Chemischen Industrie is gratefully acknowledged.

Burkhard Kirste