Numerical Methods
Methodological and roundoff errors

In the following, we will see typical examples of the different types of errors which occur in computer programs. We will also discuss possible ways to recognize and minimize the effects of methodological and roundoff errors.

Connection between roundoff errors and algorithm

The first example shows that the roundoff error in the evaluation of a mathematical expression can be reduced through an appropriate reformulation of the algorithm. Let us imagine we wish to evaluate numerically the expression \[ \begin{equation} F(x) = \frac{\sqrt{1+x} - \sqrt{1-x}}{x}, \end{equation} \]for $x \ll 1$. Since $F(x)$ has a well-defined analytical expression, in this case there is no methodological error. Furthermore, we can assume that there are no input or output errors. If we code this formula without further reshaping and evaluate it for decreasing values of $x$ (see the sketch below), two features immediately catch our attention:

- the number of correct digits in the computed values decreases steadily as $x$ decreases, although the exact result tends to the limit $F(x) \to 1$ for $x \to 0$;
- below a certain threshold of $x$, the computed value is exactly zero.
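The original interactive code is not preserved here; the following minimal Python sketch reproduces the experiment (plain Python floats are double precision; the sample points are illustrative):

```python
import math

def F_naive(x):
    # direct transcription of the formula, prone to subtractive cancellation
    return (math.sqrt(1.0 + x) - math.sqrt(1.0 - x)) / x

for k in range(1, 18):
    x = 10.0 ** (-k)
    print(f"x = 1e-{k:02d}   F(x) = {F_naive(x):.15f}")
```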
This behaviour can be easily explained: for $x \ll 1$ both $\sqrt{1+x}$ and $\sqrt{1-x}$ are very close to 1, so that in the numerator two almost equal numbers are subtracted, and more and more leading digits cancel as $x$ decreases ('subtractive cancellation'). As soon as $x$ drops below the machine precision, $1+x$ and $1-x$ are both rounded to 1, and the numerator, and with it the computed $F(x)$, becomes exactly zero.

So far, we have only identified the error. The cure in this case is very simple: if we rewrite $F(x)$ as \[ \begin{equation} F1(x) \equiv F(x) = \frac{(\sqrt{1+x}-\sqrt{1-x})}{x} \cdot \frac{(\sqrt{1+x}+\sqrt{1-x})}{(\sqrt{1+x}+\sqrt{1-x})} = \frac{2}{\sqrt{1+x}+\sqrt{1-x}}, \end{equation} \]we recover the correct results, as the sketch below confirms.
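The same check with the reformulated expression (again a sketch; the function name is ours):

```python
import math

def F_stable(x):
    # algebraically equivalent form without the difference of nearly equal numbers
    return 2.0 / (math.sqrt(1.0 + x) + math.sqrt(1.0 - x))

for k in (4, 8, 12, 16):
    x = 10.0 ** (-k)
    print(f"x = 1e-{k:02d}   F1(x) = {F_stable(x):.15f}")  # stays close to the exact limit 1
```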
Another example from [Press, 1992] shows that we can end up in trouble even with seemingly very simple problems if we program the formulas without thinking. Let us consider the numerical solution of a second-order equation \[ \begin{equation} ax^{2}+bx+c=0 \end{equation} \]with real coefficients $a$, $b$, and $c$. The well-known solutions are \[ \begin{equation} x_{1}=\frac{-b+\sqrt{b^{2}-4ac}}{2a}, \qquad x_{2}=\frac{-b-\sqrt{b^{2}-4ac}}{2a}. \end{equation} \]We immediately see that for very small values of the coefficients $a$ and/or $c$ there is the danger of a 'subtractive cancellation' in the values of $x_{1}$ and $x_{2}$ (for $b>0$ and $b<0$, respectively). However, it is also easy to recast the above expressions in an equivalent form that is stable against roundoff errors, \[ \begin{equation} x_{1}=\frac{c}{q}, \qquad x_{2}=\frac{q}{a}, \end{equation} \]where \[ \begin{equation} q \equiv -\frac{1}{2} \left[ b + \text{sgn}(b) \sqrt{b^{2}-4ac} \right] \end{equation} \](for $b<0$ the two labels are interchanged). Try the two forms of the solutions for small values of $a$ and/or $c$ using the code below.
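A sketch of both forms (the function names are ours; real roots are assumed):

```python
import math

def roots_naive(a, b, c):
    d = math.sqrt(b * b - 4.0 * a * c)
    return (-b + d) / (2.0 * a), (-b - d) / (2.0 * a)

def roots_stable(a, b, c):
    d = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(d, b))  # b and sgn(b)*d have equal signs: no cancellation
    return c / q, q / a

# for small c, '-b + sqrt(b^2 - 4ac)' cancels almost completely in the naive form
print(roots_naive(1.0, 1.0, 1e-14))   # the small root comes out inaccurate
print(roots_stable(1.0, 1.0, 1e-14))  # both roots accurate
```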
Roundoff and methodological errors in numerical differentiation

The basic principle of numerical differentiation is that of replacing the required differential with a suitable finite-difference expression, i.e. \begin{equation} \label{e4} \frac{d}{dx} f(x)\mid _{x=x_{o}} \approx \frac{f(x_{o}+h) - f(x_{o})}{h}. \end{equation}The difference quotient (\ref{e4}) converges to the first derivative of the function $f(x)$ in $x_{o}$ when the increment $h \to 0$. A finite increment (or stepsize) $h$ introduces a methodological error $\epsilon_{V}$, which gets smaller and smaller with decreasing $h$. Theoretical considerations show that $\epsilon_{V}$ must have the form \begin{equation}\label{e5} \epsilon_{V}=C_{V}(h) \cdot h, \end{equation}where the quantity $C_{V}(h)$ in many cases depends only weakly on $h$ and can in practice be replaced by a constant: \[ \begin{equation} C_{V}(h) \approx C_{V}. \end{equation} \]If we plot $\epsilon_{V}$ vs $h$ on a log-log scale, the error curve satisfies the linear relation \begin{equation}\label{e6} \log \epsilon_{V} = \log C_{V} + \log h \end{equation}with slope +1. The error is calculated for a concrete example, \[ \begin{equation} f(x)=\ln (x) \sin(5x), \end{equation} \] \[ \begin{equation} \frac{d}{dx} f(x) \mid _{x_{o}=3} = \frac{\sin(15)}{3}+5 \ln 3 \cos(15). \end{equation} \]The error is \[ \begin{equation} \epsilon_V = \frac{\sin(15)}{3}+5 \ln 3 \cos(15) -\frac{\ln (3+h) \sin(5(3+h)) - \ln (3) \sin(15)}{h}. \end{equation} \]Evaluating this error for decreasing $h$ (see the sketch below), one finds that relation (\ref{e5}) is well fulfilled in the interval $10^{-8} < h < 10^{-1}$. For increments larger or smaller than these values, the 'real' behaviour of the error deviates greatly from the linear relation (\ref{e6}). The causes of this behaviour are the following:

- for large increments ($h > 10^{-1}$), $C_{V}(h)$ can no longer be treated as a constant: higher-order terms in $h$ contribute noticeably to the methodological error;
- for small increments ($h < 10^{-8}$), $f(x_{o}+h)$ and $f(x_{o})$ agree in more and more leading digits, so their difference suffers from subtractive cancellation, and the division by the small $h$ further amplifies the resulting roundoff error.

The dependence of the roundoff error on $h$ does not follow a smooth curve, but tends towards the expression \begin{equation}\label{e7} \epsilon_{R}=\frac{C_{R}}{h}. \end{equation}From what we have said so far, we can conclude:

- the total error behaves approximately as $\epsilon(h) \approx C_{V}\, h + C_{R}/h$;
- it reaches a minimum at an optimal stepsize $h_{opt} \approx \sqrt{C_{R}/C_{V}}$, where the methodological and the roundoff contributions are of comparable size;
- reducing $h$ below $h_{opt}$ does not improve the result, but makes it worse.
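A sketch that tabulates the error of the forward difference (\ref{e4}) for the example above; the sampled values of $h$ are illustrative:

```python
import math

def f(x):
    return math.log(x) * math.sin(5.0 * x)

x0 = 3.0
exact = math.sin(15.0) / 3.0 + 5.0 * math.log(3.0) * math.cos(15.0)

for k in range(1, 16):
    h = 10.0 ** (-k)
    forward = (f(x0 + h) - f(x0)) / h
    # the error first shrinks like h, then grows again like 1/h
    print(f"h = 1e-{k:02d}   error = {abs(exact - forward):.3e}")
```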
There are two possible strategies to further reduce this minimal error:

- reduce $C_{R}$ by carrying out the calculation with higher numerical precision (e.g. in double instead of single precision);
- reduce the methodological error by using a better difference formula, e.g. the central difference $[f(x_{o}+h)-f(x_{o}-h)]/(2h)$, for which $\epsilon_{V} \propto h^{2}$, so that the optimum is reached at a larger $h$, where the roundoff contribution is smaller (see the sketch below).
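A sketch of the second strategy with the central difference (same example function):

```python
import math

def f(x):
    return math.log(x) * math.sin(5.0 * x)

x0 = 3.0
exact = math.sin(15.0) / 3.0 + 5.0 * math.log(3.0) * math.cos(15.0)

for k in range(1, 12):
    h = 10.0 ** (-k)
    central = (f(x0 + h) - f(x0 - h)) / (2.0 * h)
    # the methodological error now falls off like h^2, so the achievable
    # minimal error is smaller than for the forward difference
    print(f"h = 1e-{k:02d}   error = {abs(exact - central):.3e}")
```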
In summary, we can say: the error diagnostics in this example (and in many areas of numerical mathematics) is greatly simplified by the fact that under certain assumptions (in this case, in a particular range of $h$) the methodological error is much larger than the roundoff error.
Numerical evaluation of the complementary error function

In many applications one has to evaluate numerically the function \[ \begin{equation} \text{erfc}(x) = 1 - \text{erf}(x) = 1 - \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}}\, dt, \end{equation} \]where erf$(x)$ represents the Gaussian error function. Expanding the integrand of the integral representation term by term, the representation in form of a Taylor series is \[ \begin{equation} \text{erfc}(x) = 1 - \frac{2}{\sqrt{\pi}} \sum_{n=0}^{\infty} a_{n}, \qquad a_{n} = \frac{(-1)^{n}\, x^{2n+1}}{n!\,(2n+1)}. \end{equation} \]The numerical evaluation of erfc$(x)$ can be done using this Taylor series, whose convergence in principle is assured for all real arguments $x$. From the point of view of computational efficiency it is extremely inconvenient to evaluate each term $a_{n}$ of the series independently. It is much better to calculate each $a_{n}$ from the previous $a_{n-1}$. In fact, the ratio $a_{n}/a_{n-1}$ obeys the simple recursive relation \[ \begin{equation} \frac{a_{n}}{a_{n-1}} = -\frac{x^{2}\,(2n-1)}{n\,(2n+1)}, \qquad a_{0}=x. \end{equation} \]
Truncating the series at $n=n_{max}$ introduces a methodological error. However, since the magnitude of the terms (from a certain index onwards) decreases monotonically and their signs alternate, we can safely assume that the magnitude of $\epsilon_{V}$ is not larger than the first neglected term $a_{n_{max}+1}$. Using this simple estimate of the methodological error, it is possible to compute the series up to machine precision, i.e. to sum until adding further terms no longer changes the result.
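A sketch of this summation strategy (the function name is ours; `math.erfc` is used only as a reference value):

```python
import math

def erfc_taylor(x, max_terms=200):
    a = s = x                 # a_0 = x
    for n in range(1, max_terms):
        a *= -x * x * (2 * n - 1) / (n * (2 * n + 1))  # a_n from a_{n-1}
        if s + a == s:        # adding the term no longer changes the sum
            break
        s += a
    return 1.0 - 2.0 / math.sqrt(math.pi) * s

for x in (0.5, 1.0, 3.0, 6.0):
    print(x, erfc_taylor(x), math.erfc(x))  # deviations grow with x
```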
In conclusion: within machine precision we can eliminate the methodological error! We have thus shown that all deviations of the numerical results from the exact ones can be attributed to the accumulation of roundoff errors. We can immediately verify this by repeating the same calculation in single and in double precision. The results of these tests for different values of $x$ are given in the table below.
Comparison of the evaluation of the function erfc$(x)$ through a Taylor series in single and in double precision. All results are given in half-exponential form with 7 digits for the mantissa. The exponent E indicates a calculation in single precision, the exponent D a calculation in double precision. (Note: JavaScript only calculates with double-precision numbers.)

While the Taylor series expansion approximates the value of erfc$(x)$ quite well for small arguments $x$, roundoff errors make it less and less accurate for large $x$. A different algorithm should therefore be employed to calculate erfc$(x)$ for large $x$. An alternative method for calculating erfc$(x)$ is to use a continued fraction, \[ \begin{equation} \text{erfc}(x) = \frac{e^{-x^{2}}}{\sqrt\pi} \frac{1}{x+ \frac{a_1}{x+\frac{a_2}{x+\frac{a_3}{x+\cdots} }}}, \qquad a_m=\frac{m}{2}. \end{equation} \]The numerical calculation of a continued fraction has the disadvantage that we have to decide how many terms of the fraction to retain before the beginning of the calculation. This means that if we want to increase the number of terms to reduce the methodological error, we have to restart the calculation from scratch; we cannot simply add further terms to an 'old' calculation as in the case of a Taylor series. The code below uses a continued fraction with $m$ terms to calculate the complementary error function.
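A sketch of the continued-fraction evaluation, built up from the innermost level (the function name is ours):

```python
import math

def erfc_cf(x, m=61):
    t = 0.0
    for k in range(m, 0, -1):       # a_k = k/2, innermost level first
        t = (0.5 * k) / (x + t)
    return math.exp(-x * x) / math.sqrt(math.pi) / (x + t)

for x in (0.5, 1.0, 3.0, 6.0):
    print(x, erfc_cf(x, 31), erfc_cf(x, 61), math.erfc(x))
```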
In the table below, calculations of the complementary error function are shown both in single and in double precision, for the Taylor-expansion algorithm and for the continued-fraction algorithm with $m=31$ and $m=61$. All results are given in half-exponential form with 7 digits for the mantissa. The exponent E indicates a calculation in single precision, the exponent D a calculation in double precision. By examining this table we can draw the following conclusions:

- for small arguments the Taylor series yields accurate results, while the continued fraction converges slowly and requires many levels $m$;
- for large arguments the Taylor series is ruined by roundoff errors, since its large terms of alternating sign cancel almost completely, while the continued fraction remains accurate already with few levels;
- the loss of accuracy is much more severe in single than in double precision.
We can summarize the above results as follows: neither algorithm is adequate over the whole range of arguments. A reliable routine for erfc$(x)$ should therefore use the Taylor series for small $x$ and switch to the continued fraction for large $x$; the range of applicability of an algorithm is decided not only by its methodological error, but equally by its roundoff behaviour.
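A sketch of such a combined routine, reusing the two evaluations from above; the switching point $x=2$ is purely illustrative and should be tuned to the required accuracy:

```python
import math

def erfc_taylor(x, max_terms=200):
    a = s = x
    for n in range(1, max_terms):
        a *= -x * x * (2 * n - 1) / (n * (2 * n + 1))
        if s + a == s:
            break
        s += a
    return 1.0 - 2.0 / math.sqrt(math.pi) * s

def erfc_cf(x, m=61):
    t = 0.0
    for k in range(m, 0, -1):
        t = (0.5 * k) / (x + t)
    return math.exp(-x * x) / math.sqrt(math.pi) / (x + t)

def erfc_combined(x, switch=2.0):
    # each algorithm is used only where its roundoff behaviour is benign
    return erfc_taylor(x) if x < switch else erfc_cf(x)

for x in (0.5, 2.5, 6.0):
    print(x, erfc_combined(x), math.erfc(x))
```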
Example of stable and unstable algorithms

Many numerical methods are based on recursion formulas of the type \[ \begin{equation} y_{n} = a \cdot y_{n-1} + b \cdot y_{n-2} \qquad n=2,3,\ldots \quad . \end{equation} \]When employing such formulas, it is important to consider carefully the stability of the algorithm. An algorithm is called stable (or unstable) when the error with respect to the exact result at the $n$-th step of the calculation decreases (or increases) in the following steps. A very instructive example in this context is the numerical calculation of the spherical Bessel functions through a 'forward recursion': \[ \begin{equation} j_{0}(x)=\frac{\sin x}{x} \qquad j_{1}(x)=\frac{\sin x}{x^{2}} - \frac{\cos x}{x} \end{equation} \] \[ \begin{equation} j_{l}(x)=\frac{2l-1}{x} \cdot j_{l-1}(x) - j_{l-2}(x). \end{equation} \]The numerical evaluation of this formula obviously involves no methodological error; the only possible source of error is roundoff! To evaluate the magnitude of the roundoff error, we can compare results calculated in single and in double precision (see Tab. 1.3 and the sketch below).

Tab. 1.3: Numerical calculation of the spherical Bessel functions of order 0-9 through forward recursion.

x =  0.3000
 L   single prec.      double prec.
 0   0.9850674E+00     0.9850674D+00
 1   0.9910274E-01     0.9910289D-01
 2   0.5960023E-02     0.5961525D-02
 3   0.2309625E-03     0.2558598D-03
 4   -.5708978E-03     0.8536426D-05
 5   -.1735790E-01     0.2329701D-06
 6   -.6358853E+00     0.5811086D-08
 7   -.2753767E+02     0.1884359D-07
 8   -.1376248E+04     0.9363682D-06
 9   -.7795982E+05     0.5304202D-04

x =  1.0000
 L   single prec.      double prec.
 0   0.8414710E+00     0.8414710D+00
 1   0.3011687E+00     0.3011687D+00
 2   0.6203508E-01     0.6203505D-01
 3   0.9006739E-02     0.9006581D-02
 4   0.1012087E-02     0.1011016D-02
 5   0.1020432E-03     0.9256116D-04
 6   0.1103878E-03     0.7156936D-05
 7   0.1332998E-02     0.4790142D-06
 8   0.1988459E-01     0.2827691D-07
 9   0.3367050E+00     0.1693277D-08

x = 10.0000
 L   single prec.      double prec.
 0   -.5440211E-01     -.5440211D-01
 1   0.7846694E-01     0.7846694D-01
 2   0.7794219E-01     0.7794219D-01
 3   -.3949584E-01     -.3949584D-01
 4   -.1055893E+00     -.1055893D+00
 5   -.5553451E-01     -.5553451D-01
 6   0.4450132E-01     0.4450132D-01
 7   0.1133862E+00     0.1133862D+00
 8   0.1255780E+00     0.1255780D+00
 9   0.1000964E+00     0.1000964D+00

x = 20.0000
 L   single prec.      double prec.
 0   0.4564726E-01     0.4564726D-01
 1   -.1812174E-01     -.1812174D-01
 2   -.4836553E-01     -.4836552D-01
 3   0.6030358E-02     0.6030359D-02
 4   0.5047615E-01     0.5047615D-01
 5   0.1668391E-01     0.1668391D-01
 6   -.4130000E-01     -.4130000D-01
 7   -.4352891E-01     -.4352891D-01
 8   0.8653319E-02     0.8653319D-02
 9   0.5088423E-01     0.5088423D-01

The results in Tab. 1.3 clearly show that this method is strongly unstable, in particular for small arguments $x$: the instability leads to a complete breakdown of the method with increasing order of the Bessel function. On the other hand, the stability of the method improves considerably for larger values of $x$.
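A sketch of the forward recursion, using numpy's float32/float64 to mimic single and double precision:

```python
import numpy as np

def bessel_forward(x, lmax, dtype):
    """Spherical Bessel functions j_0 .. j_lmax (lmax >= 1) by forward recursion."""
    x = dtype(x)
    j = np.zeros(lmax + 1, dtype=dtype)
    j[0] = np.sin(x) / x
    j[1] = np.sin(x) / x**2 - np.cos(x) / x
    for l in range(2, lmax + 1):
        j[l] = dtype(2 * l - 1) / x * j[l - 1] - j[l - 2]
    return j

for s, d in zip(bessel_forward(0.3, 9, np.float32),
                bessel_forward(0.3, 9, np.float64)):
    print(f"{s:16.7e}   {d:16.7e}")   # single precision breaks down quickly
```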
In order to obtain meaningful results also for small values of the argument, we have to switch from the 'forward recursion' used so far to a 'backward recursion': \[ \begin{equation} j_{l_{max}+1}(x)=0 \qquad j_{l_{max}}(x)=\delta \end{equation} \] \[ \begin{equation} j_{l}(x)=\frac{2l+3}{x} \cdot j_{l+1}(x) - j_{l+2}(x). \end{equation} \]In these expressions, $j_{l_{max}}(x)=\delta$ is an arbitrary (small) number, which serves as the starting value for the backward recursion. The natural number $l_{max}$ must be larger than the highest order $l$ to be calculated.

Even though the starting value of the recursion is arbitrary, the values converge rapidly to a number series $\alpha\, j_{l}$ for decreasing $l$. The initially unknown constant $\alpha$ can be determined by normalizing the value for $l=0$ to the quantity $\sin x/x$. Due to the arbitrariness of $\delta$, the results obtained in this way are affected by a methodological error, which becomes smaller the larger the starting index $l_{max}$ is chosen. To estimate $\epsilon_{V}$, we have repeated the calculation twice, i.e. for $l_{max}=18$ and for $l_{max}=27$; in both cases we have performed the calculations both in single and in double precision. The results of these tests are summarized in the table below.

Numerical calculation of the spherical Bessel functions of order 0-9 with backward recursion.

       single precision                    double precision
 L   backwards(18)    backwards(27)     backwards(18)    backwards(27)

x =  0.3000
 0   0.9850674E+00    0.9850674E+00     0.9850674D+00    0.9850674D+00
 1   0.9910290E-01    0.9910290E-01     0.9910289D-01    0.9910289D-01
 2   0.5961526E-02    0.5961525E-02     0.5961525D-02    0.5961525D-02
 3   0.2558598E-03    0.2558598E-03     0.2558598D-03    0.2558598D-03
 4   0.8536426E-05    0.8536425E-05     0.8536426D-05    0.8536426D-05
 5   0.2329583E-06    0.2329583E-06     0.2329583D-06    0.2329583D-06
 6   0.5378445E-08    0.5378444E-08     0.5378444D-08    0.5378444D-08
 7   0.1076069E-09    0.1076069E-09     0.1076069D-09    0.1076069D-09
 8   0.1899475E-11    0.1899474E-11     0.1899474D-11    0.1899474D-11
 9   0.2999847E-13    0.2999847E-13     0.2999847D-13    0.2999847D-13

x =  1.0000
 0   0.8414710E+00    0.8414710E+00     0.8414710D+00    0.8414710D+00
 1   0.3011687E+00    0.3011687E+00     0.3011687D+00    0.3011687D+00
 2   0.6203505E-01    0.6203505E-01     0.6203505D-01    0.6203505D-01
 3   0.9006580E-02    0.9006579E-02     0.9006581D-02    0.9006581D-02
 4   0.1011016E-02    0.1011016E-02     0.1011016D-02    0.1011016D-02
 5   0.9256115E-04    0.9256114E-04     0.9256116D-04    0.9256116D-04
 6   0.7156936E-05    0.7156935E-05     0.7156936D-05    0.7156936D-05
 7   0.4790134E-06    0.4790133E-06     0.4790134D-06    0.4790134D-06
 8   0.2826499E-07    0.2826498E-07     0.2826499D-07    0.2826499D-07
 9   0.1491376E-08    0.1491376E-08     0.1491377D-08    0.1491377D-08

x = 10.0000
 0   -.5440211E-01    -.5440211E-01     -.5440211D-01    -.5440211D-01
 1   0.7846695E-01    0.7846695E-01     0.7846695D-01    0.7846694D-01
 2   0.7794219E-01    0.7794220E-01     0.7794220D-01    0.7794219D-01
 3   -.3949586E-01    -.3949586E-01     -.3949585D-01    -.3949584D-01
 4   -.1055893E+00    -.1055893E+00     -.1055893D+00    -.1055893D+00
 5   -.5553451E-01    -.5553451E-01     -.5553451D-01    -.5553451D-01
 6   0.4450133E-01    0.4450133E-01     0.4450133D-01    0.4450132D-01
 7   0.1133862E+00    0.1133862E+00     0.1133862D+00    0.1133862D+00
 8   0.1255780E+00    0.1255780E+00     0.1255780D+00    0.1255780D+00
 9   0.1000964E+00    0.1000964E+00     0.1000964D+00    0.1000964D+00

x = 20.0000
 0   0.4564727E-01    0.4564726E-01     0.4564726D-01    0.4564726D-01
 1   -.8801447E-01    -.1812270E-01     -.8801444D-01    -.1812269D-01
 2   -.5884944E-01    -.4836567E-01     -.5884943D-01    -.4836567D-01
 3   0.7330211E-01    0.6031280E-02     0.7330208D-01    0.6031273D-02
 4   0.8450518E-01    0.5047661E-01     0.8450516D-01    0.5047661D-01
 5   -.3527479E-01    0.1668320E-01     -.3527476D-01    0.1668320D-01
 6   -.1039063E+00    -.4130086E-01     -.1039063D+00    -.4130085D-01
 7   -.3226432E-01    -.4352875E-01     -.3226432D-01    -.4352875D-01
 8   0.7970807E-01    0.8654292E-02     0.7970804D-01    0.8654284D-02
 9   0.1000162E+00    0.5088490E-01     0.1000161D+00    0.5088490D-01

It appears that the backward recursion represents a very stable algorithm over the whole range of $x$ considered: the results in single and double precision differ by at most two units in the seventh digit of the mantissa.

Furthermore, we see that the methodological error increases with increasing values of the argument $x$: at $x=20$ the results obtained with $l_{max}=18$ and $l_{max}=27$ already differ in the first digit, so larger arguments require a larger starting index $l_{max}$.
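A sketch of the backward recursion with the normalization described above (numpy assumed; $\delta$ and the starting index are parameters):

```python
import numpy as np

def bessel_backward(x, lmax, lstart, dtype=np.float64, delta=1e-30):
    """j_0 .. j_lmax by backward recursion, started at an arbitrary value delta."""
    x = dtype(x)
    j = np.zeros(lstart + 2, dtype=dtype)
    j[lstart] = dtype(delta)            # j_{lstart+1} = 0, j_{lstart} = delta
    for l in range(lstart - 1, -1, -1):
        j[l] = dtype(2 * l + 3) / x * j[l + 1] - j[l + 2]
    return j[:lmax + 1] * ((np.sin(x) / x) / j[0])  # normalize to j_0 = sin(x)/x

print(bessel_backward(0.3, 9, lstart=18))
print(bessel_backward(20.0, 9, lstart=27))
```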