title: Division without division and other computational tricks
slug: inverse
date: 2026-05-22 12:22:29
modified: 2026-05-25 00:17:36 UTC+02:00
type: text
tags: nerd_fun

## What is this about? [🔗](#intro){.small} {: #intro}

You want to divide two numbers with your computer.

No problem: Just let your computer do it!

But in rare cases, your computer may not know how to divide.

This could be the case if said computer is a mere µcontroller.  Even
more so if you do your own arithmetic, e.g., BCD or fixed point.

So this post assumes your computer knows how to
do addition <math><mo>+</mo></math>, subtraction <math><mo>-</mo></math>,
multiplication <math><mo>*</mo></math>,
and comparison with <math><mo>&lt;</mo></math>,
but it does not yet know division <math><mo>/</mo></math>.  It offers
a software solution that teaches your computer division.

I understand one normal expert advice for teaching computers division
is to use the [CORDIC](https://en.wikipedia.org/wiki/CORDIC) algorithm
family (which can even be used to teach your computer multiplication,
in case it has not learned that yet, either).  But that CORDIC stuff
is quite a beast to wrap your head around (which I, honestly, have not
really tried yet). Apparently, it also needs a bit of ROM space for
lookup tables.  Finally, from my present partial understanding, CORDIC
seems to be intended for binary numeric representation; the approach
offered here can be used for BCD numbers unchanged.

So this toot dares to suggest a quite simple home-grown approach to
the division problem.

That approach has a reasonably decent runtime performance, though it
may not be optimal.  It results in short program code which is easy to
get right.  And it doesn't need lookup tables.

And, of course: If you just want to read this for the fun of it,
without any particular application in mind, you are warmly invited to
do so.

_Full disclosure: I wrote this for the fun of it, without any
particular application in mind._

## In a nutshell [🔗](#summary){.small} {: #summary}

You are given a number <math><mi>a</mi><mo>></mo><mn>0</mn></math> and
desire to calculate <math><mfrac><mn>1</mn><mi>a</mi></mfrac></math>.

(That is actually all that is missing to do general division:
Instead of dividing something by <math><mi>a</mi></math>, multiply
it with <math><mfrac><mn>1</mn><mi>a</mi></mfrac></math>.)

To do so, start with any value <math><msub><mi>x</mi><mn>0</mn></msub></math>
that is a reasonable approximation of <math><mfrac><mn>1</mn><mi>a</mi></math>.

Here, the requirements for "reasonable" are lax.
Any <math><msub><mi>x</mi><mn>0</mn></msub></math>
with <math><mn>0</mn><mo>&lt;</mo><msub><mi>x</mi><mn>0</mn></msub><mo>&lt;</mo><mfrac><mn>2</mn><mi>a</mi></math>
will work.  (But the better your initial approximation is, the faster the
calculation completes.)

From your start value <math><msub><mi>x</mi><mn>0</mn></msub><mo></mo></math>, define a
sequence <math>
    <msub><mi>x</mi><mn>0</mn></msub><mo>,</mo>
    <msub><mi>x</mi><mn>1</mn></msub><mo>,</mo>
    <msub><mi>x</mi><mn>2</mn></msub><mo>,</mo><mo>...</mo>
</math>
of better and better approximations for <math><mfrac><mn>1</mn><mi>a</mi></math>
via the simple formula

<math display="block">
    <msub>
        <mi>x</mi>
        <mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow>
    </msub>
    <mo>=</mo>
    <mrow>
        <msub>
            <mi>x</mi>
            <mi>n</mi>
        </msub>
        <mo>(</mo>
        <mn>2</mn>
        <mo>-</mo>
        <mrow>
        <mi>a</mi>
        <msub>
            <mi>x</mi>
            <mi>n</mi>
        </msub>
        </mrow>
        <mo>)</mo>
    </mrow>
</math>

Here is a numerical example with <math><mi>a</mi><mop>=</mop><mn>3</mn></math>,
using <math><msub><mi>x</mi><mn>0</mn></msub><mop>=</mop><mn>0.5</mn></math>
as an initial approximation for <math><mfrac><mn>1</mn><mn>3</mn></math>.

&#x200b; | value
--|------
&#x200b;​<math><msub><mi>x</mi><mn>0</mn></msub></math> | 0.500000000000000000
&#x200b;<math><msub><mi>x</mi><mn>1</mn></msub></math> | 0.250000000000000000
&#x200b;<math><msub><mi>x</mi><mn>2</mn></msub></math> | 0.312500000000000000
&#x200b;​<math><msub><mi>x</mi><mn>3</mn></msub></math> | 0.332031250000000000
&#x200b;​<math><msub><mi>x</mi><mn>4</mn></msub></math> | 0.333328247070312500
&#x200b;​<math><msub><mi>x</mi><mn>5</mn></msub></math> | 0.333333333255723119
&#x200b;​<math><msub><mi>x</mi><mn>6</mn></msub></math> | 0.333333333333333315
&#x200b;​<math><msub><mi>x</mi><mn>7</mn></msub></math> | 0.333333333333333315
&#x200b;​<math><msub><mi>x</mi><mn>8</mn></msub></math> | 0.333333333333333315

The sequence stabilizes with all subsequent values being identically
to <math><msub><mi>x</mi><mn>6</mn></msub></math>.  That value 
incidentally is the best that can be achieved, as the next higher
value that can be represented by the floating point number
scheme used is 0.333333333333333370, which is less precise.

In all numerical calculations presented here, I used whatever floating
point number scheme my PC's libraries offer.  I believed this to be
the ubiquitous [IEEE 754
binary64](https://en.wikipedia.org/wiki/Double-precision_floating-point_format),
but I did nothing to verify this.

Once <math><msub><mi>x</mi><mn>2</mn></msub></math> has found the
first correct digit, each of the few next subsequent values has twice
as many correct digits as its predecessor.  This is typical of the
_quadratic convergence_ afforded by the method presented here.

### When to stop? [🔗](#stop_condition){.small} {: #stop_condition}

One might be tempted to stop
when <math><msub><mi>x</mi><mi>n</mi></msub><mop>=</mop><msub><mi>x</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msub></math>.
This will work most of the time, but not always.  In rare cases, the sequence
will oscillate.

One example found by brute-force search
is <math><mi>a</mi><mop>=</mop><mn>338</mn></math> with start
value <math><msub><mi>x</mi><mn>0</mn></msub><mop>=</mop><mn>0.002</mn></math>:

&#x200b; | value
-|-
&#x200b;<math><msub><mi>x</mi><mn>0</mn></msub></math> | 0.0020000000000000000
&#x200b;<math><msub><mi>x</mi><mn>1</mn></msub></math> | 0.0026479999999999997
&#x200b;<math><msub><mi>x</mi><mn>2</mn></msub></math> | 0.0029259764480000002
&#x200b;<math><msub><mi>x</mi><mn>3</mn></msub></math> | 0.0029582205931032645
&#x200b;<math><msub><mi>x</mi><mn>4</mn></msub></math> | 0.0029585798380249708
&#x200b;<math><msub><mi>x</mi><mn>5</mn></msub></math> | 0.0029585798816568042
&#x200b;<math><msub><mi>x</mi><mn>6</mn></msub></math> | 0.0029585798816568051
&#x200b;<math><msub><mi>x</mi><mn>7</mn></msub></math> | 0.0029585798816568042
&#x200b;<math><msub><mi>x</mi><mn>8</mn></msub></math> | 0.0029585798816568051

The value returned by the build-in division in this case prints as
0.0029585798816568047.

It is conceivable that in some cases, the sequence might end up in a
merry-go-round of three or even more numbers.  With a brief
brute-force search, I did not find any occurrences of that,
but it should not be ruled out.

In the realm of pure mathematics and precise real numbers,
the algorithm employed guarantees an _increasing_ sequence <math>
    <msub><mi>x</mi><mn>1</mn></msub><mo>&lt;</mo>
    <msub><mi>x</mi><mn>2</mn></msub><mo>&lt;</mo>
    <msub><mi>x</mi><mn>3</mn></msub><mo>&lt;</mo><mo>...</mo>
</math>.  (This statement purposefully omits the possibly too-high
start-value <math><msub><mi>x</mi><mn>0</mn></msub></math>.)

In the practical realm of computers with finite-precision arithmetic,
I suggest as a straightforward termination criterion <math>
    <msub><mi>x</mi><mi>n</mi></msub><mo>​​​≥</mo>
    <msub><mi>x</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msub>
</math>.  When that triggers, a good finite-precision approximation of the desired
result <math><mfrac><mn>1</mn><mi>a</mi></mfrac></math>
is given by <math><msub><mi>x</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msub></math>.

For a posh twist, you can use that only
if <math><msub><mi>x</mi><mi>n</mi></msub><mo>​​​=</mo><msub><mi>x</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msub></math>, and <math>
    <mn>0.5</mn>
    <mo>(</mo>
    <msub><mi>x</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msub>
    <mo>+</mo>
    <msub><mi>x</mi><mrow><mi>n</mi><mo>+</mo><mn>2</mn></mrow></msub>
    <mo>)</mo>
</math> if not.  This results in a marginal increase in typical precision,
at the expense of a marginal increase in runtime.

### Choice of start value [🔗](#x0){.small} {: #x0}

One can try to find a good start value <math><msub><mi>x</mi><mn>0</mn></msub></math>
by analyzing the given number <math><mi>a</mi></math> to be inverted.

Where floating point numbers are used, analysis
of the representation of <math><mi>a</mi></math>
yields information about the magnitude
of <math><mi>a</mi></math>, from which a rough approximation
of its inverse can be obtained that is good enough to serve
as <math><msub><mi>x</mi><mn>0</mn></msub></math>.

A more pedestrian approach starts with the
value <math><mn>x</mn><mo>=</mo><mn>1</mn></math> and
calculates the product <math><mn>x</mn><mn>a</mn></math>.
If that product is close enough to <math><mn>1</mn></math>,
then <math><mn>x</mn></math> can be used as is.
If not, <math><mn>x</mn></math> can be either increased or
decreased as pertinent, and <math><mn>x</mn><mn>a</mn></math> calculated anew.

This pedestrian approach is used in the following sample implementation:

## Sample implementation [🔗](#sample_impl){.small} {: #sample_impl}

Here is a sample implementation for this algorithm in Python:

``` python
def sample_implementation(a):
    assert 0 < a

    # Find decent start value x
    # between 0.17/a and 1.8/a:
    x = 1
    xa = x * a
    while 1.8 < xa:
        x *= 0.1
        xa = x * a
    while xa < 0.17:
        x *= 10
        xa = x * a
    # The multiplication factors
    # of 0.1 and 10 in the above
    # have been pulled
    # out of thin air.

    # They'll work, but better
    # choices may be possible.

    # If representation is binary,
    # using the values 0.125 and 8
    # will most likely
    # speed up calculations.

    # Omit the possibly too high
    # start value x, proceed to the
    # next value in the sequence
    # right away:
    x = x * (2 - xa)

    while True:
        x_new = x * (2 - x * a)
        if x_new <= x:
            return x_new
        x = x_new
```

To verify, the results of this implementation where compared with that
of the system's floating point division for 1,000,000 values
in the range 1e-300 through 1e+300.

The results from this implementation where either the same as that
computed by the system's `1/a`, or it was the next lower floating
point value that can be represented, or the next higher.  In no case
was the result from this implementation further off the system's, it
was always one of those three.

In percentages:

what | frequency
-----|------
same value | 67.8 %
next lower value  | 23.2 %
next larger value | 9.0 %

For a further experiment, the posh implementation of calculating the
return value was tried:

``` python
if x_new <= x:
    if x_new == x:
        return x_new
    else:
        return 0.5 * (
          x_new +
          x_new * (2 - x_new * a)
        )
```

This was only slightly better:

what | frequency
-----|------
same value | 68.1 %
next lower value  | 22.9 %
next larger value | 9.0 %

### Suggested exercises [🔗](#codeit){.small} {: #codeit}

- Implement this algorithm in a programming language of your choice
  and play around with it.
- Extend the algorithm so it can deal with negative <math><mi>a</mi></math>
  and also does whatever the appropriate error handling for your platform is
  when <math><mi>a</mi><mo>=</mo><mn>0</mn></math>.
  
## The math behind this: a theorem. [🔗](#theorem){.small} {: #theorem}

### Theorem's assumptions [🔗](#assumptions){.small} {: #assumptions}

Assume a real-valued 
function <math><mi>f</mi></math> is defined over the open non-empty interval of
real numbers between two numbers <math><mi>u</mi></math> and <math><mi>v</mi></math> 
and is twice continuously differentiable over that interval.

Assume a point <math><mi>z</mi></math> exists in that interval,
that is, <math><mi>u</mi><mo>&lt;</mo><mi>z</mi><mo>&lt;</mo><mi>v</mi></math>,
which is a fixpoint of <math><mi>f</mi></math>,
that is, <math><mi>f</mi><mo>(</mo><mi>z</mi><mo>)</mo><mo>=</mo><mi>z</mi></math>.

Assume further <math><msup><mi>f</mi><mo>′</mo></msup><mo>(</mo><mi>z</mi><mo>)</mo><mo>=</mo><mn>0</mn></math>,
and also <math><mn>0</mn><mo>≤</mo><msup><mi>f</mi><mo>′</mo></msup><mo>(</mo><mi>x</mi><mo>)</mo></math>
for all <math><mi>x</mi></math>
such that <math><mi>u</mi><mo>&lt;<mo><mi>x</mi><mo>≤<mo><mi>z</mi></math>,
and finally, assume that no fixpoint of <math><mi>f</mi></math> exists
larger than <math><mi>u</mi></math> and smaller than <math><mi>z</mi></math>.

### Theorem's claims  [🔗](#claims){.small} {: #claims}

- Then, any <math><mi>x</mi></math>
such that <math><mi>u</mi><mo>&lt;</mo><mi>x</mi><mo>&lt;</mo><mi>z</mi></math> fulfills <math><mi>x</mi><mo>&lt;</mo><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>≤</mo><mi>z</mi></math>.

Assuming that first claim to be true, given
any <math><msub><mi>x</mi><mn>0</mn></msub></math> such
that <math><mi>u</mi><mo>&lt;</mo><msub><mi>x</mi><mn>0</mn></msub><mo>≤</mo><mi>z</mi></math>,
we can define a sequence recursively via <math>
    <msub>
        <mi>x</mi>
        <mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow>
    </msub>
    <mo>=</mo>
    <mi>f</mi>
    <mo>(</mo>
        <msub>
            <mi>x</mi>
            <mi>n</mi>
        </msub>
    <mo>)</mo>
</math> for all indices (natural numbers) <math><mi>n</mi></math>.

This sequence has the following properties:

- <math><mi>u</mi><mo>&lt;</mo><msub><mi>x</mi><mi>n</mi></msub><mo>≤</mo><mi>z</mi></math>
  will hold for any index <math><mi>n</mi></math>.
  (This helps the sequence definition to make sense.)
- This is an increasing sequence, that
  is, <math><msub><mi>x</mi><mi>n</mi></msub><mo>≤</mo><msub>
        <mi>x</mi>
        <mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow>
    </msub></math> for any index <math><mi>n</mi></math>.
- This sequence converges to <math><mi>z</mi></math>.
- Finally, it does so in quadratic fashion,
that is, there exist some number <math>m</math> and some
constant <math><mi>k</mi><mo>></mo><mn>0</mn></math> such
that <math><mo>|</mo>
    <mrow>
    <msub>
        <mi>x</mi>
        <mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow>
    </msub><mo>-</mo><mi>z</mi>
    </mrow>
    <mo>|</mo><mo>≤</mo><mi>k</mi><msup><mrow><mo>|</mo>
<msub><mi>x</mi><mi>n</mi></msub><mo>-</mo><mi>z</mi>
<mo>|</mo></mrow><mn>2</mn></msup></math>
for any index <math><mi>n</mi></math>
such that <math><mi>n</mi><mo>></mo><mi>m</mi></math>.
{: #quadratic}

## Application [🔗](#inv_from_theorem){.small} {: #inv_from_theorem}

Assuming for a moment the theorem given were true. It can be applied to our
inversion problem as follows:

* <math><mi>u</mi><mo>=</mo><mn>0</mn></math>, <math><mi>v</mi><mo>=</mo><mfrac><mn>2</mn><mi>a</mi></math>.
* <math><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>=</mo><mi>x</mi><mo>(</mo><mn>2</mn><mo>-</mo><mrow><mi>a</mi><mi>x</mi></mrow><mo>)</mo></math> is a quadratic parable with zeros at <math><mn>0</mn></math> and <math><mfrac><mn>2</mn><mi>a</mi></mfrac></math>, opening downwards.
* <math><mi>z</mi><mo>=</mo><mfrac><mn>1</mn><mi>a</mi></mfrac></math> is clearly a fixpoint
  of <math><mi>f</mi></math>.
* A straightforward calculation shows that the only other fixpoint
  of <math><mi>f</mi></math> is <math><mn>0</mn></math>.
* The maximal value of <math><mi>f</mi></math> is in the middle between its zeros,
  so at <math><mi>z</mi></math>.
* The first derivative of <math><mi>f</mi></math> at <math><mi>z</mi></math>
  is <math><mn>0</mn></math>, as <math><mi>f</mi></math> has its maximum there.
* The first derivative of <math><mi>f</mi></math> has its _only_ zero
  at <math><mi>z</mi></math>.
* As <math><mi>f</mi></math> opens downwards,
  the derivative has to be positive before <math><mi>z</mi></math> and negative thereafter.
* The theorem gives us that our sequence is increasing and we have quadratic convergence, for any start
  value <math><msub><mi>x</mi><mn>0</mn></msub></math>
  such that <math><mn>0</mn><mo>&lt;</mo><msub><mi>x</mi><mn>0</mn></msub><mo>≤</mo><mfrac><mn>1</mn><mi>a</mi></mfrac></math>.
* Now <math><mi>f</mi></math> is symmetrical around <math><mfrac><mn>1</mn><mi>a</mi></mfrac></math>,
so we also have quadratic convergence for any
start value <math><msub><mi>x</mi><mn>0</mn></msub></math>
such that <math><mfrac><mn>1</mn><mi>a</mi></mfrac><mo>≤</mo><msub><mi>x</mi><mn>0</mn></msub><mo>&lt;</mo><mfrac><mn>2</mn><mi>a</mi></mfrac></math>.

## Suggested exercise [🔗](#check_inequality){.small} {: #check_inequality}

Convince yourself (in a "pedestrian" way) that for any <math><mi>x</mi></math>
with <math><mn>0</mn><mo>&lt;</mo><mi>x</mi><mo>&lt;</mo><mfrac><mn>1</mn><mi>a</mi></mfrac></math>
the assertion <math><mi>x</mi><mo>&lt;</mo><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>≤</mo><mi>z</mi></math> holds in our particular <math><mfrac><mn>1</mn><mi>a</mi></mfrac></math> situation.

## Suggested exercise [🔗](#pi){.small} {: #pi}

Convince yourself that the theorem allows the use of the
function <math><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>=</mo><mi>x</mi><mo>+</mo><mi>sin</mi><mo>(</mo><mi>x</mi><mo>)</mo></math>
to calculate <math><mi>π</mi></math>.

(There are much more convenient ways to calculate <math><mi>π</mi></math>.
But this way is at least legitimate, as the <math><mi>sin</mi></math> function
can be calculated via its [power series](https://en.wikipedia.org/wiki/Sine_and_cosine#Series_and_polynomials)
without prior knowledge of <math><mi>π</mi></math>.)

## Suggested exercise [🔗](#temperament){.small} {: #temperament}

In music, the "[equal
temperament](https://en.wikipedia.org/wiki/Equal_temperament)" demands
that the quotient of the frequencies of two adjacent (half-)tones
is <math><mi>z</mi><mo>=</mo><mroot><mn>2</mn><mn>12</mn></mroot></math>.

* Convince yourself that, for any real <math><mi>λ</mi></math>, this
number <math><mi>z</mi></math> is a fixpoint of <math>
<msub><mi>f</mi><mi>λ</mi></msub><mo>(</mo><mi>x</mi><mo>)</mo><mo>=</mo>
<mo>(</mo><mn>1</mn><mo>-</mo><mi>λ</mi><mo>)</mo><mi>x</mi><mo>+</mo>
<mi>λ</mi><mo>(</mo><msup><mi>x</mi><mn>13</mn></msup><mo>-</mo><mi>x</mi><mo>)</mo></math>.
* Find a value <math><mi>λ</mi></math> that might be suitable to make the theorem
applicable.
* Verify that the theorem indeed is applicable.
* Calculate <math><mi>z</mi></math> via <math><msub><mi>f</mi><mi>λ</mi></msub></math>.

## Proof of the theorem [🔗](#proof){.small} {: #proof}

We first need to prove: Given <math><mi>x</mi></math>
such that <math><mi>u</mi><mo>&lt;</mo><mi>x</mi><mo>&lt;</mo><mi>z</mi></math>,
it is true that <math><mi>x</mi><mo>&lt;</mo><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>≤</mo><mi>z</mi></math>.

For <math><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>≤</mo><mi>z</mi></math>:
According to the [mean value theorem](https://en.wikipedia.org/wiki/Mean_value_theorem), there
must exist some <math><mi>ξ</mi></math> with <math><mi>x</mi><mo>&lt;</mo><mi>ξ</mi><mo>&lt;</mo><mi>z</mi></math>
such that <math><msup><mi>f</mi><mi>′</mi></msup><mo>(</mo><mi>ξ</mi><mo>)</mo><mo>=</mo><mfrac>
<mrow><mi>f</mi><mo>(</mo><mi>z</mi><mo>)</mo><mo>-</mo><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow>
<mrow><mi>z</mi><mo>-</mo><mi>x</mi></mrow>
</mfrac></math>.  According to our assumptions, <math>
<msup><mi>f</mi><mi>′</mi></msup><mo>(</mo><mi>ξ</mi><mo>)</mo><mo>≥</mo><mn>0</mn></math>.
As <math><mi>x</mi><mo>&lt;</mo><mi>z</mi></math>,
we see <math><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>≤</mo><mi>f</mi><mo>(</mo><mi>z</mi><mo>)</mo>
<mo>=</mo><mi>z</mi></math>.


For <math><mi>x</mi><mo>&lt;</mo><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo></math>, consider the
function <math><mi>p</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>=</mo><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>-</mo><mi>x</mi></math>.
This function is zero precisely where <math><mi>f</mi></math> has a fixpoint.
In particular, <math><mi>p</mi><mo>(</mo><mi>z</mi><mo>)</mo><mo>=</mo><mn>0</mn></math>.
Now <math><mi>f</mi></math> has no fixpoint between <math><mi>u</mi></math>
and <math><mi>z</mi><mo></math> (exclusively),
so <math><mi>p</mi></math> becomes zero nowhere in that interval.  Being a continuous function, <math><mi>p</mi></math> cannot
change sign over an interval without assuming a zero value in that interval.
So <math><mi>p</mi></math> cannot change sign between <math><mi>u</mi></math>
and <math><mi>z</mi><mo></math> (exclusively), in other words,
must either be always negative or always positive there.
Now <math><mi>p</mi></math> is continuously differentiable.
Its derivative evaluates to <math><mn>-1</mn></math> at <math><mi>z</mi></math>.
As the derivative is continuous, it must be negative for some interval of points
slightly smaller than <math><mi>z</mi></math>.
By another application of the mean value theorem, <math><mi>p</mi></math> must be positive
on that same interval.  We established <math><mi>p</mi></math> does not change sign
in the interval strictly between <math><mi>u</mi></math> and <math><mi>z</mi></math>,
and we have found points in that interval
where <math><mi>p</mi></math> is positive, so we now know
it must be positive everywhere between <math><mi>u</mi></math> and <math><mi>z</mi></math>
(exclusively), which in particular
proves <math><mi>p</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>></mo><mn>0</mn></math>.
Now <math><mi>p</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>></mo><mn>0</mn></math> is equivalent
to <math><mi>x</mi><mo>&lt;</mo><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo></math>.

These two together prove that the sequence <math>
    <msub><mi>x</mi><mn>0</mn></msub><mo>,</mo>
    <msub><mi>x</mi><mn>1</mn></msub><mo>,</mo>
    <msub><mi>x</mi><mn>2</mn></msub><mo>,<mo>...</mo>
</math> increases and is bounded from above by <math><mi>z</mi></math>.

Any increasing bounded sequence of real numbers converges. (How to
prove this fact, which is both profound and basic, depends on your
definition of real numbers.)

Let us call the limit <math>
<mi>ζ</mi><mo>=</mo>
<munder><mi>lim</mi><mrow><mi>n</mi><mo>→</mo><mi>∞</mi></mrow></munder>
<msub><mi>x</mi><mi>n</mi></msub></math>.
As <math><mi>f</mi></math> is continuous:

<math display="block"><mi>f</mi><mo>(</mo><mi>ζ</mi><mo>)</mo><mo>=</mo>
<mi>f</mi><mo>(</mo><munder><mi>lim</mi><mrow><mi>n</mi><mo>→</mo><mi>∞</mi></mrow></munder>
<msub><mi>x</mi><mi>n</mi></msub><mo>)</mo><mo>=</mo>
<munder><mi>lim</mi><mrow><mi>n</mi><mo>→</mo><mi>∞</mi></mrow></munder>
<mi>f</mi><mo>(</mo><msub><mi>x</mi><mi>n</mi></msub><mo>)</mo><mo>=</mo>
<munder><mi>lim</mi><mrow><mi>n</mi><mo>→</mo><mi>∞</mi></mrow></munder>
<msub><mi>x</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo>
<munder><mi>lim</mi><mrow><mi>n</mi><mo>→</mo><mi>∞</mi></mrow></munder>
<msub><mi>x</mi><mi>n</mi></msub><mo>=</mo>
<mi>ζ</mi>
</math>

So <math><mi>ζ</mi></math> is a fixpoint of <math><mi>f</mi></math>, which
forces <math><mi>ζ</mi><mop>=</mop><mi>z</mi></math>.  So the sequence <math>
    <msub><mi>x</mi><mn>0</mn></msub><mo>,</mo>
    <msub><mi>x</mi><mn>1</mn></msub><mo>,</mo>
    <msub><mi>x</mi><mn>2</mn></msub><mo>,</mo><mo>...</mo>
</math> converges with limit <math><mi>z</mi></math>.

The only claim not yet proven is that this convergence is quadratic.

This is a direct consequence of
[Taylor's theorem](https://en.wikipedia.org/wiki/Taylor%27s_theorem).
It allows us to rewrite <math><mi>f</mi><mo>(</mo><msub><mi>x</mi><mi>n</mi></msub><mo>)</mo><mo>=</mo>
<mi>f</mi><mo>(</mo><mi>z</mi><mo>)</mo>
<mo>+</mo>
<msup><mi>f</mi><mi>′</mi></msup><mo>(</mo><mi>z</mi><mo>)</mo><mo>(</mo><msub><mi>x</mi><mi>n</mi></msub><mo>-</mo><mi>z</mi><mo>)</mo>
<mo>+</mo>
<msub><mi>R</mi><mn>1</mn></msub><mo>(</mo><msub><mi>x</mi><mi>n</mi></msub>)</mo>
</math>

In our case, <math><msup><mi>f</mi><mi>′</mi></msup><mo>(</mo><mi>z</mi><mo>)</mo><mo>=</mo><mn>0</mn></math>,
so this reduces to <math><mi>f</mi><mo>(</mo><msub><mi>x</mi><mi>n</mi></msub><mo>)</mo><mo>=</mo>
<mi>f</mi><mo>(</mo><mi>z</mi><mo>)</mo><mop>+</mop><msub><mi>R</mi><mn>1</mn></msub><mo>(</mo><msub><mi>x</mi><mi>n</mi></msub>)</mo>
</math>
or <math><mi>f</mi><mo>(</mo><msub><mi>x</mi><mi>n</mi></msub><mo>)</mo><mo>-</mo><mi>f</mi><mo>(</mo><mi>z</mi><mo>)</mo><mo>=</mo>
<msub><mi>R</mi><mn>1</mn></msub><mo>(</mo><msub><mi>x</mi><mi>n</mi></msub>)</mo></math>
or <math><msub><mi>x</mi><mrow><mi>n</mi><mop>+</mop><mn>1</mn></mrow></msub><mo>-</mo><mi>z</mi><mo>=</mo>
<msub><mi>R</mi><mn>1</mn></msub><mo>(</mo><msub><mi>x</mi><mi>n</mi></msub>)</mo></math>.
The Lagrange form of the remainder boils down to <math>
<msub><mi>R</mi><mn>1</mn></msub><mo>(</mo><msub><mi>x</mi><mi>n</mi></msub>)</mo><mo>=</mo>
<mfrac><mrow><msup><mi>f</mi><mi>″</mi></msup><mo>(</mo><mi>ξ</mi><mo>)</mo></mrow><mn>2</mn></mfrac>
<msup><mrow><mo>(</mo><msub><mi>x</mi><mi>n</mi></msub><mo>-</mo><mi>z</mi><mo>)</mo></mrow><mn>2</mn></msup>
</math> for some suitable value <math><mi>ξ</mi></math> with <math>
<msub><mi>x</mi><mi>n</mi></msub><mo>≤</mo><mi>ξ</mi><mo>≤</mo><mi>z</mi></math>.
Now <math><msup><mi>f</mi><mi>″</mi></math> is continuous, hence bounded over any closed interval
containing <math><mi>z</mi></math>.  If we have an upper bound <math><mi>k</mi></math>
of <math><mo>|</mo><mi>f</mi><mo>|</mo></math> over some such interval and
let <math><mi>n</mi></math> become large enough
so all subsequent <math><msub><mi>x</mi><mi>n</mi></msub></math> are contained in that interval,
then indeed <math>
<mop>|</mop><msub><mi>x</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>-</mo><mi>z</mi><mop>|</mop>
<mop>≤</mop>
<mfrac><mo>k</mo><mn>2</mn></mfrac>
<msup>
   <mrow><mop>|</mop><msub><mi>x</mi><mi>n</mi></msub><mop>-</mop><mi>z</mi><mop>|</mop></mrow>
   <mn>2</mn>
</msup></math>,
which completes the proof.

## Suggested exercise [🔗](#mirrortheorem){.small} {: #mirrortheorem}

Formulate the mirror-image theorem, where convergence ends up not being from below, but from above, and prove it.

Use your theorem to prove that [Heron's
method](https://en.wikipedia.org/wiki/Square_root_algorithms#Heron's_method)
works, which uses the recursion formula <math>
<msub><mi>x</mi><mrow><mi>n</mi><mop>+</mop><mn>1</mn></mrow></msub><mop>=</mop>
<mfrac><mn>1</mn><mn>2</mn></mfrac><mop>(</mop>
<msub><mi>x</mi><mi>n</mi></msub><mop>+</mop>
<mfrac><mi>a</mi><mrow><msub><mi>x</mi><mi>n</mi></msub></mrow></mfrac>
<mop>)</mop></math> to calculate <math><msqrt><mi>a</mi></msqrt></math>.
{: #herons}

Use your theorem to prove that recursion via the function <math><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>=</mo><mi>x</mi><mo>+</mo><mi>sin</mi><mo>(</mo><mi>x</mi><mo>)</mo></math> converges to <math><mi>π</mi></math> for any start 
value <math><msub><mi>x</mi><mn>0</mn></msub></math> with <math><mi>π</mi><mo>≤</mo><msub><mi>x</mi><mn>0</mn></msub><mo>&lt;</mo><mn>2</mn><mi>π</mi></math>.

## Suggested exercise [🔗](#cubic){.small} {: #cubic}

Above, there is a [definition of "quadratic convergence"](#quadratic)
that loosely means that after a certain number of digits of the final result
have been calculated correctly, the next iteration step tends to produce twice
as many correct digits.  There are approximation sequences that exhibit
cubic convergence:  The next iteration step tends to produce three times
as many correct digits as the previous one has established.

* Give a formal definition of cubic convergence.
* Show that, given any 
start value <math><msub><mi>x</mi><mn>0</mn></msub></math>
with <math><mi>0</mi><mo>&lt;</mo><msub><mi>x</mi><mn>0</mn></msub><mo>&lt;</mo><mn>2</mn><mi>π</mi></math>,
the sequence defined by <math><msub><mi>x</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo>
<msub><mi>x</mi><mn>n</mn></msub><mo>+</mo><mi>sin</mi><msub><mi>x</mi><mn>n</mn></msub></math>
exhibits cubic convergence behaviour, converging towards <math><mi>π</mi></math>.
* Show that, given any <math><mn>a</mn><mo>></mo><mn>0</mn></math>
and any start value <math><msub><mi>x</mi><mn>0</mn></msub></math>
with <math><mi>0</mi><mo>&lt;</mo><msub><mi>x</mi><mn>0</mn></msub><mo>&lt;</mo><msqrt><mfrac><mn>7</mn><mn>3</mn></mfrac><mi>a</mi></msqrt></math>, 
the sequence defined by <math><msub><mi>x</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo>
<mo>(</mo><mo>(</mo><mo>(</mo><mfrac><mn>3</mn><mrow><mn>8</mn><msup><mi>a</mi><mn>2</mn></mfrac><msubsup><mi>x</mi><mi>n</mi><mn>2</mn></msubsup><mo>)</mo><mo>-</mo><mfrac><mn>5</mn><mrow><mn>4</mn><mi>a</mi></mrow></mfrac><mo>)</mo><msubsup><mi>x</mi><mi>n</mi><mn>2</mn></msubsup><mo>+</mo><mfrac><mn>15</mn><mn>8</mn></mfrac><mo>)</mo><msub><mi>x</mi><mi>n</mi></msub>
</math> exhibits cubic convergence behaviour, converging to <math><msqrt><mi>a</mi></msqrt></math>.
-- When all subterms that do not involve <math><msub><mi>x</mi><mi>n</mi></msub></math>
are computed only once beforehand, four multiplications and two additions/subtractions
need to be done in each iteration of the loop.  It depends on circumstances whether this
is faster or slower, compared with the ancient Heron's method
as mentioned [above](#herons). [🔗](#fast_sqrt){.small}
{: #fast_sqrt}

## Author's remark [🔗](#mathmlpain){.small} {: #mathmlpain}

This was also an experiment regarding typing math in via raw mathml.
Result: Can be done, but is not entirely convenient.

I did not bother installing web fonts, but rely on your browser's support
of mathml.  Depending what browser you use, your mileage may vary.

## Discussion opportunity  [🔗](#comments){.small} {: #comments}

If you want to comment or discuss this piece and have a Fediverse
account, feel invited to answer [my pertinent
toot](https://mastodon.radio/@dj3ei/116618396589431125).
