Outline of issues
Timings related to division and fractions are more interesting than the other arithmetic operations, and they will get a little more attention.
Some boundaries between performance jumps have been mentioned; others may be obvious:
- packed versus unpacked arrays:
- machine integers versus bigints ($|n| > 2^{63}$ or $n = -2^{63}$).
- vectorized versus unvectorized operations (on packed arrays).
- one operation in
Divide[a,b] versus two operations in a/b.
Of course the timings on bigints grow with their size. The bigints used in the timings below are small, around 2^64, to give an idea of the initial jump between them and machine integers.
Power[]
Power[] is the main reason a/b = Times[a, Power[b, -1]] is so much slower on machine-integer arrays than Divide[a, b]. The table below gives an overview of the performance of Power[] on large arrays.
There is a step down in speed when the output is not packable. Such is the case for Power[b, -1] and other negative exponents, and it is quite slow. There is another step down in speed of Power[] when the integers in the output are bigints. The cases when power is 0 or 1 tend to be special cases that are not particularly important. A surprise is that raising a bigint array to a power is faster than raising a packed array to a power when the result contains quite a few bigints (5,000 or more in an array of length 100,000).
Overview of Power[]:
| input |
array |
power |
output |
array |
timing (s) |
| mach. int |
packed |
> 1 |
machine |
packed |
$5.5\times 10^{-5}$ |
|
unpacked |
≥ 1 |
machine |
packed* |
$3.3\times 10^{-4}$ |
|
any |
≥ 1 |
bigint |
unpacked |
$6.3\times 10^{-1}$ |
|
any |
= -1 |
machine |
unpacked |
$3.3\times 10^{-1}$ |
|
any |
< -1 |
machine |
unpacked |
$3.4\times 10^{-1}$ |
|
any |
< -1 |
bigint |
unpacked |
$6.5\times 10^{-1}$ |
| bigint |
unpacked |
= -1 |
bigint |
unpacked |
$7.9\times 10^{-3}$ |
|
unpacked |
other |
bigint |
unpacked |
$1.{-}\,4.\times 10^{-2}$ |
| mach. int |
singleton |
any |
bigint |
singleton |
$4.1\times 10^{-6}$ |
| bigint |
singleton |
any |
bigint |
singleton |
$2.1\times 10^{-6}$ |
Table 1: Power[] on integer arrays. Other than the last two rows, which time a single integer, timings were on arrays of length 10^5 on a MacBook Pro M4 Max.
When the length is ≥ 251 and the power is nonnegative, a machine-int array will be packed, and a vectorized Power[] will be used.
By "machine" output for negative powers, I mean the integers in Rational[n, d] are machine integers.
When the output is unpacked, the effect of packed vs. unpacked input is negligible compared to the variability in timings.
The timing on machine input to bigint output is roughly proportional to how many bigints are produced.
Comparison with other arithmetic operations:
| f |
packed |
unpacked |
big,small |
big,big |
| Plus |
$1.2434\times 10^{-5}$ |
$2.5219\times 10^{-4}$ |
$\bf 1.0446\times 10^{-2}$ |
$\bf 1.3225\times 10^{-2}$ |
| Times |
$1.6838\times 10^{-5}$ |
$2.5162\times 10^{-4}$ |
$\bf 9.5097\times 10^{-3}$ |
$\bf 1.1018\times 10^{-2}$ |
| Divide |
$\bf 1.0785\times 10^{-2}$ |
$\bf 1.0838\times 10^{-2}$ |
$\bf 1.3317\times 10^{-2}$ |
$\bf 2.5694\times 10^{-2}$ |
| Quotient |
$4.8694\times 10^{-5}$ |
$3.2230\times 10^{-4}$ |
$\bf 1.3558\times 10^{-2}$ |
$\bf 1.8172\times 10^{-2}$ |
| Mod |
$3.4405\times 10^{-5}$ |
$3.0939\times 10^{-4}$ |
$\bf 9.5853\times 10^{-3}$ |
$\bf 1.9692\times 10^{-2}$ |
| Power |
$1.0667\times 10^{-4}$ |
$3.6628\times 10^{-4}$ |
$\bf 3.0156\times 10^{-2}$ |
- |
Table 2. Comparison of arithmetic operations f[a, b].
Timings were on arrays of length 10^5 on a MacBook Pro M4 Max.
The "small" array is an array of small machine integers, since otherwise Power[a, b] would lead to Overflow[];
for the same there is no timing for when both are bigints.
Bold timings indicate the output was not packed.
Vectorized operations
We can see from the overview of Power[a, k] that it is vectorized on a large arrays a of machine integers with nonnegative powers k, provided no integer overflow that falls back on bigint computations.
Here is a summary of vectorized division-related operations, provided input and output are machine integers:
- Vectorized:
Quotient[], Mod[].
- Not vectorized:
Divide[], QuotientRemainder[], Power[] with negative powers, Internal`Reciprocal[].
- Like
Power[] with nonnegative power, Quotient[] and Mod[] pack array inputs of length $\ge 251$ as do other vectorizable basic arithmetic operations like Plus[], Times[], and Subtract[]. The length is a system parameter controlled by the option "ListableAutoPackLength" in SystemOptions["PackedArrayOptions"].
- For large arrays,
Transpose[{Quotient[a, b], Mod[a, b]}] is much faster than the equivalent QuotientRemainder[a, b].
Division
Famously, one cannot divide most pairs of integers and get an integer result.
Hence Quotient[] and Mod[] are the performant machine-integer operations for division-related computation.
Fractions involve other considerations.
A significant computational expense in computing a/b by whatever code is the reduction of the fraction to least terms.
Such computations are not vectorized.
Another consideration is that an array of fractions is not represented as an efficient packed array.
| f |
packed |
unpacked |
bigint |
| Divide |
$1.4557\times 10^{-2}$ |
$1.3384\times 10^{-2}$ |
$3.6033\times 10^{-2}$ |
| slash |
$2.5502\times 10^{-1}$ |
$2.5825\times 10^{-1}$ |
$4.5660\times 10^{-2}$ |
| reciprocal |
$1.5873\times 10^{-2}$ |
$1.6426\times 10^{-2}$ |
$4.3957\times 10^{-2}$ |
| mapRational |
$1.3904\times 10^{-2}$ |
$1.2133\times 10^{-2}$ |
$3.3932\times 10^{-2}$ |
| threadRational |
$7.4402\times 10^{-3}$ |
$8.3068\times 10^{-3}$ |
$6.1329\times 10^{-3}$ |
Table 3. Timings of integer-to-rational division.
Timings were on arrays of length 10^5 on a MacBook Pro M4 Max.
Here are the codes for the composite operations:
slash[a_, b_] := a/b;
reciprocal[a_, b_] := a*Internal`Reciprocal[b];
mapRational[a_, b_] := MapThread[Rational, {a, b}];
threadRational[a_, b_] := Block[{Rational}, Thread[Rational[a, b]]];
The arithmetic of fractions
There are fewer cases to consider with inputs that are fractions.
The main difference in speed is whether the terms of the fractions are machine integers or bigints.
On an array of length $10^5$ of machine-integer fractions, the code a/b and a * Internal`Reciprocal[b] take about the same amount of time (0.033±0.02s); Divide[a, b] is somewhat faster (0.022±0.03s). Plus[a, b] and Times[a, b] take the same time as Divide[a, b]. The timings on fractions with bigint terms are about three to four times as long and grow as the bigints get bigger; the time for Plus seems to grow more slowly.
Sample codes
Here are examples of the arrays used:
(* arrays *)
packed = RandomInteger[{1, 1000}, 100000];
unpacked = Developer`FromPackedArray@packed;
small = RandomInteger[{2,6}, 100000];
bigint = RandomInteger[{2^63, 2^64}, 100000];
bigger = RandomInteger[{2^123, 2^124}, 100000];
fraction = bigger/bigint;
Timings were done with either RepeatedTiming[f[a,b];] or on longer running trials with just AbsoluteTiming[f[a,b];], with different arrays for a and b.