Greyscale, you might be doing it wrong

Posted by Michał ‘mina86’ Nazarewicz on 7th of February 2021

While working on ansi_colours crate I’ve learned about colour spaces more than I’ve ever thought I would. One of the things were intricacies of greyscale. Or rather not greyscale itself but conversion from sRGB. ‘How hard can it be?’ one might ask following it up with a helpful suggestion to, ‘just sum all components and divide by three!’

Taking an arithmetic mean of red, green and blue coordinates is an often mentioned method. Inaccuracy of the method is usually acknowledged and justified by its simplicity and speed. That’s a fair trade-off except that equally simple and fast algorithms which are noticeably more accurate exist. One such method it is built on an observation that green contributes the most to the perceived brightens of a colour. The formula is (r + 2g + b) / 4 and it increases accuracy of the calculation by taking green channel twice and speed by changing division operation into a bit shift. But that’s not all. Even if speed is an important factor, there are better algorithms.

TL;DR

fn grey_from_rgb_avg_32bit(r: u8, g: u8, b: u8) -> u8 {
    let y = 3567664 * r as u32 + 11998547 * g as u32 + 1211005 * b as u32;
    ((y + (1 << 23)) >> 24) as u8
}

The above implements the best algorithm for converting sRGB into greyscale if speed and simplicity is main concern. It does not involve gamma thus forgoes most complicated and time-consuming arithmetic. It’s much more precise and as fast as arithmetic mean.

That’s hardly the end of the story though as slightly more complicated methods can increase accuracy much more.

Accuracy measurement

Phrases such as ‘increases accuracy’ prompt a question about measurements used to make those claims. When it comes to colourimetry there is an obvious answer: the ΔE*₀₀ formula (simply referred to as ΔE herein). It builds on the L*a*b* colour space in which L* denotes luminosity and shades of grey have a* and b* coordinates equal to zero. Therefore, converting any colour into L*a*b* and then zeroing its a* and b* components turns it into shade of grey. Combining that with the ΔE distance yields the following measurement of accuracy:

// (r, g, b): input sRGB colour; y: calculated shade of grey in sRGB space
// result: CIEDE2000 distance between actual shade of grey and the provided one
fn distance(r: u8, g: u8, b: u8, y: u8) -> f32 {
    let colour_in_lab = lab::Lab::from_rgb(&[r, g, b]);
    let perfect_grey = lab::Lab { l: colour_in_lab.l, a: 0.0, b: 0.0 };
    let approximated_grey = lab::Lab::from_rgb(&[y, y, y]);
    delta_e::DE2000::new(perfect_grey, approximated_grey)
}

Using this measurement shows that (r + 2g + b) / 4 formula is nearly 40% more accurate than the naïve arithmetic mean. (Benchmarks and graphs are included at the end of the article). But we can do so much better.

Perfect conversion

The perfect conversion algorithm would be to move to L*a*b* colour space, set a* and b* to zero and go back to sRGB. This can be optimised significatly by recognising that L* component depends on the Y coordinate of the XYZ colour space only. Going all the way isn’t necessary and it’s enough to calculate the Y component (without bothering with X and Z) and apply gamma correction to it to get the desired shade of grey. This method is implemented by the following code:

fn grey_from_rgb_exact(r: u8, g: u8, b: u8) -> u8 {
    fn to_linear(v: u8) -> f32 {
        if v <= 10 {
            v as f32 * (1.0 / (12.92 * 255.0))
        } else {
            ((v as f32 + 0.055 * 255.0) * (1.0 / (1.055 * 255.0))).powf(2.4)
        }
    }

    fn from_linear(v: f32) -> u8 {
        (if v <= 0.0031308 {
            (12.92 * 255.0) * v
        } else {
            v.powf(1.0 / 2.4) * (1.055 * 255.0) - (0.055 * 255.0)
        } + 0.5) as u8
    }

    let y = 0.21264934272065283 * to_linear(r) +
            0.7151691357059038  * to_linear(g) +
            0.07218152157344333 * to_linear(b);
    from_linear(y)
}

Coefficients at the end come from the sRGB→XYZ conversion matrix (specifically it’s the second row) whereas from_linear and to_linear functions implement sRGB gamma correction.

While completely accurate, this function is roughly 30 times slower than the other two considered so far. If absolute precision is necessary, this algorithm is the one to use. On the other hand, sacrificing accuracy leads to significant spead-ups and may be worth considering.

Simplified sRGB

One possible time save is to simplify sRGB gamma formula. Common approximation is to assume gamma equal to 2.2. This streamlines to_linear and from_linear functions by removing a conditional but in doing so eliminates the cheaper of the two code paths. As a result, it makes the conversion slower rather than faster. Not all is lost though since this approach can be pushed some more by using gamma equal to 2:

fn grey_from_rgb_gamma2(r: u8, g: u8, b: u8) -> u8 {
    ((r as f32 * r as f32 * 0.21264934272065283 +
      g as f32 * g as f32 * 0.7151691357059038  +
      b as f32 * b as f32 * 0.07218152157344333).sqrt() + 0.5) as u8
}

Hardly anyone would call it sRGB, but it speeds up the algorithm by over 80%. Furthermore, despite being rather crude, handles 90% of the colours pretty well requiring ‘close inspection’ to see a difference between correct and calculated grey. (Again, all benchmarks at the end).

Integer square root

Using the γ = 2 approximation has additional benefit for platforms without a fast floating point unit. With a minor degradation in accuracy, the algorithm can be rewritten to consist entirely of integer arithmetic. While the trade-off may not be worth it on modern desktop machines which perform square root operation as fast as division but less powerful systems may still benefit from this approach:

fn grey_from_rgb_gamma2_isqrt(r: u8, g: u8, b: u8) -> u8 {
    fn isqrt(n: u32) -> u8 {
        let mut n = n;
        let mut x = 0;
        let mut b = 1 << 16;

        while b > n {
            b = b / 4;
        }

        while b != 0 {
            if n >= x + b {
                n = n - x - b;
                x = x / 2 + b;
            } else {
                x = x / 2;
            }
            b = b / 4;
        }
        x as u8
    }

    isqrt((r as u32 * r as u32 * 13936 +
           g as u32 * g as u32 * 46869 +
           b as u32 * b as u32 *  4731) >> 16)
}

More accurate weights

Pushing the idea of simplifying gamma correction further eventually leads to the point where conversion to and from linear values is skipped entirely. The result is a weighted average just as discussed at the beginning. This time, however, knowing what coefficients to use allows for a much better choice of weights:

fn grey_from_rgb_avg_float(r: u8, g: u8, b: u8) -> u8 {
    (r as f32 * 0.21264934272065283 +
     g as f32 * 0.7151691357059038  +
     b as f32 * 0.07218152157344333 + 0.5) as u8
}

This can be speed up by once again turning to integer arithmetic. There are many possible sets of integer weights but really the only two worth mentioning are (54, 183, 19) and (3567454, 11998779, 1210983). The former makes the calculations fit 16-bit integers while the later 32-bit ones.

fn grey_from_rgb_avg_16bit(r: u8, g: u8, b: u8) -> u8 {
    let y = 54 * r as u16 + 183 * g as u16 + 19 * b as u16
    ((y + (1 << 7)) >> 8) as u8
}

fn grey_from_rgb_avg_32bit(r: u8, g: u8, b: u8) -> u8 {
    let y = 3567454 * r as u32 + 11998779 * g as u32 + 1210983 * b as u32;
    ((y + (1 << 23)) >> 24) as u8

}

Benchmarks

Having described the algorithms it’s time to quantify all the claims about their accuracy and speed. Source code of the functions and benchmarks is available within ansi_colours module. Let’s first quickly recap all the described methods that are about to be evaluated:

Average
The simple arithmetic average of red, green and blue components.
2×Green
Weighted average where red and blue components have weight one while green two.
16-bit weights (grey_from_rgb_avg_16bit)
32-bit weights (grey_from_rgb_avg_32bit)
Weighted average with integer weight limited to 16-bit or 32-bit quantities respectively.
Float weights (grey_from_rgb_avg_float)
Weighted average with exact floating point weights.
γ=2 (no fpu) (grey_from_rgb_gamma2_isqrt)
Approximation of sRGB gamma as γ = 2 without any floating point arithmetic operations.
γ=2.0 (grey_from_rgb_gamma2)
γ=2.2 (Simplified sRGB)
Approximation of sRGB gamma as γ = 2.0 or γ = 2.2 respectively.
Exact (grey_from_rgb_exact)
The exact method for conversion to grey doing proper gamma correction.
" ΔE = 1 ΔE = 2 ΔE = 3 ΔE = 5 ΔE = 10 ΔE = 25

Time and delta

When evaluating the methods two metrics were taken into account. First one was time to convert all 16 million colours into greyscale. This metric is quite noisy and shouldn’t be take literally. Still it gives a sense of scale to compare classes of algorithms. The measurement has been performed on a relatively modern and capable CPU, an i7-7600U processor at 2.8 GHz. The result will vary widely on different architectures so as usual it’s important to make one’s own benchmarks to match one’s use case.

The second metric is ΔE. As previously described, for each colour it’s the distance between it’s true grey version (as calculated by going through L*a*b* colour space) and grey produced by the algorithm being evaluated. The common guide in interpreting the metric is that ΔE of less than one is not perceptible by human eye, distance between one and two requires close inspection, anything higher is noticeable at a glance and anything below 50 indicates colours are more similar than dissimilar. When evaluating the algorithms all 24-bit colours were tested and average, maximum and histogram of delta values were collected.

Table below show the time as well as average and maximum ΔE values produced by each algorithm. The exact method is provided for reference to compare time.

Time [ms]Average ΔEMax ΔE
Average298.744.5
2×Green295.526.6
16-bit weights314.128.1
32-bit weights314.128.2
Float weights994.128.2
γ=2 (no fpu)1600.84.3
γ=2.01810.74.0
γ=2.210330.22.1
Exact993

The chart represents the same information showing maximum and average ΔE as bars with scale on the left side and time as points on a line with scale on the right side.

Average2×Green16-bit weights32-bit weightsFloat weightsγ=2 (no fpu)γ=2.0γ=2.2Exact 0 5 10 15 20 25 30 35 40 45 50 0 200 400 600 800 1000 1200 Max ΔE Average ΔE Time [ms]

Histogram

Lastly, table and chart below shows a histogram of ΔE values. It’s important to note that ranges of delta values used in buckets are not the all the same size. Furthermore, the table represents a cumulative histogram. Raw data with equally-sized buckets is available as a CSV file. The benchmark produces that data as well and can be further modified to perform more detailed statistics.

% ΔE < 1% ΔE < 2% ΔE < 5% ΔE < 10% ΔE < 30
Average7.3%14.5%36.1%67.1%92.2%
2×Green13.3%26.4%57.3%82.7%100.0%
16-bit weights23.1%41.0%71.2%89.9%100.0%
32-bit weights23.3%41.3%71.0%89.9%100.0%
Float weights23.3%41.3%71.0%89.9%100.0%
γ=2 (no fpu)74.8%90.6%100.0%
γ=2.079.4%92.3%100.0%
γ=2.299.7%100.0%
γ=2.2 γ=2.0 γ=2 (no fpu) Float weights 32-bit weights 16-bit weights 2×Green Average 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% ΔE < 1 1 ≤ ΔE < 2 2 ≤ ΔE < 5 5 ≤ ΔE < 10 10 ≤ ΔE < 30 30 < ΔE

Summary

As seen from the analysis the naïve method of calculating arithmetic mean of red, green and blue components to get a shade of grey doesn’t have much going for it. Even if a quick, dirty and easy to remember method is needed, it’s much better to use (r + 2g + b) / 4 formula but even that is best avoided. Of course the choice of method depends on application and trade-offs one is willing to make.

If accuracy of the conversion is paramount and cannot be sacrificed there’s only one possible algorithm. The exact method which performs proper sRGB gamma correction and calculates the Y coordinate of the colour.

If speed is an important factor to consider the γ = 2.0 approximation can be used. It is over 80% faster than the exact method and even the variant using integer arithmetic produces ‘indistinguishable’ result for nearly 75% of the colours.

If accuracy isn’t that important, gamma correction step can be skipped resulting in a weighted average. The best choice here appears to be either one of 32-bit or 16-bit weights. Small, 16-bit platforms might benefit from the former but other than that there’s really no reason to shy away from the 32-bit variant.

No other methods are worth bothering with. Float weights give virtually the same results as 32-bit weights but are considerably slower. Similarly, the γ = 2.2 approximation is slower than doing gamma correction correctly so it’s better to do the precise calculation. And lastly, plain average and average which takes green twice makes sense only on the most constraint platforms (think 8-bit processor with kilobytes of memory).

Afterword

One final remark is that doing the ‘technically correct’ luminosity calculation isn’t the only way to go about converting images to greyscale. Another way to look at the problem is by designing a ‘contrast preserving’ algorithm. For a person with normal vision, colours of the same luminosity are easy to distinguish so long as their chromacity is sufficiently different. For example, each colour in the sequence on the right above this paragraph is easy to pick out but the formulæ described here will map all of them to the same shade of grey. Especially in computer vision this may not be the desired result, but it does represent what a black and white film would capture.

Update (March 2021): The article has been updated with more precise value for the D65 standard illuminant. This affected the coefficients used in the code. However, the adjustment was only slight so there were no changes to the benchmark results.