Surprised that there have been zero comments.
This article shows work that follows on from research by Toole et al. The main point: when given tone/EQ controls (Toole's tests already compensated for speaker volume differences), a majority of speaker-test listeners tended to make similar tone changes, not ones that accentuated extreme tonal balance (adding/subtracting a ton of bass or highs). Read Toole's book/articles for details.
I have spent some time using iTunes' built-in EQ to adjust (crudely) for differences (volume and EQ) between commercial recordings. I used to reluctantly use the iTunes Soundcheck "automatic" volume leveling/compression option, but the volume still varied a lot between local iTunes music files created from different download/CD/streaming sources.
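The leveling part of that chore boils down to nudging each track toward a common loudness target. As a rough illustration only (a minimal sketch using a plain RMS measure, not whatever weighting Apple's Soundcheck actually applies):

```python
import math

def rms_dbfs(samples):
    """RMS level of normalized samples (-1.0..1.0) in dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

def leveling_gain_db(samples, target_dbfs=-16.0):
    """Gain (dB) that would bring this track's RMS up/down to the target."""
    return target_dbfs - rms_dbfs(samples)

# Two one-second 440 Hz "tracks" mastered at very different levels:
quiet = [0.05 * math.sin(2 * math.pi * 440 * t / 44100) for t in range(44100)]
loud = [0.8 * math.sin(2 * math.pi * 440 * t / 44100) for t in range(44100)]

print(round(leveling_gain_db(quiet), 1))  # big boost needed
print(round(leveling_gain_db(loud), 1))   # cut needed
```

Real systems use perceptually weighted loudness rather than raw RMS, which is part of why automatic leveling never quite sounds right across wildly different masters.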
I primarily listen in Shuffle mode, so the editing/mastering volume/tone inconsistencies between recordings become really apparent. In the digital age, the "Circle of Confusion" is now a significant issue for serious music listeners, unlike the olden days when we would put on a vinyl album or CD and adjust volume/tone before plonking down in the sweet spot.
It would be REALLY nice if the music recording/reproduction industry would settle on and implement some standards that curb the worst excesses in recording/editing/mastering that contribute to the Circle of Confusion. Toole notes that the movie industry, pushed by surround sound being deployed in both public theatres and TV/home settings, did attempt to address this issue, but apparently the music industry has not done much for stereo listeners.
I get that musicians will always push sonic boundaries, but there is no reason for patently extreme examples to get past the recording/editing/mastering engineers. The bass in You Should See Me In A Crown is WAY over the top, and I like a lot of bass. Especially if it shuffles in before/after something like Breakfast In America.
I'm not advocating the automated real-time compensation the article suggests listeners may want; that's a bit of a cop-out from an industry unwilling to set up a certification/monitoring framework for music recording/reproduction. I'm not a purist by any stretch, but I don't want some (another?) algorithm deciding what the musicians/engineers intended. "Hot" recording/mastering has been a known reproduction problem since forever, but it was nearly impossible to quantify, let alone control, in the full-analog era. Now we have hifi digital, and that excuse no longer holds any water.
Imagine how much easier selecting/setting-up new gear would be if digital source material volume/tonal balance was reasonably standardized, instead of the "anything goes" situation we face now.