Is there any point to upscaling standard def?

Gordon @ Convergent AV

This is a continuation of a thread several folk were participating in, regarding whether there is any extra detail to be realised by scaling to panels of higher resolution than the native res of the source.

I will try to dig out the Joe Kane article I mentioned, but in the meantime I give you this email, which I was sent today. I have removed the name of the chap who sent it as he never said whether I could name him; he is unknown to me anyway, other than through this email.

It refers to a couple of images. I am going to upload them to a Yahoo album for viewing if possible, as they are above the size limits of this site.


""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Hello Gordon,

As an enthusiast and businessman involved in image quality issues, scaling,
enhancement etc., I thought I'd take the liberty of writing this message with
attached images I feel should interest you.

I have been reading a few threads posted on the AV and AVS forums where
people are under the misunderstanding that it is impossible to generate
higher-res real detail in an image than exists in the captured source frame.

This is not entirely true, in that under certain circumstances a
considerable amount of extra real detail can be extrapolated from a sequence
of frames, and there are algorithms being developed that can genuinely
reconstruct the detail and edges in still frames.

These techniques are becoming more commonly used in post
production systems, and I believe real-time consumer-level systems
aren't too far away either.

I have emailed you a source (under-sampled/aliased) image and a processed
image to illustrate what is being achieved. These are genuine
images from prototype research systems. I don't know how to post these
images to the forum, so if you wish to do so then please do, as I am sure
there will be some other members interested to know there is more to image
scaling and enhancement than they may have realised.

Many thanks.
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

I am just the messenger here guys, so let's have an interesting discussion.


Gordon
 
[Attachment: s processed.jpg (the processed image, 31.3 KB)]
Initially this doesn't look too dissimilar to the images in the original AVS post (which was simulated), so I wonder what differences we are seeing here?

As I mentioned in the original thread, the Radeon apparently has a similar scaling ability built in specifically for DVD, so I wonder just how good that is in comparison to the latest algorithms being developed.

Gary.
 
A pixel is a pixel: it's the smallest unit of sampling in your image, and anything smaller than a pixel is not represented in your image. You can use different scaling algorithms to produce better results on upsizing (and downsizing); basically, the better the algorithm, the fewer artifacts you get as a result of the scale, i.e. less aliasing, less distortion on edge transitions, that sort of thing. But the point is you can't produce extra resolution that isn't in the image to start with; all you can do is try to hide the most obvious effects of the lower-than-optimum res at image sizes that disclose it.
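To make that concrete, here's a minimal sketch in Python with Pillow (the input filename and target size are placeholder assumptions) that upscales the same frame through several standard filters. The better filters hide blockiness and aliasing more gracefully, but every output carries exactly the same information as the source:

```python
# Minimal sketch: the same SD frame put through four standard resampling
# filters. "frame_sd.png" and the target size are placeholder assumptions.
from PIL import Image

src = Image.open("frame_sd.png")        # e.g. a 720x576 SD frame
target = (1440, 1152)                   # a 2x upscale

for name, resample in [("nearest", Image.NEAREST),
                       ("bilinear", Image.BILINEAR),
                       ("bicubic", Image.BICUBIC),
                       ("lanczos", Image.LANCZOS)]:
    # Better filters trade blockiness/aliasing for softness or slight
    # ringing, but none of them adds detail absent from the source.
    src.resize(target, resample=resample).save(f"scaled_{name}.png")
```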

Anything else is a bodge, possibly consisting of techniques that are more akin to adding high-frequency noise on abrupt pixel-value transitions: i.e. sharpening techniques, possibly in combination with edge-detection and edge-processing algorithms.

Then you could run some motion-based techniques that allow different algorithms to be applied to different behaviour in the image, in an attempt to optimise whatever techniques you are using for changes in the material over time. (Dscaler uses similar detection routines, I suspect.)
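As a rough illustration of applying different algorithms to different behaviour, here's a toy Python/NumPy sketch (my own illustration of the general idea, not Dscaler's actual code) that uses a per-pixel frame difference to choose between a "still" rendition and a "moving" rendition of the current frame:

```python
# Toy motion-adaptive switch: per-pixel frame difference picks between two
# differently processed versions of the current frame. The threshold value
# is an arbitrary assumption.
import numpy as np

def motion_adaptive(prev: np.ndarray, curr: np.ndarray,
                    still_out: np.ndarray, moving_out: np.ndarray,
                    threshold: float = 12.0) -> np.ndarray:
    """Pick, per pixel, between two processed versions of `curr`,
    based on how much each pixel changed since the previous frame."""
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    moving = diff > threshold          # crude per-pixel motion detection
    return np.where(moving, moving_out, still_out)
```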

Then you get into pattern recognition and motion vector analysis, which is still not going to help you all that much when it comes to reconstructing images, beyond telling you where the foreground differentiation is relative to the background, and that's easily confused by camera and/or object moves. Then there's fractal reiteration, which is well loopy but only works for certain things.

In a word, no: you cannot create resolution that isn't there; all you can do is mess about with the image. If you could do it with any level of accuracy, you would be able to, say, take a photograph of a room with a piece of newsprint in it which is so out of focus you can't even see the lines of newsprint, let alone any letters, run it through some process and, bingo, be able to read the newsprint, and more importantly actually have what was written on the newsprint and not just some bodged-together artifact that you reckon looks like text. You cannot do this, and even this is easier than dealing with variable moving images (you could use a bit of pattern recognition to hunt for letter shapes to harden things up a bit, for example).

There is no magic there are only pixels and how they are remapped.
 
Very interesting Mr D, thanks.

Wouldn't it be easier to just do adjacent pixel approximation?

If you're inserting a line between two others, then wouldn't it be a relatively simple operation to interpolate what would be in between those two lines, rather than to try and work out what motion etc. was actually going on in the film?

You can tell I'm a total novice can't you? ;)

Gary.
 
Keith,

I hear what you are saying: it's not real detail, it's unreal detail we are seeing in the above images. However, there is no denying that it is more pleasing to look at the unreal one. Of course, if you stand miles away they'll look the same eventually, which I suppose brings us back to resolution perception and distance.


Gordon
 
Gordon, looking at your example images, I would say they are bogus.

If you look at the texture detail in the wool on some of the square details in the scarf, there is no way anything on the planet, on its own, with just that image as input, would be capable of interpolating that sort of detail. I suspect it would even defy the laws of normal physics.

This is another "simulated" comparison.
 
Originally posted by Gary Lightfoot
Very interesting Mr D, thanks.

Wouldn't it be easier to just do adjacent pixel approximation?

If you're inserting a line between two others, then wouldn't it be a relatively simple operation to interpolate what would be in between those two lines, rather than to try and work out what motion etc. was actually going on in the film?

You can tell I'm a total novice can't you? ;)

Gary.

Yes. I can't remember exactly which one it is, but you've just outlined a bilinear filtering process (I think it might even be nearest neighbour). However, it would likely show aliasing. Bicubic is what you get with the Radeons, I think, and it's a pretty standard filter these days (I get lost when people start talking about 8-tap). Even then it's not about which filter has the most clout, but which one is best for the job (upsizing, downsizing, whole-pixel or subpixel mapping).
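A toy sketch of the line insertion Gary describes (Python/NumPy, greyscale for simplicity, with made-up values): each new line is just the average of its neighbours, which is the vertical half of a bilinear filter, and the printed result shows why it can't add detail:

```python
# Each inserted scanline is the average of the lines above and below it:
# 1-D linear interpolation vertically, i.e. half of a bilinear resize.
import numpy as np

def insert_lines(img: np.ndarray) -> np.ndarray:
    """Double the line count of an (H, W) greyscale image by averaging
    each pair of neighbouring lines into a new in-between line."""
    h, w = img.shape
    out = np.empty((2 * h - 1, w), dtype=img.dtype)
    out[0::2] = img                                   # keep original lines
    out[1::2] = (img[:-1].astype(np.float32)
                 + img[1:].astype(np.float32)) / 2    # averaged new lines
    return out

frame = np.array([[0, 0, 0], [255, 255, 255]], dtype=np.uint8)
print(insert_lines(frame))   # the new middle line comes out at ~127:
                             # plausible-looking, but no new information
```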

The motion vector analysis would enable you/it/whatever to segment the image with greater accuracy, allowing you to vary the type of scaling technique on different parts of the image according to how it's moving, but it's overkill to be honest. I was attempting to demystify scaling a little by showing there is only so much you can do even when you decide to get all hi-tech.

The over-riding thing about scaling any image is its resolution: that's what ultimately sets the upper size limit of acceptable image quality. You could say scaling is almost more about hiding resolution limits than about increasing them.
 
I've missed this conversation so far.....

It is interesting..... I thought I was in total agreement with Keith, but I'm thinking of counter-arguments.

On one hand, you cannot extract more information than exists...... fundamental law, really. But surely if your processing system manages to impart the same information as the encoding system lost, then you can in fact get an improved image. But this wouldn't be adding extra detail; this would effectively be adding controlled noise.

ad
 
Originally posted by Mr.D
Gordon, looking at your example images, I would say they are bogus.

Edit....

This is another "simulated" comparison.

First of all, those images are not bogus. However, in my haste in sending them to Gordon I didn't provide any good explanation of my point in presenting them or, more importantly, explain how relevant they are to this forum membership's interests, which are of course all things video and such. I apologise for my omission.

Anyway, to (hopefully) set things straight: the systems used to generate the higher-res image from the unprocessed sample are not designed for use on typical video sequences, and would in fact produce such good results for only a small portion of any video programme. After hearing and reading many times of people having difficulty appreciating the nature of subpixel resolution and subpixel motion in video, I wanted to show that there is the possibility, under certain circumstances, of generating real detail in an upscaled image, of which this is but one example.

The processing system used for the images Gordon posted actually used pixel information integrated at subpixel accuracy from 10 video frames to create that one upscaled frame. The unprocessed frame is of course scaled up using the simplest of algorithms purely to match the two in size.

Put simply (I think!), the system attempts to 'undo' the aliasing in the source where there is sub-pixel motion over a sequence of frames and the source doesn't truly obey the Nyquist criterion for sampling.
The amount of noise in the image and the nature of the motion affect how many frames need to be integrated for a particular upscaling. The nature of the source's capturing components also limits the scaling capabilities of this system.
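For anyone curious about the mechanics, here is a bare-bones "shift-and-add" sketch of the general multi-frame idea in Python/NumPy. To be clear, this is a textbook illustration rather than the correspondent's actual system, and it assumes the sub-pixel offsets between frames are already known, whereas a real system has to estimate them from the video; all the values are made up.

```python
# Shift-and-add super-resolution sketch: low-res frames taken at known
# sub-pixel offsets are interleaved onto a finer grid.
import numpy as np

def shift_and_add(frames, offsets, scale=2):
    """frames: list of (H, W) arrays; offsets: matching (dy, dx) sub-pixel
    shifts in input pixels; returns a (scale*H, scale*W) estimate."""
    h, w = frames[0].shape
    acc = np.zeros((scale * h, scale * w))
    cnt = np.zeros((scale * h, scale * w))
    for frame, (dy, dx) in zip(frames, offsets):
        # Map each low-res sample to its nearest cell on the fine grid.
        ys = np.clip(np.round((np.arange(h) + dy) * scale).astype(int),
                     0, scale * h - 1)
        xs = np.clip(np.round((np.arange(w) + dx) * scale).astype(int),
                     0, scale * w - 1)
        acc[np.ix_(ys, xs)] += frame
        cnt[np.ix_(ys, xs)] += 1
    cnt[cnt == 0] = 1            # cells no frame landed on stay zero;
    return acc / cnt             # a real system would interpolate them

# Two frames half a pixel apart fill complementary cells of the 2x grid:
lo_a = np.array([[10., 20.], [30., 40.]])
lo_b = np.array([[12., 22.], [32., 42.]])
hi = shift_and_add([lo_a, lo_b], offsets=[(0.0, 0.0), (0.5, 0.5)])
```

With frames offset by half a pixel, cells of the doubled grid get filled by genuinely different scene samples, which is where the "real" extra detail comes from; with no motion, every frame lands on the same cells and you gain nothing but noise averaging.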

The system works along similar lines to our own human visual system, which also creates supersampled images, integrated over time due to the constant sub-sampling of our field of view through involuntary eye movement.

I don't know if I have made my point here at all or been at all helpful. However, in my research I have spoken with many people at many companies and institutions and read (far too) many papers, and it is fantastic just how much success there has been in the field of image processing, particularly in creating systems that can estimate and track true motion in images, recognise objects, extrapolate missing information and track objects in 3D. There are many applications for such tools, perhaps the least of them processing consumer-level video, but nonetheless it seems to me that some companies are actively researching the possibility of applying these types of systems to consumer-level devices.
 
Originally posted by Mr.D


If you look at the texture detail in the wool on some of the square details in the scarf, there is no way anything on the planet, on its own, with just that image as input, would be capable of interpolating that sort of detail. I suspect it would even defy the laws of normal physics.


So effectively, over 10 frames (not one single image), on frames where the extra detail is visible, it pulls detail off the frames in question (segments the image), scales them, warps them and tracks them into the required area?

This wouldn't be a zoom or a push-in by any chance? (Which would be a piece-of-cake situation for this sort of software.)

You know, I've evaluated systems like this before for automating certain compositing situations (wire removals, clean plate generation, motion-based keying). Sometimes it works OK, but normally that's in situations where an average compositor could do the job more reliably (i.e. nodal camera moves that disclose all the necessary detail to provide the fill-in); on more complex images with lots of parallax, rotations and lens distortion it's not very effective. I don't think this technology would be of any use in a real-time scaling situation: you'd get as many artifacts as benefits. And like I said, it would do zip on a single image.

I'd also be interested in seeing what the start and end frames of the 10-frame sequence were like at native res (720x576) compared with the processed image at the same res (not some custom res dictated by the chunks comped together from the rest of the frame range).

You say as much yourself: this isn't really technology for the scaling up of video-res sequences, but rather the collation of one master image that's a higher res than the sum of its parts (a stitcher). So I apologise if you feel you have been dragged into a conversation about something that isn't really what your software is for. No intention to rubbish your software; I'm just trying to demystify some of the issues to do with scaling in the context in which most of us come across it.
 
Keith,

Excellent post. You are making it clearer to me what we are seeing in these images. It has also made it clearer to me how wire removal etc. is done in complex panning sequences. This is very interesting.

I guess that this is an example of image enhancement software; something, for example, that would be developed to try to read numberplates on cars from those crappy cameras we see on street corners. I guess that with software like this, even if you couldn't make out the numbers in each frame, by comparing specific pixels across frames you could possibly recreate a more accurate approximation of what detail there was, thereby maybe making the numberplate readable. Is this right? If it is, are you saying that there are simple techniques for achieving this? If there are, is it the sort of thing that could be done in real time? (I suspect this is the sort of thing Xantus are trying to achieve with Teranex.)

Gordon
 
Forget real time for now; how about software on your PC that would churn away on the content of a DVD for, say, 24 hours to produce something that a Teranex would generate in real time, but without the £30,000 price tag?

Allan
 
Just thought I'd chime in with some comments on what is and isn't possible to do with scaling. Apologies if you all know this already, but it seems that sometimes it isn't explained very well, so I'll have a go.

The numbers associated with each pixel on the sampling grid are not best represented by squares the same size as the sampling grid. So the best image you can get out of an NTSC DVD is not an image of 720x480 squares where each square can only be one colour; the best image is a 2D wave-like surface whose highest frequency is 360 cycles across the width, or 240 cycles down the height (the Nyquist limit for 720x480 samples).

The problem with just using 720x480 square pixels to display the image is that you introduce high-frequency information that isn't supposed to be there. The transition between the centre points of the sampling grid is supposed to be smooth, not a step change.

No display device can reproduce this "perfect" image correctly. Film comes closest. A line-based device like a CRT can come close in the horizontal direction by using filters to limit the maximum horizontal frequency, but to get a better representation in the vertical plane more lines are required.

For pixel devices you need more pixels to avoid having too much unwanted high-frequency information in the result, but no device goes high enough in pixel terms to really do that, and most will introduce more artefacts than they fix.

The key point about the samples on a DVD is that they are point samples and not area samples, in exactly the same way as the samples on a CD are point samples, not best represented as square steps.
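John's point translates directly to the 1-D audio case, so a short Python/NumPy sketch (with a made-up 16-sample sine) may help: the band-limited sinc reconstruction recovers the smooth signal between the point samples, while rendering each point as its nearest sample (flat squares) invents high-frequency steps that were never recorded:

```python
# Point samples of a band-limited signal vs. two reconstructions of them.
import numpy as np

n = np.arange(16)
samples = np.sin(2 * np.pi * 2 * n / 16)   # 2 cycles over 16 point samples

t = np.linspace(0, 15, 300)                # fine grid between the samples

# Whittaker-Shannon reconstruction: a sum of shifted sincs. Edge effects
# aside, this follows the original smooth sine between the sample points.
sinc_recon = sum(s * np.sinc(t - k) for k, s in enumerate(samples))

# "Square pixel" rendering: every point takes the value of its nearest
# sample. The staircase adds high-frequency content absent from the signal.
zoh_recon = samples[np.clip(np.round(t).astype(int), 0, 15)]
```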

John
 
Humour me a moment: are we talking about reproducing an image that is superior to the source?

In which case, surely we don't need new formats encoded at higher resolution; surely we could just use an increased-capacity system with a higher transfer rate, allowing every image to be duplicated and the decoder at the far end simply interpolating the additional info? Come to think of it, that might be a slightly long-winded way of doing it.....

ad
 
My comments on the images :

I'm still very doubtful of the genuineness of the two images.

The first image has no eyelashes, yet in the second they're well defined.

The only way to prove this is for some trustworthy person to acquire an image with plenty of detail, downsize it to a specified resolution, make that downsized image available to anyone wanting to play, and then compare the processed results with the original high-res image.

Even better: generate a second high-res image offset by a few pixels to simulate movement, downsize that one as well, and split both downsized images into even- and odd-line field frames, producing four downsized interlaced frames from the original two high-res pictures.
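A rough sketch of that test in Python with Pillow/NumPy might look like the following. The reference filename, offset and output size are placeholder assumptions, and note that ImageChops.offset wraps pixels around the edges, so you'd crop a border in practice:

```python
# Proposed test: two high-res "frames" a few pixels apart, each downsized
# and split into even/odd-line field images for others to upscale.
import numpy as np
from PIL import Image, ImageChops

hi = Image.open("reference_hires.png").convert("RGB")  # detailed reference
shifted = ImageChops.offset(hi, 3, 2)    # second frame, offset to fake motion

for tag, img in [("a", hi), ("b", shifted)]:
    lo = np.array(img.resize((720, 576), resample=Image.BICUBIC))
    Image.fromarray(lo[0::2]).save(f"field_{tag}_even.png")  # lines 0,2,4,...
    Image.fromarray(lo[1::2]).save(f"field_{tag}_odd.png")   # lines 1,3,5,...
# Anyone's scaler can then be judged against the original high-res image.
```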

Anyone want to play? :)


Richard
 
If, say, the 10 frames consist of a zoom out from the girl, you'll effectively be able to remap the visible detail in the closer-in frames and translate it into a bigger image of greater resolution, but you'll still be limited to the max resolution available from the closer-in image.

John makes some interesting comments. It's a bit of a compromise whichever way you go: stay native with a 1:1 pixel match to a video-res panel and be limited to a generally smaller image size, or scale to a larger-than-video-res panel and introduce scaling artifacts but have less pixel structure for a bigger image.

I guess what we need is fuzzy pixels: maybe Panasonic have the right idea with the AE300.

Really what we need is HD or better.
 
It's all very well being able to interpolate detail from progressive information over 10 frames or so, but it'd still leave you with the problem that when a scene change occurs the image would initially appear fuzzy and gain clarity over half a second. The same goes as items move into the scene or are uncovered by moving objects.

It'd be like watching a very compressed MPEG-1 movie, just on a finer scale.


Richard
 
I think it's already been mentioned that reconstructing resolution in this way from a series of images is of limited practical use from the point of view of exhibiting video material in the way that most of us will encounter it.
 
Originally posted by fraggle
My comments on the images :

I'm still very doubtful of the genuineness of the two images.

The first image has no eyelashes, yet in the second they're well defined.

Richard

Richard, the images are genuine, but much of the detail you see in the processed image is not available in the original; it has been derived from a series of images (10 progressive frames in this case).

I made a mistake in allowing those images to be posted without providing a thorough explanation to go with them; I tried to make amends further up this thread.

The system used to process the image is very limited in its application in its present form (it is only a prototype research project), but my idea was to show that under certain circumstances it is possible to pull more real (or very close to real) detail from images. In this case, where a particular set of criteria are satisfied, the results are as per the image posted by Gordon.

No single algorithm I know of can, on its own, upscale and enhance video or film such that detail is recovered or simulated very accurately. This challenging process requires a set of algorithms designed to cope with the many different features of images, moving and otherwise.

It is important for the resulting processed images to be consistent in quality, and processes, where they fail or reach their limit, must fall back gracefully to other processing systems or at least try to mask their inadequacies.

There are algorithms available, and many in development, that can upscale and enhance video based on what they can reliably know about a pixel or group of pixels, such as its motion, transparency, background or foreground status, type of occlusion, whether it's an edge or a texture, etc. Knowing this sort of information with a high degree of reliability and consistency will lead to upconverted images with a high degree of additional detail, or at least a useful simulation of added detail with sharp, consistent quality.

I think many on this forum have heard of Teranex and their high-performance upconverters. Teranex use a selection of sophisticated algorithms for pixel analysis over several frames/fields, and an amazing amount of processing power, to achieve near-high-definition results.

Apologies again for misleading people by asking Gordon to post those two images slightly out of context.
 
This topic seems to me linked to another discussion going on in the Plasma forum, which is of concern to me.

I think the simple question people would like answered is: I have a Denon 3800 progressive-scan DVD player and most of my discs are R2, so am I better off choosing a standard-definition plasma and having my images downscaled, or a high-definition one and having everything upscaled?

I feel that the latter should be the better, but most seem to think the other way. Any comments from you learned gents?

cheers
 
I would say at normal plasma sizes (i.e. 42") you'd be better off staying with a native-res panel if all you are feeding it is SD material. At those sorts of sizes all scaling will do is soften the image and introduce artifacts.

Comments?
 
