Someone must be doing this for animation, right? I want to watch all the animated classics at full frame rate. Purists may gasp but I think that it won't be long before these models can do a lot better than mere interpolation, preserving the original feel of sequences despite the increased frame rate and avoiding the "soap opera" effect.
As someone who already interpolates media with SVP4, I remain unconvinced by those arguments. My media looks amazing running at 120 Hz, and I don't get headaches from jarring panning shots that look closer to a slideshow than a movie.
I recently watched Avatar 2 in IMAX 3D at 48 FPS, and it looked incredible. If only all media in the future could let go of the arbitrary 24 fps limit, which was set a hundred years ago because filmmakers wanted to conserve film stock.
Avatar 2 was shot for 48 fps. It’s very different to interpolating animation to a higher frame rate.
Most animation is shot for 24 fps on twos, so roughly 12 unique frames per second (we usually range between 8 fps and 24 fps).
The difference is that we design poses to read at a given frame rate and have performance characteristics at those rates. If you were to interpolate it, you’d change a fundamental characteristic of the acting.
How long a ball stays in contact with a surface is important, and interpolation struggles to distinguish something that should stick to a surface for an extra frame from something that should be blended.
Yes, you’ll potentially get less strobing, but you’re also now getting a different performance. Maybe you don’t care, but it’s fundamentally changing things away from the intention.
This is a different discussion than just simply what frame rate video is shot at.
Yeah you're right, I don't actually care about the difference in performance; honestly speaking, I can't tell it apart. I just notice the extra fluidity, which I'll continue to use.
That’s fair, and that’s always a viewer’s choice to make, especially with home viewings.
> I remain unconvinced by those arguments
Though hopefully that goes some way toward convincing you that the arguments are legitimate. But personal subjective preference can override the intention of the author.
After giving it a day, I think I understand the problem. That extra fluidity isn't something most animators, or most audiences, expect to be necessary. It might belong in the realm of assistive technologies, much like subtitles fundamentally do.
Also consider that live-action video and CG are "shot" at 24 fps, but 2D animation isn't shot, it's drawn by humans. The animator decides what each in-between frame should be.
I studied film and worked as a DOP. There are some aesthetic aspects to frame rates many people don't consider.
With lower frame rates (24/25) you can expose individual frames longer, which leads to more motion blur for movement relative to the camera. Motion blur can be a deliberate choice, or a necessity: with a higher frame rate the exposure time per frame drops, which means you need more light per frame, which means you either need more budget to light things or you need to push the sensor/film more, which typically degrades color and noise performance. Nowadays this is not a big issue, but it can still be relevant if you e.g. do night scenes with available light or similar. Of course, having double the images also affects the budget in other ways when it comes to storage, post production, offloading times, battery consumption, how well you have to perform your practical effects and so on.
If budget is not an issue, high frame rates change how a film or a scene feels as well. Things feel more real. Shorter exposure times/smaller shutter angles are traditionally used to make things feel more stressful. So having a higher frame rate can be a good thing, but it can also be a bad thing, and it all depends on the story you are telling. For example, a war-zone action shocker with shaky immersive handheld camera might profit from high frame rates as it feels more real (and thus more shocking), while a fantasy story like The Hobbit might feel a bit too real and thus uncanny using the same technique.
Art is very often about creating an aesthetic distance from reality. That is why most masterful paintings don't look like photographs, why black and white photographs can aestheticise their motif, why directors never show people going to the shitter in certain genres, etc.
IMO high frame rates in film are heavily influenced by the narrative aesthetics pioneered by video games. Many modern films have a video-gamey feel to them as well, in the way they show motion, physics and the camera angles/motions that are chosen.
So frame rate in film is an aesthetic choice, and like all choices it can be made wrong, especially since films are made four times: once on paper, once on set, once in the editing room and once in the head of the viewer. Maybe what seemed like a good choice on day one on the set turns out to be a bad one on the last day. Maybe on set everything seemed like the right choice, but in the editing room the focus of the story shifts and you regret not shooting it differently.
I wonder when we will see variable frame rate films, where filmmakers can use the frame rate as an aesthetic choice on a per-scene basis.
The way it was done in Avatar 2 was pretty bad, in my experience. The transitions between framerates were just jarring, since they seem to have used them even mid-scene. I'd much rather they had just kept 48 FPS throughout the movie.
Try doing that with a lot of parallax movement and DoF and it will look like a fest of weird blocky artifacts, because different depths of the image move at different speeds, which means you would need to blur them in the direction of the movement at different depths. Moving objects can become semi-transparent at the edges, which means generating that involves knowing (or guessing) what is behind objects. This is non-trivial and very hard to solve in a general way.
The current best way to do this that I know of involves a ton of manual masking of elements at different z depths and the results will look still worse than if you had done it in camera from the start.
Nobody wants to have to do that to 40 minutes of a 90 minute movie unless there is a really good reason to do it.
I've used it for anime and other types of animation (such as CGI-based media like the show Arcane); both worked fine in my view, and I couldn't tell how the performance was impacted.
... at the end of the day, an HN comment might not be the place to explain the theory and history of animation from the late 18th to the early 21st century; it just doesn't seem to make sense to me to do so. Maybe some of [0] or [1] explains why frame interpolation for animation is a bad idea (even for some CGI animation); if they don't, I just don't have words for that right now.
Even at native 24Hz, 24Hz motion looks so stuttery. Hell, even Avatar 2 at 48fps looked stuttery during action sequences.
If motion is too fast for a given framerate, my mind immediately stops interpreting it and I stop being able to focus. I absolutely hate 24fps content with pans, because the moment there's a pan, I just see a blurry mess, and it takes several seconds to readjust after the pan is over. It's annoying, it's distracting, and I hate it.
> Hell, even Avatar 2 at 48fps looked stuttery during action sequences.
Likely due to the variable frame rate; I noticed this too. If they had kept it at 48 FPS throughout, it probably wouldn't have been so stuttery, but they decided to drop to 24 FPS seemingly at random within the same scene, which was almost unwatchable.
No, I'm talking about when a panning shot comes up, the video feels as if it's a slideshow due to how few frames there are with how much motion is occurring. Compare it to a video game where running at 120 FPS and moving the camera around looks much clearer than doing the same thing at 30 FPS or fewer. Note that I'm not even talking about input lag (aka the game feeling better at higher FPS), I'm talking only about how it visually looks on screen. I much prefer the higher FPS.
> Frame timing is an integral, creative aspect of the medium
> not something to be fixed.
The first does not imply the second in every case. There are many, many instances where a creator with unlimited resources would have chosen to draw more frames. And now we can. We don't have to indiscriminately interpolate everything to 120 Hz, but I think it's clearly possible to make huge improvements to a whole lot of hand-drawn animation.
Sure, maybe a creator can utilize AI interpolation as a tool. I understood your comment to be about applying AI interpolation to works already completed.
It is also true that a lot of the time, the "low" framerate in animation is due to budget. But it is then treated as an artistic constraint and therefore becomes part of the final art.
I think having good interpolation technology *in the hands of the artists* is awesome: You can make buttery smooth animation for a much lower cost when you want. But you get to keep your full creative control over the timing and framing whenever you want it.
I agree that it is part of the creative process right now. But it is nothing but an artifact of its time, when it would be exorbitantly expensive to create 60 animation frames per second. Just like 30fps video and black-and-white television and games back in the day.
60 FPS animation as the default is superior to the choppiness we have nowadays. But the industry has to adjust its process, and upscaled animation may never look right.
The "soap opera" effect is mainly a perceptual response to the high frame rate itself, even if there are no interpolation artifacts. That's why it has its name: soap operas are typically filmed at 60fps, with cameras that capture one full frame every 1/60th of a second.
Why would there be a lack of blur? The 24fps (say) footage will start with more blur, since the shutter speed is slower, and this blur will remain when interpolated to 60fps. If anything, it would have a physically impossible amount of blur for 60fps after interpolation.
Interpolation doesn’t change shutter angle which is what causes motion blur to have a specific aesthetic look, so when interpolated it looks wrong for the frame rate.
You’re right in that it doesn’t change the amount of blur, I’m not sure why the person you’re replying to thinks that. But it does change the aesthetic of the blur.
Perhaps they’re thinking of the issue with The Hobbit, where the higher frame rate at a given shutter angle (180, let’s say) shortened the blur compared to a traditional 24 fps film.
They adjusted this for the second Hobbit film to leave the shutter open longer (not a 360 shutter, but something like a 270) to bring back some of that feeling.
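The arithmetic behind those shutter-angle tradeoffs is simple; a quick sketch (the function name is mine, not from the thread):

```python
def exposure_time(shutter_angle_deg, fps):
    """Seconds of exposure per frame: the shutter is open for
    shutter_angle/360 of each frame interval."""
    return (shutter_angle_deg / 360.0) / fps

print(exposure_time(180, 24))  # 1/48 s ~ 0.0208: the classic film look
print(exposure_time(180, 48))  # 1/96 s ~ 0.0104: half the blur (first Hobbit)
print(exposure_time(270, 48))  # 1/64 s ~ 0.0156: closer to the 24 fps feel
```

So opening the shutter to 270 at 48 fps recovers a good chunk of the per-frame blur that a 180 shutter at 24 fps gives.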
Not quite! They capture one full frame every 1/30th of a second in the US and Japan, and 1/25th everywhere else.
They capture one *field* every 1/50th of a second, and interlace it.
The reason for this is that it was hard to scan a full 625 lines in 1/50th of a second, and it would have required ludicrous bandwidth for the AM transmitters of the day and massive output valves to drive the scan coils in TV sets, which would then have produced huge flyback spikes.
So the decision was to scan 312.5 lines per field, offset slightly, with each pair of fields forming one 1/25th of a second frame. Telecine was easy in the UK because you could speed the film up a little and then each TV frame would be two fields per film frame.
If you shoot 50i you can deinterlace to 25p by combining the two sets of scan lines from each field with a bit of clever processing to hide the "combing" caused by things moving between fields, or you can double it up to 50p by just copying each line out twice with a bit of clever processing to smooth out the steps. I could post some video clips showing this if you were interested.
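A toy sketch of those two routes in Python (my own illustration; a field is just a list of scan lines, and the "clever processing" is left out):

```python
def weave(field_a, field_b):
    """Deinterlace 50i -> 25p: interleave two half-height fields into one frame."""
    frame = []
    for line_a, line_b in zip(field_a, field_b):
        frame.append(line_a)   # scan line from the first field
        frame.append(line_b)   # offset scan line from the second field
    return frame

def bob(field):
    """Deinterlace 50i -> 50p: line-double a single field into a full frame."""
    frame = []
    for line in field:
        frame.append(line)
        frame.append(line)     # copy each line out twice
    return frame

# Two 2-line fields weave into one 4-line frame; one field bobs to 4 lines too.
top = ["A1", "A2"]
bottom = ["B1", "B2"]
print(weave(top, bottom))  # ['A1', 'B1', 'A2', 'B2']
print(bob(top))            # ['A1', 'A1', 'A2', 'A2']
```

Real deinterlacers then fix up the combing (weave) or staircase steps (bob) that these naive versions leave behind when things move between fields.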
Of course modern digital TV cameras are capable of shooting 60p as you say, but anyone shooting "telenovela" stuff is going to be shooting 60i.
It gets worse, because for NTSC colour they changed it to 59.94Hz so that the colour subcarrier could sit at an odd multiple of half the line rate and still fit into the 4MHz-wide channels.
This is why all your video editing software has got those ballachey 59.94/29.97 modes.
It gets worse, because for telecine you need to spread every four frames of 24fps film across five frames of video (the 3:2 pulldown puts each film frame on alternately two and three fields), so you need to slow your film down to 23.976fps so it fits.
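Those 1000/1001 rates fall out of simple arithmetic; a quick check in Python (my own sketch, using exact fractions):

```python
from fractions import Fraction

# NTSC colour dropped the 60 Hz field rate by a factor of 1000/1001
field_rate = Fraction(60) * Fraction(1000, 1001)   # ~59.94 fields/s
frame_rate = field_rate / 2                        # two fields per frame
print(float(frame_rate))                           # 29.97002997...

# 3:2 pulldown spreads 4 film frames over 5 video frames,
# so the film must run at 4/5 of the video frame rate
film_rate = frame_rate * Fraction(4, 5)
print(float(film_rate))                            # 23.976023976...
```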
> The "soap opera" effect is mainly a perceptual response to the high frame rate itself, even if there are no interpolation artifacts.
also, I think the 3:2 pulldown (24->29.97fps) and interlacing that typically cause jitter/blurring are absent at framerates that are even multiples (30->30 or 30->60)
A similar thing happened with The Hobbit, iirc. It was shot at 48fps and everyone said the CGI was too obvious; I think we’ve just learned to associate high frame rates with computer games.
My big problem with the high frame rate in The Hobbit was the lack of camera stability -- I think whatever image stabilization solution they used performed very poorly for that film. Not sure what went wrong, but I always hypothesized it was related to the higher capture rate.
It felt like the camera was always bouncing and moving around randomly in a way I found very displeasing...
Sure, but most hand drawn animation is done at 12 FPS or lower. Even just upsampling from 12 to 24 would be awesome and wouldn't get into the territory where the soap opera effect is caused by the framerate itself.
This would change the acting choices the animator made however.
To take my response from elsewhere: consider a single drawn impact frame that is held for 2 or more frames. That timing is integral to the performance.
When you interpolate, it no longer holds the right amount so the performance changes.
People may scoff at that, but it’s a huge deal for artists. It was hell when we had dailies and motion smoothing on one of the TVs turned on (if we weren’t using screening rooms). Suddenly all the notes were different and acting didn’t feel right.
Adding frames isn’t simply an act of interpolation when it comes to animation. You’d need to understand acting choices as well.
The timing of motion and the importance of how long a frame is held for is important to how characters are portrayed and the impact of actions.
Take a punch or a bounce. The time of contact may be one drawn frame, but it may be on screen longer to sell it. What should interpolation do here? If it smooths everything but the impact, the impact will feel too sticky and lose its force. If it smooths the impact as well, it goes by too quickly and feels too weak.
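A toy numeric illustration of that dilemma (entirely my own sketch, not from the thread): naive linear in-betweening applied around a deliberately held contact frame.

```python
def interpolate_2x(frames):
    """Naive linear in-betweening: insert the midpoint between each pair of frames."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append((a + b) / 2)   # the inserted in-between frame
    out.append(frames[-1])
    return out

# Ball height: falls, sticks to the ground for two frames, rebounds.
held = [3.0, 1.0, 0.0, 0.0, 1.0, 3.0]
print(interpolate_2x(held))
# -> [3.0, 2.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.5, 1.0, 2.0, 3.0]
# At double rate the two-frame hold (2/24 s) becomes only three frames
# (3/48 s), and the 0.5 in-betweens soften its edges: the contact reads
# shorter and mushier than the animator intended.
```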
Animation is full of these choices, and it’s a common problem in video games, for example, where animators animate at a given frame rate but games can sample their animation at variable rates, causing unwanted motion artifacts or ruining the performance.
Just for fun, I used ffmpeg's `minterpolate` (https://ffmpeg.org/ffmpeg-filters.html#minterpolate) feature to raise the framerate of my Mad Max 2 Road Warrior DVD from 24 to 100 fps. The results were interesting.
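For reference, the invocation was along these lines (filenames are placeholders, and the filter options are from the ffmpeg minterpolate docs; treat the exact flags as a sketch, not the command I actually ran):

```shell
# Motion-compensated interpolation (mci) to 100 fps; audio copied untouched.
ffmpeg -i road_warrior.mkv \
  -vf "minterpolate=fps=100:mi_mode=mci:mc_mode=aobmc:me_mode=bidir" \
  -c:a copy road_warrior_100fps.mkv
```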
If you pause the video and step through one frame at a time, it's very clear which frames are from the original film and which frames were conjured by the algorithm. There's no AI involved, so the algorithm doesn't really understand background from foreground, and objects in the artificial frames are distorted and unnatural. But at 100 fps you don't notice this, or at least I didn't, and motion seemed very smooth and fluid.
The soap opera effect definitely held. It didn't feel like watching a classic movie from 1982, it felt much more like watching a contemporary reality TV program. There was definitely a sense of realism, but the realism was not that you're witnessing an actual post-apocalyptic society clinging to survival. It was realistic in the sense that you felt like you're on a film set, and if you could just peek outside the frame you'd see George Miller and the camera and microphone operators with the lighting and rigging and make-up and costume departments hanging around the catering truck.
A lot of people felt that HFR ruined Peter Jackson's The Hobbit series (or was one of the many contributing factors) because it broke the pretense of fantasy. In my experience raising the frame rate did the same for The Road Warrior. Those outlandish costumes looked even sillier at 100 fps, but man those stunts are incredible! You do get a visceral sense of just how dangerous a lot of those stunts are. It's a miracle that nobody was killed.
Yeah, even in the example from the README, something seems a bit off, although I can't put my finger on it. But I wonder how the animation would look if the subject turns its head...
One of the immediately noticeable uncanny-valley effects is that the eyes appear to pan up, which never occurs in real life -- the eye jerks to a new direction in a step movement, called a saccade.
From an ML perspective, I would have assumed they would train on high frame rate video, but it seems they didn't, they're estimating motion of pixel clumps. I don't see how it can ever do things like eyes/smiles/teeth. Useful for reconstructed pseudo slideshows in Google Photos with added vampire effect.
nvfruc is extremely limited in its capability, basically small motion interpolation between two frames within a very short interval. FILM (and other deep learning based interpolation methods) is capable of interpolating two frames with large motion differences at a significantly larger computational cost. I don't understand why you're trying to imply nvfruc could be better at both cost and quality? Even Nvidia won't make such bold claims.
While I'm really impressed by the work in this paper (and codebase), I cannot stress enough that using minors as examples is definitely not a good idea, on so many levels!
I appreciate that "I cannot stress enough" is somewhat idiomatic; however, it strikes me as odd to announce this but then not offer at least some of the specific reasons you believe it.
There are aspects of copyright law and ownership of that image (this is work done inside Google) over which a minor, who will eventually be an adult, has no control.
Additionally, images of minors can be abused (I'm leaving it at that, but you can assume that some AI breakthrough in generative imagery can be abused).
Yup. There's another framework called 'Feature-wise Linear Modulation' (FiLM), also related to video but otherwise unrelated (note the lowercase 'i'); and thanks to Hasbro, Neural Radiance Fields (NeRF) is surprisingly difficult to search for (but Hasbro was first, so NeRF was not a great choice).
I've seen 2-3 other examples of insane name duplication in CV research in the last couple of months.
It'll never happen. Best to qualify when searching ("FILM interpolation") or expand when introducing it ("Frame Interpolation for Large Motion"), as was done here. That way you get the best of both worlds: a friendly acronym for active use and an extremely referenceable name for long-term or first-time use. You'd probably need the long form without common abbreviations anyway; weird acronyms chosen just to be unique don't help when you're actively using the name now, or a decade later when you're trying to reference back to it and nobody knows what gLMFI1 was anyway -- unless it becomes extremely popular, in which case gLMFI1 is a pain to use all the time.
They give the full proper citation for referencing the work:
@inproceedings{reda2022film,
title = {FILM: Frame Interpolation for Large Motion},
author = {Fitsum Reda and Janne Kontkanen and Eric Tabellion and Deqing Sun and Caroline Pantofaru and Brian Curless},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2022}
}