FILM: Frame Interpolation for Large Motion (github.com/google-research)
99 points by tosh on Dec 20, 2022 | hide | past | favorite | 67 comments


Someone must be doing this for animation, right? I want to watch all the animated classics at full frame rate. Purists may gasp but I think that it won't be long before these models can do a lot better than mere interpolation, preserving the original feel of sequences despite the increased frame rate and avoiding the "soap opera" effect.


Frame timing is an integral, creative aspect of the medium, not something to be fixed.

1. https://www.idtech.com/blog/what-does-animating-on-ones-twos...

2. https://www.awn.com/tooninstitute/lessonplan/timing.htm#:~:t....

3. Search for "A good example would be this cut, and especially the first four shots." in https://animetudes.com/2021/03/06/the-kanada-style-in-contex...


As someone who already interpolates media with SVP4, I remain unconvinced by those arguments. My media looks amazing running at 120 Hz, and I don't get headaches from jarring panning shots that look closer to a slideshow than a movie.

I recently watched Avatar 2 in IMAX 3D at 48 FPS, and it looked incredible. If only all media in the future could let go of the arbitrary 24 fps limit, which was set a hundred years ago because filmmakers wanted to conserve film stock.


Avatar 2 was shot for 48 fps. It’s very different to interpolating animation to a higher frame rate.

Most animation is shot for 24 fps on twos, so effectively 12ish frames per second (we usually go between 8 fps and 24 fps).

The difference is that we design poses to read at a given frame rate and have performance characteristics at those rates. If you were to interpolate it, you’d change a fundamental characteristic of the acting.

How long a ball makes contact with a surface is important and interpolation struggles to understand something that should stick to a surface for an extra frame versus something that should be blended.

Yes you’ll potentially get lower strobing but you’re also now getting a different performance. Maybe you don’t care, but it’s fundamentally changing things away from the intention.

This is a different discussion than just simply what frame rate video is shot at.


Yeah, you're right. I don't actually care about the difference in performance; honestly speaking, I can't tell it apart. I just notice the extra fluidity, which I'll continue to use.


That’s fair, and that’s always a viewer’s choice to make, especially with home viewings.

> I remain unconvinced by those arguments

Though hopefully that goes some of the way toward convincing you that the arguments are legitimate. But personal subjective preference can override the intention of the author.


After giving it a day, I think I understand the problem. That extra fluidity isn't considered necessary by most animators, or by audiences. This might be the realm of assistive technologies, like subtitles fundamentally are.


Also consider that real video and CG are "shot" at 24 fps, but 2D animation isn't shot; it's drawn by a human. The animator decides what picture each in-between frame should be.


I studied film and worked as a DOP. There are some aesthetic aspects to frame rates many people don't consider.

With lower frame rates (24/25) you can expose individual frames longer, which leads to more motion blur with movement relative to the camera. Motion blur can be a deliberate choice, or a necessity: with higher frame rates the exposure time per frame drops, which means you need more light per frame, which means you either need more budget to light things or you need to push the sensor/film more, which typically degrades color and noise performance. Nowadays this is not a big issue, but it can still be relevant if you e.g. do night scenes with available light or similar. Of course, having double the images also affects the budget in other ways when it comes to storage, post production, offloading times, battery consumption, how well you have to perform your practical effects, and so on.
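As a rough sketch of that exposure trade-off (a toy model of my own, not the commenter's numbers): with a rotary shutter, exposure per frame is the shutter angle over 360, divided by the frame rate, so doubling the frame rate at the same shutter angle costs one full stop of light.

```python
def exposure_seconds(fps, shutter_angle=180.0):
    # exposure time per frame for a simple rotary-shutter model
    return (shutter_angle / 360.0) / fps

# 24 fps with a classic 180-degree shutter exposes each frame for 1/48 s
assert abs(exposure_seconds(24) - 1/48) < 1e-12
# doubling to 48 fps at the same shutter angle halves that to 1/96 s,
# i.e. one stop less light per frame
assert abs(exposure_seconds(48) - 1/96) < 1e-12
# opening the shutter wider (e.g. ~270 degrees) at 48 fps recovers
# some exposure: 1/64 s per frame
assert abs(exposure_seconds(48, 270) - 1/64) < 1e-12
```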

If budget is not an issue, high frame rates change how a film or a scene feels as well. Things feel more real. Shorter exposure times/smaller shutter angles are traditionally used to make things feel more stressful. So having a higher frame rate can be a good thing, but it can also be a bad thing, and it all depends on the story you are telling. For example, a war-zone action shocker with a shaky, immersive handheld camera might profit from high frame rates as it feels more real (and thus more shocking), while a fantasy story like The Hobbit might feel a bit too real, and thus uncanny, using the same technique.

Art is very often about creating an aesthetic distance from reality. That is why most masterful paintings don't look like photographs, why black-and-white photographs can aestheticize their subject, why directors never show people going to the shitter in certain genres, etc.

IMO high frame rates in film are heavily influenced by the narrative aesthetics pioneered by video games. Many modern films have a video-gamey feel to them as well, in the way they show motion, physics and the camera angles/motions that are chosen.

So frame rate in film is an aesthetic choice, and like all choices it can be made wrong, especially since films are made four times: once on paper, once on set, once in the editing room, and once in the head of the viewer. Maybe what seemed like a good choice on day one on the set turns out to be a bad one on the last day. Maybe on set everything seemed like the right choice, but in the editing room the focus of the story shifts and you regret not shooting it differently.

I wonder when we will see variable frame rate films, where filmmakers can use the frame rate as an aesthetic choice on a per-scene basis.


> I wonder when we will see variable frame rate films, where filmmakers can use the frame rate as an aesthetic choice on a per-scene basis.

There’s this. It looks like they used it in Avatar 2 actually. Not sure how widespread it is otherwise. https://www.pixelworks.com/en/truecut


The way it was done in Avatar 2 was pretty bad, in my experience. The transitions between frame rates were just jarring, since they seem to have used it even mid-scene. I'd much rather they had just kept 48 FPS throughout the movie.


You can always add blur across frames if it fits better artistically.


Try doing that with a lot of parallax movement and DoF and it will look like a fest of weird blocky artifacts, because different depths of the image move at different speeds, which means you would need to blur them in the direction of the movement at different depths. Moving objects can become semi-transparent at the edges, which means generating that involves knowing (or guessing) what is behind objects. This is non-trivial and very hard to solve in a general way.

The current best way to do this that I know of involves a ton of manual masking of elements at different z depths and the results will look still worse than if you had done it in camera from the start.

Nobody wants to have to do that to 40 minutes of a 90 minute movie unless there is a really good reason to do it.


That's surprisingly hard to do well, and is a very very very slow process.


Yes, increasing costs (artistic and technical) even more in post production.


It should be okay for video. The parent comment is about animation. The distinction is whether the individual in-between frames are shot or drawn.


I've used it for anime and other types of animation (such as CGI based media like the show Arcane), both worked fine in my view, couldn't tell how the performance was impacted.


... At the end of the day, an HN comment might not be the place to explain the theory and history of animation from the late 18th to the early 21st century; it just doesn't seem to make sense to me to do so. Maybe some of [0] or [1] explains why frame interpolation for animation is a bad idea (even for some CGI animation); if they don't, I just don't have words for that right now.

0: http://gurneyjourney.blogspot.com/2014/07/elongated-in-betwe...

1: https://twitter.com/animepiic/status/1562932959472238592


>slideshow

Are you talking about inconsistent frame timing aka judder? 24 divides evenly into 120, so you shouldn’t have that problem on a 120hz tv anyway.


Even at native 24Hz, 24Hz motion looks so stuttery. Hell, even Avatar 2 at 48fps looked stuttery during action sequences.

If motion is too fast for a given framerate, my mind immediately stops interpreting it and I stop being able to focus. I absolutely hate 24fps content with pans, because the moment there's a pan, I just see a blurry mess and it takes several seconds to readjust after the pan is over. It's annoying, it's distracting, and I hate it.


> Hell, even Avatar 2 at 48fps looked stuttery during action sequences.

Likely due to variable frame rate, I noticed this too. If they kept it at 48 FPS throughout, it probably wouldn't have been so stuttery, but they decided to make it randomly 24 FPS within the same scene which was almost unwatchable.


But that's the interesting part: It shows that we only tolerate 24Hz because we're used to it, not because it's actually good!


Exactly. That's why I want more HFR films and media.


No, I'm talking about when a panning shot comes up, the video feels as if it's a slideshow due to how few frames there are with how much motion is occurring. Compare it to a video game where running at 120 FPS and moving the camera around looks much clearer than doing the same thing at 30 FPS or fewer. Note that I'm not even talking about input lag (aka the game feeling better at higher FPS), I'm talking only about how it visually looks on screen. I much prefer the higher FPS.


> Frame timing is an integral, creative aspect of the medium

> not something to be fixed.

The first does not imply the second in every case. There are many, many instances where a creator with unlimited resources would have chosen to draw more frames. And now we can. We don't have to indiscriminately interpolate everything to 120 Hz, but I think it's clearly possible to make huge improvements to a whole lot of hand-drawn animation.


Sure, maybe a creator can utilize AI interpolation as a tool. I understood your comment to be about applying AI interpolation to works already completed.


Both.


The animator Noodle recently put this video on YouTube that explains why he despises the idea of interpolated video:

https://youtu.be/_KRb_qV9P4g


It is also true that a lot of the time, the "low" framerate in animation is due to budget. But it is then seen as an artistic limitation and therefore is part of the final art.

I think having good interpolation technology *in the hands of the artists* is awesome: You can make buttery smooth animation for a much lower cost when you want. But you get to keep your full creative control over the timing and framing whenever you want it.


I agree that it is part of the creative process right now. But it is nothing but an artifact of its time, when it would be exorbitantly expensive to create 60 animation frames per second. Just like 30fps video and black-and-white television and games back in the day.

60FPS animation as the default is superior to the choppiness we have nowadays. But the industry has to adjust its process, and upscaled animation may never look right.


The "soap opera" effect is mainly a perceptual response to the high frame rate itself, even if there are no interpolation artifacts. That's why it has its name: soap operas are typically filmed at 60 fps, with cameras that capture one full frame every 1/60th of a second.


More specifically, the reaction to lack of blur.

You can get 60 fps footage upconverted via motion interpolation to not exhibit the effect by adding copious Gaussian motion blur.

The same problem (plus perceived judder due to low fps) appears if you play a video on a screen equipped with correctly tuned strobe backlight.

It's one of the reasons as to why simple kernel interpolation (blurmotion) with most kernels does not exhibit this effect either.


Why would there be a lack of blur? The 24fps (say) footage will start with more blur as the shutter speed will be slower and this blur will remain when interpolated to 60fps. It would surely have a physically impossibly high amount of blur after interpolation.


Interpolation doesn’t change shutter angle which is what causes motion blur to have a specific aesthetic look, so when interpolated it looks wrong for the frame rate.

You’re right in that it doesn’t change the amount of blur, I’m not sure why the person you’re replying to thinks that. But it does change the aesthetic of the blur.

Perhaps they’re thinking of the issue with The Hobbit, where the higher frame rate at a given shutter angle (180, let’s say) shortened the blur compared to a traditional 24 fps film.

They adjusted this for the second Hobbit film to leave the shutter open longer (not a 360 shutter, but something like a 270) to bring back some of that feeling.


Not quite! They capture one full frame every 1/30th of a second in the US and Japan, and 1/25th everywhere else.

They capture one *field* every 1/50th of a second, and interlace it.

The reason for this is that it was hard to scan a full 625 lines in 1/50th of a second, and it would have required ludicrous bandwidth for the AM transmitters of the day and massive output valves to drive the scan coils in TV sets, which would then produce huge flyback spikes.

So the decision was to scan 312.5 lines per field, offset slightly, with each pair of fields forming one 1/25th of a second frame. Telecine was easy in the UK because you could speed the film up a little and then each TV frame would be two fields per film frame.

If you shoot 50i you can deinterlace to 25p by combining the two sets of scan lines from each field with a bit of clever processing to hide the "combing" caused by things moving between fields, or you can double it up to 50p by just copying each line out twice with a bit of clever processing to smooth out the steps. I could post some video clips showing this if you were interested.
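The two deinterlacing approaches described above can be sketched in a few lines (a toy model of my own, treating a field as a list of scan lines, without the "clever processing" that real deinterlacers add):

```python
def weave(field_top, field_bottom):
    # 50i -> 25p: interleave the two fields into one full-height frame.
    # Static shots look fine; anything that moved in the 1/50 s between
    # field captures shows "combing" that real deinterlacers must hide.
    frame = []
    for a, b in zip(field_top, field_bottom):
        frame.append(a)
        frame.append(b)
    return frame

def bob(field):
    # 50i -> 50p: line-double a single field to full height.
    # Each line is simply repeated; real deinterlacers smooth the steps.
    return [line for line in field for _ in (0, 1)]

top = ["t0", "t1", "t2"]
bottom = ["b0", "b1", "b2"]
assert weave(top, bottom) == ["t0", "b0", "t1", "b1", "t2", "b2"]
assert bob(top) == ["t0", "t0", "t1", "t1", "t2", "t2"]
```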

Of course modern digital TV cameras are capable of shooting 60p as you say, but anyone shooting "telenovela" stuff is going to be shooting 60i.


Thank you!! Our contemporary 60hz world seems to have retconned broadcast video standards.


It gets worse because for NTSC colour they changed it to 59.94Hz so that the colour subcarrier could be an odd multiple of the scan rate and just fit into the 4MHz-wide channels.

This is why all your video editing software has got those ballachey 59.94/29.97 modes.

It gets worse because for telecine you need to show four frames of 24fps film for every five frames of video, so you need to slow your film down to 23.976fps so it fits.

Ballache.
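All of those awkward rates fall out of dividing the old integer rates by 1.001, which is easy to verify (a quick check of my own, using Python's fractions module):

```python
from fractions import Fraction

# NTSC colour dropped the 60 Hz field rate by a factor of 1.001 so the
# colour subcarrier would fit cleanly into the existing channel plan.
field_rate = Fraction(60_000, 1001)   # ~59.94 fields/s
frame_rate = field_rate / 2           # ~29.97 frames/s
telecine   = Fraction(24_000, 1001)   # film slowed to ~23.976 fps

assert abs(float(field_rate) - 59.94) < 0.001
assert abs(float(frame_rate) - 29.97) < 0.001
assert abs(float(telecine) - 23.976) < 0.001

# 2:3 pulldown: 4 slowed film frames become 10 fields,
# which exactly matches the NTSC field rate
assert telecine * Fraction(10, 4) == field_rate
```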


> The "soap opera" effect is mainly a perceptual response to the high frame rate itself, even if there are no interpolation artifacts.

also, i think the 3:2 pulldown (24->29.97fps) and interlacing that typically causes jitter/blurring is absent at framerates that are even multiples (30->30 or 30->60)
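To make the cadence point concrete, here's a toy sketch of my own (not from the thread) that counts how many display refreshes each source frame occupies: at an even multiple every frame is held equally, while 24-in-60 alternates 3 and 2, and that uneven cadence reads as judder.

```python
def repeat_pattern(src_fps, display_hz, n):
    # display-refresh index at which each of the first n+1 source frames lands
    times = [int(i * display_hz / src_fps + 0.5) for i in range(n + 1)]
    # number of refreshes each source frame is held for
    return [b - a for a, b in zip(times, times[1:])]

# 24 fps on a 120 Hz panel: every frame held exactly 5 refreshes -> smooth
assert repeat_pattern(24, 120, 8) == [5] * 8
# 24 fps on a 60 Hz panel: alternating 3 and 2 (3:2 pulldown) -> judder
assert repeat_pattern(24, 60, 8) == [3, 2, 3, 2, 3, 2, 3, 2]
```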


A similar thing happened with The Hobbit, iirc. It was shot in 60fps and everyone said the CGI was too obvious, I think we’ve just learned to associate high frame rates with computer games.


My big problem with the high frame rate in The Hobbit was the lack of camera stability -- I think whatever image stabilization solution they used performed very poorly for that film. Not sure what went wrong, but I always hypothesized it was related to the higher capture rate.

It felt like the camera was always bouncing and moving around randomly in a way I found very displeasing...


The Hobbit was 48 fps (to be a multiple of the more standard 24, letting it also work on 24-only projection systems, I am assuming).


And this was because there were two cameras for the whole 3D craze. One for each eye.

Though I don’t think I saw the 48fps version in cinemas in Australia.


I have to admit, watching it in 48fps (pretty sure it was 48 and not 60) the CGI sequences did look very much like video game cutscenes.


Sure, but most hand drawn animation is done at 12 FPS or lower. Even just upsampling from 12 to 24 would be awesome and wouldn't get into the territory where the soap opera effect is caused by the framerate itself.


This would change the acting choices the animator made however.

To take my response from elsewhere: consider a single drawn impact frame that is held for 2 or more frames. That timing is integral to the performance.

When you interpolate, it no longer holds the right amount so the performance changes.

People may scoff at that, but it’s a huge deal for artists. It was hell when we had dailies and motion smoothing on one of the TVs turned on (if we weren’t using screening rooms). Suddenly all the notes were different and acting didn’t feel right.


Adding frames isn’t simply an act of interpolation when it comes to animation. You’d need to understand acting choices as well.

The timing of motion and the importance of how long a frame is held for is important to how characters are portrayed and the impact of actions.

Take a punch or a bounce. The time of contact may be one drawn frame, but it may be on screen longer to sell it. What should interpolation do here? If it smooths everything but the impact, then the impact will feel too sticky and lose its force. Similarly, if it smooths the impact as well, then it goes by too quickly and feels too weak.

Animation is full of these choices, and it’s a common problem in video games, for example, where animators animate at a given frame rate but games can run at variable sampling rates along their animation, causing unwanted motion artifacts or ruining the performance.
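A toy illustration of this (my own sketch, deliberately simplified to a 1-D "pose" value per frame): naive midpoint interpolation has no way to know that a repeated drawing is an intentional hold, so the screen time around the contact changes.

```python
def interpolate_2x(frames):
    # naive 2x temporal upsampling: insert the midpoint between neighbours
    out = []
    for a, b in zip(frames, frames[1:]):
        out += [a, (a + b) / 2]
    out.append(frames[-1])
    return out

# approach, impact held for two drawn frames, rebound
drawn = [3, 1, 0, 0, 1, 3]
smooth = interpolate_2x(drawn)
# The two-frame hold at contact survives as three half-duration frames,
# and the frames around it are now blends (0.5), so the impact occupies
# a different fraction of screen time than the animator chose.
assert smooth == [3, 2, 1, 0.5, 0, 0, 0, 0.5, 1, 2, 3]
```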


Just for fun, I used ffmpeg's `minterpolate` (https://ffmpeg.org/ffmpeg-filters.html#minterpolate) feature to raise the framerate of my Mad Max 2 Road Warrior DVD from 24 to 100 fps. The results were interesting.
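For anyone wanting to try the same thing, an invocation along those lines can be assembled as below (the filenames are placeholders and the mode flags are my choice among ffmpeg's documented minterpolate options, not necessarily the commenter's exact command):

```python
import shlex

def minterpolate_cmd(src, dst, fps):
    # mi_mode=mci selects motion-compensated interpolation;
    # mc_mode=obmc (overlapped block motion compensation) and
    # me_mode=bidir (bidirectional motion estimation) are documented
    # minterpolate options in the ffmpeg filters manual
    vf = f"minterpolate=fps={fps}:mi_mode=mci:mc_mode=obmc:me_mode=bidir"
    return shlex.join(["ffmpeg", "-i", src, "-vf", vf, dst])

cmd = minterpolate_cmd("input.mkv", "output_100fps.mkv", 100)
assert cmd.startswith("ffmpeg -i input.mkv")
assert "minterpolate=fps=100:mi_mode=mci" in cmd
```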

If you pause the video and step through one frame at a time, it's very clear which frames are from the original film and which frames were conjured by the algorithm. There's no AI involved, so the algorithm doesn't really understand background from foreground, and objects in the artificial frames are distorted and unnatural. But at 100 fps you don't notice this, or at least I didn't, and motion seemed very smooth and fluid.

The soap opera effect definitely held. It didn't feel like watching a classic movie from 1982, it felt much more like watching a contemporary reality TV program. There was definitely a sense of realism, but the realism was not that you're witnessing an actual post-apocalyptic society clinging to survival. It was realistic in the sense that you felt like you're on a film set, and if you could just peek outside the frame you'd see George Miller and the camera and microphone operators with the lighting and rigging and make-up and costume departments hanging around the catering truck.

A lot of people felt that HFR ruined Peter Jackson's The Hobbit series (or was one of the many contributing factors) because it broke the pretense of fantasy. In my experience raising the frame rate did the same for The Road Warrior. Those outlandish costumes looked even sillier at 100 fps, but man those stunts are incredible! You do get a visceral sense of just how dangerous a lot of those stunts are. It's a miracle that nobody was killed.



Yikes the teeth just materialize like in some nightmare!

https://film-net.github.io/static/images/000204/interpolated...


Yeah, even in the example from the README, something seems a bit off, although I can't put my finger on it. But I wonder how the animation would look if the subject turns its head...


None of the demos are properly working for me. Am I doing something wrong, like the photos are too large?


One of the immediately noticeable uncanny-valley effects is that the eyes appear to pan up, which never occurs in real life -- the eye jerks to a new direction in a step movement, called a saccade.

From an ML perspective, I would have assumed they'd train on high-frame-rate video, but it seems they didn't; they're estimating the motion of pixel clumps. I don't see how it can ever do things like eyes/smiles/teeth. Useful for reconstructed pseudo-slideshows in Google Photos, with added vampire effect.


Very cool! How does this compare to NVIDIA nvfruc?

https://docs.nvidia.com/video-technologies/optical-flow-sdk/...


The target application is somewhat different. FILM is more focused on high quality animation generation while nvfruc is for real-time streaming.


Yeah, but this is irrelevant unless you're implying that FILM is better quality while nvfruc is lower quality but faster, and so applicable for realtime?

If nvfruc is better quality and faster, then the comparison is simply that nvfruc is better.


nvfruc is extremely limited in its capability, basically small motion interpolation between two frames within a very short interval. FILM (and other deep learning based interpolation methods) is capable of interpolating two frames with large motion differences at a significantly larger computational cost. I don't understand why you're trying to imply nvfruc could be better at both cost and quality? Even Nvidia won't make such bold claims.


Thanks.


While I'm really impressed by the work in this paper (and codebase), I cannot stress enough that having a minor as an example is definitely not a good idea, on so many levels!


I appreciate that "I cannot stress enough" is somewhat idiomatic; however, it strikes me as odd to announce this but then not offer at least some of the specific reasons you believe it.


There are aspects of copyright law and ownership of that image (this is work done inside Google) over which a minor, who will eventually be an adult, has no control. Additionally, images of minors can be abused (I'll leave it at that, but you can assume that some AI breakthroughs in generative imaging can be abused).


Why is that?


Can we please stop using common names for new technologies? It will make them harder to reference in a decade.


Yup. There's another framework called 'Feature-wise Linear Modulation' (FiLM), also related to vision but otherwise unrelated (note the lowercase 'i'); and thanks to Hasbro, Neural Radiance Fields (NeRF) is surprisingly difficult to search for (but Hasbro was first, so NeRF was not a great choice).

I've seen 2-3 other examples of insane name duplication in CV research in the last couple of months.


FLiM would have been much better, and kind of funny.


It'll never happen. Best to qualify when searching ("FILM interpolation") or expand when introducing ("Frame Interpolation for Large Motion"), as was done here. That way you get the best of both worlds: a friendly acronym for active use and an extremely referenceable name for long-term or first-time use. The long form is probably what you'd need even without common abbreviations; weird acronyms chosen just to be unique don't help when you're actively using it now, or a decade later when you're trying to reference back to it and nobody knows what gLMFI1 was anyway - unless it becomes extremely popular, in which case gLMFI1 is a pain to use all the time.


It’s not a disaster, but it is better to use made-up names, like filmz, filmy, filmerz, filmstack... the list goes on.


They give the full proper citation for referencing the work:

    @inproceedings{reda2022film,
      title     = {FILM: Frame Interpolation for Large Motion},
      author    = {Fitsum Reda and Janne Kontkanen and Eric Tabellion and Deqing Sun and Caroline Pantofaru and Brian Curless},
      booktitle = {European Conference on Computer Vision (ECCV)},
      year      = {2022}
    }



