How I Warped Your Noise

Supplementary Results

More Noise Warping Comparisons

Our method allows warping Gaussian noise with extreme deformations while still preserving its Gaussian properties. This is not achievable with standard warping methods, as we show in the comparison below. In the first row, we compare with standard interpolation methods applied to consecutive frames \((F_{n-1}, F_{n})\). Repeated interpolation accumulates numerical dissipation, which destroys high-frequency details and produces blurring. In the second row, we compute a flow map between the initial frame and the current one, and apply the interpolation methods to the pair \((F_0, F_{n})\). Due to numerical error in the mapping, flickering and incoherence appear in the result. Our \(\smallint\)-noise method outperforms these warping methods: it transports the noise without visible blurring or flickering while keeping its Gaussian properties.
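The dissipation in the first row can be reproduced with a small numerical sketch. The example below is only an illustration, not the experiment shown on this page: it assumes a constant sub-pixel translation with periodic boundaries instead of a real flow field, and the helper name `bilinear_shift` is hypothetical. It applies bilinear interpolation frame after frame and records how the variance of the noise decays.

```python
import numpy as np

def bilinear_shift(img, dx, dy):
    """Translate img by a sub-pixel offset (0 <= dx, dy < 1) using bilinear
    weights. Periodic boundaries are an assumption made for simplicity."""
    right = np.roll(img, -1, axis=1)
    down = np.roll(img, -1, axis=0)
    diag = np.roll(right, -1, axis=0)
    return ((1 - dx) * (1 - dy) * img + dx * (1 - dy) * right
            + (1 - dx) * dy * down + dx * dy * diag)

rng = np.random.default_rng(0)
frame = rng.standard_normal((256, 256))   # unit-variance Gaussian noise

# Warp consecutive frames (F_{n-1} -> F_n) and track the variance.
variances = [frame.var()]
for _ in range(8):
    frame = bilinear_shift(frame, 0.3, 0.3)
    variances.append(frame.var())

print([round(v, 3) for v in variances])
```

Each step averages neighboring samples, so the variance shrinks at every frame and the result drifts away from unit-variance white noise.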


Bilinear

Bicubic

Nearest Neighbor

Root-bilinear*

\(\smallint\)-noise (Ours)


*The root-bilinear interpolation is a simple modification of bilinear interpolation in which we replace the interpolation coefficients by their square roots. When applied to a set of independent Gaussian noise samples, it preserves the unit variance.
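The variance property follows from the fact that, for independent unit-variance samples \(x_i\), the variance of \(\sum_i a_i x_i\) is \(\sum_i a_i^2\). With bilinear weights \(w_i\) this gives \(\sum_i w_i^2 \le 1\), whereas with \(a_i = \sqrt{w_i}\) it gives \(\sum_i w_i = 1\). The following sketch checks this empirically for one interpolation position; the fractional offsets `u`, `v` and the helper `interpolate` are illustrative choices, not taken from the paper.

```python
import numpy as np

def interpolate(corners, weights, root=False):
    """Blend the four corner samples of each cell.
    With root=True, the weights are replaced by their square roots."""
    w = np.sqrt(weights) if root else weights
    return corners @ w

rng = np.random.default_rng(0)
# Each row holds four independent unit-variance Gaussian corner samples.
corners = rng.standard_normal((200_000, 4))

u, v = 0.3, 0.6  # fractional position inside the cell (arbitrary choice)
weights = np.array([(1 - u) * (1 - v), u * (1 - v), (1 - u) * v, u * v])

bilinear_var = interpolate(corners, weights).var()
root_var = interpolate(corners, weights, root=True).var()

print(f"bilinear variance:      {bilinear_var:.3f}")
print(f"root-bilinear variance: {root_var:.3f}")
```

The sketch only checks the per-pixel variance. Output pixels that share input samples are still correlated with each other, so unit variance alone does not make the warped noise independent across pixels.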


Warped Noise in Latent Diffusion

Here are some results of noise warping in latent diffusion models. Please refer to the paper appendix for more details.

No Cross-Frame Attention

Fixed Noise

\(\smallint\)-noise (Ours)

With Cross-Frame Attention

Fixed Noise

\(\smallint\)-noise (Ours)

With Cross-Frame Attention + Feature Injection

Fixed Noise

\(\smallint\)-noise (Ours)

Comparison with DDIM Inversion

DDIM inversion is a popular inversion method that has also been used to obtain more informative noise priors for video editing tasks. However, it suffers from two main problems. First, it only produces one noise map per image. This may not always be compatible with other diffusion-based methods like SDEdit or I²SB, which use DDPM. Second, as it remains primarily an inversion method, the spatial and temporal information of the image is entangled inside the noise.


We compare our method with DDIM inversion in the appearance transfer example with SDEdit. We experiment with two settings:

  • DDIM inversion to intermediate step (first row): similar to how we applied SDEdit above, we use DDIM to invert the synthetic video frames back to an intermediate timestep (60% of total steps), and then denoise it using forward DDIM. As expected, since there is no prompt to be changed or other settings to be modified, it mostly reconstructs the original synthetic video without adding any realistic appearance details.
  • DDIM inversion as initial noise (second row): we run a full DDIM inversion for each frame to obtain a noise map, which we treat similarly to the other noise priors we showed above: we add it to input frames and denoise using forward DDIM. Because the input video is far from the data distribution of the model (trained on realistic images of bedrooms), the DDIM-inverted noise is far from Gaussian. This makes it a poor candidate as a noise prior.

In comparison, our noise prior only contains temporal information, so the model can generate realistic details on top of the synthetic scene without being constrained to reconstruct the input sequence. Furthermore, by applying the same warping to different noise samples (which DDIM inversion cannot do), we can obtain different variations in the final result (third row). Note that for fairness, we also use deterministic DDIM for denoising in our method, i.e., our noise warping method is only used once, for the initial noise.


DDIM inversion to intermediate step


DDIM inversion as initial noise


\(\smallint\)-noise (Ours)


Here is a visual comparison between our noise prior and the one obtained from DDIM inversion. While DDIM inversion produces temporally correlated noise, its distribution heavily depends on how far the input video is from the training distribution of the diffusion model. In contrast, our warping method retains the Gaussian properties of the noise.
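Beyond visual inspection, a few summary statistics can indicate how far a noise map is from white Gaussian noise. The sketch below is one possible set of checks and is not part of the evaluation reported here: the function name `noise_stats` and the choice of statistics (mean, variance, excess kurtosis, horizontal lag-1 correlation) are assumptions made for illustration. It is demonstrated on synthetic inputs only.

```python
import numpy as np

def noise_stats(noise):
    """Summary statistics of a 2D noise map. For white unit-variance
    Gaussian noise, we expect mean ~ 0, variance ~ 1, excess kurtosis ~ 0
    and lag-1 correlation ~ 0."""
    x = np.asarray(noise, dtype=np.float64)
    z = (x - x.mean()) / x.std()
    return {
        "mean": float(x.mean()),
        "var": float(x.var()),
        "excess_kurtosis": float((z ** 4).mean() - 3.0),
        # Correlation between horizontally adjacent pixels.
        "lag1_corr": float((z[:, :-1] * z[:, 1:]).mean()),
    }

rng = np.random.default_rng(0)
source = rng.standard_normal((256, 256))

# Reference: white Gaussian noise.
white = noise_stats(source)
# Counter-example: averaging neighbors lowers the variance and
# introduces spatial correlation.
blurred = noise_stats(0.5 * (source + np.roll(source, 1, axis=1)))

print(white)
print(blurred)
```

These statistics are necessary but not sufficient conditions: a noise map can pass all four checks and still carry structure that the diffusion model picks up.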


DDIM inversion noise

\(\smallint\)-noise (Ours)