Recovering redacted information from pixelated videos

January 25, 2022

Information that has been redacted is often the most interesting. It's therefore no wonder that some people are motivated to try to reverse such redactions.

In this blog post, I'll discuss image/video blurring methods and their weaknesses, and present a simple yet effective method to get a high-resolution image from a pixelated video in order to recover redacted information (with no guessing involved).

Overview of redaction (reversal) methods

One of the best-known attacks on digital redaction methods works only on specific file formats, e.g. PDFs and Office documents. Since text and objects/images in those files are stored as different objects in "separate layers", redacted text can sometimes simply be copied, and colored boxes that were intended to hide information can often be removed to reveal the redacted content. This technique, however, does not work for simple image formats that only store one layer of pixel information (except for surprising edge cases involving transparency).

A popular alternative to redacting information with a colored box is blurring. Instead of removing all information for a particular region, here the information density is only reduced.

Blurring is usually achieved using one of the following two methods:

  • Mosaic/Pixelization/Box Linear Filter: Multiple pixels are merged into a single bigger one whose color is the average of the original pixel values. This makes everything look pixelated.
  • Gaussian blur (or similar): Using a specific kernel, the new color of every pixel is computed as a weighted average of its surrounding pixels. This makes everything look blurry. (Both filters are sketched in code below the figure.)
Top Left: Mosaic Filter - Bottom Right: Gaussian Blur Filter
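
For concreteness, here is a minimal sketch of both filters in Python (using OpenCV and NumPy; a straightforward reimplementation of my own, not the exact code of any particular editor):

```python
import cv2
import numpy as np

def mosaic(img: np.ndarray, block: int = 16) -> np.ndarray:
    """Replace each block x block region by its average color."""
    out = img.copy()
    h, w = img.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            region = img[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = region.mean(axis=(0, 1))
    return out

def gaussian(img: np.ndarray, radius: int = 8) -> np.ndarray:
    """Blur with a Gaussian kernel (the kernel size must be odd)."""
    k = 2 * radius + 1
    return cv2.GaussianBlur(img, (k, k), 0)
```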

One of my favorite unblurring stories is about a Bitcoin Cash private key that was shown on French television. An entrepreneur wanted to give away some crypto during an interview, but since the station was not authorized to give out money prizes during that broadcast, the private key information was blurred. By combining tiny pieces of information from different camera shots with knowledge about the encoding scheme of a QR code, two researchers were able to recover the private key and claim the prize (after 16 hours of manual reverse engineering).

While this attack was very specific to QR codes and combined different manual techniques, more generic methods have also been published.

Image Unblurring

All of the following attacks share the same core idea: pixelate potential input data and compare the results with the redacted image.

Similar to a dictionary attack, only probable inputs are compared, greatly increasing the brute-force efficiency. The techniques also exploit the missing avalanche effect of the Mosaic "hash function": when a detail in the input image changes, only a small area of the output image is affected, making it possible to divide the problem into easier-to-solve sub-problems and only search for matches in smaller regions.
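
To illustrate this dictionary-style search in code (my own sketch, not code from any published tool; render() is a hypothetical helper that draws a candidate string with the same font, size, and alignment as the original):

```python
import numpy as np

def mosaic_signature(img: np.ndarray, block: int = 16) -> np.ndarray:
    """Per-block average colors: the only information Mosaic preserves."""
    h, w = img.shape[:2]
    img = img[:h - h % block, :w - w % block]
    tiles = img.reshape(h // block, block, w // block, block, -1)
    return tiles.mean(axis=(1, 3))

def rank_guesses(redacted: np.ndarray, guesses: list[str], render) -> list[str]:
    """Order candidate strings by how closely their pixelated
    version matches the redacted region."""
    target = mosaic_signature(redacted)
    return sorted(guesses, key=lambda g: float(
        np.linalg.norm(mosaic_signature(render(g)) - target)))
```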

  • Since a blurred image can also be interpreted as just a low-resolution image, any kind of AI-based image "superresolution" algorithm could be suitable (e.g. ISR or TecoGAN, a GAN with temporal coherence)
TecoGAN (row 1: Reference, row 2: TecoGAN output, row 3: TecoGAN input)
  • A GAN-based superresolution and text recognition algorithm was presented in 2016 as TextSR (no code available)

Those results are definitely impressive, but due to the creativity involved on the AI's part, they cannot be fully trusted. For a hacker who can uncover a redacted QR code or credit card number this way (and check its validity via its checksum), this does not matter, but I hope PULSE-upscaled faces won't find their way into courtrooms anytime soon.

Due to the information loss inherent in image downscaling, this "creativity" is, however, essential to achieving the impressive-looking results of AI-based approaches.
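
As a side note on the checksum idea: credit card numbers carry a Luhn checksum, so candidate readings can be filtered without any guessing. A minimal sketch:

```python
def luhn_valid(candidate: str) -> bool:
    """Check a candidate card number against its Luhn checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```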

Video unblurring

In comparison to a single image, a video provides many more possibilities for mistakes, some of which eliminate the need for unblurring entirely:

  • Bad object tracking (in the following shot from a Funk documentary on hacking, the URL that was supposed to be blurred was not properly motion-tracked and was shown for a few frames)
  • Missing blur for a full video section (e.g. in the following shot from another Funk documentary on cyberbullying, the HaveIBeenPwned results were blurred, but the reporter's private email address was not)
Blurred HIBP results
Unblurred reporter's email address, which could also be used to retrieve the previously blurred result page (and cyberbully her?) (the black box was added by me)
  • Missing blur in the first/last frame after/before a cut
  • Multiple camera shots, each leaking different information that can be combined (see the BTC QR code story)

In some of the observed cases, the mishaps were fixed after being noticed (e.g. both examples shown above). While this is usually desirable to reduce the spread of the information, it also makes it possible for a malicious actor to detect such mishaps easily and in an automated fashion: immediately download new uploads from documentary/news channels and diff them against a download from a later date.
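
A minimal sketch of the diffing step, assuming both versions have already been downloaded (e.g. with yt-dlp) and share resolution and frame alignment; a per-frame mean absolute difference is enough to flag re-edited sections:

```python
import cv2

def changed_frames(early_path: str, late_path: str, threshold: float = 5.0):
    """Yield indices of frames that differ between two uploads."""
    early = cv2.VideoCapture(early_path)
    late = cv2.VideoCapture(late_path)
    index = 0
    while True:
        ok_a, frame_a = early.read()
        ok_b, frame_b = late.read()
        if not (ok_a and ok_b):
            break
        # Mean absolute pixel difference as a cheap change detector
        if cv2.absdiff(frame_a, frame_b).mean() > threshold:
            yield index
        index += 1

for i in changed_frames("upload_day1.mp4", "upload_day7.mp4"):
    print(f"frame {i} was re-edited")
```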

In case all frames are correctly blurred, a different approach is required.

Since a pixelated video can also be interpreted as low-resolution, video upscaling techniques could be a viable approach. The current state-of-the-art algorithms use some form of machine learning, such as convolutional neural networks (VESPCN) or generative adversarial networks (TecoGAN).

Side note: Potentially the most extensive research on programmatically unblurring mosaic'ed regions in videos was done by Japanese Adult Video enthusiasts. Javplayer automatically detects blurred regions and performs upscaling via TecoGAN, and another person spent months improving their custom GAN trained on leaked videos (search for "De-Mosaic JAV with AI, Deep Learning and Adversarial Networks").

Reversing pixelation in videos without guessing

After seeing many videos in which information had been blurred using the Mosaic method, I wanted to implement an idea I had: since the camera is often moving relative to the blurred object, the boundaries of the bigger squares are also often moving relative to the real world. By observing how the colors of the bigger squares change between frames, and correlating those changes with the movement, it should therefore be possible to extract additional information and create an output image with a higher resolution.

The following animations illustrate this idea. A black bar in different positions is "redacted" using a mosaic pattern. In the first animation, a slight right shift of the mosaic in relation to the redacted content moves the gray block to the left. This suggests that the dark pixels causing the overall gray were located at the left border of the original mosaic pattern.

Compare this to the next animation where the black bar is located slightly further to the right. Here, the movement did not change the mosaic coloring:

In contrast to an AI-based approach, this would not invent information, but extract spatial data (higher resolution) from existing temporal data (multiple frames). The differences from classic video upscaling are that the larger part of the video already has a high resolution (allowing for more precise object tracking) and that a single super-high-resolution frame is an acceptable output (as opposed to a higher-resolution video).
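
The effect from the animations can also be reproduced numerically. In the following tiny 1-D sketch, a dark bar is mosaic'ed with a block size of 4, and shifting the grid by one pixel at a time changes the block averages in a way that reveals where inside a block the dark pixels sit:

```python
import numpy as np

signal = np.array([255] * 2 + [0] * 4 + [255] * 10)  # dark bar at positions 2-5
BLOCK = 4

for shift in range(BLOCK):
    window = signal[shift:shift + 12]  # mosaic grid offset by `shift` pixels
    blocks = window.reshape(-1, BLOCK).mean(axis=1)
    print(f"shift {shift}: block averages {blocks}")
```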

To test the hypothesis, I created a simple example "video" by slightly moving the following input image between frames while keeping the mosaic blur at the same location.

The result was this image sequence:

By reversing the image movements and putting all images on top of each other (either using GIMP or programmatically), I was able to get the following result:

Overlapped images with the movement reversed
Left: One frame of the input video - Right: GIMP/Python output of reversing the motion and overlaying images
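
A minimal sketch of the programmatic variant, assuming the per-frame pixel offsets (dx, dy) are known, as they are here because the movement was applied synthetically (np.roll wraps pixels around at the borders, which is fine as long as the blurred region stays away from the image edges):

```python
import numpy as np

def overlay_frames(frames, offsets):
    """Undo each frame's known (dx, dy) shift and average the results.

    frames:  list of arrays of shape (H, W, 3)
    offsets: list of per-frame camera movements in pixels
    """
    acc = np.zeros(frames[0].shape, dtype=np.float64)
    for frame, (dx, dy) in zip(frames, offsets):
        # Shift the frame back by its offset (wraps at the borders).
        acc += np.roll(frame, shift=(-dy, -dx), axis=(0, 1))
    return (acc / len(frames)).astype(np.uint8)
```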

As an alternative, I also tried upscaling the mosaic'ed video via TecoGAN, which did not help (much) with improving readability:

TecoGAN-upscaled video

While the output image was not as detailed as the input, the approach seemed promising, so I created a test video with my phone camera (Moto G9 Plus) and a document I had lying around (coincidentally, a request to pay the TV tax that paid for the above documentaries). I then redacted the IBAN using the mosaic filter in Shotcut:

Letter with the IBAN pixelated using Shotcut's Mosaic filter

As I did not create the image sequence computationally this time, I initially knew nothing about the movement between frames. Therefore, I imported the video clip into Blender, added tracking markers in high-contrast areas, and let Blender track them throughout the video.

Using one marker close to the blurred area for the position, and two other markers further away to calculate the scale and rotation, I created a motion-tracked and stabilized video:

Stabilized footage
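
The Blender step can also be scripted once per-frame marker coordinates are available (Blender's Python API can export the tracks). The following is my own sketch with hypothetical variable names; OpenCV's estimateAffinePartial2D fits a 4-DOF transform (translation, rotation, uniform scale) to the marker correspondences:

```python
import cv2
import numpy as np

def stabilize(frame, markers_now, markers_ref):
    """Warp a frame so its tracked markers align with a reference frame.

    markers_now/markers_ref: lists of (x, y) marker positions in the
    current and the reference frame (at least two correspondences).
    """
    m, _ = cv2.estimateAffinePartial2D(
        np.float32(markers_now), np.float32(markers_ref))
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, m, (w, h))
```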

Overlapping all frames via a small Python script (analogous to the overlay sketch above) resulted in the following image:

From there, the redacted text is quite readable and seems to be "DE3000100000001272" (correct) or maybe "DE3000700000001272".

As a next step, this improved output image could of course also be fed into one of the previously mentioned image unblurring/text recovery techniques.
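
Since two candidate readings remain, a checksum can again settle the question without guessing: once a full-length candidate is assembled (a complete German IBAN has 22 characters), the ISO 13616 mod-97 check either confirms or rules it out. A minimal sketch:

```python
def iban_valid(candidate: str) -> bool:
    """ISO 13616 mod-97 check: rotate the first four characters to the
    end, map letters to numbers (A=10 ... Z=35); the result must be
    congruent to 1 modulo 97."""
    s = candidate.replace(" ", "").upper()
    rotated = s[4:] + s[:4]
    digits = "".join(str(int(c, 36)) for c in rotated)
    return int(digits) % 97 == 1
```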

Comparison with other techniques

I was quite happy with those results and curious how they would compare, so I tested three other techniques that were easily accessible (meaning that the code/tool was released and is usable without a lengthy setup and debugging process):

  • Topaz Video Enhance AI, a popular (commercial) AI-based video upscaling tool that claims to also use temporal information from multiple frames
  • VideoEnhancer, another video superresolution tool (apparently no longer in development) that also mentions "subpixel accurate motion compensation"
  • MFSR, a MATLAB-based open source implementation of various algorithms to generate one high-res image out of a lower-res video

Video Enhance AI, even at maximum upscaling and quality settings, did not yield any improvement in the blurred text's readability:

Downscaling the whole mosaic'ed video with a lossless video codec to match the resolution of the mosaic (for better results) and then iteratively applying VideoEnhancer's SR algorithm with the best settings produced the following illegible result:

Video upscaled by VideoEnhancer

I then tried MFSR and played with different Res-Factors and SR algorithms, but could not achieve a readable result either:

Real-world example

The presented technique relies on alignment changes between a real-world object and the mosaic pattern. While in the previous examples the mosaic pattern was at a static location with only the camera moving, this is not a requirement: the technique should also work with a moving mosaic grid (as long as it does not perfectly line up with the hidden real-world object throughout the whole sequence).

The following snippet is from yet another Funk documentary, in which the reporters go undercover at a gun market:

Pixelated number plate in Funk documentary

Applying the same motion tracking technique from above (with markers near the number plate to keep its location static) results in the following tracked video:

Motion-tracked number plate

When overlaying the frames, I ignored 5 frames with a high amount of motion blur due to camera shake. The result can be seen here:

As we're only interested in the number plate, I then used GIMP to increase the contrast, emphasizing the fine differences in gray that we reconstructed earlier:

While again not completely conclusive, I think there is a high chance that the number plate section shown above reads "J 1354" or "J 1314". This time, however, without the original footage, I can't verify whether that's correct.
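
The contrast stretch can also be scripted instead of being done in GIMP; a sketch with hypothetical crop coordinates for the plate region:

```python
import cv2

img = cv2.imread("overlay.png", cv2.IMREAD_GRAYSCALE)
plate = img[180:220, 300:420]  # hypothetical coordinates of the plate area
# Stretch the narrow band of grays in the crop to the full 0..255 range.
stretched = cv2.normalize(plate, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("plate_contrast.png", stretched)
```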

Notes:

  • Law enforcement could obviously also use partial number plate information to e.g. track down a specific vehicle of a known brand and color
  • The outcome could probably be improved further by performing 3D motion tracking, or by weighting frames based on the amount of new information they provide (see the sketch below)
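
One possible weighting heuristic (an assumption of mine, not something I tested): weight each frame by its sharpness, e.g. the variance of the Laplacian, so that motion-blurred frames contribute less to the overlay:

```python
import cv2
import numpy as np

def sharpness(frame) -> float:
    """Variance of the Laplacian: a common focus/blur measure."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def weighted_overlay(frames):
    """Average stabilized frames, weighting sharper frames more heavily."""
    weights = np.array([sharpness(f) for f in frames])
    weights /= weights.sum()
    acc = np.zeros(frames[0].shape, dtype=np.float64)
    for frame, w in zip(frames, weights):
        acc += w * frame
    return acc.astype(np.uint8)
```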

Related work

This specific (or a closely related) computer vision problem is called "Multi-Frame Super Resolution" and has already received quite some attention.

One of the earliest related works I found is the 1996 paper "Extraction of high-resolution frames from video sequences".

In 2006, a paper was published that tackles the related problems of "Color Demosaicing" (for digital sensors) and superresolution together.

In 2018, Google implemented such a technique for their flagship smartphones with impressive results, and released a paper at SIGGRAPH 2019:

In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multi-frame super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets.

Two extensive and well-curated lists with more research in this area can be found here:

So while the idea and technique presented in this blog post are not new, I did not find any research focusing on recovering information from intentionally redacted videos, a use case with several key distinctions:

  • Some parts of the video are high-resolution, allowing for high-precision motion tracking
  • The frames are from a long time period, resulting in more camera and/or object movement compared to a burst of images
  • As the focus is only on retrieving the redacted information, it's not necessary to create an overall good-looking picture; motion tracking, for example, can be optimized to keep only the blurred area as still as possible
  • The desired increase in resolution is very high and computation time almost doesn't matter

Please let me know if you are aware of such research.

Conclusion

In this blog post, I discussed various image and video (un-)blurring methods and presented a simple yet effective technique to potentially uncover redacted information from pixelated videos.

Content creators and journalists should be aware of the additional risks when redacting information in videos and use a sufficiently high mosaic size/blur radius, or better yet, use an opaque, single-colored box.

Furthermore, the search for information leaks must happen before publishing (preferably by another person with a fresh view) and should not be crowd-sourced to the first viewers.

Follow us on Twitter (@positive_sec) to keep up to date with our posts.