Information that has been redacted is often the most interesting. It's therefore no wonder that some people might be motivated to try to reverse such a redaction.
In this blog post, I'll discuss image/video blurring methods and their weaknesses and present a simple yet effective method to get a high-resolution image from a pixelated video in order to recover redacted information (with no guessing involved).
One of the best-known attacks on digital redaction methods works only on specific file formats, e.g. PDFs and Office documents. As text and objects/images in those files are stored as different objects in "separate layers", redacted text can sometimes still simply be copied, and colored boxes that were intended to hide information can often be removed to reveal the redacted content. This technique, however, does not work for simple image formats that only store one layer of pixel information (except for surprising edge cases when using transparency).
A popular alternative to redacting information with a colored box is blurring. Instead of removing all information for a particular region, here the information density is only reduced.
Blurring is usually achieved using one of the following two methods: smoothing the region with a filter (e.g. a Gaussian blur), or pixelating it with a mosaic filter, which divides the region into larger squares and fills each square with the average color of the pixels it covers.
One of my favorite unblurring stories is about a Bitcoin Cash private key that was shown on French Television. An entrepreneur wanted to give away some crypto during an interview, but since the station was not authorized to give out money prizes during that TV transmission, the private key information was blurred. By combining tiny pieces of information from different camera shots and knowledge about the encoding scheme of a QR code, two researchers could recover the private key and claim the prize (after 16 hours of manual reverse engineering).
While this attack was very specific to QR codes and combined different manual techniques, more generic methods have also been published.
All of the following attacks share the same core idea: Pixelate potential input data and compare the results.
Similar to a dictionary attack, only probable inputs are compared, greatly increasing the brute-force efficiency. The techniques also exploit the missing avalanche effect of the mosaic "hash function": when a detail in the input image changes, this only affects a small area of the output image, making it possible to divide the problem into easier-to-solve sub-problems and only search for matches of smaller regions.
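The core idea can be sketched in a few lines of Python. This is a toy 1-D model with invented 8-pixel "glyph" bitmaps; real attacks work on rendered 2-D text, but the principle is the same: pixelate every candidate and pick the one whose mosaic matches.

```python
def mosaic(pixels, block):
    """Pixelate a 1-D row of grayscale values: replace each block by its mean."""
    out = list(pixels)
    for i in range(0, len(out), block):
        chunk = out[i:i + block]
        out[i:i + block] = [sum(chunk) / len(chunk)] * len(chunk)
    return out

def best_match(redacted, candidates, block):
    """Dictionary attack: pixelate each candidate and pick the closest match.
    Because the mosaic lacks an avalanche effect, each block could even be
    matched independently, shrinking the search space further."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(candidates, key=lambda c: dist(mosaic(c, block), redacted))

# Toy "font" known to the attacker (values are invented for illustration).
glyphs = {
    "A": [255, 255, 255, 255, 0, 0, 0, 0],
    "B": [255, 255, 0, 0, 255, 255, 0, 0],
    "C": [0, 0, 0, 0, 255, 255, 255, 255],
}
redacted = mosaic(glyphs["B"], 4)   # what the attacker sees
guess = best_match(redacted, list(glyphs.values()), 4)
print([k for k, v in glyphs.items() if v == guess])  # ['B']
```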
Those results are definitely impressive, but due to the creativity involved on the AI's part, they cannot be fully trusted. For a hacker who can uncover a redacted QR code or credit card number this way (and check its validity via its checksum), this does not matter, but I hope PULSE-upscaled faces won't find their way into courts anytime soon.
Due to the information loss inherent to image downscaling, this "creativity" is however essential to achieving the impressive-looking results of AI-based approaches.
In comparison to a single image, a video offers many more opportunities for mistakes, some of which eliminate the need for unblurring entirely:
In some of the observed cases, the mishaps were fixed after being noticed (e.g. both examples shown above). While this is usually desirable to reduce the spread of this information, it also makes it possible for a malicious actor to detect them easily and in an automated fashion by immediately downloading new uploads from documentary/news channels and diffing them with a download from a later date.
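The diffing step of such an automated pipeline could look roughly like this. It's a minimal sketch that treats frames as nested lists of grayscale values; the function names and the threshold are illustrative, and a real implementation would decode actual video files first.

```python
def frame_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two grayscale frames."""
    total = sum(abs(a - b)
                for row_a, row_b in zip(frame_a, frame_b)
                for a, b in zip(row_a, row_b))
    pixel_count = sum(len(row) for row in frame_a)
    return total / pixel_count

def changed_frames(video_v1, video_v2, threshold=5.0):
    """Indices of frames that differ between the first upload and a later
    re-upload -- candidates for redactions that were added after the fact."""
    return [i for i, (f1, f2) in enumerate(zip(video_v1, video_v2))
            if frame_diff(f1, f2) > threshold]

# Toy example: the second frame was altered (e.g. a blur added) in the re-upload.
v1 = [[[0, 0], [0, 0]], [[10, 10], [10, 10]]]
v2 = [[[0, 0], [0, 0]], [[200, 200], [200, 200]]]
print(changed_frames(v1, v2))  # [1]
```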
In case all frames are correctly blurred, a different approach is required.
Since a pixelated video can also be interpreted as a low-resolution one, using video upscaling techniques could be a viable approach. The current state-of-the-art algorithms use some form of machine learning, such as convolutional neural networks (VESPCN) or generative adversarial networks (TecoGAN).
Side note: The potentially most extensive research on the problem of programmatically unblurring mosaic'ed regions from videos was done by Japanese Adult Video enthusiasts. Javplayer automatically detects blurred regions and performs upscaling via TecoGAN, and another person spent months improving their custom GAN that was trained with leaked videos (search for "De-Mosaic JAV with AI, Deep Learning and Adversarial Networks").
After seeing many videos where information has been blurred using the Mosaic method, I wanted to implement an idea that came to my mind: Since the camera is often moving relative to the blurred object, the boundaries for the bigger squares are also often moving relative to the real world. Therefore, by observing the color difference of the bigger squares in between frames, and correlating it with the movement, it should be possible to extract additional information and create an output image with a higher resolution.
The following animations illustrate this idea. A black bar in different positions is "redacted" using a mosaic pattern. In the first animation, a slight right shift of the mosaic in relation to the redacted content moves the gray block to the left. This suggests that the dark pixels causing the overall gray were located at the left border of the original mosaic pattern.
Compare this to the next animation where the black bar is located slightly further to the right. Here, the movement did not change the mosaic coloring:
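The effect shown in the two animations can be reproduced numerically. Below is a 1-D toy model (4-pixel blocks, invented pixel values): a 2-pixel black bar sitting at a block boundary changes the block colors when the grid shifts by one pixel, while a bar in the middle of a block does not.

```python
BLOCK = 4

def block_means(pixels, grid_offset):
    """Mosaic a 1-D scene with the block grid starting at grid_offset."""
    return [sum(pixels[start:start + BLOCK]) / BLOCK
            for start in range(grid_offset, len(pixels) - BLOCK + 1, BLOCK)]

def scene(bar_at):
    """White background (255) with a 2-pixel black bar at bar_at."""
    s = [255] * 16
    s[bar_at] = s[bar_at + 1] = 0
    return s

# Bar at the LEFT edge of a block: shifting the grid one pixel to the right
# splits the bar across two blocks, so the mosaic coloring changes.
print(block_means(scene(4), 0)[1], block_means(scene(4), 1)[0])  # 127.5 191.25

# Bar in the MIDDLE of a block: the same shift keeps both dark pixels inside
# one block, so the mosaic coloring stays the same.
print(block_means(scene(5), 0)[1], block_means(scene(5), 1)[1])  # 127.5 127.5
```

The difference between the two cases is exactly the extra information the inter-frame comparison extracts.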
In contrast to an AI-based approach, this would not be inventing information, but extracting spatial data (higher resolution) from existing temporal data (multiple frames). The differences to classic video upscaling are, that the larger part of the video already has a high resolution (allowing for more precise object tracking) and that a single super-high-resolution frame is an acceptable output (compared to a higher-resolution video).
To test the hypothesis, I created a simple example "video" by slightly moving the following input image between each frame while keeping the mosaic blur at the same location.
The result was this image sequence:
By reversing the image movements and putting all images on top of each other (either using GIMP or programmatically), I was able to get the following result:
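The reverse-and-stack step can be sketched as follows. This is a 1-D toy model with known, cyclic shifts (real frames are 2-D and the motion has to be estimated); it shows that a bright region's edge, quantized to a block boundary in any single frame, is recovered at its true sub-block position after stacking.

```python
BLOCK = 4

def mosaic(pixels):
    """Pixelate a 1-D row: replace each BLOCK-sized run by its mean."""
    out = list(pixels)
    for i in range(0, len(out), BLOCK):
        chunk = out[i:i + BLOCK]
        out[i:i + BLOCK] = [sum(chunk) / len(chunk)] * len(chunk)
    return out

def simulate_frames(signal, shifts):
    """The scene moves under a fixed mosaic grid (cyclic shift for simplicity)."""
    return [mosaic(signal[s:] + signal[:s]) for s in shifts]

def stack(frames, shifts):
    """Reverse each frame's known motion, then average the aligned frames."""
    acc = [0.0] * len(frames[0])
    for frame, s in zip(frames, shifts):
        realigned = frame[-s:] + frame[:-s] if s else frame
        acc = [a + p for a, p in zip(acc, realigned)]
    return [a / len(frames) for a in acc]

# A bright region ending at pixel 6, hidden behind 4-pixel mosaic blocks.
signal = [255] * 6 + [0] * 10
shifts = [0, 1, 2, 3]
frames = simulate_frames(signal, shifts)
recovered = stack(frames, shifts)

def dark_edge(pixels):
    return next(i for i, p in enumerate(pixels) if p < 128)

print(dark_edge(frames[0]))   # 4 -> a single frame only resolves block boundaries
print(dark_edge(recovered))   # 6 -> stacking recovers the true edge position
```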
As an alternative, I also tried upscaling the mosaic'ed video via TecoGAN, which does not help (much) with improving readability:
While the output image was not as detailed as the input, the approach seemed promising, so I created a test video with my phone camera (Moto G9 Plus) and a document I had lying around (coincidentally, a request to pay the TV tax that paid for the above documentaries). I then redacted the IBAN using the mosaic filter in Shotcut:
As I did not create the image sequence computationally this time, I initially did not know anything about the movement between frames. Therefore, I imported the video clip into Blender, added tracking markers in high-contrast areas, and let Blender track them throughout the video.
Using one marker close to the blurred area for the position, and two other markers further away to calculate the scale and rotation, I created a motion-tracked and stabilized video:
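The transform derived from the markers can be approximated with a little complex-number math. This is a minimal two-marker sketch (two point correspondences fully determine a similarity transform, i.e. scale, rotation, and translation); the marker coordinates below are made up for illustration.

```python
def similarity_from_markers(ref, cur):
    """Estimate the similarity transform that maps the current frame's marker
    positions onto the reference frame's. Points are (x, y) tuples; complex
    multiplication encodes scale and rotation in one factor."""
    (r0, r1), (c0, c1) = ref, cur
    rv = complex(*r1) - complex(*r0)    # marker vector in the reference frame
    cv = complex(*c1) - complex(*c0)    # marker vector in the current frame
    m = rv / cv                         # combined scale + rotation
    t = complex(*r0) - m * complex(*c0) # translation
    def apply(point):
        z = m * complex(*point) + t
        return (z.real, z.imag)
    return apply

# Hypothetical tracked positions: the camera panned slightly between frames.
ref_markers = [(100.0, 200.0), (400.0, 210.0)]
cur_markers = [(112.0, 195.0), (412.0, 205.0)]
stabilize = similarity_from_markers(ref_markers, cur_markers)
print(stabilize((112.0, 195.0)))  # (100.0, 200.0): the frame is re-aligned
```

Applying such a transform to every frame is what produces the stabilized video in which the blurred region stays in place.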
Overlapping all frames via a small Python script resulted in the following image:
From there, the redacted text is quite readable and seems to be "DE3000100000001272" (correct) or maybe "DE3000700000001272".
As a next step, this improved output image could then of course also be fed into one of the previously mentioned image unblurring/text recovery techniques.
I was quite happy with those results and curious how they would compare, so I tested three other techniques that were easily accessible (meaning that the code/tool was released and is usable without a lengthy setup and debugging process):
Attempting to use Video Enhance AI, even on maximum upscaling and quality settings, did not yield any improvement on the blurred text's readability:
Downscaling the whole mosaiced video with a lossless video codec to match the resolution of the mosaic (for better results) and then iteratively applying VideoEnhance's SR algorithm with the best settings produced the following illegible result:
I then tried MFSR and played with different Res-Factors and SR algorithms, but could also not achieve a readable result:
The presented technique relies on alignment changes between a real-world object and the mosaic pattern. While in the previous examples, the mosaic pattern was at a static location with only the camera moving, this is not a requirement, and the technique should also work with a moving mosaic grid (as long as it does not perfectly line up with the hidden real-world object throughout the whole sequence).
The following snippet is from yet another Funk documentary, where they go undercover to a gun market:
Applying the same motion tracking technique from above (with markers near the number plate to keep its location static) results in the following tracked video:
When overlaying the frames, I ignored 5 frames with a high amount of motion blur due to camera shake. The result can be seen here:
As we're only interested in the number plate, I then used GIMP to increase the contrast, emphasizing the fine differences in gray that we reconstructed earlier:
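The contrast adjustment amounts to a simple linear stretch: map the darkest gray in the region to black and the brightest to white. A minimal sketch (the pixel values below are invented):

```python
def stretch_contrast(pixels):
    """Linearly rescale a grayscale image so its darkest value maps to 0 and
    its brightest to 255, amplifying subtle gray differences."""
    lo = min(min(row) for row in pixels)
    hi = max(max(row) for row in pixels)
    if hi == lo:
        return [[0 for _ in row] for row in pixels]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in pixels]

# Near-identical mid-grays become clearly distinct after stretching.
plate_region = [[120, 124, 121], [119, 130, 122]]
print(stretch_contrast(plate_region))
```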
While again not completely conclusive, I think there is a high chance that the number plate part shown above says "J 1354" or "J 1314". This time however, without the original footage, I can't verify whether that's correct.
This specific (or a closely related) computer vision problem is called "Multi-Frame Super Resolution" and has already received quite some attention.
One of the earliest related works I found was from 1996, titled "Extraction of high-resolution frames from video sequences".
In 2006, a paper was published that presents a way to tackle the related problems of "Color Demosaicing" (for digital camera sensors) and super-resolution together.
In 2018, Google implemented such a technique for their flagship smartphones with impressive results, and released a paper at SIGGRAPH 2019:
In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multi-frame super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets
Two extensive and well curated lists with more research in this area can be found here:
So while the idea and technique presented in this blog post are not new, I did not find any research that focuses on recovering information from intentionally redacted videos, which has several key distinctions from other use cases:
Please let me know if you are aware of such research.
In this blog post, I discussed various image and video (un-)blurring methods and presented a simple yet effective technique to potentially uncover redacted information from pixelated videos.
Content creators and journalists should be aware of the additional risks when redacting information in videos and use a sufficiently high mosaic size/blur radius, or better yet, use an opaque, single-colored box.
Furthermore, the search for information leaks must happen before publishing (preferably by another person with a fresh view) and should not be crowd-sourced to the first viewers.