Automatic Content Recognition (ACR) – How Does It Work?

Automatic Content Recognition (ACR) refers to technology embedded into OTT applications or SmartTVs that recognizes the content that you are watching by sampling small portions of the video/audio and comparing it with a large database.

ACR is prevalent in SmartTVs and hand-held devices and plays a major role in the audience measurement and ad-tracking industry.

In this article, let’s take a look at how Automatic Content Recognition or ACR works and some use-cases for this technology.


Firstly, how is Data Gathered from OTT Applications?

Before we look at ACR, let’s first take a quick look into the field of analytics and data gathering in OTT.

Typically, an SDK or library is integrated into an OTT application (HTML5, Android, iOS, SmartTV app, etc.) and then released to the public. Once it’s installed on a phone or TV, the application can track the user’s actions, the content being watched, etc. at a very granular level.

Each time the user presses play/pause/stop/etc., the SDK records the action and reports it back to a server. In this way, data points from millions of users are gathered, cleaned, and then presented in a usable format in dashboards back to the OTT content provider.
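To make this concrete, here’s a minimal, hypothetical sketch of what an analytics SDK’s event reporting could look like under the hood. The endpoint, field names, and the report_playback_event helper are made up for illustration; real SDKs batch events, attach device and session identifiers, and handle retries.

```python
import json
import time
import urllib.request

# Hypothetical analytics endpoint; real OTT SDKs are far more elaborate.
ANALYTICS_URL = "https://analytics.example.com/v1/events"

def report_playback_event(user_id: str, content_id: str, action: str, position_s: float) -> None:
    """Send a single player event (play/pause/stop/seek) to the analytics backend."""
    event = {
        "user_id": user_id,
        "content_id": content_id,
        "action": action,          # e.g. "play", "pause", "stop"
        "position_s": position_s,  # playhead position in seconds
        "timestamp": int(time.time()),
    }
    req = urllib.request.Request(
        ANALYTICS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()  # a 2xx status means the event was accepted

# Example: the user pressed pause 93 seconds into an episode
# report_playback_event("user-123", "episode-42", "pause", 93.0)
```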

In most cases, the publisher (aka content provider) is the consumer of this information and uses it to improve its QoE, content offering, advertising strategies, etc.

You may think that this level of data-gathering is intrusive, but the fact of the matter is that you agreed to it by pressing “Yes” on the consent form when you installed the app (which, in all likelihood, you didn’t read)!

With that introduction to data-gathering (which is rather common in today’s world), let’s switch over to another form of intelligence-gathering – Automatic Content Recognition (ACR).


What is Automatic Content Recognition?

Automatic Content Recognition refers to technology that samples the audio or video that a user is consuming, creates a fingerprint from that sample, and compares this against an extensive database of fingerprints to automatically recognize what is being watched or listened to. In some instances of ACR, the recorded sample might be transmitted directly to a server for processing and further information extraction.


How Does Automatic Content Recognition Work?

As we’ve already seen, ACR works by sampling the video and/or audio and using that information to determine the content being consumed. This leads us to Acoustic (or Audio) Fingerprinting and Video Fingerprinting.

Here’s a visual explanation of how ACR works. Simply put,

  • fingerprints are generated for the media that needs to be recognized (using either audio or video fingerprinting techniques). These fingerprints are stored in a database.
  • ACR-enabled SmartTVs, phones, or other devices generate similar fingerprints and transmit them to a server that compares these device-generated fingerprints with the main database to find a match.
  • Based on the database match, metrics are generated that provide insights into media consumption.
(Figure: the ACR fingerprinting and matching workflow)
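To make the workflow concrete, here’s a toy sketch in Python. It uses exact hashes purely for illustration; real ACR systems use perceptual fingerprints that tolerate noise, compression, and re-encoding.

```python
import hashlib

# Toy ACR workflow: the "reference" side fingerprints known content and stores
# it in a database; the "device" side fingerprints what is playing and asks the
# server for a match.

def fingerprint(sample: bytes) -> str:
    """Stand-in fingerprint function: hash a chunk of audio/video samples."""
    return hashlib.sha256(sample).hexdigest()

# 1. Reference database built from known content
reference_db = {
    fingerprint(b"...samples from Movie A..."): "Movie A",
    fingerprint(b"...samples from Ad Spot B..."): "Ad Spot B",
}

# 2. Device samples what's on screen and fingerprints it
device_fp = fingerprint(b"...samples from Ad Spot B...")

# 3. Server looks up the device fingerprint and reports the match
print(reference_db.get(device_fp, "unknown content"))  # -> "Ad Spot B"
```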

That’s fundamentally how fingerprinting and ACR work. Now, let’s take a look at the different techniques used in ACR.

Acoustic Fingerprinting

Quoting from Wikipedia, an acoustic fingerprint is a condensed digital summary, a fingerprint, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in an audio database.

Certain metrics such as frequency, amplitude, tempo, spectrum (i.e., characteristics in the frequency domain), etc. are used in building a fingerprint or signature of the audio signal.

It’s also important to remember that audio is generally compressed before transmission, and compression algorithms remove characteristics of an audio signal that are not perceptible to humans. Hence, the acoustic fingerprinting algorithm that you are building should also take these sources of distortion and noise into account.
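As a rough illustration (and not any vendor’s actual algorithm), here’s a sketch of the spectral-peak idea behind systems like Shazam: keep only the strongest frequency bin in each time slice, which survives compression and mild noise far better than the raw waveform would.

```python
import numpy as np
from scipy.signal import spectrogram

def acoustic_fingerprint(samples: np.ndarray, sample_rate: int = 44100) -> list[tuple[int, int]]:
    """Return (time_index, dominant_frequency_bin) pairs as a crude fingerprint."""
    freqs, times, sxx = spectrogram(samples, fs=sample_rate, nperseg=1024)
    peaks = np.argmax(sxx, axis=0)  # strongest frequency bin per time slice
    return list(enumerate(peaks.tolist()))

# Example: fingerprint one second of a 440 Hz tone (stand-in for real audio)
t = np.linspace(0, 1, 44100, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)
print(acoustic_fingerprint(tone)[:5])
```

A production system would store many such (time, frequency) landmarks per clip and match them with some tolerance, rather than requiring exact equality.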

Video Fingerprinting

Similar to Audio Fingerprinting, in Video Fingerprinting, small video clips are taken from the original video, and certain characteristics are extracted from them. These techniques take care to ensure that image manipulations such as compression or resizing do not affect the fingerprints, so the content can be recognized nonetheless.
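Here’s a toy sketch of one common building block, a per-frame difference hash (dHash). The assumption is that downscaling and comparing neighbouring pixels survives re-encoding and resizing reasonably well; a real video fingerprinting system would also combine hashes across time.

```python
import numpy as np

def frame_dhash(gray_frame: np.ndarray, hash_size: int = 8) -> int:
    """Compute a 64-bit perceptual hash for one grayscale frame."""
    h, w = gray_frame.shape
    # Crude nearest-neighbour downscale to (hash_size, hash_size + 1)
    rows = np.linspace(0, h - 1, hash_size).astype(int)
    cols = np.linspace(0, w - 1, hash_size + 1).astype(int)
    small = gray_frame[np.ix_(rows, cols)]
    # Each bit records whether a pixel is brighter than its right-hand neighbour
    bits = (small[:, 1:] > small[:, :-1]).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance means a likely match."""
    return bin(a ^ b).count("1")

# Example with a synthetic gradient frame and a slightly brightened copy
frame = np.tile(np.arange(1920, dtype=np.float32), (1080, 1))
print(hamming_distance(frame_dhash(frame), frame_dhash(frame + 5)))  # -> 0
```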

Digital Watermarking

Watermarking is the process of covertly embedding data into video/audio such that the embedded information is not ordinarily or easily detected. The watermark can be detected only by specialized and authorized watermark-detection software. Watermarking allows publishers to track piracy and establish authenticity. In the case of Automatic Content Recognition, one can use watermarking to detect whether someone has engaged with or watched a piece of content.
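As a toy illustration of the embed/detect idea (and emphatically not a production scheme, since it would not survive compression), here’s a least-significant-bit (LSB) watermark on 8-bit pixel values.

```python
import numpy as np

def embed_watermark(pixels: np.ndarray, bits: list[int]) -> np.ndarray:
    """Hide a bit sequence in the least-significant bits of the first len(bits) pixels."""
    marked = pixels.copy().ravel()
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & 0xFE) | bit
    return marked.reshape(pixels.shape)

def detect_watermark(pixels: np.ndarray, n_bits: int) -> list[int]:
    """Read back the hidden bits (only someone who knows the scheme can do this)."""
    return [int(p & 1) for p in pixels.ravel()[:n_bits]]

frame = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)
payload = [1, 0, 1, 1, 0, 0, 1, 0]             # e.g. a content/owner identifier
marked = embed_watermark(frame, payload)
print(detect_watermark(marked, len(payload)))  # -> [1, 0, 1, 1, 0, 0, 1, 0]
```

Production watermarks typically work in the transform domain (spread-spectrum, DCT/DWT) precisely so they survive compression, scaling, and re-capture.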


Uses of ACR

There are several uses of ACR technology. Some of the more prominent ones are –

  1. Detection of copyright infringement: Copyrighted material such as video and audio is often used indiscriminately without attributing or paying royalties to the original content creators. If a database of copyrighted content exists, then large UGC platforms such as YouTube, TikTok, Vimeo, etc. can check whether user-uploaded content contains copyrighted material.
  2. Ad-tracking: ACR has found a lot of use in the advertising industry and for good reason. Here’s why –
    1. Unless you can determine whether an ad was actually played and watched by the end-user (instead of being buried at the end of a long landing page), your metrics don’t mean much and can lead to inflated numbers for ad impressions, plays, and completion rates. Measuring this with SDKs and player changes can consume a lot of effort and development cycles.
    2. ACR, however, can recognize the content being played by sampling certain pixels of the video or by recognizing the audio. This gives advertisers and publishers a much better picture of ad delivery and engagement.
  3. Collating information from different sources: This is a very interesting use-case of ACR. In most homes, there is one big TV in the living room where people gather to watch movies. However, the content streaming to the TV could come from an STB, Chromecast, Roku, FireStick, or an Xbox. Instead of embedding code inside all these devices, SmartTVs with ACR can recognize the content being played (from the “glass”) and report on it. This allows for content attribution and normalization across a variety of sources.
  4. Understanding Audiences and their preferences: Similar to other methods of gathering usage analytics, ACR allows broadcasters and content providers to know how their audience is responding to their content, marketing, strategies, etc. By having fine-grained information about their audience and their usage patterns, broadcasters can better invest their dollars and get a much higher ROI.
  5. Ad Retargeting by OEMs: Samsung includes ACR technology in their SmartTVs and sells ad inventory and provides ad-retargeting services. According to their website, “Samsung Ads offers TV Ad Retargeting that empowers brands to identify audiences who saw or missed their TV spots and reconnect with them via mobile, tablet, desktop or OTT.” And, “Samsung Smart TVs have built-in Automated Content Recognition (ACR) technology that can understand viewing behavior and usage including programs, movies, ads, gaming content and OTT apps in real-time”. You can read more about Samsung’s Privacy Policy here where they are pretty open about recording your video and audio to understand “you” better!


Controversies Surrounding ACR

The bone of contention around ACR is that audio and/or video are recorded, fingerprinted, and often stored for future use. Some devices might generate the fingerprints on-device, while others might send the audio recordings to the cloud for further processing.

So what happens if your private conversations are in those recordings? Who is listening on the other end?

Samsung got into one of these sticky situations and had to clarify in a press release. Their initial privacy policy stated –

“Please be aware that if your spoken words include personal or other sensitive information, that information will be among the data captured and transmitted to a third party through your use of Voice Recognition.”

This spooked a lot of people and Samsung had to backtrack and release a clarifying note that said –

If you enable Voice Recognition, you can interact with your Smart TV using your voice. To provide you the Voice Recognition feature, some interactive voice commands may be transmitted (along with information about your device, including device identifiers) to a third-party service provider (currently, Nuance Communications, Inc.) that converts your interactive voice commands to text and to the extent necessary to provide the Voice Recognition features to you. In addition, Samsung may collect and your device may capture voice commands and associated texts so that we can provide you with Voice Recognition features and evaluate and improve the features. Samsung will collect your interactive voice commands only when you make a specific search request to the Smart TV by clicking the activation button either on the remote control or on your screen and speaking into the microphone on the remote control.

And, please don’t think that I am picking on Samsung. Another TV manufacturer, Vizio, was fined by the FTC for not being forthright with its data-tracking policies (link to the notice on the FTC website).

And, here’s an interesting article from consumerreports.org on how to turn off “snooping” features on Android TVs, Amazon Fire TV Edition, LG, Roku, Samsung, Sony, and Vizio.

All of this constitutes a weird situation, I must say.

I don’t blame Samsung or Vizio entirely for this because we live in a digital era where everything we do or touch from the time we wake up … to the time we sleep is known to some server in the cloud! Heck, my Fitbit also knows the percentage of REM vs. Deep Sleep that I get every night.

That’s how “plugged-in” we are into the digital ecosystem and the fact that your SmartTV “knows” what you’re watching shouldn’t be shocking!


Conclusion

ACR technology is pervasive today and provides tremendous value to OEMs, content providers and broadcasters, and advertisers. By providing detailed, granular information on content engagement, audiences, and usage patterns, ACR is here to stay!

Finally, on a lighter note, I shall end with this quip I read online that I thought was apt for this article on ACR.

Confucius say – In 1984, you watch TV. In 2020, TV watch you!

Krishna Rao Vijayanagar

I’m Dr. Krishna Rao Vijayanagar, and I have worked on Video Compression (AVC, HEVC, MultiView Plus Depth), ABR streaming, and Video Analytics (QoE, Content & Audience, and Ad) for several years.

I hope to use my experience and love for video streaming to bring you information and insights into the OTT universe.
