When I first met Gil Perry, CEO and Co-Founder of D-ID, four years ago, he envisioned a future in which photos of our faces and videos of our friends and families would be free from computer tracking. His technology obfuscates image data using Generative Adversarial Networks (GANs), producing synthetic images that look identical to the original photo to the human eye but are completely dissimilar to a machine learning model. This has created an alternative revenue model that enables organizations to protect the privacy of their employees. D-ID soon moved into video, hired top deep learning and computer vision experts in Israel, and built deep expertise in computer vision for synthetic human faces.
D-ID’s technology was already being leveraged by Fortune 100 companies, two smart city applications, CCTV manufacturers and users, and automotive companies. When Covid-19 hit, another opportunity arose for D-ID in the media and entertainment industry.
Today, in the content industry, there is a considerable gap between the demands of an endless array of social media sites and content outlets, and what content makers can afford to produce, given time and cost constraints. Synthetic media bridges this gap by allowing the production of large amounts of audio, video or image content without the need for costly physical processes.
Gil Perry, Sella Blondheim and Eliran Kuta, D-ID’s Co-Founders, spotted this opportunity during the global pandemic. Perry explains:
“Full film productions were halted because actors were unable to leave their homes. We understood that we could help in these circumstances and help deliver advertising required without actors needing to leave their homes. So we researched for six months, spoke to many potential prospects to understand what they required. We tested our tech limits and built a proprietary technology - our AI Face Platform, which enables the creation of high-quality, realistic video footage using any driver video to guide movement and bring still photos to life.”
For D-ID, their recent partnership with MyHeritage provides a great opportunity to showcase their AI Face Platform. MyHeritage, the genealogy site which allows members to discover their roots, is bringing old family images to life in a totally new way. People who were born before the era of video can now be seen to move their heads and make facial expressions. And it’s all done using AI and D-ID’s Live Portrait.
Photos are mapped and then animated by a driver video, allowing the subject to mimic the driver’s motions. As Gilad Japhet, CEO of MyHeritage, put it:
“MyHeritage is excited to partner with D-ID and integrate its technology into MyHeritage’s suite of advanced photo features. Historical photos provide us with a tangible link to our past. Seeing our ancestors’ faces come to life through video reenactment deepens our connection to our family history and is simply breathtaking.”
D-ID has also partnered with Peach Content, a creative agency, to produce high-quality creative content with powerful tools that streamline creativity.
“D-ID's unique technology allowed us to take a still photo and combine it with live-action movement,” said Peretz Markish, VP Creative and Production at Peach Content. “It removes the limits that confined filmmaking in the past and gives us a brand new canvas of options.”
For D-ID, the use of synthetic media to replace live actors offers a cost-effective alternative for the movie and media industry. As Perry notes, companies are already tackling voice cloning, and green screens are used as alternatives to more expensive on-location shooting. The leading-edge use cases involve manipulating the human body, and in particular the face. In video production, most of the content is conveyed through face and voice. With D-ID already a leader in the face space, Perry believes the next disruption in media and entertainment will be the creation of media using AI.
The range of potential uses is enormous, from precise lip-syncing in dubbing, to automated news readers driven from text, to the ability to re-do takes without having to get the whole crew and cast back together.
Perry contemplates another example:
“Imagine a digital news platform in which the end user can take a written article, choose his favorite news anchor, and see this anchor broadcast the news, in the news studio, instead of just reading them. Our algorithm generates the reporter and shows him in the news studio without him being filmed.”
In the dubbing process, for example from French to English, not only is the audio changed, but the French actor’s lips are also synchronized with the new English voice. D-ID is now in discussions with two media corporations that want to pilot these capabilities.
The Deepfake Risk
But this is a space rife with potential privacy concerns, including the notorious ‘deepfake’ technology where bad actors swap or manipulate people’s faces for malicious ends. Gartner’s Predicts 2021 Report anticipates that:
- By 2023, 20% of successful account takeover attacks will use deepfakes as part of social engineering attacks
- By 2024, 60% of AI providers will include harm/misuse mitigation as a part of their software
Journalist Nina Schick believes that in the next five to seven years, most online video will be synthetic. Big Tech is also testing the limits of synthetic media: Facebook’s Deep Fake Challenge aims to develop algorithmic deepfake detection systems, and Google’s LipSync challenge teaches AI systems how to read lips to help individuals with speaking disabilities.
D-ID, too, is aware of the problem. Privacy and safety are the intent of its Face Anonymization solution, which replaces one face with another in video to protect identities. The technology is already being adopted by documentary film producers who need to protect the identity of whistle-blowers, victims of sexual assault, and children – without compromising the quality of the viewing experience.
“In the time we’ve started using D-ID’s Face Anonymization technology, we’ve seen a greater willingness of people to go on film and tell their story without fear or concern of repercussion. While we’ve been able to use blurring and voice-altering techniques in the past, using Face Anonymization creates a whole new viewing experience for the audience.”
This is evident in Pinchasov’s recent Israeli primetime documentary, Reasonable Doubt, which centered on the possible wrongful conviction of a man jailed for murder. The police officer involved didn’t want his face to be seen, so Pinchasov replaced it with that of an actor using D-ID’s expertise.
In addition, D-ID’s face detection and blurring capability blurs faces even in the most extreme video conditions, such as large crowds or people wearing face masks. This high-end blurring feature allows documentary filmmakers, and even organizations running CCTV security cameras, to use footage while protecting identities and other personally identifiable information (PII).
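The obfuscation step itself is conceptually simple, even if D-ID’s production pipeline is far more sophisticated. As a rough illustration only (not D-ID’s actual method), pixelating detected face regions destroys identifying detail while leaving the rest of the frame intact; here the `boxes` input stands in for the output of a face detector:

```python
import numpy as np

def pixelate_regions(image, boxes, block=8):
    """Coarsely pixelate each (x, y, w, h) box in an H x W x 3 image.

    Toy sketch of face obfuscation: the boxes would normally come
    from a face detector (assumed here). Averaging each block x block
    tile removes fine facial detail while keeping the frame watchable.
    """
    out = image.copy()
    for x, y, w, h in boxes:
        region = out[y:y + h, x:x + w]
        for by in range(0, h, block):
            for bx in range(0, w, block):
                tile = region[by:by + block, bx:bx + block]
                # Replace every pixel in the tile with the tile's mean color
                tile[:] = tile.mean(axis=(0, 1), keepdims=True).astype(out.dtype)
    return out
```

Pixels outside the supplied boxes are untouched, which is why the technique preserves the viewing experience for everything except the faces themselves.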
Synthetic Media to Synthetic Data
The smartphone has accelerated photo-taking, giving rise to image databases and an exploding demand for more accurate facial identification. What we’ve experienced in the last six years includes law enforcement’s dependence on CCTV cameras in most public places and in schools, and similar technologies like Ring in our homes; applications to surveil employees and contractors, like Amazon’s driver monitoring; more people’s personal photos used without consent, as in the case of Clearview; and a Covid-19 response mandating remote video classroom learning that has also unveiled practices of collecting student biometric and other personal information, while leveraging AI analysis to determine instances of student cheating.
What has become common practice in the last 25 years is the development of large-scale databases: the Facial Recognition Technology (FERET) database, introduced by DARPA in the mid-1990s; the Labeled Faces in the Wild (LFW) dataset, released in 2007, which included images downloaded directly by researchers from Google, Flickr and Yahoo; and finally Facebook’s own database of user photos, which in 2014 was used to train its deep learning model DeepFace. These sources collected information from millions of individuals without consent, and surreptitiously operated below the radar of any impending legislation. The fallout has been the inception of decisioning systems intent on ever more pervasive recognition, tracking, and prediction, which have already proven harmful to individuals and groups.
Gartner recently released their Predicts 2021 Report: Artificial Intelligence and Its Impact on People and Society which posed this glimpse into the future:
“Generative AI, for example, is now able to create amazingly realistic photographs of people and objects that don't actually exist; Gartner predicts that by 2023, 20% of account takeovers will use deep fakes generated by this type of AI. AI capabilities that can create and generate hyper-realistic content will have a transformational effect on the extent to which people can trust their own eyes."
To counter this, the industry is starting to put protective policies in place. Currently, the petabytes of data generated every day are largely controlled by the big tech giants: Google, Amazon, Microsoft, Facebook, Apple. Smaller start-up companies do not have access to these huge datasets and are therefore at a disadvantage when they create their training models. Synthetic data becomes the privacy-preserving alternative that allows smaller companies to build their dataset volumes effectively to create prototypes and models.
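As a deliberately minimal illustration of the idea (not any vendor’s method), one of the simplest privacy-preserving approaches fits a parametric model, here an independent Gaussian per feature, to a small real dataset and samples arbitrarily many synthetic rows from it. Only aggregate statistics leave the real dataset, never the individual records:

```python
import numpy as np

def synthesize(real, n_samples, seed=0):
    """Fit an independent Gaussian to each column of `real`
    (shape: rows x features) and draw n_samples synthetic rows.

    A toy sketch of synthetic tabular data: only per-feature means
    and standard deviations are retained from the real data.
    Production generators (GANs, copulas, etc.) also model
    correlations between features, which this ignores.
    """
    rng = np.random.default_rng(seed)
    mu = real.mean(axis=0)       # per-feature mean
    sigma = real.std(axis=0)     # per-feature standard deviation
    return rng.normal(mu, sigma, size=(n_samples, real.shape[1]))
```

A downstream team can train prototypes on the synthetic rows, which preserve the broad statistical shape of the original data without exposing any real record.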
According to Gartner:
- By 2024, 60% of the data used for the development of AI and analytics solutions will be synthetically generated.
- By 2025, 10% of governments will avoid privacy and security concerns by using synthetic populations to train AI.
Gil Perry also understands the importance of enforcing D-ID’s strict policy frameworks for these types of projects: preventing the identification of participants who choose to remain anonymous, and securing the full consent and awareness of those involved. In the documentary, for example, there was also transparent disclosure that the individual on screen is not the real person.
Nevertheless, the possibility of misuse in this era of deepfakes is very real. Establishing governance around the use of Face Anonymization, Live Portrait, Talking Heads and Lip Sync to ensure clients follow best practices may not go far enough to curb misuse beyond D-ID’s control. Perry understands that with the capabilities they’ve created comes enormous responsibility:
“So, first with every amazing technology and disruption, there are risks and we take them very seriously. This technology is already here and we are creating it to establish ethical use in the face of progress. We can help prevent a lot of damage because we come from the privacy world, and the vision and what drives us and what keeps us up at night is this motivation to create good. So, more than just seeing the potential of harm, we know we are in the right place. The creation of the technology and resulting policy frameworks and monitoring how videos are used, will collectively influence organizational use. As founders, we put rules and guidelines in place against bad actors proliferating deep fake videos. We will also use watermarks in some cases which lets viewers identify a synthetic video.”
Perry contends that his software is geared toward business at the moment, and that those who license the technology understand its limitations and are committed to using the products in a manner consistent with their values. D-ID is also part of an open group that includes regulators, working to build policy around this emerging technology. Engagement across policy, technology and business is integral to mitigating undue harms.
Synthetic data, and the governance around its use, seems like the right evolution toward a future of privacy. Perry is confident that AI’s outcomes thus far will chart the industry’s course, one that includes significant changes in the way data is collected, created and processed, with a view to potential individual and societal impacts.
And while the intention toward ethical practice needs to become a groundswell of adoption ahead of impending legislation, a future that shifts industry demand from personal data to synthetic data is the future we all want.