Facial recognition and public data

This week: all things facial recognition and the implications of using public data.

Sandra Peter (Sydney Business Insights) and Kai Riemer (Digital Futures Research Group) meet once a week to put their own spin on news that is impacting the future of business in The Future, This Week.

The stories this week

00:35 – Facial recognition systems and the data they’re collecting

Mastercard’s new face recognition payment system

WIRED used a free trial of FindClone to trace an alleged captured Russian soldier

The UK Information Commissioner’s Office fines Clearview AI £7.5m

Clearview AI

PimEyes

Researchers in 2019 built a facial recognition system in New York City

Software engineer Cher Scarlett’s experience with PimEyes via CNN

More from Cher Scarlett

DALL-E 2, the visual AI

Some software misidentifies some Black and Asian people 100 times more often than white men

Our previous discussion of facial recognition for fish on The Future, This Week

Our previous discussion on The Future, This Week around paying for fried chicken by scanning your face


Follow the show on Apple Podcasts, Spotify, Overcast, Google Podcasts, Pocket Casts or wherever you get your podcasts. You can follow Sydney Business Insights on Flipboard, LinkedIn, Twitter and WeChat to keep updated with our latest insights.

Send us your news ideas to sbi@sydney.edu.au.

Music by Cinephonix.

Dr Sandra Peter is the Director of Sydney Executive Plus at the University of Sydney Business School. Her research and practice focuses on engaging with the future in productive ways, and the impact of emerging technologies on business and society.

Kai Riemer is Professor of Information Technology and Organisation, and Director of Sydney Executive Plus at the University of Sydney Business School. Kai's research interest is in Disruptive Technologies, Enterprise Social Media, Virtual Work, Collaborative Technologies and the Philosophy of Technology.

Disclaimer We'd like to advise that the following program may contain real news, occasional philosophy and ideas that may offend some listeners.

Sandra Facial recognition everywhere. New payment systems, new court rulings, in the war in Ukraine, and even on our phones.

Intro From The University of Sydney Business School, this is Sydney Business Insights, an initiative that explores the future of business. And you're listening to The Future, This Week, where Sandra Peter and Kai Riemer sit down every week to rethink trends in technology and business.

Kai So there've been all these stories about facial recognition this week, quite a few.

Sandra Yeah, Mastercard came out with a new face recognition payment system that they're promising to trial in Brazil, with future pilots in the Middle East and across Asia.

Kai How does that work? I rock up to a counter in a supermarket and I just show my face? And there was something with waving, I wave at the camera or something?

Sandra You can wave at the camera, but the camera's still just gonna look at your face, so the waving is a bit of...

Kai Just a gratuitous gimmick, yeah?

Sandra Yeah, but it's just you know, smile and wave, smile and wave.

Kai The supermarket has to sign up with Mastercard to have this service enabled. How does the thing know my face? Where does the face come from? Do I have to give that to Mastercard?

Sandra Well, according to the report, customers will install an app which will take their picture and their payment information, in this case, likely Mastercard payment information. This will be stored with a third party. So a company like Fujitsu, for instance, is named in the article. And at the checkout, the customer's face will just be matched with the stored data. And once they know it's you, they'll take the money straight out of your account.
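To make that flow concrete, here is a minimal sketch of the enrol-then-checkout logic just described. It is purely illustrative: the class names, the faceprint representation, and the 0.6 matching threshold are our assumptions, not Mastercard's actual system.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class EnrolledCustomer:
    faceprint: np.ndarray   # numeric face template captured at enrolment via the app
    payment_token: str      # tokenised card details held by a third party

# Hypothetical third-party store of customers who opted in via the app.
enrolled: list[EnrolledCustomer] = []

def checkout(camera_faceprint: np.ndarray, amount: float,
             threshold: float = 0.6) -> str:
    # Match the face at the till against every stored template,
    # then charge the matching customer's stored payment token.
    for customer in enrolled:
        if np.linalg.norm(customer.faceprint - camera_faceprint) < threshold:
            return f"charged {amount} to {customer.payment_token}"
    raise LookupError("no enrolled face matched; fall back to card payment")
```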

Kai So it's a complicated partnership between tech companies that then sign up retailers, and customers then opt in. So that's good news. Because a lot of the stories that we saw this week, they're not opt-in, they don't involve me uploading my face and allowing people to use it.

Sandra That's right. Most of the tools that we've seen in the news, from companies like Clearview AI, work on data that they scrape off the internet. That's publicly available websites like our university websites, where both of our pictures are up, or indeed social media profiles. So your Twitter account and any pictures you might have posted there, or your Facebook account, anything that's public, basically, on the internet or...

Kai Yeah, or people post pictures that, you know, include your face, and then tag you. So there's all kinds of pictures of people on the public internet, and on social media platforms, that some of these tools have scraped to build quite gigantic databases.

Sandra And we've seen a couple of them mentioned in relation to the war in Ukraine in the past week, Clearview AI for instance.

Kai Yeah, Clearview AI has offered free access to the Ukrainian government to use its quite controversial solution to essentially help Ukrainians identify Russian soldiers, Russian assailants, and presumably out them publicly in order to put pressure on the Russian government politically.

Sandra But we'll come back to Clearview AI, which is the company that you mentioned, and which is only available on a commercial basis, mostly to law enforcement, but also to banking, insurance and finance, and a few other companies. A story in Wired mentioned, again looking at identifying Russian soldiers in Ukraine, that it only takes five minutes to put a name to a soldier's face using nothing more than a screenshot or a shot you've taken with your phone. And in this case, we saw journalists from Wired using a free trial of what is actually a Russian service called FindClone to trace pictures of a man that the Ukrainian government claimed was a captured Russian soldier.

Kai And the US government has used solutions like Clearview AI to identify people who participated in the insurrection at the Capitol. Journalists at The New York Times have also previously used services like that for similar purposes, to put names to faces that were recorded on publicly available television footage.

Sandra And it's not just names that get put to those faces, right? So in the case of the Russian soldier, there was the teenager's birthday, photos of his family, his place of work, his friends, so fairly detailed information in many of these cases.

Kai And that's exactly why governments have reacted quite vigorously to protect their citizens' private information. We've seen this with the Italian government, but also just recently this week with the UK government. There's been a ruling where Clearview AI was fined £7.5 million for basically harvesting data of UK citizens from social media platforms such as Facebook and Twitter, which is against European data protection law, and Clearview AI has been found to infringe upon it.

Sandra So this is fairly significant because it's the UK data protection regulator's third-largest fine ever.

Kai It's called the Information Commissioner's Office, ICO.

Sandra And they have gone after a US-based company, which Clearview AI is, asking them to delete, as you've said, the data of all UK residents from its system. And Clearview has collected more than 20 billion images of people's faces, plus data from Facebook and other social media companies, just by scraping the web for publicly available information.

Kai And while it's not clear how many of those 20 billion photos are from the UK, the amount of data this company is amassing is quite significant, I mean, this is already three photos for every person on the globe, on average. And the company has announced that it is on track to amass 100 billion photos. And that's up from just 3 billion in 2020. So this is a really data-hungry company that, seemingly without any constraints, scoops up anything it can find in terms of face pictures on the internet.

Sandra We thought this is a really good opportunity to go over what facial recognition is, how it works, and how data fits into this entire story. And then to go over some of these changes around data, data privacy and data regulation around the world. And this is important because we're increasingly seeing the use of facial recognition above and beyond law enforcement, right? It used to be the domain of the police or border security, or of security features in phones or cameras or buildings. And now we're seeing a lot more commercial use. So it's worth really understanding exactly what this is about, what are some of the really interesting, positive uses of it, and what are some of the more difficult aspects of facial recognition. So what is it?

Kai So facial recognition is the use of machine learning, deep learning algorithms to locate, analyse and match faces in pictures and videos against a large-scale database of lots and lots of photographs that were trained into those algorithms.

Sandra So simply, some software that looks at you, analyses your face, and then can sometimes confirm your identity. And it's one of the most powerful uses of AI that we've seen in recent years. But it really relies on three different things that machine learning algorithms can do.

Kai So the first one is to train the machine to find a face in a video or in a photograph. So detect the face, know which pixels actually match the contours, the characteristics of a face. And in everyday life, we have that on display when, you know, we point our camera or phone at something and there's a face, and it draws this yellow or red square around the face. And we know the camera has found the face.

Sandra And it knows to focus around that face, right?

Kai Exactly.

Sandra Or it's used, much like we saw this morning on Zoom, to, you know, draw a little avatar, and animate that avatar or portrait over your face.

Kai So with that technology, all the computer knows is there is a face there.
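To make this first step concrete, here is a minimal detection sketch using one of the pre-trained Haar cascade face detectors that ships with OpenCV; the image file names are placeholders. At this stage the program only knows where faces are, not whose they are.

```python
import cv2

# Load a pre-trained frontal-face detector bundled with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("photo.jpg")                 # placeholder input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # the detector works on greyscale

# Returns one (x, y, width, height) box per face found.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw the familiar rectangle around each detected face.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("photo_with_boxes.jpg", image)
```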

Sandra But then the computer can go a bit further, there are algorithms that map faces. And often that's done by just measuring various characteristics of your face. So for instance, if I look at you, Kai, I might measure what the distance is between your eyes, or, you know, the distance to your ears, or how big your nose is. And I put those numbers together and I get what is called a faceprint, right? An ID of your face.

Kai Yeah, and the computer has to go through a little bit of acrobatics, you know, flatten it down; if you look to the side, it sort of, you know, does this hockey mask thing where it basically pulls your face straight. But it quite reliably then transforms what we see as a face into essentially a mathematical value, a big string of data, that is the unique number that represents your face, versus my face or someone else's face.

Sandra And that's why I can unlock my phone with my face, and you can unlock your phone with your face.

Kai Yeah, and that's a one-to-one match. The phone has that on file. When you train your phone, it stores this string of data on your phone, and then only your face will match that string. So that's, basically, face analysis that can match one face to a string.
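As a concrete sketch of that one-to-one match, here is how it might look with the open-source face_recognition library, which computes 128-number faceprints. The file names are placeholders, the 0.6 cut-off is the library's usual default, and for simplicity the sketch assumes each photo contains exactly one face.

```python
import numpy as np
import face_recognition  # dlib-based library: pip install face_recognition

# Enrolment: compute a 128-number faceprint from the owner's reference photo.
enrolled = face_recognition.face_encodings(
    face_recognition.load_image_file("owner.jpg"))[0]

# Unlock attempt: compute a faceprint from whoever is in front of the camera now.
probe = face_recognition.face_encodings(
    face_recognition.load_image_file("camera_frame.jpg"))[0]

# One-to-one match: the same face yields a nearby faceprint, a stranger's does not.
distance = np.linalg.norm(enrolled - probe)
print("unlock" if distance < 0.6 else "stay locked")
```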

Sandra Yeah, so you can detect a face, you can analyse a face, but then there's a third thing you can do in facial recognition, which is the actual recognition bit, and that is confirm the identity of someone based on their face.

Kai And you can use that to pick a face out of a crowd, or do large-scale analysis on all the faces in a crowd and see if any matches the face of someone you are looking for, for example. And we need to point out that this is much less reliable than what happens when you unlock your phone, because that's just a one-to-one match between your face and a string of data. Here, we now get into the messy business of billions of faces whose data values are ingrained in this deep learning algorithm, which is a very, very complicated set of numerical values that has been trained by feeding it this humongous database of faces.
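And a toy sketch of why that one-to-many case is messier: the probe faceprint is compared against every faceprint in a stored gallery, and the nearest hit may still be a lookalike stranger. The file names and threshold here are assumptions for illustration.

```python
import numpy as np

gallery = np.load("faceprints.npy")            # hypothetical (N, 128) array of stored faceprints
names = open("names.txt").read().splitlines()  # one name per stored faceprint

def identify(probe: np.ndarray, threshold: float = 0.6):
    # Distance from the probe to all N stored faceprints at once.
    distances = np.linalg.norm(gallery - probe, axis=1)
    best = int(np.argmin(distances))
    # Unlike a one-to-one phone unlock, the nearest stored face may belong to a
    # stranger who merely looks similar, hence the threshold and the false positives.
    if distances[best] < threshold:
        return names[best], float(distances[best])
    return None, float(distances[best])
```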

Sandra And this is where it gets interesting, because the idea here is that the algorithm gets better and better the more faces you show it. For just analysing a face, you only need to feed it your face from 20 different angles until it learns your face. With these databases, when you're trying to recognise someone, to confirm their identity, you want as many pictures as you can get. And the more pictures you have, the more accurate the system gets. And we've seen over the years so many stories about how some of these systems can be biased, because they rely on biased data sets. For instance, the very early databases just had samples of faces of white men. Hence, they really struggled to identify women or people of colour, where the system might not even recognise that as a face. And as we've added more and more faces, we've started to correct for some of those biases.

Kai And there are studies, such as the one reported in The Washington Post, I think it was three years ago, and we'll put it in the shownotes, which found that faces of Asian and especially Black people are misidentified 100 times more often than those of white men. And it's partly because of the data that we feed into it. But for many of these algorithms there are also technical reasons, because facial recognition relies on contrast, and under most lighting that's often stronger on white faces than on Black faces. So there is an inherent problem with identifying faces of different colours.

Sandra And an article in The New York Times points out that there are also issues with things like gender identification, because labels are typically binary, male or female. So there are many, many issues with these databases. But usually the more data you have, the better they become.

Kai And better data, right? The good thing about social media data, and that's why it's so preferable for these companies, is that people upload data about themselves. And they basically label their own photos in the ways they want to be identified. So this is actually quite good data, much better than what you can scrape from the public, anonymous internet.

Sandra But this also gets us back to Clearview AI, which has one of the biggest data sets, as we said, 20 billion images scraped off the internet. And it is used, as we said, mostly in law enforcement, but is now also available as an application to banks or insurance companies.

Kai And while you can make this abstract argument and say faces from social media are better than faces from the public internet, it of course does not mean that people have consented to, or necessarily want, any of those faces, maybe especially the ones from social media, to be included in those databases. And this is where the UK government has come in and said, 'this is plainly illegal in this country, and you have to pay a fine, and you have to delete those faces from your database, please'.

Sandra But we'll come back to this in a minute. Because there are a few other things to mention about publicly available facial recognition software. Because we've been playing with one for the past couple of hours here.

Kai And that's an article that came out just this week.

Sandra Yep, also in The New York Times. A face search engine that really anyone can use, and that The New York Times deems "alarmingly accurate". And to our knowledge, they don't scrape social media, or at least they don't list it in their results. But PimEyes is a paid service, you can trial it for free, but it's a paid service that anyone can use, and it finds photos from across the internet to match whatever you've uploaded as a test photo. So if we were walking down the street, technically I could take a picture of someone on the bus, or, you know, someone hanging around on the street corner, and try to see if there's anything on the internet about this person. And it's surprisingly accurate, you can have a look in The New York Times. It even recognised people with a face mask on. We've been playing with it and...

Kai We tried that ourselves, it didn't work quite so reliably.

Sandra But then both of us have glasses and very good face masks.

Kai What it does is throw up a whole bunch of faces that look eerily alike, where the eyes above the face mask could be the person, and that is part of the reason why this is so problematic: it throws up a lot of false positives. It did, however, identify me. It did find one picture of me, you know, without a face mask; not so much for you. But it can also be problematic when it finds those false positives, where the picture looks eerily alike, but it is a person in a completely different context.

Sandra But with normal pictures it was eerily good, phone pictures taken at a distance. Again, we've only used pictures of people who have given us consent to have them tested on this. So please use pictures of yourself if you're going to try this; you'll find it in the shownotes.

Kai And in fact, the tool actually only wants this to be used on your own face; you have to tick a little tick box that says, 'I'm using this with consent, I'm using this only on my own face'. And the article also mentions that the owner of the system says, 'only use it on your own face'.

Sandra And the app, by the way, is a Georgian app.

Kai Georgia. The country, not the state, yes.

Sandra Georgia the country, not the state. Yes. The one that declared independence from the Soviet Union back in 1991. That Georgia. So what can you use facial recognition for? Because I think this is a good opportunity to remind ourselves that there are not only nefarious uses of facial recognition, right? In Australia, we have the SmartGate technology at the airport that lets you out of the country and back into the country, and that actually works extremely well.

Kai It uses the same technology as you have on your iPhone, a one-to-one match between your face and what the immigration authorities, Border Force, have on file about you.

Sandra We mentioned just now Mastercard and the payment technology, but China was already using biometric checkout technology five years ago. We talked about this a number of times on the podcast, paying for your train ticket in Beijing using your face. And there are legitimate uses in law enforcement: monitoring for known criminals, identifying child victims of abuse, identifying victims of human trafficking...

Kai Or matching CCTV footage of a crime scene to databases of known offenders for example.

Sandra Fraud protection at ATMs, logging in and out of laptops and phones and all sorts of devices.

Kai Facial recognition doesn't just stop with people, right?

Sandra Yes, the fish, right, facial fish, fish facial rec...

Kai Fish facial, facial fish, fish recognition.

Sandra Fish facial recognition, one of our favourite episodes from 2018.

Kai Facial recognition for fish.

Sandra Yes, where the Northern Territory Department of Primary Industry and Resources here in Australia partnered with Microsoft to use AI for fish facial recognition.

Kai And that was to alleviate the strain on its researchers, who were doing species surveillance and would have to sit in front of tediously long video streams and basically press a button: fish, no fish, fish, no fish.

Sandra Hence, you know, facial recognition for fish. And as we mentioned then, this is actually great technology, because these fish do not pose for a passport picture and then move to the side, but rather just swim around. And as you've said, it's not just for fish, there's facial recognition for pigs. We talked even earlier than that about GoGo Chicken in China, which was doing facial recognition for chickens. There's koalas, there's sharks, you name it.

Kai A former colleague from the Business School, Steve Elliot, was involved in a conservation project trying to save the orangutans in Borneo; they developed facial recognition for primates, basically, as well.

Sandra So lots of beneficial uses of facial recognition. But there are also some difficulties, and we've alluded to some of them. And I think it's important, if we're looking at facial recognition, to recognise that. Probably the most obvious difficulty is around the accuracy of the systems. Again, the more data you have, the better they get. But often the software that law enforcement agencies use, and Clearview AI is one of those examples, isn't readily available for the public to audit, and the algorithms they use are black-box algorithms. So everything that we normally discuss in the context of AI still applies to these facial recognition systems.

Kai And there are those concerns that it leads to many more false positives for people of colour, and to over-policing of certain groups of people. And that's why some jurisdictions, like the city of San Francisco, have outlawed the use of facial recognition systems in public places.

Sandra The other issue is that of who has access to the technology. We've just discussed this both in the context of the technology that we tried ourselves, and in the Wired article, where a free trial of a Russian service called FindClone allowed anyone to find out fairly personal details about other people. And in that case, the technology was used to identify a captured Russian soldier. But equally, it could be used to identify a Ukrainian soldier, their family, and their friends, and intimidate them or worse.

Kai And the article about PimEyes has the owner of the service on record saying that he wants to provide the service to democratise access to this technology. Basically saying that law enforcement and governments, often also autocratic governments, and Georgia has quite a history of oppression, already use these technologies. And this would empower ordinary people to basically look up what compromising material might be found about them online, which could be, you know, in the innocent version, old party photos, but in other countries their face identified, you know, marching in a demonstration or something, and then try to take steps to have this compromising material removed from the internet. And the company also provides a paid service to help you with that.

Sandra And it's indeed this democratisation of the technology that presents some of the difficulties around it. And I recall this other article that we covered back in 2019, from The New York Times, that showed just how easy it is to actually build one of these systems yourself. What they had done back then was to install three cameras on the roof of a restaurant and just film people over lunch, crowds, tourists, commuters going past the restaurant around lunchtime. But then they used that footage, and a service available to anyone on the internet that costs only $100, to turn it into a facial recognition tracking system. And they basically identified people who worked in the area, and tracked a college professor on his way to work. All of this was for $100, and perfectly legal, so they had not broken the law.

Kai And while you can use these technologies to, you know, find out the identity of a troll or a bully that harasses you online, they can equally be used to, you know, find out everything about that guy or that girl on the bus you always wanted to know more about. In other words, they can be the perfect stalking tool for people who intend to use them that way.

Sandra And there are some difficulties around opting out of these platforms. First, it's quite difficult to know which databases have your data, and then it's quite difficult to have your data removed from those databases. As we've seen in the case of PimEyes, you would have to upload a driver's licence or a passport that identifies you, providing even more information, just to verify that it's really you asking for the data to be removed. And in the case of Clearview, we've seen the UK government really struggling to get the data of UK citizens off the platform.

Kai And it's important to note that in the case of PimEyes, it will only remove this from the search results, not from the web as such. So other tools might still find that information. And in the case of Clearview AI, the company doesn't seem to give a...

Sandra Yeah, a spokesperson for the company said that they will basically not comply.

Kai Because they're a US service, and the US has different laws, and what do they care, because they do not currently provide services into the UK.

Sandra Yep, they're not subject to the ICO's jurisdiction. And they don't do business in the UK at this time.

Kai Which raises the question of how a government can protect its citizens online when the companies are overseas. Because clearly, the UK struggles here.

Sandra And there are many such data sets that pose similar dilemmas, right? Take the Chinese dataset WebFace260M, which is built for training facial recognition AI, and again uses faces and names scraped from IMDb and Google Images.

Kai Among other things.

Sandra Among other things. The dataset might or might not be governed by Chinese law. The faces are not entirely of Chinese citizens, there are obviously international faces. So there's really not a lot of clarity yet around what laws and what jurisdictions govern the data that goes into these apps.

Kai At the same time, there was an article, and we'll put it in the shownotes, saying that many more countries these days try to counter this by cracking down on its use and coming up with local regulation, again bumping up against the global nature of the internet.

Sandra And speaking of that global nature of the internet, we can draw some conclusions here around the bigger picture on AI, because facial recognition algorithms are one of the really visible applications of machine learning. But the issues that we've discussed here, especially around the data that goes into them, go far beyond just facial recognition. And the article mentions as well that these public datasets scraped from the internet are in large part what fuels the really powerful AI applications that we're seeing out there.

Kai And it goes to both the use of personal material, not just people's faces, but also people's texts, copyrighted material, stuff that is used for algorithms like GPT-3, text generation software that is based on a large body of text from the internet.

Sandra GPT-3 uses posts, blogs, links, harvested from Reddit and places like that.

Kai And then uses those text blocks to generate new text. And this can then also include, again, personal information, because sometimes that text is fairly random. It is not as personal as, you know, identification of someone's face. But who is to say that these systems do not also invade people's privacy?

Sandra Well, we've also seen voice generation, and we've spoken on the show about a Roomba that maps people's homes and generates data that way.

Kai We had Strava, the exercise app that creates these social data sets, which have previously been used to, you know, locate American air bases, and again posed privacy issues.

Sandra But pretty much anything that you've seen in the news that uses public data sets has similar dilemmas at its heart. DALL-E 2, the visual AI that's been all over the news in the past couple of weeks, is built on images scraped from the internet and can generate new images. And that software can also be tempted into generating faces of people, not just scenery or interesting images.

Kai So this is once again a reminder of the importance of data in creating these algorithms, machine learning, deep learning. On the one hand, the accuracy of the data is really important, you know, sometimes there are problems, misidentification. But these systems also only exist because we have these huge amounts of data on the internet. And using that data is not unproblematic: knowing where your data is coming from, and whether you have the right to use it, much of that has not been fully explored yet.

Sandra But keeping an eye on what the Information Commissioner's Office in the UK will do is one of those interesting cases to watch.

Kai As is the legality of systems like PimEyes, and whether this will draw the attention of governments around the world.

Sandra But I think that is all we have time for today.

Kai Thank you. Thanks for listening.

Sandra Thanks for listening.

Outro You've been listening to The Future, This Week from The University of Sydney Business School. Sandra Peter is the Director of Sydney Business Insights, and Kai Riemer is Professor of Information Technology and Organisation. Connect with us on LinkedIn, Twitter, and WeChat. And follow, like, or leave us a rating wherever you get your podcasts. If you have any weird or wonderful topics for us to discuss, send them to sbi@sydney.edu.au.
