"Face recognition, bad people and bad data" post shows the danger with bad training data and implications #AI #machineLearning #faceRecognition | KILUCRU | Scoop.it

We worry about face recognition just as we worried about databases - we worry what happens if they contain bad data and we worry what bad people might do with them.

My favourite example of what can go wrong here comes from a project for recognising cancer in photos of skin. The obvious problem is that you might not have an appropriate distribution of samples of skin in different tones. But another problem that can arise is that dermatologists tend to put rulers in the photo of cancer, for scale - so if all the examples of ‘cancer’ have a ruler and all the examples of ‘not-cancer’ do not, that might be a lot more statistically prominent than those small blemishes. You inadvertently built a ruler-recogniser instead of a cancer-recogniser.

The structural thing to understand here is that the system has no understanding of what it’s looking at - it has no concept of skin or cancer or colour or gender or people or even images. It doesn’t know what these things are any more than a washing machine knows what clothes are. It’s just doing a statistical comparison of data sets. So, again - what is your data set? How is it selected? What might be in it that you don’t notice - even if you’re looking? How might different human groups be represented in misleading ways? And what might be in your data that has nothing to do with people and no predictive value, yet affects the result? Are all your ‘healthy’ photos taken under incandescent light and all your ‘unhealthy’ pictures taken under LED light? You might not be able to tell, but the computer will be using that as a signal.


Via Farid Mheir