Machine Learning, self-explainability and ethics

Jason Yosinski gives a really good video demonstration of the Deep Visualization Toolbox http://yosinski.com/deepvis (near the bottom of the page). This toolbox allows users to understand how their trained image classification deep neural network is thinking under the hood - not just that it recognizes bookshelves, but all the things that go into recognizing a bookshelf, for example, the edges of the bookshelf and the text on the binding of the books.
Ideas on self-explainability
I think it becomes clear from this video that the individual neurons in different layers implicitly learn things like edge detection, text detection, and face detection, and that the last layer of the network is essentially running classification on the output of those intermediate steps to detect the things that were labeled in the image dataset - animals, objects, etc.

The idea I have regarding explainability is this: instead of relying on implicit feature detection (where you then have to figure out by experimentation, as in the video, what each neuron is doing), build in explicit feature detection. Make the final output (or, more likely, the second-to-last layer) a set of labels for tangible things like "fur", "cat ears", "dog tail", "human nose", etc., and then run classification on the output of that layer. The network might lose some accuracy - the hope behind the current approach is that whatever features the network learns implicitly, while less understandable to humans, are more expressive for final classification than explicit intermediate features like "cat ears" - but it might also keep a similar level of accuracy while gaining the ability to explain itself to humans.

The catch is that this would take a lot more effort to label the data. For every bookshelf image you would also have to label "no cat ears", "no dog ears", etc., and for every cat image you would have to label "no books", "no bookshelves", etc. Worse, images can contain both cats and bookshelves, cats can be fur-less or tail-less, and bookshelves can be book-less.
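To make the idea concrete, here is a rough sketch of what that kind of explicit intermediate layer could look like, assuming a PyTorch-style setup. The attribute names, class names, layer sizes, and the ExplainableClassifier name are all made up for illustration; this is not what the network in the video does, just one way the idea could be wired up.

    # A minimal sketch of the "explicit intermediate features" idea, assuming PyTorch.
    # All names and sizes here are hypothetical, chosen only to illustrate the wiring.
    import torch
    import torch.nn as nn

    ATTRIBUTES = ["fur", "cat ears", "dog tail", "human nose", "book spines"]  # hypothetical attribute labels
    CLASSES = ["cat", "dog", "person", "bookshelf"]                            # hypothetical final classes

    class ExplainableClassifier(nn.Module):
        def __init__(self, backbone_features=512):
            super().__init__()
            # Any image feature extractor could go here; a tiny CNN stands in for it.
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, backbone_features), nn.ReLU(),
            )
            # Second-to-last layer: explicit, human-readable attributes (multi-label).
            self.attribute_head = nn.Linear(backbone_features, len(ATTRIBUTES))
            # The final classifier sees *only* the attribute scores, so every
            # prediction can be explained in terms of them.
            self.class_head = nn.Linear(len(ATTRIBUTES), len(CLASSES))

        def forward(self, images):
            features = self.backbone(images)
            attributes = torch.sigmoid(self.attribute_head(features))  # e.g. "fur": 0.92
            class_logits = self.class_head(attributes)
            return class_logits, attributes

    # Training would need two losses: a multi-label loss on the attribute labels and
    # cross-entropy on the class labels, which is exactly why the labeling burden
    # described above goes up.
    model = ExplainableClassifier()
    images = torch.randn(2, 3, 64, 64)  # fake batch of images
    class_logits, attributes = model(images)

The design choice that matters is the bottleneck: because the class head only receives the attribute scores, you can read off "it called this a cat because it saw fur and cat ears" directly, at the cost of the extra attribute annotations.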
Ethics, and thoughts on how to improve

I think that using images of a company's current employees to train a classifier that determines whether a job candidate would be a good employee is extremely unethical, for multiple reasons. The first and biggest glaring assumption is that current employees are good employees. Another assumption is that there is a correlation between someone's appearance and how good an employee they are - I think that is inherently flawed, and that many outliers would be heavily discriminated against by such a model. Another problem (and one of the main concerns I hear in discussions of machine-learning ethics) is that this re-encodes existing human bias directly into the model and quantifies it in a way that is even harder to break. If the current and past hiring managers were biased against a particular ethnic group, then the model would be biased against that ethnic group, and it is likely that no one from that group would ever be able to break a barrier that is now encoded into a model.

If people want to use machine learning to augment the hiring process, they will need a vastly different approach than this one. The output should not be "probably a good employee" or "probably not a good employee"; as I said before, it could be something more tangible, like "critical thinker" or "team player" - the same things that human interviewers look for in candidates. (Although I have a vague idea of what the output of such a model could look like based on those examples, I am not sure what the input would be - a written essay, a video or audio interview clip, etc. - and I'm sure each of those has its own problems.)

Additional issues inherent to the approach described above
Another problem with their model: even assuming that images of current employees are good indicators of good employees, training a classifier also requires data that contrasts with the positive label - that is, negatively labeled data, images of people who are not good employees. Where does that data come from, who decides what counts, and how do they decide? Even if this model could conceivably work, it would need far more data than just images of all current and past employees, and gathering that much data means many people have to collect and label it. Those people would need to be both capable of judging good and bad employees and trustworthy enough to label according to the metrics they are asked to use. If someone decides to give a leg up to other people in their own group - an ethnic group or any other kind of group - rather than labeling by good/bad employee status, the model will behave differently than expected. And if the labelers do label the data exactly as asked, it starts to feel like the Milgram experiment (Wikipedia).
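Just to illustrate the first point (that you cannot train a classifier from positives alone), here is a tiny sketch assuming scikit-learn. The feature arrays are random stand-ins, not real data, and the variable names are invented for the example.

    # A binary classifier literally cannot be fit on positive examples alone,
    # so someone has to supply (and therefore define) the negative examples.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_positive = rng.normal(size=(100, 16))   # stand-in features for "current employees"
    y_positive = np.ones(100)                 # every label is "good employee"

    clf = LogisticRegression()
    try:
        clf.fit(X_positive, y_positive)
    except ValueError as err:
        print(err)  # scikit-learn refuses: it needs samples from at least two classes

    # Any workable dataset has to add negatives, and whoever labels them is
    # encoding their own judgment of what a "bad employee" looks like.
    X_negative = rng.normal(size=(100, 16))   # where does this data come from?
    y_negative = np.zeros(100)
    clf.fit(np.vstack([X_positive, X_negative]),
            np.concatenate([y_positive, y_negative]))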
