Deep Learning and Biased AI

Peter Mawhorter

October 5th, 2020

Bias in AI

Readings

Group Discussion:

What are the sources of bias in AI systems?
What kinds of bias appear in these systems?

Group Discussion:

Who is responsible for harm done by an AI system?

Group Discussion:

If you were a movie villain, how would you use ostensibly harmless/helpful AI to do evil?

Bias

  • AI both amplifies and disguises bias.
    • Machines are wrongly seen as unbiased.
    • In fact, they thrive on biased data.
  • Theory-free AI is especially prone to bias.

Data Science

  • Normally, a model is based on a theory.
  • Data confirms or complicates the model.
  • Theory is revised to produce a new model.

“Data science” sometimes skips a step and creates a model directly from data.

Theory-free Models

  • A theory-free model doesn’t teach us anything.
  • We can’t critique the theory behind it.
  • It might still “work” very well.

Models and Authority

  • Scientific models can be components of oppressive systems.
    • Provide “justifications” for racism, colonialism, etc.
    • Put more power in the hands of the powerful.
    • “Science” sounds authoritative.
  • It’s easy to focus on whether a model or theory works, without ever asking what effects it will have.

Regression

Regression

  • Normal math: solve an equation.
  • Regression: match an equation to data.

A regression line showing the error of several data points 

Regression

If we have $n$ data points $(x_i, y_i)$, our error is the sum of squared errors:

E = \sum_{i=1}^{n} (y_i - f(x_i))^2

We can find an analytical minimum if our function $f$ is simple, like $f(x) = mx + b$.
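A minimal sketch of that analytical solution (using numpy; the data points here are made up):

```python
import numpy as np

# Hypothetical data points (x_i, y_i); any small data set works here.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.9, 3.1, 4.8, 7.2, 9.1])

# For f(x) = m*x + b, setting the derivatives of the squared error
# to zero gives the standard least-squares solution:
m = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b = y.mean() - m * x.mean()

error = ((y - (m * x + b)) ** 2).sum()
print(m, b, error)
```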

Gradient Descent

  • What if $f(x)$ is too complicated to solve the error equation directly?
  • We can still search for a minimum numerically (e.g., via Newton’s method) as long as the error function is differentiable.
  • Basic idea:
    1. Take the derivative of our error function with respect to the constants in that function, $\frac{\partial E}{\partial C}$.
    2. Nudge those constants in the direction that reduces the error.
    3. Repeat until we get to zero error (or a minimum).
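A minimal sketch of this loop for $f(x) = mx + b$ and the squared error $E$ above (numpy; the data and the learning rate of 0.01 are arbitrary choices):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.9, 3.1, 4.8, 7.2, 9.1])

m, b = 0.0, 0.0      # start with arbitrary constants
learning_rate = 0.01

for step in range(1000):
    residual = y - (m * x + b)
    # Partial derivatives of E = sum((y - (m*x + b))^2):
    dE_dm = -2 * (residual * x).sum()
    dE_db = -2 * residual.sum()
    # Nudge the constants in the direction that reduces the error.
    m -= learning_rate * dE_dm
    b -= learning_rate * dE_db

print(m, b)
```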

Newton’s method illustrated, with a curve and a point on that curve. From that point, a tangent line is drawn which creates another point where it crosses zero, and from that point, we go back up to the curve, and then draw another tangent line which intersects zero closer to where the curve does. 

Several regression lines in gray leading towards a blue regression line that’s a better fit with some data points. 

Overfitting

  • Regression is only meaningful if the function is suitable.
  • A complex enough function can fit any data.
  • Only a theory can give you confidence to interpolate or extrapolate.

A very curvy line that goes through a set of data points exactly, compared to a straight line which isn’t exact but keeps going in the same direction as the data does at the ends. 
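A sketch of the problem using numpy’s polyfit (the data here is made up): a degree-9 polynomial can pass through all ten points almost exactly, but a straight line extrapolates far more sensibly.

```python
import numpy as np

# Ten noisy points along a roughly linear trend (made-up data).
x = np.linspace(0, 9, 10)
y = x + np.random.normal(0, 0.5, size=10)

line = np.polyfit(x, y, deg=1)   # simple model: 2 constants
curvy = np.polyfit(x, y, deg=9)  # complex model: 10 constants

# The degree-9 polynomial fits every point with (near-)zero error...
print(np.polyval(curvy, x) - y)
# ...but just past the data, it extrapolates wildly, while the line
# keeps going in the same direction the data does.
print(np.polyval(line, 10), np.polyval(curvy, 10))
```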

Deep Learning

Videos

Core Concept

  • Create a really general function.
  • Fit it to whatever data you have.
  • Use the fitted function.

Training

  • The gradient is expensive to compute across thousands of data points.
  • Instead, compute the gradient for just a few data points at a time (a “minibatch”), and repeat this process over and over.
  • Use a small learning rate to move slowly.
  • Problems include overfitting as well as local minima of the error function.
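A sketch of the minibatch loop; the `gradient(params, batch)` helper here is hypothetical, standing in for whatever per-batch gradient computation the model needs:

```python
import numpy as np

def sgd(params, data, gradient, learning_rate=0.001, batch_size=32, epochs=10):
    """Minibatch gradient descent. `gradient(params, batch)` is assumed
    to return the gradient of the error on just that batch."""
    n = len(data)
    for epoch in range(epochs):
        np.random.shuffle(data)  # visit batches in a different order each epoch
        for start in range(0, n, batch_size):
            batch = data[start:start + batch_size]
            grad = gradient(params, batch)
            # Small learning rate: move slowly toward lower error.
            params = params - learning_rate * grad
    return params
```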

A Neural Network

y_j = f(\sum_{i}{w_{j,i} \cdot x_i})

\left[ \begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{matrix} \right] = f \left( \left[ \begin{matrix} w_{1,1} & w_{1,2} & \cdots & w_{1,n}\\ w_{2,1} & w_{2,2} & \cdots & w_{2,n}\\ \vdots & \vdots & \ddots & \vdots \\ w_{m,1} & w_{m,2} & \cdots & w_{m,n}\\ \end{matrix} \right] \cdot \left[ \begin{matrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{matrix} \right] \right)

Network Structure

  • Matrix of weights multiplied by input vector.
  • Apply an activation function to each output.
  • Simulates neurons in the brain: several signals are combined with different connection strengths to trigger an output.
  • Weights control connection strengths; activation function determines result.

Sigmoid

f(x) = \frac{1}{1 + e^{-kx}}

A sigmoid function
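A minimal numpy sketch of one layer, using the sigmoid (with $k = 1$ by default) as the activation function; the weights here are made up:

```python
import numpy as np

def sigmoid(x, k=1.0):
    """Squashes any input into (0, 1); k controls the steepness."""
    return 1.0 / (1.0 + np.exp(-k * x))

def layer(W, x):
    """One network layer: weight matrix times input vector,
    then the activation function applied to each output."""
    return sigmoid(W @ x)

# A made-up layer mapping 3 inputs to 2 outputs.
W = np.array([[0.5, -1.0, 0.25],
              [1.5,  0.0, -0.75]])
x = np.array([1.0, 2.0, 3.0])
print(layer(W, x))  # a length-2 output vector
```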

Deep Networks

  • A deep network has multiple layers.
  • Input from one layer is fed into another layer.
  • Size of output is usually smaller each time.
  • Millions to billions of parameters (weights).
    • Let a computer calculate the derivative…
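A minimal sketch using PyTorch (one popular library choice, not specified here) to stack layers and let the computer calculate the derivatives:

```python
import torch
from torch import nn

# A small deep network: each layer's output feeds the next layer,
# and the output size shrinks each time (4 -> 3 -> 2).
model = nn.Sequential(
    nn.Linear(4, 3), nn.Sigmoid(),
    nn.Linear(3, 2), nn.Sigmoid(),
)

x = torch.randn(10, 4)       # a batch of 10 made-up inputs
target = torch.rand(10, 2)   # made-up target outputs

loss = ((model(x) - target) ** 2).mean()
loss.backward()              # autograd computes every dE/dw for us

# Each weight now has a gradient we could use for a descent step:
print(model[0].weight.grad.shape)  # torch.Size([3, 4])
```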

Example Target Function

Input ($x$)   Output ($y$)
A cat         [1, 0]
A dog         [0, 1]

What does it learn?

Video excerpt

  • Under the right conditions, it learns what humans do: how to find edges, and then patterns of edges, etc.
  • The right conditions are tricky to produce.

What does it need?

  • A huge amount of data
    • At minimum, hundreds of images; millions for high quality
  • Labeled data, so that we can compute error values
    • All sources of labeled training data are biased 🙁

Autoencoding

  • Set $f(x) = x$ and ask the network to learn the identity function
    • Network has to learn compression, because its internal state is smaller than the original image.
  • Learn the basics from unlabeled data
    • Unlabeled data is much easier to come by
    • All sources of unlabeled training data are also biased 🙁
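A minimal autoencoder sketch in PyTorch (the layer sizes are made up, assuming 28×28 grayscale images flattened to 784 values):

```python
import torch
from torch import nn

# Autoencoder: the internal state (32 values) is much smaller than the
# input (784 values), so the network must learn a compressed code.
encoder = nn.Sequential(nn.Linear(784, 32), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

x = torch.rand(16, 784)              # a batch of made-up flattened images
reconstruction = decoder(encoder(x))

# The target is the input itself: f(x) = x, so no labels are needed.
loss = ((reconstruction - x) ** 2).mean()
loss.backward()
```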

Applications

Applications

Discussion

How might biases affect these applications of deep learning?

  • Image captions
  • Deep fakes
  • Generating art
  • Face recognition
  • Image super-resolution
  • Crime prediction

How should we feel about deep learning and AI as a scientific project?

My Work

Computational Creativity

  • Can computers be creative?
  • What does it mean to be “creative?”
  • Could computers support human creativity?

Critical (Computational) Creativity

  • Critique-by-reply.
  • Building generators to deeply understand systems.
  • A mix of computer science and media studies.
    • Focus on interactive media, especially video games.

Past Projects

Measuring “novelty” using an auto-encoder network.

Three Mii images with blurred versions beneath each. On the left, a brown-haired man close to the default settings, in the middle, a woman wearing glasses, and on the right, some kind of alien with eyes and mouth transposed. The blurred versions of the left two faces look similar to their base images, but the right face’s blurred image is very different. 

A lineup of different Mii images by novelty, showing four examples from each bin as well as an average for eight bins from left to right. The examples are mostly more similar to the default Mii on the left and less similar on the right. 

A network structure diagram for a symmetric convolutional autoencoder network with two convolutional layers and three fully connected layers. 
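A plausible sketch of the novelty measure (the encoder/decoder here are randomly initialized stand-ins; the actual project used the convolutional autoencoder diagrammed above):

```python
import torch
from torch import nn

# Hypothetical trained autoencoder halves (untrained stand-ins here).
encoder = nn.Sequential(nn.Linear(784, 32), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

def novelty(images):
    """Score images by reconstruction error: faces the autoencoder
    reconstructs poorly (like the transposed alien face) count as
    more novel than faces it reconstructs well."""
    with torch.no_grad():
        reconstructions = decoder(encoder(images))
    return ((reconstructions - images) ** 2).mean(dim=1)

print(novelty(torch.rand(3, 784)))  # one novelty score per image
```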

Bonus

Convolutional Networks

  • Like the eye, apply the same neural structure in parallel to many pixels.
  • Output is a smaller field of pattern vectors.
  • Combine pattern vectors using pooling to reduce processing requirements.
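A minimal sketch in PyTorch (the filter count and sizes are made up):

```python
import torch
from torch import nn

# The same small filter (neural structure) is applied in parallel
# across every position in the image.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
pool = nn.MaxPool2d(2)  # pooling combines neighboring pattern vectors

image = torch.rand(1, 3, 64, 64)   # one made-up 64x64 RGB image
patterns = conv(image)             # -> (1, 8, 62, 62) field of pattern vectors
smaller = pool(patterns)           # -> (1, 8, 31, 31) after pooling
print(smaller.shape)
```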

Convolutional Network Explanation

Recurrent Networks

  • Remember information across multiple activations using a “memory” input in addition to the usual input.
  • Output a new “memory” vector in addition to usual output.
  • Useful for processing or generating sequences with variable length, like text.
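A minimal sketch of the memory-in, memory-out pattern using PyTorch’s LSTM cell (the sizes are made up; LSTMs actually carry two memory vectors):

```python
import torch
from torch import nn

cell = nn.LSTMCell(input_size=10, hidden_size=20)

# The "memory" starts out empty.
h = torch.zeros(1, 20)
c = torch.zeros(1, 20)

sequence = torch.rand(5, 1, 10)  # a made-up 5-step input sequence
for step in sequence:
    # Each activation takes the usual input plus the memory,
    # and outputs a new memory alongside the usual output.
    h, c = cell(step, (h, c))

print(h.shape)  # final output after the whole sequence
```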

LSTM Explanation