Deep Learning and Biased AI

Peter Mawhorter

October 5th, 2020

Bias in AI

Readings

Group Discussion:

What are the sources of bias in AI systems?
What kinds of bias appear in these systems?

Group Discussion:

Who is responsible for harm done by an AI system?

Group Discussion:

If you were a movie villain, how would you use ostensibly harmless/helpful AI to do evil?

Bias

  • AI both amplifies and disguises bias.
    • Machines are wrongly seen as unbiased.
    • In fact, they thrive on biased data.
  • Theory-free AI is especially prone to bias.

Data Science

  • Normally, a model is based on a theory.
  • Data confirms or complicates the model.
  • Theory is revised to produce a new model.

“Data science” sometimes skips a step and creates a model directly from data.

Theory-free Models

  • A theory-free model doesn’t teach us anything.
  • We can’t critique the theory behind it.
  • It might still “work” very well.

Models and Authority

  • Scientific models can be components of oppressive systems.
    • Provide “justifications” for racism, colonialism, etc.
    • Put more power in the hands of the powerful.
    • “Science” sounds authoritative.
  • It’s easy to focus on whether a model or theory works, without ever asking what effects it will have.

Regression

Regression

  • Normal math: solve an equation.
  • Regression: match an equation to data.

A regression line showing the error of several data points 

Regression

If we have $n$ data points $(x_i, y_i)$, our error is the sum of squared errors:

E = \sum_{i=1}^{n} (y_i - f(x_i))^2

We can find an analytical minimum if our function $f$ is simple, like $f(x) = mx + b$.
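A minimal sketch of that analytical solution (using numpy; the data points here are made up):

```python
import numpy as np

# Hypothetical data points (x_i, y_i); any small data set works here.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.9, 3.1, 4.8, 7.2, 9.1])

# For f(x) = m*x + b, setting the derivatives of the squared error
# to zero gives the standard least-squares solution:
m = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b = y.mean() - m * x.mean()

error = ((y - (m * x + b)) ** 2).sum()
print(m, b, error)
```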

Gradient Descent

  • What if $f(x)$ is too complicated to solve the error equation directly?
  • We can still search for a minimum numerically (e.g., via Newton’s method) as long as the error function is differentiable.
  • Basic idea:
    1. Take the derivative of our error function with respect to the constants in that function, $\frac{\partial E}{\partial C}$.
    2. Nudge those constants in the direction that reduces the error.
    3. Repeat until we get to zero error (or a minimum).
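A minimal sketch of this loop for $f(x) = mx + b$ and the squared error $E$ above (numpy; the data and the learning rate of 0.01 are arbitrary choices):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.9, 3.1, 4.8, 7.2, 9.1])

m, b = 0.0, 0.0      # start with arbitrary constants
learning_rate = 0.01

for step in range(1000):
    residual = y - (m * x + b)
    # Partial derivatives of E = sum((y - (m*x + b))^2):
    dE_dm = -2 * (residual * x).sum()
    dE_db = -2 * residual.sum()
    # Nudge the constants in the direction that reduces the error.
    m -= learning_rate * dE_dm
    b -= learning_rate * dE_db

print(m, b)
```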

Newton’s method illustrated, with a curve and a point on that curve. From that point, a tangent line is drawn which creates another point where it crosses zero, and from that point, we go back up to the curve, and then draw another tangent line which intersects zero closer to where the curve does. 

Several regression lines in gray leading towards a blue regression line that’s a better fit with some data points. 

Overfitting

  • Regression is only meaningful if the function is suitable.
  • A complex enough function can fit any data.
  • Only a theory can give you confidence to interpolate or extrapolate.

A very curvy line that goes through a set of data points exactly, compared to a straight line which isn’t exact but keeps going in the same direction as the data does at the ends. 
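A sketch of the problem using numpy’s polyfit (the data here is made up): a degree-9 polynomial can pass through all ten points almost exactly, but a straight line extrapolates far more sensibly.

```python
import numpy as np

# Ten noisy points along a roughly linear trend (made-up data).
x = np.linspace(0, 9, 10)
y = x + np.random.normal(0, 0.5, size=10)

line = np.polyfit(x, y, deg=1)   # simple model: 2 constants
curvy = np.polyfit(x, y, deg=9)  # complex model: 10 constants

# The degree-9 polynomial fits every point with (near-)zero error...
print(np.polyval(curvy, x) - y)
# ...but just past the data, it extrapolates wildly, while the line
# keeps going in the same direction the data does.
print(np.polyval(line, 10), np.polyval(curvy, 10))
```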

Deep Learning

Videos

Core Concept

  • Create a really general function.
  • Fit it to whatever data you have.
  • Use the fitted function.

Training

  • The gradient is expensive to compute across thousands of data points.
  • Instead, compute the gradient for just a few data points at a time (a “minibatch”), and repeat this process over and over.
  • Use a small learning rate to move slowly.
  • Problems include overfitting as well as local minima of the error function.
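A sketch of the minibatch loop; the `gradient(params, batch)` helper here is hypothetical, standing in for whatever per-batch gradient computation the model needs:

```python
import numpy as np

def sgd(params, data, gradient, learning_rate=0.001, batch_size=32, epochs=10):
    """Minibatch gradient descent. `gradient(params, batch)` is assumed
    to return the gradient of the error on just that batch."""
    n = len(data)
    for epoch in range(epochs):
        np.random.shuffle(data)  # visit batches in a different order each epoch
        for start in range(0, n, batch_size):
            batch = data[start:start + batch_size]
            grad = gradient(params, batch)
            # Small learning rate: move slowly toward lower error.
            params = params - learning_rate * grad
    return params
```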

A Neural Network

y_j = f(\sum_{i}{w_{j,i} \cdot x_i})

\left[ \begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{matrix} \right] = f \left( \left[ \begin{matrix} w_{1,1} & w_{1,2} & \cdots & w_{1,n}\\ w_{2,1} & w_{2,2} & \cdots & w_{2,n}\\ \vdots & \vdots & \ddots & \vdots \\ w_{m,1} & w_{m,2} & \cdots & w_{m,n}\\ \end{matrix} \right] \cdot \left[ \begin{matrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{matrix} \right] \right)

Network Structure

  • Matrix of weights multiplied by input vector.
  • Apply an activation function to each output.
  • Simulates neurons in the brain: several signals are combined with different connection strengths to trigger an output.
  • Weights control connection strengths; activation function determines result.

Sigmoid

f(x) = \frac{1}{1 + e^{-kx}}

A sigmoid function
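A minimal numpy sketch of one layer, using the sigmoid (with $k = 1$ by default) as the activation function; the weights here are made up:

```python
import numpy as np

def sigmoid(x, k=1.0):
    """Squashes any input into (0, 1); k controls the steepness."""
    return 1.0 / (1.0 + np.exp(-k * x))

def layer(W, x):
    """One network layer: weight matrix times input vector,
    then the activation function applied to each output."""
    return sigmoid(W @ x)

# A made-up layer mapping 3 inputs to 2 outputs.
W = np.array([[0.5, -1.0, 0.25],
              [1.5,  0.0, -0.75]])
x = np.array([1.0, 2.0, 3.0])
print(layer(W, x))  # a length-2 output vector
```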

Deep Networks

  • A deep network has multiple layers.
  • Input from one layer is fed into another layer.
  • Size of output is usually smaller each time.
  • Millions to billions of parameters (weights).
    • Let a computer calculate the derivative…
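A minimal sketch using PyTorch (one popular library choice, not specified here) to stack layers and let the computer calculate the derivatives:

```python
import torch
from torch import nn

# A small deep network: each layer's output feeds the next layer,
# and the output size shrinks each time (4 -> 3 -> 2).
model = nn.Sequential(
    nn.Linear(4, 3), nn.Sigmoid(),
    nn.Linear(3, 2), nn.Sigmoid(),
)

x = torch.randn(10, 4)       # a batch of 10 made-up inputs
target = torch.rand(10, 2)   # made-up target outputs

loss = ((model(x) - target) ** 2).mean()
loss.backward()              # autograd computes every dE/dw for us

# Each weight now has a gradient we could use for a descent step:
print(model[0].weight.grad.shape)  # torch.Size([3, 4])
```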

Example Target Function

Input ($x$)   Output ($y$)
A cat         [1, 0]
A dog         [0, 1]

What does it learn?

Video excerpt

  • Under the right conditions, it learns what humans do: how to find edges, and then patterns of edges, etc.
  • The right conditions are tricky to produce.

What does it need?

  • A huge amount of data
    • At minimum, hundreds of images; millions for high quality
  • Labeled data, so that we can compute error values
    • All sources of labeled training data are biased 🙁

Autoencoding

  • Set $f(x) = x$ and ask the network to learn the identity function
    • Network has to learn compression, because its internal state is smaller than the original image.
  • Learn the basics from unlabeled data
    • Unlabeled data is much easier to come by
    • All sources of unlabeled training data are also biased 🙁
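A minimal autoencoder sketch in PyTorch (the layer sizes are made up, assuming 28×28 grayscale images flattened to 784 values):

```python
import torch
from torch import nn

# Autoencoder: the internal state (32 values) is much smaller than the
# input (784 values), so the network must learn a compressed code.
encoder = nn.Sequential(nn.Linear(784, 32), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

x = torch.rand(16, 784)              # a batch of made-up flattened images
reconstruction = decoder(encoder(x))

# The target is the input itself: f(x) = x, so no labels are needed.
loss = ((reconstruction - x) ** 2).mean()
loss.backward()
```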

Applications

Applications

Discussion

How might biases affect these applications of deep learning?

  • Image captions
  • Deep fakes
  • Generating art
  • Face recognition
  • Image super-resolution
  • Crime prediction

How should we feel about deep learning and AI as a scientific project?

My Work

Computational Creativity

  • Can computers be creative?
  • What does it mean to be “creative?”
  • Could computers support human creativity?

Critical (Computational) Creativity

  • Critique-by-reply.
  • Building generators to deeply understand systems.
  • A mix of computer science and media studies.
    • Focus on interactive media, especially video games.

Past Projects

Measuring “novelty” using an auto-encoder network.

Three Mii images with blurred versions beneath each. On the left, a brown-haired man close to the default settings, in the middle, a woman wearing glasses, and on the right, some kind of alien with eyes and mouth transposed. The blurred versions of the left two faces look similar to their base images, but the right face’s blurred image is very different. 

A lineup of different Mii images by novelty, showing four examples from each bin as well as an average for eight bins from left to right. The examples are mostly more similar to the default Mii on the left and less similar on the right. 

A network structure diagram for a symmetric convolutional autoencoder network with two convolutional layers and three fully connected layers. 
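A plausible sketch of the novelty measure (the encoder/decoder here are randomly initialized stand-ins; the actual project used the convolutional autoencoder diagrammed above):

```python
import torch
from torch import nn

# Hypothetical trained autoencoder halves (untrained stand-ins here).
encoder = nn.Sequential(nn.Linear(784, 32), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

def novelty(images):
    """Score images by reconstruction error: faces the autoencoder
    reconstructs poorly (like the transposed alien face) count as
    more novel than faces it reconstructs well."""
    with torch.no_grad():
        reconstructions = decoder(encoder(images))
    return ((reconstructions - images) ** 2).mean(dim=1)

print(novelty(torch.rand(3, 784)))  # one novelty score per image
```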

Bonus

Convolutional Networks

  • Like the eye, apply the same neural structure in parallel to many pixels.
  • Output is a smaller field of pattern vectors.
  • Combine pattern vectors using pooling to reduce processing requirements.
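A minimal sketch in PyTorch (the filter count and sizes are made up):

```python
import torch
from torch import nn

# The same small filter (neural structure) is applied in parallel
# across every position in the image.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
pool = nn.MaxPool2d(2)  # pooling combines neighboring pattern vectors

image = torch.rand(1, 3, 64, 64)   # one made-up 64x64 RGB image
patterns = conv(image)             # -> (1, 8, 62, 62) field of pattern vectors
smaller = pool(patterns)           # -> (1, 8, 31, 31) after pooling
print(smaller.shape)
```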

Convolutional Network Explanation

Recurrent Networks

  • Remember information across multiple activations using a “memory” input in addition to the usual input.
  • Output a new “memory” vector in addition to usual output.
  • Useful for processing or generating sequences with variable length, like text.
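A minimal sketch of the memory-in, memory-out pattern using PyTorch’s LSTM cell (the sizes are made up; LSTMs actually carry two memory vectors):

```python
import torch
from torch import nn

cell = nn.LSTMCell(input_size=10, hidden_size=20)

# The "memory" starts out empty.
h = torch.zeros(1, 20)
c = torch.zeros(1, 20)

sequence = torch.rand(5, 1, 10)  # a made-up 5-step input sequence
for step in sequence:
    # Each activation takes the usual input plus the memory,
    # and outputs a new memory alongside the usual output.
    h, c = cell(step, (h, c))

print(h.shape)  # final output after the whole sequence
```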

LSTM Explanation