{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Week 3: Linear Algebra with Python\n", "\n", "09/19/2017\n", "\n", "In this notebook we'll look at some linear algebra concepts, namely, vectors and some operations with them, as well as matrices. For educational purposes, we are writing Python code to implement these operations. That is not efficient. However, soon we'll talk about the NumPy library that does things more efficiently.\n", "\n", "**Credit:** Some part of this material is based on the chapter \"Linear Algebra\" from the book \"Data Science from Scratch\" by Joel Gus." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Vectors\n", "One way to think of vectors is as objects that can be added together (to form new vectors) or that can be multiplied by scalars (i.e., numbers), also to form new vectors. Another way to look at vectors is as points in some finite-dimensional space. Every value across some dimension is an element of the vector. Visually, the vector will be the line that connect the origin of coordinates for the space with the point. \n", "\n", "Although you might not think of data as vectors, they are a good way to represent numeric data.\n", "\n", "For example, if we have the heights, weights, and ages of a large number of people, we can treat this data as three-dimensional vectors `(height, weight, age)`. If we're taking a class with four exams, we can view the student grades as four-dimensional vectors `(exam1, exam2, exam3, exam4)`.\n", "\n", "In Python, the simplest approach is to represent vectors as lists of numbers. A list of three numbers corresponds to a vector in three-dimensional space: \n", "```\n", "height_weight_age = [70, # inches, \n", " 170, # pounds,\n", " 40 ] # years\n", "```\n", "and a list of four nunbers corresponds to a vector in four-dimensional space:\n", "```\n", "grades = [95, # exam1 \n", " 80, # exam2\n", " 75, # exam3 \n", " 62 ] # exam4\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding and subtracting vectors\n", "\n", "One common operation is to add two vectors. \n", "**Question:** If the vectors are represented as Python lists, can we simply add the two lists?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "v = [1, 2]\n", "w = [2, 1]\n", "\n", "v + w" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While this works syntactically, it's semantically meaningless, because + with lists performs concatenation. What we want is **componentwise** addition. That is, elements with the same index are added together. Let's write below a function that does that." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def vector_add(v, w):\n", " \"\"\"adds corresponding elements\"\"\" \n", " return [v_i + w_i for v_i, w_i in zip(v, w)]\n", "\n", "vector_add(v, w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similary, we can perform **subtraction** of two vectors:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def vector_subtract(v, w):\n", " \"\"\"subtracts corresponding elements\"\"\" \n", " return [v_i - w_i for v_i, w_i in zip(v, w)]\n", "\n", "vector_subtract(v, w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we have more than two vectors, given as a list of lists and want to add all of them. This means creating a new vector whose elements are sums of the corresponding items in all the other vectors:\n", "\n", "```\n", "[1, 1, 1]\n", "[2, 1, 0]\n", "[1, 0, 2]\n", "[0, 2, 1]\n", "```\n", "when we get the sum of all these vectors, we get the vector `[4, 4, 4]`. We can do this in Python as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def vector_sum(vectors):\n", " \"\"\"sums all corresponding elements\"\"\" \n", " result = vectors[0] # start with the first vector\n", " for vector in vectors[1:]: # then loop over the others\n", " result = vector_add(result, vector) # and add them to the result\n", " return result\n", "\n", "vecs = [\n", "[1, 1, 1],\n", "[2, 1, 0],\n", "[1, 0, 2],\n", "[0, 2, 1]]\n", "\n", "vector_sum(vecs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Python superpower: the `reduce` function \n", "\n", "Python's built-in function `reduce` (we have discussed `map`, `filter` and `reduce` in CS 111), can be used to simplify the function we just wrote. \n", "\n", "`reduce` is known as a high-order functin, because it can take another function as a parameter. In the example shown below, it's first argument, `vector_add`, is a function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def vector_sum(vectors):\n", " return reduce(vector_add, vectors)\n", "\n", "vector_sum(vecs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing addition of vectors\n", "\n", "For our vectors [1, 2] and [2, 1], we can draw the vector from the origin (0, 0) to the point (1, 2) or (2, 1). Then, their sum, (3, 3) will be the vector that starts at (0, 0) and ends at (3, 3). The code below displays the three vectors:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "# represent each vector line by the coordinates of their start and end points, thus, 4 values in total\n", "soa = np.array([[0, 0, 1, 2], [0, 0, 2, 1], [0, 0, 3, 3]])\n", "\n", "# given that there are 4 coordinate values: x and y for origin and u and v for the vector head (or arrow)\n", "# the line below creates lists that zips all x-vals together, all y-vals together, etc. See below for * behavior.\n", "X, Y, U, V = zip(*soa)\n", "plt.figure(figsize=(5,5))\n", "plt.quiver(X, Y, U, V, angles='xy', scale_units='xy', scale=1)\n", "plt.xlim([-1, 5])\n", "plt.ylim([-1, 5])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Aside: the `*` operator in functions\n", "\n", "In Python, it is possible to pass an indefinite number of arguments into a function. That can be done using the unpacking operator $*$. That is exatly what is happening with zip above. \n", "\n", "While we can use zip with as many lists we want, separated by commas, we can also pass a list of lists as a single argument and indicate with $*$ that the function needs to unpack the value. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = [1, 2, 3]\n", "b = [4, 5, 6]\n", "c = [7, 8, 9]\n", "d = [a, b, c]\n", "d" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "zip(a,b,c)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "zip(d) # doesn't do what we want" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "zip(*d) # does the right thing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scalar Multiplication\n", "\n", "One of the common operations with vectors is to multiply them with a scalar (a number). To do this, we multiply every element of the vector with the number, as the code below shows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def scalar_multiply(c, v):\n", " \"\"\"c is a number, v is a vector\"\"\" \n", " return [c * v_i for v_i in v]\n", "\n", "v = [1, 2, 2]\n", "c = 5\n", "scalar_multiply(c, v)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why is this useful? Assume we have a list of vectors, each representing grades of a student. For example:\n", "\n", "```\n", "[[87, 90, 92], \n", " [84, 88, 91],\n", " [95, 93, 97],\n", " [81, 81, 81]]\n", "```\n", "One thing we might be interested is: \"what are the grades of the \"average\" student in this class?\"\n", "\n", "So, we are interested in the \"vector mean\" the vector that stores the mean value of every component across all vectors. If every component is a midterm score, then we're looking for the vector that is made of all midterm averages. \n", "\n", "How would we calculate that? In two steps:\n", "\n", "1. find the vector sum (as we did above);\n", "2. multiply by 1/n, where n is the number of vectors. \n", "\n", "Let's write a function to do this." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding the vector mean" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def vector_mean(vectors):\n", " \"\"\"compute the vector whose ith element is the mean of the ith elements of the input vectors\"\"\"\n", " n = len(vectors)\n", " return scalar_multiply(1.0/n, vector_sum(vectors))\n", "\n", "grades = [\n", " [87, 90, 92], \n", " [84, 88, 91],\n", " [95, 93, 97],\n", " [81, 81, 81]]\n", "\n", "vector_mean(grades)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The dot product\n", "\n", "The dot product of two vectors is the **sum** of their componentwise products. This product is a number. An alternative formula measures the product of the magnitude of one vector with the projection of the other vector on it. [See Vector Projection](https://en.wikipedia.org/wiki/Vector_projection)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\\begin{align}\n", "\\vec{\\mathbf{v}} \\cdot \\vec{\\mathbf{w}} & = \\sum v_i w_i \\\\\n", "\\vec{\\mathbf{v}} \\cdot \\vec{\\mathbf{w}} & = |v| \\cdot |w| \\cdot \\cos\\phi\n", "\\end{align}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def dot(v, w):\n", " \"\"\"v_1 * w_1 + ... + v_n * w_n\"\"\" \n", " return sum(v_i * w_i for v_i, w_i in zip(v, w))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's imagine we have two vectors: v = [3, 0] (this extends only across the x-axis) and w = [0, 4] (extends only across the y axis). It's easy to see that this two have nothing in common, and their dot product is 0. \n", "\n", "Now, imagine that we take [0, 4] and start rotating it clockwise, so that its x-value is first 1, then 2, then 3, then 4. While we're doing this, the y-value is decreasing, because we're keeping the magnitute of the vector constant at 4. The created vectors can be seen in a plot two cells down. \n", "\n", "Now, let's calculate the dot vector of v and the moving w, to see what happens to its value." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from math import sqrt\n", "v = [3, 0]\n", "manyW = [[0, 4], \n", " [1, sqrt(4*4-1*1)], # using sqrt to calculate the y value, as we keep the magnitude constant at 4\n", " [2, sqrt(4*4-2*2)], \n", " [3, sqrt(4*4-3*3)], \n", " [4, 0]]\n", "\n", "for w in manyW:\n", " print \"dot product of {} and {} is {}\".format(v, w, dot(v, w))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# a vector is represented by two points: the origin (0, 0) and its end point, e.g. (3,0)\n", "# we have to create a list as: [0,0,3,0] to represent this vector for plotting\n", "soa = np.array([[0,0]+ v] + [[0,0]+el for el in manyW])\n", "X, Y, U, V = zip(*soa)\n", "plt.figure(figsize=(5,5))\n", "plt.quiver(X, Y, U, V, angles='xy', scale_units='xy', scale=1)\n", "plt.xlim([-1, 5])\n", "plt.ylim([-1, 5])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Why do we care about the dot product?\n", "Let's look again at the two formulas for calculating the dot product:\n", "\n", "\\begin{align}\n", "\\vec{\\mathbf{v}} \\cdot \\vec{\\mathbf{w}} & = \\sum v_i w_i \\\\\n", "\\vec{\\mathbf{v}} \\cdot \\vec{\\mathbf{w}} & = |v| \\cdot |w| \\cdot \\cos\\phi\n", "\\end{align}\n", "\n", "from these, we can derive the formula:\n", "\n", "\\begin{align}\n", "\\cos\\phi = \\frac {\\sum v_i w_i}{|v| \\cdot |w|}\n", "\\end{align}\n", "\n", "where $\\phi$ is the angle created between the two vectors. This number is also known as the cosine similarity between two vectors. When the angle is 0, the cosine is 1, indicating maximal similarity. For an angle of 90, the cosine is 0, meaning the two vectors are orthogonal, they don't have nothing in common. \n", "\n", "Cosine similarity is an important metric used in information retrieval and search engines. Read more about [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sum of squares\n", "\n", "If we try to take the dot product of a vector with itself, we get the sum of squares." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def sum_of_squares(v):\n", " \"\"\"v_1 * v_1 + ... + v_n * v_n\"\"\" \n", " return dot(v, v)\n", "\n", "v = [2, 3]\n", "sum_of_squares(v)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A vector's magnitude\n", "\n", "A vector's magnitude (or its length) in the 2D space can be calculated through the Pythagorean theorem: the square root of the sum of squares of the projections (on the two axes).\n", "\n", "$c = \\sqrt{a^2 + b^2}$\n", "\n", "Below is a visual depiction for the vector with length 5, whose projections are 3 and 4, ($5*5 = 3*3 + 4*4$)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "soa = np.array([[0, 0, 3, 0], [0, 0, 0, 4], [0, 0, 2.9, 3.9]]) # ignore 2.9 and 3.9, it's because the arrow needs space\n", "X, Y, U, V = zip(*soa)\n", "plt.figure(figsize=(5,5))\n", "plt.quiver(X, Y, U, V, angles='xy', scale_units='xy', scale=1)\n", "plt.xlim([-0.05, 5])\n", "plt.ylim([0, 5])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now write a function that calculates the magnitude of a vector. We can take as example the vector we plotted (3, 4), for which we know that the magnitude is 5." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def magnitude(v):\n", " \"\"\"the square root of the sum of squares\"\"\"\n", " return sqrt(sum_of_squares(v))\n", "\n", "magnitude([3, 4])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding the distance between two vectors\n", "\n", "The distance between vectors is the square root of the sum of squares of a vector difference. \n", "\n", "$\\sqrt{(v_1-w_1)^2 + ... + (v_n-w_n)^2}$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def squared_distance(v, w):\n", " \"\"\"(v_1 - w_1) ** 2 + ... + (v_n - w_n) ** 2\"\"\" \n", " return sum_of_squares(vector_subtract(v, w))\n", "\n", "def distance(v, w):\n", " return sqrt(squared_distance(v, w))\n", "\n", "distance(v, w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way to write it is:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def distance(v, w):\n", " return magnitude(vector_subtract(v, w))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Matrices\n", "\n", "A matrix is a two-dimensional collection of numbers. We will represent matrices as lists of lists, with each inner list having the same size and representing a row of the matrix. If A is a matrix, then $A[i][j]$ is the element in the ith row and the jth column. Per mathematical convention, we will typically use capital letters to represent matrices. For example:\n", "\n", "```\n", "A = [[1, 2, 3],\n", " [4, 5, 6]]\n", " \n", "B = [[1, 2],\n", " [3, 4],\n", " [5, 6]]\n", "```\n", "\n", "Given this list-of-lists representation, the matrix A has `len(A)` rows and `len(A[0])` columns, which we consider its **shape**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def shape(A):\n", " num_rows = len(A)\n", " num_cols = len(A[0]) if A else 0 # number of elements in first row\n", " return num_rows, num_cols\n", "\n", "B = [[1, 2],\n", " [3, 4],\n", " [5, 6]]\n", "\n", "shape(B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a matrix has $n$ rows and $k$ columns, we will refer to it as a $n×k$ matrix. We can think of each row of a $n×k$ matrix as a vector of length $k$, \n", "and each column as a vector of length $n$. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def get_row(A, i): \n", " return A[i] # A[i] is already a row\n", "\n", "\n", "def get_column(A, j): \n", " return [A_i[j] for A_i in A] # get the j-th element of every i-th row" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "get_row(B, 1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "get_column(B, 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NOTE:** In `pandas`, vectors correspond to the class `Series` and matrices to the class `DataFrame`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }