Python is available for your favorite platform on the downloads page (or you can choose the Anaconda distribution instead). You can use the Python interpreter interactively by typing python at a terminal window. However, for data analysis we recommend IPython, a nicer front end to Python that works with either a regular Python install or Anaconda. If you have installed it (or are using department machines), invoke it with:
ipython
To quit, press Control-D.
To run Python code in a file code.py, either type
run code.py
in the ipython interpreter, or
python code.py
at the unix command line.
In addition to Python statements and expressions, IPython lets you type shell commands and its own special "magic" commands, and it integrates well with matplotlib, a widely used Python plotting library.
You can use Python as a calculator. For example, what is the value of $(100\cdot 2 - 12^2) / 7 \cdot 5 + 2$?
In [301]: (100*2 - 12**2) / 7*5 + 2
Out[301]: 42.0
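The way Python groups that expression is worth spelling out. The sketch below assumes Python 3, where / is true division and always returns a float; the variable names are made up for illustration.

```python
# Division and multiplication have equal precedence and group left to right,
# so numerator / 7 * 5 means (numerator / 7) * 5, not numerator / (7 * 5).
numerator = 100 * 2 - 12 ** 2            # 200 - 144 = 56

left_to_right = numerator / 7 * 5 + 2    # (56 / 7) * 5 + 2
grouped_right = numerator / (7 * 5) + 2  # 56 / 35 + 2

print(left_to_right)    # 42.0
print(grouped_right)    # 3.6

# Floor division (//) keeps the result an integer when the division is exact.
print(numerator // 7 * 5 + 2)   # 42
```

If you want the integer 42 rather than the float 42.0, use // as in the last line.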
In order to compute something like $\sin(\pi/2)$ we first need to import the math module:
In [303]: import math
In [304]: math.sin(math.pi/2)
Out[304]: 1.0
How do I find out what other mathematical functions are available?
help("math")
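Besides help, the built-in dir function gives a quick list of what a module contains. A small sketch (the variable name public_names is just for illustration):

```python
import math

# Public names in the math module (those not starting with an underscore).
public_names = [name for name in dir(math) if not name.startswith("_")]
print(len(public_names), "names, for example:", public_names[:5])

# A few of them in action.
print(math.sqrt(16))     # 4.0
print(math.floor(2.7))   # 2
print(math.cos(0.0))     # 1.0
```

Calling help on a single function, for example help(math.sqrt), shows just that function's documentation.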
Can I work with vectors and matrices in python?
Of course! Every data analysis tool worth its bytes should support them. The numpy package provides the required magic.
Vectors and matrices are all represented as numpy arrays. First, some vectors:
In [1]: import numpy as np
In [2]: x = np.array([1,1])
We can multiply a vector by a scalar:
In [3]: x * 2
Out[3]: array([2, 2])
And we can add vectors:
In [4]: x + np.array([1,0])
Out[4]: array([2, 1])
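These operators behave quite differently on plain Python lists, which is a common source of confusion. A minimal comparison (plain_list is a name introduced here for illustration):

```python
import numpy as np

x = np.array([1, 1])
plain_list = [1, 1]

# On a list, * repeats and + concatenates.
print(plain_list * 2)          # [1, 1, 1, 1]
print(plain_list + [1, 0])     # [1, 1, 1, 0]

# On a numpy array, both operators act element by element.
print(x * 2)                   # [2 2]
print(x + np.array([1, 0]))    # [2 1]
```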
After we introduce matrices, we'll show how to do inner products.
Let's create an array that represents the following matrix: \[\left ( \begin{array}{cc} 1 & 2\\ 4 & 3\\ 5 & 6 \end{array} \right ) \]
In [18]: X = np.array([[1,2], [4,3], [5,6]])
In [19]: X
Out[19]:
array([[1, 2],
       [4, 3],
       [5, 6]])
We'll think of $X$ as the feature matrix of a machine learning dataset. To access a row of the matrix (the features of the $i$th example in the dataset; here the first example, at index 0):
In [20]: X[0]
Out[20]: array([1, 2])
To access a column of the matrix (a single feature):
In [21]: X[:,0]
Out[21]: array([1, 4, 5])
Let's construct a weight vector for a linear classifier:
In [20]: w = np.array([1,-1])
We can easily compute the dot/inner product of a row of $X$ with the weight vector:
In [21]: np.inner(w, X[0])
Out[21]: -1
We can even compute the inner products for all the rows of the matrix all at once:
In [22]: np.inner(w, X)
Out[22]: array([-1, 1, -1])
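To connect this with the linear classifier idea: each inner product is a score for one example, and the sign of the score can be read as the predicted class. The thresholding rule below is a hypothetical illustration, not something the text above prescribes.

```python
import numpy as np

X = np.array([[1, 2], [4, 3], [5, 6]])   # one example per row
w = np.array([1, -1])                    # weight vector

scores = np.inner(w, X)       # one inner product per row of X
same_scores = X.dot(w)        # the matrix-vector product gives the same values

# Hypothetical decision rule: predict +1 for non-negative scores, -1 otherwise.
labels = np.where(scores >= 0, 1, -1)

print(scores)   # [-1  1 -1]
print(labels)   # [-1  1 -1]
```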
Let's construct another matrix:
In [33]: A = np.ones((2,3)) * 2
In [34]: A
Out[34]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
Let's look for a way to compute the matrix product $A \times X$. Our first guess would be to try the multiplication operator, since we saw above that we can multiply a vector by a scalar:
In [36]: A * X
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-578836d375ce> in <module>()
----> 1 A * X
So it didn't work: the shapes (2, 3) and (3, 2) are not compatible for the * operator. You can multiply arrays of the same shape using *, but it performs component-wise multiplication rather than a matrix product. Instead, use the numpy function dot to do matrix multiplication:
In [37]: np.dot(A, X)
Out[37]:
array([[ 20.,  22.],
       [ 20.,  22.]])
It turns out that dot is also an array method, so you can do:
In [38]: A.dot(X)
Out[38]:
array([[ 20.,  22.],
       [ 20.,  22.]])
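Checking shapes is the quickest way to see which kind of product applies. The sketch below also uses the @ operator, which is not mentioned above; it is available in Python 3.5 and later and computes the same matrix product as dot for 2-D arrays.

```python
import numpy as np

A = np.ones((2, 3)) * 2
X = np.array([[1, 2], [4, 3], [5, 6]])

print(A.shape, X.shape)       # (2, 3) (3, 2)

# Matrix product: the inner dimensions (3 and 3) must match.
product = np.dot(A, X)
print(product.shape)          # (2, 2)

# The @ operator gives the same result as np.dot here.
print(np.array_equal(A @ X, product))   # True

# Component-wise * works when the shapes agree, e.g. two (2, 3) arrays.
elementwise = A * A
print(elementwise.shape)      # (2, 3)
```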
An array is transposed with the transpose method or, more briefly, the T attribute:
In [39]: A.transpose()
Out[39]:
array([[ 2.,  2.],
       [ 2.,  2.],
       [ 2.,  2.]])
In [41]: A.T
Out[41]:
array([[ 2.,  2.],
       [ 2.,  2.],
       [ 2.,  2.]])
And let's take a look at all the methods and attributes that an array has. In IPython, type A. and press Tab:
In [39]: A.
A.T             A.cumsum        A.min           A.shape
A.all           A.data          A.nbytes        A.size
A.any           A.diagonal      A.ndim          A.sort
A.argmax        A.dot           A.newbyteorder  A.squeeze
A.argmin        A.dtype         A.nonzero       A.std
A.argpartition  A.dump          A.partition     A.strides
A.argsort       A.dumps         A.prod          A.sum
A.astype        A.fill          A.ptp           A.swapaxes
A.base          A.flags         A.put           A.take
A.byteswap      A.flat          A.ravel         A.tobytes
A.choose        A.flatten       A.real          A.tofile
A.clip          A.getfield      A.repeat        A.tolist
A.compress      A.imag          A.reshape       A.tostring
A.conj          A.item          A.resize        A.trace
A.conjugate     A.itemset       A.round         A.transpose
A.copy          A.itemsize      A.searchsorted  A.var
A.ctypes        A.max           A.setfield      A.view
A.cumprod       A.mean          A.setflags
Elements and sub-matrices are easily extracted:
In [42]: X
Out[42]:
array([[1, 2],
       [4, 3],
       [5, 6]])
In [43]: X[0,0]
Out[43]: 1
In [44]: X[-1,-1]
Out[44]: 6
In [46]: X[0:2, 0:2]
Out[46]:
array([[1, 2],
       [4, 3]])
# my favorite way of indexing: using an array!
In [47]: X[ [0,2] ]
Out[47]:
array([[1, 2],
       [5, 6]])
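A few more indexing patterns that come up constantly with feature matrices, sketched on the same X. Boolean masks are not shown above but follow the same idea as indexing with a list of rows; the names last_two, picked, mask and selected are made up for this example.

```python
import numpy as np

X = np.array([[1, 2], [4, 3], [5, 6]])

# Slices can be open-ended: all rows from index 1 on, every column.
last_two = X[1:, :]
print(last_two)

# A list of row indices can be combined with a column index.
picked = X[[0, 2], 1]
print(picked)            # [2 6]

# A boolean mask selects the rows where a condition holds.
mask = X[:, 0] > 1       # True for rows whose first feature exceeds 1
selected = X[mask]
print(selected)
```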
How do I find the inverse of a matrix?
In [2]: z = np.array([[2,1,1],[1,2,2],[2,3,4]])
In [3]: z
Out[3]:
array([[2, 1, 1],
       [1, 2, 2],
       [2, 3, 4]])
In [4]: np.linalg.inv(z)
Out[4]:
array([[ 0.66666667, -0.33333333,  0.        ],
       [ 0.        ,  2.        , -1.        ],
       [-0.33333333, -1.33333333,  1.        ]])
In [5]: np.dot(z, np.linalg.inv(z))
Out[5]:
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
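Because the inverse is computed in floating point, it is safer to compare against the identity with np.allclose than with exact equality. And if the goal is to solve a linear system, np.linalg.solve does that without forming the inverse. In this sketch the right-hand side b is a made-up example.

```python
import numpy as np

z = np.array([[2, 1, 1], [1, 2, 2], [2, 3, 4]])
b = np.array([1.0, 2.0, 3.0])    # hypothetical right-hand side

z_inv = np.linalg.inv(z)

# Compare with the identity up to floating-point tolerance.
print(np.allclose(np.dot(z, z_inv), np.eye(3)))   # True

# Solve z * solution = b directly, then compare with the inverse-based answer.
solution = np.linalg.solve(z, b)
via_inverse = np.dot(z_inv, b)
print(np.allclose(solution, via_inverse))         # True
print(np.allclose(np.dot(z, solution), b))        # True
```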
Let's get on to that all-important step of visualizing data. We will be using the matplotlib Python package for that. Let's start by plotting the function $f(x) = x^2$.
First, let's generate the numbers. Well, there are tons of ways to do so. Python has some nifty syntax for generating lists. Watch this! A list comprehension!!
In [9]: f = [i**2 for i in range(10)]
In [10]: f
Out[10]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
There's an alternative way of doing this using numpy:
In [12]: f = np.arange(10)**2
In [13]: f
Out[13]: array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
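The two approaches give the same numbers, and each has a natural way to filter values. A short side-by-side sketch (the variable names are chosen for this example only):

```python
import numpy as np

squares_list = [i**2 for i in range(10)]
squares_array = np.arange(10) ** 2

# Same values, different container types.
print(squares_array.tolist() == squares_list)   # True

# A list comprehension can filter with an if clause...
even_squares = [i**2 for i in range(10) if i % 2 == 0]
print(even_squares)          # [0, 4, 16, 36, 64]

# ...while an array can be sliced with a step.
print(squares_array[::2])    # [ 0  4 16 36 64]
```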
To plot the data, first import the pyplot module.
In [6]: import matplotlib.pyplot as plt
In [7]: plt.plot(range(10), f, 'ob')
Out[7]: [<matplotlib.lines.Line2D at 0x10549b590>]
In order to actually see the plot you need to do:
In [8]: plt.show()
As an alternative, you can put matplotlib in interactive mode before plotting using the command plt.ion().
Also note that plotting functions accept either Python lists or numpy arrays.
We can add a second plot to the same axes by calling plot again:
In [16]: plt.plot(x, x, 'dr')
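Put together as a stand-alone script, the plotting steps might look like the sketch below. It makes a few assumptions that are not in the text above: both curves use np.arange(10) as x-values (named xs here), the non-interactive Agg backend is selected so the script also runs without a display, and the figure is written to a hypothetical file named squares.png.

```python
import matplotlib
matplotlib.use("Agg")             # assumption: no display, so render off-screen
import matplotlib.pyplot as plt
import numpy as np

xs = np.arange(10)                # assumed x-values for both curves
f = xs ** 2

plt.plot(xs, f, "ob", label="$f(x) = x^2$")   # blue circles
plt.plot(xs, xs, "dr", label="$f(x) = x$")    # red diamonds, same axes
plt.xlabel("x")
plt.ylabel("f(x)")
plt.legend()

plt.savefig("squares.png")        # hypothetical output filename
```

In an interactive IPython session you would drop the matplotlib.use line and call plt.show() instead of plt.savefig.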