===== A bit of Python =====
Python is available for your favorite platform on the [[https://www.python.org/downloads/ | downloads]] page (or you can choose the [[https://www.continuum.io/anaconda-overview | anaconda]] distribution instead).
You can use the Python interpreter interactively by typing //python// in a terminal window. However, for data analysis we recommend [[https://ipython.org/ | IPython]], a nicer front end to Python that works with both a regular Python install and the anaconda distribution. If you have installed it (or are using department machines), it can be invoked with:
ipython
To quit, type control-d.
To run Python code in a file //code.py//, either type
run code.py
in the //ipython// interpreter, or
python code.py
at the unix command line.
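As a minimal sketch of what such a file might contain (the contents of //code.py// and the function name ''circle_area'' are hypothetical, chosen only for illustration):

```python
# code.py (hypothetical contents, for illustration only)
import math


def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2


# This part runs when the file is executed as a script,
# e.g. with "python code.py" or "run code.py" in ipython.
if __name__ == "__main__":
    print(circle_area(1.0))
```

After ''run code.py'' in //ipython//, names defined in the file (here ''circle_area'') remain available in the interactive session.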
In addition to Python statements and expressions, //ipython// lets you type shell commands and its own special [[http://ipython.readthedocs.io/en/stable/interactive/tutorial.html#magics-explained | magic commands]], and it integrates well with matplotlib, the Python plotting library used later on this page.
You can use Python as a calculator. For example, what is the value of $(100\cdot 2 - 12^2) / 7 \cdot 5 + 2\;\;\;$?
In [301]: (100*2 - 12**2) / 7*5 + 2
Out[301]: 42
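The integer result shown above is what Python 2 prints for integer operands. A minimal sketch, assuming Python 3, where the division operator always returns a float and a separate floor-division operator keeps the arithmetic in integers (the variable names are ours):

```python
# Python 3 semantics assumed: "/" is true division, "//" is floor division.
numerator = 100 * 2 - 12 ** 2       # 200 - 144 = 56

true_div = numerator / 7 * 5 + 2    # 56 / 7 = 8.0, times 5 = 40.0, plus 2
floor_div = numerator // 7 * 5 + 2  # stays in integer arithmetic

print(true_div)   # 42.0
print(floor_div)  # 42
```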
In order to compute something like $\sin(\pi/2)$ we first need to //import// the //math// module:
In [303]: import math
In [304]: math.sin(math.pi/2)
Out[304]: 1.0
How do I find out what other mathematical functions are available?
help("math")
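If you only want the names rather than the full help text, the built-in ''dir'' function lists what a module contains. A small sketch (filtering out names that start with an underscore is our choice, not a requirement):

```python
import math

# Public names in the math module (those not starting with an underscore).
names = [name for name in dir(math) if not name.startswith("_")]
print(names)

# A few of them in action.
print(math.sqrt(16))    # 4.0
print(math.cos(0.0))    # 1.0
```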
===== Linear algebra in Python =====
Can I work with vectors and matrices in python?
Of course! Any data analysis tool worth its bytes should support them.
The ''numpy'' package provides the required magic.
Vectors and matrices are all represented as numpy arrays. First, some vectors:
In [1]: import numpy as np
In [2]: x = np.array([1,1])
We can multiply a vector by a scalar:
In [3]: x * 2
Out[3]: array([2, 2])
And we can add vectors:
In [4]: x + np.array([1,0])
Out[4]: array([2, 1])
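A few more vector operations, as a sketch (the second vector ''y'' and the result names are ours). Note that ''*'' between two arrays multiplies element by element; it is not the inner product:

```python
import numpy as np

x = np.array([1, 1])
y = np.array([1, 0])

diff = x - y                 # element-wise subtraction: [0, 1]
prod = x * y                 # element-wise product: [1, 0]
length = np.linalg.norm(x)   # Euclidean length: sqrt(1**2 + 1**2)

print(diff, prod, length)
```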
After we introduce matrices, we'll show how to do inner products.
Let's create an array that represents the following matrix:
\[\left ( \begin{array}{cc}
1 & 2\\
4 & 3\\
5 & 6
\end{array} \right ) \]
In [18]: X = np.array([[1,2], [4,3], [5,6]])
In [19]: X
Out[19]:
array([[1, 2],
[4, 3],
[5, 6]])
We'll think of $X$ as the feature matrix of a machine learning dataset.
To access a row of the matrix (the features of a single example in the dataset; here, the first one):
In [20]: X[0]
Out[20]: array([1, 2])
To access a column of the matrix (a single feature):
In [21]: X[:,0]
Out[21]: array([1, 4, 5])
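The ''shape'' attribute tells you how many rows (examples) and columns (features) an array has. A short sketch using the same matrix (the variable names are ours):

```python
import numpy as np

X = np.array([[1, 2], [4, 3], [5, 6]])

n_examples, n_features = X.shape   # (3, 2)
second_example = X[1]              # row 1: [4, 3]
second_feature = X[:, 1]           # column 1: [2, 3, 6]

print(n_examples, n_features)
```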
Let's construct a weight vector for a linear classifier:
In [20]: w = np.array([1,-1])
We can easily compute the dot/inner product of a row of $X$ with the weight vector:
In [21]: np.inner(w, X[0])
Out[21]: -1
We can even compute the inner products for all the rows of the matrix all at once:
In [22]: np.inner(w, X)
Out[22]: array([-1, 1, -1])
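The same scores can be computed with the array's ''dot'' method. The sketch below also turns the scores into predicted labels; taking the sign of the score as the decision rule is our assumption here, since this page does not define the classifier:

```python
import numpy as np

X = np.array([[1, 2], [4, 3], [5, 6]])
w = np.array([1, -1])

scores = X.dot(w)          # same values as np.inner(w, X): [-1, 1, -1]
labels = np.sign(scores)   # assumed decision rule: predict the sign of the score

print(scores, labels)
```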
Let's construct another matrix
In [33]: A = np.ones((2,3)) * 2
In [34]: A
Out[34]:
array([[ 2., 2., 2.],
[ 2., 2., 2.]])
Let's look for a way to compute the matrix product $A \times X$.
Our first guess would be to try the multiplication operator, since we saw above that we can multiply a matrix by a scalar:
In [36]: A * X
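---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-36-...> in <module>()
----> 1 A * X

ValueError: operands could not be broadcast together with shapes (2,3) (3,2)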
So it didn't work. The ''*'' operator performs component-wise multiplication, which requires arrays of compatible shapes (for example, two matrices of the same shape); it does not compute the matrix product. Instead, use the ''numpy'' function ''dot'' to do matrix multiplication:
In [37]: np.dot(A, X)
Out[37]:
array([[ 20., 22.],
[ 20., 22.]])
''dot'' is also available as a method of the array itself, so you can write:
In [38]: A.dot(X)
Out[38]:
array([[ 20., 22.],
[ 20., 22.]])
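If you are running Python 3.5 or later with a recent ''numpy'' (an assumption about your setup), the ''@'' operator is another way to write the matrix product. The sketch below also shows ''*'' working component-wise when the shapes do match:

```python
import numpy as np

A = np.ones((2, 3)) * 2
X = np.array([[1, 2], [4, 3], [5, 6]])

product = A @ X        # matrix product, same result as np.dot(A, X)
elementwise = A * A    # shapes match, so * multiplies entry by entry

print(product.shape)       # (2, 2)
print(elementwise.shape)   # (2, 3)
```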
An array is transposed with the ''transpose'' method or, equivalently, the ''T'' attribute:
In [39]: A.transpose()
Out[39]:
array([[ 2., 2.],
[ 2., 2.],
[ 2., 2.]])
In [41]: A.T
Out[41]:
array([[ 2., 2.],
[ 2., 2.],
[ 2., 2.]])
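Transposing swaps the two dimensions of the shape. As a quick sanity check, here is a sketch that verifies the identity $(AX)^T = X^T A^T$ numerically (the variable names are ours):

```python
import numpy as np

A = np.ones((2, 3)) * 2
X = np.array([[1, 2], [4, 3], [5, 6]])

print(A.shape, A.T.shape)   # (2, 3) (3, 2)

lhs = A.dot(X).T            # transpose of the product
rhs = X.T.dot(A.T)          # product of the transposes, in reverse order
same = np.allclose(lhs, rhs)
print(same)                 # True
```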
To see all the attributes and methods that an array has, type ''A.'' in //ipython// and press the Tab key:
In [39]: A.
A.T A.cumsum A.min A.shape
A.all A.data A.nbytes A.size
A.any A.diagonal A.ndim A.sort
A.argmax A.dot A.newbyteorder A.squeeze
A.argmin A.dtype A.nonzero A.std
A.argpartition A.dump A.partition A.strides
A.argsort A.dumps A.prod A.sum
A.astype A.fill A.ptp A.swapaxes
A.base A.flags A.put A.take
A.byteswap A.flat A.ravel A.tobytes
A.choose A.flatten A.real A.tofile
A.clip A.getfield A.repeat A.tolist
A.compress A.imag A.reshape A.tostring
A.conj A.item A.resize A.trace
A.conjugate A.itemset A.round A.transpose
A.copy A.itemsize A.searchsorted A.var
A.ctypes A.max A.setfield A.view
A.cumprod A.mean A.setflags
Elements and sub-matrices are easily extracted:
In [42]: X
Out[42]:
array([[1, 2],
[4, 3],
[5, 6]])
In [43]: X[0,0]
Out[43]: 1
In [44]: X[-1,-1]
Out[44]: 6
In [46]: X[0:2, 0:2]
Out[46]:
array([[1, 2],
[4, 3]])
# my favorite way of indexing: using an array!
In [47]: X[ [0,2] ]
Out[47]:
array([[1, 2],
[5, 6]])
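Rows can also be selected with an array of booleans. A minimal sketch (the threshold of 2 is arbitrary and only for illustration):

```python
import numpy as np

X = np.array([[1, 2], [4, 3], [5, 6]])

mask = X[:, 0] > 2     # one boolean per row: [False, True, True]
selected = X[mask]     # keeps the rows whose first feature exceeds 2

print(selected)
```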
How do I find the inverse of a matrix?
In [2]: z = np.array([[2,1,1],[1,2,2],[2,3,4]])
In [3]: z
Out[3]:
array([[2, 1, 1],
[1, 2, 2],
[2, 3, 4]])
In [4]: np.linalg.inv(z)
Out[4]:
array([[ 0.66666667, -0.33333333, 0. ],
[ 0. , 2. , -1. ],
[-0.33333333, -1.33333333, 1. ]])
In [5]: np.dot(z, np.linalg.inv(z))
Out[5]:
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
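Because of floating-point rounding, a product like this is best compared to the identity with ''np.allclose'' rather than exact equality. And if the goal is to solve a linear system $z v = b$, ''np.linalg.solve'' does so without forming the inverse explicitly. A sketch, with a right-hand side ''b'' made up for illustration:

```python
import numpy as np

z = np.array([[2, 1, 1], [1, 2, 2], [2, 3, 4]])
b = np.array([1, 2, 3])   # hypothetical right-hand side

z_inv = np.linalg.inv(z)
is_identity = np.allclose(z.dot(z_inv), np.eye(3))   # tolerant comparison

solution = np.linalg.solve(z, b)   # solves z v = b for v

print(is_identity)
print(solution)
```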
===== Plotting =====
Let's get on to that all-important step of visualizing data. We will be using the [[http://matplotlib.org |matplotlib]] Python package for that. Let's start by plotting the function $f(x) = x^2$.
First, let's generate the numbers. Well, there are tons of ways to do so.
Python has some nifty syntax for generating lists. Watch this! A [[http://www.secnetix.de/olli/Python/list_comprehensions.hawk|list comprehension]]!!
In [9]: f = [i**2 for i in range(10)]
In [10]: f
Out[10]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
There's an alternative way of doing this using ''numpy'':
In [12]: f = np.arange(10)**2
In [13]: f
Out[13]: array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
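For points that are not integers, ''np.linspace'' generates evenly spaced values between two endpoints. A short sketch (the range and number of points are arbitrary choices):

```python
import numpy as np

xs = np.linspace(0, 3, 7)   # 7 evenly spaced points from 0 to 3, endpoints included
fs = xs ** 2                # element-wise square

# The list comprehension and the numpy version agree on the integer grid.
agree = [i ** 2 for i in range(10)] == (np.arange(10) ** 2).tolist()

print(xs)
print(fs)
```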
To plot the data, first import the ''pyplot'' module.
In [6]: import matplotlib.pyplot as plt
In [7]: plt.plot(range(10), f, 'ob')
Out[7]: [<matplotlib.lines.Line2D at 0x...>]
In order to actually see the plot you need to do:
In [8]: plt.show()
As an alternative, you can put matplotlib in interactive mode before plotting using the command ''plt.ion()''.
Also note that plotting functions accept either Python lists or ''numpy'' arrays.
We can add a second plot to the same axes by calling //plot// again:
In [15]: x = np.arange(10)
In [16]: plt.plot(x, x, 'dr')
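Putting the pieces together, here is a sketch of a stand-alone script that draws both curves and saves the figure to a file instead of opening a window. It uses matplotlib's ''Agg'' backend and the ''subplots'' interface, which differ slightly from the interactive commands above; the output file name is hypothetical:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)

fig, ax = plt.subplots()
ax.plot(x, x ** 2, "ob", label="x squared")   # blue circles
ax.plot(x, x, "dr", label="x")                # red diamonds
ax.set_xlabel("x")
ax.set_ylabel("f(x)")
ax.legend()

# Hypothetical output location: a file in the system's temporary directory.
out_path = os.path.join(tempfile.gettempdir(), "squares.png")
fig.savefig(out_path)
plt.close(fig)
```

Axis labels and a legend are optional, but they make the two curves much easier to tell apart.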