Section 2: The numpy Library
We have seen how a Python list can be used to represent information of
arbitrary size. Unfortunately, processing very large amounts of data
stored in Python lists can be slow. The numpy library provides a more
efficient representation for a large amount of data in the form of an
array. Unlike Python lists, which are heterogeneous (a given list can
have entries of different types), arrays are homogeneous: all entries must
be of the same type.
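To see homogeneity in practice, here is a small sketch (it assumes only that numpy is installed and imported as np). When the list passed to np.array() mixes integers and floats, numpy converts every entry to one common type, which the array reports through its dtype attribute:

```python
import numpy as np

mixed = [1, 2.5, 3]        # a heterogeneous Python list: int, float, int
arr = np.array(mixed)      # numpy chooses a single type for every entry

print([type(v).__name__ for v in mixed])  # ['int', 'float', 'int']
print(arr.dtype)                          # float64
print(arr.tolist())                       # [1.0, 2.5, 3.0]: the ints became floats
```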
Creating numpy Arrays
You can create numpy arrays from existing Python lists or tuples using
the array() function:
>>> import numpy as np
>>> x = np.array([1, 2, 3])
>>> y = np.array((4.0, 5.0, 6.0))
>>> type(x)
<class 'numpy.ndarray'>
>>> type(y)
<class 'numpy.ndarray'>
Note that the type of x and y is numpy.ndarray. The nd stands for
n-dimensional. The arrays that we just created are 1-dimensional, but
higher-dimensional arrays are possible and we'll consider them a little
later on in this chapter.
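Every array records its number of dimensions in its ndim attribute and its length along each dimension in its shape attribute. A quick check on a 1-dimensional array like x above:

```python
import numpy as np

x = np.array([1, 2, 3])

print(x.ndim)    # 1     -> x has one dimension
print(x.shape)   # (3,)  -> and 3 entries along it
```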
The numpy library has a function arange() that is analogous to Python's
range() function; see:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html#numpy.arange
Note the values produced by each of the following calls:
>>> np.arange(5)
array([0, 1, 2, 3, 4])
>>> np.arange(2, 5)
array([2, 3, 4])
>>> np.arange(1, 10, 2)
array([1, 3, 5, 7, 9])
>>> np.arange(1, 5, 0.5)
array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])
Unlike the Python range() function, numpy's arange() works with
arguments of type float.
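The difference is easy to check directly. In this sketch, range() with a float step raises a TypeError, while np.arange() accepts the same arguments:

```python
import numpy as np

# Python's built-in range() only accepts integer arguments ...
try:
    range(1, 5, 0.5)
    range_failed = False
except TypeError:
    range_failed = True

# ... whereas numpy's arange() happily takes a float step.
steps = np.arange(1, 5, 0.5)

print(range_failed)     # True
print(steps.tolist())   # [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
```

The numpy documentation for arange() recommends np.linspace() when the step is not an integer, because floating-point rounding can make the number of values produced hard to predict. A step of 0.5 is exactly representable in binary, so this example is safe.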
Broadcasting Operations
An important way in which numpy arrays differ from Python lists is that
operations on numpy arrays are broadcast across the elements of the array.
Note the following comparison of operations on Python lists and numpy
arrays. Let's start with Python lists:
>>> lst1 = list(range(5))
>>> lst2 = list(range(5, 10))
>>> lst1 + lst2
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> lst1 + 5
TypeError: can only concatenate list (not "int") to list
Now let's try the same with numpy arrays:
>>> arr1 = np.arange(5)
>>> arr2 = np.arange(5, 10)
>>> arr1 + arr2
array([ 5, 7, 9, 11, 13])
Compare the last result with the one obtained above for lst1 + lst2.
When applied to Python lists, the + operator creates a new list that
consists of all the elements of lst1 followed by all the elements of lst2.
However, when applied to numpy arrays, the + operator is broadcast to
produce the sum of the data in the arrays, element by element. Here are
a few more examples of broadcasting operations:
>>> arr1 + 5    # broadcast '+ 5' across all elements of arr1
array([5, 6, 7, 8, 9])
>>> arr1 < 3    # broadcast '< 3' across all elements of arr1
array([ True,  True,  True, False, False])
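One way to think about a broadcast operation is as a loop over the elements that numpy runs for you. The sketch below reproduces the two results above with list comprehensions over a plain Python list; numpy performs the equivalent loop in compiled code rather than in Python, which is roughly where its speed advantage on large data comes from.

```python
import numpy as np

lst1 = list(range(5))
arr1 = np.arange(5)

# '+ 5': explicit Python loop vs. numpy broadcast
looped = [v + 5 for v in lst1]
broadcast = arr1 + 5
print(looped)              # [5, 6, 7, 8, 9]
print(broadcast.tolist())  # [5, 6, 7, 8, 9]

# '< 3': explicit Python loop vs. numpy broadcast
print([v < 3 for v in lst1])  # [True, True, True, False, False]
print((arr1 < 3).tolist())    # [True, True, True, False, False]
```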
The numpy library has a very large number of functions that operate on
arrays (numpy calls them routines). See, for example, the collections of
mathematics functions:
http://docs.scipy.org/doc/numpy/reference/routines.math.html
and statistics functions:
http://docs.scipy.org/doc/numpy/reference/routines.statistics.html
Some of these functions broadcast an operation across all items in the
array to which they are applied:
>>> np.sin(arr1)     # produce the sine of each element in arr1
while others produce aggregate results:
>>> np.mean(arr1)    # produce the mean of all the entries in arr1
The documentation describes the purpose of each function and the value
produced.
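Here is a runnable version of those two calls, using arr1 = np.arange(5) as in the broadcasting examples. It checks that np.sin() returns one value per element (agreeing with the standard library's math.sin()) and that np.mean() returns a single number:

```python
import math
import numpy as np

arr1 = np.arange(5)        # [0, 1, 2, 3, 4]

sines = np.sin(arr1)       # element-wise: one result per entry of arr1
average = np.mean(arr1)    # aggregate: one value for the whole array

print(sines.shape)                          # (5,)
print(math.isclose(sines[1], math.sin(1)))  # True
print(average)                              # 2.0
```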
Indexing Operations
You can index into a numpy array exactly the same way as you would with
any sequence type in Python:
>>> x = np.arange(5)
>>> x[0]
0
>>> x[1]
1
>>> x[-1]
4
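Indexing also works on the left-hand side of an assignment. Because arrays are homogeneous, the assigned value is converted to the array's type; in the sketch below a float stored into an integer array is silently truncated:

```python
import numpy as np

x = np.arange(5)     # integer array: [0, 1, 2, 3, 4]

x[1] = 10            # assign through an index, just as with a list
x[-1] = 2.7          # converted to x's integer type, so 2.7 becomes 2

print(x.tolist())    # [0, 10, 2, 3, 2]
print(x.dtype.kind)  # i  (x is still a signed-integer array)
```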
Slicing Operations
Slicing operations can be applied to numpy arrays using the same syntax
as for Python lists. Recall that a slicing operation on a Python list
produces a copy of the list. Consequently, changes made to the copy are
not reflected in the original (and vice versa). However, a slicing
operation on a numpy array produces a view onto the original array.
Hence, any change made to the view is made to the original, and vice
versa. BEWARE!
arr[start:end:skip] - produces a view onto the numpy array arr, starting
at index start and moving forward skip entries at a time, up to but not
including the element at index end.
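Note that all of these arguments are optional. If start is not provided,
it assumes the default value of 0. If end is not provided, it assumes
the default value of arr.size (the size of the array). If skip is not
provided, it assumes the default value of 1. (These defaults for start
and end apply when skip is positive; with a negative skip the slice runs
backwards, from the last element towards the first.)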
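Here is the arr[start:end:skip] form with and without its optional arguments, on an illustrative 10-element array. The last lines check that a slice really is a view: its base attribute refers to the array it was taken from, and np.shares_memory() confirms that the two share the same data.

```python
import numpy as np

arr = np.arange(10)           # [0, 1, 2, ..., 9]

print(arr[2:8:3].tolist())    # [2, 5]     start 2, step 3, stop before 8
print(arr[:4].tolist())       # [0, 1, 2, 3]  start defaults to 0
print(arr[7:].tolist())       # [7, 8, 9]  end defaults to arr.size
print(arr[::4].tolist())      # [0, 4, 8]  start and end both defaulted

view = arr[2:8:3]
print(view.base is arr)             # True: the slice refers back to arr
print(np.shares_memory(view, arr))  # True: no data was copied
```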
Consider the following operations on lists:
>>> l = [1, 2, 3, 4]
>>> m = l[:]      # copy using a slicing operation
>>> m[0] = 7      # change first item in m
>>> m             # see that first item has been changed
[7, 2, 3, 4]
>>> l             # but original list has not been modified
[1, 2, 3, 4]
and compare them with similar operations on numpy arrays:
>>> r = np.array([1, 2, 3, 4])
>>> s = r[:]      # s is a view onto the whole of r
>>> s[0] = 7      # change first item in s
>>> s             # see that first item has been changed
array([7, 2, 3, 4])
>>> r             # original array has also been modified
array([7, 2, 3, 4])
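When you do want list-like behaviour, ask for a copy explicitly with the array's copy() method. Repeating the previous example with a copy instead of a slice leaves r untouched:

```python
import numpy as np

r = np.array([1, 2, 3, 4])
s = r.copy()        # an independent copy, not a view
s[0] = 7            # change first item in s

print(s.tolist())   # [7, 2, 3, 4]
print(r.tolist())   # [1, 2, 3, 4]: the original is unchanged
```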
This can be incredibly useful, particularly in cases where you want to apply
an operation to a subset of the entries of an array. You simply create
a view onto the array, then apply the desired operation to the view:
>>> r = np.arange(1, 5)
>>> s = r[::2]    # view consisting of every 2nd element of r
>>> s[:] = 0      # assign 0 to every element of s
>>> r             # every 2nd element of r now has the value 0
array([0, 2, 0, 4])
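The same idea works with numpy's in-place operators such as *= and +=, which write through a view into the original array. Note the contrast in the sketch below: an ordinary assignment like other = other * 10 builds a new array and merely rebinds the name, so r is not affected. (The names middle and other are just illustrative.)

```python
import numpy as np

r = np.arange(1, 9)      # [1, 2, 3, 4, 5, 6, 7, 8]

middle = r[2:6]          # view onto the entries at indexes 2 to 5
middle *= 10             # in-place multiply: the change is written into r

other = r[6:]            # view onto the last two entries
other = other * 10       # NOT in place: builds a new array, rebinds 'other'

print(r.tolist())        # [1, 2, 30, 40, 50, 60, 7, 8]
print(other.tolist())    # [70, 80]
```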
Fancy Indexing
There are a couple of different forms of "fancy" indexing that can be
performed on numpy arrays.
Arrays of integers can be used to index into other arrays:
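>>> ia = np.array([0, 3, 4])
>>> a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> a[ia]         # entries of a at indexes 0, 3 & 4 only
array([1., 4., 5.])
>>> a[ia] = -1    # assign -1 to the entries at indexes 0, 3 & 4
>>> a
array([-1.,  2.,  3., -1., -1.,  6.])
Unlike a slice, a[ia] used in an expression produces a copy, not a view;
the entries of a itself change only when a[ia] appears on the left-hand
side of an assignment.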
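The copy-versus-assignment distinction can be checked directly. Modifying the result of a[ia] leaves a alone, whereas assigning to a[ia] changes a:

```python
import numpy as np

ia = np.array([0, 3, 4])
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

picked = a[ia]        # fancy indexing in an expression returns a copy
picked[0] = 100.0     # changing the copy ...
print(a[0])           # 1.0  ... leaves a untouched

a[ia] = -1            # fancy indexing on the left of '=' assigns into a
print(a.tolist())     # [-1.0, 2.0, 3.0, -1.0, -1.0, 6.0]
```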
Arrays of Booleans can be used to index into other arrays. The result is
an array containing only those elements of the indexed array that
correspond to True values in the Boolean array.
>>> ba = np.array([True, False, True, True, False, True])
>>> a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> a[ba]         # a copy of the entries of a that correspond to True values in ba
array([1., 3., 4., 6.])
>>> a[ba] = 0     # entries that correspond to True values in ba are set to 0
>>> a
array([0., 2., 0., 0., 5., 0.])
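Boolean indexing combines naturally with the broadcast comparisons shown earlier: a comparison such as a > 3 produces exactly the kind of Boolean array needed to select or update entries. A sketch of this common idiom:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

mask = a > 3              # broadcast comparison builds the Boolean array
selected = a[mask]        # a copy of the entries where mask is True
print(mask.tolist())      # [False, False, False, True, True, True]
print(selected.tolist())  # [4.0, 5.0, 6.0]

a[a > 3] = 0              # reset every entry above the threshold
print(a.tolist())         # [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
print(selected.tolist())  # [4.0, 5.0, 6.0]: the earlier copy is unaffected
```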