Notes of PDADW: Chapter 4


Python for Data Analysis - Data Wrangling Chapter 4

While NumPy provides the computational foundation for these operations, you will likely want to use pandas as your basis for most kinds of data analysis (especially for structured or tabular data) as it provides a rich, high-level interface making most common data tasks very concise and simple.

There are a number of other functions for creating new arrays.

  1. zeros and ones create arrays of 0’s or 1’s, respectively, with a given length or shape.

  2. empty creates an array without initializing its values to any particular value.

An important first distinction from lists is that array slices are views on the original array.

import numpy as np

aa = np.arrange(10)  # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a3 = aa[2:8]
a3[2] = 10
a3  # array([ 2,  3, 10,  5,  6,  7])
aa  # array([ 0,  1,  2,  3, 10,  5,  6,  7,  8,  9])

In a high-dimensional numpy array, individual elements can be accessed recursively, or by passing a comma-separated list of indices to select individual elements.

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[0][2]  # 3
arr2d[0, 2]  # 3

In boolean indexing, either != or - can be used to negate the condition

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = randn(7, 4)
data[names == 'Bob']  # array([[-0.048 , 0.5433, -0.2349, 1.2792], [ 2.1452, 0.8799, -0.0523, 0.0672]])

A universal function, or ufunc, is a function that performs elementwise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

Others, such as add or maximum, take 2 arrays (thus, binary ufuncs) and return a single array as the result.

In general, vectorized array operations will often be one or two (or more) orders of magnitude faster than their pure Python equivalents, with the biggest impact in any kind of numerical computations.

arr = randn(4, 4)
np.where(arr > 0, 2, -2)

np.save and np.load are the two workhorse functions for efficiently saving and loading array data on disk.

Arrays are saved by default in an uncompressed raw binary format with file extension .npy.

Posted with : python, numpy, notes

Related Posts