NumPy Tutorial for Beginners: Get Started Fast

NumPy, short for Numerical Python, is the cornerstone of scientific computing in Python. It provides an efficient and powerful way to work with large, multi-dimensional arrays and matrices, along with a comprehensive collection of high-level mathematical functions to operate on these arrays. If you’re venturing into data science, machine learning, or any form of numerical analysis in Python, understanding NumPy is essential.

Why Use NumPy?

While Python’s built-in lists can store collections of numbers, they fall short when it comes to numerical operations on large datasets. Here’s why NumPy is indispensable:

  1. Efficiency: NumPy’s array operations run in compiled C code, and each array stores elements of a single data type in a contiguous block of memory. This makes arrays significantly faster and more memory-efficient than standard Python lists, especially for large datasets.
  2. Multi-dimensionality: NumPy’s core object is the ndarray (N-dimensional array), which can represent vectors (1D), matrices (2D), and higher-dimensional data with ease. This is crucial for handling complex data structures common in scientific applications.
  3. Powerful Operations (Vectorization): NumPy provides a vast library of mathematical, statistical, and linear algebra functions that operate directly on entire arrays without the need for explicit loops. This concept, known as “vectorization,” not only speeds up computations but also makes your code cleaner and more readable.
  4. Integration: NumPy integrates seamlessly with almost all other scientific Python libraries like Pandas, SciPy, Matplotlib, and Scikit-learn, forming the foundation of the modern Python data science ecosystem.
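To make the vectorization point concrete, here is a minimal sketch that squares one million numbers twice: once with an explicit Python loop and once with a single array expression. The variable names and the array size are illustrative choices, not anything prescribed by NumPy.

```python
import numpy as np

# Illustrative sample data: one million integers.
values = list(range(1_000_000))
arr = np.arange(1_000_000, dtype=np.int64)  # int64 avoids overflow when squaring

# Pure-Python approach: an explicit loop (written as a list comprehension).
squares_list = [v * v for v in values]

# Vectorized approach: one expression applied to the whole array at once.
squares_arr = arr * arr

# Both approaches produce the same numbers.
print(squares_list[:5])  # [0, 1, 4, 9, 16]
print(squares_arr[:5])   # [ 0  1  4  9 16]
```

If you time the two approaches yourself (for example with the standard-library timeit module), the vectorized version is typically much faster, though the exact ratio depends on your machine.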

Installation

Before you can start using NumPy, you need to install it. The most common and recommended way is using pip, Python’s package installer.

Open your terminal or command prompt and execute the following command:

```bash
pip install numpy
```

If you have multiple Python versions installed, you might need to specify pip3:

```bash
pip3 install numpy
```

Alternatively, if you’re looking for an all-in-one distribution that includes Python, NumPy, and many other scientific computing packages, consider installing the Anaconda Distribution.

Importing NumPy

Once installed, you’ll need to import NumPy into your Python scripts or interactive sessions. The conventional practice is to import it with the alias np:

```python
import numpy as np
```

This allows you to access all NumPy functionalities using the np. prefix, which is short, widely recognized, and improves code readability.
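A quick way to confirm that the installation and import worked is to print the installed version. This is only a sanity check; the version string you see depends on your environment.

```python
import numpy as np

# If this runs without an ImportError, NumPy is installed and importable.
print("NumPy version:", np.__version__)
```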

Creating NumPy Arrays

NumPy arrays (ndarrays) can be created in several ways, catering to different needs:

1. From Python Lists or Tuples

This is the most common method for creating arrays from existing Python sequences.

```python
import numpy as np

# Create a 1-D array from a list
arr1d = np.array([1, 2, 3, 4, 5])
print("1-D Array:", arr1d)
# Output: 1-D Array: [1 2 3 4 5]

# Create a 2-D array (matrix) from a list of lists
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2-D Array:\n", arr2d)
# Output:
# 2-D Array:
#  [[1 2 3]
#  [4 5 6]]

# Specify the data type (dtype)
arr_float = np.array([1, 2, 3], dtype='float32')
print("Array with float dtype:", arr_float)
# Output: Array with float dtype: [1. 2. 3.]
```
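When you do not pass a dtype, NumPy infers one from the values. The short sketch below (variable names are illustrative) shows that a single float in the input makes the whole array floating point, and that astype() gives you a converted copy.

```python
import numpy as np

# One float among the values makes the whole array float64.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# astype() returns a new array; converting floats to integers drops the fractional part.
as_int = mixed.astype(np.int64)
print(as_int)       # [1 2 3]
print(mixed)        # [1.  2.5 3. ]  (the original is unchanged)
```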

2. Arrays Filled with Zeros

The np.zeros() function creates an array of a specified shape (a tuple) filled entirely with zeros.

```python
# Create a 2×3 array of zeros
zeros_arr = np.zeros((2, 3))
print("Zeros Array:\n", zeros_arr)
# Output:
# Zeros Array:
#  [[0. 0. 0.]
#  [0. 0. 0.]]
```

3. Arrays Filled with Ones

Similar to np.zeros(), np.ones() creates an array of a given shape filled with ones.

```python
# Create a 3×2 array of ones
ones_arr = np.ones((3, 2))
print("Ones Array:\n", ones_arr)
# Output:
# Ones Array:
#  [[1. 1.]
#  [1. 1.]
#  [1. 1.]]
```

4. Empty Arrays

np.empty() creates an array without initializing its entries. Its initial content is arbitrary (whatever happens to be in that memory at the time), so it is neither zeros nor properly random values. Skipping initialization makes it slightly faster than np.zeros() for very large arrays, so use it only when you intend to fill every element immediately.

```python
empty_arr = np.empty((2, 2))
print("Empty Array:\n", empty_arr)
# Output (arbitrary values that vary from run to run):
# Empty Array:
#  [[X. X.]
#  [X. X.]]
```
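A typical pattern, sketched here with illustrative names, is to allocate with np.empty() and then overwrite every slot before reading anything back.

```python
import numpy as np

n = 5
squares = np.empty(n, dtype=np.int64)  # contents are uninitialized at this point

# Fill every element before using the array.
for i in range(n):
    squares[i] = i * i

print(squares)  # [ 0  1  4  9 16]
```

For a computation this simple, a vectorized expression such as np.arange(n) ** 2 would be the more idiomatic choice; the loop is only there to show the allocate-then-fill pattern.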

5. Arrays with a Range of Numbers

np.arange() is analogous to Python’s built-in range() function but returns a NumPy array.

```python
# Create an array from 0 to 9 (exclusive of 10)
range_arr = np.arange(10)
print("Arange Array (0-9):", range_arr)
# Output: Arange Array (0-9): [0 1 2 3 4 5 6 7 8 9]

# Create an array from 0 up to (but not including) 10 with a step of 2
range_step_arr = np.arange(0, 10, 2)
print("Arange Array (0-8, step 2):", range_step_arr)
# Output: Arange Array (0-8, step 2): [0 2 4 6 8]
```

6. Linearly Spaced Arrays

np.linspace() creates an array of values evenly spaced over a specified interval. It takes the start, the stop, and the number of elements as arguments; unlike np.arange(), it includes the stop value by default.

```python
# Create an array of 5 linearly spaced values between 0 and 10
linspace_arr = np.linspace(0, 10, 5)
print("Linspace Array:", linspace_arr)
# Output: Linspace Array: [ 0.   2.5  5.   7.5 10. ]
```
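The difference between np.arange() and np.linspace() is easiest to see side by side. In this small sketch, arange is given a step and stops before the end value, while linspace is given a count and includes the end value unless you pass endpoint=False.

```python
import numpy as np

# arange: you choose the step; the stop value is excluded.
by_step = np.arange(0, 1, 0.25)
print(by_step)       # [0.   0.25 0.5  0.75]

# linspace: you choose the number of points; the stop value is included.
by_count = np.linspace(0, 1, 5)
print(by_count)      # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False makes linspace exclude the stop value as well.
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)     # [0.   0.25 0.5  0.75]
```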

7. Random Arrays

NumPy’s random module provides functions to create arrays filled with random numbers.

```python
# Create a 2×2 array of random floats between 0.0 and 1.0
random_float_arr = np.random.rand(2, 2)
print("Random Float Array:\n", random_float_arr)

# Create a 2×3 array of random integers between 0 (inclusive) and 10 (exclusive)
random_int_arr = np.random.randint(0, 10, size=(2, 3))
print("Random Integer Array:\n", random_int_arr)
```
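Newer NumPy code often uses a Generator object created with np.random.default_rng() instead of the module-level functions shown above. The sketch below produces the same kinds of arrays; the seed value 42 is an arbitrary choice made only so that the results are reproducible.

```python
import numpy as np

# A seeded generator gives the same sequence on every run.
rng = np.random.default_rng(seed=42)

floats = rng.random((2, 2))              # floats in [0.0, 1.0)
ints = rng.integers(0, 10, size=(2, 3))  # integers in [0, 10)

# A second generator with the same seed reproduces the first draw.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(floats, rng_again.random((2, 2))))  # True
```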

Array Attributes

NumPy arrays come with several useful attributes that provide information about their structure and the data they hold:

```python
a = np.array([[1, 2, 3], [4, 5, 6]])

print("Array 'a':\n", a)
print("Number of dimensions (ndim):", a.ndim)  # Output: 2 (it's a 2D array)
print("Shape of array (shape):", a.shape)  # Output: (2, 3) (2 rows, 3 columns)
print("Total number of elements (size):", a.size)  # Output: 6 (2 * 3 = 6 elements)
print("Data type of elements (dtype):", a.dtype)  # Output: int32 or int64 (depending on system)
print("Size of each element in bytes (itemsize):", a.itemsize)  # Output: 4 or 8 (e.g., 4 bytes for int32)
```
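These attributes are related to one another. As a small illustration (the dtype is fixed to int32 here so the numbers are the same on every system), the total memory used by the data, nbytes, equals size multiplied by itemsize.

```python
import numpy as np

# Fix the dtype so itemsize is 4 bytes regardless of platform.
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(a.size * a.itemsize)  # 24 (6 elements * 4 bytes each)
print(a.nbytes)             # 24 (the same value, provided directly)
print(len(a.shape))         # 2  (the length of shape equals ndim)
```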

Basic Operations

NumPy excels at performing fast, element-wise operations.

1. Element-wise Arithmetic Operations

Standard arithmetic operators apply element by element:

```python
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print("arr1 + arr2:", arr1 + arr2)  # Output: [5 7 9]
print("arr1 - arr2:", arr1 - arr2)  # Output: [-3 -3 -3]
print("arr1 * arr2:", arr1 * arr2)  # Output: [ 4 10 18]
print("arr1 / arr2:", arr1 / arr2)  # Output: [0.25 0.4  0.5 ] (float division)
print("arr1 ** 2:", arr1 ** 2)  # Output: [1 4 9] (element-wise exponentiation)
```
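One consequence worth noting is that * is always element-wise, even for 2-D arrays. If you want a dot product or a matrix product instead, the @ operator does that. The following sketch contrasts the two; the small matrix m is just an illustrative example.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(arr1 * arr2)  # [ 4 10 18]  (element-wise product)
print(arr1 @ arr2)  # 32          (dot product: 4 + 10 + 18)

m = np.array([[1, 2], [3, 4]])
print(m * m)
# [[ 1  4]
#  [ 9 16]]
print(m @ m)
# [[ 7 10]
#  [15 22]]
```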

2. Broadcasting

Broadcasting is a powerful feature that allows NumPy to perform operations on arrays of different shapes and sizes, provided they are compatible. NumPy automatically “stretches” the smaller array to match the larger one.

```python
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])  # A 1D array

print("Array 'a':\n", a)
print("Array 'b':", b)
print("a + b (with broadcasting):\n", a + b)
# Output (final print):
# a + b (with broadcasting):
#  [[11 22 33]
#  [14 25 36]]
```

Here, `b` is effectively added to each row of `a`.
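Whether two shapes are "compatible" is decided by comparing them from the last dimension backwards: each pair of dimensions must be equal, or one of them must be 1. The sketch below, using illustrative arrays, shows one combination that broadcasts and one that does not.

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])          # shape (3,)

# (3, 1) and (3,) broadcast to (3, 3): every column value meets every row value.
print(col + row)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]]

# (2, 3) and (2,) do not broadcast: the trailing dimensions 3 and 2 differ.
a = np.ones((2, 3))
b = np.ones(2)
error_message = None
try:
    a + b
except ValueError as exc:
    error_message = str(exc)
    print("Incompatible shapes:", error_message)

# Adding a new axis turns b into shape (2, 1), which is compatible with (2, 3).
print((a + b[:, np.newaxis]).shape)  # (2, 3)
```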

3. Comparison Operations

Comparison operators also work element-wise and return a boolean array:

```python
arr = np.array([1, 5, 2, 8, 3])
print("arr > 3:", arr > 3)
# Output: arr > 3: [False  True False  True False]
```
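Boolean arrays like this can be combined and counted. As a brief sketch: use & (and) or | (or) with parentheses around each comparison, rather than Python's and/or keywords, which do not work element-wise.

```python
import numpy as np

arr = np.array([1, 5, 2, 8, 3])

# Combine two conditions element-wise; the parentheses are required.
mask = (arr > 1) & (arr < 8)
print(mask)                       # [False  True  True False  True]

# Count how many elements satisfy the condition.
print(np.count_nonzero(mask))     # 3

# np.where picks from arr where the condition holds and uses 0 elsewhere.
print(np.where(arr > 3, arr, 0))  # [0 5 0 8 0]
```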

Indexing and Slicing

Accessing and modifying elements in NumPy arrays is similar to Python lists but with enhanced capabilities for multi-dimensional arrays.

1. Accessing Individual Elements

Elements are accessed using zero-based indexing. For multi-dimensional arrays, you provide an index for each dimension:

```python
arr = np.array([10, 20, 30, 40, 50])
print("First element:", arr[0])  # Output: 10
print("Last element (negative indexing):", arr[-1])  # Output: 50

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Matrix element at [1, 2] (row 1, column 2):", matrix[1, 2])  # Output: 6
```

2. Slicing Arrays

Slicing allows you to extract a portion (sub-array) of an array. The syntax is [start:stop:step], where the start index is included and the stop index is excluded.

```python
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print("Original array:", arr)
print("Elements from index 2 to 5 (inclusive):", arr[2:6])  # Output: [2 3 4 5]
print("Every other element:", arr[::2])  # Output: [0 2 4 6 8]
print("Reverse array:", arr[::-1])  # Output: [9 8 7 6 5 4 3 2 1 0]

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Original Matrix:\n", matrix)
print("First row:", matrix[0, :])  # Output: [1 2 3] (All columns of the first row)
print("First column:", matrix[:, 0])  # Output: [1 4 7] (All rows of the first column)
print("Sub-matrix (rows 0-1, cols 1-2):\n", matrix[0:2, 1:3])
# Output (final print):
# Sub-matrix (rows 0-1, cols 1-2):
#  [[2 3]
#  [5 6]]
```
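One difference from Python lists is worth seeing in code: a basic slice of a NumPy array is a view onto the same data, not a copy. The sketch below (with illustrative names) shows that writing through a slice changes the original, and that .copy() gives you independent data.

```python
import numpy as np

arr = np.arange(10)

# A slice is a view: it shares memory with the original array.
view = arr[2:5]
view[0] = 99
print(arr)     # [ 0  1 99  3  4  5  6  7  8  9]

# Use .copy() when you need data that is independent of the original.
independent = arr[2:5].copy()
independent[0] = -1
print(arr[2])  # 99 (unchanged by the edit to the copy)
```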

3. Boolean Indexing

You can use a boolean array to select elements that satisfy a certain condition. This is incredibly powerful for filtering data.

```python
arr = np.array([10, 20, 30, 40, 50])
condition = arr > 25
print("Elements greater than 25:", arr[condition])  # Output: [30 40 50]
print("Elements greater than 25 (direct):", arr[arr > 25])  # Shorthand, same output
```
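Boolean indexing also works on the left-hand side of an assignment, which is a common way to clean or clip data. A minimal sketch, with an arbitrarily chosen threshold and replacement value:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Replace every element greater than 25 with 0, in place.
arr[arr > 25] = 0
print(arr)  # [10 20  0  0  0]
```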

Reshaping Arrays

Reshaping changes the dimensions of an array without changing its data.

1. Using reshape()

The reshape() function allows you to change the shape of an array into another compatible shape. The new shape must have the same total number of elements as the original array.

```python
arr = np.arange(1, 10)  # 1D array from 1 to 9
print("Original 1D array:", arr)
# Output: Original 1D array: [1 2 3 4 5 6 7 8 9]

reshaped_arr = arr.reshape(3, 3)  # Reshape to a 3×3 2D array
print("Reshaped 3×3 array:\n", reshaped_arr)
# Output:
# Reshaped 3×3 array:
#  [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
```
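If the element counts do not match, reshape() raises a ValueError rather than padding or truncating. This short sketch tries to fit 9 elements into a 2×5 shape, which would need 10.

```python
import numpy as np

arr = np.arange(1, 10)  # 9 elements

reshape_error = None
try:
    arr.reshape(2, 5)   # 2 * 5 = 10 elements required, so this fails
except ValueError as exc:
    reshape_error = str(exc)
    print("Cannot reshape:", reshape_error)
```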

2. Using -1 for Automatic Dimension Calculation

You can use -1 in reshape() to let NumPy automatically calculate one of the dimensions based on the array’s size.

```python
arr = np.arange(1, 13)  # 1D array from 1 to 12
print("Original 1D array:", arr)
# Output: Original 1D array: [ 1  2  3  4  5  6  7  8  9 10 11 12]

# Reshape to 3 rows, let NumPy figure out the columns
reshaped_auto_cols = arr.reshape(3, -1)
print("Reshaped (3, -1) array:\n", reshaped_auto_cols)
# Output:
# Reshaped (3, -1) array:
#  [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

# Reshape to 2 columns, let NumPy figure out the rows
reshaped_auto_rows = arr.reshape(-1, 2)
print("Reshaped (-1, 2) array:\n", reshaped_auto_rows)
# Output:
# Reshaped (-1, 2) array:
#  [[ 1  2]
#  [ 3  4]
#  [ 5  6]
#  [ 7  8]
#  [ 9 10]
#  [11 12]]
```

3. Flattening Arrays

To convert a multi-dimensional array into a 1D array, you can use reshape(-1) or ravel().

```python
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print("Original Matrix:\n", matrix)
# Output:
# Original Matrix:
#  [[1 2 3]
#  [4 5 6]]

flattened_reshape = matrix.reshape(-1)
print("Flattened using reshape(-1):", flattened_reshape)  # Output: [1 2 3 4 5 6]

flattened_ravel = matrix.ravel()
print("Flattened using ravel():", flattened_ravel)  # Output: [1 2 3 4 5 6]
```
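A related method, flatten(), gives the same 1D result but always returns a copy, whereas ravel() returns a view when it can (as it does for the contiguous array below). The sketch shows the practical consequence: edits through the raveled array reach the original, and edits to the flattened one do not.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

raveled = matrix.ravel()      # a view here, because matrix is contiguous
flattened = matrix.flatten()  # always an independent copy

raveled[0] = 100    # changes matrix as well
flattened[1] = 200  # does not affect matrix

print(matrix)
# [[100   2   3]
#  [  4   5   6]]
print(np.shares_memory(matrix, raveled))    # True
print(np.shares_memory(matrix, flattened))  # False
```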

Common NumPy Functions

NumPy provides a wide array of mathematical and statistical functions that operate efficiently on arrays.

1. Mathematical Functions

These functions operate element-wise on arrays:

```python
arr = np.array([0, np.pi/2, np.pi])  # np.pi is NumPy's constant for Pi

print("Sine of arr:", np.sin(arr))  # Output: [0.00000000e+00 1.00000000e+00 1.22464680e-16] (approx 0, 1, 0)
print("Square root of arr:", np.sqrt(arr))  # Output: [0.         1.25331414 1.77245385]
print("Exponential of arr:", np.exp(arr))  # Output: [ 1.          4.81047738 23.14069263]
```
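The sine of pi comes out as a tiny nonzero number rather than exactly 0 because of floating-point rounding. When you need to compare such results against expected values, a tolerance-based check is the usual approach. A minimal sketch using np.isclose() and np.allclose() with their default tolerances:

```python
import numpy as np

arr = np.array([0, np.pi / 2, np.pi])
sines = np.sin(arr)

# Exact comparison fails for the last element because of rounding error.
print(sines[2] == 0.0)                # False

# Tolerance-based comparison treats the tiny error as zero.
print(np.isclose(sines[2], 0.0))      # True
print(np.allclose(sines, [0, 1, 0]))  # True
```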

2. Statistical Functions

NumPy offers functions for common statistical calculations across arrays or along specific axes (dimensions).

```python
data = np.array([[1, 2, 3], [4, 5, 6]])

print("Data:\n", data)
print("Minimum element:", np.min(data))  # Output: 1
print("Maximum element:", np.max(data))  # Output: 6
print("Sum of all elements:", np.sum(data))  # Output: 21
print("Mean of all elements:", np.mean(data))  # Output: 3.5
print("Standard deviation of all elements:", np.std(data))  # Output: 1.707825127659933
print("Median of all elements:", np.median(data))  # Output: 3.5

# Operations along an axis
print("Sum along columns (axis=0):", np.sum(data, axis=0))  # Output: [5 7 9] (sum of each column)
print("Mean along rows (axis=1):", np.mean(data, axis=1))  # Output: [2. 5.] (mean of each row)
```
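Axis-wise statistics combine naturally with broadcasting. As an illustrative sketch, the code below subtracts each row's mean from that row; keepdims=True keeps the result as a column of shape (2, 1) so that it lines up with the rows of the original array.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

# keepdims=True keeps the reduced axis with length 1: shape (2, 1) instead of (2,).
row_means = data.mean(axis=1, keepdims=True)
print(row_means.shape)  # (2, 1)

# Broadcasting subtracts each row's mean from every element in that row.
centered = data - row_means
print(centered)
# [[-1.  0.  1.]
#  [-1.  0.  1.]]

# The same statistics are also available as array methods.
print(data.max(axis=0))  # [4 5 6] (maximum of each column)
```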

Conclusion

This tutorial has guided you through the fundamental aspects of NumPy, covering its importance, installation, various methods for array creation, essential array attributes, basic element-wise operations, advanced indexing and slicing techniques, and array reshaping. You also explored some of the most commonly used mathematical and statistical functions.

NumPy’s efficiency and comprehensive features make it an indispensable tool for anyone working with numerical data in Python. This foundational knowledge will serve as a strong stepping stone for diving into more advanced topics like linear algebra operations, advanced array manipulation, and integrating NumPy with other powerful libraries in the Python data science stack. Continue practicing with these concepts, and you’ll quickly unlock the full potential of numerical computing with NumPy.
