
Complete NumPy Tutorial from Scratch – Part 1


NumPy is one of the simplest and most efficient tools for scientific computing and data manipulation in Python.

If you’re working on data analysis or machine learning projects, a solid understanding of NumPy is almost mandatory: other data processing packages such as pandas are built on top of NumPy, and scikit-learn, which is used to build machine learning applications, works extensively with NumPy arrays as well.

Now, the question is: “What does a NumPy array provide?”

At its core, NumPy provides the ndarray object, short for n-dimensional array.
An ndarray (or simply ‘array’) stores multiple elements of the same data type. It is the facilities built around the array object that make NumPy so convenient for math and data manipulation. It is also much faster than a plain Python list. To demonstrate, the code below times adding 1 to each of 1,000,000 numbers, first with a list comprehension and then with a NumPy array.

# %timeit is an IPython/Jupyter magic command
import numpy as np

# Build a list of 1000000 elements, adding 1 to each value in a Python loop
%timeit [i+1 for i in range(1000000)]

#> 121 ms ± 2.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Build a NumPy array of 1000000 elements and add 1 to all of them
# in a single vectorized operation
%timeit np.arange(1000000)+1

#> 5.5 ms ± 264 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In this run, the NumPy version is roughly 20 times faster than the list comprehension.
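The `%timeit` magic only works in IPython or Jupyter. As a rough sketch for a plain Python script, the standard-library `timeit` module can run the same comparison. The repeat count of 5 is an arbitrary choice, and the absolute timings will depend on your machine.

```python
import timeit

import numpy as np

n = 1_000_000  # same element count as the example above
runs = 5       # arbitrary number of repetitions

# Time the pure-Python list comprehension
list_time = timeit.timeit(lambda: [i + 1 for i in range(n)], number=runs)

# Time the equivalent vectorized NumPy expression
numpy_time = timeit.timeit(lambda: np.arange(n) + 1, number=runs)

print(f"list comprehension: {list_time / runs * 1000:.1f} ms per run")
print(f"NumPy arange + 1:   {numpy_time / runs * 1000:.1f} ms per run")

# Both approaches produce the same values
same_values = [i + 1 for i in range(n)] == (np.arange(n) + 1).tolist()
print(same_values)
```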

Now you may think: ‘I can store numbers and other items in a Python list and do all kinds of computation and manipulation with list comprehensions. Why do I need a NumPy array?’

Well, NumPy arrays have several significant advantages over lists.

To understand them, let’s first see how to create a NumPy array.

How to create a NumPy array?

There are several ways to create a NumPy array, and this tutorial covers most of them. One of the most common is to pass a list or a list-like object to the np.array function.

First, we have to import the NumPy library, like this:

import numpy as np

So, let’s create our first NumPy array.

# Create a 1d NumPy array from a list
a = np.array([0, 1, 2, 3, 4, 5])

# Print the array
print(a)

#> [0 1 2 3 4 5]

You can check the type; it will show numpy.ndarray:

print(type(a))

#> <class 'numpy.ndarray'>

The main difference between a NumPy array and a list is that NumPy arrays are built to perform vectorized operations, whereas Python lists are not.
This means that when you apply a function or an operator to an array, it is applied to every element of the array rather than to the array object as a whole.

Suppose you want to add 2 to every element of the NumPy array. The natural way to do so is:

# Add 2 to each element of a
a + 2

#> array([2, 3, 4, 5, 6, 7])
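For contrast, here is a small sketch of the same operation on a plain Python list. The names `values` and `arr` are made up for illustration. A list does not support `+ 2` directly, so each element has to be handled in a loop or a comprehension.

```python
import numpy as np

values = [0, 1, 2, 3, 4, 5]   # plain Python list
arr = np.array(values)        # the same data as a NumPy array

# With a list, each element has to be handled explicitly
list_result = [v + 2 for v in values]

# With an array, the operation is applied to every element at once
array_result = arr + 2

print(list_result)            # [2, 3, 4, 5, 6, 7]
print(array_result.tolist())  # [2, 3, 4, 5, 6, 7]

# Note: `values + 2` raises a TypeError, because a list can only be
# concatenated with another list.
```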

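You can also create a NumPy array with the arange function.

# Create an array of the integers from 1 to 9 (the stop value 10 is excluded)
np.arange(1, 10)

#> array([1, 2, 3, 4, 5, 6, 7, 8, 9])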
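As a brief sketch of how the arguments work, `np.arange` accepts a start, a stop (which is excluded), and an optional step, much like Python's built-in `range`. The variable names below are only for illustration.

```python
import numpy as np

from_zero = np.arange(5)          # stop only: starts at 0
one_to_nine = np.arange(1, 10)    # start and stop
evens = np.arange(0, 10, 2)       # start, stop and step

print(from_zero.tolist())    # [0, 1, 2, 3, 4]
print(one_to_nine.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(evens.tolist())        # [0, 2, 4, 6, 8]
```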

If you want to load data from a text file into a NumPy array, use the loadtxt function. (The load function is for NumPy’s own binary .npy and .npz files.)

# Load data.txt, which is comma separated
np.loadtxt('data.txt', delimiter=',')
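To try this without a file on disk, here is a minimal sketch that uses `np.loadtxt`, the NumPy function for reading delimited text. An in-memory `io.StringIO` buffer stands in for the comma-separated file, and its contents are made up for illustration.

```python
import io

import numpy as np

# Stand-in for a comma-separated file such as data.txt (contents are made up)
fake_file = io.StringIO("1,2,3\n4,5,6\n")

# np.loadtxt accepts a filename or a file-like object
data = np.loadtxt(fake_file, delimiter=",")

print(data.tolist())  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(data.shape)     # (2, 3)
print(data.dtype)     # float64
```

By default `np.loadtxt` reads every value as a float; pass a `dtype` argument if you need something else.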

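You can create an uninitialized NumPy array with the empty function. Its elements hold whatever values happened to be in memory, so your output will differ.

# Create a 3x3 array without initializing its values
c = np.empty((3, 3))
print(c)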

array([[0.01388889, 0.        , 0.        ],
       [0.        , 0.01388889, 0.        ],
       [0.        , 0.        , 1.        ]])
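Because the contents of an uninitialized array are arbitrary, you should assign every element before reading it. The sketch below contrasts `np.empty` with `np.zeros`, a related function not covered above, which does guarantee its contents.

```python
import numpy as np

c = np.empty((3, 3))   # uninitialized: contents are arbitrary
z = np.zeros((3, 3))   # every element is guaranteed to be 0.0

print(c.shape, c.dtype)  # (3, 3) float64
print(z.tolist())        # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# Only the shape and dtype of c are predictable, so fill it before use
c[:] = 7.0
print(c.tolist())        # [[7.0, 7.0, 7.0], [7.0, 7.0, 7.0], [7.0, 7.0, 7.0]]
```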

You can also create an identity matrix with the identity function.

# Create a 4x4 identity matrix
d = np.identity(4)
print(d)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

Creating a 2-d Array:

NumPy arrays can have any number of dimensions. For example, to create a 2d array, pass np.array a list of lists; each inner list becomes one row.

# Create a 2d array
ar2d = np.array([[0,1,2], [3,4,5], [6,7,8]])

#> array([[0, 1, 2],
#>        [3, 4, 5],
#>        [6, 7, 8]])

Another point is that once a NumPy array is created, you cannot increase its size; you have to create a new array to do so. Python lists, by contrast, can grow in place, and appending to a list is a common operation.
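A short sketch of the difference, with illustrative names: `np.append` does not grow the array in place but returns a new, larger array, whereas `list.append` modifies the list itself.

```python
import numpy as np

arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)   # returns a new array; arr is left untouched

print(arr.tolist())     # [1, 2, 3]
print(bigger.tolist())  # [1, 2, 3, 4]

items = [1, 2, 3]
items.append(4)         # a list grows in place
print(items)            # [1, 2, 3, 4]
```

Since every `np.append` call copies the data, calling it repeatedly in a loop is slow. It is usually better to collect values in a list and convert to an array once at the end.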

You can also specify the data type by setting the dtype argument. Some of the most commonly used NumPy dtypes are ‘float’, ‘int’, ‘bool’, ‘str’, and ‘object’.
To control memory use, you can choose a specific size such as ‘float32’, ‘float64’, ‘int8’, ‘int16’, or ‘int32’.

# Create a float 2d array
ar2d_f = np.array([[0,1,2], [3,4,5], [6,7,8]], dtype='float')

#> array([[ 0.,  1.,  2.],
#>        [ 3.,  4.,  5.],
#>        [ 6.,  7.,  8.]])
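To see how the choice of dtype affects memory, this sketch builds the same 3x3 data with several dtypes and reports `itemsize` (bytes per element) and `nbytes` (bytes for the whole array). The `sizes` dictionary is just a helper for this illustration.

```python
import numpy as np

data = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

sizes = {}
for dtype in ("float64", "float32", "int16", "int8"):
    arr = np.array(data, dtype=dtype)
    # itemsize is bytes per element, nbytes is the total for all 9 elements
    sizes[dtype] = (arr.itemsize, arr.nbytes)
    print(dtype, arr.itemsize, arr.nbytes)

# float64 8 72
# float32 4 36
# int16 2 18
# int8 1 9
```

Smaller dtypes save memory but hold a narrower range of values; for example, int8 can only store integers from -128 to 127.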

The decimal point after each number indicates the float data type. You can convert the array to another data type with the astype method.

# Convert it to the 'int' datatype
ar2d_f.astype('int')

#> array([[0, 1, 2],
#>        [3, 4, 5],
#>        [6, 7, 8]])
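Two details are worth knowing about this conversion: astype returns a new array instead of modifying the original, and converting floats to integers drops the fractional part rather than rounding. A small sketch with made-up values:

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.0])
ints = floats.astype('int')   # returns a new array; floats is unchanged

print(ints.tolist())    # [1, -1, 2]
print(floats.tolist())  # [1.7, -1.7, 2.0]
```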

Unlike a list, a NumPy array must have all its items of the same data type. This is another important difference. However, if you want to store values of different types in the same array, you can set the dtype to ‘object’.

# Create an object array to store numbers as well as strings
ar1d_obj = np.array([3, 'b'], dtype='object')

#> array([3, 'b'], dtype=object)
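To see why the dtype matters here, the sketch below creates the same mixed data with and without dtype='object'. Without it, NumPy picks one common type for all items, which in this case is a string type, so the number 3 is stored as the text '3'. The variable names are illustrative.

```python
import numpy as np

mixed_default = np.array([3, 'b'])                  # NumPy picks a common type
mixed_object = np.array([3, 'b'], dtype='object')   # original objects are kept

print(mixed_default.dtype.kind)   # U  (Unicode string)
print(mixed_default.tolist())     # ['3', 'b']
print(type(mixed_object[0]))      # <class 'int'>
```

Object arrays keep the flexibility of a list, but they give up most of the speed and memory benefits of a numeric dtype.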


To summarise, the main differences between NumPy arrays and Python lists are:

  1. NumPy arrays support vectorized operations, while lists don’t.
  2. Once a NumPy array is created, you cannot change its size. You will have to create a new array or overwrite the existing one.
  3. Every NumPy array has one and only one dtype. All items in it should be of that type.
  4. An equivalent NumPy array occupies much less space than a Python list of lists.
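The last point can be checked with a rough sketch. A list stores pointers to separate Python int objects, so both the list and its items are counted here with `sys.getsizeof`; an array stores the raw values in one contiguous buffer, reported by `nbytes`. The exact numbers depend on your Python version and platform, so none are shown.

```python
import sys

import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n)

# Size of the list object itself plus every int object it points to
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# Size of the array's data buffer
array_bytes = arr.nbytes

print(f"list:  {list_bytes} bytes")
print(f"array: {array_bytes} bytes")
```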

