Introduction to pandas
pandas is a powerful Python library for data analysis and manipulation. It provides data structures such as Series (one-dimensional labeled arrays) and DataFrame (two-dimensional tables) that simplify working with structured data. With pandas, you can read, clean, transform, and analyze data from a variety of sources, including CSV files, Excel files, and databases.
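As a quick illustration of the two structures, here is a minimal sketch that builds a Series and a DataFrame by hand. The names and values are made up for the example.

```python
import pandas as pd

# A Series is a one-dimensional array with a label (index) for each value
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"])
print(ages["Bob"])  # look up a value by its label

# A DataFrame is a two-dimensional table; each column is itself a Series
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})
print(people.shape)         # (rows, columns)
print(type(people["Age"]))  # a single column comes back as a Series
```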
Install pandas
You can install pandas with the pip command:
pip install pandas
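To check that the installation worked, you can import pandas and print its version. The exact version string depends on what pip installed.

```python
import pandas as pd

# If this import succeeds, pandas is available in the current environment
print(pd.__version__)
```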
How to read CSV files
Here is a small CSV file. Let’s see how to open it and work with it using pandas.
ID,Name,Age,Salary,City
1,Alice,25,50000,New York
2,Bob,30,55000,Los Angeles
3,Charlie,35,60000,Chicago
4,David,40,65000,Houston
5,Eve,28,52000,San Francisco
6,Frank,33,58000,Seattle
7,Grace,29,53000,Boston
8,Hank,45,70000,Denver
9,Ivy,31,56000,Miami
10,Jack,38,62000,Atlanta
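If you want to follow along, save the data above as sample_data.csv. One way to do that from Python is sketched below using only the standard library; the file name matches the one used in the next step.

```python
from pathlib import Path

# The sample data from above, exactly as it appears in the CSV file
csv_text = """ID,Name,Age,Salary,City
1,Alice,25,50000,New York
2,Bob,30,55000,Los Angeles
3,Charlie,35,60000,Chicago
4,David,40,65000,Houston
5,Eve,28,52000,San Francisco
6,Frank,33,58000,Seattle
7,Grace,29,53000,Boston
8,Hank,45,70000,Denver
9,Ivy,31,56000,Miami
10,Jack,38,62000,Atlanta
"""

# Write it to sample_data.csv in the current working directory
Path("sample_data.csv").write_text(csv_text, encoding="utf-8")
```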
Read the CSV file using read_csv()
read_csv() reads the given file and returns a pandas DataFrame. A DataFrame makes it easy to preprocess the data (filter, edit, sort).
import pandas as pd
df = pd.read_csv('sample_data.csv')
Here, sample_data.csv is a file in the current working directory (usually the directory you run the script from), and df is the resulting pandas DataFrame.
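To confirm what read_csv() loaded, you can check the DataFrame's shape and column types. The sketch below reads the same data from an in-memory string through io.StringIO, so it runs without the file, and also shows a simple filter, the kind of preprocessing mentioned earlier.

```python
import io

import pandas as pd

csv_text = """ID,Name,Age,Salary,City
1,Alice,25,50000,New York
2,Bob,30,55000,Los Angeles
3,Charlie,35,60000,Chicago
4,David,40,65000,Houston
5,Eve,28,52000,San Francisco
6,Frank,33,58000,Seattle
7,Grace,29,53000,Boston
8,Hank,45,70000,Denver
9,Ivy,31,56000,Miami
10,Jack,38,62000,Atlanta
"""

# read_csv accepts a file path or any file-like object, such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types inferred by read_csv

# Filter: keep only the rows where Age is greater than 35
older = df[df["Age"] > 35]
print(older)
```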
Getting the first five and last five rows.
first_five = df.head(5)
print(first_five)
last_five = df.tail(5)
print(last_five)
   ID     Name  Age  Salary           City
0   1    Alice   25   50000       New York
1   2      Bob   30   55000    Los Angeles
2   3  Charlie   35   60000        Chicago
3   4    David   40   65000        Houston
4   5      Eve   28   52000  San Francisco
   ID   Name  Age  Salary     City
5   6  Frank   33   58000  Seattle
6   7  Grace   29   53000   Boston
7   8   Hank   45   70000   Denver
8   9    Ivy   31   56000    Miami
9  10   Jack   38   62000  Atlanta
head(n) and tail(n) each take one argument, n: the number of rows you want. If you leave it out, n defaults to 5.
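For example, passing a different n changes how many rows come back, and calling either method with no argument falls back to five rows. A small sketch on a trimmed copy of the sample data (Name and Age columns only) so it runs on its own:

```python
import pandas as pd

# A trimmed copy of the sample data (Name and Age columns only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Hank", "Ivy", "Jack"],
    "Age": [25, 30, 35, 40, 28, 33, 29, 45, 31, 38],
})

first_three = df.head(3)  # first 3 rows
last_two = df.tail(2)     # last 2 rows
default_head = df.head()  # no argument: n defaults to 5

print(first_three)
print(last_two)
print(len(default_head))
```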
Finding the mean of a column.
(The mean is the average: the sum of the values divided by the number of values.)
df['Salary'].mean()
58100.0
You might have noticed the df['Salary'] syntax. This is how you select an entire column in pandas; the result is a Series.
After selecting the ‘Salary’ column, we can compute its mean with .mean().
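To see what the column selection returns and how .mean() relates to the definition above, here is a sketch using the Salary values from the sample data: df['Salary'] is a Series, and its mean equals the sum divided by the count.

```python
import pandas as pd

# Salary column from the sample data
df = pd.DataFrame({
    "Salary": [50000, 55000, 60000, 65000, 52000,
               58000, 53000, 70000, 56000, 62000],
})

salary = df["Salary"]  # selecting one column gives a Series
print(type(salary))

# The mean is the sum of the values divided by how many there are
manual_mean = salary.sum() / salary.count()
print(manual_mean)  # 58100.0
print(salary.mean() == manual_mean)
```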
Sort a DataFrame by a column’s values.
Let’s sort all the rows by the Age column. sort_values() returns a new, sorted DataFrame and leaves df itself unchanged.
df.sort_values('Age')
   ID     Name  Age  Salary           City
0   1    Alice   25   50000       New York
4   5      Eve   28   52000  San Francisco
6   7    Grace   29   53000         Boston
1   2      Bob   30   55000    Los Angeles
8   9      Ivy   31   56000          Miami
5   6    Frank   33   58000        Seattle
2   3  Charlie   35   60000        Chicago
9  10     Jack   38   62000        Atlanta
3   4    David   40   65000        Houston
7   8     Hank   45   70000         Denver
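sort_values() sorts in ascending order by default. The sketch below, again on a trimmed copy of the sample data, uses the ascending=False parameter for descending order and checks that the original DataFrame keeps its order, since a new DataFrame is returned.

```python
import pandas as pd

# A trimmed copy of the sample data (Name and Age columns only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve",
             "Frank", "Grace", "Hank", "Ivy", "Jack"],
    "Age": [25, 30, 35, 40, 28, 33, 29, 45, 31, 38],
})

# Oldest first: ascending=False reverses the sort order
oldest_first = df.sort_values("Age", ascending=False)
print(oldest_first.head(3))

# sort_values returns a new DataFrame; df itself keeps its original order
print(df["Name"].iloc[0])
```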