Pandas Introduction - GeeksforGeeks

Pandas Introduction

Last Updated : 23 Apr, 2024

Pandas is a powerful, open-source Python library for data manipulation and analysis. It provides data structures and functions for performing efficient operations on data.

This free tutorial gives an overview of Pandas and covers the fundamentals of using it in Python.

What is the Pandas Library in Python?

Pandas is a powerful and versatile library that simplifies data manipulation tasks in Python.

Pandas is well-suited for working with tabular data, such as spreadsheets or SQL tables.

The Pandas library is an essential tool for data analysts, scientists, and engineers working with structured data in Python.

Did you know?

The name Pandas is derived from “panel data”, and it is also read as “Python Data Analysis”.

What is Python Pandas used for?

The Pandas library is widely used in data science, but have you wondered why? A large part of the reason is that Pandas works hand in hand with the other libraries that make up the Python data science ecosystem.

It is built on top of the NumPy library, which means that many NumPy data structures and operations are used or replicated in Pandas.
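As a small illustration of this NumPy foundation, the sketch below converts a Series to a NumPy array and applies a NumPy function to the Series directly. The values are arbitrary and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# A Series stores its values in a NumPy-compatible array
ser = pd.Series([1.0, 4.0, 9.0])

# .to_numpy() returns the values as a NumPy ndarray
arr = ser.to_numpy()
print(type(arr))  # <class 'numpy.ndarray'>

# NumPy universal functions (ufuncs) applied to a Series return a Series
roots = np.sqrt(ser)
print(roots)
```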

The data produced by Pandas is often used as input for plotting functions in Matplotlib, statistical analysis in SciPy, and machine learning algorithms in Scikit-learn.

You may be wondering why you should use the Pandas library. Pandas is one of the most widely used Python tools for analyzing, cleaning, and manipulating data.

Here is a list of things that we can do using Pandas.

  • Data set cleaning, merging, and joining.
  • Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data.
  • Columns can be inserted into and deleted from a DataFrame.
  • Powerful group-by functionality for performing split-apply-combine operations on data sets.
  • Data visualization.
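The sketch below touches several items in this list using two tiny, made-up tables: merging on a shared key, filling a missing (NaN) value, inserting and deleting a column, and a group-by aggregation. The column names and values are illustrative assumptions, not data from this article.

```python
import numpy as np
import pandas as pd

# Two small illustrative tables sharing a "key" column
left = pd.DataFrame({"key": ["a", "b", "c"], "sales": [10.0, np.nan, 30.0]})
right = pd.DataFrame({"key": ["a", "b", "c"], "region": ["north", "south", "north"]})

# Merging/joining: combine the two tables on the shared "key" column
merged = pd.merge(left, right, on="key")

# Missing data: NaN marks the missing value; fillna() replaces it with 0.0
merged["sales"] = merged["sales"].fillna(0.0)

# Inserting and deleting columns
merged["bonus"] = merged["sales"] * 0.1   # insert a new column
merged = merged.drop(columns=["bonus"])   # delete it again

# Group by (split-apply-combine): total sales per region
totals = merged.groupby("region")["sales"].sum()
print(totals)
```

Plotting is left out of this sketch because DataFrame.plot() additionally requires Matplotlib to be installed.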

Getting Started with Pandas

Let’s see how to start working with the Python Pandas library:

Installing Pandas

The first step in working with Pandas is to check whether it is already installed on your system. If it is not, install it using the pip command.
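If you prefer to check from Python itself, one option is to ask the import system whether it can locate the package without actually importing it. A minimal sketch; the helper name is_installed is hypothetical:

```python
import importlib.util


def is_installed(package_name: str = "pandas") -> bool:
    """Hypothetical helper: True if the current interpreter can find the package."""
    # find_spec() returns None when no top-level module with this name is found
    return importlib.util.find_spec(package_name) is not None


print(is_installed("pandas"))
```

From the command line, pip show pandas prints the installed version, or a warning if the package is not found.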

Follow these steps to install Pandas:

Step 1: On Windows, type ‘cmd’ in the search box and open the Command Prompt.
Step 2: If pip is not on your PATH, use the cd command to navigate to the folder where pip is installed.
Step 3: Type the following command:

pip install pandas

For more details, refer to the article on installing Pandas.

Importing Pandas

Once Pandas is installed, you need to import it. The library is conventionally imported as follows:

import pandas as pd

Note: Here, pd is an alias for Pandas. Importing the library under an alias is not required; it simply means less typing every time a function or attribute is accessed.
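To see that the alias is only a second name for the same module, the following sketch imports Pandas both with and without it:

```python
import pandas
import pandas as pd

# The alias is just another name bound to the same module object
print(pd is pandas)  # True

# Both names reach the same classes and functions
s1 = pandas.Series([1, 2, 3])
s2 = pd.Series([1, 2, 3])
print(s1.equals(s2))  # True
```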

Data Structures in Pandas Library

Pandas provides two primary data structures for manipulating data:

  • Series
  • DataFrame

Pandas Series

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively referred to as the index.

A Pandas Series is similar to a single column in an Excel sheet. Labels need not be unique, but they must be of a hashable type.

The object supports both integer and label-based indexing and provides a host of methods for performing operations involving the index.
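A short sketch of label-based and integer-based indexing on a Series. The labels and values are arbitrary, and the label b is deliberately repeated to show that labels need not be unique:

```python
import pandas as pd

# A Series with string labels; "b" appears twice (labels need not be unique)
ser = pd.Series([10, 20, 30, 40], index=["a", "b", "b", "c"])

# Label-based indexing with .loc
print(ser.loc["a"])  # 10

# A repeated label returns every matching entry, as a Series
print(ser.loc["b"])

# Integer position-based indexing with .iloc
print(ser.iloc[0])   # 10
print(ser.iloc[-1])  # 40

# The labels are available as the Series' index
print(ser.index.tolist())  # ['a', 'b', 'b', 'c']
```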


Creating a Series

In practice, a Pandas Series is often created by loading data from existing storage, such as a SQL database, a CSV file, or an Excel file.

A Series can also be created directly from lists, dictionaries, scalar values, NumPy arrays, etc.

Example: Creating a series using the Pandas Library.

Python3




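import pandas as pd
import numpy as np

# Creating an empty Series; passing dtype explicitly avoids
# version-dependent defaults for empty Series
ser = pd.Series(dtype="float64")
print("Pandas Series:", ser)

# Creating a Series from a simple NumPy array of characters
data = np.array(['g', 'e', 'e', 'k', 's'])

ser = pd.Series(data)
print("Pandas Series:\n", ser)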


Output

Pandas Series: Series([], dtype: float64)
Pandas Series:
0 g
1 e
2 e
3 k
4 s
dtype: object
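Besides NumPy arrays, a Series can also be built from a list, a dictionary, or a scalar value, as mentioned above. A brief sketch with illustrative values:

```python
import pandas as pd

# From a list: a default integer index 0..n-1 is created
ser_from_list = pd.Series([10, 20, 30])
print(ser_from_list)

# From a dictionary: keys become the index labels, values become the data
ser_from_dict = pd.Series({"a": 1, "b": 2, "c": 3})
print(ser_from_dict)

# From a scalar: the value is repeated once for every label in the index
ser_from_scalar = pd.Series(5, index=["x", "y", "z"])
print(ser_from_scalar)
```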

For more information, refer to Creating a Pandas Series

Pandas DataFrame

A Pandas DataFrame is a two-dimensional data structure with labeled axes (rows and columns).
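To make the two labeled axes concrete, the sketch below builds a small DataFrame with explicit row labels and inspects its shape, row index, and column labels. The data is invented for illustration:

```python
import pandas as pd

# A small illustrative table: each dictionary key becomes a column
df = pd.DataFrame(
    {"name": ["Asha", "Ben", "Chen"], "age": [28, 35, 41]},
    index=["r1", "r2", "r3"],
)

print(df.shape)             # (3, 2): 3 rows, 2 columns
print(df.index.tolist())    # row labels: ['r1', 'r2', 'r3']
print(df.columns.tolist())  # column labels: ['name', 'age']

# Selecting one column returns a Series that shares the row labels
ages = df["age"]
print(type(ages).__name__)  # Series

# Selecting a single cell by row label and column label
print(df.loc["r2", "age"])  # 35
```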

Creating DataFrame

In practice, a Pandas DataFrame is often created by loading data from existing storage, such as a SQL database, a CSV file, or an Excel file.
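As a minimal sketch of loading data from storage, the example below reads CSV text with pd.read_csv. An in-memory io.StringIO buffer stands in for a real CSV file so the snippet is self-contained; the column names and values are made up. For Excel files and SQL databases, Pandas offers pd.read_excel and pd.read_sql, which may need extra packages (such as an Excel engine or a database driver) that are not shown here.

```python
from io import StringIO

import pandas as pd

# Inline CSV text standing in for the contents of a real .csv file
csv_text = """city,temperature
Oslo,4.5
Cairo,29.1
Lima,18.0
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(StringIO(csv_text))
print(df)

# A single column of the loaded DataFrame is a Series
temps = df["temperature"]
print(temps.mean())
```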

A DataFrame can also be created directly from lists, dictionaries, lists of dictionaries, etc.

Example: Creating a DataFrame Using the Pandas Library

Python3




import pandas as pd

# Calling the DataFrame constructor with no data creates an empty DataFrame
df = pd.DataFrame()
print(df)

# A list of strings
lst = ['Geeks', 'For', 'Geeks', 'is', 'portal', 'for', 'Geeks']

# Calling the DataFrame constructor on the list (one column, labeled 0)
df = pd.DataFrame(lst)
print(df)


Output:

Empty DataFrame
Columns: []
Index: []
0
0 Geeks
1 For
2 Geeks
3 is
4 portal
5 for
6 Geeks
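The example above builds a DataFrame from a list. Dictionaries and lists of dictionaries work as well; a brief sketch with illustrative data:

```python
import pandas as pd

# From a dictionary of lists: keys become column names
df_from_dict = pd.DataFrame({"word": ["Geeks", "For"], "length": [5, 3]})
print(df_from_dict)

# From a list of dictionaries: each dictionary becomes one row;
# keys missing from a row are filled with NaN
records = [{"word": "Geeks", "length": 5}, {"word": "portal"}]
df_from_records = pd.DataFrame(records)
print(df_from_records)
```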

Note: For more information, refer to Creating a Pandas DataFrame 

How to Run a Pandas Program in Python?

A Pandas program can be written in any text editor and run as a regular Python script, but Jupyter Notebook is a popular choice because it lets you execute code one cell at a time rather than running the entire file.

Jupyter also displays DataFrames as formatted tables and shows plots inline, which makes results easy to inspect.

Note: For more information on Jupyter Notebook, refer to How To Use Jupyter Notebook – An Ultimate Guide 

Conclusion

This tutorial introduced the Pandas library: what it is, what it is used for, and how to install and import it. It also covered the two core Pandas data structures, Series and DataFrame, with examples.

After completing this tutorial, you should have a clear idea of what Python Pandas is, what it is used for, and how to start using it.

As you apply these skills to your projects, you will discover how Pandas enhances your ability to explore, clean, and analyze data, making it an indispensable tool in the data scientist’s toolkit.


