- Python Pandas Tutorial
- Python Pandas - Home
- Python Pandas - Introduction
- Python Pandas - Environment Setup
- Introduction to Data Structures
- Python Pandas - Series
- Python Pandas - DataFrame
- Python Pandas - Panel
- Python Pandas - Basic Functionality
- Descriptive Statistics
- Function Application
- Python Pandas - Reindexing
- Python Pandas - Iteration
- Python Pandas - Sorting
- Working with Text Data
- Options & Customization
- Indexing & Selecting Data
- Statistical Functions
- Python Pandas - Window Functions
- Python Pandas - Aggregations
- Python Pandas - Missing Data
- Python Pandas - GroupBy
- Python Pandas - Merging/Joining
- Python Pandas - Concatenation
- Python Pandas - Date Functionality
- Python Pandas - Timedelta
- Python Pandas - Categorical Data
- Python Pandas - Visualization
- Python Pandas - IO Tools
- Python Pandas - Sparse Data
- Python Pandas - Caveats & Gotchas
- Comparison with SQL
- Python Pandas Useful Resources
- Python Pandas - Quick Guide
- Python Pandas - Useful Resources
- Python Pandas - Discussion
Python Pandas - IO Tools
The Pandas I/O API is a set of top level reader functions accessed like pd.read_csv() that generally return a Pandas object.
The two workhorse functions for reading text files (or the flat files) are read_csv() and read_table(). They both use the same parsing code to intelligently convert tabular data into a DataFrame object −
pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None
pandas.read_csv(filepath_or_buffer, sep='\t', delimiter=None, header='infer', names=None, index_col=None, usecols=None
Here is how the csv file data looks like −
S.No,Name,Age,City,Salary 1,Tom,28,Toronto,20000 2,Lee,32,HongKong,3000 3,Steven,43,Bay Area,8300 4,Ram,38,Hyderabad,3900
Save this data as temp.csv and conduct operations on it.
S.No,Name,Age,City,Salary 1,Tom,28,Toronto,20000 2,Lee,32,HongKong,3000 3,Steven,43,Bay Area,8300 4,Ram,38,Hyderabad,3900
Save this data as temp.csv and conduct operations on it.
read.csv
read.csv reads data from the csv files and creates a DataFrame object.
import pandas as pd df=pd.read_csv("temp.csv") print df
Its output is as follows −
S.No Name Age City Salary 0 1 Tom 28 Toronto 20000 1 2 Lee 32 HongKong 3000 2 3 Steven 43 Bay Area 8300 3 4 Ram 38 Hyderabad 3900
custom index
This specifies a column in the csv file to customize the index using index_col.
import pandas as pd df=pd.read_csv("temp.csv",index_col=['S.No']) print df
Its output is as follows −
S.No Name Age City Salary 1 Tom 28 Toronto 20000 2 Lee 32 HongKong 3000 3 Steven 43 Bay Area 8300 4 Ram 38 Hyderabad 3900
Converters
dtype of the columns can be passed as a dict.
import pandas as pd df = pd.read_csv("temp.csv", dtype={'Salary': np.float64}) print df.dtypes
Its output is as follows −
S.No int64 Name object Age int64 City object Salary float64 dtype: object
By default, the dtype of the Salary column is int, but the result shows it as float because we have explicitly casted the type.
Thus, the data looks like float −
S.No Name Age City Salary 0 1 Tom 28 Toronto 20000.0 1 2 Lee 32 HongKong 3000.0 2 3 Steven 43 Bay Area 8300.0 3 4 Ram 38 Hyderabad 3900.0
header_names
Specify the names of the header using the names argument.
import pandas as pd df=pd.read_csv("temp.csv", names=['a', 'b', 'c','d','e']) print df
Its output is as follows −
a b c d e 0 S.No Name Age City Salary 1 1 Tom 28 Toronto 20000 2 2 Lee 32 HongKong 3000 3 3 Steven 43 Bay Area 8300 4 4 Ram 38 Hyderabad 3900
Observe, the header names are appended with the custom names, but the header in the file has not been eliminated. Now, we use the header argument to remove that.
If the header is in a row other than the first, pass the row number to header. This will skip the preceding rows.
import pandas as pd df=pd.read_csv("temp.csv",names=['a','b','c','d','e'],header=0) print df
Its output is as follows −
a b c d e 0 S.No Name Age City Salary 1 1 Tom 28 Toronto 20000 2 2 Lee 32 HongKong 3000 3 3 Steven 43 Bay Area 8300 4 4 Ram 38 Hyderabad 3900
skiprows
skiprows skips the number of rows specified.
import pandas as pd df=pd.read_csv("temp.csv", skiprows=2) print df
Its output is as follows −
2 Lee 32 HongKong 3000 0 3 Steven 43 Bay Area 8300 1 4 Ram 38 Hyderabad 3900