June 17, 2019 * Python Programming

Pandas - Remove or drop columns from Pandas dataframe

Temporary columns are often created within a dataframe and need to be removed later. The following examples show how to remove or drop columns.

Drop columns and create a copy

A list of columns can be dropped from a dataframe. The resulting dataframe is assigned to a new variable, while the original dataframe stays in memory. For large datasets, holding both at once can strain system memory.

Drop or remove data columns

import pandas as pd

#read test data from csv file and display top 3 rows
df1 = pd.read_csv(
    'data_deposits.csv'
)
print(df1.head(3))

#list of columns to remove
cols_to_drop = [
    'city',
    'deposit'
]

#remove two columns and show final dataframe
#(columns= already selects the column axis, so axis=1 is not needed)
df2 = df1.drop(columns=cols_to_drop)
print(df2.head(3))
--[df1 before column removal]----------------
  firstname lastname    city age deposit
0    Herman  Sanchez   Miami  52    9300
1      Phil   Parker   Miami  45    5010
2    Bradie  Garnett  Denver  36    6300

--[df2 post column removal]------------------
  firstname lastname age
0    Herman  Sanchez  52
1      Phil   Parker  45
2    Bradie  Garnett  36
---------------------------------------------------
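
As a quick check of the copy behaviour described above, the following sketch confirms that drop returns a separate dataframe and leaves df1 unchanged. It assumes a small inline dataframe built from the three rows shown in the output, standing in for data_deposits.csv.

```python
import pandas as pd

# Inline stand-in for the first three rows of data_deposits.csv (assumption)
df1 = pd.DataFrame({
    'firstname': ['Herman', 'Phil', 'Bradie'],
    'lastname': ['Sanchez', 'Parker', 'Garnett'],
    'city': ['Miami', 'Miami', 'Denver'],
    'age': [52, 45, 36],
    'deposit': [9300, 5010, 6300],
})

df2 = df1.drop(columns=['city', 'deposit'])

# df1 keeps all five columns; df2 is a different object with three
print(list(df1.columns))
print(list(df2.columns))
print(df2 is df1)
```

Both dataframes exist side by side after the call, which is where the extra memory goes.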

Drop inplace without reassignment to new variable

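Removing columns with the inplace option modifies df1 directly, so no second variable keeps the original dataframe alive. This can help with memory management, although pandas may still create an internal copy during the call.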

Remove columns from dataframe inplace

import pandas as pd

#read test data from csv file and display top 3 rows
df1 = pd.read_csv(
    'data_deposits.csv'
)
print(df1.head(3))

#list of columns to remove
cols_to_drop = [
    'city',
    'deposit'
]

#drop columns inplace and show final dataframe
df1.drop(
    columns=cols_to_drop,
    inplace=True
)
print(df1.head(3))
--[df1 before column removal]----------------
  firstname lastname    city age deposit
0    Herman  Sanchez   Miami  52    9300
1      Phil   Parker   Miami  45    5010
2    Bradie  Garnett  Denver  36    6300

--[df1 post column removal]------------------
  firstname lastname age
0    Herman  Sanchez  52
1      Phil   Parker  45
2    Bradie  Garnett  36
---------------------------------------------------
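
One detail worth knowing about the inplace form: the call returns None, so its result should not be assigned to a variable. A minimal sketch, again assuming an inline dataframe in place of data_deposits.csv:

```python
import pandas as pd

# Inline stand-in for the first three rows of data_deposits.csv (assumption)
df1 = pd.DataFrame({
    'firstname': ['Herman', 'Phil', 'Bradie'],
    'lastname': ['Sanchez', 'Parker', 'Garnett'],
    'city': ['Miami', 'Miami', 'Denver'],
    'age': [52, 45, 36],
    'deposit': [9300, 5010, 6300],
})

# With inplace=True, drop changes df1 itself and returns None
result = df1.drop(columns=['city', 'deposit'], inplace=True)

print(result)
print(list(df1.columns))
```

Writing df1 = df1.drop(..., inplace=True) would therefore replace the dataframe with None.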

The result is the same; the inplace form just does not need a new variable. The first example could also have assigned the result of drop back to df1. In that case the extra memory consumption would be a temporary, transient phenomenon, lasting only until the original dataframe is garbage collected.
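
The reassignment variant could look like the sketch below. The inline dataframe is an assumption that stands in for data_deposits.csv.

```python
import pandas as pd

# Inline stand-in for the first three rows of data_deposits.csv (assumption)
df1 = pd.DataFrame({
    'firstname': ['Herman', 'Phil', 'Bradie'],
    'lastname': ['Sanchez', 'Parker', 'Garnett'],
    'city': ['Miami', 'Miami', 'Denver'],
    'age': [52, 45, 36],
    'deposit': [9300, 5010, 6300],
})
cols_to_drop = ['city', 'deposit']

# Rebind the name df1 to the result of drop; the original five-column
# dataframe can be garbage collected once nothing else refers to it
df1 = df1.drop(columns=cols_to_drop)

print(list(df1.columns))
```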
