Deleting Missing Values

Goal

This post shows how to delete rows or columns that contain missing values using pandas in Python.

Libraries

In [3]:
import pandas as pd
import numpy as np

Create DataFrame

In [13]:
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
df
Out[13]:
A B C D
0 -1.111902 1.095301 0.140572 0.541279
1 1.197394 0.173438 -0.369171 0.861130
2 1.472260 2.063012 -1.214586 -1.709280
3 -2.990860 -0.315950 -0.521123 -0.889226
4 -0.148088 0.891630 -0.422730 -0.095359
5 0.297797 -0.617062 -0.144902 -1.628348
In [14]:
# create missing values
df.loc[3, 'B'] = None
df.loc[4, 'D'] = None
df
Out[14]:
A B C D
0 -1.111902 1.095301 0.140572 0.541279
1 1.197394 0.173438 -0.369171 0.861130
2 1.472260 2.063012 -1.214586 -1.709280
3 -2.990860 NaN -0.521123 -0.889226
4 -0.148088 0.891630 -0.422730 NaN
5 0.297797 -0.617062 -0.144902 -1.628348

Deleting Missing Values

In [15]:
# identify the positions of missing values with isna()
df.isna()
Out[15]:
A B C D
0 False False False False
1 False False False False
2 False False False False
3 False True False False
4 False False False True
5 False False False False
In [21]:
df.isna().any(axis=1)
Out[21]:
0    False
1    False
2    False
3     True
4     True
5    False
dtype: bool
In [23]:
# Deleting the rows containing NaN
df.loc[~df.isna().any(axis=1), :]
Out[23]:
A B C D
0 -1.111902 1.095301 0.140572 0.541279
1 1.197394 0.173438 -0.369171 0.861130
2 1.472260 2.063012 -1.214586 -1.709280
5 0.297797 -0.617062 -0.144902 -1.628348
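
The same row filter can probably be written more compactly with pandas' built-in `DataFrame.dropna`. The sketch below uses a small made-up frame (`df_demo` and its values are illustrative assumptions, not the random data above) and checks that the boolean-mask approach and `dropna` give the same result.

```python
import numpy as np
import pandas as pd

# Illustrative frame with hypothetical values (not the data from this post)
df_demo = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, np.nan, 6.0], "C": [7.0, 8.0, np.nan]}
)

# Boolean-mask approach: keep the rows where no column is NaN
rows_masked = df_demo.loc[~df_demo.isna().any(axis=1), :]

# dropna approach: axis=0 drops rows, how="any" drops a row if any value is NaN
rows_dropped = df_demo.dropna(axis=0, how="any")

# Both approaches should select the same rows
print(rows_masked.equals(rows_dropped))
```

Neither approach modifies `df_demo` itself; each returns a new DataFrame, so assign the result if you want to keep it.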
In [24]:
# Deleting the columns containing NaN
df.loc[:, ~df.isna().any(axis=0)]
Out[24]:
A C
0 -1.111902 0.140572
1 1.197394 -0.369171
2 1.472260 -1.214586
3 -2.990860 -0.521123
4 -0.148088 -0.422730
5 0.297797 -0.144902
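
For columns, `DataFrame.dropna` with `axis=1` should be equivalent to the mask above. This is a minimal sketch on a made-up frame (`df_cols` and its values are assumptions for illustration only), comparing the two approaches.

```python
import numpy as np
import pandas as pd

# Illustrative frame with hypothetical values (not the data from this post)
df_cols = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, np.nan, 6.0], "C": [7.0, 8.0, np.nan]}
)

# Boolean-mask approach: keep the columns where no row is NaN
cols_masked = df_cols.loc[:, ~df_cols.isna().any(axis=0)]

# dropna approach: axis=1 drops every column that contains at least one NaN
cols_dropped = df_cols.dropna(axis=1)

# Both approaches should keep the same columns
print(cols_masked.equals(cols_dropped))
print(list(cols_dropped.columns))
```

Dropping columns discards every value in them, including the non-missing ones, so it is worth checking how many NaN values a column holds before removing it.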
