Detecting Missing Values in pandas: isnull, notna, and notnull
Pandas provides several functions to identify missing data in your datasets. These functions are essential for data preprocessing and cleaning. This section covers pandas.isnull, pandas.notna, and pandas.notnull.
pandas.isnull
Syntax:
pandas.isnull(obj)
Parameters:
obj: scalar or array-like object to check for missing values.
Returns:
- bool or array-like of bool: For scalar input, returns a single boolean. For array-like input, returns a same-shaped object of booleans indicating whether each element is missing: a DataFrame for a DataFrame, a Series for a Series, and a NumPy array for an Index or ndarray.
Functionality:
Detects missing values. Missing values include NaN in numeric arrays, None or NaN in object arrays, and NaT in datetime-like arrays.
Code Examples:
- Detect missing in a DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'X': [1, np.nan, 3],
    'Y': [np.nan, 2, np.nan],
    'Z': [1, 2, 3]
})
print(pd.isnull(df))
Output:
       X      Y      Z
0  False   True  False
1   True  False  False
2  False   True  False
- Detect missing in a Series:
series = pd.Series([10, np.nan, 30, None])
print(pd.isnull(series))
Output:
0    False
1     True
2    False
3     True
dtype: bool
- Detect missing in a scalar:
print(pd.isnull(np.nan)) # True
print(pd.isnull(42)) # False
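To make the definition of "missing" above more concrete, the following sketch checks a few scalars that pandas treats as missing against a few that only look empty. The sample values and variable names are illustrative, and pd.NA assumes pandas 1.0 or later. Infinity is shown as non-missing, which is the default behavior.

```python
import numpy as np
import pandas as pd

# Scalars that pandas treats as missing (pd.NA assumes pandas >= 1.0)
missing_like = [None, np.nan, pd.NaT, pd.NA]

# Scalars that may look "empty" but are not missing by default
present_like = ["", 0, np.inf, "NaN"]

missing_flags = [bool(pd.isnull(v)) for v in missing_like]
present_flags = [bool(pd.isnull(v)) for v in present_like]

print(missing_flags)  # [True, True, True, True]
print(present_flags)  # [False, False, False, False]

# For an Index, the result is a NumPy boolean array rather than an Index
index_mask = pd.isnull(pd.Index([1.0, np.nan]))
print(type(index_mask).__name__)  # ndarray
print(index_mask.tolist())        # [False, True]
```

Empty strings and the literal text "NaN" are ordinary values as far as isnull is concerned, so they usually need to be converted to NaN or None before missing-value detection will catch them.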
pandas.notna
Syntax:
pandas.notna(obj)
Parameters:
obj: scalar or array-like object to check for non-missing values.
Returns:
- bool or array-like of bool: Returns True where values are not missing.
Functionality:
This is the inverse of isnull. It identifies valid (non-missing) entries.
Code Examples:
- Check non-missing in a DataFrame:
df = pd.DataFrame({
    'X': [1, np.nan, 3],
    'Y': [np.nan, 2, np.nan],
    'Z': [1, 2, 3]
})
print(pd.notna(df))
Output:
       X      Y      Z
0   True  False   True
1  False   True   True
2   True  False   True
- Check non-missing in a Series:
series = pd.Series([5, None, 15, np.nan])
print(pd.notna(series))
Output:
0     True
1    False
2     True
3    False
dtype: bool
- Check non-missing in a scalar:
print(pd.notna(np.nan)) # False
print(pd.notna(7)) # True
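Because notna returns a boolean object of the same shape as its input, the result can be used directly as a row filter. The sketch below reuses the example DataFrame from this section; the variable names (has_x, rows_with_x, is_inverse) are illustrative only. It also checks the claim that notna is the element-wise negation of isnull.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "X": [1, np.nan, 3],
    "Y": [np.nan, 2, np.nan],
    "Z": [1, 2, 3],
})

# Boolean mask: True where column X holds a value
has_x = pd.notna(df["X"])

# Keep only the rows where X is present
rows_with_x = df[has_x]
print(rows_with_x.index.tolist())  # [0, 2]

# notna should equal the element-wise negation of isnull
is_inverse = pd.notna(df).equals(~pd.isnull(df))
print(is_inverse)  # True
```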
pandas.notnull
Syntax:
pandas.notnull(obj)
Parameters:
obj: scalar or array-like object.
Returns:
- bool or array-like of bool: Identical to notna.
Functionality:
notnull is an alias for notna. It performs exactly the same operation.
Code Examples:
- Apply notnull to a DataFrame:
df = pd.DataFrame({
    'A': [np.nan, 2, 3],
    'B': [1, np.nan, 3]
})
print(pd.notnull(df))
Output:
       A      B
0  False   True
1   True  False
2   True   True
- Apply notnull to a Series:
series = pd.Series([1.0, np.nan, 3.0])
print(pd.notnull(series))
Output:
0     True
1    False
2     True
dtype: bool
- Apply notnull to a scalar:
print(pd.notnull(None)) # False
print(pd.notnull('text')) # True
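One way to confirm that notnull and notna behave the same is to run both on the same inputs and compare the results. This is only a spot check on a few illustrative samples, not a proof; the names samples, scalars, same_results, and same_scalars are introduced here for the example.

```python
import numpy as np
import pandas as pd

# Array-like samples taken from the examples in this section
samples = [
    pd.DataFrame({"A": [np.nan, 2, 3], "B": [1, np.nan, 3]}),
    pd.Series([1.0, np.nan, 3.0]),
]

# Compare the two functions on each array-like input
same_results = all(pd.notnull(obj).equals(pd.notna(obj)) for obj in samples)
print(same_results)  # True

# Compare them on a few scalars as well
scalars = [None, np.nan, "text", 0]
same_scalars = all(pd.notnull(v) == pd.notna(v) for v in scalars)
print(same_scalars)  # True
```

Since the two names are interchangeable, the choice between them is a matter of style; picking one and using it consistently keeps a codebase easier to read.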
Summary
These three functions are fundamental for handling missing data. Use isnull to locate missing values and notna/notnull to find valid entries. They work consistently across DataFrames, Series, Index objects, and scalars.
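As a closing illustration, the boolean results can be summed to count missing and valid entries per column, since True counts as 1. This is a minimal sketch using the example DataFrame from earlier in the section; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "X": [1, np.nan, 3],
    "Y": [np.nan, 2, np.nan],
    "Z": [1, 2, 3],
})

# Summing a boolean DataFrame counts the True values in each column
missing_per_column = {col: int(n) for col, n in pd.isnull(df).sum().items()}
valid_per_column = {col: int(n) for col, n in pd.notna(df).sum().items()}

print(missing_per_column)  # {'X': 1, 'Y': 2, 'Z': 0}
print(valid_per_column)    # {'X': 2, 'Y': 1, 'Z': 3}

# Quick check for whether anything at all is missing
any_missing = bool(pd.isnull(df).any().any())
print(any_missing)  # True
```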