Skip to content

pandas-nunique-constant-series-check (PD101)

Derived from the pandas-vet linter.

What it does

Check for uses of .nunique() to check if a Pandas Series is constant (i.e., contains only one unique value).

Why is this bad?

.nunique() is computationally inefficient for checking if a Series is constant.

Consider, for example, a Series of length n that consists of increasing integer values (e.g., 1, 2, 3, 4). The .nunique() method will iterate over the entire Series to count the number of unique values. But in this case, we can detect that the Series is non-constant after visiting the first two values, which are non-equal.

In general, .nunique() requires iterating over the entire Series, while a more efficient approach allows short-circuiting the operation as soon as a non-equal value is found.

Instead of calling .nunique(), convert the Series to a NumPy array, and check if all values in the array are equal to the first observed value.

Example

import pandas as pd

data = pd.Series(range(1000))
if data.nunique() <= 1:
    print("Series is constant")

Use instead:

import pandas as pd

data = pd.Series(range(1000))
array = data.to_numpy()
if array.shape[0] == 0 or (array[0] == array).all():
    print("Series is constant")

References