[1]:
import pandas as pd
pd.set_option("display.max_rows", 5)

Count

The count function returns the number of rows in each group formed by one or more columns. It is equivalent to a group_by followed by a summarize that counts the rows in each group.

[2]:
from siuba import _, group_by, summarize, count
from siuba.data import mtcars

Specifying a column to count

[3]:
# longer approach
mtcars >> group_by(_.cyl) >> summarize(n = _.cyl.size)

# shorter approach
mtcars >> count(_.cyl)
[3]:
   cyl   n
0    4  11
1    6   7
2    8  14
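For readers who think in plain pandas, the same group-then-count idea can be sketched as below. This is only a rough equivalent, not how siuba implements count, and `cars` is a small made-up table standing in for mtcars.

```python
import pandas as pd

# Hypothetical stand-in for mtcars: a handful of made-up rows.
cars = pd.DataFrame({
    "cyl":  [4, 4, 6, 8, 8, 8],
    "gear": [4, 5, 4, 3, 3, 5],
})

# groupby + size counts the rows in each group; reset_index(name="n")
# turns the result back into a DataFrame with a count column named n.
counts = cars.groupby("cyl").size().reset_index(name="n")
print(counts)
```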

Counting multiple columns and sorting

[4]:
mtcars >> count(_.cyl, _.gear, sort = True)
[4]:
     cyl  gear    n
0      8     3   12
1      4     4    8
...  ...   ...  ...
6      4     3    1
7      6     5    1

8 rows × 3 columns

Because the groups with the highest counts are often the ones of interest, passing sort = True returns the counts in descending order.
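In plain pandas terms, sort = True corresponds roughly to sorting the count column in descending order and renumbering the rows. The sketch below assumes a small made-up table, `cars`, in place of mtcars, and is not siuba's own implementation.

```python
import pandas as pd

# Hypothetical stand-in for mtcars: a handful of made-up rows.
cars = pd.DataFrame({
    "cyl":  [4, 4, 6, 8, 8, 8],
    "gear": [4, 5, 4, 3, 3, 5],
})

counts = (
    cars.groupby(["cyl", "gear"])
    .size()
    .reset_index(name="n")              # one row per (cyl, gear) pair
    .sort_values("n", ascending=False)  # largest groups first
    .reset_index(drop=True)             # renumber rows 0, 1, 2, ...
)
print(counts)
```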

Counting expressions

As is the case with group_by, the count function accepts complex expressions, as long as they are passed as keyword arguments.

[5]:
mtcars >> count(_.cyl, many_gears = _.gear > 3)
[5]:
     cyl  many_gears    n
0      4       False    1
1      4        True   10
...  ...         ...  ...
4      8       False   12
5      8        True    2

6 rows × 3 columns
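Counting on an expression amounts to computing a new column first and then counting by it. A rough plain-pandas sketch of that idea follows, again using a made-up `cars` table rather than mtcars and making no claim about siuba's internals.

```python
import pandas as pd

# Hypothetical stand-in for mtcars: a handful of made-up rows.
cars = pd.DataFrame({
    "cyl":  [4, 4, 6, 8, 8, 8],
    "gear": [4, 5, 4, 3, 3, 5],
})

counts = (
    cars.assign(many_gears=cars["gear"] > 3)  # the keyword expression becomes a column
    .groupby(["cyl", "many_gears"])           # then it is used as a grouping key
    .size()
    .reset_index(name="n")
)
print(counts)
```

Only combinations that occur in the data appear in the result, which is why the siuba output above has 6 rows rather than one per possible pair.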

Mutating and counting with add_count

While count is equivalent to a group by and summarize, add_count is equivalent to group by and mutate. This means that it keeps the original data, but adds on a new column of counts.

[6]:
from siuba import add_count

mtcars >> add_count(_.cyl)
[6]:
      mpg  cyl   disp   hp  drat     wt   qsec   vs   am  gear  carb    n
0    21.0    6  160.0  110  3.90  2.620  16.46    0    1     4     4    7
1    21.0    6  160.0  110  3.90  2.875  17.02    0    1     4     4    7
...   ...  ...    ...  ...   ...    ...    ...  ...  ...   ...   ...  ...
30   15.8    8  351.0  264  4.22  3.170  14.50    0    1     5     4   14
31   15.0    8  301.0  335  3.54  3.570  14.60    0    1     5     8   14

32 rows × 12 columns
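The group by and mutate behaviour of add_count can be approximated in plain pandas with a grouped transform, which returns one value per original row instead of one per group. This is a sketch under the assumption of a small made-up `cars` table, not siuba's implementation.

```python
import pandas as pd

# Hypothetical stand-in for mtcars: a handful of made-up rows.
cars = pd.DataFrame({
    "cyl":  [4, 4, 6, 8, 8, 8],
    "gear": [4, 5, 4, 3, 3, 5],
})

# transform("size") broadcasts each group's row count back onto its rows,
# so every original row is kept and gains a count column n.
with_counts = cars.assign(n=cars.groupby("cyl")["cyl"].transform("size"))
print(with_counts)
```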
