[1]:
import pandas as pd
pd.set_option("display.max_rows", 5)
Count¶
The count function returns the number of rows in each group formed by one or more columns. It is equivalent to a group_by followed by a summarize that counts the rows of each group.
[2]:
from siuba import _, group_by, summarize, count
from siuba.data import mtcars
Specifying column to count¶
[3]:
# longer approach
mtcars >> group_by(_.cyl) >> summarize(n = _.cyl.size)
# shorter approach
mtcars >> count(_.cyl)
[3]:
|   | cyl | n |
|---|---|---|
| 0 | 4 | 11 |
| 1 | 6 | 7 |
| 2 | 8 | 14 |
Counting multiple columns and sorting¶
[4]:
mtcars >> count(_.cyl, _.gear, sort = True)
[4]:
|   | cyl | gear | n |
|---|---|---|---|
| 0 | 8 | 3 | 12 |
| 1 | 4 | 4 | 8 |
| ... | ... | ... | ... |
| 6 | 4 | 3 | 1 |
| 7 | 6 | 5 | 1 |
8 rows × 3 columns
Because the groups with the highest counts are usually the ones of interest, passing sort = True returns the counts in descending order.
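For readers more at home in plain pandas, a sorted multi-column count can be approximated as below. This is a sketch of an equivalent computation, not a description of how siuba implements count; the small cars frame is made-up illustration data, not mtcars.

```python
import pandas as pd

# Hypothetical toy data (not mtcars), just enough rows to show the idea.
cars = pd.DataFrame({
    "cyl":  [4, 4, 4, 6, 8, 8],
    "gear": [4, 4, 4, 4, 3, 3],
})

# Count the rows in each (cyl, gear) group, name the count column n,
# then sort so the largest groups come first.
counts = (
    cars.groupby(["cyl", "gear"])
    .size()
    .reset_index(name="n")
    .sort_values("n", ascending=False)
    .reset_index(drop=True)
)
print(counts)
```

The final reset_index(drop=True) only renumbers the rows after sorting, so the index runs from 0 upward as in the output above.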
Counting expressions¶
As is the case with group_by, the count function accepts complex expressions, as long as they are passed as keyword arguments.
[5]:
mtcars >> count(_.cyl, many_gears = _.gear > 3)
[5]:
|   | cyl | many_gears | n |
|---|---|---|---|
| 0 | 4 | False | 1 |
| 1 | 4 | True | 10 |
| ... | ... | ... | ... |
| 4 | 8 | False | 12 |
| 5 | 8 | True | 2 |
6 rows × 3 columns
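One way to read the keyword form is that the keyword supplies the name of the derived grouping column (many_gears in the output above). A rough plain-pandas analogue, offered as a sketch rather than as siuba's implementation, creates that column first and then counts by it. The cars frame here is hypothetical toy data, not mtcars.

```python
import pandas as pd

# Hypothetical toy data (not mtcars).
cars = pd.DataFrame({
    "cyl":  [4, 4, 6, 8, 8, 8],
    "gear": [4, 3, 4, 3, 3, 5],
})

# Materialize the expression as a named column, then count rows per group.
counts = (
    cars.assign(many_gears=cars["gear"] > 3)
    .groupby(["cyl", "many_gears"])
    .size()
    .reset_index(name="n")
)
print(counts)
```

Only combinations that actually occur in the data appear in the result, so a cyl value with no low-gear rows contributes a single row.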
Mutating and counting with add_count¶
While count is equivalent to a group_by followed by a summarize, add_count is equivalent to a group_by followed by a mutate. This means that it keeps every row and column of the original data and adds a new column of counts.
[6]:
from siuba import add_count
mtcars >> add_count(_.cyl)
[6]:
|   | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 | 7 |
| 1 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | 7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 30 | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 | 14 |
| 31 | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 | 14 |
32 rows × 12 columns
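The difference from count is easiest to see in the row count: the result has as many rows as the input. A comparable effect in plain pandas can be sketched with a grouped transform. This is an illustrative analogue under the assumption of a small made-up cars frame, not siuba's own implementation.

```python
import pandas as pd

# Hypothetical toy data (not mtcars).
cars = pd.DataFrame({
    "cyl": [6, 6, 4, 8, 8, 8],
    "mpg": [21.0, 21.0, 22.8, 18.7, 14.3, 15.0],
})

# transform("size") broadcasts each group's row count back onto its rows,
# so the output keeps one row per input row.
with_counts = cars.assign(n=cars.groupby("cyl")["cyl"].transform("size"))
print(with_counts)
```

Every row with the same cyl value carries the same n, which makes it straightforward to filter on group size afterwards without losing the other columns.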