[1]:

import pandas as pd
pd.set_option("display.max_rows", 5)


# Summarize

`summarize` defines new columns, each holding a single value calculated either across all the data or within each specified group. The result is a DataFrame with one row per unique group, or a single row if no groups are defined.

[2]:

from siuba import _, group_by, summarize
from siuba.data import mtcars


## Summarize over everything

When you use summarize with an ungrouped DataFrame, the result is a single row.

[3]:

mtcars >> summarize(avg_mpg = _.mpg.mean())

[3]:

avg_mpg
0 20.090625
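
For comparison, the same one-row result can be sketched in plain pandas. This is only an illustration: the `cars` frame below is made-up toy data (not the real mtcars values), and the variable names are hypothetical.

```python
import pandas as pd

# Toy data with hypothetical values; this is NOT the real mtcars dataset.
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8],
    "mpg": [30.0, 20.0, 21.0, 19.0, 15.0],
})

# Collapse the whole frame to a single row holding one summary value.
overall = pd.DataFrame({"avg_mpg": [cars["mpg"].mean()]})

print(overall.shape)  # one row, one column
```

Wrapping the scalar mean in a one-element list is what produces a DataFrame with exactly one row, mirroring what summarize returns for ungrouped data.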

## Summarizing per group

When you use summarize with a grouped DataFrame, the result has one row per group in the data. For example, cyl (number of cylinders) takes 3 distinct values in mtcars (4, 6, and 8), so the result has 3 rows.

[4]:

(mtcars
  >> group_by(_.cyl)
  >> summarize(avg_mpg = _.mpg.mean())
)

[4]:

cyl avg_mpg
0 4 26.663636
1 6 19.742857
2 8 15.100000
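
The one-row-per-group behavior can also be checked with a rough pandas-only equivalent. This is a sketch under assumptions: `cars` is made-up toy data rather than mtcars, and pandas named aggregation stands in for summarize.

```python
import pandas as pd

# Toy data with hypothetical values; this is NOT the real mtcars dataset.
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8],
    "mpg": [30.0, 20.0, 21.0, 19.0, 15.0],
})

# Group by cyl and compute one mean per group.
# as_index=False keeps cyl as a regular column, as in the output above.
per_cyl = (
    cars
    .groupby("cyl", as_index=False)
    .agg(avg_mpg=("mpg", "mean"))
)

# The result has as many rows as there are distinct cyl values.
print(len(per_cyl) == cars["cyl"].nunique())
```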

summarize also accepts a constant, such as a string or number. The constant is repeated in every row of the result.

[5]:

(mtcars
  >> group_by(_.cyl)
  >> summarize(
       measure = "mean miles per gallon",
       value = _.mpg.mean()
     )
)

[5]:

cyl measure value
0 4 mean miles per gallon 26.663636
1 6 mean miles per gallon 19.742857
2 8 mean miles per gallon 15.100000
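
A comparable result in plain pandas might look like the sketch below. It assumes made-up toy data (`cars`, not mtcars) and uses `assign` to broadcast the constant; this is an analogy, not how siuba implements it.

```python
import pandas as pd

# Toy data with hypothetical values; this is NOT the real mtcars dataset.
cars = pd.DataFrame({
    "cyl": [4, 4, 6, 6, 8],
    "mpg": [30.0, 20.0, 21.0, 19.0, 15.0],
})

labeled = (
    cars
    .groupby("cyl", as_index=False)
    .agg(value=("mpg", "mean"))
    # A scalar passed to assign is repeated for every row.
    .assign(measure="mean miles per gallon")
    # Reorder columns to match the layout shown above.
    .loc[:, ["cyl", "measure", "value"]]
)

print(labeled["measure"].nunique())  # a single repeated label
```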
