--- jupyter: jupytext: text_representation: extension: .Rmd format_name: rmarkdown format_version: '1.1' jupytext_version: 1.1.1 kernelspec: display_name: Python 3 language: python name: python3 --- ```{python nbsphinx=hidden} import pandas as pd pd.set_option("display.max_rows", 5) ``` ## Summarize This function lets you define a new column in your data, which is a single number calculated either across all the data, or within specified groups. It will result in a DataFrame with as many rows as the number of unique groups, or if no groups are defined, one row. ```{python} from siuba import _, group_by, summarize from siuba.data import mtcars ``` ### Summarize over everything When you use summarize with an ungrouped DataFrame, the result is a single row. ```{python} mtcars >> summarize(avg_mpg = _.mpg.mean()) ``` ### Summarizing per group When you use summarize with a grouped DataFrame, the result has the same number of rows as there are groups in the data. For example, there are 3 values of cylinders (`cyl`) a row can have (4, 6, or 8), so ther result will be 3 rows. ```{python} (mtcars >> group_by(_.cyl) >> summarize(avg_mpg = _.mpg.mean()) ) ``` Note that summarize also accepts a single value, like a string or number. ```{python} (mtcars >> group_by(_.cyl) >> summarize( measure = "mean miles per gallon", value = _.mpg.mean() ) ) ```