---
jupyter:
  jupytext:
    text_representation:
      extension: .Rmd
      format_name: rmarkdown
      format_version: '1.1'
      jupytext_version: 1.1.1
  kernelspec:
    display_name: Python 3
    language: python
    name: python3
---

```{python nbsphinx=hidden}
import pandas as pd
pd.set_option("display.max_rows", 5)
```

## Arrange

This function lets you arrange the rows of your data in two steps:

* choosing columns to arrange by
* specifying an order (ascending or descending)

Below, we'll illustrate this function with a single variable, multiple variables, and more general expressions.

```{python}
from siuba import _, arrange, select
from siuba.data import mtcars

small_mtcars = mtcars >> select(_.cyl, _.mpg, _.hp)

small_mtcars
```

### Arranging rows by a single variable


The simplest way to use `arrange` is to specify a column name. The `arrange` function uses the pandas `DataFrame.sort_values` method under the hood, and arranges rows in ascending order by default.

For example, the code below arranges the rows from least to greatest horsepower (`hp`).

```{python}
# simple arrange of 1 var
small_mtcars >> arrange(_.hp)
```

If you add a `-` before a column or expression, `arrange` will sort the rows in descending order. This applies to all types of columns, including arrays of strings and categories!

```{python}
small_mtcars >> arrange(-_.hp)
```
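For comparison, here is a rough plain-pandas counterpart of the two calls above. It is only a sketch on a made-up toy DataFrame (`toy` and its `model` column are hypothetical, not part of `mtcars`), and it is not meant to describe how siuba implements `-` internally. In pandas, descending order is requested with `ascending=False`, which also works for string columns.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"model": ["b", "c", "a"], "hp": [110, 93, 175]})

# Ascending sort, comparable to arrange(_.hp)
asc_hp = toy.sort_values("hp")

# Descending sort, comparable to arrange(-_.hp)
desc_hp = toy.sort_values("hp", ascending=False)

# Descending sort on a string column
desc_model = toy.sort_values("model", ascending=False)

print(desc_model)
```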

### Arranging rows by multiple variables


When `arrange` receives multiple arguments, it sorts by the first one, then breaks ties using the second, and so on. In other words, the argument specified first changes the slowest in the result.

```{python}
small_mtcars >> arrange(_.cyl, _.mpg)
```

```{python}
small_mtcars >> arrange(_.cyl, -_.mpg)
```
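To make the tie-breaking concrete, the sketch below does a comparable sort in plain pandas, passing one `ascending` flag per column. The toy values are made up for illustration, and the pandas call is an approximate counterpart rather than a description of siuba's internals.

```python
import pandas as pd

# Hypothetical toy data with repeated cyl values, so ties must be broken
toy = pd.DataFrame({
    "cyl": [6, 4, 6, 4],
    "mpg": [21.0, 22.8, 18.1, 30.4],
})

# cyl ascending, then mpg descending within each cyl value,
# comparable to arrange(_.cyl, -_.mpg)
by_both = toy.sort_values(["cyl", "mpg"], ascending=[True, False])

print(by_both)
```

Here `cyl` changes the slowest, and `mpg` only decides the order among rows that share a `cyl` value.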

### Expressions


You can also `arrange` the rows of your data using more complex expressions, similar to those you would use in a `mutate`.

For example, the code below sorts by horsepower (`hp`) per cylinder (`cyl`).

```{python}
small_mtcars >> arrange(_.hp / _.cyl)
```
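One way to picture sorting by an expression is to compute it as a temporary column, sort on that column, and then drop it. The pandas sketch below follows that idea on made-up toy data. The column name `hp_per_cyl` is hypothetical, and this is an illustration of the concept, not a claim about how `arrange` is implemented.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"cyl": [4, 6, 8], "hp": [100, 110, 250]})

by_ratio = (
    toy
    .assign(hp_per_cyl=toy["hp"] / toy["cyl"])  # temporary sort key
    .sort_values("hp_per_cyl")                   # sort on the computed key
    .drop(columns="hp_per_cyl")                  # remove the temporary key
)

print(by_ratio)
```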

#### Arranging Categorical series


Note that when arranging a categorical series, it will be arranged in the order of its categories. For example, the DataFrame below contains a categorical column with three entries.

```{python}
df = pd.DataFrame({
    "x_cat": pd.Categorical(["c", "b", "a"])
    })

df
```

While the values of the categorical go from "c" to "a", the default categories of a categorical are already sorted, so they go from "a" to "c". This can be seen in the very last line of output below.

```{python}
df.x_cat
```

Since the pandas `sort_values` method sorts a categorical according to the order listed under "Categories", `arrange` does this also.

```{python}
df >> arrange(_.x_cat)
```

This means that if you reorder the categories, `arrange` will follow that reordering!

```{python}
from siuba.dply.forcats import fct_rev

df["rev_x_cat"] = fct_rev(df.x_cat)
df.rev_x_cat
```

```{python}
df >> arrange(_.rev_x_cat)
```
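The same behavior can be checked in plain pandas with an arbitrary category order. The sketch below uses `Series.cat.reorder_categories` to set a custom order. The names `cat_df` and `custom_cat`, and the order `["b", "c", "a"]`, are chosen only for illustration.

```python
import pandas as pd

cat_df = pd.DataFrame({"x_cat": pd.Categorical(["c", "b", "a"])})

# Default categories are sorted: a, b, c
default_sorted = cat_df.sort_values("x_cat")

# Set a custom category order; the values themselves are unchanged
cat_df["custom_cat"] = cat_df["x_cat"].cat.reorder_categories(["b", "c", "a"])

# Sorting now follows the custom category order: b, c, a
custom_sorted = cat_df.sort_values("custom_cat")

print(custom_sorted)
```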