[1]:
import pandas as pd
pd.set_option("display.max_rows", 5)

Arrange

This function lets you arrange the rows of your data in two steps:

  • choosing columns to arrange by

  • specifying an order (ascending or descending)

Below, we’ll illustrate this function with a single variable, multiple variables, and more general expressions.

[2]:
from siuba import _, arrange, select
from siuba.data import mtcars

small_mtcars = mtcars >> select(_.cyl, _.mpg, _.hp)

small_mtcars
[2]:
cyl mpg hp
0 6 21.0 110
1 6 21.0 110
... ... ... ...
30 8 15.0 335
31 4 21.4 109

32 rows × 3 columns

Arranging rows by a single variable

The simplest way to use arrange is to specify a column name. The arrange function uses the pandas DataFrame.sort_values method under the hood, and arranges rows in ascending order by default.

For example, the code below arranges the rows from least to greatest horsepower (hp).

[3]:
# simple arrange of 1 var
small_mtcars >> arrange(_.hp)
[3]:
cyl mpg hp
18 4 30.4 52
7 4 24.4 62
... ... ... ...
28 8 15.8 264
30 8 15.0 335

32 rows × 3 columns

If you add a - before a column or expression, arrange will sort the rows in descending order. This applies to all types of columns, including arrays of strings and categories!

[4]:
small_mtcars >> arrange(-_.hp)
[4]:
cyl mpg hp
30 8 15.0 335
28 8 15.8 264
... ... ... ...
7 4 24.4 62
18 4 30.4 52

32 rows × 3 columns
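
As a rough point of comparison, the sketch below shows how descending order could be requested in plain pandas for both a numeric and a string column. It assumes that the - prefix in arrange behaves like ascending=False in sort_values, which is consistent with the output above. The cars frame is made-up illustration data, not part of mtcars.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration.
cars = pd.DataFrame({
    "name": ["Fiat 128", "Volvo 142E", "Honda Civic"],
    "hp": [66, 109, 52],
})

# Numeric column: largest horsepower first.
by_hp_desc = cars.sort_values("hp", ascending=False)

# String column: negating strings is not meaningful in plain pandas,
# so descending order is requested through `ascending=False` instead.
by_name_desc = cars.sort_values("name", ascending=False)

print(by_hp_desc)
print(by_name_desc)
```

With arrange, the same intent would be written as arrange(-_.hp) or arrange(-_.name), so the column type does not change how you ask for descending order.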

Arranging rows by multiple variables

When arrange receives multiple arguments, it sorts by the first one, then uses the second to order rows that tie on the first, and so on. In other words, the argument specified first changes the slowest.

[5]:
small_mtcars >> arrange(_.cyl, _.mpg)
[5]:
cyl mpg hp
31 4 21.4 109
20 4 21.5 97
... ... ... ...
4 8 18.7 175
24 8 19.2 175

32 rows × 3 columns

[6]:
small_mtcars >> arrange(_.cyl, -_.mpg)
[6]:
cyl mpg hp
19 4 33.9 65
17 4 32.4 66
... ... ... ...
14 8 10.4 205
15 8 10.4 215

32 rows × 3 columns
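
The two calls above can be compared with a plain pandas sketch, which may help when reading the tie-breaking rule. It assumes each arrange argument maps to one entry in the by and ascending lists of sort_values. The toy frame is hypothetical and much smaller than mtcars.

```python
import pandas as pd

# Hypothetical toy data with repeated cyl values, so ties occur.
toy = pd.DataFrame({
    "cyl": [8, 4, 6, 4, 8],
    "mpg": [15.0, 30.4, 21.0, 24.4, 18.7],
})

# Roughly arrange(_.cyl, _.mpg): cyl ascending, ties broken by mpg ascending.
asc = toy.sort_values(["cyl", "mpg"])

# Roughly arrange(_.cyl, -_.mpg): cyl ascending, ties broken by mpg descending.
mixed = toy.sort_values(["cyl", "mpg"], ascending=[True, False])

print(asc)
print(mixed)
```

In both results the cyl column is in the same order. Only the order of rows within each cyl group differs.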

Expressions

You can also arrange the rows of your data using more complex expressions, similar to those you would use in a mutate.

For example, the code below sorts by horsepower (hp) per cylinder (cyl).

[7]:
small_mtcars >> arrange(_.hp / _.cyl)
[7]:
cyl mpg hp
18 4 30.4 52
7 4 24.4 62
... ... ... ...
28 8 15.8 264
30 8 15.0 335

32 rows × 3 columns
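
One way to picture sorting by an expression is to compute the expression as a temporary key, sort the key, and reorder the rows to match. The sketch below does this in plain pandas. It is an illustration under that assumption, not a description of how arrange is implemented, and the toy frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration.
toy = pd.DataFrame({
    "cyl": [8, 4, 6],
    "hp": [335, 52, 110],
})

# Compute the sort key as a separate Series, so no column is added to toy.
hp_per_cyl = toy["hp"] / toy["cyl"]

# Sort the key, then reuse its index to reorder the original rows.
result = toy.loc[hp_per_cyl.sort_values().index]

print(result)
```

The result keeps only the original columns, which matches the output above: the expression decides the row order but does not appear in the data.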

Arranging Categorical series

Note that arranging by a categorical series sorts the rows in the order of its categories. For example, the DataFrame below contains a categorical column with three entries.

[8]:
df = pd.DataFrame({
    "x_cat": pd.Categorical(["c", "b", "a"])
    })

df
[8]:
x_cat
0 c
1 b
2 a

While the values of the column run from “c” to “a”, the default categories of a categorical are sorted, so they run from “a” to “c”. This can be seen in the very last line of output below.

[9]:
df.x_cat
[9]:
0    c
1    b
2    a
Name: x_cat, dtype: category
Categories (3, object): ['a', 'b', 'c']

Since the pandas sort_values method sorts a categorical according to the order listed under “Categories”, arrange does too.

[10]:
df >> arrange(_.x_cat)
[10]:
x_cat
2 a
1 b
0 c

This means that if you reorder the categories, arrange will follow that reordering!

[11]:
from siuba.dply.forcats import fct_rev

df["rev_x_cat"] = fct_rev(df.x_cat)
df.rev_x_cat
[11]:
0    c
1    b
2    a
Name: rev_x_cat, dtype: category
Categories (3, object): ['c', 'b', 'a']
[12]:
df >> arrange(_.rev_x_cat)
[12]:
x_cat rev_x_cat
0 c c
1 b b
2 a a
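
The same behavior can be reproduced with pandas alone, which may be useful for checking how category order drives the sort. The sketch below assumes that fct_rev simply reverses the order of the categories, as the output above suggests, and reverses them by hand with cat.reorder_categories. The name cat_df is chosen here only to avoid clashing with df.

```python
import pandas as pd

cat_df = pd.DataFrame({"x_cat": pd.Categorical(["c", "b", "a"])})

# Default categories are sorted ('a', 'b', 'c'), so sorting puts 'a' first.
default_sorted = cat_df.sort_values("x_cat")

# Reverse the category order by hand (assumed to mirror what fct_rev does).
reversed_cats = list(cat_df["x_cat"].cat.categories[::-1])
cat_df["rev_x_cat"] = cat_df["x_cat"].cat.reorder_categories(reversed_cats)

# Sorting now follows the reversed categories, so 'c' comes first.
rev_sorted = cat_df.sort_values("rev_x_cat")

print(default_sorted)
print(rev_sorted)
```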
