[1]:

import pandas as pd
pd.set_option("display.max_rows", 5)

Arrange

This function lets you arrange the rows of your data in two steps:

choosing columns to arrange by
specifying an order (ascending or descending)
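For readers coming from plain pandas, the two steps correspond roughly to the `by` and `ascending` arguments of `DataFrame.sort_values`. This is a minimal sketch, not siuba's own code. The `df` name and its values are made up for illustration.

```python
import pandas as pd

# A small, made-up DataFrame used only for illustration
df = pd.DataFrame({"cyl": [6, 4, 8, 4], "hp": [110, 62, 335, 52]})

# Step 1: choose the column(s) to arrange by -> `by`
columns = ["cyl", "hp"]
# Step 2: choose an order for each column    -> `ascending`
order = [True, False]  # cyl ascending, hp descending

result = df.sort_values(by=columns, ascending=order)
print(result)
```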

Below, we’ll illustrate this function with a single variable, multiple variables, and more general expressions.

[2]:

from siuba import _, arrange, select
from siuba.data import mtcars

small_mtcars = mtcars >> select(_.cyl, _.mpg, _.hp)

small_mtcars

[2]:

	cyl	mpg	hp
0	6	21.0	110
1	6	21.0	110
...	...	...	...
30	8	15.0	335
31	4	21.4	109

32 rows × 3 columns

Arranging rows by a single variable

The simplest way to use arrange is to specify a column name. Under the hood, arrange uses the pandas DataFrame.sort_values method and, by default, arranges rows in ascending order.

For example, the code below arranges the rows from least to greatest horsepower (hp).
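Because arrange relies on `sort_values`, a roughly equivalent plain-pandas call is sketched below. The four rows are copied from the mtcars outputs on this page (rows 0, 7, 18 and 30), so you can check the result by hand. This is an illustration, not siuba's internal code.

```python
import pandas as pd

# Four rows of mtcars, copied from the outputs shown on this page
cars = pd.DataFrame(
    {"cyl": [6, 4, 4, 8], "mpg": [21.0, 24.4, 30.4, 15.0], "hp": [110, 62, 52, 335]},
    index=[0, 7, 18, 30],
)

# Ascending is the default, matching arrange(_.hp)
by_hp = cars.sort_values("hp")
print(by_hp)
```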

[3]:

# arrange by a single variable (ascending by default)
small_mtcars >> arrange(_.hp)

[3]:

	cyl	mpg	hp
18	4	30.4	52
7	4	24.4	62
...	...	...	...
28	8	15.8	264
30	8	15.0	335

32 rows × 3 columns

If you put a - before a column or expression, arrange sorts the rows in descending order. This works for all types of columns, including string and categorical columns!
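In plain pandas, descending order is requested with `ascending=False` rather than a `-` prefix. That argument also works for strings and categoricals; for a categorical, it reverses the category order. This is a sketch on made-up data: the `hp`, `name` and `size` values are illustrative, not from mtcars.

```python
import pandas as pd

# Made-up data: a numeric, a string, and a categorical column
df = pd.DataFrame({
    "hp": [110, 62, 335],
    "name": ["a", "c", "b"],
    "size": pd.Categorical(["mid", "small", "big"], categories=["small", "mid", "big"]),
})

# Numeric column, the pandas counterpart of arrange(-_.hp)
desc_hp = df.sort_values("hp", ascending=False)

# String column: descending alphabetical order
desc_name = df.sort_values("name", ascending=False)

# Categorical column: reverse of the category order (big, mid, small)
desc_size = df.sort_values("size", ascending=False)

print(desc_hp, desc_name, desc_size, sep="\n\n")
```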

[4]:

small_mtcars >> arrange(-_.hp)

[4]:

	cyl	mpg	hp
30	8	15.0	335
28	8	15.8	264
...	...	...	...
7	4	24.4	62
18	4	30.4	52

32 rows × 3 columns

Arranging rows by multiple variables

When arrange receives multiple arguments, it sorts so that the first one changes the slowest, followed by the second, and so on. In other words, each later argument only breaks ties left by the earlier ones.
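In pandas terms, this is a lexicographic sort. You pass a list of columns to `by`, and optionally a matching list to `ascending` to mix directions, as in the `arrange(_.cyl, -_.mpg)` example. The six rows below are copied from the mtcars outputs on this page. This is a sketch of the equivalent pandas calls, not siuba's implementation.

```python
import pandas as pd

# Six rows of mtcars, copied from the outputs shown on this page
cars = pd.DataFrame(
    {
        "cyl": [4, 4, 4, 8, 8, 6],
        "mpg": [21.4, 21.5, 33.9, 18.7, 10.4, 21.0],
        "hp": [109, 97, 65, 175, 205, 110],
    },
    index=[31, 20, 19, 4, 14, 0],
)

# cyl changes slowest; mpg only orders rows within each cyl value
by_cyl_mpg = cars.sort_values(["cyl", "mpg"])

# Mixed directions, like arrange(_.cyl, -_.mpg)
by_cyl_desc_mpg = cars.sort_values(["cyl", "mpg"], ascending=[True, False])

print(by_cyl_mpg, by_cyl_desc_mpg, sep="\n\n")
```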

[5]:

small_mtcars >> arrange(_.cyl, _.mpg)

[5]:

	cyl	mpg	hp
31	4	21.4	109
20	4	21.5	97
...	...	...	...
4	8	18.7	175
24	8	19.2	175

32 rows × 3 columns

[6]:

small_mtcars >> arrange(_.cyl, -_.mpg)

[6]:

	cyl	mpg	hp
19	4	33.9	65
17	4	32.4	66
...	...	...	...
14	8	10.4	205
15	8	10.4	215

32 rows × 3 columns

Expressions

You can also arrange the rows of your data using more complex expressions, similar to those you would use in a mutate.

For example, the code below sorts by horsepower (hp) per cylinder (cyl).
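Plain pandas `sort_values` sorts by existing columns; its `key` argument transforms each column separately, so it cannot combine two columns. One way to sort by an expression like hp / cyl is to compute it into a temporary column, sort, and drop it. The `_key` column name is hypothetical, and siuba's own handling of expressions may differ. The rows are copied from mtcars outputs on this page. Rows 14 and 31 show that sorting by hp / cyl can differ from sorting by hp alone.

```python
import pandas as pd

# Five rows of mtcars, copied from the outputs shown on this page
cars = pd.DataFrame(
    {
        "cyl": [8, 8, 4, 6, 4],
        "mpg": [15.0, 10.4, 21.4, 21.0, 24.4],
        "hp": [335, 205, 109, 110, 62],
    },
    index=[30, 14, 31, 0, 7],
)

# Compute hp per cylinder into a temporary (hypothetical) "_key" column,
# sort on it, then drop it so only the original columns remain.
by_hp_per_cyl = (
    cars.assign(_key=cars["hp"] / cars["cyl"])
    .sort_values("_key")
    .drop(columns="_key")
)
print(by_hp_per_cyl)
```

Sorting by hp alone would place row 31 (109 hp) before row 14 (205 hp). Per cylinder, row 14 (25.6) comes first.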

[7]:

small_mtcars >> arrange(_.hp / _.cyl)

[7]:

	cyl	mpg	hp
18	4	30.4	52
7	4	24.4	62
...	...	...	...
28	8	15.8	264
30	8	15.0	335

32 rows × 3 columns

Arranging categorical series

When arranging by a categorical series, arrange orders rows by the order of its categories rather than by the values themselves. For example, the DataFrame below has a single categorical column with three entries.

[8]:

df = pd.DataFrame({
    "x_cat": pd.Categorical(["c", "b", "a"])
    })

df

[8]:

	x_cat
0	c
1	b
2	a

While the values of the categorical go from “c” to “a”, its categories (levels) default to sorted order, so they go from “a” to “c”. You can see this in the last line of the output below.

[9]:

df.x_cat

[9]:

0    c
1    b
2    a
Name: x_cat, dtype: category
Categories (3, object): ['a', 'b', 'c']

Since pandas sort_values orders a categorical by the order listed under “Categories”, arrange does so too.
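The plain-pandas sketch below checks both points. Default categories are the sorted unique values. `sort_values` follows the category order even when that order is not alphabetical. The explicit order `["b", "c", "a"]` is made up for illustration.

```python
import pandas as pd

# Default categories are the sorted unique values, whatever order the data is in
default_cat = pd.Series(pd.Categorical(["c", "b", "a"]))
print(list(default_cat.cat.categories))  # ['a', 'b', 'c']

# An explicit, non-alphabetical category order (made up for this example)
custom_cat = pd.Series(pd.Categorical(["a", "b", "c"], categories=["b", "c", "a"]))

# sort_values orders by position in the categories, not alphabetically
print(custom_cat.sort_values().tolist())  # ['b', 'c', 'a']
```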

[10]:

df >> arrange(_.x_cat)

[10]:

	x_cat
2	a
1	b
0	c

This means that if you reorder the categories, arrange will follow the new order!
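The next cell uses siuba's fct_rev. In plain pandas, `Series.cat.reorder_categories` can produce a similar reversed order. This is a sketch of that pandas approach and is not meant to describe how fct_rev is implemented.

```python
import pandas as pd

df = pd.DataFrame({"x_cat": pd.Categorical(["c", "b", "a"])})

# Reverse the category order (similar in spirit to fct_rev)
reversed_order = list(df["x_cat"].cat.categories[::-1])
df["rev_x_cat"] = df["x_cat"].cat.reorder_categories(reversed_order)

# Sorting follows each column's own category order
print(df.sort_values("x_cat"))      # rows 2, 1, 0 (a, b, c)
print(df.sort_values("rev_x_cat"))  # rows 0, 1, 2 (c, b, a)
```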

[11]:

from siuba.dply.forcats import fct_rev

df["rev_x_cat"] = fct_rev(df.x_cat)
df.rev_x_cat

[11]:

0    c
1    b
2    a
Name: rev_x_cat, dtype: category
Categories (3, object): ['c', 'b', 'a']

[12]:

df >> arrange(_.rev_x_cat)

[12]:

	x_cat	rev_x_cat
0	c	c
1	b	b
2	a	a
