siuba.dply.vector

alias_series_agg

siuba.dply.vector.alias_series_agg(name)

cumall

siuba.dply.vector.cumall(x)
siuba.dply.vector.cumall(x: pandas.core.series.Series)
siuba.dply.vector.cumall(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.cumall(__data: siuba.siu.Call, *args, **kwargs)

Return a same-length array. For each entry, indicates whether that entry and all previous are True-like.

Example

>>> cumall(pd.Series([True, False, False]))
0     True
1    False
2    False
dtype: bool

cumany

siuba.dply.vector.cumany(x)
siuba.dply.vector.cumany(x: pandas.core.series.Series)
siuba.dply.vector.cumany(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.cumany(__data: siuba.siu.Call, *args, **kwargs)

Return a same-length array. For each entry, indicates whether that entry or any previous are True-like.

Example

>>> cumany(pd.Series([False, True, False]))
0    False
1     True
2     True
dtype: bool
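
A typical use of a cumulative "all" is to keep the leading run of rows that satisfy a condition. The sketch below reproduces both operations with NumPy's accumulate methods rather than calling siuba; the helper names cumall_sketch and cumany_sketch are hypothetical stand-ins, not part of the library.

```python
import numpy as np
import pandas as pd

def cumall_sketch(x: pd.Series) -> pd.Series:
    # True for as long as every entry seen so far has been truthy
    flags = np.logical_and.accumulate(x.astype(bool).to_numpy())
    return pd.Series(flags, index=x.index)

def cumany_sketch(x: pd.Series) -> pd.Series:
    # True from the first truthy entry onward
    flags = np.logical_or.accumulate(x.astype(bool).to_numpy())
    return pd.Series(flags, index=x.index)

values = pd.Series([3, 5, -1, 4])
# Keep only the leading run of positive values (drops -1 and everything after)
leading_positive = values[cumall_sketch(values > 0)]
```

Here leading_positive contains 3 and 5: the trailing 4 is dropped because a failing entry precedes it.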

cummean

siuba.dply.vector.cummean(x)
siuba.dply.vector.cummean(x: pandas.core.series.Series)
siuba.dply.vector.cummean(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.cummean(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.cummean(x: pandas.core.groupby.generic.SeriesGroupBy)

Return a same-length array, containing the cumulative mean.
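
The cumulative mean at each position is the mean of that entry and all earlier entries. A minimal pandas-only sketch of that definition, assuming an expanding window is an acceptable stand-in (cummean_sketch is a hypothetical name, not siuba's function):

```python
import pandas as pd

def cummean_sketch(x: pd.Series) -> pd.Series:
    # Expanding window: entry i is the mean of entries 0 through i
    return x.expanding().mean()

result = cummean_sketch(pd.Series([1, 2, 3, 4]))
# result holds 1.0, 1.5, 2.0, 2.5
```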

desc

siuba.dply.vector.desc(x)
siuba.dply.vector.desc(x: pandas.core.series.Series)
siuba.dply.vector.desc(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.desc(__data: siuba.siu.Call, *args, **kwargs)

Return array sorted in descending order.
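
As a rough illustration, the description can be read as "values reordered from largest to smallest". The sketch below uses plain pandas under that reading; the name desc_sketch is hypothetical, and rebuilding the index as 0..n-1 is an assumption rather than documented behavior.

```python
import pandas as pd

def desc_sketch(x: pd.Series) -> pd.Series:
    # Largest values first; the index is rebuilt as 0..n-1 (an assumption)
    return x.sort_values(ascending=False).reset_index(drop=True)

result = desc_sketch(pd.Series([2, 3, 1]))
# result holds 3, 2, 1
```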

dense_rank

siuba.dply.vector.dense_rank(x, na_option='keep')
siuba.dply.vector.dense_rank(x: pandas.core.series.Series, na_option='keep')
siuba.dply.vector.dense_rank(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.dense_rank(__data: siuba.siu.Call, *args, **kwargs)

Return the dense rank.

This method of ranking returns values ranging from 1 to the number of unique entries. Ties are all given the same ranking.

Example

>>> dense_rank(pd.Series([1,3,3,5]))
0    1.0
1    2.0
2    2.0
3    3.0
dtype: float64

percent_rank

siuba.dply.vector.percent_rank(x, na_option='keep')
siuba.dply.vector.percent_rank(x: pandas.core.series.Series, na_option='keep')
siuba.dply.vector.percent_rank(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.percent_rank(__data: siuba.siu.Call, *args, **kwargs)

Return the percent rank.

Note

Uses minimum rank, and reports the proportion of unique ranks each entry is greater than.

Examples

>>> percent_rank(pd.Series([1, 2, 3]))
0    0.0
1    0.5
2    1.0
dtype: float64
>>> percent_rank(pd.Series([1, 2, 2]))
0    0.0
1    0.5
2    0.5
dtype: float64
>>> percent_rank(pd.Series([1]))
0   NaN
dtype: float64
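
The examples above are consistent with the usual formula (minimum rank - 1) / (number of values - 1). A pandas-only sketch of that formula follows; percent_rank_sketch is a hypothetical name, and the formula is inferred from the examples rather than taken from siuba's source.

```python
import pandas as pd

def percent_rank_sketch(x: pd.Series) -> pd.Series:
    # Ties share the minimum rank; rescale ranks from 1..n onto 0..1
    return (x.rank(method="min") - 1) / (x.count() - 1)

no_ties = percent_rank_sketch(pd.Series([1, 2, 3]))
with_ties = percent_rank_sketch(pd.Series([1, 2, 2]))
# A single value divides 0 by 0, which pandas reports as NaN
single = percent_rank_sketch(pd.Series([1]))
```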

min_rank

siuba.dply.vector.min_rank(x, na_option='keep')
siuba.dply.vector.min_rank(x: pandas.core.series.Series, na_option='keep')
siuba.dply.vector.min_rank(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.min_rank(__data: siuba.siu.Call, *args, **kwargs)

Return the min rank. See pd.Series.rank with method="min" for details.
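
For reference, this is what the underlying pandas call produces on its own (plain pandas, no siuba involved): tied values share the smallest rank in their group, and the next rank skips ahead accordingly.

```python
import pandas as pd

scores = pd.Series([10, 20, 20, 30])
# Both 20s receive rank 2; the next value gets rank 4, not 3
ranks = scores.rank(method="min")
```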

cume_dist

siuba.dply.vector.cume_dist(x, na_option='keep')
siuba.dply.vector.cume_dist(x: pandas.core.series.Series, na_option='keep')
siuba.dply.vector.cume_dist(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.cume_dist(__data: siuba.siu.Call, *args, **kwargs)

Return the cumulative distribution corresponding to each value in x.

This reflects the proportion of values that are less than or equal to each value.
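
One way to compute "proportion of values less than or equal to each value" in plain pandas is to divide the maximum rank by the number of non-missing values. This is a sketch of the definition only; cume_dist_sketch is a hypothetical name and na_option handling is not modeled.

```python
import pandas as pd

def cume_dist_sketch(x: pd.Series) -> pd.Series:
    # With method="max", the rank equals the count of values <= this entry
    return x.rank(method="max") / x.count()

result = cume_dist_sketch(pd.Series([1, 2, 2, 4]))
# result holds 0.25, 0.75, 0.75, 1.0
```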

row_number

siuba.dply.vector.row_number(x)
siuba.dply.vector.row_number(x: pandas.core.generic.NDFrame)
siuba.dply.vector.row_number(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.row_number(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.row_number(g: pandas.core.groupby.groupby.GroupBy) → pandas.core.groupby.groupby.GroupBy

Return the row number (position) for each value in x, beginning with 1.

Example

>>> ser = pd.Series([7,8])
>>> row_number(ser)
0    1
1    2
dtype: int64
>>> row_number(pd.DataFrame({'a': ser}))
0    1
1    2
dtype: int64
>>> row_number(pd.Series([7,8], index = [3, 4]))
3    1
4    2
dtype: int64
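
The signatures above also list a GroupBy overload, for which no example is given. Assuming the intent is to number rows within each group, the plain-pandas equivalent would look like the sketch below; this is an illustration of that assumption, not the overload's actual return value.

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b"], "x": [7, 8, 9]})
# cumcount() is 0-based within each group, so add 1 to start at 1
per_group = df.groupby("g").cumcount() + 1
# per_group holds 1, 2, 1
```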

ntile

siuba.dply.vector.ntile(x, n)
siuba.dply.vector.ntile(x: pandas.core.series.Series, n)
siuba.dply.vector.ntile(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.ntile(__data: siuba.siu.Call, *args, **kwargs)

TODO: Not Implemented

between

siuba.dply.vector.between(x, left, right, default=False)
siuba.dply.vector.between(x: pandas.core.series.Series, left, right, default=False)
siuba.dply.vector.between(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.between(__data: siuba.siu.Call, *args, **kwargs)

Return whether each value is between left and right, inclusive of both endpoints.

Example

>>> between(pd.Series([1,2,3]), 0, 2)
0     True
1     True
2    False
dtype: bool

Note

This is a thin wrapper around pd.Series.between(left, right)

coalesce

siuba.dply.vector.coalesce(x, *args)
siuba.dply.vector.coalesce(x: pandas.core.series.Series, *args)
siuba.dply.vector.coalesce(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.coalesce(__data: siuba.siu.Call, *args, **kwargs)

Return a copy of x, with NaN values filled in from *args, in the order given. Values are matched by position; indexes are ignored.

Parameters
  • x – a pandas Series object

  • *args – other Series that are the same length as x, or a scalar

Examples

>>> x = pd.Series([1., None, None])
>>> abc = pd.Series(['a', 'b', None])
>>> xyz = pd.Series(['x', 'y', 'z'])
>>> coalesce(x, abc)
0       1
1       b
2    None
dtype: object
>>> coalesce(x, abc, xyz)
0    1
1    b
2    z
dtype: object
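
To make "ignores indexes" concrete, here is a pandas-only sketch that fills missing entries by position even when the fill Series carries different index labels. The name coalesce_sketch is hypothetical, and the approach (Series.where with a NumPy array) is one possible way to get positional filling, not necessarily how siuba does it.

```python
import pandas as pd

def coalesce_sketch(x: pd.Series, *others) -> pd.Series:
    out = x.copy()
    for other in others:
        # Drop the index of a Series so values line up by position
        fill = other.to_numpy() if isinstance(other, pd.Series) else other
        out = out.where(out.notna(), fill)
    return out

x = pd.Series([1.0, float("nan"), float("nan")])
# Different index labels on purpose: they play no role in the fill
backup = pd.Series([float("nan"), 2.0, float("nan")], index=[10, 11, 12])
result = coalesce_sketch(x, backup, 0.0)
# result holds 1.0, 2.0, 0.0 and keeps the index of x
```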

lead

siuba.dply.vector.lead(x, n=1, default=None)
siuba.dply.vector.lead(x: pandas.core.series.Series, n=1, default=None)
siuba.dply.vector.lead(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.lead(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.lead(x: pandas.core.groupby.generic.SeriesGroupBy, n=1, default=None)

Return an array with each value replaced by the next (or further forward) value in the array.

Parameters
  • x – a pandas Series object

  • n – number of positions to look ahead for each value

  • default – value used for the final n entries, which have no value n positions ahead

Example

>>> lead(pd.Series([1,2,3]), n=1)
0    2.0
1    3.0
2    NaN
dtype: float64
>>> lead(pd.Series([1,2,3]), n=1, default = 99)
0     2
1     3
2    99
dtype: int64

lag

siuba.dply.vector.lag(x, n=1, default=None)
siuba.dply.vector.lag(x: pandas.core.series.Series, n=1, default=None)
siuba.dply.vector.lag(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.lag(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.lag(x: pandas.core.groupby.generic.SeriesGroupBy, n=1, default=None)

Return an array with each value replaced by the previous (or further backward) value in the array.

Parameters
  • x – a pandas Series object

  • n – number of positions to look back for each value

  • default – value used for the first n entries, which have no value n positions behind

Example

>>> lag(pd.Series([1,2,3]), n=1)
0    NaN
1    1.0
2    2.0
dtype: float64
>>> lag(pd.Series([1,2,3]), n=1, default = 99)
0    99.0
1     1.0
2     2.0
dtype: float64
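
lead and lag behave like shifting a Series backward or forward, and both list a SeriesGroupBy overload. The plain-pandas sketch below shows the analogous shift calls, including a grouped shift that never crosses a group boundary. It is an analogy only: the result dtypes may differ from siuba's output shown above.

```python
import pandas as pd

ser = pd.Series([1, 2, 3])
lead_like = ser.shift(-1, fill_value=99)  # values move up: 2, 3, 99
lag_like = ser.shift(1, fill_value=99)    # values move down: 99, 1, 2

df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1, 2, 3]})
# The first row of each group has no previous value, so it becomes NaN
lag_by_group = df.groupby("g")["x"].shift(1)
```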

n

siuba.dply.vector.n(x)
siuba.dply.vector.n(x: pandas.core.generic.NDFrame)
siuba.dply.vector.n(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.n(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.n(x: pandas.core.groupby.groupby.GroupBy) → siuba.experimental.pd_groups.groupby.GroupByAgg

Return the total number of elements in the array (or rows in a DataFrame).

Example

>>> ser = pd.Series([1,2,3])
>>> n(ser)
3
>>> df = pd.DataFrame({'x': ser})
>>> n(df)
3

n_distinct

siuba.dply.vector.n_distinct(x)
siuba.dply.vector.n_distinct(x: pandas.core.series.Series)
siuba.dply.vector.n_distinct(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.n_distinct(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.n_distinct(__ser: pandas.core.groupby.generic.SeriesGroupBy, *args, **kwargs) → siuba.experimental.pd_groups.groupby.GroupByAgg

Return the total number of distinct (i.e. unique) elements in an array.

Example

>>> n_distinct(pd.Series([1,1,2,2]))
2

na_if

siuba.dply.vector.na_if(x, y)
siuba.dply.vector.na_if(x: pandas.core.series.Series, y)
siuba.dply.vector.na_if(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.na_if(__data: siuba.siu.Call, *args, **kwargs)

Return an array like x, but with any values found in y replaced by NA.

Examples

>>> na_if(pd.Series([1,2,3]), [1,3])
0    NaN
1    2.0
2    NaN
dtype: float64
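
The same result can be obtained in plain pandas by masking the entries that appear in y. This is a sketch of the behavior shown in the example; na_if_sketch is a hypothetical name, and treating a scalar y as a one-element list is an assumption.

```python
import pandas as pd

def na_if_sketch(x: pd.Series, y) -> pd.Series:
    # Accept either a single value or a list-like of values to blank out
    targets = y if isinstance(y, (list, tuple, set, pd.Series)) else [y]
    # mask() replaces entries where the condition is True with NaN
    return x.mask(x.isin(targets))

result = na_if_sketch(pd.Series([1, 2, 3]), [1, 3])
single = na_if_sketch(pd.Series([1, 2, 3]), 2)
```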

near

siuba.dply.vector.near(x)
siuba.dply.vector.near(x: pandas.core.series.Series)
siuba.dply.vector.near(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.near(__data: siuba.siu.Call, *args, **kwargs)

TODO: Not Implemented

nth

siuba.dply.vector.nth(x, n, order_by=None, default=None)
siuba.dply.vector.nth(x: pandas.core.series.Series, n, order_by=None, default=None)
siuba.dply.vector.nth(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.nth(__data: siuba.siu.Call, *args, **kwargs)

Return the nth entry of x. Similar to x[n].

Note

first(x) and last(x) are nth(x, 0) and nth(x, -1).

Parameters
  • x – series to get entry from.

  • n – position of entry to get from x (0 indicates first entry).

  • order_by – optional Series used to reorder x.

  • default – (not implemented) value to return if no entry at n.

Examples

>>> ser = pd.Series(['a', 'b', 'c'])
>>> nth(ser, 1)
'b'
>>> sorter = pd.Series([1, 2, 0])
>>> nth(ser, 1, order_by = sorter)
'a'
>>> nth(ser, 0), nth(ser, -1)
('a', 'c')
>>> first(ser), last(ser)
('a', 'c')
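
The order_by example can be puzzling at first: x is reordered by the positions that would sort order_by, and then the nth entry is taken. A pandas-only sketch of that reading, consistent with the outputs above (nth_sketch is a hypothetical name, and the default parameter is not modeled):

```python
import pandas as pd

def nth_sketch(x: pd.Series, n: int, order_by=None):
    if order_by is not None:
        # argsort gives the positions that sort order_by; apply them to x
        x = x.iloc[order_by.to_numpy().argsort(kind="stable")]
    return x.iloc[n]

ser = pd.Series(["a", "b", "c"])
sorter = pd.Series([1, 2, 0])
# Sorting by sorter reorders ser to c, a, b, so position 1 is "a"
picked = nth_sketch(ser, 1, order_by=sorter)
```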

first

siuba.dply.vector.first(x: pandas.core.series.Series, n, order_by=None, default=None)
siuba.dply.vector.first(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.first(__data: siuba.siu.Call, *args, **kwargs)

Return the first entry of x. Equivalent to nth(x, 0).

Note

first(x) and last(x) are nth(x, 0) and nth(x, -1).

Parameters
  • x – series to get entry from.

  • n – position of entry to get from x (0 indicates first entry).

  • order_by – optional Series used to reorder x.

  • default – (not implemented) value to return if no entry at n.

Examples

>>> ser = pd.Series(['a', 'b', 'c'])
>>> nth(ser, 1)
'b'
>>> sorter = pd.Series([1, 2, 0])
>>> nth(ser, 1, order_by = sorter)
'a'
>>> nth(ser, 0), nth(ser, -1)
('a', 'c')
>>> first(ser), last(ser)
('a', 'c')

last

siuba.dply.vector.last(x: pandas.core.series.Series, n, order_by=None, default=None)
siuba.dply.vector.last(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.last(__data: siuba.siu.Call, *args, **kwargs)

Return the last entry of x. Equivalent to nth(x, -1).

Note

first(x) and last(x) are nth(x, 0) and nth(x, -1).

Parameters
  • x – series to get entry from.

  • n – position of entry to get from x (0 indicates first entry).

  • order_by – optional Series used to reorder x.

  • default – (not implemented) value to return if no entry at n.

Examples

>>> ser = pd.Series(['a', 'b', 'c'])
>>> nth(ser, 1)
'b'
>>> sorter = pd.Series([1, 2, 0])
>>> nth(ser, 1, order_by = sorter)
'a'
>>> nth(ser, 0), nth(ser, -1)
('a', 'c')
>>> first(ser), last(ser)
('a', 'c')