siuba.dply.vector
cumall
siuba.dply.vector.cumall(x)
siuba.dply.vector.cumall(x: pandas.core.series.Series)
siuba.dply.vector.cumall(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.cumall(__data: siuba.siu.Call, *args, **kwargs)

Return a same-length array. For each entry, indicates whether that entry and all previous are True-like.
Example
>>> cumall(pd.Series([True, False, False]))
0     True
1    False
2    False
dtype: bool
cumany
siuba.dply.vector.cumany(x)
siuba.dply.vector.cumany(x: pandas.core.series.Series)
siuba.dply.vector.cumany(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.cumany(__data: siuba.siu.Call, *args, **kwargs)

Return a same-length array. For each entry, indicates whether that entry or any previous are True-like.
Example
>>> cumany(pd.Series([False, True, False]))
0    False
1     True
2     True
dtype: bool
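The semantics of cumall and cumany can be sketched in plain Python as a running logical AND and a running logical OR. This is only an illustration on lists; the helper names cumall_sketch and cumany_sketch are hypothetical and are not part of siuba, which operates on pandas objects.

```python
from itertools import accumulate


def cumall_sketch(values):
    """Running logical AND: True until the first falsy entry, False afterwards."""
    return list(accumulate((bool(v) for v in values), lambda acc, v: acc and v))


def cumany_sketch(values):
    """Running logical OR: False until the first truthy entry, True afterwards."""
    return list(accumulate((bool(v) for v in values), lambda acc, v: acc or v))


print(cumall_sketch([True, False, False]))  # [True, False, False]
print(cumany_sketch([False, True, False]))  # [False, True, True]
```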
cummean
siuba.dply.vector.cummean(x)
siuba.dply.vector.cummean(x: pandas.core.series.Series)
siuba.dply.vector.cummean(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.cummean(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.cummean(x: pandas.core.groupby.generic.SeriesGroupBy)

Return a same-length array, containing the cumulative mean.
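As a rough illustration of what "cumulative mean" means here, entry i of the result is the mean of entries 0 through i. The sketch below works on a plain list and ignores missing values; cummean_sketch is a hypothetical name, not a siuba function.

```python
from itertools import accumulate


def cummean_sketch(values):
    """Mean of all entries up to and including each position."""
    running_totals = accumulate(values)
    return [total / count for count, total in enumerate(running_totals, start=1)]


print(cummean_sketch([1, 2, 6]))  # [1.0, 1.5, 3.0]
```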
desc
siuba.dply.vector.desc(x)
siuba.dply.vector.desc(x: pandas.core.series.Series)
siuba.dply.vector.desc(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.desc(__data: siuba.siu.Call, *args, **kwargs)

Return array sorted in descending order.
dense_rank
siuba.dply.vector.dense_rank(x, na_option='keep')
siuba.dply.vector.dense_rank(x: pandas.core.series.Series, na_option='keep')
siuba.dply.vector.dense_rank(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.dense_rank(__data: siuba.siu.Call, *args, **kwargs)

Return the dense rank.
This method of ranking returns values ranging from 1 to the number of unique entries. Ties are all given the same ranking.
Example
>>> dense_rank(pd.Series([1,3,3,5]))
0    1.0
1    2.0
2    2.0
3    3.0
dtype: float64
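The dense ranking rule described above (ranks run from 1 to the number of unique entries, ties share a rank) can be sketched on a plain list as follows. dense_rank_sketch is a hypothetical helper for illustration, and it assumes the input has no missing values.

```python
def dense_rank_sketch(values):
    """Rank each value by its position among the sorted unique values."""
    rank_of = {v: float(i) for i, v in enumerate(sorted(set(values)), start=1)}
    return [rank_of[v] for v in values]


print(dense_rank_sketch([1, 3, 3, 5]))  # [1.0, 2.0, 2.0, 3.0]
```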
percent_rank
siuba.dply.vector.percent_rank(x, na_option='keep')
siuba.dply.vector.percent_rank(x: pandas.core.series.Series, na_option='keep')
siuba.dply.vector.percent_rank(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.percent_rank(__data: siuba.siu.Call, *args, **kwargs)

Return the percent rank.
Note
Uses minimum rank, and reports the proportion of unique ranks each entry is greater than.
Examples
>>> percent_rank(pd.Series([1, 2, 3]))
0    0.0
1    0.5
2    1.0
dtype: float64
>>> percent_rank(pd.Series([1, 2, 2]))
0    0.0
1    0.5
2    0.5
dtype: float64
>>> percent_rank(pd.Series([1]))
0   NaN
dtype: float64
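One formula that reproduces the three examples above is (min_rank - 1) / (n - 1), where n is the number of entries. The sketch below assumes that formula and plain lists without missing values; it is not taken from siuba's source, and percent_rank_sketch is a hypothetical name.

```python
import math


def percent_rank_sketch(values):
    """(minimum rank - 1) / (n - 1); NaN when there are fewer than two entries."""
    n = len(values)
    if n < 2:
        return [math.nan] * n
    # Minimum rank: one plus the number of strictly smaller entries.
    min_ranks = [1 + sum(other < v for other in values) for v in values]
    return [(rank - 1) / (n - 1) for rank in min_ranks]


print(percent_rank_sketch([1, 2, 3]))  # [0.0, 0.5, 1.0]
print(percent_rank_sketch([1, 2, 2]))  # [0.0, 0.5, 0.5]
```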
min_rank
siuba.dply.vector.min_rank(x, na_option='keep')
siuba.dply.vector.min_rank(x: pandas.core.series.Series, na_option='keep')
siuba.dply.vector.min_rank(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.min_rank(__data: siuba.siu.Call, *args, **kwargs)

Return the min rank. See pd.Series.rank with method="min" for details.
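For readers unfamiliar with minimum ranking: tied values all receive the lowest rank of their group, and the next distinct value skips ahead accordingly. A minimal plain-Python sketch, assuming no missing values (min_rank_sketch is a hypothetical name, not part of siuba):

```python
def min_rank_sketch(values):
    """One plus the count of strictly smaller entries, as a float."""
    return [1.0 + sum(other < v for other in values) for v in values]


# The two 20s tie at rank 2, so 30 gets rank 4 (rank 3 is skipped).
print(min_rank_sketch([10, 20, 20, 30]))  # [1.0, 2.0, 2.0, 4.0]
```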
cume_dist
siuba.dply.vector.cume_dist(x, na_option='keep')
siuba.dply.vector.cume_dist(x: pandas.core.series.Series, na_option='keep')
siuba.dply.vector.cume_dist(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.cume_dist(__data: siuba.siu.Call, *args, **kwargs)

Return the cumulative distribution corresponding to each value in x.
This reflects the proportion of values that are less than or equal to each value.
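The description above translates directly into a small calculation: for each value, count the entries less than or equal to it and divide by the total. The sketch below does this on a plain list, assuming no missing values; cume_dist_sketch is a hypothetical name.

```python
def cume_dist_sketch(values):
    """Proportion of entries that are less than or equal to each value."""
    n = len(values)
    return [sum(other <= v for other in values) / n for v in values]


print(cume_dist_sketch([1, 2, 2, 4]))  # [0.25, 0.75, 0.75, 1.0]
```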
row_number
siuba.dply.vector.row_number(x)
siuba.dply.vector.row_number(x: pandas.core.generic.NDFrame)
siuba.dply.vector.row_number(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.row_number(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.row_number(g: pandas.core.groupby.groupby.GroupBy) → pandas.core.groupby.groupby.GroupBy

Return the row number (position) for each value in x, beginning with 1.
Example
>>> ser = pd.Series([7,8])
>>> row_number(ser)
0    1
1    2
dtype: int64
>>> row_number(pd.DataFrame({'a': ser}))
0    1
1    2
dtype: int64
>>> row_number(pd.Series([7,8], index = [3, 4]))
3    1
4    2
dtype: int64
ntile
siuba.dply.vector.ntile(x, n)
siuba.dply.vector.ntile(x: pandas.core.series.Series, n)
siuba.dply.vector.ntile(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.ntile(__data: siuba.siu.Call, *args, **kwargs)

TODO: Not Implemented
between
siuba.dply.vector.between(x, left, right, default=False)
siuba.dply.vector.between(x: pandas.core.series.Series, left, right, default=False)
siuba.dply.vector.between(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.between(__data: siuba.siu.Call, *args, **kwargs)

Return whether each value is between left and right (inclusive of both endpoints).
Example
>>> between(pd.Series([1,2,3]), 0, 2)
0     True
1     True
2    False
dtype: bool
Note
This is a thin wrapper around pd.Series.between(left, right)
coalesce
siuba.dply.vector.coalesce(x, *args)
siuba.dply.vector.coalesce(x: pandas.core.series.Series, *args)
siuba.dply.vector.coalesce(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.coalesce(__data: siuba.siu.Call, *args, **kwargs)

Returns a copy of x, with NaN values filled in from *args. Ignores indexes.
Parameters
x – a pandas Series object
*args – other Series that are the same length as x, or a scalar
Examples
>>> x = pd.Series([1., None, None])
>>> abc = pd.Series(['a', 'b', None])
>>> xyz = pd.Series(['x', 'y', 'z'])
>>> coalesce(x, abc)
0       1
1       b
2    None
dtype: object
>>> coalesce(x, abc, xyz)
0       1
1       b
2       z
dtype: object
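The fill-in order shown in these examples (earlier arguments take priority, and matching is by position rather than by index) can be sketched on plain lists, with None standing in for NaN. coalesce_sketch is a hypothetical helper and not siuba's implementation.

```python
def coalesce_sketch(x, *args):
    """Fill missing entries of x from each argument in turn, by position."""
    result = list(x)
    for filler in args:
        for i, current in enumerate(result):
            if current is None:  # None stands in for NaN in this sketch
                is_sequence = isinstance(filler, (list, tuple))
                result[i] = filler[i] if is_sequence else filler
    return result


x = [1.0, None, None]
abc = ["a", "b", None]
xyz = ["x", "y", "z"]
print(coalesce_sketch(x, abc))       # [1.0, 'b', None]
print(coalesce_sketch(x, abc, xyz))  # [1.0, 'b', 'z']
```

A scalar argument fills every remaining gap, so coalesce_sketch(x, 0) replaces both missing entries with 0.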
lead
siuba.dply.vector.lead(x, n=1, default=None)
siuba.dply.vector.lead(x: pandas.core.series.Series, n=1, default=None)
siuba.dply.vector.lead(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.lead(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.lead(x: pandas.core.groupby.generic.SeriesGroupBy, n=1, default=None)

Return an array with each value replaced by the next (or further forward) value in the array.
Parameters
x – a pandas Series object
n – number of positions ahead from which each replacement value is taken
default – value used to fill the final n entries of the array
Example
>>> lead(pd.Series([1,2,3]), n=1)
0    2.0
1    3.0
2    NaN
dtype: float64
>>> lead(pd.Series([1,2,3]), n=1, default = 99)
0     2
1     3
2    99
dtype: int64
lag
siuba.dply.vector.lag(x, n=1, default=None)
siuba.dply.vector.lag(x: pandas.core.series.Series, n=1, default=None)
siuba.dply.vector.lag(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.lag(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.lag(x: pandas.core.groupby.generic.SeriesGroupBy, n=1, default=None)

Return an array with each value replaced by the previous (or further backward) value in the array.
Parameters
x – a pandas Series object
n – number of positions back from which each replacement value is taken
default – value used to fill the first n entries of the array
Example
>>> lag(pd.Series([1,2,3]), n=1)
0    NaN
1    1.0
2    2.0
dtype: float64
>>> lag(pd.Series([1,2,3]), n=1, default = 99)
0    99.0
1     1.0
2     2.0
dtype: float64
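lead and lag are mirror images: lead shifts values toward the start and pads the end, while lag shifts values toward the end and pads the start. A plain-list sketch of both, assuming n is non-negative and using None where pandas would show NaN (lead_sketch and lag_sketch are hypothetical names):

```python
def lead_sketch(values, n=1, default=None):
    """Each entry takes the value n positions ahead; the tail is padded."""
    values = list(values)
    pad = min(n, len(values))
    return values[n:] + [default] * pad


def lag_sketch(values, n=1, default=None):
    """Each entry takes the value n positions back; the head is padded."""
    values = list(values)
    pad = min(n, len(values))
    return [default] * pad + values[: len(values) - pad]


print(lead_sketch([1, 2, 3]))             # [2, 3, None]
print(lead_sketch([1, 2, 3], default=99)) # [2, 3, 99]
print(lag_sketch([1, 2, 3]))              # [None, 1, 2]
print(lag_sketch([1, 2, 3], default=99))  # [99, 1, 2]
```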
n
siuba.dply.vector.n(x)
siuba.dply.vector.n(x: pandas.core.generic.NDFrame)
siuba.dply.vector.n(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.n(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.n(x: pandas.core.groupby.groupby.GroupBy) → siuba.experimental.pd_groups.groupby.GroupByAgg

Return the total number of elements in the array (or rows in a DataFrame).
Example
>>> ser = pd.Series([1,2,3])
>>> n(ser)
3
>>> df = pd.DataFrame({'x': ser})
>>> n(df)
3
n_distinct
siuba.dply.vector.n_distinct(x)
siuba.dply.vector.n_distinct(x: pandas.core.series.Series)
siuba.dply.vector.n_distinct(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.n_distinct(__data: siuba.siu.Call, *args, **kwargs)
siuba.dply.vector.n_distinct(__ser: pandas.core.groupby.generic.SeriesGroupBy, *args, **kwargs) → siuba.experimental.pd_groups.groupby.GroupByAgg

Return the total number of distinct (i.e. unique) elements in an array.
Example
>>> n_distinct(pd.Series([1,1,2,2]))
2
na_if
siuba.dply.vector.na_if(x, y)
siuba.dply.vector.na_if(x: pandas.core.series.Series, y)
siuba.dply.vector.na_if(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.na_if(__data: siuba.siu.Call, *args, **kwargs)

Return an array like x, but with values in y replaced by NAs.
Examples
>>> na_if(pd.Series([1,2,3]), [1,3])
0    NaN
1    2.0
2    NaN
dtype: float64
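The replacement rule can be sketched on a plain list, with None standing in for NA. Accepting a single scalar for y as well as a collection is an assumption of this sketch, and na_if_sketch is a hypothetical name rather than a siuba function.

```python
def na_if_sketch(x, y):
    """Replace every entry of x that appears in y with None."""
    targets = y if isinstance(y, (list, tuple, set)) else [y]
    return [None if value in targets else value for value in x]


print(na_if_sketch([1, 2, 3], [1, 3]))  # [None, 2, None]
```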
near
siuba.dply.vector.near(x)
siuba.dply.vector.near(x: pandas.core.series.Series)
siuba.dply.vector.near(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.near(__data: siuba.siu.Call, *args, **kwargs)

TODO: Not Implemented
nth
siuba.dply.vector.nth(x, n, order_by=None, default=None)
siuba.dply.vector.nth(x: pandas.core.series.Series, n, order_by=None, default=None)
siuba.dply.vector.nth(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.nth(__data: siuba.siu.Call, *args, **kwargs)

Return the nth entry of x. Similar to x[n].
Note
first(x) and last(x) are nth(x, 0) and nth(x, -1).
Parameters
x – series to get entry from.
n – position of entry to get from x (0 indicates first entry).
order_by – optional Series used to reorder x.
default – (not implemented) value to return if no entry at n.
Examples
>>> ser = pd.Series(['a', 'b', 'c'])
>>> nth(ser, 1)
'b'
>>> sorter = pd.Series([1, 2, 0])
>>> nth(ser, 1, order_by = sorter)
'a'
>>> nth(ser, 0), nth(ser, -1)
('a', 'c')
>>> first(ser), last(ser)
('a', 'c')
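The effect of order_by in the examples above is that x is first rearranged by sorting on the matching order_by values, and only then indexed. A plain-list sketch under that reading (nth_sketch is a hypothetical name; the default parameter is omitted because the documentation marks it as not implemented):

```python
def nth_sketch(x, n, order_by=None):
    """Return x[n], optionally after reordering x by the values in order_by."""
    if order_by is not None:
        # Pair each entry with its sort key, sort by key, keep the entries.
        pairs = sorted(zip(order_by, x), key=lambda pair: pair[0])
        x = [value for _, value in pairs]
    return x[n]


ser = ["a", "b", "c"]
print(nth_sketch(ser, 1))                      # b
print(nth_sketch(ser, 1, order_by=[1, 2, 0]))  # a
print(nth_sketch(ser, 0), nth_sketch(ser, -1)) # a c
```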
first
siuba.dply.vector.first(x: pandas.core.series.Series, n, order_by=None, default=None)
siuba.dply.vector.first(x: pandas.core.series.Series, n, order_by=None, default=None)
siuba.dply.vector.first(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.first(__data: siuba.siu.Call, *args, **kwargs)

Return the nth entry of x. Similar to x[n].
Note
first(x) and last(x) are nth(x, 0) and nth(x, -1).
Parameters
x – series to get entry from.
n – position of entry to get from x (0 indicates first entry).
order_by – optional Series used to reorder x.
default – (not implemented) value to return if no entry at n.
Examples
>>> ser = pd.Series(['a', 'b', 'c'])
>>> nth(ser, 1)
'b'
>>> sorter = pd.Series([1, 2, 0])
>>> nth(ser, 1, order_by = sorter)
'a'
>>> nth(ser, 0), nth(ser, -1)
('a', 'c')
>>> first(ser), last(ser)
('a', 'c')
last
siuba.dply.vector.last(x: pandas.core.series.Series, n, order_by=None, default=None)
siuba.dply.vector.last(x: pandas.core.series.Series, n, order_by=None, default=None)
siuba.dply.vector.last(__data: siuba.siu.Symbolic, *args, **kwargs)
siuba.dply.vector.last(__data: siuba.siu.Call, *args, **kwargs)

Return the nth entry of x. Similar to x[n].
Note
first(x) and last(x) are nth(x, 0) and nth(x, -1).
Parameters
x – series to get entry from.
n – position of entry to get from x (0 indicates first entry).
order_by – optional Series used to reorder x.
default – (not implemented) value to return if no entry at n.
Examples
>>> ser = pd.Series(['a', 'b', 'c'])
>>> nth(ser, 1)
'b'
>>> sorter = pd.Series([1, 2, 0])
>>> nth(ser, 1, order_by = sorter)
'a'
>>> nth(ser, 0), nth(ser, -1)
('a', 'c')
>>> first(ser), last(ser)
('a', 'c')