--- jupyter: jupytext: text_representation: extension: .Rmd format_name: rmarkdown format_version: '1.2' jupytext_version: 1.4.2 kernelspec: display_name: Python 3 language: python name: python3 --- ```{python nbsphinx=hidden} import pandas as pd pd.set_option("display.max_rows", 5) ``` # SQL backend ## Step 1: Column Translation Column translation requires three pieces: 1. **Locals:** Functions for creating the sqlalchemy clause corresponding to an operation. 2. **Column Data:** Classes representing columns under normal and aggregate settings. 3. **Translator:** A class that can take a symbolic expression (e.g. `_.x.mean()`) and return the correct sqlachemy clause. ```{python} # Column data ================================================================= from siuba.sql.translate import SqlColumn, SqlColumnAgg # used if you want to add a scalar or window translation # (eg. something that gets used in mutate) class WowSqlColumn(SqlColumn): pass # used if you want to add a aggregate translation # (eg. something that gets used in a query with a GROUP BY clause) class WowSqlColumnAgg(WowSqlColumn, SqlColumn): pass # Locals ====================================================================== from siuba.sql.translate import ( win_over, win_agg, sql_agg, sql_scalar, sql_colmeth, sql_not_impl, ) scalar = { "__add__": sql_colmeth("__add__"), "round": sql_scalar("round"), } window = { "rank": win_over("rank"), "mean": win_agg("mean"), } aggregation = { "rank": sql_not_impl(), "mean": sql_agg("mean"), } # Translator ================================================================== from siuba.sql.translate import SqlTranslator translator = SqlTranslator.from_mappings( scalar, window, aggregation, WowSqlColumn, WowSqlColumnAgg ) ``` ## Column Data There are two kinds of data classes, corresponding to whether the generated outermost query in the generated SQL will use a `GROUP BY` clause. ## Locals The entries of each local dictionary are functions that take a sqlalchemy.sql.ClauseElement--which is the parent class of many sqlalchemy elements--and returns a ClauseElement. ```{python} from sqlalchemy import sql expr_rank = window["rank"](sql.column("a_col")) expr_rank ``` ```{python} print(expr_rank) ``` ## Translator Below, we set up a sqlalchemy select statement in order to demonstrate the translator in action. ```{python} from siuba import _ from sqlalchemy.sql import column, select sel = select([column('x'), column('y')]) ``` Then we feed the columns to the translated call. ```{python} call_add = translator.translate(_.x + _.y) call_add(sel.columns) ``` Note that behind the scenes, the translator goes down the call tree and swaps functions like `"__add__"` with the local translations. ```{python} # the root node is __add__. shown as +. _.x + _.y ``` ```{python} # We can see this in action by calling the translation directly. scalar["__add__"](sel.columns.x, sel.columns.y) ``` By default the translate method assumes the expression is using window functions, so operations like `.mean()` return SqlAlchemy Over clauses. ```{python} f_translate = translator.translate(_.x.mean()) expr = f_translate(sel.columns) expr ``` ```{python} print(expr) ``` Keep in mind--since the translator doesn't know about grouping variables--it returns an empty over clause. This separation of concerns: * translator: handles simple column ops, including returning over clauses to be filled. * backend: handles broader contexts--like if the data has been grouped, arranged, or limited--by visiting the result of each translation. ## User defined functions Remember those column data classes we made and passed to the translator? They allow users to register custom column operation functions! Below we show a custom round function, which calls the sqlalchemy that usually corresponds to `ROUND(