Trino
Install
Install Ibis and dependencies for the Trino backend:
Install with the trino extra:
pip install 'ibis-framework[trino]'
And connect:
import ibis
con = ibis.trino.connect()

Adjust connection parameters as needed.
Install for Trino:
conda install -c conda-forge ibis-trino
And connect:
import ibis
con = ibis.trino.connect()

Adjust connection parameters as needed.
Install for Trino:
mamba install -c conda-forge ibis-trino
And connect:
import ibis
con = ibis.trino.connect()

Adjust connection parameters as needed.
Connect
ibis.trino.connect
con = ibis.trino.connect(
    user="user",
    password="password",
    port=8080,
    database="database",
    schema="default",
)
ibis.trino.connect is a thin wrapper around ibis.backends.trino.Backend.do_connect.
Connection Parameters
do_connect
do_connect(self, user='user', password=None, host='localhost', port=8080, database=None, schema=None, source=None, timezone='UTC', **kwargs)
Connect to Trino.
Parameters
Name | Type | Description | Default |
---|---|---|---|
user | str | Username to connect with | 'user' |
password | str | None | Password to connect with | None |
host | str | Hostname of the Trino server | 'localhost' |
port | int | Port of the Trino server | 8080 |
database | str | None | Catalog to use on the Trino server | None |
schema | str | None | Schema to use on the Trino server | None |
source | str | None | Application name passed to Trino | None |
timezone | str | Timezone to use for the connection | 'UTC' |
kwargs | Additional keyword arguments passed directly to the trino.dbapi.connect API. | {} |
Examples
>>> catalog = "hive"
>>> schema = "default"
Connect using a URL, with the default user, password, host and port
>>> con = ibis.connect(f"trino:///{catalog}/{schema}")
Connect using a URL
>>> con = ibis.connect(f"trino://user:password@host:port/{catalog}/{schema}")
Connect using keyword arguments
>>> con = ibis.trino.connect(database=catalog, schema=schema)
>>> con = ibis.trino.connect(database=catalog, schema=schema, source="my-app")
Authenticating with SSO
Ibis supports connecting to SSO-enabled Trino clusters using the OAuth2Authentication helper from the trino library.
import ibis
from trino.auth import OAuth2Authentication
con = ibis.trino.connect(
    user="user",
    host="hostname",
    port=443,
    database="database",
    schema="default",
    auth=OAuth2Authentication(),
    http_scheme="https",
)
Connecting to Starburst managed Trino instances
Starburst makes use of role-based access controls. When connecting to a Starburst Trino cluster, if you encounter issues listing or connecting to tables, ensure that a role is specified using the roles keyword.
import ibis
con = ibis.trino.connect(
    user="user",
    host="hostname",
    port=443,
    database="sample",
    schema="demo",
    roles="defaultrolewithtableaccess",
    http_scheme="https",
)

Role names are visible in the Starburst Galaxy dashboard.
Finding your Starburst host
Log into Starburst Galaxy and select Clusters from the left-hand-side menu.

Select Connection info for the cluster you wish to connect to; the username and hostname displayed can be copied directly into the Ibis connect call.
trino.Backend
add_operation
add_operation(self, operation)
Add a translation function to the backend for a specific operation.
Operations are defined in ibis.expr.operations, and a translation function receives the translator object and an expression as parameters, and returns a value depending on the backend.
begin
begin(self)
compile
compile(self, expr, limit=None, params=None, pretty=False, **kwargs)
Compile an Ibis expression to a SQL string.
connect
connect(self, *args, **kwargs)
Connect to the database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
*args | Mandatory connection parameters, see the docstring of do_connect for details. | () |
**kwargs | Extra connection parameters, see the docstring of do_connect for details. | {} |
Notes

This creates a new backend instance with saved args and kwargs, then calls reconnect and finally returns the newly created and connected backend instance.
Returns
Type | Description |
---|---|
ibis.backends.BaseBackend | An instance of the backend |
create_database
create_database(self, name, catalog=None, force=False)
Create a database named name in catalog.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the database to create. | required |
catalog | str | None | Name of the catalog in which to create the database. If None, the current catalog is used. | None |
force | bool | If False, an exception is raised if the database exists. | False |
create_schema
create_schema(self, name, database=None, force=False)
create_table
create_table(self, name, obj=None, *, schema=None, database=None, temp=False, overwrite=False, comment=None, properties=None)
Create a table in Trino.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the table to create | required |
obj | pandas.DataFrame | pyarrow.Table | ibis.expr.types.Table | None | The data with which to populate the table; optional, but one of obj or schema must be specified | None |
schema | ibis.expr.schema.Schema | None | The schema of the table to create; optional, but one of obj or schema must be specified | None |
database | str | None | Not yet implemented. | None |
temp | bool | This parameter is not yet supported in the Trino backend, because Trino doesn’t implement temporary tables | False |
overwrite | bool | If True, replace the table if it already exists, otherwise fail if the table exists | False |
comment | str | None | Add a comment to the table | None |
properties | collections.abc.Mapping[str, typing.Any] | None | Table properties to set on creation | None |
create_view
create_view(self, name, obj, *, database=None, schema=None, overwrite=False)
disconnect
disconnect(self)
drop_database
drop_database(self, name, catalog=None, force=False)
Drop the database with name in catalog.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Name of the database to drop. | required |
catalog | str | None | Name of the catalog to drop the database from. If None, the current catalog is used. | None |
force | bool | If False, an exception is raised if the database does not exist. | False |
drop_schema
drop_schema(self, name, database=None, force=False)
drop_table
drop_table(self, name, database=None, force=False)
drop_view
drop_view(self, name, *, database=None, schema=None, force=False)
execute
execute(self, expr, params=None, limit='default', **kwargs)
Execute an expression.
get_schema
get_schema(self, table_name, *, catalog=None, database=None)
Compute the schema of a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
table_name | str | May not be fully qualified. Use database if you want to qualify the identifier. | required |
catalog | str | None | Catalog name | None |
database | str | None | Database name | None |
Returns
Type | Description |
---|---|
ibis.expr.schema.Schema | Ibis schema |
has_operation
has_operation(cls, operation)
insert
insert(self, table_name, obj, schema=None, database=None, overwrite=False)
Insert data into a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
table_name | str | The name of the table into which data will be inserted | required |
obj | pandas.DataFrame | ibis.expr.types.Table | list | dict | The source data or expression to insert | required |
schema | str | None | [deprecated] The name of the schema that the table is located in | None |
database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). Note that Ibis does not use the word schema to refer to database hierarchy: a collection of tables is referred to as a database, and a collection of databases is referred to as a catalog. These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology. | None |
overwrite | bool | If True then replace existing contents of table | False |
list_catalogs
list_catalogs(self, like=None)
List existing catalogs in the current connection.
Ibis does not use the word schema to refer to database hierarchy. A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.

These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | A pattern in Python’s regex format to filter returned catalog names. | None |
Returns
Type | Description |
---|---|
list[str] | The catalog names that exist in the current connection, that match the like pattern if provided. |
list_databases
list_databases(self, like=None, catalog=None)
List existing databases in the current connection.
Ibis does not use the word schema to refer to database hierarchy. A collection of tables is referred to as a database. A collection of databases is referred to as a catalog.

These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | A pattern in Python’s regex format to filter returned database names. | None |
catalog | str | None | The catalog to list databases from. If None, the current catalog is searched. | None |
Returns
Type | Description |
---|---|
list[str] | The database names that exist in the current connection, that match the like pattern if provided. |
list_schemas
list_schemas(self, like=None, database=None)
list_tables
list_tables(self, like=None, database=None, schema=None)
List the tables in the database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
like | str | None | A pattern to use for listing tables. | None |
database | tuple[str, str] | str | None | The database location to perform the list against. By default uses the current database (self.current_database) and catalog (self.current_catalog). To specify a table in a separate catalog, you can pass in the catalog and database as a string "catalog.database", or as a tuple of strings ("catalog", "database"). | None |
schema | str | None | [deprecated] The schema inside database to perform the list against. | None |
raw_sql
raw_sql(self, query)
Execute a raw SQL query.
read_csv
read_csv(self, path, table_name=None, **kwargs)
Register a CSV file as a table in the current backend.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | pathlib.Path | The data source. A string or Path to the CSV file. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | typing.Any | Additional keyword arguments passed to the backend loading function. | {} |
Returns
Type | Description |
---|---|
ibis.expr.types.Table | The just-registered table |
read_delta
read_delta(self, source, table_name=None, **kwargs)
Register a Delta Lake table in the current database.
Parameters
Name | Type | Description | Default |
---|---|---|---|
source | str | pathlib.Path | The data source. Must be a directory containing a Delta Lake table. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | typing.Any | Additional keyword arguments passed to the underlying backend or library. | {} |
Returns
Type | Description |
---|---|
ibis.expr.types.Table | The just-registered table. |
read_json
read_json(self, path, table_name=None, **kwargs)
Register a JSON file as a table in the current backend.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | pathlib.Path | The data source. A string or Path to the JSON file. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | typing.Any | Additional keyword arguments passed to the backend loading function. | {} |
Returns
Type | Description |
---|---|
ibis.expr.types.Table | The just-registered table |
read_parquet
read_parquet(self, path, table_name=None, **kwargs)
Register a parquet file as a table in the current backend.
Parameters
Name | Type | Description | Default |
---|---|---|---|
path | str | pathlib.Path | The data source. | required |
table_name | str | None | An optional name to use for the created table. This defaults to a sequentially generated name. | None |
**kwargs | typing.Any | Additional keyword arguments passed to the backend loading function. | {} |
Returns
Type | Description |
---|---|
ibis.expr.types.Table | The just-registered table |
reconnect
reconnect(self)
Reconnect to the database already configured with connect.
register_options
register_options(cls)
Register custom backend options.
rename_table
rename_table(self, old_name, new_name)
Rename an existing table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
old_name | str | The old name of the table. | required |
new_name | str | The new name of the table. | required |
sql
sql(self, query, schema=None, dialect=None)
table
table(self, name, schema=None, database=None)
Construct a table expression.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Table name | required |
schema | str | None | [deprecated] Schema name | None |
database | tuple[str, str] | str | None | Database name | None |
Returns
Type | Description |
---|---|
Table | Table expression |
to_csv
to_csv(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a CSV file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ibis.expr.types.Table | The ibis expression to execute and persist to CSV. | required |
path | str | pathlib.Path | The data source. A string or Path to the CSV file. | required |
params | collections.abc.Mapping[ibis.expr.types.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs | typing.Any | Additional keyword arguments passed to pyarrow.csv.CSVWriter | {} |
to_delta
to_delta(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a Delta Lake table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ibis.expr.types.Table | The ibis expression to execute and persist to a Delta Lake table. | required |
path | str | pathlib.Path | The data source. A string or Path to the Delta Lake table. | required |
params | collections.abc.Mapping[ibis.expr.types.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
kwargs | typing.Any | Additional keyword arguments passed to the deltalake.writer.write_deltalake method | {} |
to_pandas
to_pandas(self, expr, *, params=None, limit=None, **kwargs)
Execute an Ibis expression and return a pandas DataFrame, Series, or scalar.

This method is a wrapper around execute.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ibis.expr.types.Expr | Ibis expression to execute. | required |
params | collections.abc.Mapping[ibis.expr.types.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. | None |
kwargs | typing.Any | Keyword arguments | {} |
to_pandas_batches
to_pandas_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **kwargs)
Execute an Ibis expression and return an iterator of pandas DataFrames.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ibis.expr.types.Expr | Ibis expression to execute. | required |
params | collections.abc.Mapping[ibis.expr.types.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. | None |
chunk_size | int | Maximum number of rows in each returned DataFrame batch. This may have no effect depending on the backend. | 1000000 |
kwargs | typing.Any | Keyword arguments | {} |
Returns
Type | Description |
---|---|
collections.abc.Iterator[pandas.DataFrame] | An iterator of pandas DataFrames. |
to_parquet
to_parquet(self, expr, path, *, params=None, **kwargs)
Write the results of executing the given expression to a parquet file.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ibis.expr.types.Table | The ibis expression to execute and persist to parquet. | required |
path | str | pathlib.Path | The data source. A string or Path to the parquet file. | required |
params | collections.abc.Mapping[ibis.expr.types.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
**kwargs | typing.Any | Additional keyword arguments passed to pyarrow.parquet.ParquetWriter | {} |
to_polars
to_polars(self, expr, *, params=None, limit=None, **kwargs)
Execute an expression and return the results as a polars DataFrame.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ibis.expr.types.Expr | Ibis expression to export to polars. | required |
params | collections.abc.Mapping[ibis.expr.types.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. | None |
kwargs | typing.Any | Keyword arguments | {} |
Returns
Type | Description |
---|---|
dataframe | A polars DataFrame holding the results of the executed expression. |
to_pyarrow
to_pyarrow(self, expr, *, params=None, limit=None, **kwargs)
Execute an expression and return the results as a pyarrow table.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ibis.expr.types.Expr | Ibis expression to export to pyarrow | required |
params | collections.abc.Mapping[ibis.expr.types.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. | None |
kwargs | typing.Any | Keyword arguments | {} |
Returns
Type | Description |
---|---|
Table | A pyarrow table holding the results of the executed expression. |
to_pyarrow_batches
to_pyarrow_batches(self, expr, *, params=None, limit=None, chunk_size=1000000, **_)
Execute expression and return an iterator of pyarrow record batches.
This method is eager and will execute the associated expression immediately.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ibis.expr.types.Expr | Ibis expression to export to pyarrow | required |
limit | int | str | None | An integer to effect a specific row limit. A value of None means “no limit”. The default is in ibis/config.py. | None |
params | collections.abc.Mapping[ibis.expr.types.Scalar, typing.Any] | None | Mapping of scalar parameter expressions to value. | None |
chunk_size | int | Maximum number of rows in each returned record batch. | 1000000 |
Returns
Type | Description |
---|---|
RecordBatchReader | Collection of pyarrow RecordBatches. |
to_torch
to_torch(self, expr, *, params=None, limit=None, **kwargs)
Execute an expression and return results as a dictionary of torch tensors.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expr | ibis.expr.types.Expr | Ibis expression to execute. | required |
params | collections.abc.Mapping[ibis.expr.types.Scalar, typing.Any] | None | Parameters to substitute into the expression. | None |
limit | int | str | None | An integer to effect a specific row limit. A value of None means no limit. | None |
kwargs | typing.Any | Keyword arguments passed into the backend’s to_torch implementation. | {} |
Returns
Type | Description |
---|---|
dict[str, torch.Tensor] | A dictionary of torch tensors, keyed by column name. |
truncate_table
truncate_table(self, name, database=None, schema=None)
Delete all rows from a table.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | Table name | required |
database | str | None | Name of the attached database that the table is located in. For backends that support multi-level table hierarchies, you can pass in a dotted string path like "catalog.database" or a tuple of strings like ("catalog", "database"). Note that Ibis does not use the word schema to refer to database hierarchy: a collection of tables is referred to as a database, and a collection of databases is referred to as a catalog. These terms are mapped onto the corresponding features in each backend (where available), regardless of whether the backend itself uses the same terminology. | None |
schema | str | None | [deprecated] Schema name | None |