The next Ibis release is out, with some major new functionality:
- SQLite client and support for most SQLite built-in functions
- Python 3 compatibility (single codebase)
- SQLAlchemy-based expression translation toolchain to enable easier internal code reuse among SQL engines and pave the way for PostgreSQL, Redshift, Vertica, and other analytic SQL engine support in the near future
- Asynchronous query execution API (expr.execute(async=True)) for Impala, supporting query status and cancellation. This is very helpful in building multithreaded applications.
- Support for using Impala user-defined aggregate (UDA) functions
There's a lot more, of course. Check out the detailed release notes, and read on for more about the upcoming roadmap.
Install Ibis from PyPI with
pip install ibis-framework
Thanks to all who contributed patches:
$ git log v0.4.0..v0.5.0 --pretty=format:%aN | sort | uniq -c | sort -rn
    55 Wes McKinney
     9 Uri Laserson
     1 Kristopher Overholt
Big news: expanding SQL engine support
One of the major goals of Ibis is to enable analytics work to be migrated from SQL code to Python code. Since much of the data warehoused in analytic SQL systems (like Impala on HDFS or Redshift on AWS) isn't going anywhere soon, this requires building a feature-complete SQL translation toolchain. Ibis compiles Python to SQL behind the scenes and sends it to your data engine of choice.
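To make the "compile Python to SQL" idea concrete, here is a deliberately tiny sketch using only the standard library. It is not Ibis's API or its compiler: the `Column`, `Predicate`, and `Query` classes and the `compile_sum` method are hypothetical names invented for illustration. The point is only that Python operators can build up a description of a query, which is turned into SQL text and sent to the engine at the last moment.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A SQL boolean fragment plus the values bound to its placeholders."""
    sql: str
    params: tuple


@dataclass(frozen=True)
class Column:
    """Hypothetical column reference; comparing it records SQL instead of computing."""
    name: str

    def __gt__(self, value):
        return Predicate(f'"{self.name}" > ?', (value,))


@dataclass(frozen=True)
class Query:
    """Hypothetical immutable query description over a single table."""
    table: str
    predicates: tuple = ()

    def filter(self, predicate):
        # Return a new Query; nothing is executed yet.
        return Query(self.table, self.predicates + (predicate,))

    def compile_sum(self, column):
        # Turn the accumulated description into SQL text and bound parameters.
        sql = f'SELECT sum("{column}") FROM "{self.table}"'
        if self.predicates:
            sql += " WHERE " + " AND ".join(p.sql for p in self.predicates)
        params = tuple(v for p in self.predicates for v in p.params)
        return sql, params


# Build a small in-memory SQLite table to run the generated SQL against.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 25.0), ("east", 40.0)],
)

expr = Query("sales").filter(Column("amount") > 20)
sql, params = expr.compile_sum("amount")
total = con.execute(sql, params).fetchone()[0]
print(sql)
print(total)
```

A real toolchain has to handle joins, aggregations, subqueries, and typing, which is where most of the work lies; the sketch only shows the deferred-execution shape of the approach.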
We are taking SQL feature coverage very seriously. That means if you find a SELECT SQL query that cannot be expressed with Ibis, we will treat it as a
Between Ibis 0.4 and 0.5, I undertook significant refactoring to separate Impala-specific functionality from the more generic SQL compilation toolchain. As part of this, I added a SQLAlchemy compiler-translator that converts Ibis expressions into SQLAlchemy expressions. To see this through to completion, I built a SQLite Ibis client that takes advantage of this.
Supporting more SQL engines is a lot of work, because each system has its own set of built-in functions, and these have to be wrapped and connected to the SQL-independent Ibis expression DSL.
Having this flexible and reusable translation toolchain available also makes it easier to smooth over behavior differences and API inconsistencies between SQL engines.
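As a rough illustration of why per-engine wrapping is needed, the sketch below maps a few engine-independent operation names onto engine-specific SQL. The operation names, the `TEMPLATES` table, and the `translate` function are assumptions made up for this example and do not reflect how Ibis organizes its translators internally. The SQL spellings themselves are real: SQLite concatenates strings with `||` and gets the current time with `datetime('now')`, while Impala offers `concat()` and `now()`.

```python
import sqlite3

# Hypothetical registry: generic operation name -> SQL template per engine.
TEMPLATES = {
    "sqlite": {
        "string_length": "length({0})",
        "concat": "({0} || {1})",
        "current_timestamp": "datetime('now')",
    },
    "impala": {
        "string_length": "length({0})",
        "concat": "concat({0}, {1})",
        "current_timestamp": "now()",
    },
}


def translate(op, *args, dialect):
    """Render a generic operation as SQL text for the given dialect."""
    try:
        template = TEMPLATES[dialect][op]
    except KeyError:
        raise NotImplementedError(f"{op!r} is not wrapped for {dialect!r}") from None
    return template.format(*args)


# The same generic operation produces different SQL per engine.
sqlite_sql = translate("concat", "'a'", "'b'", dialect="sqlite")
impala_sql = translate("concat", "'a'", "'b'", dialect="impala")
print(sqlite_sql)
print(impala_sql)

# Only the SQLite rendering can be checked locally with the standard library.
con = sqlite3.connect(":memory:")
joined = con.execute(f"SELECT {sqlite_sql}").fetchone()[0]
print(joined)
```

Every built-in function an engine provides needs an entry of this kind, plus handling for cases where the semantics differ rather than just the spelling, which is why each new engine takes real effort.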
I would like to add more SQL engines; those designed for analytics (like Redshift, Vertica, and Presto) are likely to receive more attention in the short term. If you would like to get involved, please get in touch.
Upcoming Ibis roadmap
The project's focus areas in the coming months will be:
- Expanding SQL engine support (Redshift, Presto, Vertica, and Spark SQL are high priorities)
- Support for Impala complex (nested) types
- Tools for more complex ETL workflows on Impala
See the GitHub issue tracker for the granular feature roadmap.