Ibis Project Blog

Python productivity framework for the Apache Hadoop ecosystem. Development updates, use cases, and internals.

Ibis 0.7: Kudu-Impala integration, SQL compiler improvements

Ibis 0.7.0 has been released! The biggest new feature in this release is Impala-Kudu integration. The timing is good, because Kudu's Python client officially went beta in the recent Kudu 0.7.0 release.

In addition to many bug fixes, Ibis 0.7.0 includes a much smarter SQL compiler that handles more complex pandas-like expressions. For example, consider the following operation:

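import ibis

# Unbound table expression named 'tbl' with two columns
table = ibis.table([('flag', 'string'),
                    ('value', 'double')],
                   'tbl')

# Two filtered views of the same table
flagged = table[table.flag == '1']
unflagged = table[table.flag == '0']

fv = flagged.value
uv = unflagged.value

# Difference of two scalar aggregates, each computed over a different filter
expr = (fv.mean() / fv.sum()) - (uv.mean() / uv.sum())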

In Ibis 0.7.0, this expression is now compiled to the correct SQL, with each filtered aggregate computed in its own subquery and the two single-row results combined by a cross join:

SELECT t0.`tmp` - t1.`tmp` AS `tmp`
FROM (
  SELECT avg(`value`) / sum(`value`) AS `tmp`
  FROM tbl
  WHERE `flag` = '1'
) t0
  CROSS JOIN (
    SELECT avg(`value`) / sum(`value`) AS `tmp`
    FROM tbl
    WHERE `flag` = '0'
  ) t1
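To see that a query of this shape computes what the expression describes, here is a minimal sketch that runs it against an in-memory SQLite database using Python's standard `sqlite3` module. This is an illustration only, not how Ibis executes queries: the sample rows are made up, SQLite stands in for Impala, and the backtick quoting is dropped for portability.

```python
import sqlite3

# Hypothetical sample rows for illustration; not taken from any real dataset.
rows = [("1", 2.0), ("1", 4.0), ("0", 1.0), ("0", 3.0), ("0", 5.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (flag TEXT, value REAL)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# Same structure as the generated SQL: two single-row subqueries, cross joined.
query = """
SELECT t0.tmp - t1.tmp AS tmp
FROM (
  SELECT avg(value) / sum(value) AS tmp
  FROM tbl
  WHERE flag = '1'
) t0
  CROSS JOIN (
    SELECT avg(value) / sum(value) AS tmp
    FROM tbl
    WHERE flag = '0'
  ) t1
"""
(sql_result,) = conn.execute(query).fetchone()


def mean_over_sum(values):
    """Compute mean(values) / sum(values) in plain Python."""
    return (sum(values) / len(values)) / sum(values)


# Reference computation mirroring the Ibis expression step by step.
flagged = [v for f, v in rows if f == "1"]
unflagged = [v for f, v in rows if f == "0"]
expected = mean_over_sum(flagged) - mean_over_sum(unflagged)

assert abs(sql_result - expected) < 1e-12
```

Because each subquery returns exactly one row, the cross join also yields exactly one row, so the subtraction in the outer `SELECT` is well defined.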

Thanks to all who contributed patches:

$ git log v0.6.0..v0.7.0 --pretty=format:%aN | sort | uniq -c | sort -rn
    21 Wes McKinney
     1 Uri Laserson
     1 Kristopher Overholt