Ibis Project Blog

Python productivity framework for the Apache Hadoop ecosystem. Development updates, use cases, and internals.

Leveraging SQL window functions in Ibis

Window (also known as analytic) functions are a valuable technique in analytic SQL, but unfortunately SQL programmers generally regard them as an advanced skill. Conceptually, they are relatively simple, and indeed many everyday pandas and R operations can be expressed in SQL through their use. Mechanically, they can be difficult to use, largely because of the SQL syntax.
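As a rough illustration of the point about everyday pandas operations, here is a minimal sketch of subtracting each group's mean from its values using a window function. It deliberately uses Python's built-in sqlite3 module as a stand-in engine rather than Ibis or Impala, and the table name `t` and the columns `grp` and `value` are hypothetical. It assumes the bundled SQLite is version 3.25 or later, which is when SQLite gained window functions.

```python
import sqlite3

# Hypothetical toy data: (group key, value). Names are illustrative only.
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp TEXT, value REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# AVG(...) OVER (PARTITION BY grp) computes the mean within each group
# and attaches it to every row, so no separate GROUP BY and join are needed.
query = """
SELECT grp,
       value,
       value - AVG(value) OVER (PARTITION BY grp) AS demeaned
FROM t
ORDER BY grp, value
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

In pandas, the same operation would typically be written as `df.value - df.groupby('grp').value.transform('mean')`. The concept is the same in both; only the SQL spelling is more involved.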

Ibis has had comprehensive support for window functions from 0.3 onward, and I invested quite a bit of effort in designing an API that makes them much simpler to use. I also made sure that you don't have to be a SQL expert to use them.

Ibis Design: Modeling high level analytics tasks

Beyond scalability and high performance on large data sets with Python, Ibis is focused on simplifying analytics tasks for end users. By designing a rich pandas-like domain specific language (DSL) embedded in Python code, we can hide away the complexities normally associated with expressing analytical concepts in SQL or some other tool. This post gives some specific examples of such tasks and shows how we're handling them in Ibis.

Using other compute engines with Ibis

Several people have asked me about using Ibis with execution engines other than Impala. The purpose of this post is to explain how one can make Ibis work with other systems and what that might mean for users.

Hello World

Welcome to the Ibis project blog. As the project develops we'll post here periodically with:

  • Applications and use cases
  • Development updates and release notes
  • Details on project internals and design
  • Updates from Impala development and the broader community
  • Thought pieces on the Big Data and Python ecosystems

I'm looking forward to the journey; it's going to be really exciting.