Tuesday, December 11, 2012

Postgres temporary table definition and loading in SQLAlchemy, as well as handling large lists of values to be checked against as query parameters

Often, I have to pass a large list of potential value matches to a PostgreSQL query through SQLAlchemy . I believe that SQLAlchemy limits the number of allowed arguments to somewhere around 1000, so to get around that, I'll pass a comma separated list of values and then split the values in my query. For instance, say I am looking for values with ids matching 2, 4, 6, 8, and 10. The query would work as follows:

SELECT * from my_table where id = any(string_to_array( '2,4,6,8,10' )); --Yes, you have to worry about casting here with numeric ids...
And you could automate this behavior with a Python function like so:

def csv_string_selectable(element_list): from sqlalchemy import select, func return select([func.unnest( func.string_to_array( ','.join(element_list), ',' ) )]) def sample_usage(): matching_ids = [ "2", "4", "6", "8", "10" ] query = session.query(MyTable) query = query.filter(MyTable.id.in_(csv_string_selectable(matching_ids)))
Unfortunately, Postgres as of 8.4 doesn't always process these sorts of queries as optimally as it could. That is, if I rewrite the query to use a temporary table, I see a performance improvement. If multiple of these csv lists are used against different tables in the same query, using a temporary table provides a huge performance improvement. The incantation for getting the temporary table up and into the correct session was difficult to derive:

#You also have to have your sqlalchemy # metadata variable defined and available from sqlalchemy.sql.expression import Executable, ClauseElement from sqlalchemy.ext.compiler import compiles class InsertFromSelect(Executable, ClauseElement): def __init__(self, table, select): self.table = table self.select = select @compiles(InsertFromSelect) def visit_insert_from_select(element, compiler, **kw): return "INSERT INTO %s (%s)" % ( compiler.process(element.table, asfrom=True), compiler.process(element.select) ) def csv_string_selectable_tt(session, element_list): #Generate a unique temporary table name import uuid table_name = "temp_" + str(uuid.uuid1()).replace("-", '') #Create the temporary table and commit it to existence temp_table = Table( table_name, metadata, Column('id', VARCHAR(length=255, convert_unicode=False, assert_unicode=None), primary_key=True), prefixes=['TEMPORARY'] ) temp_table.create(bind=session.bind) #This commit is necessary or else the session will be rolled back before the next command session.commit() from sqlalchemy import select, func, insert source_select = select([func.unnest( func.string_to_array( ','.join(element_list), ',' ) ).label('id')]) #This custom generator is necessary to select into a table from a selectable insert_statement = InsertFromSelect(temp_table, source_select) session.execute(insert_statement) return temp_table #Usage: def sample_usage(): allowed_ids = [ "2", "4", "6", "8", "10" ] allowed_ids_table = csv_string_selectable_tt(session, allowed_ids) query = session.query(MyTable) query = query.filter(MyTable.description == My2ndTable.description ) query = query.filter(MyTable.id == allowed_ids_table.alias().c.id ) query = query.filter(My2ndTable.id == allowed_ids_table.alias().c.id )