Often, I have to pass a large list of potential value matches to a PostgreSQL query through SQLAlchemy . I believe that SQLAlchemy limits the number of allowed arguments to somewhere around 1000, so to get around that, I'll pass a comma separated list of values and then split the values in my query. For instance, say I am looking for values with ids matching 2, 4, 6, 8, and 10. The query would work as follows:
SELECT * from my_table where id = any(string_to_array( '2,4,6,8,10' ));
--Yes, you have to worry about casting here with numeric ids...
And you could automate this behavior with a Python function like so:
def csv_string_selectable(element_list):
from sqlalchemy import select, func
return select([func.unnest( func.string_to_array( ','.join(element_list), ',' ) )])
def sample_usage():
matching_ids = [ "2", "4", "6", "8", "10" ]
query = session.query(MyTable)
query = query.filter(MyTable.id.in_(csv_string_selectable(matching_ids)))
Unfortunately, Postgres as of 8.4 doesn't always process these sorts of queries as optimally as it could. That is, if I rewrite the query to use a temporary table, I see a performance improvement. If multiple of these csv lists are used against different tables in the same query, using a temporary table provides a huge performance improvement.
The incantation for getting the temporary table up and into the correct session was difficult to derive:
#You also have to have your sqlalchemy
# metadata variable defined and available
from sqlalchemy.sql.expression import Executable, ClauseElement
from sqlalchemy.ext.compiler import compiles
class InsertFromSelect(Executable, ClauseElement):
def __init__(self, table, select):
self.table = table
self.select = select
@compiles(InsertFromSelect)
def visit_insert_from_select(element, compiler, **kw):
return "INSERT INTO %s (%s)" % (
compiler.process(element.table, asfrom=True),
compiler.process(element.select)
)
def csv_string_selectable_tt(session, element_list):
#Generate a unique temporary table name
import uuid
table_name = "temp_" + str(uuid.uuid1()).replace("-", '')
#Create the temporary table and commit it to existence
temp_table = Table(
table_name, metadata,
Column('id', VARCHAR(length=255, convert_unicode=False, assert_unicode=None), primary_key=True),
prefixes=['TEMPORARY']
)
temp_table.create(bind=session.bind)
#This commit is necessary or else the session will be rolled back before the next command
session.commit()
from sqlalchemy import select, func, insert
source_select = select([func.unnest( func.string_to_array( ','.join(element_list), ',' ) ).label('id')])
#This custom generator is necessary to select into a table from a selectable
insert_statement = InsertFromSelect(temp_table, source_select)
session.execute(insert_statement)
return temp_table
#Usage:
def sample_usage():
allowed_ids = [ "2", "4", "6", "8", "10" ]
allowed_ids_table = csv_string_selectable_tt(session, allowed_ids)
query = session.query(MyTable)
query = query.filter(MyTable.description == My2ndTable.description )
query = query.filter(MyTable.id == allowed_ids_table.alias().c.id )
query = query.filter(My2ndTable.id == allowed_ids_table.alias().c.id )