Tuesday, December 11, 2012

Postgres temporary table definition and loading in SQLAlchemy, as well as handling large lists of values to be checked against as query parameters

Often, I have to pass a large list of potential value matches to a PostgreSQL query through SQLAlchemy . I believe that SQLAlchemy limits the number of allowed arguments to somewhere around 1000, so to get around that, I'll pass a comma separated list of values and then split the values in my query. For instance, say I am looking for values with ids matching 2, 4, 6, 8, and 10. The query would work as follows:

SELECT * from my_table where id = any(string_to_array( '2,4,6,8,10' )); --Yes, you have to worry about casting here with numeric ids...
And you could automate this behavior with a Python function like so:

def csv_string_selectable(element_list): from sqlalchemy import select, func return select([func.unnest( func.string_to_array( ','.join(element_list), ',' ) )]) def sample_usage(): matching_ids = [ "2", "4", "6", "8", "10" ] query = session.query(MyTable) query = query.filter(MyTable.id.in_(csv_string_selectable(matching_ids)))
Unfortunately, Postgres as of 8.4 doesn't always process these sorts of queries as optimally as it could. That is, if I rewrite the query to use a temporary table, I see a performance improvement. If multiple of these csv lists are used against different tables in the same query, using a temporary table provides a huge performance improvement. The incantation for getting the temporary table up and into the correct session was difficult to derive:

#You also have to have your sqlalchemy # metadata variable defined and available from sqlalchemy.sql.expression import Executable, ClauseElement from sqlalchemy.ext.compiler import compiles class InsertFromSelect(Executable, ClauseElement): def __init__(self, table, select): self.table = table self.select = select @compiles(InsertFromSelect) def visit_insert_from_select(element, compiler, **kw): return "INSERT INTO %s (%s)" % ( compiler.process(element.table, asfrom=True), compiler.process(element.select) ) def csv_string_selectable_tt(session, element_list): #Generate a unique temporary table name import uuid table_name = "temp_" + str(uuid.uuid1()).replace("-", '') #Create the temporary table and commit it to existence temp_table = Table( table_name, metadata, Column('id', VARCHAR(length=255, convert_unicode=False, assert_unicode=None), primary_key=True), prefixes=['TEMPORARY'] ) temp_table.create(bind=session.bind) #This commit is necessary or else the session will be rolled back before the next command session.commit() from sqlalchemy import select, func, insert source_select = select([func.unnest( func.string_to_array( ','.join(element_list), ',' ) ).label('id')]) #This custom generator is necessary to select into a table from a selectable insert_statement = InsertFromSelect(temp_table, source_select) session.execute(insert_statement) return temp_table #Usage: def sample_usage(): allowed_ids = [ "2", "4", "6", "8", "10" ] allowed_ids_table = csv_string_selectable_tt(session, allowed_ids) query = session.query(MyTable) query = query.filter(MyTable.description == My2ndTable.description ) query = query.filter(MyTable.id == allowed_ids_table.alias().c.id ) query = query.filter(My2ndTable.id == allowed_ids_table.alias().c.id )

Thursday, March 8, 2012

Compare the simlarity of two strings in Python

Again, I have adapted this code from some C code I found on the Internet years ago.

Side note: Did you know that toward and towards are completely interchangeable? Toward tends to get used in American English while towards is used more heavily in British English.



def wordcompare (s1, s2, weight_toward_front = False):
"""Compares two strings and returns a value between 0 and 100 for similarness"""

if not s1 or not s2:
return 0

if s1 == s2:
return 100

s1_len = len(s1)
s2_len = len(s2)

n = i = j = k = x = y = 0

while i < s1_len:
x = i
j = 0
while j < s2_len:
y = j
k = 0
while x < s1_len and y < s2_len and s1[x] == s2[y]:
eb = 1
if weight_toward_front:
if (x < 1 and y < 1):
eb += 3
if (x < 2 and y < 2):
eb += 2
if (x < 3 and y < 3):
eb += 1

k += 1
n += (k*k*eb)
x += 1
y += 1

j += 1
i += 1

n = int((n * 20) / (s1_len * s2_len))

if n > 100:
n = 100

return n

Wednesday, March 7, 2012

Compare the simlarity of two strings in Visual Basic

I have adapted this code from some C code I found on the Internet years ago.


Function WordCompare(word1 As String, word2 As String, Optional weight_towards_front As Boolean = False) As Integer
'Compares two strings and returns a value between 0 and 100 for similarness
If word1 = "" And word2 = "" Then
WordCompare = 0
Exit Function
End If

If word1 = word2 Then
WordCompare = 100
Exit Function
End If

Dim word1_len, word2_len As Integer

word1_len = Len(word1)
word2_len = Len(word2)

Dim n, i, j, k, x, y, eb As Integer
n = 0
i = 0
j = 0
k = 0
x = 0
y = 0

For i = 1 To word1_len
x = i

For j = 1 To word2_len
y = j
k = 0

Do While x <= word1_len And y <= word2_len And Mid(word1, x, 1) = Mid(word2, y, 1)
eb = 1
If weight_towards_front Then
If x < 2 And y < 2 Then
eb = eb + 3
End If
If x < 3 And y < 3 Then
eb = eb + 2
End If
If x < 4 And y < 4 Then
eb = eb + 1
End If
End If

k = k + 1
n = n + (k * k * eb)
x = x + 1
y = y + 1
Loop

Next j
Next i

n = (n * 20) / (word1_len * word2_len)

If n > 100 Then
n = 100
End If

WordCompare = n

End Function