Wednesday, February 08, 2006


Categories: , , ,

One thing I think would be useful is the ability to create a SQL database "on top" of another SQL database. This would allow you to write an application where the user creates their own DB schema, somewhat like MS Access, but the underlying database schema doesn't change. I'm exploring this a bit right now for my next project.

The basic idea is to create four tables: vtable, vcol, vrow, and vcell. vtable is a "virtual table", vcol is a "virtual column," and so on. I'm currently using SQLObject for the DB layer, so all the examples will use SQLObject, even though I'm not using it to the extent it could be used.

The setup code for a table is below:

class VTable(SQLObject):

def create(klass, name, **columns):
for colname, coltype in columns.items():
t.addColumn(colname, coltype)
return t

def addColumn(self, colname, coltype):
VCol(colname=colname, coltype=coltype, vtable=self)

def getColumn(self, colname):
for c in self.vcols:
if c.colname == colname: return c
return None

def insert(self, **values):
row = VRow(vtable=self)
for colname, value in values.items():
cell=VCell(vcol=col, vrow=row, stringValue=value)
return row

Fairly straightforward: a vtable is a collection of vcols & vrows. I also include helper functions for performing table creation and inserts. Next is the column definition:

class VCol(SQLObject):

Even better: a column is just a "type" (currently ignored), a name, and a collection of vcells. The Row object is also simple:

class VRow(SQLObject):

def __getitem__(self, colname):
for cell in vcells:
if cell.vcol.colname == colname: return cell.stringValue
return None

def __setitem__(self, colname, value):
cell = self[colname]
if cell is None:
cell=VCell(vcol=vcol, vrow=self, stringValue=value)
cell.stringValue = value

A row's main purpose is to have an id value which ties all the cells of a row together. Without further ado, here is the vcell class:
class VCell(SQLObject):

A vcell is simply a vrow & vcol reference, and an attached data value. Now, what can we do with this? Well, we can create vtables and insert values into them fairly easily. Suppose we have created and populated a table as follows:

t=VTable.create('mytable', col1='str', col2='str', col3='date')
for i in range(3):
t.insert(col1='foo%d' % i, col2='bar%d' % i, col3='baz')

This is equivalent to

CREATE TABLE mytable (col1 TEXT, col2 TEXT, col3 TEXT);
INSERT INTO mytable(col1,col2,col3) VALUES ('foo0', 'bar0', 'baz');
INSERT INTO mytable(col1,col2,col3) VALUES ('foo1', 'bar1', 'baz');
INSERT INTO mytable(col1,col2,col3) VALUES ('foo2', 'bar2', 'baz');

So what if we want to do something like SELECT col1, col2 FROM mytable with the vtable? What does the SQL look like for that? Can you even do it in a single SQL statement? It turns out you can:

SELECT vcell0.string_value AS col1,vcell1.string_value AS col2
FROM vtable vt0, vcol vcol0,vcell vcell0, vcol vcol1,vcell vcell1,
WHERE vt0.tabname='mytable'
AND vcol0.vtable_id = AND vcol0.colname='col1' AND
AND vcol1.vtable_id = AND vcol1.colname='col2' AND
AND vcell0.vrow_id=vcell1.vrow_id;

If we run this, we get the expected results:

>>> conn.queryAll(sql)
[('foo0', 'bar0'), ('foo1', 'bar1'), ('foo2', 'bar2')]

But this is ugly and hard to write and read. What if there were a function to build a virtualized query automatically? Well, I've created such a function. Right now, it is very kludge-y and has limited functionality (it doesn't support a virtual "WHERE" clause yet, for example.) But it's a starting point. What I'd really like to have is a sql-ish interface that allows complete virtualization of table creation, query, update, etc., all without modifying the underlying database schema. If anyone is interested in possibly contributing to such a project, let me know in the comments below, and I'll set up a project on SourceForge.

Oh, and here's the "virtual query builder".

def vquery(columns, tables):
'''vquery(columns, tables) - create a (real) query from a virtual one

columns : list of (table,column) tuples
tables : list of table names
tabtrans = {} # tabtrans[tabname] = tab_label
coltrans = {} # coltrans[colname] = (col_label, cell_label)
num_tables = 0
num_columns = 0
sql='SELECT '
real_tables, real_columns, real_cells = [], [], []
for t in tables:
tab_label = 'vt%d' % num_tables
num_tables += 1
tabtrans[t] = tab_label
for t,c in columns:
col_label = 'vcol%d' % num_columns
cell_label = 'vcell%d' % num_columns
num_columns += 1
coltrans[c] = (tabtrans[t], col_label, cell_label)
selectclause.append('%s.string_value as %s' %
(cell_label, c))
# Build virtualized whereclause & tablist
tablist = []
for user_name, real_name in tabtrans.items():
tablist.append('vtable %s' % real_name)
whereclause.append("%s.tabname='%s'" % (real_name, user_name))
last_cell_name = None
for user_col_name, (real_table_name, real_col_name, real_cell_name) in coltrans.items():
tablist.append('vcol %s' % real_col_name)
tablist.append('vcell %s' % real_cell_name)
whereclause.append("%s.vtable_id =" %
(real_col_name, real_table_name))
whereclause.append("%s.colname='%s'" %
(real_col_name, user_col_name))
whereclause.append("" %
(real_cell_name, real_col_name))
if last_cell_name is not None:
whereclause.append("%s.vrow_id=%s.vrow_id" %
(real_cell_name, last_cell_name))
last_cell_name = real_cell_name
sql = 'SELECT %s FROM %s WHERE %s' % (','.join(selectclause),
' AND '.join(whereclause))
return sql

No comments:

Post a Comment