Archive for the ‘c0re’ Category

Reliable Software Design

Thursday, September 18th, 2008

Avoiding Race conditions

A typical situation in ERP systems is, that you have to use SQL tables as an “interface” between applications.
Let’s say you want to submit an order. You write an “order header” containing the customer number etc and orderlines how much to ship of what.

order = {'customer_no': 12345,
         'delivery_date': '2001-01-01',
         'orderlines': [{'item_no': '10105', 'quantity': 80},
                        {'item_no': '14600', 'quantity': 12}]
        }

In SQL you use two tables and add some primary keys:

order_head:

id   | customer_no | delivery_date | processed
-----+-------------+---------------------------
5123 | 12345       | 2001-01-01    | False

orderline:

orderid | item_no | quantity
--------+---------+----------
5123    | 10105   | 80
5123    | 14600   | 12

Since oldschool databases usually don’t have auto incrementing keys, you usually have to write something like this to insert an order:

next_id = query('SELECT MAX(id) FROM order_head')
query('INSERT INTO order_head (id, customer_no, delivery_date, processed)
       VALUES(%d, '12345', '2001-01-01', False)" % next_id)
query('INSERT INTO orderline (orderid | item_no | quantity)
       VALUES(%d, '10105', 80)" % next_id)
query('INSERT INTO orderline (orderid | item_no | quantity)
       VALUES(%d, '14600', 12)" % next_id)

This has some issues: what if some programm starts reading after you have written order_head and the first orderline, but not the second one? The ustomer would not get the goods from the second orderline. Transaction in modern SQL databases can solve this, but if you don’t have transactions you can solfe this problem in otehr ways. Let’s assume the code in the ERP for processing this interface looks like this:

for order in query('SELECT * FROM order_head'):
    for orderlines in query("SELECT * FROM orderline
                             WHERE orderid='%d'" % order.id):
        # do something with order and orderline
    query("UPDATE orderline SET processed=True
           WHERE orderid='%d'" % order.id)

This means that orderlines are never read unless there is also an header with the respective ID. So if we write the orderlines first we are save from them beeing read if tey are not ready yet. So by reordering our code we can make it already much more robust:

next_id = query('SELECT MAX(id) FROM order_head')
query('INSERT INTO orderline (orderid | item_no | quantity)
       VALUES(%d, '10105', 80)" % next_id)
query('INSERT INTO orderline (orderid | item_no | quantity)
       VALUES(%d, '14600', 12)" % next_id)
query('INSERT INTO order_head (id, customer_no, delivery_date, processed)
       VALUES(%d, '12345', '2001-01-01', False)" % next_id)

This already avoids the race condition “reading while writing”. But what if two processes try to write orders at once? Then next_id = query('SELECT MAX(id) FROM order_head') would result in both processes getting the same next_id and this mixing up two orders into a single one.

The easiest solution is to ensure that only one process can ever write orders. This avoids lot’s of trouble. But it is suprisingly hard to enforce that. So by having code which reduces the risk of dual write race conditions we can make our program much more robust.

One approach would be to use SQL functionality like this to insert the head?

query('INSERT INTO order_head (id, customer_no, delivery_date, processed)
       VALUES(SELECT MAX(id) FROM order_head, '12345', '2001-01-01', False)")

This would ensure that no id is used twice but how would we get the ID we generated for our order? SELECT MAX(id) FROM order_head would not help since some other program might have inserted another record in the meantime.

We are here in an situation where we can’t make the code reliable without resorting to some tricks. In this case the processed field is our solution: Teh application reading the database ignores all records where processed == True. So as long as we set processed to True we can use the other fields for whatever we like. In this case we use the date field to temporary store an unique id identifying the record.

So let’s generate a nice 10 character random value:

token = hex((int(time.time() * 10000)
            ^ (os.getpid() << 16)
            ^ thread.get_ident() << 8)
           % 0xFFFFFFFFFF).rstrip('L')[2:]

If this random value is guaranteed to be unique it’s easy to write a robust insertation code.

def insert_order():
    token = hex((int(time.time() * 10000)
        ^ (os.getpid() << 16)
        ^ thread.get_ident() < 1:
    next_id = query('SELECT MAX(id) FROM order_head')
    query('INSERT INTO order_head (id, customer_no, delivery_date, processed)
       VALUES(%d, '12345', %r, False)" % (next_id, token))
    rowcount = query('SELECT COUNT(*) FROM order_head WHERE id = %d' % next_id)
    if rowcount > 1:
       # the race condition has hit - remove our entry and retry
        query('DELETE FROM order_head WHERE delivery_date = %r' % token)
        time.sleep(random.randint()/100.0)
        insert_order()
    else:
        query('INSERT INTO orderline (orderid | item_no | quantity)
               VALUES(%d, '10105', 80)" % next_id)
        query('INSERT INTO orderline (orderid | item_no | quantity)
               VALUES(%d, '14600', 12)" % next_id)
        # fix up the header
        query('UPDATE order_head SET delivery_date='2001-01-01',
               processed=False WHERE delivery_date=%r' % token)

While this code is much more complex than the original, it can guarantee that no race conditions occur as long as token is unique.

Gude Expert Power Control NET

Thursday, September 11th, 2008

The Gude Expert Power Control NET switch is an ethernet controllable power socket. It an switch a single socket. It sells for about 125 Euro.

It can be controlled by HTTP although it uses GET requests wich breaks idempotence. Still: it works:

curl http://admin:admin@192.168.0.225/SWITCH.CGI?s1=0 > /dev/null

One nice thing about the Switch is the fact that it supports DHCP.

ePowerSwitch-4

Thursday, September 11th, 2008

The ePowerSwitch-4 is a ethernet controllable power socket by Leuning GmbH. It is designed to be rack-mountable and comes in a solid PVC housing.

The Software Stack running on the Switch seems to be based on QNX and acts quite reasonable. (See the manual for an overview). E.g. you can use curl to switch ports on and off. E.g.

curl --data 'P4=0' -i http://admin:admin@192.168.100.100/config/home_f.html

to switch Port 4 off.

One thing I don’t like about the device is the fact that it doesn’t support DHCP but needs a fixed IP. On the Pro side it comes with an additional RS-232 port for configuration and operation.

The Switch is on sale for about 400 Euro which results in a cost per port of about 100 Euro.

Hooray!

Friday, August 22nd, 2008

Authenticate Apache 2.2 against Zimbra LDAP

Thursday, August 14th, 2008

Add something like this to httpd.conf.

LoadModule authz_host_module libexec/apache22/mod_authz_host.so
LoadModule authz_user_module libexec/apache22/mod_authz_user.so
LoadModule authnz_ldap_module libexec/apache22/mod_authnz_ldap.so
LoadModule authz_default_module libexec/apache22/mod_authz_default.so
LoadModule auth_basic_module libexec/apache22/mod_auth_basic.so
LoadModule auth_digest_module libexec/apache22/mod_auth_digest.so
LoadModule ldap_module libexec/apache22/mod_ldap.so

  AuthName "HUDORA Internal"
  AuthType Basic
  AuthBasicProvider ldap
  AuthLDAPURL ldap://zimbra.example.com/?mail
  Require valid-user

This should alow you to login as yourmail@zimbra.example.com with your zimbra password.

Enjoy!

How to monitor Disk temperature with 3ware controllers, FreeBSD 7 and munin

Friday, August 8th, 2008

1. The munin-node code in FreeBSD seems not to be the most recent version. I downloaded this, replaced the first line with #!/usr/bin/perl and moved the file to /usr/local/etc/munin/plugins/hddtemp_smartctl. I also had to remove the lime below “# Avoid spinning up sleeping disks” in hddtemp_smartctl to make it work.

For monitoring 4 disks connected to a 3ware 9500S-12 i changed /usr/local/etc/munin/plugin-conf.d/plugins.conf to read:

[hddtemp_smartctl]
user root
env.smartctl /usr/local/sbin/smartctl
env.drives ad0 da0 da1 da2 da3
env.type_da0 3ware,1
env.type_da1 3ware,1
env.type_da2 3ware,2
env.type_da3 3ware,3
env.dev_da0 twa0
env.dev_da1 twa0
env.dev_da2 twa0
env.dev_da3 twa0

Finally you have to install /usr/ports/sysutils/smartmontools.

That’s it

Packages I needed to get http://laconi.ca/ running on FreeBSD 7:

Sunday, August 3rd, 2008

apache-2.2.9 mhash-0.9.9 mysql-client-5.0.51a pear-1.7.2 pecl-fileinfo-1.0.4 pecl-filter-0.11.0 pecl-hash-1.5 pecl-json-1.2.1 php5-5.2.6 php5-calendar-5.2.6 php5-ctype-5.2.6 php5-curl-5.2.6 php5-dba-5.2.6 php5-dom-5.2.6 php5-extensions-1.1 php5-gd-5.2.6 php5-gettext-5.2.6_1 php5-gmp-5.2.6 php5-iconv-5.2.6 php5-ldap-5.2.6 php5-mbstring-5.2.6 php5-mcrypt-5.2.6 php5-mhash-5.2.6 php5-mysql-5.2.6 php5-openssl-5.2.6 php5-pcre-5.2.6 php5-pdo-5.2.6 php5-pdo_sqlite-5.2.6 php5-posix-5.2.6 php5-readline-5.2.6 php5-recode-5.2.6_1 php5-session-5.2.6 php5-simplexml-5.2.6 php5-spl-5.2.6 php5-sqlite-5.2.6 php5-tokenizer-5.2.6 php5-xml-5.2.6 php5-xmlreader-5.2.6 php5-xmlwriter-5.2.6 php5-zlib-5.2.6 recode-3.6_7.

using JasperReports & pyJasper

Monday, July 7th, 2008

Two years ago or so we where looking into ways to generate various PDF documents from python. I had used various ways of generating PDFs before like directly outputting Postscript (code generation!) and converting to PDF, FPDF, ReportLab and HTML2PDF. I was looking fore something more “What you see is what you get” like. And i wanted integrated PDF generation.

It turned out JasperReports had all I needed. While Jasper is promoted as reporting solution it can be used as a simple PDF generator. Together with the iReport visual report designer we had the complete Toolchain. USers could use iReport to “paint” the desired PDF output using placeholders for all the variable data to be filled in. iReport would save this design in a .jrxml file. The jasper library would then compile that file to a .jasper file for performance reasons.

For generation the actual report you would feed that .jasper and a “datasource” to the JasperReports library.. A datasource could be a SQL/JDBC query or an xml-file. This datasource would than be used to fill in the placeholders you left in the .jrxml and generate a PDF.

All nice and smooth. The only problem is: Jasper Reports is written in Java and our application stack is written mostly in Python. I also avoid programming Java whenever possible. I feel incredibly clumsy when having to code in Java.

Jython to the rescue! Jython allows you to access Java libraries and still write Python code. this is done by having Python running inside a Java VM. Wit some help we where able to generate PDFs from jython. But the whole beast was still running inside the java VM.

So to call it from our regular Python code running inside mod_python on the Apache web server we had to write the XML to disk, fork a process, fire up the Java VM with Jython and the call the Jasper library to generate the PDF from the XML and the jrxml report source code.

Worked but took about 7 seconds per document. This was OK for low volume production but for some application we needed much shorter turn-arround times. It turned out, starting up the java VM was the biggest time consumer. So we created a long running Jython process which received PDF rendering jobs via a home grown UDP Protocol. It was messi. Sending absolute filesystem paths via UDP was probably not a good idea but it worked. PDF Generation times where down to 1.5 seconds or so. Using ths structure we have been running 21 months or so and generated about 500.000 PDFs.

But it was still a mess. Especially it meant that every server which had to generate PDFs also needed Java installed. And Java on FreeBSD Servers is very limited fun. So we came up with a servlet based approach: There would be a webserver; upon POSTing a XML datasource and a JRXML file to a specific URL you would get back a rendered PDF. As a Java based Webserver we use Jetty which is driving a Jython servlet calling the JasperReports library. Works like a charm.

After using it for 9 Months or so Im just in the process of publishing this Jetty based version real soon now(TM). It will be available at http://cybernetics.hudora.biz/projects/wiki/pyJasper

German zip codes

Monday, June 30th, 2008

<!–


This browser does not have a Java Plug-in.


Get the latest Java Plug-in here.

<!–

This is based on Ben Fry’s zipdecode and PLZ.tab from OpenGeoDB. I used a simple script to preprocess the data.

Built with Processing.

If the Java Applet doesn’t work, try this page. Klick into the applet and tipe a few numbers, e.g. “42″

FreeBSD Jails and Utility computing

Monday, June 23rd, 2008

Using the FreeBSD Jail mechanism, I’m slowly getting to a minimal Admin intervention Server setup:

host# ezjail-admin create -f djangoserver extranet a.b.c.6
host# /usr/local/etc/rc.d/ezjail.sh start extranet
workstation# ssh a.b.c.6
Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
	The Regents of the University of California.  All rights reserved.

FreeBSD 7.0-RELEASE (GENERIC) #0: Sun Feb 24 10:35:36 UTC 2008

This is a preconfigured virtual machine (jail) for running Django applications.
Avoid manual configuration!
[md@extranet ~]$ su
extranet# cd /usr/local/www
extranet# svn co https://example.com/.../projects/extranet/trunk extranet
extranet# ln -s /usr/local/www/extranet /usr/local/www/djangoprojectdir
extranet# sh /usr/local/etc/rc.d/apache22 start