Archive for October, 2003

Bookmarklet for c0re books

Friday, October 31st, 2003

The link below (“c0re books”) is a so-called bookmarklet: a tiny JavaScript application intended to run from your bookmarks bar. To install it, click on the link below, keep the mouse button pressed, and drag the link into your bookmarks bar.

>>> c0re books <<<

What’s that for?

If you click on “c0re books” in your bookmarks bar, the page you are currently visiting (meaning its URL, its title, and the current selection) is inspected for ISBNs, and you are redirected to the page for that ISBN on the tmnhc books aggregator. To try it, go to an Amazon book page and click on the bookmarklet. Or select an ISBN with the mouse (e.g. 0-13-937681-X) on another page and click on the bookmarklet.
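
The bookmarklet itself is JavaScript, but its lookup logic can be sketched in Python. This is a minimal sketch, not the bookmarklet's actual code: it searches the selection, the URL, and the title for ISBN-10 candidates, keeps the first one whose checksum is valid, and builds a redirect URL. The search order and the aggregator base URL (`http://books.example/isbn/`) are assumptions; the real tmnhc address is not given here.

```python
import re

# Hypothetical base URL: the post does not give the aggregator's address.
AGGREGATOR_URL = "http://books.example/isbn/"

# ISBN-10 candidate: nine digits, each optionally followed by a hyphen or
# space, then a final digit or X (e.g. 0-13-937681-X or 013937681X).
ISBN_RE = re.compile(r"\b(?:\d[- ]?){9}[\dXx]\b")


def is_valid_isbn10(isbn):
    """Check the ISBN-10 checksum: weights 10..1 must sum to a multiple of 11."""
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(isbn))
    return total % 11 == 0


def find_isbn(url, title, selection):
    """Return the first ISBN-10 with a valid checksum, or None.

    The selection is searched first, then the URL, then the title
    (this search order is an assumption).
    """
    for text in (selection, url, title):
        for match in ISBN_RE.finditer(text or ""):
            candidate = re.sub(r"[- ]", "", match.group()).upper()
            if is_valid_isbn10(candidate):
                return candidate
    return None


def redirect_target(url, title, selection):
    """Build the URL the bookmarklet would jump to, or None if no ISBN is found."""
    isbn = find_isbn(url, title, selection)
    return AGGREGATOR_URL + isbn if isbn else None
```

Checking the checksum filters out most random ten-digit numbers, such as session IDs that happen to appear in a URL.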

The bookmarklet was tested on Safari and Mozilla.

BSD Userland PPP crashing

Tuesday, October 28th, 2003

BSD’s userland PPP had run as expected for years. Now it has crashed three times in five days. Strange and sad.

Formatting ISBNs

Monday, October 27th, 2003

I found out there are strict rules for formatting ISBNs.

There are implementations in C, Emacs Lisp, and PHP, and a check-digit validator in Perl.

There are also articles about the structure and the country prefixes. Finally, there is the official handbook.

I have ported the implementation from bibclean to Python, made some Pythonic changes to speed it up, and added a checksum validation routine inspired by the Perl version.
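
The checksum part is simple enough to show on its own. Below is a minimal sketch of ISBN-10 check-digit computation and validation; it is not the code in isbn.py, and the hyphenation by group and publisher ranges that the bibclean rules cover is not reproduced. The ten digits are weighted 10 down to 1, and the weighted sum must be a multiple of 11, with X standing for 10 in the last position.

```python
def isbn10_check_digit(first_nine):
    """Compute the ISBN-10 check digit for a string of nine digits."""
    # Weights run from 10 down to 2 over the first nine digits.
    total = sum((10 - i) * int(d) for i, d in enumerate(first_nine))
    # The check digit makes the full weighted sum a multiple of 11.
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn):
    """Validate an ISBN-10, ignoring hyphens and spaces."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) != 10 or not digits[:9].isdigit():
        return False
    if not (digits[9].isdigit() or digits[9] == "X"):
        return False
    return isbn10_check_digit(digits[:9]) == digits[9]
```

For the example ISBN from the bookmarklet post, the weighted sum of the first nine digits of 0-13-937681-X is 199; 199 mod 11 is 1, so the check value is 10, written as X.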

Download isbn.py.

Archiving and preserving the Internet

Monday, October 27th, 2003

I have been trying to archive (mostly Internet) content for a long time now. My first attempts were mirroring what I considered important and burning it to CD. I started a project to scan the CCC’s paper-based archive, digitized hundreds of hours of Radio Intergalaktik (it seems the CCC has since deleted it), and experimented with keeping copies of the sites I surfed to.

I experimented with several archiving proxies like Gerald Oskoboiny’s system, Autojot, Archiver Proxy, Agent Frank and its predecessor.

It turned out that all proxies degraded my web experience.
So I turned to low-level networking and sniffed the HTTP traffic directly from the network interface by modifying several parts of dsniff.

Since I am also contemplating making the archive (semi-)public, it turned out to be a problem that password-protected pages were archived as well. I finally gave up the idea of archiving the data on the fly and decided to use a separate crawler for archiving. Just sniffing the requested URLs from the wire was much easier, but it turned out to be even easier to extract the URLs to archive from the browser history, the RSS reader, and email archives. The URLs are then sent via XML-RPC to the archiving server, where larbin downloads them and they are archived in the ARC format.
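
A minimal sketch of the client side, assuming a hypothetical XML-RPC method `archive.addURLs` at a hypothetical endpoint (the post does not describe the server's actual interface): URLs gathered from several sources are merged without duplicates and sent in one call. `build_request` shows the XML payload without needing a running server.

```python
import xmlrpc.client

# Hypothetical endpoint and method name; the real archiving server's
# XML-RPC interface is not described in the post.
ARCHIVE_SERVER = "http://archive.example:8000/RPC2"
METHOD = "archive.addURLs"


def collect_urls(*sources):
    """Merge URL lists (history, RSS reader, mail) without duplicates, keeping order."""
    seen = set()
    urls = []
    for source in sources:
        for url in source:
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


def build_request(urls):
    """Serialize the XML-RPC call that would be sent to the server."""
    return xmlrpc.client.dumps((urls,), methodname=METHOD)


def submit(urls, server=ARCHIVE_SERVER):
    """Send the URLs to the archiving server (requires it to be running)."""
    proxy = xmlrpc.client.ServerProxy(server)
    return getattr(proxy, METHOD)(urls)
```

Sending one batched call per source run, rather than one call per URL, keeps the XML-RPC overhead low; the crawler on the server side can then schedule the downloads on its own.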

Using HTTP-Authentication in Web Applications

Monday, October 27th, 2003

I have long wondered why so few web applications use HTTP authentication. OK, I understand that web designers want more control over the layout of the password input, and I see the issues with “logging out” when using HTTP authentication, but for many applications neither is a problem. And password management in browsers is usually much better for HTTP authentication, at least for me, since Safari stores the passwords in the Keychain.

Today I tried to implement HTTP authentication in Webware and found out the hard way why so few web applications support it. The Apache web server deliberately prevents CGIs and the like from implementing HTTP authentication by not passing the Authorization header on to them. The reasoning is that “user-supplied” scripts might steal authentication credentials when the “system” is doing the authentication. That might be so. But many Apache deployments have no user-supplied scripts at all; everything is controlled by the same entity, so there is no reason not to trust scripts with the authentication information.
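
Why the credentials are worth protecting: with Basic authentication the browser sends the user name and password merely base64-encoded, so any script that sees the Authorization header can recover the password. A minimal sketch using only the standard library (the latin-1 text encoding is an assumption of this sketch):

```python
import base64


def basic_auth_header(name, password):
    """Build the Authorization header value a browser sends for Basic auth."""
    token = base64.b64encode(f"{name}:{password}".encode("latin-1")).decode("ascii")
    return "Basic " + token


def decode_basic_auth(header):
    """Recover (name, password) from a Basic Authorization header value."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not Basic authentication")
    # Split at the first ':' only, since the password itself may contain ':'.
    name, _, password = base64.b64decode(token.strip()).decode("latin-1").partition(":")
    return name, password
```

For example, `basic_auth_header("Aladdin", "open sesame")` yields `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==` (the example from RFC 2617), and `decode_basic_auth` turns it straight back into the name and password.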

You can change this behavior by defining SECURITY_HOLE_PASS_AUTHORIZATION when compiling Apache (e.g. by adding -DSECURITY_HOLE_PASS_AUTHORIZATION to CFLAGS).

If you can’t recompile Apache, you can work around the problem by using mod_rewrite to add the missing information to the environment. For example, with Webware’s mod_webkit something like this should work:


  WKServer localhost 8086
  SetHandler webkit-handler
  RewriteEngine On
  RewriteRule /WK(.*) - [E=X-HTTP_AUTHORIZATION:%{HTTP:Authorization},PT]

The rewrite rule should also work with CGIs and other modules.

If you are an Apache module author, make sure you pass the Authorization header on to your scripting code. ap_add_common_vars(r) and ap_add_cgi_vars(r) refuse to do so, so you must retrieve the header via ap_table_get(r->headers_in, "Authorization") and pass it on yourself.

I have created a patch for Webware 0.8.1 which implements this.

On the application server side, the code would look like this:

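import base64

def authorized(self):
    # With a recompiled Apache the header arrives as HTTP_AUTHORIZATION;
    # the mod_rewrite/mod_webkit workaround sets X-HTTP_AUTHORIZATION instead.
    environ = self.request().environ()
    httpAuth = environ.get('HTTP_AUTHORIZATION') or environ.get('X-HTTP_AUTHORIZATION')
    if not httpAuth:
        return False
    authType, _, auth = httpAuth.partition(' ')
    if authType.lower() != 'basic':
        return False  # only basic HTTP authentication is supported
    # The credentials are base64("name:password"); the password may contain ':'.
    name, _, password = base64.b64decode(auth.strip()).decode('latin-1').partition(':')
    return self.authorizeUser(name, password)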

[code based on Ian Bicking's]

When Apache is recompiled with SECURITY_HOLE_PASS_AUTHORIZATION, HTTP_AUTHORIZATION is supplied. If you use the mod_rewrite or mod_webkit approach, X-HTTP_AUTHORIZATION is used instead, since Apache doesn’t allow its internal variables to be redefined.

For an overview of HTTP authentication with Webware and a different approach to solving the problem, see the Webware Wiki.

Internet Archive Data Structures

Saturday, October 25th, 2003

The Internet Archive stores data in well-defined formats. I once wrote a Python interface for them called ARCive.

Now I have found some newer documentation on the ARC, CDX, and DAT file formats.
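
To illustrate the ARC layout, here is a minimal sketch of writing and reading version 1 records, based on my understanding of the v1 URL-record line: URL, IP address, 14-digit archive date, content type, and the length of the stored document, separated by spaces, followed by the document bytes and a newline. It is not ARCive’s interface, it omits the `filedesc://` version block that starts every ARC file, and the reader only handles records in exactly this form.

```python
from datetime import datetime


def arc_v1_record(url, ip, fetched, content_type, body):
    """Serialize one ARC v1 record: header line, stored bytes, trailing newline.

    body is the archived network document (HTTP headers plus payload) as bytes;
    the header's last field is its length in bytes.
    """
    header = "%s %s %s %s %d\n" % (
        url, ip, fetched.strftime("%Y%m%d%H%M%S"), content_type, len(body))
    return header.encode("latin-1") + body + b"\n"


def read_v1_records(data):
    """Yield (url, ip, date, content_type, body) tuples from concatenated records."""
    pos = 0
    while pos < len(data):
        end = data.index(b"\n", pos)
        url, ip, date, content_type, length = data[pos:end].decode("latin-1").split(" ")
        start = end + 1
        body = data[start:start + int(length)]
        yield url, ip, date, content_type, body
        pos = start + int(length) + 1  # skip the newline that ends each record
```

Because the header states the document length up front, a reader can skip from record to record without parsing the HTML inside.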

Thursday, October 23rd, 2003

Clockwise:

Conversation, MacIrssi, gv showing my PhD thesis.

Notebook keyboard

Wednesday, October 22nd, 2003

I attached an IBM PS/2 keyboard to my PowerBook via a Ymouse PS/2-to-USB adapter. Works like a charm.

Cisco Aironet: first impressions

Wednesday, October 22nd, 2003

I finally received my Cisco Aironet 350 (AIR-PCM352) WaveLAN card. My first impression of the hardware and software is excellent. Reception is great: I can now see my neighbor’s network without going out into the street. The Mac OS X drivers also make a very good impression.

Webware and Unicode

Wednesday, October 22nd, 2003

To use Unicode in Webware servlets, you can either patch the Page class to handle Unicode correctly or add a UnicodePage class that understands Unicode. Both can handle all kinds of encodings for the actual output (defaulting to latin-1/utf-8).
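
The patch itself is not reproduced here, but the core idea can be sketched without Webware: collect Unicode text, encode it once on output in a configurable encoding, and advertise that encoding in the Content-Type header. `UnicodeWriter` is a hypothetical name, not Webware’s Page API, and replacing unencodable characters with numeric character references is a design choice of this sketch.

```python
class UnicodeWriter:
    """Collect text and encode it once for a servlet-style response (hypothetical)."""

    def __init__(self, encoding="utf-8"):
        self.encoding = encoding
        self._parts = []

    def write(self, *args):
        for arg in args:
            # Assumption: stray byte strings are treated as latin-1 text.
            if isinstance(arg, bytes):
                arg = arg.decode("latin-1")
            self._parts.append(str(arg))

    def content_type(self, mime="text/html"):
        """Header value telling the browser which encoding the body uses."""
        return "%s; charset=%s" % (mime, self.encoding)

    def body(self):
        # 'xmlcharrefreplace' turns characters the target encoding cannot
        # represent into numeric character references instead of failing.
        return "".join(self._parts).encode(self.encoding, "xmlcharrefreplace")
```

With latin-1 output, for instance, a euro sign comes out as `&#8364;`, which the browser still renders correctly in an HTML page.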