Placement logbook

This is the logbook I wrote during my placement at ArsDigita. It forms the biggest part of the project report. The rest of this page is autogenerated by LaTeX2HTML from the original LaTeX version (also available as PDF).

Logbook for Ed Avis

This details what I did during my six month placement at the ArsDigita London office. Inevitably any description of a programming job will be full of technical details, and it is difficult to state these concisely. This means the logbook is rather too long to read quickly. So each week has a short summary of issues which avoids details of the particular problems at hand, but explains the general principles and decisions.

As required by the markers, this logbook has been signed by my boss at the company:

Week beginning 2001-04-01

Started at ArsDigita with three other interns: Tom Ayles, Miles Barr, and Tom Fotherby. The company produces community-driven websites using its own open-source toolkit, the ArsDigita Community System or ACS. This is built in Tcl on top of AOLserver, which is a webserver designed to talk to a database--in our case Oracle.

To get up to speed with all this we are told to complete the problem sets, or psets, which are tutorial exercises. I installed a PC with AOLserver, Oracle and the ACS, and started the first pset. It's pretty standard for new employees to complete the psets before being considered ready for real work.

The first problem set introduces AOLserver and its Tcl scripting extension, as well as some basic SQL and Oracle's sqlplus (command line SQL interface) and sqlloader (bulk loading tables from data files) utilities. Fortunately, I already knew basic SQL from the course at College.

Issues this week: What is the best way to bring new developers up to speed? Learn by doing is always most effective, but is it better to make up toy projects for learning, or to start straight away on production code? I feel that creating problem sets is a good approach, particularly as they can also be followed by others outside the company (the ACS psets were taught at MIT and form the basis for the ArsDigita boot camps, where members of the public learn the basics of our toolkit over a weekend).

Week beginning 2001-04-08

Finished first pset, and started on the second. Whereas the first was an introduction to SQL, Tcl and the basic AOLserver scripting facilities, the second pset covers the ACS toolkit built on top. This is a collection of data models (predefined database tables) and Tcl library code for common things like user management, permissioning, and several packages of code for bulletin boards, ticket trackers and the like which can be added to your site easily. Some of the data model is implemented using Oracle's PL/SQL procedural language, which appears to be the unnatural child of SQL and Ada.

This problem set asks you to write a room booking system, but a fairly complex one where some rooms require confirmation and there are specially appointed administrators (`Überusers') who confirm bookings and can boot off other users.

The pset lays down a few rules on how to implement the system. I need to write a general booking system where any object can be booked for a certain period of time, but cannot be booked for two overlapping periods. Then the room booking system, with its extra features of administrators who can approve bookings and forcibly seize them from non-administrators, is implemented as another package on top of that.

The intention is that you follow the same standards as would be used when developing production packages at ArsDigita--a requirements document, a short spec, and then documentation for all the code. This seems like overkill for a relatively simple project, but it does force you to think about what is needed. At least half of my specification is taken up with error cases, mostly the race conditions that are inevitable whenever humans need to make and approve requests. (For example, what happens if a user makes a room booking that requires approval, but before it is approved another booking request for the same time period is made? The administrator can approve at most one of these requests, but not both.)

Away on Friday.

Issues this week: Is it better to define objects and methods at a low level, as part of the database itself, or in the scripting language used in the web server? Oracle's PL/SQL language is clunky, but it does seem attractive to code `close to the metal' and have the program type-checked by a compiler which knows about the database schema. My main concern is lack of portability to other RDBMSes.

Week beginning 2001-04-15

Away on Monday and Tuesday. Continued on second pset, a system to book rooms and ensure that bookings cannot clash.

As expected from writing the spec, the code is fairly simple apart from all the things that could go wrong. It is an invariant that an object is never booked for a clashing period; this is enforced by the Oracle PL/SQL package that manages the data model (providing methods for `new booking', `cancel booking' and so on). The room booking system on top of that has its own invariants: it is possible to have two clashing bookings if they are both waiting for approval (the administrator will approve at most one), but if an approved booking has been made it cannot clash with anything else.
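The two layers of invariant can be sketched roughly as follows. This is an illustration in Python, not the PL/SQL from the pset; the names Booking, overlaps, can_request and approve are made up for the example, periods are assumed to be half-open, and dropping clashing pending requests on approval is one possible reading of "the administrator will approve at most one".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Booking:
    room: str
    start: int            # inclusive
    end: int              # exclusive: the booking covers [start, end)
    approved: bool = False


def overlaps(a, b):
    """Same room and intersecting half-open periods."""
    return a.room == b.room and a.start < b.end and b.start < a.end


def can_request(new, bookings):
    """A new pending request may clash with other pending requests,
    but never with an approved booking."""
    return not any(b.approved and overlaps(new, b) for b in bookings)


def approve(pending, bookings):
    """Approve one request and return the new list of bookings.

    Raises if an approved booking already clashes. Pending requests that
    clash with the newly approved one are dropped (an assumption).
    """
    if not can_request(pending, bookings):
        raise ValueError("clashes with an approved booking")
    kept = [b for b in bookings
            if b is not pending and not overlaps(pending, b)]
    return kept + [replace(pending, approved=True)]
```

Two pending requests for the same slot can coexist, but once one is approved the other disappears and no further request for that slot is accepted.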

Generalizing out the booking system into a base package sounds like a good idea, but it actually makes life harder. Because room bookings require approval (called pending bookings) and two pending bookings might clash, it's not possible to use the base package to keep track of pending bookings. I had to implement storing those separately. And then the invariant that a room cannot have a pending and a firm booking for the same time period has to check across two different tables and two different packages. It would have been simpler and more maintainable not to factor out the idea of a booking; unfortunately this was required by the problem set. I could also have made life easier for myself by not being so flexible with pending bookings when I wrote the spec. Oh well, at least I got plenty of practice with making packages and writing SQL, which is the point of the exercise.

Wrote a rather nice admin page that presents a list of pending room booking requests and lets you choose which to approve or deny. If two bookings clash and the admin tries to approve both of them, then the page comes back highlighting where the clash is.

Issues this week: It is probably a bad idea to give the design as part of the specification. A spec should say what is needed, not how to implement it. I can appreciate that design hints are useful in tutorial material but I would baulk at anything like this in real work. The idea of writing code to be reusable is a non-issue: the extra complexity caused by the supposedly reusable design is so much that it would have been quicker to write a simple version first and rewrite if necessary later.

Week beginning 2001-04-22

Although we haven't completely finished every possible problem set (in particular, the code for the second one hasn't been reviewed by anyone), there are projects starting and the four interns got assigned to stuff. Tom F. and I are working on a project for HTA, a firm of architects. [Note: in all my time working on the project I never found out what HTA stood for. In fact, it didn't occur to me that it might stand for anything.]

HTA specialize in consultations and gathering opinions. For large public building projects (a new housing development or redevelopment) it is important to get the public's opinion on the existing area and how it should be improved. Their business is partly in performing these consultations as a service for other firms of architects or housing associations. (They probably do other stuff too, but this is the area of the company I know about.)

They would like to do some of this via the web: to create consultation websites which they can then include as part of their services. It is not certain whether we'll get the contract, but we'll knock up a quick demonstration of some important features. In a way this is the archetype of a traditional ACS site: registered users who post content and form a community. So a lot of the functionality should be out-of-the-box with existing packages like bulletin boards.

One new feature is a neighbourhood map where you can click and view details of the new development, then post comments on them. This requires a zoomable imagemap. I start work on an interface to set an image and point and click to choose points on it, which in turn link to other images.

An imagemap is a picture on a web page where certain areas are clickable and link to other pages. Making one is easy, but it is an interesting challenge trying to create an administration interface. We would like the site administrator to choose what areas of the image should link to other pages, and this must be done on a web page. It would be far too complex to support rectangles and triangles for live areas without resorting to something like client-side Java. The old server-side imagemaps used to support choosing the nearest point to where the mouse was clicked, so you define a few points on the map and each will get its own area. That is ideal for us, because you can define these points with a single mouse click. But client-side imagemaps dropped that support. They do however have circle-shaped live areas, so I wrote a nearest-point simulator which given some points creates a circular live area around each one, which varies in size according to how near the other points are. It seems to work well.
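A rough sketch of the nearest-point idea, in Python rather than the Tcl used on the site. The sizing rule here (radius equal to half the distance to the nearest other point) is an assumption chosen so that circles can touch but never overlap; the function names and the default radius are invented for the example.

```python
import math
from html import escape

DEFAULT_RADIUS = 20  # used when there is only one point on the map


def circle_areas(points):
    """Return (x, y, radius) for each point, with radius equal to half the
    distance to the nearest other point, so circles touch at most."""
    areas = []
    for i, (x, y) in enumerate(points):
        distances = [math.hypot(x - ox, y - oy)
                     for j, (ox, oy) in enumerate(points) if j != i]
        radius = int(min(distances) / 2) if distances else DEFAULT_RADIUS
        areas.append((x, y, radius))
    return areas


def imagemap_html(name, points, links):
    """Build a client-side imagemap with one circular area per point."""
    lines = ['<map name="%s">' % escape(name, quote=True)]
    for (x, y, r), href in zip(circle_areas(points), links):
        lines.append('  <area shape="circle" coords="%d,%d,%d" href="%s">'
                     % (x, y, r, escape(href, quote=True)))
    lines.append("</map>")
    return "\n".join(lines)
```

Clicks that fall between the circles hit nothing, so this only approximates true nearest-point behaviour.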

We have some example paper documents about one development project, the Oakridge project, which gives us the general idea of what images are needed (maps of an area, development plans, artists' impressions of new streets).

Issues this week: Most of the work in a project like this is not building a single site, but a general interface to let others create sites. In other words the administration interface is 80% of the work. It is very easy to create a demonstration site which looks good and might even work, but then it takes far longer to do the job properly and provide the amount of customization required.

Week beginning 2001-04-29

Met the people from HTA this week; we are going ahead with the project but on a time and materials basis, since they don't know whether they'll be able to sell the `product' to their customers. We are quite a long way up (or down?) the food chain. They would like to get involved in building houses to order, where the prospective inhabitant gets some choice over how many bedrooms are built, what style of kitchen, and so on. They gave us some drawings which represent the different styles of house and I started making a house customizer page where you can mix and match different front doors, roofs, and paint colours.

Many of the pages on this site require images, and these images must themselves be changeable by the administrator. I wrote a simple data model to map an image ID to a URL. We aren't storing the images themselves in the database, we serve the images as static files and just store the URL. You can even use images served from a different site (so-called bandwidth theft :-)).

Issues this week: To what extent can you rely on a URL as a published interface? Is it a sensible design to store the URLs of things like images and rely on them being served reliably? Perhaps not, but I do feel that in a web application the user should be able to enter a URL in any place where a filename might be used.

Week beginning 2001-05-06

Another meeting with HTA. We start making a better-looking demo by adding more real housing-related images--previously I had put together pages for testing by picking irrelevant but colourful images from http://photo.net/.

The point of the site is consultation, and a big part of that is comments. The ACS has a package called general comments which allows any object in the database to have comments attached to it, which users can post and view. (An object means a row in the acs_objects table; when we make our own tables there is usually a foreign key back to acs_objects for each row, as a crude form of inheritance.)

However, at this stage we don't really have objects, we only have web pages. For example there is a single page showing the possible development scenarios, and this single page doesn't correspond to any row in a database table. We get round the problem by making a web_pages table which lets us have an object for each URL; we can then attach comments and permissions to that.
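The shape of this workaround can be sketched with SQLite standing in for Oracle. Only the table names acs_objects and web_pages come from the text above; the columns, the comments table and the helper object_for_url are simplified assumptions and do not reproduce the real ACS data model.

```python
import sqlite3


def make_schema(conn):
    conn.executescript("""
        create table acs_objects (object_id integer primary key);
        create table web_pages (
            page_id integer primary key references acs_objects(object_id),
            url     text unique not null
        );
        create table comments (
            comment_id integer primary key,
            object_id  integer not null references acs_objects(object_id),
            body       text not null
        );
    """)


def object_for_url(conn, url):
    """Return the object id for a URL, creating the object on first use."""
    row = conn.execute("select page_id from web_pages where url = ?",
                       (url,)).fetchone()
    if row:
        return row[0]
    page_id = conn.execute("insert into acs_objects default values").lastrowid
    conn.execute("insert into web_pages (page_id, url) values (?, ?)",
                 (page_id, url))
    return page_id
```

Anything that attaches to an object id, such as a comment or a permission, can then be attached to a plain page by looking its URL up first.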

We've also started setting up the other ACS packages, like bulletin boards and chat rooms. Particularly interesting is the Simple Survey package, because HTA do a lot of surveys and the questions they ask could easily be put on the web. The package supports only yes/no, multiple choice and free-text questions, but that covers almost everything you'd want. Unfortunately it was written for the older ACS version 3 and hasn't been fully ported to the new version (changes to the Oracle data model are the most troublesome). Fortunately the chap who did some of the porting work is downstairs, so I can ask him if I run into trouble.

Issues this week: Is it sensible in a database-backed site to have content pages which are just sitting there in the filesystem? Or should every item of content be served from the DB? We do use a templating system where the HTML is generated from an ADP page (AOLserver's equivalent of ASP or JSP), so that's one example already of content not from the database.

Week beginning 2001-05-13

HTA would like to upload chunks of text to the site, policy documents and so on. They could do this by creating new template pages, since the ADP format is close to ordinary HTML and intended for a designer (rather than a programmer) to edit. But they don't even have a designer or webmaster, so something even less technical is needed.

I created a very simple page which stores some text in the database, allows you to upload a file to replace it, and serves it to ordinary users visiting the page. This is the third thing we've done that has required a mapping between URLs and database objects. [Note: with hindsight I realize this was an indication we were doing things in slightly the wrong way. We later switched to making these editable pages a package which can be instantiated in as many places as desired. Adding a package_id row to each table and query stops the instances of the package interfering with each other.]

They'd also like to put PDF documents directly on the site, and in principle serving a PDF is no more difficult than serving a page of HTML, so I included support for that with file uploads.

This sounds simple, and it should be. But to store text longer than four kilobytes, you can't use a normal Oracle string; you have to use a LOB (large object), of which there are two types, blob and clob. The SQL parser has hardcoded limits on the length of data you can insert, so there is a special syntax to update LOB columns. That's not dreadfully bad in itself, merely a nuisance, but I can't get it to work.

Eventually tracked down some of the problems to a badly configured character set. Oracle had been configured for ASCII but we thought (and the web server thought) things were in UTF-8 (Unicode). That accounts for the strange happenings when high-bit-set characters (often generated by Mac word processors) were entered; but there is still a mystifying problem where PDFs are getting truncated after a few kilobytes.

Still, it works fairly well for plain text, and I was able to generalize the uploading and page serving into some library routines which can be used elsewhere. At the moment this means the meeting minutes pages, which also involve uploading big chunks of text.

Uploading HTML also works, although I still have to deal with the problem of taking an existing complete document and trying to slap another header and footer on top. Probably some kludge to extract just the body of the HTML will be needed.
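The kind of kludge meant here might look like the following sketch, written in Python for illustration (the site itself was in Tcl). The function name extract_body is invented, and the regular expression is deliberately naive: it would be confused by a body tag inside a comment or by a ">" inside an attribute value.

```python
import re

# Match everything between the first <body ...> and the following </body>.
_BODY = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def extract_body(html):
    """Return the contents of the body element, or the input unchanged
    if it has no body tags (i.e. it is already a fragment)."""
    match = _BODY.search(html)
    return match.group(1).strip() if match else html
```

The extracted fragment can then be wrapped in the site's own header and footer without ending up with two html or head elements.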

Before now we were developing the site on our local machines, which we had working Oracle installations on because of doing the problem sets. (The source was under version control in a single CVS repository, so we would edit and test locally, then commit the changes.) But the office doesn't have a real (non-masqueraded) network connection, so any site on these desktop PCs is not visible outside unless you fiddle with tunnelling. This week we moved onto a Solaris box hosted at a colocation provider. It is much slower than my klone PC running Linux.

Issues this week: Why is the DBMS mucking around with the content of LOBs to change the character set? It should give back the exact same sequence of bytes it was given. If character set translation is needed it can be done in the client libraries, if explicitly asked for. I would no more expect Oracle to change the character encoding of text than to convert an image file from GIF format to PNG. At the very least the default should be UTF-8 and not ASCII: this is one instance of the general rule that every default setting in Oracle is wrong.

Week beginning 2001-05-20

We are fixing bugs in the HTA site and tarting it up by changing the master template. All programmers believe that they have better taste in such things than graphic designers; I added a pleasingly dull dark grey bar with HTA's logo across the top of each page.

Also did some more work on the uploading text, so you can type directly into a web form to edit the existing text rather than uploading a file each time. You can also download the existing content as a file so you can edit it locally and then upload. It's looking quite nice.

Apart from that we spent the whole week tidying, fixing bugs and improving the appearance of the site. It's surprising, but it does seem to actually do something useful. I'm still sceptical about whether anyone would bother using it.

The uploading and downloading of PDFs is still broken. They still get truncated after a few kilobytes. I've been trying to debug this by putting in assertions, using different SQL syntax to do the updating and retrieving, but without success.

(We've started using the ArsDigita ticket tracker to report bugs, assign them to people, and see which are still outstanding. The ticket tracker is itself an ACS package, and HTA have expressed some interest in adapting it for housing-related things like `please fix my broken cold tap in the kitchen'.)

Issues this week: What is the best way to develop database-backed applications as a team? Should we work independently, with individual webservers and instances of Oracle? That makes for the least interference between developers, but makes it difficult to keep the data models in sync. At least it makes sure that the data model creation scripts get well tested. Or should we share a single webserver and database? That helps us work in sync but increases the scope for treading on each others' toes.

Week beginning 2001-05-27

The HTA demo to their customers is on Friday. I have been tidying up the site and adding features requested. Partly this is æsthetic improvements--wrapping each page in a master template, improving the text and so on. But also there are features and fixes that needed to be done before the demo. I got Simple Survey working a bit more, and rewrote Team Details. The text uploading needed a few bugfixes to select the body from HTML.

Finally tracked down the problems with blob uploading; an interaction of character set mismatches, truncation at the first null byte when using select, and a bug in db_write_blob (bind variables don't work) was stopping it working before. Now I use db_write_blob with string interpolation, and that seems okay. Now that blobs are working, I've rewritten the picture storage code so that you can actually upload pictures to the server. The interface is now quite slick--from the imagemap page you can choose a picture, and go back to the imagemap form to enter the rest of the data. In fact you can go imagemap -> add new clickable area -> choose imagemap -> choose picture -> upload new file, click OK, and all the necessary steps will happen and you'll be back at the original page with the file added as the target of a clickable area.
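The first two causes of the blob problem are easy to reproduce in miniature. The sketch below is Python and only imitates the failure modes; it does not model what Oracle or the driver actually did internally, and both function names are invented.

```python
def through_ascii_column(data):
    """Pretend the store keeps only 7-bit characters: every byte with the
    high bit set is replaced by '?'."""
    return bytes(b if b < 0x80 else ord("?") for b in data)


def through_c_string(data):
    """Pretend the client treats the value as a NUL-terminated string,
    so everything from the first zero byte onwards is lost."""
    return data.split(b"\x00", 1)[0]


# A PDF-like payload: a text header followed by arbitrary binary data.
pdf_like = b"%PDF-1.3\n\xe2\xe3\x00\x01more binary data"
```

Plain ASCII text survives both functions unchanged, which would explain why text uploads mostly worked while PDFs came back damaged or cut short.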

Unfortunately I haven't had any luck getting the acs-templating form manager to work with file uploads; had to write that stuff more or less from scratch.

Issues this week: It is disheartening to find bugs in the core toolkit and report them without any seeming effect. Even if a piece of software is reaching the end of its development, there needs to be someone in charge of maintenance and bugfixes, for morale if nothing else.

It would be so much better if Oracle didn't make arbitrary distinctions between short and long strings (varchar and blob types). There should be a uniform interface for both without arbitrary limits on length. If large objects need to be treated differently for efficiency reasons, I wouldn't mind giving the DBMS a hint when I set up the table that one column is likely to hold large values. But there is no reason for a completely different interface, with parallel versions of length, substring and so on.

Week beginning 2001-06-03

HTA didn't pay us for this week, but nonetheless we added two extra developers (Simon and Sarah). Tom and I spent the week packaging the site; I split up code into modules:

Tom and I made the HTA site itself into a package, so each new development project can create a new instance. This involved adding package_id to lots of table definitions and queries.

Issues this week: Packaging code which was previously lumped together in a big bucket is not much fun. It is probably better to start work on packaging and deployment early on in the project (although to be fair, we didn't know quite how much things like pictures would end up being generalized).

Week beginning 2001-06-10

Monday: network broken for lots of the day, spent time fixing tickets, particularly removing occurrences of hardcoded /hta/. Basically spent the whole week fixing tickets and making sure the packaging works.

Issues this week: Again, I wish we'd thought more about packaging when originally writing the code, although that was difficult while continually focused on getting a working demo of the latest feature as soon as possible. It would be really good to have an automated tool which checks for hardcoded filenames or URLs. This could be quite hard in Tcl though, since the language makes no distinction between strings and code.
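A crude version of such a checker is only a few lines. This sketch is in Python and looks for a literal string; the default needle /hta/ comes from the tickets above, while the function names and the .tcl/.adp suffix list are assumptions.

```python
import re
from pathlib import Path


def scan_text(name, text, needle="/hta/"):
    """Return (name, line number, line) for every line containing needle."""
    pattern = re.compile(re.escape(needle))
    return [(name, lineno, line.strip())
            for lineno, line in enumerate(text.splitlines(), 1)
            if pattern.search(line)]


def scan_tree(root, needle="/hta/", suffixes=(".tcl", ".adp")):
    """Scan every file under root whose suffix is in suffixes."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend(scan_text(str(path), text, needle))
    return hits
```

It catches only literal occurrences. A path assembled from pieces at run time slips through, which is exactly the difficulty with Tcl mentioned above.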

Week beginning 2001-06-17

Rory's demo on Wednesday; fixing up the site for that (lots of cosmetic and usability fixes). Made sure that you can create multiple instances of our site, each with its own chat room, etc. Just spent all week fixing tickets, basically.

Issues this week: We have tried to modularize the code so you can install bits independently. And we've made use of existing packages like chat, again installable on their own. But our customer wants all of these together as a unit, and several instances of this super-package. It's a pity there is no support in the ACS for installing and mounting a predefined collection of several packages.

Week beginning 2001-06-24

More ticket fixing; we're coming to the end now and we have to make sure the site is packaged up and ready to use. Added a page to upload newsletters (basically a clone of minutes: again you have a date and some associated content).

Issues this week: When you are running out of time at the end of a project, you don't always have the opportunity to do things in the best way. I ended up cutting and pasting the minutes code into newsletters, when really I should have factored out the common code. But what is the alternative?

Week beginning 2001-07-01

On Monday and Tuesday I worked on HTA. Having fixed almost all the tickets, I looked at Simon's calendar code and adapted it into a general replacement for the Page package. This meant learning how to use the content repository and create new content items and revisions. Now almost all of the uploaded text on the site supports multiple revisions and we have a good replacement for the ACS standard Page package--which I have yet to commit back to the tree.

In the second half of the week I started on APLAWS, the local authority website project (we are dealing with Camden). I do not remember what APLAWS stands for, but local government is fond of long acronyms. The first job was to get the still-under-development ACS 4.5 installed and running; after a couple of days' futzing with Tomcat, CLASSPATH and configuration files I got a server with a login page.

ACS 4.5 is based on Java Servlets using XSLT to generate the output pages. The architecture is documented at http://developer.arsdigita.com/acs-java/acs-core/doc/architecture.html. I need to learn how to use each of the layers and tie them together. XSL is a scary-looking language to extract data from an XML document and write it in a different order to create a new XML document.

The data storage (RDBMS) layer is based on Oracle or another relational database as before. The `relational' part seems to be de-emphasized a bit in this release, because the queries themselves are abstracted away into the persistence layer. This provides Java methods to `get' an object from the database--in other words select a row from a table, `set' the object's attributes, and then `store' the object by updating the database row. So in the simple case changing one row now requires two database calls rather than one. I am told that this won't be a problem, because the majority of database activity is reading and not updating.
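The get/set/store pattern and its cost can be shown with a toy example. This is Python with SQLite, not the ACS Java persistence API; the classes CountingDb and Note and the notes table are invented purely to count the round trips.

```python
import sqlite3


class CountingDb:
    """Wraps a connection and counts statements sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.calls = 0

    def execute(self, sql, params=()):
        self.calls += 1
        return self.conn.execute(sql, params)


class Note:
    """A hypothetical data object backed by one row of a notes table."""

    def __init__(self, db, note_id, title):
        self.db, self.note_id, self.title = db, note_id, title

    @classmethod
    def get(cls, db, note_id):
        row = db.execute("select title from notes where note_id = ?",
                         (note_id,)).fetchone()
        return cls(db, note_id, row[0])

    def set(self, title):
        self.title = title          # changes the in-memory copy only

    def store(self):
        self.db.execute("update notes set title = ? where note_id = ?",
                        (self.title, self.note_id))
```

Changing a title through the object costs a select followed by an update, where hand-written SQL would need the update alone.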

To do relational stuff like joins and unions, you no longer write SQL as part of the application code. Instead you write some more code for the persistence layer, defining the SQL query you want and giving it a name. This is a bit like creating a view within the database itself. For selecting several objects, in other words more than one row from a table or query, you get some kind of result set object and then ask for its contents. At this stage I don't know all the details. To select only some of the objects--i.e. an SQL where clause--you pass an extra parameter which is a boolean expression deciding which rows, er I mean objects, to select. This parameter is a string and it has its own syntax; it gets parsed and converted to a where clause.

I haven't yet looked into the application logic layer. It seems fairly straightforward: just call methods on the data objects to get the values from the database. It looks like there is some mechanism to have persistent Java objects from one page view to the next, which aren't stored in the database but are used for things like session tracking.

To do presentation, you don't generate HTML directly. Instead, you generate the data in XML and then write one or more XSLT stylesheets to transform it. I won't document XSLT in full here, but basically you write an HTML element such as table, inside which you have xsl:value-of elements and/or other HTML elements. You can also add XSLT elements to set the attributes of table, such as class, bgcolor and so on.

The XML itself is not generated directly either. There is a toolkit called Bebop which was apparently designed as `Swing for web pages'. This means that you create Bebop components for text, form elements, buttons, links and so on, and embed them inside other Bebop components for layout. It's not clear how this fits together with the traditional model of fetching some data and letting the graphic designer write a template to determine the layout.

Since the code is not finished yet, and tutorials are not written, it could be a bit difficult to learn it all. Fortunately an example application called Notes is already written, and I can copy bits of that.

Issues this week: Is it wise to switch suddenly from coding close to the database to using an object-relational mapping layer? Similarly, should we be putting in extra layers (at least two) between the Java code and the HTML generated?

Week beginning 2001-07-08

Only in on Monday this week, the rest was holiday. I continued looking at ACS Java; my server is finally up and running and I can play with the Notes application.

The APLAWS site will centre around content management. The content management system or CMS is a way for users to upload text, images or other content and edit it. There are provisions for requiring approval by an editor or publisher before a content item goes live on the site, and for archiving it after a given time period. The CMS is an ACS Java package; like the rest of the software it is not finished yet, but CMS seems to be in a more finished state than many of the other components.

My task will be categorization of content: each content item stored in the database belongs to one or more categories. These categories are arranged in a tree with the more general categories at the top--so a category School might have children `School terms' and `School dinners'.

We want to create a general view of all content items belonging to a particular category, and mount this at a single place in the directory tree. A requirement from the customer is that a single well-known URL should work to find the same kind of information in sites from different councils, so we'll provide an easy way to view all information relating to a particular category (or categories). This may also turn out to be useful for exchanging information between servers; we want users to be able to go to their local council's site but use it to view selected information from other councils (e.g. if you live in one borough and work in another). We will probably do this via HTTP page serving, since that is the simplest solution.
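A sketch of the data structure behind such a category view, in Python for brevity. The class CategoryTree and its methods are hypothetical, and whether a category's view should include items filed under its child categories is an assumption made here.

```python
from collections import defaultdict


class CategoryTree:
    def __init__(self):
        self.children = defaultdict(list)   # category -> child categories
        self.items = defaultdict(set)       # category -> items filed there

    def add_category(self, name, parent=None):
        self.children[name]                 # make sure the category exists
        if parent is not None:
            self.children[parent].append(name)

    def categorize(self, item, *categories):
        """An item may belong to one or more categories."""
        for category in categories:
            self.items[category].add(item)

    def items_under(self, category):
        """All items in this category or any category below it."""
        found = set(self.items[category])
        for child in self.children[category]:
            found |= self.items_under(child)
        return found
```

With School as the parent of School terms and School dinners, items_under("School") returns everything filed under any of the three, which is what a single well-known URL per category would serve.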

Issues this week: The idea of a category for each content item is powerful, but there are several decisions to be made. Should categories be arranged as a tree, as a directed graph, or without any structure? Can an item belong to more than one category? Is it a sensible user interface to browse through the categories, and to what extent should this replace the traditional hierarchical navigation?

Week beginning 2001-07-15

Monday: project meeting explaining Extreme Programming (XP). I will be pair programming with Miles creating a content section. This is an instance of the CMS package, but we need to write code for the Initializer to create content types and provide a way to display them. I think.

I have to catch up with the bulletin boards (bboards) where people discuss the ACS Java framework and post questions. I've also been assigned to educate the rest of the team about Javadoc.

Tuesday: finished researching Javadoc, tried to make a new content section with Tom. I wrote a handout for my Javadoc presentation using LaTeX.

Wednesday: continued work on making a new content section. It seems that to mount something in the URL hierarchy, you have to modify the CMSDispatcher class. This takes a URL and maps it to a Page object. The way it works is to look for a `resource' matching the URL; if that isn't found it tries to use a class called ItemResolver to turn the URL into a content item (URLs are of the form class/oid, and there is a table mapping classes in the URL to the Java class used to display a page). What we must do is add a new Resource at the URL we want, specifying the name of a new Java class. We copied an existing Bebop page to be a new class called AllCategories, and added it to the resources.
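The lookup order described here can be sketched as below. This is a Python illustration of the idea only, not the CMSDispatcher or ItemResolver code; the class, the function names and the use of strings in place of Page objects are all assumptions.

```python
class Dispatcher:
    def __init__(self, item_resolver):
        self.resources = {}                 # exact URL -> page
        self.item_resolver = item_resolver  # fallback for class/oid URLs

    def add_resource(self, url, page):
        self.resources[url] = page

    def dispatch(self, url):
        """Try the fixed resources first, then fall back to the resolver."""
        page = self.resources.get(url)
        if page is None:
            page = self.item_resolver(url)
        return page


def make_item_resolver(display_pages):
    """display_pages maps the class part of a URL to a page factory."""
    def resolve(url):
        parts = url.strip("/").split("/")
        if (len(parts) == 2 and parts[0] in display_pages
                and parts[1].isdigit()):
            return display_pages[parts[0]](int(parts[1]))
        return None
    return resolve
```

Mounting a new page such as AllCategories then amounts to one add_resource call with the URL and the page to serve there.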

Unfortunately, the new resources did not get added because they are done by the installer for the content section, and this gets run only once. We modified the initializer code to run the installer every time, whether or not the content section was already installed. We also had to make the installer more robust so that it wouldn't fail completely when things were already present in the database, just catch the error and move on to the next thing.
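The "catch the error and move on" approach looks roughly like this. The sketch uses Python and SQLite rather than the Java installer; the function run_installer, the resources table and the step list are invented for the example.

```python
import sqlite3


def run_installer(conn, steps):
    """Run each (name, sql) step; if a step fails, for instance because
    the thing already exists, record the error and carry on."""
    done, skipped = [], []
    for name, sql in steps:
        try:
            conn.execute(sql)
            done.append(name)
        except sqlite3.Error as err:
            skipped.append((name, str(err)))
    return done, skipped


STEPS = [
    ("create resources table",
     "create table resources (url text primary key, page_class text)"),
    ("mount AllCategories",
     "insert into resources values ('/all-categories', 'AllCategories')"),
]
```

Running it a second time skips both steps and leaves the data untouched. The drawback is that a genuine failure is swallowed in the same way as an "already exists" error, so the skipped list needs checking.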

So now it is possible to add new lines to the CMS's Installer class mounting a Page object at a particular location, restart the server and they will start working. Obviously in the long run there should be a neater way of doing this, perhaps even a web interface for developers to test new classes by temporarily mounting them at a URL, but since we're working with a prerelease of the ACS code we have to write this by hand.

Gave my Javadoc presentation, but really everything relevant was on the handout. We discussed one issue--should the @author tag be filled in? Really this is most useful when code is being distributed far and wide, to people who don't know the original author but might need to contact him or her. Among a tight-knit group it is not so useful, particularly when we want to have common ownership of all code without exception.

We decided that the @author tag will be added, but only for the first author--subsequent changes by another person don't require adding a new entry to the list of authors. I feel this is pretty useless, but anyway it's not a big deal.

Unfortunately we've been having more version control problems; we're moving away from getting ACS Core straight from Perforce version control and switching instead to the relatively stable weekly builds made available by the core team. Tom and I spent the rest of the day trying to get things to build.

Thursday: Spent the morning trying to get my ACS to build and fixing CVS, the version control system, to remove any vestiges of the old, pre-weekly-build core code. After getting the code to build, it would not run, because of `invalid column' errors. This indicates a change in the data model.

Friday: Diffing the old and new .sql files, which set up the data model. I hope to find the changes and patch them in by hand using the sqlplus command-line SQL tool.

In some cases I need to drop and recreate a table. Doesn't sound so hard, but Oracle won't let me because the table is referenced by foreign key constraints (even though none of the constraints are actually doing anything, since the table is empty). I cannot persuade Oracle to let me drop the table and then recreate it immediately afterwards. There needs to be a way to make data model changes atomically--in effect checking that the constraints are still satisfied after you've finished making the changes, not before each change. Unfortunately with Oracle things like creating tables happen immediately; they are not transactioned or rollbackable like changes to the data within tables.

Waiting for the sysadmin to get here so I can find out the database password and disable constraints by hand. Of course integrity constraints are very useful, but it's a serious design flaw that you can't defer checking them or temporarily suspend them for database maintenance, not even on your own tables.

When the sysadmin eventually got here, I created a new tablespace and loaded the new data model. Then I worked on fixing our content-section-creating code, which still doesn't work. I need a working content section so I can add the category-listing page to it.

Eventually got things running and, for the first time ever, managed to create a page with some text on it. The next task is to make it fetch some data (a list of categories) and display it--but I'll do that next week.

Issues this week: There is a lot to be said for `keep it simple', which is one of the tenets of XP. Not everyone writing ACS Java seems to follow this advice; the framework appears to suffer from second-system syndrome. Or maybe I am just too stupid to understand it.

Like assertions, database integrity constraints are a useful safety feature to catch logic errors, but there needs to be a way to disable them when it's convenient for debugging. The human should be in charge, not the computer. At the very least Oracle should suspend constraint checking until the transaction is committed--if they fail then, the transaction can be rolled back and integrity is preserved.

Week beginning 2001-07-22

Monday: working with Simon to extend the all-categories page to do something useful. We've broken down the task into small subtasks, and the next is to display the categories as a tree rather than as a flat list. Fortunately, there is already a Bebop component to display trees. We looked at the data model for categorization and tried to figure out how to turn it into a TreeModel to feed to the component, before realizing that there's already a CategoryTree class to do this stuff.

Unfortunately, CategoryTree doesn't work properly. I tried to debug it by calling its method getTree() to return the TreeModel object, which I could then examine to check that it contains the necessary data. However it's not possible to get the TreeModel without having a PageState object, which isn't available. Apparently this is for permissioning reasons, but there needs to be a way to get the object anyway. Encapsulation is clearly useful, but forced encapsulation with no way to work around it can be a real nuisance. I much prefer the convention of scripting languages that using published interfaces, as with any other good coding practice, is the responsibility of the programmer. When debugging and developing there needs to be a way to override Java's access control and find out about the insides of an object you're using, because development code can never be a black box.

Anyway, the reason I can't get a PageState is because of the three stages ACS Java uses to generate a page. First you instantiate Bebop components and add them to the page. Then once the page layout is finished (`locked'), it generates XML for the page content from each Bebop component in turn. Finally, an XSLT transformation turns this into HTML for the browser.
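To make the three stages concrete, here is a small sketch using only the standard JDK XML classes. It is not Bebop: ThreeStagePage and its Component interface are invented for illustration, and a plain String stands in for the PageState, which is the per-request information that exists only from the second stage onwards.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical page showing the assemble / generate XML / style pipeline. */
final class ThreeStagePage {
    /** Hypothetical component: contributes XML once per request. */
    interface Component {
        void generateXml(Document doc, Element parent, String requestState);
    }

    private final List<Component> components = new ArrayList<>();
    private boolean locked;

    // Stage 1: assemble components. No per-request state exists yet.
    void add(Component c) {
        if (locked) {
            throw new IllegalStateException("page is locked");
        }
        components.add(c);
    }

    void lock() { locked = true; }

    // Stage 2: for each request, every component emits its XML in turn.
    Document buildXml(String requestState) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(null, "page");
            doc.appendChild(root);
            for (Component c : components) {
                c.generateXml(doc, root, requestState);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stage 3: an XSLT stylesheet turns the XML into HTML for the browser.
    static String render(Document xml, String stylesheet) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(xml), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Code that runs while components are being added has no request to look at, which is why debugging code needing the request state has to live inside a component's XML generation step.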

The code I'm writing is in the first stage--assembling together the components. The PageState information is not available until the second stage. I could put my debugging code in there, inside the code that generates XML, but at the moment I'm not experienced enough to know how to do that. In the end I settled for putting some debugging code into the CategoryTree itself, to print information to the server's standard output.

Tuesday: spent most of today installing a webserver for HTA. The biggest headache is getting Oracle on there. In the past I have installed Oracle by stopping the server and physically copying all the files to the new machine. This sounds too simple to work, but it is the way many offline backups are implemented, and they're considered reliable. Unfortunately in this case I'm installing Oracle onto a machine with a different C library version. Although the same Oracle binary distribution works for several versions of Linux with several versions of the standard C library, the installer relinks the executables at install time to adapt them to whatever C library is on the system. This means that you cannot upgrade your C library without reinstalling Oracle, and you cannot move Oracle to a machine with a different version of glibc. I don't understand why Oracle had to do this, or if they do really need to, why it couldn't be done when the server starts rather than at install time. It seems they are determined to make installation as difficult as possible.

Anyway, Mark the sysadmin is kindly installing Oracle for me, and he'll have it done by tomorrow. Then I can install AOLserver, ACS Tcl, and get the HTA code from CVS. Hopefully it should all run smoothly, but I can't do anything until Oracle is on there.

Wednesday: still no Oracle on HTA's box. Turns out that the Oracle installer needs the X Window system to run, but because it's intended as a server, X was not installed on the Cobalt box! Mark is installing X and then Oracle.

In the meantime I continue working on the tree view of categories. I've established that the model (as in model, view, controller) of the tree data is okay, it's only the display which is going wrong. Looking at the debugging version of the page shows that this is because the XML generated has the attribute collapsed="t" when it should have expanded="t" for an expanded tree. I tried to fix this in the source, but mysteriously nothing happened.

After a bit of headscratching I realized that this is because the source code we have of ACS core doesn't actually build--it's just to look at. In order to actually make changes, I'd need to run javac and update the .jar file (a collection of compiled Java classes) by hand. A further problem is that the Bebop build we're using is actually from the CMS (Content Management) team, because they made some local changes themselves. This means that even if I did get the ACS-core source to compile, it would be no use, because it's likely to be out of sync with the version of Bebop we're actually using.

Since last week there has been a new build of core and a new build of CMS--both in a fresh new CVS repository. I'll switch over to that and hopefully be able to make the momentous breakthrough of changing a source file and having it compile and run. Hmm... it doesn't work. What a surprise. Time to spend a few more hours trying to get the server up and running.

(In the meantime Oracle has been installed but with the wrong character set--the same problem I had before. More fun fixing it to use UTF-8.)

OK, after some fiddling with the configuration file I got the new ACS Java checkout up and running. Now to merge in the code changes I had made previously. Another problem: now not only the ACS core, but also the CMS is from a weekly binary distribution. This means I can't modify CMS and recompile. Since the page I was developing (list of all categories in a particular content section) was an extension to CMS functionality, and it needs to be implemented as part of a content section, and it gets placed into the site map by adding code to the CMS initializer, this is very bad. I would like to implement my code as something separate from the CMS team's distribution, but I don't know if this is possible. I'll contact the people making the weekly distributions and ask if it's possible to get a buildable source tree corresponding to each build--with decent version control this should be easy.

Thursday: Received instructions on how to check out the formal builds' source from Perforce. Managed to set that up and check out a source tree. I have yet to build it.

Spent most of the day setting up HTA's box. I thought we would be able to use a dump of the old tablespace, but it generates errors on loading into Oracle (`invalid number of columns'), and this is in ACS stuff rather than in any code we wrote. The resulting installation didn't work, so I installed a new tablespace and a new ACS 4.2 Tcl installation. Then Mark told me that the instructions on ArsDigita's website about setting up Oracle are completely wrong, so I set up a third tablespace and installed the ACS again. Eventually I got a working but blank ACS installation.

Now the task is to check out the source tree and install it. It's necessary to run the packages through the Package Manager's installer in order to convince the ACS that it has a package, mount it in the site map, load its data model and so on.

To do this I need to generate package tarballs from a running installation, which we have on the development server. But before doing that, I need to tidy up the dev tree--there are still a few changes (and whole files) that haven't been committed to version control.

I generated the tarballs, copied them across to the new server and installed them. Then used the deployment script we wrote to make some new instances of the HTA site (this needed some additional modules like bboard).

Friday: continuing to set up HTA. Installed the service packages like workflow, simple-pics and imgmaps, and got the deployment script working. But ImageMagick needs to be installed. It has lots of dependencies (various libraries for different image file formats), none of which is provided by the stripped-down Linux installed on HTA's box. Now I know what Mark felt like when installing ImageMagick on Solaris. After some futzing with compiling a newer rpm package (which itself required a newer version of the RPM package manager, chicken and egg, which meant I had to install a binary from Red Hat, which didn't work, so I had to downgrade again, etc...) I rebuilt all the libraries and ImageMagick itself from their source packages and installed them.

Saturday: finished off HTA, by committing yesterday's bugfixes to version control. We will move office on Monday, and this week I've been organizing a removal firm and doing (a little) packing. But that's not something that needs detailed coverage in a logbook.

Issues this week: Metrics are always fallible, but maybe there should be a commonly used metric that stipulates that any class cannot depend on more than two other classes. I have spent too long chasing the trail of `to get an instance of X you need an instance of Y, and to get one of those call a method on an instance of Z, ...'.

Week beginning 2001-07-29

Monday: moving office, the network is not functional, did nothing.

Tuesday: HTA are not entirely happy with the site I gave them, because it didn't include any of the old data (I told them to enter it from scratch themselves). That was because the Oracle import of the old tablespace failed mysteriously, and even the sysadmin Mark didn't understand what the problem was. Rather than try to debug it I had tried to make it into a virtue by insisting that the customer should fulfil their responsibility and enter data themselves. We write the code, they enter the content. However it does look a bit silly to lose everything when moving from one server to another, so it would be better to restore the old tablespace.

It turns out that the bits which failed to import are unused views and triggers from an old version of the project calendar page, based on CMS, which isn't used any more. (I generalized it and merged it with the `upload some text' pages.) So we did another import, just dropping these views, and it worked.

Now to get back to where I was on APLAWS: debugging the Bebop Tree component to find out why it refuses to expand. I think I will sync to the latest formal build first.

Wednesday: Change of plan: since the tree component is not an essential requirement--a first step I wanted to do in order to familiarize myself with categories--I won't bother with it. Instead, I'll go straight on to writing the category dispatcher. This is something which takes a URL and maps it onto the category tree: so a URL of /Housing/Flats/ would display all content items in the category Flats, which is a subcategory of Housing.

First we need to sort out the CVS checkout, starting by trying to get build 13 working. That didn't go too well, but the ACS Java 4.6 release was made recently, so we went with that. Managed to get a server up and running; now to start handling the category-type URLs.

The way to do this is to write a new dispatcher class and tell the servlet container (Tomcat) to mount this at the root of the site, /. Basically we want category URLs to be mixed in together with the URLs for mounted packages and content sections--there could be namespace collisions between a category and a package with the same name (as is already the case with Housing), but never mind.

We can make a derived class of the existing root Dispatcher, called CategoryDispatcher, which overrides the dispatch() method so that it first checks whether the current URL matches something in the category tree. The algorithm is simple:

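While there is some URL left: if its first element matches a child of the current category, move down to that child and chop that element off the URL; otherwise stop.

If every element of the URL matches a category at that point in the tree (so the remaining string is empty), then display a list of all content items in that category. Not sure yet how this will be done.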

Pair programming with Ash, we managed to create a new dispatcher class and make the server use it. So far all it does is a few System.out.println() statements.

Thursday: continuing work on the category dispatcher. I need to find the root category for the whole site, but I'm not sure if there is such a thing. All the code I've seen suggests that there is a root category for each content section, but we have several content sections.

We discussed how to navigate through the content items on the site. There is already a hierarchy of content folders, which can contain other folders or content items. Permissioning, workflow (what happens to an article before it gets published, like whether it requires approval) and lifecycle (what happens after the article is published, like archiving and expiry) are managed on the basis of content folders. So it's like a conventional directory structure.

But we also want trees of categories, which arrange items according to their content. For readers of the site (rather than those creating content), this is a more useful view. We want the navigational aids on the site to show the category hierarchy. But because an item can belong to more than one category, and because there's no constraint that items in the same category have to be differently named (unlike content folders, which enforce giving different names to two items in the same folder), it is difficult to uniquely reference a content item (from its URL) simply by listing categories and its name. We decide that where this is ambiguous, we'll append the globally-unique database object identifier to the name, to make sure that the URL can refer unambiguously to this item.
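A sketch of that naming rule, assuming the object id is simply appended after a hyphen (the separator, and the ItemUrls and Item names, are my own inventions for illustration; we had not settled the exact URL format):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that builds item URLs within one category. */
final class ItemUrls {
    /** Hypothetical item: a display name plus the database object id. */
    static final class Item {
        final String name;
        final long oid;
        Item(String name, long oid) { this.name = name; this.oid = oid; }
    }

    /** Returns one URL per item, appending the oid only where names clash. */
    static List<String> urlsFor(String categoryPath, List<Item> items) {
        // Count how often each name occurs within this category.
        Map<String, Integer> nameCounts = new HashMap<>();
        for (Item item : items) {
            nameCounts.merge(item.name, 1, Integer::sum);
        }

        List<String> urls = new ArrayList<>();
        for (Item item : items) {
            boolean ambiguous = nameCounts.get(item.name) > 1;
            urls.add(categoryPath + item.name + (ambiguous ? "-" + item.oid : ""));
        }
        return urls;
    }
}
```

One drawback visible even in the sketch: an item's URL changes as soon as a second item with the same name is added to the category.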

We need the categories to appear in the URL so that the resulting page knows what navigational aid to display--the navigation stuff will vary depending on how the user reached that particular item, and because an item can appear in several different categories, there are several different possible navigation trees. So the cleanest thing is to put the whole tree in the URL, and this is required anyway in order to have standard URLs for particular sets of content.

We considered doing away with content folders altogether, and just using categories for everything, but this is problematic because we wouldn't want the internal administrative details of permissions and lifecycle (which are managed with content folders) to appear to the public as categories. We would need to have two kinds of category, publicly visible ones and private ones. Also, the team working on content folders is in Munich, while the categorization people are working in Boston, Massachusetts. It could be difficult to persuade one group or the other to give up their scheme in favour of a single unified way of organizing content items.

Anyway, I still need to make a dispatcher which maps a URL to a location in the category tree. The first thing I need to find is the root category, or the root of the category tree. It's not clear how to do that. In the end we selected all categories and picked the one that happens to be named `root'. This should work reasonably well for now. We updated the Tomcat configuration to start using our new dispatcher class for all URLs (in other words, mounting it at /) and it gets called on every request. It's a simple string parsing to check whether this request matches a child of the current root category; if it does, we move down the category tree and chop off the first section of the URL.
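The string parsing amounts to something like the following sketch. CategoryNode and CategoryUrlMatcher are illustrative names only, and the real category objects come from the database rather than an in-memory map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory category with named children. */
final class CategoryNode {
    final String name;
    final Map<String, CategoryNode> children = new LinkedHashMap<>();

    CategoryNode(String name) { this.name = name; }

    /** Adds and returns a child so that calls can be chained. */
    CategoryNode addChild(String childName) {
        CategoryNode child = new CategoryNode(childName);
        children.put(childName, child);
        return child;
    }
}

final class CategoryUrlMatcher {
    /** Returns the category a URL names, or empty if any segment fails to match. */
    static Optional<CategoryNode> match(CategoryNode root, String url) {
        CategoryNode current = root;
        for (String segment : url.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the empty piece before a leading slash
            }
            CategoryNode child = current.children.get(segment);
            if (child == null) {
                return Optional.empty(); // not a category URL
            }
            current = child; // move down the tree, one segment consumed
        }
        return Optional.of(current);
    }
}
```

With a tree containing Housing and, beneath it, Flats, the URL /Housing/Flats/ resolves to the Flats node; a URL with any unknown segment yields an empty result, at which point the request could be handed on to the ordinary dispatcher.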

Friday: there are some problems with getting a page served from the dispatcher (runtime casting exceptions in the core code). This turns out to be because we're not a site node but instead we are trying to serve pages straight from the root dispatcher, which means that the necessary `site node context' isn't set up. Until we find out how to work around this, I'll mount the dispatcher at /categories/ as a proper site node. This involves writing an initializer class which will add it at that point.

Managed to get a display of all CategorizedObjects in a particular category. The problem now is working out what methods to call to get meaningful information about them and making a link to the URLs they're actually published at.

Issues this week: What role does the filesystem layout have to play in determining the layout of a website? In ACS Tcl you would create an ADP template and corresponding Tcl file, and it would appear in the site with a URL corresponding to the directory structure. In ACS Java the way to add new pages is to write code extending the dispatcher. I can't help feeling that the filesystem is even today the more obvious and bulletproof way of doing things like this; it makes it clear where the code for a particular URL resides and should scale better to large numbers of developers (each can work on a particular page in its own file, there is no central file that needs editing for each change to the URL tree). People have been predicting the end of source trees arranged as individual files for a long while now, but even most integrated development environments still work with an old-fashioned source tree. It's certainly possible to make a hierarchical structure without letting it map to the filesystem, but it seems difficult to do it well. (As we have started to learn already with the category tree!)

Week beginning 2001-08-05

Monday: added some more stuff to the CategoryDispatcher: links to parent and child categories, and links to display each item. Although at present the links to display an item aren't working, because I don't know how to do that. I should look at a content section, find a page displaying some content, and figure out how it works. Although since at the moment we don't have a working content section that actually displays things, this could be difficult.

It isn't possible to display a content item without having a content section, because in order for the CMS components to generate XML, they need to call CMSDispatcher.getXMLGenerator() with the current content section as a parameter. It seems that the CMS intended way of serving items based on category is to write an ItemResolver for a content section. But that would be tied in to one particular section, and what we want is something applying to all of them. Filed a bug report in the SDM (Software Development Manager, a replacement for the old ticket tracker).

Tuesday: to work around the NullPointerException in CMS caused by not having a current content section, I need to find some other way of displaying the item. Because the source code we have for core and CMS doesn't actually compile, I need to work around the problem before calling off into code I can't patch.

I try creating a ContentItemPage directly and calling its service() method. That also dies with a NullPointerException in CMS, but for a different reason. Something to do with an ACSObjectSelectionModel (a class which keeps track of which of a set of objects is selected) returning null, meaning that nothing is selected, but this null return value is not checked for by the CMS code, so it falls over and dies. Filed another couple of bug reports.

(This is a coding standards issue: the API should clearly document whether a parameter may be null. Java's type system allows you to declare the types of parameters to a method, but doesn't let you say whether or not null would be an acceptable value. So you have to specify that as part of the documentation--maybe as @pre preconditions, which at present are not automatically checked. Similarly, it should be documented whether the return value from a method may be null, so the caller knows to check it before trying to call methods on it. Finally, a return value of null should not be used to signal an exceptional condition; Java's exceptions should be used instead.)
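As an illustration of those three points, here is how a selection class could state its null contract. This is a sketch in present-day Java (Optional and Objects.requireNonNull did not exist in the JDK we were using), and SelectionSketch is a made-up class, not the real ACSObjectSelectionModel.

```java
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical selection model that never hands null to its callers. */
final class SelectionSketch {
    private final List<String> items;
    private Integer selectedIndex; // null internally means "nothing selected"

    /** @param items the selectable objects; must not be null */
    SelectionSketch(List<String> items) {
        // Fail at the boundary rather than several levels down the call stack.
        this.items = Objects.requireNonNull(items, "items must not be null");
    }

    void select(int index) {
        if (index < 0 || index >= items.size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        selectedIndex = index;
    }

    /** Never returns null; the result is empty when nothing is selected. */
    Optional<String> selected() {
        return selectedIndex == null ? Optional.empty() : Optional.of(items.get(selectedIndex));
    }

    /** For callers that treat "nothing selected" as an error: throws instead of returning null. */
    String requireSelected() {
        return selected().orElseThrow(() -> new NoSuchElementException("nothing is selected"));
    }
}
```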

Wednesday: still no response to any of the SDM bug reports, so I'll have to find a third way of displaying this item. I think the best course of action might be to reimplement ContentItemPage, or at least make a derived class, and override the service() method. Then I'll have control over what gets called--although again, once I do call into core or CMS the thread of execution is out of my hands and it might do the wrong thing. The NullPointerExceptions I'm trying to avoid happen several levels deep in the call stack, so I might end up reimplementing quite a lot of classes in order to avoid them. But it looks doable.

I come across a problem with Java's private access specifier: it makes it very hard to write derived classes without cutting and pasting large chunks of code. The derived class cannot set private variables or call private methods in the base class, so if you're trying to override a method which does this, you're stuck--even if the method being overridden is public. The only option is to create new variables and methods in the derived class, but changes to these values won't be picked up by code in the base class, so you end up having to override every single public method to make just one change.
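A toy version of the problem, with invented classes BasePageSketch and DerivedPageSketch standing in for the CMS ones:

```java
/** Hypothetical base class mixing private and protected members. */
class BasePageSketch {
    private String title = "Item";          // invisible to subclasses
    protected String footer = "ArsDigita";  // visible to subclasses

    private String decorate(String s) { return "[" + s + "]"; }

    public String render() { return decorate(title) + " " + footer; }
}

class DerivedPageSketch extends BasePageSketch {
    DerivedPageSketch() {
        footer = "APLAWS";    // fine: footer is protected
        // title = "Article"; // would not compile: title is private in the base class
    }

    // To change the title, the whole method has to be reimplemented,
    // including a pasted copy of what the private decorate() helper did.
    @Override
    public String render() { return "[" + "Article" + "] " + footer; }
}
```

Changing the protected footer takes one line, while changing the private title means duplicating render() and the logic of its private helper.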

I suggested using protected instead, which allows users some flexibility to make derived classes which use most of the code from the base class, but the official position is that protected methods are considered part of the published API, so changing something from private to protected means you have to officially support it.

While it is a good thing to clearly define the boundary between interface and implementation, it's a bad idea to enforce this with the compiler even when the programmer wishes otherwise. People, not computers, are in the best position to decide whether to take the risk of using implementation-dependent code. While you might not want to do that for production code, it is certainly useful for debugging.

In the end I had to cut and paste most of the code from the base class in order to write a derived class that overrides one method.

Thursday: continuing with trying to display an Article. Ran into another bondage-and-discipline problem: the locking of Bebop pages. The ContentItemPage class calls lock() on itself in the constructor, so once it is built you cannot change it--not even to write a derived class which adds some small extra feature to the page.
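The effect of locking in the constructor can be reproduced in a few lines. These classes are invented for illustration and are much simpler than the real Bebop Page and ContentItemPage.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical page that refuses changes once locked. */
class LockablePageSketch {
    private final List<String> components = new ArrayList<>();
    private boolean locked;

    public void add(String component) {
        if (locked) {
            throw new IllegalStateException("page is locked");
        }
        components.add(component);
    }

    public void lock() { locked = true; }

    public List<String> components() { return new ArrayList<>(components); }
}

/** Locks itself as the last act of its own constructor. */
class SelfLockingPageSketch extends LockablePageSketch {
    SelfLockingPageSketch() {
        add("item-display");
        lock(); // runs before any subclass constructor body
    }
}

/** A derived class trying to add one small extra feature. */
class ExtendedPageSketch extends SelfLockingPageSketch {
    ExtendedPageSketch() {
        // The superclass constructor has already locked the page, so this throws.
        add("category-breadcrumbs");
    }
}
```

Constructing an ExtendedPageSketch fails with IllegalStateException, because Java always runs the superclass constructor first. Leaving the lock() call to whoever builds the page would avoid this.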

Friday: A working solution was to override the getContentSection() method to return a dummy content section (a fresh instance of SimpleContentSection) which could be used to display the page. This trick worked for both ContentItemPage (the admin interface to a content item) and MasterPage (a simple display of the item).

Got some responses to my bug reports from the CMS team: they suggest calling the ContentSection.getContentSection() static method to find out the content section an item comes from. I wish I had known this earlier.

With that out of the way, continuing to tidy up the pages to make them slicker and more demoable. What's wanted is a tree display--I should look again at the Bebop Tree component and see if I can make it work.

Issues this week: Two flaws in the Java language seemed to impede progress this week. Firstly, no clear way to document whether a parameter or return type may be null (if it may, then of course you need to document what that null value means). C++ has the distinction between pointers and references (which are never null); Java could use something similar. Secondly, the computer should not be telling me what to do; if I wish to override Java's access control for debugging purposes then that is my decision not the compiler's. I'd have liked to have time to investigate free Java compilers that let you disable these restrictions if desired.

Week beginning 2001-08-12

Monday: working on using XSLT to style the category pages. I need to make the XML element generated have a particular class--like `aplaws-categories'--and then write an XSL file which matches against this class and outputs whatever HTML is needed for a generic local-council look and feel. I ended up modifying the BoxPanel layout container to produce XML with the right class attribute set.

Spent quite some time trying to work out XSL's handling of entities. These are the magic sequences like &amp; (ampersand) or &nbsp; (non-breaking space) which appear in HTML documents. XML also has its own set of entities, but it is much smaller. Now if you include an &amp; in the XSL file, it gets treated as the ampersand entity, which is then output in HTML not as a literal ampersand, but re-escaped as &amp;. So what happens if you put &nbsp; in your XSL? Er, it breaks. &nbsp; is not a recognized XML entity. Since all you want is to insert the literal text &nbsp; in the output document, you might expect that escaping the initial ampersand would work--so &amp;nbsp; in the XML document. But this doesn't work because again, &amp; goes back to being &amp; in the HTML. (Confusingly, the Bebop debug pages show long sections of &amp;nbsp; in the XML and it seems to work fine. Perhaps their quoting is messed up or something.) In the end I resorted to the character code for non-breaking space, &#160;.
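The three cases can be tried out with the XSLT processor bundled with the JDK. This is a sketch: NbspInXslt is a throwaway helper of my own, the stylesheet is reduced to a single template, and other XSLT processors may report the error differently.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Optional;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper for trying out entity handling in a stylesheet. */
final class NbspInXslt {
    private static final String HEAD =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/\"><td>";
    private static final String TAIL = "</td></xsl:template></xsl:stylesheet>";

    /**
     * Puts the given text inside a one-template stylesheet and runs it against
     * a dummy document. Returns the output, or empty if the stylesheet cannot
     * be compiled or run.
     */
    static Optional<String> transform(String body) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(HEAD + body + TAIL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<page/>")), new StreamResult(out));
            return Optional.of(out.toString());
        } catch (TransformerException e) {
            return Optional.empty();
        }
    }
}
```

Passing the named non-breaking-space entity gives an empty result because the stylesheet is not well-formed XML; passing it with the ampersand escaped produces the literal entity text rather than a space; passing the numeric character reference for character 160 works, because numeric references need no declaration.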

So I got the browsing through categories pages styled correctly, now I have to do the same for the display of an item itself. The item display pages are not ones I construct myself, I get them from CMS code, so it might be harder to make them do what I want.

Tom A. has written a replacement ItemResolver which sets the class of MasterPages to `aplaws' before returning them, so you can use this ItemResolver instead of the CMS standard one and all items will be styled correctly.

Tuesday: cleaned up loose ends in the morning (wrote a document describing my work and how to demo it), left for holiday at noon. On holiday the rest of this week.

Issues this week: If you generate XML and transform it to HTML using XSLT, it is best to have the XML represent the meaning of the data. The XML generated from Bebop has things like BoxPanel which are specifying a particular layout, and that seems to miss the point.

Week beginning 2001-08-19

On holiday all week. Reading Stroustrup's book on C++. It sounds so appealing compared to Java. Also read Douglas Coupland's novel, Microserfs--which, when I think about it, reminds me a lot of this logbook. (It's in the form of a diary...)

Issues this week: is it necessary to use sun block when I'm already wearing a long-sleeved shirt and hat?

Week beginning 2001-08-26

Monday: on holiday.

Tuesday: got back around midday, spent time catching up and marvelling at my huge todo list which has accumulated. On the plus side, several of my bugs were marked `expected release 4.6.5', and this release has now been made, so maybe they actually got fixed! Unfortunately our team hasn't moved to that release yet.

Dusted off my development server, cvs updated and tried to get it running again. It doesn't work, but after a while I gave up because we will be starting from scratch next week.

We are gathering requirements and user stories--or rather we aren't. We are making up user stories ourselves because, supposedly, Camden don't have any clue about what they want. So I start work on answering a few requirements questions, trying to resist the temptation to give whatever answer will be easiest to implement. I am guessing at requirements without ever having met the customer.

Wednesday: we continued answering requirements questions, in the area of `administration'.

HTA contacted us, they have some problems with the house customizer (the web page that lets you choose what your house will look like once built). Apparently the images for the various combinations (`2 bedrooms for rent', etc.) are wrong and we should change them. With hindsight, the fancy customizer thingy where you can choose roof, first floor and ground floor separately was a mistake. It works by pasting together images for the different sections of a house, but many of the combinations are not actually buildable, so it's a lot of trouble to create a set of rules for what options can be chosen. It would be simpler to choose between a set of static images. Also HTA want to add two new options, pictures of blocks of flats, which really don't fit well into the existing setup.

The best option would be to remove the whole fancy pasting-together setup and go with static images. But we are trying to do as little work as possible, since HTA aren't paying us at the moment. So we just add a few more special-case rules and kludges to the existing page.

Thursday: worked some more on HTA (although we are not charging them, I wouldn't have much else to do). Went through all the changes that Tom made over the past couple of months and committed them to CVS, improving them where things could be done better. Also packaged up the latest versions of the reusable packages we made as part of the HTA project, and contacted the OpenACS people to see if they're interested (OpenACS is an openly developed offshoot of ACS Tcl--the internal version is not maintained any more).

Friday: discussed the answers to the requirements questions, fixed some more bugs for HTA.

Issues this week: The problems with the house customizer show that you should let all the content be provided by the users whenever possible. If we'd just gone for a simple design with a few fixed images and let the HTA people upload new ones then there wouldn't be any of these complaints about the pictures being wrong. At the time I wrote it I thought that we'd want to independently mix and match the three parts of the house like some kind of card game, but this turned out not to be the case because many combinations cannot actually be built!

Another important thing to remember is that you shouldn't be afraid to scrap code and replace it with something simpler. All the time we have worked on the house customizer since writing it I've had the nagging feeling that we ought to just throw it out and replace it with a simple selection of images. But there has never been time to do that: HTA haven't been paying for extra work, so the emphasis was on fixing important bugs as quickly as possible. And yet, if we had rewritten it earlier we would have saved a lot of time in the long run.

Week beginning 2001-09-02

Sunday: helped with the first ACS Java boot camp in London. Members of the public turn up and learn how to use our toolkit. I was around to answer questions and fix any problems that arose (although in the case of NullPointerExceptions buried deep within the code, I decided that since it usually took me half a day to track down and report such things, during a boot camp it would be better to sidestep the problem and just not call that method :-P).

Monday: This week people from Camden are coming round to gather requirements. But there are more of us than there are of them, and we didn't want them to feel intimidated, so a couple of people (including me) didn't attend the meeting. I worked on HTA instead--working out what changes we've made to the ACS core and submitting diffs to the OpenACS people.

Tuesday: today I actually did take part in gathering user stories. We are trying to get requirements for content and configuration of the site. About a dozen people from various local authorities came in the morning; we split up into groups and they talked about what they wanted. We made them write their thoughts down as user stories on bits of card, then in the afternoon we read through the stories, removed duplicates and estimated the time needed.

Gathering requirements is mostly a passive job; the topics for a particular day are set in advance but once you get going the customers are quite happy to babble away without too much prompting. You occasionally have to remind them to write things down or nudge them back on topic if the discussions start going off into outer space (also known as local government internal administration).

Wednesday: more user stories gathering, as yesterday. Today we discussed navigation and personalization. There was some dispute on whether it was acceptable to personalize content based on the age of the site user (many council services apply only to over 65s); some council employees felt this might offend some users.

Thursday: gathering user stories on accessibility--mostly, making sure that blind or partially-sighted users can read the site. This is really nothing more than good web practice which we do anyway (or would do, if the Bebop page layout system allowed more flexibility in what HTML is generated).

Then we estimated time requirements for all the stories. These usually have two parts: time taken to implement the feature assuming that the base ACS functionality is already written, and time to implement the base functionality itself. At least half of the user stories are `zero days' if the ACS packages like bboard are written and working correctly, and `several weeks' if not.

Friday: we've finished gathering the user stories but now we need to decide which ones to implement during the next phase of the project (the next four weeks). All the council employees came round for one last visit and we cleared up any ambiguities in stories they'd written.

By the afternoon we had information on user stories scattered across several different places: written on cards, in text files, in messages sent to members of the group and on a bulletin board. We rationalized things by putting all the information once and for all into a database table.

Issues this week: Programmers are always told to keep information in a single place and not to duplicate it, but we didn't apply this lesson to the user stories data. It wasn't until we eventually got fed up and put it all in the database that there was some sort of order. It is better when gathering data of any kind to think in advance about where you're going to store it.

Week beginning 2001-09-09

Monday: I won't be working on the APLAWS project for the next phase, and the other projects I could join haven't started yet, so I spent the day tidying up the database schema for user stories and working on the web interface to it. The web interface, along with the rest of the `project site' for APLAWS, is implemented in the old ACS Tcl. Also reading The Pragmatic Programmer.

Tuesday: continued with the user stories manager. It is starting to look quite nice; we've made an ACS package and started to generalize things so that other projects could reuse the code. The user stories database, as with the rest of the project site we use to interact with customers, is written in the older ACS Tcl rather than the new Java version. I had almost forgotten what it feels like to actually get some work done.

The initial database schema we chose was somewhat quick-and-dirty. For example, we stored the author of a user story as a string rather than a foreign key into the users table. It took about half a day to sort out this kind of thing; it's a bit awkward to match up strings and convert them to row ids, partly because it takes a while to write the SQL and partly because people tend to mistype their names. The moral of the story is that changes in the data model are difficult, much harder than changing code in an editor, so it's worth trying harder to get it right in the first place.
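The kind of clean-up described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database rather than Oracle, and the table names, column names and sample rows are all hypothetical. It adds a foreign key column, fills it by matching the name strings (ignoring case and stray spaces), and reports the names that match nobody, which are the mistyped ones that have to be fixed by hand.

```python
import sqlite3


def migrate_authors(conn):
    """Add an author_id column and fill it by matching author strings to users.

    Returns the author strings that matched no user, for fixing by hand.
    """
    conn.execute(
        "ALTER TABLE user_stories "
        "ADD COLUMN author_id INTEGER REFERENCES users(user_id)"
    )
    # Match on a normalized form of the name; unmatched rows stay NULL.
    conn.execute("""
        UPDATE user_stories
           SET author_id = (SELECT user_id FROM users
                             WHERE lower(trim(users.name))
                                 = lower(trim(user_stories.author)))
    """)
    rows = conn.execute(
        "SELECT DISTINCT author FROM user_stories "
        "WHERE author_id IS NULL ORDER BY author"
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    """Build a tiny example database (made-up data) and run the migration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE user_stories (
            story_id INTEGER PRIMARY KEY, title TEXT, author TEXT);
        INSERT INTO users (name) VALUES ('Alice Smith'), ('Bob Jones');
        INSERT INTO user_stories (title, author) VALUES
            ('Search the site', 'Alice Smith'),
            ('Upload a picture', ' alice smith'),
            ('Change password', 'Bob Jnoes');
    """)
    return conn, migrate_authors(conn)
```

Once the unmatched names have been corrected and every row has an author_id, the old string column can be dropped.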

Rest of week: continued work on user stories package and read some books. Not much is happening at the moment.

Issues this week: Our database of user stories was certainly a big improvement on having them in several files and scribbled on bits of card. But it still suffered from short-sighted design: my decision to type user names in as strings rather than use foreign keys. That wasted time in the long run. It seems that changing database table definitions is a real pain, so it's worth spending some time getting them right before you start loading data.

Week beginning 2001-09-16

Did some more enhancements to the user stories package, giving different ways to sort and view stories. Also fixed some bugs discovered in HTA's site. But again, not a lot happened this week.

Issues this week: We wrote a large amount of fairly trivial code to select user stories from the database and order them by one of several columns. It sounds like this could be factored out, not just for user stories but in general: by defining your type, saying what the fields are and then using generic code to browse through all instances, sort them and edit them. That, I suppose, is what a CMS does. But in fact we were more productive implementing these things from scratch (with only a thin templating and utility layer to help us) than when working with the much more full-featured ACS Java toolkit. There seems to be a conflict between the principles of `don't repeat yourself' (generalize code and reuse it) and `keep it simple'.
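The factoring-out imagined above might look something like this minimal sketch. It is written in Python rather than Tcl, and the UserStory type and its fields are assumptions made up for the example: you declare the fields once, and a single generic function sorts instances of any such type by a named column.

```python
from dataclasses import dataclass, fields


@dataclass
class UserStory:
    # Hypothetical fields; the real table's columns are not listed in this logbook.
    title: str
    author: str
    estimate_days: float
    priority: int


def sort_instances(items, column, descending=False):
    """Sort dataclass instances by one named field.

    The column is checked against the declared fields, so a name
    taken from a URL parameter cannot reach arbitrary attributes.
    """
    if not items:
        return []
    allowed = {f.name for f in fields(items[0])}
    if column not in allowed:
        raise ValueError(f"cannot sort by unknown column: {column!r}")
    return sorted(items, key=lambda item: getattr(item, column),
                  reverse=descending)
```

The generic version is short here, but it says nothing about display, editing or permissions, which is where a full toolkit starts to grow and the tension with `keep it simple' appears.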

Week beginning 2001-09-23

Monday: another project involving Camden council is starting. Whereas the APLAWS project is to build outward-facing sites for many councils, this is a smaller project to build an internal site (`intranet') for this council. We are gathering user stories as we did for APLAWS. Went to the Camden office and had a meeting with about a dozen people; one of them had participated in the requirements gathering for APLAWS so she could explain to the others what was needed. We got a huge number of stories written down on little cards--some of the council employees must have been making a list in advance. In the afternoon we categorized them and thought of time estimates.

Tuesday: another council project starting, this time for Harrow council, which involves even more travelling than Camden. But the experience was very different. Rather than excitable junior employees rapidly scribbling out user stories, we got a four hour meeting during which representatives of various departments droned on about possible legal issues (which they know nothing about--the legal department was not present). On the bright side, this means there are fewer user stories to type up later.

Wednesday: went to Harrow again for more user stories. There are hundreds of official performance indicators for local government (number of visits to sports centres per thousand inhabitants, number of queries answered within seven days, and so on) and they would like a way to gather together this data and generate reports. But actually calculating the measurements based on raw data is a lot harder--potentially involving new programming effort for each new metric--and we probably won't have time to do that.

There seems to be a hard scalability bottleneck in the number of developers at a meeting. The limit is two: if you have more than that, the less important ones will be pushed off the bottom of the list and not say much during the meeting. This happened to me on Tuesday when I was one of three people from ArsDigita; on Wednesday there was only one other developer there (Ash) so I could participate a lot more (I tried to keep things on topic by reminding the Harrow people that internal policy decisions don't need to be part of the user requirements for the software). When gathering user stories you can often overcome the scalability problem by splitting the meeting into several groups with one or two developers in each group.

Thursday: today is my penultimate day, so I started tidying my home directory and seeing what needs to be archived. I also tidied up this logbook, converting it into LaTeX format from plain text, and adding the `issues this week' section. When I originally wrote each weekly or daily update, I tried to explain technical things to someone who wasn't familiar with our toolkit, but mostly it ended up being verbose or awkward or both. Hopefully the `issues' will be a readable summary.

Friday: proofreading my logbook and adding more issue summaries. The style in which I wrote the entries was fairly informal, and this final printed version is also informal but perhaps not quite so much. There are some stylistic features which I have tended to overuse; in particular, I have used `unnecessary' quotation marks all over the place. Today I wrote a short lint tool for English text, called englint, which checks for this fault in my writing, and also warns about the frequency of a couple of words I overuse: `just' and `although'. I went through the document and toned down the writing style a little. I also made my lint program list all the acronyms so I could check that I'd defined all the uncommon ones before use.
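The three checks described above could be sketched roughly as follows. This is not the original englint, whose source and implementation language are not given here; it is a minimal Python illustration, and the function names and regular expressions are assumptions.

```python
import re
from collections import Counter

# The two overused words mentioned in the entry above.
OVERUSED = ("just", "although")


def overused_counts(text, words=OVERUSED):
    """Count how often each watched word appears, ignoring case."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {word: counts[word] for word in words}


def quoted_phrases(text):
    """Find LaTeX-style quotations, which open with ` and close with '.

    A rough check: a phrase containing an apostrophe is cut short there.
    """
    return re.findall(r"`([^`']+)'", text)


def acronyms(text):
    """List the distinct runs of two or more capital letters, sorted."""
    return sorted(set(re.findall(r"\b[A-Z]{2,}\b", text)))
```

Each function takes the whole document as one string, so a small wrapper that reads the file and prints the three results is all that is needed to use it from the command line.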

Issues this week: The user stories gathering at Harrow went much more slowly than it needed to because the people there didn't know what the internal policy for various aspects of the intranet would be. It might have been better to have junior staff, as at Camden: they are happy to say what they think without hesitation, and even if a lot of it is not going to be done, that can be weeded out in the prioritizing that happens after all the stories are gathered. Or you could talk to senior people who can clearly decide the project's aims (at Harrow there was much mention of `Malcolm', who apparently makes all the decisions but wasn't present at the meeting). But avoid those in the middle. It might also have been useful to just have fewer people present, since only three or four did most of the talking anyway.

About this document ...

This document was generated using the LaTeX2HTML translator Version 2K.1beta (1.56)

Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The command line arguments were:
latex2html -split 0 logbook.tex

The translation was initiated by Edward Avis on 2001-09-28


Edward Avis 2001-09-28