Achieving ACID transactions with common files

An embedded system that we developed needed a very safe method to store certain numbers, since those numbers would be routinely audited. Since those numbers were related to money values (like a cash register, which stores grand totals in a tamper-resistant memory card), the storage mechanism had to achieve the following goals:

To wrap up, the storage should achieve ACID transactions (atomic, consistent, isolated and durable). Since it was not possible to put a database inside this embedded system, the only tool available was the filesystem itself. We chose ext3 since it has journaling, which guarantees the ACIDness of file metadata.

One detail that simplifies things is that the data set to be stored is quite small, so it is perfectly affordable to write all data (i.e. all numbers) over and over again.

The technique described below is in current use by several thousand deployed appliances, with excellent results and with no data loss (except in cases of outright CompactFlash hardware failure).

Steps taken in EVERY storage update:

Steps taken to read data when appliance is turned on:

The transaction described above is atomic since the file is valid only if it is completely written. Should any interruption occur, the file will necessarily have an invalid or nonexistent MAC, and it is discarded.
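The validity check can be illustrated with a minimal sketch. It assumes HMAC-SHA256 as the MAC, a hypothetical secret key, and a made-up record layout (a count followed by unsigned 64-bit totals); none of these details come from the appliance itself.

```python
import hashlib
import hmac
import struct

# Hypothetical key and layout, chosen for illustration only.
SECRET_KEY = b"example-key-not-from-the-article"
MAC_LEN = hashlib.sha256().digest_size  # 32 bytes


def encode_record(totals):
    """Pack a list of unsigned 64-bit totals and append an HMAC-SHA256 tag."""
    payload = struct.pack("<I", len(totals))
    payload += struct.pack(f"<{len(totals)}Q", *totals)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return payload + tag


def decode_record(blob):
    """Return the list of totals, or None if the record is truncated or altered."""
    if len(blob) < 4 + MAC_LEN:
        return None  # too short to hold even an empty record
    payload, tag = blob[:-MAC_LEN], blob[-MAC_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # half-written or tampered data fails here
    (count,) = struct.unpack_from("<I", payload, 0)
    if len(payload) != 4 + 8 * count:
        return None
    return list(struct.unpack_from(f"<{count}Q", payload, 4))
```

A record cut short by a power failure loses part of its payload or tag, so `decode_record` returns `None` and the caller can treat the file as if it did not exist.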

The journaled filesystem is a key component here since it guarantees that file metadata is never half-written. The file will never be left in an unreadable state or become impossible to overwrite. It may well contain corrupted data, but this is detected by the MAC. The scheme would still work with a non-journaled filesystem, but the possibility of corrupted metadata would make it more likely that the appliance needs a technician instead of correcting itself.

The transaction is also consistent, since the MAC check guarantees that only complete, unmodified data is accepted. A well-chosen MAC algorithm will also protect against fraud to some extent, making it more difficult for a tamperer to put a fake file in place of the legitimate one.

The transaction is naturally isolated since only one process is in charge of writing the files. No software measure was needed to further guarantee the isolation.

And finally, the transaction is durable since it is written in synchronous mode. The write system call returns only when the file lands on flash. The application is then completely sure that file data is safe. In asynchronous (normal) mode, actual writing to flash may (and will) be delayed for a long time by the operating system, and writes may well happen in a different order than issued by the application. Most databases employ synchronous file operations to guarantee durability of transactions.
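As a rough illustration of synchronous writing on a POSIX system, the sketch below opens the file with `O_SYNC` where the platform offers it and also calls `fsync` before closing. The function name `write_durably` and the file permissions are assumptions made for this example, not the appliance's actual code.

```python
import os


def write_durably(path, data):
    """Overwrite `path` with `data` and return only after the OS reports it flushed."""
    # O_SYNC makes each write() wait for the storage device; it is not
    # available on every platform, so fall back to 0 and rely on fsync().
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # write() may accept fewer bytes than given
            view = view[written:]
        os.fsync(fd)  # flush anything still buffered by the kernel
    finally:
        os.close(fd)
```

On Linux, a newly created file may additionally need an `fsync` on its containing directory before the directory entry itself is guaranteed to survive a crash. Whether the storage hardware honors the flush is outside the control of this code.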

Two or more files must be employed in this scheme in order to guarantee durability. If we employed only one file, we could lose the old data while trying to write the new data. A failure at exactly the "right" moment would cost us both versions of the data forever. (This problem is analogous to someone who makes backups on the same DVD-RW every day: there is no backup at all while the new one is being burnt.) Although we employed two files, three or more would work as well. Since our CompactFlash was slow, we chose the minimum.
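One possible way to arrange the two files is sketched below. It assumes each record carries a sequence number so that the newest valid file can be identified, and it always overwrites the other file, so the last good copy is never touched. The sequence number, the HMAC-SHA256 tag, the key, and the single-total layout are all assumptions of this sketch rather than details of the deployed appliance.

```python
import hashlib
import hmac
import os
import struct

KEY = b"example-key"  # hypothetical secret, for illustration only
MAC_LEN = 32          # HMAC-SHA256 tag length
PAYLOAD_LEN = 16      # sequence number + total, both unsigned 64-bit


def _seal(seq, total):
    payload = struct.pack("<QQ", seq, total)
    return payload + hmac.new(KEY, payload, hashlib.sha256).digest()


def _load(path):
    """Return (seq, total), or None if the file is missing or fails the MAC."""
    try:
        with open(path, "rb") as f:
            blob = f.read()
    except OSError:
        return None
    if len(blob) != PAYLOAD_LEN + MAC_LEN:
        return None
    payload, tag = blob[:PAYLOAD_LEN], blob[PAYLOAD_LEN:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return struct.unpack("<QQ", payload)


def read_latest(paths):
    """Return (file_index, seq, total) for the newest valid file, or None."""
    best = None
    for index, path in enumerate(paths):
        record = _load(path)
        if record is not None and (best is None or record[0] > best[1]):
            best = (index, record[0], record[1])
    return best


def store(paths, total):
    """Write `total` to the file that does NOT hold the newest valid record."""
    best = read_latest(paths)
    if best is None:
        target, seq = 0, 1
    else:
        target, seq = (best[0] + 1) % len(paths), best[1] + 1
    fd = os.open(paths[target], os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, _seal(seq, total))
        os.fsync(fd)  # do not report success until the data is flushed
    finally:
        os.close(fd)
```

If power is lost while `store` is running, the target file fails its MAC check on the next boot and `read_latest` falls back to the untouched older file. The same code works with three or more paths.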

Instead of alternating between files as we did, we could also use a single file that is always written in append mode, as a redo log. Old data is never rewritten. This scheme is also ACID and allows for rollbacks as well as audits of past snapshots of the data.
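A redo log along these lines might look like the following sketch, which assumes fixed-size records, each holding one total and its own HMAC-SHA256 tag. The key, record layout, and function names are hypothetical.

```python
import hashlib
import hmac
import os
import struct

KEY = b"example-key"  # hypothetical secret, for illustration only
REC_LEN = 8 + 32      # one unsigned 64-bit total + HMAC-SHA256 tag


def append_total(path, total):
    """Append one sealed record to the log; earlier records are never rewritten."""
    payload = struct.pack("<Q", total)
    record = payload + hmac.new(KEY, payload, hashlib.sha256).digest()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record)
        os.fsync(fd)
    finally:
        os.close(fd)


def replay(path):
    """Return every valid total in order, stopping at the first damaged record."""
    totals = []
    try:
        with open(path, "rb") as f:
            while True:
                record = f.read(REC_LEN)
                if len(record) < REC_LEN:
                    break  # end of log, or a record cut short by a failure
                payload, tag = record[:8], record[8:]
                expected = hmac.new(KEY, payload, hashlib.sha256).digest()
                if not hmac.compare_digest(tag, expected):
                    break
                totals.append(struct.unpack("<Q", payload)[0])
    except FileNotFoundError:
        pass
    return totals
```

The last element returned by `replay` is the current value, and the earlier ones form the audit history. A practical version would also truncate a damaged tail before appending again, so that later records stay aligned; that step is omitted here for brevity.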

Our appliance did not need to keep the data history on the device itself, since the grand totals are transcribed and audited by humans daily. In addition, our CompactFlash was small and the numbers were updated frequently (several times a minute). A redo log was therefore not suitable for our appliance, mainly because of the filesystem space it would require: the appliance should be able to work unattended for years. The two-file approach was enough.