This article originally appeared on Logicworks' old blog.
Logicworks' new blog can be found at www.gatheringclouds.com.
Logicworks' new blog can be found at www.gatheringclouds.com.
In my previous post I discussed a puzzle I had discovered in a reverse engineering project here at Logicworks. The goal of the project was to keep our closed source accounting system's invoices and purchase orders synchronized with our ticketing, inventory, and CMS system, LogicOps. The specific puzzle was the lack of a primary key in the database tables I was querying and my tentative solution was to identify a set of columns that when taken as a group (concatenated, really) could be guaranteed to be unique. After some analysis with quick Perl scripts, I had identified a set of four columns that I could use to create this composite key, the account ID, the invoice ID, a line-item sequence ID, and a component sequence ID.
Here's an example of the rows for an account, with a quantity and product name column added for clarity:
| account_id | invoice_id | line_item_sequence_id | component_sequence_id | quantity | product_name |
| 012015 | R27246 | 212992 | 0 | 225.0 | Backup Committment |
| 012015 | R27246 | 221184 | 0 | 1.0 | BW committment for server (1mb/s) |
| 012015 | R27246 | 245760 | 0 | 1.0 | Dedicated Xeon SCSI server |
| 012015 | R27246 | 245760 | 49152 | 1.0 | Server Chassis: dual socket Xeon (5000/5100/5300) dual/quad-core, 4-bay SCSI |
| 012015 | R27246 | 245760 | 65536 | 1.0 | Intel Xeon 5410 2.33Ghz 1333FSB 12MB qual-core |
| 012015 | R27246 | 245760 | 81920 | 2.0 | 2GB DDRII 667 FBDIMM |
| 012015 | R27246 | 245760 | 98304 | 2.0 | 73GB SCSI 10K RPM hard drive |
| 012015 | R27246 | 245760 | 114688 | 2.0 | 300GB SCSI 10K RPM hard drive |
| 012015 | R27246 | 245760 | 131072 | 1.0 | Microsoft Windows 2008 Enterprise |
| 012015 | R27246 | 262144 | 0 | 1.0 | Shared loadbalancing service |
| 012015 | R27246 | 278528 | 0 | 1.0 | Shared firewall service |
| 012015 | R27246 | 311296 | 0 | 1.0 | Dedicated Xeon SCSI server |
| 012015 | R27246 | 311296 | 49152 | 1.0 | Server Chassis: dual socket Xeon (5000/5100/5300) dual/quad-core, 4-bay SCSI |
| 012015 | R27246 | 311296 | 65536 | 1.0 | Intel Xeon 5410 2.33Ghz 1333FSB 12MB qual-core |
| 012015 | R27246 | 311296 | 81920 | 2.0 | 2GB DDRII 667 FBDIMM |
| 012015 | R27246 | 311296 | 98304 | 2.0 | 73GB SCSI 10K RPM hard drive |
| 012015 | R27246 | 311296 | 114688 | 2.0 | 300GB SCSI 10K RPM hard drive |
| 012015 | R27246 | 311296 | 131072 | 1.0 | Microsoft Windows 2008 Enterprise |
| 012015 | R27246 | 327680 | 0 | 1.0 | Shared loadbalancing service |
| 012015 | R27246 | 344064 | 0 | 1.0 | Shared firewall service |
The
account_id and invoice_id columns are obvious enough. Some querying confirmed that these are foreign keys to other tables in the database, even through they were not explicitly defined as such in the database schema. But what are the line_item_sequence_id and component_sequence_id columns? This is one of the perils of reverse engineering. You may find a solution, but you may never understand why something is the way it is. Ideally you'll be able to take the time to put to rest any mysteries that remain in your solution. For this puzzle, it helped to understand these columns' purpose by taking a look at how this customer's invoice looked:
| QTY | Product Description |
| 225 | Backup Committment |
| 1 | BW committment for server (1mb/s) |
| 1 | Dedicated Xeon SCSI server |
| 1 | Server Chassis: dual socket Xeon (5000/5100/5300) dual/quad-core, 4-bay SCSI |
| 1 | Intel Xeon 5410 2.33Ghz 1333FSB 12MB qual-core |
| 2 | 2GB DDRII 667 FBDIMM |
| 2 | 73GB SCSI 10K RPM hard drive |
| 2 | 300GB SCSI 10K RPM hard drive |
| 1 | Microsoft Windows 2008 Enterprise |
| 1 | Shared loadbalancing service |
| 1 | Shared firewall service |
| 1 | Dedicated Xeon SCSI server |
| 1 | Server Chassis: dual socket Xeon (5000/5100/5300) dual/quad-core, 4-bay SCSI |
| 1 | Intel Xeon 5410 2.33Ghz 1333FSB 12MB qual-core |
| 2 | 2GB DDRII 667 FBDIMM |
| 2 | 73GB SCSI 10K RPM hard drive |
| 2 | 300GB SCSI 10K RPM hard drive |
| 1 | Microsoft Windows 2008 Enterprise |
| 1 | Shared loadbalancing service |
| 1 | Shared firewall service |
Notice how these invoice line-items are grouped? Compare the set of components under one of the chassis' to the corresponding database rows. It appears that grouping of line-items together results in them having the same
line_item_sequence_id. Furthermore, the "parent" line item in a group has a component_sequence_id of zero, such as the "Dedicated Xeon SCSI server" rows. Finally, it appears that the component_sequence_id and line_item_sequence_id together provide an ordering for the invoice with component_sequence_id ordering the higher level line items and line_item_sequence_id ordering the nested line items. This mystery is unraveling! Those two columns are used for grouping and ordering the line items within an invoice.
So it appears that the developers of this system did some mixing of presentation and the data model in their design. But that's understandable here since the layout of the line items in an invoice can be considered an essential characteristic of the invoice. The big remaining mystery is the actual values for
line_item_sequence_id and component_sequence_id. Why are they so huge? Why not just use 0, 1, 2, 3... and so on? The values almost look random, but to a programmers' eye some of the values are quite conspicuous. At this point I had a solution that I felt I could move forward with, but I didn't feel entirely satisfied that I really understood it. The mystery of the strange values for those two columns could be innocuous, or it could be a ticking timebomb that I'll only discover after weeks of erroneous synchronization fouling up the LogicOps database. I felt like I needed to understand those numbers, so I sat back and started contemplating what might've been the motivations of the developers of this software. In the next post I'll explain what I discovered.
Write a comment
* = required field