Hierarchical Storage Management
Archiving scanned documents could be described as the last phase of an EDMS (Electronic Document Management System), but just as easily as its most important part. With the large quantity of documents we have to handle on a day-to-day basis, an adequate archiving system plays a crucial role, for it is practically impossible to track all of the (scanned) document files “manually”. The degree to which business systems depend on document archiving varies, primarily with the nature and organizational structure of each system. However, we would be hard pressed to find an activity whose archived documents will never be needed again. Therefore, I’ll devote a few more lines to this field of work.
To meet the growing need for disk capacity, special products that allow Hierarchical Storage Management (HSM) of data (files) were developed and in use long before the introduction of personal computers. These HSM systems ensure optimum use of storage media and convenient access to files, that is to say, documents.
A multilevel archiving system can be organized to match a hierarchy of storage devices, on which the HSM places each file according to its level of usage, i.e. access frequency. The most important advantage of this approach is that users do not need to keep track of where a certain file is located, for the HSM system automatically moves files through the various levels (hard disk, jukebox, external devices, etc.).
An HSM system simulates the classic archiving method, where users keep the most frequently used documents on their desks, while the “outdated” ones are stored in file cabinets or basements. The difference between the manual and the automatic system is that the user doesn’t need to take care of fetching documents from the archive, or of filing them back once they are no longer needed. This avoids the otherwise common loss of documents and/or unnecessary photocopying when the same document is required by several users. By using various media (more or less expensive storage devices) we can then set up a kind of (electronic) archive in which the documents “travel” from one medium to another, depending on how frequently they are used.
To illustrate the HSM system, let’s imagine a pyramid (see picture) that represents the various archiving media. The pyramid illustrates each archiving level, namely:
the hard disk in the HSM server (one or more so-called partitions)
the hard disk on a SAN or NAS (Storage Area Network or Network Attached Storage)
the jukebox for magneto-optical disks (MO disks) (the number of volumes depends on the number of disks)
the stand-alone external drive for MO disks
the tape drive and/or a standard (that is to say, fire-protected) cabinet, or off-line archive, where MO disks are stored.
The top of the pyramid is represented by the (local) disk in the HSM server, where the most current data is located: the latest and/or most frequently requested by users. Given the way the HSM system works, response times at this level are the fastest. This section of the pyramid also has the least space available (disk size), and it is the most expensive storage medium.
For the next level we (could) use one or more disks (i.e., volumes) in a file or network server. Response times for this data are slightly slower, since the HSM has to “search” for the right file and then route it back to the user. The third level in our example is the jukebox for MO disks. Data access is still automatic, i.e. without operator intervention, but access times can be slower still, especially when several users request files located on different MO disks.
Our fourth level is the stand-alone MO drive, and with it we gradually begin to set up the so-called “off-line” archive. Data access is no longer automated, since an operator has to manually change each MO disk, which is kept on the fifth level, i.e. in the cabinet. As the archiving medium on the fifth level we could also use tape drives with a suitable jukebox (in that case, the cabinet becomes the sixth level), although in my opinion this doesn’t make much sense, since MO disks give us considerably more flexibility than tapes.
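The example pyramid can be written down as a small data structure. The following is only a minimal sketch: the names `Tier`, `PYRAMID`, `needs_operator` and `online_levels` are my own illustrative choices and do not come from any particular HSM product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    level: int        # 1 = top of the pyramid (fastest, smallest, most expensive)
    medium: str
    automated: bool   # True if files can be fetched without an operator


# The five levels of the example pyramid described in the text.
PYRAMID = [
    Tier(1, "local hard disk in the HSM server", True),
    Tier(2, "hard disk on SAN or NAS", True),
    Tier(3, "jukebox for MO disks", True),
    Tier(4, "stand-alone MO drive", False),
    Tier(5, "cabinet with off-line MO disks", False),
]


def needs_operator(level: int) -> bool:
    """Return True if reading from this level requires manual disk handling."""
    tier = next(t for t in PYRAMID if t.level == level)
    return not tier.automated


def online_levels() -> List[int]:
    """Levels that can be read without any manual intervention."""
    return [t.level for t in PYRAMID if t.automated]
```

With this model, `online_levels()` lists the levels that make up the automatic part of the archive, while everything below them belongs to the “off-line” archive.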
Moreover, we can also configure the HSM system in other ways, with a different number of levels. This depends primarily on the amount of data that needs guaranteed minimum response times and accessibility without manual intervention (changing of MO disks). When configuring the HSM system, it helps to analyze the current amount of data and forecast its growth, for then we can calculate at least the approximate capacity required on the storage media of each level.
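Such a capacity estimate is simple arithmetic. The sketch below assumes that each level should hold a certain number of days of scanned intake; the function name and all figures (documents per day, average document size, retention days) are hypothetical planning values, not measurements.

```python
def tier_capacity_gb(docs_per_day: int, avg_doc_kb: float, days_on_tier: int) -> float:
    """Approximate space (in GB) a level needs to hold `days_on_tier` days of intake."""
    total_kb = docs_per_day * avg_doc_kb * days_on_tier
    return total_kb / (1024 * 1024)  # KB -> GB


# Hypothetical planning figures, for illustration only.
DOCS_PER_DAY = 2000
AVG_DOC_KB = 50.0
RETENTION_DAYS = {
    "HSM server disk": 30,
    "SAN/NAS disk": 180,
    "MO jukebox": 730,
}

# Estimated capacity per level, rounded to one decimal place.
plan = {
    tier: round(tier_capacity_gb(DOCS_PER_DAY, AVG_DOC_KB, days), 1)
    for tier, days in RETENTION_DAYS.items()
}

for tier, gigabytes in plan.items():
    print(f"{tier}: about {gigabytes} GB")
```

In practice we would add a reserve on top of these figures, since the forecast of data growth is only an approximation.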
Migration is the process of “moving” individual documents through the hierarchy (from one medium to another), up or down, according to previously defined rules, i.e. procedures (steps).
Each object in the HSM system is bound to a so-called storage profile, which defines the object’s migration route across the levels, i.e. through the media that make up the hierarchical archive. Individual groups of objects can be bound to a common profile, which means that they form a group within a given migration process and follow (undergo) the same, i.e. common, rules. Each profile can have its own set of media, or profiles can share media, i.e. the same disks are used (partly or completely) by several profiles.
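The relation between objects, profiles and media can be sketched as follows. This is an illustration under my own assumptions: the classes `StorageProfile` and `ArchivedObject`, the function `migrate_down` and the media names are hypothetical, and a real HSM product will define its profiles differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageProfile:
    name: str
    route: List[str]  # media in migration order, fastest first

    def next_medium(self, current: str) -> Optional[str]:
        """Return the medium that follows `current` on the route, or None at the end.

        Raises ValueError if `current` is not part of this profile's route.
        """
        i = self.route.index(current)
        return self.route[i + 1] if i + 1 < len(self.route) else None


@dataclass
class ArchivedObject:
    object_id: str
    profile: StorageProfile
    medium: str


def migrate_down(obj: ArchivedObject) -> bool:
    """Move the object one step along its profile's route; False if already at the end."""
    nxt = obj.profile.next_medium(obj.medium)
    if nxt is None:
        return False
    obj.medium = nxt
    return True


# Two hypothetical profiles that partly share the same media.
scans = StorageProfile("scans", ["server-disk", "nas-disk", "mo-jukebox"])
invoices = StorageProfile("invoices", ["server-disk", "mo-jukebox"])

doc = ArchivedObject("scan-0001", scans, "server-disk")
migrate_down(doc)  # doc.medium is now "nas-disk"
```

Because the route belongs to the profile and not to the object, changing the route of a profile changes the migration path of every object bound to it.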
In the migration process a special role is played by the operating section, the so-called Zero-Level of the HSM system. This section holds the most current objects, namely those that have been scanned since the last migration. This level also includes objects that had already been archived on lower levels of the HSM system and were later accessed by users. This ensures very short access times to the most topical documents, since no “re-migration” through the levels is needed: the object has already been copied onto the Zero-Level. Once this level gradually fills up, the cleaning process takes place (automatically, in principle): all newly created objects are moved to the first level, while the others are simply cleaned off. If changes are made to objects that were previously archived on a lower level, the changed objects are processed as newly created ones (and at the same time the old versions on lower levels are cleaned off).
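The cleaning rules of the Zero-Level can be sketched like this. It is a simplified model under my own assumptions: the class `ZeroLevel`, its methods and the three state names are hypothetical, and the trigger for cleaning (the level becoming full) is left out.

```python
NEW, RECALLED, CHANGED = "new", "recalled", "changed"


class ZeroLevel:
    """Simplified model of the operating section (Zero-Level)."""

    def __init__(self):
        self.state = {}  # object_id -> NEW, RECALLED or CHANGED

    def store_scan(self, object_id):
        """A freshly scanned object that exists nowhere else yet."""
        self.state[object_id] = NEW

    def recall(self, object_id):
        """Copy an object from a lower level; an existing state is kept."""
        self.state.setdefault(object_id, RECALLED)

    def modify(self, object_id):
        """Change an object; a recalled copy then counts as newly created."""
        self.recall(object_id)  # make sure the object is on the Zero-Level
        if self.state[object_id] == RECALLED:
            self.state[object_id] = CHANGED

    def clean(self):
        """Empty the Zero-Level.

        Returns (to_migrate, to_purge): objects to move to the first level,
        and objects whose old versions on lower levels must be cleaned off.
        Unchanged recalled copies are simply dropped.
        """
        to_migrate = sorted(o for o, s in self.state.items() if s in (NEW, CHANGED))
        to_purge = sorted(o for o, s in self.state.items() if s == CHANGED)
        self.state.clear()
        return to_migrate, to_purge
```

For example, after scanning object `a`, recalling `b` and `c`, and modifying `c`, a call to `clean()` migrates `a` and `c`, marks the old version of `c` for removal, and drops the untouched copy of `b`.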
Categories of HSM Systems
Remember that the archiving levels described above must not be equated or confused with the HSM system levels into which the software products themselves can be “placed”. According to the software’s properties and characteristics, HSM systems can be grouped into five levels:
1. Level 1 allows simple, automated migration of data (files) and transparent access to them. While files migrate to other levels, users don’t “notice” any change. The file still appears to be located on, e.g., disk G:, although it is already (physically) on the next level. When the user “opens” the file, the HSM steps in and handles the access.
2. Level 2 supports at least two or three levels (see the pyramid); migration follows previously defined procedures and rules and depends on the (over)occupancy status of the media on each level.
3. Level 3 supports two or three levels (pyramid); the migration criteria can vary and be adjusted dynamically, i.e. we can add media to each level. HSM systems of Level 3 already support optical and tape drives.
4. Level 4 allows rules and procedures for data migration to be defined based on the classification (priority) of the files. Virtually all archiving levels (pyramid) are supported, and we can therefore use various drives across the whole network.
5. Level 5 is the highest level of HSM systems, and differs from the previous level in that here we no longer deal with a so-called file system but with an object-based system. These systems are, in my humble opinion, the most suitable for archiving scanned documents. The reason is that scanned documents are handled as objects (so it isn’t merely a file system), which are not part of the database: the database holds only reference data about the (current) location of each object. The object itself is an independent entity in the migration process, controlled, of course, by the HSM system. Present-day HSM systems also allow so-called data replication between different HSM servers in a WAN environment.
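The idea that the database holds only a reference to the current location of each object, and not the object itself, can be sketched as a small catalog. The class `ObjectCatalog` and its method names are hypothetical and serve only as an illustration.

```python
class ObjectCatalog:
    """Keeps only reference data: where each object currently lives."""

    def __init__(self):
        self._location = {}  # object_id -> name of the medium

    def register(self, object_id: str, medium: str) -> None:
        """Record a newly archived object and its first location."""
        self._location[object_id] = medium

    def relocate(self, object_id: str, new_medium: str) -> None:
        """Called by the HSM after migration; the object ID stays the same."""
        if object_id not in self._location:
            raise KeyError(f"unknown object: {object_id}")
        self._location[object_id] = new_medium

    def locate(self, object_id: str) -> str:
        """Users ask by ID only; the HSM resolves the current medium."""
        return self._location[object_id]


catalog = ObjectCatalog()
catalog.register("doc-1", "zero-level")
catalog.relocate("doc-1", "mo-jukebox")
```

The user always refers to `doc-1`; only the catalog entry changes when the HSM moves the object, which is what makes the migration transparent.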
Deciding on the most suitable HSM software is not easy, because in most cases we are bound to a subsystem that is incorporated in the whole Imaging & Archiving System. In my opinion, we have to pay the most attention to this very software when choosing the whole system, and carefully examine all the possibilities it offers (or lacks). When deciding to buy a complete document management system, we need a clear picture of how well such a system will serve (suit) us some years from now. The main issues are conceptual: where and how we will capture the data and archive documents (both scanned and others), the expected record format, i.e. compression method, and similar characteristics.
An unsuitable choice of HSM system may not cause any trouble in the beginning, but solving the problems that appear later on will definitely not be a cheap adventure, especially if you find out that the whole software needs to be (almost) completely replaced. Although it is true that software is getting cheaper every day, you must not forget to consider, besides license costs, the costs of training and of introducing the new system and, of course, of converting data from the previous system into the new one.
Just remember that what's essential is usually invisible to the eye...