3. Framework of Digital Libraries

This section proposes a new distributed digital library framework based on cloud computing. Drawing on the design concepts and technology of cloud computing platforms, the framework divides the system by function into logically independent subsystems. The subsystems are not only logically separate but are also deployed on computing nodes in different geographic areas. The system therefore combines a distributed structure with the capacity to deliver integrated distributed services, and it is readily expandable. The overall framework is loosely coupled: information is exchanged through message queue communication middleware and XML document flows, with integration provided by a service bus. Mash-up and related techniques are also used to integrate the geographically distributed library resources and services into a single distributed digital library system.
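The loosely coupled exchange described above can be illustrated with a minimal sketch. It assumes an in-process queue standing in for the message queue middleware; the XML element names (`message`, `field`) and the node name `business-node-1` are hypothetical and do not come from the framework itself.

```python
import queue
import xml.etree.ElementTree as ET


def build_message(sender, action, payload):
    """Serialize a request as a small XML document (element names are hypothetical)."""
    root = ET.Element("message", attrib={"sender": sender, "action": action})
    for key, value in payload.items():
        ET.SubElement(root, "field", attrib={"name": key}).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def parse_message(raw):
    """Parse the XML document back into (sender, action, payload)."""
    root = ET.fromstring(raw)
    payload = {f.get("name"): f.text for f in root.findall("field")}
    return root.get("sender"), root.get("action"), payload


# queue.Queue stands in for real message queue middleware in this sketch.
bus = queue.Queue()

# A business subsystem publishes a request without knowing who will consume it.
bus.put(build_message("business-node-1", "search", {"title": "cloud computing"}))

# Another subsystem consumes the message and decodes the XML document.
sender, action, payload = parse_message(bus.get())
```

The producer and the consumer share only the message format, not each other's code or location, which is the property that loose coupling relies on.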

According to their business logic functions, the computing service nodes in the overall framework fall into two kinds of subsystems: the business subsystem and the service integration subsystem. Both can run on several computer clusters in different areas, which together form a single consolidated digital library framework, as shown in Figure 2. The functions and characteristics of the two kinds of subsystems are described in detail below.


Figure 2. System Organization Framework

3.1 Business Subsystem

A business subsystem consists of one or more computer nodes, which may be geographically distributed or concentrated in one place. Together, the business subsystems form the business support and application centre of the overall distributed digital library system.

The foundation of a business subsystem is the hardware cluster that supports system operation, including the basic hardware devices, the cluster management devices, a logically complete basic operating framework for the digital library, and the relevant digital resources. The hardware cluster of each business subsystem is driven by basic cluster operating system software, which handles system load balancing, system logging, and cluster monitoring. The distributed computing framework is built on this basis, for example the data storage framework and the NoSQL-based semi-structured data storage framework [6]. This distributed data management and storage system forms the basic cluster system of the digital library and constitutes the complete base platform of each business system. A business system may provision its basic cluster system either from locally deployed distributed clusters or through a third party, that is, by relying on IaaS or PaaS services provided by that third party.

Built on top of the cluster management system are the library data resources and the application systems that support library business, including unified search, reference consultation, journal catalog, and online cataloging. The structure and implementation mechanism are the same whether the business system is deployed locally or through a third party.

All functions and services of a business subsystem are kept complete and self-contained within that subsystem, without influence or interference from other business subsystems. The business subsystems operate independently under a shared-nothing [7] framework: the digital resources are horizontally partitioned, so that each business subsystem holds its own independent share of the data and runs its own applications on it. In addition, the business subsystem has the following two basic characteristics: (1) All business subsystems achieve loose coupling and flexible expandability through modularization. Different business subsystems can offer services that are diverse in form but complementary in content. For instance, some business subsystems can hold large volumes of video data for distance education and provide video teaching services, whereas others preserve relatively complete information on classic books and offer literature-related research and services. (2) The business subsystem provides all of its services to users, and integrates with third party services, through an open API. With service-oriented architecture (SOA) packaging and mash-up, data and services are integrated across regions and across the various types of business subsystems, and are offered to users as web-based services.
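As an illustration of shared-nothing operation with horizontal partitioning, the following minimal sketch routes each record to exactly one business subsystem, which keeps it in a private store. Hash-based routing is an assumption made for the example; the text does not specify the partitioning rule, and a content-based rule (for example, video data versus classic books) would fit the description equally well. The class names `BusinessSubsystem` and `PartitionRouter` are hypothetical.

```python
import hashlib


class BusinessSubsystem:
    """One shared-nothing node: it owns a private store that no other node touches."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def put(self, record_id, record):
        self.records[record_id] = record

    def get(self, record_id):
        return self.records.get(record_id)


class PartitionRouter:
    """Routes each record to a single owning subsystem (hash routing is assumed)."""

    def __init__(self, subsystems):
        self.subsystems = list(subsystems)

    def owner(self, record_id):
        # A stable hash keeps the mapping deterministic across processes.
        digest = hashlib.sha256(record_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.subsystems)
        return self.subsystems[index]

    def put(self, record_id, record):
        self.owner(record_id).put(record_id, record)

    def get(self, record_id):
        return self.owner(record_id).get(record_id)


nodes = [BusinessSubsystem(f"node-{i}") for i in range(3)]
router = PartitionRouter(nodes)
for i in range(100):
    router.put(f"rec-{i}", {"n": i})
```

Each record lives on one node only, so a node can serve its share of the data without coordinating with the others.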

3.2 Service Integration Subsystem

The service integration subsystem is a key component in integrating the distributed services of digital libraries. It integrates and coordinates the service functions, the business logic, and the organization and discovery of digital resources across the business subsystems, and it provides third party systems and users with SaaS-type library services. The service integration subsystem is compatible with all business subsystems through the semantics and interfaces exposed by their open APIs, and it provides an open API trust service to all business subsystems. Together, the individual business subsystems and the service integration subsystem form a new service system that is loosely coupled, highly cohesive, controllable, and self-adaptable. The integrated framework is achieved by way of SOA.

The service integration subsystem and the business subsystems run in different process spaces. The resources, services, and functions provided by the business subsystems are managed and made accessible through registration. Trusted business subsystems first register their service interfaces, data resource scale, data resource contents, URI information, and other details with the service integration subsystem. Once the service integration subsystem has received all the information required for registration, the relevant information is updated in the business system database, and the list of available resources is refreshed by periodic scanning in batch mode. The next time users log on, they can see the newly added resources on the Web page. If automatic registration fails, system managers must check the registration and add the new resources manually at the front end. Moreover, a business subsystem can dynamically register, add, revise, or cancel the resources and services it provides in accordance with its actual situation.
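The registration flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the `ServiceRecord` fields, the `ServiceRegistry` class, the trusted-subsystem set, and the example URIs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServiceRecord:
    """What a business subsystem submits when it registers (fields are assumed)."""
    subsystem: str
    uri: str
    interfaces: list
    resource_count: int


class ServiceRegistry:
    def __init__(self, trusted):
        self.trusted = set(trusted)  # only trusted subsystems may register
        self._records = {}           # authoritative registration data
        self._published = []         # list shown to users, refreshed in batch
        self.failed = []             # registrations left for manual review

    def register(self, record):
        """Register or revise a subsystem's entry; return False on failure."""
        if record.subsystem not in self.trusted:
            self.failed.append(record)
            return False
        self._records[record.subsystem] = record
        return True

    def cancel(self, subsystem):
        """Withdraw a subsystem's resources and services."""
        self._records.pop(subsystem, None)

    def refresh(self):
        """Batch scan: rebuild the published list from the registration data."""
        self._published = sorted(r.uri for r in self._records.values())

    def available(self):
        return list(self._published)


registry = ServiceRegistry(trusted={"video-node", "classics-node"})
registry.register(ServiceRecord("video-node", "http://video.example/api", ["search"], 1200))
before_refresh = registry.available()  # new entry is not visible yet
registry.refresh()
after_refresh = registry.available()   # visible after the batch scan
```

Separating the registration data from the published list mirrors the text: a new resource becomes visible to users only after the next batch refresh, not at the moment of registration.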

The service integration subsystem is the gateway through which users access and use library services. It provides a cross-platform, cross-region, integrated, one-stop library service through Web sites on the Internet. Once users enter the service integration subsystem, they come under the management of the access control mechanism. Once they have been granted access to resources, they can make full use of the functions and services provided by the platform.

3.3 Integration with the Existing Digital Library System

An important criterion for judging whether an information system is functionally complete is its compatibility with existing systems, which ensures a smooth upgrade and transition. Although the framework described in this paper is intended mainly for newly established distributed digital library systems, the resources or services of existing digital libraries can be integrated into the system in two ways without affecting their normal function. (1) Integration at the metadata tier via the CALIS Nebula framework. Metadata is managed centrally by means of metadata partitioning, which enables resource retrieval and browsing across multiple digital library systems, as well as integration and management of metadata information at the node tier. (2) Complete (large-grained) service integration by means of a non-invasive proxy user. The system manager sets up a working proxy account in the business subsystem and sets its permissions on the available functions and accessible resources. The external system then uses this account to access the resources and services newly added to the digital libraries. This large-grained integration effectively integrates the resources and services of the newly joined system with the entire system, without affecting the independent operation of the newly joined system. Because the newly joined system appears as a basic service-providing subsystem at the service integration subsystem tier, only user authentication needs to be controlled when proxy accounts are used to access the entire system. The business subsystem tier is completely transparent to the other parts of the entire system.
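The proxy-user approach in (2) can be illustrated with a minimal sketch. It assumes that permissions are expressed as a set of allowed functions and a set of accessible resources; the names `ProxyAccount`, `ExistingLibrarySystem`, and the sample catalogue entries are hypothetical and serve only to show the permission check.

```python
class ProxyAccount:
    """A hypothetical proxy account with an explicit permission set."""

    def __init__(self, name, functions, resources):
        self.name = name
        self.functions = frozenset(functions)
        self.resources = frozenset(resources)


class ExistingLibrarySystem:
    """Stand-in for an existing digital library; its internals stay unchanged."""

    def __init__(self):
        self._accounts = {}
        # Sample data for the sketch only.
        self._catalog = {"rare-books": ["Book A", "Book B"], "theses": ["Thesis X"]}

    def add_proxy_account(self, account):
        self._accounts[account.name] = account

    def call(self, account_name, function, resource):
        """Serve a request only if the proxy account is permitted to make it."""
        account = self._accounts.get(account_name)
        if account is None:
            raise PermissionError(f"unknown account: {account_name}")
        if function not in account.functions or resource not in account.resources:
            raise PermissionError(f"{account_name} may not {function} {resource}")
        if function == "browse":
            return list(self._catalog[resource])
        raise ValueError(f"unsupported function: {function}")


legacy = ExistingLibrarySystem()
legacy.add_proxy_account(ProxyAccount("integration-proxy", {"browse"}, {"rare-books"}))

# The integrating system reaches the existing library only through the proxy account.
titles = legacy.call("integration-proxy", "browse", "rare-books")
```

The existing system is accessed through its ordinary account mechanism, which is why the integration is non-invasive: nothing inside the existing library has to change.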