Users search with the DSA search engine by opening the DSA web site in their browsers. On the web page displayed by the listing component, the user enters a query phrase. When this request reaches the DSA web server, the PHP script of the listing component retrieves the phrase, passes it to the searching component, and returns the results to the user’s browser in HTML format.
Results are listed ten at a time, and the user can move between result pages by clicking the “Next 10” and “Previous 10” links displayed on the page. Results are listed in the relevance order supplied by the searching component, from the most relevant to the least relevant.
For each result, the following details are displayed: rank of the result, title, description, address (URL) of the page, last visited time, content size, citation page link and citation text. When the user clicks on the title of a result, the browser opens the address of that page.
Figure 3.13 shows an example result from the DSA search engine. The searched keyword is “franchising”, and the web site “www.mcdonalds.com.tr” was indexed by client modules before this search. The first result found is shown in the figure.
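The paging behaviour described above can be sketched as follows. This is a minimal illustration in Python, not the PHP script used in the thesis; the `Result` record, the `score` field and the `result_page` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PAGE_SIZE = 10  # results are listed ten at a time


@dataclass
class Result:
    title: str
    url: str
    score: float  # relevance score, assumed to come from the searching component


def result_page(
    results: List[Result], page: int, page_size: int = PAGE_SIZE
) -> Tuple[List[Tuple[int, Result]], Optional[int], Optional[int]]:
    """Return (rows, prev_page, next_page) for a zero-based page number.

    Each row is (rank, result), where rank counts from 1 across all pages.
    prev_page / next_page are None when the corresponding link should be hidden.
    """
    # Most relevant result first.
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    start = page * page_size
    window = ranked[start:start + page_size]
    rows = [(start + i + 1, r) for i, r in enumerate(window)]
    prev_page = page - 1 if page > 0 else None
    next_page = page + 1 if start + page_size < len(ranked) else None
    return rows, prev_page, next_page
```

With 25 results, page 0 would carry only a “Next 10” link, page 1 both links, and page 2 only a “Previous 10” link.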
Figure 3.13. An example search result for keyword “franchising”
The client configuration component handles subscriptions to the DSA system and prepares the encrypted client system configuration parameters file that is used by the incremental indexing, scoring, compression and communication components. These components rely on the client system configuration parameters for many functions, so the parameters must not be tampered with. On the server system, the encrypted file is built with a key generated from the client module’s id and sent to the administrator of the web site. Only that client module can read and use this file.
This component handles subscriptions of web site owners or administrators who wish to use DSA client modules. The DSA homepage contains a link for this task, where web site owners can subscribe, obtain a username and password, and download the necessary applications and client module scripts, together with the client configuration parameter files generated by the client configuration component. Once they install these applications and follow the instructions in the installation procedures file, their client modules are ready to index their web site or web sites.
On the subscription page, the user first fills in the company name, home page URL, domain limitation expression (optional), denied extensions (optional), maximum page count, user name and password fields. This entry page is shown in Figure 3.14.

Figure 3.14. DSA subscription page client module parameters entry page
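To illustrate how the subscription fields could constrain a client module, the following Python sketch models them as a small record and applies them to a candidate URL. It rests on assumptions: the thesis does not specify the syntax of the domain limitation expression here, so it is treated as a regular expression, and `ClientParams` and `may_crawl` are hypothetical names.

```python
import posixpath
import re
from dataclasses import dataclass, field
from typing import Optional, Set
from urllib.parse import urlparse


@dataclass
class ClientParams:
    home_page: str
    domain_limit: Optional[str] = None  # assumed to be a regular expression
    denied_extensions: Set[str] = field(default_factory=set)  # e.g. {"pdf", "zip"}
    max_pages: int = 1000  # maximum page count entered at subscription


def may_crawl(params: ClientParams, url: str, pages_so_far: int) -> bool:
    """Decide whether a URL may be crawled under the subscription parameters."""
    # Stop once the maximum page count has been reached.
    if pages_so_far >= params.max_pages:
        return False
    # Apply the optional domain limitation expression.
    if params.domain_limit and not re.search(params.domain_limit, url):
        return False
    # Reject paths whose extension is on the denied list (case-insensitive).
    ext = posixpath.splitext(urlparse(url).path)[1].lower().lstrip(".")
    return ext not in params.denied_extensions
```

In the real system these values would come from the configuration files built by the server, not from a record constructed in code.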
When the user submits the parameters for his client module, the DSA server builds the client user configuration parameters and the encrypted client system parameters file from the entered fields and places them in a temporary folder on the server. The client module Perl modules and stopword files are also prepared, and the download page is displayed. On this page, the user clicks the download link, and the directory contents of the client modules and files are displayed as shown in Figure 3.15. The user can then download the client module and install it by following the procedures in the “install.txt” file.

Figure 3.15. Contents of DSA client modules and files
In this subsection of the thesis, some key points on security and performance are emphasized. In the DSA system, the Apache web server is used as the socket connection point of the server module. Because Apache is the most widely used web server on the Internet and is known to be highly reliable, relying on it avoids most of the vulnerabilities and performance problems of a custom socket server. In addition, performance is higher than that of a simple socket application because of the threading and caching implementations in Apache.
Perl modules use the database connection pooling application interface of Apache, which is a further performance gain for this distributed search engine platform. Persistent, reusable database connections avoid connection-opening overhead and increase server module processing and search performance.
Denial of service attacks on server modules are also considered as a design criterion. Only a limited number of server modules are executed at a time, which keeps the performance of the distributed search engine server in a steady state. Additional connection requests from client modules are refused, and these refusals are handled by the communication component.
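The idea of running only a limited number of server modules at a time can be sketched with a bounded semaphore. This Python fragment is an illustration under assumptions, not the mechanism used in the thesis; `ModuleSlots` and `handle_request` are hypothetical names.

```python
import threading
from typing import Callable


class ModuleSlots:
    """Caps how many server modules may run concurrently."""

    def __init__(self, max_running: int) -> None:
        self._slots = threading.BoundedSemaphore(max_running)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False immediately when every slot is taken.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle_request(slots: ModuleSlots, work: Callable[[], str]) -> str:
    """Run `work` if a slot is free; otherwise refuse the request."""
    if not slots.try_acquire():
        return "refused"  # the client is expected to handle this and retry later
    try:
        return work()
    finally:
        slots.release()
```

Refusing immediately, instead of queueing, keeps the server load bounded and leaves the retry decision to the client.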
The maximum file post size is set at the Apache web server level, because oversized posts could be used as an attack that fills bandwidth and server memory with garbled files sent by an intruder to the server modules. Such uploads are rejected at the start of the HTTP file upload protocol, although they would in any case be discarded and not processed by the server modules if the entire file were uploaded.
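Rejecting an upload at the start of the protocol amounts to inspecting the declared body size before any of the body is read. In Apache, a limit of this kind is typically configured with a directive such as `LimitRequestBody`, although the thesis does not name the setting it uses. The Python sketch below only illustrates the check itself; `MAX_POST_BYTES` and `accept_upload` are hypothetical.

```python
from typing import Mapping

MAX_POST_BYTES = 1_000_000  # hypothetical limit; the thesis gives no figure


def accept_upload(headers: Mapping[str, str], max_bytes: int = MAX_POST_BYTES) -> int:
    """Return an HTTP status code decided from the headers alone.

    200 means the body may be read; any other value means the request is
    refused before a single byte of the body is consumed.
    """
    raw = headers.get("Content-Length")
    if raw is None:
        return 411  # Length Required
    try:
        length = int(raw)
    except ValueError:
        return 400  # Bad Request: malformed length
    if length < 0:
        return 400
    if length > max_bytes:
        return 413  # Request Entity Too Large
    return 200
```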
Blowfish is used as the encryption algorithm during meta data file uploads; its implementation is taken from a third-party Perl module, CBC. The session key used in this encryption is generated from the authentication ids in both the server and client modules. Key generation is hardcoded in the implementation code. The development of stronger security, key exchange and protection of the key generation algorithm is assumed to be beyond the scope of this thesis. Here, the intention is to show that encryption can be used in the communication protocol of this distributed search engine and to prevent server modules from accepting garbled or distorted word weight indexes.
Encryption is also used to store the logs and statistics of the crawler and parser components of client modules. These are later used and updated on revisits to web sites to decide whether a page should be indexed and uploaded to the server module.
Client system configuration parameters are stored in an encrypted file and used by the client module component that controls how web pages are crawled, parsed, scored and indexed. Encryption ensures that these parameters cannot be changed by client web site administrators who want to raise their index scores and thereby their page ranks.
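The thesis does not disclose how the session key is generated, so the following Python sketch is purely illustrative. It shows the general idea that both sides can derive the same key independently from shared identifiers, so no key has to be transmitted. The use of SHA-256, the 16-byte key length and the `session_key` function are assumptions, not the author's method.

```python
import hashlib


def session_key(client_id: str, server_id: str, key_bytes: int = 16) -> bytes:
    """Derive a symmetric key from two authentication ids (illustrative only).

    Client and server call this with the same ids and obtain the same key.
    Blowfish accepts keys of 4 to 56 bytes, so 16 bytes is a valid length.
    """
    if not 4 <= key_bytes <= 32:
        raise ValueError("key_bytes must be between 4 and 32 for this sketch")
    material = f"{client_id}:{server_id}".encode("utf-8")
    return hashlib.sha256(material).digest()[:key_bytes]
```

A scheme like this is only as secret as the ids and the derivation code, which matches the thesis's remark that stronger key exchange is out of scope.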
Installation of this system consists of two parts: installing the server system, and sending and installing client modules on web site hosting systems.
The server system can be installed on most machines and operating systems as long as the Perl [21] and PHP [30] programming languages, the Apache web server [22], and the Sybase [27] and MySql [29] database servers are available. The server system can be thought of as three packages: the server module, database storage and search interface system packages. These three packages can be located on different machines and operating systems or on one machine. The relationships between these packages and the DSA server system are shown in Figure 3.16. Subsections 3.4.1.1, 3.4.1.2 and 3.4.1.3 explain these packages in detail.
Figure 3.16. Packages of DSA system and data communication between packages
Perl is widely available on operating systems such as most Linux and Unix distributions and the Windows family. A standard Perl v5.6.1 installation, followed by installation of the CBC, Zlib and DBD modules, is applied to the server module system. Our DSA prototype is tested and used on the Windows 98, Windows 2000 and Redhat Linux 6.1 operating systems.
The other application used in the server module system is the Apache web server, which receives client module requests and runs the server module Perl scripts. Apache version 1.3.9 is used and tested on the Windows 98, Windows 2000 and Redhat Linux operating systems in our DSA prototype.
After these installations, the server module Perl scripts are copied to the Apache scripts folder. Once the database servers are also installed, the server module system of DSA is ready to work. Client modules can now send meta data to this server according to their server module URL parameter.
As mentioned above, the server module system package is made up of a Perl application, its related modules and an Apache web server application. A cluster of server module systems can be built by installing this package on multiple machines, each connecting to the same database server. Client modules can be split into regions such that each region communicates with a different server module system package machine. In this way, scalability and load balancing of the DSA system can be achieved. As more web sites use client modules, more cluster machines can be added.
In Figure 3.17, the single server model of the server module system package is shown in a dashed box. In Figure 3.18, the clustered model of the server module system package is shown. Dashed boxes separate client module regions and module servers.
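One way to picture the region split is as a mapping from regions to server module URLs, with each client module assigned to exactly one region. The thesis does not say how clients are assigned, so the hash-based assignment in this Python sketch is an assumption, and the function names and URLs are hypothetical.

```python
import zlib
from typing import Dict, List


def assign_region(client_id: str, regions: List[str]) -> str:
    """Pick a region for a client deterministically from its id (assumed scheme)."""
    if not regions:
        raise ValueError("at least one region is required")
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    index = zlib.crc32(client_id.encode("utf-8")) % len(regions)
    return regions[index]


def server_module_url(client_id: str, region_servers: Dict[str, str]) -> str:
    """Return the server module URL that a client module should upload to."""
    # Sort the region names so the assignment does not depend on dict order.
    region = assign_region(client_id, sorted(region_servers))
    return region_servers[region]
```

The returned value corresponds to the server module URL parameter stored in each client module's configuration, so adding a cluster machine only requires new or reassigned clients to receive a different URL.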
Figure 3.17. Single server module system package
Figure 3.18. Clustered server module system package machines and client module regions
Both of these database management systems can be installed on most Linux and Unix platforms as well as the Windows family. The DSA database storage package is tested and used on the Redhat Linux 6.1, Windows 98 and Windows 2000 operating systems. In Figure 3.19, clustered server module packages are used with a single database storage package, where multiple module servers connect to a single database server.
When clustered database storage packages are used for scalability, as shown in Figure 3.20, multiple MySql server machines can be used but only one Sybase server machine. Each server module package cluster, or group of clusters, can connect to a different MySql database server machine, and in this way stemming operations can be processed faster.
Figure 3.19. Clustered server module packages with single database storage package
In Figure 3.19 and Figure 3.20, each package is shown in a dashed box. The Sybase database server, which is used to store index data, cannot be clustered in the DSA system because the searching mechanism assumes that all index data and web page information are stored in a single place.
Figure 3.20. Clustered server module packages with clustered database storage packages
This package is tested on the Redhat Linux 6.1, Windows 98 and Windows 2000 operating systems, and the DSA prototype uses PHP and Apache with typical installations on Windows 2000. When installed on a separate machine, the package has the full system resources to itself, which increases search performance. As a remark, all three packages of the DSA platform can either be installed on a single machine, as in our prototype system, or on distinct machines as mentioned above.
Client modules can operate on any operating system on which the Perl application, the CBC encryption module and the Zlib compression module can be installed. These are typically most Linux and Unix distributions and the Windows family. If client modules are not installed on the web hosting machines of web sites for security or performance reasons, they can reside on a different application server in the same network, so that client modules can access the web site quickly and do not use additional Internet bandwidth for crawling. Web site owners or administrators wishing to use DSA client modules should adjust the firewall and security configurations of their network to allow access to the DSA server module system.
Web site hosting owners or administrators can easily subscribe and download the client system package from the DSA home page. This package includes the Perl setup application, the encryption and compression Perl modules, the client module configuration parameter files, and an installation procedures and how-to file with a greeting message.
The web site owner should first install the Perl application from its setup program. In the next step, using PPM (Perl Package Manager), he should download and install the CBC and Zlib Perl modules and put the parameter files, together with the client module scripts, into a folder he has created. In the final step, he should create a scheduled job in his operating system that runs the client module Perl script at the desired intervals. The client module is then ready to work.