|
GUIDELINES FOR SITES AND PAPERS
Version: 5 November 2003 |
Introduction
The ERPA software splits the management of the papers included in the series between remote clients (Cologne, Florence, Harvard, Vienna etc.) and a central serving point (Vienna). The central serving point needs only a little information about the remote clients (WWW sites). The rules below were designed to be as few and as flexible as possible. For the spiders and the search engine to function properly, however, it is essential that all sites and papers follow these guidelines. With this approach, the clients may add papers to their sites just as they usually do, and the central serving point will automatically include these papers and their details (e.g. author, title, ...). Users querying the search engine search this information only. The structure of this text is as follows: first, a short overview of how ERPA works is given; second, the general requirements for sites are described; third, the requirements for papers are defined. The latter are also illustrated with examples. Please note that there is a glossary at the end of this text; terms in the text that appear in the glossary are hyperlinked to the glossary section. |
The Basics
The basic structure of ERPA is as follows: there is one spider for each participating site, which knows the site's URL, the exact address of its directories etc. (see the general requirements for sites). All papers to be included in the ERPA search base have to conform to certain requirements for papers. Once a day (probably during the European night), the spiders check each site for new or changed papers.
The spider then stores the data of every paper (i.e. the name of the author, the title etc., and also the full text) in two files per paper at the central serving point. The search form on the ERPA homepage (http://eiop.or.at/erpa/) is generated by the search engine. This software then searches the stored pairs of files for all papers (and not the remote files at each site). |
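The division of labour described above (the spider writes two files per paper at the central serving point, and the search engine reads only those files) can be sketched as follows. This is a minimal illustration, not the ERPA software: the text does not say what the two files contain, so the split into a metadata file (`.meta.json`) and a full-text file (`.txt`), as well as the function names `store_paper` and `search`, are assumptions.

```python
import json
from pathlib import Path


def store_paper(store: Path, paper_id: str, fields: dict, full_text: str) -> None:
    """Write the two central files for one paper (this file layout is an assumption)."""
    (store / f"{paper_id}.meta.json").write_text(json.dumps(fields), encoding="utf-8")
    (store / f"{paper_id}.txt").write_text(full_text, encoding="utf-8")


def search(store: Path, term: str) -> list[str]:
    """Return the ids of papers whose stored fields or full text contain the term.

    Only the local pairs of files are read; the remote sites are never contacted.
    """
    term = term.lower()
    hits = []
    for meta_path in sorted(store.glob("*.meta.json")):
        paper_id = meta_path.name[: -len(".meta.json")]
        fields = json.loads(meta_path.read_text(encoding="utf-8"))
        text = (store / f"{paper_id}.txt").read_text(encoding="utf-8")
        haystack = " ".join(str(value) for value in fields.values()) + " " + text
        if term in haystack.lower():
            hits.append(paper_id)
    return hits
```

Because the search reads only the central copies, a change at a remote site becomes searchable only after the spider's next daily run.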
General requirements for sites
To find the papers intended for ERPA (and only those), the spider needs three types of information about each site:
The information above (cf. points 1-3) has to be sent to the central administrator, who can then include it in the spider. It is not difficult to change this information if necessary. |
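How a spider might use such per-site information to stay within the ERPA papers of one site can be sketched as below. The sketch uses only the two items named in "The Basics" (the site's URL and the addresses of its directories); the complete list of three items is not reproduced here. The class `SiteConfig` and the function `in_scope` are hypothetical names.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SiteConfig:
    """Hypothetical per-site record as kept by the central administrator."""
    base_url: str                 # root URL of the participating site
    directories: tuple[str, ...]  # directories that hold the ERPA papers


def in_scope(site: SiteConfig, url: str) -> bool:
    """True if the URL is on the site and inside one of its paper directories."""
    base, target = urlparse(site.base_url), urlparse(url)
    if (base.scheme, base.netloc) != (target.scheme, target.netloc):
        return False
    # Compare against "directory/" so that /eiop/texte does not match /eiop/texte2.
    return any(
        target.path.startswith(directory.rstrip("/") + "/")
        for directory in site.directories
    )
```

Changing a site's entry then amounts to editing one record, which fits the remark that this information is not difficult to change.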
Requirements for papers
Papers included in ERPA have to be in either HTML or PDF format. The following rules apply: A series that offers its papers in PDF format only has two options:
Each paper must consist of one ERPA main file (in HTML format) and zero or more other files. The main file contains the information necessary for ERPA and also guides the spider to any other parts (files) of the paper. The spider searches the main file for the following list of fields in order to extract information. The fields are marked up in the HTML files according to the rules below. Note that some of them (author, title, date) are compulsory and others are not: |
|
The following rules apply to this list of fields:
Note that a number of special characters are allowed: see the respective list here. |
Examples for ERPA markup in the papers
Alternative 1: META-tags
Download the template.htm for a solution
<HEAD>
[...] José I. Torreblanca">
[...] - For an enlarged European Union">
[...] majority voting, Council of Ministers, European Parliament">
[...] 1997-001b.htm, 1997-001c.htm">
[...] the facts behind the resignation of the European Commission under Jacques Santer, followed by theoretical considerations on the significance of trust and reputation from the principal-agent-theory perspective. The third part puts the emphasis on discussing as to which extent a loss of trust and reputation had an influence in the resignation of the Santer-Commission.<BR>The author concludes that the Santer-Commission underestimated the increased power of the European Parliament. The inadequate information policy and the increasing practice of manipulating documents led to a loss of trust. After the threshold had been crossed in connection with the BSE-scandal further violations finally led to the destruction of reputation of the Santer-Commission.">
</HEAD>
The files 1997-001.htm, 1997-001b.htm, 1997-001c.htm are all stored in the same directory as the main file and contain the main text of the paper in three parts; in each of these files, the beginning and the end of the text are marked up with the COMMENT-tag for the field "text". |
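A spider reading a META-tag main file could collect the fields roughly as sketched below, using Python's standard `html.parser` module. The sketch is not the ERPA spider. The NAME values `author`, `title` and `date` are assumptions: the text names these three fields as compulsory, but the exact attribute values used in the markup are not preserved in this copy. The names `MetaCollector`, `read_fields` and `missing_fields` are likewise hypothetical.

```python
from html.parser import HTMLParser

# Compulsory fields named in the text; the exact NAME values are an assumption.
REQUIRED = ("author", "title", "date")


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> works.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content is not None:
            self.fields[name.lower()] = content


def read_fields(html: str) -> dict[str, str]:
    """Return all META fields found in the given HTML source."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.fields


def missing_fields(fields: dict[str, str]) -> list[str]:
    """List the compulsory fields that are absent or empty."""
    return [name for name in REQUIRED if not fields.get(name)]
```

A site maintainer could run such a check on a main file before publishing it, so that a paper lacking a compulsory field is noticed before the spider visits.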
Alternative 2: COMMENT-tags
<BODY>
In this example, no keywords are given; the file includes all parts of the text; and the search engine will point to this file only. |
Alternative 3: mixed solution
Example taken from a test version of EIoP paper no. 1997-001. The paper consists of three files: 1997-001a.htm, which contains the main information necessary for the search engine; 1997-001.htm, which is the file with the main text; and 1997-001t.htm, which contains the tables and will not be included in the full-text search. The following code is extracted from the 1997-001a.htm file (dots [...] indicate deleted further markup):
<HEAD>
[...] CONTENT="http://eiop.or.at/eiop/texte/1997-001.htm">
[...] Union European Parliament
In the file 1997-001.htm, the tags <!--BEGIN text--> and <!--END text--> are placed right before the first heading and just before the beginning of the references section, so that words in the titles of cited literature are not found by a full-text search. |
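The effect of the <!--BEGIN text--> and <!--END text--> tags can be illustrated with a short sketch that keeps only what lies between them. It is an illustration under assumptions, not the spider's actual code: the function name `extract_text` is hypothetical, the tolerance for extra whitespace inside the comment is an added convenience, and the removal of markup with a regular expression is a simplification.

```python
import re

# Everything between the two COMMENT-tags, across line breaks.
TEXT_BLOCK = re.compile(r"<!--\s*BEGIN text\s*-->(.*?)<!--\s*END text\s*-->", re.DOTALL)
# Crude markup stripper; good enough for a sketch, not for arbitrary HTML.
TAG = re.compile(r"<[^>]+>")


def extract_text(html: str) -> str:
    """Return the plain text between the BEGIN text and END text comments."""
    parts = TEXT_BLOCK.findall(html)
    plain = TAG.sub(" ", " ".join(parts))
    return " ".join(plain.split())
```

For a page whose references follow the END tag, the returned text stops before the reference list, which is why titles of cited literature do not turn up in a full-text search.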
Glossary
|