Task 2 - Evaluating the supporting data sets
The goal of Task 2 is to evaluate the quality of the data from which earthquake parameters are derived (supporting data sets). For this purpose it is necessary to define a tool (scale) simple enough to classify a large number of supporting data sets. This tool will then be used for sorting out multiple determinations and providing a preliminary file of earthquakes.
The following considerations mainly apply to data of earthquakes before 1900 and to macroseismic data after 1900, though they can be adapted also to instrumental data.
2a - Methodology
The supporting data sets
How "good" are the supporting data sets of our PEC? Where do they come from?
It seems reasonable to assume that each catalogue entry (set of earthquake parameters) was compiled on the basis of a supporting data set (historical, instrumental, mixed, etc.), according to some procedures. Therefore, the quality of parameters depends on both determination procedures and supporting data sets.
Procedures used to determine parameters are sometimes described in the "introductions" to the catalogues; their analysis and improvement belong to Task 5.
Supporting data sets vary greatly in type. They range from a few lines, compiled by someone with the aim of summarising the knowledge of an earthquake, to papers or monographs reporting comprehensive investigations of specific earthquakes.
Catalogues mentioning their supporting data sets normally put numerical or alphanumerical codes in the column "reference" and supply the key to the code in a comprehensive bibliographical list. In many PEC each entry has a single bibliographical reference; in other cases each entry may have multiple references. In a few cases references are not explicitly linked with earthquakes, but they are simply quoted in the introduction to the catalogue altogether. The situation of input PEC with respect to references is given in Tab. 2.1.
References belong to the following typologies (see some examples in the box Examples of references (supporting data sets) from the BEECD Working File.):
Sources, references and roots
The historical investigation of an earthquake is usually performed dealing with a number of sources: the investigator summarises the information gathered from the sources in a report where he provides some overview of the investigation and the main points on the effect distribution. The sources are then quoted at the end of the study as references.
Now, it seems reasonable to assume that, for each earthquake, the data set used by the catalogue compiler for determining the parameters is the whole output of the investigation and not the single sources, even if this output is not formally reported. Even in the case of a single source, the compiler must interpret the information provided by the source in terms useful for seismological purposes before using it.
The output of the investigation, prepared with a view to seismological use, can take the form of a report, a study, a map of intensity data points, etc., and will be called hereafter the root of the parameters.
The sketch below explains these concepts.
A root scale
The problem of qualifying the supporting data sets, or roots, was discussed in the frame of European workshops, such as the meetings of ESC Historical Earthquake Data and Macroseismology Working Groups in Liblice (1990), Milano (December 1993), Athens (September 1994) and Police (November 1995). Though it is agreed that the average quality of supporting data sets is far from satisfactory (e.g.: Stucchi et al., 1991a; Stucchi, 1993; Vogt, 1994a), there is not yet a commonly accepted method to improve it. For instance, Alexandre (1990; 1994) suggests disregarding parametric catalogues and re-compiling earthquake catalogues starting from thorough research based on the rules of historical criticism only; others (e.g.: Guidoboni and Stucchi, 1993; Ambraseys et al., 1994; Albini et al., 1996a) suggest that results achieved by previous researchers should be taken into account through careful handling and critical interpretation of the data sets supporting their seismological interpretation.
Let's examine today's situation as it comes out from the survey of current
In many cases supporting data sets come from the layers of a sedimentation process through which the original information (primary earthquake records) was interpreted and manipulated, from the time the earthquake took place to today (Marmo, 1989; Alexandre, 1991). Obviously, the quality of such data sets depends on how close in time the adopted layer of interpretation is to the earthquake, how reliable the sources supporting that layer are and how good the historical interpretation was. For instance, the catalogues by Shebalin et al. (1974), Postpischl (1985a), Leydecker (1986), Herak (1995) mostly rely on other parametric earthquake catalogues, while the catalogues by Ribaric (1982), Zsíros et al. (1988), Papazachos and Papazachou (1989) and Houtgast (1992) rely on earthquake compilations. Furthermore, data sets are presented according to different levels of formalisation: from very detailed monographs where all aspects and problems have been taken into account and possibly solved (such as Rodríguez de la Torre, 1984; Wechsler, 1987; Gutdeutsch et al., 1987; Figliuolo, 1988; Ambraseys and Finkel, 1989; Stucchi et al., 1991b; Ambraseys and Karcz, 1992; Grünthal, 1992; Hammerl, 1992; Moroni and Stucchi, 1993; Lambert et al., 1994; Olivera et al., 1994a), to synthetic studies, such as Postpischl (1985b), Ambraseys and Finkel (1995), Vogt (1994b), Alexandre (1994), Castelli (1993), to collections of earthquake records put together in a folder and stored in some institution's cupboards.
In many other cases supporting data comes from the top layers of the sedimentation process; that is, the supporting data is another parametric catalogue, from which compilers simply imported the parameters. This is mostly the case of the European catalogue (Van Gils and Leydecker, 1991), the catalogues for Italy (Postpischl, 1985a), for Albania (Sulstarova and Kociu, 1975a), for Croatia (Herak, 1995). In their turn, such parameters may come from unreliable data sets: they may have been determined according to other procedures, and may have been simply copied or even altered to some extent without notice.
A very complicated scale would therefore be needed to classify every aspect of supporting data sets which have proved to be so different from each other. On the other hand, the large number of PEC entries to be evaluated and the broad scope of the classification call for a robust, easy-to-use tool. Therefore, one must be ready to accept that intermediate cases will not easily fit in this classification and might escape a clear assessment.
Two main aspects will be considered in developing a classification of supporting data sets: type and quality. Type means the degree of seismological formalisation according to which the data set is organised. Quality, in turn, concerns the quality of the sources on which the data set relies.
Earthquake studies are papers, reports, compilations where the author put together the information coming from earthquake records and came to some conclusions with respect to earthquake time, most affected area, effects distribution, etc.
Studies may range from papers or monographs devoted to individual earthquakes, such as Alexandre et Vogt (1994), Glavcheva and Radu (1994a), Vogt and Grünthal (1994), Grünthal and Meier (1995), to compilations devoting a few lines to each event, such as Papazachos and Papazachou (1989), Ambraseys and Jackson (1990) (in these cases the portion of the earthquake compilation which is dedicated to a single event counts as one study). Top studies are those performed according to historical seismology methods: they usually present conclusions in a report explaining how earthquake records were assembled with respect to both historical and seismological criticism and which problems, if any, remain unsolved. Obviously, we assume that earthquake studies are better roots than PEC.
With respect to type, roots have been divided into five classes, in the following order of preference which corresponds to a decreasing level of formalisation:
Type 2: earthquake studies
Earthquake studies as described above, but in which earthquake effects are not assessed in terms of intensity at each place, belong to this type. For instance, this is the case of the 1640 earthquake in Northern Europe (Vogt, 1994c). Most seismological compilations (Davison, 1899; Baratta, 1901; Schorn, 1902; Rethly, 1952, etc.) and early studies can be placed in this type.
Modern studies giving only symbols for "strong", "damaging" and so on are also associated to type 2, considering that they need some interpretation to be upgraded to type 1 (see for instance, Albini et al., 1994b).
Type 3: earthquake material
In many cases, compilers determined parameters from material at a stage of formalisation earlier than a study. Records pertaining to an earthquake were assembled and put in a folder; the author had an idea of what the effects distribution looked like and the research was still in progress, but he had neither the time nor the intention to reach or write down any conclusions.
This situation is obviously one step back with respect to "literature" or "earthquake study" (type 2). Perhaps what is lacking is only that the conclusions are not written and, therefore, such situations can be in principle easily upgraded to type 2 and even to type 1.
Multiple references for which no root (main reference) can be assessed belong in principle to this type. They will normally be considered of type 3; only in the case that all references are PEC, the type will be 4.
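The default rule for entries with multiple references and no identifiable main reference can be sketched as follows. This is a minimal illustration only; the function name and the boolean input are assumptions introduced here, not part of the original methodology.

```python
def type_without_main_root(references_are_pec):
    """Default root type when no main reference can be identified.

    references_are_pec: one boolean per listed reference, True when that
    reference is itself a parametric earthquake catalogue (PEC).
    (Hypothetical input format, chosen for this sketch.)
    """
    if not references_are_pec:
        # Entries without any reference are outside this rule.
        raise ValueError("at least one reference is required")
    # Type 4 only when every reference is a PEC; otherwise type 3.
    return 4 if all(references_are_pec) else 3


mixed = type_without_main_root([True, False, False])
only_pec = type_without_main_root([True, True])
```

Here `mixed` evaluates to 3 and `only_pec` to 4.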
Type 4: set of parameters exported from other PEC
As the preliminary survey presented in Tab. 2.1 has shown, parametric earthquake catalogues are frequently used as roots for BEECD input PEC entries. Parameters can have been:
Type 5: unknown
Some PEC do not supply references in connection with each earthquake, or they just supply a general list of references in the introduction to or at the end of the catalogue, without any possibility of relating them to each earthquake. This means that there is no way to find out the root without over-interpreting the author's indications. Such roots will preliminarily be given the worst rank, but it is possible that further investigation will bring them up in the ranking.
The quality of a root depends on the quality of the sources used and of the historical reconstruction. While the latter cannot be assessed without doing it again, the quality of the sources quoted by the roots of type 1 to 4 can be assessed more easily. Irrespective of its type, a root can rely on:
Quality B: primary historical sources and compilations (secondary sources)
Most roots use information derived from sources other than primary historical ones, such as:
Quality C: compilations (secondary sources) only
A short description of this kind of sources was given above. The most common case of a quality C root is that of most seismological compilations. They do represent the most widespread kind of root used by input PEC, because they supply an "in nuce" study for each earthquake; though they consist of a minimum amount of information only, the compilers found them very suitable for determining the earthquake parameters.
According to the methodology described above, the root class is assigned by the combination of type and quality: 1A, 1B, 1C and so on. App. A1 to A11 present 11 examples of root class assessment (all root classes except 4A, never assessed, and 5), concerning 11 entries from different PEC, together with the reasons why each was attributed to its root class.
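As an illustration, the set of root class labels can be generated by combining the types that carry a quality (1 to 4) with the three quality letters and adding the unqualified type 5. This is a sketch under that reading of the text; the names `ROOT_CLASSES` and `split_root_class` are hypothetical.

```python
from itertools import product

TYPES_WITH_QUALITY = (1, 2, 3, 4)  # quality is assessed for types 1 to 4 only
QUALITIES = ("A", "B", "C")

# "1A", "1B", ..., "4C", plus the unqualified class "5" (unknown root)
ROOT_CLASSES = [f"{t}{q}" for t, q in product(TYPES_WITH_QUALITY, QUALITIES)] + ["5"]


def split_root_class(rc):
    """Split a root class label into (type, quality); quality is None for '5'."""
    if rc not in ROOT_CLASSES:
        raise ValueError(f"unknown root class: {rc!r}")
    root_type = int(rc[0])
    quality = rc[1] if len(rc) > 1 else None
    return root_type, quality
```

Under these assumptions `len(ROOT_CLASSES)` is 13, and `split_root_class("4C")` returns `(4, "C")`.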
It has to be stressed that the root class is assigned to the root with respect to each earthquake, and not to the reference in general. This means that the same reference, such as Sulstarova and Kociu (1975b), can be classified as type 1 for the 1865, October 10, earthquake for which it supplies intensity data points and as type 2 for the 1858, September 20 earthquake, for which it gives only an effects description.
More often it will be possible to assess for the same reference a different quality, such as A and C. The most typical example is the case of a late 19th century compilation, which can summarise the information on a 16th century earthquake, but supply original, primary data for a coeval event (e.g.: Volger's compilation (1857) on Swiss earthquakes from the year 562 to about 1830 will be of quality C, while it will be of quality A for the years 1830-1857).
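Because the class belongs to the pair (reference, earthquake) rather than to the reference alone, a lookup keyed on both is a natural representation. The sketch below encodes only the two examples given in the text; the dictionary layout, the date strings and the function name are assumptions made for illustration, and the handling of the year 1830 (which the text places in both ranges) is an arbitrary choice.

```python
# Root type keyed by (reference, earthquake date), following the
# Sulstarova and Kociu (1975b) example in the text.
ROOT_TYPE_BY_ENTRY = {
    ("Sulstarova and Kociu (1975b)", "1865-10-10"): 1,  # intensity data points
    ("Sulstarova and Kociu (1975b)", "1858-09-20"): 2,  # effects description only
}


def volger_quality(year):
    """Quality of Volger (1857) as a root for an earthquake in a given year.

    Assumption: years from 1830 onwards are treated as coeval (quality A),
    earlier years as summarised from other sources (quality C).
    """
    if not 562 <= year <= 1857:
        raise ValueError("year outside the span covered by the compilation")
    return "A" if year >= 1830 else "C"
```

With these definitions `volger_quality(1601)` gives `"C"` and `volger_quality(1855)` gives `"A"`.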
Root classes can be put in a hierarchical scale by evaluating how far they are from one another; the best way is to estimate how much work has to be done to upgrade a root to an upper class. A closer look at the root classes suggests three levels (see table below):
(may become good with some effort)
(needs substantial investigation)
2b - Results
Application of the root scale
The root scale was applied to all input PEC of the BEECD WF, which contains 13,220 entries. The classification was performed by partners and other researchers who collaborated with the project: in all, approximately 10 people. The classification started after a workshop (July 1995, Milano) at which the methodological aspects of the root scale were first discussed and some cases from different PEC were then chosen as test samples.
A first version of the root scale was prepared and tested within the first year of the project. The tests done by partners and researchers revealed some weak points of the scale and called for some calibration. As a consequence, both the methodology and the number of root classes were adjusted, taking into account the suggestions coming from the partners (Musson, 1996; Albini et al., 1996b; Albini and Stucchi, 1997; Stucchi and Camassi, 1997).
The summary results of the application of the root scale to the input PEC and some comments are presented in the App. B1 to B20.
Here follow some case histories.
Some PEC revealed some peculiarities with respect to supporting data sets; therefore, the application of the root scale was in these cases guided by general considerations, not exactly foreseen by the methodology described in Part 2a.
None of the 649 VGL91 entries carries a root, so they should have been classified 5. It was preferred to classify them 4C by default, as the authors explicitly mention in the introduction that this PEC was compiled by merging "national" files supplied by European institutions.
The input PEC for the Iberian peninsula, MEM83, quotes its roots in a general reference list, without any explicit links between each root and each entry; therefore it should have been classified as in the previous case. However, its supporting data consists of a large number of seismological compilations and earthquake studies which have been considered of a uniform type and quality; therefore, a root class 3C was assigned by default to all its entries (1,149).
LAA96 is in a situation very similar to that of MEM83, that is, it has no explicit links between roots and entries. Since the supporting data sets appear to be of a better quality, all its 417 entries were given a root of class 3B, with the exception of 41 entries for which this PEC supplies the list of references, which were assessed as of class 3A.
The set of "Additional Determinations" (AD.DET), made up of 673 entries coming from three files (ADBUL from Bulgaria, 197 entries, ADLEN from Austria, 116 entries, and ADVCA, 360 entries, from Belgium), was assigned to root class 5.
When the roots were available to the researchers performing the classification, the application of the root scale was done on a steady and reliable basis.
Sometimes they were not, and this was not an infrequent case. However, it was beyond the scope and possibilities of the project to retrieve all the unavailable roots, which would have meant, in most cases, performing the investigation again from the beginning. Whenever roots were not available on the operators' desks, the choice was to assign the more conservative level for both type and quality.
Same item, different root class
The application of the scale was based on the analysis of the root supporting a specific entry. In many cases the same item acts as root for many entries; therefore it was necessary to give the root class to the part or section of the item chosen as the root for each entry. This means that a correct root classification may assign different root classes to the same item with respect to different entries. Some case histories are described in detail in App. B.
Problems in applying the root scale
a) Same earthquake, same root, different Rc
The study by Rethly (1952) is the root for the same earthquake in different PEC in a few cases. When assessing the root quality, the researcher who classified ZSA88 considered more important the fact that Rethly had in his hands a set of primary sources, and assessed for this root class 2A or 2B. On the other hand, the researcher who classified LAB95 took into consideration the fact that Rethly is a seismological compilation, re-elaborating both primary sources and compilations and gave to such root class 2B or 2C. However, this happened only for a few roots and the difference did not affect the following step (choice among multiple determinations).
b) Same earthquake, similar roots, different Rc
In other cases, supporting data sets (roots) for the same earthquake are quite similar and the differences in classification depend on the slightly different points of view of the researchers who did the assessment. This is mostly the case of some seismological compilations, such as Sieberg (1940b) or Rethly (1952), Sieberg (1940b) or Montandon (1953), etc., dealing with the same earthquake. Once again, the different Rc assessment did not affect the following step.
Some difficulties were met in identifying the main root to be classified in case of multiple references; as explained in Part 2a - Methodology, it was not always possible to find out the root and a "ghost" or "hidden" root (HID, type 3) was assessed to overcome this situation.
Inside the BEECD Working File the hidden roots are about 46% of the roots of type 3 (1,169 HID roots out of 2,496 roots of type 3). There are only 71 HID roots classified as 4C, all of them in SHA74. Though it was not possible to identify the main root, all references were parametric earthquake catalogues, making type 4 prevail over type 3.
The classification of hidden roots differed only with respect to quality, which was assessed mainly on the basis of the quality most represented inside the supporting data set. HID roots are almost evenly divided between quality B (50%) and quality C (44%).
This problem does not concern the roots of the input PEC, but rather the studies, analysed in the frame of Task 3, which prove that an earthquake listed in the input PEC is fake, because some original information was later distorted: in this case the study will not give datapoints (therefore, type 2). As for the quality, in most cases the source of distortion will be found far in time from the event; consequently, no primary source should come into play (therefore, quality C).
However, these studies do use primary sources close to the distortion, which actually represents the "event" which gave birth to the entry; moreover, in some cases the verdict of "fake" is issued after checking some primary sources contemporary to the presumed quake, which turn out to be silent with respect to it. Therefore, considering that such studies supply an effective contribution which in most cases cannot be upgraded, they were assigned quality B in most cases.
Type and quality were uniformly assessed by the different researchers involved in this particular task, though agreement on the meaning and application of the root scale was reached only after workshops (July 1995, Milano; November 1995, Milano) and bilateral meetings and discussions among partners and operators.
The supporting data sets which gave origin to each PEC are reflected by some "clusters" of classes in some of them (e.g.: 5 classes for SUK75, 4 classes for MUS94 and LEY86, 3 classes for SHA74 and HER95, 2 classes for COM82).
When 7 or more classes are present, this could be seen as a consequence of a larger supporting data set available for this PEC and perhaps also of a better knowledge of its roots by the researcher who did the classification (e.g.: 7 classes for GRU88, LAB95, PAP89; 8 classes for POS85, RIB82 and ZSA88; 10 classes for KSH82).
In all, the application of the root scale was successful from a general point of view and within the scope of this task of the project. The root scale turned out to be a useful tool, though not all the 13 classes, obtained by combining type and quality, are significantly present in the WF.
Root class and root level distribution
The distribution of root classes within each PEC for the time-window 1400-1899 (total figures and percentages on each input PEC) is given in Tab. 2.3, Fig. 2.1 and Fig. 2.2. As can be seen, the distribution varies substantially from one PEC to another. With respect to quality, the best ranked input PEC are MUS94, LAB95 and ZSA88.
The distribution pattern is not uniform in time, and some peculiarities appear when looking at different time-windows. Root class 1A (1%) is represented in more than one PEC only between 1850 and 1899. With the exception of one root of POS85 for the 18th century, between 1400 and 1849 all roots of class 1A belong to a single PEC, MUS94. Between 1500 and 1599 the three root classes of type 2 account for 42%, by far the most important cluster inside the general time-window (1400-1899).
The time-window which shows the widest distribution of roots among classes is 1850-1899, while class 4C, the most represented one overall, is most prominent between 1700 and 1799 (48.5%); in the same time-window class 2C accounts for only 17%. As an example, Fig. 2.3 and Fig. 2.4 show the same analysis as Fig. 2.1 and Fig. 2.2 for the time-window 1500-1599: in this time-window HOU92 shows a better performance.
Obviously, the overall situation improves if we consider only large earthquakes (Ix/Io ≥ 7/8 or M ≥ 5.2; Tab. 2.4, Fig. 2.5 and Fig. 2.6). The percentages of the best root classes are higher than in the whole BEECD WF; GRU88 shows a quality peak. Here, again, the distribution pattern is not uniform in time: Fig. 2.7 and Fig. 2.8 show the same analysis as Fig. 2.5 and Fig. 2.6 over the time-window 1500-1599.
Looking at the WF 1400-1899 as a whole (Tab. 2.5, Fig. 2.9 and Fig. 2.10; percentages on the WF figures), the most represented class is 4C (39%), followed by class 2C (23%). The input PEC that mainly contributes to this uneven situation is POS85, which contains 63% of the roots located in class 4C and 74% of the roots assessed in class 2C. Root class 3C is the third most important root class (13%), followed by root class 2B (9%).
Finally, Fig. 2.11, Fig. 2.12 and Fig. 2.13 give an idea of the distribution of root type, quality and level, respectively, in the time-window 1400-1899.
The qualified BEECD Working File
The final goal of Tasks 1 & 2 was to compile a comprehensive list of earthquakes, to find out the best supporting data set (root) available for each of them and to establish priorities for the investigation.
For this purpose, multiple determinations (MD) of the same event were sorted out by selecting one entry from each family (same Fn) by means of the parameter Rc. That is, for each family of MD the entry carrying the best Rc was selected and assigned the code F (first) in the parameter "BEECD preference code" (P, first column). When two or more entries had the same Rc, the entry belonging to the oldest PEC was adopted as F.
Obviously, single determinations (SD) were all given F by default.
As a general rule, DD entries were never adopted as F; entries provided by the files defined as "source of additional determinations" (AD.DET) were considered separately.
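The selection of the F entry within each family can be sketched as below. Several points are assumptions made only for this illustration: the ranking of root classes (lower type first, then quality A before B before C, with class 5 last) is one plausible reading of the scale, "oldest PEC" is taken to mean the earliest publication year, and the field names `fn`, `pec`, `pec_year` and `rc` as well as the sample entries are hypothetical. The special handling of DD and AD.DET entries is not modelled.

```python
from collections import defaultdict


def rc_rank(rc):
    """Sort key for a root class label; smaller means better (assumed order)."""
    root_type = int(rc[0])
    quality = rc[1] if len(rc) > 1 else "Z"  # class "5" carries no quality
    return (root_type, quality)


def select_first(entries):
    """Return, for each family number, the entry to be flagged F."""
    families = defaultdict(list)
    for entry in entries:
        families[entry["fn"]].append(entry)
    # Best Rc wins; ties go to the entry from the oldest PEC.
    return {
        fn: min(members, key=lambda e: (rc_rank(e["rc"]), e["pec_year"]))
        for fn, members in families.items()
    }


# Invented sample entries, not taken from the Working File.
entries = [
    {"fn": 1, "pec": "POS85", "pec_year": 1985, "rc": "4C"},
    {"fn": 1, "pec": "SHA74", "pec_year": 1974, "rc": "2B"},
    {"fn": 2, "pec": "ZSA88", "pec_year": 1988, "rc": "2C"},
    {"fn": 2, "pec": "RIB82", "pec_year": 1982, "rc": "2C"},
    {"fn": 3, "pec": "LEY86", "pec_year": 1986, "rc": "5"},
]
chosen = select_first(entries)
```

In this sample, family 1 is resolved by the better root class (SHA74, 2B), family 2 by the tie-break on PEC age (RIB82), and family 3 is a single determination that is selected by default.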
A portion of the qualified WF 1400-1899 is given in App. C.
Examples of references (supporting data sets) from the BEECD Working File.
Or Ye Mo Da Ho Mi Ax Ref RC Om Ix Io Lat Lon Mm
LEY86 1593 02 06 NOERDLINGEN/RIES S40 0 70 48.833 10.500 0
Historical earthquake investigation
Or Ye Mo Da Ho Mi Ax Ref RC Om Ix Io Lat Lon Mm
ZSA88 1590 09 15 AUSTRIA 67 90 48.140 16.120 0
Or Ye Mo Da Ho Mi Ax Ref RC Om Ix Io Lat Lon Mm
POS85 1898 03 04 21 05 CALESTANO 36 0 70 44.633 10.167 0
Parametric earthquake catalogue
Or Ye Mo Da Ho Mi Ax Ref RC Om Ix Io Lat Lon Mm
ZSA88 1830 08 02 11 BRESTANICA 70 0 70 46.000 15.500 50
0080 02 08 1830 10 00 00 46.000 15.500 c 7.00 0.0:000 00.0 0.00 0.00 0.00 00.000 181 1,131 LJU Brestanica dolina Save, (Val)
An example of how an individual reference is chosen as the main one (the root) among multiple references
Or Ye Mo Da Ho Mi Ax Ref1 Ref2 RC Om Ix Io Lat Lon
PAP89 1780 10 HIERAPETRA 72 129 100 34.900 25.800
This entry of the catalogue by Papazachos and Papazachou (1989) is supported by the data contained in two texts of seismological literature.
Code "72" points to an earthquake compilation by Mallet R., 1854. Third Report on the facts of Earthquakes Phenomena, Catalogue of recorded Earthquakes from 1606 B.C. to A.D. 1850. Report of the 22nd Meeting of the British Association for the Advancement of Science, pp. 1-176, London, who supplies on this earthquake the following information:
|Locality||Direction, duration and number of shocks||Meteorological and other phenomena||Authority|
Probably at the
preceded by others for
Eropeter with its
garrison of 300 Turks
was swallowed up
villages and their
of 11 Nov., p. 56,
quoting "la rubrique"
of Leghorn of the
15th Oct., which
Code "129" points to Sieberg A., 1932b. Untersuchungen über Erdbeben und Bruchschollenbau im Östlichen Mittelmeergebiet, Jena, who supplies on this earthquake the following information: