TCA Letters to the Editor
Press Release
Title: Mind the gap!
Date: March 2006
Organization: Stuttgart University Library
Letter:
Elsevier representatives, when confronted by librarians at public Scopus road shows and expert meetings with the results of Louise Deis' and David Goodman's investigations into the completeness of Scopus as published in TCA (original review and present update), have repeatedly denounced these as untrue or misleading. I therefore thought it worthwhile to share my own observations with your readers.
My conclusions presented here are based on my own extensive review of WoS and Scopus from Feb 15 to March 31, 2005, and again from March 10 to March 21, 2006.
To assess completeness, one method used by LFD and DG was to compare sample university Scopus and WoS item counts per year. For 2002 and earlier, between 15% and 40% of the items counted by WoS seemed to be missing in Scopus. However, we could demonstrate that the large discrepancies found in affiliation searches were mainly an artefact of a change in indexing practice: from 2003 onwards, Scopus captured the affiliation information for all authors (as does ISI in WoS), whereas prior to 2003 only that of the first author was captured. When Elsevier fills in gaps for 2002 and earlier, all affiliations will be captured, which tends to blur the discrepancy somewhat, but most of the earlier content will remain as is. It has always been difficult to search for and collect all papers authored by an institution, but in Scopus the degree of incompleteness varies considerably from year to year because of this change in indexing practice. Given the large percentage of coauthored papers that involve different institutions, the magnitude of the bias thus introduced into the statistics of institutional indicators is considerable (underestimate by 30%, depending on the data segment looked at).
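The effect of first-author-only affiliation indexing can be illustrated with a minimal sketch. The papers and institution names below are invented for illustration and are not taken from Scopus or WoS; the point is only that the same set of papers yields different institutional counts depending on which affiliations were captured.

```python
# Toy records: each paper lists the affiliation of every author, first author first.
# The papers and institution names are invented for illustration.
papers = [
    ["Univ A", "Univ B"],
    ["Univ B", "Univ A"],
    ["Univ A"],
    ["Univ C", "Univ A", "Univ B"],
]

def count_all_authors(papers, institution):
    """Count papers where any author is affiliated with the institution."""
    return sum(1 for affiliations in papers if institution in affiliations)

def count_first_author_only(papers, institution):
    """Count papers as seen when only the first author's affiliation is indexed."""
    return sum(
        1 for affiliations in papers
        if affiliations and affiliations[0] == institution
    )

full = count_all_authors(papers, "Univ A")
partial = count_first_author_only(papers, "Univ A")
shortfall = 1 - partial / full
print(f"all authors: {full}, first author only: {partial}, shortfall: {shortfall:.0%}")
```

In this toy set the shortfall is far larger than the 15% to 40% observed above, simply because most of the invented papers are coauthored across institutions; the real figure depends on the collaboration pattern of the institution in question.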
However, LFD and DG also demonstrated gaps in content coverage directly by looking at specific samples of journals.
Elsevier representatives specifically challenged the statement that most of the specific gaps listed in the original review were still not filled a year later (e.g. they told us that of a hundred missing items per title perhaps only a few were still missing). I therefore examined each of the cases listed in Appendix A of the original review again, on a volume and issue basis. My result is only slightly different, although two further months have passed: of the 11 sample journals with partially missing 1996+ content, 6 have been corrected, 2 partially corrected (ASME: Noise Control & Acoustics, still missing vols. 25 and 28; Monthly Review, still missing most of 2000 and 2001), and 3 remain largely uncorrected (among them the Elsevier title Organizational Dynamics, missing almost all of 1996-2000; the European Journal of Finance, missing all of 1996-2002; and Journal Europeen d'Hydrologie, still missing all issues of 1996-2001). One title (Molecular Aspects of Medicine) only had a few individual missing articles (now filled in), but it sometimes publishes only one large review article per volume, so that it appeared as if a lot of articles were missing. Also, for the "Journal of Comparative Physiology A", ostensibly large gaps were caused by variant title forms not matched by the source index (only properly truncated searches on source title via basic or advanced search catch these variants); in any case, some small gaps have been filled here as well.
We also examined a randomly generated sample of 147 journals (after eliminating 12 only recently added titles from a slightly larger group) and three subgroups of ca. 50 each to check for sample variations. Within the large group, our examination revealed gaps in content from 1996 onwards for 25% (+/-5%) of the titles. The missing percentage ranged from 2% to 80% (the median value for the titles affected was 37% (+/-3%)). In total, the missing content represents 7% (+/-2%) of the estimated total article count in the sample, which was 125000.
A year ago, in March 2005, we examined a set of 170 ISI physics journals in both WoS and Scopus and compared article counts vs. publication year for 1996-2004. 40% of these core physics journals, especially from society publishers but even from Elsevier, appeared to be missing some content in Scopus; 25% showed marked gaps (missing many issues or complete volumes), especially around 1999. You could easily miss some Nobel Prize winners' most influential articles from that time. Another obvious problem was duplicate entries, which affected about 25% of journals. The Scopus team was alerted to these serious deficiencies. A year later, fortunately, most of the duplicates have been sorted out, and many gaps filled in. In fact, using the Advanced Search capabilities in Scopus, it is quite easy to check how much content from e.g. 1996-2003 has been added to Scopus after January 2005, that is, within 415 days (you will remember that Scopus promised that, for the overwhelming majority of journals, including all the top journals, they would have cover-to-cover coverage by the end of January [2005]). It is about 5% of present content (4.5%/year), with somewhat higher values for Chemistry (9%), Physics (7%), Social Science (7%) and Mathematics (9%), and lower values around 3% for Health, AgriBio and Life Sciences. The rate at which missing material is added has slowed down by now (to 1/3 during the last 3 months), and we estimate that it will take Elsevier 2-4 more years to fill in the rest. Of the 477000 articles added to 1996-2003 content since Feb 2005, only about 9000 were from recent (2005) additions to the Scopus title list (some 850 titles over the original 14000, if we do not count mere title changes and variants of titles already included); the rest were from the original title list of Nov 2004.
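For readers who want to retrace the arithmetic behind these rates, here is a minimal sketch using only the rounded figures quoted above. The variable names are mine, and because the inputs are rounded ("about 5%", "about 9000"), the results are approximate.

```python
# Rounded figures quoted in the letter.
share_added = 0.05         # about 5% of present 1996-2003 content added after January 2005
days_elapsed = 415         # length of the observation window in days
articles_added = 477_000   # articles added to 1996-2003 content since Feb 2005
from_new_titles = 9_000    # of which from titles newly added to the list in 2005

# Scale the share added over 415 days to a 365-day year.
annual_rate = share_added * 365 / days_elapsed

# Average number of backfilled articles per day over the window.
per_day = articles_added / days_elapsed

# Fraction of the backfill that came from newly added titles.
new_title_share = from_new_titles / articles_added

print(f"annualized backfill rate: {annual_rate:.1%}")
print(f"average articles added per day: {per_day:.0f}")
print(f"share from newly added titles: {new_title_share:.1%}")
```

With the rounded 5% as input, the annualized rate comes out slightly below the 4.5%/year quoted above, which is consistent with the underlying share being a little over 5%.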
But even in the above-mentioned core physics journal list, some gaps still remain to be filled. Some examples: Journal de Physique IV - Proceedings is 4 volumes or 6 months (!) behind in Scopus, and of the 40 vols. published in 2002-2005, 22 are missing. Also missing: the precursors of Eur. Phys. J. A-E, Zeitschrift f. Physik A-E 1996-97, Nuovo Cimento D 1996-98 and the French title Microscopy, Microanalysis, Microstructures 1996-97. The prestigious Journal of High Energy Physics (JHEP) is missing all content from 12(1997) to 12(1999). Eur. Phys. J. C is missing vols. 15 and 23; Acta Physica Polonica B is missing content for 1996-2001 and Acta Physica Polonica A for 2001-2002; Physica Scripta T is missing vols. 74-78 (1998); and IEEE Transactions on Nanotechnology lacks vol. 1 (2002). Of the Journal of Magnetic Resonance, Subseries A is missing for 1996.
Former title forms of merged and renamed journals are often overlooked, which is also a problem in chemistry.
Of the precursors of Physical Chemistry Chemical Physics, the Faraday Transactions 1996-1998 are still completely missing, while Berichte der Bunsengesellschaft/Physical Chemistry Chemical Physics is missing 80% of its content from 1996-1998. For 1996 and 1997, the precursors of Eur. J. Inorg. Chem. and Eur. J. Org. Chem. are also missing, among them Chemische Berichte, Recueil, Liebigs Annalen, Recueil, Recueil des Travaux Chimiques des Pays-Bas, Bulletin de la Société Chimique de France, and Bulletin des Sociétés Chimiques Belges.
Another problem: articles from J. Prakt. Chemie from 1996-2000 are incorrectly listed under the later title "Advanced Synthesis & Catalysis". This has nothing to do with established citation practices and is a disservice to the user.
This brings us to the "Source Index", which as handled by Scopus is not as useful as it could be and is even misleading for the unsuspecting user, especially in the case of subseries with variant title forms that are not covered: compare e.g. SRCTITLE(Physical Review E) with its appearance in the SOURCE Index. In this context, it is remarkable that a promise made in the November 2004 issue of the InsideScopus Newsletter to implement Journal Volume and Issue Browse ("Shortly you'll be able to browse journals on Scopus, from 1996 onwards, by volume and issue. This will allow you to browse with greater accuracy and precision through the nearly 14,000 peer-reviewed journals.") has not been fulfilled, and Scopus salespeople now say they don't plan to implement it, allegedly because the market does not demand it. We beg to differ: their claim is very strange in view of their former announcement and of successful implementations like those of EBSCO in Academic and Business Search Premier. I suspect a more likely reason is that 1996+ content is still far from complete, and that this would become plainly obvious if the feature were implemented.
Another problem concerns metadata quality and article-level linking. Disappointingly, and despite a very promising exchange with the dedicated Scopus Customer Service team, 11 months after alerting the team to the inconsistent treatment and missing OpenURL encoding of article numbers that replace page numbers in some AIP and all APS journals (e.g. Phys Rev Lett), this problem still affects most APS articles published between 2001 and 2004, as well as recent ones, leading to missing "View at Publisher" links, nonresolvable OpenURL metadata, or both. In contrast, ISI has no problem with these titles and provides link resolvers with correct metadata using the ARTNUM field. The Scopus DTD is apparently also not able (or perhaps rather not yet properly used) to represent these article numbers in the citations provided, making exported data unusable for reference purposes.
We also noticed that Scopus seems to generate "View at Publisher" links tentatively by looking for matches in the CrossRef database on just first author (ignoring initials) plus first page, if DOIs are not already included in the metadata and no other processing rules are available. This generates a lot of false hits, especially with common Chinese names (e.g., have a look at Acta Physica Sinica vol. 52!). CrossRef's own fuzzy matching algorithm is much more reliable than the quick and dirty approach used by Scopus, which should be abandoned.
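Why a key of surname plus first page is so prone to false hits can be shown with a small sketch. This is not the actual Scopus or CrossRef logic, which I can only infer from the outside; the registry entries, journal names and DOIs below are invented placeholders, and the "strict" key is merely one conceivable alternative.

```python
from collections import defaultdict

# Invented registry entries: (journal, volume, surname, initials, first_page, doi).
# Journal names and DOIs are placeholders, not real identifiers.
registry = [
    ("Journal X", 52, "Wang", "L", 101, "doi-1"),
    ("Journal Y", 17, "Wang", "X", 101, "doi-2"),
    ("Journal X", 52, "Li", "J", 230, "doi-3"),
]

def build_index(entries, key_fn):
    """Group the DOIs of all entries under the lookup key computed by key_fn."""
    index = defaultdict(list)
    for entry in entries:
        index[key_fn(entry)].append(entry[-1])
    return index

# Naive key: first author surname (initials ignored) plus first page.
naive = build_index(registry, lambda e: (e[2], e[4]))

# Stricter key (an assumption): journal, volume, surname, initials and first page.
strict = build_index(registry, lambda e: (e[0], e[1], e[2], e[3], e[4]))

naive_hits = naive[("Wang", 101)]
strict_hits = strict[("Journal X", 52, "Wang", "L", 101)]
print("naive candidates:", naive_hits)
print("strict candidates:", strict_hits)
```

The naive key returns two candidate DOIs for one citation, so a link built from the first match may point to the wrong article; the more common the surname, the more often such collisions occur.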
On the positive side, the lack of prompt updating for core journals in Scopus, which was still a big problem a year ago, has now disappeared. Both WoS and Scopus are in general only 1-2 issues behind for journals appearing on a weekly schedule; in addition, Scopus salespeople have announced that they will soon add content from "Online first" journal areas (pre-publication ahead of schedule as articles are accepted) as well.
The relative merits of citation indexing and the conceptualization of search strategies in Scopus vs. WoS are a worthwhile but quite complex matter that cannot be addressed adequately in this letter. However, the introduction of a proper SAME operator in Scopus is long overdue, as its absence produces a lot of annoying spurious hits for all fields that may occur multiple times per record (such as cited references or author affiliations). I have been asking for this for a year now, and the proximity operators are no substitute because in Scopus' implementation they cannot be combined with truncation.
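The difference a SAME operator makes for repeating fields can be sketched as follows. This is only an illustration of the matching semantics, not of how Scopus or WoS implement their search engines; the record and the function names are invented.

```python
# One invented record with a repeating affiliation field.
record = {
    "affiliations": [
        "Institute of Physics, University of Hamburg",
        "Department of Chemistry, University of Stuttgart",
    ]
}

def match_record_level(record, terms):
    """Plain AND over the field: each term may come from a different occurrence."""
    text = " ".join(record["affiliations"]).lower()
    return all(term.lower() in text for term in terms)

def match_same_occurrence(record, terms):
    """SAME-style AND: all terms must appear within one occurrence of the field."""
    return any(
        all(term.lower() in occurrence.lower() for term in terms)
        for occurrence in record["affiliations"]
    )

terms = ["physics", "stuttgart"]
record_level = match_record_level(record, terms)
same_level = match_same_occurrence(record, terms)
print("record-level AND matches:", record_level)
print("SAME-style AND matches:", same_level)
```

A search for physics at Stuttgart hits this record under a plain AND, although no author is affiliated with a physics institute in Stuttgart; the SAME-style match correctly rejects it.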
In summary, we can only reiterate: "Mind the Gap!" Even if some indications of missing content turned out to be spurious (but nevertheless serious because of the bias and inhomogeneity introduced into searches of the Scopus database), and even though it can be shown that Elsevier has undertaken considerable efforts to fill the gaps, it is undeniable that we are dealing with a seriously incomplete and inhomogeneous index. We may safely infer that, 16 months after the official launch of Scopus, around 7% of 1996+ content is still missing, while gaps in individual journal runs, including even core journals, are unpredictable and can be considerably larger. In Scopus presentations it is now reluctantly acknowledged that identifying and filling the remaining gaps (especially in fields such as physics, chemistry, and engineering) is a major task of Scopus Development. According to Elsevier, there is now a team of 5 persons involved in identifying and filling the gaps. What we are missing, however, is a clear concept and strategy for achieving this aim in the near future. Until then, librarians will be well advised to hesitate before abandoning other indexes in favor of Scopus. The recently formed Content Selection Committee should closely supervise the steps being taken to remedy these deficiencies. It would also be helpful to clarify the content coverage policy. New titles suggested for inclusion and submitted by October of each year will be evaluated, and approved titles will be added in the first quarter of the subsequent year. But will backfile content for these journals be added as well? Title lists provided by the publisher should clearly indicate the time span for which full coverage can be expected, as is done by other major database providers (e.g. EBSCO).
Bernd-Christoph Kaemper,
Electronic Resources Librarian,
Stuttgart University Library,
P.O. Box 104941, 70043 Stuttgart, Germany

