REMOVE DUPLICATES

Abbreviation:

RD

Command Format:

RD
RD Sn
RD Sn FROM <file number>,<file number>, etc.

REMOVE DUPLICATES is the most frequently used duplicate detection command. The RD command creates a set of unique records in which only one record from each group of duplicate citations is retained. The format for the command is:

RD Sn

RD entered without a set number defaults to the last set created.

When duplicate records are identified, the record to retain is chosen according to the order in which the files were entered in the BEGIN command. For example, if you enter BEGIN 154,72, records from File 154 (MEDLINE®) are given priority over records from File 72 (EMBASE®). You can change this order of priority with the SET FILES command.
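As an illustration only, the retention rule can be sketched in Python. This is not Dialog's implementation: the `Record` layout, the title-only match key in `dup_key`, and the function names are hypothetical, and Dialog's real duplicate detection compares more than the title. The sketch shows just the priority idea: among duplicates, the record from the file entered first is the one kept. The two sample titles are taken from the IDENTIFY DUPLICATES example later in this document.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    file: int        # Dialog file number, e.g. 154 or 72
    accession: str   # accession number within that file
    title: str


def dup_key(title: str) -> str:
    """Hypothetical match key: lowercase title, punctuation dropped, leading article removed."""
    words = re.sub(r"[^a-z0-9 ]", " ", title.lower()).split()
    if words and words[0] in ("a", "an", "the"):
        words = words[1:]
    return " ".join(words)


def remove_duplicates(records: list[Record], file_order: list[int]) -> list[Record]:
    """Keep one record per duplicate group, preferring files listed earlier in file_order."""
    rank = {f: i for i, f in enumerate(file_order)}
    kept: dict[str, Record] = {}
    # Visit records in file-priority order so the first record seen for each key wins.
    for rec in sorted(records, key=lambda r: rank[r.file]):
        kept.setdefault(dup_key(rec.title), rec)
    return list(kept.values())


records = [
    Record(72, "10679857", "Anticoagulation to prevent stroke in atrial fibrillation "
                           "and its implications for managed care"),
    Record(154, "09465755", "Anticoagulation to prevent stroke in atrial fibrillation "
                            "and its implications for managed care."),
]
print([r.file for r in remove_duplicates(records, [154, 72])])  # [154], as after BEGIN 154,72
print([r.file for r in remove_duplicates(records, [72, 154])])  # [72], as after BEGIN 72,154
```

Reordering `file_order` plays the role that SET FILES plays online: the same duplicate pair yields a different surviving record.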

The records in the RD set are in accession number order. Because the SORT command in OneSearch works only with publication year (PY) and publication date (PD), you can use the IDENTIFY DUPLICATES (ID) command to obtain a set that is approximately sorted by title (initial articles are ignored). Simply apply the ID command to the set that resulted from the RD command, as shown in the second group of records in the following example.

Duplicates can be removed from a single file, as well as from multiple files. You can also use the RD command with the FROM option to remove duplicates FROM particular files (e.g., RD S3 FROM 6,8).

Records in a set created with the RD command can be used in later search statements.

Note: If you apply the RD command to a search that includes one or more files that do not offer duplicate detection, a system message notifies you of that fact. The system then processes the remaining files that do offer duplicate detection. All records from the unsupported file(s) are retained in the RD set. Enter the HELP DUP command online to display a list of files that do not offer duplicate detection.
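The effect of this note can be sketched the same way. In this hypothetical Python illustration, records are `(file_number, title)` pairs, duplicates are matched on a simplified normalized title, and file 999 is a made-up number standing in for a file that does not offer duplicate detection. Records from that file skip the matching step entirely, so all of them end up in the result.

```python
import re


def norm_title(title):
    """Simplified, hypothetical match key: lowercase words with a leading article dropped."""
    words = re.sub(r"[^a-z0-9 ]", " ", title.lower()).split()
    if words and words[0] in ("a", "an", "the"):
        words = words[1:]
    return " ".join(words)


def rd_with_unsupported(records, file_order, supported):
    """records: list of (file_number, title); supported: files that offer duplicate detection."""
    rank = {f: i for i, f in enumerate(file_order)}
    seen = set()
    result = []
    for file_no, title in sorted(records, key=lambda r: rank[r[0]]):
        if file_no not in supported:
            result.append((file_no, title))  # no duplicate detection: always retained
            continue
        key = norm_title(title)
        if key not in seen:  # first (highest-priority) copy of this title is kept
            seen.add(key)
            result.append((file_no, title))
    return result


records = [
    (72, "Example title"),
    (154, "The example title."),
    (999, "Example title"),
    (999, "Example title"),
]
kept = rd_with_unsupported(records, file_order=[154, 72, 999], supported={154, 72})
print([f for f, _ in kept])  # [154, 999, 999]
```

The File 72 copy is dropped as a duplicate of the higher-priority File 154 record, while both copies from the unsupported file survive.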

• To REMOVE DUPLICATES from a search done in OneSearch and then use ID to sort the results by title:

?show files;display sets 
File 154:MEDLINE(R) 1985-1997/Aug W3 
         (c) format only 1997 Dialog Corporation 
File  72:EMBASE 1985-1997/Jun W4 
         (c) 1997 Elsevier Science B.V. 
 
Set     Items   Description 
S1        643   ASPIRIN AND DIABET? 
S2        101   S1/ENG,1996:1997 
 
      Set  Items  Description 
      ---  -----  ----------- 
?rd s2 
...examined 50 records  (50) 
...examined 50 records  (100) 
...completed examining records 
      S3      70  RD S2 (unique items) 
?type s3/6/all 
 
 3/6/1     (Item 1 from file: 154) 
09118702   97319150 
  In experimental diabetes the decrease in the eye of lens carnitine levels 
is an early important and selective event. 
 
3/6/2     (Item 2 from file: 154) 
09109902   97243938 
  Outcome of unstable angina in patients with diabetes mellitus. 
 
3/6/3     (Item 3 from file: 154) 
09100806   97211065 
  Progression  of distal symmetric polyneuropathy during diabetes mellitus: 
clinical,  neurophysiological,  haemorheological  changes  and  self-rating 
scales of patients. 
      . 
      . 
      . 
?id s3 
...examined 50 records  (50) 
...completed examining records 
      S4      70  ID S3 (sorted in duplicate order) 
?type s4/6/1-4 
 
 4/6/1     (Item 1 from file: 154) 
08896349   97077072 
  AL0671,  a  new potassium channel opener, inhibits nonenzymatic glycation 
of protein and LDL oxidation. 
 
4/6/2     (Item 2 from file: 154) 
08889522   97124614 
  An  analysis  of  perioperative  surgical  mortality and morbidity in the 
asymptomatic    carotid    atherosclerosis   study.   ACAS   Investigators. 
Asymptomatic Carotid Artheriosclerosis Study. 
 
4/6/3     (Item 3 from file: 154) 
08607141   96260029 
  Anticoagulation: risks and benefits in atrial fibrillation. 
 
4/6/4     (Item 4 from file: 154) 
08825920   96430980 
  Anticoagulation   for   atrial  fibrillation:  epidemiology  informing  a 
difficult clinical decision.

 

IDENTIFY DUPLICATES

Abbreviation:

ID

Command Format:

ID
ID Sn
ID Sn FROM <file number>,<file number>, etc.

The IDENTIFY DUPLICATES command can be used in single or multiple files to create a sorted set of records in which duplicates are grouped together. The ID command lets you identify duplicate citations easily while still retaining all of the records retrieved by your search. Unlike REMOVE DUPLICATES (RD), which automatically eliminates duplicate records from a set, the ID command does not remove any records from your search results.

ID entered without a set number defaults to the last set created.

The ID command creates a set of records that is approximately sorted by title. The order occasionally departs from strict alphabetical order because duplicate detection takes into account alternate spellings, minor variations in titles, and leading articles such as "the" and "a."
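For readers who post-process results outside Dialog, the following Python sketch approximates this behavior under simplifying assumptions: duplicates are matched only on a normalized title (lowercase, punctuation removed, a leading "a", "an", or "the" dropped), groups are ordered by that key, and records within a group follow the file order. Dialog's real matching also handles alternate spellings and other title variations, which this sketch does not attempt; all names are hypothetical. The sample titles come from the OneSearch example at the end of this section.

```python
import re
from itertools import groupby


def title_key(title):
    """Hypothetical sort/match key: lowercase, punctuation removed, leading article dropped."""
    words = re.sub(r"[^a-z0-9 ]", " ", title.lower()).split()
    if words and words[0] in ("a", "an", "the"):
        words = words[1:]
    return " ".join(words)


def identify_duplicates(records, file_order):
    """Return every record, sorted by title key and grouped so duplicates sit together."""
    rank = {f: i for i, f in enumerate(file_order)}
    # Within a group of duplicates, records follow the file order (e.g. from BEGIN 72,154).
    ordered = sorted(records, key=lambda r: (title_key(r[1]), rank[r[0]]))
    # groupby merges only adjacent items, which is why the list is sorted first.
    return [list(g) for _, g in groupby(ordered, key=lambda r: title_key(r[1]))]


records = [
    (154, "Antioxidants diminish developmental damage induced by high glucose and "
          "cyclooxygenase inhibitors in rat embryos in vitro."),
    (154, "Anticoagulation to prevent stroke in atrial fibrillation and its "
          "implications for managed care."),
    (72, "Acute coronary syndromes in the United States and United Kingdom: "
         "A comparison of approaches"),
    (72, "Anticoagulation to prevent stroke in atrial fibrillation and its "
         "implications for managed care"),
]
groups = identify_duplicates(records, file_order=[72, 154])
print([[f for f, _ in g] for g in groups])  # [[72], [72, 154], [154]]
```

Unlike the RD sketch, nothing is discarded: the duplicate pair simply ends up adjacent, with the File 72 record first, as in items 3/6/3 and 3/6/4 of the example.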

By displaying the ID set, you can decide which records to TYPE, DISPLAY, or PRINT. The SET FILES command can be used to change the order in which records are sorted in an ID set. You can also use the ID command on a set from which duplicates have already been removed; this sorts the set approximately by title.

If you typically post-process your search results (e.g., format them into customized bibliographies with word-processing software), you can use the ID command to gather duplicate records, such as records for various editions of a book, and later combine them into a single record that contains the best features of each.
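One possible way to do that combining step is sketched below in Python, under stated assumptions: each duplicate is a dictionary of fields (the field names and values are placeholders, not Dialog output), the group is already in priority order, and "best" is taken to mean the first non-empty value for each field. A real bibliography workflow might prefer a different rule, such as keeping the longest value.

```python
def merge_group(group):
    """Combine one group of duplicate records into a single record.

    group: list of dicts, one per duplicate, in priority order (highest first).
    Hypothetical rule: for each field, keep the first non-empty value found.
    """
    merged = {}
    for rec in group:
        for field, value in rec.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged


group = [
    {"title": "Title as given in file 72", "abstract": "", "descriptors": "EMBASE terms"},
    {"title": "Title as given in file 154", "abstract": "Abstract text", "descriptors": "MeSH terms"},
]
print(merge_group(group))
# {'title': 'Title as given in file 72', 'descriptors': 'EMBASE terms', 'abstract': 'Abstract text'}
```

The first record supplies the title and descriptors; the empty abstract is filled in from the second record.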

You can also use the ID command with the FROM option to group duplicates FROM particular files (e.g., ID S3 FROM 6,8).

Note: If you apply the ID command to a search that includes one or more files that do not offer duplicate detection, a system message will notify you of that fact. Dialog will then process the remaining files that do offer duplicate detection. Records from unsupported files will be retained in the ID set, but will be sorted to the bottom of the set. A list of files not offering duplicate detection can be obtained online by entering HELP DUP.
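A hypothetical Python sketch of that placement rule follows. Records from supported files are sorted (here by a simplified case-insensitive title key rather than Dialog's real duplicate matching), and records from unsupported files are appended after them even when their titles would sort first. File 999 is a made-up number standing in for a file without duplicate detection, and the titles are placeholders.

```python
def id_with_unsupported(records, file_order, supported):
    """records: list of (file_number, title). Every record is kept."""
    rank = {f: i for i, f in enumerate(file_order)}
    checked = [r for r in records if r[0] in supported]
    unchecked = [r for r in records if r[0] not in supported]
    # Simplified key for this sketch: case-insensitive title, then file priority.
    checked.sort(key=lambda r: (r[1].lower(), rank[r[0]]))
    # Records from files without duplicate detection go to the bottom,
    # left in their original order (an assumption of this sketch).
    return checked + unchecked


records = [(999, "Aardvark study"), (154, "Beta"), (72, "alpha")]
result = id_with_unsupported(records, file_order=[72, 154, 999], supported={72, 154})
print([f for f, _ in result])  # [72, 154, 999]
```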

• To IDENTIFY DUPLICATES while using OneSearch®:

?b 72,154
       07jun98 15:56:00 User306002 Session D679.3
 
SYSTEM:OS  - DIALOG OneSearch
  File  72:EMBASE  1985-1998/Jun W1
         (c) 1998 Elsevier Science B.V.
  File 154:MEDLINE(R)  1985-1998/Jul W4
         (c) format only 1998 Dialog Corporation
 
      Set  Items  Description
      ---  -----  -----------
?select aspirin and diabet?
           25274  ASPIRIN
          167271  DIABET?
      S1     758  ASPIRIN AND DIABET?
?s s1/1998
             758  S1
          186367  PY=1998
      S2      36  S1/1998
?id s2
...completed examining records
      S3      36  ID S2 (sorted in duplicate order)
?type s3/6/1-5
 
 3/6/1     (Item 1 from file: 72)
10717382   EMBASE No: 98143901
 Acute  coronary  syndromes  in  the  United  States and United Kingdom: A
comparison of approaches
 
 
 3/6/2     (Item 2 from file: 72)
10726687   EMBASE No: 98160346
 Acute  myocardial  infarction  in  Switzerland:  Results  from the PIMICS
myocardial infarction registry
  DER  AKUTE  MYOKARDINFARKT  IN  DER  SCHWEIZ:  RESULTATE  AUS DEM PIMICS-
HERZINFARKT-REGISTER
 
 
 3/6/3     (Item 3 from file: 72)
10679857   EMBASE No: 98115246
 Anticoagulation   to  prevent  stroke  in  atrial  fibrillation  and  its
implications for managed care
 
 
 3/6/4     (Item 4 from file: 154)
09465755   98184483
 Anticoagulation   to  prevent  stroke  in  atrial  fibrillation  and  its
implications for managed care.
 
 
 3/6/5     (Item 5 from file: 154)
09493275   98227654
 Antioxidants  diminish  developmental  damage induced by high glucose and
cyclooxygenase inhibitors in rat embryos in vitro.