Ranking Institutions and Authors by Publications

Overview

This application provides a general framework for ranking institutions and authors based purely on their publications. Customizable policies guide the ranking process, for example by assigning different weights to different publication venues or by determining how a publication's score is distributed among multiple authors. The goal is for the application to apply to as many subfields of computer science research, and to accommodate as many ranking policies, as possible.

The application is written in Java and requires JDK 1.4 or later. The executable code, full source code, and sample data from the Software Engineering research field are available for download.

After unzipping the files, run the application from the top-level directory of the unzipped files by typing either "java -classpath bin edu.uci.isr.ranking.RankingMain" or "java -jar ranking.jar". The files included in the download are discussed in the relevant sections below.

To start an example as a Java Web Start application or a Java applet, select the appropriate link below. When asked whether you trust the certificate and want to run the application, choose "Trust" or "Run".

General Ranking Considerations

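The general questions to be answered for a ranking system are:

  1. Where does the bibliographic data come from?
  2. How should each publication be scored?
  3. How should a publication's score be shared among multiple authors?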

This section discusses how these questions influenced the design of the ranking application.

Data Collection

While there are many bibliography services, such as DBLP, Computer Science Bibliography, and ACM Digital Library, only INSPEC consistently provides institution information for its bibliographic data. Thus the data included in the download was collected from INSPEC query results, and the application's parsing logic is based on this data format. However, nothing in the application prevents it from using data from other sources if that data is more complete and accurate.

For information on how the ranking application imports references from INSPEC, see Adding References.

Scoring Publications

The toughest question is how to score a publication. Ideally, the score would be based on a citation count, assuming the most cited paper is the most important one and thus deserves the highest score. The current application does not support this calculation, but it can easily be extended to support merit-based scores, given an accessible scoring mechanism.

The current methodology bases a publication's score on the venue where it was originally published. It assumes that each research field has a set of conferences and journals of varying quality and that publications appearing in more prestigious conferences and journals should receive higher scores. Inevitably, the weight (prestige) given to each conference or journal is subjective, and treating all publications in a single journal or conference as equally important is far from perfect. However, this process is generic, making it applicable to any subfield of computer science, and customizable, accommodating many people's judgments.

For information on how the ranking application scores publications, see Score Criteria in Changing Options.

Multiple Authorship

Many publications have multiple authors. Unfortunately, INSPEC's bibliographic data records only the institution of the first author, and the other sources do not provide this information at all. This shortcoming reduces the accuracy of the ranking.

The application can handle multiple authors from multiple institutions and lets the user decide how a publication's score is distributed among the authors.

For information on how the ranking application distributes scores among multiple authors, see Multiple Authors in Changing Options.

Using the Ranking Application

To use this tool, one generally proceeds through the following steps.

  1. Create a New Ranking
  2. Change Options
  3. Add References (Note: options cannot be changed once references are added.)
  4. Perform Ranking (Note: references are not displayed until they have been ranked.)
  5. Export the Results

Each step is discussed in detail below.

Creating a New Ranking

A "Ranking" encapsulates a set of policies that determine how the application will score various publications. The following policies are available:

Whether to rank institutions or authors

This is self-explanatory.

How a publication's score will be distributed among its authors

The score may be given only to the first institution/author; given in full to each participating institution/author (so each institution or author receives a score for every paper it published); distributed evenly among the institutions/authors; distributed using the scheme proposed by the Journal of Systems and Software; or distributed according to user-specified percentages.
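The first three policies can be illustrated with a short Java sketch. It is not the application's own code: ScoreDistribution and its Mode values are hypothetical names, and the Journal of Systems and Software scheme is left out because its weights are not described here.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of three score-distribution policies.
 * Not the ranking application's own API.
 */
final class ScoreDistribution {

    enum Mode { FIRST_ONLY, EACH, EVEN }

    /** Returns the part of {@code score} credited to each author, in author order. */
    static double[] distribute(Mode mode, double score, int authorCount) {
        if (authorCount <= 0) {
            throw new IllegalArgumentException("a publication needs at least one author");
        }
        double[] shares = new double[authorCount];
        switch (mode) {
            case FIRST_ONLY:
                // Only the first author (or institution) is credited; the rest get 0.
                shares[0] = score;
                break;
            case EACH:
                // Every author receives the full score.
                Arrays.fill(shares, score);
                break;
            case EVEN:
                // The score is split equally, so the shares sum to the original score.
                Arrays.fill(shares, score / authorCount);
                break;
            default:
                throw new AssertionError(mode);
        }
        return shares;
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + Arrays.toString(distribute(mode, 2.0, 4)));
        }
    }
}
```

Note that EACH can credit a single paper's score several times over, while EVEN always preserves the paper's total.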

In the last case, entering a string like "0.5, 0.4, 0.3" in the input box gives the first author 50% of the score, the second author 40%, and the third author 30%. If a reference has more than three authors, each additional author receives the last listed percentage (here, 30%). The percentages do not have to total 100%.
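The percentage list can be sketched as follows, assuming the input is a plain comma-separated list of fractions. CustomDistribution, parse, and distribute are hypothetical names. For "0.5, 0.4, 0.3" and five authors, the sketch yields shares of 0.5, 0.4, 0.3, 0.3, and 0.3 of the score, since authors past the end of the list reuse the last value.

```java
import java.util.Arrays;

/** Hypothetical sketch of the user-specified percentage distribution. */
final class CustomDistribution {

    /** Parses a comma-separated list such as "0.5, 0.4, 0.3" into fractions. */
    static double[] parse(String spec) {
        String[] parts = spec.split(",");
        double[] fractions = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            fractions[i] = Double.parseDouble(parts[i].trim());
        }
        return fractions;
    }

    /**
     * Author i receives fraction i of the score; authors beyond the end of the list
     * reuse the last fraction. Fractions are not normalized, so they need not sum to 1.
     */
    static double[] distribute(double score, int authorCount, double[] fractions) {
        double[] shares = new double[authorCount];
        for (int i = 0; i < authorCount; i++) {
            shares[i] = score * fractions[Math.min(i, fractions.length - 1)];
        }
        return shares;
    }

    public static void main(String[] args) {
        double[] fractions = parse("0.5, 0.4, 0.3");
        System.out.println(Arrays.toString(distribute(1.0, 5, fractions)));
    }
}
```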

The minimum number of pages for a publication to be counted

You can leave the minimum page limit at 0 so that every paper is counted, or enter a number so that short demos and posters included in conference proceedings are not counted.
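A minimum-page filter might look like the sketch below. It assumes the page range can be found in the reference's source text in a form such as "pp. 125-135" or "p. 125-135"; the real INSPEC source format may differ (for example, by abbreviating the last page, which this sketch does not handle), so the pattern would need adjusting. PageFilter and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the minimum-page-count filter. */
final class PageFilter {

    // Assumption: the page range appears as "pp. 125-135" or "p. 125-135" in the source text.
    private static final Pattern PAGE_RANGE = Pattern.compile("pp?\\.\\s*(\\d+)\\s*-\\s*(\\d+)");

    /** Returns the number of pages in the first page range found, or -1 if there is none. */
    static int pageCount(String source) {
        Matcher m = PAGE_RANGE.matcher(source);
        if (!m.find()) {
            return -1;
        }
        int first = Integer.parseInt(m.group(1));
        int last = Integer.parseInt(m.group(2));
        return last - first + 1; // both ends of the range are inclusive
    }

    /** A minimum of 0 counts every paper. */
    static boolean isCounted(String source, int minPages) {
        if (minPages <= 0) {
            return true;
        }
        int pages = pageCount(source);
        // Assumption: papers whose length cannot be determined are kept.
        return pages < 0 || pages >= minPages;
    }

    public static void main(String[] args) {
        // Made-up source strings for illustration.
        System.out.println(isCounted("Proc. of a conference, pp. 125-135", 4));
        System.out.println(isCounted("Proc. of a conference, pp. 402-403", 4));
    }
}
```

Keeping papers whose length cannot be determined is a design choice of this sketch; the application may treat them differently.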

When starting the application, a default ranking is used that ranks institutions, assigns a publication's score only to the first author, and has no minimum page length. To create a new ranking, select File/New Ranking.

Changing Options

The following options can be saved to a file using "Options/Save Options", or loaded using "Options/Load Options". Note: options must be set before adding references. After adding references, you will not be able to change the options until you create a new ranking.

Institution Aliases

The INSPEC data often uses more than one name for the same institution. To rank institutions accurately, all aliases of an institution should be found so that each reference is credited correctly. Using institution aliases, the ranking application merges known aliases under a single, proper name.

Aliases are stored in a file and may be loaded using "Options/Aliases...". The default aliases file (included in the download) is "aliases.txt". The file format is documented in that file.
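Alias merging amounts to a lookup from every known alias to one proper name. The sketch below keeps the table in memory rather than parsing aliases.txt, whose format is documented in that file and not reproduced here. InstitutionAliases is a hypothetical name, and the case-insensitive matching is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical in-memory alias table; aliases.txt documents the real file format. */
final class InstitutionAliases {

    private final Map<String, String> properNames = new HashMap<>();

    /** Records {@code alias} as another name for {@code properName}. */
    void addAlias(String alias, String properName) {
        properNames.put(normalize(alias), properName);
    }

    /** Returns the proper name for {@code name}, or {@code name} unchanged if no alias is known. */
    String resolve(String name) {
        String proper = properNames.get(normalize(name));
        return proper != null ? proper : name;
    }

    // Assumption: aliases match regardless of case and surrounding whitespace.
    private static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        InstitutionAliases aliases = new InstitutionAliases();
        // Made-up example aliases.
        aliases.addAlias("UC Irvine", "University of California, Irvine");
        aliases.addAlias("Univ. of California, Irvine", "University of California, Irvine");
        System.out.println(aliases.resolve("univ. of california, irvine"));
        System.out.println(aliases.resolve("Carnegie Mellon Univ."));
    }
}
```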

Note: To find aliases in a reference file, select "Options/Concentrate Name" and then add the references; the setting must be selected before the references are added. The ranking application then displays each possible alias an institution might use. Institutions are normally listed as "NameOfInstitution, City"; selecting "Options/City First" lists them as "City, NameOfInstitution" instead. After grouping institutions by name and by city, you can identify the various aliases of an institution and enter them into an alias file. Note: the alias file is only used when "Options/Concentrate Name" is not selected.
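The grouping step can be sketched as follows, assuming entries of the form "NameOfInstitution, City" where the city follows the last comma. AliasCandidates is a hypothetical helper, and the sample names in main are made up. It only surfaces candidates; deciding which names really are aliases remains a manual step.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of grouping institution names by city to spot aliases. */
final class AliasCandidates {

    /** Groups "NameOfInstitution, City" entries by city; names sharing a city are alias candidates. */
    static Map<String, TreeSet<String>> groupByCity(List<String> entries) {
        Map<String, TreeSet<String>> byCity = new TreeMap<>();
        for (String entry : entries) {
            // Assumption: the city is the text after the last comma.
            int comma = entry.lastIndexOf(',');
            if (comma < 0) {
                continue; // no city present, so the entry cannot be grouped
            }
            String name = entry.substring(0, comma).trim();
            String city = entry.substring(comma + 1).trim();
            byCity.computeIfAbsent(city, k -> new TreeSet<>()).add(name);
        }
        return byCity;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
                "Univ. of California, Irvine",
                "University of California, Irvine",
                "Carnegie Mellon Univ., Pittsburgh");
        for (Map.Entry<String, TreeSet<String>> e : groupByCity(entries).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```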

Address Changes

Some addresses do not follow the general format "Department, University, City, State, Country". For example, a university name may itself contain a ",", which makes automated analysis difficult. The ranking application uses address changes to rewrite these addresses into an analyzable format.

Address changes are stored in a file and may be loaded using "Options/Address Changes...". The default address changes file (included in the download) is "addressChanges.txt". The file format is documented in that file.
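The idea behind address changes can be illustrated by applying literal text replacements before splitting an address on commas. This sketch assumes each change is a simple find-and-replace pair, which may not match the format of addressChanges.txt; AddressChanges and the sample address are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of address changes applied before comma-based analysis. */
final class AddressChanges {

    // Insertion order is kept so changes are applied in the order they were listed.
    private final Map<String, String> changes = new LinkedHashMap<>();

    void add(String from, String to) {
        changes.put(from, to);
    }

    /** Applies every change as a literal replacement, then splits the address on commas. */
    String[] analyze(String address) {
        String fixed = address;
        for (Map.Entry<String, String> change : changes.entrySet()) {
            fixed = fixed.replace(change.getKey(), change.getValue());
        }
        String[] fields = fixed.split(",");
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up address whose university name contains a comma.
        String address = "Dept. of Comput. Sci., Univ. of Wisconsin, Madison, Madison, WI, USA";
        AddressChanges changes = new AddressChanges();
        System.out.println(changes.analyze(address).length); // six fields: the name was split
        changes.add("Univ. of Wisconsin, Madison", "Univ. of Wisconsin-Madison");
        System.out.println(Arrays.toString(changes.analyze(address)));
    }
}
```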

Score Criteria

The basic scheme of this ranking methodology is to give published papers a score based on where they were published (i.e., in which conference or journal). The user specifies this mapping.

Score criteria are stored in a file and may be loaded using "Options/Score Criteria". The default score criteria file (included in the download) is "criteria.txt". The file format is documented in that file. To specify a default score for a reference whose origin is not listed in the score criteria file, select "Options/Default Score" and input a default score.
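Score lookup with a default can be sketched as a table from venue names to scores. This assumes a reference matches a venue when the venue key occurs as a substring of the reference's source text; the actual matching rules are those documented in criteria.txt. ScoreCriteria and the weights in main are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical venue-to-score table with a fallback default score. */
final class ScoreCriteria {

    // Insertion order decides which key wins when several keys match.
    private final Map<String, Double> scores = new LinkedHashMap<>();
    private final double defaultScore;

    ScoreCriteria(double defaultScore) {
        this.defaultScore = defaultScore;
    }

    void put(String venueKey, double score) {
        scores.put(venueKey, score);
    }

    /**
     * Returns the score of the first venue key found in the reference's source text,
     * or the default score if none matches (assumed case-sensitive substring matching).
     */
    double scoreFor(String source) {
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            if (source.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return defaultScore;
    }

    public static void main(String[] args) {
        // Made-up weights for illustration only.
        ScoreCriteria criteria = new ScoreCriteria(0.5);
        criteria.put("IEEE Transactions on Software Engineering", 3.0);
        criteria.put("International Conference on Software Engineering", 2.0);
        System.out.println(criteria.scoreFor("IEEE Transactions on Software Engineering, vol.29, no.3"));
        System.out.println(criteria.scoreFor("Proc. of an unlisted workshop"));
    }
}
```

Because the first matching key wins in this sketch, more specific venue keys should be added before shorter keys they contain.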

Note: The origin of a reference can be seen in the exported bibliography ("File/Save Bibliography").

Multiple Authors

Some bibliographic sources, such as INSPEC, record only the institution of the first author. This shortcoming reduces the accuracy of the ranking. The ranking application addresses this in two ways:

The "Options/Institution Separator..." option lets the user specify the separator used between multiple institutions. The default separator is ";". See Adding References below for an example.

The "Options/As Previous Institution..." option lets the user specify the marker used to indicate that the current institution is the same as the previous institution. The default is "API". See Adding References below for an example.
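Together, the two options can be sketched as follows: the institution field is split on the separator, and each "API" entry is replaced by the institution before it. This assumes the i-th entry belongs to the i-th author; InstitutionField and expand are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of the Institution Separator and As Previous Institution options. */
final class InstitutionField {

    /**
     * Splits {@code field} on {@code separator} (taken literally, not as a regex) and
     * replaces each {@code sameAsPrevious} marker with the institution before it.
     */
    static List<String> expand(String field, String separator, String sameAsPrevious) {
        List<String> institutions = new ArrayList<>();
        String previous = null;
        for (String raw : field.split(Pattern.quote(separator))) {
            String entry = raw.trim();
            if (entry.equals(sameAsPrevious)) {
                if (previous == null) {
                    throw new IllegalArgumentException("\"" + sameAsPrevious + "\" has no previous institution");
                }
                entry = previous;
            }
            institutions.add(entry);
            previous = entry;
        }
        return institutions;
    }

    public static void main(String[] args) {
        // Made-up field: two authors at the first institution, one at the second.
        String field = "Univ. A, City A, USA; API; Univ. B, City B, USA";
        System.out.println(expand(field, ";", "API"));
    }
}
```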

Adding References

Note: before adding references, all options must be configured. After adding references, you will not be able to change the options until you create a new ranking. Note: references are not displayed until they have been ranked.

To add references, select "File/Add References". You can select multiple files containing reference data at once, and repeat this step multiple times, even after a rank operation. Note: Be careful not to add the same references more than once.

The format of the reference file comes from the Ovid export of the INSPEC database. It should have the following format:

       <RecordNumber>
       <Author>
          Authors of the paper (separated by the character specified in "Options/Institution Separator...")
       <Institution>
          Authors' affiliations. Currently, INSPEC lists only the institution of the first author.
       <Title>
          Title of the paper
       <Source>
          Source of the paper: which conference, which journal, which year

       <NextRecordNumber>
       ......

Please consult the help from Ovid on how to extract the data.
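A reader for this layout might look like the sketch below. It assumes record numbers appear on their own line as <1>, <2>, and so on, that field headers appear literally as <Author>, <Institution>, <Title>, and <Source>, and that a value may span several lines. Real Ovid exports label fields differently, so this illustrates the structure rather than the application's actual parser; ReferenceReader is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the record layout shown above (not the application's parser). */
final class ReferenceReader {

    /** Returns one map per record, from field name (e.g. "Author") to field text. */
    static List<Map<String, String>> read(Reader in) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = null;
        String field = null;
        // The caller remains responsible for closing the underlying reader.
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (trimmed.matches("<\\d+>")) {
                // A record number such as <1> starts a new record.
                current = new LinkedHashMap<>();
                records.add(current);
                field = null;
            } else if (trimmed.startsWith("<") && trimmed.endsWith(">")) {
                // A field header such as <Author> or <Source>.
                field = trimmed.substring(1, trimmed.length() - 1);
            } else if (current != null && field != null) {
                // A value line; values spanning several lines are joined with spaces.
                String soFar = current.get(field);
                current.put(field, soFar == null ? trimmed : soFar + " " + trimmed);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Made-up record in the layout shown above.
        String sample = "<1>\n<Author>\n   Smith, J.; Doe, A.\n<Institution>\n   Univ. A, City A, USA; API\n"
                + "<Title>\n   A title that\n   spans two lines\n<Source>\n   Proc. X, 2003\n";
        for (Map<String, String> record : read(new StringReader(sample))) {
            System.out.println(record);
        }
    }
}
```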

Performing the Ranking

After adding references, select "File/Rank". You will be asked how many top institutions to include in the ranking (use 0 to include all), and you can select the first and last year of the publications to be included.
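The rank step can be sketched as: keep credits within the selected years, sum them per institution, sort by total, and cut the list to the requested size. Ranker and Credit are hypothetical names, and breaking ties alphabetically is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the rank step. */
final class Ranker {

    /** One share of a publication's score credited to an institution (illustrative type). */
    static final class Credit {
        final String institution;
        final int year;
        final double score;

        Credit(String institution, int year, double score) {
            this.institution = institution;
            this.year = year;
            this.score = score;
        }
    }

    /** Sums credits from fromYear to toYear (inclusive) per institution; top = 0 keeps all. */
    static List<Map.Entry<String, Double>> rank(List<Credit> credits, int fromYear, int toYear, int top) {
        Map<String, Double> totals = new HashMap<>();
        for (Credit credit : credits) {
            if (credit.year >= fromYear && credit.year <= toYear) {
                totals.merge(credit.institution, credit.score, Double::sum);
            }
        }
        List<Map.Entry<String, Double>> ranking = new ArrayList<>(totals.entrySet());
        // Highest total first; ties are broken alphabetically so the order is stable.
        ranking.sort(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        if (top > 0 && ranking.size() > top) {
            ranking = new ArrayList<>(ranking.subList(0, top));
        }
        return ranking;
    }

    public static void main(String[] args) {
        // Made-up credits for illustration.
        List<Credit> credits = Arrays.asList(
                new Credit("Univ. A", 2001, 1.0),
                new Credit("Univ. A", 2002, 1.0),
                new Credit("Univ. B", 2002, 3.0),
                new Credit("Univ. C", 1995, 9.0));
        System.out.println(rank(credits, 2000, 2005, 0));
    }
}
```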

After the rank operation is completed, the ranking result is displayed in a table in the leftmost pane. Selecting an institution (or author) from the results displays its scores in the central pane and its publications in the rightmost pane. The panes can be resized using the sliders. Each table can be sorted by clicking a column header.

Exporting the Results

You can export the ranking result, the bibliography used in the ranking, and the score and publications of the selected institution, by selecting "File/Save Ranking", "File/Save Bibliography", and "File/Save Institution", respectively.

Suggestions