The balance of power

project description

This week’s assignment involves the sort of data collection and manipulation that

you might have to do for a project of your own. It seems straightforward, but the

devil is in the details: there are a lot of land mines even the simplest data-gathering

exercise. The assignment is designed to point them out and help you walk around


The Correlates of War Project (COW) produces one of the most widely-used

datasets in all of international relations: the National Material Capabilities Index

(NMCI). This index consists of annual values for total population, urban population,

iron and steel production, energy consumption, military personnel, and military

expenditure of all countries in the international system, from 1816-2007. In order

to calculate a country’s overall capabilities, COW takes the average of each country’s

share of system-wide resources in each of these six categories.

In this assignment, you will reproduce a more limited version of the NMCI

using present-day data.

1. Go to the Wikipedia pages for world population, GDP, and military expenditures.

?? For each page, extract the relevant data into a separate Excel spreadsheet.

(For population, use United Nations estimates.)

2. Get rid of all columns except country name and the quantity of interest (GDP,

population, and military expenditures, respectively). Add a first row and use

it for column names (“Country†and “GDP,†for example.)

3. Tidy up the data. Get rid of any extra characters. Select the key numeric

variables, go to the Format menu, choose “Cells¦Ã¢â‚¬ , and convert them to

General”this gets rid of commas, dollar signs, percentages, etc., which are a

real pain when saved to a .csv file. When working on this step, beware of the

following problems:

??Wikipedia is convenient, but the data quality sucks. Never forget this. If this were more than an

exercise, you’d take the time to find better sources.


(a) Unless you used Outwit Hub in step 1, some of the country names will

have spaces in front of them. These are a pain to get rid of. The simplest

way is to copy them from the Excel spreadsheet into a text editor, use a

Find-and-replace command to get rid of them, and copy and paste them

back into Excel.

(b) Some names and numbers have Wikipedia footnotes (“b†, “[20]†) stuck

on the end. Outwit won’t save you from these. Deleting them by hand

in Excel is probably the fastest way to get rid of them.

(c) Wikipedia authors don’t always use the same names for the same countries,

so you need to make sure they’re consistent before merging them.

In particular, watch out for “China†and “People’s Republic of China,â€

“Gambia†and “The Gambia,†and various combinations with “Congoâ€

and “Macedonia.â€

4. Once you’re done tidying, save each dataset to a separate .csv file.

5. Create an R command file that does the following:

(a) Read each of the three .csv files into appropriately-named data frames

(for example, Pop, GDP, and MilEx).

(b) Summarize each of these data frames.

(c) In each data frame, create a new variable that reflects each country’s

percentage of the world total. (That is, divide the data column by the

world total.) Make sure that this new variable is part of the same data frame as

the original data column.

(d) Merge the three data frames. Match the data using country name.

(e) Create your version of the NMCI by calculating each country’s average

share of each of these three resources. If a country has 2% of the world’s

population, 1% of world GDP, and 4% of world military expenditures,

for example, its average share should be (2+1+4)/3 = 2.33%.

Your final dataset should have 8 columns: the name of the country, GDP (total

and percent of world), population (total and percent of world), military expenditures

(total and percent of world), and NMCI.

Due to Assignment 4 Drop-box: three .csv files and an R command file that creates

the data set described in 5., above.