* This do-file takes the raw data files from the IPUMS 100% sample of the 1880 federal census and
* constructs the city-level segregation measures.
clear
set more off
cd "$segdir"
use "IPUMS_1880_Extract_JP0086.dta" //1880 100% sample from IPUMS -- keeping data for household heads only
drop if city==0 & incorp==0 // 34% of HH don't live in identifiable cities. Drop them.
*There are some unincorporated portions of a few cities that we want to fix.
replace incorp=80060 if city==830 //Bridgeport, CT
replace incorp=80200 if city==4030 //Meriden, CT
replace incorp=80210 if city==4122 //Middletown, CT
replace incorp=80230 if city==4470 //New Britain, CT
replace incorp=80280 if city==4870 //Norwalk, CT
replace incorp=80290 if city==4890 //Norwich, CT
replace incorp=80370 if city==6730 //Stamford, CT
replace incorp=80420 if city==7250 //Waterbury, CT
replace incorp=80600 if city==6014 //Rutland, VT
replace city=7630 if statefip==39 & incorp==89430 //Youngstown, OH: 50 residents lie outside the city boundary but still fall under the broader incorporated place code
*Combine Georgetown and DC as that is how it is reported in Baker
replace city=7230 if city==7231
replace incorp=80020 if city==7230
*Now places are uniquely identified by statefip, city and incorp. Some incorporated places don't have city codes (and vice versa).
egen place_id=group(statefip city incorp)
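* Optional check (an addition, not part of the original workflow): egen's group() leaves
* place_id missing when statefip, city, or incorp is missing, so count any such observations.
count if missing(place_id)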
* First order the household heads by the order in which they appear in the census manuscript.
* This is done by sorting every household head by his line number (line) on his census
* manuscript page (pageno). Page numbers are not unique: they are repeated across census reels
* and occasionally within reels. For this reason it is necessary to sort first by the reel
* number and then by the microfilm sequence number (microseq).
* Note that page numbers are reported on certain microfilm reels
sort reel microseq pageno line
* Any two individuals on the same census manuscript page should have the same values for reel, microseq and pageno.
* The sample contains only household heads, so the line numbers will typically not be consecutive integers
* after the above sorting command (each member of the household is on a separate line, so there will typically
* be gaps in line number after the sorting).
* However, line number remains useful in that sorting by line number within a census manuscript page should sort
* the household heads as the enumerator would walk down the street. So the previous and next observations
* should be the adjacent neighbors.
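* Optional check (an addition, not part of the original workflow): sort orders tied observations
* arbitrarily, so the neighbor ordering is only well defined if reel, microseq, pageno and line
* jointly identify each household head. duplicates report tabulates any ties without changing the data.
duplicates report reel microseq pageno line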
* Two different definitions of neighbors will be used. The first identifies next door neighbors as those household heads
* appearing directly before and directly after an individual's household on the census manuscript page.
* The second definition adds an additional restriction that the street name must be given and match for two individuals to
* be considered neighbors and relaxes the restriction that the neighbors must appear on the same manuscript page, allowing
* for the neighbors to be the last household on one page and the first household on the next page.
* All variables based on this second definition will contain 'alt' in the variable name.
* Note that there are many observations missing street name.
gen sorting_trait = 0 if $sorter~=.
replace sorting_trait = 1 if $criteria
* This variable should be binary, dividing the population into two groups.
* Here we are using nativity, defining the two groups as native born and foreign born.
* For simplicity in interpreting the segregation measure across different sorting traits, construct the sorting_trait
* variable such that the majority group gets a value of 0 and the minority group gets a value of 1.
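* Illustration only: the globals $sorter and $criteria are set outside this file. The variable name
* below (foreignborn) is hypothetical and is not taken from the extract; it only shows the expected form.
*   global sorter   "foreignborn"
*   global criteria "foreignborn==1"
* With these values, sorting_trait is 0 wherever foreignborn is non-missing and is then set to 1 where foreignborn==1.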
*Now, for each household head, identify the sorting trait of each adjacent neighbor
gen neighbor1trait=sorting_trait[_n-1] if reel==reel[_n-1] & microseq==microseq[_n-1] & pageno==pageno[_n-1] & place_id==place_id[_n-1]
gen neighbor2trait=sorting_trait[_n+1] if reel==reel[_n+1] & microseq==microseq[_n+1] & pageno==pageno[_n+1] & place_id==place_id[_n+1]
gen neighbor1traitalt=sorting_trait[_n-1] if reel==reel[_n-1] & microseq==microseq[_n-1] & street==street[_n-1] & place_id==place_id[_n-1] & street~=""
gen neighbor2traitalt=sorting_trait[_n+1] if reel==reel[_n+1] & microseq==microseq[_n+1] & street==street[_n+1] & place_id==place_id[_n+1] & street~=""
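* Illustration of the page-based definition (toy values, not from the data): three household heads
* that are the only ones on a single manuscript page in one place.
*   line  sorting_trait  neighbor1trait  neighbor2trait
*      3              0               .               1
*     11              1               0               0
*     27              0               1               .
* The first and last heads on the page each have only one observed neighbor under this definition.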
drop street
* For the segregation calculations, the expected number of household heads with an observed neighbor having
* a different trait will depend in part on how many neighbors are actually observed. Neighbors may be unobserved
* if they are on a different manuscript page (the first and last household on a manuscript page will both
* only have one neighbor observed) or if the neighbor is missing information for the sorting trait.
* The lines below establish the number of observed neighbors.
gen neighbor1present=1 if neighbor1trait~=.
replace neighbor1present=0 if neighbor1present~=1
gen neighbor2present=1 if neighbor2trait~=.
replace neighbor2present=0 if neighbor2present~=1
gen neighbor1presentalt=1 if neighbor1traitalt~=.
replace neighbor1presentalt=0 if neighbor1presentalt~=1
gen neighbor2presentalt=1 if neighbor2traitalt~=.
replace neighbor2presentalt=0 if neighbor2presentalt~=1
gen neighborstotal=neighbor1present+neighbor2present
gen neighborstotalalt=neighbor1presentalt+neighbor2presentalt
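* Sanity check (an addition, not part of the original workflow): each presence flag is 0 or 1,
* so both neighbor counts must be 0, 1, or 2.
assert inlist(neighborstotal, 0, 1, 2) & inlist(neighborstotalalt, 0, 1, 2)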
* Now create indicators for having a next-door neighbor with the opposite sorting trait.
gen opposite1=1 if (neighbor1trait~=sorting_trait & neighbor1present==1) | (neighbor2trait~=sorting_trait & neighbor2present==1)
gen opposite1alt=1 if (neighbor1traitalt~=sorting_trait & neighbor1presentalt==1) | (neighbor2traitalt~=sorting_trait & neighbor2presentalt==1)
replace opposite1=0 if opposite1~=1 & (neighbor1present==1 | neighbor2present==1)
replace opposite1alt=0 if opposite1alt~=1 & (neighbor1presentalt==1 | neighbor2presentalt==1)
gen opposite2=1 if (neighbor1trait~=sorting_trait | neighbor2trait~=sorting_trait) & neighbor1present==1 & neighbor2present==1
gen opposite2alt=1 if (neighbor1traitalt~=sorting_trait | neighbor2traitalt~=sorting_trait) & neighbor1presentalt==1 & neighbor2presentalt==1
replace opposite2=0 if opposite2~=1 & neighbor1present==1 & neighbor2present==1
replace opposite2alt=0 if opposite2alt~=1 & neighbor1presentalt==1 & neighbor2presentalt==1
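* Sanity check (an addition, not part of the original workflow) for the page-based definition:
* opposite2 is defined only when both neighbors are observed, and in that case it agrees with opposite1.
assert opposite1==opposite2 if neighbor1present==1 & neighbor2present==1
assert missing(opposite2) if neighborstotal<2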
gen majoritycounter1=0
replace majoritycounter1=1 if sorting_trait==0 & (neighbor1present==1 | neighbor2present==1)
gen majoritycounter2=0
replace majoritycounter2=1 if sorting_trait==0 & neighbor1present==1 & neighbor2present==1
gen minoritycounter1=0
replace minoritycounter1=1 if sorting_trait==1 & (neighbor1present==1 | neighbor2present==1)
gen minoritycounter2=0
replace minoritycounter2=1 if sorting_trait==1 & neighbor1present==1 & neighbor2present==1
gen majoritycounter1alt=0
replace majoritycounter1alt=1 if sorting_trait==0 & (neighbor1presentalt==1 | neighbor2presentalt==1)
gen majoritycounter2alt=0
replace majoritycounter2alt=1 if sorting_trait==0 & neighbor1presentalt==1 & neighbor2presentalt==1
gen minoritycounter1alt=0
replace minoritycounter1alt=1 if sorting_trait==1 & (neighbor1presentalt==1 | neighbor2presentalt==1)
gen minoritycounter2alt=0
replace minoritycounter2alt=1 if sorting_trait==1 & neighbor1presentalt==1 & neighbor2presentalt==1
gen minoritycounterall=0
replace minoritycounterall=1 if sorting_trait==1
gen majoritycounterall=0
replace majoritycounterall=1 if sorting_trait==0
keep opposite* *counter* statefip city incorp
gen maj_opposite1=majoritycounter1*opposite1
gen maj_opposite2=majoritycounter2*opposite2
gen min_opposite1=minoritycounter1*opposite1
gen min_opposite2=minoritycounter2*opposite2
gen maj_opposite1alt=majoritycounter1alt*opposite1alt
gen maj_opposite2alt=majoritycounter2alt*opposite2alt
gen min_opposite1alt=minoritycounter1alt*opposite1alt
gen min_opposite2alt=minoritycounter2alt*opposite2alt
sort statefip city incorp
collapse (sum) min_all=minoritycounterall maj_all=majoritycounterall ///
	n_min_po=minoritycounter1 n_min_pb=minoritycounter2 n_maj_po=majoritycounter1 n_maj_pb=majoritycounter2 ///
	x_min_po=min_opposite1 x_min_pb=min_opposite2 x_maj_po=maj_opposite1 x_maj_pb=maj_opposite2 ///
	n_min_so=minoritycounter1alt n_min_sb=minoritycounter2alt n_maj_so=majoritycounter1alt n_maj_sb=majoritycounter2alt ///
	x_min_so=min_opposite1alt x_min_sb=min_opposite2alt x_maj_so=maj_opposite1alt x_maj_sb=maj_opposite2alt, ///
	by(statefip city incorp)
* The following variables are in this file:
* min_all - total number of minority household heads in city
* maj_all - total number of majority household heads in city
* n_min_po - total number of minority household heads with at least one neighbor observed, neighbors defined by manuscript page
* n_min_pb - total number of minority household heads with both neighbors observed, neighbors defined by manuscript page
* n_maj_po - total number of majority household heads with at least one neighbor observed, neighbors defined by manuscript page
* n_maj_pb - total number of majority household heads with both neighbors observed, neighbors defined by manuscript page
* x_min_po - number of minority household heads with at least one neighbor observed living next to a non-minority neighbor, neighbors defined by manuscript page
* x_min_pb - number of minority household heads with both neighbors observed living next to a non-minority neighbor, neighbors defined by manuscript page
* x_maj_po - number of majority household heads with at least one neighbor observed living next to a non-majority neighbor, neighbors defined by manuscript page
* x_maj_pb - number of majority household heads with both neighbors observed living next to a non-majority neighbor, neighbors defined by manuscript page
* n_min_so - total number of minority household heads with at least one neighbor observed, neighbors defined by street
* n_min_sb - total number of minority household heads with both neighbors observed, neighbors defined by street
* n_maj_so - total number of majority household heads with at least one neighbor observed, neighbors defined by street
* n_maj_sb - total number of majority household heads with both neighbors observed, neighbors defined by street
* x_min_so - number of minority household heads with at least one neighbor observed living next to a non-minority neighbor, neighbors defined by street
* x_min_sb - number of minority household heads with both neighbors observed living next to a non-minority neighbor, neighbors defined by street
* x_maj_so - number of majority household heads with at least one neighbor observed living next to a non-majority neighbor, neighbors defined by street
* x_maj_sb - number of majority household heads with both neighbors observed living next to a non-majority neighbor, neighbors defined by street
* statefip - state FIPS code
* city - city code
* incorp - incorporated place code
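* Sanity check (an addition, not part of the original workflow): by construction the page-based counts
* are nested within each place, so each count is bounded by the next broader one.
assert x_min_po<=n_min_po & x_min_pb<=n_min_pb & n_min_pb<=n_min_po & n_min_po<=min_all
assert x_maj_po<=n_maj_po & x_maj_pb<=n_maj_pb & n_maj_pb<=n_maj_po & n_maj_po<=maj_all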
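* Constructing the segregation measures:
* x_min_*_upper is the expected number of minority household heads with at least one majority neighbor
* when neighbors are drawn at random, without replacement, from the other household heads in the city.
* x_min_*_lower is the lower benchmark for the same count.
* alpha_* = (upper - observed)/(upper - lower), so alpha is 0 when the observed count equals the upper
* bound and 1 when it equals the lower bound.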
gen x_min_po_upper=n_min_pb*(1-((min_all-1)*(min_all-2))/((min_all-1+maj_all)*(min_all-2+maj_all)))+(n_min_po-n_min_pb)*(1-(min_all-1)/(min_all-1+maj_all))
gen x_min_pb_upper=n_min_pb*(1-((min_all-1)*(min_all-2))/((min_all-1+maj_all)*(min_all-2+maj_all)))
gen x_min_so_upper=n_min_sb*(1-((min_all-1)*(min_all-2))/((min_all-1+maj_all)*(min_all-2+maj_all)))+(n_min_so-n_min_sb)*(1-(min_all-1)/(min_all-1+maj_all))
gen x_min_sb_upper=n_min_sb*(1-((min_all-1)*(min_all-2))/((min_all-1+maj_all)*(min_all-2+maj_all)))
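* Worked check of the upper-bound terms (toy numbers, not from the data): with min_all=3 and maj_all=7,
* a minority head's neighbors are drawn from the other 9 heads, 2 of whom are minority.
* Both neighbors observed: P(at least one majority neighbor) = 1 - (2*1)/(9*8), about 0.972.
* One neighbor observed: P(majority neighbor) = 1 - 2/9, about 0.778.
display 1 - ((3-1)*(3-2))/((3-1+7)*(3-2+7))
display 1 - (3-1)/(3-1+7)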
gen x_min_po_lower=(n_min_pb/n_min_po+.5*(n_min_po-n_min_pb)/n_min_po)*(2/(n_min_po+1)+2*(1-2/(n_min_po+1)))*(1-(min_all-n_min_po)*(min_all-n_min_po-1)/(min_all*(min_all-1)))
gen x_min_pb_lower=(2/(n_min_pb+1)+2*(1-2/(n_min_pb+1)))*(1-(min_all-n_min_pb)*(min_all-n_min_pb-1)/(min_all*(min_all-1)))
gen x_min_so_lower=(n_min_sb/n_min_so+.5*(n_min_so-n_min_sb)/n_min_so)*(2/(n_min_so+1)+2*(1-2/(n_min_so+1)))*(1-(min_all-n_min_so)*(min_all-n_min_so-1)/(min_all*(min_all-1)))
gen x_min_sb_lower=(2/(n_min_sb+1)+2*(1-2/(n_min_sb+1)))*(1-(min_all-n_min_sb)*(min_all-n_min_sb-1)/(min_all*(min_all-1)))
gen alpha_po=(x_min_po_upper-x_min_po)/(x_min_po_upper-x_min_po_lower)
gen alpha_pb=(x_min_pb_upper-x_min_pb)/(x_min_pb_upper-x_min_pb_lower)
gen alpha_so=(x_min_so_upper-x_min_so)/(x_min_so_upper-x_min_so_lower)
gen alpha_sb=(x_min_sb_upper-x_min_sb)/(x_min_sb_upper-x_min_sb_lower)
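* Worked illustration of the alpha formula (toy numbers, not from the data): with an upper bound of 10,
* a lower bound of 2 and an observed count of 6, alpha = (10-6)/(10-2) = 0.5.
* When the upper and lower bounds coincide, the division by zero leaves alpha missing for that place.
display (10-6)/(10-2)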
gen pct_minority=min_all/(min_all+maj_all)
cd "$outdir"
save "1880_city_neighbor_$sorter.dta", replace