Preparing the Data for Relationships by DataRobot

This is part 2 of a series of blog posts detailing the development of the Relationships by DataRobot web app.  Here’s Part 1.

The starting blocks

As with any new project, particularly when someone else built the datasets, there’s going to be a bit of a learning curve in figuring out what they did and getting the data into the right shape for your analysis.  In this case, we were lucky: the researchers created a fairly useful data dictionary (which they call a codebook), and the dataset is available for download (excluding all of the personally identifiable information), so those two steps are taken care of.

The bad news is that the information you really need from this codebook sits in tables inside a .pdf document.  I ended up doing a rather kludgy job of stripping out that information.  There were two main tasks here:

Identifying repeated measures

The study surveyed participants in several waves, and the questionnaires asked (mostly) the same questions each time.  Unfortunately, the data was laid out horizontally, with each row corresponding to an individual couple and each questionnaire appended as additional columns; e.g., the question about housing type appears as 5 separate columns (one per questionnaire): PPHOUSE, PP2_PPHOUSE, PP3_PPHOUSE, PP4_PPHOUSE, and PP5_PPHOUSE.  The naming conventions were not 100% consistent either, so a fair bit of manual work was needed to get all the columns correctly parsed.
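A rough sketch of how the repeated columns could be spotted automatically before the manual pass. It assumes the "PP<n>_" prefix marks waves 2 through 5 and that an unprefixed column belongs to wave 1; the helper names (`split_wave`, `group_by_base`, `needs_review`) are made up for illustration and are not part of the original project.

```python
import re
from collections import defaultdict

# Assumed convention: "PP<n>_" prefix = wave n (2-5); no prefix = wave 1.
WAVE_PREFIX = re.compile(r"^PP([2-5])_(.+)$")

def split_wave(column):
    """Return (wave, base_name) for a raw column name."""
    match = WAVE_PREFIX.match(column)
    if match:
        return int(match.group(1)), match.group(2)
    return 1, column

def group_by_base(columns):
    """Map each base name to {wave: raw column name}."""
    groups = defaultdict(dict)
    for col in columns:
        wave, base = split_wave(col)
        groups[base][wave] = col
    return dict(groups)

def needs_review(groups, expected_waves):
    """Base names seen in fewer waves than expected: candidates for manual renaming."""
    return sorted(base for base, waves in groups.items() if len(waves) < expected_waves)

columns = ["PPHOUSE", "PP2_PPHOUSE", "PPHOUSEHOLDSIZE", "PP2_PPHHSIZE",
           "CHILDREN_IN_HH", "CHILDREN_IN_HH2"]
groups = group_by_base(columns)
flagged = needs_review(groups, expected_waves=2)
print(flagged)
```

Columns that follow the convention pair up on their own; the ones that do not (a suffix instead of a prefix, or a different base name in the first wave) end up in the review list, which is where the manual work comes in.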

For the purposes of this project, I needed one row per couple per questionnaire, so I ultimately had to “un-pivot” this dataset to make it useful.  This meant identifying all the repeated questions and renaming them systematically.  I ended up just creating a dictionary in Python to accomplish this:


bg_colnames = {"CHILDREN_IN_HH":"children_in_hh"
,"CHILDREN_IN_HH2":"children_in_hh"
,"CHILDREN_IN_HH3":"children_in_hh"
,"CHILDREN_IN_HH4":"children_in_hh"
,"CHILDREN_IN_HH5":"children_in_hh"
,"PP2_PPEDUCAT":"education"
,"PP3_PPEDUCAT":"education"
,"PP4_PPEDUCAT":"education"
,"PP5_PPEDUCAT":"education"
,"PPEDUCAT":"education"
,"PP2_PPWORK":"employment_status"
,"PP3_PPWORK":"employment_status"
,"PP4_PPWORK":"employment_status"
,"PP5_PPWORK":"employment_status"
,"PPWORK":"employment_status"
,"PP2_PPNET":"has_internet_access"
,"PP3_PPNET":"has_internet_access"
,"PP4_PPNET":"has_internet_access"
,"PP5_PPNET":"has_internet_access"
,"PPNET":"has_internet_access"
,"PP2_PPHHHEAD":"head_of_household_indicator"
,"PP3_PPHHHEAD":"head_of_household_indicator"
,"PP4_PPHHHEAD":"head_of_household_indicator"
,"PP5_PPHHHEAD":"head_of_household_indicator"
,"PPHHHEAD":"head_of_household_indicator"
,"PP2_PPEDUC":"highest_degree_received"
,"PP3_PPEDUC":"highest_degree_received"
,"PP4_PPEDUC":"highest_degree_received"
,"PP5_PPEDUC":"highest_degree_received"
,"PPEDUC":"highest_degree_received"
,"PP2_PPINCIMP":"household_income"
,"PP3_PPINCIMP":"household_income"
,"PP4_PPINCIMP":"household_income"
,"PP5_PPINCIMP":"household_income"
,"PPINCIMP":"household_income"
,"PP2_PPHHSIZE":"household_size"
,"PP3_PPHHSIZE":"household_size"
,"PP4_PPHHSIZE":"household_size"
,"PP5_PPHHSIZE":"household_size"
,"PPHOUSEHOLDSIZE":"household_size"
,"PP2_PPHOUSE":"housing_type"
,"PP3_PPHOUSE":"housing_type"
,"PP4_PPHOUSE":"housing_type"
,"PP5_PPHOUSE":"housing_type"
,"PPHOUSE":"housing_type"
,"PP2_PPMARIT":"marital_status"
,"PP3_PPMARIT":"marital_status"
,"PP4_PPMARIT":"marital_status"
,"PP5_PPMARIT":"marital_status"
,"PPMARIT":"marital_status"
,"PP2_PPT1317":"members_13_to_17"
,"PP3_PPT1317":"members_13_to_17"
,"PP4_PPT1317":"members_13_to_17"
,"PP5_PPT1317":"members_13_to_17"
,"PPT1317":"members_13_to_17"
,"PP2_PPT25":"members_2_to_5"
,"PP3_PPT25":"members_2_to_5"
,"PP4_PPT25":"members_2_to_5"
,"PP5_PPT25":"members_2_to_5"
,"PPT25":"members_2_to_5"
,"PP2_PPT612":"members_6_to_12"
,"PP3_PPT612":"members_6_to_12"
,"PP4_PPT612":"members_6_to_12"
,"PP5_PPT612":"members_6_to_12"
,"PPT612":"members_6_to_12"
,"PP2_PPT18OV":"members_gt_18"
,"PP3_PPT18OV":"members_gt_18"
,"PP4_PPT18OV":"members_gt_18"
,"PP5_PPT18OV":"members_gt_18"
,"PPT18OV":"members_gt_18"
,"PP2_PPT01":"members_lt_2"
,"PP3_PPT01":"members_lt_2"
,"PP4_PPT01":"members_lt_2"
,"PP5_PPT01":"members_lt_2"
,"PPT01":"members_lt_2"
,"PP2_PPMSACAT":"msa_status"
,"PP3_PPMSACAT":"msa_status"
,"PP4_PPMSACAT":"msa_status"
,"PP5_PPMSACAT":"msa_status"
,"PPMSACAT":"msa_status"
,"PP2_PPRENT":"own_rent"
,"PP3_PPRENT":"own_rent"
,"PP4_PPRENT":"own_rent"
,"PP5_PPRENT":"own_rent"
,"PPRENT":"own_rent"
,"PP2_PPREG4":"region_4"
,"PP3_PPREG4":"region_4"
,"PP4_PPREG4":"region_4"
,"PP5_PPREG4":"region_4"
,"PPREG4":"region_4"
,"PP2_PPREG9":"region_9"
,"PP3_PPREG9":"region_9"
,"PP4_PPREG9":"region_9"
,"PP5_PPREG9":"region_9"
,"PPREG9":"region_9"
,"PP2_PPETHM":"rep_race"
,"PP3_PPETHM":"rep_race"
,"PP4_PPETHM":"rep_race"
,"PP5_PPETHM":"rep_race"
,"PPETHM":"rep_race"
,"PP2_PPCMDATE_YRMO":"survey_date"
,"PP3_PPCMDATE_YRMO":"survey_date"
,"PP4_PPCMDATE_YRMO":"survey_date"
,"PP5_PPCMDATE_YRMO":"survey_date"
,"PPPPCMDATE_YRMO":"survey_date"}

Data dictionary in Python
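To show how a renaming dictionary like this can drive the un-pivot, here is a minimal sketch in plain Python. The `couple_id` field, the `wave_of` rule, and the `unpivot` helper are assumptions made for the example, not the code used in the project, and only a four-entry excerpt of the dictionary is used.

```python
# Excerpt of the renaming dictionary, limited to two questions and two waves.
colnames = {
    "PPHOUSE": "housing_type",
    "PP2_PPHOUSE": "housing_type",
    "PPMARIT": "marital_status",
    "PP2_PPMARIT": "marital_status",
}

def wave_of(column):
    """Assumed rule: a 'PP<n>_' prefix marks wave n; anything else is wave 1."""
    if column.startswith("PP") and column[2:3].isdigit() and column[3:4] == "_":
        return int(column[2])
    return 1

def unpivot(wide_rows, id_field, colnames):
    """Turn one row per couple into one row per couple per wave."""
    long_rows = {}
    for row in wide_rows:
        for raw_name, value in row.items():
            if raw_name not in colnames:
                continue  # the id field and unmapped columns are skipped
            key = (row[id_field], wave_of(raw_name))
            record = long_rows.setdefault(
                key, {id_field: row[id_field], "wave": key[1]}
            )
            record[colnames[raw_name]] = value
    return [long_rows[key] for key in sorted(long_rows)]

# "couple_id" is a hypothetical identifier column.
wide = [{"couple_id": 1, "PPHOUSE": 1, "PP2_PPHOUSE": 3,
         "PPMARIT": 5, "PP2_PPMARIT": 1}]
long_format = unpivot(wide, "couple_id", colnames)
for record in long_format:
    print(record)
```

Irregular names such as CHILDREN_IN_HH2 would not be assigned to the right wave by this simple prefix rule, so in practice the wave would have to be recorded explicitly for those columns.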

Decoding numeric coding

All of the variables in the downloadable dataset are encoded numerically (e.g., 1 = “18-24”, 2 = “25-34”, and so on).  That may be useful for some platforms, but it makes the analysis harder to read down the road, so I wanted to undo it early on.  There are only a few thousand survey respondents in this dataset, so I didn’t have to worry about the storage or efficiency issues that decoding might cause for much larger datasets.  There were also a few encoding inconsistencies between questionnaires: the numeric values for a question in the first questionnaire were not guaranteed to match the numeric values for the same question in a subsequent questionnaire (though they did nearly all of the time).
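As a small illustration of the decoding step, the sketch below replaces numeric codes with labels using a per-column lookup. The `decode_row` helper is hypothetical and only two variables are included. Keying the lookup by the raw column name (before any renaming) is one way to cope with encodings that differ between questionnaires, since each wave's column can carry its own mapping.

```python
# Two-variable excerpt; each raw column name maps to its own {code: label} table.
levels = {
    "PPGENDER": {1: "male", 2: "female"},
    "PPMSACAT": {0: "non-metro", 1: "metro"},
}

def decode_row(row, levels):
    """Replace numeric codes with labels wherever a mapping is known."""
    decoded = {}
    for column, value in row.items():
        mapping = levels.get(column)
        # Columns without a mapping, and codes missing from it, pass through unchanged.
        decoded[column] = mapping.get(value, value) if mapping else value
    return decoded

row = {"PPGENDER": 2, "PPMSACAT": 0, "PPHOUSEHOLDSIZE": 4}
decoded = decode_row(row, levels)
print(decoded)
```

Letting unknown codes pass through unchanged makes gaps in the mapping easy to spot afterwards, because any value that is still numeric in a categorical column points to a code the lookup does not cover.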

I also ended up doing this with a dictionary (of dictionaries):  


variable_levels = {
"PPAGECAT":{1:"18-24",2:"25-34",3:"35-44",4:"45-54",5:"55-64",6:"65-74",7:"75+",99:"under 18"}
,"PPAGECT4":{1:"18-29",2:"30-44",3:"45-59",4:"60+",99:"under 18"}
,"PPEDUC":{1:"no formal education",2:"1st, 2nd, 3rd, or 4th grade",3:"5th or 6th grade",4:"7th or 8th grade",5:"9th grade",6:"10th grade",7:"11th grade",8:"12th grade no diploma",9:"high school graduate - high school diploma or the equivalent (ged)",10:"some college, no degree",11:"associate degree",12:"bachelors degree",13:"masters degree",14:"professional or doctorate degree"}
,"PP2_PPEDUC":{1:"no formal education",2:"1st, 2nd, 3rd, or 4th grade",3:"5th or 6th grade",4:"7th or 8th grade",5:"9th grade",6:"10th grade",7:"11th grade",8:"12th grade no diploma",9:"high school graduate - high school diploma or the equivalent (ged)",10:"some college, no degree",11:"associate degree",12:"bachelors degree",13:"masters degree",14:"professional or doctorate degree"}
,"PP3_PPEDUC":{1:"no formal education",2:"1st, 2nd, 3rd, or 4th grade",3:"5th or 6th grade",4:"7th or 8th grade",5:"9th grade",6:"10th grade",7:"11th grade",8:"12th grade no diploma",9:"high school graduate - high school diploma or the equivalent (ged)",10:"some college, no degree",11:"associate degree",12:"bachelors degree",13:"masters degree",14:"professional or Doctorate degree"}
,"PP4_PPEDUC":{1:"no formal education",2:"1st, 2nd, 3rd, or 4th grade",3:"5th or 6th grade",4:"7th or 8th grade",5:"9th grade",6:"10th grade",7:"11th grade",8:"12th grade no diploma",9:"high school graduate - high school diploma or the equivalent (ged)",10:"some college, no degree",11:"associate degree",12:"bachelors degree",13:"masters degree",14:"professional or Doctorate degree"}
,"PP5_PPEDUC":{1:"no formal education",2:"1st, 2nd, 3rd, or 4th grade",3:"5th or 6th grade",4:"7th or 8th grade",5:"9th grade",6:"10th grade",7:"11th grade",8:"12th grade no diploma",9:"high school graduate - high school diploma or the equivalent (ged)",10:"some college, no degree",11:"associate degree",12:"bachelors degree",13:"masters degree",14:"professional or Doctorate degree"}
,"PPEDUCAT":{1:"less than high school",2:"high school",3:"some college",4:"bachelor's degree or higher"}
,"PP2_PPEDUCAT":{1:"less than high school",2:"high school",3:"some college",4:"bachelor's degree or higher"}
,"PP3_PPEDUCAT":{1:"less than high school",2:"high school",3:"some college",4:"bachelor's degree or higher"}
,"PP4_PPEDUCAT":{1:"less than high school",2:"high school",3:"some college",4:"bachelor's degree or higher"}
,"PP5_PPEDUCAT":{1:"less than high school",2:"high school",3:"some college",4:"bachelor's degree or higher"}
,"PPETHM":{1:"white, non-hispanic",2:"black, non-hispanic",3:"other, non-hispanic",4:"hispanic",5:"2+ races, non-hispanic"}
,"PPGENDER":{1:"male",2:"female"}
,"PPHOUSE":{1:"a one-family house detached from any other house",2:"a one-family house attached to one or more houses",3:"a building with 2 or more apartments",4:"a mobile home",5:"boat, rv, van, etc."}
,"PPINCIMP":{1:"less than $5,000",2:"$5,000 to $7,499",3:"$7,500 to $9,999",4:"$10,000 to $12,499",5:"$12,500 to $14,999",6:"$15,000 to $19,999",7:"$20,000 to $24,999",8:"$25,000 to $29,999",9:"$30,000 to $34,999",10:"$35,000 to $39,999",11:"$40,000 to $49,999",12:"$50,000 to $59,999",13:"$60,000 to $74,999",14:"$75,000 to $84,999",15:"$85,000 to $99,999",16:"$100,000 to $124,999",17:"$125,000 to $149,999",18:"$150,000 to $174,999",19:"$175,000 or more"}
,"PPMARIT":{1:"married",2:"widowed",3:"divorced",4:"separated",5:"never married",6:"living with partner"}
,"PPMSACAT":{0:"non-metro",1:"metro"}
,"PPREG4":{1:"northeast",2:"midwest",3:"south",4:"west"}
,"PPREG9":{1:"new england",2:"mid-atlantic",3:"east-north central",4:"west-north central",5:"south atlantic",6:"east-south central",7:"west-south central",8:"mountain",9:"pacific"}
,"PPRENT":{1:"owned or being bought by you or someone in your household",2:"rented for cash",3:"occupied without payment of cash rent"}
,"PPWORK":{1:"working - as a paid employee",2:"working - self-employed",3:"not working - on temporary layoff from a job",4:"not working - looking for work",5:"not working - retired",6:"not working - disabled",7:"not working - other"}
,"PPQ14ARACE":{1:"white",2:"black, or african american",3:"american indian or alaska native",4:"asian indian",5:"chinese",6:"filipino",7:"japanese",8:"korean",9:"vietnamese",10:"other asian",11:"native hawaiian",12:"guamanian or chamorro",13:"samoan",14:"other pacific islander",15:"some other race",-2:"not asked",-1:"refused"}
,"PPHISPAN":{1:"no, i am not",2:"yes, mexican, mexican-american, chicano",3:"yes, puerto rican",4:"yes, cuban",5:"yes, central american",6:"yes, south american",7:"yes, caribbean",8:"yes, other spanish/hispanic/latino"}
,"PPRACE_WHITE":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_BLACK":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_NATIVEAMERICAN":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_ASIANINDIAN":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_CHINESE":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_FILIPINO":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_JAPANESE":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_KOREAN":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_VIETNAMESE":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_OTHERASIAN":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_HAWAIIAN":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_GUAMANIAN":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_SAMOAN":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_OTHERPACIFICISLANDER":{0:"no",1:"yes",-1:"refused"}
,"PPRACE_SOMEOTHERRACE":{0:"no",1:"yes",-1:"refused"}
,"PAPGLB_FRIEND":{1:"yes, friends",2:"yes, relatives",3:"yes, both",4:"no",5:"i would prefer to not answer this question"}
,"PPPARTYID3":{1:"republican",2:"other",3:"democrat"}
,"PAPEVANGELICAL":{1:"yes",2:"no"}
,"PAPRELIGION":{1:"baptist - any denomination",2:"Protestant (e.g., Methodist, Lutheran, Presbyterian, Episcopal)",3:"Catholic",4:"Mormon",5:"Jewish",6:"Muslim",7:"Hindu",8:"Buddhist",9:"Pentecostal",10:"Eastern Orthodox",11:"other Christian",12:"other non-Christian, please specify",13:"None"}
,"PPHHCOMP11_MEMBER2_GENDER":{2:"male",3:"female",-1:"please select"}
,"PPHHCOMP11_MEMBER3_GENDER":{2:"male",3:"female",-1:"please select"}
,"PPHHCOMP11_MEMBER4_GENDER":{2:"male",3:"female",-1:"please select"}
,"PPHHCOMP11_MEMBER5_GENDER":{2:"male",3:"female",-1:"please select"}
,"PPHHCOMP11_MEMBER6_GENDER":{2:"male",3:"female",-1:"please select"}
,"PPHHCOMP11_MEMBER7_GENDER":{2:"male",3:"female"}
,"PPHHCOMP11_MEMBER8_GENDER":{2:"male",3:"female"}
,"PPHHCOMP11_MEMBER9_GENDER":{2:"male",3:"female"}
,"PPHHCOMP11_MEMBER10_GENDER":{2:"male",3:"female"}
,"PPHHCOMP11_MEMBER11_GENDER":{2:"male",3:"female"}
,"PPHHCOMP11_MEMBER12_GENDER":{2:"male",3:"female"}
,"PPHHCOMP11_MEMBER13_GENDER":{2:"male",3:"female"}
,"PPHHCOMP11_MEMBER14_GENDER":{2:"male",3:"female"}
,"PPHHCOMP11_MEMBER15_GENDER":{2:"male",3:"female",-1:"please select"}
,"PPHHCOMP11_MEMBER2_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative",-1:"please select"}
,"PPHHCOMP11_MEMBER3_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative",-1:"please select"}
,"PPHHCOMP11_MEMBER4_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative",-1:"please select"}
,"PPHHCOMP11_MEMBER5_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative",-1:"please select"}
,"PPHHCOMP11_MEMBER6_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative",-1:"please select"}
,"PPHHCOMP11_MEMBER7_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative"}
,"PPHHCOMP11_MEMBER8_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative"}
,"PPHHCOMP11_MEMBER9_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative"}
,"PPHHCOMP11_MEMBER10_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative"}
,"PPHHCOMP11_MEMBER11_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative"}
,"PPHHCOMP11_MEMBER12_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative"}
,"PPHHCOMP11_MEMBER13_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative"}
,"PPHHCOMP11_MEMBER14_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative"}
,"PPHHCOMP11_MEMBER15_RELATIONSHIP":{2:"spouse",3:"child (biological, adopted, or stepchild)",4:"grandchild",5:"parent",6:"sibling",7:"other relative",8:"unmarried partner",9:"housemate/roommate",10:"other non-relative",-1:"please select"}
,"IRB_CONSENT":{1:"yes, i agree to participate",2:"no, i don't agree to participate"}
,"QFLAG":{1:"partnered",2:"no spouse or partner or otherwise unqualified"}
,"PAPGLB_STATUS":{0:"not glb",1:"glb",1:"yes",2:"no",3:"i would prefer to not answer this question"}  # key 1 appears twice; Python keeps the last value ("yes")
,"RECSOURCE":{1:"gen pop sample",2:"glb augment sample",3:"glb withdrawn sample",4:"glb item refused sample"}
,"S1":{1:"yes, i am married",2:"no, i am not married"}
,"S1A":{1:"yes",2:"no",3:"i would prefer not to answer this question"}
,"S2":{1:"yes, i have a sexual partner (boyfriend or girlfriend)",2:"i have a romantic partner who is not yet a sexual partner",3:"no, i am single, with no boyfriend, no girlfriend and no romantic or sexual partner",-1:"refused"}
,"Q3_CODES":{-1:"refused"}
,"Q4":{1:"male",2:"female",3:"other, please specify"}
,"Q5":{1:"yes, we are a same-sex couple",2:"no, we are an opposite-sex couple",-1:"refused"}
,"Q6A":{1:"no (not latino or hispanic)",2:"yes, mexican, mexican american, chicano",3:"yes, puerto rican",4:"yes, cuban",5:"yes, other latino/hispanic",-1:"refused"}
,"Q6B":{1:"white",2:"black or african american",3:"american indian, aleut, or eskimo",4:"asian or pacific islander",5:"other (please specify)",-1:"refused"}
,"Q7A":{1:"yes",2:"no",-1:"refused"}
,"Q7B":{1:"baptist - any denomination",2:"protestant (e.g. methodist, lutheran, presbyterian, episcopal)",3:"catholic",4:"mormon",5:"jewish",6:"muslim",7:"hindu",8:"buddhist",9:"pentecostal",10:"eastern orthodox",11:"other christian",12:"other non-christian, please specify",13:"none",-1:"refused"}
,"Q8A":{1:"yes, the same",2:"no, has changed religions",-1:"refused"}
,"Q8B":{1:"baptist - any denomination",2:"protestant (e.g. methodist, lutheran, presbyterian, episcopal)",3:"catholic",4:"mormon",5:"jewish",6:"muslim",7:"hindu",8:"buddhist",9:"pentecostal",10:"eastern orthodox",11:"other christian",12:"other non-christian, please specify",13:"none",-1:"refused"}
,"Q10":{1:"no formal education",2:"1st - 4th grade",3:"5th or 6th grade",4:"7th or 8th grade",5:"9th grade",6:"10th grade",7:"11th grade",8:"12th grade no diploma",9:"hs graduate or ged",10:"some college, no degree",11:"associate degree",12:"bachelor's degree",13:"master's degree",14:"professional or doctorate degree",-1:"refused"}
,"Q11":{1:"no formal education",2:"1st - 4th grade",3:"5th or 6th grade",4:"7th or 8th grade",5:"9th grade",6:"10th grade",7:"11th grade",8:"12th grade no diploma",9:"hs graduate or ged",10:"some college, no degree",11:"associate degree",12:"bachelor's degree",13:"master's degree",14:"professional or doctorate degree",-1:"refused"}
,"Q12":{1:"republican",2:"democrat",3:"independent",4:"another party, please specify",5:"no preference",-1:"refused"}
,"Q13A":{1:"yes, the same",2:"no, i have changed religions",-1:"refused"}
,"Q13B":{1:"baptist - any denomination",2:"protestant (e.g. methodist, lutheran, presbyterian, episcopal)",3:"catholic",4:"mormon",5:"jewish",6:"muslim",7:"hindu",8:"buddhist",9:"pentecostal",10:"eastern orthodox",11:"other christian",12:"other non-christian, please specify",13:"none",-1:"refused"}
,"Q14":{1:"no formal education",2:"1st - 4th grade",3:"5th or 6th grade",4:"7th or 8th grade",5:"9th grade",6:"10th grade",7:"11th grade",8:"12th grade no diploma",9:"hs graduate or ged",10:"some college, no degree",11:"associate degree",12:"bachelor's degree",13:"master's degree",14:"professional or doctorate degree",-1:"refused"}
,"Q15A1_COMPRESSED":{1:"United States",2:"all others",-1:"refused"}
,"Q17A":{1:"once (this is my first marriage)",2:"twice",3:"three times",4:"four or more times",-1:"refused"}
,"Q17B":{1:"never married",2:"once",3:"twice",4:"three times",5:"four or more times",-1:"refused"}
,"Q17C":{1:"i am sexually attracted only to men",2:"i am mostly sexually attracted to men, less often sexually attracted to women",3:"i am equally sexually attracted to men and women",4:"i am mostly sexually attracted to women, less often sexually attracted to men",5:"i am sexually attracted only to women",-1:"refused"}
,"Q17D":{1:"i am sexually attracted only to women",2:"i am mostly sexually attracted to women, less often sexually attracted to men",3:"i am equally sexually attracted to men and women",4:"i am mostly sexually attracted to men, less often sexually attracted to women",5:"i am sexually attracted only to men",-1:"refused"}
,"GENDER_ATTRACTION":{1:"opposite gender only",2:"mostly opposite",3:"both genders equally",4:"same gender mostly",5:"only same gender"}
,"Q18A_1":{0:"no",1:"yes",-1:"refused"}
,"Q18A_2":{0:"no",1:"yes",-1:"refused"}
,"Q18A_3":{0:"have either DP or CU",1:"have neither DP nor CU",-1:"refused"}
,"Q18B_CODES":{-1:"refused"}
,"Q18C_CODES":{-1:"refused"}
,"Q19":{1:"yes",2:"no",-1:"refused"}
,"Q20":{1:"yes",2:"no",-1:"refused"}
,"Q21A_REFUSAL":{1:"refused"}
,"Q21B":{1:"refused"}
,"Q21C_REFUSAL":{1:"refused"}
,"Q21D_REFUSAL":{1:"refused"}
,"Q21E_REFUSAL":{1:"refused"}
,"Q22":{1:"less than one month",2:"1-3 months",3:"4-6 months",4:"7 months - 1 year",5:"more than 1 year, less than 2 years",6:"more than 2 years, less than 3 years",7:"3 years or more",-1:"refused"}
,"Q23":{1:"i earned more",2:"we earned about the same amount",3:"partner earned more",-1:"refused"}
,"Q24_CODES":{1:"answered but refused to provide information",-1:"refused"}
,"Q25":{1:"same high school",2:"different high school",-1:"refused"}
,"Q26":{1:"attended same college or university",2:"did not attend same college or university",-1:"refused"}
,"Q27":{1:"yes",2:"no",-1:"refused"}
,"Q28":{1:"yes",2:"no",-1:"refused"}
,"Q29":{1:"father and mother",2:"father only",3:"mother only",4:"neither father nor mother are alive",-1:"refused"}
,"Q30":{1:"approve",2:"neither approve nor disapprove",3:"disapprove",4:"do not know",-1:"refused"}
,"Q31_1":{0:"no",1:"yes",-1:"refused"}
,"Q31_2":{0:"no",1:"yes",-1:"refused"}
,"Q31_3":{0:"no",1:"yes",-1:"refused"}
,"Q31_4":{0:"no",1:"yes",-1:"refused"}
,"Q31_5":{0:"no",1:"yes",-1:"refused"}
,"Q31_6":{0:"no",1:"yes",-1:"refused"}
,"Q31_7":{0:"no",1:"yes",-1:"refused"}
,"Q31_8":{0:"no",1:"yes",-1:"refused"}
,"Q31_9":{0:"no",1:"yes",-1:"refused"}
,"Q32":{1:"yes, a social networking site (like facebook or myspace)",2:"no, we did not meet through the internet",3:"yes, an internet dating or matchmaking site (like eharmony or match.com)",4:"yes, an internet classified advertising site (like craigslist)",5:"yes, an internet chat room",6:"yes, a different kind of internet service",-1:"refused"}
,"Q33_1":{0:"no",1:"yes",-1:"refused"}
,"Q33_2":{0:"no",1:"yes",-1:"refused"}
,"Q33_3":{0:"no",1:"yes",-1:"refused"}
,"Q33_4":{0:"no",1:"yes",-1:"refused"}
,"Q33_5":{0:"no",1:"yes",-1:"refused"}
,"Q33_6":{0:"no",1:"yes",-1:"refused"}
,"Q33_7":{0:"no",1:"yes",-1:"refused"}
,"Q34":{1:"excellent",2:"good",3:"fair",4:"poor",5:"very poor",-1:"refused"}
,"Q35_CODES":{-1:"refused"}
,"Q24_MET_ONLINE":{0:"met offline",1:"met online"}
,"MARRYNOTREALLY":{0:"married",1:"not legally married"}
,"CIVILNOTREALLY":{0:"real civ union or dom partnership",1:"perhaps not real civ union or dom partnership"}
,"PARTNER_DECEASED":{0:"not deceased",1:"apparently deceased"}
,"PARTNER_RELIGION_RECLASSIFIED":{1:"baptist - any denomination",2:"protestant (e.g. methodist, lutheran, presbyterian, episcopal)",3:"catholic",4:"mormon",5:"jewish",6:"muslim",7:"hindu",8:"buddhist",9:"pentecostal",10:"eastern orthodox",11:"other christian",12:"other non-christian, please specify",13:"none",-1:"refused"}
,"PARTNER_RELIGION_CHILD_RECLASS":{1:"baptist - any denomination",2:"protestant (e.g. methodist, lutheran, presbyterian, episcopal)",3:"catholic",4:"mormon",5:"jewish",6:"muslim",7:"hindu",8:"buddhist",9:"pentecostal",10:"eastern orthodox",11:"other christian",12:"other non-christian, please specify",13:"none"}
,"OWN_RELIGION_CHILD_RECLASS":{1:"baptist - any denomination",2:"protestant (e.g. methodist, lutheran, presbyterian, episcopal)",3:"catholic",4:"mormon",5:"jewish",6:"muslim",7:"hindu",8:"buddhist",9:"pentecostal",10:"eastern orthodox",11:"other christian",12:"other non-christian, please specify",13:"none"}
,"Q32_INTERNET":{0:"met offline according to q32",1:"met online according to q32"}
,"HOW_MET_ONLINE":{1:"reconnected: already knew partner but reconnected online",2:"Mediated: Online connection was mediated by friends, family, or others",3:"Previously Strangers: Before online connection respondent and partner were strangers",4:"We cannot tell from the existed data whether the respondent and partner knew each other prior to online connection",5:"Probably Did Not meet partner online, despite positive answer to q32 or q24"}
,"EITHER_INTERNET_ADJUSTED":{0:"not met online",1:"met online",-1:"probably not met online, q32 and q24 disagree"}
,"SAME_SEX_COUPLE":{0:"different sex couple",1:"same-sex couple"}
,"POTENTIAL_PARTNER_GENDER_RECODES":{1:"male",2:"female",3:"other, please specify"}
,"ALT_PARTNER_GENDER":{1:"male",2:"female",3:"other, please specify"}
,"HOW_LONG_AGO_FIRST_MET_CAT":{1:"0-2",2:"3-5",3:"6-10",4:"11-15",5:"16-20",6:"21-30",7:"31+"}
,"RESPONDENT_RACE":{1:"NH white",2:"NH black",3:"NH Amer Indian",4:"NH Asian Pac Islander",5:"NH Other",6:"Hispanic"}
,"PARTNER_RACE":{1:"NH white",2:"NH black",3:"NH Amer Indian",4:"NH Asian Pac Islander",5:"NH Other",6:"Hispanic"}
,"MET_THROUGH_FRIENDS":{0:"not met through friends",1:"meet through friends"}
,"MET_THROUGH_FAMILY":{0:"not met through family",1:"met through family"}
,"MET_THROUGH_AS_NEIGHBORS":{0:"did not meet through or as neighbors",1:"met through or as neighbors"}
,"MET_THROUGH_AS_COWORKERS":{0:"did not meet through or as coworkers",1:"met through or as coworkers"}
,"RESPONDENT_RELIGION_AT_16":{1:"baptist - any denomination",2:"protestant (e.g. methodist, lutheran, presbyterian, episcopal)",3:"catholic",4:"mormon",5:"jewish",6:"muslim",7:"hindu",8:"buddhist",9:"pentecostal",10:"eastern orthodox",11:"other christian",12:"other non-christian, please specify",13:"none"}
,"RESPONDENT_RELIG_16_CAT":{1:"Protestant or oth Christian",2:"Catholic",3:"Jewish",4:"Neither Christian nor Jewish",5:"No religion"}
,"PARTNER_RELIGION_AT_16":{1:"baptist - any denomination",2:"protestant (e.g. methodist, lutheran, presbyterian, episcopal)",3:"catholic",4:"mormon",5:"jewish",6:"muslim",7:"hindu",8:"buddhist",9:"pentecostal",10:"eastern orthodox",11:"other christian",12:"other non-christian, please specify",13:"none",-1:"refused"}
,"PARTNER_RELIG_16_CAT":{1:"Protestant or oth Christian",2:"Catholic",3:"Jewish",4:"Neither Christian nor Jewish",5:"No religion"}
,"MARRIED":{0:"not married",1:"married"}
,"PARENTAL_APPROVAL":{0:"don't approve or don't know",1:"approve"}
,"HOME_COUNTRY_RECODE":{1:"united states",2:"cambodia",3:"canada",4:"china",5:"colombia",6:"cuba",7:"dominican republic",8:"ecuador",9:"el salvador",10:"former yugoslavia",11:"france",12:"germany",13:"great britain",14:"greece",15:"guatemala",16:"guyana",17:"haiti",18:"honduras",19:"hungary",20:"india",21:"iran",22:"ireland",23:"italy",24:"jamaica",25:"japan",26:"korea",27:"laos",28:"mexico",29:"nicaragua",30:"peru",31:"philippines",32:"poland",33:"portugal",34:"russia",35:"taiwan",36:"thailand",37:"trinidad and tobago",38:"vietnam",39:"another country, please specify"}
,"US_RAISED":{0:"raised outside US",1:"raised in US"}
,"RELATIONSHIP_QUALITY":{1:"very poor",2:"poor",3:"fair",4:"good",5:"excellent"}
,"PP2_AFTERP1":{0:"No second background survey",1:"Yes second background survey"}
,"PP2_PPHOUSE":{1:"a one-family house detached from any other house",2:"a one-family house attached to one or more houses",3:"a building with 2 or more apartments",4:"a mobile home",5:"boat, rv, van, etc."}
,"PP2_PPINCIMP":{1:"less than $5,000",2:"$5,000 to $7,499",3:"$7,500 to $9,999",4:"$10,000 to $12,499",5:"$12,500 to $14,999",6:"$15,000 to $19,999",7:"$20,000 to $24,999",8:"$25,000 to $29,999",9:"$30,000 to $34,999",10:"$35,000 to $39,999",11:"$40,000 to $49,999",12:"$50,000 to $59,999",13:"$60,000 to $74,999",14:"$75,000 to $84,999",15:"$85,000 to $99,999",16:"$100,000 to $124,999",17:"$125,000 to $149,999",18:"$150,000 to $174,999",19:"$175,000 or more"}
,"PP2_PPMARIT":{1:"married",2:"widowed",3:"divorced",4:"separated",5:"never married",6:"living with partner"}
,"PP2_PPMSACAT":{0:"non-metro",1:"metro"}
,"PP2_PPETHM":{1:"white, non-hispanic",2:"black, non-hispanic",3:"other, non-hispanic",4:"hispanic",5:"2+ races, non-hispanic"}
,"PP2_PPREG4":{1:"northeast",2:"midwest",3:"south",4:"west"}
,"PP2_PPREG9":{1:"new england",2:"mid-atlantic",3:"east-north central",4:"west-north central",5:"south atlantic",6:"east-south central",7:"west-south central",8:"mountain",9:"pacific"}
,"PP2_PPRENT":{1:"owned or being bought by you or someone in your household",2:"rented for cash",3:"occupied without payment of cash rent"}
,"PP2_PPWORK":{1:"working - as a paid employee",2:"working - self-employed",3:"not working - on temporary layoff from a job",4:"not working - looking for work",5:"not working - retired",6:"not working - disabled",7:"not working - other"}
,"PP_IGDR1":{0:"value not imputed",1:"value imputed"}
,"PP_IEDUC1":{0:"value not imputed",1:"value imputed"}
,"PP2_IGDR2":{0:"value not imputed",1:"value imputed"}
,"PP2_IEDUC2":{0:"value not imputed",1:"value imputed"}
,"W2_DECEASED":{0:"not deceased",1:"apparently deceased"}
,"W2_MULTINAME":{1:"includes multiple names"}
,"W2_PANELSTAT":{1:"active kn panelist",2:"withdrawn kn panelist"}
,"W2_DONOTCONTACT":{1:"withdrawn case on noncontact list",2:"all other cases"}
,"W2_ASSIGNED":{1:"assigned to survey",2:"not assigned to survey"}
,"W2_F1COMPLETE":{0:"did not complete followup survey",1:"completed followup survey"}
,"W2_XMARRY":{1:"married",2:"partnered"}
,"W2_XSS":{1:"yes, qualified to ask about new domestic parterships",2:"no"}
,"W2_SOURCE":{1:"online",2:"telephone"}
,"W2_Q1":{1:"yes",2:"no"}
,"W2_Q2":{1:"yes",2:"no"}
,"W2_Q3":{1:"divorce",2:"separation with no divorce",3:"(partner) passed away, is deceased"}
,"W2_Q4":{1:"i wanted the (divorce/separation) more",2:"(partner) wanted the (divorce/separation) more",3:"we both equally wanted the (divorce/separation)"}
,"W2_Q5":{1:"yes",2:"no"}
,"W2_Q6":{1:"yes",2:"no"}
,"W2_Q7":{1:"yes, married (partner)",2:"no, did not marry (partner)"}
,"W2_Q8":{1:"no, we have not gotten a domestic partnership or civil union agreement",2:"yes, we have gotten a domestic partnership or civil union agreement"}
,"W2_Q9":{1:"we broke up",2:"(partner) passed away, is deceased",3:"other (please describe)"}
,"W2_Q10":{1:"i wanted to break up more",2:"(partner) wanted to break up more",3:"we both equally wanted to break up"}
,"W2_BROKE_UP":{0:"still together",1:"broke up",2:"partner passed away"}
,"PP3_PPHOUSE":{1:"A one-family house detached from any other house",2:"A one-family house attached to one or more houses",3:"A building with 2 or more apartments",4:"A mobile home",5:"Boat, RV, van, etc."}
,"PP3_PPINCIMP":{1:"Less than $5,000",2:"$5,000 to $7,499",3:"$7,500 to $9,999",4:"$10,000 to $12,499",5:"$12,500 to $14,999",6:"$15,000 to $19,999",7:"$20,000 to $24,999",8:"$25,000 to $29,999",9:"$30,000 to $34,999",10:"$35,000 to $39,999",11:"$40,000 to $49,999",12:"$50,000 to $59,999",13:"$60,000 to $74,999",14:"$75,000 to $84,999",15:"$85,000 to $99,999",16:"$100,000 to $124,999",17:"$125,000 to $149,999",18:"$150,000 to $174,999",19:"$175,000 or more"}
,"PP3_PPMARIT":{1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never married",6:"Living with partner"}
,"PP3_PPMSACAT":{0:"Non-Metro",1:"Metro"}
,"PP3_PPRENT":{1:"Owned or being bought by you or someone in your household",2:"Rented for cash",3:"Occupied without payment of cash rent"}
,"PP3_PPREG4":{1:"Northeast",2:"Midwest",3:"South",4:"West"}
,"PP3_PPREG9":{1:"New England",2:"Mid-Atlantic",3:"East-North Central",4:"West-North Central",5:"South Atlantic",6:"East-South Central",7:"West-South Central",8:"Mountain",9:"Pacific"}
,"INTERSTATE_MOVER_PP1_PP2":{0:"stayer",1:"mover"}
,"INTERSTATE_MOVER_PP2_PP3":{0:"stayer",1:"mover"}
,"INTERSTATE_MOVER_PP1_PP3":{0:"stayer",1:"mover"}
,"PP3_PPWORK":{1:"Working - as a paid employee",2:"Working - self-employed",3:"Not working - on temporary layoff from a job",4:"Not working - looking for work",5:"Not working - retired",6:"Not working - disabled",7:"Not working - other"}
,"PP3_PPETHM":{1:"White, Non-Hispanic",2:"Black, Non-Hispanic",3:"Other, Non-Hispanic",4:"Hispanic",5:"2+ Races, Non-Hispanic"}
,"PP3_NEWER":{0:"no, newer pp3 data is Not available",1:"Yes, pp3 data is newer and available"}
,"W2W3_COMBO_BREAKUP":{0:"still together, or lost to follow-up, or partner deceased",1:"broke up"}
,"W3_BROKE_UP":{0:"still together",1:"broke up",2:"partner deceased"}
,"W3_XPARTNERED":{0:"unqualified bc unpartnered at main survey",1:"Qualified for follow-up at wave3",2:"unqualified bc broke up at wave 2"}
,"W3_XDECEASED":{0:"not deceased",1:"apparently deceased"}
,"W3_MULTINAME":{1:"reported multiple partner names in main survey"}
,"W3_XSS":{1:"yes",2:"no"}
,"W3_XLAST":{1:"1 year ago",2:"2 years ago"}
,"W3_XQUALIFIED":{0:"unqualified for wave 3",1:"qualified for wave 3"}
,"W3_STATUS":{1:"active member of KN panel",2:"subject withdrew from KN panel",3:"subject retired from KN panel, KN decision",4:"Do Not Contact- subject withdrew and asked not to be contacted"}
,"W3_SOURCE":{1:"Online",2:"Telephone"}
,"W3_XMARRY":{1:"Married",2:"Partnered"}
,"W3_XTYPE":{1:"same sex couple",2:"heterosexual couple"}
,"W3_Q1":{1:"yes",2:"no"}
,"W3_Q2":{1:"yes",2:"no",-1:"Refused"}
,"W3_Q3":{1:"divorce",2:"separation with no divorce",3:"(xNameP) passed away, is deceased",-1:"Refused"}
,"W3_Q4":{1:"I wanted the (divorce/separation) more.",2:"(xNameP) wanted the (divorce/separation) more.",3:"We both equally wanted the (divorce/separation)."}
,"W3_MBTIMING_MONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December"}
,"W3_Q5":{1:"yes",2:"no"}
,"W3_Q6":{1:"yes",2:"no"}
,"W3_Q7":{1:"yes, married (xNameP)", 2:"no, did not marry (xNameP)"}
,"W3_Q8":{1:"No, we have not gotten a domestic partnership or civil union agreement",2:"Yes, we have gotten a domestic partnership or civil union agreement"}
,"W3_Q9":{1:"We broke up",2:"(xNameP) passed away, is deceased",3:"Other (please describe)",-1:"Refused"}
,"W3_Q10":{1:"I wanted to break up more",2:"(xNameP) wanted to break up more",3:"We both equally wanted to break up"}
,"W3_NONMBTIMING_MONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December"}
,"ZPNHWHITE_CAT":{0:"<55%",1:"55%-79.99%",2:"80%-91.99%",3:"92% and higher"}
,"ZPNHBLACK_CAT":{0:"<1%",1:"1%-2.99%",2:"3%-19.99%",3:"20% or more"}
,"ZPHISP_CAT":{0:"<2%",1:"2%-3.99%",2:"4%-19.99%",3:"20%+"}
,"ZPMEDHHINC_CAT":{0:"<$34K",1:"$34000-$41999",2:"$42000-$64999",3:"$65K+"}
,"ZPFORBORN_CAT":{0:"<2%",1:"2%-4.99%",2:"5%-11.99%",3:"12%+"}
,"ZPRURAL_CAT":{0:"non rural",1:"rural"}
,"Q15A3_CODES":{-1:"refused"}
,"W4_STATUS":{0:"unqualified for wave 4",1:"Active",2:"Withdrawn",3:"Retired",4:"Do not call"}
,"W4_SOURCE":{1:"On-line",2:"Telephone"}
,"W4_XTYPE":{1:"same-sex couple",2:"different sex couple"}
,"W4_XMONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December"}
,"W4_XMARRY":{1:"Married",2:"Unmarried partners"}
,"W4_XSS":{1:"Yes",2:"No"}
,"W4_Q1":{1:"yes",2:"no",-1:"Refused"}
,"W4_Q2":{1:"yes",2:"no"}
,"W4_QUALITY":{1:"Excellent",2:"Good",3:"Fair",4:"Poor",5:"Very Poor",-1:"Refused"}
,"W4_ATTRACTIVE":{1:"very attractive",2:"moderately attractive",3:"slightly attractive",4:"not at all attractive",-1:"Refused"}
,"W4_ATTRACTIVE_PARTNER":{1:"very attractive",2:"moderately attractive",3:"slightly attractive",4:"not at all attractive",-1:"Refused"}
,"W4_Q3":{1:"divorce",2:"separation with no divorce",3:"(xnamep) passed away, is deceased",-1:"Refused"}
,"W4_Q4":{1:"I wanted the (divorce/separation) more",2:"(xname) wanted the (divorce/separation) more",3:"We both equally wanted the (divorce/separation)"}
,"W4_MBTIMING_MONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December"}
,"W4_Q7":{1:"Yes, married (xnamep)",2:"No, did not marry (xnamep)"}
,"W4_MAR_MONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December"}
,"W4_Q8_A":{1:"No, we have not gotten a domestic partnership or civil union agreement",2:"Yes, we have gotten a domestic partnership or civil union agreement"}
,"W4_Q8_B":{1:"No, we have not gotten a domestic partnership or civil union agreement",2:"Yes, we have gotten a domestic partnership or civil union agreement"}
,"W4_Q9":{1:"We broke up",2:"(xnamep) passed away, is deceased"}
,"W4_Q10":{1:"I wanted to break up more",2:"(xnamep) wanted to break up more",3:"We both equally wanted to break up"}
,"W4_NONMBTIMING_MONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December"}
,"RELATIONSHIP_QUALITY_W4":{1:"excellent",2:"good",3:"fair",4:"poor",5:"very poor"}
,"W4_BROKE_UP":{0:"still together",1:"broke up",2:"partner passed away"}
,"W234_COMBO_BREAKUP":{0:"still together at w4, or some follow-up w/o break-up",1:"broke up at wave 2, 3, or 4"}
,"PP4_PPETHM":{1:"White, Non-Hispanic",2:"Black, Non-Hispanic",3:"Other, Non-Hispanic",4:"Hispanic",5:"2+ Races, Non-Hispanic"}
,"PP4_PPHOUSE":{1:"A one-family house detached from any other house",2:"A one-family house attached to one or more houses",3:"A building with 2 or more apartments",4:"A mobile home",5:"Boat, RV, van, etc."}
,"PP4_PPINCIMP":{1:"Less than $5,000",2:"$5,000 to $7,499",3:"$7,500 to $9,999",4:"$10,000 to $12,499",5:"$12,500 to $14,999",6:"$15,000 to $19,999",7:"$20,000 to $24,999",8:"$25,000 to $29,999",9:"$30,000 to $34,999",10:"$35,000 to $39,999",11:"$40,000 to $49,999",12:"$50,000 to $59,999",13:"$60,000 to $74,999",14:"$75,000 to $84,999",15:"$85,000 to $99,999",16:"$100,000 to $124,999",17:"$125,000 to $149,999",18:"$150,000 to $174,999",19:"$175,000 or more"}
,"PP4_PPMARIT":{1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never married",6:"Living with partner"}
,"PP4_PPMSACAT":{0:"Non-Metro",1:"Metro"}
,"PP4_PPREG4":{1:"Northeast",2:"Midwest",3:"South",4:"West"}
,"PP4_PPREG9":{1:"New England",2:"Mid-Atlantic",3:"East-North Central",4:"West-North Central",5:"South Atlantic",6:"East-South Central",7:"West-South Central",8:"Mountain",9:"Pacific"}
,"PP4_PPRENT":{1:"Owned or being bought by you or someone in your household",2:"Rented for cash",3:"Occupied without payment of cash rent"}
,"PP4_PPWORK":{1:"Working - as a paid employee",2:"Working - self-employed",3:"Not working - on temporary layoff from a job",4:"Not working - looking for work",5:"Not working - retired",6:"Not working - disabled",7:"Not working - other"}
,"PPA2009_HOW_OFTEN_SERVICES":{1:"More than once a week",2:"Once a week",3:"Once or twice a month",4:"A few times a year",5:"Once a year or less",6:"Never"}
,"W5_SOURCE":{1:"online",2:"phone"}
,"W5_COMPLETE":{0:"eligible but not completed",1:"wave 5 completed"}
,"W5_STATUS":{1:"Active",2:"Withdrawn",3:"Retired",4:"Do not call"}
,"W5X_MARRY":{0:"not qualified for wave 5",1:"married",2:"unmarried partners"}
,"W5X_LAST":{1:"-",3:"-",4:"-",5:"-"}
,"W5X_CIVIL":{0:"no civil union or DP prior to wave 5",1:"yes civil union or DP prior to wave 5"}
,"W5X_MONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December"}
,"W5X_CIVMONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December"}
,"W5_Q1":{1:"yes",2:"no"}
,"W5_Q2":{1:"yes",2:"no"}
,"W5_SEX_FREQUENCY":{1:"Once a day or more",2:"3 to 6 times a week",3:"Once or twice a week",4:"2 to 3 times a month",5:"Once a month or less",-1:"Refused"}
,"W5_P_MONOGAMY":{1:"Yes, I expect (name) will only have sex with me",2:"No, I expect (name) to have sex with other people besides me",-1:"Refused"}
,"W5_IDENTITY":{1:"heterosexual or straight",2:"gay",3:"lesbian",4:"bisexual",5:"Something else",-1:"Refused"}
,"W5_OUTNESS":{1:"All or most of them",2:"Some of them",3:"Only a few of them",4:"None of them"}
,"W5_Q3":{1:"Divorce",2:"Separation with no divorce",3:"(name) passed away, is deceased"}
,"W5_Q4":{1:"I wanted the divorce/separation more",2:"(name) wanted the divorce/separation more",3:"We both equally wanted the divorce/separation"}
,"W5_MBTIMING_MONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December"}
,"W5_Q5":{1:"yes",2:"no"}
,"W5_Q6":{1:"yes",2:"no"}
,"W5_Q7":{1:"Yes, married (name)",2:"No, did not marry (name)"}
,"W5_MAR_MONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December",-1:"Refused"}
,"W5_Q8":{1:"No, we have not gotten a domestic partnership or civil union agreement",2:"Yes, we have gotten a domestic partnership or civil union agreement"}
,"W5_CIV_MONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December"}
,"W5_Q9":{1:"We broke up",2:"(name) passed away, is deceased"}
,"W5_Q10":{1:"I wanted to break up more",2:"(name) wanted to break up more",3:"We both equally wanted to break up"}
,"W5_NONMBTIMING_MONTH":{1:"January",2:"February",3:"March",4:"April",5:"May",6:"June",7:"July",8:"August",9:"September",10:"October",11:"November",12:"December"}
,"W5_BROKE_UP":{0:"still together",1:"broke up",2:"partner deceased"}
,"W2345_COMBO_BREAKUP":{0:"still together at w5 or some follow-up w/o breakup",1:"broke up at wave 2,3,4, or 5"}
,"PP5_PPAGECAT":{1:"18-24",2:"25-34",3:"35-44",4:"45-54",5:"55-64",6:"65-74",7:"75+",99:"Under 18"}
,"PP5_PPAGECT4":{1:"18-29",2:"30-44",3:"45-59",4:"60+",99:"Under 18"}
,"PP5_PPETHM":{1:"White, Non-Hispanic",2:"Black, Non-Hispanic",3:"Other, Non-Hispanic",4:"Hispanic",5:"2+ Races, Non-Hispanic"}
,"PP5_PPGENDER":{1:"Male",2:"Female"}
,"PP5_PPHOUSE":{1:"A one-family house detached from any other house",2:"A one-family house attached to one or more houses",3:"A building with 2 or more apartments",4:"A mobile home",5:"Boat, RV, van, etc."}
,"PP5_PPINCIMP":{1:"Less than $5,000",2:"$5,000 to $7,499",3:"$7,500 to $9,999",4:"$10,000 to $12,499",5:"$12,500 to $14,999",6:"$15,000 to $19,999",7:"$20,000 to $24,999",8:"$25,000 to $29,999",9:"$30,000 to $34,999",10:"$35,000 to $39,999",11:"$40,000 to $49,999",12:"$50,000 to $59,999",13:"$60,000 to $74,999",14:"$75,000 to $84,999",15:"$85,000 to $99,999",16:"$100,000 to $124,999",17:"$125,000 to $149,999",18:"$150,000 to $174,999",19:"$175,000 or more"}
,"PP5_PPMARIT":{1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never married",6:"Living with partner"}
,"PP5_PPMSACAT":{0:"non-Metro",1:"Metro"}
,"PP5_PPREG4":{1:"Northeast",2:"Midwest",3:"South",4:"West"}
,"PP5_PPREG9":{1:"New England",2:"Mid-Atlantic",3:"East-North Central",4:"West-North Central",5:"South Atlantic",6:"East-South Central",7:"West-South Central",8:"Mountain",9:"Pacific"}
,"PP5_PPRENT":{1:"Owned or being bought by you or someone in household",2:"rented for cash",3:"Occupied without payment"}
,"PP5_PPWORK":{1:"Working - as a paid employee",2:"Working - self-employed",3:"Not working - on temporary layoff from a job",4:"Not working - looking for work",5:"Not working - retired",6:"Not working - disabled",7:"Not working - other"}
,"PPMARIT_2014":{1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never married",6:"Living with partner"}
,"PPMARIT_2013":{1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never married",6:"Living with partner"}
,"PPMARIT_2012":{1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never married",6:"Living with partner"}
,"PPMARIT_2011":{1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never married",6:"Living with partner"}
,"PPMARIT_2010":{1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never married",6:"Living with partner"}
,"PPMARIT_2009":{1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never married",6:"Living with partner"}
,"PPMARIT_2007":{1:"Married",2:"Widowed",3:"Divorced",4:"Separated",5:"Never married",6:"Living with partner"}
}

Dictionary of dictionaries in Python
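To show how a dictionary of dictionaries like this could be put to work, here is a minimal sketch that swaps the numeric codes in a row for their text labels. The name `value_labels`, the helper `decode_row`, and the sample rows are made up for illustration; only the two label entries are copied from the dictionary above.

```python
# Two entries copied from the label dictionary; `value_labels` is a hypothetical name.
value_labels = {
    "W3_SOURCE": {1: "Online", 2: "Telephone"},
    "W3_BROKE_UP": {0: "still together", 1: "broke up", 2: "partner deceased"},
}

# Made-up survey rows for illustration only.
rows = [
    {"W3_SOURCE": 1, "W3_BROKE_UP": 0},
    {"W3_SOURCE": 2, "W3_BROKE_UP": 1},
    {"W3_SOURCE": 1, "W3_BROKE_UP": None},  # missing answer
]


def decode_row(row, labels):
    """Return a copy of row with known codes replaced by their text labels."""
    decoded = {}
    for column, code in row.items():
        mapping = labels.get(column, {})
        # Codes without a label (including missing answers) are left unchanged.
        decoded[column] = mapping.get(code, code)
    return decoded


decoded_rows = [decode_row(row, value_labels) for row in rows]
print(decoded_rows[1])
```

Leaving unknown codes untouched, rather than raising an error, makes it easy to spot afterwards which values never made it into the dictionary.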

As you can see, this was quite a manual process, and it took a fair bit of time to get right. I ended up dropping some of these variables because of missing values: for example, when a question was asked in only a single wave, or when the answers were almost entirely missing. It’s possible that I’ve made some errors here, but I did my best to get everything as it was intended by the study organizers.

I’ll be using DataRobot for the modeling work, so I don’t need to worry about other preprocessing tasks such as imputation, transformations, data partitioning, or sampling weights.  DataRobot will handle all of that for me, but I did have to worry about getting the data structured properly.
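One slip that is easy to make in a hand-typed dictionary is repeating a key inside one inner dictionary. Python accepts this silently and keeps only the last value. The sketch below is one possible check, not part of the original preparation code: it parses the source text with the standard `ast` module and reports repeated keys. The function name `find_duplicate_keys` and the sample snippet are hypothetical.

```python
import ast


def find_duplicate_keys(source):
    """Yield (line_number, key) for each key repeated within one dict literal."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key_node in node.keys:
            if key_node is None:  # a `**other` unpacking has no key
                continue
            try:
                key = ast.literal_eval(key_node)
            except ValueError:
                continue  # not a plain literal, so skip it
            if key in seen:
                yield key_node.lineno, key
            seen.add(key)


# Made-up snippet in the same shape as the label dictionary.
sample = '''
labels = {"STATUS": {0: "no", 1: "yes", 1: "maybe", -1: "Refused"}
,"SOURCE": {1: "online", 2: "phone"}}
'''
duplicates = list(find_duplicate_keys(sample))
print(duplicates)
```

Checking the source text rather than the loaded dictionary matters here, because by the time the dictionary exists in memory the earlier value is already gone.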

 

Defining the Target

Our main interest for this project was to make predictions about how stable/strong a relationship is.  In this case, we have repeated survey results for each couple that indicate whether or not the couple is still together.  While these surveys were taken around one year apart, they are not precisely a year apart.  In cases where a couple ends their relationship, it’s pretty easy to calculate roughly how long they were together.  For couples that stay together, though, how long the relationship will last is not observed (and not observable) until the relationship ends.  There are methods for handling this issue (it’s called censoring), but I didn’t go down this path for two reasons.  First, these approaches tend to make a lot of assumptions that I’m not comfortable with for this dataset.  Second, a very large proportion of the couples in this dataset stay together.

For that reason, our prediction target is whether or not a couple stays together for two years from a given date.  That means that for each questionnaire that a couple completes, we look out two years (actually to the survey nearest to two years out) to check whether the couple has stayed together.  That becomes our target: a binary indicator of whether the couple stayed together or broke up.  In cases where I don’t have a survey taken two years later, I simply drop that observation.

That means that I need one row per couple per survey -- that’s my unit of observation for this project.
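The target definition can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code used for the project: the field names (`couple_id`, `survey_date`, `together`), the sample rows, and especially the 180-day tolerance for what counts as "near enough to two years" are all made up.

```python
from datetime import date

TWO_YEARS_DAYS = 730
TOLERANCE_DAYS = 180  # hypothetical cutoff for "near enough to two years"

# Made-up observations: one row per couple per survey.
surveys = [
    {"couple_id": 1, "survey_date": date(2009, 3, 1), "together": True},
    {"couple_id": 1, "survey_date": date(2010, 3, 15), "together": True},
    {"couple_id": 1, "survey_date": date(2011, 4, 1), "together": False},
    {"couple_id": 2, "survey_date": date(2009, 3, 1), "together": True},
    {"couple_id": 2, "survey_date": date(2010, 2, 20), "together": True},
]


def add_two_year_target(rows):
    """Label each row with the couple's status about two years later.

    Rows with no follow-up survey close enough to two years out are dropped.
    """
    labeled = []
    for row in rows:
        later = [
            other for other in rows
            if other["couple_id"] == row["couple_id"]
            and other["survey_date"] > row["survey_date"]
        ]
        if not later:
            continue

        def gap_error(other):
            # Distance (in days) between the actual gap and two years.
            gap = (other["survey_date"] - row["survey_date"]).days
            return abs(gap - TWO_YEARS_DAYS)

        nearest = min(later, key=gap_error)
        if gap_error(nearest) > TOLERANCE_DAYS:
            continue
        labeled.append({**row, "stayed_together": nearest["together"]})
    return labeled


training_rows = add_two_year_target(surveys)
print(len(training_rows))
```

With this sample data only the first survey of couple 1 has a follow-up close to two years out, so it is the only row that gets a target. The sketch does not handle couples who stop being surveyed after a break-up; a real version would need a rule for that.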

 

Three General Categories of Data in the Dataset

  1. Data that is collected at the start of the study and doesn’t change over time.  Even though there are multiple surveys taken, some of the variables don’t change over time.  I prepare these separately.  
  2. Background survey data that is collected at different times throughout the study.  It appears that the background questionnaires and the relationship questionnaires may have been given at different times in some cases, so I split these out into a separate dataset.
  3. Relationship questionnaire data collected in each wave.  These were treated separately as well.

 

Merging the Data

At this point, getting the data into shape was starting to seem like a never-ending task (a feeling well known to data scientists), so I made a simplifying assumption.  Rather than trying to figure out exactly which background survey corresponded to which relationship questionnaire, I noticed that there were no cases where either the background survey or the relationship survey was given twice within the same year.  So I used the year of the survey as a join key.

Once I had joined the three datasets, I joined on the relationship status from the relationship survey two years later.  Having done all that, apart from a little additional feature engineering, the modeling dataset was ready to go!
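As a rough illustration of the join, the sketch below combines the three kinds of data on couple and survey year, then attaches the relationship status from two years later. All names and records are hypothetical, and the exact `year + 2` lookup is a simplification of "the survey nearest to two years out".

```python
# Made-up records; all field names here are hypothetical.
static = {1: {"met_online": True}}  # keyed by couple_id, never changes
background = [
    {"couple_id": 1, "year": 2009, "housing": "Rented for cash"},
    {"couple_id": 1, "year": 2011, "housing": "Owned"},
]
relationship = [
    {"couple_id": 1, "year": 2009, "together": True},
    {"couple_id": 1, "year": 2011, "together": False},
]


def index_by_couple_year(rows):
    """Index rows by (couple_id, year); assumes at most one row per key."""
    return {(r["couple_id"], r["year"]): r for r in rows}


def build_modeling_rows(static, background, relationship, horizon=2):
    """Join the three sources on couple and year, then add the target."""
    bg = index_by_couple_year(background)
    rel = index_by_couple_year(relationship)
    merged = []
    for (couple_id, year), rel_row in rel.items():
        future = rel.get((couple_id, year + horizon))
        if future is None:
            continue  # no survey two years later, so no target
        row = {
            **static.get(couple_id, {}),
            **bg.get((couple_id, year), {}),
            **rel_row,
        }
        row["stayed_together"] = future["together"]
        merged.append(row)
    return merged


modeling_rows = build_modeling_rows(static, background, relationship)
print(modeling_rows)
```

Using the year as the key only works because no survey type was given twice in the same year; if that assumption failed, the index would silently keep one row per key and drop the others.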

 

And finally the code!

I’m no engineer, but here’s the link to the code that I wrote to prepare the dataset for modeling.

 

Keep Reading!

  1. Introduction and Background to Relationships by DataRobot
  2. Preparing the Data for Relationships by DataRobot
  3. Building the Models for Relationships by DataRobot
  4. An Inside Look at the Design Process for Relationships by DataRobot

 
