
Health, Consumption, Geography, and Policy Evaluation in Chilean Households

Creators: Jose Ignacio Cuesta, Stanford University; Felipe Gonzalez, Queen Mary University; Juan Pablo Atal, University of Pennsylvania; Cristobal Otero, Columbia University
Publication Date: 2024-01-29

This collection gathers diverse datasets to support the analysis of health outcomes, consumption behavior, demographic patterns, and policy interventions across Chile. It includes national household surveys such as the EPF (Encuesta de Presupuestos Familiares), administrative health records from IQVIA and AHRQ, geographic grid and map data, and experimental data from field interventions with baseline and follow-up measures. Complementary datasets provide information on population structure (INE), hospitalization trends, chronic disease prevalence, and zip code-level demographics. Spatial datasets covering urban areas like Santiago, Iquique, Valparaíso, and others allow for fine-grained geographic analysis. This integrated data infrastructure enables interdisciplinary research on public health, economic disparities, pharmaceutical access, and the localized impact of social and health policies.

Gender and Health Service Access in Rural India

Creators: Pascaline Dupas, Princeton University; Radhika Jain, University College London
Publication Date: 2024-04-08

This data collection compiles survey-based and derived indicators focused on health service utilization, gender disparities, and local governance in India. It includes datasets from audit surveys of hospital patients, household surveys involving childbirth-related care, and surveys of local government leaders (sarpanches). Complementing these are demographic-adjusted estimates of illness incidence and prevalence derived from the Global Burden of Disease study. The datasets are designed to support analyses of gender differences in access to healthcare and regional health burdens across the state of Rajasthan.

Cereal Products, Nutritional Composition, and Market Prices

Creators: Nano Barahona, Cristóbal Otero, Sebastián Otero
Publication Date: 2023-05-01

This dataset collection integrates product-level and time-series data related to cereal and grain products. It includes nutritional composition data from various years and sources, detailed product characteristics from scanned records and field observations, consumer belief responses linked to product features, and historical commodity price trends for key ingredients such as cocoa, corn, oats, rice, sugar, and wheat. Additionally, a week-to-date mapping file supports temporal alignment across datasets. Together, these files enable the analysis of nutritional trends, consumer perceptions, and pricing dynamics in the cereal product market.

COVID-19 infection cases and deaths in the United States

Creators: The New York Times
Publication Date: 2020/01/21 - 2020/11/03

This dataset, compiled and maintained by The New York Times, contains daily counts of COVID-19 cases and deaths at the county level in the United States. It was used in the study “Accounting for partisanship and politicization: Employing Benford’s Law to examine misreporting of COVID-19 infection cases and deaths in the United States” to assess potential misreporting of COVID-19 statistics. The dataset spans from early 2020 onwards and includes columns such as date, county, state, FIPS code, cumulative cases, and cumulative deaths.
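A first-digit check of the kind suggested by the study's title can be sketched as follows. This is a minimal illustration, not the study's actual procedure: it assumes the cumulative counts are first differenced into daily counts, and the function names are hypothetical.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(n):
    """First decimal digit of a nonzero integer count."""
    return int(str(abs(int(n)))[0])

def daily_from_cumulative(cumulative):
    """Turn a cumulative series into day-over-day differences (assumed preprocessing)."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

def first_digit_shares(values):
    """Observed share of each leading digit, ignoring zero counts."""
    digits = [leading_digit(v) for v in values if int(v) != 0]
    counts = Counter(digits)
    total = len(digits)
    if total == 0:
        return {}
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Illustrative numbers only, not taken from the dataset.
cumulative_cases = [1, 3, 6, 10, 25, 140]
observed = first_digit_shares(daily_from_cumulative(cumulative_cases))
expected = benford_expected()
```

Comparing `observed` with `expected` digit by digit (for example with a chi-square statistic) is one common way to flag series that depart from the Benford distribution.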

Social Network Data of Student Relationships

Creators: Rebecca Mauldin
Publication Date: 2024

This dataset is from longitudinal social network analysis research that collected survey data from one class of graduate students (N=142) in a Master of Social Work (MSW) program at a large U.S. public university. The program used cohort-based learning in the first semester, after which students were integrated into the student body as a whole. The dataset contains network data about friendships, academic discussion ties, and professional influence among classmates. Student attribute data include archival data from the school (e.g., student demographics, incoming GPA, GRE scores) and survey items (e.g., sense of belonging scale, multicultural perspective, perceived stress).

DATA-SPECIFIC INFORMATION

Participation Status across all Four Waves Overview:
File name: ParticipationAcrossWaves.csv
Number of variables: 6
Number of cases/rows: 145

Wave 1 Characteristics Overview:
File name: w1_Characteristics.csv
Number of variables: 39
Number of cases/rows: 145

Wave 3 Characteristics Overview:
File name: w3_Characteristics.csv
Number of variables: 40
Number of cases/rows: 145

Wave 4 Characteristics Overview:
File name: w4_Characteristics.csv
Number of variables: 49
Number of cases/rows: 145

Wave 1 Know-of Ties:
File name: w1_KnowofEdgelist.csv
Number of variables: 2
Number of cases/rows: 169

 

Academic Ties Overview:
File name (wave 2): w2_AcademicEdgelist.csv
Number of variables: 2
Number of cases/rows: 1464

File name (wave 3): w3_AcademicEdgelist.csv
Number of variables: 2
Number of cases/rows: 1642

File name (wave 4): w4_AcademicEdgelist.csv
Number of variables: 2
Number of cases/rows: 2260

Friendship Ties Overview:
File name (wave 2): w2_FriendshipEdgelist.csv
Number of variables: 2
Number of cases/rows: 684

File name (wave 3): w3_FriendshipEdgelist.csv
Number of variables: 2
Number of cases/rows: 752

File name (wave 4): w4_FriendshipEdgelist.csv
Number of variables: 2
Number of cases/rows: 964

Professional Influence Ties Overview:
File name (wave 2): w2_ProfessionalEdgelist.csv
Number of variables: 2
Number of cases/rows: 567

File name (wave 3): w3_ProfessionalEdgelist.csv
Number of variables: 2
Number of cases/rows: 809

File name (wave 4): w4_ProfessionalEdgelist.csv
Number of variables: 2
Number of cases/rows: 981

Shared Courses Edgelist Overview:
File name: SharedCourseValuedEdgelist.csv
Number of variables: 3
Number of cases/rows: 14,714

Shared Course Affiliation Matrix Overview:
File name: SharedCourseAffiliationMatrix.csv
Number of matrix rows: 145
Number of matrix columns: 145
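
The two-variable edgelist files listed above can be read with a few lines of standard-library Python. The sketch below rests on assumptions the listing does not confirm: that each file has a header row, that the first column is the nominating student and the second the nominated one, and that ties are directed. The column names `ego` and `alter` in the sample are hypothetical.

```python
import csv
import io
from collections import Counter

def read_edgelist(file_obj, has_header=True):
    """Read (source, target) pairs from a two-column CSV edgelist."""
    reader = csv.reader(file_obj)
    if has_header:
        next(reader, None)  # skip the assumed header row
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]

def out_degree(edges):
    """Number of ties each source node sends."""
    return Counter(source for source, _ in edges)

def reciprocated(edges):
    """Ties that also appear in the reverse direction."""
    edge_set = set(edges)
    return {(a, b) for a, b in edge_set if a != b and (b, a) in edge_set}

# In-memory stand-in for a file such as w2_FriendshipEdgelist.csv.
sample = io.StringIO("ego,alter\n1,2\n2,1\n1,3\n")
edges = read_edgelist(sample)
```

To use a real file, pass `open("w2_FriendshipEdgelist.csv", newline="")` instead of the in-memory sample, and set `has_header=False` if the file turns out to have no header row.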

World Happiness Report

Creators: Helliwell, John F.; Layard, Richard; Sachs, Jeffrey D.; De Neve, Jan-Emmanuel; Aknin, Lara B.; Wang, Shun
Publication Date: 2012

The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0 and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative. The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contributes to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do explain why some countries rank higher than others. The dataset has a size of 80.86 kB.

Heart Disease Data Set

Creators: Janosi, Andras; Steinbrunn, William; Pfisterer, Matthias; Detrano, Robert
Publication Date: 1988

The Heart Disease database is a well-regarded resource in the medical research community, particularly for studies related to cardiovascular conditions. It comprises data from four distinct databases: the Cleveland Clinic Foundation, the Hungarian Institute of Cardiology in Budapest, the V.A. Medical Center in Long Beach, California, and the University Hospital in Zurich, Switzerland. Each of these databases contains patient records with various medical attributes, totaling 76 features. However, most research has focused on a subset of 14 key attributes to diagnose the presence of heart disease. The dataset is relatively small, with each database containing a few hundred records. For example, the Cleveland database includes 303 instances. Given the number of attributes and instances, the dataset’s size is minimal, making it easily manageable for analysis without requiring significant storage resources. The data was collected over several years, primarily during the 1980s.

Each patient record in the dataset includes the following 14 attributes commonly used in research:

  • Age: Age of the patient in years.
  • Sex: Gender of the patient (1 = male; 0 = female).
  • Chest Pain Type (cp): Categorical variable indicating the type of chest pain experienced, with values ranging from 0 to 3.
  • Resting Blood Pressure (trestbps): Resting blood pressure in mm Hg upon hospital admission.
  • Serum Cholesterol (chol): Serum cholesterol level in mg/dl.
  • Fasting Blood Sugar (fbs): Binary variable indicating if fasting blood sugar is greater than 120 mg/dl (1 = true; 0 = false).
  • Resting Electrocardiographic Results (restecg): Categorical variable with values 0 to 2 indicating ECG results.
  • Maximum Heart Rate Achieved (thalach): Maximum heart rate achieved during exercise.
  • Exercise-Induced Angina (exang): Binary variable indicating if exercise-induced angina occurred (1 = yes; 0 = no).
  • ST Depression (oldpeak): ST depression induced by exercise relative to rest.
  • Slope of the Peak Exercise ST Segment (slope): Categorical variable with values 0 to 2.
  • Number of Major Vessels Colored by Fluoroscopy (ca): Integer value ranging from 0 to 3.
  • Thalassemia (thal): Categorical variable indicating blood disorder status (3 = normal; 6 = fixed defect; 7 = reversible defect).
  • Diagnosis of Heart Disease (target): Integer value ranging from 0 to 4, indicating the presence and severity of heart disease.
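
The 14 attributes above can be mapped onto a headerless comma-separated file with a short parser. The sketch below makes several assumptions not stated in the listing: the columns appear in the order given above, `age` and `sex` are the short names for the first two, missing values are marked with `?`, and collapsing the 0 to 4 `target` into a binary `has_disease` flag is an illustrative choice rather than part of the dataset. The sample rows are invented.

```python
import csv
import io

# Short names from the attribute list; "age" and "sex" are assumed labels.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]

def parse_records(file_obj, missing="?"):
    """Parse headerless CSV rows into dicts, mapping the missing marker to None."""
    records = []
    for row in csv.reader(file_obj):
        if len(row) != len(COLUMNS):
            continue  # skip blank or malformed lines
        record = {}
        for name, raw in zip(COLUMNS, row):
            raw = raw.strip()
            record[name] = None if raw == missing else float(raw)
        target = record["target"]
        # Illustrative binarization: any nonzero severity counts as disease present.
        record["has_disease"] = None if target is None else int(target > 0)
        records.append(record)
    return records

# Invented rows for demonstration only.
sample = io.StringIO(
    "54,1,2,130,250,0,1,160,0,1.5,1,0,3,0\n"
    "61,0,3,140,?,0,0,120,1,2.0,2,2,7,2\n"
)
records = parse_records(sample)
```

Keeping missing values as `None` instead of dropping the row leaves the choice of imputation or exclusion to the analysis step.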
