September 8, 2024
Better Futures Institute x Clemson University Applied Data Science
By Joseph Becker, Eric Vien, and Grant Benson

Abstract

This report presents an analysis of the quality of life in San Antonio, conducted by a team of three Clemson University computer science students in collaboration with the Better Futures Institute (BFI). The project aims to identify areas in San Antonio where community health and safety initiatives should be prioritized. By analyzing 311 service request data, we developed predictive models to forecast future occurrences of issues related to animals, graffiti, and health and sanitation. The findings are presented in a digestible format, offering insights to inform policy decisions and community interventions.

Introduction

The quality of life in urban areas is a multifaceted issue that encompasses various factors, including public safety, environmental health, and access to essential services. In San Antonio, understanding these factors is critical to ensuring that community health and safety initiatives are effectively targeted. This report documents a collaborative project between Clemson University and the Better Futures Institute (BFI) aimed at analyzing quality of life indicators in San Antonio using data science techniques.

The primary objective of this project was to identify specific neighborhoods and council districts in San Antonio where quality of life issues are most prevalent. By analyzing data from the city's 311 service requests, we sought to predict future occurrences of issues related to animals, graffiti, and health and sanitation. These predictions are intended to guide policymakers and community leaders in prioritizing areas for intervention and resource allocation.

Our approach involved several key steps: cleaning and preparing the data, developing predictive models, visualizing the results, and clustering the data to identify high-priority areas. The findings from this analysis provide valuable insights into the spatial distribution of quality of life issues in San Antonio and offer actionable recommendations for improving community well-being.

Data Cleaning Process

To conduct a thorough analysis of the quality of life in San Antonio, we began by preparing the data provided by BFI. The dataset consisted of over 481,000 entries from the city's 311 service requests, which included reports on various community concerns such as animals, graffiti, and health and sanitation issues. The initial data was disorganized, with entries in a seemingly random order and multiple irrelevant columns.

To make the data more manageable and suitable for analysis, we followed a structured data cleaning process:

  1. Deciding What to Keep: We began by removing unnecessary columns and entries that were not related to our primary focus areas: animals, graffiti, and health and sanitation. This step reduced the dataset to approximately 100,000 relevant entries.
  2. Extracting Information: We parsed the data to extract essential details, such as location and date information. For example, we converted latitude and longitude coordinates from a string format into separate columns for easier manipulation. Similarly, we extracted date information and reformatted it into individual columns for year, month, and day.
  3. Removing Locational Outliers: To ensure the accuracy of our analysis, we identified and removed locational outliers—entries with coordinates that were far outside San Antonio (e.g., entries located in Brazil or Hawaii). This step helped to prevent skewed results.
  4. Sorting Entries: Finally, we organized the data by categorizing the entries into three primary categories (animals, graffiti, and health and sanitation) and their respective subcategories. This sorting process made the data easier to analyze and visualize.

By the end of this process, we had a clean and well-organized dataset that was ready for further analysis.
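
The four cleaning steps above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the team's actual script: the column names (`category`, `subcategory`, `location`, `opened`), the "(lat, lon)" string layout, the category labels, and the bounding box around San Antonio are all hypothetical.

```python
import pandas as pd

# Hypothetical labels and columns; the real 311 export may differ.
KEEP_CATEGORIES = ["Animals", "Graffiti", "Health & Sanitation"]
# Rough bounding box around San Antonio (an assumption, not the authors' cutoff).
LAT_RANGE = (29.0, 29.9)
LON_RANGE = (-99.0, -98.0)
COORD_PATTERN = r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)"


def clean_requests(raw: pd.DataFrame) -> pd.DataFrame:
    # Step 1: keep only the three focus categories and the columns we need.
    wanted = raw["category"].isin(KEEP_CATEGORIES)
    df = raw.loc[wanted, ["category", "subcategory", "location", "opened"]].copy()

    # Step 2: pull latitude and longitude out of the "(lat, lon)" string.
    coords = df["location"].str.extract(COORD_PATTERN).astype(float)
    df["lat"] = coords[0]
    df["lon"] = coords[1]

    # Step 2 (continued): split the date into year, month, and day columns.
    opened = pd.to_datetime(df["opened"], errors="coerce")
    df["year"] = opened.dt.year
    df["month"] = opened.dt.month
    df["day"] = opened.dt.day

    # Step 3: drop rows whose coordinates fall outside the bounding box.
    in_city = df["lat"].between(*LAT_RANGE) & df["lon"].between(*LON_RANGE)
    df = df[in_city]

    # Step 4: order the rows by category and subcategory.
    return (
        df.drop(columns=["location", "opened"])
        .sort_values(["category", "subcategory"])
        .reset_index(drop=True)
    )
```

A bounding box is the simplest way to catch entries placed in Brazil or Hawaii; a stricter check against the city boundary polygon would also work.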

Model Development

With a clean dataset in hand, we proceeded to develop predictive models aimed at forecasting future occurrences of quality of life issues in San Antonio. The primary goal of these models was to provide insights into where community health and safety initiatives should be focused.

First Model: Neural Network for Binary Classification

Our initial approach involved creating a neural network using the Keras deep learning library. This model consisted of two dense layers: a hidden layer of 50 neurons with ReLU activation and an output layer of 1 neuron with sigmoid activation. The model was compiled with the Stochastic Gradient Descent (SGD) optimizer and trained for 100 epochs.
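
The architecture described above might look roughly like this in Keras. The input width (`N_FEATURES`) and the binary cross-entropy loss are assumptions: the report does not state either, and cross-entropy is simply the usual pairing with a single sigmoid output.

```python
from tensorflow import keras

N_FEATURES = 5  # hypothetical input width; the report does not state it


def build_first_model(n_features: int = N_FEATURES) -> keras.Model:
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(50, activation="relu"),    # hidden layer, 50 neurons
        keras.layers.Dense(1, activation="sigmoid"),  # single probability output
    ])
    # Assumed loss: binary cross-entropy, the common choice for a sigmoid output.
    model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Training would then be a call such as `model.fit(X_train, y_train, epochs=100, validation_split=0.2)`, where `X_train` and `y_train` are placeholders; holding out a validation split is one way to see whether training and validation loss diverge.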

Unfortunately, the model overfit significantly: it reproduced the data it was trained on rather than producing predictions that generalize to future requests. As a result, it was not suitable for forecasting.

Second Model: Neural Network with Mean-Squared Error

To address the limitations of the first model, we developed a second neural network, also in Keras, this time trained to minimize the mean-squared error (MSE). The model included three dense layers: two ReLU layers with 64 neurons each and a final dense layer with 2 neurons. It was compiled with the Adam optimizer, and we lowered the learning rate to reduce the MSE further.
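
A sketch of this second architecture, under the same caveats as before: the input width and the specific learning rate are hypothetical (the report says only that the rate was decreased), and the report does not say what the two outputs represent.

```python
from tensorflow import keras

N_FEATURES = 4        # hypothetical input width; not given in the report
LEARNING_RATE = 1e-4  # hypothetical; lower than Adam's default of 1e-3


def build_second_model(n_features: int = N_FEATURES) -> keras.Model:
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(2),  # linear output with 2 neurons (regression)
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss="mse",
    )
    return model
```

The final layer has no activation here because MSE is normally paired with an unbounded linear output; that choice is also an assumption.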

This second model was trained for 3,000 epochs, which significantly improved its accuracy. The resulting MSE was close to zero, meaning its predictions closely matched the observed values. Although the model was slightly overfitted, we deemed this level of accuracy acceptable given the project’s objectives.
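
For reference, MSE is the average of the squared differences between predicted and observed values. The NumPy helper below is our own illustration (the function name is hypothetical); computing it separately on training data and on held-out data is a simple way to gauge how much a model has overfit.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean of the squared errors over all samples and outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Two samples with two outputs each; only one value is off, by 2.
observed = [[0.0, 0.0], [1.0, 1.0]]
predicted = [[0.0, 0.0], [1.0, 3.0]]
error = mse(observed, predicted)  # (0 + 0 + 0 + 4) / 4 = 1.0
```

A held-out MSE that is much larger than the training MSE suggests the model is fitting noise in the training data.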

The predictive accuracy of this model was much higher than the first, making it more suitable for identifying areas in San Antonio where quality of life issues are likely to arise.

Clustering and Visualization

After developing an accurate predictive model, the next step was to visualize the data in a way that could inform decision-making. We utilized clustering techniques and Geographic Information System (GIS) data to achieve this.

Clustering Predictions

To better understand the distribution of predicted issues across San Antonio, we applied the K-Means clustering algorithm to the model’s output. K-Means assigns each prediction to the nearest of a chosen number of cluster centers (centroids), which let us visualize the areas with the highest density of predicted issues.

However, initial visualizations were dense and difficult to interpret. Therefore, we adjusted the number of clusters based on the frequency of predictions in each category, which allowed us to pinpoint specific neighborhoods and districts that required attention.
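
One way to express this idea with scikit-learn is sketched below. The rule for choosing the number of clusters (`clusters_for`, with its `points_per_cluster` and `max_clusters` parameters) is a hypothetical stand-in; the report does not give the exact rule the team used.

```python
import numpy as np
from sklearn.cluster import KMeans


def clusters_for(n_points: int, points_per_cluster: int = 500,
                 max_clusters: int = 50) -> int:
    """Hypothetical rule: more predictions in a category means more clusters."""
    return max(1, min(max_clusters, n_points // points_per_cluster))


def cluster_predictions(coords: np.ndarray, n_clusters: int, seed: int = 0):
    """Cluster (lon, lat) pairs; return the centroids and the size of each cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(coords)
    sizes = np.bincount(labels, minlength=n_clusters)
    return km.cluster_centers_, sizes
```

Note that this treats longitude and latitude as plain Euclidean coordinates, which is a reasonable approximation at city scale but not exact.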

GIS Plotting

To enhance the clarity of our visualizations, we overlaid the clustered predictions onto two types of GIS data: Neighborhood Perimeter Plans and Council District Borders. This approach provided a clearer picture of where quality of life issues were concentrated within San Antonio.

  1. Neighborhood Perimeter Plans
  2. Council District Borders

Using geopandas, we were able to create detailed maps that displayed the density of predicted issues within each perimeter or district. By calculating the number of predicted occurrences per square mile, we identified the areas with the highest need for intervention.
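
The per-square-mile calculation can be sketched with geopandas as follows. This is an illustration under assumptions: the `name` column is hypothetical, and EPSG:2278 (NAD83 / Texas South Central, in US survey feet) is our choice of projected coordinate system for measuring area, not necessarily the one the team used.

```python
import geopandas as gpd

SQFT_PER_SQMI = 5280 ** 2
# Assumed projected CRS for area measurement (units: US survey feet).
PROJECTED_CRS = "EPSG:2278"


def density_per_sq_mile(points: gpd.GeoDataFrame, areas: gpd.GeoDataFrame,
                        name_col: str = "name") -> gpd.GeoDataFrame:
    # Areas must be measured in a projected CRS, not in degrees.
    areas_proj = areas[[name_col, "geometry"]].to_crs(PROJECTED_CRS)
    points_proj = points[["geometry"]].to_crs(PROJECTED_CRS)

    # Attach each predicted point to the polygon that contains it.
    joined = gpd.sjoin(points_proj, areas_proj, how="inner", predicate="within")
    counts = joined.groupby(name_col).size()

    out = areas_proj.copy()
    out["count"] = out[name_col].map(counts).fillna(0).astype(int)
    out["sq_miles"] = out.geometry.area / SQFT_PER_SQMI
    out["per_sq_mile"] = out["count"] / out["sq_miles"]
    # Highest-density areas first.
    return out.sort_values("per_sq_mile", ascending=False).reset_index(drop=True)
```

Normalizing by area matters because a large district can collect many predictions simply by covering more ground.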

These visualizations serve as powerful tools for policymakers and community leaders, offering a clear, data-driven basis for targeting resources and initiatives.

[Figure: example result when we overlaid the predictions]

Results

The analysis yielded several key insights into the distribution of quality of life issues in San Antonio. For each of the three main categories—animals, graffiti, and health and sanitation—we identified specific neighborhoods and council districts where interventions should be prioritized.

Neighborhood Focus Areas

  • Animals: The Prospect Hill Area, Guadalupe Westside, and Dignowity Hill emerged as the neighborhoods where animal-related issues were most prevalent.
  • Graffiti: The Lavaca, Tobin Hill, and Downtown neighborhoods were identified as hotspots for graffiti-related concerns.
  • Health & Sanitation: The Downtown, Tobin Hill, and South Central neighborhoods showed the highest density of health and sanitation issues.

[Figures: Animals, Graffiti, and Health & Sanitation within neighborhood perimeter plans]

Council District Focus Areas

  • Animals: Council Districts 5, 1, and 7 were identified as the districts most affected by animal-related issues.
  • Graffiti: Council Districts 1, 5, and 7 also ranked highest for graffiti-related concerns.
  • Health & Sanitation: Council Districts 1, 5, and 7 were similarly identified as priority areas for health and sanitation interventions.

[Figures: Animals, Graffiti, and Health & Sanitation within council districts]

Overall, according to our research, the neighborhoods where quality of life improvements are most needed in each category are:

  1. Animals: Prospect Hill Area, Guadalupe Westside, Dignowity Hill
  2. Graffiti: Lavaca, Tobin Hill, Downtown
  3. Health & Sanitation: Downtown, Tobin Hill, South Central
  4. Overall: Downtown, Tobin Hill, Prospect Hill Area

By council district, the results are as follows:

  1. Animals: 5, 1, 7
  2. Graffiti: 1, 5, 7
  3. Health & Sanitation: 1, 5, 7
  4. Overall: 1, 5, 7

Policy Implications

These findings offer a clear direction for community health and safety initiatives in San Antonio. By focusing resources on the identified neighborhoods and districts, city officials can address the most pressing issues and improve the overall quality of life for residents.

Conclusion

This report provides a comprehensive analysis of the quality of life in San Antonio based on 311 service request data. By developing predictive models and visualizing the results using GIS data, we identified specific neighborhoods and council districts where community health and safety initiatives should be prioritized.

The findings highlight the importance of targeted interventions in the Prospect Hill, Guadalupe Westside, Dignowity Hill, Lavaca, Tobin Hill, and Downtown neighborhoods, as well as in Council Districts 1, 5, and 7. By focusing on these areas, city officials can make informed decisions that enhance the quality of life for residents across San Antonio.

Limitations and Future Research

While the final predictive model developed in this project was highly accurate, it was slightly overfitted to the existing data. Future research could explore other machine learning algorithms or ensemble methods to improve generalizability. Additionally, incorporating more diverse data sources, such as demographic information or environmental factors, could provide a more holistic view of the quality of life in San Antonio.

Acknowledgments

We would like to extend our sincere gratitude to the Better Futures Institute and Clemson University for their support and guidance throughout this project. Special thanks to the data science professionals who provided invaluable insights and assistance.

Eric Vien, Joseph Becker, and Grant Benson
