Research Projects for MSc AI and Data Science

Current cohort: May 2024

Welcome to the MSc AI and Data Science research project on urban big data analytics. I'm delighted to have you on board as we explore research topics in urban science and crime science. By the end of this learning journey, I expect you will be confident in most aspects of urban big data analytics and able to meet the requirements for the research report. The university emphasises the importance of avoiding academic misconduct every academic year, and I must re-emphasise that any form of misconduct is unacceptable for this project. Please prepare thoroughly and be patient at each stage; this will lead to a high-quality report and, ultimately, to your MSc degree.

1 Research proposal

A research proposal outlines a roadmap for the research project, including the objectives, methodology, timeline, and expected outcomes. In urban analytics, a research proposal aims to identify urban issues and to extract valuable insights using data science and AI methodologies, taking into account the multifaceted nature of urban environments.

Here are the typical sections included in a research proposal:

Specifically, Table 1 lists all the important research elements for research proposals in urban analytics. Please incorporate all the elements in your research proposal and include this table, completed for your project, as an Appendix in the proposal.

Table 1 Checklist of the important elements in a research proposal.

Element | Description | Example
Data type | Data types refer to the formats of the information collected. In urban science, data can be categorised into various types for different purposes. For example, crime data can be collected as crime survey data, policing recorded data, self-report data and so on. Urban mobility data can be categorised into mobile phone call detail records, mobile phone GPS data, underground smart card data, WiFi data, social media data and so on. | Underground smart card data, policing recorded data
Data resource | Data resources are the sources from which you collect the data, each under its own usage licence (e.g., education and research licences). Please read the data use policy carefully when you access open data. | London Datastore
Independents/features/predictors (X) | Features or independent variables are the attributes of the data that are used to predict the outcome. | Population mobility variables (measured by travel behaviours from smart card data)
Dependents/targets/responses (y) | The target or dependent variable is the outcome to be predicted or understood based on the feature variables. For supervised learning tasks, y typically consists of the labels or responses (e.g., a column) associated with each set of feature variables in the dataset. | Theft counts
Spatial unit of analysis | The spatial unit of analysis refers to the geographic level or scale of the research analysis. It defines the spatial resolution or granularity at which spatial patterns are understood. Clearly defining the geographical unit of analysis can help to avoid the ecological fallacy in the research findings. | Lower Super Output Area (LSOA)
Temporal unit of analysis | The temporal unit of analysis refers to the time scale or interval of the analysis, e.g., for the examination of temporal patterns, trends, or relationships. It can be categorised as hourly, daily, weekly, monthly or yearly. In some prediction tasks, it indicates the time scale at which the trained model predicts, e.g., the model can predict the next week (week-level) for each LSOA. | Monthly
Study area and city | The specific urban areas of the selected cities used as the case study in the analysis, e.g., the City of London area in Greater London. | All urban areas in Greater London
Observation period | The time period covered by the observations in the experimental analysis, e.g., the observation period covers 2021 to 2022 (two years). | 2021
Model/method | The main method or model that will be used or trained to address the research questions, such as statistical models or machine learning models. | Random Forest regressor
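To make the spatial unit, temporal unit and target variable in Table 1 more concrete, the following is a minimal sketch, using only the Python standard library, of how event-level crime records might be aggregated into monthly theft counts per LSOA. The records, the LSOA codes and the function name monthly_counts are made up for illustration; real data sets will have their own column names and formats.

```python
from collections import Counter
from datetime import date

# Hypothetical event-level records: (LSOA code, offence date, crime type).
# These values are invented for illustration only.
records = [
    ("E01000001", date(2021, 1, 5), "theft"),
    ("E01000001", date(2021, 1, 19), "theft"),
    ("E01000001", date(2021, 2, 2), "burglary"),
    ("E01000002", date(2021, 1, 7), "theft"),
    ("E01000002", date(2021, 2, 11), "theft"),
]


def monthly_counts(records, crime_type):
    """Count events of one crime type per (spatial unit, month)."""
    counts = Counter()
    for lsoa, day, kind in records:
        if kind == crime_type:
            # Spatial unit: LSOA code. Temporal unit: calendar month.
            counts[(lsoa, day.strftime("%Y-%m"))] += 1
    return counts


# Target variable (y): theft counts per LSOA per month.
theft_counts = monthly_counts(records, "theft")
for (lsoa, month), n in sorted(theft_counts.items()):
    print(lsoa, month, n)
```

Note that LSOA-month combinations with no recorded events do not appear in the result at all; for most models they would need to be added explicitly with a count of zero.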

The starting date of the research project for this cohort is Jan 2025.

2 Recommended reading

The recommended reading section offers review papers and empirical works to aid in grasping the fundamental concepts, methodologies, and data types employed within specific topics. However, due to the proliferation of advanced methods (especially AI and Data Science) and emerging data types in these hot research topics, it may not cover all relevant literature in each particular research domain. It is strongly advised to further explore specific topics through Google Scholar or the university library for comprehensive understanding. To clarify, each number following the literature reference corresponds to the related topics index, which can be located in the appendix section.

3 Data sources

Crime data

Nowadays, crime data is readily available through numerous resources for various purposes. Several resources from different cities in the UK and US are listed below for your reference. Please take note of the data usage license and consider the data quality, particularly regarding spatial and temporal resolution issues, as detailed in the provided resources. Additionally, some city open data portals offer additional urban data that can be linked to the crime data for further analysis.

Urban data

Urban data is accessible through various resources, encompassing socio-economic data, population statistics, transportation data, geographical boundaries, and other urban environmental data. Below are several urban data repositories. Please be aware that some data sources require registration and an educational licence for usage. You are encouraged to explore additional data resources or use your own data sets.
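Linking urban data to crime data usually comes down to joining tables on a shared area code. The sketch below shows one possible way to do this with plain Python dictionaries and to report the areas that fail to match, which is a common data quality check. All values, the LSOA codes and the function name link_by_area are hypothetical; in practice a data frame library would typically be used instead.

```python
# Hypothetical theft counts per LSOA (target, y). Values are invented.
theft_by_lsoa = {"E01000001": 12, "E01000002": 7, "E01000003": 3}

# Hypothetical population per LSOA (feature, X). Values are invented.
population_by_lsoa = {"E01000001": 1500, "E01000002": 1750, "E01000004": 1600}


def link_by_area(target, feature):
    """Join two per-area tables on their shared area codes.

    Returns the linked rows as (code, feature value, target value) and
    the sorted list of codes that appear in only one of the two tables.
    """
    shared = sorted(target.keys() & feature.keys())
    linked = [(code, feature[code], target[code]) for code in shared]
    unmatched = sorted(target.keys() ^ feature.keys())
    return linked, unmatched


linked, unmatched = link_by_area(theft_by_lsoa, population_by_lsoa)
print("linked rows:", linked)
print("codes without a match:", unmatched)
```

Unmatched codes often point to differences in boundary versions or coverage between the two sources, so they are worth inspecting before modelling.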

4 Analysing tools

Several tools can be employed for analysing urban data, particularly for geospatial and temporal data processing and modelling. Selected Python packages and software are provided for your reference:

5 Project management

It is highly recommended to use GitHub for project management and for writing code under version control. Further guidance on GitHub usage can be found at GitHub Docs. You can also find online courses at LinkedIn Learning, Udemy or Coursera. The simplest method is to use GitHub Desktop to commit and push your local Jupyter Notebook project.

Appendix

Table A1 The information on current research topics

No | Title | Description
1 | Spatio-temporal prediction for urban crimes (Crime prediction) using machine learning/deep learning and big data | This project aims to develop an advanced crime prediction or analysis framework/method by leveraging advanced machine learning and deep learning techniques in the context of spatial and temporal big data analytics. By integrating historical crime data with relevant socio-economic and environmental factors, the project seeks to enhance the accuracy and efficiency of crime prediction in space and time or focus on explaining the specific crime patterns detected (e.g., crime hotspots and concentration). Students are encouraged to incorporate urban big data to improve the crime analysis approach. The implementation will be designed for existing law enforcement systems to enhance public safety.
2 | Exploring the urban mobility patterns using big data analytics | Understanding and optimising mobility are crucial for sustainable and efficient modern urban development. The objective of this project is to explore the complex patterns in urban mobility by analysing diverse big data sources, such as public transportation records (e.g., smart card data), social media big data and mobile phone big data. Other tasks of this project can focus on identifying which key factors (urban facilities and functional land use) influence the population’s mobility patterns (e.g., commuting behaviours), or predicting the volume of the population’s mobility trends (e.g., origin and destination flows) across different urban areas. Employing alternative methods such as machine learning and geospatial analysis will be pivotal in extracting meaningful insights for urban planning or public resource management.
3 | Urban transport analytics using big data | The project aims to analyse the heterogeneity in urban transportation demand, usage/ridership, or mode choices across urban areas to understand the diverse patterns of mobility within urban settings. By employing advanced geospatial data analytics and machine learning techniques, different types of transport mode patterns (e.g., taxi, cycling, public transportation) can be sensed, visualised, and analysed from various geo big data, such as smart card data, bike-sharing docking station data and mobile phone data.
4 | Evaluating urban vitality/vibrancy using geo-big data | This project aims to evaluate the urban vitality across retail areas (or high streets) through the footfall traffic sensed from geo-big data. The primary task of this project is to explore the daily rhythms of vitality (represented by footfall traffic at place venues) across different urban land use areas. Second, it seeks to identify key factors influencing the vitality, encompassing aspects such as economic revitalisation, social cohesion, and environmental sustainability. To achieve this objective, alternative research methods, including spatial and temporal analyses and machine learning models, will be explored to provide a comprehensive understanding of the dynamics of urban areas.
5 | Sensing urban functions through big data analysis | In the context of dynamic urban environments, this project endeavours to employ AI tools to sense and comprehend various urban functions from geo big data. Against the backdrop of rapidly evolving cities, understanding the intricate interplay between the urban population and the diverse functions in the urban landscape is crucial for effective urban planning and management. The primary objective of this project is to detect and portray the dynamic urban function zones from human activity patterns sensed from geo big data (e.g., social media data, mobile phone data, smart card data, street view data and remote sensing data).

© 2025 Tongxin Chen. All rights reserved.

Contact: Tongxin.Chen@hull.ac.uk