Current cohort: May 2025

Welcome to the MSc AI and Data Science research project on urban big data analytics. I'm delighted to have you on board as we explore research topics in urban science and crime science. By the end of this learning journey, you should feel confident in most aspects of urban big data analytics and be able to meet the requirements for the research report. The university emphasises the importance of avoiding academic misconduct every academic year, and I must reiterate that any form of misconduct is unacceptable in this project. Please prepare thoroughly and work patiently through each stage; doing so will lead to a high-quality report and, ultimately, contribute to the attainment of your MSc degree.

Research proposal#

A research proposal outlines a roadmap for the research project, including the objectives, methodology, timeline, and expected outcomes. In urban analytics, a research proposal aims to identify urban issues and to apply data science and AI methodologies to extract valuable insights, taking into account the multifaceted nature of urban environments.

Here are the typical sections included in a research proposal:

  • Title: A concise title that captures the essence of the research project.

  • Introduction: This section should include background information, research gaps, research questions or hypotheses, and the objectives or aims of the research project.

    • Background information: Provides context for the study, including relevant literature, previous research, and the current state of knowledge in the field. It helps others understand why the research is important.

    • Research gaps: These are research areas or questions that existing research has not adequately addressed or answered. Research gaps can arise for various reasons, such as limitations in previous studies (identified from the literature), emerging trends or technologies, and changing societal needs that have not been explored.

    • Research questions or hypotheses: Clearly state the main research question or hypothesis that the study seeks to address.

    • Objectives or aims: Explain what the research aims to achieve or what specific problem you hope to solve, and why these goals are important within the scope of the proposed study.

  • Literature Review: A review of relevant literature related to the research topic. This section demonstrates your understanding of existing concepts, findings, and methodologies, and in particular summarises the current research challenges and gaps identified in the literature.

  • Methodology: A detailed explanation of the research methodology, including data sources and methods (e.g., data preprocessing techniques, variable measurement, and the selected models).

  • Expected Outcomes: The anticipated results of the research and how they will contribute to the existing body of knowledge in urban analytics.

  • Timeline: A timeline outlining the different stages of the research project, including data collection, analysis, writing, and submission.

  • References: A list of references cited in the proposal, following a specific citation style (e.g., APA, MLA).

Specifically, Table 1 lists the important research elements for research proposals in urban analytics. Please incorporate all of these elements in your research proposal and present the completed table clearly as an Appendix to the proposal.

Table 1 Checklist of the important elements in a research proposal.

| Element | Description | Example |
| --- | --- | --- |
| Data type | Data types refer to the formats of the information collected. In urban science, data can be categorised into various types for different purposes. For example, crime data can be collected as crime survey data, police-recorded data, self-report data and so on. Urban mobility data can be categorised into mobile phone call detail records, mobile phone GPS data, underground smart card data, WiFi data, social media data and so on. | Underground smart card data, police-recorded data |
| Data resource | Data resources are the sources from which you collect data, each under its own usage licence (e.g., education and research licences). Please read the data use policy carefully when you access open data. | London Datastore |
| Independents/features/predictors (X) | Features or independent variables are the attributes of the data that are used to predict the outcome. | Population mobility variables (measured by travel behaviours from smart card data) |
| Dependents/targets/responses (y) | The target or dependent variable is what the model predicts or explains based on the feature variables. For supervised learning tasks, y typically consists of the labels or responses (e.g., a single column) associated with each set of feature variables in the dataset. | Theft counts |
| Spatial unit of analysis | The spatial unit of analysis refers to the geographic level or scale used in the research analysis. It defines the spatial resolution or granularity at which spatial patterns are understood. Clearly defining the geographical unit of analysis can help to avoid the ecological fallacy in the research findings. | Lower Super Output Area (LSOA) |
| Temporal unit of analysis | The temporal unit of analysis refers to the time scale or interval used in the analysis, e.g., when examining temporal patterns, trends, or relationships. It can be categorised as hourly, daily, weekly, monthly or yearly. In some prediction tasks, it describes the time scale at which the trained model makes predictions, e.g., the model predicts the next week (week-level) for each LSOA. | Monthly |
| Study area and city | The specific urban areas of the selected cities used as the case study in the analysis, e.g., the City of London within Greater London. | All urban areas in Greater London |
| Observation period | The time period covered by the observations in the experimental analysis, e.g., an observation period from 2021 to 2022 (two years). | 2021 |
| Model/method | The main method or model that will be used or trained to address the research questions, such as statistical models or machine learning models. | Random Forest regressor |
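To make the checklist more concrete, the sketch below shows one way the Table 1 elements could map onto a short modelling script. It is only an illustration under stated assumptions: the data are synthetic, and the column names (`lsoa_code`, `station_entries`, `station_exits`, `theft_count`) are hypothetical placeholders rather than fields from any real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: every value below is made up for illustration.
rng = np.random.default_rng(42)
n_lsoa, n_months = 50, 12  # spatial unit: LSOA; temporal unit: monthly

# Observation period: the twelve months of 2021.
months = [f"2021-{m:02d}" for m in range(1, n_months + 1)]
lsoa_codes = [f"LSOA_{i:03d}" for i in range(n_lsoa)]

# One row per LSOA-month (spatial unit x temporal unit).
panel = pd.DataFrame({
    "lsoa_code": np.repeat(lsoa_codes, n_months),
    "month": np.tile(months, n_lsoa),
})

# X: hypothetical mobility features of the kind derived from smart card data.
panel["station_entries"] = rng.poisson(2000, size=len(panel))
panel["station_exits"] = rng.poisson(1900, size=len(panel))

# y: theft counts, simulated so that they depend loosely on one feature.
panel["theft_count"] = rng.poisson(1 + 0.002 * panel["station_entries"].to_numpy())

X = panel[["station_entries", "station_exits"]]
y = panel["theft_count"]

# Simple random hold-out split of LSOA-months.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Model/method: Random Forest regressor.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"Test MAE: {mae:.2f} thefts per LSOA-month")
```

Note that a random split ignores the spatial and temporal dependence between LSOA-months; in a real project you may prefer to hold out later months or whole areas, depending on the prediction task described in your proposal.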

Data sources#

Crime data#

Nowadays, crime data is readily available through numerous resources for various purposes. Several resources from different cities in the UK and US are listed below for your reference. Please take note of the data usage license and consider the data quality, particularly regarding spatial and temporal resolution issues, as detailed in the provided resources. Additionally, some city open data portals offer additional urban data that can be linked to the crime data for further analysis.
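Spatial and temporal resolution matter because incident-level records usually need to be aggregated to your chosen units of analysis. The sketch below shows one possible way to do this with pandas, assuming hypothetical columns `occurred_on`, `lsoa_code` and `crime_type`; real crime datasets will use different field names and may only provide month-level dates.

```python
import pandas as pd

# Hypothetical incident-level records; the column names are assumptions.
records = pd.DataFrame({
    "occurred_on": pd.to_datetime([
        "2021-01-03", "2021-01-17", "2021-01-29",
        "2021-02-02", "2021-02-14", "2021-02-20",
    ]),
    "lsoa_code": ["A", "A", "B", "A", "B", "B"],
    "crime_type": ["Theft"] * 6,
})

# Temporal unit: collapse each date to its calendar month.
records["month"] = records["occurred_on"].dt.to_period("M")

# Spatial unit x temporal unit: one row per LSOA-month with a crime count.
monthly_counts = (
    records.groupby(["lsoa_code", "month"])
    .size()
    .reset_index(name="crime_count")
)
print(monthly_counts)
```

LSOA-months with no recorded incidents do not appear in the result, so you may need to add them back with a count of zero before modelling.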

Urban data#

Urban data is accessible through various resources, encompassing socio-economic data, population statistics, transportation data, geographical boundaries, and other urban environmental data. Below are several urban data repositories. Please be aware that some data sources require registration and an educational licence for use. You are encouraged to explore additional data resources or use your own datasets.
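Urban data such as population statistics are typically linked to crime data through a shared area identifier. The following is a minimal sketch of such a join, assuming both tables carry a hypothetical `lsoa_code` column; the values are invented, and the rate per 1,000 residents is just one example of a derived variable.

```python
import pandas as pd

# Invented example tables; the column names are assumptions.
crime = pd.DataFrame({
    "lsoa_code": ["A", "B", "C"],
    "theft_count": [12, 5, 8],
})
population = pd.DataFrame({
    "lsoa_code": ["A", "B", "D"],
    "residents": [1500, 2000, 1800],
})

# Left join keeps every crime row; validate guards against duplicate area codes,
# and indicator adds a "_merge" column showing where each row was matched.
linked = crime.merge(
    population,
    on="lsoa_code",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Areas present in the crime data but missing from the population data.
unmatched = linked.loc[linked["_merge"] == "left_only", "lsoa_code"].tolist()
print("Unmatched areas:", unmatched)

# Derived variable: thefts per 1,000 residents (NaN where population is missing).
linked["thefts_per_1000"] = 1000 * linked["theft_count"] / linked["residents"]
print(linked[["lsoa_code", "theft_count", "residents", "thefts_per_1000"]])
```

Checking for unmatched areas before analysis helps catch mismatched boundary versions or identifier formats between data sources.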

Analysing tools#

Several tools can be used to analyse urban data, particularly for geospatial and temporal data processing and modelling. Selected Python packages and software are provided for your reference:

Project management#

It is highly recommended to use GitHub for project management and for writing code under version control. Further guidance on GitHub usage can be found at GitHub Docs. You can also find online courses on LinkedIn Learning, Udemy or Coursera. The simplest approach is to use GitHub Desktop to commit and push your local Jupyter Notebook project.

Research topics#

Table A1 Information on current research topics

| No | Title | Description | Note |
| --- | --- | --- | --- |
| 1 | Predicting urban mobility using deep learning and GeoAI | Understanding and optimising mobility are crucial for sustainable and efficient modern urban development. The objective of this project is to explore the complex patterns in urban mobility by analysing diverse big data sources, such as public transportation records (e.g., smart card data), social media big data and mobile phone big data. Other tasks in this project can focus on identifying which key factors (urban facilities and functional land use) influence the population’s mobility patterns (e.g., commuting behaviours), or on predicting the volume of the population’s mobility (e.g., origin and destination flows) across different urban areas. Employing GeoAI will be pivotal in extracting meaningful insights for urban planning or public resource management. | At least one deep learning framework or model must be incorporated in this project. Machine learning and statistical models may only be developed as baseline models for comparison. |
| 2 | Spatio-temporal prediction for urban crimes using deep learning or xAI | This project aims to develop an advanced crime prediction or analysis framework/method by leveraging deep learning techniques or xAI in the context of spatial and temporal big data analytics. By integrating historical crime data with relevant socio-economic and environmental factors, the project seeks to enhance the accuracy and efficiency of crime prediction in space and time, or to explain the specific crime patterns detected (e.g., crime hotspots and concentration). Students are encouraged to incorporate urban big data to improve crime analysis approaches. The implementation should be designed to work with existing law enforcement systems and to enhance public safety. | At least one deep learning framework or model must be incorporated in this project. Machine learning and statistical models may only be developed as baseline models for comparison. |
| 3 | AI for urban transport analytics | The project aims to analyse the heterogeneity in urban transportation demand, usage/ridership, or mode choices across urban areas to understand the diverse patterns of mobility within urban settings. By employing advanced geospatial data analytics and machine learning techniques, different types of transport mode patterns (e.g., taxi, cycling, public transportation) can be sensed, visualised, and analysed from various geo big data, such as smart card data, bike-sharing docking station data and mobile phone data. | At least one deep learning framework or model must be incorporated in this project. Machine learning and statistical models may only be developed as baseline models for comparison. |


© 2025 Tongxin Chen. All rights reserved.

Contact: Tongxin.Chen at hull.ac.uk