The dataset~\cite{buchwaldera} is a hectometric (100 m) grid of variables for Geneva, provided by the Swiss federal office. The thermal data were acquired by the Landsat 8 satellite on 26.06.2018, while the noise data were downloaded from the Swiss Confederation website~\cite{godonnes}. In this study, the values of each variable were attributed to each hectometric cell, and the sum, mean, median and standard deviation were then calculated with the zonal statistics tool in QGIS. Noise is measured in dB, and thermal radiation is given in digital numbers (DN), which can be converted to physical quantities. The projection used in this project is the Swiss coordinate system CH1903 / LV03 (EPSG:21781).
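
As an illustration of how digital numbers can be converted, and assuming the standard rescaling published for Landsat 8 Level-1 thermal bands (the source does not state which conversion, if any, was applied here), a digital number $Q_{\mathrm{cal}}$ can be turned into top-of-atmosphere spectral radiance $L_\lambda$ and then into brightness temperature $T$ in kelvin:
\[
L_\lambda = M_L \, Q_{\mathrm{cal}} + A_L ,
\qquad
T = \frac{K_2}{\ln\left( \frac{K_1}{L_\lambda} + 1 \right)} .
\]
Here $M_L$ and $A_L$ are the band-specific multiplicative and additive rescaling factors, and $K_1$ and $K_2$ are the band-specific thermal constants, all read from the scene metadata file. Note that $T$ is a top-of-atmosphere brightness temperature, not a land surface temperature, which would additionally require an emissivity correction.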

Methods

The software used for the exploratory data analysis is GeoDa~\cite{geoda}, a free, open-source program that provides tools for geospatial analysis. We first examine the data distribution with a scatter plot of thermal radiation versus the noise experienced during the day. However, a scatter plot only shows the pairs of values and gives no indication of their spatial distribution.
For this reason, we add a co-location map of the two variables. To create it, each variable is first divided into quantiles. We chose four quantiles (quartiles): a larger number would spread the data over too many classes for matches to be meaningful, while a smaller number risks grouping together cells with very different values, depending on the distribution within the dataset. The co-location map then shows the cells in which both variables fall into the same quantile. This makes it possible to assess the association between the two variables while also showing where it occurs, so that the reader can pick out different areas of the town. This is useful because population and activity are not spread evenly across Geneva. The analysis focuses on finding hot spots in the city where the correlation might be markedly high or low.
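
The co-location rule described above can be written compactly. The notation is ours and is only a sketch of the idea, not a description of GeoDa's internals: let $q_T(i)$ and $q_N(i)$, each in $\{1, 2, 3, 4\}$, denote the quartile class of thermal radiation and of daytime noise in cell $i$. The co-location class $c_i$ is then
\[
c_i = \left\{
\begin{array}{ll}
q_T(i) & \mbox{if } q_T(i) = q_N(i), \\
0 & \mbox{otherwise (no match).}
\end{array}
\right.
\]
For example, a cell in the top quartile of both variables receives $c_i = 4$, whereas a cell in the top quartile for thermal radiation but the second quartile for noise receives $c_i = 0$ and is mapped as a non-match.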

Results

The scatter plot (Fig.~1) shows a low correlation between the two variables: with an $R^2$ of only 0.062, the linear fit explains about 6\% of the variance, so the variables are close to linearly uncorrelated. The data are strongly clustered towards the center of the graph, where the number of observations is very high, while the outer points are more spread out and far fewer. The regression line has a positive slope, implying a positive, though weak, correlation.
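
To express this result on the scale of a correlation coefficient, and assuming the fit is a simple linear regression with one predictor (so that $R^2$ equals the squared Pearson coefficient $r$), the reported value gives
\[
|r| = \sqrt{R^2} = \sqrt{0.062} \approx 0.25 ,
\]
with the positive slope fixing the sign, so $r \approx +0.25$. This is a derived figure under the stated assumption, not a value reported by the analysis itself.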