The analyses in this study were carried out in R (version 4.1.2), using the "devtools" package and the nlcor package installed from GitHub ("ProcessMiner/nlcor"). The R package ggplot2 was used to generate all the diagrams in the study. The index section of the report contains the full R scripts for the analyses performed in this investigation.
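The environment described above can be set up as follows. This is a minimal sketch: package names match those stated in the text, and the GitHub repository is the "ProcessMiner/nlcor" one cited above.

```r
# Install devtools and ggplot2 from CRAN; nlcor is installed from GitHub
# via devtools, as it is not distributed on CRAN.
install.packages(c("devtools", "ggplot2"))
devtools::install_github("ProcessMiner/nlcor")

# Load the packages used throughout the analysis.
library(nlcor)
library(ggplot2)
```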
The data were imported in NetCDF format and converted to CSV, since CSV files are easier to open and manipulate in RStudio using the R programming language. A station variable was created for every location (location 01 through location 25) after each coordinate was extracted. Next, each station's weather data were checked for quality and range. For every location, a five-number statistical summary (minimum, first quartile, median, third quartile, and maximum), together with the mean, was also extracted; this allowed trend comparisons across all locations. Additionally, the interquartile range (IQR) was computed to provide a range containing the middle 50% of the data; the IQR can also serve as a rough quartile-based confidence range, since 25% of the observations lie below its lower bound and 25% above its upper bound.
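The per-station summary step can be sketched in base R as below. The synthetic `temp` vector is a hypothetical stand-in for one station's September temperatures (in the actual analysis these values come from the NetCDF file after conversion to CSV, e.g. via the ncdf4 package's `nc_open()` and `ncvar_get()`).

```r
# Synthetic stand-in for one station's September temperature series.
set.seed(1)
temp <- round(rnorm(30, mean = 15, sd = 3), 1)

# Five-number summary (min, Q1, median, Q3, max) plus the mean,
# as reported for every location.
fivenum_stats <- summary(temp)
print(fivenum_stats)

# Interquartile range: Q3 - Q1, spanning the middle 50% of the data.
iqr <- IQR(temp)
print(iqr)
```

`summary()` conveniently reports the mean alongside the five quartile-based statistics, which is why it is preferred here over `fivenum()`.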
Temperature exhibits seasonal variation, so the distributional form of the data cannot be assumed. To check whether the data for our selected time frame in September were normally distributed, a Shapiro-Wilk test was employed, with the null hypothesis that the data are normally distributed. With p < 0.05 at every location, the data were found to be non-normally distributed: the null hypothesis was rejected, and the alternative hypothesis, that the data are not normally distributed, was accepted.
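The test above is available in base R as `shapiro.test()`. A minimal sketch on a strongly right-skewed toy series (an illustrative assumption, not the study's data):

```r
# Shapiro-Wilk test: the null hypothesis is that the data are normally
# distributed; p < 0.05 leads to rejecting normality.
x <- exp(1:30 / 5)   # strongly right-skewed toy series
res <- shapiro.test(x)
print(res$p.value)   # far below 0.05 here, so normality is rejected
```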
Because the data do not follow a normal distribution, it is impossible to apply the standard rnorm() function, which simulates normally distributed data given a population mean and standard deviation. Alternative methods of simulating our data are therefore needed to determine whether a data point is truly incorrect (whether from a different data set or a typo), whether it is a genuine novel change in the data (e.g., a freak event, creating a novel area for research), or whether it is due to an unrecorded change in station location (reference Munich airport story).
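One such distribution-free alternative (offered here as an assumption, not as the study's stated method) is to resample from the empirical distribution with `sample()` instead of drawing from `rnorm()`, and then ask whether a suspect value lies outside the range the resamples can plausibly produce. The `observed` and `candidate` values below are hypothetical.

```r
set.seed(42)
observed <- c(12.1, 13.4, 14.0, 14.2, 14.8, 15.1, 15.5, 16.0, 16.3, 17.2)

# Bootstrap resampling from the empirical distribution: no normality
# assumption about the underlying data is required.
sims <- replicate(1000, mean(sample(observed, replace = TRUE)))

# Flag a candidate value falling outside the central 99% of simulated
# means; such a point may be a typo, a genuine freak event, or an
# unrecorded station move, and warrants manual inspection.
candidate <- 25.0
flagged <- unname(candidate < quantile(sims, 0.005) |
                  candidate > quantile(sims, 0.995))
print(flagged)
```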
Additional data analysis was done using skewness and kurtosis to comprehend a dataset's properties beyond the mean and standard deviation, which shed light on the nature of deviations from a normal distribution. In R, the descdist() function (from the fitdistrplus package) is used to analyze and compare datasets against theoretical distributions, as is often done when fitting probability distributions to data. A log-normal distribution is a probability distribution in which the natural logarithm of a variable follows a normal distribution.
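The two ideas above can be sketched together in base R. The moment-based skewness and excess-kurtosis helpers below are minimal stand-ins for the quantities that descdist() reports alongside its Cullen-Frey graph, and the simulated `x` is an illustrative log-normal sample, not the study's data.

```r
# Moment-based skewness and excess kurtosis (population moments).
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}
kurtosis <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3   # excess kurtosis: 0 for normal
}

# Log-normal property: if x is log-normal, log(x) is normal, so the
# raw data are right-skewed while their logs have skewness near 0.
set.seed(7)
x <- rlnorm(5000, meanlog = 0, sdlog = 0.5)
print(skewness(x))        # clearly positive: right-skewed raw data
print(skewness(log(x)))   # near 0: the logs are approximately normal
```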
The code can be accessed in this file: