Privacy and origin-destination data
Good origin-destination data is crucial to many mobility-related studies, but most such data is only available aggregated to statistical sectors, municipalities, regions, and so forth. Although aggregated data is useful for generic analyses that provide overviews, it is too coarse for analysis at a microscopic scale (e.g., the number of travelers on each road segment).
A common workaround is to use each area’s centroid as its representative point. However, these areas are usually irregular polygons, so the centroid is a poor representative of where the data samples actually are. It is therefore better to use the information carried by the entire polygon.
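To make the centroid problem concrete, here is a minimal sketch in plain Python (no GIS library assumed). It computes the area-weighted centroid of a made-up U-shaped polygon with the shoelace formula, then uses ray casting to test whether that centroid lies inside the polygon. For a concave shape like this one, it does not. The function names and coordinates are illustrative only.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    `vertices` is an open ring: the first vertex is not repeated at the end.
    """
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def contains(vertices, point):
    """Ray-casting point-in-polygon test (ray pointing in the +x direction)."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal line through y
            if x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                inside = not inside
    return inside


# Hypothetical U-shaped area, e.g. housing wrapped around a park.
u_shape = [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
c = polygon_centroid(u_shape)
print(c, contains(u_shape, c))  # → centroid ≈ (1.5, 1.357), contains → False
```

The centroid lands in the notch of the "U", where there is no housing at all. Any trip assigned to it would start from the wrong place.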
Combining two datasets
In some origin-destination datasets, we know that one side of the data describes, for example, residential points of origin. This side is usually also the privacy-sensitive part, which is why it is aggregated to larger areas. Knowing that these points are residential, and combining this with landuse data, we can enhance the original origin-destination dataset by randomly distributing its samples onto the areas with the corresponding landuse.
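Below is a minimal sketch of the redistribution step. It assumes planar (projected) coordinates and that points should be spread uniformly over residential land, so each polygon is picked with probability proportional to its area. The helper names (`scatter`, `random_point_in`) are hypothetical. The post does not say how its point clouds are weighted, and a real workflow would more likely rely on a GIS library than on hand-rolled geometry.

```python
import random


def shoelace_area(poly):
    """Unsigned area of a simple polygon given as an open ring of (x, y)."""
    s = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def contains(poly, pt):
    """Ray-casting point-in-polygon test."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            if x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                inside = not inside
    return inside


def random_point_in(poly, rng):
    """Rejection sampling: draw in the bounding box until the point falls inside."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    while True:
        p = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if contains(poly, p):
            return p


def scatter(count, residential_polygons, rng):
    """Spread `count` aggregated samples uniformly over the residential polygons."""
    weights = [shoelace_area(p) for p in residential_polygons]  # area weighting (assumption)
    chosen = rng.choices(residential_polygons, weights=weights, k=count)
    return [random_point_in(poly, rng) for poly in chosen]


# Hypothetical zone: 12 residents recorded only at zone level,
# two residential polygons inside it (projected coordinates, metres).
homes = [
    [(0, 0), (400, 0), (400, 300), (0, 300)],
    [(600, 100), (900, 100), (750, 500)],
]
points = scatter(12, homes, random.Random(42))
print(len(points))
```

Rejection sampling inside the bounding box gives an exactly uniform density. Its only cost is wasted draws for thin or sprawling polygons.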
To do this, we use a random point cloud distributed based on landuse data and knowledge about cyclists’ behavior.
Getting landuse data
Residential areas are among the most relevant landuse types, if not the most relevant: they are where trips usually start or end. Thanks to OpenStreetMap (OSM), we know where these areas are.
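One common way to pull residential areas from OSM is the Overpass API. The sketch below builds an Overpass QL query for ways and relations tagged `landuse=residential` inside a bounding box, which Overpass expects in south, west, north, east order. It then posts the query to the public endpoint. The bounding box in the example is arbitrary, and this is only one possible route: the post does not say how its landuse data was downloaded.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass API endpoint.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def residential_query(south, west, north, east, timeout=60):
    """Overpass QL query for residential landuse inside a bounding box."""
    bbox = f"{south},{west},{north},{east}"  # Overpass bbox order: S, W, N, E
    return (
        f"[out:json][timeout:{timeout}];\n"
        "(\n"
        f'  way["landuse"="residential"]({bbox});\n'
        f'  relation["landuse"="residential"]({bbox});\n'
        ");\n"
        "out geom;"  # include coordinates so polygons can be rebuilt
    )


def fetch(query):
    """POST the query to Overpass and return the decoded JSON response."""
    payload = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=payload, timeout=120) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Arbitrary example bounding box; replace with the study area.
    print(residential_query(51.0, 3.7, 51.1, 3.8))
```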
Distance, bins and seeding
The next challenge is to attribute the origin-destination data to the corresponding landuse areas. For instance, in a school accessibility analysis by bicycle, we use the residential areas surrounding the school. But the data cannot simply be distributed at random over all residential areas around the school without first answering a crucial question: how far around the school?
Some origin-destination datasets partially answer this question. Moreover, existing research suggests a clear link between mode choice (in this case, the bicycle) and the distance traveled.
With this in mind, distance intervals can serve as bins. The proportion of cycling trips to the school that come from each distance interval, scaled to the number of students, can then be taken as the expected number of seeds (i.e., students) in each bin.
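As a worked illustration of turning distance shares into seed counts, the sketch below scales each bin's share of cycling trips to a given number of students. It rounds with the largest-remainder method so that the counts always add up to the total. The bins and trip counts in the example are invented, and the rounding rule is our choice, not something the post prescribes.

```python
import math


def seeds_per_bin(shares, total):
    """Split `total` students over distance bins in proportion to `shares`.

    `shares` maps a bin label to its share (or raw count) of cycling trips;
    shares are normalised, so they need not sum to 1.
    """
    norm = sum(shares.values())
    exact = {b: total * s / norm for b, s in shares.items()}
    counts = {b: math.floor(v) for b, v in exact.items()}
    # Largest-remainder rounding: hand the seeds lost to flooring to the bins
    # with the largest fractional parts, so the counts sum to `total`.
    leftover = total - sum(counts.values())
    by_remainder = sorted(exact, key=lambda b: exact[b] - counts[b], reverse=True)
    for b in by_remainder[:leftover]:
        counts[b] += 1
    return counts


# Hypothetical observed cycling trips per distance bin.
trips = {"0-1 km": 7, "1-2 km": 6, "2-3 km": 4, "3-5 km": 3}
print(seeds_per_bin(trips, 40))  # → {'0-1 km': 14, '1-2 km': 12, '2-3 km': 8, '3-5 km': 6}
```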
Distribution of random points
Now it’s time to distribute points randomly and create point cloud(s) over the corresponding polygons, classified by distance interval. Each point represents a possible student. Of course, the student population of a given school changes every school year, and new students may well mean new addresses. Therefore, the more iterations of the random distribution we run, the more reliable the accessibility analysis and the subsequent assessments and conclusions.
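The sketch below combines the two ideas. For each distance bin, it draws points uniformly over the part of the residential land that lies within that band around the school. It then repeats the whole draw for several iterations, each iteration standing in for one possible student population. It assumes projected coordinates in metres and straight-line distance; the post does not say whether its bins use straight-line or network distance. A fixed random seed keeps runs reproducible. All names are hypothetical.

```python
import math
import random


def contains(poly, pt):
    """Ray-casting point-in-polygon test; `poly` is an open list of (x, y)."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            if x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                inside = not inside
    return inside


def sample_band(school, r_min, r_max, polygons, k, rng, max_tries=100_000):
    """Draw k points uniformly over residential land with r_min <= d < r_max."""
    sx, sy = school
    points, tries = [], 0
    while len(points) < k:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("too little residential land in this distance band")
        # Uniform draw in the square enclosing the band, then reject.
        p = (rng.uniform(sx - r_max, sx + r_max), rng.uniform(sy - r_max, sy + r_max))
        d = math.hypot(p[0] - sx, p[1] - sy)
        if r_min <= d < r_max and any(contains(poly, p) for poly in polygons):
            points.append(p)
    return points


def point_clouds(school, bins, polygons, iterations, seed=0):
    """One point cloud per iteration; `bins` is a list of (r_min, r_max, n_seeds)."""
    rng = random.Random(seed)
    clouds = []
    for _ in range(iterations):
        cloud = []
        for r_min, r_max, n in bins:
            cloud.extend(sample_band(school, r_min, r_max, polygons, n, rng))
        clouds.append(cloud)
    return clouds


# Hypothetical example: one school at the origin, one residential block.
residential = [[(-3000, -3000), (3000, -3000), (3000, 3000), (-3000, 3000)]]
clouds = point_clouds((0, 0), [(0, 1000, 4), (1000, 2000, 3)], residential, iterations=5)
print(len(clouds), [len(c) for c in clouds])
```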
Network and accessibility analysis
Finally, we used the point clouds as origins and the schools as destinations, with an appropriate routing profile and our QGIS plugin, to plan routes between all origin-destination pairs and import them into QGIS for further analysis.
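The post routes with its own QGIS plugin and a cycling routing profile, whose interface is not described here. Purely as a stand-in, the sketch below runs Dijkstra's shortest-path algorithm on a tiny hypothetical link graph. It returns each route as a list of link ids, which is the form a per-link frequency count needs. It assumes origins and schools have already been snapped to network nodes and that link costs already reflect the routing profile.

```python
import heapq
import math


def shortest_route(links, origin, destination):
    """Dijkstra on an undirected link graph.

    `links` maps link id -> (node_a, node_b, cost). Returns the link ids of a
    least-cost path from origin to destination, or None if unreachable.
    """
    adjacency = {}
    for link_id, (a, b, cost) in links.items():
        adjacency.setdefault(a, []).append((b, cost, link_id))
        adjacency.setdefault(b, []).append((a, cost, link_id))

    best = {origin: 0.0}
    previous = {}  # node -> (node we came from, link id used)
    heap = [(0.0, origin)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == destination:
            break
        if dist > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbour, cost, link_id in adjacency.get(node, []):
            candidate = dist + cost
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                previous[neighbour] = (node, link_id)
                heapq.heappush(heap, (candidate, neighbour))

    if destination not in best:
        return None
    route, node = [], destination
    while node != origin:  # walk back from the destination
        node, link_id = previous[node]
        route.append(link_id)
    return route[::-1]


# Hypothetical toy network: home node "A", school node "S".
links = {"e1": ("A", "B", 1.0), "e2": ("B", "S", 1.0), "e3": ("A", "S", 3.0)}
print(shortest_route(links, "A", "S"))  # → ['e1', 'e2']
```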
Next, we can perform a Network Frequency Analysis, i.e., count in how many scenarios (school routings) each link has been used. The most frequently used links can be interpreted as the backbone of the cycling network that serves these schools.
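Below is a minimal sketch of network frequency analysis. It assumes each scenario (for example, one point-cloud iteration routed to one school) is a list of routes and each route is a list of link ids. A link is counted at most once per scenario. The threshold for the "core" network is a hypothetical parameter, not a value from the post.

```python
from collections import Counter


def link_frequency(scenarios):
    """Count in how many scenarios each link is used (at most once per scenario).

    `scenarios` is an iterable of scenarios; a scenario is an iterable of routes,
    and a route is a sequence of link ids.
    """
    freq = Counter()
    for routes in scenarios:
        used = set()
        for route in routes:
            used.update(route)
        freq.update(used)  # a set, so each link adds at most 1 per scenario
    return freq


def core_links(freq, n_scenarios, threshold=0.5):
    """Links used in at least `threshold` of all scenarios (hypothetical cut-off)."""
    return {link for link, count in freq.items() if count / n_scenarios >= threshold}


# Hypothetical scenarios: three routing runs with link-id routes.
scenarios = [
    [["e1", "e2"], ["e1", "e3"]],
    [["e1", "e2"]],
    [["e4"]],
]
freq = link_frequency(scenarios)
print(sorted(freq.items()))  # → [('e1', 2), ('e2', 2), ('e3', 1), ('e4', 1)]
print(sorted(core_links(freq, len(scenarios))))  # → ['e1', 'e2']
```

Counting per scenario rather than per route keeps one large cohort in a single iteration from dominating. A per-route count would instead measure traffic volume.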
In conclusion, even with sparse origin-destination data, we can analyze cycling use around schools. This is made possible by combining the origin-destination data with landuse data and knowledge about cyclists’ behavior.
Posted by Hamed Eftekhar, Jan 21