The AWARE mobile platform was used to collect and record location and activity data for the period of January 22 - February 8 2016.
This website presents a summary of where I went and what I did during this three-week period, as well as a description of my process for cleaning and exploring the data.
Location data was recorded using Google Fused Location (based on a combination of GPS, Wi-Fi, and Bluetooth radio locations). Activity data was inferred using a Google activity classification algorithm based on accelerometer and other sensor data.
During the study period I:
|Collection Period|January 22 - February 8 2016 (18 Days)|
|Location Data Points|239,411|
|Activity Data Points|55,589|
The map above shows the approximate locations where the greatest number of location data points were collected (larger circles = more data points from this location in the dataset). The locations are approximate, based on a crude "binning" algorithm and the count of data points from a given area (a crude proxy for time spent). For example, there are several "bins" along Forbes Ave between Squirrel Hill and CMU. These are "binned" aggregates of my time spent in transit along Forbes Ave. (Created with OpenStreetMap, Mapbox, and Leaflet)
In truth, I don't need a smartphone, GPS, or binning algorithm to know that I spend most of my time at home or on campus. Let's add 50,000 points of activity data and see what we see! By joining activity data to my location data based on timestamp, I'm able to correlate activity with location.
The map above shows all ~50,000 data points. Where available, the data has been color coded according to activity type. This map reveals the paths taken when driving to various ski events. (Created in ArcGIS Pro)
Zooming in a bit, now we can begin to see that the location data captured has a bit of variety and a story to tell. For example, the purple on the left-hand side of the map shows when I walked back to campus after a haircut, and green shows some paths where I like to go and run.
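The timestamp join behind these maps could be sketched roughly as follows. This is only an illustration, not the exact process I used: the function name `join_activity`, the tuple layouts, and the 60-second `max_gap_ms` cutoff are all assumptions made for the example. Each location fix gets the most recent activity label at or before it, and fixes with no recent label stay unlabeled (which is why activity is only shown "where available").

```python
from bisect import bisect_right


def join_activity(locations, activities, max_gap_ms=60_000):
    """Attach the most recent activity label to each location fix.

    locations:  list of (timestamp_ms, lat, lon)
    activities: list of (timestamp_ms, label), sorted by timestamp
    max_gap_ms: hypothetical cutoff; older labels are treated as stale
    """
    activity_times = [t for t, _ in activities]
    joined = []
    for ts, lat, lon in locations:
        # Index of the last activity recorded at or before this fix.
        i = bisect_right(activity_times, ts) - 1
        label = None
        if i >= 0 and ts - activity_times[i] <= max_gap_ms:
            label = activities[i][1]
        joined.append((ts, lat, lon, label))
    return joined


# Made-up sample rows, for illustration only.
locations = [
    (1_000, 40.4433, -79.9436),
    (61_000, 40.4380, -79.9225),
    (500_000, 40.4300, -79.9200),
]
activities = [(0, "walking"), (60_000, "in_vehicle")]
joined = join_activity(locations, activities)
```

The third sample fix is more than a minute away from any activity reading, so it keeps a label of `None` rather than inheriting a stale one.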
Now that I've shown you the results, let's talk about how they were produced.
My first step was to complete some simple exploratory data analysis. Visual exploration makes things easiest, so I exported the location data to CSV and imported it into ArcMap. By default, Arc creates a map from my location data like this:
Immediately, I can see that the location data has some "noise" in it. The image below shows a light scattering of location data that looks more like a shotgun scatter than a travelled line. The location data puts me in places where I've never been:
Following the instructions in the tutorial, I "binned" my location data based on locations 5 miles apart. This, unfortunately, just produced a sparse, "breadcrumbs" view of my travels across the state.
Moreover, when I inspected these bin locations on a small-scale map, the bins (being pseudorandomly seeded) often misrepresented where I spent my time: points from places where I spent a lot of time were sometimes assigned to bins centered on locations I had only passed through briefly.
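To make the binning idea concrete, here is a minimal sketch of a radius-based binning pass. It is not the tutorial's actual function: `bin_points`, `haversine_miles`, and the greedy "first point seeds the bin" rule are assumptions for illustration. It does show the same weakness, though, since where a bin's center lands depends entirely on which point happens to seed it.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def bin_points(points, epsilon_miles):
    """Greedy binning: a point joins the first bin whose center is within
    epsilon_miles; otherwise it seeds a new bin at its own location."""
    bins = []  # each bin: {"center": (lat, lon), "count": int}
    for p in points:
        for b in bins:
            if haversine_miles(p, b["center"]) <= epsilon_miles:
                b["count"] += 1
                break
        else:
            bins.append({"center": p, "count": 1})
    return bins


# Made-up sample points: two near CMU, one in Squirrel Hill.
points = [(40.4433, -79.9436), (40.4440, -79.9430), (40.4380, -79.9225)]
coarse = bin_points(points, epsilon_miles=5)     # everything in one bin
fine = bin_points(points, epsilon_miles=0.25)    # campus and Squirrel Hill split
```

With a 5 mile radius the campus and Squirrel Hill points collapse into a single bin, while a 0.25 mile radius keeps them apart.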
To produce something more interesting and readable, I made two changes. First, I reduced the "epsilon" radius distance for the original binning function to 0.25 miles, which created over 1000 bins. Second, I wrote a new function that grouped any bin with fewer than 300 data points into its nearest neighbor, reducing the total number of bins to 40. These 40 resulting bins are much more representative of where I spent my time.
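The grouping step could look something like the sketch below. Again, this is an illustration under assumptions rather than my original function: `merge_small_bins` and `approx_miles` are hypothetical names, the bins are assumed to be dictionaries with a `center` and a `count`, and distance uses a flat-earth (equirectangular) approximation that is only reasonable over city-scale distances.

```python
from math import cos, hypot, radians

EARTH_RADIUS_MILES = 3958.8


def approx_miles(a, b):
    """Equirectangular distance approximation between (lat, lon) pairs."""
    mean_lat = radians((a[0] + b[0]) / 2)
    dx = radians(b[1] - a[1]) * cos(mean_lat) * EARTH_RADIUS_MILES
    dy = radians(b[0] - a[0]) * EARTH_RADIUS_MILES
    return hypot(dx, dy)


def merge_small_bins(bins, min_count=300):
    """Repeatedly fold the smallest bin into its nearest neighbor until
    every remaining bin has at least min_count points."""
    bins = [dict(b) for b in bins]  # work on copies, leave the input alone
    while len(bins) > 1:
        i = min(range(len(bins)), key=lambda k: bins[k]["count"])
        if bins[i]["count"] >= min_count:
            break
        small = bins.pop(i)
        nearest = min(
            bins, key=lambda b: approx_miles(b["center"], small["center"])
        )
        # The neighbor keeps its own center and absorbs the points.
        nearest["count"] += small["count"]
    return bins


# Made-up bins for illustration: two large ones, two stragglers.
bins = [
    {"center": (40.4433, -79.9436), "count": 5000},
    {"center": (40.4440, -79.9430), "count": 120},
    {"center": (40.4380, -79.9225), "count": 900},
    {"center": (40.4385, -79.9230), "count": 50},
]
merged = merge_small_bins(bins, min_count=300)
```

Keeping the larger neighbor's center (rather than averaging the two) is a deliberate simplification here; it keeps circles anchored on places where time was actually spent.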
This byte provided some fun real-world exposure to working with messy, incomplete data.
The data was incomplete in many ways:
The data is generally coherent. Some location data is clearly incorrect (such as the yellow triangle that appears below). The sheer volume of data makes it difficult to see patterns and trends in the maps as drawn. Slicing the data (by day of week, for example) would greatly increase its comprehensibility.
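As a rough idea of what slicing by day of week might involve, here is a small sketch. It assumes timestamps are stored as Unix epoch milliseconds and uses UTC to keep the example deterministic (local time would be the better choice for real analysis); `counts_by_weekday` and `to_ms` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def counts_by_weekday(timestamps_ms, tz=timezone.utc):
    """Count data points per day of week from epoch-millisecond timestamps."""
    counts = Counter()
    for ts in timestamps_ms:
        moment = datetime.fromtimestamp(ts / 1000, tz=tz)
        counts[DAY_NAMES[moment.weekday()]] += 1
    return counts


def to_ms(year, month, day, hour):
    """Helper for the example: build an epoch-millisecond timestamp in UTC."""
    moment = datetime(year, month, day, hour, tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


# Made-up sample: two readings on the first day of collection, one on the last.
sample = [to_ms(2016, 1, 22, 12), to_ms(2016, 1, 22, 18), to_ms(2016, 2, 8, 9)]
weekday_counts = counts_by_weekday(sample)
```

The same grouping key could be used to draw one map per weekday instead of a single crowded one.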
This project has a few interesting take-aways: 1) sensor data can be really messy (and cleaning messy data is really time-consuming); 2) the sensors in modern smartphones are pretty amazing (for example, being able to somewhat accurately identify when I'm walking versus when I'm running).