Capstone

Problem Statements

A large array of urban activities, including mobility, can be modeled as networks evolving over time. These networks potentially capture the changes in urban dynamics caused by events such as strikes and extreme weather, but identifying these events from temporal networks is a challenging problem, which we address in this research. Our approach is a topological aggregation of the network followed by dimensionality reduction using representation learning, which enables the application of standard outlier detection in a low-dimensional representation space. We will evaluate the methodology by its ability to identify specific urban events. We expect our research to produce a methodology for anomaly detection in temporal networks of urban mobility that outperforms the legacy techniques and generalizes to different types of temporal networks. Our motivation for pursuing this problem is our belief that such a system can be used for early detection of potentially unsafe developments and enable a timely response.
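The reduction and outlier-detection steps of the approach can be illustrated with a minimal sketch. It assumes the network has already been aggregated into a days × features matrix (one row per day, one column per aggregated edge), uses PCA as a stand-in for the representation learning step, and scores days by their standardized distance from the center of the low-dimensional space. The function names, the number of components, and the quantile cutoff are illustrative assumptions, not the project's actual configuration.

```python
import numpy as np

def pca_outlier_scores(X, n_components=2):
    """Score each day by its distance from the center of a PCA representation.

    X is a (days x features) matrix; assumes the kept components have
    non-zero variance.
    """
    Xc = X - X.mean(axis=0)
    # PCA via SVD: the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Low-dimensional representation of each day (principal component scores).
    Z = U[:, :n_components] * S[:n_components]
    # Standardize each component so that all of them contribute equally.
    Zs = Z / Z.std(axis=0)
    return np.sqrt((Zs ** 2).sum(axis=1))

def flag_outliers(scores, quantile=0.99):
    """Flag days whose score exceeds the given quantile (assumed cutoff)."""
    return scores > np.quantile(scores, quantile)
```

Days flagged this way would then be compared against the known event dates. Any other standard outlier detector could be applied to the same low-dimensional representation.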

Data

The urban mobility datasets were collected for multiple cities, including taxi ridership datasets for New York (USA), Washington DC (USA) and Chicago (USA), and subway ridership data for Taipei (Taiwan). We aggregated all these datasets at the day level and transformed them into a convenient uniform format. The summary of the aggregated mobility datasets is listed below.

Chicago

943 Days

77 Taxi Zones

New York

729 Days

263 Taxi Zones

Washington

730 Days

96 Taxi Zones

Taipei

637 Days

108 Subway Stations

Data Summary

City	No. of Temporal Points	Total Ridership Collected	Average Ridership (per station per day)	Total Number of Nodes (Stations/ Taxi Zones)	Total Number of Edges
Chicago Taxi	943	14919326	282	77	1015
Washington DC Taxi	730	19062827	508	96	3668
Taipei Subway	637	1307013573	18969	108	11664
New York Taxi	729	540472732	2815	263	65792

Data Source

City	Timeframe	Website	No. of Records Obtained
Chicago	2015-01-01 to 2017-08-01	https://data.cityofchicago.org/Transportation/Taxi-Trips/wrvz-psew/data	170717
Washington DC	2016-01-01 to 2017-12-31	http://opendata.dc.gov/search?q=taxi	678485
Taipei	2017-01-01 to 2018-09-30	https://data.taipei/dataset/detail/metadata?id=63f31c7e-7fc3-418b-bd82-b95158755b4d	7374816
New York City	2017-01-01 to 2018-12-3	https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page	21380658

Events that are global in nature can be identified relatively easily with legacy methods such as aggregated time series analysis, because their impact is visible across the entire network. The challenging problem we address in this study is detecting events that are local in nature yet significant enough to affect ridership in the overall network. To benchmark our method on events where the legacy methods perform well, and to test it on the local events that interest us, we selected a set of global and significant local events for this study. The event types we considered are National Holidays, Cultural Events, Parades, Protests, and Extreme Weather.

The weather datasets were further processed to detect extreme weather conditions from weather readings. Days with temperature or precipitation above or below the threshold (1%) were marked as extreme weather conditions. For temperature, we also marked days more than 2 standard deviations above or below the rolling average of the previous 10 days as local extreme weather conditions. The summary of the aggregated events data is presented below.

National Holidays

Culture Events

Extreme Weather

Data Summary

City	Extreme Weather	National Holiday	Culture Event
Chicago	42	10	4
Washington DC	43	41	46
Taipei	63	30	5
New York City	49	21	18

Weather Data Source

City	Timeframe	Website
Chicago	2015-01-01 to 2017-08-01	https://www.wunderground.com/history/monthly/us/il/chicago/KORD/date
Washington DC	2016-01-01 to 2017-12-31	https://www.wunderground.com/history/monthly/us/dc/washington/KDCA/date
Taipei	2017-01-01 to 2018-09-30	https://www.wunderground.com/history/monthly/tw/songshan-district/RCSS/date
New York City	2017-01-01 to 2018-12-31	https://www.wunderground.com/history/daily/us/ny/new-york-city/KLGA/date
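The two extreme-weather rules described above can be sketched as follows. This is one plausible reading rather than the exact procedure used: it assumes the 1% threshold refers to the lowest and highest 1% of readings in the series, and that the standard deviation is computed over the same 10-day window as the rolling average. The function and parameter names are hypothetical.

```python
import numpy as np

def flag_extreme_weather(temps, tail=0.01, window=10, n_sigma=2.0):
    """Return (global_flags, local_flags) for a daily temperature series."""
    temps = np.asarray(temps, dtype=float)

    # Global rule (assumed reading of the 1% threshold): readings in the
    # lowest or highest `tail` fraction of the whole series.
    lo, hi = np.quantile(temps, [tail, 1.0 - tail])
    global_flags = (temps < lo) | (temps > hi)

    # Local rule: deviation of more than n_sigma standard deviations from
    # the mean of the previous `window` days. The first `window` days have
    # no full history and are left unflagged.
    local_flags = np.zeros(temps.shape, dtype=bool)
    for t in range(window, len(temps)):
        past = temps[t - window:t]
        mu, sigma = past.mean(), past.std()
        if sigma > 0 and abs(temps[t] - mu) > n_sigma * sigma:
            local_flags[t] = True
    return global_flags, local_flags
```

For precipitation only the global rule would apply, since the local rule is described for temperature alone.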

We created synthetic data primarily for two reasons.

► To inject artificial anomalies of different kinds for diagnosing the models.

► To generate large volumes of data to enable training of a deep autoencoder.

Since the data has strong correlations, we could not fit distributions and sample independently for each column. Instead, we performed PCA to extract independent latent variables and fitted Gaussian distributions to these variables. Data was then created by sampling from these distributions and using inverse PCA to transform the samples back into the network domain. Finally, the following types of anomalies were injected in the network domain:

► Global anomalies where the entire network witnesses a shift in the ridership.

► Balanced anomalies where some portions of the network experience shifts in ridership but the aggregated ridership is not affected on average.
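The generation and injection steps above can be sketched as follows, assuming the real data is a days × edges matrix. PCA is done with an SVD, each latent variable gets a zero-mean Gaussian with the observed standard deviation, and samples are mapped back with the inverse transform. The function names and the two injection helpers are hypothetical illustrations; the anomaly magnitudes and the choice of affected edges are assumptions.

```python
import numpy as np

def fit_pca_gaussian(X):
    """Fit PCA via SVD and a Gaussian per latent variable."""
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    # Standard deviation of each principal component score.
    score_std = S / np.sqrt(X.shape[0] - 1)
    return mean, Vt, score_std

def sample_synthetic(mean, components, score_std, n_samples, rng):
    """Sample independent latent variables and invert the PCA."""
    Z = rng.normal(0.0, score_std, size=(n_samples, len(score_std)))
    return Z @ components + mean

def inject_global(day, factor):
    """Global anomaly: scale ridership on every edge by the same factor."""
    return day * factor

def inject_balanced(day, idx_up, idx_down, amount):
    """Balanced anomaly: shift `amount` of ridership between two edge sets,
    leaving the network-wide total unchanged."""
    out = day.copy()
    out[idx_up] += amount / len(idx_up)
    out[idx_down] -= amount / len(idx_down)
    return out
```

Because the samples are Gaussian, synthetic ridership can become negative on low-volume edges, and a balanced shift can do the same. A practical generator would likely clip such values at zero.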

Conclusion

Experiments on real-world data showed that community detection outperforms spatial aggregation because it also considers the topological structure and connectivity of networks. Furthermore, time series analysis of daily aggregates does well at isolating anomalies that have a global impact, but it fails to isolate localized anomalies. Further experiments will yield a deeper diagnosis of the performance of these techniques. Experiments on synthetic data show that decomposition approaches (PCA and autoencoder) perform better than crude network aggregation. However, these experiments have not revealed any advantage of the autoencoder over PCA. This is plausible because autoencoders provide an advantage in modeling complex nonlinear relationships, whereas the data generation process was based on PCA and only had linear correlations between features. We will further refine the synthetic data generation process and try to inject different types of anomalies that disrupt distinctive spatial and temporal patterns at different scales. This will provide detailed diagnostics of the comparative capabilities of the different methodologies in isolating different kinds of anomalies.

Pattern and Anomaly Detection in Urban Temporal Networks

Organization: NYU CUSP & Lockheed Martin

Sponsor: Stan Sobolevsky & Sergey Malinchik

For more information

Team Members

Urwa Muaz

Prof. Stan

Shivam Pathak

Mingyi He

Jingtian Zhou

Saloni Saini

Acknowledgement
