Journal 20140119
The week started with the O’Reilly Media Strata online conference webcast, Data, Crime, and Conflict: Can data and connectivity improve criminal justice and mitigate conflicts?
The talk titles included:
- Adaptive Adversaries: building systems to fight fraud and cyber intruders
- Big data: Connectivity and professional disaster management
- Software is Eating Bullets — Using Information to Empower Law Enforcement
- Waging Peace with Big Data and New Technologies
- Re-thinking conflict early warning: big data and systems thinking
This led to a brief exploration of GDELT just as it was going belly-up.
That in turn led me to the series of five 90-minute lectures on forecasting conflict using event data that Philip A. Schrodt gave in October 2013 at the Graduate School of Decision Sciences at the University of Konstanz. The lectures were videotaped and are available here. The lecture slides didn’t show up very well on the video, but you can find Acrobat versions here.
While I have a passing interest in predicting international conflict, I was mostly interested in the analysis/forecasting of event data. There is also a lot of material here on natural language processing (NLP) to extract events from news reports, but I just skimmed that material.
The lectures themselves are a little choppy, but it turns out much of the material is based on a draft book, Analyzing International Event Data: A Handbook of Computer-Based Techniques by Schrodt and his colleague Deborah J. Gerner, originally written in 2000, with the first three chapters updated in 2012. I ended up reading the book in conjunction with watching the lectures, and both the book and the lectures fell together much better that way.
I got a fair amount out of this material and enjoyed it, but I’m not sure it was worth the hours required. But hey, what else is a sabbatical for! Buried in there was a gem: a reference to a really good piece on an alternative way to understand and visualize rare event data, “The Separation Plot: A New Visual Method for Evaluating the Fit of Binary Models” by Brian Greenhill, Michael D. Ward, and Audrey Sacks. [American Journal of Political Science, 55:4, 990-1002, October 2011]. I’ll have more to say about this technique when I get around to writing a post on Receiver Operating Characteristic (ROC) curves. A version of the R software package to accomplish this is available here. I hope they port it to Python soon, but it really shouldn’t be that hard to create the visuals, even in Excel.
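To give a feel for why the visuals shouldn’t be hard to create, here is a minimal sketch of the core idea in plain Python. It is my own rough take, not the authors’ R package: the function names and the toy data are made up for illustration, and it renders a text strip rather than a real graphic. As I understand the technique, observations are sorted by fitted probability from low to high and the actual events are marked, so a well-fitting model pushes the events to the right-hand end.

```python
def separation_plot_data(probs, outcomes):
    """Sort observations by fitted probability, lowest first.

    Hypothetical helper: returns the sorted probabilities and the
    observed 0/1 outcomes rearranged into the same order.
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    return [probs[i] for i in order], [outcomes[i] for i in order]


def render_strip(sorted_outcomes, event="#", non_event="."):
    """Draw one character per observation: events dark, non-events light."""
    return "".join(event if y else non_event for y in sorted_outcomes)


# Made-up data: fitted probabilities from some binary model and the
# outcomes that were actually observed (1 = event occurred).
probs = [0.05, 0.80, 0.10, 0.65, 0.20, 0.90, 0.15, 0.30]
outcomes = [0, 1, 0, 1, 0, 1, 0, 0]

sorted_probs, sorted_outcomes = separation_plot_data(probs, outcomes)
strip = render_strip(sorted_outcomes)

# Sum of fitted probabilities = number of events the model expects.
expected_events = sum(probs)

print(strip)
print(f"expected events: {expected_events:.2f}")
```

With this toy data every event lands at the right-hand end of the strip, which is what perfect separation looks like. A graphical version would draw each sorted outcome as a thin vertical bar and overlay the sorted probabilities as a line, which is the part that should be doable in any plotting tool.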