iSGTW - International Science Grid This Week

iSGTW 2 December 2009

Feature - Finding a clue in a data-stack

Data-mining techniques can give law enforcement agencies the edge they need to catch criminals and stop terrorists, but they also raise concerns about respecting personal privacy and civil liberties. New research at Rutgers University has led to the creation of DI-HOPE KD, a suite of tools that integrates powerful data-mining tactics while guarding civil liberties and due process. Image courtesy Jupiter Images; caption information courtesy of the National Science Foundation.

Thanks to a distributed data analysis system under development at Rutgers University, New Jersey, the time it takes police to connect the dots between illicit activities across the globe could go from months to minutes.

Imagine that law enforcement officers in Los Angeles are investigating labs that manufacture the illegal drug methamphetamine. Meanwhile, investigators in Chicago are looking into the illegal sale of large quantities of Canadian-made pseudoephedrine, an over-the-counter medication often abused to make methamphetamine.

Chicago and L.A. investigators could unknowingly be working two ends of the same case, taking months to solve it. But with the DI-HOPE KD (Distributed Higher Order Privacy Enhanced Knowledge Discovery) system, that could change.

The Chicago-L.A. example is hypothetical, but according to William Pottenger, a Rutgers associate research professor, the scenario is based on a real incident. DI-HOPE KD will provide a virtual collaboration environment with a variety of tools to aid investigators, says Pottenger, who is also the director of transition for the Homeland Security Center of Excellence for Command, Control and Interoperability, based at Rutgers.

Once DI-HOPE KD is fully online, the system will use enormous amounts of data from multiple sources nationwide to turn up new leads on cases. In the meantime, the Rutgers group is testing the system’s algorithms on supercomputers that simulate a distributed environment, Pottenger says. While some parts of the DI-HOPE KD system are already available, the full system is expected to come online in the next five to ten years.

The first step in any investigation is to interview sources, collect information, and compare reports. To increase efficiency, DI-HOPE KD will take advantage of existing web interfaces through which investigators can directly input data for easier sharing and comparison.

Often there is so much data, however, that it is difficult to identify the important pieces, and quickly pruning out irrelevant data is a challenge. To address this, the Rutgers group is developing a set of ‘Higher Order Learning algorithms’ that mimic human intuition to identify important patterns and categorize the data into manageable groups. The pruned data can then be processed using data extraction algorithms that make an educated guess about the value of particular pieces of data — thus highlighting potential names, addresses and phone numbers that could be useful.
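As a rough illustration of the extraction step, the sketch below pulls candidate phone numbers out of free-text reports. The article does not describe the Rutgers group's actual algorithms; the regular expression, the `extract_phone_candidates` name, and the sample text are all assumptions made for this example.

```python
import re

# Hypothetical pattern for 10-digit phone numbers written with dashes,
# dots, or spaces. This is a stand-in, not the DI-HOPE KD extraction method.
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def extract_phone_candidates(report: str) -> list:
    """Return every substring of the report that looks like a phone number."""
    return PHONE_RE.findall(report)


# Invented sample text, for illustration only.
sample_report = "Witness gave two numbers: 312-555-0142 and 213.555.0199."
candidates = extract_phone_candidates(sample_report)
```

A pattern match like this only flags candidates; deciding which ones matter would still fall to the learning algorithms and to investigators.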

The next step is to make sense of the data. Normally, investigators meet in a room to search through huge amounts of information distributed across multiple databases for possible links. With DI-HOPE KD, however, investigators can participate remotely in a privacy-enhanced virtual environment, using the system to help connect the dots to solve the case.

In the hypothetical meth case, DI-HOPE KD would be able to connect the source in Canada, the broker in Chicago, and the lab in L.A., and alert investigators in these cities to potential links. The system would report solely the existence of a possible link; only investigators with the appropriate clearance would be able to see why the cases are linked.
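The privacy-enhanced reporting described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DI-HOPE KD design: the `CaseFile` type, the `find_links` and `report_link` functions, the idea of linking cases by shared identifiers, and all sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class CaseFile:
    """A hypothetical case record holding identifiers extracted from reports."""
    case_id: str
    city: str
    entities: FrozenSet[str]


def find_links(cases):
    """Return (case_a, case_b, shared_entities) for each pair sharing an entity."""
    links = []
    for i, a in enumerate(cases):
        for b in cases[i + 1:]:
            shared = a.entities & b.entities
            if shared:
                links.append((a.case_id, b.case_id, shared))
    return links


def report_link(link, has_clearance):
    """Report that a link exists; include the reasons only for cleared users."""
    case_a, case_b, shared = link
    report = {"cases": (case_a, case_b), "linked": True}
    if has_clearance:
        report["reasons"] = sorted(shared)
    return report


# Invented sample data, for illustration only.
la = CaseFile("LA-001", "Los Angeles",
              frozenset({"supplier:example-pharma", "phone:213-555-0199"}))
chicago = CaseFile("CHI-007", "Chicago",
                   frozenset({"supplier:example-pharma", "phone:312-555-0142"}))
other = CaseFile("NYC-003", "New York", frozenset({"phone:212-555-0100"}))

links = find_links([la, chicago, other])
```

Here an investigator without clearance would learn only that LA-001 and CHI-007 may be related, while a cleared investigator would also see the shared supplier that ties them together.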

“We’ve got a system that will use a number of technologies in data analytics to speed up investigations,” Pottenger says. “Supercomputers are critical to the development of this technology because without them you cannot scale (large enough to accurately test the tools).”

Amelia Williamson Smith, for iSGTW

