iSGTW - International Science Grid This Week



Technology - Weka4WS: distributed data mining using web services

The Waikato Environment for Knowledge Analysis, or WEKA, is software developed at the University of Waikato, New Zealand. It gets its four-letter acronym from New Zealand’s native weka, flightless brown birds about the size of a chicken.

Released in June 2007, Weka4WS is a new tool designed to open the way for worldwide use of data mining services.

Developed at the University of Calabria Grid Computing Lab, Weka4WS extends the open-source Weka toolkit to support distributed data mining in grid environments.

The original Weka provides a large collection of machine learning algorithms, written in Java, for data pre-processing, classification, clustering, association rules and visualization, which can be invoked through a common Graphical User Interface.

In Weka, the overall data mining process takes place on a single machine, since the algorithms can be executed only locally. Weka4WS extends Weka to support remote grid execution of the data mining algorithms through web services—hence the 4WS.

In this way, distributed data mining algorithms for classification, clustering and association rules can be concurrently executed on decentralized grid nodes.
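The idea of running several mining tasks at once on different nodes can be illustrated with a minimal Java sketch. It is not Weka4WS code and does not use the Weka or Globus APIs: the class `ConcurrentMiningSketch`, the method `runOnNode`, and the node names are hypothetical, and the "remote call" is a local stand-in that only returns a status string. A real client would replace it with a web service invocation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentMiningSketch {

    // Hypothetical stand-in for a remote mining call. A real client
    // would invoke a web service on the given node here.
    static String runOnNode(String node, String algorithm) {
        return algorithm + " finished on " + node;
    }

    // Submits one task per node so the tasks can run concurrently,
    // then collects each result keyed by node name.
    static Map<String, String> runAll(Map<String, String> nodeToAlgorithm) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, nodeToAlgorithm.size()));
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : nodeToAlgorithm.entrySet()) {
                final String node = entry.getKey();
                final String algorithm = entry.getValue();
                Callable<String> task = () -> runOnNode(node, algorithm);
                pending.put(node, pool.submit(task));
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                // get() blocks until that node's task has completed.
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Node names are placeholders, not real grid hosts.
        Map<String, String> jobs = new LinkedHashMap<>();
        jobs.put("node-a.example.org", "classification");
        jobs.put("node-b.example.org", "clustering");
        jobs.put("node-c.example.org", "association rules");
        runAll(jobs).forEach((node, result) -> System.out.println(result));
    }
}
```

The thread pool here only models the client side: each task waits on its own node, so a slow node does not hold up the submission of work to the others.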

To enable remote invocation, all the data mining algorithms provided by the Weka library are exposed as a web service, which can be easily deployed on available grid nodes.

Weka4WS also extends the Weka GUI so that users can invoke the data mining algorithms exposed as web services on remote grid nodes.
The extended Knowledge Flow: Still under development, this component will allow execution of data mining workflows over multiple grid machines.
Image courtesy of Weka4WS

Grid integration

To achieve integration and interoperability with standard grid environments, Weka4WS is built on the Web Services Resource Framework (WSRF) as its enabling technology.

In particular, Weka4WS was developed using the WSRF Java library provided by Globus Toolkit 4.

The current version of Weka4WS (1.0, released 7 June 2007) is based on the latest version of Weka (3.4.11, released 1 June 2007) and extends the Weka Explorer component. It runs on *nix platforms and requires Globus Toolkit 4 on both client and server nodes.

The development team is currently working on a new version that will include an extension of the Knowledge Flow component for grid-enabled data mining workflows, as well as support for running the client on any platform (including, for example, Microsoft Windows).

Weka4WS is partially funded by CoreGRID, an EU Network of Excellence on Peer-to-Peer and Grid technologies. Weka4WS is freely downloadable.

- Domenico Talia, University of Calabria, Italy


