Enhanced search strategy in Cytoscape by Maital Ashkenazi
Bioinformatics application note, Abstract
Bioinformatics application note, PDF
Original proposal abstract
Maital's homepage and contact info
Download Cytoscape ESP
ESP can be downloaded from the Cytoscape plugins site, under Analysis -> EnhancedSearch. Please download version 1.1, which is the latest.
Download the JAR file and place it in the cytoscape/plugins directory. Launch Cytoscape and load a network to activate the ESP search interface.
ESP can also be installed using Cytoscape's plugin manager (applies to Cytoscape versions 2.5 and 2.6).
About Cytoscape ESP
Synopsis
Cytoscape ESP (Enhanced Search Plugin) is an advanced search interface for Cytoscape. It enables searching complex biological networks on multiple attribute fields using logical operators and wildcards. Queries use an intuitive syntax and a simple search-line interface. ESP is implemented as a Cytoscape plugin and complements existing search functions in the Cytoscape network visualization and analysis software, allowing users to easily identify nodes, edges and subgraphs of interest, even for very large networks.
Cytoscape ESP is a Google Summer of Code project, mentored by David J. States, Gary Bader and Allan Kuchinsky and coordinated by Alexander Pico.
Query syntax
A query comprises search terms and operators. Search terms can be a single word such as "aquaporin", or a phrase surrounded by double quotes such as "water channel". Restricting the search to a specific attribute field is performed by placing the attribute name before the search term, followed by a colon, e.g. "compartment:tonoplast". If no attribute field is specified, all attribute fields are searched. Boolean operators (AND, OR, NOT) can be used to combine search criteria to help narrow the search. Parentheses may be used to control the Boolean logic of a query or to group multiple search criteria on a single attribute field. Inclusive or exclusive range queries allow matching of values between lower and upper bounds. Single and multiple character wildcard searches are supported.
| Search Criteria | Example |
| Single term | aquaporin |
| Phrase | "water channel" |
| Restrict by attribute field | gene_title:aquaporin |
| Both terms must exist | transcription AND factor |
| At least one of the terms must exist | transcription OR factor |
| First term must exist but second must not | transcription NOT factor |
| Specify a required term | +transcription factor |
| Prohibit a term | transcription -factor |
| Single character wildcard | prot?in |
| Multiple character wildcard | HSP* |
| Inclusive range | degree:[1 TO 3] |
| Exclusive range | degree:{1 TO 3} |
| Control Boolean logic | (intrinsic OR integral) AND membrane |
| Group terms to a single field | gene_title:(+60S +"ribosomal protein") |
| Escape special characters | \(1\+1\)\:2 |
http://lucene.apache.org/java/docs/queryparsersyntax.html
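The last row of the table escapes characters that otherwise act as query operators. The following is a minimal sketch of that escaping step in plain Java, so that it runs without the Lucene library. The class name `QueryEscaper` and the exact character list are assumptions made for illustration and are not taken from the ESP source; Lucene's own query parser also provides an escape helper that would normally be preferred.

```java
/** Hypothetical helper (not part of ESP): backslash-escapes query operators. */
final class QueryEscaper {
    // Assumed set of characters with special meaning in the query syntax.
    private static final String SPECIAL = "\\+-!(){}[]^\"~*?:&|";

    /** Returns the literal with every special character preceded by a backslash. */
    static String escape(String literal) {
        StringBuilder out = new StringBuilder(literal.length() * 2);
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the escaped form used in the table: \(1\+1\)\:2
        System.out.println(escape("(1+1):2"));
    }
}
```

Terms without special characters, such as aquaporin, pass through unchanged.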
Technical description
ESP is written in Java and based on the high-performance open-source Lucene information retrieval library (http://lucene.apache.org/).
ESP creates an internal index of all attributes within the network using Lucene. The first time a query is issued against a network, all attributes are indexed automatically and then the search is performed. To keep subsequent queries responsive, the index is reused for as long as the network is not modified. To support Java Web Start, the Lucene index is stored in memory.
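The index lifecycle described above (build on first query, reuse until the network changes) can be sketched as a small in-memory cache. This is an illustration under stated assumptions, not ESP's actual code: the class `NetworkIndexCache`, its method names, and the use of a string network identifier are all hypothetical, and the index type is left generic so that the sketch runs without Lucene.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch (not ESP's code): one lazily built in-memory index per network. */
final class NetworkIndexCache<I> {
    private final Map<String, I> indexes = new HashMap<>();
    private final Function<String, I> indexBuilder;

    /** indexBuilder is assumed to index all node and edge attributes of a network. */
    NetworkIndexCache(Function<String, I> indexBuilder) {
        this.indexBuilder = indexBuilder;
    }

    /** Returns the cached index, building it only if none exists for this network. */
    I indexFor(String networkId) {
        return indexes.computeIfAbsent(networkId, indexBuilder);
    }

    /** Drops the index so that the next query re-indexes the modified network. */
    void networkModified(String networkId) {
        indexes.remove(networkId);
    }
}
```

A caller would invoke `indexFor` before each search and `networkModified` from whatever listener reports changes to a network, so that repeated queries on an unchanged network pay the indexing cost only once.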
Lucene treats all attribute field values as strings. To support range queries on numerical attribute fields we transformed numerical values into structured strings using Solr’s NumberUtils package (http://lucene.apache.org/solr/), preserving their numerical sorting order. A custom MultiFieldQueryParser is used to parse queries containing numeric values. Attribute fields with string or list values are tokenized with Lucene’s StandardAnalyzer.
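To illustrate why the transformation is needed: compared as plain strings, "10" sorts before "9", so a range query such as degree:[1 TO 3] would match the wrong values. The sketch below shows one way to encode integers so that string order equals numeric order. It is not the encoding used by Solr's NumberUtils; the class `SortableNumber` and the fixed-width hexadecimal format are assumptions chosen for clarity.

```java
/** Hypothetical sketch: order-preserving string encoding of int values. */
final class SortableNumber {
    /**
     * Flips the sign bit so that negative values sort before positive ones,
     * then writes the result as 8 zero-padded hexadecimal digits.
     */
    static String encode(int value) {
        return String.format("%08x", value ^ Integer.MIN_VALUE);
    }

    /** Reverses encode(): parses the hex digits and flips the sign bit back. */
    static int decode(String encoded) {
        return Integer.parseUnsignedInt(encoded, 16) ^ Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        // Plain decimal strings sort "10" before "9"; the encoded forms do not.
        System.out.println("10".compareTo("9") < 0);
        System.out.println(encode(9).compareTo(encode(10)) < 0);
    }
}
```

Because every encoded value has the same width, a lexicographic range over the encoded strings selects exactly the values inside the numeric range.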
To accommodate Lucene’s constraint of one-word attribute fields, whitespace in attribute names is replaced with underscores during indexing. In future ESP versions, attribute name autocompletion will handle this replacement.
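As a small illustration of this replacement, an attribute named "gene title" would be queried as gene_title:aquaporin. The helper below is a sketch only: the class `AttributeNames` and its methods are hypothetical, and replacing each whitespace character individually (rather than collapsing runs) is an assumption.

```java
/** Hypothetical sketch: maps attribute names to one-word index field names. */
final class AttributeNames {
    /** Replaces every whitespace character with an underscore (assumed behavior). */
    static String toFieldName(String attributeName) {
        return attributeName.replaceAll("\\s", "_");
    }

    /** Builds a field-restricted query term such as gene_title:aquaporin. */
    static String fieldQuery(String attributeName, String term) {
        return toFieldName(attributeName) + ":" + term;
    }
}
```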
Future features
Autocompletion of attribute field names.
History of previous searches.
Milestones
May 9, 2007 The working environment is set: Eclipse is installed, Cytoscape and Quick Find are checked out, compiled and running. I'm able to make changes and test them.
May 28, 2007 The first month of this project was dedicated to getting to know my mentors, becoming acquainted with Cytoscape internals, setting up the working environment and fine-tuning my work plan, which has changed several times since the original application. We decided to use Lucene, an open-source text search engine library, as the main engine behind Enhanced Search. The coding period starts, and the first task is to create a plugin that serves as a prototype for the Enhanced Search functionality.
June 13, 2007 The basic framework is ready. We have a plugin that complies with Cytoscape 2.5 standards. It creates a menu item under the Plugins menu, displays a search box and gets the user query. The next step is to use Lucene to parse the query.
June 27, 2007 Search is carried out using Lucene. Simple search queries were successfully tested. Indexing was done for nodes only. The index is held in memory.
July 4th, 2007 Complex search queries involving several attributes, logical operators and wildcards were tested successfully with node attributes. Indexing is performed on both nodes and edges, but we still need to check mixed queries with both node and edge attributes. The problem of whitespace in attribute names was solved by replacing whitespace with underscores. Attribute names are case sensitive; attribute values are not.
July 21-25, 2007 Visiting and presenting at ISMB/ECCB 2007, Vienna.
August 1st, 2007 The ESP plugin is compatible with the Cytoscape plugin manager. The search line is placed on the main toolbar. Search results are highlighted on the network. Mixed queries on both node and edge attributes are supported. JUnit tests were created, as well as an Ant task to execute them.
August 22, 2007 Results are collected in a HitCollector, leading to faster display of search results. A progress bar was added. ESP was introduced to the Cytoscape community. Range queries are supported. Queries are case insensitive.
September 9, 2007 Solved the problem of numerical values being sorted lexicographically by using Solr’s NumberUtils and a custom MultiFieldQueryParser for parsing queries containing numeric values.
October 10, 2007 More responsive querying is achieved by indexing a network the first time a query is executed and reusing this index until the network is modified. Multiple networks and indexes are supported. ESP is enabled when a network is loaded and disabled when there are no networks. Search is deliberately enabled on networks with no view, to support searching of large networks without the need to create a view for them.
November 7, 2007 ESP functionality was demonstrated at the 5th Annual Cytoscape Retreat and Symposium, Amsterdam. ESP was made available for download from the Cytoscape Plugins web site.
November 14, 2007 An Application Note describing ESP functionality was submitted to Bioinformatics.