Google Summer of Code 2008
"Google Summer of Code is a program that offers student developers stipends to write code for various open source projects. Google works with several open source, free software and technology-related groups to identify and fund several projects over a three-month period. Last year, the program brought together 900 students and 1500 mentors from 90 different countries to work on 130 open source projects."
The program's goals are to:
get more open source code created and released for the benefit of all
inspire young developers to begin participating in open source development
help open source projects identify and bring in new developers/committers
provide students the opportunity to do work related to their academic pursuits during the summer
give students more exposure to real-world software development scenarios
2008 Student Project Pages
GenMAPP GSoC 2008
As we did last year, we are pooling the efforts of our colleagues and collaborators for Google Summer of Code 2008. This is a great opportunity to work at the intersection of biology and computing. The GenMAPP organization will represent projects from Cytoscape, WikiPathways and PathVisio (see below). You'll notice that some of the projects aim to increase cross-talk among these related projects. We like to get the most out of open source software development!
GenMAPP
GenMAPP is a visualization and analysis tool for biological data. GenMAPP illustrates the relationships between various genes and proteins to help researchers understand their data in terms of connected biological pathways. Over 18,000 people from more than 70 countries have registered to download the GenMAPP program. The GenMAPP group is coordinated by the Conklin Lab at the Gladstone Institutes (University of California, San Francisco). There are 361 publications that reference GenMAPP or use it to display data in the context of biological pathways. GenMAPP is 100% open source, and all new development uses Java, MySQL, Derby, XML, and web technologies such as wikis, in collaboration with the UCSF Library, BiGCaT Bioinformatics, and the Cytoscape Consortium. Our development team is composed of individuals who are both biologists and programmers, providing a unique perspective on building and using open source tools. Links:
Website
Wiki
Cytoscape
Cytoscape is a general network visualization tool that integrates network topology with data about the network into the visualization. Cytoscape was developed in, and is mostly used by, the systems biology community. With over 2500 downloads per month, Cytoscape is rapidly becoming a standard within the community. Cytoscape consists of a core application and a plugin framework that users exploit to extend the functionality of the application in new ways. Our team consists of programmers and biologists from both academia and industry, including UC San Diego, UC San Francisco, U of Toronto, Agilent, Institute for Systems Biology, Unilever, Sloan-Kettering, Institut Pasteur, and others. Links:
Website
Wiki
Javadoc
WebStart
WikiPathways
WikiPathways is a wiki for biological pathways: it does for pathways what Wikipedia does for the encyclopedia. WikiPathways was born out of the need to extend and update the GenMAPP pathway collection, a laborious task that often requires domain-specific knowledge. The wiki approach allows biologists who have this domain knowledge to easily create or update a pathway. Pathways can be edited directly from a web browser using an editor applet, where you can draw genes, proteins and their interactions as in any popular drawing tool. The pathways can be used as images for publication and in data analysis tools such as GenMAPP and Cytoscape. There are currently about 500 pathways available, spread over 6 species, and over 100 users have already registered with WikiPathways. WikiPathways itself is completely open source and is built on top of MediaWiki, using PathVisio as its pathway editor. WikiPathways is developed and maintained by BiGCaT Bioinformatics (University of Maastricht) and the Conklin Lab at the Gladstone Institutes (University of California, San Francisco). Links:
Website
Source code
PathVisio
PathVisio is another pathway visualization tool. Like GenMAPP, it can display relations between genes, proteins and metabolites. PathVisio focuses more on pathway creation than on (microarray) data analysis. PathVisio started out as a test case for the development of GPML, an XML-based format for storing pathways. Currently, PathVisio plays an important role as the editor applet of WikiPathways. Links:
Website
API docs
Webstart
How to apply
We would like to know who you are and how you think. Incorporate the following into your application:
Your programming interests and strengths
What are your languages of choice?
Any prior experience with open source development?
What do you want to learn this summer?
Your interest and background in biology or bioinformatics
Any prior exposure to biology or bioinformatics?
Any interest in learning a bit of biology this summer?
Your ideas for a project (an original idea or one expanded from our Ideas Page)
Provide as much detail as possible
Strong applicants include an implementation plan and timeline (hint)
Refer to and link to other projects or products that illustrate your ideas
Identify possible hurdles and questions that will require more research/planning
What can you bring to the team?
Are you enthusiastic?
If you are selected
You will be working with a small, active group of programmers who also speak biology
You will be gaining experience in a rapidly evolving field at the interface of computer science and biology
You might make more than you would mowing lawns!
Resources
GenMAPP-GSoC-flyer.pdf - This year's flyer
Communication
Email: apico[AT]gladstone.ucsf.edu - contact me to find out more about a project or your potential mentor(s).
Discussion mailing lists:
cytoscape
wikipathways - ask about our projects; join the community!
Live voice chat on Skype: picoa1 - skype me to discuss projects and ideas.
Live virtual socials in Second Life!
Start your own blog! GSoC Planet
Blogs
For Students
For Mentors
Last Year's Page: GSoC 2007
Overview of Ideas
As we prototype new features and functions for GenMAPP, Cytoscape, WikiPathways and PathVisio, we are exploring a number of areas ideal for Google Summer of Code students. These projects span a broad set of skills, technologies and domains, such as Java GUIs, database integration, algorithms and wikis. Of course, you are also encouraged to propose your own ideas related to our projects. If you have solid CS skills and an interest in the biological domain (do you think genes are cool?), then you should apply!
IDEA 0: Original Idea (student applicants = 8)
Feel free to propose your own idea. As long as it relates to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers, but make sure your proposal is also relevant.
IDEA 1: Defining cellular location in PathVisio/WikiPathways (student applicants = 2)
In the current WikiPathways pathways, cellular locations are usually illustrated as a rectangle or ellipse that defines the boundaries of the location, in combination with a label that gives the name of the location (see 'mitochondrion' in http://www.wikipathways.org/index.php/Pathway:Homo_sapiens:Apoptosis for an example). Visually, it is perfectly clear that the genes within that boundary are located in the corresponding cellular location. Computationally, however, this is hard to derive unless the location is stored for each of the genes within the boundary. We can store this information as an attribute in GPML, but you wouldn't want to set it for each gene individually. It would be cool to do this automatically by providing a cellular location drawing tool. This is how it could work for the user: you draw a rectangle by dragging your mouse, all genes within that rectangle are highlighted, you release the mouse button, and a dialog pops up where you choose the cellular location. The end result would look the same as the current shape/label approach, but the cellular location would now be stored automatically as a GPML attribute for all enclosed genes. As an extra, you could choose the cellular locations from an existing ontology, such as Gene Ontology, and easily change the location's boundaries to include or exclude genes.
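The core of the proposed drawing tool is a containment test: normalize the dragged rectangle, collect the genes whose centers fall inside it, and tag each one with the chosen location. The sketch below is a minimal illustration in Python rather than the Java the project would use; the `DataNode` class and the `CellularLocation` attribute key are hypothetical stand-ins, not part of GPML or PathVisio, and testing node centers (rather than whole boxes) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical stand-in for a gene/protein box in a pathway."""
    label: str
    center_x: float
    center_y: float
    attributes: dict = field(default_factory=dict)


@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    @classmethod
    def from_drag(cls, x1, y1, x2, y2):
        # A drag can start at any corner, so normalize the two points.
        return cls(min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def nodes_in_region(nodes, region):
    """Nodes whose centers lie inside the region (the ones to highlight)."""
    return [n for n in nodes if region.contains(n.center_x, n.center_y)]


def assign_location(nodes, region, location, key="CellularLocation"):
    """Store the chosen location on every enclosed node.
    'CellularLocation' is a hypothetical key, not an existing GPML attribute."""
    enclosed = nodes_in_region(nodes, region)
    for node in enclosed:
        node.attributes[key] = location
    return enclosed


nodes = [DataNode("CYCS", 120, 80), DataNode("BAX", 300, 200), DataNode("CASP9", 150, 110)]
region = Rect.from_drag(200, 150, 100, 50)  # dragged from bottom-right to top-left
tagged = assign_location(nodes, region, "mitochondrion")
print([n.label for n in tagged])  # ['CYCS', 'CASP9']
```

Storing the location on each node, rather than only drawing a shape, is what makes the information computable; picking the term from Gene Ontology would only change the value passed as `location`.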
Language and Skills: Java applets, XML
Idea by: ThomasKelder Potential Mentors: ThomasKelder, AlexanderPico, BruceConklin, sign up here
IDEA 2: User right management for WikiPathways (student applicants = 1)
Like Wikipedia, everything on WikiPathways is currently publicly available to and editable by everybody. This is completely in line with the open philosophy of wikis, but not ideal in some situations. For example, imagine you're a scientist using pathways to structure your knowledge, hypotheses and preliminary results during research. You wouldn't want most of this information to be public before it's published in a scientific journal, because somebody else could run away with your findings. However, you will want to discuss your findings with lab members or close collaborators to get feedback and exchange ideas. WikiPathways would be ideal in this setting, since sharing would be as simple as sending the URL of the pathway around, and everybody could immediately make modifications or add comments. When you're ready to submit your paper, and you've created a nice pathway summarizing your findings, you would like to have the journal's editors review the pathway as well. If only you could specify who is allowed to see the pathway and who isn't...
To support this, we need to extend WikiPathways with the functionality to make pathways private and specify the users who are allowed to see them. A good real-life example is Google Documents, where you can make documents public or restrict access to whomever you choose. Introducing support for user groups would make it easier to share a pathway with the whole lab in a single mouse click.
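The access rule described above can be sketched as a single permission check: a pathway is either public or restricted to its owner, an explicit list of users, and any user groups it has been shared with. This is a language-neutral illustration written in Python; an actual implementation would live in PHP inside MediaWiki, and the `Pathway` fields and the `groups_of` lookup are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    """Hypothetical access-control record for one pathway."""
    title: str
    owner: str
    public: bool = True
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)


def can_view(pathway, user, groups_of):
    """Return True if `user` may see `pathway`.

    groups_of maps a user name to the set of groups that user belongs to."""
    if pathway.public or user == pathway.owner:
        return True
    if user in pathway.allowed_users:
        return True
    # Sharing with a whole lab is one group entry instead of many user entries.
    return bool(groups_of.get(user, set()) & pathway.allowed_groups)


groups = {"alice": {"conklin-lab"}, "bob": set()}
draft = Pathway("Draft apoptosis model", owner="carol", public=False,
                allowed_users={"journal-editor"}, allowed_groups={"conklin-lab"})
print([u for u in ("alice", "bob", "carol", "journal-editor") if can_view(draft, u, groups)])
# ['alice', 'carol', 'journal-editor']
```

In this sketch, a sharing dialog would only need to add entries to `allowed_users` or `allowed_groups`, and publishing the pathway would set `public` to True.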
Language and Skills: PHP, MediaWiki
Idea by: ThomasKelder Potential Mentors: ThomasKelder, AlexanderPico, BruceConklin, sign up here
IDEA 3: Advanced network merge for Cytoscape (student applicants = 4)
You can combine data from different networks in Cytoscape using the merge feature. This feature allows you to find the union, intersection and difference of two networks based on node identifiers. Comparing by identifier alone makes it hard to merge networks from different sources, since the identifiers often differ even when the nodes have the same biological meaning. For example, when you combine a pathway from PathwayCommons with a pathway from WikiPathways, you would like the same proteins (e.g. identified by the uniprot attribute) to end up in the same nodes, regardless of the node identifier. This may lead to conflicts, e.g. when you merge two nodes with different attribute values. So the challenge would be to merge attributes where possible and let the user decide whenever conflicts occur.
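One way to sketch attribute-based merging: key every node on a chosen attribute (such as `uniprot`) instead of its identifier, copy over attributes that are missing or agree, and collect disagreements for the user to resolve. This Python sketch assumes each network is a plain dict of node ID to attribute dict; it is not Cytoscape's merge API, the node IDs are made up, and falling back to the node ID when the attribute is missing is an illustrative assumption.

```python
def merge_by_attribute(net_a, net_b, key):
    """Union of two networks' nodes, matched on the value of attribute `key`.

    Each network is a dict: node_id -> {attribute: value}.
    Returns (merged, conflicts): merged maps the matching value to the
    combined attributes; conflicts lists (match, attribute, kept, rejected)."""
    merged = {}
    conflicts = []
    for network in (net_a, net_b):
        for node_id, attrs in network.items():
            match = attrs.get(key, node_id)  # no key attribute: fall back to the ID
            target = merged.setdefault(match, {})
            for attr, value in attrs.items():
                if attr not in target:
                    target[attr] = value  # new information, copy it over
                elif target[attr] != value:
                    # Keep the first value for now and let the user decide.
                    conflicts.append((match, attr, target[attr], value))
    return merged, conflicts


pathway_commons = {"n1": {"uniprot": "P10415", "label": "BCL2"}}
wikipathways = {
    "GeneProduct_7": {"uniprot": "P10415", "label": "Bcl-2"},
    "GeneProduct_9": {"uniprot": "Q07812", "label": "BAX"},
}
merged, conflicts = merge_by_attribute(pathway_commons, wikipathways, "uniprot")
print(sorted(merged))  # ['P10415', 'Q07812']
print(conflicts)       # [('P10415', 'label', 'BCL2', 'Bcl-2')]
```

Edges would then be remapped the same way, by rewriting each endpoint to the matching value of its node.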
Language and Skills: Java, Identifier mapping
Idea by: ThomasKelder Potential Mentors: ThomasKelder, GaryBader, sign up here
IDEA 4: Curation levels and the curators dashboard for WikiPathways (student applicants = 1)
Currently all pathways in WikiPathways are created equal. It would be useful to mark certain revisions of pathways as curated, meaning that they were reviewed by an independent curator (possibly more than one). Curation is important for the success of WikiPathways, so we don't want to make it a drag. To make curation hassle-free, we propose to create a dashboard page where curators can easily check and manage the flow of additions. For example, curators could view a list of recent changes for a selected set of pathways of interest. Each modification would be presented with enough information for the curator to quickly decide on the right course of action. For each modification, a curator could:
revert the modification
mark the new version "approved"
contact the author to ask a question
make additional modifications
copy the modification on to similar pathways that would also benefit from the change (e.g. a similar pathway in a related species, translating gene identifiers using a gene orthologue database)
The last item on the list is especially interesting. WikiPathways often has several very similar pathways for related species (e.g. mouse, rat and human). Keeping these pathways in sync poses an interesting challenge. Copying modifications from one pathway to another builds on the gpmldiff / gpmlpatch tools that were created for last year's GSoC.
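Copying a change to a related species comes down to translating its gene identifiers through an orthologue table and flagging anything that has no counterpart, so the curator can finish the job by hand. The Python sketch below uses a deliberately simplified, hypothetical change record and made-up identifiers; real changes would come from gpmldiff output, whose format is not reproduced here.

```python
def translate_change(change, orthologs):
    """Rewrite the gene IDs in one change for the target species.

    change: hypothetical record such as {"action": "add", "genes": [...]}.
    orthologs: dict source-species ID -> target-species ID.
    Returns (translated_change, untranslated_ids)."""
    translated, missing = [], []
    for gene in change["genes"]:
        if gene in orthologs:
            translated.append(orthologs[gene])
        else:
            missing.append(gene)  # no orthologue known: flag for the curator
    return {**change, "genes": translated}, missing


def propagate(change, targets):
    """Translate one change for every related pathway.

    targets: dict pathway name -> orthologue table for that pathway's species."""
    return {name: translate_change(change, table) for name, table in targets.items()}


change = {"action": "add", "genes": ["HUMAN_1", "HUMAN_2", "HUMAN_3"]}
targets = {
    "Apoptosis (mouse)": {"HUMAN_1": "MOUSE_1", "HUMAN_3": "MOUSE_3"},
    "Apoptosis (rat)": {"HUMAN_1": "RAT_1", "HUMAN_2": "RAT_2", "HUMAN_3": "RAT_3"},
}
for name, (translated, missing) in propagate(change, targets).items():
    print(name, translated["genes"], missing)
# Apoptosis (mouse) ['MOUSE_1', 'MOUSE_3'] ['HUMAN_2']
# Apoptosis (rat) ['RAT_1', 'RAT_2', 'RAT_3'] []
```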
Language and Skills: Java applets, PHP, MediaWiki
Idea by: MartijnVanIersel Potential Mentors: MartijnVanIersel, AlexanderPico, ThomasKelder, BruceConklin, sign up here
IDEA 5: WikiPathways webservice toolkit (student applicants = 0)
WikiPathways boasts a webservice for scripted access to the data on the site. Right now this webservice has functions for downloading and uploading pathways, but many more things are possible. We invite you to help develop the WikiPathways webservice further. We want to add functions to access components (genes, proteins, metabolites) of pathways, not just pathways as a whole. We want to add functions to query on metadata of pathways, such as categories, tags, comments, references and authors. And all of this within efficiency and security constraints, of course.
At the same time, we can already use the webservice to answer interesting questions. For example, it would be great to have a toolkit of scripts that answer questions about the current state of WikiPathways, such as:
For a certain species, what percentage of the genes listed in Ensembl is currently covered by a pathway?
Which Gene Ontology category is under-represented in the WikiPathways gene collection, possibly pointing to a gap in pathway coverage?
Which pathways fail to meet current curation standards (such as lack of references, datanodes of type "unknown", etc.)?
Which pathways contain a given set of genes or a given interaction?
What interactions exist for a given gene in any pathway?
Download all pathways that meet given filters.
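Several of these questions reduce to set operations once the pathway contents have been fetched. The Python sketch below answers two of them, genome coverage and which pathways contain a given gene set, over plain in-memory data; it deliberately does not call the webservice, whose function names are not shown here, and the gene IDs are made up.

```python
def pathway_coverage(genome_genes, pathways):
    """Percentage of a species' genes that appear in at least one pathway.

    genome_genes: gene IDs for the species (e.g. an Ensembl gene list).
    pathways: dict pathway name -> gene IDs found in that pathway."""
    genome = set(genome_genes)
    if not genome:
        return 0.0
    covered = set()
    for genes in pathways.values():
        covered.update(genes)
    return 100.0 * len(covered & genome) / len(genome)


def pathways_containing(pathways, query_genes):
    """Names of the pathways that contain every gene in query_genes."""
    query = set(query_genes)
    return sorted(name for name, genes in pathways.items() if query <= set(genes))


genome = ["g1", "g2", "g3", "g4"]
pathways = {"Apoptosis": ["g1", "g2"], "Cell cycle": ["g2", "g5"]}
print(pathway_coverage(genome, pathways))           # 50.0
print(pathways_containing(pathways, ["g2"]))        # ['Apoptosis', 'Cell cycle']
print(pathways_containing(pathways, ["g1", "g2"]))  # ['Apoptosis']
```

Genes found in pathways but missing from the genome list (here `g5`) are ignored by the coverage figure, which is one possible convention among several.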
Language and Skills: PHP, MediaWiki, API
Idea by: MartijnVanIersel Potential Mentors: MartijnVanIersel, AlexanderPico, ThomasKelder, BruceConklin, sign up here
IDEA 6: Flexible Node Shape and Edge Type Interface to Cytoscape (student applicants = 2)
Cytoscape can currently display a limited number of shapes and edge strokes; however, there is a need to allow end users to specify their own custom shapes and strokes. Among the challenges related to this project:
Maintaining our high-performance rendering speed.
Developing a clean API for programmers.
Integrating the custom shapes and strokes into the VizMapper.
This project would require comfort with low-level Java graphics.
Language and Skills: Java
Idea By: MikeSmoot Potential Mentors: MikeSmoot, sign up here
IDEA 7: Custom Graphics for Cytoscape (student applicants = 2)
Cytoscape needs the ability to annotate the existing network graphics with arbitrary custom graphics to further enhance the visualizations we produce. Among the challenges related to this project:
Creating a data model that integrates with the existing network and attribute models.
Creating IO tools to get the images/data into Cytoscape.
Possibly creating a library of pre-defined shapes.
Integrating the custom graphics into the VizMapper.
This project would require comfort with low-level Java graphics.
Language and Skills: Java, GUIs
Idea By: MikeSmoot Potential Mentors: MikeSmoot, sign up here
IDEA 8: Support undirected networks in Cytoscape (student applicants = 2)
While metabolic pathways, interacting proteins and other biological schemes can be represented as directed networks, some biological networks, e.g. coexpression networks, are undirected by nature. Currently, Cytoscape supports only directed networks. This project would add support for undirected networks to Cytoscape:
Enable import of undirected network files.
Handle the VizMapper representation of undirected edge endpoints.
Provide API to create undirected edges.
Add possibility to create undirected edges in the editor.
Make sure that core plugins work as expected, e.g. the merge plugin.
This idea is based on the RFC "Making undirected edges first-class citizens" by Daniel Abel.
Classification: medium if supporting a single edge type per network (directed or undirected); difficult if supporting mixed directed and undirected edges in one network.
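A minimal way to support mixed networks is to store undirected edges under an order-independent key, so that A-B and B-A name the same edge, while directed edges keep their (source, target) order. The Python class below is a hypothetical illustration of that data-model choice, not Cytoscape's API; to keep it short it also collapses parallel edges, which Cytoscape allows.

```python
class MixedGraph:
    """Directed and undirected edges in one network (parallel edges collapsed)."""

    def __init__(self):
        self.directed = set()    # (source, target) tuples
        self.undirected = set()  # frozenset({u, v}); a self-loop is frozenset({u})

    def add_edge(self, source, target, directed=True):
        if directed:
            self.directed.add((source, target))
        else:
            self.undirected.add(frozenset((source, target)))

    def has_edge(self, source, target):
        """True if one edge leads from source to target."""
        return ((source, target) in self.directed
                or frozenset((source, target)) in self.undirected)

    def neighbors(self, node):
        """Nodes reachable from `node` in one step."""
        result = {t for (s, t) in self.directed if s == node}
        for edge in self.undirected:
            if node in edge:
                others = edge - {node}
                result |= others if others else {node}  # self-loop
        return result

    def edge_count(self):
        return len(self.directed) + len(self.undirected)


g = MixedGraph()
g.add_edge("A", "B", directed=False)
g.add_edge("B", "A", directed=False)  # same undirected edge as A-B
g.add_edge("A", "C")                  # directed A -> C
print(g.edge_count())                              # 2
print(g.has_edge("B", "A"), g.has_edge("C", "A"))  # True False
print(sorted(g.neighbors("A")))                    # ['B', 'C']
```

Supporting only one edge type per network, the "medium" option above, would amount to using just one of the two sets.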
Language and Skills: Java
Idea By: MaitalAshkenazi Potential Mentors: MaitalAshkenazi , MikeSmoot, sign up here
IDEA 9: Generate random networks in Cytoscape (student applicants = 9)
Random network models are used to better understand real-world biological networks. To determine whether a phenomenon found in a biological network carries information or could arise at random, we can compare it with a random network model. There are several algorithms for generating random networks. Create a plugin that generates random networks based on various models:
Random networks with n nodes (Erdos-Renyi model)
Scale-free random network (Barabasi and Albert model)
Randomize existing network, preserving the degree of connectivity in the original network (different handling for directed and undirected networks)
Shuffle edge weights in a given network
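The four generators listed above are standard and short enough to sketch. The Python version below is illustrative only (a Cytoscape plugin would be written in Java): `erdos_renyi` is the G(n, p) variant, `barabasi_albert` implements preferential attachment by sampling from a list in which each node appears once per edge end, `degree_preserving_shuffle` uses double edge swaps and covers only the undirected case, and `shuffle_weights` permutes weights over a fixed topology. Function names and parameters are assumptions.

```python
import random


def erdos_renyi(n, p, rng):
    """G(n, p): each of the n*(n-1)/2 possible undirected edges
    is included independently with probability p."""
    return {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p}


def barabasi_albert(n, m, rng):
    """Preferential attachment: each new node links to m distinct existing
    nodes, picked with probability proportional to their degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    edges = set()
    targets = list(range(m))  # the first new node links to the m seed nodes
    repeated = []             # each node listed once per edge end it has
    for source in range(m, n):
        edges.update((t, source) for t in targets)
        repeated.extend(targets)
        repeated.extend([source] * m)
        chosen = set()
        while len(chosen) < m:  # uniform over `repeated` = degree-weighted
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


def degree_preserving_shuffle(edges, swaps, rng, max_tries=None):
    """Randomize an undirected network with double edge swaps,
    (a, b), (c, d) -> (a, d), (c, b), so every node keeps its degree.
    Swaps that would create self-loops or duplicate edges are rejected."""
    current = [tuple(e) for e in edges]
    present = {frozenset(e) for e in current}
    max_tries = max_tries or 100 * swaps
    done = tries = 0
    while done < swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(current)), 2)
        (a, b), (c, d) = current[i], current[j]
        if rng.random() < 0.5:
            c, d = d, c  # choose one of the two possible rewirings
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2 or new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        current[i], current[j] = (a, d), (c, b)
        done += 1
    return current


def shuffle_weights(weighted_edges, rng):
    """Keep the topology and permute the weights among the edges."""
    edges = list(weighted_edges)
    weights = [w for _, _, w in edges]
    rng.shuffle(weights)
    return [(u, v, w) for (u, v, _), w in zip(edges, weights)]


rng = random.Random(42)
network = barabasi_albert(50, 2, rng)
print(len(network))  # 96, i.e. (50 - 2) * 2 edges
randomized = degree_preserving_shuffle(network, 200, rng)
```

Passing a seeded `random.Random` instance into every generator keeps results reproducible, which matters when a random model is used as a statistical baseline.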
Language and Skills: Java, basic graph theory concepts
Idea By: MaitalAshkenazi Potential Mentors: MaitalAshkenazi , MikeSmoot, sign up here
IDEA 10: Cytoscape Network Look and Feel (student applicants = 1)
Enhancements to graphic design and interaction design for Cytoscape networks that would provide a more modern look and feel. These can include, but are not limited to:
gradient fill for nodes.
beveled borders and other enhanced node border graphics.
bundling of related edges.
enhanced mouse-over behaviors: e.g. show incident edges when hovering over a node, show connected nodes when hovering over an edge.
enhanced tooltip capability, e.g. via excentric labels
This project would require comfort with low-level Java graphics.
Language and Skills: Java, GUIs
Idea By: AllanKuchinsky Potential Mentors: BruceConklin, KristinaHanspers, sign up here
IDEA 11: Alternative Views and Navigation Aids for Large Cytoscape Networks (student applicants = 2)
Traditional node/link diagrams are not well suited for visualizing large networks, particularly those with many edges. This project would explore alternative network views and navigation aids for Cytoscape networks, which could include but are not limited to:
adjacency matrix view, possibly utilizing the InfoVis Toolkit.
focus+context techniques, such as magnifying lenses.
Note: all views are fully linked; changes in one view are immediately propagated to the others.
This project would require comfort with low-level Java graphics.
Language and Skills: Java, GUIs
Idea By: AllanKuchinsky Potential Mentors: ScooterMorris, KristinaHanspers, sign up here
IDEA 12: Enhanced Cytoscape editing (student applicants = 0)
Cytoscape currently has a rudimentary editor that implements a drag/drop mechanism and a palette configurable via the VizMapper. Currently, the palette can vary only the fill color and shape of nodes and the target arrow of edges. This project would extend the editor into a more professional diagram editor. Improvements could include, but are not limited to:
full support in palette for all visual properties for nodes and edges
grids and gridlines
'sticky' editor commands, repeat last action
layers for markup and background figures
configurable linkage to underlying gene/protein databases, e.g. drop a node from the palette and populate it with an entry from a gene/protein database
user-definable 'templates'
This project would require comfort with low-level Java graphics.
Language and Skills: Java, GUIs
Idea By: AllanKuchinsky Potential Mentors: KristinaHanspers, sign up here
IDEA 13: Interactive Graph Layout in Cytoscape (student applicants = 1)
We have built a Cytoscape plugin for interactive graph layout (BubbleRouter), wherein the user draws a region on the canvas, assigns an attribute value to that region, and the region is populated with the nodes in the current network that have that value for that attribute. Nodes are laid out in the region, and edges running between regions are arranged so as to minimize edge crossings between regions. An example attribute is sub-cellular compartment. This project would extend the BubbleRouter functionality. Example extensions could include, but are not limited to:
background graphics for regions, e.g. images of organelles
usage of different node layout algorithms within regions
global minimization of edge crossing
scripted automation for laying out multiple regions
saving/restoring region configurations as 'templates'
usage of alternative attributes beyond sub-cellular localization
This project would require comfort with low-level Java graphics and bioinformatics databases.
Language and Skills: Java, GUIs
Idea By: AllanKuchinsky Potential Mentors: ScooterMorris, AlexanderPico, BruceConklin, KristinaHanspers, sign up here
IDEA 14: Cytoscape Submolecular and Multipoint Data Visualization (student applicants = 0)
We have built a Cytoscape plugin for visualization of sub-molecular data and multi-point data (SubGeneViewer). This is a generalized viewer that can be specialized to support visualization of diverse data types. We have built two viewers, one each for (1) splice variation and (2) chromatographic fractionation data. This project would build visualizations for additional data types, for example protein domain data.
This project would require comfort with low-level Java graphics and bioinformatics databases.
Language and Skills: Java, GUIs
Idea By: AllanKuchinsky Potential Mentors: AlexanderPico, KristinaHanspers, sign up here
IDEA 15: Cytoscape Pathway Analysis and Extension (student applicants = 0)
Cytoscape has capabilities for generating new networks by extracting interactions from scientific text and by utilizing web service interfaces to biological pathway resources and biological interactions databases. These capabilities can be built upon to provide gene-list based pathway analysis and to extend networks. Example extensions could include but are not limited to:
input a list of genes and return a list of pathways in which the gene list is highly represented
select a set of nodes in a network and extend to neighbors via configurable lookup of interactions in a set of databases
select a pair of nodes and extend the network by filling in the shortest path of additional nodes that connect them.
extend the network with upstream transcription factors for selected nodes
extend the network with downstream promoters for selected transcription factor nodes
Note: this is closely related to IDEA 5 for WikiPathways. Should we consolidate the two?
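For the first item, one common approach (not necessarily the one this project would adopt) is to score each pathway with a one-sided hypergeometric test: how likely is an overlap at least this large if the gene list had been drawn at random from the same universe of genes? The Python sketch below computes that tail probability exactly with `math.comb` (Python 3.8+); the function names are hypothetical, the gene IDs are made up, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeometric_pvalue(overlap, pathway_size, list_size, universe_size):
    """P(X >= overlap) when list_size genes are drawn without replacement from
    universe_size genes, pathway_size of which belong to the pathway."""
    upper = min(pathway_size, list_size)
    hits = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(universe_size, list_size)


def rank_pathways(gene_list, pathways, universe):
    """(pathway, overlap, p-value) tuples, most enriched first.
    Genes outside the universe are ignored."""
    universe = set(universe)
    genes = set(gene_list) & universe
    results = []
    for name, members in pathways.items():
        members = set(members) & universe
        overlap = len(genes & members)
        p = hypergeometric_pvalue(overlap, len(members), len(genes), len(universe))
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


universe = [f"g{i}" for i in range(10)]
pathways = {"A": ["g0", "g1", "g2"], "B": ["g7", "g8", "g9"]}
ranked = rank_pathways(["g0", "g1", "g2"], pathways, universe)
print([name for name, _, _ in ranked])  # ['A', 'B']
```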
Language and Skills: Java, GUIs
Idea By: AllanKuchinsky Potential Mentors: AlexanderPico, BruceConklin, KristinaHanspers, sign up here
IDEA 16: Web Front End for Cytoscape (student applicants = 4)
Cytoscape offers a lot of functionality for a desktop user, but more and more scientific work and collaboration is done via the web. Many popular applications also offer both a web and a desktop version. Biologists find the web browser user interface an intuitive way to get things done. Since they are already familiar with web user interfaces, there is less of a learning curve to climb before they can use the application. Additionally, Cytoscape users often collaborate and need to share their work with colleagues who may not use Cytoscape. To allow Cytoscape users to use the web for their work, a basic web front end can be added to Cytoscape to provide simple access to a constrained set of features, such as:
import a network by retrieving it from a server, given a URL
perform some simple graph layout algorithm to generate node coordinate positions
display the network -- read only, no editing needed. We also assume relatively small networks initially.
lookup detailed information about nodes of interest -- via linkout or equivalent technology.
import attributes of interest into the browser client and map them to nodes in the network.
overlay data using a fixed visual style with fixed mappings and a simple spreadsheet-like interface. Selecting anywhere in a column of the spreadsheet causes the nodes in the network to light up according to their data values in the selected column.
Note: to minimize dependency upon the evolving Cytoscape core model, this project will focus upon web browser-side functionality.
Language and Skills: Java, AJAX, GWT, Flash
Idea By: AllanKuchinsky Potential Mentors: GaryBader, KristinaHanspers, sign up here
IDEA 17: Group Node Viewer (student applicants = 2)
Cytoscape does a great job of visualizing normal binary networks; however, a lot of data is organized hierarchically, in a nested fashion. Cytoscape has an API and data model that allow nodes to be grouped in such a nested fashion, but it currently has limited ability to view these groups. This project will implement new visualizations of groups of nodes, as described on the Cytoscape wiki at the Group Views page. Additional design and implementation work is needed to continue this project, which is already reasonably developed.
Language and Skills: Java
Idea By: GaryBader Potential Mentors: GaryBader, ScooterMorris, sign up here
IDEA 18: Consolidated Network Analysis Plugin for Cytoscape (student applicants = 4)
Cytoscape does not have basic network analysis tools in the core. Some other applications, such as Pajek, JUNG or igraph, have built-in network analysis tools like:
Basic statistics of the graph (degree distribution, etc.)
Generate graphs based on models (random/BA/small-world)
Find cliques
Find shortest paths
K-core decomposition
Some of these functions are already implemented as plugins. However, there is no unified user interface for them. A common interface (GUI) for accessing these functions would be very useful for users. This project includes adding an interface to existing plugins and porting/wrapping analysis tools from other open-source projects.
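Two of the listed statistics are small enough to sketch directly. The Python code below computes a degree distribution and core numbers for an undirected edge list; the peeling loop is the textbook k-core procedure written for clarity rather than speed (the Batagelj-Zaversnik bucket algorithm does the same in linear time). It is illustrative only, not code from an existing Cytoscape plugin or from Pajek, JUNG or igraph.

```python
from collections import Counter, defaultdict


def degree_distribution(edges):
    """Map degree -> number of nodes with that degree (undirected edges)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(Counter(degree.values()))


def core_numbers(edges):
    """k-core decomposition by repeatedly removing a lowest-degree node.
    A node's core number is the largest k such that it belongs to a
    subgraph in which every node has degree >= k."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # self-loops do not count toward cores here
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])  # core numbers never decrease while peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]  # triangle plus a pendant
print(degree_distribution(edges))         # {3: 1, 2: 2, 1: 1}
print(sorted(core_numbers(edges).items()))  # [('a', 2), ('b', 2), ('c', 2), ('d', 1)]
```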
Language and Skills: Java
Idea By: KeiOno Potential Mentors: MikeSmoot, KeiOno, MaitalAshkenazi, sign up here
IDEA 19: Improved Network Layouts (student applicants = 3)
One of Cytoscape's key features is the visualization of biologically relevant data that can be expressed as node-link diagrams. A key aspect of that visualization is the layout algorithm that is used to place the nodes and edges on the graph. Cytoscape has a number of layout algorithms, but most of them are not adequate for large or complex networks. New graph layout algorithms that provide aesthetic layouts and are fast enough for large networks would be extremely useful for the biological community.
Language and Skills: Java, strong background in Graph Theory
Idea By: ScooterMorris Potential Mentors: MikeSmoot, sign up here
IDEA 20: Mapping Multiple Attributes to a Visual Style (student applicants = 2)
Currently in Cytoscape, single attributes or columns of values are mapped to particular visual properties. For example, if you have a table of gene IDs with gene expression data as fold changes and p-values, you might map the fold change to the fill color of each node representing a gene and map the p-value to the size of the node. Ideally, you could use boolean logic to take into consideration both fold change and p-value (e.g., fold change > 5 AND p-value < 0.01) and map the result to a single visual property, e.g., fill color. You might also have dozens of columns of data representing a time course or multiple conditions. Ideally, you could view all of these values simultaneously mapped to stripes of fill color, for example. The idea behind this project proposal is to develop a series of attribute mapping tools that work on top of the current methods to support the following needs:
User-defined boolean logic to map combinations of attributes to a single visual property
Specialized paint methods to represent the multiple attributes simultaneously, e.g.:
Striped fill color (horizontal or vertical)
Rim and core coloring (within the fill area, where the "rim" is not the border; e.g., a bullseye on circular nodes)
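Both needs can be sketched in a few lines: a rule built from (attribute, operator, threshold) conditions, combined with AND or OR and mapped to one fill color, and a per-column color ramp for striped fills. The Python below is a hypothetical illustration, not the VizMapper API; the attribute names, colors and expression values are made up.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}


def make_rule(*conditions, combine=all):
    """Build a predicate from (attribute, operator, threshold) triples,
    joined with AND (combine=all) or OR (combine=any)."""
    def rule(attrs):
        return combine(
            name in attrs and OPS[op](attrs[name], threshold)
            for name, op, threshold in conditions
        )
    return rule


def map_fill_color(nodes, rule, hit_color, miss_color):
    """Map the result of the boolean rule to a single visual property."""
    return {node: (hit_color if rule(attrs) else miss_color)
            for node, attrs in nodes.items()}


def stripe_colors(values, low, high, low_color, high_color):
    """One RGB color per column for a striped fill: linear interpolation
    between two colors, with values clamped to [low, high]."""
    stripes = []
    for v in values:
        t = min(max((v - low) / (high - low), 0.0), 1.0)
        stripes.append(tuple(round(a + t * (b - a)) for a, b in zip(low_color, high_color)))
    return stripes


nodes = {
    "TP53": {"fold_change": 6.2, "p_value": 0.004},
    "MDM2": {"fold_change": 7.0, "p_value": 0.2},
    "CDKN1A": {"fold_change": 1.1, "p_value": 0.001},
}
rule = make_rule(("fold_change", ">", 5), ("p_value", "<", 0.01))
print(map_fill_color(nodes, rule, "red", "grey"))
# {'TP53': 'red', 'MDM2': 'grey', 'CDKN1A': 'grey'}
print(stripe_colors([0, 5, 10], 0, 10, (0, 0, 255), (255, 0, 0)))
# [(0, 0, 255), (128, 0, 128), (255, 0, 0)]
```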
Language and Skills: Java, GUIs
Idea By: AlexanderPico Potential Mentors: AlexanderPico, KristinaHanspers, sign up here
IDEA 21: Automatic (Smart) Node and Edge Label Layout (student applicants = 2)
Cytoscape currently has a large number of layout algorithms that place nodes according to various criteria. However, a common problem with all of these layout algorithms is that node and edge labels are not accounted for in the aesthetic criteria of the algorithm. This means that labels frequently end up in awkward locations where they are hard to read, overlap or obscure other labels, and otherwise don't look quite right. As a consequence, users are frequently forced to adjust the positions of labels by hand, which is a time-consuming and tedious process. To fix this, we propose developing a layout algorithm for labels. Perhaps this algorithm could be integrated with a normal layout algorithm, or perhaps it could be a subsequent step that lays the labels out in an intelligent fashion once the nodes have been placed.
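The second option, a pass that runs after node layout, can be sketched greedily: try a few candidate slots around each node and keep the first one that collides with neither a node nor an already placed label. This Python sketch assumes square nodes of one size, screen coordinates with y growing downward, and rectangles given as (x, y, width, height); the function names are hypothetical, and a real algorithm would also consider edge labels and edges.

```python
def overlaps(a, b):
    """True if two axis-aligned rectangles (x, y, width, height) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_labels(nodes, label_sizes, node_size=20, gap=2):
    """Greedy post-layout label placement.

    nodes: dict name -> (x, y) node center, after the node layout has run.
    label_sizes: dict name -> (width, height) of the rendered label.
    Keeps the first candidate slot that overlaps no node and no placed label;
    falls back to the first slot when every candidate collides."""
    half = node_size / 2
    obstacles = [(x - half, y - half, node_size, node_size) for x, y in nodes.values()]
    placed = {}
    for name, (x, y) in nodes.items():
        w, h = label_sizes[name]
        candidates = [
            (x - w / 2, y + half + gap, w, h),      # below
            (x - w / 2, y - half - gap - h, w, h),  # above
            (x + half + gap, y - h / 2, w, h),      # right
            (x - half - gap - w, y - h / 2, w, h),  # left
        ]
        choice = next((c for c in candidates
                       if not any(overlaps(c, o) for o in obstacles)), candidates[0])
        placed[name] = choice
        obstacles.append(choice)  # later labels must avoid this one too
    return placed


nodes = {"A": (0, 0), "B": (0, 30)}
sizes = {"A": (40, 10), "B": (40, 10)}
print(place_labels(nodes, sizes))
# {'A': (-20.0, -22.0, 40, 10), 'B': (-20.0, 42.0, 40, 10)}
```

Here node A's label moves above the node because the default slot below would cover node B.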
Language and Skills: Java, graph layout algorithms
Idea By: MikeSmoot Potential Mentors: MikeSmoot, sign up here
IDEA 22: Wiimote for Cytoscape (student applicants = 1)
MolViz (http://molviz.cs.toronto.edu/molviz/) is a project to create innovative new interfaces for molecular visualization programs. For Summer of Code '08, the project aims to mentor a student in integrating Nintendo Wiimote control and gestures into the Cytoscape (http://www.cytoscape.org/) program. Simple interaction can be achieved by using the Wiimote as a mouse controller, but more advanced forms of Wiimote control over the Cytoscape interface can be realized over the summer. The student would be responsible for bringing interesting ideas to the table and incorporating feedback from the Cytoscape community into the project.
Language and Skills: Python, PyMol
Idea By: ChristianMuise Potential Mentors: ChristianMuise, sign up here
GenMAPP Wiki