Instructions on how to submit a request and view the results

What is FlyCAGE?

FlyCAGE is a web-accessible application that conducts integrative analysis of Drosophila melanogaster mRNA expression data to discover genes with correlated spatiotemporal expression. We predict that genes showing this correlation may be co-regulated or share similar biological functions. FlyCAGE therefore allows us to identify candidate genes involved in a given process, to discover regulatory interactions in genetic networks, and to verify the quality of gene clustering techniques. The general analysis proceeds as follows: 1) submit an input target gene, 2) compute the Pearson correlation coefficient between the target gene and every available Drosophila melanogaster gene, 3) rank each gene by its correlation coefficient.
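The three analysis steps can be illustrated with a minimal sketch. This is not FlyCAGE's actual implementation: the function names, the gene labels, and the expression values below are all made up for illustration, and the sketch simply assumes each gene has one expression value per stage.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a profile is constant
    return cov / math.sqrt(var_x * var_y)

def rank_by_correlation(target, profiles, top_n=100):
    """Return (gene, r) pairs sorted by r from greatest to least."""
    scored = [(gene, pearson_r(target, values)) for gene, values in profiles.items()]
    scored = [(gene, r) for gene, r in scored if not math.isnan(r)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical expression values, one per expression stage.
target = [1.0, 2.0, 3.0, 4.0]
profiles = {
    "geneA": [2.0, 4.0, 6.0, 8.0],  # rises with the target
    "geneB": [4.0, 3.0, 2.0, 1.0],  # falls as the target rises
    "geneC": [1.0, 3.0, 2.0, 4.0],  # partially follows the target
}
ranking = rank_by_correlation(target, profiles)
```

In this toy data, geneA ranks first, geneC second, and geneB last, mirroring the descending r-value order used on the results page.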

How to Submit a Request

This section walks through each input field of an analysis request.

Input: Select Species

The first input field (required) is "Select Species". Currently, the only option available is "D. melanogaster". In the future, we hope to expand CAGE to other species.

Input: Target Gene Information

The second input field (required) is "Target Gene Information". There are two different forms you may fill out: "Existing Gene Form" and "Custom Gene Form". As the name suggests, the "Existing Gene Form" requires an existing gene identifier. The input can be the full gene name (e.g. "ultrabithorax"), a gene synonym (e.g. "ubx"), a FlyBase ID (e.g. "FBgn0003944"), or a CG annotation symbol (e.g. "CG10388").
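As a rough illustration of how the four identifier formats differ, the sketch below classifies an input string by pattern. The patterns are inferred only from the examples above ("FBgn0003944", "CG10388"); how FlyCAGE itself resolves identifiers is not described here, and the function name and return labels are hypothetical.

```python
import re

# Patterns assumed from the examples in the text, not taken from FlyCAGE.
FLYBASE_ID = re.compile(r"FBgn\d{7}")
CG_SYMBOL = re.compile(r"CG\d+")

def classify_identifier(text):
    """Guess which identifier format a user-supplied string follows."""
    token = text.strip()
    if FLYBASE_ID.fullmatch(token):
        return "flybase_id"
    if CG_SYMBOL.fullmatch(token):
        return "cg_symbol"
    # Full names and synonyms cannot be told apart by pattern alone.
    return "name_or_synonym"
```

For example, classify_identifier("FBgn0003944") gives "flybase_id", while "ubx" and "ultrabithorax" both fall through to "name_or_synonym".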

The "Custom Gene Form" may be used to input experimental data or to find genes that match an expression pattern of interest. The form is organized by expression stage categories: Embryogenesis, Development, Tissue, Treatment, and Cell Line. Only expression stages that are checked will be included in the custom data object. Each expression stage may take any value from 0 upward, entered with the textbox or the slider. The slider only covers 0 to 100, which is sufficient because the correlation analysis is only concerned with relative expression values. In the future, we hope to add a feature to submit custom expression stage data via an uploaded file.
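Because the Pearson correlation coefficient is unchanged when every value of a profile is multiplied by the same positive constant, measurements on any scale can be brought into the slider's range without affecting the ranking. The sketch below shows one way to do that; the function name and stage labels are hypothetical and not part of FlyCAGE.

```python
def rescale_to_slider(profile, slider_max=100.0):
    """Scale non-negative values so that the largest one equals slider_max."""
    if any(value < 0 for value in profile.values()):
        raise ValueError("expression values must be non-negative")
    peak = max(profile.values(), default=0.0)
    if peak == 0:
        return dict(profile)  # nothing to scale
    return {stage: value * slider_max / peak for stage, value in profile.items()}

# Hypothetical stage labels and measurements.
measured = {"stage_1": 250.0, "stage_2": 1000.0, "stage_3": 0.0}
scaled = rescale_to_slider(measured)
```

The relative shape of the profile is preserved: stage_2 remains four times stage_1 after scaling.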

Input: Genes of Interest

The third input field (optional) is "Genes of Interest". In this field, list the genes whose correlation with the target gene you specifically want to see. The input is a comma-delimited list of genes, each in any of the formats accepted by the "Existing Gene Form" above (e.g. "grh, FBgn0000577, spalt major, CG1374"). The identifier type does not need to be the same across entries.
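A minimal sketch of how such a comma-delimited list might be split into individual entries follows. The function name is hypothetical, and trimming whitespace and skipping empty entries are assumptions rather than documented FlyCAGE behavior.

```python
def parse_gene_list(raw):
    """Split a comma-delimited string into trimmed, non-empty gene entries."""
    genes = []
    for token in raw.split(","):
        gene = token.strip()
        if gene:  # skip empty entries such as a trailing comma
            genes.append(gene)
    return genes

entries = parse_gene_list("grh, FBgn0000577, spalt major, CG1374")
```

Note that splitting on commas keeps multi-word names such as "spalt major" intact.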

Input: Choose gene result count

The fourth input field (required) is "Choose gene result count". In this field, input the number of correlated genes to display on the results page. By default, the input is set to 100. Larger inputs may slow down page rendering.

Input: Choose Expression Stages

The fifth input field (required unless the "Custom Gene Form" is used) is "Choose Expression Stages". This field is disabled if the "Custom Gene Form" was used for the "Target Gene Information", since that form already captures the expression stage information. In this field, select the stages over which the target gene will be compared with the genes in the selected genome. The expression stages are organized by category: Embryogenesis, Development, Tissue, Treatment, and Cell Line. Clicking the checkbox next to the category text will check/uncheck all expression stages in that category. To view the specific expression stages, click the desired category text. Clicking the checkbox next to an expression stage will check/uncheck only that expression stage.

Note that the categories are not mutually exclusive; some expression stages belong to multiple categories. For example, unchecking the "Development" category will also uncheck the "Embryogenesis" category, since all expression stages in "Embryogenesis" also exist in "Development".
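The overlap between categories can be pictured with sets. In the sketch below, the stage labels are hypothetical placeholders (the real stage lists come from the form), and the rule that a category counts as checked only when all of its stages are selected is an assumption made for illustration.

```python
# Hypothetical stage labels; "Embryogenesis" is a subset of "Development".
CATEGORIES = {
    "Embryogenesis": {"embryo_early", "embryo_late"},
    "Development": {"embryo_early", "embryo_late", "larva", "pupa"},
    "Tissue": {"head", "ovary"},
}

def uncheck_category(selected, category):
    """Return the selection after removing every stage in the category."""
    return selected - CATEGORIES[category]

def checked_categories(selected):
    """Assume a category is checked only if all of its stages are selected."""
    return {name for name, stages in CATEGORIES.items() if stages <= selected}

all_stages = set().union(*CATEGORIES.values())
after = uncheck_category(all_stages, "Development")
```

After unchecking "Development", the embryonic stages are gone from the selection as well, so only "Tissue" remains checked in this toy setup.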

Example: If you are only interested in discovering genes with similar expression across developmental stages, select only the "Development" category.

Once all inputs are properly filled out, click the "Submit" button to retrieve the results.

How to View the Results

The results page will contain three panels: request summary, input form, and results. The request summary panel will contain a description of the request that produced the results. The input form panel will contain the form to create another request. The results panel will contain the results of the user's request.

Results Panel

The results panel will contain two main tables: "Results Table" and "Genes of Interest Table". The "Results Table" will contain the genes most correlated with the target gene, up to the requested result count. The "Genes of Interest Table" will contain all the genes of interest from the request. In both tables, the genes will be ranked by correlation coefficient.

Each row will correspond to a single result gene:

Row for gene "Pox neuro"

There are seven columns to describe the result gene: rank, FlyBase ID, CG ID, gene name, r-value, p-value, and actions.

  • Rank: the gene's position when ordered based on r-value from greatest to least (descending).
  • FlyBase ID: the gene's database identification key from FlyBase.
  • CG ID: the gene's CG annotation symbol.
  • Gene Name: the gene's formal name.
  • R-value: the Pearson correlation coefficient between the target gene and the row's result gene; a value near 1 indicates a positive correlation, near 0 indicates no correlation, and near -1 indicates a negative correlation.
  • P-value: the p-value used to assess the statistical significance of the Pearson correlation coefficient.
  • Actions: a set of buttons that perform an action on the gene. Currently the only action is "FlyMine", which opens the gene's FlyMine report.

There is also an option to view the mRNA expression data for the target gene and the row's result gene. Click the green icon on the left side of the row to open a plot of the data.

The line graph will contain two trends marking the expression data for the target gene and the row's result gene. Although the points are connected, it is important not to mistake the data for a time series. The points are connected only to better visualize the correlation. In the future, more work will be done to improve the visualization.

The x-axis will show the expression stages, labeled by their identification numbers. The y-axis will show the expression score/count.

Sometimes it may be difficult to compare the two trends because their mRNA expression values are on different scales. To better visualize the trends' relationship, click "Normalize" to normalize all the data between 0 and 1.
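One common way to bring values into the 0 to 1 range is min-max normalization, sketched below. Whether FlyCAGE uses exactly this method, and whether it normalizes each trend separately, is an assumption; the function name is hypothetical.

```python
def min_max_normalize(values):
    """Linearly rescale values so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a flat trend has no range to scale
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical expression scores on two very different scales.
target_trend = min_max_normalize([10.0, 20.0, 30.0])
result_trend = min_max_normalize([1000.0, 2000.0, 3000.0])
```

Both trends become identical after normalization, which makes their shared shape visible even though the raw scores differ by a factor of 100.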

To view the expression stages' identification numbers, click the button "See Stage Mapping" to open a modal with the information.

When hovering over the plot, you will see a summary of the data for the corresponding expression stage, such as the gene IDs, resolved expression stages, and expression scores.

There are also other actions you may perform on the plot. On the top right of the plot, there will be a set of buttons to perform specific actions on the plot such as navigation, saving, and editing.