Cell Annotation Platform

Glossary of Cell Annotation Metadata Terms

Cell Annotation Metadata Overview

Annotating cell types and states – identifying them as separable entities and naming them as either known entities or new ones – is a cornerstone of biological research. The annotation of a cell state is a critical level of abstraction that structures our understanding of biology.

Currently, the annotation of cell types and states within single-cell datasets is rather ad hoc. It normally amounts to a single string associated with cells within standard bioinformatic files such as Seurat objects or AnnData files. Such an approach is simply not standardized enough to create reliable large-scale cell atlases within the Human Cell Atlas. Researchers disagree on the definitions of the cell labels used, on the molecular definitions of the biological entities, and on the relative precision of terms across datasets. There are also questions of bioinformatic transparency, including where these cell annotations came from and what precisely they mean.

Here, we would like to promote a standard that makes cell annotations more transparent for downstream analysis. Following this standard is required for publishing cell annotations on CAP. How this cell annotation metadata is encoded within bioinformatic files is described in the schema here: Cell Annotation Schema
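As a rough illustration of what a single annotation carries, the fields described in this glossary can be gathered into one record. The sketch below uses a Python dataclass; every field name in it is hypothetical, and the authoritative names and types are those defined by the Cell Annotation Schema, not by this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CellAnnotation:
    """One cell annotation. Field names are illustrative, NOT the official schema keys."""
    cell_label: str                                    # author's free-text label; abbreviations allowed
    cell_fullname: str                                 # full-length name; no abbreviations
    cell_ontology_term_id: Optional[str] = None        # exact or closest Cell Ontology ID
    synonyms: List[str] = field(default_factory=list)
    category_fullname: Optional[str] = None            # broader ("parent") term
    category_cell_ontology_term_id: Optional[str] = None
    marker_gene_evidence: List[str] = field(default_factory=list)
    canonical_marker_genes: List[str] = field(default_factory=list)
    rationale: str = ""
    rationale_dois: List[str] = field(default_factory=list)
    cell_ontology_assessment: str = ""


# Hypothetical example built from the amacrine cell example used in this glossary
annotation = CellAnnotation(
    cell_label="GlyAC",                                # hypothetical author abbreviation
    cell_fullname="glycinergic amacrine cell",
    category_fullname="amacrine cell",
    category_cell_ontology_term_id="CL:0000561",
)
```

Keeping every field on one object makes it easy to see which parts are free text (label, rationale, assessment) and which are meant to be controlled (ontology IDs, gene names, DOIs).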

The sections below walk through the UI.

Entering Cell Annotation Metadata

Cell annotation metadata may be entered on the 'Edit Dataset' page. See the Entering cell annotation metadata documentation for more information about how to enter and edit the cell annotation metadata. Below is an example view of the metadata entry page. Descriptions of how to complete each section follow.

Example cell label with cell term and category displayed

Cell Label

The preferred free-text term the author uses to annotate this cell type or cell state, i.e. the author's preferred name for the cell label.

Abbreviations are permitted in this field; authors may annotate their cells using any label they wish. For example, the terms 'LC' or 'luminal cell' would both be acceptable in the 'Cell Label' field.

Reserved terms

A few keywords are reserved for special cases. Users should select these where applicable.

  • Doublets: The term “doublets” is reserved for encoding cells defined as doublets based on some computational analysis. By “doublets”, we refer to the sequencing artifact within droplet-based protocols whereby two or more cells are tagged with the same barcode.

  • Junk: The term “junk” is reserved for encoding cells that failed sequencing for some reason, e.g. few genes detected or a high fraction of mitochondrial reads. Researchers have found such a generic term useful.

  • Unknown: The term “unknown” is specifically reserved for cells which the author did not know how to annotate with a biological entity. It is a generic term meaning “I do not know”.
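Because these keywords carry a fixed meaning, downstream code can treat them separately from biological labels, for example to exclude doublets and junk before analysis. The sketch below assumes the labels are stored as plain strings and that matching is case-insensitive; both are assumptions, not rules stated by CAP.

```python
from typing import List

# Reserved keywords from this glossary
RESERVED_LABELS = {"doublets", "junk", "unknown"}


def is_reserved_label(label: str) -> bool:
    """Return True if label is a reserved keyword (case-insensitive match is an assumption)."""
    return label.strip().lower() in RESERVED_LABELS


def reserved_labels(labels: List[str]) -> List[str]:
    """Keep only the labels that are reserved keywords, preserving order."""
    return [lab for lab in labels if is_reserved_label(lab)]


# Hypothetical labels from a dataset
labels = ["LC", "doublets", "Unknown", "luminal cell", "junk"]
reserved = reserved_labels(labels)
```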

Cell Term

Next, authors are required to provide the full name of the biological entity described in the field “Cell Label”.

Users have two options to proceed:

  • If the entity exists in the Cell Ontology, they may select this term via autosuggest.
  • If the term does not exist in the Cell Ontology, users should type out the full name of the cell type or cell state listed in Cell Label. This is particularly relevant for novel cell types. Users will then be prompted to provide the closest Cell Ontology term associated with this entity.

Accordingly, within the UI users must select either 'Existing Term' (if the specified cell type term is already present in the Cell Ontology) or 'New Term'.

Cell Term (existing ontology term)

Example of Cell Term (existing ontology term)

Exact ontology term

Search for the appropriate Cell Ontology term via the autosuggestions provided by the Ontology Lookup Service (OLS). Alternatively, users can look up the ID associated with this ontology term and paste it into the text box.
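When an ID is pasted by hand, it can arrive in different shapes. Cell Ontology IDs such as CL:0000561 are written as "CL:" plus seven digits, and OBO PURLs end in a form like CL_0000561. The helper below is a hypothetical normalizer, not part of CAP, that accepts either shape and rejects anything else.

```python
import re

# Cell Ontology CURIE: "CL:" followed by exactly seven digits, e.g. CL:0000561
CL_ID_PATTERN = re.compile(r"CL:[0-9]{7}")


def normalize_cl_id(raw: str) -> str:
    """Return a Cell Ontology ID in CURIE form, or raise ValueError.

    Accepts 'CL:0000561', 'CL_0000561', or a PURL ending in '/CL_0000561'.
    """
    candidate = raw.strip()
    # Keep only the last path segment of a URL such as .../obo/CL_0000561
    candidate = candidate.rsplit("/", 1)[-1]
    # PURLs use an underscore where the CURIE uses a colon
    candidate = candidate.replace("_", ":", 1)
    if not CL_ID_PATTERN.fullmatch(candidate):
        raise ValueError(f"not a Cell Ontology ID: {raw!r}")
    return candidate


print(normalize_cl_id("http://purl.obolibrary.org/obo/CL_0000561"))  # CL:0000561
```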

Synonyms

Comma-separated list of terms the author considers to be exactly or nearly the same as the value defined in the 'Cell Label' field. For example, for the term 'glial cell' a user could state the following synonyms 'neuroglia, neuroglial cell'.

Users may provide as many synonyms as they wish via the “+” icon.

If the user has selected a term within the Cell Ontology, the synonyms associated with that term in the ontology will be automatically populated in the text fields. If the user disagrees with any of these synonyms, they should remove them with the “-” icon.

In cases where no synonyms exist, please select 'unknown'.
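The synonym rules above (comma-separated entry, ontology synonyms pre-filled, removal with the “-” icon, and 'unknown' when none exist) can be mirrored in a small sketch. The function names are hypothetical, and treating synonyms case-insensitively when deduplicating is an assumption.

```python
from typing import List


def parse_synonyms(text: str) -> List[str]:
    """Split a comma-separated synonym string, trimming blanks and duplicates.

    Falls back to ['unknown'] when nothing is left, as this glossary requests.
    """
    seen = set()
    result: List[str] = []
    for part in text.split(","):
        term = part.strip()
        if term and term.lower() not in seen:  # case-insensitive dedup (assumption)
            seen.add(term.lower())
            result.append(term)
    return result or ["unknown"]


def apply_deselections(ontology_synonyms: List[str], deselected: List[str]) -> List[str]:
    """Drop auto-populated ontology synonyms that the author removed."""
    removed = {s.lower() for s in deselected}
    return [s for s in ontology_synonyms if s.lower() not in removed]


print(parse_synonyms("neuroglia, neuroglial cell"))  # ['neuroglia', 'neuroglial cell']
```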

Cell Term (new ontology term)

Example of Cell Term (new ontology term)

Closest ontology term

Search for the closest available Cell Ontology term using the Ontology Lookup Service (OLS) and copy the associated ID. Alternatively, select the auto-suggested ID.

Cell Full name

Full length name for the term used in 'Cell Label'.

Abbreviations are not permitted. This must be the full-length name of the biological entity listed by the author in “Cell Label”.

Synonyms

Comma-separated list of terms the author considers to be exactly or nearly the same as the value defined in the 'Cell Label' field. For example, for the term 'glial cell' a user could state the following synonyms 'neuroglia, neuroglial cell'.

Users may provide as many synonyms as they wish via the “+” icon.

In cases where no synonyms exist, please select 'unknown'.

Category

The category term denotes the biological entity that the author considers the nearest "class" or "broader term" (or "parent term") for the entity in the “Cell Label” field. Much like the field “Cell Term”, authors may select an existing Cell Ontology term or provide their own free-text full-length term.

The corresponding parent term or category is normally the term directly above the specified cell type in the Cell Ontology hierarchy, which can be found in the Ontology Lookup Service tree view. For example, for the term 'glycinergic amacrine cell' the parent term would be 'amacrine cell'.
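The "term directly above" lookup can be pictured as following one edge in a child-to-parent map. The sketch below is a toy: it contains only the single edge given in this glossary, and a real lookup would query the Cell Ontology (for example through OLS). Note also that the Cell Ontology is a directed acyclic graph in which a term may have several parents, so a one-parent dictionary is a simplification.

```python
from typing import Dict, Optional

# Toy child -> parent map holding only the example edge from this glossary.
TOY_PARENT: Dict[str, str] = {
    "glycinergic amacrine cell": "amacrine cell",
}


def suggest_category(cell_term: str) -> Optional[str]:
    """Return the term directly above cell_term, or None if it is not in the toy map."""
    return TOY_PARENT.get(cell_term.strip().lower())


print(suggest_category("glycinergic amacrine cell"))  # amacrine cell
```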

Users may either select an existing Cell Ontology term using the autocomplete functionality in the UI, or provide an entirely new term if there is no exact match.

Category (existing ontology term)

The corresponding parent term or category is the term directly above the specified cell type in the cell ontology hierarchy, which can be found in the Ontology Lookup Service (OLS) tree view.

For example, for the term 'glycinergic amacrine cell' the parent term would be 'amacrine cell'.

Example of Category (existing ontology term)

Category (new term)

Example of Category (new ontology term)

Closest ontology term

The corresponding parent term or category is the term directly above the specified cell type in the cell ontology hierarchy, which can be found in the Ontology Lookup Service (OLS) tree view. For example, for the term 'glycinergic amacrine cell' the parent term would be 'amacrine cell'.

Full name

The full-length name for the new Category term.

Abbreviations are not permitted.

Evidence for Annotations

Marker Gene Evidence

The list of gene names explicitly used as evidence supporting the assignment of this cell annotation. Because this evidence is derived from the data itself, it must be recorded as a comma-separated list of gene names exactly as they appear in the bioinformatic file.
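Since marker gene evidence must use gene names that exist in the file, a simple pre-submission check is to split the comma-separated list and look each name up among the dataset's gene names. In an AnnData object these are typically available as `adata.var_names`; the sketch below uses a plain list instead so it runs without any dependency, and the function name and example genes are hypothetical.

```python
from typing import Iterable, List, Tuple


def check_marker_evidence(evidence: str, gene_names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split a comma-separated marker list into (found, missing) against the dataset's genes.

    Matching is exact and case-sensitive, because the names must match the file.
    """
    available = set(gene_names)
    markers = [g.strip() for g in evidence.split(",") if g.strip()]
    found = [g for g in markers if g in available]
    missing = [g for g in markers if g not in available]
    return found, missing


# For AnnData this could be list(adata.var_names); here a hypothetical stand-in
gene_names = ["GZMB", "IGJ", "IGKC", "SERPINF1", "CD3E"]
found, missing = check_marker_evidence("GZMB, IGJ, IGKC, SERPINF1, LILRA4", gene_names)
print(missing)  # ['LILRA4']
```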

Example of Marker Gene fields

Canonical Marker Genes

The list of gene names of “legacy markers” or “known markers” for this entity, i.e. gene names widely recognized as defining this cell type (or cell state) using transcriptomics. This must be recorded as a comma-separated list of gene names. This field differs from “Marker Gene Evidence”, which explicitly refers to genes identified through some form of analysis of the data.

For example, researchers could list “GNLY, NKG7” as canonical markers for “Natural killer (NK) cells”, and “IL7R, S100A4” as canonical markers of “Memory CD4+ T cells”.
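Because canonical markers and marker gene evidence are recorded separately, a reviewer could compare them to see which known markers the data actually supports. The sketch below is a hypothetical helper; the evidence list in the example is invented for illustration and is not a result.

```python
from typing import Dict, List


def split_markers(text: str) -> List[str]:
    """Split a comma-separated gene list, dropping blanks."""
    return [g.strip() for g in text.split(",") if g.strip()]


def canonical_support(canonical: str, evidence: str) -> Dict[str, List[str]]:
    """Report which canonical markers also appear in the marker gene evidence."""
    canon = split_markers(canonical)
    observed = set(split_markers(evidence))
    return {
        "supported": [g for g in canon if g in observed],
        "not_in_evidence": [g for g in canon if g not in observed],
    }


# Canonical NK markers from the text; the evidence list is hypothetical
report = canonical_support("GNLY, NKG7", "NKG7, GZMA, PRF1")
print(report)  # {'supported': ['NKG7'], 'not_in_evidence': ['GNLY']}
```

A canonical marker missing from the evidence is not necessarily an error; it is simply a point the rationale may want to address.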

Rationale

A free-text statement communicating the user's rationale for their cell annotation. Justification and/or evidence for this, including citations, is encouraged. Users should explain why they chose this cell annotation, how they derived it, and what the cell annotation means (i.e. the identity and function of the cell type or cell state).

Because this field is free text, the rationale will primarily be read by other researchers. We encourage researchers to provide as much context as possible. Such context is critical for resolving differences between cell annotations across publications and research groups.

For example, a user could provide the following informative rationale:

“These cells were annotated as ‘plasmacytoid dendritic cells (pDCs)’ after running differential expression analysis with Seurat v5 using default parameters, following standard pre-processing and Leiden clustering. The differentially expressed genes of this cluster lacked key markers used to identify B cells, T cells, NK cells, or monocytes. More relevantly, this cluster expressed ‘GZMB’, ‘IGJ’, ‘IGKC’, and ‘SERPINF1’, which corresponds to the subcluster used to identify pDCs in Villani et al. (2017), doi: 10.1126/science.aah4573.”

Example of the Rationale fields

Rationale DOI

Comma-separated list of DOIs corresponding to the publications cited as justification for cell annotations in the 'Rationale' field. For example, for 'chodl neurons' a user could list the following DOI: 10.7554/eLife.59928
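A comma-separated DOI field is easy to fill with typos or repeated entries. The hypothetical parser below trims each entry, checks it against a permissive pattern for modern DOIs (modelled on a pattern Crossref has suggested; it will not match every DOI ever registered), and drops duplicates, comparing case-insensitively because DOIs are case-insensitive.

```python
import re
from typing import List

# Permissive pattern for modern DOIs: "10.", a 4-9 digit registrant code, "/", a suffix
DOI_PATTERN = re.compile(r"10\.[0-9]{4,9}/[-._;()/:A-Za-z0-9]+")


def parse_rationale_dois(text: str) -> List[str]:
    """Split a comma-separated DOI list, validate each entry, and remove duplicates."""
    dois: List[str] = []
    seen = set()
    for part in text.split(","):
        doi = part.strip()
        if not doi:
            continue
        if not DOI_PATTERN.fullmatch(doi):
            raise ValueError(f"does not look like a DOI: {doi!r}")
        if doi.lower() not in seen:  # DOIs compare case-insensitively
            seen.add(doi.lower())
            dois.append(doi)
    return dois


print(parse_rationale_dois("10.7554/eLife.59928, 10.7554/eLife.59928"))  # ['10.7554/eLife.59928']
```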

Cell Ontology Assessment

Optional free-text field to express any suggestions for improving any aspect of the Cell Ontology concerning this specific cell annotation. Disagreements with any aspect of the Cell Ontology should be noted here for ontology curators to review.

For example, a user could add additional information such as:

“The CL term 'amacrine cell' (CL:0000561) should have four child terms: glycinergic amacrine cells, GABAergic amacrine cells, GABAergic glycinergic amacrine cells, and non-GABAergic non-glycinergic amacrine cells. Currently, this distinction is not clear.”

or

“A synonym listed for this annotation is 'T cell of appendix' (CL:0009031). It’s unclear how this is functionally different from other classes of mature or immature T cells.”

or

“The definition provided by CL for 'retinal melanocyte' (CL:0002485) does not clearly distinguish it from cells of the 'retinal pigment epithelium' (RPE). The former should be distinguished by the uveal tract.”