The terms describing the data stored in the network of AIRR-seq repositories federated by the iReceptor Gateway are driven by the recommendations of the AIRR Community: the AIRR Minimal Standards (MiAIRR) for study metadata and the AIRR Rearrangement schema representing annotated rearrangements. For more information, visit the AIRR Community documentation page.
The definitions and relevant examples for each field are given below:
Field | Definition | Example |
---|---|---|
Repertoire Description | Generic repertoire description | |
Repertoire ID | Identifier for the repertoire object. This identifier should be globally unique so that repertoires from multiple studies can be combined together without conflict. The repertoire_id is used to link other AIRR data to a Repertoire. Specifically, the Rearrangements Schema includes repertoire_id for referencing the specific Repertoire for that Rearrangement. | |
Repertoire Name | Short generic display name for the repertoire |
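
The linking role of `repertoire_id` described above can be illustrated with a minimal sketch. The records below are hand-written and hypothetical (the IDs, names, and the helper `group_by_repertoire` are assumptions for illustration, not part of the Gateway or of any AIRR library); real records would come from a repository.

```python
from collections import defaultdict

# Hypothetical repertoire metadata records (illustrative values only).
repertoires = [
    {"repertoire_id": "rep-001", "repertoire_name": "Subject A, day 0"},
    {"repertoire_id": "rep-002", "repertoire_name": "Subject A, day 14"},
]

# Hypothetical rearrangement records; each one references a repertoire.
rearrangements = [
    {"sequence_id": "seq-1", "repertoire_id": "rep-001"},
    {"sequence_id": "seq-2", "repertoire_id": "rep-002"},
    {"sequence_id": "seq-3", "repertoire_id": "rep-001"},
]


def group_by_repertoire(records):
    """Group rearrangement records by the repertoire_id they reference."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["repertoire_id"]].append(record)
    return dict(grouped)


grouped = group_by_repertoire(rearrangements)
for repertoire in repertoires:
    linked = grouped.get(repertoire["repertoire_id"], [])
    print(repertoire["repertoire_name"], [r["sequence_id"] for r in linked])
```

Because the identifier is meant to be globally unique, the same grouping should keep working when records from several studies are concatenated.
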
Field | Definition | Example |
---|---|---|
ADC Publish Date | Date the study was first published in the AIRR Data Commons. | 2021-02-02 |
ADC Update Date | Date the study data was updated in the AIRR Data Commons. | 2021-02-02 |
Contact (collection) | Full contact information of the contact persons for this study. This should include an e-mail address and a persistent identifier such as an ORCID ID. | Dr. P. Stibbons, p.stibbons@unseenu.edu, https://orcid.org/0000-0002-1825-0097 |
Contact (collection) | Full contact information of the data collector, i.e. the person who is legally responsible for data collection and release. This should include an e-mail address and a persistent identifier such as an ORCID ID. | Dr. P. Stibbons, p.stibbons@unseenu.edu, https://orcid.org/0000-0002-1825-0097 |
Contact (deposition) | Full contact information of the data depositor, i.e., the person submitting the data to a repository. This should include an e-mail address and a persistent identifier such as an ORCID ID. This is supposed to be a short-lived and technical role until the submission is released. | Adrian Turnipseed, a.turnipseed@unseenu.edu, https://orcid.org/0000-0002-1825-0097 |
Funding | Funding agencies and grant numbers | NIH, award number R01GM987654 |
Inclusion criteria | List of criteria for inclusion/exclusion for the study | Include: Clinical P. falciparum infection; Exclude: Seropositive for HIV |
Lab address | Institution and institutional address of data collector | School of Medicine, Unseen University, Ankh-Morpork, Disk World |
Lab name | Department of data collector | Department for Planar Immunology |
Publications | Publications describing the rationale and/or outcome of the study. Wherever possible, a persistent identifier such as a DOI or a PubMed ID should be used. | PMID:85642 |
Study Description | Generic study description | Longer description |
Study ID | Unique ID assigned by study registry such as one of the International Nucleotide Sequence Database Collaboration (INSDC) repositories. | PRJNA001 |
Study keywords | Keywords describing properties of one or more data sets in a study | ['contains_ig', 'contains_schema_rearrangement', 'contains_schema_clone', 'contains_schema_cell'] |
Study title | Descriptive study title | Effects of sun light exposure of the Treg repertoire |
Study type | Type of study design | NCIT:C15197, Case-Control Study |
Study type (Ontology ID) | Type of study design (Ontology ID) | NCIT:C15197, Case-Control Study |
Field | Definition | Example |
---|---|---|
Age (deprecated) | ||
Age event | Event in the study schedule to which `Age` refers. For NCBI BioSample this MUST be `sampling`. For other implementations submitters need to be aware that there is currently no mechanism to encode the potential delta between `Age event` and `Sample collection time`, hence the chosen events should be in temporal proximity. | enrollment |
Age maximum | Upper boundary of age range or equal to age_min for specific age. This field should only be null if age_min is null. | 80 |
Age minimum | Specific age or lower boundary of age range. | 60 |
Age unit | Unit of age range | UO:0000036, year |
Age unit (Ontology ID) | Unit of age range (Ontology ID) | UO:0000036, year |
Ancestry | Broad geographic origin of ancestry (continent) | list of continents, mixed or unknown |
Ethnicity | Ethnic group of subject (defined as cultural/language-based membership) | English, Kurds, Manchu, Yakuts (and other fields from Wikipedia) |
Organism | Binomial designation of subject's species | NCBITAXON:9606, Homo sapiens |
Organism (deprecated) | Binomial designation of subject's species | |
Organism (deprecated) | Binomial designation of subject's species (Ontology ID) | |
Organism (Ontology ID) | Binomial designation of subject's species (Ontology ID) | NCBITAXON:9606, Homo sapiens |
Race | Racial group of subject (as defined by NIH) | White, American Indian or Alaska Native, Black, Asian, Native Hawaiian or Other Pacific Islander, Other |
Relation (subjects) | Subject ID to which `Relation type` refers | SUB1355648 |
Relation type | Relation between subject and `linked_subjects`; can be genetic or environmental (e.g. exposure) | father, daughter, household |
Sex | Biological sex of subject | female |
Strain name | Non-human designation of the strain or breed of animal used | C57BL/6J |
Subject ID | Subject ID assigned by submitter, unique within study. If possible, a persistent subject ID linked to an INSDC or similar repository study should be used. | SUB856413 |
Synthetic library | TRUE for libraries in which the diversity has been synthetically generated (e.g. phage display) |
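
The rule stated for `Age minimum` and `Age maximum` (a specific age is encoded by setting both to the same value, and the maximum should be null only when the minimum is null) can be checked mechanically. The following is a minimal sketch; the function name `age_range_problems` is hypothetical, and the ordering check (minimum not greater than maximum) is an assumption inferred from the words "lower" and "upper boundary" rather than something the definitions state explicitly.

```python
def age_range_problems(age_min, age_max):
    """Return a list of rule violations for an age_min/age_max pair."""
    problems = []
    # Stated rule: age_max should only be null if age_min is null.
    if age_max is None and age_min is not None:
        problems.append("age_max is null but age_min is set")
    # Assumption: a lower boundary should not exceed the upper boundary.
    if age_min is not None and age_max is not None and age_min > age_max:
        problems.append("age_min is greater than age_max")
    return problems


def is_specific_age(age_min, age_max):
    """True when the pair encodes a single age rather than a range."""
    return age_min is not None and age_min == age_max


print(age_range_problems(60, 80))    # []
print(age_range_problems(60, None))  # ['age_max is null but age_min is set']
print(is_specific_age(35, 35))       # True
```
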
Field | Definition | Example |
---|---|---|
Diagnosis | Diagnosis of subject | DOID:9538, multiple myeloma |
Diagnosis (Ontology ID) | Diagnosis of subject (Ontology ID) | DOID:9538, multiple myeloma |
Disease stage | Stage of disease at current intervention | Stage II |
Immunogen/agent | Antigen, vaccine or drug applied to subject at this intervention | bortezomib |
Intervention | Description of intervention | systemic chemotherapy, 6 cycles, 1.25 mg/m2 |
Length of disease | Time duration between initial diagnosis and current intervention | 23 months |
Medical history | Medical history of subject that is relevant to assess the course of disease and/or treatment | MGUS, first diagnosed 5 years prior |
Prior therapies | List of all relevant previous therapies applied to subject for treatment of `Diagnosis` | melphalan/prednisone |
Study group | Designation of the study arm to which the subject is assigned | control |
Field | Definition | Example |
---|---|---|
Analysis ID | Identifier for machine-readable PROV model of analysis provenance | |
Collapsing method | The method used for combining multiple sequences from (4) into a single sequence in (5) | MUSCLE 3.8.31 |
Data processing ID | Identifier for the data processing object. | |
Data processing protocols | General description of how QC is performed | Data was processed using [...] |
Germline ID | Unique identifier of the germline set and version, in standardized form (Repo:Label:Version) | OGRDB:Human_IGH:2021.11 |
Paired read assembly | How paired end reads were assembled into a single receptor sequence | PandaSeq (minimal overlap 50, threshold 0.8) |
Primary annotation | If true, indicates this is the primary or default data processing for the repertoire and its rearrangements. If false, indicates this is a secondary or additional data processing. | |
Primer match cutoffs | How primers were identified in the sequences, were they removed/masked/etc? | Hamming distance <= 2 |
Processed files | Array of file names for data produced by this data processing. | ['ERR1278153_aa.txz', 'ERR1278153_ab.txz', 'ERR1278153_ac.txz'] |
Quality thresholds | How sequences were removed from (4) based on base quality scores | Average Phred score >=20 |
Software tools/versions | Version number and / or date, include company pipelines | IgBLAST 1.6 |
V(D)J germline database | Source of germline V(D)J genes with version number or date accessed. | ENSEMBL, Homo sapiens build 90, 2017-10-01 |
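
The standardized `Repo:Label:Version` form given for `Germline ID` lends itself to simple parsing. The sketch below is illustrative only: `GermlineSetRef` and `parse_germline_id` are hypothetical names, and the code assumes that none of the three components contains a colon itself.

```python
from typing import NamedTuple


class GermlineSetRef(NamedTuple):
    """The three components of a Repo:Label:Version identifier."""
    repo: str
    label: str
    version: str


def parse_germline_id(value):
    """Split a germline set identifier into repo, label and version.

    Assumes the components themselves contain no colons.
    """
    parts = value.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected Repo:Label:Version, got {value!r}")
    return GermlineSetRef(*parts)


ref = parse_germline_id("OGRDB:Human_IGH:2021.11")
print(ref.repo, ref.label, ref.version)  # OGRDB Human_IGH 2021.11
```
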
Field | Definition | Example |
---|---|---|
# cells/experiment | Total number of cells that went into the experiment | 1000000 |
# cells/sequencing reaction | Number of cells for each biological replicate | 50000 |
Anatomic site | The anatomic location of the tissue, e.g. Inguinal, femur | Iliac crest |
Batch number | ID of sequencing run assigned by the sequencing facility | 160101_M01234 |
Biomaterial provider | Name and address of the entity providing the sample | Tissues-R-Us, Tampa, FL, USA |
Cell isolation procedure | Description of the procedure used for marker-based isolation or enrichment of cells | Cells were stained with fluorochrome labeled antibodies and then sorted on a FlowMerlin (CE) cytometer. |
Cell quality | Relative amount of viable cells after preparation and (if applicable) thawing | 90% viability as determined by 7-AAD |
Cell species | Binomial designation of the species from which the analyzed cells originate. Typically, this value should be identical to `species`, in which case it SHOULD NOT be set explicitly. However, there are valid experimental setups in which the two might differ, e.g., chimeric animal models. If set, this key will overwrite the `species` information for all lower layers of the schema. | NCBITAXON:9606, Homo sapiens |
Cell species (Ontology ID) | Binomial designation of the species from which the analyzed cells originate. Typically, this value should be identical to `species`, in which case it SHOULD NOT be set explicitly. However, there are valid experimental setups in which the two might differ, e.g., chimeric animal models. If set, this key will overwrite the `species` information for all lower layers of the schema. (Ontology ID) | NCBITAXON:9606, Homo sapiens |
Cell storage | TRUE if cells were cryo-preserved between isolation and further processing | TRUE |
Cell subset | Commonly-used designation of isolated cell population | CL:0000972, class switched memory B cell |
Cell subset (Ontology ID) | Commonly-used designation of isolated cell population (Ontology ID) | CL:0000972, class switched memory B cell |
Cell subset phenotype | List of cellular markers and their expression levels used to isolate the cell population | CD19+ CD38+ CD27+ IgM- IgD- |
Collection event | Unit of Sample collection time | UO:0000033, day |
Collection event | Unit of Sample collection time (Ontology ID) | UO:0000033, day |
Collection event | Event in the study schedule to which `Sample collection time` relates | Primary vaccination |
Collection time | Time point at which sample was taken, relative to `Collection time event` | 14 |
Complete sequences | To be considered `complete`, the procedure used for library construction MUST generate sequences that 1) include the first V gene codon that encodes the mature polypeptide chain (i.e. after the leader sequence) and 2) include the last complete codon of the J gene (i.e. 1 bp 5' of the J->C splice site) and 3) provide sequence information for all positions between 1) and 2). To be considered `complete & untemplated`, the sections of the sequences defined in points 1) to 3) of the previous sentence MUST be untemplated, i.e. MUST NOT overlap with the primers used in library preparation. `mixed` should only be used if the procedure used for library construction will likely produce multiple categories of sequences in the given experiment. It SHOULD NOT be used as a replacement of a NULL value. | partial |
Date of sequencing run | Date of sequencing run | 2016-12-16 |
Disease state | Histopathologic evaluation of the sample | Tumor infiltration |
Library generation method | Generic type of library generation | RT(oligo-dT)+TS(UMI)+PCR |
Library generation protocol | Description of processes applied to substrate to obtain a library that is ready for sequencing | cDNA was generated using |
Linkage of loci | In case an experimental setup is used that physically links nucleic acids derived from distinct `Rearrangements` before library preparation, this field describes the mode of that linkage. All `hetero_*` terms indicate that in case of paired-read sequencing, the two reads should be expected to map to distinct IG/TR loci. `*_head-head` refers to techniques that link the 5' ends of transcripts in a single-cell context. `*_tail-head` refers to techniques that link the 3' end of one transcript to the 5' end of another one in a single-cell context. This term does not provide any information whether a continuous reading-frame between the two is generated. `*_prelinked` refers to constructs in which the linkage was already present on the DNA level (e.g. scFv). | hetero_head-head |
Processing protocol | Description of the methods applied to the sample including cell preparation/isolation/enrichment and nucleic acid extraction. This should closely mirror the Materials and methods section in the manuscript. | Stimulated with anti-CD3/anti-CD28 |
Protocol IDs | When using a library generation protocol from a commercial provider, provide the protocol version number | v2.1 (2016-09-15) |
Reads passing QC | Number of usable reads for analysis | 10365118 |
Sample ID | Sample ID assigned by submitter, unique within study. If possible, a persistent sample ID linked to INSDC or similar repository study should be used. | SUP52415 |
Sample Processing ID | Identifier for the sample processing object. This field should be unique within the repertoire. This field can be used to uniquely identify the combination of sample, cell processing, nucleic acid processing and sequencing run information for the repertoire. | |
Sample type | The way the sample was obtained, e.g. fine-needle aspirate, organ harvest, peripheral venous puncture | Biopsy |
Sequencing facility | Name and address of sequencing facility | Seqs-R-Us, Vancouver, BC, Canada |
Sequencing kit | Name, manufacturer, order and lot numbers of sequencing kit | FullSeq 600, Alumina, #M123456C0, 789G1HK |
Sequencing platform | Designation of sequencing instrument used | Alumina LoSeq 1000 |
Single-cell sort | TRUE if single cells were isolated into separate compartments | |
Target substrate | The class of nucleic acid that was used as primary starting material for the following procedures | RNA |
Target substrate quality | Description and results of the quality control performed on the template material | RIN 9.2 |
Template amount | Amount of template that went into the process | 1000 |
Template amount time unit | Unit of template amount | UO:0000024, nanogram |
Template amount time unit (Ontology ID) | Unit of template amount (Ontology ID) | UO:0000024, nanogram |
Tissue | The actual tissue sampled, e.g. lymph node, liver, peripheral blood | UBERON:0002371, bone marrow |
Tissue (Ontology ID) | The actual tissue sampled, e.g. lymph node, liver, peripheral blood (Ontology ID) | UBERON:0002371, bone marrow |
Tissue processing | Enzymatic digestion and/or physical methods used to isolate cells from sample | Collagenase A/Dnase I digested, followed by Percoll gradient |
Field | Definition | Example |
---|---|---|
File name for the index file | MS10R-NMonson-C7JR9_S1_R3_001.fastq | |
Read length in bases for the index file | 8 | |
Forward read length | Read length in bases for the first file in paired-read sequencing | 300 |
Paired read direction | Read direction for the second file in paired-read sequencing | reverse |
Paired read length | Read length in bases for the second file in paired-read sequencing | 300 |
Paired sequencing file name | File name for the second file in paired-read sequencing | MS10R-NMonson-C7JR9_S1_R2_001.fastq |
Raw Data PID | Persistent identifier of raw data stored in an archive (e.g. INSDC run ID). Data archive should be identified in the CURIE prefix. | SRA:SRR11610494 |
Read direction | Read direction for the raw reads or sequences. The first file in paired-read sequencing. | forward |
Sequencing file name | File name for the raw reads or sequences. The first file in paired-read sequencing. | MS10R-NMonson-C7JR9_S1_R1_001.fastq |
Sequencing file type | File format for the raw reads or sequences |
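
Since `Raw Data PID` identifies the data archive in the CURIE prefix, the archive and the local accession can be separated at the first colon. This is a minimal sketch with a hypothetical helper name (`split_curie`); it does not resolve the prefix to a URL, which would require a prefix registry not described here.

```python
def split_curie(curie):
    """Split a CURIE such as 'SRA:SRR11610494' into (prefix, local_id)."""
    prefix, separator, local_id = curie.partition(":")
    if not separator or not prefix or not local_id:
        raise ValueError(f"not a CURIE of the form prefix:local_id: {curie!r}")
    return prefix, local_id


archive, run_id = split_curie("SRA:SRR11610494")
print(archive)  # SRA
print(run_id)   # SRR11610494
```
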
Field | Definition | Example |
---|---|---|
Forward PCR target | Position of the most distal nucleotide templated by the forward primer or primer mix | IGHV, +23 |
PCR target | Designation of the target locus. Note that this field uses a controlled vocabulary that is meant to provide a generic classification of the locus, not necessarily the correct designation according to a specific nomenclature. | IGK |
Reverse PCR target | Position of the most proximal nucleotide templated by the reverse primer or primer mix | IGHG, +57 |
Field | Definition | Example |
---|---|---|
Receptor deleted gene germline set | GermlineSet from which it was taken (issuer/name/version) | |
Receptor deleted gene name | The accepted name for this gene, taken from the GermlineSet | |
Receptor deleted gene phasing | Chromosomal phasing indicator. Alleles with the same value are inferred to be located on the same chromosome | |
Receptor documented allele germline set | GermlineSet from which it was taken, referenced in standardized form (Repo:Label:Version) | OGRDB:Human_IGH:2021.11 |
Receptor documented allele name | The accepted name for this allele, taken from the GermlineSet | |
Receptor documented allele phasing | Chromosomal phasing indicator. Alleles with the same value are inferred to be located on the same chromosome | |
Receptor genotype ID | A unique identifier within the file for this Receptor Genotype, typically generated by the repository hosting the schema, for example from the underlying ID of the database record | |
Receptor genotype inference process | Information on how the genotype was acquired. Controlled vocabulary. | repertoire_sequencing |
Receptor genotype locus | | IGH |
Receptor undocumented allele name | Allele name as allocated by the inference pipeline | |
Receptor undocumented allele phasing | Chromosomal phasing indicator. Alleles with the same value are inferred to be located on the same chromosome | |
Receptor undocumented allele sequence | nt sequence of the allele, as provided by the inference pipeline |
Field | Definition | Example |
---|---|---|
MHC gene symbol | The accepted designation of an allele, usually its gene symbol plus allele/sub-allele/etc identifiers, if provided by the mhc_typing method | |
MHC genotype class | Class of MHC alleles described by the MHCGenotype | MHC-I |
MHC genotype ID | A unique identifier for this MHCGenotype, assumed to be unique in the context of the study | |
MHC genotype inference process | The MHC gene to which the described allele belongs (Ontology ID) | MRO:0000046, HLA-A |
MHC genotype inference process | Repository and list from which it was taken (issuer/name/version) | |
MHC genotype inference process | Information on how the genotype was determined. The content of this field should come from a list of recommended terms provided in the AIRR Schema documentation. | pcr_low_resolution |
MHC germline set | The MHC gene to which the described allele belongs | MRO:0000046, HLA-A |
Field | Definition | Example |
---|---|---|
Receptor genotype set ID | A unique identifier for this Receptor Genotype Set, typically generated by the repository hosting the schema, for example from the underlying ID of the database record |
Field | Definition | Example |
---|---|---|
MHC genotype set ID | A unique identifier for this MHCGenotypeSet |
Field | Definition | Example |
---|---|---|
Cells | ||
Clones | ||
Repository | ||
Sequences |
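
Several Rearrangement fields in the table that follows share conventions that are easy to get wrong in code: positions are 1-based closed intervals, quality strings are Sanger/Phred scores encoded as ASCII 33 to 126, and the junction is the CDR3 plus its two flanking conserved codons. The sketch below illustrates each convention using the junction example from that table. The helper names (`slice_closed`, `decode_phred33`, `translate`) are hypothetical, the translation uses the standard genetic code, and reporting unrecognised codons as `X` is a choice made for this example.

```python
from itertools import product


def slice_closed(sequence, start, end):
    """Return the subsequence for 1-based, closed (inclusive) coordinates."""
    return sequence[start - 1:end]


def decode_phred33(quality):
    """Convert a Sanger/Phred+33 quality string into integer scores."""
    return [ord(character) - 33 for character in quality]


# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate complete codons; codons not in the table become 'X'."""
    usable = len(nucleotides) - len(nucleotides) % 3
    codons = (nucleotides[i:i + 3].upper() for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


junction = "TGTGCAAGAGCGGGAGTTTACGACGGATATACTATGGACTACTGG"
print(translate(junction))       # CARAGVYDGYTMDYW
print(len(junction))             # 45

# Dropping the two flanking conserved codons leaves the CDR3.
cdr3 = slice_closed(junction, 4, len(junction) - 3)
print(translate(cdr3))           # ARAGVYDGYTMDY

print(decode_phred33("!5I~"))    # [0, 20, 40, 93]
```

In Python terms, a 1-based closed interval `[start, end]` corresponds to the slice `[start - 1:end]`, which is the only adjustment `slice_closed` makes.
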
Field | Definition | Example |
---|---|---|
C Alignment End | End position of the C gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). | |
C Alignment Start | Start position of the C gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). | |
C Cigar | CIGAR string for the C gene alignment. | |
C Germline Alignment | Aligned constant region germline sequence spanning the same region as the c_sequence_alignment field and including the same set of corrections and spacers (if any). | |
C Germline Alignment AA | Amino acid translation of the c_germline_alignment field. | |
C Germline End | Alignment end position in the C gene reference sequence (1-based closed interval). | |
C Germline Start | Alignment start position in the C gene reference sequence (1-based closed interval). | |
C Identity | Fractional identity for the C gene alignment. | |
C Region | Constant region gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHG1*01 if using IMGT/GENE-DB). | IGHG1*01 |
C Score | Alignment score for the C gene alignment. | |
C Sequence Alignment | Aligned portion of query sequence assigned to the constant region, including any indel corrections or numbering spacers. | |
C Sequence Alignment AA | Amino acid translation of the c_sequence_alignment field. | |
C Sequence End | End position of the C gene in the query sequence (1-based closed interval). | |
C Sequence Start | Start position of the C gene in the query sequence (1-based closed interval). | |
C Support | C gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the C gene assignment as defined by the alignment tool. | |
CDR1 | Nucleotide sequence of the aligned CDR1 region. | |
CDR1 AA | Amino acid translation of the cdr1 field. | |
CDR1 End | CDR1 end position in the query sequence (1-based closed interval). | |
CDR1 Start | CDR1 start position in the query sequence (1-based closed interval). | |
CDR2 | Nucleotide sequence of the aligned CDR2 region. | |
CDR2 AA | Amino acid translation of the cdr2 field. | |
CDR2 End | CDR2 end position in the query sequence (1-based closed interval). | |
CDR2 Start | CDR2 start position in the query sequence (1-based closed interval). | |
CDR3 | Nucleotide sequence of the aligned CDR3 region. | |
CDR3 AA | Amino acid translation of the cdr3 field. | |
CDR3 End | CDR3 end position in the query sequence (1-based closed interval). | |
CDR3 Start | CDR3 start position in the query sequence (1-based closed interval). | |
Cell Index | Identifier defining the cell of origin for the query sequence. | W06_046_091 |
Clone ID | Clonal cluster assignment for the query sequence. | |
Complete Vdj | True if the sequence alignment spans the entire V(D)J region. Meaning, sequence_alignment includes both the first V gene codon that encodes the mature polypeptide chain (i.e., after the leader sequence) and the last complete codon of the J gene (i.e., before the J-C splice site). This does not require an absence of deletions within the internal FWR and CDR regions of the alignment. | |
Consensus Count | Number of reads contributing to the UMI consensus or contig assembly for this sequence. For example, the sum of the number of reads for all UMIs that contribute to the query sequence. | |
D Alignment End | End position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval). | |
D Alignment Start | Start position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval). | |
D Cigar | CIGAR string for the first or only D gene alignment. | |
D Frame | Numerical reading frame (1, 2, 3) of the first or only D gene in the query nucleotide sequence, where frame 1 is relative to the first codon of D gene reference sequence. | |
D Gene With Allele | First or only D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB). | IGHD3-10*01 |
D Germline Alignment | Aligned D gene germline sequence spanning the same region as the d_sequence_alignment field and including the same set of corrections and spacers (if any). | |
D Germline Alignment AA | Amino acid translation of the d_germline_alignment field. | |
D Germline End | Alignment end position in the D gene reference sequence for the first or only D gene (1-based closed interval). | |
D Germline Start | Alignment start position in the D gene reference sequence for the first or only D gene (1-based closed interval). | |
D Identity | Fractional identity for the first or only D gene alignment. | |
D Score | Alignment score for the first or only D gene alignment. | |
D Sequence Alignment | Aligned portion of query sequence assigned to the first or only D gene, including any indel corrections or numbering spacers. | |
D Sequence Alignment AA | Amino acid translation of the d_sequence_alignment field. | |
D Sequence End | End position of the first or only D gene in the query sequence (1-based closed interval). | |
D Sequence Start | Start position of the first or only D gene in the query sequence (1-based closed interval). | |
D Support | D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the first or only D gene as defined by the alignment tool. | |
D2 Alignment End | End position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). | |
D2 Alignment Start | Start position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). | |
D2 Call | Second D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB). | IGHD3-10*01 |
D2 Cigar | CIGAR string for the second D gene alignment. | |
D2 Frame | Numerical reading frame (1, 2, 3) of the second D gene in the query nucleotide sequence, where frame 1 is relative to the first codon of D gene reference sequence. | |
D2 Germline Alignment | Aligned D gene germline sequence spanning the same region as the d2_sequence_alignment field and including the same set of corrections and spacers (if any). | |
D2 Germline Alignment AA | Amino acid translation of the d2_germline_alignment field. | |
D2 Germline End | Alignment end position in the second D gene reference sequence (1-based closed interval). | |
D2 Germline Start | Alignment start position in the second D gene reference sequence (1-based closed interval). | |
D2 Identity | Fractional identity for the second D gene alignment. | |
D2 Score | Alignment score for the second D gene alignment. | |
D2 Sequence Alignment | Aligned portion of query sequence assigned to the second D gene, including any indel corrections or numbering spacers. | |
D2 Sequence Alignment AA | Amino acid translation of the d2_sequence_alignment field. | |
D2 Sequence End | End position of the second D gene in the query sequence (1-based closed interval). | |
D2 Sequence Start | Start position of the second D gene in the query sequence (1-based closed interval). | |
D2 Support | D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the second D gene as defined by the alignment tool. | |
Data Processing ID | Identifier of the data processing object in the repertoire metadata for this rearrangement. If this field is empty then the primary data processing object is assumed. | |
FWR1 | Nucleotide sequence of the aligned FWR1 region. | |
FWR1 AA | Amino acid translation of the fwr1 field. | |
FWR1 End | FWR1 end position in the query sequence (1-based closed interval). | |
FWR1 Start | FWR1 start position in the query sequence (1-based closed interval). | |
FWR2 | Nucleotide sequence of the aligned FWR2 region. | |
FWR2 AA | Amino acid translation of the fwr2 field. | |
FWR2 End | FWR2 end position in the query sequence (1-based closed interval). | |
FWR2 Start | FWR2 start position in the query sequence (1-based closed interval). | |
FWR3 | Nucleotide sequence of the aligned FWR3 region. | |
FWR3 AA | Amino acid translation of the fwr3 field. | |
FWR3 End | FWR3 end position in the query sequence (1-based closed interval). | |
FWR3 Start | FWR3 start position in the query sequence (1-based closed interval). | |
FWR4 | Nucleotide sequence of the aligned FWR4 region. | |
FWR4 AA | Amino acid translation of the fwr4 field. | |
FWR4 End | FWR4 end position in the query sequence (1-based closed interval). | |
FWR4 Start | FWR4 start position in the query sequence (1-based closed interval). | |
Gene Locus | Gene locus (chain type). Note that this field uses a controlled vocabulary that is meant to provide a generic classification of the locus, not necessarily the correct designation according to a specific nomenclature. | IGH |
Germline Alignment | Assembled, aligned, full-length inferred germline sequence spanning the same region as the sequence_alignment field (typically the V(D)J region) and including the same set of corrections and spacers (if any). | |
Germline Alignment AA | Amino acid translation of the assembled germline sequence. | |
Germline Database | Source of germline V(D)J genes with version number or date accessed. | ENSEMBL, Homo sapiens build 90, 2017-10-01 |
J Alignment End | End position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). | |
J Alignment Start | Start position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). | |
J Cigar | CIGAR string for the J gene alignment. | |
J Frameshift | True if the J gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the J gene reference sequence. | |
J Gene With Allele | J gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHJ4*02 if using IMGT/GENE-DB). | IGHJ4*02 |
J Germline Alignment | Aligned J gene germline sequence spanning the same region as the j_sequence_alignment field and including the same set of corrections and spacers (if any). | |
J Germline Alignment AA | Amino acid translation of the j_germline_alignment field. | |
J Germline End | Alignment end position in the J gene reference sequence (1-based closed interval). | |
J Germline Start | Alignment start position in the J gene reference sequence (1-based closed interval). | |
J Identity | Fractional identity for the J gene alignment. | |
J Score | Alignment score for the J gene alignment. | |
J Sequence Alignment | Aligned portion of query sequence assigned to the J gene, including any indel corrections or numbering spacers. | |
J Sequence Alignment AA | Amino acid translation of the j_sequence_alignment field. | |
J Sequence End | End position of the J gene in the query sequence (1-based closed interval). | |
J Sequence Start | Start position of the J gene in the query sequence (1-based closed interval). | |
J Support | J gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the J gene assignment as defined by the alignment tool. | |
Junction Length | Number of nucleotides in the junction sequence. | |
Junction Length (AA) | Number of amino acids in the junction sequence. | |
Junction/CDR3 AA | Amino acid translation of the junction. | CARAGVYDGYTMDYW |
Junction/CDR3 NT | Junction region nucleotide sequence, where the junction is defined as the CDR3 plus the two flanking conserved codons. | TGTGCAAGAGCGGGAGTTTACGACGGATATACTATGGACTACTGG |
N1 Length | Number of untemplated nucleotides 5' of the first or only D gene alignment. | |
N2 Length | Number of untemplated nucleotides 3' of the first or only D gene alignment. | |
N3 Length | Number of untemplated nucleotides 3' of the second D gene alignment. | |
Np1 | Nucleotide sequence of the combined N/P region between the V gene and first D gene alignment or between the V gene and J gene alignments. | |
Np1 AA | Amino acid translation of the np1 field. | |
Np1 Length | Number of nucleotides between the V gene and first D gene alignments or between the V gene and J gene alignments. | |
Np2 | Nucleotide sequence of the combined N/P region between either the first D gene and J gene alignments or the first D gene and second D gene alignments. | |
Np2 AA | Amino acid translation of the np2 field. | |
Np2 Length | Number of nucleotides between either the first D gene and J gene alignments or the first D gene and second D gene alignments. | |
Np3 | Nucleotide sequence of the combined N/P region between the second D gene and J gene alignments. | |
Np3 AA | Amino acid translation of the np3 field. | |
Np3 Length | Number of nucleotides between the second D gene and J gene alignments. | |
P3D Length | Number of palindromic nucleotides 3' of the first or only D gene alignment. | |
P3D2 Length | Number of palindromic nucleotides 3' of the second D gene alignment. | |
P3V Length | Number of palindromic nucleotides 3' of the V gene alignment. | |
P5D Length | Number of palindromic nucleotides 5' of the first or only D gene alignment. | |
P5D2 Length | Number of palindromic nucleotides 5' of the second D gene alignment. | |
P5J Length | Number of palindromic nucleotides 5' of the J gene alignment. | |
Productive | True if the V(D)J sequence is predicted to be productive. | |
Quality | Sanger/Phred quality scores for assessment of sequence quality. Phred quality scores from 0 to 93 are encoded using ASCII 33 to 126 (the encoding used by Illumina from v1.8). | |
Quality Alignment | Sanger/Phred quality scores for assessment of sequence_alignment quality. Phred quality scores from 0 to 93 are encoded using ASCII 33 to 126 (the encoding used by Illumina from v1.8). | |
Read Count | Copy number or number of duplicate observations for the query sequence. For example, the number of identical reads observed for this sequence. | 123 |
Rearrangement ID | Identifier for the Rearrangement object. May be identical to sequence_id, but will usually be a universally unique record locator for database applications. | |
Rearrangement Set ID | Identifier for grouping Rearrangement objects. | |
Repertoire ID | Identifier of the associated repertoire in study metadata. | |
Rev Comp | True if the alignment is on the opposite strand (reverse complemented) with respect to the query sequence. If True then all output data, such as alignment coordinates and sequences, are based on the reverse complement of 'sequence'. | |
Sample Processing ID | Identifier of the sample processing object in the repertoire metadata for this rearrangement. If the repertoire has a single sample then this field may be empty or missing. If the repertoire has multiple samples then this field may be empty or missing if the sample cannot be differentiated or the relationship is not maintained by the data processing. | |
Sequence | The query nucleotide sequence. Usually, this is the unmodified input sequence, which may be reverse complemented if necessary. In some cases, this field may contain consensus sequences or other types of collapsed input sequences if these steps are performed prior to alignment. | |
Sequence AA | Amino acid translation of the query nucleotide sequence. | |
Sequence Alignment | Aligned portion of query sequence, including any indel corrections or numbering spacers, such as IMGT-gaps. Typically, this will include only the V(D)J region, but that is not a requirement. | |
Sequence Alignment AA | Amino acid translation of the aligned query sequence. | |
Sequence ID | Unique query sequence identifier for the Rearrangement. Most often this will be the input sequence header or a substring thereof, but may also be a custom identifier defined by the tool in cases where query sequences have been combined in some fashion prior to alignment. When downloaded from an AIRR Data Commons repository, this will usually be a universally unique record locator for linking with other objects in the AIRR Data Model. | |
Stop Codon | True if the aligned sequence contains a stop codon. | |
UMI Count | Number of distinct UMIs represented by this sequence. For example, the total number of UMIs that contribute to the contig assembly for the query sequence. | |
V Alignment End | End position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). | |
V Alignment Start | Start position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval). | |
V Cigar | CIGAR string for the V gene alignment. | |
V Frameshift | True if the V gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the V gene reference sequence. | |
V Gene With Allele | V gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHV4-59*01 if using IMGT/GENE-DB). | IGHV4-59*01 |
V Germline Alignment | Aligned V gene germline sequence spanning the same region as the v_sequence_alignment field and including the same set of corrections and spacers (if any). | |
V Germline Alignment AA | Amino acid translation of the v_germline_alignment field. | |
V Germline End | Alignment end position in the V gene reference sequence (1-based closed interval). | |
V Germline Start | Alignment start position in the V gene reference sequence (1-based closed interval). | |
V Identity | Fractional identity for the V gene alignment. | |
V Score | Alignment score for the V gene. | |
V Sequence Alignment | Aligned portion of query sequence assigned to the V gene, including any indel corrections or numbering spacers. | |
V Sequence Alignment AA | Amino acid translation of the v_sequence_alignment field. | |
V Sequence End | End position of the V gene in the query sequence (1-based closed interval). | |
V Sequence Start | Start position of the V gene in the query sequence (1-based closed interval). | |
V Support | V gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the V gene assignment as defined by the alignment tool. | |
VJ In Frame | True if the V and J gene alignments are in-frame. |
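
Three conventions in the table above are easy to misread when working with downloaded rearrangement data: the 1-based closed intervals used for start and end positions, the Phred+33 encoding of the quality fields, and the relationship between the junction nucleotide and amino acid fields. The following Python sketch illustrates each one. The helper names (`closed_interval`, `decode_phred`, `translate`) are hypothetical and are not part of any AIRR or iReceptor library; the translation assumes the standard genetic code and marks unrecognized codons as `X`.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def closed_interval(sequence, start, end):
    """Return the subsequence for a 1-based closed interval [start, end]."""
    # Python slices are 0-based and half-open, so only the start shifts by one.
    return sequence[start - 1:end]


def decode_phred(quality):
    """Convert a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in quality]


def translate(nucleotides):
    """Translate complete codons; a trailing partial codon is ignored."""
    usable = len(nucleotides) - len(nucleotides) % 3
    return "".join(
        CODON_TABLE.get(nucleotides[i:i + 3].upper(), "X")
        for i in range(0, usable, 3)
    )


# Example values taken from the Junction/CDR3 NT row of the table.
junction = "TGTGCAAGAGCGGGAGTTTACGACGGATATACTATGGACTACTGG"
junction_aa = translate(junction)

print(junction_aa)                        # CARAGVYDGYTMDYW
print(len(junction), len(junction_aa))    # 45 15
print(closed_interval("ACGTACGT", 2, 4))  # CGT
print(decode_phred("!I~"))                # [0, 40, 93]
```

The two lengths printed for the junction correspond to the Junction Length and Junction Length (AA) fields, and the translated string matches the Junction/CDR3 AA example given in the table.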