Data elements definitions

The terms describing the data stored in the network of AIRR-seq repositories federated by the iReceptor Gateway are driven by the recommendations of the AIRR Community: the AIRR Minimal Standards (MiAIRR) for study metadata and the AIRR Rearrangement schema representing annotated rearrangements. For more information, visit the AIRR Community documentation page.

The definitions and relevant examples for each field are given below:

Repertoire

Example
Repertoire Description Generic repertoire description
Repertoire ID Identifier for the repertoire object. This identifier should be globally unique so that repertoires from multiple studies can be combined together without conflict. The repertoire_id is used to link other AIRR data to a Repertoire. Specifically, the Rearrangements Schema includes repertoire_id for referencing the specific Repertoire for that Rearrangement.
Repertoire Name Short generic display name for the repertoire
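
As a minimal sketch of how repertoire_id ties Rearrangement records back to a Repertoire, the Python below groups rearrangement records by that field. The sample records and the helper name group_by_repertoire are illustrative assumptions and are not part of the AIRR schema or the iReceptor Gateway.

```python
from collections import defaultdict


def group_by_repertoire(rearrangements):
    """Group rearrangement records (dicts) by their repertoire_id value."""
    grouped = defaultdict(list)
    for record in rearrangements:
        grouped[record["repertoire_id"]].append(record)
    return dict(grouped)


# Hypothetical records; only the keys repertoire_id and sequence_id
# are field names taken from the definitions in this document.
rearrangements = [
    {"sequence_id": "seq1", "repertoire_id": "rep-A"},
    {"sequence_id": "seq2", "repertoire_id": "rep-B"},
    {"sequence_id": "seq3", "repertoire_id": "rep-A"},
]

by_repertoire = group_by_repertoire(rearrangements)
```

Because the identifier should be globally unique, the same grouping can be applied to rearrangements pooled from several studies without keys colliding.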

Study

Example
ADC Publish Date Date the study was first published in the AIRR Data Commons. 2021-02-02
ADC Update Date Date the study data was updated in the AIRR Data Commons. 2021-02-02
Contact (collection) Full contact information of the contact persons for this study. This should include an e-mail address and a persistent identifier such as an ORCID ID. Dr. P. Stibbons, p.stibbons@unseenu.edu, https://orcid.org/0000-0002-1825-0097
Contact (collection) Full contact information of the data collector, i.e. the person who is legally responsible for data collection and release. This should include an e-mail address and a persistent identifier such as an ORCID ID. Dr. P. Stibbons, p.stibbons@unseenu.edu, https://orcid.org/0000-0002-1825-0097
Contact (deposition) Full contact information of the data depositor, i.e., the person submitting the data to a repository. This should include an e-mail address and a persistent identifier such as an ORCID ID. This is supposed to be a short-lived and technical role until the submission is released. Adrian Turnipseed, a.turnipseed@unseenu.edu, https://orcid.org/0000-0002-1825-0097
Funding Funding agencies and grant numbers NIH, award number R01GM987654
Inclusion criteria List of criteria for inclusion/exclusion for the study Include: Clinical P. falciparum infection; Exclude: Seropositive for HIV
Lab address Institution and institutional address of data collector School of Medicine, Unseen University, Ankh-Morpork, Disk World
Lab name Department of data collector Department for Planar Immunology
Publications Publications describing the rationale and/or outcome of the study. Wherever possible, a persistent identifier such as a DOI or a PubMed ID should be used. PMID:85642
Study Description Generic study description Longer description
Study ID Unique ID assigned by study registry such as one of the International Nucleotide Sequence Database Collaboration (INSDC) repositories. PRJNA001
Study keywords Keywords describing properties of one or more data sets in a study ['contains_ig', 'contains_schema_rearrangement', 'contains_schema_clone', 'contains_schema_cell']
Study title Descriptive study title Effects of sun light exposure of the Treg repertoire
Study type Type of study design NCIT:C15197, Case-Control Study
Study type (Ontology ID) Type of study design (Ontology ID) NCIT:C15197, Case-Control Study

Subject

Example
Age (deprecated)
Age event Event in the study schedule to which `Age` refers. For NCBI BioSample this MUST be `sampling`. For other implementations, submitters need to be aware that there is currently no mechanism to encode the potential delta between `Age event` and `Sample collection time`, hence the chosen events should be in temporal proximity. enrollment
Age maximum Upper boundary of age range or equal to age_min for specific age. This field should only be null if age_min is null. 80
Age minimum Specific age or lower boundary of age range. 60
Age unit Unit of age range UO:0000036, year
Age unit (Ontology ID) Unit of age range (Ontology ID) UO:0000036, year
Ancestry Broad geographic origin of ancestry (continent) list of continents, mixed or unknown
Ethnicity Ethnic group of subject (defined as cultural/language-based membership) English, Kurds, Manchu, Yakuts (and other fields from Wikipedia)
Organism Binomial designation of subject's species NCBITAXON:9606, Homo sapiens
Organism (deprecated) Binomial designation of subject's species
Organism (deprecated) Binomial designation of subject's species (Ontology ID)
Organism (Ontology ID) Binomial designation of subject's species (Ontology ID) NCBITAXON:9606, Homo sapiens
Race Racial group of subject (as defined by NIH) White, American Indian or Alaska Native, Black, Asian, Native Hawaiian or Other Pacific Islander, Other
Relation (subjects) Subject ID to which `Relation type` refers SUB1355648
Relation type Relation between subject and `linked_subjects`, can be genetic or environmental (e.g. exposure) father, daughter, household
Sex Biological sex of subject female
Strain name Non-human designation of the strain or breed of animal used C57BL/6J
Subject ID Subject ID assigned by submitter, unique within study. If possible, a persistent subject ID linked to an INSDC or similar repository study should be used. SUB856413
Synthetic library TRUE for libraries in which the diversity has been synthetically generated (e.g. phage display)
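
The rules stated for Age minimum and Age maximum above can be sketched as a small consistency check. The function name check_age_range is hypothetical, and the ordering check (maximum not below minimum) is an assumption inferred from the words "lower boundary" and "upper boundary" rather than a rule stated in the definitions.

```python
def check_age_range(age_min, age_max):
    """Return a list of problems found in an age_min / age_max pair."""
    problems = []
    # Stated rule: age_max should only be null if age_min is null.
    if age_max is None and age_min is not None:
        problems.append("age_max should only be null if age_min is null")
    # Assumed rule: the upper boundary must not be below the lower boundary.
    if age_min is not None and age_max is not None and age_max < age_min:
        problems.append("age_max is below age_min")
    return problems


range_ok = check_age_range(60, 80)       # an age range of 60 to 80
specific_ok = check_age_range(35, 35)    # a specific age: max equals min
missing_max = check_age_range(60, None)  # violates the stated rule
```

A specific age is represented by setting both fields to the same value, which is why the second call reports no problems.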

Diagnosis

Example
Diagnosis Diagnosis of subject DOID:9538, multiple myeloma
Diagnosis (Ontology ID) Diagnosis of subject (Ontology ID) DOID:9538, multiple myeloma
Disease stage Stage of disease at current intervention Stage II
Immunogen/agent Antigen, vaccine or drug applied to subject at this intervention bortezomib
Intervention Description of intervention systemic chemotherapy, 6 cycles, 1.25 mg/m2
Length of disease Time duration between initial diagnosis and current intervention 23 months
Medical history Medical history of subject that is relevant to assess the course of disease and/or treatment MGUS, first diagnosed 5 years prior
Prior therapies List of all relevant previous therapies applied to subject for treatment of `Diagnosis` melphalan/prednisone
Study group Designation of study arm to which the subject is assigned control

Data Processing

Example
Analysis ID Identifier for machine-readable PROV model of analysis provenance
Collapsing method The method used for combining multiple sequences from (4) into a single sequence in (5) MUSCLE 3.8.31
Data processing ID Identifier for the data processing object.
Data processing protocols General description of how QC is performed Data was processed using [...]
Germline ID Unique identifier of the germline set and version, in standardized form (Repo:Label:Version) OGRDB:Human_IGH:2021.11
Paired read assembly How paired end reads were assembled into a single receptor sequence PandaSeq (minimal overlap 50, threshold 0.8)
Primary annotation If true, indicates this is the primary or default data processing for the repertoire and its rearrangements. If false, indicates this is a secondary or additional data processing.
Primer match cutoffs How primers were identified in the sequences, were they removed/masked/etc? Hamming distance <= 2
Processed files Array of file names for data produced by this data processing. ['ERR1278153_aa.txz', 'ERR1278153_ab.txz', 'ERR1278153_ac.txz']
Quality thresholds How sequences were removed from (4) based on base quality scores Average Phred score >=20
Software tools/versions Version number and / or date, include company pipelines IgBLAST 1.6
V(D)J germline database Source of germline V(D)J genes with version number or date accessed. ENSEMBL, Homo sapiens build 90, 2017-10-01
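
The Germline ID above is described as having the standardized form Repo:Label:Version. A minimal sketch of splitting such a value into its three parts follows. The names GermlineId and parse_germline_id are hypothetical, and the sketch assumes that none of the three parts contains a colon.

```python
from typing import NamedTuple


class GermlineId(NamedTuple):
    repo: str
    label: str
    version: str


def parse_germline_id(value):
    """Split a 'Repo:Label:Version' string into its three components."""
    parts = value.split(":")
    # Assumption: exactly three non-empty, colon-free components.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected Repo:Label:Version, got {value!r}")
    return GermlineId(*parts)


germline = parse_germline_id("OGRDB:Human_IGH:2021.11")
```

The same helper would apply to the Receptor documented allele germline set field, which uses the same form.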

Sample Processing

Example
# cells/experiment Total number of cells that went into the experiment 1000000
# cells/sequencing reaction Number of cells for each biological replicate 50000
Anatomic site The anatomic location of the tissue, e.g. Inguinal, femur Iliac crest
Batch number ID of sequencing run assigned by the sequencing facility 160101_M01234
Biomaterial provider Name and address of the entity providing the sample Tissues-R-Us, Tampa, FL, USA
Cell isolation procedure Description of the procedure used for marker-based isolation or enrichment of cells Cells were stained with fluorochrome labeled antibodies and then sorted on a FlowMerlin (CE) cytometer.
Cell quality Relative amount of viable cells after preparation and (if applicable) thawing 90% viability as determined by 7-AAD
Cell species Binomial designation of the species from which the analyzed cells originate. Typically, this value should be identical to `species`, in which case it SHOULD NOT be set explicitly. However, there are valid experimental setups in which the two might differ, e.g., chimeric animal models. If set, this key will overwrite the `species` information for all lower layers of the schema. NCBITAXON:9606, Homo sapiens
Cell species (Ontology ID) Binomial designation of the species from which the analyzed cells originate. Typically, this value should be identical to `species`, in which case it SHOULD NOT be set explicitly. However, there are valid experimental setups in which the two might differ, e.g., chimeric animal models. If set, this key will overwrite the `species` information for all lower layers of the schema. (Ontology ID) NCBITAXON:9606, Homo sapiens
Cell storage TRUE if cells were cryo-preserved between isolation and further processing TRUE
Cell subset Commonly-used designation of isolated cell population CL:0000972, class switched memory B cell
Cell subset (Ontology ID) Commonly-used designation of isolated cell population (Ontology ID) CL:0000972, class switched memory B cell
Cell subset phenotype List of cellular markers and their expression levels used to isolate the cell population CD19+ CD38+ CD27+ IgM- IgD-
Collection event Unit of Sample collection time UO:0000033, day
Collection event Unit of Sample collection time (Ontology ID) UO:0000033, day
Collection event Event in the study schedule to which `Sample collection time` relates Primary vaccination
Collection time Time point at which sample was taken, relative to `Collection time event` 14
Complete sequences To be considered `complete`, the procedure used for library construction MUST generate sequences that 1) include the first V gene codon that encodes the mature polypeptide chain (i.e. after the leader sequence) and 2) include the last complete codon of the J gene (i.e. 1 bp 5' of the J->C splice site) and 3) provide sequence information for all positions between 1) and 2). To be considered `complete & untemplated`, the sections of the sequences defined in points 1) to 3) of the previous sentence MUST be untemplated, i.e. MUST NOT overlap with the primers used in library preparation. `mixed` should only be used if the procedure used for library construction will likely produce multiple categories of sequences in the given experiment. It SHOULD NOT be used as a replacement of a NULL value. partial
Date of sequencing run Date of sequencing run 2016-12-16
Disease state Histopathologic evaluation of the sample Tumor infiltration
Library generation method Generic type of library generation RT(oligo-dT)+TS(UMI)+PCR
Library generation protocol Description of processes applied to substrate to obtain a library that is ready for sequencing cDNA was generated using
Linkage of loci In case an experimental setup is used that physically links nucleic acids derived from distinct `Rearrangements` before library preparation, this field describes the mode of that linkage. All `hetero_*` terms indicate that in case of paired-read sequencing, the two reads should be expected to map to distinct IG/TR loci. `*_head-head` refers to techniques that link the 5' ends of transcripts in a single-cell context. `*_tail-head` refers to techniques that link the 3' end of one transcript to the 5' end of another one in a single-cell context. This term does not provide any information whether a continuous reading-frame between the two is generated. `*_prelinked` refers to constructs in which the linkage was already present on the DNA level (e.g. scFv). hetero_head-head
Processing protocol Description of the methods applied to the sample including cell preparation/ isolation/enrichment and nucleic acid extraction. This should closely mirror the Materials and methods section in the manuscript. Stimulated with anti-CD3/anti-CD28
Protocol IDs When using a library generation protocol from a commercial provider, provide the protocol version number v2.1 (2016-09-15)
Reads passing QC Number of usable reads for analysis 10365118
Sample ID Sample ID assigned by submitter, unique within study. If possible, a persistent sample ID linked to INSDC or similar repository study should be used. SUP52415
Sample Processing ID Identifier for the sample processing object. This field should be unique within the repertoire. This field can be used to uniquely identify the combination of sample, cell processing, nucleic acid processing and sequencing run information for the repertoire.
Sample type The way the sample was obtained, e.g. fine-needle aspirate, organ harvest, peripheral venous puncture Biopsy
Sequencing facility Name and address of sequencing facility Seqs-R-Us, Vancouver, BC, Canada
Sequencing kit Name, manufacturer, order and lot numbers of sequencing kit FullSeq 600, Alumina, #M123456C0, 789G1HK
Sequencing platform Designation of sequencing instrument used Alumina LoSeq 1000
Single-cell sort TRUE if single cells were isolated into separate compartments
Target substrate The class of nucleic acid that was used as primary starting material for the following procedures RNA
Target substrate quality Description and results of the quality control performed on the template material RIN 9.2
Template amount Amount of template that went into the process 1000
Template amount time unit Unit of template amount UO:0000024, nanogram
Template amount time unit (Ontology ID) Unit of template amount (Ontology ID) UO:0000024, nanogram
Tissue The actual tissue sampled, e.g. lymph node, liver, peripheral blood UBERON:0002371, bone marrow
Tissue (Ontology ID) The actual tissue sampled, e.g. lymph node, liver, peripheral blood (Ontology ID) UBERON:0002371, bone marrow
Tissue processing Enzymatic digestion and/or physical methods used to isolate cells from sample Collagenase A/Dnase I digested, followed by Percoll gradient
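
The Cell species definition above says that, when set, the value overrides the subject-level `species` for lower layers of the schema. One way to read that rule is sketched below. The function name effective_species and the chimeric-model example values are illustrative assumptions.

```python
def effective_species(subject_species, cell_species=None):
    """Return the species that applies to the analyzed cells.

    cell_species is normally left unset; when it is set, it takes
    precedence over the subject-level species.
    """
    return cell_species if cell_species is not None else subject_species


# Typical case: cell species not set, so the subject species applies.
typical = effective_species("NCBITAXON:9606")

# Hypothetical chimeric model: mouse subject carrying human cells.
chimeric = effective_species("NCBITAXON:10090", cell_species="NCBITAXON:9606")
```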

Sequencing Data

Example
File name for the index file MS10R-NMonson-C7JR9_S1_R3_001.fastq
Read length in bases for the index file 8
Forward read length Read length in bases for the first file in paired-read sequencing 300
Paired read direction Read direction for the second file in paired-read sequencing reverse
Paired read length Read length in bases for the second file in paired-read sequencing 300
Paired sequencing file name File name for the second file in paired-read sequencing MS10R-NMonson-C7JR9_S1_R2_001.fastq
Raw Data PID Persistent identifier of raw data stored in an archive (e.g. INSDC run ID). Data archive should be identified in the CURIE prefix. SRA:SRR11610494
Read direction Read direction for the raw reads or sequences. The first file in paired-read sequencing. forward
Sequencing file name File name for the raw reads or sequences. The first file in paired-read sequencing. MS10R-NMonson-C7JR9_S1_R1_001.fastq
Sequencing file type File format for the raw reads or sequences
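
The Raw Data PID above identifies the data archive in a CURIE prefix. A minimal sketch of separating that prefix from the archive-local identifier follows. The helper name split_curie is hypothetical, and the sketch assumes the first colon separates the two parts.

```python
def split_curie(pid):
    """Split a CURIE such as 'SRA:SRR11610494' into (prefix, local_id)."""
    prefix, separator, local_id = pid.partition(":")
    if not separator or not prefix or not local_id:
        raise ValueError(f"not a prefixed identifier: {pid!r}")
    return prefix, local_id


archive, run_id = split_curie("SRA:SRR11610494")
```

The ontology-backed fields in this document (for example NCBITAXON:9606 or UO:0000036) use the same prefix:identifier shape, so the helper would apply to them as well.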

PCR Target

Example
Forward PCR target Position of the most distal nucleotide templated by the forward primer or primer mix IGHV, +23
PCR target Designation of the target locus. Note that this field uses a controlled vocabulary that is meant to provide a generic classification of the locus, not necessarily the correct designation according to a specific nomenclature. IGK
Reverse PCR target Position of the most proximal nucleotide templated by the reverse primer or primer mix IGHG, +57

Receptor Genotype

Example
Receptor deleted gene germline set GermlineSet from which it was taken (issuer/name/version)
Receptor deleted gene name The accepted name for this gene, taken from the GermlineSet
Receptor deleted gene phasing Chromosomal phasing indicator. Alleles with the same value are inferred to be located on the same chromosome
Receptor documented allele germline set GermlineSet from which it was taken, referenced in standardized form (Repo:Label:Version) OGRDB:Human_IGH:2021.11
Receptor documented allele name The accepted name for this allele, taken from the GermlineSet
Receptor documented allele phasing Chromosomal phasing indicator. Alleles with the same value are inferred to be located on the same chromosome
Receptor genotype ID A unique identifier within the file for this Receptor Genotype, typically generated by the repository hosting the schema, for example from the underlying ID of the database record
Receptor genotype inference process Information on how the genotype was acquired. Controlled vocabulary. repertoire_sequencing
Receptor genotype locus IGH
Receptor undocumented allele name Allele name as allocated by the inference pipeline
Receptor undocumented allele phasing Chromosomal phasing indicator. Alleles with the same value are inferred to be located on the same chromosome
Receptor undocumented allele sequence nt sequence of the allele, as provided by the inference pipeline

MHC Genotype

Example
MHC gene symbol The accepted designation of an allele, usually its gene symbol plus allele/sub-allele/etc identifiers, if provided by the mhc_typing method
MHC genotype class Class of MHC alleles described by the MHCGenotype MHC-I
MHC genotype ID A unique identifier for this MHCGenotype, assumed to be unique in the context of the study
MHC genotype inference process The MHC gene to which the described allele belongs (Ontology ID) MRO:0000046, HLA-A
MHC genotype inference process Repository and list from which it was taken (issuer/name/version)
MHC genotype inference process Information on how the genotype was determined. The content of this field should come from a list of recommended terms provided in the AIRR Schema documentation. pcr_low_resolution
MHC germline set The MHC gene to which the described allele belongs MRO:0000046, HLA-A

Receptor Genotype Set

Example
Receptor genotype set ID A unique identifier for this Receptor Genotype Set, typically generated by the repository hosting the schema, for example from the underlying ID of the database record

MHC Genotype Set

Example
MHC genotype set ID A unique identifier for this MHCGenotypeSet

Other

Example
Cells
Clones
Repository
Sequences

Rearrangement

Example
C Alignment End End position of the C gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
C Alignment Start Start position of the C gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
C Cigar CIGAR string for the C gene alignment.
C Germline Alignment Aligned constant region germline sequence spanning the same region as the c_sequence_alignment field and including the same set of corrections and spacers (if any).
C Germline Alignment AA Amino acid translation of the c_germline_alignment field.
C Germline End Alignment end position in the C gene reference sequence (1-based closed interval).
C Germline Start Alignment start position in the C gene reference sequence (1-based closed interval).
C IDentity Fractional identity for the C gene alignment.
C Region Constant region gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHG1*01 if using IMGT/GENE-DB). IGHG1*01
C Score Alignment score for the C gene alignment.
C Sequence Alignment Aligned portion of query sequence assigned to the constant region, including any indel corrections or numbering spacers.
C Sequence Alignment AA Amino acid translation of the c_sequence_alignment field.
C Sequence End End position of the C gene in the query sequence (1-based closed interval).
C Sequence Start Start position of the C gene in the query sequence (1-based closed interval).
C Support C gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the C gene assignment as defined by the alignment tool.
CDR1 Nucleotide sequence of the aligned CDR1 region.
CDR1 AA Amino acid translation of the cdr1 field.
CDR1 End CDR1 end position in the query sequence (1-based closed interval).
CDR1 Start CDR1 start position in the query sequence (1-based closed interval).
CDR2 Nucleotide sequence of the aligned CDR2 region.
CDR2 AA Amino acid translation of the cdr2 field.
CDR2 End CDR2 end position in the query sequence (1-based closed interval).
CDR2 Start CDR2 start position in the query sequence (1-based closed interval).
CDR3 Nucleotide sequence of the aligned CDR3 region.
CDR3 AA Amino acid translation of the cdr3 field.
CDR3 End CDR3 end position in the query sequence (1-based closed interval).
CDR3 Start CDR3 start position in the query sequence (1-based closed interval).
Cell Index Identifier defining the cell of origin for the query sequence. W06_046_091
Clone ID Clonal cluster assignment for the query sequence.
Complete Vdj True if the sequence alignment spans the entire V(D)J region. That is, sequence_alignment includes both the first V gene codon that encodes the mature polypeptide chain (i.e., after the leader sequence) and the last complete codon of the J gene (i.e., before the J-C splice site). This does not require an absence of deletions within the internal FWR and CDR regions of the alignment.
Consensus Count Number of reads contributing to the UMI consensus or contig assembly for this sequence. For example, the sum of the number of reads for all UMIs that contribute to the query sequence.
D Alignment End End position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval).
D Alignment Start Start position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval).
D Cigar CIGAR string for the first or only D gene alignment.
D Frame Numerical reading frame (1, 2, 3) of the first or only D gene in the query nucleotide sequence, where frame 1 is relative to the first codon of D gene reference sequence.
D Gene With Allele First or only D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB). IGHD3-10*01
D Germline Alignment Aligned D gene germline sequence spanning the same region as the d_sequence_alignment field and including the same set of corrections and spacers (if any).
D Germline Alignment AA Amino acid translation of the d_germline_alignment field.
D Germline End Alignment end position in the D gene reference sequence for the first or only D gene (1-based closed interval).
D Germline Start Alignment start position in the D gene reference sequence for the first or only D gene (1-based closed interval).
D IDentity Fractional identity for the first or only D gene alignment.
D Score Alignment score for the first or only D gene alignment.
D Sequence Alignment Aligned portion of query sequence assigned to the first or only D gene, including any indel corrections or numbering spacers.
D Sequence Alignment AA Amino acid translation of the d_sequence_alignment field.
D Sequence End End position of the first or only D gene in the query sequence (1-based closed interval).
D Sequence Start Start position of the first or only D gene in the query sequence (1-based closed interval).
D Support D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the first or only D gene as defined by the alignment tool.
D2 Alignment End End position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
D2 Alignment Start Start position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
D2 Call Second D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB). IGHD3-10*01
D2 Cigar CIGAR string for the second D gene alignment.
D2 Frame Numerical reading frame (1, 2, 3) of the second D gene in the query nucleotide sequence, where frame 1 is relative to the first codon of D gene reference sequence.
D2 Germline Alignment Aligned D gene germline sequence spanning the same region as the d2_sequence_alignment field and including the same set of corrections and spacers (if any).
D2 Germline Alignment AA Amino acid translation of the d2_germline_alignment field.
D2 Germline End Alignment end position in the second D gene reference sequence (1-based closed interval).
D2 Germline Start Alignment start position in the second D gene reference sequence (1-based closed interval).
D2 IDentity Fractional identity for the second D gene alignment.
D2 Score Alignment score for the second D gene alignment.
D2 Sequence Alignment Aligned portion of query sequence assigned to the second D gene, including any indel corrections or numbering spacers.
D2 Sequence Alignment AA Amino acid translation of the d2_sequence_alignment field.
D2 Sequence End End position of the second D gene in the query sequence (1-based closed interval).
D2 Sequence Start Start position of the second D gene in the query sequence (1-based closed interval).
D2 Support D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the second D gene as defined by the alignment tool.
Data Processing ID Identifier to the data processing object in the repertoire metadata for this rearrangement. If this field is empty then the primary data processing object is assumed.
FWR1 Nucleotide sequence of the aligned FWR1 region.
FWR1 AA Amino acid translation of the fwr1 field.
FWR1 End FWR1 end position in the query sequence (1-based closed interval).
FWR1 Start FWR1 start position in the query sequence (1-based closed interval).
FWR2 Nucleotide sequence of the aligned FWR2 region.
FWR2 AA Amino acid translation of the fwr2 field.
FWR2 End FWR2 end position in the query sequence (1-based closed interval).
FWR2 Start FWR2 start position in the query sequence (1-based closed interval).
FWR3 Nucleotide sequence of the aligned FWR3 region.
FWR3 AA Amino acid translation of the fwr3 field.
FWR3 End FWR3 end position in the query sequence (1-based closed interval).
FWR3 Start FWR3 start position in the query sequence (1-based closed interval).
FWR4 Nucleotide sequence of the aligned FWR4 region.
FWR4 AA Amino acid translation of the fwr4 field.
FWR4 End FWR4 end position in the query sequence (1-based closed interval).
FWR4 Start FWR4 start position in the query sequence (1-based closed interval).
Gene Locus Gene locus (chain type). Note that this field uses a controlled vocabulary that is meant to provide a generic classification of the locus, not necessarily the correct designation according to a specific nomenclature. IGH
Germline Alignment Assembled, aligned, full-length inferred germline sequence spanning the same region as the sequence_alignment field (typically the V(D)J region) and including the same set of corrections and spacers (if any).
Germline Alignment AA Amino acid translation of the assembled germline sequence.
Germline Database Source of germline V(D)J genes with version number or date accessed. ENSEMBL, Homo sapiens build 90, 2017-10-01
J Alignment End End position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
J Alignment Start Start position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
J Cigar CIGAR string for the J gene alignment.
J Frameshift True if the J gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the J gene reference sequence.
J Gene With Allele J gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHJ4*02 if using IMGT/GENE-DB). IGHJ4*02
J Germline Alignment Aligned J gene germline sequence spanning the same region as the j_sequence_alignment field and including the same set of corrections and spacers (if any).
J Germline Alignment AA Amino acid translation of the j_germline_alignment field.
J Germline End Alignment end position in the J gene reference sequence (1-based closed interval).
J Germline Start Alignment start position in the J gene reference sequence (1-based closed interval).
J IDentity Fractional identity for the J gene alignment.
J Score Alignment score for the J gene alignment.
J Sequence Alignment Aligned portion of query sequence assigned to the J gene, including any indel corrections or numbering spacers.
J Sequence Alignment AA Amino acid translation of the j_sequence_alignment field.
J Sequence End End position of the J gene in the query sequence (1-based closed interval).
J Sequence Start Start position of the J gene in the query sequence (1-based closed interval).
J Support J gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the J gene assignment as defined by the alignment tool.
Junction Length Number of nucleotides in the junction sequence.
Junction Length (AA) Number of amino acids in the junction sequence.
Junction/CDR3 AA Amino acid translation of the junction. CARAGVYDGYTMDYW
Junction/CDR3 NT Junction region nucleotide sequence, where the junction is defined as the CDR3 plus the two flanking conserved codons. TGTGCAAGAGCGGGAGTTTACGACGGATATACTATGGACTACTGG
N1 Length Number of untemplated nucleotides 5' of the first or only D gene alignment.
N2 Length Number of untemplated nucleotides 3' of the first or only D gene alignment.
N3 Length Number of untemplated nucleotides 3' of the second D gene alignment.
Np1 Nucleotide sequence of the combined N/P region between the V gene and first D gene alignment or between the V gene and J gene alignments.
Np1 AA Amino acid translation of the np1 field.
Np1 Length Number of nucleotides between the V gene and first D gene alignments or between the V gene and J gene alignments.
Np2 Nucleotide sequence of the combined N/P region between either the first D gene and J gene alignments or the first D gene and second D gene alignments.
Np2 AA Amino acid translation of the np2 field.
Np2 Length Number of nucleotides between either the first D gene and J gene alignments or the first D gene and second D gene alignments.
Np3 Nucleotide sequence of the combined N/P region between the second D gene and J gene alignments.
Np3 AA Amino acid translation of the np3 field.
Np3 Length Number of nucleotides between the second D gene and J gene alignments.
P3D Length Number of palindromic nucleotides 3' of the first or only D gene alignment.
P3D2 Length Number of palindromic nucleotides 3' of the second D gene alignment.
P3V Length Number of palindromic nucleotides 3' of the V gene alignment.
P5D Length Number of palindromic nucleotides 5' of the first or only D gene alignment.
P5D2 Length Number of palindromic nucleotides 5' of the second D gene alignment.
P5J Length Number of palindromic nucleotides 5' of the J gene alignment.
Productive True if the V(D)J sequence is predicted to be productive.
Quality The Sanger/Phred quality scores for assessment of sequence quality. Phred quality scores from 0 to 93 are encoded using ASCII 33 to 126 (used by Illumina from v1.8).
Quality Alignment Sanger/Phred quality scores for assessment of sequence_alignment quality. Phred quality scores from 0 to 93 are encoded using ASCII 33 to 126 (used by Illumina from v1.8).
Read Count Copy number or number of duplicate observations for the query sequence. For example, the number of identical reads observed for this sequence. 123
Rearrangement ID Identifier for the Rearrangement object. May be identical to sequence_id, but will usually be a universally unique record locator for database applications.
Rearrangement Set ID Identifier for grouping Rearrangement objects.
Repertoire ID Identifier to the associated repertoire in study metadata.
Rev Comp True if the alignment is on the opposite strand (reverse complemented) with respect to the query sequence. If True, all output data, such as alignment coordinates and sequences, are based on the reverse complement of the sequence field.
Sample Processing ID Identifier to the sample processing object in the repertoire metadata for this rearrangement. If the repertoire has a single sample then this field may be empty or missing. If the repertoire has multiple samples then this field may be empty or missing if the sample cannot be differentiated or the relationship is not maintained by the data processing.
Sequence The query nucleotide sequence. Usually, this is the unmodified input sequence, which may be reverse complemented if necessary. In some cases, this field may contain consensus sequences or other types of collapsed input sequences if these steps are performed prior to alignment.
Sequence AA Amino acid translation of the query nucleotide sequence.
Sequence Alignment Aligned portion of query sequence, including any indel corrections or numbering spacers, such as IMGT-gaps. Typically, this will include only the V(D)J region, but that is not a requirement.
Sequence Alignment AA Amino acid translation of the aligned query sequence.
Sequence ID Unique query sequence identifier for the Rearrangement. Most often this will be the input sequence header or a substring thereof, but may also be a custom identifier defined by the tool in cases where query sequences have been combined in some fashion prior to alignment. When downloaded from an AIRR Data Commons repository, this will usually be a universally unique record locator for linking with other objects in the AIRR Data Model.
Stop Codon True if the aligned sequence contains a stop codon.
UMI Count Number of distinct UMIs represented by this sequence. For example, the total number of UMIs that contribute to the contig assembly for the query sequence.
V Alignment End End position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
V Alignment Start Start position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
V Cigar CIGAR string for the V gene alignment.
V Frameshift True if the V gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the V gene reference sequence.
V Gene With Allele V gene with allele. If referring to a known reference sequence in a database, the relevant gene/allele nomenclature should be followed (e.g., IGHV4-59*01 if using IMGT/GENE-DB). IGHV4-59*01
V Germline Alignment Aligned V gene germline sequence spanning the same region as the v_sequence_alignment field and including the same set of corrections and spacers (if any).
V Germline Alignment AA Amino acid translation of the v_germline_alignment field.
V Germline End Alignment end position in the V gene reference sequence (1-based closed interval).
V Germline Start Alignment start position in the V gene reference sequence (1-based closed interval).
V Identity Fractional identity for the V gene alignment.
V Score Alignment score for the V gene.
V Sequence Alignment Aligned portion of query sequence assigned to the V gene, including any indel corrections or numbering spacers.
V Sequence Alignment AA Amino acid translation of the v_sequence_alignment field.
V Sequence End End position of the V gene in the query sequence (1-based closed interval).
V Sequence Start Start position of the V gene in the query sequence (1-based closed interval).
V Support V gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the V gene assignment as defined by the alignment tool.
VJ In Frame True if the V and J gene alignments are in-frame.
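Two conventions used in the definitions above are easy to get wrong in practice: the quality fields store Phred scores 0 to 93 as ASCII characters 33 to 126, and the alignment and sequence coordinates are 1-based closed intervals. The following is a minimal Python sketch of both conventions. The function names decode_phred and slice_closed_interval are hypothetical helpers written for illustration; they are not part of any AIRR Community or iReceptor library.

```python
def decode_phred(quality: str) -> list[int]:
    """Convert a Sanger/Phred+33 quality string to integer scores (0-93).

    Hypothetical helper: each character's ASCII code minus 33 is its score.
    """
    scores = []
    for ch in quality:
        score = ord(ch) - 33
        if not 0 <= score <= 93:
            raise ValueError(f"character {ch!r} is outside ASCII 33-126")
        scores.append(score)
    return scores


def slice_closed_interval(sequence: str, start: int, end: int) -> str:
    """Return positions start..end of sequence (1-based, both ends inclusive).

    Hypothetical helper: a 1-based closed interval [start, end] corresponds
    to the Python slice [start - 1:end].
    """
    if start < 1 or end < start or end > len(sequence):
        raise ValueError(f"invalid 1-based closed interval: {start}-{end}")
    return sequence[start - 1:end]
```

For example, decode_phred("!I~") gives [0, 40, 93], and slice_closed_interval("ACGTACGT", 2, 4) gives "CGT". The same slicing would apply to a pair such as v_sequence_start and v_sequence_end against the sequence field, assuming both values are present for the record.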