Data element definitions

The terms describing the data stored in the network of AIRR-seq repositories federated by the iReceptor Gateway are driven by the recommendations of the AIRR Community: the AIRR Minimal Standards (MiAIRR) for study metadata and the AIRR Rearrangement schema representing annotated rearrangements. For more information, visit the AIRR Community documentation page.

Repertoire

		Example
Repertoire Description	Generic repertoire description
Repertoire ID	Identifier for the repertoire object. This identifier should be globally unique so that repertoires from multiple studies can be combined without conflict. The repertoire_id is used to link other AIRR data to a Repertoire. Specifically, the Rearrangement schema includes repertoire_id to reference the Repertoire for each Rearrangement.
Repertoire Name	Short generic display name for the repertoire

Study

		Example
ADC Publish Date	Date the study was first published in the AIRR Data Commons.	2021-02-02
ADC Update Date	Date the study data was updated in the AIRR Data Commons.	2021-02-02
Contact (collection)	Full contact information of the contact persons for this study. This should include an e-mail address and a persistent identifier such as an ORCID ID.	Dr. P. Stibbons, p.stibbons@unseenu.edu, https://orcid.org/0000-0002-1825-0097
Contact (collection)	Full contact information of the data collector, i.e. the person who is legally responsible for data collection and release. This should include an e-mail address and a persistent identifier such as an ORCID ID.	Dr. P. Stibbons, p.stibbons@unseenu.edu, https://orcid.org/0000-0002-1825-0097
Contact (deposition)	Full contact information of the data depositor, i.e., the person submitting the data to a repository. This should include an e-mail address and a persistent identifier such as an ORCID ID. This is supposed to be a short-lived and technical role until the submission is released.	Adrian Turnipseed, a.turnipseed@unseenu.edu, https://orcid.org/0000-0002-1825-0097
Funding	Funding agencies and grant numbers	NIH, award number R01GM987654
Inclusion criteria	List of criteria for inclusion/exclusion for the study	Include: Clinical P. falciparum infection; Exclude: Seropositive for HIV
Lab address	Institution and institutional address of data collector	School of Medicine, Unseen University, Ankh-Morpork, Disk World
Lab name	Department of data collector	Department for Planar Immunology
Publications	Publications describing the rationale and/or outcome of the study. Wherever possible, a persistent identifier such as a DOI or a PubMed ID should be used.	PMID:85642
Study Description	Generic study description	Longer description
Study ID	Unique ID assigned by study registry such as one of the International Nucleotide Sequence Database Collaboration (INSDC) repositories.	PRJNA001
Study keywords	Keywords describing properties of one or more data sets in a study. "contains_schema" keywords indicate that the study contains data objects from the AIRR Schema of that type (Rearrangement, Clone, Cell, Receptor) while the other keywords indicate that the study design considers the type of data indicated (e.g. it is possible to have a study that "contains_paired_chain" but does not "contains_schema_cell").	['contains_ig', 'contains_schema_rearrangement', 'contains_schema_clone', 'contains_schema_cell']
Study title	Descriptive study title	Effects of sun light exposure of the Treg repertoire
Study type	Type of study design	NCIT:C15197, Case-Control Study
Study type (Ontology ID)	Type of study design (Ontology ID)	NCIT:C15197, Case-Control Study

Subject

		Example
Age (deprecated)
Age event	Event in the study schedule to which `Age` refers. For NCBI BioSample this MUST be `sampling`. For other implementations, submitters need to be aware that there is currently no mechanism to encode the potential delta between `Age event` and `Sample collection time`, hence the chosen events should be in temporal proximity.	enrollment
Age maximum	Upper boundary of age range or equal to age_min for specific age. This field should only be null if age_min is null.	80
Age minimum	Specific age or lower boundary of age range.	60
Age unit	Unit of age range	UO:0000036, year
Age unit (Ontology ID)	Unit of age range (Ontology ID)	UO:0000036, year
Ancestry	Broad geographic origin of ancestry (continent)	list of continents, mixed or unknown
Ethnicity	Ethnic group of subject (defined as cultural/language-based membership)	English, Kurds, Manchu, Yakuts (and other fields from Wikipedia)
Organism	Binomial designation of subject's species	NCBITAXON:9606, Homo sapiens
Organism (deprecated)	Binomial designation of subject's species
Organism (deprecated)	Binomial designation of subject's species (Ontology ID)
Organism (Ontology ID)	Binomial designation of subject's species (Ontology ID)	NCBITAXON:9606, Homo sapiens
Race	Racial group of subject (as defined by NIH)	White, American Indian or Alaska Native, Black, Asian, Native Hawaiian or Other Pacific Islander, Other
Relation (subjects)	Subject ID to which `Relation type` refers	SUB1355648
Relation type	Relation between subject and `linked_subjects`, can be genetic or environmental (e.g. exposure)	father, daughter, household
Sex	Biological sex of subject	female
Strain name	Non-human designation of the strain or breed of animal used	C57BL/6J
Subject ID	Subject ID assigned by submitter, unique within study. If possible, a persistent subject ID linked to an INSDC or similar repository study should be used.	SUB856413
Synthetic library	TRUE for libraries in which the diversity has been synthetically generated (e.g. phage display)
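
The relationship between `Age minimum` and `Age maximum` described above can be checked mechanically. The following is a minimal sketch, not part of any AIRR or iReceptor library: the function name `age_range_is_consistent` is hypothetical, and treating a maximum without a minimum as inconsistent is an assumption (the definitions only state that the maximum should be null only if the minimum is null).

```python
from typing import Optional


def age_range_is_consistent(age_min: Optional[float],
                            age_max: Optional[float]) -> bool:
    """Check an (age_min, age_max) pair against the Subject definitions.

    A specific age is encoded as age_min == age_max; a range has
    age_min < age_max.
    """
    if age_min is None:
        # Assumption: a maximum without a minimum is treated as inconsistent.
        return age_max is None
    if age_max is None:
        # The maximum should only be null if the minimum is null.
        return False
    return age_min <= age_max
```

With the example values from the table, `age_range_is_consistent(60, 80)` returns `True`, as does a specific age such as `(60, 60)`.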

Diagnosis

		Example
Diagnosis	Diagnosis of subject	DOID:9538, multiple myeloma
Diagnosis (Ontology ID)	Diagnosis of subject (Ontology ID)	DOID:9538, multiple myeloma
Disease stage	Stage of disease at current intervention	Stage II
Immunogen/agent	Antigen, vaccine or drug applied to subject at this intervention	bortezomib
Intervention	Description of intervention	systemic chemotherapy, 6 cycles, 1.25 mg/m2
Length of disease	Time duration between initial diagnosis and current intervention	23 months
Medical history	Medical history of subject that is relevant to assess the course of disease and/or treatment	MGUS, first diagnosed 5 years prior
Prior therapies	List of all relevant previous therapies applied to subject for treatment of `Diagnosis`	melphalan/prednisone
Study group	Designation of study arm to which the subject is assigned	control

Data Processing

		Example
Analysis ID	Identifier for machine-readable PROV model of analysis provenance
Collapsing method	The method used for combining multiple sequences from (4) into a single sequence in (5)	MUSCLE 3.8.31
Data processing ID	Identifier for the data processing object.
Data processing protocols	General description of how QC is performed	Data was processed using [...]
Germline ID	Unique identifier of the germline set and version, in standardized form (Repo:Label:Version)	OGRDB:Human_IGH:2021.11
Paired read assembly	How paired end reads were assembled into a single receptor sequence	PandaSeq (minimal overlap 50, threshold 0.8)
Primary annotation	If true, indicates this is the primary or default data processing for the repertoire and its rearrangements. If false, indicates this is a secondary or additional data processing.
Primer match cutoffs	How primers were identified in the sequences, were they removed/masked/etc?	Hamming distance <= 2
Processed files	Array of file names for data produced by this data processing.	['ERR1278153_aa.txz', 'ERR1278153_ab.txz', 'ERR1278153_ac.txz']
Quality thresholds	How/if sequences were removed from (4) based on base quality scores	Average Phred score >=20
Software tools/versions	Version number and/or date, include company pipelines	IgBLAST 1.6
V(D)J germline database	Source of germline V(D)J genes with version number or date accessed.	ENSEMBL, Homo sapiens build 90, 2017-10-01
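
The standardized `Germline ID` form (Repo:Label:Version) can be split into its three parts. This is a minimal sketch under the assumption that none of the parts contains a colon; the names `GermlineId` and `parse_germline_id` are hypothetical and do not come from an AIRR library.

```python
from typing import NamedTuple


class GermlineId(NamedTuple):
    """Parts of a germline set identifier in Repo:Label:Version form."""
    repo: str
    label: str
    version: str


def parse_germline_id(value: str) -> GermlineId:
    """Split a Repo:Label:Version string into its three components."""
    parts = value.split(":")
    # Assumption: exactly three non-empty, colon-free parts are expected.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected Repo:Label:Version, got {value!r}")
    return GermlineId(*parts)
```

For the example value in the table, `parse_germline_id("OGRDB:Human_IGH:2021.11")` yields the repository `OGRDB`, the label `Human_IGH` and the version `2021.11`.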

Sample Processing

		Example
# cells/experiment	Total number of cells that went into the experiment	1000000
# cells/sequencing reaction	Number of cells for each biological replicate	50000
Anatomic site	The anatomic location of the tissue, e.g. Inguinal, femur	Iliac crest
Batch number	ID of sequencing run assigned by the sequencing facility	160101_M01234
Biomaterial provider	Name and address of the entity providing the sample	Tissues-R-Us, Tampa, FL, USA
Cell isolation procedure	Description of the procedure used for marker-based isolation or enrichment of cells	Cells were stained with fluorochrome labeled antibodies and then sorted on a FlowMerlin (CE) cytometer.
Cell quality	Relative amount of viable cells after preparation and (if applicable) thawing	90% viability as determined by 7-AAD
Cell species	Binomial designation of the species from which the analyzed cells originate. Typically, this value should be identical to `species`, in which case it SHOULD NOT be set explicitly. However, there are valid experimental setups in which the two might differ, e.g., chimeric animal models. If set, this key will overwrite the `species` information for all lower layers of the schema.	NCBITAXON:9606, Homo sapiens
Cell species (Ontology ID)	Binomial designation of the species from which the analyzed cells originate. Typically, this value should be identical to `species`, in which case it SHOULD NOT be set explicitly. However, there are valid experimental setups in which the two might differ, e.g., chimeric animal models. If set, this key will overwrite the `species` information for all lower layers of the schema. (Ontology ID)	NCBITAXON:9606, Homo sapiens
Cell storage	TRUE if cells were cryo-preserved between isolation and further processing	TRUE
Cell subset	Commonly-used designation of isolated cell population	CL:0000972, class switched memory B cell
Cell subset (Ontology ID)	Commonly-used designation of isolated cell population (Ontology ID)	CL:0000972, class switched memory B cell
Cell subset phenotype	List of cellular markers and their expression levels used to isolate the cell population	CD19+ CD38+ CD27+ IgM- IgD-
Collection event	Unit of Sample collection time	UO:0000033, day
Collection event	Unit of Sample collection time (Ontology ID)	UO:0000033, day
Collection event	Event in the study schedule to which `Sample collection time` relates	Primary vaccination
Collection time	Time point at which sample was taken, relative to `Collection time event`	14
Complete sequences	To be considered `complete`, the procedure used for library construction MUST generate sequences that 1) include the first V gene codon that encodes the mature polypeptide chain (i.e. after the leader sequence) and 2) include the last complete codon of the J gene (i.e. 1 bp 5' of the J->C splice site) and 3) provide sequence information for all positions between 1) and 2). To be considered `complete & untemplated`, the sections of the sequences defined in points 1) to 3) of the previous sentence MUST be untemplated, i.e. MUST NOT overlap with the primers used in library preparation. `mixed` should only be used if the procedure used for library construction will likely produce multiple categories of sequences in the given experiment. It SHOULD NOT be used as a replacement of a NULL value.	partial
Date of sequencing run	Date of sequencing run	2016-12-16
Disease state	Histopathologic evaluation of the sample	Tumor infiltration
Library generation method	Generic type of library generation	RT(oligo-dT)+TS(UMI)+PCR
Library generation protocol	Description of processes applied to substrate to obtain a library that is ready for sequencing	cDNA was generated using
Linkage of loci	In case an experimental setup is used that physically links nucleic acids derived from distinct `Rearrangements` before library preparation, this field describes the mode of that linkage. All `hetero_` terms indicate that in case of paired-read sequencing, the two reads should be expected to map to distinct IG/TR loci. `_head-head` refers to techniques that link the 5' ends of transcripts in a single-cell context. `_tail-head` refers to techniques that link the 3' end of one transcript to the 5' end of another one in a single-cell context. This term does not provide any information as to whether a continuous reading frame between the two is generated. `_prelinked` refers to constructs in which the linkage was already present on the DNA level (e.g. scFv).	hetero_head-head
Processing protocol	Description of the methods applied to the sample including cell preparation/isolation/enrichment and nucleic acid extraction. This should closely mirror the Materials and methods section in the manuscript.	Stimulated with anti-CD3/anti-CD28
Protocol IDs	When using a library generation protocol from a commercial provider, provide the protocol version number	v2.1 (2016-09-15)
Reads passing QC	Number of usable reads for analysis	10365118
Sample ID	Sample ID assigned by submitter, unique within study. If possible, a persistent sample ID linked to INSDC or similar repository study should be used.	SUP52415
Sample Processing ID	Identifier for the sample processing object. This field should be unique within the repertoire. This field can be used to uniquely identify the combination of sample, cell processing, nucleic acid processing and sequencing run information for the repertoire.
Sample type	The way the sample was obtained, e.g. fine-needle aspirate, organ harvest, peripheral venous puncture	Biopsy
Sequencing facility	Name and address of sequencing facility	Seqs-R-Us, Vancouver, BC, Canada
Sequencing kit	Name, manufacturer, order and lot numbers of sequencing kit	FullSeq 600, Alumina, #M123456C0, 789G1HK
Sequencing platform	Designation of sequencing instrument used	Alumina LoSeq 1000
Single-cell sort	TRUE if single cells were isolated into separate compartments
Target substrate	The class of nucleic acid that was used as primary starting material for the following procedures	RNA
Target substrate quality	Description and results of the quality control performed on the template material	RIN 9.2
Template amount	Amount of template that went into the process	1000
Template amount time unit	Unit of template amount	UO:0000024, nanogram
Template amount time unit (Ontology ID)	Unit of template amount (Ontology ID)	UO:0000024, nanogram
Tissue	The actual tissue sampled, e.g. lymph node, liver, peripheral blood	UBERON:0002371, bone marrow
Tissue (Ontology ID)	The actual tissue sampled, e.g. lymph node, liver, peripheral blood (Ontology ID)	UBERON:0002371, bone marrow
Tissue processing	Enzymatic digestion and/or physical methods used to isolate cells from sample	Collagenase A/Dnase I digested, followed by Percoll gradient

Sequencing Data

		Example
	Read length in bases for the index file	8
	File name for the index file	MS10R-NMonson-C7JR9_S1_R3_001.fastq
Forward read length	Read length in bases for the first file in paired-read sequencing	300
Paired read direction	Read direction for the second file in paired-read sequencing	reverse
Paired read length	Read length in bases for the second file in paired-read sequencing	300
Paired sequencing file name	File name for the second file in paired-read sequencing	MS10R-NMonson-C7JR9_S1_R2_001.fastq
Raw Data PID	Persistent identifier of raw data stored in an archive (e.g. INSDC run ID). Data archive should be identified in the CURIE prefix.	SRA:SRR11610494
Read direction	Read direction for the raw reads or sequences. The first file in paired-read sequencing.	forward
Sequencing file name	File name for the raw reads or sequences. The first file in paired-read sequencing.	MS10R-NMonson-C7JR9_S1_R1_001.fastq
Sequencing file type	File format for the raw reads or sequences

PCR Target

		Example
Forward PCR target	Position of the most distal nucleotide templated by the forward primer or primer mix	IGHV, +23
PCR target	Designation of the target locus. Note that this field uses a controlled vocabulary that is meant to provide a generic classification of the locus, not necessarily the correct designation according to a specific nomenclature.	IGK
Reverse PCR target	Position of the most proximal nucleotide templated by the reverse primer or primer mix	IGHG, +57

Receptor Genotype

		Example

Receptor genotype ID	A unique identifier within the file for this Receptor Genotype, typically generated by the repository hosting the schema, for example from the underlying ID of the database record.
Receptor genotype inference process	Information on how the genotype was acquired. Controlled vocabulary.	repertoire_sequencing
Receptor genotype locus	Gene locus	IGH

Receptor Genotype Set

		Example
Receptor genotype set ID	A unique identifier for this Receptor Genotype Set, typically generated by the repository hosting the schema, for example from the underlying ID of the database record.

Receptor Genotype Deleted Gene

		Example
Receptor deleted gene germline set	GermlineSet from which it was taken (issuer/name/version)
Receptor deleted gene name	The accepted name for this gene, taken from the GermlineSet
Receptor deleted gene phasing	Chromosomal phasing indicator. Alleles with the same value are inferred to be located on the same chromosome.

Receptor Genotype Documented Allele

		Example
Receptor documented allele germline set	GermlineSet from which it was taken, referenced in standardized form (Repo:Label:Version)	OGRDB:Human_IGH:2021.11
Receptor documented allele name	The accepted name for this allele, taken from the GermlineSet
Receptor documented allele phasing	Chromosomal phasing indicator. Alleles with the same value are inferred to be located on the same chromosome.

Receptor Genotype Undocumented Allele

		Example
Receptor undocumented allele name	Allele name as allocated by the inference pipeline
Receptor undocumented allele phasing	Chromosomal phasing indicator. Alleles with the same value are inferred to be located on the same chromosome.
Receptor undocumented allele sequence	nt sequence of the allele, as provided by the inference pipeline

MHC Genotype

		Example
MHC genotype class	Class of MHC alleles described by the MHCGenotype	MHC-I
MHC genotype ID	A unique identifier for this MHCGenotype, assumed to be unique in the context of the study
MHC genotype inference process	Information on how the genotype was determined. The content of this field should come from a list of recommended terms provided in the AIRR Schema documentation.	pcr_low_resolution

MHC Genotype Set

		Example
MHC genotype set ID	A unique identifier for this MHCGenotypeSet

MHC Allele

		Example
MHC allele	The accepted designation of an allele, usually its gene symbol plus allele/sub-allele/etc identifiers, if provided by the mhc_typing method
MHC gene ID (ontology)	The MHC gene to which the described allele belongs (Ontology ID)	MRO:0000046, HLA-A
MHC gene name	The MHC gene to which the described allele belongs	MRO:0000046, HLA-A
MHC germline reference set	Repository and list from which it was taken (issuer/name/version)

Other

		Example
Cells
Clones
Repository
Sequences

Rearrangement

		Example
C Alignment End	End position of the C gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
C Alignment Start	Start position of the C gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
C Cigar	CIGAR string for the C gene alignment.
C Germline Alignment	Aligned constant region germline sequence spanning the same region as the c_sequence_alignment field and including the same set of corrections and spacers (if any).
C Germline Alignment AA	Amino acid translation of the c_germline_alignment field.
C Germline End	Alignment end position in the C gene reference sequence (1-based closed interval).
C Germline Start	Alignment start position in the C gene reference sequence (1-based closed interval).
C Identity	Fractional identity for the C gene alignment.
C Region	Constant region gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHG1*01 if using IMGT/GENE-DB).	IGHG1*01
C Score	Alignment score for the C gene alignment.
C Sequence Alignment	Aligned portion of query sequence assigned to the constant region, including any indel corrections or numbering spacers.
C Sequence Alignment AA	Amino acid translation of the c_sequence_alignment field.
C Sequence End	End position of the C gene in the query sequence (1-based closed interval).
C Sequence Start	Start position of the C gene in the query sequence (1-based closed interval).
C Support	C gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the C gene assignment as defined by the alignment tool.
CDR1	Nucleotide sequence of the aligned CDR1 region.
CDR1 AA	Amino acid translation of the cdr1 field.
CDR1 End	CDR1 end position in the query sequence (1-based closed interval).
CDR1 Start	CDR1 start position in the query sequence (1-based closed interval).
CDR2	Nucleotide sequence of the aligned CDR2 region.
CDR2 AA	Amino acid translation of the cdr2 field.
CDR2 End	CDR2 end position in the query sequence (1-based closed interval).
CDR2 Start	CDR2 start position in the query sequence (1-based closed interval).
CDR3	Nucleotide sequence of the aligned CDR3 region.
CDR3 AA	Amino acid translation of the cdr3 field.
CDR3 End	CDR3 end position in the query sequence (1-based closed interval).
CDR3 Start	CDR3 start position in the query sequence (1-based closed interval).
Cell Index	Identifier defining the cell of origin for the query sequence.	W06_046_091
Clone ID	Clonal cluster assignment for the query sequence.
Complete VDJ	True if the sequence alignment spans the entire V(D)J region. That is, sequence_alignment includes both the first V gene codon that encodes the mature polypeptide chain (i.e., after the leader sequence) and the last complete codon of the J gene (i.e., before the J-C splice site). This does not require an absence of deletions within the internal FWR and CDR regions of the alignment.
Consensus Count	Number of reads contributing to the UMI consensus or contig assembly for this sequence. For example, the sum of the number of reads for all UMIs that contribute to the query sequence.
D Alignment End	End position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval).
D Alignment Start	Start position of the first or only D gene in both the sequence_alignment and germline_alignment fields (1-based closed interval).
D Cigar	CIGAR string for the first or only D gene alignment.
D Frame	Numerical reading frame (1, 2, 3) of the first or only D gene in the query nucleotide sequence, where frame 1 is relative to the first codon of D gene reference sequence.
D Gene With Allele	First or only D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB).	IGHD3-10*01
D Germline Alignment	Aligned D gene germline sequence spanning the same region as the d_sequence_alignment field and including the same set of corrections and spacers (if any).
D Germline Alignment AA	Amino acid translation of the d_germline_alignment field.
D Germline End	Alignment end position in the D gene reference sequence for the first or only D gene (1-based closed interval).
D Germline Start	Alignment start position in the D gene reference sequence for the first or only D gene (1-based closed interval).
D Identity	Fractional identity for the first or only D gene alignment.
D Score	Alignment score for the first or only D gene alignment.
D Sequence Alignment	Aligned portion of query sequence assigned to the first or only D gene, including any indel corrections or numbering spacers.
D Sequence Alignment AA	Amino acid translation of the d_sequence_alignment field.
D Sequence End	End position of the first or only D gene in the query sequence (1-based closed interval).
D Sequence Start	Start position of the first or only D gene in the query sequence (1-based closed interval).
D Support	D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the first or only D gene as defined by the alignment tool.
D2 Alignment End	End position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
D2 Alignment Start	Start position of the second D gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
D2 Call	Second D gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHD3-10*01 if using IMGT/GENE-DB).	IGHD3-10*01
D2 Cigar	CIGAR string for the second D gene alignment.
D2 Frame	Numerical reading frame (1, 2, 3) of the second D gene in the query nucleotide sequence, where frame 1 is relative to the first codon of D gene reference sequence.
D2 Germline Alignment	Aligned D gene germline sequence spanning the same region as the d2_sequence_alignment field and including the same set of corrections and spacers (if any).
D2 Germline Alignment AA	Amino acid translation of the d2_germline_alignment field.
D2 Germline End	Alignment end position in the second D gene reference sequence (1-based closed interval).
D2 Germline Start	Alignment start position in the second D gene reference sequence (1-based closed interval).
D2 Identity	Fractional identity for the second D gene alignment.
D2 Score	Alignment score for the second D gene alignment.
D2 Sequence Alignment	Aligned portion of query sequence assigned to the second D gene, including any indel corrections or numbering spacers.
D2 Sequence Alignment AA	Amino acid translation of the d2_sequence_alignment field.
D2 Sequence End	End position of the second D gene in the query sequence (1-based closed interval).
D2 Sequence Start	Start position of the second D gene in the query sequence (1-based closed interval).
D2 Support	D gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the second D gene as defined by the alignment tool.
Data Processing ID	Identifier to the data processing object in the repertoire metadata for this rearrangement. If this field is empty then the primary data processing object is assumed.
FWR1	Nucleotide sequence of the aligned FWR1 region.
FWR1 AA	Amino acid translation of the fwr1 field.
FWR1 End	FWR1 end position in the query sequence (1-based closed interval).
FWR1 Start	FWR1 start position in the query sequence (1-based closed interval).
FWR2	Nucleotide sequence of the aligned FWR2 region.
FWR2 AA	Amino acid translation of the fwr2 field.
FWR2 End	FWR2 end position in the query sequence (1-based closed interval).
FWR2 Start	FWR2 start position in the query sequence (1-based closed interval).
FWR3	Nucleotide sequence of the aligned FWR3 region.
FWR3 AA	Amino acid translation of the fwr3 field.
FWR3 End	FWR3 end position in the query sequence (1-based closed interval).
FWR3 Start	FWR3 start position in the query sequence (1-based closed interval).
FWR4	Nucleotide sequence of the aligned FWR4 region.
FWR4 AA	Amino acid translation of the fwr4 field.
FWR4 End	FWR4 end position in the query sequence (1-based closed interval).
FWR4 Start	FWR4 start position in the query sequence (1-based closed interval).
Gene Locus	Gene locus (chain type). Note that this field uses a controlled vocabulary that is meant to provide a generic classification of the locus, not necessarily the correct designation according to a specific nomenclature.	IGH
Germline Alignment	Assembled, aligned, full-length inferred germline sequence spanning the same region as the sequence_alignment field (typically the V(D)J region) and including the same set of corrections and spacers (if any).
Germline Alignment AA	Amino acid translation of the assembled germline sequence.
Germline Database	Source of germline V(D)J genes with version number or date accessed.	ENSEMBL, Homo sapiens build 90, 2017-10-01
J Alignment End	End position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
J Alignment Start	Start position of the J gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
J Cigar	CIGAR string for the J gene alignment.
J Frameshift	True if the J gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the J gene reference sequence.
J Gene With Allele	J gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHJ4*02 if using IMGT/GENE-DB).	IGHJ4*02
J Germline Alignment	Aligned J gene germline sequence spanning the same region as the j_sequence_alignment field and including the same set of corrections and spacers (if any).
J Germline Alignment AA	Amino acid translation of the j_germline_alignment field.
J Germline End	Alignment end position in the J gene reference sequence (1-based closed interval).
J Germline Start	Alignment start position in the J gene reference sequence (1-based closed interval).
J Identity	Fractional identity for the J gene alignment.
J Score	Alignment score for the J gene alignment.
J Sequence Alignment	Aligned portion of query sequence assigned to the J gene, including any indel corrections or numbering spacers.
J Sequence Alignment AA	Amino acid translation of the j_sequence_alignment field.
J Sequence End	End position of the J gene in the query sequence (1-based closed interval).
J Sequence Start	Start position of the J gene in the query sequence (1-based closed interval).
J Support	J gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the J gene assignment as defined by the alignment tool.
Junction Length	Number of nucleotides in the junction sequence.
Junction Length (AA)	Number of amino acids in the junction sequence.
Junction/CDR3 AA	Amino acid translation of the junction.	CARAGVYDGYTMDYW
Junction/CDR3 NT	Junction region nucleotide sequence, where the junction is defined as the CDR3 plus the two flanking conserved codons.	TGTGCAAGAGCGGGAGTTTACGACGGATATACTATGGACTACTGG
N1 Length	Number of untemplated nucleotides 5' of the first or only D gene alignment.
N2 Length	Number of untemplated nucleotides 3' of the first or only D gene alignment.
N3 Length	Number of untemplated nucleotides 3' of the second D gene alignment.
Np1	Nucleotide sequence of the combined N/P region between the V gene and first D gene alignment or between the V gene and J gene alignments.
Np1 AA	Amino acid translation of the np1 field.
Np1 Length	Number of nucleotides between the V gene and first D gene alignments or between the V gene and J gene alignments.
Np2	Nucleotide sequence of the combined N/P region between either the first D gene and J gene alignments or the first D gene and second D gene alignments.
Np2 AA	Amino acid translation of the np2 field.
Np2 Length	Number of nucleotides between either the first D gene and J gene alignments or the first D gene and second D gene alignments.
Np3	Nucleotide sequence of the combined N/P region between the second D gene and J gene alignments.
Np3 AA	Amino acid translation of the np3 field.
Np3 Length	Number of nucleotides between the second D gene and J gene alignments.
P3D Length	Number of palindromic nucleotides 3' of the first or only D gene alignment.
P3D2 Length	Number of palindromic nucleotides 3' of the second D gene alignment.
P3V Length	Number of palindromic nucleotides 3' of the V gene alignment.
P5D Length	Number of palindromic nucleotides 5' of the first or only D gene alignment.
P5D2 Length	Number of palindromic nucleotides 5' of the second D gene alignment.
P5J Length	Number of palindromic nucleotides 5' of the J gene alignment.
Productive	True if the V(D)J sequence is predicted to be productive.
Quality	The Sanger/Phred quality scores for assessment of sequence quality. Phred quality scores from 0 to 93 are encoded using ASCII 33 to 126 (used by Illumina from v1.8).
Quality Alignment	Sanger/Phred quality scores for assessment of sequence_alignment quality. Phred quality scores from 0 to 93 are encoded using ASCII 33 to 126 (used by Illumina from v1.8).
Read Count	Copy number or number of duplicate observations for the query sequence. For example, the number of identical reads observed for this sequence.	123
Rearrangement ID	Identifier for the Rearrangement object. May be identical to sequence_id, but will usually be a universally unique record locator for database applications.
Rearrangement Set ID	Identifier for grouping Rearrangement objects.
Repertoire ID	Identifier to the associated repertoire in study metadata.
Rev Comp	True if the alignment is on the opposite strand (reverse complemented) with respect to the query sequence. If True, all output data, such as alignment coordinates and sequences, are based on the reverse complement of the sequence field.
Sample Processing ID	Identifier to the sample processing object in the repertoire metadata for this rearrangement. If the repertoire has a single sample then this field may be empty or missing. If the repertoire has multiple samples then this field may be empty or missing if the sample cannot be differentiated or the relationship is not maintained by the data processing.
Sequence	The query nucleotide sequence. Usually, this is the unmodified input sequence, which may be reverse complemented if necessary. In some cases, this field may contain consensus sequences or other types of collapsed input sequences if these steps are performed prior to alignment.
Sequence AA	Amino acid translation of the query nucleotide sequence.
Sequence Alignment	Aligned portion of query sequence, including any indel corrections or numbering spacers, such as IMGT-gaps. Typically, this will include only the V(D)J region, but that is not a requirement.
Sequence Alignment AA	Amino acid translation of the aligned query sequence.
Sequence ID	Unique query sequence identifier for the Rearrangement. Most often this will be the input sequence header or a substring thereof, but may also be a custom identifier defined by the tool in cases where query sequences have been combined in some fashion prior to alignment. When downloaded from an AIRR Data Commons repository, this will usually be a universally unique record locator for linking with other objects in the AIRR Data Model.
Stop Codon	True if the aligned sequence contains a stop codon.
UMI Count	Number of distinct UMIs represented by this sequence. For example, the total number of UMIs that contribute to the contig assembly for the query sequence.
V Alignment End	End position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
V Alignment Start	Start position of the V gene alignment in both the sequence_alignment and germline_alignment fields (1-based closed interval).
V Cigar	CIGAR string for the V gene alignment.
V Frameshift	True if the V gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the V gene reference sequence.
V Gene With Allele	V gene with allele. If referring to a known reference sequence in a database the relevant gene/allele nomenclature should be followed (e.g., IGHV4-59*01 if using IMGT/GENE-DB).	IGHV4-59*01
V Germline Alignment	Aligned V gene germline sequence spanning the same region as the v_sequence_alignment field and including the same set of corrections and spacers (if any).
V Germline Alignment AA	Amino acid translation of the v_germline_alignment field.
V Germline End	Alignment end position in the V gene reference sequence (1-based closed interval).
V Germline Start	Alignment start position in the V gene reference sequence (1-based closed interval).
V Identity	Fractional identity for the V gene alignment.
V Score	Alignment score for the V gene.
V Sequence Alignment	Aligned portion of query sequence assigned to the V gene, including any indel corrections or numbering spacers.
V Sequence Alignment AA	Amino acid translation of the v_sequence_alignment field.
V Sequence End	End position of the V gene in the query sequence (1-based closed interval).
V Sequence Start	Start position of the V gene in the query sequence (1-based closed interval).
V Support	V gene alignment E-value, p-value, likelihood, probability or other similar measure of support for the V gene assignment as defined by the alignment tool.
VJ In Frame	True if the V and J gene alignments are in-frame.
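
Several of the definitions above rely on conventions that are easy to get wrong in code: Phred scores stored as ASCII characters offset by 33, coordinates given as 1-based closed intervals, and sequences reported on the reverse complement strand. The following Python sketch illustrates one way to handle each. The function names are hypothetical helpers written for this illustration and are not part of the AIRR or iReceptor tooling.

```python
from typing import List


def phred_scores(quality: str) -> List[int]:
    """Decode a Phred+33 string: ASCII 33..126 maps to scores 0..93."""
    return [ord(ch) - 33 for ch in quality]


def closed_interval(sequence: str, start: int, end: int) -> str:
    """Extract a 1-based closed interval [start, end] from a sequence.

    Python slices are 0-based and half-open, so the start shifts down
    by one while the end is used unchanged.
    """
    return sequence[start - 1:end]


def interval_length(start: int, end: int) -> int:
    """Length of a 1-based closed interval (both endpoints included)."""
    return end - start + 1


# Translation table for the four bases plus N, in upper and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence: str) -> str:
    """Complement every base, then reverse the result."""
    return sequence.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(phred_scores("!I~"))               # lowest, typical, highest
    print(closed_interval("ACGTACGT", 2, 4))
    print(interval_length(2, 4))
    print(reverse_complement("TGTGCA"))
```

For a field pair such as V Sequence Start and V Sequence End, closed_interval(sequence, start, end) would return the part of the query sequence assigned to the V gene, provided that sequence is the value of the Sequence field (already reverse complemented where Rev Comp is True).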