6. Provide detailed sequence alignment information
Sequence alignment is instrumental for many modelling applications. Full
details of the alignments must be included. These include the method used, its
version, the substitution table and the program parameters. Notes like “default
parameters” are not sufficient, as defaults may differ between program
installations and versions. Multiple sequence alignments can provide
more reliable results than pairwise analyses when working with less
conserved sequences. If any manual interventions have been made, they
must be described and justified.
Follow guidelines for describing sequence alignments (Vihinen, 2020).
Include database identifiers for sequences.
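One way to avoid vague notes such as “default parameters” is to keep the alignment settings in a small machine-readable record and generate the methods text from it. The following Python sketch is only an illustration: the class AlignmentRecord and its methods missing_fields and describe are hypothetical names, not part of any alignment program, and the values are those reported in the example of this section.

```python
from dataclasses import dataclass, field


@dataclass
class AlignmentRecord:
    """Hypothetical container for the details an alignment report needs."""
    program: str
    version: str
    substitution_matrix: str
    parameters: dict      # parameter name -> value, written out explicitly
    sequence_ids: dict    # protein name -> database identifier
    manual_edits: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of required items that were left empty."""
        required = {
            "program": self.program,
            "version": self.version,
            "substitution_matrix": self.substitution_matrix,
            "parameters": self.parameters,
            "sequence_ids": self.sequence_ids,
        }
        return [name for name, value in required.items() if not value]

    def describe(self):
        """Render the record as plain text for a methods section."""
        ids = ", ".join(f"{acc} for {name}"
                        for name, acc in self.sequence_ids.items())
        params = "; ".join(f"{key}, {value}"
                           for key, value in self.parameters.items())
        edits = "; ".join(self.manual_edits) or "none"
        return "\n".join([
            f"Sequences: {ids}",
            f"Program: {self.program} {self.version}",
            f"Substitution matrix: {self.substitution_matrix}",
            f"Parameters: {params}",
            f"Manual edits: {edits}",
        ])


# Values taken from the example in this section.
record = AlignmentRecord(
    program="Clustal Omega",
    version="1.2.4",
    substitution_matrix="Gonnet",
    parameters={
        "Output guide tree": "false",
        "Output distance matrix": "false",
        "Dealign input sequences": "false",
        "mBed-like clustering guide tree": "true",
        "mBed-like clustering iteration": "true",
        "Number of iterations": "0",
        "Maximum guide tree iterations": "-1",
        "Maximum HMM iterations": "-1",
        "Output alignment format": "clustal_num",
        "Output order": "aligned",
        "Sequence type": "protein",
    },
    sequence_ids={
        "BMX": "P51813",
        "BTK": "LRG_128",
        "ITK": "Q08881-1",
        "TEC": "P42680-1",
        "TXK": "P42681-1",
    },
    manual_edits=[
        "insertion between residues 34 and 38 merged into a single gap",
    ],
)

# Refuse to report an alignment whose description is incomplete.
assert record.missing_fields() == []
print(record.describe())
```

Keeping every parameter explicit in such a record means the description stays valid even if the defaults of the program change in a later version.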
Example: The multiple sequence alignment of TEC family members
included entries P51813 for BMX, LRG_128 for BTK, Q08881-1 for ITK,
P42680-1 for TEC, and P42681-1 for TXK. The alignment was performed with the
Clustal Omega program (Clustal O(1.2.4)) (Sievers et al., 2011) run
at https://www.ebi.ac.uk/Tools/msa/clustalo/. The substitution
matrix used was that of Gonnet et al. (1994). The
program parameters were: Output guide tree, false; Output distance
matrix, false; Dealign input sequences, false; mBed-like clustering
guide tree, true; mBed-like clustering iteration, true; Number of
iterations, 0; Maximum guide tree iterations, -1; Maximum HMM
iterations, -1; Output alignment format, clustal_num; Output order,
aligned; Sequence type, protein.
The insertion between residues 34 and 38 was manually adjusted so that
there was just one gap instead of the two introduced by the program. The
alignment covers 96.4-100% of the sequence lengths. The multiple
sequence alignment is provided in the Supplementary material.
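A coverage range like the one in the example can be computed directly from the alignment. The Python sketch below assumes one possible definition, which the example itself does not spell out: coverage is the number of residues of a sequence present in the alignment divided by the full length of that sequence. The function name coverage_percent and the toy sequences are invented for illustration and are not the TEC family data.

```python
def coverage_percent(aligned_row, full_length, gap_chars="-."):
    """Percentage of a sequence's residues that appear in its aligned row.

    Assumed definition: non-gap characters in the row / full sequence length.
    """
    if full_length <= 0:
        raise ValueError("full_length must be positive")
    aligned_residues = sum(1 for ch in aligned_row if ch not in gap_chars)
    if aligned_residues > full_length:
        raise ValueError("aligned row has more residues than the full sequence")
    return 100.0 * aligned_residues / full_length


# Toy data (made up): aligned row and full length of the source sequence.
toy_alignment = {
    "seqA": ("MKT-LLV", 6),   # all 6 residues are in the alignment
    "seqB": ("MKTALLV", 8),   # 7 of 8 residues are in the alignment
}

coverages = {
    name: coverage_percent(row, length)
    for name, (row, length) in toy_alignment.items()
}
summary = f"{min(coverages.values()):.1f}-{max(coverages.values()):.1f}%"
print(f"The alignment covers {summary} of the sequence lengths.")
```

Whatever definition is used, stating it alongside the reported range lets readers reproduce the numbers.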