6. Provide detailed sequence alignment information
Sequence alignment is instrumental in many modelling applications. Full details of the alignments must be included: the method used, its version, the substitution matrix and the program parameters. Notes such as “default parameters” are not sufficient, as the defaults may differ between program installations and versions. Multiple sequence alignments can provide more reliable results than pairwise analyses when working with less conserved sequences. If any manual interventions have been made, they must be described and justified.
Follow the guidelines for describing sequence alignments (Vihinen, 2020). Include database identifiers for all sequences.
Example: The multiple sequence alignment of TEC family members included entries P51813 for BMX, LRG_128 for BTK, Q08881-1 for ITK, P42680-1 for TEC, and P42681-1 for TXK. The alignment was performed with the Clustal Omega program (Clustal O(1.2.4)) (Sievers et al., 2011), run at https://www.ebi.ac.uk/Tools/msa/clustalo/. The substitution matrix used was that of Gonnet et al. (Gonnet et al., 1994). The program parameters were: Output guide tree, false; Output distance matrix, false; Dealign input sequences, false; mBed-like clustering guide tree, true; mBed-like clustering iteration, true; Number of iterations, 0; Maximum guide tree iterations, -1; Maximum HMM iterations, -1; Output alignment format, clustal_num; Output order, aligned; Sequence type, protein.
The insertion between residues 34 and 38 was manually adjusted so that there was a single gap instead of the two produced by the program. The alignment covers 96.4-100% of the sequence lengths. The multiple sequence alignment is provided in the Supplementary material.
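The same details can also be kept in machine-readable form alongside the alignment, which makes it easier to check that nothing has been left out before submission. The following is a minimal sketch only: the `AlignmentReport` record and its `missing_items` check are hypothetical names introduced here for illustration and are not part of Clustal Omega or of any published guideline. The values are those of the TEC family example above.

```python
from dataclasses import dataclass, field


@dataclass
class AlignmentReport:
    """Hypothetical record of the details an alignment description should contain."""
    method: str
    version: str
    substitution_matrix: str
    parameters: dict      # parameter name -> value as reported by the program
    sequence_ids: dict    # protein name -> database identifier
    manual_edits: list = field(default_factory=list)

    def missing_items(self):
        """Return names of items that are empty or stated only as 'default'."""
        problems = []
        for name in ("method", "version", "substitution_matrix"):
            value = getattr(self, name).strip()
            if not value or "default" in value.lower():
                problems.append(name)
        # An empty parameter list is treated like a bare "default parameters" note.
        if not self.parameters:
            problems.append("parameters")
        if not self.sequence_ids:
            problems.append("sequence_ids")
        return problems


# Values taken from the TEC family example in the text.
tec_report = AlignmentReport(
    method="Clustal Omega",
    version="1.2.4",
    substitution_matrix="Gonnet",
    parameters={
        "Output guide tree": "false",
        "Output distance matrix": "false",
        "Dealign input sequences": "false",
        "mBed-like clustering guide tree": "true",
        "mBed-like clustering iteration": "true",
        "Number of iterations": "0",
        "Maximum guide tree iterations": "-1",
        "Maximum HMM iterations": "-1",
        "Output alignment format": "clustal_num",
        "Output order": "aligned",
        "Sequence type": "protein",
    },
    sequence_ids={
        "BMX": "P51813",
        "BTK": "LRG_128",
        "ITK": "Q08881-1",
        "TEC": "P42680-1",
        "TXK": "P42681-1",
    },
    manual_edits=[
        "Insertion between residues 34 and 38 adjusted to a single gap",
    ],
)

print(tec_report.missing_items())
```

A report that gives the matrix only as "default", or that omits the parameters or the sequence identifiers, would be flagged by `missing_items`. The check covers completeness only; it cannot tell whether the reported values are correct.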