Missing nucleotides in entropy output .tsv file

Hi, when I select the whole genome for Dengue and export the genome diversity data (entropy) to a .tsv file, some nucleotides are missing from the output. My TSV file has only 7947 rows, but the genome has 10649 nucleotides. I read the "Missing sequence data (gaps, indels, ambiguity)" page of the Nextstrain documentation, but I don’t see a clear description of this case.

Hi @Keith,

I’m assuming you are looking at the build at auspice and using the “Download data” button at the bottom to download the genetic diversity data. If so, the missing positions are likely positions with no genomic variability, as documented in the download data docs:

Note that no data will be produced for positions where no genomic variability is observed in the dataset, or for any sites which may have been masked during the analysis and are therefore not in the data which the visualisation uses.
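
To see which positions were left out, a minimal Python sketch along these lines may help. It assumes the downloaded TSV has a header row and one column holding the 1-based nucleotide position; the column name "base" and the function name missing_positions are hypothetical, so check the header of your own file and adjust.

```python
import csv
import io


def missing_positions(tsv_file, genome_length, position_column="base"):
    """Return the 1-based positions in 1..genome_length that have no row in the TSV.

    position_column is an assumed column name; replace it with the
    name that actually appears in the header of the downloaded file.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    present = {int(row[position_column]) for row in reader}
    return [pos for pos in range(1, genome_length + 1) if pos not in present]


# Tiny in-memory stand-in for the downloaded file (made-up values).
sample = io.StringIO("base\tentropy\n1\t0.12\n3\t0.40\n4\t0.05\n")
gaps = missing_positions(sample, genome_length=5)
print(gaps)  # [2, 5]
```

For a real file, open it with open(path, newline="") and pass genome_length=10649. With 7947 rows present, the list should contain 10649 - 7947 = 2702 positions, each of which is either invariant in the dataset or masked during the analysis.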

Best,
Jover