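I have a Snakemake workflow that fails because the last job creates either two output files or none at all. I have tried using checkpoints to solve this, but I think I am going wrong with the wildcards when I try to collate the output files in the aggregate function.
The workflow (1) creates fasta files from biom community profiles. An in silico PCR (2) is then run on the fasta files, which creates a txt file as output.
The final step is a parser (3), which outputs a csv and a fasta file. However, if there are no matches in the txt file (i.e. the in silico PCR returned no hits), it creates neither the csv nor the fasta file.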
SAMPLES, = glob_wildcards("input/metaphlan/{sample}.biom")
ID = "0 1 2 3 4".split()
TARGETS = expand("output/metaphlan/isPCR/final/{id}_mismatch_{sample}.fasta", sample = SAMPLES, id = ID)

rule all:
    input:
        TARGETS

rule getgenome:
    input:
        "input/metaphlan/{sample}.biom"
    output:
        csv="output/metaphlan/fasta_dump/{sample}.csv",
        fas="output/metaphlan/fasta_dump/{sample}_dump.fasta"
    conda:
        "envs/synth_genome.yaml"
    shell:
        "python scripts/get_genomes_noabund_Snakemake.py {input} 1 {output.fas} {output.csv}"

rule PCR:
    input:
        "output/metaphlan/fasta_dump/{sample}_dump.fasta"
    output:
        "output/metaphlan/isPCR/raw/{id}_mismatch/{sample}.txt"
    params:
        id = "{id}"
    shell:
        "software/exonerate-2.2.0-x86_64/bin/ipcress --products --mismatch {params.id} scripts/primers-miseq.txt {input} > {output}"

rule parse:
    input:
        "output/metaphlan/isPCR/raw/{id}_mismatch/{sample}.txt"
    output:
        "output/metaphlan/isPCR/final/{id}_mismatch_{sample}.csv",
        "output/metaphlan/isPCR/final/{id}_mismatch_{sample}.fasta"
    shell:
        "python scripts/iPCRess_parser_v2.py {input} {output}"
The dry run is fine, with no errors. But when I do a real run, Snakemake aborts and reports that the job failed:
Waiting at most 5 seconds for missing files.
MissingOutputException in line 31 of snakeflow/Snakefile:
Missing files after 5 seconds:
output/metaphlan/isPCR/final/2_mismatch_metaphlan_rectal_SRR5907487.csv
output/metaphlan/isPCR/final/2_mismatch_metaphlan_rectal_SRR5907487.fasta
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
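I know I could change the parser script so that it simply creates two empty files, but I would rather not create unnecessary files. I looked into dynamic, but that does not work with two potential output files, so I turned to checkpoint. As far as I understand, this should help me solve the problem.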
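For comparison, the empty-file workaround mentioned above could look roughly like this. It is only a sketch in plain Python: `ensure_outputs` is a hypothetical helper name, not part of the parser script or of Snakemake, and it simply creates any declared output that the parser did not write.

```python
from pathlib import Path


def ensure_outputs(paths):
    """Create an empty file for every path that does not exist yet.

    Hypothetical helper: it mimics running `touch` on the declared outputs
    after the parser has finished, so that both files are always present.
    Returns the paths that had to be created as empty placeholders.
    """
    created = []
    for p in map(Path, paths):
        if not p.exists():
            # Make sure the target directory is there before touching the file.
            p.parent.mkdir(parents=True, exist_ok=True)
            p.touch()
            created.append(str(p))
    return created
```

Files that the parser did write are left untouched; only the missing ones become zero-byte placeholders, which is exactly the clutter I would like to avoid.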
Here is my attempt using checkpoints:
SAMPLES, = glob_wildcards("input/metaphlan/{sample}.biom")
ID = "0 1 2 3 4".split()
TARGETS = expand("output/metaphlan/isPCR/final/{id}_mismatch_{sample}n.txt", sample = SAMPLES, id = ID)
print(TARGETS)

rule all:
    input:
        TARGETS

rule getgenome:
    input:
        "input/metaphlan/{sample}.biom"
    output:
        csv="output/metaphlan/fasta_dump/{sample}.csv",
        fas="output/metaphlan/fasta_dump/{sample}_dump.fasta"
    conda:
        "envs/synth_genome.yaml"
    shell:
        "python scripts/get_genomes_noabund_Snakemake.py {input} 1 {output.fas} {output.csv}"

rule PCR:
    input:
        "output/metaphlan/fasta_dump/{sample}_dump.fasta"
    output:
        "output/metaphlan/isPCR/raw/{id}_mismatch/{sample}.txt"
    params:
        id = "{id}"
    shell:
        "software/exonerate-2.2.0-x86_64/bin/ipcress --products --mismatch {params.id} scripts/primers-miseq.txt {input} > {output}"

checkpoint parse:
    input:
        "output/metaphlan/isPCR/raw/{id}_mismatch/{sample}.txt"
    output:
        "output/metaphlan/isPCR/final/{id}_mismatch_{sample}.csv",
        "output/metaphlan/isPCR/final/{id}_mismatch_{sample}.fasta"
    shell:
        "python scripts/iPCRess_parser_v2.py {input} {output}"

def aggregate_input(wildcards):
    checkpoint_output = checkpoints.parse.get(**wildcards).output[0,1]
    return expand('output/metaphlan/isPCR/final/{id}_mismatch_{sample}.csv','output/metaphlan/isPCR/final/{id}_mismatch_{sample}.fasta', sample = wildcards.SAMPLES, id=wildcards.ID)

rule collect:
    input:
        aggregate_input
    output:
        "output/metaphlan/isPCR/final/{id}_mismatch_{sample}n.txt"
    shell:
        "cat {input} >> {output}"
And the error:
<function aggregate_input at 0x7f63eade2158>
SyntaxError:
Input and output files have to be specified as strings or lists of strings.
File "snakeflow/Snakefile", line 52, in <module>
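I believe this is because there is something wrong with the way I use the wildcards in the aggregate function, but I can't figure it out. I have tried various versions of what I found in the checkpoint tutorials, but none of them worked.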
Any help is much appreciated, thanks!
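A possible direction, sketched under assumptions rather than offered as a confirmed fix: the wildcards of `collect` are named `sample` and `id`, so an input function would read `wildcards.sample` and `wildcards.id`, and it would return only the files the parser actually wrote. The example below is plain Python so that it runs outside Snakemake. `existing_outputs`, its `root` parameter, and the `SimpleNamespace` stand-in for the wildcards object are all made up for illustration. Inside a Snakefile the function would first have to call `checkpoints.parse.get(**wildcards)`, and the checkpoint would presumably still need to declare an output that always exists (a directory, for example); neither is shown here.

```python
import os
from types import SimpleNamespace

# Path patterns copied from the parse rule; {id} and {sample} are its wildcards.
PATTERNS = [
    "output/metaphlan/isPCR/final/{id}_mismatch_{sample}.csv",
    "output/metaphlan/isPCR/final/{id}_mismatch_{sample}.fasta",
]


def existing_outputs(wildcards, root="."):
    """Return the parser outputs that exist for one (id, sample) pair.

    `wildcards` only needs `.id` and `.sample` attributes, like the object
    Snakemake hands to an input function. `root` is a hypothetical parameter
    that lets the lookup be pointed at another working directory.
    """
    candidates = [
        os.path.join(root, p.format(id=wildcards.id, sample=wildcards.sample))
        for p in PATTERNS
    ]
    # Keep only what is really on disk: both files, or neither.
    return [path for path in candidates if os.path.exists(path)]


# Stand-in for the wildcards object Snakemake would pass in.
wc = SimpleNamespace(id="2", sample="metaphlan_rectal_SRR5907487")
print(existing_outputs(wc))  # an empty list if the parser wrote nothing
```

If the list comes back empty, `collect` still has to produce its own output, so its shell command would need to cope with having no inputs.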