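Perl: check whether a sequence contains at least 3 unique bases, and remove it if not

I have a FASTA file. I need to remove sequences that contain "N" or that do not contain at least 3 distinct bases. My code so far is below. Also, how do I remove the sequence ID line when its sequence is removed?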
#!/usr/bin/perl
use strict;
use warnings;
open FILE, '<', $ARGV[0] or die qq{Failed to open "$ARGV[0]" for input: $!\n};
open match_fh, ">$ARGV[0]_trimmed.fasta"
or die qq{Failed to open for output: $!\n};
while (my $line = <FILE>) {
chomp($line);
if ($line =~ m/^>/) {
print match_fh "$line\n";
my @data = split(/|/, $line);
my $nextline = <FILE>;
if ($nextline !~ /N+/g) {
if ($nextline =~ /[ATGC]{3}/g) {
}
print match_fh"$nextline";
}
}
}
close FILE;
close match_fh;
INPUT
>seq1
ATGCGGGATGATCCGAACGTTTAATCTCGTATGCCGTCTTCTATCTCNNN
>seq2
GATGAGCTTGACTCTAGTCCATCTCGTATGCCGTCTTCTGCTATCTCGTA
>seq3
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTC
>seq4
TGGTACTGTAAGCATGAGAGTAATCTCGTATGCCGTCTTCTGCTTGAAAA
OUTPUT
>seq2
GATGAGCTTGACTCTAGTCCATCTCGTATGCCGTCTTCTGCTATCTCGTA
>seq4
TGGTACTGTAAGCATGAGAGTAATCTCGTATGCCGTCTTCTGCTTGAAAA
cara
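What is wrong with your code? Does it give an error or incorrect results? If so, which?
It does not remove the lines that have fewer than 3 unique characters.
Can you provide a short sample input file and the expected output file?
Answer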
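while(my $head = <FILE>) {
# skip anything that is not a header line
next if($head !~ /^>/);
# read the sequence line that follows the header
$_=<FILE>;
# each match is 1 on success and 0 numerically on failure, so the sum
# counts how many of A, T, G, C occur at least once
if(!/N+/ && /A/+/T/+/G/+/C/ >= 3) {
# print the header only together with a sequence that passed
print match_fh $head, $_;
}
}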
Mike
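For reference, the approach in the answer can be sketched as a standalone script. This is only a minimal sketch, not the answerer's code: the helper names `keep_sequence` and `filter_fasta` are hypothetical, lexical filehandles replace `FILE` and `match_fh`, matching is made case-insensitive, and each record is assumed to be one header line followed by exactly one sequence line, as in the sample input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true when the sequence contains no 'N' and has at
# least $min distinct bases among A, T, G, C (case-insensitive).
sub keep_sequence {
    my ($seq, $min) = @_;
    $min //= 3;
    $seq = uc $seq;
    return 0 if $seq =~ /N/;
    my %seen;
    $seen{$_}++ for grep { /[ATGC]/ } split //, $seq;
    return keys(%seen) >= $min ? 1 : 0;
}

# Hypothetical helper: copy the records that pass keep_sequence from one
# filehandle to another. Assumes one header line followed by exactly one
# sequence line per record (an assumption taken from the sample input).
sub filter_fasta {
    my ($in, $out) = @_;
    while (my $head = <$in>) {
        next unless $head =~ /^>/;      # ignore anything before a header
        my $seq = <$in>;
        last unless defined $seq;       # header without a sequence at EOF
        chomp $seq;
        # The header is written only together with a sequence that passed,
        # so a rejected sequence takes its ID line with it.
        print {$out} $head, "$seq\n" if keep_sequence($seq);
    }
    return;
}

# When a filename is given, filter it to standard output.
if (@ARGV) {
    open my $in, '<', $ARGV[0]
        or die qq{Failed to open "$ARGV[0]" for input: $!\n};
    filter_fasta($in, \*STDOUT);
    close $in;
}
```

Saved under a name of your choice, for example `filter_fasta.pl`, it could be run as `perl filter_fasta.pl input.fasta > input_trimmed.fasta`. Keeping the header and its sequence in one `print` is what addresses the original problem, where the ID line was written before the sequence had been checked.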