PFB2017_problemsets/python_script_8.py at master · nickgladman/PFB2017_problemsets · GitHub

#! /usr/bin/env python3

import sys
import re

## Python 8 problem sets

# 1.0 Take a multi-FASTA file (Python_08.fasta) from user input and calculate the
# nucleotide composition for each sequence. Use a data structure to keep count.
# Print out each sequence name and its composition in this format
#
# Approach: go through the FASTA file one line at a time. The ID on each header
# line becomes a key in the top-level dictionary, and its value is a
# sub-dictionary of nucleotide content. Within that sub-dictionary each
# nucleotide (A, T, G, C) is a separate key whose value is the number of times
# it occurs in the sequence:
# fasta[gene_name] = {A: #, T: #, G: #, C: #}

# construct an empty top-level dictionary
# populate it with one sub-dictionary per sequence name
# count nucleotides and add them to the lowest dictionary level

fasta = {} #highest level dictionary


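seq_name = None
with open(sys.argv[1]) as fasta_file:	# sys.argv[1] is a path, so open it before reading
	for line in fasta_file:
		header = re.match(r"^>(\S+)", line)
		if header:	# header line: start a new count dictionary
			seq_name = header.group(1)
			fasta[seq_name] = {}
		elif seq_name is not None:	# sequence line: tally each nucleotide
			for nt in line.strip().upper():
				fasta[seq_name][nt] = fasta[seq_name].get(nt, 0) + 1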

print(fasta)
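
# One possible way to turn the counts into one output line per sequence. The
# problem statement's exact output format is not shown in this file, so the
# tab-separated layout, the A/T/G/C column order, and the name
# format_composition_sketch are all assumptions.
def format_composition_sketch(name, counts, alphabet="ATGC"):
	# nucleotides missing from counts are reported as 0
	fields = [str(counts.get(nt, 0)) for nt in alphabet]
	return "\t".join([name] + fields)

# Usage, given a filled-in fasta dictionary:
# for seq_name, counts in fasta.items():
#	print(format_composition_sketch(seq_name, counts))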


#for line in sys.argv[1]:
#        line = line.split()
#        if r"(^>.*\s)" in line:
#                line = fasta[line]
#
#print(fasta)