from struct import pack, unpack
from random import randint

# convert 16 DNA chars into a signed 32-bit integer

def bincode(D):
    if D == 'A': return 0
    if D == 'C': return 1
    if D == 'G': return 2
    if D == 'T': return 3
# bincode   (better to use a hashtable/dictionary)

def pack16(S):  # S is a string of 16 DNA values
    if len(S) != 16:
        raise Exception("string not 16 chars long")
    V = 0  # value
    first = bincode(S[0])
    if first >= 2:
        first = -4 + first  # G, T are negatives
    V = first  # sets last (rightmost) bits of V
    i = 1
    while i < 16:
        V = V * 4  # shift left 2 bits
        V = V + bincode(S[i])
        i += 1
    # while
    return V
# pack16

v = pack16("TAACGTTACATACATC")
print(v)
v2 = pack16("CAACGTTACATACATC")

DNA = "ACGT"

def unpack16(V):  # take V as 32-bit signed int, return 16-char DNA string
    D = ""
    i = 0
    while i < 16:
        c = DNA[V % 4]
        D = c + D
        V = V // 4  # sign-extended shift right
        i += 1
    # while
    return D
# unpack16

print(unpack16(v))
print(unpack16(v2))

def writednabin(D, filename):
    # to read the file back, use open(filename, "rb") instead
    with open(filename, "wb") as fd:
        lend = pack(">i", len(D))
        fd.write(lend)  # to read back: z = fd.read(4), then y, = unpack(">i", z)
        i = 0
        while i <= len(D) - 16:
            v = pack16(D[i:i+16])
            vp = pack(">i", v)
            fd.write(vp)
            i += 16
        # while
        rem = len(D) % 16
        if rem > 0:
            r = D[len(D)-rem:] + ('A' * (16 - rem))  # pad with dummies
            v = pack16(r)
            vp = pack(">i", v)
            fd.write(vp)
        # if remainder exists
# writednabin

# s[3:5] gives the substring from s[3] to s[4]; 5-3 is the length of the substring

# generate and return random DNA string of length n
def gendna(n):
    s = ""
    while n > 0:
        s = s + DNA[randint(0, 3)]
        n -= 1
    # while
    return s
# gendna

D = gendna(130)
writednabin(D, "test.dna")
print(D)
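
# A minimal sketch of a matching reader, assuming the file layout produced by
# writednabin: a big-endian signed 32-bit length, followed by one big-endian
# signed 32-bit word per 16 bases, with the final word padded with 'A'.
# The names readdnabin and decode16 are illustrative assumptions, not part of
# the original code; decode16 repeats the logic of unpack16 so that this
# sketch runs on its own.
from struct import unpack

def decode16(V, alphabet="ACGT"):
    # turn one signed 32-bit value back into 16 DNA chars
    D = ""
    for _ in range(16):
        D = alphabet[V % 4] + D  # lowest 2 bits give the rightmost base
        V = V // 4               # floor division acts as a sign-extended shift right
    return D
# decode16

def readdnabin(filename):
    with open(filename, "rb") as fd:
        n, = unpack(">i", fd.read(4))  # number of bases originally written
        parts = []
        remaining = n
        while remaining > 0:
            word, = unpack(">i", fd.read(4))
            parts.append(decode16(word))
            remaining -= 16
        # while
    return "".join(parts)[:n]  # drop the 'A' padding from the last word
# readdnabin

# Under these assumptions, readdnabin("test.dna") should return the same
# string that was passed to writednabin.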