VMs: deeper investigation on truncated repeating seqs.
Hi,
I tried Marke Fincher's idea on an excerpt from
the Holy Bible (~190,000 chars) and found many sequences like
the ones MF found in the VM.
I used a very tight algorithm (I think much tighter
than the one used by MF):
1) let N be the length of the "starting" string
2) for each UNIQUE sequence SEQ of length N in the VM
(if it is not unique, we should study it in the N+1 case
as a starting string)
2.1) replace it with @@@@ in the VM
2.2) set HOLECOUNTER = 0
2.3) set M = N - 1
2.4) set SEQLEFT = LEFT(SEQ,M) (leftmost M chars of SEQ)
2.5) count occurrences of SEQLEFT in the VM
and replace them with @@@@
If no occurrence is found, HOLECOUNTER = HOLECOUNTER + 1
2.6) set SEQRIGHT = RIGHT(SEQ,M) (rightmost M chars of SEQ)
2.7) count occurrences of SEQRIGHT in the VM
and replace them with @@@@
If no occurrence is found, HOLECOUNTER = HOLECOUNTER + 1
2.8) if HOLECOUNTER >= MAXHOLECOUNTER then check another sequence
(restore the VM without @@@@ and go to step 2)
2.9) set M = M - 1 and, while M > 0, go back to step 2.4
2.10) print SEQ and its subsequences found in the VM
(if we arrive at this point SEQ is "good")
N.B. HOLECOUNTER counts the subsequences
that are missing. MAXHOLECOUNTER is the max number
of missing subsequences allowed (of course, for
shorter values of N, MAXHOLECOUNTER should be
smaller)
Algorithm flow on an example string with N = 4:
nelYZmezzoXYZWdel minXdiX nWosZtra XY vitaYZ mi XYZ rovaiXYZperZW
- analyzing sequence XYZW
nelYZmezzo@@@@del minXdiX nWosZtra XY vitaYZ mi XYZ rovaiXYZperZW
- search for XYZ and YZW
nelYZmezzo@@@@del minXdiX nWosZtra XY vitaYZ mi @@@ rovai@@@perZW
found 2 XYZ
found 0 YZW -> HOLECOUNTER = 1
- search for XY and ZW
nelYZmezzo@@@@del minXdiX nWosZtra @@ vitaYZ mi @@@ rovai@@@per@@
found 1 XY
found 1 ZW
- search for X and W
nelYZmezzo@@@@del min@di@ n@osZtra @@ vitaYZ mi @@@ rovai@@@per@@
found 2 X
found 1 W
Output:
X 2
XY 1
XYZ 2
XYZW
ZW 1
W 1
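The procedure and the worked example above can be sketched in Python roughly as follows. This is only an illustration, not the program actually used for the results in this post: the function names truncation_counts and good_sequences are made up here, the mask character "@" is assumed never to occur in the text itself, and when the left and right subsequences happen to be identical they are counted once (a case the step list does not cover).

```python
from collections import Counter

MASK = "@"  # assumption: this character never occurs in the text itself


def truncation_counts(text, seq, max_holes):
    """Return {subsequence: count} for seq, or None if seq is rejected."""
    # Step 2.1: mask the starting sequence itself.
    work = text.replace(seq, MASK * len(seq))
    holes = 0  # HOLECOUNTER
    counts = {}
    # Steps 2.3 to 2.9: M runs from N-1 down to 1.
    for m in range(len(seq) - 1, 0, -1):
        # Left truncation first, then right; dict.fromkeys drops a
        # duplicate when both truncations are the same string.
        for sub in dict.fromkeys((seq[:m], seq[-m:])):
            found = work.count(sub)
            counts[sub] = found
            if found == 0:
                holes += 1
            else:
                work = work.replace(sub, MASK * m)
        if holes >= max_holes:  # too many missing subsequences
            return None
    return counts


def good_sequences(text, n, max_holes):
    """Apply truncation_counts to every sequence of length n that is unique."""
    ngrams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    results = {}
    for seq, freq in ngrams.items():
        if freq != 1:
            continue  # step 2 only considers unique sequences
        counts = truncation_counts(text, seq, max_holes)
        if counts is not None:
            results[seq] = counts
    return results


example = "nelYZmezzoXYZWdel minXdiX nWosZtra XY vitaYZ mi XYZ rovaiXYZperZW"
print(truncation_counts(example, "XYZW", max_holes=2))
# {'XYZ': 2, 'YZW': 0, 'XY': 1, 'ZW': 1, 'X': 2, 'W': 1}
```

Each call starts from the untouched text, so "restoring the VM" between sequences is implicit. With max_holes=1 the same sequence is rejected, because the missing YZW already counts as one hole.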
For starting lengths N = 8..12, "tons" of such sequences can be
found. For greater values of N these sequences are (naturally)
rarer (Markov chains).
Marke, please post a description of the algorithm
you used in your investigation on "truncated repeating seqs".
Marzio De Biasi
P.S.
These are the results for sequences of length 16 (MAXHOLECOUNTER=8).
. (31465)
.Manasseh.sounds (01074)
.MEans.scarlet.o (00134)
.ME.isaac.means. (00273)
.ME.Have.their.y (00002)
.ME.HEre.he.said (00005)
.ME.HE.left.his. (00004)
.ME.HE.Searched. (00001)
.ME.HE.SAID.god. (00003)
.ME.HE.SAID.TO.m (00001)
.ME.HE.SAID.TO.H
ivE.HE.SAID.TO.H (00001)
ack.HE.SAID.TO.H (00008)
em.sHE.SAID.TO.H (00003)
try.wE.SAID.TO.H (00002)
joseph.SAID.TO.H (00065)
is.handmAID.TO.H (00001)
hich.we.dID.TO.H (00002)
as.gathereD.TO.H (00033)
.they.spoke.TO.H (00101)
e.entered.inTO.H (00007)
.the.angel.whO.H (00039)
ounds.like.the.H (02838)
.the.place.of.tH (09180)
M.sounds.like.th (03018)
MEans.scarlet.or (00354)
ME.isaac.means.h (00676)
ME.Home.they.bro (00006)
ME.HEre.he.said. (00006)
ME.HE.left.his.g (00005)
ME.HE.Searched.b (00001)
ME.HE.SAID.god.b (00005)
ME.HE.SAID.TO.me (00001)
ME.HE.SAID.TO.HI
vE.HE.SAID.TO.HI (00001)
ck.HE.SAID.TO.HI (00008)
n.sHE.SAID.TO.HI (00001)
ry.wE.SAID.TO.HI (00001)
oseph.SAID.TO.HI (00055)
s.handmAID.TO.HI (00001)
ich.we.dID.TO.HI (00002)
s.gathereD.TO.HI (00025)
they.spoke.TO.HI (00081)
.entered.inTO.HI (00007)
e.house.to.dO.HI (00001)
they.embalmed.HI (00905)
y.he.add.terapHI (00656)
the.dead.sheol.I (07028)
. (33197)
.Were.household. (01618)
.WIll.surely.vis (00600)
.WITness.heap.in (00008)
.WITHout.number. (00018)
.WITH.inheritanc (00226)
.WITH.Her.about. (00014)
.WITH.HIs.face.t (00018)
.WITH.HIM.both.c (00043)
.WITH.HIM.To.bur (00001)
.WITH.HIM.THat.w (00002)
.WITH.HIM.THEn.j (00003)
.WITH.HIM.THE.re (00001)
.WITH.HIM.THE.SO
re.to.HIM.THE.SO (00002)
an.husHIM.THE.SO (00001)
d.leummIM.THE.SO (00001)
d.shecheM.THE.SO (00003)
of.machir.THE.SO (00094)
rother.and.HE.SO (00002)
nes.from.herE.SO (00025)
orget.ephraim.SO (00331)
y.have.been.asSO (00111)
l.is.the.place.O (10343)
Of.the.dead. (10424)
OTes.after.god.t (00176)
OTHs.el.elohe.is (00048)
OTHEs.and.every. (00005)
OTHERs.i.am.dyin (00070)
OTHER.will.be.gr (00123)
OTHER.Saying.wha (00010)
OTHER.S.wife.tha (00017)
OTHER.S.Speech.s (00001)
OTHER.S.SONs.bow (00001)
OTHER.S.SON.who. (00001)
OTHER.S.SON.All. (00001)
OTHER.S.SON.AND.
.esau.S.SON.AND. (00002)
hem.hiS.SON.AND. (00002)
.bore.a.SON.AND. (00012)
ken.veniSON.AND. (00003)
ouch.simeON.AND. (00013)
d.of.canaaN.AND. (00098)
e.who.lives.AND. (01623)
f.my.right.hAND. (00289)
which.is.beyoND. (00157)
ace.of.the.deaD. (02810)
he.place.of.the. (29713)
A few results for N = 12 (MAXHOLECOUNTER=4).
Ad. (10160)
ANasseh.soun (00938)
ANDed.before (00094)
AND.sheol.is (01910)
AND.All.that (00089)
AND.ANah.thi (00002)
AND.AND.laid (00023)
AND.AND.Had. (00003)
AND.AND.HE.W
uND.AND.HE.W (00001)
elD.AND.HE.W (00002)
him.AND.HE.W (00010)
he.lAND.HE.W (00002)
s.fouND.HE.W (00001)
.beholD.HE.W (00007)
evening.HE.W (00051)
and.of.tHE.W (00158)
of.the.onE.W (00297)
leed.means.W (01906)
ebrew.for.tW (01056)
Dead. (02089)
D. (04917)
D.Fell.down. (00101)
D.FOur.parts (00010)
D.FORty.seve (00011)
D.FOR.am.i.i (00050)
D.FOR.He.did (00002)
D.FOR.HIm.fo (00001)
D.FOR.HIS.fa (00003)
D.FOR.HIS.Wo (00001)
D.FOR.HIS.WI
h.FOR.HIS.WI (00001)
an.OR.HIS.WI (00001)
bekah.HIS.WI (00038)
oats.tHIS.WI (00001)
rother.IS.WI (00015)
eed.meanS.WI (00102)
ssociated.WI (00777)
brew.for.tWI (00009)
dead.sheol.I (07842)
W.for.twice. (02547)
WIce.fruitfu (00610)
WITness.heap (00008)
WITHout.numb (00017)
WITH.inherit (00271)
WITH.Tambour (00001)
WITH.THat.sa (00002)
WITH.THEir.f (00014)
WITH.THE.fie (00015)
WITH.THE.Exa (00003)
WITH.THE.EAs (00003)
WITH.THE.EAR
onTH.THE.EAR (00002)
nisH.THE.EAR (00001)
k.in.THE.EAR (00103)
ham.rosE.EAR (00006)
y.lord.s.EAR (00013)
red.ten.yEAR (00268)
h.means.scAR (00679)
for.twice.fR (06783)
Is.the.place (07889)
ITful.sheol. (00617)
ITHer.way.or (00032)
ITH.inherita (00271)
ITH.Tambouri (00001)
ITH.THat.sam (00002)
ITH.THEir.fl (00014)
ITH.THE.fiel (00016)
ITH.THE.Exac (00003)
ITH.THE.EAst (00003)
ITH.THE.EART
nTH.THE.EART (00002)
isH.THE.EART (00001)
.of.THE.EART (00102)
e.wholE.EART (00004)
ed.with.EART (00005)
pt.his.hEART (00013)
l.not.depART (00016)
hold.propeRT (00142)
e.place.of.T (10509)