Genetic Algorithm for Solving Substitution Ciphers using Trigram Frequency - Prof. David P, Study Guides, Projects, Research of Computer Science

Pace University Computer Science

Prof. David P. Benjamin

A project for creating a program that uses a genetic algorithm to solve substitution ciphers by analyzing trigram frequency in English text. The program assumes a master file of at least 10,000 words and a cipher text of at least one paragraph. The algorithm involves an initial population of 30 sets of letters, selection based on fitness value, crossover, mutation, and fitness measurement. The objective is to translate the cipher text to plain text with 85% accuracy.

Typology: Study Guides, Projects, Research

Pre 2010

Uploaded on 08/09/2009

koofers-user-jlc 🇺🇸


Artificial Intelligence Project

CS 627 - Fall 2002

Sawipa Sakulchareon

364-15-6686

SURVIVAL OF THE FITTEST AND SIMPLE SUBSTITUTION CIPHERS

INTRODUCTION

The basic substitution cipher can be attacked in many ways; however, this program focuses on solving the cipher with the genetic algorithm, which is derived from Darwin's theory of natural selection.

What is the substitution cipher?

In this cipher, each letter is replaced by another letter, leaving spaces and punctuation unchanged.

For example,

"PCQ VMJIPD LHIK LISE KHAHJAWAV HAV ZCIPE EIPD KHAHJIUAJ

KHJEE KCPK."

EFIRCDME, LAREK IJCS LHE LHCMKAPV APV CPE PIDHLK

(cipher)

Now during this time Shahrazad had borne king Shahriyar three sons.

Epilogue, Tales from the Thousand and One Nights

(plain text)
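The letter-for-letter replacement described above can be sketched in a few lines of Python. This is only an illustration: the key below is hypothetical (it is not the key used in the example above), and the function names `make_tables` and `substitute` are assumptions introduced here.

```python
import string

ALPHABET = string.ascii_uppercase


def make_tables(key):
    """Build encrypt/decrypt tables from a 26-letter key.

    key[i] is the cipher letter that replaces ALPHABET[i].
    """
    if sorted(key) != list(ALPHABET):
        raise ValueError("key must be a permutation of A-Z")
    encrypt = str.maketrans(ALPHABET, key)
    decrypt = str.maketrans(key, ALPHABET)
    return encrypt, decrypt


def substitute(text, table):
    # Upper-case the text first; characters that are not in the table
    # (spaces, punctuation) pass through unchanged.
    return text.upper().translate(table)


# Hypothetical key, chosen only for illustration.
KEY = "QWERTYUIOPASDFGHJKLZXCVBNM"
encrypt, decrypt = make_tables(KEY)

cipher = substitute("Three sons.", encrypt)
plain = substitute(cipher, decrypt)
print(cipher)  # ZIKTT LGFL.
print(plain)   # THREE SONS.
```

Because the key is a permutation of the alphabet, decryption is simply the inverse table; solving the cipher means recovering that table without knowing the key.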

Why is the program using the Genetic Algorithm?

The genetic algorithm is easy to apply to a wide range of problems. There is a similarity between the English alphabet and the genes of DNA: the English alphabet has a discrete set of 26 letters, while real DNA has 4 letters, A, G, T, and C. The genetic algorithm is therefore a natural method to consider when dealing with ciphers. The result can be very good even if it is not optimal. The objective is to be able to translate the cipher to plain text with only 85% correctness. We can work out the rest of the cipher by using our human intelligence to deduce which letters were replaced incorrectly. It suggests that what is good for nature may also be good for AI.

SCOPE AND OBJECTIVE

The program is intended for solving ciphers in English by using the frequency of trigrams in standard English. The program assumes an alphabet of 26 letters and throws away spaces and punctuation. The program may work in other languages if the same restrictions apply. It needs 2 input files for constructing the trigram frequency statistics. One of the inputs is the master file, which should be at least 10,000 words in length. The other is the cipher text, which should be at least one paragraph in length.
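As a rough sketch of how the statistics could be built from either input file, the Python below keeps only the 26 letters and counts every run of three consecutive letters. The function names are assumptions made for illustration, and the sketch assumes that trigrams may span word boundaries once spaces are discarded, which the text does not state explicitly.

```python
from collections import Counter


def letters_only(text):
    """Keep only the letters A-Z, upper-cased; drop spaces, digits, punctuation."""
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")


def trigram_frequencies(text):
    """Return the relative frequency of each run of three consecutive letters."""
    s = letters_only(text)
    counts = Counter(s[i:i + 3] for i in range(len(s) - 2))
    total = sum(counts.values())
    # An input shorter than three letters has no trigrams at all.
    return {tri: n / total for tri, n in counts.items()} if total else {}


# "The then" becomes THETHEN, whose trigrams are THE, HET, ETH, THE, HEN.
freqs = trigram_frequencies("The then")
print(freqs["THE"])  # 0.4
```

The same function would be applied once to the master file and once to the (translated) cipher text, so that the two distributions can be compared.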


What is trigram frequency?

In standard English, not every sequence of 3 letters occurs the same number of times. For example, "the" and "and" are the most frequent 3-letter words in English. By counting the occurrences of 3 consecutive letters, the program can create statistics of the trigram frequency.

THE COMPLETE ALGORITHM OF THE PROGRAM

Initial Population

The population size in the program is 30. The population size does affect the performance of the program to a certain degree, but bigger does not mean better. Many books say that a good population size is in the range of 20-50. The population consists of 30 sets of letters called "genomes." Each genome has the 26 letters, randomly placed so that each represents another letter. No letter may be repeated within a genome.

Selection

The selection process starts by randomly selecting a number in the range 0-1 and randomly selecting a pair of genomes from the current population. The fitness value of each genome is looked up. If the value exceeds the random number, the genome is selected to be a parent of new offspring; otherwise it is thrown away and the process selects another genome. Because a higher fitness value gives a better chance of passing this condition, a genome with a high fitness value will produce more offspring than a genome with a low fitness value.

Cross Over

After 2 parents are selected, another random number is generated, this time in the range 0-25. This number is the crossover point at which the two parents exchange their letters. The offspring replicates the first parent's letters up to the crossover point, and the rest of the set is filled with the letters from the second parent that have not already been copied from the first parent.

Mutation

Every 5 generations, a pair of letters in each genome of that generation is swapped to create the mutation effect. The pair to swap is randomly selected.

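The population, selection, crossover, and mutation steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the original program: all function names are hypothetical, fitness values are assumed to be supplied as a dictionary of numbers in the range 0-1, the random threshold is redrawn for every candidate, the crossover point is treated as the number of letters copied from the first parent, and a fallback is added so that selection always terminates.

```python
import random
import string

ALPHABET = string.ascii_uppercase
POPULATION_SIZE = 30  # population size stated in the text


def random_genome(rng):
    """A genome is a random permutation of the 26 letters (no repeats)."""
    letters = list(ALPHABET)
    rng.shuffle(letters)
    return "".join(letters)


def select_parent(population, fitness, rng, max_tries=1000):
    """Pick random genomes until one has fitness above a random threshold."""
    for _ in range(max_tries):
        candidate = rng.choice(population)
        if fitness[candidate] > rng.random():
            return candidate
    # Fallback (assumption): if nothing passes, return the fittest genome.
    return max(population, key=fitness.get)


def crossover(parent_a, parent_b, point):
    """Copy parent_a up to the crossover point, then append the letters of
    parent_b that were not already copied, in parent_b's order."""
    head = parent_a[:point]
    tail = "".join(ch for ch in parent_b if ch not in head)
    return head + tail


def mutate(genome, rng):
    """Swap the letters at two randomly chosen positions."""
    i, j = rng.sample(range(len(genome)), 2)
    letters = list(genome)
    letters[i], letters[j] = letters[j], letters[i]
    return "".join(letters)


def next_generation(population, fitness, generation, rng):
    """Breed a new population of the same size from the current one."""
    children = []
    for _ in range(len(population)):
        a = select_parent(population, fitness, rng)
        b = select_parent(population, fitness, rng)
        child = crossover(a, b, rng.randint(0, 25))
        if generation % 5 == 0:  # mutation every 5 generations, as in the text
            child = mutate(child, rng)
        children.append(child)
    return children


rng = random.Random(0)  # seeded only to make the sketch repeatable
population = [random_genome(rng) for _ in range(POPULATION_SIZE)]
placeholder_fitness = {g: 0.5 for g in population}  # stand-in values
children = next_generation(population, placeholder_fitness, 5, rng)
```

Filling the tail only with letters not yet used is what keeps every offspring a valid permutation, so no repair step is needed after crossover.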
Fitness

The fitness of each genome is measured by the gap between two trigram frequency distributions: that of the cipher text after it is translated with the letters of the genome, and the standard trigram frequency statistics from the master file. The bigger the gap, the lower the fitness value. This process is a little complicated. It starts by opening the master file, recording the frequency of all trigrams, and saving the frequencies in a multidimensional array. The process is then repeated on the cipher text.
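One possible way to turn this gap into a number is sketched below in Python. The text does not give the exact formula, so the choices here are assumptions: the gap is the sum of absolute differences between the two trigram distributions, the fitness is scaled to the range 0-1 as 1 - gap/2, a dictionary stands in for the multidimensional array, and `genome[i]` is taken to be the cipher letter that stands for plain letter `ALPHABET[i]`. All function names are hypothetical.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def trigram_frequencies(text):
    """Relative frequency of every run of three consecutive letters."""
    s = "".join(ch for ch in text.upper() if "A" <= ch <= "Z")
    counts = Counter(s[i:i + 3] for i in range(len(s) - 2))
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}


def decode(cipher_text, genome):
    """Translate cipher text to candidate plain text with one genome."""
    return cipher_text.upper().translate(str.maketrans(genome, ALPHABET))


def fitness(genome, cipher_text, master_freqs):
    """Return 1.0 when the decoded trigram distribution matches the master
    distribution exactly, falling toward 0.0 as the gap grows."""
    decoded = trigram_frequencies(decode(cipher_text, genome))
    trigrams = set(decoded) | set(master_freqs)
    gap = sum(abs(decoded.get(t, 0.0) - master_freqs.get(t, 0.0))
              for t in trigrams)
    return 1.0 - gap / 2.0  # gap lies in [0, 2] for two full distributions


# Toy check: with the identity genome the "cipher" is already plain text.
sample = "the cat and the hat"
master = trigram_frequencies(sample)
print(fitness(ALPHABET, sample, master))        # 1.0
print(fitness(ALPHABET[::-1], sample, master))  # lower than 1.0
```

Keeping the fitness within 0-1 matters because the selection step compares it against a random number drawn from the same range.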
