The Artima Developer Community
Java Answers Forum
File comparing algorithm

10 replies on 1 page. Most recent reply: Jun 24, 2008 4:44 AM by Vivek Ranjan

K. Csaba

Posts: 4
Nickname: aerensiani
Registered: Aug, 2005

File comparing algorithm Posted: Aug 30, 2005 12:22 AM
Hi everyone!

I would like to write an algorithm that compares two files. It takes the first file as the baseline and compares the second one against it. The file can be modified in these ways:
1. Parts get removed: the program indicates this, nothing special.
2. Parts get added: the program highlights these parts (in a different color, for example), nothing special either.
3. Parts get modified: cases 1 and 2 together.

The program should work for changes of unlimited size, and it should find the matching parts correctly even when they are far apart. I think that after one matching sentence we can assume the files match again (third case).

Can anyone help me with this problem? I want to write this in Java, but I really need the theory, not the code itself.

Thanks!

Aeren


Kondwani Mkandawire

Posts: 530
Nickname: spike
Registered: Aug, 2004

Re: File comparing algorithm Posted: Aug 30, 2005 1:52 AM
> Hi everyone!
>
> I would like to make an algorithm, which compares 2 files.

Use a JNI call to sizeof(file); from a native C method.

> It takes the first file as basic, and makes the comparison
> to the other. The file can be modified like this:

> 1. parts get removed: the prog indicates, nothing special
> 2. parts get included: the prog highlights these

I think these are simple Java IO routines (Google
Java IO tutorial).

> parts(different color example), no spec either

I have no clue what this means.

> 3. parts get modified: case 1 and 2 together

No clue what this means either...

> The program should work for unlimited depth, the program
> should find the matching parts correctly even after longer

Simple brain teaser; toy around with it on paper in
pseudocode if you have a hard time.

> distance. I think after a matching sentence we can assume
> matching(3rd case).
>
> Can anyone help me with this problem of mine? I want to
> make this in JAVA, but i really need the theory not the
> code itself.

Hope that helps.

Spike

K. Csaba

Posts: 4
Nickname: aerensiani
Registered: Aug, 2005

Re: File comparing algorithm Posted: Aug 30, 2005 2:04 AM
OK, I'll try to explain again:

I want to write a program that stores a file, then compares it to another file (most likely a newer, modified version) and shows the changes made. How it shows them is irrelevant. What I need is an algorithm that compares plain text files intelligently (comparing line by line won't do: if I insert a line, everything after it will look different), that works for insertions of unlimited length (it shouldn't matter whether I insert 1 word or 10 pages), and that finds the remaining matching parts correctly (if the first word of a sentence matches, that doesn't mean it's the same sentence again). The algorithm is not that simple.
I think we can assume that the two files are back in sync once one complete sentence of the original matches the new file.

Any ideas?

Kondwani Mkandawire

Posts: 530
Nickname: spike
Registered: Aug, 2004

Re: File comparing algorithm Posted: Aug 30, 2005 2:18 AM
My bad, I misunderstood. Sounds like AI pattern
matching *might* do the trick. A knowledge base
is where I'd start, but you're right, it is quite
a complex algorithm. It would probably involve
matching individual words sequentially. I'll drop
a line when I can think of something.

K. Csaba

Posts: 4
Nickname: aerensiani
Registered: Aug, 2005

Re: File comparing algorithm Posted: Aug 30, 2005 4:12 AM
Any ideas where to find such a pattern-matching algorithm?
I found a program called Pariter, but it doesn't really work even with small files :)
(It could be that it only works for supported file types.)

Kondwani Mkandawire

Posts: 530
Nickname: spike
Registered: Aug, 2004

Re: File comparing algorithm Posted: Aug 31, 2005 11:40 PM
Just a suggestion: use a database with a table.
You may use Hibernate to define the mappings.
The idea is that you write your original file
to the DB. For each word in a line, associate
that word with an Integer location and stick
the pair in some sort of HashMap. That will be
your original_file. You can have a column called
name; dub this first entry "original" and
never change it.

Here is where the trick comes in. The next time you
alter the file or bring in a second file for
comparison, re-invoke your create_Hash(originalFile)
to recreate your HashMap, which has entries of the form

(word, (Integer)location).

Create a second HashMap for the file being
compared, then check whether each word is present
in both maps and what the associated Integer values
are (their difference will indicate the
alteration). From there, that's where the real
work will come in:

At what point do we consider the file drastically
changed (is it based on the difference in HashMap
size, or on the difference in the words' Integer
locations)? The only problem with using a HashMap
is that duplicates are an issue -> maybe use
a safer Collection class but with the same concept.

Just a thought
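
A minimal sketch of the word-to-location idea, written only as an illustration. It leaves out the database and Hibernate parts entirely, and the names WordIndex and build are hypothetical. To get around the duplicate problem mentioned above, it assumes each word maps to a list of positions rather than to a single Integer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps each word to every position at which it occurs.
final class WordIndex {

    // Assumption: words are separated by whitespace, and positions are
    // zero-based word offsets from the start of the text.
    static Map<String, List<Integer>> build(String text) {
        Map<String, List<Integer>> index = new LinkedHashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return index;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            // A repeated word keeps all of its positions instead of
            // overwriting the earlier one.
            index.computeIfAbsent(words[i], k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> original = build("the cat saw the dog");
        Map<String, List<Integer>> modified = build("the big cat saw the dog");
        // Comparing the two maps shows which words are new and how far
        // the positions of the shared words have shifted.
        System.out.println(original);
        System.out.println(modified);
    }
}
```

Comparing positions this way only shows that words moved; deciding which of them were actually inserted or removed still needs an alignment step on top.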

K. Csaba

Posts: 4
Nickname: aerensiani
Registered: Aug, 2005

Re: File comparing algorithm Posted: Sep 2, 2005 5:35 AM
Thanks for the tip!

I'm checking some promising algorithms right now and hope to find something useful. There are a lot of algorithms, but somehow all of them focus on lines. I really don't understand why they don't consider ordinary prose, where lines cannot be used.

Kondwani Mkandawire

Posts: 530
Nickname: spike
Registered: Aug, 2004

Re: File comparing algorithm Posted: Sep 2, 2005 6:18 AM
The suggestion I gave was to move word by
word, assigning each word a value based on
its location (tokenize the text with a
StringTokenizer using " "), and to strip the
punctuation from each token. If you expand
on it, it should give you the desired effect.
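
The tokenizing step could be sketched roughly like this. The class and method names (WordTokens, tokenize) are made up for the example, and it assumes that "eliminating punctuation" means stripping punctuation from the start and end of each token while leaving apostrophes inside words alone. It also uses StringTokenizer's default delimiters (space, tab, newline, carriage return, form feed) rather than only " ", so that line breaks do not glue two words together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper: splits text into words and strips punctuation
// from the edges of each word.
final class WordTokens {

    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        // The single-argument constructor splits on " \t\n\r\f".
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            // Remove leading and trailing punctuation; punctuation in the
            // middle of a token (as in "It's") is kept.
            String word = tokens.nextToken()
                    .replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world! It's fine."));
    }
}
```

The position of each word is then simply its index in the returned list.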

malgosia b.

Posts: 1
Nickname: gosai
Registered: Dec, 2005

Re: File comparing algorithm Posted: Dec 15, 2005 2:30 AM
Hi!
I'm interested in binary file comparison. Could you send me the promising algorithms you have checked (links or titles)?
My e-mail is gosia93 @ poczta.onet.pl (without spaces, of course).
From your description I understand that you were looking for the Unix diff algorithm. Although it is based on lines, it is really good.
Thanks,
m.
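
For what it's worth, the line-based approach carries over to words. Classic diff is built around the longest common subsequence (LCS) of the two line sequences, and nothing stops you from feeding it words instead of lines. Below is a minimal sketch using the textbook dynamic-programming LCS, not the algorithm any particular diff tool implements. The class name WordDiff and the "- " / "+ " output format are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: word-level diff based on the longest common subsequence.
final class WordDiff {

    // Returns an edit script: "  word" is unchanged, "- word" was removed
    // from the original, "+ word" was added in the modified version.
    static List<String> diff(String[] a, String[] b) {
        int n = a.length;
        int m = b.length;
        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the start, emitting one edit per step.
        List<String> script = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (a[i].equals(b[j])) {
                script.add("  " + a[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                script.add("- " + a[i]);
                i++;
            } else {
                script.add("+ " + b[j]);
                j++;
            }
        }
        while (i < n) {
            script.add("- " + a[i++]);
        }
        while (j < m) {
            script.add("+ " + b[j++]);
        }
        return script;
    }

    public static void main(String[] args) {
        String[] original = "the quick brown fox".split(" ");
        String[] modified = "the slow brown fox jumps".split(" ");
        diff(original, modified).forEach(System.out::println);
    }
}
```

Because it aligns words rather than lines, an insertion of one word or of ten pages only shows up as a run of "+" entries, and the comparison picks up again at the next matching words. The table needs memory proportional to the product of the two word counts, so for large files a more economical LCS variant would be needed.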

krishna malladi

Posts: 1
Nickname: malladi
Registered: Jun, 2008

Re: File comparing algorithm Posted: Jun 19, 2008 4:26 AM
> Thanks for the tip!
>
> I'm checking some promising algorithms right now, hope to
> find something useful. There are a lot of algorithms but
> somehow all of them are focused on lines. I really don't
> understand why don't they think of normal writings, where
> lines cannot be used?

Could anyone tell me which algorithms, other than diff, are used to compare text files?
If possible, please send me links to where I can find them online.
Thanks in advance.

Vivek Ranjan

Posts: 1
Nickname: haste4sum1
Registered: Jun, 2008

Re: File comparing algorithm Posted: Jun 24, 2008 4:44 AM
> Thanks for the tip!
>
> I'm checking some promising algorithms right now, hope to
> find something useful. There are a lot of algorithms but
> somehow all of them are focused on lines. I really don't
> understand why don't they think of normal writings, where
> lines cannot be used?

Well, I am also stuck with the same problem.
I have to develop robust software to compare two versions of the same file, or two different files. I have already developed software that compares the files line by line, but in the next version I also need to trace the changed words within each line. Please help me with a solution (algorithm).


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use