The Artima Developer Community
Sponsored Link

Weblogs Forum
Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Science

40 replies on 3 pages. Most recent reply: Dec 17, 2005 6:41 PM by Gregg Wonderly

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 40 replies on 3 pages [ 1 2 3 | » ]
Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Science (View in Weblogs)
Posted: Nov 23, 2005 7:27 PM
Reply to this message Reply
Summary
From Wikipedia: "Hash tables are not persistent data structures, in the sense that there is no simple space-efficient way to update a hash table while retaining access to the previous copy of the hash table.". This is false, and Wikipedia should know why.
Advertisement
Wikipedia and I have a hate-hate relationship. What drives me nuts about it, is the sheer idea that an anarchic shoutocracy (whoever shouts the loudest is right) can be trusted to provide the reader with anything resembling fact. Truthfully, you shouldn't trust anything you read on the internet (including this).

Enough bitching, and now on to something useful. The indexable stack data structure I "invented" recently, actually was already described by a fellow named Phil Hagwell under the rather bizarrely named "VList". Ironically this data structure is described at http://en.wikipedia.org/wiki/VList! Anyway Phil's original paper is here in PDF format. I posted this article at CodeProject.com which briefly explains why this data structure provide constant indexing most of the time.

What is relevant about this structure is that you can use it to create a persistent hash-table, despite the entirely stupid claim in the Wikipedia entry.


Derek Parnell

Posts: 22
Nickname: derekp
Registered: Feb, 2005

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Science Posted: Nov 23, 2005 9:06 PM
Reply to this message Reply
How about you stop whinging and start fixing. If you don't think that the Wikipedia entry is correct then fix it, sheesh!

I just had a go at 'correcting' that entry, so why not have another look at it an improve upon my edition of the entry.

http://en.wikipedia.org/wiki/Hash_tables

Wikipedia is not a 'shoutocracy'. Just because somebody is wrong or misinformed does not make everything in Wikipedia wrong or misinformed. It appears that you are prone to exaggeration.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Science Posted: Nov 24, 2005 2:15 AM
Reply to this message Reply
> How about you stop whinging and start fixing. If you don't
> think that the Wikipedia entry is correct then fix it,
> sheesh!

So that someone else can come along and change it? No thanks. I have better things to do with my time. Besides, my criticizing Wikipedia makes it better because it motivates people to fix it, just to point out that I am an ass.

> I just had a go at 'correcting' that entry, so why not
> have another look at it an improve upon my edition of the
> entry.
>
> http://en.wikipedia.org/wiki/Hash_tables

From the latest entry:

"Hash tables are not, generally speaking, persistent data structures, in the sense that there is no simple space-efficient way to update a hash table while retaining access to the previous copy of the hash table."

Being persistent is not a property of hash tables. It is a property of certain implementations of hash table. The property of being simple or space-efficient does not dictate whether a structure is persistent. Furthermore the conclusion is based on false pretenses: there are simple and space-efficient ways to update a hash table.

"However, it is possible to implement a hash table using a VList, making it persistent but influencing the performance cost."

The statement about inflencing the performance cost is baseless.

The entire bit about persistence should be stricken from the hash table entry.

> Wikipedia is not a 'shoutocracy'. Just because somebody is
> wrong or misinformed does not make everything in Wikipedia
> wrong or misinformed. It appears that you are prone to
> exaggeration.

I am not suggesting that everyone is wrong or misinformed, or even the majority. Much of the mundane and trivial knowledge on wikipedia is very helpful and correct.

However, when there is disagreement, especially with regards to poorly understood theoretical ideas, a debate ensues and the most persistent viewpoint perseveres. This is why I call it a shoutocracy.

Zeljko Vrba

Posts: 10
Nickname: zvrba
Registered: Aug, 2005

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Science Posted: Nov 24, 2005 3:37 AM
Reply to this message Reply
>
> Wikipedia and I have a hate-hate relationship. What drives
>
so if you've seen something blatantly wrong, why didn't you edit the wiki page and do a public service, instead of bitching around here? it probably wouldn't take you much more time than writing this post...

Andrew Cunningham

Posts: 2
Nickname: acunningha
Registered: Nov, 2005

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Science Posted: Nov 24, 2005 4:03 AM
Reply to this message Reply
It's fair enough to say to someone "fix it" if they know its a problem, but what about the people that don't? As far as I know that information was 100% correct because, assumedly, I went to a Wikipedia to learn something I didn't originally know.

Now I have never edited a Wikipedia entry, partly because I'm not a great writer and partly because of lazyness, so perhaps Wikipedia does implement the following suggestion: why doesn't Wikipedia visually tag information as unconfirmed until someone comes along and confirms it. That way, people that know nothing can take it with a grain of salt, and experts will be able to come along, easily spot unconfirmed but correct information and confirm it.

Derek Parnell

Posts: 22
Nickname: derekp
Registered: Feb, 2005

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Science Posted: Nov 24, 2005 4:33 AM
Reply to this message Reply
I view Wikipedia as always in draft mode. Nothing is confirmed and so if I need to rely on its information I verify it externally first. In fact, I'd do this with nearly any source too. Wikipedia is one of many resources, and nearly always contains accurate information. However if one has to trust the information gleaned from it, have it confirmed independantly from multiple other sources.

To improve Wikipedia, one must be confident enough to be exposed as incorrectly informed in public. In other words, if you are not going to be worried by somebody 'correcting' your words, then one should actively improve Wikipedia articles.

It appears that Mr D. treasures his words to much to have them exposed to public editing.

I now have incorporated Mr D.'s improvements to the Hash Table entry, though he misses out on the public kudos for it.

Michael Feathers

Posts: 448
Nickname: mfeathers
Registered: Jul, 2003

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Scienc Posted: Nov 24, 2005 8:50 AM
Reply to this message Reply
Whenever I hear people complain about Wikipedia's accuracy, it reminds me of the way that print journalists used to complain about television news, and the way television news people used to complain about direct newsfeed internet sites like The Drudge Report.

The fear is that without editorial control and research there is nothing to guarantee the veracity of the information. Well, I think they are right, but it's only dangerous if people accept information uncritically. Some will, but the nice thing is that more people are becoming critical readers these days, and I think it has to do with the fact that there are more untrusted sources out there. Wikipedia? I take what I read with a grain of salt, and frankly, I think that people should take what traditional encyclopedias state with a grain of salt also. Editorial control is never perfect. If anything, people in the past have been too trusting of trusted sources, in my opinion.

Critical reading is definitely a "use it or lose it" skill.

Michael Feathers

Posts: 448
Nickname: mfeathers
Registered: Jul, 2003

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Scienc Posted: Nov 24, 2005 10:57 AM
Reply to this message Reply
You know.. as I think about it, I think the original entry was right. Data structures are families of implementations, not specifications. If we tried to define a hash table as any data structure which allows O(1) insertion and O(n) search, a simple array with a hash function f x -> x could be seen as an integer hash table. But those aren't called hash tables, they're called direct address tables. The difference is the implementation.

I don't think I'd call hashing on a Vlist a hash table. It's a great idea, but I'd give it another name. If the world sees Red/Black trees and Splay trees as distinct, I see no reason why these two should be lumped together.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Scienc Posted: Nov 24, 2005 11:20 AM
Reply to this message Reply
> You know.. as I think about it, I think the original entry
> was right. Data structures are families of
> implementations, not specifications.

In order to say anything meaningful about a data structure, we have to generalize over the infinite set of possible implementations. The only way to is constrain it using a specification. The original entry was flawed because it made incorrect assumptions about possible implementations, because it was making conclusions about properties of a theoretical notion based on a limited set of known implementations.

> If we tried to
> define a hash table as any data structure which allows
> O(1) insertion and O(n) search, a simple array with a hash
> function f x -> x could be seen as an integer hash table.
> But those aren't called hash tables, they're called
> d direct address tables. The difference is the
> implementation.

The difference is also specification. The direct address table is a refinement of the generalized notion of a hash table. I don't see how anyone could provide a useful definition of hash list which explicitly excluded direct address tables. Even if you did, you would then need to come up with a name for all data structures with the properties of hash-tables.

> I don't think I'd call hashing on a Vlist a hash table.
> It's a great idea, but I'd give it another name.

Why? A hash table built usingh a Vlist, is still a hash table. A hash table is widely understood to be any data structure which provides average case O(1) access of elements using a key and a hash function, with a worst case of O(N).

> If the
> e world sees Red/Black trees and Splay trees as distinct,
> I see no reason why these two should be lumped together.

But they are both under the umbrella of binary search trees.

Michael Feathers

Posts: 448
Nickname: mfeathers
Registered: Jul, 2003

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Scienc Posted: Nov 24, 2005 11:51 AM
Reply to this message Reply
> > You know.. as I think about it, I think the original
> entry
> > was right. Data structures are families of
> > implementations, not specifications.
>
> In order to say anything meaningful about a data
> structure, we have to generalize over the infinite set of
> possible implementations. The only way to is constrain it
> using a specification. The original entry was flawed
> because it made incorrect assumptions about possible
> implementations, because it was making conclusions about
> properties of a theoretical notion based on a limited set
> of known implementations.

Not at all. We can give examples. As far as I know, there is no widely accepted cross-language formal specification for data structures. Most textbooks that I've seen talk about data structures generally but reserve analysis for specific cases, like open-chaining on a table for instance.

> > If we tried to
> > define a hash table as any data structure which allows
> > O(1) insertion and O(n) search, a simple array with a
> hash
> > function f x -> x could be seen as an integer hash
> table.
> > But those aren't called hash tables, they're called
> > d direct address tables. The difference is the
> > implementation.
>
> The difference is also specification. The direct address
> table is a refinement of the generalized notion of a hash
> table. I don't see how anyone could provide a useful
> definition of hash list which explicitly excluded direct
> address tables. Even if you did, you would then need to
> come up with a name for all data structures with the
> properties of hash-tables.

We could, it if was useful. I suspect the notion of maps was a generalization along the same lines. The thing is, the Vlist you're talking about isn't even a table. Isn't it odd to call two data structures the same if they have a completely different structure?

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Scienc Posted: Nov 24, 2005 12:23 PM
Reply to this message Reply
> I suspect the notion of maps
> was a generalization along the same lines. The thing is,
> the Vlist you're talking about isn't even a table. Isn't
> it odd to call two data structures the same if they have a
> completely different structure?

Not at all. A hash-table is an abstract data type. The structure and implementation is irrelevant, as long as it provides the basic operations expected of a hash-table, and with the basic complexity guarantees.

If I am mistaken, I would appreciate it if someone could point me to a computer science journal which indicates anything to the contrary.

Michael Feathers

Posts: 448
Nickname: mfeathers
Registered: Jul, 2003

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Scienc Posted: Nov 24, 2005 12:39 PM
Reply to this message Reply
> Not at all. A hash-table is an abstract data type.
> The structure and implementation is irrelevant, as long as
> it provides the basic operations expected of a hash-table,
> and with the basic complexity guarantees.
>
> If I am mistaken, I would appreciate it if someone could
> point me to a computer science journal which indicates
> anything to the contrary.

I hear you, but I think you're the one proposing something new. Can you give three definitions of a hash table from different sources that describe the same behavior but don't appeal to an implementation?

I have three data structures/algorithms books on my shelf. The Rivest book, Mark Weiss's, and another. None of them give a formal definition of a hash table, all give examples of implementation and analysis for the those particular implementations.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Scienc Posted: Nov 24, 2005 10:21 PM
Reply to this message Reply
> I hear you, but I think you're the one proposing something
> new. Can you give three definitions of a hash table from
> different sources that describe the same behavior but
> don't appeal to an implementation?

Nope.

> I have three data structures/algorithms books on my shelf.
> The Rivest book, Mark Weiss's, and another. None of them
> m give a formal definition of a hash table, all give
> examples of implementation and analysis for the those
> particular implementations.

I am alone in the cold then. I guess some day I should stop talking about it and actually formalize this stuff in that book I've always wanted to write. Maybe that way programmers won't have these kinds of debates anymore. If I ever actually get around to writing an introductury computer science book will you promise to review it? Your input would be phenomenal. I might actually be able to put something half decent together with a little help.

Terje Slettebø

Posts: 205
Nickname: tslettebo
Registered: Jun, 2004

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Scienc Posted: Nov 25, 2005 9:04 AM
Reply to this message Reply
> > I hear you, but I think you're the one proposing
> something
> > new. Can you give three definitions of a hash table
> from
> > different sources that describe the same behavior but
> > don't appeal to an implementation?
>
> Nope.
>
> > I have three data structures/algorithms books on my
> shelf.
> > The Rivest book, Mark Weiss's, and another. None of
> them
> > m give a formal definition of a hash table, all give
> > examples of implementation and analysis for the those
> > particular implementations.
>
> I am alone in the cold then. I guess some day I should
> stop talking about it and actually formalize this stuff in
> that book I've always wanted to write. Maybe that way
> programmers won't have these kinds of debates anymore.

I'm with Michael on this one (and was about to reply the same, until I saw it had already been said): All the definitions I've seen (including doing a search now), points out that the "hash" in "hash table"/"hash map" comes from it being the name of an implementation method or data structure, that works by applying a mathematical function (a hash function) to the item being searched for, to locate it. A "map", on the other hand, is more general, and for example std::map in C++ is typically implemented as a (red/black) tree, as a hash map would hardly satisfy the complexity requirements (such as the map being sorted).

It's similar to a linked list: You could hardly call a data structure a linked list, if it didn't mean you had nodes linking to the next one. The names describe the data structure, not (just) the operations you may perform with it.

You can, however, say that your type (whatever the data structure is called, such as VList) performs such and such compared to a hash map data structure. The abstract _operation_ is key/value mapping; the _implementation_ may be many things, with varying space/time performance characteristics, all satisfying the requirement, such as arrays (for numerical indexes, only), trees, hash tables, etc.

Regards,

Terje

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Persistent Hash-Tables and Why Wikipedia Still Sucks at Computer Scienc Posted: Nov 25, 2005 9:38 AM
Reply to this message Reply
> I'm with Michael on this one (and was about to reply the
> same, until I saw it had already been said): All the
> definitions I've seen (including doing a search now),
> points out that the "hash" in "hash table"/"hash map"
> comes from it being the name of an implementation method
> or data structure, that works by applying a mathematical
> function (a hash function) to the item being searched for,
> to locate it.

I can live with that. So a hash-map is a map ADT with O(1) complexity element insertion and element access by key on average, which by definition requires the usage of a non-trivial hash-function in its implementation. There is however nothing said about the underlying representation of the data type other than it is a table. Using a VList (read as indexable stack) as the table doesn't change anything AFAICT, except the fact that the hash function can grow, without invalidating pointers to specific items in the tree.

> It's similar to a linked list: You could hardly call a data structure a linked list, if it didn't mean you had nodes linking to the next one. The names describe the data structure, not (just) the operations you may perform with it.

Yes I do agree: some data structures do have specific requirements on the implementation implied by their name. Such as hash-map, linked-list, splay-tree, red-black tree.

I consider all of these as examples of semi-abstract data structures. Some implementation details are well defined, but always in terms of abstract notions: like nodes, or tables, or pointers, or arrays, etc.

Flat View: This topic has 40 replies on 3 pages [ 1  2  3 | » ]
Topic: Interfaces or Abstract Base Classes? Previous Topic   Next Topic Topic: Pointers and References and Temporaries


Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2014 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use - Advertise with Us