Bill Venners: What is the distinction between human-readable and human-understandable data, and why is that distinction important?
Dave Thomas: I can give you a 128-bit cipher key as ASCII, and you can read it, but it may not make sense to you.
Andy Hunt: So it is readable, but not understandable.
Dave Thomas: I can give you the works of Shakespeare as a list of words sorted alphabetically. You could read it, but you couldn't make much sense of it.
Andy Hunt: Here is the advantage of human-understandable plain text. Suppose that for historical reasons you've got a control file lying around, but there is no software still around that can understand it or do anything meaningful with it. You as a human may be able to read that file and understand enough to extract whatever you're looking for. Or, suppose you've got some printouts from way back when sitting in a warehouse. You need to get some ancient piece of account information or figure out an algorithm from an old Cobol program. If you have printouts and no software left that can read the original files, you can still read the printouts yourself and extract some information.
Dave Thomas: Cobol provides a good example, I think. The Cobol fixed-length record has data in columns. You can print it out and actually see the columns of data lined up. One step better than that is CSV, comma-separated values, because in CSV you can put in a header that tells you what's in each column.
Bill Venners: So the CSV header, which basically lists comma-separated column names, is an example of self-describing data.
Dave Thomas: It's a very simple example of self-describing data. And you can import CSV into just about any program.
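The contrast between a fixed-length record and a CSV file with a header can be sketched in a few lines of Python. This is only an illustration: the record contents, the field widths, and the column names "Last Name" and "Zip Code" are hypothetical, made up for the example ("Customer ID" is the one name that comes from the conversation).

```python
import csv
import io

# Hypothetical fixed-length record: 5-char ID, 10-char name, 5-char zip code.
# Nothing in the data itself says what each column means.
FIXED_WIDTH_RECORD = "00042" + "Smith".ljust(10) + "60614"
FIELD_WIDTHS = [5, 10, 5]  # assumed layout, known only from outside documentation


def split_fixed_width(record, widths):
    """Cut a fixed-length record into fields by position alone."""
    fields = []
    start = 0
    for width in widths:
        fields.append(record[start:start + width].strip())
        start += width
    return fields


# The same (hypothetical) data as CSV. The header row is the metadata.
CSV_TEXT = "Customer ID,Last Name,Zip Code\n00042,Smith,60614\n"


def read_csv_rows(text):
    """Parse CSV text into dicts keyed by the header's column names."""
    return list(csv.DictReader(io.StringIO(text)))


positional = split_fixed_width(FIXED_WIDTH_RECORD, FIELD_WIDTHS)
named = read_csv_rows(CSV_TEXT)

print(positional)  # values only; their meaning has to be guessed
print(named[0])    # each value is labeled by its column name
```

With the fixed-length record, a reader has to know the layout in advance or infer it from the values. With the CSV file, the header carries the original authors' names for each column along with the data.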
Bill Venners: So the advantage of self-describing data is that in the absence of the manual, in situations where I have to look at some data and figure it out, the metadata will help. The metadata isn't the whole manual. Maybe it's just words like "Customer ID," which can help me figure out what the columns are about.
Andy Hunt: Metadata helps us express our intent. In this case suppose the original programs are missing, and all you've got is this CSV file. The column names give you a hint. Without them, you might be able to figure it out from staring at raw data in the Cobol dump: "Well, those look like customer IDs, or maybe those are zip codes, I'm not real sure." Metadata gives you that one level of added security: OK, the people who wrote this originally called this "Customer ID." Now I've got a hint to go on. I know what they meant by that. I can understand this. It all boils down to communication. You are communicating intent.