Matt's Blog: while { coding }

The One Where I Rant About Data

I’ll come right out and say this: Data is imaginary. It doesn’t exist. So let’s not pretend that it does.

Well, I guess we have to pretend that it does to some extent. But we certainly don’t have to pretend that it exists in some absolute, immutable way.

Take files, for example. What are files, really? If you think about this long and hard enough, you realize that no, files aren’t just chains of bytes. Files are flattened data structures. And more than that, they are flattened-out ideas. But let’s stick with the data structure thing. Every document, every file, everything ever has some kind of structure.

Files haven’t always existed in computers. There was a time when we had computers and, gasp, no files. The file metaphor appeared fairly early in computing. So why do we stick with that same metaphor now?

Files are crap. They’ve forced us to flatten out our data. That business plan that you’re writing in 15 parts, with three subparts each? It really wants to be a tree with 15 nodes, each with 3 child nodes. Or something. It most certainly does NOT want to be flattened down into a file, where every part of the file is on the same footing with every other part of the file. That just doesn’t make sense.
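To make that concrete, here’s a minimal Python sketch of the business plan as a tree, and of what is left after you flatten it. The names `Node`, `build_plan`, and `flatten` are ones I made up for illustration; nothing here is a standard API, and the titles are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One part of the plan. Hypothetical structure, for illustration only."""
    title: str
    children: list["Node"] = field(default_factory=list)


def build_plan(parts: int = 15, subparts: int = 3) -> Node:
    """Build a plan with `parts` top-level parts, each holding `subparts` children."""
    plan = Node("Business Plan")
    for p in range(1, parts + 1):
        part = Node(f"Part {p}")
        part.children = [Node(f"Part {p}.{s}") for s in range(1, subparts + 1)]
        plan.children.append(part)
    return plan


def flatten(node: Node) -> list[str]:
    """Depth-first walk that keeps only the titles, the way a plain file would."""
    lines = [node.title]
    for child in node.children:
        lines.extend(flatten(child))
    return lines


plan = build_plan()
flat = flatten(plan)

# The tree knows who belongs to whom: 15 parts, 3 children each.
print(len(plan.children), len(plan.children[0].children))

# The flat list is 61 strings on equal footing (1 root + 15 parts + 45 subparts).
print(len(flat))
```

In the flat list, the only hint that "Part 1.1" lives under "Part 1" is a naming convention, and you have to parse it back out to recover the shape.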

So here we are, as programmers, and we have all of this data and structure floating around in our programs, and we have to store things in a file. Blech! How are you going to store a graph (object graph, data tree, what have you) in a flat file and have that make any damned sense at all?

You’re not. No, do NOT say XML! That is not the answer either. Don’t get me wrong… XML heads in the right direction, but it’s still flattening things out terribly. And it’s also kind of stupid. To be fair, that’s mostly an implementation detail.

I know, I know, you want me to explain why XML is stupid, because otherwise I’m just making spurious accusations, and programmers hate that shit. Or at least pretend to, so they can poke at other programmers when they disagree with them. Whatever. I hold these truths to be self-evident, that files are fucking bullshit and that cramming data into flat files is a god damned waste of time. And yes, I mean “god damned”. Even if you don’t believe in a God or a god or an FSM or what have you, it’s clear that we only have so much time on this spinning ball of mud and how much of it will we spend bruising, battering, and smearing out our data to fit into some ridiculous flat file system?

Thank god for relational databases, right?

Fuck you. Have you listened to a god damned word I’ve been saying? I’m talking about taking our data, packing it up into ridiculous shapes and sizes, spackling on metadata to make it all make sense, and then shoving it off to some other place to live. Does that in any way make you think that relational databases are getting off the hook here? Fuck-to-the-N-O. What’s that? I have to mash my data up into chunks that fit into little square tables? And then if I want my data to connect to other data I have to create more little meta tables to show how everything connects? And it’s all ok, because there’s MATH behind it all, right? God damned MATH will save the fucking day.

No.
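In case the "little square tables" complaint sounds abstract, here’s a rough sketch using Python’s built-in `sqlite3` module and an in-memory database. The table names `part` and `part_link`, the helper `children_of`, and the sample rows are all my own inventions for illustration. It’s one common way to do it, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One square table for the things, and one "meta table" just to say
# which thing hangs off which other thing. (Hypothetical schema.)
conn.executescript(
    """
    CREATE TABLE part (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE part_link (
        parent_id INTEGER NOT NULL REFERENCES part(id),
        child_id  INTEGER NOT NULL REFERENCES part(id)
    );
    """
)

conn.executemany(
    "INSERT INTO part (id, title) VALUES (?, ?)",
    [(1, "Business Plan"), (2, "Part 1"), (3, "Part 1.1"), (4, "Part 1.2")],
)
conn.executemany(
    "INSERT INTO part_link (parent_id, child_id) VALUES (?, ?)",
    [(1, 2), (2, 3), (2, 4)],
)


def children_of(title: str) -> list[str]:
    """Look up the direct children of a part by joining through the link table."""
    rows = conn.execute(
        """
        SELECT child.title
        FROM part AS parent
        JOIN part_link ON part_link.parent_id = parent.id
        JOIN part AS child ON child.id = part_link.child_id
        WHERE parent.title = ?
        ORDER BY child.id
        """,
        (title,),
    ).fetchall()
    return [row[0] for row in rows]


print(children_of("Part 1"))
```

Two tables and a double join, just to ask "what’s under Part 1?" The tree is in there, but only once you reassemble it.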

Maybe you’re catching on now. Now you’re thinking about those nifty key/value databases that have come back around lately, aren’t you? And the document stores? The hierarchical databases? Surely something here will save the day, right? Rubbish. But maybe. I don’t know.

Here’s what I know: I have data, and I want to look at my data my way. You have data, and some of it is probably some of the same data that I have. And that’s cool… I remember kindergarten. I like to share. But why in the hell do I have to share with you on YOUR terms? Don’t get me wrong, I don’t see why you should have to share with me on my terms either. Things should be flexible and bendy and still make sense. Is that too much to ask?

Probably, yes.