Re: Seven Lessons on Scalability from Reddit
Posted: May 21, 2010 7:43 PM
Wolfgang,
- I don't like design by committee. There are very few good designers in the world, and putting many mediocre designers on the same team does not make up for that. More often than not, each designer comes from a different division in corporate and is seeking to protect their own fiefdom, which makes them behave even dumber than they actually are. That's unfortunate, since mediocre designers are precisely the ones who cannot afford political wars. Great designers, in my experience, simply find ways to ignore the rules and deal with the consequences later.
- I am a statistician by training, and approach problems mathematically, so being "so numbers" is a good thing. Math wins.
I also don't like writing stored procedures, user-defined functions, or other SQL caca. I am not sure what gives you the impression that I think that is a good idea. I don't really know what kind of code you like to write. It's evidently not Lisp, not Java, and it's likely not SQL or any XML language like XQuery or XSLT. Maybe Python, without threads?
Most of our software is driven by a relational algebra model; SQL is merely the backend target for a model compiler. You should be able to sink and hoist SQL concepts like UDFs, calculated columns, stored procedures, parameterized queries, and so on. You should also have tools that automatically tell you how long a query is taking, from the perspective of the application server. You should then be able to orthogonally create optimizations using a copy-on-write peephole optimizer that rolls back an optimization when the specific query has changed. None of this is done today by most shops, primarily because of how our software systems bind prematurely ("early binding"). Having an open schema is really just a hack around the fact that ORM-centric systems like LINQ require either early binding or hard-to-maintain text strings. We don't require our developers to use either of those baroque tools, since both have proven ineffective; every developer who can feel pain hates them, just as you seem to.
Just as you want eventual consistency in your data, I want code to be eventually consistent with an online data dictionary, and to partially fail if the online data dictionary is not consistent with the code. Stuff like the read-repair supported by Cassandra is really just a form of partial failure; look at it that way! [Theory + practice note: in programming language theory terms, this means your type system is more easily rooted in a Curry-style system, where a term has meaning independently of its type, than in a Church-style system, where meaning depends on the type, and where the overhead of keeping track of all those types would be a big headache.] If you TRIED building software our way using traditional ORM-centric toolsets, you would die from typing too much. Do you have any idea how much memory an AppDomain in .NET takes up when you naively load 100,000 different DAO/DTO classes? In an unmodified JVM?
(The same question comes up when you have many prototypes in a prototype-based language and want a linear performance curve without having to microcode leaf prototypes.) ORM-centric tools just aren't built for extremely late-bound, faceted materialization. I once watched a tutorial on a C++ Boost database interface library, and everyone on the Boost mailing list basically agreed there was not much need for metaprogramming in an ORM. When everyone is in agreement on something, it is an opportune time to dissent. Who knows, the dissenters might be right.
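To make the "SQL is merely the backend target for a model compiler" point concrete, here is a deliberately tiny, hypothetical sketch in Python. None of these class or method names come from any real tool; it only shows the shape of the idea: the application manipulates relational-algebra nodes, and SQL text plus bound parameters fall out of a compile step at the end.

```python
# Hypothetical sketch: a relational-algebra model whose only job for SQL
# is as a compile target. All names are illustrative, not a real API.

class Node:
    """Base class so operators compose: r.select(...).project(...)."""
    def project(self, *cols):
        return Project(self, cols)

    def select(self, predicate_sql, params=()):
        return Select(self, predicate_sql, tuple(params))

class Relation(Node):
    def __init__(self, name, columns):
        self.name, self.columns = name, columns

    def to_sql(self):
        return f"SELECT {', '.join(self.columns)} FROM {self.name}", ()

class Project(Node):
    def __init__(self, child, cols):
        self.child, self.cols = child, cols

    def to_sql(self):
        inner, params = self.child.to_sql()
        return f"SELECT {', '.join(self.cols)} FROM ({inner}) AS t", params

class Select(Node):
    def __init__(self, child, predicate_sql, params):
        self.child, self.predicate_sql, self.params = child, predicate_sql, params

    def to_sql(self):
        inner, params = self.child.to_sql()
        # Parameters stay bound values end to end; no text splicing.
        return (f"SELECT * FROM ({inner}) AS t WHERE {self.predicate_sql}",
                params + self.params)

users = Relation("users", ["id", "karma"])
sql, params = users.select("karma > ?", (100,)).project("id").to_sql()
```

A peephole optimizer in this world is just a tree-to-tree rewrite over `Node` objects before `to_sql()` runs, which is why it can be rolled back when the query changes: the unoptimized tree is still there.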
As for using the JVM vs. Python: Python is sort of stupid and ugly (I hate it), and a much nicer and more fun system to use is Erlang. Erlang has a lot of cool features like on-the-fly compression for compact distributed communications, thereby minimizing linking overheads. Serialization in the CLR and JVM is far more convoluted. Likewise, Erlang's notion of a process is far lighter than, say, the CLR's notion of an AppDomain, which is in some ways lighter than the JVM's comparable features.
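The compression feature referred to is Erlang's `term_to_binary(Term, [compressed])`, which zlib-compresses a term as it is encoded for the wire. A rough Python analog, purely for illustration (pickle standing in for Erlang's external term format; the function names are made up):

```python
import pickle
import zlib

def to_wire(term):
    # Analogous in spirit to Erlang's term_to_binary(Term, [compressed]):
    # encode, then compress, in one step before the bytes hit the socket.
    return zlib.compress(pickle.dumps(term))

def from_wire(blob):
    return pickle.loads(zlib.decompress(blob))

payload = {"karma": [1] * 1000}
blob = to_wire(payload)
```

For repetitive payloads, the compressed blob is much smaller than the raw encoding, which is the whole point for chatty distributed nodes.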
Morgan,
You need a reliable build process. Without a reliable build process, you cannot increase quality. Your concerns over how to increase performance while the schema is constantly shifting are noted. In any RDBMS, constantly churning indexes will alter the statistics catalog and mislead the cost-based optimizer, so you may never be working from true statistics.
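You can see the statistics problem on a toy scale with Python's built-in sqlite3 (standing in here for a full RDBMS): SQLite's planner consults the `sqlite_stat1` table, and that table is only refreshed when you run `ANALYZE`, so an index created or rebuilt after the last `ANALYZE` leaves the optimizer working from stale or missing numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, score INTEGER)")
conn.execute("CREATE INDEX idx_score ON posts(score)")
conn.executemany("INSERT INTO posts (score) VALUES (?)",
                 [(i % 50,) for i in range(1000)])

# Until ANALYZE runs, the planner has no real statistics for idx_score.
conn.execute("ANALYZE")

# sqlite_stat1 now records: 1000 rows, ~20 rows per distinct score value.
stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE idx = 'idx_score'"
).fetchone()
```

Every schema migration that drops and recreates an index silently invalidates these numbers until somebody remembers to re-analyze, which is exactly why constant schema churn and a trustworthy optimizer don't mix.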
If you are limited to one prototype machine shared among many developers, and have no way to set up a real test environment, then that stinks. If you are dependent on a live copy of "valuable data", then that stinks too. I can't imagine why valuable, rare data would be stored in a database constantly undergoing schema changes. Something sounds horribly wrong. I would recommend treating the build process itself as a problem, modeling the problem domain using OOA, and breaking up the problem so that invariants are encoded into class responsibilities. This might seem obtuse, but it will really help you see where your build process is going wrong. For example, Ruby on Rails uses Active Record precisely because it treats every database table as an object. Logical changes to the schema are THEREFORE driven by logical changes to the object model, which will ONLY happen if the specification changes the OO analysis, because it will require class collaborations to be redefined, thus changing all the data invariants imposed in the RDBMS, which is used strictly to serialize the OO app state. Do you see what I am saying? I am showing you that the problem here must be methodological; otherwise it makes little sense for the schema to be constantly changing among many developers. I could make the same argument for a Data Mapper where the object model is automagically refreshed when the schema changes.
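A deliberately tiny, hypothetical sketch of the Active Record idea in Python (sqlite3 as the storage backend; this is not Rails' actual API, just the pattern): the class *is* the table, so by construction you cannot change the schema without changing the object model.

```python
import sqlite3

class ActiveRecord:
    """Minimal Active Record sketch: one class per table, one row per object."""
    conn = sqlite3.connect(":memory:")

    @classmethod
    def create_table(cls):
        # The table's shape comes entirely from the class definition, so a
        # schema change is, by definition, an object-model change.
        cols = ", ".join(f"{name} {sqltype}" for name, sqltype in cls.columns.items())
        cls.conn.execute(
            f"CREATE TABLE {cls.table} (id INTEGER PRIMARY KEY, {cols})")

    def __init__(self, **fields):
        self.fields = fields

    def save(self):
        names = ", ".join(self.fields)
        marks = ", ".join("?" for _ in self.fields)
        cur = self.conn.execute(
            f"INSERT INTO {self.table} ({names}) VALUES ({marks})",
            tuple(self.fields.values()))
        self.id = cur.lastrowid
        return self

class Post(ActiveRecord):
    table = "posts"
    columns = {"title": "TEXT", "score": "INTEGER"}

Post.create_table()
post = Post(title="hello", score=1).save()
```

If someone wants a new column, they must edit `Post.columns`, i.e. change the class, which is the methodological discipline the paragraph above is arguing for.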