Monday, July 6, 2009

Classic: Application Developers vs. Database Developers

One of my favorite "articles" of all time. I love these kinds of conversations: in the DB or in the middle tier? For the vast majority of us, the database will do just fine. As I've learned more about data grids and the like, I've found there are trade-offs that aren't often discussed. One way or another, you lose data (say you decide to UPDATE only once instead of 60 times). Originally posted on February 20, 2008. Enjoy.
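The "UPDATE once instead of 60 times" trade-off is easier to see in a minimal Python sketch of a write-behind cache. It assumes a plain dict stands in for the database and a hypothetical `WriteBehindCache` class stands in for the data grid; neither is any real product's API. Sixty in-memory updates collapse into a single write at flush time. The store never sees the 59 intermediate values, and anything not yet flushed is gone if the node dies.

```python
class WriteBehindCache:
    """Hypothetical write-behind cache; `store` stands in for the database."""

    def __init__(self, store):
        self.store = store   # backing store (a plain dict here)
        self.dirty = {}      # key -> latest value not yet written to the store
        self.writes = 0      # number of writes actually sent to the store

    def put(self, key, value):
        # Updates stay in memory; a later put overwrites an earlier one.
        self.dirty[key] = value

    def flush(self):
        # Write only the latest value per key, then forget the pending changes.
        for key, value in self.dirty.items():
            self.store[key] = value
            self.writes += 1
        self.dirty.clear()


db = {}
cache = WriteBehindCache(db)
for second in range(60):
    cache.put("price:ACME", 100 + second)  # 60 updates, all in memory
cache.flush()
print(cache.writes, db["price:ACME"])  # 1 159
```

Whether losing those intermediate values matters depends on the application. If the business needs a record of every change, coalescing writes like this is exactly the wrong choice.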

It started innocently enough with this article. I sent it out to about 20 colleagues.

The best line from the article:
Jerry: "Yeah, databases cause lots of headaches. They crash all the time, corrupt data, etc. Using text files is better."

One of my more recently arrived colleagues (I'll call him Mr. M) replied to everyone with this statement:

"Kind of funny, actually: databases are less and less important at the large investment banks, where they basically load everything into a data grid across a cluster of several hundred nodes. Writing to the db is way too slow."

This started a day-long exchange of emails. What follows is the entire thread (up until my last post tonight).

Me:
"I would just argue that they don’t necessarily know how to write to databases. I would however love to see benchmarking done on both methods. Would be an interesting test..."

Mr. M:
"Well, my understanding is they just can’t scale out the db enough. Even something like Oracle RAC won’t work. And outside of the military, these are probably the top 1% of programmers in the world building this stuff."

Me:
"A benchmark would be the only way I would believe it.

If you said the top 1% of database developers tried it and failed, I would be more likely to agree.

My experience is that application developers != database developers. Different type of thinking involved."

Mr. M:
"'A benchmark would be the only way I would believe it.'

Do you need a benchmark before you would believe in-memory retrieval is faster than disk retrieval? Essentially, this is what we’re talking about.

'If you said the top 1% of database developers tried it and failed, I would be more likely to agree. My experience is that application developers != database developers. Different type of thinking involved.'

Why? It’s an issue to do with application performance not simply database performance. Database concerns are a subset of application concerns, essentially a specialization, requiring less encompassing knowledge. ;)

From the article you linked to (http://www.watersonline.com/public/showPage.html?page=432587)

"Better data management is the answer, says Lewis Foti, manager of high-performance computing and grid at The Royal Bank of Scotland (RBS) global banking and markets. "For very large compute arrays, the key issue is data starvation and saturation. This problem requires data grids with high bandwidth and scalable, parallel access,
...
Banks are learning that data management in a distributed grid environment is very different from online transaction processing. "With so many data sources, distribution channels, demands for aggregation and analytics, surges in data volumes and complex dynamics between the flows, we need to manage 'data in motion' and give up the notion that data is somehow stored. It's dynamic, not static," says Michael Di Stefano, vice president and architect for financial services at GemStone Systems
...
There is even some debate over how small a unit of work can be put on today's grids. Di Stefano at GemStone, for example, says, "One client has gone from 200 trades per second in a program trading application to more than 6,000 trades per second. This shows what the technology can do."

Yep, the writing is on the wall. Oracle knows it too.

http://www.google.com/search?hl=en&q=oracle+buys+tangosol&btnG=Google+Search"

Me:
"Good points. If it is in-memory it would be faster. I have not had the pleasure to work on such a system.

I do disagree with the database concerns being a subset of application concerns. The data drives the app. We’re probably getting religious at this point (or am I)."

Mr. M:
"'The data drives the app.'

Exactly, but who’s to say where the data comes from or in what format? My application data may reside entirely in xml files, or maybe I get it from some third-party web services à la the en vogue “mashup.” Heck, I may not even need to worry about a database anymore…. http://www.amazon.com/gp/browse.html?node=16427261 The database is only one particular concern of the overall application. And it’s the application that matters. Data is useless if it just sits on a disk somewhere. It’s the ways in which the application lets users view and manipulate the data that add value to the business.

Yep, definitely a different type of thinking between application developers and database developers."

Me:
"Definitely religious now.

Applications come and go, data stays the same. Think Green Screens, EJBs, Ruby…what’s next?"

Mr. M:
"'Applications come and go'

Exactly. Businesses are not static, nor are the markets they compete in. Changing applications are a function of changing business processes and changing markets.

'data stays the same.'

Nonsense. Otherwise UPDATE would not be an SQL reserved word. If you mean database technology stays the same, well, I’m more inclined to agree with that.

'Think Green Screens, EJBs, Ruby...what’s next?'

Whatever comes along to let the business respond more effectively to current market realities. Application platforms have evolved much faster than database platforms have. They’ve had to: their sphere of operation is much broader than that of databases, and they deal with much broader concerns. Databases in the internet era function in essentially the same role they did in the era of dumb terminals. Clearly application platforms have evolved orders of magnitude more. Hence the statement, database concerns are a subset of application concerns.

Here’s a simple test… if I take some business application and I’m forced to throw away one or the other, either the database or the appl- wait a second, it doesn’t even make sense to finish it, does it? The business can live without the database. I could do all kinds of things with the data; I could stick it anywhere. The business can’t live without the application, though. Another way to look at it is, what do the business users look at, test, approve, and use? The database? Of course not, they look at the application. They couldn’t care less whether the data sits on disk in an RDBMS, xml, or flat files."

Me:
"We obviously violently disagree.

Without the database (and I use database and data interchangeably), the business could no longer function. The app is meaningless. How would you contact your customer? You couldn’t find it.

'Exactly. Businesses are not static, nor are the markets they compete in. Changing applications are a function of changing business processes and changing markets.'

Poorly designed applications…that is all."

A Feisty Colleague:
"Using data and database interchangeably is incorrect. A database is a mechanism for data storage. XML data sets and flat files are mechanisms for data storage, too. So is a file cabinet, because the data doesn’t have to be electronic; it could be … gasp! … on paper, and the application to use that data would be hands for holding the paper and a pencil to update and add data to the page."

Me:
"No it isn’t. I take into account xml files, flat files, web services (but not paper, unless it’s scanned) and all that. It would be consumed by the database and then accessed by the application via SQL.

(that’s for Mr. M and the feisty one)"

At which point someone forwarded the home page for Oracle's TimesTen In-Memory Database.

Me:
"A database on/in the mid-tier...Perfect!"

Mr. M:
"Implicit acknowledgment that the disk IO operations that come with traditional database access simply can’t match the performance of in-memory data access (a point you were previously unconvinced of, but now seem perfectly willing to accept once you see it’s got Oracle’s imprimatur on it).

Of course, why any application developer would want to program against an SQL interface if they weren’t forced to is beyond me. It is orthogonal to the programming model of most application platform languages.

Surely Oracle recognizes this fact too, or they wouldn’t be buying Tangosol and other data grid technologies. Of course, most of those products are far more technically advanced than TimesTen or anything Oracle has in that space.

Incidentally, it’s illustrative to note that Coherence and other products like it were for the most part designed and built by application programmers. The development of all these products is pretty much driven by the needs of the large investment banks on Wall Street. These trading applications simply had too many concurrent transactions to use an RDBMS (a problem quite a number of public sites now share, most famously google.com: nope, no RDBMS there, yet miraculously there is still data). The database simply would not scale to such a degree. So the application developers, by necessity, came up with an alternative solution that did work: a fully transactional cache of data replicated across a cluster with node counts in the thousands, and no relational model whatsoever to speak of. A perfect example of how database concerns are only one, sometimes small, concern amongst many that application developers must be aware of and ready to solve."

Me:
"Like you said initially, the top 1%.

Many of us will never touch a system like this.

I will certainly concede that it is faster (still would love to see benchmarking though), but that still leaves 99% of the applications out there that do not require that kind of performance."

Me (again):
"And don’t forget, I use data and database interchangeably. Applications are nothing without the data right?

As to the object/relational impedance mismatch...well, that's more people who don’t know how to work in sets. Looping is what they understand. I understand the application side more than you seem to give me credit for.

I’m not saying applications aren’t important, they are. Data (databases) and applications go hand in hand. If the application went away though, they could still access their data via SELECT statements (yes, via an application client tool), however painful that may be. Applications make retrieving data that much easier for our users.

If anyone wants to unsubscribe from this mailing list, just let us know. This is fun for me (I’m guessing Mr. M too)."
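The jab about looping versus working in sets can be shown with Python's built-in `sqlite3` module. This is a minimal sketch that assumes a hypothetical `orders` table and threshold. The row-by-row version pulls rows into application code and issues one UPDATE per match. The set-based version hands the database a single statement describing every row to change. Both end in the same state. The difference is how many statements and round trips it takes, which tends to matter far more against a networked database than against an in-memory SQLite one.

```python
import sqlite3


def make_db():
    # Hypothetical orders table, kept in memory for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO orders (amount, status) VALUES (?, ?)",
        [(10.0, "new"), (250.0, "new"), (75.0, "shipped"), (500.0, "new")],
    )
    return conn


def flag_large_orders_loop(conn, threshold):
    # Row-by-row: fetch rows into the application, decide, write each one back.
    rows = conn.execute("SELECT id, amount FROM orders WHERE status = 'new'").fetchall()
    for order_id, amount in rows:
        if amount > threshold:
            conn.execute("UPDATE orders SET status = 'review' WHERE id = ?", (order_id,))


def flag_large_orders_set(conn, threshold):
    # Set-based: one statement describes every row that should change.
    conn.execute(
        "UPDATE orders SET status = 'review' WHERE status = 'new' AND amount > ?",
        (threshold,),
    )


def statuses(conn):
    return [status for (status,) in conn.execute("SELECT status FROM orders ORDER BY id")]


loop_db, set_db = make_db(), make_db()
flag_large_orders_loop(loop_db, 100)
flag_large_orders_set(set_db, 100)
print(statuses(loop_db))                      # ['new', 'review', 'shipped', 'review']
print(statuses(loop_db) == statuses(set_db))  # True
```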

Needless to say, it was a fun day. It didn't get [too] personal. More than anything, I'm happy to have an equally passionate colleague.

Besides, he claims he was just fracking around with me. ;)

16 comments:

  1. One of these days someone will have to explain to all these "programmers" that

    PROGRAMMER != DESIGNER

    and that without correct data design NO data store whatsoever will be stable, scalable, yaddayadda.

    Until then, we'll just have to yawn at the ignorance demonstrated by these self-appointed experts...

    Back in the 70s when databases started to be used the "for" argument was precisely to stop the proliferation of ad-hoc development on flat files.

    Whereby every programmer out there decreed how to store data and someone else was left with the horrendous task of making it all integrate into a coherent and valid whole that could accurately represent the real enterprise data situation.

    We are now at the same juncture.

    These idiots have been left running the show by even more incompetent managers, and now someone will have to figure out how to integrate all these ad-hoc, out of control data stores.

    Nothing like letting them create the market!

  2. @noons

    I definitely can't speak to the history of the RDBMS, but I can't argue much with your point.

    I still have a problem, conceptually, of how to use, let alone integrate all of these different solutions. I even read somewhere recently of a NoSQL conference.

    I do believe that there is a fundamental lack of knowledge on how to properly use a database. Even database developers I've worked with don't seem to know how to properly use them. So it seems natural that the programmer types would gravitate towards something "easier."

    The object model? It's still relational, isn't it? One record (parent) to many (children). I see that it gets flattened in a returned query, i.e. NAME = Chet Justice would be repeated one or more times...never mind...I think it's too late for me to be thinking about this stuff.

    I need to hang out with more people like you so I don't get all worried about my future. ;)

    chet

  3. Again, the problem I have with database folks is that many of them have a limited perspective. They don't really have a view of the entire application, only one particular aspect of it, namely, persistence. And really, when you think about it, this is a concern that really should simply be abstracted away from the developer. Meaning, the fact that a given bit of data in my application needs to live beyond the browser closing or the server rebooting, really is a case for metaprogramming. It should be declarative. I should simply be able to flag something as being persistent, and not have to think about it any further. In the Java world we're not quite there, but we're getting closer.

    Databases are on the way out. About another 10 years and they'll be thought of much as mainframes have been the last few years.

  4. @mcohen01

    perhaps you are right in that many database developers have a limited perspective.

    i would still argue that the data is the most important thing and it needs to be reasonably structured.

    i won't disagree with the need in certain situations (you mentioned financial services) to do things differently, but there is a price to pay for that which is often not discussed.

    ...and don't even get me started on closure, cardinality, and relational calculus! ;-)

  6. "i would still argue that the data is the most important thing and it needs to be reasonably structured."

    and you would be wrong. sorry, no offense, but this perspective of data holding primacy of position is simply wrong. i'll try to give you an example that maybe will illustrate for you why it's a mistake to place so much focus on data (and, with you guys, by extension, databases).

    twitter might be the best example. the data is worthless. who gives a sh_t if i'm loving my morning cocoa puffs? or whatever the f_ck people "tweet" about....

    "it's the application, stupid!"

    that data only becomes "valuable" or meaningful because of the context in which it has been presented by the application. the application dictates whether data is valuable or not. data itself is rarely intrinsically valuable.

  7. @jbl

    you know I won't argue math with you.

  8. I generally agree with mcohen01, data qua data is not meaningful. Perhaps database developers would do well in taking a course on Information Theory?

  9. @mcohen01

    Twitter is a great example for your argument. The data is relatively unimportant...for the time being. Think of the data-mining possibilities of all that unstructured data, though.

    Back to my argument, which hasn't been explicitly stated, is data in the enterprise. In the enterprise the application comes and goes (it started with SQL*Plus, to EJB, to .NET now on to RoR). Where would you rather spend your time? The data just needs to be "skinned," the app becomes the presentation of it...nothing more.

  10. @chet "Where would you rather spend your time?"
    That generally depends on your job title. ;-)

    "...the app becomes the presentation of it...nothing more."
    I'd like to add: "..is one way of doing it."

    Your ideal job title should be "Systems Architect" Chet! :-D

  11. "Back to my argument, which hasn't been explicitly stated, is data in the enterprise."

    the problem here chet is that when you say "data" what you really mean is "the database," and more specifically, "the oracle database." and both of those are waning in their influence. think about it, if i've got a WAN and a data grid in my enterprise, what do i even need a database for? serialize it all to disk once a day if you really can't sleep at night.

    dude, databases are quickly becoming dinosaur technology. sorry, i know you don't want to hear that, but that's the case. invest in yourself, learn new things, and then you won't have all your eggs in larry's basket.


    "In the enterprise the application comes and goes (it started with SQL*Plus, to EJB, to .NET now on to RoR)."

    and this should tell you something. the business value is realized not from the data sitting there in whatever medium or format or whatever storage device, the business value is realized by applications that do something for the business.

    "it's the application, stupid!"

    how do you not see this?


    "Where would you rather spend your time? The data just needs to be "skinned," the app becomes the presentation of it...nothing more."

    skinned? nothing more? but dude, this "skinning" as you call it, this is everything. the business doesn't give a sh_t about data, it cares about application functionality.

    let's try this another way.....if we fire all the dbas tomorrow, can the app devs muddle through the database stuff, or find some other way of persisting the data, say, in xml files? if we fire all the app devs tomorrow, can the dbas build the app?

  12. "it's the application, stupid!"

    how do you not see this?

    dude, how did you get started in this field? maybe you typed in lines of BASIC to make a turtle move across the screen? me, i wanted to make tutorials for students to work through, Computer Assisted Instruction, as it's known. in other words, i wanted to do something on the screen. i needed to implement some bit of functionality. it didn't really matter to me whether the tutorial data and student responses were stored in a relational database, in xml, in text files, wherever. it didn't matter. sure, a relational model makes things easier, but really dude, i would have preferred to just be able to tag/annotate something as being "persistent." i would have rather just been able to say, "save this bit of data here, i don't care how you do it, just save it, so i can look it up later possibly." persistence is well on the way to being a meta concern in application programming. you database guys are so caught up in the minutiae, the low level plumbing, of this one small aspect of application development, i can only chalk it up to the fact that you're so heavily invested personally in the technology that anything that threatens it is by definition anathema to you. there's a simple solution to your problem though....

  13. @mcohen01

    re: how did i get started

    I didn't have as grand plans as you. I started as a secretary essentially, doing data entry for some 1000 odd letters we sent out each month.

    Hmmm...how can I do this better/faster/easier?

    Access. (yikes)

    And so it began. I was fortunate enough to get a job (from a friend) in an Oracle shop...and the rest is history.

    re: anathema

    I would disagree, mostly. I'm not married to Oracle. I've earned a pretty darn good living off of it. I could and will adapt. Your view of DBAs as stodgy old coots might not be too far off...but they have also been at the center of everything (enterprise-wise) for the longest time. They've had to go in and fix problems...clean data...merge data...all kinds of craziness. Sometimes created by App Dev people, sometimes created by DB Dev people. They are responsible for maintaining the system and thus are the ones who get yelled at or blamed most often.

    I'll say it again though, for 99.9% of the apps out there, a database (Oracle) will do just fine. PL/SQL (4GL) is very robust. Application Express makes creating the presentation layer ever so easy.

    Does that mean I don't want to see what the new stuff is all about? Absolutely not. But data is very important to a business. They need to know the address of the customer who just purchased a widget. They need to know how much money was spent in the last month on Widget 1, 2 and 3. The data is critical to the business. And I'm pretty sure you can ask many DBAs or Database types how they ran their first reports and they'll tell you SQL*Plus (the first application to access the data).

  14. Look at the human body. Muscles and a skeleton are both essential, working together. The skeleton is a bit more persistent, but it becomes pretty unimportant once the muscles stop moving it around.
    There are animals that move around without a skeleton. There are apps that work without data persistence (mostly realtime stuff). There are apps that persist data other than in an RDBMS.

    Equally, there are bucket loads of data without a specific application. That's what Excel is all about, a generic app to handle a variety of data. And that's what a database is, the first layer of a generic data handling app. [Though the storage admins may say that RAID is the first layer.]

    In computing, we can guarantee everything will change. But enterprises will have data they want preserved. They'll have someone whose job it is to do that, just as they have systems and storage administrators today. [Okay, maybe it will be outsourced to the cloud.]

    But for the apps side, they'll probably be buying that from Oracle and SAP and doing very little of their own development. So for job security, you're best off sticking with the data.

  15. "that data only becomes "valuable" or meaningful because of the context in which it has been presented by the application. the application dictates whether data is valuable or not. data itself is rarely intrinsically valuable."


    That's a good one. Care to pass that by the finance folks charged with allocating budgets to insane "developers"?

    It would last just about 5 seconds: as long as it takes to read that pile of crap.

    Once again, the totally irresponsible nature of so-called "developers" shines through.

    These people are nothing more than expensive quacks who need to be eradicated from IT.

    I'll tell you who is going to be out of a job in 10 years: all these twitter idiots. Without exception.

  16. Information without context is in fact meaningless. Here's an example: "yellow".

    The architectural pattern which provides the context is irrelevant. If you want to argue with that go right ahead.
