October 07, 1997
By Mike Romano
Microsoft touts Windows NT as the corporate network of the future. So how does the company's internal NT network fare?
Why Windows NT is, at best, a work in progress.
Conventional wisdom (not to mention the wisdom of a few unconventional thinkers) says small computers replace big ones, distributed networks supplant Big Iron buried inside glass houses, and Microsoft and Intel will eventually take over the world from the mainframe guys (such as IBM, Digital Equipment Corp. and Sun Microsystems).
Who are we to doubt such a trend? Corporate marketing master Microsoft has gone to great lengths to convince the world that networks of PCs can indeed scale, from its "Scalability Day"--the Windows NT dog-and-pony show in May--to the daily operations at its own headquarters, where the company hums to the tune of its network of Windows NT-based servers.
Top executives usher customers into Microsoft's data center, where rooms packed with whirring boxes and blinking lights supposedly offer incontrovertible proof of NT's "scalable" enterprise capability. After all, if Microsoft runs on it, NT must be good enough, right?
This approach to corporate derring-do is called "eating your own dog food." Remember "Bonanza" star Lorne Greene, who claimed in TV commercials that he fed Alpo to his own dogs? Same principle.
But eating this particular, still half-baked dog food can also cause network indigestion. Behind the scenes at Scalability Day, glitches showed that Windows NT may not be quite as ready for dinnertime as the company would like us to believe. Off the guided tour in Redmond, internal memos obtained by UPSIDE, as well as interviews with roughly a dozen rank-and-file Microsoft employees, reveal a system plagued by serious flaws.
That's not to say that NT won't emerge as the victor in the end. Microsoft's willingness to prove its mettle by forcing new products down the throats of its own employees first is admirable. It's undoubtedly the fastest way to turn Microsoft software into true industrial-strength products. But a glimpse behind the curtain provides a valuable reality check on just how far the wizards at Microsoft still have to go.
"Network Difficulty"
On Scalability Day (May 20, 1997), Microsoft executives in Manhattan serve up Windows NT garnished with "benchmark" tests meant to prove it is not a loser. An ebullient Bill Gates is onstage at the Equitable Center, joined by Robert Barnes, a Microsoft program manager, plus an ordinary automated teller machine. Trailer trucks parked on 51st Street are rigged with satellite dishes to beam the presentation worldwide. Microsoft NetShow is streaming live video coverage in 10 languages over the Web.
At the click of a mouse, 45 servers stationed in an adjacent art gallery (which Microsoft has converted into a $5 million temporary data center) will begin churning at the rate of 1 billion transactions per day. Click. Nothing happens. An onstage counter should be reeling off 7,000 simulated bank transactions per second. What's wrong? "We had a little bit of network difficulty," Barnes explains after taking a break to show a 10-minute video that was standing by for just such an emergency. "We had to actually restart." As in reboot the computers.
Later, in a curious bit of public relations, Microsoft explained the Scalability Day glitch like this: "Oddly, for some unknown reason the primary domain controller hung for a few seconds. There was a hardware reason causing the delay." (Systems experts from the hardware maker, Compaq Computer Corp., could not be reached for comment.) For confirmation, IBM independently tested NT servers the next day, yielding similar results.
After the reboot, the system performs as advertised. According to Gates, the simulations exceed transaction demands of the New York Stock Exchange, Visa and AT&T combined. "Well, that's fantastic," he cheers. "It's a pretty impressive system to see all those disk drives chugging along, doing the different transactions. You really sort of have to feel it to experience the full power."
Back at the Ranch
But, hey, demos happen. The real test is back at company headquarters, where Microsoft's internal network hosts 37,000 e-mail accounts and carries 2.5 million messages a day on 1,500 NT servers. Another 14 NT servers support Microsoft's corporate Web site, which handles 80 million hits per day, 3,000 concurrent users per server and more than 160 database-driven applications.
Vice President John Connors, who runs the Information Technology Group (ITG), which is responsible for maintaining Microsoft's internal system, regularly leads prospective NT customers on tours of Building 11, the company's data center. "It makes them feel good to see those NT servers, talk to operations people and get a gut-level feel," he says.
Microsoft employees' guts, however, don't feel so good. These workers grouse about a continually sluggish system that regularly interrupts both internal and external e-mail service and can make connecting to system resources impossible. According to one product manager, e-mail in her department has been disrupted "a couple days every week for the past month." She says e-mail can take three hours to travel down the hallway. "E-mail is basically a nightmare for everybody these days," sighs another employee. "The [e-mail] Exchange server--that's seriously up and down."
Even independent contractors complain about delays in reaching Microsoft by e-mail: "If the phone company did it, they wouldn't get away with that kind of bullshit," says one.
According to Microsoft employees, the company's corporate network suffers less from significant system crashes than from perpetual lags and frequent nuisances. "[Network problems] seem to be recurring," says a Microsoft Network (MSN) programmer. "It's pretty much continually slow." For example, the programmer says receiving a response from a server not two feet from his desk can take 40 seconds. "It affects everything in terms of getting to shared servers," he says. "Say you're loading or installing a program from a network server, it's a real pain in the butt, it's really bad."
The product manager says slow connectivity and server access "got to be such a problem [that it caused] delays to mission critical [development]" throughout her entire product group.
"I'm flabbergasted," says a Microsoft subcontractor familiar with Redmond's corporate network. "Copying files, for example, which is fairly basic, took a ridiculous amount of time because the network was overloaded. From my point of view, that kind of network performance is unacceptable."
Servers that run Microsoft's Internet projects (housed in a separate off-campus data center) come in for special criticism. "The Web farm is way fucked up," complains an MSN developer, who says server problems occasionally make Microsoft Web sites unreachable or cause them to serve partial pages. "It's insane."
"Quite a few people have been having problems connecting to network resources," concedes an e-mail sent by Microsoft tech support to one product group this spring. "ITG has told us that there is an overall problem with the number of WINS [Windows Internet Name Service] machines that are available to handle requests." WINS servers translate the names Windows networks assign to individual computers into their corresponding Internet Protocol (IP) addresses, much as the Internet's Domain Name System translates names such as microsoft.com. "If you can't establish a session with a domain controller [via a WINS server]," the e-mail continues, "your password can't be validated, so you aren't logged into the Net under your user ID."
As a temporary fix, tech support launched an internal Web site where employees could check whether their access problems were caused by the WINS shortage. Beyond that, the e-mail offered few remedies, instructing employees to log on and off the system until their passwords were accepted.
Microsoft would not officially comment on specific problems, but Connors denies any systemic network problems. He acknowledges, however, that ITG fields more user complaints than compliments, which, after all, is the nature of the job. "They don't give me a lot of 'my computer is doing great, you're doing a great job,'" says Connors.
Connors characterizes most glitches and delays as "network storms," a campus catchall for a wide variety of system errors. He says most e-mail and connectivity slowdowns are caused by bugs in experimental code that overly ambitious Web developers are testing on the network. In fact, Microsoft beta-tests almost all of its commercial software on the internal system. Although the company showcases its corporate network to potential customers, it's an unsterile proving ground by design, where system bugs, slowdowns and storms are not unexpected. Connors says a recently installed test version of Exchange may have been responsible for some of the complaints in the spring.
Some employees remain patient and sympathetic. "My attitude is that we could lock the network down on whatever was sure to work, or we could try out this experimental stuff," says an MSN manager. "If something's slow, it's the price you pay for being where this work is being done."
All That Glitters Is Not Gold
Still, glitches do slip through. Some Microsoft programs that passed internal stress tests and shipped as supposedly "golden code" have caused several embarrassing problems with Microsoft's public online services. In April, e-mail service for more than 2 million MSN subscribers was disabled for five days. On top of the customer relations nightmare for the online service, Microsoft scrambled to shield the underlying NT architecture from criticism. Rather than implicate NT in the glitch, MSN spokesman Chris Voss quickly shifted blame to Exchange. "It was more the e-mail client that was not scalable," he said.
Two months later, Microsoft completed its two-year project to wean the company's Web server farm off competing Unix products and onto NT servers. One of the last servers turned over to NT was Microsoft's primary corporate home page Web server, atbd.microsoft.com.
Within hours of the switchover, on June 19, hackers began cracking NT security, jamming the site and interrupting service to 1 million daily visitors. The situation was particularly embarrassing, given that NT competitors, especially Sun Microsystems, had been attacking NT's supposedly lax security in press releases for months.
An online statement issued by Microsoft, titled "If You Had Trouble Accessing microsoft.com, Here's Why," blamed the service disruption on "scheduled upgrades," which, unfortunately, coincided with the hacker attacks. "A patch that protects against this type of attack was applied today and the site disruption is no longer an issue," the statement explained.
But 10 days later another hacker managed to lock up Microsoft's Web servers, disrupting microsoft.com. The company developed another patch and encouraged anyone using an NT Web server to load the fix.
The very next day, on July 2, the site went down for the third time in three weeks. "It's not an attack or a hack or anything like that this time," reported company spokesman Adam Sohn. Instead, this disruption was caused by maintenance performed on Microsoft database servers. "Some of that literally means picking up a computer, unplugging it from the wall and bringing it up somewhere else," explains Sohn. Unexpected complications temporarily disabled 70 percent of microsoft.com servers.
Winning Strategy
Despite the setbacks, Microsoft boasts an impressive array of satisfied NT customers, in part because of its extensive internal testing and use of its own products. Boeing, Charles Schwab & Co., Nasdaq and Lexis-Nexis all rely at least in part on NT servers. In July, the Defense Department said the Air Force will begin installing NT servers to manage 37,000 workstations. During fiscal 1996-1997, Microsoft expects to ship 1.2 million copies of Windows NT 4.0, and sales, combined with those of BackOffice, could approach $2 billion--almost four times last year's volume. "If you compare [NT] to any individual form of Unix, it's outshipping Unix platforms by about a factor of five," Gates announced at Scalability Day. "If you take Unix as a whole together, Windows NT is outshipping it by more than a factor of two."
That success owes mostly to a price/performance ratio that makes Windows NT too affordable to ignore. But Microsoft's strategy also includes shipping sometimes buggy products early, with promised upgrades to follow. While simply "good enough" may satisfy PC customers, corporate clients may hold out and pay premiums for what Gates calls "absolute performance."
Even Gates admitted at Microsoft's Scalability Day that "in terms of absolute performance, we still have some work we're doing to achieve that metric." The question that remains: Is what's good enough for Microsoft really good enough for the rest of the world?
Mike Romano is a staff writer at the Seattle Weekly.