For some time now, Rapleaf has been hard at work converting a critical portion of our infrastructure from a MySQL-based system to a Hadoop-based one. We see it as a much more obvious path to linear scalability of our processing pipeline. Since scalability is our goal, a technology that has obviously found its way into our view is Amazon Web Services’ EC2 offering.
For those of you that don’t know, in brief, EC2 is a virtualization service that allows you to host as many instances of a given machine image as you’d like in Amazon’s datacenters, effectively giving you the ability to nearly instantly scale up any portion of your infrastructure. It’s a pretty compelling idea, but designing an architecture around it can be confusing at times. Even worse, trying to compare costs between running in EC2 and on your own machines in a colocation facility really isn’t comparing apples to apples, but you have to do it in order to understand the costs.
We recently took some time to try and slay this dragon and actually quantify our decision to use or not use EC2 one way or another. I’ll run through our logic in this post so you can try and compare your own use case to ours and make an informed decision.
Comparison Factors
First, let’s look at the dimensions within which we’re trying to compare the two options. The better solution should balance all of the following:
- Scalability. We want to be able to scale up to handle more capacity relatively easily, and we want the maintenance associated with our setup to stay pretty low.
- Performance. Whatever solution we choose, it should be capable of completing its needed tasks in the allotted time. Particularly for our application, we have a 6-hour execution window.
- Cost. Of course, we’d love our solution to be as cheap as possible, without sacrificing too much of our other goals.
Let’s take a look at each of these factors in turn.
Scalability
There’s no question that EC2 is a far more scalable approach to running a Hadoop application. When it comes time to use more machines, you just use Amazon’s tools to boot up more instances. You pay for what you need and no more. When you don’t need the extra, you can always just power them down. Enough said.
Conversely, scaling in a colo can range from merely time consuming to downright nightmarish. In the best situation, you already have space in your cage, so you only have to purchase, install, and configure a bunch of new machines. Just getting the machines from the vendor is probably going to take a week, and installation some additional time, though you can likely do some work up front to have a machine image that reduces installation time greatly. In a worse situation, you might be out of space and have to consider renting some new space in another cage or cabinets, which brings on questions like network topology.
However, the real killer in owning your own machines seems to me to be the possible maintenance costs. Specifically, how many sysadmins are you going to need to keep all those machines spinning away? There’s all the installing and updating of software, hardware failures and replacements, and troubleshooting a host of other possible issues. The industry average machine:sysadmin ratio sits at around 120 machines per person. Hadoop deployments are probably a little simpler than the average install, though, so I’m guessing this number is more like 200+ machines per person. (Obviously, this does not take into account many variables like hardware failure rates, so your mileage may vary.) Comparatively, there’s no hardware maintenance in EC2, and Amazon takes care of the machine imaging and network management for you, so I would estimate that a single sysadmin could probably manage something like 2000 machines.
The absolute worst problem is what happens when you reach the limits of your current setup. For instance, when you cross your 200-machine productivity limit, you’ll have to hire another sysadmin, which is an enormous expense and time investment. Or, when you cross the limit of the number of switches you can affordably daisy chain together, you’ll face the cost of purchasing much more expensive networking gear. Etcetera. EC2 is only affected by the personnel issue, and even then only at such a scale that it probably isn’t an issue.
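To make the "step" nature of that staffing limit concrete, here is a minimal sketch. The 200 and 2000 machines-per-sysadmin ratios are the rough guesses from this post, not measured values, and the function name is just illustrative.

```python
import math

# Rough guesses from the post, not measured values.
COLO_MACHINES_PER_ADMIN = 200
EC2_MACHINES_PER_ADMIN = 2000


def sysadmins_needed(machines: int, machines_per_admin: int) -> int:
    """Whole sysadmins required; headcount grows in steps, not smoothly."""
    if machines <= 0:
        return 0
    return math.ceil(machines / machines_per_admin)


if __name__ == "__main__":
    for n in (40, 200, 201, 2000, 2001):
        colo = sysadmins_needed(n, COLO_MACHINES_PER_ADMIN)
        ec2 = sysadmins_needed(n, EC2_MACHINES_PER_ADMIN)
        print(f"{n:>5} machines: colo admins = {colo}, EC2 admins = {ec2}")
```

Under these assumptions, going from 200 to 201 owned machines means a second hire, while the same jump on EC2 would not happen until machine 2001.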
Performance
The performance issue is not quite as clear cut as one might think. For starters, you’re able to get pretty much whatever kind of machine your heart desires on EC2, so you can match with whatever machine you’d like to get in your datacenter. At Rapleaf, we’re eying up the highest-end machines, the “High CPU Large Instances” (pricing and details can be found here), which are roughly equivalent to the machines we’d like to have in our datacenter.
Right off, though, there are some differences. First, none of the EC2 instance offerings with 8 cores have more than 7GB of memory. We want at least 8GB (which seems like a pretty rational number when you have 8 cores…), and we like to have the option of more if we ever needed it. Next, we managed to find a vendor of 1U machines that support 4 SATA disks, allowing us to get the performance of 4 spindles and 4TB of raw space per node, which is pretty nice. The biggest disks available in EC2 are 1690GB, which would impose some restrictions on how we do things. Amazon has an add-on feature called Elastic Block Stores that would allow us to have more disk space per instance, but at an additional cost.
There’s another subtle issue, though. In your own datacenter, you control the precise layout of your machines and network, allowing you to place all of your worker machines on either a single switch or a set of trunked switches, keeping the aggregate bandwidth in the cluster very high. EC2 makes no guarantees at all about where your machines might physically be located. As a result, you’re completely barred from configuring Hadoop rack awareness, which robs you of some performance. (Amazon does allow something called Availability Zones, within which computers are considered “close”, but not necessarily in the same rack.)
In effect, this all boils down to the fact that in EC2, you’re giving up control of some of the finer details. This can be a benefit in some scenarios, but it means that you have to be prepared to accept whatever performance EC2 gives you with the inherent machine configuration.
Cost
So, let’s try and put it all together into a consistent cost calculation and see where it comes down. We’ll assume for the sake of argument that you have a fairly well-paid sysadmin who makes $120,000 a year, and that if you have fewer machines than the maximum productive number, your sysadmin can find something else to do with his spare time. All the numbers about colocation costs are roughly based on numbers we’re currently paying. Finally, I’m just going to leave out any possible bandwidth costs because any estimation would be very subjective based on the application architecture and the data source. (In a real application, you probably need to consider bandwidth costs very closely!)
EC2
High CPU Large Instance cost per hour: $.80
Maintenance cost per hour: $120k/year / 12 months in a year / 30 days in a month / 24 hours in a day / 2000 machines = $.007 / hour
Total cost per hour: $.807
Owned machine
Machine cost per hour amortized over 36 months of use: $2000 / 36 / 30 / 24 = $.08 ($.077 before rounding)
Cost of cabinet and power per hour: $2500 per cabinet / 30 / 24 / 40 machines per cabinet = $.09
Maintenance cost per hour: $120k/year / 12 months / 30 / 24 / 200 machines = $.07
Total cost per hour: $.23 (summing the unrounded figures)
In terms of flat cost per hour for like numbers of machines, owning the machines and running them in colo space you rent monthly is 1/3 to 1/4 the cost of running your cluster in EC2. Somewhat surprising, no?
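If you want to plug in your own numbers, the arithmetic above can be sketched in a few lines of Python. The defaults are the figures from this post; the function and parameter names are only illustrative, and the 30-day month is the same simplification used above.

```python
HOURS_PER_MONTH = 30 * 24      # the post's simplification: 30-day months
SYSADMIN_SALARY = 120_000      # dollars per year, as assumed in the post


def admin_cost_per_machine_hour(machines_per_admin: int) -> float:
    """Share of one sysadmin's salary carried by one machine for one hour."""
    monthly_salary = SYSADMIN_SALARY / 12
    return monthly_salary / HOURS_PER_MONTH / machines_per_admin


def ec2_cost_per_hour(instance_rate: float = 0.80,
                      machines_per_admin: int = 2000) -> float:
    """Hourly instance price plus the per-machine share of admin cost."""
    return instance_rate + admin_cost_per_machine_hour(machines_per_admin)


def owned_cost_per_hour(machine_price: float = 2000,
                        amortization_months: int = 36,
                        cabinet_monthly: float = 2500,
                        machines_per_cabinet: int = 40,
                        machines_per_admin: int = 200) -> float:
    """Amortized hardware + cabinet/power share + admin share, per hour."""
    hardware = machine_price / amortization_months / HOURS_PER_MONTH
    cabinet = cabinet_monthly / HOURS_PER_MONTH / machines_per_cabinet
    return hardware + cabinet + admin_cost_per_machine_hour(machines_per_admin)


if __name__ == "__main__":
    ec2 = ec2_cost_per_hour()
    owned = owned_cost_per_hour()
    print(f"EC2:   ${ec2:.3f} per machine-hour")
    print(f"Owned: ${owned:.3f} per machine-hour")
    print(f"Owned / EC2 ratio: {owned / ec2:.2f}")
```

Changing `machines_per_cabinet` or `machines_per_admin` is a quick way to see how sensitive the comparison is to the guesses above. Bandwidth is still left out, as in the figures above.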
The Caveat…
There is one final piece of the analysis that is missing: the cost, or rather, the benefit, of the scalability attributes of EC2. I’ve purposely left this out of the computation because, honestly, it is an incredibly hard attribute to quantify, at least for our current application. Clearly, for those of you who know without a doubt that there will be unpredictable overages, your need to protect yourself from being overwhelmed or performing poorly might vastly outweigh the premium you’ll be paying (roughly 2/3 to 3/4 of your EC2 bill). However, it doesn’t appear that Rapleaf has that problem at this point. More to the point, if you’re already deploying a Hadoop application, then you’re probably well aware that it isn’t something that should be depended on for realtime answers, and thus you can probably bear a longer scaling path.
Conclusion
For Rapleaf, we currently expect our cluster to start at something like 40 machines and grow modestly for the near future. As a result, it looks like we’ll stick it out in a colo. Maybe when we hit the ceiling on that plan we’ll reevaluate.
I would love to hear everyone’s thoughts on this analysis. Did I leave out any glaring costs that will swing us in the opposite direction?

24 Comments
Hi, Interesting analysis. Your 8 core, 7GB, 4TB machine seems cheap, but that is the way things go. I walked to EC2 away from 3xQuad-CPU super-micro servers and haven’t regretted it.
One _enormous_ factor you omit is the shift in your mindset that takes place.
E.g. You set aside the EBS noting (correctly) the additional storage cost – however what EBS facilitates, that your coloc can’t, is the ease of snap-shotting and cloning data among instances – it permits this on a scale that is completely flexible. This sort of flexibility/functionality can _really_ change your application, but not in all cases.
From my experience, AWS starts off being a cost issue, but as you get the hang of thinking about scaling from the outset your usecases start to become ones you’d never start to contemplate when using a coloc facility.
Some figures struck me as strange:
– Do you really utilize your private cluster 24×7? Or is it the 6-hour execution window, daily?
– Could you really get a sys admin to be available 24×7 for 120k, and look after 200 machines? If it is one person you’d need to account for the fact that the response time will be far from immediate, hence the availability would be down from 24×7.
Back of the envelope calcs made me think the ‘owned’ cost is more like half of the EC2 ‘High CPU Large’ cost.
Apart from the value of the expansion option that EC2 offers, there is also the technology option it offers.
E.g. What odds do you put on Amazon offering solid state disks, InfiniBand and/or Tesla machines, etc. within the three year life of the coloc machines?
Add the value you attach to all these options to the cost of the coloc machines in order to make the comparison.
Bryan, great post. At Technorati we are also very excited about the promise of cloud computing and have run our production crawlers on EC2 since August. We also run our own data center with over 700 machines and, as you point out, the costs on EC2 are dramatically higher than doing it yourself.
The main issue we faced was that our colo is, in many ways, a fixed cost and we need to get the most out of it we can. Our Hadoop cluster is a modest 9 machines right now, but we will be growing this significantly as we move our MySQL analytics to Hadoop apps. We are finding that we can get a lot more data crunching out of a cluster of Hadoop machines than the equivalent in database machines, so we’ll be freeing up hardware for more efficient (and higher revenue generating) Hadoop applications.
EC2 is a blessing for development as our engineers can fire up images, stress test, and even roll out releases with instant roll back to a parallel set of virtual hosts. No bottlenecks with limited ops personnel either. Having to have repeatable images and instance startup has also improved our build/release process.
I imagine if we were shooting for a hadoop cluster over 100 machines, we would start looking at alternative cloud solutions and, as time passes, more competitors to EC2 should drive those prices down.
Our conclusion was like yours. For now, bring it back in house. The limited memory and spindles are still a show stopper for massively parallel, disk intensive operations.
While the benefit of EC2 is elasticity the benefit of colo is the ability to run your own hardware config and have dedicated support to admin your own cluster.
We see MUCH better support when we can have in-house admin vs external.
If I hire someone in-house and they don’t perform I can fire them.
Further, the quality is much higher when you have a full time admin.
I think a hybrid model would work better if you can get your data into EC2 quickly. It doesn’t work for everyone of course.
We have a pretty static configuration so an EC2 migration isn’t on the roadmap until they lower their prices by 4x.
Another benefit that Dorion pointed out is that you can have your developers boot up instances to test new applications.
I think this is still valid outside of EC2 as you can just host your images there for testing but migrate it to colo when you need to…
Nice post! We came to much the same conclusions, plus one other that is only important for some applications:
We need to be able to physically transfer data to our machines – content providers ship us external HDs full of TBs of music and uploading that from our office is prohibitively slow. Sneakernet to our colo via NYC subway is around 3-4Gbps.
-Todd
@Mark:
I was surprised that the machines were as cheap as they were, too.
Yes, we really do plan to use our private cluster 24×7. Our goal is to get data through the system in 12 hours max, so we run through new stuff every 6 hours. If we actually go faster than that, then we’ll probably just start the next run immediately. That said, we probably don’t need instant turnaround from our sysadmin on a single machine, because in the case of a machine failure we’re only taking a marginal hit to performance.
Finally, it’s pretty near impossible to guess what Amazon is going to do next in terms of their “hardware” offerings, so we have to go based on what we have today. More importantly, if the price of any new technologies we’d like to use actually does come down in the future, with our own cluster, we’d actually be in a better position to take advantage of them, rather than waiting for EC2 to support it.
Nice analysis, Bryan -
The cost for “Owned” may be problematic in a few other areas. You’ve got purchase prices amortized, which is good, but are you taking into account the failure rates for owned equipment, and subsequent need to replace parts? Or the time required to troubleshoot systems which may require replacements? EC2 and similar services already include those costs.
Power and rack space are not quite so incremental, at least not in many data centers, where one must purchase contracts for those expenses in relatively large increments.
Another issue overlooked is capex vs. opex. As businesses grow (or shrink) that becomes a big problem for critical apps based on large batch jobs. Adding more EC2 nodes to scale to what the business requires — that day — minimizes risks which may become life or death for a start-up. Owned equipment does not scale so readily. Business can generally afford opex scaling much more rapidly and fluidly than capex, particularly during rough economic climates like we have now.
FWIW, in my experience running a dept based mostly on Hadoop use, having sysadmins is not quite the right approach. The MR jobs require troubleshooting by systems engineers who work more closely to the code than sysadmins tend to dare go… Designating some engineers to work on systems seems to be a much better practice than using sysadmins to run our Hadoop clusters. YMMV.
Our dept runs Hadoop on 100+ m1.xl EC2 nodes each day for one critical app alone, plus other smaller clusters for other apps, development, and testing. We’ve run the numbers and there’s just no way that owned equipment would be cost-effective compared to this. Business growth and an aggressive pattern of acquisitions make that even more poignant.
Looking forward to hearing more about your experiences -
@pxn:
It’s true, this analysis doesn’t take into account machine failure rate. I’d like it to, actually, but I don’t have a good source for numbers. I suspect that the failure rate is not going to add an overwhelming additional cost per machine.
I agree that you can’t grow as incrementally in a colo, but my assumption is that that’s ok. When it comes time for us to scale out, we’ll just increase the size of the cluster by a full rack or so. This has the downside of being a big capital investment up front, but the upside of not requiring us to scale again for a while. Amortized, it’ll hopefully end up looking like incremental scaling.
I think that large batch job business, at least for us, is fairly predictable in size and usually announced a good deal ahead of when the output has to be delivered. So, for our Hadoop applications, a fixed cluster is just fine. However, in our serving tier, we definitely recognize the possibility for unexpected spikes in traffic, and are considering using EC2 for overflow web servers.
Would you be willing to share the numbers of your own application’s analysis? I’d be really interested to see if it would cause adjustments in my analysis that could tip the scales.
“Could you really get a sys admin to be available 24×7 for 120k, and look after 200 machines?”
Oh, absolutely. The trend in the Silicon Valley is to pay less than HALF that and have you be responsible for 200 machines and on call 24/7.
Hey guys,
Good to see that you are finally considering EC2. I’ve been nagging Jeremy to get off of MySQL and to look at EC2 for quite a while.
Your calculations are correct. However, I think you have vastly underestimated the costs on several fronts.
You will probably not be able to get 40 machines in a single cabinet. The heat issue is only getting worse and data centers will push back pretty hard.
A good sys admin costs more than that.
What about failures and maintenance? What if you don’t have enough capacity?
That being said, if you have a lot of data, it makes sense to have your own cluster to avoid the costs and risks of transferring to EC2/S3.
Maybe a hybrid approach could be done?
Hi, I enjoyed reading your article sometime ago, and have resonated with it for a while. I posted a similar kind of article myself recently, discussing the benefits of using or not using the cloud from an algorithmic perspective. And I linked to your article from there. I hope it’s ok.
Best,
&
–
http://www.pandamatak.com/people/anand/blog/2009/04/mapreduce_hadoop_and_clouds_wh.html
I tried to compile my own numbers, and AWS comes up surprisingly cost effective, especially for a low number of nodes.
It’s hard to make an apples-to-apples comparison, but I’ve made the following assumptions:
1. 1U Node with a single Quad Core Xeon (2.5GHz), 8GB RAM and 4×1TB SATAs costs around $2200.
2. A single node provides roughly 10 compute units. (An EC2 Large Instance provides 4 CUs @ $0.40 / Instance Hour.)
3. Cabinet is $2k / month, 20 1U nodes fit in one cabinet. (I can’t see fitting more than that given power/cooling limits.)
4. Each cabinet adds a fixed cost of around $4k for a switch, cables, KVM, etc.
5. Admin cost is roughly 2x as much when owning. 80% of their time, vs. 40% when using AWS. This is a rough guess, but intuitively I think it’s in the ballpark.
6. You have a fixed base cost of around $10k for the master node and a backup/secondary name-node. 2 x Quad Core, RAID 5, etc.
7. You use the cluster roughly 8 hours a day. (30 days a month.) Sure, you could get higher utilization if you’re very efficient.
8. Available storage on the owned cluster is 1/3 of raw storage. (So 1.35TB per node.)
9. To facilitate comparison with S3, I assumed you are using all of the available storage at a rate of $0.10 / GB / Mo.
10. I ignored S3 data transfer and transaction charges. I figure 9 & 10 should cancel each other out…
Nodes / Monthly Cost Difference (Negative favors AWS)
10 – (3,284.43)
20 – (143.86)
30 – 877.26
40 – 4,017.83
50 – 5,038.96
60 – 8,179.52
80 – 12,341.22
100 – 16,502.91
As you can see, the numbers intersect at around 20 nodes. Most of the difference in the beginning is accounted for by the SysAdmin’s salary…
On the other hand, if you switched up the EC2 Large Instance with the High-CPU Extra Large Instance (20 CUs @ $0.80 / Instance Hour), the table turns in AWS’ favor. The lines don’t cross until you hit the 70 node mark.
I wish there was a better way to quantify CUs.
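For what it’s worth, the compute-unit comparison in assumption 2 can be sketched as below. The CU figures are the rough assumptions stated above (10 CUs per owned node, 4 CUs for a Large Instance, 20 CUs for a High-CPU Extra Large), and the helper names are just illustrative.

```python
import math


def price_per_cu_hour(hourly_rate: float, compute_units: float) -> float:
    """Dollars paid per compute unit per instance hour."""
    return hourly_rate / compute_units


def instances_to_match(owned_nodes: int, cu_per_owned_node: float,
                       cu_per_instance: float) -> int:
    """EC2 instances needed to equal the CUs of a given owned cluster."""
    return math.ceil(owned_nodes * cu_per_owned_node / cu_per_instance)


if __name__ == "__main__":
    # Assumed figures from the list above, not benchmarks.
    large = price_per_cu_hour(0.40, 4)
    high_cpu_xl = price_per_cu_hour(0.80, 20)
    print(f"Large Instance:        ${large:.2f} per CU-hour")
    print(f"High-CPU Extra Large:  ${high_cpu_xl:.2f} per CU-hour")
    print("Large instances to match 20 owned nodes:",
          instances_to_match(20, 10, 4))
    print("High-CPU XL instances to match 20 owned nodes:",
          instances_to_match(20, 10, 20))
```

This only compares raw CUs; it says nothing about memory, spindles, or network, which is exactly why a better way to quantify CUs would be nice.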
@Patrick:
If you are buying more than a few machines, I doubt you will end up paying the full $2200 per machine you suggest. I think our machines turned out to be at least slightly cheaper, and we only bought 40.
You’re right – we could only get around 22 machines in a cabinet due to power and cooling constraints. However, our other costs per cabinet are very low – we can share one $2k 48-port switch between two cabinets and we don’t need a KVM. (We either use IPMI or a wheeled KVM cart that the colo has on hand.)
As for your suggestion of 8 hours/day cluster usage, I pointed out in my original post that we would be using the cluster 24/7, which has proven to be the case. In fact, we are expanding the cluster so that we can get more work done in 24 hours.
I think it’s very interesting, though, that even in your model of 66% down time, EC2 isn’t cheaper anymore after you hit 20 machines. This certainly means that in our application, with 24/7 use and 40+ machines, EC2 would cost us a lot more.
@Dru:
We’ve talked about hybrid approaches for a while, but there are some substantial barriers to making that a really great approach.
You can’t really use EC2 to extend your local cluster, since the network latency would just crush your Map/Reduce performance. You might think that you could just spin up a completely separate EC2 cluster when you needed it, but if your data is stored locally and is nontrivially large, then your computation time will likely be dwarfed by your transfer time.
In the future, we may support running a portion of our serving infrastructure in EC2 for failover and performance reasons, but that will include replicating only a tiny portion of the total data and computation of our entire application.
@Bryan
I agree, the cost per node and per cabinet could go down. You can also pack more nodes per cabinet if you used a DC power supply (although each node might cost more).
But equipment and real-estate costs are minor, I think, compared to the labor costs for building and maintaining your own cluster. That is, until you get to a certain scale.
Interesting post, I wish I had run across this earlier. It’s always difficult to do comparisons like these because there are so many factors involved that are incredibly difficult to quantify (e.g. scalability or turn-around rate). Having used Hadoop on EC2 (via Rightscale), I thought it might be helpful to point out cases where my experience was similar to and different from yours. While hybrid clouds are great, I would agree that this is not a good situation to use one.
Aside from the latency, there is also the cost consideration. EC2 charges per gig of in/out bandwidth. Since Hadoop is unaware of this fact, you could easily end up with huge bandwidth costs, particularly if you are doing multi-stage jobs.
You were wondering about failure rate on EC2. I can say that I have experienced only a handful of failures on EC2 over the course of a few years. Almost every time an instance failed it was a result of the instance being completely overworked, often as a result of a configuration error. For example, I once used the wrong config file and ended up telling Hadoop it had 4GB of RAM to use on a small instance (1.7GB of RAM). RAM and swap completely filled up, etc etc. When correctly configured I have almost never seen an instance fail, even when worked quite vigorously.
One additional approach you may want to consider is using many more weak machines, as opposed to relatively few strong machines. So replacing xlarge instances with 8x as many small instances (or 4x as many large instances). I see this as being appropriate in two circumstances. First, if you have a job which can be chopped up into thousands (if not many more) of very tiny splits, you can ensure that all nodes stay busy. The second is for IO bound tasks, which are quite common among Hadoop jobs. Conventional wisdom says that 2 disks in a RAID should provide double the throughput of a single disk. While this is true, it is important to note that unlike CPU and memory, IO performance is not limited to a fraction of the available IO performance on an instance. So say you are using a small instance: if you are the only one using the disk, then you get almost the entire available throughput of that disk. If you were to get say 60% of the throughput on the 8 disks of 8 small instances (~4.8 disks’ worth of throughput), that could potentially outperform getting 100% throughput of the 4 disks in an xlarge instance. Sure, you could get stuck on an instance with tons of IO hogs, but overall IO utilization on EC2 I would guess is quite low. Of course, your mileage may vary and the only way to truly know is to try it.
Cheers and happy Hadooping.
Extremely interesting post. I know my comment is probably very late for your decision, but I think I’ve spotted some small issues with your cost calculator (forgive me if I’m wrong, as I’m posting this at 6am after a sleepless night).
1. For colo, you are considering a 36 months amortization period. In that case you could consider the EC2 reserved instance and for that you’ll pay a 1-time $4000 and $.24/h resulting in a 4000/3/12/30/24 + .24 = $.40
2. Related to colo:
a. I think you’ve left aside some additional costs like wires/switches/etc., but maybe we can ignore those.
b. I am tempted to think that the sysadmin will have to perform a lot more work for colo and depending on the number of instances you are considering 1 sysadmin might not be enough (having 2 sysadmins will completely change the equation. Based on your SLA you might be required to have at least 2 sysadmins for assuring 24/7).
c. You haven’t included any costs for spare parts over a 3 year period. Once again based on the number of machines, the probability of having broken pieces grows significantly.
d. I guess this is a bit subjective, but for the period considered, 3 years — which is roughly 2 Moore’s cycles, Amazon most probably will offer you the chance to upgrade to more powerful machines. Upgrading your own servers will be a much more expensive and risky operation (once again depending on the number of machines).
Now, I haven’t redone all the calculations, but I’m pretty sure that considering at least a couple of the above points the costs of the 2 options will be much closer.
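As a rough check on point 1, the effective hourly rate of a reserved instance can be sketched like this. The $4000 fee, $.24 hourly rate, and 3-year term are the figures quoted above; the function name is illustrative, and the calculation assumes the instance runs every hour of the term.

```python
def reserved_effective_hourly(upfront_fee: float, term_years: int,
                              hourly_rate: float) -> float:
    """Upfront fee spread over every hour of the term, plus the hourly rate.

    Uses 30-day months, matching the simplification in the post.
    """
    hours_in_term = term_years * 12 * 30 * 24
    return upfront_fee / hours_in_term + hourly_rate


if __name__ == "__main__":
    rate = reserved_effective_hourly(4000, 3, 0.24)
    print(f"Effective reserved rate: ${rate:.3f} per hour")
    print(f"Versus $0.80 on demand:  {rate / 0.80:.0%} of the on-demand price")
```

If the instance sits idle for part of the term, the upfront fee is spread over fewer useful hours and the effective rate goes up.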
bests,
./alex
The most important inputs into the decision are:
1) Demand variance
2) Required response time to increased demand
Your original scenario was for a steady (low demand variance) workload with good advance notice on increased demand (long/relaxed response time). In this case it’s not surprising that colo is less than EC2. I do think you are missing (from both alternative solutions) the cost of management software (whether RightScale or Nagios or OpenView). When you get hundreds of machines you need more than home-brew system management tools. This cost can be enormous (and is a reason why EC2 can sometimes be a better deal at large server counts).
I’ve worked at (and with) a number of companies with the opposite scenario – high demand variance and short required response time to demand changes. Here EC2 and other clouds shine. Imagine a situation where the number of concurrent users varies from 0 to 25,000 (or even higher) – and the spikes are only for about 6 hours/day, 5 days/week, and only 36 weeks/year for a total utilization rate of about 1/8th.
Add to this the fact that the concurrent users are a subset of a paid subscriber base of 3 million who *could* all decide to logon at once (unlikely), and you’ll see that EC2’s variable costs plus autoscaling (up and down) are a better way to fit costs to demand. CFOs love this. It allows them to match the outbound bill to the inbound recognized subscription revenue. You can satisfy your high-water-mark demand without permanently embedding high-water-mark capacity.
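A minimal sketch of that utilization argument, using the $.23 (owned) and $.807 (EC2) hourly figures from the original post. It assumes owned machines cost the same whether busy or idle and that EC2 instances are shut down whenever they are not needed; the names are only illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def utilization(hours_per_day: float, days_per_week: float,
                weeks_per_year: float) -> float:
    """Fraction of the year during which the capacity is actually needed."""
    return hours_per_day * days_per_week * weeks_per_year / HOURS_PER_YEAR


def break_even_utilization(owned_hourly: float, ec2_hourly: float) -> float:
    """Below this busy fraction, paying only for used EC2 hours is cheaper."""
    return owned_hourly / ec2_hourly


if __name__ == "__main__":
    used = utilization(6, 5, 36)
    threshold = break_even_utilization(0.23, 0.807)
    print(f"Utilization: {used:.1%}")
    print(f"Break-even:  {threshold:.1%}")
    print("EC2 cheaper" if used < threshold else "Owned cheaper")
```

With these inputs the workload is busy about an eighth of the time, well under the break-even point, which is the opposite of the 24/7 case in the post.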
So in the end, it’s not only “mean”, but “variance” that matters – just like in an investment portfolio!
Doug
Just a little note in case you decide to buy your own equipment.
re: “we managed to find a vendor of 1U machines that support 4 SATA disks” SuperMicro (resold by tons of server builders) has 1U machines that can handle 8 x 2.5″ SATA drives and 12 DIMMs. Really nice if density is important to you.
Doh. This post is from December ‘08 and not December `09. I hope that whatever you did is going well for you! haha.
I think a hybrid approach can give you the ‘agility’ you need. Agile is a cool word, but can we live up to it? If you focus on moving to Hadoop for storage, then you abstract your apps from being tied to physical storage and can let the market settle down while you decide where you implement, possibly migrating from colo to EC2, etc. underneath your Hadoop setup. Also, we make big use of ‘free’ virtualization tools with cheap terabyte storage in our own colo cabinet, giving us agility you might want for developers, testing, etc. but it’s in our piece of the cloud at fixed cost. Bottom line… think outside the (1U) box.
I run technical operations for a fairly large internet property, and there are a couple of assumptions about sysadmin work here that should be corrected. The notion that sysadmins spend 40% of their time working on hardware or colo issues is bananas. That figure is much closer to 10%. The vast majority of their time, in a software development or software-type service, is spent working directly with engineers to troubleshoot software or network problems, installing/configuring/patching software, performance tuning, or other regular maintenance. In other words, about 85-90% of the workload for your operations group does not change at all whether you use your own equipment or you have services in the cloud. In my experience so far, you simply swap out the time you would spend with your own vendors and equipment for the time spent provisioning and configuring AWS instances. Your sysadmin time is going to basically be a wash.
Nothing beats a cloud service for rapid scalability. Nothing beats a cloud service for low cost of entry. However, there are a couple of big problems you have to think about. How do you get 10TB of data out of your AWS/S3 cloud (and what is it going to cost you)? How do you load balance more simultaneous connections than AWS software load balancing will support? How can you guarantee to your engineers that ten server instances are identical when they were built over a six month period? How do you get more than 1.5 TB available at local SCSI or Fibre Channel speeds? How do you troubleshoot packet loss when you have no access to network equipment and no enforceable SLA with your network provider? Basically, how do you go bigger and faster?
wow.. Ive learned a lot of things in here.. Now i know that it is very difficult to have or make your own home.. You really need lots of money for the investment..