S3 vs HDFS – Comparing Technologies in the Big Data Ecosystem
For the better part of the last decade, Hadoop has been the industry-standard platform for storing big data, with HDFS (the Hadoop Distributed File System) as its core architectural component. Although Hadoop has long dominated big data storage, cloud technologies have emerged as a viable alternative to on-premises storage. One cloud service that covers a use case similar to Hadoop's is Amazon S3 (Simple Storage Service). If your big data projects already run on Amazon's cloud infrastructure, S3 is a natural fit.
However, HDFS and S3 differ in several important ways. S3 is an object store rather than a true file system: it has a flat namespace with no real directory hierarchy, and each object is stored and retrieved by a unique key within a bucket. Historically, S3 also used an eventually consistent model for some operations, similar to what you would find in many NoSQL storage designs; since December 2020, however, S3 has provided strong read-after-write consistency for all GET, PUT, and LIST operations. Finally, instead of relying on partitioning and bucketing, S3 workloads typically use data sharding, in which a single dataset is split across many objects that are usually grouped under a common key prefix.
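To make the key/object model and sharding more concrete, here is a minimal Python sketch that models a bucket as an in-memory dictionary from keys to bytes. `ToyObjectStore`, `shard_dataset`, `read_dataset`, and the `part-00000.csv` naming scheme are hypothetical names chosen for illustration; they are not part of the S3 API or boto3. With boto3, the roughly corresponding client calls would be `put_object`, `get_object`, and `list_objects_v2` with a `Prefix` argument.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical in-memory stand-in for an S3 bucket: a flat key -> bytes map."""
    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, body: bytes) -> None:
        # Writing an object replaces it whole; there is no in-place append or edit.
        self.objects[key] = body

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Folders" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self.objects if k.startswith(prefix))


def shard_dataset(store: ToyObjectStore, prefix: str,
                  rows: list[str], rows_per_shard: int) -> list[str]:
    """Split one logical dataset into several objects that share a key prefix."""
    if rows_per_shard <= 0:
        raise ValueError("rows_per_shard must be positive")
    keys = []
    for i in range(0, len(rows), rows_per_shard):
        # Zero-padded shard numbers keep lexicographic key order equal to row order.
        key = f"{prefix}part-{i // rows_per_shard:05d}.csv"
        store.put(key, "\n".join(rows[i:i + rows_per_shard]).encode("utf-8"))
        keys.append(key)
    return keys


def read_dataset(store: ToyObjectStore, prefix: str) -> list[str]:
    """Reassemble the dataset by listing its shards and reading them in key order."""
    rows = []
    for key in store.list_keys(prefix):
        rows.extend(store.get(key).decode("utf-8").split("\n"))
    return rows


# Usage: five rows split into shards of two rows each under the prefix "sales/2020/".
store = ToyObjectStore()
keys = shard_dataset(store, "sales/2020/", [f"row{n}" for n in range(5)], rows_per_shard=2)
print(keys)  # ['sales/2020/part-00000.csv', 'sales/2020/part-00001.csv', 'sales/2020/part-00002.csv']
print(read_dataset(store, "sales/2020/") == [f"row{n}" for n in range(5)])  # True
```

Because the namespace is flat, the "folder" `sales/2020/` exists only as a shared key prefix, which is why a reader finds a dataset's shards by listing keys with that prefix rather than by walking a directory tree as it would on HDFS.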