Analytics For All

S3 vs HDFS – Comparing Technologies in the Big Data Ecosystem

For the better part of the last decade, Hadoop has served as the industry-standard mechanism for storing big data, with HDFS as its core architectural achievement. Although Hadoop has dominated the big data storage market, cloud technologies have emerged as a viable alternative to on-premises data storage. One cloud technology that covers a similar use case is Amazon S3, the object storage service from AWS. If your big data projects already run on Amazon's cloud computing infrastructure, S3 is a natural fit.

However, there are several differences between HDFS and S3. S3 is an object store and is therefore not, strictly speaking, a file system: each object is stored and retrieved by its key rather than by a path in a directory tree. S3 was also originally built on an eventually consistent model, similar to what you would find in many NoSQL storage designs; since December 2020 it has provided strong read-after-write consistency. Instead of partitioning and bucketing, S3 relies on data sharding, which lets you split a single dataset into pieces and store them separately.
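To make the key/object idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the S3 API: the class `ToyObjectStore`, its methods, and the example keys are all hypothetical names chosen for illustration. It shows a flat mapping from keys to bytes, with one dataset split across several objects that share a key prefix.

```python
class ToyObjectStore:
    """Toy stand-in for a single bucket: a flat mapping from key to bytes.

    Hypothetical illustration only; this is not how S3 is implemented.
    """

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        # A put replaces whatever was stored under the same key.
        self._objects[key] = body

    def get(self, key):
        # Lookup is by exact key; no directories are traversed.
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are just a naming convention: listing is a prefix filter.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()

# One logical dataset, split into several objects under a shared prefix.
store.put("logs/2020/01/part-0000.csv", b"id,value\n1,10\n")
store.put("logs/2020/01/part-0001.csv", b"id,value\n2,20\n")
store.put("logs/2020/02/part-0000.csv", b"id,value\n3,30\n")

january = store.list_keys("logs/2020/01/")
print(january)
```

In this sketch the slashes in a key carry no special meaning to the store. Reading "the January data" means listing every key that starts with `logs/2020/01/` and fetching each object, which is a different access pattern from opening a directory on a file system such as HDFS.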
