S3 is good software
Continuing from this previous post (see zed_is_good_software.md), I wanted to riff/write/think about why I like S3 so much.
Similar to the previous post, I’ll start off talking generally in terms of 1. Performance (implementation) and 2. API (interface) and then build on top of that.
What is “performance” in this context? “Operational quality” is the more accurate term for what I mean. For me, S3 is the fundamental service for storing/querying files [1][2] when building distributed systems, so the primary expectation is durability (the property that data doesn’t get lost or corrupted) and then availability (the property that you can access your data when you need it). On both counts S3 is boringly stable and reliable: for practical purposes your data is never getting lost, and you’ll almost always be able to access your data (unless there’s an AWS outage, and then most of the internet is down anyway). There’s also consistency: S3 has been strongly read-after-write consistent [3] since Dec 2020.
In terms of API, S3 is incredibly flexible and simple, unlike certain other AWS services and the Console UX [4]. I’ve personally found it incredibly easy to use in weird ways: plugging it into my ETL workflow as a scratch space for some parquet transformations, dumping file state from a stateful program, backing up RDS snapshots, copying files between two servers, and querying parquets in a bucket with duckdb. Apart from the standard API operations like listing, putting and getting, you have the higher-level data movement functionality in the AWS CLI (aws s3 cp, aws s3 mv, aws s3 sync). You can go one abstraction higher and just mount an S3 bucket on your server [5], call it a day and run certain workloads on the files in the bucket.
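To make the “standard API operations” a bit more concrete, here is a minimal sketch of the scratch-space pattern in Python. It assumes a boto3-style S3 client is passed in by the caller; the helper names (put_bytes, get_bytes, list_keys) are made up for illustration, and the client calls used are boto3’s put_object, get_object and the list_objects_v2 paginator.

```python
from typing import Any, Iterator


def put_bytes(client: Any, bucket: str, key: str, data: bytes) -> None:
    """Write one object; an existing object at the same key is overwritten."""
    client.put_object(Bucket=bucket, Key=key, Body=data)


def get_bytes(client: Any, bucket: str, key: str) -> bytes:
    """Read one object fully into memory (fine for small scratch files)."""
    response = client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def list_keys(client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under a prefix, following pagination."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from the page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With boto3 installed and credentials configured, something like client = boto3.client("s3") followed by put_bytes(client, "my-scratch-bucket", "etl/run-1/state.json", b"{}") should work; the bucket and key names here are hypothetical. Because of the strong read-after-write consistency mentioned earlier, a get_bytes right after that put should see the new bytes.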
S3 has other good qualities too - reasonable pricing for most use cases, solid versioning - but I’ll skip those for now.
Ok, now after what must seem like a lot of convoluted AWS spiel, I’m finally getting to my point: I speculate that when you have a very good service like S3 with these characteristics (great operational quality and great interface quality), some emergent behaviours appear. It becomes Lindy in a sense: everyone just uses it for anything and everything (appropriate and inappropriate). It becomes foundational infrastructure duct tape for very important systems in the world.
I really like using S3 as an ideal to aim for when building platform systems. I also really like Jassy’s views on Primitives (aka Platforms) in the AMZN 2023 Letter to Shareholders [6], where he talks about, amongst other things, the motivations for building good Primitives.
Jassy describes primitives as building blocks that do one thing really well with maximum developer flexibility. S3 fits this perfectly - it has exceptional operational quality for object storage and a simple API that hides complexity while enabling composition.
S3 is good software.
Appendix 1
I spent some time collecting how different enterprises are using S3 as the primary data store in their systems (I found these two to be interesting use cases).
1. Turbopuffer
- marketing copy: search every byte (serverless vector and full-text search built from first principles on object storage: fast, 10x cheaper, and extremely scalable)
- usecase: vector database
- architecture: a simple system where a binary on an EC2 instance uses memory / SSD as a hot cache and S3 is the source of truth.
- links:
2. Parseable
- marketing copy: Fast Observability on S3 (Parseable is built for fast observability on object storage systems like S3: deploy anywhere in minutes, 10x cheaper, extremely scalable and built with open standards.)
- usecase: observability (and time series) database
- links:
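The “hot cache in front, S3 as source of truth” shape from the Turbopuffer notes can be sketched as a small read-through cache. This is only an illustration of the general pattern, not how Turbopuffer actually implements it; the class name ReadThroughCache, the fetch callable and the LRU eviction policy are all my assumptions.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Small LRU cache in front of an object store that stays the source of truth."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 128) -> None:
        self._fetch = fetch  # hypothetical: e.g. a function that does an S3 GET
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            # Hot path: serve from the local cache and mark as recently used.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Cold path: fall back to the source of truth, then remember the result.
        self.misses += 1
        data = self._fetch(key)
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used key
        return data

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, e.g. after the object was rewritten in the store."""
        self._entries.pop(key, None)
```

The nice property is that the cache holds nothing precious: if the instance dies, a fresh one starts cold and refills itself from the bucket.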
Appendix 2
Another pattern I’ve observed is that people are using S3 (and object storage services in general) as lakehouses. I guess that kind of makes sense, considering how easy I’ve found it to dump parquets into S3 buckets and query them in duckdb or do more detailed operations with pyarrow (I’m guessing the metadata management is simpler and more standard with Apache Iceberg).
links:
- https://tobilg.com/the-age-of-10-dollar-a-month-lakehouses
- https://aws.amazon.com/blogs/big-data/use-apache-iceberg-in-your-data-lake-with-amazon-s3-aws-glue-and-snowflake/
- (interesting concept from duckdb to manage tables in object storage. competitor to iceberg?) https://duckdb.org/2025/05/27/ducklake.html
- (e6data talk about object storage native lakehouse) https://youtu.be/XX-EWRyXVzs?feature=shared
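As a rough sketch of the “dump parquets into a bucket and query them in duckdb” workflow: the helpers below build an s3:// glob and a row-count query, and run_query hands the SQL to DuckDB through its httpfs extension. The function names and the bucket/prefix values are hypothetical, and credential setup is assumed to be handled elsewhere and left out.

```python
def parquet_glob(bucket: str, prefix: str) -> str:
    """Build an s3:// glob matching the Parquet files directly under a prefix."""
    cleaned = prefix.strip("/")
    base = f"s3://{bucket}/{cleaned}" if cleaned else f"s3://{bucket}"
    return f"{base}/*.parquet"


def count_rows_sql(bucket: str, prefix: str) -> str:
    """SQL that counts rows across every matching Parquet file.

    The values are interpolated directly, so only pass trusted bucket/prefix names.
    """
    return f"SELECT count(*) AS n FROM read_parquet('{parquet_glob(bucket, prefix)}')"


def run_query(sql: str):
    """Run the query with DuckDB; needs the duckdb package and access to the bucket."""
    import duckdb  # imported lazily so the helpers above work without it

    con = duckdb.connect()  # in-memory database, nothing persisted locally
    con.execute("INSTALL httpfs")  # extension that lets DuckDB read s3:// paths
    con.execute("LOAD httpfs")
    return con.execute(sql).fetchall()
```

For example, run_query(count_rows_sql("my-lake-bucket", "events/2024")) would count rows across every Parquet file under that (made-up) prefix without copying anything to local disk first.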
[1] I love that Bezos’s original spec for S3 was “malloc for the internet” - https://news.ycombinator.com/item?id=24802268
[2] I know that it’s technically object storage, but you are storing all kinds of files at the end of the day, and the word “object” is more to indicate a specific access pattern.
[3] https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/
[4] Hey, the UX counts as an API too! Also, it’s not surprising that people line up to use AWS despite its console sucking and its other shortcomings: the capabilities of its services are exceptional.
[5] https://aws.amazon.com/blogs/storage/the-inside-story-on-mountpoint-for-amazon-s3-a-high-performance-open-source-file-client/
[6] https://www.aboutamazon.com/news/company-news/amazon-ceo-andy-jassy-2023-letter-to-shareholders