When I started in this industry, there was a bad but common practice: each new project started with a database schema. The DBAs, or whoever was in charge of Oracle, would hold a long discussion and many meetings, and then you, the developer, would be “blessed” with a schema. The schema was usually a bit wrong for what you were actually building, so you wrote inefficient queries to work around it until you got yelled at and everyone agreed to fix things. This changed with object-relational mapping tools like Hibernate in Java and Entity Framework in .NET. It changed more seriously when we moved to “schema on read,” first with Hadoop and later with Amazon S3 and formats like Parquet.
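To make the shift concrete, here is a minimal sketch of the two worlds, assuming pyarrow is installed and a local events.parquet file stands in for data landing in S3; the table name and columns are hypothetical.

```python
# Schema on write vs. schema on read: a minimal sketch.
import pyarrow.parquet as pq

# Schema on write: the database enforces this shape before any row lands,
# and a change requires a migration everyone can see coming.
# (Hypothetical table for illustration.)
SCHEMA_ON_WRITE_DDL = """
CREATE TABLE support_tickets (
    ticket_id   BIGINT      NOT NULL,
    customer_id BIGINT      NOT NULL,
    opened_at   TIMESTAMP   NOT NULL,
    status      VARCHAR(16) NOT NULL
);
"""

# Schema on read: the producer writes whatever shape it likes; consumers
# only discover that shape when they open the file.
table = pq.read_table("events.parquet")
print(table.schema)  # the producer's schema, learned after the fact
```

The asymmetry is the point: in the first model the schema is a contract checked at write time, while in the second it is whatever the last writer decided, discovered at read time.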
The old system was slow and painful, but it did protect against unexpected change. The modern system empowers data producers to change things but disempowers the people whose job it is to provide stability. Most organizations have some data platform team that is expected to provide omniscience despite being woefully outnumbered. That might sound like a good deal for developers: all of the power, none of the responsibility, and a team of people there to take the whipping. However, it doesn’t work out that way. As a developer, you’re either breaking downstream data systems, including the fancy new AI system, or you’re so afraid of breaking things that you move too slowly.
When data ownership moves upstream
Consider this story. Jez, a senior engineer on the Support Platform team, spots this payload: