AI Agents Can Write the Code. Engineers Still Own the Trust.

I recently read the arXiv paper “Skills for the future software profession: beyond agentic AI!” by Sungmin Kang, Baishakhi Ray, and Abhik Roychoudhury. One point that strongly resonated with me is their observation that verification and validation will become increasingly important as AI agents take on more implementation work.

That idea feels very real to me.

We are entering a phase where AI agents are no longer just autocomplete tools. They can generate code, modify existing systems, write tests, open pull requests, explain bugs, and even suggest architectural changes. This is powerful, but it also changes the responsibility of software engineers.

If an AI agent writes the implementation, the engineer’s job does not disappear. It shifts.

The new question is not only:

“Can this agent generate the code?”

The more important question becomes:

“How do we know this code is correct, safe, maintainable, and aligned with the real business intent?”

That is where verification and validation become critical.

Implementation is becoming cheaper, but correctness is not

In traditional software engineering, a lot of effort went into writing the implementation itself. Developers translated requirements into code, handled edge cases, wrote tests, reviewed pull requests, and debugged issues.

AI agents reduce the cost of implementation. A developer can now describe a feature or bug fix and get a working-looking patch very quickly.

But “working-looking” is not the same as correct.

This is especially important in real production systems. In many engineering teams, the hard part is not just writing code. The hard part is understanding the domain, knowing the constraints, avoiding regressions, and making sure the change behaves correctly in messy real-world situations.

For example, in a backend system, an AI-generated change may compile and pass unit tests, but still break pagination, introduce a race condition, miss an authorization rule, or create unexpected load on a database.

In a fintech or risk decisioning system, the stakes are even higher. A small logic change can affect payment decisions, fraud checks, customer eligibility, compliance rules, or financial reporting. The code may look clean, but if it changes the decision behavior incorrectly, the business impact can be serious.

This is why the future bottleneck is not simply code generation.

The bottleneck is trust.

Verification asks: did we build it right?

Verification is about checking whether the system was built correctly according to the specification.

In day-to-day engineering, this usually means things like:

Do the tests cover the expected behavior?
Do the APIs return the right response for edge cases?
Do we handle invalid inputs correctly?
Do we preserve existing behavior?
Do we enforce security and authorization rules?
Does the implementation match the technical design?

With AI-generated code, verification becomes even more important because the code can be produced faster than humans can fully reason about it.

A common risk is that developers may see a polished AI-generated patch and trust it too quickly. The code may have good naming, clean structure, and even generated tests. But generated tests can also be shallow. They may only test the happy path. They may confirm what the agent implemented, not what the business actually needed.

This means engineers must become better at reviewing not only the code, but also the evidence around the code.
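To show the difference between shallow and meaningful generated tests, here is a minimal Python sketch. It is built around a hypothetical paginate() helper; the function, its 1-indexed pages, and its error behavior are assumptions made for illustration. The first test is the kind of happy-path check an agent might generate. The others encode behavior that callers actually depend on: the last partial page, a page past the end, invalid input, and every item appearing exactly once.

```python
# Hypothetical helper: return one page of items (pages are 1-indexed).
def paginate(items, page, page_size):
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return items[start:start + page_size]


# A shallow, happy-path test: it only confirms the obvious case.
def test_first_page():
    assert paginate(list(range(10)), 1, 3) == [0, 1, 2]


# Edge-case tests that pin down what callers actually rely on.
def test_last_partial_page():
    assert paginate(list(range(10)), 4, 3) == [9]


def test_page_past_end_is_empty():
    assert paginate(list(range(10)), 5, 3) == []


def test_pages_cover_every_item_exactly_once():
    items = list(range(10))
    pages = [paginate(items, p, 3) for p in range(1, 5)]
    assert [x for page in pages for x in page] == items


def test_invalid_page_is_rejected():
    try:
        paginate([1, 2, 3], 0, 2)
    except ValueError:
        return
    raise AssertionError("page 0 should be rejected")


if __name__ == "__main__":
    for test in (test_first_page, test_last_partial_page,
                 test_page_past_end_is_empty,
                 test_pages_cover_every_item_exactly_once,
                 test_invalid_page_is_rejected):
        test()
    print("all tests passed")
```

The value lies in what the tests pin down, not in how many there are. An off-by-one change to the slicing would likely still pass the happy-path test, while the edge-case tests would catch it.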

A pull request should not only answer:

“Does the code look good?”

It should answer:

“What proves that this behavior is correct?”

That proof may come from unit tests, integration tests, contract tests, property-based tests, production metrics, static analysis, formal specifications, or carefully written acceptance criteria.
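Property-based tests are one of the less familiar items on that list. Instead of checking a few hand-picked examples, they check invariants over many generated inputs. In this minimal sketch, the split_amount() function and its rounding policy are hypothetical, and the input generator is a hand-rolled loop over Python's standard random module. In a real project, a library such as Hypothesis would also shrink any failing input to a minimal example.

```python
import random


def split_amount(total_cents: int, parts: int) -> list[int]:
    """Hypothetical helper: split an amount into `parts` near-equal shares.

    The first `remainder` shares each receive one extra cent.
    """
    if parts < 1 or total_cents < 0:
        raise ValueError("parts must be >= 1 and total_cents >= 0")
    base, remainder = divmod(total_cents, parts)
    return [base + 1 if i < remainder else base for i in range(parts)]


def check_split_properties(trials: int = 1000, seed: int = 0) -> None:
    """Check invariants on many random inputs instead of a few fixed examples."""
    rng = random.Random(seed)  # seeded so any failure is reproducible
    for _ in range(trials):
        total = rng.randint(0, 10_000_000)
        parts = rng.randint(1, 50)
        shares = split_amount(total, parts)
        # Property 1: no money is created or lost.
        assert sum(shares) == total, (total, parts, shares)
        # Property 2: the caller gets exactly the number of shares requested.
        assert len(shares) == parts, (total, parts, shares)
        # Property 3: no share is negative, and shares differ by at most one cent.
        assert min(shares) >= 0, (total, parts, shares)
        assert max(shares) - min(shares) <= 1, (total, parts, shares)


if __name__ == "__main__":
    check_split_properties()
    print("properties held for all sampled inputs")
```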

Validation asks: did we build the right thing?

Validation is different. It asks whether we built the right system for the actual user or business need.

This is where AI agents can struggle.

An agent can implement a ticket exactly as written, but the ticket itself may be incomplete. The acceptance criteria may be vague. The user story may miss an important edge case. The business rule may be obvious to domain experts but not written anywhere.

In real engineering teams, this happens all the time.

A product requirement says:

“Block risky transactions.”

But what does “risky” mean?

Does it depend on amount, country, device, customer history, bank response, previous failed attempts, or compliance rules?

Should the transaction be blocked permanently or sent for manual review?

What happens if the risk service is unavailable?

Should existing trusted customers be treated differently?

An AI agent can generate an implementation, but it cannot magically know the unwritten business intent. That is where engineers must work with product managers, domain experts, compliance teams, support teams, and other engineers to clarify the real requirement.

In the agentic AI era, validation becomes a key engineering skill because humans must define what “good” means before agents can reliably build it.
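As an illustration, here is what “Block risky transactions” might look like after those questions have been answered and written down. Every element of this policy is hypothetical: the thresholds, the placeholder country code, the special handling for trusted customers, and the choice to route transactions to manual review when the risk service is unavailable. In a real system, these decisions would come from product, risk, and compliance owners. What matters is that each decision is now explicit and reviewable instead of implied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    MANUAL_REVIEW = "manual_review"
    BLOCK = "block"


@dataclass(frozen=True)
class Transaction:
    amount_cents: int
    country: str
    failed_attempts_last_hour: int
    customer_is_trusted: bool


# Hypothetical policy values. Real ones must come from product, risk, and compliance.
BLOCKED_COUNTRIES = {"XX"}  # placeholder code, not a real policy
HIGH_AMOUNT_CENTS = 500_000
MAX_FAILED_ATTEMPTS = 3
RISK_SCORE_BLOCK = 0.9
RISK_SCORE_REVIEW = 0.6


def decide(txn: Transaction, risk_score: Optional[float]) -> Decision:
    """Return a decision; risk_score is None when the risk service is unavailable."""
    # Compliance rule: applies to every customer, trusted or not.
    if txn.country in BLOCKED_COUNTRIES:
        return Decision.BLOCK
    # Hypothetical choice: if the risk service is down, route the transaction
    # to a human instead of silently approving or blocking it.
    if risk_score is None:
        return Decision.MANUAL_REVIEW
    if risk_score >= RISK_SCORE_BLOCK:
        return Decision.BLOCK
    # Borderline signals lead to review, not a permanent block.
    if risk_score >= RISK_SCORE_REVIEW or txn.failed_attempts_last_hour > MAX_FAILED_ATTEMPTS:
        return Decision.MANUAL_REVIEW
    # Trusted customers are not sent to review for a large amount alone.
    if txn.amount_cents >= HIGH_AMOUNT_CENTS and not txn.customer_is_trusted:
        return Decision.MANUAL_REVIEW
    return Decision.APPROVE
```

Once the rule is this explicit, each branch maps to a question a domain expert can confirm or reject. An agent can then implement or modify the rule against a target that is actually written down.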

Specifications need to become living artifacts

One practical change I expect is that specifications will become more important again.

In many teams, specifications exist in scattered places: Jira tickets, Slack threads, design docs, pull request comments, test cases, dashboards, and tribal knowledge. Over time, these drift apart.

AI agents can make this drift worse because they allow systems to change faster. If the code changes quickly but the intent is not captured clearly, teams accumulate cognitive debt. After a few months, nobody fully remembers why a behavior exists or what rules the system is supposed to preserve.

This is dangerous.

A better workflow would treat specifications as living artifacts.

For example:

Before implementation, engineers define the business rule clearly.
The rule is translated into acceptance criteria.
Acceptance criteria are converted into executable tests.
Important invariants are documented.
The AI agent implements the change.
Another agent or tool verifies the implementation.
The human engineer reviews both the code and the verification evidence.

This kind of workflow is not about slowing teams down. It is about making fast AI-assisted development safe.
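Here is a minimal sketch of the step where acceptance criteria become executable tests. It uses a hypothetical withdrawal rule with a made-up daily limit. Each row in the table is a criterion agreed on before implementation, so an agent-generated change to can_withdraw() is checked against the stated intent rather than against tests written after the fact.

```python
from typing import List, NamedTuple

DAILY_LIMIT_CENTS = 100_000  # hypothetical daily limit agreed with the business


def can_withdraw(balance_cents: int, amount_cents: int, withdrawn_today_cents: int) -> bool:
    """Hypothetical rule: allow a positive withdrawal that the balance covers and
    that keeps today's total at or under the daily limit."""
    return (
        amount_cents > 0
        and amount_cents <= balance_cents
        and withdrawn_today_cents + amount_cents <= DAILY_LIMIT_CENTS
    )


class Criterion(NamedTuple):
    description: str
    balance_cents: int
    amount_cents: int
    withdrawn_today_cents: int
    expected: bool


# Each row is one acceptance criterion, written before the implementation.
ACCEPTANCE_CRITERIA = [
    Criterion("ordinary withdrawal is allowed", 50_000, 10_000, 0, True),
    Criterion("cannot overdraw the account", 5_000, 10_000, 0, False),
    Criterion("reaching the limit exactly is allowed", 200_000, 40_000, 60_000, True),
    Criterion("exceeding the daily limit is refused", 200_000, 40_001, 60_000, False),
    Criterion("zero amounts are refused", 50_000, 0, 0, False),
]


def run_acceptance_criteria() -> List[str]:
    """Return descriptions of failing criteria; an empty list means all passed."""
    return [
        c.description
        for c in ACCEPTANCE_CRITERIA
        if can_withdraw(c.balance_cents, c.amount_cents, c.withdrawn_today_cents) != c.expected
    ]


if __name__ == "__main__":
    failures = run_acceptance_criteria()
    print("failing criteria:", failures or "none")
```

Keeping the criteria as plain data makes them easy to review alongside the ticket. Adding a newly discovered edge case then means adding one row rather than writing new test code.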

Code review will also change

Today, many code reviews focus heavily on implementation details.

Is the code readable?
Is the naming good?
Is the function too large?
Is the logic duplicated?
Are there obvious bugs?

These things still matter. But with AI agents, code review needs to move one level higher.

The reviewer should ask:

Does this change solve the actual problem?
Are the tests meaningful or just generated noise?
What behavior changed?
What behavior must not change?
Are there missing edge cases?
Could this create a production incident?
Is the rollback path clear?
Are logs, metrics, and alerts sufficient?

This is especially important for backend systems, where bugs may not be visible immediately. Users often notice a frontend bug quickly, but a backend bug can silently corrupt data, make wrong risk decisions, trigger duplicate jobs, or increase infrastructure cost before anyone notices.

So code review becomes less about checking every syntax detail and more about validating assumptions, risks, and evidence.
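A differential check is one concrete form of evidence for the question “What behavior must not change?” It runs the existing implementation and the changed implementation on the same inputs and lists every difference. In this sketch, both fee functions and the 2.5% rate are hypothetical. The “refactored” version looks simpler but truncates instead of rounding half up, which is exactly the kind of change that can pass a review focused on readability.

```python
from typing import Callable, Iterable, List, Tuple


def legacy_fee_cents(amount_cents: int) -> int:
    # Existing behavior: 2.5% fee, rounded half up to the nearest cent.
    return (amount_cents * 25 + 500) // 1000


def refactored_fee_cents(amount_cents: int) -> int:
    # A "cleaner" rewrite that silently truncates instead of rounding.
    return amount_cents * 25 // 1000


def find_behavior_changes(
    old: Callable[[int], int],
    new: Callable[[int], int],
    inputs: Iterable[int],
) -> List[Tuple[int, int, int]]:
    """Return (input, old_output, new_output) for every input where outputs differ."""
    changes = []
    for x in inputs:
        before, after = old(x), new(x)
        if before != after:
            changes.append((x, before, after))
    return changes


if __name__ == "__main__":
    diffs = find_behavior_changes(legacy_fee_cents, refactored_fee_cents, range(0, 1001))
    print(f"{len(diffs)} inputs changed behavior; first few: {diffs[:3]}")
```

In practice, the inputs could be sampled from realistic or recorded requests rather than a range of integers. Every reported difference then needs an explicit decision: intended change or regression.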

Production incidents will teach us where AI was trusted too much

I also think many teams will learn this lesson through production incidents.

An AI-generated migration may pass local tests but fail on real data volume.
A generated retry mechanism may accidentally multiply traffic during an outage.
A generated cache change may serve stale data.
A generated authorization fix may protect one endpoint but miss another.
A generated risk-rule change may work for normal cases but fail for edge cases.

None of these are purely “AI problems.” Human engineers make the same mistakes. But AI increases the speed and volume of changes, which makes weak verification practices more dangerous.

If teams use agents to ship faster without improving validation, they may simply ship bugs faster.
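The retry example from the list above is worth making concrete, because retries compound. If three layers each make up to three attempts, a single user request can reach the bottom dependency up to 3 × 3 × 3 = 27 times during an outage. The sketch below shows one common mitigation: capped exponential backoff with full jitter and a hard limit on attempts. The parameter values and the TransientError marker are assumptions. A production system would usually pair this with a retry budget or a circuit breaker, neither of which is shown here.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry (e.g. timeouts)."""


def backoff_delays(max_attempts: int, base_s: float, cap_s: float,
                   rng: random.Random) -> List[float]:
    """Delay before each retry: uniform in [0, min(cap_s, base_s * 2**n)] ("full jitter")."""
    return [rng.uniform(0, min(cap_s, base_s * 2 ** n)) for n in range(max_attempts - 1)]


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_s: float = 0.1,
    cap_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn, retrying only TransientError, with at most max_attempts calls in total.

    Any other exception propagates immediately without a retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    delays = backoff_delays(max_attempts, base_s, cap_s,
                            rng if rng is not None else random.Random())
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # give up instead of adding more load to a struggling dependency
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Making sleep and rng injectable keeps the retry logic deterministic in tests. As a result, the attempt limit itself can be verified rather than assumed, which is the kind of evidence a reviewer should ask for before trusting a generated retry mechanism.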

The engineer becomes the trust architect

This is why I like the idea that future software engineers will move from being primarily code writers to being trust architects.

That does not mean engineers will stop coding. Coding skill will still matter. But the higher-value skill will be the ability to define correctness, design verification workflows, and judge whether the generated solution is safe enough to ship.

The engineer will need to know how to say:

This is the business intent.
These are the edge cases.
These are the invariants.
These are the failure modes.
These are the tests that prove correctness.
These are the metrics that will tell us if production behavior changes.
This is where human review is required.
This is where automation is enough.

That is a very different role from simply accepting whatever an AI agent generates.

What teams should start doing now

Teams that want to use AI agents responsibly should invest in verification and validation practices early.

Some practical steps:

Write better acceptance criteria before asking agents to implement changes.
Review generated tests as carefully as generated code.
Use integration and contract tests for important service boundaries.
Capture business rules as executable checks where possible.
Track production behavior with meaningful metrics and alerts.
Require stronger review for high-risk areas like payments, permissions, compliance, and data migrations.
Keep design decisions and assumptions documented.
Treat AI output as a proposal, not as truth.

The last point is important. AI agents can be very useful, but they are not accountable for production systems. Engineers and organizations are.

Conclusion

The rise of AI coding agents does not remove the need for software engineering discipline. It increases the need for it.

As implementation becomes easier, verification and validation become more valuable. The ability to write code will still matter, but the ability to prove that the code is correct, safe, and aligned with user intent will matter even more.

That is why the paper’s point resonated with me so strongly.

AI agents may write more of the code.

But engineers still own the correctness.

Engineers still own the judgment.

Engineers still own the trust.