Day 2 of SlashNew

Date: May 28, 2026

TL;DR? Read the highlights & takeaways.

Humans in the loop: why the future of AI still belongs to us

Jovana Dunisijevic opened SlashNew day two with something deliberately unfashionable: a case for optimism. As a senior technical evangelist at Atlassian, she’s spent her career building systems designed to scale. Now, she said, the more interesting question is how humans continue to scale alongside the intelligent systems they’re building.

She also opened with a complaint about the phrase “humans in the loop.”

“We built the loop. We designed it. We ran it alone for years, decades even. So calling it humans in the loop feels like someone opened the door and said, come in. We never left the room. And it’s a room where we are the hosts.”

The 84% nobody talks about

The most cited number in the talk came from Atlassian’s own research: writing code accounts for roughly 16% of the software development lifecycle. That figure is a year old, but Jovana’s read is that it’s probably still in the right range, possibly lower now.

The question she posed is why organisations keep the conversation almost entirely on that 16%. Every vendor has an AI coding assistant. Every IDE ships one. Writing code faster is, as she put it, table stakes. It is not a strategy.

The remaining 84% is where the leverage actually is: clarifying ambiguous requirements before a line of code is written, surfacing risks early, navigating architecture trade-offs, managing governance and compliance, backlog triage, security reviews, documentation, stakeholder updates…
All the work that keeps engineers out of flow. All the work nobody is asking AI to touch.

“I honestly don’t believe coding was ever the bottleneck. I think it was everything else.”

Organisations that stopped at the coding assistant, she argued, have quietly chosen a model where humans exist to correct machines, rather than one where AI sharpens human judgment. Most of them haven’t noticed they’ve made that choice.

The workflow has already flipped

Something else most teams haven’t noticed: the order of their work has reversed.

The old sequence was discuss, then build. Teams aligned on requirements, wrote RFCs, debated approaches, and then started building. The new sequence, spreading quietly across engineering teams, is build first, socialise later. The prototype becomes the artifact that drives the decision. The discussion happens after something exists.

Jovana is not entirely negative about this. Ideas that used to die in a backlog now get implemented in an afternoon. Teams can evaluate with higher fidelity earlier. Faster feedback loops produce better decisions. Cross-functional empowerment is real: product managers are prototyping, designers are pushing pull requests.

“We’re automating implementation faster than we’re improving alignment. The gap is widening.”

But each of those wins carries a hidden cost that shows up later. The alignment work doesn’t disappear. It just moves to after the fact, where it becomes churn and rework. The “should we even build this” conversation gets skipped entirely. Decisions that used to require several people in a room now get made by an individual contributor orchestrating agents alone.

“Coding and engineering are not the same thing. Coding is just writing code. Engineering is thinking, problem solving, assessment, analysis, navigating ambiguity, weighing consequences. That was always the job. Now it becomes the entire job.”

What AI cannot solve

Jovana named four categories where AI consistently fails and human judgment remains irreplaceable: context gaps, ethical trade-offs, organisational politics, and product judgment.

Legacy knowledge is the hardest one to transfer. The understanding of why a system was built the way it was, what was tried before and failed, what a specific dependency actually does, why a particular architectural choice was made a decade ago: none of that lives in a model. It lives in the people who built and maintained the system. At Atlassian, she noted, the platform is complex enough that the context only exists in the engineers who built it. AI cannot help there.

The other compounding problem is cost. Every query has a price. Every agent run burns tokens. Infrastructure costs are rising, not falling, and organisations are consistently underestimating how quickly AI-assisted development can drain a budget when engineers stop tracking what they’re spinning up.

“Before we put AI to work, we have to make sure that what we’re building is actually what we should be building. If we’re not following that one rule of thumb, there’s more rework, which is engineering pain, but also more money out the door.”

The new bottleneck is alignment

Several specific things are breaking in real teams, in Jovana’s observation:

Code quality is quietly degrading. AI produces code that works but doesn’t necessarily belong. Low-quality patterns accumulate and codebases become harder to extend.
Code review doesn’t scale. When output per engineer doubles or triples, line-by-line review becomes painful and ineffective. Teams need different review practices but are not investing in building them.
Context is being lost at organisational scale. Work that used to take two weeks now takes an hour. Nobody files a ticket. Nobody writes the documentation. The why behind decisions disappears entirely and lives only in the person who did the work.
Non-engineering functions become the new constraint. When engineering accelerates, product decisions, prioritisation, and review capacity become the bottlenecks because they cannot keep pace.

“AI doesn’t eliminate complexity. It amplifies it. The more autonomous our systems become, the more critical human oversight gets, not as control, but as judgment.”

Intelligence is not the same as wisdom

The talk’s most direct argument was about where investment is going wrong at an industry level.

Organisations are pouring resources into larger models, stronger predictions, and more autonomous agents. Machine cognition is being optimised at speed. Human cognition is being assumed to keep up on its own.

Jovana’s position is that this assumption is wrong.

Intelligence identifies patterns. Wisdom chooses direction.
Intelligence accelerates answers. Wisdom makes sense of the question.
Intelligence optimises movement. Wisdom decides whether the destination is worth reaching.
As intelligence becomes more abundant, wisdom becomes more scarce and more valuable.
Organisations are investing almost entirely in the former.

“The organisations that win the next decade won’t be the ones with the smartest machines. They’ll be the ones with the wisest humans leading them.”

What to actually do

Five practices Jovana proposed for building an accountability layer that can keep up with the pace of AI development:

Redesign alignment as a first-class phase, not an optional one. Prototyping may be cheap, but shipping still requires alignment. The definition of ready needs to include who owns it, why it exists, what the customer impact is, and what the rollout risks are.
Shift code review from diff review to intent review. Stop asking only whether the code is correct and start asking whether it’s the right problem to solve and the right way to solve it.
Make record-keeping automatic, because people have stopped documenting. Who did what, when, why, under what constraints: that context is disappearing from organisations in real time.
Raise the bar on testing and safety. Volume is up, quality pressure is up, attack surface is up.
Adapt roles when a function becomes a bottleneck. If product decision-making can’t keep pace with engineering output, enable others to do that work with the right templates and guardrails.
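
The definition of ready in the first practice can be turned into a simple gate. What follows is a minimal sketch, not something shown in the talk: the class name `ReadinessRecord`, its field names, and the helper functions are all hypothetical, chosen only to mirror the four questions listed above.

```python
from dataclasses import dataclass, fields


@dataclass
class ReadinessRecord:
    """Hypothetical record of the four things a work item needs before shipping."""
    owner: str = ""            # who owns it
    rationale: str = ""        # why it exists
    customer_impact: str = ""  # what the customer impact is
    rollout_risks: str = ""    # what the rollout risks are


def missing_fields(record: ReadinessRecord) -> list[str]:
    """Return the names of fields that are still empty or whitespace-only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


def is_ready(record: ReadinessRecord) -> bool:
    """A work item is ready only when every field has been filled in."""
    return not missing_fields(record)


draft = ReadinessRecord(owner="payments team", rationale="reduce checkout drop-off")
print(missing_fields(draft))  # lists the questions nobody has answered yet
```

The point of a check like this is not the code itself. It makes the skipped conversation visible: a prototype can exist in an afternoon, but it does not ship until someone has written down the answers.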

On day-to-day AI use, she recommended pairing closely rather than leaving agents to run unsupervised. Watching what an agent is doing and correcting it as it diverges is more efficient than letting it run for 30 minutes and discovering the output needs to be rebuilt. “If your code is solid, AI will follow those patterns. If it’s already messy, AI will struggle with it.”

To everyone feeling behind

The close was addressed directly to different groups in the room.

To engineers feeling left behind: adapting takes courage. The people who are comfortable staying put are not the ones who will stay relevant.

To junior engineers worried about AI taking graduate roles: curiosity is the edge. Fresh perspectives unconstrained by how things have always been done are still something organisations specifically want. Anthropic, she noted, has said AI is writing around 90% of its code, and they are still hiring and opening offices.

To senior engineers unsure what to learn next: the systems knowledge, the pattern recognition from watching teams make mistakes, the context about why things were built the way they were, the ability to nurse legacy infrastructure back to health. That is what AI cannot replicate. The job is not to compete with AI. It is to direct it.

“Stay adaptive. The people who are adaptive are going to last a long time.”

AI will find the most efficient path. Your judgment as a human is what ensures it’s also the right one.

Talk delivered at SlashNew, Newcastle. Speaker: Jovana Dunisijevic, Senior Technical Evangelist at Atlassian.

Leading successful remote teams: what ten years actually taught us

Callum Whyte opened with a scene most people in the room recognised. Alarm at six. Kid crying. On-call alert from the night before. Traffic into the office for a nine o’clock standup, just to say “no updates from me.” Nine hours in an open plan room that’s hard to focus in. A management meeting about why it’s important to be in the office every day for team culture.

He put up a photo of a water cooler. “This is the only picture of company culture I can find on the internet. Who’s got a water cooler at home?”

His point: the only thing meaningfully different about an office is a water cooler, and most things companies credit for office culture exist anywhere.

Callum is a .NET developer who runs a consultancy called Bump. He’s been leading remote teams for over a decade, including several years before starting his own company. The talk was built on that history, including the parts that didn’t work.

The burnout that started it

Before founding Bump, Callum spent three or four years at a company in London he described as a dream role when he joined. Globally respected in its space, committed to open source, building large-scale software for medical startups and membership platforms. He moved cities for it.

It became a hamster wheel. Two teams, one in London and one in Poland, with genuinely different work cultures: different start times, different attitudes to community involvement, different engineering backgrounds. Nobody resolved those differences. Webcams pointed at both offices were the solution someone came up with. Engineers started using the feed to monitor whether people had read their Slack messages.

He found out how bad things had got at a family gathering, when his mother told him he “looked grey, almost dead”.

He quit on December 20, 2017. By December 22, he had his laptop open working on open source code for the first time in years. “The weight of leaving that job was an instant relief.”

Four pillars, built from scratch

Starting Bump meant designing a company culture from nothing. Callum built it around four commitments.

The first was open source and community first. This is not a policy that survives without structure. If you don’t plan and schedule open source contribution the same way you plan client work, it gets dropped under delivery pressure. Bump runs it as a live project in their management tool, with proper scheduling and time allocation. Around 80% of their work now comes through reputation built in open source communities, which makes the investment self-sustaining.

The second was technical quality. Callum’s position from the start: he would tell clients when something was not going to be delivered on time because getting it right mattered more than hitting a date. Not every client accepts that framing, but the ones who do tend to stay. The alternative, he argued, is the pizza party hackathon model where people who never worked on a product get pulled in three weeks before a deadline and generate technical debt at speed.

The third was genuine remote flexibility. Not “hybrid with three office days,” not “flexible with core hours.” Work from anywhere, no timesheets, no standups, no mandatory overtime. The whole operation built around async-first communication. People are given a budget to set up wherever they work best, whether that’s a home office or a co-working space.

The fourth was flat structure. No layers of hierarchy, limited meetings, a hard cap of 25 minutes on any meeting that does happen, with a shared agenda required in advance.

What didn’t work

The talk was notably direct about the things Callum got wrong.

Full async with no regular touchpoint broke down. Without a shared moment, individual contributors stay aligned on their own projects but the team loses collective direction. Nobody knows what anyone else is working on. Work becomes siloed.

Unlimited leave, without guidance, produced the opposite of what he intended. People took less leave in the first year, not more, because they weren’t sure what was acceptable. The current policy sets a legal minimum of four weeks, recommends seven, and requires a manager conversation for anything above that. Average annual leave now runs eight or nine weeks, planned well in advance.

All-hard-problems-all-the-time burned people out for different reasons. A senior engineer eventually asked for some easy work, and Callum realised he’d built a culture that only valued deep technical challenges. People need variety. Refactoring, internal housekeeping, and maintenance work have value for wellbeing even if they’re not intellectually demanding.

One meeting a week

The current structure includes exactly one mandatory meeting per week: 45 minutes on Monday morning. Thirty of those minutes are explicitly unstructured. Whatever anyone wants to talk about. Weekend, weather, memes, whatever. The last fifteen minutes cover team priorities and anything that needs attention.

Callum ends the social segment and moves things along. He described this as being the bad guy, but noted that the thirty minutes of informal connection is what makes the fifteen minutes of alignment actually land.

The meeting sits at 9am UK time, which requires the Australian team to finish slightly later on Mondays. That’s the compromise that makes a globally distributed weekly touchpoint possible.

Slack as infrastructure

Callum’s team uses Slack as their single source of truth. The naming convention is strict and predictable: internal-something for internal projects, project-something for client project discussions, client-something for channels that include external people.

The point of the convention is that someone covering for an absent colleague should be able to navigate the whole organisation without asking anyone. Every project channel has pinned links at the top: GitHub repo, project management tool, functional specifications, environment URLs.
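
A convention this predictable can be checked automatically. The following is a minimal sketch under stated assumptions: the talk gave only the three prefixes, so the exact format (lowercase words separated by hyphens), the function name `classify_channel`, and the purpose labels are illustrative rather than Bump's actual tooling.

```python
import re
from typing import Optional

# The three prefixes come from the talk; the labels paraphrase their purpose.
PURPOSES = {
    "internal": "internal project",
    "project": "client project discussion",
    "client": "channel shared with external people",
}

# Assumed format: <prefix>-<name>, using lowercase letters, digits and hyphens.
CHANNEL_PATTERN = re.compile(r"^(internal|project|client)-[a-z0-9]+(?:-[a-z0-9]+)*$")


def classify_channel(name: str) -> Optional[str]:
    """Return the channel's purpose, or None if the name breaks the convention."""
    match = CHANNEL_PATTERN.match(name)
    if match is None:
        return None
    return PURPOSES[match.group(1)]


for channel in ["project-acme-website", "client-acme", "random"]:
    print(channel, "->", classify_channel(channel))
```

A check like this could run whenever a channel is created, so the convention does not depend on everyone remembering it.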

Every project also has a named lead developer whose responsibility is to own the technical requirements, not to do all the work. The lead is actively discouraged from doing the work. The role exists to spread knowledge and give people a clear person to direct questions to, rather than leaving new team members to figure out who knows what.

Callum mentioned that Slack’s built-in working hours feature lets people set when notifications are active. The company policy is that anyone can send a message at any time. Respecting working hours is the individual’s responsibility through their own notification settings. That removes the friction of wondering whether a given message is appropriate to send right now.

Onboarding and mentorship

Onboarding runs across three half-days, not one full day, specifically to avoid information overload.

Day one is introductions and setup.
Day two is technical architecture and code patterns, finishing at lunch.
Day three is no work, and involves getting the new person together with the team in person over lunch or drinks.

“You’ve got to be working closely with someone. You need to make sure you have a bit of a bond. So come the first day you actually start doing proper work, you’ve broken the ice.”

Callum dropped one-to-ones early on as unnecessary ceremony, then reinstated them. His original view was that the team talked constantly about work. What was missing was the personal check-in: are you enjoying what you’re doing, is there anything you need to learn, is there anything that’s not working. Monthly half-hour conversations with a manager or a peer now happen across the team.

An 18-month saga involving a junior developer with a Dell XPS and throttled firmware made a related point: physical barriers to remote collaboration set back junior engineers disproportionately. When someone junior needs more guidance and the connection quality makes screen sharing impossible, the cost compounds quietly over months before anyone identifies the cause.

Your job as a manager is to create the ecosystem where good people can do their best work. You hired them for a reason. They can look after themselves if you get the conditions right.

AI and standards

Callum’s team has moved away from enforcing code standards purely through static analysis and Git hooks. His observation: junior engineers have replaced Stack Overflow with Claude. They copy from it the same way they used to copy from Stack Overflow.

The approach now is to make the standards readable by both people and AI. README files at the root of every repository explain the architectural decisions and the reasons behind them. Claude reads the README the same way a developer does, which means the patterns get followed whether a person or an agent is doing the work.

For team-wide standards, the process is deliberately collaborative rather than dictated by a senior engineer. People get on a call, discuss what good practice looks like, and an AI note-taker captures the conversation. The transcript goes into Claude with any existing standards documentation. Claude distills the result into a structured standards file, which gets updated over time.

Code review rules are similarly explicit: no more than three comments per pull request. More than that means either the pull request was too large or the feedback warrants a call. Verbal discussion is better than written comment threads for anything substantive, and calling someone is better still for junior developers who might read written feedback more harshly than intended.
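
The three-comment rule is simple enough to express as a decision function. This is a sketch only: the function name `review_next_step` and the returned wording are hypothetical, and the talk did not describe any automation around the rule.

```python
MAX_WRITTEN_COMMENTS = 3  # the cap described in the talk


def review_next_step(comment_count: int) -> str:
    """Suggest how to deliver feedback, given how many comments a reviewer has drafted."""
    if comment_count < 0:
        raise ValueError("comment_count cannot be negative")
    if comment_count <= MAX_WRITTEN_COMMENTS:
        return "leave written comments"
    # Over the cap, the talk offers two explanations: the pull request
    # was too large, or the feedback warrants a conversation.
    return "split the pull request or move the feedback to a call"


print(review_next_step(2))
print(review_next_step(5))
```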

On burnout and who remote work suits

Callum was direct that remote work is not suitable for everyone and that the surveys disagreeing with his model are real.

A 2022 study found that 62% of tech workers were burned out. People working remotely report higher rates of loneliness, exhaustion, and irritability. That doesn’t invalidate remote work, but it means managers need to watch for it rather than assume flexibility equals wellbeing.

The team moved from Trello to ClickUp partly because some team members with ADHD found Trello’s undifferentiated lists difficult to work with. ClickUp lets people view only their own backlog, see clear deadlines, and track progress in a way that feels like forward movement rather than a wall of undone work.

Remote work requires self-direction in a way that office work doesn’t. Callum described a recent candidate interview where a front-end engineer, knowing Bump was a .NET house, showed up having already read a .NET book. That curiosity was an immediate signal.

“Someone that’s not necessarily self-motivated to go and learn is going to struggle more in a remote team. And that’s okay. You just need to work that out ahead of time and make sure they find a company that is actually a better fit for them.”

Talk delivered at SlashNew, Newcastle. Speaker: Callum Whyte, founder of Bump.

Trustworthy AI: keeping humans at the heart of intelligent systems

Michelle Sandford from Microsoft opened with a concession. She acknowledged that several speakers at the conference had noted something about advocates for AI: they get paid to be optimistic. Michelle works in developer engagement at Microsoft, so the conflict of interest is real. She named it directly, then made her case anyway.

Her talk was not a product pitch. It was a sustained argument for a single idea: trust is the foundation everything else is built on, AI changes what trust requires, and engineers who understand this are not replaceable.

You own the code

The phrase Michelle returned to throughout the talk was: you own the code.

When a coding agent makes a change in GitHub Copilot, the commit record shows two things: the change was made by a coding agent, and your name is on it. That is not a quirk of the UI. It is the accountability structure. If it goes to production and something breaks, your name is in the audit log.

The consequence is direct: you do not approve a pull request you do not understand. The agent will write the pull request for you, with deep links to all the changed code, with a full summary of what was done and why. If you read that and still do not understand something, ask the agent to explain it better. Ask it to explain it like you are five. There is no longer any excuse for approving a change you cannot account for.

Michelle contrasted this with change advisory board processes she ran at IBM years earlier. Engineers would submit a change request and leave the call. If the reviewers did not understand it, they would push it through anyway under time pressure, with assurances that the engineers said it was fine. That world no longer exists. The information is in the pull request. The agent will explain anything you ask. Approving without understanding is now a choice, not a constraint.

“If you just click approve and it goes to production and something goes wrong, that’s your fault. That’s your name. You are the one who has cost your company millions of dollars.”

What makes you irreplaceable

Michelle posed the replacement question directly. If AI is doing more and more of the technical work, and someone only needs to click approve, why does that job require an experienced developer at all?

It does not. Her position was unambiguous: if your contribution to a codebase is clicking yes without reading what the agent has done, you are replaceable by anyone who can click a button.

What makes you irreplaceable is the accumulation of knowledge you bring to the review. Your understanding of the system’s history, why certain decisions were made, what was tried before, where the dependencies are fragile, what the business actually needs versus what the ticket says. That judgment cannot be generated. It has to be brought.

She told a story from a conference last year. She had made eight attempts to get a demo working, then asked a junior colleague to look at it. The junior, Jia, spotted a capital letter in a file path. Michelle and a senior colleague of the same vintage had both missed it because they were applying the same learned troubleshooting patterns. Jia saw it because she was not filtering through years of assumptions.

Early career people bring something different. Senior people bring something different. Neither of those things is replaceable by pattern completion.

The three waves

Michelle described three stages of how AI has been used in software development.

The first: write the code you were going to write anyway, just faster. Boilerplate, autocomplete, filling in the obvious parts.
The second: delegate specific tasks to the agent while you work alongside it. Write this function. Write these tests. Fix this bug. Refactor this. You are directing, it is executing, and you are watching.
The third, where teams are now: asynchronous agentic development, where you are the architect assigning multiple agents to simultaneous tasks.

The risk in wave three is that the speed creates pressure to skip the review. Approve, go, push, repeat. Michelle’s point was that the accountability requirement does not diminish as the pace increases. It becomes more important, because more is happening faster, and more can go wrong faster.

“Those of you that stay awake, that stay human in the loop, are the ones that will survive and thrive in this new world.”

Human in the loop has more than one meaning

Michelle distinguished between four governance positions:

Human before the loop: a person writes the instruction; the agent executes without further oversight.
Human in the loop: a person watches the agent work and approves actions as they happen.
Human on the loop: a person monitors an agent working independently and intervenes when needed.
Human after the loop: audit happens after the fact.

Which position is appropriate depends on the risk level of what is being automated. Auditing after the fact is only acceptable for things that are genuinely low risk and low consequence. For anything else, a human needs to be present in the process before a change reaches production.
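
The rule in the paragraph above can be written down as a small policy function. This is a sketch under assumptions, not Michelle's or Microsoft's framework: the names `Oversight` and `acceptable_positions` are hypothetical, and treating "in the loop" and "on the loop" as the positions where a human is present before production is one reading of her description.

```python
from enum import Enum


class Oversight(Enum):
    BEFORE_THE_LOOP = "a person writes the instruction; the agent runs without further oversight"
    IN_THE_LOOP = "a person watches the agent work and approves actions as they happen"
    ON_THE_LOOP = "a person monitors an independent agent and intervenes when needed"
    AFTER_THE_LOOP = "audit happens after the fact"


def acceptable_positions(low_risk: bool, low_consequence: bool) -> set[Oversight]:
    """Return the oversight positions this sketch treats as acceptable."""
    # Assumption: these two positions keep a human present in the process
    # before a change reaches production.
    human_present = {Oversight.IN_THE_LOOP, Oversight.ON_THE_LOOP}
    if low_risk and low_consequence:
        # Only genuinely low-risk, low-consequence work may rely on
        # instruction-only or audit-after-the-fact oversight.
        return human_present | {Oversight.BEFORE_THE_LOOP, Oversight.AFTER_THE_LOOP}
    return human_present


for position in sorted(acceptable_positions(True, False), key=lambda p: p.name):
    print(position.name)
```

Both conditions have to hold before lighter oversight is allowed. Work that is low risk but high consequence still keeps a human in the process.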

She flagged watching agents think as an underrated practice. When an agent is working through a problem, its reasoning is visible before the final pull request summarises it. The final summary is professional and clean. The working notes often show something more useful: what it noticed, what it was unsure about, what it decided to do about something unexpected. “It sounds like the snarkiest dev you have in your team.” That commentary disappears from the formal output. If you are watching, you have it.

For a new project where the instructions and context are not yet solid, she recommended watching each agent individually rather than running a fleet in parallel. Establish that the instructions work before assigning multiple agents to run unsupervised.

The six principles and the one people skip

Michelle briefly covered Microsoft’s six responsible AI principles: fairness, transparency, accountability, reliability and safety, privacy and security, and inclusiveness.

The last one gets treated as an afterthought. It should not be.

She used a historical example: fighter planes designed for the average man during the Second World War. Nobody fitted them. Too short to reach some controls, too tall for others. Seatbelts designed for the average male body killed women and children until the design was changed.

Designing for the average user means designing for nobody. Edge cases are not exceptions to plan for later. They are the test of whether your system actually works.

“When you are building your stuff, try to think about designing it for your edge cases, because everyone is a potential customer, everyone is a potential user.”

Accessibility features built in from the start tend to benefit everyone. Subtitles were designed for deaf users and are now used widely by people who are not. Audio description, voice control, and adaptive interfaces all follow the same pattern. Designing for the most constrained user often produces a better product for everyone.

The race to the bottom is not a race anyone can win

A colleague told Michelle at a recent event that the obvious play for smaller development companies was to cut teams from five developers to two, using AI to cover the difference. Cheaper, faster, same output.

She pushed back.

A company that uses AI to do yesterday’s work cheaper and quicker will not exist in two years. The opportunity is not cost reduction. It is doing things that were previously impossible. Every development team has a backlog of projects that never got resourced. AI does not reduce the team needed to work through that backlog; it makes it possible to work through it. The question is not how to do the same thing for less. It is what becomes possible now that was not before.

“Enable your developers to do more now that they can do more. The race to the bottom is not a race anyone can win.”

The test before you approve

Michelle closed with a single question she uses as a personal standard before she approves anything going to production: would I trust this with my own family’s data?

If the answer is no, the principles have not been applied properly. If your organisation does not give you the ability to say no, that is the conversation to have right now, before anything else moves forward.

She noted that governance, compliance, and security work are not as tedious as they used to be. Tools assist with all of it. But “don’t check the boxes while sleeping.” The assistance exists to make it easier to do the work properly, not to do it instead of you.

If you are that person that is just clicking yes, yes, do, without reading anything, you are replaceable by a couple of kids off the street.

Talk delivered at SlashNew, Newcastle. Speaker: Michelle Sandford, Developer Engagement Lead at Microsoft.

Paving the road for the robots: why platform engineering is the prerequisite for AI delivery

Nick Williams, a principal consultant and technology principal at Equal Experts, moves between organisations of different sizes, industries, and levels of readiness. The talk he brought to SlashNew was built from that breadth of observation: a detailed account of what is going wrong with AI-assisted software delivery, why it is going wrong, and what the actual fix is.

The thesis is not complicated, but the industry is consistently avoiding it. You cannot let AI drive on a road that does not exist.

The 500-day problem

A common LinkedIn story goes like this: we built a full MVP in a day. Used to take six months. Incredible. Nick spent about two years disagreeing with those stories, assuming they were exaggerated. He now thinks many of them were real.

What he almost never sees is the day-500 version of the same story. How is that MVP doing now? Is it still delivering value? What does the codebase look like after 18 months of AI-assisted development?

His observation from the organisations he works with directly: the ones with genuine 500-day success stories already had the infrastructure before they started. They already had standards, guardrails, and frameworks in place. Delivering quickly and well with AI was not a surprise to them because it was not new to them. They do not write LinkedIn posts about it because it is not a novelty.

The organisations producing the more visible success stories at day one are frequently the ones accumulating the most invisible problems by day 500.

What platform debt looks like when AI makes it visible

Nick walked through a real case, anonymised but specific. A mid-sized company, four or five engineering teams, gave everyone a Cursor licence and felt the acceleration immediately. Everything felt fast. Then the problems started appearing.

Fifty components in the stack, each built slightly differently. Four different logging frameworks. Different config loading patterns. Different health check endpoint conventions. Individually, any one of those components looks fine. Across the whole estate, the divergence creates real operational problems.

The health check example was concrete. The organisation had a long-standing convention, documented somewhere in Confluence, that health check endpoints should be at a specific path and that logs, traces, and metrics from those endpoints should be dropped to keep storage costs manageable. That convention held for years because developers would copy existing repos when adding health checks to new services, picking up the pattern incidentally.

AI broke that copy behaviour. When a developer now asks an agent why a Kubernetes deployment is failing, the agent correctly identifies the missing health check endpoint and adds one, on a different path, without inheriting the logging suppression rule. The service works. The logs start appearing. Storage costs climb. Latency measurements get contaminated because a health check polling every second with a one-millisecond response distorts the baseline, masking a real four-second user experience problem elsewhere.
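
The latency distortion is easy to reproduce with arithmetic. The sketch below uses the two figures from the talk (a health check every second at one millisecond, and a four-second user experience) plus one hypothetical number: six real user requests in the same minute. The function name `latency_samples` is also illustrative.

```python
from statistics import mean, median

HEALTH_CHECK_MS = 1.0     # one-millisecond health check response (from the talk)
USER_REQUEST_MS = 4000.0  # four-second user-facing request (from the talk)


def latency_samples(health_checks: int, user_requests: int) -> list[float]:
    """Build the latency samples a dashboard would see over one window."""
    return [HEALTH_CHECK_MS] * health_checks + [USER_REQUEST_MS] * user_requests


# One minute of traffic: 60 health checks (one per second) and,
# as an assumption for illustration, 6 real user requests.
mixed = latency_samples(health_checks=60, user_requests=6)
users_only = latency_samples(health_checks=0, user_requests=6)

print(f"median with health checks:    {median(mixed):.0f} ms")
print(f"median without health checks: {median(users_only):.0f} ms")
print(f"mean with health checks:      {mean(mixed):.0f} ms")
print(f"mean without health checks:   {mean(users_only):.0f} ms")
```

With the health checks included, the median collapses to one millisecond and the mean drops to roughly a tenth of the real figure, so a dashboard built on either number reports a healthy service while every real user waits four seconds.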

“None of that is AI’s fault. But it’s allowed a change in behaviour from institutional knowledge and people talking to one another to a world where people are talking slightly less and copying slightly less. Every problem gets a novel solution now.”

What AI cannot see

The root cause Nick identified is that AI has no access to invisible effort.

Most organisations operate as a platform built out of people. There is a person who knows how the infrastructure runs. There is a person who knows why that configuration decision was made. There is a person on Slack who replies first when someone has a question. That knowledge is institutional and informal, and it works until it does not.

AI cannot access any of it. For an agent to work effectively, it needs three categories of information: what to do, what the rules are, and what already exists. In most organisations, the rules and the map of what exists live in people’s heads, with some poorly maintained documentation as backup.

Nick grouped organisations broadly into three states. The first is genuine chaos: no real standards, every component its own snowflake, which is fine for individual components but creates compounding problems across a large estate and makes incident response extremely difficult (he cited the Log4Shell crisis as a concrete example of trying to find undocumented systems at speed). The second is convention: someone wrote down how things should be done, there are five versions of that document in Confluence, broadly consistent but frequently outdated and unevenly applied. The third is a proper digital platform: everything is encoded, the right abstractions are in place, and compliance is automated rather than hoped for.

Most organisations working with AI are in the first or second of those states.

Why soft guardrails do not scale

The common response to the context problem is to add more context: a skills file for infrastructure conventions, a skills file for security posture, another for information governance, another for API design standards. Nick showed a real screenshot of the agent and skills setup one team had assembled, with personas for a backend architect, a compliance officer, a front-end architect, a performance engineer, and more.

He does not dismiss this approach entirely. For a single component, in a controlled test, it works. The problems emerge at organisational scale and over time.

First, it is expensive. Tokens are consumed rediscovering solved problems on every run. Each team ends up maintaining its own slightly different version of the same skills. When policies change, updating and distributing a library of skill files across an organisation is real engineering work.

Second, it produces variance. Nick ran a controlled experiment comparing three conditions: no context at all, a well-structured set of skills and agents, and a platform SDK. The skills-and-agents setup produced massive variation between runs given the identical prompt on the same day. An LLM reading a skills file and inferring what to produce will produce something different each time, depending on temperature, model version, and random variation in the probability space. A platform SDK that deterministically generates code has no such variation.

Third, it is exclusionary. Skill files and agent configuration live in repositories. In most organisations, that means only developers have access. The same standards information that used to live, however badly, in Confluence, accessible to solution architects and business analysts and support staff, is now locked in markdown files that only engineers see.

“Guardrails you can only lean on at the point of deployment, when the code has already been written and the agent already made its choices, are late. If the first time I find out I cannot do something that way is when the build pipeline blocks me, I have to go back and redo the work.”

What a hard platform actually does

Nick showed a domain-specific language built as a platform SDK for a client engagement. The example was deliberately simple: an HTTP API definition in a few lines of code, with a single endpoint handler. That is all the engineer writes.

What they do not see, but what the platform provides automatically: default security controls applied uniformly, JWT header validation, subnet restriction, default rate limiting. No way to accidentally omit them. No way to choose different defaults without an explicit and documented override. A Terraform deployment module is generated without the engineer needing to know the module exists or how to write Terraform. A build pipeline is configured. A Backstage catalogue entry is generated and updated. A run book scaffold is created. Log IDs are enforced by the type system, which means log searches for support staff can be generated automatically.
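
The talk did not publish the SDK itself, so the following is only a minimal Python sketch of the idea, not the client's DSL. Every name in it (`HttpApi`, `SecurityControls`, `Override`, `LogId`, the subnet and rate limit values) is hypothetical. It shows defaults that cannot be silently omitted, an override that must carry a documented reason, and a deterministic manifest that generators for deployment, pipeline, and catalogue entries could consume.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Optional


class LogId(Enum):
    """Hypothetical registry: log lines must use one of these IDs."""
    ORDER_RECEIVED = "ORD-001"
    ORDER_REJECTED = "ORD-002"


@dataclass(frozen=True)
class SecurityControls:
    # Assumed platform defaults, applied to every service.
    require_jwt: bool = True
    allowed_subnets: tuple = ("10.0.0.0/8",)
    rate_limit_per_minute: int = 600


@dataclass(frozen=True)
class Override:
    controls: SecurityControls
    reason: str

    def __post_init__(self):
        # An override without a documented reason is rejected outright.
        if not self.reason.strip():
            raise ValueError("security overrides must document a reason")


@dataclass
class HttpApi:
    name: str
    override: Optional[Override] = None
    routes: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)

    @property
    def security(self) -> SecurityControls:
        return self.override.controls if self.override else SecurityControls()

    def get(self, path: str):
        """Decorator that registers a handler for GET requests on `path`."""
        def register(handler):
            self.routes[f"GET {path}"] = handler
            return handler
        return register

    def manifest(self) -> dict:
        """Deterministic description that downstream generators could read."""
        return {
            "service": self.name,
            "routes": sorted(self.routes),
            "security": self.security,
            "log_ids": [log_id.value for log_id in LogId],
        }


# All the engineer writes: the API definition and one endpoint handler.
api = HttpApi("orders")


@api.get("/orders/{id}")
def get_order(request: dict) -> dict:
    return {"id": request["id"], "status": "received"}
```

The design choice worth noticing is that the engineer never mentions security at all. The only way to change the defaults is to construct an `Override`, which fails without a reason.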

The engineer’s job is to implement the feature. The platform handles everything else, deterministically and identically for every service in the estate.

He framed this with a phrase from the functional programming world: constraints bring freedom. Removing all the non-functional concerns from the engineer’s context window means they can focus entirely on the problem they actually need to solve.

The experiment results

To test the comparison directly, Nick and colleagues ran a real geospatial feature implementation repeatedly across three conditions: no context, skills and agents, and a platform SDK. They scored results on feature correctness, similarity across runs from the same prompt, and runtime behaviour consistency.

The skills and agents condition showed very high variance. AST similarity and n-gram similarity measures, borrowed from plagiarism detection, showed that the same prompt with the same skills produced substantially different output on successive runs.
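
The talk did not specify how the two measures were computed, so this is one plausible sketch using only the Python standard library. It assumes Jaccard overlap of token trigrams for the n-gram measure, and a comparison of AST node-type sequences for the structural measure. The function names and sample snippets are invented for illustration.

```python
import ast
from difflib import SequenceMatcher


def ngrams(tokens, n=3):
    """Set of all consecutive n-token windows."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_similarity(source_a: str, source_b: str, n: int = 3) -> float:
    """Jaccard overlap of whitespace-token n-grams (1.0 means identical sets)."""
    a, b = ngrams(source_a.split(), n), ngrams(source_b.split(), n)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ast_shape(source: str):
    """Sequence of AST node type names, which ignores identifiers and literals."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def ast_similarity(source_a: str, source_b: str) -> float:
    """Similarity ratio between the node-type sequences of the two trees."""
    return SequenceMatcher(None, ast_shape(source_a), ast_shape(source_b)).ratio()


# Three hypothetical outputs for the same prompt.
run_1 = "def area(w, h):\n    return w * h\n"
run_2 = "def area(width, height):\n    return width * height\n"  # renamed only
run_3 = (
    "def area(w, h):\n"
    "    total = 0\n"
    "    for _ in range(h):\n"
    "        total += w\n"
    "    return total\n"
)

print("run 1 vs 2:", ast_similarity(run_1, run_2), ngram_similarity(run_1, run_2))
print("run 1 vs 3:", ast_similarity(run_1, run_3), ngram_similarity(run_1, run_3))
```

The two measures answer different questions. A pure rename leaves the AST shape identical while sharing no trigrams, whereas a different algorithm changes the tree itself, which is why using both gives a fuller picture of run-to-run drift.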

The platform SDK condition converged. The feature logic varied as expected, since that is genuinely different for each problem. The structure, the tests, the non-functional behaviour were consistent across every run.

The platform SDK condition also cost significantly less. Partly because fewer tokens were consumed writing boilerplate that the platform generated deterministically. Partly because the constrained, predictable output meant the implementation quality held even when run on a lower-cost model. The skills-and-agents condition required a larger, more capable model to produce acceptable results.

Nick noted that the comparison excludes the cost of building the platform itself, which is substantial and which he deliberately acknowledged.

The argument for doing it now

Nick made the case that platform engineering has always been the right investment for human engineers. The fact that most organisations have not built proper platforms is a historical failure, not a considered choice.

AI makes that failure impossible to defer. An agent cannot call a senior engineer on Slack to ask why the infrastructure is configured a certain way. It cannot absorb institutional knowledge by sitting near the person who holds it. If the information is not encoded and accessible, the agent cannot use it.

“We should have valued people more and done more of this to make their lives easier for quite a long time. I’m really glad it seems to be starting to happen now. But building with the human and the AI in mind is just good design. Your humans will benefit from it too.”

The conclusion was direct: if you want AI to work at scale across teams without accumulating a new generation of snowflakes that will need migrating in two years, build the road before you let the AI drive.

Constraints bring freedom. Take all the non-functional complexity away from the engineer and you give them the entire freedom to solve the problem they actually need to solve.

Talk delivered at SlashNew, Newcastle. Speaker: Nick Williams, Principal Consultant and Technology Principal at Equal Experts.

Cleared for takeoff: what two near-misses at Melbourne Airport taught me about communication

Jack Skinner opened with a boarding announcement and spent the next half hour using aviation as a lens for something every engineer in the room does badly: assuming shared context.

The talk was built around a real ATSB safety report, publication AO-2023-043, investigating two runway excursions at Melbourne Airport in September 2023. Nobody was seriously hurt. If you were a passenger on either flight, you would never have known anything happened. The plane crash that didn’t happen.

What happened

Melbourne Airport was partway through a multi-year runway resurfacing program. The northern end of runway 34 had been shortened with a displaced threshold, marked by six red lights in two groups of three. The information was broadcast through every available channel: NOTAMs in the flight briefing pack, ATIS radio updates on a continuous loop, preloaded distances in the flight deck computers.

Two separate flight crews, two weeks apart, both missed it.

The first was a Malaysian A330 bound for Kuala Lumpur. The crew had NOTAMs and ATIS messages noting the displaced threshold, but used the full runway length in their calculations anyway. Contributing factors identified in the investigation: nineteen NOTAMs across three pages, all marked “no special effect to the flight” in the summary table, including the one that was not; a split screen display in the cockpit that may have affected how the crew paged through the list; no explicit mention of changed runway conditions from air traffic control; no signage at the southern departure taxiway to indicate the reduced distance.

That aircraft rotated approximately 170 metres before the works limit line and crossed over the construction zone seven metres overhead. Workers on the ground were shaken. The flight crew, receiving no instrument feedback and no contact from ATC, continued to Kuala Lumpur unaware anything had occurred.

Two weeks later, a Bamboo Airways 787 bound for Hanoi did the same thing. The crew had been running 70 minutes late due to equipment failures, were operating under high workload and time pressure, and skipped a review step they’d normally complete. They rotated past a taxiway intersection and cleared the displaced threshold with 4.5 metres of clearance. One construction worker reported a pressure wave injury from the proximity of the aircraft. The crew discussed the runway lights appearing closer than normal after liftoff, concluded everything was fine, and flew on to Hanoi.

Why the mitigations failed

Jack was careful to frame this as a blameless post-mortem. Both crews performed independent dual calculations that were cross-checked. Multiple communication channels presented the relevant information. The airport’s risk assessment had correctly identified the worst credible outcome and put mitigations in place.

The mitigations failed because of how humans actually process information under load.

NOTAMs are notoriously difficult to parse. The FAA lists them in its top five hazards in the airspace system specifically because pilots and controllers struggle to distinguish applicable from non-applicable information. Nineteen NOTAMs under one heading, all summarised as having no special effect on the flight, create a pattern that conditions readers to stop reading carefully.

ATIS acknowledgement confirms currency of information, not comprehension. A crew can confirm receipt of broadcast Oscar and still not have registered what Oscar contained.

After the first incident, a proposal to modify controller phraseology was declined. This sounds like negligence until you understand the constraint: aviation phraseology is standardised globally in English specifically to prevent misunderstandings across international crews. A Melbourne tower improvising new language mid-program would introduce a different category of risk, since crews from dozens of countries would not be expecting it.

The software parallel

Jack closed with a client story. A company had built an AI-assisted development pipeline: AI-enhanced requirements producing long specification documents, AI translation and summarisation between teams, AI transcription of Loom videos as handoff documentation. The technical output was genuinely good. Architecture sound, tests correct, code solid, review process intact.

What eroded was shared understanding. Nobody was collaborating across the chain. Context was being processed, summarised, and passed forward, but the chain of mediated communication meant quality slipped in ways nobody could quantify. Long scrolls of accurate text were not producing alignment.

The parallel to the runway was deliberate. The NOTAMs were accurate. The ATIS was accurate. The preloaded distances were accurate. None of that was sufficient to prevent two departing aircraft from passing within metres of construction workers.

“Context is complicated. It’s relative. It shifts. It decays. Understanding the audience you write for and who listens to your stories is so important.”

His argument: AI accelerates in whatever direction you point it. If the shared understanding is already degrading, AI will accelerate that degradation. The ability to empathise, communicate, and build clarity across teams is not a soft skill that AI will eventually handle. It is the thing that makes everything else work.

Talk delivered at SlashNew, Newcastle. Speaker: Jack Skinner, consultant CTO.

The reality gap: what AI actually needs to work at scale

Charlotte Fleming, researcher in developer relations at Octopus Deploy, closed the conference with findings from the AI Pulse Report she wrote earlier this year. The report combines primary survey data from 379 respondents with a review of major industry surveys including DORA, Stack Overflow, and JetBrains. The picture it draws is one most organisations will recognise: heavy investment, patchy returns, and a gap between what AI does well and what developers actually need.

The gap between perception and performance

90% of tech professionals use AI at work. 47% use it daily. 67% of executives say they would trade headcount for a 50% productivity boost from AI tools. Spend is matching the enthusiasm, with some organisations committing between 1 and 8% of total revenue to AI tooling.

Against all of that, only 1% of leaders rated their organisations as mature on AI deployment. Even accounting for survey timing, Charlotte estimated the real number is probably around 10%.

The productivity data is where the gap becomes concrete. 58% of developers believe AI saves them time, with perceived savings estimated at 20 to 24% of working hours. A METR study measuring actual performance on complex tasks found developers were 19% slower with AI tools, not faster.

The frustrations are consistent. 44% of developers say AI solutions are almost right but not quite. 30% say debugging AI-generated code takes longer than writing it themselves. 13% report they have become less confident in their own problem-solving since using AI. That last figure points at something beyond inefficiency: a feedback loop where reliance on AI erodes the judgment that makes AI useful in the first place.

What AI does versus what developers want

Charlotte mapped two lists against each other. What AI does well: writing repetitive code, generating tests, producing documentation. What developers actually want help with (from DX and DORA research): environment setup and maintenance, writing and maintaining tests, task tracking, security and compliance, and management overhead.

The lists barely overlap.

Agentic AI is starting to close some of the gap, but optimising AI tooling before the delivery infrastructure is ready won’t produce the compound returns organisations are expecting.

The talent paradox

73% of organisations surveyed by Octopus have reduced junior developer hiring over the past two years. Google and Meta have each cut junior hiring by roughly half compared to 2021.

The arithmetic looks clean on a spreadsheet: senior salaries run 1.4 to 2.1 times higher than junior, juniors require significant training investment, and AI tools cost around $200 per developer per month. Seniors plus AI looks like the obvious optimisation.

The cost that does not show up on the spreadsheet is three to five years away. Senior developer capability takes five to seven years to develop from a junior starting point. Charlotte identified a three-phase trajectory: broad experimentation (current), ongoing optimisation with continued hiring freezes, and a future senior skills shortage that will show up first as wage inflation. The industry has seen this pattern before, after the dot-com downturn and again after the post-pandemic correction. Each time, the shortage appeared two to three years after the hiring freeze.

Why continuous delivery is the prerequisite

The organisations generating real returns from AI investment had one thing in common: they built fit-for-purpose automated delivery infrastructure before adding AI into the mix. Only 21% of AI pilots make it to production. Only 5% of organisations are generating compound returns. The BCG research Charlotte cited found 74% of companies struggling to scale AI value at all.

The theory of constraints applies here. AI accelerates code generation, which sits upstream of code review. Code review has traditionally been bottlenecked by senior developer availability. Feeding more AI-generated code into a review queue that has not grown makes the bottleneck worse, not better.
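
A toy model makes the queueing effect concrete. The rates below are illustrative assumptions, not figures from the report: review capacity stays fixed at 10 pull requests per day while AI raises the number opened from 9 to 13.

```python
def review_backlog(prs_opened_per_day: float, reviews_per_day: float, days: int) -> list:
    """Pull requests still waiting for review at the end of each day."""
    backlog, history = 0.0, []
    for _ in range(days):
        # The queue grows by what arrives and shrinks by what reviewers clear,
        # but it can never go below empty.
        backlog = max(0.0, backlog + prs_opened_per_day - reviews_per_day)
        history.append(backlog)
    return history


# Illustrative numbers only: review capacity is the fixed constraint.
before_ai = review_backlog(prs_opened_per_day=9, reviews_per_day=10, days=20)
with_ai = review_backlog(prs_opened_per_day=13, reviews_per_day=10, days=20)

print("backlog after 20 days without AI:", before_ai[-1])
print("backlog after 20 days with AI   :", with_ai[-1])
```

Under these assumptions the queue stays empty before AI and grows by three pull requests every day afterwards, so faster code generation turns directly into longer waits unless review capacity changes too.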

The DORA 2024 data captured this directly: when AI adoption increased by 25%, delivery throughput dropped 1.5% and delivery stability dropped 7.2%. The AI was not failing. The surrounding processes could not absorb the velocity. By 2025, as practices matured, the throughput effect had flipped to positive.

Continuous delivery practices, specifically continuous integration automation, build automation, deployment automation, and observability, provide the structured, repeatable context that lets AI be useful across teams rather than in isolated pockets.

“It’s not about the AI tools, it’s the system that you’re putting them into.”

More AI means more oversight, not less

89% of organisations have had an AI-related production incident. 25% have had a production outage caused by AI-generated code. Security, auditability, and compliance do not get easier with AI in the pipeline; they get harder.

When a human writes a deployment, the audit trail captures who wrote it. When an AI suggests one and a human approves it, the trail has to capture both. The blast radius of an AI-suggested action that goes wrong is potentially larger than that of a human one because AI acts faster. Governance has to scale with that speed.
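
As a sketch of what capturing both might look like, here is a minimal audit record in Python. The field names, the `record_deployment` helper, and the model identifier are all hypothetical. The only idea taken from the talk is that an AI-suggested change must record the suggesting system as well as the approving human.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DeploymentAuditRecord:
    change_id: str
    approved_by: str                   # the human accountable for the change
    suggested_by_model: Optional[str]  # None when a human authored it directly
    recorded_at: datetime

    def __post_init__(self):
        # A record with no named human approver is never valid.
        if not self.approved_by:
            raise ValueError("every deployment needs a named human approver")

    @property
    def ai_assisted(self) -> bool:
        return self.suggested_by_model is not None


def record_deployment(change_id, approved_by, suggested_by_model=None):
    """Create an immutable record stamped with the current UTC time."""
    return DeploymentAuditRecord(
        change_id, approved_by, suggested_by_model, datetime.now(timezone.utc)
    )


human_change = record_deployment("chg-101", approved_by="alice")
ai_change = record_deployment(
    "chg-102", approved_by="alice", suggested_by_model="example-model-v1"
)
print(human_change.ai_assisted, ai_change.ai_assisted)
```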

Charlotte drew the boundary clearly. Where AI compounds value: pattern recognition, boilerplate, unit test generation, log analysis. Where human judgment is required: architectural decisions, business context evaluation, production incident response, mentoring, trade-off decisions. That judgment has to be developed, not assumed to be present.

Human oversight only works if the humans doing the overseeing know enough to oversee. The talent pipeline problem and the oversight problem are the same problem.

Agentic AI on top of a broken process is only going to amplify the broken process. Agentic AI on top of a well-automated pipeline is going to amplify the value.

Talk delivered at SlashNew, Newcastle. Speaker: Dr. Charlotte Fleming, Researcher in Developer Relations at Octopus Deploy.
