Halaxy AI, built to improve clinical care

Designing a native suite of AI tools that cut clinical documentation time so practitioners could spend less time on paperwork and more time with their patients.

Role

Lead Product Designer

Company

Halaxy

Context

Background

Halaxy is a global clinical practice management platform used by 40,000+ practitioners across 98 professions in 79 countries. This work sits at the heart of what Halaxy is trying to do: make healthcare administration simpler so practitioners can spend more time with patients and less time on paperwork.

As AI started changing what clinical software could do, there was a real opportunity to build something most competitors hadn't tried: a native, compliant AI assistant built directly into the clinical workflow rather than added on from outside.

I was one of two designers on the team, reporting directly to the CEO and co-founder. I owned the AI features from initial scoping through to QA and release.

The Problem

Practitioners were drowning in paperwork

Practitioners were spending a significant part of their day on documentation rather than patient care:

  • Writing notes manually during or after every appointment meant less time with patients and a real risk that important context got missed
  • Reviewing long referral and pathology documents to pull out critical details was slow and interrupted clinical focus
  • During patient handovers or when picking up a patient not seen in a long time, getting a clear picture of their history meant manually reading back through every note one by one
  • Clinics using third-party AI scribes faced broken workflows, manual copy-pasting between tools, and zero visibility for practice managers on billing or usage

The Approach

Three focused tools, one shared principle

Rather than one large feature, we built three focused tools that together covered the full documentation picture. The same principle ran through all three. Keep the practitioner in control, make every output editable before it touches the record, and never store sensitive data longer than necessary.

  • AI Scribe: transcribe and summarise appointments in real time.
  • File Summary: summarise any document uploaded to a patient record.
  • Multi-Note Summary: generate a consolidated overview of a patient's full clinical history.

A thread running through all three tools was template flexibility. We built a library of templates covering common clinical workflows and professions: SOAP, BIRP, DAP, ADHD assessments, GP consultations, and more.

Practitioners could use these out of the box, edit them to match their preferences, or build entirely new ones from scratch using plain language instructions and placeholders. The AI adapted to how each practitioner worked, not the other way around.

Validation

Beta Testing

Before rolling out to all users we ran a closed beta with a selected group of clinical practices. The goal was to understand how practitioners actually used the features in a real setting, what issues they ran into, and what was missing.

What we learnt:

  • Honest feedback on how the features performed in real clinical workflows across different professions.
  • Issues and edge cases we hadn't encountered in internal testing, for example long periods of silence during psychology appointments, which caused the audio session to pause.
  • A set of real-world clinical templates from practitioners that we built directly into the product template library.
  • Refinements to summary output quality based on how different specialties actually write notes.

AI Strategy

How we approached model selection and iteration

We didn't start with the most powerful model. We started with the right one for the job, and upgraded as the requirements grew.

Phase 1

Start Simple

Built File Summary first, the most contained problem. Used strict JSON structures for predictable, reliable outputs.
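As a rough illustration of what a strict JSON contract can look like, the sketch below parses a model response and rejects anything that does not match a fixed shape. The field names are hypothetical; the actual File Summary schema is not described here.

```python
import json

# Hypothetical field names, used only for illustration.
REQUIRED_FIELDS = {"document_type": str, "summary": str, "key_findings": list}


def parse_strict_summary(raw: str) -> dict:
    """Parse a model response and reject anything outside the fixed shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS.keys() - data.keys()
    extra = data.keys() - REQUIRED_FIELDS.keys()
    if missing or extra:
        raise ValueError(f"unexpected shape: missing={sorted(missing)}, extra={sorted(extra)}")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data[name], expected_type):
            raise ValueError(f"{name} should be {expected_type.__name__}")
    return data
```

A fixed shape like this is easy to validate, which is what makes it predictable, but every new output format would need its own schema.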

Phase 2

Hit the limits

Strict JSON couldn't handle custom templates. Loosening the prompt structure introduced inconsistency and unreliable outputs.

Phase 3

Upgrade and refine

Upgraded to Claude Haiku for better instruction-following and context handling. Combined with refined prompts, the share of outputs passing in testing rose from around 40% to 95%.

Compliance

Built for healthcare from day one

Designing for healthcare meant holding ourselves to a higher standard. These weren't constraints we worked around. They were principles we designed in from the start, because the people using this product are making clinical decisions.

  • Live audio is streamed for transcription and never retained; it is discarded as soon as transcription completes
  • AWS opt-out policy ensuring patient data is never used to train models
  • Every AI output lands in an editable field before touching the patient record, always keeping a human in the loop
  • Every agentic action requires a confirm step before execution
  • Persistent disclaimers and a feedback loop to monitor output quality over time

Feature 1

AI Scribe and Summarising Tool

The centrepiece of the suite. It needed to work reliably in a real clinical environment: noisy rooms, long consultations, multiple speakers, and all kinds of devices.

Live transcription and speaker identification

Practitioners could follow along in real time rather than discovering transcription problems at the end of a 45-minute consultation. Speaker identification kept the transcript readable, clearly separating practitioner and patient throughout.

Summarise with a template

Once the recording was complete, practitioners selected a template and generated a structured summary from the transcription.

The AI used the template instructions to shape the raw transcription into a properly formatted clinical note ready to review. Practitioners could use one of our industry-standard templates or edit a template to fit their own needs and workflows.

Injecting the AI Summary to the Clinical Note

Once happy with the summary, practitioners injected it directly into the clinical note with a single action. It landed in an editable field, and nothing was written to the record until they had reviewed it, edited it if needed, and published the note.

Clinical templates, editable templates, and custom builds

We knew a single generic output format would never work across 98 professions. A physiotherapist writes notes differently to a GP. A psychologist using BIRP has different needs to a paediatrician using a custom intake format.

So we built industry-standard templates, which practitioners could either edit to fit their needs or replace with a fully custom template of their own.

  • Halaxy templates: A library covering the most common clinical formats: SOAP, BIRP, DAP, GP consultation notes, ADHD assessments, and more. Ready to use out of the box.
  • Editable templates: Practitioners could take any built-in template and adjust it, adding sections, changing terminology, or restructuring the layout to match their preferences.
  • Custom from scratch: Using plain language instructions and placeholders, practitioners could build entirely new formats tailored to their specialty or workflow.

AI Scribe and Summarising Tool Performance

1,900+ groups using Scribe
2,500+ Consults/Week
19.5% MoM Growth
11,000+ Transcripts in March

Feature 2

AI File Summary & Agentic Flows

Summarising documents sounds simple until you're dealing with 6+ page PDFs, dense referral letters, and pathology reports with complex formatting.

How the agentic flow works

Rather than asking one model to do everything, we used two models working in sequence. This kept each model focused on a single task and helped manage context limits and improve reliability.

While the backend handles a two-model handoff, the UI is clear and focused. The practitioner only sees a single modal with a pre-filled form, keeping the task simple.
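A two-step handoff of this kind can be sketched as below. The split of work between the two steps (summarise first, then propose a form value from the summary) and the `task_title` field are assumptions made for illustration; each "model" is a plain function so the sketch runs without any real model API.

```python
from typing import Callable

# A "model" here is any function from prompt text to response text.
Model = Callable[[str], str]


def summarise_then_fill(document_text: str, summariser: Model, form_filler: Model) -> dict[str, str]:
    """Run two single-purpose steps in sequence and return a pre-filled form for review."""
    summary = summariser(f"Summarise this clinical document:\n{document_text}")
    # The second step only sees the short summary, not the full document,
    # which keeps its context small.
    task_title = form_filler(f"Suggest a follow-up task title for:\n{summary}")
    return {"summary": summary, "task_title": task_title}
```

The returned dictionary only pre-fills the form; nothing is executed until the practitioner confirms it.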

AI File Summary Performance

2,900+ groups using File Summary
31,000+ documents summarised
14% month-on-month growth

Feature 3

Patient Multi-Note AI Summary

Multi-Note Summary gave practitioners a quick way to get across a patient's full history without reading back through every individual note.

Most useful during handovers, when picking up a long-term patient, or preparing for a complex consultation.

Practitioners could generate a summary across all note types or filter by specific types like prescriptions, pathology orders, or clinical observations for a more targeted view.

For patients with a long history the feature could potentially hit the AI's 200,000 token limit. Rather than letting it fail silently, we added date-range filters up to 24 months and a clear warning when a summary was too large, giving practitioners the option to narrow the range. The constraint became a design decision rather than a technical failure.
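The filter-and-warn behaviour can be sketched roughly as follows. The four-characters-per-token estimate is a common rule of thumb and an assumption here, as are the `Note` fields; a real implementation would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

TOKEN_LIMIT = 200_000   # context limit quoted above
CHARS_PER_TOKEN = 4     # rough heuristic (assumption), not an exact tokenizer


@dataclass
class Note:
    written_on: date
    note_type: str
    text: str


def select_notes(notes, start, end, note_types=None):
    """Keep notes inside the date range and, optionally, of the chosen types."""
    return [
        n for n in notes
        if start <= n.written_on <= end and (note_types is None or n.note_type in note_types)
    ]


def estimate_tokens(notes) -> int:
    return sum(len(n.text) for n in notes) // CHARS_PER_TOKEN


def check_budget(notes) -> Optional[str]:
    """Return a warning when the selection is too large, otherwise None."""
    tokens = estimate_tokens(notes)
    if tokens > TOKEN_LIMIT:
        return f"About {tokens:,} tokens selected; narrow the date range to stay under {TOKEN_LIMIT:,}."
    return None
```

Checking the budget before the request is sent is what turns a silent failure into a choice the practitioner can act on.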

Like the other features, the output could be shaped using Halaxy's built-in templates or any custom template the practitioner had created themselves.

Patient Multi-Note AI Summary Performance

400+ uses across 160+ groups in the first week

The Problems

Issues we had to solve for

1. Needing to switch from JSON to flexible templates

We started with a strict JSON structure to control the AI output. It worked well for a fixed, predictable result, but that was also its problem. When we wanted practitioners to be able to choose different templates and build their own, the hard-coded JSON structure didn't work.

We switched to natural language prompts, which meant practitioners could write templates in plain English and swap them across any of the AI features. A template built for the AI Scribe could be used in File Summary or Multi-Note Summary without any changes.
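To show why a plain-language template is portable across features, here is a minimal sketch in which one template is combined with whatever source text a feature supplies. The double-brace placeholder syntax, the sample template, and the function names are all assumptions for illustration.

```python
import re

# Hypothetical template: plain-language instructions plus {{placeholders}}.
SOAP_TEMPLATE = (
    "Write a SOAP note for {{patient_name}}.\n"
    "Use the headings Subjective, Objective, Assessment and Plan.\n"
    "Keep each section under five sentences."
)


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Replace each {{name}} with its value; unknown placeholders are left untouched."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values.get(m.group(1), m.group(0)), template)


def build_prompt(template: str, values: dict[str, str], source_label: str, source_text: str) -> str:
    """Combine one template with the source text of any feature."""
    return f"{fill_placeholders(template, values)}\n\n{source_label}:\n{source_text}"


# The same template serves a transcript and an uploaded document.
scribe_prompt = build_prompt(SOAP_TEMPLATE, {"patient_name": "Alex"}, "Transcript", "Practitioner: hello")
file_prompt = build_prompt(SOAP_TEMPLATE, {"patient_name": "Alex"}, "Document", "Referral letter text")
```

Because the template carries only instructions and never assumes a particular input, the feature decides what goes under the source label.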

Amazon Nova Lite, which we had been using for its speed and low cost, struggled to follow the more flexible prompt instructions reliably. Upgrading to Claude Haiku and refining the system prompt fixed this, taking output quality from around 40% passing in testing to 95%.

2. Collecting Feedback

Errors were easy to catch but output quality was harder to measure. We added a thumbs up and thumbs down reaction to every AI output, with a modal to capture written feedback when something was off. This gave us a direct line to how the features were performing in real clinical use rather than relying on error logs alone.
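One simple way to turn those reactions into a quality signal is a per-feature approval rate. The sketch below assumes a hypothetical `Feedback` record and illustrative feature labels; it is not the schema used in the product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    feature: str        # illustrative labels, e.g. "scribe" or "file_summary"
    thumbs_up: bool
    comment: str = ""   # optional written feedback from the modal


def approval_rate_by_feature(items: list[Feedback]) -> dict[str, float]:
    """Share of thumbs-up reactions per feature, as a fraction between 0 and 1."""
    ups = defaultdict(int)
    totals = defaultdict(int)
    for item in items:
        totals[item.feature] += 1
        ups[item.feature] += int(item.thumbs_up)
    return {feature: ups[feature] / totals[feature] for feature in totals}
```

Tracking the rate per feature, alongside the written comments, makes it easier to see which tool a drop in quality belongs to.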

3. High Infrastructure Cost

We built the AI Scribe on Amazon Transcribe because it let us prove out the feature quickly without worrying about infrastructure. Once the Scribe took off and usage grew faster than expected, the cost of running at scale became a real concern. We migrated to a self-hosted NVIDIA Parakeet model, which gave us more control over performance and cut transcription infrastructure costs by 87%.

Impact

Results

4,100+

Practitioner groups using AI features

1,900+

Groups using AI Scribe

11k+

Transcripts in March alone

31k+

Documents summarised

87%

Infrastructure cost reduction

95%

Output quality in testing

Reflection

What I learned

The biggest lesson from this project was that designing AI features well is mostly about what happens around the AI output, not the output itself.

The review steps, the escape hatches, the feedback loops, the guardrails.
Practitioners needed to feel like they were always in control before they'd trust the system enough to use it every day. That thinking shaped every decision we made across the suite.