How to Implement AI Document Classification in SharePoint for Australian and New Zealand Accounting Firms

Learn how to implement Microsoft Purview sensitivity labels for automated document classification in SharePoint. Practical guide for AU/NZ accounting firms with compliance considerations and step-by-step configuration.

The Hidden Cost of Document Chaos

A mid-sized Sydney accounting firm recently discovered 47 versions of the same client tax return scattered across SharePoint folders, email threads, and desktop drives. Three versions contained outdated TFN data. One made it into a board presentation. Nobody knew which was current until a junior accountant spent six hours reconstructing the timeline.

This isn't an unusual story. It's the daily reality for most accounting practices.

Manual document handling creates compliance exposure that Australian and New Zealand firms consistently underestimate. With AML/CTF obligations extending to professional services from July 2026, firms need systematic approaches to classify, protect, and retrieve sensitive financial documents—approaches that don't rely on staff remembering which documents need encryption or access restrictions.

Microsoft Purview sensitivity labels offer AI-powered classification that applies consistent handling rules automatically. But like any technology implementation, the value comes from thoughtful design rather than just switching on features.

Why Document Classification Is a Systems Problem

When we work with accounting firms on operational efficiency, document management rarely surfaces as the presenting problem. Partners tell us about compliance concerns, search frustrations, or staff spending too much time on administrative tasks. But when we map the underlying systems, document classification often emerges as a root cause.

Think of it this way: every document in your firm exists within a system of flows and feedback loops. Documents are created, modified, shared, stored, and eventually archived or destroyed. Manual classification introduces friction at every stage:

Human inconsistency - Different staff members make different classification decisions for identical documents. There's no feedback loop to correct these variations.

Time waste that compounds - Microsoft testing found manual document processing takes 10-15 minutes per document when staff must read content, determine sensitivity, apply protections, and file correctly. Across thousands of client documents during tax season, you're burning hundreds of billable hours on administrative overhead.

Compliance gaps that hide until audits - When regulators request all documents containing specific client data, manually managed systems require staff to search multiple repositories, guess at folder structures colleagues created, and hope nothing was misfiled.

The real risk isn't any single misclassified document. It's the absence of a reliable system that creates predictable, consistent outcomes.
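To put the time cost above in concrete terms, here is a rough worked calculation. The volume of 2,000 documents is an illustrative assumption rather than a figure from any particular firm; only the 10-15 minute range comes from the text above.

```python
def manual_hours(documents: int, minutes_low: int = 10, minutes_high: int = 15):
    """Return the (low, high) hours spent classifying documents by hand."""
    return documents * minutes_low / 60, documents * minutes_high / 60


# Hypothetical volume for a single tax season.
low, high = manual_hours(2_000)
print(f"{low:.0f} to {high:.0f} hours")  # prints "333 to 500 hours"
```

Swap in your own document counts to see how quickly the overhead scales.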

Compliance Requirements for ANZ Accounting Practices

Before designing any classification system, you need to understand the regulatory landscape it must support.

Australian Requirements

The Accounting Professional and Ethical Standards Board (APESB) sets ethical standards for Chartered Accountants ANZ, CPA Australia, and IPA members. These standards mandate secure handling of client information, though they don't prescribe specific technical controls.

The most significant upcoming change: AML/CTF obligations extend to accounting services from 1 July 2026. AUSTRAC is conducting a national education campaign to prepare firms for new reporting requirements, customer due diligence obligations, and record-keeping standards that demand systematic document classification.

New Zealand Requirements

New Zealand company and tax law requires businesses to keep complete and accurate accounting records for at least seven years, and the External Reporting Board (XRB) sets the accounting and auditing standards those records must support. Failure to meet filing and audit requirements can result in fines up to NZD 50,000 and referral to regulatory authorities.

What This Means for Classification

Your classification system needs to:

Identify documents containing regulated information (TFN, IRD numbers, financial data)
Apply appropriate protections automatically
Maintain audit trails of who accessed what and when
Support retention and destruction policies

Manual processes can't deliver this reliably at scale. That's where AI-powered classification becomes valuable—not as a replacement for human judgment, but as a system that applies consistent rules without the inconsistency inherent in manual approaches.

Struggling to map your firm's document workflows?
Our AI Discovery Workshop helps accounting firms identify where document handling creates the biggest compliance risks and efficiency drains. We'll map your current processes and identify the highest-impact opportunities for automation.
Book a Discovery Workshop →

Understanding Microsoft Purview Sensitivity Labels

Sensitivity labels are persistent metadata that travel with documents across Microsoft 365 services. When you apply a label to a client tax return in SharePoint, that label follows the document into Teams channels, email attachments, and Power BI reports.

Labels enforce protection policies automatically. A "Highly Confidential - Client Financial Data" label can:

Restrict document access to specific staff members
Require encryption
Prevent copying or printing
Block external sharing
Log all access attempts

Users don't need to remember which protections apply—the label handles enforcement.

Client-Side vs Service-Side Auto-Labeling

Client-side auto-labeling happens on the user's device, giving users a recommendation or automatically applying a label before a document is saved. This catches sensitive content at creation time but requires compatible Office applications.

Service-side auto-labeling applies labels automatically after content is saved, with no user interaction required. This method works on existing document libraries, applies to files users never open, and handles legacy documents uploaded before you implemented classification.

For accounting firms with years of client documents already in SharePoint, service-side auto-labeling delivers the most immediate value. You don't need to wait for staff to open each file.

How AI Classification Achieves High Accuracy

Automated classification analyses document content rather than relying on filenames or folder locations. The AI understands context, identifying a financial statement even when the phrase "financial statement" never appears explicitly.

Microsoft Purview includes built-in sensitive information types that detect:

Australian Business Numbers (ABN)
Tax File Numbers (TFN)
Bank account details
New Zealand IRD numbers
Credit card numbers

Well-tuned pattern-matching rules can exceed 95% accuracy for common document types, which is far more consistent than manual classification across a team of people with varying attention to detail.

Importantly, auto-labeling never removes a manually applied sensitivity label. When staff explicitly classify a document, that decision takes precedence. The system augments human judgment rather than overriding it.
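The precedence rule can be sketched as a small decision function. This is a simplified model rather than Purview's actual implementation: the label names come from the four-tier model described later in this guide, and the rule that an automatic label only replaces a lower-priority automatic label is an assumption worth confirming against Microsoft's documentation.

```python
from dataclasses import dataclass
from typing import Optional

# Higher number = more sensitive (assumed ordering for this sketch).
PRIORITY = {"Public": 0, "Internal": 1, "Confidential": 2, "Highly Confidential": 3}


@dataclass
class CurrentLabel:
    name: str
    applied_manually: bool


def resolve_label(current: Optional[CurrentLabel], proposed: str) -> str:
    """Return the label a document should carry after an auto-labeling match."""
    if current is None:
        return proposed            # unlabeled document: apply the match
    if current.applied_manually:
        return current.name        # a manual decision always stands
    if PRIORITY[proposed] > PRIORITY[current.name]:
        return proposed            # upgrade a weaker automatic label
    return current.name            # never downgrade
```

For example, `resolve_label(CurrentLabel("Internal", True), "Highly Confidential")` keeps the manual Internal label, which is exactly the kind of case your quarterly review should surface.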

Designing Your Classification Framework

Microsoft recommends no more than five top-level labels to keep the interface manageable. A four-tier model works well for accounting firms:

Tier 1: Public

Marketing materials, published thought leadership, and general firm information intended for external audiences.

Tier 2: Internal

Operational documents, internal correspondence, and non-sensitive administrative files that should stay within the firm but don't contain client data.

Tier 3: Confidential

Client engagement letters, draft financial statements, and business correspondence containing proprietary information. Requires access controls and audit logging.

Tier 4: Highly Confidential

Documents containing TFN/IRD numbers, bank account details, audit working papers, and strategic financial plans. Requires encryption, strict access restrictions, and comprehensive audit trails.

Mapping Document Types to Labels

Document Type	Typical Label	Rationale
Tax returns with TFN/IRD	Highly Confidential	Privacy regulations, identity theft risk
Finalised financial statements	Highly Confidential	Detailed transaction data, strategic information
Draft financial statements	Confidential	Under review, limited distribution
Client engagement letters	Confidential	Proprietary terms, client expectations
Internal correspondence about clients	Confidential	Client information even without specific data
Firm policies and procedures	Internal	Operational, no client data
Marketing materials	Public	Intended for external distribution

Setting Label Priority

Label priority determines which classification applies when documents contain multiple sensitivity triggers. If a single document contains both an ABN (triggering Confidential) and a TFN (triggering Highly Confidential), priority order ensures the TFN trigger wins.

Arrange labels from least to most sensitive in the Purview portal:

Position 1: Public
Position 2: Internal
Position 3: Confidential
Position 4: Highly Confidential

This ordering ensures the most protective label always applies when multiple rules match.
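As a minimal sketch of how priority resolves competing triggers, the function below picks the most sensitive label from whatever was detected. The trigger mapping is hypothetical and simply mirrors the ABN and TFN example above.

```python
# Least to most sensitive, matching the portal ordering above.
LABEL_ORDER = ["Public", "Internal", "Confidential", "Highly Confidential"]

# Hypothetical mapping from detected information type to the label it triggers.
TRIGGERS = {
    "ABN": "Confidential",
    "TFN": "Highly Confidential",
    "IRD": "Highly Confidential",
}


def winning_label(detected_types):
    """Return the most sensitive triggered label, or None when nothing matches."""
    labels = [TRIGGERS[t] for t in detected_types if t in TRIGGERS]
    if not labels:
        return None
    return max(labels, key=LABEL_ORDER.index)
```

Calling `winning_label(["ABN", "TFN"])` returns "Highly Confidential", because that label sits later in the ordering.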

Step-by-Step Implementation Guide

Phase 1: Configure Sensitivity Labels

1. Navigate to the Microsoft Purview portal
2. Select Information Protection from the left navigation
3. Create your sensitivity labels, defining protection settings for each:
   - Access restrictions (who can open, edit, share)
   - Encryption requirements
   - Visual markings (headers, footers, watermarks)
   - External sharing controls

Phase 2: Create Custom Sensitive Information Types

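Purview's built-in sensitive information types already cover Australian and New Zealand identifiers such as ABNs, TFNs, and IRD numbers, but the default patterns often need tuning for accounting documents. Create custom types (or copy and adjust the built-in ones) when the defaults produce too many false positives or miss the formats your firm actually uses.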

ABN Detection Pattern: Two-digit number, space, three-digit number, space, three-digit number, space, three-digit number
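Before configuring the pattern in Purview, it can help to prototype it against sample text. The sketch below is a standalone Python approximation, not Purview configuration: the regular expression follows the format described above, and the checksum is the published modulus-89 ABN validation. The function names are illustrative.

```python
import re

# Two digits, then three groups of three digits, with optional single spaces.
ABN_PATTERN = re.compile(r"\b\d{2} ?\d{3} ?\d{3} ?\d{3}\b")
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(candidate: str) -> bool:
    """Check the ABN's modulus-89 checksum."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) != 11:
        return False
    digits[0] -= 1  # the algorithm subtracts 1 from the leading digit
    return sum(d * w for d, w in zip(digits, ABN_WEIGHTS)) % 89 == 0


def find_abns(text: str):
    """Return pattern matches that also pass the checksum."""
    return [m.group() for m in ABN_PATTERN.finditer(text) if is_valid_abn(m.group())]
```

Running `find_abns` over a handful of real engagement letters shows how many 11-digit strings in your documents are genuine ABNs rather than invoice or reference numbers.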

TFN Detection: The nine-digit format appears in many non-sensitive contexts. Add keyword proximity rules that trigger detection only when the number appears near terms like "Tax File Number," "TFN," or "Australian Taxation Office" within 300 characters.
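The proximity rule can be prototyped the same way. This is a rough Python approximation of the idea, not Purview's matching engine: the 300-character window and keywords come from the rule above, while the weighted modulus-11 checksum is the widely documented TFN check and should be treated as an assumption to verify.

```python
import re

TFN_PATTERN = re.compile(r"\b\d{3} ?\d{3} ?\d{3}\b")
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)
KEYWORDS = ("tax file number", "tfn", "australian taxation office")
PROXIMITY = 300  # characters either side of the match


def passes_checksum(candidate: str) -> bool:
    """Weighted sum of the nine digits must be divisible by 11."""
    digits = [int(c) for c in candidate if c.isdigit()]
    return len(digits) == 9 and sum(d * w for d, w in zip(digits, TFN_WEIGHTS)) % 11 == 0


def find_tfns(text: str):
    """Return nine-digit matches that pass the checksum and sit near a keyword."""
    lowered = text.lower()
    hits = []
    for m in TFN_PATTERN.finditer(text):
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        if passes_checksum(m.group()) and any(k in window for k in KEYWORDS):
            hits.append(m.group())
    return hits
```

The same nine digits labelled "Invoice reference" return no match, which is the behaviour you want from the keyword requirement.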

New Zealand IRD Numbers: Configure the pattern to match 8-9 digit numbers, then add validation logic checking the checksum digit to confirm validity rather than matching random digit sequences.
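A sketch of the IRD checksum logic follows. It is based on Inland Revenue's published modulus-11 check with primary and secondary weights; the valid range used here is an assumption drawn from that specification, so confirm it against the current version before relying on it.

```python
PRIMARY_WEIGHTS = (3, 2, 7, 6, 5, 4, 3, 2)
SECONDARY_WEIGHTS = (7, 4, 3, 2, 5, 2, 7, 6)


def _check_digit(base: str, weights) -> int:
    """Return the calculated check digit (10 signals 'try the next weights')."""
    remainder = sum(int(d) * w for d, w in zip(base, weights)) % 11
    return 0 if remainder == 0 else 11 - remainder


def is_valid_ird(candidate: str) -> bool:
    """Validate an 8 or 9 digit IRD number, ignoring separators."""
    digits = "".join(c for c in candidate if c.isdigit())
    if len(digits) not in (8, 9):
        return False
    if not 10_000_000 <= int(digits) <= 150_000_000:  # assumed valid range
        return False
    base, expected = digits[:-1].zfill(8), int(digits[-1])
    check = _check_digit(base, PRIMARY_WEIGHTS)
    if check == 10:
        check = _check_digit(base, SECONDARY_WEIGHTS)
    # A result of 10 after the secondary weights can never equal a single digit.
    return check == expected
```

Checking candidates this way filters out random 8-9 digit sequences such as phone or account numbers before they trigger a label.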

Phase 3: Build Auto-Labeling Policies

Create policies that connect your sensitive information types to appropriate labels:

Select the sensitive information types to detect
Choose the label to apply when detected
Define the scope (which SharePoint sites, Teams, users)
Set confidence thresholds (higher thresholds reduce false positives)
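The four settings above can be modelled as a small data structure to reason about how a policy behaves before you build it in the portal. This is a hypothetical model, not a Purview API: the class names, the 0-100 confidence score, and the default threshold of 85 are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    info_type: str    # e.g. "TFN"
    confidence: int   # 0-100 score reported for the match


@dataclass
class Policy:
    label: str
    info_types: set
    scope: set               # site names the policy covers
    min_confidence: int = 85
    min_count: int = 1

    def applies(self, site: str, detections: list) -> bool:
        """True when enough in-scope, high-confidence detections are present."""
        if site not in self.scope:
            return False
        strong = [
            d for d in detections
            if d.info_type in self.info_types and d.confidence >= self.min_confidence
        ]
        return len(strong) >= self.min_count
```

Raising `min_confidence` or `min_count` makes the policy fire less often, which is the trade-off between false positives and coverage that simulation mode lets you measure.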

Phase 4: Test in Simulation Mode

This step is critical. Run simulation mode for at least two weeks before enforcing any policy.

Simulation mode processes documents and reports which labels would be applied without actually changing anything. This reveals:

False positives (documents incorrectly flagged)
Coverage gaps (sensitive documents missed)
Unintended consequences

Review match data daily during simulation. Look for unexpected matches indicating overly broad rules and missing matches revealing coverage gaps.
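One way to make the daily review measurable is to hand-check a sample of documents and compute precision and recall. The sketch below assumes you record, for each reviewed document, whether the policy matched it and whether a reviewer judged it genuinely sensitive; the data format is hypothetical.

```python
def review_stats(reviewed):
    """reviewed: list of (matched_by_policy, truly_sensitive) boolean pairs."""
    tp = sum(1 for matched, sensitive in reviewed if matched and sensitive)
    fp = sum(1 for matched, sensitive in reviewed if matched and not sensitive)
    fn = sum(1 for matched, sensitive in reviewed if not matched and sensitive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "false_positives": fp,   # overly broad rules
        "missed": fn,            # coverage gaps
        "precision": precision,
        "recall": recall,
    }
```

Low precision points to rules that are too broad, while low recall points to coverage gaps, so the two numbers tell you which direction to adjust.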

Phase 5: Refine and Deploy

Based on simulation results:

Tighten keyword proximity if detection triggers on invoice numbers or reference codes
Expand patterns if legitimate financial statements aren't matching
Adjust confidence thresholds to balance accuracy against coverage

Allow 24 hours for policy changes to propagate through services. Plan deployments during low-activity periods.

Need help designing your classification framework?
Getting the label structure and detection rules right from the start saves significant rework later. Our team can help you design a classification system that matches your firm's specific document types, compliance requirements, and workflow patterns.
Schedule a Consultation →

Implementing Classification in SharePoint

Configure Default Labels for Document Libraries

Set default sensitivity labels for document libraries to ensure new uploads receive baseline protection immediately.

Client engagement libraries: Default to Confidential
Internal administrative libraries: Default to Internal
Published content libraries: Default to Public

Navigate to library settings → "Default sensitivity labels" → select the appropriate label.

Apply Labels to Existing Documents

Service-side auto-labeling processes existing SharePoint documents automatically. The timeline ranges from days to weeks depending on document volume.

Monitor progress through the Purview portal's auto-labeling analytics:

Documents processed
Documents labeled
Documents pending

Don't wait for 100% coverage before considering implementation successful. Even 70-80% automated coverage in the first month represents significant progress over manual classification.

Configure Metadata for Enhanced Search

Sensitivity labels work alongside managed metadata columns for multi-dimensional document retrieval. Create columns for:

Client name
Engagement type
Financial year
Document category

The benefit: you no longer need to guess how colleagues filed documents. Instead of navigating nested folders, search by client name and document type, and SharePoint returns all matching documents regardless of physical location.

Extending Classification to Microsoft Teams

Container-Level Labels

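Container-level sensitivity labels apply to a team, its Microsoft 365 group, and its connected SharePoint site. They control workspace settings such as privacy, guest access, and external sharing rather than labeling individual files.

When you label a client engagement team as Confidential, those workspace controls apply to everything stored there, but uploaded documents do not inherit the label itself. Pair the container label with a default library label or an auto-labeling policy so each file also carries its own protection. Together, these reduce the risk of staff uploading sensitive client documents to an incorrectly configured workspace.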

Team Templates by Client Confidentiality

Create team templates with pre-assigned sensitivity labels:

Standard client engagement: Confidential label
High-net-worth individuals: Highly Confidential label
Publicly traded companies: Highly Confidential label

Template-based provisioning ensures consistent security posture across engagements. When staff request a new client team, they select the appropriate template, and correct labels are applied automatically.

External Sharing Controls

Link sensitivity labels to sharing policies:

Highly Confidential: Block all external sharing
Confidential: Allow external sharing only with authenticated recipients or specific domains
Internal: Prompt for confirmation before external sharing
Public: Allow sharing with appropriate warnings
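The mapping above amounts to a simple decision table. The sketch below expresses it in Python to make the logic explicit; the rule names, the allowed-domain list, and the function are hypothetical and do not correspond to Purview or SharePoint settings.

```python
SHARING_RULES = {
    "Highly Confidential": "block",
    "Confidential": "authenticated_or_allowed_domain",
    "Internal": "confirm",
    "Public": "allow_with_warning",
}

ALLOWED_DOMAINS = {"client.example.com"}  # hypothetical approved domain


def can_share(label, recipient_domain, authenticated, user_confirmed=False):
    """Return True when an external share is permitted under the table above."""
    rule = SHARING_RULES[label]
    if rule == "block":
        return False
    if rule == "authenticated_or_allowed_domain":
        return authenticated or recipient_domain in ALLOWED_DOMAINS
    if rule == "confirm":
        return user_confirmed
    return True  # Public: allowed, with the warning shown in the interface
```

Writing the rules down this way is a useful check that every label has exactly one sharing outcome before you configure the real policies.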

Measuring Success

Primary Metrics

Classification coverage: Percentage of documents with sensitivity labels applied. Target 90%+ within 90 days of full deployment.

Auto-labeling accuracy: Percentage of automatically labeled documents that staff don't relabel manually. Target 95%+.

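Manual override rate: How often staff change auto-applied labels. This is the complement of auto-labeling accuracy, so a 95% accuracy target implies an override rate of 5% or less. High rates indicate rules that need refinement.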

Search time reduction: Average time to locate specific client documents before and after implementation. Expect 60-70% reduction when metadata and labels replace folder navigation.
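These metrics are simple ratios, sketched below. The input counts are assumed to come from your own reporting exports, and the parameter names are illustrative.

```python
def label_metrics(total_docs, labeled_docs, auto_labeled, relabeled_by_staff):
    """Return coverage, override rate, and accuracy as fractions between 0 and 1."""
    coverage = labeled_docs / total_docs if total_docs else 0.0
    override_rate = relabeled_by_staff / auto_labeled if auto_labeled else 0.0
    return {
        "coverage": coverage,
        "override_rate": override_rate,
        "accuracy": 1 - override_rate,  # documents staff left as labeled
    }


def search_time_reduction(minutes_before, minutes_after):
    """Return the fractional reduction in average search time."""
    return (minutes_before - minutes_after) / minutes_before
```

For instance, 9,200 labeled documents out of 10,000 gives 92% coverage, and 320 relabeled documents out of 8,000 auto-labeled gives a 4% override rate.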

Quarterly Review Process

Schedule quarterly policy reviews to:

Analyse documents that staff relabel manually (reveals patterns your rules miss)
Test rule changes in simulation mode before production deployment
Adjust policies based on changing compliance requirements
Update sensitive information types for new document patterns

Common Implementation Challenges

Data Quality Issues

Most organisations store documents in inconsistent formats with incomplete metadata. Without standardised data, AI models may misclassify critical documents.

Solution: Start with a pilot library containing representative documents. Use simulation mode to identify data quality issues before firm-wide deployment.

Scanned Documents and Legacy Files

Scanned PDFs and image files don't contain searchable text, preventing sensitive information detection.

Solution: Implement OCR preprocessing for scanned documents. Consider SharePoint Premium for advanced document understanding capabilities.

User Adoption Resistance

Staff may distrust AI decisions or misunderstand how auto-labeling works.

Solution: Communicate clearly that auto-labeling augments rather than replaces human judgment. Manual labels always take precedence. The system makes staff more effective, not redundant.

The 24-Hour Propagation Window

Policy changes can take up to 24 hours to propagate across Microsoft 365 services.

Solution: Plan changes during low-activity periods. Batch policy updates rather than deploying incremental changes daily.

Licensing Requirements

Auto-labeling requires E5-tier licensing:

Microsoft 365 E5
E5 Compliance add-on (to E3)
E5 Information Protection and Governance add-on (to E3)

Manual labeling works with Office 365 E3, but automatic policy-driven application requires the higher tier.

Cost consideration: E5 Compliance add-on pricing typically runs lower than full E5 licensing when you only need information protection features. Evaluate whether your firm needs other E5 capabilities before choosing.

Implementation Roadmap

Month 1: Pilot and Refinement

Week 1-2:

Select pilot library (500-1,000 representative documents)
Configure sensitivity labels and protection settings
Create custom sensitive information types for ABN, TFN, IRD

Week 3-4:

Deploy auto-labeling policies in simulation mode
Review match data daily
Refine rules based on false positives and coverage gaps

Month 2: High-Priority Libraries

Expand to client engagement libraries
Configure default labels for document libraries
Deploy container labels for Teams and their connected SharePoint sites
Train staff on the system and expectations

Month 3: Firm-Wide Rollout

Complete rollout to all SharePoint sites and Teams
Implement external sharing controls
Establish quarterly review cadence
Document policies and procedures

Ready to implement AI document classification?
Document classification is one component of a broader operational efficiency strategy. Our AI Discovery Workshop helps accounting firms identify where AI can deliver the highest impact across document management, client communication, compliance monitoring, and workflow automation.
We'll map your current processes, identify the root causes of inefficiency, and design solutions that work with your existing systems—not against them.
Investment: $2,000-$5,000 with full money-back guarantee
Book Your Discovery Workshop →

Frequently Asked Questions

Can sensitivity labels be applied to documents created before implementing Purview?

Yes. Service-side auto-labeling processes existing documents without requiring staff to open them. Files may require SharePoint reindexing to trigger detection, and processing time depends on volume: small libraries often complete within days, while large ones can take weeks.

What financial data requires Highly Confidential classification?

Documents containing TFN/IRD numbers, bank account details, strategic client financial plans, and audit working papers warrant Highly Confidential labels. Unauthorised disclosure could cause significant financial, legal, or reputational damage.

How does auto-labeling handle documents containing multiple sensitivity levels?

The highest-sensitivity match determines the applied label based on priority order. When a document contains both an ABN and a TFN, the TFN trigger applies the Highly Confidential label because it sits higher in the priority sequence.

How long before new labels appear in SharePoint and Teams?

Labels propagate within four hours for desktop apps and one hour for web apps with browser refresh. Policy changes may take up to 24 hours to fully propagate across all Microsoft 365 services.

What's the difference between classifications and sensitivity labels?

Classifications identify data patterns (like TFNs). Sensitivity labels define handling policies (like encryption and access control). Classifications help organise data; sensitivity labels ensure its protection.

Do we need to retrain staff on every document?

No. That's the point of auto-labeling—the system applies consistent classification without requiring staff to make decisions on every document. Staff training focuses on understanding the system, handling exceptions, and knowing when to apply manual labels.

Summary

AI document classification isn't about replacing human judgment—it's about building a system that applies consistent rules at scale, freeing your team to focus on client work rather than administrative overhead.

The key principles:

Start with a clear classification framework (four tiers work well for most firms)
Test thoroughly in simulation mode before enforcing policies
Design for your specific compliance requirements (AML/CTF obligations from July 2026)
Measure and refine quarterly based on actual usage patterns

Done well, you'll recover hundreds of hours currently lost to manual classification and document searching—time your team can redirect to billable client work.

AI2Easy helps Australian and New Zealand accounting firms implement AI solutions that integrate with existing systems and deliver measurable ROI. Our discovery-first approach ensures we understand your specific challenges before recommending solutions.
