How to Implement AI Document Classification in SharePoint for Australian and New Zealand Accounting Firms
Learn how to implement Microsoft Purview sensitivity labels for automated document classification in SharePoint. Practical guide for AU/NZ accounting firms with compliance considerations and step-by-step configuration.
The Hidden Cost of Document Chaos
A mid-sized Sydney accounting firm recently discovered 47 versions of the same client tax return scattered across SharePoint folders, email threads, and desktop drives. Three versions contained outdated TFN data. One made it into a board presentation. Nobody knew which was current until a junior accountant spent six hours reconstructing the timeline.
This isn't an unusual story. It's the daily reality for most accounting practices.
Manual document handling creates compliance exposure that Australian and New Zealand firms consistently underestimate. With AML/CTF obligations extending to professional services from July 2026, firms need systematic approaches to classify, protect, and retrieve sensitive financial documents—approaches that don't rely on staff remembering which documents need encryption or access restrictions.
Microsoft Purview sensitivity labels offer AI-powered classification that applies consistent handling rules automatically. But like any technology implementation, the value comes from thoughtful design rather than just switching on features.
Why Document Classification Is a Systems Problem
When we work with accounting firms on operational efficiency, document management rarely surfaces as the presenting problem. Partners tell us about compliance concerns, search frustrations, or staff spending too much time on administrative tasks. But when we map the underlying systems, document classification often emerges as a root cause.
Think of it this way: every document in your firm exists within a system of flows and feedback loops. Documents are created, modified, shared, stored, and eventually archived or destroyed. Manual classification introduces friction at every stage:
Human inconsistency - Different staff members make different classification decisions for identical documents. There's no feedback loop to correct these variations.
Time waste that compounds - Microsoft testing found manual document processing takes 10-15 minutes per document when staff must read content, determine sensitivity, apply protections, and file correctly. At 2,000 client documents in a tax season, for example, that adds up to roughly 330 to 500 hours of administrative overhead that could otherwise be billable.
Compliance gaps that hide until audits - When regulators request all documents containing specific client data, manually managed systems require staff to search multiple repositories, guess at folder structures colleagues created, and hope nothing was misfiled.
The real risk isn't any single misclassified document. It's the absence of a reliable system that creates predictable, consistent outcomes.
Compliance Requirements for ANZ Accounting Practices
Before designing any classification system, you need to understand the regulatory landscape it must support.
Australian Requirements
The Accounting Professional and Ethical Standards Board (APESB) sets ethical standards for Chartered Accountants ANZ, CPA Australia, and IPA members. These standards mandate secure handling of client information, though they don't prescribe specific technical controls.
The most significant upcoming change: AML/CTF obligations extend to accounting services from 1 July 2026. AUSTRAC is conducting a national education campaign to prepare firms for new reporting requirements, customer due diligence obligations, and record-keeping standards that demand systematic document classification.
New Zealand Requirements
New Zealand company and tax law requires businesses to keep complete and accurate accounting records for at least seven years, and the External Reporting Board (XRB) sets the accounting and assurance standards those records must support. Failure to meet filing and audit requirements can result in fines up to NZD 50,000 and referral to regulatory authorities.
What This Means for Classification
Your classification system needs to:
Identify documents containing regulated information (TFN, IRD numbers, financial data)
Apply appropriate protections automatically
Maintain audit trails of who accessed what and when
Support retention and destruction policies
Manual processes can't deliver this reliably at scale. That's where AI-powered classification becomes valuable: not as a replacement for human judgment, but as a system that applies the same rules to every document, regardless of who created or filed it.
Struggling to map your firm's document workflows?
Our AI Discovery Workshop helps accounting firms identify where document handling creates the biggest compliance risks and efficiency drains. We'll map your current processes and identify the highest-impact opportunities for automation.
Understanding Microsoft Purview Sensitivity Labels
Sensitivity labels are persistent metadata that travel with documents across Microsoft 365 services. When you apply a label to a client tax return in SharePoint, that label follows the document into Teams channels, email attachments, and Power BI reports.
Labels enforce protection policies automatically. A "Highly Confidential - Client Financial Data" label can:
Restrict document access to specific staff members
Require encryption
Prevent copying or printing
Block external sharing
Log all access attempts
Users don't need to remember which protections apply—the label handles enforcement.
Client-Side vs Service-Side Auto-Labeling
Client-side auto-labeling happens on the user's device, giving users a recommendation or automatically applying a label before a document is saved. This catches sensitive content at creation time but requires compatible Office applications.
Service-side auto-labeling applies labels automatically after content is saved, with no user interaction required. This method works on existing document libraries, applies to files users never open, and handles legacy documents uploaded before you implemented classification.
For accounting firms with years of client documents already in SharePoint, service-side auto-labeling delivers the most immediate value. You don't need to wait for staff to open each file.
How AI Classification Achieves High Accuracy
Automated classification analyses document content rather than relying on filenames or folder locations. The AI understands context, identifying a financial statement even when the phrase "financial statement" never appears explicitly.
Microsoft Purview includes built-in sensitive information types that detect:
Australian Business Numbers (ABN)
Tax File Numbers (TFN)
Bank account details
New Zealand IRD numbers
Credit card numbers
These pattern-matching rules achieve accuracy rates exceeding 95% for common document types—far more consistent than manual classification across a team of people with varying attention to detail.
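One reason pattern-based detection can stay precise is that identifiers such as ABNs and TFNs carry check digits, so a detector can reject digit runs that merely look like an identifier. The sketch below illustrates the publicly documented check-digit arithmetic for both. It is an illustration of the idea rather than Purview's internal implementation, and the function names are hypothetical.

```python
import re

# Published weighting factors for each identifier's check-digit test.
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)


def _digits(text: str) -> list[int]:
    """Drop spaces and other separators, keeping only the digits."""
    return [int(ch) for ch in re.sub(r"\D", "", text)]


def is_valid_abn(candidate: str) -> bool:
    """ABN test: subtract 1 from the first digit, weight, sum, check mod 89."""
    digits = _digits(candidate)
    if len(digits) != 11:
        return False
    digits[0] -= 1
    return sum(d * w for d, w in zip(digits, ABN_WEIGHTS)) % 89 == 0


def is_valid_tfn(candidate: str) -> bool:
    """Nine-digit TFN test: the weighted sum must be divisible by 11."""
    digits = _digits(candidate)
    if len(digits) != 9:
        return False
    return sum(d * w for d, w in zip(digits, TFN_WEIGHTS)) % 11 == 0
```

A random nine-digit invoice number passes the TFN test only about one time in eleven, which is why checksum validation is usually combined with keyword evidence rather than used alone.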
Importantly, auto-labeling never removes a manually applied sensitivity label. When staff explicitly classify a document, that decision takes precedence. The system augments human judgment rather than overriding it.
Designing Your Classification Framework
Microsoft recommends no more than five top-level labels to keep the interface manageable. A four-tier model works well for accounting firms:
Tier 1: Public
Marketing materials, published thought leadership, and general firm information intended for external audiences.
Tier 2: Internal
Operational documents, internal correspondence, and non-sensitive administrative files that should stay within the firm but don't contain client data.
Tier 3: Confidential
Client engagement letters, draft financial statements, and business correspondence containing proprietary information. Requires access controls and audit logging.
Tier 4: Highly Confidential
Documents containing TFN/IRD numbers, bank account details, audit working papers, and strategic financial plans. Requires encryption, strict access restrictions, and comprehensive audit trails.
Mapping Document Types to Labels
| Document Type | Typical Label | Rationale |
|---|---|---|
| Tax returns with TFN/IRD | Highly Confidential | Privacy regulations, identity theft risk |
| Finalised financial statements | Highly Confidential | Detailed transaction data, strategic information |
| Draft financial statements | Confidential | Under review, limited distribution |
| Client engagement letters | Confidential | Proprietary terms, client expectations |
| Internal correspondence about clients | Confidential | Client information even without specific data |
| Firm policies and procedures | Internal | Operational, no client data |
| Marketing materials | Public | Intended for external distribution |
Setting Label Priority
Label priority determines which classification applies when documents contain multiple sensitivity triggers. If a single document contains both an ABN (triggering Confidential) and a TFN (triggering Highly Confidential), priority order ensures the TFN trigger wins.
Arrange labels from least to most sensitive in the Purview portal:
Position 1: Public
Position 2: Internal
Position 3: Confidential
Position 4: Highly Confidential
This ordering ensures the most protective label always applies when multiple rules match.
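The priority rule can be sketched in a few lines of Python. This is a minimal illustration of the behaviour described in this guide, assuming the four-tier framework above; `LABEL_PRIORITY` and `resolve_label` are hypothetical names, not part of any Microsoft API.

```python
# Positions mirror the ordering in the Purview portal: higher number = more sensitive.
LABEL_PRIORITY = {
    "Public": 1,
    "Internal": 2,
    "Confidential": 3,
    "Highly Confidential": 4,
}


def resolve_label(matched_labels, manual_label=None):
    """Return the label to apply, or None when no rule matched."""
    # A label that staff applied by hand is never replaced by auto-labeling.
    if manual_label is not None:
        return manual_label
    if not matched_labels:
        return None
    # When several rules match, the highest-priority label wins.
    return max(matched_labels, key=LABEL_PRIORITY.__getitem__)
```

For a document that matches both an ABN rule (Confidential) and a TFN rule (Highly Confidential), `resolve_label(["Confidential", "Highly Confidential"])` returns `"Highly Confidential"`.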
Step-by-Step Implementation Guide
Phase 1: Configure Sensitivity Labels
Navigate to the Microsoft Purview portal
Select Information Protection from the left navigation
Create your sensitivity labels, defining protection settings for each:
Access restrictions (who can open, edit, share)
Encryption requirements
Visual markings (headers, footers, watermarks)
External sharing controls
Phase 2: Create Custom Sensitive Information Types
Purview's built-in sensitive information types already cover ABNs, TFNs, and New Zealand IRD numbers, but custom types let you tune detection to the way these identifiers actually appear in your firm's documents.
ABN Detection Pattern: Two-digit number, space, three-digit number, space, three-digit number, space, three-digit number
TFN Detection: The nine-digit format appears in many non-sensitive contexts. Add keyword proximity rules that trigger detection only when the number appears near terms like "Tax File Number," "TFN," or "Australian Taxation Office" within 300 characters.
New Zealand IRD Numbers: Configure the pattern to match 8-9 digit numbers, then add validation logic checking the checksum digit to confirm validity rather than matching random digit sequences.
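To make the TFN and IRD rules concrete, here is a minimal Python sketch of the logic they describe. It is not Purview configuration. The regular expression, keyword list, and function names are assumptions for illustration, and the IRD weights follow Inland Revenue's published check-digit method as commonly documented (the published method also includes a valid-range check, omitted here).

```python
import re

TFN_KEYWORDS = ("tax file number", "tfn", "australian taxation office")
NINE_DIGITS = re.compile(r"\b\d{3} ?\d{3} ?\d{3}\b")
IRD_PRIMARY = (3, 2, 7, 6, 5, 4, 3, 2)
IRD_SECONDARY = (7, 4, 3, 2, 5, 2, 7, 6)


def find_tfn_candidates(text: str, window: int = 300) -> list[str]:
    """Return nine-digit numbers with a TFN keyword within `window` characters."""
    hits = []
    for match in NINE_DIGITS.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        nearby = text[start:end].lower()
        if any(keyword in nearby for keyword in TFN_KEYWORDS):
            hits.append(match.group())
    return hits


def is_valid_ird(candidate: str) -> bool:
    """Check the final digit of an 8- or 9-digit IRD number."""
    digits = [int(ch) for ch in re.sub(r"\D", "", candidate)]
    if len(digits) not in (8, 9):
        return False
    *base, check = digits
    base = [0] * (8 - len(base)) + base  # left-pad the base to eight digits
    for weights in (IRD_PRIMARY, IRD_SECONDARY):
        remainder = sum(d * w for d, w in zip(base, weights)) % 11
        expected = 0 if remainder == 0 else 11 - remainder
        if expected != 10:
            return expected == check
    return False  # both weight sets produced 10, so the number is invalid
```

The proximity window is what stops invoice numbers and reference codes from triggering the TFN rule: a nine-digit number far from any tax-related keyword is simply ignored.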
Phase 3: Build Auto-Labeling Policies
Create policies that connect your sensitive information types to appropriate labels:
Select the sensitive information types to detect
Choose the label to apply when detected
Define the scope (which SharePoint sites, Teams, users)
Set confidence thresholds (higher thresholds reduce false positives)
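The four choices above amount to a small rule: which information types, which label, which locations, and how confident the match must be. The Python sketch below models that rule so the interaction between scope and threshold is easy to see. The class and field names are hypothetical, and the 0-100 confidence scale is an assumption for illustration rather than a description of Purview's policy schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    info_type: str   # e.g. "TFN"
    confidence: int  # detector score, assumed here to run from 0 to 100
    site: str        # location of the document


@dataclass(frozen=True)
class AutoLabelPolicy:
    label: str
    info_types: frozenset
    sites: frozenset
    min_confidence: int

    def applies_to(self, detection: Detection) -> bool:
        """A detection triggers the label only if type, scope, and confidence all match."""
        return (
            detection.info_type in self.info_types
            and detection.site in self.sites
            and detection.confidence >= self.min_confidence
        )


# Example: a strict policy scoped to a hypothetical client engagement site.
tfn_policy = AutoLabelPolicy(
    label="Highly Confidential",
    info_types=frozenset({"TFN", "IRD"}),
    sites=frozenset({"client-engagements"}),
    min_confidence=85,
)
```

Raising `min_confidence` trades coverage for precision: fewer borderline matches are labeled, so fewer false positives, but more sensitive documents may slip through.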
Phase 4: Test in Simulation Mode
This step is critical. Run simulation mode for at least two weeks before enforcing any policy.
Simulation mode processes documents and reports which labels would be applied without actually changing anything. This reveals:
False positives (documents incorrectly flagged)
Coverage gaps (sensitive documents missed)
Unintended consequences
Review match data daily during simulation. Look for unexpected matches indicating overly broad rules and missing matches revealing coverage gaps.
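One way to keep the daily review honest is to hand-check a sample of documents and compute the two numbers that matter: how many flagged documents were wrong, and how many sensitive documents were missed. The following Python sketch assumes you have recorded, for each sampled document, whether simulation would have labeled it and whether a reviewer judged it sensitive. The function name and input shape are hypothetical.

```python
def review_simulation(results):
    """Summarise a reviewed sample.

    `results` is an iterable of (would_label, is_sensitive) boolean pairs.
    """
    results = list(results)
    true_pos = sum(1 for would, actual in results if would and actual)
    false_pos = sum(1 for would, actual in results if would and not actual)
    missed = sum(1 for would, actual in results if actual and not would)
    flagged = true_pos + false_pos
    sensitive = true_pos + missed
    return {
        # Share of flagged documents that should not have been flagged.
        "false_positive_share": false_pos / flagged if flagged else 0.0,
        # Share of genuinely sensitive documents the rules caught.
        "coverage": true_pos / sensitive if sensitive else 0.0,
        "missed": missed,
    }
```

Tracking these figures across the simulation period shows whether each rule change actually moved false positives and coverage in the right direction.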
Phase 5: Refine and Deploy
Based on simulation results:
Tighten keyword proximity if detection triggers on invoice numbers or reference codes
Expand patterns if legitimate financial statements aren't matching
Adjust confidence thresholds to balance accuracy against coverage
Allow 24 hours for policy changes to propagate through services. Plan deployments during low-activity periods.
Need help designing your classification framework?
Getting the label structure and detection rules right from the start saves significant rework later. Our team can help you design a classification system that matches your firm's specific document types, compliance requirements, and workflow patterns.
Implementing Classification in SharePoint
Configure Default Labels for Document Libraries
Set default sensitivity labels for document libraries to ensure new uploads receive baseline protection immediately.
Client engagement libraries: Default to Confidential
Internal administrative libraries: Default to Internal
Published content libraries: Default to Public
Navigate to library settings → "Default sensitivity labels" → select the appropriate label.
Apply Labels to Existing Documents
Service-side auto-labeling processes existing SharePoint documents automatically. The timeline ranges from days to weeks depending on document volume.
Monitor progress through the Purview portal's auto-labeling analytics:
Documents processed
Documents labeled
Documents pending
Don't wait for 100% coverage before considering implementation successful. Even 70-80% automated coverage in the first month represents significant progress over manual classification.
Configure Metadata for Enhanced Search
Sensitivity labels work alongside managed metadata columns for multi-dimensional document retrieval. Create columns for:
Client name
Engagement type
Financial year
Document category
The benefit: you no longer need to guess how colleagues filed documents. Instead of navigating nested folders, search by client name and document type, and SharePoint returns all matching documents regardless of physical location.
Extending Classification to Microsoft Teams
Container-Level Labels
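Container-level sensitivity labels apply to an entire team, along with its underlying Microsoft 365 group and SharePoint site, and control settings such as privacy, guest access, and external sharing for that workspace.
When you label a client engagement team as Confidential, those settings apply to the whole workspace. Files uploaded to the team do not inherit the container label or its encryption, so pair container labels with default library labels and auto-labeling policies to protect the documents themselves. Together, these controls reduce the risk of staff uploading sensitive client documents to a loosely configured workspace.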
Team Templates by Client Confidentiality
Create team templates with pre-assigned sensitivity labels:
Standard client engagement: Confidential label
High-net-worth individuals: Highly Confidential label
Publicly traded companies: Highly Confidential label
Template-based provisioning ensures consistent security posture across engagements. When staff request a new client team, they select the appropriate template, and correct labels are applied automatically.
External Sharing Controls
Link sensitivity labels to sharing policies:
Highly Confidential: Block all external sharing
Confidential: Allow external sharing only with authenticated recipients or specific domains
Internal: Prompt for confirmation before external sharing
Public: Allow sharing with appropriate warnings
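The four rules above can be read as a simple decision function. The Python sketch below is only a model of the policy as described here, useful for agreeing on the intended behaviour before configuring it. The function name, parameters, and return values are hypothetical.

```python
def external_sharing_decision(label, recipient_authenticated=False,
                              recipient_domain=None, allowed_domains=frozenset()):
    """Return the intended outcome of an external sharing attempt for a label."""
    if label == "Highly Confidential":
        return "block"
    if label == "Confidential":
        # Allowed only for authenticated recipients or approved domains.
        trusted = recipient_authenticated or recipient_domain in allowed_domains
        return "allow" if trusted else "block"
    if label == "Internal":
        return "confirm"
    if label == "Public":
        return "allow_with_warning"
    raise ValueError(f"Unknown label: {label!r}")
```

Writing the rules down this way makes gaps visible early, for example what should happen to documents that carry no label at all.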
Measuring Success
Primary Metrics
Classification coverage: Percentage of documents with sensitivity labels applied. Target 90%+ within 90 days of full deployment.
Auto-labeling accuracy: Percentage of automatically labeled documents that staff don't relabel manually. Target 95%+.
Manual override rate: How often staff change auto-applied labels. High rates indicate rules that need refinement.
Search time reduction: Average time to locate specific client documents before and after implementation. Expect 60-70% reduction when metadata and labels replace folder navigation.
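The first three metrics are simple ratios, and computing them the same way each quarter keeps the trend comparable. The sketch below assumes you can export four counts from your reporting. The function and parameter names are hypothetical, and the targets are the ones stated above.

```python
def classification_metrics(total_docs, labeled_docs, auto_labeled, relabeled_by_staff):
    """Compute coverage, override rate, and accuracy from four document counts."""
    coverage = labeled_docs / total_docs if total_docs else 0.0
    override_rate = relabeled_by_staff / auto_labeled if auto_labeled else 0.0
    accuracy = 1.0 - override_rate  # share of auto-applied labels staff left alone
    return {
        "coverage": coverage,
        "override_rate": override_rate,
        "accuracy": accuracy,
        # Targets from this guide: 90%+ coverage and 95%+ accuracy.
        "meets_targets": coverage >= 0.90 and accuracy >= 0.95,
    }
```

As defined here, accuracy and override rate are two views of the same measurement, so a 95% accuracy target is equivalent to keeping manual overrides at or below 5%.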
Quarterly Review Process
Schedule quarterly policy reviews to:
Analyse documents that staff relabel manually (reveals patterns your rules miss)
Test rule changes in simulation mode before production deployment
Adjust policies based on changing compliance requirements
Update sensitive information types for new document patterns
Common Implementation Challenges
Data Quality Issues
Most organisations store documents in inconsistent formats with incomplete metadata. Without standardised data, AI models may misclassify critical documents.
Solution: Start with a pilot library containing representative documents. Use simulation mode to identify data quality issues before firm-wide deployment.
Scanned Documents and Legacy Files
Scanned PDFs and image files don't contain searchable text, preventing sensitive information detection.
Solution: Implement OCR preprocessing for scanned documents. Consider SharePoint Premium for advanced document understanding capabilities.
User Adoption Resistance
Staff may distrust AI decisions or misunderstand how auto-labeling works.
Solution: Communicate clearly that auto-labeling augments rather than replaces human judgment. Manual labels always take precedence. The system makes staff more effective, not redundant.
The 24-Hour Propagation Window
Policy changes can take up to 24 hours to propagate across Microsoft 365 services.
Solution: Plan changes during low-activity periods. Batch policy updates rather than deploying incremental changes daily.
Licensing Requirements
Auto-labeling requires E5-tier licensing:
Microsoft 365 E5
E5 Compliance add-on (to E3)
E5 Information Protection and Governance add-on (to E3)
Manual labeling works with Office 365 E3, but automatic policy-driven application requires the higher tier.
Cost consideration: E5 Compliance add-on pricing typically runs lower than full E5 licensing when you only need information protection features. Evaluate whether your firm needs other E5 capabilities before choosing.
Implementation Roadmap
Month 1: Pilot and Refinement
Week 1-2:
Select pilot library (500-1,000 representative documents)
Configure sensitivity labels and protection settings
Create custom sensitive information types for ABN, TFN, IRD
Week 3-4:
Deploy auto-labeling policies in simulation mode
Review match data daily
Refine rules based on false positives and coverage gaps
Month 2: High-Priority Libraries
Expand to client engagement libraries
Configure default labels for document libraries
Deploy container labels for client engagement teams
Train staff on the system and expectations
Month 3: Firm-Wide Rollout
Complete rollout to all SharePoint sites and Teams
Implement external sharing controls
Establish quarterly review cadence
Document policies and procedures
Ready to implement AI document classification?
Document classification is one component of a broader operational efficiency strategy. Our AI Discovery Workshop helps accounting firms identify where AI can deliver the highest impact across document management, client communication, compliance monitoring, and workflow automation.
We'll map your current processes, identify the root causes of inefficiency, and design solutions that work with your existing systems—not against them.
Investment: $2,000-$5,000 with full money-back guarantee
Frequently Asked Questions
Can sensitivity labels be applied to documents created before implementing Purview?
Yes. Service-side auto-labeling processes existing documents without requiring staff to open them. Files may require SharePoint reindexing to trigger detection. Smaller libraries are typically processed within days, while large document volumes can take weeks.
What financial data requires Highly Confidential classification?
Documents containing TFN/IRD numbers, bank account details, strategic client financial plans, and audit working papers warrant Highly Confidential labels. Unauthorised disclosure could cause significant financial, legal, or reputational damage.
How does auto-labeling handle documents containing multiple sensitivity levels?
The highest-sensitivity match determines the applied label based on priority order. When a document contains both an ABN and a TFN, the TFN trigger applies the Highly Confidential label because it sits higher in the priority sequence.
How long before new labels appear in SharePoint and Teams?
Labels propagate within four hours for desktop apps and one hour for web apps with browser refresh. Policy changes may take up to 24 hours to fully propagate across all Microsoft 365 services.
What's the difference between classifications and sensitivity labels?
Classifications identify data patterns (like TFNs). Sensitivity labels define handling policies (like encryption and access control). Classifications help organise data; sensitivity labels ensure its protection.
Do staff need to make a classification decision on every document?
No. That's the point of auto-labeling—the system applies consistent classification without requiring staff to make decisions on every document. Staff training focuses on understanding the system, handling exceptions, and knowing when to apply manual labels.
Summary
AI document classification isn't about replacing human judgment—it's about building a system that applies consistent rules at scale, freeing your team to focus on client work rather than administrative overhead.
The key principles:
Start with a clear classification framework (four tiers work well for most firms)
Test thoroughly in simulation mode before enforcing policies
Design for your specific compliance requirements (AML/CTF obligations from July 2026)
Measure and refine quarterly based on actual usage patterns
Done well, you'll recover hundreds of hours currently lost to manual classification and document searching—time your team can redirect to billable client work.
AI2Easy helps Australian and New Zealand accounting firms implement AI solutions that integrate with existing systems and deliver measurable ROI. Our discovery-first approach ensures we understand your specific challenges before recommending solutions.
