Corelight automates Tailscale ACLs with AWS resource tags: A security engineering approach

Corelight is a cybersecurity company that develops network detection and response (NDR) technology. The team previously joined us to explain how adopting Tailscale significantly improved their network security, user experience, and access control. Now Chandan Chowdhury, Senior Security Infrastructure Engineer at Corelight, drops by for a technical deep dive on their automated system for generating Tailscale Access Control Lists (ACLs) directly from AWS resource tags. If you're interested in the business side of things, check out part one to learn how Corelight saved thousands annually thanks to fewer connection issues.

Traditional network security often involves cross-team approval and manual configuration files that quickly become outdated and error-prone. This automated approach with Tailscale eliminates manual errors, provides clear audit trails, and ensures that access policies remain synchronized with actual resources.

“The tag-driven approach to Tailscale ACL management has significantly reduced Corelight's operational overhead while improving security posture. The system currently manages access for hundreds of resources across dozens of AWS accounts through simple, standardized resource tags.”
Chandan Chowdhury, Senior Security Infrastructure Engineer

The challenge

Managing network access policies across multiple AWS accounts, regions, and hundreds of resources is a daunting task, and Corelight’s team faced several challenges:

Scale: Corelight has dozens of AWS accounts with resources across multiple regions. Changes were slow because the Ops team had to wait for cross-team review and approval from security
Consistency: Corelight's access policies weren’t synchronized with the actual infrastructure
Maintenance: Corelight's manual updates were prone to human error
Compliance: Maintaining audit trails and proper access controls created chronic maintenance burdens

The solution: Tag-driven ACL generation

To address these challenges, Corelight’s system automatically generates Tailscale ACLs by reading standardized tags from AWS resources. This is an overview of the architecture:

How it works

The five-step process includes: resource discovery, tag processing, ACL generation, user resolution, and validation and deployment.

Step 1: Resource discovery

During resource discovery, this system periodically scans a list of AWS accounts and regions to look for resources with a specific tag. Then it gathers the tag content along with resource-specific identifiers.

Currently, Corelight only processes two kinds of resources that carry the tailscale-access tag: running EC2 instances, and Application and Network Load Balancers.
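As a rough illustration of the filtering in this step, the sketch below picks running, tagged instances out of a dictionary shaped like an EC2 DescribeInstances response. The function name, the record fields, and the sample data are assumptions made for this example, not Corelight's actual code, and the cross-account API calls that would produce the response are left out.

```python
from typing import Iterator

TAG_KEY = "tailscale-access"  # tag name taken from the article


def tagged_running_instances(response: dict) -> Iterator[dict]:
    """Yield one record per running instance that carries the access tag.

    `response` is assumed to have the shape of an EC2 DescribeInstances result.
    """
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Skip stopped, pending, or terminated instances.
            if instance.get("State", {}).get("Name") != "running":
                continue
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if TAG_KEY not in tags:
                continue
            yield {
                "id": instance["InstanceId"],
                "name": tags.get("Name", ""),
                "private_ip": instance.get("PrivateIpAddress"),
                "access": tags[TAG_KEY],
            }


# Hypothetical sample data: one tagged running instance, one stopped, one untagged.
sample = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-1234567890abcdef0",
                    "PrivateIpAddress": "10.0.1.100",
                    "State": {"Name": "running"},
                    "Tags": [
                        {"Key": "Name", "Value": "web-server-prod"},
                        {"Key": TAG_KEY, "Value": "infra:22//devops:80//devops:443"},
                    ],
                },
                {
                    "InstanceId": "i-0000000000000000a",
                    "State": {"Name": "stopped"},
                    "Tags": [{"Key": TAG_KEY, "Value": "infra:22"}],
                },
                {
                    "InstanceId": "i-0000000000000000b",
                    "State": {"Name": "running"},
                    "Tags": [{"Key": "Name", "Value": "untagged"}],
                },
            ]
        }
    ]
}

records = list(tagged_running_instances(sample))
```

Only the first instance survives the filter, because the second is stopped and the third has no tailscale-access tag.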

Step 2: Tag processing

Next, the system generates host, group, and ACL definitions using the resource identifier and tag content. After a few iterations, Corelight settled on a standardized tag format (explained below) to define access policies.

The tag format uses the following conventions:

Each group-to-port grant follows the <group name>:<ports> structure
Add one such entry for every group-to-port combination
Use two forward slashes (//) as the separator between entries

For example:

Name: "web-server-prod"

tailscale-access: "infra:22//devops:80//devops:443"

Notice how the devops team needed two entries, one per port. This is because AWS allows the comma symbol (",") in EC2 tag values but not in tag values for other resources, such as load balancers, so a comma-separated port list would not work everywhere.
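To make the convention concrete, here is a minimal parsing sketch for the tag value. The function name and error handling are assumptions for illustration. Accepting a comma-separated port list inside a single entry is also an assumption, included only because the format is written as <ports>.

```python
from collections import defaultdict


def parse_access_tag(value: str) -> dict[str, list[int]]:
    """Turn 'infra:22//devops:80//devops:443' into {group: [ports]}."""
    ports_by_group: dict[str, list[int]] = defaultdict(list)
    for entry in value.split("//"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing separator
        group, sep, ports = entry.partition(":")
        if not (group and sep and ports):
            raise ValueError(f"malformed entry: {entry!r}")
        for port in ports.split(","):
            # int() raises ValueError on anything that is not a number.
            ports_by_group[group].append(int(port))
    return dict(ports_by_group)


parsed = parse_access_tag("infra:22//devops:80//devops:443")
```

Repeated entries for the same group are merged, so the two devops entries end up as one group with two ports.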

Step 3: ACL generation

Once you have the information needed for the ACL, the actual generation step is straightforward. The system processes tags to generate three key components, which are then merged with the base ACL:

Hosts definition

We append the instance ID to each name because AWS allows multiple resources to share the same Name, but the hosts entries in the ACL have to be unique.

{
  "web-server-prod_i-1234567890abcdef0": "10.0.1.100/32",
  "api-server-staging_i-0987654321fedcba0": "10.0.2.50/32"
}

Groups definition

{
  "group:infra@company.com": [],
  "group:devops@company.com": []
}

ACL rules

[
  {
    "action": "accept",
    "src": ["group:infra@company.com"],
    "dst": ["web-server-prod_i-1234567890abcdef0:22"]
  },
  {
    "action": "accept",
    "src": ["group:devops@company.com"],
    "dst": ["web-server-prod_i-1234567890abcdef0:80,443"]
  }
]

Step 4: User resolution

The system integrates with Corelight’s identity provider (IdP) to populate group memberships.
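The article does not say which identity provider or API is involved, so the sketch below keeps the lookup abstract: populate_groups takes any callable that returns the members of a group. The function name, the in-memory directory, and the email addresses are hypothetical stand-ins for a real IdP client.

```python
from typing import Callable


def populate_groups(
    groups: dict[str, list[str]],
    lookup: Callable[[str], list[str]],
) -> dict[str, list[str]]:
    """Fill each 'group:<address>' key with members returned by the lookup."""
    return {
        name: sorted(lookup(name.removeprefix("group:")))
        for name in groups
    }


# Hypothetical directory standing in for an IdP query.
directory = {
    "infra@company.com": ["bob@company.com", "alice@company.com"],
    "devops@company.com": ["carol@company.com"],
}

resolved = populate_groups(
    {"group:infra@company.com": [], "group:devops@company.com": []},
    lambda address: directory.get(address, []),
)
```

Sorting the members keeps the generated ACL stable between runs, which makes diffs between versions easier to read.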

Step 5: Validation and deployment

First, call the Tailscale acl/validate API to ensure the generated ACL is error-free. If it is, upload the ACL using an API call.
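A validate-then-upload flow could look roughly like the following sketch, which uses only the Python standard library. It assumes the Tailscale API endpoints POST /api/v2/tailnet/{tailnet}/acl/validate and POST /api/v2/tailnet/{tailnet}/acl with a bearer token, and it assumes that a failed validation is reported through a non-empty message field in the response body. Check the current Tailscale API documentation before relying on either assumption. The function names and the tailnet and token values are placeholders.

```python
import json
import urllib.request

API = "https://api.tailscale.com/api/v2"


def acl_request(tailnet: str, token: str, policy: dict, validate: bool):
    """Build the POST request for either the validate or the upload endpoint."""
    suffix = "/acl/validate" if validate else "/acl"
    return urllib.request.Request(
        f"{API}/tailnet/{tailnet}{suffix}",
        data=json.dumps(policy).encode(),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def validation_error(body: bytes):
    """Return the error message from a validate response, or None if clean."""
    if not body.strip():
        return None
    payload = json.loads(body)
    if isinstance(payload, dict):
        return payload.get("message") or None
    return None


def validate_then_deploy(tailnet: str, token: str, policy: dict) -> None:
    """Upload the policy only if validation reports no problem."""
    request = acl_request(tailnet, token, policy, validate=True)
    with urllib.request.urlopen(request) as resp:
        error = validation_error(resp.read())
    if error:
        raise RuntimeError(f"ACL validation failed: {error}")
    request = acl_request(tailnet, token, policy, validate=False)
    with urllib.request.urlopen(request) as resp:
        resp.read()


# Building a request makes no network call; the values are placeholders.
req = acl_request("example.com", "tskey-api-placeholder", {"acls": []}, validate=True)
```

Separating request construction from sending keeps the logic easy to test without touching the live tailnet.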

Security considerations

When it comes to security, we look at the categories of access control and network security.

For access control, these were the must-haves:

Least privilege: The cross-account role should only have read-only permissions
Service accounts: IdP API access must use dedicated service accounts
Secret management: All credentials must be stored in AWS Secrets Manager, encrypted with a customer-managed key

And for network security, we ensured the following:

Private IPs only: System must resolve to private IP addresses
DNS resolution: Load balancer IPs are discovered via DNS lookup
CIDR notation: All hosts are defined with /32 CIDR blocks
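The three network rules above can be sketched with Python's ipaddress and socket modules. This is an illustrative example rather than Corelight's code: the function names are made up, only IPv4 is handled, and the DNS lookup function is shown but not called here because it needs network access.

```python
import ipaddress
import socket


def private_host_entries(addresses: list[str]) -> list[str]:
    """Keep only private IPv4 addresses and express each as a /32 block."""
    entries = []
    for addr in addresses:
        ip = ipaddress.IPv4Address(addr)  # raises on anything that is not IPv4
        if ip.is_private:
            entries.append(f"{ip}/32")
    return entries


def resolve_load_balancer(dns_name: str) -> list[str]:
    """Look up a load balancer's addresses via DNS, then apply the same filter."""
    _, _, addresses = socket.gethostbyname_ex(dns_name)
    return private_host_entries(addresses)


entries = private_host_entries(["10.0.1.100", "8.8.8.8", "10.0.2.50"])
```

The public address is dropped, so only the two private addresses become hosts entries.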

Implementation benefits

This system has yielded multiple crucial benefits since its implementation.

Self-documenting infrastructure: Resource tags serve as both configuration and documentation, making access policies visible directly in the AWS console
Automatic synchronization: When resources are created, modified, or terminated, the next ACL generation cycle automatically reflects these changes
Multi-account support: The system uses cross-account IAM roles to discover resources across an entire AWS Organization
Audit trail: All changes are logged, and ACL versions are stored for compliance and rollback capabilities
Validation pipeline: Multiple validation steps ensure that only valid ACLs are deployed
- Group existence verification
- Tailscale ACL syntax validation
- Empty group detection
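The group checks in the validation pipeline can run locally before the ACL is sent to Tailscale for syntax validation. The sketch below shows one way to do that. The check_groups name, the known_groups argument (the set of groups that exist in the identity provider), and the message wording are assumptions.

```python
def check_groups(policy: dict, known_groups: set[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    defined = policy.get("groups", {})

    for name, members in defined.items():
        if name not in known_groups:
            problems.append(f"unknown group: {name}")
        elif not members:
            problems.append(f"empty group: {name}")

    # Every group used as a rule source must also be defined.
    for rule in policy.get("acls", []):
        for src in rule.get("src", []):
            if src.startswith("group:") and src not in defined:
                problems.append(f"undefined group in rule: {src}")

    return problems


candidate = {
    "groups": {
        "group:infra@company.com": ["alice@company.com"],
        "group:devops@company.com": [],
        "group:typo@company.com": ["bob@company.com"],
    },
    "acls": [
        {"action": "accept", "src": ["group:qa@company.com"], "dst": ["host:22"]},
    ],
}

problems = check_groups(
    candidate,
    known_groups={"group:infra@company.com", "group:devops@company.com"},
)
```

A deployment step would then refuse to upload the ACL while the returned list is non-empty.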

Lessons learned

Throughout this process, Corelight gained several key takeaways for making the implementation of this system smoother.

Tag standardization is critical: Establish clear tagging conventions early to prevent confusion and parsing errors.
Error handling matters: Network calls, API limits, and temporary failures require robust retry mechanisms.
Validation prevents outages: Multiple validation layers catch errors before they impact production access.
Monitoring is essential: Comprehensive logging helps troubleshoot issues and provides audit trails.
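The article does not describe the retry mechanism itself. A common pattern, shown here purely as an illustration, is retrying with exponential backoff. The with_retries helper, its defaults, and the choice of exceptions to retry on are assumptions.

```python
import time


def with_retries(call, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a function that fails twice before succeeding.
# The sleep function is replaced so the example records delays instead of waiting.
state = {"calls": 0}
delays = []


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = with_retries(flaky, sleep=delays.append)
```

Passing the sleep function as a parameter keeps the helper testable, and a real deployment might also add jitter or honor rate-limit headers.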

Future enhancements

Looking ahead, Corelight’s team hopes to implement several improvements over time. These include drift checks that measure how much the infrastructure and access have changed between runs, to ensure the change is within expectations. They’re also interested in extending this system beyond AWS to Azure and GCP resources. If those resources can follow the same tagging conventions, Corelight believes only the "Resource discovery" step will need to be implemented for each new provider. Finally, they hope to improve the system’s efficiency. Currently, without any parallelization, a run across 4 regions in 28 AWS accounts completes in under 10 minutes and generates 35 hosts, 6 groups, and 57 ACL blocks.
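Since drift checks are still a planned enhancement, the following is only a guess at what one might look like: it compares the hosts and groups sections of two ACL versions and reports the fraction of entries that were added, removed, or changed. The drift_ratio name, the sections compared, and the 0.5 threshold are all hypothetical.

```python
def drift_ratio(previous: dict, current: dict) -> float:
    """Fraction of hosts and groups entries that differ between two ACLs."""
    changed = 0
    total = 0
    for section in ("hosts", "groups"):
        old = previous.get(section, {})
        new = current.get(section, {})
        keys = old.keys() | new.keys()
        total += len(keys)
        # An entry counts as changed if it was added, removed, or modified.
        changed += sum(1 for key in keys if old.get(key) != new.get(key))
    return changed / total if total else 0.0


before = {"hosts": {"a": "10.0.0.1/32", "b": "10.0.0.2/32"}}
after = {"hosts": {"a": "10.0.0.1/32", "c": "10.0.0.3/32"}}

ratio = drift_ratio(before, after)
MAX_DRIFT = 0.5  # hypothetical threshold
needs_review = ratio > MAX_DRIFT
```

In this example one host was removed and one was added out of three distinct hosts, so the run would be flagged for review under the hypothetical threshold.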

Conclusion

By using AWS resource tags as the source of truth for network access policies, Corelight created a system that scales with their infrastructure while maintaining security and compliance requirements. Combined with Tailscale ACLs, it gives them a dynamic, maintainable security framework that keeps access policies manageable as the environment grows more complex.
