Context
I was writing documentation for bon, my proc-macro crate that generates builders for functions and structs. I wrote the following simple example to compare bon with an alternative crate, buildstructor.
#[derive(buildstructor::Builder)]
struct User {
    name: String
}

User::builder()
    .name("Foo")
    .build();
This example code was part of a doc comment, which I tested by running cargo test --doc. However, it didn't compile:
cannot find type `User` in this scope
--> doc-test/path/here.rs
|
2 | struct User {
| ^^^^ not found in this scope
Suddenly, the code generated by the macro can't find the User struct it was placed on 🤨. This is where the weird behaviour needs some explanation. To figure out what's happening, let's build an understanding of how name resolution works for "local" items.
Name resolution for local items
It's possible to define an item such as a struct, impl block or fn inside any block expression in Rust. For example, this code defines a "local" struct inside of a function block:
fn example() {
    struct User;
    let user = User;
}
Here, the User struct is only accessible inside of the function block's scope. We can't reference it outside of this function:
fn example() {
    struct User;
}

// error: cannot find type `User` in this scope
type Foo = User;

mod child_module {
    // error: unresolved import `super::User`; no `User` in the root
    use super::User;
}
This doesn't work because, logically, there should be something in the path that says {fn example()}::User. However, there is no syntax in Rust to express the {fn example()} scope.
But what does std::any::type_name() return for that User struct then? Let's figure this out:
fn example() -> &'static str {
    struct User;
    std::any::type_name::<User>()
}

fn main() {
    println!("{}", example());
}
This outputs the following:
crate_name::example::User
So, the function name becomes part of the path as if it were just a simple module. However, this isn't true, or at least this behaviour isn't exposed in the language. If we try to reference User from the surrounding scope using that syntax, we are still out of luck:
fn example() {
    struct User;
}

type Foo = example::User;
This generates a compile error:
error[E0433]: failed to resolve: function `example` is not a crate or module
--> path/to/code.rs
|
6 | type Foo = example::User;
| ^^^^^^^ function `example` is not a crate or module
So there is just no way to refer to the User struct outside of the function scope, right?... Wrong 😱! There is a way to do this, but it's so convoluted that we'd better assume it can't be done in production code.
If you are curious, first, try to solve this yourself:
fn example() {
    struct User;
}

type Foo = /* how can we get the `User` type from the `example` function here? */;
and then take a look at the solution below:
Solution for referring to a local item outside the function body.
The idea is to implement a trait for the local type and then use that trait in the outside scope to get the local type.
trait TypeChannel {
    type Type;
}

struct GetUserType;

fn example() {
    struct User {
        name: String
    }

    // We can implement a trait from the surrounding scope
    // that uses the local item.
    impl TypeChannel for GetUserType {
        type Type = User;
    }
}

type RootUser = <GetUserType as TypeChannel>::Type;

// Another interesting fact. The fields of the `User` struct aren't private
// in the root scope. You can create the `User` struct via the `RootUser` type
// alias and reference its fields in the top-level scope just fine 😱.
fn main() {
    let user = RootUser {
        name: "Bon".to_owned()
    };
    println!("Here is {}!", user.name);
}
Now this compiles... but well, I'd rather burn this code with fire 🔥.
By the way, rust-analyzer doesn't support this pattern. It can't resolve the RootUser type and its fields, but rustc works fine with this.
Now, let's see what happens if we define a child module inside of the function block.
fn example() {
    struct User;

    mod child_module {
        use super::User;
    }
}
Does this compile? Surely, it should compile, because the child module becomes a child of the function scope, so it should have access to symbols defined in the function, right?... Wrong 😱!
It still doesn't compile with the error:
unresolved import `super::User`; no `User` in the root
This is because super doesn't refer to the parent function scope. Instead, it refers to the top-level module (called root by the compiler in the error message) that defines the example() function. For example, this code compiles:
struct TopLevelStruct;

fn example() {
    struct User;

    mod child_module {
        use super::TopLevelStruct;
    }
}
As you can see, we took TopLevelStruct from super, which means super refers to the module surrounding the example function, and we already know how hacky it is to access symbols defined inside that example function from within the surrounding module.
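To make the behaviour of super concrete, here is a small self-contained sketch. The names top_level, super_demo and via_super are made up for illustration. The same function name is defined both at the top level and inside the function body, so we can observe which one each path resolves to.

```rust
// Defined in the top-level module (the crate root in this sketch).
fn top_level() -> &'static str {
    "top-level module"
}

fn super_demo() -> (&'static str, &'static str) {
    // A local item with the same name, visible only inside this function body.
    fn top_level() -> &'static str {
        "function scope"
    }

    mod child_module {
        pub fn via_super() -> &'static str {
            // `super` skips the function scope and resolves to the module
            // that contains `super_demo`, so this calls the outer function.
            super::top_level()
        }
    }

    // Inside the function body, the local item shadows the top-level one.
    (top_level(), child_module::via_super())
}

fn main() {
    let (plain, through_super) = super_demo();
    println!("plain path: {plain}");
    println!("super path: {through_super}");
}
```

The plain path reports "function scope", while the super path reports "top-level module", even though the child module is written inside the function.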
So... this brings us to the following dilemma.
How does this affect macros?
Macros generate code, and that code shouldn't always be fully accessible from the scope where the macro was invoked. For example, a macro that generates a builder struct would want to hide the private fields of the generated builder struct from the surrounding module.
I'll use bon's macro syntax to showcase this.
use bon::Builder;

#[derive(Builder)]
struct User {
    name: String,
}
Let's see what the generated code for this example may look like (very simplified).
TIP
The real code generated by #[bon::builder] is a bit more complex: it uses the typestate pattern to catch all potential developer errors at compile time 😱.
struct User {
    name: String,
}

#[derive(Default)]
struct UserBuilder {
    name: Option<String>,
}

/* {snipped} ... impl blocks for `UserBuilder` that define setters ... */

fn example() {
    let builder = UserBuilder::default();

    // oops, we can access the builder's internal fields here
    let _ = builder.name;
}
The problem with this approach is that UserBuilder is defined in the same module scope as the User struct. It means all fields of UserBuilder are accessible by this module. This is how the visibility of private fields works in Rust: the entire module where the struct is defined has access to the private fields of that struct.
The way to avoid this problem is to define the builder in a nested child module, to make private fields of the builder struct accessible only within that child module.
struct User {
    name: String,
}

use user_builder::UserBuilder;

mod user_builder {
    use super::*;

    #[derive(Default)]
    pub(super) struct UserBuilder {
        name: Option<String>,
    }
}

fn example() {
    let builder = UserBuilder::default();

    // Nope, we can't access the builder's fields now.
    // let _ = builder.name;
}
So... problem solved, right?... Wrong 😱!
Now imagine our builder macro is invoked for a struct defined inside of a local function scope:
use bon::Builder;

fn example() {
    struct Password(String);

    #[derive(Builder)]
    struct User {
        password: Password,
    }
}
If #[derive(Builder)] creates a child module, then we have a problem. Let's see the generated code:
fn example() {
    struct Password(String);

    struct User {
        password: Password,
    }

    mod user_builder {
        use super::*;

        pub(super) struct UserBuilder {
            password: Option<Password>,
        }
    }
}
This doesn't compile with the error:
password: Option<Password>,
^^^^^^^^ not found in this scope
Why is that? As discussed above, child modules defined inside function blocks can't access symbols defined in the function's scope. The use super::* imports items from the surrounding top-level module instead of the function scope.
It means that if we want to support local items in our macro, we just can't use a child module when the code inside that child module needs to reference types (or any other items) from the surrounding scope.
The core problem is the conflict:
- We want to make the builder's fields private, so we need to define the builder struct inside of a child module.
- We want to reference types from the surrounding scope in the builder's fields, including local items, so we can't define the builder struct inside the child module.
This is the problem that I found in buildstructor. The only way to solve it is to make a compromise, which I did when implementing #[derive(bon::Builder)]. The compromise is not to use a child module, and instead obfuscate the private fields of the builder struct with a leading __ and #[doc(hidden)] attributes to make it hard for the user to access them (even though it's not physically impossible).
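As a rough illustration of that compromise, here is a hand-written sketch. It is not bon's real output (which is more involved); the function local_builder_demo, the setter and the panic message are assumptions made up for this example. The point is only that, without a child module, the builder can name the local Password type.

```rust
fn local_builder_demo() -> String {
    // Local items, visible only inside this function body.
    struct Password(String);

    struct User {
        password: Password,
    }

    // Hypothetical stand-in for macro output: the builder lives next to
    // `User` (no child module), so it can reference the local `Password`.
    #[derive(Default)]
    struct UserBuilder {
        // Not truly private here, only obfuscated and hidden from docs.
        #[doc(hidden)]
        __password: Option<Password>,
    }

    impl UserBuilder {
        fn password(mut self, value: Password) -> Self {
            self.__password = Some(value);
            self
        }

        fn build(self) -> User {
            User {
                password: self.__password.expect("password must be set"),
            }
        }
    }

    let user = UserBuilder::default()
        .password(Password("hunter2".to_owned()))
        .build();

    user.password.0
}

fn main() {
    println!("{}", local_builder_demo());
}
```

The trade-off is visible in the sketch: code in the same scope could still write builder.__password, it is just discouraged by the naming.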
But then... Defining types inside of functions is rather a niche use case. How do child modules in macro-generated code break the doc test mentioned at the beginning of this article?
How does this break doc tests?
Doc tests are usually code snippets that run some code defined on the top level. They don't typically contain an explicit main() function.
For example, a doc test like this:
let foo = 1 + 1;
assert_eq!(foo, 2);
is implicitly wrapped by rustdoc in a main() function like this:
fn main() {
    let foo = 1 + 1;
    assert_eq!(foo, 2);
}
So... if we write a code example in a doc comment with a macro that generates a child module, the doc test will probably not compile. This is what happened in the original doc test featuring buildstructor.
Let's bring it up again:
#[derive(buildstructor::Builder)]
struct User {
    name: String
}

User::builder()
    .name("Foo")
    .build();
When preprocessing the doc test, rustdoc wraps this code in main():
fn main() {
    #[derive(buildstructor::Builder)]
    struct User {
        name: String
    }

    User::builder()
        .name("Foo")
        .build();
}
Then buildstructor generates a child module that refers to User (the following code is simplified):
fn main() {
    struct User {
        name: String
    }

    mod user_builder {
        use super::*;

        struct UserBuilder {
            name: Option<String>
        }

        impl UserBuilder {
            // `User` is inaccessible here
            fn build(self) -> User {
                /* */
            }
        }
    }
}
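For contrast, rustdoc leaves a doc test unwrapped when the snippet already contains its own fn main, so the struct stays a top-level item. The sketch below is a hand-written approximation of the same expansion in that layout, not buildstructor's actual output; the setter, the build_user helper and the panic message are assumptions for illustration. With User at the top level, use super::* finds it.

```rust
// `User` is now a top-level item rather than a local one.
struct User {
    name: String,
}

mod user_builder {
    // `super` is the module that defines `User`, so the glob import works.
    use super::*;

    #[derive(Default)]
    pub(super) struct UserBuilder {
        name: Option<String>,
    }

    impl UserBuilder {
        pub(super) fn name(mut self, value: &str) -> Self {
            self.name = Some(value.to_owned());
            self
        }

        // `User` is accessible here.
        pub(super) fn build(self) -> User {
            User {
                name: self.name.expect("name must be set"),
            }
        }
    }
}

// Hypothetical helper so the result can be inspected.
fn build_user() -> User {
    user_builder::UserBuilder::default().name("Foo").build()
}

fn main() {
    println!("{}", build_user().name);
}
```

This only works around the symptom in a single doc test; every user of the macro who writes a doc test without an explicit main() would hit the same error.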
Summary
Does this mean generating child modules for privacy in macros is generally a bad idea? It depends... The main thing is not to reference items from the surrounding scope in the child module. For example, if you need to add use super::* in your macro-generated code, then this is already a bad call. You should think of local items and doc tests when you do this.
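One possible shape for a child module that names nothing from the surrounding scope is to make the generated type generic over the field types. This is only a sketch of the idea, not something bon or buildstructor is known to do; generic_builder_demo, the type parameter P and the finish method are made up for illustration.

```rust
fn generic_builder_demo() -> String {
    // A local type that a child module could not name directly.
    struct Password(String);

    mod user_builder {
        // No `use super::*` here: the field type arrives as the generic
        // parameter `P`, so nothing from the outside is referenced by name.
        pub(super) struct UserBuilder<P> {
            password: Option<P>,
        }

        impl<P> UserBuilder<P> {
            pub(super) fn new() -> Self {
                Self { password: None }
            }

            pub(super) fn password(mut self, value: P) -> Self {
                self.password = Some(value);
                self
            }

            // Simplified: hands back the collected value instead of
            // constructing the target struct.
            pub(super) fn finish(self) -> P {
                self.password.expect("password must be set")
            }
        }
    }

    let password = user_builder::UserBuilder::new()
        .password(Password("secret".to_owned()))
        .finish();

    // The `password` field of the builder is private to `user_builder`,
    // so only the value returned by `finish` is reachable here.
    password.0
}

fn main() {
    println!("{}", generic_builder_demo());
}
```

The simplification in finish hints at the cost: constructing the target struct inside the module would again require naming it, so this approach does not remove the conflict on its own.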
If you liked this article, check out my previous blog post 'How to do named function arguments in Rust' (it's also available on Reddit). Also, check out the bon crate on GitHub. Consider giving it a star ⭐ if you like it.
TIP
You can leave comments for this post on Reddit.