Commit 8ba714bd authored by uuo00_n

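feat(core config): migrate BaseSettings to pydantic-settings and add the APP_MODE setting

refactor(role permissions): add role-level mapping and edition validation module

feat(user auth): include role level and edition information in JWT tokens

feat(dashboard): return dynamic views based on role level and edition

docs(model comments): improve user model field descriptions and add Pydantic v2 compatibility

This commit mainly includes the following improvements:
1. Migrate BaseSettings from pydantic to the pydantic-settings package
2. Add the APP_MODE setting to support isolating the education and enterprise editions
3. Create a centralized module for role-permission definitions
4. Extend JWT tokens to carry user permission information
5. Implement dynamic content in the dashboard endpoint
6. Improve field comments and type hints in the user model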
parent 0bbc8096
#!/Users/uu/Desktop/dles_prj/llm-filter/.venv/bin/python3
# -*- coding: utf-8 -*-
import re
import sys
from email_validator.__main__ import main
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())
This is free and unencumbered software released into the public
domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a
compiled binary, for any purpose, commercial or non-commercial,
and by any means.
In jurisdictions that recognize copyright laws, the author or
authors of this software dedicate any and all copyright
interest in the software to the public domain. We make this
dedication for the benefit of the public at large and to the
detriment of our heirs and successors. We intend this
dedication to be an overt act of relinquishment in perpetuity
of all present and future rights to this software under
copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR
ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
For more information, please refer to <https://unlicense.org/>
Metadata-Version: 2.1
Name: email_validator
Version: 2.2.0
Summary: A robust email address syntax and deliverability validation library.
Home-page: https://github.com/JoshData/python-email-validator
Author: Joshua Tauberer
Author-email: jt@occams.info
License: Unlicense
Keywords: email address validator
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: The Unlicense (Unlicense)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: dnspython >=2.0.0
Requires-Dist: idna >=2.0.0
email-validator: Validate Email Addresses
=========================================
A robust email address syntax and deliverability validation library for
Python 3.8+ by [Joshua Tauberer](https://joshdata.me).
This library validates that a string is of the form `name@example.com`
and optionally checks that the domain name is set up to receive email.
This is the sort of validation you would want when you are identifying
users by their email address like on a registration form.
Key features:
* Checks that an email address has the correct syntax --- great for
email-based registration/login forms or validating data.
* Gives friendly English error messages when validation fails that you
can display to end-users.
* Checks deliverability (optional): Does the domain name resolve?
(You can override the default DNS resolver to add query caching.)
* Supports internationalized domain names (like `@ツ.life`),
internationalized local parts (like `ツ@example.com`),
and optionally parses display names (e.g. `"My Name" <me@example.com>`).
* Rejects addresses with invalid or unsafe Unicode characters,
obsolete email address syntax that you'd find unexpected,
special use domain names like `@localhost`,
and domains without a dot by default.
This is an opinionated library!
* Normalizes email addresses (important for internationalized
and quoted-string addresses! see below).
* Python type annotations are used.
This is an opinionated library. You should definitely also consider using
the less-opinionated [pyIsEmail](https://github.com/michaelherold/pyIsEmail)
if it works better for you.
[![Build Status](https://github.com/JoshData/python-email-validator/actions/workflows/test_and_build.yaml/badge.svg)](https://github.com/JoshData/python-email-validator/actions/workflows/test_and_build.yaml)
View the [CHANGELOG / Release Notes](CHANGELOG.md) for the version history of changes in the library. Occasionally this README is ahead of the latest published package --- see the CHANGELOG for details.
---
Installation
------------
This package [is on PyPI](https://pypi.org/project/email-validator/), so:
```sh
pip install email-validator
```
(You might need to use `pip3` depending on your local environment.)
Quick Start
-----------
If you're validating a user's email address before creating a user
account in your application, you might do this:
```python
from email_validator import validate_email, EmailNotValidError
email = "my+address@example.org"
try:
    # Check that the email address is valid. Turn on check_deliverability
    # for first-time validations like on account creation pages (but not
    # login pages).
    emailinfo = validate_email(email, check_deliverability=False)
    # After this point, use only the normalized form of the email address,
    # especially before going to a database query.
    email = emailinfo.normalized
except EmailNotValidError as e:
    # The exception message is a human-readable explanation of why it's
    # not a valid (or deliverable) email address.
    print(str(e))
```
This validates the address and gives you its normalized form. You should
**put the normalized form in your database** and always normalize before
checking if an address is in your database. When using this in a login form,
set `check_deliverability` to `False` to avoid unnecessary DNS queries.
Usage
-----
### Overview
The module provides a function `validate_email(email_address)` which
takes an email address and:
- Raises an `EmailNotValidError` with a helpful, human-readable error
message explaining why the email address is not valid, or
- Returns an object with a normalized form of the email address (which
you should use!) and other information about it.
When an email address is not valid, `validate_email` raises either an
`EmailSyntaxError` if the form of the address is invalid or an
`EmailUndeliverableError` if the domain name fails DNS checks. Both
exception classes are subclasses of `EmailNotValidError`, which in turn
is a subclass of `ValueError`.
But when an email address is valid, an object is returned containing
a normalized form of the email address (which you should use!) and
other information.
The validator doesn't, by default, permit obsoleted forms of email addresses
that no one uses anymore even though they are still valid and deliverable, since
they will probably give you grief if you're using email for login. (See
later in the document about how to allow some obsolete forms.)
The validator optionally checks that the domain name in the email address has
a DNS MX record indicating that it can receive email. (Except a Null MX record.
If there is no MX record, a fallback A/AAAA-record is permitted, unless
a reject-all SPF record is present.) DNS is slow and sometimes unavailable or
unreliable, so consider whether these checks are useful for your use case and
turn them off if they aren't.
There is nothing to be gained by trying to actually contact an SMTP server, so
that's not done here. For privacy, security, and practicality reasons, servers
are good at not giving away whether an address is
deliverable or not: email addresses that appear to accept mail at first
can bounce mail after a delay, and bounced mail may indicate a temporary
failure of a good email address (sometimes an intentional failure, like
greylisting).
### Options
The `validate_email` function also accepts the following keyword arguments
(defaults are as shown below):
`check_deliverability=True`: If true, DNS queries are made to check that the domain name in the email address (the part after the @-sign) can receive mail, as described above. Set to `False` to skip this DNS-based check. It is recommended to pass `False` when performing validation for login pages (but not account creation pages) since re-validation of a previously validated domain in your database by querying DNS at every login is probably undesirable. You can also set `email_validator.CHECK_DELIVERABILITY` to `False` to turn this off for all calls by default.
`dns_resolver=None`: Pass an instance of [dns.resolver.Resolver](https://dnspython.readthedocs.io/en/latest/resolver-class.html) to control the DNS resolver including setting a timeout and [a cache](https://dnspython.readthedocs.io/en/latest/resolver-caching.html). The `caching_resolver` function shown below is a helper function to construct a dns.resolver.Resolver with a [LRUCache](https://dnspython.readthedocs.io/en/latest/resolver-caching.html#dns.resolver.LRUCache). Reuse the same resolver instance across calls to `validate_email` to make use of the cache.
`test_environment=False`: If `True`, DNS-based deliverability checks are disabled and `test` and `**.test` domain names are permitted (see below). You can also set `email_validator.TEST_ENVIRONMENT` to `True` to turn it on for all calls by default.
`allow_smtputf8=True`: Set to `False` to prohibit internationalized addresses that would
require the
[SMTPUTF8](https://tools.ietf.org/html/rfc6531) extension. You can also set `email_validator.ALLOW_SMTPUTF8` to `False` to turn it off for all calls by default.
`allow_quoted_local=False`: Set to `True` to allow obscure and potentially problematic email addresses in which the part of the address before the @-sign contains spaces, @-signs, or other surprising characters when the local part is surrounded in quotes (so-called quoted-string local parts). In the object returned by `validate_email`, the normalized local part removes any unnecessary backslash-escaping and even removes the surrounding quotes if the address would be valid without them. You can also set `email_validator.ALLOW_QUOTED_LOCAL` to `True` to turn this on for all calls by default.
`allow_domain_literal=False`: Set to `True` to allow bracketed IPv4 and "IPv6:"-prefixed IPv6 addresses in the domain part of the email address. No deliverability checks are performed for these addresses. In the object returned by `validate_email`, the normalized domain will use the condensed IPv6 format, if applicable. The object's `domain_address` attribute will hold the parsed `ipaddress.IPv4Address` or `ipaddress.IPv6Address` object if applicable. You can also set `email_validator.ALLOW_DOMAIN_LITERAL` to `True` to turn this on for all calls by default.
`allow_display_name=False`: Set to `True` to allow a display name and bracketed address in the input string, like `My Name <me@example.org>`. It's implemented in the spirit but not the letter of RFC 5322 3.4, so it may be stricter or more relaxed than what you want. The display name, if present, is provided in the returned object's `display_name` field after being unquoted and unescaped. You can also set `email_validator.ALLOW_DISPLAY_NAME` to `True` to turn this on for all calls by default.
`allow_empty_local=False`: Set to `True` to allow an empty local part (i.e.
`@example.com`), e.g. for validating Postfix aliases.
### DNS timeout and cache
When validating many email addresses or to control the timeout (the default is 15 seconds), create a caching [dns.resolver.Resolver](https://dnspython.readthedocs.io/en/latest/resolver-class.html) to reuse in each call. The `caching_resolver` function returns one easily for you:
```python
from email_validator import validate_email, caching_resolver
resolver = caching_resolver(timeout=10)
while True:
    validate_email(email, dns_resolver=resolver)
```
### Test addresses
This library rejects email addresses that use the [Special Use Domain Names](https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml) `invalid`, `localhost`, `test`, and some others by raising `EmailSyntaxError`. This is to protect your system from abuse: You probably don't want a user to be able to cause an email to be sent to `localhost` (although they might be able to still do so via a malicious MX record). However, in your non-production test environments you may want to use `@test` or `@myname.test` email addresses. There are three ways you can allow this:
1. Add `test_environment=True` to the call to `validate_email` (see above).
2. Set `email_validator.TEST_ENVIRONMENT` to `True` globally.
3. Remove the special-use domain name that you want to use from `email_validator.SPECIAL_USE_DOMAIN_NAMES`, e.g.:
```python
import email_validator
email_validator.SPECIAL_USE_DOMAIN_NAMES.remove("test")
```
It is tempting to use `@example.com/net/org` in tests. They are *not* in this library's `SPECIAL_USE_DOMAIN_NAMES` list so you can, but shouldn't, use them. These domains are reserved to IANA for use in documentation so there is no risk of accidentally emailing someone at those domains. But beware that this library will nevertheless reject these domain names if DNS-based deliverability checks are not disabled because these domains do not resolve to domains that accept email. In tests, consider using your own domain name or `@test` or `@myname.test` instead.
Internationalized email addresses
---------------------------------
The email protocol SMTP and the domain name system DNS have historically
only allowed English (ASCII) characters in email addresses and domain names,
respectively. Each has adapted to internationalization in a separate
way, creating two separate aspects to email address internationalization.
(If your mail submission library doesn't support Unicode at all, then
immediately prior to mail submission you must replace the email address with
its ASCII-ized form. This library gives you back the ASCII-ized form in the
`ascii_email` field in the returned object.)
### Internationalized domain names (IDN)
The first is [internationalized domain names (RFC
5891)](https://tools.ietf.org/html/rfc5891), a.k.a IDNA 2008. The DNS
system has not been updated with Unicode support. Instead, internationalized
domain names are converted into a special IDNA ASCII "[Punycode](https://www.rfc-editor.org/rfc/rfc3492.txt)"
form starting with `xn--`. When an email address has non-ASCII
characters in its domain part, the domain part is replaced with its IDNA
ASCII equivalent form in the process of mail transmission. Your mail
submission library probably does this for you transparently. ([Compliance
around the web is not very good though](http://archives.miloush.net/michkap/archive/2012/02/27/10273315.html).) This library conforms to IDNA 2008
using the [idna](https://github.com/kjd/idna) module by Kim Davies.
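To make the `xn--` form concrete, here is a small sketch that uses only Python's built-in `punycode` codec. It is not how this library does the conversion (the library relies on the `idna` module, which also applies IDNA 2008 validation and mapping); the helper names `to_ascii_label` and `to_ascii_domain` are hypothetical.

```python
def to_ascii_label(label: str) -> str:
    """Encode one domain label; ASCII-only labels pass through unchanged."""
    if label.isascii():
        return label
    # Raw Punycode plus the "xn--" prefix. No IDNA validation is performed.
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_domain(domain: str) -> str:
    """Encode each dot-separated label of a domain name."""
    return ".".join(to_ascii_label(part) for part in domain.split("."))


print(to_ascii_domain("ツ.life"))  # prints "xn--bdk.life"
```

For real input, prefer the `ascii_domain` field returned by `validate_email`, since this sketch skips the checks that IDNA requires.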
### Internationalized local parts
The second sort of internationalization is internationalization in the
*local* part of the address (before the @-sign). In non-internationalized
email addresses, only English letters, numbers, and some punctuation
(`._!#$%&'^``*+-=~/?{|}`) are allowed. In internationalized email address
local parts, a wider range of Unicode characters are allowed.
Email addresses with these non-ASCII characters require that your mail
submission library and all the mail servers along the route to the destination,
including your own outbound mail server, all support the
[SMTPUTF8 (RFC 6531)](https://tools.ietf.org/html/rfc6531) extension.
Support for SMTPUTF8 varies. If you know ahead of time that SMTPUTF8 is not
supported by your mail submission stack, then you must filter out addresses that
require SMTPUTF8 using the `allow_smtputf8=False` keyword argument (see above).
This will cause the validation function to raise an `EmailSyntaxError` if
delivery would require SMTPUTF8. If you do not set `allow_smtputf8=False`,
you can also check the value of the `smtputf8` field in the returned object.
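The rule behind the `smtputf8` flag can be approximated without the library: only non-ASCII characters in the local part force SMTPUTF8, because a non-ASCII domain can still be sent in its IDNA ASCII form. The sketch below assumes the input has already been validated, and `requires_smtputf8` is a hypothetical helper name.

```python
def requires_smtputf8(address: str) -> bool:
    """Return True if the local part contains non-ASCII characters."""
    # Split on the last @-sign; quoted local parts may contain @-signs themselves.
    local_part, _, _domain = address.rpartition("@")
    return not local_part.isascii()


print(requires_smtputf8("ツ-test@joshdata.me"))  # prints "True"
print(requires_smtputf8("example@ツ.life"))      # prints "False"
```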
### Unsafe Unicode characters are rejected
A surprisingly large number of Unicode characters are not safe to display,
especially when the email address is concatenated with other text, so this
library tries to protect you by not permitting reserved, noncharacter, private use,
formatting (which can be used to alter the display order of characters),
whitespace, and control characters, and combining characters
as the first character of the local part and the domain name (so that they
cannot combine with something outside of the email address string or with
the @-sign). See https://qntm.org/safe and https://trojansource.codes/
for relevant prior work. (Other than whitespace, these are checks that
you should be applying to nearly all user inputs in a security-sensitive
context.) This does not guard against the well known problem that many
Unicode characters look alike, which can be used to fool humans reading
displayed text.
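A check of this kind can be sketched with the standard `unicodedata` module. The category set below is an assumption made for illustration and is not copied from this library's source; `unsafe_characters` is a hypothetical helper.

```python
import unicodedata
from typing import List

# General categories treated as unsafe anywhere in the string (an assumption
# of this sketch): control, format, unassigned, private use, surrogate,
# and the three whitespace separator categories.
UNSAFE_CATEGORIES = {"Cc", "Cf", "Cn", "Co", "Cs", "Zl", "Zp", "Zs"}


def unsafe_characters(text: str) -> List[str]:
    """Return 'U+XXXX (category)' for each character this sketch flags."""
    found = []
    for index, char in enumerate(text):
        category = unicodedata.category(char)
        # Combining marks (categories starting with "M") are flagged only
        # when they are the first character of the string.
        if category in UNSAFE_CATEGORIES or (index == 0 and category.startswith("M")):
            found.append(f"U+{ord(char):04X} ({category})")
    return found


print(unsafe_characters("a\u202eb"))  # right-to-left override, a format character
print(unsafe_characters("\u0301a"))   # combining acute accent in first position
```

In practice the local part and the domain would each be checked separately, so that a leading combining character in either one is caught.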
Normalization
-------------
### Unicode Normalization
The use of Unicode in email addresses introduced a normalization
problem. Different Unicode strings can look identical and have the same
semantic meaning to the user. The `normalized` field returned on successful
validation provides the correctly normalized form of the given email
address.
For example, the CJK fullwidth Latin letters are considered semantically
equivalent in domain names to their ASCII counterparts. This library
normalizes them to their ASCII counterparts (as required by IDNA):
```python
emailinfo = validate_email("me@Ｄｏｍａｉｎ.com")
print(emailinfo.normalized)
print(emailinfo.ascii_email)
# prints "me@domain.com" twice
```
Because an end-user might type their email address in different (but
equivalent) un-normalized forms at different times, you ought to
replace what they enter with the normalized form immediately prior to
going into your database (during account creation), querying your database
(during login), or sending outbound mail.
The normalizations include lowercasing the domain part of the email
address (domain names are case-insensitive), [Unicode "NFC"
normalization](https://en.wikipedia.org/wiki/Unicode_equivalence) of the
whole address (which turns characters plus [combining
characters](https://en.wikipedia.org/wiki/Combining_character) into
precomposed characters where possible), replacement of [fullwidth and
halfwidth
characters](https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms)
in the domain part, possibly other
[UTS46](http://unicode.org/reports/tr46) mappings on the domain part,
and conversion from Punycode to Unicode characters.
Normalization may change the characters in the email address and the
length of the email address, such that a string might be a valid address
before normalization but invalid after, or vice versa. This library only
permits addresses that are valid both before and after normalization.
(See [RFC 6532 (internationalized email) section
3.1](https://tools.ietf.org/html/rfc6532#section-3.1) and [RFC 5895
(IDNA 2008) section 2](http://www.ietf.org/rfc/rfc5895.txt).)
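The effect of NFC normalization on string length can be seen with the standard `unicodedata` module alone. This sketch shows only the NFC step, not the domain-specific mappings described above, and the address used is an illustrative value.

```python
import unicodedata

# "e" followed by a combining acute accent (two code points).
decomposed = "jose\u0301@example.com"

# NFC composes the pair into the single precomposed character U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(composed == "jos\u00e9@example.com")
```

The two strings render identically but compare unequal until normalized, which is why the normalized form should be what is stored and queried.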
### Other Normalization
Normalization is also applied to quoted-string local parts and domain
literal IPv6 addresses if you have allowed them by the `allow_quoted_local`
and `allow_domain_literal` options. In quoted-string local parts, unnecessary
backslash escaping is removed and even the surrounding quotes are removed if
they are unnecessary. For IPv6 domain literals, the IPv6 address is
normalized to condensed form. [RFC 2142](https://datatracker.ietf.org/doc/html/rfc2142)
also requires lowercase normalization for some specific mailbox names like `postmaster@`.
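The condensed IPv6 form mentioned above is what the standard `ipaddress` module calls the compressed representation. The sketch below strips the `[IPv6:` prefix and `]` suffix by hand, which is a simplification and not how the library parses domain literals.

```python
import ipaddress

literal = "[IPv6:2001:0db8:0000:0000:0000:0000:0000:0001]"

# Drop the "[IPv6:" prefix and the closing "]" to get the bare address.
address = ipaddress.IPv6Address(literal[len("[IPv6:"):-1])

normalized_literal = f"[IPv6:{address.compressed}]"
print(normalized_literal)  # prints "[IPv6:2001:db8::1]"
```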
Examples
--------
For the email address `test@joshdata.me`, the returned object is:
```python
ValidatedEmail(
  normalized='test@joshdata.me',
  local_part='test',
  domain='joshdata.me',
  ascii_email='test@joshdata.me',
  ascii_local_part='test',
  ascii_domain='joshdata.me',
  smtputf8=False)
```
For the fictitious but valid address `example@ツ.ⓁⒾⒻⒺ`, which has an
internationalized domain but ASCII local part, the returned object is:
```python
ValidatedEmail(
  normalized='example@ツ.life',
  local_part='example',
  domain='ツ.life',
  ascii_email='example@xn--bdk.life',
  ascii_local_part='example',
  ascii_domain='xn--bdk.life',
  smtputf8=False)
```
Note that `normalized` and other fields provide a normalized form of the
email address, domain name, and (in other cases) local part (see earlier
discussion of normalization), which you should use in your database.
Calling `validate_email` with the ASCII form of the above email address,
`example@xn--bdk.life`, returns the exact same information (i.e., the
`normalized` field always will contain Unicode characters, not Punycode).
For the fictitious address `ツ-test@joshdata.me`, which has an
internationalized local part, the returned object is:
```python
ValidatedEmail(
  normalized='ツ-test@joshdata.me',
  local_part='ツ-test',
  domain='joshdata.me',
  ascii_email=None,
  ascii_local_part=None,
  ascii_domain='joshdata.me',
  smtputf8=True)
```
Now `smtputf8` is `True` and `ascii_email` is `None` because the local
part of the address is internationalized. The `local_part` and `normalized` fields
return the normalized form of the address.
Return value
------------
When an email address passes validation, the fields in the returned object
are:
| Field | Value |
| -----:|-------|
| `normalized` | The normalized form of the email address that you should put in your database. This combines the `local_part` and `domain` fields (see below). |
| `ascii_email` | If set, an ASCII-only form of the normalized email address by replacing the domain part with [IDNA](https://tools.ietf.org/html/rfc5891) [Punycode](https://www.rfc-editor.org/rfc/rfc3492.txt). This field will be present when an ASCII-only form of the email address exists (including if the email address is already ASCII). If the local part of the email address contains internationalized characters, `ascii_email` will be `None`. If set, it merely combines `ascii_local_part` and `ascii_domain`. |
| `local_part` | The normalized local part of the given email address (before the @-sign). Normalization includes Unicode NFC normalization and removing unnecessary quoted-string quotes and backslashes. If `allow_quoted_local` is True and the surrounding quotes are necessary, the quotes _will_ be present in this field. |
| `ascii_local_part` | If set, the local part, which is composed of ASCII characters only. |
| `domain` | The canonical internationalized Unicode form of the domain part of the email address. If the returned string contains non-ASCII characters, either the [SMTPUTF8](https://tools.ietf.org/html/rfc6531) feature of your mail relay will be required to transmit the message or else the email address's domain part must be converted to IDNA ASCII first: Use `ascii_domain` field instead. |
| `ascii_domain` | The [IDNA](https://tools.ietf.org/html/rfc5891) [Punycode](https://www.rfc-editor.org/rfc/rfc3492.txt)-encoded form of the domain part of the given email address, as it would be transmitted on the wire. |
| `domain_address` | If domain literals are allowed and if the email address contains one, an `ipaddress.IPv4Address` or `ipaddress.IPv6Address` object. |
| `display_name` | If no display name was present and angle brackets do not surround the address, this will be `None`; otherwise, it will be set to the display name, or the empty string if there were angle brackets but no display name. If the display name was quoted, it will be unquoted and unescaped. |
| `smtputf8` | A boolean indicating that the [SMTPUTF8](https://tools.ietf.org/html/rfc6531) feature of your mail relay will be required to transmit messages to this address because the local part of the address has non-ASCII characters (the local part cannot be IDNA-encoded). If `allow_smtputf8=False` is passed as an argument, this flag will always be false because an exception is raised if it would have been true. |
| `mx` | A list of (priority, domain) tuples of MX records specified in the DNS for the domain (see [RFC 5321 section 5](https://tools.ietf.org/html/rfc5321#section-5)). May be `None` if the deliverability check could not be completed because of a temporary issue like a timeout. |
| `mx_fallback_type` | `None` if an `MX` record is found. If no MX records are actually specified in DNS and instead are inferred, through an obsolete mechanism, from A or AAAA records, the value is the type of DNS record used instead (`A` or `AAAA`). May be `None` if the deliverability check could not be completed because of a temporary issue like a timeout. |
| `spf` | Any SPF record found while checking deliverability. Only set if the SPF record is queried. |
Assumptions
-----------
By design, this validator does not pass all email addresses that
strictly conform to the standards. Many email address forms are obsolete
or likely to cause trouble:
* The validator assumes the email address is intended to be
usable on the public Internet. The domain part
of the email address must be a resolvable domain name
(see the deliverability checks described above).
Most [Special Use Domain Names](https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml)
and their subdomains, as well as
domain names without a `.`, are rejected as a syntax error
(except see the `test_environment` parameter above).
* Obsolete email syntaxes are rejected:
The unusual ["(comment)" syntax](https://github.com/JoshData/python-email-validator/issues/77)
is rejected. Extremely old obsolete syntaxes are
rejected. Quoted-string local parts and domain-literal addresses
are rejected by default, but there are options to allow them (see above).
No one uses these forms anymore, and I can't think of any reason why anyone
using this library would need to accept them.
Testing
-------
Tests can be run using
```sh
pip install -r test_requirements.txt
make test
```
Tests run with mocked DNS responses. When adding or changing tests, temporarily turn on the `BUILD_MOCKED_DNS_RESPONSE_DATA` flag in `tests/mocked_dns_responses.py` to re-build the database of mocked responses from live queries.
For Project Maintainers
-----------------------
The package is distributed as a universal wheel and as a source package.
To release:
* Update CHANGELOG.md.
* Update the version number in `email_validator/version.py`.
* Make & push a commit with the new version number and make sure tests pass.
* Make & push a tag (see command below).
* Make a release at https://github.com/JoshData/python-email-validator/releases/new.
* Publish a source and wheel distribution to pypi (see command below).
```sh
git tag v$(cat email_validator/version.py | sed "s/.* = //" | sed 's/"//g')
git push --tags
./release_to_pypi.sh
```
License
-------
This project is free of any copyright restrictions per the [Unlicense](https://unlicense.org/). (Prior to Feb. 4, 2024, the project was made available under the terms of the [CC0 1.0 Universal public domain dedication](http://creativecommons.org/publicdomain/zero/1.0/).) See [LICENSE](LICENSE) and [CONTRIBUTING.md](CONTRIBUTING.md).
../../../bin/email_validator,sha256=hFLktj2OUakfhW21WXjWtePBUupYbxMWCwwa_AS0-OA,262
email_validator-2.2.0.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
email_validator-2.2.0.dist-info/LICENSE,sha256=ZyF5dS4QkTSj-yvdB4Cyn9t6A5dPD1hqE66tUSlWLUw,1212
email_validator-2.2.0.dist-info/METADATA,sha256=vELkkg-p-qMuqNFX6uzDmMaruT7Pe5PDAQexHLAB4XM,25741
email_validator-2.2.0.dist-info/RECORD,,
email_validator-2.2.0.dist-info/REQUESTED,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
email_validator-2.2.0.dist-info/WHEEL,sha256=cpQTJ5IWu9CdaPViMhC9YzF8gZuS5-vlfoFihTBC86A,91
email_validator-2.2.0.dist-info/entry_points.txt,sha256=zRM_6bNIUSHTbNx5u6M3nK1MAguvryrc9hICC6HyrBg,66
email_validator-2.2.0.dist-info/top_level.txt,sha256=fYDOSWFZke46ut7WqdOAJjjhlpPYAaOwOwIsh3s8oWI,16
email_validator/__init__.py,sha256=g-TFM6vzpEt4dMG93giGlS343yXXXIy7EOLNFEn6DfA,4360
email_validator/__main__.py,sha256=TIvjaG_OSFRciH0J2pnEJEdX3uJy3ZgocmasEqh9EEI,2243
email_validator/__pycache__/__init__.cpython-39.pyc,,
email_validator/__pycache__/__main__.cpython-39.pyc,,
email_validator/__pycache__/deliverability.cpython-39.pyc,,
email_validator/__pycache__/exceptions_types.cpython-39.pyc,,
email_validator/__pycache__/rfc_constants.cpython-39.pyc,,
email_validator/__pycache__/syntax.cpython-39.pyc,,
email_validator/__pycache__/validate_email.cpython-39.pyc,,
email_validator/__pycache__/version.cpython-39.pyc,,
email_validator/deliverability.py,sha256=e6eODNSaLMiM29EZ3bWYDFkQDlMIdicBaykjYQJwYig,7222
email_validator/exceptions_types.py,sha256=yLxXqwtl5dXa-938K7skLP1pMFgi0oovzCs74mX7TGs,6024
email_validator/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
email_validator/rfc_constants.py,sha256=KVUshwIu699cle3UzDU2_fFBSQOO7p91Z_hrlNANtGM,2767
email_validator/syntax.py,sha256=Mo5KLgEsbQcvNzs8zO5QbhzUK4MAjL9yJFDpwsF12lY,36005
email_validator/validate_email.py,sha256=YUXY5Sv_mQ7Vuu_AmGdISza8v-VaABnNMLrlWv8EIl4,8401
email_validator/version.py,sha256=DKk-1b-rZsJFxFi1JoJ7TmEvIEQ0rf-C9HAZWwvjuM0,22
Wheel-Version: 1.0
Generator: setuptools (70.1.0)
Root-Is-Purelib: true
Tag: py3-none-any
[console_scripts]
email_validator = email_validator.__main__:main
from typing import TYPE_CHECKING
# Export the main method, helper methods, and the public data types.
from .exceptions_types import ValidatedEmail, EmailNotValidError, \
EmailSyntaxError, EmailUndeliverableError
from .validate_email import validate_email
from .version import __version__
__all__ = ["validate_email",
"ValidatedEmail", "EmailNotValidError",
"EmailSyntaxError", "EmailUndeliverableError",
"caching_resolver", "__version__"]
if TYPE_CHECKING:
    from .deliverability import caching_resolver
else:
    def caching_resolver(*args, **kwargs):
        # Lazy load `deliverability` as it is slow to import (due to dns.resolver)
        from .deliverability import caching_resolver
        return caching_resolver(*args, **kwargs)
# These global attributes are a part of the library's API and can be
# changed by library users.
# Default values for keyword arguments.
ALLOW_SMTPUTF8 = True
ALLOW_QUOTED_LOCAL = False
ALLOW_DOMAIN_LITERAL = False
ALLOW_DISPLAY_NAME = False
GLOBALLY_DELIVERABLE = True
CHECK_DELIVERABILITY = True
TEST_ENVIRONMENT = False
DEFAULT_TIMEOUT = 15 # secs
# IANA Special Use Domain Names
# Last Updated 2021-09-21
# https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.txt
#
# The domain names without dots would be caught by the check that the domain
# name in an email address must have a period, but this list will also catch
# subdomains of these domains, which are also reserved.
SPECIAL_USE_DOMAIN_NAMES = [
# The "arpa" entry here is consolidated from a lot of arpa subdomains
# for private address (i.e. non-routable IP addresses like 172.16.x.x)
# reverse mapping, plus some other subdomains. Although RFC 6761 says
# that application software should not treat these domains as special,
# they are private-use domains and so cannot have globally deliverable
# email addresses, which is an assumption of this library, and probably
# all of arpa is similarly special-use, so we reject it all.
"arpa",
# RFC 6761 says applications "SHOULD NOT" treat the "example" domains
# as special, i.e. applications should accept these domains.
#
# The domain "example" alone fails our syntax validation because it
# lacks a dot (we assume no one has an email address on a TLD directly).
# "@example.com/net/org" will currently fail DNS-based deliverability
# checks because IANA publishes a NULL MX for these domains, and
# "@mail.example[.com/net/org]" and other subdomains will fail DNS-
# based deliverability checks because IANA does not publish MX or A
# DNS records for these subdomains.
# "example", # i.e. "www.example"
# "example.com",
# "example.net",
# "example.org",
# RFC 6761 says that applications are permitted to treat this domain
# as special and that DNS should return an immediate negative response,
# so we also immediately reject this domain, which also follows the
# purpose of the domain.
"invalid",
# RFC 6762 says that applications "may" treat ".local" as special and
# that "name resolution APIs and libraries SHOULD recognize these names
# as special," and since ".local" has no global definition, we reject
# it, as we expect email addresses to be globally routable.
"local",
# RFC 6761 says that applications (like this library) are permitted
# to treat "localhost" as special, and since it cannot have a globally
# deliverable email address, we reject it.
"localhost",
# RFC 7686 says "applications that do not implement the Tor protocol
# SHOULD generate an error upon the use of .onion and SHOULD NOT
# perform a DNS lookup."
"onion",
# Although RFC 6761 says that application software should not treat
# these domains as special, it also warns users that the address may
# resolve differently in different systems, and therefore it cannot
# have a globally routable email address, which is an assumption of
# this library, so we reject "@test" and "@*.test" addresses, unless
# the test_environment keyword argument is given, to allow their use
# in application-level test environments. These domains will generally
# fail deliverability checks because "test" is not an actual TLD.
"test",
]
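# A minimal sketch of how a list like the one above might be applied, assuming
# a domain is rejected when it equals an entry or is a subdomain of one. The
# helper name `is_special_use_domain` and the shortened list are hypothetical
# and are not part of this module.

```python
from typing import Iterable

# Shortened, illustrative copy of the list above.
SPECIAL_USE_EXAMPLE = ["arpa", "invalid", "local", "localhost", "onion", "test"]


def is_special_use_domain(domain: str,
                          special_names: Iterable[str] = SPECIAL_USE_EXAMPLE) -> bool:
    # Compare case-insensitively and ignore a trailing root dot.
    domain = domain.lower().rstrip(".")
    # Match the name itself ("local") or any subdomain of it ("printer.local").
    return any(domain == name or domain.endswith("." + name)
               for name in special_names)
```

# Matching on "." + name keeps unrelated names such as "mylocal" from being
# caught by the "local" entry.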
# A command-line tool for testing.
#
# Usage:
#
# python -m email_validator test@example.org
# python -m email_validator < LIST_OF_ADDRESSES.TXT
#
# Provide email addresses to validate either as a command-line argument
# or in STDIN separated by newlines. Validation errors will be printed for
# invalid email addresses. When passing an email address on the command
# line, if the email address is valid, information about it will be printed.
# When using STDIN, no output will be given for valid email addresses.
#
# Keyword arguments to validate_email can be set in environment variables
# of the same name but uppercase (see below).
import json
import os
import sys
from typing import Any, Dict, Optional
from .validate_email import validate_email, _Resolver
from .deliverability import caching_resolver
from .exceptions_types import EmailNotValidError
def main(dns_resolver: Optional[_Resolver] = None) -> None:
    # The dns_resolver argument is for tests.
    # Set options from environment variables.
    # Note that bool() of any non-empty string is True, so a boolean option
    # is turned on by any non-empty value (including "0" and "false") and
    # turned off only by an empty value.
    options: Dict[str, Any] = {}
    for varname in ('ALLOW_SMTPUTF8', 'ALLOW_QUOTED_LOCAL', 'ALLOW_DOMAIN_LITERAL',
                    'GLOBALLY_DELIVERABLE', 'CHECK_DELIVERABILITY', 'TEST_ENVIRONMENT'):
        if varname in os.environ:
            options[varname.lower()] = bool(os.environ[varname])
    for varname in ('DEFAULT_TIMEOUT',):
        if varname in os.environ:
            options[varname.lower()] = float(os.environ[varname])
    if len(sys.argv) == 1:
        # Validate the email addresses passed line-by-line on STDIN.
        dns_resolver = dns_resolver or caching_resolver()
        for line in sys.stdin:
            email = line.strip()
            try:
                validate_email(email, dns_resolver=dns_resolver, **options)
            except EmailNotValidError as e:
                print(f"{email} {e}")
    else:
        # Validate the email address passed on the command line.
        email = sys.argv[1]
        try:
            result = validate_email(email, dns_resolver=dns_resolver, **options)
            print(json.dumps(result.as_dict(), indent=2, sort_keys=True, ensure_ascii=False))
        except EmailNotValidError as e:
            print(e)
if __name__ == "__main__":
    main()
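# The boolean options in main() are read with bool(os.environ[varname]), which
# only tests whether the string is empty. The sketch below contrasts that with
# a stricter parser. `parse_env_flag` and its list of accepted spellings are
# hypothetical and are not part of this package.

```python
def parse_env_flag(value: str) -> bool:
    """Interpret common textual spellings of a boolean (hypothetical helper)."""
    return value.strip().lower() in ("1", "true", "yes", "on")


samples = ("1", "0", "false", "")

# bool() treats every non-empty string as True, including "0" and "false".
plain = {v: bool(v) for v in samples}

# The stricter parser only accepts the listed spellings as True.
strict = {v: parse_env_flag(v) for v in samples}
```

# With the code as written above, a variable must be set to the empty string
# (for example `CHECK_DELIVERABILITY= python -m email_validator ...`) to turn
# an option off.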
from typing import Any, List, Optional, Tuple, TypedDict
import ipaddress
from .exceptions_types import EmailUndeliverableError
import dns.resolver
import dns.exception
def caching_resolver(*, timeout: Optional[int] = None, cache: Any = None, dns_resolver: Optional[dns.resolver.Resolver] = None) -> dns.resolver.Resolver:
    if timeout is None:
        from . import DEFAULT_TIMEOUT
        timeout = DEFAULT_TIMEOUT
    resolver = dns_resolver or dns.resolver.Resolver()
    resolver.cache = cache or dns.resolver.LRUCache()
    resolver.lifetime = timeout  # timeout, in seconds
    return resolver
DeliverabilityInfo = TypedDict("DeliverabilityInfo", {
"mx": List[Tuple[int, str]],
"mx_fallback_type": Optional[str],
"unknown-deliverability": str,
}, total=False)
def validate_email_deliverability(domain: str, domain_i18n: str, timeout: Optional[int] = None, dns_resolver: Optional[dns.resolver.Resolver] = None) -> DeliverabilityInfo:
# Check that the domain resolves to an MX record. If there is no MX record,
# try an A or AAAA record which is a deprecated fallback for deliverability.
# Raises an EmailUndeliverableError on failure. On success, returns a dict
# with deliverability information.
# If no dns.resolver.Resolver was given, get dnspython's default resolver.
# Override the default resolver's timeout. This may affect other uses of
# dnspython in this process.
if dns_resolver is None:
from . import DEFAULT_TIMEOUT
if timeout is None:
timeout = DEFAULT_TIMEOUT
dns_resolver = dns.resolver.get_default_resolver()
dns_resolver.lifetime = timeout
elif timeout is not None:
raise ValueError("It's not valid to pass both timeout and dns_resolver.")
deliverability_info: DeliverabilityInfo = {}
try:
try:
# Try resolving for MX records (RFC 5321 Section 5).
response = dns_resolver.resolve(domain, "MX")
# For reporting, put them in priority order and remove the trailing dot in the qnames.
mtas = sorted([(r.preference, str(r.exchange).rstrip('.')) for r in response])
# RFC 7505: Null MX (0, ".") records signify the domain does not accept email.
# Remove null MX records from the mtas list (but we've stripped trailing dots,
# so the 'exchange' is just "") so we can check if there are no non-null MX
# records remaining.
mtas = [(preference, exchange) for preference, exchange in mtas
if exchange != ""]
if len(mtas) == 0: # null MX only, if there were no MX records originally a NoAnswer exception would have occurred
raise EmailUndeliverableError(f"The domain name {domain_i18n} does not accept email.")
deliverability_info["mx"] = mtas
deliverability_info["mx_fallback_type"] = None
except dns.resolver.NoAnswer:
# If there was no MX record, fall back to an A or AAAA record
# (RFC 5321 Section 5). Check A first since it's more common.
# If the A/AAAA response has no Globally Reachable IP address,
# treat the response as if it were NoAnswer, i.e., the following
# address types are not allowed fallbacks: Private-Use, Loopback,
# Link-Local, and some other obscure ranges. See
# https://www.iana.org/assignments/iana-ipv4-special-registry/iana-ipv4-special-registry.xhtml
# https://www.iana.org/assignments/iana-ipv6-special-registry/iana-ipv6-special-registry.xhtml
# (Issue #134.)
def is_global_addr(address: Any) -> bool:
try:
ipaddr = ipaddress.ip_address(address)
except ValueError:
return False
return ipaddr.is_global
try:
response = dns_resolver.resolve(domain, "A")
if not any(is_global_addr(r.address) for r in response):
raise dns.resolver.NoAnswer # fall back to AAAA
deliverability_info["mx"] = [(0, domain)]
deliverability_info["mx_fallback_type"] = "A"
except dns.resolver.NoAnswer:
# If there was no A record, fall back to an AAAA record.
# (It's unclear if SMTP servers actually do this.)
try:
response = dns_resolver.resolve(domain, "AAAA")
if not any(is_global_addr(r.address) for r in response):
raise dns.resolver.NoAnswer
deliverability_info["mx"] = [(0, domain)]
deliverability_info["mx_fallback_type"] = "AAAA"
except dns.resolver.NoAnswer as e:
# If there was no MX, A, or AAAA record, then mail to
# this domain is not deliverable, although the domain
# name has other records (otherwise NXDOMAIN would
# have been raised).
raise EmailUndeliverableError(f"The domain name {domain_i18n} does not accept email.") from e
# Check for a SPF (RFC 7208) reject-all record ("v=spf1 -all") which indicates
# no emails are sent from this domain (similar to a Null MX record
# but for sending rather than receiving). In combination with the
# absence of an MX record, this is probably a good sign that the
# domain is not used for email.
try:
response = dns_resolver.resolve(domain, "TXT")
for rec in response:
value = b"".join(rec.strings)
if value.startswith(b"v=spf1 "):
if value == b"v=spf1 -all":
raise EmailUndeliverableError(f"The domain name {domain_i18n} does not send email.")
except dns.resolver.NoAnswer:
# No TXT records means there is no SPF policy, so we cannot take any action.
pass
except dns.resolver.NXDOMAIN as e:
# The domain name does not exist --- there are no records of any sort
# for the domain name.
raise EmailUndeliverableError(f"The domain name {domain_i18n} does not exist.") from e
except dns.resolver.NoNameservers:
# All nameservers failed to answer the query. This might be a problem
# with local nameservers, maybe? We'll allow the domain to go through.
return {
"unknown-deliverability": "no_nameservers",
}
except dns.exception.Timeout:
# A timeout could occur for various reasons, so don't treat it as a failure.
return {
"unknown-deliverability": "timeout",
}
except EmailUndeliverableError:
# Don't let these get clobbered by the wider except block below.
raise
except Exception as e:
# Unhandled conditions should not propagate.
raise EmailUndeliverableError(
"There was an error while checking if the domain name in the email address is deliverable: " + str(e)
) from e
return deliverability_info
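# Two steps of the function above can be tried without a DNS lookup: the
# Null MX filtering and the globally-reachable address test. The sketch below
# works on plain (preference, exchange) tuples and address strings instead of
# dnspython response objects. `usable_mx_hosts` is a hypothetical name, and
# the sample hosts are made up.

```python
import ipaddress
from typing import List, Tuple


def usable_mx_hosts(records: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    # Sort by preference and strip the trailing dot of each exchange name.
    mtas = sorted((pref, exchange.rstrip(".")) for pref, exchange in records)
    # A Null MX (0, ".") becomes (0, "") after stripping, so drop empty names.
    return [(pref, exchange) for pref, exchange in mtas if exchange != ""]


def is_global_addr(address: str) -> bool:
    # Anything that does not parse as an IP address is treated as not global.
    try:
        return ipaddress.ip_address(address).is_global
    except ValueError:
        return False
```

# An empty list from usable_mx_hosts corresponds to the "does not accept
# email" case above, and a False from is_global_addr for every A or AAAA
# address corresponds to treating the response as NoAnswer.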
import warnings
from typing import Any, Dict, List, Optional, Tuple, Union
class EmailNotValidError(ValueError):
    """Parent class of all exceptions raised by this module."""
    pass
class EmailSyntaxError(EmailNotValidError):
    """Exception raised when an email address fails validation because of its form."""
    pass
class EmailUndeliverableError(EmailNotValidError):
    """Exception raised when an email address fails validation because its domain name does not appear deliverable."""
    pass
class ValidatedEmail:
"""The validate_email function returns objects of this type holding the normalized form of the email address
and other information."""
"""The email address that was passed to validate_email. (If passed as bytes, this will be a string.)"""
original: str
"""The normalized email address, which should always be used in preference to the original address.
The normalized address converts an IDNA ASCII domain name to Unicode, if possible, and performs
Unicode normalization on the local part and on the domain (if originally Unicode). It is the
concatenation of the local_part and domain attributes, separated by an @-sign."""
normalized: str
"""The local part of the email address after Unicode normalization."""
local_part: str
"""The domain part of the email address after Unicode normalization or conversion to
Unicode from IDNA ascii."""
domain: str
"""If the domain part is a domain literal, the IPv4Address or IPv6Address object."""
domain_address: object
"""If not None, a form of the email address that uses 7-bit ASCII characters only."""
ascii_email: Optional[str]
"""If not None, the local part of the email address using 7-bit ASCII characters only."""
ascii_local_part: Optional[str]
"""A form of the domain name that uses 7-bit ASCII characters only."""
ascii_domain: str
"""If True, the SMTPUTF8 feature of your mail relay will be required to transmit messages
to this address. This flag is True just when ascii_local_part is missing. Otherwise it
is False."""
smtputf8: bool
"""If a deliverability check is performed and if it succeeds, a list of (priority, domain)
tuples of MX records specified in the DNS for the domain."""
mx: List[Tuple[int, str]]
"""If no MX records are actually specified in DNS and instead are inferred, through an obsolete
mechanism, from A or AAAA records, the value is the type of DNS record used instead (`A` or `AAAA`)."""
mx_fallback_type: Optional[str]
"""The display name in the original input text, unquoted and unescaped, or None."""
display_name: Optional[str]
def __repr__(self) -> str:
return f"<ValidatedEmail {self.normalized}>"
"""For backwards compatibility, support old field names."""
def __getattr__(self, key: str) -> str:
if key == "original_email":
return self.original
if key == "email":
return self.normalized
raise AttributeError(key)
@property
def email(self) -> str:
warnings.warn("ValidatedEmail.email is deprecated and will be removed, use ValidatedEmail.normalized instead", DeprecationWarning)
return self.normalized
"""For backwards compatibility, some fields are also exposed through a dict-like interface. Note
that some of the names changed when they became attributes."""
def __getitem__(self, key: str) -> Union[Optional[str], bool, List[Tuple[int, str]]]:
warnings.warn("dict-like access to the return value of validate_email is deprecated and may not be supported in the future.", DeprecationWarning, stacklevel=2)
if key == "email":
return self.normalized
if key == "email_ascii":
return self.ascii_email
if key == "local":
return self.local_part
if key == "domain":
return self.ascii_domain
if key == "domain_i18n":
return self.domain
if key == "smtputf8":
return self.smtputf8
if key == "mx":
return self.mx
if key == "mx-fallback":
return self.mx_fallback_type
raise KeyError()
"""Tests use this."""
def __eq__(self, other: object) -> bool:
if not isinstance(other, ValidatedEmail):
return False
return (
self.normalized == other.normalized
and self.local_part == other.local_part
and self.domain == other.domain
and getattr(self, 'ascii_email', None) == getattr(other, 'ascii_email', None)
and getattr(self, 'ascii_local_part', None) == getattr(other, 'ascii_local_part', None)
and getattr(self, 'ascii_domain', None) == getattr(other, 'ascii_domain', None)
and self.smtputf8 == other.smtputf8
and repr(sorted(self.mx) if getattr(self, 'mx', None) else None)
== repr(sorted(other.mx) if getattr(other, 'mx', None) else None)
and getattr(self, 'mx_fallback_type', None) == getattr(other, 'mx_fallback_type', None)
and getattr(self, 'display_name', None) == getattr(other, 'display_name', None)
)
"""This helps produce the README."""
def as_constructor(self) -> str:
return "ValidatedEmail(" \
+ ",".join(f"\n {key}={repr(getattr(self, key))}"
for key in ('normalized', 'local_part', 'domain',
'ascii_email', 'ascii_local_part', 'ascii_domain',
'smtputf8', 'mx', 'mx_fallback_type',
'display_name')
if hasattr(self, key)
) \
+ ")"
"""Convenience method for accessing ValidatedEmail as a dict"""
def as_dict(self) -> Dict[str, Any]:
# Copy the attributes so that the instance itself is not modified below.
d = dict(self.__dict__)
if d.get('domain_address'):
d['domain_address'] = repr(d['domain_address'])
return d
# These constants are defined by the email specifications.
import re
# Based on RFC 5322 3.2.3, these characters are permitted in email
# addresses (not taking into account internationalization) separated by dots:
ATEXT = r'a-zA-Z0-9_!#\$%&\'\*\+\-/=\?\^`\{\|\}~'
ATEXT_RE = re.compile('[.' + ATEXT + ']') # ATEXT plus dots
DOT_ATOM_TEXT = re.compile('[' + ATEXT + ']+(?:\\.[' + ATEXT + r']+)*\Z')
# RFC 6531 3.3 extends the allowed characters in internationalized
# addresses to also include three specific ranges of UTF8 defined in
# RFC 3629 section 4, which appear to be the Unicode code points from
# U+0080 to U+10FFFF.
ATEXT_INTL = ATEXT + "\u0080-\U0010FFFF"
ATEXT_INTL_DOT_RE = re.compile('[.' + ATEXT_INTL + ']') # ATEXT_INTL plus dots
DOT_ATOM_TEXT_INTL = re.compile('[' + ATEXT_INTL + ']+(?:\\.[' + ATEXT_INTL + r']+)*\Z')
# The domain part of the email address, after IDNA (ASCII) encoding,
# must also satisfy the requirements of RFC 952/RFC 1123 2.1 which
# restrict the allowed characters of hostnames further.
ATEXT_HOSTNAME_INTL = re.compile(r"[a-zA-Z0-9\-\." + "\u0080-\U0010FFFF" + "]")
HOSTNAME_LABEL = r'(?:(?:[a-zA-Z0-9][a-zA-Z0-9\-]*)?[a-zA-Z0-9])'
DOT_ATOM_TEXT_HOSTNAME = re.compile(HOSTNAME_LABEL + r'(?:\.' + HOSTNAME_LABEL + r')*\Z')
DOMAIN_NAME_REGEX = re.compile(r"[A-Za-z]\Z") # all TLDs currently end with a letter
# Domain literal (RFC 5322 3.4.1)
DOMAIN_LITERAL_CHARS = re.compile(r"[\u0021-\u005A\u005E-\u007E]") # dtext: %d33-90 / %d94-126
# Quoted-string local part (RFC 5321 4.1.2, internationalized by RFC 6531 3.3)
# The permitted characters in a quoted string are the characters in the range
# 32-126, except that quotes and (literal) backslashes can only appear when escaped
# by a backslash. When internationalized, UTF-8 strings are also permitted except
# the ASCII characters that are not previously permitted (see above).
# QUOTED_LOCAL_PART_ADDR = re.compile(r"^\"((?:[\u0020-\u0021\u0023-\u005B\u005D-\u007E]|\\[\u0020-\u007E])*)\"@(.*)")
QTEXT_INTL = re.compile(r"[\u0020-\u007E\u0080-\U0010FFFF]")
# Length constants
# RFC 3696 + errata 1003 + errata 1690 (https://www.rfc-editor.org/errata_search.php?rfc=3696&eid=1690)
# explains the maximum length of an email address is 254 octets.
EMAIL_MAX_LENGTH = 254
LOCAL_PART_MAX_LENGTH = 64
DNS_LABEL_LENGTH_LIMIT = 63 # in "octets", RFC 1035 2.3.1
DOMAIN_MAX_LENGTH = 253 # in "octets" as transmitted, RFC 1035 2.3.4 and RFC 5321 4.5.3.1.2, and see https://stackoverflow.com/questions/32290167/what-is-the-maximum-length-of-a-dns-name
# RFC 2142
CASE_INSENSITIVE_MAILBOX_NAMES = [
'info', 'marketing', 'sales', 'support', # section 3
'abuse', 'noc', 'security', # section 4
'postmaster', 'hostmaster', 'usenet', 'news', 'webmaster', 'www', 'uucp', 'ftp', # section 5
]
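# A short sketch of how the dot-atom and hostname patterns above behave on a
# few inputs. The four constants are copied from this file; the wrapper names
# `is_dot_atom` and `is_hostname_syntax` are hypothetical.

```python
import re

# Copied from the constants above.
ATEXT = r'a-zA-Z0-9_!#\$%&\'\*\+\-/=\?\^`\{\|\}~'
DOT_ATOM_TEXT = re.compile('[' + ATEXT + ']+(?:\\.[' + ATEXT + r']+)*\Z')
HOSTNAME_LABEL = r'(?:(?:[a-zA-Z0-9][a-zA-Z0-9\-]*)?[a-zA-Z0-9])'
DOT_ATOM_TEXT_HOSTNAME = re.compile(HOSTNAME_LABEL + r'(?:\.' + HOSTNAME_LABEL + r')*\Z')


def is_dot_atom(local: str) -> bool:
    # Runs of permitted characters separated by single dots, nothing else.
    return DOT_ATOM_TEXT.match(local) is not None


def is_hostname_syntax(domain: str) -> bool:
    # Each label starts and ends with a letter or digit; hyphens only inside.
    return DOT_ATOM_TEXT_HOSTNAME.match(domain) is not None
```

# Because both patterns end in \Z and are used with match(), the whole string
# has to fit the pattern, so leading, trailing, or doubled dots are rejected.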
from .exceptions_types import EmailSyntaxError, ValidatedEmail
from .rfc_constants import EMAIL_MAX_LENGTH, LOCAL_PART_MAX_LENGTH, DOMAIN_MAX_LENGTH, \
DOT_ATOM_TEXT, DOT_ATOM_TEXT_INTL, ATEXT_RE, ATEXT_INTL_DOT_RE, ATEXT_HOSTNAME_INTL, QTEXT_INTL, \
DNS_LABEL_LENGTH_LIMIT, DOT_ATOM_TEXT_HOSTNAME, DOMAIN_NAME_REGEX, DOMAIN_LITERAL_CHARS
import re
import unicodedata
import idna # implements IDNA 2008; Python's codec is only IDNA 2003
import ipaddress
from typing import Optional, Tuple, TypedDict, Union
def split_email(email: str) -> Tuple[Optional[str], str, str, bool]:
# Return the display name, unescaped local part, and domain part
# of the address, and whether the local part was quoted. If no
# display name was present and angle brackets do not surround
# the address, display name will be None; otherwise, it will be
# set to the display name or the empty string if there were
# angle brackets but no display name.
# Typical email addresses have a single @-sign and no quote
# characters, but the awkward "quoted string" local part form
# (RFC 5321 4.1.2) allows @-signs and escaped quotes to appear
# in the local part if the local part is quoted.
# A `display name <addr>` format is also present in MIME messages
# (RFC 5322 3.4) and this format is also often recognized in
# mail UIs. It's not allowed in SMTP commands or in typical web
# login forms, but parsing it has been requested, so it's done
# here as a convenience. It's implemented in the spirit but not
# the letter of RFC 5322 3.4 because MIME messages allow newlines
# and comments as a part of the CFWS rule, but this is typically
# not allowed in mail UIs (although comment syntax was requested
# once too).
#
# Display names are either basic characters (the same basic characters
# permitted in email addresses, but periods are not allowed and spaces
# are allowed; see RFC 5322 Appendix A.1.2), or a quoted string with
# the same rules as a quoted local part. (Multiple quoted strings might
# be allowed? Unclear.) Optional space (RFC 5322 3.4 CFWS) and then the
# email address follows in angle brackets.
#
# An initial quote is ambiguous between starting a display name or
# a quoted local part --- fun.
#
# We assume the input string is already stripped of leading and
# trailing CFWS.
def split_string_at_unquoted_special(text: str, specials: Tuple[str, ...]) -> Tuple[str, str]:
# Split the string at the first character in specials (an @-sign
# or left angle bracket) that does not occur within quotes and
# is not followed by a Unicode combining character.
# If no special character is found, raise an error.
inside_quote = False
escaped = False
left_part = ""
for i, c in enumerate(text):
# < plus U+0338 (Combining Long Solidus Overlay) normalizes to
# ≮ U+226E (Not Less-Than), and it would be confusing to treat
# the < as the start of "<email>" syntax in that case. Likewise,
# if anything combines with an @ or ", we should probably not
# treat it as a special character.
if unicodedata.normalize("NFC", text[i:])[0] != c:
left_part += c
elif inside_quote:
left_part += c
if c == '\\' and not escaped:
escaped = True
elif c == '"' and not escaped:
# The only way to exit the quote is an unescaped quote.
inside_quote = False
escaped = False
else:
escaped = False
elif c == '"':
left_part += c
inside_quote = True
elif c in specials:
# When unquoted, stop before a special character.
break
else:
left_part += c
if len(left_part) == len(text):
raise EmailSyntaxError("An email address must have an @-sign.")
# The right part is whatever is left.
right_part = text[len(left_part):]
return left_part, right_part
def unquote_quoted_string(text: str) -> Tuple[str, bool]:
# Remove surrounding quotes and unescape escaped backslashes
# and quotes. Escapes are parsed liberally. I think only
# backslashes and quotes can be escaped but we'll allow anything
# to be.
quoted = False
escaped = False
value = ""
for i, c in enumerate(text):
if quoted:
if escaped:
value += c
escaped = False
elif c == '\\':
escaped = True
elif c == '"':
if i != len(text) - 1:
raise EmailSyntaxError("Extra character(s) found after close quote: "
+ ", ".join(safe_character_display(c) for c in text[i + 1:]))
break
else:
value += c
elif i == 0 and c == '"':
quoted = True
else:
value += c
return value, quoted
# Split the string at the first unquoted @-sign or left angle bracket.
left_part, right_part = split_string_at_unquoted_special(email, ("@", "<"))
# If the right part starts with an angle bracket,
# then the left part is a display name and the rest
# of the right part up to the final right angle bracket
# is the email address.
if right_part.startswith("<"):
# Remove space between the display name and angle bracket.
left_part = left_part.rstrip()
# Unquote and unescape the display name.
display_name, display_name_quoted = unquote_quoted_string(left_part)
# Check that only basic characters are present in a
# non-quoted display name.
if not display_name_quoted:
bad_chars = {
safe_character_display(c)
for c in display_name
if (not ATEXT_RE.match(c) and c != ' ') or c == '.'
}
if bad_chars:
raise EmailSyntaxError("The display name contains invalid characters when not quoted: " + ", ".join(sorted(bad_chars)) + ".")
# Check for other unsafe characters.
check_unsafe_chars(display_name, allow_space=True)
# Check that the right part ends with an angle bracket
# but allow spaces after it, I guess.
if ">" not in right_part:
raise EmailSyntaxError("An open angle bracket at the start of the email address has to be followed by a close angle bracket at the end.")
right_part = right_part.rstrip(" ")
if right_part[-1] != ">":
raise EmailSyntaxError("There can't be anything after the email address.")
# Remove the initial and trailing angle brackets.
addr_spec = right_part[1:].rstrip(">")
# Split the email address at the first unquoted @-sign.
local_part, domain_part = split_string_at_unquoted_special(addr_spec, ("@",))
# Otherwise there is no display name. The left part is the local
# part and the right part is the domain.
else:
display_name = None
local_part, domain_part = left_part, right_part
if domain_part.startswith("@"):
domain_part = domain_part[1:]
# Unquote the local part if it is quoted.
local_part, is_quoted_local_part = unquote_quoted_string(local_part)
return display_name, local_part, domain_part, is_quoted_local_part
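# The combining-character guard in split_string_at_unquoted_special relies on
# NFC normalization changing the first character of the remaining text. A
# small sketch of that test in isolation follows; `starts_combined` is a
# hypothetical name, and it assumes a non-empty string.

```python
import unicodedata


def starts_combined(text: str) -> bool:
    """True if NFC merges the first character with what follows it."""
    return unicodedata.normalize("NFC", text)[0] != text[0]


# "<" followed by U+0338 composes to U+226E (Not Less-Than) under NFC.
composed = unicodedata.normalize("NFC", "<\u0338")
```

# In the function above, a character for which this test is true is appended
# to the left part instead of being treated as the start of "<email>" syntax.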
def get_length_reason(addr: str, limit: int) -> str:
    """Helper function to return an error message related to invalid length."""
    diff = len(addr) - limit
    suffix = "s" if diff > 1 else ""
    return f"({diff} character{suffix} too many)"
def safe_character_display(c: str) -> str:
    # Return safely displayable characters in quotes.
    if c == '\\':
        return f"\"{c}\""  # can't use repr because it escapes it
    if unicodedata.category(c)[0] in ("L", "N", "P", "S"):
        return repr(c)
    # Construct a hex string in case the unicode name doesn't exist.
    if ord(c) < 0xFFFF:
        h = f"U+{ord(c):04x}".upper()
    else:
        h = f"U+{ord(c):08x}".upper()
    # Return the character name or, if it has no name, the hex string.
    return unicodedata.name(c, h)
class LocalPartValidationResult(TypedDict):
local_part: str
ascii_local_part: Optional[str]
smtputf8: bool
def validate_email_local_part(local: str, allow_smtputf8: bool = True, allow_empty_local: bool = False,
quoted_local_part: bool = False) -> LocalPartValidationResult:
"""Validates the syntax of the local part of an email address."""
if len(local) == 0:
if not allow_empty_local:
raise EmailSyntaxError("There must be something before the @-sign.")
# The caller allows an empty local part. Useful for validating certain
# Postfix aliases.
return {
"local_part": local,
"ascii_local_part": local,
"smtputf8": False,
}
# Check the length of the local part by counting characters.
# (RFC 5321 4.5.3.1.1)
# We're checking the number of characters here. If the local part
# is ASCII-only, then that's the same as bytes (octets). If it's
# internationalized, then the UTF-8 encoding may be longer, but
# that may not be relevant. We will check the total address length
# instead.
if len(local) > LOCAL_PART_MAX_LENGTH:
reason = get_length_reason(local, limit=LOCAL_PART_MAX_LENGTH)
raise EmailSyntaxError(f"The email address is too long before the @-sign {reason}.")
# Check the local part against the non-internationalized regular expression.
# Most email addresses match this regex so it's probably fastest to check this first.
# (RFC 5322 3.2.3)
# All local parts matching the dot-atom rule are also valid as a quoted string
# so if it was originally quoted (quoted_local_part is True) and this regex matches,
# it's ok.
# (RFC 5321 4.1.2 / RFC 5322 3.2.4).
if DOT_ATOM_TEXT.match(local):
# It's valid. And since it's just the permitted ASCII characters,
# it's normalized and safe. If the local part was originally quoted,
# the quoting was unnecessary and it'll be returned as normalized to
# non-quoted form.
# Return the local part and flag that SMTPUTF8 is not needed.
return {
"local_part": local,
"ascii_local_part": local,
"smtputf8": False,
}
# The local part failed the basic dot-atom check. Try the extended character set
# for internationalized addresses. It's the same pattern but with additional
# characters permitted.
# RFC 6531 section 3.3.
valid: Optional[str] = None
requires_smtputf8 = False
if DOT_ATOM_TEXT_INTL.match(local):
# But international characters in the local part may not be permitted.
if not allow_smtputf8:
# Check for invalid characters against the non-internationalized
# permitted character set.
# (RFC 5322 3.2.3)
bad_chars = {
safe_character_display(c)
for c in local
if not ATEXT_RE.match(c)
}
if bad_chars:
raise EmailSyntaxError("Internationalized characters before the @-sign are not supported: " + ", ".join(sorted(bad_chars)) + ".")
# Although the check above should always find something, fall back to this just in case.
raise EmailSyntaxError("Internationalized characters before the @-sign are not supported.")
# It's valid.
valid = "dot-atom"
requires_smtputf8 = True
# There are no syntactic restrictions on quoted local parts, so if
# it was originally quoted, it is probably valid. More characters
# are allowed, like @-signs, spaces, and quotes, and there are no
# restrictions on the placement of dots, as in dot-atom local parts.
elif quoted_local_part:
# Check for invalid characters in a quoted string local part.
# (RFC 5321 4.1.2. RFC 5322 lists additional permitted *obsolete*
# characters which are *not* allowed here. RFC 6531 section 3.3
# extends the range to UTF8 strings.)
bad_chars = {
safe_character_display(c)
for c in local
if not QTEXT_INTL.match(c)
}
if bad_chars:
raise EmailSyntaxError("The email address contains invalid characters in quotes before the @-sign: " + ", ".join(sorted(bad_chars)) + ".")
# See if any characters are outside of the ASCII range.
bad_chars = {
safe_character_display(c)
for c in local
if not (32 <= ord(c) <= 126)
}
if bad_chars:
requires_smtputf8 = True
# International characters in the local part may not be permitted.
if not allow_smtputf8:
raise EmailSyntaxError("Internationalized characters before the @-sign are not supported: " + ", ".join(sorted(bad_chars)) + ".")
# It's valid.
valid = "quoted"
# If the local part matches the internationalized dot-atom form or was quoted,
# perform additional checks for Unicode strings.
if valid:
# Check that the local part is a valid, safe, and sensible Unicode string.
# Some of this may be redundant with the range U+0080 to U+10FFFF that is checked
# by DOT_ATOM_TEXT_INTL and QTEXT_INTL. Other characters may be permitted by the
# email specs, but they may not be valid, safe, or sensible Unicode strings.
# See the function for rationale.
check_unsafe_chars(local, allow_space=(valid == "quoted"))
# Try encoding to UTF-8. Failure is possible with some characters like
# surrogate code points, but those are checked above. Still, we don't
# want to have an unhandled exception later.
try:
local.encode("utf8")
except ValueError as e:
raise EmailSyntaxError("The email address contains an invalid character.") from e
# If this address passes only by the quoted string form, re-quote it
# and backslash-escape quotes and backslashes (removing any unnecessary
# escapes). Per RFC 5321 4.1.2, "all quoted forms MUST be treated as equivalent,
# and the sending system SHOULD transmit the form that uses the minimum quoting possible."
if valid == "quoted":
local = '"' + re.sub(r'(["\\])', r'\\\1', local) + '"'
return {
"local_part": local,
"ascii_local_part": local if not requires_smtputf8 else None,
"smtputf8": requires_smtputf8,
}
# It's not a valid local part. Let's find out why.
# (Since quoted local parts are all valid or handled above, these checks
# don't apply in those cases.)
# Check for invalid characters.
# (RFC 5322 3.2.3, plus RFC 6531 3.3)
bad_chars = {
safe_character_display(c)
for c in local
if not ATEXT_INTL_DOT_RE.match(c)
}
if bad_chars:
raise EmailSyntaxError("The email address contains invalid characters before the @-sign: " + ", ".join(sorted(bad_chars)) + ".")
# Check for dot errors imposed by the dot-atom rule.
# (RFC 5322 3.2.3)
check_dot_atom(local, 'An email address cannot start with a {}.', 'An email address cannot have a {} immediately before the @-sign.', is_hostname=False)
# All of the reasons should already have been checked, but just in case
# we have a fallback message.
raise EmailSyntaxError("The email address contains invalid characters before the @-sign.")
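# The re-quoting step near the end of validate_email_local_part can be tried
# on its own. The sketch below wraps the same re.sub call in a hypothetical
# helper named `requote_local_part`; the sample inputs are made up.

```python
import re


def requote_local_part(local: str) -> str:
    # Prefix each double quote and backslash with a backslash, then wrap the
    # result in double quotes.
    return '"' + re.sub(r'(["\\])', r'\\\1', local) + '"'
```

# Only quotes and backslashes are escaped, which matches the "minimum quoting
# possible" wording quoted from RFC 5321 4.1.2 above.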
def check_unsafe_chars(s: str, allow_space: bool = False) -> None:
# Check for unsafe characters or characters that would make the string
# invalid or non-sensible Unicode.
bad_chars = set()
for i, c in enumerate(s):
category = unicodedata.category(c)
if category[0] in ("L", "N", "P", "S"):
# Letters, numbers, punctuation, and symbols are permitted.
pass
elif category[0] == "M":
# Combining character in first position would combine with something
# outside of the email address if concatenated, so they are not safe.
# We also check if this occurs after the @-sign, which would not be
# sensible because it would modify the @-sign.
if i == 0:
bad_chars.add(c)
elif category == "Zs":
# Spaces outside of the ASCII range are not specifically disallowed in
# internationalized addresses as far as I can tell, but they violate
# the spirit of the non-internationalized specification that email
# addresses do not contain ASCII spaces when not quoted. Excluding
# ASCII spaces when not quoted is handled directly by the atom regex.
#
# In quoted-string local parts, spaces are explicitly permitted, and
# the ASCII space has category Zs, so we must allow it here, and we'll
# allow all Unicode spaces to be consistent.
if not allow_space:
bad_chars.add(c)
elif category[0] == "Z":
# The two line and paragraph separator characters (in categories Zl and Zp)
# are not specifically disallowed in internationalized addresses
# as far as I can tell, but they violate the spirit of the non-internationalized
# specification that email addresses do not contain line breaks when not quoted.
bad_chars.add(c)
elif category[0] == "C":
# Control, format, surrogate, private use, and unassigned code points (C)
# are all unsafe in various ways. Control and format characters can affect
# text rendering if the email address is concatenated with other text.
# Bidirectional format characters are unsafe, even if used properly, because
# they cause an email address to render as a different email address.
# Private use characters do not make sense for publicly deliverable
# email addresses.
bad_chars.add(c)
else:
# All categories should be handled above, but in case there is something new
# to the Unicode specification in the future, reject all other categories.
bad_chars.add(c)
if bad_chars:
raise EmailSyntaxError("The email address contains unsafe characters: "
+ ", ".join(safe_character_display(c) for c in sorted(bad_chars)) + ".")
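The category rules in `check_unsafe_chars` can be tried in isolation. The following is a minimal sketch using only the standard library; `unsafe_chars` is a hypothetical helper written for illustration (it returns the rejected characters instead of raising `EmailSyntaxError`) and is not part of email_validator.

```python
import unicodedata


def unsafe_chars(s, allow_space=False):
    """Hypothetical helper: collect the characters the rules above would reject."""
    bad = set()
    for i, c in enumerate(s):
        category = unicodedata.category(c)
        if category[0] in ("L", "N", "P", "S"):
            # Letters, numbers, punctuation, and symbols are permitted.
            continue
        if category[0] == "M":
            # A combining mark is only a problem in first position.
            if i == 0:
                bad.add(c)
        elif category == "Zs":
            # Space separators are allowed only when the caller opts in.
            if not allow_space:
                bad.add(c)
        else:
            # Zl, Zp, all C* categories, and anything unknown are rejected.
            bad.add(c)
    return bad


print(unsafe_chars("a\u202eb"))  # U+202E RIGHT-TO-LEFT OVERRIDE is category Cf
```

A combining mark such as U+0301 is accepted after a base character but rejected at index 0, which mirrors the first-position rule in the function above.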
def check_dot_atom(label: str, start_descr: str, end_descr: str, is_hostname: bool) -> None:
    # RFC 5322 3.2.3
    if label.endswith("."):
        raise EmailSyntaxError(end_descr.format("period"))
    if label.startswith("."):
        raise EmailSyntaxError(start_descr.format("period"))
    if ".." in label:
        raise EmailSyntaxError("An email address cannot have two periods in a row.")
    if is_hostname:
        # RFC 952
        if label.endswith("-"):
            raise EmailSyntaxError(end_descr.format("hyphen"))
        if label.startswith("-"):
            raise EmailSyntaxError(start_descr.format("hyphen"))
        if ".-" in label or "-." in label:
            raise EmailSyntaxError("An email address cannot have a period and a hyphen next to each other.")
class DomainNameValidationResult(TypedDict):
    ascii_domain: str
    domain: str
def validate_email_domain_name(domain: str, test_environment: bool = False, globally_deliverable: bool = True) -> DomainNameValidationResult:
"""Validates the syntax of the domain part of an email address."""
# Check for invalid characters.
# (RFC 952 plus RFC 6531 section 3.3 for internationalized addresses)
bad_chars = {
safe_character_display(c)
for c in domain
if not ATEXT_HOSTNAME_INTL.match(c)
}
if bad_chars:
raise EmailSyntaxError("The part after the @-sign contains invalid characters: " + ", ".join(sorted(bad_chars)) + ".")
# Check for unsafe characters.
# Some of this may be redundant with the range U+0080 to U+10FFFF that is checked
# by DOT_ATOM_TEXT_INTL. Other characters may be permitted by the email specs, but
# they may not be valid, safe, or sensible Unicode strings.
check_unsafe_chars(domain)
# Perform UTS-46 normalization, which includes casefolding, NFC normalization,
# and converting all label separators (the period/full stop, fullwidth full stop,
# ideographic full stop, and halfwidth ideographic full stop) to regular dots.
# It will also raise an exception if there is an invalid character in the input,
# such as "⒈" which is invalid because it would expand to include a dot and
# U+1FEF which normalizes to a backtick, which is not an allowed hostname character.
# Since several characters *are* normalized to a dot, this has to come before
# checks related to dots, like check_dot_atom which comes next.
original_domain = domain
try:
domain = idna.uts46_remap(domain, std3_rules=False, transitional=False)
except idna.IDNAError as e:
raise EmailSyntaxError(f"The part after the @-sign contains invalid characters ({e}).") from e
# Check for invalid characters after Unicode normalization which are not caught
# by uts46_remap (see tests for examples).
bad_chars = {
safe_character_display(c)
for c in domain
if not ATEXT_HOSTNAME_INTL.match(c)
}
if bad_chars:
raise EmailSyntaxError("The part after the @-sign contains invalid characters after Unicode normalization: " + ", ".join(sorted(bad_chars)) + ".")
# The domain part is made up of dot-separated "labels." Each label must
# have at least one character and cannot start or end with dashes, which
# means there are some surprising restrictions on periods and dashes.
# Check that before we do IDNA encoding because the IDNA library gives
# unfriendly errors for these cases, but after UTS-46 normalization because
# it can insert periods and hyphens (from fullwidth characters).
# (RFC 952, RFC 1123 2.1, RFC 5322 3.2.3)
check_dot_atom(domain, 'An email address cannot have a {} immediately after the @-sign.', 'An email address cannot end with a {}.', is_hostname=True)
# Check for RFC 5890's invalid R-LDH labels, which are labels that start
# with two characters other than "xn" and two dashes.
for label in domain.split("."):
if re.match(r"(?!xn)..--", label, re.I):
raise EmailSyntaxError("An email address cannot have two letters followed by two dashes immediately after the @-sign or after a period, except Punycode.")
if DOT_ATOM_TEXT_HOSTNAME.match(domain):
# This is a valid non-internationalized domain.
ascii_domain = domain
else:
# If international characters are present in the domain name, convert
# the domain to IDNA ASCII. If internationalized characters are present,
# the MTA must either support SMTPUTF8 or the mail client must convert the
# domain name to IDNA before submission.
#
# For ASCII-only domains, the transformation does nothing and is safe to
# apply. However, to ensure we don't rely on the idna library for basic
# syntax checks, we don't use it if it's not needed.
#
# idna.encode also checks the domain name length after encoding but it
# doesn't give a nice error, so we call the underlying idna.alabel method
# directly. idna.alabel checks label length and doesn't give great messages,
# but we can't easily go to lower level methods.
try:
ascii_domain = ".".join(
idna.alabel(label).decode("ascii")
for label in domain.split(".")
)
except idna.IDNAError as e:
# Some errors would have already been raised by idna.uts46_remap.
raise EmailSyntaxError(f"The part after the @-sign is invalid ({e}).") from e
# Check the syntax of the string returned by idna.encode.
# It should never fail.
if not DOT_ATOM_TEXT_HOSTNAME.match(ascii_domain):
raise EmailSyntaxError("The email address contains invalid characters after the @-sign after IDNA encoding.")
# Check the length of the domain name in bytes.
# (RFC 1035 2.3.4 and RFC 5321 4.5.3.1.2)
# We're checking the number of bytes ("octets") here, which can be much
# higher than the number of characters in internationalized domains,
# on the assumption that the domain may be transmitted without SMTPUTF8
# as IDNA ASCII. (This is also checked by idna.encode, so this exception
# is never reached for internationalized domains.)
if len(ascii_domain) > DOMAIN_MAX_LENGTH:
if ascii_domain == original_domain:
reason = get_length_reason(ascii_domain, limit=DOMAIN_MAX_LENGTH)
raise EmailSyntaxError(f"The email address is too long after the @-sign {reason}.")
else:
diff = len(ascii_domain) - DOMAIN_MAX_LENGTH
s = "" if diff == 1 else "s"
raise EmailSyntaxError(f"The email address is too long after the @-sign ({diff} byte{s} too many after IDNA encoding).")
# Also check the label length limit.
# (RFC 1035 2.3.1)
for label in ascii_domain.split("."):
if len(label) > DNS_LABEL_LENGTH_LIMIT:
reason = get_length_reason(label, limit=DNS_LABEL_LENGTH_LIMIT)
raise EmailSyntaxError(f"After the @-sign, periods cannot be separated by so many characters {reason}.")
if globally_deliverable:
# All publicly deliverable addresses have domain names with at least
# one period, at least for gTLDs created since 2013 (per the ICANN Board
# New gTLD Program Committee, https://www.icann.org/en/announcements/details/new-gtld-dotless-domain-names-prohibited-30-8-2013-en).
# We'll consider the lack of a period a syntax error
# since that will match people's sense of what an email address looks
# like. We'll skip this in test environments to allow '@test' email
# addresses.
if "." not in ascii_domain and not (ascii_domain == "test" and test_environment):
raise EmailSyntaxError("The part after the @-sign is not valid. It should have a period.")
# We also know that all TLDs currently end with a letter.
if not DOMAIN_NAME_REGEX.search(ascii_domain):
raise EmailSyntaxError("The part after the @-sign is not valid. It is not within a valid top-level domain.")
# Check special-use and reserved domain names.
# Some might fail DNS-based deliverability checks, but that
# can be turned off, so we should fail them all sooner.
# See the references in __init__.py.
from . import SPECIAL_USE_DOMAIN_NAMES
for d in SPECIAL_USE_DOMAIN_NAMES:
# See the note near the definition of SPECIAL_USE_DOMAIN_NAMES.
if d == "test" and test_environment:
continue
if ascii_domain == d or ascii_domain.endswith("." + d):
raise EmailSyntaxError("The part after the @-sign is a special-use or reserved name that cannot be used with email.")
# We may have been given an IDNA ASCII domain to begin with. Check
# that the domain actually conforms to IDNA. It could look like IDNA
# but not be actual IDNA. For ASCII-only domains, the conversion out
# of IDNA just gives the same thing back.
#
# This gives us the canonical internationalized form of the domain,
# which we return to the caller as a part of the normalized email
# address.
try:
domain_i18n = idna.decode(ascii_domain.encode('ascii'))
except idna.IDNAError as e:
raise EmailSyntaxError(f"The part after the @-sign is not valid IDNA ({e}).") from e
# Check that this normalized domain name has not somehow become
# an invalid domain name. All of the checks before this point
# using the idna package probably guarantee that we now have
# a valid international domain name in most respects. But it
# doesn't hurt to re-apply some tests to be sure. See the similar
# tests above.
# Check for invalid and unsafe characters. We have no test
# case for this.
bad_chars = {
safe_character_display(c)
for c in domain
if not ATEXT_HOSTNAME_INTL.match(c)
}
if bad_chars:
raise EmailSyntaxError("The part after the @-sign contains invalid characters: " + ", ".join(sorted(bad_chars)) + ".")
check_unsafe_chars(domain)
# Check that it can be encoded back to IDNA ASCII. We have no test
# case for this.
try:
idna.encode(domain_i18n)
except idna.IDNAError as e:
raise EmailSyntaxError(f"The part after the @-sign became invalid after normalizing to international characters ({e}).") from e
# Return the IDNA ASCII-encoded form of the domain, which is how it
# would be transmitted on the wire (except when used with SMTPUTF8
# possibly), as well as the canonical Unicode form of the domain,
# which is better for display purposes. This should also take care
# of RFC 6532 section 3.1's suggestion to apply Unicode NFC
# normalization to addresses.
return {
"ascii_domain": ascii_domain,
"domain": domain_i18n,
}
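The special-use check near the end of `validate_email_domain_name` matches whole labels, not substrings. A minimal standard-library sketch of that suffix test follows; `SPECIAL_USE_SAMPLE` is an illustrative subset chosen here as an assumption (the real `SPECIAL_USE_DOMAIN_NAMES` list is defined in the package's `__init__.py`), and `is_special_use` is a hypothetical helper that returns a boolean instead of raising.

```python
# Illustrative subset only; an assumption, not the package's actual list.
SPECIAL_USE_SAMPLE = ["invalid", "local", "localhost", "test"]


def is_special_use(ascii_domain, test_environment=False, names=SPECIAL_USE_SAMPLE):
    """Hypothetical helper mirroring the equality-or-dotted-suffix test above."""
    for d in names:
        # "test" is tolerated when running in a test environment.
        if d == "test" and test_environment:
            continue
        # Match the name itself or any subdomain of it, never a bare substring.
        if ascii_domain == d or ascii_domain.endswith("." + d):
            return True
    return False


print(is_special_use("mail.localhost"))    # matches the ".localhost" suffix
print(is_special_use("notlocalhost.com"))  # substring only, so not a match
```

Prefixing the candidate with a period before calling `endswith` is what keeps a domain such as `contest` from being treated as the reserved name `test`.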
def validate_email_length(addrinfo: ValidatedEmail) -> None:
# There are three forms of the email address whose length must be checked:
#
# 1) The original email address string. Since callers may continue to use
# this string, even though we recommend using the normalized form, we
# should not pass validation when the original input is not valid. This
# form is checked first because it is the original input.
# 2) The normalized email address. We perform Unicode NFC normalization of
# the local part, we normalize the domain to internationalized characters
# (if originally IDNA ASCII) which also includes Unicode normalization,
# and we may remove quotes in quoted local parts. We recommend that
# callers use this string, so it must be valid.
# 3) The email address with the IDNA ASCII representation of the domain
# name, since this string may be used with email stacks that don't
# support UTF-8. Since this is the least likely to be used by callers,
# it is checked last. Note that ascii_email will only be set if the
# local part is ASCII, but conceivably the caller may combine an
# internationalized local part with an ASCII domain, so we check this
# on that combination also. Since we only return the normalized local
# part, we use that (and not the unnormalized local part).
#
# In all cases, the length is checked in UTF-8 because the SMTPUTF8
# extension to SMTP validates the length in bytes.
addresses_to_check = [
(addrinfo.original, None),
(addrinfo.normalized, "after normalization"),
((addrinfo.ascii_local_part or addrinfo.local_part or "") + "@" + addrinfo.ascii_domain, "when the part after the @-sign is converted to IDNA ASCII"),
]
for addr, reason in addresses_to_check:
addr_len = len(addr)
addr_utf8_len = len(addr.encode("utf8"))
diff = addr_utf8_len - EMAIL_MAX_LENGTH
if diff > 0:
if reason is None and addr_len == addr_utf8_len:
# If there is no normalization or transcoding,
# we can give a simple count of the number of
# characters over the limit.
reason = get_length_reason(addr, limit=EMAIL_MAX_LENGTH)
elif reason is None:
# If there is no normalization but there is
# some transcoding to UTF-8, we can compute
# the minimum number of characters over the
# limit by dividing the number of bytes over
# the limit by the maximum number of bytes
# per character.
mbpc = max(len(c.encode("utf8")) for c in addr)
mchars = max(1, diff // mbpc)
suffix = "s" if diff > 1 else ""
if mchars == diff:
reason = f"({diff} character{suffix} too many)"
else:
reason = f"({mchars}-{diff} character{suffix} too many)"
else:
# Since there is normalization, the number of
# characters in the input that need to change is
# impossible to know.
suffix = "s" if diff > 1 else ""
reason += f" ({diff} byte{suffix} too many)"
raise EmailSyntaxError(f"The email address is too long {reason}.")
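The "minimum number of characters over the limit" estimate in `validate_email_length` is easier to follow with numbers. The sketch below reproduces only that arithmetic with the standard library; `chars_over_limit` is a hypothetical helper, and the limit of 254 is an assumption standing in for `EMAIL_MAX_LENGTH`.

```python
ASSUMED_EMAIL_MAX_LENGTH = 254  # assumption standing in for EMAIL_MAX_LENGTH


def chars_over_limit(addr, limit=ASSUMED_EMAIL_MAX_LENGTH):
    """Hypothetical helper: return (min_chars, max_chars) over the limit, or None."""
    diff = len(addr.encode("utf8")) - limit
    if diff <= 0:
        return None
    # The widest character bounds how many bytes one removed character can save.
    mbpc = max(len(c.encode("utf8")) for c in addr)
    # At least this many characters must go; at most `diff` (if all are 1 byte).
    return max(1, diff // mbpc), diff


addr = "\u00e9" * 130 + "@example.com"  # 130 two-byte characters plus 12 ASCII
print(chars_over_limit(addr))
```

Here the address is 272 bytes, so it is 18 bytes over; because the widest character takes 2 bytes, removing between 9 and 18 characters brings it under the limit, which is the range reported in the error message.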
class DomainLiteralValidationResult(TypedDict):
    domain_address: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
    domain: str
def validate_email_domain_literal(domain_literal: str) -> DomainLiteralValidationResult:
    # This is obscure domain-literal syntax. Parse it and return
    # a compressed/normalized address.
    # RFC 5321 4.1.3 and RFC 5322 3.4.1.
    addr: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
    # Try to parse the domain literal as an IPv4 address.
    # There is no tag for IPv4 addresses, so we can never
    # be sure if the user intends an IPv4 address.
    if re.match(r"^[0-9\.]+$", domain_literal):
        try:
            addr = ipaddress.IPv4Address(domain_literal)
        except ValueError as e:
            raise EmailSyntaxError(f"The address in brackets after the @-sign is not valid: It is not an IPv4 address ({e}) or is missing an address literal tag.") from e
        # Return the IPv4Address object and the domain back unchanged.
        return {
            "domain_address": addr,
            "domain": f"[{addr}]",
        }
    # If it begins with "IPv6:" it's an IPv6 address.
    if domain_literal.startswith("IPv6:"):
        try:
            addr = ipaddress.IPv6Address(domain_literal[5:])
        except ValueError as e:
            raise EmailSyntaxError(f"The IPv6 address in brackets after the @-sign is not valid ({e}).") from e
        # Return the IPv6Address object and construct a normalized
        # domain literal.
        return {
            "domain_address": addr,
            "domain": f"[IPv6:{addr.compressed}]",
        }
    # Nothing else is valid.
    if ":" not in domain_literal:
        raise EmailSyntaxError("The part after the @-sign in brackets is not an IPv4 address and has no address literal tag.")
    # The tag (the part before the colon) has character restrictions,
    # but since it must come from a registry of tags (in which only "IPv6" is defined),
    # there's no need to check the syntax of the tag. See RFC 5321 4.1.2.
    # Check for permitted ASCII characters. This actually doesn't matter
    # since there will be an exception after anyway.
    bad_chars = {
        safe_character_display(c)
        for c in domain_literal
        if not DOMAIN_LITERAL_CHARS.match(c)
    }
    if bad_chars:
        raise EmailSyntaxError("The part after the @-sign contains invalid characters in brackets: " + ", ".join(sorted(bad_chars)) + ".")
    # There are no other domain literal tags.
    # https://www.iana.org/assignments/address-literal-tags/address-literal-tags.xhtml
    raise EmailSyntaxError("The part after the @-sign contains an invalid address literal tag in brackets.")
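The normalization performed on domain literals relies on the standard `ipaddress` module. A minimal sketch of the two accepted forms follows; `normalize_domain_literal` is a hypothetical helper that raises a plain `ValueError` and skips the character checks, so it is a simplification of the function above and not a replacement for it.

```python
import ipaddress
import re


def normalize_domain_literal(literal):
    """Hypothetical helper: return the bracketed, normalized form of a literal."""
    # An untagged literal made of digits and dots is treated as IPv4.
    if re.match(r"^[0-9\.]+$", literal):
        return f"[{ipaddress.IPv4Address(literal)}]"
    # The only registered address literal tag is "IPv6".
    if literal.startswith("IPv6:"):
        addr = ipaddress.IPv6Address(literal[5:])
        return f"[IPv6:{addr.compressed}]"
    raise ValueError("not an IPv4 address and no recognized address literal tag")


print(normalize_domain_literal("IPv6:2001:0db8:0000:0000:0000:0000:0000:0001"))
```

The `compressed` attribute collapses the zero groups, so differently spelled but equal IPv6 literals end up with the same normalized domain.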
from typing import Optional, Union, TYPE_CHECKING
import unicodedata
from .exceptions_types import EmailSyntaxError, ValidatedEmail
from .syntax import split_email, validate_email_local_part, validate_email_domain_name, validate_email_domain_literal, validate_email_length
from .rfc_constants import CASE_INSENSITIVE_MAILBOX_NAMES
if TYPE_CHECKING:
    import dns.resolver
    _Resolver = dns.resolver.Resolver
else:
    _Resolver = object
def validate_email(
email: Union[str, bytes],
/, # prior arguments are positional-only
*, # subsequent arguments are keyword-only
allow_smtputf8: Optional[bool] = None,
allow_empty_local: bool = False,
allow_quoted_local: Optional[bool] = None,
allow_domain_literal: Optional[bool] = None,
allow_display_name: Optional[bool] = None,
check_deliverability: Optional[bool] = None,
test_environment: Optional[bool] = None,
globally_deliverable: Optional[bool] = None,
timeout: Optional[int] = None,
dns_resolver: Optional[_Resolver] = None
) -> ValidatedEmail:
"""
Given an email address, and some options, returns a ValidatedEmail instance
with information about the address if it is valid or, if the address is not
valid, raises an EmailNotValidError. This is the main function of the module.
"""
# Fill in default values of arguments.
from . import ALLOW_SMTPUTF8, ALLOW_QUOTED_LOCAL, ALLOW_DOMAIN_LITERAL, ALLOW_DISPLAY_NAME, \
GLOBALLY_DELIVERABLE, CHECK_DELIVERABILITY, TEST_ENVIRONMENT, DEFAULT_TIMEOUT
if allow_smtputf8 is None:
allow_smtputf8 = ALLOW_SMTPUTF8
if allow_quoted_local is None:
allow_quoted_local = ALLOW_QUOTED_LOCAL
if allow_domain_literal is None:
allow_domain_literal = ALLOW_DOMAIN_LITERAL
if allow_display_name is None:
allow_display_name = ALLOW_DISPLAY_NAME
if check_deliverability is None:
check_deliverability = CHECK_DELIVERABILITY
if test_environment is None:
test_environment = TEST_ENVIRONMENT
if globally_deliverable is None:
globally_deliverable = GLOBALLY_DELIVERABLE
if timeout is None and dns_resolver is None:
timeout = DEFAULT_TIMEOUT
# Allow email to be a str or bytes instance. If bytes,
# it must be ASCII because that's how the bytes work
# on the wire with SMTP.
if not isinstance(email, str):
try:
email = email.decode("ascii")
except ValueError as e:
raise EmailSyntaxError("The email address is not valid ASCII.") from e
# Split the address into the display name (or None), the local part
# (before the @-sign), and the domain part (after the @-sign).
# Normally, there is only one @-sign. But the awkward "quoted string"
# local part form (RFC 5321 4.1.2) allows @-signs in the local
# part if the local part is quoted.
display_name, local_part, domain_part, is_quoted_local_part \
= split_email(email)
# Collect return values in this instance.
ret = ValidatedEmail()
ret.original = ((local_part if not is_quoted_local_part
else ('"' + local_part + '"'))
+ "@" + domain_part) # drop the display name, if any, for email length tests at the end
ret.display_name = display_name
# Validate the email address's local part syntax and get a normalized form.
# If the original address was quoted and the decoded local part is a valid
# unquoted local part, then we'll get back a normalized (unescaped) local
# part.
local_part_info = validate_email_local_part(local_part,
allow_smtputf8=allow_smtputf8,
allow_empty_local=allow_empty_local,
quoted_local_part=is_quoted_local_part)
ret.local_part = local_part_info["local_part"]
ret.ascii_local_part = local_part_info["ascii_local_part"]
ret.smtputf8 = local_part_info["smtputf8"]
# RFC 6532 section 3.1 says that Unicode NFC normalization should be applied,
# so we'll return the NFC-normalized local part. Since the caller may use that
# string in place of the original string, ensure it is also valid.
normalized_local_part = unicodedata.normalize("NFC", ret.local_part)
if normalized_local_part != ret.local_part:
try:
validate_email_local_part(normalized_local_part,
allow_smtputf8=allow_smtputf8,
allow_empty_local=allow_empty_local,
quoted_local_part=is_quoted_local_part)
except EmailSyntaxError as e:
raise EmailSyntaxError("After Unicode normalization: " + str(e)) from e
ret.local_part = normalized_local_part
# If a quoted local part isn't allowed but is present, now raise an exception.
# This is done after any exceptions raised by validate_email_local_part so
# that mandatory checks have highest precedence.
if is_quoted_local_part and not allow_quoted_local:
raise EmailSyntaxError("Quoting the part before the @-sign is not allowed here.")
# Some local parts are required to be case-insensitive, so we should normalize
# to lowercase.
# RFC 2142
if ret.ascii_local_part is not None \
and ret.ascii_local_part.lower() in CASE_INSENSITIVE_MAILBOX_NAMES \
and ret.local_part is not None:
ret.ascii_local_part = ret.ascii_local_part.lower()
ret.local_part = ret.local_part.lower()
# Validate the email address's domain part syntax and get a normalized form.
is_domain_literal = False
if len(domain_part) == 0:
raise EmailSyntaxError("There must be something after the @-sign.")
elif domain_part.startswith("[") and domain_part.endswith("]"):
# Parse the address in the domain literal and get back a normalized domain.
domain_literal_info = validate_email_domain_literal(domain_part[1:-1])
if not allow_domain_literal:
raise EmailSyntaxError("A bracketed IP address after the @-sign is not allowed here.")
ret.domain = domain_literal_info["domain"]
ret.ascii_domain = domain_literal_info["domain"] # Domain literals are always ASCII.
ret.domain_address = domain_literal_info["domain_address"]
is_domain_literal = True # Prevent deliverability checks.
else:
# Check the syntax of the domain and get back a normalized
# internationalized and ASCII form.
domain_name_info = validate_email_domain_name(domain_part, test_environment=test_environment, globally_deliverable=globally_deliverable)
ret.domain = domain_name_info["domain"]
ret.ascii_domain = domain_name_info["ascii_domain"]
# Construct the complete normalized form.
ret.normalized = ret.local_part + "@" + ret.domain
# If the email address has an ASCII form, add it.
if not ret.smtputf8:
if not ret.ascii_domain:
raise Exception("Missing ASCII domain.")
ret.ascii_email = (ret.ascii_local_part or "") + "@" + ret.ascii_domain
else:
ret.ascii_email = None
# Check the length of the address.
validate_email_length(ret)
# Check that a display name is permitted. It's the last syntax check
# because we always check against optional parsing features last.
if display_name is not None and not allow_display_name:
raise EmailSyntaxError("A display name and angle brackets around the email address are not permitted here.")
if check_deliverability and not test_environment:
# Validate the email address's deliverability using DNS
# and update the returned ValidatedEmail object with metadata.
if is_domain_literal:
# There is nothing to check --- skip deliverability checks.
return ret
# Lazy load `deliverability` as it is slow to import (due to dns.resolver)
from .deliverability import validate_email_deliverability
deliverability_info = validate_email_deliverability(
ret.ascii_domain, ret.domain, timeout, dns_resolver
)
mx = deliverability_info.get("mx")
if mx is not None:
ret.mx = mx
ret.mx_fallback_type = deliverability_info.get("mx_fallback_type")
return ret
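The NFC step that `validate_email` applies to the local part can be observed with the standard library alone. This is a small sketch with a made-up local part; it shows only the normalization call, not the re-validation that follows it in the function above.

```python
import unicodedata

# "jose" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form).
decomposed = "jose\u0301"

# NFC composes the base letter and the combining mark into U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(composed == "jos\u00e9")
```

Because the composed string differs from the input, the function above validates the normalized local part a second time before returning it to the caller.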
Metadata-Version: 2.1
Name: pydantic-settings
Version: 2.0.3
Summary: Settings management using Pydantic
Project-URL: Homepage, https://github.com/pydantic/pydantic-settings
Project-URL: Funding, https://github.com/sponsors/samuelcolvin
Project-URL: Source, https://github.com/pydantic/pydantic-settings
Project-URL: Changelog, https://github.com/pydantic/pydantic-settings/releases
Project-URL: Documentation, https://docs.pydantic.dev/dev-v2/usage/pydantic_settings/
Author-email: Samuel Colvin <s@muelcolvin.com>, Eric Jolibois <em.jolibois@gmail.com>, Hasan Ramezani <hasan.r67@gmail.com>
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Environment :: MacOS X
Classifier: Framework :: Pydantic
Classifier: Framework :: Pydantic :: 2
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Internet
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
Requires-Dist: pydantic>=2.0.1
Requires-Dist: python-dotenv>=0.21.0
Description-Content-Type: text/markdown
# pydantic-settings
[![CI](https://github.com/pydantic/pydantic-settings/workflows/CI/badge.svg?event=push)](https://github.com/pydantic/pydantic-settings/actions?query=event%3Apush+branch%3Amain+workflow%3ACI)
[![Coverage](https://codecov.io/gh/pydantic/pydantic-settings/branch/main/graph/badge.svg)](https://codecov.io/gh/pydantic/pydantic-settings)
[![pypi](https://img.shields.io/pypi/v/pydantic-settings.svg)](https://pypi.python.org/pypi/pydantic-settings)
[![license](https://img.shields.io/github/license/pydantic/pydantic-settings.svg)](https://github.com/pydantic/pydantic-settings/blob/main/LICENSE)
Settings management using Pydantic. This is the new official home of Pydantic's `BaseSettings`.
This package was kindly donated to the [Pydantic organisation](https://github.com/pydantic) by Daniel Daniels, see [pydantic/pydantic#4492](https://github.com/pydantic/pydantic/pull/4492) for discussion.
For the old "Hipster-orgazmic tool to mange application settings" package, see [version 0.2.5](https://pypi.org/project/pydantic-settings/0.2.5/).
See [documentation](https://docs.pydantic.dev/latest/usage/pydantic_settings/) for more details.
pydantic_settings-2.0.3.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
pydantic_settings-2.0.3.dist-info/METADATA,sha256=iuDM6bM6VDeLKrOyfSRQiE4Bp_SqFNmDvNYxjNlojEU,2924
pydantic_settings-2.0.3.dist-info/RECORD,,
pydantic_settings-2.0.3.dist-info/REQUESTED,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
pydantic_settings-2.0.3.dist-info/WHEEL,sha256=9QBuHhg6FNW7lppboF2vKVbCGTVzsFykgRQjjlajrhA,87
pydantic_settings-2.0.3.dist-info/licenses/LICENSE,sha256=6zVadT4CA0bTPYO_l2kTW4n8YQVorFMaAcKVvO5_2Zg,1103
pydantic_settings/__init__.py,sha256=h0HRyW_I6s0YYFIB-qx8gNZOtDI8vCbXnwPbp4BqwzE,482
pydantic_settings/__pycache__/__init__.cpython-39.pyc,,
pydantic_settings/__pycache__/main.cpython-39.pyc,,
pydantic_settings/__pycache__/sources.cpython-39.pyc,,
pydantic_settings/__pycache__/utils.cpython-39.pyc,,
pydantic_settings/__pycache__/version.cpython-39.pyc,,
pydantic_settings/main.py,sha256=DPJPyjM9g7CgaB8-zuoydot1iYVuLOb05rJZUXDt1-o,7178
pydantic_settings/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
pydantic_settings/sources.py,sha256=ruCzD_1mL9e20o-33B7n46cTE5COCJ0524w29uED5BM,24857
pydantic_settings/utils.py,sha256=nomYSaFO_IegfWSL9KJ8SAtLZgyhcruLgE3dTHwSmgo,557
pydantic_settings/version.py,sha256=gemzbOzXm8MxToVh3wokBkbvZFRFfCkFQumP9kJFca4,18
Wheel-Version: 1.0
Generator: hatchling 1.18.0
Root-Is-Purelib: true
Tag: py3-none-any
The MIT License (MIT)
Copyright (c) 2022 Samuel Colvin and other contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
from .main import BaseSettings, SettingsConfigDict
from .sources import (
DotEnvSettingsSource,
EnvSettingsSource,
InitSettingsSource,
PydanticBaseSettingsSource,
SecretsSettingsSource,
)
from .version import VERSION
__all__ = (
'BaseSettings',
'DotEnvSettingsSource',
'EnvSettingsSource',
'InitSettingsSource',
'PydanticBaseSettingsSource',
'SecretsSettingsSource',
'SettingsConfigDict',
'__version__',
)
__version__ = VERSION
from __future__ import annotations as _annotations
from pathlib import Path
from typing import Any, ClassVar
from pydantic import ConfigDict
from pydantic._internal._config import config_keys
from pydantic._internal._utils import deep_update
from pydantic.main import BaseModel
from .sources import (
ENV_FILE_SENTINEL,
DotEnvSettingsSource,
DotenvType,
EnvSettingsSource,
InitSettingsSource,
PydanticBaseSettingsSource,
SecretsSettingsSource,
)
class SettingsConfigDict(ConfigDict, total=False):
    case_sensitive: bool
    env_prefix: str
    env_file: DotenvType | None
    env_file_encoding: str | None
    env_nested_delimiter: str | None
    secrets_dir: str | Path | None
# Extend `config_keys` with the pydantic-settings config keys to
# support setting config through class kwargs.
# Pydantic uses `config_keys` in `pydantic._internal._config.ConfigWrapper.for_model`
# to extract config keys from model kwargs. By adding the pydantic-settings keys to
# `config_keys`, they are treated as valid config keys and collected
# by Pydantic.
config_keys |= set(SettingsConfigDict.__annotations__.keys())
class BaseSettings(BaseModel):
"""
Base class for settings, allowing values to be overridden by environment variables.
This is useful in production for secrets you do not wish to save in code; it plays nicely with docker(-compose),
Heroku and any 12 factor app design.
All the below attributes can be set via `model_config`.
Args:
_case_sensitive: Whether environment variable names should be read with case-sensitivity. Defaults to `None`.
_env_prefix: Prefix for all environment variables. Defaults to `None`.
_env_file: The env file(s) to load settings values from. Defaults to `Path('')`, which
means that the value from `model_config['env_file']` should be used. You can also pass
`None` to indicate that environment variables should not be loaded from an env file.
_env_file_encoding: The env file encoding, e.g. `'latin-1'`. Defaults to `None`.
_env_nested_delimiter: The nested env values delimiter. Defaults to `None`.
_secrets_dir: The secret files directory. Defaults to `None`.
"""
def __init__(
__pydantic_self__,
_case_sensitive: bool | None = None,
_env_prefix: str | None = None,
_env_file: DotenvType | None = ENV_FILE_SENTINEL,
_env_file_encoding: str | None = None,
_env_nested_delimiter: str | None = None,
_secrets_dir: str | Path | None = None,
**values: Any,
) -> None:
# Uses something other than `self` as the first arg to allow "self" as a settable attribute
super().__init__(
**__pydantic_self__._settings_build_values(
values,
_case_sensitive=_case_sensitive,
_env_prefix=_env_prefix,
_env_file=_env_file,
_env_file_encoding=_env_file_encoding,
_env_nested_delimiter=_env_nested_delimiter,
_secrets_dir=_secrets_dir,
)
)
@classmethod
def settings_customise_sources(
cls,
settings_cls: type[BaseSettings],
init_settings: PydanticBaseSettingsSource,
env_settings: PydanticBaseSettingsSource,
dotenv_settings: PydanticBaseSettingsSource,
file_secret_settings: PydanticBaseSettingsSource,
) -> tuple[PydanticBaseSettingsSource, ...]:
"""
Define the sources and their order for loading the settings values.
Args:
settings_cls: The Settings class.
init_settings: The `InitSettingsSource` instance.
env_settings: The `EnvSettingsSource` instance.
dotenv_settings: The `DotEnvSettingsSource` instance.
file_secret_settings: The `SecretsSettingsSource` instance.
Returns:
A tuple containing the sources and their order for loading the settings values.
"""
return init_settings, env_settings, dotenv_settings, file_secret_settings
def _settings_build_values(
self,
init_kwargs: dict[str, Any],
_case_sensitive: bool | None = None,
_env_prefix: str | None = None,
_env_file: DotenvType | None = None,
_env_file_encoding: str | None = None,
_env_nested_delimiter: str | None = None,
_secrets_dir: str | Path | None = None,
) -> dict[str, Any]:
# Determine settings config values
case_sensitive = _case_sensitive if _case_sensitive is not None else self.model_config.get('case_sensitive')
env_prefix = _env_prefix if _env_prefix is not None else self.model_config.get('env_prefix')
env_file = _env_file if _env_file != ENV_FILE_SENTINEL else self.model_config.get('env_file')
env_file_encoding = (
_env_file_encoding if _env_file_encoding is not None else self.model_config.get('env_file_encoding')
)
env_nested_delimiter = (
_env_nested_delimiter
if _env_nested_delimiter is not None
else self.model_config.get('env_nested_delimiter')
)
secrets_dir = _secrets_dir if _secrets_dir is not None else self.model_config.get('secrets_dir')
# Configure built-in sources
init_settings = InitSettingsSource(self.__class__, init_kwargs=init_kwargs)
env_settings = EnvSettingsSource(
self.__class__,
case_sensitive=case_sensitive,
env_prefix=env_prefix,
env_nested_delimiter=env_nested_delimiter,
)
dotenv_settings = DotEnvSettingsSource(
self.__class__,
env_file=env_file,
env_file_encoding=env_file_encoding,
case_sensitive=case_sensitive,
env_prefix=env_prefix,
env_nested_delimiter=env_nested_delimiter,
)
file_secret_settings = SecretsSettingsSource(
self.__class__, secrets_dir=secrets_dir, case_sensitive=case_sensitive, env_prefix=env_prefix
)
# Provide a hook to set built-in sources priority and add / remove sources
sources = self.settings_customise_sources(
self.__class__,
init_settings=init_settings,
env_settings=env_settings,
dotenv_settings=dotenv_settings,
file_secret_settings=file_secret_settings,
)
if sources:
return deep_update(*reversed([source() for source in sources]))
else:
# no one should mean to do this, but I think returning an empty dict is marginally preferable
# to an informative error and much better than a confusing error
return {}
model_config: ClassVar[SettingsConfigDict] = SettingsConfigDict(
extra='forbid',
arbitrary_types_allowed=True,
validate_default=True,
case_sensitive=False,
env_prefix='',
env_file=None,
env_file_encoding=None,
env_nested_delimiter=None,
secrets_dir=None,
protected_namespaces=('model_', 'settings_'),
)
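The priority rule in `_settings_build_values` (sources are merged in reverse, so the first source returned by `settings_customise_sources` wins) can be sketched with plain dictionaries. This is only an illustration: `deep_merge` is a hypothetical stand-in for pydantic's internal `deep_update`, and the sample values are made up.

```python
from typing import Any


def deep_merge(base: dict[str, Any], *updates: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for pydantic's internal `deep_update`."""
    merged = dict(base)
    for update in updates:
        for key, value in update.items():
            if isinstance(merged.get(key), dict) and isinstance(value, dict):
                # Merge nested dicts key by key instead of replacing them
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value
    return merged


# Sources listed from highest to lowest priority, in the order returned by
# settings_customise_sources: init kwargs, environment, dotenv file, secrets.
sources = [
    {"app_mode": "biz"},                              # init kwargs
    {"app_mode": "edu", "db": {"host": "env-host"}},  # environment
    {"db": {"host": "file-host", "port": "27017"}},   # dotenv file
    {},                                               # secrets dir
]

# Reversing puts the lowest-priority source first, so later updates win.
values = deep_merge(*reversed(sources))
print(values)
```

Under these assumptions the init kwarg overrides the environment value for `app_mode`, while `db.port` survives from the dotenv file because no higher-priority source sets it.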
from __future__ import annotations as _annotations
import json
import os
import warnings
from abc import ABC, abstractmethod
from collections import deque
from dataclasses import is_dataclass
from pathlib import Path
from typing import TYPE_CHECKING, Any, List, Mapping, Sequence, Tuple, Union, cast
from pydantic import AliasChoices, AliasPath, BaseModel, Json
from pydantic._internal._typing_extra import origin_is_union
from pydantic._internal._utils import deep_update, lenient_issubclass
from pydantic.fields import FieldInfo
from typing_extensions import get_args, get_origin
from pydantic_settings.utils import path_type_label
if TYPE_CHECKING:
from pydantic_settings.main import BaseSettings
DotenvType = Union[Path, str, List[Union[Path, str]], Tuple[Union[Path, str], ...]]
# This is used as default value for `_env_file` in the `BaseSettings` class and
# `env_file` in `DotEnvSettingsSource` so the default can be distinguished from `None`.
# See the docstring of `BaseSettings` for more details.
ENV_FILE_SENTINEL: DotenvType = Path('')
class SettingsError(ValueError):
pass
class PydanticBaseSettingsSource(ABC):
"""
Abstract base class for settings sources; every settings source class should inherit from it.
"""
def __init__(self, settings_cls: type[BaseSettings]):
self.settings_cls = settings_cls
self.config = settings_cls.model_config
@abstractmethod
def get_field_value(self, field: FieldInfo, field_name: str) -> tuple[Any, str, bool]:
"""
Gets the value, the key for model creation, and a flag to determine whether value is complex.
This is an abstract method that should be overridden in every settings source class.
Args:
field: The field.
field_name: The field name.
Returns:
A tuple containing the value, the key, and a flag to determine whether the value is complex.
"""
pass
def field_is_complex(self, field: FieldInfo) -> bool:
"""
Checks whether a field is complex, in which case it will attempt to be parsed as JSON.
Args:
field: The field.
Returns:
Whether the field is complex.
"""
return _annotation_is_complex(field.annotation, field.metadata)
def prepare_field_value(self, field_name: str, field: FieldInfo, value: Any, value_is_complex: bool) -> Any:
"""
Prepares the value of a field.
Args:
field_name: The field name.
field: The field.
value: The value of the field that has to be prepared.
value_is_complex: A flag to determine whether value is complex.
Returns:
The prepared value.
"""
if value is not None and (self.field_is_complex(field) or value_is_complex):
return self.decode_complex_value(field_name, field, value)
return value
def decode_complex_value(self, field_name: str, field: FieldInfo, value: Any) -> Any:
"""
Decode the value for a complex field
Args:
field_name: The field name.
field: The field.
value: The value of the field that has to be prepared.
Returns:
The decoded value for further preparation
"""
return json.loads(value)
@abstractmethod
def __call__(self) -> dict[str, Any]:
pass
class InitSettingsSource(PydanticBaseSettingsSource):
"""
Source class for loading values provided during settings class initialization.
"""
def __init__(self, settings_cls: type[BaseSettings], init_kwargs: dict[str, Any]):
self.init_kwargs = init_kwargs
super().__init__(settings_cls)
def get_field_value(self, field: FieldInfo, field_name: str) -> tuple[Any, str, bool]:
# Nothing to do here. Only implement the return statement to make mypy happy
return None, '', False
def __call__(self) -> dict[str, Any]:
return self.init_kwargs
def __repr__(self) -> str:
return f'InitSettingsSource(init_kwargs={self.init_kwargs!r})'
class PydanticBaseEnvSettingsSource(PydanticBaseSettingsSource):
def __init__(
self, settings_cls: type[BaseSettings], case_sensitive: bool | None = None, env_prefix: str | None = None
) -> None:
super().__init__(settings_cls)
self.case_sensitive = case_sensitive if case_sensitive is not None else self.config.get('case_sensitive', False)
self.env_prefix = env_prefix if env_prefix is not None else self.config.get('env_prefix', '')
def _apply_case_sensitive(self, value: str) -> str:
return value.lower() if not self.case_sensitive else value
def _extract_field_info(self, field: FieldInfo, field_name: str) -> list[tuple[str, str, bool]]:
"""
Extracts field info. This info is used to get the value of field from environment variables.
It returns a list of tuples, each tuple contains:
* field_key: The key of field that has to be used in model creation.
* env_name: The environment variable name of the field.
* value_is_complex: A flag to determine whether the value from environment variable
is complex and has to be parsed.
Args:
field (FieldInfo): The field.
field_name (str): The field name.
Returns:
list[tuple[str, str, bool]]: List of tuples, each tuple contains field_key, env_name, and value_is_complex.
"""
field_info: list[tuple[str, str, bool]] = []
if isinstance(field.validation_alias, (AliasChoices, AliasPath)):
v_alias: str | list[str | int] | list[list[str | int]] | None = field.validation_alias.convert_to_aliases()
else:
v_alias = field.validation_alias
if v_alias:
if isinstance(v_alias, list): # AliasChoices, AliasPath
for alias in v_alias:
if isinstance(alias, str): # AliasPath
field_info.append((alias, self._apply_case_sensitive(alias), True if len(alias) > 1 else False))
elif isinstance(alias, list): # AliasChoices
first_arg = cast(str, alias[0]) # first item of an AliasChoices must be a str
field_info.append(
(first_arg, self._apply_case_sensitive(first_arg), True if len(alias) > 1 else False)
)
else: # string validation alias
field_info.append((v_alias, self._apply_case_sensitive(v_alias), False))
else:
field_info.append((field_name, self._apply_case_sensitive(self.env_prefix + field_name), False))
return field_info
def _replace_field_names_case_insensitively(self, field: FieldInfo, field_values: dict[str, Any]) -> dict[str, Any]:
"""
Replace field names in the values dict by looking up the model's fields case-insensitively.
Given the following models:
```py
class SubSubSub(BaseModel):
VaL3: str
class SubSub(BaseModel):
Val2: str
SUB_sub_SuB: SubSubSub
class Sub(BaseModel):
VAL1: str
SUB_sub: SubSub
class Settings(BaseSettings):
nested: Sub
model_config = SettingsConfigDict(env_nested_delimiter='__')
```
Then:
_replace_field_names_case_insensitively(
field,
{"val1": "v1", "sub_SUB": {"VAL2": "v2", "sub_SUB_sUb": {"vAl3": "v3"}}}
)
Returns {'VAL1': 'v1', 'SUB_sub': {'Val2': 'v2', 'SUB_sub_SuB': {'VaL3': 'v3'}}}
"""
values: dict[str, Any] = {}
for name, value in field_values.items():
sub_model_field: FieldInfo | None = None
# This is here to make mypy happy
# Item "None" of "Optional[Type[Any]]" has no attribute "model_fields"
if not field.annotation or not hasattr(field.annotation, 'model_fields'):
values[name] = value
continue
# Find field in sub model by looking in fields case insensitively
for sub_model_field_name, f in field.annotation.model_fields.items():
if not f.validation_alias and sub_model_field_name.lower() == name.lower():
sub_model_field = f
break
if not sub_model_field:
values[name] = value
continue
if lenient_issubclass(sub_model_field.annotation, BaseModel) and isinstance(value, dict):
values[sub_model_field_name] = self._replace_field_names_case_insensitively(sub_model_field, value)
else:
values[sub_model_field_name] = value
return values
def __call__(self) -> dict[str, Any]:
data: dict[str, Any] = {}
for field_name, field in self.settings_cls.model_fields.items():
try:
field_value, field_key, value_is_complex = self.get_field_value(field, field_name)
except Exception as e:
raise SettingsError(
f'error getting value for field "{field_name}" from source "{self.__class__.__name__}"'
) from e
try:
field_value = self.prepare_field_value(field_name, field, field_value, value_is_complex)
except ValueError as e:
raise SettingsError(
f'error parsing value for field "{field_name}" from source "{self.__class__.__name__}"'
) from e
if field_value is not None:
if (
not self.case_sensitive
and lenient_issubclass(field.annotation, BaseModel)
and isinstance(field_value, dict)
):
data[field_key] = self._replace_field_names_case_insensitively(field, field_value)
else:
data[field_key] = field_value
return data
class SecretsSettingsSource(PydanticBaseEnvSettingsSource):
"""
Source class for loading settings values from secret files.
"""
def __init__(
self,
settings_cls: type[BaseSettings],
secrets_dir: str | Path | None = None,
case_sensitive: bool | None = None,
env_prefix: str | None = None,
) -> None:
super().__init__(settings_cls, case_sensitive, env_prefix)
self.secrets_dir = secrets_dir if secrets_dir is not None else self.config.get('secrets_dir')
def __call__(self) -> dict[str, Any]:
"""
Build fields from "secrets" files.
"""
secrets: dict[str, str | None] = {}
if self.secrets_dir is None:
return secrets
self.secrets_path = Path(self.secrets_dir).expanduser()
if not self.secrets_path.exists():
warnings.warn(f'directory "{self.secrets_path}" does not exist')
return secrets
if not self.secrets_path.is_dir():
raise SettingsError(f'secrets_dir must reference a directory, not a {path_type_label(self.secrets_path)}')
return super().__call__()
@classmethod
def find_case_path(cls, dir_path: Path, file_name: str, case_sensitive: bool) -> Path | None:
"""
Find a file within path's directory matching filename, optionally ignoring case.
Args:
dir_path: Directory path.
file_name: File name.
case_sensitive: Whether to search for file name case sensitively.
Returns:
The file path, or `None` if the file does not exist in the directory.
"""
for f in dir_path.iterdir():
if f.name == file_name:
return f
elif not case_sensitive and f.name.lower() == file_name.lower():
return f
return None
def get_field_value(self, field: FieldInfo, field_name: str) -> tuple[Any, str, bool]:
"""
Gets the value for field from secret file and a flag to determine whether value is complex.
Args:
field: The field.
field_name: The field name.
Returns:
A tuple containing the value if the file exists (otherwise `None`), the key, and
a flag to determine whether the value is complex.
"""
for field_key, env_name, value_is_complex in self._extract_field_info(field, field_name):
path = self.find_case_path(self.secrets_path, env_name, self.case_sensitive)
if not path:
# path does not exist, we currently don't return a warning for this
continue
if path.is_file():
return path.read_text().strip(), field_key, value_is_complex
else:
warnings.warn(
f'attempted to load secret file "{path}" but found a {path_type_label(path)} instead.',
stacklevel=4,
)
return None, field_key, value_is_complex
def __repr__(self) -> str:
return f'SecretsSettingsSource(secrets_dir={self.secrets_dir!r})'
class EnvSettingsSource(PydanticBaseEnvSettingsSource):
"""
Source class for loading settings values from environment variables.
"""
def __init__(
self,
settings_cls: type[BaseSettings],
case_sensitive: bool | None = None,
env_prefix: str | None = None,
env_nested_delimiter: str | None = None,
) -> None:
super().__init__(settings_cls, case_sensitive, env_prefix)
self.env_nested_delimiter = (
env_nested_delimiter if env_nested_delimiter is not None else self.config.get('env_nested_delimiter')
)
self.env_prefix_len = len(self.env_prefix)
self.env_vars = self._load_env_vars()
def _load_env_vars(self) -> Mapping[str, str | None]:
if self.case_sensitive:
return os.environ
return {k.lower(): v for k, v in os.environ.items()}
def get_field_value(self, field: FieldInfo, field_name: str) -> tuple[Any, str, bool]:
"""
Gets the value for field from environment variables and a flag to determine whether value is complex.
Args:
field: The field.
field_name: The field name.
Returns:
A tuple containing the value if the environment variable is set (otherwise `None`), the key, and
a flag to determine whether the value is complex.
"""
env_val: str | None = None
for field_key, env_name, value_is_complex in self._extract_field_info(field, field_name):
env_val = self.env_vars.get(env_name)
if env_val is not None:
break
return env_val, field_key, value_is_complex
def prepare_field_value(self, field_name: str, field: FieldInfo, value: Any, value_is_complex: bool) -> Any:
"""
Prepare value for the field.
* Extract value for nested field.
* Deserialize value to python object for complex field.
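Args:
field_name: The field name.
field: The field.
value: The raw value found for the field, if any.
value_is_complex: A flag to determine whether value is complex.
Returns:
The prepared value for the field.
Raises:
ValueError: When there is an error deserializing the value of a complex field.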
"""
is_complex, allow_parse_failure = self._field_is_complex(field)
if is_complex or value_is_complex:
if value is None:
# field is complex but no value found so far, try explode_env_vars
env_val_built = self.explode_env_vars(field_name, field, self.env_vars)
if env_val_built:
return env_val_built
else:
# field is complex and there's a value, decode that as JSON, then add explode_env_vars
try:
value = self.decode_complex_value(field_name, field, value)
except ValueError as e:
if not allow_parse_failure:
raise e
if isinstance(value, dict):
return deep_update(value, self.explode_env_vars(field_name, field, self.env_vars))
else:
return value
elif value is not None:
# simplest case, field is not complex, we only need to add the value if it was found
return value
def _union_is_complex(self, annotation: type[Any] | None, metadata: list[Any]) -> bool:
return any(_annotation_is_complex(arg, metadata) for arg in get_args(annotation))
def _field_is_complex(self, field: FieldInfo) -> tuple[bool, bool]:
"""
Find out if a field is complex, and if so whether JSON errors should be ignored
"""
if self.field_is_complex(field):
allow_parse_failure = False
elif origin_is_union(get_origin(field.annotation)) and self._union_is_complex(field.annotation, field.metadata):
allow_parse_failure = True
else:
return False, False
return True, allow_parse_failure
@staticmethod
def next_field(field: FieldInfo | None, key: str) -> FieldInfo | None:
"""
Find the field in a sub model by key (env name).
Given the following models:
```py
class SubSubModel(BaseSettings):
dvals: Dict
class SubModel(BaseSettings):
vals: list[str]
sub_sub_model: SubSubModel
class Cfg(BaseSettings):
sub_model: SubModel
```
Then:
next_field(sub_model, 'vals') Returns the `vals` field of `SubModel` class
next_field(sub_model, 'sub_sub_model') Returns `sub_sub_model` field of `SubModel` class
Args:
field: The field.
key: The key (env name).
Returns:
Field if it finds the next field otherwise `None`.
"""
if not field or origin_is_union(get_origin(field.annotation)):
# no support for Unions of complex BaseSettings fields
return None
elif field.annotation and hasattr(field.annotation, 'model_fields') and field.annotation.model_fields.get(key):
return field.annotation.model_fields[key]
return None
def explode_env_vars(self, field_name: str, field: FieldInfo, env_vars: Mapping[str, str | None]) -> dict[str, Any]:
"""
Process env_vars and extract the values of keys containing env_nested_delimiter into nested dictionaries.
This is applied to a single field, hence filtering by env_var prefix.
Args:
field_name: The field name.
field: The field.
env_vars: Environment variables.
Returns:
A dictionary containing the values extracted from nested env values.
"""
prefixes = [
f'{env_name}{self.env_nested_delimiter}' for _, env_name, _ in self._extract_field_info(field, field_name)
]
result: dict[str, Any] = {}
for env_name, env_val in env_vars.items():
if not any(env_name.startswith(prefix) for prefix in prefixes):
continue
# we remove the prefix before splitting in case the prefix has characters in common with the delimiter
env_name_without_prefix = env_name[self.env_prefix_len :]
_, *keys, last_key = env_name_without_prefix.split(self.env_nested_delimiter)
env_var = result
target_field: FieldInfo | None = field
for key in keys:
target_field = self.next_field(target_field, key)
env_var = env_var.setdefault(key, {})
# get proper field with last_key
target_field = self.next_field(target_field, last_key)
# check if env_val maps to a complex field and if so, parse the env_val
if target_field and env_val:
is_complex, allow_json_failure = self._field_is_complex(target_field)
if is_complex:
try:
env_val = self.decode_complex_value(last_key, target_field, env_val)
except ValueError as e:
if not allow_json_failure:
raise e
env_var[last_key] = env_val
return result
def __repr__(self) -> str:
return (
f'EnvSettingsSource(env_nested_delimiter={self.env_nested_delimiter!r}, '
f'env_prefix_len={self.env_prefix_len!r})'
)
class DotEnvSettingsSource(EnvSettingsSource):
"""
Source class for loading settings values from env files.
"""
def __init__(
self,
settings_cls: type[BaseSettings],
env_file: DotenvType | None = ENV_FILE_SENTINEL,
env_file_encoding: str | None = None,
case_sensitive: bool | None = None,
env_prefix: str | None = None,
env_nested_delimiter: str | None = None,
) -> None:
self.env_file = env_file if env_file != ENV_FILE_SENTINEL else settings_cls.model_config.get('env_file')
self.env_file_encoding = (
env_file_encoding if env_file_encoding is not None else settings_cls.model_config.get('env_file_encoding')
)
super().__init__(settings_cls, case_sensitive, env_prefix, env_nested_delimiter)
def _load_env_vars(self) -> Mapping[str, str | None]:
return self._read_env_files(self.case_sensitive)
def _read_env_files(self, case_sensitive: bool) -> Mapping[str, str | None]:
env_files = self.env_file
if env_files is None:
return {}
if isinstance(env_files, (str, os.PathLike)):
env_files = [env_files]
dotenv_vars: dict[str, str | None] = {}
for env_file in env_files:
env_path = Path(env_file).expanduser()
if env_path.is_file():
dotenv_vars.update(
read_env_file(env_path, encoding=self.env_file_encoding, case_sensitive=case_sensitive)
)
return dotenv_vars
def __call__(self) -> dict[str, Any]:
data: dict[str, Any] = super().__call__()
data_lower_keys: list[str] = []
if not self.case_sensitive:
data_lower_keys = [x.lower() for x in data.keys()]
# As `extra` config is allowed in dotenv settings source, we have to
# update data with extra env variables from the dotenv file.
for env_name, env_value in self.env_vars.items():
if env_name.startswith(self.env_prefix) and env_value is not None:
env_name_without_prefix = env_name[self.env_prefix_len :]
first_key, *_ = env_name_without_prefix.split(self.env_nested_delimiter)
if (data_lower_keys and first_key not in data_lower_keys) or (
not data_lower_keys and first_key not in data
):
data[first_key] = env_value
return data
def __repr__(self) -> str:
return (
f'DotEnvSettingsSource(env_file={self.env_file!r}, env_file_encoding={self.env_file_encoding!r}, '
f'env_nested_delimiter={self.env_nested_delimiter!r}, env_prefix_len={self.env_prefix_len!r})'
)
def read_env_file(
file_path: Path, *, encoding: str | None = None, case_sensitive: bool = False
) -> Mapping[str, str | None]:
try:
from dotenv import dotenv_values
except ImportError as e:
raise ImportError('python-dotenv is not installed, run `pip install pydantic[dotenv]`') from e
file_vars: dict[str, str | None] = dotenv_values(file_path, encoding=encoding or 'utf8')
if not case_sensitive:
return {k.lower(): v for k, v in file_vars.items()}
else:
return file_vars
def _annotation_is_complex(annotation: type[Any] | None, metadata: list[Any]) -> bool:
if any(isinstance(md, Json) for md in metadata): # type: ignore[misc]
return False
origin = get_origin(annotation)
return (
_annotation_is_complex_inner(annotation)
or _annotation_is_complex_inner(origin)
or hasattr(origin, '__pydantic_core_schema__')
or hasattr(origin, '__get_pydantic_core_schema__')
)
def _annotation_is_complex_inner(annotation: type[Any] | None) -> bool:
if lenient_issubclass(annotation, (str, bytes)):
return False
return lenient_issubclass(annotation, (BaseModel, Mapping, Sequence, tuple, set, frozenset, deque)) or is_dataclass(
annotation
)
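The nested-delimiter handling in `explode_env_vars` can be illustrated with a simplified sketch. It assumes no `env_prefix`, skips the JSON decoding of complex leaf values, and uses a hypothetical helper name `explode_nested`; it is not the library's implementation.

```python
from typing import Any, Mapping


def explode_nested(field_env_name: str, delimiter: str, env_vars: Mapping[str, str]) -> dict[str, Any]:
    """Collect `<field><delimiter>a<delimiter>b=value` entries into nested dicts."""
    prefix = f"{field_env_name}{delimiter}"
    result: dict[str, Any] = {}
    for name, value in env_vars.items():
        if not name.startswith(prefix):
            continue  # variable belongs to another field
        # First segment is the field name itself; the last one is the leaf key
        _, *keys, last_key = name.split(delimiter)
        node = result
        for key in keys:
            node = node.setdefault(key, {})
        node[last_key] = value
    return result


env = {"nested__val1": "v1", "nested__sub__val2": "v2", "unrelated": "x"}
exploded = explode_nested("nested", "__", env)
print(exploded)
```

With `env_nested_delimiter='__'`, each extra delimiter in a variable name adds one level of nesting under the field.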
from pathlib import Path
path_type_labels = {
'is_dir': 'directory',
'is_file': 'file',
'is_mount': 'mount point',
'is_symlink': 'symlink',
'is_block_device': 'block device',
'is_char_device': 'char device',
'is_fifo': 'FIFO',
'is_socket': 'socket',
}
def path_type_label(p: Path) -> str:
"""
Find out what sort of thing a path is.
"""
assert p.exists(), 'path does not exist'
for method, name in path_type_labels.items():
if getattr(p, method)():
return name
return 'unknown'
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
from app.services.auth import get_current_user
from app.models.role import get_role_level
from app.core.config import settings
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/api/v1/auth/login")
......@@ -16,10 +18,57 @@ async def get_current_active_user(token: str = Depends(oauth2_scheme)):
return user
async def get_current_admin_user(current_user: dict = Depends(get_current_active_user)):
"""Get the current admin user."""
if current_user.get("role") != "admin":
"""Get the current admin user (accepts both the admin and administrator names)."""
if current_user.get("role") not in {"admin", "administrator"}:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="权限不足,需要管理员权限"
)
return current_user
\ No newline at end of file
return current_user
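# Generic dependency that checks access by role level
def require_role(min_level: int):
"""
Dependency factory that enforces a minimum role level.
Usage: Depends(require_role(3)) allows only level 3 and above.
Key points:
- Return early to avoid deep nesting;
- Fallback: when the user has no role_level field, derive the level from role.
"""
async def _checker(current_user: dict = Depends(get_current_active_user)):
# Compatibility: prefer role_level; if it is missing, derive it from role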
level = current_user.get("role_level")
if level is None:
level = get_role_level(current_user.get("role"))
if level < min_level:
raise HTTPException(status_code=403, detail="权限不足")
return current_user
return _checker
def require_edition_for_mode():
"""
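Edition run-mode dependency: when the backend is set to APP_MODE=edu or APP_MODE=biz, only users of the matching edition may access.
Usage: mount it once at the router level, for example:
router = APIRouter(dependencies=[Depends(require_edition_for_mode())])
Design notes:
- Return early to avoid deep nesting;
- Reuses the existing auth dependency: it receives current_user, so the token is not parsed twice;
- Fallback and defaults: an unknown APP_MODE value lets the request through, but only "edu" or "biz" are recommended.
"""
async def _edition_checker(current_user: dict = Depends(get_current_active_user)):
mode = (settings.APP_MODE or "edu").lower() # defaults to edu
# Restrict only when the mode is edu or biz; any other (unexpected) value lets the request through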
if mode in {"edu", "biz"}:
user_edition = (current_user.get("edition") or "").lower()
if user_edition != mode:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail=f"当前后端运行模式为 '{mode}',用户版别 '{user_edition}' 无权访问"
)
return current_user
return _edition_checker
\ No newline at end of file
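The decision made inside `_edition_checker` can be isolated as a pure function, which makes the pass-through behaviour for unknown modes easier to see. This is a minimal sketch without FastAPI; `edition_allowed` and `RESTRICTED_MODES` are hypothetical names, not part of the commit.

```python
from typing import Optional

# Modes for which the edition restriction applies (assumed from the dependency above)
RESTRICTED_MODES = {"edu", "biz"}


def edition_allowed(app_mode: Optional[str], user_edition: Optional[str]) -> bool:
    """Return True when a user of `user_edition` may access a backend in `app_mode`."""
    mode = (app_mode or "edu").lower()  # unset mode defaults to edu
    if mode not in RESTRICTED_MODES:
        return True  # unknown modes are let through, as in the dependency
    return (user_edition or "").lower() == mode


print(edition_allowed("biz", "edu"))
```

In the real dependency a `False` result corresponds to raising `HTTPException` with status 403. Note that a user record without an `edition` field is rejected in both restricted modes.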
......@@ -4,7 +4,7 @@ from datetime import datetime
import json
import csv
import io
from app.api.deps import get_current_admin_user
from app.api.deps import get_current_admin_user, require_edition_for_mode
from app.schemas.sensitive_word import (
SensitiveWordCreate, SensitiveWordResponse, SensitiveRecordResponse,
SensitiveWordBulkImport, CategoryCreate, CategoryResponse, CategoriesResponse
......@@ -16,7 +16,8 @@ from app.services.sensitive_word import (
)
from app.models.sensitive_word import SENSITIVE_WORD_CATEGORIES, SENSITIVE_WORD_SUBCATEGORIES
router = APIRouter()
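# Mount the edition run-mode dependency at the router level so management operations run only under the edition matching the current mode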
router = APIRouter(dependencies=[Depends(require_edition_for_mode())])
@router.post("/sensitive-words", response_model=dict, status_code=status.HTTP_201_CREATED)
async def create_sensitive_word(
......
......@@ -5,6 +5,7 @@ from app.core.config import settings
from app.db.mongodb import db
from app.schemas.user import UserCreate, UserResponse, Token
from app.services.auth import authenticate_user, create_access_token, get_password_hash
from app.models.role import get_role_level
router = APIRouter()
......@@ -33,7 +34,9 @@ async def register(user_data: UserCreate):
"username": user_data.username,
"email": user_data.email,
"hashed_password": hashed_password,
"role": "user" # defaults to a regular user
"role": "user", # defaults to a regular user
"role_level": 1, # defaults to level 1
"edition": "edu", # defaults to the education edition; an admin can change it later in the admin UI
}
result = await db.db.users.insert_one(user)
......@@ -45,7 +48,9 @@ async def register(user_data: UserCreate):
"id": str(created_user["_id"]),
"username": created_user["username"],
"email": created_user["email"],
"role": created_user["role"]
"role": created_user.get("role", "user"),
"role_level": created_user.get("role_level", 1),
"edition": created_user.get("edition", "edu"),
}
@router.post("/login", response_model=Token)
......@@ -59,11 +64,22 @@ async def login(form_data: OAuth2PasswordRequestForm = Depends()):
headers={"WWW-Authenticate": "Bearer"},
)
# Create the access token
# Create the access token (the payload carries the user's role and edition)
access_token_expires = timedelta(minutes=settings.ACCESS_TOKEN_EXPIRE_MINUTES)
access_token = create_access_token(
data={"sub": str(user["_id"])},
data={
"sub": str(user["_id"]),
"role": user.get("role", "user"),
"role_level": user.get("role_level", get_role_level(user.get("role"))),
"edition": user.get("edition", "edu"),
},
expires_delta=access_token_expires
)
return {"access_token": access_token, "token_type": "bearer"}
\ No newline at end of file
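# Return basic user info in the login response so the front end can skip a /me call; the backend must still rely on server-side authorization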
return {
"access_token": access_token,
"token_type": "bearer",
"role": user.get("role", "user"),
"role_level": user.get("role_level", get_role_level(user.get("role"))),
"edition": user.get("edition", "edu"),
}
\ No newline at end of file
from fastapi import APIRouter, Depends, HTTPException, status
from app.api.deps import get_current_active_user
from app.api.deps import get_current_active_user, require_edition_for_mode
from app.schemas.conversation import MessageCreate, ConversationResponse
from app.services.conversation import create_conversation, get_conversation, add_message, get_user_conversations
router = APIRouter()
# Mount the edition run-mode dependency at the router level so only users of the current mode's edition can access
router = APIRouter(dependencies=[Depends(require_edition_for_mode())])
@router.post("/", status_code=status.HTTP_201_CREATED)
async def create_new_conversation(current_user: dict = Depends(get_current_active_user)):
......
from fastapi import APIRouter, Depends
from app.api.deps import require_role, require_edition_for_mode
# Mount the edition run-mode dependency at the router level so only the edition configured on the backend can access
router = APIRouter(dependencies=[Depends(require_edition_for_mode())])
@router.get("/summary")
async def dashboard_summary(current_user: dict = Depends(require_role(1))):
"""
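Dashboard summary endpoint: returns different home-page information depending on role level and edition.
- Level 1 is enough to access it, but the returned content grows with level and edition
- Key point: avoid deep nesting; fetch the required information first, then build the view by condition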
"""
role = current_user.get("role", "user")
level = current_user.get("role_level", 1)
edition = current_user.get("edition", "edu")
# Common base information (visible to all roles)
base = {
"welcome": f"欢迎 {current_user.get('username','')} 登录",
"edition": edition,
"role": role,
"role_level": level,
}
# Education edition view
if edition == "edu":
if level == 1:
base.update({
"modules": ["今日出勤", "操行与教师意见"],
})
return base
if level == 2:
base.update({
"modules": ["班级出勤率", "课程/地点", "学生请假", "上级指示"],
})
return base
if level == 3:
base.update({
"modules": ["系部教师出勤", "学生考勤", "课堂异常指标", "上级指示"],
})
return base
if level == 4:
base.update({
"modules": ["校园整体安全", "教师出勤率", "资金预算", "部门进度", "本期目标"],
})
return base
# System administrator: kept separate from the highest business role
base.update({
"modules": ["系统运行", "告警", "运维工具"],
})
return base
# Enterprise edition view (biz)
if level == 1:
base.update({
"modules": ["今日出勤", "绩效", "上级意见"],
})
return base
if level == 2:
base.update({
"modules": ["小组出勤率", "工单/排班", "请假审批", "负责人指示"],
})
return base
if level == 3:
base.update({
"modules": ["部门出勤", "任务进度", "异常指标", "负责人指示"],
})
return base
if level == 4:
base.update({
"modules": ["企业整体安全", "员工出勤率", "资金预算", "部门进度", "战略目标"],
})
return base
base.update({
"modules": ["系统运行", "告警", "运维工具"],
})
return base
\ No newline at end of file
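The branching in `dashboard_summary` amounts to a lookup keyed by edition and level. One possible table-driven sketch of the same mapping follows; it is not how the commit implements it, and `MODULES_BY_EDITION`, `SYSTEM_MODULES`, and `modules_for` are hypothetical names. The module labels are copied from the endpoint above.

```python
SYSTEM_MODULES = ["系统运行", "告警", "运维工具"]

MODULES_BY_EDITION = {
    "edu": {
        1: ["今日出勤", "操行与教师意见"],
        2: ["班级出勤率", "课程/地点", "学生请假", "上级指示"],
        3: ["系部教师出勤", "学生考勤", "课堂异常指标", "上级指示"],
        4: ["校园整体安全", "教师出勤率", "资金预算", "部门进度", "本期目标"],
    },
    "biz": {
        1: ["今日出勤", "绩效", "上级意见"],
        2: ["小组出勤率", "工单/排班", "请假审批", "负责人指示"],
        3: ["部门出勤", "任务进度", "异常指标", "负责人指示"],
        4: ["企业整体安全", "员工出勤率", "资金预算", "部门进度", "战略目标"],
    },
}


def modules_for(edition: str, level: int) -> list[str]:
    """Return the dashboard modules for an edition and role level."""
    # Mirrors the endpoint: anything other than "edu" falls through to the biz view
    edition_key = "edu" if edition == "edu" else "biz"
    # Levels outside 1-4 (including the level-5 administrator) get the system view
    return MODULES_BY_EDITION[edition_key].get(level, SYSTEM_MODULES)


print(modules_for("edu", 2))
```

A table like this keeps the per-level content in data rather than control flow, at the cost of making the fallback to the system view less explicit.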
from fastapi import APIRouter
from app.api.v1 import auth, conversation, admin
from app.api.v1 import auth, conversation, admin, dashboard
api_router = APIRouter()
# Register each module's router
api_router.include_router(auth.router, prefix="/auth", tags=["认证"])
api_router.include_router(conversation.router, prefix="/conversations", tags=["对话"])
api_router.include_router(admin.router, prefix="/admin", tags=["管理员"])
\ No newline at end of file
api_router.include_router(admin.router, prefix="/admin", tags=["管理员"])
api_router.include_router(dashboard.router, prefix="/dashboard", tags=["仪表盘"])
\ No newline at end of file
import os
from pydantic import BaseSettings
# In Pydantic v2, BaseSettings moved to the pydantic-settings package
from pydantic_settings import BaseSettings
from dotenv import load_dotenv
# Load environment variables
......@@ -23,4 +24,9 @@ class Settings(BaseSettings):
OLLAMA_BASE_URL: str = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL: str = os.getenv("OLLAMA_MODEL", "llama2")
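# Application run-mode switch: run either the education edition or the enterprise edition, not both
# Allowed values: "edu" / "biz"; defaults to "edu" when unset
# Note: mixed mode is no longer provided; if you need it, set it explicitly and let it through in the dependency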
APP_MODE: str = os.getenv("APP_MODE", "edu")
settings = Settings()
\ No newline at end of file
"""
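Unified definition of roles and permission levels.
Design notes:
- To avoid "mysterious names" and "duplicated code", the role-level mapping is maintained in a single module.
- The legacy "admin" name is still accepted and maps to the highest level (5).
"""
# Role-level mapping; a larger number means higher privileges
ROLE_ORDER = {
"user": 1, # student / employee
"manager": 2, # homeroom teacher / team lead / second-level department admin
"leader": 3, # middle management / department head / first-level department admin
"master": 4, # principal / group executive / overall lead (highest business role)
"administrator": 5, # system administrator / ops superuser (highest system role)
"admin": 5, # legacy compatibility: `admin` from older code
}
# Valid editions (education / enterprise)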
VALID_EDITIONS = {"edu", "biz"}
def get_role_level(role: str) -> int:
"""Return the permission level for a role string; defaults to level 1.
Key point: return early to avoid deep nesting.
"""
return ROLE_ORDER.get((role or "user").lower(), 1)
\ No newline at end of file
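How the role mapping and the `require_role` fallback fit together can be sketched without FastAPI. The mapping below copies `ROLE_ORDER` from the module above; `has_min_level` is a hypothetical helper introduced only for illustration, and the `Optional[str]` annotation is an assumption reflecting that callers pass `user.get("role")`, which may be `None`.

```python
from typing import Optional

ROLE_ORDER = {
    "user": 1,
    "manager": 2,
    "leader": 3,
    "master": 4,
    "administrator": 5,
    "admin": 5,  # legacy name, same level as administrator
}


def get_role_level(role: Optional[str]) -> int:
    """Map a role name to its level; missing or unknown roles fall back to 1."""
    return ROLE_ORDER.get((role or "user").lower(), 1)


def has_min_level(user: dict, min_level: int) -> bool:
    """Prefer the stored role_level; derive it from role when absent."""
    level = user.get("role_level")
    if level is None:
        level = get_role_level(user.get("role"))
    return level >= min_level


print(has_min_level({"role": "Leader"}, 3))
```

Because unknown role names silently map to level 1, a typo in a stored role lowers privileges rather than raising an error; whether that is the desired failure mode is a design choice.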
......@@ -7,37 +7,56 @@ from bson import ObjectId
class PyObjectId(ObjectId):
@classmethod
def __get_validators__(cls):
# Pydantic v2 still accepts generator-style validators via __get_validators__, though the hook is deprecated
yield cls.validate
@classmethod
def validate(cls, v):
# Check that the incoming value is a valid ObjectId string
if not ObjectId.is_valid(v):
raise ValueError("无效的ObjectId")
return ObjectId(v)
@classmethod
def __modify_schema__(cls, field_schema):
field_schema.update(type="string")
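# Pydantic v2 no longer supports __modify_schema__; to customize the
# JSON Schema, implement __get_pydantic_json_schema__ instead. For now
# the default schema is kept so that runtime behavior stays stable.
# User model
# User model (fixes an indentation error: make sure it is a top-level class definition)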
class UserModel(BaseModel):
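# MongoDB primary key, aliased as _id; converted to a string on serialization
id: PyObjectId = Field(default_factory=PyObjectId, alias="_id")
# Username
username: str
# Email (a plain string here; the outer schema validates it with EmailStr)
email: str
# Hashed password
hashed_password: str
role: str = "user" # "user" or "admin"
# Role string (the legacy "admin" is still accepted). Keep it consistent with the role definitions in app/models/role.py
role: str = "user"
# Role level (1-5), used for uniform permission checks
role_level: int = 1
# Edition: "edu" (education) or "biz" (enterprise)
edition: str = "edu"
# Creation and update timestamps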
created_at: datetime = Field(default_factory=datetime.now)
updated_at: datetime = Field(default_factory=datetime.now)
class Config:
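# Allow population by field name (even when an alias is defined)
allow_population_by_field_name = True
# Allow custom types (such as ObjectId)
arbitrary_types_allowed = True
# Serialize ObjectId as a string so the frontend can display it
json_encoders = {ObjectId: str}
# Example data for the API docs and debugging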
schema_extra = {
"example": {
"username": "user1",
"email": "user1@example.com",
"hashed_password": "hashed_password_here",
"role": "user",
"role_level": 1,
"edition": "edu",
}
}
\ No newline at end of file
from typing import Optional
from typing import Optional, Literal
from pydantic import BaseModel, EmailStr
class UserCreate(BaseModel):
......@@ -14,11 +14,26 @@ class UserResponse(BaseModel):
id: str
username: str
email: str
role: str
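# Role uses Literal for strict validation while still accepting the legacy "admin"
role: Literal["user", "manager", "leader", "master", "administrator", "admin"]
# New: role level, so the frontend can quickly show the permission scope
role_level: int
# Edition uses Literal for strict validation: education / enterprise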
edition: Literal["edu", "biz"]
class Token(BaseModel):
access_token: str
token_type: str
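# Extension: return key user attributes in the login response to save an extra query (a separate /me endpoint is also an option)
# Use Literal to restrict role values, consistent with UserResponse
role: Optional[Literal["user", "manager", "leader", "master", "administrator", "admin"]] = None
role_level: Optional[int] = None
# Use Literal to restrict edition values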
edition: Optional[Literal["edu", "biz"]] = None
class TokenData(BaseModel):
user_id: Optional[str] = None
\ No newline at end of file
user_id: Optional[str] = None
# TokenData keeps the same role restriction as Token for type safety
role: Optional[Literal["user", "manager", "leader", "master", "administrator", "admin"]] = None
role_level: Optional[int] = None
edition: Optional[Literal["edu", "biz"]] = None
\ No newline at end of file
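The same role Literal is spelled out three times in these schemas and once more as the keys of `ROLE_ORDER`. A minimal sketch, using only the standard library, of declaring the Literal once and checking that it agrees with the role module. `RoleName`, `EditionName`, `literal_values`, and `schemas_match_role_module` are hypothetical names, not part of this commit.

```python
from typing import Literal, get_args

# Hypothetical shared aliases, not part of the commit.
RoleName = Literal["user", "manager", "leader", "master", "administrator", "admin"]
EditionName = Literal["edu", "biz"]

# Copied from the role module in this commit.
ROLE_ORDER = {
    "user": 1,
    "manager": 2,
    "leader": 3,
    "master": 4,
    "administrator": 5,
    "admin": 5,
}
VALID_EDITIONS = {"edu", "biz"}


def literal_values(alias):
    """Return the set of values a Literal alias allows."""
    return set(get_args(alias))


def schemas_match_role_module():
    """True when the Literal aliases and the role module list the same values."""
    return (
        literal_values(RoleName) == set(ROLE_ORDER)
        and literal_values(EditionName) == VALID_EDITIONS
    )
```

A check like this could run in a unit test so that adding a role in one place without the other is caught early.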
......@@ -5,6 +5,7 @@ from passlib.context import CryptContext
from app.core.config import settings
from app.db.mongodb import db
from bson import ObjectId
from app.models.role import get_role_level
# Password hashing context
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
......@@ -27,7 +28,11 @@ async def authenticate_user(username: str, password: str):
return user
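def create_access_token(data: dict, expires_delta: Optional[timedelta] = None):
"""Create an access token"""
"""Create an access token
Key points:
- The JWT payload carries role and edition information so the frontend can display it right after decoding;
- The server-side database lookup remains authoritative, so claims forged by a client are not trusted.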
"""
to_encode = data.copy()
if expires_delta:
expire = datetime.utcnow() + expires_delta
......@@ -49,6 +54,11 @@ async def get_current_user(token: str):
if user is None:
return None
# Compatibility fallback: fill in the role level or edition when the user record lacks them
if user.get("role_level") is None:
user["role_level"] = get_role_level(user.get("role"))
if user.get("edition") is None:
user["edition"] = "edu" # the default edition is education
return user
except JWTError:
return None
\ No newline at end of file
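A minimal sketch of the fallback above together with one possible way to assemble the extra JWT claims. The login route is not shown in this diff, so the claim layout, the `sub` key, and the helper names `with_permission_defaults` and `build_token_claims` are assumptions made for illustration only.

```python
# Copied from the role module in this commit.
ROLE_ORDER = {
    "user": 1,
    "manager": 2,
    "leader": 3,
    "master": 4,
    "administrator": 5,
    "admin": 5,
}


def get_role_level(role):
    return ROLE_ORDER.get((role or "user").lower(), 1)


def with_permission_defaults(user):
    """Return a copy of `user` with role_level and edition filled in,
    mirroring the fallback in get_current_user."""
    filled = dict(user)
    if filled.get("role_level") is None:
        filled["role_level"] = get_role_level(filled.get("role"))
    if filled.get("edition") is None:
        filled["edition"] = "edu"  # default edition is education
    return filled


def build_token_claims(user):
    """Hypothetical claim layout: 'sub' plus the three permission fields."""
    filled = with_permission_defaults(user)
    return {
        "sub": str(filled["_id"]),
        "role": filled.get("role") or "user",
        "role_level": filled["role_level"],
        "edition": filled["edition"],
    }
```

The resulting dict could be passed as `data` to `create_access_token`. As the docstring notes, these claims are a display convenience; authorization decisions should still read the user record from the database.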
import asyncio
import motor.motor_asyncio
import os
from dotenv import load_dotenv
from datetime import datetime
from bson import ObjectId
from passlib.context import CryptContext
......@@ -7,9 +9,14 @@ from passlib.context import CryptContext
# Password hashing utility
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
# Load .env variables so the configuration source matches the backend
load_dotenv()
# MongoDB connection settings
MONGODB_URL = "mongodb://localhost:27017"
DB_NAME = "llm_filter_db"
MONGODB_URL = os.getenv("MONGODB_URL", "mongodb://localhost:27017")
DB_NAME = os.getenv("DB_NAME", "llm_filter_db")
# Run mode: exactly one of the education or enterprise editions (no mixing)
APP_MODE = (os.getenv("APP_MODE", "edu") or "edu").lower()
async def init_db():
# Connect to MongoDB
......@@ -24,32 +31,135 @@ async def init_db():
print("已清空现有集合")
# Create the users collection and add sample data
admin_id = ObjectId()
user_id = ObjectId()
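admin_id = ObjectId() # education-edition administrator (username admin)
user_id = ObjectId() # education-edition regular user (username user)
user_biz_id = ObjectId() # enterprise-edition regular user (username user_biz)
users = [
# System administrator (standard role: administrator; legacy: the admin username)
{
"_id": admin_id,
"username": "admin",
"email": "admin@example.com",
"hashed_password": pwd_context.hash("admin123"),
"role": "admin",
"role": "administrator", # use the standard role name; "admin" in older data is still accepted
"role_level": 5, # maps to the highest level
"edition": "edu", # education edition by default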
"created_at": datetime.now(),
"updated_at": datetime.now()
},
# Regular user (education edition)
{
"_id": user_id,
"username": "user",
"email": "user@example.com",
"hashed_password": pwd_context.hash("user123"),
"role": "user",
"role_level": 1,
"edition": "edu",
"created_at": datetime.now(),
"updated_at": datetime.now()
}
},
# Education edition: homeroom teacher, department head / middle manager, and principal
{
"_id": ObjectId(),
"username": "manager_edu",
"email": "manager_edu@example.com",
"hashed_password": pwd_context.hash("manager123"),
"role": "manager",
"role_level": 2,
"edition": "edu",
"created_at": datetime.now(),
"updated_at": datetime.now()
},
{
"_id": ObjectId(),
"username": "leader_edu",
"email": "leader_edu@example.com",
"hashed_password": pwd_context.hash("leader123"),
"role": "leader",
"role_level": 3,
"edition": "edu",
"created_at": datetime.now(),
"updated_at": datetime.now()
},
{
"_id": ObjectId(),
"username": "master_edu",
"email": "master_edu@example.com",
"hashed_password": pwd_context.hash("master123"),
"role": "master",
"role_level": 4,
"edition": "edu",
"created_at": datetime.now(),
"updated_at": datetime.now()
},
# Enterprise edition: employee, team lead, department head, executive, and administrator
{
"_id": user_biz_id,
"username": "user_biz",
"email": "user_biz@example.com",
"hashed_password": pwd_context.hash("userbiz123"),
"role": "user",
"role_level": 1,
"edition": "biz",
"created_at": datetime.now(),
"updated_at": datetime.now()
},
{
"_id": ObjectId(),
"username": "manager_biz",
"email": "manager_biz@example.com",
"hashed_password": pwd_context.hash("managerbiz123"),
"role": "manager",
"role_level": 2,
"edition": "biz",
"created_at": datetime.now(),
"updated_at": datetime.now()
},
{
"_id": ObjectId(),
"username": "leader_biz",
"email": "leader_biz@example.com",
"hashed_password": pwd_context.hash("leaderbiz123"),
"role": "leader",
"role_level": 3,
"edition": "biz",
"created_at": datetime.now(),
"updated_at": datetime.now()
},
{
"_id": ObjectId(),
"username": "master_biz",
"email": "master_biz@example.com",
"hashed_password": pwd_context.hash("masterbiz123"),
"role": "master",
"role_level": 4,
"edition": "biz",
"created_at": datetime.now(),
"updated_at": datetime.now()
},
{
"_id": ObjectId(),
"username": "administrator_biz",
"email": "administrator_biz@example.com",
"hashed_password": pwd_context.hash("adminbiz123"),
"role": "administrator",
"role_level": 5,
"edition": "biz",
"created_at": datetime.now(),
"updated_at": datetime.now()
},
]
await db.users.insert_many(users)
print(f"已创建用户集合并添加 {len(users)} 条记录")
# Filter users by run mode (no mixing)
mode = APP_MODE if APP_MODE in {"edu", "biz"} else "edu"
if mode != APP_MODE:
print(f"警告:APP_MODE={APP_MODE} 非法,默认使用 edu")
selected_users = [u for u in users if u["edition"] == mode]
await db.users.insert_many(selected_users)
print(f"已创建用户集合并添加 {len(selected_users)} 条记录(模式:{mode})")
# Create the sensitive-words collection and add sample data
sensitive_words = [
......@@ -140,10 +250,13 @@ async def init_db():
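# Create the conversations collection and add sample data
conversation_id = ObjectId()
# Pick the sample user for the demo conversation and sensitive-word records based on the mode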
sample_user_id = user_id if mode == "edu" else user_biz_id
conversations = [
{
"_id": conversation_id,
"user_id": user_id,
"user_id": sample_user_id,
"messages": [
{
"role": "user",
......@@ -171,8 +284,9 @@ async def init_db():
# Create the sensitive-word records collection and add sample data
sensitive_records = [
{
"user_id": "user123",
"conversation_id": "conv123",
# Use real ObjectIds so the types match the model
"user_id": sample_user_id,
"conversation_id": conversation_id,
"message_content": "我想了解一下赌博的事情",
"sensitive_words_found": [
{
......@@ -186,8 +300,9 @@ async def init_db():
"timestamp": datetime.now()
},
{
"user_id": "user123",
"conversation_id": "conv456",
# The second record likewise references real ObjectIds
"user_id": sample_user_id,
"conversation_id": conversation_id,
"message_content": "如何获取毒品和色情内容",
"sensitive_words_found": [
{
......@@ -212,9 +327,13 @@ async def init_db():
print(f"已创建敏感词记录集合并添加 {len(sensitive_records)} 条记录")
print("\n数据库初始化完成!")
print("\n测试账号:")
print("管理员账号: admin / admin123")
print("用户账号: user / user123")
print("\n测试账号 (模式: %s):" % mode)
if mode == "edu":
print("教育版管理员: admin / admin123 (role=administrator, edition=edu)")
print("教育版普通用户: user / user123 (role=user, edition=edu)")
else:
print("企业版管理员: administrator_biz / adminbiz123 (role=administrator, edition=biz)")
print("企业版普通用户: user_biz / userbiz123 (role=user, edition=biz)")
if __name__ == "__main__":
asyncio.run(init_db())
\ No newline at end of file
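The seed script lists ten nearly identical user dicts and then filters them by edition. A minimal sketch of generating the same shape of record from a role list before filtering. It simplifies on purpose: usernames follow a uniform `role_edition` pattern (the script itself uses `admin` and `user` for the education edition), and `_id` and password hashing are omitted. `make_seed_user` and `seed_users_for` are hypothetical names, not part of this commit.

```python
from datetime import datetime

# Standard role names and levels from the role module (legacy "admin" omitted).
ROLE_ORDER = {"user": 1, "manager": 2, "leader": 3, "master": 4, "administrator": 5}
EDITIONS = ("edu", "biz")


def make_seed_user(username, role, edition):
    """Build one seed record; the level is derived from the role."""
    now = datetime.now()
    return {
        "username": username,
        "email": f"{username}@example.com",
        "role": role,
        "role_level": ROLE_ORDER[role],
        "edition": edition,
        "created_at": now,
        "updated_at": now,
    }


def seed_users_for(mode):
    """Return seed users for a single edition, falling back to "edu"
    when `mode` is not a known edition (as the script does)."""
    mode = mode if mode in EDITIONS else "edu"
    users = [
        make_seed_user(f"{role}_{edition}", role, edition)
        for edition in EDITIONS
        for role in ROLE_ORDER
    ]
    return [u for u in users if u["edition"] == mode]
```

Deriving `role_level` from the role keeps the two fields from drifting apart when a record is added or edited by hand.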