[UPCPC2026] Importing Data into DOMjudge

2026-03-23 · hacking-log · domjudge / Python

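There are plenty of tutorials on setting up DOMjudge, but tutorials on importing data are hard to find. Combining the existing material with the official documentation, I tinkered my way to a workflow that more or less works.

The domserver version is 8.3.2, running on Ubuntu 24.04. The scripts were run on a MacBook Pro M5. This post is mainly a record of work I did myself; if your version or environment differs, some adjustments may be needed.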

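Sync settings#

In the jury interface, go to Configuration Settings -> External Systems -> Data Source and change it to "configuration data external". From then on, operations use the external_id you set yourself, which spares a lot of grief if you like your IDs tidy.

The import page of the domserver web frontend looks like this:

[Image: interface]

Every import that goes through the graphical interface is done from this page.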

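Importing users#

Find Teams & groups and import under Import JSON / YAML.

Importing organizations#

For a university contest, the organizations can simply be all the colleges. There are only five fields: id, icpc_id, name, formal_name, and country. We don't need icpc_id. I generated the file with the script below.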

import json

# Map each college name to the organization id used in DOMjudge.
COLLEGE_MAP = {
    "地球科学与技术学院": "c01",
    "石油工程学院": "c02",
    "化学化工学院": "c03",
    "机电工程学院": "c04",
    "储运与建筑工程学院": "c05",
    "材料科学与工程学院": "c06",
    "石大山能新能源学院": "c07",
    "海洋与空间信息学院": "c08",
    "控制科学与工程学院": "c09",
    "青岛软件学院、计算机科学与技术学院": "c10",
    "理学院": "c11",
    "经济管理学院": "c12",
    "外国语学院": "c13",
    "文法学院": "c14",
    "马克思主义学院": "c15",
    "体育教学部": "c16",
}

organizations = []

for name, org_id in COLLEGE_MAP.items():
    organizations.append({
        "id": org_id,
        "name": name,
        "formal_name": name,
        "country": "CHN"
    })

# ensure_ascii=False keeps the Chinese names readable in the output file.
with open("organizations.json", "w", encoding="utf-8") as f:
    json.dump(organizations, f, indent=2, ensure_ascii=False)
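Before uploading the generated file it can help to sanity-check it. The following is a minimal sketch, not part of the original workflow: the helper name check_organizations and the rule "every listed field must be non-empty and ids must be unique" are my own assumptions about what is worth checking, not something DOMjudge documents in this form.

```python
# Fields the generator script writes; icpc_id is deliberately left out.
REQUIRED_FIELDS = ("id", "name", "formal_name", "country")


def check_organizations(organizations):
    """Return a list of human-readable problems; an empty list means nothing was found."""
    problems = []
    seen_ids = set()
    for i, org in enumerate(organizations):
        for field in REQUIRED_FIELDS:
            if not org.get(field):
                problems.append(f"entry {i}: missing {field}")
        org_id = org.get("id")
        if org_id in seen_ids:
            problems.append(f"entry {i}: duplicate id {org_id}")
        seen_ids.add(org_id)
    return problems


# Hypothetical sample data with a deliberately duplicated id.
sample = [
    {"id": "c01", "name": "地球科学与技术学院", "formal_name": "地球科学与技术学院", "country": "CHN"},
    {"id": "c01", "name": "石油工程学院", "formal_name": "石油工程学院", "country": "CHN"},
]
print(check_organizations(sample))
```

In practice the list would come from json.load on organizations.json rather than from an inline sample.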

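Importing teams and accounts#

The official format for teams is JSON, and the official format for accounts is YAML. Our registration sheet has the header shown below. The column names are kept in Chinese because the script indexes them verbatim: 学号 is the student ID, 姓名 the name, 性别 the gender, 学院 the college, 专业班级 the class, and 打星 marks starred (unofficial) participants.

提交者（自动） 提交时间（自动） 学号（必填） 姓名（必填） 性别（必填） 学院（必填） 专业班级（必填） QQ（必填） 联系电话（必填） 打星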

import json
import yaml
import pandas as pd
import secrets
import string
from pathlib import Path

def generate_password(length=12, use_upper=True, use_digits=True, use_symbols=True):
    # Written by Gemini
    chars = string.ascii_lowercase

    if use_upper:
        chars += string.ascii_uppercase
    if use_digits:
        chars += string.digits
    if use_symbols:
        chars += string.punctuation

    password = ''.join(secrets.choice(chars) for _ in range(length))
    return password

DATA_PATH = "final.csv"
BEGIN_ID = 1001
COLLEGE_MAP = {
    "地球科学与技术学院": "c01",
    "石油工程学院": "c02",
    "化学化工学院": "c03",
    "机电工程学院": "c04",
    "储运与建筑工程学院": "c05",
    "材料科学与工程学院": "c06",
    "石大山能新能源学院": "c07",
    "海洋与空间信息学院": "c08",
    "控制科学与工程学院": "c09",
    "青岛软件学院、计算机科学与技术学院": "c10",
    "理学院": "c11",
    "经济管理学院": "c12",
    "外国语学院": "c13",
    "文法学院": "c14",
    "马克思主义学院": "c15",
    "体育教学部": "c16",
}

# The seat numbers and IPs in 文理楼 110 map to each other very irregularly,
# so they are kept in a separate key-value sheet.
wl110 = pd.read_excel("文理楼110信息统计.xlsx", sheet_name="编号-ip")
df = pd.read_csv(DATA_PATH, dtype=str)
df["打星"] = df["打星"] == "True"  # convert to bool
user_table = []
teams = []
accounts = []

for idx, row in df.iterrows():
    team_id = str(BEGIN_ID + idx)
    if row["打星"]:
        name = '*' + row["姓名（必填）"]
        group_ids = ["observers"]
    else:
        name = row["姓名（必填）"]
        if row["性别（必填）"] == "女":
            group_ids = ["girl-participants"]
        else:
            group_ids = ["participants"]
    if idx < 72:  # the first 72 people sit in room 102
        pos = idx * 2 + 2
        location = f"文理楼102-{pos}"
        ip = f"172.20.6.{pos}"
    else:  # the remaining 54 people sit in room 110
        # Look the row up once; this also avoids reusing the same quote type
        # inside an f-string, which only works on Python 3.12+.
        seat = wl110.iloc[idx - 72]
        location = f"文理楼110-{seat['编号']}"
        ip = f"172.20.10.{seat['ip']}"
    teams.append({
        "id": team_id,
        "group_ids": group_ids,
        "name": row["学号（必填）"],
        "display_name": name,
        "organization_id": COLLEGE_MAP[row["学院（必填）"]],
        "location": {"description": location}
    })
    # A password is generated as a fallback, but it was never needed: when something
    # went wrong, reseating someone by editing the IP seemed faster than reading out a password.
    password = generate_password(use_symbols=False)
    accounts.append({
        "id": team_id,
        "username": row["学号（必填）"],
        "password": password,
        "type": "team",
        "name": name,
        "team_id": team_id,
        "ip": ip,
    })
    user_table.append({
        "username": row["学号（必填）"],
        "password": password,
        "姓名": name,
        "学院": row["学院（必填）"],
        "班级": row["专业班级（必填）"],
        "打星": row["打星"],
        "位置": location
    })

# Files for the domserver import (read by domserver)
Path("output").mkdir(exist_ok=True)
with open("output/teams.json", "w", encoding="utf-8") as f:
    json.dump(teams, f, indent=2, ensure_ascii=False)

with open("output/accounts.yaml", "w", encoding="utf-8") as f:
    yaml.dump(accounts, f, allow_unicode=True)

# User roster (read by humans)
pd.DataFrame(user_table).to_excel("user_table.xlsx")
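Since accounts are tied to teams by team_id and to seats by ip, a quick cross-check before importing can catch slips in the seating table. This is only a sketch under my own assumptions: the function name check_teams_and_accounts is hypothetical, and the two rules it enforces (every team_id must exist, no IP may be reused) are what I would expect for a one-person-per-machine contest, not rules taken from the DOMjudge documentation.

```python
from collections import Counter


def check_teams_and_accounts(teams, accounts):
    """Cross-check the two lists and return a list of problems found."""
    problems = []
    team_ids = {team["id"] for team in teams}
    for account in accounts:
        if account["team_id"] not in team_ids:
            problems.append(
                f"account {account['username']}: unknown team_id {account['team_id']}"
            )
    # An IP that appears twice means two contestants were mapped to one machine.
    ip_counts = Counter(account["ip"] for account in accounts)
    for ip, count in sorted(ip_counts.items()):
        if count > 1:
            problems.append(f"ip {ip} is assigned to {count} accounts")
    return problems


# Hypothetical sample data containing both kinds of mistakes.
demo_teams = [{"id": "1001"}, {"id": "1002"}]
demo_accounts = [
    {"username": "a", "team_id": "1001", "ip": "172.20.6.2"},
    {"username": "b", "team_id": "1003", "ip": "172.20.6.2"},
]
print(check_teams_and_accounts(demo_teams, demo_accounts))
```

Calling it on the teams and accounts lists right before they are written to disk would be the natural place for this check.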

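Importing the contest#

Importing a contest#

Contests can be imported under Contests, but I find it easier to just create the contest in the graphical interface.

Importing problems#

It is best to make sure a contest exists first.

Problem format#

First, a walk through the problem data format. Each problem is a zip archive. Taking problem I from the official contest as an example, I.zip contains the following.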

I
# a problem.pdf can also be placed in this directory as the problem statement
# reference solutions can also go under I/submissions/accepted/; after upload they are judged automatically
├── data
│   ├── sample # samples; users can download these files as a bundle
│   │   ├── 1.ans
│   │   └── 1.in
│   └── secret # test data; the input and output files only need matching stems
│       ├── 1.ans
│       ├── 1.in
│       ├── 2.ans
│       ├── 2.in
│       ├── 3.ans
│       ├── 3.in
│       ├── 4.ans
│       ├── 4.in
│       ├── 5.ans
│       └── 5.in
├── domjudge-problem.ini
├── output_validators # (optional) interactor / special judge
│   └── I_cmp
│       ├── interactor.cpp
│       └── testlib.h
└── problem.yaml

6 directories, 16 files

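domjudge-problem.ini:

timelimit='2' # unit: seconds
special_run='I_cmp' # (optional) ignore this if no special judge or interactor is needed; I found this line in an exported package, but leaving it out seems to cause no problems
color='#ab812c' # (optional) balloon color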

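problem.yaml:

name: Liuxx怎么这么能跑啊
validation: 'custom interactive' # (optional) validation method: 'custom interactive' for interactive problems, 'custom' for an ordinary special judge
# (optional) floating-point tolerance: validator_flags: 'float_tolerance 1E-6'
limits:
    memory: 256 # unit: MB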
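The layout described above can be checked mechanically before anything is packed. The following is a rough sketch, assuming an unpacked problem directory; check_problem_dir is a hypothetical helper of mine, and treating a missing data/sample directory as a problem is my own choice, not a DOMjudge requirement that I have verified.

```python
import tempfile
from pathlib import Path


def check_problem_dir(problem_dir):
    """Return a list of layout problems for one unpacked problem directory."""
    root = Path(problem_dir)
    problems = []
    for required in ("domjudge-problem.ini", "problem.yaml"):
        if not (root / required).is_file():
            problems.append(f"missing {required}")
    for group in ("sample", "secret"):
        data_dir = root / "data" / group
        if not data_dir.is_dir():
            problems.append(f"missing data/{group}")
            continue
        inputs = {p.stem for p in data_dir.glob("*.in")}
        answers = {p.stem for p in data_dir.glob("*.ans")}
        # Stems present on only one side have no matching partner file.
        for stem in sorted(inputs ^ answers):
            problems.append(f"data/{group}/{stem}: .in and .ans do not pair up")
    return problems


# Demo on a throwaway directory that is missing several pieces on purpose.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "data" / "secret").mkdir(parents=True)
    (demo_root / "data" / "secret" / "1.in").write_text("1\n")
    (demo_root / "problem.yaml").write_text("name: demo\n")
    demo_report = check_problem_dir(demo_root)
print(demo_report)
```

An empty list means the directory at least has the files and pairs this sketch looks for; it says nothing about whether their contents are valid.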

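A maintainer's lament#

Problems are usually not all written by one person, though, and when nobody reads the documentation properly the result is a mess. For example:

- domjudge-problem.ini or problem.yaml is missing or malformed
- test data uses \r\n line endings instead of \n, no matter how many times people are told
- only the test data is submitted, without the samples

In the end it is the person setting up DOMjudge who has to clean up after everyone, which left me speechless... If you haven't run into these bad habits, you can skip straight to the Packaging section.

So I settled on a compromise: collect only the test data from the authors, parse everything else out of the problem statements, and filter \r\n out of all test data. Afterwards, if a problem needs a special judge, an interactor, or a floating-point tolerance, I add that by hand. Our statements all use olymp.sty, following a tutorial by the club president two terms before me: ACM/ICPC/CCPC等算法竞赛规范题面撰写详细教程（全网最详细，看完包懂）.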
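The line-ending filter is the simplest part of that compromise. Here is a minimal sketch that works on raw bytes, so it does not depend on the platform's text-mode newline handling or on the file encoding; the function name normalize_newlines and the returned replacement count are my own additions for illustration.

```python
def normalize_newlines(data: bytes) -> tuple[bytes, int]:
    """Replace every CRLF with LF; return the cleaned bytes and the number of replacements."""
    count = data.count(b"\r\n")
    return data.replace(b"\r\n", b"\n"), count


cleaned, replaced = normalize_newlines(b"3\r\n1 2 3\r\n")
print(cleaned, replaced)
```

The count makes it easy to log which authors' files needed fixing.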

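Parsing the LaTeX files requires the texsoup library.

pip install texsoup

The pattern here is quite fixed, and much of the code was first generated by AI and then fine-tuned by hand.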

"""
Helpers for parsing the problem statement .tex files.
Saved as tex_info.py, since the next script imports it under that name.
"""
from TexSoup import TexSoup
from TexSoup.data import TexNode, TexArgs


def tex_info(content: str):
    # Parse the LaTeX source
    soup = TexSoup(content)
    # Initialize the return values
    title = ""
    timelimit = ""
    memlimit = ""
    samples = []

    # 1. Extract the arguments of the problem environment (title, time limit, memory limit)
    problem_env = soup.find("problem")
    if problem_env and hasattr(problem_env, 'args') and len(problem_env.args) >= 5:
        # Arguments of the problem environment: {Title}{standard input}{standard output}{2 seconds}{256 megabytes}
        title = str(problem_env.args[0]).strip("{}")
        time_arg = str(problem_env.args[3]).strip("{}")  # the 4th argument is the time limit
        mem_arg = str(problem_env.args[4]).strip("{}")  # the 5th argument is the memory limit

        timelimit = int(_get_arg_text(time_arg).split()[0])
        memlimit = int(_get_arg_text(mem_arg).split()[0])

    # 2. Extract the sample input/output from each example environment
    example_envs = soup.find_all("example")
    for example in example_envs:
        # Find the exmp command (the container of one sample's input and output)
        exmp_cmd = example.find("exmp")
        if exmp_cmd and hasattr(exmp_cmd, 'args') and len(exmp_cmd.args) >= 2:
            # The first argument of exmp is the input, the second is the output
            in_arg = exmp_cmd.args[0]
            ans_arg = exmp_cmd.args[1]

            # Extract the input/output text and clean up the formatting
            in_text = _clean_sample_text(_get_arg_text(in_arg))
            ans_text = _clean_sample_text(_get_arg_text(ans_arg))

            if in_text or ans_text:
                samples.append({
                    "in": in_text,
                    "ans": ans_text
                })

    return title, timelimit, memlimit, samples


def _get_arg_text(arg) -> str:
    """Helper: extract the text content of a TexArgs/TexNode."""
    if isinstance(arg, (TexArgs, TexNode)):
        # Use the text attribute directly; fall back to str() if it is absent
        return getattr(arg, 'text', str(arg))
    return str(arg)


def _clean_sample_text(text: str) -> str:
    """Helper: strip stray symbols and whitespace from sample text."""
    return text.strip("{}%\n").replace("\r\n", "\n")
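One limitation of the parser above is that it converts the limits with int(), which raises ValueError on a fractional value such as "1.5 seconds". A more tolerant conversion could look like the sketch below; parse_limit is a hypothetical helper of mine and is not part of the original tex_info.py. Whether DOMjudge accepts a fractional timelimit in domjudge-problem.ini is something I have not checked.

```python
import re


def parse_limit(text: str) -> float:
    """Extract the leading number from strings such as '2 seconds' or '1.5 seconds'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group(1))


print(parse_limit("2 seconds"), parse_limit("1.5 seconds"), parse_limit("256 megabytes"))
```

Swapping it into tex_info would mean the returned limits become floats, so the code that writes them out would need to format them accordingly.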

from pathlib import Path
from shutil import rmtree
from tex_info import tex_info

# Replace this with your own problem directory.
basedir = Path("/Users/wangyafei/Documents/学校事务/社团管理/2025社团管理/2026校赛/contests/正式赛/problems")
tex_path = basedir / "LaTeX" # directory of .tex files; one file per problem, named like A.tex
std_path = basedir / "std" # directory of reference solutions, named like std-a.cpp
input_path = basedir / '数据' # test data
output_path = basedir / 'norm_output'
output_path.mkdir(exist_ok=True) # create the output directory

for problem_dir in input_path.iterdir():
    if problem_dir.is_dir():
        label = problem_dir.stem.upper()
        print(f"Processing Problem {label}")
        out_dir = output_path / label
        rmtree(out_dir, ignore_errors=True)
        out_dir.mkdir(exist_ok=True)
        content = (tex_path / f"{label}.tex").read_text()
        title, timelimit, memlimit, samples = tex_info(content)

        # domjudge-problem.ini
        print("Writing domjudge-problem.ini")
        (out_dir / 'domjudge-problem.ini').write_text(f"timelimit = {timelimit}")

        # problem.yaml
        print("Writing problem.yaml")
        (out_dir / 'problem.yaml').write_text(f"name: \"{title}\"\nlimits:\n    memory: {memlimit}")

        # submissions/accepted
        try:
            print("Writing submissions/accepted/solve.cpp")
            std_dir = out_dir / 'submissions' / 'accepted'
            std_dir.mkdir(parents=True, exist_ok=True)
            (std_dir / 'solve.cpp').write_text((std_path / f"std-{label.lower()}.cpp").read_text())
        except Exception as e:
            print(f'error: {e}')

        # data/sample
        print("Writing samples")
        sample_dir = out_dir / 'data' / 'sample'
        sample_dir.mkdir(parents=True)
        for i, data in enumerate(samples):
            (sample_dir / f"{i}.in").write_text(data.get("in"))
            (sample_dir / f"{i}.ans").write_text(data.get("ans"))

        # data/secret
        print("Copying testcases")
        data_dir = out_dir / 'data' / 'secret'
        data_dir.mkdir(parents=True)
        for file in (problem_dir / 'data' / 'secret').iterdir():
            # Yes, some people did not read this part either and used .out,
            # so anything that is not .in is treated as an answer file.
            filename = file.name if file.suffix == ".in" else file.stem + ".ans"
            out_file_path = data_dir / filename
            # Work on bytes so the \r\n filter does not depend on text-mode newline handling.
            out_file_path.write_bytes(file.read_bytes().replace(b'\r\n', b'\n'))
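The script above writes problem.yaml with a plain f-string, which breaks if a title contains a double quote, and the validation settings still have to be added by hand afterwards. A possible refinement is sketched below. The function render_problem_yaml is hypothetical; it relies on the fact that a JSON string literal is also a valid YAML double-quoted scalar, and it only emits the keys that appear in the problem.yaml example earlier in this post.

```python
import json


def render_problem_yaml(title, memory_mb, validation=None, validator_flags=None):
    """Build the text of a problem.yaml; optional keys are written only when given."""
    # json.dumps escapes quotes and backslashes in a way YAML also understands.
    lines = [f"name: {json.dumps(title, ensure_ascii=False)}"]
    if validation:
        lines.append(f"validation: {json.dumps(validation)}")
    if validator_flags:
        lines.append(f"validator_flags: {json.dumps(validator_flags)}")
    lines.append("limits:")
    lines.append(f"    memory: {memory_mb}")
    return "\n".join(lines) + "\n"


print(render_problem_yaml('He said "hi"', 256))
print(render_problem_yaml("Liuxx怎么这么能跑啊", 256, validation="custom interactive"))
```

Keeping a small per-problem table of validation and validator_flags values next to the script would let the whole file be regenerated without manual edits.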

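Packaging#

Note: the root of the archive should contain data, domjudge-problem.ini, problem.yaml, and so on, not a single directory I!

This task is easy to describe, so I handed it to an AI and got a working script right away. It packs every problem directory under output and writes the archives to archives. Note that the normalization script above writes to norm_output, so either pass that path as output_dir or rename the directory.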

import os
import zipfile
from pathlib import Path

def zip_folders_in_output(output_dir: str = "output", archives_dir: str = "archives"):
    """
    Walk every folder under output_dir and pack the contents of each folder
    (without the folder itself) into a zip file written to archives_dir.

    Args:
        output_dir (str): source directory, default "output"
        archives_dir (str): directory for the zip files, default "archives"
    """
    # Convert to Path objects for easier path handling
    output_path = Path(output_dir)
    archives_path = Path(archives_dir)

    # Check that the output directory exists
    if not output_path.exists():
        print(f"Error: directory {output_dir} does not exist!")
        return

    # Create the archives directory if it does not exist
    archives_path.mkdir(exist_ok=True)

    # Iterate over all first-level subdirectories of output
    for item in output_path.iterdir():
        if item.is_dir():
            folder_name = item.name  # folder name, used as the zip file name
            zip_file_path = archives_path / f"{folder_name}.zip"  # path of the zip file

            # Create the zip file (ZIP_DEFLATED enables compression)
            with zipfile.ZipFile(zip_file_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
                # Walk all files and subdirectories inside the folder
                for root, dirs, files in os.walk(item):
                    # Path relative to the problem folder, so the outer folder is dropped
                    relative_root = os.path.relpath(root, item)

                    # Add subdirectories first (so empty directories are packed too)
                    for dir_name in dirs:
                        dir_path = os.path.join(root, dir_name)
                        relative_dir_path = os.path.join(relative_root, dir_name)
                        # Write the directory entry to the zip file
                        zipf.write(dir_path, relative_dir_path)

                    # Then add the files
                    for file_name in files:
                        file_path = os.path.join(root, file_name)
                        relative_file_path = os.path.join(relative_root, file_name)
                        # Write the file under its relative path, avoiding the outer folder
                        zipf.write(file_path, relative_file_path)

            print(f"Packed: {zip_file_path}")

if __name__ == "__main__":
    # Call the function; the output and archives paths can be customized
    zip_folders_in_output(output_dir="output", archives_dir="archives")
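Because a wrong archive root is the easiest mistake to make here, it may be worth verifying each zip after packing. The sketch below only inspects the names inside the archive; zip_root_entries, looks_like_problem_zip, and the in-memory demo archives are all my own illustrative constructions, and the check (problem.yaml and data at the top level) is a heuristic rather than DOMjudge's actual validation.

```python
import io
import zipfile


def zip_root_entries(zip_source):
    """Return the sorted top-level names inside a zip archive."""
    with zipfile.ZipFile(zip_source) as zf:
        return sorted({name.split("/")[0] for name in zf.namelist()})


def looks_like_problem_zip(zip_source):
    """Heuristic: problem.yaml and data must sit at the archive root."""
    roots = zip_root_entries(zip_source)
    return "problem.yaml" in roots and "data" in roots


def _make_zip(names):
    """Build an in-memory zip with empty files, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    buffer.seek(0)
    return buffer


good = _make_zip(["problem.yaml", "domjudge-problem.ini", "data/secret/1.in"])
bad = _make_zip(["I/problem.yaml", "I/domjudge-problem.ini", "I/data/secret/1.in"])
print(looks_like_problem_zip(good), looks_like_problem_zip(bad))
```

Both functions also accept a file path, so they can be pointed at the files in the archives directory.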

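Uploading problems#

Problems can be imported under Problems, but there are usually many of them and importing them one at a time by hand is tedious. The graphical import also kept throwing errors for me at times, so I chose to import through the API.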

import requests

API_URL = "<replace with the DOMjudge IP or domain>/api/v4"
USERNAME = "admin"
PASSWORD = "<replace with the admin password>"
CONTEST_ID = "upcpc2026"

def upload_problem(zip_path):
    endpoint = f"{API_URL}/contests/{CONTEST_ID}/problems"

    with open(zip_path, 'rb') as f:
        files = {'zip': f}
        response = requests.post(
            endpoint,
            auth=(USERNAME, PASSWORD),
            files=files
        )

    if response.status_code == 200:
        print("Successfully uploaded!")
        print("Response:", response.json())
    else:
        print(f"Failed! Status code: {response.status_code}")
        print("Error:", response.text)

for i in range(0, 13):
    # Compute the label outside the f-string; reusing the same quote type inside
    # an f-string only works on Python 3.12+.
    label = chr(i + ord('A'))  # A, B, ..., M
    # Written this way, the script must be run from the directory that holds the archives;
    # alternatively, hard-code the directory or use a proper relative path.
    upload_problem(f'./{label}.zip')
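The loop above hard-codes 13 problems and the current working directory. One way around both is to discover the archives on disk, as in the sketch below; find_problem_archives is a hypothetical helper, and the temporary directory only stands in for the real archives directory so the example can run anywhere.

```python
import tempfile
from pathlib import Path


def find_problem_archives(archives_dir):
    """Return all .zip files directly inside archives_dir, sorted by file name."""
    return sorted(Path(archives_dir).glob("*.zip"), key=lambda p: p.name)


# Demo with a throwaway directory; in practice this would be the archives directory.
with tempfile.TemporaryDirectory() as tmp:
    for label in "CAB":
        (Path(tmp) / f"{label}.zip").touch()
    (Path(tmp) / "notes.txt").touch()
    found = [p.name for p in find_problem_archives(tmp)]
print(found)
```

Each returned path could then be passed to upload_problem in order, which keeps the upload order alphabetical regardless of how many problems there are.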


https://starlab.top/posts/upcpc2026-import/

Author: Star
Published: 2026-03-23
License: CC BY-NC-SA 4.0
