DNS Crawler Operation
As the operator of the .cz top-level domain, the CZ.NIC association regularly checks all registered second-level domains. The DNS crawler tool is used for this purpose. The aims are the following:
- Improve quality and validity of DNS data by detecting problems in zone contents and configuration, such as expired DNSSEC keys, weak cryptographic algorithms or circular CNAME records.
- Discover malevolent activities and security problems, such as abusive e-shops or domains used for the operation of botnets.
- Automatically classify the domains according to the configuration and contents of DNS zones, implementations and versions of their DNS, mail and web servers, as well as the general character of the domain's main web page (if it exists).
The DNS crawler is designed to collect data efficiently, without requesting the same information repeatedly. The extra load that it places on the Internet infrastructure should be negligible compared to regular traffic.
CZ.NIC has decided to be fully open about DNS crawler operation. In particular:
- The software is open source and therefore available to everybody for use and independent inspection.
- IP addresses of machines running the DNS crawler are stable and public (see below).
- DNS crawler connects only to the following destination ports: 53 (UDP & TCP), 25 (TCP), 80 (TCP), 443 (TCP), 465 (TCP) and 587 (TCP).
- Data collected by the DNS crawler and rules for its use are described below; any future changes to the data set and rules will be publicly announced in advance on this web page.
The only aspect of DNS crawler operation that CZ.NIC does not publish is the list of second-level domains in the .cz zone.
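For administrators who want to recognise crawler traffic, the destination ports listed above can be written down as a small allowlist. The following is only an illustrative sketch. The names `CRAWLER_DESTINATION_PORTS` and `is_expected_crawler_port` are hypothetical and are not part of the DNS crawler software.

```python
# Destination (protocol, port) pairs taken from the list above.
# The constant and function names are illustrative assumptions.
CRAWLER_DESTINATION_PORTS = {
    ("udp", 53),
    ("tcp", 53),
    ("tcp", 25),
    ("tcp", 80),
    ("tcp", 443),
    ("tcp", 465),
    ("tcp", 587),
}


def is_expected_crawler_port(protocol: str, port: int) -> bool:
    """Return True if the crawler is documented to connect to this pair."""
    return (protocol.lower(), port) in CRAWLER_DESTINATION_PORTS


if __name__ == "__main__":
    print(is_expected_crawler_port("tcp", 443))  # a documented destination
    print(is_expected_crawler_port("tcp", 22))   # not a documented destination
```

A connection to any other destination port should therefore not originate from the DNS crawler.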
Machines running the DNS crawler
- crawler-1.labs.nic.cz (IPv4: 217.31.192.34, IPv6: 2001:1488:ac15:ff40::34)
- crawler-2.labs.nic.cz (IPv4: 217.31.192.35, IPv6: 2001:1488:ac15:ff40::35)
- crawler-3.labs.nic.cz (IPv4: 217.31.192.36, IPv6: 2001:1488:ac15:ff40::36)
- crawler-4.labs.nic.cz (IPv4: 217.31.192.37, IPv6: 2001:1488:ac15:ff40::37)
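Because these addresses are stable and public, a server operator can check whether a client address seen in logs belongs to the DNS crawler. The snippet below is a minimal sketch using Python's standard `ipaddress` module. The function name `is_crawler_address` is a hypothetical helper, not something provided by CZ.NIC.

```python
import ipaddress

# Addresses copied from the list of crawler machines above.
CRAWLER_ADDRESSES = {
    ipaddress.ip_address(addr)
    for addr in (
        "217.31.192.34", "2001:1488:ac15:ff40::34",
        "217.31.192.35", "2001:1488:ac15:ff40::35",
        "217.31.192.36", "2001:1488:ac15:ff40::36",
        "217.31.192.37", "2001:1488:ac15:ff40::37",
    )
}


def is_crawler_address(addr: str) -> bool:
    """Return True if addr is one of the published crawler addresses.

    Parsing through ipaddress means that different textual spellings of
    the same IPv6 address compare equal. Unparsable input yields False.
    """
    try:
        return ipaddress.ip_address(addr) in CRAWLER_ADDRESSES
    except ValueError:
        return False


if __name__ == "__main__":
    for candidate in ("217.31.192.35", "2001:1488:ac15:ff40:0:0:0:36", "192.0.2.1"):
        print(candidate, is_crawler_address(candidate))
```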
Collected data and operational schedule
DNS crawler collects the following data for all second-level domains under .cz:
- DNS data that are important for DNS to function properly, i.e. resource records of the types NS, SOA, MX, RRSIG and others. Records of the A and AAAA types are collected only for web, mail and DNS servers of each domain.
- Data contained in SMTP banners (initial responses) of mail servers appearing in MX records and listening on ports 25, 465 and 587.
- Metadata of main web pages (<domain>.cz and www.<domain>.cz) on ports 80 and 443, such as HTTP status and headers, or a complete certificate chain.
- Contents of main web pages on ports 80 and 443, including images, CSS and JavaScript.
In normal operation, the DNS crawler runs regularly with two different periods, weekly and monthly, depending on the type of data. Under certain circumstances, some domains are scanned every day for 30 days in order to detect malicious activities or configuration problems as soon as possible. This applies to:
- newly registered domains,
- domains subject to Article 17 of the Rules whose registration was cancelled in the .CZ registry and whose delegation in the zone was re-established after the delegation cancellation expired,
- domains reported as suspected of phishing that have not yet been independently verified.
The data collection schedule and retention periods for each category of data are shown in the following table.
Data collection schedule and retention
| Type of data | New domains and domains covered by the rules stated above | Other domains | Max. retention period |
| --- | --- | --- | --- |
| DNS | once a day | once a week | 1 year |
| SMTP | once a day | once a week | 1 year |
| Web – metadata | once a day | once a week | 1 year |
| Web – contents | once a day | once a month | 1 month |
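The schedule and retention limits in the table above can be read as a simple lookup. The following sketch is illustrative only and does not reproduce the crawler's actual scheduler. It assumes that a week is 7 days, a month is 30 days and a year is 365 days, and all names (`SCHEDULE`, `collection_interval`, `latest_deletion_date`) are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# data type -> (daily-mode interval, normal interval, max. retention), in days.
# Assumption: week = 7 days, month = 30 days, year = 365 days.
SCHEDULE = {
    "dns": (1, 7, 365),
    "smtp": (1, 7, 365),
    "web-metadata": (1, 7, 365),
    "web-contents": (1, 30, 30),
}

# Domains in the special categories are scanned daily for 30 days.
DAILY_MODE_DAYS = 30


def collection_interval(data_type: str,
                        daily_mode_start: Optional[date],
                        today: date) -> timedelta:
    """Return how often data_type is collected for a domain.

    daily_mode_start is the day the domain entered one of the special
    categories (e.g. its registration date), or None if it never did.
    """
    daily, normal, _retention = SCHEDULE[data_type]
    in_daily_mode = (
        daily_mode_start is not None
        and 0 <= (today - daily_mode_start).days < DAILY_MODE_DAYS
    )
    return timedelta(days=daily if in_daily_mode else normal)


def latest_deletion_date(data_type: str, collected_on: date) -> date:
    """Return the last day on which a collected item may still be retained."""
    return collected_on + timedelta(days=SCHEDULE[data_type][2])


if __name__ == "__main__":
    registered = date(2024, 1, 1)
    print(collection_interval("web-contents", registered, date(2024, 1, 15)))
    print(collection_interval("web-contents", registered, date(2024, 2, 15)))
    print(latest_deletion_date("dns", date(2023, 1, 1)))
```

The retention value is an upper bound, so an implementation following this sketch could delete items earlier than the computed date.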
Data use policy
CZ.NIC promises to adhere to the following rules regarding the data obtained from DNS crawler operation:
- Original data collected by the crawler, as well as processed data and information about specific domains, shall neither be published nor passed to third parties, except in the following cases:
  - The CZ.NIC association is required by legal regulations to disclose the data.
  - Know-how and services of third parties are utilized for the purposes stated above, e.g. in joint projects. In this case, data shall be provided under a non-disclosure agreement.
- Problems of all kinds that need to be addressed by domain owners or administrators shall be communicated privately to appropriate domain contacts obtained from the .cz domain registry.
- Discovered security incidents shall be handled using the standard procedures of the CSIRT.CZ security team.
- CZ.NIC association shall use the classification of domains and their contents for operational, planning, research and educational purposes.
- General statistics obtained from the collected data shall be publicly available, both in a graphical form and as open data.
- Each data item collected by the DNS crawler shall be retained for no longer than the maximum period shown in the table above (the actual retention is usually much shorter).
Contact information
- Urgent problems regarding DNS crawler operation: contact the non-stop CZ.NIC info line
- General questions, comments and requests: send email to dns-crawler@nic.cz
- Specific bugs, issues and feature requests: use the project Issues page