TOC 
Network Working GroupD. Skoll
Internet-DraftRoaring Penguin Software Inc.
Intended status: InformationalDecember 16, 2011
Expires: June 18, 2012 


Reputation Reporting Protocol
draft-dskoll-reputation-reporting-04

Abstract

This draft specifies a protocol for reporting various events associated with IP addresses. These events can be collected and aggregated to form a database containing information about the reputation of IP addresses.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

This Internet-Draft will expire on June 18, 2012.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.



Table of Contents

1.  Requirements notation
2.  Introduction
3.  Definitions
4.  Report Format
    4.1.  Subreports
    4.2.  Subreport Formats
5.  Subreport Format Definitions
    5.1.  IP Address Subreports
        5.1.1.  IP Address Event Types
    5.2.  Repeated IP Address Subreports
    5.3.  Vendor Number Subreport
    5.4.  Software Name Subreport
    5.5.  Software Version Subreport
    5.6.  End-User Subreport
    5.7.  Vendor-Specific Subreport
6.  Hierarchical Aggregation
    6.1.  Collector Level Subreport
7.  Report Restrictions
8.  Report Layout
    8.1.  Sample Report
9.  IANA Considerations
10.  Security Considerations
11.  Normative References
§  Author's Address




 TOC 

1.  Requirements notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119] (Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” March 1997.).



 TOC 

2.  Introduction

Several organizations (Project Honeypot, Spamhaus, RepuScore, etc.) maintain databases detailing the reputation of various IP address. These organizations use various ad hoc methods to collect the reputation data. There is no standard for reporting events to reputation-collectors; this makes it hard to instrument a large number of systems to be data gatherers in a standard way.

This draft proposes a standard for reporting events back to a reputation-collecting system. A standard way to report events will make it easy to instrument systems to collect data and report it to one or more reputation-collection systems.

This standard has the following goals:

  1. To be bandwidth-efficient. The overhead for event-reporting should be as small as possible.
  2. To include security mechanisms such as authentication and anti-replay mechanisms.
  3. To handle both IPv4 and IPv6.
  4. To be open to extension in future.
  5. To be scalable. Sending reports should be as quick and easy as possible. The hosts receiving reports should be able to receive and process a large volume of reports with as little CPU and network overhead as possible.

This protocol is currently in use by Roaring Penguin Software Inc's CanIt anti-spam products. A web page devoted to the protocol is found at http://www.mimedefang.org/reputation; this page includes a link to a mailing list for discussions of the protocol.



 TOC 

3.  Definitions

An EVENT is the basic unit of reputation reporting. An event consists of an IPv4 or IPv6 address and an associated event TYPE. Examples of event types are "SMTP client at IP address issued an SMTP RCPT command for a nonexistent recipient" or "SMTP client at IP address delivered an email message considered by a human to be spam."

A SUBREPORT is usually a list of one or more events. However, a subreport may contain other types of information such as software name, version, etc.

A SENSOR is a system that reports events.

An AGGREGATOR is a system that receives events and builds a reputation database.

A REPORT is a series of subreports sent from a sensor to an aggregator.

A USER NAME is included in each report. The user name is a key used by the aggregator to look up a shared secret.

A SHARED SECRET is used by the sensor and aggregator to authenticate reports.

A REPUTATION DATABASE is a dictionary indexed by IPv4 or IPv6 address that contains reputation information about a particular IP address. Note that the exact format of the reputation database as well as what constitutes "reputation" are beyond the scope of this document. We are concerned only with a standard for reporting events.



 TOC 

4.  Report Format

Reports are transported via UDP. The aggregator SHOULD listen for UDP packets on port 6568, the IANA-assigned port number for this protocol [PORTS] (IANA, “Port Numbers,” 2010.).

A report consists of the following items:



 TOC 

4.1.  Subreports

A subreport consists of a three-byte preamble followed by the subreport contents. The three-byte preamble consists of:



 TOC 

4.2.  Subreport Formats

The following subreport formats are defined. The value of the FORMAT byte is given in decimal.

An aggregator MUST skip over subreports with format values it does not understand. (It can do this by skipping ahead LENGTH bytes.)

An aggregator MUST ignore the entire report if any subreports have invalid LENGTHs. For example, it MUST reject a subreport of IPv4-EVENTS if the LENGTH is not a multiple of 5. The aggregator SHOULD log information about invalid reports.



 TOC 

5.  Subreport Format Definitions

The following sections define exactly how various subreports are formatted.



 TOC 

5.1.  IP Address Subreports

FORMAT values 1 and 2 specify IPv4 and IPv6 subreports, respectively. Each IPv4 and IPv6 subreport contains one or more events. The length of each IPv4 event is 5, and the length of each IPv6 event is 17. The events themselves consist of:



 TOC 

5.1.1.  IP Address Event Types

The following event types are defined for IPv4 and IPv6 events:



 TOC 

5.2.  Repeated IP Address Subreports

FORMAT values 3 and 4 specify REPEATED-IPv4 and REPEATED-IPv6 subreports, respectively. Each REPEATED-IPv4 and REPEATED-IPv6 subreport contains one or more events. The length of each REPEATED-IPv4 event is 6, and the length of each REPEATED-IPv6 event is 18. The events themselves consist of:

The EVENT TYPE field for a REPEATED-IPv4 or REPEATED-IPv6 event is exactly the same as for IPv4 or IPv6 events. The extra REPEAT byte allows a sensor to compress up to 255 identical IP address events into a single REPEATED event.



 TOC 

5.3.  Vendor Number Subreport

FORMAT value 5 specifies the VENDOR-NUMBER subreport. This subreport MUST have a LENGTH of 3. The 3-byte content of the subreport is a 24-bit unsigned integer in network byte order specifying an IANA-assigned Private Enterprise Number [PEN] (IANA, “Private Enterprise Numbers,” 2010.). The VENDOR-NUMBER subreport is only required if there are subsequent VENDOR-SPECIFIC subreports.



 TOC 

5.4.  Software Name Subreport

FORMAT value 6 specifies the SOFTWARE-NAME subreport. The LENGTH can range from 1 to 63. The contents MUST be a valid UTF-8 string specifying the name of the software running on the aggregator. The string MUST NOT be zero-terminated. An aggregator MAY include a SOFTWARE-NAME subreport in each report, but MUST NOT include more than one SOFTWARE-NAME subreport.



 TOC 

5.5.  Software Version Subreport

FORMAT value 7 specifies the SOFTWARE-VERSION subreport. The LENGTH can range from 1 to 31. The contents MUST be a valid UTF-8 string (and SHOULD be a valid ASCII string) specifying the version of the software running on the aggregator. The string MUST NOT be zero-terminated.

An aggregator MAY include a SOFTWARE-VERSION subreport in each report, but MUST NOT include more than one SOFTWARE-VERSION subreport. If an aggregator includes a SOFTWARE-VERSION subreport, it MUST also include a SOFTWARE-NAME subreport.



 TOC 

5.6.  End-User Subreport

FORMAT value 8 specifies an END-USER subreport. This report, if present, provides additional information about the specific user who generated subsequent subreports. For example, if an ISP is reporting events, the END-USER subreport may contain enough information for the ISP to determine which of its subscribers reported the events. The contents of the END-USER subreport are a sequence of bytes ranging in length from 1 to 31 bytes. The contents are opaque and have meaning only to the organization running the sensor.



 TOC 

5.7.  Vendor-Specific Subreport

FORMAT values 128 through 254 specify a VENDOR-SPECIFIC subreport. A VENDOR-SPECIFIC subreport MUST be preceded by a VENDOR-NUMBER subreport. A given report MAY contain more than one VENDOR-NUMBER subreport; the interpretation of a VENDOR-SPECIFIC subreport MUST be made according to the VENDOR-NUMBER subreport that most closely preceded the VENDOR-SPECIFIC subreport.

The contents of the subreport are vendor-specific and not defined here. An aggregator that does not understand a VENDOR-SPECIFIC subreport for a given VENDOR-NUMBER MUST ignore the subreport.



 TOC 

6.  Hierarchical Aggregation

Normally, a sensor detects events directly (for example, it communicates with an MTA to detect events) and reports them to an aggregator, which builds the event database. However, it may be desirable to have an aggregator take all of the events from its sensors and report them to a higher-level "upstream" aggregator.

Allowing aggregators to report events to other aggregators introduces the possibility of reporting loops. To break these loops, aggregators MUST include a COLLECTOR-LEVEL subreport. It MUST be the first subreport in the report.

A sensor that detects events directly SHOULD NOT include a COLLECTOR-LEVEL subreport. However, if it does include one, it MUST specify a level of zero.

An aggregator that will forward reports to another aggregator MUST be configured to have an "intrinsic level" greater than zero. The aggregator MUST reject reports with a COLLECTOR-LEVEL greater than or equal to its intrinsic level. When the aggregator forwards reports, it MUST include a COLLECTOR-LEVEL subreport containing its intrinsic level. (If the original report included a COLLECTOR-LEVEL subreport, the aggregator MUST NOT include the original COLLECTOR-LEVEL report, but MUST replace it with its own COLLECTOR-LEVEL.) Note that assignment and management of intrinsic levels is beyond the scope of this document; such assignment must be agreed upon by aggregator managers based on the hierarchy of sensors and aggregators.



 TOC 

6.1.  Collector Level Subreport

FORMAT value 127 specifies the COLLECTOR-LEVEL subreport. The LENGTH MUST be 2, and the contents are an unsigned 16-bit integer in network byte order specifying the intrinsic level of the agent sending the report. If the COLLECTOR-LEVEL subreport is present, it MUST be the first subreport. If it is missing, the aggregator MUST assume a COLLECTOR-LEVEL of zero.



 TOC 

7.  Report Restrictions

A sensor MUST NOT report GREYLISTED or UNGREYLISTED events unless it implements greylisting. A sensor MUST NOT report HAND-SPAM or HAND-HAM events unless it has a reliable system for accepting a human's decision about a message and can show beyond a reasonable doubt that a human in fact made the decision about the reported message. A sensor MUST NOT report VALID-RECIPIENT or INVALID-RECIPIENT events unless it is capable of validating recipient addresses.

A sensor MUST NOT report events for an IPv4 address that is not a globally-routable unicast address. In particular, no IPv4 address in the Private Address Space of [RFC1918] (Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E. Lear, “Address Allocation for Private Internets,” February 1996.) should appear in a report, nor should any address in 127/8 nor 224/4. An aggregator MUST ignore such addresses and SHOULD log the fact that they were received.

A sensor MUST NOT report events for an IPv6 address that is not an Aggregatable Global Unicast Address as defined in [RFC4291] (Hinden, R. and S. Deering, “IP Version 6 Addressing Architecture,” February 2006.). An IPv4-compatible IPv6 address or an IPv4-mapped IPv6 address MUST be reported as an IPv4 event and not an IPv6 event. An aggregator MUST ignore events that violate this requirement and SHOULD log the fact that they were received.

A sensor MUST NOT report a VIRUS event unless a virus was detected using a signature-based virus scanner.

A sensor SHOULD include as many events in its report as necessary to make the report size at least 400 bytes. A sensor MAY send out a shorter report, but MUST NOT do so unless failing to do so would result in loss of data OR unless it has not sent a report within the last hour. For example, if a sensor process is about to exit and has buffered events in memory, it SHOULD report the buffered events before exiting, even if the report size would be less than 400 bytes and even if a report has been sent within the last hour.

A sensor SHOULD NOT send a report that would exceed 492 bytes unless it has a priori knowledge that such a large UDP datagram will be received intact by the aggregator. An aggregator MUST be prepared to handle a report as large as the largest possible UDP datagram (65507 bytes of actual data.)

A sensor MUST NOT send an empty report (that is, a report with no subreports.)

When an aggregator logs information about a report, it MUST log the originating IP address of the report. If the report is well-formed, it MUST log the user name associated with the report. It SHOULD log additional information concerning the disposition of the report and the reason for the disposition.

If Hierarchical Aggregation is being used, a sensor MUST NOT report events to more than one aggregator. A lower-level aggregator MUST NOT forward events to more than one higher-level aggregator. These restrictions are required to avoid the possibility of counting the same event more than once.



 TOC 

8.  Report Layout

The following diagram illustrates the layout of a report. The version byte starts immediately after the UDP header.

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| VERSION (=2)  | USERNAME LEN  |  USERNAME ...                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                                                               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       USERNAME continued                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        EIGHT RANDOM BYTES                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
|                    EIGHT RANDOM BYTES continued               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            TIMESTAMP                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   FORMAT 1    |            LENGTH 1           |    DATA 1     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                                                               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       DATA 1 CONTINUED ...                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   FORMAT 2    |            LENGTH 2           |    DATA 2     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                                                               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         DATA 2 CONTINUED ...                  |   EOR (=0)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          HMAC DIGEST                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       HMAC DIGEST continued                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     HMAC DIGEST continued     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Note that there are no alignment requirements or padding bytes. The preceding figure shows some fields aligned to a four-byte boundary purely for ease of drawing.



 TOC 

8.1.  Sample Report

The following figure shows a sample report. The report is from the user "dfs" with password "foo". It contains two IPv4 events: 192.0.2.2 sent mail automatically classified as spam, and 192.0.2.3 was greylisted. There is a repeated IPv4 event: 192.0.2.4 sent to an invalid recipient 3 times. Finally, There is one IPv6 event: 2001:db8:1d:e4:2e0:18ff:feab:147f sent mail to a valid recipient. The bytes are shown in hexadecimal.

02                      -- Version 2 of the protocol
03                      -- Length of user name is 3 bytes
64 66 73                -- "dfs" in UTF-8
2a 9a 82 d6 51 29 64 f7 -- 8 random bytes
4b d9 da eb             -- 32-bit timestamp
01 00 0a                -- Two IPv4 events (Fmt 1, Len 10)
c0 00 02 02 03          -- 192.0.2.2 sent auto-spam (event type 3)
c0 00 02 03 01          -- 192.0.2.3 was greylisted (event type 1)
03 00 06                -- One Repeated-IPv4 event (Fmt 3, Len 6)
c0 00 02 04 08 03       -- 192.0.2.4 sent to invalid recip 3 times.
02 00 11                -- One IPv6 event (Fmt 2, Len 17 decimal)
20 01 0d b8 00 1d 00 e4 -- IPv6 address (first 8 bytes)
02 e0 18 ff fe ab 14 7f -- IPv6 address (last 8 bytes)
07                      -- Event type 7 (Valid Recipient)
00                      -- EOR (End of Reports) byte
0c 10 51 0f 5d          -- 10-byte truncated SHA1 HMAC
7e a1 e0 aa 20          -- 10-byte truncated SHA1 HMAC cont'd



 TOC 

9.  IANA Considerations

This document defines a one-byte SUBREPORT FORMAT. Formats 0 through 8 and 127 are defined in Section 4.2 (Subreport Formats) of this document. The IANA should set up the "IP Reputation Reporting Format Numbers" registry for registering formats 9 through 126. Formats 128 through 254 are available for private use without requiring IANA registration. Format 255 should not be used.

This document defines a one-bye EVENT TYPE. Event types 1 through 9 are described in Section 5.1.1 (IP Address Event Types) of this document. Event type 0 is reserved. The IANA should set up the "IP Reputation Reporting Event Types" registry for registering event types 10 through 255.



 TOC 

10.  Security Considerations

Because reports are transmitted via UDP, aggregators are vulnerable to IP address spoofing. An aggregator MAY use techniques other than HMAC verification to reject reports. For example, if it knows that a particular user always sends reports from a restricted set of IP addresses, it MAY discard reports claiming to be from that user if they originate from outside the restricted set of addresses. The aggregator SHOULD log information about discarded reports.

Reports are transmitted in the clear. An eavesdropper can easily sniff user names from the reports. The eavesdropper can also gain information about SMTP traffic as seen by sensors. If this is undesirable, the sensor SHOULD arrange to transmit data to the aggregator over a secure channel using IPSec or some other VPN solution.

The shared secret SHOULD be at least 8 bytes long and SHOULD be generated by a cryptographically-secure random number generator. The shared secret SHOULD NOT be chosen by a human being because of the risk of picking a weak secret or a secret guessable from the user name.

An aggregator SHOULD use the time stamp and random-number fields to detect duplicate reports and fend off replay attacks. An aggregator SHOULD NOT accept a report whose timestamp is more than two minutes away from the current time. (For this reason, both aggregators and sensors SHOULD synchronize their clocks to a standard time source using NTP [RFC1305] (Mills, D., “Network Time Protocol (Version 3) Specification, Implementation,” March 1992.).)

Aggregators implicitly trust sensors. Therefore, aggregator administrators SHOULD ensure that sensor administrators follow security best-practices. Both aggregator and sensor administrators MUST take the utmost care to keep their shared secret from leaking out.



 TOC 

11. Normative References

[GREY] Harris, E., “The Next Step in the Spam Control War: Greylisting,” August 2003.
[PEN] IANA, “Private Enterprise Numbers,” 2010.
[PORTS] IANA, “Port Numbers,” 2010.
[RFC1305] Mills, D., “Network Time Protocol (Version 3) Specification, Implementation,” RFC 1305, March 1992 (TXT, PDF).
[RFC1918] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E. Lear, “Address Allocation for Private Internets,” BCP 5, RFC 1918, February 1996 (TXT).
[RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, “HMAC: Keyed-Hashing for Message Authentication,” RFC 2104, February 1997 (TXT).
[RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997 (TXT, HTML, XML).
[RFC4291] Hinden, R. and S. Deering, “IP Version 6 Addressing Architecture,” RFC 4291, February 2006 (TXT).


 TOC 

Author's Address

  Dianne Skoll
  Roaring Penguin Software Inc.
  17 Grenfell Crescent, Suite 209C
  Ottawa, ON K2G 0G3
  CA
Phone:  +1 613 231-6599
Email:  dfs@roaringpenguin.com