SIEM Logstash parsing for more than a hundred technologies

Last update: Dec 29, 2022

Overview

LogIndexer Pipeline

Logstash Parsing Configurations for Elasticsearch SIEM and OpenDistro for Elasticsearch SIEM

Why this project exists

The overhead of implementing Logstash parsing and applying the Elastic Common Schema (ECS) across audit, security, and system logs can be a large drawback when using Elasticsearch as a SIEM (Security Information and Event Management). The Cargill SIEM team has spent significant time developing quality Logstash parsing processors for many well-known log vendors and wants to share this work with the community. In addition to the Logstash processors, we have also included log collection programs for API-based log collection, as well as the setup scripts used to generate our pipeline-to-pipeline architecture.
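A pipeline-to-pipeline architecture like the one mentioned above is built on Logstash's pipeline input and output plugins. A minimal sketch, assuming a processor pipeline that forwards to an enrichment pipeline listening on the virtual address enrichments (the address that appears in the host split issue below); the rest of the layout is hypothetical and not taken from this repo's build scripts:

```logstash
# Processor pipeline (hypothetical excerpt): forward parsed events
# to the enrichment pipeline's virtual address.
output {
  pipeline { send_to => [ "enrichments" ] }
}

# Enrichment pipeline (separate config file): receive events
# sent to the same virtual address.
input {
  pipeline { address => "enrichments" }
}
```

Both pipelines also have to be declared in pipelines.yml. If the downstream pipeline is down or stalled, the upstream pipeline output blocks and retries, which is what the PipelineBus warning in the host split issue shows.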

Quick start Instructions

"Quick start" mostly depends on how your Logstash configuration is set up. If you have your own setup already established, it might be best to use the processors that apply to your organization's log collection (found in the "config" directory). If you are seeking to use the architecture in this repo, consult the README found in the build_scripts directory. We will be adding an elaborate setup guide soon.

Contributions

We welcome and encourage individual contributions to this repo. Please see the Contribution.md guide in the root of the repo. Please note that we reserve the right to close pull requests or issues that appear to be out of scope for our project, or for other reasons not specified.

Questions, Comments & Expected Level of Attention

Please create an issue and someone will try to respond within 5 business days. However, while we will try to revisit the repository semi-regularly, we are not bound by this response time (life happens). We welcome answers and input from other individuals as well.

Licensing

Apache-2.0

Comments

Improved Cisco ACI processor

Improved the Cisco ACI processor with the following changes:

simplified grok parsing
removed complex logic used to detect event and error messages
fixed broken parsing of the hostname of the device sending the logs
tmp.rule does NOT represent a username; it is instead the event.reason, described by Cisco as "The action or condition that caused the event, such as a component failure or a threshold crossing."

sample messages used for testing

<186>Dec 08 21:20:20.614 ABC-DCA-NPRD-ACILEF-104 %LOG_LOCAL7-2-SYSTEM_MSG [F0532][raised][interface-physical-down][critical][sys/phys-[eth1/47]/phys/fault-F0532] Port is down, reason being suspended(no LACP PDUs)(connected), used by EPG on node 104 of fabric ACI Fabric1 with hostname ABC-DCA-NPRD-ACILEF-104

<190>Nov 24 18:20:53.237 ABC-DCB-ACIAPC-003 %LOG_LOCAL7-6-SYSTEM_MSG [E4206143][transition][info][fwrepo/fw-aci-apic-dk9.5.2.6e] Firmware aci-apic-dk9.5.2.6e created

opened by anubisg1 3
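A sketch of a single grok pattern covering both sample messages above, not the PR's actual pattern. The optional group handles the extra lifecycle field ([raised]) that the fault carries and the event does not, and the last bracketed group stops at "] " so a DN containing brackets, such as sys/phys-[eth1/47]/phys/fault-F0532, stays whole. The [tmp] field names are hypothetical; event.reason follows this PR's note:

```logstash
filter {
  grok {
    # Hypothetical field names under [tmp]; [event][reason] follows the PR note
    # that the bracketed "rule" value is the event reason.
    match => {
      "message" => "^<%{NONNEGINT:[tmp][pri]}>%{CISCOTIMESTAMP:[tmp][devicetimestamp]} %{HOSTNAME:[tmp][hostname]} %(?<[tmp][facility]>[A-Z0-9_]+)-%{INT:[tmp][severity_code]}-(?<[tmp][mnemonic]>[A-Z0-9_]+) \[(?<[tmp][code]>[^\]]+)\](?:\[(?<[tmp][lifecycle]>[^\]]+)\])?\[(?<[event][reason]>[^\]]+)\]\[(?<[tmp][severity]>[^\]]+)\]\[(?<[tmp][dn]>.*?)\] %{GREEDYDATA:[tmp][description]}$"
    }
  }
}
```

With the two sample messages, the fault should yield an event.reason of interface-physical-down and the event an event.reason of transition.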

[Help / Documentation] - how to classify incoming syslog messages

As per the title, how should we classify incoming syslog messages so that they end up in the proper processor pipeline?

Take a common use case where the network has Cisco IOS routers and switches, Cisco ACI, Cisco WLC and ISE, Checkpoint firewalls, F5 load balancers, etc.

Generally, all of those devices send logs to the syslog server IP on port 514. How would we work out where each message comes from, in order to send it to the right processor?

Are we supposed to set up a different input queue for each processor (for example, a different port on the syslog server, so that ACI goes to 192.168.10.10 port 5514 while Checkpoint goes to port 5515)?

Or is there an IP filter that says: if the source IP is X, send to the ACI processor; if it is Y, send to the Checkpoint processor?

Or what other options are there?
question

opened by anubisg1 2
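One possible answer to the classification question above, sketched under assumptions: the two approaches it describes (one listener port per technology, or a shared listener plus a source-IP rule) can be combined in a single collector pipeline that tags each event and hands it to a processor pipeline. The subnets and the processor addresses (cisco_aci_processor, checkpoint_processor, unclassified) are hypothetical, and this is not necessarily how this repo's build scripts do it:

```logstash
input {
  # Option 1: one listener per technology (ports taken from the question).
  udp { port => 5514 tags => [ "cisco_aci" ] }
  udp { port => 5515 tags => [ "checkpoint" ] }
  # Shared listener for everything else.
  udp { port => 514 }
}

filter {
  # Option 2: classify events from the shared listener by sender address.
  # By default the udp input stores the sender in [host][ip] when ECS
  # compatibility is enabled and in [host] when it is disabled.
  # The subnets below are hypothetical.
  if "cisco_aci" not in [tags] and "checkpoint" not in [tags] {
    cidr {
      address => [ "%{[host][ip]}" ]
      network => [ "192.168.20.0/24" ]
      add_tag => [ "cisco_aci" ]
    }
    cidr {
      address => [ "%{[host][ip]}" ]
      network => [ "192.168.30.0/24" ]
      add_tag => [ "checkpoint" ]
    }
  }
}

output {
  # Hand each class of event to its processor pipeline (hypothetical addresses).
  if "cisco_aci" in [tags] {
    pipeline { send_to => [ "cisco_aci_processor" ] }
  } else if "checkpoint" in [tags] {
    pipeline { send_to => [ "checkpoint_processor" ] }
  } else {
    pipeline { send_to => [ "unclassified" ] }
  }
}
```

Per-port listeners are the simplest to reason about when devices allow a custom destination port; address-based rules keep working on 514 but need updating whenever devices are added or renumbered.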
host split enrichment error

For certain hostnames, the host split enrichment blocks the pipeline until grok times out.

[2022-06-10T15:54:58,563][WARN ][org.logstash.plugins.pipeline.PipelineBus][processor] Attempted to send event to 'enrichments' but that address was unavailable. Maybe the destination pipeline is down or stopping? Will Retry.
[2022-06-10T15:57:22,451][WARN ][logstash.filters.grok ][enrichments] Timeout executing grok '^(?<[host][tmp]>.?).(?<[host][domain]>.?)$' against field '[host][hostname]' with value 'abc-name123-xyz.domain.com'!

opened by nnovaes 2
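The grok pattern quoted in the log above looks like it lost characters when the issue was rendered, so the exact cause of the timeout can't be confirmed from this text. A common way to make a "split at the first dot" step cheap is a negated character class, which leaves the regex only one way to match. A minimal sketch, assuming the goal is the part before the first dot in [host][tmp] and the rest in [host][domain]; the failure tag and timeout value are hypothetical:

```logstash
filter {
  # Hypothetical replacement for the host split step; only runs on dotted names.
  if [host][hostname] =~ /\./ {
    grok {
      # [^.]+ cannot consume a dot, so the name can only be split at its first
      # dot and the engine has no alternatives to backtrack through.
      match => { "[host][hostname]" => "^(?<[host][tmp]>[^.]+)\.(?<[host][domain]>.+)$" }
      tag_on_failure => [ "_host_split_failure" ]
      # Bound the worst case anyway (grok's default is 30000 ms).
      timeout_millis => 500
    }
  }
}
```

With the value from the log, abc-name123-xyz.domain.com should split into abc-name123-xyz and domain.com.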

Fix deprecation warnings

User Story - details

For translate we should use source and target instead of field and destination. On startup, Logstash 15 shows these warnings:

[2021-11-09T16:53:33,518][WARN ][logstash.filters.translate] You are using a deprecated config setting "destination" set in translate. Deprecated settings will continue to work, but are scheduled for removal from logstash in the future. Use `target` option instead. If you have any questions about this, please visit the #logstash channel on freenode irc.
[2021-11-09T16:53:33,519][WARN ][logstash.filters.translate] You are using a deprecated config setting "field" set in translate. Deprecated settings will continue to work, but are scheduled for removal from logstash in the future. Use `source` option instead. If you have any questions about this, please visit the #logstash channel on freenode irc.

At least up to Logstash 13 the new settings are not supported, so let's make this change when we upgrade.

Tasks

X-Reference Issues

Related Code

<< Any related code here... >>

opened by KrishnanandSingh 2
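A sketch of a translate filter written with the newer settings, for a plugin version that accepts them. The field names are illustrative, not taken from this repo's processors; the dictionary is the standard syslog severity table:

```logstash
filter {
  translate {
    # Replaces the deprecated `field` setting.
    source => "[tmp][severity]"
    # Replaces the deprecated `destination` setting.
    target => "[log][level]"
    # Standard syslog severity numbers mapped to their names.
    dictionary => {
      "0" => "emergency"
      "1" => "alert"
      "2" => "critical"
      "3" => "error"
      "4" => "warning"
      "5" => "notice"
      "6" => "informational"
      "7" => "debug"
    }
    fallback => "unknown"
  }
}
```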

native vlan mismatch and other improvements
Description

Parsing for Native VLAN mismatch error messages, for example:

2021-10-14T13:28:06.497Z {name=abc.com} <188>132685: Oct 14 21:28:07.975 GMT: %CDP-4-NATIVE_VLAN_MISMATCH: Native VLAN mismatch discovered on FastEthernet0/1 (1), with xyz GigabitEthernet1/0/1 (36).

Lowercase [actual_msg] field

fix typo in timestamp

add the timezone to [tmp][devicetimestamp]

removed the old parser code for native vlan mismatch

removed a catch all condition in the old parser

lowercase [rule.category]
opened by nnovaes 2
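A minimal grok sketch for the NATIVE_VLAN_MISMATCH text in the sample above, not the PR's actual rule. It assumes the description is in [actual_msg], as this PR's notes suggest; the [tmp] field names are hypothetical. The (?i) flag keeps it matching whether or not the field has already been lowercased:

```logstash
filter {
  grok {
    # Hypothetical [tmp] field names; [actual_msg] is the field named in this PR.
    match => {
      "[actual_msg]" => "(?i)native vlan mismatch discovered on %{NOTSPACE:[tmp][local_interface]} \(%{INT:[tmp][local_vlan]}\), with %{NOTSPACE:[tmp][neighbor]} %{NOTSPACE:[tmp][neighbor_interface]} \(%{INT:[tmp][neighbor_vlan]}\)"
    }
    # Other message types simply don't match; don't tag them as failures.
    tag_on_failure => []
  }
}
```

For the sample, this should capture FastEthernet0/1 with VLAN 1 locally and xyz GigabitEthernet1/0/1 with VLAN 36 on the neighbor.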
Feature Request: Add known applications + risk score field based off destination.port fields
User Story - details

As a SIEM engineer, I want to know the applications associated with the port number in the destination.port field. This will allow me to quickly identify potential applications communicating in the session, as well as the risk of the traffic I'm observing.

Tasks

Create a port lookup translation.

Add risk category score to application (scale of 1-10 or severity name).

Examples:

3389 -> Remote Desktop Protocol (high risk)
22 -> Secure Shell (high risk)
3306 -> MySQL (medium risk)
6881-6889 -> BitTorrent (high risk)
opened by ryanpodonnell1 2
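One possible shape for the port lookup, sketched with translate's regex dictionary keys so that a range such as 6881-6889 can be a single entry. The [labels][port_application] and [labels][port_risk] field names are hypothetical (ECS has no standard field for this), and the example assumes a translate plugin version that accepts source/target:

```logstash
filter {
  if [destination][port] {
    # Copy the port as a string so the regex dictionary keys can match it.
    # Two mutates, because within one mutate `convert` runs before `copy`.
    mutate { copy => { "[destination][port]" => "[@metadata][dport]" } }
    mutate { convert => { "[@metadata][dport]" => "string" } }

    # Application name (hypothetical target field).
    translate {
      source => "[@metadata][dport]"
      target => "[labels][port_application]"
      exact => true
      regex => true
      dictionary => {
        "^3389$" => "Remote Desktop Protocol"
        "^22$" => "Secure Shell"
        "^3306$" => "MySQL"
        "^688[1-9]$" => "BitTorrent"
      }
    }

    # Risk category (hypothetical target field).
    translate {
      source => "[@metadata][dport]"
      target => "[labels][port_risk]"
      exact => true
      regex => true
      dictionary => {
        "^3389$" => "high"
        "^22$" => "high"
        "^3306$" => "medium"
        "^688[1-9]$" => "high"
      }
    }
  }
}
```

For a longer list, dictionary_path pointing to a YAML, JSON or CSV file is easier to maintain than an inline dictionary.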
Cisco IOS (cisco.router and cisco.switch) new rules

Description

New parsing rules for cisco.router and cisco.switch. The old version of this processor needs some rework; however, there are functioning bits of it that I have preserved, since they work to some extent. The new rules provide a good foundation for future "full" parsing and also cover BGP and interface up/down messages. The lookup database for the translate filters is static.

opened by nnovaes 2
Update syslog_log_security_sdwan.app.conf

Description

These updates correct the assignment of Versa fields to the ECS model. They also add back Versa-specific fields that do not map to ECS into a separate [labels][all] field that works like tags. I couldn't find a clean way to implement this without using add_tag, so I save the event tags to another field and then restore them afterwards.

@Akhila-Y please review as well.

opened by nnovaes 1
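The tag-swapping workaround described in this PR can be sketched as follows; this is an illustration, not the PR's code. The Versa field names are hypothetical. The idea is to park the event's real tags, use add_tag to collect the extra values, move the result to [labels][all], then put the original tags back:

```logstash
filter {
  # 1. Park the event's existing tags (no-op if there are none).
  mutate { rename => { "tags" => "[@metadata][saved_tags]" } }

  # 2. Collect non-ECS values with add_tag (hypothetical field names).
  if [tmp][versa_appfamily] {
    mutate { add_tag => [ "appfamily:%{[tmp][versa_appfamily]}" ] }
  }
  if [tmp][versa_risk] {
    mutate { add_tag => [ "risk:%{[tmp][versa_risk]}" ] }
  }

  # 3. Whatever is in tags now becomes [labels][all].
  mutate { rename => { "tags" => "[labels][all]" } }

  # 4. Restore the original tags.
  mutate { rename => { "[@metadata][saved_tags]" => "tags" } }
}
```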
added space, testing new IDE
Description

Please provide a description of your proposed changes - providing obfuscated log/code examples is highly encouraged.

Related Issues

Are there any Issues to this PR?

Todos

Are there any additional items that must be completed before this PR gets merged in?

[ ]

[ ]
opened by MehaSal 1
added new ECS fields to .csv file
Description

Please provide a description of your proposed changes - providing obfuscated log/code examples is highly encouraged.

Related Issues

Are there any Issues to this PR?

Todos

Are there any additional items that must be completed before this PR gets merged in?

[ ]

[ ]
opened by MehaSal 1
added missing fields for coverage reporting to aws cloudtrail
Description

Please provide a description of your proposed changes - providing obfuscated log/code examples is highly encouraged.

Related Issues

Are there any Issues to this PR?

Todos

Are there any additional items that must be completed before this PR gets merged in?

[ ]

[ ]
opened by MehaSal 1

[[enrichments]>worker22] ruby - Ruby exception occurred: no implicit conversion of nil into String

Describe the bug

[ERROR] 2022-11-26 07:54:05.540 [[enrichments]>worker14] ruby - Ruby exception occurred: no implicit conversion of nil into String {:class=>"TypeError", :backtrace=>["(ruby filter code):68:in `block in filter_method'", "org/jruby/RubyArray.java:1865:in `each'", "(ruby filter code):67:in `block in filter_method'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-ruby-3.1.8/lib/logstash/filters/ruby.rb:96:in `inline_script'", "/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-filter-ruby-3.1.8/lib/logstash/filters/ruby.rb:89:in `filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:159:in `do_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:178:in `block in multi_filter'", "org/jruby/RubyArray.java:1865:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:175:in `multi_filter'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:134:in `multi_filter'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:301:in `block in start_workers'"]}

X-Reference issues

(Cross reference any user stories that this bug might be affecting)

Steps To Reproduce

Start the enrichment pipeline. I'm using Logstash 8.5.2.

Expected behavior

no error should be seen

Additional context

The following components in the enrichment pipeline use the ruby filter, but I can't tell which one is the culprit:

./02_ecs_data_type.conf
./04_timestamp.conf
./11_related_hosts.conf
./12_related_user.conf
./13_related_ip.conf
./14_related_hash.conf
./16_related_mac.conf
./93_mitre.conf
./94_remove_empty_n_truncate.conf

bug wontfix

opened by anubisg1 2
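The backtrace points at lines 67-68 of an inline ruby filter, inside an each loop, which suggests a loop where one value is nil and is then used where a String is expected. The log does not say which of the files above it is. A defensive pattern is to skip nil values before any String operation. A hypothetical sketch; the field list and target are illustrative and not the content of 11_related_hosts.conf:

```logstash
filter {
  ruby {
    code => '
      hosts = []
      ["[host][name]", "[source][domain]", "[destination][domain]"].each do |field|
        # Array(nil) is [], so a missing field contributes nothing.
        Array(event.get(field)).each do |value|
          # Skip nil entries: using nil where a String is expected raises
          # "no implicit conversion of nil into String".
          next if value.nil?
          hosts << value.to_s
        end
      end
      event.set("[related][hosts]", hosts.uniq) unless hosts.empty?
    '
  }
}
```

The ruby filter also tags events that raise with _rubyexception (its tag_on_exception default), which can help identify the events that trigger the error.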

cisco processor fails because of missing hostname and lowercase date
I'm working with syslog_audit_cisco.switch.conf and I found the following issues:

The processor assumes the following syslog message format here https://github.com/Cargill/OpenSIEM-Logstash-Parsing/blob/1.0/config/processors/syslog_audit_cisco.switch.conf#L52:

# {timesdtamp} {facility} {severity} {mnemonic} {description} # seq no:timestamp: %facility-severity-MNEMONIC:description

In practice, most people configure "logging origin-id hostname", which changes the log format to:

# {hostname} {timesdtamp} {facility} {severity} {mnemonic} {description} # seq no: hostname: timestamp: %facility-severity-MNEMONIC:description

The parser at https://github.com/Cargill/OpenSIEM-Logstash-Parsing/blob/1.0/config/processors/syslog_audit_cisco.switch.conf#L33 modifies the hostname field before that field is parsed (maybe the hostname is assumed to come from Kafka instead of being taken from the logs?).

At https://github.com/Cargill/OpenSIEM-Logstash-Parsing/blob/1.0/config/processors/syslog_audit_cisco.switch.conf#L48 the message is converted to lowercase, but that causes date parse failures later on because of a case mismatch.

Nov 17 11:44:46.490 UTC matches, but nov 17 11:44:46.490 utc fails date parsing here: https://github.com/Cargill/OpenSIEM-Logstash-Parsing/blob/1.0/config/processors/syslog_audit_cisco.switch.conf#L77

Sample log entry for reference:

<14>4643: Switch-core01: Nov 17 11:44:46.490 UTC: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/27, changed state to up
opened by anubisg1 0
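A sketch of one way to address both points: make the origin-id hostname optional in the grok pattern, and parse the date before any lowercase step. Field names under [tmp] are hypothetical, and the ZZZ format assumes the device sends a zone ID such as UTC:

```logstash
filter {
  # Parse the raw message before anything is lowercased.
  grok {
    match => {
      "message" => "^<%{NONNEGINT:[tmp][pri]}>%{NONNEGINT:[tmp][seq]}: (?:%{HOSTNAME:[tmp][origin_host]}: )?(?<[tmp][devicetimestamp]>%{CISCOTIMESTAMP} %{WORD}): %(?<[tmp][facility]>[A-Z0-9_]+)-%{INT:[tmp][severity]}-(?<[tmp][mnemonic]>[A-Z0-9_]+): %{GREEDYDATA:[tmp][description]}$"
    }
  }
  # The timestamp still has its original case here ("Nov", "UTC").
  # Writes @timestamp by default; set `target` if device time belongs elsewhere.
  date {
    match => [ "[tmp][devicetimestamp]", "MMM dd HH:mm:ss.SSS ZZZ", "MMM  d HH:mm:ss.SSS ZZZ" ]
  }
  # Lowercase only after the date has been parsed, and only where needed.
  mutate { lowercase => [ "[tmp][description]" ] }
}
```

Against the sample entry above, the optional group should capture Switch-core01 as [tmp][origin_host]; without logging origin-id hostname, the group is simply skipped.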
GeoLitePrivate2-City.mmdb doesn't exist
To use the geoip enrichment, you need two files, specifically:

database => "/mnt/s3fs_geoip/GeoLite2-City.mmdb"
database => "/mnt/s3fs_geoip/GeoLitePrivate2-City.mmdb"

Unfortunately, GeoLitePrivate2-City.mmdb doesn't seem to exist anywhere on the internet, and MaxMind only provides:

GeoLite2-ASN.mmdb

GeoLite2-City.mmdb

GeoLite2-Country.mmdb

I'd expect either more information on where to find GeoLitePrivate2-City.mmdb to be added to the documentation, or the enrichment pipeline to be updated to work without that file.
documentation question
opened by anubisg1 3
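Until the origin of GeoLitePrivate2-City.mmdb is documented, one workaround is to run the geoip lookup only against the public GeoLite2-City database and skip private addresses, which a public database cannot locate anyway. A sketch assuming the address is in [source][ip]; any internal locations the private database presumably provided are lost:

```logstash
filter {
  if [source][ip] {
    # Flag private, loopback and unique-local addresses.
    cidr {
      address => [ "%{[source][ip]}" ]
      network => [ "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "127.0.0.0/8", "fc00::/7" ]
      add_field => { "[@metadata][source_is_private]" => "true" }
    }
    # Look up only public addresses in the public MaxMind database.
    if ![@metadata][source_is_private] {
      geoip {
        source => "[source][ip]"
        database => "/mnt/s3fs_geoip/GeoLite2-City.mmdb"
        # The default `target` depends on ecs_compatibility; set it explicitly if needed.
      }
    }
  }
}
```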

Validate ECS fields

User Story - details

There should be an enrichment that checks that ECS fields with a predefined set of allowed values contain only those values, so that events remain ECS-compliant. See https://www.elastic.co/guide/en/ecs/1.9/ecs-event.html for more info. I believe the event.* fields (event.kind, event.category, event.type, event.outcome) are the only ones with predefined values; if that's true, the sample code below should take care of this validation.

Tasks

X-Reference Issues

Related Code

The sample configuration below checks the event.type value set by the processors and populates ecs_status with either valid or event.type-invalid_field_value. If ecs_status is not valid, it adds a tag of the form event.type-invalid_field_value: <value> and removes the invalid field. For example, if event.type is "process", which is not among the allowed values for event.type, the tag event.type-invalid_field_value: process is added. The same check is repeated for event.category, event.kind, and event.outcome.

    # Assumes ECS fields are nested objects ([event][type]), not literal dotted keys.
    # event.type and event.category can be arrays in ECS; check how your translate
    # version handles array values before relying on this.

    # EVENT.TYPE
    if [event][type] {
        translate {
            field => "[event][type]"
            destination => "ecs_status"
            dictionary => {
                "access" => "valid"
                "admin" => "valid"
                "allowed" => "valid"
                "change" => "valid"
                "connection" => "valid"
                "creation" => "valid"
                "deletion" => "valid"
                "denied" => "valid"
                "end" => "valid"
                "error" => "valid"
                "group" => "valid"
                "info" => "valid"
                "installation" => "valid"
                "protocol" => "valid"
                "start" => "valid"
                "user" => "valid"
            }
            exact => true
            # overwrite the ecs_status left behind by a previous check
            override => true
            # [field]-[error]
            fallback => "event.type-invalid_field_value"
        }
        # use != : the regex "valid" would also match "event.type-invalid_field_value"
        if [ecs_status] != "valid" {
            # tag in its own mutate so both values still exist when the tag is built
            mutate { add_tag => [ "%{ecs_status}: %{[event][type]}" ] }
            mutate { remove_field => [ "[event][type]" ] }
        }
    }

    # EVENT.CATEGORY
    if [event][category] {
        translate {
            field => "[event][category]"
            destination => "ecs_status"
            dictionary => {
                "authentication" => "valid"
                "configuration" => "valid"
                "driver" => "valid"
                "database" => "valid"
                "file" => "valid"
                "host" => "valid"
                "iam" => "valid"
                "intrusion_detection" => "valid"
                "malware" => "valid"
                "network" => "valid"
                "package" => "valid"
                "process" => "valid"
                "web" => "valid"
            }
            exact => true
            override => true
            # [field]-[error]
            fallback => "event.category-invalid_field_value"
        }
        if [ecs_status] != "valid" {
            mutate { add_tag => [ "%{ecs_status}: %{[event][category]}" ] }
            mutate { remove_field => [ "[event][category]" ] }
        }
    }

    # EVENT.KIND
    if [event][kind] {
        translate {
            field => "[event][kind]"
            destination => "ecs_status"
            dictionary => {
                "alert" => "valid"
                "event" => "valid"
                "metric" => "valid"
                "state" => "valid"
                "pipeline_error" => "valid"
                "signal" => "valid"
            }
            exact => true
            override => true
            # [field]-[error]
            fallback => "event.kind-invalid_field_value"
        }
        if [ecs_status] != "valid" {
            mutate { add_tag => [ "%{ecs_status}: %{[event][kind]}" ] }
            mutate { remove_field => [ "[event][kind]" ] }
        }
    }

    # EVENT.OUTCOME
    if [event][outcome] {
        translate {
            field => "[event][outcome]"
            destination => "ecs_status"
            dictionary => {
                "failure" => "valid"
                "success" => "valid"
                "unknown" => "valid"
            }
            exact => true
            override => true
            # [field]-[error]
            fallback => "event.outcome-invalid_field_value"
        }
        if [ecs_status] != "valid" {
            mutate { add_tag => [ "%{ecs_status}: %{[event][outcome]}" ] }
            mutate { remove_field => [ "[event][outcome]" ] }
        }
    }

    # drop the helper field once all checks are done
    mutate { remove_field => [ "ecs_status" ] }

opened by nnovaes 0

Releases(v0.1-beta)

v0.1-beta(May 19, 2021)

This release lacks detailed usage documentation, so it is marked as beta. Users can still work with it by going through the Python script. Documentation will be added soon.

Owner

Working to nourish the world. Committed to helping the world thrive

