Netdata: Difference between revisions

Netdata is one of the [https://discourse.equality-tech.com/t/dashboards-in-qualitybox/107 QualityBox dashboards].


<!-- These services/entry points are disabled in production

See [{{SERVER}}:20000/ this website Live]

[{{SERVER}}:20000/netdata.conf Configuration]

netdata.conf -->


== System Locations ==
Depending on how you install netdata, it will be distributed in the normal system locations such as
<pre>
   - the daemon    at /usr/sbin/netdata
   ...
   - logrotate file at /etc/logrotate.d/netdata
</pre>
Or, if you install with
<pre>bash <(curl -Ss https://my-netdata.io/kickstart-static64.sh)</pre>
all of netdata is installed into <code>/opt/netdata</code>.
== With Ansible ==
There is an [https://learn.netdata.cloud/docs/netdata-agent/installation/ansible Ansible Playbook for deploying Netdata Agent] across multiple nodes. That page also describes how to disable the local dashboard by setting <code>web_mode</code> to <code>none</code>. Securing the agent is described at https://learn.netdata.cloud/docs/netdata-agent/configuration/securing-agents/


== Host Modifications ==


A Netdata role is available in [https://github.com/enterprisemediawiki/meza/blob/32.x/src/roles/netdata/tasks/main.yml the 32.x branch of Meza]
 
Otherwise, you have to make room in HAProxy for netdata:
=== HAProxy ===
<syntaxhighlight lang="python">
frontend netdata
        bind *:20000 ssl crt /etc/haproxy/certs/wiki.freephile.org.pem
        mode http
        default_backend netdata-back
backend netdata-back
        server nd1 127.0.0.1:19999
</syntaxhighlight>
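
After adding the frontend and backend, it can help to let HAProxy validate the file before reloading. This is a minimal sketch, assuming the configuration lives at <code>/etc/haproxy/haproxy.cfg</code> and HAProxy runs as a systemd service; the helper name <code>reload_haproxy</code> is made up for illustration.

```shell
# Hypothetical helper: validate the HAProxy config, then reload only if it parses.
# Assumes a systemd-managed haproxy service; the default path is an assumption.
reload_haproxy() {
    cfg="${1:-/etc/haproxy/haproxy.cfg}"
    # -c checks the configuration and exits; nothing is reloaded if the check fails
    haproxy -c -f "$cfg" && systemctl reload haproxy
}

# Usage (as root):
#   reload_haproxy
```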
 
=== Kernel ===
You have kernel memory de-duper (called [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-ksm Kernel Same-page Merging], or KSM) available, but it is not currently enabled.
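
To see the current KSM state for yourself, you can read the kernel's control file. This is a small sketch, assuming the standard sysfs location <code>/sys/kernel/mm/ksm</code>; the function name <code>ksm_state</code> is illustrative only.

```shell
# Hypothetical helper: report the KSM state from the kernel's sysfs control file.
# The optional argument overrides the sysfs directory.
ksm_state() {
    ksm_dir="${1:-/sys/kernel/mm/ksm}"
    if [ ! -r "$ksm_dir/run" ]; then
        echo "unavailable"   # kernel built without KSM, or not Linux
        return 0
    fi
    case "$(cat "$ksm_dir/run")" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        2) echo "unmerging" ;;   # ksmd stopped and merged pages being split again
        *) echo "unknown" ;;
    esac
}

ksm_state
```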


Memory de-duplication instructions


To enable it run:

...

<code>systemctl start netdata</code>


To reload the health configuration without restarting netdata:
<code>killall -USR2 netdata</code> <ref>https://docs.netdata.cloud/health/quickstart/#reload-health-configuration</ref>
 
== Notifications ==
 
By default, notifications are emailed to 'root', so be sure to either change the recipient in the notification config (<code>sudo vim /etc/netdata/health_alarm_notify.conf</code>) or alias root to a mailbox that someone reads (<code>sudo vim /etc/aliases && newaliases</code>).
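
To check where mail for root currently ends up, you can look for a root entry in the aliases file. This is a rough sketch, assuming the usual <code>name: target</code> format in <code>/etc/aliases</code>; the function name <code>root_alias</code> is invented for this example.

```shell
# Hypothetical helper: print the target of the first uncommented "root:" alias.
# Prints nothing if the file is missing or has no root alias.
root_alias() {
    aliases_file="${1:-/etc/aliases}"
    [ -r "$aliases_file" ] || return 0
    # Lines starting with "#" do not match ^root: and are therefore skipped
    sed -n 's/^root:[[:space:]]*//p' "$aliases_file" | head -n 1
}

# Usage:
#   root_alias
```

If this prints nothing, notifications addressed to root stay in the local root mailbox.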
 
=== Turn off alarm ===
 
 
<pre>
     to: silent # silence the notification; the alarm still shows on the website
enabled: no     # disable the alarm
</pre>
More details are in the [https://docs.netdata.cloud/health/tutorials/stop-notifications-alarms/ netdata docs].
 
 
== Issues ==
 
You'll probably receive alarms for 'tcp listen drops'. These are likely caused by bots sending INVALID packets, NOT by your application dropping legitimate packets. There is a good discussion of how to identify the source of the problem and how to mitigate or resolve it in [https://github.com/firehol/netdata/issues/3234 Issue #3234] and [https://github.com/firehol/netdata/issues/3826 Issue #3826]. TL;DR: increase the threshold to 1 (in <code>/etc/netdata/health.d/tcp_listen.conf</code>) so you don't get bogus alerts.
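
To look at the raw counter behind this alarm, you can read the kernel's own statistics. This is a sketch only, assuming a Linux kernel that exposes a <code>ListenDrops</code> column on the <code>TcpExt:</code> lines of <code>/proc/net/netstat</code>; the function name <code>listen_drops</code> is hypothetical.

```shell
# Hypothetical helper: print the cumulative ListenDrops counter.
# /proc/net/netstat holds pairs of lines: a header line of column names,
# then a line of values, both prefixed with the protocol (here "TcpExt:").
listen_drops() {
    netstat_file="${1:-/proc/net/netstat}"
    [ -r "$netstat_file" ] || return 1
    awk '
        # first TcpExt line: remember which column is ListenDrops
        $1 == "TcpExt:" && !seen { for (i = 2; i <= NF; i++) if ($i == "ListenDrops") col = i; seen = 1; next }
        # second TcpExt line: print the value in that column
        $1 == "TcpExt:" && col   { print $col; exit }
    ' "$netstat_file"
}

# Usage: run it twice a few minutes apart and compare the numbers.
#   listen_drops
```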
 
Also, you should modify your firewall to drop invalid packets before they're either counted (by netdata) or dropped (by the kernel).
 
<syntaxhighlight lang="bash">
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP
ip6tables -A INPUT -m conntrack --ctstate INVALID -j DROP
iptables -A INPUT -m tcp -p tcp ! --tcp-flags FIN,SYN,RST,ACK SYN -m conntrack --ctstate NEW -j DROP
ip6tables -A INPUT -m tcp -p tcp ! --tcp-flags FIN,SYN,RST,ACK SYN -m conntrack --ctstate NEW -j DROP
</syntaxhighlight>
 
Following the advice from NASA at https://wiki.earthdata.nasa.gov/display/HDD/SOMAXCONN, I increased my somaxconn kernel parameter from 128 to 1024:
<syntaxhighlight lang="bash">
# show the current value (it was 128 here)
cat /proc/sys/net/core/somaxconn
# raise it for the running kernel (needs root)
sysctl -w net.core.somaxconn=1024
</syntaxhighlight>
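
A value set with <code>sysctl -w</code> lasts only until the next reboot. Here is a minimal sketch of one way to persist it, assuming your distribution reads drop-in files from <code>/etc/sysctl.d</code>; the file name <code>90-somaxconn.conf</code> and the helper name <code>persist_somaxconn</code> are made up for illustration.

```shell
# Hypothetical helper: write a sysctl drop-in so the setting survives reboots.
# Arguments: the somaxconn value, and optionally the drop-in directory.
persist_somaxconn() {
    value="$1"
    conf_dir="${2:-/etc/sysctl.d}"
    printf 'net.core.somaxconn = %s\n' "$value" > "$conf_dir/90-somaxconn.conf"
}

# Usage (as root), then re-read all sysctl configuration files:
#   persist_somaxconn 1024 && sysctl --system
```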
 
[[File:Tcp state diagram fixed.svg|600px|TCP State diagram]]
 
 
 
== Updates ==
Netdata will [https://github.com/firehol/netdata/wiki/Updating-Netdata update itself] by means of a script that is linked into cron:
<code>
ln -s /root/netdata/netdata-updater.sh /etc/cron.daily/netdata-updater
</code>
 
 
{{References}}
 
[[Category:QualityBox]]
[[Category:Monitoring]]