Netdata is one of the [https://discourse.equality-tech.com/t/dashboards-in-qualitybox/107 QualityBox dashboards].


<!-- These services/entry points are disabled in production

See [{{SERVER}}:20000/ this website Live]

[{{SERVER}}:20000/netdata.conf Configuration]

netdata.conf -->


== System Locations ==
Depending on how you install Netdata, its files are distributed across the normal system locations, such as:
<pre>
   - the daemon    at /usr/sbin/netdata
</pre>


Or, if you use
<pre>bash <(curl -Ss https://my-netdata.io/kickstart-static64.sh)</pre>
to install, you'll get all of netdata installed under <code>/opt/netdata</code>.
== With Ansible ==
There is an [https://learn.netdata.cloud/docs/netdata-agent/installation/ansible Ansible playbook for deploying the Netdata Agent] across multiple nodes. That page also describes how to disable the local dashboard by setting <code>web_mode</code> to <code>none</code>. Securing the agents is covered at https://learn.netdata.cloud/docs/netdata-agent/configuration/securing-agents/
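Running that playbook is a standard <code>ansible-playbook</code> invocation against an inventory of your nodes. The sketch below is illustrative only; the repository layout, inventory file, and playbook path are assumptions based on the linked documentation and should be verified there.
<syntaxhighlight lang="bash">
# Illustrative sketch: deploy the Netdata Agent to the hosts in your inventory.
# Directory and file names below are assumptions; check the linked Netdata docs.
git clone https://github.com/netdata/community.git
cd community/netdata-agent-deployment/ansible-quickstart   # path is an assumption
# list your target nodes in the inventory file, then run the playbook
ansible-playbook -i hosts tasks/main.yml
</syntaxhighlight>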


== Host Modifications ==


A Netdata role is available in [https://github.com/enterprisemediawiki/meza/blob/32.x/src/roles/netdata/tasks/main.yml the 32.x branch of Meza]. Otherwise, you have to make room for Netdata in HAProxy:
=== HAProxy ===
<syntaxhighlight lang="python">
frontend netdata
         bind *:20000 ssl crt /etc/haproxy/certs/wiki.freephile.org.pem
         default_backend netdata-back

backend netdata-back
         server nd1 127.0.0.1:19999
</syntaxhighlight>
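Note that HAProxy expects the certificate chain and private key concatenated into the single <code>.pem</code> file named on the <code>bind</code> line. A minimal sketch, assuming a Let's Encrypt/certbot layout for wiki.freephile.org (paths may differ on your host):
<syntaxhighlight lang="bash">
# Combine cert chain and key into the file HAProxy's 'ssl crt' option expects.
# Paths assume a certbot/Let's Encrypt layout; adjust for your certificate source.
mkdir -p /etc/haproxy/certs
cat /etc/letsencrypt/live/wiki.freephile.org/fullchain.pem \
    /etc/letsencrypt/live/wiki.freephile.org/privkey.pem \
    > /etc/haproxy/certs/wiki.freephile.org.pem
chmod 600 /etc/haproxy/certs/wiki.freephile.org.pem
systemctl reload haproxy
</syntaxhighlight>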


=== Kernel ===
<code>systemctl start netdata</code>


== Installation extras ==
To reload the health configuration:
<code>killall -USR2 netdata</code> <ref>https://docs.netdata.cloud/health/quickstart/#reload-health-configuration</ref>
 
== Notifications ==
 
The default configuration sends notifications to 'root', so be sure to either edit the notification config (<code>sudo vim /etc/netdata/health_alarm_notify.conf</code>) or give root a working mail alias (<code>vim /etc/aliases && newaliases</code>).
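For example, to route root's mail (including these notifications) to a real mailbox, an alias like the following works; the address is a placeholder:
<syntaxhighlight lang="bash">
# Forward root's local mail to a monitored mailbox ('admin@example.org' is a placeholder).
echo 'root: admin@example.org' >> /etc/aliases
newaliases
</syntaxhighlight>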
 
=== Turn off alarm ===
 
 
<pre>
    to: silent # silence notification; still see on website
enabled: no    # disable alarm
</pre>
More details are in the [https://docs.netdata.cloud/health/tutorials/stop-notifications-alarms/ Netdata docs].




== Issues ==


You'll probably receive alarms for 'tcp listen drops'. This is likely bot-related (sending INVALID packets) and NOT due to your application dropping legitimate packets. There is a good discussion of how to identify the source of the problem and how to mitigate or resolve it in [https://github.com/firehol/netdata/issues/3234 Issue #3234] and [https://github.com/firehol/netdata/issues/3826 Issue #3826]. TL;DR: increase the threshold to 1 (<code>/etc/netdata/health.d/tcp_listen.conf</code>) so you don't get bogus alerts.
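One way to apply that override is with Netdata's <code>edit-config</code> helper, which copies the stock file into <code>/etc/netdata/</code> for local changes; the exact alarm name and threshold line depend on the version you have installed:
<syntaxhighlight lang="bash">
# Copy the stock tcp_listen health config into /etc/netdata and open it for editing.
cd /etc/netdata
sudo ./edit-config health.d/tcp_listen.conf
# In the editor, raise the warning threshold on the listen-drops alarm (e.g. to > 1),
# then reload the health configuration:
sudo killall -USR2 netdata
</syntaxhighlight>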


Also, you should modify your firewall to drop invalid packets before they're either counted (by netdata) or dropped (by the kernel).


<syntaxhighlight lang="bash">
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP
ip6tables -A INPUT -m conntrack --ctstate INVALID -j DROP
iptables -A INPUT -m tcp -p tcp ! --tcp-flags FIN,SYN,RST,ACK SYN -m conntrack --ctstate NEW -j DROP
ip6tables -A INPUT -m tcp -p tcp ! --tcp-flags FIN,SYN,RST,ACK SYN -m conntrack --ctstate NEW -j DROP
</syntaxhighlight>


Following the advice from NASA at https://wiki.earthdata.nasa.gov/display/HDD/SOMAXCONN, I increased my somaxconn kernel parameter from 128 to 1024:
<syntaxhighlight lang="bash">
  cat /proc/sys/net/core/somaxconn
  128
  sysctl -w net.core.somaxconn=1024
</syntaxhighlight>
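<code>sysctl -w</code> only changes the running kernel; to keep the larger backlog across reboots, persist it in a sysctl configuration file (the file name below is arbitrary):
<syntaxhighlight lang="bash">
# Persist the larger listen backlog across reboots; the file name is arbitrary.
echo 'net.core.somaxconn = 1024' > /etc/sysctl.d/90-somaxconn.conf
sysctl --system   # re-apply all sysctl configuration files
</syntaxhighlight>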


[[File:Tcp state diagram fixed.svg|600px|TCP State diagram]]
<code>
ln -s /root/netdata/netdata-updater.sh /etc/cron.daily/netdata-updater
</code>
{{References}}


[[Category:QualityBox]]
[[Category:Monitoring]]