to keep track of my `Nginx` response status codes. I can see if there are any error codes that have been generated (those in the 400s or the 500s), and act accordingly.
Why did I go ahead and make my own dashboard? First off, reading through Nginx logs in their raw form is a pain. For those who haven't worked with them before, this is what they look like:
```
180.76.15.141 - - [04/Jan/2016:07:42:26 -0500] "GET / HTTP/1.1" 200 36005 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.15.147 - - [04/Jan/2016:07:43:33 -0500] "GET / HTTP/1.1" 200 36005 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
91.65.255.241 - - [04/Jan/2016:07:50:31 -0500] "GET /portfolio/resume HTTP/1.1" 301 5 "http://rowanv.com/portfolio/about/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"
```
Bleh. So I decided to go out and hunt for some existing visualizations, none of which seemed too great. They seemed far too cluttered with chart junk and lacked customizability. Here is an example of an existing dashboard element that aims to visualize something similar to my final visualization:
I found this graph fairly distracting, and for some reason the non-2xx status codes didn't show up at all. Any spike in traffic would also make it very hard to read: if we had hundreds of requests that returned a 200 response but only a handful that returned a 400 or 500, the errors would be nearly invisible because everything is graphed on the same axis. I didn't just want to see how many 200 responses occurred; I wanted visual confirmation that 400s and 500s were *not* occurring. Besides, I didn't really care how many responses of each type there were. What mattered to me was the presence or absence of error responses, so a simple discrete chart would abstract away a lot of the distracting additional noise. It could be useful to have the error rate (errors over total requests) available as well, but I think that would be best visualized in a separate chart.
Along the way, I also found a few completely text-based interfaces for monitoring one's servers, but I found their output difficult to parse at a glance. In the end, creating a custom dashboard made the most sense for my needs.
The process was fairly straightforward. First, I located my Nginx access logs and wrote a Python script to read them into a pandas DataFrame, which is more manageable than their native format. Then, I aggregated and reshaped the DataFrame to get a dataset identifying which classes of status code had occurred during each hour. This left me with a DataFrame like the following.
Each row was an hour; each column counted how many times each class of status code (one in the 200s, one in the 300s, and so on) had occurred over the course of that hour.
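To give a sense of the shape, here's a sketch with made-up counts (the column names come straight from the aggregation code below):

```
date_hour            is_100  is_200  is_300  is_400  is_500
2016-01-04 07:00:00       0     412      23       0       0
2016-01-04 08:00:00       0     388      17       1       0
2016-01-04 09:00:00       0     405      21       0       0
```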
Then, I used Flask to build a web application view that serves the reshaped dataset as a JSON response, so my front-end view can locate the data and read it in. Here's the basic view code that serves the HTTP response:
```python
import re
import datetime

import pandas as pd
from flask import Response  # `app` is the Flask application, defined elsewhere


@app.route('/nginx_dash/data/status_code_hourly/')
def status_code_hourly_data():
    log_path = 'access.log'

    # Turn the Nginx log_format string into a regex: each $variable becomes
    # a named capture group, and every other character is matched literally.
    conf = ('$remote_addr - $remote_user [$time_local] "$request" '
            '$status $body_bytes_sent "$http_referer" "$http_user_agent"')
    regex = ''.join(
        '(?P<' + g + '>.*?)' if g else re.escape(c)
        for g, c in re.findall(r'\$(\w+)|(.)', conf))

    # Parse each log line into a dict of named fields.
    parsed_lines = []
    with open(log_path, 'r') as f:
        for line in f:
            match = re.match(regex, line)
            if match:  # skip any lines that don't fit the log format
                parsed_lines.append(match.groupdict())

    df = pd.DataFrame(parsed_lines)
    df['date_time'] = df['time_local'].apply(
        lambda x: datetime.datetime.strptime(x[:20], '%d/%b/%Y:%H:%M:%S'))

    # Flag each request by the class of its status code (1xx through 5xx).
    df['status'] = df['status'].astype(str)
    df['is_100'] = df['status'].str[0] == '1'
    df['is_200'] = df['status'].str[0] == '2'
    df['is_300'] = df['status'].str[0] == '3'
    df['is_400'] = df['status'].str[0] == '4'
    df['is_500'] = df['status'].str[0] == '5'

    # Truncate each timestamp to the hour, then count each class per hour.
    df['date_hour'] = df['date_time'].apply(
        lambda dt: dt.replace(minute=0, second=0))
    df_status_codes = df[['date_hour', 'is_100', 'is_200', 'is_300',
                          'is_400', 'is_500']]
    df_status_codes_grouped = df_status_codes.groupby('date_hour').sum()
    df_status_codes_grouped['date_hour'] = df_status_codes_grouped.index

    json_response = df_status_codes_grouped.to_json(orient='records')
    return Response(response=json_response, status=200,
                    mimetype='application/json')
```
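To sanity-check the endpoint, you can fetch it and inspect the records it returns. A minimal sketch, assuming the Flask app is running locally on the default port and the `requests` library is installed:

```python
import requests

# Hypothetical local URL; adjust the host and port to wherever the app runs.
resp = requests.get('http://localhost:5000/nginx_dash/data/status_code_hourly/')

for record in resp.json():
    # Each record holds one hour's counts; pandas' to_json serializes the
    # date_hour timestamps as epoch milliseconds by default.
    print(record)
```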
Finally, I wrote the frontend in `d3.js`, which actually renders the discrete sparklines. You can check out my previous tutorial on making discrete sparklines to learn how to build this type of visualization.
And that's it! I now have an awesome, live-updating, minimalist visualization that lets me know if any concerning error codes have occurred over the last couple of days.
If you want to learn more about sparklines, check out Tufte's *Beautiful Evidence*.