tsql - How to use SQL to attribute events to visit source in traffic logs? -


i'm trying attribute events source in traffic log. seems basic territory semi-savvy technical marketer or webmaster, i've googled around bit, , surprisingly, nobody seems have covered this.

background: i'm using t-sql , we're running on sql server 2016. event logs (haproxy) this, refererhost '%mysite%' internal referrals, can else external referrals (entrances).

user agent ||     ip     ||  refererhost ||     event     ||   creationdate ------------------------------------------------------------------------------ qwertyuiop || 99.99.99.9 ||   google.com ||   home/view   || 2015-05-29 00:00:25 ------------------------------------------------------------------------------ qwertyuiop || 99.99.99.9 ||   mysite/x   ||   home/view   || 2015-05-29 00:00:27 ------------------------------------------------------------------------------ abcdefghij || 11.11.11.1 ||   yahoo.com  ||   home/view   || 2015-05-29 00:00:49 ------------------------------------------------------------------------------ qwertyuiop || 99.99.99.9 ||   mysite/y   ||     submit    || 2015-05-29 00:01:28 ------------------------------------------------------------------------------ abcdefghij || 11.11.11.1 ||   mysite/p   ||   photo/view  || 2015-05-29 00:02:04 ------------------------------------------------------------------------------ abcdefghij || 11.11.11.1 ||   mysite/n   ||     submit    || 2015-05-29 00:02:09 

the goal: i'm trying associate submit events entrance. aggregated, results this:

refererhost || submitcount  || ------------------------------ google.com  ||      1       ||  yahoo.com   ||      1       ||  

complicating factors: simple example. in fact, individual users can visit multiple times per period, , can submit more once per visit (session). also, users can go idle long periods of time after entrance: submit can occur hours after entrance.

so think i'm trying select creationdates event = submit, , users (ip + ua checksum), , find nearest previous event = refererhost not '%mysite%', , store it... somewhere associated submit event. can count of submit events, grouped refererhost i'm looking for.

this method makes sense me, don't know how write query "looks back" find nearest previous referer. also, i'm not sure if sql alone can handle without operation timing out. , i'm not sure if i'm missing edge case. has done before?

if you're using database window functions, can reasonably short query. can see working example of query (with dummy data), if you'd tinker on live data: https://modeanalytics.com/benn/reports/9f72b24dce58/query

each step in broken out common table expression. while makes easier describe, query written series of subqueries if style's more thing.

step 1: made table.

with event_table (     select user_id dummy_ip,            occurred_at,            location dummy_referer,            event_name       tutorial.playbook_events  ) 

the example data had didn't map example, creates table does. mapped user_id ip_address since 2 fields conceptually same. location , referer have absolutely nothing each other, they're both event attributes associated every event. , had location field in data, went it. think of physical referer or something, guess.

step 2: determine time since last event.

with_last_event (     select *,            lag(occurred_at,1) on (partition dummy_ip order occurred_at) last_event       event_table ) 

the lag function here finds time of last event @ ip. if there no last event, it's null.

step 3: find events mark beginning of new session.

with_new_session_flag (     select *,            case when extract('epoch' occurred_at) - extract('epoch' last_event) >= (60 * 10) or last_event null                  1 else 0 end is_new_session,            case when extract('epoch' occurred_at) - extract('epoch' last_event) >= (60 * 10) or last_event null                  dummy_referer else null end first_referer       with_last_event ) 

most platforms define new sessions action after period of inactivity. first case statement looking how long it's been since previous event. if it's longer time choose (in case, 60 seconds * 10, 10 minutes), event flagged first 1 in new session. it's flagged 1; non-first events marked 0.

the second case statement finds same event, rather marking event 1 flag new session, returns referer. if it's not new session, returns null.

step 4: create session ids.

with_session_ids (     select *,            sum(is_new_session) on (order dummy_ip, occurred_at) global_session_id,            sum(is_new_session) on (partition dummy_ip order occurred_at) user_session_id       with_new_session_flag ) 

these window functions produce running total of session flags (the column that's 1 when it's new session , 0 when it's not). result column stays same when session doesn't change, , increments 1 every time new session starts. depending on how partition , order window function, can create sessions ids unique user , unique globally.

step 5: find original session referer.

with_session_referer (     select *,            max(first_referer) on (partition global_session_id) session_referer       with_session_ids ) 

this final window function looks max value of first_referer global_session_id. since column made null every value other first event of session, return first_referer of session every event in session.

step 6: count stuff.

select session_referer,        count(1) total_events,        count(distinct global_session_id) distinct_sessions,        count(distinct dummy_ip) distinct_ips   with_session_referer  event_name = 'send_message'  group 1 

this last step straightforward - filter events event care (submit, in example). count number of events session_referer, first referer of session in event occurred. counting global_session_id , dummy_ip, can find how sessions had event, , how many distinct ips logged event.


Comments

Popular posts from this blog

toolbar - How to add link to user registration inside toobar in admin joomla 3 custom component -

linux - disk space limitation when creating war file -

How to provide Authorization & Authentication using Asp.net, C#? -