Max visits per domain
Tạo vào: 4 tháng 2, 2025
Tạo vào: 4 tháng 2, 2025
I have a query to find out counters SELECT CounterID, sum(Sign) AS visits FROM visits_all WHERE StartDate = toDate('2024-02-04') AND StartURLDomain = 'vtb.ru' GROUP BY CounterID ORDER BY visits DESC;
I want you to modify it to find counters with max visits value for all dates and for each domain found in the table
Below is one common way (using ClickHouse’s argMax function) to modify your query so that, for each domain, you get the single counter that has the maximum total “visits” across all dates in the table.
sqlSELECT StartURLDomain, argMax(CounterID, total_visits) AS CounterWithMaxVisits, max(total_visits) AS MaxVisits FROM ( -- Sum Sign by domain and counter for all dates SELECT StartURLDomain, CounterID, SUM(Sign) AS total_visits FROM visits_all GROUP BY StartURLDomain, CounterID ) GROUP BY StartURLDomain ORDER BY MaxVisits DESC
Innermost SELECT:
WHERE
clause so we look across all dates instead of a single date.(StartURLDomain, CounterID)
and sum up Sign
to get total visits per (domain, counter) pair.Outer SELECT:
StartURLDomain
) we:
argMax(CounterID, total_visits)
to pick the one CounterID that has the highest total_visits for that domain.max(total_visits)
to show the largest summed Sign
(i.e., visits) for that domain.ORDER BY:
MaxVisits DESC
so that the domain with the largest number of visits appears first in the result set.If instead you need the top counter for each date and for each domain, include StartDate
in the grouping and outer query:
sqlSELECT StartURLDomain, StartDate, argMax(CounterID, total_visits) AS CounterWithMaxVisits, max(total_visits) AS MaxVisits FROM ( SELECT StartURLDomain, StartDate, CounterID, SUM(Sign) AS total_visits FROM visits_all GROUP BY StartURLDomain, StartDate, CounterID ) GROUP BY StartURLDomain, StartDate ORDER BY StartURLDomain, StartDate
That will return, for each (domain, date)
, the counter with the highest total visits, plus the maximum visit count for that pair.
use ordinary sql dialect
Below are two approaches in standard SQL (often referred to as “ordinary SQL”) for selecting, for each domain, the single counter with the maximum total visits across all dates.
If your SQL engine supports window functions (e.g., PostgreSQL, MySQL 8+, SQL Server, Oracle, etc.), you can use ROW_NUMBER()
or RANK()
to pick the row with the highest sum of visits per domain:
sqlSELECT StartURLDomain, CounterID, total_visits FROM ( SELECT StartURLDomain, CounterID, SUM(Sign) AS total_visits, ROW_NUMBER() OVER ( PARTITION BY StartURLDomain ORDER BY SUM(Sign) DESC ) AS rn FROM visits_all GROUP BY StartURLDomain, CounterID ) AS sub WHERE rn = 1 ORDER BY total_visits DESC;
Explanation:
(StartURLDomain, CounterID)
to aggregate SUM(Sign)
as total_visits
.ROW_NUMBER() OVER (PARTITION BY StartURLDomain ORDER BY total_visits DESC)
to rank counters within each domain, from largest to smallest.ROW_NUMBER() = 1
, i.e., the maximum row.total_visits DESC
at the end to list the highest overall domain+counter pair first.For older databases that do not support window functions, you can achieve the same result with a self-join. The idea is to select rows that have no other row in the same domain with a strictly larger sum of visits:
sqlSELECT t.StartURLDomain, t.CounterID, t.total_visits FROM ( -- Summarize visits per (domain, counter). SELECT StartURLDomain, CounterID, SUM(Sign) AS total_visits FROM visits_all GROUP BY StartURLDomain, CounterID ) AS t LEFT JOIN ( SELECT StartURLDomain, CounterID, SUM(Sign) AS total_visits FROM visits_all GROUP BY StartURLDomain, CounterID ) AS t2 ON t.StartURLDomain = t2.StartURLDomain AND t.total_visits < t2.total_visits WHERE t2.total_visits IS NULL ORDER BY t.total_visits DESC;
Explanation:
t
and a subquery t2
(actually the same aggregation, just aliased differently).(t.StartURLDomain = t2.StartURLDomain)
and t.total_visits < t2.total_visits
.
t2.total_visits
is not NULL
, it means there is another row in the same domain that has a strictly larger total, so t
is not the maximum.t2.total_visits IS NULL
means we only keep t
rows that have no strictly larger total in the same domain — i.e., the maximum in that domain.t.total_visits
.Either approach will give you a single row per domain (when there is a strict maximum). If ties exist, you can adapt the join or window function logic to allow for multiple rows (e.g., using RANK()
for ties).
If you wanted the top counter per domain for each date, simply include StartDate
in both the grouping and the window partition (or in the self-join condition). For example, with window functions:
sqlSELECT StartURLDomain, StartDate, CounterID, total_visits FROM ( SELECT StartURLDomain, StartDate, CounterID, SUM(Sign) AS total_visits, ROW_NUMBER() OVER ( PARTITION BY StartURLDomain, StartDate ORDER BY SUM(Sign) DESC ) AS rn FROM visits_all GROUP BY StartURLDomain, StartDate, CounterID ) AS sub WHERE rn = 1 ORDER BY StartURLDomain, StartDate, total_visits DESC;
That way, you get one CounterID per (domain, date) with the highest visit count.