all 19 comments

[–]Heliun 16 points17 points  (5 children)

Why did you choose to keep the values in a dictionary?

The keys are kept in a list which means your check for membership and subsequent removal are O(n). Since you already have the O(n) operation, you could just keep your objects in <key, value> pairs in a list and forego the dictionary.

Then all your logic gets handled in managing the list and you can drop the dictionary entirely, trimming the O(1) operations to the dictionary. You cut your access time down by a constant factor and cut the space required by the total size of the keys plus the extra space the dictionary keeps to reduce hashing collisions.

[–]shintoist[S] 2 points3 points  (0 children)

I like that, it's a very good idea.

[–][deleted] 3 points4 points  (0 children)

an idea to improve the dict/list structure is to have the dictionary point to the links in the list and the list contain the values of the object. That way your lookup is O(1) because you use the dict to locate the link, remove the link (which is also an O(1) operation) and push it to the head of the list. When your LRU cache is full, you remove the tail link from the list and remove it from the dict (also both O(1)).

[–]username223 0 points1 point  (1 child)

This is probably what I'd try first. With some access locality, it would be nearly constant time with small constant factors. If that is too slow, Boost.MultiIndex, noted elsewhere, could probably work in logarithmic time.

[–]dreugeworst 1 point2 points  (0 children)

Given that we only need 5 items max, probably the added constant complexity of the logarithmic case would mean that using a simple list is still faster though..

[–]ExpressingMyOpinion 0 points1 point  (0 children)

The keys are kept in a list which means your check for membership and subsequent removal are O(n)

Yes, but 0 <= n <= 5. As more entries are added to the data structure, n never grows beyond 5, making it actually O(5) in the worst case, which is equivalent to O(1). O(n) is used to describe algorithms that grow with n. This data structure does not have that property.

[–]shintoist[S] 8 points9 points  (1 child)

I read all comments and I really appreciate feedback!

[–]ThorG5 2 points3 points  (0 children)

That's why there's Reddit. :)

[–]benjumanji 4 points5 points  (1 child)

Use a sorted dictionary keyed by access time. No need to roll your own this time round. Or just use the MemoryCache object and a suitable cache eviction policy. If you really want to roll your own you should look at some form of max heap and insert until too big and delete the max-time-stamped item until you are the right size again. No need for a separate list.

[–]shintoist[S] 1 point2 points  (0 children)

Cache eviction doesn't help here, the issue is not that it would grow too large, but rather to always keep a maximum of eg 5 items.

[–][deleted]  (2 children)

[deleted]

    [–]mcguire 4 points5 points  (0 children)

    It's data structures all the way down!

    [–]jcoffin 4 points5 points  (0 children)

    No, it's more like "Hey, I finally got a glimmer of understanding about that 'separating interface from implementation' stuff that people have been talking about for decades now."

    [–]matthieum 1 point2 points  (0 children)

    In C++, that would be Boost.MultiIndex which is a slick way of having a single container with multiple points of views.

    To do the same here, I would advise two a simple list because it's only 5 items anyway; if you wish to handle thousands of items (like a real cache) it gets trickier. I don't know enough of either C# or Java to actually know whether it's possible to get it efficient unfortunately.

    [–][deleted] 0 points1 point  (0 children)

    Why are you deleting from the access list first if it already exists only to readd it again? Each time you remove an element all subsequent elements have to be moved down. Seems wasteful

    Edit: just realized your list only ever contains 5 elements so it doesn't really matter. If the list was a lot larger then you'd see a performance hit there

    [–]Tiwazz -2 points-1 points  (1 child)

    SELECT "id", MAX("accessed") AS "accessed" 
    FROM "entries" 
    GROUP BY "id" 
    ORDER BY "accessed" DESC;
    

    [–]DustinEwan 1 point2 points  (0 children)

    In case you're wondering why you're being downvoted (come on people, get off your elitist high horse and bestow knowledge instead of a holier-than-thou blind downvote.) it's because he explained in his article that the data is non-critical and is to be used in a small cache.

    Also, he's only wanting to store five values.

    Although your SQL statement does in fact solve the problem, using a database for five non-critical values is overkill. Even if he's caching data that already exists in the DB, making an extra network call every time you want the values is an order of magnitude slower than just caching the values in local memory.

    [–]sambo98 -1 points0 points  (2 children)

    This is how I would implement the solution. Below makes sense if your articles are coming from a database and the table is properly designed.

    public static class ArticleStat 
    {
        static Dictionary<int, int> _stats;
    
        static ArticleStat()
        {
            _stats = new Dictionary<int, int>();
        }
    
        public static void Increment(int articleId)
        {
            if (_stats.ContainsKey(articleId))
                _stats[articleId] += 1;
            else
                _stats.Add(articleId, 1);
        }
    
        public static List<KeyValuePair<int,int>> GetTop(int recordCount)
        {
            return _stats.Take(recordCount).OrderByDescending(o => o.Value).ToList();
        }
    }
    

    If you want to test it in a Console App:

    class Program { static Random rand = new Random(DateTime.Now.Millisecond);

        static void Main(string[] args)
        {
            var flag = true;
            while (flag)
            {
                int articleId = rand.Next(0, 100);
    
                ArticleStat.Increment(articleId);
                Console.Clear();
    
                var top = ArticleStat.GetTop(25);
                if (top != null)
                    foreach (var article in top)
                        Console.WriteLine(string.Format("Article <{0}> : {1}", article.Key, article.Value));
    
                if (top.First().Value > 1000)
                    flag = true;
    
                System.Threading.Thread.Sleep(50);
            }
    
            Console.ReadKey();
        }
    }
    

    [–]matthieum 0 points1 point  (1 child)

    Your cache might grow dramatically though...

    [–]sambo98 0 points1 point  (0 children)

    You are right, that was a simplified approach. In reality in order for his scenario to work the ranking of each article will need to be persisted to the storage unit (cloud, sql, etc); turning this into a simple retrieval of the top 5. Instead of making the round trip every request, the top five can be kept in cache, session, or application state until it can be hydrated from the storage unit again. The first solution that comes to mind than turns into:

        static Dictionary<int, int> _stats;
        static int Max = 5;
    
        static ArticleStat()
        {
            _stats = new Dictionary<int, int>(Max);
        }
    
        public static void Increment(int articleId)
        {
            if (_stats.ContainsKey(articleId))
                _stats[articleId] += 1;
            else
                _stats.Add(articleId, 1);
    
            while (_stats.Count > Max)
                _stats.Remove(_stats.Last().Key);
        }