.net - ismatch - c# regex online



Regex.IsMatch vs string.Contains

What is the difference in speed and memory usage between these two equivalent expressions:

Regex.IsMatch(Message, "1000")

VS

Message.Contains("1000")

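Are there situations where one is better than the other?

The context of this question is as follows: I was making some changes to legacy code that uses a Regex expression to find out whether one string is contained in another. Because it was legacy code, I left that part alone, and in the code review someone suggested that Regex.IsMatch should be replaced with string.Contains. So I was wondering whether the change is worth making.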



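@user279470 I was looking for an efficient way to count words, just for fun, and came across this. I gave it the OpenOffice Thesaurus dat file to iterate through. The total word count came to 1575423.

Now, my end goal had no use for Contains, but it was interesting to see the different ways you can call a regular expression that make it faster. I created some other methods to compare instance use of Regex and static use, with and without RegexOptions.Compiled.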

using System.Text.RegularExpressions;

public static class WordCount
{
    /// <summary>
    /// Count words with an instantiated Regex.
    /// </summary>
    public static int CountWords4(string s)
    {
        Regex r = new Regex(@"[\S]+");
        MatchCollection collection = r.Matches(s);
        return collection.Count;
    }
    /// <summary>
    /// Count words with static compiled Regex.
    /// </summary>
    public static int CountWords1(string s)
    {
        MatchCollection collection = Regex.Matches(s, @"[\S]+", RegexOptions.Compiled);
        return collection.Count;
    }
    /// <summary>
    /// Count words with static Regex.
    /// </summary>
    public static int CountWords3(string s)
    {
        MatchCollection collection = Regex.Matches(s, @"[\S]+");
        return collection.Count;
    }

    /// <summary>
    /// Count words with a loop and character tests.
    /// </summary>
    public static int CountWords2(string s)
    {
        int c = 0;
        for (int i = 1; i < s.Length; i++)
        {
            if (char.IsWhiteSpace(s[i - 1]) == true)
            {
                if (char.IsLetterOrDigit(s[i]) == true ||
                    char.IsPunctuation(s[i]))
                {
                    c++;
                }
            }
        }
        if (s.Length > 2)
        {
            c++;
        }
        return c;
    }
}
  • regExCompileTimer.ElapsedMilliseconds 11787
  • regExStaticTimer.ElapsedMilliseconds 12300
  • regExInstanceTimer.ElapsedMilliseconds 13925
  • ContainsTimer.ElapsedMilliseconds 1074
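
One variant the methods above do not cover is building the Regex once and keeping it in a static field, so that construction cost is not paid on every call. The following is only a minimal sketch under that assumption: CachedWordCount and NonWhitespaceRun are hypothetical names that do not appear in the original code, and no timing is claimed for it.

```csharp
using System.Text.RegularExpressions;

public static class CachedWordCount
{
    // Hypothetical field name. The Regex is constructed once, when the
    // class is first used, and reused by every later call.
    private static readonly Regex NonWhitespaceRun =
        new Regex(@"\S+", RegexOptions.Compiled);

    /// <summary>
    /// Count runs of non-whitespace characters with the cached Regex.
    /// </summary>
    public static int CountWords(string s)
    {
        return NonWhitespaceRun.Matches(s).Count;
    }
}
```

To compare it with the methods above, it would need to be timed with the same input file and the same Stopwatch setup.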

String.Contains is slower when you compare it to a compiled regular expression. Considerably slower, even!

You can test it by running this benchmark:

using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
  public static int FoundString;
  public static int FoundRegex;

  static void DoLoop(bool show)
  {
    const string path = "C:\\file.txt";
    const int iterations = 1000000;
    var content = File.ReadAllText(path);

    const string searchString = "this exists in file";
    var searchRegex = new Regex("this exists in file");

    var containsTimer = Stopwatch.StartNew();
    for (var i = 0; i < iterations; i++)
    {
      if (content.Contains(searchString))
      {
        FoundString++;
      }
    }
    containsTimer.Stop();

    var regexTimer = Stopwatch.StartNew();
    for (var i = 0; i < iterations; i++)
    {
      if (searchRegex.IsMatch(content))
      {
        FoundRegex++;
      }
    }
    regexTimer.Stop();

    if (!show) return;

    Console.WriteLine("FoundString: {0}", FoundString);
    Console.WriteLine("FoundRegex: {0}", FoundRegex);
    Console.WriteLine("containsTimer: {0}", containsTimer.ElapsedMilliseconds);
    Console.WriteLine("regexTimer: {0}", regexTimer.ElapsedMilliseconds);

    Console.ReadLine();
  }

  static void Main(string[] args)
  {
    DoLoop(false);
    DoLoop(true);
    return;
  }
}

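My own benchmarks appear to contradict user279470's benchmark results.

In my use case I wanted to check a simple Regex with OR operators for 4 values against 4 separate String.Contains() calls.

Even with 4 String.Contains() calls, I found that String.Contains() was 5 times faster.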

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Text.RegularExpressions;

namespace App.Tests.Performance
{
    [TestClass]
    public class PerformanceTesting
    {
        private static Random random = new Random();

        [TestMethod]
        public void RegexVsMultipleContains()
        {
            var matchRegex = new Regex("INFO|WARN|ERROR|FATAL");

            var testStrings = new List<string>();

            int iterator = 1000000 / 4; // div 4 for each of log levels checked

            for (int i = 0; i < iterator; i++)
            {
                for (int j = 0; j < 4; j++)
                {
                    var simulatedTestString = RandomString(50);

                    if (j == 0)
                    {
                        simulatedTestString += "INFO";
                    }
                    else if (j == 1)
                    {
                        simulatedTestString += "WARN";
                    }
                    else if (j == 2)
                    {
                        simulatedTestString += "ERROR";
                    }
                    else if (j == 3)
                    {
                        simulatedTestString += "FATAL";
                    }

                    simulatedTestString += RandomString(50);

                    testStrings.Add(simulatedTestString);
                }
            }

            int cnt;
            Stopwatch sw;

            //////////////////////////////////////////
            // Multiple contains test
            //////////////////////////////////////////

            cnt = 0;
            sw = new Stopwatch();

            sw.Start();

            for (int i = 0; i < testStrings.Count; i++)
            {
                bool isMatch = testStrings[i].Contains("INFO") || testStrings[i].Contains("WARN") || testStrings[i].Contains("ERROR") || testStrings[i].Contains("FATAL");

                if (isMatch)
                {
                    cnt += 1;
                }
            }

            sw.Stop();

            Console.WriteLine("MULTIPLE CONTAINS: " + cnt + " " + sw.ElapsedMilliseconds);

            //////////////////////////////////////////
            // Multiple contains using list test
            //////////////////////////////////////////

            cnt = 0;
            sw = new Stopwatch();

            sw.Start();

            var searchStringList = new List<string> { "INFO", "WARN", "ERROR", "FATAL" };

            for (int i = 0; i < testStrings.Count; i++)
            {
                bool isMatch = searchStringList.Any(x => testStrings[i].Contains(x));

                if (isMatch)
                {
                    cnt += 1;
                }
            }

            sw.Stop();

            Console.WriteLine("MULTIPLE CONTAINS USING LIST: " + cnt + " " + sw.ElapsedMilliseconds);

            //////////////////////////////////////////
            // Regex test
            ////////////////////////////////////////// 

            cnt = 0;
            sw = new Stopwatch();

            sw.Start();

            for (int i = 0; i < testStrings.Count; i++)
            {
                bool isMatch = matchRegex.IsMatch(testStrings[i]);

                if (isMatch)
                {
                    cnt += 1;
                }
            }

            sw.Stop();

            Console.WriteLine("REGEX: " + cnt + " " + sw.ElapsedMilliseconds);
        }

        public static string RandomString(int length)
        {
            const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

            return new string(Enumerable.Repeat(chars, length).Select(s => s[random.Next(s.Length)]).ToArray());
        }
    }
}
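
The multiple-Contains check in the test above could also be wrapped in a small helper that loops over the candidate values without LINQ. This is just a sketch, not part of the benchmark: LogLevelCheck, Levels and ContainsAnyLevel are hypothetical names, and it has not been timed against the three variants measured above.

```csharp
using System;

public static class LogLevelCheck
{
    // Hypothetical helper data: the same four values the benchmark searches for.
    private static readonly string[] Levels = { "INFO", "WARN", "ERROR", "FATAL" };

    // Returns true as soon as any level name occurs in the line.
    // IndexOf with StringComparison.Ordinal compares character by character,
    // which matches the behavior of string.Contains(string).
    public static bool ContainsAnyLevel(string line)
    {
        foreach (var level in Levels)
        {
            if (line.IndexOf(level, StringComparison.Ordinal) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```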

Yes, for this task string.Contains will almost certainly be faster and use less memory. And of course, there is no reason to use a regular expression here.
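
One thing to watch when making the swap: the two calls are only interchangeable when the search text contains no regex metacharacters. "1000" is safe, but a value such as "1.00" is not, because "." matches any character in a pattern. The sketch below illustrates this. LiteralSearch, ViaContains and ViaRegex are hypothetical names introduced only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class LiteralSearch
{
    // Plain substring test. string.Contains(string) performs an ordinal,
    // case-sensitive comparison.
    public static bool ViaContains(string message, string literal)
    {
        return message.Contains(literal);
    }

    // Regex version of the same test. Regex.Escape turns metacharacters
    // such as '.' into literals so the pattern matches the text as written.
    public static bool ViaRegex(string message, string literal)
    {
        return Regex.IsMatch(message, Regex.Escape(literal));
    }
}
```

For the message "1x00" and the search text "1.00", both helpers return false, whereas the unescaped call Regex.IsMatch("1x00", "1.00") returns true. When replacing an existing Regex.IsMatch call, it is worth checking that the pattern really is a plain literal.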




