The Artima Developer Community

.NET Buzz Forum
Using Regex to return the first N words in a string

Roy Osherove


Roy Osherove is a .Net consultant based in Israel
Using Regex to return the first N words in a string Posted: Jan 6, 2005 8:44 PM

This post originated from an RSS feed registered with .NET Buzz by Roy Osherove.
Original Post: Using Regex to return the first N words in a string
Feed Title: ISerializable
Feed URL: http://www.asp.net/err404.htm?aspxerrorpath=/rosherove/Rss.aspx
Feed Description: Roy Osherove's persistent thoughts

Jeff Perrin needed a function to return the first N words in a string (to create a small summary or snippet). He did it the awkward way, parsing the string by hand. That approach is more error prone and usually makes for less readable code. Fortunately, regular expressions handle this quite nicely. Here's a test that makes sure we get the first 4 words in a string, along with the function "FindFirstWords", which does this very easily using a simple regular expression.

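What I'm doing here is using the expression to find the first 4 occurrences of a run of word characters (letters, digits or underscores, which is what \w matches) followed by one or more whitespace characters. Then I simply iterate over the captures and append them to the output. The match as a whole spans all 4 words, and its repeated group (Groups[1] in .NET) should contain 4 captures, one for each "word" that was found. The trailing whitespace is part of each capture, which is why the expected output in the test ends with a space.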
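To make the capture behaviour concrete, here is a minimal sketch, assuming the same pattern shape as in the post. The class name CaptureDemo and the helper WordCaptures are made up for illustration and are not part of the original post. It reads the per-word captures from group 1, while the match itself holds the whole four-word span.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Hypothetical helper: returns one string per repetition of group 1.
    public static string[] WordCaptures(string input, int count)
    {
        Match match = Regex.Match(input, @"([\w]+\s+){" + count + "}");

        // Group 1 is the repeated group; .NET keeps every repetition it captured.
        CaptureCollection captures = match.Groups[1].Captures;
        string[] words = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            words[i] = captures[i].Value;
        }
        return words;
    }

    public static void Main()
    {
        const string input = "this is word four five six";

        // The match as a whole spans all four words, trailing space included.
        Console.WriteLine("[" + Regex.Match(input, @"([\w]+\s+){4}").Value + "]");

        // Each capture is a single word plus the whitespace that follows it.
        foreach (string word in WordCaptures(input, 4))
        {
            Console.WriteLine("[" + word + "]");
        }
    }
}
```

Run as a console program, this should print "[this is word four ]" followed by "[this ]", "[is ]", "[word ]" and "[four ]" on separate lines. If the input has fewer words than requested, the match fails and the helper returns an empty array.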

It's not fully tested, as you can see. I only wrote one test to see that it works on this sort of sentence. More tests could and should be added to cover other cases. In fact, if this were real TDD, I would have started with a test of an empty string, then continued on to test getting only one word, then two, and so on.

using System.Text;
using System.Text.RegularExpressions;
using NUnit.Framework;

[Test]
public void TestRegexFindFirstNWords()
{
      const string INPUT =
            "this is word four five six seven eight nine ten eleven twelve thirteen!";
      const int NUM_WORDS_TO_RETURN = 4;

      string output = FindFirstWords(INPUT, NUM_WORDS_TO_RETURN);

      // Each word keeps its trailing whitespace, hence the final space.
      string expectedOutput = "this is word four ";
      Assert.AreEqual(expectedOutput, output);
}

private string FindFirstWords(string input, int howManyToFind)
{
      // howManyToFind repetitions of "word characters followed by whitespace".
      string pattern = @"([\w]+\s+){" + howManyToFind + "}";
      StringBuilder output = new StringBuilder();

      // Group 1 is the repeated group; it holds one capture per word found.
      foreach (Capture capture in Regex.Match(input, pattern).Groups[1].Captures)
      {
            output.Append(capture.Value);
      }
      return output.ToString();
}
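The extra cases mentioned above (an empty string, one word, two words) could be sketched roughly as follows. This assumes NUnit's classic Assert.AreEqual, as used in the post. The fixture name FirstWordsTests and the test names are invented here, and FindFirstWords is restated inside the fixture (as internal static) only so that the sketch is self-contained. The last test records a behaviour that is easy to miss: when the input has fewer words than requested, the whole match fails and nothing is returned.

```csharp
using System.Text;
using System.Text.RegularExpressions;
using NUnit.Framework;

[TestFixture]
public class FirstWordsTests
{
    // Restated from the post so this fixture compiles on its own.
    internal static string FindFirstWords(string input, int howManyToFind)
    {
        string pattern = @"([\w]+\s+){" + howManyToFind + "}";
        StringBuilder output = new StringBuilder();
        foreach (Capture capture in Regex.Match(input, pattern).Groups[1].Captures)
        {
            output.Append(capture.Value);
        }
        return output.ToString();
    }

    [Test]
    public void EmptyStringReturnsEmptyString()
    {
        Assert.AreEqual("", FindFirstWords("", 4));
    }

    [Test]
    public void OneWordRequestedReturnsFirstWord()
    {
        Assert.AreEqual("this ", FindFirstWords("this is word", 1));
    }

    [Test]
    public void TwoWordsRequestedReturnsFirstTwoWords()
    {
        Assert.AreEqual("this is ", FindFirstWords("this is word", 2));
    }

    [Test]
    public void TooFewWordsReturnsEmptyString()
    {
        // Only two words are available, so the {4} quantifier cannot be satisfied.
        Assert.AreEqual("", FindFirstWords("one two", 4));
    }
}
```

Whether an empty result is the right answer for a too-short input depends on what the snippet is used for. Returning the whole input might be preferable in that case, and a test like the last one is a natural place to pin that decision down.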

 



Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use