Now, the final part is when we have a character followed by a star or a question mark. That’s a little more complicated. I won’t ask you to do that yourself. I’ll show you. Let me walk you through this. First thing we’re going to do is we’re going to break up the pattern into three pieces. The pattern was something like “A*!” And so the first part is P, which is a single character that comes before the star. The second part is the operator, which is either a star or a question mark, and the third part is the rest of the pattern: Everything else after the star or the question mark. Now, there’s two cases. If the operator is a star, we’re going to just give up and call another function here. And we’re going to have to write that function. But rather than do it inline, we’ll call it, which we’ll need because it’s going to end up being a recursive function. So we’ll match star, which says, “Match P any number of times followed by pattern against the text.” If the operator is a question mark, then if we can match the first character, P, against the text, and match the rest of the pattern after the question mark against the rest of the text, then we return “true.” But if we can’t do that, then it’s also okay, if we can skip the first character P, not match the first character P, and go ahead and match the rest of the pattern. So this matches when the first character matches; this matches when the first character fails. That’s all there is to do with match. But now we have two promises to keep: We have to write match one and match star. Again, feel free to try them on your own if you want, but I’m going to go ahead and show them to you. Match one is pretty easy. So we’re matching a single character P against the text. If there is no text, a single character can’t match against it, so we return “false.” Otherwise, if the P was a dot, that matches anything, so we should return “true.” Or if the P is exactly equal to the first character of the text, then we should return “true.” Now here’s match star. We have a character P, we want to match any number of that, zero or more, and then match the rest of the pattern against the text. So how can we do that? Well, one way we can do it is by matching P zero times. So that’s just matching the pattern against the text. Or we can match P one time against the text, and then do a match star of P followed by the pattern, against the rest of the text after the first character. Notice that it wouldn’t do here to do a recursive call to match; we want a recursive call to match star. Because we want to say we’re matching either exactly zero times, or one time followed by zero or more times. That’s it. And if we go ahead and run the test and print out the results, we find that all the tests pass. And I should say that the subset of the regular expression language and the approach we use to solve it, I owe a lot to Rob Pike, who wrote a very nice essay on how to do this, and here’s a URL link to it. And you’ll be seeing Rob in an interview at the end of this class in the final unit. So that should give you some idea of what regular expressions are, how to write programs to process them, and a little bit of the power of what they can do. Now, in this program, a regular expression was just a string, and it was easy to deal with because all we had to deal with was the first character of the string, the second character of the string and so on. But regular expressions can get more complicated. They can have nested structures that are more like trees than strings. And so in the rest of this unit, we’ll be dealing with an internal representation of regular expressions based on trees that will be different from this. You can still have input in the form of strings that gets translated into trees, but we’ll be dealing with a tree structure from now on. So this should give you an intro and a review that’s good enough to handle the rest of the lesson.

Regular Expressions Review – Part 3 – Design of Computer Programs
Tagged on:                         

Leave a Reply

Your email address will not be published. Required fields are marked *