Strip or remove HTML tags from text strings
If your text strings are wrapped in HTML tags, the methods in this article can help you remove all of the tags.
- Strip or remove all simple HTML tags with formula
- Strip or remove some complex HTML tags with VBA code
Strip or remove all simple HTML tags with formula
If your text strings are wrapped in a single pair of simple HTML tags, the MID function in Excel can do the job. The generic syntax is:
=MID(string,text_start,LEN(string)-tag_len)
- string: the text string or cell value that you want to remove the HTML tags from.
- text_start: the position of the first character that you want to keep, that is, the first character after the opening tag.
- tag_len: the total length of the HTML tags within the text string (opening and closing tags combined).
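The two numeric arguments follow directly from the tag name when the string is wrapped in one attribute-free tag pair. The sketch below illustrates that arithmetic; TagTextStart and TagTotalLen are hypothetical helper names, not built-in Excel functions, and the code assumes tags of the form <name>...</name> with no attributes.

```vba
' Hypothetical helpers: derive the MID arguments for a string wrapped in
' one attribute-free tag pair such as <b>...</b>.
Function TagTextStart(ByVal tagName As String) As Long
    ' The opening tag "<" & tagName & ">" is Len(tagName) + 2 characters long,
    ' so the text to keep starts one position after it.
    TagTextStart = Len(tagName) + 3
End Function

Function TagTotalLen(ByVal tagName As String) As Long
    ' Opening tag: Len(tagName) + 2 characters.
    ' Closing tag "</" & tagName & ">": Len(tagName) + 3 characters.
    TagTotalLen = 2 * Len(tagName) + 5
End Function
```

For the tag name "b" these return 4 and 7. Placed in a standard module, they could be used in a worksheet formula such as =MID(A2,TagTextStart("b"),LEN(A2)-TagTotalLen("b")).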
Please copy or enter the following formula into a blank cell:
=MID(A2,4,LEN(A2)-7)
Then drag the fill handle down over the cells that you want to apply this formula to, and the HTML tags are removed from each cell.
Explanation of the formula:
LEN(A2)-7: The LEN function returns the length of the text string in cell A2. Subtracting 7, the combined length of the opening and closing tags (for example, 3 characters for <b> plus 4 for </b>), gives the number of characters to extract. This value is used as the num_chars argument of the MID function.
MID(A2,4,LEN(A2)-7): The MID function extracts text starting at the fourth character, which is the first character after the opening tag, and returns as many characters as the num_chars value calculated by the LEN expression.
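The same two steps can be traced in VBA, which may help when checking the numbers for your own data. This is a minimal sketch, not part of the original method: SimpleTagStrip is a hypothetical name, the values 4 and 7 are hard-coded to match the formula MID(A2,4,LEN(A2)-7), and the sample string in the usage note is an assumed cell value.

```vba
' Hypothetical function mirroring =MID(A2,4,LEN(A2)-7) step by step.
Function SimpleTagStrip(ByVal cellText As String) As String
    Dim numChars As Long

    ' Step 1: LEN(A2)-7, the number of characters left once the tags are excluded.
    numChars = Len(cellText) - 7

    ' The worksheet MID function returns #VALUE! for a negative length;
    ' this sketch returns an empty string instead.
    If numChars < 0 Then numChars = 0

    ' Step 2: MID(A2,4,num_chars), starting right after a 3-character opening tag.
    SimpleTagStrip = Mid$(cellText, 4, numChars)
End Function
```

With the assumed sample "<b>Apple</b>", which is 12 characters long, numChars is 5 and the function returns "Apple". You can check this in the Immediate window with ?SimpleTagStrip("<b>Apple</b>").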
Strip or remove some complex HTML tags with VBA code
If a text string contains multiple HTML tags, the above formula may not work correctly, because it assumes a single tag pair of fixed length. In this case, the following VBA code can help you deal with more complex HTML tags in text strings.
1. Press Alt + F11 in Excel to open the Microsoft Visual Basic for Applications window.
2. Click Insert > Module, and paste the following VBA code into the module window.
```vba
Sub RemoveTags()
'updateby Extendoffice
    Dim xRg As Range
    Dim xCell As Range
    Dim xAddress As String
    On Error Resume Next
    xAddress = Application.ActiveWindow.RangeSelection.Address
    ' Ask for the range to clean; Type 8 makes InputBox return a Range
    Set xRg = Application.InputBox("please select data range", "Kutools for Excel", xAddress, , , , , 8)
    ' Limit the work to cells inside the used range
    Set xRg = Application.Intersect(xRg, xRg.Worksheet.UsedRange)
    If xRg Is Nothing Then Exit Sub
    ' Format the cells as text before writing the results back
    xRg.NumberFormat = "@"
    With CreateObject("vbscript.regexp")
        ' Non-greedy match of everything from "<" to the next ">"
        .Pattern = "\<.*?\>"
        .Global = True
        For Each xCell In xRg
            xCell.Value = .Replace(xCell.Value, "")
        Next
    End With
End Sub
```
3. Press F5 to run the code. In the prompt box that appears, select the cells that you want to remove the HTML tags from.
4. Click OK, and all of the HTML tags are removed from the selected cells.
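The macro above overwrites the selected cells in place. If you would rather keep the original data, the same regular expression could be wrapped in a user-defined function that returns the cleaned text to another cell. The sketch below is an assumption, not part of the original code: StripHtmlTags is a hypothetical name, and it relies on the VBScript.RegExp object, which is available on Windows only.

```vba
' Hypothetical user-defined function: returns the text with every
' <...> sequence removed, leaving the source cell unchanged.
Function StripHtmlTags(ByVal cellText As String) As String
    With CreateObject("VBScript.RegExp")
        ' Same pattern as the RemoveTags macro: non-greedy match from "<" to ">"
        .Pattern = "\<.*?\>"
        ' Replace every match, not only the first one
        .Global = True
        StripHtmlTags = .Replace(cellText, "")
    End With
End Function
```

After pasting this into a standard module, entering =StripHtmlTags(A2) in a blank cell returns the text of A2 without its tags. Because the pattern removes anything between < and >, ordinary text that contains both characters, such as "a < b and c > d", would also be shortened.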
Related functions used:
- The LEN function returns the number of characters in a text string.
- The MID function returns a specific number of characters from the middle of a given text string.
Related articles:
- Remove Unwanted Characters From Cell In Excel
- You can use the SUBSTITUTE function to remove any unwanted characters from a specific cell in Excel.
- Remove Line Breaks From Cells In Excel
- This tutorial provides three formulas to help you remove line breaks (which are created by pressing Alt + Enter in a cell) from specific cells in Excel.
- Remove Text Based On Variable Position In Excel
- This tutorial explains how to remove text or characters from a cell when they are located at a variable position.
- Strip Or Remove Non-Numeric Characters From Text Strings
- Sometimes, you may need to remove all of the non-numeric characters from text strings and keep only the numbers. This article introduces some formulas for solving this task in Excel.