Interview question: Using Python regex groups to match complex text structures

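Given a piece of HTML text containing multiple links in the form '<a href="http://example.com">link text</a>', use Python's re module and regular-expression capture groups to extract the URL and the display text of every link. Output the result as a list of dictionaries, each in the form {'url': 'link URL', 'text': 'link display text'}.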

Answer

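import re


def extract_links(html):
    # Group 1 captures the href value and group 2 the link text. Both are
    # non-greedy so each match stops at the nearest closing quote or </a>.
    pattern = r'<a href="(.*?)">(.*?)</a>'
    # With two groups in the pattern, re.findall returns (url, text) tuples.
    matches = re.findall(pattern, html)
    return [{"url": url, "text": text} for url, text in matches]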

You can call the function like this:

html_text = '<a href="http://example1.com">link text1</a><a href="http://example2.com">link text2</a>'
print(extract_links(html_text))
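
The answer above assumes every link has exactly the form given in the question. As a hedged variation, the sketch below uses named groups with re.finditer and tolerates extra attributes, single-quoted href values, and link text that spans lines. The names LINK_RE and extract_links_named are illustrative and not part of the original answer, and the pattern is only a sketch for simple, well-formed anchors, not a general HTML parser.

```python
import re

# Hypothetical, more tolerant pattern (an assumption, not the original answer):
#   (["\'])       group 1: the opening quote, reused via \1 to find the closing quote
#   (?P<url>...)  named group for the href value
#   (?P<text>...) named group for the link text
LINK_RE = re.compile(
    r'<a\s+[^>]*?href=(["\'])(?P<url>.*?)\1[^>]*>(?P<text>.*?)</a>',
    re.IGNORECASE | re.DOTALL,  # DOTALL lets the link text span several lines
)


def extract_links_named(html):
    # finditer yields match objects, so groups can be read by name
    # instead of by tuple position.
    return [
        {"url": m.group("url"), "text": m.group("text")}
        for m in LINK_RE.finditer(html)
    ]


sample = (
    '<a href="http://example1.com">link text1</a>'
    "<a class=\"x\" href='http://example2.com' target=\"_blank\">link text2</a>"
)
print(extract_links_named(sample))
```

Reading groups by name keeps the dictionary construction correct even if more groups are later added to the pattern, which is the main reason to prefer it over positional indexing here.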