How to find the baseline of curved text?

An image with curved text lines is attached. How can I find the baseline of the text?

The goal is to get lines like the ones I drew by hand in the following picture:

I tried the following code, but letters with descenders such as g, p, q and y break the line.

import cv2 as cv
import numpy as np

src = cv.imread("boston_cooking_a.jpg", cv.IMREAD_GRAYSCALE)
src = cv.adaptiveThreshold(src=src, maxValue=255, blockSize=55, C=11, thresholdType=cv.THRESH_BINARY, adaptiveMethod=cv.ADAPTIVE_THRESH_MEAN_C)
src = cv.dilate(src, cv.getStructuringElement(ksize=(3, 3), shape=cv.MORPH_RECT))
src = cv.erode(src, cv.getStructuringElement(ksize=(50, 3), shape=cv.MORPH_RECT))
src = cv.Sobel(src, ddepth=0, dx=0, dy=1, ksize=5)
cv.imwrite("test.jpg", src)
cv.imshow("src", src)
cv.waitKey(0)

Update:

Another image is attached that you can test your answer against, so we can be sure the answer is not overfitted to a single image.

let's help this along... you should briefly survey the scientific literature on existing approaches to this.

— 06.06.2024 20:11

you can use a vision model for this, unless it is homework for an image processing class.

— 08.06.2024 10:06

or you can do a transformation per letter: put a dot under each letter at the right spot and, at the end, connect all the dots. You would have to maintain some kind of "dictionary" Letter_place->dot_place, since there are 26 lowercase + uppercase letters, 10 digits, commas, etc. It can be done.

— 08.06.2024 10:08

I suggest using the OpenCV drawing functions: docs.opencv.org/4.x/d6/d6e/group__imgproc__draw.html. This requires determining the coordinates of each text character.

— 08.06.2024 10:40

@SudoKoach I don't think the drawing itself is the problem here.

— 08.06.2024 10:59

The problem is determining the coordinates of each individual text character so that a continuous curved baseline can be drawn.

— 08.06.2024 11:54

How about using neural networks? github.com/ychensu/LRANet

— 09.06.2024 01:06

or dhlab-epfl.github.io/dhSegment

— 09.06.2024 01:38

How slow is the algorithm allowed to be?

— 09.06.2024 21:43

@Canbach Up to 10 seconds per page.

— 09.06.2024 21:49

you should look into book scanners designed for old and fragile books. They can be built from DSLR cameras and a movable plexiglass "page flattener".

— 09.06.2024 22:49

python algorithm opencv image-processing ocr

04.06.2024 12:46


2 Answers

Accepted answer

I found an approach that finds your lines in "pure" OpenCV. The proposed solution is not perfect, but it shows a first direction. Maybe you should use pytesseract to reach your overall goal? In general, the solution below is quite sensitive to the parameters of the first filter in step A. The main steps, as pseudocode:

A) apply filters to merge letters into words
B) select the word contours (filter by height-to-width ratio and by area size)
C) draw random points from the word contours, using a Gaussian distribution centered on the contour centroid
D) use linear regression to find the middle line of each word contour
E) merge neighbouring word contours into line contours (the outer points of their middle lines are close together)
F) fit a 2nd-order polynomial regression to estimate the middle line of each line contour
G) draw the lines obtained from the fitted group curves
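To make steps D and F easier to follow before reading the full listing, here is a minimal sketch on synthetic points. It is not the code from this answer: the helper names fit_word_midline and fit_line_curve, the synthetic curve and the baseline_offset value are assumptions made for illustration, and np.polyfit of degree 1 stands in for stats.linregress.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_word_midline(points):
    """Step D: least-squares line y = slope * x + intercept through one word's points."""
    slope, intercept = np.polyfit(points[:, 0], points[:, 1], 1)
    return slope, intercept

def fit_line_curve(word_point_sets, baseline_offset=0.0):
    """Step F: one 2nd-order polynomial through the points of all words in a text line."""
    pts = np.vstack(word_point_sets)
    curve = np.poly1d(np.polyfit(pts[:, 0], pts[:, 1], 2))
    # The fit runs through the middle of the letters, so shift it down to the baseline.
    return lambda x: curve(x) + baseline_offset

def true_curve(x):
    # Hypothetical shape of a curved text line (image y grows downwards).
    return 0.001 * (x - 300.0) ** 2 + 100.0

# Three synthetic "words": random points scattered around the curve.
words = []
for x_start, x_end in [(0, 180), (210, 390), (420, 600)]:
    xs = rng.uniform(x_start, x_end, 200)
    ys = true_curve(xs) + rng.normal(0.0, 2.0, xs.size)
    words.append(np.column_stack([xs, ys]))

slopes = [fit_word_midline(w)[0] for w in words]
baseline = fit_line_curve(words, baseline_offset=10.0)
print("per-word slopes:", np.round(slopes, 3))
print("baseline at x=300:", round(float(baseline(300.0)), 1))
```

The per-word slopes change sign from the left word to the right word, which is the information step E relies on when it compares the end points of neighbouring middle lines.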

The main output for example 2 shows a solid result, but it still has some artifacts from the first step, where letters are merged into words.

import cv2
import math
import uuid
import numpy as np
from scipy import stats

def resizeImageByPercentage(img,scalePercent = 60):
    width = int(img.shape[1] * scalePercent / 100)
    height = int(img.shape[0] * scalePercent / 100)
    dim = (width, height)
    # resize image
    return cv2.resize(img, dim, interpolation = cv2.INTER_AREA)

def calcMedianContourWithAndHeigh(contourList):
    hs = list()
    ws = list()
    for cnt in contourList:
        (x, y, w, h) = cv2.boundingRect(cnt)
        ws.append(w)
        hs.append(h)
    return np.median(ws),np.median(hs)

def calcCentroid(contour):
    houghMoments = cv2.moments(contour)
    # calculate x,y coordinate of centroid
    if houghMoments["m00"] != 0:
        cX = int(houghMoments["m10"] / houghMoments["m00"])
        cY = int(houghMoments["m01"] / houghMoments["m00"])
    else:
        # zero area: no centroid can be calculated, (-1, -1) marks the contour as invalid
        cX, cY = -1, -1
    return cX,cY

def applyDilateImgFilter(img,kernelSize= 3,iterations=1):
    img_bin = 255 - img #invert
    kernel = np.ones((kernelSize,kernelSize),np.uint8)
    img_dilated = cv2.dilate(img_bin, kernel, iterations = iterations)
    return (255- img_dilated) #invert back

def randomColor():
    return tuple(np.random.randint(0, 255, 3).tolist())

def drawGaussianValuesInsideRange(start, end, center, stdDev, amountValues):
    values = []
    if center < 0:
        return values
    if start > end:
        return values
    while len(values) < amountValues:
        valueListPotencial = np.random.normal(center, stdDev, amountValues)
        valueListFiltered = [value for value in valueListPotencial if start <= value <= end]
        values.extend(valueListFiltered)
    return values[:amountValues]

def drawRandomPointsInPolygon(amountPoints, cntFactObj):
    pointList = list()
    if not isinstance(cntFactObj, ContourFacts):
        return pointList
    #we calc basic parameter from random point selection
    horizontalStart = cntFactObj.x
    horizontalEnd = cntFactObj.x + cntFactObj.w
    verticalStart = cntFactObj.y
    verticalEnd = cntFactObj.y + cntFactObj.h  
    #calc std deviation connected to length and ratio
    horitonalStdDeviation = 1 / cntFactObj.ratioHeightoWidth * (horizontalEnd-horizontalStart)
    verticalStdDeviation = 1 / cntFactObj.ratioHeightoWidth * (verticalEnd-verticalStart)
    while len(pointList)<amountPoints:
        if cntFactObj.centoird[0] < 0 or cntFactObj.centoird[1] < 0:
            return pointList
        drawXValues = drawGaussianValuesInsideRange(horizontalStart, horizontalEnd, cntFactObj.centoird[0],
                                          horitonalStdDeviation, amountPoints)
        drawYValues = drawGaussianValuesInsideRange(verticalStart, verticalEnd, cntFactObj.centoird[1], 
                                         verticalStdDeviation, amountPoints)
        #we create the points and check if they are inside the polygon
        for i in range(0,len(drawXValues)):
            #create points
            point = (drawXValues[i],drawYValues[i])
            # check if the point is inside the polygon
            if cv2.pointPolygonTest(cntFactObj.contour, point, False) > 0:
                pointList.append(point)
    return pointList[:amountPoints]

def drawCountourOn(img,contours,color=None):
    imgContour = img.copy()
    for i in range(len(contours)):
        if color is None:
            color = randomColor()
        cv2.drawContours(imgContour, contours, i, color, 2)
    return imgContour

DEBUGMODE = True
fileIn = "bZzzEeCU.jpg"#"269aSnEM.jpg"
img = cv2.imread(fileIn)

## A) apply filters to merge letters to words
# prepare img load
imgGrey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
#gaussian filter
imgGaussianBlur = cv2.GaussianBlur(imgGrey,(3,3),1)
#make binary img, black and white via filter
_, imgBinThres = cv2.threshold(imgGaussianBlur, 140, 230, cv2.THRESH_BINARY)
if DEBUGMODE:
    cv2.imwrite("img01bw.jpg",resizeImageByPercentage(imgBinThres,30))

## 3 steps merged by helper class ContourFacts
## B) select contours of words (filter by: ratio heights vs widths , area size)
## C) get random points from wordcontours with gaussian distribution and center point centroid of contour
## D) use linear regression to find middle line of wordcontours

#apply dilate filter to merge letter to words
imgDilated = applyDilateImgFilter(imgBinThres,5,3)
if DEBUGMODE:
    cv2.imwrite("img02dilated.jpg",resizeImageByPercentage(imgDilated,30))

# detect contours
contourList, _ = cv2.findContours(imgDilated, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
if DEBUGMODE:
    imgContour = drawCountourOn(img,contourList)
    cv2.imwrite("img03contourAll.jpg",resizeImageByPercentage(imgContour,30))
    
#do a selection of contours by rule
#A) ratio h vs w
#B) area size
mediaWordWidth, medianWordHigh = calcMedianContourWithAndHeigh(contourList)
print("median word width: ", mediaWordWidth)
print("median word high: ", medianWordHigh)
contourSelectedByRatio=list()
#we calc for every contour ratio h vs w
ratioThresholdHeightToWidth = 1.1 #threshold: keep contours whose height is at most about equal to their width
# e.g. the word "to" -->  10 pixel / 13 pixel

#helper class for contour attributes
class ContourFacts:
    def __init__(self,contour):
        if contour is None:
            return
        self.uid = uuid.uuid4()
        (self.x, self.y, self.w, self.h) = cv2.boundingRect(contour)
        self.minRect = cv2.minAreaRect(contour)
        self.angle = self.minRect[-1]
        _, (rectWidth, rectHeight), _ = self.minRect
        self.minRectArea = rectWidth * rectHeight
        self.ratioHeightoWidth = self.h / self.w
        self.contour = contour
        self.centoird = calcCentroid(contour)
        self.randomPoinsInCnt = self.DrawRandomPoints()
        if len(self.randomPoinsInCnt) > 0:
            (self.bottomSlope, self.bottomIntercept) = self.EstimateCenterLineViaLinearReg()
            self.bottomMinX = min([x for x,y in self.randomPoinsInCnt])
            self.bottomMaxX = max([x for x,y in self.randomPoinsInCnt])

    def EstimateCenterLineViaLinearReg(self):
        if self.contour is None:
            return (0,0)
        slope = 0
        intercept = 0
        #model = slope (x) + intercept
        xValues = [x for x,y in self.randomPoinsInCnt]
        yValues = [y for x,y in self.randomPoinsInCnt]
        if len(xValues) < 2:
            return (0,0)
        elif len(xValues) ==2:
            #we calc a line with 2 points
            # y = m*x + b
            deltaX = xValues[1]-xValues[0]
            if deltaX == 0:
                return (0,0)
            slope = (yValues[1]-yValues[0])/(deltaX)
            intercept = yValues[0] - (slope*xValues[0])
        else:
            #normal linear regression above 2 points
            slope, intercept, r, p, std_err = stats.linregress(xValues, yValues)
        #TODO check std_err
        return slope, intercept
    
    def DrawRandomPoints(self,pointFactor=2):
        pointList = list()
        #calc area to amount point relation  -> bigger area more points
        amountPointsNeeded = int(self.minRectArea/pointFactor)
        pointList = drawRandomPointsInPolygon(amountPointsNeeded,self)
        return pointList
    
    def GetCenterLineLeftCorner(self):
        if self.contour is None or len(self.randomPoinsInCnt) == 0:
            return (0,0)    
        # calc via  y = m*x + b with min
        return (int(self.bottomMinX), int(self.bottomSlope*self.bottomMinX + self.bottomIntercept))
    def GetCenterLineRightCorner(self):
        if self.contour is None or len(self.randomPoinsInCnt) == 0:
            return (0,0)    
        # calc via via y = m*x + b with max
        return (int(self.bottomMaxX), int(self.bottomSlope*self.bottomMaxX + self.bottomIntercept))
    def __eq__(self, other):
        if isinstance(other, ContourFacts):
            return self.uid == other.uid
        return False
    def __hash__(self):
        return hash(self.uid)



#calc mean area size from area size
vectorOfAreaSize = np.array([cv2.contourArea(cnt) for cnt in contourList])
meanAreaSize = np.mean(vectorOfAreaSize)
print("mean area size: ", meanAreaSize)
stdDevAreaSize = np.std(vectorOfAreaSize)
print("std dev area size: ", stdDevAreaSize)
thresoldDiffAreaSize = stdDevAreaSize/4
#we iterate all contours and select by ratio and size
for cnt in contourList:
    #construct helper class instance
    contourFactObj = ContourFacts(cnt)
    #calc abs diff to mean area size
    diffArea = abs(cv2.contourArea(cnt) - meanAreaSize)
    if contourFactObj.ratioHeightoWidth < ratioThresholdHeightToWidth and diffArea < (thresoldDiffAreaSize):
        contourSelectedByRatio.append(contourFactObj)

#debug print 
if DEBUGMODE:
    #we print words
    imgContourSelection = img.copy() 
    for cnt in contourSelectedByRatio:
        contourColor = randomColor()
        imgContourSelection = drawCountourOn(imgContourSelection,[cnt.contour],contourColor)
        #we print centroid 
        cv2.circle(imgContourSelection, cnt.centoird, 5, (0, 0, 255), -1)
        p1 = cnt.GetCenterLineLeftCorner()
        p2 = cnt.GetCenterLineRightCorner()
        if p1 != (0,0) or p2 != (0,0):
            cv2.circle(imgContourSelection, p1, 5, (0, 0, 255), -1)
            cv2.circle(imgContourSelection, p2, 5, (0, 0, 255), -1)
            cv2.line(imgContourSelection, p1, p2, (0, 255, 0), 2)
    cv2.imwrite("img04contourSelection.jpg",resizeImageByPercentage(imgContourSelection,30))


## E) merge all wordcontours which are neighbours to linecontours (outer middle line points are close together)  
#define distance function, differences in height is negativ weighted
def euclidianDistanceWithNegativHeightWeight(cnt1,cnt2,negativeHeightWeight=2.0):
    if cnt1 is None or cnt2 is None:
        return 1000000
    if not isinstance(cnt1, ContourFacts) or not isinstance(cnt2, ContourFacts):
        return 1000000
    p1 = cnt1.GetCenterLineRightCorner()
    p2 = cnt2.GetCenterLineLeftCorner()
    return math.sqrt((p2[0] - p1[0])**2 + (negativeHeightWeight*(p2[1] - p1[1]))**2)

# helper class to group contours
class ContourGroup:
    def __init__(self):
        self.uuid = uuid.uuid4()
        self.contourList = list()
    def GetLastElement(self):
        if len(self.contourList) == 0:
            return None
        return self.contourList[-1]
    def Add(self,cnt):
        self.contourList.append(cnt)   
    def __eq__(self, other):
        if isinstance(other, ContourGroup):
            return self.uuid == other.uuid
        return False
    
groupMap = dict()
lineGroupList = list()
## we grouping the contours to lines
maxDistanceThresholNextWord= medianWordHigh *0.9 #TODO get better estimate
#recursive function to get nearest neighbors
def getNearestNeighbors(cnt1,depthCounter,contourSelectedByRatio,maxDistanceThresholNextWord):
    maxDepth = 10 #var for max recursion depth 
    nearestCnt = None
    nearestDist = maxDistanceThresholNextWord
    for j in range(0,len(contourSelectedByRatio)):
        cnt2 = contourSelectedByRatio[j]
        if cnt1 == cnt2:#skip same
            continue
        dist = euclidianDistanceWithNegativHeightWeight(cnt1,cnt2)
        if dist < nearestDist:
            nearestDist = dist
            nearestCnt = cnt2
    if nearestCnt is not None:#call recursive
        nearaestListWeHave = [nearestCnt] #new list
        depthCounter += 1
        if depthCounter < maxDepth:# all to call
            nearListWeGet =getNearestNeighbors(nearestCnt,depthCounter,contourSelectedByRatio,maxDistanceThresholNextWord)
            if nearListWeGet is None:
                return nearaestListWeHave
            else:
                nearListWeGet.extend(nearaestListWeHave)   
                return nearListWeGet
        else:#limit reached of recursion skip
            return nearaestListWeHave
    else:      
        return None
## E) merge all wordcontours which are neighbours to linecontours (outer middle line points are close together)      
#we group all contours
for i in range(0,len(contourSelectedByRatio)):
    cnt1 = contourSelectedByRatio[i]
    if cnt1 in groupMap:
        continue
    lineGroup = ContourGroup()
    lineGroup.Add(cnt1)
    groupMap[cnt1] = lineGroup
    depthCounter = 0
    nearaestList = getNearestNeighbors(cnt1,depthCounter,
                                       contourSelectedByRatio,maxDistanceThresholNextWord)
    if nearaestList is None:
        lineGroupList.append(lineGroup) #no neighbor found
        continue
    for cnt in nearaestList:
        groupMap[cnt] = lineGroup
        lineGroup.Add(cnt)
    lineGroupList.append(lineGroup)

if DEBUGMODE:
    imgContourGroup = img.copy()
    for group in lineGroupList:
        #print(f"group({group.uuid} size: {len(group.contourList)}")
        #we print all corner points
        for cnt in group.contourList:
            leftCorner = cnt.GetCenterLineLeftCorner()
            rigthCorner = cnt.GetCenterLineRightCorner()
            cv2.circle(imgContourGroup, leftCorner, 5, (0, 0, 255), -1)
            cv2.circle(imgContourGroup, rigthCorner, 5, (140, 0, 0), -1)
        #we print estimated underlines
        for cnt in group.contourList:
            leftCorner = cnt.GetCenterLineLeftCorner()
            rigthCorner = cnt.GetCenterLineRightCorner()
            cv2.line(imgContourGroup, leftCorner, rigthCorner, (0, 255, 0), 2)
        # we print all contours
        groupColor = randomColor()
        cntList = [cnt.contour for cnt in group.contourList]
        imgContourGroup = drawCountourOn(imgContourGroup,cntList,groupColor)
    cv2.imwrite("img05contourGroup.jpg",resizeImageByPercentage(imgContourGroup,30))

## F) do polynomial regression 2nd order to estimate middle line of linecontours
# calc line from stable group points
minAmountRegressionElements = 12
movingWindowSize = 3
letterCenterOffset = medianWordHigh * 0.5
lineListCollection = list()
for group in lineGroupList:
    stablePoints = list()
    for cnt in group.contourList:
        stablePoints.extend(cnt.randomPoinsInCnt)
    if len(stablePoints) >= minAmountRegressionElements :
        xValues = [x for x,y in stablePoints]
        yValues = [y for x,y in stablePoints]
        # perform polynomial regression of degree 2
        coefffientValues = np.polyfit(np.array(xValues), np.array(yValues), 2)
        # create a polynomial function with the coefficients
        polynomial = np.poly1d(coefffientValues)
        #we filter to build something like a line
        xValuesNewLineFilter = list()
        xMin =int( min(xValues))
        xMax = int(max(xValues))
        for xNew in range(xMin,xMax,movingWindowSize):
                xValuesNewLineFilter.append(xNew)
        #we predict new points with all old x values
        yValuesNew = polynomial(xValuesNewLineFilter)
        yValuesNewHighCorrect =np.array(yValuesNew) + letterCenterOffset
        lineList = list()
        #we create a list of points
        for i in range(0,len(xValuesNewLineFilter)):
            pointInt = (int(xValuesNewLineFilter[i]),int(yValuesNewHighCorrect[i]))
            lineList.append(pointInt)
        lineListCollection.append(lineList)
## G) write the lines 
imgLines = img.copy()
for lineList in lineListCollection:
    if len(lineList) == 0: #happens when xMin == xMax, nothing to draw
        continue
    p1 = lineList[0]
    for j in range(1,len(lineList)):
        p2 = lineList[j]
        #cv2.circle(imgLines, p2Int, 5, (0, 0, 255), -1)
        cv2.line(imgLines, p1, p2, (0, 255, 0), 2)
        p1 = p2
cv2.imwrite("img06Lines.jpg",resizeImageByPercentage(imgLines,30))

if DEBUGMODE:
    cv2.waitKey(0)

Additional debug output: the figure below shows the word contours with green middle lines and red outer points, which are used for the neighbourhood analysis.

You're almost there! Just (!) fit the curves using, for example, "spline interpolation".

— 10.06.2024 10:51
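For readers who want to try the spline suggestion from the comment above, here is a minimal sketch. It is not part of the answer: it assumes SciPy is available, uses scipy.interpolate.UnivariateSpline as one possible smoothing spline, and the helper name fit_baseline_spline, the noise_std parameter and the synthetic S-shaped baseline are made up for illustration.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def fit_baseline_spline(points, noise_std=1.0):
    """Fit a cubic smoothing spline y(x) through (x, y) baseline points.

    noise_std is a rough guess of the vertical scatter in pixels; the
    smoothing factor s is set to n * noise_std**2, a common heuristic.
    """
    pts = np.asarray(points, dtype=float)
    # The spline needs increasing x, so average the y values of duplicate x.
    xs, inverse = np.unique(pts[:, 0], return_inverse=True)
    ys = np.bincount(inverse, weights=pts[:, 1]) / np.bincount(inverse)
    return UnivariateSpline(xs, ys, k=3, s=len(xs) * noise_std ** 2)

# Synthetic S-shaped baseline, which a single parabola cannot follow well.
rng = np.random.default_rng(1)
x = np.arange(0, 801, 20, dtype=float)
true_y = 100.0 + 15.0 * np.sin(x / 150.0)
noisy_pts = np.column_stack([x, true_y + rng.normal(0.0, 1.0, x.size)])

spline = fit_baseline_spline(noisy_pts, noise_std=1.0)
mean_abs_err = float(np.mean(np.abs(spline(x) - true_y)))
print("mean abs error (px):", round(mean_abs_err, 2))
```

In the answer's pipeline, the stable points of one line group would take the place of noisy_pts, and the spline would replace the np.polyfit / np.poly1d pair in step F. Whether that actually improves the result depends on how noisy the grouped points are.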

Wow, thank you very much! I'm checking it out.

— 10.06.2024 16:00

Your solution seems to work, but it involves a lot of code and is a bit hard to follow. You could improve your answer by explaining your approach.

— 13.06.2024 10:02

yes @stateMachine, the code structure turned out a bit complicated; more explanation, documentation and structure could improve the solution

— 14.06.2024 10:37

10.06.2024 10:23

I can offer you a different approach. It is shorter, but @t2solve's code may give you a better result. Here is the method:

Otsu-threshold the image, then apply a morphological closing with a custom kernel that connects blobs horizontally.
Find the contour of each word in the text and add its start and end points to a list.
Extract all text lines by grouping the words that lie on the same line.
Fit a polynomial to the points of each line.

This is the thresholded image:

This is the image after closing with the custom kernel:

This is the image with the fitted polynomial baselines:

This is the result for your other image:

Here is the full code:

import cv2
import numpy
import math

#Function to create the custom kernel: three rows of ones through the center
def xAxisKernel(size):
    size = size if size%2 else size+1
    xkernel = numpy.zeros((size,size),dtype=numpy.uint8)
    center = size//2
    for j in range(size):
        xkernel[center][j] = 1
        xkernel[center-1][j] = 1
        xkernel[center+1][j] = 1
    return xkernel

#Group the words into lines of text
def extractLines(words):
    lines = []
    while len(words):
        line = []
        line.append(words[0][0]) #add the start point of the first word
        line.append(words[0][1]) #add the end point of the first word
        line_start,line_end = words[0][0],words[0][1]
        
        words.remove(words[0])
        for ww in line: #line grows while we iterate, so every added point triggers another pass
            for word in list(words): #iterate over a copy, because words is modified inside the loop
                start,end = word
                if math.dist(line_end,start) < 100 and abs(line_end[1]-start[1])<30 and line_end[0]<start[0]:
                    #word continues the line on the right
                    line_end = end
                    line.append([word[0][0],word[0][1]+3])
                    line.append([word[1][0],word[1][1]+3])
                    words.remove(word)
                elif math.dist(line_start,end) < 100 and abs(line_start[1]-end[1])<30 and line_start[0]>end[0]:
                    #word continues the line on the left
                    line_start = start
                    line.append([word[0][0],word[0][1]+3])
                    line.append([word[1][0],word[1][1]+3])
                    words.remove(word)
        lines.append(line)
    
    return lines

image = cv2.imread('curved_book.jpg')
h,w,c = image.shape
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
_,thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
closed = cv2.morphologyEx(thresh,cv2.MORPH_CLOSE,xAxisKernel(13)) #Connecting the letters of a word

contours,hierarchy = cv2.findContours(closed,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)

#From contours extract start and end points of each word
words = [] #list to contain start and end point of each word
for cnt in contours:
    area = cv2.contourArea(cnt)
    if area>500 and area<20000:

        rect = cv2.minAreaRect(cnt)
        box = cv2.boxPoints(rect)
        box = numpy.intp(box)
        box = [[p[0],p[1]] for p in box]
        box.sort()    
        start = box[0] if box[0][1]>box[1][1] else box[1]
        end = box[2] if box[2][1]>box[3][1] else box[3]
        word = [start,end]
        words.append(word)



lines = extractLines(words) # list that contains start-end points of every word in each line
lines = [numpy.array(line,numpy.int32) for line in lines]

#Draw baselines
for line in lines:
    #side parabola coeffs
    coeffs = numpy.polyfit(line[:,0], line[:,1], 2)
    poly = numpy.poly1d(coeffs)


    line_start_x = min(line[:,0])
    line_end_x = max(line[:,0])
    xarr = numpy.arange(line_start_x, line_end_x)
    yarr = poly(xarr)

    parab_pts = numpy.array([xarr, yarr],dtype=numpy.int32).T
    cv2.polylines(image, [parab_pts], False, (255,0,0), 8)
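    for p in line:
        #plain Python ints, some OpenCV builds reject numpy scalars as point coordinates
        cv2.circle(image,(int(p[0]),int(p[1])),10,(0,0,255),-1)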


cv2.imshow('thresh',cv2.resize(thresh,(w*720//h,720)))
cv2.imshow('closed',cv2.resize(closed,(w*720//h,720)))
cv2.imshow('image',cv2.resize(image,(w*720//h,720)))
cv2.waitKey()

When grouping the words to extract lines, the idea is as follows:

Pick a word, remove it from the main list and add it to a new line list.
Set the start point of the picked word as line_start and its end point as line_end.
Check the remaining words.
- Compare their start points with line_end.
- If they are close and at the same Y level, add that word to the line.
- Also remove that word from the word list.
- Update line_end to the end point of the added word.
Do the same for line_start.
Repeat the whole process with the remaining words to find the next line.
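The steps above can also be written as a small standalone function, which may be easier to experiment with than the version embedded in the full script. This is a sketch rather than the answer's code: the name group_words_into_lines is hypothetical, the thresholds max_gap=100 and max_dy=30 are simply copied from the listing, and the words are kept as ordered (start, end) pairs instead of a flat point list.

```python
import math

def group_words_into_lines(words, max_gap=100, max_dy=30):
    """Group words, given as ((x_start, y_start), (x_end, y_end)), into text lines."""
    remaining = list(words)  # work on a copy, the caller's list stays untouched
    lines = []
    while remaining:
        line = [remaining.pop(0)]  # seed a new line with any unassigned word
        extended = True
        while extended:
            extended = False
            line_start, line_end = line[0][0], line[-1][1]
            for word in remaining:
                start, end = word
                if (start[0] > line_end[0] and abs(start[1] - line_end[1]) < max_dy
                        and math.dist(line_end, start) < max_gap):
                    line.append(word)      # word continues the line on the right
                elif (end[0] < line_start[0] and abs(end[1] - line_start[1]) < max_dy
                        and math.dist(line_start, end) < max_gap):
                    line.insert(0, word)   # word continues the line on the left
                else:
                    continue
                remaining.remove(word)
                extended = True
                break  # line ends changed, so rescan the remaining words
        lines.append(line)
    return lines

# Five words from two text lines, given in arbitrary order.
words = [
    ((120, 52), (200, 55)),
    ((10, 50), (100, 51)),
    ((10, 120), (90, 122)),
    ((220, 56), (300, 60)),
    ((110, 123), (190, 126)),
]
lines = group_words_into_lines(words)
for line in lines:
    print([w[0][0] for w in line])
```

Restarting the scan after every match avoids removing items from a list while iterating over it, at the cost of a few extra passes.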

Possible improvements: for the thresholding, you can split the source image into parts, apply Otsu to each part separately and merge the results; this reduces the effect of uneven lighting. Possible problem: sometimes, when small words sit next to each other (for example, "in the car"), one of the words may be missed.

— 13.06.2024 12:46

13.06.2024 09:03