박진슬 CustomerNER

Project results

1. Dataset collection

  • Psychiatry columns - crawling - done

  • Personal essays

    • Collected 1 sample for testing - done

    • Bulk collection via the Notion API, once the API has been checked - not started

  • Korean corpus

    • [x] Emotion corpus data

      • Purpose: collect data for learning sentences and their associated emotions

      • feature: the list of sentences whose keys start with HS (HS01, HS02, ...)

      • label: the value of ['emotion']['type']

```python
[{'profile': {'persona-id': 'Pro_03807',
   'persona': {'persona-id': 'A02_G01_C01',
    'human': ['A02', 'G01'],
    'computer': ['C01']},
   'emotion': {'emotion-id': 'S06_D02_E36',
    'type': 'E36',
    'situation': ['S06', 'D02']}},
  'talk': {'id': {'profile-id': 'Pro_03807', 'talk-id': 'Pro_03807_00028'},
   'content': {'HS01': '취업을 한다 해도 과연 안정적으로 돈을 벌 수 있을지 회의감이 들어.',
    'SS01': '안정적으로 돈을 벌 수 있을지 회의감이 드는군요. 어떤 점에서 회의감이 드셨나요?',
    'HS02': '내가 남들보다 사회성이 좀 떨어지는 것 같아.',
    'SS02': '사회성이 부족하다고 느끼시는군요. 어떤 일을 하면 사용자님이 더 편안하게 할 수 있을까요?',
    'HS03': '다른 사람들과 같이하는 일보다는 혼자서 하는 일이 더 좋아.',
    'SS03': '혼자서 진행할 수 있는 일을 더 선호하시는군요.'}}},
 {'profile': {'persona-id': 'Pro_03808',
   'persona': {'persona-id': 'A02_G01_C01',
    'human': ['A02', 'G01'],
    'computer': ['C01']},
   'emotion': {'emotion-id': 'S06_D02_E37',
    'type': 'E37',
    'situation': ['S06', 'D02']}},
  'talk': {'id': {'profile-id': 'Pro_03808', 'talk-id': 'Pro_03808_00038'},
   'content': {'HS01': '이번 프레젠테이션도 다른 부서보다 부진할 것 같아 불안해.',
    'SS01': '이번 프레젠테이션도 부진할까 봐 걱정이군요. 어떤 점에서 특히 불안하게 느끼나요?',
    'HS02': '프레젠테이션 내용이 빈약한 것 같아서 걱정이야.',
    'SS02': '발표 내용이 빈약할까 봐 걱정스러워하는군요. 어떻게 하면 내용을 더 발전시킬 수 있을까요?',
    'HS03': '팀원들과 한 번 더 회의를 해봐야겠어.',
    'SS03': '팀원들과 회의를 한 번 더 가질 계획이군요.'}}},
]  # remaining records omitted
```
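
The feature/label description above can be sketched as a small extraction function. This is a minimal illustration, not the project's actual code: the function name `extract_examples` is hypothetical, and the record is an abbreviated copy of the first sample. Note that in the sample the emotion type sits under `profile`, so the label path used here is `['profile']['emotion']['type']`.

```python
from typing import Dict, List, Tuple


def extract_examples(record: Dict) -> List[Tuple[str, str]]:
    """Pair every utterance whose key starts with 'HS' with the record's emotion type."""
    label = record["profile"]["emotion"]["type"]
    content = record["talk"]["content"]
    return [(text, label) for key, text in sorted(content.items()) if key.startswith("HS")]


# Abbreviated copy of the first record in the sample (assumption: same nesting for all records).
record = {
    "profile": {"emotion": {"emotion-id": "S06_D02_E36", "type": "E36"}},
    "talk": {"content": {
        "HS01": "취업을 한다 해도 과연 안정적으로 돈을 벌 수 있을지 회의감이 들어.",
        "SS01": "안정적으로 돈을 벌 수 있을지 회의감이 드는군요. 어떤 점에서 회의감이 드셨나요?",
        "HS02": "내가 남들보다 사회성이 좀 떨어지는 것 같아.",
    }},
}

examples = extract_examples(record)
for text, label in examples:
    print(label, text)
```

Each `HS` sentence becomes one training example that shares the record-level label, so a record with three `HS` sentences yields three examples.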

2. Data preprocessing

- Required dataset format
    - Example
        - [https://github.com/kmounlp/NER/blob/master/말뭉치 - 형태소_개체명/00002_NER.txt#L7](https://github.com/kmounlp/NER/blob/master/%EB%A7%90%EB%AD%89%EC%B9%98%20-%20%ED%98%95%ED%83%9C%EC%86%8C_%EA%B0%9C%EC%B2%B4%EB%AA%85/00002_NER.txt#L7)
        
        ![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/ec9b73ad-1ba2-4d45-a555-4f59b02f7805/Untitled.png)
        
- Planned preprocessing
    - Example sentence: 이번 프레젠테이션도 부진할까 봐 걱정이군요 ("You are worried that this presentation will also go poorly"), tagged with the emotion label 걱정 (worry)
        
        ```text
        이번            O
        프레젠테이션도  B-걱정
        부진할까        O
        봐              O
        걱정이군요      B-걱정
        ```
        
- Match each sentence's emotion to the "tagging targets" inside the sentence
    - Per-sentence emotion: assigned through emotion labeling
    - Selecting the tagging targets within a sentence:
        - Run dependency parsing and, among the words attached to the sentence's verb (ROOT), collect only those with the relations 'nsubj', 'obj', 'csubj', 'nmod', or 'compound'
            
            ```python
            {0: {'흐른다': ['땀이']},
             1: {'흐르고': ['줄기를', '땀이', '가슴을', '명치를', '지나고']},
             2: {'하고': ['선풍기를', '찾기']},
             3: {'하지 않는다': ['핸디 선풍기는', '공간을']},
             4: {'하지 않는다': ['사이즈의', '충전 구멍을', '구멍을', '선을']},
             5: {'본다': ['선풍기를', '앞 뒤']},
             6: {'돌아가는지': ['버튼을', '연속 번', '팬이']},
             7: {'들어오는지': ['빨간불이']},
            }  # remaining sentences omitted
            ```
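
The ROOT-dependent filter described above can be sketched as follows. This is only an illustration of the selection rule: the `Token` type and `collect_root_dependents` are hypothetical names, and the parse of the sample sentence is written by hand rather than produced by a parser. In practice the head index and relation would come from whichever dependency parser is used.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    text: str
    head: int     # index of the head token; the root points to itself
    deprel: str   # dependency relation to the head


# Relations kept as tagging targets, as listed in the notes.
KEEP = {"nsubj", "obj", "csubj", "nmod", "compound"}


def collect_root_dependents(tokens: List[Token]) -> Dict[str, List[str]]:
    """Return {root text: [direct dependents of the root whose relation is in KEEP]}."""
    root_idx = next(i for i, t in enumerate(tokens) if t.deprel == "ROOT")
    kept = [
        t.text
        for i, t in enumerate(tokens)
        if i != root_idx and t.head == root_idx and t.deprel in KEEP
    ]
    return {tokens[root_idx].text: kept}


# Hand-written (hypothetical) parse of "땀이 계속 흐른다".
sentence = [
    Token("땀이", 2, "nsubj"),
    Token("계속", 2, "advmod"),   # attached to the root, but not a kept relation
    Token("흐른다", 2, "ROOT"),
]

result = collect_root_dependents(sentence)
print(result)
```

The adverb is dropped because `advmod` is not in the kept set, which leaves the same shape as the first entry in the output above.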

3. Modeling

3-1. Sentiment classification model

- **[1st attempt]** 60-label multi-class classification on the data from section 1
    
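    ```python
    import numpy as np
    import torch
    from torch import nn, optim
    from tqdm import tqdm
    from transformers import AutoModelForSequenceClassification

    model_name = 'bert-base-multilingual-cased'
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Assumption: the notes do not show how `model` was created; a 60-label
    # sequence-classification head is consistent with `output.loss` below.
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=60)
    model = model.to(device)

    optimizer = optim.AdamW(model.parameters(), lr=0.001)
    criterion = nn.CrossEntropyLoss()  # not used below; the model returns its own loss

    # `dl_train` is the training DataLoader built from the section 1 data (not shown).
    # Each batch must include `labels` so that `output.loss` is populated.
    with tqdm(range(4)) as pbar:
      for e in pbar:
        loss_list = []
        for batch in dl_train:
          batch = {k : v.to(device) for k, v in batch.items()}
          optimizer.zero_grad()
          output = model(**batch)
          loss = output.loss
          loss_list.append(loss.item())
          loss.backward()
          optimizer.step()
          pbar.set_postfix(avg_loss= np.mean(loss_list))
        # Save a checkpoint at the end of every epoch.
        model.save_pretrained(f'../content/drive/MyDrive/2023/korean_data/model/sentimetal_classification_epoch{e}/')
    ```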
    
- Result: training failed. The average loss plateaued at about 4.11 and did not decrease further.

- **[2nd attempt]** Positive/negative inference with a model trained on NSMC
    
    ```python
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("daekeun-ml/koelectra-small-v3-nsmc")
    ```
    
- Result: the pretrained model classified the sentences as shown below
    - On manual inspection, some sentences are neither pos nor neg but neutral
    - Sentences that should be classified as neutral (아아) also receive low scores
    
    ![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/8bcd5805-1bef-4762-92ff-03ee4760ebda/Untitled.png)
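
One way to read the plateau of about 4.11 in the first attempt is to compare it with the loss of a classifier that predicts all 60 labels with equal probability. The check below is an interpretive aid rather than part of the original run, and it assumes the reported value is the mean cross-entropy loss in nats.

```python
import math

num_labels = 60

# Cross-entropy of a uniform prediction over `num_labels` classes is ln(num_labels).
chance_loss = math.log(num_labels)
print(f"{chance_loss:.3f}")  # prints 4.094
```

A loss that stays near 4.09 suggests the model is doing no better than chance. The learning rate of 0.001 in the first attempt is much larger than the 2e-5 to 5e-5 range commonly used when fine-tuning BERT, so lowering it may be worth trying before changing anything else.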
    

3-2. NER tagging model

- Using spaCy
    - Result: the set of recognized words is too small to apply to the essays
    - Conclusion: the number of targets that can be tagged as entities needs to be increased, for example by using Korean corpora
        
        ```text
        한 줄 26 29 QT
        한번 24 26 QT
        두 번 35 38 QT
        주말에 11 14 DT
        오늘 0 2 DT
        하루는 3 6 DT
        여름을 23 26 DT
        내년의 4 7 DT
        ```
        
- **Studying the Pytorch-BERT-CRF-NER documentation**
    - Link: https://github.com/eagle705/pytorch-bert-crf-ner
    - (in progress)
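
To reuse span-style output such as the spaCy results above (text, start offset, end offset, label) in a token-per-line NER dataset, the character offsets have to be mapped onto tokens. The following is a minimal sketch under stated assumptions: tokens are split on whitespace, the sentence and its spans are made up for illustration, and `spans_to_bio` is a hypothetical helper, not part of spaCy or of the linked repository.

```python
from typing import List, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Tag whitespace tokens with B-/I-/O labels from (start, end, label) character spans."""
    tagged = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)   # character offset of this token
        end = start + len(token)
        pos = end
        tag = "O"
        for s, e, label in spans:
            if start < e and end > s:    # token overlaps the entity span
                # The token covering the span's first character gets B-, later ones I-.
                tag = ("B-" if start <= s else "I-") + label
                break
        tagged.append((token, tag))
    return tagged


# Hypothetical sentence with spans in the same (start, end, label) form as above.
text = "오늘 두 번 확인했다"
spans = [(0, 2, "DT"), (3, 6, "QT")]

bio = spans_to_bio(text, spans)
for token, tag in bio:
    print(token, tag)
```

A multi-token entity such as 두 번 becomes `B-QT` followed by `I-QT`, which is the shape a CRF tagging layer expects.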

4. Results (so far)

- Built an initial dataset that can be fed into NER
    
    ![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/ed912e81-b458-47b8-a230-1fd0e69238fe/Untitled.png)
    
- Converted it to the same format as the reference dataset
    
    ![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/2f6bd6cd-7d2d-4b61-8ce5-f38f5e49904b/Untitled.png)
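
The conversion into a token-per-line format, as in the section 2 example, can be sketched like this. It assumes whitespace tokenization and that the tagging targets and the sentence's emotion label are already known. `tag_sentence` is a hypothetical name, and the tab-separated output is an assumption rather than the exact layout of the reference files.

```python
from typing import Iterable, List, Tuple


def tag_sentence(sentence: str, targets: Iterable[str], emotion: str) -> List[Tuple[str, str]]:
    """Label each target token with B-<emotion> and every other token with O."""
    target_set = set(targets)
    return [
        (token, f"B-{emotion}" if token in target_set else "O")
        for token in sentence.split()
    ]


rows = tag_sentence(
    "이번 프레젠테이션도 부진할까 봐 걱정이군요",
    targets=["프레젠테이션도", "걱정이군요"],
    emotion="걱정",
)
for token, tag in rows:
    print(f"{token}\t{tag}")
```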

5. Follow-up work

- Train the 60-emotion multi-class classification model, then run inference on each sentence
- Collect all essays written by a single author and fine-tune with an emotion dataset built for each dataset